lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<sb0> ysionneau, did you talk to the risc-v people about their cpu core being more than 10x larger and significantly slower than lm32?
sj_mackenzie has joined #m-labs
kilae has joined #m-labs
<ysionneau> I talked to him, but didn't say that :p
<sb0> well those are the numbers and they should be aware of them
<sb0> otherwise it's just going to be the 1337th open source CPU that sucks for some reason
<sb0> I wish they'd rather fix lm32 llvm instead of going with a new ISA
<sb0> phew
<ysionneau> he explained his new ISA saying "well, we didn't want a proprietary ISA so no x86 MIPS ARM"
<ysionneau> and he tried to do something a bit smarter than arm+thumb , but still allowing to have the normal 32bit RISCish load-store instruction set
<ysionneau> + a variable length instruction
<ysionneau> (set)
<ysionneau> basically as I understood it, it's one of the 32 bits of the 32 bit ISA which says if it's variable length or not
<ysionneau> he said he was not aware of LM32 at the time
<sb0> yeah and that "we are berkeley so it will be successful this time" and all the wrangling with ARM in linley consulting group publications
<ysionneau> he was aware of OpenRISC but he was not happy with the core (or12 I guess at the time)
<sb0> meanwhile the hard numbers are: >10x LM32 and slower. not bright.
<ysionneau> ahah
<ysionneau> sure I will tell him about those numbers if I am able to talk to him again today
<sb0> ah, they finally released the source...
<sb0> (for the whole thing)
<sb0> they have an "Area Efficiency" metric on their website, so I hope you won't get the bullshit argument "area doesn't matter thanks to Moore's law" that I've already heard a few times from some academics
<ysionneau> yes they released "a week ago" (he said) the generator
<ysionneau> as he described Chisel (his generator language stuff) it seems to look a bit like Migen, but in Scala
<ysionneau> sb0: which FPGA did you target when you synthesized their risc-v generated design?
<sb0> I tried s6 and k7
<sb0> results are about the same on both
<sb0> and yeah, there are a number of things to be unhappy about with openrisc, and particularly or1200
<sb0> lm32 has none of these issues, but the sw support isn't great
<ysionneau> I guess if I ask him "instead of doing yet another CPU, why didn't you take lm32" he will answer something like he was not happy with the ISA anyway
<ysionneau> I mean, if you start by modifying lm32 so much that it doesn't look like lm32 anymore
<ysionneau> maybe starting from scratch is not so stupid
<ysionneau> if you change all the ISA, put in variable length instructions etc
<ysionneau> it changes the design a lot
<ysionneau> doesn't it?
<sb0> there are a number of missing bits in the lm32 isa (fp, 64-bit, etc.)
<ysionneau> I agree that you can improve lm32
<sb0> but it is a practical, working, small CPU - not something whose main purpose is to vomit kilometers of pages of PhD dissertations and grant applications
<ysionneau> but if his design choices are quite different than what lm32 is today, then maybe there is no point in starting from lm32
<ysionneau> ahah
<ysionneau> sure sure
<ysionneau> it's a very practical and nicely working design
<ysionneau> works in ASIC and everything
<sb0> anyway, let's see. if it works and is not slow and bloated, it'll be the best thing I've seen from academia over the past 10 years by a very wide margin.
<ysionneau> OK I'm forwarding the answers
<ysionneau> so he agrees it's fat and slow in FPGA
<ysionneau> and he basically does not care
<ysionneau> he says it's aimed at ASIC and not FPGA
<sb0> yeah, that's a typical problem from academics
<ysionneau> so they fine tune for ASIC and not for FPGA
<sb0> and there's Moore's law, right?
<sb0> and "dark silicon" is the new hype
<ysionneau> as to why they didn't choose LM32: they didn't know about it back in the day
<ysionneau> and they need to go multicore, and they need 64 bits integer registers
<ysionneau> so the ISA of both openrisc and lm32 was not satisfying for him
<ysionneau> I've got his card with email if you want to drop him a mail :)
<sb0> how does something that is fat in FPGA magically become optimized in ASIC?
<ysionneau> I asked him that
<ysionneau> he said they "write optimized verilog for ASIC"
<ysionneau> he didn't go into details
<ysionneau> but it seems his code is magically slow on fpga and fast on asic
<sb0> yeah, I don't believe in magicians (and PhDs)
<ysionneau> he said someone has written an FPGA-optimized version
<ysionneau> but by writing the bits directly into the bitstream (maybe kind of like what wolfgang was doing?)
<sb0> has he tried running lm32 through the same asic tools he uses for riscv?
<ysionneau> and he could (or wanted to?) run 1000 of those cores on the same fpga
<sb0> huh, WHAT?
<ysionneau> yeah that's weird
<ysionneau> oh and another reason he didn't like OpenRISC ISA is the 16 bits immediate
<ysionneau> they have 12 bits immediate
<sb0> and btw on FPGA the bloat factor is one full order of magnitude
<ysionneau> to give room for other things
<sb0> I do not believe it can be so much better on ASIC
<ysionneau> maybe you should drop an email to either their mailing list or directly to krste@eecs.berkeley.edu
<ysionneau> (his name is Krste Asanovic)
<sb0> meh, what? 16-bit immediates are great
<sb0> you can load any 32-bit word with two immediates
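
A minimal sketch of the point about 16-bit immediates, assuming a load-high step followed by an OR with a zero-extended low half (the pattern behind instruction pairs such as LM32's orhi/ori). The function names split_word and load_word are hypothetical and only illustrate the arithmetic; they are not taken from any toolchain.

```python
def split_word(word):
    """Split a 32-bit word into its (high, low) 16-bit immediates."""
    word &= 0xFFFFFFFF                # keep only the low 32 bits
    return word >> 16, word & 0xFFFF


def load_word(word):
    """Rebuild a 32-bit word the way a two-instruction sequence would."""
    hi, lo = split_word(word)
    reg = (hi << 16) & 0xFFFFFFFF     # step 1: place the high immediate in bits 31..16
    reg |= lo                         # step 2: OR in the zero-extended low immediate
    return reg


if __name__ == "__main__":
    value = 0xDEADBEEF
    hi, lo = split_word(value)
    print(f"hi=0x{hi:04X} lo=0x{lo:04X} rebuilt=0x{load_word(value):08X}")
```

Because the low half is zero-extended before the OR, the two halves never interfere, so every 32-bit constant takes exactly two immediates under this assumption.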
<sb0> anyway, that's independent of the bloat problem
<ysionneau> sure I like the 16 bits immediate also
<ysionneau> yes, it's just one of the arguments for not using OpenRISC
<ysionneau> and he added that they've been doing cores for a long time, and for them doing a single-issue in-order pipeline is very easy
<sb0> well, there are better ones: bloat, messy exception table, messy ABI, flags, delay slot, syscall instruction, etc.
<ysionneau> ah yes he mentioned delay slot as well
<ysionneau> he didn't want delay slot
<sb0> "doing a single issue in-order pipeline is very easy", yeah, as demonstrated by the cold hard fact that they couldn't do it in less than 10x the size of lm32
<ysionneau> he says in ASIC they are more performant than ARM cores
<ysionneau> (he didn't say which one)
* sb0 wonders if he should fix his rocket generator installation problems and get finer bloat/speed numbers or just give up and throw all that crap away
<sb0> yeah, but it's only make believe
<sb0> things you put in grant applications, bullshit papers, etc.
<ysionneau> probably yes
<ysionneau> hard to know which part of this nice picture is actually an illusion
<sb0> well, one experiment that can be done is run lm32 through their asic tool and compare that with the asic-megaoptimized-by-one-order-of-magnitude riscv result
<sb0> and then they'll tell you, you need to sign a NDA to use the tool ;-)
<ysionneau> it was this guy btw : http://www.eecs.berkeley.edu/~krste/
<ysionneau> ahah
mumptai has joined #m-labs
<sb0> oh, I forgot that scala runs on java.
<sb0> it's installing tons of dependencies atm
<ysionneau> ^^"
<sb0> finally got the generated verilog. btw, chisel is much slower than migen.
<sb0> I took the "small" config...
<larsc> NDA for the tool, ieee membership to access the results
<sb0> got synthesis to run... ise is grinding its bits right now
<sb0> I can already tell from the xst runtime that it's probably bloated
<sb0> phew, 13K registers. and that's the "small" config
<ysionneau> =)
<ysionneau> I've given my card to the lattice FAE and written the link to our lm32 github repo
<ysionneau> ahah
<ysionneau> so if I understand well, with cffi, you basically can directly call C code from Python, therefore you don't need any python glue library anymore, you can directly call the C API of LLVM, right?
<sb0> yeah. well, the code that uses cffi would still be a python glue library
<ysionneau> sure
<sb0> 18K LUTs! even worse than their zynq demo design
<sb0> that's 67% of the FPGA on the M1, that contains the tmu, pfpu, video sampling, dram controller and what not in addition to lm32
<ysionneau> yeah it's crazy big
<ysionneau> I also find it strange that "it's big and slow in FPGA but it's fast and not so big in ASIC" ...
<ysionneau> but yes that's what he says
<sb0> ysionneau, the main thing that bothers me is that llvm (a software project that academia has done relatively right, for once) doesn't support dynamic libraries on windows
<ysionneau> :(
<ysionneau> I saw the thread that we might need to support windows 7
<ysionneau> I've got llvm experts all over the room here
<ysionneau> maybe I can ask around :p
<sb0> cffi can also link static libraries, but it would compile some glue code and link it every time you run the python program. the llvm libs being rather large, that would take significant startup time.
<sb0> as I understand it - I haven't searched a lot yet
<ysionneau> sb0: you mean LLVM cannot be compiled into a DLL?
<sb0> LLVM cannot export its functions from a DLL
<ysionneau> so how does it work? you need to link your code statically against the LLVM code? (on Windows)
<sb0> the dynamic libs are not built by default, and are unix-only when you do ask for them
<sb0> a workaround can be to make a big DLL that uses LLVM internally, and reexports the C API we use, with the proper dll link attributes
<sb0> or patch llvm to put the dll link attributes on the C API
<sb0> ise is still routing... we'll soon know the bright *ahem* timing performance of risc-v
<sb0> routing seems stuck
<ysionneau> can't find enough room to put the giant design ;)
<sb0> routing still not complete...
<sb0> haha
<sb0> been like 45min now
<sb0> just for the routing. you could build misoc a few times on the same machine.
<ysionneau> maybe it will just fail
<sb0> just stopped it and ran rm -rf rocket-chip/
<sb0> this thing is ridiculous
<sb0> ysionneau, if you're still interested, ask them to try lm32 on their asic tools
<ysionneau> the guy left now :/
sj_mackenzie has quit [Ping timeout: 240 seconds]
sj_mackenzie has joined #m-labs
xiangfu has joined #m-labs
<GitHub7> [artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/jX79rg
<GitHub7> artiq/master 4361c7c Sebastien Bourdeauducq: language/core: support cycles_to_time and time_to_cycles outside of kernel
xiangfu has quit [Remote host closed the connection]
sj_mackenzie has quit [Remote host closed the connection]
sj_mackenzie has joined #m-labs
sj_mackenzie has quit [Remote host closed the connection]
sj_mackenzie has joined #m-labs
sj_mackenzie has quit [Remote host closed the connection]
aeris has quit [*.net *.split]
gric_ has quit [*.net *.split]
gric has joined #m-labs
aeris has joined #m-labs
kilae has quit [Quit: ChatZilla 0.9.91 [Firefox 32.0.3/20140923175406]]
mumptai has quit [Quit: Verlassend]
siruf has quit [Read error: Connection reset by peer]
siruf has joined #m-labs