<ysionneau>
sure I will tell him about those numbers if I am able to talk to him again today
<sb0>
ah, they finally released the source...
<sb0>
(for the whole thing)
<sb0>
they have an "Area Efficiency" metric on their website, so I hope you won't get the bullshit argument "area doesn't matter thanks to Moore's law" that I've already heard a few times from some academics
<ysionneau>
yes, they released the generator "a week ago" (he said)
<ysionneau>
as he described Chisel (his generator language stuff) it seems to look a bit like Migen, but in Scala
<ysionneau>
sb0: which FPGA did you target when you synthesized their risc-v generated design?
<sb0>
I tried s6 and k7
<sb0>
results are about the same on both
<sb0>
and yeah, there are a number of things to be unhappy about with openrisc, and particularly or1200
<sb0>
lm32 has none of these issues, but the sw support isn't great
<ysionneau>
I guess if I ask him "instead of doing yet another CPU, why didn't you take lm32?" he will answer something like he was not happy with the ISA anyway
<ysionneau>
I mean, if you start by modifying lm32 so much that it doesn't look like lm32 anymore
<ysionneau>
maybe starting from scratch is not so stupid
<ysionneau>
if you change the whole ISA, put in variable-length instructions, etc.
<ysionneau>
it changes the design a lot
<ysionneau>
doesn't it?
<sb0>
there are a number of missing bits in the lm32 isa (fp, 64-bit, etc.)
<ysionneau>
I agree that you can improve lm32
<sb0>
but it is a practical, working, small CPU - not something whose main purpose is to vomit kilometers of pages of PhD dissertations and grant applications
<ysionneau>
but if his design choices are quite different than what lm32 is today, then maybe there is no point in starting from lm32
<ysionneau>
ahah
<ysionneau>
sure sure
<ysionneau>
it's a very practical and nicely working design
<ysionneau>
works in ASIC and everything
<sb0>
anyway, let's see. if it works and is not slow and bloated, it'll be the best thing I've seen from academia over the past 10 years by a very wide margin.
<ysionneau>
OK I'm forwarding the answers
<ysionneau>
so he agrees it's fat and slow in FPGA
<ysionneau>
and he basically does not care
<ysionneau>
he says it's aimed at ASIC and not FPGA
<sb0>
yeah, that's a typical problem from academics
<ysionneau>
so they fine-tune for ASIC and not for FPGA
<sb0>
and there's Moore's law, right?
<sb0>
and "dark silicon" is the new hype
<ysionneau>
as to why they didn't choose LM32: they didn't know about it back in the day
<ysionneau>
and they need to go multicore, and they need 64-bit integer registers
<ysionneau>
so the ISA of both openrisc and lm32 wasn't satisfactory for him
<ysionneau>
I've got his card with email if you want to drop him a mail :)
<sb0>
how does something that is fat on FPGA magically become optimized on ASIC?
<ysionneau>
I asked him that
<ysionneau>
he said they "write optimized verilog for ASIC"
<ysionneau>
he didn't go into details
<ysionneau>
but it seems his code is magically slow on fpga and fast on asic
<sb0>
yeah, I don't believe in magicians (and PhDs)
<ysionneau>
he said someone has written an FPGA-optimized version
<ysionneau>
but by writing the bits directly into the bitstream (maybe kind of like what wolfgang was doing?)
<sb0>
has he tried running lm32 through the same asic tools he uses for riscv?
<ysionneau>
and he could (or wanted to?) run 1000 of those cores on the same fpga
<sb0>
huh, WHAT?
<ysionneau>
yeah that's weird
<ysionneau>
oh and another reason he didn't like the OpenRISC ISA is the 16-bit immediates
<ysionneau>
they have 12-bit immediates
<sb0>
and btw on FPGA the bloat factor is one full order of magnitude
<ysionneau>
to give room for other things
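Even with 12-bit immediates, a full 32-bit constant is still reachable in two instructions; a Python sketch of the RISC-V lui/addi pattern (the helper name is illustrative, and the sign-extension compensation is the interesting part):

```python
def riscv_load32(value):
    """Build a 32-bit constant RISC-V style: lui supplies the upper
    20 bits, addi adds a sign-extended 12-bit immediate (sketch)."""
    lo = value & 0xFFF
    if lo >= 0x800:                   # addi sign-extends its immediate,
        lo -= 0x1000                  # so compensate by rounding hi up
    hi = (value - lo) & 0xFFFFF000    # lui  rX, hi >> 12
    return (hi + lo) & 0xFFFFFFFF     # addi rX, rX, lo

assert riscv_load32(0xDEADBEEF) == 0xDEADBEEF
```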
<sb0>
I do not believe it can be so much better on ASIC
<ysionneau>
maybe you should drop an email to either their mailing list or directly to krste@eecs.berkeley.edu
<ysionneau>
(his name is Krste Asanovic)
<sb0>
meh, what? 16-bit immediates are great
<sb0>
you can load any 32-bit word with two immediates
<sb0>
anyway, that's independent of the bloat problem
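The two-immediate trick can be sketched in Python (the mnemonics in the comments are LM32's orhi/ori; the helper itself is illustrative):

```python
def lm32_load32(value):
    """Load any 32-bit word with two 16-bit immediates, LM32 style."""
    hi = (value >> 16) & 0xFFFF   # orhi rX, r0, hi  -> rX = hi << 16
    lo = value & 0xFFFF           # ori  rX, rX, lo  -> rX |= lo
    return (hi << 16) | lo

assert lm32_load32(0xDEADBEEF) == 0xDEADBEEF
```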
<ysionneau>
sure, I like the 16-bit immediates too
<ysionneau>
yes, it's just one of the arguments for not using OpenRISC
<ysionneau>
and he added that they've been doing cores for a long time, and for them doing a single-issue in-order pipeline is very easy
<sb0>
well, there are better ones: bloat, messy exception table, messy ABI, flags, delay slot, syscall instruction, etc.
<ysionneau>
ah yes he mentioned delay slot as well
<ysionneau>
he didn't want delay slot
<sb0>
"doing a single-issue in-order pipeline is very easy", yeah, as demonstrated by the cold hard fact that they couldn't do it in less than 10x the size of lm32
<ysionneau>
he says that in ASIC they are more performant than ARM cores
<ysionneau>
(he didn't say which one)
* sb0
wonders if he should fix his rocket generator installation problems and get finer bloat/speed numbers or just give up and throw all that crap away
<sb0>
yeah, but it's only make believe
<sb0>
things you put in grant applications, bullshit papers, etc.
<ysionneau>
probably yes
<ysionneau>
hard to know which part of this nice picture is actually an illusion
<sb0>
well, one experiment that can be done is run lm32 through their asic tool and compare that with the asic-megaoptimized-by-one-order-of-magnitude riscv result
<sb0>
and then they'll tell you that you need to sign an NDA to use the tool ;-)
<ysionneau>
I've given my card to the lattice FAE and written the link to our lm32 github repo
<ysionneau>
ahah
<ysionneau>
so if I understand correctly, with cffi you can basically call C code directly from Python, therefore you don't need any python glue library anymore, you can directly call the C API of LLVM, right?
<sb0>
yeah. well, the code that uses cffi would still be a python glue library
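A minimal sketch of the cffi idea, binding a libc function instead of LLVM purely for illustration (note that dlopen(None), which loads the C standard library, is Unix-only - which ties into the Windows DLL problem that comes up below):

```python
from cffi import FFI  # third-party: pip install cffi

ffi = FFI()
ffi.cdef("int abs(int);")   # declare the C prototype we want to call
C = ffi.dlopen(None)        # load the C standard library (Unix-only)
print(C.abs(-42))           # prints 42
```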
<ysionneau>
sure
<sb0>
18K LUTs! even worse than their zynq demo design
<sb0>
that's 67% of the FPGA on the M1, which contains the tmu, pfpu, video sampling, dram controller and what not in addition to lm32
<ysionneau>
yeah it's crazy big
<ysionneau>
I also find it strange that "it's big and slow in FPGA but it's fast and not so big in ASIC" ...
<ysionneau>
but yes that's what he says
<sb0>
ysionneau, the main thing that bothers me is that llvm (a software project that academia has done relatively right, for once) doesn't support dynamic libraries on windows
<ysionneau>
:(
<ysionneau>
I saw the thread that we might need to support windows 7
<ysionneau>
I've got llvm experts all over the room here
<ysionneau>
maybe I can ask around :p
<sb0>
cffi can also link static libraries, but it would compile some glue code and link it every time you run the python program. the llvm libs being rather large, that would take significant startup time.
<sb0>
as I understand it - I haven't searched a lot yet
<ysionneau>
sb0: you mean LLVM cannot be compiled into a DLL?
<sb0>
LLVM cannot export its functions from a DLL
<ysionneau>
so how does it work? you need to link your code statically with the LLVM code? (on Windows)
<sb0>
the dynamic libs are not built by default, and are unix-only when you do ask for them
<sb0>
a workaround can be to make a big DLL that uses LLVM internally, and reexports the C API we use, with the proper dll link attributes
<sb0>
or patch llvm to put the dll link attributes on the C API
<sb0>
ise is still routing... we'll soon know the bright *ahem* timing performance of risc-v