<ysionneau>
sure I will tell him about those numbers if I am able to talk to him again today
<sb0>
ah, they finally released the source...
<sb0>
(for the whole thing)
<sb0>
they have an "Area Efficiency" metric on their website, so I hope you won't get the bullshit argument "area doesn't matter thanks to Moore's law" that I've already heard a few times from some academics
<ysionneau>
yes, they released the generator "a week ago" (he said)
<ysionneau>
as he described Chisel (his generator language stuff) it seems to look a bit like Migen, but in Scala
<ysionneau>
sb0: which FPGA did you target when you synthesized their risc-v generated design?
<sb0>
I tried s6 and k7
<sb0>
results are about the same on both
<sb0>
and yeah, there are a number of things to be unhappy about with openrisc, and particularly or1200
<sb0>
lm32 has none of these issues, but the sw support isn't great
<ysionneau>
I guess if I ask him "instead of doing yet another CPU, why didn't you take lm32?" he will answer something like he was not happy with the ISA anyway
<ysionneau>
I mean, if you start by modifying lm32 so much that it doesn't look like lm32 anymore
<ysionneau>
maybe starting from scratch is not so stupid
<ysionneau>
if you change the whole ISA, put in variable-length instructions, etc.
<ysionneau>
it changes the design a lot
<ysionneau>
doesn't it?
<sb0>
there are a number of missing bits in the lm32 isa (fp, 64-bit, etc.)
<ysionneau>
I agree that you can improve lm32
<sb0>
but it is a practical, working, small CPU - not something whose main purpose is to vomit kilometers of pages of PhD dissertations and grant applications
<ysionneau>
but if his design choices are quite different than what lm32 is today, then maybe there is no point in starting from lm32
<ysionneau>
ahah
<ysionneau>
sure sure
<ysionneau>
it's a very practical and nicely working design
<ysionneau>
works in ASIC and everything
<sb0>
anyway, let's see. if it works and is not slow and bloated, it'll be the best thing I've seen from academia over the past 10 years by a very wide margin.
<ysionneau>
OK I'm forwarding the answers
<ysionneau>
so he agrees it's fat and slow in FPGA
<ysionneau>
and he basically does not care
<ysionneau>
he says it's aimed at ASIC and not FPGA
<sb0>
yeah, that's a typical problem from academics
<ysionneau>
so they fine-tune for ASIC and not for FPGA
<sb0>
and there's Moore's law, right?
<sb0>
and "dark silicon" is the new hype
<ysionneau>
as to why they didn't choose LM32: they didn't know about it back in the day
<ysionneau>
and they need to go multicore, and they need 64-bit integer registers
<ysionneau>
so the ISA of both openrisc and lm32 wasn't satisfactory for him
<ysionneau>
I've got his card with email if you want to drop him a mail :)
<sb0>
how does something that is fat on FPGA magically become optimized on ASIC?
<ysionneau>
I asked him that
<ysionneau>
he said they "write optimized verilog for ASIC"
<ysionneau>
he didn't go into details
<ysionneau>
but it seems his code is magically slow on fpga and fast on asic
<sb0>
yeah, I don't believe in magicians (and PhDs)
<ysionneau>
he said someone has written an FPGA-optimized version
<ysionneau>
but by writing the bits directly into the bitstream (maybe kind of like what wolfgang was doing?)
<sb0>
has he tried running lm32 through the same asic tools he uses for riscv?
<ysionneau>
and he could (or wanted to?) run 1000 of those cores on the same fpga
<sb0>
huh, WHAT?
<ysionneau>
yeah that's weird
<ysionneau>
oh and another reason he didn't like the OpenRISC ISA is the 16-bit immediates
<ysionneau>
they have 12-bit immediates
<sb0>
and btw on FPGA the bloat factor is one full order of magnitude
<ysionneau>
to give room for other things
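Even with 12-bit immediates, a full 32-bit constant is still reachable in two instructions; a Python sketch of the RISC-V lui/addi pattern (the helper name is illustrative, and the sign-extension compensation is the interesting part):

```python
def riscv_load32(value):
    """Build a 32-bit constant RISC-V style: lui supplies the upper
    20 bits, addi adds a sign-extended 12-bit immediate (sketch)."""
    lo = value & 0xFFF
    if lo >= 0x800:                   # addi sign-extends its immediate,
        lo -= 0x1000                  # so compensate by rounding hi up
    hi = (value - lo) & 0xFFFFF000    # lui  rX, hi >> 12
    return (hi + lo) & 0xFFFFFFFF     # addi rX, rX, lo

assert riscv_load32(0xDEADBEEF) == 0xDEADBEEF
```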
<sb0>
I do not believe it can be so much better on ASIC
<ysionneau>
maybe you should drop an email to either their mailing list or directly to krste@eecs.berkeley.edu
<ysionneau>
(his name is Krste Asanovic)
<sb0>
meh, what? 16-bit immediates are great
<sb0>
you can load any 32-bit word with two immediates
<sb0>
anyway, that's independent of the bloat problem
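The two-immediate trick can be sketched in Python (the mnemonics in the comments are LM32's orhi/ori; the helper itself is illustrative):

```python
def lm32_load32(value):
    """Load any 32-bit word with two 16-bit immediates, LM32 style."""
    hi = (value >> 16) & 0xFFFF   # orhi rX, r0, hi  -> rX = hi << 16
    lo = value & 0xFFFF           # ori  rX, rX, lo  -> rX |= lo
    return (hi << 16) | lo

assert lm32_load32(0xDEADBEEF) == 0xDEADBEEF
```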
<ysionneau>
sure, I like the 16-bit immediates too
<ysionneau>
yes, it's just one of the arguments for not using OpenRISC
<ysionneau>
and he added that they've been doing cores for a long time, and for them doing a single-issue in-order pipeline is very easy
<sb0>
well, there are better ones: bloat, messy exception table, messy ABI, flags, delay slot, syscall instruction, etc.
<ysionneau>
ah yes he mentioned delay slot as well
<ysionneau>
he didn't want delay slot
<sb0>
"doing a single-issue in-order pipeline is very easy", yeah, as demonstrated by the cold hard fact that they couldn't do it in less than 10x the size of lm32
<ysionneau>
he says that in ASIC they are more performant than ARM cores
<ysionneau>
(he didn't say which one)
* sb0
wonders if he should fix his rocket generator installation problems and get finer bloat/speed numbers or just give up and throw all that crap away
<sb0>
yeah, but it's only make believe
<sb0>
things you put in grant applications, bullshit papers, etc.
<ysionneau>
probably yes
<ysionneau>
hard to know which part of this nice picture is actually an illusion
<sb0>
well, one experiment that can be done is run lm32 through their asic tool and compare that with the asic-megaoptimized-by-one-order-of-magnitude riscv result
<sb0>
and then they'll tell you that you need to sign an NDA to use the tool ;-)
<ysionneau>
I've given my card to the lattice FAE and written the link to our lm32 github repo
<ysionneau>
ahah
<ysionneau>
so if I understand correctly, with cffi you can basically call C code directly from Python, therefore you don't need any python glue library anymore, you can directly call the C API of LLVM, right?
<sb0>
yeah. well, the code that uses cffi would still be a python glue library
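A minimal sketch of the cffi idea, binding a libc function instead of LLVM purely for illustration (note that dlopen(None), which loads the C standard library, is Unix-only - which ties into the Windows DLL problem that comes up below):

```python
from cffi import FFI  # third-party: pip install cffi

ffi = FFI()
ffi.cdef("int abs(int);")   # declare the C prototype we want to call
C = ffi.dlopen(None)        # load the C standard library (Unix-only)
print(C.abs(-42))           # prints 42
```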
<ysionneau>
sure
<sb0>
18K LUTs! even worse than their zynq demo design
<sb0>
that's 67% of the FPGA on the M1, which contains the tmu, pfpu, video sampling, dram controller and what not in addition to lm32
<ysionneau>
yeah it's crazy big
<ysionneau>
I also find it strange that "it's big and slow in FPGA but it's fast and not so big in ASIC" ...
<ysionneau>
but yes that's what he says
<sb0>
ysionneau, the main thing that bothers me is that llvm (a software project that academia has done relatively right, for once) doesn't support dynamic libraries on windows
<ysionneau>
:(
<ysionneau>
I saw the thread that we might need to support windows 7
<ysionneau>
I've got llvm experts all over the room here
<ysionneau>
maybe I can ask around :p
<sb0>
cffi can also link static libraries, but it would compile some glue code and link it every time you run the python program. the llvm libs being rather large, that would take significant startup time.
<sb0>
as I understand it - I haven't searched a lot yet
<ysionneau>
sb0: you mean LLVM cannot be compiled into a DLL?
<sb0>
LLVM cannot export its functions from a DLL
<ysionneau>
so how does it work? you need to link your code statically with the LLVM code? (on Windows)
<sb0>
the dynamic libs are not built by default, and are unix-only when you do ask for them
<sb0>
a workaround can be to make a big DLL that uses LLVM internally, and reexports the C API we use, with the proper dll link attributes
<sb0>
or patch llvm to put the dll link attributes on the C API
<sb0>
ise is still routing... we'll soon know the bright *ahem* timing performance of risc-v