Topic for #milkymist is now Milkymist One, Milkymist SoC & Flickernoise development channel (LLHDL/Antares are welcome too) :: Logs: :: JFDI
aw_ joined #milkymist
aw joined #milkymist
wolfspraul joined #milkymist
xiangfu joined #milkymist
aw_ joined #milkymist
Guest42658 joined #milkymist
errordeveloper joined #milkymist
wolfspraul joined #milkymist
Gurty joined #milkymist
rejon joined #milkymist
<GitHub187> [scripts] xiangfu pushed 2 new commits to master:
<GitHub187> [scripts/master] snapshot: don't flash data by default - Xiangfu Liu
<GitHub187> [scripts/master] update the power-on message - Xiangfu Liu
wolfspraul joined #milkymist
xiangfu joined #milkymist
aw joined #milkymist
mumptai joined #milkymist
aw_ joined #milkymist
Martoni joined #milkymist
lekernel_ joined #milkymist
nightlybuild joined #milkymist
<qi-bot> The Firmware build was successful, see images here:
azonenberg joined #milkymist
<GitHub134> [flickernoise] sbourdeauducq pushed 1 new commit to master:
<GitHub134> [flickernoise/master] performance: fix unmapped key handling - Sebastien Bourdeauducq
<azonenberg> lekernel_: Any experience working with multiprocessor softcores?
<azonenberg> I'm particularly interested in the interconnect
<azonenberg> in terms of cache coherency and how multiple processors share the bus
<azonenberg> i'm working on a triple-core SoC from scratch
<azonenberg> and am designing the interconnect fabric now
<azonenberg> i'm using a shared bus (only one core can talk at a time, but it's full duplex)
<lekernel_> what's at the other end of the shared bus? DRAM?
<lekernel_> also, softcores are slow. why not use dedicated accelerators?
<azonenberg> A fixed-mapping MMU
<azonenberg> That splits the address bus between memory mapped IO and DDR2
<azonenberg> the DDR2 has an L2 cache in front of it
<lekernel_> MMU? you mean address decoder?
<azonenberg> each core has its own dedicated L1
<azonenberg> Basically, yes
<azonenberg> hardwired mapping
<azonenberg> The L1 is going to be structured in such a way as to be a passthrough for the IO address range and cache DRAM and flash addresses
<azonenberg> then DRAM and flash will have their own SoC-wide L2 caches
<azonenberg> I know i'm reinventing the wheel a bit, it's mostly an educational exercise
* azonenberg is writing a dissertation on computer architecture soon and wants to sharpen his skills first
<azonenberg> But it's actually going to be quite fast
<azonenberg> on spartan6 -2 speed i am shooting for 200 MHz
<azonenberg> * 2-way superscalar
<azonenberg> = 800 mflops for 2 cores
<azonenberg> I had to pipeline the heck out of it, but it's looking feasible
<lekernel_> until you get timing paths into the bus arbiter? :)
<azonenberg> Actually, the bus arbiter is looking just fine
<azonenberg> i just did a standalone test of it at 200 mhz and it works just fine
<azonenberg> on hardware
<azonenberg> My solution to this thing is, pipeline it like crazy
<azonenberg> it's a barrel processor
<lekernel_> with all the cores and memory controller connected to it?
<azonenberg> so a 16 stage pipeline means zero latency
<azonenberg> and 32 stages means one stall
<azonenberg> i run 16 threads and context switch every clock
<lekernel_> ah, i see
<azonenberg> Right now it's looking like when running out of L1 cache with a 16 stage pipeline i will have no stalls
<azonenberg> despite not having any forwarding whatsoever
<azonenberg> an L1 cache miss that hits in L2 will most likely stall one instruction
<azonenberg> if i can fit the L1=>L2 and back in 16 clocks
<azonenberg> or 2 instructions if it takes me 32
<azonenberg> as long as i can keep the entire bus structure pipelined
<azonenberg> this is a very GPU-esque architecture
<azonenberg> hiding latency by multithreading
<lekernel_> what about cache miss rates when you have 16 threads switching so fast?
<azonenberg> I envision it being something like CUDA, each thread executing mostly the same instructions
<azonenberg> But they can branch as they see fit
<azonenberg> The entire architecture is mostly an experiment
<lekernel_> you should compile dedicated hardware accelerators ...
<azonenberg> you mean, ASIC level?
<lekernel_> adding layers over layers makes things slow
<azonenberg> Sure, go get me $30K and i'll get it fabbed in MOSIS :p
<lekernel_> yes, generate VHDL from CUDA directly
<azonenberg> and no, this is mostly an educational exercise
<lekernel_> no, I mean use the FPGA fabric directly
<azonenberg> The goal is to see how many flops i can pull out of a softcore CPU
<azonenberg> running real code
<lekernel_> softcores are only good to run housekeeping or legacy software
<azonenberg> also i have a project in mind that will involve me working with non-hardware people
<azonenberg> I have dedicated accelerators for stuff like JPEG encoding that i'm working on
<azonenberg> But the flight control code has to be in C
<azonenberg> or C++
<azonenberg> or assembly
<azonenberg> since i am working with CS people who don't know hardware
<azonenberg> So i want to design a nice powerful architecture for them to run it on
<azonenberg> the other motivation as i said is just cutting my teeth on computer architecture
<azonenberg> this is not something i envision being a softcore forever, but custom ASICs are not cheap
<azonenberg> if things go well and it works as planned i might try sending it out to mosis eventually
<azonenberg> i would love to have a laptop running a CPU i designed
<azonenberg> in 180nm TSMC or something
<azonenberg> But i'm not that advanced yet :p
<azonenberg> I read your post about the latticemico32 synthesis lol
<azonenberg> and i think my processor will be faster
<azonenberg> But i'd have to reimplement some of the xilinx hard IP cores like the memory controller
<azonenberg> and their soft FPU
<azonenberg> I'm pretty sure i can write a better FPU but i haven't gotten around to it yet, and as long as it's interface-compatible with theirs it'd be a drop-in replacement
<lekernel_> their soft fpu? what's that?
<lekernel_> you're using coregen for a fpu?
<azonenberg> Yes, for now
<azonenberg> i wanted to focus on the datapath and interconnect first
<azonenberg> then go and write myself an FPU when i had all of the surrounding stuff done
<azonenberg> in the meantime i have theirs because it tells me an FPU of that size and speed is possible
<azonenberg> iow, setting a lower bound
<azonenberg> then i can try and outperform it with an open one
<azonenberg> Coregen lets you generate floating point add/sub, multiply, divide, and sqrt units separately
<azonenberg> So i'll replace them with my own one by one
<azonenberg> But again the focus for now is on the datapath and microarchitecture more than implementation
<lekernel_> you can use the milkymist pfpu pipelines btw ...
<azonenberg> The goal here is to practice efficient pipelined architecture
<azonenberg> So i want to use as little premade code as possible
<azonenberg> like i said i'm doing a thesis on computer architecture soon and i want practice
<lekernel_> but you reused the coregen pipelines already :-)
<azonenberg> Temporarily, so i could build the other stuff around them
<azonenberg> it's not expected to stay
<azonenberg> if i had used a free one i'd have less incentive to replace it :p
<lekernel_> so that's what I get for developing free hardware ...
<azonenberg> production project? Sure
<azonenberg> But for educational value sometimes it's better to reimplement
<azonenberg> Once i build mine, i'll compare it to yours and any other open ones i find
<azonenberg> and use the best one in real projects
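[Editor's note on the barrel-processor figures quoted above: 200 MHz, 2-way issue, 16 threads, a 16- or 32-stage pipeline.] The following is a minimal Python sketch of that arithmetic, not azonenberg's actual design. It assumes strict round-robin issue (each thread gets one slot every `threads` cycles), no forwarding, a result that only becomes usable `pipeline_depth` cycles after issue, and one floating-point operation per issue slot. All function names are hypothetical.

```python
def skipped_slots(pipeline_depth: int, threads: int = 16) -> int:
    """Issue slots a thread must skip before its previous result is usable.

    Closed form: a thread's slots are `threads` cycles apart, so it has to
    wait ceil(pipeline_depth / threads) slots in total, one of which is the
    slot it would have used anyway.
    """
    if pipeline_depth <= 0 or threads <= 0:
        raise ValueError("pipeline_depth and threads must be positive")
    return max(0, -(-pipeline_depth // threads) - 1)


def simulate_stalls(pipeline_depth: int, threads: int = 16,
                    instructions: int = 4) -> int:
    """Walk one thread through its round-robin slots and count wasted ones.

    Every instruction is assumed to depend on the previous one (worst case).
    """
    issued = 0
    stalls = 0
    ready_at = 0   # cycle at which the previous result becomes usable
    cycle = 0      # this thread's current issue slot
    while issued < instructions:
        if cycle >= ready_at:
            issued += 1
            ready_at = cycle + pipeline_depth
        else:
            stalls += 1
        cycle += threads  # next slot belonging to this thread
    return stalls


def peak_mflops(clock_mhz: int, issue_width: int, cores: int) -> int:
    """Upper bound, assuming one FP op per issue slot on every cycle."""
    return clock_mhz * issue_width * cores


if __name__ == "__main__":
    print(skipped_slots(16), skipped_slots(32))  # 16 vs 32 pipeline stages
    print(simulate_stalls(32))                   # 4 dependent instructions
    print(peak_mflops(200, 2, 2))                # figure quoted in the chat
```

Under these assumptions the model reproduces the numbers in the conversation: no skipped slot with 16 stages, one with 32, and an 800 MFLOPS ceiling for two cores. Cache misses and bus contention are deliberately left out.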
Gurty joined #milkymist
Thihi_ joined #milkymist
Thihi joined #milkymist
Thihi_ joined #milkymist
<qi-bot> The Firmware build was successful, see images here:
Thihi joined #milkymist
aw_ joined #milkymist
aw joined #milkymist
rejon joined #milkymist
<GitHub122> [flickernoise] sbourdeauducq pushed 5 new commits to master:
<GitHub122> [flickernoise/master] Do not create ramdisk folder - Sebastien Bourdeauducq
<GitHub122> [flickernoise/master] filedialog: lock in ssd - Sebastien Bourdeauducq
<GitHub122> [flickernoise/master] filedialog: prevent slash in filenames - Sebastien Bourdeauducq
gbraad joined #milkymist
gbraad joined #milkymist
Martoni joined #milkymist
sh4rm4 joined #milkymist
xiangfu joined #milkymist
Gurty joined #milkymist
<GitHub168> [flickernoise] sbourdeauducq pushed 1 new commit to master:
<GitHub168> [flickernoise/master] shutdown: rename button - Sebastien Bourdeauducq
<GitHub120> [flickernoise] sbourdeauducq pushed 1 new commit to master:
<GitHub120> [flickernoise/master] png: enable loading of RGBA images - Sebastien Bourdeauducq
xiangfu joined #milkymist
r33p joined #milkymist
Martoni joined #milkymist
<qi-bot> The Firmware build was successful, see images here:
wolfspraul joined #milkymist
<GitHub36> [flickernoise] sbourdeauducq pushed 1 new commit to master:
<GitHub36> [flickernoise/master] New patch - Sebastien Bourdeauducq
DJTachyon joined #milkymist
r33p joined #milkymist
azonenberg joined #milkymist
zer1her1 joined #milkymist
Martoni joined #milkymist
Gurty joined #milkymist
wolfspraul joined #milkymist
wolfspraul joined #milkymist
wolfspraul joined #milkymist
<xiangfu> Hi
<xiangfu> what is the difference between MicroBlaze and LM32?
<xiangfu> is that same thing in one SOC system. on LM32 is open but MicroBlaze?
<xiangfu> s/on/only
<wpwrak> kinda like MIPS vs. ARM. same purpose, different origin, different style, etc.
<xiangfu> wpwrak, got it.
<qi-bot> The Firmware build was successful, see images here:
wolfspraul joined #milkymist
<lekernel_> new screenshots
<kristianpaul> MMM... :-)
<kristianpaul> too many zoomed effects i think
<wpwrak> bah. the end of the year is nearing. fireworks !! :)
wolfspraul joined #milkymist
<kristianpaul> :-)
<kristianpaul> yeah , fireworks are nice
<lekernel_> kristianpaul: if you design new patches that look better, there's no reason I would refuse them...
* wpwrak is amazed by how well USB can work even though he completely misunderstood the handshake between fpga and navre ...
<wpwrak> let's see if anything still works after fixing that
<lekernel_> ?
<wpwrak> i thought the SYNC would also set rx_pending ...
<lekernel_> no, rx_pending is only set after the first byte is completely received
<wpwrak> (but i never tried to retrieve it. sometimes, two wrongs make an almost right :)
<lekernel_> but it doesn't make much change, does it?
<lekernel_> (I mean the first byte of "payload" after the sync, ofc)
<wpwrak> yeah, just means that my loop was a little late
<wpwrak> and unnecessarily complicated, too
<kristianpaul> lekernel: i didn't mean it that way, it was just a comment (about what i like), no rush :-)
<kristianpaul> and no, i don't imagine designing patches soon
r333p joined #milkymist
Alarm joined #milkymist
<Alarm> What is the best way to load the latest binary onto the M1?
<lekernel> Alarm: as I said, web update
<Alarm> not with the jtag?
<lekernel> no, JTAG is for developers
<lekernel> and generally slower and harder to use than the web update if you just want a release upgrade
<lekernel> "Remove funky (ab-)use of the usb devices in bluetooth and milkymist." wtf?
<kristianpaul> lekernel: nice !!!)
<kristianpaul> ""
<kristianpaul> Once an application for custom ASIC cores, this demanding computer graphics process is now the province of low-cost FPGAs.
<Alarm> The problem is downloading the latest version. I'm using wget, but it's not great for a set of files
<lekernel> the M1 downloads the latest version itself
<lekernel> just connect it to your internet router ...
<wpwrak> lekernel: (ab-use) what on earth is that presentation about anyway ?
<lekernel> USB in QEMU it seems
<lekernel> but I asked myself the same question for a while ;)
Alarm joined #milkymist
<wpwrak> ;-))
<Alarm> I want to do the update over JTAG for teaching purposes. The "WebUpdate" method is of no interest to me
<Alarm> my problem is basic. I am looking for a simple command to download the binaries
<Alarm> "wget -r" pulls down all the files
DJTachyon joined #milkymist
Gurty` joined #milkymist
mumptai joined #milkymist
* lekernel is giving orcc a try. of course, hundreds of MB of java bloat to install ...
Alarm_ joined #milkymist
mumptai joined #milkymist
Alarm joined #milkymist
errordeveloper joined #milkymist
juliusb joined #milkymist
mumptai joined #milkymist
<kristianpaul> some comments from a friend: "you can get a video switch for 8 USD, but a mixer... at a minimum it should fade from one picture to another"
<kristianpaul> and please dont be angry with me for posting this, i'm just replying comments
<lekernel> the M1 isn't a video switch or mixer. the switch functionality is just a little add-on. you can also get an arduino led blinker for $25 which can do the same as the front panel LEDs on the M1... same kind of stupid comparison
<wpwrak> mixer may be tricky: you need two codecs for that
<wpwrak> and i'm not sure if the chip we use has multiple codecs inside
<lekernel> it does not
<lekernel> M1 was never intended as a video mixer
<kristianpaul> i'm very excited to bug other friends about the new M1/FN features and also bring back some feedback
<kristianpaul> sure not
<lekernel> the main feature of this software update is image support - and stress that it can be used with MIDI controllers. the rest is secondary.
juliusb_ joined #milkymist
<kristianpaul> sure sure
<kristianpaul> and for your happiness, he really likes the pacman video from wpwrak
<wpwrak> and one more device enumerates :)
<wpwrak> hehe ;-)
<wpwrak> we need a few more images per patch. then we can have real games :)
<kristianpaul> wee :)
<wpwrak> C64 retro style :)
<wpwrak> of course, the LV3 is still mute. that one's a tough cookie
juliusb joined #milkymist
antgreen joined #milkymist
<wpwrak> stekern: the latest patch set may also fix the low-speed regression you experienced.
<wpwrak> stekern: at least it removes quite a bit of confusion i had added before :)
<stekern> wpwrak: cool, do you keep those patches in a git repo somewhere?
<wpwrak> only locally
<stekern> ok, well, lekernel seems to be quite quick to apply them anyways
<stekern> I need to sign up on the ML
<wpwrak> yeah. he probably has his alarm clock connected to "grep PATCH" :)
<mwalle> lekernel: (usb abuse) that's qemu, and it used the hid layer in a strange way
<mwalle> gerd and i fixed that some time ago ;)