##openfpga on 2018-12-27 — irc logs at freenode.irclog.whitequark.org

00:43 <GuzTech> I can't be at 35C3, so I made a small design in Clash that reads from a frame buffer in BRAM and output VGA signals.

00:43 <GuzTech> https://gitlab.com/GuzTech/clash-fb_vga

00:43 <GuzTech> https://twitter.com/BitlogIT/status/1078069375679254529

00:43 <GuzTech> And now it's time for bed :)

00:47 oter has joined ##openfpga

00:49 oter has quit [Client Quit]

00:53 Flea86 has quit [Ping timeout: 246 seconds]

01:09 zng has quit [Quit: ZNC 1.8.x-nightly-20181211-72c5f57b - https://znc.in]

01:11 zng has joined ##openfpga

01:15 <digshadow> thanks RaYmAn

01:16 <digshadow> oops

01:16 <digshadow> rqou

01:16 <digshadow> didn't see anyone there, but eh its late

02:11 Maylay has quit [Quit: Pipe Terminated]

02:12 unixb0y has quit [Ping timeout: 268 seconds]

02:13 unixb0y has joined ##openfpga

02:22 futarisIRCcloud has quit [Quit: Connection closed for inactivity]

02:28 Maylay has joined ##openfpga

02:32 oter has joined ##openfpga

02:36 pie__ has joined ##openfpga

02:43 oter has quit [Quit: My iMac has gone to sleep. ZZZzzz…]

02:45 pie___ has joined ##openfpga

02:47 pie__ has quit [Ping timeout: 246 seconds]

02:54 oter has joined ##openfpga

02:55 oter has quit [Client Quit]

03:25 pie___ has quit [Ping timeout: 250 seconds]

03:31 GenTooMan has quit [Quit: Leaving]

03:42 catplant is now known as demonplant

03:47 Miyu has quit [Ping timeout: 246 seconds]

03:48 ayjay_t has quit [Read error: Connection reset by peer]

03:49 ayjay_t has joined ##openfpga

04:05 rohitksingh has joined ##openfpga

04:15 rohitksingh has quit [Ping timeout: 246 seconds]

04:23 _whitelogger has joined ##openfpga

04:25 rohitksingh has joined ##openfpga

04:39 ayjay_t has quit [Read error: Connection reset by peer]

04:39 ayjay_t has joined ##openfpga

04:45 oter has joined ##openfpga

04:50 oter has quit [Quit: Textual IRC Client: www.textualapp.com]

05:08 ayjay_t has quit [Read error: Connection reset by peer]

05:09 ayjay_t has joined ##openfpga

06:29 jcarpenter2 has quit [Read error: Connection reset by peer]

06:29 rofl_ has joined ##openfpga

06:41 _whitelogger_ has joined ##openfpga

06:46 _whitelogger has joined ##openfpga

07:00 demonplant is now known as tavycat

07:57 m4ssi has joined ##openfpga

07:57 rohitksingh has quit [Ping timeout: 245 seconds]

07:58 rohitksingh has joined ##openfpga

08:18 rohitksingh_ has joined ##openfpga

08:18 rohitksingh has quit [Ping timeout: 272 seconds]

09:07 <tnt> gruetzkopf: ping ?

09:12 rohitksingh_ has quit [Remote host closed the connection]

09:14 rohitksingh has joined ##openfpga

09:14 <noopwafel> quiet at CCC :p nice demo though :)

09:26 <tnt> d

09:43 Miyu has joined ##openfpga

09:47 jcreus has joined ##openfpga

09:48 Miyu has quit [Ping timeout: 250 seconds]

09:48 futarisIRCcloud has joined ##openfpga

10:30 <tnt> gruetzkopf: The ECP3 board is waiting for you at the openfpga assembly :)

10:33 sunxi_fan has joined ##openfpga

11:32 mumptai has joined ##openfpga

12:07 futarisIRCcloud has quit [Quit: Connection closed for inactivity]

12:50 _whitelogger has joined ##openfpga

13:07 soylentyellow has joined ##openfpga

13:32 Morn_ has joined ##openfpga

13:47 m_t has joined ##openfpga

13:48 soylentyellow_ has joined ##openfpga

13:51 soylentyellow has quit [Ping timeout: 272 seconds]

13:58 <tnt> Anyone at the ##openfpga assembly ?

13:58 <tnt> I need someone to unplug and replug the icebreaker from the rpi :p

14:10 <tnt> no one ? :/

14:10 <tnt> please don't make me walk :)

14:15 <Zorix> you know you need the walk heh

14:16 <tnt> I need a way to do a USB reset on a device :p

14:17 <Zorix> easy

14:17 <tnt> really ?

14:17 <tnt> I found some people saying unloading hcd driver ... but it's on a rpi, if I unload the hcd, I'll loose network :/

14:18 <Zorix> i have a script i run to reset all the usb devices

14:19 <Zorix> but now that i look at it, it probably doesn't work on an individual device

14:19 <Zorix> but it doesn't unload the driver either

14:19 <Zorix> https://pastebin.com/c9DDrXKD

14:22 Miyu has joined ##openfpga

14:28 <tnt> I ended up rebooting the pi ... that worked :p

14:31 sunxi_fan has quit [Quit: Leaving.]

14:41 pie___ has joined ##openfpga

14:52 sunxi_fan has joined ##openfpga

15:02 <miek> tnt: fwiw this has worked for me before https://askubuntu.com/questions/645/how-do-you-reset-a-usb-device-from-the-command-line/661#661

15:04 <whitequark> yeah usbreset.c is useful

15:07 <tnt> miek: oh, tx, looks useful indeed.

15:17 sunxi_fan has quit [Read error: Connection reset by peer]

15:18 sunxi_fan has joined ##openfpga

15:25 X-Scale has quit [Quit: HydraIRC -> http://www.hydrairc.com <- Now with extra fish!]

15:25 sunxi_fan has quit [Ping timeout: 244 seconds]

15:28 pie___ has quit [Ping timeout: 246 seconds]

15:44 kristianpaul has joined ##openfpga

16:13 m_t has quit [Read error: Connection reset by peer]

16:24 sunxi_fan has joined ##openfpga

17:01 <jcreus> daveshah: you mentioned it'd be interesting to have an analytic placer going. I might try to give it a shot, I don't expect it to be quick due to time limitations and figuring out the codebase, but to get started I've been reading placer1 and router1. I was wondering if there's a set of benchmarks to compare results between PnRs while working on it?

17:01 <daveshah> jcreus: https://github.com/YosysHQ/nextpnr-bench

17:01 <daveshah> that's what we've got at the moment

17:01 <jcreus> okay crap sorry I should've searched more

17:02 <daveshah> no worries, it's not very well published

17:02 <daveshah> that benchmarks nextpnr against old versions of itself and arachne-pnr

17:03 <jcreus> what's the current philosophy for the optimization objectives - i.e. tradeoff between size and space?

17:03 <jcreus> like the analytic stuff I've been thinking about while showering would have the ability to trade-off, I think, and some literature I've read has similar stuff

17:06 <daveshah> At the moment I feel we mostly aim for Fmax, we don't try and optimise for size

17:07 <daveshah> so long as it fits

17:07 <jcreus> right, makes sense

17:08 <tnt> Fmax FTW !

17:09 <jcreus> I've also seen ppl say that it doesn't compare great compared to commercial stuff, but it to me it looks pretty good vs Lattice's stuff - is the worry that the current system doesn't scale to bigger chips?

17:09 <tnt> One thing the placer does really badly at the moment is dealing with fixed blocks. Things like SPRAMs for instance that are essentially unmoveable. It won't occur to the router to shift _all_ the luts closer to the SPRAM.

17:10 <daveshah> Scaling is definitely a problem with the current placer in terms of runtime

17:10 <daveshah> SA isn't great for bigger parts

17:11 <jcreus> right

17:11 <jcreus> also, I realize it's absolutely none of my business

17:11 <jcreus> and I'm just starting out so I might be missing the greater picture

17:12 <jcreus> but for situations like what tnt mentioned about situations handled poorly by the current placer, would it make sense to try to look for ice40 scripts on github semi randomly and add them liberally to the benchmarking repo?

17:12 <daveshah> yes, that would be awesome

17:13 <jcreus> 3 designs might not be very useful for comparison - for linear programs for instance progress is really awesome to track, since there's a standard library of thousands of linear programs and mixed-integer programs

17:19 sunxi_fan has quit [Read error: Connection reset by peer]

17:19 sunxi_fan has joined ##openfpga

17:27 pie__ has joined ##openfpga

17:44 jcreus has quit [Remote host closed the connection]

17:47 rohitksingh has quit [Ping timeout: 272 seconds]

17:48 <_whitenotifier-6> [whitequark/Boneless-CPU] whitequark pushed 2 commits to master [+7/-5/±7] https://git.io/fhkuk

17:48 <_whitenotifier-6> [whitequark/Boneless-CPU] whitequark 86d3621 - Rearrange the code for a nicer layout.

17:48 <_whitenotifier-6> [whitequark/Boneless-CPU] whitequark 22b299d - Convert everything to use nMigen. Yay!

17:55 rohitksingh has joined ##openfpga

17:57 sunxi_fan has left ##openfpga [##openfpga]

18:04 jcreus has joined ##openfpga

18:21 zng has quit [Quit: ZNC 1.8.x-nightly-20181211-72c5f57b - https://znc.in]

18:22 GuzTech has quit [Ping timeout: 250 seconds]

18:22 GuzTech has joined ##openfpga

18:23 zng has joined ##openfpga

18:24 pie__ has quit [Remote host closed the connection]

18:24 pie__ has joined ##openfpga

18:33 gruetzkopf has quit [Remote host closed the connection]

18:34 gruetzkopf has joined ##openfpga

18:39 GuzTech has quit [Ping timeout: 272 seconds]

18:46 <jcreus> sorry to keep going with the noob nextpnr questions, but for the ice40 case, the GUI seems to suggest that the individual bels are the each of the 8 logic cells (as opposed to the full PLB)?

18:46 <jcreus> how are the shared CEN/CLK signals handled?

18:46 <jcreus> that are shared across those 8

18:47 <jcreus> as the SA possibly messes them around and far away

19:40 tmeissner has joined ##openfpga

19:45 m_w has joined ##openfpga

19:49 GuzTech has joined ##openfpga

20:13 <tnt> Oh, my CPU executed its first few instructions, so cute :P

20:14 <tnt> jcreus: there is a validty check to make sure a BEL doesn't conflict with other ones in the same PLB

20:15 <jcreus> tnt: I see, thanks!

20:18 <jcreus> trying to figure out how to best deal with those constraints when doing it analytically instead of SA where you can do these checks as you go

20:18 <jcreus> why can't everything just be convex?

20:19 <daveshah> So there are two possible ways to solve this

20:19 <daveshah> one would be to have a first stage "tile packer" that makes legal tiles for the analytical placer

20:19 <tnt> Somewhat unsurprisingly yosys doesn't like a switch case with 65536 entries ...

20:19 <daveshah> the other option would be to start SA at a low temperature to legalise the placement created by the analytical placer

20:20 <jcreus> I was thinking about doing the latter either way in order to not have to implement all the legalisation logic again myself

20:20 <daveshah> That is probably the best option

20:21 <jcreus> is the legaliser efficient when doing that?

20:21 <jcreus> actually nvm it is

20:23 <jcreus> also, I see that it's technically a 3D grid - are the bels with z != 0 always special cases (like, idk, BRAMs or something) such that I can assume the big optimization is just in xy?

20:23 <daveshah> Yes

20:24 <daveshah> z != 0 is mostly for logic tiles, where z = 0..7

20:37 _whitelogger has joined ##openfpga

20:40 pie__ has quit [Ping timeout: 252 seconds]

20:50 <azonenberg> jcreus: regarding scalability, prjxray is trying to reverse engineer the xilinx 7 series bitstream

20:51 <azonenberg> So when thinking about scalability and performance, don't think about your par on an ice40 or ecp5

20:51 <azonenberg> think about how it will run on a virtex-7

20:51 <azonenberg> Ideally i'd want to be able to multithread it too

20:51 <azonenberg> actually, *really* ideal would be an MPI cluster or similar so you can run on hundreds of cores :p

20:52 <azonenberg> but multithreading is a good start

20:53 <jcreus> azonenberg: gotcha. My background is mostly in optimization (distributed convex optimization being my biggest kink) so I'm hoping to formulate it that way, and scalability should follow nicely

20:54 <azonenberg> awesome

20:54 <azonenberg> Basically, my long term dream is being able to take a synthesized netlist (we'll worry about optimizing synthesis later, lol)

20:54 <azonenberg> for a full virtex ultrascale

20:54 <jcreus> yeah, that would be awesome

20:55 <jcreus> I recently realized that a kinda nasty thing are the distributed complex elements like DSPs and RAMs, that need to be placed, too

20:55 <jcreus> and they're special in that you can't really pretend they're continuous like you can do with LUTs

20:55 <azonenberg> throw it on a rack of xeons or a few dozen t3.2xlarge instances

20:55 <azonenberg> and get a bitstream back in minutes

20:55 <azonenberg> i have no idea how feasible this is because i havent had the time to even look into scaling bottlenecks etc

20:55 <azonenberg> But that's my goal :p

20:56 <jcreus> yeppp

20:56 <jcreus> convex solver would be a good start, I recently worked on a distributed QP solver using Regent/Legion (which has seen some use on supercomputers)

20:56 <azonenberg> I'm thinking start with a simple sequential implementatino of the solver core to prototype a bit

20:56 <azonenberg> then openmp on a single node

20:57 <azonenberg> then rewrite in either MPI or openmp + sockets

20:57 <azonenberg> for scaling to larger platforms

20:57 <jcreus> oh, yeah, for sure. For now actually I'll probably start by jankily communicating with Python and using cvxpy

20:57 <azonenberg> keep in mind we dont want to sacrifice usability on single-node jobs just to get scaling

20:57 <jcreus> then when I like the cost function and constraints go back to c++ land and do it there properly

20:57 <azonenberg> Yeah maeks sense

20:57 <jcreus> but yeah I'll need to think more carefully about DSPs and RAMs

20:58 <jcreus> they seem annoying

20:58 <jcreus> in theory you could always consider permutations of cell choices?

20:59 <jcreus> which blows up, obviously, so maybe some meta-simulated-annealing

20:59 <azonenberg> Loooong term i want iterative optimization capability

20:59 <jcreus> you could always do device specific hacks - iirc ice40 has DSP at the edges only...

20:59 <azonenberg> so do an initial placement, try routing it

21:00 <azonenberg> fine tune placement based on routing feedback (say, if you have heavy routing congestion move some bels closer together to shorten routing delays)

21:00 <azonenberg> or even (very long term) adjust register balancing

21:00 <jcreus> right

21:00 <jcreus> actually, quick question about that

21:00 <azonenberg> based on feedback with actual routing delays

21:00 <jcreus> how much of a factor is routing?

21:00 <azonenberg> But to start, forget register balancing and netlist changes and focus on P&R only

21:01 <azonenberg> i'd say on average routing delay can be expected to be the same OOM as logic delay

21:01 <jcreus> is it something like "well, if it routes successfully, then the solution won't be far from the optimal given that placement, so go work on the placer?"

21:01 <azonenberg> but if you have longer range nets between IP blocks or something it's usually 2-4x as big as net delay

21:01 <azonenberg> so IMO the placer needs to be aware of wire delay to get optimal results

21:01 <azonenberg> first order approximation can be just Manhattan distance between nodes times a cost factor

21:02 <azonenberg> but down the road i want to consider congestion and such

21:02 <azonenberg> i.e. there are no free paths between these two slices so we have to detour around

21:02 <azonenberg> that adds delay, so move the source of that net closer to us to compensate

21:02 <jcreus> makes sense

21:02 <azonenberg> i suspect there will need to be several iterations of this until we converge

21:03 <jcreus> is the quadratic cost ppl use purely for optimization purposes (since it does make things nice), or is there some validity to it? something like, longer paths go through more interconnects, so the increase is worse than linear?

21:04 <azonenberg> i think quadratic wirelength metrics are intended to disproportionately penalize the longest nets since those are the most likely to make you fail timing

21:04 tmeissner has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

21:04 <azonenberg> ideally, i would want delay calculations based in time rather than distanec

21:04 <azonenberg> keeping in mind that, say, an x4 wire vs a x1 may not be 4x the delay once you factor in the switch block

21:04 <azonenberg> it's 4x the RC delay but probably not 4x the buffer/mux delay

21:05 <azonenberg> That is likely to be too expensive to do in the inner loop though

21:05 <azonenberg> so maybe adjust cost tables between inner loop iterations or something with actual timing data

21:05 <azonenberg> You can also do fun stuff like consider that the northmost vs southmost bel in a slice are not quite timing identical

21:05 <jcreus> agh, right

21:06 <azonenberg> I have characterization data for greenpak that shows eastbound and westbound wires are not timing identical either

21:06 <azonenberg> and, in fact, within a given direction some wires are slower than others

21:06 <azonenberg> And i can measure this delay reliably

21:06 <jcreus> dang

21:06 <jcreus> interesting

21:07 <azonenberg> http://thanatos.virtual.antikernel.net/unlisted/greenpak-characterization-07.png

21:07 <azonenberg> this is ten east and ten west routes on a slg46620, before calibrating for i/o buffer delay (which is constant since i used the same pins and just changed internal routes)

21:07 <azonenberg> measured for five dies

21:07 pie__ has joined ##openfpga

21:08 <jcreus> nice

21:08 <azonenberg> you can see the fast and slow process corners pretty clearly, as well as a kind of sawtooth pattern where the delay increases, dips, increases, dips, increases, and dips again

21:08 <azonenberg> then the left half is slower than the right

21:08 <azonenberg> i forget which half is east and which is west

21:08 <azonenberg> but the difference is obvious and significant

21:08 <jcreus> yeah

21:09 <azonenberg> http://thanatos.virtual.antikernel.net/unlisted/greenpak-characterization-13.png

21:09 <azonenberg> http://thanatos.virtual.antikernel.net/unlisted/greenpak-characterization-14.png

21:09 <azonenberg> here's voltage variation

21:10 <azonenberg> i have a thermal setup but moved before i had time to gather data from it

21:10 <azonenberg> eventual plan was to characterize across P/T/V and even be able to bin chips myself (they only list typical values, not min/max)

21:10 <azonenberg> obviously the higher end chips are better characterized than this

21:10 <azonenberg> but it was a fun challenge

21:11 <jcreus> yeah, sounds fun

21:32 tmeissner has joined ##openfpga

21:51 tmeissner has quit [Quit: Textual IRC Client: www.textualapp.com]

21:52 rohitksingh has quit [Read error: Connection reset by peer]

22:27 futarisIRCcloud has joined ##openfpga

22:32 Richard_Simmons has joined ##openfpga

22:33 Bob_Dole has quit [Ping timeout: 260 seconds]

23:10 dj_pi has joined ##openfpga

23:16 <cr1901_modern> I keep misreading thanatos as thanos, and I kinda wish that was what it actually said

23:36 pie__ has quit [Ping timeout: 250 seconds]

23:39 jcreus has quit [Ping timeout: 250 seconds]

23:39 pie__ has joined ##openfpga

23:56 dj_pi has quit [Ping timeout: 244 seconds]