<GuzTech> I can't be at 35C3, so I made a small design in Clash that reads from a frame buffer in BRAM and outputs VGA signals.
<GuzTech> And now it's time for bed :)
oter has joined ##openfpga
oter has quit [Client Quit]
Flea86 has quit [Ping timeout: 246 seconds]
zng has quit [Quit: ZNC 1.8.x-nightly-20181211-72c5f57b - https://znc.in]
zng has joined ##openfpga
<digshadow> thanks RaYmAn
<digshadow> oops
<digshadow> rqou
<digshadow> didn't see anyone there, but eh its late
Maylay has quit [Quit: Pipe Terminated]
unixb0y has quit [Ping timeout: 268 seconds]
unixb0y has joined ##openfpga
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Maylay has joined ##openfpga
oter has joined ##openfpga
pie__ has joined ##openfpga
oter has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
pie___ has joined ##openfpga
pie__ has quit [Ping timeout: 246 seconds]
oter has joined ##openfpga
oter has quit [Client Quit]
pie___ has quit [Ping timeout: 250 seconds]
GenTooMan has quit [Quit: Leaving]
catplant is now known as demonplant
Miyu has quit [Ping timeout: 246 seconds]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Ping timeout: 246 seconds]
_whitelogger has joined ##openfpga
rohitksingh has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
oter has joined ##openfpga
oter has quit [Quit: Textual IRC Client: www.textualapp.com]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
jcarpenter2 has quit [Read error: Connection reset by peer]
rofl_ has joined ##openfpga
_whitelogger_ has joined ##openfpga
_whitelogger has joined ##openfpga
demonplant is now known as tavycat
m4ssi has joined ##openfpga
rohitksingh has quit [Ping timeout: 245 seconds]
rohitksingh has joined ##openfpga
rohitksingh_ has joined ##openfpga
rohitksingh has quit [Ping timeout: 272 seconds]
<tnt> gruetzkopf: ping ?
rohitksingh_ has quit [Remote host closed the connection]
rohitksingh has joined ##openfpga
<noopwafel> quiet at CCC :p nice demo though :)
<tnt> d
Miyu has joined ##openfpga
jcreus has joined ##openfpga
Miyu has quit [Ping timeout: 250 seconds]
futarisIRCcloud has joined ##openfpga
<tnt> gruetzkopf: The ECP3 board is waiting for you at the openfpga assembly :)
sunxi_fan has joined ##openfpga
mumptai has joined ##openfpga
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
_whitelogger has joined ##openfpga
soylentyellow has joined ##openfpga
Morn_ has joined ##openfpga
m_t has joined ##openfpga
soylentyellow_ has joined ##openfpga
soylentyellow has quit [Ping timeout: 272 seconds]
<tnt> Anyone at the ##openfpga assembly ?
<tnt> I need someone to unplug and replug the icebreaker from the rpi :p
<tnt> no one ? :/
<tnt> please don't make me walk :)
<Zorix> you know you need the walk heh
<tnt> I need a way to do a USB reset on a device :p
<Zorix> easy
<tnt> really ?
<tnt> I found some people saying to unload the hcd driver ... but it's on a rpi, if I unload the hcd, I'll lose network :/
<Zorix> i have a script i run to reset all the usb devices
<Zorix> but now that i look at it, it probably doesn't work on an individual device
<Zorix> but it doesn't unload the driver either
Miyu has joined ##openfpga
<tnt> I ended up rebooting the pi ... that worked :p
sunxi_fan has quit [Quit: Leaving.]
pie___ has joined ##openfpga
sunxi_fan has joined ##openfpga
<whitequark> yeah usbreset.c is useful
<tnt> miek: oh, tx, looks useful indeed.
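For reference, usbreset.c boils down to a single USBDEVFS_RESET ioctl on the device node; a minimal Python sketch of the same idea (bus/device numbers as reported by `lsusb`, run as root) might look like this:

```python
#!/usr/bin/env python3
# Rough Python equivalent of usbreset.c: issue the USBDEVFS_RESET
# ioctl on a device node under /dev/bus/usb/<bus>/<dev>.
import fcntl
import sys

USBDEVFS_RESET = 0x5514  # _IO('U', 20), from <linux/usbdevice_fs.h>

def usb_reset(bus: int, dev: int) -> None:
    path = f"/dev/bus/usb/{bus:03d}/{dev:03d}"
    with open(path, "wb") as fd:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)

if __name__ == "__main__":
    # e.g. `lsusb` reports "Bus 001 Device 004: ..." for the device
    usb_reset(int(sys.argv[1]), int(sys.argv[2]))
```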
sunxi_fan has quit [Read error: Connection reset by peer]
sunxi_fan has joined ##openfpga
X-Scale has quit [Quit: HydraIRC -> http://www.hydrairc.com <- Now with extra fish!]
sunxi_fan has quit [Ping timeout: 244 seconds]
pie___ has quit [Ping timeout: 246 seconds]
kristianpaul has joined ##openfpga
m_t has quit [Read error: Connection reset by peer]
sunxi_fan has joined ##openfpga
<jcreus> daveshah: you mentioned it'd be interesting to have an analytic placer going. I might try to give it a shot - I don't expect it to be quick due to time limitations and having to figure out the codebase, but to get started I've been reading placer1 and router1. I was wondering if there's a set of benchmarks to compare results between PnRs while working on it?
<daveshah> that's what we've got at the moment
<jcreus> okay crap sorry I should've searched more
<daveshah> no worries, it's not very well published
<daveshah> that benchmarks nextpnr against old versions of itself and arachne-pnr
<jcreus> what's the current philosophy for the optimization objectives - i.e. the tradeoff between size and speed?
<jcreus> like the analytic stuff I've been thinking about while showering would have the ability to trade off, I think, and some literature I've read has similar stuff
<daveshah> At the moment I feel we mostly aim for Fmax, we don't try and optimise for size
<daveshah> so long as it fits
<jcreus> right, makes sense
<tnt> Fmax FTW !
<jcreus> I've also seen ppl say that it doesn't compare great to commercial stuff, but to me it looks pretty good vs Lattice's stuff - is the worry that the current system doesn't scale to bigger chips?
<tnt> One thing the placer does really badly at the moment is dealing with fixed blocks. Things like SPRAMs, for instance, that are essentially unmovable. It won't occur to the placer to shift _all_ the LUTs closer to the SPRAM.
<daveshah> Scaling is definitely a problem with the current placer in terms of runtime
<daveshah> SA isn't great for bigger parts
<jcreus> right
<jcreus> also, I realize it's absolutely none of my business
<jcreus> and I'm just starting out so I might be missing the greater picture
<jcreus> but for cases like the one tnt mentioned that the current placer handles poorly, would it make sense to look for ice40 designs on github semi-randomly and add them liberally to the benchmarking repo?
<daveshah> yes, that would be awesome
<jcreus> 3 designs might not be very useful for comparison - for linear programs, for instance, progress is really easy to track, since there's a standard library of thousands of linear programs and mixed-integer programs
sunxi_fan has quit [Read error: Connection reset by peer]
sunxi_fan has joined ##openfpga
pie__ has joined ##openfpga
jcreus has quit [Remote host closed the connection]
rohitksingh has quit [Ping timeout: 272 seconds]
<_whitenotifier-6> [whitequark/Boneless-CPU] whitequark pushed 2 commits to master [+7/-5/±7] https://git.io/fhkuk
<_whitenotifier-6> [whitequark/Boneless-CPU] whitequark 86d3621 - Rearrange the code for a nicer layout.
<_whitenotifier-6> [whitequark/Boneless-CPU] whitequark 22b299d - Convert everything to use nMigen. Yay!
rohitksingh has joined ##openfpga
sunxi_fan has left ##openfpga [##openfpga]
jcreus has joined ##openfpga
zng has quit [Quit: ZNC 1.8.x-nightly-20181211-72c5f57b - https://znc.in]
GuzTech has quit [Ping timeout: 250 seconds]
GuzTech has joined ##openfpga
zng has joined ##openfpga
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
gruetzkopf has quit [Remote host closed the connection]
gruetzkopf has joined ##openfpga
GuzTech has quit [Ping timeout: 272 seconds]
<jcreus> sorry to keep going with the noob nextpnr questions, but for the ice40 case, the GUI seems to suggest that the individual bels are each of the 8 logic cells (as opposed to the full PLB)?
<jcreus> how are the shared CEN/CLK signals handled?
<jcreus> that are shared across those 8
<jcreus> since the SA might move them around and far apart
tmeissner has joined ##openfpga
m_w has joined ##openfpga
GuzTech has joined ##openfpga
<tnt> Oh, my CPU executed its first few instructions, so cute :P
<tnt> jcreus: there is a validity check to make sure a BEL doesn't conflict with other ones in the same PLB
<jcreus> tnt: I see, thanks!
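For illustration, a hedged sketch of the kind of per-tile check tnt describes (the data model and names here are hypothetical, not nextpnr's actual API): the 8 logic cells in an ice40 PLB share one CLK, one CEN, and one SR net, so a tile is legal only if every occupied cell agrees on all three.

```python
# Sketch of a per-tile legality check for shared control signals.
# Cell/net representation is made up for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogicCell:
    clk: Optional[str]  # net driving the FF clock, None if unused
    cen: Optional[str]  # clock-enable net
    sr: Optional[str]   # set/reset net

def tile_is_legal(cells: list[LogicCell]) -> bool:
    """True if all cells placed in one PLB can share control signals."""
    def compatible(nets):
        used = {n for n in nets if n is not None}
        return len(used) <= 1  # at most one distinct net per control
    return (compatible(c.clk for c in cells) and
            compatible(c.cen for c in cells) and
            compatible(c.sr for c in cells))
```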
<jcreus> trying to figure out how to best deal with those constraints when doing it analytically instead of SA where you can do these checks as you go
<jcreus> why can't everything just be convex?
<daveshah> So there are two possible ways to solve this
<daveshah> one would be to have a first stage "tile packer" that makes legal tiles for the analytical placer
<tnt> Somewhat unsurprisingly yosys doesn't like a switch case with 65536 entries ...
<daveshah> the other option would be to start SA at a low temperature to legalise the placement created by the analytical placer
<jcreus> I was thinking about doing the latter either way in order to not have to implement all the legalisation logic again myself
<daveshah> That is probably the best option
<jcreus> is the legaliser efficient when doing that?
<jcreus> actually nvm it is
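A minimal sketch of the low-temperature annealing pass daveshah suggests, assuming hypothetical cost()/propose_move() callbacks rather than nextpnr's real interfaces: starting with a small temperature means moves mostly resolve the local overlaps left by the analytic placer instead of re-scrambling the global placement.

```python
import math
import random

def legalise_with_sa(cost, propose_move, t_start=0.5, t_min=0.01,
                     cooling=0.95, moves_per_temp=1000):
    """cost(): current placement cost (e.g. HPWL); propose_move():
    applies a random legal swap and returns an undo closure."""
    t = t_start  # deliberately low: we only want local clean-up
    cur = cost()
    while t > t_min:
        for _ in range(moves_per_temp):
            undo = propose_move()
            new = cost()
            # Metropolis criterion: always accept improvements,
            # accept regressions with probability exp(-delta/t).
            if new <= cur or random.random() < math.exp(-(new - cur) / t):
                cur = new
            else:
                undo()
        t *= cooling
```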
<jcreus> also, I see that it's technically a 3D grid - are the bels with z != 0 always special cases (like, idk, BRAMs or something) such that I can assume the big optimization is just in xy?
<daveshah> Yes
<daveshah> z != 0 is mostly for logic tiles, where z = 0..7
_whitelogger has joined ##openfpga
pie__ has quit [Ping timeout: 252 seconds]
<azonenberg> jcreus: regarding scalability, prjxray is trying to reverse engineer the xilinx 7 series bitstream
<azonenberg> So when thinking about scalability and performance, don't think about your par on an ice40 or ecp5
<azonenberg> think about how it will run on a virtex-7
<azonenberg> Ideally i'd want to be able to multithread it too
<azonenberg> actually, *really* ideal would be an MPI cluster or similar so you can run on hundreds of cores :p
<azonenberg> but multithreading is a good start
<jcreus> azonenberg: gotcha. My background is mostly in optimization (distributed convex optimization being my biggest kink) so I'm hoping to formulate it that way, and scalability should follow nicely
<azonenberg> awesome
<azonenberg> Basically, my long term dream is being able to take a synthesized netlist (we'll worry about optimizing synthesis later, lol)
<azonenberg> for a full virtex ultrascale
<jcreus> yeah, that would be awesome
<jcreus> I recently realized that a kinda nasty thing is the distributed complex elements like DSPs and RAMs, which need to be placed too
<jcreus> and they're special in that you can't really pretend they're continuous like you can do with LUTs
<azonenberg> throw it on a rack of xeons or a few dozen t3.2xlarge instances
<azonenberg> and get a bitstream back in minutes
<azonenberg> i have no idea how feasible this is because i havent had the time to even look into scaling bottlenecks etc
<azonenberg> But that's my goal :p
<jcreus> yeppp
<jcreus> convex solver would be a good start, I recently worked on a distributed QP solver using Regent/Legion (which has seen some use on supercomputers)
<azonenberg> I'm thinking start with a simple sequential implementation of the solver core to prototype a bit
<azonenberg> then openmp on a single node
<azonenberg> then rewrite in either MPI or openmp + sockets
<azonenberg> for scaling to larger platforms
<jcreus> oh, yeah, for sure. For now actually I'll probably start by jankily communicating with Python and using cvxpy
<azonenberg> keep in mind we dont want to sacrifice usability on single-node jobs just to get scaling
<jcreus> then when I like the cost function and constraints go back to c++ land and do it there properly
<azonenberg> Yeah makes sense
<jcreus> but yeah I'll need to think more carefully about DSPs and RAMs
<jcreus> they seem annoying
<jcreus> in theory you could always consider permutations of cell choices?
<jcreus> which blows up, obviously, so maybe some meta-simulated-annealing
<azonenberg> Loooong term i want iterative optimization capability
<jcreus> you could always do device specific hacks - iirc ice40 has DSP at the edges only...
<azonenberg> so do an initial placement, try routing it
<azonenberg> fine tune placement based on routing feedback (say, if you have heavy routing congestion move some bels closer together to shorten routing delays)
<azonenberg> or even (very long term) adjust register balancing
<jcreus> right
<jcreus> actually, quick question about that
<azonenberg> based on feedback with actual routing delays
<jcreus> how much of a factor is routing?
<azonenberg> But to start, forget register balancing and netlist changes and focus on P&R only
<azonenberg> i'd say on average routing delay can be expected to be the same OOM as logic delay
<jcreus> is it something like "well, if it routes successfully, then the solution won't be far from the optimal given that placement, so go work on the placer?"
<azonenberg> but if you have longer range nets between IP blocks or something it's usually 2-4x as big as logic delay
<azonenberg> so IMO the placer needs to be aware of wire delay to get optimal results
<azonenberg> first order approximation can be just Manhattan distance between nodes times a cost factor
<azonenberg> but down the road i want to consider congestion and such
<azonenberg> i.e. there are no free paths between these two slices so we have to detour around
<azonenberg> that adds delay, so move the source of that net closer to us to compensate
<jcreus> makes sense
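As a concrete version of that first-order approximation (illustrative code, not nextpnr's): per-arc Manhattan distance times a cost factor, alongside the half-perimeter wirelength (HPWL) that placers typically use per net.

```python
def manhattan_cost(driver, sink, cost_per_unit=1.0):
    """First-order arc cost: Manhattan distance times a cost factor."""
    (x0, y0), (x1, y1) = driver, sink
    return cost_per_unit * (abs(x0 - x1) + abs(y0 - y1))

def hpwl(net_pins):
    """Half-perimeter of the bounding box of all pins on a net."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```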
<azonenberg> i suspect there will need to be several iterations of this until we converge
<jcreus> is the quadratic cost ppl use purely for optimization purposes (since it does make things nice), or is there some validity to it? something like, longer paths go through more interconnects, so the increase is worse than linear?
<azonenberg> i think quadratic wirelength metrics are intended to disproportionately penalize the longest nets since those are the most likely to make you fail timing
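A tiny numeric illustration of that point: squaring per-net lengths makes one long net dominate many short ones, which steers the optimizer toward the nets most likely to fail timing (and, conveniently, keeps the analytic formulation a nice convex QP).

```python
def total_cost(net_lengths, quadratic=True):
    if quadratic:
        return sum(l * l for l in net_lengths)
    return sum(net_lengths)

# One 10-unit net costs as much as a hundred 1-unit nets:
# total_cost([10] + [1] * 100)                   -> 200
# total_cost([10] + [1] * 100, quadratic=False)  -> 110
```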
tmeissner has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<azonenberg> ideally, i would want delay calculations based on time rather than distance
<azonenberg> keeping in mind that, say, an x4 wire vs a x1 may not be 4x the delay once you factor in the switch block
<azonenberg> it's 4x the RC delay but probably not 4x the buffer/mux delay
<azonenberg> That is likely to be too expensive to do in the inner loop though
<azonenberg> so maybe adjust cost tables between inner loop iterations or something with actual timing data
<azonenberg> You can also do fun stuff like consider that the northmost vs southmost bel in a slice are not quite timing identical
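A toy model of the x4-vs-x1 point above, with made-up constants: total hop delay is a fixed buffer/mux term plus an RC term proportional to span, so a single x4 hop comes out well under 4x the cost of an x1 hop.

```python
# Constants are assumptions for illustration, not measured values.
T_SWITCH = 0.30       # ns, buffer + switch-block mux per hop
T_RC_PER_TILE = 0.05  # ns of RC delay per tile of wire

def hop_delay(span_tiles):
    return T_SWITCH + T_RC_PER_TILE * span_tiles

# One x4 hop vs four x1 hops covering the same distance:
# hop_delay(4)     -> 0.50 ns
# 4 * hop_delay(1) -> 1.40 ns
```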
<jcreus> agh, right
<azonenberg> I have characterization data for greenpak that shows eastbound and westbound wires are not timing identical either
<azonenberg> and, in fact, within a given direction some wires are slower than others
<azonenberg> And i can measure this delay reliably
<jcreus> dang
<jcreus> interesting
<azonenberg> this is ten east and ten west routes on a slg46620, before calibrating for i/o buffer delay (which is constant since i used the same pins and just changed internal routes)
<azonenberg> measured for five dies
pie__ has joined ##openfpga
<jcreus> nice
<azonenberg> you can see the fast and slow process corners pretty clearly, as well as a kind of sawtooth pattern where the delay increases, dips, increases, dips, increases, and dips again
<azonenberg> then the left half is slower than the right
<azonenberg> i forget which half is east and which is west
<azonenberg> but the difference is obvious and significant
<jcreus> yeah
<azonenberg> here's voltage variation
<azonenberg> i have a thermal setup but moved before i had time to gather data from it
<azonenberg> eventual plan was to characterize across P/T/V and even be able to bin chips myself (they only list typical values, not min/max)
<azonenberg> obviously the higher end chips are better characterized than this
<azonenberg> but it was a fun challenge
<jcreus> yeah, sounds fun
tmeissner has joined ##openfpga
tmeissner has quit [Quit: Textual IRC Client: www.textualapp.com]
rohitksingh has quit [Read error: Connection reset by peer]
futarisIRCcloud has joined ##openfpga
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Ping timeout: 260 seconds]
dj_pi has joined ##openfpga
<cr1901_modern> I keep misreading thanatos as thanos, and I kinda wish that was what it actually said
pie__ has quit [Ping timeout: 250 seconds]
jcreus has quit [Ping timeout: 250 seconds]
pie__ has joined ##openfpga
dj_pi has quit [Ping timeout: 244 seconds]