pie_ has joined ##openfpga
pie__ has joined ##openfpga
pie_ has quit [Excess Flood]
pie___ has joined ##openfpga
pie__ has quit [Ping timeout: 240 seconds]
X-Scale has joined ##openfpga
m_t has quit [Quit: Leaving]
s1dev has joined ##openfpga
<s1dev> awygle, I was thinking about some stuff and in the short term, HPWL is probably computable with SIMD -> GPU acceleration
<awygle> s1dev: how so? I think the final reduction can be but the per-net math doesn't seem amenable
s1dev has quit [Remote host closed the connection]
s1dev has joined ##openfpga
<s1dev> awygle, for a given placement, isn't HPWL just a matter of looping through the nets without branching?
<awygle> s1dev: that's true... not all the nets are the same width, but I guess that doesn't matter much
<awygle> oh but nodes can be on many nets, so you can't store nodes on the same net near each other without duplication (which might be fine)
<s1dev> but if you have some replicas, then you're fine
* awygle is visualizing memory layouts while cooking dinner
<s1dev> you just compute the HPWL of a particular net across all the replicas
<awygle> yeah that's fine at the cost of your update operation being somewhat more expensive
<awygle> probably not a lot tho and that happens way less often
<s1dev> it's just a matter of reordering some loops
<s1dev> in the case of SA you'd just run multiple restarts simultaneously. PA (population annealing) and PT (parallel tempering) naturally have a bunch of replicas to parallelize
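A minimal C++ sketch of the branch-free per-net loop being described, assuming a flattened layout where every net's pin coordinates are stored contiguously (the layout and all names are illustrative, not taken from any existing placer; nets are assumed to have at least one pin):

    // Flattened layout (illustrative): pin_x/pin_y hold the placed coordinates of every
    // pin, and net_offset[n]..net_offset[n+1] delimit the pins belonging to net n.
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    float hpwl(const std::vector<float> &pin_x, const std::vector<float> &pin_y,
               const std::vector<uint32_t> &net_offset) {
        float total = 0.0f;
        for (std::size_t n = 0; n + 1 < net_offset.size(); n++) {
            float xmin = pin_x[net_offset[n]], xmax = xmin;
            float ymin = pin_y[net_offset[n]], ymax = ymin;
            for (uint32_t p = net_offset[n]; p < net_offset[n + 1]; p++) {
                // min/max compile to branchless ops; each net is independent
                xmin = std::min(xmin, pin_x[p]); xmax = std::max(xmax, pin_x[p]);
                ymin = std::min(ymin, pin_y[p]); ymax = std::max(ymax, pin_y[p]);
            }
            // half-perimeter of the net's bounding box
            total += (xmax - xmin) + (ymax - ymin);
        }
        return total;
    }

On a GPU, each (net, replica) pair could be handled by its own thread, with the final sum done as a reduction; the duplication awygle mentions shows up as each cell's coordinates being copied into every net it touches, which is what the update step then has to keep consistent.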
<awygle> hm I guess I'm picturing a different thing
<awygle> gimme like 30m for dinner and then I'll try again
<sorear> surely for SA you can evaluate a few dozen swaps in parallel, and apply the chosen and non-conflicting ones
<sorear> since conflicts will be relatively rare
<s1dev> sounds like there might be some branching involved in that
<sorear> maybe a little. i'd need to work out a lot more details
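A rough host-side sketch of sorear's scheme, with invented types and names: the cost deltas and accept/reject decisions for a batch of candidate swaps are assumed to be computed in parallel beforehand, and the accepted, non-conflicting ones are then applied serially:

    #include <unordered_set>
    #include <utility>
    #include <vector>

    struct Swap {
        int cell_a, cell_b; // cells whose locations would be exchanged
        bool accepted;      // result of the Metropolis test, evaluated in parallel
    };

    // Apply accepted swaps, skipping any that touch a cell already moved in this batch.
    void apply_non_conflicting(const std::vector<Swap> &batch, std::vector<int> &loc_of_cell) {
        std::unordered_set<int> touched;
        for (const auto &s : batch) {
            if (!s.accepted)
                continue;
            if (touched.count(s.cell_a) || touched.count(s.cell_b))
                continue; // conflict: rare if the batch is small relative to the design
            std::swap(loc_of_cell[s.cell_a], loc_of_cell[s.cell_b]);
            touched.insert(s.cell_a);
            touched.insert(s.cell_b);
        }
    }

A fuller conflict check would also have to consider swaps that target the same location or share nets (their cost deltas interact), which is where the branching s1dev mentions creeps in.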
<sorear> other question: is there a foss tool that will run ordinary C++ code on a gpu
GenTooMan has quit [Quit: Leaving]
<sorear> (in order to share as much arch code as possible between the gpu placer and the current placer)
<s1dev> well, I've heard that CUDA these days is just a matter of using their STL implementations and keywords for marking kernel code
<s1dev> *GPU kernels
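For reference, a minimal example of the kind of marking s1dev is describing (modern CUDA C++; the kernel itself is trivial and only for illustration):

    #include <cuda_runtime.h>
    #include <thrust/device_vector.h>

    // __global__ marks a function compiled for and launched on the GPU
    __global__ void scale(float *data, int n, float k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= k;
    }

    int main() {
        // Thrust provides STL-like containers and algorithms backed by device memory
        thrust::device_vector<float> v(1024, 1.0f);
        scale<<<(1024 + 255) / 256, 256>>>(thrust::raw_pointer_cast(v.data()), 1024, 2.0f);
        cudaDeviceSynchronize();
        return 0;
    }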
digshadow has quit [Ping timeout: 260 seconds]
pie___ has quit [Remote host closed the connection]
pie___ has joined ##openfpga
s1dev has quit [Quit: Leaving]
<awygle> yeah okay I thought about it more and it does totally work, you have to do some memory movement but there's plenty of work to use to hide the latency
Bike has quit [Quit: Lost terminal]
s1dev has joined ##openfpga
<rqou> offtopic: the HTME youtube channel is apparently teaching me that modern glass is an amazing technology
<rqou> or just modern materials science in general
cyrozap has quit [Ping timeout: 265 seconds]
cyrozap has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Ping timeout: 240 seconds]
cyrozap has quit [Ping timeout: 265 seconds]
cyrozap has joined ##openfpga
cyrozap has quit [Ping timeout: 256 seconds]
cyrozap has joined ##openfpga
digshadow has joined ##openfpga
<pie___> rqou, htme?
<rqou> how to make everything
<pie___> ah
s1dev has quit [Quit: Leaving]
kuldeep has quit [Ping timeout: 256 seconds]
kuldeep has joined ##openfpga
Bike has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined ##openfpga
rohitksingh has quit [Client Quit]
gruetzko- has joined ##openfpga
gruetzkopf has quit [Ping timeout: 264 seconds]
moho1 has quit [Ping timeout: 264 seconds]
moho1 has joined ##openfpga
Miyu has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined ##openfpga
<q3k> when you have formal verification tools, everything looks like a formally verified nail https://blog.dragonsector.pl/2018/08/code-blue-ctf-2018-quals-watchcats.html
<pie___> oh. shit. "Here's where formal methods come into play. We'll be using 'yosys-smtbmc', which is a flow that involves running the Yosys synthesizer with an SMT2 backend, and then feeding those SMT2 circuit descriptions into SMT2 solvers. These circuit 'models' can be used for different modes of operation of the solver:"
<pie___> obviously im missing a lot of contextual knowledge but i wouldnt have thought of going through verilog lol
<daveshah> I've used Verilog in the past for formal analysis of stuff other than circuits tbh. Because I'm much quicker writing Verilog than any proper formal language, etc
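The flow the post describes is roughly the following (file and module names are placeholders):

    yosys -p "read_verilog -formal top.v; prep -top top; write_smt2 top.smt2"
    yosys-smtbmc top.smt2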
rohitksingh has quit [Quit: Leaving.]
pie___ has quit [Quit: Leaving]
GenTooMan has joined ##openfpga
<cr1901> SMTv2 is s-expr based, so it's not that bad to write by hand
<sorear> naive q: nextpnr has recently gained the ability to insert luts into nets. Should there be some kind of no_new_glitches per-net option to prevent this?
<awygle> glitches_get_stitches
<daveshah> sorear: FPGA synthesis is not glitch free for so many reasons. If you're worried about them, inserting the odd LUT is the least of your worries
<daveshah> I'm not even sure what a pass thru LUT would cause that other interconnect doesn't cause anyway
<sorear> You can avoid the entire synthesis problem by manually instantiating primitives. PnR seems inescapable though
<daveshah> Can you actually show what kind of glitch you mean? I'm not convinced a pass thru LUT is actually worse than interconnect (but might be wrong), or other architectural stuff like the fact each LUT input has a different delay
<sorear> logic synthesis *in general* introduces glitches because logical equivalence allows that
<sorear> A LUT can turn one input edge into multiple output edges
<daveshah> I would be interested to know if a pass thru LUT also did that in practice
<sorear> Not sure if that’s possible in the specific case of an ice40 lut being used for pass through
<daveshah> I can see how a LUT with more than one utilised inout could
<daveshah> *input
<daveshah> The thing is, if one says that, then one could also extend that to the fact that interconnect buffers could introduce glitches
<sorear> Do we have transistor-level schematics for a LUT in any product?
<daveshah> Not sure, might be for some academic architectures at least
<daveshah> It's typically a cascade of muxes though
<sorear> It’s much less plausible for interconnect buffers, since there is one obvious way to do an enable buffer and it doesn’t glitch
<daveshah> True
X-Scale has quit [Ping timeout: 264 seconds]
X-Scale has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Quit: Leaving.]
digshadow has quit [Quit: Leaving.]
digshadow has joined ##openfpga
<sorear> after giving the matter more thought, i can think of 3 ways to implement an N:1 mux (N pass transistors + buffer; N tristate buffers; N AND + 1 OR), and none of them will produce glitches in the pass-through case
<sorear> however. none of these methods can take advantage of the fact that a sram fpga has dual-rail *data* inputs. which makes me wonder if there's a different approach that'd be used instead
X-Scale has quit [Ping timeout: 256 seconds]
<cr1901> >A LUT can turn one input edge into multiple output edges
<cr1901> Could you elaborate w/ a toy example?
<cr1901> (must be combining LUTs where this happens, not just a single LUT)
<sorear> cr1901: let's say you have a 2:1 MUX implemented as an AND/OR tree, both data inputs are 1, the output is 1
<sorear> cr1901: now say the select input changes from 0 to 1. depending on the order the decoder lines change, the MUX output could briefly go 0 before returning to 1
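A toy C++ model of that sequence, with the two decoder lines treated as separate signals that need not switch simultaneously (purely illustrative):

    #include <cstdio>

    // 2:1 MUX built as (a AND ~s) OR (b AND s); s_true/s_comp are the two decoder lines
    static int mux_and_or(int a, int b, int s_true, int s_comp) {
        return (a & s_comp) | (b & s_true);
    }

    int main() {
        int a = 1, b = 1;                        // both data inputs are 1
        printf("%d\n", mux_and_or(a, b, 0, 1));  // select=0: output 1
        printf("%d\n", mux_and_or(a, b, 0, 0));  // ~s has fallen, s not yet risen: output glitches to 0
        printf("%d\n", mux_and_or(a, b, 1, 0));  // select=1: output back to 1
        return 0;
    }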
<sorear> i have just found out what SAED stands for o_O
<daveshah> I think such an option in nextpnr would nonetheless be useful for debugging if nothing else (likewise being able to disable lut input permutation)
<daveshah> LUT input permutation could introduce extra glitches if you were relying on the exact mux structure
<daveshah> The different delays of the LUT inputs are characterised in the ice40 timing model
<awygle> SAED is supercool
<awygle> i love electron microscopes
<awygle> even if all the pictures make me very uncomfortable
X-Scale has joined ##openfpga
mumptai has joined ##openfpga
<sorear> (more generally: i want the open tools to have features for people to get excited about other than just "it's open". do at least a couple things the vendor tools simply can't)
<daveshah> Yes, that's very much the spirit of nextpnr
<daveshah> That's why we are working on a Python API, nice GUI, bitstream reader, etc
<awygle> daveshah: in your opinion how ready is nextpnr for plugging in alternative/experimental placement algorithms?
<daveshah> awygle: should be ready already
<awygle> hmmm
<daveshah> For anything parallel, you'd need to have a first pass to combine Bels into tiles. But we have some API functions for working with Bels by tile for that purpose
<awygle> to use ice40 parlance, Bels are LCs and tiles are PLBs? or are Bels even lower level?
<daveshah> Bels are LCs
<daveshah> Tiles are PLBs, but lots of ice40 stuff just calls them tiles
<awygle> ok. so you can end up with an illegal placement due to carries, but you don't have to pack LUTs with FFs
<daveshah> You can end up with an illegal placement for many more reasons than carries
<daveshah> The Arch API provides functions to check validity of arch specific stuff
<daveshah> Carries are specified and validated using relative placement constraints
<awygle> hm. well i'll poke at the source ... eventually. thanks
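The shape of the check daveshah describes, in heavily simplified form (the types and functions below are stand-ins, not the actual nextpnr Arch API; see the nextpnr source for the real interface):

    // Stand-in types, not nextpnr's: the point is only that the placer asks the
    // architecture whether a tentative binding is legal and undoes it if not.
    struct BelId { int index = -1; };
    struct CellInfo { int id = -1; };

    struct Arch {
        // Arch-specific legality: carry chains, cells sharing a tile's control set, etc.
        bool isBelLocationValid(BelId bel) const { (void)bel; return true; /* placeholder */ }
        void bind(BelId bel, CellInfo *cell) { (void)bel; (void)cell; /* placeholder */ }
        void unbind(BelId bel) { (void)bel; /* placeholder */ }
    };

    bool try_move(Arch &arch, CellInfo *cell, BelId bel) {
        arch.bind(bel, cell);
        if (!arch.isBelLocationValid(bel)) {
            arch.unbind(bel); // reject the move so the placement stays legal
            return false;
        }
        return true;
    }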
<sorear> https://github.com/YosysHQ/nextpnr/commit/2853149c682eca805739a25e46dfb18c006efed9 seems like it's committing us to keeping all CellInfo objects in memory at all times ?
<daveshah> They are kept in memory already
<daveshah> All that commit changes is the handle to access them
Bike has quit [Ping timeout: 240 seconds]
<sorear> intuition: the working representation of an SA PNR between each tick is equivalent to a bitstream, and it should be possible for it to use roughly the same amount of space
<awygle> the first half is close to true, i'd say it may be slightly _less_ information than a bitstream because routing may not be fully defined
<awygle> the second half is probably true but i don't think you'd _want_ to represent it in a way that allowed it to take up the same amount of space
mumptai has quit [Quit: Verlassend]
<sorear> there's a lot to be said for fitting in cache
<prpplague> i always prefer cash
<awygle> cache money
<awygle> i should calculate how much bit-twiddling equals a cache miss
<awygle> at L1 L2 and L3
<awygle> just so i know
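Ballpark figures only, since they vary a lot by microarchitecture: an L1 hit costs around 4-5 cycles, L2 around 12-15, L3 around 30-40, and a miss all the way to DRAM on the order of 200+ cycles. At a few simple ALU ops per cycle, a DRAM miss is therefore worth very roughly 200 x 3-4, i.e. 600-800 bit-twiddles, which is why fitting the working set in cache matters so much for a placer's inner loop.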
* prpplague takes a break from board assembly to read the channel log
wpwrak has quit [Read error: Connection reset by peer]
wpwrak has joined ##openfpga
Bike has joined ##openfpga