<whitequark> ok, one bug fixed, one more left
<pie__> qu1j0t3, that would be tragic
<qu1j0t3> pie__: :)
genii has quit [Remote host closed the connection]
<whitequark> daveshah: it is DONE
<whitequark> daveshah: ^
<whitequark> "write your own logic optimization program", he said.
<whitequark> "we are not a billion dollar company", he said.
<whitequark> well I am not one either :P
futarisIRCcloud has joined ##openfpga
<adamgreig> woo! first light on my first fpga pcb
<adamgreig> and my stupid bodged programmer worked first time too
<whitequark> nice!!!
<adamgreig> also the phy has link so I can't be far off blatting network packets :p
<swetland> nice!
<swetland> I survived kicad schematic creation yesterday, but still need to actually lay out a PCB and send it out... https://pbs.twimg.com/media/DtkQ02BUcAA34cm.jpg:large
<adamgreig> i sent these to jlcpcb and they got the boards to me in one week, incredible
<whitequark> is that a switch?
<adamgreig> yes but not an ethernet switch
<whitequark> oh
<whitequark> what kinda switch
<adamgreig> don't really know yet
<adamgreig> tbc
<adamgreig> going to see how far i can push ice40's pseudo-lvds down utp
<whitequark> that's a lot of FPGAs
<adamgreig> with some sort of 8b10b and etc
<adamgreig> i want to make a circuit switched network
<whitequark> uhhhhh
<adamgreig> yes yes "not far"
<whitequark> ice40 cannot do clock recovery
<adamgreig> well
<whitequark> it's pointless to do 8b10b for the most part
<adamgreig> I was hoping to use ddr on the gpio, ice40 at 100MHz, and the data clock is less
<whitequark> I mean, you've run at least two pairs to each port, right
<adamgreig> so you can oversample
<adamgreig> yea, there's a tx and an rx pair on each port
<whitequark> that sounds like it'll fail but I'm curious.
<swetland> could do a dedicated diffpair clock and one or more data lanes then, no?
<swetland> similar to MIPI CSI
<adamgreig> (and power on the other pairs)
<whitequark> yeah that's what I would do
<whitequark> clock pair
<whitequark> could still do half-duplex I guess
<adamgreig> not easily; the ice40 lvds is hard wired to tx or rx
<whitequark> oh right
<whitequark> ok
<adamgreig> you don't reckon you could do cdr with 4x oversampling?
<adamgreig> not looking to push bandwidth or distance records here really
<adamgreig> already not going to have equalisation and the diff voltage is small too
<whitequark> i mean... with that level of oversampling, you could run, like, uart
<adamgreig> well sure :p
<whitequark> what's the point in 8b10b if you're not actually doing clock recovery?
<whitequark> is it capacitively coupled even?
<adamgreig> guess it still gives you some sort of framing
<adamgreig> no, dc
<whitequark> so
<whitequark> you don't need dc balance
<whitequark> you don't need guaranteed transitions
<whitequark> you just use it as a framing with 20% overhead
<whitequark> this is literally uart but more complex
<adamgreig> you can see how that's appealing, though?
<whitequark> no?
<adamgreig> fun to write an 8b10b enc/dec
<whitequark> that's just a LUT
<adamgreig> hmm
<whitequark> i'd probably put it into a BRAM, even
<adamgreig> well in any event the objective here was strictly to make some fpgas and experiment with connecting them
<adamgreig> so really anything goes
<swetland> did that on ZYBO to drive HDMI. sadly without OSERDES you can't really get the data you need for something like that
<adamgreig> anyway uart also has 20% overhead ;)
<adamgreig> if I'm going to transmit ten bits for each eight data bits, 8b10b seems like it'll be more fun than a start and stop bit
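The two overhead figures being compared here can be worked out directly. This is a minimal sketch, assuming plain 8N1 UART framing (one start bit, eight data bits, one stop bit); the helper name `line_overhead` is made up for illustration.

```python
from fractions import Fraction

def line_overhead(data_bits: int, total_bits: int) -> Fraction:
    """Fraction of transmitted bits that carry no payload."""
    return Fraction(total_bits - data_bits, total_bits)

# 8b10b: every 8-bit byte is sent as a 10-bit symbol.
overhead_8b10b = line_overhead(8, 10)

# UART with assumed 8N1 framing: start bit + 8 data bits + stop bit.
overhead_uart = line_overhead(8, 1 + 8 + 1)

print(overhead_8b10b, overhead_uart)  # prints: 1/5 1/5
```

Both schemes spend 2 of every 10 line bits on framing, so on raw efficiency they are tied; the difference is in what those extra bits buy and how much logic it takes to produce them.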
<whitequark> uart gives you higher clock rate
<whitequark> and less device utilization
<whitequark> with everything else being equal
<swetland> I think the only advantage to a symbol based system is that if you plug together two sides where one is constantly chattering you might avoid character mis-alignment
<whitequark> indeed
<adamgreig> honestly I'd do it just because I've implemented uarts in fpgas before
<adamgreig> step one is the ethernet side anyway
<swetland> if you want to run 100Mbps over a reasonable distance, ethernet PHYs are about $1, and RJ45 + Magnetics are about $4 (qty 1), and RMII is a 2bit/clk 50MHz interface, very easy to talk to with FPGAs
<adamgreig> totally, I already have ethernet on this for "uplink"
<adamgreig> the objective for the other side is having a synchronised system clock and circuit switched data though
<adamgreig> which okay you could just send udp packets and maybe even use ptp
<swetland> yeah, there is plenty of knowledge about how to do clock sync
pie___ has joined ##openfpga
pie__ has quit [Ping timeout: 268 seconds]
egg|egg is now known as egg|zzz|egg
azonenberg_work has quit [Ping timeout: 245 seconds]
unixb0y has quit [Ping timeout: 268 seconds]
unixb0y has joined ##openfpga
<whitequark> siiiiigh
<whitequark> so i'm gonna write a techmapper i think.
Miyu has quit [Ping timeout: 272 seconds]
catplant has joined ##openfpga
catplant has quit [Ping timeout: 250 seconds]
rohitksingh_work has joined ##openfpga
Bike has quit [Quit: Lost terminal]
prpplague has joined ##openfpga
<prpplague> anyone know if the details for orconf2019 have been announced?
catplant has joined ##openfpga
catplant has quit [Ping timeout: 250 seconds]
azonenberg_work has joined ##openfpga
emeb has quit [Quit: Leaving.]
<whitequark> daveshah: lmao what the fuck
<whitequark> naive techmapping: 51 LUT
<whitequark> naive techmapping followed by my opt_lut: 18 LUT
<whitequark> abc: ............. 17 LUT
<whitequark> this isn't even in C, this mostly just uses Yosys techmap pass...
azonenberg_work has quit [Ping timeout: 250 seconds]
<swetland> ooh, I need to give this a try. yosys is using 60% more LUTs than icecube2
jevinskie has joined ##openfpga
<whitequark> swetland: grab my other PR
<whitequark> and try doing synth_ice40 -relut
<swetland> 717?
<whitequark> 717?
<whitequark> oh yeah
<whitequark> that one
jevinski_ has quit [Ping timeout: 268 seconds]
_whitelogger has joined ##openfpga
jevinski_ has joined ##openfpga
<swetland> ERROR: timing analysis failed due to presence of combinatorial loops, incomplete specification of timing ports, etc.
genii has joined ##openfpga
<swetland> w/ tot+717 (vs tot which works without complaint)
<whitequark> interesting
<whitequark> can you try to reduce the design?
jevinskie has quit [Ping timeout: 250 seconds]
<whitequark> or, can you post it in the issue? yosys json or something like that
<swetland> I can toss the json up right now and can poke at it a bit later and see if I can find a smaller failure case
<whitequark> sure, that works
<swetland> actually is the json (output from yosys) useful here?
<whitequark> I think so yeah
<swetland> interesting. only fails if I infer this 256x16b ram instead of invoking SB_RAM40_4K manually.
pie___ has quit [Quit: Leaving]
<whitequark> interesting
<whitequark> if you instantiate, does the design work?
<swetland> provided I don't use -relut it does work
<swetland> with -relut nextpnr fails
<swetland> without -relut both inferred and instantiated version of the design works. with -relut instantiated version will not pass nextpnr, but inferred version does and also works
<whitequark> what fails exactly?
<whitequark> timing?
<whitequark> wait
<whitequark> with -relut instantiated version will not pass
<whitequark> nextpnr, but inferred version does and also works
<whitequark> I'm confused
<whitequark> didn't you just say the opposite of this?..
<swetland> sorry, I may have misspoken. if I infer the ram, nextpnr succeeds whether or not I used -relut with yosys synth_ice40 and both resulting bitfiles work
<swetland> if I instantiate the ram nextpnr only succeeds if I do not use -relut, and the resulting bitfile works
<whitequark> the plain one has *more* LUTs? that seems like an obvious bug
genii has quit [Remote host closed the connection]
<swetland> plain.asc is the design built without using -relut. relut.asc is the design built with -relut
<whitequark> er
<whitequark> I meant
<whitequark> relut has *more* LUTs in your paste
<whitequark> relut: 1248, plain: 1220
<swetland> that the relut version *also* has an additional bram is even weirder
<whitequark> huh
ZipCPU|Laptop has joined ##openfpga
rohitksingh_work has quit [Read error: No route to host]
rohitksingh_work has joined ##openfpga
azonenberg_work has joined ##openfpga
rofl__ has quit [Read error: Connection reset by peer]
rohitksingh_work has quit [Read error: Connection reset by peer]
rohitksingh_work has joined ##openfpga
jcarpenter2 has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 240 seconds]
rohitksingh_work has joined ##openfpga
f003brv has joined ##openfpga
<f003brv> Hello friends
catplant has joined ##openfpga
<f003brv> hi catplant
f003brv has quit [Quit: Page closed]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<daveshah> whitequark: Awesome
<daveshah> Seems my second observation about abc being like vpr was right...
<daveshah> I guess getting timing up is the next challenge. One way to approach that might be trying to balance critical path length when merging LUTs
catplant has quit [Ping timeout: 272 seconds]
<tnt> Are there any existing ways to feed a pnr result back into synthesis, to guide a second pass so it knows where to optimize better?
<daveshah> No, not yet
<daveshah> You could go all the way back through icebox_vlog
<daveshah> But that's probably going to make things a lot worse
mumptai has joined ##openfpga
GuzTech has joined ##openfpga
catplant has joined ##openfpga
<tnt> whitequark: mmm, I get what(): Assertion failure: cout_port.net != nullptr (/tmp/ice40/nextpnr/ice40/chains.cc:92)
<tnt> (with your -relut option)
<daveshah> Sounds like there might be a dangling carry somewhere
<daveshah> Surprised that Yosys' own optimisations haven't dealt with that
<tnt> Mmm, I don't actually see anything wrong in the .json
<tnt> I also have your carry chain pull request in that tree.
<daveshah> tnt: Maybe try without that PR?
<tnt> yeah, it works without that PR.
<daveshah> Can you post the JSON somewhere?
<tnt> nextpnr-ice40 --up5k --package sg48 --json top.json --pcf top-icebreaker.pcf --asc top.asc --freq 60 --opt-timing -
catplant has quit [Quit: WeeChat 2.2]
catplant has joined ##openfpga
<tnt> Mmm, the DFFs have a different control set for the different bits of the alu result.
<daveshah> That should still be dealt with
<daveshah> looking now
<daveshah> I think I've pushed a fix
<daveshah> can you test that it actually functions?
jevinski_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<tnt> daveshah: yeah, seems to be working fine.
<tnt> tx !
<daveshah> no problem
<tnt> On another design I also get the comb loop issue raised above (with -relut).
<daveshah> Can you post that netlist too?
<tnt> Sure done. Same location, I overwrote the files.
<daveshah> cheers
<daveshah> tnt: The problem seems to be that `top.$abc$4480$n629` is undriven
<daveshah> Running `setundef -undriven -zero` on the netlist does cause nextpnr to accept it
<daveshah> I suspect that an undriven wire feeding logic is a problem
<tnt> of course with such explicit names, I directly know where that is in my design :p
<tnt> ok, yeah, I see what that signal is. Some pretty big comb path with adders and muxes ... so exactly what relut should modify.
mumptai has quit [Remote host closed the connection]
jevinskie has joined ##openfpga
catplant has quit [Quit: aaaaaaa]
<whitequark> daveshah: hm, any idea what should I do in opt_lut to fix that?
<whitequark> call setundef -undriven -zero? something else?
<tnt> Minimal test case : https://pastebin.com/wxvw9VCN
<daveshah> whitequark: It depends where they are coming from
<daveshah> If they are genuinely floating, then you should modify the LUT function to remove that input
<whitequark> mmmm, okay
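What "modify the LUT function to remove that input" could look like on a truth table is sketched below. This is not opt_lut's code; it assumes the table is stored as an integer whose bit i is the output for input pattern i (input 0 being the least significant bit), and the names `drop_input` and `depends_on` are hypothetical. Tying the floating input to 0 is an arbitrary choice that mirrors `setundef -undriven -zero`.

```python
def drop_input(init: int, width: int, index: int, value: int = 0) -> int:
    """Truth table of the (width-1)-input LUT obtained by tying input
    `index` of a `width`-input LUT to the constant `value`."""
    new_init = 0
    for pattern in range(1 << (width - 1)):
        low = pattern & ((1 << index) - 1)   # inputs below the removed one
        high = pattern >> index              # inputs above the removed one
        full = low | (value << index) | (high << (index + 1))
        if (init >> full) & 1:
            new_init |= 1 << pattern
    return new_init

def depends_on(init: int, width: int, index: int) -> bool:
    """True if the LUT output can change when input `index` changes."""
    return drop_input(init, width, index, 0) != drop_input(init, width, index, 1)

# 3-input LUT computing I0 ^ I2 and ignoring I1 (init 0x5A).
xor_ignoring_i1 = 0x5A
print(depends_on(xor_ignoring_i1, 3, 1))           # False: I1 is a useless input
print(bin(drop_input(xor_ignoring_i1, 3, 1, 0)))   # 0b110: a plain 2-input XOR
```

The same check also covers the "useless inputs" case: an input the function does not depend on can be dropped with either constant and the result is identical.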
<q3k> whitequark: does the abc-less boneless tech mapping flow then run (in yosys?) any logic minimization step?
<q3k> i'm not even sure what kind of work does 'opt' do...
<daveshah> afaik opt is a mix of more advanced coarse-grain optimisations, and some generic stuff like trimming dead logic or merging equivalent stuff
<daveshah> I don't think opt does any real low-level logic optimisation though
<q3k> that's what abc did, right?
<whitequark> q3k: what is "logic minimization"
<whitequark> exactly
<whitequark> q3k: like removing redundant LUTs?
<q3k> whitequark: no, something like espresso
<tnt> more minimal ... https://pastebin.com/xeVwqGri
<q3k> whitequark: or you know, karnaugh maps if you did that manually :)
<whitequark> oh!
<q3k> i'm not sure it makes sense to run that per-lut (especially on narrow 4luts)
<whitequark> yes, probably not per lut
<tnt> per-lut ... at best you'd find useless inputs.
<q3k> yeah
<whitequark> might still be valuable
<whitequark> but not very generic
<tnt> I'm not sure how a karnaugh map helps to map a N input comb function to a minimal amount of LUT4 (and then ... what do you consider minimal, depth ? or total # of luts)
catplant has joined ##openfpga
<whitequark> tnt: can you give me the json that needs setundef?
<daveshah> whitequark: Couldn't resist experimenting with the topological ordering idea
<daveshah> This now gives identical results to ABC on the big and case
<daveshah> going to see how it affects larger designs
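The chat does not spell out what the ordering is applied to. A minimal sketch, assuming it means visiting LUT cells so that every driver is handled before the cells it feeds; the cell names and the `drivers` map are invented for illustration, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical netlist: each cell maps to the set of cells driving its inputs.
drivers = {
    "and0": set(),
    "and1": set(),
    "and2": {"and0", "and1"},
    "out": {"and2"},
}

# static_order() yields every cell after all of its drivers.
order = list(TopologicalSorter(drivers).static_order())
print(order)
```

A merge pass that walks `order` sees a cell only after everything upstream of it has already been considered, which is one plausible reason a tree of gates would collapse in a single sweep.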
<whitequark> daveshah: what the hell, nice
<whitequark> I was just opening my editor...
<tnt> whitequark: I posted https://pastebin.com/xeVwqGri
<tnt> whitequark: it's the verilog source that creates the issue
<tnt> (well ... a minimal test case)
<whitequark> tnt: oh thanks!
<daveshah> Doesn't help boneless much sadly
<whitequark> daveshah: oh it's okay, boneless has a real awful alu i think
<whitequark> i mean
<whitequark> this whole thing grew out of me trying to make a less bad alu for boneless
<whitequark> and discovering that yosys generates absurdly bad output for it
<whitequark> and fixing that
<daveshah> boneless is down to 713 vs 745
<daveshah> without abc
<whitequark> that's actually pretty good
<whitequark> that's approaching abc quality, which is 669
<daveshah> 482 for me?
<whitequark> oh, LUTs
<whitequark> not total cells
<daveshah> yeah
<whitequark> ok sure
<whitequark> still a nice improvement
<whitequark> what about -abc -relut?
<daveshah> gives me 463 LUTs
<daveshah> with the topological ordering, it seems to converge (in the noabc case) after two runs of -relut
<daveshah> don't know if that is different to before
<whitequark> oh, that's a bug i'm about to fix
<whitequark> it should converge immediately
<tnt> Damn, the default yosys output for that minimal example is really bad ... I mean, there are 3 LUT-1 following each other ...
<daveshah> picorv32 does pretty well without abc. 1953 LUTs without compared to 1538 LUTs with (so only about 27% overhead)
* daveshah eats hat....
rohitksingh_work has quit [Ping timeout: 268 seconds]
rohitksingh_work has joined ##openfpga
<daveshah> but Fmax is 16MHz compared to 56MHz with abc
<whitequark> yes, I've noticed that Fmax gets pretty bad
<whitequark> there should probably be some kind of K-map based (?) logic rebalancing (?)
<daveshah> Yes, it's definitely the rebalancing that's the issue
<whitequark> I mean, that could probably be done naively, even
<daveshah> This might be as simple as a heuristic when merging LUTs to start with
<whitequark> oh, yeah!
<daveshah> Just try and merge the one with the larger path length
<whitequark> bleh, probably need to base gate2lut PR on opt_lut PR...
<whitequark> kind of messy
<whitequark> or, hm
<whitequark> hmmmm
m4ssi has joined ##openfpga
<whitequark> daveshah: take a look at what i just pushed
<daveshah> yeap
<whitequark> converges immediately now?
<whitequark> or did i miss something?
<whitequark> seems to converge right away here
<daveshah> Yes, looks good
<daveshah> I think that should always converge fine now
<whitequark> let me add some stats to opt_lut while I'm at it.
<whitequark> oh, this fails a test...
<whitequark> ah I think this is the same issue tnt hits
<whitequark> daveshah: ok, figured the cause i think
<whitequark> daveshah: Found top.$abc$163$auto$blifparse.cc:492:parse_blif$187 (cell A) feeding top.$auto$alumacc.cc:474:replace_alu$19.slice[2].adder (cell B).
<whitequark> Cell A is a 1-LUT. Cell B is a 3-LUT. Cells share 0 input(s) and can be merged into one 3-LUT.
<whitequark> Not combining LUTs into cell A (cell B has attribute \lut_keep).
<whitequark> Combining LUTs into cell B.
<whitequark> Connecting input 0 as \d [2].
<whitequark> Leaving input 1 as \c [2].
<whitequark> Leaving input 2 as $abc$163$n52.
<whitequark> Leaving input 3 as $auto$alumacc.cc:474:replace_alu$19.C [2].
<whitequark> this is... an off by one of some sort?
<whitequark> ok I think I see
jevinskie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<whitequark> tnt: can you recheck?
<whitequark> I think I fixed all the bugs you've hit
<tnt> whitequark: sure
<tnt> whitequark: seems to work :) builds and the bitstream appear to operate properly on the device.
<whitequark> wonderful :D
<whitequark> tnt: what about timing? how bad is it?
<tnt> It really didn't change anything wrt timing.
<tnt> I mean on that particular design it only combined 4 LUTs out of 260.
<whitequark> ah ok
<tnt> I tried another where it combined a bit more LUTs but they were not in the critical path either.
<tnt> whitequark: you only consider LUT -> LUT connections where there is only 1 user of the signal ?
<whitequark> tnt: yes
<whitequark> it might make sense to consider more than that, e.g. 1-LUTs can *always* be folded
<whitequark> yeah, definitely, that would be a significant improvement
<tnt> yeah, I was looking at a couple of netlists and I saw plenty of cases of 1 or 2 luts feeding other 2/3 luts ...
<tnt> the original one has to be kept because sometimes the signal goes elsewhere that can't be folded, but that would still be cutting the path for the other signals, at the expense of a higher fanout ...
<whitequark> right
<whitequark> hm it might make sense to do that as a part of a more general pass...
<cr1901_modern> How can you merge a 1-LUT and a 3-LUT into a 3-LUT when none of the inputs are shared?
<whitequark> cr1901_modern: no, it's a different case
<whitequark> it's a case of 1-LUT feeding a 3-LUT and something else
<whitequark> merging 1-LUT into this 3-LUT *and* keeping the original 1-LUT trades fanout for logic levels
<whitequark> this should be almost always advantageous
<whitequark> gonna try that soon
s_frit has joined ##openfpga
<daveshah> whitequark: Small issue with the LUT merging stuff
<daveshah> If a CARRY input is 1'b0, then the corresponding LUT input needs to stay 1'b0 too
rohitksingh_work has quit [Read error: Connection reset by peer]
<daveshah> it seems this is not being preserved and creates a monstrous carry chain full of legalisation LCs which breaks nextpnr on picorv32
<whitequark> lol
<cr1901_modern> I understand the fanout decreases if it's merged, but what do you mean by "trades fanout for logic levels"?
<whitequark> can you reduce a testcase?
<daveshah> sure
<whitequark> cr1901_modern: fanout *increases*
<cr1901_modern> How does it increase? 1-LUT is no longer driving the 3-LUT if it's merged.
<cr1901_modern> Oh, whatever was driving the 1-LUT has its fanout increase tho...
<whitequark> yes.
<whitequark> actually
<whitequark> in case of 1-LUT that doesn't increase the fanout at all
<whitequark> it just moves things around
* cr1901_modern nods
<whitequark> daveshah: can you rebase your branch btw?
<whitequark> I refactored opt_lut a bit, to use a worker
<cr1901_modern> so what did you mean by the "logic levels" part then?
<whitequark> this should go nicely with timing reports... once proc learns to not assign some dumbass internal names
<whitequark> that is on my shortlist
<whitequark> i want to have ZERO $fuckyou$ names in the reports.
<tnt> cr1901_modern: well imagine sig_in -> LUT1 -> LUT3 -> sig_out .. if you merge the LUT1 function into the LUT3 you get sig_in -> LUT3 -> sig_out (and in parallel you may still have sig_in -> LUT1 -> other places that signal went).
<tnt> cr1901_modern: you reduced the depth of the path from sig_in to sig_out but you increased the sig_in fanout.
* cr1901_modern nods
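The fold tnt describes can be sketched as a truth-table composition. This is an illustration under stated assumptions, not the opt_lut implementation: bit i of a table is the output for input pattern i, the two LUTs share no inputs, and `lut_eval` and `merge_luts` are hypothetical names.

```python
def lut_eval(init: int, inputs: list) -> int:
    """Evaluate a LUT; inputs[0] is the least significant select bit."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> index) & 1

def merge_luts(init_a: int, width_a: int, init_b: int, width_b: int, port: int) -> int:
    """Fold LUT A into LUT B, where A's output drives input `port` of B.
    The merged LUT takes B's remaining inputs first, then A's inputs."""
    rest = width_b - 1
    merged = 0
    for pattern in range(1 << (rest + width_a)):
        bits = [(pattern >> i) & 1 for i in range(rest + width_a)]
        b_other, a_in = bits[:rest], bits[rest:]
        a_out = lut_eval(init_a, a_in)
        # Rebuild B's input vector with A's output back in its slot.
        b_in = b_other[:port] + [a_out] + b_other[port:]
        if lut_eval(init_b, b_in):
            merged |= 1 << pattern
    return merged

# A: 1-LUT inverter (init 0b01). B: 3-input AND (init 0x80). A feeds B's input 0.
merged = merge_luts(0b01, 1, 0x80, 3, 0)
print(hex(merged))  # 0x8: still a 3-LUT, now computing b1 & b2 & ~a0
```

The result has width `width_b - 1 + width_a`, so a 1-LUT always fits back into the LUT it feeds, while a 2-LUT feeding a 3-LUT needs all four inputs of a LUT4. If the original 1-LUT has other consumers it stays in the netlist alongside the merged cell.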
<whitequark> tnt: however you decreased LUT1 fanout
<whitequark> so in this case it's even
<whitequark> now, if you are merging LUT2 to LUT3, it is not as clear cut
<tnt> sure ... but the delay on the net depends on the fanout of that net, not the total fanout of the whole fpga.
<tnt> so propagation time for sig_in are a bit worse.
<tnt> (tbh, I'm not sure if that works like that on the ice40, I'm just basing that on my experience of xilinx where net driving lots of loads are slower)
<sorear> whitequark: it completely destroys buffer trees though
<daveshah> The I2(1'b0) should be preserved so the carry and LUT can be packed
<daveshah> will sort out rebase in a bit
<whitequark> sorear: can you elaborate?
<whitequark> daveshah: so... the constraint here is that I2 must be the same as I2.
<whitequark> er.
<whitequark> lut_i.I2 must be the same as carry_i.I1.
<sorear> *finds keyboard*
<sorear> let's say you have a signal with a fanout of 256. maybe a clock or a reset
<daveshah> whitequark: yes, ditto with I1 and I0
<sorear> an electrical fanout of 256 will be *extremely slow* because you have far more capacitance on the output than a gate is designed to drive
<sorear> but if you turn it into a 4-level tree of inverters, each with a fanout of 4, you have a faster circuit
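The numbers in sorear's example check out with a quick count of how many buffer levels are needed when each stage is limited to a given fanout. A minimal sketch; `tree_levels` is a made-up helper and it ignores wire delay entirely.

```python
def tree_levels(loads: int, fanout: int) -> int:
    """Smallest number of buffer levels so that fanout**levels >= loads."""
    levels = 0
    reach = 1  # loads reachable from the root with the current depth
    while reach < loads:
        reach *= fanout
        levels += 1
    return levels

print(tree_levels(256, 4))  # 4 levels, since 4**4 == 256
```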
<whitequark> daveshah: ooooh, so effectively... the inputs bound to SB_CARRY should not be considered "free" like normal constant inputs
<whitequark> and should not be used for reencoding
<whitequark> that's definitely doable
<daveshah> Yes
<sorear> of course multipass optimizers quite frequently do "pessimize something in pass A that you know pass B will clean up", and it probably makes more sense to do this kind of selective duplication after logic optimization (possibly even combined with placement)
<daveshah> Probably best as an attr on SB_CARRY
<whitequark> daveshah: are you sure?
<sorear> so I'm not saying abc/relut would be *wrong* to do this, merely that it's not *prima facie optimal*
<whitequark> daveshah: oh hm, is this because SB_CARRY can be optimized out?
<whitequark> sorear: but modern FPGAs have routing buffers instead of routing pass transistors
<whitequark> so in effect you have buffer trees whether you want it or not, no?
<sorear> yes, but I was on a terrible phone keyboard and thought "buffer tree" sufficiently implied "ASIC"
<whitequark> oh!
<whitequark> I have no idea about anything related to ASICs
<whitequark> and besides
<whitequark> opt_lut is not intended for ASIC flow?
<whitequark> in fact, *abc* is probably good for ASIC flow, it does area optimizations and stuff
<whitequark> I mean, I assume it is good for at least something. definitely not FPGA flow.
<tnt> lol
<sorear> pretty sure I've heard boomcpu complain about critical paths and net naming
<whitequark> there are 2 kinds of people: those who complain about critical paths and net naming, and those who suffer silently.
<whitequark> daveshah: now that I think about it... might be a better idea to ditch attributes entirely
<whitequark> and have something like...
<whitequark> -dlogic SB_CARRY:1=I0:2=I1:3=CI
<whitequark> daveshah: this could help ecp5 too, maybe?
Miyu has joined ##openfpga
scrts has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Ping timeout: 250 seconds]
<daveshah> whitequark: looks good
<daveshah> The problem with ECP5 is that the CCU2C carry primitive is 2 LUT4s with the output XORd with carry for the sum output and 2 LUT2s sharing inits with the bottom of the LUT4s plus some add and ors to generate carry
<daveshah> It's a pretty tricky one to optimise or even split
<whitequark> ohhhh I see
<daveshah> if you are curious
<daveshah> I think ABC might have some support for this, but being ABC the documentation for that sort of thing is just two words "fuck off"
<whitequark> lol
<whitequark> i feel like this is a good job for an SMT solver?
<daveshah> Yes, probably
<whitequark> anyway, halvarflake is reading some papers on my behalf
<daveshah> nice
<whitequark> apparently, there is some obscure connection between equation system solvers on GF(2) and Quine/McCluskey algorithm
<whitequark> they are equivalent or something??
<whitequark> and halvarflake's MSc was on the former...
genii has joined ##openfpga
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
<daveshah> whitequark: rebased commit, hopefully didn't break anything in a somewhat messy merge
<whitequark> ok, I added dlogic recognition
<whitequark> now just need to wire it to avoid disturbing those
<whitequark> ok, I *think* I'm done.
<whitequark> daveshah: oh holy shit
<whitequark> this *really* improves timing *dramatically*
<whitequark> like by 10 MHz
<daveshah> sweeet
<daveshah> I guess the timing problems before might have been excessive feedthroughs being inserted
<whitequark> yeah
<whitequark> let me check with -noabc too
<daveshah> The Yosys/nextpnr changes over the last month must mean we are close to a 30-40% improvement in timing overall by now
<whitequark> that's a lot
<whitequark> this would make UP5K Glasgow actually usable :D
<daveshah> next big step will be vpr-style criticality driven placement
<daveshah> I might play with that now in fact
<daveshah> I'm not sure if that will actually lead to an overall improvement in performance, or just make the opt-timing pass redundant
ZipCPU|Laptop has quit [Ping timeout: 245 seconds]
<daveshah> The other thing I want to try is swapping macros, at the moment I think the inability to perform swaps after constraint legalisation limits Fmax with carrys
<daveshah> without macro swapping support, LUTCascade will probably cause a step back in QoR too
<whitequark> daveshah: ah no, I misread the report
<daveshah> :(
<whitequark> doesn't seem to lead to that much of an improvement in timing, sadly
<daveshah> definitely not the first time I did that
<whitequark> ok
<daveshah> once I remember thinking that I had like a 30% increase in Fmax
<daveshah> turns out I was comparing hx8k against lp8k
<tnt> whitequark: is it on your repo already ?
<tnt> daveshah: lol
<whitequark> lol
<sorear> improve timing 30% with this one weird trick
<whitequark> daveshah: can you check if this actually works as intended?
<whitequark> just pushed
<daveshah> sure
GuzTech has quit [Quit: Leaving]
<whitequark> daveshah: I looked at your MCVE and it looks like there's no actual change if I run opt_lut on it at all?
<whitequark> I mean
<whitequark> it has one LUT
<whitequark> opt_lut would not change it...
<daveshah> It should have two LUTs
<daveshah> opt_lut was previously illegally merging those two
<whitequark> oh, `a+b`
<whitequark> oh sorry
<daveshah> yeah
<whitequark> let me recheck
<daveshah> 2 LUT4s, looks good
<whitequark> hm, the log is a bit confusing
<whitequark> let me tweak it a bit
<daveshah> picorv32 example seems to work fine too now
<daveshah> :)
<whitequark> :D :D
<whitequark> so, what changed? fmax before/after? lc before/after?
<whitequark> is this -noabc or?
<daveshah> No -noabc
<daveshah> But a big jump in timing
<daveshah> from 67MHz average without -relut to 72MHz with
<whitequark> ooooh wow
<daveshah> let me test on a soc design to make sure it still works on hardware
<whitequark> I test on hardware periodically, seems to work still
<daveshah> cool
<daveshah> just want to test it together with my nextpnr carry changes
<daveshah> That example that's at 72MHz now was pretty much stuck around 52MHz for a long time
<whitequark> yeah, definitely curious
<whitequark> oh wow
<daveshah> like until a few weeks ago
<daveshah> I don't think I even have min_ce_use in there, so it can probably get even better
<daveshah> But I know opt-timing and the nextpnr carry changes each added about 10%
<whitequark> what is opt-timing?
<daveshah> It's a post-placement pass that uses a fairly odd algorithm to minimise the critical path
<daveshah> basically, a BFS of neighbour bels of critical path bels
<daveshah> hardware test is working (design is a picorv32 soc, qspi controller, and CSI-2 interface if you are curious)
<daveshah> that design is now getting 24MHz on an ultraplus
<daveshah> which is pretty good
<whitequark> Cell A is a 3-LUT with 3 dedicated connections. Cell B is a 2-LUT.
<whitequark> Cells share 0 input(s) and can be merged into one 4-LUT.
<whitequark> Not combining LUTs into cell B (combined LUT wider than cell B).
<whitequark> Combining LUTs into cell A.
<daveshah> oops, forgot to add relut to the syn script for that hardware test
<daveshah> let me actually check again
<daveshah> yep, still works
<whitequark> :D :D
<whitequark> any change in fmax?
<daveshah> dropped to 22MHz
<whitequark> or is it just size?
<whitequark> huh
<whitequark> average?
<daveshah> this is one run
<daveshah> unlike the previous test
<daveshah> let me run some proper 16-run comparisons on this design too
<daveshah> size drops from 3371 LCs to 3311 LCs
dingbat has quit [Quit: Updating details, brb]
dingwat has joined ##openfpga
<tnt> I tried it on a couple of designs here (over 10 runs each). Doesn't seem to affect F_avg / F_max (it's within the noise ... < 1 MHz variation on a 70 MHz design)
dingwat has quit [Client Quit]
dingwat has joined ##openfpga
<daveshah> I dare say, this is where a Threadripper was a good buy :P
<tnt> ~ 5 % less LUTs
<whitequark> I'm guessing the critical path is some sort of long carry chain
<daveshah> difference is in the noise here too
<daveshah> with relut: min = 23.45 MHz, avg = 25.30 MHz, max = 27.32 MHz
<daveshah> without relut: min = 24.22 MHz, avg = 25.35 MHz, max = 27.05 MHz
<daveshah> let me check with min_ce_use too
<daveshah> whitequark: certainly a big part of it
<daveshah> There's some disturbingly long arcs in there too like (14,23) -> (23,16)
<whitequark> yeah
<daveshah> This is hopefully improvable with a better placer
<whitequark> hm, going to merge your topological ordering stuff now
<daveshah> thanks
<whitequark> daveshah: that... actually pessimizes boneless.
<whitequark> by 9 LUTs
<whitequark> let me push into a branch...
<whitequark> daveshah: pull from opt_lut_topo_noabc
<daveshah> Maybe it is not the best way forward
<whitequark> think you can take a look at the reason?
<daveshah> sure
<daveshah> Think it might have been a merge issue
<whitequark> oh?
<daveshah> Accidentally left in a line of old code
<daveshah> if (lutA_output_ports.size() != 2)
<daveshah> continue;
<daveshah> before the loop that iterates over ports
<daveshah> but now I'm getting a param-related assert fail
<daveshah> almost as if there are LUTs without init
<whitequark> hm, odd
<daveshah> actually, looks like that if statement should be there
<whitequark> yes
<daveshah> I can't see any other problem
<whitequark> I think it's required rn
<whitequark> hm, ok
<daveshah> I fear that topological ordering is just not always optimal
<whitequark> I'm going to try and massage LUTs into a form that opt_merge can deal with
<daveshah> I was thinking too much about the specific tree-of-gates case
<daveshah> now that the carry issue is fixed, the small picorv32 test is doing much better with noabc btw - from 19MHz up to 45MHz based on one run
<daveshah> didn't have a frequency constraint, oops
<daveshah> running 8 runs with --freq 50 and --opt-timing gives
<daveshah> min = 51.77 MHz, avg = 54.53875 MHz, max = 57.28 MHz
<daveshah> not bad at all
<whitequark> wow
<whitequark> so... -noabc picorv32 now is the same as abc picorv32 1 month ago?
<daveshah> yeah
<whitequark> niiiiice
<tnt> daveshah: that's not on a up5k is it ?
<daveshah> no, hx8k
<whitequark> hell no
<tnt> :)
<whitequark> up5k can barely run a vga sync gen at 50 MHz
<q3k> i mean, you gotta pipeline the fuck out of it
<whitequark> oh?
<q3k> also maybe it's a bit better now, i haven't tried in a while
<whitequark> out of what
<whitequark> syncgen?
<q3k> yes
<whitequark> picorv32?
<tnt> most of my 5k designs are > 60 MHz so far ... I ran a MIPI-DSI display at 80 MHz.
<whitequark> it's... just counters?
<q3k> vga on up5k
<q3k> yep
<whitequark> how do you pipeline counters like that
<q3k> i mean, for me the longest chain was counter -> comparison -> pixel_{x,y}
<whitequark> oh i don't do comparisons
<q3k> so if you just register counter -> comparison it's much easier
<whitequark> or rather
<whitequark> yes
<q3k> (where comparison was == 0 iirc)
<whitequark> that's what i'm already doing
<whitequark> i get like 60 mhz but barely
<q3k> but yeah, the counter chain for vga barely fit
<q3k> yes
<daveshah> I would do a comparison with the vendor tools at this point
<daveshah> But my icecube license expired and I still haven't heard from them after requesting a new one yesterday
<daveshah> So the vendor tools are disqualified and get 0MHz
<q3k> heh
<whitequark> lmao
<daveshah> ∞% better
<tnt> daveshah: lol
<tnt> But I was thinking of running a couple of designs through icecube to compare because it must be getting pretty close by now.
<whitequark> daveshah: so... can abc do what opt_lut -dlogic does?
<whitequark> like, understand the constraints added by SB_CARRY
<daveshah> I think it theoretically can
<whitequark> hmm
<daveshah> Or at least second-hand rumours say it can
<whitequark> but no one knows how to make it do that? :D
<daveshah> Including understanding the logic of SB_CARRY
<daveshah> yeah, basically
<whitequark> hm
<sorear> how ice40-specific is the new pass?
<whitequark> sorear: not at all.
<whitequark> it supports any hard logic attached to LUT inputs.
<daveshah> The only ice40-specific thing is assuming all LUTs are the same size
<whitequark> daveshah: it does not assume that!
<daveshah> For ECP5 and Xilinx you really want some way of mapping larger LUTs with an increasing cost
<whitequark> I spent a lot of time making sure it would e.g. pack into a wider LUT if it's possible.
<whitequark> what it does *not* do is combine to larger LUTs
<daveshah> yep
<whitequark> but if you *already* mapped to larger LUTs it will optimize those.
rohitksingh has joined ##openfpga
<whitequark> daveshah: the ecp5 cells_sim.v is so weird
<whitequark> strangely out of order and weird verilog
<whitequark> hm, nevermind
<daveshah> The out of order was mostly because mithro also wanted split files for SymbiFlow stuff
<daveshah> And I think I started it split then combined it
<daveshah> what verilog is weird?
<sorear> so you map to larger LUTs, instead of mapping to LUTs, PFUMUXs, and L6MUXs?
<sorear> s:2nd/LUTs/LUT4s/
<daveshah> More precisely, we tell ABC to map to larger LUTs then split to small LUTs and muxes with a techmap rule
<daveshah> This is because the documented subset of ABC doesn't map muxes directly
<daveshah> I understand this is definitely possible
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fp1sF
<_whitenotifier> [whitequark/Glasgow] whitequark b4deab5 - arch.boneless: fix typo.
<whitequark> "the documented subset of ABC"...
<mithro> daveshah: we use the ABC map to LUT8s and then a techmap to split into LUT6s + F7MUX + F8MUX on Xilinx
<daveshah> yes, it's the same on the ECP5 just LUT4..LUT7 instead of LUT6..LUT8
<daveshah> In fact I think that's where I based my implementation on
emeb has joined ##openfpga
<mithro> Anyone know much about LTO and soft-float? I'm getting "undefined reference to `__divsi3'" when building with LTO
<cr1901_modern> mithro: I would assume the two are in fact unrelated and that __divsi3 isn't provided by compiler_builtins
<cr1901_modern> but LTO is creating an opt that uses it
<sorear> __divsi3 isn't soft float, it's soft-integer
<sorear> soft float is __divsf3
<sorear> either way it's a symbol from libgcc
<sorear> is -lgcc somehow getting lost from the LTO build?
<mithro> compiler_rt/lib/builtins/divsi3.c seems to provide it?
<sorear> compiler_rt is clang's version of libgcc and provides most of the same stuff
<mithro> sorear: In migen we seem to be linking against that instead of libgcc
<sorear> (i'm not very familiar with LTO)
parport0 has quit [Ping timeout: 272 seconds]
parport0 has joined ##openfpga
<mithro> sorear: From what I can see is that LTO is dropping the __divsf3 symbol because it is "unused" until a later pass which generates the symbol?
Zorix has quit [Ping timeout: 268 seconds]
rohitksingh has quit [Ping timeout: 244 seconds]
<sorear> possible?
<sorear> what toolchain are you using?
<mithro> sorear: gcc
m4ssi has quit [Remote host closed the connection]
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
* shapr hugs mithro for so much awesome
<mithro> shapr: ?
<shapr> mithro: TinyFPGA is the specific awesome of the moment
<mithro> shapr: I didn't actually do the TinyFPGA, that was tinyfpga
<shapr> ok
* tinyfpga hugs shapr and mithro
<shapr> in that case, there are more good reasons for supportive hugs :-)
* shapr hugs tinyfpga
<mithro> sorear: Well, if I install the lm32 toolchain with libgcc and use -lgcc it links....
<mithro> sorear: I wonder if gcc handles libgcc in some special way
<cr1901_modern> lm32 is configured with --disable-libgcc, fwiw
<whitequark> daveshah: LMAO
<whitequark> ok, let me verify because this is absurd
<daveshah> what is happening?
<whitequark> daveshah: ahahaha
<whitequark> so
<whitequark> abc cannot merge two identical LUTs
<whitequark> with inputs in different order.
<daveshah> lol
<whitequark> I've just proven them identical with equiv_simple to be extra sure that I didn't fuck this up
<whitequark> I did not; abc is just that bad
<whitequark> daveshah: even better
<whitequark> if I XOR these cells, so they are *definitely* in the same comb network
<daveshah> yes
<whitequark> it STILL cannot figure out that these are the same LUTs
<whitequark> i mean? what? why are we using this??
<whitequark> oh I see
<whitequark> the cause of this is it doesn't understand what SB_LUT4 is
<whitequark> let me try again
<whitequark> daveshah: nevermind, if I unlut them abc manages to figure it out
<daveshah> that does make more sense
<whitequark> so it's just the techmapping/techunmapping issue
bubble_buster has quit [Ping timeout: 252 seconds]
<whitequark> ok, going to add canonicalization to opt_lut now.
<whitequark> still not sure if there's some more general approach
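The input-order problem above can be illustrated with a small sketch. It is not the code that went into opt_lut; it only shows one possible canonical form, assuming a LUT is described by a list of input net names plus an integer truth table in which bit `i` of the table is the output when input `j` carries bit `j` of `i`. The names `permute_lut` and `canonicalize` are hypothetical.

```python
def permute_lut(init, width, perm):
    """Rewrite a truth table so that new input j is wired to old input perm[j]."""
    out = 0
    for new_idx in range(1 << width):
        # Translate the new input combination back into the old bit positions.
        old_idx = 0
        for j in range(width):
            if (new_idx >> j) & 1:
                old_idx |= 1 << perm[j]
        if (init >> old_idx) & 1:
            out |= 1 << new_idx
    return out


def canonicalize(init, inputs):
    """Sort inputs by net name and permute the truth table to match (assumed canonical form)."""
    width = len(inputs)
    order = sorted(range(width), key=lambda i: inputs[i])
    return permute_lut(init, width, order), [inputs[i] for i in order]


# Two LUTs computing "a AND NOT b" with their inputs swapped.
lut_ab = canonicalize(0b0010, ["a", "b"])
lut_ba = canonicalize(0b0100, ["b", "a"])
```

After canonicalization the two descriptions compare equal, so a purely structural merge pass can treat them as the same cell.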
pointfree has quit [Ping timeout: 264 seconds]
digshadow has quit [Ping timeout: 264 seconds]
jeandet has quit [Ping timeout: 264 seconds]
bubble_buster has joined ##openfpga
pointfree has joined ##openfpga
jeandet has joined ##openfpga
digshadow has joined ##openfpga
ZipCPU|Laptop has joined ##openfpga
f003brv has joined ##openfpga
<sensille> shapr: as you recommended haskell so vehemently i now started to read on it :)
<shapr> sensille: what are your thoughts?
<shapr> sensille: I like to think I really advocate learning one programming language from each of 1. imperative 2. functional 3. logic
<sensille> now i know where Rust's type system comes from :)
<shapr> like, *really* learning
f003brv has quit [Ping timeout: 256 seconds]
<shapr> I once spent three or four months using only prolog, so I'm not sure I'm even following my own advice
<shapr> yeh! lots of rust things come from Haskell
<sensille> but when i read something like 'all (`elem` ['a'..'z']) "Frobozz"' i immediately think: this might be nice, but can this ever perform well?
<shapr> Haskell is surprisingly fast
<sensille> (from the book "real world haskell")
lambdabot has joined ##openfpga
<shapr> > let ones = 1 : ones in take 15 ones
<lambdabot> [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
<shapr> > let fib = 1 : 1 : zipWith (+) fib (tail fib) in take 15 fibs
<whitequark> daveshah: so, i'm entertaining myself right now by repeatedly running
<lambdabot> error:
<lambdabot> • Variable not in scope: fibs :: [a]
<lambdabot> • Perhaps you meant ‘fib’ (line 1)
<shapr> > let fib = 1 : 1 : zipWith (+) fib (tail fib) in take 15 fib
<whitequark> `lut2mux ; abc -lut 4`
<lambdabot> [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610]
<sorear> *blink*
<whitequark> each time i get slightly different result
<sorear> @help
<lambdabot> help <command>. Ask for help for <command>. Try 'list' for all commands
<shapr> sorear: want to learn Haskell? again? ;-)
<whitequark> sometimes it infers more logic
<whitequark> sometimes less
<sensille> i'd like to see a simple loop like for (i=0; i < 10000000; ++i) a += i; written in haskell and yielding a nice result in disassembly
<sorear> @list
<lambdabot> What module? Try @listmodules for some ideas.
<whitequark> @no
<lambdabot> Error: expected a Haskell expression or declaration
<shapr> sensille: tried godbolt?
<shapr> lambdabot: @leave ##openfpga
lambdabot has left ##openfpga [##openfpga]
<shapr> bye now
<shapr> no more offtopic spam from that bot
<miek> i'm having some trouble bringing up a Glasgow revB - `glasgow factory` seems to read back all 0s from the eeprom but `fx2tool` suggests it programmed ok? https://pastebin.com/raw/ZLjxQWce
<sensille> shapr: i just used ghc and looked at some results
<tnt> Is there such things as gearboxes ICs that take 2 * 5G serdes and make a 10G one ?
<whitequark> miek: hm, interesting
<sensille> shapr: but it might be too offtopic for this channel
<whitequark> miek: can you try this firmware? https://cloud.whitequark.org/s/Kzcq5gJP43DRnFq
<shapr> sensille: in general (very broad brush strokes) , naive straightforward Haskell runs in about twice the time of naive straightforward C or C++
<shapr> I'd argue that naive straightforward Haskell takes less than half the human thinking time, compared to C or C++, to implement the same solution.
<shapr> sorear: you have experience on both sides, what do you think?
<sensille> what i really need to understand is copying data vs. manipulating in place
<shapr> I'd really like to see someone solving the Advent of Code puzzles on an FPGA
<shapr> (going back on topic)
<shapr> sensille: want to try #haskell-beginners or just #haskell for this topic?
<miek> whitequark: same results with that firmware
<whitequark> miek: very strange
<sensille> shapr: i haven't even read one third of the book, so definitely -beginners :(
<shapr> works for me
<sensille> s/(/)
<shapr> oh wow, I found such a project! https://github.com/alokmenghrajani/adventofcode2018 using the icestick even
<whitequark> miek: if you re-plug the device, it comes up with Z-99999... serial, right?
<whitequark> or rather
<whitequark> what VID/PID/DID does it have?
<miek> so after `factory` it comes up with Z-9999.., after replugging comes up with 20b7:9db1 but no product/manufacturer/serial
<whitequark> mmm, try this
<whitequark> do `glasgow factory` then `glasgow flash`
<whitequark> daveshah: hey, what do you think about lut cascade?
<whitequark> should this be done on yosys level? nextpnr?
<daveshah> nextpnr
<daveshah> I wrote a half finished attempt at it
<daveshah> Atm it's actually hurting QoR
<daveshah> This is because nextpnr's placer doesn't handle relative constraints that well
<daveshah> It can't swap chains after constraint legalisation
<whitequark> ahhh ok
<whitequark> i was just thinking about what i should do next...
<whitequark> daveshah: any suggestions btw?
<whitequark> miek: any chance you can take a look at the i2c bus?
<whitequark> i have never seen anything like this
<whitequark> wait.
<whitequark> waaaaait.
<daveshah> whitequark: imo looking at timing driven synthesis optimisations would be awesome
<whitequark> writes succeed, but reads come up with 0
<whitequark> miek: your I2C SDA is stuck at 0.
<whitequark> SDA and/or SCL.
<daveshah> At first just critical path based lut merging
<whitequark> daveshah: tell me more
<daveshah> This is not something I really know about, just random thoughts that would be fun to play with
<whitequark> oh ok
<daveshah> But I think the topological ordering could be used to work out path lengths
<daveshah> And that could be used to guide the LUT merger to make decisions based on minimising the critical path
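The idea of using a topological order to work out path lengths can be sketched as follows. This is only an illustration under simplifying assumptions: every cell costs one unit of delay, the netlist is a combinational DAG, and `fanin` is a hypothetical dict mapping each cell to the cells driving its inputs. It is not taken from Yosys.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_depths(fanin):
    """Return the logic depth of every cell, visiting drivers before their sinks."""
    depth = {}
    for cell in TopologicalSorter(fanin).static_order():
        drivers = fanin.get(cell, ())
        # A cell with no drivers has depth 1; otherwise it sits one level past its deepest driver.
        depth[cell] = 1 + max((depth[d] for d in drivers), default=0)
    return depth


# Hypothetical four-cell netlist: a -> b -> c -> d, with a also feeding c.
example = {"b": ["a"], "c": ["a", "b"], "d": ["c"]}
depths = arrival_depths(example)
critical_depth = max(depths.values())
```

A merger guided by this would prefer merges that reduce the depth of cells on the deepest path rather than merges that only save area.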
<whitequark> so what i was thinking about is doing something to `proc` (i think it's proc?)
<whitequark> so that it would actually make sensible names
<whitequark> and not $fuck$you
<daveshah> Yes please
<whitequark> ok
<whitequark> gonna do that next.
<daveshah> Also the alu/macc stuff
<whitequark> yeah i'm going to start with simple logic
<daveshah> Carry chains always have stupid names atm
<whitequark> then alu
<whitequark> then ffs
<whitequark> *everything* has stupid names atm.
<daveshah> Yep
<whitequark> why cannot yosys work out that x | y should be called _x_or_y_ or something
<whitequark> grumble grumble
<daveshah> Where there is no sensible naming, it should be possible to use the src attribute to get a source file and line and use that
<whitequark> i fucking *knew* it in like *2015* that i will end up writing this
<whitequark> and here we are
<whitequark> yes
<daveshah> From memory Yosys other than abc is quite good at tracking src
<whitequark> yes
<whitequark> that i appreciate for sure
<whitequark> but src is used basically nowhere right now
azonenberg_work has quit [Ping timeout: 246 seconds]
<daveshah> Yes, it should be possible to replace almost all autogen names with src for a minimum
<whitequark> daveshah: looked through dumped verilog
<whitequark> looks like with -noabc -relut, every single cell has proper src
<daveshah> Sweet
<TD-Linux> miek, you might have i2c stuck. it goes all the way to the adc and dac chips so it may be a short on any of those
<TD-Linux> oh I see it was already answered
<whitequark> TD-Linux: I think I need to add a detector for that...
<whitequark> query a nonexistent chip
<whitequark> if it ACKs, I2C is stuck.
<TD-Linux> it would be nice, it's an easy failure because i2c goes so many places
<whitequark> ok sure
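A minimal sketch of the detector described above, assuming a hypothetical `probe(address)` callable that addresses a device and returns True when it sees an ACK. Neither `probe` nor the address list comes from the Glasgow code base; the caller has to supply addresses that are known to be unpopulated on the board.

```python
def i2c_bus_looks_stuck(probe, unused_addresses):
    """Report a stuck bus if every address that should be empty still ACKs.

    With SDA held low, the ACK slot reads as 0 regardless of the address,
    so a nonexistent chip appears to answer.
    """
    addresses = list(unused_addresses)
    if not addresses:
        raise ValueError("need at least one address known to be unpopulated")
    return all(probe(address) for address in addresses)


# Stand-ins for real hardware access, for illustration only.
def stuck_bus(address):
    return True  # everything ACKs


def healthy_bus(address):
    return address == 0x50  # only one device is actually present
```

Probing more than one unused address lowers the chance of a false alarm when an unexpected device happens to sit at the first one.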
_whitenotifier has quit [Remote host closed the connection]
<whitequark> TD-Linux: actually
<whitequark> what the hell is happening on that board?
<whitequark> i just tried and if I deliberately add an i2c fault it hangs...
<TD-Linux> it sounds to me kind of what happened when I had scl and sda shorted together
<whitequark> yeah, I tried that too
<daveshah> Missing pullup?
<daveshah> That might make i2c go funky
<TD-Linux> but on mine, it would get halfway through factory, change ids, and then return to original id when unplugged and replugged
<whitequark> oh, that sounds about right!
<whitequark> TD-Linux: hmmmm
<TD-Linux> (er to be clear, never appear as the new id)
<miek> i2c seems to be behaving: https://imgur.com/a/f8LbyVD
<miek> (that's while running `glasgow -vv flash`)
<whitequark> miek: that does not seem normal
<whitequark> those gaps at low
<whitequark> not sure though
<whitequark> can you try and decode it?
<whitequark> miek: i am about 95% sure you have an i2c fault somewhere
<whitequark> double-check pullups
<whitequark> double-check continuity and solder bridges
<miek> ok, will double-check everything and see if i can find an fx2 to decode it
<TD-Linux> solder wick the dacs and adcs
azonenberg_work has joined ##openfpga
<mwk> huh
<mwk> so 7a15t/7a35t/7a30t/7s50 are basically the same shit with different markings?
<daveshah> Yes
<mwk> nice
<daveshah> They are resource count limited
<daveshah> So the entire die is guaranteed working
<whitequark> daveshah:
<whitequark> Info: 1.2 19.8 Source boneless.v:176$893_LC.O
<whitequark> Info: 1.8 21.5 Net s_opB[0] budget 3.783000 ns (13,21) -> (12,22)
<whitequark> Info: Sink boneless.v:269$1058_LC.I0
<whitequark> Info: 1.3 22.8 Source boneless.v:269$1058_LC.O
<whitequark> Info: 2.4 25.2 Net boneless.v:269$14[0] budget 3.783000 ns (12,22) -> (11,24)
<whitequark> Info: Sink boneless.v:269$646$CARRY.I2
* mwk just got a very grateful friend with a 7a35t and a h4xed bitstream
<whitequark> daveshah: ooooooh
<whitequark> so my critical path is a subtractor in ALU
<whitequark> holy shit this is SO USEFUL
<daveshah> Nice
<whitequark> i can't believe no one using yosys spent like 30 minutes writing a pass
<whitequark> what the hell lmao
<tnt> is that the 'dress' stuff ? or something else ?
<whitequark> no
<whitequark> rename -src
<whitequark> I don't really care about abc anymore :P
<whitequark> might actually (gasp) look at floorplanning
<miek> so i checked/reflowed a bunch of stuff but no joy. i haven't got anything around to decode easily, but the waveform is identical on the scope between using `glasgow flash` (all 0s) and `fx2tool read_eeprom` (correct readback)
<whitequark> miek: very strange
<whitequark> unfortunately, i don't really know how to help you at the moment
<whitequark> i'll let you know if i have ideas, or ping me in a few days
<miek> ok, no worries, i'll keep playing around. cheers for the help so far
<whitequark> daveshah: so, like half of the boneless cpu design is attributed to the FSM
<whitequark> cells wise
<whitequark> and nets
<daveshah> whitequark: PR looks good, thanks
<daveshah> Interesting that the FSM is so significant even with a 16 bit datapath
<miek> oh good it gets stranger, wireshark shows the correct data coming in
<whitequark> miek: ohhhhh
<whitequark> now *this* is something i know
<whitequark> try updating your python-libusb
<whitequark> miek: are you using the one in debian by any chance?
<miek> ubuntu, but yeah. i just installed one from pip and it works! thanks!
<whitequark> miek: what is the version you have in debian
<miek> 1.6.3-1
<daveshah> Guess the nextpnr team all have shitty laptops :P
<whitequark> hehe
_whitenotifier has joined ##openfpga
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fp1XA
<_whitenotifier> [whitequark/Glasgow] whitequark 3aafe48 - software: require libusb1>=1.6.6.
<miek> yay, little bit of rework and it's passing selftest :)
<whitequark> sweet!
<tnt> I still have one failing the loopback test for the second EP pair ... couldn't see anything wrong with the solder joints under magnification.
<tnt> Mmm, data end up being slightly mangled :/ b'\xaaU.god yzal eht rgvo spmuj xof nworb kciuq ehT') vs b'\xaaU.god yzal eht revo spmuj xof nworb kciuq ehT'
<whitequark> uh
<whitequark> those look the same?
<daveshah> rgvo vs revo
<whitequark> oh, so bit 2
<whitequark> fascinating
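The mismatch is easier to spot by XORing the two buffers byte by byte. A small sketch, using the two byte strings quoted above and assuming the second one is the expected data; `flipped_bits` is a hypothetical helper name.

```python
def flipped_bits(expected: bytes, actual: bytes):
    """Return (offset, xor_mask) for every byte position where the buffers differ."""
    return [(i, e ^ a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


expected = b"\xaaU.god yzal eht revo spmuj xof nworb kciuq ehT"
actual = b"\xaaU.god yzal eht rgvo spmuj xof nworb kciuq ehT"
diffs = flipped_bits(expected, actual)
```

Here the only difference is at offset 17 with mask 0x02: 'e' (0x65) became 'g' (0x67), so a single data line is read as 1 when it should be 0.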
<whitequark> tnt: are you using an up to date toolchain?
<tnt> whitequark: From a few hours ago ... with about every experimental patches from daveshah, you and me ...
ClausPillow has joined ##openfpga
<whitequark> mm, okay, so it's probably not gateware
<whitequark> only the second ep pair though? let me see
<whitequark> hm those pins aren't really close
<whitequark> dunno
<tnt> yeah .. EP2OUT->EP6IN works fine.
<whitequark> weird.
<whitequark> always the same error?
<prpplague> anyone heard of details for orconf 2019?
<tnt> whitequark: no, seems to change sometimes. b'\xaaW.god yzal eht rgvo spmuj xof nworb kckuq ehT') kckuq vs kciuq
<whitequark> same bit
<whitequark> hmmm
<tnt> Ok, I'll recheck bit 2.
<whitequark> wonder if it's PTV
<whitequark> it *could* be PTV but i don't know
<whitequark> the FX2 bus stuff is still slightly suspect to me
Zorix has joined ##openfpga
<tnt> Those look just fine to me :/
<whitequark> those look damn great
<whitequark> i have never seen a better qfn solder joint in my life
<whitequark> tnt: what about sending a stream of 55 aa
<whitequark> via the loopback pipe
<whitequark> and then looking at it via a scope?
genii has quit [Remote host closed the connection]
<_whitenotifier> [Glasgow] miek opened pull request #88: access.direct.demultiplexer: fix TypeError when length is None - https://git.io/fp15i
<tnt> I can give it a shot. There was a benchmark somewhere right ? Probably the easiest to mod to send that.
<whitequark> tnt: the benchmark applet uses an LFSR
<whitequark> actually
<whitequark> try running it
<whitequark> it uses EP2-EP6
<whitequark> so if it's electrical you'll probably see the bug on those two too
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fp157
<_whitenotifier> [whitequark/Glasgow] miek 8806951 - access.direct.demultiplexer: fix TypeError when length is None
<_whitenotifier> [Glasgow] whitequark closed pull request #88: access.direct.demultiplexer: fix TypeError when length is None - https://git.io/fp15i
<tnt> Seem to 'hang' at the loopback test (i.e. never return)
<whitequark> no, that doesn't work
<whitequark> it's uh
<whitequark> bug #44
<whitequark> run either source, sink, or both, explicitly
<tnt> I: glasgow.applet.benchmark: running benchmark mode source for 4.000 MiB
<tnt> I: glasgow.applet.benchmark: mode source: 10.193 MiB/s
<tnt> I: glasgow.applet.benchmark: running benchmark mode sink for 4.000 MiB
<tnt> I: glasgow.applet.benchmark: mode sink: 0.969 MiB/s
<whitequark> tnt: hmmm
<tnt> Can I easily make the benchmark use the other EP ?
<whitequark> tnt: add a dummy target.multiplexer.claim_interface() call in Benchmark.build
<whitequark> target.multiplexer.claim_interface(self, args=None)
<whitequark> something like this
<tnt> ran just fine too
<whitequark> bizarre.
<whitequark> does -v mention EP4/EP8?
<whitequark> -vv
<tnt> Yeah T: glasgow.device.hardware: USB: BULK EP8 IN data=<dcf3b8e771cfe29ec43d8 .....
<whitequark> ok
<whitequark> your hardware is likely fine
<whitequark> this is probably my shitty FX2 arbiter then
<whitequark> I really need to rewrite it and, I dunno, add tests...
<SolraBizna> why route when you can have a 128-layer board and each signal its own plane
<tnt> whitequark: Do you use the IO registers ?
<whitequark> tnt: yes
<whitequark> before that it barely worked
<tnt> yeah not surprising, timing would be highly dependent of the PnR results.
<whitequark> i ned a model of fx2 in migen...
<whitequark> need*
<tnt> why would it affect only D1 though ?
<whitequark> no idea
<whitequark> i have not observed this particular failure
<whitequark> can you try hmmm
<whitequark> tnt: can you locally modify migen to pass --randomize-seed to nextpnr
<whitequark> and see if that changes things
<tnt> Yeah it seems it does
<tnt> Is there a way to force a rebuild ?
<whitequark> yes
<whitequark> --rebuild :p
<tnt> It actually works most of the time ... I guess just not with the default seed in my particular machine.
<whitequark> tnt: so this is a timing issue... bleh
<whitequark> :S
<whitequark> i was afraid of that
<tnt> nextpnr doesn't really analyze the path to/from D_{IN,OUT} as part of the sync logic. It doesn't seem to know when IO registers are enabled or not.
<daveshah> Yes, that needs fixing
<daveshah> It will count as the $async <-> clock paths though
<whitequark> ohhhh
<tnt> yeah, that's how I know it doesn't work atm :) because I see those path in <async>
<daveshah> I'm not convinced icetime handles them entirely correctly either
<daveshah> However, if the delay in the <async> path is still less than the clock period then it's not a problem
<daveshah> If it is, then that will be it
<whitequark> daveshah: there is also setup/hold timing of fx2
<whitequark> which is rather complicated.
<daveshah> Yes, it is on my masters todo list to look at this kind of stuff in nextpnr
<daveshah> But that won't be until next year now
<whitequark> the fx2 timing is nightmarish in places
<whitequark> because it has setup/hold timings... longer than one clock cycle
<whitequark> like, what?
<daveshah> yeah that's crazy
<tnt> whitequark: mmm ...
<tnt> whitequark: instead of having SB_IO followed by a SB_GB, can't you use SB_GB_IO ?
<whitequark> tnt: where?
<whitequark> also, is that actually different?
<tnt> Yes.
<whitequark> shit
<whitequark> ok fine
<daveshah> Yes SB_IO, SB_GB adds a bit of fabric routing
<tnt> As is, the clock will be routed to the fabric and brought to a random SB_GB depending on placement.
<tnt> which means the clock phase will vary run to run.
<whitequark> ughhhhhh
<daveshah> Seems that the ice40up5k input register has a whole 4ns of its own setup time
<daveshah> And clock to out of 1.5ns
<daveshah> Just the pin and register excluding global network etc
<whitequark> daveshah: what the fuck
<Richard_Simmons> I'm seeing more and more of these Gowin fpgas, yet I still know nothing about them
<whitequark> this makes the benchmark applet fail on my glasgow
<whitequark> using SB_GB_IO
<whitequark> but only in sink mode
<whitequark> tnt: can you check this patch https://hastebin.com/susujikoro.diff
<daveshah> Hmm
<tnt> Well, using SB_GB_IO the phase will be constant .... I didn't say it was going to be right :p
<whitequark> actually, selftest now consistently fails
<daveshah> Add a few manually placed LUTs and a manually placed GB to sort it out :P
<whitequark> AAAAAAAA
<tnt> Yeah, now both test fails ... consistently.
<whitequark> this looks like uh
<whitequark> 54686520717569636b2062726f776e20666f78206a756d7073206f76657220746865206c617a7920646f672e55aa
<whitequark> 5454686520717569636b2062726f776e20666f78206a756d7073206f76657220746865206c617a7920646f672e55
<whitequark> this looks like SLRD strobe is not registered in time
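Decoding the two hex dumps makes the failure pattern visible. This sketch assumes the first dump is the data that was sent and the second is what was read back; the log does not say so explicitly. `is_delayed_by_one` is a hypothetical helper.

```python
sent = bytes.fromhex(
    "54686520717569636b2062726f776e20666f78206a756d7073206f766572"
    "20746865206c617a7920646f672e55aa"
)
received = bytes.fromhex(
    "5454686520717569636b2062726f776e20666f78206a756d7073206f7665"
    "7220746865206c617a7920646f672e55"
)


def is_delayed_by_one(sent: bytes, received: bytes) -> bool:
    """True if received is sent with its first byte repeated and its last byte dropped."""
    return received == sent[:1] + sent[:-1]


shifted = is_delayed_by_one(sent, received)
```

The read-back stream is the sent stream delayed by one byte, with the first byte sampled twice, which fits a read strobe that takes effect one cycle late.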
<daveshah> The delay with a GB_IO is probably much lower than with a GB
<whitequark> lmao this *totally* fucks up all of my strobes
<daveshah> This is why most FPGAs have input delay blocks
<tnt> You can use the PLL to select the phase of the clock ...
<whitequark> tnt: try this https://hastebin.com/izovololen.diff
<whitequark> in addition
<whitequark> yeah, this unfucks selftest and benchmark for me
<whitequark> now, i'm not sure why is that
<whitequark> will need to read the spec again
<tnt> Yeah, passes self test.
<whitequark> shit
<whitequark> ok
<whitequark> thanks
<_whitenotifier> [Glasgow] whitequark created branch sb_gb_io - https://git.io/fp4Wh
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to sb_gb_io [+0/-0/±6] https://git.io/fp1A4
<_whitenotifier> [whitequark/Glasgow] whitequark 50cc07a - cli: add --synthesis-opts, for passing options to Yosys' synth_ice40.
<_whitenotifier> [whitequark/Glasgow] whitequark 1df4379 - WIP
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±3] https://git.io/fp1AB
<_whitenotifier> [whitequark/Glasgow] whitequark efb6bc6 - cli: add --synthesis-opts, for passing options to Yosys' synth_ice40.
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to sb_gb_io [+0/-0/±3] https://git.io/fp1AR
<_whitenotifier> [whitequark/Glasgow] whitequark eadfa5a - WIP
<_whitenotifier> [Glasgow] Error. The Travis CI build could not complete due to an error - https://travis-ci.org/whitequark/Glasgow/builds/464103933?utm_source=github_status&utm_medium=notification
<_whitenotifier> [Glasgow] whitequark opened issue #89: Use SB_GB_IO instead of SB_IO+SB_GB - https://git.io/fp1A2
<_whitenotifier> [Glasgow] whitequark assigned issue #89: Use SB_GB_IO instead of SB_IO+SB_GB - https://git.io/fp1A2
<whitequark> tnt: ^
<tnt> It's weird how that one board seems to behave so differently from other people's ... (and from the other board I built). First the FX2 LEDs and now this :p
<whitequark> well, yeah
<whitequark> it's interesting
<whitequark> tnt: think you can try and abuse the design a bit in that branch?
<whitequark> if it works decently enough i might just merge it...
<whitequark> or write a model...
<tnt> You could try to run the design in icecube to get the "official" timing number for the IO (i.e. how much sys_clk is delayed compared to the clock at the io pin, and how much setup/hold is expected on each pin and the clk to out etc ...)
<whitequark> I don't even have icecube
<tnt> Ah :) Well, I can give it a shot.
<tnt> Is there an option to save the .v / .pcf ?
<tnt> CTRL-C during nextpnr works :p
<whitequark> `glasgow build -t v`
<whitequark> just for verilog
<whitequark> or
<whitequark> `glasgow build -t zip`
<whitequark> for the entire design
<whitequark> caution: zipbomb
<tnt> E2792: Instance SB_IO_18 incorrectly constrained at SB_IO_OD location
<tnt> damn
<tnt> wtf ... they removed all the underscore in the ports names from SB_IO to SB_IO_OD ...
Bike has joined ##openfpga
ZipCPU|Laptop has quit [Ping timeout: 240 seconds]