awygle_m_ has quit [Ping timeout: 248 seconds]
awygle_m has quit [Ping timeout: 248 seconds]
awygle_m has joined ##openfpga
promach_ has joined ##openfpga
promach_ has quit [Remote host closed the connection]
awygle_m has quit [Ping timeout: 252 seconds]
promach_ has joined ##openfpga
promach_ has quit [Remote host closed the connection]
_florent_ has quit [Ping timeout: 240 seconds]
_florent_ has joined ##openfpga
X-Scale has quit [Quit: HydraIRC -> http://www.hydrairc.com <- In tests, 0x09 out of 0x0A l33t h4x0rz prefer it :)]
theMagnumOrange has quit [Ping timeout: 248 seconds]
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
jhol has quit [Quit: Coyote finally caught me]
fpgacraft2_ has joined ##openfpga
fpgacraft2 has quit [Read error: Connection reset by peer]
fpgacraft2_ is now known as fpgacraft2
pie__ has joined ##openfpga
pie_ has quit [Read error: Connection reset by peer]
pointfree1 has quit [Ping timeout: 240 seconds]
pointfree1 has joined ##openfpga
<rqou> whee, "slow-motion moving out roommate" finally took his bed out of the room
<rqou> now i can have more space
eduardo__ has joined ##openfpga
eduardo_ has quit [Ping timeout: 248 seconds]
ZipCPU has quit [Ping timeout: 246 seconds]
teepee has quit [Ping timeout: 240 seconds]
teepee has joined ##openfpga
digshadow has quit [Ping timeout: 240 seconds]
digshadow has joined ##openfpga
digshadow has quit [Ping timeout: 240 seconds]
digshadow has joined ##openfpga
Hootch has joined ##openfpga
teepee has quit [Ping timeout: 258 seconds]
teepee has joined ##openfpga
teepee has quit [Ping timeout: 248 seconds]
teepee has joined ##openfpga
teepee has quit [Ping timeout: 252 seconds]
teepee has joined ##openfpga
teepee has quit [Ping timeout: 240 seconds]
teepee has joined ##openfpga
teepee has quit [Ping timeout: 258 seconds]
teepee has joined ##openfpga
Hootch has quit [Ping timeout: 240 seconds]
qu1j0t3 has quit [Ping timeout: 246 seconds]
Hootch has joined ##openfpga
qu1j0t3 has joined ##openfpga
qu1j0t3 has quit [Ping timeout: 260 seconds]
qu1j0t3 has joined ##openfpga
teepee has quit [Ping timeout: 248 seconds]
<eduardo__> azonenberg_work: This sounds very much like your delay measurement device http://www.nanoxplore.com/products/30-application-and-usage.html
teepee has joined ##openfpga
<sn00n> cool
pie__ has quit [Ping timeout: 248 seconds]
<sn00n> ok, most stuff happens at night :(
enriq has joined ##openfpga
ZipCPU has joined ##openfpga
m_t has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq has joined ##openfpga
pie__ has joined ##openfpga
<azonenberg_work> That looks like something you add to the chip though
<azonenberg_work> the whole point of my work is that its bodged around whatever silicon i get thrown
<azonenberg_work> and i accept less accurate/complete data as a consequence
<sn00n> what does it do?
<sn00n> i mean your thing
<sn00n> as in: what are you guys talking about? ^^
<azonenberg_work> sn00n: I have a characterization setup for extracting timing data from greenpak4 chips
<azonenberg_work> which i need to finish (been traveling for a month straight between vacation and work)
<azonenberg_work> across P/T/V corners
<azonenberg_work> Because i want to do timing driven placement and the vendor doesnt publish enough data for me to make a timing analyzer
<sn00n> ah ok
<sn00n> cool
X-Scale has joined ##openfpga
azonenberg_work has quit [Ping timeout: 240 seconds]
enriq_ has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq has joined ##openfpga
enriq_ has quit [Quit: Mutter: www.mutterirc.com]
azonenberg_work has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq has joined ##openfpga
enriq_ has joined ##openfpga
enriq_ has quit [Client Quit]
enriq_ has joined ##openfpga
azonenberg_work has quit [Ping timeout: 240 seconds]
azonenberg_work has joined ##openfpga
pie__ has quit [Ping timeout: 240 seconds]
Hootch has quit [Quit: Leaving]
pie__ has joined ##openfpga
azonenberg_work1 has joined ##openfpga
azonenberg_work has quit [Ping timeout: 240 seconds]
azonenberg_work1 is now known as azonenberg_work
azonenberg_work1 has joined ##openfpga
azonenberg_work has quit [Ping timeout: 248 seconds]
azonenberg_work1 is now known as azonenberg_work
enriq__ has joined ##openfpga
enriq__ has quit [Client Quit]
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq_ has quit [Ping timeout: 248 seconds]
enriq__ has joined ##openfpga
jhol has joined ##openfpga
ZipCPU|Laptop has joined ##openfpga
enriq__ has quit [Quit: Mutter: www.mutterirc.com]
Hootch has joined ##openfpga
shapr has joined ##openfpga
enriq has joined ##openfpga
<azonenberg_work> o/ shapr
<shapr> howdy!
<azonenberg_work> So by way of background right now we have icestorm (which we're aware of but this channel is not actively developing) for ice40, then my flow for silego greenpak4 and rqou+me's flow for xilinx coolrunner-2
<azonenberg_work> pointfree and cyrozap are doing work on psoc5lp
<azonenberg_work> but i think it's mostly been RE and less toolchain creation so far
<shapr> is there a tutorial on reverse engineering?
<azonenberg_work> clifford has been doing a little bit of RE on xilinx 7 series but i dont know how far he got etc
<azonenberg_work> Sadly no
<azonenberg_work> But we don't just need RE
<azonenberg_work> RE is only half the battle
<shapr> I don't mind frying a bunch of expensive FPGAs and buying more and doing it all over again.
<azonenberg_work> we need synthesis support for various devices
<azonenberg_work> place-and-route
<azonenberg_work> etc
<azonenberg_work> personally, my next target will be spartan-3a
<azonenberg_work> it's old, cheap, readily available, not end-of-lifed yet
<azonenberg_work> big enough to be useful
<azonenberg_work> simple enough to be easy to RE (not a ton of complex hard IP)
<azonenberg_work> and is a direct ancestor of modern xilinx parts so a lot of the architecture will make more sense once you understand s3a
<azonenberg_work> And it even comes in non-bga packages for the silly people who prefer hard-to-solder TQFP to a nice friendly 1mm BGA
<shapr> I'm an FPGA noob, are there place and route libraries that can work for multiple chips?
<azonenberg_work> Vendor tools are always chip specific
<azonenberg_work> in the open world, there's VPR which is supposed to be generic
<shapr> ah, "supposed to be"
<azonenberg_work> But it will require technology libraries for specific chips to be useful
<shapr> hm
<azonenberg_work> It is also largely an academic R&D tool
<azonenberg_work> i don't know if it can become production grade for large modern FPGAs with acceptable performance
<azonenberg_work> me and awygle have been talking about creating a parallel analytic placer instead of using VPR
<shapr> VPR isn't parallel?
<shapr> must not be
<azonenberg_work> Not well, i believe it's annealing based
<azonenberg_work> when i say parallel i dont mean it runs on 4 threads
<azonenberg_work> i mean something you can push out to 1024 cores in EC2 or something
<shapr> oh!
<azonenberg_work> and PAR a virtex ultrascale+ in a few minutes
<shapr> man that would be *popular*
<azonenberg_work> This is a major research project, as you can imagine
<shapr> that's a neat idea for taking over FPGA dev
<azonenberg_work> And is a long term project
<azonenberg_work> So i expect we will be doing RE of smaller chips in parallel with that moonshot
<shapr> makes sense
<azonenberg_work> During that time we will be either using VPR or writing our own less-scalable PAR to make a complete flow
<azonenberg_work> Right now we have a generic annealing-based PAR for crossbar CPLDs, so we may make something similar for 2D LUT fabric FPGAs
<shapr> I'd like to see a tutorial for reverse engineering FPGAs
<azonenberg_work> (the routing topologies are different enough we'd need a totally diff tool for CPLD vs FPGA)
<azonenberg_work> So if you're interested in assisting with the spartan3a effort
ZipCPU|Laptop has quit [Ping timeout: 240 seconds]
<shapr> yeah, I'm interested
<shapr> though I currently have zero idea what that involves, other than purchasing hardware :-)
<azonenberg_work> Forget buying hardware for the moment
<azonenberg_work> in particular, page 243 on
<azonenberg_work> Read it as many times as you need to
pie__ has quit [Ping timeout: 240 seconds]
<azonenberg_work> Download ISE (not vivado, s3a uses the older toolchain) and create some bitstreams for the XC3S50A (the smallest part in the series, and our initial target)
<azonenberg_work> Parse the configuration frames and print them out, make sure you understand what's going on at that level
<azonenberg_work> Some of the code in https://github.com/azonenberg/jtaghal may be of use
<azonenberg_work> The goal at this point is not to figure out anything new, it's to understand what is documented
<shapr> hm, ok
<shapr> I can read :-)
<azonenberg_work> So, the xilinx .bit file is multiple levels of framing
<azonenberg_work> The highest level is basically the same for all parts from s3a up to the present
<azonenberg_work> it's a metadata header produced by bitgen for PC-based tool consumption and ignored by the actual silicon
<azonenberg_work> there's about five fields with things like the .ncd file name, the date and time the bitstream was generated, etc
<azonenberg_work> followed by the actual config data read by the FPGA, which is what you'd write to a .rbt file or burn to a flash chip
<azonenberg_work> This metadata header can be largely ignored, it's not documented but was easy to RE
<azonenberg_work> jtaghal has some code in the XilinxFPGA or XilinxDevice class, i think, that parses it
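A minimal sketch of that metadata-header parse, in Python rather than jtaghal's C++, assuming the community-REd field layout (a 9-byte preamble, then length-prefixed fields keyed 'a' through 'e'); none of this is vendor-documented, so treat the asserts as hypotheses:

    import struct

    def parse_bit_header(data):
        """Split a .bit file into its metadata fields and the raw
        config data. Layout per community RE, not vendor docs."""
        pos = 0
        def u16():
            nonlocal pos
            (v,) = struct.unpack_from(">H", data, pos)
            pos += 2
            return v

        pos += u16()                  # skip the 9-byte preamble
        assert u16() == 1             # length of the first key byte
        key = chr(data[pos]); pos += 1
        fields = {}
        while key in "abcd":          # a=design, b=part, c=date, d=time
            n = u16()
            fields[key] = data[pos:pos + n].rstrip(b"\0").decode()
            pos += n
            key = chr(data[pos]); pos += 1
        assert key == "e"             # 'e' introduces the config data
        (n,) = struct.unpack_from(">I", data, pos); pos += 4
        fields["e"] = data[pos:pos + n]   # what the silicon actually sees
        return fields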
<azonenberg_work> The next level of framing is documented in the user guide i sent you
<azonenberg_work> at a high level it's the same from S3A to the present, although the actual bit coding varies from chip to chip
<azonenberg_work> basically you have a sequence of 16- or 32-bit registers (depending on the chip family; same size within one family)
<shapr> ah, ParseBitstreamCore
<azonenberg_work> the bitstream is a sequence of writes to these registers
<azonenberg_work> The names, addresses, and high-level functional descriptions of the regs are all documented
<azonenberg_work> The Xilinx*Device class in jtaghal parses these as well
<azonenberg_work> for all families i've worked with
<azonenberg_work> The bulk of the data in the bitstream is a write to the FDRI (frame data register in) register
<azonenberg_work> Which is what actually configures the chip
<azonenberg_work> the rest of the registers do things like set fallback options in case of CRC failure, the SPI clock rate, etc
<azonenberg_work> The default (size-optimized) bitstreams set FAR (frame address register) to zero, then write the whole chip with one giant write to FDRI
<azonenberg_work> If you use the debug-bitstream option in BitGen, you can create a bitstream that does multiple smaller writes with a CRC check after each write
<azonenberg_work> to different addresses
<shapr> so that's the first step to understanding the chip structure?
<azonenberg_work> Presumably, each of these smaller blocks is aligned to some logical boundary in the chip
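For the register-write framing itself, a sketch of a packet walker over the 16-bit word stream; the type/opcode/address/count field widths here are my reading of the configuration user guide's Spartan-3A packet description and should be verified against the doc rather than trusted (which is the point of the exercise):

    TYPE1, TYPE2 = 0b001, 0b010

    def parse_packets(words):
        """Walk the 16-bit register-write packet stream that follows
        the sync word. Field widths are assumptions to check against
        the user guide."""
        i = 0
        while i < len(words):
            hdr = words[i]; i += 1
            ptype  = hdr >> 13
            opcode = (hdr >> 11) & 0b11       # 00 nop, 01 read, 10 write
            if ptype == TYPE1:
                reg   = (hdr >> 5) & 0x3F     # FAR, FDRI, CMD, CRC, ...
                count = hdr & 0x1F
            elif ptype == TYPE2:
                # the big FDRI write: 32-bit count in the next two
                # words, register address from the preceding type 1
                count = (words[i] << 16) | words[i + 1]; i += 2
                reg = None
            else:
                raise ValueError(f"unexpected packet type {ptype:#05b}")
            yield ptype, opcode, reg, words[i:i + count]
            i += count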
<shapr> I'm enjoying this extemporaneous RE tutorial :-)
<azonenberg_work> At a low level, the data going into FDRI configures "frames" that go to different X/Y locations on the chip
<azonenberg_work> each frame controls one type of logic
<azonenberg_work> for example block ram, io, etc
<azonenberg_work> The frames are, i believe, bit sliced
<azonenberg_work> so one frame might be one bit from each of 100 luts
<azonenberg_work> the next frame would be the next bit, etc
<azonenberg_work> according to the physical die structure
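If that bit-slicing guess holds, recovering one LUT's init vector means gathering a single bit from each of several consecutive frames. A hypothesis-testing sketch, with entirely placeholder offsets (the real intra-frame layout is exactly what's left to RE):

    def gather_lut_init(frames, lut_index, bits_per_lut=16):
        """Hypothesis: LUT #lut_index's init vector is one bit per
        frame across bits_per_lut consecutive frames. frames[i] is
        the payload of configuration frame i as bytes; the intra-
        frame byte/bit mapping below is a placeholder."""
        init = 0
        for b in range(bits_per_lut):
            frame = frames[b]                  # frame b = bit b of every LUT
            byte, bit = divmod(lut_index, 8)   # placeholder layout
            init |= ((frame[byte] >> bit) & 1) << b
        return init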
<azonenberg_work> Look at the chip floorplan in PlanAhead and FPGA Editor
<azonenberg_work> as well as the physical silicon (photos here https://siliconpr0n.org/archive/doku.php?id=azonenberg:xilinx:xc3s50a)
<azonenberg_work> And that's about as far as we know for certain :p
<azonenberg_work> The "readback" config output by bitgen may have some useful stuff in it
<azonenberg_work> The physical layout gives a good idea of how the chip is actually structured
<shapr> are there different dies for spartan3a ? or is it always the same die?
<azonenberg_work> note that for example the physical chip has bram and mult swapped L/R relative to how planahead shows them :)
<azonenberg_work> Each device in the family is a different die
<azonenberg_work> we have only seen reuse of dies starting in spartan6 (LX and LXT are same silicon) and 7 series (7a15/35t are fused 50t, 75t is fused 100t)
sunxi_fan has joined ##openfpga
<shapr> ah, golden screwdriver
<azonenberg_work> I have sem photos of the chip too somewhere but i guess i never uploaded them to pr0n
<azonenberg_work> just optical
<azonenberg_work> anyway my short term priority is work plus getting ready for a con talk on higher level netlist RE the week of the 15th
<balrog> Altera side, Cyclone 3 and 4 also is reuse of dies (different process, same structure and bitstream and JTAG IDs)
<azonenberg_work> so i wont be able to put any time into this until after
<shapr> next week?
<shapr> ORCON?
<azonenberg_work> no, hardwear.io
<azonenberg_work> i'm onsite with a client this week, next week i'm finishing my research and slides
<azonenberg_work> the next week i'm in Den Haag for the conference
<shapr> wat loek :-)
<shapr> er "very cool"
<azonenberg_work> Lol
* azonenberg_work doesnt speak any dutch, this will be interesting
<azonenberg_work> Anyway, after the con is over i will be starting to gear up research for REcon 2018 in montreal
<azonenberg_work> This will be a combination of higher level netlist RE and more work on xilinx chips like s3a
<azonenberg_work> So if you have time to contribute, i'd say begin by reading the docs and generating some test bitstreams
<azonenberg_work> your goal is not to RE the chip yet
<azonenberg_work> your goal is to understand what is already publicly known
<azonenberg_work> inside out and backwards
<azonenberg_work> you should be able to draw a floorplan of the chip from memory and describe, in some detail, what each block does
<shapr> this is an excellent introduction, thank you so much for making this accessible
<azonenberg_work> Then we can talk about division of labor etc to actually move forward and figure out previously unknown stuff
<balrog> it might not be a bad idea for someone to turn this introduction into a wiki page
<balrog> I wish I had the time
<azonenberg_work> at some point we'll probably make a github wiki for s3
<azonenberg_work> Lol that too
<shapr> I'll attempt to condense this live intro into a blog post, and ask for any improvements on this channel
<azonenberg_work> shapr: the long term goal is to have something like this https://github.com/azonenberg/openfpga/blob/master/doc/coolrunner/xc2c32a-notes.txt
<azonenberg_work> except better formatted b/c this isnt too readable
<azonenberg_work> for the xc3s50a
<azonenberg_work> The coolrunner is a much simpler chip, and is addressed in 2D
<azonenberg_work> the jtag interface allows direct writes of the 48 260-bit rows of config data
<azonenberg_work> So it was pretty easy to map the physical layout back to bits in the bitstream
<azonenberg_work> with spartan since its a more abstracted interface, it's less trivially obvious what bits go to what part of the chip
<azonenberg_work> But again, first figure out what we already know
<azonenberg_work> next step will probably be to make a wiki page on azonenberg/openfpga summarizing what's known
<azonenberg_work> shapr: also, look at XDL
<azonenberg_work> it's kind of like an assembly language for xilinx chips
<shapr> ah, neat
<azonenberg_work> you can turn a .ncd (placed-and-routed netlist, not a bitstream) into XDL
<azonenberg_work> and back
<azonenberg_work> however once you go to .bit there is no documented way to go to XDL
<azonenberg_work> i'm sure xilinx has an internal unreleased tool :p
<azonenberg_work> But XDL may be useful to look at the NCD you turned into a BIT
<azonenberg_work> and understand exactly what data is in that BIT (just not where it's stored or how it's encoded)
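For reference, the round trip looks like this (flag names as I remember them from ISE's own xdl usage text; confirm with xdl -help on your install):

    import subprocess

    # placed-and-routed netlist -> XDL text, poke around, and back:
    subprocess.run(["xdl", "-ncd2xdl", "design.ncd", "design.xdl"], check=True)
    subprocess.run(["xdl", "-xdl2ncd", "design.xdl", "roundtrip.ncd"], check=True)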
<shapr> I may know some people who have tips/pointers
<shapr> former xilinx employees, dunno if they have an active NDA or not
<shapr> azonenberg_work: many thanks
awygle_m has joined ##openfpga
azonenberg_work has quit [Ping timeout: 240 seconds]
azonenberg_work has joined ##openfpga
<azonenberg_work> aaand back
<azonenberg_work> (09:10:10) azonenberg_work: Thats one of the nice things about spartan3a, its an older (i think 11-12 yr old) chip
<azonenberg_work> (09:10:15) azonenberg_work: less likely to cause legal problems for those of us in the US
<azonenberg_work> (09:10:34) azonenberg_work: while still being a direct ancestor of the current stuff so most of the knowledge and intuition about their architecture should transfer
<awygle_m> Virtex ultra plus in a few minutes? Feels like goalpost moving :-P didn't it used to be a 100t in a minute? Or did I make that up?
<azonenberg_work> lol
<azonenberg_work> Depends on how many cores :p
<azonenberg_work> First step will be to par a 3s50a with a few basic gates on it
<azonenberg_work> then to do it on say 16 cores
<azonenberg_work> then to demonstrate scalability as we do so
azonenberg_work has quit [Ping timeout: 248 seconds]
azonenberg_work has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
azonenberg_work has quit [Ping timeout: 240 seconds]
awygle_m has quit [Remote host closed the connection]
awygle_m has joined ##openfpga
<awygle_m> I wonder if the Lattice ECP5 series uses the same bitstream format as the ice40
digshadow has quit [Ping timeout: 260 seconds]
<shapr> awygle_m: did the ECP5 also come from silicon blue?
<shapr> I just heard earlier today that ice40 was designed by silicon blue, and lattice bought them in 2011
<awygle_m> It came out in 2014, so not in the sense of being straight purchased. And the format does seem to be different
digshadow has joined ##openfpga
enriq has joined ##openfpga
azonenberg_work has joined ##openfpga
massi has quit [Remote host closed the connection]
<awygle_m> Oh and there's actually documentation, look at that: http://www.latticesemi.com/~/media/LatticeSemi/Documents/ApplicationNotes/EH/TN1260.pdf?document_id=50462 appendix b
<shapr> are you RE'ing ECP5?
<awygle_m> Nah, I just got curious, I happen to be using it
<awygle_m> Maybe when I get PAR to the point that I need a better test case than ice40 I'll spend some time on that, but hopefully somebody else has something for me by then
<shapr> is PAR your own place and route?
<shapr> ah, looks like you're using arachne
<awygle_m> I think azonenberg_work mentioned the scaleable analytic placer? I'm working on that.
<shapr> if I knew more about place and route, I'd like to try one in Haskell
<shapr> but I doubt anyone else would want to learn/write Haskell
<shapr> maybe christiaanb, but he's already doing clash-lang
<shapr> so you're doing distributed simulated annealing?
<azonenberg_work> awygle: So, re ecp5
<azonenberg_work> it looks like there is doc of the high level structure
<azonenberg_work> same as xilinx
<azonenberg_work> but they do not document the contents of each frame
<azonenberg_work> Except for the block RAM, interestingly
<azonenberg_work> i guess they wanted to support bitstream patching for bram
digshadow has quit [Ping timeout: 248 seconds]
<awygle_m> azonenberg_work, yeah of course, that would be too good to be true
<rqou> hey, enough information for you to cheat like MAME :P
pie_ has joined ##openfpga
awygle_m has quit [Remote host closed the connection]
awygle_m has joined ##openfpga
<awygle_m> shapr: so there's two basic approaches used for placement, simulated annealing and analytic placement. I'm taking an analytic approach
<awygle_m> Massively parallelizing simulated annealing has often come at the cost of quality of results (basically you chop the part up into smaller blocks and SA each independently, which prevents global optimization)
<awygle_m> There are some approaches to improve that situation (don't break up into the same blocks on every iteration for one thing) but it's complicated and results are questionable
<awygle_m> Analytic placement basically formulates the problem as a huge system of equations which is then solved for optimal placement. The canonical example is minimizing squared wire length, which is basically "what if this was a big mass-spring system?"
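A toy of that formulation in one dimension: with 2-pin nets and some fixed pad locations, minimizing total squared wirelength reduces to a linear solve (the spring-system equilibrium), which is the part that parallelizes well. All names and the example netlist below are made up for illustration:

    import numpy as np

    def quadratic_place_1d(n_movable, nets, fixed):
        """Minimize sum of (x_i - x_j)^2 over 2-pin nets. Indices
        below n_movable are movable cells; anything else is a key
        into `fixed` (pad coordinates). Returns cell positions."""
        A = np.zeros((n_movable, n_movable))
        b = np.zeros(n_movable)
        for i, j in nets:
            for a, c in ((i, j), (j, i)):
                if a < n_movable:             # equation row for a movable cell
                    A[a, a] += 1.0
                    if c < n_movable:
                        A[a, c] -= 1.0
                    else:
                        b[a] += fixed[c]      # spring anchored at a fixed pad
        return np.linalg.solve(A, b)          # the parallelizable solve

    # chain pad(x=0) -- cell0 -- cell1 -- pad(x=10):
    print(quadratic_place_1d(2, [(0, 2), (0, 1), (1, 3)], {2: 0.0, 3: 10.0}))
    # -> [3.333... 6.666...], i.e. equal spring stretch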
<shapr> huh, ok
<shapr> I dug up some research papers for simulated annealing for mapreduce, but I don't have an academic membership at the moment, so I didn't read them.
<balrog> if you have a problem using something like scihub, you can ask since several people here have academic access
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq__ has joined ##openfpga
<awygle_m> I've seen a few of those, ime they tend to have a significant serial component. Good for going from "a day" to "four hours" but not "four minutes". If you link the papers though I'll take a look (after work :-P)
<awygle_m> My ultimate goal is a heterogeneous distributed solution that can leverage GPUs as well as CPU cores
<awygle_m> Which is... Ambitious :-P we'll see
<shapr> is it easy to do place and route as graph reduction? that could be easier to distribute, especially across GPU and CPu
<shapr> I immediately think of the graph reduction CPUs I've seen built for FPGAs
enriq__ has quit [Remote host closed the connection]
<rqou> so PAR is equivalent to subgraph isomorphism, but that is NP-complete anyways
<shapr> rqou: do you have any references I could read that explicitly connect PAR to subgraph isomorphism?
<rqou> no, just "think about it" :P
<rqou> it's very obvious the way azonenberg_work has written his xbpar code
<rqou> you have one graph representing all possible connections in the device
<rqou> nodes = sites where you can place logic blocks
<rqou> edges = potential connections between logic blocks
<rqou> and then your netlist (your design) also contains logic blocks and connections
<awygle_m> rqou: not being completely up on my graph theory - will subgraph isomorphism techniques optimize, or just find _an_ answer?
<rqou> and you need to match up netlist nodes and device nodes, such that the edges in the netlist also correspond to edges in the device
<rqou> subgraph isomorphism is a decision problem, so just an answer
<rqou> finding an "optimal" in some sense answer is probably NP-hard
<shapr> just means "is graph A a subgraph of graph B"
<rqou> yeah
<shapr> that's subgraph isomorphism
Hootch has quit [Quit: Leaving]
<rqou> but usually algorithms for decision problems also tell you how, not just yes/no :P
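A naive backtracking matcher for the decision-problem formulation rqou sketched (netlist nodes onto device sites such that every netlist edge lands on a device routing edge); exponential in the worst case, as the NP-completeness predicts, and it does report the "how" (the placement) rather than just yes/no:

    def place(net_nodes, net_edges, dev_nodes, dev_edges):
        """Map each netlist node to a distinct device site so that
        every netlist edge lands on a device edge; returns the
        mapping or None. Naive backtracking, exponential worst case."""
        dev_adj = {frozenset(e) for e in dev_edges}
        order = list(net_nodes)

        def placed_neighbors(n, mapping):
            for a, b in net_edges:
                if a == n and b in mapping:
                    yield mapping[b]
                elif b == n and a in mapping:
                    yield mapping[a]

        def extend(mapping):
            if len(mapping) == len(order):
                return mapping
            node = order[len(mapping)]
            for site in dev_nodes:
                if site in mapping.values():
                    continue
                if all(frozenset((site, s)) in dev_adj
                       for s in placed_neighbors(node, mapping)):
                    mapping[node] = site
                    if extend(mapping) is not None:
                        return mapping
                    del mapping[node]
            return None

        return extend({})

    # two connected LUTs onto a 3-site device with a routing ring:
    print(place(["lut0", "lut1"], [("lut0", "lut1")],
                [0, 1, 2], [(0, 1), (1, 2), (2, 0)]))
    # -> {'lut0': 0, 'lut1': 1}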
<shapr> I wonder if the recent darpa graph processing challenge came up with anything new?
<awygle_m> Everything is NP-hard lol
<shapr> it does seem like that
<rqou> the heuristic i like to use is "optimize <foo>" is always NP-hard lol
enriq has joined ##openfpga
<rqou> unless the problem is clearly linear programming or something like that
<awygle_m> One of my biggest gripes about my formal education is the way they treated "NP-hard" as "don't even bother"
<rqou> no, there's "the other" course that ignores np-hardness and teaches a number of useful algorithms
<rqou> but it's not a "traditional" algorithms class :P
<rqou> (cs188)
<balrog> how do you deal with NP-hard problems in reality is a worthwhile field :p
<awygle_m> That was going to be my next comment lol "and nobody in 188 explained that we were working on NP-hard problems"
<balrog> cs188 is AI?
<azonenberg_work> rqou: to be clear
<azonenberg_work> i dont think that PAR in the general case is subgraph isomorphism
<azonenberg_work> PAR on a PLD is
<azonenberg_work> but not for an ASIC
<awygle_m> balrog: yup
<azonenberg_work> as you dont have a second graph to map to
<balrog> also that recent purported P!=NP proof failed :p
<azonenberg_work> it's closer to "traveling salesman where you design the map"
<rqou> PAR for an FPGA should also be subgraph isomorphism though?
<awygle_m> azonenberg_work: wouldn't the decision problem formulation on an ASIC just be "return true"? Lol
<azonenberg_work> Lol
<azonenberg_work> If there's no size or metal layer constraints etc, yes
<azonenberg_work> :p
<rqou> no, it still has constraints
<awygle_m> I guess you have to constrain it to a size /process /cost etc
<azonenberg_work> And timing
<rqou> e.g. "we can only pay <foo> dollars" :P
<balrog> out of curiosity has anyone looked at the open source ASIC tools?
<azonenberg_work> And when I said PLD I meant CPLD/FPGA/etc
<balrog> (e.g. qflow/Qrouter)
<awygle_m> Looked at, yes. Said "Ooo cool", yes. Comprehended or used, no.
<rqou> wait a minute
<azonenberg_work> awygle_m: and yes, NP-hard means "it is impossible to create a solver that will exactly solve all instances of the problem and still runs in polynomial time"
<rqou> i just went on wikipedia and read the article about subgraph isomorphismn
<azonenberg_work> In general, you can create a solver that gives approximate solutions all of the time in polynomial time
<rqou> and it can run in linear time if the larger graph is planar
<azonenberg_work> you can usually also create a solver that gives exact solutions some of the time in polynomial time
<azonenberg_work> rqou: Which it is not going to be in the FPGA case
<rqou> "or more generally a graph of bounded expansion"
<rqou> i'm not actually sure what that means
<azonenberg_work> Also we have constraints about some nodes mapping to some subsets of other nodes, etc
<azonenberg_work> I would definitely like to explore creating an analytic placer for greenpak down the road
<rqou> intuitively i would expect that that should just make the problem easier
<azonenberg_work> it would be a fun math problem
<azonenberg_work> I just dont have time to do it right now
<azonenberg_work> Basically make an alternative to the current PAREngine class called AnalyticPAREngine or something
<azonenberg_work> that has substantially the same interface, uses the same graph classes
<azonenberg_work> but doesnt use annealing
<rqou> hmm
<azonenberg_work> awygle_m: Related issue
<azonenberg_work> Routing, in your proposed analytic PAR
<azonenberg_work> Have you looked at that?
<rqou> "we use special properties of colored graphs with multiplicity bounded by 3 to prove that 3-GI is in the deterministic-logarithmic-space complexity class L"
<awygle_m> I am attempting to design my placer to allow substantial opportunities to experiment with algorithms as well
<rqou> can we just make a device that has only three unique labels? :P :P
<azonenberg_work> Lolk
<awygle_m> azonenberg_work: some? Not nearly as much as placement.
<azonenberg_work> awygle_m: Also, are you going to make placement and routing two isolated steps?
<azonenberg_work> Or are you going to support feedback
<azonenberg_work> i.e. when you encounter a tricky situation during routing, you can go back and modify placement slightly
<shapr> seems like figure out which math problems map onto PaR can say how easy it is to distribute the problem across a bunch of computers
<azonenberg_work> shapr: it seems like the global energy minimization approach is very similar to molecular dynamics
<azonenberg_work> (except with very large numbers of atoms and strange force equations)
<awygle_m> I think the first iteration everything will be isolated but I'll try to leave room for feedback. I need to spend more time on routing generally.
<azonenberg_work> Which has well established parallel algorithms
<shapr> I work across the street from georgia tech, I could wander over and visit some of their hardware academics
<awygle_m> The traditional algorithm is basically a Dijkstra type approach, right?
<azonenberg_work> awygle_m: for what, MD? PAR?
<azonenberg_work> Routing?
<azonenberg_work> i havent actually looked into routing algorithms much
<awygle_m> Routing
<azonenberg_work> This is what i want to avoid
<awygle_m> I have the phrase "maze router" in my head and that's about it without checking my notes at home
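For the record, the classic maze router is Lee's algorithm: BFS wavefront expansion on the routing grid, then backtrace; Dijkstra is the weighted generalization with per-resource costs. A grid-level sketch (real routers work on the device routing graph, not a uniform grid):

    from collections import deque

    def maze_route(grid, src, dst):
        """Lee-style maze routing: BFS wavefront from src, then
        backtrace. grid[y][x] truthy means the resource is blocked
        (used by another net)."""
        h, w = len(grid), len(grid[0])
        prev = {src: None}
        q = deque([src])
        while q:
            x, y = q.popleft()
            if (x, y) == dst:
                path = []
                while (x, y) != src:
                    path.append((x, y))
                    x, y = prev[(x, y)]
                return [src] + path[::-1]
            for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                if 0 <= nx < w and 0 <= ny < h \
                        and not grid[ny][nx] and (nx, ny) not in prev:
                    prev[(nx, ny)] = (x, y)
                    q.append((nx, ny))
        return None   # unroutable -- the placement-feedback case

    # route around a wall of used wires on a 3x3 grid:
    print(maze_route([[0, 1, 0],
                      [0, 1, 0],
                      [0, 0, 0]], (0, 0), (2, 0)))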
<balrog> lol why'd it do that?
<azonenberg_work> note the placement of that one flipflop all the way off in the corner nowhere near any of the logic that it talks to
<azonenberg_work> balrog: because ISE
<azonenberg_work> because fuck you
<azonenberg_work> because xilinx
<balrog> LOL
<azonenberg_work> your choice :p
<balrog> does vivado do a better job? it doesn't work with xc6 anyway
<balrog> xilinx has refused to put out any patches for ISE at all
<balrog> makes me wonder what they'll do when Win7 hits EOL
<azonenberg_work> Say you have to use RHEL5 in a VM
<azonenberg_work> :p
<rqou> run it on linux?
<azonenberg_work> And yes, it still works on linux fine
<rqou> it doesn't even need RHEL5
<azonenberg_work> Vivado seems significantly better in some regards but i am just starting to use it seriously for a project
<rqou> it can use some ubuntu LTS
<azonenberg_work> i cant use it on antikernel yet b/c i dont have a vivado backend for splash
<azonenberg_work> (yet)
<balrog> I think I told some of you, the reason it fails on Win10 is because they use the microquill smartheap malloc replacement
<rqou> and most importantly it doesn't require an ancient kernel
<balrog> which does in-memory dll patching
<rqou> so containers work
<azonenberg_work> But i also havent touched antikernel in months b/c my research on bitstream RE is higher priority
<shapr> oh, antikernel was your thesis?
<azonenberg_work> shapr: yes
<azonenberg_work> i'm in the process of refactoring it and cleaning it up massively
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<azonenberg_work> but i have higher priority stuff going on, i'm also starting to gear up for a lab move in a couple of months
<azonenberg_work> looking at buying a house next spring
<azonenberg_work> so i dont want to get too many complex experimental setups going and then have to rip them up
<balrog> azonenberg_work: modularize :)
enriq has joined ##openfpga
<azonenberg_work> balrog: well the thing is more
<azonenberg_work> i am literally rebuilding the lab completely after the move
<awygle_m> If I recall the blog post you're fairly nosilarized already right?
pie_ has quit [Ping timeout: 240 seconds]
<awygle_m> ... Modularized wtf phone
<azonenberg_work> recabling the rack entirely, upgrading some other stuff
<azonenberg_work> in general reorganizing completely
<azonenberg_work> and before i can do THAT i am going to be doing construction to get a proper climate controlled workspace with cabinets bolted to the wall/floor (this is earthquake country and I don't want to lose all my gear if a small quake hits)
<shapr> what state?
<shapr> california?
<azonenberg_work> WA
<awygle_m> Hey azonenberg_work: thoughts on the required robustness model for PAR? I.E. How seriously should I take "node goes away in the middle of the computation"?
<azonenberg_work> you mean a compute node?
<awygle_m> Yeah
<azonenberg_work> I would say, that isnt something to worry about - detect it and abort then restart the job on a new, degraded cluster config
<azonenberg_work> For Splash i am considering this, because i am targeting long term nightly builds and stuff
<azonenberg_work> but if any single job loses a node, that job is restarted
<azonenberg_work> only the entire build is robust to loss of nodes
gnufan has joined ##openfpga
* awygle_m needs to learn about splash at some point
<azonenberg_work> i.e. you dont have to redo your whole nightly because the node running a single link step died
* azonenberg_work needs to finish writing it
<azonenberg_work> :p
<azonenberg_work> it's functional-ish but has some really annoying quirks and bugs
<azonenberg_work> And i never fully implemented some really-would-be-nice features
<azonenberg_work> That has, along with antikernel, been on hold due to lack of time while working on this other research
<awygle_m> Mk. The current "easy mode" model is bail and re-launch. I have a theory about how to finish the current job, just slower, but I'll just log a feature request and move on.
<azonenberg_work> Yeah
<azonenberg_work> Well the issue is, you lose the state that node had
<azonenberg_work> so unless you replicate state somehow
<azonenberg_work> you no longer have enough information to finish the PAR
<azonenberg_work> even with migration of work to other nodes
<shapr> I like optimistic parallelism with software transactional memory
<awygle_m> In the current design state is gratuitously replicated :-P I'm excited to see how this turns out, it'll either be really great or absolutely terrible
<shapr> that lets you fire off a bunch of tasks, even duplicates, and the first one to return commits a transactional change
<shapr> latecomers drop their changes and pick up a new task
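A toy of that optimistic-commit rule (not real STM, which also composes transactions): workers compute against a versioned snapshot, the first commit wins, and a latecomer detects the version bump and discards its work:

    import threading

    class OptimisticCell:
        def __init__(self, value):
            self._lock = threading.Lock()
            self.version, self.value = 0, value

        def snapshot(self):
            with self._lock:
                return self.version, self.value

        def try_commit(self, seen, new_value):
            with self._lock:
                if self.version != seen:
                    return False       # lost the race: drop our work
                self.version += 1
                self.value = new_value
                return True

    cell = OptimisticCell(0)

    def worker(delta):
        seen, x = cell.snapshot()
        cell.try_commit(seen, x + delta)   # first committer wins
        # a loser would take a fresh snapshot / new task here

    ts = [threading.Thread(target=worker, args=(d,)) for d in (1, 10)]
    [t.start() for t in ts]; [t.join() for t in ts]
    print(cell.value)   # 1, 10, or 11 -- never a lost update from stale state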
<azonenberg_work> Problem is, this leads to massive wasted effort
<azonenberg_work> its ok for producer-consumer, and this is basically how bitcoin etc work
<shapr> it can, you end up needing to tune the number of workers vs separate chunks of state
<azonenberg_work> but we all know how energy efficient blockchains are :p
<shapr> not very?
<azonenberg_work> Exactly
<azonenberg_work> again, we're dreaming of a PAR that scales hugely
<shapr> STM is extremely compositional
<shapr> unlike C programs that use threads
* shapr shrugs
<awygle_m> Have we had our monthly visitor asking about mining Etherium for September yet?
<shapr> on the other hand, a completed and working PaR that can build bitstreams in significantly less time *will* become wildly popular
<shapr> even if it's inefficient, a shorter feedback loop means more money for the company
<azonenberg_work> shapr: Yes, that is the goal
<azonenberg_work> Rapid design closure
<azonenberg_work> Enable test-driven development, continuous integration, etc with hardware-in-loop elements
<shapr> I try to keep my feedback loops less than two seconds, I doubt I'd survive designing FPGAs
<shapr> right!
<azonenberg_work> So if i have to spin up 128 m4.16xlarge instances (8192 cores, about $400/hr)
<azonenberg_work> and get a PAR done in 10 seconds
<azonenberg_work> it just cost me $1 to do that build
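Checking that arithmetic (m4.16xlarge is 64 vCPUs; the $400/hr figure is the one quoted above, not a current AWS price):

    instances, vcpus = 128, 64                  # m4.16xlarge = 64 vCPUs
    print(instances * vcpus)                    # 8192 cores
    print(400.0 / 3600 * 10)                    # ~$1.11 for a 10 s build
    # same work on one 64-core box at perfect linear scaling:
    print(instances * vcpus * 10 / vcpus / 60)  # ~21 min, cf. the "22 minutes" below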
<awygle_m> The advantage of the "naive" approach is it keeps the common case fast
<shapr> one of the companies I worked at would have LOVED to have that when we had some serious issue
<shapr> I used to work at digium, the T1 and E1 cards were all xilinx
<azonenberg_work> i mean realistically we dont need 10 second builds
<azonenberg_work> a minute or two would be fine
<azonenberg_work> what's not fine is half a day to test a change in hardware
<azonenberg_work> Or multiple days
<azonenberg_work> So being able to parallelize by even a factor of 16 or 32 beyond the current state of the art would be awesome
<azonenberg_work> But i hope to get better
<azonenberg_work> awygle_m: on that topic...
<azonenberg_work> At some point i want to explore multithreading in Yosys
<awygle_m> Yeah despite my earlier comment about goalpost moving I'm basically aimed at "do big thing fast"
<shapr> when you're losing all the phone traffic for a large multimillion dollar customer, $400 is cheap
<azonenberg_work> Once we get massively fast PAR, the obvious next step is to improve the currently non-parallel synthesis step
<azonenberg_work> shapr: exactly, but not just that
pie_ has joined ##openfpga
<awygle_m> Interesting. I was thinking about multithreading re:simulators the other day (a colleague is running a 14ms sim which takes nearly two hours). Is Yosys slow enough to benefit?
<azonenberg_work> You spent $1 on a compile that, assuming linear scaling, would have taken 22 minutes
<azonenberg_work> How much is your developer worth per hour?
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<shapr> I'm cheap, probably $60 an hour
<azonenberg_work> Still, $60/hr means 20 minutes is about $20
<shapr> yeah, exactly
<azonenberg_work> you saved $20 of engineering time with $1 of CPU time
<azonenberg_work> scale that over a year and multiple devs
<azonenberg_work> Anyway, that's the vision
<shapr> some of my friends cost $200 an hour for their employer
<azonenberg_work> We're years from achieving it :)
* azonenberg_work is probably more
<azonenberg_work> i know we charge clients more than that
<shapr> I like that vision.
<awygle_m> Oh BTW azonenberg, current PAR work is GPLv3. Is that an issue for you?
<awygle_m> Obviously it can be re licensed as long as it's just me, but I would prefer something copyleft..
<azonenberg_work> awygle_m: So my general approach is permissive licensing for as much as possible, then copyleft when i really really need it
<azonenberg_work> So for example a BSD par engine with a chip specific copyleft front end
<azonenberg_work> Also, I would prefer LGPL to GPL
<azonenberg_work> rationale is, it keeps the code you wrote open source while still facilitating integration with commercial tool suites if a vendor adopts it
<azonenberg_work> i'm not trying to use free software as a weapon to kill closed source, proprietary software isn't going to go away
<azonenberg_work> Keep the open code open and let it be useful to as many people as possible
<awygle_m> I don't have a problem with LGPL
<azonenberg_work> GPL means that, say, you can't have someone making a closed source GUI that links to our PAR
<azonenberg_work> Which significantly reduces the utility to the EDA industry as a whole IMO
<azonenberg_work> LGPL just means they have to give out source to their patches to our tool, and allow us to replace the binary with another one of our choice if we want
<azonenberg_work> Which keeps the open component open while being generally more useful
<awygle_m> Yeah, agreed. I was thinking of it as a UNIX-style text-in-text-out tool originally, in which case it wouldn't matter, but the design has changed since then. LGPL is a good choice.
enriq has joined ##openfpga
<azonenberg_work> yeah thats what gp4par is using
<azonenberg_work> because i wanted to allow e.g. silego to add a text editor to their design studio
<azonenberg_work> and click a button to build with gp4par then export a schematic to their UI
<azonenberg_work> That is much easier if you can link to the app and poke all the data structures
<azonenberg_work> vs exporting to a file and re-parsing it
<azonenberg_work> etc
<awygle_m> Mhm
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
enriq has joined ##openfpga
<awygle_m> Oh good lord you can output a "raw bitstream" file from Lattice's IDE. It contains every bit from the bitstream as an ASCII 1 or 0. Would it have killed them to use hex?
<azonenberg_work> loool
<azonenberg_work> to be fair
<azonenberg_work> this is what greenpak bitstreams look like
<azonenberg_work> and coolrunner
<azonenberg_work> but those are much smaller chips
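Undoing the ASCII encoding is at least trivial; a sketch assuming the file really is nothing but '0'/'1' characters and whitespace (filename hypothetical):

    def raw_to_bytes(path):
        bits = "".join(c for c in open(path).read() if c in "01")
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    print(raw_to_bytes("design.raw")[:16].hex())   # hex after all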
digshadow has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
pie_ has quit [Ping timeout: 248 seconds]
<rqou> arrgh i'm trying to do problem sets involving rotation matrices, but it's super confusing which convention everybody is using
<rqou> e.g. is the matrix mapping from local->global or global->local? left-multiply or right-multiply?
<rqou> add on top of this those weird ML people that use row vectors
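The disambiguating check, for what it's worth: with column vectors a local-to-global rotation applies as R @ v and the global-to-local map is just the transpose; the row-vector crowd writes v @ R.T for the same map, which is why their matrices look transposed on paper:

    import numpy as np

    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # CCW in the plane

    v_local = np.array([1.0, 0.0])
    v_global = R @ v_local                        # column vectors, left-multiply
    assert np.allclose(R.T @ v_global, v_local)   # inverse = transpose
    assert np.allclose(v_local @ R.T, v_global)   # row-vector convention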
ZipCPU|Laptop has joined ##openfpga
m_t has quit [Quit: Leaving]
<shapr> machine learning? or ML proglang?
m_t has joined ##openfpga
<awygle_m> Yeah, this bit file is just one enormous write to configuration RAM lol. So not so useful. But still fun!
<awygle_m> Between LPF, Floorplan View, and Physical View, it really shouldn't be too hard to RE this chip...
enriq__ has joined ##openfpga
enriq__ has quit [Ping timeout: 260 seconds]
enriq__ has joined ##openfpga
DocScrutinizer05 is now known as influenca
influenca is now known as influencar
digshadow has quit [Ping timeout: 248 seconds]
influencar is now known as DocScrutinizer05
enriq__ has quit [Ping timeout: 248 seconds]
digshadow has joined ##openfpga
teepee has quit [Ping timeout: 252 seconds]
ZipCPU|Laptop has quit [Ping timeout: 248 seconds]
teepee has joined ##openfpga
azonenberg_work has quit [Ping timeout: 248 seconds]
azonenberg_work has joined ##openfpga
enriq has joined ##openfpga
GenTooMan has joined ##openfpga
enriq has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
awygle_m has quit [Remote host closed the connection]
awygle_m has joined ##openfpga
awygle_m has quit [Read error: Connection reset by peer]
awygle_m has joined ##openfpga
enriq has joined ##openfpga
m_t has quit [Quit: Leaving]
pie_ has joined ##openfpga