<kc8apf> Lustre is similar but can only do a filesystem
<kc8apf> I wanted a mix of CephFS, iSCSI volumes, and S3
<azonenberg_work> yeah all i need is filesystems for now
<azonenberg_work> and i can't see needing anything else any time soon
<azonenberg_work> Right now i'm running NFS just fine, but i dont like the protocol and i have a SPOF in the server
<kc8apf> for example, if a kubernetes job asks for storage, a Ceph volume is automatically provisioned
<kc8apf> I don't remember if Lustre does multi-master
<kc8apf> Ceph relies on a quorum model
<azonenberg_work> well i guess that is something to look into once i'm done with the initial lab buildout
<azonenberg_work> Short term my current nas is sufficient to get me up and running
<azonenberg_work> And i should probably have walls and power and a floor before i do too much more...
azonenberg_work has quit [Ping timeout: 245 seconds]
emeb has quit [Quit: Leaving.]
<openfpga-github> [Glasgow] whitequark pushed 1 new commit to master: https://github.com/whitequark/Glasgow/commit/60cf959986bf7df92035a69de5b800376645e0a0
<openfpga-github> Glasgow/master 60cf959 whitequark: applet.jtag.pinout: also probe TRST# if pulldowns are detected....
<whitequark> ugh NFS
<travis-ci> whitequark/Glasgow#87 (master - 60cf959 : whitequark): The build has errored.
unixb0y has quit [Ping timeout: 252 seconds]
unixb0y has joined ##openfpga
<Bob_Dole> what's it take to make a pci host controller? can a risc-v and pci host controller fit on the ecp5 comfortably?
<Bob_Dole> pci is something I just want.
<whitequark> pci or pcie?
<Bob_Dole> pci, because a bridge chip is an option.
<Bob_Dole> pcie would be nice but a bridge chip is an option.
<Bob_Dole> (if needed at all.)
<whitequark> pci isn't really complex at all
<SolraBizna> plain old PCI is probably easier to implement from scratch than DDR4
<Bob_Dole> I thought it wasn't, thought an ice40 could implement it.
<whitequark> yeah
<whitequark> isn't it just address, data, strobes
<SolraBizna> plus a few interrupt lines and some control signals
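For a rough sense of what "just address, data, strobes" amounts to in practice, here is a minimal sketch of a conventional 33 MHz / 32-bit PCI target's port list. The signal names follow the PCI spec; the module itself and its structure are illustrative assumptions, not something from the discussion above.

    // Conventional PCI, 32-bit / 33 MHz, target-only view.
    // All control signals are active-low ("_n") and shared on the bus.
    module pci_target (
        input         pci_clk,       // 33 MHz bus clock
        input         pci_rst_n,     // RST#
        inout  [31:0] pci_ad,        // AD[31:0]: multiplexed address/data
        inout  [3:0]  pci_cbe_n,     // C/BE#[3:0]: bus command / byte enables
        inout         pci_par,       // PAR: even parity over AD and C/BE#
        input         pci_frame_n,   // FRAME#: initiator signals a transaction
        input         pci_irdy_n,    // IRDY#: initiator ready
        output        pci_trdy_n,    // TRDY#: target ready (tri-stated when not addressed)
        output        pci_devsel_n,  // DEVSEL#: target claims the transaction
        output        pci_stop_n,    // STOP#: target-requested termination
        input         pci_idsel,     // IDSEL: configuration space chip select
        output        pci_inta_n     // INTA#: level-sensitive, open-drain interrupt
    );
        // ...decode the address phase, assert DEVSEL#, then transfer one word
        // per clock while both IRDY# and TRDY# are asserted...
    endmodule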
Miyu has quit [Ping timeout: 244 seconds]
<Bob_Dole> but pci+risc-v+some sort of Memory Controller
<sorear> we know pcie on ecp5 is a thing bc lattice offers a core for it
<Bob_Dole> soft core yeah, I saw that
<sorear> idk if there’s a usable open pcie
<whitequark> litepcie? :P
<Bob_Dole> but I kinda want something that I have a chance in hell of getting SolraBizna to design. >.>
<Bob_Dole> and fit it all together logically.
<Bob_Dole> and I solder
<sorear> litepcie is neat
<sorear> so uhhhhhhhh
<sorear> how many person-years to an open tb endpoint
<Bob_Dole> tuberculosis?
<pie___> pci over tuberculosis
<SolraBizna> sourcing tuberculosis bacteria that are rated for operation at 33MHz is... difficult
<pie___> watchlist++
<zkms> i'm vaccinated against tuberculosis, can't say the same for thunderbolt.
<SolraBizna> the fastest I've seen were 18μHz
<SolraBizna> I'm sure Moore's Law will fix this eventually
<sorear> tb3 is extremely cursed but laptop usable Pcie on a fpga board would be fun maybe
<zkms> thats what m.2 is for ;p
<sorear> Then I’d need an external drive to boot from:p
<sorear> Also that imposes dimensional constraints
noobineer has joined ##openfpga
mumptai_ has joined ##openfpga
mumptai has quit [Ping timeout: 252 seconds]
rohitksingh has joined ##openfpga
<sorear> litepcie seems to have a lot of hardcoded 32s
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined ##openfpga
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined ##openfpga
lexano has quit [Ping timeout: 246 seconds]
lexano has joined ##openfpga
_whitelogger has joined ##openfpga
<SolraBizna> the datasheet says an iCE40-LP1k bitstream image is 32303 bytes long, but the .bin file I get from icepack is 32220 bytes long
<SolraBizna> why the discrepancy?
<sorear> the "bitstream" is a packet format which can omit or reorder packets in some cases
<sorear> i'm not familiar with the details but it's possible icestorm handles the packets slightly differently from icecube
<SolraBizna> hm...
<sorear> it is not the case that byte 3456 of the bitstream has an a priori determinable meaning, because you have to parse the packet structure
<SolraBizna> so, I should still be able to just plop that .bin onto my EEPROM and have it work
<sorear> should be
<SolraBizna> guess I'll find out in 5 weeks!
<SolraBizna> (this is why I normally prefer working in software...)
<SolraBizna> (this and because I'm insanely poor)
<rqou> SolraBizna: are you using arachne? there is a known 'feature' where it generates a broken comment packet
<SolraBizna> I am
<Bob_Dole> nextpnr is the future
fseidel has quit [Ping timeout: 250 seconds]
fseidel has joined ##openfpga
Kitlith_ has quit [Ping timeout: 272 seconds]
Kitlith_ has joined ##openfpga
Bike has quit [Quit: Lost terminal]
noobineer has quit [Ping timeout: 252 seconds]
luvpon has joined ##openfpga
Kitlith_ has quit [Ping timeout: 272 seconds]
<SolraBizna> should I really have a decoupling capacitor for *every* positive/negative pair of *every* complex IC?
<sorear> you mean differential I/Os?
<whitequark> probably power
<whitequark> SolraBizna: it is not necessary to have a decoupling cap for every Vcc pin, it is just a safe guideline
<sorear> yeah but power/ground doesn't come in pairs
<whitequark> sometimes it does
<SolraBizna> coincidentally, it has on every IC in my design that I've considered needing a decoupling cap for
<whitequark> SolraBizna: usually you'd place a footprint on every vcc/gnd pair
<whitequark> and then if you don't need them all, you don't populate
Kitlith_ has joined ##openfpga
<sorear> current thought: given a $5 fpga with 99 GND and 42 total VCC, is it possible to use without spending well over $5 in passives
<sorear> vcc+vccaux+vccio
<whitequark> sure
<whitequark> you won't even be able to fit them meaningfully
<whitequark> follow the mfgr guidelines
azonenberg_work has joined ##openfpga
<SolraBizna> half of my PCB is going to end up being just footprints for caps
<sensille> and when the fpga ends up only needing 100mA, what should all those caps be good for?
<sorear> well if I'm using ~half of the 197 user I/Os as 800 MT/s DDR outputs, there'll probably be quite a bit of noise current from that
<TD-Linux> the point of decoupling caps is to be close
<TD-Linux> if you pack so many in that some get pushed further away, the far ones are useless
<SolraBizna> Since I have dedicated power and ground planes, do I need a dedicated trace from each end of the decoupling cap to the corresponding pins on the IC, or is it enough to connect things to the planes (as long as the vias are close)?
<SolraBizna> (Having dedicated planes is something very new to me)
<azonenberg_work> SolraBizna: i generally run vias directly from the bga dogbone to the plane
<azonenberg_work> Then i put the cap tangent to the vias on the back of the PCB
<azonenberg_work> forming a []B shape
<SolraBizna> oh... right... because once I have a via, I have a via
<azonenberg_work> o[----]o mounting of caps has higher inductance which hurts high frequency performance
<azonenberg_work> 8[]8 is even better but is overkill for most applications
<azonenberg_work> []8 is good enough most of the time
<azonenberg_work> if you have two sets of vias just use two caps :p
<azonenberg_work> What FPGA are you using btw?
<SolraBizna> an ICE40 LP1k for this test board
<azonenberg_work> And what are you doing that needs so much ddr io?
<SolraBizna> it's not DDR IO, it was supply
<azonenberg_work> oh wait that was sorear
<azonenberg_work> sorry
<azonenberg_work> SolraBizna: anyway, in general you're best off having only the smallest (0402 or similar) caps under the fpga
<SolraBizna> (I was wondering the same thing about sorear's project though)
<azonenberg_work> the 0603-esque stuff is targeting lower frequency ranges so it can be moved further away
<azonenberg_work> Typically i put them very close to but not under the fpga
<azonenberg_work> then bigger caps can go almost anywhere
<SolraBizna> so, aiming to have a tiny cap for each supply pin, a not-as-tiny cap for each IC, and a big cap for the board is a good way to go?
<sorear> azonenberg_work: still going through the details on "how to get as much bandwidth as possible between N ecp5s in close proximity"
<azonenberg_work> sorear: what are you using the cluster for?
<azonenberg_work> SolraBizna: It depends on the fpga, read decoupling recommendations if the vendor has them
<azonenberg_work> Xilinx has optimized decoupling recommendations that don't require a cap on every pin
<sorear> weird hpc ideas
<azonenberg_work> sorear: Lol
<azonenberg_work> Any particular problem domains?
<sorear> cryptographic mostly
<azonenberg_work> machine learning? /me ducks incoming hype storm
<azonenberg_work> ooh rsa factorization?
<sorear> computing GB-sized FFTs over GF(2^255), etc
<azonenberg_work> Gigabyte sized FFTs?
<sorear> yes.
* azonenberg_work tries to think of what that's good for
<azonenberg_work> is that for ECC stuff?
<azonenberg_work> my ECC-fu is weak
<sorear> non-ECC zero-knowledge stuff
<azonenberg_work> Either way sounds interesting
<azonenberg_work> i would love to have somebody make a Deep Crack equivalent that can factorize rsa keys
<sorear> but at this point it's more of a "motivating example" than a "design target"
<azonenberg_work> How far do you think we are from a public break of rsa-1024?
<azonenberg_work> or a 1024-bit DH group precomputation?
<sorear> i don't think this machine will be the most cost-effective way to attack rsa
<azonenberg_work> (I assume TLAs have been doing it for years but it's never been publicly demoed)
<sorear> for attacking RSA with state of the art algorithms, you want something with a lot of ~200 bit adders/multipliers, which you can build with FPGAs but GPUs are probably a better bet
<azonenberg_work> Hmm
<azonenberg_work> so is this just for tinkering then?
<azonenberg_work> or did you have a problem in mind that FPGA would be effective for
<sorear> see above "non-ECC zero-knowledge stuff"
<sorear> i have no idea why you brought up rsa
<azonenberg_work> i thought i remembered there being fun FFT based algorithms for breaking RSA
<azonenberg_work> but that's way beyond my level of cryptographic knowledge
* azonenberg_work will laugh if this thing ends up just being used to mind bitcoins
<azonenberg_work> mine*
<SolraBizna> now I'm just trying to figure out what's what in "[]B"
<azonenberg_work> SolraBizna: Vias to the side of the cap footprints
<sorear> i mean it won't *just* be used to mine bitcoins but if I build it that's probably what it will be doing when I run out of project ideas
<azonenberg_work> [] is cap, B/8 is the vias
<SolraBizna> [ and ] are the ends?
<azonenberg_work> illustrates better than i can do in ascii art
<sorear> assuming there is at least one coin where doing so is marginally profitable (the machine exists, but it needs to be powered)
<azonenberg_work> sorear: the only time i've ever mined anything was dogecoins, and it was just to stress-test a flaky machine
<azonenberg_work> SolraBizna: I normally do option C
<SolraBizna> [ and ] are the sides
<azonenberg_work> Yeah
<SolraBizna> got it now
<azonenberg_work> except i make it even closer, so the via disks are tangent to the cap pads
<azonenberg_work> then the trace just fills in the gaps
<sorear> (heating a house with a mining rig uses ~2.5x as much primary energy as heating a house with oil or a heat pump, as a consequence of Carnot, it ain't free)
<azonenberg_work> sorear: it beats a resistive heater, though
<azonenberg_work> If you live in a location that has cheap electricity and is too cold to use a heat pump
<azonenberg_work> some folks in Scandinavia have done that iirc
<sorear> the specific house that I am in is heated by an oil burner, which turns 100% of the energy content of the oil into house heat
<azonenberg_work> not 100%
<azonenberg_work> Some is lost out the chimney
<azonenberg_work> But a high fraction
<azonenberg_work> Combustion heat is only 100% efficient transfer in a closed system where you don't vent the exhaust anywhere
<SolraBizna> breathing the exhaust would increase the efficiency of the heater
<azonenberg_work> SolraBizna: exactly
<azonenberg_work> You obviously run the exhaust through a heat exchanger but you cant get 100% of the combustion energy absorbed
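Rough numbers behind the ~2.5x figure, assuming a thermal power plant at roughly 40% efficiency, an oil burner at roughly 85%, and a heat pump COP of about 3 (these efficiency values are illustrative assumptions, not from the log):

    \[
    \text{mining rig (resistive load):}\quad \frac{1\ \mathrm{kWh_{heat}}}{0.40} \approx 2.5\ \mathrm{kWh_{primary}}
    \]
    \[
    \text{oil burner:}\quad \frac{1\ \mathrm{kWh_{heat}}}{0.85} \approx 1.2\ \mathrm{kWh_{primary}}
    \qquad
    \text{heat pump:}\quad \frac{1\ \mathrm{kWh_{heat}}}{3 \times 0.40} \approx 0.8\ \mathrm{kWh_{primary}}
    \]

That gives roughly a 2-3x primary-energy penalty for resistive (or compute) heat versus either alternative, which is where the ~2.5x claim comes from.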
<SolraBizna> I can't believe I resisted HDLs for so long
<SolraBizna> I blame video games
<SolraBizna> When Bob_Dole dragged me kicking and screaming into the world of FPGAs, I seriously considered manually working out the logic
<azonenberg_work> lolol
<azonenberg_work> meanwhile here i am thinking of doing pcb design in HDL
<azonenberg_work> So i dont have to ever see a schematic again
<SolraBizna> dooo eeet
<azonenberg_work> structural verilog description of a PCB (including generate loops, etc)
<azonenberg_work> synthesized to a kicad netlist
<azonenberg_work> import to pcbnew and go to town
<azonenberg_work> and it's nontrivial when it comes time to do things like figure out refdes for PCB elements
<azonenberg_work> since long hierarchical hdl instance names dont map well to silkscreen
<azonenberg_work> I have done small scale PoC's
<azonenberg_work> i designed a verilog IP for a LTC3374-based buck converter
<SolraBizna> use sorear's giant ECP5 array to run a machine learning algorithm to make better names
<azonenberg_work> :p
<azonenberg_work> And i actually made a pic12 based board using an early draft of the flow
<azonenberg_work> ERC/DRC is nontrivial too
<azonenberg_work> i wanted to add a lot of metadata to the component designs, but it would have massively increased complexity of creating a part
<azonenberg_work> Things like doing Vih/Vil sanity checks on all digital connections
<azonenberg_work> Making sure Vdd for a part is within safe limits
<azonenberg_work> Doing all of the engineering for that was a pain and i just didnt have the time with all the other stuff on my plate
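A minimal sketch of the flow being described, with made-up part and net names: the board is a structural netlist whose leaf modules stand in for footprints, generate loops work exactly as they do in RTL, and a downstream tool would flatten this into a KiCad netlist and assign reference designators.

    // Leaf "parts": ports are physical pins; bodies are empty because the
    // part is just a symbol + footprint in this flow.
    module buck_3v3    (input vin, output vout, input gnd); endmodule
    module small_mcu   (input vdd, input gnd, output tx, input rx); endmodule
    module header_3pin (input p1, input p2, input p3); endmodule
    module cap_0402    (input a, input b); endmodule

    // The board itself is pure structural instantiation.
    module board_top;
        wire vin_12v0, vdd_3v3, gnd;
        wire uart_tx, uart_rx;

        buck_3v3    psu (.vin(vin_12v0), .vout(vdd_3v3), .gnd(gnd)); // 12V -> 3.3V
        small_mcu   mcu (.vdd(vdd_3v3), .gnd(gnd), .tx(uart_tx), .rx(uart_rx));
        header_3pin dbg (.p1(uart_tx), .p2(uart_rx), .p3(gnd));

        // One decoupling cap footprint per supply pin, via a generate loop.
        genvar i;
        generate
            for (i = 0; i < 4; i = i + 1) begin : decap
                cap_0402 c (.a(vdd_3v3), .b(gnd));
            end
        endgenerate
    endmodule

The refdes problem mentioned above shows up immediately: the caps come out with instance names like board_top.decap[2].c, which have to be mapped onto C-numbers short enough to fit on silkscreen.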
gnufan has quit [Remote host closed the connection]
<sorear> finally found lattice TN1068
<azonenberg_work> sorear: so i dont entirely agree with that
<azonenberg_work> in particular modern MLCCs are such that for almost all frequency bands
<azonenberg_work> you are better off using a larger cap in a given package size
<azonenberg_work> typically 0.47 uF vs 0.1 for 0402, and 4.7 uF vs 1 uF for 0603
<SolraBizna> the research I found said that you should use the largest cap that fits the footprint and there's no advantage to smaller ones / putting multiple different ones right next to each other
<azonenberg_work> (that note is from 2004, over the past 14 years capacitor design has come quite a long way)
<azonenberg_work> SolraBizna: correct
<azonenberg_work> keep in mind voltage derating though
rohitksingh has quit [Quit: Leaving.]
<azonenberg_work> a super high cap in a small footprint may not buy you anything under DC bias
<azonenberg_work> My last research indicated 0.47 uF and 4.7 uF X5R/X7R were the sweet spots
<azonenberg_work> for typical FPGA power rails
<azonenberg_work> The xilinx decoupling guidelines are well written and reasoned
<azonenberg_work> (Just don't pull exact numbers of caps out for other chips obviously)
<sorear> device if built would have roughly 500K supply pins to decouple, so minimizing the total cost of capacitors is a consideration
<SolraBizna> o_o
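A rough sense of scale for that, assuming one 0402 MLCC per supply pin at reel pricing somewhere between $0.004 and $0.01 each (the unit prices are assumptions for illustration, not from the log):

    \[
    500{,}000 \times \$0.004 \approx \$2{,}000
    \qquad
    500{,}000 \times \$0.01 \approx \$5{,}000
    \]

Even at reel prices, a cap per pin lands in the thousands of dollars, which is why trimming the caps-per-pin count matters at this scale.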
<azonenberg_work> sorear: how many ecp5s are you planning to use?
<azonenberg_work> and how many logic cells each?
<azonenberg_work> And have you thought about physical form factor yet?
<sorear> nominally 10,000 x 25K each ($50k in FPGA parts)
<sorear> current thought on physical form factor is "2-3 square meters of PCB, split between [TBD] boards in a roughly cubical box"
<sorear> since people don't make 3 square meter PCBs, splitting is necessary, but the details are mostly tbd
<azonenberg_work> I would rack mount it personally
<azonenberg_work> also do you have $50K to spend on this? :p
<azonenberg_work> Also, what does the price per LUT come out to for the FPGAs?
<sorear> $.0002/LUT4
<azonenberg_work> So 500 LUT4/$?
<sorear> 5000
<azonenberg_work> oops, missed a zero
<sorear> 250M total, for about the qty-1 BOM cost of the biggest US+
<azonenberg_work> comparing... xc7a100t is $109 on digikey, 15850 slices of 4 LUT6s or 101,440 logic cells by Xilinx's marketing numbers
<azonenberg_work> Which comes out to about 1000 LUT4/$
<sorear> for a fair comparison, count unofficial capacity, because I am
<azonenberg_work> you mean using fused chips to full capacity?
<sorear> yes
<azonenberg_work> The xc7a75t is $92.61 for 101440 LCs or 1095 LC/$
<azonenberg_work> xc7a15t is $27.93 for 52160 LCs or 1867 LC/$
<azonenberg_work> But you also have to consider the xilinx parts probably clock faster and have more block ram etc
<azonenberg_work> also i doubt a 7a100t needs four times the caps of a 25k ecp5
<azonenberg_work> and certainly not 4x the PCB real estate
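Consolidating the per-dollar numbers above (prices and logic-cell counts as quoted in the log; Xilinx "logic cells" are the LUT4-equivalent marketing figure, so the comparison is roughly apples to apples):

    \[
    \text{ECP5-25K:}\ \frac{25{,}000\ \text{LUT4}}{\$5} = 5{,}000\ \text{LUT4}/\$
    \qquad
    \text{XC7A100T:}\ \frac{101{,}440\ \text{LC}}{\$109} \approx 930\ \text{LC}/\$
    \]
    \[
    \text{XC7A75T:}\ \frac{101{,}440\ \text{LC}}{\$92.61} \approx 1{,}095\ \text{LC}/\$
    \qquad
    \text{XC7A15T:}\ \frac{52{,}160\ \text{LC}}{\$27.93} \approx 1{,}867\ \text{LC}/\$
    \]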
<sensille> "fused" chips?
<azonenberg_work> unless f/oss tools NOW (vs soon) are a priority, 7 series is probably worth considering on that metric alone
<sorear> right, clock is a complication i know about but haven't attempted to control for in any way
<azonenberg_work> sorear: i would just make all the links source synchronous
<azonenberg_work> dont even attempt a global clock
<azonenberg_work> put oscillators and buffers every few fpgas
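A minimal sketch of one such source-synchronous link, with hypothetical module and signal names: the transmit clock travels alongside the data, so neighboring FPGAs never need a shared global clock. A real design would forward the clock through a DDR output primitive (e.g. the ECP5's ODDRX1F) rather than driving it directly, to keep clock-to-data skew controlled.

    // Transmit side: register the data and forward the clock with it.
    module ss_tx (
        input            clk,     // local transmit clock
        input      [7:0] d,
        output reg [7:0] tx_d,    // length-matched with tx_clk on the PCB
        output           tx_clk   // forwarded clock
    );
        always @(posedge clk) tx_d <= d;
        assign tx_clk = clk;      // placeholder; use a DDR output primitive in practice
    endmodule

    // Receive side: capture with the forwarded clock, then cross into the
    // local clock domain (e.g. through an async FIFO, not shown).
    module ss_rx (
        input            rx_clk,
        input      [7:0] rx_d,
        output reg [7:0] q
    );
        always @(posedge rx_clk) q <= rx_d;
    endmodule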
<sorear> it's much less interesting for xilinx because *other people have done xilinx*
<azonenberg_work> sorear: also consider that no matter what fpga you use
<azonenberg_work> if you are buying five-digit volumes the price will come waaaay down
<azonenberg_work> So 10K $5 chips may cost you $15K or something, not $50K
<SolraBizna> making a high-speed clock sync across a 1.5x1.5x1.5 cube would be ... hard
<azonenberg_work> Also what network topology did you have in mind?
<azonenberg_work> Also consider thermal dissipation... I would not build it as a cube
<sorear> azonenberg_work: nearest neighbors only
<azonenberg_work> sure but how many dimensions?
<azonenberg_work> 2D? 3D? 4D?
<azonenberg_work> My recommendation would be vertically mounted blades in rack mounted modules of some sort
<sensille> are you still pondering the 10k-chip-array?
<azonenberg_work> a fan tray every couple units of blades
<sensille> what is the application?
<SolraBizna> immerse the whole thing
<azonenberg_work> i designed a smaller scale version of this (two backplanes side by side in 3U with a 48->12V DC power supply, two ethernet switches, two management cards, and 16 compute nodes)
<sorear> sorry, how is the backplane oriented relative to the rack?
<azonenberg_work> sorear: normal of the backplane points to the front of the rack
<azonenberg_work> compute blades are vertical and plug into the backplane
<azonenberg_work> 21 blades in 3U
<sorear> so the backplane is a skinny rectangle
<azonenberg_work> Yeah
<azonenberg_work> 3U x 160mm eurocard form factor
<azonenberg_work> My design called for one VRM blade and two 10-card backplanes side by side
<azonenberg_work> just so the pcbs would be smaller and easier to work with
<azonenberg_work> each backplane had 8 compute nodes, a management card, and 13-port 1/10G ethernet switch (9 gig ports to the other cards on the backplane, one 10G front panel port, and three 1G front panel ports because i had transceivers left over)
<azonenberg_work> i did most of the pcb for that and finished the backplane pcb design but never made either of them
<azonenberg_work> i did some mechanical mockups of the architecture though to confirm the things fit
<sorear> sensille: yes; mostly a design exercise, a bit of "I would use this for fiddling with algorithms"
<sensille> i imagine it will be very hard to beat a big xilinx device with this, really needs to be a specialized algorithm
<sorear> there was a specific thing I was fiddling with last year that doesn't fit in internal memory on an xcvu9p and is badly bottlenecked on I/O if you try to do it with DDR4 (and is similarly bottlenecked on CLMUL units on SKL)
<sensille> so you want tons of DDR3 instead?
<sorear> yes
<azonenberg_work> doesnt fit in a vu9p??
<sensille> like... monero?
<azonenberg_work> oh dear
<sorear> makes 3 passes over about 128GB of data
<sensille> linearly?
<sorear> the vu9p only has 90MB of SRAM
<sorear> not quite, but close to
<sensille> rmw cycles?
<sorear> yes
<sensille> or reading from one DRAM and writing to another?
<azonenberg_work> hbm?
<sensille> keeping the flow in one direction would be great
<azonenberg_work> also predictable
<azonenberg_work> optimized ddr controller that prefetches
<azonenberg_work> or hard to tell in advance?
<sensille> or read a good chunk into an internal cache and alternate big read/write chunks
<azonenberg_work> that too
<sensille> and do it fast enough so you don't need refresh cycles :)
<sorear> don't make me remember the details of the memory access pattern
<azonenberg_work> sorear: lol well that is very important if you're memory bound
<azonenberg_work> something like, say, doing convolutions on a large 2D array is an optimal case that's very easy to tweak things for (I did a lot of GPU tuning for such things)
<travis-ci> whitequark/Glasgow#88 (master - 60cf959 : whitequark): The build has errored.
<sorear> azonenberg_work: i trust myself that I spent a lot of time optimizing the memory access pattern and couldn't get it under a minute with the four x72 DDR4 interfaces on aws f1, I'm only asking that you do the same
<sorear> i think the problem might have been that the FFT has a working set larger than the 90MB internal memory (it's the Gao-Mateer "additive FFT" over a finite field, which works with some but not all of the standard FFT cache optimization techniques)
<sorear> anyway I would like to stress that THIS IS NOT A YAK SHAVE
<sorear> i'm designing it because it's there, the fact that my project last year would have used it is non-causal
<sensille> (sorry)
<azonenberg_work> sorear: clearly you need an XCVU9000P
<azonenberg_work> with 90 GB of block RAM
<sorear> unfortunately I do not have $50MM
<azonenberg_work> The die is round so it fits in a 12" wafer and is in FFG65536 package
<azonenberg_work> i dont even want to speculate what yield on a die like that would be like :p
<azonenberg_work> oh and you better have half a petabyte of RAM to run P&R for it...
<sorear> you're obviously not using this for mass production, so you just accept that each die is unique and ship it with a p&r database
<azonenberg_work> lolol
<sorear> (@jangray posted a photo of an xcvu9p with weird perspective which caused me to spend most of 2016 thinking it *was* a wafer-sized chip and interposers were magic)
<azonenberg_work> where?
* azonenberg_work isnt in the mood to search his entire tweet stream
<azonenberg_work> also fwiw if you were going to make such a big chip
<azonenberg_work> what you'd probably do is fill like a 12" wafer with a giant interposer
<azonenberg_work> Then put known-good xcvu+ logic dies onto it
<azonenberg_work> That way your P&R db only has to handle the occasional SLL that doesn't work
<azonenberg_work> And routing within each xcvu+ module is normal
<sorear> ok my timing is a bit off
<sorear> i have handled the board on the lower left
<sorear> on the top board, the heatsink and fan look by perspective to be about a square foot
m_t has joined ##openfpga
<azonenberg_work> The VCU118 heatsink is large
<azonenberg_work> But it isn't that big
<azonenberg_work> it looks to be about the size of the pcie x16 connector?
luvpon has quit [Ping timeout: 252 seconds]
<azonenberg_work> Which is 89 mm according to the pcie spec
<azonenberg_work> or a 3.5 x 3.5 inch heatsink
<azonenberg_work> roughly "beefy x86 CPU" sized heatsink iirc
<azonenberg_work> i've been around vcu118s but dont have one in front of me right now
<azonenberg_work> atm i'm working on a puny little ac701 :p
Prf_Jakob has quit [Quit: Spoon!]
Prf_Jakob has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
<whitequark> azonenberg_work: how do you feel about 3.5" floppies
<whitequark> i wonder how much you can stuff on one with say 128b/130b instead of the braindead MFM encoding and also using some proper ECC
<whitequark> unfortunately shingled recording isn't going to happen because of erase heads...
rohitksingh has joined ##openfpga
mmicko has joined ##openfpga
mmicko has quit [Quit: leaving]
rqou has quit [Remote host closed the connection]
rqou has joined ##openfpga
genii has joined ##openfpga
rohitksingh has quit [Ping timeout: 276 seconds]
<gruetzkopf> ooh
<gruetzkopf> scaling that to LS120 disks..
Bike has joined ##openfpga
genii has quit [Read error: Connection reset by peer]
rohitksingh has joined ##openfpga
wpwrak has quit [Quit: Leaving]
wpwrak has joined ##openfpga
<azonenberg_work> whitequark: havent touched one in years and wouldn't miss it :p
<azonenberg_work> the access speed would still be super slow because the RPM is necessarily low
<azonenberg_work> due to mechanical issues
<azonenberg_work> whitequark: that being said i would laugh if you tried to write a custom FPGA-based floppy drive controller using modern tech
<whitequark> azonenberg_work: guess what am i doing right now
GenTooMan has joined ##openfpga
<whitequark> azonenberg_work: this thing uses
<whitequark> actual TTL logic
<whitequark> as in
<whitequark> pullups and open drain......
<azonenberg_work> wait
<azonenberg_work> you're implementing ECC and 128/130 in discrete ttl logic??
<azonenberg_work> how big is this board gonna be???
<whitequark> no i mean
<whitequark> the floppy interface
<whitequark> it's WEIRD
<whitequark> all signals are active low and open drain and have massive pullups and sink capability
<whitequark> like THIRTY TWO MILLIAMPS PER PIN
<whitequark> i could run SEVERAL FPGAS ON EACH PIN CURRENT ALONE
rohitksingh has quit [Quit: Leaving.]
azonenberg_work has quit [Ping timeout: 245 seconds]
rohitksingh has joined ##openfpga
m4ssi has joined ##openfpga
rohitksingh has quit [Ping timeout: 260 seconds]
Miyu has joined ##openfpga
rohitksingh has joined ##openfpga
m_t_ has joined ##openfpga
m_t has quit [Ping timeout: 245 seconds]
m_t_ has quit [Read error: Connection reset by peer]
carl0s has joined ##openfpga
m4ssi has quit [Quit: Leaving]
ZipCPU has quit [Ping timeout: 252 seconds]
kuldeep_ has quit [Read error: Connection reset by peer]
Bob_Dole has quit [Ping timeout: 250 seconds]
Bob_Dole has joined ##openfpga
<openfpga-github> [Glasgow] whitequark pushed 2 new commits to master: https://github.com/whitequark/Glasgow/compare/60cf959986bf...e82ecabb6219
<openfpga-github> Glasgow/master e82ecab whitequark: access: allow hinting reads for dramatically improved performance....
<openfpga-github> Glasgow/master aa80b03 whitequark: gateware.fx2: replace "non-streaming" FIFOs with "auto-flush" FIFOs....
<travis-ci> whitequark/Glasgow#89 (master - e82ecab : whitequark): The build has errored.
Maylay has quit [Quit: Pipe Terminated]
carl0s has quit [Quit: Page closed]
Maylay has joined ##openfpga
lovepon has joined ##openfpga
rohitksingh has quit [Ping timeout: 252 seconds]
rohitksingh has joined ##openfpga
rohitksingh has quit [Quit: Leaving.]
<Bob_Dole> http://miaowgpu.org/ subset of the GCN ISA, looks like in verilog, but xilinx centric. how pissy would AMD get if you made an actual gpu with it?
<sorear> probably less so than if you did the same thing with mali
<Bob_Dole> I suppose. if it's just being used for very-low-performance-embedded type things.. it wouldn't be competing with AMD's products, but doing the same thing with mali would compete with arm's
<sorear> my impression of miaow is that it's fairly low on productization
<Bob_Dole> just not a lot of options for doing what I want: pair some gpu without an NDA to a risc-v core. that probably means some DIY solution, but doing it without being able to reuse a lot from some other project probably isn't viable
<Bob_Dole> just enough to be able to run, say, MATE on it, without being horrifyingly sluggish
rofl__ has joined ##openfpga
<sorear> ah, I was mixing up miaow with one of the other projects, miaow seems a bit more mature but not "drop it in to your project" ready
rofl_ has quit [Ping timeout: 250 seconds]
<Bob_Dole> yeah. it's got the graphics stuff stripped out, and it's xilinx centric
<Bob_Dole> and is meant for a narrow range of compute-research stuff.. BUT, the fact that it has a lot done means it has some advantages over starting from scratch
<Bob_Dole> (I think.)
<sorear> what's the target environment anyway
<Bob_Dole> kind of where smarttops were at.
<Bob_Dole> I'm a bit worried by how prevalent javascript has gotten vs my last trials of lower-speed cpus for that kinda role. A 400mhz UltraSPARC IIi was fast enough; the major drawbacks for me back then were the 8MB Rage II+DVD only supporting 800x600 and video playback being untenable
<sorear> i have a big advantage here in that i can't stand video
<Bob_Dole> I rarely watch it
<Bob_Dole> but I had had a pentium mmx run video smoother than that system somehow..
<Bob_Dole> my pentium mmx is now Gone, parents lost the thinkpad, so I can't test that anymore, but I somehow image the UltraSPARC is a better example of how a RISC-V would turn out.
<Bob_Dole> s/image/imagine/
<Bob_Dole> but having an x86 coprocessor for handling shit that doesn't work right is An Idea.
<Bob_Dole> I should buy a new Super Skt7 mobo and see if I can't find my 500mhz K6-2 for more Testing.
mumptai_ has quit [Quit: Verlassend]
Bob_Dole has quit [Read error: No route to host]
Bob_Dole has joined ##openfpga
<SolraBizna> sometimes I want to make a GPU
<SolraBizna> then I remember that I don't really understand how modern GPUs work, and that I'm bad at math
<SolraBizna> then I don't want to anymore
<Bob_Dole> hi
<Bob_Dole> SolraBizna, look at the price of Socket 7 motherboards. look at that datasheets exist for socket7 cpus, and then at the performance of K6-2s. I think there might be a Product there.. if 66mhz is something you might consider touching.
<Bob_Dole> and I do some market-research first
<SolraBizna> it's a little bit too AC for me
<Bob_Dole> ...50?
<SolraBizna> technically the NTSC stuff was too AC for me, honestly not sure how I muddled through it
azonenberg_work has joined ##openfpga
<Bob_Dole> well, here, you have many Smart people. and you have my money.
<sorear> azonenberg_work: it occurred to me last night that while $needsaname isn't the most useful for sieving, it's great for block Lanczos, could do the matrix step of the current open RSA record (768-bit) in a couple days
<azonenberg_work> sorear: nice
<azonenberg_work> serious q btw... do you have any plans to actually build the thing? :p
<azonenberg_work> budget wise i mean
<sorear> if the project gets that far along and I find work, the budget is serious
<balrog> what's keeping rsa-1024 from being publicly cracked sooner? :)
<sorear> moore's law and the size of academic budgets
<sorear> the sha1 break was a bit of an anomaly, that would have been about $1M without google's subsidy
<sorear> google *probably could* demonstrate a rsa1024 factorization now but they haven't, why?