#m-labs on 2016-03-28 — irc logs at freenode.irclog.whitequark.org

2015-03-04 14:45 sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs

00:08 <GitHub18> [artiq] whitequark pushed 3 new commits to master: https://git.io/vVJEN

00:08 <GitHub18> artiq/master f4e6b18 whitequark: compiler: implement kernel constant attributes....

00:08 <GitHub18> artiq/master ca7463a whitequark: compiler: do not write back kernel constant attributes....

00:08 <GitHub18> artiq/master 507ad96 whitequark: coredevice: add some kernel_constant_attributes specifications.

00:18 <bb-m-labs> build #248 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/248

00:20 <bb-m-labs> build #495 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/495 blamelist: whitequark <whitequark@whitequark.org>

00:39 <whitequark> sb0: should I enable -ffast-math mode for FP operations emitted by ARTIQ?

00:39 <whitequark> without this e.g. LLVM refuses to optimize the result of x*0.0 to zero

00:44 <GitHub167> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVJa4

00:44 <GitHub167> artiq/master 418f0a5 whitequark: compiler: mark loads of kernel constant attributes as load invariant....

00:55 <bb-m-labs> build #249 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/249

01:00 <bb-m-labs> build #70 of artiq-win64-test is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/70

01:02 <bb-m-labs> build #496 of artiq is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/496

01:19 sandeepkr_ has quit [Ping timeout: 264 seconds]

01:43 _rht has joined #m-labs

01:49 fengling has joined #m-labs

02:05 kuldeep has quit [Ping timeout: 252 seconds]

02:20 ylamarre has joined #m-labs

02:22 kuldeep has joined #m-labs

02:25 ylamarre has quit [Client Quit]

02:46 <whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all

02:46 <bb-m-labs> build #21 forced

02:46 <bb-m-labs> I'll give a shout when the build finishes

02:48 <bb-m-labs> build #112 of conda-lin64 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/112

02:48 <bb-m-labs> build #54 of conda-win32 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-win32/builds/54

02:48 <bb-m-labs> build #85 of conda-win64 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-win64/builds/85

02:48 <bb-m-labs> build #21 of conda-all is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/conda-all/builds/21

02:50 <whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all

02:50 <bb-m-labs> build #22 forced

02:50 <bb-m-labs> I'll give a shout when the build finishes

02:50 <GitHub131> [conda-recipes] whitequark pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/c51a7fa674379e86d438032d8dc700b8796a9d4d

02:50 <GitHub131> conda-recipes/master c51a7fa whitequark: llvmlite-artiq: bump.

02:51 <bb-m-labs> build #113 of conda-lin64 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/113

02:53 <GitHub20> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVJ1D

02:53 <GitHub20> artiq/master 1d8b0d4 whitequark: compiler: mark FFI functions as ModRef=Ref using TBAA metadata....

02:54 <bb-m-labs> build #55 of conda-win32 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-win32/builds/55

02:54 <bb-m-labs> build #86 of conda-win64 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-win64/builds/86

02:54 <bb-m-labs> build #22 of conda-all is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-all/builds/22

02:55 <sb0> whitequark, yes? why not?

02:56 <whitequark> sb0: then if you get a NaN or +Inf, the results can be unpredictable

02:56 <whitequark> or -Inf or -0
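For reference, these IEEE 754 cases are why `x*0.0` cannot be folded to `0.0` without fast-math. LLVM's fast-math flags (such as `nnan`, `ninf` and `nsz`) tell the optimizer it may ignore exactly these cases. A minimal Python sketch (Python floats are IEEE 754 doubles); `times_zero` is just an illustrative name:

```python
import math


def times_zero(x):
    # Under strict IEEE 754 semantics this is NOT always +0.0,
    # which is why a compiler may not fold it without fast-math flags.
    return x * 0.0


for x in (3.5, float("nan"), float("inf"), -2.0):
    print(f"{x!r} * 0.0 = {times_zero(x)!r}")
```

NaN and infinity both produce NaN, and a negative finite operand produces `-0.0`, so only the first case behaves like "zero".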

02:57 <whitequark> moreover (this is a separate flag) LLVM will also do algebraically equivalent transformations such as reassociation, which can dramatically change precision in some cases
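A small illustration of how reassociation changes results: `(a + b) + c` and `a + (b + c)` are equal algebraically but not in floating point. The constant is arbitrary, chosen only to make the effect visible:

```python
big = 1e16  # above 2**53, so adjacent doubles here are 2.0 apart

left = (big + 1.0) - big   # the 1.0 is rounded away when added to big first
right = (big - big) + 1.0  # reassociated order: the 1.0 survives

print(left, right)
```

The two expressions differ by the full value of the small term, which is the kind of precision change a reassociating optimizer is allowed to introduce.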

02:57 <whitequark> but this wins us almost a 2x gain on PulseRateDDS...

02:57 <sb0> how fast is it now?

02:57 <whitequark> 182us per batch of two writes

02:58 <whitequark> will be 100us

02:59 <whitequark> Unfortunately this isn't feasible because our attribute writeback machinery allows outside code to grab a pointer to any object in the graph, which is exactly what it ought to do

03:00 <sb0> but this writeback code only reads the objects, no?

03:01 <whitequark> no way to tell LLVM that.

03:02 <whitequark> besides it doesn't really do constant *propagation* through globals

03:02 <sb0> and if you don't tell it anything?

03:02 <whitequark> instead it pulls in the entire global and replaces its value

03:02 <whitequark> it doesn't do anything.

03:02 <whitequark> since it assumes there can be writes

03:02 <sb0> yes, but if you get the objects without telling LLVM about it?

03:03 <whitequark> impossible

03:04 <whitequark> if I don't tell LLVM about objects, it will mangle them beyond recognition. in fact there won't *be* any objects

03:05 <sb0> then disable attribute writeback.

03:05 <bb-m-labs> build #250 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/250

03:05 <whitequark> completely?

03:06 <sb0> maybe leave the code there, activable with some flag, if that's easy

03:06 <sb0> otherwise yes

03:07 <sb0> my version of attribute writeback worked by adding a bunch of RPCs that read the values at the end of the code

03:07 <sb0> would such an implementation also cause the problem?

03:10 <bb-m-labs> build #497 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/497 blamelist: whitequark <whitequark@whitequark.org>

03:12 <whitequark> sb0: partly

03:13 <whitequark> if all constant fields, including those from setattr_device, are actually marked as constant, then that implementation shouldn't present any problems

03:13 <whitequark> we'll also need to bring back the concept of a fire-and-forget RPC... hrm

03:14 <whitequark> and, compared to the current situation, it will inflate code size

04:35 <sb0> whitequark, maybe disable writeback and have only explicit host attribute writes. setattr(self, "name", value) as RPC...

04:42 <whitequark> sb0: ok. without attribute writeback that dds batch takes 41us

04:42 <whitequark> still not quite as fast as it could be due to some LLVM FP silliness

04:43 <whitequark> without that silliness it would be 27us

04:53 <whitequark> sb0: if we ported the OR1K backend to LLVM 3.6 then we could keep attribute writeback.

04:54 <whitequark> there are fewer than a dozen changes to the backend interfaces between 3.5 and 3.6 and none of them are functional

04:54 <whitequark> so this should take less than a day

04:58 <whitequark> actually, we could probably go right to 3.9 without much hassle.

04:58 <whitequark> well, 3.8, last released one...

04:58 <whitequark> this also has the advantage that we can use upstream llvmlite

05:07 rohitksingh has joined #m-labs

05:31 <sb0> 27us for a 2x batch? that's pretty good

05:34 rohitksingh has quit [Ping timeout: 260 seconds]

05:40 <sb0> whitequark, ok, try going for the llvm upgrade. but do not break 1.0.

06:05 rohitksingh has joined #m-labs

07:08 rohitksingh has quit [Quit: Leaving.]

07:36 sandeepkr_ has joined #m-labs

08:39 <sb0> whitequark, so the square wave minimum period is still 1.34us?

08:39 <sb0> wasn't it 1us at some point?

08:44 <sb0> whitequark, the time I get per batch of two DDS writes is 300us, not 182

08:44 <sb0> artiq 1.0rc1+46.g1d8b0d4

09:01 sandeepkr_ has quit [Ping timeout: 244 seconds]

09:10 <rjo> sb0: ~1.3 µs is what i remember it being for a long time.

09:11 <sb0> rjo, ok.

09:12 <sb0> btw the problem of getting a conservative number of free entries in an async FIFO is interesting, as it can be used to optimize the DRTIO protocol

09:13 <sb0> maintain a local copy of the number of free entries, when it is >0, write blindly, otherwise ask the remote side for entries

09:14 <sb0> underflows and other errors can be detected locally (similar to how they are now) since there will be time sync with the remote
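The free-entry scheme described above is essentially credit-based flow control. A minimal Python sketch, assuming an in-order link where a free-count query reflects all earlier writes; `RemoteFifo` and `CreditWriter` are hypothetical names, not DRTIO code:

```python
from collections import deque


class RemoteFifo:
    """Toy model of the receiving FIFO on the far end of the link."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()

    def free(self):
        return self.depth - len(self.entries)

    def push(self, item):
        if len(self.entries) >= self.depth:
            raise OverflowError("remote FIFO overflow")
        self.entries.append(item)

    def drain(self, n):
        # The remote side consumes up to n entries.
        for _ in range(min(n, len(self.entries))):
            self.entries.popleft()


class CreditWriter:
    """Sender keeping a conservative local count of free remote entries."""

    def __init__(self, query_remote_free):
        self.query_remote_free = query_remote_free  # slow round trip
        self.credits = 0  # local copy; never exceeds the true free count
        self.queries = 0  # number of round trips performed

    def try_write(self, send, item):
        if self.credits == 0:
            # Out of credits: ask the remote side how much room it has.
            self.credits = self.query_remote_free()
            self.queries += 1
            if self.credits == 0:
                return False  # genuinely full; caller retries later
        # Credits available: write blindly, no round trip needed.
        send(item)
        self.credits -= 1
        return True


fifo = RemoteFifo(depth=4)
writer = CreditWriter(fifo.free)
for i in range(4):
    writer.try_write(fifo.push, i)
print(writer.queries)  # one round trip for four writes
```

The local count stays conservative because between queries it only decreases, while the remote side can only free entries. A real link would additionally have to make sure no writes are still in flight when a query is answered.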

09:14 <sb0> I think DRTIO should be a pretty separate core design, except for RTLink... it won't share much code with the current one

09:15 <larsc> just like an async fifo

09:16 <sb0> yeah, that's the basic idea, but the implementation is very different

09:17 sandeepkr_ has joined #m-labs

09:21 <sb0> rjo, it seems the spinboxes absolute min/max values can be exceeded when dragging the sliders

09:21 <sb0> and then things go out of sync if you touch the spinboxes and then the sliders

09:22 <rjo> ah yes. sounds possible. could you file a bug so that i don't forget?

09:24 <rjo> sb0: could you add unary minus support to value_bits_sign()? i don't know where to dig to determine the correct behavior in that case.

09:24 <rjo> does a unary minus actually change the signedness?
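One way to reason about this question is to compute the value range directly: negation can produce negative values, so the result has to be signed, and it generally needs one extra bit (for an unsigned operand, and for the most negative signed value, whose negation does not fit). The helpers below are hypothetical and are not Migen's actual `value_bits_sign`:

```python
def value_range(nbits, signed):
    """Inclusive range of an nbits-wide signal."""
    if signed:
        return -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    return 0, (1 << nbits) - 1


def signed_width(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


def neg_bits_sign(nbits, signed):
    """(width, signed) needed to hold -x exactly for an nbits-wide x."""
    lo, hi = value_range(nbits, signed)
    # Negation swaps and flips the bounds; the result can be negative,
    # so it is always signed.
    return signed_width(-hi, -lo), True


print(neg_bits_sign(8, False))  # unsigned operand
print(neg_bits_sign(8, True))   # signed operand: -(-128) == 128 needs 9 bits
```

An implementation could instead keep the operand width for signed inputs and accept wrap-around on the most negative value; that is a design choice, not something this sketch decides.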

09:26 <rjo> sb0: DRTIO: but this fire-and-forget way of doing writes only works for the output and does not work for the other errors that can only be detected at the phy, right?

09:26 <sb0> yes, overflow and busy - same problem as before...

09:26 <sb0> I'll look into that. are you working on the JESD204 signal generator?

09:27 <rjo> a bit. yes. i did some sketches and some math on what it can conceivably do.

09:29 <rjo> from the design i can pretty much reverse engineer what AD does inside the DDSes and why certain things are as they are...

09:35 <rjo> sb0: and there is a nasty bug in the simulator with Mux() and signals wider than one bit as the selector IIRC. but i had worked around it a while ago and i don't remember the details.

09:44 <sb0> rjo, ok, can you file issues for those things?

10:01 sb0 has quit [Quit: Leaving]

10:03 sb0 has joined #m-labs

11:49 <sb0> rjo, apparently there is some secret content on Ben's wiki, e.g. http://wiki.phys.ethz.ch/figwiki/amc_1ghz_awg#wp1jesd204b_soft-core_dds_on_dac

11:50 <sb0> not too secret, you just need to register an account

12:13 <rjo> the good old phys wiki

12:17 <sb0> I think the AMC standalone mode proposed here doesn't make much sense

12:18 <sb0> where is the power supply going to be? what about protecting the board with an enclosure? where will the extra SFP go, on the already crowded front panel?

12:20 <sb0> rjo, btw, since xilinx had the bright idea to remove the phase detectors from the IOSERDES in 7-series, we might have to halve the max data rate on the backplane

12:20 <sb0> unless we can assume that, once started, the clock/data timing relationship won't vary enough to cause trouble.

12:21 <sb0> might be actually ok

12:22 <rjo> i would have to read up on that xapp again to comment on that.

12:22 <rjo> what speed would be un-halved?

12:24 <rjo> as i see it, amc standalone would basically be a very minimal amc infrastructure. yes: with power supply, potentially enclosure, sfp.

12:24 <sb0> 1200Mbps -> 600Mbps between MCH and AMC

12:25 <sb0> per lane

12:25 <rjo> for spartan6 with that quad oversampling, that would be 1050M/4, right?

12:25 <sb0> we can of course use the transceivers there, as Greg suggests, which obviously have a phase detector

12:25 <sb0> and are much faster.

12:26 <sb0> for Spartan6/Oxford hardware, there is a phase detector, so you can run at ~1Gbps

12:26 <sb0> I can take care of this if you want, since I've already done it for HDMI

12:26 <rjo> are the ones on the milldown on transceivers or standard io?

12:26 <sb0> the Spartan-6 IOSERDES (standard IO)

12:27 <rjo> then what is the quad oversampling from that xapp note needed for?

12:27 <sb0> what xapp note?

12:27 <sb0> with the spartan-6 phase detector, there is no oversampling at all

12:28 <rjo> xapp1064. ah. that is indeed 1050M.

12:28 <rjo> no. not that one. that's source synchronous.

12:29 <sb0> maybe the phase detector is not necessary

12:29 <sb0> you can just scan the delays and note if you're able to get a valid data stream, then just go in the middle of the working range

12:29 <sb0> and stay there
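The scan-and-centre calibration described above can be sketched as follows, assuming some hypothetical per-tap check has already produced a pass/fail list (`results[tap]`). The sketch picks the middle of the longest passing window and does not handle a window that wraps around the end of the delay line:

```python
def pick_delay(results):
    """Return the tap at the centre of the longest run of passing taps.

    results[i] is True if a valid data stream was seen with delay tap i.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes the last run.
    for tap, ok in enumerate(list(results) + [False]):
        if ok and run_start is None:
            run_start = tap
        elif not ok and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_len == 0:
        raise ValueError("no working delay tap found")
    return best_start + (best_len - 1) // 2


scan = [False, False, True, True, True, True, True, False, False, True]
print(pick_delay(scan))  # centre of taps 2..6
```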

12:30 <sb0> this is more likely to work on 7-series, which have calibrated delays

12:30 <sb0> whereas on the s6... the delays are actually a very fast ring oscillator

12:30 <sb0> uncalibrated

12:31 <sb0> FWIW, we don't recalibrate the DDR3 delays, and it seems stable

12:31 <rjo> how do they reconstruct the clock for spartan6 ioserdes?

12:31 <sb0> there is no clock reconstruction possible with the ioserdes

12:32 <sb0> you receive a clock which is phase-locked with the data, but you don't know the phase

12:32 <sb0> ...well, if you do 4x oversampling, you can implement a digital PLL that will do some form of clock reconstruction

12:33 <sb0> this is what is used in some 12Mbps USB PHYs

12:33 <sb0> sampling at 48MHz

12:33 <sb0> with the 48MHz asynchronous to the data, and the DPLL fixes it up
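The 4x-oversampling idea can be illustrated with a deliberately simplified recovery: histogram where transitions fall among the four sample phases, then sample at the phase opposite the edges. This is a block-wise phase picker, not a real DPLL (which would track the phase continuously to follow a frequency offset, as with an asynchronous 48MHz sampler); `recover_bits` is a hypothetical name:

```python
def recover_bits(samples, oversampling=4):
    """Pick a sampling phase from an oversampled stream and decimate."""
    edge_hist = [0] * oversampling
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_hist[i % oversampling] += 1
    # Transitions cluster at one phase; sample half a bit away from it.
    edge_phase = max(range(oversampling), key=lambda p: edge_hist[p])
    sample_phase = (edge_phase + oversampling // 2) % oversampling
    return sample_phase, samples[sample_phase::oversampling]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
# 4x oversampled, with the bit edges offset by one sample from phase 0
samples = [0] + [b for b in bits for _ in range(4)]
phase, recovered = recover_bits(samples)
print(phase, recovered)
```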

12:36 <sb0> but since we have this fancy backplane, we can send the clock to the AMCs, and then the receiver only has to determine the phase, not the complete clock

12:42 <rjo> but in general we don't have the clock.

12:45 <sb0> what do you mean?

12:46 <rjo> for many non-backplane versions there will only be the rx tx pair.

12:47 <sb0> yes, in that case you need a transceiver, or do the slow 4x oversampling + DPLL trick

12:48 <rjo> then what needs to be designed anyway are a) a 4x+DPLL or 7 series transceiver version, b) the one for the milldown spartan6 transceivers.

12:49 <sb0> you can use the IOSERDES for b

12:49 <sb0> and send a clock

12:49 <rjo> and you are saying that c) ioserdes with 7 series for the M-Labs ARTIQ HW is something that should be done as well?

12:50 <rjo> i thought the transceivers were fixed and you can't use those pads as standard logic.

12:50 <sb0> once we have one IOSERDES the other ones are semi-trivial. similar to another IOSERDES RTIO PHY

12:50 <sb0> transceivers have dedicated pads yes, but AFAIK their backplane also has links on regular IO

12:51 <rjo> on that adapter board design it seemed to be very little additional io

12:52 <sb0> the kc705 adapter?

12:52 <sb0> hmm

12:52 <rjo> yes.

12:54 <sb0> I think that in general we should prefer IOSERDES over transceivers. they are less messy, magical and proprietary, and less of a pain to use

12:54 <rjo> my guess is that in the long run we will want/need/be forced to use transceivers whether we like it or not. and it would be nice to be able to generally fall back to the reconstructed clock and to not worry about speed limitations.

12:54 <sb0> IOSERDES are more portable too

12:55 <sb0> note that using a transceiver reconstructed clock requires an off-chip PLL/VCXO in many cases

12:56 <rjo> isn't the portability already disproven by the removal of the phase detector between 6 and 7 series?

12:56 <rjo> yes but no additional link.

12:56 <sb0> we can simulate the s6 phase detector by using 2x oversampling in the IOSERDES, with minimal code modifications
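Emulating a phase detector with 2x oversampling is commonly done with an Alexander (bang-bang) phase detector: one sample at each bit centre and one at each expected transition. A sketch of just the early/late decision, assuming the samples alternate data, edge, data, edge, ...; the function names and the "+1 = late" convention are choices made here, not taken from the HDMI core:

```python
def bang_bang(prev_data, edge, data):
    """Early/late decision for one bit pair.

    +1: sampling clock is late (edge sample already shows the new bit)
    -1: sampling clock is early (edge sample still shows the old bit)
     0: no transition between the two data samples, so no information
    """
    if prev_data == data:
        return 0
    return 1 if edge == data else -1


def phase_error(samples):
    """Sum of decisions over a 2x stream laid out as d0, e1, d1, e2, d2, ..."""
    total = 0
    for n in range(2, len(samples), 2):
        total += bang_bang(samples[n - 2], samples[n - 1], samples[n])
    return total


print(phase_error([0, 1, 1, 0, 0]))  # edge samples already switched: late
print(phase_error([0, 0, 1, 1, 0]))  # edge samples not yet switched: early
```

In a receiver, the sign of the accumulated error would typically step a delay tap one way or the other, which is the role a hardware phase detector plays.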

12:57 <sb0> the first version of my HDMI core did that, because the phase detector is only available on differential IOs that were not possible with my hacky adapter

12:57 <rjo> i don't have the strongest opinion on this. but we will invest a lot into the transceivers anyway.

12:58 <sb0> transceivers are different on each fpga family, and the hundreds of obscure parameters they have change.

12:59 <sb0> in fact, instantiating a transceiver in migen breaks the normal Python function call syntax, which is limited to 255 arguments

12:59 <sb0> the workaround is to put them in a dict and use **kwargs
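The dict workaround looks like this in plain Python. `instance` below is a hypothetical stand-in, not Migen's `Instance`; it only mirrors Migen's prefix convention for keyword names. The attribute and port names are illustrative and the values are not a working transceiver configuration. (CPython lifted the 255-argument limit in 3.7, but building the settings as data is arguably cleaner regardless.)

```python
def instance(of, **kwargs):
    """Hypothetical stand-in for an HDL instance constructor.

    Keyword names carry a prefix, as in Migen's Instance convention:
    "p_" for parameters, "i_"/"o_" for input/output ports.
    """
    params = {k[2:]: v for k, v in kwargs.items() if k.startswith("p_")}
    ports = {k: v for k, v in kwargs.items() if not k.startswith("p_")}
    return of, params, ports


# Collect the (potentially hundreds of) settings in a dict first ...
gtx_kwargs = {"p_RXOUT_DIV": 2, "p_TXOUT_DIV": 2}
for lane in range(4):
    # placeholder attribute names, generated programmatically
    gtx_kwargs["p_PLACEHOLDER_ATTR_%d" % lane] = 0
gtx_kwargs.update(i_RXUSRCLK="rx_clk", o_TXOUTCLK="tx_clk")

# ... then expand them with ** instead of spelling out every keyword.
of, params, ports = instance("GTXE2_CHANNEL", **gtx_kwargs)
print(of, len(params), sorted(ports))
```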

13:00 <rjo> that hassle seems to be on par with the details of implementing the iodelay interface, ioserdes changes, master/slave pin pair limitations, phase detector intrinsics or unavailability, oversampling, lack of clock reconstruction, speed limitations, etc.

13:01 <rjo> one would hope to manage the arguments in a dict and not as a big instantiation.

13:01 <sb0> iodelay and phase detectors are rather simple things

13:01 <sb0> there are no master/slave pin pair limitations, each differential input has a master and a slave

13:01 <rjo> and then the gearbox, the scrambling/encoding, framing symbols.

13:02 <sb0> yes. better have those items as open source components (which are available in my HDMI core) than obscure transceiver features that _will not_ work and _will_ be a pain in the arse to debug

13:03 <sb0> it's not even that hard or performance-critical, and I wonder why Xilinx has those as hard-blocks

13:03 <sb0> it just makes things more complicated imo

13:10 <rjo> well. i am perfectly fine with abstaining because i have never implemented or used the transceivers myself.

13:10 <rjo> but we should really consider all factors here.

13:11 <sb0> there are valid reasons for using transceivers, but the fact that they contain the data encoding logic is not one of them

13:11 <rjo> isn't 600 Mbit something that we might actually sustainably saturate pretty quickly with our cpu? that would not be smart.

13:12 <rjo> a drtio write might be something like 200 bit. 400 bit for a pulse.

13:13 <rjo> depending on how smart we are with the protocol.

13:13 <rjo> we certainly saturate it with dma very soon.
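Back-of-the-envelope numbers for this estimate, using the halved 600 Mbit/s per lane and the ~200/400-bit message sizes guessed above. Framing and line-coding overhead are ignored, so real rates would be lower:

```python
LINK_RATE = 600e6  # bit/s per lane, the halved backplane rate discussed above


def max_event_rate(bits_per_event, link_rate=LINK_RATE):
    """Upper bound on events per second, ignoring framing/encoding overhead."""
    return link_rate / bits_per_event


for name, bits in (("write", 200), ("pulse", 400)):
    rate = max_event_rate(bits)
    print(f"{name}: {rate / 1e6:.1f} M/s, one every {1e9 / rate:.0f} ns")
```

That is roughly one write every 333 ns or one pulse every 667 ns per lane at best.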

13:14 <rjo> oh. and i don't even know whether the SFP transceivers work fine at low frequency.

13:14 <sb0> they do

13:15 <sb0> minimum data rate is some hundred Mbps iirc

13:15 <sb0> or, you mean the fiber PHY? this I don't know

13:15 <rjo> yes.

13:16 <sb0> but I'm not suggesting IOSERDES for SFP. the case for transceivers is pretty clear there.

13:17 <sb0> how fast can one SFP go?

13:17 <rjo> an 8b10b 1.25 Gbit transceiver needs to work down to something like 125 MHz but i don't know how steep the dc correction edge is below that.
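The 125 MHz figure follows from 8b10b's maximum run length of five identical bits: the slowest pattern the code can produce alternates runs of five, a period of ten bit times. A tiny helper for the arithmetic (it only gives the lowest fundamental, not the spectrum of real traffic):

```python
def lowest_fundamental_hz(line_rate_bps, max_run_length):
    """Lowest fundamental of a line code with bounded run length.

    The slowest square wave the code can produce alternates runs of
    max_run_length identical bits, i.e. a period of 2 * max_run_length bits.
    """
    return line_rate_bps / (2 * max_run_length)


print(lowest_fundamental_hz(1.25e9, 5) / 1e6, "MHz")  # 8b10b at 1.25 Gbit/s
```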

13:17 <rjo> i think 10 GBit on a SFP+ is doable. let me check.

13:18 <rjo> yep.

13:18 <sb0> without fancy (but jittery) signal encoding on the link?

13:19 <rjo> pretty sure.

13:19 <rjo> that is one electrical pair.

13:20 <rjo> one optical wavelength.

13:20 <sb0> how do they modulate the laser that fast? kerr cell?

13:20 <rjo> no. plain VCSEL

13:21 <sb0> ok. sounds good

13:32 <rjo> iirc that speed was a pretty hard barrier when they built the first transceivers. they could not get 10 Gbit with 8b10b working. that barrier was one reason for 64b66b
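The arithmetic behind this remark: 8b10b uses 10 line bits per 8 data bits (20% of the line rate is overhead), while 64b66b uses 66 per 64 (about 3%). Below, 10.3125 Gbaud is the standard 10GBASE-R line rate; the rest is plain arithmetic, not a claim about any particular transceiver:

```python
def payload_rate(line_rate_bps, data_bits, code_bits):
    """Useful bit rate after a data_bits-to-code_bits line code."""
    return line_rate_bps * data_bits / code_bits


def line_rate_needed(payload_bps, data_bits, code_bits):
    """Line rate required to carry payload_bps through the same code."""
    return payload_bps * code_bits / data_bits


print(payload_rate(1.25e9, 8, 10))      # 8b10b on a 1.25 Gbaud line
print(payload_rate(10.3125e9, 64, 66))  # 64b66b on the 10GBASE-R line rate
print(line_rate_needed(10e9, 8, 10))    # 10 Gbit/s payload over 8b10b
```

Carrying 10 Gbit/s of payload with 8b10b would need a 12.5 Gbaud line, versus 10.3125 Gbaud with 64b66b.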

13:33 <sb0> we can probably run them at e.g. 6Gbit-ish

13:34 <sb0> may make things simpler in the standalone digital box - we could use a low-end fpga

13:35 <rjo> artix with transceivers for the box?

13:35 <sb0> yes, or maybe spartan6 even

13:36 <sb0> btw, do you know that the 3 smallest artix have the exact same silicon die? the only limitation is on the total LUT/BRAM count that vivado will accept to use

13:37 <sb0> and those are placed anywhere on the chip, so I think that if you rewrite the bitstream header you can run a 50 bitstream on a 15 chip

13:43 <rjo> nice.

13:44 <rjo> but they probably had to do something like that. the artix things seemed really cheap to me and maintaining the entire fab line for two more silicons might not be worth it.

13:50 <rjo> hmm. it would make sense to get a good number for the sustained throughput needed in the pulse shaping wideband rf case. the superconducting labs will probably need a lot.

15:00 <GitHub2> [artiq] jordens pushed 1 new commit to master: https://git.io/vVUHQ

15:00 <GitHub2> artiq/master 049bd11 Robert Jordens: scanwidget: handle min, max, suffix (closes #352)

15:01 <rjo> when i triggered orders in november it was fine.

15:01 <rjo> ignore that

15:12 <bb-m-labs> build #251 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/251

15:16 <bb-m-labs> build #498 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/498 blamelist: Robert Jordens <rj@m-labs.hk>

15:45 fengling has quit [Ping timeout: 240 seconds]

16:55 <GitHub110> [artiq] sbourdeauducq pushed 1 new commit to release-1: https://git.io/vVTOF

16:55 <GitHub110> artiq/release-1 b04b5c8 Robert Jordens: scanwidget: handle min, max, suffix (closes #352)

17:35 <rjo> sb0: ack. i was about to do that as well. could you quickly check that the top of the text of the scanwidget is not cut off on your machine?

18:09 <whitequark> sb0: >the time I get per batch of two DDS writes is 300us, not 182

18:09 <whitequark> 182us is what my test returns, not the PulseRateDDS one

19:01 <whitequark> mhm, it crashes

19:13 bb-m-labs has quit [Quit: buildmaster reconfigured: bot disconnecting]

19:13 <GitHub144> [buildbot-config] whitequark pushed 1 new commit to master: https://github.com/m-labs/buildbot-config/commit/85eaa802841da70aabadd91e9255b58d5fdf3a26

19:13 <GitHub144> buildbot-config/master 85eaa80 whitequark: Pass `-v` to `python -m unittest`.

19:13 bb-m-labs has joined #m-labs

19:19 <whitequark> hrm. yeah, I went overzealous on TBAA.

19:57 <GitHub75> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVT79

19:57 <GitHub75> artiq/master 6f5332f whitequark: compiler: allow flagging syscalls, providing information to optimizer....

20:09 <bb-m-labs> build #252 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/252

20:10 <bb-m-labs> build #499 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/499 blamelist: whitequark <whitequark@whitequark.org>

20:20 stekern has quit [Ping timeout: 246 seconds]

20:21 stekern has joined #m-labs

21:45 <GitHub46> [artiq] whitequark pushed 3 new commits to master: https://git.io/vVkOt

21:45 <GitHub46> artiq/master f31249a whitequark: Commit missing parts of 6f5332f8.

21:45 <GitHub46> artiq/master 1038f13 whitequark: compiler: allow specifying per-function "fast-math" flags....

21:45 <GitHub46> artiq/master 3ed852e whitequark: Commit missing parts of 1d8b0d46.

21:46 mumptai has quit [Quit: Verlassend]

21:47 sandeepkr_ has quit [Ping timeout: 264 seconds]

21:56 <bb-m-labs> build #253 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/253

22:01 <bb-m-labs> build #71 of artiq-win64-test is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/71

22:03 <bb-m-labs> build #500 of artiq is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/500

22:33 _rht has quit [Quit: Connection closed for inactivity]