sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<GitHub18> [artiq] whitequark pushed 3 new commits to master: https://git.io/vVJEN
<GitHub18> artiq/master f4e6b18 whitequark: compiler: implement kernel constant attributes....
<GitHub18> artiq/master ca7463a whitequark: compiler: do not write back kernel constant attributes....
<GitHub18> artiq/master 507ad96 whitequark: coredevice: add some kernel_constant_attributes specifications.
<bb-m-labs> build #248 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/248
<bb-m-labs> build #495 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/495 blamelist: whitequark <whitequark@whitequark.org>
<whitequark> sb0: should I enable -ffast-math mode for FP operations emitted by ARTIQ?
<whitequark> without this e.g. LLVM refuses to optimize the result of x*0.0 to zero
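Editor's note: a minimal sketch of why a compiler following strict IEEE 754 semantics cannot fold x*0.0 to zero. It uses plain Python floats (which are IEEE 754 doubles on common platforms) rather than anything from ARTIQ or LLVM; the helper name `mul_zero` is made up for illustration.

```python
import math

def mul_zero(x: float) -> float:
    """Multiply by zero using ordinary IEEE 754 double arithmetic."""
    return x * 0.0

nan_case = mul_zero(float("nan"))   # NaN propagates, result is not 0.0
inf_case = mul_zero(float("inf"))   # inf * 0 is an invalid operation, gives NaN
neg_case = mul_zero(-1.0)           # result is -0.0, whose sign bit differs from +0.0

print(math.isnan(nan_case), math.isnan(inf_case), math.copysign(1.0, neg_case))
```

Only when the optimizer may assume that NaN, infinities and signed zeros do not matter is replacing the product with a constant zero valid.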
<GitHub167> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVJa4
<GitHub167> artiq/master 418f0a5 whitequark: compiler: mark loads of kernel constant attributes as load invariant....
<bb-m-labs> build #249 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/249
<bb-m-labs> build #70 of artiq-win64-test is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/70
<bb-m-labs> build #496 of artiq is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/496
sandeepkr_ has quit [Ping timeout: 264 seconds]
_rht has joined #m-labs
fengling has joined #m-labs
kuldeep has quit [Ping timeout: 252 seconds]
ylamarre has joined #m-labs
kuldeep has joined #m-labs
ylamarre has quit [Client Quit]
<whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all
<bb-m-labs> build #21 forced
<bb-m-labs> I'll give a shout when the build finishes
<bb-m-labs> build #112 of conda-lin64 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/112
<bb-m-labs> build #54 of conda-win32 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-win32/builds/54
<bb-m-labs> build #85 of conda-win64 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-win64/builds/85
<bb-m-labs> build #21 of conda-all is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/conda-all/builds/21
<whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all
<bb-m-labs> build #22 forced
<bb-m-labs> I'll give a shout when the build finishes
<GitHub131> [conda-recipes] whitequark pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/c51a7fa674379e86d438032d8dc700b8796a9d4d
<GitHub131> conda-recipes/master c51a7fa whitequark: llvmlite-artiq: bump.
<bb-m-labs> build #113 of conda-lin64 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/113
<GitHub20> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVJ1D
<GitHub20> artiq/master 1d8b0d4 whitequark: compiler: mark FFI functions as ModRef=Ref using TBAA metadata....
<bb-m-labs> build #55 of conda-win32 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-win32/builds/55
<bb-m-labs> build #86 of conda-win64 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-win64/builds/86
<bb-m-labs> build #22 of conda-all is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-all/builds/22
<sb0> whitequark, yes? why not?
<whitequark> sb0: then if you get a NaN or +Inf, the results can be unpredictable
<whitequark> or -Inf or -0
<whitequark> moreover (this is a separate flag) LLVM will also do algebraically equivalent transformations such as reassociation, which can dramatically change precision in some cases
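Editor's note: a small illustration of the reassociation concern, assuming IEEE 754 doubles. The two functions are algebraically identical but round differently; the names and values are chosen for the example only.

```python
def sum_left(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c

def sum_right(a: float, b: float, c: float) -> float:
    """Evaluate a + (b + c), an order a reassociating optimizer might pick."""
    return a + (b + c)

big, small = 1e17, 1.0

left = sum_left(big, -big, small)    # the large terms cancel first, so small survives
right = sum_right(big, -big, small)  # small is absorbed by -big before the cancellation

print(left, right)
```

Here the first ordering keeps the small term and the second loses it entirely, which is the kind of precision change the reassociation flag permits.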
<whitequark> but this wins us almost a 2x gain on PulseRateDDS...
<sb0> how fast is it now?
<whitequark> 182us per batch of two writes
<whitequark> will be 100us
<sb0> Unfortunately this isn't feasible because our attribute writeback machinery allows outside code to grab a pointer to any object in the graph, which is exactly what it ought to do
<sb0> but this writeback code only reads the objects, no?
<whitequark> no way to tell LLVM that.
<whitequark> besides it doesn't really do constant *propagation* through globals
<sb0> and if you don't tell it anything?
<whitequark> instead it pulls in the entire global and replaces its value
<whitequark> it doesn't do anything.
<whitequark> since it assumes there can be writes
<sb0> yes, but if you get the objects without telling LLVM about it?
<whitequark> impossible
<whitequark> if I don't tell LLVM about objects, it will mangle them beyond recognition. in fact there won't *be* any objects
<sb0> then disable attribute writeback.
<bb-m-labs> build #250 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/250
<whitequark> completely?
<sb0> maybe leave the code there, activable with some flag, if that's easy
<sb0> otherwise yes
<sb0> my version of attribute writeback worked by adding a bunch of RPCs that read the values at the end of the code
<sb0> would such an implementation also cause the problem?
<bb-m-labs> build #497 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/497 blamelist: whitequark <whitequark@whitequark.org>
<whitequark> sb0: partly
<whitequark> if all constant fields, including those from setattr_device, are actually marked as constant, then that implementation shouldn't present any problems
<whitequark> we'll also need to bring in the concept of a fire-and-forget RPC back... hrm
<whitequark> and, compared to the current situation, it will inflate code size
<sb0> whitequark, maybe disable writeback and have only explicit host attribute writes. setattr(self, "name", value) as RPC...
<whitequark> sb0: ok. without attribute writeback that dds batch takes 41us
<whitequark> still not quite as fast as it could be due to some LLVM FP silliness
<whitequark> without that silliness it would be 27us
<whitequark> sb0: if we ported the OR1K backend to LLVM 3.6 then we could keep attribute writeback.
<whitequark> there are fewer than a dozen changes to the backend interfaces between 3.5 and 3.6 and none of them are functional
<whitequark> so this should take less than a day
<whitequark> actually, we could probably go right to 3.9 without much hassle.
<whitequark> well, 3.8, last released one...
<whitequark> this also has the advantage that we can use upstream llvmlite
rohitksingh has joined #m-labs
<sb0> 27us for a 2x batch? that's pretty good
rohitksingh has quit [Ping timeout: 260 seconds]
<sb0> whitequark, ok, try going for the llvm upgrade. but do not break 1.0.
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
sandeepkr_ has joined #m-labs
<sb0> whitequark, so the square wave minimum period is still 1.34us?
<sb0> wasn't it 1us at some point?
<sb0> whitequark, the time I get per batch of two DDS writes is 300us, not 182
<sb0> artiq 1.0rc1+46.g1d8b0d4
sandeepkr_ has quit [Ping timeout: 244 seconds]
<rjo> sb0: ~1.3 µs is what i remember it being for a long time.
<sb0> rjo, ok.
<sb0> btw the problem of getting a conservative number of free entries in an async FIFO is interesting, as it can be used to optimize the DRTIO protocol
<sb0> maintain a local copy of the number of free entries, when it is >0, write blindly, otherwise ask the remote side for entries
<sb0> underflows and other errors can be detected locally (similarly to how they are now) since there will be time sync with the remote
<sb0> I think DRTIO should be a pretty separate core design, except for RTLink... it won't share much code with the current one
<larsc> just like an async fifo
<sb0> yeah, that's the basic idea, but the implementation is very different
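Editor's note: the "local copy of the number of free entries" idea can be sketched as a credit counter. This is a software model under stated assumptions (a single writer, a remote side that only ever frees entries), not the DRTIO design; `RemoteFIFO` and `CreditedWriter` are hypothetical names.

```python
from collections import deque

class RemoteFIFO:
    """Stand-in for the FIFO on the remote side (hypothetical model)."""
    def __init__(self, depth: int):
        self.depth = depth
        self.entries = deque()

    def free_entries(self) -> int:
        return self.depth - len(self.entries)

    def push(self, item) -> None:
        if len(self.entries) >= self.depth:
            raise OverflowError("remote FIFO overflow")
        self.entries.append(item)

    def pop(self):
        return self.entries.popleft()

class CreditedWriter:
    """Writes blindly while local credits remain, otherwise asks the remote."""
    def __init__(self, remote: RemoteFIFO):
        self.remote = remote
        self.credits = 0   # conservative: never larger than the true free space
        self.queries = 0   # number of round trips spent asking the remote

    def write(self, item) -> bool:
        if self.credits == 0:
            self.queries += 1
            self.credits = self.remote.free_entries()
            if self.credits == 0:
                return False          # no space; the caller retries later
        self.credits -= 1
        self.remote.push(item)        # blind write, no per-write handshake
        return True

fifo = RemoteFIFO(depth=4)
writer = CreditedWriter(fifo)
results = [writer.write(i) for i in range(5)]   # the fifth write finds no space
fifo.pop()                                      # remote consumes one entry
retry = writer.write(99)                        # succeeds after a fresh query
```

The local count stays conservative because it only decreases between queries, while the remote side can only free more space in the meantime.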
sandeepkr_ has joined #m-labs
<sb0> rjo, it seems the spinboxes' absolute min/max values can be exceeded when dragging the sliders
<sb0> and then things go out of sync if you touch the spinboxes and then the sliders
<rjo> ah yes. sounds possible. could you file a bug so that i don't forget?
<rjo> sb0: could you add unary minus support to value_bits_sign()? i don't know where to dig to determine the correct behavior in that case.
<rjo> does a unary minus actually change the signedness?
<rjo> sb0: DRTIO: but this fire-and-forget way of doing writes only works for the output and does not work for the other errors that can only be detected at the phy, right?
<sb0> yes, overflow and busy - same problem as before...
<sb0> I'll look into that. are you working on the JESD204 signal generator?
<rjo> a bit. yes. i did some sketches and some math on what it can conceivably do.
<rjo> from the design i can pretty much reverse engineer what AD does inside the DDSes and why certain things are as they are...
<rjo> sb0: and there is a nasty bug in the simulator with Mux() and signals wider than one bit as the selector IIRC. but i had worked around it a while ago and i don't remember the details.
<sb0> rjo, ok, can you file issues for those things?
sb0 has quit [Quit: Leaving]
sb0 has joined #m-labs
<sb0> rjo, apparently there is some secret content on Ben's wiki, e.g. http://wiki.phys.ethz.ch/figwiki/amc_1ghz_awg#wp1jesd204b_soft-core_dds_on_dac
<sb0> not too secret, you just need to register an account
<rjo> the good old phys wiki
<sb0> I think the AMC standalone mode proposed here doesn't make much sense
<sb0> where is the power supply going to be? what about protecting the board with an enclosure? where will the extra SFP go, on the already crowded front panel?
<sb0> rjo, btw, since xilinx had the bright idea to remove the phase detectors from the IOSERDES in 7-series, we might have to halve the max data rate on the backplane
<sb0> unless we can assume that, once started, the clock/data timing relationship won't vary enough to cause trouble.
<sb0> might be actually ok
<rjo> i would have to read up on that xapp again to comment on that.
<rjo> what speed would be un-halved?
<rjo> as i see it, amc standalone would basically be a very minimal amc infrastructure. yes: with power supply, potentially enclosure, sfp.
<sb0> 1200Mbps -> 600Mbps between MCH and AMC
<sb0> per lane
<rjo> for spartan6 with that quad oversampling, that would be 1060M/4, right?
<sb0> we can of course use the transceivers there, as Greg suggests, which obviously have a phase detector
<sb0> and are much faster.
<sb0> for Spartan6/Oxford hardware, there is a phase detector, so you can run at ~1Gbps
<sb0> I can take care of this if you want, since I've already done it for HDMI
<rjo> are the ones on the milldown on transceivers or standard io?
<sb0> the Spartan-6 IOSERDES (standard IO)
<rjo> then what is the quad oversampling from that xapp note needed for?
<sb0> what xapp note?
<sb0> with the spartan-6 phase detector, there is no oversampling at all
<rjo> xapp1064. ah. that is indeed 1050M.
<rjo> no. not that one. that's source synchronous.
<sb0> maybe the phase detector is not necessary
<sb0> you can just scan the delays and note if you're able to get a valid data stream, then just go in the middle of the working range
<sb0> and stay there
<sb0> this is more likely to work on 7-series, which have calibrated delays
<sb0> whereas on the s6... the delays are actually a very fast ring oscillator
<sb0> uncalibrated
<sb0> FWIW, we don't recalibrate the DDR3 delays, and it seems stable
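Editor's note: the "scan the delays and go in the middle of the working range" procedure could look roughly like this. The `link_ok` callback, which would set a delay tap and report whether a valid data stream is received, is an assumption, as is the tap count; the sketch does not handle a working window that wraps around the end of the tap range.

```python
def pick_delay_tap(link_ok, num_taps: int):
    """Return the center of the longest run of working taps, or None if none work."""
    best_start, best_len = None, 0
    run_start = None
    for tap in range(num_taps + 1):          # the extra iteration closes a trailing run
        ok = tap < num_taps and link_ok(tap)
        if ok and run_start is None:
            run_start = tap                  # a working window begins here
        elif not ok and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2

# Hypothetical link that only works for taps 10 through 20.
chosen = pick_delay_tap(lambda tap: 10 <= tap <= 20, num_taps=32)
print(chosen)
```

Staying at the chosen tap afterwards relies on the clock/data timing relationship not drifting by more than half the window width.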
<rjo> how do they reconstruct the clock for spartan6 ioserdes?
<sb0> there is no clock reconstruction possible with the ioserdes
<sb0> you receive a clock which is phase-locked with the data, but you don't know the phase
<sb0> ...well, if you do 4x oversampling, you can implement a digital PLL that will do some form of clock reconstruction
<sb0> this is what is used in some 12Mbps USB PHYs
<sb0> sampling at 48MHz
<sb0> with the 48MHz asynchronous to the data, and the DPLL fixes it up
<sb0> but since we have this fancy backplane, we can send the clock to the AMCs, and then the receiver only has to determine the phase, not the complete clock
<rjo> but in general we don't have the clock.
<sb0> what do you mean?
<rjo> for many non-backplane versions there will only be the rx tx pair.
<sb0> yes, in that case you need a transceiver, or do the slow 4x oversampling + DPLL trick
<rjo> then what needs to be designed anyway are a) a 4x+DPLL or 7 series transceiver version, b) the one for the milldown spartan6 transceivers.
<sb0> you can use the IOSERDES for b
<sb0> and send a clock
<rjo> and you are saying that c) ioserdes with 7 series for the M-Labs ARTIQ HW is something that should be done as well?
<rjo> i thought the transceivers were fixed and you can't use those pads as standard logic.
<sb0> once we have one IOSERDES the other ones are semi-trivial. similar to another IOSERDES RTIO PHY
<sb0> transceivers have dedicated pads yes, but AFAIK their backplane also has links on regular IO
<rjo> on that adapter board design it seemed to be very little additional io
<sb0> the kc705 adapter?
<sb0> hmm
<rjo> yes.
<sb0> I think that in general we should prefer IOSERDES over transceivers. they are less messy, magical, proprietary, and painful to use
<rjo> my guess is that in the long run we will want/need/be forced to use transceivers whether we like it or not. and it would be nice to be able to generally fall back to the reconstructed clock and to not worry about speed limitations.
<sb0> IOSERDES are more portable too
<sb0> note that using a transceiver reconstructed clock requires an off-chip PLL/VCXO in many cases
<rjo> isn't the portability already disproven by the removal of the phase detector between 6 and 7 series?
<rjo> yes but no additional link.
<sb0> we can simulate the s6 phase detector by using 2x oversampling in the IOSERDES, with minimal code modifications
<sb0> the first version of my HDMI core did that, because the phase detector is only available on differential IOs that were not possible with my hacky adapter
<rjo> i don't have the strongest opinion on this. but we will invest a lot into the transceivers anyway.
<sb0> transceivers are different on each fpga family, and the hundreds of obscure parameters they have change.
<sb0> in fact, instantiating a transceiver in migen breaks the normal python function calls syntax, which is limited to 255 arguments
<sb0> the workaround is to put them in a dict and use **kwargs
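Editor's note: a sketch of the dict plus `**kwargs` workaround. CPython versions before 3.7 rejected calls with more than 255 explicitly written arguments, while unpacking a dict was not subject to that limit. The `instantiate` function and every parameter name below are hypothetical stand-ins, not the Migen `Instance` API or real transceiver attributes.

```python
def instantiate(name, **params):
    """Hypothetical stand-in for a primitive instantiation call."""
    return name, params

# Build the parameter set as data instead of spelling out hundreds of
# keyword arguments literally at the call site.
params = {"p_PARAM_{}".format(i): i for i in range(300)}   # made-up names
params["p_RX_ENABLE"] = True                               # made-up name

name, received = instantiate("TRANSCEIVER", **params)
print(name, len(received))
```

Keeping the parameters in a dict also makes it easier to share defaults between instances and override only the few entries that differ.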
<rjo> that hassle seems to be on par with the details of implementing the iodelay interface, ioserdes changes, master/slave pin pair limitations, phase detector intrinsics or unavailability, oversampling, lack of clock reconstruction, speed limitations, etc.
<rjo> one would hope to manage the arguments in a dict and not as a big instantiation.
<sb0> iodelay and phase detectors are rather simple things
<sb0> there are no master/slave pin pair limitations, each differential input has a master and a slave
<rjo> and then the gearbox, the scrambling/encoding, framing symbols.
<sb0> yes. better have those items as open source components (which are available in my HDMI core) than obscure transceiver features that _will not_ work and _will_ be a pain in the arse to debug
<sb0> it's not even that hard or performance-critical, and I wonder why Xilinx has those as hard-blocks
<sb0> it just makes things more complicated imo
<rjo> well. i am perfectly fine with abstaining because i have never implemented or used the transceivers myself.
<rjo> but we should really consider all factors here.
<sb0> there are valid reasons for using transceivers, but the fact that they contain the data encoding logic is not one of them
<rjo> isn't 600 Mbit something that we might actually sustainably saturate pretty quickly with our cpu. that would not be smart.
<rjo> a drtio write might be something like 200 bit. 400 bit for a pulse.
<rjo> depending on how smart we are with the protocol.
<rjo> we certainly saturate it with dma very soon.
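Editor's note: a back-of-the-envelope version of the saturation estimate, using only the figures quoted in the conversation (600 Mbps per lane, roughly 200 bits per write, roughly 400 bits per pulse). It ignores line coding and protocol framing overhead, which are assumptions left out for simplicity.

```python
LINK_RATE_BPS = 600e6     # halved per-lane rate mentioned above
BITS_PER_WRITE = 200      # rough size of one DRTIO write
BITS_PER_PULSE = 400      # two writes (rising and falling edge)

writes_per_second = LINK_RATE_BPS / BITS_PER_WRITE
pulses_per_second = LINK_RATE_BPS / BITS_PER_PULSE
ns_per_write = 1e9 / writes_per_second
ns_per_pulse = 1e9 / pulses_per_second

print(writes_per_second, pulses_per_second, ns_per_write, ns_per_pulse)
```

With these numbers one lane carries about 3 million writes or 1.5 million pulses per second, which is the budget any sustained event rate from the CPU or DMA would be compared against.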
<rjo> oh. and i don't even know whether the SFP transceivers work fine at low frequency.
<sb0> they do
<sb0> minimum data rate is some hundred Mbps iirc
<sb0> or, you mean the fiber PHY? this I don't know
<rjo> yes.
<sb0> but I'm not suggesting IOSERDES for SFP. the case for transceivers is pretty clear there.
<sb0> how fast can one SFP go?
<rjo> an 8b10b 1.25 GBit transceiver needs to work down to something like 125 MHz but i don't know how steep the dc correction edge is below that.
<rjo> i think 10 GBit on a SFP+ is doable. let me check.
<rjo> yep.
<sb0> without fancy (but jittery) signal encoding on the link?
<rjo> pretty sure.
<rjo> that is one electrical pair.
<rjo> one optical wavelength.
<sb0> how do they modulate the laser that fast? kerr cell?
<rjo> no. plain VCSEL
<sb0> ok. sounds good
<rjo> iirc that speed was a pretty hard barrier when they built the first transceivers. they could not get 10 GBit with 8b10b working. that barrier was one reason for 64b66b
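Editor's note: the coding overhead behind the 8b10b versus 64b66b remark, worked out for a 10 Gbit/s payload. The helper name `line_rate` is made up for this example.

```python
def line_rate(payload_bps: float, payload_bits: int, coded_bits: int) -> float:
    """Symbol rate on the wire for a block code mapping payload_bits to coded_bits."""
    return payload_bps * coded_bits / payload_bits

rate_8b10b = line_rate(10e9, 8, 10)     # 25 % overhead
rate_64b66b = line_rate(10e9, 64, 66)   # about 3 % overhead

print(rate_8b10b / 1e9, rate_64b66b / 1e9)
```

The same payload needs 12.5 GBd on the wire with 8b10b but only 10.3125 GBd with 64b66b, so the lighter code keeps the electrical and optical parts much closer to 10 GHz-class signalling.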
<sb0> we can probably run them at e.g. 6Gbit-ish
<sb0> may make things simpler in the standalone digital box - we could use a low-end fpga
<rjo> artix with transceivers for the box?
<sb0> yes, or maybe spartan6 even
<sb0> btw, do you know that the 3 smallest artix have the exact same silicon die? the only limitation is on the total LUT/BRAM count that vivado will accept to use
<sb0> and those are placed anywhere on the chip, so I think that if you rewrite the bitstream header you can run a 55 bitstream on a 15 chip
<rjo> nice.
<rjo> but they probably had to do something like that. the artix things seemed really cheap to me and maintaining the entire fab line for two more silicons might not be worth it.
<rjo> hmm. it would make sense to get a good number for the sustained throughput needed in the pulse shaping wideband rf case. the superconducting labs will probably need a lot.
<GitHub2> [artiq] jordens pushed 1 new commit to master: https://git.io/vVUHQ
<GitHub2> artiq/master 049bd11 Robert Jordens: scanwidget: handle min, max, suffix (closes #352)
<rjo> when i triggered orders in november it was fine.
<rjo> ignore that
<bb-m-labs> build #251 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/251
<bb-m-labs> build #498 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/498 blamelist: Robert Jordens <rj@m-labs.hk>
fengling has quit [Ping timeout: 240 seconds]
<GitHub110> [artiq] sbourdeauducq pushed 1 new commit to release-1: https://git.io/vVTOF
<GitHub110> artiq/release-1 b04b5c8 Robert Jordens: scanwidget: handle min, max, suffix (closes #352)
<rjo> sb0: ack. i was about to do that as well. could you quickly check that the top of the text of the scanwidget is not cut off on your machine?
<whitequark> sb0: >the time I get per batch of two DDS writes is 300us, not 182
<whitequark> 182us is what my test returns, not the PulseRateDDS one
<whitequark> mhm, it crashes
bb-m-labs has quit [Quit: buildmaster reconfigured: bot disconnecting]
<GitHub144> [buildbot-config] whitequark pushed 1 new commit to master: https://github.com/m-labs/buildbot-config/commit/85eaa802841da70aabadd91e9255b58d5fdf3a26
<GitHub144> buildbot-config/master 85eaa80 whitequark: Pass `-v` to `python -m unittest`.
bb-m-labs has joined #m-labs
<whitequark> hrm. yeah, I went overzealous on TBAA.
<GitHub75> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVT79
<GitHub75> artiq/master 6f5332f whitequark: compiler: allow flagging syscalls, providing information to optimizer....
<bb-m-labs> build #252 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/252
<bb-m-labs> build #499 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/499 blamelist: whitequark <whitequark@whitequark.org>
stekern has quit [Ping timeout: 246 seconds]
stekern has joined #m-labs
<GitHub46> [artiq] whitequark pushed 3 new commits to master: https://git.io/vVkOt
<GitHub46> artiq/master f31249a whitequark: Commit missing parts of 6f5332f8.
<GitHub46> artiq/master 1038f13 whitequark: compiler: allow specifying per-function "fast-math" flags....
<GitHub46> artiq/master 3ed852e whitequark: Commit missing parts of 1d8b0d46.
mumptai has quit [Quit: Verlassend]
sandeepkr_ has quit [Ping timeout: 264 seconds]
<bb-m-labs> build #253 of artiq-kc705-nist_clock is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-kc705-nist_clock/builds/253
<bb-m-labs> build #71 of artiq-win64-test is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/71
<bb-m-labs> build #500 of artiq is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/500
_rht has quit [Quit: Connection closed for inactivity]