sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
cr1901_modern has joined #m-labs
rohitksingh_work has joined #m-labs
ohama has quit [Ping timeout: 260 seconds]
ohama has joined #m-labs
loetkoenig has joined #m-labs
loetkoenig has quit [Quit: Page closed]
rohitksingh_work has quit [Ping timeout: 264 seconds]
rohitksingh_work has joined #m-labs
<whitequark> sb0: um
<whitequark> how do I reverse a signal in migen?
<sb0> reverse all its bits?
<whitequark> yeah
<sb0> Cat(s[i] for i in reversed(range(len(s))))
<sb0> or Cat(*[]), I don't remember if it can take generators directly
<whitequark> shouldn't there be a helper function for that...
<sb0> that's a pretty uncommon operation
<whitequark> ok
_whitelogger has joined #m-labs
cyrozap has quit [Ping timeout: 264 seconds]
<whitequark> OverflowError: Python int too large to convert to C ssize_t
<whitequark> what on earth
<whitequark> if (n >> i) & 1: nr |= 1 << (n - 1 - i)
<whitequark> this is on this operation
<whitequark> oh the shift should be ssize_t
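A minimal sketch of the bit reversal discussed above, in plain Python rather than Migen. It assumes the OverflowError came from the same name being used both as the value and as the bit width in the shift amount; `reverse_bits`, `value` and `width` are illustrative names, not taken from the ARTIQ commit.

```python
def reverse_bits(value, width):
    """Return `value` with its lowest `width` bits in reversed order."""
    result = 0
    for i in range(width):
        if (value >> i) & 1:
            # The shift amount depends only on the width, never on the value,
            # so it stays small even when `value` is a very wide integer.
            result |= 1 << (width - 1 - i)
    return result


if __name__ == "__main__":
    print(bin(reverse_bits(0b1011, 4)))
```

Keeping the width as a separate argument also makes the operation its own inverse for a fixed width, which is easy to check in a test.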
cyrozap has joined #m-labs
<GitHub> [artiq] whitequark pushed 2 new commits to master: https://github.com/m-labs/artiq/compare/f5aa73b8faf1...6b63322106e2
<GitHub> artiq/master 6b63322 whitequark: gateware: reverse SDRAM words in RTIO DMA engine.
<GitHub> artiq/master 4b14887 whitequark: gateware: work around ISE/Vivado bugs with very wide shifts.
<bb-m-labs> build #469 of artiq-board is complete: Failure [failed conda_build] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/469 blamelist: whitequark <whitequark@whitequark.org>
<bb-m-labs> build #1399 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1399 blamelist: whitequark <whitequark@whitequark.org>
<sb0> whitequark, if timing isn't met, that could explain the intermittent failure
<sb0> when will we have RSFQ FPGAs...
<sb0> bb-m-labs, force build artiq
<bb-m-labs> build #1400 forced
<bb-m-labs> I'll give a shout when the build finishes
<sb0> trying with the latest vivado
<bb-m-labs> build #470 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/470
key2 has joined #m-labs
<bb-m-labs> build #1400 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1400
<whitequark> these spurious failures are driving SNR way down.
<whitequark> there's been many cases where genuine issues were obscured by one of those flaky tests
<whitequark> they should be either fixed or skipped
<whitequark> (or relaxed)
<whitequark> uhhh
<whitequark> OutputMessage(channel=128, timestamp=333529387896, rtio_counter=333479716120, address=0, data=141046726000768)
<whitequark> this makes no sense
<whitequark> the data is 0x804800000080
<whitequark> ... oh
<whitequark> I think I know
<whitequark> of *course*
<whitequark> I've reversed the entire word, but should've reversed the word *and* the octets
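A small plain-Python sketch of the distinction made here, under the assumption that "reversing the word and the octets" means bit-reversing the whole word and then bit-reversing each octet again, which amounts to a byte swap. The helper names and the example value are illustrative only and are not taken from the gateware.

```python
def reverse_bits(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for i in range(width):
        if (value >> i) & 1:
            result |= 1 << (width - 1 - i)
    return result


def reverse_bits_per_octet(value, n_bytes):
    """Reverse the bits inside each octet, leaving the octet order alone."""
    result = 0
    for i in range(n_bytes):
        octet = (value >> (8 * i)) & 0xFF
        result |= reverse_bits(octet, 8) << (8 * i)
    return result


def reverse_bytes(value, n_bytes):
    """Reverse the octet order, leaving the bits inside each octet alone."""
    return int.from_bytes(value.to_bytes(n_bytes, "little"), "big")


if __name__ == "__main__":
    word = 0x0102030405060708  # arbitrary 64-bit example value
    both = reverse_bits_per_octet(reverse_bits(word, 64), 8)
    print(hex(both), hex(reverse_bytes(word, 8)))
```

Reversing all 64 bits moves octet i to position 7 - i but also flips the bits inside it; the per-octet pass undoes only the inner flip, leaving a pure byte reversal.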
<sb0> i've been relaxing the dds test, but it keeps increasing
<sb0> and I agree, the CI has been messy for the last months
<whitequark> sb0: let's relax it 2x.
<whitequark> this will catch catastrophic failures.
<whitequark> and then, once we decide to commit to a certain WCET for that testcase, someone will investigate and fix it.
<whitequark> right now we clearly can't guarantee WCET even if the test passes, because of how sensitive to something it is.
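One way to picture the "relax it 2x" proposal is a check that only fails when the measured mean exceeds the nominal budget by a slack factor. This is a hypothetical sketch; `check_timing`, its parameters and the sample numbers are assumptions and do not reproduce the actual ARTIQ tests.

```python
def check_timing(samples_s, nominal_s, slack=2.0):
    """Return (ok, mean) where ok is False only for a gross slowdown.

    samples_s: measured durations in seconds
    nominal_s: the budget the test was originally written against
    slack:     relaxation factor; 2.0 only catches catastrophic regressions
    """
    mean = sum(samples_s) / len(samples_s)
    return mean <= nominal_s * slack, mean


if __name__ == "__main__":
    ok, mean = check_timing([0.002, 0.004], nominal_s=0.002)
    print(ok, mean)
```

A check like this says nothing about worst-case execution time; it only separates "roughly as before" from "badly broken".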
<GitHub> [artiq] jordens commented on commit 6b63322: `Cat(reversed(s))` https://github.com/m-labs/artiq/commit/6b63322106e25b098c4c0cc4c4edc87cd1942f05#commitcomment-21369353
<whitequark> rjo: try writing that.
<whitequark> that was the first thing I tried
<whitequark> oh, wait, that was without Cat.
<rjo> i'd also welcome a migen patch that makes Cat() a noop and makes all iterables in expressions interpreted accordingly.
<whitequark> if I spend ten more minutes on this endianness idiocy I'll just leave in the software fix.
<whitequark> what a monumental waste of time.
<whitequark> hm, misoc is very slow when simulating nested Cat's for some reason...
<whitequark> sb0: I'll also relax test_rpc_timing, but only on Windows.
<whitequark> I'm not sure what's the exact issue.
<whitequark> having a mean roundtrip time of 2ms or 4ms shouldn't really matter anyway
bb-m-labs has quit [Quit: buildmaster reconfigured: bot disconnecting]
<GitHub30> [buildbot-config] whitequark pushed 1 new commit to master: https://git.io/vyQcd
<GitHub30> buildbot-config/master 74af454 whitequark: Don't consider build broken if only the Windows worker fails; warn.
bb-m-labs has joined #m-labs
<GitHub> [artiq] whitequark pushed 3 new commits to master: https://github.com/m-labs/artiq/compare/6b63322106e2...e9cf451c0b30
<GitHub> artiq/master e9cf451 whitequark: test: relax test_rpc_timing on Windows.
<GitHub> artiq/master 7dc7dcd whitequark: test: relax test_pulse_rate_dds to only catch catastrophic slowdown.
<GitHub> artiq/master 4de336f whitequark: gateware: reverse bytes of SDRAM word, not bits.
<rjo> 2ms vs 4ms matters.
<whitequark> okay
<rjo> i am extremely uneasy about the constant creepage of slowness in all parts.
<GitHub> [artiq] whitequark pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/dbea679e96f85e2166d74014f35a7e9ef3daf262
<GitHub> artiq/master dbea679 whitequark: Revert "test: relax test_rpc_timing on Windows."...
<rjo> dds programming, rpc latency, data throughput.
<whitequark> um
<whitequark> what?
<rjo> whitequark: we can do it temporarily if we really think that it will advance progress. but we have to commit to making a significant effort to speeding things up again.
<whitequark> we started off with 15ms rpc latency on lwip.
<whitequark> smoltcp has improved things by a factor of eight or so
<rjo> really? i remember a few ms before rust.
<whitequark> was 15ms in 2015
<whitequark> 10ms slightly before that
<whitequark> and the test was checked in with 10ms.
<rjo> ok. got a link handy to the build for artiq 2.x?
<whitequark> no, I'm looking at the sources of the test in git log
<rjo> whitequark: then i am content; scratch rpc latency out of that list. but i am adding worker startup/kernel compilation again.
<whitequark> throughput should be better now than it ever was with lwip as well, because we've added a few more buffers to the ethmac
<whitequark> and I've limited the advertised receive window by the buffer size too, in smoltcp
<rjo> unfortunately we never printed out the rpc latency.
<whitequark> the rpc latency is pretty hard to instrument at this point
<rjo> whitequark: i seem to remember a discussion with you where you gave a 100kB/s-ish throughput number.
<whitequark> although
<rjo> just ask the host for the time twice.
<whitequark> no, scratch that, a sampling profiler collecting perhaps ten times per second should not disturb it too much to skew the results
<whitequark> no, that's not it
<whitequark> measuring it is easy.
<whitequark> learning where time is wasted, not so much.
<whitequark> regarding throughput, that doesn't sound right. let me check something
<rjo> network traces would tell you about the network things.
<whitequark> rjo: actually, it's more like 25kB/s.
<whitequark> which... is interesting
<whitequark> the cause is that packets are only sent every 40ms
<whitequark> looking at how stable that number is, it's probably something like an unwanted interaction with delayed ACK.
<whitequark> I'll take a look at it later.
<rjo> whitequark: right. that's not much fun. we should track a number for that in the unittests.
<bb-m-labs> build #471 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/471
<whitequark> it should be, well, at least 40 times faster
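The figures quoted above (about 25 kB/s, one send every 40 ms, a target at least 40 times higher) can be tied together with a little arithmetic. The sketch assumes 1 kB = 1000 bytes and exactly one burst per 40 ms interval; both are simplifications, not measurements.

```python
OBSERVED_BYTES_PER_S = 25_000  # "more like 25kB/s"
SEND_INTERVAL_MS = 40          # "packets are only sent every 40ms"
SPEEDUP_FACTOR = 40            # "at least 40 times faster"

# Bytes that must be leaving in each 40 ms burst to produce the observed rate.
bytes_per_interval = OBSERVED_BYTES_PER_S * SEND_INTERVAL_MS // 1000

# Throughput if the link were 40 times faster than observed.
target_bytes_per_s = OBSERVED_BYTES_PER_S * SPEEDUP_FACTOR

if __name__ == "__main__":
    print(bytes_per_interval, "bytes per 40 ms interval")
    print(target_bytes_per_s, "bytes/s target")
```

Under these assumptions each interval carries about 1000 bytes, and the 40x target works out to roughly 1 MB/s.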
<rjo> whitequark: in general it might be useful to have some long term performance tracking mechanism for all these things.
<rjo> whitequark: mind if i file a bug and track it for 3.0 for that throughput thing?
<whitequark> sure.
<whitequark> rjo: we can make the unit tests output some magic string, which gets exported through buildbot's API.
<bb-m-labs> build #1401 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1401 blamelist: whitequark <whitequark@whitequark.org>
<whitequark> oh, wonderful
<whitequark> that test fails because DMA now works
<whitequark> fsvo works
<GitHub> [artiq] jordens opened issue #685: TCP throughput https://github.com/m-labs/artiq/issues/685
<whitequark> scrape this then draw a graph.
<whitequark> could reuse some code that rust uses.
<whitequark> you can file an issue to do this if you want it.
<GitHub> [artiq] jordens opened issue #686: track performance https://github.com/m-labs/artiq/issues/686
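A rough sketch of the "magic string" idea: tests print one specially prefixed line per metric, and a separate script scrapes those lines out of the build log for plotting. The `PERF_METRIC` prefix, the line format and both function names are hypothetical, and no buildbot API is used here; the log is just a string.

```python
import re

# Hypothetical line format: "PERF_METRIC <name>=<value> <unit>"
METRIC_RE = re.compile(
    r"^PERF_METRIC (?P<name>\w+)=(?P<value>[0-9.]+) (?P<unit>\S+)$"
)


def emit_metric(name, value, unit):
    """Print a metric line from inside a unit test and return it."""
    line = "PERF_METRIC {}={} {}".format(name, value, unit)
    print(line)
    return line


def scrape_metrics(log_text):
    """Collect {name: (value, unit)} from every metric line in a log."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            metrics[match.group("name")] = (
                float(match.group("value")),
                match.group("unit"),
            )
    return metrics


if __name__ == "__main__":
    log = "\n".join([
        "test_rpc_timing ... ok",
        emit_metric("rpc_latency", 0.002, "s"),
        emit_metric("tcp_throughput", 25000, "B/s"),
    ])
    print(scrape_metrics(log))
```

Storing the scraped values per build would give the long-term history that a graph needs.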
<GitHub> [artiq] whitequark pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/ac9e8b8568dbad2c23a3c9b3af6f9b630c60538e
<GitHub> artiq/master ac9e8b8 whitequark: test: avoid underflow in DMA replay test.
<whitequark> um
<whitequark> sb0: why do I get awk: symbol lookup error: awk: undefined symbol: mpfr_z_sub
<whitequark> rjo: great initiative. (re ml)
<bb-m-labs> build #472 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/472
rohitksingh_work has quit [Read error: Connection reset by peer]
<bb-m-labs> build #1402 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1402 blamelist: whitequark <whitequark@whitequark.org>
<rjo> whitequark: ack. i wonder what we are going to get. when i was still in the lab i had a pretty good handle. but now i feel that i am diverging a bit as well.
<bb-m-labs> build #473 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/473
<whitequark> bb-m-labs: stop build
<bb-m-labs> try 'stop build WHICH <REASON>'
<whitequark> bb-m-labs: stop build artiq broken
<bb-m-labs> build 1403 interrupted
<bb-m-labs> build #1403 of artiq is complete: Exception [exception python_unittest_2 interrupted] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1403 blamelist: whitequark <whitequark@whitequark.org>
allen0s has joined #m-labs
<allen0s> whitequark: I'm not currently on the artiq mailing list, but was forwarded the call for a survey. New to artiq and setting it up on a clean install of ubuntu 16.04. I was able to work through a conda install, but when I was here yesterday, you indicated that my version was out of date. So I went back to install from source. And a day later, I'm still hitting dependency hell, mismatch, etc. It's not necessarily artiq, but the 100
<rjo> allen0s: you were here yesterday?
<whitequark> allen0s: "but the 100" and then the message was cut off.
<rjo> allen0s: help us triage your problem: who are you and what do you want to do? what did you do? and what was the outcome?
<allen0s> whitequark: 1000 things it depends on directly and indirectly
<whitequark> allen0s: okay.
<whitequark> so, it's not necessary to install artiq from source to get the latest version.
<whitequark> adding the "dev" conda channel is enough
<whitequark> install from source is mostly there for people who detest conda (like me).
<allen0s> rjo: stewart allen w/ ionq. building ion trap quantum computers
<whitequark> the docs describe how to use the dev channel: https://m-labs.hk/artiq/manual-release-2/installing.html#installing-the-artiq-packages
<allen0s> whitequark: would like to be able to work from src to be able to contribute back more effectively anyway
<whitequark> allen0s: conda can still help you there; install the "artiq-dev" package
<whitequark> this has all the dependencies of artiq, but not artiq itself
<whitequark> it's fairly unlikely that you'll need e.g. a custom build of llvm. so if you can cope with conda, sure, use it!
<allen0s> whitequark: like you, not a fan
<allen0s> whitequark: but would like to get up and running faster
<rjo> allen0s: there is no fast and easy installation of all the packages from source that does not require you to learn most of the build machinery. you have to choose between the pain of handling the dependencies and the building yourself and the somewhat less flexible artiq-dev package based development.
<sb0> whitequark, you can safely ignore that "undefined symbol: mpfr_z_sub" message from vivado.
<sb0> allen0s, you installed nist_qc1. it is no longer maintained. use nist_qc2, nist_clock or pipistrello for the latest version
<whitequark> sb0: that doesn't seem like a thing one should safely ignore...
<whitequark> why is it even calling awk?
<sb0> well as far as I can tell, it still runs fine despite printing that message
<sb0> it seems to be a common problem too, if you google for it
<sb0> as to "why", because xilinx shitware sucks.
<whitequark> you'd think the awk call serves some purpose...
<whitequark> but I guess you never know with xilinx.
<GitHub> [artiq] jbqubit commented on issue #686: Thank you for advocating for this @jordens. Much needed. Add TCP throughput to the list #685. ... https://github.com/m-labs/artiq/issues/686#issuecomment-287365235
<GitHub> [artiq] whitequark commented on issue #686: That's RPC throughput in the list. https://github.com/m-labs/artiq/issues/686#issuecomment-287365509
<sb0> iirc lwip with the C runtime was 1MB/s
<whitequark> it's not limited by resource depletion.
<whitequark> it's limited by not pushing the window hard enough
<whitequark> it's probably a one-line fix somewhere, too.
<whitequark> sb0: wtf.
<whitequark> I've added CSRStatus for FSM state.
<whitequark> now it doesn't hang anymore (!)
<sb0> whitequark, okay. typical xilinx garbage. just leave the CSR there...
<whitequark> ...
<whitequark> I thought suggesting this for a moment but then considered it too hacky
<sb0> write a comment indicating that the CSRs can be removed once xilinx get their shit together, if ever
<whitequark> something something observing the state of the system causes it to stop collapsing
<sb0> is that with the latest vivado?
<whitequark> um, it's with whatever is in the PATH
<whitequark> on the buildserver
<whitequark> I ran "python3 -m artiq.gateware.targets.kc705_dds" about two hours ago.
<sb0> so it should be the latest one, I changed it so that timing passes
<sb0> they did seem to make the compilation result less slow with later vivado version
<whitequark> hmm
<sb0> and migen doesn't use PATH, it looks into /opt/Xilinx for the latest version
<sb0> the problem though, is that this latest version segfaults when compiling the drtio core
<whitequark> sb0: oh, that was what fixed it.
<whitequark> it doesn't crash with the flashed gateware either.
<sb0> so we probably have a problem if we want drtio and dma at the same time
<sb0> maybe the version just before can compile both without fucking up
<whitequark> here's the patch in case anyone in the future wants to try it out. https://paste.debian.net/922266/
<sb0> whitequark, vivado has some code that recognizes FSM and reencodes/optimizes them
<whitequark> sb0: no, I mean, your timing changes
<sb0> probably what happens is that code has some bug, but when you add the CSR it no longer recognizes a FSM and that bug is not tickled
<whitequark> I tried it again with the buildbot-built gateware
<whitequark> bitstream even
<whitequark> it still doesn't hang
<sb0> ah!
<sb0> so it was just a timing problem. not some xilinx bug.
<whitequark> I don't know, the xilinx version also changed, no?
<whitequark> and timing passed many times before
<whitequark> and it still hung
<sb0> timing broke when you replaced the shift with a Case
<sb0> it passed before
<whitequark> ah I see
<whitequark> it could be hanging for a different reason before that
<sb0> so there were two versions before: 1) shift that passes timing but miscompiles 2) Case that doesn't miscompile but passes timing
<whitequark> btw how did you fix timing?
<sb0> *breaks timing
<sb0> I just changed the vivado version for the latest one
<sb0> as I said they seem to have made the result less slow
<sb0> once in a blue moon things improve with vivado upgrades
<whitequark> oh
<whitequark> it hung.
<whitequark> wtf
<whitequark> it hung when I tried to write a test for it, specifically
<whitequark> ah I see, that happens after an underflow.
<whitequark> sb0: ok so
<whitequark> it still hangs
<whitequark> I don't know what's the exact condition for hanging it but the *combination* of tests (not pushed yet) reproduces it
<whitequark> ... and every FSM is at zero when it's hung.
<whitequark> oh
<whitequark> it's at zero because that mechanism is just broken.
<whitequark> hm, it might not be, actually
<whitequark> no, the mechanism is not broken, I was just querying it after the core actually *did* finish for this test
<whitequark> sb0: so. it definitely hangs. and all FSMs are definitely in IDLE.
<whitequark> let's see how is that possible...
<whitequark> sb0: what's "CRI" and how does this thing work?
<sb0> common rtio interface
<sb0> just a set of signals shared between rtio/drtio, a bit like wishbone buses, but specific to rtio
hartytp has joined #m-labs
<whitequark> ok. well, I don't know why the arbiter is broken.
hartytp has quit [Quit: Page closed]
<allen0s> whitequark: i just went back to the conda packages using -dev and when i try to flash the board using nist_qc2, i get:
<allen0s> pkg_resources.DistributionNotFound: The 'artiq==3.0.dev0+820.gf4ae166' distribution was not found and is required by the application
<allen0s> the other day using non-dev and nist_qc1 worked
<sb0> whitequark, what are the symptoms?
<whitequark> sb0: same as before
<sb0> broken arbiter should not hang
<whitequark> after a certain sequence of events, csr::rtio_dma::arb_gnt_read() never equals 1
<sb0> did you do csr::rtio::arb_req_write(0); csr::rtio_dma::arb_req_write(1) ?
<sb0> is it where it hangs? waiting for the arbiter?
<whitequark> sure. I'm using your code.
<whitequark> it hangs in the loop in rtio_arb_dma().
<whitequark> and just before calling that function, all FSMs are at idle
<sb0> if you set the arbiter permanently to dma, is there any dma bug left?
<GitHub> [artiq] sbourdeauducq commented on issue #681: > If the ion trap is working with a long chain of ions, asynchronous kernel termination could cause ion loss. Hooks should be in place so we can do clean up.... https://github.com/m-labs/artiq/issues/681#issuecomment-287390855
<whitequark> sb0: I cannot set the arbiter permanently to DMA.
<whitequark> I added a condition: if csr::rtio_dma::arb_gnt_read() == 0 { rtio_arb_dma(); }
<whitequark> well, somehow, arb_gnt gets reset back to 0 at some point
<whitequark> despite me never doing it (in fact the code that could do it is commented out)
<sb0> whitequark, how come? just set it to dma when the runtime starts, and don't touch it afterwards
<whitequark> sb0: I don't touch it.
<whitequark> it still resets itself back to 1.
<whitequark> erm, 0.
allen0s has quit [Quit: Page closed]
<sb0> whitequark, feel free to replace the arbiter with a switch
<sb0> i.e. have a csr for selecter and remove arb_req and arb_gnt
<sb0> *selected
<whitequark> ok
acathla has quit [Quit: Coyote finally caught me]
acathla has joined #m-labs
acathla has quit [Changing host]
acathla has joined #m-labs
key2 has quit [Quit: Page closed]
AndChat|326081 has quit [Ping timeout: 240 seconds]