<GitHub>
artiq/master dbea679 whitequark: Revert "test: relax test_rpc_timing on Windows."...
<rjo>
dds programming, rpc latency, data throughput.
<whitequark>
um
<whitequark>
what?
<rjo>
whitequark: we can do it temporarily if we really think that it will advance progress. but we have to commit to making a significant effort to speeding things up again.
<whitequark>
we started off with 15ms rpc latency on lwip.
<whitequark>
smoltcp has improved things by a factor of eight or so
<rjo>
really? i remember a few ms before rust.
<whitequark>
was 15ms in 2015
<whitequark>
10ms slightly before that
<whitequark>
and the test was checked in with 10ms.
<rjo>
ok. got a link handy to the build for artiq 2.x?
<whitequark>
no, I'm looking at the sources of the test in git log
<rjo>
whitequark: then i am content; scratch rpc latency out of that list. but i am adding worker startup/kernel compilation again.
<whitequark>
throughput should be better now than it ever was with lwip as well, because we've added a few more buffers to the ethmac
<whitequark>
and I've limited the advertised receive window by the buffer size too, in smoltcp
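(For context: in smoltcp, the TCP window a socket advertises can never exceed the free space in its receive buffer, so the buffer size chosen at socket creation is the knob being discussed. A minimal sketch of that, with arbitrary buffer sizes; the module paths follow an older smoltcp release and differ in newer ones:)

```rust
// Minimal sketch, not artiq runtime code. Type names follow an older smoltcp
// release; newer versions use smoltcp::socket::tcp::{Socket, SocketBuffer}.
use smoltcp::socket::{TcpSocket, TcpSocketBuffer};

fn main() {
    // The window advertised to the peer is bounded by the free space in the
    // receive buffer, so this size is what ultimately caps the in-flight data.
    let rx_buffer = TcpSocketBuffer::new(vec![0u8; 4096]);
    let tx_buffer = TcpSocketBuffer::new(vec![0u8; 4096]);
    let _socket = TcpSocket::new(rx_buffer, tx_buffer);
}
```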
<rjo>
unfortunately we never printed out the rpc latency.
<whitequark>
the rpc latency is pretty hard to instrument at this point
<rjo>
whitequark: i seem to remember a discussion with you where you gave a 100kB/s-ish throughput number.
<whitequark>
although
<rjo>
just ask the host for the time twice.
<whitequark>
no, scratch that, a sampling profiler collecting perhaps ten times per second should not disturb it enough to skew the results
<whitequark>
no, that's not it
<whitequark>
measuring it is easy.
<whitequark>
learning where time is wasted, not so much.
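(rjo's "ask the host for the time twice" amounts to timing a round trip. A generic Rust sketch of that idea, timing request/response pairs against a plain TCP echo service; this is purely illustrative and not the artiq RPC protocol, and the address is a placeholder:)

```rust
// Illustrative round-trip timer: the same idea as "ask the host for the time
// twice", but against a plain echo service rather than the artiq RPC machinery.
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // "echo.example.org:7" is a placeholder echo service, not an artiq endpoint.
    let mut stream = TcpStream::connect("echo.example.org:7")?;
    stream.set_nodelay(true)?;
    let mut buf = [0u8; 1];
    let n: u32 = 100;
    let start = Instant::now();
    for _ in 0..n {
        stream.write_all(&[0u8])?;    // minimal "request"
        stream.read_exact(&mut buf)?; // wait for the echoed byte to come back
    }
    println!("mean round trip: {:?}", start.elapsed() / n);
    Ok(())
}
```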
<whitequark>
regarding throughput, that doesn't sound right. let me check something
<rjo>
network traces would tell you about the network things.
<whitequark>
rjo: actually, it's more like 25kB/s.
<whitequark>
which... is interesting
<whitequark>
the cause is that packets are only sent every 40ms
<whitequark>
looking at how stable that number is, it's probably something like an unwanted interaction with delayed ACK.
<whitequark>
I'll take a look at it later.
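(The arithmetic is consistent: one roughly 1 kB segment every 40 ms is about 25 kB/s, and a 40 ms stall per segment is the classic Nagle/delayed-ACK interaction. One common mitigation, shown below as a host-side sketch, is disabling Nagle on the host socket; this only helps when the host is the side doing small writes, does not change the device-side smoltcp behaviour, and is not necessarily the fix that was eventually applied. The address is hypothetical:)

```rust
// Host-side sketch only: TCP_NODELAY disables Nagle, so small writes are sent
// immediately instead of being held back waiting for the peer's (possibly
// delayed) ACK. One common way to break a 40 ms Nagle/delayed-ACK stall.
use std::net::TcpStream;

fn connect_no_nagle(addr: &str) -> std::io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> std::io::Result<()> {
    // Hypothetical core device address, for illustration only.
    let _stream = connect_no_nagle("192.168.1.52:1381")?;
    Ok(())
}
```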
<rjo>
whitequark: right. that's not much fun. we should track a number for that in the unittests.
<rjo>
whitequark: ack. i wonder what we are going to get. when i was still in the lab i had a pretty good handle. but now i feel that i am diverging a bit as well.
<bb-m-labs>
build #1403 of artiq is complete: Exception [exception python_unittest_2 interrupted] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1403 blamelist: whitequark <whitequark@whitequark.org>
allen0s has joined #m-labs
<allen0s>
whitequark: I'm not currently on the artiq mailing list, but was forwarded the call for a survey. New to artiq and setting it up on a clean install of ubuntu 16.04. I was able to work through a conda install, but when I was here yesterday, you indicated that my version was out of date. So I went back to install from source. And a day later, I'm still hitting dependency hell, mismatch, etc. It's not necessarily artiq, but the 100
<rjo>
allen0s: you were here yesterday?
<whitequark>
allen0s: "but the 100" and then the message was cut off.
<rjo>
allen0s: help us triage your problem: who are you and what do you want to do? what did you do? and what was the outcome?
<allen0s>
whitequark: 1000 things it depends on directly and indirectly
<whitequark>
allen0s: okay.
<whitequark>
so, it's not necessary to install artiq from source to get the latest version.
<whitequark>
adding the "dev" conda channel is enough
<whitequark>
install from source is mostly there for people who detest conda (like me).
<allen0s>
rjo: stewart allen w/ ionq. building ion trap quantum computers
<allen0s>
whitequark: would like to be able to work from src to be able to contribute back more effectively anyway
<whitequark>
allen0s: conda can still help you there; install the "artiq-dev" package
<whitequark>
this has all the dependencies of artiq, but not artiq itself
<whitequark>
it's fairly unlikely that you'll need e.g. a custom build of llvm. so if you can cope with conda, sure, use it!
<allen0s>
whitequark: like you, not a fan
<allen0s>
whitequark: but would like to get up and running faster
<rjo>
allen0s: there is no fast and easy installation of all the packages from source that does not require you to learn most of the build machinery. you have to choose between the pain of handling the dependencies and the builds yourself, and the somewhat less flexible artiq-dev-package-based development.
<sb0>
whitequark, you can safely ignore that "undefined symbol: mpfr_z_sub" message from vivado.
<sb0>
allen0s, you installed nist_qc1. it is no longer maintained. use nist_qc2, nist_clock or pipistrello for the latest version
<whitequark>
sb0: that doesn't seem like a thing one should safely ignore...
<whitequark>
why is it even calling awk?
<sb0>
well as far as I can tell, it still runs fine despite printing that message
<sb0>
it seems to be a common problem too, if you google for it
<sb0>
as to "why", because xilinx shitware sucks.
<whitequark>
you'd think the awk call serves some purpose...
<whitequark>
but I guess you never know with xilinx.
<sb0>
whitequark, vivado has some code that recognizes FSM and reencodes/optimizes them
<whitequark>
sb0: no, I mean, your timing changes
<sb0>
probably what happens is that code has some bug, but when you add the CSR it no longer recognizes a FSM and that bug is not tickled
<whitequark>
I tried it again with the buildbot-built gateware
<whitequark>
bitstream even
<whitequark>
it still doesn't hang
<sb0>
ah!
<sb0>
so it was just a timing problem. not some xilinx bug.
<whitequark>
I don't know, the xilinx version also changed, no?
<whitequark>
and timing passed many times before
<whitequark>
and it still hung
<sb0>
timing broke when you replaced the shift with a Case
<sb0>
it passed before
<whitequark>
ah I see
<whitequark>
it could be hanging for a different reason before that
<sb0>
so there were two versions before: 1) shift that passes timing but miscompiles 2) Case that doesn't miscompile but passes timing
<whitequark>
btw how did you fix timing?
<sb0>
*breaks timing
<sb0>
I just changed the vivado version for the latest one
<sb0>
as I said they seem to have made the result less slow
<sb0>
once in a blue moon things improve with vivado upgrades
<whitequark>
oh
<whitequark>
it hung.
<whitequark>
wtf
<whitequark>
it hung when I tried to write a test for it, specifically
<whitequark>
ah I see, that happens after an underflow.
<whitequark>
sb0: ok so
<whitequark>
it still hangs
<whitequark>
I don't know what the exact condition for hanging it is, but the *combination* of tests (not pushed yet) reproduces it
<whitequark>
... and every FSM is at zero when it's hung.
<whitequark>
oh
<whitequark>
it's at zero because that mechanism is just broken.
<whitequark>
hm, it might not be, actually
<whitequark>
no, the mechanism is not broken, I was just querying it after the core actually *did* finish for this test
<whitequark>
sb0: so. it definitely hangs. and all FSMs are definitely in IDLE.
<whitequark>
let's see how that is possible...
<whitequark>
sb0: what's "CRI" and how does this thing work?
<sb0>
common rtio interface
<sb0>
just a set of signals shared between rtio/drtio, a bit like wishbone buses, but specific to rtio
hartytp has joined #m-labs
<whitequark>
ok. well, I don't know why the arbiter is broken.
hartytp has quit [Quit: Page closed]
<allen0s>
whitequark: i just went back to the conda packages using -dev and when i try to flash the board using nist_qc2, i get:
<allen0s>
pkg_resources.DistributionNotFound: The 'artiq==3.0.dev0+820.gf4ae166' distribution was not found and is required by the application
<allen0s>
the other day using non-dev and nist_qc1 worked
<sb0>
whitequark, what are the symptoms?
<whitequark>
sb0: same as before
<sb0>
broken arbiter should not hang
<whitequark>
after a certain sequence of events, csr::rtio_dma::arb_gnt_read() never equals 1
<sb0>
did you do csr::rtio::arb_req_write(0); csr::rtio_dma::arb_req_write(1) ?
<sb0>
is it where it hangs? waiting for the arbiter?
<whitequark>
sure. I'm using your code.
<whitequark>
it hangs in the loop in rtio_arb_dma().
<whitequark>
and just before calling that function, all FSMs are at idle
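(For reference, the loop in question spins on the grant bit after requesting the arbiter. A sketch of that wait with an iteration cap added, so the hang would surface as an error instead of a silent spin; the CSR accessors are the ones already quoted in this discussion, the cap and the Result are illustrative additions, and the runtime's generated `csr` bindings are assumed to be in scope:)

```rust
// Sketch of the arbiter handover with a bounded wait. Assumes the runtime's
// generated `csr` module is in scope. The accessors match those quoted above;
// the iteration cap and error reporting are illustrative additions so the
// hang is reported instead of spinning forever.
fn rtio_arb_dma_bounded() -> Result<(), &'static str> {
    unsafe {
        csr::rtio::arb_req_write(0);
        csr::rtio_dma::arb_req_write(1);
        for _ in 0..1_000_000 {
            if csr::rtio_dma::arb_gnt_read() == 1 {
                return Ok(());
            }
        }
    }
    Err("timeout waiting for rtio_dma arbiter grant")
}
```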
<sb0>
if you set the arbiter permanently to dma, is there any dma bug left?
<GitHub>
[artiq] sbourdeauducq commented on issue #681: > If the ion trap is working with a long chain of ions, asynchronous kernel termination could cause ion loss. Hooks should be in place so we can do clean up.... https://github.com/m-labs/artiq/issues/681#issuecomment-287390855
<whitequark>
sb0: I cannot set the arbiter permanently to DMA.
<whitequark>
I added a condition: if csr::rtio_dma::arb_gnt_read() == 0 { rtio_arb_dma(); }
<whitequark>
well, somehow, arb_gnt gets reset back to 0 at some point
<whitequark>
despite me never doing it (in fact the code that could do it is commented out)
<sb0>
whitequark, how come? just set it to dma when the runtime starts, and don't touch it afterwards
<whitequark>
sb0: I don't touch it.
<whitequark>
it still resets itself back to 1.
<whitequark>
erm, 0.
allen0s has quit [Quit: Page closed]
<sb0>
whitequark, feel free to replace the arbiter with a switch
<sb0>
i.e. have a csr for selecter and remove arb_req and arb_gnt
<sb0>
*selected
<whitequark>
ok
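(Runtime-side, sb0's proposal amounts to: instead of a request/grant handshake, a single CSR selects which master drives the CRI. A sketch under that assumption; `csr::rtio_core::cri_select_write` is a hypothetical name for the selection CSR, standing in for whatever the gateware would actually generate:)

```rust
// Sketch of the "switch instead of arbiter" idea on the runtime side.
// Assumes the generated `csr` bindings are in scope; `cri_select_write` is a
// hypothetical CSR name introduced for illustration.
const CRI_MASTER_KERNEL: u8 = 0;
const CRI_MASTER_DMA: u8 = 1;

fn select_cri_master(master: u8) {
    unsafe {
        // One write, no req/gnt handshake to wait on, hence nothing left to hang in.
        csr::rtio_core::cri_select_write(master);
    }
}

fn main() {
    // Hand the CRI to the DMA core for playback, then back to the kernel CPU.
    select_cri_master(CRI_MASTER_DMA);
    // ... run DMA playback ...
    select_cri_master(CRI_MASTER_KERNEL);
}
```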
acathla has quit [Quit: Coyote finally caught me]
acathla has joined #m-labs
acathla has quit [Changing host]
acathla has joined #m-labs
key2 has quit [Quit: Page closed]
AndChat|326081 has quit [Ping timeout: 240 seconds]