cr1901_modern has quit [Ping timeout: 264 seconds]
attie has quit [Ping timeout: 245 seconds]
attie has joined #m-labs
cr1901_modern1 has quit [Ping timeout: 264 seconds]
cr1901_modern has joined #m-labs
attie has quit [Ping timeout: 268 seconds]
attie has joined #m-labs
kyak has quit [Remote host closed the connection]
kyak has joined #m-labs
attie has quit [Ping timeout: 256 seconds]
attie has joined #m-labs
<GitHub-m-labs>
[artiq] enjoy-digital commented on issue #908: @sbourdeauducq: i'm going to do more tests with a simple design (https://github.com/enjoy-digital/sayma_test/blob/master/sayma_amc.py#L312), see if i'm able to reproduce the eye scan issue and try to understand how it could be related to the gateware. I'll also do some tests with the gateware traffic generator/checker that is in the design. https://github.com/m
<rjo>
whitequark: could you give me a report about the work on picam over the last days?
<rjo>
whitequark: and what's the plan for today?
futarisIRCcloud has joined #m-labs
<rjo>
sb0: ping
<sb0>
rjo, yes?
<rjo>
sb0: about the lane spread logic.
<rjo>
sb0: could you explain that to me?
attie has quit [Ping timeout: 260 seconds]
<sb0>
rjo, without it, if you keep writing with strictly increasing timestamps, all events end up in one FIFO and the others are unused
attie has joined #m-labs
<rjo>
sb0: (1) how do you evaluate the relative risk/benefit of spreading events over lanes to fill buffers and build slack, against the risk of filling lanes and thereby being unable to jump back in time?
<sb0>
the spread logic increases buffering depth in this case
<rjo>
sb0: i get that. there are a couple of corner cases and q's i have.
<sb0>
if it becomes problematic, it can be disabled
<rjo>
sb0: (2) i'd have expected the force_laneB logic to be triggered by **the current lane becoming unwritable as a result of a write**
<rjo>
sb0: i am wondering whether there was testing or an analysis of whether we expect problems.
<rjo>
re (2) i don't get your trigger logic (~lane_was_writable & lane_is_writable).
<sb0>
it's equivalent to the current lane becoming unwritable as the result of a write
<sb0>
since the master always blocks *after* the write until the slave becomes writable
<sb0>
(this is done to have a single "status" register that is all 0 in the common case, and allows for quick testing)
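A minimal Migen-style sketch of the trigger being discussed (illustration only, not the actual lane distributor code in artiq.gateware.rtio.sed; apart from force_laneB, lane_was_writable and lane_is_writable, all names are assumed). As sb0 notes further below, the flag stays raised until the next write, which performs the lane switch and clears it:

    from migen import *

    class SpreadTriggerSketch(Module):
        def __init__(self):
            self.lane_is_writable = Signal()   # current lane can accept an event
            self.we = Signal()                 # an event is being written
            self.force_laneB = Signal()        # request switching to the next lane

            # Since the master blocks *after* a write until the current lane is
            # writable again, the edge ~lane_was_writable & lane_is_writable is
            # equivalent to "the lane became unwritable as the result of a write".
            lane_was_writable = Signal(reset=1)
            self.sync += [
                lane_was_writable.eq(self.lane_is_writable),
                If(~lane_was_writable & self.lane_is_writable,
                    self.force_laneB.eq(1)
                ),
                If(self.we,
                    # the write that performs the lane switch clears the flag
                    self.force_laneB.eq(0)
                )
            ]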
<rjo>
what's master, slave here?
<sb0>
master = cpu or dma core, slave = (d)rtio core
<sb0>
the current code describes more closely what is actually happening due to the post-check, I think
<sb0>
if we want to avoid that, we probably need FIFOs with "almost full" signals...
<sb0>
both sync (which is quite straightforward) and async (which isn't)
<rjo>
avoid what?
<sb0>
blocking at all on writes when there is space in other lanes
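For the sync case, Migen's SyncFIFO already exposes a level counter, so an "almost full" high mark can be derived by comparison; the async case is the hard part, as noted above. A sketch under that assumption (the wrapper and threshold names are illustrative, not existing ARTIQ code):

    from migen import *
    from migen.genlib.fifo import SyncFIFO

    class SyncFIFOAlmostFull(Module):
        # SyncFIFO wrapper exposing an almost_full flag derived from the FIFO's
        # level output; the asynchronous equivalent is not sketched here.
        def __init__(self, width, depth, high_mark):
            self.submodules.fifo = fifo = SyncFIFO(width, depth)
            self.almost_full = Signal()
            self.comb += self.almost_full.eq(fifo.level >= high_mark)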
<rjo>
sb0: maybe just doing at most M (maybe M=LANE_COUNT/2) consecutive lane switches due to spreading would also work well enough.
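rjo's suggestion could look roughly like this (a hypothetical sketch; all names are made up for illustration):

    from migen import *

    class SpreadLimitSketch(Module):
        # Allow at most M consecutive lane switches caused by spreading;
        # a write that does not switch lanes resets the budget.
        def __init__(self, lane_count=8):
            M = lane_count // 2
            self.spread_request = Signal()   # spread logic wants to switch lane
            self.normal_write = Signal()     # write staying on the current lane
            self.spread_allowed = Signal()

            consecutive = Signal(max=M + 1)
            self.comb += self.spread_allowed.eq(consecutive < M)
            self.sync += [
                If(self.spread_request & self.spread_allowed,
                    consecutive.eq(consecutive + 1)
                ).Elif(self.normal_write,
                    consecutive.eq(0)
                )
            ]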
<rjo>
but anyway. that by itself doesn't seem to be the issue i'm looking at.
<rjo>
sb0: a couple more things: (3) any idea why it seems to only use two lanes when doing sequential output events?
<sb0>
how does it go back to the first lane?
<sb0>
is that in simulation or hardware?
<rjo>
(4) this really looks like a false underflow. the rtio_counter is well below now_mu.
<rjo>
sb0: that's hardware.
<sb0>
rjo, the only way it can change lanes is by incrementing the lane number
<sb0>
does it switch between two lanes (some signal having the wrong bit width), or go to one lane, then the second one, and never leave it?
<rjo>
sb0: the slack (and manually inferred buffer space) is a sawtooth with one always-filled lane, filling up a second, then waiting until one has drained, then filling another lane.
<sb0>
that sounds normal
<rjo>
no it doesn't. the waiting is abnormal since there should be buffer space from 6 more lanes.
<sb0>
"drained" == 1 event is removed and the FIFO becomes writable again
<rjo>
"fully drained" then
<sb0>
yes, fully drained isn't normal
<rjo>
at least that's what i infer from the slack. it could be different behavior under the hood.
<rjo>
the slack never rises above 2 lanes full.
<rjo>
and it never jumps below one lane full.
<rjo>
this is all independent of external clock/internal clock still and may or may not be related to the false underflow issue.
<sb0>
not rising above 2 lanes full sounds OK
<rjo>
why?
<rjo>
it should happily spread to the other lanes.
<sb0>
assuming you are sending a square wave pattern: the second lane can only become writable again after the timestamp of its front event is reached, which is later than the timestamp of the last event in the first lane
<sb0>
so the first lane becomes empty before the second lane becomes writable
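A toy illustration of that argument, with assumed numbers: take 2-deep lanes and events at t = 0, 1, 2, ... Lane 0 takes t = 0 and 1; the write of t = 2 blocks until lane 0 frees a slot (when t = 0 is output) and the spread flag then sends it to lane 1, which takes t = 2 and 3. The write of t = 4 now waits on lane 1, but lane 1's front event (t = 2) is only output after lane 0's last event (t = 1), so lane 0 drains completely before lane 1 frees a slot. At most roughly two lanes' worth of events are therefore buffered at any time, matching the observed slack ceiling.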
<rjo>
why doesn't it spread to the third lane after filling the second?
<sb0>
because spread is only engaged after the current lane has become writable again
<rjo>
yeah. i still don't get why that is correct. especially since you say the equivalent behavior would involve engaging spreading when the lane has become **unwritable**.
<sb0>
it's equivalent because the master waits until the *current* lane is writable again
<sb0>
force_laneB doesn't switch immediately to the next lane, it's a flag that stays raised until the next write, at which point it causes the lane switch and clears
<rjo>
i get the latter.
<sb0>
if there is a nice benefit to better spreading, we can perhaps look into making it switch immediately - but beware of pipelining bugs and timing failure
<sb0>
or use FIFO high marks
<sb0>
the latter option definitely won't cause timing problems and it's a nice generic feature to have on FIFOs
<sb0>
timing in lanedistributor is quite tricky, it has to do a lot of things in just a few cycles
rohitksingh_wor1 has joined #m-labs
rohitksingh_work has quit [Ping timeout: 256 seconds]
<rjo>
i noticed.
<rjo>
but let's shelve it. optimizing spreading is for later.
<cjbe>
sb0: just had a look at the Kasli master-satellite si output alignment using the current gateware: summary, it looks good
<cjbe>
over 10 restarts the pk-pk clock deviation is 95 ps, cf. ~60 ps pk-pk over time without restarting the si and losing alignment
<rjo>
a couple of observations about the false underflow issue: (4) it happens at random times; there is always positive slack; it becomes more likely with smaller slacks; when the false underflow happens, the slack is larger than the minimum of the past slacks.
<sb0>
cjbe, good
<rjo>
sb0: do we require a rtio reset when switching clocks ext/int?
<rjo>
s/when/after/?
<sb0>
switching what clocks exactly?
<sb0>
if that goes through the si5324 with hitless switching then no. otherwise that needs a thorough reset and I'm not sure if the "rtio reset" is enough.
<rjo>
simple standalone, starting a kernel that initiates a clock switch between core.external_clock = False/True.
<sb0>
rjo, _florent_, whitequark, kasli-1 is back on the server and is now a v1.1 board. ethernet on sfp0 and 10G drtio link to kasli-2 on sfp1
<sb0>
we should consider some migen features to catch this sort of bug. it's not the first time that bugs of this sort have wasted my and other people's time.
<sb0>
maybe disallow all async paths that are not explicitly marked as allowed in the user code
<rjo>
sb0: yes. a feature that marks all proper CDC implementations as such could then emit errors on unmarked async paths, as well as emit the correct timing constraints and exceptions to the toolchain.
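For context, the explicit marking that exists in Migen today is the CDC primitives themselves, e.g. MultiReg; the feature discussed above would amount to treating any cross-domain path that does not go through such a primitive as an error. A minimal example of the explicit form (signal names are made up; the rio domain name is taken from the discussion):

    from migen import *
    from migen.genlib.cdc import MultiReg

    class ExplicitCDC(Module):
        # Bring a flag from the sys domain into the rio domain through an
        # explicit, tool-recognizable synchronizer rather than a raw async path.
        def __init__(self):
            self.flag_sys = Signal()
            self.flag_rio = Signal()
            self.specials += MultiReg(self.flag_sys, self.flag_rio, odomain="rio")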
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
rohitksingh_wor1 has quit [Read error: Connection reset by peer]
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: > @sbourdeauducq: i'm going to do more tests with a simple design (https://github.com/enjoy-digital/sayma_test/blob/master/sayma_amc.py#L312), see if i'm able to reproduce the eye scan issue and try to understand how it could be related to the gateware. I'll also do some tests with the gateware traffic generator/checker that is in the design.... https://github.com/m-
<rjo>
whitequark, sb0: ok to demote the malformed packet and rx dropped messages from WARN to DEBUG? they don't seem to hold any info that i can react to and they come spewing when unplugging a fiber.
<GitHub-m-labs>
[artiq] gkasprow commented on issue #908: That was not easy but I managed to measure the signal on DQS4. The traces are very thin and touching them with probes may break them.... https://github.com/m-labs/artiq/issues/908#issuecomment-371135891
<GitHub-m-labs>
artiq/master 7afb23e Robert Jordens: runtime: demote dropped and malformed packets msgs to debug
attie has quit [Ping timeout: 264 seconds]
attie has joined #m-labs
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #908: @enjoy-digital That's probably not the problem, but shouldn't DQS have a preamble and a postamble (time when it is driven at a fixed value after leaving hi-Z and before entering hi-Z)? I don't see that on the trace, it is toggling all the time when driving. https://github.com/m-labs/artiq/issues/908#issuecomment-371138356
<GitHub-m-labs>
[artiq] enjoy-digital commented on issue #908: @sbourdeauducq: this is similar to what we are doing on Kintex7, so as you are saying this is probably not the issue, but i'll try to add that. https://github.com/m-labs/artiq/issues/908#issuecomment-371144559
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: @sbourdeauducq if there are any other measurements you want @gkasprow to make then please suggest them now. Otherwise, as agreed, let's work on the assumption that this is a gateware/firmware/vivado issue for M-Labs to deal with. Can you make it top priority, please? https://github.com/m-labs/artiq/issues/908#issuecomment-371145145
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: @enjoy-digital not telling you how to suck eggs, but did you try reading the SDRAM datasheet and checking for anything unexpected? Might also be worth double checking Ultra-scale clocking etc to look for differences from the 7 series. https://github.com/m-labs/artiq/issues/908#issuecomment-371145357
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #908: Well, as I said the memtest in the MiSoC/ARTIQ bootloader is only loading it very lightly. What happens with high-bandwidth transfers? @enjoy-digital do I get it right that you already have bitstreams that do that?... https://github.com/m-labs/artiq/issues/908#issuecomment-371148672
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #908: Well, as I said the memtest in the MiSoC/ARTIQ bootloader is only loading it very lightly. What happens with high-bandwidth transfers? Or with a lot of precharge/activate cycles (those use the most power with DRAM)? @enjoy-digital do I get it right that you already have bitstreams that do that?... https://github.com/m-labs/artiq/issues/908#issuecomment-37114867
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: @sbourdeauducq Before we get carried away with endless hardware tests, let's agree on one thing: that neither PI nor SI issues are responsible for the bad eye scans/memtest issues we're currently seeing on Sayma.... https://github.com/m-labs/artiq/issues/908#issuecomment-371151900
<bb-m-labs>
build #754 of artiq-win64-test is complete: Warnings [warnings python_unittest] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/754 blamelist: Robert Jordens <rj@m-labs.hk>, Robert Jordens <jordens@gmail.com>
<sb0>
cjbe, are there drtio issues other than what you reported?
<sb0>
rjo, i'll have a look tomorrow (resets)
<rjo>
interestingly, vivado (certainly now, but maybe also before) inserts BUFGs for the resets on Sayma, but not on Kasli. also, making all the channel data registers reset_less suppressed its urge to consider adding BUFGs on the rio domains on Kasli.
<GitHub-m-labs>
[artiq] gkasprow commented on issue #908: @hartytp the 1.5V rail supplies both SDRAM and relevant bank of FPGA. So the current consumption is related only with SDRAM transactions. That's why we observe small transients when the memory cycles start. We will measure it once again with the Xilinx IP core tester. Tomorrow I go for a conference, @marmeladapk could you please measure the DC voltage on some cap under