<GitHub-m-labs>
artiq/release-3 35b70b3 Sebastien Bourdeauducq: ttl_serdes_generic: fix/upgrade test
<rjo>
whitequark: ack. i misread it.
<rjo>
_florent_, sb0: then read leveling only aligns dq w.r.t. clk?
<rjo>
and per-bit skew is usually not an issue?
<sb0>
rjo, yes. it aligns it for each 8-bit group
<sb0>
DDR3 doesn't have a better way of dealing with per-bit skew - there is one DQS line for 8 DQ
<sb0>
PCB layout should have matched trace lengths for each DQS group
<sb0>
DQS-DQ timing is more tightly specified than CLK-DQ timing inside the SDRAM chip; this is the only reason for using DQS for reading
<sb0>
for writing, DQS is useful to compensate for the CLK skew that accumulates chip after chip from the fly-by layout (the SDRAM chips are pretty dumb and cannot delay DQ on their own)
<rjo>
per-bit skew could be handled on each dq line per chip in the fpga, no?
<sb0>
yes
<sb0>
but there shouldn't be much bit-to-bit skew inside a group of 8
<hartytp>
my guess is that this isn't an issue with trace matching on the PCB. Greg simulated that pretty carefully IIRC. Plus, everything looked good with the Xilinx tests and with _florent_'s liteX test (which gave really nice eyes)
<hartytp>
remind me: (a) have we confirmed that this issue is present with the SAWG and not without it?
<hartytp>
(b) while mem test etc is happening, is the SAWG logic held in a reset/power down state?
<hartytp>
i.e. is this vivado changing the layout etc when the SAWG is there? Or, is this due to the SAWG logic itself running?
<hartytp>
if the latter, one could imagine a thermal/si/pi issue inside the FPGA. e.g. maybe we should double check the decoupling situation on Sayma
<sb0>
most likely vivado is changing the layout when sawg is enabled. we can lock it (build several FPGA areas separately and stitch them together) but that's quite difficult to do
rohitksingh_work has quit [Read error: Connection reset by peer]
<rjo>
the <1e-5 error windows on kc705/kasli/sayma are all the same: ~700ps
<rjo>
_florent_: how is tom's board doing?
<hartytp>
Sorry, I haven't followed this closely.
<hartytp>
but, what's the status?
<hartytp>
there have been a few gw/fw changes
<hartytp>
is the idea that the gw changes have improved the eye scans?
<hartytp>
or is the thinking that the eyes were actually fine, and we just needed more robust fw?
<hartytp>
or, not known yet?
<rjo>
we didn't have any boards that had the problem while we were doing the changes. gw may have resolved it and sw read leveling is as good as i can imagine.
<rjo>
we are waiting for jbqubit and pawel to test on their boards. _florent_ is testing on his.
<hartytp>
I couldn't remember whether there were any issues with the ml boards. I know they weren't failing mem test, but I thought I remembered seeing some eye scans that looked as bad as the one on my board
<hartytp>
in which case, the mem test passing/failing was luck rather than anything else
<hartytp>
but I might be wrong about that
<rjo>
ml?
<hartytp>
m-labs
<rjo>
hartytp: i haven't seen any indication of a problem on those boards since wednesday. if you could point me to those bad eye scans i'd like to have a look.
<hartytp>
okay, I'd have to look over the issue. I'll do that when I have time
<_florent_>
rjo: i reinstalled everything from conda, but i'm still not able to flash the board...
<hartytp>
rjo: sorry, being slow about this. Have a driver sketched out, but gave it to a student to finish off as a learning exercise
<_florent_>
hartytp: yes, with last gateware/bootloader
<hartytp>
that looks really nice
<hartytp>
any idea what's different?
<sb0>
_florent_, sawg enabled?
<hartytp>
can you double check with the last version of the gw/fw before I sent it to you and confirm that you still get bad eyes?
<_florent_>
hartytp: rjo followed some recommendations in the gateware around the idelayctrl, increased the number of repetitions for the eyescan (64 to 1024) and also improved the algo
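[annotation] A minimal sketch of the kind of software read leveling described above: scan every delay tap, count read errors over a number of repetitions, and pick the centre of the longest error-free window. The callback `count_errors(tap, repetitions)`, the 32-tap range, and the simulated eye are assumptions for illustration; this is not the actual ARTIQ/MiSoC algorithm.

```python
def find_eye_center(count_errors, n_taps=32, repetitions=1024):
    """Return (center_tap, width) of the longest error-free window, or None.

    count_errors(tap, repetitions) is a hypothetical callback that sets the
    input delay to `tap`, runs `repetitions` test reads and returns the
    number of reads that came back wrong.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for tap in range(n_taps):
        if count_errors(tap, repetitions) == 0:
            if run_len == 0:
                run_start = tap  # a new error-free run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # an error closes the current run
    if best_len == 0:
        return None  # no working tap at all
    return best_start + best_len // 2, best_len


# Simulated eye (made-up data): taps 9..20 inclusive are error-free.
def fake_count_errors(tap, repetitions):
    return 0 if 9 <= tap <= 20 else repetitions // 10


result = find_eye_center(fake_count_errors)
print(result)
```

Raising the repetition count (64 to 1024 in the change mentioned above) makes it less likely that a marginal tap with a low error rate is counted as part of the open eye.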
<_florent_>
hartytp: yes sure i can test that
<_florent_>
hartytp: i have some from you that were failing, i'll test to see if i'm able to reproduce
<_florent_>
sb0: yes (at least i just generated sayma_amc without options so if it hasn't changed it should be enabled)
<_florent_>
hartytp: for now 150 consecutive restarts with memtest passing
<hartytp>
_florent_ would be good to know which change fixed this so we don't run into the same issue in a different form
<hartytp>
whoo!
<_florent_>
hartytp: i know you did some automatic restart tests, what were the results (in % of failures)?
<hartytp>
I didn't have time to do that before I shipped the AMC to you
<hartytp>
cjbe did some in the very early days, before all this kerfuffle with SAWG
<rjo>
sb0, whitequark: i did two cleanup rounds over the last couple of months on anaconda. could either of you sweep out old packages that use space? i suspect that might be the reason for the upload failures.
dlrobertson has quit [Ping timeout: 240 seconds]
rooi-oog has joined #m-labs
dlrobertson has joined #m-labs
<hartytp>
here is (untested) the way I imagined the AD53xx driver looking
<hartytp>
everything I want for Zotino is then a trivial wrapper on it
<GitHub-m-labs>
[artiq] jordens commented on issue #940: Yes. I added a workaround. The MASTER_RESET pulse leads to the AD9910 not responding every other time that experiment is run (and errors). It is related to the activity after the cpld and dds init and it is also related to the underflow error. The IO_RST pulse is benign but also does not help. I don't understand yet why the AD9910 gets messed up if and only if there is t
<hartytp>
was trying to get the names of everything a bit more consistent and follow a design pattern that seemed easy to remember
mumptai has joined #m-labs
<hartytp>
rjo: thanks for the review.
<hartytp>
what do you want to do?
<hartytp>
I felt that my version would provide a cleaner interface for users, given common use cases
<hartytp>
if you disagree, feel free to do something else
<hartytp>
then I'll test
<hartytp>
anyway, the zotino driver on top of that is written
<hartytp>
and works fine for the couple of things I tested it on (after a couple of small fixes to the code I committed)
juliusb has quit [Remote host closed the connection]
<rjo>
hartytp: there are a bunch of things happening at the same time, some completely fine, others a bit contentious. let's check with dave on what he thinks and then do it piece by piece.
rooi-oog has left #m-labs [#m-labs]
<hartytp>
fine.
<hartytp>
anyway, as I said, I'm really just having a play around here as I don't yet have strong expectations about how this should look. if you want to do something different then I really don't mind as long as it provides about the same functionality (or you can provide a good argument why that functionality shouldn't be in the driver)
<sb0>
those ultrascale ioserdes are really annoying... the serdes ttl phy seems ok on drtio master, but I get those TPWS violations on the standalone design
<GitHub-m-labs>
[artiq] jordens commented on issue #965: Writing the new state file should be atomic. And when a file can't be read (but exists) it should not be cleared/overwritten. AFAICT that's the case.... https://github.com/m-labs/artiq/issues/965#issuecomment-374691232
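[annotation] A minimal sketch of what "writing the new state file should be atomic" usually means in Python: write to a temporary file in the same directory, flush it to disk, then rename it over the old file. The function name and file naming are hypothetical; this is not the code ARTIQ uses.

```python
import os
import tempfile


def write_state_atomically(path, data):
    """Write the string `data` to `path` via a temp file and a rename.

    Readers see either the complete old file or the complete new file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be a simple replace.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # atomic on POSIX; overwrites on Windows too
    except BaseException:
        # Leave the existing state file untouched and remove the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails part-way, the previous state file is left in place, which matches the requirement that an existing file should not be cleared or overwritten on error.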
<GitHub-m-labs>
[artiq] jordens commented on issue #908: I'd just do the memtests in a loop. That's much faster and it isolates problems. If those work, then there is little left to check (other than the eye location algorithm but i don't see how that could be improved) on that board. https://github.com/m-labs/artiq/issues/908#issuecomment-374721105
<whitequark>
rjo:
<whitequark>
RTM FPGA XADC:
<whitequark>
TEMP 36.67 C
<whitequark>
AMC FPGA XADC:
<whitequark>
TEMP 45.53 C
dlrobertson has quit [Read error: Connection reset by peer]
dlrobertson has joined #m-labs
dlrobertson has quit [Ping timeout: 240 seconds]
<GitHub-m-labs>
[artiq] r-srinivas commented on issue #965: > Writing the new state file should be atomic. And when a file can't be read (but exists) it should not be cleared/overwritten. AFAICT that's the case.... https://github.com/m-labs/artiq/issues/965#issuecomment-374741613
<GitHub-m-labs>
[artiq] jordens commented on issue #965: I can't see how that could happen given the code. Unless there are hard disk problems or windows was particularly unhelpful. Or if the error causing the generation of a new cleared settings file is somewhere else. But maybe there is some feature that could be added to recover from that.... https://github.com/m-labs/artiq/issues/965#issuecomment-374756202
mumptai has quit [Quit: Verlassend]
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: Great work all! That eye scan looks really good. Given the build-build variations we've seen, it's hard to be 100% sure that this is fixed properly, but that does look extremely encouraging. Will be interested to hear from the other people with Sayma about how this looks on their boards.... https://github.com/m-labs/artiq/issues/908#issuecomment-374759405
<GitHub-m-labs>
[artiq] cjbe commented on issue #908: @hartytp if I understand correctly, with the gateware you were using @enjoy-digital sees ~1% failures - IIRC you were seeing a rate much higher than this (i.e. >90%). Could this difference be a power supply issue at your end? https://github.com/m-labs/artiq/issues/908#issuecomment-374759977
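[annotation] A short worked calculation of what a run of passing restarts can and cannot say about the failure rate. It assumes each boot is an independent trial with the same failure probability, which only holds within a single gateware build; the function name is made up for illustration.

```python
def failure_rate_upper_bound(n_passes, confidence=0.95):
    """Upper bound on the per-boot failure probability after
    n_passes consecutive passes and zero failures."""
    # P(no failure in n boots) = (1 - p)**n.
    # Solve (1 - p)**n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passes)


# 150 consecutive passing restarts, as reported earlier in the channel.
bound = failure_rate_upper_bound(150)
print(f"95% upper bound on failure rate: {bound:.2%}")
```

With 150 passes and no failures the bound comes out at roughly 2%, so such a run rules out a >90% failure rate but cannot by itself distinguish between 0% and about 1%.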
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: > I've calculated the maximum rated power draw based on the values in the schematics just today. AMC alone is 2.9A and AMC+RTM are 11.5A.... https://github.com/m-labs/artiq/issues/908#issuecomment-374769257
<GitHub-m-labs>
[artiq] hartytp commented on issue #908: Thinking about this more, the board where @gkasprow and @marmeladapk carefully verified the PI also showed bad eye scans, which also suggests that this isn't related to the PSU I'm using (we looked into all that carefully before M-Labs started their thorough gateware investigation....) https://github.com/m-labs/artiq/issues/908#issuecomment-374779574