sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: > Running this as a startup kernel crashes Sayma in 100% of the cases:... https://github.com/m-labs/artiq/issues/1065#issuecomment-397493251
<GitHub131> [smoltcp] pothos commented on issue #237: I'm talking about this which solves the problem of too big TCP MSS:... https://github.com/m-labs/smoltcp/pull/237#issuecomment-397493298
sb0 has joined #m-labs
<sb0> the crasher kernel still crashes sayma without the LOC. but it seems it again prints garbage on the UART instead of freezing.
<sb0> _florent_, is the RTM FPGA supposed to be completely reset during link init??
<sb0> there are still intermittent serwb init failures that seem to depend on what happened to the AMC before the boot
<sb0> or it reboots again. and I'm not sure if that's significant, but the memory scan looks worse after a crash-induced reboot.
<sb0> _florent_, also, why does it sometimes freeze in "wishbone test"? afaik the core is supposed to do a bus error when there is a problem, not stall transactions indefinitely
<sb0> and this doesn't seem to be due to memory corruption
<_florent_> sb0: yes the rtm is fully reset during link init. It seems that after a crash, the hardware is not working as well as before the crash (sdram scan, someone also reported serwb errors IIRC), but don't understand why
<_florent_> sb0: for the wishbone freeze, i can add a timeout if we don't receive response to read and generate an error
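A cycle-level sketch of the read timeout _florent_ proposes: a counter runs while a request is waiting, is cleared when the acknowledge arrives, and reports an error once a limit is reached, so the transaction ends instead of stalling forever. This is plain Python modelling the behaviour, not the serwb or Migen code; the class and function names and the 8-cycle limit are hypothetical.

```python
class ReadTimeout:
    """Hypothetical watchdog: flags an error if a request waits too long for ack."""

    def __init__(self, limit_cycles: int) -> None:
        self.limit = limit_cycles
        self.count = 0

    def tick(self, request: bool, ack: bool) -> bool:
        """Advance one clock cycle; return True if the master should see an error."""
        if not request or ack:
            # No transaction pending, or it just completed: restart the count.
            self.count = 0
            return False
        self.count += 1
        return self.count >= self.limit


def run_read(ack_cycle, limit=8, max_cycles=64):
    """Simulate one read; ack_cycle=None models a link that never answers."""
    wd = ReadTimeout(limit)
    for cycle in range(max_cycles):
        ack = ack_cycle is not None and cycle == ack_cycle
        if wd.tick(request=True, ack=ack):
            return ("err", cycle)
        if ack:
            return ("ack", cycle)
    return ("stall", max_cycles)


print(run_read(ack_cycle=3))     # ('ack', 3)
print(run_read(ack_cycle=None))  # ('err', 7)
```

On a real Wishbone bus the "err" outcome would correspond to terminating the cycle with ERR instead of ACK, so the CPU takes a bus error rather than hanging.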
Gurty has quit [Ping timeout: 245 seconds]
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Another data point.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397527119
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: @hartytp @gkasprow @jbqubit @jordens Can you reproduce those "reboot loop" results? https://github.com/m-labs/artiq/issues/1065#issuecomment-397527554
<sb0> so, it seems the DACs are also trashing the FPGA
<sb0> or the JESD core, the transceivers, or anything that gets initialized in board_artiq::ad9154::init
<sb0> maybe it's a power integrity issue?
<rjo> sb0: when i was reviewing the rtm platform def, i also started the amc but didn't get far. one thing that may be useful to check is whether the lvds inputs (especially clocks on non-gt inputs) have termination. probably not the cause of the current problems but still worthwhile.
<sb0> those reboot loop results seem well reproducible, at least when I'm running them on the HK board
rohitksingh has joined #m-labs
<_florent_> sb0: i think you sent the same diff two times in #1065
<sb0> _florent_, fixed thanks
<rjo> sb0: let's bisect ad9154::init with that jump.
<sb0> rjo, done already. posting.
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: It's ``jesd_unreset()`` that causes the crashy behavior.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397538806
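The bisection rjo suggests can be sketched as a binary search over prefixes of the init sequence: run the first k steps, do the jump/reboot test, and narrow down the first step after which the crash appears. The step list and the `crashes_after` callback are hypothetical placeholders (only `jesd_unreset` comes from the thread); on hardware each probe is a rebuild and reflash, so needing only about log2(n) probes matters.

```python
from typing import Callable, Sequence


def first_bad_step(steps: Sequence[str], crashes_after: Callable[[int], bool]) -> str:
    """Return the first step whose inclusion makes the test crash.

    crashes_after(k) runs steps[:k] and reports whether the crash was seen.
    Assumes the failure is monotonic: once the culprit runs, every longer
    prefix also crashes.
    """
    lo, hi = 0, len(steps)  # invariant: prefix of length lo is good, length hi is bad
    if crashes_after(lo):
        raise ValueError("crashes even with no init steps")
    if not crashes_after(hi):
        raise ValueError("full init sequence does not crash")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes_after(mid):
            hi = mid
        else:
            lo = mid
    return steps[hi - 1]


# Hypothetical step names; a real crashes_after would flash and observe the board.
steps = ["clock_setup", "dac_reset", "dac_config", "jesd_unreset", "sysref_scan"]
culprit = steps.index("jesd_unreset")
print(first_bad_step(steps, lambda k: k > culprit))  # jesd_unreset
```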
<GitHub-m-labs> [artiq] enjoy-digital commented on issue #1065: @sbourdeauducq: in your first case, the HMC830 and HMC7043 are initialized, but the buffers on the FPGA inputs are still disabled. (We enable them when initializing the DAC).... https://github.com/m-labs/artiq/issues/1065#issuecomment-397538869
<GitHub-m-labs> [artiq] enjoy-digital commented on issue #1065: @sbourdeauducq: so let's remove the buffers (or add a separate control for them) and redo your first test. If that's still working fine, then it's related to the JESD/DACs. If not, then to the HMC830/HMC7043. https://github.com/m-labs/artiq/issues/1065#issuecomment-397539922
<GitHub-m-labs> [artiq] enjoy-digital commented on issue #1065: @sbourdeauducq: so let's remove the enables on the buffers (or add a separate control for them) and redo your first test. If that's still working fine, then it's related to the JESD/DACs. If not, then to the HMC830/HMC7043. https://github.com/m-labs/artiq/issues/1065#issuecomment-397539922
rohitksingh has quit [Quit: Leaving.]
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: > If HMC7043 is now no longer supposed to generate broadband noise, we could remove the enable on the buffers:... https://github.com/m-labs/artiq/issues/1065#issuecomment-397551611
rohitksingh has joined #m-labs
rohitksingh has quit [Ping timeout: 260 seconds]
rohitksingh has joined #m-labs
sb0 has quit [Ping timeout: 256 seconds]
sb0 has joined #m-labs
<GitHub-m-labs> [artiq] cjbe opened issue #1073: DRTIO SFP LEDs should show link status https://github.com/m-labs/artiq/issues/1073
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: With this patch the board simply never boots.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397609928
cr1901_modern has left #m-labs [#m-labs]
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq interesting! The conclusion here is that the HMC7043 is still interfering with the FPGA, at least after recovering from a crash -- even with the RESET connected and pulled to 3V3.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397610851
cr1901_modern has joined #m-labs
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Simply apply the patch above - then it doesn't boot at all and fails memtest! No need to have a prior crash, this also happens right after a power cycle. https://github.com/m-labs/artiq/issues/1065#issuecomment-397611190
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: And loading the RTM FPGA, which is supposed to hold the 7043 in reset, does not help. https://github.com/m-labs/artiq/issues/1065#issuecomment-397611278
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: BTW, all the tests above were done with the LOCs and CLOCK_ROOTs removed, so it's not that interfering either. https://github.com/m-labs/artiq/issues/1065#issuecomment-397611593
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: https://m-labs.hk/sayma/sayma_crash.tar.bz2 is with the patch above, the LOCs removed, and Vivado 2018.1... https://github.com/m-labs/artiq/issues/1065#issuecomment-397614189
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: https://m-labs.hk/sayma/sayma_crash.tar.bz2 is with the patch above, the LOCs/CLOCK_ROOTs removed, and Vivado 2018.1... https://github.com/m-labs/artiq/issues/1065#issuecomment-397614189
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Tomorrow I'll try with the RTM disconnected; can you try that too? https://github.com/m-labs/artiq/issues/1065#issuecomment-397614386
<GitHub-m-labs> [artiq] jbqubit commented on issue #1065: I'm at a meeting away from Maryland so can't test further until Monday. https://github.com/m-labs/artiq/issues/1065#issuecomment-397615898
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: What startup Kernel are you using? And, other than the patch, this is with the current ARTIQ master, right? https://github.com/m-labs/artiq/issues/1065#issuecomment-397620764
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: > What startup Kernel are you using? ... https://github.com/m-labs/artiq/issues/1065#issuecomment-397623396
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: And, without SAWG is ok? https://github.com/m-labs/artiq/issues/1065#issuecomment-397624503
<GitHub-m-labs> [migen] jordens pushed 1 new commit to master: https://github.com/m-labs/migen/commit/19e82b7869cd6af5f9eb5f6f0559016480d6eba0
<GitHub-m-labs> migen/master 19e82b7 Robert Jördens: sayma_amc: diff term lvds inputs
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Not tried. I only built with SAWG. https://github.com/m-labs/artiq/issues/1065#issuecomment-397628911
<bb-m-labs> build #284 of migen is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/284 blamelist: Robert Jördens <rj@quartiq.de>
<GitHub-m-labs> [migen] jordens pushed 1 new commit to master: https://github.com/m-labs/migen/commit/9929b232aaa4cccf4131feba55fefe773d690a99
<GitHub-m-labs> migen/master 9929b23 Robert Jördens: sayma_amc: fix 19e82b7 syntax
<bb-m-labs> build #285 of migen is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/285
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq using the binaries you posted above, I see the following: https://drive.google.com/open?id=1UFnlC5iAUXjeOq-2ejaEp8Zt9Ez93hm-... https://github.com/m-labs/artiq/issues/1065#issuecomment-397635433
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Can you try to reproduce the reboot loops (and the crashes) here: https://github.com/m-labs/artiq/issues/1065#issuecomment-397527119 https://github.com/m-labs/artiq/issues/1065#issuecomment-397636084
<GitHub-m-labs> [artiq] jordens pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/edfae3c4bac7bbbd4d5899ab2f92de9def23fe27
<GitHub-m-labs> artiq/master edfae3c Robert Jördens: hmc7043: make fpga fabric clocks lvds...
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq Sure.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397638080
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Also, are you sure that you have the HMC7043 rework done correctly, and that you're holding the chip in reset mode during boot? Can you try removing the AC coupling caps that connect the HMC7043 to the two FPGAs on your board? https://github.com/m-labs/artiq/issues/1065#issuecomment-397638448
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: I don't have the pullup, but if I keep the RTM FPGA loaded, in theory, it should not matter... https://github.com/m-labs/artiq/issues/1065#issuecomment-397638842
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: > No, unmodified bitstream.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397639779
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: > You mean taking ARTIQ master and applying only this patch:... https://github.com/m-labs/artiq/issues/1065#issuecomment-397640146
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Okay, I'll build that now. https://github.com/m-labs/artiq/issues/1065#issuecomment-397641590
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Okay, I'll build that now (with SAWG). https://github.com/m-labs/artiq/issues/1065#issuecomment-397641590
hartytp has joined #m-labs
<hartytp> rjo: I think that driving LVDS FPGA inputs from LVPECL is fine
<hartytp> LVPECL is somewhat better for SI long traces IIRC as it has a stronger drive
<hartytp> also, if you want to use LVDS for that then I think you need to modify the hw by removing the 200R bias resistors
<hartytp> on the hmc7043 outputs
<hartytp> rjo: do we need a multireg on the DAC sysref ?
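For context on the question: Migen's `MultiReg` (in `migen.genlib.cdc`) is a chain of flip-flops, two by default, clocked in the destination domain and used to bring an asynchronous signal into that domain. The plain-Python model below reproduces only its register-chain latency; it cannot show metastability, and the class name is hypothetical. Whether the DAC sysref path needs one is left open in the thread, and this sketch does not answer it.

```python
class SyncChain:
    """Behavioural model of an n-stage synchronizer (latency only)."""

    def __init__(self, n: int = 2, reset: int = 0) -> None:
        self.stages = [reset] * n

    def clock(self, async_in: int) -> int:
        # On each destination-clock edge the input enters stage 0 and every
        # stage passes its value to the next; the last stage is the output.
        self.stages = [async_in] + self.stages[:-1]
        return self.stages[-1]


sync = SyncChain(n=2)
print([sync.clock(x) for x in [1, 1, 1, 0, 0, 0]])  # [0, 1, 1, 1, 0, 0]
```

Each input change reaches the output n edges later; for a truly asynchronous input, the capture edge can also vary by one cycle. That added, not fully deterministic latency is the trade-off a synchronizer brings.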
<rjo> hartytp: i don't think 1.9Vpp into an unterminated LVDS 1.8V HP bank biased to 0.9 V is fine.
<hartytp> hmmm...
<hartytp> 1.9Vpp
<hartytp> yes, I'm used to thinking about it as 800mV ish as a single-ended signal
<rjo> hartytp: have you tested that? i don't think on the clocks is the problem here. if this turns out to make the sysref scan bad or make sync fail, then we can revisit.
<hartytp> no, it's just something I noticed when looking over the code after some of the SC1 issues, and was curious about
<rjo> and i don't think the 200r bias resistors will hurt lvds swing (given that we run it in high perf mode at 750 mVpp).
<hartytp> rjo: well, if that is a problem, I think we need to remove those resistors on the HMC7043 outputs
<hartytp> rjo: am I being daft here, or is your argument about the LVPECL not correct
<hartytp> 1.8Vpp is the differential signal
<hartytp> the input swing across each input is 800mVpp
<hartytp> which should be fine
<hartytp> or am I missing something?
<rjo> hmc7043 table 6 i am reading 1.9Vpp at 1 GHz. assuming that's with the 200R output, that would swing to 0.9+0.95V=1.85V which is high given the input.
<rjo> and vin_max=vcco+0.2v=2v is pretty close considering that there might well be overshoots.
<rjo> it seems completely unnecessary to drive that into the fpga fabric.
<rjo> and obviously if there is no input termination on the LVDS inputs, all bets are probably off.
<hartytp> table 7
<hartytp> well, yes, the lack of termination is obviously not at all good
<rjo> figure 6. sorry.
<hartytp> since it's a doubly unterminated line
<hartytp> rjo: that's differential
<hartytp> i.e. the difference between the p and n outputs
<hartytp> so each of those outputs only swings by 800mV (which is standard for LVPECL)
<rjo> the driver side is terminated.
<rjo> 850 mV.
<hartytp> sure
<rjo> but it's not standard for a LVDS receiver at all.
<hartytp> no, it's not standard, but I wouldn't have thought it would do any harm...
<hartytp> (particularly not after some loss in transmission lines)
<rjo> well. the datasheet has 600 mV vdiff max.
<hartytp> okay
<hartytp> well, in that case you're 100% right :)
<hartytp> there could be some diodes between the inputs
<hartytp> or something like that
<hartytp> well, we're clearly exceeding that by quite some margin, which I can well imagine causing issues
<rjo> but apart from being potentially harmful it seems to be unneeded as there is no indication that there is a SI issue on sysref.
<rjo> they talk about certain cases where higher vdiff is tolerated but it seems pointless to explore that.
<hartytp> ack. I don't think it was something we put much thought into
<sb0> the artix-7 fpga has clamp diodes to I/O bank VCC and ground
<sb0> and they are permanently connected, unlike in some other fpga families
<hartytp> well, in that case I would only expect a max se swing limit and not a max vdiff limit
<hartytp> but, who knows
<hartytp> anyway, as rjo says, LVPECL is probably over the top for those signals, so let's stick with LVDS and probably remove the bias resistors in the next revision
<bb-m-labs> build #1647 of artiq-board is complete: Exception [exception conda_build_output] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1647 blamelist: Robert Jördens <rj@quartiq.de>
<rjo> 750mV vodiff pp in LVDS (high power as currently) is still massive and more than 2*vidiff typ. i don't think the 100R each are going to hurt that signal.
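The differential-limit argument from the LVPECL/LVDS exchange, worked as arithmetic. It takes the figures quoted in the thread at face value: about 850 mV per leg for the HMC7043 LVPECL outputs, 750 mVpp differential for LVDS in high-power mode, and a 600 mV maximum differential input for the FPGA. These numbers are as quoted, not checked against the datasheets, and the helper names are hypothetical.

```python
VIDIFF_MAX = 0.600  # V, max differential input voltage as quoted in the discussion


def peak_vdiff_from_leg_swing(leg_vpp: float) -> float:
    """Peak |Vp - Vn| when each leg swings leg_vpp peak-to-peak in antiphase.

    Each leg moves +/- leg_vpp/2 around the common mode in opposite
    directions, so the difference peaks at leg_vpp.
    """
    return leg_vpp


def peak_vdiff_from_diff_vpp(diff_vpp: float) -> float:
    """Peak |Vp - Vn| for a differential swing given peak-to-peak."""
    return diff_vpp / 2


cases = {
    "LVPECL, 850 mV per leg": peak_vdiff_from_leg_swing(0.850),
    "LVDS high power, 750 mVpp diff": peak_vdiff_from_diff_vpp(0.750),
}
for name, vdiff in cases.items():
    verdict = "exceeds" if vdiff > VIDIFF_MAX else "within"
    print(f"{name}: peak |Vdiff| = {vdiff * 1000:.0f} mV, {verdict} {VIDIFF_MAX * 1000:.0f} mV")
```

Under these readings, the LVPECL drive is well past the quoted limit, while the LVDS setting stays comfortably below it.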
<bb-m-labs> build #2451 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2451 blamelist: Robert Jördens <rj@quartiq.de>
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq done. No crashes with that patch either. Will post log in a sec.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397657682
<GitHub-m-labs> [artiq] jordens pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/70fd369e2fe8f33f825646ca0d8a282ba0e187f8
<GitHub-m-labs> artiq/master 70fd369 Robert Jördens: conda: bump migen (sayma lvds diff term)
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Does this kernel crash on your board? https://github.com/m-labs/artiq/issues/1065#issuecomment-396824032 ... https://github.com/m-labs/artiq/issues/1065#issuecomment-397658312
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq done. No crashes with that patch either. Log: https://drive.google.com/open?id=1ds1_6zj5BBNqK26HGczKXWkuby6dTMHy... https://github.com/m-labs/artiq/issues/1065#issuecomment-397657682
sb0 has quit [Quit: Leaving]
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Rebuilt without the `jump(0)` and flashed that as a startup kernel.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397663902
<rjo> sb0: i assume the rtm overheated again.
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: NB I haven't built with @jordens latest finds yet. https://github.com/m-labs/artiq/issues/1065#issuecomment-397664094
<GitHub-m-labs> [artiq] jordens pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/40baa8ecba6bb39913ade5a52a750c211ec11e0c
<GitHub-m-labs> artiq/master 40baa8e Robert Jördens: hmc7043: disable ch 10 and 11 group
<rjo> hartytp: i can't test. feel free to revert edfae3c if it is shown to cause SI problems.
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: I take that back.... https://github.com/m-labs/artiq/issues/1065#issuecomment-397664870
<hartytp> rjo: it's probably fine, but ack
<hartytp> sb0: so, I do see that kernel crashing my board
<hartytp> but, I didn't see the other issues you pointed to (although, I had a different kernel flashed at the time)
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @sbourdeauducq what do you expect to see on the UART if that Kernel runs correctly? https://github.com/m-labs/artiq/issues/1065#issuecomment-397667281
<hartytp> sb0 anything else you want me to look at
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: hmm...after a reboot, the kernel ran (same output on UART), 5 minutes later, no crash afaict. Reloading the AMC FPGA with `artiq_flash -t sayma ... start` mem test looks good, and the Kernel runs again with the same output. https://github.com/m-labs/artiq/issues/1065#issuecomment-397668228
<hartytp> as I said though, it might be worth checking that you really can disable the HMC7043 after your rework by holding reset high
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: ~@sbourdeauducq what do you expect to see on the UART if that Kernel runs correctly?~ https://github.com/m-labs/artiq/issues/1065#issuecomment-397667281
<hartytp> okay, I assume that's all you want from me?
<hartytp> but let me know if there are any test you want me to run?
<bb-m-labs> build #1648 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1648
hartytp has quit [Quit: Page closed]
<bb-m-labs> build #2452 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2452 blamelist: Robert Jördens <rj@m-labs.hk>
rohitksingh has quit [Quit: Leaving.]
<bb-m-labs> build #1649 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1649
<bb-m-labs> build #2453 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2453 blamelist: Robert Jördens <rj@quartiq.de>
rohitksingh has joined #m-labs
<GitHub-m-labs> [artiq] jordens closed pull request #1054: Added 'unset' method to I2C switch (master...for_merge) https://github.com/m-labs/artiq/pull/1054
<GitHub20> [smoltcp] podhrmic commented on issue #236: Added tests and the Ethernet header size to MTU, instead of #237 https://github.com/m-labs/smoltcp/pull/236#issuecomment-397702422
<GitHub23> [smoltcp] podhrmic commented on issue #237: Please see #236; I added the header size there. I am closing this and moving the discussion there.... https://github.com/m-labs/smoltcp/pull/237#issuecomment-397703703
<GitHub65> [smoltcp] podhrmic closed pull request #237: Fix MTU settings so fragmented packets can be received (master...proper_mtu_handling) https://github.com/m-labs/smoltcp/pull/237
<GitHub132> [smoltcp] podhrmic commented on issue #236: @pothos It turns out that counting in the ethernet header is strictly speaking needed only for UDP packets. With MTU of 1500, linux sends ethernet frames that are 1514 bytes long. For TCP packets, they are only 1500 bytes long (including the header). You can try this with a wireshark and see for yourself.... https://github.com/m-labs/smoltcp/pull/236#issuecomment-39770
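The byte counts behind the MTU discussion, as a sketch: an Ethernet II header adds 14 bytes (two 6-byte MAC addresses and a 2-byte EtherType) on top of the IP packet, so a receive buffer sized to a 1500-byte MTU is too small for a full 1514-byte frame, and a TCP MSS derived from the MTU has to subtract the IP and TCP headers. This assumes IPv4 and TCP without options and no VLAN tag, and excludes the 4-byte FCS; the function names are illustrative and are not smoltcp's API.

```python
ETH_HEADER_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2); no VLAN tag, FCS excluded
IPV4_HEADER_LEN = 20  # minimum IPv4 header, no options
TCP_HEADER_LEN = 20   # minimum TCP header, no options


def ethernet_frame_len(ip_mtu: int) -> int:
    """Bytes to buffer for a full-size IP packet, including the Ethernet header."""
    return ip_mtu + ETH_HEADER_LEN


def ipv4_tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IP packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN


print(ethernet_frame_len(1500))  # 1514
print(ipv4_tcp_mss(1500))        # 1460
```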
rohitksingh has quit [Quit: Leaving.]
<bb-m-labs> build #1650 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1650
<bb-m-labs> build #2454 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2454 blamelist: Robert Jördens <rj@m-labs.hk>
dlrobertson has joined #m-labs
hozer has joined #m-labs