sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: Other data points (may or may not be relevant):... https://github.com/m-labs/artiq/issues/1065#issuecomment-399295854
kaolpr has quit [Ping timeout: 245 seconds]
kaolpr has joined #m-labs
<GitHub-m-labs> [artiq] sbourdeauducq pushed 3 new commits to master: https://github.com/m-labs/artiq/compare/c1db02a3513f...60b22217ce0e
<GitHub-m-labs> artiq/master 60b2221 Sebastien Bourdeauducq: sayma: set DRTIO master HMC830_REF to 100MHz
<GitHub-m-labs> artiq/master e6d1726 Sebastien Bourdeauducq: sayma: add RTIO log to DRTIO master
<GitHub-m-labs> artiq/master 8342896 Sebastien Bourdeauducq: sayma: add SAWG and JESD to DRTIO master
<GitHub51> [smoltcp] jhwgh1968 commented on issue #232: Nevermind. I was copying the wrong test as a basis.... https://github.com/m-labs/smoltcp/pull/232#issuecomment-399298995
<GitHub-m-labs> [artiq] sbourdeauducq opened issue #1079: support runtime build without RTIO DMA https://github.com/m-labs/artiq/issues/1079
<GitHub62> [smoltcp] jD91mZM2 commented on issue #244: There wasn't much to do in `ethernet.rs` unfortunately since a ManagedSlice cannot be pushed/removed to. https://github.com/m-labs/smoltcp/pull/244#issuecomment-399337216
<GitHub-m-labs> [migen] sbourdeauducq pushed 11 new commits to master: https://github.com/m-labs/migen/compare/07c46f55474b...e4e92dca1010
<GitHub-m-labs> migen/master 1eeb38d Caleb Jamison: Fixed missing parens, extra spaces
<GitHub-m-labs> migen/master 0dd85cd Caleb Jamison: Split pmods to _connectors, checked against litex...
<GitHub-m-labs> migen/master 04a9914 Caleb Jamison: Arty A7 platform...
sb0 has joined #m-labs
<sb0> Quad DAC @ 12GSPS with Quad ADC @ 3GSPS, Kintex UltraScale AMC
<GitHub17> [smoltcp] whitequark commented on pull request #244 3d9b73b: The purpose of this method is to be able to update IP addresses without assigning a different ManagedSlice. It's for memory-constrained devices without an allocator. https://github.com/m-labs/smoltcp/pull/244#discussion_r197364017
<bb-m-labs> build #288 of migen is complete: Exception [exception conda_build_output] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/288 blamelist: Caleb Jamison <cbjamo@gmail.com>
<bb-m-labs> build #287 of migen is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/287 blamelist: Caleb Jamison <cbjamo@gmail.com>
<bb-m-labs> build #289 of migen is complete: Exception [exception conda_build_output] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/289 blamelist: Caleb Jamison <cbjamo@gmail.com>
rohitksingh_work has joined #m-labs
<sb0> ffs the sayma bug festival never ends, does it? cannot load RTM FPGA gateware: "Did not exit INIT after releasing PROGRAM" appeared out of the blue on one board
<sb0> meanwhile, the other one developed new power supply problems
<sb0> mh, the RTM loading failure seems to be another symptom of the general sayma memory corruption/insanity ...
<sb0> it works with non-sawg gateware
<GitHub-m-labs> [artiq] sbourdeauducq pushed 3 new commits to master: https://github.com/m-labs/artiq/compare/60b22217ce0e...f87da95e57d2
<GitHub-m-labs> artiq/master f87da95 Sebastien Bourdeauducq: jesd204: use jesd clock domain for sysref sampler...
<GitHub-m-labs> artiq/master 76fc63b Sebastien Bourdeauducq: jesd204: use separate controls for reset and input buffer disable
<GitHub-m-labs> artiq/master d9955fe Sebastien Bourdeauducq: jesd204: make sure IOB FF is used to sample SYSREF at FPGA
<sb0> sayma as satellite with a kasli master seems to work just fine. the sayma master, on the other hand, is completely trashed since I added SAWG
<sb0> it seems even crashier than the standalone target
hartytp_ has joined #m-labs
<sb0> sync between the two sayma doesn't work at all, on the other hand...
<sb0> the phase even varies without rebooting any board
<sb0> there are discrete phase jumps in the output. i guess there's jitter on sysref or something
<sb0> those jumps are present on one board only (this is two satellites, same gateware, driven by kasli)
<sb0> that board is Florent's board, on which one DAC is dead... could be just a hardware problem?
<sb0> hartytp, can you test?
<sb0> once the drtio link is established, there are no more sysref adjustments, and I didn't see phase jumps on the other board, so it looks like a one-off board issue
<sb0> could also try running the standalone design on Florent's board and look at phase jumps to confirm ...
<GitHub-m-labs> [artiq] sbourdeauducq pushed 1 new commit to master: https://github.com/m-labs/artiq/commit/51a5d8dff9670d0ff8ef43abc19fbdd73644fc8c
<GitHub-m-labs> artiq/master 51a5d8d Sebastien Bourdeauducq: examples: add Kasli SAWG master
<sb0> hartytp, you need a coax cable between amc clkout and rtm clkin
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: @gkasprow Any progress on the FPGA PI measurements while the SAWG is running? https://github.com/m-labs/artiq/issues/1065#issuecomment-399405633
sb0 has quit [Quit: Leaving]
<bb-m-labs> build #1672 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1672
<bb-m-labs> build #2475 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2475 blamelist: Sebastien Bourdeauducq <sb@m-labs.hk>
<hartytp_> sb0: first I want to look at the memory corruption more (DRTIO is less interesting to me until we get rid of the crashes)
<hartytp_> Plan for today is to add mem tests to sayma standalone during and after boot
<hartytp_> and see if we can track this down a bit
<hartytp_> another question I think we need answered is what the sawg actually does to kill Sayma
<hartytp_> does it trigger a vivado bug?
<hartytp_> or, does it trigger PI/SI issues?
<hartytp_> I think a good test would be to add a separate reset for the sawg only that is released as the last thing during boot
<hartytp_> what do you think?
<hartytp_> hmm...so, the SAWG is clocked from the rio_phy cd
<hartytp_> which is held in reset until rtio_mgt::startup
<bb-m-labs> build #1673 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1673
<hartytp_> wait, sb0: remind me how the sawg reset works
<hartytp_> so, initially, it's not held in reset, but no clock is applied
<hartytp_> then we enable the CB so it's clocked
<hartytp_> but, it's still not held in reset (and there could be a runt pulse or two on that clock input)
<hartytp_> it then gets reset by the firmware later on via reset_phy_write()
<hartytp_> right?
<bb-m-labs> build #2476 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2476 blamelist: Sebastien Bourdeauducq <sb@m-labs.hk>
sb0 has joined #m-labs
rohitksingh_work has quit [Read error: Connection reset by peer]
<hartytp_> sb0: right, so rio_phy.rst is pulsed for one cycle whenever the CSR is written to https://github.com/m-labs/artiq/blob/5a91f820fd962629d569b5e9a2c98c9329fbaa37/artiq/gateware/rtio/core.py#L42
<hartytp_> so, AFAICT, the SAWG starts up as soon as the CB is enabled
<hartytp_> so, a related question: when we discussed the crashes related to the HMC7043 noise, I thought you said that all logic clocked from the HMC7043 was held in reset during the boot
<hartytp_> but, that's not true, is it?
<hartytp_> the SAWG is clocked from that and is *not* held in reset during boot
<hartytp_> so, we were running a pretty big chunk of logic from a crap clock
<hartytp_> that may explain why the issues we saw were so bad. I wonder if we would have had such bad HMC7043 issues if the rtio_phy CD was held in reset until we'd guaranteed a stable clock...
<sb0> normally there is no difference, especially since it's a synchronous reset
<sb0> additionally, nothing in the rio_phy domain is supposed to interfere with the CPU or SDRAM
<hartytp_> sure
<hartytp_> but, normally you would assume a stable clock for that
<hartytp_> the HMC7043 isn't really designed to do that. at least not during boot
<hartytp_> anyway, not saying that that was our issue, but it does mean that one of the assumptions that I thought we had agreed on when talking about the HMC7043 was not correct
<sb0> not really; timing violations may corrupt the state of FFs, but then a reset would clear them
<hartytp_> yes, so the model here would have to be some kind of PI/SI issue
<sb0> sending 2GHz noise through the FPGA clock networks, on the other hand, can cause problems, and synchronous resets won't help
<hartytp_> yes, but it's probably best practice to hold the logic in reset until the clock is good
<hartytp_> anyway, I think I'll OR the rtio_phy with a CSR that defaults to 1 and then release it at the end of boot
<hartytp_> then run mem tests at a few points during boot and see if I can identify what on earth is going on
<sb0> just use the existing ResetSignal("rtio")
<sb0> that one should be asserted until the "rtio" clock is stable
<hartytp_> that doesn't do what I want
<hartytp_> does it?
<hartytp_> that only goes high for one clock cycle when the reset CSR is written to
<hartytp_> I want to tie it high during boot
<sb0> on sawg it does what you want, it defaults to 1 and then it is released by jesd_unreset()
<sb0> that one is rsys
<hartytp_> oops, rtio v rio
<hartytp_> too many cds
<hartytp_> okay, so OR the rio_phy reset with the rtio reset
<hartytp_> yes, that should work
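The reset plan agreed above can be modelled in a few lines of plain Python. This is only a behavioural sketch, not gateware: the function name and the three-step boot timeline are hypothetical, and the only facts taken from the discussion are that the rtio reset defaults to 1 until it is released and that the CSR write produces a one-cycle pulse.

```python
def rio_phy_reset(csr_pulse, rtio_reset):
    """Effective rio_phy reset if the CSR pulse is ORed with the rtio reset."""
    return csr_pulse or rtio_reset

# Hypothetical boot timeline: (description, csr_pulse, rtio_reset)
timeline = [
    ("power-up, clock not yet stable", False, True),
    ("firmware writes the reset CSR", True, True),
    ("clock stable, rtio reset released", False, False),
]

for description, csr_pulse, rtio_reset in timeline:
    state = "in reset" if rio_phy_reset(csr_pulse, rtio_reset) else "running"
    print(f"{description}: {state}")
```

With the OR in place, the logic stays in reset for the whole period in which the rtio reset is asserted, rather than only for the single cycle of the CSR pulse.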
<hartytp_> well, need to finish slave FPGA loading rework first then I'll do that
<hartytp_> thanks
<sb0> I connected it on the via, it's easier than the DAC pin and requires a shorter wire
<sb0> well, with the DRTIO satellite, rio_phy *is* held in reset until the clock is stable
<sb0> (rio_phy is in reset whenever there is no link)
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
<GitHub185> [smoltcp] jD91mZM2 commented on pull request #244 3d9b73b: Well you still can't append to it. Should I just make a way to set an ip based on index? https://github.com/m-labs/smoltcp/pull/244#discussion_r197450901
<GitHub125> [smoltcp] jD91mZM2 commented on pull request #244 3d9b73b: Should I make a way to set an ip based on index? Or revert the update thing? https://github.com/m-labs/smoltcp/pull/244#discussion_r197450901
<GitHub62> [smoltcp] jD91mZM2 commented on pull request #244 3d9b73b: Should I make a way to set an ip based on index? Or revert the update thing for that function? https://github.com/m-labs/smoltcp/pull/244#discussion_r197450901
<GitHub168> [smoltcp] jD91mZM2 commented on pull request #244 3d9b73b: Should I make a way to set an ip based on index? Or revert the change to that file? https://github.com/m-labs/smoltcp/pull/244#discussion_r197450901
<hartytp_> sb0: which via?
<sb0> hartytp_, on top of the rtm fpga, you see it on the picture greg posted here https://github.com/m-labs/artiq/issues/813#issuecomment-396751318
<hartytp_> sb0: hmmm
<hartytp_> latest artiq master
<hartytp_> the crash kernel doesn't seem to crash
<hartytp_> oops, never mind that's without sawg
<hartytp_> silly me
<hartytp_> sb0: is this what you had in mind for the mem tests? https://github.com/hartytp/artiq/tree/mem_test
<sb0> hartytp_, yes, something like this
<sb0> why did you remove the prng16? there should be enough space for a 4*64k buffer
<hartytp_> okay, good
<hartytp_> i wasn't sure how much ram we have
<hartytp_> thanks, fixed
<hartytp_> now just need gateware to build
<GitHub-m-labs> [artiq] hartytp commented on issue #813: After the rework, this also works for me. Thanks! https://github.com/m-labs/artiq/issues/813#issuecomment-399478516
sb0_ has joined #m-labs
<sb0_> there's a lot of RAM, just make sure you don't overflow the stack
sb0 has quit [Ping timeout: 240 seconds]
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: A couple of tests I'm planning to run:... https://github.com/m-labs/artiq/issues/1065#issuecomment-399478960
<hartytp_> that's unlikely with an array size of 0x10000
<hartytp_> isn't it?
<hartytp_> 32 bit addresses, right?
<sb0_> yes, u32
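For reference, the size of the buffer under discussion can be worked out directly. The sketch assumes the array holds 0x10000 elements of type u32, as stated in the messages above; the variable names are illustrative.

```python
WORDS = 0x10000        # array length mentioned in the chat
BYTES_PER_WORD = 4     # u32

buffer_bytes = WORDS * BYTES_PER_WORD
print(buffer_bytes)           # 262144
print(buffer_bytes // 1024)   # 256 (KiB), matching the "4*64k" figure
```

A buffer of this size is small compared with the RAM, but it is large for a stack, which is why the warning about stack overflow matters.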
<GitHub-m-labs> [artiq] jordens commented on issue #1065: We also had discussed adding a blinking LED (or SMA) and reproduce it toggling erratically in the corrupted state (iirc that's something that was observed at one point). That would allow debugging of the clocking when the board is in the failed/corrupted state. https://github.com/m-labs/artiq/issues/1065#issuecomment-399481337
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @jordens how do you make the LED/SMA toggle erratically in the corrupted state? What drives it? https://github.com/m-labs/artiq/issues/1065#issuecomment-399482699
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: I had tried something like this:... https://github.com/m-labs/artiq/issues/1065#issuecomment-399483322
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: what happened? https://github.com/m-labs/artiq/issues/1065#issuecomment-399483817
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Thanks for the reminder. So, you were looking on the DRTIO master build without SAWG (which didn't crash for you). You looked at the blink signal using microscope. Expectation is that it should toggle at about 0.5Hz (150MHz / 2^28). What exactly did you see? I'm happy to try adding that to my build at some point soon... https://github.com/m-labs/artiq/issues/1065#issu
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: 1Hz or not at all (7043 in reset/not initialized). It toggled randomly and much faster. https://github.com/m-labs/artiq/issues/1065#issuecomment-399486697
<GitHub-m-labs> [artiq] sbourdeauducq commented on issue #1065: 0.3Hz (ODIV2) or not at all (7043 in reset/not initialized). It toggled randomly and much faster. https://github.com/m-labs/artiq/issues/1065#issuecomment-399486697
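The expected blink rates quoted above follow from dividing the clock by 2^28. A quick check, assuming a 150 MHz clock and assuming that "ODIV2" means the counter actually runs from half that frequency (75 MHz); the function name is illustrative.

```python
def divided_hz(clock_hz, counter_bits):
    """Frequency obtained by dividing a clock by 2**counter_bits."""
    return clock_hz / 2 ** counter_bits

print(round(divided_hz(150e6, 28), 2))   # 0.56
print(round(divided_hz(75e6, 28), 2))    # 0.28
```

The second figure is consistent with the "0.3Hz (ODIV2)" value in the corrected comment; anything toggling much faster than that points to a bad clock rather than a slow counter.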
jkeller has joined #m-labs
<jkeller> bb-m-labs: force build --props=package=artiq-board,artiq_target=kc705,artiq_variant=nist_qc2 artiq-board --branch=release-3
<bb-m-labs> build forced [ETA 43m49s]
<bb-m-labs> I'll give a shout when the build finishes
jkeller has quit [Client Quit]
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: how long did it do that for? You expect the HMC7043 to startup at the wrong frequency for a while before it is configured via SPI. There, you have the CB enabled even during HMC7043 configuration, so there will be a period of "noise". https://github.com/m-labs/artiq/issues/1065#issuecomment-399491943
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Ran the mem test. No errors out of 1048576 reads/writes.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399493838
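The general shape of a pseudo-random memory test like the one being discussed can be sketched as follows. This is not the code from the linked branch: the generator constants, function names, and the Python list standing in for RAM are all assumptions made for illustration.

```python
def lcg32(state):
    """One step of a 32-bit linear congruential generator (illustrative constants)."""
    return (1664525 * state + 1013904223) & 0xFFFFFFFF

def fill(memory, seed):
    """Write a reproducible pseudo-random pattern into every word."""
    state = seed
    for i in range(len(memory)):
        state = lcg32(state)
        memory[i] = state

def check(memory, seed):
    """Regenerate the same pattern and count words that do not match."""
    state = seed
    errors = 0
    for i in range(len(memory)):
        state = lcg32(state)
        if memory[i] != state:
            errors += 1
    return errors

buf = [0] * 0x1000      # stand-in for the region under test
fill(buf, seed=1)
print(check(buf, seed=1))   # 0: nothing was disturbed

buf[42] ^= 0x04             # simulate a single flipped bit
print(check(buf, seed=1))   # 1: the corrupted word is detected
```

Because the pattern is regenerated from the seed, no reference copy of the data is needed, which keeps the memory footprint down to the buffer itself.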
<GitHub-m-labs> [artiq] jonaskeller commented on issue #1076: I'd like to test this but can't build the newest `kc705-nist_qc2` 3.6 gateware. The bot is building 4.0.dev despite the argument `-branch=release-3`:... https://github.com/m-labs/artiq/issues/1076#issuecomment-399496511
<GitHub-m-labs> [artiq] jonaskeller commented on issue #1076: I'd like to test this but can't build the newest `kc705-nist_qc2` 3.6 gateware. The bot is building 4.0.dev despite the argument `--branch=release-3`:... https://github.com/m-labs/artiq/issues/1076#issuecomment-399496511
<GitHub-m-labs> [artiq] whitequark commented on issue #1076: Arguments are passed before builder name, not after. https://github.com/m-labs/artiq/issues/1076#issuecomment-399499884
<GitHub167> [smoltcp] podhrmic commented on issue #236: @dlrobertson is there anything else I should change in this PR? https://github.com/m-labs/smoltcp/pull/236#issuecomment-399500467
<bb-m-labs> build #1674 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1674
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: I looked with a scope at the power rails (especially 1V5) but the noise does not exceed a few mV. Nothing suspicious. https://github.com/m-labs/artiq/issues/1065#issuecomment-399506535
jkeller has joined #m-labs
<jkeller> bb-m-labs: force build --props=package=artiq-board,artiq_target=kc705,artiq_variant=nist_qc2 --branch=release-3 artiq-board
<bb-m-labs> build forced [ETA 42m55s]
<bb-m-labs> I'll give a shout when the build finishes
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: Is there a way to generate hardware trigger once the corruption occurs? I could trigger the scope and observe all power rails. https://github.com/m-labs/artiq/issues/1065#issuecomment-399510666
<bb-m-labs> build #1675 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1675
<GitHub7> [smoltcp] podhrmic opened pull request #248: Log and print error for all examples (master...better_error_handling) https://github.com/m-labs/smoltcp/pull/248
<GitHub35> [smoltcp] podhrmic commented on issue #248: FYI Travis fails because it cannot download rustc, not because of my commits. https://github.com/m-labs/smoltcp/pull/248#issuecomment-399525071
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: Oh, I didn't read previous posts:) https://github.com/m-labs/artiq/issues/1065#issuecomment-399525222
<GitHub103> [smoltcp] whitequark commented on issue #248: Thanks! https://github.com/m-labs/smoltcp/pull/248#issuecomment-399557188
<GitHub100> [smoltcp] whitequark closed issue #231: Examples should not panic when poll returns an error https://github.com/m-labs/smoltcp/issues/231
<GitHub36> smoltcp/master 4c7606d Michal Podhradsky: Log and print error for all examples
<GitHub36> [smoltcp] whitequark pushed 1 new commit to master: https://github.com/m-labs/smoltcp/commit/4c7606d9c329f20b5ebea6b094f94957d0ff401e
<GitHub162> [smoltcp] whitequark closed pull request #248: Log and print error for all examples (master...better_error_handling) https://github.com/m-labs/smoltcp/pull/248
<travis-ci> m-labs/smoltcp#1060 (master - 4c7606d : Michal Podhradsky): The build has errored.
<GitHub-m-labs> [artiq] hartytp opened issue #1080: Sayma PRBS errors https://github.com/m-labs/artiq/issues/1080
Gurty has joined #m-labs
hartytp__ has joined #m-labs
<hartytp__> I've exposed memory test to kernels https://github.com/hartytp/artiq/tree/mem_test
<hartytp__> I've modified the "crash kernel" to run mem tests https://pastebin.com/fiWdjx86
<hartytp__> but the mem tests run by kernels don't produce any outputs on the UART
<hartytp__> any pointers?
<whitequark> this is a stack-allocated array
<whitequark> first, don't expect this to work on the comms CPU (in runtime code), other than by accident
<whitequark> the runtime stack is much smaller than that
<whitequark> second, there is no logger registered in the code running on the kernel CPU
<whitequark> you can use println! instead of info!
<whitequark> the kernel CPU stack is large enough, so the stack-allocated array is fine
jkeller has quit [Quit: Page closed]
_whitelogger has joined #m-labs
<GitHub-m-labs> [artiq] whitequark commented on issue #1072: Back when it was added, @sbourdeauducq said that the "proper" way to do debug printing is with the `print` RPC; `core_log` was always internal. https://github.com/m-labs/artiq/issues/1072#issuecomment-399571654
<GitHub160> [smoltcp] podhrmic closed pull request #193: Resource exhaustion test (master...resource_exhaustion_test) https://github.com/m-labs/smoltcp/pull/193
<GitHub-m-labs> [artiq] mfe5003 commented on issue #1078: So it looks like I can communicate with kasli using `artiq_coremgmt` and the log (0x01) and reboot (0x05) commands seem to work fine. I can change the log level to debug then try to write a key value pair.... https://github.com/m-labs/artiq/issues/1078#issuecomment-399578773
<GitHub-m-labs> [artiq] marmeladapk commented on issue #1078: @mfe5003 This is a question to @sbourdeauducq or @jordens. But you don't need an idle kernel to work with Kasli and schedule experiments, it's just an experiment that activates when nothing else is happening (for example to toggle diode). https://github.com/m-labs/artiq/issues/1078#issuecomment-399579952
<hartytp__> whitequark: thanks!
<hartytp__> so, I just need to swap the info! for println and all should be good
<GitHub-m-labs> [artiq] jonaskeller commented on issue #1076: Ah, thanks. I've flashed the new gateware and it works now. https://github.com/m-labs/artiq/issues/1076#issuecomment-399580882
<hartytp__> whitequark: "cannot find macro println in this scope"
<hartytp__> what do I need to add to use it?
<GitHub-m-labs> [artiq] whitequark commented on issue #1078: > Is there an old commit I can roll back to to get the kasli board working?... https://github.com/m-labs/artiq/issues/1078#issuecomment-399581742
<GitHub-m-labs> [artiq] mfe5003 commented on issue #1078: @whitequark This is my first time trying to use artiq, so I am trying to figure out how it all works. I did not intend to use different gateware/firmware. It seems like I need to use version 4 to use kasli, because conda ends up pulling from the version 4 dev branch when I do:... https://github.com/m-labs/artiq/issues/1078#issuecomment-399582990
<hartytp__> whitequark: "error: cannot find macro `println!` in this scope"
<whitequark> hartytp__: yes, println! in kernels is defined in ksupport/lib.rs
<whitequark> so put your code there
<hartytp__> aah, thanks
<hartytp__> well, I already hacked it to just return the results rather than printing
<hartytp__> but good to know
<GitHub-m-labs> [artiq] whitequark commented on issue #1078: If you want to stay on the release versions, you can remove the dev channel from conda instead. Alternatively, you could flash the dev channel gateware using the artiq_flash script. https://github.com/m-labs/artiq/issues/1078#issuecomment-399585726
<GitHub48> [smoltcp] whitequark commented on issue #235: Thanks for another high-quality contribution!... https://github.com/m-labs/smoltcp/pull/235#issuecomment-399588077
<GitHub92> [smoltcp] m-labs-homu commented on issue #235: :pushpin: Commit 354b3c4 has been approved by `whitequark`
<GitHub174> smoltcp/auto 78651bb Dan Robertson: Add MLDv2 packet parsing support to wire...
<GitHub174> [smoltcp] m-labs-homu pushed 1 new commit to auto: https://github.com/m-labs/smoltcp/commit/78651bb5696189fd36cf3c6fa3abeebe586c42bc
<GitHub191> [smoltcp] m-labs-homu commented on issue #235: :hourglass: Testing commit 354b3c4b18de1f5c6c3a959c9574db3b0f1b163d with merge 78651bb5696189fd36cf3c6fa3abeebe586c42bc... https://github.com/m-labs/smoltcp/pull/235#issuecomment-399588138
<travis-ci> m-labs/smoltcp#1061 (auto - 78651bb : Dan Robertson): The build has errored.
bb-m-labs has quit [Quit: buildmaster reconfigured: bot disconnecting]
<GitHub-m-labs> [buildbot-config] whitequark pushed 1 new commit to master: https://git.io/f4yrr
<GitHub-m-labs> buildbot-config/master 3317296 whitequark: Allow artiq-board builds with no variant (empty if not provided).
bb-m-labs has joined #m-labs
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: So, we cannot find any evidence of a SI/PI problem after probing the HW, and I can't find any evidence of memory corruption occurring during boot or during kernel operation.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399591428
<GitHub-m-labs> [artiq] gkasprow commented on issue #1080: Maybe this has something to do with TXEN pin of DAC?... https://github.com/m-labs/artiq/issues/1080#issuecomment-399592240
<GitHub-m-labs> [artiq] hartytp commented on issue #1080: Maybe. But it happens on both DACs and we only altered the TXEN on DAC2... https://github.com/m-labs/artiq/issues/1080#issuecomment-399592427
<GitHub-m-labs> [artiq] gkasprow commented on issue #1080: that's true. But this was the only modification I did. There is 3.3V -> 1.8V conversion using 200R resistor that injects current to 1.8V port of DAC and FPGA. Theoretically the FPGA has protection diodes, but DAC may not like voltage peaks of roughly 2.5V (1.8V + 0.7V of diode). I have no idea how this could affect second DAC channel in such bizarre way.... https:/
<GitHub-m-labs> [artiq] hartytp commented on issue #1080: My guess was that it's due to one of the recent ARTIQ commits rather than the HW changes. But, I might be wrong -- I haven't given it too much thought yet. https://github.com/m-labs/artiq/issues/1080#issuecomment-399593872
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: @hartytp Here is how to get a disassembly from a crash dump:... https://github.com/m-labs/artiq/issues/1065#issuecomment-399593988
<GitHub-m-labs> [artiq] gkasprow commented on issue #1080: the funny thing is that I started seeing PRBS errors on one board a few days ago, another was working well. And next day second board also got PRBS "sickness". https://github.com/m-labs/artiq/issues/1080#issuecomment-399594020
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: And of course the crash happens in code because you evict all code from L2 cache during the memory test, whatever is executed during memory test doesn't fit in L1, and so on the next code fetch from DRAM you get a crash. https://github.com/m-labs/artiq/issues/1065#issuecomment-399594376
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @whitequark thanks for the explanation. I need to think about that.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399595217
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @whitequark here is the disassembly of the corresponding section of my runtime.elf... https://github.com/m-labs/artiq/issues/1065#issuecomment-399596919
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: Yup, a bitflip. I have no idea how to debug SI on a board like this, but I can't imagine this being anything other than SI or PI. https://github.com/m-labs/artiq/issues/1065#issuecomment-399598903
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: So, as you say, d7 is getting switched to d3.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399598945
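The d7 to d3 change mentioned above can be checked to be a single-bit difference. The sketch assumes both values are hexadecimal bytes taken from the disassembly (the comment itself is truncated in the log), so the variable names are illustrative.

```python
expected = 0xD7   # value in the runtime image (assumed hex byte)
observed = 0xD3   # value seen in the crash dump (assumed hex byte)

diff = expected ^ observed
print(f"{diff:#04x}")            # 0x04
print(bin(diff).count("1"))      # 1: exactly one bit differs
print(diff.bit_length() - 1)     # 2: index of the flipped bit
```

XORing the two values isolates the differing bits, so a result with a single bit set confirms a one-bit flip and shows which bit position is affected.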
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: One more time:... https://github.com/m-labs/artiq/issues/1065#issuecomment-399599875
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: So, same again. Really doesn't look like random corruption. https://github.com/m-labs/artiq/issues/1065#issuecomment-399599946
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: > Really doesn't look like random corruption. ... https://github.com/m-labs/artiq/issues/1065#issuecomment-399600378
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Okay, but does this *really* seem like PI/SI noise? I'm not seeing any memory issues during my random reads or writes, but always the same bits getting flipped in the same places. That doesn't sound like noise to me. https://github.com/m-labs/artiq/issues/1065#issuecomment-399600708
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: Also, for that last dump, why do I get an illegal instruction error when the disassembly looks fine? https://github.com/m-labs/artiq/issues/1065#issuecomment-399601083
<hartytp__> whitequark/sb0/rjo: okay, I'm a bit out of my depth here, but this doesn't feel like a simple noise issue here as it seems far too deterministic
<hartytp__> let me know if you can think of anything else I should try
<hartytp__> but, given that greg has checked the PI carefully, I think we need to keep looking at ARTIQ to make sure this isn't an issue in the code
<hartytp__> would be good to hear what your plan for dealing with this is, as these problems have gone on for far too long...
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: > I'm not seeing any memory issues during my random reads or writes, but always the same bits getting flipped in the same places. That doesn't sound like noise to me.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399602545
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: Oh, and to add to this: all `println!` statements in the kernel have to go through the runtime, they don't go directly via UART. This means that when your memory test *did* successfully corrupt memory, chances are, the runtime code is *already* corrupted as well. I don't think that you will ever see a failure message with the way this memory test code is composed.
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: You can run the profiler on the comms CPU and I bet the addresses where you see bitflips will also be at the very top of the profiler report. Conversely, if you look at more crashes you'll see different ones too. If you adjust the runtime code so that it does nothing but spins in a loop after you start the memory test kernel, I predict you'll never see a crash. h
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: You can run the profiler on the comms CPU and I bet the addresses where you see bitflips will also be at the very top of the profiler report. (I already know from the logs you posted that these addresses are some of the hottest in the runtime.) Conversely, if you look at more crashes you'll see different ones too. If you adjust the runtime code so that it does nothin
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: Provided that I can trigger the scope with IO signal, what is estimated time between corruption and the IO toggling? https://github.com/m-labs/artiq/issues/1065#issuecomment-399604168
<whitequark> bb-m-labs: force build --props=artiq_target=sayma_rtm artiq-board
<bb-m-labs> build #1676 forced
<bb-m-labs> I'll give a shout when the build finishes
<bb-m-labs> build #1676 of artiq-board is complete: Exception [exception conda_build_output] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/1676
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: It's also interesting that I always seem to get 28 successful memory tests before a crash. @whitequark I get what you're saying, but I'm still not sure that this feels like white noise. We do a lot of successful reading/writing from RAM and then always have a crash in the same place. Seems like a cop out to say it's PI/SI.... https://github.com/m-labs/artiq/issues/1
<GitHub-m-labs> [artiq] whitequark closed issue #1062: sayma_rtm builds broken https://github.com/m-labs/artiq/issues/1062
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: It's quite possible i.e. due to amount of SSO (simultaneously switching outputs) that at certain moment there is voltage peak on one of the supply rails, clock signal, termination voltage, etc.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399604575
<GitHub-m-labs> [artiq] mfe5003 commented on issue #1078: I rolled back the artiq package to match the kasli-master gateware and I can now read/write to the device.... https://github.com/m-labs/artiq/issues/1078#issuecomment-399604839
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: A fourth crash log: https://hastebin.com/qikexuguwu.go... https://github.com/m-labs/artiq/issues/1065#issuecomment-399605074
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: @gkasprow okay, what I'm seeing looks deterministic. So, you can take the fork of ARTIQ I linked to above and flash that, as well as the startup Kernel I posted. Check that you see the same crashes as me. Then add a line in ARTIQ that pulses a TTL before each mem test. That gives you your trigger. https://github.com/m-labs/artiq/issues/1065#issuecomment-399605264
<GitHub-m-labs> [artiq] whitequark commented on issue #1065: @gkasprow Are you able to trigger on arbitrary DRAM read and observe data? https://github.com/m-labs/artiq/issues/1065#issuecomment-399605427
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: This is tricky, but I can use i.e. SDRAM read signal as a trigger but cannot say which address is currently being written. I have only four 1GHz active probes and one 5GHz active probe. And the scope has only 4 inputs. I have also logic analyzers but connecting the probes would kill the SI.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399606180
<GitHub-m-labs> [artiq] hartytp commented on issue #1065: > But you don't crash in the same place. You provided four crash logs, and there are three different crash addresses in them. Yes, they are on the same bit, but that's just because of the illegal instruction encodings in or1k.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399606212
<GitHub-m-labs> [artiq] gkasprow commented on issue #1065: I can generate trigger based on sequence of input signals, but this is still not enough to isolate certain address read/write. For this purpose I'd need some logic that toggles IO line.... https://github.com/m-labs/artiq/issues/1065#issuecomment-399606382
hartytp__ has quit [Quit: Page closed]
<GitHub-m-labs> [artiq] klickverbot commented on issue #1065: Potentially a silly idea @hartytp, but what if you add sleeps/busy spins between memtests? Might help to disambiguate between time being the factor vs. number of writes or something else weirdly stateful. (E.g. is this something heating up leading to SI/PI issues? DRAM refresh being borked?) https://github.com/m-labs/artiq/issues/1065#issuecomment-399606890