sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
_rht has joined #m-labs
<mithro> sb0: What is the current status of the DVI Sampler and frame buffer in current misoc? _florent_ was mentioning something about the DMA interfaces changing, and there are a couple of "TODO: rewrite dma_lasmi module" type comments in the dvi_sampler code.
<sb0> mithro, I haven't tested it for ages and there have been major misoc refactorings since then, so sure enough it's broken
<mithro> sb0: okay, that is where I thought it was at
<sb0> the bugfixes shouldn't be substantial, though
rohitksingh has joined #m-labs
kuldeep has quit [Ping timeout: 248 seconds]
kuldeep has joined #m-labs
mumptai has quit [Quit: Verlassend]
mumptai has joined #m-labs
<GitHub189> [migen] sbourdeauducq pushed 2 new commits to master: https://git.io/vVvOa
<GitHub189> migen/master 04edf17 Sebastien Bourdeauducq: fhdl: disallow None statements (use empty list instead)
<GitHub189> migen/master 47ef0d1 Sebastien Bourdeauducq: Merge branch 'master' of github.com:m-labs/migen
<bb-m-labs> Hey! build conda-all #16 is complete: Success [build successful]
sandeepkr has joined #m-labs
<bb-m-labs> build #486 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/486
sb0 has quit [Quit: Leaving]
<GitHub10> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVvCk
<GitHub10> artiq/master 63e367a whitequark: compiler: significantly increase readability of LLVM and ARTIQ IRs.
<bb-m-labs> build #487 of artiq is complete: Failure [failed lit_test] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/487 blamelist: whitequark <whitequark@whitequark.org>
<GitHub174> [artiq] whitequark force-pushed master from 63e367a to 3ee9834: https://git.io/vYgPK
<GitHub174> artiq/master 3ee9834 whitequark: compiler: significantly increase readability of LLVM and ARTIQ IRs.
sb0 has joined #m-labs
<bb-m-labs> build #488 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/488 blamelist: whitequark <whitequark@whitequark.org>
<GitHub186> [conda-recipes] whitequark pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/f31203682f4a9d1d94f166b626a554c0658ec155
<GitHub186> conda-recipes/master f312036 whitequark: llvmlite-artiq: bump.
<whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all
<bb-m-labs> build forced [ETA 12h07m12s]
<bb-m-labs> I'll give a shout when the build finishes
<GitHub104> [artiq] whitequark pushed 1 new commit to master: https://git.io/vVvB6
<GitHub104> artiq/master f5c720c whitequark: compiler: tune the LLVM optimizer pipeline (fixes #315).
<bb-m-labs> Hey! build conda-all #17 is complete: Success [build successful]
<whitequark> ok. hm. llvm pipeline customized. but it still doesn't quite see through all the method invocations...
<whitequark> let me figure something out.
<bb-m-labs> build #489 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/489 blamelist: whitequark <whitequark@whitequark.org>
<whitequark> right, not quite as slow, but still too slow
sb0 has quit [Quit: Leaving]
sb0 has joined #m-labs
sandeepkr_ has joined #m-labs
sandeepkr has quit [Ping timeout: 260 seconds]
kuldeep has quit [Ping timeout: 276 seconds]
sandeepkr__ has joined #m-labs
kuldeep has joined #m-labs
sandeepkr_ has quit [Ping timeout: 260 seconds]
rohitksingh has quit [Quit: Leaving.]
<whitequark> mh, LLVM is pretty stupid at breaking up loads of aggregates...
sandeepkr__ has quit [Ping timeout: 276 seconds]
<GitHub14> [conda-recipes] jordens pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/9f642c0c1ff8c93075cfc8d8296dbaed24f2cf59
<GitHub14> conda-recipes/master 9f642c0 Robert Jordens: pygit2: bump
<rjo> bb-m-labs: force build --props=package=pygit2 conda-lin64
<bb-m-labs> build forced [ETA 58 seconds]
<bb-m-labs> I'll give a shout when the build finishes
<bb-m-labs> Hey! build conda-lin64 #107 is complete: Success [build successful]
<whitequark> only lin64?
<rjo> for testing yes.
<rjo> but it's also likely to be the only package affected.
<GitHub45> [conda-recipes] jordens pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/6da7d8580ba12ad73a514c0afbf6fbfb3734adc8
<GitHub45> conda-recipes/master 6da7d85 Robert Jordens: pygit2: add import test
<rjo> bb-m-labs: force build --props=package=pygit2 conda-lin64
<bb-m-labs> build forced [ETA 52 seconds]
<bb-m-labs> I'll give a shout when the build finishes
<bb-m-labs> build #108 of conda-lin64 is complete: Failure [failed anaconda_upload] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/108
<rjo> bb-m-labs: force build --props=package=pygit2 conda-win64
<bb-m-labs> build forced [ETA 6m09s]
<bb-m-labs> I'll give a shout when the build finishes
<rjo> bb-m-labs: force build --props=package=pygit2 conda-win32
<bb-m-labs> build forced [ETA 5m25s]
<bb-m-labs> I'll give a shout when the build finishes
<rjo> make nobody feel left out.
<bb-m-labs> Hey! build conda-win64 #81 is complete: Success [build successful]
<bb-m-labs> Hey! build conda-win32 #50 is complete: Success [build successful]
<GitHub52> [conda-recipes] whitequark pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/c2e62fddcea4a47e9b183c207d6668ded85b3dca
<GitHub52> conda-recipes/master c2e62fd whitequark: llvmlite-artiq: bump.
<whitequark> bb-m-labs: force build --propx=package=llvmlite-artiq conda-all
<bb-m-labs> Something bad happened (see logs)
<whitequark> what
<whitequark> "something bad"??
<whitequark> bb-m-labs: force build --props=package=llvmlite-artiq conda-all
<bb-m-labs> build forced [ETA 6h05m44s]
<bb-m-labs> I'll give a shout when the build finishes
<bb-m-labs> build #109 of conda-lin64 is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/conda-lin64/builds/109
<whitequark> rjo: do you have a testcase for #338?
<whitequark> hm, nevermind, I made one
<bb-m-labs> Hey! build conda-all #18 is complete: Success [build successful]
<rjo> set that to < 20 µs max.
<whitequark> rjo: with my latest optimizer tweaking i improved #298 by a factor of 100 and #338 by a factor of 150
<whitequark> should reduce compile time too
<rjo> what is the final absolute number? that's what matters.
<whitequark> 1us
<rjo> on both?
<whitequark> that's for PulseRateDDS
<whitequark> for that RTIO loop it's 250ns
<rjo> that sounds reasonable. please tie down the unittests so that we don't regress again.
<whitequark> and I can actually further improve both, though not by much
<whitequark> mainly, I need to factor very cold bounds-checking code out of the loops
<rjo> it's a reasonable number. i remember having 170 ns for a 75 MHz sys_clock in a very old version of RTIO (ventilator) with hand-written C and lm32 a few years back.
<whitequark> since it pessimizes the inliner
<whitequark> and the second thing is it constantly reads and writes the global now
<whitequark> 170ns might be achievable
<rjo> yeah. i can see that 64 bit stuff actually dominating eventually.
<whitequark> pulse_rate_dds still has FP math
<rjo> to repeat: RTIO pulse rate is 1/250ns now?
<whitequark> mostly because frequency_to_ftw has a division, which has a ZeroDivisionError branch, which ends up inflating that function
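(For context, a rough sketch of the shape of the conversion being discussed; this is not the actual ARTIQ source, and the 2**32 scale factor and sysclk parameter are assumptions. The point is only that the division implies a ZeroDivisionError branch that gets inlined into the hot loop.)

    # hypothetical sketch of a frequency-to-ftw conversion
    def frequency_to_ftw(frequency, sysclk):
        # the division carries an implicit ZeroDivisionError check; once this
        # function is inlined, that cold branch inflates the loop body and
        # pessimizes further inlining decisions
        return round(float(2**32) * frequency / sysclk)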
<whitequark> take the example in this issue: https://github.com/m-labs/artiq/issues/298
<rjo> and DDS pulse rate is 1/1us with a bit of fp math still in there?
<whitequark> self.runKernel(250*ns) succeeds
<whitequark> er, sorry, 310*ns apparently is the minimal value
<whitequark> but that's not pulse rate.... hmmm
<whitequark> oh, I misunderstood how PulseRateDDS works.
<whitequark> these are the lowest values that succeed.
<GitHub195> [conda-recipes] jordens pushed 1 new commit to master: https://github.com/m-labs/conda-recipes/commit/f25d3e9b8382de8a1a9aa30ef298a92f386296d4
<GitHub195> conda-recipes/master f25d3e9 Robert Jordens: libgit/pygit2: upgrade (hoping for bug fixes), tie down dependency
<rjo> bb-m-labs: force build --props=package=libgit2 conda-all
<bb-m-labs> build forced [ETA 3h04m55s]
<bb-m-labs> I'll give a shout when the build finishes
<rjo> whitequark: ok. TTL pulse rate is good for now. it does one event per 327 ns. that's good. please tie it down.
<rjo> whitequark: DDS pulse rate is probably still killed by FP.
<whitequark> yes.
<whitequark> as for TTL, it could be faster but LLVM is being unreasonably stupid about the `now` global
<whitequark> it should convert it into a local but it doesn't
<whitequark> I think that pass is not in 3.5
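(A toy Python model of the pattern being described, not ARTIQ source: the timeline cursor lives in a global, so every iteration does a load/add/store that the old LLVM fails to promote into a register for the duration of the loop.)

    now = 0  # global timeline cursor, as in the generated kernel code

    def delay_mu(dt):
        # each call round-trips through the global: load, add, store
        global now
        now += dt

    def emit_pulses(n, dt):
        for _ in range(n):
            delay_mu(dt)  # a localized 'now' could stay in a register here
        return now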
<rjo> whitequark: my guess is that something like 20 µs should be doable since it is about 20 events.
<rjo> ack. maybe it would be nice to track the `now` handling in an issue (where we can complain extensively about llvm-or1k being old)
<rjo> and which we can slap others with.
<whitequark> huh? I don't think that will motivate anyone to forward-port the backend
<whitequark> it's a massive amount of work
<rjo> this seems to be absolutely no problem: "-- Found PythonInterp: C:/Python27/python.exe (found version "2.7.11") "
<bb-m-labs> Hey! build conda-all #19 is complete: Success [build successful]
<rjo> just so that all issues related to that old llvm congregate.
<rjo> bb-m-labs: force build --props=package=pygit2 conda-all
<bb-m-labs> build forced [ETA 1h35m40s]
<bb-m-labs> I'll give a shout when the build finishes
<whitequark> rjo: looks like no version of llvm is able to optimize that.
<whitequark> i can write a pass, i think.
<whitequark> i think the reason there's no such pass is that the utility of the pass is fairly... marginal
<bb-m-labs> Hey! build conda-all #20 is complete: Success [build successful]
<whitequark> i.e. it's only really useful if everything is inlined into a single function
<whitequark> other than fixing this `now` issue and marking the TTLOut.channel attribute as immutable there is nothing to be done to increase the TTL pulse rate
<whitequark> since the code is basically optimal already
<rjo> ack.
<whitequark> there's some modest PIC overhead, but not too much
<rjo> one thing that is still in there is marking a few of those registers non-volatile.
<whitequark> the inner loop is composed of 52 instructions
<whitequark> going to non-PIC can save you, uh, I think four?
<whitequark> (52 instructions not counting those in rtio_output)
<whitequark> actually, nope
<whitequark> two instructions
<whitequark> the non-PIC version is 50.
<whitequark> I think two of them stopped being loads, but that's not really much difference
<whitequark> so I think PIC overhead can be considered negligible...
<whitequark> ok. let me see what I can do with PulseRateDDS.
<whitequark> also, I looked at the PulseRate test (the actual test code) that uses exceptions
<whitequark> and the reason it's just 50ns worse than the code in that hastebin, which doesn't use exceptions, is that I used LLVM's zero-cost exception handling
<whitequark> actually not even 50ns, it's exactly the same
<GitHub176> [artiq] whitequark pushed 3 new commits to master: https://git.io/vVfel
<GitHub176> artiq/master 186a564 whitequark: compiler: make quoted functions independent of outer environment.
<GitHub176> artiq/master 5aec82b whitequark: test_pulse_rate: tighten upper bound to 1310ns.
<GitHub176> artiq/master 20ad762 whitequark: llvm_ir_generator: generate code more amenable to LLVM's GlobalOpt....
<whitequark> rjo: I think there is a problem with the PulseRateDDS test.
<whitequark> it does 1000 iterations of setting DDSes
<whitequark> and this currently results in a 500us value
<whitequark> however, if I enlarge the number of iterations to 10000, it results in 2500us per pulse
<whitequark> so I think with 1000 iterations, the measured value is lower than what is real; what happens is that every time it runs it "borrows" a chunk of time from break_realtime
<whitequark> but the iteration count is not high enough that it results in an underflow.
<whitequark> the higher I make the mu value in break_realtime, the lower the measured value becomes.
<whitequark> whereas, if I raise the iteration count to 10000, then the value is the same as with 30000 iterations
<bb-m-labs> build #490 of artiq is complete: Failure [failed python_unittest_1] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/490 blamelist: whitequark <whitequark@whitequark.org>
<whitequark> and independent of break_realtime slop
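(The effect described above can be put as a one-line estimate: if break_realtime leaves S seconds of slack and each iteration really costs t, the test measures roughly t - S/n per iteration, which underestimates t for small n, drops further as S grows, and converges to t once n is large enough that S/n is negligible.)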
<whitequark> what the heck
<whitequark> why does that test vary, I wonder...
<rjo> yes. i had suspected that there is something weird going on. i have DrainErrors in test_spi.
<rjo> also the dds schedules its events in the past. by themselves the methods are zero-delay.
<GitHub20> [artiq] whitequark force-pushed master from 5aec82b to 2a210d7: https://git.io/vYgPK
<GitHub20> artiq/master 2a210d7 whitequark: test_pulse_rate: tighten upper bound to 1500ns.
<whitequark> rjo: so the *real* value of pulse_rate for dds is 4.6ms.
<rjo> pretty sure that's not true.
<whitequark> how so?
<whitequark> 2.3ms for a bunch of floating point operations seems reasonable to me
<whitequark> double-precision fp, too...
<rjo> hell no
<rjo> that's 280 000 cycles.
<whitequark> okay, how about a more traditional benchmark method
<whitequark> is there some kind of counter?
<whitequark> get_rtio_counter_mu.
<rjo> you are seeing some effect of the rtio fifos clogging up and errors piling up that you then pop for the next test.
<whitequark> mhm.
<whitequark> there's a 10us delay
<whitequark> 10ms
<rjo> or1k has the cpu cycle counter.
<rjo> i think we even have that enabled.
<whitequark> self.core.get_rtio_counter_mu()
<rjo> yes. that as well.
<rjo> but that cycle counter can be used to measure small snippets.
<rjo> you will also see fifo backpressure. even for your "traditional" test.
<rjo> whitequark: if you are happy with the llvm-changes, can you cherry-pick them (or reverse-rebase) to release-1?
<rjo> s/llvm-changes/compiler changes/
<whitequark> well, i'm not very happy with having to do that
<whitequark> why did you branch the release before the milestone was finished anyway?
<rjo> look at the other changes that modify things.
<rjo> you could have developed your changes in release-1 and then merged them into master.
<rjo> that was your choice.
<whitequark> so i made a benchmark using the rtio counter. i measure 3.9-4.9ms for that dds batch over a wide range of iteration counts
<whitequark> 3.1ms per batch with 100 iterations, 4.9ms per batch with 5000 iterations
<whitequark> i'm not sure why it varies so much. but it's definitely somewhere in the "a few milliseconds" range.
<rjo> with empty fifos, no backpressure, and no errors about to be popped?
<rjo> check that first.
<whitequark> this is the code I use
<whitequark> I assume the fifos are empty when I first run it. i don't handle any errors.
<whitequark> not sure what backpressure is, in this context
<rjo> if the fifos are full because the phy is waiting for time to pass, you are getting backpressure. rtio_write() waits until there is space.
<whitequark> how large are the fifos?
<rjo> couple hundred entries iirc. a dds.set() is about 10 entries.
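(Rough arithmetic under the numbers quoted above, both of which are approximate: ~200 FIFO entries at ~10 entries per dds.set() means only about 20 dds.set() calls fit before rtio_write() starts blocking, so any loop much longer than that with far-future timestamps spends most of its wall-clock time waiting on the FIFO rather than submitting events.)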
<whitequark> oh, so even 100 cycles is too much
<rjo> no. it's fine if the timing is correct and the phys are executing them.
<whitequark> sure, I mean in my case
<whitequark> ok. yes. I see your point.
<rjo> in your case the 5ms will be limiting for sufficiently large n.
<whitequark> you were right. it was backpressure. with this code I get 260us: http://hastebin.com/ticuwaguzi.py
<bb-m-labs> build #66 of artiq-win64-test is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/66 blamelist: whitequark <whitequark@whitequark.org>
<bb-m-labs> build #491 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/491 blamelist: whitequark <whitequark@whitequark.org>
<rjo> cut that 5 ms to something like 100 us. pretty sure that's sufficient so subsequent batches don't overlap in your case. if you cut that 5ms too much you will see RTIOSequenceError because of overlapping batches.
<whitequark> yep. with this new benchmark i get ~266us on a very wide range of n's
<rjo> that code is weird.
<whitequark> is it?
<whitequark> it's a way to ensure that fifos are cleared in time
<rjo> i think you are just measuring the fifo depth here.
<whitequark> am I?
<whitequark> it returns 266us even with n=10
<whitequark> well, 268us. close enough.
<rjo> you push a bunch of events always 1ms in the future over and over again.
<whitequark> hm.
<whitequark> i see your point
<rjo> that will generally succeed unless there are events in the fifo that prevent new events from getting in and through in time.
<whitequark> yes
<whitequark> so if there are none, am i not measuring the time it takes to submit events?
<rjo> yes. if the fifo is empty and stays non-full for the entire run, you will measure that time modulo the overhead due to setting and getting now.
<whitequark> excellent. that's what i wanted to measure.
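(The hastebin paste is not preserved in the log; the following is a plausible sketch of the kind of counter-based benchmark being described, assuming a current ARTIQ experiment API, hypothetical core/dds0 device names, and a 1 ms lead time.)

    from artiq.experiment import *

    class DDSSetRate(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("dds0")

        @kernel
        def run(self):
            n = 1000
            self.core.break_realtime()
            t0 = self.core.get_rtio_counter_mu()
            for _ in range(n):
                # keep the timeline 1 ms ahead of the hardware counter, which
                # is intended to keep the FIFO from filling: this measures
                # submission time rather than backpressure
                at_mu(self.core.get_rtio_counter_mu()
                      + self.core.seconds_to_mu(1*ms))
                self.dds0.set(100*MHz)
            t1 = self.core.get_rtio_counter_mu()
            self.set_dataset("dds_set_time",
                             self.core.mu_to_seconds(t1 - t0)/n)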
<rjo> but for large n (>~ 50) i expect this to be wrong.
<rjo> anyway. good night. see you tomorrow.
<whitequark> night.
<whitequark> this actually returns the same value even for n=100000.
<whitequark> though i do not understand why