sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<cr1901_modern> sb0_: Thoughts on Mill Arch?
ylamarre has joined #m-labs
ylamarre has quit [Client Quit]
ylamarre has joined #m-labs
ylamarre has quit [Quit: ylamarre]
<whitequark> sb0_: there's no nice way to support list comprehensions with 'if' clauses
<whitequark> since I want to preallocate the list before executing the comprehension
<whitequark> and getting the list length then would involve traversing it twice, which would not match CPython's side effects
<whitequark> and actually, there's the exact same problem with multiple for clauses
<whitequark> shall I axe them all?
<whitequark> cc rjo
<sb0_> yes
<whitequark> ok
<whitequark> amusing fact: code generation for list comprehensions, an 'advanced' feature, is substantially easier than for the '+' operator
<whitequark> (even without the modification above)
<sb0_> + operator for lists?
<whitequark> for everything it supports, which includes lists, yes
<rjo> axe list comprehensions with 'if'.
<rjo> and if necessary/convenient also axe list comps themselves.
<whitequark> they're kind of more necessary than in vanilla python
<whitequark> because you cannot resize lists
<whitequark> so just filling a list without comprehensions would quickly become awkward.
<rjo> right. yes. nice for initializing as well.
<rjo> [0 for i in range(10)]
<whitequark> [0]*10 :p
<rjo> but [0] * 10 would be fine imho
<rjo> yes
<whitequark> * for lists and integers is actually harder to implement than list comprehensions (!)
<rjo> ?
<whitequark> but doesn't matter, I'll do both. not a substantial problem
<whitequark> it's more tricky to generate code for
<whitequark> a list comprehension is just an allocation and a for loop
<whitequark> but multiplication requires you to traverse the same list several times
<rjo> yes.
<whitequark> slightly more code to generate SSA.
<rjo> why?
<rjo> isnt [0]*10 also allocation (rhs) and then for loop?
<whitequark> there's the case like [1,2,3]*10
<whitequark> so you have an additional index variable and such
<rjo> ah. forgot that.
<rjo> ack.
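The extra index variable whitequark mentions for `[1,2,3]*10` can be sketched in plain Python (illustrative only, not ARTIQ's actual codegen; `repeat_list` is a hypothetical name):

```python
# Sketch: lowering `src * n` for lists without resizing. The result is
# preallocated once, and the source list is traversed repeatedly via a
# second index that wraps at len(src) -- the "additional index variable"
# that makes this trickier than a comprehension.
def repeat_list(src, n):
    out = [0] * (len(src) * n)   # single up-front allocation
    j = 0                        # extra index into src
    for i in range(len(out)):
        out[i] = src[j]
        j += 1
        if j == len(src):        # wrap: traverse src again
            j = 0
    return out
```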
<rjo> so listcomps with if are another item on the no-go list for heap-less python.
<whitequark> I think so, yes
<rjo> thats fine.
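Why a plain comprehension is easy but an 'if' clause is not can be shown with a small sketch (illustrative, not py2llvm output; `square_all` is a hypothetical name):

```python
# Sketch: a comprehension with no 'if' clause lowers to one allocation
# plus a for loop, because the result length equals the iterable length
# and is therefore known before the loop runs.
def square_all(xs):
    out = [0] * len(xs)          # preallocate: length known up front
    for i in range(len(xs)):
        out[i] = xs[i] * xs[i]
    return out
# With an 'if' clause, the result length is only known after evaluating
# the predicate on every element -- forcing a second traversal, which is
# the problem discussed above.
```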
cr1901_modern has quit [Read error: Connection reset by peer]
<ysionneau> whitequark: I've tried putting -DLLVM_TARGETS_TO_BUILD="OR1K;X86_64" , but llvm-build says : error: invalid target to enable: "X86_64" (not in project)
<ysionneau> any idea why?
<ysionneau> ah, it's just X86 ?
<GitHub118> [artiq] sbourdeauducq pushed 7 new commits to master: http://git.io/vmcxX
<GitHub118> artiq/master 7770ab6 Sebastien Bourdeauducq: worker: factor timeouts
<GitHub118> artiq/master 9ed4dcd Sebastien Bourdeauducq: repository: load experiments in worker, list arguments
<GitHub118> artiq/master a07f247 Sebastien Bourdeauducq: manual: add core device moninj port
travis-ci has joined #m-labs
<travis-ci> m-labs/artiq#308 (master - f836465 : Sebastien Bourdeauducq): The build is still failing.
travis-ci has left #m-labs [#m-labs]
mumptai has joined #m-labs
chiggs has quit [Quit: WeeChat 0.4.2]
cr1901_modern has joined #m-labs
<cr1901_modern> whitequark: rereading the log for things I missed. Why would traversing a list twice, the second time being read only, have any additional side effects compared to CPython?
travis-ci has joined #m-labs
<travis-ci> m-labs/artiq#308 (master - f836465 : Sebastien Bourdeauducq): The build was fixed.
travis-ci has left #m-labs [#m-labs]
key2 has quit [Ping timeout: 246 seconds]
<ysionneau> I pushed a fixed llvmlite package (with X86+OR1K support)
<ysionneau> it's not yet pushed for linux-32 and windows 32 bits
<ysionneau> compiling takes time...
<ysionneau> linux-32 uploaded
ylamarre has joined #m-labs
olofk has quit [Ping timeout: 255 seconds]
olofk has joined #m-labs
Gurty has quit [Ping timeout: 248 seconds]
<ysionneau> windows 32 uploaded
<ysionneau> pfew, LLVM building and packaging is such a PITA
Gurty has joined #m-labs
<GitHub144> [artiq] fallen pushed 2 new commits to master: http://git.io/vmlaQ
<GitHub144> artiq/master af20efa Yann Sionneau: conda: update llvmlite-or1k package and up the build number
<GitHub144> artiq/master 511d519 Yann Sionneau: llvmlite: split patch to be cleaner. close #72
<ysionneau> ah, I forgot to update the manual about the llvmlite patches
<GitHub161> [artiq] fallen pushed 3 new commits to master: http://git.io/vmlKe
<GitHub161> artiq/master fa4f38b Yann Sionneau: manual: add missing llvmlite patches
<GitHub161> artiq/master 774c66a Yann Sionneau: manual: also build LLVM native target (needed for py2llvm test)
<GitHub161> artiq/master 08eec40 Yann Sionneau: manual: building LLVM as shared libraries is not recommended on Linux and not supported on Windows
chiggs has joined #m-labs
ylamarre has quit [Ping timeout: 255 seconds]
ylamarre has joined #m-labs
travis-ci has joined #m-labs
<travis-ci> m-labs/artiq#309 (master - 511d519 : Yann Sionneau): The build passed.
travis-ci has left #m-labs [#m-labs]
mithro has quit [*.net *.split]
mithro has joined #m-labs
<cr1901_modern> ysionneau: Have you had any luck running Xilinx tools on *BSD? Now that Net 7 is about to be released (and Intel drivers hopefully work), I'm thinking about taking some time to play with it on my laptop.
<sb0_> ise ran fine on freebsd (with linux emulation) last time i tried (many years ago)
<cr1901_modern> As long as the emu layers have kept up, I'm guessing it'll still work.
<cr1901_modern> If it doesn't, I'm willing to add the missing syscalls lol
<sb0_> whoa, the pluto probe is transmitting less than 15W
<sb0_> now that's a QRP ;)
<cr1901_modern> Must be using OLIVIA or some really good QRP protocol
<sb0_> no protocol will help you if all your antenna/LNA picks up is noise
<cr1901_modern> Doesn't signal power fall off based on an inverse (square?) law?
<sb0_> this is the beast they are using apparently
ylamarre has quit [Quit: ylamarre]
<cr1901_modern> TIL that dbus is not a Linux-exclusive technology. If I was supposed to know that as a "*nix power user", well... I didn't.
<whitequark> cr1901_modern: because you still need to execute the expression in the if clause during the second traversal
<whitequark> and it might have side effects
<cr1901_modern> I see. The "naive" example I can think of is assigning to a variable where the new value depends on the previous value inside the if statement.
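A concrete (hypothetical) version of that side-effect mismatch: if the 'if' clause calls a function with an observable effect, a two-pass lowering runs the effect twice, which CPython never does.

```python
# Hypothetical predicate with an observable side effect: it counts how
# many times it was called.
calls = [0]
def keep(x):
    calls[0] += 1            # side effect visible to the program
    return x % 2 == 0

# CPython evaluates the predicate exactly once per element:
result = [x for x in range(4) if keep(x)]    # 4 calls
# A two-pass lowering would run the predicate again just to size the
# result buffer:
n = sum(1 for x in range(4) if keep(x))      # 4 more calls
# calls[0] is now 8, not 4 -- the side effect executed twice, which
# would not match CPython's semantics.
```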
<GitHub165> [artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/vm4qT
<GitHub165> artiq/master 66940ea Sebastien Bourdeauducq: rtio: disable NOP suppression after reset and underflow
* rjo loves ringbuffers
<rjo> sb0_: i suspect there are a few bugs in uart.c
<sb0_> where?
<rjo> well. 1) line 71: the maximum number of elements in this ringbuffer can only be UART_RINGBUFFER_SIZE_TX - 1
<rjo> if tx_level == UART_RINGBUFFER_SIZE_TX, then tx_consume == tx_produce which is equivalent to empty.
<rjo> 2) on rx, if the ringbuffer is full, it implicitly clears the entire buffer. (l34)
<rjo> 3) (maybe not a bug) why is UART_EV_TX triggered on tx-empty, and not on !tx-full? doesnt that lead to a bit of stuttering and reduced throughput?
<sb0_> 1) the purpose of tx_level is to distinguish empty/full when tx_consume == tx_produce
<rjo> what is line 70/71 supposed to do?
<sb0_> 2) some data has to be dropped :p clearing the ringbuffer is a bit extreme, but spares some lines of code
<sb0_> by l71 you mean "while(tx_level == UART_RINGBUFFER_SIZE_TX);" ?
<rjo> clearing is actually slower if you do rxtx_read() unconditionally if the buffer is full.
<rjo> yes.
<sb0_> waits until there is at least one free character in the output buffer
<rjo> wouldnt it be smarter to not write to the rb when it holds size-1 elements, thereby losing one byte of possible storage but also getting rid of tx_level?
<sb0_> that would work as well
<sb0_> for #3, you mean because there is the gateware TX FIFO now?
<rjo> yes
<sb0_> there was no TX FIFO initially. Florent added it, but did not change uart.c ...
<sb0_> stuttering, yes
<rjo> well the way he does it works but i suspect it might be smarter to change it to !tx-full.
<sb0_> throughput, not sure. you spend less time context-switching between the user program and the ISR...
<rjo> for uarts where the phy is asynchronous, you would need cdc to be able to look at tx_fifo.source.stb/phy.sink.ack
<rjo> ok with me taking care of 1), 2) then? i'll send a patch.
<sb0_> ok
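The size-1 scheme rjo proposes for uart.c can be modelled in a few lines of Python (an illustrative model, not the actual C code): one slot is sacrificed so that `produce == consume` always means empty, and no separate level counter is needed.

```python
# Model of a ringbuffer that reserves one slot instead of keeping a
# level counter: produce == consume        -> empty
#               (produce+1) % size == consume -> full
# Capacity is size-1, but full/empty are never ambiguous.
class RingBuffer:
    def __init__(self, size):
        self.buf = [0] * size
        self.produce = 0
        self.consume = 0

    def empty(self):
        return self.produce == self.consume

    def full(self):
        return (self.produce + 1) % len(self.buf) == self.consume

    def write(self, c):
        if self.full():
            return False         # caller waits or drops; no implicit clear
        self.buf[self.produce] = c
        self.produce = (self.produce + 1) % len(self.buf)
        return True

    def read(self):
        c = self.buf[self.consume]
        self.consume = (self.consume + 1) % len(self.buf)
        return c
```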
<rjo> how many cycles are a context switch to isr and back on or1k? is it ~100?
<whitequark> huh, that's a lot
<rjo> well there are 2x32 registers to be pushed/popped
<whitequark> ah, right
<rjo> +misc stuff. so my naive lower bound was 80.
<rjo> that cris32 guy did an analysis on the optimum number of registers under different conditions. 32 seems like a lot; looking at random gcc/llvm assembly, they rarely ever get to use r20
<whitequark> add register banks?
<whitequark> speed up the common case. the first nested interrupt pays the full price
<rjo> but you can spare yourself all the trouble if you only ever need ~16 gp regs
<whitequark> that's what cortexes do, don't they?
<rjo> banks?
<whitequark> 16 gpr
<whitequark> and with thumb1 you can actually only access the first 8
<rjo> these smart arm guys must have thought about it ;)
<whitequark> thumb2 adds the rest but you need 2x the instruction size
<rjo> yes. i suspect they must have optimized the choice of the instruction set and the register layout across a wide range of code.
<whitequark> that would have definitely been the case with thumb
travis-ci has joined #m-labs
<travis-ci> m-labs/artiq#311 (master - 66940ea : Sebastien Bourdeauducq): The build passed.
travis-ci has left #m-labs [#m-labs]
<sb0_> rjo, it didn't meet timing on pipistrello. since this keeps happening (that and PAR failing to complete), maybe we should lower the system clock frequency?
<whitequark> sb0_: btw, I am currently looking at lowering EH
<whitequark> and using LLVM's sjlj lowering is definitely the right call because it has all the right machinery to manipulate stack top
<whitequark> i.e. it should correctly adjust it given our stack allocations
<sb0_> what does that bring compared to linking against setjmp/longjmp?
<sb0_> besides more complexity
<whitequark> I would have to implement sjljehprepare myself
<sb0_> and using an obscure feature that may be buggy
<sb0_> mh? why?
<sb0_> the current exception code uses zero black magic ...
<sb0_> and yes, it's slow
<whitequark> this is not about speed
<whitequark> I mean--sure, I can go implement the functionality of ehprepare myself
<whitequark> basically what it does is allow multiple landing pads to exist within a function
<sb0_> and setjmp doesn't?
<whitequark> which is what you will see if you inline a function with a try..except into another one that has a try..except
<sb0_> also, remember that exceptions may be raised from C
<whitequark> sure
<whitequark> raise is a very simple operation
<sb0_> and caught, too. since an exception that escapes from the kernel should be reported to the host.
<whitequark> sure
<sb0_> by using regular setjmp/longjmp, you are using the same functions in both cases with no risk of errors and funny bugs that take weeks to track down
<sb0_> the current code doesn't have a problem with nested try/except blocks, besides inefficiency
<whitequark> I don't understand the fixation on sjlj functions
<sb0_> well, they are known to work and interoperate nicely with C
<whitequark> can't the or1k backend just lower the eh sjlj intrinsics to functions, anyway?
<sb0_> and they are also a common mechanism
<whitequark> let me check it out
<whitequark> the point of using ehprepare is to do less work and reuse a pass from LLVM
<whitequark> since it does a transformation that I would otherwise have to do myself
<sb0_> why?
<whitequark> because I need that transformation?
<sb0_> sure, but what for?
<whitequark> I've explained it above... multiple landing pads
<sb0_> why do you need a transform for that?
<sb0_> google turns up 23 results for "llvm ehprepare", so...
<whitequark> it would be just a way to lower the invokes
<whitequark> similar to what old py2llvm does
<sb0_> (which is another reason for using the regular setjmp/longjmp: they are better understood)
<whitequark> there's nothing unclear about ehprepare
<whitequark> as far as I can see OR1k doesn't implement the intrinsics itself, so they should be just lowered to the C functions
<sb0_> it does look complicated :)
<sb0_> what advantage does using the llvm intrinsics and that transform bring, exactly?
<sb0_> you say "multiple landing pads", but afaik the current code doesn't have a problem with that either
<whitequark> I don't have to lower it myself in any way
<whitequark> I just lower it to invokes, which is nearly a no-op
<whitequark> and then I lower those invokes to LLVM invokes
<sb0_> by "lower" you mean the exception logic (such as re-raise in finally) which py2llvm implements currently?
<whitequark> __eh_pop, __eh_push, etc
<whitequark> the act of lowering the try statement to calls to those functions
<whitequark> (ehprepare calls them _Unwind_RegisterFrame or whatever, same idea)
<sb0_> so if I understand correctly: llvm already implements some of the exception management code, but you have to use its sjlj intrinsics?
<whitequark> on or1k and I assume lm32, which do not implement the intrinsics themselves, these should lower to C functions
<whitequark> so there's actually no difference between @llvm.eh.sjlj.setjmp and @setjmp
<whitequark> otherwise, yes
<sb0_> there are multiple implementations of those for or1k
<sb0_> incompatible ones, of course
<sb0_> does llvm just emit a symbol for the linker to resolve?
<whitequark> as far as I can see from the code, yes
<sb0_> ok...
<sb0_> well, that should be fine then
<sb0_> how will you retrieve exception info from C?
<sb0_> or set it
<whitequark> set up a jmpbuf, push it, retrieve the exception from LSDA
<whitequark> (language-specific data area. the part of memory managed by the unwinder. basically a place for exception)
<whitequark> let me verify that this all will work with or1k as intended
<sb0_> in open source CPU land, things are often broken
<sb0_> ...also, LLVM isn't known for stable APIs
<sb0_> the more LLVM APIs you use, the higher the probability of future problems
<cr1901_modern> I remember reading in the README of a cooperative multithreading C library that setjmp/longjmp, going by pure ANSI C, don't have enough guarantees to implement exceptions.
<cr1901_modern> Of course, what the standard says vs "what impls do" differ
<whitequark> oh
<whitequark> nevermind, it's irrelevant
<whitequark> targets have to opt-in to SJLJ EH
<whitequark> I'll just copy the current model.
<sb0_> so it's not implemented in or1k-llvm?
<whitequark> it is not usable with or1k. or really anything except ARM on Darwin, apparently
<whitequark> so you're right, but for all the wrong reasons. i mean, how did you conclude that the intrinsics are broken, given there is no possible way to call them? :)
<whitequark> or1k actually supports DWARF unwinding, but I agree that libunwind is probably not worth the time
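The __eh_push/__eh_pop scheme discussed above can be sketched as a conceptual Python model (all names hypothetical; this is the handler-stack idea, not LLVM's or py2llvm's actual mechanism): each `try` pushes a handler frame, and `raise` transfers control to the innermost frame, the way longjmp would jump to its saved jmpbuf.

```python
# Conceptual model of setjmp/longjmp-style EH: a stack of handler
# frames plus one slot for the in-flight exception (standing in for
# the LSDA). A plain function call stands in for the longjmp.
handler_stack = []
current_exception = [None]

def eh_push(handler):
    handler_stack.append(handler)    # entering a try block

def eh_pop():
    handler_stack.pop()              # leaving a try block normally

def eh_raise(exc):
    current_exception[0] = exc       # stash exception for the handler
    handler = handler_stack.pop()    # innermost landing pad wins
    handler(exc)                     # "longjmp" to the handler

log = []
eh_push(lambda e: log.append(("outer", e)))   # outer try
eh_push(lambda e: log.append(("inner", e)))   # nested try
eh_raise("ZeroDivisionError")   # caught by the inner handler
eh_raise("ValueError")          # inner is gone; outer catches this one
```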
ylamarre has joined #m-labs
ylamarre has quit [Ping timeout: 265 seconds]
ylamarre has joined #m-labs
sb0_ has quit [Read error: Connection reset by peer]
sb0 has joined #m-labs
ylamarre has quit [Ping timeout: 246 seconds]
ylamarre has joined #m-labs
<whitequark> hm, turns out python doesn't reraise if you return from a finally statement
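A minimal demonstration of that CPython behaviour (`swallow` is a hypothetical name):

```python
# In CPython, a `return` inside `finally` discards any in-flight
# exception instead of re-raising it.
def swallow():
    try:
        raise ValueError("lost")
    finally:
        return "finally wins"   # the ValueError is silently dropped

result = swallow()   # no exception propagates to the caller
```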
<whitequark> oh crap. negative indexes
cr1901_modern has quit [Read error: Connection reset by peer]