<cr1901_modern>
whitequark: rereading the log for things I missed. Why would traversing a list twice, the second time being read only, have any additional side effects compared to CPython?
travis-ci has joined #m-labs
<travis-ci>
m-labs/artiq#308 (master - f836465 : Sebastien Bourdeauducq): The build was fixed.
<cr1901_modern>
ysionneau: Have you had any luck running Xilinx tools on *BSD? Now that Net 7 is about to be released (and Intel drivers hopefully work), I'm thinking about taking some time to play with it on my laptop.
<sb0_>
ise ran fine on freebsd (with linux emulation) last time i tried (many years ago)
<cr1901_modern>
As long as the emu layers have kept up, I'm guessing it'll still work.
<cr1901_modern>
If it doesn't, I'm willing to add the missing syscalls lol
<sb0_>
whoa, the pluto probe is transmitting less than 15W
<sb0_>
now that's a QRP ;)
<cr1901_modern>
Must be using OLIVIA or some really good QRP protocol
<sb0_>
no protocol will help you if all your antenna/LNA picks up is noise
<cr1901_modern>
Doesn't signal power fall off based on an inverse (square?) law?
<sb0_>
this is the beast they are using apparently
<cr1901_modern>
TIL that dbus is not a Linux-exclusive technology. If I was supposed to know that as a "*nix power user", well... I didn't.
<whitequark>
cr1901_modern: because you still need to execute the expression in the if clause during the second traversal
<whitequark>
and it might have side effects
<cr1901_modern>
I see. The "naive" example I can think of is assigning to a variable where the new value depends on the previous value inside the if statement.
<GitHub165>
[artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/vm4qT
<GitHub165>
artiq/master 66940ea Sebastien Bourdeauducq: rtio: disable NOP suppression after reset and underflow
* rjo
loves ringbuffers
<rjo>
sb0_: i suspect there are a few bugs in uart.c
<sb0_>
where?
<rjo>
well. 1) line 71: the maximum number of elements in this ringbuffer can only be UART_RINGBUFFER_SIZE_TX - 1
<rjo>
if tx_level == UART_RINGBUFFER_SIZE_TX, then tx_consume == tx_produce which is equivalent to empty.
<rjo>
2) on rx, if the ringbuffer is full, it implicitly clears the entire buffer. (l34)
<rjo>
3) (maybe not a bug) why is UART_EV_TX triggered on tx-empty, and not on !tx-full? doesn't that lead to a bit of stuttering and reduced throughput?
<sb0_>
1) the purpose of tx_level is to distinguish empty/full when tx_consume == tx_produce
<rjo>
what is line 70/71 supposed to do?
<sb0_>
2) some data has to be dropped :p clearing the ringbuffer is a bit extreme, but spares some lines of code
<sb0_>
by l71 you mean "while(tx_level == UART_RINGBUFFER_SIZE_TX);" ?
<rjo>
clearing is actually slower than just doing rxtx_read() unconditionally when the buffer is full.
<rjo>
yes.
<sb0_>
waits until there is at least one free character in the output buffer
<rjo>
wouldn't it be smarter to not write to the rb if it is size-1, thereby losing one byte of possible storage but also getting rid of tx_level?
<sb0_>
that would work as well
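A minimal C sketch (not the actual misoc uart.c) of the variant discussed above: the tx_level counter is dropped, "full" is detected by sacrificing one slot, and on rx the character is always read from the hardware and silently dropped when the buffer is full instead of clearing the whole buffer. rxtx_read()/rxtx_write() and the buffer constants mirror names mentioned in the conversation; everything else is assumed.

    #define UART_RINGBUFFER_SIZE_TX 128
    #define UART_RINGBUFFER_MASK_TX (UART_RINGBUFFER_SIZE_TX - 1)
    #define UART_RINGBUFFER_SIZE_RX 128
    #define UART_RINGBUFFER_MASK_RX (UART_RINGBUFFER_SIZE_RX - 1)

    static char tx_buf[UART_RINGBUFFER_SIZE_TX];
    static volatile unsigned int tx_produce, tx_consume;
    static char rx_buf[UART_RINGBUFFER_SIZE_RX];
    static volatile unsigned int rx_produce, rx_consume;

    char rxtx_read(void);    /* hardware register accessors, assumed */
    void rxtx_write(char c);

    /* empty: produce == consume; full: advancing produce would hit consume.
     * One slot is never used, but no separate level counter is needed. */
    void uart_write(char c)
    {
        unsigned int next = (tx_produce + 1) & UART_RINGBUFFER_MASK_TX;
        while(next == tx_consume);              /* wait for one free slot */
        tx_buf[tx_produce] = c;
        tx_produce = next;
        /* ...kick the transmitter / unmask the TX event here... */
    }

    /* rx interrupt path: always read (and thereby ack) the character,
     * drop it if the ring buffer is full instead of clearing the buffer */
    static void uart_rx_irq(void)
    {
        char c = rxtx_read();
        unsigned int next = (rx_produce + 1) & UART_RINGBUFFER_MASK_RX;
        if(next != rx_consume) {
            rx_buf[rx_produce] = c;
            rx_produce = next;
        }
    }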
<sb0_>
for #3, you mean because there is the gateware TX FIFO now?
<rjo>
yes
<sb0_>
there was no TX FIFO initially. Florent added it, but did not change uart.c ...
<sb0_>
stuttering, yes
<rjo>
well the way he does it works but i suspect it might be smarter to change it to !tx-full.
<sb0_>
throughput, not sure. you spend less time context-switching between the user program and the ISR...
<rjo>
for uarts where the phy is asynchronous, you would need cdc to be able to look at tx_fifo.source.stb/phy.sink.ack
<rjo>
ok with me taking care of 1), 2) then? i'll send a patch.
<sb0_>
ok
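For point 3, a hedged sketch of an ISR driven by !tx-full rather than tx-empty, continuing the sketch above and assuming the gateware TX FIFO exposes its full flag as a CSR (uart_txfull_read() is a guessed accessor name, not necessarily the real one):

    int uart_txfull_read(void);   /* assumed CSR accessor for the FIFO full flag */

    /* refill the gateware TX FIFO until it is full or the software ring
     * buffer is empty, instead of waiting for the FIFO to drain completely */
    static void uart_tx_irq(void)
    {
        while(tx_consume != tx_produce && !uart_txfull_read()) {
            rxtx_write(tx_buf[tx_consume]);
            tx_consume = (tx_consume + 1) & UART_RINGBUFFER_MASK_TX;
        }
    }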
<rjo>
how many cycles are a context switch to isr and back on or1k? is it ~100?
<whitequark>
huh, that's a lot
<rjo>
well there are 2x32 registers to be pushed/popped
<whitequark>
ah, right
<rjo>
+misc stuff. so my naive lower bound was 80.
<rjo>
that cris32 guy did an analysis on the optimum number of registers under different conditions. 32 seems like a lot; looking at random gcc/llvm assembly, they rarely ever get to use r20
<whitequark>
add register banks?
<whitequark>
speed up the common case. the first nested interrupt pays the full price
<rjo>
but you can spare yourself all the trouble if you only ever need ~16 gp regs
<whitequark>
that's what cortexes do, don't they?
<rjo>
banks?
<whitequark>
16 gpr
<whitequark>
and with thumb1 you can actually only access the first 8
<rjo>
these smart arm guys must have thought about it ;)
<whitequark>
thumb2 adds the rest but you need 2x the instruction size
<rjo>
yes. i suspect they must have optimized the choice of the instruction set and the register layout across a wide range of code.
<whitequark>
that would have definitely been the case with thumb
<sb0_>
rjo, it didn't meet timing on pipistrello. since this keeps happening (that and PAR failing to complete), maybe we should lower the system clock frequency?
<whitequark>
sb0_: btw, I am currently looking at lowering EH
<whitequark>
and using LLVM's sjlj lowering is definitely the right call because it has all the right machinery to manipulate stack top
<whitequark>
i.e. it should correctly adjust it given our stack allocations
<sb0_>
what does that bring compared to linking against setjmp/longjmp?
<sb0_>
besides more complexity
<whitequark>
otherwise I would have to implement sjljehprepare myself
<sb0_>
and using an obscure feature that may be buggy
<sb0_>
mh? why?
<sb0_>
the current exception code uses zero black magic ...
<sb0_>
and yes, it's slow
<whitequark>
this is not about speed
<whitequark>
I mean--sure, I can go implement the functionality of ehprepare myself
<whitequark>
basically what it does is allow multiple landing pads to exist within a function
<sb0_>
and setjmp doesn't?
<whitequark>
which is what you will see if you inline a function with a try..except into another one that has a try..except
<sb0_>
also, remember that exceptions may be raised from C
<whitequark>
sure
<whitequark>
raise is a very simple operation
<sb0_>
and caught, too. since an exception that escapes from the kernel should be reported to the host.
<whitequark>
sure
<sb0_>
by using regular setjmp/longjmp, you are using the same functions in both cases with no risk of errors and funny bugs that take weeks to track down
<sb0_>
the current code doesn't have a problem with nested try/except blocks, besides inefficiency
<whitequark>
I don't understand the fixation on sjlj functions
<sb0_>
well, they are known to work and interoperate nicely with C
<whitequark>
can't the or1k backend just lower the eh sjlj intrinsics to functions, anyway?
<sb0_>
and they are also a common mechanism
<whitequark>
let me check it out
<whitequark>
the point of using ehprepare is to do less work and reuse a pass from LLVM
<whitequark>
since it does a transformation that I would otherwise have to do myself
<sb0_>
why?
<whitequark>
because I need that transformation?
<sb0_>
sure, but what for?
<whitequark>
I've explained it above... multiple landing pads
<sb0_>
why do you need a transform for that?
<sb0_>
google turns up 23 results for "llvm ehprepare", so...
<whitequark>
it would be just a way to lower the invokes
<whitequark>
similar to what old py2llvm does
<sb0_>
(which is another reason for using the regular setjmp/longjmp: they are better understood)
<whitequark>
there's nothing unclear about ehprepare
<whitequark>
as far as I can see OR1k doesn't implement the intrinsics itself, so they should be just lowered to the C functions
<sb0_>
it does look complicated :)
<sb0_>
what advantage does using the llvm intrinsics and that transform bring, exactly?
<sb0_>
you say "multiple landing pads", but afaik the current code doesn't have a problem with that either
<whitequark>
I don't have to lower it myself in any way
<whitequark>
I just lower it to invokes, which is nearly a no-op
<whitequark>
and then I lower those invokes to LLVM invokes
<sb0_>
by "lower" you mean the exception logic (such as re-raise in finally) which py2llvm implements currently?
<whitequark>
__eh_pop, __eh_push, etc
<whitequark>
the act of lowering the try statement to calls to those functions
<whitequark>
(ehprepare calls them _Unwind_RegisterFrame or whatever, same idea)
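To make the lowering being discussed concrete, a hedged sketch of what a try/except could turn into on top of plain setjmp/longjmp; __eh_push, __eh_pop and __eh_last_exception are placeholder names for whatever the runtime actually provides, not existing ARTIQ functions:

    #include <setjmp.h>

    struct eh_frame {
        struct eh_frame *next;
        jmp_buf jb;
    };

    void __eh_push(struct eh_frame *f);   /* register an active try block */
    void __eh_pop(void);                  /* unregister it on normal exit */
    void *__eh_last_exception(void);      /* fetch the pending exception */

    void body(void);                      /* the try suite */
    void handler(void *exc);              /* the except suite */

    /* try: body()  except ... as exc: handler(exc) */
    void lowered_try_except(void)
    {
        struct eh_frame frame;
        if(setjmp(frame.jb) == 0) {
            __eh_push(&frame);
            body();
            __eh_pop();
        } else {
            /* a raise longjmp()'d back here with a nonzero value */
            handler(__eh_last_exception());
        }
    }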
<sb0_>
so if I understand correctly: llvm already implements some of the exception management code, but you have to use its sjlj intrinsics?
<whitequark>
on or1k and I assume lm32, which do not implement the intrinsics themselves, these should lower to C functions
<whitequark>
so there's actually no difference between @llvm.eh.sjlj.setjmp and @setjmp
<whitequark>
otherwise, yes
<sb0_>
there are multiple implementations of those for or1k
<sb0_>
incompatible ones, of course
<sb0_>
does llvm just emit a symbol for the linker to resolve?
<whitequark>
as far as I can see from the code, yes
<sb0_>
ok...
<sb0_>
well, that should be fine then
<sb0_>
how will you retrieve exception info from C?
<sb0_>
or set it
<whitequark>
set up a jmpbuf, push it, retrieve the exception from LSDA
<whitequark>
(language-specific data area. the part of memory managed by the unwinder. basically a place for the exception)
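Under the same invented runtime as the sketch above, raising from C (and therefore letting whichever frame is innermost catch the exception) could look roughly like this; __eh_innermost and __eh_set_last_exception are again placeholder names:

    struct eh_frame *__eh_innermost(void);
    void __eh_set_last_exception(void *exc);

    /* stash the exception object in the runtime's per-thread slot (the
     * LSDA-like area mentioned above), then longjmp to the innermost
     * registered frame so its handler runs */
    void __eh_raise(void *exc)
    {
        struct eh_frame *f = __eh_innermost();
        __eh_set_last_exception(exc);
        longjmp(f->jb, 1);
    }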
<whitequark>
let me verify that this all will work with or1k as intended
<sb0_>
in open source CPU land, things are often broken
<sb0_>
...also, LLVM isn't known for stable APIs
<sb0_>
the more LLVM APIs you use, the higher the probability of future problems
<cr1901_modern>
I remember reading in the README of a cooperative multithreading C library that setjmp/longjmp, going by pure ANSI C, don't have enough guarantees to implement exceptions.
<cr1901_modern>
Of course, what the standard says vs "what impls do" differ
<whitequark>
oh
<whitequark>
nevermind, it's irrelevant
<whitequark>
targets have to opt-in to SJLJ EH
<whitequark>
I'll just copy the current model.
<sb0_>
so it's not implemented in or1k-llvm?
<whitequark>
it is not usable with or1k. or really anything except ARM on Darwin, apparently
<whitequark>
so you're right, but for all the wrong reasons. i mean, how did you conclude that the intrinsics are broken, given there is no possible way to call them? :)
<whitequark>
or1k actually supports DWARF unwinding, but I agree that libunwind is probably not worth the time