<cr1901_modern>
whitequark: rereading the log for things I missed. Why would traversing a list twice, the second time being read only, have any additional side effects compared to CPython?
travis-ci has joined #m-labs
<travis-ci>
m-labs/artiq#308 (master - f836465 : Sebastien Bourdeauducq): The build was fixed.
<cr1901_modern>
ysionneau: Have you had any luck running Xilinx tools on *BSD? Now that Net 7 is about to be released (and Intel drivers hopefully work), I'm thinking about taking some time to play with it on my laptop.
<sb0_>
ise ran fine on freebsd (with linux emulation) last time i tried (many years ago)
<cr1901_modern>
As long as the emu layers have kept up, I'm guessing it'll still work.
<cr1901_modern>
If it doesn't, I'm willing to add the missing syscalls lol
<sb0_>
whoa, the pluto probe is transmitting less than 15W
<sb0_>
now that's a QRP ;)
<cr1901_modern>
Must be using OLIVIA or some really good QRP protocol
<sb0_>
no protocol will help you if all your antenna/LNA picks up is noise
<cr1901_modern>
Doesn't signal power fall off based on an inverse (square?) law?
<sb0_>
this is the beast they are using apparently
<cr1901_modern>
TIL that dbus is not a Linux-exclusive technology. If I was supposed to know that as a "*nix power user", well... I didn't.
<whitequark>
cr1901_modern: because you still need to execute the expression in the if clause during the second traversal
<whitequark>
and it might have side effects
<cr1901_modern>
I see. The "naive" example I can think of is assigning to a variable where the new value depends on the previous value inside the if statement.
<GitHub165>
[artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/vm4qT
<GitHub165>
artiq/master 66940ea Sebastien Bourdeauducq: rtio: disable NOP suppression after reset and underflow
* rjo
loves ringbuffers
<rjo>
sb0_: i suspect there are a few bugs in uart.c
<sb0_>
where?
<rjo>
well. 1) line 71: the maximum number of elements in this ringbuffer can only be UART_RINGBUFFER_SIZE_TX - 1
<rjo>
if tx_level == UART_RINGBUFFER_SIZE_TX, then tx_consume == tx_produce which is equivalent to empty.
<rjo>
2) on rx, if the ringbuffer is full, it implicitly clears the entire buffer. (l34)
<rjo>
3) (maybe not a bug) why is UART_EV_TX triggered on tx-empty, and not on !tx-full? doesn't that lead to a bit of stuttering and reduced throughput?
<sb0_>
1) the purpose of tx_level is to distinguish empty/full when tx_consume == tx_produce
<rjo>
what is line 70/71 supposed to do?
<sb0_>
2) some data has to be dropped :p clearing the ringbuffer is a bit extreme, but spares some lines of code
<sb0_>
by l71 you mean "while(tx_level == UART_RINGBUFFER_SIZE_TX);" ?
<rjo>
clearing is actually slower than just doing rxtx_read() unconditionally when the buffer is full.
<rjo>
yes.
<sb0_>
waits until there is at least one free character in the output buffer
<rjo>
wouldn't it be smarter to not write to the rb if it is size-1, thereby losing one byte of possible storage but also getting rid of tx_level?
<sb0_>
that would work as well
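A minimal C sketch (not the actual misoc uart.c) of the variant discussed above: the tx_level counter is dropped, "full" is detected by sacrificing one slot, and on rx the character is always read from the hardware and silently dropped when the buffer is full instead of clearing the whole buffer. rxtx_read()/rxtx_write() and the buffer constants mirror names mentioned in the conversation; everything else is assumed.

    #define UART_RINGBUFFER_SIZE_TX 128
    #define UART_RINGBUFFER_MASK_TX (UART_RINGBUFFER_SIZE_TX - 1)
    #define UART_RINGBUFFER_SIZE_RX 128
    #define UART_RINGBUFFER_MASK_RX (UART_RINGBUFFER_SIZE_RX - 1)

    static char tx_buf[UART_RINGBUFFER_SIZE_TX];
    static volatile unsigned int tx_produce, tx_consume;
    static char rx_buf[UART_RINGBUFFER_SIZE_RX];
    static volatile unsigned int rx_produce, rx_consume;

    char rxtx_read(void);    /* hardware register accessors, assumed */
    void rxtx_write(char c);

    /* empty: produce == consume; full: advancing produce would hit consume.
     * One slot is never used, but no separate level counter is needed. */
    void uart_write(char c)
    {
        unsigned int next = (tx_produce + 1) & UART_RINGBUFFER_MASK_TX;
        while(next == tx_consume);              /* wait for one free slot */
        tx_buf[tx_produce] = c;
        tx_produce = next;
        /* ...kick the transmitter / unmask the TX event here... */
    }

    /* rx interrupt path: always read (and thereby ack) the character,
     * drop it if the ring buffer is full instead of clearing the buffer */
    static void uart_rx_irq(void)
    {
        char c = rxtx_read();
        unsigned int next = (rx_produce + 1) & UART_RINGBUFFER_MASK_RX;
        if(next != rx_consume) {
            rx_buf[rx_produce] = c;
            rx_produce = next;
        }
    }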
<sb0_>
for #3, you mean because there is the gateware TX FIFO now?
<rjo>
yes
<sb0_>
there was no TX FIFO initially. Florent added it, but did not change uart.c ...
<sb0_>
stuttering, yes
<rjo>
well the way he does it works but i suspect it might be smarter to change it to !tx-full.
<sb0_>
throughput, not sure. you spend less time context-switching between the user program and the ISR...
<rjo>
for uarts where the phy is asynchronous, you would need cdc to be able to look at tx_fifo.source.stb/phy.sink.ack
<rjo>
ok with me taking care of 1), 2) then? i'll send a patch.
<sb0_>
ok
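For point 3, a hedged sketch of an ISR driven by !tx-full rather than tx-empty, continuing the sketch above and assuming the gateware TX FIFO exposes its full flag as a CSR (uart_txfull_read() is a guessed accessor name, not necessarily the real one):

    int uart_txfull_read(void);   /* assumed CSR accessor for the FIFO full flag */

    /* refill the gateware TX FIFO until it is full or the software ring
     * buffer is empty, instead of waiting for the FIFO to drain completely */
    static void uart_tx_irq(void)
    {
        while(tx_consume != tx_produce && !uart_txfull_read()) {
            rxtx_write(tx_buf[tx_consume]);
            tx_consume = (tx_consume + 1) & UART_RINGBUFFER_MASK_TX;
        }
    }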
<rjo>
how many cycles are a context switch to isr and back on or1k? is it ~100?
<whitequark>
huh, that's a lot
<rjo>
well there are 2x32 registers to be pushed/popped
<whitequark>
ah, right
<rjo>
+misc stuff. so my naive lower bound was 80.
<rjo>
that cris32 guy did an analysis on the optimum number of registers under different conditions. 32 seems like a lot; looking at random gcc/llvm assembly, they rarely ever get to use r20
<whitequark>
add register banks?
<whitequark>
speed up the common case. the first nested interrupt pays the full price
<rjo>
but you can spare yourself all the trouble if you only ever need ~16 gp regs
<whitequark>
that's what cortexes do, don't they?
<rjo>
banks?
<whitequark>
16 gpr
<whitequark>
and with thumb1 you can actually only access the first 8
<rjo>
these smart arm guys must have thought about it ;)
<whitequark>
thumb2 adds the rest but you need 2x the instruction size
<rjo>
yes. i suspect they must have optimized the choice of the instruction set and the register layout across a wide range of code.
<whitequark>
that would have definitely been the case with thumb
<sb0_>
rjo, it didn't meet timing on pipistrello. since this keeps happening (that and PAR failing to complete), maybe we should lower the system clock frequency?
<whitequark>
sb0_: btw, I am currently looking at lowering EH
<whitequark>
and using LLVM's sjlj lowering is definitely the right call because it has all the right machinery to manipulate stack top
<whitequark>
i.e. it should correctly adjust it given our stack allocations
<sb0_>
what does that bring compared to linking against setjmp/longjmp?
<sb0_>
besides more complexity
<whitequark>
otherwise I would have to implement sjljehprepare myself
<sb0_>
and using an obscure feature that may be buggy
<sb0_>
mh? why?
<sb0_>
the current exception code uses zero black magic ...
<sb0_>
and yes, it's slow
<whitequark>
this is not about speed
<whitequark>
I mean--sure, I can go implement the functionality of ehprepare myself
<whitequark>
basically what it does is allow multiple landing pads to exist within a function
<sb0_>
and setjmp doesn't?
<whitequark>
which is what you will see if you inline a function with a try..except into another one that has a try..except
<sb0_>
also, remember that exceptions may be raised from C
<whitequark>
sure
<whitequark>
raise is a very simple operation
<sb0_>
and caught, too. since an exception that escapes from the kernel should be reported to the host.
<whitequark>
sure
<sb0_>
by using regular setjmp/longjmp, you are using the same functions in both cases with no risk of errors and funny bugs that take weeks to track down
<sb0_>
the current code doesn't have a problem with nested try/except blocks, besides inefficiency
<whitequark>
I don't understand the fixation on sjlj functions
<sb0_>
well, they are known to work and interoperate nicely with C
<whitequark>
can't the or1k backend just lower the eh sjlj intrinsics to functions, anyway?
<sb0_>
and they are also a common mechanism
<whitequark>
let me check it out
<whitequark>
the point of using ehprepare is to do less work and reuse a pass from LLVM
<whitequark>
since it does a transformation that I would otherwise have to do myself
<sb0_>
why?
<whitequark>
because I need that transformation?
<sb0_>
sure, but what for?
<whitequark>
I've explained it above... multiple landing pads
<sb0_>
why do you need a transform for that?
<sb0_>
google turns up 23 results for "llvm ehprepare", so...
<whitequark>
it would be just a way to lower the invokes
<whitequark>
similar to what old py2llvm does
<sb0_>
(which is another reason for using the regular setjmp/longjmp: they are better understood)
<whitequark>
there's nothing unclear about ehprepare
<whitequark>
as far as I can see OR1k doesn't implement the intrinsics itself, so they should be just lowered to the C functions
<sb0_>
it does look complicated :)
<sb0_>
what advantage does using the llvm intrinsics and that transform bring, exactly?
<sb0_>
you say "multiple landing pads", but afaik the current code doesn't have a problem with that either
<whitequark>
I don't have to lower it myself in any way
<whitequark>
I just lower it to invokes, which is nearly a no-op
<whitequark>
and then I lower those invokes to LLVM invokes
<sb0_>
by "lower" you mean the exception logic (such as re-raise in finally) which py2llvm implements currently?
<whitequark>
__eh_pop, __eh_push, etc
<whitequark>
the act of lowering the try statement to calls to those functions
<whitequark>
(ehprepare calls them _Unwind_RegisterFrame or whatever, same idea)
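To make the lowering being discussed concrete, a hedged sketch of what a try/except could turn into on top of plain setjmp/longjmp; __eh_push, __eh_pop and __eh_last_exception are placeholder names for whatever the runtime actually provides, not existing ARTIQ functions:

    #include <setjmp.h>

    struct eh_frame {
        struct eh_frame *next;
        jmp_buf jb;
    };

    void __eh_push(struct eh_frame *f);   /* register an active try block */
    void __eh_pop(void);                  /* unregister it on normal exit */
    void *__eh_last_exception(void);      /* fetch the pending exception */

    void body(void);                      /* the try suite */
    void handler(void *exc);              /* the except suite */

    /* try: body()  except ... as exc: handler(exc) */
    void lowered_try_except(void)
    {
        struct eh_frame frame;
        if(setjmp(frame.jb) == 0) {
            __eh_push(&frame);
            body();
            __eh_pop();
        } else {
            /* a raise longjmp()'d back here with a nonzero value */
            handler(__eh_last_exception());
        }
    }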
<sb0_>
so if I understand correctly: llvm already implements some of the exception management code, but you have to use its sjlj intrinsics?
<whitequark>
on or1k and I assume lm32, which do not implement the intrinsics themselves, these should lower to C functions
<whitequark>
so there's actually no difference between @llvm.eh.sjlj.setjmp and @setjmp
<whitequark>
otherwise, yes
<sb0_>
there are multiple implementations of those for or1k
<sb0_>
incompatible ones, of course
<sb0_>
does llvm just emit a symbol for the linker to resolve?
<whitequark>
as far as I can see from the code, yes
<sb0_>
ok...
<sb0_>
well, that should be fine then
<sb0_>
how will you retrieve exception info from C?
<sb0_>
or set it
<whitequark>
set up a jmpbuf, push it, retrieve the exception from LSDA
<whitequark>
(language-specific data area. the part of memory managed by the unwinder. basically a place for the exception)
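Under the same invented runtime as the sketch above, raising from C (and therefore letting whichever frame is innermost catch the exception) could look roughly like this; __eh_innermost and __eh_set_last_exception are again placeholder names:

    struct eh_frame *__eh_innermost(void);
    void __eh_set_last_exception(void *exc);

    /* stash the exception object in the runtime's per-thread slot (the
     * LSDA-like area mentioned above), then longjmp to the innermost
     * registered frame so its handler runs */
    void __eh_raise(void *exc)
    {
        struct eh_frame *f = __eh_innermost();
        __eh_set_last_exception(exc);
        longjmp(f->jb, 1);
    }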
<whitequark>
let me verify that this all will work with or1k as intended
<sb0_>
in open source CPU land, things are often broken
<sb0_>
...also, LLVM isn't known for stable APIs
<sb0_>
the more LLVM APIs you use, the higher the probability of future problems
<cr1901_modern>
I remember reading in the README of a cooperative multithreading C library that setjmp/longjmp, going by pure ANSI C, don't have enough guarantees to implement exceptions.
<cr1901_modern>
Of course, what the standard says vs "what impls do" differ
<whitequark>
oh
<whitequark>
nevermind, it's irrelevant
<whitequark>
targets have to opt-in to SJLJ EH
<whitequark>
I'll just copy the current model.
<sb0_>
so it's not implemented in or1k-llvm?
<whitequark>
it is not usable with or1k. or really anything except ARM on Darwin, apparently
<whitequark>
so you're right, but for all the wrong reasons. i mean, how did you conclude that the intrinsics are broken, given there is no possible way to call them? :)
<whitequark>
or1k actually supports DWARF unwinding, but I agree that libunwind is probably not worth the time