sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
ylamarre has quit [Quit: ylamarre]
<rjo>
sb0: when i tried different sys clock speeds, it didn't matter
<rjo>
there seem to be three possible results: 1) phase four finds ~1000 hold or setup violations and everything passes fine. 2) it finds ~30000 and hangs. 3) it finds ~30000 and fails.
<whitequark>
is it nondeterministic?
<rjo>
but maybe slower clock reduces the rate of failures/hangs. we should definitely parallelize the bitstream builds.
<rjo>
very much so.
<whitequark>
that's gross
<whitequark>
doesn't it have some kind of -srand switch?
<sb0>
The PIC sub-project aims to add support for position independent code (PIC) for the uClibc/Linux version of the GNU tool chain. This will require extensions to the ABI (which currently has no specification for PIC) and a rewrite of much of the tool chain.
<sb0>
doesn't inspire trust
<sb0>
is the LLVM code looking good?
<whitequark>
yes
<whitequark>
linker might be worse, I will check it a bit later
<whitequark>
the or1k backend is surprisingly well written overall, I expected worse
<whitequark>
we could actually upstream it with relatively few modifications
<whitequark>
sb0: oh, binutils are in the clear
<whitequark>
it supports not only PIC, but even TLS (!)
<whitequark>
(TLS often has spotty support because, unlike PIC, it requires OS/runtime support)
<whitequark>
disappointingly, the LLVM backend doesn't have TLS support
<whitequark>
sb0: one reason i'm interested in this, is that it would be good to have decent exception support
<whitequark>
not just "ValueError, in one of the two dozen places it could have been raised"
<whitequark>
but a file:line1:col1:line2:col2.
<sb0>
what does this have to do with PIC?
<whitequark>
string literals
<whitequark>
rodata
<whitequark>
needs linker changes
<whitequark>
easier to just load PT_LOAD and forget about trying to muck with sections.
<whitequark>
like pretty much every other dynamic linker in existence does
<sb0>
well, options are
<sb0>
1) add that one relocation type needed for rodata
<whitequark>
either works i guess
<sb0>
2) use PIC, risk bugs, remove some of the existing linker code which is good
<whitequark>
i'd still look at 2 though
<whitequark>
is there an or1k simulator?
<whitequark>
or do I have to wait until pipistrello arrives to test?
<whitequark>
... i suppose there wouldn't be a simulator with misoc
<sb0>
there is verilator-based simulation
<whitequark>
does it work?
<sb0>
yes. it works very well, though it is slow (~1MHz)
<whitequark>
not very impressive compared to rtl-level sim
<sb0>
there is some QEMU support for or1k as well
<sb0>
...lm32 has good QEMU, but crappy LLVM :(
<whitequark>
well, that can be easily fixed once I finish this all
<whitequark>
always wanted to write a decent LLVM backend
<whitequark>
sb0: oh, while I'm at it, I have this draft for new interleave transformation impl
<whitequark>
every function signature will get a "duration" field. duration(10ns) etc. duration will be calculated purely lexically, during the typechecking phase
<whitequark>
so, there will be no DCE, no constant propagation, and no inlining preceding that
<whitequark>
the only potential problem with this design I see is inability to do something like constant=10; delay(constant*ms)
<whitequark>
I don't know how common this is; if it is, this can be improved upon
<whitequark>
anyway, this information allows us to completely validate the possibility of interleaving during the typechecking phase
<whitequark>
then, after IR generation, functions will be inlined. as a bonus, inlining will only go as far as necessary, i.e. if the process a function is matched with finishes after the function does, it doesn't have to be expanded
<whitequark>
same about loop unrolling
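The purely lexical duration computation described above can be sketched in ordinary Python using the stdlib ast module (a hypothetical helper, not ARTIQ code): only literal delay() arguments are summed, and anything else makes the duration unknown.

```python
import ast

def lexical_duration(source):
    """Sum the literal arguments of delay() calls in a function body.

    Returns the total, or None when any delay argument is not a literal
    constant -- the "cannot compute statically" case, e.g. delay(constant*ms).
    """
    total = 0
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "delay"):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, (int, float)):
                total += arg.value
            else:
                return None  # non-literal delay: duration unknown lexically
    return total

print(lexical_duration("def f():\n delay(10)\n delay(20)"))  # 30
print(lexical_duration("def f():\n c = 10\n delay(c)"))      # None
```

Since no DCE, constant propagation, or inlining precedes this, the second example stays unknown even though a human can see it is 10 — which is the trade-off being discussed.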
<whitequark>
questions.
<whitequark>
1) how common would non-literal expressions to delay be? how hard would they be to compute statically, at most?
<whitequark>
2) I assume the loop unrolling is meant to be used on loops with statically known iteration count, like range(10)
<whitequark>
since arrays do not have statically known size now, it would not be possible to meaningfully unroll a loop over an array
<whitequark>
is this right?
<sb0>
there will be a few non-literal expressions to delay
<sb0>
a common case is scanning timing
<sb0>
and unrolling the scan loop isn't a good option as those can be large
<sb0>
note that in this case the scanned time may be passed as parameter to a function, e.g. ttl.pulse()
<sb0>
I think that loop unrolling and function inlining should be driven by interleave requirements...
<whitequark>
that's exactly what I'm saying
<whitequark>
if interleave can fit it without expansion, so be it
<sb0>
and e.g. ttl.pulse() cannot get a "duration" field because that duration depends on its parameters, so it would always get inlined. right?
<sb0>
well, not always, but when there is a parallel/sequential block to lower
<whitequark>
well, not quite
<sb0>
whereas functions that have constant time and a duration field could be called (as functions) after parallel/sequential lowering
<whitequark>
let's say you have a function with duration 100 and ten delays of 10, and another with duration 100 and two delays of 50
<whitequark>
so to interleave these, with known durations, you'll still have to inline
<whitequark>
similar case with loop unrolling
<whitequark>
if you cannot definitely compute a duration, it's more problematic, because what I'm trying to do is to get the duration fully known after the typechecking
<whitequark>
let me think about implementing that
<sb0>
yes, you cannot definitely compute a duration in some cases.
<whitequark>
how about this: a duration field could be a number, or it could be an expression, using just addition and multiplication, which would incorporate function parameters
<sb0>
the general case with dynamic durations requires coroutines, which we don't want because they are slow and/or complicated
<sb0>
but the compiler should implement as much as possible of those cases that don't need coroutines
<whitequark>
wait
<whitequark>
slow and/or complicated?
<whitequark>
I was going to ask later whether you all want generators, because there's very little additional work to implement them
<whitequark>
basically the only thing that changes is the function's environment will be allocated in the parent's stack frame instead of its own
<sb0>
context switching between coroutines is slow, yes
<whitequark>
and it gets a "state" internal variable and a switch statement that dispatches it on reentry
<whitequark>
it is as costly as a function call and one indirect jump
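The lowering sketched here — environment kept in the caller's frame, plus a "state" variable dispatched on reentry — can be illustrated in plain Python (a hypothetical hand-lowering, not the actual implementation):

```python
# A generator lowered by hand: the environment lives in the caller's frame
# (here: a dict passed in), and a "state" variable is dispatched on reentry.
def scan_lowered(env):
    # original generator:
    #   def scan(start, stop):
    #       i = start
    #       while i < stop:
    #           yield i
    #           i += 1
    if env["state"] == 0:        # first entry: run the prologue
        env["i"] = env["start"]
        env["state"] = 1
    if env["state"] == 1:        # resumed at the yield point
        if env["i"] < env["stop"]:
            value = env["i"]
            env["i"] += 1
            return value
        env["state"] = 2         # exhausted
    return None

env = {"state": 0, "start": 0, "stop": 3}
print([scan_lowered(env) for _ in range(4)])  # [0, 1, 2, None]
```

Each resume costs one call plus the state dispatch, which is the "function call and one indirect jump" figure above.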
<whitequark>
is that too much?
<sb0>
I'm not sure if there are any practical uses for them
<sb0>
well
<sb0>
maybe for implementing complex scanning actually
<whitequark>
I don't think these would be problematic. there's a slightly tricky case with nested coroutines, but apart from that, it's simple
<whitequark>
what *I* am not sure about, however
<whitequark>
is how coroutines help with interleaving
<whitequark>
well, you can rewrite delay(ns) to yield(ns) and basically make with parallel a scheduler
<sb0>
we want to be able to create iterators that scan over a range of values, possibly picking them at random or not with a constant interval between each point
<sb0>
generators may help with that
<sb0>
generators do not help with interleaving. and yes, "yield ns" is what I mean.
<whitequark>
I mean, generators == coroutines
<sb0>
yes
<whitequark>
ok
<whitequark>
let's not look at generators then, until we definitely need them
<whitequark>
08:44 < whitequark> how about this: a duration field could be a number, or it could be an expression, using just addition and multiplication, which would incorporate function parameters
<whitequark>
what do you think about this
<whitequark>
you could still always compute it from a signature. it is also composable.
<whitequark>
and you don't even have to inline
<sb0>
that won't work
<whitequark>
why?
<sb0>
you need to interleave the inside of functions. a simple case is with parallel: a.pulse(10*us) b.pulse(20*ns)
<whitequark>
sure
<whitequark>
let's say pulse is defined as:
<sb0>
that gets lowered to: a.on() b.on() delay(10*us) a.off() delay(10*us) b.off()
<whitequark>
then its signature will look like: (x=int; duration=(x+x))->None
<sb0>
why the final delay?
<whitequark>
illustrative purposes
<whitequark>
so if you want something like
<whitequark>
def pulsen(x, n): for _ in range(n): pulse(x)
<whitequark>
it will get duration=n*(x+x)
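The composable duration expressions above (duration=(x+x) for pulse, duration=n*(x+x) for pulsen) could be modelled as a tiny expression tree over numbers, parameters, addition, and multiplication — a hypothetical sketch, not ARTIQ's actual representation:

```python
# Hypothetical symbolic duration expressions: numbers, parameters, + and *.
class Dur:
    def __add__(self, other): return Add(self, other)
    def __mul__(self, other): return Mul(self, other)

class Num(Dur):
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Param(Dur):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Add(Dur):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)

class Mul(Dur):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)

x, n = Param("x"), Param("n")
pulse_dur = x + x            # pulse:  duration = x + x
pulsen_dur = n * pulse_dur   # pulsen: duration = n * (x + x), composed

print(pulse_dur.eval({"x": 10}))           # 20
print(pulsen_dur.eval({"x": 10, "n": 3}))  # 60
```

The point is compositionality: pulsen's duration is built from pulse's signature alone, with no inlining, and can be evaluated once the call-site arguments are known.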
<sb0>
ok, I understood
<whitequark>
the advantage over abstract interpretation is that this scheme is very transparent. you can inspect every piece and they will always get the exact same type regardless of context
<sb0>
but my main critique is this won't handle "with parallel: a.pulse(10*us) b.pulse(20*ns)"
<whitequark>
why not?
<sb0>
because you need to break down/inline pulse()
<sb0>
before interleaving
<whitequark>
why?
<sb0>
because the correct lowered result is "a.on() b.on() delay(10*us) a.off() delay(10*us) b.off()"
<whitequark>
sure
<sb0>
and you can't get that without looking into each statement of pulse()
<whitequark>
the interleaving transformation itself inlines
<whitequark>
but this happens after computing the duration
<whitequark>
so after the typechecking, you'll know that a.pulse(10) executes for 20us, and b.pulse(20) executes for 40
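The effect the interleave transformation is meant to achieve — merging two sequential event streams by absolute timestamp — can be sketched as follows (hypothetical representation: each branch is a list of (event, following-delay) pairs, not ARTIQ's IR):

```python
# Merge parallel branches into one timestamp-ordered event sequence.
def interleave(*branches):
    events = []
    for branch in branches:
        t = 0
        for name, dt in branch:
            events.append((t, name))
            t += dt  # advance the branch-local timeline
    # stable sort by timestamp interleaves the branches
    return [name for t, name in sorted(events, key=lambda e: e[0])]

a = [("a.on", 10), ("a.off", 0)]   # a.pulse(10)
b = [("b.on", 20), ("b.off", 0)]   # b.pulse(20)
print(interleave(a, b))  # ['a.on', 'b.on', 'a.off', 'b.off']
```

This reproduces the lowered ordering sb0 gives for the parallel pulse example, and it is this step that requires seeing inside each pulse() — hence inlining — regardless of whether the total durations were already known.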
<sb0>
so why bother with computing durations? reducing the amount of functions that end up inlined?
<whitequark>
error reporting. with the scheme I am proposing, the computation of duration is completely local
<whitequark>
the computed duration, or impossibility of computing one, depends only on the lexical content of the function
<whitequark>
which makes it easy to explain why it was not possible to do so
<whitequark>
whereas if you inline three levels deep, how are you going to map your error back to your original code?
<whitequark>
less inlining is a minor bonus
<whitequark>
not to mention this is quicker to implement than abstract interpretation, because I don't need DCE, SCCP, etc
<sb0>
you also have to deal with at_mu()
<whitequark>
what does that do?
<sb0>
set now() to an absolute timestamp. that will throw off your duration computation...
<sb0>
a common use case is:
<sb0>
t = signal_input.timestamp_mu(); at_mu(t); delay(...); signal_output.do_something(...)
<whitequark>
can you even interleave waiting at at_mu() with delay()s?
<whitequark>
I don't see how that can be done statically
<sb0>
no, this obviously cannot be interleaved
<whitequark>
then why is this a problem? the type-level duration is only used for interleaving
<whitequark>
if you try to interleave a function containing that, it will point at at_mu and become angry
<sb0>
so it would have a "complicated duration" flag?
<whitequark>
basically, with a diagnostic hidden inside
<whitequark>
the diagnostic will not be shown except if something requires interleaving
<sb0>
same as if the duration would not be a polynomial expression of function params and constants?
<whitequark>
yes
<whitequark>
well, different message, obviously
<whitequark>
but same idea
<sb0>
so this "duration" flag is used solely for getting better error messages. ok, good.
<whitequark>
it will also be used to decide whether you can avoid inlining
<whitequark>
since if you interleave a function which takes 10us, no matter how many delays are inside, with a 20us delay, there's no point in that.
<whitequark>
but otherwise, yes.
<sb0>
the only drawback I see is this will fail to interleave those cases when the duration expression is non polynomial, but which could still be resolved by constant propagation/DCE
<whitequark>
I actually consider this a feature
<sb0>
eg. if x > 5: delay(5*us) else: delay(10*us)
<whitequark>
because if you do this, you introduce global dependencies all across your program, and when you refactor it, it will break in contrived ways
<whitequark>
that are, which is the motivating part, not at all expressible with sane error messages
<whitequark>
I mean, the best you can do is to print the entire history of abstract computation you performed that led to the failing case
<whitequark>
which is not very helpful and is also a lot of work to actually display
<whitequark>
if really desired, you can bring additional clauses into the inferred duration expression. even the if expression above, why not
<whitequark>
so if there is some common non-polynomial thing we need to support, it can be done
<whitequark>
duration(5 if x > 5 else 10)
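If conditional clauses were admitted into the inferred duration expression, the sketch could carry them as an extra node kind — again a hypothetical illustration, modelling duration(5 if x > 5 else 10):

```python
# Minimal pieces of a hypothetical duration-expression tree.
class Num:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Cond:
    """A conditional duration clause: then if pred(env) else other."""
    def __init__(self, pred, then, other):
        self.pred, self.then, self.other = pred, then, other
    def eval(self, env):
        return self.then.eval(env) if self.pred(env) else self.other.eval(env)

# duration(5 if x > 5 else 10)
d = Cond(lambda env: env["x"] > 5, Num(5), Num(10))
print(d.eval({"x": 7}))  # 5
print(d.eval({"x": 3}))  # 10
```

The duration stays a local, inspectable property of the signature; it just gains a branch instead of being recovered by global constant propagation.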
<sb0>
yes, but this support is already there in CP/DCE
<sb0>
or are you considering leaving CP/DCE to LLVM entirely?
<whitequark>
absolutely
<whitequark>
it has enough type information with the design I have. it will do as good a job as I can
<whitequark>
(probably even better, seeing as it has passes like scalar evolution)
<mithro>
how do I configure the speed of the uart in misoc MiniSoC?
<ysionneau>
you can play with uart_baudrate parameter of SoC Module
<ysionneau>
something like -Ot uart_baudrate <value> when using make.py
<mithro>
thanks!
<mithro>
ysionneau: is there an easy way to reach into the self.submodules.uart_phy and map the output value to a second pin?
<mithro>
sb0: and how do I map that to an IO pin? I would have thought self.comb += [platform.lookup_request("debug").eq(platform.lookup_request("serial").tx)] would work - but it can't find the "debug" specifier in the IOs?
<mithro>
oh, that needs to be
<mithro>
self.comb += [platform.request("debug").eq(platform.lookup_request("serial").tx)] it seems
<mithro>
okay, I'm now at the stage that I can see the UART output from misoc and the UART output from the USB-UART - but still no data is going between them...
<mithro>
timing all looks fine....
FabM has quit [Quit: ChatZilla 0.9.91.1 [Firefox 39.0/20150630154324]]
<mithro>
well, the misoc side can receive the data and echo it back, so it just looks like the path from the USB-UART up to the computer that is borked...
<sb0>
lookup_request is for fetching a io pin that has already been requested before
<olofk>
Just read through the back logs.
<olofk>
When it comes to hold time violations in ISE, in my experience these are almost always caused by unhandled CDCs in other parts of the design that make the router try too hard to meet unnecessary constraints
<olofk>
I would recommend taking a look at all CDCs and handling them individually. That will probably fix the hold time violations
<olofk>
Also, the golden reference or1k simulator is or1ksim. It's a C model that's pretty fast
<olofk>
And I think the info on the PIC stuff is out of date. Not entirely sure, but I believe that information is now fixed in the arch spec
<GitHub3>
[artiq] whitequark pushed 7 new commits to new-py2llvm: http://git.io/vmEjt
<GitHub3>
artiq/new-py2llvm 53fb03d whitequark: Restrict comprehensions to single for and no if clauses.
<GitHub3>
artiq/new-py2llvm e9416f4 whitequark: Convert Slice into typed SliceT.
<GitHub3>
artiq/new-py2llvm 5000f87 whitequark: Rename the field of CoerceT from expr to value.
<whitequark>
does or1ksim have any IO though?
<whitequark>
how is that handled?
<GitHub162>
[artiq] whitequark pushed 1 new commit to new-py2llvm: http://git.io/vmuOZ
<GitHub162>
artiq/new-py2llvm 5756cfc whitequark: Correctly infer type of list(iterable).
ylamarre has joined #m-labs
ylamarre has quit [Client Quit]
<GitHub158>
[artiq] whitequark pushed 1 new commit to new-py2llvm: http://git.io/vmu4E
<GitHub158>
artiq/new-py2llvm bcd1832 whitequark: Ensure bindings are created in correct order for e.g. "x, y = y, x".
ylamarre has joined #m-labs
<sb0>
in Qt, retrieving the text of a QLineEdit: widget.text(). of a QComboBox: widget.currentText()
<whitequark>
QLineEdit is a kind of label, whereas QComboBox isn't
<whitequark>
*why* QLineEdit is a kind of label is beyond me
<whitequark>
hm, there's some awkward interaction between exception handling and allocation
<whitequark>
you can't restore the stack pointer blindly when longjmp'ing
<whitequark>
e.g. def f(): try: x = [1]; raise E; except E: x[0] # segfault
<whitequark>
so I would essentially have to update the stored stack pointer in the last jmpbuf before every call or raise
<whitequark>
... which is *exactly* what SjLjEHPrepare was designed to do. but alas
<whitequark>
by the way, if you wanted a reason as to why the intrinsics were kind of weird, this is why
<sb0>
isn't the stack pointer offset once and for all in the function prologue?
<whitequark>
nope
<whitequark>
since I'm using stack for dynamic allocation
<whitequark>
this will continually advance the stack pointer down. the spill slots and locals will be addressed as frame-point-relative though
<sb0>
yeah, sure
<whitequark>
x = [1] is an alloca inside
<sb0>
so you are implementing dynamic allocation, e.g. [0]*some_complicated_algo() is valid code?
<whitequark>
sure
<whitequark>
it doesn't really matter that lists can be dynamically sized, because even if they were statically sized, you could put them one inside another
<whitequark>
well
<whitequark>
basically, allocate and let them escape the immediate vicinity of allocation
<whitequark>
so everything that you allocate needs to stay alive until the function finally returns
<sb0>
I see
<sb0>
what do you propose? implement sjlj intrinsics into the or1k backend?
<whitequark>
no
<whitequark>
well, actually, I'm not sure, maybe yes
<whitequark>
I should look what is easier, adding the intrinsics or implementing that functionality myself
<whitequark>
sb0: are you *sure* you don't want to use libunwind?
<whitequark>
that will give us backtraces and EH support with pretty much no development cost
<whitequark>
LLVM already generates suitable DWARF and there is an OR1K port
<whitequark>
("porting" libunwind consists of adding two assembly stubs to save/restore all registers)
<whitequark>
raising from C is calling _Unwind_RaiseException with the right arguments
<whitequark>
catching unhandled exceptions from C is wrapping _Unwind_RaiseException; it will tell you when there are no suitable handlers
<sb0>
well, that dynamic stack pointer is a pretty serious problem, so I guess the options are a) hack jmpbufs b) intrinsics c) libunwind
<whitequark>
yes
<sb0>
a) sounds pretty bad
<sb0>
what are the pros and cons of b vs. c
<whitequark>
b: more development time spent on backend, then again when lm32 support is needed. however, simpler runtime behavior
<whitequark>
c: virtually zero development time (dwarf output requires no target-specific code and libunwind porting is trivial), more complex runtime (includes libunwind, reads DWARF tables)
<whitequark>
c is also zero-cost on fast path
<whitequark>
given that basically everything can raise (IndexError, ValueError, etc) there *might* be some usefulness to that
<whitequark>
but also might not
<sb0>
does libunwind deal with reading the DWARF tables?
<sb0>
how much glue is needed?
<whitequark>
that's pretty much its purpose
<whitequark>
very little. most of the annoying parts are generated by LLVM itself
<whitequark>
you need to set up a specific landing pad structure, provide a personality routine, and call _Unwind_RaiseException
<whitequark>
and put your unhandled exn handler into whatever place _cxa_terminate is placed by clang
<ysionneau>
cr1901: for desktop related questions about NetBSD you can ask around on #EdgeBSD , khorben is maintaining his own desktop environement for NetBSD called DeforaOS
<ysionneau>
he has well thought about a lot of desktop related stuff
<cr1901>
Alright- so they'll accept plain vanilla Net q's then?
<ysionneau>
yes, absolutely
<ysionneau>
ah, there is a uefi variable you can change to access more functions ... nice
<sb0>
nice? no.
<ysionneau>
well, not nice, but good that it at least exists, even if it should be available directly
<sb0>
also, my bios uses different format
<ysionneau>
a konami code would have been more fun than having to edit with a hex editor though ...
<ysionneau>
ah so you cannot tweak the same exact byte?
<cr1901>
Awesome, good to know others have given it thought.
<cr1901>
I'm sure Free is great too. I just had better, more fun experiences running Net on stuff (bucket list would be a 68k machine).
<sb0>
if those "others" were lenovo intel, _then_ it would be good