sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
Gurty has quit [Quit: Kooll ~o~ datalove <3³\infty]
Gurty has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
<stekern> whitequark: or1k doesn't handle double fault in a special way, which in some cases makes debugging hard
<sb0> whitequark, there are no fire-and-forget RPCs
<sb0> so that would need to be implemented too... delays delays delays delays
<sb0> my compiler did the simplest thing: add a section at the end with RPCs to setattr() for all attributes that could be modified (i.e. that appear on the LHS of an assignment). please do that, or whatever takes the least time; you may add your mechanism to a list of "potential future compiler improvements", along with user-friendly exceptions
<sb0> also that mechanism needs to be smarter, e.g. ttl.pulse() is a performance-critical function that is a bit on the slow side already, and it would do two serializations, plus generate a lot of traffic anyway
<sb0> whitequark, re. "for i in range(self.npulses)" not being interleaved in test_loopback_count. my compiler did constant propagation before unrolling to deal with such problems.
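The "writeback" mechanism sb0 describes above rests on finding every attribute that appears on the LHS of an assignment. A minimal host-side sketch of that analysis, using Python's `ast` module (the `mutated_attributes` helper and the sample kernel are hypothetical, for illustration only):

```python
import ast

def mutated_attributes(source):
    """Collect the names of attributes assigned to `self` anywhere in
    the given kernel source; only these need a setattr() RPC at the end."""
    mutated = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if (isinstance(target, ast.Attribute)
                    and isinstance(target.value, ast.Name)
                    and target.value.id == "self"):
                mutated.add(target.attr)
    return mutated

kernel_src = """
def run(self):
    self.count += 1
    self.last_timestamp = now_mu()
    x = self.readonly_param
"""
# only the attributes written to get a writeback RPC
print(sorted(mutated_attributes(kernel_src)))  # ['count', 'last_timestamp']
```

Attributes that are only read (`self.readonly_param` here) are skipped, which keeps the writeback section, and hence the RPC traffic, minimal.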
<cr1901_modern> Tbh, I thought double-fault was an x86-specific thing
<sb0> wtf, fbo.gov name resolution fails from two HK ISPs and works from Germany and US
<sb0> and adding the IP to /etc/hosts makes the site accessible here
<sb0> https://physics.aps.org/featured-article-pdf/10.1103/PhysRevLett.115.260602 "On-Chip Maxwell’s Demon as an Information-Powered Refrigerator"
<whitequark> sb0: I already did user-friendly exceptions
<whitequark> ok, yes, I see the issue with ttl.pulse
<whitequark> sb0: with regards to constant propagation, please read the entirety of https://github.com/m-labs/artiq/issues/193 to understand why it is a bad idea.
<sb0> hmm
<sb0> re. #193, what about with Constants(t2=t*100) as const: pulse(const["t2"])?
<sb0> that's enforceable on host
<sb0> const.t2 even, that's shorter
<whitequark> oh yeah that totally works
<whitequark> good idea
<whitequark> very easy to implement too
<whitequark> sb0: note that that proposal alone doesn't solve the self.npulses problem
<whitequark> because within the context of run, self is not known
<sb0> yes... another issue with it is some perverse user may still do "const = ..." within the context manager block, and host python will take it
<whitequark> oh, that's not a problem at all
<sb0> yes, it's not
<sb0> it's easy to print a clear error for this
<whitequark> yes
<whitequark> `with Constants` is clearly a special construct, it's allowed to be a little magical
<whitequark> and there's no unbounded inlining or something like that
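A host-side sketch of what the `with Constants(...)` construct proposed above could look like: values are fixed on entry, and the "perverse user" rebinding a constant inside the block gets a clear error. This is a hypothetical illustration, not the eventual ARTIQ API:

```python
class Constants:
    """Hypothetical sketch: bindings are frozen at construction,
    and any rebinding attempt raises a clear error."""
    def __init__(self, **values):
        object.__setattr__(self, "_frozen", False)
        for name, value in values.items():
            object.__setattr__(self, name, value)
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(
                "constant %r cannot be rebound inside the block" % name)
        object.__setattr__(self, name, value)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

t = 5
with Constants(t2=t * 100) as const:
    print(const.t2)  # 500
```

Because `with` is purely lexical, the compiler only has to inspect the block body to know that `const.t2` can be folded.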
<whitequark> speaking of npulses, the second part that's necessary is https://github.com/m-labs/artiq/issues/191
<whitequark> well, a more general issue than #191
<whitequark> the access to `self.npulses` has to go into the delay signature of run, so that it will be inlined and `self` exposed to the interleaver
<sb0> self.npulses = freeze(...) on host...
<sb0> then the compiler treats it as int literal
<whitequark> no, that will not help in any way
<whitequark> I don't know what *self* is
<whitequark> `freeze` is the same as `with Constants`
<sb0> hm, i see... objects on the device are a pain
<whitequark> yes
<sb0> having a global pool of constants isn't great either, because many constants are clearly e.g. local to one driver
<whitequark> I have just now realized that even if I put the `self.npulses` access into the signature and make inlining it *possible*, there will still be more issues
<whitequark> hm
<whitequark> well, there's a simple solution, of course
<whitequark> use a closure!
<sb0> how?
<whitequark> the upvalues captured by a closure are basically a pool of constants that are local to this function
<whitequark> I think they'll even work with no modification to the current compiler
<sb0> how does it work in the case of that self.npulses test?
<whitequark> well, more like http://hastebin.com/ayekawukid.py
<whitequark> we can combine it like this so that the decorator can grab the core device reference http://hastebin.com/zapimimopo.py
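The hastebin links above are no longer available; the closure trick they illustrated can be sketched as follows. The captured upvalue acts as a constant pool local to the function, so the loop trip count is known without the compiler knowing anything about `self` (names here are hypothetical):

```python
def make_run(npulses):
    # `npulses` is captured as an upvalue: from the compiler's point of
    # view it is a per-closure constant, so the loop below has a known
    # trip count and can be unrolled/interleaved.
    def run():
        total = 0
        for _ in range(npulses):
            total += 1          # stand-in for ttl.pulse() etc.
        return total
    return run

run = make_run(10)   # e.g. built in the experiment's prepare stage
print(run())  # 10
```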
<sb0> that will work for this example, but I'm thinking of another use case where one would like to apply a constant delay, stored in the driver, to a given TTL channel
<whitequark> and I think we can put Constants on the host, too http://hastebin.com/ovimuyujap.py
<whitequark> hm
<whitequark> yeah, that's a problem.
<whitequark> I mean--it's not just a language design problem, there are perfectly valid cases where it's statically impossible to solve
<whitequark> like if you do do_with_delay(self.driver1 if cond else self.driver2)
<sb0> well we can put that delay in gateware ...
<whitequark> how?
<whitequark> the interleaver has to see every delay
<sb0> registers in the RTIO core that add some value to the timestamp exposed to the CPU for each channel
<whitequark> how would that help? the CPU can already get the delay out of the object directly
<whitequark> the issue is that the interleaver can't
<sb0> right
<sb0> well in that case it's not so much of a problem indeed, as we only need to be monotonic within a channel
<whitequark> unfortunately, this makes things harder
<sb0> insert the delay at the syscall. ttl_set(timestamp+delay, ...)
<sb0> then the interleaver doesn't know about it
<sb0> but optimizers can do constant propagation
<whitequark> I think that will break if you try to read now()
<sb0> could a Constants be passed as function call argument?
<sb0> I think there would be a need for some global pool of constants
<sb0> that can possibly be put into delay()
<whitequark> I don't think that will be of any help
<sb0> it will: with the closure trick, you can build the constant pool and then pass it around
<whitequark> the knowledge that "any of these numbers" can be put into a delay "somewhere inside the callstack" at this point does not allow the interleaver to do anything useful
<whitequark> once you start passing around Constants it's just like any other object
<whitequark> well, it's slightly better, because you can't mutate it
<whitequark> but this is not the issue we are facing yet
<whitequark> the benefit of `with Constants` is that `with` is purely lexical. it restricts the scope the analysis has to examine
<sb0> you can put restrictions on how Constants can be passed around. putting it into a function call argument is OK, anything else isn't
<whitequark> so, how will this help with the "constant delay stored in the driver" issue?
<sb0> actually, maybe it makes sense to apply that restriction to all objects
<sb0> then you always know at compile time what self is
<whitequark> nope, a function can be called with two different objects
<whitequark> and this call may be threaded arbitrarily deep in the call stack
<whitequark> I mean, putting `self.npulses` in the delay expression in the signature will have the same effect without being unnecessarily restrictive
<sb0> <whitequark> I think that will break if you try to read now() <<< how?
<whitequark> because now() will sometimes return values that are less than the channel furthest into the timeline
<sb0> how does this cause a problem?
<sb0> only the gateware/runtime sees that value
<whitequark> you'll get an RTIOSequenceError, no?
<sb0> no, you get RTIOSequenceError when you are non-monotonic within a single channel
<sb0> but all writes to the channel would get the same delay, so it stays monotonic
<sb0> there is actually one problem with synching, but that can be solved within the driver
<whitequark> so is it more like ttl_set(timestamp, ...); add_this_delay_to_every_write_to_this_channel_from_now_on(delay, ...) ?
<sb0> the delay I'm talking about is to compensate for physical latencies in the system (cable lengths, external device reaction times, etc.)
<sb0> the add_this_delay_to_every_write_to_this_channel_from_now_on is typically done once and for all at core device boot-up
<whitequark> ahhh yes I see, so basically you have a configuration in the channel that says that all pulses should be extended by X ms?
<sb0> yes
<whitequark> yeah, absolutely, put it in the driver itself, and don't expose it to the interleaver
<whitequark> that's the best solution
<sb0> or that's the plan, it's not done yet
<whitequark> there's no reason the interleaver should care about it
<sb0> yes, but better if figuring out the delay doesn't involve a lot of pointer-chasing
<sb0> at runtime
<sb0> but I guess LLVM & co take care of that?
<sb0> otherwise, putting it in gateware is also an option
<whitequark> it's one load and one add, is that so bad?
<sb0> should be ok
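The conclusion above (keep the latency compensation inside the driver, add it at the lowest level, never expose it to the interleaver) can be sketched like this. `TTLOut`, `pulse_mu`, and the event list are hypothetical stand-ins, not the real ARTIQ driver API:

```python
class TTLOut:
    """Sketch: per-channel latency compensation (cable length, external
    device reaction time) stored in the driver and applied at the
    lowest level, invisible to the interleaver."""
    def __init__(self, channel, latency_mu=0):
        self.channel = channel
        self.latency_mu = latency_mu   # calibration constant, set at boot
        self.events = []               # stand-in for the RTIO FIFO

    def _rtio_set(self, timestamp_mu, value):
        # one load and one add per event; every write to this channel
        # is shifted equally, so timestamps stay monotonic per channel
        self.events.append((timestamp_mu + self.latency_mu, value))

    def pulse_mu(self, t_mu, duration_mu):
        self._rtio_set(t_mu, 1)
        self._rtio_set(t_mu + duration_mu, 0)

ttl = TTLOut(channel=0, latency_mu=25)
ttl.pulse_mu(1000, 100)
print(ttl.events)  # [(1025, 1), (1125, 0)]
```

Since both edges of the pulse receive the same offset, no RTIOSequenceError can be introduced by the compensation itself.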
<sb0> another thing though
<sb0> it's again pretty hard to extract information about that, but AFAIK a lot of experiments involve a number of delays (e.g. qubit flipping time) that need to be periodically recalibrated, written into the datasets, and then read and used by other experiments
<sb0> I think there can be many such delays, and this will become worse as the number of qubits increases
<whitequark> there's only one global dataset though, right?
<sb0> yeah
<whitequark> that makes it easy
<whitequark> well, for instance, we could add a get_dataset builtin
<whitequark> it's a bit ugly but it will work with no effort
<whitequark> what is hard and what is easy to implement in a static analyzer is often counterintuitive...
<sb0> or maybe a global_constants builtin, it doesn't have to be specific to datasets
<sb0> global_constants would get populated from datasets in Experiment.build()
<sb0> yeah I guess that works. just needs to be careful with name collisions in there...
<sb0> if we have global_constants, is there really a need for with Constants?
<whitequark> oh yes, you need some way to introduce an alias
<whitequark> they're orthogonal
<sb0> maybe global_constants helps with compilation caching too ...
<whitequark> the main issue with compilation caching is all the host objects that can change
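A host-side sketch of the `global_constants` pool proposed above: populated once from datasets in `Experiment.build()`, read-only afterwards, and "careful with name collisions" made concrete by failing loudly. The class and names are hypothetical:

```python
class GlobalConstants:
    """Sketch: a single global pool of constants, filled from the
    datasets once during Experiment.build() and read-only after that."""
    def __init__(self):
        self._values = {}

    def populate(self, dataset):
        for name, value in dataset.items():
            if name in self._values:
                raise KeyError("name collision in global constants: " + name)
            self._values[name] = value

    def __getitem__(self, name):
        return self._values[name]

# periodically recalibrated delays, as stored in the datasets
datasets = {"qubit_flip_time_mu": 1200, "cooling_delay_mu": 5000}
gc = GlobalConstants()
gc.populate(datasets)            # would run in Experiment.build()
print(gc["qubit_flip_time_mu"])  # 1200
```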
<sb0> anyway, i feel that the ultimate solution to those problems is gateware-assisted scheduling and coroutines ...
<sb0> and an asic version of that sounds interesting
mumptai has joined #m-labs
<whitequark> sb0: why would you need an ASIC?
<whitequark> or even gateware assistance
<whitequark> just coroutines are enough
<sb0> context switches are slow, scheduling is slow
<whitequark> context switches are not slow
<whitequark> but that doesn't matter anyway, because when you use coroutines, there are no context switches
<sb0> would you be able to resume a coroutine in ~100ns?
<whitequark> how many OR1K instructions is that?
<sb0> a dozen, if there are no stalls
<whitequark> yes
<whitequark> resuming a coroutine involves a call, load, and an indirect jump
<sb0> and reloading registers from memory
<whitequark> nope
<sb0> well if you put all variables on the stack, you are just displacing the slowness problem
<whitequark> you only have to reload what you have spilled, and given the structure of Python programs, there are few spills between expressions
<whitequark> yes, sort of
<whitequark> let me elaborate on this
<whitequark> if a variable is referenced from an inner closure, you have to put it in memory
<whitequark> if a variable isn't, then it is accessible for LLVM to rearrange, and its liveness analysis usually produces very good results
<whitequark> so... a dozen instructions gives you around nine variables you can load from a spill
<whitequark> which is more than enough for most code
<whitequark> the real problem with coroutines is that our current allocation mechanism is not amenable to concurrent allocations
<sb0> you mean multiple stacks?
<whitequark> not really
<whitequark> if you want to pass data from one coroutine to another, you have to maintain memory safety somehow
<whitequark> and you no longer have the stack frame nesting discipline, which we use now
<whitequark> so if we ever do this, it will require a complete redesign of the runtime and most of the language
<sb0> well the coroutines would not be exposed to the user
<whitequark> variables in the frame that contains `with parallel:` will become coroutine arguments
<whitequark> you're basically doing a CPS transformation of the contents of `with parallel:`
<sb0> also, scheduling is potentially slow as well
<whitequark> saying that all the coroutines are allocated and deallocated at the same time will almost work, except it does not take nested calls into account
<whitequark> where the nested calls can also be suspended
<sb0> the coroutines will need to yield RTIO requests, then the scheduler has to pick the right one, and resume the right coroutine
<whitequark> yes, gateware-assisted scheduling is probably necessary
<whitequark> if you want to do it in 100ns
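The scheduling loop sb0 describes (coroutines yield RTIO requests, the scheduler picks the earliest and resumes the right coroutine) can be sketched with host-Python generators. This is an analogy for the idea only; the names and the `(timestamp, channel, value)` tuple shape are hypothetical, and on the core device this loop is exactly the part that would want gateware assistance:

```python
import heapq

def pulses(channel, start, period, n):
    # each branch of `with parallel:` becomes a coroutine that yields
    # its next RTIO request as (timestamp, channel, value)
    t = start
    for _ in range(n):
        yield (t, channel, 1)
        yield (t + period // 2, channel, 0)
        t += period

def schedule(coroutines):
    """Resume whichever coroutine has the earliest pending event,
    producing one time-sorted event stream (a k-way merge)."""
    heap = []
    for idx, co in enumerate(coroutines):
        event = next(co, None)
        if event is not None:
            heapq.heappush(heap, (event, idx))
    out = []
    while heap:
        event, idx = heapq.heappop(heap)
        out.append(event)
        nxt = next(coroutines[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out

stream = schedule([pulses(0, 0, 100, 2), pulses(1, 10, 100, 2)])
print(stream[:4])  # [(0, 0, 1), (10, 1, 1), (50, 0, 0), (60, 1, 0)]
```

Each generator yields monotonically increasing timestamps, so popping the minimum and refilling from the same generator yields a globally sorted stream.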
<whitequark> Python is nearly the single worst language you could have chosen for coredevice...
<whitequark> not that there was much choice, I guess
<sb0> why?
<whitequark> everything is mutable and relies on side effects
<whitequark> plus there are almost no reliable ways to restrict the language semantics
<whitequark> plus there is almost no extensibility
<sb0> pure functional languages without side effects are a pain to use in virtually every real world scenario
<whitequark> that is not true but also I don't suggest a pure functional language
<whitequark> for one, I'm not aware of any that can satisfy hard realtime constraints
<sb0> even writing to a goddamn file is difficult in FP
<whitequark> have you actually *used* FP languages?
<whitequark> if we based the coredevice language on, say, OCaml, almost all of the interleaving troubles would disappear instantly
<whitequark> ML's `let` construct introduces an immutable binding
<whitequark> so there's no need for hacks like with Constants
<whitequark> ML's modules make it natural to implement libraries while exporting a static API, so there's no need to do devirtualization to inline basic functions
<whitequark> as a side benefit, you also get faster code, because there are much fewer loads everywhere
<whitequark> ML's iteration, which is just tail recursion, makes loop unrolling fall naturally out of inlining functions on request, and it would integrate beautifully with our iodelay mechanism in the function signature
<whitequark> so the unroller would be easier to implement *and* more powerful in the range of expressions it can recognize
<whitequark> ML's static type system is well amenable to addition of regions, like in MLton or MLKit, which makes it easy to perform automatic memory management with realtime constraints and predictable deallocation time
<whitequark> oh, and this is how you write to a file. http://hastebin.com/kopamotedi.mel
<whitequark> I don't even mention the parametric types, because ARTIQ Python already has half of those anyway, except without all the really good parts (sum types), since those are incompatible with OOP
<whitequark> ML is what computing *ought* to use for the last 30 years, instead of shitty worthless hacks like C
<sb0> well, that file IO is not pure FP, is it?
<whitequark> it's not
<whitequark> pure FP's usefulness has a very limited domain. I'd never suggest it
<larsc> if only there was a FP language that would result in legible code ;)
<whitequark> larsc: if only there was a dynamic language that would result in a code not filled with trivial bugs to the brim
<whitequark> let's base our language around three concepts (mutation, indirection, and dynamic typing), all of which humans are really bad at analyzing! surely this will result in high-quality software
<whitequark> tell me this is less readable than our python compiler or w/e
<whitequark> or I dunno, a sparse conditional constant propagation pass, if you'd like something more complex https://github.com/whitequark/foundry/blob/master/src/transforms/constant_folding.ml
<larsc> you always end up with these let x = ... in let y = ... in let z = ... and so on
<whitequark> and?
<larsc> and there are no visual clues where the scoping ends
<whitequark> oh, yeah, that's perhaps the worst part of ML
<larsc> when I read code I want to be able to follow the flow without having to constantly backtrace things
<whitequark> the really annoying part is not really `let`s but how you have to randomly wrap `match` in `begin..end`, and if you don't, it produces pretty obscure errors
<whitequark> I agree with you, but also it's beside the point
<whitequark> everything I said above equally applies to, for example, Rust, and it uses braces to delimit scoping
<larsc> rust seems to have the right amount of non-academicness
<larsc> but is not purley functional
<larsc> is it?
<whitequark> I don't think Rust can be classified as functional at all
<whitequark> it does not rely on algorithms based on inductive reasoning
<whitequark> Rust's semantics is still wholesale borrowed from ML
<whitequark> first-class functions, sum/product types, parametric types, regions, modules, implicits (which they call traits), pattern matching, embracing immutability... all of which, excluding 1st class functions, were either introduced or popularized by the ML family
<whitequark> from C++ it borrows bits of syntax, pointers, and specialization of generics
<whitequark> Rust's macro system is weird; the closest thing I remember is Honu, an experimental proposal for JavaScript
<whitequark> I think that's... all?
<whitequark> oh yeah, Rust's deterministic destruction is based on linear types, which also appeared in an ML variant.
<whitequark> sb0: will it lead to any problems if I generate byte stores in the compiler?
<whitequark> i.e. are they anyhow slower on LM32/OR1K?
<sb0> any store (32/16/8-bit) completes in the same time
<whitequark> ok
rohitksingh has joined #m-labs
<cr1901_modern> whitequark: with C proven to be Turing incomplete, I wonder if it's possible to create a TC language where the representation of datatypes in memory is important. Somehow I think "no". Which would mean that all (modern?) CPUs are effectively a host for a VM or runtime which provides enough abstraction to support a Turing Complete language.
<whitequark> no actual implemented language is turing-complete, since they all have finite memory
<cr1901_modern> That's an implementation detail
<cr1901_modern> C the LANGUAGE is Turing Incomplete
<cr1901_modern> So my question is, are all bare-metal capable languages Turing Incomplete, when going by the spec alone, and the only way around this is either a complex runtime to abstract real memory away (Haskell, OCaml, etc), or simulate a VM that is Turing complete (Forth).
<cr1901_modern> Rust seems to be a happy medium w/ Linear Types (which TIL... pretty cool!)
<whitequark> not only is the question functionally meaningless, the claim that C is not turing-complete is also false
<whitequark> sure, C has fixed-width pointers. you can still map an unbounded amount of memory using fixed-width pointers by making use of OS services
ylamarre has joined #m-labs
<cr1901_modern> I'm not sure how to do that, but I'll take your word for it.
<cr1901_modern> Why do you think the question is functionally meaningless?
<whitequark> man 2 mmap
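The point being made here (fixed-width pointers plus OS services like `mmap` give you storage bounded only by what the OS will hand out) can be sketched in host Python. The `GrowingTape` class is a hypothetical illustration; each chunk is a fixed-size anonymous mapping requested on demand:

```python
import mmap

CHUNK = 4096

class GrowingTape:
    """Sketch: a fixed-width 'address' (chunk index, offset) plus
    OS-provided anonymous mappings (man 2 mmap) gives storage that
    grows as far as the OS allows."""
    def __init__(self):
        self.chunks = []

    def _chunk(self, i):
        while len(self.chunks) <= i:
            # anonymous mapping, requested from the OS on demand
            self.chunks.append(mmap.mmap(-1, CHUNK))
        return self.chunks[i]

    def write(self, addr, byte):
        self._chunk(addr // CHUNK)[addr % CHUNK] = byte

    def read(self, addr):
        return self._chunk(addr // CHUNK)[addr % CHUNK]

tape = GrowingTape()
tape.write(10000, 42)
print(tape.read(10000))  # 42
```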
<whitequark> because not only are the categories you use ("bare-metal") very fuzzily defined, the answer is also completely useless for any practical purpose
<cr1901_modern> bare-metal to me == "Memory contents/address layout at the bit level represents what the CPU sees/puts out as voltages." Sure, the situations where that is useful in personal computing are limited. But for better or worse, there will always be situations where that level of control is needed. So what do we use: asm? C? A stripped-down Haskell/OCaml?
<whitequark> high-speed buses use scrambling, so what you read intentionally has little correlation to the voltages on the bus...
<whitequark> but that's not really the issue here. the issue is that, if you have any hardware in mind, then you do not want or need a turing-complete language. it doesn't matter
<cr1901_modern> hmmm... I suppose that's one purpose of Rust's unsafe. Keep the points where you talk to the hardware to a minimum so you have little surface area to break the abstractions/memory safety that Rust provides
<cr1901_modern> I do wish I understood how Lisp Machines worked, just so I can understand how a CPU that's not tailored to C could be created.
aeris has quit [Read error: Connection reset by peer]
aeris has joined #m-labs
rohitksingh has quit [Ping timeout: 240 seconds]
<ylamarre> cr1901_modern: MIT released much of the documentation linked to those and their processor (CADR machine).
<ylamarre> IIRC, I did send a link about those here.
rohitksingh has joined #m-labs
<cr1901_modern> If you did, I missed it. But I'll look at the logs when I have time
ylamarre has quit [Remote host closed the connection]
ylamarre has joined #m-labs
<ylamarre> Check at the bottom of the page...
<cr1901_modern> Well, that answers my question. Ty for the links
ylamarre has quit [Ping timeout: 276 seconds]
rohitksingh has quit [Quit: Leaving.]
<cr1901_modern> http://www.embedded.com/electronics-blogs/break-points/4441091/Subthreshold-transistors-and-MCUs I'm gonna miss the MSP430/8-bit micros (unless vendors get their acts together) :(
<cr1901_modern> Although ARM micros are fun too, so maybe I won't miss them
aeris- has joined #m-labs
aeris has quit [Ping timeout: 260 seconds]
aeris- has quit [Client Quit]
aeris has joined #m-labs
<GitHub5> [artiq] whitequark pushed 2 new commits to master: http://git.io/vuTwa
<GitHub5> artiq/master 5f68cc6 whitequark: transforms.artiq_ir_generator: handle `raise` in `except:` with `finally:`.
<GitHub5> artiq/master 2e33084 whitequark: transforms.llvm_ir_generator: implement instrumentation for attribute writeback.
FabM_ has joined #m-labs
acathla` has joined #m-labs