sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
Gurty has quit [Quit: Kooll ~o~ datalove <3³\infty]
Gurty has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
<stekern> whitequark: or1k doesn't handle double fault in a special way, which in some cases makes debugging hard
<sb0> whitequark, there are no fire-and-forget RPCs
<sb0> so that would need to be implemented too... delays delays delays delays
<sb0> my compiler did the simplest thing: add a section at the end with RPCs to setattr() for all attributes that could be modified (i.e. that appear on the LHS of an assignment). please do that, or whatever takes the least time; you may add your mechanism to a list of "potential future compiler improvements", along with user-friendly exceptions
<sb0> also that mechanism needs to be smarter, e.g. ttl.pulse() is a performance-critical function that is a bit on the slow side already, and it would do two serializations, plus generate a lot of traffic anyway
<sb0> whitequark, re. "for i in range(self.npulses)" not being interleaved in test_loopback_count. my compiler did constant propagation before unrolling to deal with such problems.
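The "writeback" mechanism sb0 describes above rests on finding every attribute that appears on the LHS of an assignment. A minimal host-side sketch of that analysis, using Python's `ast` module (the `mutated_attributes` helper and the sample kernel are hypothetical, for illustration only):

```python
import ast

def mutated_attributes(source):
    """Collect the names of attributes assigned to `self` anywhere in
    the given kernel source; only these need a setattr() RPC at the end."""
    mutated = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if (isinstance(target, ast.Attribute)
                    and isinstance(target.value, ast.Name)
                    and target.value.id == "self"):
                mutated.add(target.attr)
    return mutated

kernel_src = """
def run(self):
    self.count += 1
    self.last_timestamp = now_mu()
    x = self.readonly_param
"""
# only the attributes written to get a writeback RPC
print(sorted(mutated_attributes(kernel_src)))  # ['count', 'last_timestamp']
```

Attributes that are only read (`self.readonly_param` here) are skipped, which keeps the writeback section, and hence the RPC traffic, minimal.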
<cr1901_modern> Tbh, I thought double-fault was an x86-specific thing
<sb0> wtf, fbo.gov name resolution fails from two HK ISPs and works from Germany and US
<sb0> and adding the IP to /etc/hosts makes the site accessible here
<sb0> https://physics.aps.org/featured-article-pdf/10.1103/PhysRevLett.115.260602 "On-Chip Maxwell’s Demon as an Information-Powered Refrigerator"
<whitequark> sb0: I already did user-friendly exceptions
<whitequark> ok, yes, I see the issue with ttl.pulse
<whitequark> sb0: with regards to constant propagation, please read the entirety of https://github.com/m-labs/artiq/issues/193 to understand why it is a bad idea.
<sb0> hmm
<sb0> re. #193, what about with Constants(t2=t*100) as const: pulse(const["t2"])?
<sb0> that's enforceable on host
<sb0> const.t2 even, that's shorter
<whitequark> oh yeah that totally works
<whitequark> good idea
<whitequark> very easy to implement too
<whitequark> sb0: note that that proposal alone doesn't solve the self.npulses problem
<whitequark> because within the context of run, self is not known
<sb0> yes... another issue with it is some perverse user may still do "const = ..." within the context manager block, and host python will take it
<whitequark> oh, that's not a problem at all
<sb0> yes, it's not
<sb0> it's easy to print a clear error for this
<whitequark> yes
<whitequark> `with Constants` is clearly a special construct, it's allowed to be a little magical
<whitequark> and there's no unbounded inlining or something like that
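A host-side sketch of what the `with Constants(...)` construct proposed above could look like: values are fixed on entry, and the "perverse user" rebinding a constant inside the block gets a clear error. This is a hypothetical illustration, not the eventual ARTIQ API:

```python
class Constants:
    """Hypothetical sketch: bindings are frozen at construction,
    and any rebinding attempt raises a clear error."""
    def __init__(self, **values):
        object.__setattr__(self, "_frozen", False)
        for name, value in values.items():
            object.__setattr__(self, name, value)
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(
                "constant %r cannot be rebound inside the block" % name)
        object.__setattr__(self, name, value)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

t = 5
with Constants(t2=t * 100) as const:
    print(const.t2)  # 500
```

Because `with` is purely lexical, the compiler only has to inspect the block body to know that `const.t2` can be folded.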
<whitequark> speaking of npulses, the second part that's necessary is https://github.com/m-labs/artiq/issues/191
<whitequark> well, a more general issue than #191
<whitequark> the access to `self.npulses` has to go into the delay signature of run, so that it will be inlined and `self` exposed to the interleaver
<sb0> self.npulses = freeze(...) on host...
<sb0> then the compiler treats it as int literal
<whitequark> no, that will not help in any way
<whitequark> I don't know what *self* is
<whitequark> `freeze` is the same as `with Constants`
<sb0> hm, i see... objects on the device are a pain
<whitequark> yes
<sb0> having a global pool of constants isn't great either, because many constants are clearly e.g. local to one driver
<whitequark> I have just now realized that even if I put the `self.npulses` access into the signature and make inlining it *possible*, there will still be more issues
<whitequark> hm
<whitequark> well, there's a simple solution, of course
<whitequark> use a closure!
<sb0> how?
<whitequark> the upvalues captured by a closure are basically a pool of constants that are local to this function
<whitequark> I think they'll even work with no modification to the current compiler
<sb0> how does it work in the case of that self.npulses test?
<whitequark> well, more like http://hastebin.com/ayekawukid.py
<whitequark> we can combine it like this so that the decorator can grab the core device reference http://hastebin.com/zapimimopo.py
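The hastebin links above are no longer available; the closure trick they illustrated can be sketched as follows. The captured upvalue acts as a constant pool local to the function, so the loop trip count is known without the compiler knowing anything about `self` (names here are hypothetical):

```python
def make_run(npulses):
    # `npulses` is captured as an upvalue: from the compiler's point of
    # view it is a per-closure constant, so the loop below has a known
    # trip count and can be unrolled/interleaved.
    def run():
        total = 0
        for _ in range(npulses):
            total += 1          # stand-in for ttl.pulse() etc.
        return total
    return run

run = make_run(10)   # e.g. built in the experiment's prepare stage
print(run())  # 10
```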
<sb0> that will work for this example, but I'm thinking of another use case where one would like to apply a constant delay, stored in the driver, to a given TTL channel
<whitequark> and I think we can put Constants on the host, too http://hastebin.com/ovimuyujap.py
<whitequark> hm
<whitequark> yeah, that's a problem.
<whitequark> I mean--it's not just a language design problem, there are perfectly valid cases where it's statically impossible to solve
<whitequark> like if you do do_with_delay(self.driver1 if cond else self.driver2)
<sb0> well we can put that delay in gateware ...
<whitequark> how?
<whitequark> the interleaver has to see every delay
<sb0> registers in the RTIO core that add some value to the timestamp exposed to the CPU for each channel
<whitequark> how would that help? the CPU can already get the delay out of the object directly
<whitequark> the issue is that the interleaver can't
<sb0> right
<sb0> well in that case it's not so much of a problem indeed, as we only need to be monotonic within a channel
<whitequark> unfortunately, this makes things harder
<sb0> insert the delay at the syscall. ttl_set(timestamp+delay, ...)
<sb0> then the interleaver doesn't know about it
<sb0> but optimizers can do constant propagation
<whitequark> I think that will break if you try to read now()
<sb0> could a Constants be passed as function call argument?
<sb0> I think there would be a need for some global pool of constants
<sb0> that can possibly be put into delay()
<whitequark> I don't think that will be of any help
<sb0> it will: with the closure trick, you can build the constant pool and then pass it around
<whitequark> the knowledge that "any of these numbers" can be put into a delay "somewhere inside the callstack" at this point does not allow the interleaver to do anything useful
<whitequark> once you start passing around Constants it's just like any other object
<whitequark> well, it's slightly better, because you can't mutate it
<whitequark> but this is not the issue we are facing yet
<whitequark> the benefit of `with Constants` is that `with` is purely lexical. it restricts the scope the analysis has to examine
<sb0> you can put restrictions on how Constants can be passed around. putting it into a function call argument is OK, anything else isn't
<whitequark> so, how will this help with the "constant delay stored in the driver" issue?
<sb0> actually, maybe it makes sense to apply that restriction to all objects
<sb0> then you always know at compile time what self is
<whitequark> nope, a function can be called with two different objects
<whitequark> and this call may be threaded arbitrarily deep in the call stack
<whitequark> I mean, putting `self.npulses` in the delay expression in the signature will have the same effect without being unnecessarily restrictive
<sb0> <whitequark> I think that will break if you try to read now() <<< how?
<whitequark> because now() will sometimes return values that are less than the channel furthest into the timeline
<sb0> how does this cause a problem?
<sb0> only the gateware/runtime sees that value
<whitequark> you'll get an RTIOSequenceError, no?
<sb0> no, you get RTIOSequenceError when you are non-monotonic within a single channel
<sb0> but all writes to the channel would get the same delay, so it stays monotonic
<sb0> there is actually one problem with synching, but that can be solved within the driver
<whitequark> so is it more like ttl_set(timestamp, ...); add_this_delay_to_every_write_to_this_channel_from_now_on(delay, ...) ?
<sb0> the delay I'm talking about is to compensate for physical latencies in the system (cable lengths, external device reaction times, etc.)
<sb0> the add_this_delay_to_every_write_to_this_channel_from_now_on is typically done once and for all at core device boot-up
<whitequark> ahhh yes I see, so basically you have a configuration in the channel that says that all pulses should be extended by X ms?
<sb0> yes
<whitequark> yeah, absolutely, put it in the driver itself, and don't expose it to the interleaver
<whitequark> that's the best solution
<sb0> or that's the plan, it's not done yet
<whitequark> there's no reason the interleaver should care about it
<sb0> yes, but better if figuring out the delay doesn't involve a lot of pointer-chasing
<sb0> at runtime
<sb0> but I guess LLVM & co take care of that?
<sb0> otherwise, putting it in gateware is also an option
<whitequark> it's one load and one add, is that so bad?
<sb0> should be ok
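The conclusion above (keep the latency compensation inside the driver, add it at the lowest level, never expose it to the interleaver) can be sketched like this. `TTLOut`, `pulse_mu`, and the event list are hypothetical stand-ins, not the real ARTIQ driver API:

```python
class TTLOut:
    """Sketch: per-channel latency compensation (cable length, external
    device reaction time) stored in the driver and applied at the
    lowest level, invisible to the interleaver."""
    def __init__(self, channel, latency_mu=0):
        self.channel = channel
        self.latency_mu = latency_mu   # calibration constant, set at boot
        self.events = []               # stand-in for the RTIO FIFO

    def _rtio_set(self, timestamp_mu, value):
        # one load and one add per event; every write to this channel
        # is shifted equally, so timestamps stay monotonic per channel
        self.events.append((timestamp_mu + self.latency_mu, value))

    def pulse_mu(self, t_mu, duration_mu):
        self._rtio_set(t_mu, 1)
        self._rtio_set(t_mu + duration_mu, 0)

ttl = TTLOut(channel=0, latency_mu=25)
ttl.pulse_mu(1000, 100)
print(ttl.events)  # [(1025, 1), (1125, 0)]
```

Since both edges of the pulse receive the same offset, no RTIOSequenceError can be introduced by the compensation itself.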
<sb0> another thing though
<sb0> it's again pretty hard to extract information about that, but AFAIK a lot of experiments involve a number of delays (e.g. qubit flipping time) that need to be periodically recalibrated, written into the datasets, and then read and used by other experiments
<sb0> I think there can be many such delays, and this will become worse as the number of qubits increases
<whitequark> there's only one global dataset though, right?
<sb0> yeah
<whitequark> that makes it easy
<whitequark> well, for instance, we could add a get_dataset builtin
<whitequark> it's a bit ugly but it will work with no effort
<whitequark> what is hard and what is easy to implement in a static analyzer is often counterintuitive...
<sb0> or maybe a global_constants builtin, it doesn't have to be specific to datasets
<sb0> global_constants would get populated from datasets in Experiment.build()
<sb0> yeah I guess that works. just needs to be careful with name collisions in there...
<sb0> if we have global_constants, is there really a need for with Constants?
<whitequark> oh yes, you need some way to introduce an alias
<whitequark> they're orthogonal
<sb0> maybe global_constants helps with compilation caching too ...
<whitequark> the main issue with compilation caching is all the host objects that can change
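A host-side sketch of the `global_constants` pool proposed above: populated once from datasets in `Experiment.build()`, read-only afterwards, and "careful with name collisions" made concrete by failing loudly. The class and names are hypothetical:

```python
class GlobalConstants:
    """Sketch: a single global pool of constants, filled from the
    datasets once during Experiment.build() and read-only after that."""
    def __init__(self):
        self._values = {}

    def populate(self, dataset):
        for name, value in dataset.items():
            if name in self._values:
                raise KeyError("name collision in global constants: " + name)
            self._values[name] = value

    def __getitem__(self, name):
        return self._values[name]

# periodically recalibrated delays, as stored in the datasets
datasets = {"qubit_flip_time_mu": 1200, "cooling_delay_mu": 5000}
gc = GlobalConstants()
gc.populate(datasets)            # would run in Experiment.build()
print(gc["qubit_flip_time_mu"])  # 1200
```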
<sb0> anyway, i feel that the ultimate solution to those problems is gateware-assisted scheduling and coroutines ...
<sb0> and an asic version of that sounds interesting
mumptai has joined #m-labs
<whitequark> sb0: why would you need an ASIC?
<whitequark> or even gateware assistance
<whitequark> just coroutines are enough
<sb0> context switches are slow, scheduling is slow
<whitequark> context switches are not slow
<whitequark> but that doesn't matter anyway, because when you use coroutines, there are no context switches
<sb0> would you be able to resume a coroutine in ~100ns?
<whitequark> how many OR1K instructions is that?
<sb0> a dozen, if there are no stalls
<whitequark> yes
<whitequark> resuming a coroutine involves a call, load, and an indirect jump
<sb0> and reloading registers from memory
<whitequark> nope
<sb0> well if you put all variables on the stack, you are just displacing the slowness problem
<whitequark> you only have to reload what you have spilled, and given the structure of Python programs, there are few spills between expressions
<whitequark> yes, sort of
<whitequark> let me elaborate on this
<whitequark> if a variable is referenced from an inner closure, you have to put it in memory
<whitequark> if a variable isn't, then it is accessible for LLVM to rearrange, and its liveness analysis usually produces very good results
<whitequark> so... a dozen instructions gives you around nine variables you can load from a spill
<whitequark> which is more than enough for most code
<whitequark> the real problem with coroutines is that our current allocation mechanism is not amenable to concurrent allocations
<sb0> you mean multiple stacks?
<whitequark> not really
<whitequark> if you want to pass data from one coroutine to another, you have to maintain memory safety somehow
<whitequark> and you no longer have the stack frame nesting discipline, which we use now
<whitequark> so if we ever do this, it will require a complete redesign of the runtime and most of the language
<sb0> well the coroutines would not be exposed to the user
<whitequark> variables in the frame that contains `with parallel:` will become coroutine arguments
<whitequark> you're basically doing a CPS transformation of the contents of `with parallel:`
<sb0> also, scheduling is potentially slow as well
<whitequark> saying that all the coroutines are allocated and deallocated at the same time will almost work, except it does not take nested calls into account
<whitequark> where the nested calls can also be suspended
<sb0> the coroutines will need to yield RTIO requests, then the scheduler has to pick the right one, and resume the right coroutine
<whitequark> yes, gateware-assisted scheduling is probably necessary
<whitequark> if you want to do it in 100ns
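The scheduling loop sb0 describes (coroutines yield RTIO requests, the scheduler picks the earliest and resumes the right coroutine) can be sketched with host-Python generators. This is an analogy for the idea only; the names and the `(timestamp, channel, value)` tuple shape are hypothetical, and on the core device this loop is exactly the part that would want gateware assistance:

```python
import heapq

def pulses(channel, start, period, n):
    # each branch of `with parallel:` becomes a coroutine that yields
    # its next RTIO request as (timestamp, channel, value)
    t = start
    for _ in range(n):
        yield (t, channel, 1)
        yield (t + period // 2, channel, 0)
        t += period

def schedule(coroutines):
    """Resume whichever coroutine has the earliest pending event,
    producing one time-sorted event stream (a k-way merge)."""
    heap = []
    for idx, co in enumerate(coroutines):
        event = next(co, None)
        if event is not None:
            heapq.heappush(heap, (event, idx))
    out = []
    while heap:
        event, idx = heapq.heappop(heap)
        out.append(event)
        nxt = next(coroutines[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out

stream = schedule([pulses(0, 0, 100, 2), pulses(1, 10, 100, 2)])
print(stream[:4])  # [(0, 0, 1), (10, 1, 1), (50, 0, 0), (60, 1, 0)]
```

Each generator yields monotonically increasing timestamps, so popping the minimum and refilling from the same generator yields a globally sorted stream.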
<whitequark> Python is nearly the single worst language you could have chosen for coredevice...
<whitequark> not that there was much choice, I guess
<sb0> why?
<whitequark> everything is mutable and relies on side effects
<whitequark> plus there are almost no reliable ways to restrict the language semantics
<whitequark> plus there is almost no extensibility
<sb0> pure functional languages without side effects are a pain to use in virtually every real world scenario
<whitequark> that is not true but also I don't suggest a pure functional language
<whitequark> for one, I'm not aware of any that can satisfy hard realtime constraints
<sb0> even writing to a goddamn file is difficult in FP
<whitequark> have you actually *used* FP languages?
<whitequark> if we based the coredevice language on, say, OCaml, almost all of the interleaving troubles would disappear instantly
<whitequark> ML's `let` construct introduces an immutable binding
<whitequark> so there's no need for hacks like with Constants
<whitequark> ML's modules make it natural to implement libraries while exporting a static API, so there's no need to do devirtualization to inline basic functions
<whitequark> as a side benefit, you also get faster code, because there are much fewer loads everywhere
<whitequark> ML's iteration, which is just tail recursion, makes loop unrolling fall naturally out of inlining functions on request, and it would integrate beautifully with our iodelay mechanism in the function signature
<whitequark> so the unroller would be easier to implement *and* more powerful in the range of expressions it can recognize
<whitequark> ML's static type system is well amenable to addition of regions, like in MLton or MLKit, which makes it easy to perform automatic memory management with realtime constraints and predictable deallocation time
<whitequark> oh, and this is how you write to a file. http://hastebin.com/kopamotedi.mel
<whitequark> I don't even mention the parametric types, because ARTIQ Python already has half of those anyway, except without all the really good parts (sum types), since those are incompatible with OOP
<whitequark> ML is what computing *ought* to use for the last 30 years, instead of shitty worthless hacks like C
<sb0> well, that file IO is not pure FP, is it?
<whitequark> it's not
<whitequark> pure FP's usefulness has a very limited domain. I'd never suggest it
<larsc> if only there was a FP language that would result in legible code ;)
<whitequark> larsc: if only there was a dynamic language that would result in a code not filled with trivial bugs to the brim
<whitequark> let's base our language around three concepts (mutation, indirection, and dynamic typing), all of which humans are really bad at analyzing! surely this will result in high-quality software
<whitequark> tell me this is less readable than our python compiler or w/e
<whitequark> or I dunno, a sparse conditional constant propagation pass, if you'd like something more complex https://github.com/whitequark/foundry/blob/master/src/transforms/constant_folding.ml
<larsc> you always end up with these let x = ... in let y = ... in let z = ... and so on
<whitequark> and?
<larsc> and there are no visual clues where the scoping ends
<whitequark> oh, yeah, that's perhaps the worst part of ML
<larsc> when I read code I want to be able to follow the flow without having to constantly backtrace things
<whitequark> the really annoying part is not really `let`s but how you have to randomly wrap `match` in `begin..end`, and if you don't, it produces pretty obscure errors
<whitequark> I agree with you, but also it's beside the point
<whitequark> everything I said above equally applies to, for example, Rust, and it uses braces to delimit scoping
<larsc> rust seems to have the right amount of non-academicness
<larsc> but is not purley functional
<larsc> is it?
<whitequark> I don't think Rust can be classified as functional at all
<whitequark> it does not rely on algorithms based on inductive reasoning
<whitequark> Rust's semantics is still wholesale borrowed from ML
<whitequark> first-class functions, sum/product types, parametric types, regions, modules, implicits (which they call traits), pattern matching, embracing immutability... all of which, excluding 1st class functions, were either introduced or popularized by the ML family
<whitequark> from C++ it borrows bits of syntax, pointers, and specialization of generics
<whitequark> Rust's macro system is weird; the closest thing I remember is Honu, an experimental proposal for JavaScript
<whitequark> I think that's... all?
<whitequark> oh yeah, Rust's deterministic destruction is based on linear types, which also appeared in an ML variant.
<whitequark> sb0: will it lead to any problems if I generate byte stores in the compiler?
<whitequark> i.e. are they anyhow slower on LM32/OR1K?
<sb0> any store (32/16/8-bit) completes in the same time
<whitequark> ok
rohitksingh has joined #m-labs
<cr1901_modern> whitequark: with C proven to be Turing incomplete, I wonder if it's possible to create a TC language where the representation of datatypes in memory is important. Somehow I think "no". Which would mean that all (modern?) CPUs are effectively a host for a VM or runtime which provides enough abstraction to support a Turing Complete language.
<whitequark> no actual implemented language is turing-complete, since they all have finite memory
<cr1901_modern> That's an implementation detail
<cr1901_modern> C the LANGUAGE is Turing Incomplete
<cr1901_modern> So my question is, are all bare-metal capable languages Turing Incomplete, when going by the spec alone, and the only way around this is either a complex runtime to abstract real memory away (Haskell, OCaml, etc), or simulate a VM that is Turing complete (Forth).
<cr1901_modern> Rust seems to be a happy medium w/ Linear Types (which TIL... pretty cool!)
<whitequark> not only is the question functionally meaningless, the claim that C is not turing-complete is also false
<whitequark> sure, C has fixed-width pointers. you can still map an unbounded amount of memory using fixed-width pointers by making use of OS services
ylamarre has joined #m-labs
<cr1901_modern> I'm not sure how to do that, but I'll take your word for it.
<cr1901_modern> Why do you think the question is functionally meaningless?
<whitequark> man 2 mmap
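The point being made here (fixed-width pointers plus OS services like `mmap` give you storage bounded only by what the OS will hand out) can be sketched in host Python. The `GrowingTape` class is a hypothetical illustration; each chunk is a fixed-size anonymous mapping requested on demand:

```python
import mmap

CHUNK = 4096

class GrowingTape:
    """Sketch: a fixed-width 'address' (chunk index, offset) plus
    OS-provided anonymous mappings (man 2 mmap) gives storage that
    grows as far as the OS allows."""
    def __init__(self):
        self.chunks = []

    def _chunk(self, i):
        while len(self.chunks) <= i:
            # anonymous mapping, requested from the OS on demand
            self.chunks.append(mmap.mmap(-1, CHUNK))
        return self.chunks[i]

    def write(self, addr, byte):
        self._chunk(addr // CHUNK)[addr % CHUNK] = byte

    def read(self, addr):
        return self._chunk(addr // CHUNK)[addr % CHUNK]

tape = GrowingTape()
tape.write(10000, 42)
print(tape.read(10000))  # 42
```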
<whitequark> because not only are the categories you use ("bare-metal") very fuzzily defined, the answer is also completely useless for any practical purpose
<cr1901_modern> bare-metal to me == "Memory contents/address layout at the bit level represents what the CPU sees/puts out as voltages." Sure, the situations where that is useful in personal computing are limited. But for better or worse, there will always be situations where that level of control is needed. So what do we use: asm? C? A stripped-down Haskell/OCaml?
<whitequark> high-speed buses use scrambling, so what you read intentionally has little correlation to the voltages on the bus...
<whitequark> but that's not really the issue here. the issue is that, if you have any hardware in mind, then you do not want or need a turing-complete language. it doesn't matter
<cr1901_modern> hmmm... I suppose that's one purpose of Rust's unsafe. Keep the points where you talk to the hardware to a minimum so you have little surface area to break the abstractions/memory safety that Rust provides
<cr1901_modern> I do wish I understood how Lisp Machines worked, just so I can understand how a CPU that's not tailored to C could be created.
aeris has quit [Read error: Connection reset by peer]
aeris has joined #m-labs
rohitksingh has quit [Ping timeout: 240 seconds]
<ylamarre> cr1901_modern: MIT released much of the documentation linked to those and their processor (CADR machine).
<ylamarre> IIRC, I did send a link about those here.
rohitksingh has joined #m-labs
<cr1901_modern> If you did, I missed it. But I'll look at the logs when I have time
ylamarre has quit [Remote host closed the connection]
ylamarre has joined #m-labs
<ylamarre> Check at the bottom of the page...
<cr1901_modern> Well, that answers my question. Ty for the links
ylamarre has quit [Ping timeout: 276 seconds]
rohitksingh has quit [Quit: Leaving.]
<cr1901_modern> http://www.embedded.com/electronics-blogs/break-points/4441091/Subthreshold-transistors-and-MCUs I'm gonna miss the MSP430/8-bit micros (unless vendors get their acts together) :(
<cr1901_modern> Although ARM micros are fun too, so maybe I won't miss them
aeris- has joined #m-labs
aeris has quit [Ping timeout: 260 seconds]
aeris- has quit [Client Quit]
aeris has joined #m-labs
<GitHub5> [artiq] whitequark pushed 2 new commits to master: http://git.io/vuTwa
<GitHub5> artiq/master 5f68cc6 whitequark: transforms.artiq_ir_generator: handle `raise` in `except:` with `finally:`.
<GitHub5> artiq/master 2e33084 whitequark: transforms.llvm_ir_generator: implement instrumentation for attribute writeback.
FabM_ has joined #m-labs
acathla` has joined #m-labs