<GitHub172>
[artiq] enjoy-digital pushed 2 new commits to phaser: https://git.io/vPltp
<GitHub172>
artiq/phaser e998a98 Florent Kermarrec: phaser/startup: use get_configuration_checksum()
<GitHub172>
artiq/phaser b02a723 Florent Kermarrec: phaser: use 125MHz refclk for jesd
<_florent_>
rjo: ^ can you test that? I think the 500MHz refclk is too high. I tested with a 500MHz refclk on my board and got the same behaviour you had yesterday.
<whitequark>
sb0: I don't understand
<whitequark>
it looks like runtime.rs isn't being installed into site-packages
<whitequark>
but runtime is
<whitequark>
sb0: is there some magic setuptools incantation? but even if yes I can't find its application to runtime...
<whitequark>
oh, MANIFEST
<GitHub93>
[artiq] whitequark pushed 2 new commits to master: https://git.io/vPlOr
<GitHub93>
artiq/master 8eeb6ea whitequark: packaging: include runtime.rs in MANIFEST.
<GitHub93>
artiq/master ef10344 whitequark: runtime: rewrite isr() in Rust.
<_florent_>
rjo: I'm not sure about the limitation, but I did a test and had the same behaviour you had yesterday.
<_florent_>
rjo: it can be worth trying with 125MHz: if it's better we'll investigate the higher frequencies, and if it's not better the problem is elsewhere.
<rjo>
_florent_: will try. it just needs a bit more adapting.
<GitHub151>
artiq/phaser 9b860b2 Robert Jordens: phaser: fix rtio pll inputs
<whitequark>
rjo: about the background RPCs.
<whitequark>
so right now the code that traverses the (possibly rather deep) tree of pointers that is the RPC arguments is on the comms CPU side
<whitequark>
the way #551 is phrased implies that you want these to be moved to the kernel CPU side
<whitequark>
I can do that but then #551 will wait until ksupport is moved to Rust too
<whitequark>
there is another problem here, which is synchronization
<whitequark>
I currently have no idea how to implement a FIFO in a non-cache-coherent AMP system, where the reader doesn't block the writer
<whitequark>
this sounds hard and error-prone.
<rjo>
is your question whether the issue implies that the serialization should be moved from the comms cpu side to the kernel cpu side?
<whitequark>
I think moving serialization to the kernel CPU side would be troublesome, yes.
<whitequark>
so I am asking whether it's in the spec.
kuldeep_ has joined #m-labs
<rjo>
well. the spec may have been affected by physicist fantasies.
<whitequark>
if we move the FIFO to a dedicated hardware buffer then that becomes a question of rust on kernel cpu
<whitequark>
which is not hard but will only take a bit of time.
kuldeep has quit [Ping timeout: 272 seconds]
<whitequark>
but I'm not sure whether that's realistic to implement
<rjo>
making #551 dependent on rust would not be a problem afaict.
<rjo>
just to check: we are talking about kernel-to-host RPCs.
<whitequark>
correct.
<whitequark>
can we move the FIFO to a hardware buffer then? how would that work?
<whitequark>
basically what I am looking for is not dealing with the caches
<rjo>
i am not entirely certain i know how the rpc-through-mailbox stuff currently works. you are saying there is one pointer coming through the mailbox and then the comms cpu serializes everything?
<whitequark>
rpc through mailbox works as follows.
<whitequark>
the mailbox is a peripheral that's just one 32-bit register. it's in an uncached area.
<whitequark>
both before setting it, and after reading it on the other side, all L1 caches are purged
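A minimal sketch of the mailbox handshake described above, assuming a hypothetical MAILBOX_BASE address and hypothetical cache-maintenance helpers; not the actual runtime code:

    // Sketch only: MAILBOX_BASE and the cache helpers are made-up stand-ins.
    use core::ptr;

    const MAILBOX_BASE: usize = 0xd000_0000; // hypothetical uncached register address

    fn flush_l1_caches() { /* platform-specific cache purge */ }
    fn invalidate_l1_caches() { /* platform-specific cache purge */ }

    /// Sender: make the message bytes visible in SDRAM, then publish the pointer.
    unsafe fn mailbox_send(msg: *const u8) {
        flush_l1_caches(); // push dirty lines out before the other CPU looks
        ptr::write_volatile(MAILBOX_BASE as *mut u32, msg as usize as u32);
    }

    /// Receiver: poll the register; on a new value, drop any stale cached data.
    unsafe fn mailbox_receive() -> Option<*const u8> {
        let word = ptr::read_volatile(MAILBOX_BASE as *const u32);
        if word == 0 {
            None
        } else {
            invalidate_l1_caches(); // the payload was written by the other CPU
            Some(word as usize as *const u8)
        }
    }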
<rjo>
sidenote: afaics serialization is actually not a very hungry thing. and not the thing we need to optimize here.
<whitequark>
and yes, serialization happens on the comms cpu
<whitequark>
hm.
<rjo>
but what gets passed?
<whitequark>
optimizing serialization is an independent problem
<rjo>
how does the comms cpu know what and how to serialize?
<rjo>
magic format strings?
<whitequark>
to my understanding, the reason for serializing on the kernel CPU is to have latency bounded just by the serialization
<whitequark>
what gets passed: a struct with RPC number, RPC "tag" and a pointer to an array of arguments
<whitequark>
the tag is a serialization of the complete type of the arguments.
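Roughly, what crosses the mailbox for an RPC looks like this (field names are illustrative, not the runtime's exact definitions):

    // Illustrative only; the real struct lives in the runtime and differs in detail.
    #[repr(C)]
    struct RpcMessage {
        service: u32,                // which host-side RPC to invoke
        tag: *const u8,              // type tag: a short string encoding the complete
                                     // argument types, so the receiver knows how to
                                     // walk the pointed-to data
        arguments: *const *const u8, // one pointer per argument; the pointees may be
                                     // deep structures still living in kernel memory
    }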
<rjo>
yes. that's correct. serialization on the kernel cpu also (to me at least) would make the fifo simpler ("tcp wire format") and allow the kernel cpu to continue its work once serialization is done, without bothering about caches and dirty data.
<whitequark>
right now serialization is not exceptionally fast in the details (i.e. not microoptimized), but it doesn't have inefficiencies in the large (e.g. it doesn't allocate or traverse anything supralinearly)
<sb0>
whitequark, if you have two pointers in the mailbox instead of just one, I think you can easily make a FIFO with storage in RAM.
<whitequark>
rjo: you can't not bother about caches
<whitequark>
at least, you have to flush the cache after serializing every RPC
<rjo>
yes.
<whitequark>
how does that make the FIFO simpler?
<rjo>
or have some non-cached inter-cpu DMA arena.
<whitequark>
non-cached arena would mean that every write to that arena does a roundtrip to SDRAM, right?
<whitequark>
that sounds very bad
<rjo>
then the comms cpu would not have to bother with the inner structure of the rpc.
<whitequark>
it makes no difference who bothers with the inner structure of the RPC
<whitequark>
well, complexity-wise
<whitequark>
it will be the same Rust code but running on a different core
<rjo>
but only once you have serialized can the kernel cpu mess with the original data again.
<sb0>
whitequark, the FIFO is only wanted for kernel CPU -> comms CPU
<sb0>
the FIFO would contain messages that are all of a certain size, say, 1KB
<rjo>
i.e. background_rpc(array); array[7] = 9;
<whitequark>
rjo: or even returning from the current function, because the array might have been allocated in the current frame.
<sb0>
serialization fitting into one message is a condition for using background RPCs. otherwise it falls back to a blocking behavior.
<rjo>
yes. isn't this a reason to do the serialization on the kernel cpu?
<sb0>
then the "mailbox" simply contains produce/consume pointers/indices that address that message FIFO.
<sb0>
write to the FIFO: fill in empty message slot in SDRAM (note that the cache is write-through), then increment produce pointer
<whitequark>
yes, with this restriction it is not hard to implement
<sb0>
read from the FIFO: invalidate caches, process messages, increment consume pointer
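A single-producer/single-consumer sketch of what sb0 describes, assuming fixed-size slots at an agreed SDRAM address, a write-through data cache on the kernel CPU, and hypothetical helpers for the cache and the produce/consume indices; not actual ARTIQ code:

    use core::ptr;

    const SLOT_SIZE: usize = 1024;          // "say, 1KB"
    const SLOT_COUNT: usize = 16;
    const SLOTS_BASE: usize = 0x4400_0000;  // hypothetical SDRAM region both CPUs agree on

    // Hypothetical: indices kept in uncached memory (or exchanged via the mailbox)
    // so neither CPU ever reads a stale index.
    fn read_produce_index() -> usize { unimplemented!() }
    fn read_consume_index() -> usize { unimplemented!() }
    fn write_produce_index(_i: usize) { unimplemented!() }
    fn write_consume_index(_i: usize) { unimplemented!() }
    fn invalidate_l1_dcache() { /* platform specific */ }

    fn slot(index: usize) -> *mut u8 {
        (SLOTS_BASE + index * SLOT_SIZE) as *mut u8
    }

    /// Kernel CPU: fill an empty slot, then increment the produce index.
    /// Returns false when the ring is full; the caller then falls back to a
    /// blocking RPC, so the reader never blocks the writer.
    unsafe fn fifo_push(payload: &[u8]) -> bool {
        assert!(payload.len() <= SLOT_SIZE);
        let produce = read_produce_index();
        if (produce + 1) % SLOT_COUNT == read_consume_index() {
            return false; // full
        }
        // Write-through cache: the copy lands in SDRAM without an explicit flush.
        ptr::copy_nonoverlapping(payload.as_ptr(), slot(produce), payload.len());
        write_produce_index((produce + 1) % SLOT_COUNT);
        true
    }

    /// Comms CPU: invalidate caches, process the message, increment the consume index.
    unsafe fn fifo_pop(out: &mut [u8; SLOT_SIZE]) -> bool {
        let consume = read_consume_index();
        if consume == read_produce_index() {
            return false; // empty
        }
        invalidate_l1_dcache(); // drop any stale cached view of the slot
        ptr::copy_nonoverlapping(slot(consume), out.as_mut_ptr(), SLOT_SIZE);
        write_consume_index((consume + 1) % SLOT_COUNT);
        true
    }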
<whitequark>
then that is only blocked on Rust
<sb0>
when serialization doesn't fit, put a pointer to the serialization into a message, and wait for an ack from the comms CPU.
<whitequark>
naw
<whitequark>
just fall back to the non-background RPC path
<sb0>
which is what it does
<whitequark>
I would serialize directly into the message slot
<sb0>
but how do you know in advance if it fits?
<whitequark>
I wouldn't
<whitequark>
I'll optimistically assume that it does and bail out if it doesn't
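What whitequark is describing, as a sketch: serialize straight into the message slot through a bounds-checked cursor, and treat overflow as the signal to fall back to the blocking path. The SlotWriter type is hypothetical:

    /// Hypothetical cursor over one message slot.
    struct SlotWriter<'a> {
        slot: &'a mut [u8],
        position: usize,
    }

    impl<'a> SlotWriter<'a> {
        /// Append bytes; Err means the RPC does not fit in one slot and the
        /// caller should fall back to the non-background RPC path.
        fn write(&mut self, bytes: &[u8]) -> Result<(), ()> {
            let end = self.position + bytes.len();
            if end > self.slot.len() {
                return Err(()); // optimism did not pay off
            }
            self.slot[self.position..end].copy_from_slice(bytes);
            self.position = end;
            Ok(())
        }
    }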
<whitequark>
with your scheme I will need some sort of temporary buffer for serialization
<whitequark>
a) how large?
<whitequark>
there is currently no limit on RPC size
<rjo>
even serializing twice for those cases would be fine by me.
<whitequark>
you can just as well transmit the entire main RAM because the comms CPU doesn't store the serialized data anywhere, it directly transmits that
<whitequark>
b) that means one extra copy on the fast path
<whitequark>
which seems weird.
<rjo>
but still, also in the fallback case the kernel cpu should not have to wait for the rpc return from the host.
<whitequark>
sure
<whitequark>
why would it
<whitequark>
there's no return
<whitequark>
we will need some sort of performance counter that tells you when too many background RPCs fall back.
<rjo>
yes. so the only features that we would need are the RAM message slots + a ringbuffer for their handles. and rpcs without return.
<rjo>
or maybe the rpcs could be "partial" and then assembled on the host.
<whitequark>
"partial" ?
<rjo>
then the parts would always fit.
<rjo>
just fragment them.
<rjo>
without the comms cpu knowing about it.
<whitequark>
that's extremely complex
<rjo>
start serializing until you have filled a fragment, send, start the next fragment. have the host assemble the fragments into one rpc.
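A rough illustration of the fragmentation rjo has in mind, shown over an already-serialized buffer for brevity (the real proposal would serialize incrementally into each fragment, and whitequark argues below that the edge cases make this more involved than it looks):

    /// Hypothetical framing: the host concatenates Partial payloads until Last.
    enum Fragment<'a> {
        Partial(&'a [u8]),
        Last(&'a [u8]),
    }

    fn fragment(rpc: &[u8], slot_size: usize) -> Vec<Fragment<'_>> {
        let chunks: Vec<&[u8]> = rpc.chunks(slot_size).collect();
        let count = chunks.len();
        chunks
            .into_iter()
            .enumerate()
            .map(|(i, chunk)| {
                if i + 1 == count {
                    Fragment::Last(chunk)    // completes the RPC
                } else {
                    Fragment::Partial(chunk) // more fragments follow
                }
            })
            .collect()
    }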
<whitequark>
I also need to put the final length of transfer somewhere, handle the case where the RPC is larger than the entire buffer, ...
<sb0>
as I understand, background RPCs would be used essentially to transmit small amounts of data each time
<rjo>
sb0: why? i would use them to transmit large amounts as well.
<whitequark>
rjo: that doesn't bring you any benefit, assuming the large amount >> fifo size
<whitequark>
you'll just block on fifo instead of blocking on comms CPU request
<sb0>
in what case? why not transmit your large buffer with several smaller calls?
<rjo>
whitequark: can't you just do that when you hand the fragment over to the comms cpu? fill a fragment, hand "partial RPC" message over to the comms cpu, fill next fragment, hand over "full RPC" message.
<rjo>
when the buffer is full you have to stall anyway.
<whitequark>
rjo: i hate the word "just".
<rjo>
whitequark: ha.
<whitequark>
no, i can't "just" do that. there's a zillion edge cases to handle
<rjo>
ok. without the "just".
<sb0>
also, the current RPC is particularly inefficient with small data, but with large data it gets better
<rjo>
whitequark: fine. i'll leave it to you ;)
<whitequark>
can i write a fragmentation engine for RPCs? sure. is it worth the hassle? I really doubt it
<whitequark>
and it will take a lot of time for sure
<rjo>
well one thing this would help with is that the comms cpu could be made busy with txing while the kernel cpu could be busy with serializing the next fragment.
<rjo>
but anyway. i am happy to leave it to you.
<whitequark>
if we would go the route that you want then I think we should just ditch RPCs in the main TCP stream entirely
<whitequark>
add another channel that's dedicated to kernel CPU, have it send UDP datagrams with one fragment per datagram
<whitequark>
otherwise we have a whole lot of coordination overhead (TCP overhead plus core communication overhead) for no good reason
<whitequark>
so that would be a (logical) FIFO between the host machine and the kernel CPU
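A host-side sketch of the dedicated background-RPC channel whitequark floats: one UDP datagram per fragment, reassembled on the host. The port number and the framing (service id plus a "last fragment" flag) are made up for illustration:

    use std::collections::HashMap;
    use std::net::UdpSocket;

    fn receive_background_rpcs() -> std::io::Result<()> {
        let socket = UdpSocket::bind("0.0.0.0:1382")?; // hypothetical port
        let mut pending: HashMap<u32, Vec<u8>> = HashMap::new();
        let mut datagram = [0u8; 1500];
        loop {
            let (len, _peer) = socket.recv_from(&mut datagram)?;
            if len < 5 { continue; }
            // Assumed framing: 4-byte service id, 1-byte "last fragment" flag, payload.
            let service = u32::from_le_bytes(datagram[0..4].try_into().unwrap());
            let last = datagram[4] != 0;
            let buffer = pending.entry(service).or_default();
            buffer.extend_from_slice(&datagram[5..len]);
            if last {
                let rpc = pending.remove(&service).unwrap();
                println!("background RPC for service {}: {} bytes", service, rpc.len());
            }
        }
    }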
<whitequark>
ok, then I will start converting ksupport to Rust monday
<whitequark>
AssertionError: 0.046450169200000009 not less than 0.015
<whitequark>
that's quite interesting.
<rjo>
over ssh?
<whitequark>
no
<whitequark>
that's on the buildbot, in low latency mode
<whitequark>
I think the problem is I'm calling into lwip too much instead of buffering.
<whitequark>
I assumed that, when I am using the |MORE flag, it will implement Nagle properly
<rjo>
but 30ms is a lot.
<whitequark>
i.e. just buffer the writes somewhere.
<whitequark>
oh, hm: TCP_WRITE_FLAG_MORE (0x02) for TCP connection, PSH flag will not be set on last segment sent
<whitequark>
what does that do anyway?..
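The buffering whitequark is alluding to, as a sketch: coalesce small writes locally and hand the stack one larger write per batch, with the actual lwip call abstracted behind a flush callback (a stand-in, not lwip's API):

    struct BufferedWriter<F: FnMut(&[u8])> {
        buffer: Vec<u8>,
        limit: usize, // flush once this much data has accumulated
        flush_fn: F,  // stand-in for whatever pushes bytes into the TCP stack
    }

    impl<F: FnMut(&[u8])> BufferedWriter<F> {
        fn new(limit: usize, flush_fn: F) -> Self {
            BufferedWriter { buffer: Vec::with_capacity(limit), limit, flush_fn }
        }

        fn write(&mut self, bytes: &[u8]) {
            self.buffer.extend_from_slice(bytes);
            if self.buffer.len() >= self.limit {
                self.flush();
            }
        }

        fn flush(&mut self) {
            if !self.buffer.is_empty() {
                (self.flush_fn)(&self.buffer); // one call into the stack per batch
                self.buffer.clear();
            }
        }
    }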
<rjo>
aaaaah. i think i crashed the scope.
<whitequark>
you did. I've rebooted it.
<rjo>
whitequark: o you are there?
<whitequark>
yes.
<whitequark>
I need to disassemble one of my HPLC pumps and take photos of the part that's broken
<rjo>
could you just describe whether there was anything interesting in the traces? or was it all boring?
<whitequark>
there wasn't anything at all on the screen.
<whitequark>
and it was in the "WAIT" state
<rjo>
whitequark: ok. if you have a minute, could you fiddle with it and see whether there is something coming out of the DACs? should be ~100 MHz oscillatory stuff at maybe a volt amplitude.
<whitequark>
flat line on both channels
<whitequark>
or do you mean I should plug it somewhere else?
<rjo>
any of j2 j4 j5 j17 on the ad9154-fmx-ebz
<whitequark>
ch1 is connected to j4 and there's nothing on it
<whitequark>
sb0: by the way. do you know where the blue screwdriver is in the lab?
<whitequark>
or any screwdriver
<rjo>
whitequark: can i bother you with another reboot of the scope?
<whitequark>
rjo: sec
<whitequark>
done
<rjo>
whitequark: thx
rohitksingh_wor1 has quit [Read error: Connection reset by peer]
<sb0>
whitequark, the translucent one? in one of the top drawers in the cabinet near the argon bottle
<whitequark>
thx
<whitequark>
sb0: wow, what an inefficient use of space
<whitequark>
oh well.
<rjo>
_florent_: the SYNC machinery is busy all the time. does it get out of that state for you?
<rjo>
larsc: is it a problem if sysref has a 50% duty cycle (and not something much smaller like in your drawing)?
<sb0>
e.g. if you have submodules that contain the elastic buffer clocked on one side by "gtx_clock" and on the other by "user_clock", the ClockDomain "gtx_clock", and the transceiver
<sb0>
then CD resolution will rename all those gtx_clocks. the only downside is you cannot use them outside the submodules.
<sb0>
but you can use "user_clock" just fine, since it doesn't have a ClockDomain in the submodules.
<_florent_>
ok thanks, I'm trying to get the QPLL working now, I'll look at that after
<sb0>
this system btw does not force you to put everything (transceiver/elastic buffer/etc.) into one single submodule; the CD renaming happens at the first module that has submodules defining the same clock domain
hozer has joined #m-labs
<sb0>
_florent_, why is the "produce square wave" signal 2 bits wide and called "pattern_config"?
<sb0>
what other patterns are you planning to have?
<sb0>
does GTXChannelPLL really need to be a class?
<sb0>
it seems OOP just makes things worse here. you'd be better off with pure function return values.
<_florent_>
sb0: yes, I'm not sure I'll add other patterns, I'll probably rename this
<_florent_>
sb0: for the GTXChannelPLL, it's to be similar to GTXQuadPLL
<sb0>
hm, ok, but you should probably use functions there too
<_florent_>
but I need at least a GTXQuadPLL for the GTXE2_COMMON instance
<sb0>
or, you might want to use a class as a "bag of related pure functions"
<sb0>
but I don't see why you'd pass what should be function return values using object attributes
FabM has quit [Quit: ChatZilla 0.9.92 [Firefox 45.3.0/20160802213348]]
<sb0>
is the compute_* family of functions called outside the constructor?
<_florent_>
GTXChannelPLL and GTXQuadPLL also have refclk/lock signals that are connected differently to the GTXE2_CHANNEL, depending on whether it's a CPLL or a QPLL
mumptai has joined #m-labs
<sb0>
yes I see that
<sb0>
I'm just trying to avoid a code style where there is unwarranted state and side effects
<_florent_>
yes no problem, if you see a better solution I'm ok to implement it
<sb0>
I'd make compute_* static methods
<sb0>
n1/n2/m become simply local variables of compute_config, and arguments of compute_freq/compute_linerate when they need them
<sb0>
if a class method doesn't really need self, like compute*, you should make it @staticmethod and it can be called externally as GTXChannelPLL.compute_* without altering the state of any object
<_florent_>
ok, but are you ok with keeping the two GTXChannelPLL and GTXQuadPLL classes?
<_florent_>
or do you want to do that in GTXTransmitter?
<_florent_>
bbl
<sb0>
those two classes are fine afaict
<sb0>
like I said, using classes as bags of related pure functions is fine
<sb0>
what I was complaining about was the excessive state and side effects
<_florent_>
ok no problem, I'll change that then
<sb0>
thanks!
<rjo>
_florent_: so does SYNC_BUSY stay high for you as well?
<_florent_>
rjo: it seems yes
<rjo>
_florent_: and how about SYNC_LOCK?
<rjo>
_florent_: in the datasheet it says SYNC_LOCK is required to proceed.