sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
Ultrasauce has joined #m-labs
FabM has quit [Ping timeout: 255 seconds]
FabM has joined #m-labs
<GitHub86>
[artiq] sbourdeauducq commented on issue #748: @jbqubit Regarding TTL/SAWG latency matching, you are welcome to offer a constructive comment and/or funding in #40. Sure, it is a feature we think is desirable, otherwise #40 would not be open. https://github.com/m-labs/artiq/issues/748#issuecomment-307987938
<GitHub52>
[artiq] jordens commented on issue #748: Please read the current documentation of SAWG (again). Are you satisfied that you have understood the way the DDS/DUC and the phase accumulator clear work? Especially in the light of #744 and #745... https://github.com/m-labs/artiq/issues/748#issuecomment-308009353
<hartytp>
rjo, sb0: in our discussions about the servo, one thing that's come up a few times is resource usage on the Kasli XC7A50T FPGA.
<hartytp>
I get the motivation for keeping Kasli simple/cheap, but if resource usage is likely to be an issue, should we consider going for a 100T or bigger?
<hartytp>
Is the cost difference that large?
<hartytp>
If we're worried about the FPGA for the (pretty basic) servo, then what about things like the proposed Camera-link board, which will have to do some image processing? Won't that be much worse?
<rjo>
hartytp: greg gets the 50T at a steep discount. there is a pin compatible 75T as well. for the 100T we'd have to spin another board.
<hartytp>
How big an FPGA would you have to have to not worry about the servo?
<rjo>
hartytp: "some image processing", i.e. ROI, binning as described on the wiki is pretty simple in terms of logic. i expect that to be significantly smaller than the servo.
<sb0>
I wouldn't worry too much about resource usage. also RTIO can be optimized...
<hartytp>
From the quote, it seemed that worries about resource usage were driving the development costs for the servo up
<rjo>
hartytp: my guess would be that with a 50T the chances of it working "fine" with an acceptable amount of optimizing are >80%, with a 75T 95% and with 100T 98%
<hartytp>
quote/email chain
<rjo>
hartytp: but that's really just a guess.
<hartytp>
so, is the cost difference really that high? It would be a shame if we find that we can't do projects like the servo because Kasli's FPGA is too small
<hartytp>
Can Greg get a good discount on the FGG484?
<hartytp>
That has the 50/75/100
<sb0>
I don't think the 50T would really limit it, may just take a bit more work
<sb0>
also the kasli should fit the low-end experiments, since we don't support the pipistrello anymore
<rjo>
hartytp: one factor is the number of DSPs. that already asks for the sequencers and quite a bit of optimization as even the 100T does not have enough multipliers to do it without sharing.
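Why multiplier count forces sharing can be made concrete with a back-of-the-envelope bound. The sketch below assumes a fully pipelined multiplier that accepts one operation per clock, so one DSP slice can be reused `clk_hz // sample_rate_hz` times per sample. The channel count, multiplies per channel, and slices per (wide) multiply are hypothetical illustration values, not figures from the servo specification; the slice counts are the Artix-7 DSP48E1 totals for the three parts discussed.

```python
import math

# Artix-7 DSP48E1 slice counts for the parts discussed above.
DSP_SLICES = {"XC7A50T": 120, "XC7A75T": 180, "XC7A100T": 240}


def min_dsp_slices(channels, mults_per_channel, clk_hz, sample_rate_hz, dsps_per_mult=1):
    """Lower bound on DSP slices for one servo update per sample period.

    Assumes a fully pipelined multiplier accepting one operation per clock,
    so each slice can be reused clk_hz // sample_rate_hz times per sample.
    """
    reuse = max(1, clk_hz // sample_rate_hz)
    total = channels * mults_per_channel * dsps_per_mult
    return math.ceil(total / reuse)


# Hypothetical figures for illustration only (not from the servo spec):
# 16 channels, 4 multiplies per channel, each wide enough to need 4 slices.
unshared = min_dsp_slices(16, 4, 125_000_000, 125_000_000, dsps_per_mult=4)
shared = min_dsp_slices(16, 4, 125_000_000, 1_000_000, dsps_per_mult=4)
print(unshared, shared, unshared <= DSP_SLICES["XC7A100T"])
```

With these made-up numbers a fully parallel datapath would not fit even the 100T, while time-sharing at an assumed 1 MHz update rate needs only a handful of slices; the real ratio depends on the actual filter structure and word widths.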
<hartytp>
also, remind me, what drives the resource usage? DRTIO stuff? Artiq itself? DSP slices for the loop filter?
<rjo>
hartytp: ah. yes. the 50T/75T/100T all are pin compatible in fgg484.
<rjo>
hartytp: i misremembered the packages for the 100T.
<rjo>
hartytp: and we have the 50T in fgg484 currently planned. if that doesn't pack the logic, you could just go for the 100T.
<hartytp>
Re servo costing: would it help if, as a risk mitigation strategy, we agreed that if we can't get the servo to work in a straightforward way on the 50T (e.g. without excessive optimisation) then we either:
<hartytp>
(a) drop from 16 channels to 8 channels per Kasli
<hartytp>
(b) only support the 100T
<hartytp>
Given the small quantities of hardware we're currently planning to build, it would be easy to spend a lot of software development time optimising to save a relatively small amount of money on hardware
<rjo>
hartytp: both options are fine with us.
<hartytp>
me too. How does that affect costs/time?
<hartytp>
FWIW, our current 1MSPS noise eater fits comfortably on XC7Z010-1CLG400C (can dig out details if that helps)
<sb0>
rjo, iirc you looked closely at the rtio resource usage on pipistrello - how much did the FIFOs take?
<rjo>
hartytp: i understand your perspective on the hardware cost. but keep in mind that this is different for others who would like to get going at a much lower entry barrier.
<rjo>
sb0: i don't remember.
<hartytp>
sure. I do get that, and am keen to keep it cheap. I currently don't have a feeling for what the 50T/100T price difference is when applying relevant discounts, so I can't assess the trade-offs...
<rjo>
going from 16 to 8 channels or just supporting the 100T are for free.
<hartytp>
rjo: would having that as a contingency plan, and agreeing that we won't spend too much time optimising the servo design, allow us to reduce the cost compared with the quote you sent me?
<rjo>
hartytp: it's ~70 bucks difference... maybe 100 difference with the delta in rebate.
<hartytp>
And, Kasli will be ~$500?
<rjo>
hartytp: around 400-1000 maybe.
<hartytp>
okay, so we're talking about a max cost increase of 10%-20% on the core component of the board.
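The percentage above follows from the figures quoted in the exchange. This small sketch just redoes the arithmetic: against hartytp's ~500 board estimate the ~70-100 FPGA delta is about 14% to 20%, while over rjo's wider 400-1000 board range it spans roughly 7% to 25%. Currency and exact discounts are as uncertain as in the original discussion.

```python
def relative_increase(fpga_delta, board_cost):
    """Fractional board-cost increase from moving to the larger FPGA."""
    return fpga_delta / board_cost


# FPGA delta of ~70-100 (rjo) against a ~500 board (hartytp's estimate).
print(relative_increase(70, 500), relative_increase(100, 500))
# Across rjo's wider 400-1000 board range the spread widens.
print(relative_increase(70, 1000), relative_increase(100, 400))
```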
<rjo>
hartytp: having those two options helps a bit. but the biggest drivers are still the custom ADC and DDS interfaces and having both extremely tight coupling of ADC-to-DDS and RTIO control over many aspects of those components at the same time. i'd guess that your noise eater did not face either of those challenges.
<hartytp>
our noise eater did have a "custom ADC/DAC interfaces" but they were basically just shift registers + some glue for CNV_START etc. Maybe 10 lines of verilog each
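A "shift register + glue" ADC readout of the kind hartytp describes amounts to accumulating one bit per serial clock edge. The model below assumes a generic 16-bit, MSB-first, two's-complement converter; it is not tied to the Novogorny ADC or any specific part, and it ignores the conversion-start timing and the clock-domain questions rjo raises next.

```python
def shift_in(bits, width=16):
    """Accumulate a serial ADC word, MSB first, one bit per SCK edge."""
    word = 0
    for bit in bits[:width]:
        word = (word << 1) | (bit & 1)
    return word


def to_signed(word, width=16):
    """Interpret the shifted word as two's complement."""
    return word - (1 << width) if word & (1 << (width - 1)) else word


# 0x8001 presented on the data line MSB first
bits = [(0x8001 >> (15 - i)) & 1 for i in range(16)]
sample = to_signed(shift_in(bits))
```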
<rjo>
hartytp: e.g. i just realized yesterday that for the source synchronous clocking of the ADC data we'd want to have SCKO on a clock capable pin. that now requires changes to Kasli and the EEM definition. Sayma/Metlino will also probably need changes in their next revision.
<hartytp>
rjo: ack. that was a good spot. thanks for that
<hartytp>
Maybe I'm being a bit simplistic, but I'm thinking about the code we had for our AD9910 DDS boards, and the code we had for our previous noise eaters
<rjo>
hartytp: then you were fortunate with the data clocks on both sides. we don't expect to be able to do that with the data rate/latency you'd like to see.
<hartytp>
Not so much fortunate, as we chose sensible values and set up the PLLs accordingly. But then, we didn't have extra constraints like the RTIO clock (is that the issue here?)
<rjo>
hartytp: your serial clocks were the same on ADC and DAC?
<hartytp>
IIRC, eventually yes that's what we did. But, initially, no. We had three clock domains: ADC, loop filter, and DAC
<hartytp>
and lost a few clock cycles to CDC
<rjo>
hartytp: with the same data rate though?
<rjo>
sample rate
<hartytp>
yes
<hartytp>
same as here
<hartytp>
IIRC, we did ADC + loop filter in ~1us, and DAC update in ~1us (ADC done triggers loop filter). loop filter was clocked at 100MHz. 8 channels, no sharing of resources between channels
<hartytp>
everything done in the dumbest possible way
<hartytp>
and resources weren't an issue
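The timing hartytp quotes translates into a cycle budget. At a 100 MHz loop-filter clock, ~1 us is at most 100 cycles; since the ADC conversion and readout also come out of that microsecond, the filter's real share is smaller. The sketch also shows what one filter shared across all 8 channels would get per channel, which is the trade-off rjo brings up later. Integer arithmetic avoids floating-point rounding in the cycle count.

```python
def cycles_in(budget_ns, clk_hz):
    """Whole clock cycles that fit into a latency budget (integer arithmetic)."""
    return budget_ns * clk_hz // 1_000_000_000


loop_cycles = cycles_in(1_000, 100_000_000)  # upper bound: ~1 us at 100 MHz
per_channel_shared = loop_cycles // 8          # if one filter served all 8 channels
```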
<rjo>
hartytp: would be nice if you could post more details about that. then let's see where we can save.
<hartytp>
what details?
<hartytp>
do you want the Vivado resource usage outputs? Details about implementation? Source code?
<hartytp>
For the new servo, I'm happy to fix the RTIO clock to 125MHz if (a) that's a reasonable choice and (b) that helps
<hartytp>
and fix the AD9910 SPI clock to 62.5MHz
<hartytp>
jitter on the ADC to RTIO CDC is fine
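With the clocks hartytp proposes (125 MHz RTIO, 62.5 MHz SPI), the duration of one DDS profile write follows directly. The sketch assumes a single SDIO lane per DDS, one bit per SPI clock with no gaps, an 8-bit AD9910 instruction byte followed by the 64-bit single-tone profile register, and it ignores chip-select setup/hold and the IO_UPDATE pulse. Under those assumptions a write takes a bit over 1.1 us.

```python
RTIO_PERIOD_NS = 8        # 125 MHz RTIO coarse clock (hartytp's proposal)
SPI_DIVIDER = 2           # 62.5 MHz SPI clock
INSTRUCTION_BITS = 8      # AD9910 serial instruction byte
PROFILE_BITS = 64         # AD9910 single-tone profile register


def profile_write_ns(n_bits=INSTRUCTION_BITS + PROFILE_BITS):
    """Duration of one single-lane SPI write, one bit per SPI clock."""
    return n_bits * RTIO_PERIOD_NS * SPI_DIVIDER


t_ns = profile_write_ns()          # 72 bits * 16 ns
t_rtio = t_ns // RTIO_PERIOD_NS    # same duration in RTIO cycles
```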
<rjo>
hartytp: do you have a paper? can you publish the source/hardware?
<hartytp>
no paper
<hartytp>
happy to send you schematics
<hartytp>
(although, they're effectively a prototype of Novogorny + a DAC)
<hartytp>
the source we can probably send you, but it has chunks that were written by a master's student learning Verilog for the first time, so it may make you angry to read it
<rjo>
hartytp: if you run kasli as a drtio slave, that will also be the global rtio clock in your experiment.
<hartytp>
ack.
<hartytp>
For the time being, we're committed to that anyway for the SAWG w/ 2 GHz DAC clock, aren't we?
<rjo>
yes. as a sidenote, it would be really nice to be able to get funding from somewhere to be able to try to get the coarse RTIO clock up to 250 MHz.
<hartytp>
Faster CPUs etc? Yes, that would be nice.
<rjo>
send it. physicist code doesn't scare me. i have vivid memories of writing that myself.
<hartytp>
The simplistic picture I have for the servo is: sequencer is just a counter from RTIO coarse clock that triggers everything.
<rjo>
not necessarily cpu but RTIO clock first.
<hartytp>
sequencer triggers ADC module, which is just a shift register
<hartytp>
when all channels are read in, this generates a trigger
<hartytp>
that goes through CDC to loop filter/IIR module
<hartytp>
pick the cycle time so that it's guaranteed to be done by the end of the servo cycle
<rjo>
yes. first triggers conv, then triggers shifting (source-synchronous, different CD), then data is CDC'ed and ends up in memories.
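The sequencer both describe (a counter on the RTIO coarse clock that fires each stage at a fixed offset) can be modeled in a few lines. The stage names and offsets below are hypothetical, chosen only so each stage has time to finish before the next; the source-synchronous shift domain and the CDC that rjo mentions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    offset: int  # RTIO coarse cycles after the start of the servo period


def run_sequencer(period, steps, n_cycles):
    """Free-running counter; each step fires when the count equals its offset."""
    assert all(0 <= s.offset < period for s in steps)
    fired = []
    for t in range(n_cycles):
        count = t % period
        fired.extend((t, s.name) for s in steps if s.offset == count)
    return fired


# Hypothetical offsets chosen so every stage completes before the next starts.
STEPS = [Step("adc_convert", 0), Step("adc_shift", 90), Step("iir", 120), Step("dds_spi", 140)]
events = run_sequencer(period=300, steps=STEPS, n_cycles=600)
```

Fixing every offset at design time is what makes "guaranteed to be done by the end of the servo cycle" checkable: it reduces to comparing each stage's worst-case duration with the gap to the next offset.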
<hartytp>
right. but the FSM for that is trivial
<hartytp>
"memories" just a 16-bit register, right?
<rjo>
then the IIR is muxed over the ADC inputs, the setpoints, the coefficients, the current IIR state (all in BRAMS).
<hartytp>
then, trivial DDS module that takes FTW, POW, ASF and writes to the spi
<rjo>
hartytp: those should end up being BRAMs.
<hartytp>
all that's needed is a bit of extra glue to hook that up to the RTIO
<rjo>
... the memories for the ADC values and everything.
<hartytp>
why?
<rjo>
well. then the hard part starts. there is a third (read) interface to the ADC values (from RTIO).
<hartytp>
for readback you mean?
<rjo>
because the design for that part should be similar to the setpoint, coefficient, channel matrix, etc.
<rjo>
yes.
<hartytp>
okay, I haven't looked at the rtio interface in ARTIQ yet.
<hartytp>
but, if the ADC presents its output as a register, along with a data_valid flag, is it really non-trivial to read that back over rtio?
<rjo>
it simplifies the design if the IIR only needs to MUX over BRAMs and not some mixture. then the sequencer just drives the addresses of the brams.
<rjo>
but yes. vivado may decide that a bram for this would be a waste and do a LUT RAM.
<rjo>
but it's a RAM nonetheless.
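rjo's point that the IIR should only multiplex over memories, with the sequencer driving their addresses, can be sketched as one filter datapath visiting each channel in turn. The plain lists stand in for the ADC, setpoint, coefficient and state RAMs; the first-order structure y[n] = b0·x[n] + b1·x[n-1] + a1·y[n-1] and the error sign convention are assumptions for illustration, not the servo's actual filter.

```python
def iir_pass(adc, setpoint, coeff, state):
    """One servo period with a single first-order IIR datapath shared by all channels.

    adc, setpoint, coeff and state stand in for the per-channel memories; the
    loop index plays the role of the address the sequencer drives.
    coeff[ch] = (b0, b1, a1); state[ch] = (x_prev, y_prev), written back each pass.
    """
    out = []
    for ch in range(len(adc)):
        b0, b1, a1 = coeff[ch]
        x_prev, y_prev = state[ch]
        x = adc[ch] - setpoint[ch]                # error signal
        y = b0 * x + b1 * x_prev + a1 * y_prev    # first-order IIR update
        state[ch] = (x, y)                        # write the filter state back
        out.append(y)
    return out


# Channel 0 configured as a pure integrator, channel 1 as a pure gain.
coeff = [(1.0, 0.0, 1.0), (0.5, 0.0, 0.0)]
state = [(0.0, 0.0), (0.0, 0.0)]
first = iir_pass([1.0, 2.0], [0.0, 0.0], coeff, state)
second = iir_pass([1.0, 2.0], [0.0, 0.0], coeff, state)
```

Because every input is a memory read at the same address, the datapath needs no special cases per source; adding a channel only deepens the memories.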
<hartytp>
okay, but where is the hard part? Thinking about how we did it before/what I've just sketched out, the ADC + DDS servo is pretty trivial without the RTIO bits? So, the only hard part I can see is getting the setpoints/gains/FTW from RTIO to the IIR.
<hartytp>
But, even that doesn't seem too hard, right?
<hartytp>
I can see room to make this complicated by trying to make things too flexible (e.g. arbitrary RTIO/DDS/ADC frequencies) or by trying to be too clever with resource sharing. But I don't think we need to do that
<rjo>
hartytp: no this won't be arbitrary. but we don't even know for certain what frequency we can achieve for the ADC and the DDS.
<hartytp>
You're worried about signal integrity over the ribbon cable for a 62.5MHz SPI clock?
<rjo>
hartytp: we have zero experience with those long LVDS connectors and fast serial clocks over those lines.
<rjo>
hartytp: yes. and for the 125 MHz or 250 MHz ADC clock.
<hartytp>
okay, I'm optimistic about that, but I take your point.
<rjo>
then i am worried about designing that spi-command builder for the dds. it has inputs from the RAMS with the POW/FTW profile storage which need to be properly interlocked. and it has inputs from the IIRs (RAM for the filter state).
<rjo>
now if we share the IIR over the channels, that will simplify and speed up things and reduce resource usage significantly there. but the DDS interface transmits in parallel. so there needs to be a sequencer there that extracts the data from the RAMs and stages all 64 bit profile words before they are transmitted in parallel.
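The staging step rjo describes (pull each channel's data out of the RAMs, assemble a full 64-bit profile word, then shift all channels in parallel) might look like the following. The bit layout is the AD9910 single-tone profile register (ASF in bits 61:48, POW in 47:32, FTW in 31:0). That the IIR output drives ASF while FTW/POW come from profile storage, and that each channel has one active profile index, are assumptions for this sketch.

```python
def profile_word(asf, pow_, ftw):
    """AD9910 single-tone profile layout: ASF[61:48], POW[47:32], FTW[31:0]."""
    if not (0 <= asf < 1 << 14 and 0 <= pow_ < 1 << 16 and 0 <= ftw < 1 << 32):
        raise ValueError("field out of range")
    return (asf << 48) | (pow_ << 32) | ftw


def stage_words(profile_ram, active_profile, iir_asf):
    """Build one 64-bit word per channel before the parallel shift begins.

    profile_ram[ch][profile] = (ftw, pow_); iir_asf[ch] is the servo output.
    """
    staged = []
    for ch, asf in enumerate(iir_asf):
        ftw, pow_ = profile_ram[ch][active_profile[ch]]
        staged.append(profile_word(asf, pow_, ftw))
    return staged


def parallel_bits(words, width=64):
    """Yield one bit per channel per SPI clock, MSB first, all lanes together."""
    for i in reversed(range(width)):
        yield tuple((w >> i) & 1 for w in words)
```

The staging loop is sequential (one channel per step, matching shared storage), but nothing may be shifted until every word is complete, because the lanes transmit in lockstep.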
<hartytp>
"properly interlocked" can you elaborate on that, please?
<rjo>
can you live with FTW updates ending up in different IO_UPDATE cycles? we discussed the order preserving but i don't think we answered that specific question.
<hartytp>
So, my simplistic picture of this is: the DDS module has FTW/POW/ASF inputs and a trigger line. It then just shifts the outputs to each channel in parallel, along with appropriate IO_UPDATE etc. That's pretty easy.
<hartytp>
The trigger is generated by the sequencer counter.
<hartytp>
We feed the IIR output directly to the DDS module input
<hartytp>
By design (choice of sequencer cycle time) the IIR output is guaranteed to be valid before the DDS module is triggered
<rjo>
please don't postulate that things are pretty easy or trivial.
<hartytp>
Sorry. I'm trying to understand the issues better by building a straw man, and figuring out where i'm wrong
<hartytp>
well, not a straw man, but a simplistic picture
<hartytp>
but, I don't see how, for example, a DDS module like I described above is non-trivial. I've written that before for the AD9910.
<rjo>
but if you write FTW0 and (because you can't write them at the same time) POW0, they might end up in different SPI transfers. is that ok? also, since you can submit them at any time, also while transferring, there needs to be another buffer stage between all profiles POW/FTWs and the DDS interface.
<hartytp>
right, I can see that's more of an issue.
<hartytp>
My assumption is that that's fine.
<rjo>
if you don't double-buffer that (per channel and profile), you will get garbage frequencies.
rohitksingh_wor1 has joined #m-labs
<hartytp>
I assume we latch the FTW/POW etc at the start of the DDS write cycle
<hartytp>
(RTIO writes are atomic, given that the DDS logic is clocked from the RTIO clock, right?)
<rjo>
that's the double buffering i mentioned. and you need to get all those out of the profile RAMs before you start shifting (or inject at the right point).
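The double buffering under discussion can be illustrated with a shadow/active register pair: RTIO writes land in the shadow copy at any time, and the shifter latches it only at the start of a transfer. This is a hypothetical software model, not ARTIQ gateware. The unbuffered case shows the "garbage frequency" failure: a write arriving halfway through yields a word that is neither the old nor the new value.

```python
class DoubleBuffered:
    """Shadow/active register pair (hypothetical model, not ARTIQ gateware)."""

    def __init__(self, value=0):
        self.shadow = value   # written by RTIO at any time
        self.active = value   # what the shifter actually transmits

    def write(self, value):
        self.shadow = value

    def latch(self):
        self.active = self.shadow
        return self.active


def shift_out(reg, width, mid_write=None, buffered=True):
    """Shift a word out MSB first; optionally an RTIO write lands halfway through."""
    word = reg.latch() if buffered else None
    out = 0
    for i in range(width):
        if mid_write is not None and i == width // 2:
            reg.write(mid_write)
        source = word if buffered else reg.shadow
        out = (out << 1) | ((source >> (width - 1 - i)) & 1)
    return out


r = DoubleBuffered(0x00FF)
first = shift_out(r, 16, mid_write=0xFF00)   # old word goes out intact
second = shift_out(r, 16)                    # new word goes out on the next transfer
u = DoubleBuffered(0x00FF)
torn = shift_out(u, 16, mid_write=0xFF00, buffered=False)
```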
<rjo>
yes. they are atomic. but the DDS SPI shifting would collide with that.
<hartytp>
yes
<hartytp>
collide in what sense
rohitksingh_work has quit [Ping timeout: 255 seconds]
<rjo>
and profile switching could also happen mid-way through the SPI transfer, that could lead to FTW of one profile and ASF of another being used. that also needs interlocking and/or staging before the shift register.
<hartytp>
I assume that we latch the active profile/FTW/POW at the start of the ADC read cycle to prevent that
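The profile-switch hazard rjo describes, FTW taken from one profile and ASF from another, can be shown by fetching the two fields at different moments with a switch landing in between. The dict-based profiles and the `selector` register are hypothetical stand-ins for profile storage and the RTIO-written active-profile register.

```python
def read_profile_fields(profiles, selector, latch_profile, switch_to=None):
    """Fetch FTW then ASF for one transfer; a profile switch may land in between.

    selector["profile"] is the live active-profile register written by RTIO.
    """
    idx = selector["profile"]
    ftw = profiles[idx]["ftw"]
    if switch_to is not None:
        selector["profile"] = switch_to   # RTIO profile switch mid-transfer
    if not latch_profile:
        idx = selector["profile"]         # unlatched: re-reads the live register
    asf = profiles[idx]["asf"]
    return ftw, asf


PROFILES = [{"ftw": 100, "asf": 1}, {"ftw": 200, "asf": 2}]
mixed = read_profile_fields(PROFILES, {"profile": 0}, latch_profile=False, switch_to=1)
clean = read_profile_fields(PROFILES, {"profile": 0}, latch_profile=True, switch_to=1)
```

Latching the index fixes the switch case, but as rjo points out below, the index is only a handle: a write to the profile's FTW or ASF between the two fetches would still produce an inconsistent pair.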
<rjo>
you will probably agree that once everything has been discussed at length and the issues been thought through and solutions are agreed to, everything becomes trivial.
<hartytp>
okay
<rjo>
you can wait until after the ADC sampling before you start dragging around the active profile.
<rjo>
but since you only drag around a handle to the data, you can still see the FTWs being used at different points in time than the setpoints. that opens it up for data races.
<hartytp>
"handle to the data"? In the gateware? Isn't the point of latching/buffering everything at the beginning of the ADC cycle to prevent this?
<rjo>
to properly do this you'd have to get everything out of the RAMs and buffer/stage it during the ADC sample/transfer stage. then process it and write back IIR state at the time you are doing the transfer.
<rjo>
the active profile is a "handle to the setpoint/coefficient/ftw/pow" data.
<hartytp>
okay, again, this is worrying about readback?
<rjo>
this is worrying about consistent data. because at the other end of everything you'd still be allowed to write to the profile's data.
<rjo>
and if this buffering really needs to be atomic w.r.t. the writes on the other side, they all need to be done in parallel over all channels and all ftw/coefficients/setpoints/pow
<rjo>
that would kill the idea of sharing the profile storage logic across channels.
<rjo>
storage+logic
<rjo>
if you say that the data in the active profile and across the channels does not need to be read/used/accessed atomically we can work with that again.
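Why atomic buffering "kills the idea of sharing" can be seen with a shared staging unit that copies one channel per cycle. If one RTIO update touching both channels lands between those copies, the staged set mixes old and new data. Capturing all channels in the same cycle avoids the tear but needs per-channel copy logic. Both functions are hypothetical models.

```python
def stage_sequential(ram, writes_at):
    """Shared staging logic: copy one channel per cycle.

    writes_at maps a cycle number to RTIO writes [(channel, value), ...]
    that take effect at the start of that cycle.
    """
    staged = []
    for cycle in range(len(ram)):
        for ch, value in writes_at.get(cycle, []):
            ram[ch] = value
        staged.append(ram[cycle])
    return staged


def stage_parallel(ram):
    """Per-channel capture logic: every channel is copied in the same cycle."""
    return list(ram)


# One consistent update of both channels lands while staging is in progress.
torn = stage_sequential(["old", "old"], {1: [(0, "new"), (1, "new")]})
snapshot = stage_parallel(["old", "old"])
```

If, as rjo suggests, cross-channel atomicity is not required, the torn result is acceptable by definition and the cheaper shared staging can be used.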
<rjo>
note that proper synchronizing or scheduling of updates to coefficients/profiles/setpoints/configurations are usually not within the scope of "regular" digital servos like the NIST one and maybe yours as well.
<rjo>
hartytp: and then there are things like the RF switch -> integrator activation timer. certainly all doable. it's just work at the gateware level, at the RTIO level, at the software level, at the documentation level, at the testing level.
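The "RF switch -> integrator activation timer" is only named in the chat, so its behavior here is an assumption: the integrator is held while the RF switch is off, and after the switch turns on it stays held for a programmable number of servo cycles to let the signal settle. The class and its parameter are hypothetical.

```python
class IntegratorHoldoff:
    """Hypothetical hold-off timer gating the servo integrator on the RF switch."""

    def __init__(self, holdoff_cycles):
        self.holdoff = holdoff_cycles
        self.count = 0
        self.rf_on = False

    def step(self, rf_switch):
        """Advance one servo cycle; return True when the integrator may run."""
        if rf_switch and not self.rf_on:
            self.count = self.holdoff   # rising edge: restart the timer
        self.rf_on = rf_switch
        if not rf_switch:
            return False                # RF off: integrator held
        if self.count > 0:
            self.count -= 1
            return False                # still settling
        return True


h = IntegratorHoldoff(2)
enabled = [h.step(x) for x in [False, True, True, True, True, False, True]]
```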
<rjo>
hartytp: and if you absolutely want to compare it with master student code, then software/api, documentation, testing is probably where you will find the biggest difference. deficiencies in those areas tend to have lasting effects and correlate directly with code survival rate.
<rjo>
hartytp: also, you mentioned cjbe was worried about resource usage. yet you are not. where did his worries arise?
rohitksingh_wor1 has quit [Read error: Connection reset by peer]
<rjo>
sb0, whitequark: can we do something about the artiq breakage?
<whitequark>
I'll try to set it up via conda and debug...
<whitequark>
today that is
FabM has quit [Ping timeout: 240 seconds]
FabM has joined #m-labs
rohitksingh has joined #m-labs
<rjo>
whitequark: thanks!
<_florent_>
toor
<_florent_>
sorry
rohitksingh has quit [Quit: Leaving.]
rohitksingh has joined #m-labs
<hartytp>
rjo: "also, you mentioned cjbe was worried about resource usage. yet you are not. where did his worries arise?"
<hartytp>
that was w.r.t. running Kasli as an ARTIQ master
<hartytp>
taking a step back, we're agreed that we need the Kasli/Urukul servo, or something very like it.
<hartytp>
And, we would prefer to pay you to do it, rather than do it ourselves.
<hartytp>
but, we have to balance the cost of getting you to do it versus e.g. hiring a post doc to work for us.
<hartytp>
the cheaper quote you sent us is still about 1/2 year of post doc funding.
<hartytp>
I'm trying to check:
<hartytp>
(a) have I put something in the specification which is pushing the price up a long way, and which we don't need (e.g. can we reduce the flexibility slightly, or go for a bigger FPGA)
<hartytp>
(b) have I underestimated the complexity involved in this in general.
<hartytp>
So, it's been useful to talk this through a bit.
<hartytp>
and, yes, I respect that support and documentation are a major driver of cost
<rjo>
hartytp: don't you have to pay overhead on post doc salaries?
<hartytp>
that is with overhead :(
<rjo>
hartytp: for us there are still taxes and overhead coming off of that. so in the end this is not even near half a person-year of our funding...
<hartytp>
ultimately, I'm trying to check I understand how much work is involved in this, to gauge what I expect the cost to be
<hartytp>
e.g. this seemed simpler than the previous Sayma servo, so we expected it to be cheaper to develop
<rjo>
as i see it right now (and i think that is reflected in the amounts) the driving factors are equally distributed in any dimension one can look at it: components (ADC, IIR, DDS, profiles, RTIO), gateware/software/docs/testing/support, resource/timing/hardware risks.
<hartytp>
okay. Well, if risk is a concern then we could split into separate consultancy/design and implementation contracts
<rjo>
when comparing it with the previous sayma servo and summing up the features and the amount of work in either, they seem very comparable to me.
<rjo>
hartytp: right. that's why i spelled it out into the first point and then the others. given the fast turnaround for contracts that is a very nice option.
<rjo>
and if you do it incrementally, i can see the price for the other items dropping precisely because of the risks becoming more manageable.
<hartytp>
okay
<rjo>
and if you guys can operate in a way (by stepping up and having local expertise) where we'd have to do much less support and handholding, that would be another point.
<hartytp>
ack. Although, I think we've been fairly good on that front so far...
<rjo>
hartytp: yes. that's also why we are happy to spend that much time planning, discussing, and tweaking the designs without having the contract fixed.
<hartytp>
"For this item, as well as for "RTIO interface" and item 5 below we have to account for the risk that may have to do a lot of squeezing and optimization to get this into Kasli. That includes possible side-effect squeezing on the existing gateware around the CPUs, the RTIO layers, DRTIO etc. If all works well and if it turns out that we don't have to fight there, that might save 3k€ distributed over those three points."
<rjo>
hartytp: let's do the money stuff in private
<hartytp>
sure
Hopsig has joined #m-labs
Hopsig has quit [Ping timeout: 260 seconds]
hartytp has quit [Quit: Page closed]
<sb0>
sigh, CERN has FPGA jobs over 8k€/month ... and so do many companies.
<jbqubit>
What should have been my indication that timing wasn't met?
<whitequark>
jbqubit: not sure what crashed it but it's up now
<rjo>
jbqubit: there are no packages with broken timing. i.e. the indication is the age of the package.
<jbqubit>
rjo: I don't understand what you meant at 12:25
<jbqubit>
rjo: other question... when I use self.sawg1.frequency1.set(200*MHz) I got huge close-to-carrier spurs. What is the expected performance? I can't tell you numbers right now as the KC705 is not configured. I recall seeing > -10 dBc spurs < 50 MHz from carrier.
<rjo>
jbqubit: there are no packages uploaded with broken timing. you installed an old package.
<rjo>
jbqubit: the frequency1/2 oscillators only run at RTIO coarse frequency (mentioned so many times). spurs at 50 MHz are expected for a 200 MHz ask...
<rjo>
jbqubit: but i'll add that to the documentation there as well...
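rjo's explanation can be read as ordinary Nyquist folding: an oscillator updated only at the coarse RTIO rate f_coarse cannot represent tones above f_coarse/2, so a higher request appears at a folded frequency. The coarse rate of this particular SAWG build is not stated in the log; 125 MHz and 150 MHz are used below purely as illustrative values. With either one, a 200 MHz request folds to 50 MHz, consistent with the remark above.

```python
def folded_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone f generated at update rate fs (Nyquist folding)."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


# Illustrative coarse RTIO rates only; the actual SAWG coarse clock is not stated here.
for fs in (125_000_000, 150_000_000):
    print(fs, folded_frequency(200_000_000, fs))
```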
<jbqubit>
Where is this "mentioned so many times"?
<jbqubit>
It would be helpful if the documentation included sufficient information on implementation so I know what you're talking about.
<rjo>
that was mentioned and discussed at least a dozen times in the discussions of the design. it's also in the specification, the contract, the wiki.
<rjo>
it would be good if you read the contracts, the wiki, the specification.
<rjo>
jbqubit: we spent so much time discussing and preparing specification, wiki pages, etc for this (and the phase modes, and the data rates, and the way these things work). it would be a shame if that was all for nothing.
<jbqubit>
Please link to the page you want me to read.
<jbqubit>
The present line of discussion helps me understand which of the several possible implementations you've coded.
<jbqubit>
The concept of "coarse RTIO" frequency is not documented nor how it relates to mu. As I gather each PHY may have its own PHY-dependent "coarse RTIO" frequency.
<jbqubit>
Please document coarse RTIO divider for SAWG.
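The relation jbqubit asks about can be sketched as splitting an RTIO timestamp in machine units (mu) into a coarse-cycle index and a fine offset. The values here are assumptions: 1 mu = 1 ns and 8 mu per coarse cycle, i.e. a 125 MHz coarse clock. A PHY that only acts on coarse-cycle boundaries would effectively see only the first part of the split; whether and how a given PHY uses the fine part is PHY-dependent and is not asserted here.

```python
def split_timestamp(t_mu, mu_per_coarse):
    """Split an RTIO timestamp into (coarse cycle, fine offset within the cycle)."""
    return divmod(t_mu, mu_per_coarse)


def coarse_period_ns(ref_period_ns, mu_per_coarse):
    """Length of one coarse RTIO cycle, given the duration of one mu."""
    return ref_period_ns * mu_per_coarse


# Assumed: 1 mu = 1 ns and 8 mu per coarse cycle (a 125 MHz coarse clock).
cycle, fine = split_timestamp(1_005, 8)
```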
<GitHub142>
[artiq] dleibrandt commented on issue #672: Spent some time trying to come up with a simple way to reproduce the problem, and learned that it is only present when the SPI lines are connected to my slave device. Further testing revealed that the SPI lines coming out of the kc705 were actually OK, but previously I had only measured the SPI lines after an isolator chip on the TTL breakout board. The isolator chip outputs were dropping out because