<rjo>
sb0: correct. once you have synchronized synclk at the fpga by shifting resets and with the knowledge of the synclk-to-fud round trip you get valid fud timing and ddses synchronized (at the fpga that is).
<rjo>
if you want to synchronize them at their outputs, we need matched synclks-to-fpga.
<sb0>
yes, the synclks do need to be matched. and how well they are matched will determine how well we can sync those DDS
<sb0>
the matching needs to be tight if we want to sync them within one 3.5GHz clock cycle
<rjo>
i remember getting flash errors before i relaxed the spi timing (in the soc) and vaguely remember seeing very rare flash errors with xc3sprog but they were never reproducible and rare.
<rjo>
s/and rare//
<sb0>
I've emailed Jack about it
<sb0>
maybe he's aware of the problem
<sb0>
we don't care about the reset length though, as the FPGA would scan the timings anyway
<sb0>
until the synclks are matched.
<sb0>
the way I imagine this is similar to DDR3 write leveling
<rjo>
the usb forwarding on these vms is quite a bit slower than native. i would imagine it is an interaction between the proxy bitstream's notion of spi and the spi timings of the flash, triggered by data hiccups.
<sb0>
the FPGA has an internal reference SYNC_CLK that is phase-locked to the 3.5GHz SYS_CLK
<sb0>
and the reference SYNC_CLK is used to sample the incoming per-DDS SYNC_CLK
<sb0>
the FPGA then sends resets with a different delay wrt the internal SYNC_CLK
<sb0>
and stops right when it hits the first e.g. 0->1 transition in the sampled SYNC_CLK
<sb0>
we don't even need a TDC for this
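The scan described above can be sketched in a few lines of Python. This is only a toy model, not ARTIQ code: the names `sampled_sync_clk` and `scan_reset_delay`, the 50% duty cycle, and the choice of one delay step per SYS_CLK cycle (24 steps per SYNC_CLK period) are all assumptions made for illustration.

```python
def sampled_sync_clk(delay, offset, period):
    """Hypothetical model: level of a per-DDS SYNC_CLK (50% duty cycle) as
    seen by the FPGA reference SYNC_CLK after a reset sent with `delay`.
    `offset` is the unknown phase the scan is supposed to find."""
    return 1 if (delay + offset) % period < period // 2 else 0


def scan_reset_delay(sample, max_delay):
    """Step the reset delay and stop at the first 0->1 transition of the
    sampled SYNC_CLK. Returns the delay, or None if no edge was seen."""
    previous = sample(0)
    for delay in range(1, max_delay + 1):
        current = sample(delay)
        if previous == 0 and current == 1:
            return delay
        previous = current
    return None


# Assumed numbers: 24 delay steps per SYNC_CLK period, unknown offset of 5.
first_edge = scan_reset_delay(lambda d: sampled_sync_clk(d, 5, 24), 48)
print(first_edge)  # 19, since (19 + 5) % 24 == 0
```

Only a one-bit sample per delay step is needed, which is why no TDC is required.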
<rjo>
sb0: yes. i think we are just expressing the same constraints in different variables.
<rjo>
you always stress that you don't want to sample on an edge ;)
<sb0>
by the way, what is going to be the RTIO clock with this new DDS system?
<sb0>
with the ad9858 stuff, it was easy - 125MHz for SoC, and 1GHz (exactly 8x) for DDS and RTIO SERDES (and DDR3 data)
<sb0>
we could keep things synchronous
<rjo>
yes. 125MHz for the soc would be nice.
<rjo>
you mean the 9914 dds clock?
<rjo>
it has sync_clk*24=sys_clk, right?
<sb0>
yes... it's nice to have those clocks phase-locked and with a power-of-2 frequency ratio
<sb0>
but I guess it won't be the case anymore with the 9914
<rjo>
so far we used the new dds only because of its wider ftw not because of the faster clock.
<sb0>
if we use 3GHz for SYS_CLK, we can keep 125MHz SoC and SYNC_CLK
<rjo>
at ETHZ we overclocked them to 2**30 Hz because then granularity is just 0.25 Hz...
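The 0.25 Hz figure follows from the usual DDS tuning relation, where one FTW step is the system clock divided by 2 to the power of the FTW width. A minimal check, assuming a 32-bit FTW (the width is not stated in the log, and `ftw_granularity` is a made-up helper name):

```python
from fractions import Fraction


def ftw_granularity(sys_clk_hz, ftw_bits=32):
    """Frequency step (Hz) of one FTW LSB: f_sysclk / 2**ftw_bits.
    ftw_bits=32 is an assumption, not taken from the discussion."""
    return Fraction(sys_clk_hz, 2 ** ftw_bits)


print(ftw_granularity(2 ** 30))        # 1/4, i.e. 0.25 Hz
print(float(ftw_granularity(3 * 10 ** 9)))  # non-binary step at 3 GHz
```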
<rjo>
yes. i think 2 or 3 GHz will be the dds sys_clk.
<sb0>
and there will be 3 samples per RTIO cycle
<sb0>
2GHz will result in a 83 1/3MHz frequency for SYNC_CLK...
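The two SYNC_CLK figures can be checked with the sync_clk*24=sys_clk relation mentioned earlier. A small sketch with exact fractions (`sync_clk` and `SYNC_CLK_DIVIDER` are illustrative names):

```python
from fractions import Fraction

SYNC_CLK_DIVIDER = 24  # from "sync_clk*24=sys_clk" above


def sync_clk(sys_clk_hz):
    """SYNC_CLK frequency in Hz for a given SYS_CLK, kept exact."""
    return Fraction(sys_clk_hz, SYNC_CLK_DIVIDER)


for sys_clk_hz in (2_000_000_000, 3_000_000_000):
    f = sync_clk(sys_clk_hz)
    # Compare against the 125 MHz SoC clock to see whether they can be shared.
    print(sys_clk_hz, float(f) / 1e6, f == 125_000_000)
```

Only the 3GHz option lands exactly on 125MHz. 2GHz gives 250/3 MHz, which has no integer relation to the SoC clock.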
<sb0>
it would probably make sense to go asynchronous RTIO instead of trying to get the DDR3 to work on a multiple of that frequency
<rjo>
sb0: have the rtio fifos be asyncfifos.
<sb0>
I'd try 3GHz + sync. if the sync fails, the result will be that the DDSes are off by a couple of samples wrt each other... which may not be a big problem
<rjo>
yeah. looks like 2ghz will be experimentally inconvenient. too close to qubit frequencies. so probably 1 or 3ghz
<sb0>
it's not that simple, we need to support the "replace last FIFO entry" operation to implement pulse merging
<rjo>
ah. yes.
<sb0>
additionally, we need instant feedback on underflows so that exceptions can be precisely raised. having the counter in another clock domain complicates that.
<rjo>
sb0: there could be a lockout mechanism that runs at cpu freq, buffers that "last FIFO entry" and commits to FIFO a few cycles before the deadline.
<sb0>
yes, or keep the FIFO synchronous and use some other clock crossing mechanism
<rjo>
a lockout window for pulse merging.
<rjo>
ack
<sb0>
either way, replace support and instant underflow detection are not going to be straightforward with multiple clock domains
<rjo>
no i think 3ghz is in fact experimentally very convenient if there is not too much feedthrough of the clock itself. so lets just assume and push for 2ghz.
<rjo>
s/3ghz/2ghz/ and then reverse my statement. push for 3ghz.
<rjo>
2ghz would mean a particularly large and useful 1st nyquist image at 2ghz - f.
<rjo>
so let's go with 3ghz. the image at 3ghz - f, around 2ghz, should still be very useful.
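For reference, the first Nyquist image of a DDS output at f sits at sys_clk - f. A tiny sketch of that arithmetic (the function name and the 1GHz example output are assumptions chosen to match "around 2ghz"):

```python
def first_nyquist_image(sys_clk_hz, f_out_hz):
    """Frequency of the first image, sys_clk - f, for an output in the
    first Nyquist zone (0 .. sys_clk/2)."""
    if not 0 <= f_out_hz <= sys_clk_hz / 2:
        raise ValueError("f_out must lie in the first Nyquist zone")
    return sys_clk_hz - f_out_hz


# Assumed example: a 1 GHz output with a 3 GHz SYS_CLK.
print(first_nyquist_image(3_000_000_000, 1_000_000_000))  # 2000000000
```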
<sb0>
ah, sync_clk is single ended
<sb0>
we may want to put a differential buffer close to it
<sb0>
then send that to the backplane, and mux it on the backplane with matched traces to each DDS
<rjo>
yep
<rjo>
man. your refactoring is making me re-discover the design each time i look at it ;)
<ysionneau>
according to the M1 RC3 schematics it corresponds to DQ[7:0]
<ysionneau>
and dm[0] dqs[0]
<ysionneau>
ok I've got some ISim (/fuse) simulation working, but I don't inject the wishbone reads yet
<ysionneau>
that's when you regret the cool Migen+iverilog combo doing your simulation for you
<ysionneau>
now I need to write the wishbone transactions by hand ... :(
<_florent_>
you can also simulate OSERDES2 with Migen+iverilog
<_florent_>
you just have to compile OSERDES2 model like you have done for the Micron model
<ysionneau>
ah right
<_florent_>
it's only for Kintex7 that it's not possible, since OSERDESE2 uses secure-ip...
<ysionneau>
first I thought it was ciphered
<ysionneau>
but then I saw it was not but I was already on track for the ISim stuff :p
<ysionneau>
let's go back to iverilog then!
<_florent_>
I'm not able to find the issue just by looking at the code
<ysionneau>
ok, thanks for having looked at it :)
mumptai has joined #m-labs
<ysionneau>
pfew at last I have some simulation working with the DDR model + xilinx components
<ysionneau>
it was a bit weird to generate a non-"sys" clock at 50 MHz at the top level and then feed it to the mxcrg, which generates the sys_clk (83.3 MHz) through a PLL
<ysionneau>
plugging everything together was not exactly simple for me
sb0 has joined #m-labs
<ysionneau>
sb0 hi!
<sb0>
ysionneau, you really can't tie DM to 0. otherwise any byte access from the CPU will write two bytes instead of one.
<ysionneau>
sure
<ysionneau>
now I don't do that anymore
<sb0>
why don't you copy the code from lasmicon?
<ysionneau>
I didn't understand every bit of lasmicon yet
<ysionneau>
it's a lot bigger and more complex than my small controller
<ysionneau>
for wrdata_mask it is taking the lasmic bus "we" signal
<sb0>
I think that driving DQM during reads as well doesn't cause any problem (iirc)
<ysionneau>
if you drive it high you get nothing out of the dram I think
<ysionneau>
it sets dq to hi-z
<ysionneau>
confirmed by sdram simulation which then stops working
<ysionneau>
well sorry I just understood what you meant
<ysionneau>
indeed I guess just driving it to ~bus.sel all the time should work
<ysionneau>
let's try that
<ysionneau>
last time I tried I must have done something like .eq(bus.sel) without the ~ which would explain my hi-z issues
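The polarity issue is easy to see in plain Python: Wishbone `sel` is an active-high byte enable, while DRAM DM is an active-high mask, so one is the bitwise inverse of the other. A minimal sketch (`dm_from_sel` and the 4-byte width are illustrative, this is not the controller code):

```python
def dm_from_sel(sel, nbytes):
    """Convert an active-high byte-enable word into an active-high
    data mask of the same width (the equivalent of ~bus.sel)."""
    mask = (1 << nbytes) - 1
    return ~sel & mask


# Byte write to lane 0 of a 4-byte word: only lane 0 must be left unmasked.
print(format(dm_from_sel(0b0001, 4), "04b"))  # 1110
# Forgetting the inversion would instead mask exactly the byte being written.
```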
<sb0>
bah, you need to slice bus.sel obviously
<sb0>
like for data
<sb0>
ah, sorry, it's sliced
<sb0>
and you need to send it at the same time as the data (at least). it's ignored during the write command (unless a previous burst was already going on, but your ctl doesn't do that)
* sb0
is sick today :(
<ysionneau>
arg :(
<ysionneau>
that's why you're up so early?
<sb0>
no, I'm in SF right now, and enjoy a combination of jetlag and some sort of cold that manifested itself shortly after arriving
<ysionneau>
:/
<ysionneau>
what are you doing in SF?
<sb0>
visiting folks and trying to find someone to help design an excellent artiq gui. not much luck with the latter so far...
<ysionneau>
hope you will find some talented UI guy!
<ysionneau>
maybe it has nothing to do with DM pin after all...
<sb0>
is the downconverter working correctly?
<ysionneau>
I'm not using it
<ysionneau>
since I get 32 bits wide wishbone directly
<sb0>
huh?
<sb0>
that's on m1?
<ysionneau>
I am only using dq[7:0]
<ysionneau>
yes
<sb0>
oh, probably you are not sending the data at the right time. iirc with ddr you need to send it 1 cycle after the write command, as opposed to simultaneously with sdr.
<ysionneau>
to send what 1 cycle after the write command?
<sb0>
I don't remember if the phy already aligns the write data with the write command, but I'd check that
<ysionneau>
ah ok
<ysionneau>
I thought the phy would take care of that
<sb0>
I don't remember, I wrote it in 2012
<ysionneau>
indeed in sdram you put write command + data_in on DQ at the same rising edge of clk
<ysionneau>
and indeed on ddr you need to wait 1 clock cycle
<ysionneau>
(to present data_in on dq)
<sb0>
just insert a register on dq_w/dm when ddr
<ysionneau>
ah I think I get it
<sb0>
but check the phy first
<ysionneau>
I need to put cmd at wrcmdphase for DDR
<ysionneau>
not at wrphase
<ysionneau>
and indeed wrcmdphase is 0 and wrphase is 1
<ysionneau>
so 1 cycle latency
* ysionneau
resynthesizing
<ysionneau>
if that was the mistake, then I should bang my head against the wall :'
<sb0>
you need to send the write command on wrcmdphase on all PHYs
<sb0>
not just DDR. it only worked by accident on that SDR PHY.