sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs
balrog has quit [Quit: Bye]
balrog has joined #m-labs
fengling has joined #m-labs
bentley` has joined #m-labs
larsc, you'd need to calibrate it...
but if a given IBUFDS_GTE2 on a given chip is stable, then TDC/DTC applications are possible
it's quite likely it's just the max variation across a chip as you say; since for regular transceiver applications the delay doesn't matter, this is a simple model that works for xilinx
[artiq] sbourdeauducq pushed 2 new commits to release-2:
artiq/release-2 736927b Sebastien Bourdeauducq: language: set NoScan default repetitions to 1
artiq/release-2 d4af441 raghu: added repetitions for no scan, repetitions set to one when disable other scans selected. Closes #532
fengling has quit [Ping timeout: 240 seconds]
balrog has quit [Ping timeout: 264 seconds]
balrog has joined #m-labs
kuldeep has quit [Ping timeout: 250 seconds]
kuldeep has joined #m-labs
rohitksingh has joined #m-labs
sb0: yeah, I guess I can get away with sysref IDELAY monitoring and adjusting the delay according to the measured phase, potentially readjusting at runtime to compensate temperature changes
I just remembered that I already measured the temperature delay difference and it was less than 100ps. At least on the device I measured.
aren't those supposed to be calibrated, referenced to the idelayctrl clock input?
and yes, for periodic signals you can use idelay phase adjustments just fine...
mumptai has joined #m-labs
sb0: didn't you use something like that for the HDMI receiver to adjust the phase, or was that something else?
the HDMI receiver uses the crappy, uncalibrated and buggy spartan6 idelays. there is a phase detector that constantly adjusts the tap when there are input transitions.
the Kintex7 DDR3 PHY uses a similar technique for write leveling
(similar to sysref sampling. the idelay phase adjustment is closer to a CDR)
sb0: What's wrong with adjusting a tap in most cases :P?
I wonder if that's how the DCM also works to implement a DLL...
sb0: how is the calibration implemented there? just check when the sampled edge moves by one clock cycle?
to find the min and max?
larsc, where?
cr1901_modern, what is wrong is that the spartan6 iodelay goes berserk if its delay is larger than 1 UI. if you start near 1 UI, and then voltage/temp make you go over that, you receive garbage data. the xilinx reference designs also have this bug, and it is unfixable unless you use another fpga. this is why I call it crappy and buggy.
sb0: for the DDR
larsc, the DDR3 chip samples its clock (with a skew you want to determine) using DQS (which is skew balanced) and outputs the information on DQ (where timing is irrelevant here)
so you scan the CLK-DQS delay and determine where the DQ transitions are, this gives you the clock skew at the DDR3 chip
(CLK-DQS delay at the FPGA)
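A rough sketch of that scan, as a toy timing model rather than real PHY code. It assumes an ideal 50% duty-cycle clock and a noiseless DRAM that returns the sampled CK level on DQ. The function names, the 2500 ps period and the 78 ps tap size are made up for illustration.

```python
def ck_level(t_ps, period_ps):
    """Ideal 50% duty-cycle clock: high during the first half of each period."""
    return 1 if (t_ps % period_ps) < period_ps / 2 else 0


def dram_sample(dqs_delay_ps, ck_skew_ps, period_ps):
    """Model of the DRAM in write-leveling mode.

    The chip samples CK on the DQS rising edge and returns the level on DQ.
    ck_skew_ps is the unknown CK arrival delay we want to measure.
    """
    return ck_level(dqs_delay_ps - ck_skew_ps, period_ps)


def find_rising_transition(ck_skew_ps, period_ps, tap_ps, n_taps):
    """Scan the CLK-DQS delay tap by tap and return the first tap where
    DQ goes from 0 to 1, or None if no transition is seen."""
    prev = dram_sample(0, ck_skew_ps, period_ps)
    for tap in range(1, n_taps):
        cur = dram_sample(tap * tap_ps, ck_skew_ps, period_ps)
        if prev == 0 and cur == 1:
            return tap
        prev = cur
    return None


# Assumed numbers: 2500 ps clock period, 78 ps per tap, 700 ps skew.
tap = find_rising_transition(ck_skew_ps=700, period_ps=2500, tap_ps=78, n_taps=32)
print(tap, tap * 78)
```

With these assumed numbers the transition shows up at tap 9 (702 ps), so the skew is known to within one tap.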
sb0: The problem is with the voltage/temp making the delay exceed 1 Unit Interval (UI?), correct? In and of itself, it's not a huge deal that delays greater than 1 UI aren't supported (if the behavior didn't go berserk, but say, just clamped)?
cr1901_modern, if it were clamped then you would not get the delay that gives you the best margins.
additionally, there is no way to know what tap you are using, so you can't determine when it is relevant to reset the delay to 0 and sample in the next clock cycle
it's really a lousy design. 7-series introduced calibration and sane >1UI behavior. ultrascale made tap information available in addition to the 7-series improvements.
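To illustrate why a readable tap value matters, here is a minimal sketch of a tracking loop that wraps cleanly at 1 UI. It is a hypothetical model and not the behavior of any Xilinx primitive: taps_per_ui, the +1/-1 phase-detector vote and the word_shift bookkeeping are all assumptions.

```python
def track_tap(tap, vote, taps_per_ui):
    """Apply one phase-detector vote (+1 or -1) to the current tap.

    Returns (new_tap, word_shift). word_shift is +1 when the delay wrapped
    from just under 1 UI back to 0 (so the data must now be taken one bit
    period later), -1 for the opposite wrap, and 0 otherwise.
    This only works if the current tap is known, which is the point made
    in the discussion above.
    """
    tap += vote
    if tap >= taps_per_ui:
        return tap - taps_per_ui, +1
    if tap < 0:
        return tap + taps_per_ui, -1
    return tap, 0


# Assumed: 32 taps cover exactly one UI.
print(track_tap(30, +1, 32))
print(track_tap(31, +1, 32))
print(track_tap(0, -1, 32))
```

Without knowing the tap, the wrap condition in this function cannot be evaluated, which is the Spartan-6 problem described above.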
what do you mean by tap information?
Which tap is currently being used I think
you have a 9-bit value that tells you which tap is selected. you can also write that value directly to switch to a new tap.
but, series7 has that as well, no?
In any case, I need to read up more about this. I've never used the iodelay block, but not being able to delay your signal by more than a single bit period sounds very limiting
spartan-6 has a crappy "calibration" mechanism that selects the tap in the middle of the clock you give it
but there is no way to know what tap that is
so you cannot determine when you may go over 1 UI
ah, yes, 7-series has it
and on series7 and ultrascale there is no such auto calibration and you have to do it in the fabric, right?
but with less precision
5bits I believe
on 7series the taps are calibrated, so you just give it the delay value you want
I guess the iodelay couldn't be implemented as Delay-Locked Loop either; a delay of more than a single bit period would be a loss of lock
the autocalibration on spartan6 is just to give it a "reasonable" start value that will make the later phase detector adjustments unlikely to make it go over 1UI or below 0
(Unless that's what S6 DOES in fact do, and then Xilinx realized that "people need delays larger than a bit period" and changed the behavior)
delay-locked-loops are for periodic signals
hdmi data is not
Ahhh, that makes sense. I'm just hung up on "why did Xilinx think 1UI limit was a good idea"?
the s6 idelay is not implemented with taps. they tried something cheaper.
it's not a tapped delay line inside.
Did they intend to make it not work after 1UI, or was that an implementation problem?
they intended it, their cheaper/smaller system does not support it by design.
that in itself can be justifiable, but it should be combined with calibration or accessible "tap" information, ideally both
Just to make sure I understand correctly: A "UI" is the "period between one data transition on the input line to the next", correct?
As you've prob guessed by now, I've never used IO delays (though kinda obvious what they do)
iodelay* blocks
sb0: "(4:11:30 AM) sb0: cr1901_modern, if it were clamped then you would not get the delay that gives you the best margins." What did you mean by margins, specifically? I presume you mean "if I set the delay to anything besides 0.5UI, some tolerance to a real-world imperfection is reduced b/c the iodelay goes to hell after 1UI". Which real-world imperfection?
^ Since I've never used the iodelay, I'm not sure what you're using it for.
well, you want to maximize setup and hold
best setup and hold are not necessarily at 0.5 UI though
larsc: Sorry for being dense, but I think I'm missing something here. What do you mean by sample and hold? (Let me guess; the taps are clocked?)
setup and hold time of the flip-flop you are using to sample the signal
sb0: ok, in order to do something similar to the write leveling but for sysref you could just use a simple IDDR2 and then adjust the delay until one edge sees a 1 and the other a 0
that doesn't help compared to a simple flipflop
nothing tells you that the delay you set is right between the 0 and the 1
and scanning with an IDDR2 to determine the crossing point is not simpler than scanning with a flipflop
I don't understand how I could do it with just a FF
increase the delay until you measure a different value
start over when you want to re-calibrate
but that requires a signal with the same frequency as the clock on sysref, no?
using the iddr2 is equivalent to sampling with the FF once with delay X, starting over, and sampling again with delay X+clock_period/2
yes, of course, everything must be synchronous
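A small sketch of the two points above: scanning with a single flip-flop until the sampled value changes, and the claim that an IDDR sample at delay X equals two flip-flop samples at X and X + clock_period/2. This is an idealized model (no jitter, no metastability) and it assumes the sampled signal has the same period as the clock. PERIOD_PS, the step size and the edge position are invented for the example.

```python
PERIOD_PS = 4000  # assumed clock period


def level(t_ps, edge_ps):
    """Periodic square wave with the clock's period, rising edge at edge_ps."""
    return 1 if (t_ps - edge_ps) % PERIOD_PS < PERIOD_PS // 2 else 0


def ff_sample(delay_ps, edge_ps):
    """Flip-flop clocked at t = 0 sees the input delayed by delay_ps."""
    return level(-delay_ps, edge_ps)


def iddr_sample(delay_ps, edge_ps):
    """IDDR: one sample on the rising edge (t = 0), one on the falling
    edge (t = PERIOD_PS / 2), both of the same delayed input."""
    return (level(-delay_ps, edge_ps),
            level(PERIOD_PS // 2 - delay_ps, edge_ps))


def scan_for_change(edge_ps, step_ps, n_steps):
    """Increase the delay until the flip-flop reads a different value."""
    first = ff_sample(0, edge_ps)
    for n in range(1, n_steps):
        if ff_sample(n * step_ps, edge_ps) != first:
            return n
    return None


print(scan_for_change(edge_ps=1300, step_ps=50, n_steps=80))
```

For a periodic, synchronous input the IDDR gives no information the repeated flip-flop scan does not, which matches the remark that it only doubles the sample rate.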
larsc: Okay, right. Makes sense.
sb0: So in my contrived example, if we input a signal to the FPGA through an IDELAY, and delayed it by, say 0.99UI, and the actual signal into the flip-flop had a delay that got clamped to 1.0UI, I don't see how that's a big problem. In fact, wouldn't that DECREASE the setup/hold times that would need to be satisfied for the flip flop (because the signal could not be delayed by more than 1.0UI)?
sb0: yes, of course synchronous. but usually the sysref signal is much slower than the sampling clock
then sample less ;)
skip some cycles
ok, I think I understand what you mean
needs some fine tuning for practical application, but the idea should work, thanks
if the pulse is long enough you'll still see some of it
it's like nyquist sampling
your iddr2 trick is like doubling the sample frequency
(but that's never necessary since you can retry almost indefinitely)
I'd have a counter that gives me a ce with the same frequency of the sysref but let the counter wrap one clock cycle earlier
so you can step through all possible phase offsets at that level
you can do that, or just collect a bunch of samples and write some software
once you found the transition at this coarse level keep it in sync with sysref and use the idelay for fine tuning
i'd have stupid gateware: idelay tap control + collect X samples (with X >= sample clock period/sysref period)
then do the rest in software
that gives you a small integrated LA for debugging
er, 1/X above
or s/period/frequency
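The "do the rest in software" part could look roughly like this. It is only a sketch: it assumes the gateware hands back, for each idelay tap, a buffer of samples covering one sysref period, and the capture data below is synthetic. The function names are hypothetical.

```python
def rising_edge_indices(samples):
    """Indices i where the buffer goes 0 -> 1, treating it as circular
    (the buffer is assumed to cover exactly one sysref period)."""
    return [i for i in range(len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]


def tap_where_edge_moves(captures):
    """captures[tap] is the sample buffer recorded at that idelay tap.

    Returns the first tap at which the rising edge lands on a different
    sample index than at tap 0, i.e. where the edge crosses a sampling
    instant. Returns None if it never moves."""
    reference = rising_edge_indices(captures[0])
    for tap in range(1, len(captures)):
        if rising_edge_indices(captures[tap]) != reference:
            return tap
    return None


# Synthetic captures: the edge slides by one sample between tap 2 and tap 3.
captures = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
print(rising_edge_indices(captures[0]), tap_where_edge_moves(captures))
```

The raw buffers double as the small integrated logic analyzer mentioned above, since they can be dumped as-is for debugging.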
makes sense
maybe the samples should be synched with your internal sysref divider counter, which the software can tune
e.g. by sending it "slip" commands that make it skip or repeat a value
this way you don't have the real-time requirements that reset or load interface would have, and that software cannot meet
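A minimal simulation of the slip idea, under stated assumptions: sysref is an ideal square wave with a period of n clock cycles and an unknown phase, and a slip makes the divider repeat one value, which delays its clock enable by one cycle. SlipDivider and align_to_sysref are hypothetical names, not ARTIQ or MiSoC code.

```python
class SlipDivider:
    """Modulo-n counter with a clock enable (CE) when the count is 0.
    A pending slip makes it hold its value for one cycle."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.slip_pending = False

    def slip(self):
        # Software may request this at any time; no real-time deadline.
        self.slip_pending = True

    def tick(self):
        """Advance one clock cycle; return True if CE fires this cycle."""
        if self.slip_pending:
            self.slip_pending = False
            return False  # hold: the count repeats, CE is suppressed
        ce = self.count == 0
        self.count = (self.count + 1) % self.n
        return ce


def sysref_level(cycle, n, phase):
    """Ideal sysref: period n cycles, rising edge at cycle == phase mod n."""
    return 1 if (cycle - phase) % n < n // 2 else 0


def align_to_sysref(n, phase):
    """Slip once per sysref period until CE samples a 0 -> 1 change.
    Returns the number of slips that were needed, or None."""
    div = SlipDivider(n)
    prev = None
    slips = 0
    for cycle in range(4 * n * (n + 1)):
        if div.tick():
            cur = sysref_level(cycle, n, phase)
            if prev == 0 and cur == 1:
                return slips
            prev = cur
            div.slip()
            slips += 1
    return None


print(align_to_sysref(16, 5), align_to_sysref(16, 12))
```

Once the coarse position is found the slips stop, the divider stays locked to that phase, and the idelay handles the sub-cycle part.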
now if I could only create a down divided clock that is aligned to sysref
what is your goal?
if you want to align to sysref, and have a multiplied clock, use a pll :)
down divide the external reference clock by 2
but make sure it is aligned to the sysref transition
why not take sysref and multiply it by 2?
sysref is like refclk/128
but there are also modes of operation where sysref can be oneshot
to keep it from leaking into the signal
so in that case you couldn't upmultiply it
btw is that implemented correctly in ADI silicon or should we expect a pandemonium of bugs?
I have no idea
talking to the apps guy later today
those synch mechanisms would probably be simpler if the DAC clock was not free running
turn off clock, reset DACs, first sample is at first edge when you start the clock again
no more complex setup/hold requirements...
also i don't know why jesd looks so complicated, you could probably just dump raw 8b10b encoded data into the DACs
was Intel involved in this again?
sb0: what you are describing is JESD204 without the B
so why did they change it?
people thought this is not enough
not det-lat
no det-lat
clocking was also too complicated
that's why they made it more complicated ;)
no det-lat? how so?
if your first clock edges arrive at the same time then it's det-lat
because of the CDC in the transceiver
well, there would be a number of samples that you send in advance before the first DAC clock to deal with that
and that's the B
no, the B has this sysref signal that has complicated timing
what I advocate is just a single clock signal that you can turn off
but you'd still need a separate clock for the transceiver then
so in a sense sysref is that second clock
yes, but transceivers are self-clocking
but they need a reference
soon there will be JESD204C which will probably change everything again
actually in that case it shouldn't be a CDR but more like the HDMI thing
therefore it doesn't need a CDC and the whole sync thing is dubious
you could sync from the transceiver lanes.
this requires good transceiver control at the source though, and therefore no idiotic IBUFDS_GTE2 delay variations or obscure xilinx wizards
I wish FPGA vendors had never put a PCS nor elastic buffers in there...
I begin to have the impression that JESD204 was designed by people without a lot of understanding about high-speed serial. for example, they use the K28.7 special character, even though it causes comma alignment issues and there are other special characters available that are non-problematic
yeah, you have to disable comma alignment after the CGS
whether this was on purpose or not, I don't know
might have been on purpose: knowing that you can't use K28.7 during normal operation, they might have opted for using it as the synchronization character
why is there even a goddamn CGS
you'd want a stable link before you start sending data I'd guess
can't you disable the output via SPI until the link is stable?
but that is the CGS phase then
make sure CDR has locked and characters are aligned
that's what you do in the CGS
but you shouldn't need a separate protocol for this. just send dummy samples.
character alignment?
all you do during CGS is send K28.7
transmit a comma from time to time
K28.7 is a stupid synchronization character because it has a 5-bit uncertainty
how can the synchronization character have uncertainty?
well, if you have a separate sync phase, it's fine
and you need that sync phase anyway
I don't see the issue
it is 28.5 btw
that is used for CGS
28.7 is used as a control character during normal operation
but during normal operation you don't realign
yes. this is stupid.
you can't realign if you have K28.7's.
at least not in a simple way
a good protocol would be: a series of comma + n raw sample data
only one mode, none of this CGS or ILAS bullshit
use a self-synchronizing scrambler, if you need one
turn off the output during link init
no CDC, sync from the transceiver lanes, no sysref
isn't a series of K28.7 pretty aggressive EMI-wise, too?
28.5 is used for sync
yes, but during normal operation aren't you sending strings of K28.7?
not sure what they mean by the strings of F
F is inserted if the last char in a frame is identical between two adjacent frames
which is unlikely when you send real data
and when scrambling is enabled, D28.7 is replaced by K28.7 if it is the last char in a frame
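The two replacement rules just described, as a simplified sketch. It covers only the frame-end /F/ (K28.7) rule as stated in the discussion and deliberately ignores the multiframe /A/ character and the exceptions in the standard. Characters are modeled as (is_control, octet) pairs, which is an assumption of this example.

```python
D28_7 = 0xFC  # octet value of D28.7; K28.7 is the same octet flagged as control


def frame_chars_scrambled(frames):
    """Scrambling enabled: a frame whose last octet is D28.7 gets that
    octet sent as the control character K28.7 instead."""
    out = []
    for frame in frames:
        chars = [(False, octet) for octet in frame]
        if frame[-1] == D28_7:
            chars[-1] = (True, D28_7)
        out.append(chars)
    return out


def frame_chars_unscrambled(frames):
    """Scrambling disabled: if the last octet equals the last octet of the
    previous frame, it is replaced by K28.7 (simplified, no exceptions)."""
    out = []
    prev_last = None
    for frame in frames:
        chars = [(False, octet) for octet in frame]
        if prev_last is not None and frame[-1] == prev_last:
            chars[-1] = (True, D28_7)
        prev_last = frame[-1]
        out.append(chars)
    return out


print(frame_chars_scrambled([[0x12, 0xFC], [0xFC, 0x34]]))
print(frame_chars_unscrambled([[1, 2], [3, 2], [4, 5]]))
```

In both modes the receiver can undo the substitution, since it knows which data octet a K28.7 at the end of a frame stands for.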
btw. ILAS is to give you additional confidence that your link is really up and running and things are properly aligned the way they should be
what is the purpose of rules like that?
and there are easier ways to test the link.
PRBS checkers are the standard way of doing it. if you use a scrambler, you can simply check that you get always zeros at the output.
so all you need really is 1) something to check that the commas arrive at the same time on all lanes 2) something that can tell you if it has received a non-zero character, post-descrambling. both can be controlled with a pair of SPI registers.
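A bit-serial sketch of the "send zeros through a scrambler and check for zeros after descrambling" test. It uses the self-synchronizing polynomial 1 + x^14 + x^15, which is the one JESD204B specifies, but the bit ordering and the seed here are simplified assumptions and this is not a drop-in model of a real link.

```python
def scramble(bits, state):
    """Self-synchronizing scrambler, polynomial 1 + x^14 + x^15.
    state holds the last 15 scrambled bits, most recent first."""
    s = list(state)
    out = []
    for d in bits:
        b = d ^ s[13] ^ s[14]
        out.append(b)
        s = [b] + s[:-1]
    return out


def descramble(bits, state):
    """Matching descrambler. Its state is filled from the received bits,
    so it synchronizes by itself after 15 bits, whatever the seed."""
    s = list(state)
    out = []
    for b in bits:
        out.append(b ^ s[13] ^ s[14])
        s = [b] + s[:-1]
    return out


# Transmitter sends all-zero "samples" with a non-zero seed (assumed value).
tx = scramble([0] * 100, [1] + [0] * 14)
# Receiver starts from a different (all-zero) state on purpose.
rx = descramble(tx, [0] * 15)

print(any(tx), any(rx[15:]))
```

The line carries a pseudo-random pattern, while the receiver only needs one flag that records whether any non-zero bit appeared after the first 15, which is the kind of status bit that fits in an SPI register.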
it's to check that the transmitter and the receiver agree on the link parameters
it is more than the bare minimum you need
it's meant as a debugging tool
ILAS was optional in 204A and can still be disabled in most 204B converters
if you don't think you need it just disable it
and don't add support for it to your gateware
ok I'll avoid touching it then, just in case.
whitequark: btw and fyi, i am running backups of lab to my machine. but i am excluding /home because it's huge. if there is anything missing, let me know. the config is in /srv/backup