sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
balrog has quit [Quit: Bye]
balrog has joined #m-labs
fengling has joined #m-labs
bentley` has joined #m-labs
<sb0>
larsc, you'd need to calibrate it...
<sb0>
but if a given IBUFDS_GTE2 on a given chip is stable, then TDC/DTC applications are possible
<sb0>
it's quite likely it's just the max variation across a chip as you say; since for regular transceiver applications the delay doesn't matter, this is a simple model that works for xilinx
<GitHub27>
[artiq] sbourdeauducq pushed 2 new commits to release-2: https://git.io/vizNC
<GitHub27>
artiq/release-2 736927b Sebastien Bourdeauducq: language: set NoScan default repetitions to 1
<GitHub27>
artiq/release-2 d4af441 raghu: added repetitions for no scan, repetitions set to one when disable other scans selected. Closes #532
fengling has quit [Ping timeout: 240 seconds]
balrog has quit [Ping timeout: 264 seconds]
balrog has joined #m-labs
kuldeep has quit [Ping timeout: 250 seconds]
kuldeep has joined #m-labs
rohitksingh has joined #m-labs
<larsc>
sb0: yeah, I guess I can get away with sysref IDELAY monitoring and adjusting the delay according to the measured phase, potentially readjusting at runtime to compensate for temperature changes
<larsc>
I just remembered that I already measured the temperature delay difference and it was less than 100ps. At least on the device I measured.
<sb0>
idelay?
<sb0>
aren't those supposed to be calibrated, referenced to the idelayctrl clock input?
<sb0>
and yes, for periodic signals you can use idelay phase adjustments just fine...
mumptai has joined #m-labs
<larsc>
sb0: didn't you use something like that for the HDMI receiver to adjust the phase, or was that something else?
<sb0>
the HDMI receiver uses the crappy, uncalibrated and buggy spartan6 idelays. there is a phase detector that constantly adjusts the tap when there are input transitions.
<sb0>
the Kintex7 DDR3 PHY uses a similar technique for write leveling
<sb0>
(similar to sysref sampling. the idelay phase adjustment is closer to a CDR)
<cr1901_modern>
sb0: What's wrong with adjusting a tap in most cases :P?
<cr1901_modern>
I wonder if that's how the DCM also works to implement a DLL...
<larsc>
sb0: how is the calibration implemented there? just check when the sampled edge moves by one clock cycle?
<larsc>
to find the min and max?
<sb0>
larsc, where?
<sb0>
cr1901_modern, what is wrong is that the spartan6 iodelay goes berserk if its delay is larger than 1 UI. if you start near 1 UI, and then voltage/temp make you go over that, you receive garbage data. the xilinx reference designs also have this bug, and it is unfixable unless you use another fpga. this is why I call it crappy and buggy.
<larsc>
sb0: for the DDR
<sb0>
larsc, the DDR3 chip samples its clock (with a skew you want to determine) using DQS (which is skew balanced) and outputs the information on DQ (where timing is irrelevant here)
<sb0>
so you scan the CLK-DQS delay and determine where the DQ transitions are, this gives you the clock skew at the DDR3 chip
<sb0>
(CLK-DQS delay at the FPGA)
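A minimal Python sketch of the scan sb0 describes; set_dqs_delay() and sample_dq() are hypothetical stand-ins for whatever register interface the PHY exposes:

    # Hypothetical interface: set_dqs_delay(tap) programs the DQS delay line,
    # sample_dq() returns the level the DDR3 chip fed back on DQ, i.e. its clock
    # sampled by the (delayed) DQS.
    def find_ck_dqs_skew(set_dqs_delay, sample_dq, ntaps=32):
        samples = []
        for tap in range(ntaps):
            set_dqs_delay(tap)
            samples.append(sample_dq())
        # the tap at which DQ flips is where DQS crosses the CK edge at the chip,
        # which gives the clock skew we were after
        for tap in range(1, ntaps):
            if samples[tap] != samples[tap - 1]:
                return tap
        return None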
<cr1901_modern>
sb0: The problem is with the voltage/temp making the delay exceed 1 Unit Interval (UI?), correct? In and of itself, it's not a huge deal that delays greater than 1 UI aren't supported (if the behavior didn't go berserk, but, say, just clamped)?
<sb0>
cr1901_modern, if it were clamped then you would not get the delay that gives you the best margins.
<sb0>
additionally, there is no way to know what tap you are using, so you can't determine when it is relevant to reset the delay to 0 and sample in the next clock cycle
<sb0>
it's really a lousy design. 7-series introduced calibration and sane >1UI behavior. ultrascale made tap information available in addition to the 7-series improvements.
<larsc>
what do you mean by tap information?
<cr1901_modern>
Which tap is currently being used I think
<sb0>
you have a 9-bit value that tells you which tap is selected. you can also write that value directly to switch to a new tap.
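For reference, a rough Migen sketch of exposing that tap on an UltraScale IDELAYE3; the primitive port and parameter names are quoted from memory and should be checked against the Xilinx SelectIO documentation:

    from migen import *

    # Sketch only: 9-bit tap readback/override on an UltraScale IDELAYE3
    # (port/parameter names from memory, verify against UG571).
    class TapAccess(Module):
        def __init__(self, din, dout):
            self.tap_rb = Signal(9)   # currently selected tap
            self.tap_wr = Signal(9)   # tap to force
            self.load = Signal()      # pulse to load tap_wr

            self.specials += Instance("IDELAYE3",
                p_DELAY_FORMAT="COUNT", p_DELAY_TYPE="VAR_LOAD",
                i_CLK=ClockSignal(), i_RST=ResetSignal(),
                i_EN_VTC=0, i_CE=0, i_INC=0,
                i_LOAD=self.load, i_CNTVALUEIN=self.tap_wr,
                o_CNTVALUEOUT=self.tap_rb,
                i_IDATAIN=din, o_DATAOUT=dout)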
<larsc>
but, series7 has that as well, no?
<cr1901_modern>
In any case, I need to read up more about this. I've never used the iodelay block, but not being able to delay your signal by more than a single bit period sounds very limiting
<sb0>
spartan-6 has a crappy "calibration" mechanism that selects the tap in the middle of the clock you give it
<sb0>
but there is no way to know what tap that is
<sb0>
so you cannot determine when you may go over 1 UI
<sb0>
ah, yes, 7-series has it
<sb0>
already
<larsc>
and on series7 and ultrascale there is no such auto calibration, so you have to do it in the fabric, right?
<sb0>
but with less precision
<larsc>
5bits I believe
<sb0>
on 7series the taps are calibrated, so you just give it the delay value you want
<cr1901_modern>
I guess the iodelay couldn't be implemented as a Delay-Locked Loop either; a delay of more than a single bit period would be a loss of lock
<sb0>
the autocalibration on spartan6 is just to give it a "reasonable" start value that will make the later phase detector adjustments unlikely to make it go over 1UI or below 0
<cr1901_modern>
(Unless that's what S6 DOES in fact do, and then Xilinx realized that "people need delays larger than a bit period" and changed the behavior)
<sb0>
delay-locked-loops are for periodic signals
<sb0>
hdmi data is not
<cr1901_modern>
Ahhh, that makes sense. I'm just hung up on "why did Xilinx think 1UI limit was a good idea"?
<sb0>
the s6 idelay is not implemented with taps. they tried something cheaper.
<sb0>
it's not a tapped delay line inside.
<cr1901_modern>
Did they intend to make it not work after 1UI, or was that an implementation problem?
<sb0>
they intended it, their cheaper/smaller system does not support it by design.
<sb0>
that in itself can be justifiable, but it should be combined with calibration or accessible "tap" information, ideally both
<cr1901_modern>
Just to make sure I understand correctly: A "UI" is the "period between one data transition on the input line to the next", correct?
<sb0>
yes
<cr1901_modern>
As you've prob guessed by now, I've never used IO delays (though kinda obvious what they do)
<cr1901_modern>
iodelay* blocks
<cr1901_modern>
sb0: "(4:11:30 AM) sb0: cr1901_modern, if it were clamped then you would not get the delay that gives you the best margins." What did you mean by margins, specifically? I presume you mean "if I set the delay to anything besides 0.5UI, some tolerance to an real-world imperfection is reduced b/c the iodelay goes to hell after 1UI". Which real-world imperfection?
<cr1901_modern>
^ Since I've never used the iodelay, I'm not sure what you're using it for.
<larsc>
well, you want to maximize setup and hold
<larsc>
best setup and hold are not necessarily at 0.5 UI though
<cr1901_modern>
larsc: Sorry for being dense, but I think I'm missing something here. What do you mean by sample and hold? (Let me guess; the taps are clocked?)
<larsc>
setup and hold time of the flip-flop you are using to sample the signal
<larsc>
sb0: ok, in order to do something similar to the write leveling but for sysref you could just use a simple IDDR2 and then adjust the delay until one edge sees a 1 and the other a 0
<sb0>
that doesn't help compared to a simple flipflop
<sb0>
nothing tells you that the delay you set is right between the 0 and the 1
<sb0>
and scanning with a IDDR2 to determine the crossing point is not simpler than scanning with a flipflop
<larsc>
I don't understand how I could do it with just a FF
<sb0>
increase the delay until you measure a different value
<sb0>
start over when you want to re-calibrate
<larsc>
but that requires a signal with the same frequency as the clock on sysref, no?
<sb0>
using the iddr2 is equivalent to sampling with the FF once with delay X, starting over, and sampling again with delay X+clock_period/2
<sb0>
yes, of course, everything must be synchronous
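A minimal Migen sketch of that flip-flop scan, assuming a 7-series IDELAYE2 in VAR_LOAD mode with its tap written from software (primitive names from memory; the required IDELAYCTRL is not shown):

    from migen import *

    # sysref goes through an IDELAYE2 and is sampled by a single FF in the sample
    # clock domain; software sweeps self.tap until self.sample changes value.
    class SysrefSampler(Module):
        def __init__(self, sysref):
            self.tap = Signal(5)      # 5-bit tap value written by software
            self.load = Signal()      # pulse to load self.tap
            self.sample = Signal()    # delayed sysref, registered in the sys domain

            sysref_delayed = Signal()
            self.specials += Instance("IDELAYE2",
                p_IDELAY_TYPE="VAR_LOAD", p_DELAY_SRC="IDATAIN",
                p_REFCLK_FREQUENCY=200.0,
                i_C=ClockSignal(), i_LD=self.load, i_CNTVALUEIN=self.tap,
                i_CE=0, i_INC=0,
                i_IDATAIN=sysref, o_DATAOUT=sysref_delayed)
            self.sync += self.sample.eq(sysref_delayed)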
<cr1901_modern>
larsc: Okay, right. Makes sense.
<cr1901_modern>
sb0: So in my contrived example, if we input a signal to the FPGA through an IDELAY, and delayed it by, say 0.99UI, and the actual signal into the flip-flop had a delay that got clamped to 1.0UI, I don't see how that's a big problem. In fact, wouldn't that DECREASE the setup/hold times that would need to be satisfied for the flip flop (because the signal could not be delayed by more than 1.0UI)?
<larsc>
sb0: yes, of course synchronous. but usually the sysref signal is much slower than the sampling clock
<sb0>
then sample less ;)
<sb0>
skip some cycles
<larsc>
ok, I think I understand what you mean
<larsc>
needs some fine tuning for practical application, but the idea should work, thanks
<sb0>
if the pulse is long enough you'll still see some of it
<sb0>
it's like nyquist sampling
<sb0>
your iddr2 trick is like doubling the sample frequency
<sb0>
(but that's never necessary since you can retry almost indefinitely)
<larsc>
I'd have a counter that gives me a ce with the same frequency as sysref but let the counter wrap one clock cycle earlier
<larsc>
so you can go through all possible phase offsets at that level
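Roughly, in Migen (ratio is assumed to be the sysref period in sys clocks):

    from migen import *

    # Strobe at (approximately) the sysref rate whose period is one sys clock
    # short, so it drifts through all coarse phase offsets relative to sysref.
    class SlidingStrobe(Module):
        def __init__(self, ratio):
            self.ce = Signal()
            counter = Signal(max=ratio - 1)   # counts ratio-1 cycles per period
            self.sync += [
                self.ce.eq(0),
                If(counter == 0,
                    counter.eq(ratio - 2),
                    self.ce.eq(1)
                ).Else(
                    counter.eq(counter - 1)
                )
            ]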
<sb0>
you can do that, or just collect a bunch of samples and write some software
<larsc>
once you've found the transition at this coarse level keep it in sync with sysref and use the idelay for fine tuning
<sb0>
i'd have stupid gateware: idelay tap control + collect X samples (with X >= sample clock period/sysref period)
<sb0>
then do the rest in software
<sb0>
that gives you a small integrated LA for debugging
<sb0>
er, 1/X above
<sb0>
or s/period/frequency
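The software half could be as simple as this sketch; set_tap() and capture() are hypothetical hooks into that gateware, with capture() returning enough consecutive samples to span one sysref period:

    # Sweep the IDELAY tap, grab a burst of sysref samples at each setting, and
    # let software find the taps where the captured pattern shifts (the edges).
    def scan_sysref(set_tap, capture, ntaps=32):
        captures = []
        for tap in range(ntaps):
            set_tap(tap)
            captures.append(capture())      # e.g. a list of X consecutive samples
        edges = [t for t in range(1, ntaps) if captures[t] != captures[t - 1]]
        return captures, edges              # captures double as a tiny logic analyzer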
<larsc>
makes sense
<sb0>
maybe the samples should be synched with your internal sysref divider counter, which the software can tune
<sb0>
e.g. by sending it "slip" commands that make it skip or repeat a value
<sb0>
this way you don't have the real-time requirements that a reset or load interface would have, and that software cannot meet
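A sketch of such a slip-able divider in Migen; slip_skip and slip_repeat are made-up control signals that software would pulse whenever convenient:

    from migen import *

    # Free-running sysref divider whose phase can be nudged one sys clock at a
    # time: pulsing slip_skip drops one count value, slip_repeat holds one value.
    class SlipDivider(Module):
        def __init__(self, ratio):
            self.slip_skip = Signal()
            self.slip_repeat = Signal()
            self.count = Signal(max=ratio)

            self.sync += [
                If(self.slip_repeat,
                    self.count.eq(self.count)   # hold: phase moves back one clock
                ).Elif(self.count + self.slip_skip >= ratio - 1,
                    self.count.eq(self.count + 1 + self.slip_skip - ratio)   # wrap
                ).Else(
                    self.count.eq(self.count + 1 + self.slip_skip)   # +2 when skipping
                )
            ]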
<larsc>
now if I could only create a down divided clock that is aligned to sysref
<sb0>
what is your goal?
<sb0>
if you want to align to sysref, and have a multiplied clock, use a pll :)
<larsc>
down divide the external reference clock by 2
<larsc>
but make sure it is aligned to the sysref transition
<sb0>
why not take sysref and multiply it by 2?
<larsc>
sysref is like refclk/128
<larsc>
but there are also modes of operation where sysref can be oneshot
<larsc>
to keep it from leaking into the signal
<larsc>
so in that case you couldn't upmultiply it
<sb0>
btw is that implemented correctly in ADI silicon or should we expect a pandemonium of bugs?
<larsc>
I have no idea
<larsc>
talking to the apps guy later today
<sb0>
those synch mechanisms would probably be simpler if the DAC clock was not free running
<sb0>
turn off clock, reset DACs, first sample is at first edge when you start the clock again
<sb0>
no more complex setup/hold requirements...
<sb0>
also i don't know why jesd looks so complicated, you could probably just dump raw 8b10b encoded data into the DACs
<sb0>
was Intel involved in this again?
<larsc>
sb0: what you are describing is JESD204 without the B
<sb0>
so why did they change it?
<larsc>
people thought this is not enough
<larsc>
not det-lat
<larsc>
no det-lat
<larsc>
clocking was also too complicated
<larsc>
that's why they made it more complicated ;)
<sb0>
no det-lat? how so?
<sb0>
if your first clock edges arrive at the same time then it's det-lat
<larsc>
because of the CDC in the transceiver
<sb0>
well, there would be a number of samples that you send in advance before the first DAC clock to deal with that
<larsc>
and that's the B
<sb0>
no, the B has this sysref signal that has complicated timing
<sb0>
what I advocate is just a single clock signal that you can turn off
<larsc>
but you'd still need a separate clock for the transceiver then
<larsc>
so in a sense sysref is that second clock
<sb0>
yes, but transceivers are self-clocking
<larsc>
but they need a reference
<sb0>
true
<larsc>
soon there will be JESD204C which will probably change everything again
<sb0>
actually in that case it shouldn't be a CDR but more like the HDMI thing
<sb0>
therefore it doesn't need a CDC and the whole sync thing is dubious
<sb0>
you could sync from the transceiver lanes.
<sb0>
this requires good transceiver control at the source though, and therefore no idiotic IBUFDS_GTE2 delay variations or obscure xilinx wizards
<sb0>
I wish FPGA vendors had never put a PCS nor elastic buffers in there...
<sb0>
I begin to have the impression that JESD204 was designed by people without a lot of understanding about high-speed serial. for example, they use the K28.7 special character, even though it causes comma alignment issues and there are other special characters available that are non-problematic
<larsc>
yeah, you have to disable comma alignment after the CGS
<larsc>
whether this was on purpose or not, I don't know
<larsc>
might have been on purpose: knowing that you can't use K28.7 during normal operation, they might have opted for using it as the synchronization character
<sb0>
why is there even a goddamn CGS
<larsc>
you'd want a stable link before you start sending data I'd guess
<sb0>
can't you disable the output via SPI until the link is stable?
<larsc>
but that is the CGS phase then
<larsc>
make sure CDR has locked and characters are aligned
<larsc>
that's what you do in the CGS
<sb0>
but you shouldn't need a separate protocol for this. just send dummy samples.
<larsc>
character alignment?
<larsc>
all you do during CGS is send K28.7
<sb0>
transmit a comma from time to time
<sb0>
K28.7 is a stupid synchronization character because it has a 5-bit uncertainty
<larsc>
how can the synchronization character have uncertainty?
<sb0>
well, if you have a separate sync phase, it's fine
<larsc>
and you need that sync phase anyway
<larsc>
I don't see the issue
<larsc>
it is 28.5 btw
<larsc>
that is used for CGS
<larsc>
28.7 is used as a control character during normal operation
<larsc>
but during normal operation you don't realign
<sb0>
yes. this is stupid.
<sb0>
you can't realign if you have K28.7's.
<sb0>
at least not in a simple way
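The ambiguity is easy to show by brute force, assuming the usual 8b/10b code table (K28.7 = 0011111000/1100000111, K28.5 = 0011111010/1100000101, comma = 0011111 or 1100000):

    # Back-to-back K28.7 characters produce a comma pattern at a second bit
    # offset, while back-to-back K28.5 do not. 10-bit codes are written in
    # transmission order (abcdei fghj).
    COMMA_P, COMMA_N = "0011111", "1100000"

    def comma_offsets(stream):
        # bit offsets (mod 10) at which a comma pattern occurs in the stream
        return sorted({i % 10 for i in range(len(stream) - 6)
                       if stream[i:i+7] in (COMMA_P, COMMA_N)})

    k28_7 = "0011111000" * 4                    # disparity-neutral, same encoding repeats
    k28_5 = ("0011111010" + "1100000101") * 2   # RD- and RD+ encodings alternate

    print("K28.7 comma offsets:", comma_offsets(k28_7))   # [0, 5]: ambiguous alignment
    print("K28.5 comma offsets:", comma_offsets(k28_5))   # [0]: unambiguous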
<sb0>
a good protocol would be: a series of comma + n raw sample data
<sb0>
only one mode, none of this CGS or ILAS bullshit
<sb0>
use a self-synchronizing scrambler, if you need one
<sb0>
turn off the output during link init
<sb0>
no CDC, sync from the transceiver lanes, no sysref
<sb0>
isn't a series of K28.7 pretty aggressive EMI-wise, too?
<larsc>
28.5 is used for sync
<sb0>
yes, but during normal operation aren't you sending strings of K28.7?
<larsc>
not sure what they mean by the strings of F
<larsc>
F is inserted if the last char in a frame is identical between two adjacent frames
<larsc>
which is unlikely when you send real data
<larsc>
and when scrambling is enabled, D28.7 is replaced by K28.7 if it is the last char in a frame
<larsc>
btw. ILAS is to give you additional confidence that your link is really up and running and things are properly aligned the way they should be
<sb0>
what is the purpose of rules like that?
<sb0>
and there are easier ways to test the link.
<sb0>
PRBS checkers are the standard way of doing it. if you use a scrambler, you can simply check that you always get zeros at the output.
<sb0>
so all you need really is 1) something to check that the commas arrive at the same time on all lanes 2) something that can tell you if it has received a non-zero character, post-descrambling. both can be controlled with a pair of SPI registers.
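A toy model of that zero-check; the self-synchronizing scrambler polynomial 1 + x^14 + x^15 is the JESD204B one, and the point is just that an all-zero payload descrambles to zeros regardless of seed or initial sync:

    # Self-synchronizing (de)scrambler with polynomial 1 + x^14 + x^15. With an
    # all-zero payload, the descrambler output is all zeros once its 15-bit
    # history has flushed, whatever the line bits and seed were.
    def scramble(bits, seed):
        s = list(seed)                    # s[0] = most recent scrambled bit
        out = []
        for d in bits:
            b = d ^ s[13] ^ s[14]         # s[n] = d[n] ^ s[n-14] ^ s[n-15]
            out.append(b)
            s = [b] + s[:-1]
        return out

    def descramble(bits):
        s = [0] * 15                      # receiver starts out of sync
        out = []
        for b in bits:
            out.append(b ^ s[13] ^ s[14]) # d[n] = s[n] ^ s[n-14] ^ s[n-15]
            s = [b] + s[:-1]
        return out

    tx = scramble([0] * 100, seed=[1] * 15)   # idle/dummy payload, arbitrary seed
    rx = descramble(tx)
    print(all(b == 0 for b in rx[15:]))       # True once the history has flushed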
<larsc>
it's to check that the transmitter and the receiver agree on the link parameters
<larsc>
it is more than the bare minimum you need
<larsc>
it's meant as a debugging tool
<larsc>
ILAS was optional in 204A and can still be disabled in most 204B converters
<larsc>
if you don't think you need it just disable it
<larsc>
and don't add support for it to your gateware
<whitequark>
ok I'll avoid touching it then, just in case.
<rjo>
whitequark: btw and fyi, i am running backups of lab to my machine. but i am excluding /home because it's huge. if there is anything missing, let me know. the config is in /srv/backup