<sb0>
well, the XIP bug is a low-priority one; the boards will boot from the flash in most real-use situations...
<sb0>
have you tested it?
<sb0>
if it works, please send a patch to the ML so that Uwe can merge it :)
<_florent_>
IIRC it was working with non-XIP bitstreams, but not with XIP bitstreams, but I'll retry that today
<sb0>
okay
<sb0>
I wonder if the K7 delays really allow several transitions in the buffer...
<sb0>
things are acting pretty weird when trying to compensate for DDR3 skew
<_florent_>
maybe I should also implement the fly-by delay in the simulation...
<sb0>
by the way, do you have experience with dynamic reconfiguration of the k7 PLLs, or clock rerouting?
<sb0>
perhaps it could help to have some mode with the DLL disabled and a very low frequency, so we can actually know and change what is in the memory array
<_florent_>
not really, I've only reused code for dynamic reconfiguration
<sb0>
but if the k7 is anything like the slowtan6, this will go horribly wrong...
<sb0>
yeah, the fly-by delay should be in the simulation. it's large enough to cause a lot of trouble ...
<sb0>
interestingly enough, I get consistent data on DQ0-7 when removing the other pins in the platform file, and one bit flips sometimes when having all the pins
<sb0>
(read data)
<sb0>
DQ8-31 are consistent
<_florent_>
yes interesting
<_florent_>
for my test I removed all pins related to modules 1 to 7 (DQ8-63) and only kept pins related to module 0 (DQ0-7)
<sb0>
yes, same
<_florent_>
for info, the upstream kc705 target is broken, ddrphy is missing csr_map
<sb0>
fixed - forgot to commit it
<GitHub73>
[misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/c93qOg
<sb0>
p179 of the micron datasheet says clearly that DQS is don't care before the preamble - but removing spurious pulses could help with making sure the SDRAM is receiving DQS when we think it is ...
<_florent_>
yes I'm running the simulation first with the upstream code and only 1 module and will see if there is a warning about DQS, then I'll try removing the pulses
fengling_ is now known as fengling
<sb0>
_florent_, does the bitslip fetch data before or after the word that exists initially (before any bitslip operation)?
<sb0>
eg you have the data AB1234XY
<sb0>
before bitslip you have 1234
<sb0>
what do you get in the different bitslip states?
<sb0>
there is _no_ specification of that in the xilinx datasheet afaict :/
<_florent_>
with bitslip=0 you have the max latency
<_florent_>
by increasing bitslip you reduce latency
<sb0>
and they encrypted the verilog model for the serdes. very clever ...
<sb0>
so you get A/B, and not X/Y?
<_florent_>
it's explained in UG471 P157
<_florent_>
what's the temporal order of AB1234XY?
<sb0>
A first, then B, then 1, ...
<sb0>
UG471 says that the bitslip does a vague "reordering" of the words extracted from those sequences that are periodic with a period equal to the SERDES width...
<_florent_>
ok, so if 1234 is output with bitslip=0, by increasing it you will be able to output 234X, or 34XY, but you have to look at UG471 P157 for the exact details
masal has joined #m-labs
masal has left #m-labs [#m-labs]
<sb0>
ok, so it shifts the sampling window *later* in time, which should be equivalent to *increasing* the IDELAY by a discrete amount of bit times, correct?
<_florent_>
no, for me it's the opposite...
<_florent_>
with ISERDESE2 in NETWORKING mode, you add 1 full sys_clk of latency, and thus when it's outputting data, new data is already sampled inside the ISERDES
<_florent_>
bitslip only allows you to select between old and new data on the output
<sb0>
ah yes, you're right
<sb0>
it shifts the sampling window *later* in time, which should be equivalent to *decreasing* the IDELAY by a discrete amount of bit times
<_florent_>
for the bitslip part it seems ok, but in DDR3 you have to look at Figure 3-11 P157 for the output pattern
<_florent_>
in DDR mode sorry
fengling has quit [Quit: WeeChat 1.0]
<sb0>
yes, but the difference is only in the number of bitslip pulses you need to send to reach a certain state, correct?
<_florent_>
yes
<sb0>
ok
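The bitslip behaviour _florent_ describes (1234 with bitslip=0, then 234X, then 34XY) amounts to sliding a fixed-width window along the serial stream. A minimal sketch, assuming a 4-bit window and the example stream from the discussion; `window`, `start` and `width` are illustrative names, not anything from the ISERDESE2 or MiSoC. It only models the logical shift, not the number of pulses needed in DDR mode.

```python
def window(stream, start, width, bitslip):
    """Return the word seen after `bitslip` logical one-bit shifts.

    `stream` is in temporal order (leftmost symbol arrives first).
    Each logical shift moves the window one bit later in time.
    """
    begin = start + bitslip
    return stream[begin:begin + width]


stream = "AB1234XY"  # A first, then B, then 1, ...

for n in range(3):
    print(n, window(stream, start=2, width=4, bitslip=n))
```

With this model the window only ever moves later in the stream, towards X/Y and never back towards A/B.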
<sb0>
one thing that does not make sense is why some DQs *always* have inconsistent values when reading, no matter what the IDELAY setting is
<_florent_>
I don't understand either...
<_florent_>
do you calibrate IDELAY on groups of 8 bits, or on each bit?
<sb0>
on groups of 8 bits
<sb0>
maybe there's a lot of jitter... I hope not :/
<sb0>
or the IDELAY is not working correctly
<sb0>
of course, ug471 had to use a different initial pattern in order to make it more difficult to search for the DDR bitslip state that corresponds to a given SDR bitslip state
<sb0>
well I'm too forgiving here... why did they have that difference to start with
<sb0>
%§&/! this DDR bitslip state reordering is a pure annoyance
<sb0>
totally useless, just causes problems
<_florent_>
"In DDR mode, every Bitslip operation causes the output pattern to alternate between a shift right by one and shift left by three"
<sb0>
yeah. WHY?
<_florent_>
don't know, but it seems "easier" to understand than the Figure 3-11...
<sb0>
yeah, fig 3-11 is useless
<sb0>
so, number of shifts wanted -> number of pulses needed in this idiotic DDR mode
<sb0>
0 -> 0
<sb0>
1 -> 3
<sb0>
2 -> 2
<sb0>
3 -> 5
<sb0>
4 -> 4
<sb0>
5 -> 7
<sb0>
6 -> 6
<sb0>
7 -> 1
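The table above can be derived from the UG471 sentence _florent_ quotes ("alternate between a shift right by one and shift left by three"). A minimal sketch, assuming an 8-bit SERDES width, that the first pulse is the shift right by one, and that sb0's "shifts wanted" are counted in the shift-left direction; `ddr_bitslip_pulses` is a hypothetical helper name.

```python
def ddr_bitslip_pulses(width=8):
    """Map each wanted shift (mod width) to the number of DDR bitslip pulses.

    Assumption: pulses alternate "right by one" (-1) and "left by three" (+3),
    starting with the right shift.
    """
    table = {}
    shift = 0
    for pulses in range(width):
        # Record the first pulse count that reaches this net shift.
        table.setdefault(shift % width, pulses)
        shift += -1 if pulses % 2 == 0 else 3
    return table


for wanted, pulses in sorted(ddr_bitslip_pulses().items()):
    print(wanted, "->", pulses)
```

Under these assumptions the net shift after two pulses is +2, so even shifts need as many pulses as shifts, while odd shifts need two extra pulses (modulo 8).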
<_florent_>
so you will have to reset delay between each test
<sb0>
ok. things start to make sense. on the higher DQ group (with >1ns skew from the write leveling), after swallowing one bit of skew using the bitslip and then adding some delay, I don't see read errors anymore
<_florent_>
idelay
<sb0>
yes
<sb0>
I'm not writing yet... just reading whatever garbage is in the memory array (which conveniently is pretty random)
<_florent_>
sorry, so it's like idelay does not support multiple transisions?
<_florent_>
transitions
<sb0>
idelay would only *increase* the delay. for the higher DQ group which receives its read command later and therefore outputs its read data later, we need to *decrease* it.
<_florent_>
indeed...
<sb0>
it could be that the persistent read errors I was seeing were due to the SERDES sampling outside the burst, and it looked like data due to electrical noise ...
<sb0>
I have added a hack to sdrrderr to see where the errors occur within the burst
<_florent_>
ok, I have to go, will be back after lunch
<_florent_>
BTW the DQS pulses do not trigger warnings in simulations
<_florent_>
and I don't see other warnings/errors related to timing
<sb0>
yeah. they just make it harder to see if the DRAM is sampling the correct burst
<sb0>
as it would still be a correct write (as there are more DQS pulses) if the DRAM sampling window is shifted by a few cycles
<sb0>
the datasheet does say that DQS is don't care before the preamble and after the postamble ...
<GitHub141>
[misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/jHo9BA
<GitHub141>
misoc/master 2234f50 Sebastien Bourdeauducq: k7ddrphy: add bitslip control for incoming DQ
<sb0>
alternatively, we can set DQ to zero when the data is invalid. this way, the DRAM behaves in a specified manner when its burst sampling window is off
<sb0>
that could make it easier to debug, as we'd see zeros in the data read back, instead of some unspecified (and probably messy) behaviour
_florent_ has quit [Ping timeout: 240 seconds]
_florent_ has joined #m-labs
nicksydney_ has joined #m-labs
nicksydney has quit [Ping timeout: 252 seconds]
<_florent_>
sb0, DQ is already set to 0 by LASMICON when it is not writing, so with memtest, during the preamble sys_clk and postamble sys_clk the PHY sends zeros to the DRAM
<sb0>
hmm, ok, dfii should do the same then
mumptai has joined #m-labs
<sb0>
since we are doing this sort of PHY tweaking using dfii ...
<sb0>
memory controllers do not _have_ to do it, since issues like burst window alignment should be sorted out at that stage ...
<_florent_>
I've tried to change dqs_pattern on preamble and postamble but wasn't able to have something working due to OSERDESE2 strange behaviour...
nicksydney has joined #m-labs
nicksydney_ has quit [Ping timeout: 252 seconds]
<sb0>
interesting. and what makes you think that the OSERDESE2 doesn't similarly act up when dealing with the data?
<_florent_>
I don't know exactly, but when asserting T1 (oe) and applying the preamble pattern in the same sys_clk cycle, the new pattern does not seem to be used...
<sb0>
ok I managed to get consistent data with all DQ groups, over 1.6 million reads
<sb0>
I used one bitslip (3 pulses with the stupid DDR mode) on the three upper (high skew) DQ groups, plus some delay
<sb0>
and some delay on the lower DQ group
<sb0>
and by changing the lower 3 address bits, I get the expected burst reordering. so it sounds like I'm definitely reading the DRAM array correctly.
<_florent_>
great
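The burst reordering sb0 checks by changing the lower 3 address bits can be modelled as follows. This is a sketch of the standard DDR3 BL8 read ordering; which burst type (sequential or interleaved) was programmed in the mode register is not stated in the discussion, so both are shown, and `ddr3_bl8_read_order` is a hypothetical helper name.

```python
def ddr3_bl8_read_order(start, interleaved=False):
    """Column offsets (0-7) in the order a BL8 read burst returns them.

    `start` is the value of the lower 3 column address bits.
    """
    if interleaved:
        return [start ^ i for i in range(8)]
    # Sequential type: wrap within each 4-word half, starting half first.
    return [((start & 4) ^ (i & 4)) | ((start + i) & 3) for i in range(8)]


for start in range(8):
    print(start, ddr3_bl8_read_order(start))
```

Comparing the data read back at each start address against a reordering like this is what confirms that the words really come from the DRAM array and not from noise.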
<sb0>
with lasmicon: 881/1048576 failed 32-bit words
<sb0>
sdrwr seems to write the pattern correctly, though I have only made one attempt which statistically is not very significant.
<sb0>
I've never had this low error rate though :-)
<_florent_>
yes it's better than what I had
<sb0>
but there can be non-PHY sources (eg bad timing in the controller), so the next test is to exercise writes to the page buffer with dfii
<sb0>
I wonder how to automate this calibration.. right now it's just a lot of manual guesswork and it might not work on another PCB/SODIMM...
<_florent_>
at least memtest with size of 2048 bytes and l2_size of 128 bytes does not trigger timing issues in the controller
<_florent_>
for read leveling you can also use the DRAM pattern to find the center of the sampling window
<sb0>
I guess you can assume that there is less than 2 bit times of skew across the module
<sb0>
so during write leveling, if you are already sampling CK high at a DQS transition with no delay, then this DQ group has between 1 and 2 bit times of skew
<sb0>
in this case, 1) move the DQS edge in the CK=0 zone before continuing leveling 2) add one bitslip on the read path
<sb0>
finally, increase read delays for each DQ group until read data is consistent
<sb0>
before the last step, you may want to fill the DRAM array with random data
<sb0>
I guess this algo should work for simple DDR3 systems...
<sb0>
well, "simple"
<sb0>
DDR3 is a mess
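The leveling steps sb0 outlines above can be sketched against a toy model. Everything here is an assumption for illustration: `ToyGroup` is a hypothetical stand-in for one DQ group (not the MiSoC PHY API), `BIT = 16` taps per bit time and the eye margin are made-up numbers, and boundary cases (skew of exactly 0 or 1 bit time) are ignored.

```python
BIT = 16          # toy value: delay taps per bit time (assumption)
PERIOD = 2 * BIT  # one CK period spans two bit times in DDR
MARGIN = 3        # toy value: taps to stay away from a data transition


class ToyGroup:
    """Hypothetical model of one DQ group with fly-by skew < 2 * BIT."""

    def __init__(self, skew):
        self.skew = skew
        self.write_delay = 0
        self.read_delay = 0
        self.bitslips = 0  # logical one-bit shifts on the read path

    def sample_ck(self):
        # Write leveling feedback: CK level the DRAM sees at the DQS edge.
        return (self.write_delay - self.skew) % PERIOD < BIT

    def reads_consistent(self):
        # Repeated reads agree when sampling away from bit transitions.
        phase = (self.skew + self.read_delay) % BIT
        return MARGIN <= phase <= BIT - MARGIN


def calibrate(group):
    # CK already high with no delay: between 1 and 2 bit times of skew.
    if group.sample_ck():
        # 1) move the DQS edge into the CK=0 zone before leveling
        while group.sample_ck():
            group.write_delay += 1
        # 2) add one bitslip on the read path
        group.bitslips += 1
    # Regular write leveling: delay DQS until CK is sampled high.
    while not group.sample_ck():
        group.write_delay += 1
    # (On hardware, fill the DRAM array with random data before this step.)
    while not group.reads_consistent():
        group.read_delay += 1
    return group


for skew in (5, 17, 20):
    g = calibrate(ToyGroup(skew))
    print(skew, g.write_delay, g.bitslips, g.read_delay)
```

This stops at the first read delay that gives consistent data; as _florent_ suggests, a more robust variant would scan the whole window and pick its center.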
<_florent_>
IIRC LASMICON does not handle ODT
<sb0>
isn't dynamic ODT only needed for multi-rank systems?
<_florent_>
maybe the last errors we have come from here...
<sb0>
and on single-rank systems, you can just drive ODT=1 all the time
mithro has joined #m-labs
<_florent_>
... yes sorry, I'm reading TN4104.pdf and it seems you are right
<_florent_>
it seems we are also not using DCI termination on DQ/DQS