<azonenberg>
my understanding is that the problem is when you have clock-to-out less than the hold time
<azonenberg>
presumably at the extremes of PTV
<daveshah>
The problem is to do with hold time plus clock skew
<mwk>
azonenberg: also when you have shit time skew
<daveshah>
Without clock skew, Xilinx would be fine
<mwk>
er, clock skew
<azonenberg>
oh if you have clock skew that makes it worse
<daveshah>
But afaik at certain clock region boundaries there is skew
<daveshah>
Yeah
<mwk>
consider the clock region boundary
<daveshah>
Both Xilinx and Intel do take the approach that their tools are good enough they can take more liberties in hardware
<mwk>
Y0 vs Y1
<daveshah>
Lattice don't trust their tooling in the same way
<whitequark>
so in principle (considering the verilog i posted above), if you have BUFG->clka->BUFGCE->clkb, and the delay of BUFGCE plus clock skew is less than hold time,
<mwk>
clock regions are 50 CLBs high
<whitequark>
then there would be no race?
<whitequark>
(can you even connect a BUFGCE to the output of BUFG?)
<mwk>
two immediately neighbouring FFs straddling the boundary have the same path length and delay time between their vertical spine tap and the FF clock input
<mwk>
but the vertical spine taps are 50 CLB heights apart
<mwk>
so if they are not the middle two rows, you have 50 CLBs worth of clock skew between them
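A minimal sketch of the hold check being described, assuming the usual formulation (clock-to-out plus minimum data delay must exceed hold time plus clock skew at the capturing flop). The `Path` class, the `hold_slack` helper, and every number below are illustrative assumptions, not figures for any real Xilinx part.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Timing of one flop-to-flop path; all values are made-up nanoseconds."""
    clk_to_q_ns: float    # minimum clock-to-out of the launching FF
    data_delay_ns: float  # minimum routing delay between the two FFs
    hold_ns: float        # hold requirement of the capturing FF
    skew_ns: float        # capture clock arrival minus launch clock arrival


def hold_slack(p: Path) -> float:
    """Positive slack: new data arrives after the hold window has closed."""
    return p.clk_to_q_ns + p.data_delay_ns - (p.hold_ns + p.skew_ns)


# Two neighbouring FFs fed from the same spine tap: negligible skew.
same_region = Path(clk_to_q_ns=0.10, data_delay_ns=0.05, hold_ns=0.12, skew_ns=0.00)
# Same FFs straddling a clock region boundary: the taps are far apart.
across_boundary = Path(clk_to_q_ns=0.10, data_delay_ns=0.05, hold_ns=0.12, skew_ns=0.20)

print(hold_slack(same_region) > 0)      # hold is met
print(hold_slack(across_boundary) > 0)  # hold is violated
```

With these placeholder values the short neighbour-to-neighbour route is fine inside one region and fails once skew is added, which is why the tools have to pad such paths with extra routing delay.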
<mwk>
whitequark: connecting BUFG output to BUFGCE input is a bad idea
<mwk>
what you want is to have src->BUFG->clka and src->BUFGCE->clkb
<whitequark>
yes, I know I wouldn't want to do this in a real design
<whitequark>
I'm not making an FPGA bitstream, I'm writing a simulator
<whitequark>
so I specifically look at bad ideas
<mwk>
some FPGAs have dedicated paths between BUFG outputs and BUFG inputs
<mwk>
some virtex 4/5/6 definitely does, spartan 3 definitely does not
<whitequark>
aha
<mwk>
let's look it up
<whitequark>
oh another question re hold time. I've heard that DFFs can be implemented as two latches with different active polarity in series
<mwk>
yeah, 7 series BUFG outputs can be connected to BUFG inputs via dedicated paths
<whitequark>
doesn't that basically shift D by 180°?
<daveshah>
whitequark: I think this is even the most common way these days
<mwk>
yes, and that's The Standard Way to do a flop
<whitequark>
so your hold time is at least a half period
<mwk>
no
<whitequark>
hm
<mwk>
hold on, I had a nice demo I showed to my students
<whitequark>
where am I mistaken?
<mwk>
whitequark: only one latch is ever enabled
<whitequark>
sure
<whitequark>
if it's posedge triggered, then the first half cycle, first latch is transparent, and second latch is holding the previous value
<mwk>
yes
<whitequark>
the second half cycle, first latch is holding the value, and second latch is transparent
<whitequark>
assuming the delay of the latches themselves is negligible, this means that Q is exactly D shifted 180°
<whitequark>
no?
<whitequark>
if the delay isn't negligible then it's shifted 180+n°, and unstable for 360-n°
<mwk>
that was the thing that actually helped me understand how that thing works
<mwk>
beware: it's a *negedge* flop
<whitequark>
hm
<mwk>
(click on the inputs on the left to have fun)
<whitequark>
oh it has javascript
<mwk>
so when clock is 0, the left latch is holding state, the right latch is transparent
<mwk>
and since left latch cannot change, right latch is effectively frozen as well
<whitequark>
yep
<mwk>
when 0-to-1 transition happens, left latch suddenly opens (to transparent), and right latch suddenly closes (to hold)
<whitequark>
yep
<mwk>
since this happens at the same time, and they both have output delay, the right latch will always end up holding whatever left latch held before
<mwk>
when clock is 1, the left latch is transparent and the middle data line keeps following input; the right latch is holding
<mwk>
and on the active 1-to-0 transition, the right latch opens and gets whatever value is currently on the middle data line (which was connected directly to input until now, only a single NAND of delay)
<mwk>
at the exact same time, left latch closes and freezes its value
<whitequark>
yep. i understand that
<mwk>
and this is how the whole thing manages to have only 1-2 NAND delay worth of setup/hold time
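The walkthrough above can be sketched as a zero-delay behavioural model: a left latch that is transparent while the clock is 1, feeding a right latch that is transparent while the clock is 0. The `Latch` and `NegedgeFlop` classes are hypothetical names for illustration, and the model ignores the NAND-level delays that give the real circuit its setup/hold window.

```python
class Latch:
    """Level-sensitive D latch: follows d while en is true, otherwise holds."""

    def __init__(self):
        self.q = 0

    def step(self, d, en):
        if en:
            self.q = d
        return self.q


class NegedgeFlop:
    """Two latches in series with opposite enable polarity."""

    def __init__(self):
        self.left = Latch()   # transparent while clk == 1
        self.right = Latch()  # transparent while clk == 0

    def step(self, d, clk):
        mid = self.left.step(d, clk == 1)       # the "middle data line"
        return self.right.step(mid, clk == 0)


def run(stimulus):
    ff = NegedgeFlop()
    return [ff.step(d, clk) for d, clk in stimulus]


# (d, clk) pairs; Q only changes on a 1 -> 0 clock transition.
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1), (1, 0)]
trace = run(stimulus)
print(trace)  # [0, 0, 0, 0, 0, 0, 1]
```

At the first falling edge D had already gone back to 0 while the clock was high, so Q stays 0; the change of D while the clock is low is ignored; only the second falling edge captures a 1.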
<whitequark>
wait
<whitequark>
i know what went wrong
<whitequark>
i used "hold time" incorrectly
<mwk>
and about delaying stuff by 180°
<mwk>
suppose you want to chain two DFFs with opposite clock polarity
<mwk>
(which is a thing you want to do in DDR I/Os)
<mwk>
if you draw the two FFs next to each other, you'll notice that the middle two latches are exactly identical and redundant
<mwk>
so the circuit can actually be optimized to three latches :)
<mwk>
and this is exactly what they actually do in DDR I/O logic
<mwk>
you have three latches in a row; if you select SAME_EDGE ddr, you use all of them; if you select plain FF, you use two of them and bypass the third; and if you select a latch, you bypass all but one
<mwk>
quite elegant
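A small sketch of why the middle latch is redundant, using the same zero-delay latch model (an assumption; real latches have delay). A negedge flop followed by a posedge flop is four latches with enables clk, !clk, !clk, clk; the two middle ones always hold the same value, so dropping one gives the three-latch chain. The `chain` helper is a hypothetical name for illustration.

```python
import random


class Latch:
    """Level-sensitive D latch: follows d while en is true, otherwise holds."""

    def __init__(self):
        self.q = 0

    def step(self, d, en):
        if en:
            self.q = d
        return self.q


def chain(transparent_when_high, stimulus):
    """Pass d through latches in series.

    transparent_when_high[i] says whether latch i is open while clk == 1.
    Returns the output of the last latch for every (d, clk) sample.
    """
    latches = [Latch() for _ in transparent_when_high]
    out = []
    for d, clk in stimulus:
        v = d
        for latch, high in zip(latches, transparent_when_high):
            v = latch.step(v, (clk == 1) == high)
        out.append(v)
    return out


rng = random.Random(0)
# Two samples per clock phase, random data.
stimulus = [(rng.randint(0, 1), (i // 2) % 2) for i in range(200)]

four = chain([True, False, False, True], stimulus)  # negedge FF then posedge FF
three = chain([True, False, True], stimulus)        # middle latch merged
same = four == three
print(same)  # True
```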
<whitequark>
yes, I was wondering about the redundant middle latch
<whitequark>
neat
<whitequark>
also I'm not sure how it's called but when I said "hold time" earlier I meant "period minus propagation delay time" and it's not half period but rather almost the entire period
<whitequark>
it was a fairly dumb question
<whitequark>
because that's just how a DFF works
<whitequark>
but I understand it better now, anyway
<whitequark>
thank you :)
<mwk>
yeah, I'm really glad I found that demo thing :)
<mwk>
worked wonders for my class as well
<whitequark>
it turns out that I understood the basic structure just fine without the demo, actually
<whitequark>
but I never really thought about the timing implications of it properly before
<mwk>
ah, fair enough
<whitequark>
although it's definitely useful to see exactly how it's implemented on gate level, too
<whitequark>
hm
<whitequark>
the SB_IO circuit in the datasheet shows two FFs and a mux selected by the clock
<whitequark>
for the DDR output
<whitequark>
I guess that's a lie then
<mwk>
does ice40 have something like SAME_EDGE mode?
<mwk>
ie. for the output register, the data for both phases is actually sampled at the same edge of the clock
<mwk>
to make interfacing from the fabric easier
<whitequark>
no, you have to do that yourself
<whitequark>
(nmigen inserts this FF :)
<mwk>
ah, then they don't use that trick
<whitequark>
ohh
<mwk>
also tbh it's not really that awesome
<mwk>
given the area proportion between a single latch and a big honking I/O pad
<mwk>
it'd be really hard to notice it at all
<whitequark>
well, it's cute
<mwk>
agreed
<cr1901_modern>
>(nmigen inserts this FF :)
<cr1901_modern>
To simulate SAME_EDGE mode?
<mwk>
of course, what else
<mwk>
you want these two bits in the same clock domain to do anything with them
<whitequark>
yep
<whitequark>
i mean, nextpnr actually deals with it just fine, but your Fmax drops in hal
<whitequark>
*half
<whitequark>
so you probably don't want that :p
<cr1901_modern>
>of course, what else
<cr1901_modern>
Sorry, long day lol
pie__ has quit [Ping timeout: 240 seconds]
Bike has quit [Quit: Lost terminal]
nrossi has joined ##openfpga
rohitksingh has quit [Ping timeout: 245 seconds]
ZombieChicken has quit [Ping timeout: 240 seconds]
<OmniMancer>
cool, are those collected via fuzzing?
<daveshah>
Nope, by parsing the Lattice CSV files
<daveshah>
They use names like PT42A that directly correspond to site locations
<daveshah>
(i.e. top row, col 42, PIO A)
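A rough sketch of decoding such a name. Only the `PT42A` example (top row, col 42, PIO A) comes from the discussion; the `B`/`L`/`R` edge letters, the `A`-`D` PIO range, and the `parse_site` helper are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Site(NamedTuple):
    edge: str   # which edge of the die the pad sits on
    index: int  # column (top/bottom) or row (left/right) along that edge
    pio: str    # PIO letter within the tile


# Assumed pattern: "P", an edge letter, a number, a PIO letter.
_SITE_RE = re.compile(r"^P([TBLR])(\d+)([A-D])$")
_EDGES = {"T": "top", "B": "bottom", "L": "left", "R": "right"}


def parse_site(name: str) -> Site:
    m = _SITE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised pad name: {name!r}")
    edge, index, pio = m.groups()
    return Site(_EDGES[edge], int(index), pio)


print(parse_site("PT42A"))  # Site(edge='top', index=42, pio='A')
```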
pie_ has joined ##openfpga
<OmniMancer>
ah
mifune has quit [Ping timeout: 240 seconds]
mifune has joined ##openfpga
m4ssi has joined ##openfpga
massi_ has joined ##openfpga
BusterTheDummy has joined ##openfpga
keesj_ has joined ##openfpga
m4ssi has quit [Excess Flood]
IanMalcolm has quit [Remote host closed the connection]
keesj has quit [Ping timeout: 240 seconds]
BusterTheDummy has quit [Quit: ZNC 1.7.5 - https://znc.in]
IanMalcolm has joined ##openfpga
<OmniMancer>
Hmmm, I am not sure if all the tiles are real or which are abstractions
<OmniMancer>
daveshah: in the nextpnr generic backend, what does the "location" of a wire represent?
<daveshah>
OmniMancer: it's just a nominal point used for delay estimates
<daveshah>
Either source location or some kind of midpoint would work
<OmniMancer>
and "pip" locations are then?
<OmniMancer>
the location of the sink?
<daveshah>
The location of the switch
<daveshah>
Usually where the bitstream bits are for non pseudo pips
<OmniMancer>
so the location of the sink end of the wire is fine then
Asu has joined ##openfpga
<OmniMancer>
it seems each input can be explicitly tied to ground, but how does one determine what state an input is in when no mux setting is applied?
<OmniMancer>
daveshah: do tile locations suffice for the delay estimate?
<daveshah>
Yes
<daveshah>
Tile location is what ecp5 and ice40 use
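To illustrate what a tile-location-based estimate can look like, here is a sketch using Manhattan distance between tile coordinates. This is not the nextpnr API; `estimate_delay_ns` and both constants are hypothetical placeholders that would have to be fitted to the real device.

```python
def estimate_delay_ns(src, dst, base_ns=0.2, per_tile_ns=0.05):
    """Rough routing delay estimate between two (x, y) tile locations.

    base_ns and per_tile_ns are placeholder constants, not measured values.
    """
    (sx, sy), (dx, dy) = src, dst
    return base_ns + per_tile_ns * (abs(sx - dx) + abs(sy - dy))


print(estimate_delay_ns((1, 2), (4, 6)))  # 7 tiles apart
```

Because only a nominal point is needed, it matters little whether a wire's location is its source or a midpoint, as long as the choice is consistent.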
<daveshah>
Some experimentation might be needed for no mux setting values
<daveshah>
Seeing what the tool does in certain cases
<OmniMancer>
well AFAIK the tool's default state is to set no bits
<OmniMancer>
so I sort of expect that an all 0s bitfile will do nothing
<OmniMancer>
I suppose LUT/FF inputs can be inferred by constructing a design that will give one or the other result based on the unconnected input state
<OmniMancer>
Hmmm the PLLs in this part can apparently be dynamically configured
<daveshah>
I would be careful with experimental results, a floating signal could end up either way depending on circumstances so a single experiment won't be perfect
<daveshah>
Stuff usually floats high, although notably ECP5 still has explicit connections to 1 too
<daveshah>
Xilinx otoh a connection to 1 doesn't set any bits
<sorear>
does a single floating signal affect static power enough to measure?
<OmniMancer>
No idea yet
<OmniMancer>
I would have to set up a test
<OmniMancer>
Or ask someone else to
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 240 seconds]
X-Scale` is now known as X-Scale
<OmniMancer>
Hmm it seems each tile only has 2 possible global clock inputs
<OmniMancer>
but those global clock wires can be routed into the fabric
X-Scale has quit [Ping timeout: 265 seconds]
X-Scale has joined ##openfpga
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 250 seconds]
X-Scale` is now known as X-Scale
<OmniMancer>
hmmm, it appears you only get to pick a clock for the mslices and a clock for the lslices in a tile
<OmniMancer>
I suspect the local wires are used to bridge the gaps in which interconnect wires can connect to which inputs
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 240 seconds]
X-Scale` is now known as X-Scale
OmniMancer has quit [Quit: Leaving.]
freemint has joined ##openfpga
dh73 has joined ##openfpga
genii has joined ##openfpga
<tnt>
Does anyone know if the sd card 4 bit mode is documented somewhere publicly?
<tnt>
I mostly just find the spi mode documented.
massi_ has quit [Remote host closed the connection]
<miek>
there are some details in the SDIO simplified specification, it doesn't look like the full spec is public
<tnt>
yeah, I was hoping it leaked somewhere in all this time :p
<tnt>
miek: btw, unrelated but ... I have your SMA antenna still.
Finde has quit [Ping timeout: 245 seconds]
<kc8apf>
tnt: I see 4-bit mode described in Section 3.6.1 of SD Specifications Part 1 Physical Layer Simplified Specification
<kc8apf>
unless you mean UHS-II which is in a separate addendum
Finde has joined ##openfpga
<gruetzkopf>
is the sd express stuff publicly described?
freemint has quit [Quit: Leaving]
<miek>
tnt: oh yeah, i'll get it at congress :)
<kc8apf>
wtf. sd express repurposes the UHS-II pins for PCIe lanes and steals a few pins from UHS-I for REFCLK, PERRST#, and CLKREQ#.
<kc8apf>
their whitepaper claims that as long as you connect the PCIe signals to the right pins, it will train and show up as a normal NVMe device
<gruetzkopf>
oh, neat
<GenTooMan>
question: is that outside the specification for the bus? i.e. are they doing something that it wasn't designed for?
<kc8apf>
doesn't look like it. They recommend talking over SDIO to determine card capabilities but they explicitly mention PCIe initialization is supported
<cr1901_modern>
tnt: Simplified spec is enough to build a 4-bit mode core if you want something that "just works".
<GenTooMan>
As long as they don't go outside the specification protocol wise or bus wise it's probably fine to do.
m4ssi has joined ##openfpga
m4ssi has quit [Remote host closed the connection]
<GenTooMan>
daveshah thanks for the hint about nextpnr set_frequency in the pcf file; with that change it now works correctly.
dh73 has quit [Quit: Leaving.]
mumptai has joined ##openfpga
freemint has joined ##openfpga
dh73 has joined ##openfpga
marcan has quit [Remote host closed the connection]
marcan has joined ##openfpga
<azonenberg_work>
pcie sd cards???
<azonenberg_work>
the power density of that must be ridiculous
<kc8apf>
they claim 1.8W max
<kc8apf>
which is.....a lot
rohitksingh has quit [Ping timeout: 250 seconds]
<TD-Linux>
slightly weirded out by a future where sd cards have dma access to your system
<ZirconiumX>
Or where PCIe turns into the USB philosophy of "Everything over ~~USB~~ PCIe"
<sorear>
when do we get the future where acs/ats are widely supported
freemint has quit [Ping timeout: 245 seconds]
nrossi has quit [Quit: Connection closed for inactivity]
mkru has joined ##openfpga
lopsided98 has quit [Remote host closed the connection]
<kc8apf>
sorear: Microsoft's Secured Core program requires firmware to enable IOMMU with all devices restricted