Topic for #milkymist is now Milkymist One, Milkymist SoC & Flickernoise development channel (LLHDL/Antares are welcome too) :: Logs: http://en.qi-hardware.com/mmlogs :: JFDI
mumptai joined #milkymist
wolfspraul joined #milkymist
wolfspraul joined #milkymist
wpwrak joined #milkymist
mumptai joined #milkymist
<wpwrak> let's see if git-send-email works ...
<kristianpaul> seems it does
<wolfspraul> wpwrak: oh wow, lots of patches
<wolfspraul> 'full-speed almost works now', that sounds like the rest can also be fixed in software?
<wolfspraul> I am referring to my question from a few days ago whether we can rule out hardware bugs or needed improvements to support full-speed.
<wolfspraul> I guess now we can?
<wpwrak> let's see. it could still be that we're weak in the analog domain
<wpwrak> but at least we're not hopeless :)
<wpwrak> now i'm unrolling the loop of usb_rx ... that should improve timing quite a lot ... and make the code really cryptic ;-)
<wolfspraul> [analog domain] you mean the signals going out on the wire are not clean, wrong timing?
<wolfspraul> can't we just look at that on a scope and tell whether it's good or not?
<wpwrak> there could be signal distortion, yes
<wpwrak> tricky. a 100 MHz / 400 MSa/s scope should be sufficient for this kind of task, but i get a lot of noise, so it's hard to see the exact shape
<wolfspraul> and the noise is coming from the m1 board?
<wpwrak> another way to find out is to check the CRC, and count errors
<wpwrak> no, from my environment, the measurement setup
<wolfspraul> should adam take a quick measurement? when he goes to minbo or xray shop or other places, maybe they have a good scope standing around - if measurement is easy...
<wolfspraul> sounds like just power m1, plug in usb keyboard, take measurement
<wpwrak> something like that, yes :)
<wpwrak> you have to be careful not to distort the signal with the probe
<wpwrak> usb doesn't like capacitive loads on the signals
<wpwrak> already full-speed is a bit on the finicky side
<wolfspraul> hmm, well. should we do it or not?
<wolfspraul> in case we do, can you describe in a line or two how you would do the measurement setup?
<wpwrak> can't hurt if he has a look, if he feels it's useful. there's always the chance of someone noticing something. and maybe he can find a fast scope with a FET probe. like that GHz monster we had at openmoko. with such a device, you rule :)
<wpwrak> (okay, that one ran windows, so in that particular case, you still lose)
<wpwrak> hmm. 12 bit times, down from 16. still darn tight.
<wpwrak> (sample size 1, as any good statistician would)
<mumptai> full-speed usb should be below 100MHz bandwidth
<wpwrak> let's see ... full-speed rise/fall time is 4-20 ns. let's say 10 ns. for an accuracy of 5% or better, according to lecroy, the scope's rise/fall time should be 1/3 of that.
<wpwrak> the 3 dB rise/fall time would be 1/3/analog_bw. so yes, a 100 MHz scope would just do. the next problem is the sample rate. some low-cost brands have really low sample rates. e.g., mine does typically only up to 200 MSa/s. 400 MSa/s if i use only one channel.
<wpwrak> (rigol improved that in the successor)
<wpwrak> the rule of thumb is somewhere around sample rate = 5 x signal bandwidth
<wpwrak> so my scope could barely make sense of such a signal
<wpwrak> of course, i could spot any truly massive distortion, but i couldn't tell with certainty whether the signal meets specs
<wpwrak> wolfspraul: one of these cases where expensive tools do improve the results. with a slow scope, there's just a lot you can't see
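[editor's note] The scope arithmetic above can be reproduced with a small sketch. It only encodes the rules of thumb quoted in the discussion (scope rise time of roughly 1/(3 x bandwidth), scope about 3x faster than the edge, sample rate of about 5x the signal bandwidth). The function names are illustrative, and the 10 ns edge and 100 MHz bandwidth are the assumed figures from the chat, not measured values.

```python
# Rough scope-adequacy check for USB full-speed edges.
# Function names are illustrative; the constants come from the rules of
# thumb quoted in the chat, not from a datasheet.

def scope_rise_time_ns(analog_bw_mhz: float) -> float:
    """Approximate scope rise time as 1 / (3 * bandwidth), in ns."""
    return 1e3 / (3.0 * analog_bw_mhz)

def required_scope_rise_time_ns(signal_rise_ns: float) -> float:
    """The scope should be about 3x faster than the edge it measures."""
    return signal_rise_ns / 3.0

def recommended_sample_rate_msa(signal_bw_mhz: float) -> float:
    """Rule of thumb: sample rate of about 5x the signal bandwidth."""
    return 5.0 * signal_bw_mhz

signal_rise_ns = 10.0   # assumed edge, within the 4-20 ns full-speed range
scope_bw_mhz = 100.0    # assumed analog bandwidth of the scope

have = scope_rise_time_ns(scope_bw_mhz)
need = required_scope_rise_time_ns(signal_rise_ns)
wanted_rate = recommended_sample_rate_msa(scope_bw_mhz)

print(f"scope rise time {have:.2f} ns, needed at most about {need:.2f} ns")
print(f"sample rate wanted: about {wanted_rate:.0f} MSa/s")
```

With these inputs the scope rise time lands right at the limit, and the wanted sample rate is above the 400 MSa/s mentioned for single-channel use, which matches the "would just do" and "barely" verdicts.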
<wpwrak> grr. 15 cycles. despite adding extra exit points to the poll loop. this is messy.
mumptai joined #milkymist
<wpwrak> hehe, 6 cycles. that's more like it ;)
<wolfspraul> wpwrak: oh, you totally misquote me and out of context too. I have not said anything against expensive tools :-)
<wolfspraul> it's safe to assume that in most cases there is a reason they are expensive, i.e. someone buys them and they create that value.
<wolfspraul> that's why I'm wondering now how we can effectively and quickly rule out analog usb issues... very aware that bad tools may complicate our effort and even lead us in the wrong direction.
<wolfspraul> one time we had this nasty potential sdram problem, and Sebastien got access to a high-end scope at Xilinx to track down the issue... which was great!
<wolfspraul> otherwise I take the risk on manufacturing to produce something I need to repair or even discard later
<wpwrak> yeah. it's nice to have definite answers on such issues. not things that are guesstimates based on indirect evidence, etc. a bit like they do astronomy of distant planets :)
<wolfspraul> man, I cannot believe I still haven't done another news release
<wolfspraul> urgh
<wolfspraul> MUST get it done before Monday!
<wolfspraul> (note to self)
<wpwrak> eleven days left until the date for the quarterly news :)
<wpwrak> or sooner. always better :)
mumptai joined #milkymist
cjdavis joined #milkymist
isa_ joined #milkymist
rejon joined #milkymist
r33p joined #milkymist
azonenberg joined #milkymist
isa_ joined #milkymist
rejon joined #milkymist
mumptai joined #milkymist
sb0 joined #milkymist
antgreen joined #milkymist
r33p joined #milkymist
aeris- joined #milkymist
gbraad_ joined #milkymist
kilae joined #milkymist
mumptai joined #milkymist
DJTachyon joined #milkymist
stekern joined #milkymist
DJTachyon joined #milkymist
r33p joined #milkymist
sb0 joined #milkymist
* kristianpaul click
<kristianpaul> sb0: isn't that what you do with hpdmc ?
<sb0> no, hpdmc is in-order
<sb0> with fast page mode and pipelining
<sb0> I don't know if the article says it (haven't finished reading yet) but OOO controllers have a latency penalty
<sb0> if the cores using the DRAM can't maintain performance in the presence of the extra latency, the benefit of OOO diminishes
<sb0> but I think the future is OOO and prefetching. especially if we use DDR3 (or more) someday.
<sb0> well it just says: "However, since reordering improves efficiency and therefore reduces memory controller occupancy, the result is to improve average read latency.". I'm not sure the case is so clear-cut, especially with DDR1
<wpwrak> ah, what would be needed to use DDR2/3 in M1 ? could pin-compatible chips plus a SoC change be enough ? or would it be a more complex redesign ?
<kristianpaul> oh, FEL was removed from F16, sb0 ?
<sb0> way more complex redesign
<sb0> different voltage, different package, different pinout, different timings
<sb0> DDR 2/3 is BGA
<kristianpaul> but is DDR3 with a 80Mhz worth it ?
<wpwrak> ah, pity. it's never easy, isn't it ? :(
<kristianpaul> 80mhz soc*
<sb0> we'd use a clock multiplier
<sb0> and probably the memory controller would generate several DRAM commands in one cycle at 80MHz that would then be serialized and sent into the DRAM in 10 cycles at 800MHz
<kristianpaul> :-)
<sb0> wpwrak, I think there is already a 4x increase in memory bandwidth possible with the current system
<sb0> we can use a 2x clock multiplier and serdes
<sb0> and get another 2x increase with OOO and prefetching
<wpwrak> oh, wow
<wpwrak> 1080p and deep color, here we come ! ;-)
<sb0> btw, the TMU already has an experimental prefetch system in SoC head
<sb0> 'experimental' = it works, but the performance boost isn't so high
<sb0> some 30% with the current memory system
<sb0> if you want to know how it's done, it's explained here: http://www.graphics.stanford.edu/papers/texture_prefetch/texture_prefetch_down.pdf
<sb0> without the reorder buffer of course, there's instead a simple register that holds one transaction
<sb0> that's actually one big limiting factor
<sb0> s/register/FIFO
<wpwrak> hmm. something is still screwy with USB timing. that MS mouse comes and goes. make a small tweak and it dies. make another tweak and it works again.
<sb0> full HD would need more unfortunately, especially if we use 10:10:10 colors (which I think is the right thing to do, with more potential than resolution)
<sb0> but we can still scale on the fly (-:C
<sb0> (for the GUI it should be OK though)
<wpwrak> hm yes, 4x even isn't quite enough for 1024x768 (2.56x) if the pixel size doubles too
<wpwrak> well, you mentioned once that 800x600 would work. so maybe there is enough room :)
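[editor's note] The resolution arithmetic above can be checked with a short sketch. It assumes, as a simplification, that the bandwidth needed scales with pixel count times pixel size relative to 640x480, and it ignores all other memory traffic. The function name and the 4x budget constant are illustrative, taken from the estimate quoted in the chat.

```python
# Back-of-envelope scaling of pixel bandwidth relative to 640x480.
# Simplification: bandwidth ~ width * height * pixel size; other memory
# traffic (rendering, CPU) is not modelled.

BASE_WIDTH, BASE_HEIGHT = 640, 480
ESTIMATED_GAIN = 4.0  # the "4x increase" estimate from the discussion

def bandwidth_factor(width: int, height: int, pixel_size_factor: float = 1.0) -> float:
    """Bandwidth needed relative to 640x480 at the current pixel size."""
    return (width * height * pixel_size_factor) / (BASE_WIDTH * BASE_HEIGHT)

for width, height in [(800, 600), (1024, 768)]:
    same = bandwidth_factor(width, height)
    doubled = bandwidth_factor(width, height, pixel_size_factor=2.0)
    verdict = "fits" if doubled <= ESTIMATED_GAIN else "exceeds"
    print(f"{width}x{height}: {same:.2f}x, {doubled:.2f}x with doubled pixels "
          f"({verdict} a {ESTIMATED_GAIN:.0f}x budget)")
```

Under these assumptions 1024x768 with doubled pixel size needs more than the 4x estimate, while 800x600 stays below it.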
<sb0> the internal resolution is 512x512
<sb0> that's the size of the renderer's internal texture, which is then scaled to 640x480 (or 800x600)
<sb0> this works
<sb0> switching to 1024x768, I also increased the internal buffer to 1024x1024. this creates a lot of slowdown.
<wpwrak> ah, i see. 1024x512 ?
<wpwrak> 4:3 is dying anyway :)
<sb0> wpwrak, what clock recovery algorithm would you recommend?
<sb0> right now it's resyncing at every transition
<sb0> there's a counter running at 48MHz, and a transition resets it
<sb0> note that the 48MHz clock is not synchronized to the 12MHz transmission clock from the device
<wpwrak> hmm, maybe some deglitching could help ? (if that's the issue)
<wpwrak> 1:4 may also be a bit low. 1:8 may give more margin of error.
<wpwrak> s/of/for/
<wpwrak> at which phase offset to you sample ? +1 cycle ?
<wpwrak> s/to/do/
<wpwrak> grmbl. can't type.
<sb0> 96MHz is more difficult on this slow FPGA
<sb0> would be easier if we had a virtex or such
<sb0> :P
<wpwrak> ;-)
<sb0> 2nd cycle, just in the middle
<sb0> yes, we can try deglitching ...
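[editor's note] The resync-on-every-transition scheme described above can be modelled in a few lines. This is a behavioural sketch, not the actual softusb logic: it works on raw line levels sampled at 4x the bit rate (no NRZI decoding or bit unstuffing), and the function names and the choice of counter value 2 as the "middle" sampling point are assumptions.

```python
# Behavioural model of 4x-oversampled clock recovery: a free-running
# counter is reset on every line transition, and the line is sampled at a
# fixed counter value near the middle of the bit.

def make_samples(levels, durations=None, oversample=4):
    """Expand line levels into oversampled points; durations model drift."""
    if durations is None:
        durations = [oversample] * len(levels)
    samples = []
    for level, duration in zip(levels, durations):
        samples.extend([level] * duration)
    return samples

def recover_levels(samples, oversample=4, sample_phase=2):
    """Recover one level per bit time, resyncing on every transition."""
    recovered = []
    counter = 0
    prev = None  # the first sample is treated as a transition
    for s in samples:
        if s != prev:
            counter = 0                         # resync on transition
        else:
            counter = (counter + 1) % oversample
        if counter == sample_phase:
            recovered.append(s)                 # sample mid-bit
        prev = s
    return recovered

ideal = make_samples([1, 0, 0, 1, 1, 1, 0])
drifting = make_samples([1, 0, 1, 0], durations=[4, 5, 4, 3])
print(recover_levels(ideal))
print(recover_levels(drifting))
```

The second input stretches one bit to 5 sample periods and shortens another to 3, standing in for the unsynchronized 12 MHz device clock. With only 4 samples per bit there is little margin left, which is the motivation for the 1:8 suggestion.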
<wpwrak> i wonder if the problem could be with SYNC. it seems that the first edge of SYNC can be a bit slower (?) than the rest
<wpwrak> at least i've seen something like this mentioned
<sb0> meh, debian is no longer shipping gcc-avr?
<sb0> wtf
<kristianpaul> switch to sid :-)
<kristianpaul> or stay in stable :-)
<sb0> debian stable is for meter-long bearded sysadmins
<kristianpaul> or people with poor-low bandwidth ;-) (me)
<wpwrak> sb0: wrong. it's for their grandparents ;-)
<wpwrak> sb0: do you have full-speed HID devices at home to test things with ?
<lars_> i use debian stable
<lars_> at least on the hardware where it still works
<sb0> no, but I have that LV3
<sb0> I didn't know full speed HID devices existed. but why make it simple ...
<sb0> anyway, I'm still in Norway, back tomorrow
<sb0> wpwrak, btw what is the use case for the software controlled USB power switch? heavy-handed debugging a la norruption?
<wpwrak> that, power-cycling devices that don't respond to any nicer form of reset, turn off things when not needed or when causing trouble, etc.
<wpwrak> being able to power-cycle is also nice for remote debugging
<wpwrak> beats shipping USB devices around the globe ;-)
<sb0> i see...
<sb0> let's add it then
<sb0> should the switch be controlled from navre or lm32?
<sb0> I can easily add two outputs to the existing GPIO controller in charge of the LEDs
<wpwrak> can control pass from one to the other with a soc update ? or are there routing restrictions that get in the way ?
<sb0> in a first version, the BIOS would turn them on and we can forget about them for a while
<wpwrak> sure. easy does it :)
<sb0> there are no routing restrictions, especially for such low speed signals
<wpwrak> you were concerned about glitches. at what time scales would they be ? < 1 us ? or longer ?
<wpwrak> perfect. that's what i thought
<sb0> I don't really mean a 'glitch' in the strict sense of the term - in fact, there shouldn't be any
<sb0> what happens though is that there are weak pull ups when the FPGA is unconfigured
<sb0> which happens at power up and when switching between SoC and standby bitstreams
<sb0> those have to be taken into account
<sb0> I don't like the idea of switching power to the USB device for < 1s until the FPGA is configured and then switch it off
<wpwrak> okay, that shouldn't be too critical. we can just default to vusb off.
<sb0> which would happen at power up if the pull up is not neutralized (assuming the switch control is active high)
<wpwrak> we can choose between active low or high
<wpwrak> we just have to tell adam that it's no longer up to his coin toss ;-)
<sb0> ok, if we choose active high, then we don't need to neutralize the FPGA pull ups
<sb0> and the USB devices will be off while the FPGA is reconfigured
<wpwrak> yup. and if the pull-ups are too weak, we can easily help them.
<wpwrak> or if they're likely to transition into Z
<sb0> we can then drive those pins high in the standby bitstream to keep USB off
<sb0> and when the SoC is loaded, the GPIO defaults to 0 and turns USB on, so we have nothing to do
<wpwrak> sounds good to me
<sb0> s/choose active high/choose active low
<sb0> of course
<wpwrak> err, yes :)
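[editor's note] The polarity reasoning above, with the active-low correction applied, can be written out as a small truth table. This is only a model of the argument in the chat: the state names and functions are hypothetical, and the pin levels per state follow what is described (weak pull-ups while unconfigured, pin driven high by the standby bitstream, GPIO defaulting to 0 once the SoC is loaded).

```python
# Truth-table model of the USB power switch enable pin.
# State names and helpers are hypothetical; levels follow the discussion.

def pin_level(fpga_state: str, gpio_value: int = 0) -> int:
    """Logic level on the switch enable pin for a given FPGA state."""
    if fpga_state == "unconfigured":
        return 1            # weak FPGA pull-up
    if fpga_state == "standby":
        return 1            # standby bitstream drives the pin high
    if fpga_state == "soc":
        return gpio_value   # GPIO register, defaults to 0
    raise ValueError(f"unknown FPGA state: {fpga_state}")

def vusb_on(pin: int, active_low: bool = True) -> bool:
    """Whether the switch passes power for a given pin level."""
    return pin == 0 if active_low else pin == 1

for state in ("unconfigured", "standby", "soc"):
    level = pin_level(state)
    print(f"{state:13s} pin={level} "
          f"active-low: {'on' if vusb_on(level) else 'off'}, "
          f"active-high: {'on' if vusb_on(level, active_low=False) else 'off'}")
```

With an active-low enable, USB power stays off in both pre-SoC states and comes on by default once the SoC is loaded. With active-high, the pull-ups would briefly power the device while the FPGA is unconfigured.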
<sb0> I think the FPGA pull ups are strong enough ... is that a CMOS input on the switch?
<sb0> I don't remember the value from the datasheet, but as you can see they let pass a current sufficient to light the LEDs noticeably
<wpwrak> 1 uA leakage (max)
<sb0> yes, no problem then
<wpwrak> yeah. that'll be plenty :)
<wpwrak> and now, lunch ...
<sb0> bon appetit
<wpwrak> merci !
<sb0> ah, seems you fixed the DATAx mismatch bug I sometimes experienced... thanks :)
<wpwrak> heh, it was fun to see with just how many protocol violations one can almost get away :) well, i probably added a few of my own, so the fun probably isn't quite over yet ...
<sb0> hmm... your first series of patches doesn't apply cleanly
<sb0> do you still use the trigger code?
<sb0> ah, seems you didn't merge my new SOF/keepalive generation code
<wpwrak> no, i have that
<wpwrak> maybe you tripped over 4k vs. 8k ?
<wpwrak> i branched off "USB: send SOFs and keepalives on both ports and immediately after reset"
<wpwrak> commit f6c7474ae3b181157d8950e25c4705d53d9ae9c1
<sb0> no
<sb0> is there a debug mode for patch? it rejects a hunk that looks totally OK for me
<wpwrak> heh ;-)
<sb0> it just says '1 out of 1 hunk FAILED'... no way to know more?
<wpwrak> maybe -l helps ? (ignore whitespace) but i usually just apply it manually if patch starts hallucinating
<sb0> ah, yes, -l helps
<wpwrak> mystery difference :)
<wpwrak> maybe we'll have a trailing blank more or less now :)
<wpwrak> ah, there are some tailing tabs. maybe that's it
<wpwrak> tRailing
<GitHub15> [milkymist] sbourdeauducq pushed 11 new commits to master: http://git.io/_DfZ1g
<GitHub15> [milkymist/master] softusb: 4 kB hack - Werner Almesberger
<GitHub15> [milkymist/master] softusb: use OE# of port A for trigger - Werner Almesberger
<GitHub15> [milkymist/master] softusb: send SETUP and DATA0 back-to-back - Werner Almesberger
<GitHub5> [milkymist] sbourdeauducq pushed 6 new commits to master: http://git.io/vOd9Rg
<GitHub5> [milkymist/master] softusb: partially unroll usb_in - Werner Almesberger
<GitHub5> [milkymist/master] softusb: send ACKs from dedicated inline function - Werner Almesberger
<GitHub5> [milkymist/master] softusb: fail garbled packets fatally again - Werner Almesberger
<sb0> wpwrak, all merged, thanks a lot!
<sb0> if someone wants high speed, that could be useful... I'm not sure if and how the fpga could do clock recovery at 480 Mbps
<lars_> mwalle: there seems to be a problem with the new uart linux code. milkymist_uart_tx_char is sometimes called although the uart tx path is still busy
<lars_> my fix is to check whether THRE is set, and if not leave the routine right away and wait for the next interrupt
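[editor's note] The guard described above can be illustrated with a toy model. This is not the Linux driver code: the class, the functions and the queue are hypothetical stand-ins, and the only point carried over from the chat is the pattern "if THRE is not set, return immediately and wait for the next interrupt".

```python
# Toy model of a THRE-guarded transmit routine. UartModel and tx_char are
# hypothetical; they only mimic the behaviour described in the chat.
from collections import deque

class UartModel:
    def __init__(self):
        self.thre = True   # transmit holding register empty
        self.sent = []

    def write(self, ch):
        """Hand one character to the transmitter, which then becomes busy."""
        self.sent.append(ch)
        self.thre = False

    def tx_done_irq(self):
        """Transmitter finished; the holding register is empty again."""
        self.thre = True

def tx_char(uart, queue):
    """Send one queued character; return True if one was written."""
    if not uart.thre:
        return False       # still busy: leave and wait for the next interrupt
    if not queue:
        return False       # nothing to send
    uart.write(queue.popleft())
    return True

uart = UartModel()
pending = deque("hi")
first = tx_char(uart, pending)    # written
second = tx_char(uart, pending)   # skipped, transmitter still busy
uart.tx_done_irq()
third = tx_char(uart, pending)    # written after the interrupt
print(first, second, third, uart.sent)
```

The early return keeps the second call from overwriting a character that is still being transmitted. The queued character is not lost, it simply goes out after the next interrupt.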
r33p joined #milkymist
sh4rm4 joined #milkymist
<wpwrak> sb0: (merged) thanks !