#qi-hardware on 2011-10-21 — irc logs at freenode.irclog.whitequark.org

01:10 <wolfspraul> wpwrak: looking at the slashdot post, the # of comments does show that people there don't care

02:40 <wpwrak> wolfspraul: or maybe they just couldn't think of anything nasty to say :)

02:44 <wolfspraul> wpwrak: are your nor corruption tests still running?

02:46 <wpwrak> oh yes, very much. i now have a run with a success period of ~10500 cycles. but that's the same hardware that before failed on average every 600 cycles.

02:46 <wolfspraul> hmm

02:47 <wpwrak> yeah, it doesn't quite make sense. this brings back the theory of an external factor
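Why a clean run of ~10500 cycles "doesn't quite make sense" can be shown with a quick calculation. This is a minimal sketch that assumes each cycle fails independently with the same probability, 1/600, taken from the earlier average. An external factor such as temperature would violate exactly this assumption.

```python
import math


def p_no_failure(cycles: int, mean_cycles_per_failure: float) -> float:
    """Probability of `cycles` consecutive clean cycles, assuming each cycle
    fails independently with probability 1 / mean_cycles_per_failure."""
    p_fail = 1.0 / mean_cycles_per_failure
    return (1.0 - p_fail) ** cycles


# Figures from the log: one failure per ~600 cycles before, ~10500 clean cycles now.
p = p_no_failure(10500, 600)
poisson = math.exp(-10500 / 600)  # small-p approximation of the same quantity

print(f"P(10500 clean cycles | 1 failure per 600 cycles) = {p:.2e}")
print(f"Poisson approximation exp(-10500/600)             = {poisson:.2e}")
```

Both numbers come out in the order of 1e-8. Under the independence assumption, such a run is therefore essentially impossible, which points to the failure rate itself having changed.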

02:47 <wpwrak> the weather is getting warmer now. so maybe it's really just that.

02:49 <wpwrak> so the next step should be to move my M1 to a cooler place. didn't have time for this in the last days (had to do a bit of politicking :)

02:53 <wolfspraul> hmm

02:53 <wolfspraul> yes I understand

02:53 <wolfspraul> just a little worried that we get stuck

02:54 <wolfspraul> so I also think maybe we just leap ahead and apply everything we plan for rc4 anyway already, and then try to reproduce the corruption again

02:54 <wolfspraul> that will not satisfy the enlightened scientist inside you though :-)

02:54 <wpwrak> i think the general plan doesn't change. the pull-ups definitely sound right, even if they don't affect this case of NOR corruption.

02:54 <wpwrak> oh, i think the changes we've discussed all make sense.

02:55 <wpwrak> what i'm looking for is an indicator for whether we've actually killed the bug

02:56 <wpwrak> and that's the problem with probabilities that jump around wildly. if the bug is simply not detectable on the day or week you test, you haven't gained anything.

02:57 <wolfspraul> yes I understand

02:58 <wolfspraul> but like I said - I am worried that you are stuck trying to make a perfect logical sequence

02:58 <wpwrak> but i'm all for designing in the pull-ups. i don't see any design risk there. the improved reset circuit should be tested first. but it's almost certainly okay, too.

02:58 <wolfspraul> of course it would be better to first understand the true root cause before fixing it

02:58 <wolfspraul> rather than randomly and blindly improving this and that, and then maybe just being unable to reproduce the bug, instead of having fixed it

02:59 <wpwrak> oh, the thing is just running while i do something else :) i check every now and then whether it has found any new troubles, but that's all

02:59 <wpwrak> that's the beauty of automated tests :)

02:59 <wolfspraul> unfortunately it feeds your perfectionism well too :-)

02:59 <wpwrak> ;-)

03:00 <wolfspraul> since it's automated, we may as well accumulate a better 'statistical data base', right? :-)

03:00 <wpwrak> exactly :)

03:00 <wpwrak> but i'll actually be happy with one that does give me the same answer twice in a row. it hasn't done that yet. this is what worries me.

03:01 <wolfspraul> time passes and we have each block of time only once

03:01 <wolfspraul> so there may also be some value in working on this backwards, i.e. applying all fixes, and then spending the same amount of time on trying to reproduce the problem again

03:02 <wolfspraul> even if we lost the reasoning path somewhere in the middle...

03:02 <wolfspraul> but ok, let's see :-)

03:02 <wolfspraul> Adam is slowly but surely getting closer to rc4 work

03:02 <wolfspraul> worst case I need to skip over some of your missing statistical data and make decisions based on gut feeling :-)

03:02 <wpwrak> yes, that's a possibility. but then, if external factors can just hide the problem completely, your tests are meaningless if you don't understand the external factors

03:02 <wolfspraul> the alter ego of great statistics

03:03 <wolfspraul> not entirely meaningless

03:03 <wolfspraul> since we also think when we try to reproduce

03:03 <wpwrak> in my current test i have at least the certainty that the thing must fail

03:03 <wolfspraul> sure, all good

03:03 <wolfspraul> I got it

03:03 <wolfspraul> anxiously awaiting the results :_)

03:03 <wolfspraul> :-)

03:04 <wpwrak> i'll post a summary of what i have so far on the weekend. then you'll see what a mess it is.

03:04 <wolfspraul> I can imagine

03:05 <wolfspraul> that's why I suggest, maybe we are better off rebooting, replying all planned fixes, and then focusing on reproducing the problem again

03:05 <wolfspraul> s/replying/applying/

03:05 <wpwrak> tomorrow i'll try to finally get my labsw bom done. it keeps on jumping from mondays to fridays and then to mondays again (digi-key deadlines)

03:05 <wolfspraul> there is definitely a serious root bug somewhere there

03:06 <wolfspraul> very serious

03:06 <wolfspraul> if a unit fails like this for a real user, that's bad

03:06 <wolfspraul> and if anything your whole pile of test data shows that there is something serious somewhere

03:06 <wpwrak> for rc4, i think it's good to plan to have those changes. i think there's no need to wait for statistical subtleties.

03:07 <wpwrak> what the statistics can contribute is a test that confirms that the problem is indeed gone - or not

03:07 <wolfspraul> you think those planned changes (including gate & 4.4v reset ic) will make it go away even without locking?

03:07 <wpwrak> that's what i don't know yet :)

03:07 <wolfspraul> let me guess your answer :-)

03:07 <wolfspraul> "I don't know yet"

03:07 <wpwrak> ;-))

03:08 <wpwrak> but i do know that i like these changes. they improve overall design stability

03:09 <wpwrak> oh, and have you considered the possibility of using a different NOR chip ? some have other locking strategies that may provide much better protection

03:09 <wpwrak> ... while keeping the other parameters the same, i hope

03:10 <wpwrak> the one we have is just a particularly bad fit. many others just come out of reset locked. with one of these, we may never even have known we had the bug ;-)

03:13 <wolfspraul> hmm

03:13 <wolfspraul> which one specifically do you have in mind?

03:13 <wolfspraul> and if we change - can we keep one binary to support both?

03:15 <wolfspraul> fiddling with this sounds risky but I do want to make the very best rc4 we can, and we may just need to take some risks here and there

03:15 <wolfspraul> here's the list http://www.micron.com/partscatalog.html?categoryPath=products/nor_flash/parallel_nor_flash

03:15 <wolfspraul> now we have JS28F256J3F105A

03:16 <wolfspraul> "JS28F256J3F105A 256Mb Production x8/x16 2.7V-3.6V TSOP 56-pin 105ns Yes Uniform -40C to +85C Embedded J3 Tray"

03:16 <wolfspraul> yes I think that's the current one

03:17 <wpwrak> maybe one from the M28W series. these don't have persistent locks

03:18 <wpwrak> instead, there's a "soft" block lock. by default, it is set. you can remove it and set it again as many times as you like.

03:18 <wpwrak> plus, there's a "lock-down" that can be made one-way per session (session = time between resets)

03:19 <wpwrak> so, for example, the boot code or even standby could lock-down things we really really never want to change
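The soft-lock / lock-down scheme described above can be modeled as a small per-block state machine. This is a toy sketch of the semantics *as stated in this conversation*. It has not been checked against an M28W datasheet, it ignores any write-protect pin, and the class and method names are hypothetical.

```python
class BlockLockModel:
    """Toy model of the protection scheme described above (assumed semantics):
    - a soft lock per block, set on reset, that can be cleared and set freely;
    - a lock-down that can only be set, and is cleared only by the next reset.
    """

    def __init__(self, n_blocks: int) -> None:
        self.n_blocks = n_blocks
        self.reset()

    def reset(self) -> None:
        # A new "session" begins: every block locked, nothing locked down.
        self.locked = [True] * self.n_blocks
        self.locked_down = [False] * self.n_blocks

    def unlock(self, block: int) -> bool:
        # Locked-down blocks stay locked until the next reset (one-way).
        if self.locked_down[block]:
            return False
        self.locked[block] = False
        return True

    def lock(self, block: int) -> None:
        self.locked[block] = True

    def lock_down(self, block: int) -> None:
        self.locked[block] = True
        self.locked_down[block] = True

    def can_write(self, block: int) -> bool:
        return not self.locked[block]


# Example session: boot code locks down block 0, the updater unlocks block 1.
flash = BlockLockModel(n_blocks=4)
flash.lock_down(0)
flash.unlock(1)
print([flash.can_write(b) for b in range(4)])
```

The point of the design is that a stray write after power-up hits a locked block by default. Anything the boot code locks down cannot be unlocked by later (possibly buggy) code until the next reset.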

03:20 <wpwrak> the code for writing would differ a bit between 28F and 28W. but that could be an isolated change.

03:20 <wpwrak> for all i know, 28W may even be cheaper ;-) let's see ...

03:22 <wpwrak> hmm, size may be an issue ... at least at digi-key. let's see ...

03:24 <wolfspraul> that switch sounds like a lot of trouble

03:24 <wolfspraul> "differ a bit" "isolated change"

03:24 <wolfspraul> that sounds like we will struggle to get this right for 2 years

03:25 <wpwrak> why the sudden pessimism ? :)

03:25 <wolfspraul> nah

03:25 <wolfspraul> just realistic

03:26 <wolfspraul> those 'small details' are nasty

03:26 <wolfspraul> plus we seem to be on track to making our current chip very robust

03:26 <wolfspraul> then you would wonder what you actually get with a switch

03:27 <wolfspraul> cheaper is nice, bigger. you may even start the serial flash discussion again :-)

03:27 <wolfspraul> then maybe it's better to just focus on making our current chip work better (fixing bugs), and otherwise leave things as they are

03:31 <wpwrak> something like this guy perhaps http://www.micron.com/products/ProductDetails.html?product=products/nor_flash/parallel_nor_flash/M29W256GL70N6F

03:31 <wpwrak> would need to compare the data sheet bit by bit, though. flash is tricky :)

03:31 <wpwrak> oh, i'm all for serial flash. thanks for mentioning it ;-)

03:32 <wpwrak> if all the smarts were on the uSD card, we wouldn't have this flash corruption discussion :) instead, we'd just recommend carrying a backup

03:33 <wolfspraul> now that again :-)

03:33 <wolfspraul> from my experience, WHATEVER path you choose you will run into difficult problems

03:33 <wolfspraul> the proof is not in how much of a genius you are in choosing the right path, but what you do once you hit the first difficult obstacle

03:34 <wolfspraul> so there is no way I will jump to the "serial flash" promised land now

03:34 <wolfspraul> or even the "M28W" promised land

03:34 <wolfspraul> :-)

03:35 <wolfspraul> if there is a better nor chip, we should compare. but my #1 question would be whether the switch causes a need for software changes, and our ability to provide one set of binaries for all m1 boards

03:35 <wolfspraul> that needs to be weighed against the pros of the chip

03:36 <wolfspraul> maybe better to focus on making the current one rock solid

03:36 <wolfspraul> any new chip would have new nasty surprises, guaranteed

03:36 <wpwrak> sw change: yes. same binary: yes.

03:36 <wpwrak> but yes, there's always a design risk
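"sw change: yes. same binary: yes." usually means probing the chip at runtime and picking a write/unlock backend. This is a minimal sketch assuming the two candidate parts report different CFI primary vendor command-set codes. 0x0001 (Intel/Sharp extended) and 0x0002 (AMD/Fujitsu standard) are the two common ones, the same split the Linux MTD layer uses (`cfi_cmdset_0001` / `cfi_cmdset_0002`). Which code each candidate chip actually reports must be checked in its datasheet. `read_cmdset_id` and the driver classes are hypothetical placeholders.

```python
from typing import Callable, Dict

# CFI primary vendor command-set codes (the two common families).
CFI_CMDSET_INTEL_EXT = 0x0001
CFI_CMDSET_AMD_STD = 0x0002


class IntelStyleDriver:
    name = "intel-ext"

    def unlock_for_write(self, block: int) -> str:
        # Placeholder: real code would issue this family's unlock sequence.
        return f"{self.name}: unlock block {block}"


class AmdStyleDriver:
    name = "amd-std"

    def unlock_for_write(self, block: int) -> str:
        # Placeholder: real code would issue this family's unlock sequence.
        return f"{self.name}: unlock block {block}"


DRIVERS: Dict[int, Callable[[], object]] = {
    CFI_CMDSET_INTEL_EXT: IntelStyleDriver,
    CFI_CMDSET_AMD_STD: AmdStyleDriver,
}


def select_driver(read_cmdset_id: Callable[[], int]):
    """Choose the flash backend at runtime so one binary serves both boards.
    `read_cmdset_id` is a hypothetical hook that performs the CFI query."""
    cmdset = read_cmdset_id()
    try:
        return DRIVERS[cmdset]()
    except KeyError:
        raise RuntimeError(f"unsupported CFI command set 0x{cmdset:04x}") from None


# Usage with a mocked CFI query:
drv = select_driver(lambda: CFI_CMDSET_AMD_STD)
print(drv.unlock_for_write(3))
```

Keeping the chip-specific code behind one small interface is what makes this an "isolated change". The risk wolfspraul points out stays in the details inside each backend.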

05:39 <kristianpaul> wolfspraul: but because namuru and milkymist use different clocks, i'm having some problems reading memory content from namuru, so Artyom suggested i switch the whole soc to the milkymist soc, so basically i get rid of cores i don't need and focus on namuru

05:39 <wolfspraul> ok

05:39 <kristianpaul> wolfspraul: yes, better documentation about how to connect stuff to milkymist is very important too indeed

05:39 <wolfspraul> definitely

05:50 <kristianpaul> as when Artyom asked me about milkymist and how it could be helpful for him..

05:57 <kristianpaul> also yes i hope stickers and other stuff arrive soon, the workshop is already listed here too http://www.comunlab.cc/

05:57 <kristianpaul> as liure, but thats me :)

06:08 <kristianpaul> will be something small, more focused on playing with the M1, video in, music and some patches, hoping i can guide people and they will also get something more elaborate,

06:08 <kristianpaul> and in the night i'll see how to set up the M1 somewhere for an ambient music performance

06:26 <wolfspraul> nice that sounds good!

06:26 <wolfspraul> I'm not sure whether the stickers arrive in time, but it's moving

06:26 <wolfspraul> we need to know about events in advance

06:27 <wolfspraul> cannot always fix bad planning with wasting money on overnight couriers of cheap little stickers :-)

06:29 <kristianpaul> sure not :)

07:33 <johnnyhah> Is anyone familiar with llhdl?

07:34 <wolfspraul> johnnyhah: not really yet, I guess there is only 1 person on earth right now truly 'familiar' with it and that's Sebastien (nick lekernel)

07:38 <johnnyhah> thx, but how can i contact sebastien (or lekernel)?

07:48 <wolfspraul> johnnyhah: his primary channel is #milkymist on freenode, and/or also here in #qi-hardware

07:51 <wolfspraul> johnnyhah: so you are pretty close, just wait in a few hours he should get up and respond

07:51 <wolfspraul> what brings you to qi & llhdl ?

07:51 <wolfspraul> or milkymist

07:51 <wolfspraul> can you tell us a bit more about yourself?

07:57 <johnnyhah> i am a graduate and want to know how to convert a netlist to verilog or vhdl

08:02 <wolfspraul> which school?

10:48 <orsonzhai> hi

10:52 <kyak> hi

10:58 <wolfspraul> hi

14:54 <B_Lizzard> I've been away for long, trying to catch up.

14:56 <B_Lizzard> larsc, what's the latest stable kernel?

14:56 <B_Lizzard> Also, is suspend working OK now?

15:14 <xiangfu> B_Lizzard, we plan to use linux 3.0 in the next nanonote release. :)

16:34 <kristianpaul> B_Lizzard: 3.0 is the latest stable, as far as i remember

16:34 <kristianpaul> suspend still same i bet :)

16:34 <B_Lizzard> Hmmm, I'll have to look into it.

16:40 <whitequark> wpwrak: can you take a look at a synchro issue? http://imgur.com/a/YMP9d

16:40 <whitequark> not sure what may be the cause of it

16:44 <whitequark> the shifted lines are consistent and stay at the same place

16:44 <whitequark> could it be a PLL mislock?

16:53 <wpwrak> you should be able to see it on hsync. actually, most scopes have a tv mode that may be useful for this. i've never tried that, though

16:54 <whitequark> tv mode?

16:54 <wpwrak> what you certainly can do is trigger on vsync, then walk through the hsync. a bit messy, but should work

16:54 <wpwrak> trigger on tv line and such. not sure if it works for vga or only for composite

16:54 <wpwrak> DocScrutinizer: probably knows such things ;-)

16:55 <whitequark> ah yes. afaik it's composite only

20:51 <DocScrutinizer> err, the scopes I know all have a separate trigger input (optional)

21:03 <DocScrutinizer> whitequark: what I see seems to be a one-pixel-off issue. Might be caused by driving the display via a digital interface with pixel clock, vsync and hsync, the pixel clock not in phase with hsync, and some noise on either of the two lines as well. The screenshots don't really allow much analysis, but to me it seems the one-pixel-off is not constant along one horizontal scanline, i.e. there's no issue in a line on the left side of the screen

21:03 <DocScrutinizer> while the right side of the same line shows that offset (or the other way round). If that's actually correct, then it's clearly noise on the pixel clock and the pixel clock not in phase with the edges on the data lines, so the display randomly picks the "old" pixel data one time and the "new" (later) pixel data the next

21:07 <DocScrutinizer> placing sth like a 10pF..1nF load capacitor to gnd on pixelclock should create visible massive changes in the effect (you quite frequently see such timing-fixer capacitors on video [and other] clock lines)
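How much a 10pF..1nF load capacitor shifts the clock edge can be estimated with first-order RC arithmetic: the 50% crossing of a step through an RC low-pass is delayed by ln(2)·R·C. This sketch makes two assumptions, both labeled in the code and neither taken from the log: a 50 Ω driver output impedance, and the standard 25.175 MHz pixel clock of 640x480@60 VGA. whitequark's actual display timing is not stated.

```python
import math


def rc_delay_50pct(r_ohm: float, c_farad: float) -> float:
    """Delay until a step through a first-order RC low-pass reaches 50%: ln(2)*R*C."""
    return math.log(2) * r_ohm * c_farad


R_SOURCE = 50.0            # assumption: driver output impedance in ohms
PIXEL_CLOCK_HZ = 25.175e6  # assumption: 640x480@60 VGA pixel clock
period_ns = 1e9 / PIXEL_CLOCK_HZ

for c in (10e-12, 100e-12, 1e-9):
    d_ns = rc_delay_50pct(R_SOURCE, c) * 1e9
    print(f"C = {c * 1e12:6.0f} pF -> delay {d_ns:6.2f} ns "
          f"= {d_ns / period_ns:.2f} pixel periods")
```

Under these assumptions the range spans from a negligible shift to most of a pixel period, which is why such a capacitor should produce "visible massive changes" if clock/data phase is the culprit.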

21:12 <DocScrutinizer> on pixel clocks that are operating always on rising (or always on falling) edge, this effect commonly is caused by inverted polarity of clock signal: active edge is meant to occur exactly in the middle of the data-steady time window, while inactive edge can occur roughly around the time when level transitions on data lines for new pixel data take place. If active and inactive edge are swapped due to some inverter in the clock line,

21:12 <DocScrutinizer> you'll find the display pick old or new data from data lines on a random basis

21:13 <DocScrutinizer> propagation delay differences between clock and data lines of ~1/2 pixel time period will have a similar effect
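The "random old/new pixel" mechanism can be reproduced with a tiny simulation. Time is measured in pixel periods, data for pixel k is stable on [k, k+1), and the active clock edge for pixel k arrives at k + phase plus uniform jitter. A phase of 0.5 is the intended mid-window sampling. A phase of 0.0 stands in for an inverted clock or ~1/2 period skew. The jitter model and all names are illustrative assumptions.

```python
import math
import random


def sample_pixels(n: int, phase: float, jitter: float, seed: int = 0) -> list[int]:
    """Return the index of the data word latched at each of n active clock edges.

    Data for pixel k is on the bus during [k, k+1) (units: pixel periods).
    The edge for pixel k nominally arrives at k + phase, plus uniform noise
    in [-jitter, +jitter].
    """
    rng = random.Random(seed)
    latched = []
    for k in range(n):
        t = k + phase + rng.uniform(-jitter, jitter)
        latched.append(math.floor(t))  # which pixel's data is on the bus at time t
    return latched


def count_wrong(latched: list[int]) -> int:
    """Count edges that latched a neighbouring pixel instead of their own."""
    return sum(1 for k, got in enumerate(latched) if got != k)


good = sample_pixels(1000, phase=0.5, jitter=0.1)  # edge centred in the data window
bad = sample_pixels(1000, phase=0.0, jitter=0.1)   # edge on the data transitions
print("mid-window errors:", count_wrong(good))
print("edge-on-transition errors:", count_wrong(bad))
```

With the edge centred, the same jitter never crosses a data transition. With the edge on the transitions, roughly half the samples pick the previous pixel, which matches the "one time old, next time new" symptom.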

21:18 <DocScrutinizer> oops I forgot to take into account your point that the lines with offset are always in the same place. Might indicate the noise on the clock line is crosstalk from video RAM addr lines. Or your video RAM is actually defective; such an internal short between cells is a quite common failure pattern

21:23 <DocScrutinizer> the manga picture is a poor test pattern to really investigate it. You should use red-black / blue-black / green-black chessboard patterns and vertical line patterns, also vertical line patterns of several clearly distinguishable colors and 1 pixel line width

21:27 <DocScrutinizer> then mark the error spots on screen (with a marker pen, maybe on transparent foil attached to screen with sticky), and see if they stay permanent errors no matter what's the displayed image, and maybe even try to change timing of whole video a little and see if the spots move or vanish. If they don't then it's most likely a defect video RAM

21:30 <DocScrutinizer> remembers the funny effects on video output he's seen on devices that had some short between two addr lines of video RAM

21:31 <DocScrutinizer> a short between addr and data line (or a capacitor mixing both when they are multiplexed on same bus) is sometimes even more funny

22:43 <wpwrak> DocScrutinizer: ah yes, inverted polarity would also be a candidate. do you know that we had this once at openmoko ? and EE had been complaining to the LCD maker forever because of the lousy yield ;-))

23:20 <DocScrutinizer> must've been pre-joerg times

23:26 <DocScrutinizer> (vert.line patterns of several clearly distinguishable colors) like 9 colors, one line of each, then repeating the pattern. Don't use 2^n number of colors

23:28 <DocScrutinizer> as odds are the commonly seen "mirroring" as a symptom of video RAM defects tends to skew with a natural base-2 magnitude

23:29 <DocScrutinizer> so if you use 8 colors, then repeat, you might not notice anything odd even when video ram copies a whole set of 8 pixels to next 8 pixel field

23:30 <DocScrutinizer> or to any aligned 8 pixels

23:31 <DocScrutinizer> i.e. with an 8-line pattern test image you won't see problems on any addr lines other than A0, A1, A2

23:31 <DocScrutinizer> s/see/notice/
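The argument for 9 colors rather than 8 can be checked in a few lines. This sketch uses a simplified stand-in for a RAM defect: address bit `bit` stuck at 0, so an aligned block of 2**bit pixels is fetched from its neighbour. The fault is visible only if the shifted pixel has a different color, i.e. if 2**bit is not a multiple of the pattern period. With period 8 that hides every bit from A3 up. With an odd period such as 9 it hides none. The function names and the stuck-at fault model are hypothetical.

```python
def vertical_line_pattern(width: int, n_colors: int) -> list[int]:
    """One scanline of 1-pixel vertical lines cycling through n_colors colors."""
    return [x % n_colors for x in range(width)]


def read_with_address_fault(line: list[int], bit: int) -> list[int]:
    """Simulate video RAM with address bit `bit` stuck at 0: pixel x is fetched
    from x with that bit cleared (an aligned block copied onto its neighbour)."""
    mask = ~(1 << bit)
    return [line[x & mask] for x in range(len(line))]


def fault_visible(n_colors: int, bit: int, width: int = 640) -> bool:
    """True if the faulty scanline differs from the intended one."""
    line = vertical_line_pattern(width, n_colors)
    return read_with_address_fault(line, bit) != line


for n in (8, 9):
    hidden = [f"A{b}" for b in range(10) if not fault_visible(n, b)]
    print(f"{n} colors: faults invisible on {hidden or 'none'}")
```

For 8 colors this reports A3 to A9 as invisible on a 640-pixel line, matching the A0/A1/A2 remark above. For 9 colors every address bit shows up.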

23:43 <wpwrak> (pre-joerg) yeah, that didn't happen on your watch :) it was a software problem anyway, but of course, hardware got the beating