<wolfspraul> wpwrak: looking at the slashdot post, the # of comments does show that people there don't care
<wpwrak> wolfspraul: or maybe they just couldn't think of anything nasty to say :)
<wolfspraul> wpwrak: are your nor corruption tests still running?
<wpwrak> oh yes, very much. i now have a run with a success period of ~10500 cycles. but that's the same hardware that before failed on average every 600 cycles.
<wolfspraul> hmm
<wpwrak> yeah, it doesn't quite make sense. this brings back the theory of an external factor
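A rough check of why the long run "doesn't quite make sense": the sketch below assumes, purely for illustration, that each cycle fails independently with a fixed probability of 1/600 (a geometric model; the log does not say the failures behave this way). Under that assumption, surviving ~10500 cycles would be extremely unlikely, which is what points towards a changing external factor. The function name is hypothetical.

```python
def survival_probability(mean_cycles_to_failure: float, cycles: int) -> float:
    """Probability of seeing no failure in `cycles` cycles.

    Assumption (not from the log): every cycle fails independently with
    probability 1 / mean_cycles_to_failure.
    """
    p_fail = 1.0 / mean_cycles_to_failure
    return (1.0 - p_fail) ** cycles


# Numbers quoted in the log: ~600 cycles between failures before,
# ~10500 failure-free cycles now.
p = survival_probability(600, 10500)
print(f"chance of a 10500-cycle clean run at the old rate: {p:.1e}")
```

The result is on the order of 1e-8, so a constant failure rate is a poor explanation for the observed run.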
<wpwrak> the weather is getting warmer now. so maybe it's really just that.
<wpwrak> so the next step should be to move my M1 to a cooler place. didn't have time for this in the last days (had to do a bit of politicking :)
<wolfspraul> hmm
<wolfspraul> yes I understand
<wolfspraul> just a little worried that we get stuck
<wolfspraul> so I also think maybe we just leap ahead and apply everything we plan for rc4 anyway already, and then try to reproduce the corruption again
<wolfspraul> that will not satisfy the enlightened scientist inside you though :-)
<wpwrak> i think the general plan doesn't change. the pull-ups definitely sound right, even if they don't affect this case of NOR corruption.
<wpwrak> oh, i think the changes we've discussed all make sense.
<wpwrak> what i'm looking for is an indicator for whether we've actually killed the bug
<wpwrak> and that's the problem with probabilities that jump around wildly. if the bug is simply not detectable on the day or week you test, you haven't gained anything.
<wolfspraul> yes I understand
<wolfspraul> but like I said - I am worried that you are stuck trying to make a perfect logical sequence
<wpwrak> but i'm all for designing in the pull-ups. i don't see any design risk there. the improved reset circuit should be tested first. but it's almost certainly okay, too.
<wolfspraul> of course it would be better to first understand the true root cause before fixing it
<wolfspraul> rather than randomly and blindly improving this and that, and then maybe just being unable to reproduce the bug, instead of having fixed it
<wpwrak> oh, the thing is just running while i do something else :) i check every now and then whether it has found any new troubles, but that's all
<wpwrak> that's the beauty of automated tests :)
<wolfspraul> unfortunately it feeds your perfectionism well too :-)
<wpwrak> ;-)
<wolfspraul> since it's automated, we may as well accumulate a better 'statistical data base', right? :-)
<wpwrak> exactly :)
<wpwrak> but i'll actually be happy with one that does give me the same answer twice in a row. it hasn't done that yet. this is what worries me.
<wolfspraul> time passes and we have each block of time only once
<wolfspraul> so there may also be some value in working on this backwards, i.e. applying all fixes, and then spending the same amount of time on trying to reproduce the problem again
<wolfspraul> even if we lost the reasoning path somewhere in the middle...
<wolfspraul> but ok, let's see :-)
<wolfspraul> Adam is slowly but surely getting closer to rc4 work
<wolfspraul> worst case I need to skip over some of your missing statistical data and make decisions based on gut feeling :-)
<wpwrak> yes, that's a possibility. but then, if external factors can just hide the problem completely, your tests are meaningless if you don't understand the external factors
<wolfspraul> the alter ego of great statistics
<wolfspraul> not entirely meaningless
<wolfspraul> since we also think when we try to reproduce
<wpwrak> in my current test i have at least the certainty that the thing must fail
<wolfspraul> sure, all good
<wolfspraul> I got it
<wolfspraul> anxiously awaiting the results :_)
<wolfspraul> :-)
<wpwrak> i'll post a summary of what i have so far on the weekend. then you'll see what a mess it is.
<wolfspraul> I can imagine
<wolfspraul> that's why I suggest, maybe we are better off rebooting, replying all planned fixes, and then focusing on reproducing the problem again
<wolfspraul> s/replying/applying/
<wpwrak> tomorrow i'll try to finally get my labsw bom done. it keeps on jumping from mondays to fridays and then to mondays again (digi-key deadlines)
<wolfspraul> there is definitely a serious root bug somewhere there
<wolfspraul> very serious
<wolfspraul> if a unit fails like this for a real user, that's bad
<wolfspraul> and if anything your whole pile of test data shows that there is something serious somewhere
<wpwrak> for rc4, i think it's good to plan to have those changes. i think there's no need to wait for statistical subtleties.
<wpwrak> what the statistics can contribute is a test that confirms that the problem is indeed gone - or not
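A minimal sketch of what such a confirmation test could look like, assuming a stable baseline failure rate and independent cycles (both are assumptions, and the log itself questions whether the baseline is stable). It estimates how many failure-free cycles are needed before "the bug is gone" is a better explanation than luck. The function name and the 95% confidence level are illustrative choices.

```python
import math


def cycles_needed(mean_cycles_to_failure: float, confidence: float) -> int:
    """Failure-free cycles required so that, if the bug were still present
    at the old rate, a clean run this long would occur with probability
    at most (1 - confidence).

    Assumption: independent cycles, each failing with probability
    1 / mean_cycles_to_failure.
    """
    p_fail = 1.0 / mean_cycles_to_failure
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# With the ~600-cycle baseline from the log and 95% confidence:
n = cycles_needed(600, 0.95)
print(n)
```

This comes out at roughly three times the mean time between failures. The estimate is only as good as the baseline: if an external factor can move the failure rate by an order of magnitude, a clean run of this length proves little.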
<wolfspraul> you think those planned changes (including gate & 4.4v reset ic) will make it go away even without locking?
<wpwrak> that's what i don't know yet :)
<wolfspraul> let me guess your answer :-)
<wolfspraul> "I don't know yet"
<wpwrak> ;-))
<wpwrak> but i do know that i like these changes. they improve overall design stability
<wpwrak> oh, and have you considered the possibility of using a different NOR chip ? some have other locking strategies that may provide much better protection
<wpwrak> ... while keeping the other parameters the same, i hope
<wpwrak> the one we have is just a particularly bad fit. many others just come out of reset locked. with one of these, we may never even have known we had the bug ;-)
<wolfspraul> hmm
<wolfspraul> which one specifically do you have in mind?
<wolfspraul> and if we change - can we keep one binary to support both?
<wolfspraul> fiddling with this sounds risky but I do want to make the very best rc4 we can, and we may just need to take some risks here and there
<wolfspraul> here's the list http://www.micron.com/partscatalog.html?categoryPath=products/nor_flash/parallel_nor_flash
<wolfspraul> now we have JS28F256J3F105A
<wolfspraul> "JS28F256J3F105A 256Mb Production x8/x16 2.7V-3.6V TSOP 56-pin 105ns Yes Uniform -40C to +85C Embedded J3 Tray"
<wolfspraul> yes I think that's the current one
<wpwrak> maybe one from the M28W series. these don't have persistent locks
<wpwrak> instead, there's a "soft" block lock, by default, it is set. you can remove it and set it again as many times as you like.
<wpwrak> plus, there's a "lock-down" that can be made one-way per session (session = time between resets)
<wpwrak> so, for example, the boot code or even standby could lock-down things we really really never want to change
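The locking scheme described above can be sketched as a small state model. This only mirrors what is said in the chat (blocks come out of reset locked, the soft lock can be toggled freely, lock-down is one-way until the next reset); it is not taken from any data sheet. The class and method names are hypothetical, and it is assumed here that lock-down also leaves the block locked.

```python
class FlashBlock:
    """Toy model of one flash block with a soft lock and a lock-down flag."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # After reset the soft lock is set and lock-down is cleared.
        self.locked = True
        self.locked_down = False

    def unlock(self) -> bool:
        # Unlocking is refused once the block has been locked down.
        if self.locked_down:
            return False
        self.locked = False
        return True

    def lock(self) -> None:
        self.locked = True

    def lock_down(self) -> None:
        # One-way per session: only reset() clears this again.
        # Assumption: lock-down also forces the block into the locked state.
        self.locked = True
        self.locked_down = True

    def can_program(self) -> bool:
        return not self.locked


# Boot code locks down a block that should never change in this session.
boot_block = FlashBlock()
boot_block.lock_down()
print(boot_block.unlock(), boot_block.can_program())
```

In this model a stray write sequence later in the session cannot unlock the block; only a reset returns it to the plain soft-locked state.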
<wpwrak> the code for writing would differ a bit between 28F and 28W. but that could be an isolated change.
<wpwrak> for all i know, 28W may even be cheaper ;-) let's see ...
<wpwrak> hmm, size may be an issue ... at least at digi-key. let's see ...
<wolfspraul> that switch sounds like a lot of trouble
<wolfspraul> "differ a bit" "isolated change"
<wolfspraul> that sounds like we will struggle to get this right for 2 years
<wpwrak> why the sudden pessimism ? :)
<wolfspraul> nah
<wolfspraul> just realistic
<wolfspraul> those 'small details' are nasty
<wolfspraul> plus we seem to be on track to making our current chip very robust
<wolfspraul> then you would wonder what you actually get with a switch
<wolfspraul> cheaper is nice, bigger. you may even start the serial flash discussion again :-)
<wolfspraul> then maybe it's better to just focus on making our current chip work better (fixing bugs), and otherwise leave things as they are
<wpwrak> something like this guy perhaps http://www.micron.com/products/ProductDetails.html?product=products/nor_flash/parallel_nor_flash/M29W256GL70N6F
<wpwrak> would need to compare the data sheet bit by bit, though. flash is tricky :)
<wpwrak> oh, i'm all for serial flash. thanks for mentioning it ;-)
<wpwrak> if all the smarts were on the uSD card, we wouldn't have this flash corruption discussion :) instead, we'd just recommend carrying a backup
<wolfspraul> now that again :-)
<wolfspraul> from my experience, WHATEVER path you choose you will run into difficult problems
<wolfspraul> the proof is not in how much of a genius you are in choosing the right path, but what you do once you hit the first difficult obstacle
<wolfspraul> so there is no way I will jump to the "serial flash" promised land now
<wolfspraul> or even the "M28W" promised land
<wolfspraul> :-)
<wolfspraul> if there is a better nor chip, we should compare. but my #1 question would be whether the switch causes a need for software changes, and our ability to provide one set of binaries for all m1 boards
<wolfspraul> that needs to be weighed against the pros of the chip
<wolfspraul> maybe better to focus on making the current one rock solid
<wolfspraul> any new chip would have new nasty surprises, guaranteed
<wpwrak> sw change: yes. same binary: yes.
<wpwrak> but yes, there's always a design risk
<kristianpaul> wolfspraul: but because namuru and milkymist use different clocks, i'm having some problems reading memory content from namuru, so Artyom suggested i switch the whole soc to use milkymist soc, so basically i get rid of cores i don't need and focus on namuru
<wolfspraul> ok
<kristianpaul> wolfspraul: yes, better documentation about how to connect stuff to milkymist is very important too indeed
<wolfspraul> definitely
<kristianpaul> as when Artyom asked me about milkymist and how it could be helpful for him..
<kristianpaul> also yes i hope stickers and other stuff arrive soon, workshop is already announced here too http://www.comunlab.cc/
<kristianpaul> as liure, but thats me :)
<kristianpaul> will be something small, more focused on playing with the M1, video in music and some patches, hoping i can guide and people will also get something more elaborate,
<kristianpaul> and in the night i'll see how to set up the M1 somewhere for an ambient music performance
<wolfspraul> nice that sounds good!
<wolfspraul> I'm not sure whether the stickers arrive in time, but it's moving
<wolfspraul> we need to know about events in advance
<wolfspraul> cannot always fix bad planning with wasting money on overnight couriers of cheap little stickers :-)
<kristianpaul> sure not :)
<johnnyhah> Is anyone familiar with llhdl?
<wolfspraul> johnnyhah: not really yet, I guess there is only 1 person on earth right now truly 'familiar' with it and that's Sebastien (nick lekernel)
<johnnyhah> thx,but how can i contact sebastien(or lekernel)?
<wolfspraul> johnnyhah: his primary channel is #milkymist on freenode, and/or also here in #qi-hardware
<wolfspraul> johnnyhah: so you are pretty close, just wait, in a few hours he should get up and respond
<wolfspraul> what brings you to qi & llhdl ?
<wolfspraul> or milkymist
<wolfspraul> can you tell us a bit more about yourself?
<johnnyhah> i am a graduate and want to know how to convert a netlist to verilog or vhdl
<wolfspraul> which school?
<orsonzhai> hi
<kyak> hi
<wolfspraul> hi
<B_Lizzard> I've been away for long, trying to catch up.
<B_Lizzard> larsc, what's the latest stable kernel?
<B_Lizzard> Also, is suspend working OK now?
<xiangfu> B_Lizzard, we are planning to use linux 3.0 on the next nanonote release. :)
<kristianpaul> B_Lizzard: 3.0 is the last stable i remember
<kristianpaul> suspend still same i bet :)
<B_Lizzard> Hmmm, I'll have to look into it.
<whitequark> wpwrak: can you take a look at a synchro issue? http://imgur.com/a/YMP9d
<whitequark> not sure what may be the cause of it
<whitequark> the shifted lines are consistent and stay at the same place
<whitequark> may it be a PLL mislock?
<wpwrak> you should be able to see it on hsync. actually, most scopes have a tv mode that may be useful for this. i've never tried that, though
<whitequark> tv mode?
<wpwrak> what you certainly can do is trigger on vsync, then walk through the hsync. a bit messy, but should work
<wpwrak> trigger on tv line and such. not sure if it works for vga or only for composite
<wpwrak> DocScrutinizer: probably knows such things ;-)
<whitequark> ah yes. afaik it's composite only
<DocScrutinizer> err, the scopes I know all have a separate trigger input (optional)
<DocScrutinizer> whitequark: what I see is a one-pixel-off issue it seems. Might be caused by driving the display via a digital interface with pixel clock, vsync, hsync, the pixel clock not in phase with hsync, and some noise on either or both lines as well. The screenshots don't really allow that much analysis, but to me it seems the one-pixel-off is not constant along one horizontal scanline, i.e. there's no issue in a line on left side of screen
<DocScrutinizer> while right side same line shows that offset (or other way round). If that's actually correct, then it's clearly noise on pixel clock and pixel clock not in phase with edges on data lines, so display is randomly picking one time the "old" pixel data while next time picking the "new" (later) pixel data
<DocScrutinizer> placing sth like a 10pF..1nF load capacitor to gnd on pixelclock should create visible massive changes in the effect (you quite frequently see such timing-fixer capacitors on video [and other] clock lines)
<DocScrutinizer> on pixel clocks that are operating always on rising (or always on falling) edge, this effect commonly is caused by inverted polarity of clock signal: active edge is meant to occur exactly in the middle of the data-steady time window, while inactive edge can occur roughly around the time when level transitions on data lines for new pixel data take place. If active and inactive edge are swapped due to some inverter in the clock line,
<DocScrutinizer> you'll find the display pick old or new data from data lines on a random basis
<DocScrutinizer> propagation delay differences between clock and data lines of ~1/2 pixel time period will have similar effect
<DocScrutinizer> oops I forgot to take into account your point that the lines with offset are always same place. Might indicate the noise on clock line is from crosstalk from video RAM addr lines. Or your video RAM is actually defective; such an internal short between cells is a quite common failure pattern
<DocScrutinizer> the manga picture is a poor test pattern to really investigate it. You should use Red-BlacK / Blue-BlacK / Green-BlacK chessboard patterns and vertical line patterns, also vert.line patterns of several clearly distinguishable colors and 1 pixel line width
<DocScrutinizer> then mark the error spots on screen (with a marker pen, maybe on transparent foil attached to screen with sticky tape), and see if they stay permanent errors no matter what's the displayed image, and maybe even try to change timing of whole video a little and see if the spots move or vanish. If they don't then it's most likely a defective video RAM
<DocScrutinizer> remembers the funny effects on video output he's seen on devices that had some short between two addr lines of video RAM
<DocScrutinizer> a short between addr and data line (or a capacitor mixing both when they are multiplexed on same bus) is sometimes even more funny
<wpwrak> DocScrutinizer: ah yes, inverted polarity would also be a candidate. do you know that we had this once at openmoko ? and EE had been complaining to the LCD maker forever because of the lousy yield ;-))
<DocScrutinizer> must've been pre-joerg times
<DocScrutinizer> (vert.line patterns of several clearly distinguishable colors) like 9 colors, one line of each, then repeating the pattern. Don't use 2^n number of colors
<DocScrutinizer> as odds are the commonly seen "mirroring" as a symptom of video RAM defects tends to skew with a natural base-2 magnitude
<DocScrutinizer> so if you use 8 colors, then repeat, you might not notice anything odd even when video ram copies a whole set of 8 pixels to next 8 pixel field
<DocScrutinizer> or to any aligned 8 pixels
<DocScrutinizer> I.E. with a 8 line pattern test image you won't see problems on any addr lines other than A0, A1, A2
<DocScrutinizer> s/see/notice/
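The point about avoiding 2^n colors can be illustrated with a small simulation. The fault model is an assumption chosen for simplicity: one pixel per address and a single address line stuck low, so pixel x shows whatever is stored at x with that bit cleared. The function names are hypothetical.

```python
def stripe_line(width: int, n_colors: int) -> list[int]:
    """One scanline of 1-pixel-wide vertical stripes cycling through n_colors."""
    return [x % n_colors for x in range(width)]


def read_with_stuck_low_bit(line: list[int], bit: int) -> list[int]:
    """What the display shows if address line `bit` is stuck at 0
    (assumed fault model: pixel x is fetched from address x with that bit cleared)."""
    return [line[x & ~(1 << bit)] for x in range(len(line))]


def visible_faults(width: int, n_colors: int, addr_bits: int) -> list[int]:
    """Address lines whose stuck-low fault changes the displayed test pattern."""
    good = stripe_line(width, n_colors)
    return [b for b in range(addr_bits)
            if read_with_stuck_low_bit(good, b) != good]


# 64-pixel line, address lines A0..A5
print(visible_faults(64, 8, 6))  # [0, 1, 2]
print(visible_faults(64, 9, 6))  # [0, 1, 2, 3, 4, 5]
```

With an 8-color pattern only faults on A0, A1 and A2 change the picture, because clearing any higher bit moves the read by a multiple of 8 pixels and lands on the same color. With 9 colors every address line in this model produces a visible error.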
<wpwrak> (pre-joerg) yeah, that didn't happen on your watch :) it was a software problem anyway, but of course, hardware got the beating