#milkymist on 2011-11-02 — irc logs at freenode.irclog.whitequark.org

00:02 <wpwrak> hmm .. "oggenc" ... let's see ...

00:04 <wpwrak> ah, but that's audio only :-(

00:07 <Thihi> .ogm is at least the filetype for an ogg movie file, I think

00:07 <Thihi> But I have no idea about encoders

00:08 <wpwrak> ah well, everyone has mplayer to see it anyway :)

00:14 <wpwrak> patch sent. upload is still running

00:17 <wpwrak> ... sync-before.mov is done, sync-after.mov at 71% ...

00:19 <wpwrak> lekernel: did you see this ?

00:19 <wpwrak> Chain_Control looks a bit suspicious. there's one definition in cpukit/score/include/rtems/score/chain.h and another one in doc/tools/bmenu/chain.h

00:20 <wpwrak> gdb tells me flickernoise uses the latter definition. i find the path name rather scary ...

00:28 <wpwrak> video upload complete

03:15 <kristianpaul> http://petalogix.com/products/plnx_zynq

03:28 <errordeveloper> kristianpaul: ?

03:29 <wolfspraul> yeah, I also wondered what that URL meant :-)

06:02 <aw_> xiangfu, one question: i took out audio codec chip and used test program to test vga, and it showed well even video source. but when I tried to boot up m1. there's no screen shows on monitor. http://dpaste.com/645906/

06:04 <aw_> xiangfu, what possible reason to let no screens after booted up? Does this reasonable in boot procedure if my audio codec unmounted on m1?

06:06 <xiangfu> aw_, probably because the audio chip not mount. the system stopped when check the audio chip.

06:06 <aw_> xiangfu, after "Unable to open audio mixer: No such device" msg, the d2/d3 is fully ON, i think it's in rendering mode, isn't it?

06:10 <xiangfu> aw_, since there is error while boot. I can not make sure if it already boot to rendering mode.

06:11 <wpwrak> when we ran into the audio grounding problem (Lxxx), lekernel mentioned - if i understood this correctly - that the codec provided a clock that was vital for the system

06:11 <xiangfu> aw_, it should be boot to rendering mode.

06:11 <wpwrak> so perhaps you're just hitting a condition where the system depends on it

06:12 <xiangfu> wpwrak, 'that the codec provided a clock that was vital for the system' oh.

06:13 <wpwrak> i'm not entirely sure i understood this correctly. but he basically suggested that, if the codec somehow crashed, the whole system would hang (which is what i had indeed observed)

06:13 <wpwrak> and of course, an absent codec can only be worse ;-)

06:14 <aw_> xiangfu wpwrak so that clock is designed from AC97_SOUT then feeding into fpga to identify and depend?

06:15 <wpwrak> no idea which clock it is and what exactly depend on it. but lekernel should wake up soonish ;-)

06:16 <aw_> wpwrak, hmm...this may really explain my condition now. since the codec was mounted before and worked well for couples minutes only last time then a very huge noise occurred then my screen went to frozen and never showed screen on monitor more.

06:18 <aw_> i doubted my reworks on codec previously. now I took out the audio codec and used test program to test vga/video source etc. all pass except audio section.

06:21 <aw_> so seems now i have no choice to go. just go remount codec again... I hope that codec itself is still good. ;-)

06:26 <xiangfu> wpwrak, those 'clock' stuff is under bitstream code right?

06:27 <wpwrak> aw_: heroic rework ;-)

06:28 <wpwrak> xiangfu: i would think so, yes. but then, i don't really know what it is or even whether i understood this correctly

06:29 <xiangfu> aw_, we trust your soldering skill, but don't trust the chip :)

06:32 <aw_> xiangfu, ha...no... consolation is not bad. but fact is I reworked two m1, reward: 1 done 1 failed, so poor skill though. ;-)

06:34 <wpwrak> probably not a bad result - you're moving from relatively simple and well-understood changes into increasingly difficult territory

06:35 <wpwrak> also, the board you're working on may have had other rework in the past. so the potential errors add up ...

06:37 <wpwrak> hmm. setting a conditional breakpoint on _CORE_message_queue_Seize seems to be a bit too much work for this poor system

06:37 <aw_> wpwrak, yes. taking risk to potential err added up already.

06:43 <wpwrak> it's funny. the system does seem to advance. but very very slowly. the queue grows by about ten messages per minute :)

06:53 <wpwrak> hmm, or less :) amazingly slow. but it keeps on growing. just a question of time until it hits 64. and then ....

08:19 <aw_> xiangfu, it acts as rendering mode while detected successfully an audio codec. Your guess was right. Hope this second board can still rendering well for more couple hours after temperature goes up.

08:21 <wolfspraul> aw_: cool, so everything works *right now* on the second upgraded rc2 board as well?

08:24 <aw_> wolfspraul, needs run rendering for more hours, yes they works well after test program for all items. Don't know what exactly happened in my previous work. It was frozen and never showed up in the past. ;-)

08:25 <wolfspraul> sure, I understand

08:25 <wolfspraul> but that's a good first step

08:25 <wolfspraul> of course let's do more testing now, let it run for 24hours, then wait a day, then again for a few hours, etc.

08:25 <wolfspraul> we are in no rush with this

08:27 <aw_> I'll append this board into here for records: http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Upgrade_h.2Fw_RC2_to_RC3

08:28 <wolfspraul> ok, good

08:29 <wolfspraul> how are the rc3 reworks going?

08:30 <xiangfu> aw_, great. also thanks to Werner. :-)

08:31 <aw_> still have 14 remaining in rc3. meanwhile I started to gather boards which will go for x-ray.

08:32 <aw_> 14 boards: 1) midi 0x46 2) nor 0x55 / 0x67 / 0x6d / 0x6f 3) no boot up 0x32 / 0x70 4) video i2c 0x4d 5) dimly lit 0x3a

08:33 <aw_> xiangfu, oh..yes thanks to Werner too. ;-)

08:34 <aw_> 6) short 0x57 / 0x59 / 0x5d / 0x62 / 0x70

08:37 <aw_> fixed one short board with C104/0805 which surrounding D16/R30 area. that must be caused by carelessness while first replaced R30 in factory.

08:38 <aw_> I keep checking short boards now.

08:39 <wolfspraul> ok, so 5 completely fixed so far?

08:39 <wolfspraul> remaining boards down to 14?

08:39 <aw_> yes. down to 14

08:40 <aw_> (x-ray candidates) 0x32, 0x3a, 0x46, 0x4d, 0x70.....will gather more I think.

08:45 <aw_> 0x32 / 0x70 are the BTN2 (bga ball AA4) with keeping high voltage after power on which must be 0. As a rough guess: the AA4 is nearby the area of D16 and R30. (i.e. at the corner of fpga), so this may completely damage by first heat air in factory already to replace R30.

08:51 <wolfspraul> ok, good news still

08:51 <wolfspraul> so the yield is 76/90 now, 14 to analyze

08:51 <wolfspraul> and those 14 are very important as preparation for rc4

08:54 <aw_> 0x46, midi_rx (ball AB21). 0x4d, videoin_sda (ball AB17) those are abnormal level.

08:58 <aw_> 0x32 / 0x70 may also be involved my several reworks fix2/fix2b (potential errors added up in the past)

10:54 <GitHub100> [flickernoise] sbourdeauducq pushed 3 new commits to master: http://git.io/SjK5zQ

10:54 <GitHub100> [flickernoise/master] input.c: synchronize with MIDI status and ignore real-time messages - Werner Almesberger

10:54 <GitHub100> [flickernoise/master] input: remove MIDI timeout - Sebastien Bourdeauducq

10:54 <GitHub100> [flickernoise/master] New X2 patch from Werner - Sebastien Bourdeauducq

10:54 <GitHub17> [flickernoise] sbourdeauducq pushed 2 new commits to stable_1.0: http://git.io/7OYdsQ

10:54 <GitHub17> [flickernoise/stable_1.0] input.c: synchronize with MIDI status and ignore real-time messages - Werner Almesberger

10:54 <GitHub17> [flickernoise/stable_1.0] input: remove MIDI timeout - Sebastien Bourdeauducq
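
The commit title "synchronize with MIDI status and ignore real-time messages" names two parser behaviours. A minimal sketch in C of a parser with those two properties follows. It is not the code in flickernoise's input.c; the structure and function names are hypothetical, and it only handles channel messages (SysEx and other system common bytes simply drop synchronization). Real-time bytes (0xF8 to 0xFF) may legally appear between the data bytes of another message, which is why they are skipped without touching the parser state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parser state; not the actual flickernoise input.c code. */
struct midi_parser {
	uint8_t status;   /* current channel status byte, 0 if unsynchronized */
	uint8_t data[2];  /* data bytes collected so far */
	int have;         /* number of data bytes collected */
};

/* Number of data bytes that follow a channel status byte. */
static int midi_data_len(uint8_t status)
{
	switch (status & 0xf0) {
	case 0xc0: /* program change */
	case 0xd0: /* channel pressure */
		return 1;
	default:
		return 2;
	}
}

/*
 * Feed one received byte. Returns 1 and fills msg[0..2] when a complete
 * channel message is available, 0 otherwise.
 */
static int midi_feed(struct midi_parser *p, uint8_t byte, uint8_t msg[3])
{
	if (byte >= 0xf8)          /* real-time (clock, start, stop, ...): ignore */
		return 0;
	if (byte >= 0xf0) {        /* system common / SysEx: drop synchronization */
		p->status = 0;
		p->have = 0;
		return 0;
	}
	if (byte & 0x80) {         /* channel status byte: (re)synchronize */
		p->status = byte;
		p->have = 0;
		return 0;
	}
	if (!p->status)            /* data byte with no known status: discard */
		return 0;
	assert(p->have < 2);
	p->data[p->have++] = byte;
	if (p->have < midi_data_len(p->status))
		return 0;
	msg[0] = p->status;
	msg[1] = p->data[0];
	msg[2] = p->have > 1 ? p->data[1] : 0;
	p->have = 0;               /* keep status to allow running status */
	return 1;
}
```

Feeding the bytes B0 07 F8 64 yields one control-change message B0 07 64; the interleaved clock byte is ignored, and a stray data byte arriving before any status byte is dropped.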

11:08 <kristianpaul> wolfspraul, errordeveloper, nah just vaporware it seems, until i see a dev kit with zynq chip

11:23 <lekernel> I don't think it's really "vaporware"... Xilinx often ships experimental stuff to a few lab-rat companies before it is generally available

11:45 <lekernel> http://digilentinc.com/Products/Detail.cfm?NavPath=2,719,929&Prod=JTAG-SMT1

12:26 <wpwrak> (midi timeout) ah, interesting ... how long is a "tick" ?

12:33 <lekernel> 10ms (iirc)

12:33 <lekernel> wpwrak, btw, if you think you are getting lost bytes because of the interrupts not being serviced fast enough, mwalle has made a new UART interface design that should be a lot friendlier to implement a small hardware FIFO

12:34 <lekernel> it's in soc git head, but there's no RTEMS driver for it yet
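
To illustrate why even a small receive FIFO helps when interrupts are serviced late, here is a software model of one in C. The depth of 16 and all names are assumptions, not taken from mwalle's UART design; the point is only that bytes arriving while the CPU is busy are held until the handler drains them, and are lost only once the FIFO is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 16  /* hypothetical depth */
static_assert((FIFO_DEPTH & (FIFO_DEPTH - 1)) == 0,
	      "FIFO_DEPTH must be a power of two for the index arithmetic");

struct rx_fifo {
	uint8_t buf[FIFO_DEPTH];
	unsigned head;     /* free-running write index (receiver side) */
	unsigned tail;     /* free-running read index (interrupt handler side) */
	unsigned overruns; /* bytes dropped because the FIFO was full */
};

/* Receiver side: store a byte, or count an overrun if the FIFO is full. */
static bool fifo_push(struct rx_fifo *f, uint8_t byte)
{
	if (f->head - f->tail == FIFO_DEPTH) {
		f->overruns++;
		return false;
	}
	f->buf[f->head++ % FIFO_DEPTH] = byte;
	return true;
}

/* Handler side: take one byte; returns false when the FIFO is empty. */
static bool fifo_pop(struct rx_fifo *f, uint8_t *byte)
{
	if (f->head == f->tail)
		return false;
	*byte = f->buf[f->tail++ % FIFO_DEPTH];
	return true;
}
```

If 20 bytes arrive before the handler runs, the first 16 survive and 4 are counted as overruns; without a FIFO, every byte not read before the next one arrives would be lost.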

12:39 <wpwrak> (timeout) then it probably would only have worked if there's no clock. the clock ticks at 24*bpm, so something like 30-50 Hz. i may actually have observed some slight changes when playing with the clock. interesting :)
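
The 30-50 Hz estimate follows from the MIDI timing clock, which sends 24 clock bytes (0xF8) per quarter note, so the byte rate is 24 * bpm / 60 Hz. A small C sketch of that arithmetic, using lekernel's recollection of a 10 ms tick; the helper names are hypothetical.

```c
#include <assert.h>

/* MIDI timing clock: 24 pulses (0xF8 bytes) per quarter note. */
#define MIDI_CLOCKS_PER_QUARTER 24

/* Clock byte rate in Hz for a tempo given in beats (quarter notes) per minute. */
static double midi_clock_hz(double bpm)
{
	assert(bpm > 0);
	return MIDI_CLOCKS_PER_QUARTER * bpm / 60.0;
}

/* Gap between consecutive clock bytes, in system ticks of tick_ms each. */
static double clock_gap_ticks(double bpm, double tick_ms)
{
	return 1000.0 / midi_clock_hz(bpm) / tick_ms;
}
```

75 bpm gives 30 Hz and 125 bpm gives 50 Hz. At 120 bpm the clock arrives about every 2.1 ticks, so, assuming the removed timeout was measured from the last received byte, it would rarely if ever expire while a clock source is connected.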

12:41 <wpwrak> (UART) great ! that's definitely something worth considering. at the moment, i seem to get very few losses, maybe even none. but a lot more of those hangs :-(

12:41 <lekernel> ah, there is no clock with my MIDI keyboard

12:41 <lekernel> maybe that explains why you got bugs and not me

12:43 <wpwrak> heh, yes, that might be just the trigger

14:22 <wpwrak> lekernel: any ideas about the hang ? i've now set a conditional breakpoint on _CORE_message_queue_Seize (for e_message_queue->number_of_pending_messages == 64) and i can watch it crawl towards its doom, but i still don't have any smoking gun

14:23 <wpwrak> it appears that disaster doesn't necessarily strike the very first time the queue fills up

14:24 <wpwrak> also, with the conditional breakpoint in place, I didn't get to stop in memcpy. instead, the first evidence of trouble I see is the_message_queue->Pending_messages.last = 0x0

14:31 <kristianpaul> lekernel: i wrote them just out of curiosity and they pointed me to use qemu instead of a board :)

14:31 <kristianpaul> but yes, they may have a real board for sure i guess

15:21 <kristianpaul> oh http://digilentinc.com/Products/Detail.cfm?NavPath=2,400,836&Prod=ATLYS

15:24 <kristianpaul> ha, supported by petalinux ;-)

15:43 <kristianpaul> hum it uses serial flash instead

15:57 <lekernel> wpwrak, not off the top of my head, sorry

15:57 <lekernel> what sets the_message_queue->Pending_messages.last to 0?

15:57 <lekernel> iirc you can also use watchpoints

16:49 <wpwrak> watchpoints ? hmm, let's see. the conditional breakpoints are glacially slow. takes something like 10-30 seconds per queue size increment

16:50 <lekernel> hmm... I don't know how they work. if they result in a lot of traffic exchanged between the PC and the M1 every time the code is executed, that may explain it

16:50 <lekernel> the serial link is not fast

16:51 <lekernel> btw - the FT2232H might support 30Mbps there as well. and with a redesign of the FPGA UART, the SoC could support similar speeds too.

16:51 <kristianpaul> or start by moving the uart core from csr to wishbone?

16:52 <wpwrak> watchpoints would only work usefully if there's hardware support for them

16:52 <lekernel> there should be hardware support for thel

16:52 <lekernel> them

16:52 <lekernel> I have not tested it, but maybe mwalle did

16:52 <wpwrak> perfect. let's put it to good use then :)

16:53 <wpwrak> oh, btw, did you implement any NULL pointer dereferencing trap ?

16:53 <lekernel> no

16:53 <lekernel> this would happily land in the flash

16:54 <wpwrak> that would be a worthwhile feature for catching bugs

16:54 <lekernel> in theory, you could easily generate a bus error on such a condition with something like

16:54 <wpwrak> oh, even in the flash ? wow ;-)

16:54 <wpwrak> yes, a bus error is what i had i ming

16:54 <wpwrak> minD

16:55 <wpwrak> if it's even NOR address space, then the CPU has no business accessing the beginning of that range anyway (standby bitstream)

16:55 <lekernel> assign wb_err_i = wb_adr_o[31:lower_bit] == <# of bits>'d0 & wb_stb_o & wb_cyc_o;

16:55 <lekernel> right on the CPU buses

16:55 <lekernel> I have never tested bus errors with LM32 though

16:56 <lekernel> I don't know how the current debugger handles them (they are never asserted with the current design)
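
lekernel's one-liner decodes the upper address bits directly on the CPU's Wishbone signals. In Verilog, `==` binds tighter than `&`, so the expression reads as "upper address bits all zero AND strobe AND cycle". The C function below models that predicate so the trapped range is easy to reason about; the function name is hypothetical and the log leaves `lower_bit` open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * C model of the proposed combinational check: assert a bus error when a
 * Wishbone cycle is active and address bits 31..lower_bit are all zero,
 * i.e. the access falls in [0, 2^lower_bit).
 */
static bool null_trap_err(uint32_t wb_adr, bool wb_stb, bool wb_cyc,
			  unsigned lower_bit)
{
	assert(lower_bit <= 31);
	return (wb_adr >> lower_bit) == 0 && wb_stb && wb_cyc;
}
```

With an assumed `lower_bit` of 12, any access to 0x000 through 0xfff raises the error, which covers NULL plus small structure offsets. The window must not include anything the CPU legitimately fetches or loads.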

16:56 <wpwrak> anyone here who got time ? :)

17:08 <wpwrak> hmm. watchpoints kinda pseudo-work :-(

17:08 <wpwrak> the watchpoint per se seems fine

17:08 <wpwrak> but the conditional part is weird

17:10 <wpwrak> the "Backtrace stopped: previous frame inner to this frame (corrupt stack?)" in the backtrace seems to be "normal". at least i get it from very early on. hmm.

20:51 <wpwrak> amazing. there are no less than three instances of chain.inl in RTEMS. two of them overlap in what they define. the third is a set of wrappers for (which ?) one of the others. if i was looking for a design that made broad allowances for letting subtle but nasty errors creep in, that approach would be a good candidate.

20:53 <lekernel> I know you dislike this system and I will easily admit it's far from perfect. but... seriously try running FN under Linux, and you'll see it's a lesser evil :)

20:54 <wpwrak> oh, i hope very much to meet these evils ;-)

20:56 <wpwrak> what's irritating with these lists/chains is that they're such a fundamental thing and there are at least two potentially dangerous things in how they're done. of course, i keep telling myself that, given that they're so fundamental, everything must pan out in the end. but still, ...

20:59 <wpwrak> of course, the code says "1989-2006". not that lists would be particularly new, but, say, the considerably more elegant solution linux uses for the same problem (not just lists but some internal properties of them as well) may not have been common knowledge back then. (not that i'd expect the solution in linux to originate from linux, of course)
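
The log does not spell out which Linux mechanism is meant; a reasonable guess is the intrusive circular list of `struct list_head` combined with `container_of`. The following is a simplified re-implementation for illustration, not the kernel's code: the head is an ordinary node of the same type, an empty list points to itself, and the containing object is recovered with `offsetof`. Unlike the kernel, this sketch sets deleted pointers to NULL instead of poison values.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive circular doubly linked list, Linux style. */
struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;  /* empty list: head points to itself */
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Because there is no NULL terminator and no special-cased head or tail, append and delete never branch on "first" or "last" element.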

21:43 <mwalle> wpwrak: lekernel: yeah conditional watchpoints/breakpoints are handled by gdb (not by the gdbstub)

21:46 <mwalle> and watchpoints are hardware watchpoints, but i dont know if they are switched on the MM1, i remember the comparators were within the critical path and we wont meet timing

21:54 <wpwrak> the watchpoints seem to work. but the conditional part isn't handled correctly.

21:54 <wpwrak> (or so it seems)

21:55 <wpwrak> interestingly, i get conditional breakpoints work just fine

21:56 <wpwrak> like this: http://pastebin.com/t6fbcSqa

21:57 <wpwrak> also tried "watch ..." with "condition ..." which should be equivalent to "break/watch ... if ..." but got the same result

21:57 <wpwrak> it traps all the time in _Chain_Append_unprotected, which is indeed where "last" changes

22:02 <wpwrak> regarding the mixed-up types, at least gdb is confused: http://pastebin.com/Tg3Xqyvk

22:03 <wpwrak> the struct with first/permanent_null/last is from doc/tools/bmenu/chain.h while gdb locates the sources for the rest from the more plausible cpukit/score/inline/rtems/score/ universe

22:03 <mwalle> btw iirc watchpoints are always two instructions behind

22:04 <wpwrak> at least these structures should be compatible (both by intention and by the way they were compiled), but such things don't exactly inspire confidence ...

22:06 <mwalle> watch or awatch? (or are these cmds equivalent?)

22:06 <wpwrak> watch is for writes, says the manual :)

22:06 <wpwrak> awatch for read/write

22:06 <mwalle> lm32 only supports access (read and write)

22:06 <wpwrak> oh

22:07 <mwalle> mh

22:07 <wpwrak> i sense potential for some improvement ;-)

22:08 <mwalle> wpwrak: forget it, should be fixed within the latest gdbstub, it supports write and read and access
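
For reference, GDB maps its three watchpoint commands onto separate remote-protocol packets: `watch` sends `Z2` (write), `rwatch` sends `Z3` (read) and `awatch` sends `Z4` (access), each formatted as `Ztype,addr,kind`, where kind is the watched length in bytes. A stub that does not implement a given type replies with an empty packet. A sketch of formatting the packet body, with framing omitted; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* GDB watchpoint commands and their remote-protocol Z packet types. */
enum watch_kind {
	WATCH_WRITE = 2,   /* "watch"  -> Z2 */
	WATCH_READ = 3,    /* "rwatch" -> Z3 */
	WATCH_ACCESS = 4,  /* "awatch" -> Z4 */
};

/*
 * Format the body of an insert-watchpoint packet, "Ztype,addr,length",
 * without the "$...#xx" framing. Returns the snprintf result.
 */
static int format_watch_insert(char *out, size_t size, enum watch_kind kind,
			       uint32_t addr, unsigned len)
{
	assert(len > 0);
	return snprintf(out, size, "Z%d,%lx,%x", (int)kind,
			(unsigned long)addr, len);
}
```

A 4-byte `watch` on 0x408da714 becomes `Z2,408da714,4`; the same address with `awatch` becomes `Z4,408da714,4`.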

22:09 <wpwrak> wheee ! :)

22:09 <mwalle> wpwrak: but have a look at $pc, i guess its two instructions behind the actual sw or lw instruction

22:10 <mwalle> i don't know if this has some influence to gdb's conditional logic

22:14 <wpwrak> hm, shouldn't ... after all, i'm giving it a constant address

22:16 <mwalle> wpwrak: so you made sure $pc is a sw or lw instruction?

22:17 <mwalle> gdb does some weird single stepping after a watchpoint

22:17 <mwalle> iirc ;)

22:18 <mwalle> i guessed that some archs break before the actual store/load instruction and some after the instruction was executed

22:19 <mwalle> but gdb is always singlestepping one instruction

22:20 <mwalle> you may turn on gdbstub debugging to see whats actually going on

22:20 <mwalle> set debug remote on

22:20 <wpwrak> it's a sw

22:20 <mwalle> mh ;)

22:21 <wpwrak> one of many. so it may very well be a little off

22:21 <wpwrak> in fact, it probably is

22:21 <wpwrak> http://pastebin.com/GtUWsWtX

22:22 <mwalle> whats $r1 + 72 ?

22:22 <mwalle> your watchpoint? :)

22:22 <wpwrak> hmm. i'm not entirely sure about those offsets. the difference seems too large to make sense

22:23 <wpwrak> naw, nowhere in right :)

22:23 <wpwrak> oh, wait. typo

22:23 <wpwrak> $r1+88 is my watchpoint

22:23 <wpwrak> so $pc is correct

22:23 <mwalle> mh

22:24 <mwalle> i should probably update my gdb ;)

22:24 <wpwrak> in any case, the calculation should be affected by where $pc is only in as much as what instructions have executed since the trap

22:25 <wpwrak> the calculation of the value does not depend on any local context (except for the symbol table)

22:33 <mwalle> watch if cond.. should set the hw watchpoint, and then gdb checks cond on every exception. to find out whats broken, assuming you want to use conditional watchpoints, a little test binary, which triggers the bug, would be helpful :)

22:35 <wpwrak> "watch <var> if <cond>" is what i tried. it breaks all the time, no matter what the condition evaluates to :-(

22:37 <mwalle> wpwrak: so try to enable remote debug and see whats going on

22:38 <mwalle> the packets are described here: http://sourceware.org/gdb/onlinedocs/gdb/Packets.html#Packets

22:40 <mwalle> you should see the set hw watchpoint packet, then continue, then a signal packet, when the watchpoint has hit and after that there should be some memory read commands where the reply should be interesting

22:42 <mwalle> sorry but i have to go to bed now, my alarm wakes me up very early ;)

22:43 <wpwrak> oh dear

22:44 <mwalle> yeah and some register info packets ;)

22:44 <wpwrak> http://pastebin.com/dTdGdn8J

22:45 <wpwrak> and http://pastebin.com/5pk8ph6d

22:51 <mwalle> so gdb doesn't read the memory at all?!

22:52 <wpwrak> maybe this is it ? $m408dfe68,4#06 -> 7fffffff
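
The quoted line is a memory-read exchange: `m408dfe68,4` asks the stub for 4 bytes at 0x408dfe68, and `7fffffff` is the reply, those bytes in hex. The `#06` suffix is the packet checksum, the sum of the data characters modulo 256, written as two hex digits. A sketch of that framing in C; the function names are hypothetical, and it assumes the data contains no characters that need escaping (`$`, `#`, `}`, `*`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the packet data, as used by the GDB remote protocol. */
static unsigned rsp_checksum(const char *data)
{
	unsigned sum = 0;
	for (const unsigned char *p = (const unsigned char *)data; *p; p++)
		sum += *p;
	return sum & 0xff;
}

/* Wrap data into a "$data#xx" packet; xx is the checksum in lowercase hex. */
static int rsp_frame(char *out, size_t size, const char *data)
{
	assert(data != NULL);
	return snprintf(out, size, "$%s#%02x", data, rsp_checksum(data));
}
```

Framing `m408dfe68,4` reproduces exactly the `$m408dfe68,4#06` seen in the debug output, which is a quick way to confirm that a logged packet is well formed.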

22:52 <wpwrak> but i'm not so sure what it thinks it's reading :)

22:53 <mwalle> gdb disables the watchpoint and reads 401365dc, 401365e0 and 401365e4

22:53 <mwalle> do you have some print statements on break enabled?

22:53 <wpwrak> no

22:55 <mwalle> maybe you should try raw memory addresses :)

22:55 <wpwrak> let's see ..

22:55 <mwalle> gn8 :)

22:57 <wpwrak> not even ".. if *(uint32_t *) 0x408da714 == 0" does the trick :-(

23:41 <wpwrak> kewl. now i killed it so hard gdb doesn't get through anymore

23:43 <lekernel> hmm... I wonder if this could be because the CPU tries to access unmapped bus areas that never get acked

23:44 <lekernel> generating a bus error in those cases (I'm not sure if they exist) would solve the problem

23:45 <wpwrak> i certainly get a very hung CPU. i suppose with some jtag magic, i could also find out where exactly it hangs :)

23:48 <wpwrak> now, i set a watchpoint on 0x10, since this seems to be a popular "NULL" pointer. and it tripped in rtems_message_queue_send: http://pastebin.com/t1zHBWwM
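
A NULL pointer shows up as an address like 0x10 because accessing a structure member through a NULL base touches the member's offset, not address 0. The C sketch below uses a hypothetical structure (not the RTEMS message queue layout) whose field sits at offset 0x10, and computes the address such an access would hit using `offsetof`, which avoids actually dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: four 32-bit words precede the field of interest. */
struct hypothetical_obj {
	uint32_t a, b, c, d;
	uint32_t field;
};

static_assert(offsetof(struct hypothetical_obj, field) == 0x10,
	      "field is expected at byte offset 0x10");

/* Address that an "obj->field" access touches for a given base pointer value. */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct hypothetical_obj, field);
}
```

With a base of 0 the access lands on 0x10, which is why a watchpoint there, or a bus-error trap on a low address window like the one lekernel proposed, can catch NULL dereferences as long as the window covers the offsets involved.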