Topic for #qi-hardware is now Copyleft hardware - http://qi-hardware.com | hardware hackers join here to discuss Ben NanoNote, atben / atusb 802.15.4 wireless, and other community driven hw projects | public logging at http://en.qi-hardware.com/irclogs
compcube has joined #qi-hardware
compcube has quit [Changing host]
compcube has joined #qi-hardware
LunaVorax has quit [Ping timeout: 246 seconds]
urandom__ has quit [Remote host closed the connection]
nikescar has quit [Ping timeout: 246 seconds]
Hoolxi has joined #qi-hardware
guanucoluis has joined #qi-hardware
Hoolxi has quit [Remote host closed the connection]
compcube has quit [Quit: Leaving]
Ayla has quit [Quit: dodo]
Jay7 has quit [Quit: Conversation terminated]
Jay7 has joined #qi-hardware
<DocScrutinizer05> and sometimes you hear in coding style lessons to use for ( stream = SNDRV_PCM_STREAM_LAST; stream >(=)0; stream-- ) for optimizing reasons (compare to 0 faster&shorter than compare to const)
<DocScrutinizer05> anyway this form has the benefit of being to the point about how to interprete pragma-origin-0 and upper-limit
<DocScrutinizer05> alternative:
<DocScrutinizer05> for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; stream-- )
<DocScrutinizer05> ooops
<DocScrutinizer05> for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; --stream )
<DocScrutinizer05> actually I learnt about pre- vs post-in/decrement as well, one is better than the other, alas I already forgot which is which
<DocScrutinizer05> performance-wise
<DocScrutinizer05> I guess pre-*crement is better since it doesn't require creating a temporary variable
<DocScrutinizer05> unless your compiler is slightly smarter than any assembler and recognizes that there's no need for any such var
<DocScrutinizer05> jr@halebop:~> time for (( i=0; i<300000; i++ )); do :; done #-> real 0m1.807s ### jr@halebop:~> time for (( i=300000; i>0; i-- )); do :; done #-> real 0m1.694s
<DocScrutinizer05> sure, it's shellscript
<DocScrutinizer05> and then there's been that one CPU that didn't know decrements ;-P
<DocScrutinizer05> for (( i=-300000; i<0; i++ ))
guanucoluis has quit [Remote host closed the connection]
jekhor has joined #qi-hardware
kristoffer has joined #qi-hardware
nikescar has joined #qi-hardware
Hoolxi has joined #qi-hardware
rejon has joined #qi-hardware
<wpwrak> since the invention of SSA, you can pretty much stop worrying about temporary variables :)
<viric> ssa?
<lindi-> static single assignment
<wpwrak> yup :)
<viric> what's that like?
<lindi-> viric: you only assign each variable once in the IR
<wpwrak> your compiler basically creates a new temporary variable for each basic operation
<viric> ah.
<viric> and that's any good?
<wpwrak> afterwards, it clears that up and removes everything it doesn't actually need
<wpwrak> yes, it gives you very nice semantics for optimization
<viric> ok
<larsc> DocScrutinizer05: your coding style lessons are from the 90ies ;)
<DocScrutinizer05> sure
<DocScrutinizer05> though held 6 months ago
<larsc> e.g. there is performance wise no difference between i++; and ++i; with a modern compiler
<mth> there can be a performance difference if you're using C++ iterators, but not for integers
<DocScrutinizer05> I think I already mentioned that?
<DocScrutinizer05> ooh yes, I forgot to recall to mention it is of course *very* depending on type of variable, and esp c vs c++
<DocScrutinizer05> c++ you err compiler might even have to instanciate a new object
<DocScrutinizer05> maybe not nowadays anymore, or more seldom nowadays
<larsc> mth: yes, looks like a bug
<DocScrutinizer05> but this ENEA guy told us that as well
<DocScrutinizer05> and it took him like 2 sentences resp 10s to handle the whole topic
<wpwrak> larsc: early 90es ;-)
<DocScrutinizer05> c'mon guys, cpu still is faster to check register for 0 than against a constant
<viric> ++ pre and post operators can be implemented completely different, for objects.
<larsc> DocScrutinizer05: depends on your arch
<DocScrutinizer05> and that's sth a compiler usually can't auto-optimize
<mth> larsc: will you fix it or shall I fix it?
<wpwrak> depends. most compiler don't see beyond function calls, that's true. so they don't know whether there are side effects lurking.
<DocScrutinizer05> I don't know any arch that wouldn't set flags on arithmetic operations like inc or whatever
<wpwrak> but if there are no calls or all the functions are inline (or maybe static), then there's a good chance the compiler can rearrange that check if it thinks it makes a difference
<DocScrutinizer05> so comparing against 0 is basically for free
<larsc> mth: I'm lazy today
<mth> ok, I'll make a patch then; can I include your name in a header?
<wpwrak> i'm lazy too. so i won't look up one that doesn't set flags on inc/dec ;-)
<larsc> mth: sure
<DocScrutinizer05> that for sure was too much for a lazy weekend
<DocScrutinizer05> ;-D
<wpwrak> but the point is that the compiler should be perfectly capable of changing the loop itself, provided the context isn't spread out too widely (in which case the optimization is irrelevant anyway)
<DocScrutinizer05> compiler can't optimize a inc loop to a dec loop, since that's way beyond what compiler understands about code semantics
<larsc> there are archs where you don't jump on flags, but the jmp or branch instruction does a compare
<larsc> so it does not really matter whether inc/dec sets a flag or not
<DocScrutinizer05> dbnz
<DocScrutinizer05> yes, I know
<DocScrutinizer05> I don't know a ibnz instruction
<DocScrutinizer05> ibl
<DocScrutinizer05> even
<DocScrutinizer05> was it 8080 or z80 that invented decrement-and-branch-if-non-zero instruction?
<DocScrutinizer05> this training lesson was targeted at embedded. The CPUs there are amazingly stupic little things sometimes. And still today you can find those
<DocScrutinizer05> stupid*
<DocScrutinizer05> I strongly suspect a compiler will rather create a second variable it increments, inside a djnz loop, rather than comparing this variable against a constant for loop termination
<DocScrutinizer05> of course when you're using a dec loop terminating at zero, the compiler can use the djnz register
<DocScrutinizer05> wpwrak: and I really wanna see the compiler smart enough to usually detect this case and convert for (i=0, i<10,i++) to for(i=9, i>=0, i--)
<DocScrutinizer05> it needs thorough understanding of all that's done inside the loop with i, and the side effects
<larsc> DocScrutinizer05: if i for example is not used inside the loop the compiler will do it
<larsc> if it actually is faster
<DocScrutinizer05> yep, then it's safe
<DocScrutinizer05> and that's the only safe case the compiler is 'smart' enough to detect
<DocScrutinizer05> I'd say
<DocScrutinizer05> anyway even bashscript is faster on comparing to 0 ;-D
<larsc> if it can be sure that the loop is side effect free, I'd say I can also optimize other cases. E.g. summing up a array
<DocScrutinizer05> and while I've been aware of that since 1980, I tend to forget it in daily coding
<DocScrutinizer05> securing side effect absence isn't trivial
<DocScrutinizer05> it's possible 95% of cases, as long as you got 'clean' architecture
<DocScrutinizer05> means no mem-mapped IO that the compiler doesn't know about, etc
<DocScrutinizer05> even then in a multitasking environment there always might be differences
<larsc> DocScrutinizer05: e.g. foo and bar generate the same code: http://pastie.org/4650248
GNUtoo has joined #qi-hardware
<wpwrak> and you didn't even enable optimization ;-)
<DocScrutinizer05> larsc: sure, but in one of the cases your results may be bogus
<wpwrak> DocScrutinizer05: how so ?
<DocScrutinizer05> well, not in this very example
<wpwrak> qed :)
<DocScrutinizer05> for (i = 0; i < 10; i++); if (i=0 || i=1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] }; return tmp;
<DocScrutinizer05> ooops
<DocScrutinizer05> for (i = 0; i < 10; i++); if (i==0 || i==1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] }; return tmp;
<wpwrak> luckily, the loop optimization should be very simple here ;-)
<DocScrutinizer05> sure, you shouldn't do that inside the loop anyway
<wpwrak> do what ? ;-)
<wpwrak> your loop is empty
<DocScrutinizer05> eh?
<wpwrak> well, the for loop
<DocScrutinizer05> meh
<wpwrak> and the while loop won't compile, so .. :)
<DocScrutinizer05> I can spam the chan with another 5 to 10 typo fixed verions
<DocScrutinizer05> though it's a pita to edit in this IRC client
<wpwrak> the main problem in your example is the usleep anyway, because there's no good way to tell the compiler that the side effect in question is a delay
<DocScrutinizer05> err wut?
<wpwrak> ah, it's an endless loop. i see.
<DocScrutinizer05> the side effect in question is another process initializing the array with values and only _then_ set the semaphore in x[0] x[1]
<wpwrak> there are a few "volatile" missing already for correctness ;-)
<DocScrutinizer05> good point
<wpwrak> and obviously, if you tell the compiler explicitly to avoid any optimizations, you shouldn't be too disappointed if it indeed doesn't optimize :)
<DocScrutinizer05> though I'm not sure you need volatile for storage that can get changed by CPU only
<wpwrak> you're confused :)
<DocScrutinizer05> I'm just saying it mustn't optimize in that case
<wpwrak> if you add the right volatiles, it won't optimize
<DocScrutinizer05> since (it seems I should explain my code) when the loop reads the array from x[9] down to x[0], then the test for semaphore set and usleep will be done last, not first
<DocScrutinizer05> hell, I'm not setting volatile for all global stroage that might get accessed by other processes
<wpwrak> "volatile" tells the compiler that the access sequence must not be changed
<wpwrak> then your code is broken :)
<wpwrak> of course, since function calls are usually a barrier, too, you can get away with that in many cases. but the sooner or later, you'll run into trouble.
<DocScrutinizer05> any static var already implies that there may be (possibly concurrent) access to the storage from other location in code, no?
<DocScrutinizer05> I mean, that's one of the reasons you get static var
<wpwrak> concurrency is never assumed unless you explicitly indicate it with "volatile"
<wpwrak> what ?
<DocScrutinizer05> err, yes I'm confused. Please wait until my coffe had time to kick in
<wpwrak> in your coding class, they shouldn't waste time with prehistoric optimization techniques and instead teach some basics ;-)
<DocScrutinizer05> meh
<DocScrutinizer05> haha
<DocScrutinizer05> a *global* var (scope beyond local function) usually is defined that way because other functions may want to access it as well
<larsc> yes, but C assumes that no two functions run at the same time
<DocScrutinizer05> in the example *x is "form outside"
<wpwrak> larsc: ... unless one calls the other (possibly via intermediates)
<DocScrutinizer05> larsc: C assumes what? No taskswitching allowed during execution of int foo(int *x) ?
<larsc> DocScrutinizer05: exactly
<wpwrak> anyway, gotta get things ready for today's barbecue. 24 C predicted for this chilly mid-winter day :)
<DocScrutinizer05> now that sounds absolutely BS. How's preemptive multitasking going to work with that?
cladamw has joined #qi-hardware
<larsc> DocScrutinizer05: Well if you run two processes, your two processes run in different contexts and they'll never see each other
<larsc> but if you have two processes which share memory you need to explicitly protect that memory against concurrent access
<DocScrutinizer05> ummpf, what if I got a IRQ service callback function in same module like int foo(int *x)
<larsc> if that IRQ service routine also accesses your x you need to explicitly write your code to handle that case
<DocScrutinizer05> I did, by defining x[0] x[1] as semaphore
<larsc> well, see as no native semaphores
<larsc> s/see/C/
<qi-bot> larsc meant: "well, C as no native semaphores"
<DocScrutinizer05> and I assumed compiler better not comverts this loop to a decrement loop for optimization
<larsc> s/as/has/
<qi-bot> larsc meant: "well, C has no native semaphores"
<larsc> it won't. but that's because it does not no what happens inside of usleep
<larsc> s/not no/not know/
<qi-bot> larsc meant: "it won't. but that's because it does not know what happens inside of usleep"
<larsc> what the fuck is wrong with my typing today?
<lindi-> yeah usleep does a syscall so it is quite hard to optimize
<DocScrutinizer05> I *might* kick out the usleep (though that's for sure rogue)
<lindi-> for library calls some optimizations are possible though, like converting printf with constant arguments to puts
<lindi-> or printf("%c", c) to putchar(c)
<DocScrutinizer05> replace that by a simple return(-ETOOFAST) when x[0]==0
<DocScrutinizer05> still a pretty nice race when foo() sums up bogus values for x[2..9] and then finds meanwhile semaphore in x[0] got set so everything seems to be fine
<larsc> gcc has extensions like __attribute__(("pure")) which tells the compiler that the function has no sideeffects
<lindi-> larsc: llvm IR has readonly and readnone attributes that are similar
<DocScrutinizer05> the whole point of my reasoning been that compiler has a hard time to decide if there are side effects in a for-loop that might bite you when compiler optimizes loops from increment to decrement
porchao has joined #qi-hardware
porchaso0 has quit [Ping timeout: 276 seconds]
<larsc> lindi-: I think llvm, well clang, should also support __attribute__(("pure"))
<DocScrutinizer05> take a simple memmove shifting each byte down one position. Do you think compiler is smart enough to see it will fail miserably when you change sequence? like x[8]=x[9]; x[7]=x[8]...
<lindi-> larsc: yeah, I was talking about how it is represented in the IR
<larsc> DocScrutinizer05: it must see this
<DocScrutinizer05> no sarcasm, really wondering if compilers might be smart enough for that
<lindi-> DocScrutinizer05: just paste it to http://llvm.org/demo/
<DocScrutinizer05> o.O ??
<DocScrutinizer05> wow!
<mth> larsc: I'm testing the pwm changes and there is one annoying effect: the numbering changes
<mth> JZ_GPIO_PWM2 becomes pwm 0 etc
<mth> should we just accept that? request that the pwm core can handle mappings that don't start at 0? or register JZ_GPIO_PWM0 and JZ_GPIO_PWM1 and return EBUSY if someone tries to use them?
<mth> the hardware does have 8 PWMs, it's just that timer 0 and 1 are in use by the kernel and therefore not available as PWMs
jluis has quit [Read error: Connection reset by peer]
<larsc> mth: e.g. for jz4760 timer 0 and 1 are not used by the kernel, so I think it should be fine
<larsc> I don't really get patch 1 of that series
<larsc> but I might be missing someting
<larsc> Btw. I met with Thierry last week, seems to be a nice guy
<mth> patch 1 wants to get rid of a header named "irq.h" in the current directory (for 4740 arch) since it gets included instead of something from the header search path
<mth> I don't know what that something is
<mth> maybe the MIPS version of irq.h?
<larsc> but we include irq.h with "" instead of <>
<mth> hmm, so it's the one that is supposed to be included
<mth> the MIPS version is called asm/irq.h, so unless the "asm" path is in a search path as well, it cannot be masked by just "irq.h"
<mth> maybe it's shadowing arch/mips/include/asm/mach-generic/irq.h?
<mth> that's the one that defines NR_IRQS
<larsc> but we never do #include <irq.h> anyway
<mth> I don't see the rest of the patches include <irq.h> either
<mth> so I'm out of ideas
<mth> I guess you should just ask Thierry
jluis has joined #qi-hardware
<larsc> I think the pwm_{enable,disable} in jz4740_pwm_config should be jz4740_pwm_{enable,disable}
<mth> I get the pwm backlight driver to init now using the legacy api, but the screen goes dark
<mth> the difference in the enable/disable is this: test_and_clear_bit(PWMF_ENABLED, &pwm->flags)
<larsc> ah, I misunderstood you earlier, the new id's start with PWM2 = 0
<larsc> that's indeed not so nice
<mth> I think the numbering thing is even causing bugs
<mth> pwm->hwpwm is 5 for PWM7, so it will conifgure timer 5
<larsc> yea, just saw this as well
urandom__ has joined #qi-hardware
<mth> btw, I think the is_enabled check before pwm_disable in jz4740_pwm_config is redundant, since the pwm core checks this as well
<mth> and the core checks it atomically, so that's a bit safer as well
<mth> or jz4740_pwm_config is called with a lock held anyway?
<larsc> I think we should set NUM_PWM to 8 and return -EBUSY for 0 and 1 for in request
<mth> yes, I think that makes it much clearer since it corresponds better with the hardware
kilae has joined #qi-hardware
kristoffer has quit [Quit: This computer has gone to sleep]
<mth> ok, with that change it works via the legacy API
<mth> now I have to figure out how the new API works
<larsc> the same, I think
<mth> it seems some kind of registration to work
<mth> see pwm_add_table()
<larsc> ah, probably like the clk api
<mth> there is nothing arch/ that uses 'struct pwm_lookup' though
<mth> so it seems the new API is so new that it's not actually being used yet
<mth> shall I send a diff to Thierry?
<larsc> I think all you need for the PWM_LOOKUP("jz4740-pwm", 4, 0, 0)
<larsc> or similar
<larsc> all you need for the pwm backlight
<larsc> I'd just comment on the patch
<mth> code is sometimes easier to understand than a description of code...
<mth> PWM_LOOKUP("jz4740-pwm", 7, 0, 0) does not work
<mth> pwm-backlight pwm-backlight: unable to request PWM, trying legacy API
<larsc> hm, ok the first one seems to be the name of the device which requests the pwm
<larsc> so pwm-backlight
<mth> strange to call it "provider" then...
<larsc> no
<larsc> wait
<larsc> yes
<larsc> the 3rd argument is the requesting device
<larsc> but it should also work if it is NULL
woakas has quit [Ping timeout: 272 seconds]
<mth> PWM_LOOKUP("jz4740-pwm", 7, "pwm-backlight", 0) does seems to work
panda|x201 has joined #qi-hardware
<larsc> about patch 2, I'd like to keep the timer functions as inline functions, do you agree?
<mth> it would reduce the number of exported symbols significantly
<mth> performance wise, I don't think it really matters, since we don't call those functions very often
<larsc> well, they are used in the system clock code
<mth> I sent my comments
guanucoluis has joined #qi-hardware
<mth> does the system clock change the timer configuration often though?
<mth> maybe once adaptive tickless is in it will?
<larsc> well it reads the counter quite often
<mth> ok, then inline is better indeed
guanucoluis has quit [Ping timeout: 246 seconds]
<DocScrutinizer05> ((<larsc> I think we should set NUM_PWM to 8 and return -EBUSY for 0 and 1 for in request)) indeed
kristoffer has joined #qi-hardware
cladamw has quit [Ping timeout: 240 seconds]
cladamw has joined #qi-hardware
cladamw has quit [Read error: Connection reset by peer]
Hoolxi has quit [Remote host closed the connection]
jekhor has quit [Ping timeout: 244 seconds]
<larsc> mth: I found out that the udc core used by the jz4740 ist the musb udc core and so we already have a driver for it in mainline
<larsc> we just need to add some boilerplate code to hook e.g. clocks up
DocScrutinizer05 has quit [Ping timeout: 268 seconds]
DocScrutinizer05 has joined #qi-hardware
DocScrutinizer05 has quit [Disconnected by services]
DocScrutinizer05 has joined #qi-hardware
guanucoluis has joined #qi-hardware
kuribas has joined #qi-hardware
guanucoluis has quit [Ping timeout: 276 seconds]
emeb has joined #qi-hardware
kilae_ has joined #qi-hardware
kilae has quit [Ping timeout: 252 seconds]
Ayla has joined #qi-hardware
GNUtoo has quit [Quit: Program received signal SIGSEGV, Segmentation fault.]
<mth> larsc: this is the glue code we're using for musb on the JZ4770: http://www.treewalker.org/temp/musb-jz4770.diff
<mth> I migrated Ingenic's code from 2.6.31 to 3.5
<mth> that was an awful lot of work since they backported a lot of future mainline commits to 2.6.31 and published it as part of a 500,000 line diff
<larsc> we shouldn't need as much for jz4740
<larsc> also there is a generic gpio driver for cable detection, iirc
<mth> we probably don't need that much on 4770 either, but I haven't tried to cut it down further yet as it works in its current state
jekhor has joined #qi-hardware
Ayla has quit [Ping timeout: 272 seconds]
Ayla has joined #qi-hardware
Ayla has quit [Ping timeout: 276 seconds]
Ayla has joined #qi-hardware
newcup has quit [Ping timeout: 268 seconds]
viric has quit [Ping timeout: 268 seconds]
viric has joined #qi-hardware
jluis has quit [Read error: Connection timed out]
jluis has joined #qi-hardware
kristoffer has quit [Quit: Leaving]
newcup has joined #qi-hardware
DocScrutinizer05 is now known as stephenelop
stephenelop is now known as DocScrutinizer05
porchao has quit [Ping timeout: 244 seconds]
porchao has joined #qi-hardware
kilae_ has quit [Quit: ChatZilla 0.9.88.2 [Firefox 15.0/20120824154833]]
<Ayla> larsc: hi
<Ayla> arch/mips/jz4740/clock.c:221-222
<Ayla> do you remember what the '+ 2' is for?
_whitelogger_ has joined #qi-hardware
<mth> Ayla: if it's jz_clk_pll_get_rate you're talking about, I don't think larsc is responsible
<mth> in the version I have here, there is no +2 at line 221-222
<Ayla> ?
<Ayla> there is, on the jz4770 tree
whitequark has joined #qi-hardware
<mth> ah, I just changed the names of those variables, not the expression
<mth> I think the data sheet just says that the register value is 2 less than the actual value
jluis has joined #qi-hardware
wolfspra1l has joined #qi-hardware
wolfspraul has quit [Ping timeout: 272 seconds]
guanucoluis has joined #qi-hardware
LunaVorax has joined #qi-hardware
jekhor has quit [Ping timeout: 244 seconds]
LunaVorax has quit [Ping timeout: 246 seconds]
jluis has quit [Ping timeout: 245 seconds]
guanucoluis has quit [Remote host closed the connection]
LunaVorax has joined #qi-hardware
jluis has joined #qi-hardware
kuribas has quit [Remote host closed the connection]
LunaVorax has quit [Read error: Connection reset by peer]