<wolfspraul> if a full nanonote openwrt build on the buildhost takes 30 hours now, how can we determine what the bottleneck is?
<wolfspraul> 1) cpu 2) hdd 3) memory
<wolfspraul> others?
<larsc> if you haven't already, you might want to consider using ccache
<wolfspraul> thanks!
<wolfspraul> I guess we can try that on the existing machine first
<mth> yes, most builds will be almost the same as the previous one, so ccache should work really well
<wolfspraul> can it easily be enabled in openwrt?
<mth> alternatively, try to not do full builds, but that might be a bigger developer time investment
<wolfspraul> also I guess we assume it's
<wolfspraul> bug-free :-)
<wolfspraul> well, one purpose of the "full" builds is to rule out problems from incremental builds
<wolfspraul> there's a reason so many devs first erase everything and build from scratch, must be from their experience :-)
<mth> yes, it's something that is necessary in practice but not in theory
<mth> I'm wondering if it is feasible to set up a build system where dependency checking is reliable enough to actually trust it
<wolfspraul> show me one dev who is doing some "incremental" build magic, and when running into anything "strange" wouldn't first nuke all the temp files and start over? :-)
<mth> but that's a very long term approach
<wolfspraul> ok, so ccache. good idea, how can it be enabled?
<wolfspraul> is it easily supported with openwrt? I'll look into it
<wolfspraul> in parallel the raw performance of the machine is also something that can be improved
<wolfspraul> but I'm trying to find out the bottleneck - cpu/mem/hdd
<mth> afaik it's done by setting CC and CXX to point to ccache rather than the actual compiler
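The hookup mth describes can be sketched like this (hedged: the exact mechanism depends on the build system, and the compiler names here are just examples; newer OpenWrt trees also carry a ccache option in menuconfig, so check yours before wiring variables by hand):

```shell
# Point the usual compiler variables at ccache-wrapped compilers.
# "gcc"/"g++" are placeholders; a cross build would name the
# target toolchain binaries instead.
export CC="ccache gcc"
export CXX="ccache g++"

# Verify what the build will now invoke:
echo "CC=$CC CXX=$CXX"
```

After that, `make` picks up `$CC`/`$CXX` as usual and every compile goes through the cache first.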
<wolfspraul> I'm wondering what it is doing all these 30 hours
<wolfspraul> the hdd is a raid-0 over 2 disks (no ssd)
<mth> "time" should tell you whether CPU is the bottleneck: compare real time with user time
<wolfspraul> we could increase memory and try with a memory based /tmp or so
<mth> mem could be checked by monitoring how much is swapped out and how much cache is available
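The swap/cache check mth mentions can be done directly from the kernel's counters (Linux-specific; field names are from /proc/meminfo):

```shell
# Little swap in use plus a large page cache means the *amount* of RAM
# is not the bottleneck; heavy swap traffic would mean the opposite.
grep -E '^(MemTotal|MemFree|Cached|SwapTotal|SwapFree)' /proc/meminfo
```

Watching the `si`/`so` columns of `vmstat 1` over a few minutes gives the same answer as a rate rather than a snapshot.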
<wolfspraul> the cpu is a single-core 64-bit, that could be increased as well
<mth> a quad core would build about 3-3.5 times as fast as a single core in my experience
<mth> maybe a bit less if you have lots of small packages
<wolfspraul> that's assuming that amount of memory or hdd/sdd speed are not the bottleneck
<wolfspraul> so you say in your experience it will be the CPU?
<mth> I can build the OpenDingux rootfs in a quad-core VM on an i7 in 25 minutes
<wolfspraul> ok I think we build a few thousand packages here, in 30 hours
<wolfspraul> and I'm trying to understand which hardware improvement would help the most
<mth> that's far fewer packages than OpenWRT, I guess, but still quite a lot
<wolfspraul> cpu, memory, hdd/ssd
<wolfspraul> I don't think the build process will max out mem
<wolfspraul> and we don't have a ramdisk (maybe we should?)
<wolfspraul> so yeah, probably the cpu. an SSD would probably also help a lot.
<mth> would ramdisk be faster than sufficient memory for caching?
<mth> at least with caching you don't have to manually manage it
<wolfspraul> well I don't know who is using the resources and to which degree
<wolfspraul> actually I think it is running builds most of the time
<wolfspraul> checking...
<mth> you'd need some background process gathering vital stats of the system, say once a minute, and logging them
<mth> perhaps existing network monitoring tools already do that?
<mth> the kind they use to keep track of server farms
<mth> nagios etc
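The once-a-minute logger mth describes can be sketched in a few lines of shell (a minimal sketch, not a replacement for munin/nagios; the log filename and sample counts are made up here):

```shell
# log_stats N INTERVAL: take N timestamped load-average samples,
# INTERVAL seconds apart, on stdout. Redirect to a file to keep history.
log_stats() {
    n=$1
    interval=$2
    while [ "$n" -gt 0 ]; do
        printf '%s %s\n' "$(date -u +%FT%TZ)" "$(cat /proc/loadavg)"
        n=$((n - 1))
        [ "$n" -gt 0 ] && sleep "$interval"
    done
}

# e.g. three samples, one second apart:
log_stats 3 1 > build-stats.log
```

Run with a 60-second interval from cron or a backgrounded shell to get the week-long picture kristianpaul talks about below.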
<mth> one big problem that's hard to get rid of is autoconf
<mth> that won't utilize multiple cores
<mth> and it takes a significant amount of time for the build of small packages
<mth> it's really overdue for replacement, imo
<wolfspraul> it's so badly designed that it will never be possible to be replaced
<wolfspraul> survival strategy
<mth> you could speed it up by caching probe results, but I don't know how reliable that is if you mix different versions and possibly customized rules
<wolfspraul> no no
<wolfspraul> I am looking for some easy way to speed up
<wolfspraul> not to be stuck with arcane problems for a few years
<wolfspraul> ccache sounds interesting if a) it's easy to enable b) it's bug-free
<wolfspraul> :-)
<mth> nothing non-trivial is bug-free, but I think ccache's approach is low-risk
<mth> since it uses the preprocessed input to do the lookup in the cache
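The cache-key idea mth describes can be sketched directly: hash the preprocessed translation unit and use the digest as the lookup key (a conceptual sketch only; real ccache adds compiler identity, flags, and more to the key):

```shell
# Preprocess the source (gcc -E), then hash the result. Edits that
# disappear during preprocessing (e.g. comment changes) produce the
# same key and hence a cache hit; anything that survives preprocessing
# changes the key and forces a real compile.
cat > t.c <<'EOF'
int main(void) { return 0; }
EOF
key=$(gcc -E t.c | sha256sum | cut -d ' ' -f 1)
echo "$key"
```

This is also why the approach is low-risk: the key is derived from exactly the bytes the compiler would see.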
<wolfspraul> sure I was joking
<wolfspraul> a build is indeed running today
<wolfspraul> and I think the machine is doing this for weeks
<wolfspraul> is the kernel or anybody collecting any load statistics that I can easily look at now?
<mth> you might have to flush the cache if you update the compiler, I'm not sure about that
<mth> "top" would be a start
<mth> it should at least give you an impression of CPU and memory use
<kristianpaul> iotop may help a bit too
<wolfspraul> ok I looked at vmstat 1 for a while. indeed it looks like mostly cpu bound, and/or memory speed
<wolfspraul> not amount of memory (1.5gb of 2 used, but lots of buffers, swap very lightly used if at all)
<wolfspraul> also not disk speed I think
<wolfspraul> all seems to be cpu and/or memory speed
<mth> disk speed might become a factor once you switch to multiple cores
<mth> so don't spend all your money at once
<wolfspraul> sure, something always bubbles up
<wolfspraul> you make one piece faster, then one or multiple of the others become relatively bigger :-)
<wolfspraul> ok so: 1) try ccache 2) upgrade cpu, maybe a little more memory just in case
<kristianpaul> or if you still like visual/fun debugging try watch --color -d 'ps -x -kpcpu -o pid,pcpu,args'
<mth> not just relatively, if you start using multiple cores the access pattern will change as well
<mth> it will be less localized
<kristianpaul> vmstat won't tell you about i/o problems, as i remember
<mth> you can detect I/O problems indirectly: if there is enough memory and the CPUs are not fully utilized, the I/O must be the bottleneck
<mth> well, or you're not actually running in parallel (small packages, scripts like configure)
<mth> buildroot will only use multiple jobs within one package, not build two packages at once
<mth> I don't know if OpenWRT still has that limitation as well or whether it was removed there
<kristianpaul> anyway, if it took you 30hrs, it's worth installing munin and munin-node, i bet
<kristianpaul> at least you can get interesting resources utilization stats over a week
<kristianpaul> not just the last second :)
<wolfspraul> not sure
<wolfspraul> all I've seen munin create so far is a lot of data that adds a lot of confusion
<kristianpaul> just check what you need ;)
<wolfspraul> whereas I can just login to the running machine and look at the load for a little while with simple commands, and get a good understanding where the bottleneck is
<wolfspraul> well, just saying from past experience. that could have well been me.
<kristianpaul> not over time, though
<wolfspraul> but I just see dozens of pretty charts but little conclusive value
<kristianpaul> indeed, it always depends on what you're looking for
<wolfspraul> the pattern is quite stable, if you don't see something over 5 minutes I'd say it's not very relevant to the machine's performance anyway
<wolfspraul> if you have some backup running once every 24h, that's a special thing and what is happening in those x minutes is not representative either
<kristianpaul> for example the process that eats the most cpu/mem over a longer period of time, but i haven't got to that yet, though :/
<kristianpaul> (5 minutes) yeah ;)
<wolfspraul> cpu seems super busy, ca. 80% us, ca. 20% sy
<wolfspraul> cpu upgrade it is
<wolfspraul> and faster memory
<kristianpaul> whats load average?
<wolfspraul> no need to waste money on an ssd now, I think the raid-0 over two normal hdds is not bad
<wolfspraul> load average: 1.03, 1.01, 1.15
<kristianpaul> didn't look that bad
<kristianpaul> is still compiling right?
<wolfspraul> yes compiling all the time I think :-)
<mth> load is more-or-less the number of processes waiting for CPU time, correct?
<mth> then a load of ~1 is what I'd expect on a single core -j1 compile
<mth> well, if I/O were a big problem the load would be below 1, so it does point towards the CPU as the bottleneck
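The load figures being discussed come straight from the kernel; reading them directly makes mth's interpretation easy to check (Linux-specific file):

```shell
# /proc/loadavg: 1-, 5- and 15-minute averages of runnable (and
# uninterruptible-sleep) tasks, then running/total tasks and last PID.
cat /proc/loadavg
# A steady ~1.0 on a single core running a -j1 build means the CPU is
# saturated; a value well below 1.0 would suggest the build is mostly
# waiting on something else, such as disk I/O.
```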
<wpwrak> ccache is reasonably safe. i once managed to create a pathological case where the difference was deep in one of the more unusual ELF sections (i don't remember the details, but i think it was with umlsim), where the ccache folks just accepted defeat, but if you don't drive it to extremes, it'll serve you well. even compiler upgrades should be okay.
<wpwrak> ah, he already left
<kyak> viric: ping
<viric> kyak: pong from thousands of km
<kyak> viric: nevermind, i just built offrss and i'm giving it a try :)
<kyak> had some problems with my eyes. Somehow i thought that libmrss-0.9 > libmrss-0.19.2
<kyak> that's a mindtrick :)
<viric> haha
<viric> sometimes even configure scripts make that error
<viric> kyak: oth, I feel honored :)
<kyak> viric: btw, i had to add -I/usr/include/curl in the Makefile and #include <curl.h> in the offrss.c
<kyak> :)
<viric> ah
<viric> interesting
<viric> I never built offrss on non-nix
<kyak> damn the X over network is slow.. even in my home network
<kyak> i have to keep it locally or use a console browser
<kyak> for newsbeuter, i sometimes use the "External actions" feature or whatever it is called. It is when the article is passed to some external program; i use it to download things from torrents
<kyak> viric: hm, it's interesting - when i start it like "WEBBROWSER=links ./offrss -w", it won't work. The links shows up, but can't connect to server
<kyak> when i start as ./offrss -w and then just links http://localhost:8090, it works fine
<kyak> oh, a segfault in podofo...
<viric> in podofo?
<viric> kyak: what version of podofo? Have you linked podofo?
<viric> (any gdb bt?)
<kyak> viric: podofo 0.7.0, the one supplied with my distro, no gdb bt yet
<qi-bot> [commit] Werner Almesberger: m1/perf/eval.pl: warn if an instruction reads and writes from the same register (master) http://qi-hw.com/p/wernermisc/5bf9ae0
<qi-bot> [commit] Werner Almesberger: m1/perf/sched.c: use calloc instead of malloc plus memset (master) http://qi-hw.com/p/wernermisc/0a7e5b1
<qi-bot> [commit] Werner Almesberger: m1/perf/sched.c: return -1 if malloc fails (master) http://qi-hw.com/p/wernermisc/24a9b85
<qi-bot> [commit] Werner Almesberger: m1/perf/sched.c: code cleanup (no functional changes) (master) http://qi-hw.com/p/wernermisc/35e9903
<wpwrak> i like it when qi-bot calls me "master". i always imagine "i dream of jeannie" ;-)
<viric> :)
<larsc> i don't want to destroy your dreams, but i think it's referring to the branch name ;)
<wpwrak> i'll choose to disregard this opinion of yours :)