<wolfspraul>
if a full nanonote openwrt build on the buildhost takes 30 hours now, how can we determine what the bottleneck is?
<wolfspraul>
1) cpu 2) hdd 3) memory
<wolfspraul>
others?
<larsc>
if you haven't already, you might want to consider using ccache
<wolfspraul>
thanks!
<wolfspraul>
I guess we can try that on the existing machine first
<mth>
yes, most builds will be almost the same as the previous one, so ccache should work really well
<wolfspraul>
can it easily be enabled in openwrt?
<mth>
alternatively, try to not do full builds, but that might be a bigger developer time investment
<wolfspraul>
also I guess we assume it's
<wolfspraul>
bug-free :-)
<wolfspraul>
well, one purpose of the "full" builds is to rule out problems from incremental builds
<wolfspraul>
there's a reason so many devs first erase everything and build from scratch, must be from their experience :-)
<mth>
yes, it's something that is necessary in practice but not in theory
<mth>
I'm wondering if it is feasible to set up a build system where dependency checking is reliable enough to actually trust it
<wolfspraul>
show me one dev who is doing some "incremental" build magic, and when running into anything "strange" wouldn't first nuke all the temp files and start over? :-)
<mth>
but that's a very long term approach
<wolfspraul>
ok, so ccache. good idea, how can it be enabled?
<wolfspraul>
is it easily supported with openwrt? I'll look into it
<wolfspraul>
in parallel the raw performance of the machine is also something that can be improved
<wolfspraul>
but I'm trying to find out the bottleneck - cpu/mem/hdd
<mth>
afaik it's done by setting CC and CXX to point to ccache rather than the actual compiler
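A minimal sketch of that approach for a generic make-based build (the compiler names are assumptions; newer OpenWrt trees also expose a ccache switch in menuconfig, which is worth checking first):

    # point the build at ccache instead of the compiler directly
    export CC="ccache gcc"
    export CXX="ccache g++"
    make
    # alternative: put symlinks named gcc/g++ that point to ccache
    # early in $PATH; ccache then invokes the real compiler itself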
<wolfspraul>
I'm wondering what it is doing all these 30 hours
<wolfspraul>
the hdd is a raid-0 over 2 disks (no ssd)
<mth>
"time" should tell you whether CPU is the bottleneck: compare real time with user time
<wolfspraul>
we could increase memory and try with a memory-based /tmp or so
<mth>
mem could be checked by monitoring how much is swapped out and how much cache is available
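For example, with the standard procps tools:

    free -m     # "cached" shrinking toward zero means memory pressure
    vmstat 5    # si/so columns show swap-in/swap-out in pages per second;
                # anything persistently nonzero means the build is hitting swap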
<wolfspraul>
the cpu is a single-core 64-bit, that could be increased as well
<mth>
a quad core would build about 3-3.5 times as fast as a single core in my experience
<mth>
maybe a bit less if you have lots of small packages
<wolfspraul>
that's assuming that the amount of memory or hdd/ssd speed are not the bottleneck
<wolfspraul>
so you say in your experience it will be the CPU?
<mth>
I can build the OpenDingux rootfs in a quad-core VM on an i7 in 25 minutes
<wolfspraul>
ok I think we build a few thousand packages here, in 30 hours
<wolfspraul>
and I'm trying to understand which hardware improvement would help the most
<mth>
that's far fewer packages than OpenWRT, I guess, but still quite a lot
<wolfspraul>
cpu, memory, hdd/ssd
<wolfspraul>
I don't think the build process will max out mem
<wolfspraul>
and we don't have a ramdisk (maybe we should?)
<wolfspraul>
so yeah, probably the cpu. an SSD would probably also help a lot.
<mth>
would a ramdisk be faster than sufficient memory for caching?
<mth>
at least with caching you don't have to manually manage it
<wolfspraul>
well I don't know who is using the resources and to which degree
<wolfspraul>
actually I think it is running builds most of the time
<wolfspraul>
checking...
<mth>
you'd need some background process gathering vital stats of the system, say, once a minute, and logging them
<mth>
perhaps existing network monitoring tools already do that?
<mth>
the kind they use to keep track of server farms
<mth>
nagios etc
<mth>
one big problem that's hard to get rid of is autoconf
<mth>
that won't utilize multiple cores
<mth>
and it takes a significant amount of time for the build of small packages
<mth>
it's really overdue for replacement, imo
<wolfspraul>
it's so badly designed that it will never be possible to replace it
<wolfspraul>
survival strategy
<mth>
you could speed it up by caching probe results, but I don't know how reliable that is if you mix different versions and possibly customized rules
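For what it's worth, autoconf already ships the mechanism mth describes, with exactly that reliability caveat:

    ./configure -C    # -C (--config-cache) stores probe results in config.cache
    # a cache shared across packages is possible via the CONFIG_SITE variable,
    # but stale or mismatched answers can break builds in confusing ways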
<wolfspraul>
no no
<wolfspraul>
I am looking for some easy way to speed up
<wolfspraul>
not to be stuck with arcane problems for a few years
<wolfspraul>
ccache sounds interesting if a) it's easy to enable b) it's bug-free
<wolfspraul>
:-)
<mth>
nothing non-trivial is bug-free, but I think ccache's approach is low-risk
<mth>
since it uses the preprocessed input to do the lookup in the cache
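Once it is enabled, the hit rate shows how much it actually helps; ccache has a stats command for that (the 10G size below is just a guess):

    ccache -s       # cache hit/miss statistics and current cache size
    ccache -M 10G   # raise the size limit if full builds keep evicting entries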
<wolfspraul>
sure I was joking
<wolfspraul>
a build is indeed running today
<wolfspraul>
and I think the machine is doing this for weeks
<wolfspraul>
is the kernel or anybody collecting any load statistics that I can easily look at now?
<mth>
you might have to flush the cache if you update the compiler, I'm not sure about that
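Newer ccache versions try to detect compiler changes via the compiler's mtime and size, but flushing is cheap insurance next to a 30-hour build:

    ccache -C    # wipe the cache entirely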
<mth>
"top" would be a start
<mth>
it should at least give you an impression of CPU and memory use
<kristianpaul>
iotop may help a bit too
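Both in one glance, roughly (iotop needs root):

    top            # per-process CPU/memory, plus iowait ("wa") in the header
    sudo iotop -o  # -o limits the list to processes currently doing I/O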
<wolfspraul>
ok I looked at vmstat 1 for a while. indeed it looks like mostly cpu-bound, and/or memory speed
<wolfspraul>
not the amount of memory (1.5gb of 2 used, but lots of buffers, swap very lightly used if at all)
<wolfspraul>
also not disk speed I think
<wolfspraul>
all seems to be cpu and/or memory speed
<mth>
disk speed might become a factor once you switch to multiple cores
<mth>
so don't spend all your money at once
<wolfspraul>
sure, something always bubbles up
<wolfspraul>
you make one piece faster, then one or multiple of the others become relatively bigger :-)
<wolfspraul>
ok so: 1) try ccache 2) upgrade cpu, maybe a little more memory just in case
<kristianpaul>
or if you still like visual/fun debugging, try watch --color -d 'ps -x -kpcpu -o pid,pcpu,args'
<mth>
not just relatively, if you start using multiple cores the access pattern will change as well
<mth>
it will be less localized
<kristianpaul>
vmstat won't tell you about i/o problems, i remember
<mth>
you can detect I/O problems indirectly: if there is enough memory and the CPUs are not fully utilized, the I/O must be the bottleneck
<mth>
well, or you're not actually running in parallel (small packages, scripts like configure)
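In vmstat terms, that indirect check looks like this (comments describe the columns, not real output):

    vmstat 5
    # cpu columns: us sy id wa -- "wa" is time spent idle waiting on I/O.
    # high "wa" (or many processes in the blocked "b" column) combined
    # with low "us" means the disks, not the CPU, are holding the build back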
<mth>
buildroot will only use multiple jobs within one package, not build two packages at once
<mth>
I don't know if OpenWRT still has that limitation as well or whether it was removed there
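The distinction in make terms, with a hypothetical package name:

    make -j4 package/foo/compile   # parallel jobs *within* one package
    # building several packages at once is a property of the top-level
    # makefile's dependency graph, not something -j alone can provide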
<kristianpaul>
anyway, if your builds take 30hrs it's worth installing munin and munin-node, i bet
<kristianpaul>
at least you can get interesting resource utilization stats over a week
<kristianpaul>
not just over the last second :)
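On a Debian-flavored host (an assumption) the setup kristianpaul suggests is just:

    sudo apt-get install munin munin-node
    # munin graphs CPU, memory, disk and I/O usage over days and weeks;
    # the generated HTML lands under /var/cache/munin/www on Debian (path varies)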
<wolfspraul>
not sure
<wolfspraul>
all I've seen munin create so far is a lot of data that adds a lot of confusion
<kristianpaul>
just check what you need ;)
<wolfspraul>
whereas I can just login to the running machine and look at the load for a little while with simple commands, and get a good understanding where the bottleneck is
<wolfspraul>
well, just saying from past experience. that could have well been me.
<kristianpaul>
not over time, though
<wolfspraul>
but I just see dozens of pretty charts but little conclusive value
<kristianpaul>
indeed, it always depends on what you are looking for
<wolfspraul>
the pattern is quite stable, if you don't see something over 5 minutes I'd say it's not very relevant to the machine's performance anyway
<wolfspraul>
if you have some backup running once every 24h, that's a special thing and what is happening in those x minutes is not representative either
<kristianpaul>
for example the process that eats the most cpu/mem over a longer period of time, but i haven't got to that yet, though :/
<kristianpaul>
(5 minutes) yeah ;)
<wolfspraul>
cpu seems super busy, ca. 80% us, ca. 20% sy
<wolfspraul>
cpu upgrade it is
<wolfspraul>
and faster memory
<kristianpaul>
whats load average?
<wolfspraul>
no need to waste money on an ssd now, I think the raid-0 over two normal hdds is not bad
<wolfspraul>
load average: 1.03, 1.01, 1.15
<kristianpaul>
didn't look that bad
<kristianpaul>
it's still compiling, right?
<wolfspraul>
yes compiling all the time I think :-)
<mth>
load is more-or-less the number of processes waiting for CPU time, correct?
<mth>
then a load of ~1 is what I'd expect on a single core -j1 compile
<mth>
well, if I/O were a big problem the load would be below 1, so it does point towards the CPU as the bottleneck
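The same numbers without top, plus one Linux-specific caveat:

    cat /proc/loadavg   # 1-, 5- and 15-minute averages, same as "uptime"
    # note: Linux also counts processes blocked in uninterruptible (disk)
    # sleep toward the load, so a heavily I/O-bound build can show ~1 as well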
<wpwrak>
ccache is reasonably safe. i once managed to create a pathological case where the difference was deep in one of the more unusual ELF sections (i don't remember the details, but i think it was with umlsim); the ccache folks just accepted defeat on that one. but if you don't drive it to extremes, it'll serve you well. even compiler upgrades should be okay.
<kyak>
viric: nevermind, i just built offrss and am giving it a try :)
<kyak>
had some problems with my eyes. Somehow i thought that libmrss-0.9 > libmrss-0.19.2
<kyak>
that's a mindtrick :)
<viric>
haha
<viric>
sometimes even configure scripts make that error
<viric>
kyak: otoh, I feel honored :)
<kyak>
viric: btw, i had to add -I/usr/include/curl in the Makefile and #include <curl.h> in offrss.c
<kyak>
:)
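The portable way is usually to ask curl itself rather than hard-coding the include path:

    # in the Makefile, let curl report its own flags:
    CFLAGS += $(shell curl-config --cflags)
    LDLIBS += $(shell curl-config --libs)
    # and in the source, the canonical include is <curl/curl.h>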
<viric>
ah
<viric>
interesting
<viric>
I never built offrss on non-nix
<kyak>
damn, X over the network is slow.. even in my home network
<kyak>
i have to keep it local or use a console browser
<kyak>
for newsbeuter, i sometimes use the "External actions" feature or whatever it is called. It is when the article is passed to some external program; i use it to download things from torrents
<kyak>
viric: hm, it's interesting - when i start it like "WEBBROWSER=links ./offrss -w", it won't work. links shows up, but can't connect to the server
<kyak>
when i start as ./offrss -w and then just links http://localhost:8090, it works fine
<kyak>
oh, a segfault in podofo...
<viric>
in podofo?
<viric>
kyak: what version of podofo? Have you linked podofo?
<viric>
(any gdb bt?)
<kyak>
viric: podofo 0.7.0, the one supplied with my distro, no gdb bt yet
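For reference, the backtrace viric is asking for (binary name and flag taken from the session above):

    gdb --args ./offrss -w
    (gdb) run
    # ...after the segfault:
    (gdb) bt    # prints the backtrace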
<qi-bot>
[commit] Werner Almesberger: m1/perf/eval.pl: warn if an instruction reads and writes from the same register (master) http://qi-hw.com/p/wernermisc/5bf9ae0