foobarquux changed the topic of #ocaml to: www.ocaml.org
jao has joined #ocaml
ni has joined #ocaml
<ni> Whoa. #ocaml is busy tonight.
<ni> I have a small amount of code here that's running much slower than I'd expect it to. My implementation is pretty naive, and I'm wondering if anyone would be able to take a look at it and offer some suggestions.
<ni> Anyone? It's not much code at all - only 50 lines or so. It's a program to find a string within a file, 'grep' style.
<ni> I'm sure my problem is pretty obvious to more experienced eyes...
<malc> ni: send
<ni> Thanks.
<ni> That doesn't _exactly_ emulate grep, it finds the smallest string that doesn't exist in a file. The slow parts are the grep-like parts though.
<malc> dcc failed
<ni> OK, I'll put it on a webpage. One sec.
<malc> sure
<malc> first nitpick, why dont you just mmap it into bigarray?
<ni> I.. ermm.. Hrm. :) I suppose I wanted to make sure it was kept in memory, but I guess if the system had enough memory to load the file it would stay cached..
<malc> what kind of input this comp/i is?
<ni> Well, I just picked a random file. However, a good demonstration of the speed problem occurs if you use a (for example) 8 meg text file. It should find that the null character, the first one it tries, isn't present (since it's a text file), and it does - but it takes 20 or 30 seconds to process it.
<malc>   %   cumulative   self               self      total
<malc>  time   seconds   seconds    calls   us/call   us/call  name
<malc>  20.00      0.01     0.01    36809      0.27      0.27  adjust_gc_speed
<malc>  20.00      0.02     0.01    36809      0.27      0.82  alloc_custom
<malc>  20.00      0.03     0.01    36809      0.27      0.27  check_urgent_gc
<malc>  20.00      0.04     0.01        1  10000.00  10000.00  Comp_findStringInBigArrayInt_73
<malc>  20.00      0.05     0.01                               bigarray_get_N
<malc> profiler is your friend
<malc> obviously a gazillion objects (Strings?) are created and thrown away
<malc> secondly bigarray_get will be slow without type of array declaration for a function (uh..)
<ni> I'm not sure I understand your second comment - I'm specifying it's a Bigarray.char, what else should I specify?
<malc> your function is polymorphic hence need to call c code to extract element of array
<malc> force it to be monomorphic by explicitly describing type of array (layout elem etc) in function declaration
<malc> should bring you some speed
<malc> and createNextStringPerm is fugly, better use Buffer
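The source of createNextStringPerm is not shown in this log, so the following is only a minimal sketch of the general point about Buffer. Building a string with repeated `^` allocates a new string at every step, while a Buffer appends in place. The function names repeat_concat and repeat_buffer are hypothetical.

```ocaml
(* Hypothetical helpers: build a string of n copies of c in two ways. *)

(* Repeated ^ allocates a fresh, longer string on every iteration. *)
let repeat_concat c n =
  let s = ref "" in
  for _i = 1 to n do
    s := !s ^ String.make 1 c
  done;
  !s

(* A Buffer grows in place; the final string is allocated once. *)
let repeat_buffer c n =
  let b = Buffer.create 16 in
  for _i = 1 to n do
    Buffer.add_char b c
  done;
  Buffer.contents b
```

Both functions return the same string; only the amount of intermediate allocation differs.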
<ni> I'm _really_ sorry, I still don't understand - could you give an example of a function declaration that specifies the type of array?
<malc> this ^ "" stuff is the source of excessive allocation methinks
<malc> go along those lines
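The log does not show the declaration malc pointed to. As a rough sketch of the idea, assuming a one-dimensional byte array in C layout, a function can be made monomorphic by annotating the array argument with its element type, kind and layout. The names byte_array, count_poly and count_mono are hypothetical.

```ocaml
open Bigarray

(* Hypothetical alias: element type, element kind and layout all fixed. *)
type byte_array = (char, int8_unsigned_elt, c_layout) Array1.t

(* No annotation: the function stays polymorphic in kind and layout,
   so the compiler cannot specialise the element read. *)
let count_poly a c =
  let n = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    if Array1.get a i = c then incr n
  done;
  !n

(* Annotated: the array type is fully known at this call site. *)
let count_mono (a : byte_array) (c : char) =
  let n = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    if a.{i} = c then incr n
  done;
  !n
```

The two functions compute the same result; the annotation only gives the compiler enough type information to treat the access as a read from a known kind of array.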
* malc gone smoking, bb in 5
<ni> Ok, I understand - I'll give that a try.
<malc> why do you need bigarrays anyway?
<ni> Is there another way to deal with large amounts of data?
<malc> large=?
<ni> 600 or 700 MB
<malc> since you dont need to access them randomly, sure
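One possible reading of "another way" for purely sequential access is to stream the file through a fixed-size buffer, so memory use stays constant however large the file is. This is a sketch under that assumption, not the approach from the log; count_byte and the 64 KiB chunk size are hypothetical choices.

```ocaml
(* Hypothetical helper: count occurrences of byte c in the file at path,
   reading it sequentially in fixed-size chunks. *)
let count_byte path c =
  let ic = open_in_bin path in
  let buf = Bytes.create 65536 in
  let count = ref 0 in
  let rec loop () =
    (* input returns 0 only at end of file. *)
    let n = input ic buf 0 (Bytes.length buf) in
    if n > 0 then begin
      for i = 0 to n - 1 do
        if Bytes.get buf i = c then incr count
      done;
      loop ()
    end
  in
  (* Close the channel even if reading raises. *)
  Fun.protect ~finally:(fun () -> close_in ic) loop;
  !count
```

Searching for a multi-byte pattern this way needs extra care, since a match can straddle two chunks.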
ni has quit [carter.openprojects.net irc.openprojects.net]
smkl has quit [carter.openprojects.net irc.openprojects.net]
gl has quit [carter.openprojects.net irc.openprojects.net]
ni has joined #ocaml
<ni> malc: I've specified the array type, it brought the run time down from around 13s to around 12.5s.
smkl has joined #ocaml
gl has joined #ocaml
<ni> Do I need to specify it for each function (I have), or can I specify it for one and have ocaml infer the array type in the others?
<malc> for each i guess, again while array access contributes to runtime, the real bottleneck is elsewhere (allocations)
<ni> Yes, I suspected as much. Any idea how I should go about fixing that?
<ni> Also, if there are stylistic changes you'd recommend, feel free. I'm not really sure I'm doing things in a functional programming sort of way.
<malc> drop java style naming
<ni> I don't know java, so I'm afraid I don't understand. :) If I've picked up habits, they're probably from C++.
<malc> it really hurts, since the basic building blocks are functions, so fancy(and long) names hurt
<ni> OK.
<malc> bigArrayToString
<malc> each time creates new string
<malc> rethink it
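The code for bigArrayToString is not shown in this log. One way to avoid creating a new string for every comparison, sketched here under the assumption that the data is a byte Bigarray in C layout, is to compare the pattern against the array in place. The names matches_at and find are hypothetical, and the search is deliberately naive.

```ocaml
open Bigarray

(* Hypothetical helper: does pat occur in a starting at offset pos?
   Elements are compared in place, so no intermediate string is built. *)
let matches_at (a : (char, int8_unsigned_elt, c_layout) Array1.t)
    (pat : string) pos =
  let m = String.length pat in
  let rec go j = j = m || (a.{pos + j} = pat.[j] && go (j + 1)) in
  pos >= 0 && pos + m <= Array1.dim a && go 0

(* Naive search: offset of the first occurrence of pat, or None. *)
let find (a : (char, int8_unsigned_elt, c_layout) Array1.t) (pat : string) =
  let last = Array1.dim a - String.length pat in
  let rec scan i =
    if i > last then None
    else if matches_at a pat i then Some i
    else scan (i + 1)
  in
  scan 0
```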
<ni> OK, well it seems reasonable to suspect I'll see a performance improvement if I take strings out of the picture entirely, and do everything with arrays, or Bigarrays.
<malc> dont use ';;'
<malc> not really you can get around by other means, but duh, its 5:32 here, so dont expect reasonable suggestions
<malc> >B)
<ni> :)
<ni> Thanks for your help, it's much appreciated.
<ni> I mainly use ';;' at the end of functions, to return a value. How do I avoid using it?
<malc> n/p. remember ocamlopt -p comp.ml; gprof a.out | less is your friend
<malc> ';;' is leftover from Caml-light days, it's not needed for anything anymore
<malc> <well not really, but still>
<malc> and for your own sake (unless you want to get physical disorders) open Bigarray, will save you a lot of typing
<ni> malc: Yeah, I've opened Bigarray in the current version of it. Typing Bigarray. every time was getting painful. :)
<ni> OK, if I replace ';;' with ';', I get syntax errors..
<malc> no remove ';;' altogether(sp?)
<ni> OK. Done. It seems to give an error if I don't leave it in for the final "let" statement, I guess I leave it there?
<malc> do it like this
<malc> let _ = print_string ...
<ni> Whoa. Huh. OK. Done. Thanks again for all your help, I've learned a lot tonight.
<malc> anytime
<ni> Does the _ mean anything in particular, or is it just traditional?
<malc> _?
mellum has quit [Read error: 110 (Connection timed out)]
<ni> In let _ = print_string ..., we're assigning the output of print_string to "_". Is it convention to run your first function assigning its output to "_", or does "_" actually mean something? ie, would it have made any difference if it'd been "let b = print_string ..."?
<malc> it just means drop the result (_ has different meaning in pattern matching ofcourse)
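A small sketch of the two uses of `_` mentioned here, written without any `;;`. The function names greeting and is_empty are hypothetical, and `let () = ...` is shown as a commonly used variant of `let _ = ...` for expressions of type unit.

```ocaml
(* Hypothetical function standing in for the program's real work. *)
let greeting name = "hello, " ^ name

(* let _ = e evaluates e and drops the result, whatever its type. *)
let _ = greeting "ignored"

(* let () = e does the same but also requires e to have type unit,
   which suits a final print statement. *)
let () = print_endline (greeting "ocaml")

(* In pattern matching, _ is a wildcard: it matches any value
   without binding it to a name. *)
let is_empty = function
  | [] -> true
  | _ -> false
```

Because each top-level phrase starts with `let`, the parser can tell where one definition ends and the next begins, which is why no `;;` is needed between them.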
<malc> anyhow, its time for me to go to bed, ta-ta
malc has quit ["no reason"]
ni has quit [Read error: 104 (Connection reset by peer)]
jao has quit ["leaving"]
pHa has quit [Read error: 104 (Connection reset by peer)]
pHa has joined #ocaml
malc has joined #ocaml
mellum has joined #ocaml
scott has quit [Read error: 113 (No route to host)]
teek has joined #ocaml
teek has quit ["Coffee break"]
pHa has quit [carter.openprojects.net irc.openprojects.net]
pHa has joined #ocaml
malc has quit ["no reason"]
malc has joined #ocaml
malc has quit ["no reason"]
clog has joined #ocaml
<jemfinch> so, who's here and experienced with O'Caml?
two-face has joined #ocaml
<jemfinch> two-face seems to be a new one...
<two-face> hi
<two-face> what for?
<jemfinch> I haven't seen you around these parts before :)
<two-face> true
<jemfinch> but it's all good, we like as many people as possible :)
<two-face> good :)
<gl> ho un francais
<gl> hum
<gl> *sorry*
<two-face> hmm
<two-face> you did not set your real name properly :)
<jemfinch> who?
<two-face> gl
<two-face> jemfinch: fincher like David ?
<gl> :)
<jemfinch> the director David Fincher?
<jemfinch> that same kind of fincher, though there's no relation to the director.
<jemfinch> my dad is actually a David Fincher too, though.
<two-face> wow
<jemfinch> why do you ask?
<two-face> welcome to fight club :)
<jemfinch> ah, yes, and that's not even David Fincher's best movie :)
<jemfinch> he also directed "Seven" which is also very good.
<two-face> i didn't like seven
<two-face> i loved fight club
<jemfinch> oh, I think Seven had a far more surprising ending.
<jemfinch> fight club was good, but seven was so much more...dramatic.
<two-face> the end was surprising too
two-face is now known as tfaway
tfaway has quit [carter.openprojects.net irc.openprojects.net]
tfaway has joined #ocaml
* jemfinch needs a fixed length record database in O'Caml.
tfaway is now known as two-face
sjh_ has joined #ocaml
two-face has left #ocaml []
pHa has quit [Read error: 110 (Connection timed out)]
malc has joined #ocaml
jao has joined #ocaml