foobarquux changed the topic of #ocaml to: www.ocaml.org
jao has joined #ocaml
ni has joined #ocaml
<ni>
Whoa. #ocaml is busy tonight.
<ni>
I have a small amount of code here that's running much slower than I'd expect it to. My implementation is pretty naive, and I'm wondering if anyone would be able to take a look at it and offer some suggestions.
<ni>
Anyone? It's not much code at all - only 50 lines or so. It's a program to find a string within a file, 'grep' style.
<ni>
I'm sure my problem is pretty obvious to more experienced eyes...
<malc>
ni: send
<ni>
Thanks.
<ni>
That doesn't _exactly_ emulate grep, it finds the smallest string that doesn't exist in a file. The slow parts are the grep-like parts though.
<malc>
first nitpick, why don't you just mmap it into a bigarray?
<ni>
I.. ermm.. Hrm. :) I suppose I wanted to make sure it was kept in memory, but I guess if the system had enough memory to load the file it would stay cached..
<malc>
what kind of input this comp/i is?
<ni>
Well, I just picked a random file. However, a good demonstration of the speed problem occurs if you use a (for example) 8 meg text file. It should find that the null character, the first one it tries, isn't present (since it's a text file), and it does - but it takes 20 or 30 seconds to process it.
<malc>
% cumulative self self total
<malc>
time seconds seconds calls us/call us/call name
<ni>
Is there another way to deal with large amounts of data?
<malc>
large=?
<ni>
600 or 700 MB
<malc>
since you dont need to access them randomly, sure
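[Editor's sketch] One way to read malc's point about sequential access: the file can be streamed through a single reusable buffer instead of being loaded whole. The function name `contains_byte` and the 64 KiB buffer size are assumptions for illustration, not taken from ni's program.

```ocaml
(* Hypothetical helper: true if byte [c] occurs anywhere in the channel.
   The whole file is never held in memory; one 64 KiB buffer is reused. *)
let contains_byte (ic : in_channel) (c : char) : bool =
  let buf = Bytes.create 65536 in
  let rec scan () =
    let n = input ic buf 0 (Bytes.length buf) in
    if n = 0 then false (* end of file reached without a match *)
    else
      (* Look for [c] among the [n] bytes just read. *)
      let rec find i = i < n && (Bytes.get buf i = c || find (i + 1)) in
      if find 0 then true else scan ()
  in
  scan ()
```

It would be called on a channel opened with `open_in_bin`, for example `contains_byte ic '\000'` for the null-character case ni describes.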
ni has quit [carter.openprojects.net irc.openprojects.net]
smkl has quit [carter.openprojects.net irc.openprojects.net]
gl has quit [carter.openprojects.net irc.openprojects.net]
ni has joined #ocaml
<ni>
malc: I've specified the array type, it brought the run time down from around 13s to around 12.5s.
smkl has joined #ocaml
gl has joined #ocaml
<ni>
Do I need to specify it for each function (I have), or can I specify it for one and have ocaml infer the array type in the others?
<malc>
for each i guess, again while array access contributes to runtime, the real bottleneck is elsewhere (allocations)
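[Editor's sketch] A type alias can keep the per-function annotations short. The element kind and layout below are assumptions, since ni's code is not shown in the log, and `bytes_arr` and `count_byte` are hypothetical names. The sketch also assumes an OCaml where Bigarray is part of the standard library (4.07 or later).

```ocaml
open Bigarray

(* Assumed element kind and layout; the original program's types are not shown. *)
type bytes_arr = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical function: count occurrences of [c] in [a]. With the concrete
   type on [a], the compiler knows kind and layout at each access site. *)
let count_byte (a : bytes_arr) (c : char) : int =
  let n = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    if a.{i} = c then incr n
  done;
  !n
```

As malc notes, this only trims array access cost; it does nothing about allocations.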
<ni>
Yes, I suspected as much. Any idea how I should go about fixing that?
<ni>
Also, if there are stylistic changes you'd recommend, feel free. I'm not really sure I'm doing things in a functional programming sort of way.
<malc>
drop java style naming
<ni>
I don't know java, so I'm afraid I don't understand. :) If I've picked up habits, they're probably from C++.
<malc>
it really hurts, since the basic building blocks are functions, so fancy(and long) names hurt
<ni>
OK.
<malc>
bigArrayToString
<malc>
each time creates new string
<malc>
rethink it
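[Editor's sketch] One possible way to "rethink it" is to compare the pattern against the bigarray in place, so no string is built per position. ni's `bigArrayToString` is not shown in the log; `matches_at` and `find` are hypothetical names, and the array type is an assumption.

```ocaml
open Bigarray

(* Hypothetical helper: does [pat] occur in [a] starting at offset [pos]?
   Bytes are compared one by one, so no intermediate string is allocated. *)
let matches_at (a : (char, int8_unsigned_elt, c_layout) Array1.t)
    (pos : int) (pat : string) : bool =
  let len = String.length pat in
  let rec go i = i = len || (a.{pos + i} = pat.[i] && go (i + 1)) in
  pos >= 0 && pos + len <= Array1.dim a && go 0

(* Hypothetical naive search built on it: index of the first occurrence. *)
let find (a : (char, int8_unsigned_elt, c_layout) Array1.t)
    (pat : string) : int option =
  let last = Array1.dim a - String.length pat in
  let rec loop pos =
    if pos > last then None
    else if matches_at a pos pat then Some pos
    else loop (pos + 1)
  in
  loop 0
```

This is still a quadratic worst-case search; the only claim is that the inner comparison no longer allocates.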
<ni>
OK, well it seems reasonable to suspect I'll see a performance improvement if I take string out of the picture entirely, and do everything with arrays, or Bigarrays.
<malc>
dont use ';;'
<malc>
not really, you can get around it by other means, but duh, it's 5:32 here, so don't expect reasonable suggestions
<malc>
>B)
<ni>
:)
<ni>
Thanks for your help, it's much appreciated.
<ni>
I mainly use ';;' at the end of functions, to return a value. How do I avoid using it?
<malc>
n/p. remember ocamlopt -p comp.ml; gprof a.out | less is your friend
<malc>
';;' is leftover from Caml-light days, it's not needed for anything anymore
<malc>
<well not really, but still>
<malc>
and for your own sake (unless you want to get physical disorders) open Bigarray, will save you a lot of typing
<ni>
malc: Yeah, I've opened Bigarray in the current version of it. Typing Bigarray. every time was getting painful. :)
<ni>
OK, if I replace ';;' with ';', I get syntax errors..
<malc>
no remove ';;' altogether(sp?)
<ni>
OK. Done. It seems to give an error if I don't leave it in for the final "let" statement, I guess I leave it there?
<malc>
do it like this
<malc>
let _ = print_string ...
<ni>
Whoa. Huh. OK. Done. Thanks again for all your help, I've learned a lot tonight.
<malc>
anytime
<ni>
Does the _ mean anything in particular, or is it just traditional?
<malc>
_?
mellum has quit [Read error: 110 (Connection timed out)]
<ni>
In let _ = print_string ..., we're assigning the output of print_string to "_". Is it convention to run your first function assigning its output to "_", or does "_" actually mean something? i.e., would it have made any difference if it'd been "let b = print_string ..."?
<malc>
it just means drop the result (_ has a different meaning in pattern matching, of course)
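[Editor's sketch] A small file written without ';;', showing the three bindings discussed above. The function `greeting` is a made-up example. The left side of a `let` is itself a pattern, which is why `_` and `()` can appear there.

```ocaml
(* Top-level definitions need no ';;' when every phrase starts with 'let'. *)
let greeting name = "hello, " ^ name

(* '_' is a wildcard: the expression is evaluated and its result discarded. *)
let _ = print_string (greeting "ocaml")

(* '()' is stricter: it only type-checks when the result has type unit. *)
let () = print_newline ()

(* Binding to a name also runs the expression; [b] simply holds (). *)
let b = print_string "done\n"
```

So "let b = print_string ..." would behave the same at run time; the difference is only that a name is left bound to the unit value.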
<malc>
anyhow, its time for me to go to bed, ta-ta
malc has quit ["no reason"]
ni has quit [Read error: 104 (Connection reset by peer)]
jao has quit ["leaving"]
pHa has quit [Read error: 104 (Connection reset by peer)]
pHa has joined #ocaml
malc has joined #ocaml
mellum has joined #ocaml
scott has quit [Read error: 113 (No route to host)]
teek has joined #ocaml
teek has quit ["Coffee break"]
pHa has quit [carter.openprojects.net irc.openprojects.net]
pHa has joined #ocaml
malc has quit ["no reason"]
malc has joined #ocaml
malc has quit ["no reason"]
clog has joined #ocaml
<jemfinch>
so, who's here and experienced with O'Caml?
two-face has joined #ocaml
<jemfinch>
two-face seems to be a new one...
<two-face>
hi
<two-face>
what for?
<jemfinch>
I haven't seen you around these parts before :)
<two-face>
true
<jemfinch>
but it's all good, we like as many people as possible :)
<two-face>
good :)
<gl>
oh, a Frenchman
<gl>
hum
<gl>
*sorry*
<two-face>
hmm
<two-face>
you did not set your real name properly :)
<jemfinch>
who?
<two-face>
gl
<two-face>
jemfinch: fincher like David ?
<gl>
:)
<jemfinch>
the director David Fincher?
<jemfinch>
that same kind of fincher, though there's no relation to the director.
<jemfinch>
my dad is actually a David Fincher too, though.
<two-face>
wow
<jemfinch>
why do you ask?
<two-face>
welcome to fight club :)
<jemfinch>
ah, yes, and that's not even David Fincher's best movie :)
<jemfinch>
he also directed "Seven" which is also very good.
<two-face>
i didn't like seven
<two-face>
i loved fight club
<jemfinch>
oh, I think Seven had a far more surprising ending.
<jemfinch>
fight club was good, but seven was so much more...dramatic.
<two-face>
the end was surprising too
two-face is now known as tfaway
tfaway has quit [carter.openprojects.net irc.openprojects.net]
tfaway has joined #ocaml
* jemfinch
needs a fixed length record database in O'Caml.
tfaway is now known as two-face
sjh_ has joined #ocaml
two-face has left #ocaml []
pHa has quit [Read error: 110 (Connection timed out)]