<Drup>
(and the compiler is never going to unroll that for you)
johnnydiabetic has quit [Ping timeout: 240 seconds]
<aggelos_>
hmm
<aggelos_>
the ocp-build readme talks about how annot files will be next to the source files, but in practice I find them in _obuild/prog/temp/*.annot
<aggelos_>
am I doing something incorrectly?
<aggelos_>
little mention of -annot in the ocp-build manual...
johnnydiabetic has joined #ocaml
johnnydiabetic has quit [Client Quit]
philtor_ has joined #ocaml
ygrek has quit [Ping timeout: 272 seconds]
<aggelos_>
quiz question: I have let func (f : 'el -> 'nel p) (l : 'el list) : ('nel list * blah list) = ...
<aggelos_>
(where p and blah are defined)
<aggelos_>
yet when I use annot to print out the type of func, I get: ('a -> 'a p) -> 'a list -> 'a list * blah list
<mrvn>
so?
<aggelos_>
what am I doing wrong here? is it the case that the scope of 'el and 'nel is not the whole definition?
<mrvn>
nothing. The compiler inferred that 'el == 'nel
<aggelos_>
mrvn: well, this seems to force the types in the sig to be the same, so I get a type error when using the function from a different module
<aggelos_>
mrvn: ok, I don't understand why
<aggelos_>
this function is not called in this module
<aggelos_>
or referenced in any way, in fact
<mrvn>
well, something makes it think they are the same
<Drup>
aggelos_: 'el and 'nel are merely type variables
<Drup>
the compiler can substitute anything for these type variables
<aggelos_>
other than a call of func with a concrete 'f' passed in as argument, I don't think I know of any other way that the compiler could infer things about 'el and 'nel
<Drup>
"let x : 'a list = [ 4 ]" is valid
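A minimal sketch of the point being made here: a named type variable in an annotation is a unification variable, so the compiler is free to instantiate it or merge it with another one. The function `pair_up` is a hypothetical illustration, not code from the paste under discussion.

```ocaml
(* 'a in an annotation is a unification variable, not a promise of
   polymorphism: this is accepted and x gets the type int list. *)
let x : 'a list = [ 4 ]

(* Hypothetical example: two differently named variables, but putting
   a and b in the same list forces 'el and 'nel to be the same type. *)
let pair_up (a : 'el) (b : 'nel) : 'el list = [ a; b ]
(* inferred type: 'a -> 'a -> 'a list *)
```

Nothing outside the definition is needed for this to happen; the body alone is enough to unify the two variables.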
<aggelos_>
Drup: sure, my problem is that the resulting type signature essentially forces that f : 'a -> 'a p
<aggelos_>
which is of course not the case
<Drup>
of course ?
<Drup>
the compiler thinks it is, and it's usually better than you at figuring that out
<mrvn>
show the source
<aggelos_>
Drup: that is my experience too, so I'm trying to figure out what information I'm mistakenly passing along
<aggelos_>
mrvn: sec, lemme do a simple testcase
axiles has quit [Remote host closed the connection]
<aggelos_>
mm yah, turns out I don't have ocaml 3.12.1 on this box, lemme move things to a box with opam...
<Drup>
I doubt it will change with ocaml versions
<mrvn>
and while you do that we could already look at the source and find the bug
<aggelos_>
Drup: I also doubt that, but I try not to make such assumptions ;)
<aggelos_>
mrvn: ForkWork just farms out the computation to worker processes and brings in & unmarshals the result
<aggelos_>
the badly-named ForkWork.fold_enumerator does the same, but tries to call the f_acc function (to overwrite the input with the result in the accumulator) as results come in, so that we don't keep (almost) the same data in memory twice until the computation for everything finishes
<aggelos_>
anyways, that was mostly a high-level explanation
philtor_ has quit [Ping timeout: 240 seconds]
<aggelos_>
mostly interested to find out possible sources for the problem, I'm happy to debug my own code
rand000 has joined #ocaml
<mrvn>
What's the signature of ForkWork.fold_enumerator?
<aggelos_>
mrvn: it's provided as an annotation to the original paste, just scroll down a bit
<aggelos_>
(I had trouble with using a type variable in the place of int * 'c there too, but needed to move forward a bit)
<mrvn>
Error: Unbound value update_bench
<aggelos_>
mrvn: oh so you do want to compile it. hold on then
Kakadu has quit [Quit: Konversation terminated!]
<Drup>
merlin doesn't even need update_bench to tell you that it's going to unify el and nel :p
<aggelos_>
Drup: again, I can see what's happening. any clue as to how to find out the *why* would be most helpful
<aggelos_>
update_bench is basically:
<aggelos_>
let update_bench mb = let mb = new Perfacc.marshalled_bench mb in mb#replay ()
<aggelos_>
so lemme just cook up a .mli for that
<mrvn>
fold_enumerator takes "(int * 'acc, 'el) folder" and you are passing "('acc2 -> 'nel -> 'acc2) -> 'acc2 -> 'acc2"
slash^ has quit [Read error: Connection reset by peer]
<aggelos_>
mrvn: fair point, I haven't run into that yet, as I can't fully compile the file
<mrvn>
That makes 'el == 'nel
<aggelos_>
any clues as to why I can't use 'acc2 in the place of (int * 'acc) in that sig would be welcome too
<aggelos_>
mrvn: oh
<Drup>
aggelos_: to debug this kind of issue, you can change the declaration of fold_over to: "(type a) (type b) (f_el : a -> b workfunc_result) (f_acc : 'acc -> int -> b -> 'acc) (acc : 'acc)"
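A small sketch of the technique Drup describes, using locally abstract types (available since OCaml 3.12). The function `map_pair` is a hypothetical stand-in, not the real fold_over. Because `a` and `b` are abstract inside the body, the compiler cannot silently unify them and reports a type error at the offending expression instead.

```ocaml
(* a and b are locally abstract: inside the body they are distinct,
   opaque types, so any code that needs a = b is rejected on the spot. *)
let map_pair (type a) (type b) (f : a -> b) (x : a) (y : a) : b * b =
  (f x, f y)

(* The following would NOT compile; the error points at the final x,
   which has type a where b is expected:

   let bad (type a) (type b) (f : a -> b) (x : a) : b = x
*)
```

With plain `'a` and `'b` annotations, the rejected definition would have been accepted with the two variables merged, which is the behaviour that caused the confusion above.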
<aggelos_>
Drup: ah! thanks
<aggelos_>
Drup: I think I've seen that in one of the books, but couldn't find it today
<mrvn>
'el -> int * ('acc * ('nel workfunc_result, exn_with_bt) result list)) ->
<mrvn>
int * ('acc * ('nel workfunc_result, exn_with_bt) result list) ->
<mrvn>
int * ('acc * ('nel workfunc_result, exn_with_bt) result list)) ->
<Drup>
aggelos_: the bottom line is don't annotate like that
<mrvn>
'acc * exn_with_bt list
<mrvn>
You consider that a simple type?
nikki93 has left #ocaml [#ocaml]
<Drup>
mrvn: give it some type alias and use -short-paths
<Drup>
it's going to be much smaller
<mrvn>
still way too complex code
<Drup>
that, I agree
<aggelos_>
Drup: sure, was debugging another type issue when I added all those annotations
<Drup>
but the type issue comes from the annotations
<mrvn>
I assumed he added the annotations to figure out some error
<aggelos_>
mrvn: the issue here is that I need to add the result to the accumulator asap
<Drup>
probably, but he introduced the error we were trying to debug ;)
<aggelos_>
mrvn: because this code forks off possibly thousands of processes (only ncores at once though, of course)
<mrvn>
aggelos_: why?
<aggelos_>
mrvn: memory usage explodes
Eyyub has joined #ocaml
<aggelos_>
mrvn: the elements in the hashtable are very large when marshaled (30MB is common)
alpounet has joined #ocaml
<mrvn>
you are building 2 lists: One with OK results and one with backtraces. How does building those lists earlier save memory?
ontologiae has joined #ocaml
<aggelos_>
mrvn: so if I wait until all the results come in, I'm holding N * 30MB in memory at once
<aggelos_>
mrvn: it's not about building the lists earlier, it's about doing the f_acc call asap
<aggelos_>
mrvn: in either case, the boxed result gets put into the list
<mrvn>
aggelos_: because that only saves res.ret instead of the whole res?
<aggelos_>
mrvn: no, because the user-supplied accumulator will be doing something like BatHashtbl.replace
<aggelos_>
mrvn: so the /old/ version of the structure gets dropped as soon as we have the new one
<mrvn>
in map_list you set f_acc to (fun list _ el -> el :: list)
<aggelos_>
mrvn: I won't be using map_list any more, that's just to keep the code compiling or for farming out computations that deal with smaller structures
<aggelos_>
mrvn: also to verify correctness, of course
<aggelos_>
well, test for obvious correctness issues, at least ;)
alpounet has quit [Ping timeout: 272 seconds]
<aggelos_>
would have just used BatEnum.t instead of this folder, actually, but might consider submitting the change to forkwork at some point
ontologiae has quit [Ping timeout: 272 seconds]
<Drup>
where does forkwork come from ?
<Drup>
(and believe me, you don't want to use BatEnum)
<mrvn>
And why not let the f_acc deal with the ('a, 'b) results? Maybe it wants to know which elements failed
<mrvn>
aggelos_: going back to your ram usage: (f_acc acc i res.ret, result :: results)
<mrvn>
aggelos_: you put all the results in the second part of the tuple. So you have the ram usage anyway.
<aggelos_>
Drup: some person who put it up on github?
<aggelos_>
mrvn: but I do want all the results
<mrvn>
aggelos_: but you said you need to call f_acc early so you can free the memory.
<aggelos_>
mrvn: say I have N input elements of a (large) size M
<aggelos_>
mrvn: and they're in a hashtable
<aggelos_>
mrvn: if I use map_list and then do the replace
<Drup>
huum, didn't know this forkwork
<aggelos_>
and the work function basically just mangles each element a bit
<aggelos_>
mrvn: then, until map_list returns, I'll have in memory /both/ the original version of the N elements /and/ the new one
<aggelos_>
mrvn: (there's no sharing, as the results have to be marshaled back)
<aggelos_>
mrvn: so what I want to do is to BatHashtbl.replace the old version of an element as soon as the new one comes in
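The idea of handing each result to f_acc as soon as it is available can be sketched as follows. This is a simplified, hypothetical stand-in for fold_over: it runs in-process with no forking or marshaling, and the argument order of `f_acc` (accumulator, index, new element) is taken from the declaration quoted earlier in the discussion.

```ocaml
(* Hypothetical in-process stand-in for fold_over: each element is
   transformed by f_el and immediately passed to f_acc together with
   its index, instead of being collected into a result list first. *)
let fold_over f_el f_acc acc els =
  let step (i, acc) el = (i + 1, f_acc acc i (f_el el)) in
  snd (List.fold_left step (0, acc) els)

(* Usage: overwrite each slot with its new value as results come in. *)
let slots = [| "a"; "b"; "c" |]

let () =
  fold_over
    (fun s -> s ^ s)                 (* stand-in for the work function *)
    (fun () i v -> slots.(i) <- v)   (* replace the old value right away *)
    ()
    (Array.to_list slots)
```

In this sketch the input list still references the old values until the fold returns, so overwriting the slots early is not by itself enough to make them collectable.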
<Drup>
aggelos_: are you sure you gain from using forking ?
<aggelos_>
Drup: yes
<Drup>
ok
<aggelos_>
Drup: forking does a pipeline of transformations
<Drup>
because marshaling is a performance killer
<aggelos_>
Drup: I'm aware, but the numbers are pretty clear
<Drup>
ok
<Drup>
(I was just making sure you benchmarked it :p)
<aggelos_>
Drup: my main concern is that forking kills all sharing, so memory usage increases quite a lot anyway
<Drup>
yeah
<mrvn>
aggelos_: which you do have anyway I think
<aggelos_>
mrvn: I'm sorry? what do I have anyway?
<mrvn>
aggelos_: both all the old items and all the new items.
<aggelos_>
mrvn: Hashtbl.replace should be dropping the only reference to the old element
<aggelos_>
mrvn: so then it will get GC'd at some point
<mrvn>
aggelos_: at least the way you use it in map_list you end up with both the input list and output list being alive till fold_over returns.
<aggelos_>
mrvn: see above, I won't be using map_list, the code will be updated to use fold_enumerator directly
<aggelos_>
mrvn: I mean, the code which currently uses map_list
<aggelos_>
thank you both for your help btw
<mrvn>
aggelos_: good luck.
<mrvn>
when you're done do check that the GC can actually free up stuff like you think.
<aggelos_>
mrvn: yah, took me quite some time to find the actual point where memory explosion happens and I'm not even 100% sure my conclusion is correct atm...
<aggelos_>
mrvn: honestly, I'm mostly worried about the unsharing :/
<aggelos_>
err, a few lines above, I meant the code will be updated to use fold_over, of course (doing that now)
<mrvn>
the fold function will keep your input list alive
<aggelos_>
mrvn: hmm
<aggelos_>
mrvn: that's a good point
<aggelos_>
mrvn: so I'd have to do acrobatics where I temporarily stick the elements in a different data structure (possibly an array)... damn
<aggelos_>
this is getting hairier and hairier
<mrvn>
use an enumeration, not a list
<aggelos_>
mrvn: where do you see a list in fold_over?
<Drup>
aggelos_: look at recent ephemerons, they might help you out
<mrvn>
the fold function
<aggelos_>
mrvn: ah you mean for the intermediate data structure?
<aggelos_>
mrvn: well, the fold is supplied by the caller and initially I was going to use BatHashtbl.fold
<aggelos_>
well, in a lambda to massage the arguments, possibly
<mrvn>
is that safe to use while you modify the table?
<aggelos_>
Drup: ok, dunno what that is
<Drup>
basically, weaktbl
eikke_ has joined #ocaml
<aggelos_>
mrvn: probably not. would need to get the keys out first then, then fold on those... that would work nicely actually
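A minimal sketch of the "get the keys out first, then fold on those" approach, written against the standard library's Hashtbl rather than BatHashtbl so that it stays self-contained. The table contents and the `work` function are hypothetical placeholders.

```ocaml
(* Hypothetical stand-in for the per-element work function. *)
let work s = s ^ "!"

let tbl : (int, string) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter (fun (k, v) -> Hashtbl.replace tbl k v)
    [ (1, "a"); (2, "b"); (3, "c") ]

(* Snapshot the keys first, so the table is never modified while it is
   being traversed. Only the (small) keys are held by this list. *)
let keys = Hashtbl.fold (fun k _ acc -> k :: acc) tbl []

(* Walk the key list and replace each binding; after the replace, the
   table no longer references the old value. *)
let () =
  List.iter (fun k -> Hashtbl.replace tbl k (work (Hashtbl.find tbl k))) keys
```

Iterating over the key snapshot rather than over the table's bindings means the traversal itself does not keep the old values reachable.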
<aggelos_>
Drup: oh k. I don't think it's necessary here
<aggelos_>
Drup: or especially convenient
<aggelos_>
Drup: oh I see. I'm stuck with 3.12.1 atm, as that's the latest version supported by the framework I'm using