c355e3b has quit [Quit: Connection closed for inactivity]
aturley has joined #ponylang
aturley has quit [Ping timeout: 260 seconds]
trapped has joined #ponylang
jemc has quit [Ping timeout: 252 seconds]
aturley has joined #ponylang
aturley has quit [Ping timeout: 250 seconds]
trapped has quit [Read error: Connection reset by peer]
aturley has joined #ponylang
aturley has quit [Ping timeout: 252 seconds]
srenatus has joined #ponylang
gsteed has joined #ponylang
Applejack_ has joined #ponylang
Praetonus has joined #ponylang
aturley has joined #ponylang
aturley has quit [Ping timeout: 250 seconds]
aturley has joined #ponylang
aturley has quit [Ping timeout: 276 seconds]
lispmeister has joined #ponylang
unbalanced has joined #ponylang
unbalanced has quit [Ping timeout: 260 seconds]
_andre has joined #ponylang
aturley has joined #ponylang
c355e3b has joined #ponylang
aturley has quit [Ping timeout: 276 seconds]
copy` has joined #ponylang
nyarumes has joined #ponylang
nyarum has quit [Ping timeout: 260 seconds]
aturley has joined #ponylang
aturley has quit [Ping timeout: 276 seconds]
unbalancedparen has joined #ponylang
trapped has joined #ponylang
aturley has joined #ponylang
aturley has quit [Ping timeout: 250 seconds]
aturley has joined #ponylang
<shepheb>
oooh
<shepheb>
I might borrow that serialization for this daemon-client architecture
<shepheb>
so far I've been serving JSON over TCP, but that's a nuisance.
<shepheb>
since both sides can be Pony, using the serialization would be cool, and maybe a useful test too.
<jonas-l>
just keep in mind that it's not enough for both sides to be Pony. Binaries should be identical too.
<jonas-l>
so the architecture and probably OS also the same
<Applejack_>
Hi, with the help of #ponylang, I've been exploring how I could use Pony for scientific computations, and of course I need to bind to Gsl because Pony does not yet have anything like it. So here's a simple example that has two nested loops, with the inner loop calling two Gsl functions, acosh and hypot (the latter is just the hypotenuse of a right triangle computed properly): C is 5 times faster than Pony. Please
<shepheb>
jonas-l: that's fine in this case. it's just that there's a few large data blobs to load and index, so the daemon does that and then the client runs many smaller queries against it.
<Applejack_>
I originally benchmarked a program that requires random numbers, and with the help of this channel I figured out how to access Gsl random generating functions through Pony, but because of how slow Pony was I tried to simplify the code and wrote that nested loop
<Praetonus>
Applejack_: Looks like you linked the same C code twice
<SeanTAllen>
Applejack_: how are you determining speed?
<Applejack_>
using "time"
<Applejack_>
I redirect the output to a file and use "time"
<SeanTAllen>
that isnt going to be accurate
<Applejack_>
I agree, but C is 2s and Pony 10s
<SeanTAllen>
1. output from env.out.print calls goes through an actor, and stderr and stdout are the slowest things
<SeanTAllen>
2. pony programs dont exit as soon as they are "done"
<SeanTAllen>
they exit when the last actor is gc'd
<SeanTAllen>
that isnt nec going to be immediately after work has completed
<SeanTAllen>
doing the timing around your work in the program, you'll get a more accurate result
<SeanTAllen>
for the computational aspect
<SeanTAllen>
the print calls will have an impact if they are inside the timing loop as env.out.print is going through an actor. i'd suggest removing the print calls from both the c and pony to get the best computational comparison.
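One way to apply this advice on the C side is to read a monotonic clock immediately before and after the compute loops and do any printing afterwards. The sketch below assumes a POSIX system with `clock_gettime`; `monotonic_ns`, `work`, and `time_work` are hypothetical names, and `work` is a plain-arithmetic stand-in for the benchmark's nested loops, not the Gsl calls from the gist.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* assumption: POSIX clock_gettime is available */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock and return it as nanoseconds. */
uint64_t monotonic_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for the nested loops under test (no I/O inside). */
double work(uint32_t n) {
    double acc = 0.0;
    for (uint32_t i = 0; i < n; i++) {
        for (uint32_t j = 0; j < n; j++) {
            acc += (double)i * 0.5 + (double)j;
        }
    }
    return acc;
}

/* Time only the computation; the caller prints *result after this returns. */
uint64_t time_work(uint32_t n, double *result) {
    assert(result != NULL);
    uint64_t start = monotonic_ns();
    *result = work(n);
    return monotonic_ns() - start;
}
```

Calling `time_work` from `main` and printing both the result and the elapsed nanoseconds only after it returns keeps program startup, shutdown, and output cost out of the measurement.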
<Applejack_>
Ok, I will try that
<Praetonus>
Also, keep in mind that actors are why Pony is awesome. On non-parallel applications, Pony is unlikely to outperform C
<Applejack_>
Well, I guess Sylvan's Fibonacci benchmark bit me...
jemc has joined #ponylang
<Applejack_>
I'll try SeanTAllen's suggestions, but a factor of 5 in speed when accessing Gsl would be a bummer
jemc has quit [Client Quit]
jemc has joined #ponylang
<shepheb>
still struggling with generics and capabilities around this JSON stuff.
<shepheb>
I haven't managed to grok the algebra for capabilities with generics :/
<shepheb>
the input is a filename (String val, trivial) and the output is ideally an Array[Station val] val or similar, where Station is FromJSON.
<Applejack_>
SeanTAllen: Same relative speed when removing print. Will try to bench the computation per se, but it really looks like calling Gsl produces an overhead.
<shepheb>
Applejack_: do your Pony types align nicely with the C types?
<shepheb>
ie. are you hitting the good case or forcing Pony to (un)marshall for the call?
<shepheb>
I haven't worked with the FFI when speed was important, I've mostly used it for things like ncurses.
<shepheb>
the fundamental incompatibility is that the array elements aren't val unless I recover them, and if I recover them then I can't pass the non-sendable JsonParser.
<shepheb>
the only path I can see confidently is to split the constructor and from_json again, and make the latter take and return a JsonParser iso^.
<Applejack_>
shepheb: well, the C requires/outputs double and I use F64, if that is what (un)marshall pertains to
<shepheb>
Applejack_: yes, that's what I meant. you want the types to be actually-the-same on both sides, so that no conversions are necessary
tankfeeder has joined #ponylang
<Praetonus>
Pony calls C functions directly with no conversion, so I don't think this is the problem
<sylvanc>
applejack: do you have a gist for the C version of your gsl program?
<shepheb>
Praetonus: interesting. that's exactly the problem I hit, yeah.
<shepheb>
in the meantime, I managed to get it to compile by making everything Just So
<shepheb>
but I had to manhandle my API well away from the ideal, so I'm definitely following that bug.
<sylvanc>
applejack: im getting 1.33 seconds to run your pony example, which is about 13 nanoseconds per C call amortised (including all cost of setting up and tearing down a program, gc, everything)
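To relate a total run time to a per-call figure like the one above: dividing 1.33 s by 13 ns suggests on the order of 10^8 calls. That call count is inferred here and is not stated in the log, and `ns_per_call` is a hypothetical helper used only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Amortised wall-clock cost per call, in nanoseconds.
   total_seconds covers the whole run (startup, GC, teardown included). */
double ns_per_call(double total_seconds, uint64_t calls) {
    assert(calls > 0);
    return total_seconds * 1e9 / (double)calls;
}
```

With the assumed 10^8 calls, `ns_per_call(1.33, 100000000)` comes out at about 13.3 ns, in line with the quoted figure.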
<shepheb>
overall I guess it's not clear to me why generics are any different from a concrete type. I should be able to say I accept an A ref and return an A box or whatever.
<shepheb>
local type alias indeed.
<sylvanc>
shepheb: the reason, originally, was because of type expressions as bounds
<sylvanc>
ie if the bounds is (Foo ref, Bar val)
<sylvanc>
what does it mean to say (Foo ref, Bar val) iso ?
<sylvanc>
originally, you also couldn't put an rcap on a `type`
<sylvanc>
the answer is, i think, that such an rcap overrides
<sylvanc>
so this can (and should!) work for type parameters, exactly as you want
<SeanTAllen>
shepheb: there's an issue for that. i hit it a couple weeks back
<sylvanc>
applejack: if i link the libgsl.a instead of the dynamic library, that drops to 1.27 seconds
amclain has joined #ponylang
Praetonus has quit [Quit: Leaving]
nyarum has joined #ponylang
nyarumes has quit [Ping timeout: 252 seconds]
<jemc>
shepheb: catching up...
xenthree3 has joined #ponylang
xenthree3 has left #ponylang [#ponylang]
<jemc>
I still think you probably don't *need* to specify `A iso` with the approach I suggested at the end of our discussion
<jemc>
that is, putting the whole parsing and creation and population of the array into a recover block
<jemc>
but if you give me some concrete otherwise-compilable code (complete with a class that implements from_json) I can check my idea and then show you what I mean
<jemc>
I could be missing something, but in my head, my idea seems to work :)
<jemc>
also note that when you're trying to create a `val`, it can often solve many of your problems to create as a `trn` instead of `iso`, because it gives you more flexibility on passing box aliases of the object - in this case, that's not your main problem, but it's a nice tip to avoid problems in the future
<shepheb>
I think I implemented the idea correctly and ran into trouble, but I can't be sure now.
<shepheb>
more distressing is that I've debugged the parser and fed this file into it... and it still explodes while trying to load the data.
<shepheb>
I'm not sure why it's using so much memory.
<shepheb>
there are 62k records, and it's using well north of 1.5GB of memory, so that's at least 24KB per record! it's just a simple Pony class with about 8 String fields and 8 int ones.
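Checking the per-record figure: 1.5 GB spread over 62k records comes to roughly 24 KB each (kilobytes, not megabytes), which is still far more than a class with about 16 small fields should need. A minimal sketch of that arithmetic, assuming decimal gigabytes and a hypothetical helper name `bytes_per_record`:

```c
#include <assert.h>
#include <stdint.h>

/* Average memory per record in bytes (integer division, rounds down). */
uint64_t bytes_per_record(uint64_t total_bytes, uint64_t records) {
    assert(records > 0);
    return total_bytes / records;
}
```

For example, `bytes_per_record(1500000000, 62000)` gives 24193 bytes.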
<shepheb>
am I angering the GC gods by creating a huge amount of garbage but not giving it a chance to finish collecting?
<jemc>
well, GC collection only happens between behaviours, so perhaps
<jemc>
also, I haven't thoroughly reviewed your gist for extra string allocations / copies, so that would be something else to consider
<shepheb>
there's only the Main actor here and this JSON parsing doesn't give it a break.
<shepheb>
I wouldn't think there was that much garbage being created, but it's possible.
<TwoNotes>
If an actor calls a behavior in itself, does that give GC a chance to run?
<jemc>
yep, you can use self-behaviour-tail-calls like a loop
<jemc>
and the GC can run in between those iterations
<shepheb>
I should probably turn the JSON loading code into a self-calling actor then.
graaff has joined #ponylang
<TwoNotes>
Watch out for tail-call optimization turning off when debug is on
<jemc>
I think that wouldn't matter for tail-behaviour calls, since they only put a message in the mailbox
<TwoNotes>
ah yes
<shepheb>
is there not a Promise.all or something?
<shepheb>
I guess that's harder to do with variable arguments.
Matthias247 has joined #ponylang
tankfeeder has joined #ponylang
tm-exa has joined #ponylang
tm-exa has quit [Client Quit]
copy` has quit [Quit: Connection closed for inactivity]
<shepheb>
it looks like my code is exiting before all promises are resolved.
bodie_ has quit [*.net *.split]
doublec has quit [*.net *.split]
Applejack_ has joined #ponylang
doublec has joined #ponylang
bodie_ has joined #ponylang
<jemc>
unless the promise-holders have dropped their references, that shouldn't happen
srenatus has quit [Ping timeout: 276 seconds]
mankyKitty has quit [Ping timeout: 276 seconds]
darach has quit [Ping timeout: 276 seconds]
srenatus has joined #ponylang
darach has joined #ponylang
mankyKitty has joined #ponylang
lispmeister has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
graaff has quit [Quit: Leaving]
unbalancedparen has quit [Ping timeout: 260 seconds]
<Applejack_>
SeanTAllen: shepheb: removing the print and all did nothing to make Pony faster relative to C, so I just removed the calls to Gsl and replaced them with simple math operations that Pony can do by itself without the help of Gsl, and then the speed is the same
<Applejack_>
So it really looks like the Gsl calls are really slowing everything down, which some other languages manage not to suffer from, so it should be solvable... hopefully
<jemc>
Applejack_: are you still timing the entire program execution including startup and shutdown, or are you timing the operations within the program (by getting monotonic time at start and end of the operations)?
<Applejack_>
Still the whole thing but I removed the print from the outer loop and put it outside. Is there a timing package I can use (didn't look yet) ?
<Applejack_>
But with that caveat, removing the Gsl calls achieves speed on par with C
<shepheb>
isn't that wrong? it starts at the head and goes backwards
tankfeeder has left #ponylang ["Leaving"]
lispmeister has joined #ponylang
<jemc>
Applejack_: there is a `time` package with a `Time` primitive that you can use - the `perf_begin` and `perf_end` methods are usually recommended because they prevent re-ordered instructions from "leaking" into or out of the perf block under test, but these return a cycle count instead of a monotonic clock time, so they may be harder to compare with whatever timing you get in C
<jemc>
so there are also methods on the `Time` primitive for getting monotonic `millis` or `nanos` or what have you
copy` has joined #ponylang
<SeanTAllen>
shepheb: question- are you working on streaming encoding and decoding for json or just decoding?
Applejack_ has quit [Ping timeout: 250 seconds]
<jemc>
Applejack_: also, looking at your Pony gist, you might try removing the use of `Range` in your hot path section, since that will be an allocation each time
<jemc>
you can use a while loop and a local variable to be more fair with respect to the C implementation, which isn't going to be allocating all those iterators
<jemc>
this is mainly a difference because you have a nested loop, and you'll be allocating a new iterator for the inner loop on each iteration of the outer loop
_andre has quit [Quit: leaving]
TwoNotes has quit [Quit: Leaving.]
lispmeister has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
Applejack_ has joined #ponylang
srenatus has quit [Quit: Connection closed for inactivity]
<Applejack_>
jemc: replacing the Range by a while does not really change anything...
<shepheb>
SeanTAllen: decoding only so far
unbalancedparen has quit [Ping timeout: 252 seconds]
<Applejack_>
jemc: I used the time package to measure the millis at the beginning/end inside the function that gets called, with a start time just before the loops and a finish time just after, and I basically get the same result as by simply using GNU time on the command line. I mean, of course it's not exactly the same, but there's still a factor of nearly 5 between Pony and C.
<Applejack_>
But as I said earlier, replacing Gsl calls by simple arithmetic operations which Pony handles all by itself then puts Pony on par with C.
<Applejack_>
So could it be the Gsl calls that somehow are super slow?
<Applejack_>
I mean inducing an overhead because of the way Pony does FFI ?
<Applejack_>
I really don't know anything about those matters
unbalancedparen has joined #ponylang
<SeanTAllen>
shepheb: and i'm working on an encoder
<SeanTAllen>
Applejack_: you shouldnt see that kind of overhead from C calls. Pony shares an ABI with C and there's no wrappers. You are calling the C functions. No indirection.
Applejack_ has quit [Ping timeout: 250 seconds]
<jemc>
Applejack_ - it would probably be interesting to look at the difference in the LLVM IR output between ponyc compiling pony and clang compiling C
<jemc>
Applejack_ - if you update your gists with your latest iteration, I will try to take a look tonight and see if I can find an issue
<jemc>
if you eliminated the Range objects, then the code generated by ponyc should be roughly the same as that generated by clang, and it sounds like that's not happening right now, so there may be an "easy win" available
<jemc>
I'm not a big fan of microbenchmarks in general, but in this case it sounds worthwhile to investigate the differences a bit deeper
nyarumes has joined #ponylang
nyarum has quit [Ping timeout: 252 seconds]
trapped has quit [Read error: Connection reset by peer]
aturley has quit [Ping timeout: 250 seconds]
<doublec>
I played around with AppleJack's gist
<doublec>
If I inline the Gsl primitive calls I get a 15x speedup