<Shorttail_>
I'm running a prime number counter in a single actor on my 64-bit Windows 7 machine, ponyc version 0.10.0. It takes 50% of the 4-core, 8-thread i7. If I run multiple actors all doing calculations, it also runs at 50%, although it finishes faster. Is the program supposed to use 50% all the time? Is it not detecting hyperthreading?
<Shorttail_>
The program is here, switch the comment to disable multiple actors
<SeanTAllen>
Shorttail_: I don't use Windows so I can't answer that. cpu.c has the code where Pony figures out what cpus are available. on my i7, OSX, it detects 4 cpus and uses those
<SeanTAllen>
ive never specifically checked to see what it is doing with hyperthreads
<SeanTAllen>
ponyint_cpu_init() is probably what you are most interested in
<Shorttail_>
It would seem like it completely ignores threads then, unless you manually disabled yours.
<SeanTAllen>
its using the GetLogicalProcessorInformation() system call
<SeanTAllen>
i dont know the Windows data structures but...
<SeanTAllen>
from what ive just read, I assume a "relation" there is the one between a cpu and its hyperthreads, in which case, yes, it just uses the cpu
<Shorttail_>
Yep, only physical cores. I don't see any comments in cpu.c about performance. Maybe adding hyperthreads doesn't improve performance, maybe it wasn't tried
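To make the physical-versus-logical distinction concrete, here is a minimal Python sketch. It is not what cpu.c does; the `topology` list and the `group_by_core` helper are hypothetical, standing in for whatever the OS reports about which logical processors share a core.

```python
from collections import defaultdict

def group_by_core(logical_to_core):
    """Group logical processor IDs by the physical core they belong to."""
    cores = defaultdict(list)
    for logical_id, core_id in logical_to_core:
        cores[core_id].append(logical_id)
    return dict(cores)

# Hypothetical topology for a 4-core, 8-thread CPU:
# each pair is (logical processor id, physical core id).
topology = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]

cores = group_by_core(topology)
physical_count = len(cores)
logical_count = sum(len(ids) for ids in cores.values())
print(physical_count, logical_count)  # prints "4 8"
```

If a runtime starts one thread per physical core and every thread stays busy, 4 of the 8 logical processors are occupied, which would plausibly show up as 50% total CPU usage.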
<SeanTAllen>
so, at Sendence
<SeanTAllen>
we are building a high performance streaming data processing system
<SeanTAllen>
we do a lot of testing in amazon
<SeanTAllen>
where they expose hyperthreads as part of a "VCPU"
<SeanTAllen>
so if they list 8 VCPUS, that's 4 real cores and 4 hyperthreads
<SeanTAllen>
performance in that env for us, is far worse when the hyperthreads are used. we avoid using them.
<Shorttail_>
I think the main reason for hyperthreading and other SMT is that the extra performance it offers, even if not much, costs almost no extra power
<SeanTAllen>
sylvan and i have been looking at strategies for making pony be able to handle more network throughput, perf on that front is already quite good, but we want to make it really good. for that, we might end up leveraging hyperthreads
<SeanTAllen>
pony is really good at using all the cpu without hyperthreads
<Shorttail_>
Do you run tasks that take 100% cpu? If you max out all cores it should be faster than without hyperthreading
<Shorttail_>
I see
<SeanTAllen>
yes
<SeanTAllen>
its easy to max out all the cores when pushing a pony program
<Shorttail_>
Some of the threads are surely doing nothing though, with a single actor I still hit max
<SeanTAllen>
so...
<SeanTAllen>
by default, pony will start X number of schedulers where X is the number of cpus
<Shorttail_>
By doing nothing I mean busy waiting
<SeanTAllen>
you can pass --ponythreads to a pony program to change the number of threads
<SeanTAllen>
in addition to those
<Shorttail_>
Ahh, I'll try that
<SeanTAllen>
there is 1 additional thread for asio events
<SeanTAllen>
on Linux when we want best performance, what we do is...
dougmacdoug has joined #ponylang
<SeanTAllen>
lets say we want 4 scheduler threads
<SeanTAllen>
we will set aside 5 cpus only for the pony program
<SeanTAllen>
the first 4 get used for scheduler threads
<SeanTAllen>
the last for asio
<SeanTAllen>
--ponypinasio pins the asio thread to the last available cpu
<SeanTAllen>
that is going to get your best performance
jemc has quit [Ping timeout: 240 seconds]
<Shorttail_>
And those threads busy wait even if the program could potentially be single threaded?
<SeanTAllen>
again on Linux, we set aside 1 cpu for the OS and use the rest for pony
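The CPU layout described above (one cpu for the OS, N for scheduler threads, the last one for asio) can be sketched as a small planner. This is an illustration only: `plan_cpus`, `build_command`, and the binary name `./my-pony-app` are hypothetical, and using `taskset` to restrict the process is an assumption, since the chat does not say which tool is used to set CPUs aside. Only `--ponythreads` and `--ponypinasio` come from the conversation.

```python
def plan_cpus(total_cpus, scheduler_threads, reserve_for_os=1):
    """Split CPU ids into OS, scheduler, and asio sets (hypothetical helper)."""
    needed = reserve_for_os + scheduler_threads + 1  # +1 for the asio thread
    if needed > total_cpus:
        raise ValueError("not enough cpus for this layout")
    pony_cpus = list(range(reserve_for_os, needed))
    return {
        "os": list(range(reserve_for_os)),
        "schedulers": pony_cpus[:-1],  # first N cpus run scheduler threads
        "asio": pony_cpus[-1],         # last cpu is left for the asio thread
    }

def build_command(binary, plan):
    """Build a command line; taskset as the isolation tool is an assumption."""
    cpu_list = ",".join(str(c) for c in plan["schedulers"] + [plan["asio"]])
    return ["taskset", "-c", cpu_list, binary,
            "--ponythreads", str(len(plan["schedulers"])),
            "--ponypinasio"]

plan = plan_cpus(total_cpus=8, scheduler_threads=4)
print(" ".join(build_command("./my-pony-app", plan)))
# prints "taskset -c 1,2,3,4,5 ./my-pony-app --ponythreads 4 --ponypinasio"
```

With 8 cpus and 4 scheduler threads, cpu 0 stays with the OS, cpus 1 to 4 run schedulers, and cpu 5 is the one the asio thread would be pinned to.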
<SeanTAllen>
so
<SeanTAllen>
that would be a work stealing question
<SeanTAllen>
work stealing needs some work, right now it involves some hand tuning
<SeanTAllen>
lets talk pony scheduling for a moment
<Shorttail_>
My single actor does CPU work, nothing else happens, the behavior is pretty long
<SeanTAllen>
by default when an actor sends a message to another actor
<SeanTAllen>
if the receiving actor isnt already scheduled, it will be scheduled on the same scheduler as the sender
<SeanTAllen>
so when your program starts
<SeanTAllen>
everything would be on 1 scheduler
<SeanTAllen>
the other schedulers, when they have no actors to schedule will attempt to steal actors from other schedulers
<SeanTAllen>
and in this way, work gets distributed across the available schedulers
<SeanTAllen>
there is overhead to work stealing and if it happens too often, performance can suffer in which case its best to lower the number of ponythreads to get better performance
<SeanTAllen>
if you were to profile such a program you would see most of its time spent in work stealing
<SeanTAllen>
ive tried a number of strategies to back off work stealing when there isnt enough work for all threads but thus far they have all had a large impact on "full bore" performance so i havent opened any PRs
<SeanTAllen>
thats a bit more detailed of an answer than you might have been looking for, hopefully not too much info
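The scheduling behaviour described above can be illustrated with a toy simulation. This is not the Pony runtime's algorithm; the `simulate` function, its tick-based model, and the "steal only from a queue holding more than one actor" rule are simplifying assumptions made for the sketch. All actors start on scheduler 0, and idle schedulers try to steal.

```python
from collections import deque

def simulate(num_schedulers, actor_work, ticks):
    """Toy work-stealing model. Returns (work units run, failed steal attempts)."""
    queues = [deque() for _ in range(num_schedulers)]
    # Everything starts on one scheduler, as at program start.
    for actor_id, units in enumerate(actor_work):
        queues[0].append([actor_id, units])

    work_done = 0
    failed_steals = 0
    for _ in range(ticks):
        for i, queue in enumerate(queues):
            if not queue:
                # Idle scheduler: look for a victim with more than one actor queued.
                victim = next(
                    (q for j, q in enumerate(queues) if j != i and len(q) > 1),
                    None,
                )
                if victim is None:
                    failed_steals += 1  # nothing to steal: wasted effort
                    continue
                queue.append(victim.pop())
            # Run one unit of work for the actor at the front of the queue.
            actor = queue.popleft()
            actor[1] -= 1
            work_done += 1
            if actor[1] > 0:
                queue.append(actor)
    return work_done, failed_steals

single = simulate(num_schedulers=4, actor_work=[10], ticks=10)
busy = simulate(num_schedulers=4, actor_work=[10, 10, 10, 10], ticks=10)
print(single)  # prints "(10, 30)"
print(busy)
```

In the single-actor run, three schedulers spend every tick on failed steal attempts, which mirrors the busy waiting discussed above. With four actors the steals succeed almost immediately and very little effort is wasted, which is why lowering the thread count helps only when there is too little work to go around.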
<Shorttail_>
I get it. It makes sense to not senselessly tune the runtime for single thread performance when that is not what pony is made for
<SeanTAllen>
well
<SeanTAllen>
its not just single thread
<SeanTAllen>
for example at sendence
<SeanTAllen>
if we have 8 cores set up
<SeanTAllen>
but only run at 100k messages a second, we get a lot of work stealing overhead
<SeanTAllen>
ideally we wouldnt
<SeanTAllen>
but that is a difficult thing to balance
<SeanTAllen>
at the moment pony does the simple thing and leaves it to you to tune using ponythreads to your workload
<SeanTAllen>
the problem is you might have a variable workload, so its something we are working on
<SeanTAllen>
and by "we", that has really been me.
<SeanTAllen>
i've tried about 20 strategies so far, none worked out
<SeanTAllen>
if you have runtime questions, i'm probably one of the best people to ask. feel free to get my address off the mailing list if you ever have runtime questions and i'm not around here to answer
<Shorttail_>
I tested with all 8 threads enabled, and it had a speedup of only 20% over 4 threads, so I guess it's not worth it to use hyperthreads by default, seeing as they affect the cache as well
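A rough way to read that measurement, assuming "20% speedup" means 1.2x the throughput of the 4-thread run (the helper names below are hypothetical):

```python
def scaling_efficiency(speedup, thread_ratio):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / thread_ratio

def hyperthread_value(speedup, base_threads, total_threads):
    """Throughput added per extra thread, relative to one base thread."""
    gain_per_extra_thread = (speedup - 1.0) / (total_threads - base_threads)
    share_per_base_thread = 1.0 / base_threads
    return gain_per_extra_thread / share_per_base_thread

speedup = 1.2  # assumed reading of "20% faster with 8 threads than with 4"
efficiency = scaling_efficiency(speedup, thread_ratio=8 / 4)
per_hyperthread = hyperthread_value(speedup, base_threads=4, total_threads=8)
print(round(efficiency, 2), round(per_hyperthread, 2))  # prints "0.6 0.2"
```

Under that reading, doubling the thread count achieved about 60% of ideal scaling, and each hyperthread contributed roughly a fifth of what a physical core did in this one test.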
<SeanTAllen>
ya
<Shorttail_>
Thank you
<SeanTAllen>
you're welcome
<lisael>
Hi, there... I have to sleep, so I just drop this here :
<jemc>
it's an unfinished project, but it takes the approach of an in-language "DSL" rather than an actual DSL with a first-class syntax
<lisael>
jemc: I know pegasus, of course ( otherwise my project would be named pegasus :D )
aedigix- has quit [Ping timeout: 264 seconds]
<lisael>
BTW I don't think pony is made to make DSLs, and the approach is different here.
<lisael>
I generate pony code ( I read somewhere, maybe in the tutorial that it's not something desirable, though )
<lisael>
I'd like to create toolings that allow someone to generate the parser and ship it in their project without even depending on pony-peg
<lisael>
(for some reason, at the moment, the generated code has to `use "peg"`)
<jemc>
I've definitely come to have a positive stance toward code generation in Pony
<jemc>
especially for this sort of thing, like a codec or large state machine
aedigix has joined #ponylang
abeaumont has quit [Ping timeout: 256 seconds]
<lisael>
to be clear, what i have in mind is writing the pony grammar (not too hard, I think, just have to port the ANTLR grammar)
<lisael>
then generate pony-ast
<lisael>
and experiment with macros or stuff like elixir sigils
<lisael>
I have to sleep, really, now :)
<lisael>
bye.
<jemc>
lisael: FYI I am in the middle of massively revising pony-ast to be more static in nature
<jemc>
using code generation, in fact
<jemc>
not sure whether the current "dynamic" `AST` class will be kept around as an intermediate step that can be transformed into the static version, or not
<jemc>
would definitely be interested in feedback about generating the static version of the codebase directly
kr1shnak has joined #ponylang
kr1shnak_ has joined #ponylang
<jemc>
sorry, I mean generating the static data structure directly
kr1shnak has quit [Ping timeout: 260 seconds]
<jemc>
the only reason this work is on pause is because I found that the ponyc compiler gets bogged down and becomes quite slow when compiling the pony-ast/static codebase