SeanTAllen changed the topic of #wallaroo to: Welcome! Please check out our Code of Conduct -> https://github.com/WallarooLabs/wallaroo/blob/master/CODE_OF_CONDUCT.md | Public IRC Logs are available at -> https://irclog.whitequark.org/wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nisanharamati has quit []
nemosupremo has joined #wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
nemosupremo has quit [Client Quit]
nemosupremo has joined #wallaroo
nemosupremo has quit [Client Quit]
aturley has joined #wallaroo
aturley_ has quit [Ping timeout: 252 seconds]
nemosupremo has joined #wallaroo
aturley has quit [Ping timeout: 240 seconds]
aturley_ has joined #wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
vaninwagen has joined #wallaroo
vaninwagen has quit [Ping timeout: 245 seconds]
<jtfmumm> I recently published a blog post about how we approached designing Wallaroo with performance in mind, along with some principles that could be useful when building your own performance-sensitive software systems. You can read it here: https://blog.wallaroolabs.com/2018/02/how-we-built-wallaroo-to-process-millions-of-messages-sec-with-microsecond-latencies/
aturley has joined #wallaroo
aturley_ has quit [Ping timeout: 240 seconds]
aturley_ has joined #wallaroo
aturley has quit [Ping timeout: 240 seconds]
nemosupremo has joined #wallaroo
nemosupremo has quit [Client Quit]
nemosupremo has joined #wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
alanbato has joined #wallaroo
alanbato has quit [Client Quit]
<nemosupremo> I can't remember if I brought this up - but is there a way to provide your own kafka offsets?
<nemosupremo> If I make significant changes to my Python code (like even change class names), I notice I have to delete the entire state
<nemosupremo> and in that case, it seems I may lose data as Wallaroo will start again at the newest offset
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
<SeanTAllen> nemosupremo: that work is underway right now. and we are starting on work to allow you to migrate state with schema changes. because pickle includes the class name it does have issues if you change the class name. if you use a serialization format that isn't tied to the class name at all then that wouldnt be an issue (unless the state object schema changes).
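Sean's point about pickle can be seen directly: pickle embeds the defining class name in the payload, so a rename invalidates old state, while a name-agnostic format survives it. A minimal sketch (`DeviceState` is a hypothetical state class, not Wallaroo API):

```python
import json
import pickle

class DeviceState:
    """Hypothetical state object with a single field."""
    def __init__(self, count):
        self.count = count

# pickle stores the module and class name inside the payload,
# so renaming DeviceState makes previously saved state unloadable.
blob = pickle.dumps(DeviceState(3))
assert b"DeviceState" in blob

# A format tied only to the field schema (here, JSON over a plain dict)
# survives a class rename as long as the fields themselves don't change.
snapshot = json.dumps({"count": 3})
restored = DeviceState(**json.loads(snapshot))
assert restored.count == 3
```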
<nemosupremo> I'm trying to understand the simple performance metrics I'm seeing. When I decode a message from kafka, I also include the current time in the message. At the end, in the encoder, I print out the time taken from decode to encode
<nemosupremo> I see that it takes 2-3 seconds to go all the way through, but the numbers don't look right: the first message takes 4s, while the last takes 2s
<nemosupremo> Does wallaroo buffer messages in each stage?
<nemosupremo> nvm my computation just seems to be slow
dss has joined #wallaroo
dss has quit [Client Quit]
<nemosupremo> wallaroo def seems to be buffering the messages or something, I'd just like an insight into how. Currently I see a stream of messages every 10 seconds
<slfritchie> What are you using for your TCP sink process, nemosupremo ?
<nemosupremo> giles, but I'm ignoring it
<nemosupremo> I basically print in the encoder function
<nemosupremo> and what I'm seeing is that the first print statement in a "stream" comes 5-6 seconds after it was decoded
<nemosupremo> (and the last comes like ~50ms, which is consistent with my compute)
<nemosupremo> I'm taking a dive into the source code to try and understand if theres any buffering going on after a message gets decoded and what the parameters are
<nemosupremo> more generally, I'm trying to understand how these buffers may affect our real time SLAs (as I'm trying to come up with a general number of when a message hits kafka how long will it take to get processed?)
<slfritchie> Do your print statements have a newline in them? Are you using stdout or stderr or another file descriptor?
<nemosupremo> yeah they have a new line, just print to stdout
<nemosupremo> def encoder(data): print time.time() - data[0]; return struct.pack(">If", 4, 0)
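Reformatted for readability, and assuming (as in the one-liner above) that the decode timestamp is the first element of the tuple, that encoder emits an 8-byte frame: a big-endian 32-bit length header of 4 followed by a 32-bit float payload:

```python
import struct
import time

def encoder(data):
    # data is assumed to be (decode_timestamp, ...), as in the one-liner above;
    # print the decode-to-encode latency, then emit a length-prefixed payload.
    print(time.time() - data[0])
    return struct.pack(">If", 4, 0.0)

frame = encoder((time.time(),))
header, payload = struct.unpack(">If", frame)  # 4-byte length + 4-byte float
```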
<slfritchie> There isn't any explicit buffering in wallaroo's internals that would cause that kind of delay.
<SeanTAllen> nemosupremo: its not completely linear. each underlying actor for something like decoding, encoding, state computation etc will process up to X number of messages in a go which improves performance.
<nemosupremo> SeanTAllen: is this a dynamic heuristic, or an internal one?
<SeanTAllen> it doesnt do 1 message all the way through. we'd read a chunk of data off of a source and decode those and they get routed on to the state computation or stateless computation that processes it.
<SeanTAllen> but that would not result in that kind of delay.
<nemosupremo> Why wouldn't that cause a delay? If decode holds my message until it sees X number of messages, isn't that a delay?
<SeanTAllen> it will process up to 100 messages at any computation at a time, if 100 were available. ditto for a sink.
<SeanTAllen> how many it reads off a source is different
<SeanTAllen> no no
<SeanTAllen> it doesnt wait for X number of messages
<SeanTAllen> it will process up to X number of messages
<SeanTAllen> so if there is only 1
<SeanTAllen> it is processed
<SeanTAllen> it doesnt wait for 100
<SeanTAllen> its not a batch, its a limit on the number before it has to give up the scheduler to other processing.
<SeanTAllen> so if you send 1 message, it will get processed by all steps immediately.
<SeanTAllen> it wouldnt be waiting for others to arrive.
<SeanTAllen> does that make sense?
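Sean's "process up to X, never wait for X" can be sketched as a drain loop with a per-run cap (the cap of 100 comes from this conversation; everything else is illustrative, not Wallaroo internals):

```python
from collections import deque

BATCH_LIMIT = 100  # per-run cap mentioned above; a limit, not a batch size

def run_once(queue, handler):
    """Drain up to BATCH_LIMIT queued messages, but never wait for more."""
    handled = 0
    while queue and handled < BATCH_LIMIT:
        handler(queue.popleft())
        handled += 1
    return handled  # the actor then yields the scheduler to other work

out = []
assert run_once(deque(["m1"]), out.append) == 1   # a lone message goes straight through
leftover = deque(range(250))
assert run_once(leftover, out.append) == 100      # a full queue is capped, not waited on
```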
<nemosupremo> how do you know there's another message and not just 1?
<nemosupremo> don't you have to poll a socket?
<SeanTAllen> no
<SeanTAllen> its all async io
<SeanTAllen> the tcp source is async io
<nemosupremo> I might have donkey brains so bear with me
<SeanTAllen> its alerted when there is data
<SeanTAllen> it read to...
<nemosupremo> what I'm currently seeing is every X seconds I get a stream of exactly 500 messages
<SeanTAllen> ok, so, does that async io for the source make sense or should i go into that?
<SeanTAllen> are you using the tcp source or the kafka source?
<nemosupremo> I understand 100% what you are saying theoretically, I'm trying to match that with the output I'm seeing from my application
<nemosupremo> Kafka Source
<nemosupremo> In my stream the first message is
<nemosupremo> 6.519, ('DEVICEID=248170905114908013378', 1519772495.34687)
<nemosupremo> and the last message is 0.052, ('DEVICEID=177170626113313718856', 1519772501.857092)
<nemosupremo> where 6.519 and 0.052 are the number of seconds since that message was decoded
<nemosupremo> To *me* that looks like a message could be decoded, hang around for 5 seconds, before it's ultimately computed and/or encoded
<nemosupremo> and given that a stream has exactly 500 messages, I'm thinking that wallaroo is waiting for 500 messages before it shuffles data through the stream
<SeanTAllen> thats not how it works so that is odd
<SeanTAllen> ill need to dig into the kafka source and consumer though and see if there is anything with that.
<SeanTAllen> and talk to Dipin who has done most of that work.
<SeanTAllen> im going to refer him to this conversation and what you are reporting and get his thoughts nemosupremo
<nemosupremo> ok, FWIW, I don't care that there is a delay, I'm simply trying to understand how the underlying system works
<nemosupremo> I don't even think I should be calling it a delay
<SeanTAllen> yeah that isnt how the underlying system works
<SeanTAllen> if it happening, its not intended and could be a bug.
<SeanTAllen> well
<SeanTAllen> i know it is happening, just not sure why
<SeanTAllen> i trust that you are seeing what you say you are seeing
<SeanTAllen> i dont have an immediate explanation
<SeanTAllen> do your computations take a long time to run? or do your messages take a long time to decode?
<SeanTAllen> for some definition of "long time"
<nemosupremo> I have a to(compute) step that takes like ~50ms, which is all CPU (which is why I consider it long)
<slfritchie> I wonder if there's an interaction between computation -> actor eliding and Machida's use of `--ponythreads=1` that could be the "batching" factor. Multiple computation steps could be computed by a single actor without the message passing & yielding that an unelided pipeline with multiple actors would have.
<SeanTAllen> i agree that could happen slfritchie but 500 is weird, given the internal max for messages to process at a step is 100.
<SeanTAllen> its a single `to` right nemosupremo?
<nemosupremo> thats what I thought as well, given that even though its async, it still has to "defer" to something else to get new messages
<nemosupremo> right
<nemosupremo> source -> to -> sink
<SeanTAllen> o yeah...
<SeanTAllen> hmmm
<slfritchie> 100 input messages x 0.050 seconds total processing time is 5 seconds, which is pretty near 6 seconds. {shrug}
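slfritchie's arithmetic, written out as a toy single-thread model (an assumption for illustration, not Wallaroo's actual scheduler): if N messages are decoded and computed sequentially at t seconds each, and the sink's output for the run becomes visible together at the end, the first-decoded message shows a latency near N*t while the last shows roughly t:

```python
N, t = 100, 0.050                         # batch cap and per-message compute time from the discussion
decode_times = [i * t for i in range(N)]  # the i-th decode happens after i computes
flush_time = N * t                        # all sink output appears after the run
latencies = [flush_time - d for d in decode_times]
assert abs(latencies[0] - 5.0) < 1e-9     # first message: ~5 s, "pretty near 6 seconds"
assert abs(latencies[-1] - t) < 1e-9      # last message: ~0.05 s, matching the observed 0.052
```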
<SeanTAllen> so something it could be, i need to check with jtfmumm when he starts (hes in europe), is that the `to` is being coalesced into the source. but 500 still seems weird.
<SeanTAllen> sorry, "coalesced" is a rather technical internal term nemosupremo
<SeanTAllen> it basically means that because there is no state that the `to` operates on, we turn it into a single step with the preceding step.
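A rough sketch of what coalescing means here (hypothetical step functions; the fused-loop shape is the idea, not Wallaroo's implementation): a stateless `to` computation is fused with the preceding step so both run inside one actor, with no message passing between them:

```python
def decode(raw):
    """Hypothetical decode step."""
    return raw.upper()

def compute(msg):
    """Hypothetical stateless computation."""
    return msg + "!"

def coalesce(*steps):
    """Fuse a chain of stateless steps into a single callable run by one actor."""
    def fused(msg):
        for step in steps:
            msg = step(msg)
        return msg
    return fused

pipeline = coalesce(decode, compute)
assert pipeline("hi") == "HI!"
```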
<nemosupremo> What slfritchie said sounds about right
<SeanTAllen> that might result in what you are seeing. although batch of 500, still seems odd.
<slfritchie> (sorry I used "elided" instead of "coalesced")
<SeanTAllen> if it was 100 at a time that could make sense
<SeanTAllen> @jtfmumm ^ does that seem correct to you?
<SeanTAllen> nemosupremo: i suspect that is what is happening.
<SeanTAllen> high computation times could end up looking like batching.
<SeanTAllen> we should probably open up a way to change that, either allowing a user to set it or basing it on computation time.
<SeanTAllen> given that right now the python uses a single thread, it would be much more noticeable there.
<nemosupremo> If I make my compute function a no-op I see those numbers become .021 and .106 respectively, so I think he's right
<SeanTAllen> ya
<SeanTAllen> so that is the "up to 100"
<SeanTAllen> so theres 100 available so it processes them
<SeanTAllen> which is generally good for cache coherence
<SeanTAllen> particularly with state computations
<SeanTAllen> im off to eat dinner now. slfritchie i leave nemosupremo in your hands.
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]