SeanTAllen changed the topic of #wallaroo to: Welcome! Please check out our Code of Conduct -> https://github.com/WallarooLabs/wallaroo/blob/master/CODE_OF_CONDUCT.md | Public IRC Logs are available at -> https://irclog.whitequark.org/wallaroo
dipin has joined #wallaroo
dipin has quit [Quit: dipin]
moas has joined #wallaroo
dipin has joined #wallaroo
dipin has quit [Quit: dipin]
aturley has quit [Read error: Connection reset by peer]
aturley has joined #wallaroo
dipin has joined #wallaroo
moas has quit [Remote host closed the connection]
moas has joined #wallaroo
moas has quit []
nod has joined #wallaroo
<nod> hi
<nod> quick question (in the exploratory hey this sounds cool phase) - Are there any examples of writing a custom source in Go?
<strmpnk> Hi nod. We're currently working on a feature like this. But if you'd like to hook up your own code externally and don't mind using TCP, you can use TCPSource/TCPSink.
<nod> strmpnk: well, what i'm desperate for is a kinesis source
<nod> so i was going to try to marry the aws-go-sdk kinesis stuff to it
<strmpnk> nod: Cool. I think we can probably help you get something running. I haven't used the aws-go-sdk yet but once you have that part, we can send it over a configured TCPSource.
<strmpnk> The main trick is to frame each message with a 32-bit length header as you send it.
<nod> good to know!
<strmpnk> We definitely want to make this super easy though, and we'll probably write some examples for Kinesis specifically once the new system is ready. In the meantime, the current TCPSource should work to get your application started.
<nod> alright, thx. I'm surfing through docs.wallaroolabs.com right now looking for TcpSource
<nod> my other question - Do you have any way possible to write portions in Go and portions in Python?
<strmpnk> You can check out the Go example apps in the repo too if you want: https://github.com/WallarooLabs/wallaroo/tree/0.4.3/examples/go
<strmpnk> We don't currently support mixing in one application, but you could have one Wallaroo app send its sink to another Wallaroo app's source.
<strmpnk> The main reason is that we allow arbitrary data for state in Wallaroo and there's no way to automatically convert any Go value into a Python value, and vice versa.
<nod> mixing is less a requirement than kinesis at this point, was mainly toying with diff ideas
<strmpnk> Gotcha.
<strmpnk> If we find more motivations for mixing we might explore ways to help build hybrid apps but it'd likely still be like getting two wallaroo apps to communicate rather than switching between one and the other in each computation.
<nod> that makes sense
<nod> i get the data type transformation issues... it's stupidly expensive to do
<nod> and not safe.
<nod> lol the "sources and sinks" link on this page is 404 https://docs.wallaroolabs.com/book/appendix/tcp-decoders-and-encoders.html
<strmpnk> nod Thanks! I'll file an issue and we'll get that fixed.
<strmpnk> Core concepts has a short word on sources and sinks: https://docs.wallaroolabs.com/book/core-concepts/core-concepts.html
<nod> ahh.. ok. I saw that, was hoping for more. Guess I'll be "learning" Pony enough to grok it
<nod> thx
<nod> strmpnk: out of curiosity - do I understand your suggestion correctly? We set up a TCPSource which listens on a port, and we write a secondary app that reads from kinesis and sends the data into wallaroo over that TCP port for processing?
<nod> (i'd do more reading but I also realize it's friday and everyone may disappear soon for a night out)
<strmpnk> Yes.
<strmpnk> nod: I don't mind helping out. So no worries.
<nod> hrmmm.. I need to stew on that. It's going to potentially break the model of checkpointing against the kinesis stream
<strmpnk> This specific area is something we're working on (I am specifically overhauling it) but the general model won't change too much. We'll provide a way to get acks so the source driver (that kinesis program) can appropriately checkpoint.
<strmpnk> Right now with TCP, you'll need a heuristic of your own or a way to signal from the sink that you've made certain progress. We'll provide a library to help make it clear how to work with this rather than just leave a bare TCP protocol.
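One way to turn a "signal from the sink" into a checkpoint decision is sketched below in Go. The driver tags each forwarded record with a hypothetical local id. The sink reports ids back, possibly out of order. The driver then checkpoints only the Kinesis sequence number below which every record has been acknowledged. The tracker type and the id-carrying protocol are assumptions for illustration, since, as strmpnk notes, TCPSource has no built-in ack channel yet. Kinesis checkpoints are per shard, so you would keep one tracker per shard.

```go
package main

import "fmt"

// checkpointTracker maps hypothetical local send ids (0, 1, 2, ...) to the
// Kinesis sequence numbers they came from, and tracks the contiguous prefix
// of ids that the sink has acknowledged.
type checkpointTracker struct {
	next    uint64            // id to assign to the next sent record
	done    uint64            // every id < done has been acknowledged
	acked   map[uint64]bool   // out-of-order acks for ids >= done
	seqByID map[uint64]string // Kinesis sequence number for each pending id
}

func newCheckpointTracker() *checkpointTracker {
	return &checkpointTracker{acked: map[uint64]bool{}, seqByID: map[uint64]string{}}
}

// sent records that a Kinesis record was forwarded and returns its local id.
func (t *checkpointTracker) sent(kinesisSeq string) uint64 {
	id := t.next
	t.next++
	t.seqByID[id] = kinesisSeq
	return id
}

// ack marks id as processed by the sink. If the contiguous acknowledged
// prefix grew, it returns the Kinesis sequence number that is now safe to
// checkpoint and true; otherwise it returns "" and false.
func (t *checkpointTracker) ack(id uint64) (string, bool) {
	if id < t.done || id >= t.next {
		return "", false // duplicate or unknown ack
	}
	t.acked[id] = true
	safe, advanced := "", false
	for t.acked[t.done] {
		safe = t.seqByID[t.done]
		delete(t.acked, t.done)
		delete(t.seqByID, t.done)
		t.done++
		advanced = true
	}
	return safe, advanced
}

func main() {
	t := newCheckpointTracker()
	a := t.sent("seq-100")
	b := t.sent("seq-101")
	c := t.sent("seq-102")

	fmt.Println(t.ack(b)) // out of order: nothing is safe to checkpoint yet
	fmt.Println(t.ack(a)) // a and b are both done: checkpoint seq-101
	fmt.Println(t.ack(c)) // checkpoint seq-102
}
```

Checkpointing only at this contiguous low watermark gives at-least-once delivery. After a crash, the consumer resumes from the last checkpoint and may replay records Wallaroo already processed, so downstream logic should tolerate duplicates.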
<nod> ok.. yeah that's what i'm realizing. it's that signal from the sink that we'd require.
<nod> I need to stew on this.
<nod> would each Wallaroo worker have their own TCPSource port they're listening on? (I apologize for my ignorance of the architecture)
<strmpnk> Right now the sources are handled by the initializer but we're planning on letting the source stream receive partitions and spreading that over many workers.
<nod> this really complicates the conceptual design I was looking at, as I'll have to now have a cluster of kinesis consumers that strictly act as a proxy to wallaroo
<strmpnk> The initializer is the first member of the cluster in this case.
<strmpnk> yeah. In some sense, it depends how much forwarding costs in your case. Wallaroo can handle a lot of throughput and pass it on to the cluster from there. Skipping that hop is planned.
<nod> ok - i need to do some reading/experimenting on this. We're running some _large_ volume kinesis streams
<nod> my goal was to possibly get something up and running over the next week or two as an MVP
<nod> but i don't think the timing is going to work
<strmpnk> Cool. Definitely let us know where you get with this. You're definitely not the only person looking at getting Kinesis wired up, and we're interested in getting these bumps fixed.
<nod> wallaroo needs a kinesis consumer source
<nod> I don't believe anyone at scale is going to want to use a tcp fwd'ing mechanism
<strmpnk> The problem is there is a lot of overhead dealing with inversion of control in consumer libraries.
<nod> we're using the python aws KCL right now as the basis for our cluster and I'd LOVE to get off the dependency we have for java
<nod> I saw you had a Kafka source so got my hopes up :)
<strmpnk> Yeah. We wrote a custom Kafka client to improve performance and it ended up being a lot more work than just binding librdkafka.
<nod> i'd imagine so
<strmpnk> Kinesis might be something we could support natively, but after looking at local-link performance (loopback, not over a physical interface), we found that forwarding over TCP gets pretty good performance.
<nod> we have anywhere from 20-200 nodes consuming our kinesis streams at any given time
<nod> so coordinating kinesis shard distribution across a go app just to fwd over a local tcp link seems less than ideal to me
<strmpnk> What's the average data rate total (message size * message rate estimate)?
<nod> 5-10tb raw data daily, records are largeish json blobs
<nod> (sorry for slow reply, surfing through github code)
<nod> msgs avg'ing around 32k size each
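Here is a rough sizing from nod's figures. It assumes decimal units (1 TB = 10^12 bytes, 32 KB = 32,000 bytes) and traffic spread evenly over the day, so real peaks will be higher than these averages.

```latex
\begin{aligned}
\frac{5\text{--}10\ \text{TB}}{86\,400\ \text{s}} &\approx 58\text{--}116\ \text{MB/s} \quad (\approx 0.46\text{--}0.93\ \text{Gbit/s}) \\
\frac{58\text{--}116\ \text{MB/s}}{32\ \text{KB per message}} &\approx 1\,800\text{--}3\,600\ \text{messages/s}
\end{aligned}
```

Spread over 20 to 200 consumer nodes, that averages roughly 0.3 to 6 MB/s per node. That would be the load on each local TCP hop if every node ran its own forwarder.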
<nod> you'd know better than I, but how difficult would it be to write a pony->aws-sdk-go bridge? (i have ZERO clue on this space)
<strmpnk> cgo has some limitations for integration that can make it a little harder than it should be, but hooking Pony up to Go is very similar to hooking up C.
<strmpnk> The main gotcha is that Pony likes to manage its own threads for scheduling (much like Go).
<strmpnk> So the rendezvous between the two might be a little tricky (could require the overhead of an OS futex).
<nod> hmmm
<nod> i'm sure you totally had great reasons for picking pony :D
<strmpnk> (I'm not an expert on our Go API itself and I know we tested many variants to find the fastest way to call Go code from Pony, but I'm not sure what it would take in the other direction)
<nod> i love what i've seen of wallaroo so far
<strmpnk> Thanks!
<nod> but i'm doing mental gymnastics on getting it hooked up to our actual production system
<strmpnk> I hear you. If you want to jump on a call sometime and walk through this with one of our engineers that's more familiar with the Go integration story, we can set that up.
<nod> appreciate it. I'll come back next week and revisit it.
<nod> I really need to wrestle with Is this the path we want to go down.
<nod> I want to simplify our data pipeline and this is adding moving parts :(
<strmpnk> Cool. Thanks for dropping by and please feel free to ask more questions anytime. We have people out at some conferences today but there'll usually be multiple people ready to answer here.
<nod> xlnt. I hope to scratch out more time this weekend to play, so I may pop in then.
<nod> and now, kiddo time. Have a great weekend! Thanks a ton for your help.
<strmpnk> Enjoy your weekend as well! We'll probably be floating around in here too if you hit anything.
<strmpnk> nod: I'm going to step AFK in a bit. Just in case you need an email, I'm at brian@wallaroolabs.com. My client will stay online so I'll be able to check in here later as well.
<SeanTAllen> nod: we'd love to get someone on a call with you to know what is needed to make things work for you, a lot of it might already be on our roadmap. we definitely prioritize things that folks who are interested in using Wallaroo need.
dipin has quit [Quit: dipin]