#wallaroo on 2018-08-10 — irc logs at freenode.irclog.whitequark.org

2017-09-30 09:40 SeanTAllen changed the topic of #wallaroo to: Welcome! Please check out our Code of Conduct -> https://github.com/WallarooLabs/wallaroo/blob/master/CODE_OF_CONDUCT.md | Public IRC Logs are available at -> https://irclog.whitequark.org/wallaroo

00:22 moas has joined #wallaroo

00:26 moas has quit [Ping timeout: 240 seconds]

02:23 moas has joined #wallaroo

02:27 moas has quit [Ping timeout: 248 seconds]

04:23 moas has joined #wallaroo

04:28 moas has quit [Ping timeout: 256 seconds]

04:40 <cajually> SeanTAllen: Sorry I lost my internet connection for the rest of the day, I realized it was just print out that were handled. The scripts seem to manage pipefail etc correctly. I'm working on a quick and dirty functioning machida with python3.5, currently working out some strings/unicode differences.

04:41 <cajually> I'm the asshole behind #2334 btw and I have a similar amount of questions around the state management :)

05:57 <cajually> anyway what is the defacto preferred medium for wallaroo discussion?

06:24 moas has joined #wallaroo

06:29 moas has quit [Ping timeout: 248 seconds]

07:49 moas has joined #wallaroo

07:53 moas has quit [Ping timeout: 248 seconds]

08:50 moas has joined #wallaroo

09:30 <SeanTAllen> cajually: I think that depends. For freeform discussions, either IRC or the mailing list. If you have something specific and actionable, GH issues are best.

09:34 <SeanTAllen> Given the difference in timezones, I think mailing list would probably work out better than IRC cajually

10:01 moas has quit [Remote host closed the connection]

10:11 <cajually> Yeah was thinking for less actionable stuff and preferably somwhere that doesn't give 40~ people email from github everything you do write something

10:14 <cajually> Mailinglist sounds about right. Anyway I've made some progress with my python3 port, porting the pony side of things have been slower as I don't know the language. I'll probably make some helper functions for dealing with python bytes() as the old PyString stuff that is used for buffers currently doesn't quite match the new ways

10:16 moas has joined #wallaroo

10:18 <cajually> for python3 we have the option of targeting the Unicode interface or the Buffer interface instead, sometimes the Unicode one is the correct way but most the situations Buffer is much better. Neither of them have simple size operations, for the Unicode case serialization has to happen and then we can know the size but the reason we try to figure out the size ahead of time is to allocate the correct

10:18 <cajually> amount of memory it seems. And the buffer protocol requires different things to figure out the size and potentially dealing with non contigious buffers. All in all I'm not trying to make the best implementation, just one that lets me find the issues and port the tests

10:20 <cajually> I'll be traveling until tuesday and probably will have no more time from now to then btw

10:20 moas has quit [Ping timeout: 240 seconds]

10:29 <cajually> Typing that out I realized that there should be a PyBytes type I could have used that should map very nicely. Disregard most of that stuff I guess

10:34 rblasucci has joined #wallaroo

10:56 moas has joined #wallaroo

11:01 moas has quit [Ping timeout: 268 seconds]

11:19 <SeanTAllen> cajually: will do. For the Pony stuff, we can definitely help you out there. What timezone are you in? We might be able to work out some pairing from time to time to assist if that would be something that interests you.

11:24 moas has joined #wallaroo

14:38 <cajually> I'll be in GMT +8 (china, taiwan,singapore etc), that sounds interesting. I'm currently waiting for a working visa process an have unknown amount of full time available. Regardless I really like the project and would love to be able to contribute something that can help traction. And I think that the python3 support is a perfect start, I've been looking for something like Wallaroo to do real time data

14:38 <cajually> processing for ML setups, a world that severly lacks open source tooling

14:39 <cajually> I think Wallaroo is closest to having the IMO perfect approach to allow for this

14:41 <cajually> I don't think python is going anywhere, I don't think that datascientists have the time to care for JVM languages, Flink isn't amazing Spark has too high latency. The others are plentyful but too small and for good reason. In the end doing in memory styff correctly and distributed while not confined to the JVM i the way forward.

14:44 <cajually> ... I realize I have like 10 more paragraphs for why I think Wallaroo is the project in its space(and adjacent) that I'm putting my money(read time) on and I should just make a completely independent blog post on it

15:38 <SeanTAllen> cajually: i shared what you said with the team. it's definitely nice to hear from folks who get what we are doing. thank you.

15:38 <SeanTAllen> let me know how you would like to get assistance with the pony portions of what you are doing and i'll get you help.

15:44 <cajually> Happy to hear that! I know what external validation can mean, having worked on very technical products in what was probably the smallest team possible for the task.

15:46 <cajually> Regarding Pony, I fail to find a language specification, is there one? The very soft documentation in form of tutorial and some stdlib spec doesn't really provide a direct path to understanding how stacks works, the builtin types with boxing etc

15:52 <aturley> cajually there's no spec at that level right now.

15:54 <aturley> folks in the irc channel and mailing list are pretty happy to answer those kinds of questions, but at this point there's not a good single repository for those pieces of knowledge.

15:54 <aturley> i mean, other than reading the compiler and runtime code.

15:54 <aturley> (it is pretty readable code, but maybe not the fastest way to get questinos answered)

15:55 <SeanTAllen> Following on what aturley said, I'm happy to answer Pony language questions either here or #ponylang channel cajually.

15:58 <cajually> that's great to know. I've wanted a language like pony for a long time tbh, just very hard to google things with pony in the query and get good results currently..

16:00 <cajually> I'll see if I can find a reasonable workflow reading the compiler source but often it's very hard to trace the details through lexing and code generation

16:00 <cajually> I'll probably have questions

16:01 <aturley> yeah, agreed. you're probably better off asking in IRC if you want to know something specific.

16:01 <cajually> I was part of a team making something like pony long time ago, https://github.com/hnsl/librcd, there under a different github account that I've lost access to

16:02 <cajually> in the end that company died from NIHS

16:04 <cajually> (and competitors having better product-market fit

16:05 moas has quit []

16:10 <SeanTAllen> cajually: librcd looks interesting

16:12 <cajually> I noticed that wallaroo uses libgold, does it provide a large performance boost for pony, maybe in lieu of more language specific optimisation

16:13 <cajually> librcd was honstly a lot of fun, we experimented with strange ways to do concurrency and memory management

16:14 <cajually> in the end the way we did memory management was very expensive. Didn't stop us from building a massive stack on top of it tho

16:15 <SeanTAllen> do you mean the gold linker cajually ?

16:16 <cajually> yeah

16:16 <cajually> the link time optimisation

16:16 <SeanTAllen> there can be some not negligible improvements for some code.

16:17 <cajually> I can imagine that is a fantastic band aid if inlining is not done properly

16:17 <SeanTAllen> like many things optimization its a "it depends" sort of answer

16:17 <cajually> yeah of course

16:17 <SeanTAllen> so far, we havent found any cases of link time optimization introducing bugs

16:17 <SeanTAllen> so its come a long ways from when it was introduced and was usually more of a way to create bugs

16:17 <SeanTAllen> you can turn LTO on and off with Pony.

16:18 <slfritchie> cajually: My TODO-soon list includes adding a bit to the Gotchas section of the Pony language docs to summarize stack use. I'd shot myself in the foot, overrunning the Pthreads stack and then having some very odd actor behavior and SIGSEGV crashes result. (Silly me.)

16:18 <cajually> Last time I used it, a long time ago, it could not be used with gcc 03 at all

16:18 <cajually> happy to see that it is used with 03

16:19 <cajually> I can imagine that there is a lot of stack gotchas because I read it as C yet it is actor model

16:19 <SeanTAllen> Pony was the first time I used LTO and didn't have it blow up on me. I'd avoided it for a while.

16:19 <cajually> another thing I have not yet

16:20 <cajually> gah, my ssh is killing me, 300ms from where I am. yet... understood is how the pony calling convention works

16:21 <cajually> is it a pure C stack based , no TCO unless lucky situation?

16:21 nisanharamati has joined #wallaroo

16:21 <SeanTAllen> TCO is an optimization only in Pony

16:22 <SeanTAllen> There's no Pony specific support for TCO outside of the optimizations that LLVM can do.

16:23 <cajually> figured, the only alternative that made sense was if the the erlang or haskell calling convention was implemented

16:23 <cajually> but CFFI looks too smooth for that to be true

16:23 <slfritchie> Yes. Ignoring optimizations, each behavior or regular function call use the stack in the same way that C does. Instead of `main` at the bottom of the stack, a Pony scheduler Pthread has a variable number of frames related to scheduling, then mostly 1-1 mapping of the behavior's function calls on the stack. The consequences for overrunning the Pthreads stack size are identical for plain old C/C++/etc.

16:23 <cajually> o7 nisanharamati

16:23 <nisanharamati> hiya

16:24 <cajually> that sounds similar to what we did in librcs

16:24 <cajually> librcd*

16:25 <cajually> are there a lot of TLA traps associated to actors?

16:28 <SeanTAllen> cajually: there are some areas where C-FFI needs to be improved, some things aren't possible but the straightforward stuff is straightforward

16:29 <SeanTAllen> im not familiar with the term TLA trap cajually. Trap, yes, not TLA trap.

16:30 <slfritchie> TLA trap? (Sorry, Google is distracted by music & not compilers)

16:31 <cajually> say that you use some C library that wants to allocate heap memory, it calls it's own malloc it is has pulled in, that could cause issues with pony if an memory allocation is doen with as thread local allocations

16:32 <SeanTAllen> cajually: it shouldn't. depending on how you define "cause issues". thread locals dont play well with pony because no actor is guaranteed to always be run by the same scheduler thread.

16:33 <cajually> ok that is good to know

16:33 <SeanTAllen> the allocating of memory itself should be an issue

16:34 <SeanTAllen> * shouldn't

16:35 <cajually> scheduling of actors/ fibers can often be made a lot faster by making all their memory thread local and allowing for that move to be expensive in case of work stealing

16:35 <SeanTAllen> man, i had to read that 5 times to realize it said should instead of shouldn't

16:35 <cajually> haha

16:35 <SeanTAllen> Pony has a pool allocator that all actor memory comes from

16:36 <SeanTAllen> If you use "new" in some form, it comes from the pool

16:36 <SeanTAllen> With the usual heap/stack constraints

16:36 <cajually> interesting, has this been a considered a performance bottle neck?

16:36 <SeanTAllen> There's an optimization pass that will move heap allocations to the stack where possible

16:37 <SeanTAllen> cajually: nope.

16:37 <cajually> I remember having issues that we ended up effectively spending a lot of time pushing data between cores

16:37 <SeanTAllen> Pony tries to keep actors on the same core, but, in the end, that is basically magic.

16:37 <cajually> because the scheduler kept moving our threads, then we pinned the threads with cgroups

16:38 <SeanTAllen> So, pony supports pinning threads to a core.

16:38 <SeanTAllen> There's an option you can pass at runtime

16:38 <cajually> and then we ended up moving to TLA

16:38 <SeanTAllen> well, sorry

16:38 <SeanTAllen> it will pin scheduler threads by default

16:39 <SeanTAllen> you have to ask for them to not be pinned

16:39 <SeanTAllen> you have to ask for the thread that handles asio events to be pinned

16:39 <cajually> hah

16:39 <SeanTAllen> by default the runtime will start 1 scheduler thread per core and pin to it

16:39 <cajually> this is almost identical to what we did

16:39 <cajually> by trial and error

16:39 <SeanTAllen> you can use cgroup and the --ponythreads=X option to only use some cpus

16:40 <SeanTAllen> and to devote them solely to your app (this is best practice and discussed in the Pony performance notes on the website)

16:40 <SeanTAllen> https://www.ponylang.org/reference/pony-performance-cheatsheet/

16:41 <cajually> I'll check that out, anyway I feel like we could go on for hours talking about implementation details

16:41 <cajually> and I think it's better I do some reading now

16:42 <cajually> close to 1 AM here:)

16:44 <SeanTAllen> Enjoy the rest of your night cajually

16:45 <cajually> SeanTAllen: thanks for your patience and good night! I got some level of python3 running, currently going through the python module and the unit tests. Once I'm done I'm sure we will have a great conversation of how it should actually work and how to preceed etc

16:46 <SeanTAllen> Awesome

17:07 <nisanharamati> 👍

20:32 rblasucci has quit [Quit: Connection closed for inactivity]

21:01 nisanharamati has quit [Quit: Connection closed for inactivity]

22:19 puzza007 has quit [Quit: ZNC 1.8.x-nightly-20180801-e2a96470 - https://znc.in]

22:20 puzza007 has joined #wallaroo