cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | the secret reason for us trying to get PyPy users: to test the JIT well enough that we're somewhat confident about it
<R3d_Sky>
I've been testing out a silly function as a microbenchmark on PyPy, and it appears to be 100x slower than the same test run on CPython with perf
<R3d_Sky>
is pypy just not very good at handling small functions run via perf? because when I add some extra code to make the function actually do stuff, pypy beats cpython by 50% in my perf tests
adamholmberg has quit [Ping timeout: 240 seconds]
<simpson>
PyPy's designed for real workloads rather than microbenchmarks.
inad922 has quit [Ping timeout: 240 seconds]
<mattip>
lesshaste: you might want to take a look at cppyy
<mattip>
if you want to use cffi, you will have to create wrappers for class methods, and pass a void* for the class instance as an argument
<mattip>
and there's no dynamic dispatch: you need a separate wrapped function for each specialization
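A minimal sketch of the wrapper approach described above, using cffi's API mode and compiling the generated source as C++; the Counter class, the wrapper functions, and the _counter_cffi module name are all hypothetical:

```python
# build_counter.py -- hypothetical example of wrapping a C++ class for cffi:
# every method gets a plain-C wrapper that takes the instance as a void*.
import cffi

ffibuilder = cffi.FFI()

# The only things cffi sees: free functions operating on an opaque handle.
ffibuilder.cdef("""
    void *counter_new(int start);
    void counter_delete(void *self);
    int counter_increment(void *self);
""")

# The wrappers themselves, compiled as C++ (hence source_extension=".cpp").
ffibuilder.set_source("_counter_cffi", r"""
    class Counter {
        int value;
    public:
        Counter(int start) : value(start) {}
        int increment() { return ++value; }
    };

    extern "C" {
        void *counter_new(int start)        { return new Counter(start); }
        void  counter_delete(void *self)    { delete static_cast<Counter *>(self); }
        int   counter_increment(void *self) { return static_cast<Counter *>(self)->increment(); }
    }
""", source_extension=".cpp")

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)
```

On the Python side, lib.counter_new(...) returns the opaque void * handle that gets passed back into each wrapper; every C++ specialization needs its own set of such wrappers, since there is no dynamic dispatch across the C boundary.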
<the_drow>
Hi guys, I tested PyPy3.5-6.0.0 with the new cpyext improvements. It seems that python-rapidjson does not show an improvement compared to 3.5-5.10.1. I'll write a blog post about this later, so I want to find out exactly why that's not the case.
<the_drow>
If anyone wants to help me figure this out, I'd love some help. And you'll get the credit of course.
<antocuni>
the_drow: it's entirely possible that certain libs do not show any cpyext speedup
<the_drow>
I'm fully aware
<the_drow>
But I want to provide others with a sense of what these speedups do and what they don't
<antocuni>
the CPython C API is huge, and consists of hundreds of functions/macros; what we basically did was to find a general way to make them faster, and to implement this speedup for a handful of cases
<the_drow>
The only thing that this speedup improves is calling Python functions written in C, right?
<antocuni>
yes, the cost of function (and method) calls is vastly reduced
<antocuni>
however, once you are in C, if you do tons of API calls, it's possible that you spend a lot of time there
<antocuni>
it's a bit hard to say; what we usually do is to run a benchmark (or, even better, a microbenchmark) using callgrind, and see where we spend most of the time
the_drow_ has joined #pypy
<the_drow_>
antocuni, sorry, I was disconnected. python-rapidjson spends most of its time building Python objects
<antocuni>
yes exactly; this is one of the things that are still not fully optimized
<the_drow_>
That's probably why the speedup is not significant in this case
<the_drow_>
I wish PyPy had used RapidJSON directly.
<antocuni>
yes, I guess so
<the_drow_>
Instead of using a custom implementation
<antocuni>
I suppose we are willing to consider a PR :)
<the_drow_>
But that would be very hard to do with RPython, and it would introduce C++ as a dependency
<the_drow_>
I thought of that
<the_drow_>
But I'm not sure adding C++ as a build dependency is a good idea
<antocuni>
probably not
the_drow has quit [Ping timeout: 240 seconds]
<antocuni>
also, what is the performance of rapidjson compared to e.g. ujson?
<the_drow_>
much much faster
<the_drow_>
I'll send you a link to the benchmark
<the_drow_>
and then you have the problem that you can't use CFFI, because that requires hopping between Python and C++.
<antocuni>
you can try cppyy
<the_drow_>
cppyy doesn't work in this case because rapidjson is too modern
<the_drow_>
I was just saying ;)
<antocuni>
ah
<the_drow_>
antocuni, this was the trigger to tell Wim to upgrade cling and do all that refactoring he did
<antocuni>
anyway, I doubt that cpyext will ever be fast enough to make python-rapidjson faster than an RPython or RPython+C implementation
<the_drow_>
Is PyPy's build system ready for such a thing?
<antocuni>
yes, you can select whether or not to compile a module when translating
<the_drow_>
how would such an implementation look in RPython?
<antocuni>
the hard part is that we don't have a good way to call c++ from RPython
<antocuni>
so you probably need to expose a C API first
<antocuni>
and then call this C API with rffi
<the_drow_>
antocuni, I tried using CFFI, but because the GIL is released all the time, that doesn't work
<antocuni>
rpython uses rffi, which is conceptually similar to cffi but implemented very differently
<the_drow_>
There's an interface for RapidJSON that calls back when it encounters an object, a number, a string etc.
<the_drow_>
So if I implement something with cffi I can port it pretty easily?
<antocuni>
look e.g. at the implementation of pypy/module/_ssl
<antocuni>
it wraps openssl
<antocuni>
the rffi bindings are inside rpython.rlib.ropenssl
<antocuni>
the_drow_: you are getting confused
<antocuni>
you can write a cffi module if you want
<antocuni>
this module would work on cpython and pypy, you can put it on pypi and install it with pip
<the_drow_>
antocuni, but that doesn't work, because CFFI releases the GIL and we hop back and forth between Python and C++ a lot
<the_drow_>
antocuni, I'm fully aware of that fact.
<antocuni>
ok, then forget about cffi
<antocuni>
if you want to write an rpython module, then it's pypy-only, needs to live inside pypy/module and can be compiled only when you translate the full pypy
<the_drow_>
So far so good
<antocuni>
for calling C inside rpython, you use rffi
<the_drow_>
good
<the_drow_>
And it's the same API as CFFI?
<antocuni>
no
<antocuni>
look at _ssl
asmeurer_ has quit [Quit: asmeurer_]
<the_drow_>
How is libssl_SSL_new defined, for example?
<the_drow_>
and can I include some C code for the wrapper?
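For reference, the ropenssl bindings mentioned above boil down to rffi.llexternal declarations plus an ExternalCompilationInfo that names the headers and libraries. The following is a simplified sketch in that style, not the literal PyPy source:

```python
# Simplified sketch in the style of rpython.rlib.ropenssl; details differ
# from the real file.
from rpython.rtyper.lltypesystem import rffi
from rpython.translator.tool.cbuild import ExternalCompilationInfo

eci = ExternalCompilationInfo(
    includes=['openssl/ssl.h'],
    libraries=['ssl', 'crypto'],
)

# Opaque pointer types for C structs that RPython never looks inside.
SSL_CTX = rffi.COpaquePtr('SSL_CTX', compilation_info=eci)
SSL = rffi.COpaquePtr('SSL', compilation_info=eci)

# The binding itself: C name, argument types, result type, and the ECI that
# tells the C compiler where the declaration comes from.
libssl_SSL_new = rffi.llexternal('SSL_new', [SSL_CTX], SSL,
                                 compilation_info=eci)
```

Custom C snippets can also be fed in through ExternalCompilationInfo (for example via its separate_module_sources argument), which comes up again further down.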
antocuni has quit [Ping timeout: 276 seconds]
<the_drow_>
arigato, ping?
<the_drow_>
I actually think there's an easier patch here
<the_drow_>
But I need to use SIMD from RPython if possible
<arigato>
the_drow_: we recommend against using rffi. better to try cffi and complain if you have specific performance benchmarks
<arigato>
e.g. yes, it releases the gil and reacquires it all the time; that's exactly why this has been heavily optimized already, and it shouldn't cost a lot now
amaury has quit [Remote host closed the connection]
<arigato>
and yes, cffi is not really meant to work with a huge number of Python objects handled by the C side, but you can often reorganize things
<the_drow_>
arigato, we already tried CFFI with hiredis (remember that one?) and rapidjson. It is impossible to reach CPython's level of performance with CFFI for parsers.
<the_drow_>
arigato, Maybe if CFFI had a way to avoid releasing the GIL, we could have made it work...
<the_drow_>
arigato, I think the main problem is not the GIL but the fact that we have to use callbacks.
<arigato>
cffi and rffi are very, very different beasts for a similar purpose
<arigato>
rffi is only available when you translate a new pypy from scratch
<arigato>
while it is certainly possible to make something in rffi, it requires rpython knowledge and it won't be available in standard pypy builds
<arigato>
here's one way I'd consider trying, using cffi; it should work on pypy or cpython, definitely slow on cpython but really fast on pypy:
<arigato>
you write C code whose purpose is to parse the text and write a "parsed representation" in a big buffer
<arigato>
the parsed representation is basically defined with a bunch of C structures, repeated as needed, with maybe some pointers or maybe not (i.e. all inline as binary data)
<arigato>
so once the text is parsed, you have this big buffer (only one call to C); then the Python side can read it using ffi pointers
<arigato>
the trick here is that on the Python side, you write classes that are implemented with just one pointer each, and that lazily build more such classes when you access more items
<arigato>
so for example, say you want to parse some text that is a list of 2d points, "[(x=5.6, y=7), (x=7, y=-2)]"
<arigato>
you do that with C code that emits a big array of "struct point { float x, y; }", say
<arigato>
then on the Python side, you make two classes: ListOfPoints and Point
<arigato>
ListOfPoints has got a __getitem__ which checks the bounds and returns a fresh Point
<arigato>
the Point class is simple enough that __getitem__ could just read the x and y values from the C data and stick them on the Point instance
<arigato>
if Point were more complicated, __getitem__ would compute a ffi pointer to the item, and pass that to the Point instance
<arigato>
and then the Point instance itself would have properties x and y that would read the x and y values from the ffi pointer
<arigato>
you do that without caching the Point instances
<arigato>
it looks indirect, but PyPy is very good at optimizing that; it will make Point instances temporarily and free them as soon as they are no longer needed, or the JIT can completely remove the creation of the Point instances if it sees that they don't survive for long
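A minimal sketch of the Python half of this scheme, assuming a hypothetical out-of-line cffi module _points whose C side defines struct point { float x, y; } and a parse_points() function that fills a buffer of them (both names are invented for the example):

```python
# Hypothetical sketch: lazy wrapper classes over a cffi buffer filled by C.
from _points import ffi, lib   # invented cffi module; its C side defines
                               # "struct point { float x, y; };" and a parser

class ListOfPoints(object):
    def __init__(self, buf, length):
        self._buf = buf          # cdata "struct point[]"; kept alive here
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        # A fresh, uncached Point per access: cheap on PyPy, where the JIT
        # can remove the allocation entirely if the instance does not escape.
        return Point(self._buf + index)

class Point(object):
    def __init__(self, ptr):
        self._ptr = ptr          # ffi pointer to one "struct point"

    @property
    def x(self):
        return self._ptr.x

    @property
    def y(self):
        return self._ptr.y

def parse(text, max_points=4096):
    # Single call into C: the hypothetical lib.parse_points fills the buffer
    # and returns how many points it wrote.
    buf = ffi.new("struct point[]", max_points)
    n = lib.parse_points(text.encode('utf-8'), buf, max_points)
    return ListOfPoints(buf, n)
```

The Point instances are deliberately not cached, as described above; the same code runs on CPython too, it just pays the full allocation cost there.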
<the_drow_>
arigato, What I want to do is to port the SIMD code that skips whitespaces from RapidJSON to PyPy
<the_drow_>
There's no need to actually bind RapidJSON into PyPy
<arigato>
ok, then I completely misunderstood you, sorry :-)
<the_drow_>
I only wanted RapidJSON because the code is already there
<arigato>
maybe look at rpython.rlib.longlong2float.uint2singlefloat for an example
<the_drow_>
However, there are a few things here I need to figure out: 1) I have not seen a single line of C code in the PyPy code base, so I'm wondering if I should write the headers in C and use them with rffi. 2) I'm not sure if we really need C code. If we could simply import the right SIMD headers and use rffi to call the intrinsics, that would be better, no?
<the_drow_>
Oh I can use C code like that
<the_drow_>
Ok
<the_drow_>
Got it
<the_drow_>
arigato, is there a ptrdiff_t in rlib?
<the_drow_>
RapidJSON uses pointer arithmetic while we use an index
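A hedged sketch of how the rffi side of that idea could look, with a plain scalar C helper standing in for RapidJSON's SIMD whitespace skipper; the helper name, its signature, and the use of indices instead of pointer arithmetic (which sidesteps the ptrdiff_t question) are all choices made up for this example:

```python
# Hypothetical sketch, not actual PyPy code: exposing a small C helper to
# RPython via rffi.llexternal and ExternalCompilationInfo.
from rpython.rtyper.lltypesystem import rffi
from rpython.translator.tool.cbuild import ExternalCompilationInfo

eci = ExternalCompilationInfo(
    post_include_bits=[
        'long pypy__skip_whitespace(const char *s, long i, long length);'],
    separate_module_sources=[r"""
        /* Scalar stand-in; a real port of RapidJSON's skipper would use
           SSE2/SSE4.2 intrinsics here instead of a byte-at-a-time loop. */
        long pypy__skip_whitespace(const char *s, long i, long length)
        {
            while (i < length && (s[i] == ' ' || s[i] == '\t' ||
                                  s[i] == '\n' || s[i] == '\r'))
                i++;
            return i;
        }
    """],
)

# Invented binding name; it works on an index into the string rather than
# on raw pointers.
c_skip_whitespace = rffi.llexternal(
    'pypy__skip_whitespace',
    [rffi.CCHARP, rffi.LONG, rffi.LONG], rffi.LONG,
    compilation_info=eci)
```

Calling it from RPython would still need something like rffi.get_nonmovingbuffer to hand the string data to C.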