forgottenone has quit [Quit: Konversation terminated!]
antocuni has quit [Ping timeout: 245 seconds]
tsaka__ has joined #pypy
Waynes has joined #pypy
moei has joined #pypy
themsay has quit [Ping timeout: 250 seconds]
themsay has joined #pypy
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
<Waynes>
I've implemented a 9-by-9 boxfilter on a 512x512 image which runs in 0.49 milliseconds in C, but takes 0.76 milliseconds when called through cffi. The foreign function call overhead is just 2 microseconds, so where are the other 0.2 milliseconds coming from?
adamholmberg has joined #pypy
<Waynes>
I've also benchmarked memcpy and there's almost no difference between C and cffi
adamholmberg has quit [Ping timeout: 255 seconds]
<LarstiQ>
Waynes: is this code available to reproduce for others? Is that on cpython or pypy?
forgottenone has joined #pypy
xcm has quit [Remote host closed the connection]
<johnjay>
Waynes: what tools do you use for that benchmarking?
<Waynes>
LarstiQ: It's ~300 lines, so probably a bit too much to ask to review. I'll try condensing it to something with the same performance characteristics, but I thought I should ask here first in case it's something that has been observed before. I'm using cpython.
xcm has joined #pypy
<Waynes>
johnjay: time.perf_counter() for python and clock_gettime(CLOCK_MONOTONIC,...) in C
<johnjay>
ah
<mattip>
Waynes: cffi on cpython uses a backend, which needs to box and unbox the arguments. On pypy that all is eliminated by the JIT.
Frankablu has joined #pypy
adamholmberg has joined #pypy
adamholmberg has quit [Remote host closed the connection]
<Frankablu>
Hi, I'm trying to use pikaclient under pypy under windows. I've had to edit the source code so it handles errno.WSAEWOULDBLOCK as opposed to errno.EWOULDBLOCK. I'm not sure if this is a pypy bug or a pika bug so I'm not sure where to raise the issue. Any advice?
<Waynes>
mattip: I measured the function call overhead by deleting the C function body. The function call then only took 2 microseconds, which does not account for the over 200 microseconds difference.
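The measurement described above (time the call with the function body removed, then compare against the full call) can be sketched in pure Python with time.perf_counter(). This is only an illustration: `time_call`, `empty`, and `work` are hypothetical names, and `work` is a stand-in for the real filter, not the cffi function from this discussion.

```python
import time


def time_call(func, *args, repeats=1000):
    """Return the fastest observed wall-clock time of func(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def empty(data):
    """Stand-in for the function with its body deleted (call overhead only)."""
    pass


def work(data):
    """Hypothetical stand-in for the real computation."""
    return sum(data)


data = list(range(10_000))
overhead = time_call(empty, data)
total = time_call(work, data)
print(f"call overhead: {overhead * 1e6:.2f} us, full call: {total * 1e6:.2f} us")
```

Taking the minimum over many repeats reduces the influence of scheduling noise; whatever remains after subtracting the overhead has to come from the function body or from the data it operates on.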
<Waynes>
mattip: but I should probably also try pypy, maybe it makes a difference anyway
themsay has quit [Ping timeout: 250 seconds]
<Frankablu>
Waynes, I've just joined, what's the question?
<mattip>
Waynes: strange. Perhaps the compiler optimized the call away entirely? What happens if you add a single increment statement in the function?
<mattip>
Frankablu: you can see logs at the link in the IRC topic
themsay has joined #pypy
<Waynes>
Frankablu: I've implemented a boxfilter which runs in 0.76 milliseconds through cffi, but in just 0.49 milliseconds in C. I'm currently trying to track down where the difference is coming from
<mattip>
Frankablu: what version of pypy? AFAICT even pypy 5.7.1 released March 31,2017 has errno.WSAEWOULDBLOCK
<mattip>
(which has the same value as errno.EWOULDBLOCK - 10035)
<Waynes>
mattip: 10 microseconds, so that's not it either
ReflectedImage_ has joined #pypy
Frankablu has quit [Read error: Connection reset by peer]
ReflectedImage_ has left #pypy [#pypy]
Frankablu has joined #pypy
<mattip>
Waynes: so try adding in your function implementation one piece at a time, since something in it is slower under cffi than under C
<Frankablu>
mattip, PyPy 7.0.0, the problem is the library is expecting EWOULDBLOCK rather than WSAEWOULDBLOCK
<Waynes>
mattip: yes, currently doing that
<mattip>
Frankablu: ahh, misunderstanding. Can you link to the code?
<mattip>
using -fvisibility=hidden seems like the best suggestion (and should be on by default anyway for shared objects you produce)
<mattip>
it forces a cleaner API
themsay has quit [Ping timeout: 250 seconds]
<Waynes>
I've only got a single global function, but I've got two local nested functions. I'll clean that up and see if it makes a difference.
themsay has joined #pypy
<mattip>
Frankablu (for the logs): errno.EWOULDBLOCK and errno.WSAEWOULDBLOCK are the same. Where does substituting one for the other make a difference?
<Waynes>
mattip: is errno.WSAEWOULDBLOCK not defined as 10035?
<mattip>
AFAICT both are 10035
<Waynes>
I don't have a windows machine right now, but I think EWOULDBLOCK might be 11
<mattip>
I checked on windows
<mattip>
but not pypy 7.0
<Waynes>
should pypy codes match the C ones?
<mattip>
yes. We take them from the C headers. On cpython windows 3.7.1 they both are 10035 too.
<Waynes>
I'll boot up my old windows machine and check
<Waynes>
I get 140 for EWOULDBLOCK and 10035 for WSAEWOULDBLOCK with gcc on windows 7
<mattip>
hmm. Do you have a cpython or pypy on that machine?
<Waynes>
cpython, what should I test?
<Waynes>
I also just checked with cl, same numbers
<Waynes>
huh, the guy who had the question already left
ReflectedImage__ has joined #pypy
ReflectedImage_ has quit [Read error: Connection reset by peer]
<Waynes>
oh, right, I knew there was some problem with that machine
<Waynes>
it has no internet
<Waynes>
sorry, can't install anything
<mattip>
ok, thanks anyway
<Waynes>
well then back to -fPIC
ReflectedImage__ has left #pypy [#pypy]
Frankablu has joined #pypy
<Frankablu>
mattip, Sorry was out of the room, also my internet connection is a bit unstable at the moment for some reason
forgottenone has quit [Quit: Konversation terminated!]
<mattip>
pypy2-v7.0.0 - 10035. So it must be a pypy3 problem
<mattip>
maybe visual 2008 (pypy2) has a different value
<mattip>
yup, vs2008 only defines WSAEWOULDBLOCK, so python copies the value to EWOULDBLOCK, then when cpython moved on they preserved the old behaviour
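Since the two constants can differ depending on interpreter and compiler, a library can avoid the problem by accepting every "would block" code the platform defines. A minimal sketch, assuming the caller has an OSError in hand; `WOULD_BLOCK_CODES` and `is_would_block` are hypothetical names and not part of pika.

```python
import errno

# Collect every "operation would block" code this platform defines.
# errno.WSAEWOULDBLOCK exists only on Windows, so look it up defensively.
WOULD_BLOCK_CODES = frozenset(
    code
    for code in (
        getattr(errno, name, None)
        for name in ("EAGAIN", "EWOULDBLOCK", "WSAEWOULDBLOCK")
    )
    if code is not None
)


def is_would_block(exc):
    """Return True if exc is an OSError whose errno means 'try again later'."""
    return isinstance(exc, OSError) and exc.errno in WOULD_BLOCK_CODES
```

Checking membership in a set keeps the caller independent of whether EWOULDBLOCK and WSAEWOULDBLOCK happen to share a value on a given build.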
<Frankablu>
It's when the sending buffer gets filled up
<Frankablu>
Basically I'm porting a genetic algorithm from pypy2 to pypy3, on pypy2 it uses multiprocessing but pypy3 windows doesn't have that implemented, so I've swapped multiprocessing.queue for a RabbitMQ server with a pikaclient. My code's working, it's just I'd rather get a fix upstream
<mattip>
ok, thanks.
ReflectedImage_ has joined #pypy
Frankablu has quit [Read error: Connection reset by peer]
ReflectedImage_ has left #pypy [#pypy]
Frankablu has joined #pypy
Frankablu has quit [Client Quit]
PileOfDirt has joined #pypy
themsay has quit [Ping timeout: 255 seconds]
jcea has quit [Ping timeout: 268 seconds]
jcea has joined #pypy
themsay has joined #pypy
Frankablu has joined #pypy
ReflectedImage_ has joined #pypy
Frankablu has quit [Read error: Connection reset by peer]
ReflectedImage_ has quit [Quit: Leaving]
<Waynes>
mattip: turns out I was wrong, it wasn't the compiler flags after all
<Waynes>
mattip: the C code was working on an uninitialized array while the python code worked on an initialized one
<Waynes>
mattip: apparently copying an uninitialized array is twice as fast as copying an initialized one
<Waynes>
the current working theory on ##C is that the uninitialized memory is all mapped to the same page by the kernel, so all of it is in cache and reading it is super fast
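One rough way to probe that theory from Python is to time a full read of an anonymous mapping that was never written against one whose pages were all written once. This is a sketch under assumptions: the shared zero-page behaviour is an OS detail (typical on Linux, not guaranteed elsewhere), `SIZE` and `best_copy_time` are hypothetical, and the timings will vary from machine to machine.

```python
import mmap
import time

SIZE = 16 * 1024 * 1024  # 16 MiB, chosen arbitrarily for illustration


def best_copy_time(src, repeats=5):
    """Return the fastest of several timed full reads of src, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        data = src[:]  # copies the whole mapping into a new bytes object
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Anonymous mapping that is never written: its pages may all be backed by
# one shared zero page until first write (an OS-level assumption).
untouched = mmap.mmap(-1, SIZE)

# Same size, but every page is written once so each gets its own memory.
touched = mmap.mmap(-1, SIZE)
touched.write(b"\x01" * SIZE)
touched.seek(0)

t_untouched = best_copy_time(untouched)
t_touched = best_copy_time(touched)
print(f"untouched: {t_untouched * 1e3:.3f} ms, touched: {t_touched * 1e3:.3f} ms")
```

A gap between the two numbers would be consistent with the theory but does not prove it, since allocating the destination buffer also contributes to each measurement.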
darkman66 has quit [Read error: Connection reset by peer]
darkman66 has joined #pypy
darkman66 has quit [Remote host closed the connection]