antocuni changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "PyPy: the Gradual Reduction of Magic (tm)"
static- has quit [Quit: ZNC 1.6.3+deb1 - http://znc.in]
exarkun has quit [Ping timeout: 255 seconds]
exarkun has joined #pypy
marr has joined #pypy
exarkun has quit [Ping timeout: 255 seconds]
exarkun has joined #pypy
tazle has quit [Ping timeout: 240 seconds]
asmeurer___ has quit [Quit: asmeurer___]
asmeurer_ has joined #pypy
tazle has joined #pypy
asmeurer_ has quit [Client Quit]
asmeurer__ has joined #pypy
<mattip>
starting to look at supporting tzinfo in cpyext's datetime objects, like in cpython
<mattip>
cpython seems to go to extreme efforts to save a few bytes in the datetime.h header
<gcbirzan>
So, I asked on Friday, but, gonna ask again. If the results on speed.pypy.org are wildly different than real life, can I make sure I'm at the very least not running the wrong test code?
<mattip>
does anyone know the history for that? Is datetime such a heavily used struct that saving a few bytes makes sense?
<arigato>
hi all
<mattip>
gcbirzan: which test were you running?
<gcbirzan>
mattip: In real life, I'm serializing a lot of stuff to json, so json_bench.
<mattip>
arigato: hi
<gcbirzan>
mattip: I found _something_, but not directly in the links in the about page.
<arigato>
mattip: I think it's not essential to save a few bytes in cpyext
<arigato>
I think it's done for applications that really have tons and tons of datetime objects all at the same time, though I agree that's a bit strange in the first place
<mattip>
arigato: the weirdest thing is they pack year, month, date... as tightly as possible, and then tzinfo is a PyObject* with a flag hastzinfo
<mattip>
so any datetime with a tzinfo is relatively huge and complicated
<arigato>
they can share the tzinfo
<arigato>
I guess?
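The sharing mentioned above can be seen from the Python level. This is only a sketch, assuming Python 3 (for `datetime.timezone`); it shows that many aware datetimes can reference one tzinfo object, and says nothing about the C struct layout in datetime.h.

```python
from datetime import datetime, timedelta, timezone

# One tzinfo instance; every aware datetime below references it rather than copying it.
utc = timezone.utc

# A naive datetime carries no tzinfo at all.
naive = datetime(2018, 1, 8, 12, 0)

# A thousand aware datetimes, all built from the same tzinfo object.
start = datetime(2018, 1, 8, 12, 0, tzinfo=utc)
aware = [start + timedelta(hours=i) for i in range(1000)]

shares_tzinfo = all(d.tzinfo is utc for d in aware)
print(naive.tzinfo is None, shares_tzinfo)
```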
<mattip>
anyhow, the problem I face is that there is no clean API to access the tzinfo in the PyDateTime_GET_* functions
<mattip>
and pandas through cython started accessing the tzinfo field in the newest version
<arigato>
right
<mattip>
so I can either make a c-api compatible PyDateTime object that has a pointer, or
danchr has quit [Ping timeout: 240 seconds]
<mattip>
convince cython to use a PyPy-specific PyDateTime_GET_TZINFO() api function
<gcbirzan>
Anyway. Where's the best place to talk about these things?
<arigato>
gcbirzan: either here, or on the pypy-dev mailing list
<arigato>
unsure what you tried, these are the results of "python json_bench.py" vs "pypy json_bench.py" with cpython 2.7.14 and pypy 5.10.0 on Linux 64
<mattip>
gcbirzan: on this page http://speed.pypy.org/comparison, if you choose the tannit Environment, and the pypy-c-jit latest Executable, and the json_bench Benchmark, you see 0.35
<arigato>
python: 3m02s
<mattip>
which is pretty much what arigato just reported :)
<gcbirzan>
Okay, I think I know what the problem is.
<gcbirzan>
I'm talking about pypy3
<arigato>
ok, then our benchmarks are not adapted at all to python 3.x
<gcbirzan>
yeah
<arigato>
I'm surprised one of them actually runs out of the box :-)
<gcbirzan>
well, json_bench doesn't.
<arigato>
well, that should have been a hint that something is wrong, then
<gcbirzan>
or, well, I don't know what part isn't adapted.
<gcbirzan>
definitely not the json part itself.
<gcbirzan>
the harness itself is not meant to be run under python3
<arigato>
right, so you're complaining that we didn't port the benchmarks to python 3.x, which is a known issue: we should make an effort to look at porting them and/or copying the work done in the CPython world
<gcbirzan>
no.
<gcbirzan>
I'm complaining that the results for, if nothing else, json_bench, are so wildly off that it's hilarious
<arigato>
if you look carefully at a page like http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472&ben=json_bench&env=1&revs=200&equid=off it says clearly "cpython 2.7.2" for example
<gcbirzan>
yeah
<gcbirzan>
let me try a pypy 2, just to see
<arigato>
I'm not sure what results you're talking about, given that json_bench is a Python 2 program
<arigato>
did you somehow port it to Python 3 and run it, and get very bad results on pypy3?
<gcbirzan>
it's a python module
<gcbirzan>
I obviously didn't use your harness to run it
<arigato>
so, to be clear, you're talking about the performance of a piece of code that you modified and that we don't see
<gcbirzan>
okay, it is faster in pypy2
<arigato>
our turn to complain :-)
<gcbirzan>
aw, man
<arigato>
ok, I can guess what you did
<gcbirzan>
can you?
<gcbirzan>
let me tell you what I did
<arigato>
you removed the imports and just ran the same main() function by hand
<gcbirzan>
no
<gcbirzan>
initially, I had a dict that was about 5MB of JSON
<gcbirzan>
and I serialized it 10k times
<gcbirzan>
it was waaaaaaaaay slower in pypy
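A minimal sketch of that kind of measurement, runnable unchanged on both Python 2.7 and Python 3. The payload shape, its size, and the names `make_payload` and `time_dumps` are made up for illustration; this is not the original 5MB dict and not the json_bench code.

```python
import json
import time


def make_payload(n_items=1000):
    """Build a nested dict; the shape and size are invented for illustration."""
    return {
        "item%d" % i: {
            "id": i,
            "name": u"name-%d" % i,
            "tags": [u"a", u"b", u"c"],
            "score": i * 0.5,
        }
        for i in range(n_items)
    }


def time_dumps(payload, repeats=20):
    """Return the total wall-clock seconds spent in `repeats` calls to json.dumps."""
    start = time.time()
    for _ in range(repeats):
        json.dumps(payload)
    return time.time() - start


if __name__ == "__main__":
    payload = make_payload()
    print("%.3f s" % time_dumps(payload))
```

Running the same file under each interpreter gives numbers that can be compared directly. On a JIT such as PyPy the first iterations include warm-up, so a larger `repeats` value gives a fairer picture.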
<gcbirzan>
then I went to your website and saw "oh, look, they claim it's faster"
<gcbirzan>
and since your testing harness is terrible, I just called main 50 times, the results were the same
<gcbirzan>
but you wouldn't know, since you cannot run these things in python 3
<arigato>
ah, ok, and confusion resulted because you missed that we're only testing python 2 so far
<Rotonen>
so, people come to pypy with the assumption of pypy3.5 being polished, just because it exists?
<gcbirzan>
yeah
<gcbirzan>
I mean
<arigato>
pypy3.5 was just released, I'm not surprised there are rough corners like that
<gcbirzan>
just?
<arigato>
yes? the first non-beta version of pypy3.x
<arigato>
all previous ones were betas and incomplete in some way
<gcbirzan>
well, maybe, then
<gcbirzan>
PyPy is a fast, compliant alternative implementation of the Python language (2.7.13 and 3.5.3).
<gcbirzan>
that part should be revised.
<arigato>
well, that part is very new.
<arigato>
and still correct
<gcbirzan>
it should say "a compliant alternative implementation of python3, and a fast, compliant implementation of python2"
<gcbirzan>
also, I'm sorry
<arigato>
you've found that the json implementation is much slower than pypy2's, which is a good thing to look at
<Rotonen>
you can find corners in both where cpython smokes pypy - those lessen over time
<gcbirzan>
but, you should take some of that donation money and work on speed.pypy.org
<gcbirzan>
aside from it being the slowest site ever
<gcbirzan>
apparently, nobody cares enough to update it with new versions
<arigato>
well, it's a major effort
<gcbirzan>
yeah. I bet.
<arigato>
we spent the whole of 2017 making pypy3.5 happen
<gcbirzan>
as big as changing 10 lines to get your test crap to work in python 3 :P
<arigato>
that's wrong
<arigato>
it's 10 lines in json_bench.py, I agree
<arigato>
what about some benchmark that uses 42 libraries, not all of which have been ported to python 3?
<gcbirzan>
arigato: your point is, you won't bother until everything is ported to python 3?
<arigato>
our benchmarks are of a wide variety, they're not all 10 lines like json_bench
raynold has joined #pypy
<gcbirzan>
which benchmark is 42 libraries?
<arigato>
the obvious example is trans_* and trans2_*
<gcbirzan>
yeah
<gcbirzan>
that's another thing
<arigato>
well a number of benchmarks use libraries that might have been recently ported to python 3
<gcbirzan>
trans_* in... some repo, somewhere :P
exarkun has quit [Ping timeout: 268 seconds]
<arigato>
so yes, the job of porting the benchmarks becomes easier the longer we wait, basically
<gcbirzan>
arigato: the job of CHECKING the benchmarks is impossible.
<gcbirzan>
arigato: because whatever you use to run them is not public.
<gcbirzan>
arigato: Ah, so one has to come here and complain about it to find out what actually goes there?
<arigato>
if you have a more constructive comment, like where to put that information I just gave, feel free to suggest a pull request on the pypy docs
<gcbirzan>
arigato: Anyway. I don't want to be too much of an ass. But, the test harness itself doesn't support python 3. That's a 10 line fix.
<arigato>
gcbirzan: and my point is, cool, and then you get a grand total of 3 of the 30 benchmarks to work out of the box
<arigato>
and you're only left with 99% of the real work
<gcbirzan>
arigato: No.
<gcbirzan>
You're left with that work, and some data.
<arigato>
fine, if you do it then be sure to make a pull request :-)
<gcbirzan>
You aren't even trying to get the data
<gcbirzan>
Sure.
<gcbirzan>
Against what repo?!
<gcbirzan>
Sorry. Yeah, I will, tomorrow... afternoon EU time.
<gcbirzan>
Apparently, I have a planning meeting in the morning :P
<arigato>
I guess one of the reasons we never tried so far is that the benchmark runner logic is layer upon layer of hacks
<arigato>
of course we can hack at these layers until they are in python2+3 style
<arigato>
or maybe a better approach would be to look at the work CPython did
<arigato>
so that would mean dropping our benchmarks entirely, and starting with a different set, provided by CPython
<arigato>
probably with the web site from CPython too
<arigato>
(both of which originally came from PyPy, somehow)
<gcbirzan>
arigato: The benchmarks themselves can have different versions if that's the case. but, generally, six has very very low runtime impact,
<arigato>
that's the question that we never seriously addressed: do we care to run versions of the same benchmarks, or would we be happy with a different benchmark suite
<arigato>
I think at this point I would prefer running whatever benchmark suite CPython developers create
tbodt has joined #pypy
<arigato>
because I certainly don't feel like maintaining 30 benchmarks all the time for Python 3.x
<mattip>
arigato: benchmarks appears on the sprint topics, we could at least reach a consensus on that point and a general overhaul
<gcbirzan>
arigato: or, you know, for python 2
<arigato>
mattip: agreed
<arigato>
gcbirzan: well, it may seem outdated now, but we're mostly python 2 people here
<arigato>
(pypy is written in python 2)
<arigato>
python 3 gets a new version every 18 months, python 2 doesn't
<gcbirzan>
Yeah. In particular, python 2 will never get a new version.
<gcbirzan>
And for a reason. :)
<gcbirzan>
This has been quite enlightening. I'll try to fix the harness, but this is quite disappointing.
<gcbirzan>
Yeah. Great, you got it to barely work!
<arigato>
well, yes. thanks! it's far more effort than people imagine.
<gcbirzan>
arigato: It's also something that some people, both implicitly and explicitly, donated.
<gcbirzan>
same with STM, but that's not really your fault :P
<gcbirzan>
arigato: and, I do understand that, but it feels like python 3 is a second class citizen.
<gcbirzan>
arigato: but you advertise it as being... well, working.
<arigato>
inside pypy, yes, it was until the end of 2017. now it's almost working and we can fix the (probably large) number of places that are still slower
<gcbirzan>
arigato: I mean, aside from it changing too often... why? I get lack of time, but :(
<arigato>
the main reason is that it's extremely convenient to have a stable language, and be able to spend years on other things than tweaking the syntax: like for example, the GC, the JIT, etc.
<arigato>
so like, in 2017, we really spent most of our time upgrading pypy3.3 to pypy3.5
<gcbirzan>
I see, so it's a manpower issue :)
<arigato>
that's all time that is lost from the point of view of improving the rest (GC, JIT, etc.)
<arigato>
yes
<gcbirzan>
Okay. Thanks!
<gcbirzan>
The way things looked I thought there's a fundamental hatred for python 3
<arigato>
that's too harsh, but there is a bit of an impression of "why do they keep adding features at that pace"
<arigato>
(or even more than a bit)
<arigato>
e.g. the async+await keywords are a very nice new feature of Python 3.5, but they are implemented in a rather awful way on top of generators internally, except not completely
<arigato>
it took me a while to copy that in pypy
<arigato>
and that's only one point in the infinite web page "what's new in python 3.5"
<gcbirzan>
arigato: But, none of the json stuff is impacted. :)
<mattip>
also way too many new api functions to implement in c-api, and no internal, private vs. external, public differentiation so people use whatever
<arigato>
right, in the json case I guess it's something like a bytes/unicode difference which required a few changes, and maybe this killed performance by mistake
<arigato>
I fear there are quite a number of places like that
<arigato>
(potentially)
<gcbirzan>
arigato: I'll definitely get the ones that work in python 3 to work.
<gcbirzan>
However. That requires me to get up tomorrow in time to drive my kid to daycare, then go to work. So, good night. And, sorry for being an asshole, but the realisation that python 3 is just 'meh' did annoy me a bit. :(
<mattip>
so maybe we should just join forces with them instead of rewriting our benchmarks
<arigato>
yes, that was my point above
<jacob22_>
We may still need some benchmarks of our own, to check areas where pypy has poor performance, but those can be added as the problematic areas show themselves.
<cfbolz>
mattip: yes, that's definitely the way to go
* mattip
wondering how much work it would be to run vmprof on those json benchmarks
<cfbolz>
mattip: vmprof is still not that useful if the time is spent inside rpython functions
jacob22_ is now known as jacob22
<mattip>
running "pip install performance" on cpython 2.7, pypy2-v5.10, pypy3-v5.10, then "python|pypy|pypy3 -m performance venv create" for each,
<mattip>
then python -mperformance run -b json_dumps --venv venv/[pypy2*] or [pypy3*] or [cpython2.7*]
<mattip>
gives me 17.9 ms +- 1.2 ms on cpython 2.7,
<mattip>
5.20 ms +- 0.30 ms on pypy2 and
<mattip>
49.5 ms +- 1.8 ms on pypy3
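To put the three numbers above on one scale, here is a small sketch that expresses each mean as a multiple of the cpython 2.7 time. It uses only the means reported above and ignores the +- spreads.

```python
# Mean timings reported above for the json_dumps benchmark, in milliseconds.
timings_ms = {"cpython2.7": 17.9, "pypy2": 5.20, "pypy3": 49.5}

baseline = timings_ms["cpython2.7"]
for name in sorted(timings_ms):
    # A ratio below 1 means faster than cpython 2.7, above 1 means slower.
    ratio = timings_ms[name] / baseline
    print("%-10s %5.2f ms  %.2fx the cpython2.7 time" % (name, timings_ms[name], ratio))
```

By these figures pypy2 takes roughly 0.29x the cpython 2.7 time and pypy3 roughly 2.77x, which makes pypy3 about 9.5x slower than pypy2 on this benchmark.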
<cfbolz>
so yes, something is broken
<cfbolz>
mattip: would you file an issue? I can try to take a look at some point
<raynold>
Ahh it's a wonderful day
exarkun has quit [Ping timeout: 256 seconds]
exarkun has joined #pypy
yuyichao has joined #pypy
nopf has joined #pypy
forgottenone has quit [Quit: Konversation terminated!]
<kenaan>
mattip jitviewer 28e3315cbcdf /_jitviewer/app.py: for jinja2, make version '2.10' > '2.6'
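The commit message above reads like a version check where '2.10' has to count as newer than '2.6'. Compared as plain strings it does not, because strings compare character by character. The sketch below assumes that is the underlying problem; the actual change in /_jitviewer/app.py is not shown here, and `version_tuple` is a hypothetical helper that only handles purely numeric dotted versions.

```python
def version_tuple(version):
    """Turn a dotted numeric version like '2.10' into (2, 10)."""
    return tuple(int(part) for part in version.split("."))


# String comparison goes character by character, so '1' < '6' decides it.
print("2.10" > "2.6")

# Tuple comparison goes number by number, so 10 > 6 decides it.
print(version_tuple("2.10") > version_tuple("2.6"))
```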