cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<antocuni>
marmoute: maybe you are already aware of it, but pushing to heptapod is VERY slow (at least on the pypy repo): http://paste.openstack.org/show/790585/
ekaologik has joined #pypy
<Dejan>
mattip, with CPython 2 it is stuck in this state: pypy/module/cpyext/test/test_pyerrors.py .........s.....s........s.
<Dejan>
it has been there for the last 15m
<Dejan>
20m now
<Dejan>
i will keep it running 10m more and kill it if nothing changes
<Dejan>
It seems like it got stuck in:
<Dejan>
File "/home/dejan/oswork/pypy/rpython/tool/runsubprocess.py", line 61, in <module>
<Dejan>
operation = sys.stdin.readline()
<mattip>
can you rerun with python2 pytest.py pypy/module/cpyext/test/test_pyerrors.py -vv
<Dejan>
the question is why it core dumps when executed with pypy2
<Dejan>
it is really weird, with -vv and python2 it all passes
<mattip>
the tests are not supposed to succeed with pypy. If you want to test translated, use pypy pytest.py -A pypy/module/cpyext/test/test_pyerrors.py
<Dejan>
without -vv python2 is stuck somewhere
<Dejan>
and it never finishes
<mattip>
just like the buildbot
<Dejan>
i will try to find the exact test where it gets stuck now
adamholmberg has quit [Ping timeout: 268 seconds]
<Dejan>
it passed even without the -vv now
<Dejan>
so it is completely random behaviour...
<mattip>
try cleaning out the repo
<Dejan>
now i executed it again, and it got stuck!
<Dejan>
:D
<Dejan>
it got stuck at the test_error_thread_race
<Dejan>
yea, looks like some kind of race condition and it is not blocked forever
<Dejan>
s/not/now/
<Dejan>
and like most race conditions it is random, so now it all makes sense
tsaka_ has joined #pypy
dddddd has joined #pypy
tsaka_ has quit [Ping timeout: 255 seconds]
tsaka_ has joined #pypy
zmt01 has joined #pypy
zmt00 has quit [Ping timeout: 255 seconds]
lastmikoi has quit [Excess Flood]
lastmikoi has joined #pypy
<cfbolz>
mattip: could be related to the GIL work by arigato and antocuni?
adamholmberg has joined #pypy
<Dejan>
mattip, fresh clone and pytest run works well
<Dejan>
i have tried it 10 times and it succeeded all 10 times
<Dejan>
with "13 passed, 3 skipped in 3.73 seconds"
<Dejan>
it looks like i had old code...
<Dejan>
no, i was in the tip
<Dejan>
it pointed to the hpy branch
<Dejan>
(i am still learning Mercurial...)
<Dejan>
Blah, still the same - it freezes at test_error_thread_race
<antocuni>
Dejan, cfbolz : our gil-related work was merged to default in cd7261a5a735, so if you still experience the problem it is worth trying that revision and the one immediately before (b79b87185e0b)
<Dejan>
that is too advanced for me, i barely know how to use hg
<Dejan>
but i will try to checkout b79b87185e0b
<Dejan>
and see if i experience the same problem
<Dejan>
if my boss finds out i am doing this i am dead meat
<antocuni>
hg up -r REV
<antocuni>
this is the equivalent of git checkout REV
<antocuni>
I see many messages like this in the output: rpython.rtyper.debug.FatalError: GIL not held when a CPython C extension module calls 'PyGILState_Release'
<antocuni>
arigato: could this be related to 0fd6d867bff6 ?
<Dejan>
i did not know about the -k thread_race
<Dejan>
thanks!
<antocuni>
you're welcome
<antocuni>
other useful pytest tricks are: -v to get more verbose output
<Dejan>
that i know of
<Dejan>
like with ssh, you can combine them: -vv for even more verbose output
<antocuni>
-s to avoid capturing stdout/stderr (so you see prints during the test execution)
<antocuni>
--pdb to get a pdb prompt in case of exception
<antocuni>
-x to stop on the first failed test
<antocuni>
and --ff to run the last-failed tests first
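The flags antocuni lists are standard pytest options. As a minimal sketch, here is a hypothetical helper (`build_pytest_args` is not part of pytest or PyPy) that assembles them into an argument list:

```python
from typing import List, Optional


def build_pytest_args(path: str, keyword: Optional[str] = None,
                      verbosity: int = 0, no_capture: bool = False,
                      pdb: bool = False, exit_first: bool = False,
                      failed_first: bool = False) -> List[str]:
    """Assemble a pytest argument list from the flags discussed above."""
    args = [path]
    if keyword:
        args += ["-k", keyword]              # only tests whose name matches
    if verbosity > 0:
        args.append("-" + "v" * verbosity)   # -v, -vv, ...
    if no_capture:
        args.append("-s")                    # show prints while tests run
    if pdb:
        args.append("--pdb")                 # drop into pdb on an exception
    if exit_first:
        args.append("-x")                    # stop at the first failure
    if failed_first:
        args.append("--ff")                  # run previously failed tests first
    return args


print(build_pytest_args("pypy/module/cpyext/test/test_pyerrors.py",
                        keyword="thread_race", verbosity=1, exit_first=True))
```

The resulting list can be appended to `["./pytest.py"]` for a subprocess call, or passed to `pytest.main()` when pytest itself is importable.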
<antocuni>
so, I'm running this on 0fd6d867bff6:
<antocuni>
while ./pytest.py pypy/module/cpyext/test/test_pyerrors.py -x -v -k thread_race; do let "x=x+1"; echo $x; done
<antocuni>
after 3 runs, the test got stuck
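The shell loop above reruns the test until it fails, but a hang simply blocks the loop. Here is a sketch of the same idea in Python, assuming that a subprocess timeout is an acceptable way to treat a hang as a failure. `run_until_failure` is a hypothetical helper, and the 120 s timeout in the comment is an arbitrary choice:

```python
import subprocess
import sys
from typing import List, Optional


def run_until_failure(cmd: List[str], max_runs: int, timeout: float) -> Optional[str]:
    """Re-run cmd up to max_runs times, like the shell while-loop.

    Returns a description of the first failing run ("hang" or "exit N"),
    or None if every run passed. The timeout turns a hang into a failure
    instead of blocking forever.
    """
    for run in range(1, max_runs + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:   # run() kills the child, then raises
            return f"run {run}: hang (>{timeout}s)"
        if result.returncode != 0:
            return f"run {run}: exit {result.returncode}"
    return None


# Hypothetical real invocation mirroring antocuni's loop (from the repo root):
# run_until_failure(["./pytest.py", "pypy/module/cpyext/test/test_pyerrors.py",
#                    "-x", "-v", "-k", "thread_race"], max_runs=50, timeout=120)
print(run_until_failure([sys.executable, "-c", "pass"], max_runs=3, timeout=30))  # None
```

Because the outcome becomes a return value, a wrapper script that exits with status 0 when `run_until_failure` returns `None` (and non-zero otherwise) could also be handed to `hg bisect -c`, which treats exit status 0 as good and most non-zero statuses as bad.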
* antocuni tries with 2f391434f09e
<Dejan>
i am at 84b0c1357d94
<Dejan>
it failed there too
<antocuni>
yes, but the two suspicious changesets are cd7261a5a735 (which changed the way the GIL is implemented) and 0fd6d867bff6 (which changed the way the GIL is used in cpyext)
<Dejan>
well, you know it better than me :)
<Dejan>
i am using brute force to find which commit introduced this bug
<Dejan>
s/commit/changeset/
<antocuni>
Dejan: you should learn about hg bisect as well :)
<Dejan>
can it automate this process?
<antocuni>
yes
<Dejan>
then by all means let's run it
<Dejan>
as it takes ages...
<antocuni>
to start with, it can automatically bisect, so you do log2(N) tests instead of N
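Bisection relies on the "bad" property being monotonic along history: once a changeset is broken, later ones are assumed broken too. Here is a minimal sketch of the search that `hg bisect` automates, assuming a linear list of revisions (real Mercurial history is a DAG, which `hg bisect` handles and this sketch does not). The `is_bad` predicate is a hypothetical stand-in for running the test:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revs: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first revision for which is_bad() is True.

    Assumes revs is ordered oldest -> newest, revs[0] is known good,
    revs[-1] is known bad, and every revision after a bad one is bad.
    Each step halves the range, so about log2(N) checks are needed.
    """
    good, bad = 0, len(revs) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revs[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return revs[bad]


print(first_bad(list(range(100)), lambda r: r >= 37))  # 37
```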
<antocuni>
then you can also use hg bisect -c, to give it a command to execute automatically
<antocuni>
but it's not very useful in this case because it hangs
<Dejan>
well, i want to bisect it with the command like yours above
<Dejan>
without the infinite while
<Dejan>
:D
<antocuni>
well, it's not really useful because it might give you a false negative (like, a changeset seems to work but actually is still buggy)
<Dejan>
ah, i need to mark changesets as good and bad...
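antocuni's false-negative concern can be quantified. Suppose a buggy changeset shows the hang with some per-run probability p. That value is unknown here; the 0.25 below is purely an assumed value. Then k clean runs still happen with probability (1 - p)^k, so reaching a false-"good" rate of at most alpha needs k >= ln(alpha) / ln(1 - p) runs. A small sketch:

```python
import math


def runs_needed(p_fail: float, alpha: float) -> int:
    """Smallest k with (1 - p_fail) ** k <= alpha.

    p_fail: assumed per-run probability that a buggy changeset hangs.
    alpha:  acceptable chance of wrongly marking that changeset good.
    """
    if not (0 < p_fail < 1 and 0 < alpha < 1):
        raise ValueError("p_fail and alpha must both be in (0, 1)")
    return math.ceil(math.log(alpha) / math.log(1 - p_fail))


print(runs_needed(0.25, 0.01))  # 17
```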
jcea has joined #pypy
jvesely has joined #pypy
<Dejan>
would be nice if hg bisect could run concurrent jobs
<Dejan>
;)
<antocuni>
so, 0fd6d867bff6 is definitely broken
<antocuni>
arigato: ^^^
tsaka_ has joined #pypy
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
<Dejan>
well, it is just that particular test that has some kind of race condition
<antocuni>
well yes, that revision broke this test, so one of the two (the changeset or the test) should be fixed. This is the point of having tests :)
tsaka_ has quit [Quit: Konversation terminated!]
<Dejan>
antocuni, how do you track down memory leaks?
<antocuni>
very broad question
<antocuni>
any more specific use case or problem in mind?
<Dejan>
I think I have found a memory leak in Celery... I used pympler and it clearly shows a constant increase of memory allocated for "list" object(s)
<antocuni>
using pypy or cpython?
<Dejan>
and my test case contains a while loop and 3 lines of code that call Celery inspect API
<Dejan>
i used CPython for this
<Dejan>
but I should give PyPy a try...
<Dejan>
I would like to find what "list" object is constantly increasing in size
<antocuni>
in this case, I'd try to bisect the code (e.g. by removing unnecessary code while checking that the leak is still there) until I pinpoint the piece of code that leaks
<antocuni>
once the code is sufficiently small, you can e.g. try to use objgraph on it
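objgraph can report which object types are growing. Here is a rough stdlib-only sketch of the same idea, assuming CPython (where `gc.get_objects()` returns the objects tracked by the cycle collector): count live objects per type before and after a few iterations of the suspect loop, then compare. `type_counts` and `growth` are hypothetical helper names:

```python
import gc
from collections import Counter
from typing import List, Tuple


def type_counts() -> Counter:
    """Count live GC-tracked objects by type name after a full collection."""
    for _ in range(3):   # collect a few times first, so garbage cycles
        gc.collect()     # are not mistaken for a leak
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before: Counter, after: Counter, top: int = 5) -> List[Tuple[str, int]]:
    """Types whose live-object count went up, biggest increase first."""
    return (after - before).most_common(top)  # Counter '-' keeps positive deltas only


# Hypothetical stand-in for the loop calling Celery's inspect API.
leak = []
before = type_counts()
for _ in range(1000):
    leak.append([0])     # each iteration keeps one more list alive
after = type_counts()
print(growth(before, after))
```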
<Dejan>
that is very hard
<Dejan>
as my test case is ~7 lines
<Dejan>
it just calls Celery's inspect API...
<antocuni>
Dejan: you can also use gc.get_objects() to inspect all the lists which are alive; if you find one which has >N items, you know what's inside
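Here is a sketch of antocuni's suggestion, assuming CPython: scan `gc.get_objects()` for unusually long lists and summarize what they contain. The helper name `big_lists` and the thresholds are illustrative assumptions:

```python
import gc
from collections import Counter
from typing import List, Tuple


def big_lists(min_len: int = 10_000, limit: int = 5) -> List[Tuple[int, str]]:
    """Largest live lists, as (length, most common item type name)."""
    threshold = max(min_len, 1)   # non-empty lists only, so most_common() has a result
    found = [o for o in gc.get_objects()
             if type(o) is list and len(o) >= threshold]
    found.sort(key=len, reverse=True)
    return [(len(o), Counter(type(x).__name__ for x in o).most_common(1)[0][0])
            for o in found[:limit]]


# Hypothetical list that keeps growing somewhere inside a library.
suspect = [("reply", i) for i in range(20_000)]
print(big_lists(min_len=15_000))
```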
<Dejan>
that is a good idea, i have never used it before
<antocuni>
Dejan: well of course, if the bug is inside Celery, you need to get your hands dirty in Celery's code
<mattip>
often it is not a bug, maybe just a memory cycle. Did you try using "for i in range(3): gc.collect()" ?
<mattip>
inspecting things can create cycles by holding on to things that were designed to be released
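To illustrate mattip's point that a cycle is not necessarily a leak: on CPython, objects in a reference cycle are not freed by reference counting and have to wait for the cycle collector. Here is a minimal demonstration with the automatic collector switched off so the result is deterministic. `Node` and `make_cycle` are hypothetical, and the demo targets CPython's reference counting, so results on PyPy may differ:

```python
import gc
import weakref


class Node:
    """Hypothetical object that ends up in a reference cycle."""
    def __init__(self):
        self.other = None


def make_cycle() -> weakref.ref:
    a, b = Node(), Node()
    a.other, b.other = b, a   # a -> b -> a: reference counts never reach zero
    return weakref.ref(a)     # a weak reference does not keep `a` alive


was_enabled = gc.isenabled()
gc.disable()                  # keep the automatic collector out of the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None   # still alive: only the cycle holds it
    gc.collect()
    alive_after = ref() is not None    # reclaimed by the cycle collector
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)  # True False on CPython
```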
<Dejan>
i kept tracking memory usage for 3h
<Dejan>
it is constantly increasing
<Dejan>
at a steady rate
<Dejan>
and I got determined to find the culprit
<Dejan>
mattip, i find it hard to believe that the GC does not collect anything in 3h
<Dejan>
i am printing gc.get_stats() output every 20s
<Dejan>
it constantly shows the same output :)
<Dejan>
yet the process consumes more and more memory...
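`gc.get_stats()` only reports what the cycle collector did per generation (collections, objects collected, uncollectable objects). Memory held by objects that are still reachable does not show up there, so flat stats and growing memory are not contradictory. One technique not mentioned in the discussion is CPython's standard `tracemalloc` module, which can attribute the growth to source lines by diffing two snapshots. In the sketch below, `top_growth`, `cache` and `one_iteration` are hypothetical stand-ins for the Celery loop:

```python
import tracemalloc
from typing import Callable, List


def top_growth(work: Callable[[], None], limit: int = 3) -> List[tracemalloc.StatisticDiff]:
    """Run work() between two tracemalloc snapshots and return the source
    lines whose traced allocations grew the most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        work()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() orders results by absolute size difference, biggest first
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


cache = []  # hypothetical container that only ever grows


def one_iteration() -> None:
    for _ in range(100):
        cache.append(bytearray(10_000))   # roughly 1 MB kept alive per call


for stat in top_growth(one_iteration):
    print(stat)
```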
YannickJadoul has quit [Quit: Leaving]
<Dejan>
can you guys install pycrypto in your pypy 3.6 venvs?
<ronan>
AFAICT, the GIL issue is that the untranslated emulation of the RPython GIL is broken
<ronan>
ATM, if any thread has the GIL, all threads think they have it
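Here is a toy model of the bug ronan describes, written with plain Python threading rather than RPython. If "holding the GIL" is stored as a single shared flag, every thread sees it as held. Recording the owning thread's id fixes the query (per ronan, the old code checked `cpyext_glob_tid_ptr`). `FlagGil` and `OwnerGil` are hypothetical classes that model only the "do I hold it?" check, not actual mutual exclusion:

```python
import threading


class FlagGil:
    """Hypothetical broken emulation: one shared boolean."""
    def __init__(self):
        self._held = False

    def acquire(self):
        self._held = True

    def release(self):
        self._held = False

    def held_by_current_thread(self) -> bool:
        return self._held   # True in every thread once anyone holds it


class OwnerGil:
    """Hypothetical fixed emulation: remember which thread holds it."""
    def __init__(self):
        self._owner = None

    def acquire(self):
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None

    def held_by_current_thread(self) -> bool:
        return self._owner == threading.get_ident()


def check_from_other_thread(gil) -> bool:
    """Acquire here, then ask a second thread whether *it* holds the GIL."""
    gil.acquire()
    answer = []
    t = threading.Thread(target=lambda: answer.append(gil.held_by_current_thread()))
    t.start()
    t.join()
    gil.release()
    return answer[0]


print(check_from_other_thread(FlagGil()))   # True  (wrong)
print(check_from_other_thread(OwnerGil()))  # False (correct)
```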
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
jvesely has quit [Quit: jvesely]
<Dejan>
antocuni, do you by any chance have binary wheel of pycrypto package for pypy3.6 7.3.0 ?
jvesely has joined #pypy
<Dejan>
it does not build here...
<antocuni>
Dejan: did it work before?
<Dejan>
Yep
<Dejan>
I managed to install a few of our projects (one of them uses celery) and all of them depend on pycrypto
<antocuni>
maybe a recent version of pycrypto broke and previous ones work?
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
<antocuni>
ronan: is it a new issue, or was it already known before?
<mattip>
the test did pass before, so if it was known before there must have been a work-around.
<mattip>
if it is hard to fix we should skip the test untranslated
<antocuni>
true enough
<antocuni>
I admit that I didn't look at it in detail yet
<ronan>
antocuni: it's new. The old code was checking cpyext_glob_tid_ptr.
<antocuni>
oh
ekaologik has joined #pypy
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
otisolsen71 has joined #pypy
otisolsen70 has quit [Ping timeout: 256 seconds]
otisolsen71 has quit [Ping timeout: 256 seconds]
xcm has quit [Killed (barjavel.freenode.net (Nickname regained by services))]
xcm has joined #pypy
mattip has quit [Ping timeout: 255 seconds]
xcm has quit [Ping timeout: 256 seconds]
xcm has joined #pypy
adamholmberg has quit [Remote host closed the connection]