#pypy on 2017-11-23 — irc logs at freenode.irclog.whitequark.org

2017-10-17 08:34 antocuni changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "PyPy: the Gradual Reduction of Magic (tm)"

00:06 <blachance> if I want to translate my interpreter so I can profile it (e.g. w/gperftools), do I need to use any particular translation options to get debug symbols?

00:11 tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

00:16 tbodt has joined #pypy

00:18 tbodt has quit [Read error: Connection reset by peer]

00:21 <ronan> blachance: not sure what's best, but I'd try with --lldebug first

00:21 tbodt has joined #pypy

00:24 <bbot2> Success: http://buildbot.pypy.org/builders/rpython-linux-x86-64/builds/20 [default]

00:24 <blachance> hmm, that's what I suspected.. I tried that, and the different cost-centers the profiler shows are (what look like) addresses

00:24 <blachance> (there are a few that aren't addresses, e.g. _malloc)

00:26 <blachance> so, thanks for confirming about --lldebug ronan. Sounds like I'm at least close to the right path

00:28 yuyichao has quit [Ping timeout: 240 seconds]

00:30 Thinh has quit [Quit: Bye!]

00:30 Thinh has joined #pypy

00:32 <bbot2> Success: http://buildbot.pypy.org/builders/rpython-linux-x86-32/builds/13 [default]

00:41 yuyichao has joined #pypy

00:49 <bbot2> Success: http://buildbot.pypy.org/builders/build-pypy-c-jit-linux-armhf-raring/builds/1729

00:50 <bbot2> Success: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/5093 [default]

00:53 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/3438 [default]

00:59 tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

01:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-s390x/builds/738 [default]

01:01 tbodt has joined #pypy

01:02 antocuni has quit [Ping timeout: 248 seconds]

01:06 <bbot2> Success: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/4341 [default]

01:20 gclawes has quit [Read error: Connection reset by peer]

01:21 gclawes has joined #pypy

01:24 tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

01:42 pilne has joined #pypy

01:46 Thinh has quit [Quit: Bye!]

01:48 Thinh has joined #pypy

01:54 slacky__ has joined #pypy

01:55 <bbot2> Success: http://buildbot.pypy.org/builders/build-pypy-c-linux-armhf-raspbian/builds/1605

01:55 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-app-level-linux-armhf-v7/builds/1301

01:55 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-app-level-linux-armhf-raspbian/builds/1526

01:56 slackyy has quit [Ping timeout: 260 seconds]

01:58 Thinh has quit [Quit: Bye!]

01:59 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-app-level-linux-armhf-raspbian/builds/1526

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/3439 [py3.5]

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/5094 [py3.5]

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/6387 [py3.5]

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-32/builds/3528 [py3.5]

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/5580 [py3.5]

02:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/4342 [py3.5]

02:00 Thinh has joined #pypy

02:18 marr has quit [Ping timeout: 268 seconds]

02:27 <bbot2> Success: http://buildbot.pypy.org/builders/build-pypy-c-jit-linux-armel/builds/1978

02:34 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-32/builds/3528 [py3.5]

02:36 <bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/6387 [py3.5]

02:54 <bbot2> Success: http://buildbot.pypy.org/builders/build-pypy-c-linux-armel/builds/1865

02:57 <bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/5580 [py3.5]

03:06 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/5094 [py3.5]

03:25 <bbot2> Success: http://buildbot.pypy.org/builders/pypy-c-jit-linux-s390x/builds/738 [default]

03:26 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/4342 [py3.5]

03:33 jcea has quit [Ping timeout: 250 seconds]

03:34 jcea has joined #pypy

03:55 ArneBab has joined #pypy

04:00 <bbot2> Started: http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-32/builds/3189

04:00 ArneBab_ has quit [Ping timeout: 248 seconds]

04:07 <bbot2> Success: http://buildbot.pypy.org/builders/build-pypy-c-jit-linux-armhf-raspbian/builds/1734

04:07 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-armhf-raspbian/builds/1474

04:07 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-armhf-v7/builds/1311

04:13 jcea has quit [Remote host closed the connection]

04:16 <kenaan> rlamy unicode-utf8 b89046216269 /pypy/: Add (back) convenience methods space.newunicode(), space.new_from_utf8() and space.unicode_w()

04:16 <kenaan> rlamy unicode-utf8 a8f461710bf8 /pypy/module/_io/interp_textio.py: Do some unicode>utf8 conversions in interp_textio

04:27 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/3439 [py3.5]

05:04 tav` has joined #pypy

05:06 tav has quit [Ping timeout: 240 seconds]

05:06 tav` is now known as tav

05:14 <kenaan> rlamy default 6eab39056eb5 /pypy/module/_io/: Refactor interp_textio.py a little

05:35 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-32/builds/3527 [default]

06:06 <kenaan> rlamy default 870515a86876 /pypy/module/_io/interp_textio.py: Use a UnicodeBuilder in _io.TextIOWrapper.readline

06:10 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-app-level-linux-armhf-v7/builds/1301

06:26 Nizumzen has joined #pypy

06:31 the_drow has quit [Ping timeout: 240 seconds]

06:42 <kenaan> rlamy unicode-utf8 031e80f0a68e /pypy/module/_io/: Refactor interp_textio.py a little

06:42 <kenaan> rlamy unicode-utf8 8c2553a25336 /pypy/module/_io/interp_textio.py: Use a UnicodeBuilder in _io.TextIOWrapper.readline

06:44 the_drow has joined #pypy

06:57 Nizumzen has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]

07:11 the_drow has quit [Ping timeout: 248 seconds]

07:25 the_drow has joined #pypy

07:33 ronan has quit [Ping timeout: 264 seconds]

07:56 Nizumzen has joined #pypy

08:05 <fijal> ronan: for logs, uh, why are you putting UnicodeBuilder anywhere?

08:05 <fijal> the point was not to have it at all

08:06 <fijal> blachance: no, you get debug symbols by default

08:07 <fijal> lldebug will make everything unevenly slower

08:08 drolando has quit [Remote host closed the connection]

08:09 drolando has joined #pypy

08:17 Nizumzen has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]

08:54 <arigato> fijal: there is no way in rutf8 to get the flag apart from check_utf8(), right?

08:54 <fijal> arigato: from a unicodeobject?

08:54 <arigato> no from a utf8 string we just built

08:54 <fijal> as in w_unicodeobject

08:54 <fijal> ah, no

08:55 <fijal> you have to walk characters, if you didn't get it from W_UnicodeObject

08:55 <fijal> (but you can get it from W_UnicodeObject)

08:55 <arigato> and there is no way to incrementally build up the flag, for example, and no way to call a faster function that assumes it *is* valid utf8 because we just built it

08:55 <fijal> we can create such a function

08:55 <fijal> *but*

08:56 <fijal> my thinking was that writing an SSE-based function will be faster than an RPython function that assumes it's valid UTF8

08:56 <fijal> incremental building is quite a pain though

08:56 <arigato> well, the SSE function can also be faster by assuming it is valid UTF8

08:56 <fijal> yes right

08:56 <arigato> but

08:56 <fijal> so maybe we should have get_length_and_flag which for now will call check_utf8

08:57 <fijal> but we can replace it with a faster version in the future?
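
As a rough illustration of what a `get_length_and_flag`-style helper has to compute for a byte string that is already known to be valid UTF-8, here is a minimal Python sketch. Only the helper's name comes from the discussion; the flag constants and the byte-walking approach are assumptions, not rutf8's actual code. It counts code points by counting non-continuation bytes and detects encoded surrogates by their 0xED 0xA0..0xBF prefix, without re-validating anything.

```python
# Hypothetical flag values for illustration; rutf8's real constants may differ.
FLAG_ASCII = 0
FLAG_REGULAR = 1
FLAG_HAS_SURROGATES = 3


def get_length_and_flag(s):
    """Return (code point count, flag) for bytes assumed to be valid UTF-8
    (lone surrogates allowed). No validation is performed."""
    length = 0
    flag = FLAG_ASCII
    for i, b in enumerate(s):
        if b & 0xC0 != 0x80:              # not a continuation byte: a code point starts here
            length += 1
            if b >= 0x80 and flag == FLAG_ASCII:
                flag = FLAG_REGULAR       # first non-ASCII code point seen
            # U+D800..U+DFFF are encoded as ED A0..BF xx
            if b == 0xED and i + 1 < len(s) and s[i + 1] >= 0xA0:
                flag = FLAG_HAS_SURROGATES
    return length, flag
```

This still walks the whole string once, which is exactly the cost that tracking length and flag while building the string would avoid.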

08:57 <arigato> we should have an incremental way

08:57 <fijal> ok

08:57 <fijal> so let's do that

08:57 <arigato> as in, when you build a utf8 string you can usually compute its length as you build it

08:57 <fijal> yes, it has been done by hand in codecs

08:57 <arigato> ok, so what do you do there?

08:58 <fijal> I use combine_flag()

08:58 <fijal> and keep track of length

08:58 <arigato> ah, that's what I'm talking about

08:58 <arigato> where is combine_flag()?

08:58 <fijal> in rutf8

08:58 <arigato> no, it's in unicodehelper.py

08:58 <fijal> ah yes

08:58 <fijal> should be moved to rutf8

08:59 <arigato> ok I can do that

08:59 <arigato> that's what I was looking for

08:59 <fijal> still, should we make a faster version?

08:59 <fijal> even if it's "call the same thing for now"?

08:59 <arigato> there is little point if you can call combine_flags() instead as you go along

09:00 <arigato> which is faster on long strings at least (no re-walking the string)

09:00 <fijal> yes sure

09:00 <fijal> but it's a pain in the ass in a lot of cases

09:00 <fijal> like, splitline

09:00 <arigato> ok

09:00 <arigato> then yes

09:00 <fijal> I must say I have no idea what is ronan doing

09:01 <arigato> :-/

09:01 <arigato> there is rutf8.get_flag_from_code and rutf8.unichr_to_flag

09:01 <arigato> identical

09:01 <arigato> which one should I kill? :-)

09:01 <fijal> heh, pick one :)

09:02 <fijal> like, he's changing code in _io module (but still using unicode instead of utf8)

09:02 <fijal> which makes it much harder to merge into anything

09:02 <arigato> I see your point, but I guess wait until he can explain?

09:03 kenaan has quit [Read error: No route to host]

09:08 <arigato> major speed-up of combine_flags(): return flag1 | flag2

09:08 <arigato> with a tweak to the actual values, of course
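
One way such a tweak can work, assuming the flags are ordered ASCII < regular < has-surrogates and combining two strings should yield the "worst" of the two: pick values whose bits nest, so that a bitwise OR equals max() and needs no branches. The values below are an assumption for illustration and are not necessarily the ones in the commit that follows.

```python
# Hypothetical values: each "worse" flag includes all the bits of the better ones.
FLAG_ASCII = 0            # 0b00: pure ASCII
FLAG_REGULAR = 1          # 0b01: some non-ASCII, no surrogates
FLAG_HAS_SURROGATES = 3   # 0b11: a superset of FLAG_REGULAR's bits


def combine_flags(flag1, flag2):
    # With nested bit patterns, OR gives the same result as max(flag1, flag2).
    return flag1 | flag2
```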

09:12 antocuni has joined #pypy

09:15 <bbot2> Success: http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-32/builds/3189

09:20 * arigato tries to do _cffi_backend but keeps being distracted by corner cases

09:21 <arigato> e.g. u.lower() on a unicode with surrogates would probably get an RPython-level ValueError

09:27 kenaan_ has joined #pypy

09:27 <kenaan_> arigo unicode-utf8 a1cf21d7a124 /: Tweak the unicode FLAG_xx values for performance; collapse two identical helpers; move combine_flags() to rutf8

09:27 <kenaan_> arigo unicode-utf8 25ac6121d03c /pypy/: merge heads

09:33 <kenaan_> arigo unicode-utf8 16bfad77e3d5 /pypy/objspace/std/: Tests and fixes for 'allow_surrogates=True' in various unicode methods

09:38 <arigato> we should actually have a StringBuilder and unichr_as_utf8_append() that computes the flag for us, too

09:41 <fijal> arigato: ah

09:44 marr has joined #pypy

09:45 <antocuni> arigato: thanks for fixing vmprof

09:46 <antocuni> although r1cc101a9ee5a apparently is not enough. I have an example using eventlets in which pypy nightly doesn't record any samples, whereas pypy 5.9 does :(

09:48 <kenaan_> arigo unicode-utf8 dc6582a05b85 /: Review for surrogates

09:49 <arigato> antocuni: well, no tests fail :-(

09:49 <antocuni> arigato: sure, I don't think it's your fault

09:50 <antocuni> I mean, I can observe this buggy behavior also before 1cc101a9ee5a

09:52 <arigato> fijal: just to be clear, sys.maxunicode == 0x10ffff in any future pypy even on platforms where it isn't the case in CPython 2.7, right?

10:01 <antocuni> to be more precise, this is an example which shows the problem: http://paste.openstack.org/show/627200/

10:15 <fijal> arigato: yes

10:15 <fijal> arigato: that question no longer makes any sense, in a way

10:15 <fijal> (but we need to keep track of size of WCHAR anyway)

10:16 <arigato> yes

10:18 <fijal> arigato: I have electricity, but I need to go outside

10:18 <fijal> feel free to apply any form of refactoring

10:18 <fijal> like faster version of check, string builder etc

10:18 <fijal> I'm not TOO happy how haphazard the stuff is right now

10:18 <fijal> arigato: ah, and I'm fine doing the mechanical work of adjusting the current solutions :)

10:37 <arigato> I'm kind of happy that you're not too happy about it :-)

10:40 <arigato> I guess I'll try to have a version of StringBuilder that is really a Utf8StringBuilder

10:40 antocuni has quit [Ping timeout: 268 seconds]

10:43 <arigato> also, UTF8_INDEX_STORAGE seems to have added one level of indirection

10:43 <arigato> if the goal was only to provide a place for the flag without overhead, then that is missed

10:46 oberstet has joined #pypy

10:55 <fijal> what do you mean?

10:55 <fijal> we still need storage for the index stuff no?

10:56 <arigato> of course, but I'm complaining that it is now one indirection farther

10:57 <fijal> ok?

10:57 <arigato> also, the *only* use of the flag "HAS_SURROGATES" is in unicode.encode('utf8')?

10:57 <arigato> is that right?

10:57 <fijal> yes

10:57 <fijal> but ASCII is used a bit more

10:58 <arigato> maybe we can have a lightweight solution for "HAS_SURROGATES"

10:58 <arigato> still thinking

10:59 <arigato> I'm thinking about something that takes care of bytes.decode('utf8').encode('utf8') but not necessarily more complicated cases

10:59 the_drow has quit [Ping timeout: 260 seconds]

10:59 <fijal> I think it's kind of important to have encode that does not scan the string

11:00 <fijal> on strings that you might have gotten from splitting other strings, for example

11:00 <arigato> ok

11:00 <arigato> as implemented now I'm unsure you win in the end

11:01 <arigato> i.e.

11:01 <arigato> it creates overhead a bit everywhere

11:01 <fijal> keeping track of the flag?

11:01 <arigato> both in runtime cost and in complexity of implementation

11:01 <arigato> yes

11:01 <fijal> right, but knowing it's ascii is very important

11:01 <arigato> starting with how UTF8_INDEX_STORAGE is now two allocations instead of one

11:02 <arigato> yes

11:02 <fijal> why is it two allocations?

11:02 <fijal> in the common case it should be zero, no?

11:03 <arigato> well, maybe, but in case it's != 0, then it's 2

11:03 <fijal> why is it 2?

11:03 <fijal> because struct and array?

11:04 <arigato> yes

11:04 <arigato> it means that every indexing is slower

11:04 <fijal> so you're worried about this level of indirection, not about keeping the flag at all, ok

11:04 <arigato> no, that's a consequence

11:04 <fijal> arigato: sorry, please explain what you actually mean, it took us two pages to understand what you are after

11:04 <arigato> we didn't so far :-)

11:05 <arigato> I am saying that indexing in a string is now slower, because it needs to walk through one more pointer indirection

11:05 <fijal> no, "the overhead of keeping a flag this way on UTF8_INDEX_STORAGE" is very different from "why do we need to keep HAS_SURROGATES at all"

11:05 <fijal> yes ok, but that took me two pages to understand

11:05 <fijal> there are other ways

11:06 <fijal> like we can keep an array one longer and the first item of array is always a flag

11:07 <fijal> that increases complexity a bit, but removes a level of indirection

11:07 <arigato> that doesn't seem to be the problem...

11:07 <arigato> you can define the GcStruct in a way that it starts with 'flag' and then has a GcArray, not a Ptr to it

11:08 <fijal> that would really confuse the JIT ;-)

11:08 <arigato> the problem is that you can't do it because of the UTF8_HAS_SURROGATES constant

11:08 <fijal> but maybe

11:08 <arigato> right

11:08 <fijal> why?

11:08 <fijal> the constant is just an id of something, it can be anything

11:08 <fijal> can also be a different GcStruct

11:09 <arigato> so that means that at every character indexing, you need to check "is it actually equal to UTF8_IS_ASCII? is it actually equal to UTF8_HAS_SURROGATES?"

11:11 <arigato> this is overhead

11:11 <fijal> no, because the length is zero

11:11 <fijal> we can have instead of NULL some other thing

11:11 <fijal> sure it's a bit more than just a pointer check, but not that much more

11:11 <fijal> and then just check the length
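
To make the trade-off concrete, here is a hypothetical Python model of an index storage where the ASCII case is signalled by an empty offset list, so indexing pays one length check before the fast path, and non-ASCII strings record a byte offset every 64 code points. This is not PyPy's actual UTF8_INDEX_STORAGE layout; the class name, the helper and the block size of 64 are all assumptions for illustration.

```python
BLOCK = 64  # hypothetical: record one byte offset every 64 code points


class IndexStorage:
    """Hypothetical index over a UTF-8 string: offsets[k] is the byte offset
    of code point k * BLOCK. An empty list stands for 'ASCII', where the
    code point index equals the byte index."""

    def __init__(self, utf8):
        self.offsets = []
        if utf8.isascii():
            return
        count = 0
        for pos, b in enumerate(utf8):
            if b & 0xC0 != 0x80:              # a code point starts here
                if count % BLOCK == 0:
                    self.offsets.append(pos)
                count += 1


def byte_offset(utf8, storage, index):
    """Byte offset of code point `index` (assumed to be in range)."""
    if not storage.offsets:                   # the single check for the ASCII case
        return index
    pos = storage.offsets[index // BLOCK]
    for _ in range(index % BLOCK):            # walk the remaining code points
        pos += 1
        while utf8[pos] & 0xC0 == 0x80:       # skip continuation bytes
            pos += 1
    return pos
```

Every non-ASCII lookup costs one list access plus a walk of at most 63 code points; whether the extra check in the ASCII path is measurable is exactly the open question above.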

11:11 <arigato> well, these 500 ifs are a lot

11:11 <fijal> I'm not even sure it does not disappear in the noise of character checking with 500 ifs....

11:11 <fijal> it would need to be measured

11:11 <arigato> I know we said that looping over characters is very slow in CPython so it's ok if it's slowish in PyPy

11:11 <arigato> I'm still trying to optimize a bit :-)

11:11 the_drow has joined #pypy

11:11 <fijal> yes :-)

11:12 <fijal> arigato: also note that we probably don't have a single benchmark that would actually execute that part

11:12 <arigato> and again, that's both about the runtime cost and the complexity of implementation adding checks everywhere

11:12 <fijal> so it's a very open question how much do we care

11:12 <fijal> complexity is in single place, common

11:13 <fijal> I think you're overthinking that a tiny bit

11:13 <arigato> well, not the combine_flags mess, but I see

11:13 <fijal> no, but that's different

11:13 <fijal> that's keeping flags at all

11:13 <fijal> can we agree what's the topic of the conversation first?

11:13 <fijal> do you:

11:13 <fijal> a) not like keeping flags at all

11:13 <fijal> b) not like the current flag layout

11:14 <arigato> I mostly rant about the complexity I see everywhere, which is here *only* to speed up .encode('utf8'), because it's unclear to me that the win here is not lost in the slow-down everywhere else

11:14 <fijal> so no

11:15 <fijal> because we also keep the ascii flag

11:15 <fijal> which is far more important

11:15 <fijal> I'm ok arguing for the speeding up of only encode('utf8') btw, but this is not the topic right now, because of the ascii

11:16 <arigato> I don't fully see, because adding the "no surrogates" flag appears to require more implementation efforts than adding just the "ascii" flag, particularly around UTF8_INDEX_STORAGE

11:16 <fijal> how would you add just the ascii flag?

11:16 <arigato> I'm ok with "if it points to UTF8_IS_ASCII"

11:17 <fijal> so is it about a) or b)?

11:17 <fijal> I'm sorry, but each time I'm trying to have an argument, you jump between those two topics

11:17 <arigato> sorry, have to go soon

11:17 <fijal> the difference is:

11:17 <fijal> a) has the complexity problems

11:17 <fijal> and b) has the overheads that can be addressed, but not the complexity

11:18 <arigato> for example, I'm not sure that keeping "is ascii" and "has surrogates" in the same place makes sense

11:18 <arigato> maybe it does, but that is open

11:18 <fijal> ok

11:19 <fijal> that's b) right?

11:19 <arigato> that's neither a) nor b), because that's saying "the way you handle these two flags maybe needs to be different"

11:19 <arigato> as in,

11:19 <arigato> different from each other

11:20 <fijal> ok, maybe

11:20 <fijal> but if we need to have flags then we need to have all that complexity

11:20 <fijal> also, the complexity goes somewhere else than the runtime costs go

11:21 * arigato -> really away

11:21 <fijal> so having them is one problem and how do we store them is another problem

11:21 <arigato> sorry, mostly ranting here

11:21 <fijal> yes, but it's not very comprehensible to me what exactly is the problem

11:21 <fijal> arigato: ok see you, let's chat tonight

11:23 <fijal> ronan: (for logs) I don't believe what you did belongs to this branch at all

11:23 <fijal> either a) do it in default and b) merge to the branch and only then do c) move the stuff from unicode to utf8 on the branch

11:24 <fijal> or a) move unicode to utf8 on the branch, merge the branch, refactor later

11:31 <bbot2> Failure: http://buildbot.pypy.org/builders/rpython-win-x86-32/builds/15 [default]

11:43 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-armhf-v7/builds/1311

12:13 raynold has quit [Quit: Connection closed for inactivity]

12:26 <fijal> arigato: my take is that you're unhappy about something, but I don't exactly know what and you don't seem to know either

12:26 <fijal> If flag tracking then utf8stringbuilder should deal with it

12:27 <fijal> If flag storage then we have a few options

12:27 <fijal> I seriously doubt a single extra pointer comparison is measurable though. Most of the cost is probably amortized building of index

12:40 jcea has joined #pypy

12:52 oberstet has quit [Ping timeout: 252 seconds]

13:10 Rhy0lite has joined #pypy

13:10 adamholmberg has joined #pypy

13:19 oberstet has joined #pypy

13:28 antocuni has joined #pypy

13:32 adamholmberg has quit [Remote host closed the connection]

13:32 adamholmberg has joined #pypy

13:37 adamholmberg has quit [Ping timeout: 240 seconds]

13:40 adamholmberg has joined #pypy

13:45 slacky__ has quit [Remote host closed the connection]

13:45 slackyy has joined #pypy

13:51 adamholmberg has quit [Remote host closed the connection]

13:52 adamholmberg has joined #pypy

13:53 adamholmberg has quit [Read error: Connection reset by peer]

13:53 adamholm_ has joined #pypy

14:03 adamholm_ has quit [Remote host closed the connection]

14:10 the_drow has quit [Ping timeout: 240 seconds]

14:23 ronan has joined #pypy

14:24 the_drow has joined #pypy

14:37 <fijal> arigato: ping

14:38 <fijal> arigato: ah sorry your commits are from the morning

14:38 <fijal> please check with me before doing more changes :)

14:41 <arigato> fijal: sure

14:41 <arigato> sorry about this morning

14:41 <fijal> arigato: no worries :-)

14:41 <kenaan_> arigo unicode-utf8 a94b5860dbb3 /: Fixes for _cffi_backend

14:41 <fijal> arigato: I'm adding half of your suggestions (Utf8StringBuilder and Utf8StringIterator)

14:41 <arigato> I just pushed fixes to _cffi_backend, nothing more

14:41 <fijal> I think it's important to get interfaces first

14:41 <fijal> and we can tweak the actual details later

14:42 <fijal> so we only need to agree whether keeping any flags at all makes sense

14:42 <arigato> agreed

14:42 <fijal> let me push that and we can have a quick look

14:43 <arigato> note that UTF8_INDEX_STORAGE being changed to avoid the Ptr in 'contents' would not make the JIT unhappy: the JIT is *already* unhappy about the structure, and it's not seeing it

14:43 fryguybob has quit [Ping timeout: 260 seconds]

14:43 <arigato> (it contains a GcArray of Struct, moreover with a FixedSizeArray)

14:44 <fijal> ok

14:44 <fijal> it would make some analyzer unhappy I think

14:44 <fijal> (I whacked at it, but just a bit, not sure if enough)

14:44 <fijal> then that's a non-controversial change

14:44 <fijal> would that fix the problem?

14:45 <arigato> still unhappy about the number of checks for a simple __getitem__, which I tried very hard to keep to a minimum

14:45 <arigato> but that can come later

14:45 fryguybob has joined #pypy

14:47 <fijal> right

14:47 <fijal> I kind of agree, but I would vote for pushing towards having everything compiling so we can actually do measurements

14:48 <fijal> arigato: note that some tests fail for me

14:48 <arigato> fijal: agreed

14:49 <fijal> ok, something is off

14:49 <arigato> fijal: for me too. unless you mean inside _cffi_backend, in which case, not on linux

14:49 <fijal> arigato: test_rutf8

14:49 <fijal> hypothesis caught some examples for me

14:50 <fijal> I'll look into them

14:50 <kenaan_> fijal unicode-utf8 9ede67aee27e /rpython/rlib/: Utf8StringBuilder

14:50 <kenaan_> fijal unicode-utf8 3e45feebc910 /: merge

14:50 <arigato> passing there (I guess it means I didn't run it often enough)

14:50 <fijal> yes, something like that

14:50 <fijal> should I add failing examples?

14:50 <fijal> you can list them with @examples or something

14:51 <fijal> arigato: please tell me if this is an interface that you wanted

14:51 <fijal> (interface, not the actual implementation)

14:51 <arigato> yes, would be nice

14:52 <arigato> so .append() is for already-checked, valid utf8 strings?

14:52 <fijal> yes

14:52 marr has quit [Ping timeout: 248 seconds]

14:52 mattip has joined #pypy

14:53 <arigato> and as usual, we need to be careful when calling append_code() because it could raise ValueError

14:53 <arigato> I don't know how to improve that

14:53 <arigato> in many places we know by construction that code <= 0x10ffff

14:53 <fijal> yeah

14:53 <arigato> in other places we need to catch the ValueError or else we crash
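
The interface being reviewed (append() for already-validated UTF-8, append_code() for a single code point) can be modelled in plain Python to show where the ValueError comes from. Apart from the class name and those two method names, everything here (the constructor, build(), get_length(), get_flag() and the flag values) is an assumption, not the actual rpython/rlib/rutf8.py code.

```python
# Hypothetical flag values, chosen so that OR combines them.
FLAG_ASCII, FLAG_REGULAR, FLAG_HAS_SURROGATES = 0, 1, 3


class Utf8StringBuilder:
    """Hypothetical model: accumulates UTF-8 bytes while tracking the
    code point count and the combined flag as it goes."""

    def __init__(self):
        self._parts = []
        self._length = 0
        self._flag = FLAG_ASCII

    def append(self, utf8, length, flag):
        # `utf8` must already be valid UTF-8; length and flag are trusted, not rechecked.
        self._parts.append(utf8)
        self._length += length
        self._flag |= flag

    def append_code(self, code):
        # Callers must be ready for this unless they know code <= 0x10ffff.
        if not 0 <= code <= 0x10FFFF:
            raise ValueError("code point out of range: %#x" % code)
        if code < 0x80:
            flag = FLAG_ASCII
        elif 0xD800 <= code <= 0xDFFF:
            flag = FLAG_HAS_SURROGATES
        else:
            flag = FLAG_REGULAR
        # 'surrogatepass' lets lone surrogates be encoded as 3-byte sequences.
        self._parts.append(chr(code).encode("utf-8", "surrogatepass"))
        self._length += 1
        self._flag |= flag

    def build(self):
        return b"".join(self._parts)

    def get_length(self):
        return self._length

    def get_flag(self):
        return self._flag
```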

14:54 <arigato> fine about Utf8StringBuilder

14:55 <arigato> Utf8StringIterator is probably helpful

14:55 <fijal> yes, it's just untested yet

14:57 <kenaan_> fijal unicode-utf8 d24fe4f59c96 /rpython/rlib/test/test_rutf8.py: provide explicit examples

14:58 <arigato> note that w_u._has_surrogates() means "does this unicode string contain surrogates", right? there was some confusion in _cffi_backend

14:59 <fijal> yes

14:59 oberstet2 has joined #pypy

14:59 <fijal> it should really be .has_surrogates() without an _

14:59 <arigato> I think you called it with the expectation of it answering the question "does this unicode string use chars >= 0x10000"

14:59 <fijal> uh

14:59 <fijal> no, that was never the intention (even if I did)

15:01 <arigato> ok, then unicode_size_as_char16() != w_u._len() if and only if there are chars >= 0x10000

15:01 oberstet has quit [Ping timeout: 255 seconds]

15:02 <arigato> it's not related to surrogates chars
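
In other words, the number of UTF-16 code units is the code point count plus one extra unit per character at or above U+10000, whether or not the string contains lone surrogates. A small Python check of that relation (the function name is hypothetical, echoing unicode_size_as_char16()):

```python
def size_as_char16(u):
    """Number of 16-bit code units needed to store `u` as UTF-16 (hypothetical helper)."""
    # Each code point >= 0x10000 needs a surrogate pair (2 units); everything
    # else, including a lone surrogate, takes exactly 1 unit.
    return sum(2 if ord(c) >= 0x10000 else 1 for c in u)
```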

15:02 <fijal> ah

15:02 <fijal> I did not get that

15:02 <fijal> sorry, my _cffi_backend should probably be completely reverted, I was tired and didn't know what I was doing

15:02 <fijal> (or at the very least carefully reviewed)

15:02 <arigato> yes, I think I carefully reviewed all your changes in _cffi_backend now

15:04 <arigato> how far are we to actually translate pypy?

15:04 <arigato> and run interesting programs

15:05 <fijal> I got stopped at _io module

15:05 <fijal> not *that* far

15:05 <arigato> ok cool

15:05 <fijal> but I think with new interfaces we can do _io module quite quickly

15:06 <fijal> then there is _pypyjson and cpyext, without which we can compile

15:06 <arigato> ok

15:07 <fijal> arigato: ah the tests fail because I have a narrow build of host

15:07 <fijal> u'\U00040000'

15:07 <fijal> (Pdb++) p len(u)

15:07 <fijal> 2

15:07 <fijal> that sort of stuff

15:07 <fijal> pom pom pom

15:07 <fijal> how do I get a *real* length?

15:10 <arigato> you can't distinguish between u'\U00040000' and the two surrogates character
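
On a narrow (UTF-16) host, u'\U00040000' is stored as the pair given by the standard UTF-16 formula, which is why len() returns 2 and why it cannot be told apart from a literal pair of surrogates. A quick Python 3 check of the arithmetic (the helper name is hypothetical):

```python
def to_surrogate_pair(code):
    """Split a supplementary code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code)
    v = code - 0x10000                        # 20 significant bits
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)
```

For U+40000 this yields 0xD8C0 and 0xDC00, and encoding the explicit pair with 'surrogatepass' produces the same UTF-16 bytes as the single code point.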

15:11 <arigato> there are things in runicode that try to guess anyway

15:11 <arigato> better ask the question: which test fails and can we fix the test not to use unicode

15:11 <fijal> arigato: it's the checking of check_utf8

15:11 <fijal> arigato: so we're checking "is check_utf8 returning the right value"

15:12 <fijal> which it is, but we use python len(), which isn't

15:12 <arigato> ah, right

15:12 <arigato> a bit no clue. you can't use the "guess the length" here

15:12 traverseda has quit [Ping timeout: 240 seconds]

15:13 <fijal> I mean I have, in rutf8.check_utf8, but I'm trying to see if it works :)

15:13 <fijal> we can just skip the test on narrow build maybe?

15:13 <arigato> yes

15:14 <fijal> easy, I won't check for length if the build is narrow *and* i have surrogates

15:15 <kenaan_> fijal unicode-utf8 eb564d44a7c8 /rpython/rlib/test/test_rutf8.py: fix test on narrow host

15:18 <kenaan_> fijal unicode-utf8 fa3bcbe5b09f /rpython/rlib/test/test_rutf8.py: fix tests on narrow host

15:21 <ronan> fijal: which commits are you complaining about?

15:21 <fijal> ronan: well, most of them :-

15:21 <fijal> first of all, I don't understand your plan

15:21 <ronan> the plan is to get things working

15:21 <fijal> yeah ok

15:22 <fijal> but then why do you do random refactoring?

15:22 <fijal> like, why did you add back stuff to space?

15:22 <fijal> why did you change [] to UnicodeBuilder?

15:23 <ronan> I changed it to UnicodeBuilder on default

15:23 <ronan> adding stuff to space allows tests to pass

15:24 <ronan> then I can remove the old things one by one

15:24 traverseda has joined #pypy

15:25 <fijal> so it is more like default now or less so?

15:25 marr has joined #pypy

15:25 <fijal> ah ok

15:25 <fijal> no, that does make sense sorry

15:25 <ronan> it's a bit more like default

15:26 <ronan> I mostly did parallel changes

15:26 <fijal> yeah I see

15:26 <fijal> no that makes sense

15:27 <ronan> I forgot about hg graft doing the wrong thing by default, though, sorry

15:27 <fijal> ronan: I added Utf8StringBuilder now

15:27 <fijal> that tracks flags and stuff

15:27 <ronan> good

15:27 <fijal> and Utf8StringIterator

15:29 <ronan> hmm, TextIOWrapper.readline() still needs some refactoring in order to use that

15:30 <ronan> ATM, it really relies on doing unicode[:n], where n is just a number

15:31 <fijal> how does it find n?

15:31 <ronan> by doing arithmetic on unicode indexes

15:32 <fijal> but we can do all that arithmetic on byte indexes too right?

15:32 <kenaan_> fijal unicode-utf8 e4a568e4514c /rpython/rlib/test/test_rutf8.py: more tests

15:32 <ronan> no

15:33 <ronan> well, not without a refactoring

15:33 <fijal> why not?

15:33 <fijal> they all have corresponding indexes in utf8
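
The point being argued is that any code point index n used for unicode[:n] has a corresponding byte offset in the UTF-8 buffer. A naive linear Python sketch of that mapping (the helper name is hypothetical; the branch would presumably rely on its index storage rather than a scan):

```python
def codepoint_to_byte_index(utf8, n):
    """Byte offset of code point `n` in valid UTF-8; n may equal the length."""
    count = 0
    for pos, b in enumerate(utf8):
        if b & 0xC0 != 0x80:          # a code point starts here
            if count == n:
                return pos
            count += 1
    if count == n:                    # n == number of code points: end of buffer
        return len(utf8)
    raise IndexError(n)
```

With it, slicing the first n code points of a UTF-8 buffer becomes utf8[:codepoint_to_byte_index(utf8, n)].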

15:34 <fijal> ronan: are you ok continuing with _textio?

15:35 <fijal> if so, I would whack at the rest

15:35 <ronan> yes, I'm fine working on it

15:41 <ronan> fijal: merging default into the branch would be helpful

15:41 <fijal> ok

15:46 <kenaan_> fijal unicode-utf8 177352fb8cf4 /: merge default

15:48 <fijal> ronan: done, the crashes were strange

15:50 <ronan> ta

15:50 <fijal> sorry, conflicts

15:55 adamholmberg has joined #pypy

15:58 adamholmberg has quit [Remote host closed the connection]

15:58 adamholmberg has joined #pypy

16:03 adamholmberg has quit [Ping timeout: 240 seconds]

16:04 <mattip> antocuni: around?

16:04 <antocuni> yes

16:05 <mattip> about eventlet and your code, don't you need to use a vmprof.enable somewhere in cpuburn?

16:06 <antocuni> mattip: no, it's called inside run_profiler

16:06 <antocuni> if you try to run it, you can look at the prints and follow the order of execution

16:07 <antocuni> but basically it is: "start profiling", "cpuburn 0 to 4", "stop profiling"

16:08 <mattip> ahh, the eventlet is running two "threads" concurrently

16:09 <antocuni> yes

16:09 <antocuni> hidden inside eventlet there is a "main hub" which drives the execution

16:10 <antocuni> whenever you call eventlet.sleep() (or any other non-blocking green function), the execution is transferred to the hub, which decides which greenlet to resume
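
The control flow described here can be imitated with plain generators: each "greenlet" yields where it would call eventlet.sleep(), and a hub loop decides which one resumes next. This is only an analogy; eventlet itself switches between greenlets (separate stacks, not generators), and its hub is driven by I/O and timers rather than the simple round-robin queue assumed below. The function names are hypothetical.

```python
from collections import deque


def cpuburn(name, steps, log):
    """A toy task that records its progress and gives control back after each step."""
    for i in range(steps):
        log.append((name, i))
        yield                     # stands in for eventlet.sleep(): switch to the hub


def run_hub(tasks):
    """Round-robin 'hub': resume each task until it yields, then reschedule it."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run this task until its next switch point
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)
```

Each switch changes which execution context is active, which is presumably why stack switching has to be handled specially by a sampling profiler such as vmprof.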

16:10 <mattip> it works as it should on CPython?

16:11 <antocuni> yes, and also on pypy-5.9

16:11 <mattip> ok, now I got it thanks.

16:11 <antocuni> I think the bug is due to my recent changes to vmprof+rstacklet (which are needed to prevent segfaults)

16:11 <antocuni> but I had no time to investigate properly yet

16:12 <mattip> not sure if I can help, but I might try to look

16:12 <arigato> antocuni: you're running with the trunk, right? I know before my fixes yesterday it would sometimes leave the state to "stopped"

16:13 <antocuni> arigato: yes, with the nightly build which reports 1cc101a9ee5a

16:14 <antocuni> arigato: I have encountered this problem a couple of days ago; then yesterday I saw your commit and thought "ahah, that's the fix!"

16:14 <antocuni> but apparently, not :(

16:14 <arigato> ok :-/

16:14 <antocuni> it might be a similar problem, for all I know

16:15 <mattip> arigato: thanks for fixing the non-linux translations. There are failing "own" tests, seems simple, I am looking

16:16 <arigato> thanks to you

16:16 <mattip> antocuni: another topic - cpyext-avoid-roundtrip should get merged, correct?

16:16 <antocuni> yes

16:16 <antocuni> I think it is ready to be merged; last time I tried, it "passed" all the numpy and pandas tests

16:16 <antocuni> "passed" as: it is not more broken than default :)

16:17 <mattip> yes, seems to speed up numpy test suite by ~10%

16:17 <antocuni> I think I asked arigato to review, but then both of us forgot about it

16:18 <mattip> are there corners that need careful review, or can we simply merge?

16:19 <antocuni> I think that the biggest change which I made to the branch after the cape town sprint is 770b53602445, i.e. the merging of cpyext-refactor-methodobject

16:19 <antocuni> this is probably worth a review

16:22 jcea has quit [Quit: jcea]

16:29 <fijal> pom pom pom

16:30 <fijal> arigato: do we care about how multibytecodec.incremental works?

16:30 <fijal> right now it's kinda silly

16:39 Nizumzen has joined #pypy

16:50 <kenaan_> rlamy default ff05ee1c4b6a /pypy/module/_io/interp_textio.py: refactor

16:51 slackyy has quit [Ping timeout: 268 seconds]

17:30 raynold has joined #pypy

17:31 <kenaan_> fijal unicode-utf8 99ca8cf9bbc4 /pypy/module/_multibytecodec/: fix multibytecodec

17:31 <fijal> ronan: can you park your additions to objspace on a branch?

17:31 <fijal> it kinda breaks my workflow, or suggest some better solution

17:31 Nizumzen has quit [Ping timeout: 240 seconds]

17:35 <fijal> maybe I can be just careful about commits?

17:50 <ronan> fijal: I think we should keep some of those additions for tests and/or defining constants

17:50 <ronan> but feel free to deal with them however you want

17:55 <kenaan_> rlamy default 8369cd92f7d0 /pypy/module/_io/interp_textio.py: Simplify _find_line_ending() and fix logic in the case of embedded \r and self.readnl=='\r\n'
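
The behaviour that commit message describes can be sketched in Python: with a specific readnl such as '\r\n', a lone '\r' inside the line must not end it, whereas in universal-newline mode any of '\r', '\n' or '\r\n' does. This is an illustrative sketch, not the interp_textio.py code; the signature is an assumption, and the universal branch assumes that a '\r\n' pair is never split across decoded chunks.

```python
def find_line_ending(line, readnl=None):
    """Return the index just past the first line ending in `line`, or -1.
    readnl=None means universal newlines (CR, LF or CRLF)."""
    if readnl is None:
        for i, ch in enumerate(line):
            if ch == "\n":
                return i + 1
            if ch == "\r":
                # Assumes CRLF is never split across chunks, so a CR
                # not followed by LF here really is a line ending.
                if i + 1 < len(line) and line[i + 1] == "\n":
                    return i + 2
                return i + 1
        return -1
    # A specific terminator: embedded pieces of it (e.g. a lone CR) do not count.
    pos = line.find(readnl)
    return -1 if pos < 0 else pos + len(readnl)
```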

17:56 <fijal> ronan: defining constants?

17:56 <fijal> well, it makes everything pass, while it actually shouldn't

17:56 <fijal> that's the problem

17:56 <fijal> sure, you can kill them, redo them etc.

17:57 <fijal> but it's kinda around case, why not make a branch where you can deal with _textio on your own instead?

17:57 jcea has joined #pypy

18:00 <ronan> if you prefer it that way, that's fine by me

18:00 <fijal> cool :-)

18:00 <fijal> ronan: and immediately as I said that, I run into the fact that I might need it for sre ;-)

18:01 <fijal> but no, I don't think so

18:01 <fijal> arigato: feel like also doing sre stuff?

18:02 <fijal> it's a tiny bit fragile I think