cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<utevo_lux>
so i realize the following is fairly cursed, and probably something you don't wish to support, but figured I'd bring it up anyways :p
<utevo_lux>
a few projects of mine are packaged as an "sfx" python file, where a small unpacker (plain python) is followed by the package+dependencies in a binary encoding - similar to the shellscript+tarfile approach commonly used on unix
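A minimal sketch of such a self-extracting file, assuming the compressed archive is simply appended after a marker line (the marker, function name, and extraction directory here are invented for illustration, not utevo_lux's actual format):

    # everything above the marker is the plain-python unpacker; everything after
    # it is the binary payload
    import io
    import sys
    import tarfile

    MARKER = b"\n# ==PAYLOAD==\n"

    def unpack(target="unpacked"):
        with open(__file__, "rb") as f:
            prologue, sep, payload = f.read().partition(MARKER)
        if not sep:
            raise RuntimeError("no payload found after the unpacker")
        # "r:*" lets tarfile auto-detect the compression of the embedded archive
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tf:
            tf.extractall(target)
        sys.path.insert(0, target)  # make the unpacked package+dependencies importable

    if __name__ == "__main__":
        unpack()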
<utevo_lux>
this works fine in all cpython versions; pypy, however, gets angry in the tokenizer, raising bad_utf8 (understandably)
<utevo_lux>
i could modify the binary encoding to only consist of legal utf-8 codepoints, but that would make the file bigger, so mostly i'm curious why the utf-8 enforcement applies to comments as well?
<cfbolz>
utevo_lux: heh, I appreciate this beast ;-)
<cfbolz>
utevo_lux: isn't the solution to declare the encoding in the first line to be latin-1? then all bytes are valid
<cfbolz>
(I just tried, seems to work for pypy/cpython 2/3)
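For concreteness, the fix amounts to a PEP 263 encoding declaration on the first (or second) line of the sfx file; after that, every byte value 0x00-0xFF in the rest of the file decodes to exactly one character. A sketch, assuming the payload sits in comment lines as described above:

    # coding: latin-1
    # the binary payload kept in comment lines below now decodes cleanly as
    # latin-1, so neither cpython's nor pypy's tokenizer rejects it
    print("plain python up here; the opaque payload follows further down")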
<utevo_lux>
man why didn't I think of that, of course that works :p
<utevo_lux>
finally the #coding: pragma is actually useful for something hehe
<nimaje>
why would you tokenize before decoding? how would you even? as you can't know what the bytes mean without decoding
<cfbolz>
utevo_lux: just out of curiosity, why is it only tarred, and not eg zipped?
<utevo_lux>
it only relies on the python stdlib, so tar.bz2 seemed like the best choice (note the :bz2 filter passed to tarfile.open)
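A sketch of the corresponding packing side, stdlib only, with the bz2 filter in the mode string (the package and dependency paths are hypothetical):

    import io
    import tarfile

    # pack the package and its vendored dependencies into an in-memory tar.bz2 blob
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tf:
        tf.add("mypackage")   # hypothetical package directory
        tf.add("deps")        # hypothetical vendored dependencies
    payload = buf.getvalue()  # bytes to embed after the unpacker in the sfx file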
<utevo_lux>
as for tokenizing without decoding, I can't think of any character encodings that would mess with whitespace or newlines, so it would probably be safe -- but it would likely be a performance drag, yeah
<cfbolz>
utevo_lux: ah, right
<cfbolz>
utevo_lux: anyway, this is fun, me-approved ;-)