<enebo>
lopex: we are missing a low level sprintf for error messages which appropriately prints out non-ascii strings to stdout/err
byteflame has joined #jruby
<enebo>
lopex: if I print out ASCII via System.out.println it is fine
<lopex>
in sprintf.c ?
<enebo>
lopex: if I print out Bytelist which happens to be UTF-8 I can end up with escaped chars
<enebo>
lopex: if I print out EUC-JP I end up seeing I think UTF-8 escaping syntax (not sure)
<lopex>
enebo: print call toString right ?
<lopex>
*calls
<enebo>
lopex: println in Java calls toString and that is goofy for ByteList
<enebo>
lopex: in sprintf.java I fixed some error messages to pass tests in MRI
<enebo>
lopex: but I basically made the ByteList into a RubyString which then renders it nicely
<enebo>
lopex: unfortunately too nicely...if it is EUC-JP it transcodes to UTF-8 and prints out the char but MRI will print it out in some escape format
<lopex>
hm
<enebo>
lopex: I tracked this all down to their BSD_vfprintf impl but it is kind of confusing how they get those escapes
<lopex>
where is the distinction for EUC-JP ?
<lopex>
rb_enc_vsprintf ?
<lopex>
wow
<enebo>
lopex: I think it is something like it is not valid chars for output so they just hex them into things like: named<\242\251> and if we use escape it is: named<\x{A2A9}>. My current fix prints it like: named<〒>
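For reference, the three renderings mentioned here can be reproduced from plain Ruby. This is only a sketch of the output forms, not of how MRI's BSD_vfprintf path arrives at them, and the variable names are made up for illustration.

```ruby
# The two EUC-JP bytes for the postal mark discussed above.
name = "\xA2\xA9".dup.force_encoding(Encoding::EUC_JP)

# Octal escapes, one per raw byte (the form MRI's message shows).
octal = name.bytes.map { |b| format("\\%03o", b) }.join

# Transcoding to UTF-8 (what the RubyString-based fix ends up printing).
utf8 = name.encode(Encoding::UTF_8)

# String#inspect picks its own escape form; the result depends on the
# default external/internal encodings of the running process.
inspected = name.inspect

puts "named<#{octal}>"
puts "named<#{utf8}>"
puts inspected
```

The octal form works on raw bytes and never consults the encoding, which is consistent with the guess that MRI just passes the bytes through.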
<lopex>
or it's just a range + encoding combined ?
<enebo>
In one sense the current fix seems more logical to me if I can see the EUC-JP in UTF-8 why not display it as such
<enebo>
lopex: well I could not fully understand how BSD_vfprintf worked. I believe they just pass the raw bytes to it
<enebo>
s/hex them/octalize them/
<lopex>
but there's lots of flags + vargs
<enebo>
heh yeah it is massive
<lopex>
what's the caller ?
<enebo>
but logically I am trying to understand what it does
<enebo>
lopex: rb_enc_raise(enc, rb_eArgError, "named%.*s after numbered", len, name);
<enebo>
A call like that
<enebo>
so name is just a char * or EUC_JP bytes
<lopex>
and that enc varies ?
<enebo>
yeah it is whatever those bytes are encoded as
<lopex>
enebo: so maybe external encoding is mixed up somewhere ?
<enebo>
in sprintf.c there is rb_env_vsprintf
<enebo>
OHHHH
<enebo>
ruby__sfwrite
<enebo>
wtf is this
<lopex>
er rb_enc_vsprintf ?
<enebo>
rb_enc_raise calls this and this is last function before BSD_vfprintf
<lopex>
also, format enc might be negotiated somewhere too ?
<enebo>
but it reads each char through a function via a pointer to function and I just noticed there might be something in this method I missed
<enebo>
what is 'format enc'?
<lopex>
we might have some stale transcoding tables too
<lopex>
enebo: the result encoding is built from string itself and format encodings right ?
subbu|lunch is now known as subbu
<enebo>
lopex: enc in those calls are the enc of the format string
<enebo>
lopex: so if the format was EUC-JP then it would display nicely probably
<enebo>
lopex: so I guess that is an important detail that the message is cramming the bytes into the encoding of the format
<lopex>
or null right ?
<enebo>
or null?
<enebo>
the enc?
<lopex>
they check if (enc) { in there
<enebo>
lopex: lots of code to follow through
<enebo>
lopex: enc = rb_enc_get(fmt);
<lopex>
if so then whatever used rb_str_buf_new(f._w) with will be there
<enebo>
So either this can return NULL or something somewhere changes it to NULL
<enebo>
but I don't know of any place where it is null
<enebo>
but perhaps
<enebo>
I don't know if I care at this point
<enebo>
If I am default_external to UTF-8 then my main concern is making EUC-JP bytelist rendered into it (or anything which is not really UTF-8) display in an escaped format
<lopex>
I'm more slow today than usual
<enebo>
right now we just System.out.println() which toString() whatever into a Java String
<enebo>
lopex: even if you are not following it has helped me think about this
<lopex>
enebo: afaik String#inspect does something like that
<lopex>
so there might be clue
<enebo>
lopex: basically when we make strings for errors we use println and println knows nothing about our Ruby encodings
<enebo>
lopex: so our error string logic should basically concat all the bytes like a Ruby String and like String#inspect escape them when the encodings are not compatible
<enebo>
lopex: I do think our escaping in inspect is not perfect (I think we sometimes print \x instead of \u in cases where we should be doing one or the other)
<enebo>
lopex: but I think ultimately we need to convert all our error reporting which sucks in a bytelist or even a RubyString into what amounts to a RubyString
<enebo>
any arguments which comprise message X will need to escape if they are not compatible with encoding Y
<lopex>
hmm, that's not that call path though
<enebo>
which is pretty much what inspect does (although it does more like add ")
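The idea of escaping any argument that is not compatible with the message encoding could look roughly like this. It is a minimal sketch, assuming an ASCII-compatible target encoding and byte-wise octal escapes; `escape_for_message` and `build_message` are hypothetical names, not JRuby or MRI APIs.

```ruby
# Hypothetical helper: render +arg+ so it can be appended to a message
# that lives in the +target+ encoding.
def escape_for_message(arg, target)
  # Same encoding and well-formed: the bytes can be used directly.
  return arg.dup if arg.encoding == target && arg.valid_encoding?
  # Pure ASCII is safe in any ASCII-compatible target.
  return arg.dup.force_encoding(target) if arg.ascii_only? && target.ascii_compatible?

  # Otherwise keep printable ASCII and octal-escape every other byte.
  escaped = arg.bytes.map do |b|
    b.between?(0x20, 0x7E) ? b.chr : format("\\%03o", b)
  end.join
  escaped.force_encoding(target)
end

# Hypothetical helper: concatenate all parts into one message in +target+.
def build_message(target, *parts)
  parts.map { |part| escape_for_message(part, target) }.join.force_encoding(target)
end

euc_name = "\xA2\xA9".dup.force_encoding(Encoding::EUC_JP)
puts build_message(Encoding::UTF_8, "named<", euc_name, "> after numbered")
```

Unlike String#inspect this adds no surrounding quotes, which is the difference noted above.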
<enebo>
lopex: I have thought about doing something about this wrong reporting for a long time but never really looked into it...The coincidence of it all is this morning I was fixing errors in sprintf() itself
<lopex>
hah
<lopex>
so, er
<lopex>
the compat check should be the string against external ?
<enebo>
lopex: in the case of an error message yeah
<enebo>
lopex: I believe all error reporting ends up as whatever logic determines external encoding
<enebo>
lopex: but more generally I don't think that it is important to think about it as external
<lopex>
so just like inspect
<enebo>
lopex: we just want to move what we have been doing with Java String processing over to Ruby String processing against a particular encoding
<enebo>
lopex: which yeah I think is inspect
<enebo>
lopex: although as far as I can tell they do not implement it using inspect in MRI so maybe there is a subtle difference
<lopex>
lolz let's look for x%02X
camlow325 has quit [Ping timeout: 258 seconds]
<enebo>
11 references
<lopex>
madness
<enebo>
lopex: not sure the code which is printing this for BSD_vfprintf is using that format in it
<lopex>
enebo: what's the test case ?
<lopex>
enebo: I can look at it tomorrow, I'm too slow now
<enebo>
rb_enc_raise(enc, rb_eArgError, "named%.*s after unnumbered(%d)", len, name, posarg);
<lopex>
but the actual test ?
<enebo>
def test_named_untyped_enc
<enebo>
actually multiple tests in test_sprintf.rb
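A minimal way to hit this error path from Ruby is a named reference that follows an unnumbered one. The sketch below is not the MRI test itself; `sprintf_error` is a hypothetical helper, and the second call only swaps in an EUC-JP format string so the message encoding can be inspected.

```ruby
# Returns the ArgumentError raised by the format, or nil if none was raised.
def sprintf_error(fmt)
  sprintf(fmt, 1, {})
  nil
rescue ArgumentError => e
  e
end

ascii_error = sprintf_error("%d%<a>d")
euc_error = sprintf_error("%d%<\u3012>d".encode(Encoding::EUC_JP))

puts ascii_error.message
# The encoding of the second message is the interesting part of this discussion.
p euc_error.message.encoding
```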
<lopex>
ok
<lopex>
linux ?
<enebo>
anywhere...
<lopex>
excluding windows :P
<enebo>
I am sure it is the same issue there too perhaps worse though
<enebo>
Exactly the same actually if you are not writing to console
<GitHub131>
[jruby] enebo pushed 2 new commits to ruby-2.4: https://git.io/vyEL3
<GitHub131>
jruby/ruby-2.4 782ea03 Thomas E. Enebo: Implement Feature #12686.
<GitHub131>
jruby/ruby-2.4 e7843ea Thomas E. Enebo: Generalize to use whatever ruby happens to be in your path since this script will work with any version of Ruby ever made.
pawnbox_ has quit [Remote host closed the connection]
camlow325 has joined #jruby
<lopex>
enebo: oh there's this capture zero overhead match
<enebo>
lopex: it will be neat to see how much faster this ends up although I guess it is the same as making a regexp of non-capturing parens
<lopex>
enebo: it's only at opEnd though
<lopex>
so only for short matches
<lopex>
but it would be measurable imho
<enebo>
lopex: what is opEnd? End of regexp itself?
<enebo>
lopex: or end of a region
<lopex>
yeah end bytecode
<lopex>
also, this method is new so not many libs use it though
<enebo>
lopex: but that means each capture is still recording offsets in some data structure right? Perhaps it has to though
<lopex>
enebo: it just copies from repeat stack
<lopex>
which makes me think we could compile differently
<lopex>
but
<lopex>
there might be backrefs
<enebo>
lopex: so repeat stack is to correctly parse the regexp and this opEnd stuff is extra allocation that is not needed because there is no capturing
<lopex>
lols, the subject is deep
<enebo>
lopex: I wonder how oniguruma did it
<lopex>
enebo: no, it's for tracking groups
<lopex>
like a side memory
<lopex>
enebo: it's only execution
<enebo>
lopex: but if you are not capturing then you do not need to keep track of groups?
<lopex>
enebo: if there's backrefs potentially
<lopex>
or calls
<enebo>
lopex: I am not totally following but that is probably ok so long as you are thinking about it :)
<lopex>
enebo: (..)\1
<lopex>
you need to remember that internally
<enebo>
oh and those work in matches? heh
<lopex>
I would assume
<lopex>
do I have 2.4 here hmm
<lopex>
p /(.)\1/.match?("aa")
<lopex>
true
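The difference between the two calls can be seen from Ruby itself (2.4 or later). A small sketch, with hypothetical method names: `match?` answers yes or no without leaving a MatchData behind, while the backreference still has to be honoured internally.

```ruby
# match? answers yes/no; inside a fresh method $~ starts out nil and
# match? leaves it that way.
def check_with_match_p(str)
  matched = /(.)\1/.match?(str)
  [matched, $~.nil?]
end

# match builds a MatchData, so group 1 is available afterwards.
def check_with_match(str)
  data = /(.)\1/.match(str)
  data && [data[0], data[1]]
end

p check_with_match_p("aa")
p check_with_match_p("ab")
p check_with_match("aa")
```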
<enebo>
lopex: ok so in the presence of a backref you need to record
<enebo>
lopex: but if you parse and there are none then you don't
<lopex>
but if there's no backrefs you already parser the groups
<lopex>
*parsed
<lopex>
and build the structs
<lopex>
*built
<enebo>
but you generate bytecode before you process
<lopex>
yes
<enebo>
err joni instrs
<enebo>
so you can walk that tree and know there are no backrefs?
<lopex>
at that point you can emit (:?...)
<lopex>
non capturing groups
<enebo>
Or just set a flag if you happen to emit one
<lopex>
by simple aast rewrites
<lopex>
ast
<enebo>
lopex: yeah that is possible
<lopex>
that's the first idea
<lopex>
er, (?:...)
<enebo>
but it would be a second pass then
<lopex>
enebo: there's always an analyser phase anyways
<enebo>
lopex: so () and (?:) are two different node types?
<lopex>
enebo: lots of laws, quantifiers reductions etc
<enebo>
yeah seems one can be translated to another if need be
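When nothing refers back to a group, the capturing and non-capturing forms give the same yes/no answer and differ only in what gets recorded. A quick sketch with made-up patterns:

```ruby
capturing     = /(ab)+(c)/
non_capturing = /(?:ab)+(?:c)/

# Both forms agree on whether each input matches.
inputs = ["ababc", "abx", "c", ""]
same_answers = inputs.all? { |s| capturing.match?(s) == non_capturing.match?(s) }

# Only the capturing form records group contents.
captures_with    = capturing.match("ababc").captures
captures_without = non_capturing.match("ababc").captures

p same_answers
p captures_with
p captures_without
```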
<enebo>
lopex: another thought is that we know which capture is \1
<lopex>
enebo: depends, at parser, or compiler time ?
<enebo>
lopex: well this is your builder code right?
<enebo>
lopex: it generates instrs from your AST?
<lopex>
enebo: the compiler
<enebo>
"your"
<lopex>
haha
<lopex>
yeah
<lopex>
enebo: aaaah
<lopex>
you mean how to pass that info
<lopex>
yeah
<enebo>
lopex: but at this point it would probably be simplest if the EncloseNode ended up being marked as Memory based on whether it is a match? regexp and whether it corresponds to a backref
<enebo>
lopex: then the compiler does not need to have any smarts at all
<lopex>
the simplest is to pass a bit flag in options
<lopex>
then not create the region
<enebo>
lopex: but Memory for enclose will not create a region right?
<lopex>
no
<enebo>
ah
<lopex>
but memory enclose will emit bytecode that will use repeat stack
<lopex>
enebo: so so many level where we could optimize out
<lopex>
levels
<enebo>
ok I don't know what that means but perhaps I don't need to :)
<lopex>
repeat stack is being copied to region ultimately
<lopex>
so three levels
<lopex>
depending on regexp and whether we need region
<lopex>
since it must be passed anyways
<lopex>
the region ""
<lopex>
the region "need"
<lopex>
enebo: oniguruma does what we discussed some time ago
<lopex>
it just passes region& as null
<lopex>
er, Onigmo now
<enebo>
ok
<lopex>
so, at parse time, we can record encloses
<lopex>
that's 1
<lopex>
then, verify backrefs, that's 2)
<lopex>
3) force non allocating region via flag for "match?" method
<enebo>
seems reasonable
<lopex>
oh, and those calls
<enebo>
at least as far as I understand it
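The three steps can be sketched on a toy AST. This is an illustration only: `Enclose`, `Backref`, `Seq`, `Lit` and `mark_memory` are hypothetical names, not joni's actual node classes or analyser, and subexpression calls are ignored.

```ruby
# Toy regexp AST nodes (hypothetical, for illustration only).
Enclose = Struct.new(:number, :child, :needs_memory)
Backref = Struct.new(:number)
Seq     = Struct.new(:children)
Lit     = Struct.new(:text)

# Steps 1 and 2: record every enclose and every backref number in one walk.
def collect(node, encloses, backrefs)
  case node
  when Enclose
    encloses << node
    collect(node.child, encloses, backrefs)
  when Backref
    backrefs << node.number
  when Seq
    node.children.each { |child| collect(child, encloses, backrefs) }
  end
end

# Step 3: for a match?-style call, keep memory only for groups that a
# backref actually needs; otherwise every group keeps its memory.
def mark_memory(root, match_only:)
  encloses = []
  backrefs = []
  collect(root, encloses, backrefs)
  encloses.each do |enclose|
    enclose.needs_memory = !match_only || backrefs.include?(enclose.number)
  end
  encloses
end

# Roughly (a)(b)\2
tree = Seq.new([
  Enclose.new(1, Lit.new("a")),
  Enclose.new(2, Lit.new("b")),
  Backref.new(2)
])

p mark_memory(tree, match_only: true).map(&:needs_memory)
```

With the flag set, only group 2 keeps its memory, so the compiler would not need any extra smarts beyond reading the mark.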
<lopex>
enebo: doh, imagine ruby had perl callouts
<lopex>
and concurrency issues
<enebo>
heh I would rather not
<lopex>
perl is gil right ?
<lopex>
still ?
<enebo>
lopex: I have no idea
<chrisseaton>
lopex: it has some kind of implicit parallel arrays and things like that so I don't think so in 6
<lopex>
chrisseaton: interesting given such an old runtime
<chrisseaton>
in 6 though
<lopex>
chrisseaton: via parrot ?
<chrisseaton>
I think so, not really sure
marciol has joined #jruby
<chrisseaton>
There's a guy who works on MoarVM who seems to know what he's doing - Jonathan Worthington or something
<lopex>
doh, I haven't heard about parrot even though I'm subbed on some perl feeds
* lopex
looks that up
<lopex>
ah
<lopex>
and this thing is still going
<enebo>
parrot is dead isn't it?
<enebo>
rakudo is the impl I thought
<lopex>
yeah
<lopex>
and freaking active
<enebo>
lopex: rakudo or parrot?
<lopex>
which was the haskell one ?
<lopex>
parrot should be I guess ?
<enebo>
lopex: I don't know...I know there was also a JVM one
<lopex>
pugs
<lopex>
er, I meat the MoarVM actually
<lopex>
*meant
<lopex>
yeah, when will apt go on it
<lopex>
sometimes I understand bsd users
<lopex>
just sh
<enebo>
looks like rakudo perl6 is the default
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<enebo>
lopex: so you going to tackle this ... this week? :P
<lopex>
er, I mixed it up again
<enebo>
lopex: match? optimization
<lopex>
enebo: will try tomorrow
<lopex>
enebo: dont bother to poke me
<enebo>
lopex: awesome. It will be exciting to see NUMBERZ
<lopex>
yeah
<enebo>
lopex: ok I will leave you in peace
<lopex>
should be relatively easy
<lopex>
enebo: though the backref verification is complex due to levels in oniguruma
<lopex>
since you can number the level deep in match
<lopex>
but I havent seen that feature being used actually
<lopex>
apart from the tests
bbrowning is now known as bbrowning_away
<enebo>
chrisseaton: you also using joni?
<chrisseaton>
enebo: yes, we do have someone looking at a Truffle version of regexps instead though, so we can inline them etc
<enebo>
chrisseaton: I see
<chrisseaton>
enebo: why?
<enebo>
chrisseaton: well lopex is modifying the library for a 2.4 feature so I was curious if you were using it still
<chrisseaton>
yes anything else is a longer way off
<chrisseaton>
I don't know when we will try to do 2.4 features
<lopex>
chrisseaton: for PE it's also worth to PE the regex construction
<chrisseaton>
I think we'll stick with 2.3 for quite a while
<lopex>
you can win much more in graal/truffle
<headius>
chrisseaton: you can't inline any Java code?
<chrisseaton>
I mean inline in the PE
<chrisseaton>
We do get normal inlining
<headius>
oh, ok
<lopex>
chrisseaton: but you can also do things parse wise right ?
<chrisseaton>
I'd like something like "a" =~ /a/ to constant fold
<lopex>
and also ast feature wise
<lopex>
chrisseaton: actually you could do "a" =~ /a++/ to constant fold to fail
<lopex>
since ++ is greedy non backtracking
<lopex>
I mean fail at matching, not at folding
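A quick check of how the possessive `++` behaves in Ruby. Note that `/a++/` on its own still matches "a"; the guaranteed failure shows up once the pattern needs a character that the possessive part has already consumed, as in the made-up `/a++a/` below.

```ruby
greedy     = /a+a/
possessive = /a++a/

# Greedy a+ gives one "a" back so the trailing a can match.
greedy_result = greedy =~ "aa"
# Possessive a++ keeps everything it took, so the trailing a never matches.
possessive_result = possessive =~ "aa"
# On its own the possessive form still matches a single "a".
single_result = /a++/ =~ "a"

p greedy_result
p possessive_result
p single_result
```

A pattern of the second kind is the sort of thing that could be folded to a constant failure.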
<lopex>
chrisseaton: not very useful, but, looking at capture that isnt there could also constant fold right ?
<chrisseaton>
I don't really know regexps well enough
<headius>
in theory if we ever build the joni jit it should be able to fold as well
<headius>
I know the nashorn folks took a stab at it but I don't know how far they got
<lopex>
chrisseaton: just like /foo/ =~ bar; $1
<lopex>
no way $1 is not nil
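The same point in runnable form: with no capture group in the pattern, `$1` is nil whether or not the match succeeds. `first_group_after_match` is just an illustrative name.

```ruby
# The pattern has no capture group, so $1 can never be set by it.
def first_group_after_match(str)
  /foo/ =~ str
  $1
end

p first_group_after_match("foo")
p first_group_after_match("bar")
```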
<chrisseaton>
Does Jython use Joni?
<lopex>
no
<enebo>
headius: lopex and I daydreamed a while back on emitting JRuby IR for regexp and adding any missing instrs needed to make it complete
<headius>
enebo: that would work too
<lopex>
they seem to use python regexps
<enebo>
headius: then it would be able to use our opt passes as well
<lopex>
chrisseaton: actually last time I checked python 2 used pure python regexps
<lopex>
enebo: I remembers
<chrisseaton>
well that's another way to get inlining, if your VM is good enough with your own language
<lopex>
enebo: but imagine the backtracking overhead
<lopex>
sure
<chrisseaton>
I'm currently in an upside down world of implementing the cext API in C
<lopex>
I'm all for it
<chrisseaton>
in Ruby I mean
pawnbox has joined #jruby
<lopex>
chrisseaton: if graal can generate nested state machines then sure
<lopex>
it's all byte[]
<lopex>
so no deopt guards etc
<enebo>
chrisseaton: how is openssl going?
<lopex>
chrisseaton: am I correct ?
<GitHub173>
[jruby] headius closed issue #4519: 64-bit immediate return to prompt on console interrupt before termination https://git.io/vysOi
<headius>
enebo: you wanna look at that commit before I release a new launcher?
<headius>
while I figure out how to release a new launcher
<chrisseaton>
enebo: I got frustrated with things so decided to get all the specs passing first
<chrisseaton>
And I've found lots of bugs so it was probably the right decision
<lopex>
headius: did you see the mri syscall overhead in blog somewhere not long ago ?
<enebo>
headius: omgz I totally thought you were doing something else
<enebo>
headius: I mean I knew what you said you would fix but I still thought you were working on process overlaying
<enebo>
chrisseaton: yeah I can dig that
<headius>
well I can't make a 64-bit dll load in a 32-bit process no matter what
<enebo>
headius: yeah I know but I was still thinking you were going to make two .exe but you were just fixing the interrupt issue specifically
<enebo>
headius: and it looks fine although I don't know how that function works specifically
<enebo>
headius: well I can guess
<headius>
yeah, the two exe thing would require mucking about with the installer and if we install a 64-bit exe but they choose to use a 32-bit JVM it would just fail like this again
<enebo>
headius: but if you tried the snippet you sent to me earlier and it works then great
<headius>
this makes it work for all mismatching cases and we can do a 64-bit exe later if we want
<lopex>
chrisseaton: can you do anything wet inlining system calls ?
<lopex>
*wrt
<enebo>
headius: I am hoping codefinger will save us that effort
pawnbox has quit [Ping timeout: 260 seconds]
<lopex>
chrisseaton: it's quite late so maybe reverse the actual binary :P ?
<headius>
lopex: I'm not sure what openjdk does here but I know it uses gettimeofday for currentTimeMillis
<lopex>
er, reparse
<headius>
I don't recall if it does anything to avoid repeated TZ lookups
<enebo>
lopex: compile linux to LLVM IR and SVM Rubytruffle into the kernel...mode 0 rails!
<lopex>
enebo: and then run on emscripten
<headius>
and compile v8 to llvm IR and run that in svm
<chrisseaton>
enebo: for pure C applications not doing any interop I think it's essentially a fully compliant C implementation
<chrisseaton>
enebo: but MRI contains undefined behaviour of course
<chrisseaton>
enebo: Sulong doesn't support threads at the moment either
<enebo>
chrisseaton: it would be interesting to see if it even partially worked
<chrisseaton>
headius: we do support inline assembly in Sulong!
<enebo>
chrisseaton: use Ruby 1.8.7
prasun has joined #jruby
<enebo>
chrisseaton: or 1.8.6 since that was just an AST interp
<lopex>
1.8 had guarantees at least
<enebo>
well 1.8.7 was as well but the further you go back the simpler the C impl
<lopex>
good old 1.8
<enebo>
chrisseaton: that would be an awesome post about sulong progress
<headius>
or mruby
<enebo>
mruby would be neat too
<headius>
if you could compile MRI to sulong there wouldn't be any need for TruffleRuby other than concurrency, eh?
<chrisseaton>
I think the Sulong team are focused on finding security bugs in open source projects at the moment - they're going around opening issues on projects with buffer overflows etc that they've found that other tools haven't
<enebo>
I was reading a good article on web assembly...It looks fun
<chrisseaton>
headius: well it wouldn't be any faster than normal MRI
<chrisseaton>
you'd need the second Futamura projection for that!
<headius>
why not?
<headius>
wouldn't sulong be able to pe stuff that doesn't inline in MRI?
<enebo>
chrisseaton: wouldn't it make profiled inlining decisions
<chrisseaton>
Yes but not at the Ruby level
<headius>
specialize functions, inline through funcall etc
<lopex>
enebo: which one ?
<chrisseaton>
It might be a bit quicker than a static binary
<enebo>
lopex: I will see if I can find that article...I hope I pinned it
<lopex>
chrisseaton: but in theory you could rewrite optimized asm64 to whatever you want
<lopex>
that explorer is quite cool from the video
enebo_ has joined #jruby
enebo_ has quit [Client Quit]
<lopex>
but it will be ages before it will happen though
<lopex>
for industry
<lopex>
headius: is matz much into mruby nowadays ?
<lopex>
mruby has much better api, no wonder though
<lopex>
no global state etc
<enebo>
lopex: no callbacks from C back into Ruby
<lopex>
enebo: no ?
<lopex>
enebo: and no Data_Wrap_Struct
<lopex>
that's the curious part
<enebo>
lopex: nope...
<lopex>
chrisseaton: I think those made you some troubles
<lopex>
and those destructor functions lol
<lopex>
enebo: but I was quite happy using swig
<lopex>
enebo: decade ago
<lopex>
lol
<lopex>
enebo: from a c++ ext I was experiencing opengl object disappearance
<lopex>
enebo: turned out to be mri gc
<lopex>
enebo: so I just pinned them in ruby
<lopex>
what times
<enebo>
lopex: so in mruby you basically need to make finer-grained calls since you cannot call back from your ext back into Ruby but it solves issues like you just mentioned
<headius>
enebo: you were able to repro this problem right?
<headius>
I would like you to try the gem before I push it
<lopex>
chrisseaton: how do you resolve the unions ?
<chrisseaton>
I haven't encountered any unions yet
<lopex>
chrisseaton: ah, and that snippet work ?
<chrisseaton>
Yeah
<chrisseaton>
C thinks it's a pointer and calls #[] when you read memory and #[]= when you write it
<chrisseaton>
Even in functions like memcpy
<lopex>
chrisseaton: obeisance
<headius>
I've encountered a few cext that use unions
<headius>
just as a C-ism
<lopex>
headius: there's quite a few
<headius>
nothing specific to the C API
<lopex>
but RString is a good example
<lopex>
chrisseaton: any guard for safety ?
<chrisseaton>
It won't let you read out of bounds, no
<lopex>
ok
<chrisseaton>
You'll get a Ruby exception
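On the Ruby side, the object that C treats as a pointer only needs to answer `#[]` and `#[]=`. A minimal sketch, assuming a hypothetical `PointerLike` wrapper and a hypothetical memcpy-like `copy_elements`; this is not TruffleRuby's actual implementation.

```ruby
# Hypothetical wrapper: reads and writes are routed through #[] and #[]=.
class PointerLike
  def initialize(storage)
    @storage = storage
  end

  def [](index)
    check_bounds(index)
    @storage[index]
  end

  def []=(index, value)
    check_bounds(index)
    @storage[index] = value
  end

  private

  # Out-of-bounds access surfaces as an ordinary Ruby exception.
  def check_bounds(index)
    return if index.is_a?(Integer) && index >= 0 && index < @storage.size
    raise IndexError, "index #{index} out of bounds"
  end
end

# memcpy-like loop that only ever uses #[] and #[]=.
def copy_elements(dst, src, count)
  count.times { |i| dst[i] = src[i] }
  dst
end

source = PointerLike.new([1, 2, 3])
target_storage = [0, 0, 0]
copy_elements(PointerLike.new(target_storage), source, 3)
p target_storage
```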
<lopex>
cool
<headius>
how does that RARRAY_PTR work if they're incrementing the pointer?
<headius>
maybe something I'm not seeing here
<lopex>
is there a distinction between inprocess oob and outprocess oob ?
<lopex>
headius: that's just the array start right ?
<chrisseaton>
headius: if you increment the pointer, Sulong gives you a fat pointer that is the reference to that object and an offset beyond it
<headius>
and how does that work with system calls that walk that pointer?
<chrisseaton>
Then reads add that offset and pass it to the object's #[]
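The fat pointer described here is essentially a pair of object reference and offset. A sketch under that assumption; `FatPointer` is a hypothetical class for illustration, not Sulong's LLVMTruffleObject.

```ruby
# Hypothetical fat pointer: a base object plus an offset into it.
class FatPointer
  attr_reader :base, :offset

  def initialize(base, offset = 0)
    @base = base
    @offset = offset
  end

  # Pointer arithmetic yields a new fat pointer on the same base object.
  def +(count)
    FatPointer.new(@base, @offset + count)
  end

  # Reads add the offset and go through the base object's #[].
  def read(index = 0)
    @base[@offset + index]
  end

  # Writes add the offset and go through the base object's #[]=.
  def write(index, value)
    @base[@offset + index] = value
  end
end

elements = [10, 20, 30]
second = FatPointer.new(elements) + 1
p second.read
p second.read(1)
```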
<headius>
e.g. qsort and the like
<chrisseaton>
That one doesn't work with system calls, but the string one does, by converting the string to native when it first escapes to native
<headius>
ah so like rbx approach but lazy in that case
<chrisseaton>
qsort's not a system call though, it's just C
<headius>
well, you would have to have llvm ir for libc though
<headius>
it's not "just C" on most systems
<lopex>
can that fat pointer be PEed ?
<lopex>
or hmm
<lopex>
I'm confused
<chrisseaton>
Well it's better than Rbx because we only do it when the string escapes (lazy yes) but also we don't copy again - we then use the same native memory for that string from Ruby as well for the rest of its life
<chrisseaton>
We copy once per instance at most
<chrisseaton>
lopex: yes the fat pointer can be PEd and EAd
<lopex>
cool
<lopex>
so it's a node on graph initially
<lopex>
cool
byteflame has quit [Remote host closed the connection]
<chrisseaton>
So if you escape a Ruby string's address to native through a system call, and then modify it on another thread in native code, you'll see those modifications in Ruby!
<lopex>
via deopt ?
<chrisseaton>
I don't think Rbx or old JRuby would see those modifications as it wouldn't know to copy again
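The copy-once behaviour can be modelled in a few lines. This is a simulation only: `EscapableString` is a hypothetical class, and a plain Array of bytes stands in for the native buffer.

```ruby
# Hypothetical model of a string that moves to "native" storage once.
class EscapableString
  attr_reader :copies

  def initialize(text)
    @encoding = text.encoding
    @managed = text.dup
    @native = nil
    @copies = 0
  end

  # First escape copies the bytes; later escapes hand out the same buffer.
  def escape_to_native
    if @native.nil?
      @native = @managed.bytes # stand-in for a native buffer
      @managed = nil
      @copies += 1
    end
    @native
  end

  # After escaping, Ruby-side reads come from the same buffer.
  def to_s
    return @managed.dup unless @native
    @native.pack("C*").force_encoding(@encoding)
  end
end

str = EscapableString.new("abc")
buffer = str.escape_to_native
buffer[0] = "X".ord # "native code" modifies the buffer
puts str.to_s
p str.copies
```

A copy-back-and-forth scheme would need to know when to copy again, which is the problem mentioned for Rbx and the old JRuby cext support.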
<chrisseaton>
lopex: a fat pointer is just a POJO, it's not a node
marciol has quit [Remote host closed the connection]
<chrisseaton>
lopex: what about hardware fences?
<lopex>
what's LLVMTruffleObject ?
<chrisseaton>
the fat pointer
<lopex>
ok that pojo
<lopex>
but the runtime must know about it right ?
<chrisseaton>
What runtime? It's Sulong's object, Ruby doesn't know about it
<lopex>
oooh
<chrisseaton>
We don't have a Java dependency on Sulong - we actually just interact with it via eval!
<lopex>
er, I was in truffle land for too long
<chrisseaton>
We eval the C extension's bitcode
<lopex>
yea
<lopex>
and pass them intograal
<lopex>
er, parse them into graal
<chrisseaton>
Well the PE does that for us from Truffle nodes
<lopex>
but I'm still confused
<headius>
chrisseaton: that's correct, rbx would not see those changes, and pretty sure JRuby cext wouldn't either
<headius>
which was part of the fatal flaw of copying back and forth
<headius>
rbx still does it that way and it will always have problems
<lopex>
chrisseaton: you parse the bitcode and end up with ast, and then a graal graph right ?
<chrisseaton>
Yes
<chrisseaton>
The AST for LLVM bitcode is very linear of course
<chrisseaton>
A little sub-tree of nodes for each instruction, then a linear sequence of those little trees for each basic block, and a linear sequence of basic blocks
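That shape can be illustrated with a toy interpreter: one small tree per instruction, a list of those per basic block, and a table of blocks. All node names here are hypothetical and far simpler than Sulong's real nodes.

```ruby
# Leaf nodes of the per-instruction sub-trees.
Const = Struct.new(:value) do
  def read(_regs)
    value
  end
end

Reg = Struct.new(:name) do
  def read(regs)
    regs.fetch(name)
  end
end

# Instruction nodes; execute returns nil, or a [kind, payload] terminator.
Add = Struct.new(:dest, :left, :right) do
  def execute(regs)
    regs[dest] = left.read(regs) + right.read(regs)
    nil
  end
end

Mul = Struct.new(:dest, :left, :right) do
  def execute(regs)
    regs[dest] = left.read(regs) * right.read(regs)
    nil
  end
end

Br = Struct.new(:target) do
  def execute(_regs)
    [:br, target]
  end
end

Ret = Struct.new(:operand) do
  def execute(regs)
    [:ret, operand.read(regs)]
  end
end

# Runs a linear sequence of instruction trees per block, block after block.
def run(blocks, entry)
  regs = {}
  label = entry
  loop do
    outcome = nil
    blocks.fetch(label).each do |instruction|
      outcome = instruction.execute(regs)
      break if outcome
    end
    raise "block #{label} has no terminator" if outcome.nil?
    kind, payload = outcome
    return payload if kind == :ret
    label = payload
  end
end

blocks = {
  entry: [Add.new(:x, Const.new(2), Const.new(3)), Br.new(:exit)],
  exit:  [Mul.new(:y, Reg.new(:x), Const.new(4)), Ret.new(Reg.new(:y))]
}
p run(blocks, :entry)
```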