<GitHub103>
jruby/master 599d962 Chris Seaton: [Truffle] Link callsite versions to method versions.
<GitHub103>
jruby/master 8d1d445 Chris Seaton: [Truffle] Get blocks to include their arguments in source sections.
<GitHub103>
jruby/master c985553 Chris Seaton: [Truffle] The parser doesn't give us a node for 'end' or '}', so hack to try to get it to include that in source sections.
<GitHub131>
jruby/jruby-1_7 606b943 Thomas E. Enebo: Update to jnr-posix 3.0.25. Fixes failover stat FindFirstFile detecting FileNotFound conditions
mrmargolis has joined #jruby
<GitHub107>
[jruby] ruskotron opened issue #3599: jrubyc won't compile a file with method in outer scope (NoMethodError: undefined method `new_method' for nil:NilClass) https://github.com/jruby/jruby/issues/3599
<GitHub37>
jruby/jruby-1_7 c49b529 Thomas E. Enebo: Update joni to latest release (fixes several reported issues)
krainboltgreene has joined #jruby
<lopex>
on split
bbrowning is now known as bbrowning_away
<enebo>
bah
<enebo>
all these snapshots on master
mkristian_ has quit [Ping timeout: 255 seconds]
chrisseaton has joined #jruby
<enebo>
something changed in jcodings
talevy has joined #jruby
tjohnson has joined #jruby
mkristian has joined #jruby
<enebo>
lopex: do you feel master of jcodings is good enough for a full release?
subbu|afk is now known as subbu
<enebo>
looks like this one has a massive data update
<chrisseaton>
enebo: I updated the character tables
<lopex>
enebo: code ranges and transcoder data
<chrisseaton>
they'd been done but on stale data for the last year or so
asarih has joined #jruby
olleolleolle has joined #jruby
fidothe has joined #jruby
Guest85414______ has joined #jruby
<chrisseaton>
and actually I only got them to use newer data for the code ranges - I didn't figure out how to parse the rest of it
<chrisseaton>
it was enough to fix my bug though
<enebo>
chrisseaton: lopex: ok yeah I figured. I am more concerned about confidence
jeregrine has joined #jruby
pitr-ch has joined #jruby
aemadrid has joined #jruby
<enebo>
chrisseaton: if the differences of generated data obviously only changed a few mistakes then I will have more confidence but I am trying to determine if 1.7.x should update to a fresh jcodings release or not
mccraig has joined #jruby
<enebo>
it definitely sounds like some stuff will get fixed
<lopex>
enebo: well, lots of other changes in jcodings, but mostly simplification
knowtheory has joined #jruby
<enebo>
lopex: not what you would consider risky stuff
<chrisseaton>
there's a failing RubySpec, so it would improve your RubySpec coverage if you care about that
<chrisseaton>
and someone somewhere cared enough to create this spec
<lopex>
enebo: also, the scripts contain a bunch of asserts when parsing c code
Scorchin has joined #jruby
<enebo>
so no one seems particularly worried about this. I will put out 1.0.16 and update jruby-1_7 branch and master
flavorjones has joined #jruby
<lopex>
enebo: you know I'm never confident of anything :)
<enebo>
lopex: yeah I know :)
<enebo>
lopex: it’s ok. I internally adjust your statements for confidence values :)
<lopex>
enebo: no-confidence release
mrmargol_ has joined #jruby
blaxter has quit [Ping timeout: 264 seconds]
mrmargolis has quit [Read error: Connection reset by peer]
<lopex>
enebo: I mean those large transcoder tables are mostly for not often used encodings
<lopex>
and they compress only at 50%
<chrisseaton>
lopex: why can't we use Java's character classes? they both just come from the Unicode spec don't they?
<enebo>
chrisseaton: these tables have direct translation without indirecting through UTF16LE
<lopex>
chrisseaton: I believe mri might have a richer set
blaxter has joined #jruby
<lopex>
chrisseaton: also there's encodings larger than unicode wrt character space
<enebo>
yeah also more encodings shipped…JRE does not bother with some in some places
<norc>
enebo: I was told you were responsible for porting the parser from MRI, is that correct?
<lopex>
unicode is only 21bit
<enebo>
norc: from some point in time forward yeah
<enebo>
norc: someone else did original spike in 2000
<lopex>
chrisseaton: there was a lot of tension between unicode consortium and asian countries
<norc>
enebo: I see. Well I am currently in the process of dissecting cruby. Along the way I was wondering how anyone managed to fork everything considering the total lack of documentation anywhere...
<chrisseaton>
yeah I got that, but I thought these tables were just generated from the unicode data
<lopex>
chrisseaton: another thing is mri likes to change and patch things
<mkristian>
enebo, I pushed my fix and I am off to bed with flu and traveling the next two days. so you have to be happy with what is :)
<enebo>
norc: well we keep same production list and it is also a LALR parser
<lopex>
chrisseaton: yeah, mri is too
<enebo>
mkristian: thanks. I saw that
<enebo>
mkristian: happy you could test it since I could not repro
<lopex>
chrisseaton: but this way we wont miss their changes
<enebo>
mkristian: also thanks for getting the squared away :)
<lopex>
chrisseaton: I'm not a fan of how it's done now though
<enebo>
norc: the lex_state is all a port at this point and pretty confusing
<enebo>
norc: digital extremist is a good person to talk to about ruby parsers since he did his own ruby parser
<norc>
enebo: That is good to know. Is he a person that hangs around here?
<mkristian>
enebo, will look at the other windows issue with expand_path after the release
<enebo>
he is @whitequark online
<norc>
Oh. That is a familiar name. :-)
<enebo>
actually he just used to always display that name in his description
<enebo>
but I think of him as that now :)
<enebo>
but he probably knows MRI parser design better than anyone at this point
<enebo>
I have some insight but I never had the time nor inclination to try and unravel the weird state stuff in the lexer
<norc>
Started doing some modifications here and there, even authored a small compiler patch. It's been fun but really confusing, especially once you get past the parser.
<norc>
There are some implementations that are basically just 10k lines of wtf.
<enebo>
yeah the parser is really big but pretty ordinary … the lexer is pretty weird
<enebo>
and some features like nested quoting I do not think can be done without some weirdness
<enebo>
e.g. "#{"a"}"
<lopex>
chrisseaton: also just a question if it's worth to make an exception for unicode (apart from risking incompatibilities)
<norc>
I can see how that might be non-trivial to lex.
<enebo>
lopex: chrisseaton: direct encoding to encoding translations exist for jcodings but not for JVM right?
<norc>
The hard thing for me was the inability to find people over in #ruby who know their way around the source code. Figuring out all these weirdnesses in the Ruby VM, for example, has been a pretty hard task.
<norc>
But then again I suppose that is the price for trying to work on a strictly Japanese piece of software.
<lopex>
enebo: either not for all supported by mri or fewer of them (which makes transcode trips longer)
<lopex>
enebo: or the other way round :)
<enebo>
lopex: but Java does it for nothing but UTF16LE
<chrisseaton>
norc: I'm planning to write a new parser in Antlr at some point
<lopex>
enebo: but it has to transcode to outside world
<enebo>
lopex: so that is a small difference
<chrisseaton>
norc: I need some more detailed source information that the JRuby parser doesn't have at the moment
blaxter has quit [Ping timeout: 256 seconds]
<lopex>
enebo: but yeah, you dont have m:n problem on every occasion like in ruby
<enebo>
chrisseaton: please make it a goal to enter at def’s as an alternate entry point
<chrisseaton>
enebo: not sure what you mean
<chrisseaton>
enebo: we're making it support lazy parsing
<lopex>
enebo: also, it's a question for how "direct" those transcodes are
<enebo>
chrisseaton: for IDE support partial parsing is a huge deal
<enebo>
chrisseaton: primarily methods
<lopex>
enebo: or if you'll have different round trip problem
<lopex>
*problems
<enebo>
chrisseaton: so my request is only for IDE support
<norc>
chrisseaton: What kind of source information are you talking about?
<chrisseaton>
norc: things for the lazy parsing, and we want to know exact source location of all nodes
<chrisseaton>
norc: plus embedded JS and other possible things for the future
<enebo>
ok soup time for me…bbiab
<norc>
chrisseaton: We are strictly talking about jruby here though right?
camlow32_ has quit [Remote host closed the connection]
<chrisseaton>
well it parses Ruby and it's written in Java, but not sure how a parser could be specific to JRuby apart from that
<chrisseaton>
it's not Truffle specific
camlow325 has joined #jruby
camlow325 has quit [Remote host closed the connection]
camlow325 has joined #jruby
bb010g has joined #jruby
camlow325 has quit [Remote host closed the connection]
lanceball is now known as lance|afk
camlow325 has joined #jruby
camlow325 has quit [Remote host closed the connection]
camlow325 has joined #jruby
camlow325 has quit [Remote host closed the connection]
<enebo>
yeah table update broke this but I can see MRI no longer treats this as :blank:
<lopex>
what now ?
<lopex>
jcodings branch ?
<enebo>
latest version of jcodings (I released today) and MRI’s own data now shows this as not a [[:blank:]] but it did in 1.9.3 of MRI and jruby 1.7.23
<enebo>
lopex: so upstream data changes made this not a blank…
<lopex>
I know, but what we gonna do now
<enebo>
lopex: I don’t know
<lopex>
branch jcodings or revert for 1.7 ?
<enebo>
lopex: I would hate to unwind a bunch of fixes for this one char not working but it has not worked in MRI for years
<lopex>
with you
<enebo>
lopex: if this was a common space character I would unwind but I may just tag this out
<chrisseaton>
we tried truffelising but not sure how far it went
<lopex>
enebo: they probably removed 30% of code :)
<enebo>
chrisseaton: yeah I was also wondering that :)
<enebo>
lopex: yeah that makes sense
<chrisseaton>
we've truffelised pack, printf and unpack now, so small languages do seem to work fine
<enebo>
lopex: you could also have each instr know if they are 7bit only and then just look at all instrs and then pass it to 7bit interp if all of them are
<enebo>
lopex: or maintain some state as you build instrs
<lopex>
chrisseaton: yeah, I see the possibilities
camlow325 has joined #jruby
<nirvdrum>
enebo: Oh wow. Now that you shortened it to "oni", the name "joni" suddenly makes sense.
<enebo>
chrisseaton: yeah I am mildly surprised out of the other langs this one was not near the top
<chrisseaton>
yeah I was thinking that as well :)
<chrisseaton>
enebo: regex isn't huge in benchmarks, and we weren't ready to run real code with regexps yet
<enebo>
chrisseaton: ok. Not sure. In the old days they were extremely important because cgi.rb or http.rb or some lib like that did several massive regexps
<enebo>
I think it was http.rb processed all headers and the body as a regexp
<enebo>
lopex: you remember?
<lopex>
enebo: cgi times ?
<enebo>
lopex: yeah maybe
<lopex>
enebo: rexml was huge
<enebo>
lopex: oh also I guess you are only generating single byte for single byte encodings
<nirvdrum>
I think strscanner makes heavy use of regexps.
<lopex>
enebo: I havent played much with perl though
<enebo>
lopex: I realize you could also generate two interpreters and if you know non-singlebyte encoding only had single byte chars you could use simpler one
<enebo>
lopex: in fact for utf-8 that would be a big improvement if JVM is not optimizing the mbc instrs well
<lopex>
enebo: yeah, but bytecode array doesnt virtualize
<lopex>
enebo: hmm, depends how much would have to be dupped
<lopex>
enebo: the branches have to go somewhere
<enebo>
lopex: I think you lost me a little bit
<enebo>
lopex: you generate an array of bytecode instrs
<enebo>
lopex: you could generate 2
<enebo>
lopex: then have 2 interpreters. one of which is optimized for single byte
<enebo>
lopex: I think this is your idea with using 7BIT CR to pass off to singlebyte when appropriate vs only doing it for pure 7bit encodings
<enebo>
lopex: assuming the 7bit-only interp is faster
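The idea enebo describes here (track per instruction whether it is single-byte safe, then hand the whole program to a simpler interpreter when both the program and the input allow it) can be sketched roughly as below. This is only an illustration under stated assumptions: `Instr`, `ProgramBuilder`, and `Interp` are hypothetical names, not joni's actual classes or opcodes.

```java
import java.util.ArrayList;
import java.util.List;

final class InterpreterChoice {
    // Hypothetical instruction type; joni's real bytecode is not modeled here.
    static final class Instr {
        final String name;
        final boolean singleByteSafe; // true if it never needs multibyte handling
        Instr(String name, boolean singleByteSafe) {
            this.name = name;
            this.singleByteSafe = singleByteSafe;
        }
    }

    enum Interp { SINGLE_BYTE, MULTI_BYTE }

    // Maintains the "all instructions are single-byte safe" state while building,
    // so no second pass over the instruction list is needed.
    static final class ProgramBuilder {
        final List<Instr> instrs = new ArrayList<>();
        boolean allSingleByteSafe = true;

        ProgramBuilder add(Instr in) {
            instrs.add(in);
            allSingleByteSafe &= in.singleByteSafe;
            return this;
        }
    }

    // Use the simpler interpreter only when the compiled program allows it
    // AND the input string is known to be 7-bit (its code range says so).
    static Interp choose(ProgramBuilder program, boolean inputIs7Bit) {
        return (program.allSingleByteSafe && inputIs7Bit)
                ? Interp.SINGLE_BYTE
                : Interp.MULTI_BYTE;
    }

    public static void main(String[] args) {
        ProgramBuilder p = new ProgramBuilder()
                .add(new Instr("exact1", true))
                .add(new Instr("anychar", true));
        System.out.println(choose(p, true));  // prints SINGLE_BYTE
        System.out.println(choose(p, false)); // prints MULTI_BYTE
    }
}
```

Whether the single-byte path is actually faster on HotSpot is the open question in the conversation; the sketch only shows the dispatch decision.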
camlow325 has quit [Remote host closed the connection]
<lopex>
enebo: I missed you because it's the Idea I started with
<enebo>
lopex: ah ok. I thought you meant you only would do it for purely 7-bit encodings
<enebo>
lopex: e.g. US-ASCII
camlow325 has joined #jruby
<lopex>
enebo: the other story is though (if still larger switches in hotspot as slower) splitting the big switch would pay off too
<lopex>
*are slower
<lopex>
even with dense switch values so no balanced trees are used
camlow325 has quit [Remote host closed the connection]
<lopex>
enebo: remember that issue ?
<enebo>
lopex: yeah but I do not think this is a big switch
<enebo>
lopex: and I don’t know how true that is now
<lopex>
enebo: it got slower as cases were added
<enebo>
lopex: heh
<lopex>
enebo: I tested that to oblivion
<lopex>
enebo: and the suspicion was that hotspot tries to inline the cases
<enebo>
lopex: do you have an instr dumper for joni?
<enebo>
lopex: it would be neat to see how big some regexps are
<enebo>
lopex: I guess I sort of remember seeing the output from this before
camlow325 has joined #jruby
<lopex>
enebo: tremendously helps tracing bugs when comparing with oni
subbu is now known as subbu|lunch
camlow325 has quit [Remote host closed the connection]
camlow325 has joined #jruby
yfeldblum has joined #jruby
bb010g has quit [Quit: Connection closed for inactivity]
<lopex>
nirvdrum: you think that cr issue will convince mri ro fix it ?
<lopex>
*to
<nirvdrum>
lopex: I haven't filed it yet. It seems unlikely, but who knows.
<lopex>
nirvdrum: I believe on these cases we need to be bug to bug compatible
<lopex>
I recall lots of these when porting core
<nirvdrum>
I think you can be bug-for-bug compatible without having to use some of their crazy methods.
<nirvdrum>
But you have more experience in this area than I do.
<lopex>
if you're chasing moving target that becomes difficult
<nirvdrum>
But most of this hasn't changed since the 1.9 release.
<lopex>
yeah, I think most of that settled now
<lopex>
but 1.9 was terrible to chase for the first two years
<nirvdrum>
I do agree that when issues arise, it's a lot easier to diff that way.
<nirvdrum>
I'm playing with ropes a bit locally. Nothing worth sharing yet. But naturally it means changing the way some of these methods are implemented.
<lopex>
nirvdrum: the most notable place when jruby diverged in a string for example was more aggressive sharing and cow for strings
shellac has quit [Quit: Computer has gone to sleep.]
<lopex>
since before, mri strings were all null terminated and they couldnt afford that
<lopex>
afaik that has changed
camlow325 has quit [Remote host closed the connection]
<nirvdrum>
Ahh. Interesting.
camlow325 has joined #jruby
<nirvdrum>
IIRC, they're still null-terminated in Rubinius.
<lopex>
so on split they could share only the last part for example
camlow325 has quit [Remote host closed the connection]
camlow325 has joined #jruby
camlow325 has quit [Remote host closed the connection]
<lopex>
I wonder how much leak is there due to that
<lopex>
but I guess array is much more offending here
<lopex>
since it can leak whatever, not only byte[] regions
<nirvdrum>
lopex: I mean, are you asking in JRuby, Rubinius, or MRI?
lanceball is now known as lance|afk
<lopex>
nirvdrum: a reason jdk doesnt share on substring now for example
<lopex>
nirvdrum: jruby
<lopex>
in java.lang.String
<nirvdrum>
Gotcha.
shellac has joined #jruby
pawnbox has quit [Ping timeout: 265 seconds]
<lopex>
nirvdrum: headius even fantasized about spare thread that walked OS and reallocated shared strings once in a while
<lopex>
or arrays
<nirvdrum>
I had run into a case where it was a problem in JRuby. I don't think I filed an issue, but headius fixed it a couple years back.
<nirvdrum>
I think the problem was it was a large source string that a small slice was taken of, but then CoW was triggered.
<lopex>
and yet no one knows if it really pays off in the wild
<nirvdrum>
And now I had multiple copies of this large string.
<lopex>
ah
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<nirvdrum>
lopex: Have you thought at all about a method that calculates CR, length, and hash code all in a single pass?
<lopex>
nirvdrum: I guess I remember you mentioning it
<nirvdrum>
I don't think I have. Maybe I did.
<enebo>
nirvdrum: lopex: my dream is to cache length
<nirvdrum>
eregon was looking at it at one point, I think.
<nirvdrum>
enebo: That's about the only thing I have working with ropes right!
<lopex>
nirvdrum, enebo: and hash I guess
<lopex>
and now String will have more state than the entire world
<enebo>
for mbc strings length is expensive :)
<nirvdrum>
enebo: But I'm eagerly calculating CR & length in a single pass right now.
<enebo>
nirvdrum: cool
<lopex>
enebo: yeah
<nirvdrum>
enebo: It may be prohibitively expensive.
<lopex>
enebo: do you remember why that unsafe utf8 walk was broken ?
<nirvdrum>
enebo: I'm hoping to take advantage of math properties and doing some of this during the parse phase.
<enebo>
what?
<enebo>
no it is not expensive
<enebo>
size?
<enebo>
err length
<nirvdrum>
You just said "mbc strings length is expensive" :-P
<enebo>
The only reason MRI did not do it was due to limited size in their fixed size structs
<enebo>
nirvdrum: to calculate it
<lopex>
utf8 length forbids any optimization
<lopex>
for hotspot
<nirvdrum>
enebo: Well, yeah. But that's what I mean. I'm eagerly calculating it now, whether you use it or not.
<nirvdrum>
But I want the code range cached, too. If I'm walking the thing already, I may as well get the length at the same time.
<nirvdrum>
'tis the theory anyway.
<enebo>
nirvdrum: yeah I think we should cache and wipe out on modification
<enebo>
nirvdrum: agreed
<enebo>
nirvdrum: not walking is worth a small amount of memory
<lopex>
effectful modification
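A minimal sketch of the "cache the length and wipe it on modification" idea discussed above, including lopex's point that only effectful modifications need to invalidate. `CachedLengthString` is a hypothetical class for illustration, not JRuby's `RubyString` or `ByteList`, and it assumes the bytes are valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CachedLengthString {
    private byte[] bytes;          // assumed to hold valid UTF-8
    private int cachedLength = -1; // -1 means "not computed yet"
    int scans = 0;                 // counts full walks, for demonstration only

    CachedLengthString(String s) {
        bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    // Character length: walk the bytes only when no cached value exists.
    int length() {
        if (cachedLength < 0) {
            scans++;
            int n = 0;
            for (byte b : bytes) {
                // Count every byte that is not a UTF-8 continuation byte (10xxxxxx).
                if ((b & 0xC0) != 0x80) n++;
            }
            cachedLength = n;
        }
        return cachedLength;
    }

    void append(String s) {
        byte[] extra = s.getBytes(StandardCharsets.UTF_8);
        if (extra.length == 0) return; // not an effectful modification: keep the cache
        int old = bytes.length;
        bytes = Arrays.copyOf(bytes, old + extra.length);
        System.arraycopy(extra, 0, bytes, old, extra.length);
        cachedLength = -1;             // wipe the cache on real modification
    }

    public static void main(String[] args) {
        CachedLengthString s = new CachedLengthString("h\u00e9llo");
        System.out.println(s.length()); // prints 5
        s.length();
        System.out.println(s.scans);    // prints 1 (second call used the cache)
    }
}
```

The trade-off is the one enebo names: one extra `int` per string in exchange for not re-walking multibyte strings.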
<nirvdrum>
enebo: Right. And virtually every String method ends up scanning for the code range.
<nirvdrum>
I guess we have a situation currently where string literal nodes can be told it has CR_VALID without the string length.
<lopex>
nirvdrum: but depends if that's a majority, lots of operations just taint
<nirvdrum>
So you'd be doing an extra scan there.
<enebo>
nirvdrum: I also agree lexing can get rid of initial need to unknown
<lopex>
enebo: it's not there yet ?
<enebo>
lopex: not sure
<enebo>
lopex: I know we have talked about it
<enebo>
we = the three of us
<nirvdrum>
lopex: Nearly all of them have a single byte optimizable form. You're right that they don't all scan. But then you have strings that really are 7-bit going down the mbc path and they have to walk the full string anyway.
<nirvdrum>
lopex: We have this currently for anything being read from IO, for instance.
<lopex>
nirvdrum: yeah, that is right
<nirvdrum>
The ffi constants read from file at boot up all are CR_UNKNOWN.
<lopex>
just wondering about cases where literals taint new strings
<lopex>
and how far that goes
<nirvdrum>
Basically, I'm not sure all the machinery in place to lazily calculate CR is worth it. In some cases eagerly calculating it may be wasted effort. But in most cases, you want to know the CR.
<nirvdrum>
And if you have to scan for the CR, you can work out the string length at the same time.
<lopex>
and hash
shellac has quit [Quit: Computer has gone to sleep.]
<nirvdrum>
Right.
<nirvdrum>
Hash is a bit different because it's on the ByteList right now.
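The single pass nirvdrum and lopex describe (code range, character length, and hash computed in one walk) might look roughly like this for UTF-8 bytes. It is a sketch under assumptions, not JRuby's implementation: `SinglePassScan`, `CodeRange`, and `Result` are hypothetical names, the hash is a plain 31-multiplier hash chosen for illustration, and each broken byte is counted as one character.

```java
import java.nio.charset.StandardCharsets;

final class SinglePassScan {
    // Hypothetical labels, loosely mirroring the 7-bit / valid / broken code ranges.
    enum CodeRange { SEVEN_BIT, VALID, BROKEN }

    static final class Result {
        final CodeRange codeRange;
        final int charLength;
        final int hash;
        Result(CodeRange codeRange, int charLength, int hash) {
            this.codeRange = codeRange;
            this.charLength = charLength;
            this.hash = hash;
        }
    }

    // Length of the well-formed UTF-8 sequence starting at i, or -1 if malformed.
    static int sequenceLength(byte[] s, int i) {
        int b0 = s[i] & 0xFF;
        if (b0 < 0x80) return 1;
        int need;                 // number of continuation bytes required
        int lo = 0x80, hi = 0xBF; // allowed range for the first continuation byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            need = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            need = 2;
            if (b0 == 0xE0) lo = 0xA0; // reject overlong forms
            if (b0 == 0xED) hi = 0x9F; // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            need = 3;
            if (b0 == 0xF0) lo = 0x90; // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
        } else {
            return -1;
        }
        if (i + need >= s.length) return -1; // truncated sequence
        int b1 = s[i + 1] & 0xFF;
        if (b1 < lo || b1 > hi) return -1;
        for (int k = 2; k <= need; k++) {
            int b = s[i + k] & 0xFF;
            if (b < 0x80 || b > 0xBF) return -1;
        }
        return need + 1;
    }

    // One walk over the bytes yields code range, character count, and hash.
    static Result scan(byte[] bytes) {
        boolean sevenBit = true;
        boolean broken = false;
        int chars = 0;
        int hash = 0;
        int i = 0;
        while (i < bytes.length) {
            int len = sequenceLength(bytes, i);
            if (len < 0) {
                broken = true;
                len = 1; // assumption: a broken byte counts as one character
            } else if (len > 1) {
                sevenBit = false;
            }
            for (int j = i; j < i + len; j++) hash = 31 * hash + bytes[j];
            chars++;
            i += len;
        }
        CodeRange cr = broken ? CodeRange.BROKEN
                : (sevenBit ? CodeRange.SEVEN_BIT : CodeRange.VALID);
        return new Result(cr, chars, hash);
    }

    public static void main(String[] args) {
        Result r = scan("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        System.out.println(r.codeRange + " " + r.charLength); // prints VALID 5
    }
}
```

Whether doing this eagerly pays off is exactly what the conversation leaves open: the walk is wasted when the string is never inspected, but it replaces up to three separate lazy scans when it is.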
<GitHub0>
jruby/jruby-1_7 eb1dd7a Thomas E. Enebo: Fixes #3550. Warning "io/console not supported; tty will not be manipulated" occurs again on 1.7.23
<GitHub144>
[jruby] enebo closed issue #3550: Warning "io/console not supported; tty will not be manipulated" occurs again on 1.7.23 https://github.com/jruby/jruby/issues/3550
robbyoconnor has joined #jruby
robbyoconnor has joined #jruby
robbyoconnor has quit [Changing host]
tcrawley is now known as tcrawley-away
shellac has quit [Quit: Computer has gone to sleep.]