<lopex>
enebo: so what's the stance wrt that coding preamble issue ?
<enebo>
lopex: as far as adding UTF8 as an alias?
<lopex>
yeah
<enebo>
lopex: I see no harm but I also have not made the change
<lopex>
enebo: yeah saw your message
<enebo>
lopex: if you want to then go for it
<lopex>
enebo: can coding: appear in multiple places in a file ?
<enebo>
lopex: ask nirvdrum
<enebo>
lopex: but no I don't think so
<enebo>
lopex: I think coding is only at the topish
<lopex>
ah, yeah he was involved in that conv
<lopex>
yeah
<lopex>
actually any other comment should give an error
<enebo>
lopex: so add the alias if you have time and hopefully nirvdrum will amaze us
<nirvdrum>
I pushed some changes to TruffleRuby. I scaled back what I was thinking of doing quite a bit though because it wasn't paying off as much as I thought. I'm going to prepare a JRuby patch soon.
<lopex>
cool
<lopex>
nirvdrum: cool
<lopex>
enebo: but there is such an alias in mri encodings
<nirvdrum>
Oh, you guys are talking about that other bug. I wasn't looking at that. I was looking at improving startup time by not spending so much time parsing magic comments.
<nirvdrum>
And I came across a couple other bugs along the way.
<enebo>
nirvdrum: although do we have !tokenSeen?
<lopex>
yeah, that commit is about case insensitive preambles
<lopex>
enebo: do we have it ?
<enebo>
lopex: checking
<lopex>
nirvdrum: I'm still lost why that utf8 was not found
<lopex>
nirvdrum: since there's an alias in jcodings already
<enebo>
nirvdrum: but is JRuby lexer missing something
<enebo>
nirvdrum: you skip parseMagicComment potentially based on it
<lopex>
aah
<lopex>
so there's more such comments
<nirvdrum>
enebo: As far as I can tell, this is completely safe. The encoding comment must be on the first line. The frozen string literals comment must occur before the first token is seen. And the warn on indent isn't implemented.
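[note: a minimal Ruby sketch of the rules nirvdrum describes; the file below is hypothetical, not from the patch]
    # coding: us-ascii
    # frozen_string_literal: true
    p "a".encoding   # => #<Encoding:US-ASCII>  (coding comment honored: it is on line 1)
    p "a".frozen?    # => true                  (frozen_string_literal seen before the first token)
    # frozen_string_literal: false   # ignored: a token has already been seen (warns under -w)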
<lopex>
yeah, I recall
<nirvdrum>
enebo: The only thing that's funky is if you run with warnings enabled, you'd be notified about using frozen_string_literals after a token. So, I added that in as a check.
<nirvdrum>
All of ruby spec passes with it. And the number of lines actually visited drops quite a bit.
<enebo>
nirvdrum: I am really confused you must have other commits
<nirvdrum>
Confused by what?
<enebo>
nirvdrum: this.tokenSeen = tokenSeen where tokenSeen is unconditionally true isn't it?
<lopex>
nirvdrum: so what if someone invents another magic comment ?
<lopex>
nirvdrum: how will they mix ?
<lopex>
the coding is first I gather ?
<nirvdrum>
lopex: It would need to be revisited. But so would parseMagicComment.
<lopex>
mhm
<enebo>
ah it is the local
<nirvdrum>
Since we do an if-else chain.
<enebo>
and the local starts false but immediately sets field to true
<lopex>
but since that warn is undefined it's still a problem
<enebo>
nirvdrum: does the file 'puts "A" # coding: UTF-16LE\n' work?
<lopex>
enebo: how far 2.4 is ahead wrt that stuff ?
<enebo>
ah space
<enebo>
maybe puts"A"# coding: UTF16-LE\n'
<nirvdrum>
enebo: Work in what sense? I modified it to be 'puts "A".encoding' and the value is UTF-8 with MRI.
<lopex>
with that utf 16 preamble ?
<enebo>
I added a second line p "a".encoding
<enebo>
It still prints UTF-8 for second string
<enebo>
so it does not see that coding
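[note: the test file being described, roughly; MRI prints UTF-8 for both lines because the coding comment trails a token instead of leading the file, so it is never honored]
    puts "A".encoding  # coding: UTF-16LE
    p "a".encoding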
<nirvdrum>
So, this diverges from MRI. I'm happy to submit a PR and you can evaluate it. But it saves a lot of time in joni.
<enebo>
derp I get it
<enebo>
we call into yylex more than once so first assign marks it as too late
<nirvdrum>
As lopex notes, it makes an assumption that future magic comments will follow suit.
<enebo>
nirvdrum: can you measure a significant difference?
<enebo>
I really thought we could see some of these anywhere
<nirvdrum>
Keep in mind it is a bit more pronounced for us because a lot of our core library is authored in Ruby and has 30+ lines of preamble for license & copyright.
<enebo>
nirvdrum: if you cannot figure out any mechanism where it is anywhere but the first line then it is crazy to keep looking at every comment
<lopex>
enebo: so they are not included in the grammar ?
<nirvdrum>
It dropped the number of lines visited from 7,721 to 2,885.
<lopex>
that's insane
<nirvdrum>
I had that further reduced, but it came at the cost of a more complicated regexp and was basically a wash.
<lopex>
wow
<enebo>
nirvdrum: so the only semantic issue is encountering multiple feature comments after a token is encountered right?
<lopex>
so it's just a state in yy right ?
<nirvdrum>
I removed the search for '-*-' and the set_file_encoding search by pushing both into the magic comment regexp. And then guarded all of that by a quick scan for ':' or '='.
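[note: a rough Ruby rendering of that guard; the actual change is in JRuby's Java lexer, and the names below are made up]
    MAGIC = /(coding|frozen[_-]string[_-]literal|warn[_-]indent)\s*[:=]\s*([\w-]+)/i

    def parse_magic_comment(comment)
      # cheap pre-check: a comment containing neither ':' nor '=' can never be a magic comment
      return nil unless comment.include?(":") || comment.include?("=")
      MAGIC.match(comment)
    end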
<enebo>
nirvdrum: which since it is a warning we can probably add a second boolean for that
<nirvdrum>
I need to play with that more.
<enebo>
nirvdrum: yeah I could see a double scan paying off
<nirvdrum>
But, like I said, I had to fight joni a bit.
<lopex>
how ?
<enebo>
nirvdrum: can you have a comment like # :::::: coding: UTF-16LE?
<lopex>
nirvdrum: ah, at parse time ?
<nirvdrum>
I started off with something like ^.*?(coding|frozen[-_]string[-_]literal). That was bad.
<nirvdrum>
It scanned that byte array many times.
<nirvdrum>
I realized I was dumb and didn't need to anchor it.
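[note: the two shapes being compared, roughly; both capture the same feature name, the anchored lazy form is simply the one that was slow in joni]
    slow = /^.*?(coding|frozen[-_]string[-_]literal)/
    fast = /(coding|frozen[-_]string[-_]literal)/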
<enebo>
heh
<lopex>
ah
<nirvdrum>
But by complicating the regexp and having to fight with greedy matches to handle '-*-', I didn't really gain much.
* lopex
wonders about that regex absent feature
<nirvdrum>
I didn't think the pay-off was enough to be worth diverging from MRI.
<nirvdrum>
But, I'm going to give the ':' & '=' scan another shot.
<enebo>
Is there a way to scan backwards?
<enebo>
lopex: ^
<lopex>
enebo: no
<lopex>
enebo: apart from back matching
<enebo>
lopex: could there be? Seems like a regexp that walked from end to start could be interesting
<lopex>
sure
<nirvdrum>
For its part, MRI seems to have dumped the state machine for that regexp into C and embedded that in the lexer.
<lopex>
enebo: look behind is already useful
<enebo>
lopex: ultimately find [:=] and look to the left
<enebo>
lopex: so we can look behind on that
<nirvdrum>
bbiab. Dinner time here.
<enebo>
lopex: I never used look behind but this seems perfect
<lopex>
enebo: look behind is somewhat restricted
<lopex>
but you can do some things there
<lopex>
enebo: like you cannot have alternatives there afaik
<lopex>
and some quantifier restrictions too
<enebo>
lopex: just consider if you can make one for this specific search...I am afk for about 15 minutes
<lopex>
enebo: which one exactly ?
<lopex>
the one nirvdrum posted ?
<enebo>
lopex: matching coding: or frozen_string_literal
<enebo>
lopex: the three features
<lopex>
but that's exact matching
<enebo>
match back from [:=] to look for those words
<enebo>
so not case insensitive?
<lopex>
afaik it can
<lopex>
yeah it must
<lopex>
nirvdrum: I'm not sure what enebo is referring to by := ?
<lopex>
the grammar ?
<enebo>
lopex: perhaps I just mean coding: utf-8 style matches
<enebo>
lopex: I thought coding= utf-8 also existed
<lopex>
hmmm
<enebo>
but I don't see that in our code
<enebo>
so in any case it is a bit immaterial
<lopex>
hmm p /(?<=foo=).*bar/ =~ "foo=bar bar"
<lopex>
this will match
<enebo>
Search back from a char
<lopex>
enebo: is that close ?
<enebo>
lopex: I think something like /(?<=(?:coding|set_string_literal)\s*:)
<enebo>
heh I get a smiley
<lopex>
hmm alternatives are allowed afaik
<enebo>
lopex: basically we want fast find of ':' then see what is in front
<lopex>
but arbitrary quantifiers no
<lopex>
yeah
<enebo>
lopex: we can rescan with more expensive regexp
<enebo>
lopex: since it almost never happens
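[note: a sketch of that two-stage idea with a fixed-length look-behind, since joni/Onigmo restricts quantifiers inside look-behind as lopex notes; patterns and names are illustrative, warn_indent is omitted, and case-insensitivity is left to the full pass]
    CHEAP = /(?<=coding|frozen_string_literal|frozen-string-literal)\s*[:=]/
    FULL  = /(coding|frozen[_-]string[_-]literal|warn[_-]indent)\s*[:=]\s*([\w-]+)/i

    def magic_comment_value(comment)
      return nil unless CHEAP =~ comment   # let the engine hunt for ':'/'=' and check what is behind
      m = FULL.match(comment)              # rescan with the more expensive regexp on the rare hit
      m && m[2]
    end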
<enebo>
although the nexttoken check seems like it would have a huge effect on this
<lopex>
yeah
<enebo>
making the regexp faster would drop way off in value
<enebo>
although most mri stdlib files now have comments
<lopex>
enebo: but you can save positions in yy right ?
<lopex>
oh I'm missing something
<lopex>
enebo: so why does mri do that ?
<enebo>
lopex: I don't understand you
<lopex>
enebo: mri invented that right ?
<enebo>
lopex: invented what?
<lopex>
parsing of those magic comments
<enebo>
lopex: yeah they added coding and later the two pragmas
<lopex>
yeah
<enebo>
lopex: they call them 'features' I think
<enebo>
lopex: you can also pass them in as command-line options
<lopex>
and they're position aware
<enebo>
lopex: well you mean only at the top?
<lopex>
top comments ?
<lopex>
oh yeah
<enebo>
lopex: from what nirvdrum said it looks like they must precede the first real token
<lopex>
aren't they ?
<lopex>
hmm
<enebo>
lopex: but if that is so then I don't know why they constantly parse them
<enebo>
lopex: I can see one reason which is to add a warning
<lopex>
and they do ?
<enebo>
but I think maybe -w is needed
<lopex>
yeah
<lopex>
why does such a simple thing get so complicated
<lopex>
it's just a map of values in a comment
<enebo>
so we could if (!tokenSeen || warning) && parseMagicComment
<enebo>
at that point then only the first comment lines would be slow
<enebo>
although if there are tons of doc comments then the regexp is important again
<enebo>
which is typical of a lot of stdlib files now
<enebo>
so the regexp probably still has some importance
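[note: the guard enebo is sketching, as pseudo-Ruby; the real field lives in JRuby's Java lexer and these names are invented]
    def on_comment_line(line)
      # keep scanning past the first token only when warnings are enabled,
      # so the "magic comment after a token" warning can still fire
      parse_magic_comment(line) if !@token_seen || @warnings
    end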
<lopex>
well
<lopex>
it's all madness anyways
<enebo>
almost makes you wish you had a db
<enebo>
:)
<lopex>
yeah like redis
<lopex>
whatever
<lopex>
it's all about moral choices
<enebo>
just would be nice to have all code in IRScopes and completely parsed and totally lazy
<enebo>
IR persistence can lazily load methods and it is not faster so perhaps AST would be better
<enebo>
doing the simplest test
<enebo>
which matters to me: gem list
<enebo>
makes no obvious difference
<enebo>
although I would need to do a lot of runs to know
<enebo>
It is not big
<enebo>
2100 calls to magicComment with !tokenSeen for gem list
<enebo>
13400 without
<lopex>
quite a lot
<enebo>
yeah I guess 13 regexps is not a ton of time though
<enebo>
err 13k
<enebo>
nonetheless it seems like a good idea
<lopex>
depends on the regexps
<enebo>
lopex: of course
<enebo>
lopex: I just mean even though this could be a better regexp it is not going to be a lot of time
<enebo>
lopex: and consider before/after gem list was 1.6s and that is pretty short run time
<lopex>
but joni regexps are quite heavy
<enebo>
lopex: let's see if gem list will run if I comment it out
<enebo>
lopex: If I completely comment out parseMagicComment I see no timing difference
<enebo>
lopex: I am not saying it is not a reasonable change, and no doubt less work is less work since this is not a complicated change, but it is no smoking gun for us
<lopex>
sure
<enebo>
lopex: ok early dinner for me
<enebo>
well somewhat normal dinner for me but I eat early :)
enebo has quit [Quit: Leaving.]
<nirvdrum>
lopex: I was using "(coding|frozen[_-]string[_-]literal|warn[_-]indent)\\s*(:|=)\\s*(\\\"?[\\w-]*\\w\\\"?)";
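[note: that pattern as a Ruby literal, applied to a couple of typical preambles; the Java escaping above unwraps to this]
    re = /(coding|frozen[_-]string[_-]literal|warn[_-]indent)\s*(:|=)\s*("?[\w-]*\w"?)/
    re.match("# -*- coding: utf-8 -*-")[3]        # => "utf-8"
    re.match("# frozen_string_literal: true")[3]  # => "true"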