shellac_ has quit [Quit: Computer has gone to sleep.]
shellac_ has joined #jruby
shellac_ has quit [Client Quit]
bbrowning_away is now known as bbrowning
lroca has joined #jruby
Puffball has quit [Ping timeout: 260 seconds]
Puffball_ has joined #jruby
lroca has quit [Quit: lroca]
Puffball_ has quit [Remote host closed the connection]
enebo has quit [Ping timeout: 240 seconds]
yosafbridge` has quit [Ping timeout: 240 seconds]
enebo has joined #jruby
yosafbridge has joined #jruby
sidx64 has joined #jruby
sidx64 has quit [Client Quit]
sidx64 has joined #jruby
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
sidx64 has joined #jruby
Guest68225 has quit [Ping timeout: 245 seconds]
me_ has joined #jruby
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
sidx64 has joined #jruby
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
sidx64 has joined #jruby
mkristian has joined #jruby
_whitelogger_ has joined #jruby
shellac_ has joined #jruby
claudiuinberlin has joined #jruby
shellac_ has quit [Quit: Computer has gone to sleep.]
shellac_ has joined #jruby
drbobbeaty has joined #jruby
rrutkowski has joined #jruby
rrutkowski has quit [Ping timeout: 255 seconds]
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
olle has joined #jruby
mkristian has quit [Quit: This computer has gone to sleep]
bzb has joined #jruby
fidothe has quit [Ping timeout: 240 seconds]
fidothe has joined #jruby
bzb has quit [Quit: Leaving]
drbobbeaty has joined #jruby
olle_ has joined #jruby
olle has quit [Ping timeout: 240 seconds]
olle_ is now known as olle
mkristian has joined #jruby
drbobbeaty has quit [Ping timeout: 240 seconds]
shellac_ has quit [Quit: Computer has gone to sleep.]
mkristian has quit [Quit: This computer has gone to sleep]
mkristian has joined #jruby
drbobbeaty has joined #jruby
bbrowning is now known as bbrowning_away
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
drbobbeaty has quit [Ping timeout: 245 seconds]
mkristian has quit [Quit: This computer has gone to sleep]
mkristian has joined #jruby
bbrowning_away is now known as bbrowning
<enebo> lopex: I am going to put out 1.0.28 of jcodings. I want the EUC-JP fixes
GitHub120 has joined #jruby
GitHub120 has left #jruby [#jruby]
<GitHub120> jcodings/master 3b6f3a4 Thomas E. Enebo: [maven-release-plugin] prepare release jcodings-1.0.28
<GitHub120> [jcodings] enebo pushed 1 new commit to master: https://git.io/vAbuv
GitHub41 has joined #jruby
GitHub41 has left #jruby [#jruby]
<GitHub41> [jcodings] enebo tagged jcodings-1.0.28 at 0d15ce3: https://git.io/vAbuU
GitHub83 has joined #jruby
GitHub83 has left #jruby [#jruby]
<GitHub83> jcodings/master 32707e8 Thomas E. Enebo: [maven-release-plugin] prepare for next development iteration
<GitHub83> [jcodings] enebo pushed 1 new commit to master: https://git.io/vAbuT
<lopex> enebo: ok
<lopex> nirvdrum: ^^
<lopex> enebo: do they work btw ?
<enebo> hahaha
<enebo> lopex: I hope so
<enebo> lopex: I will give it a quick check. Silly I did not bother. I trust you so much
<nirvdrum> lopex: Awesome. Thanks.
<lopex> enebo: this is the most annoying thing to test in jcodings
<enebo> lopex: well I have a test case for sure
<enebo> const_set and const_defined? now rely on these for identification of whether it is a valid constant name
<enebo> whereas in the past we used jlString
<enebo> lopex: speaking of fun!!!! if you wanted to be a rock star you could add flag/enum support for RubySymbols so we can mark what type of identifier they can represent
<enebo> lopex: MRI added this a while back whereas we O(n) check over and over
mkristian has quit [Quit: This computer has gone to sleep]
<enebo> sorry I meant rock star ninja
mkristian has joined #jruby
<lopex> enebo: what are those types ?
<enebo> oh let me get a link
<enebo> lopex: symbol.h things like is_const_id
<lopex> oh ruby_id_types
<enebo> doh...hmm maybe they don't cache that
<enebo> I thought they did
<enebo> lopex: oh hmm seems to still be a problem
<enebo> lopex: I will debug this to make sure
<GitHub184> [jruby] greghuc opened issue #5082: Puma web server busted on Java 9.0.4 https://git.io/vAbgH
<enebo> lopex: isCodeCType(42699, 13) fails for 'λ' for EUC-JP
<lopex> enebo: might be char type offset issue, looking
<enebo> lopex: this code looks weird
<enebo> isWordGraphPrint does not contain ALNUM as valid
<enebo> but ALNUM(13) is less than MAX_STD_CTYPE(14)
<enebo> so either there is missing logic or isWordGraphPrint is not permissive enough
<lopex> well, this matches mri
<lopex> but yeah, those char types are broke on mri too
<enebo> oh
olle has quit [Quit: olle]
<lopex> in another sense :P
<enebo> you said they have inlined some of these checks outside of this
<enebo> somewhere not in this code right?
<enebo> in MRI
<lopex> enebo: ONIG_ENCODING_EUC_JP->is_code_ctype(42699, 13, ONIG_ENCODING_EUC_JP) is zero too
<enebo> so lambda is not an ALNUM
<enebo> from ONIG perspective
<enebo> lopex: it is frustrating because in unicode it goes down other path to isInCodeRange and returns true for ALNUM
<lopex> enebo: and those have different ranges too
<enebo> yeah
<lopex> well, isWord also should go for ranges
<lopex> everything
<enebo> of course I go to EUC-JP page and it links to unicode entry but it is an ALNUM
<enebo> I think onigmo is just wrong here
<enebo> oh hmm
<enebo> should I use isWord and not isAlnum?
<enebo> lopex: actually what is the difference?
<enebo> isWord does fix it
<lopex> yeah, those are both true for unicode
<enebo> lopex: so maybe EUC-JP specifically does not think they are ALNUM while for unicode they do? but MRI will basically still think it is a valid identifier character for a constant.
<enebo> lopex: isWord is basically all characters which do not separate words? Is '$' isWord?
<lopex> for unicode ?
<enebo> lopex: I don't know for anything
<enebo> lopex: what does isWord mean
<lopex> for unicode it's 0-9a-zA-Z_
<lopex> from ascii range
<lopex> god knows what's there
<lopex> but the problem is in char types and not ranges
drbobbeaty has joined #jruby
rrutkowski has joined #jruby
<enebo> lopex: more or less I am depending on this method to look at a properly encoded string and ask if it represents a valid constant identifier
<lopex> what does mri have for that ?
<enebo> lopex: perhaps I should look at the lexer since it obviously is parsing
<enebo> lopex: I don't know...that was where those id types come into play
<enebo> return c != EOF && (Character.isLetterOrDigit(c) || c == '_' || !isASCII(c));
<enebo> this is our lexer
<enebo> which does not use jcodings at all
<enebo> which is fascinating since I did not read my original character through the lexer but transcoded it to EUC-JP
<enebo> likely this code should be the same as whatever ends up working in that method in RubySymbol
<enebo> #define is_identchar(p,e,enc) (rb_enc_isalnum((unsigned char)(*(p)),(enc)) || (*(p)) == '_' || !ISASCII(*(p)))
<enebo> #define parser_is_identchar() (!parser->eofp && is_identchar((lex_p-1),lex_pend,current_enc))
<enebo> That is MRI
<enebo> !ISASCII WOT!
<lopex> blech
<lopex> bleh even
<enebo> We do it as well but wtf
<enebo> So <256 gets isalnum and _ check but then anything else is fine?
<enebo> so at lexer level I could put some multibyte space char and it makes it past this point
<enebo> so MRI must validate this later somehow
<enebo> lopex: or am I mistaken?
<enebo> ./include/ruby/ruby.h:static inline int rb_isascii(int c){ return '\0' <= c && c <= '\x7f'; }
<lopex> there's #define is_identchar(p,e,enc) (ISALNUM((unsigned char)*(p)) || (*(p)) == '_' || !ISASCII(*(p))) in symbol.c too
<enebo> hehe so absolutely any character outside of that range will be valid for an identifier in the lexing portion of MRI (JRuby is a bit different since Character.isLetterOrDigit() I think will say yes/no for mbcs?)
<enebo> lopex: ok so my problem right now...lambda is ok in constant name in MRI but I have no idea how they approve it. !ISASCII seems mad. That cannot possibly be valid can it?
<enebo> lopex: I am just confused...perhaps MRI just doesn't care about what the characters are once they leave ASCII space?
<enebo> If it is that easy then no problem I guess but I thought we have a huge encodings database which tells us stuff
<lopex> it might be some remnants
<lopex> I dont expect consistency from mri
<enebo> lopex: you mean they started with this weird heuristic and never used the data once it was available
<enebo> lopex: a jcodings helper method which may be nice is isASCII(c)
<lopex> enebo: does the parser switch encodings at any time ?
<enebo> you mean within a single sourcefile?
<enebo> like having #coding: half way down?
<enebo> lopex: can our codepoints ever be negative?
<enebo> lopex: since it is in a signed value
<lopex> they shouldnt
<lopex> not sure about gb18030
<lopex> unicode is too small for that
<enebo> lopex: !ISASCII for us can just be > 0x7f
<enebo> not that it is a massive savings :P
<lopex> yeah Encoding.isAscii is exactly that
<enebo> oh there is an isAscii
<enebo> :)
<enebo> going to lunch
<lopex> enebo: I need to redigest what you said above about that parse thing
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
shellac_ has joined #jruby
mkristian has quit [Quit: This computer has gone to sleep]
shellac_ has quit [Ping timeout: 240 seconds]
shellac_ has joined #jruby
akp has joined #jruby
shellac__ has joined #jruby
shellac_ has quit [Ping timeout: 264 seconds]
claudiuinberlin has joined #jruby
<nirvdrum> lopex: Thanks for the notification. I have no idea what that code does or what it's for :-)
subbu is now known as subbu|lunch
<nirvdrum> lopex: Thanks.
<lopex> managed to add that in latest joni overhaul
<nirvdrum> Forking into our own project made a lot of sense, but it's certainly made other things a bit harder.
<nirvdrum> enebo: Still working on lexer improvements?
<enebo> nirvdrum: well I am still working on bytelist internally
<nirvdrum> Ahh.
<enebo> nirvdrum: but not specifically improving lexing
<nirvdrum> I'm looking at ways to eliminate some of the CoW faults for sharing at the moment.
<enebo> yeah CoW has weird properties
<nirvdrum> ByteList can be more efficient than Ropes here. At least the way I've implemented SubstringRope.
<enebo> but CoW from lex source should not really be one
<enebo> I keep thinking all identifiers should just CoW the same byte array of the source itself
<nirvdrum> There's a weird one where it goes ByteList -> String to get an identifier just to use it to look up in a Map keyed by String.
<nirvdrum> As far as I can tell, that String isn't used for anything else.
<nirvdrum> The keyword check, IIRC.
<enebo> heh I noticed we rebuild same regexp 4 times
<enebo> that should be refactored
<enebo> or one time is for looking for lvars in the regexp
<enebo> but we should just make one in lexer and pass it all the way through
<nirvdrum> lopex: Do you plan to have a new version of joni soon, or is 2.1.15 sticking around for a while?
<lopex> enebo: I think it can be released at any time
<enebo> LOL: mri23 -e 'Object.const_set("D\u202FD", 1); p Object.constants'
<enebo> lopex: so proof enough for me that the semantics of constants are not what is documented
<enebo> lopex: ANY non-ascii multibyte character is allowed after the first one
<lopex> hooray
<enebo> which is what the code says I guess
<enebo> no Ruby book ever written says this though
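A minimal Java sketch of the rule as read from the `is_identchar` macro quoted earlier in this log: ASCII letters and digits, `_`, or anything outside the ASCII range counts as an identifier character. The class and method names are hypothetical, not JRuby or jcodings API. The requirement that the first character be an ASCII uppercase letter is an assumption about the Ruby versions discussed here, not something this log confirms.

```java
// Hypothetical sketch, not JRuby code. It applies the is_identchar rule
// per Unicode codepoint instead of per byte.
final class IdentCharSketch {
    private IdentCharSketch() {}

    /** ASCII letters and digits only; stands in for ISALNUM on a single byte. */
    static boolean isAsciiAlnum(int c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z');
    }

    /** alnum || '_' || !ISASCII, where !ISASCII is simply c > 0x7f. */
    static boolean isIdentChar(int c) {
        // Negative values (for example an EOF marker) are rejected up front.
        return c >= 0 && (isAsciiAlnum(c) || c == '_' || c > 0x7f);
    }

    /**
     * Assumption: a constant name starts with an ASCII uppercase letter.
     * Every later codepoint only has to pass isIdentChar.
     */
    static boolean looksLikeConstantName(String name) {
        if (name.isEmpty()) return false;
        int first = name.codePointAt(0);
        if (first < 'A' || first > 'Z') return false;
        for (int i = Character.charCount(first); i < name.length(); ) {
            int c = name.codePointAt(i);
            if (!isIdentChar(c)) return false;
            i += Character.charCount(c);
        }
        return true;
    }
}
```

Under this sketch `looksLikeConstantName("D\u202FD")` is accepted, matching the `const_set` experiment above, while `"D D"` with an ASCII space is rejected.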
<enebo> lopex: nirvdrum: ok well I am ok releasing jcodings since my problem had nothing to do with it. Someone may as well get their goodies
<lopex> enebo: and joni
<enebo> lopex: do I have to? :)
<lopex> enebo: for nirvdrum
<nirvdrum> Are you guys still open to some invasive changes to make jcodings SVM-friendly?
<lopex> enebo: since that array reading
<lopex> sure why not
<nirvdrum> Because they'd be invasive :-)
<enebo> I guess it depends on what "invasive changes" means
<nirvdrum> I think we discussed in New Orleans. But it might've been Hiroshima.
<nirvdrum> enebo: Basically, we can't do dynamic class loading.
<enebo> ah yeah
<lopex> ah I recall now
<enebo> I think the answer was just having a second class which could load all those eagerly
<enebo> or something like that?
<nirvdrum> I need to look again, but I think my idea was to load all the classes, but keep them shallow. The tables would be read lazily.
<enebo> oh
<lopex> nirvdrum: and newInstants ?
<lopex> *instance
<enebo> so data would still be lazy load but all types would be present
<nirvdrum> I'm doing something different in TruffleRuby at the moment. There, I just threw away a bunch of jcodings.
<nirvdrum> It's ugly, but it works.
<nirvdrum> Look for TruffleOptions.AOT.
<nirvdrum> SVM sees the static TruffleOptions.AOT value and discards the other branch which contains the code doing the dynamic lookup.
<nirvdrum> enebo: Yeah, that's the idea. I haven't looked at it in a while. I *think* the additional overhead would be minimal. But I'd have to work out the thread-safety of the tables.
<nirvdrum> Since those are read-only, two threads both loading the tables wouldn't be the end of the world.
<nirvdrum> lopex: I'd have to look at that again.
<enebo> so if I remember this is not just a load time issue but also a memory one
<nirvdrum> Basically I don't want to head down this path if it's apt to be rejected out of hand. But I'm happy to collaborate on it.
<enebo> nirvdrum: lopex: how many types are we talking about?
<enebo> telling me one per encoding is not what I am asking :)
<lopex> no
<lopex> dunno, like 30 impls max ?
<nirvdrum> Memory potentially. But the tables would end up compiled into the process and currently the whole process is loaded into memory anyway. So I'm not sure there's really any savings to be had there.
<lopex> er more like 50
<enebo> yeah no one cares about 50 classes
<enebo> not at this point :)
<nirvdrum> Ruby has 110 encodings, but a good number of those are aliases.
<enebo> I am just wondering how much of an issue the data is from memory perspective
<nirvdrum> Loading the maps lazily would be more of a memory savings for the JVM.
<enebo> I am guessing it is megs of data not like 1meg of data
<enebo> yeah
<nirvdrum> Some of the encoding tables are 1MB+
<enebo> I am just being devil's advocate about just making it all eager
<nirvdrum> Loading all of them would be noticeable.
<enebo> ok yeah that will stack up quick
<nirvdrum> Let me just go measure.
<enebo> nirvdrum: well I wondered about loading them as a single piece of data
<nirvdrum> Maybe not so bad. 3.2 MB of table data.
<enebo> but we would not want to increase heap by several megs
<nirvdrum> They're compact binary implementations though, so it'd be more in memory.
<enebo> so perhaps lazy data makes sense unless 2 of it is utf encodings we always load
<enebo> ah yeah
<enebo> ok yeah I doubt we want that hit
<enebo> so we have compact data and we expand it on loading?
<nirvdrum> I see 51 encoding files (no idea if multiple classes per file) and 29 transcoding files.
<enebo> ah fudge
<enebo> I did a mvn:prepare before updating jcodings
<nirvdrum> It's loaded into a byte[] and int[] depending on whether it's a byte-oriented or word-oriented file.
<nirvdrum> The additional overhead won't be massive.
<nirvdrum> 16 bytes for an array header?
<enebo> well that does not sound like it is uncompressed or anything
<nirvdrum> lopex would know better.
<nirvdrum> While I'm at it, I'd love nothing more than to address the static index value in Encoding.
<nirvdrum> Basically I want to move all that readIntArray stuff out of the constructor.
<enebo> nirvdrum: It would be nice to hide it behind something simpler than making a zillion synch blocks
<nirvdrum> I can make eregon figure that part out :-)
<enebo> yeah
<enebo> I don't really know how this data is accessed either
<nirvdrum> From my naive standpoint, doing a simple null check should suffice. If two threads compete and both load the same table, whatever.
<enebo> yeah could be
<enebo> seems reasonable to me that you may have occasional race but result is same
xardion has joined #jruby
<enebo> if it is read-only it is really not that complicated to reason about
<nirvdrum> Alright. I'll take a crack at it then. You can poke holes in it when there's a PR.
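A rough Java sketch of the "null check, no synch block" idea discussed above: the table is loaded on first use, and two racing threads may both load it, which is harmless because the data is read-only and identical. The class name and the `Supplier`-based loader are hypothetical and are not how jcodings is structured. The `volatile` modifier is an addition in this sketch so that a fully populated array is what other threads see.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not jcodings code.
final class LazyTableSketch {
    private final Supplier<int[]> loader;

    // volatile: a thread that sees a non-null reference also sees the
    // array contents written before the reference was stored.
    private volatile int[] table;

    LazyTableSketch(Supplier<int[]> loader) {
        this.loader = loader;
    }

    int[] get() {
        int[] t = table;
        if (t == null) {
            // No lock: under contention more than one thread may run the
            // loader. Each produces equivalent read-only data, so whichever
            // store lands last is just as good as the others.
            t = loader.get();
            table = t;
        }
        return t;
    }
}
```

The trade-off is an occasional duplicate load in exchange for not taking a lock on every lookup.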
sidx64 has joined #jruby
shellac__ has quit [Quit: Computer has gone to sleep.]
<enebo> nirvdrum: done joni+jcodings
rrutkowski has quit [Ping timeout: 260 seconds]
sidx64_ has joined #jruby
sidx64 has quit [Ping timeout: 256 seconds]
<lopex> nirvdrum, enebo: most compact are transcoder tables
drbobbeaty has quit [Ping timeout: 265 seconds]
<lopex> code ranges, fold tables and case mapping specials are intertwined with sub array lengths and metadata bits
<lopex> and they also pack metadata within code point values
<lopex> hard to get any uglier
<lopex> these are also pretty
<lopex> so on java heap there will be a lot of smaller sub arrays
<lopex> so making it mirror image of mri data could actually make the heap smaller
cshupp has joined #jruby
<cshupp> @headius
<cshupp> This bug ios still broken
<cshupp> This bug is still broken
<enebo> IOS
<cshupp> Hi Tom
<GitHub129> [jruby] enebo reopened issue #5018: open3.rb broken in JRuby https://git.io/vNSL4
<cshupp> 9.1.16 didn't fix it.
<cshupp> thanks
<enebo> cshupp: yeah np. I guess I don't know what happened there
<cshupp> Appreciate it.
<cshupp> @ me in git if you want me to try it in a custom built branch this time around
<cshupp> Bye
cshupp has quit [Client Quit]
subbu|lunch is now known as subbu
sidx64_ has quit [Ping timeout: 240 seconds]
drbobbeaty has joined #jruby
claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
claudiuinberlin has joined #jruby
akp has quit [Remote host closed the connection]
akp has joined #jruby
akp has quit [Ping timeout: 248 seconds]
akp has joined #jruby
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
bbrowning is now known as bbrowning_away
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
shellac_ has joined #jruby
<nirvdrum> Where do you guys handle string interpolation?
<nirvdrum> It looks like IRBuilder#buildDStr.
<nirvdrum> I was blanking on the DStr part.
shellac_ has quit [Quit: Computer has gone to sleep.]
shellac_ has joined #jruby
shellac_ has quit [Quit: Computer has gone to sleep.]