#jruby on 2018-02-09 — irc logs at freenode.irclog.whitequark.org

2017-12-07 19:49 ChanServ changed the topic of #jruby to: Get 9.1.15.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:08 drbobbeaty has joined #jruby

02:03 Ha-Sch has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

02:29 swills has quit [Quit: quit]

04:00 xardion has quit [Ping timeout: 256 seconds]

04:02 xardion has joined #jruby

05:13 xardion has quit [Remote host closed the connection]

05:19 xardion has joined #jruby

05:51 <kares> headius: > or add a linked list for insertion ordering

05:52 <kares> but we need to keep the element order not the insertion one ... hash already tracks insertion order

05:52 <kares> btw. the internal Hash is there for compatibility

05:53 <kares> anyway I have a q - not sure what feature is that or what it does:

05:53 <kares> > We might also take this opportunity to implement direct addressing as added in Ruby 2.4

05:55 <GitHub54> [jruby] kares pushed 2 new commits to master: https://git.io/vAkSC

05:55 <GitHub54> jruby/master 89aeff6 kares: review/cleanup RubyTime a bit with some notable improvements :...

05:55 <GitHub54> jruby/master 2245465 kares: [refactor] RubyHash's keys/values impl to receive a thread-context

08:25 claudiuinberlin has joined #jruby

09:20 m4rCsi has quit [Quit: No Ping reply in 180 seconds.]

09:20 m4rCsi has joined #jruby

09:21 Ha-Sch has joined #jruby

09:22 livcd has quit [Changing host]

09:22 livcd has joined #jruby

10:10 Ha-Sch has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

11:28 shellac has joined #jruby

11:45 asarih has quit []

11:46 asarih has joined #jruby

12:35 bbrowning_away is now known as bbrowning

12:48 shellac has quit [Ping timeout: 260 seconds]

13:18 shellac has joined #jruby

13:43 shellac has quit [Quit: Computer has gone to sleep.]

13:45 rrutkowski has joined #jruby

13:47 rrutkowski has quit [Client Quit]

13:48 rrutkowski has joined #jruby

13:49 <nirvdrum> lopex, enebo: Am I way off, or can jcodings's Encoding#length be used when you know the code range is CR_7BIT or CR_VALID??

13:49 <nirvdrum> (Sorry, question mark key got stuck).

13:50 <lopex> nirvdrum: are you referring to the specialization I mentioned earlier ?

13:51 <lopex> nirvdrum: not now, but it could have separate non validating length routine

13:51 <lopex> for example if you know it's utf-8 and cr is valid then you just count the high bits on char head

13:52 <nirvdrum> lopex: No. Just something I noticed recently. If you have a UTF-8 string and you already know it is either CR_7BIT or CR_VALID, you only need to look at the first byte to get the character length.

13:52 <lopex> yes

13:52 <nirvdrum> That wouldn't be true for grapheme clusters, but MRI doesn't report those.

13:53 <lopex> you want to specialize for utf-8 ?

13:54 <lopex> afaik GB18030 is one of few where looking at char head is not enough

13:56 <lopex> but looks like you're talking exactly about the idea I mentioned earlier

13:59 rrutkowski has quit [Ping timeout: 276 seconds]

14:06 <nirvdrum> lopex: I'd expect the encoding to indicate if it can't handle it then. It looks like jcodings just uses a lookup table, so I don't really know what it does in the bad cases.

14:06 <nirvdrum> But part of it is the documentation for Encoding#length says "To be deprecated very soon (use length(byte[]bytes, int p, int end) version)"

14:06 <nirvdrum> And I can see, what looks to me, like a valid use case for it.

14:07 shellac has joined #jruby

14:07 <nirvdrum> lopex: I've read back in my log, but I think I missed what your idea was. If you don't mind, please recap.

14:09 <nirvdrum> Ahh, I see. GB18030Encoding stores a null value for that table. So you end up with an NPE.

14:11 shellac has quit [Client Quit]

14:17 <lopex> nirvdrum: I meant exactly cr valid specializations using non validating length aka length(byte)

14:18 <lopex> nirvdrum: but it would have to be consistent across encodings or like you're saying marked as usable

14:19 <nirvdrum> Yeah, we're on the same page then.

14:19 <lopex> nirvdrum: for utf-8 I'm not sure what's the fastest length

14:19 <nirvdrum> There's a lot of unnecessary rediscovery of information.

14:19 <lopex> nirvdrum: it's a bitpop for high bits being one

14:20 <lopex> nirvdrum: if there's an intrinsic it could be branchless
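
The branchless idea above can be sketched roughly as follows. This is a minimal illustration, not code from jcodings or JRuby: the class and method names are hypothetical, and it assumes the string is already known to be valid UTF-8 (CR_7BIT or CR_VALID) and that the byte passed in is a lead byte. It counts the leading one bits of the head byte with `Integer.numberOfLeadingZeros`, which HotSpot typically compiles to a single instruction.

```java
// Hypothetical helper, not part of jcodings: derives a UTF-8 character's
// byte length from its lead byte alone, without branches.
final class Utf8HeadLength {
    // Only meaningful for valid UTF-8 where `head` is a lead byte
    // (never a 10xxxxxx continuation byte).
    static int lengthFromHead(byte head) {
        // Move the byte to the top of the int and invert it, so the run of
        // leading one bits in `head` becomes a run of leading zero bits.
        int ones = Integer.numberOfLeadingZeros(~(head << 24));
        // 0 leading ones means ASCII (1 byte); 2, 3 or 4 leading ones mean
        // that many bytes. The extra term adds 1 only when ones == 0.
        return ones + ((ones - 1) >>> 31);
    }

    public static void main(String[] args) {
        byte[] heads = {(byte) 0x41, (byte) 0xC3, (byte) 0xE2, (byte) 0xF0};
        for (byte h : heads) {
            System.out.printf("0x%02X -> %d%n", h & 0xFF, lengthFromHead(h));
        }
    }
}
```

Whether this beats a couple of comparisons depends on the JIT and the input mix, so it would need measuring.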

14:21 <nirvdrum> I'm banking on the compiler being able to do two int comparisons and a conditional branch faster than a table lookup.

14:21 <lopex> yeah, definitely

14:21 <nirvdrum> But I suppose if you have a 4-byte character you're talking about doing that several times.

14:21 <lopex> yes

14:21 <lopex> not to mention unavoidable bounds checks

14:22 <nirvdrum> But both would be faster than the byte scan we're currently doing :-)

14:22 <nirvdrum> And obviously if you know that it's CR_7BIT you can short-circuit and return 1.
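
A rough sketch of the comparison-based approach with the CR_7BIT short-circuit, under stated assumptions: the `CodeRange` enum and both methods are hypothetical stand-ins rather than the JRuby or jcodings API, and the input is trusted to match the code range the caller claims. With CR_VALID the loop never inspects continuation bytes; it jumps from lead byte to lead byte.

```java
// Hypothetical sketch: character counting that trusts a known code range.
final class Utf8CharCount {
    // Stand-in for the code range flags discussed above (assumed names).
    enum CodeRange { CR_7BIT, CR_VALID, CR_UNKNOWN }

    // Byte length of a character from its lead byte, using plain comparisons.
    static int headLength(byte head) {
        int b = head & 0xFF;
        if (b < 0x80) return 1;   // 0xxxxxxx
        if (b < 0xE0) return 2;   // 110xxxxx
        return b < 0xF0 ? 3 : 4;  // 1110xxxx or 11110xxx
    }

    static int charCount(byte[] bytes, CodeRange cr) {
        // All-ASCII strings have exactly one byte per character.
        if (cr == CodeRange.CR_7BIT) return bytes.length;
        if (cr != CodeRange.CR_VALID) {
            throw new IllegalArgumentException("code range must be known and valid");
        }
        int count = 0;
        // Skip from lead byte to lead byte; no validation is performed.
        for (int p = 0; p < bytes.length; p += headLength(bytes[p])) {
            count++;
        }
        return count;
    }
}
```

For example, `charCount` over the UTF-8 bytes of a five-character string with one two-byte character visits five lead bytes out of six bytes total.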

14:37 bbrowning is now known as bbrowning_away

15:26 <lopex> nirvdrum: actually there could be additional specialization which returns 1 for invalid code points and chars

15:27 <lopex> sometimes you want to proceed and not blow

15:29 <nirvdrum> That Ruby allows the propagation of invalid strings blows my mind.

15:29 <nirvdrum> I was looking at some ActionPack (I think) code recently that creates an invalid UTF-8 string. And I can't imagine that's what they intended.

15:32 <lopex> nirvdrum: https://github.com/ruby/ruby/blob/trunk/regenc.c#L55

15:32 <lopex> this bush of ifs could be greatly reduced

15:33 <lopex> we dont have that function yet in jruby

15:33 <lopex> or in jcodings

15:36 <lopex> and onigmo uses it now for actual encoding length

15:36 <lopex> so it doesnt have to prevalidate the string

16:02 xardion has quit [Remote host closed the connection]

16:08 xardion has joined #jruby

16:54 <nirvdrum> lopex: It's the ONIGENC_PRECISE_MBC_ENC_LEN call I want to avoid.

16:54 <nirvdrum> If I already know the code range for the string.

16:55 <lopex> I know

16:55 <lopex> just pointing there's different semantics going on in mri codebase

16:55 <nirvdrum> Maybe I can just special-case UTF-8. It's the most widely used encoding in Ruby.

16:55 <lopex> yeah

17:03 <nirvdrum> One of my gripes about MRI is how the lowest common denominator causes common cases to be unnecessarily slower.

17:07 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

17:15 <lopex> nirvdrum: for utf-8 there's fast utf-8 counter but it requires unsafe

17:15 <lopex> goes four bytes at a time
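
For illustration only, a four-bytes-at-a-time counter can be sketched without Unsafe by reading ints through a `ByteBuffer`. This is not the disabled JRuby routine mentioned above; the class name is hypothetical and the input is assumed to be valid UTF-8. In valid UTF-8 the character count equals the number of bytes that are not continuation bytes (`10xxxxxx`), so the loop counts continuation bytes per 32-bit word and subtracts.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: counts UTF-8 characters one 32-bit word at a time.
final class Utf8WordCounter {
    static int charCount(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int continuation = 0;
        int i = 0;
        // Process whole 4-byte words; byte order is irrelevant for counting.
        for (int limit = bytes.length - 3; i < limit; i += 4) {
            int w = buf.getInt(i);
            // In each byte lane, bit 0 is set iff bit 7 is 1 and bit 6 is 0,
            // which is exactly the 10xxxxxx continuation pattern.
            int marks = (w >>> 7) & (~w >>> 6) & 0x01010101;
            continuation += Integer.bitCount(marks);
        }
        // Handle the remaining 0 to 3 tail bytes individually.
        for (; i < bytes.length; i++) {
            if ((bytes[i] & 0xC0) == 0x80) continuation++;
        }
        return bytes.length - continuation;
    }
}
```

The absolute `getInt(int)` call still carries a bounds check, which is presumably part of why an Unsafe-based version was attractive.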

17:16 <lopex> actually I forgot why it was disabled in jruby though

17:24 <lopex> nirvdrum: did you see http://blog.unicode.org/2018/02/unicode-emoji-110-characters-now-final.html ?

17:24 <lopex> and https://xkcd.com/1953/ ?

17:24 <lopex> that's what we need

17:24 <lopex> enebo: ^^

17:25 <nirvdrum> I haven't.

17:25 <nirvdrum> Even more reason for me to dislike emoji.

17:26 <enebo> heh lobster and flats heh I suppose every noun/thing will eventually have an emoji

17:27 <enebo> in 10,000 years they will find a working flash drive and it will be from some adolescent gamer. They will think we all communicated with a pictographic language

17:28 <lopex> implemented in javascript of course

17:28 <enebo> lopex: you see: https://jordaneldredge.com/projects/winamp2-js/

17:28 <lopex> yeah

17:28 <lopex> reminded me old times

17:30 <lopex> enebo: still have whole shelf of mp3 cds

17:30 <enebo> lopex: if this guy uses one of the desktop app wrappers for JS apps this will just end up being a thing

17:31 <enebo> lopex: winamp never ran on linux did it?

17:31 <lopex> enebo: https://github.com/captbaritone/winamp2-js

17:31 <lopex> enebo: there's been some replacements

17:32 <enebo> lopex: yeah that I think is the normative site for the jordaneldredge site

17:32 <enebo> lopex: or at least that is where the github link on bottom corner takes you to

17:33 <enebo> https://github.com/captbaritone/winamp2-js/issues/394

17:33 <lopex> and it would take 2 gigs

17:33 <enebo> hahah you think?

17:33 <lopex> no idea

17:33 <enebo> gnome3 runs chromium right now

17:33 <lopex> but who cares

17:34 <enebo> I tweeted to captbaritone he should make it a gnome shell extension

17:34 <enebo> then it would just run within that process I guess

17:34 <enebo> lopex: I do partially worry about having 6-7 isolated chromium processes to display JS+HTML

17:40 <lopex> enebo: I know nothing about it

17:51 <enebo> nirvdrum: lopex: https://twitter.com/wycats/status/962020465165266944

17:52 <enebo> since we spied a lobster

17:52 jrafanie has joined #jruby

17:53 <lopex> enebo: 21 bits wont be enough soon

17:53 <enebo> Bill Gates thinks otherwise

17:54 <enebo> funny though how limits push by faster than many think

17:54 <lopex> every string will have to contain emoji mode in the header

17:54 <enebo> I do recall several saying 128bit computers will never happen

17:56 <lopex> enebo: https://arxiv.org/abs/1802.02700

17:57 <enebo> ODINI

17:58 <enebo> lopex: predict when petabyte desktops become common

17:58 <lopex> enebo: dunno, they might just skip peta :P

17:58 <enebo> we could do a disk/memory/core prediction pool

17:59 <enebo> but many cores was done and dead and will now come back

17:59 <enebo> maybe not in traditional desktop though

18:29 <nirvdrum> enebo: You've heard my old man spiel about how I spend longer trying to figure out what an emoji is than it would take to read.

18:30 <nirvdrum> Lobsters are obvious. But I don't talk about one enough to warrant a special character for it.

18:31 <enebo> well you don't

18:32 <enebo> nirvdrum: it is weird how someone is able to argue an emoji as necessary

18:32 <enebo> do we have emoji for the 50 states yet?

18:33 <enebo> I mean outlines

18:33 <enebo> funny we seem to have flags but more people would recognize outlines of states instead of their state flag

18:43 claudiuinberlin has joined #jruby

18:53 rrutkowski has joined #jruby

18:54 rrutkowski has quit [Remote host closed the connection]

18:56 rrutkowski has joined #jruby

19:08 jrafanie has quit [Quit: Textual IRC Client: www.textualapp.com]

19:26 <nirvdrum> lopex: Part of what motivated this line of thought is I've been looking at the fast_blank gem. headius did a very straightforward port for JRuby at https://github.com/SamSaffron/fast_blank/pull/21

19:27 <nirvdrum> But both do a lot of unnecessary work if you already know the code range.

19:49 <lopex> nirvdrum: btw, joni now accepts cr7 bit in search options and it chooses different interpreter loop with faster opcodes

19:49 <lopex> and there

19:50 <lopex> case insensitive matching can also be sped up a lot, I have some ideas

19:51 <nirvdrum> lopex: Ooh. What version of joni is that?

19:51 <lopex> nirvdrum: 2.1.14

19:52 <lopex> until now it only used some faster opcodes for singlebyte encodings

19:52 <nirvdrum> Nice. I'll have to check that out.

19:53 <nirvdrum> I'd love to take a good pass over our regexp code. But I don't understand a good chunk of it.

19:55 <lopex> nirvdrum: oh, and new jcodings support casemapping

19:55 <lopex> but havent landed mri core code yet

19:55 <nirvdrum> What's casemapping?

19:55 <lopex> downcase, upcase, casecmp etc

19:56 <lopex> nirvdrum: https://github.com/jruby/jcodings/blob/master/test/org/jcodings/specific/TestCaseMap.java#L48

19:56 <lopex> I remember you asking about it here

19:59 <nirvdrum> Oh. I thought you meant in joni.

19:59 <lopex> joni doesnt use it

19:59 <lopex> I guess it's the only encoding function joni doesnt use

20:00 <nirvdrum> I think MRI 2.5 or 2.6 started doing something similar to what I added for ASCII chars: https://github.com/oracle/truffleruby/blob/master/src/main/java/org/truffleruby/core/string/StringSupport.java#L1381-L1395

20:00 <nirvdrum> I haven't looked to see if they carried that through to String methods or not.

20:03 <lopex> you mean the fast paths ?

20:46 codefinger has quit []

20:46 codefinger has joined #jruby

21:00 Puffball has quit [Remote host closed the connection]

21:07 <nirvdrum> I mean they stopped using encoding table lookups for rb_isspace and things like that.
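
A comparison-based ASCII whitespace test, as opposed to an encoding table lookup, might look like the sketch below. The names are hypothetical, and this is an ASCII-only illustration for strings already known to be CR_7BIT; it does not claim to reproduce the exact semantics of fast_blank or of any Ruby implementation.

```java
// Hypothetical sketch: table-free whitespace checks for 7-bit strings.
final class AsciiBlank {
    // True for space and for the control range \t, \n, \v, \f, \r (0x09..0x0D).
    static boolean isAsciiSpace(int c) {
        return c == ' ' || ('\t' <= c && c <= '\r');
    }

    // For a string known to be 7-bit, every byte is one character,
    // so no character-length discovery is needed at all.
    static boolean isBlank7Bit(byte[] bytes) {
        for (byte b : bytes) {
            if (!isAsciiSpace(b)) return false;
        }
        return true;
    }
}
```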

21:21 jeremyevans has quit [Quit: Lost terminal]

21:44 shellac has joined #jruby

21:52 rrutkowski has quit [Quit: rrutkowski]

21:52 jeremyevans has joined #jruby

21:53 rrutkowski has joined #jruby

21:54 <enebo> <"wrong constant name \"String\\u0000\""> expected but was

21:54 <enebo> <"wrong constant name String\u0000">.

21:54 <enebo> lopex: nirvdrum: you guys recall if there is a nice method for displaying non-printable RubyString characters and adding \" around it only in that case

21:55 <nirvdrum> enebo: Encoding has #isPrint on it.

21:55 <enebo> nirvdrum: yeah I know it does but I don't want to make another String display method if we have one

21:55 <nirvdrum> String#dump uses it, IIRC.

21:56 <enebo> yeah I did see that one ... hmm ... let me look at it again

21:56 <nirvdrum> There's String#scrub, too.

21:56 <nirvdrum> But I think that one might be the opposite of what you're looking for.

21:56 <enebo> dumpCommon in StringSupport maybe?

21:57 <enebo> ok dumpCommon is the logic I want

21:58 <enebo> I have the condition I am printing out a name from a symbol (usually) and it needs to use this string/nostring weird pattern

21:58 <enebo> err quote/no-quote
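
The quote/no-quote pattern can be illustrated with a small sketch. This is not JRuby's dumpCommon: the class and method are hypothetical, it works on `java.lang.String` rather than a RubyString/ByteList, and it uses `Character.isISOControl` as a simplified stand-in for an encoding-aware printable check. The name is returned bare when every character is printable, and quoted with escapes otherwise.

```java
// Hypothetical sketch of "quote only when something is unprintable".
final class NameQuoting {
    static String displayName(String name) {
        boolean printable = true;
        for (int i = 0; i < name.length(); i++) {
            if (Character.isISOControl(name.charAt(i))) {
                printable = false;
                break;
            }
        }
        // Common case: nothing to escape, so no quotes are added.
        if (printable) return name;

        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isISOControl(c)) {
                // Render control characters as a \uXXXX escape.
                sb.append(String.format("\\u%04X", (int) c));
            } else if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println("wrong constant name " + displayName("String"));
        System.out.println("wrong constant name " + displayName("String\0"));
    }
}
```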

21:58 <enebo> Nice that this is working: TypeError: can't dump anonymous class #<Module:0x670b40af>::T⏰⏳

21:58 <nirvdrum> Heh.

21:59 <enebo> but it means fixing every single error message which refers to types :|

22:01 shellac has quit [Ping timeout: 260 seconds]

22:04 <nirvdrum> I thought you had your Ruby exception logic fairly centralized.

22:05 <enebo> nirvdrum: well we do but class.getName() is complicated

22:06 <enebo> nirvdrum: I could split that j.l.String apart and look up each segment in symbol table to get proper string but I am instead calling a new method rubyName -> RubyString

22:06 <enebo> nirvdrum: which means fixing all callsites which generate errors

22:07 <enebo> it is even a tad more complicated since that might be an anonymous class name so it is not just splitting

22:09 <enebo> hahahaha noooo NameError: wrong constant name "String\x00"

22:09 <enebo> ok it must not see this as utf-8

22:11 <nirvdrum> What is it you're doing?

22:11 <enebo> oh hmm

22:12 <enebo> well something is doing const_get?("String\0") and it raises name error

22:12 <enebo> since it has \0 it needs to quote wrap since it has unprintable value

22:12 <nirvdrum> I mean in general.

22:12 <enebo> I am making all encodings work for all things

22:12 <nirvdrum> I'm wondering if we inherited whatever bug you're fixing :-)

22:13 <nirvdrum> Or whether this is a Ruby 2.4+ change.

22:13 <enebo> this is just fixing so mbc works for all the things

22:13 <enebo> so the String for the class name is no longer just printed out

22:14 <enebo> I use that as a key back to the symbol table which retrieves properly encoded identifier

22:14 <enebo> so I am getting an utf-8 String\0 RubySymbol/bytelist

22:14 <enebo> but the message I build up did not quite generate things as MRI wants them

22:15 <enebo> looking at dumpCommon I suspect this may just be if (MBCLEN_CHARFOUND_LEN(n) > 0) { is actually 0

22:15 <enebo> but I will trace through this

22:15 <nirvdrum> Okay. Null bytes are handled specially in various places.

22:15 <enebo> nirvdrum: and this is literally only for error display

22:15 <nirvdrum> String#rstrip will strip them, but String#lstrip will not, for instance.

22:16 <enebo> there is some validateConstant method which looks at \0

22:16 <enebo> so probably that was the unfucking code to make it look ok

22:16 <enebo> dumpCommon just does not have some path perhaps

22:19 <enebo> god it is hard to believe that \0 stuff is one-off in RubyModule.validateConstant?

22:20 <enebo> I can probably work around this by putting it into my error handling code but that is some weird shit

22:20 <enebo> And of course I wrote it in 14

22:32 <nirvdrum> lopex: Are the 7bit joni changes purely internal? Or is there a CR argument to pass?

22:33 <lopex> nirvdrum: Option.CR_7_BIT as match options

22:35 <nirvdrum> Thanks.

22:35 <lopex> a bit experimental still though

22:36 <nirvdrum> If I'm reading this right, you also look at the regexp's encoding, which defaults to US-ASCII.

22:38 <lopex> yeah, if it's singlebyte then also goes faster route

22:38 <nirvdrum> In those cases, I wouldn't need to pass the option, would I?

22:38 <nirvdrum> Got it.

22:42 <lopex> nirvdrum: I try to cover that mode in tests by double testing if the string is 7bit: https://github.com/jruby/joni/blob/master/test/org/joni/test/Test.java#L128

22:45 <lopex> the !encoding().isSingleByte() is to not to test the same thing twice

22:51 <nirvdrum> lopex: Hmm... I need to investigate more, but I'm seeing a pretty big performance regression when running: https://github.com/SamSaffron/fast_blank/blob/master/benchmark

22:52 <nirvdrum> This is going joni 2.1.12 -> 2.1.14

22:52 <nirvdrum> It's not Truffle trickery because we have a boundary around those calls.

22:53 <lopex> oh, I'll look into that then

22:55 <lopex> nirvdrum: that might be other changes since lots of joni was rewritten to catch up with onigmo

22:57 <nirvdrum> Yeah. Maybe a bug fix affected things.

22:57 <nirvdrum> You can comment out the native extension parts of that benchmark.

22:58 <nirvdrum> I'm really looking at the code here: https://github.com/SamSaffron/fast_blank/blob/9cb30fdb597fd2a5c13f4e603130de1ea090e28a/benchmark#L6-L15

22:58 * nirvdrum leaves for dinner

22:58 <lopex> yeah, I gather

22:59 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

23:10 claudiuinberlin has joined #jruby

23:15 atambo has quit []

23:15 atambo has joined #jruby

23:31 cremes has quit [Quit: cremes]

23:35 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

23:52 cremes has joined #jruby

23:53 cremes has quit [Client Quit]