drbobbeaty has joined #jruby
Ha-Sch has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
swills has quit [Quit: quit]
xardion has quit [Ping timeout: 256 seconds]
xardion has joined #jruby
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
<kares> headius: > or add a linked list for insertion ordering
<kares> but we need to keep the element order not the insertion one ... hash already tracks insertion order
<kares> btw. the internal Hash is there for compatibility
<kares> anyway I have a q - not sure what feature is that or what it does:
<kares> > We might also take this opportunity to implement direct addressing as added in Ruby 2.4
<GitHub54> [jruby] kares pushed 2 new commits to master: https://git.io/vAkSC
<GitHub54> jruby/master 89aeff6 kares: review/cleanup RubyTime a bit with some notable improvements :...
<GitHub54> jruby/master 2245465 kares: [refactor] RubyHash's keys/values impl to receive a thread-context
claudiuinberlin has joined #jruby
m4rCsi has quit [Quit: No Ping reply in 180 seconds.]
m4rCsi has joined #jruby
Ha-Sch has joined #jruby
livcd has quit [Changing host]
livcd has joined #jruby
Ha-Sch has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
shellac has joined #jruby
asarih has quit []
asarih has joined #jruby
bbrowning_away is now known as bbrowning
shellac has quit [Ping timeout: 260 seconds]
shellac has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
rrutkowski has joined #jruby
rrutkowski has quit [Client Quit]
rrutkowski has joined #jruby
<nirvdrum> lopex, enebo: Am I way off, or can jcodings's Encoding#length be used when you know the code range is CR_7BIT or CR_VALID??
<nirvdrum> (Sorry, question mark key got stuck).
<lopex> nirvdrum: are you referring to the specialization I mentioned earlier ?
<lopex> nirvdrum: not now, but it could have a separate non-validating length routine
<lopex> for example if you know it's utf-8 and cr is valid then you just count the high bits on char head
<nirvdrum> lopex: No. Just something I noticed recently. If you have a UTF-8 string and you already know it is either CR_7BIT or CR_VALID, you only need to look at the first byte to get the character length.
<lopex> yes
<nirvdrum> That wouldn't be true for grapheme clusters, but MRI doesn't report those.
<lopex> you want to specialize for utf-8 ?
<lopex> afaik GB18030 is one of the few where looking at char head is not enough
<lopex> but looks like you're talking exactly about the idea I mentioned earlier
rrutkowski has quit [Ping timeout: 276 seconds]
<nirvdrum> lopex: I'd expect the encoding to indicate if it can't handle it then. It looks like jcodings just uses a lookup table, so I don't really know what it does in the bad cases.
<nirvdrum> But part of it is the documentation for Encoding#length says "To be deprecated very soon (use length(byte[]bytes, int p, int end) version)"
<nirvdrum> And I can see, what looks to me, like a valid use case for it.
shellac has joined #jruby
<nirvdrum> lopex: I've read back in my log, but I think I missed what your idea was. If you don't mind, please recap.
<nirvdrum> Ahh, I see. GB18030Encoding stores a null value for that table. So you end up with an NPE.
shellac has quit [Client Quit]
<lopex> nirvdrum: I meant exactly cr valid specializations using non validating length aka length(byte)
<lopex> nirvdrum: but it would have to be consistent across encodings or, like you're saying, marked as usable
<nirvdrum> Yeah, we're on the same page then.
<lopex> nirvdrum: for utf-8 I'm not sure what's the fastest length
<nirvdrum> There's a lot of unnecessary rediscovery of information.
<lopex> nirvdrum: it's a bitpop for high bits being one
<lopex> nirvdrum: if there's an intrinsic it could be branchless
<nirvdrum> I'm banking on the compiler being able to do two int comparisons and a conditional branch faster than a table lookup.
<lopex> yeah, definitely
<nirvdrum> But I suppose if you have a 4-byte character you're talking about doing that several times.
<lopex> yes
<lopex> not to mention unavoidable bounds checks
<nirvdrum> But both would be faster than the byte scan we're currently doing :-)
<nirvdrum> And obviously if you know that it's CR_7BIT you can short-circuit and return 1.
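A minimal Java sketch of the idea discussed above: when a string is already known to be valid UTF-8 (CR_7BIT or CR_VALID), the character length can be read off the lead byte without validating the trailing bytes. The class and method names are hypothetical and are not part of jcodings or JRuby; the bit-counting variant relies on Integer.numberOfLeadingZeros, which the JIT typically compiles to a single instruction.

```java
final class Utf8HeadLength {
    // Length from the lead byte using plain comparisons. Only meaningful when
    // the string is already known to be valid UTF-8, so a continuation byte
    // never appears in the head position.
    static int lengthFromHead(byte head) {
        int b = head & 0xFF;
        if (b < 0x80) return 1;   // 0xxxxxxx: ASCII
        if (b < 0xE0) return 2;   // 110xxxxx
        if (b < 0xF0) return 3;   // 1110xxxx
        return 4;                 // 11110xxx
    }

    // Variant that counts the leading one bits of the lead byte.
    static int lengthFromHeadBits(byte head) {
        int ones = Integer.numberOfLeadingZeros(~(head << 24));
        return ones == 0 ? 1 : ones;  // no leading ones means ASCII
    }

    // Character count without validation. A 7-bit string short-circuits to
    // its byte length, since every character is one byte.
    static int charCount(byte[] bytes, boolean sevenBit) {
        if (sevenBit) return bytes.length;
        int count = 0;
        for (int p = 0; p < bytes.length; p += lengthFromHead(bytes[p])) {
            count++;
        }
        return count;
    }
}
```

The two length methods agree on every valid lead byte; they differ only on bytes that cannot start a character, which is exactly the input this fast path assumes away.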
bbrowning is now known as bbrowning_away
<lopex> nirvdrum: actually there could be additional specialization which returns 1 for invalid code points and chars
<lopex> sometimes you want to proceed and not blow
<nirvdrum> That Ruby allows the propagation of invalid strings blows my mind.
<nirvdrum> I was looking at some ActionPack (I think) code recently that creates an invalid UTF-8 string. And I can't imagine that's what they intended.
<lopex> this bush of ifs could be greatly reduced
<lopex> we dont have that function yet in jruby
<lopex> or in jcodings
<lopex> and onigmo uses it now for actual encoding length
<lopex> so it doesnt have to prevalidate the string
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
<nirvdrum> lopex: It's the ONIGENC_PRECISE_MBC_ENC_LEN call I want to avoid.
<nirvdrum> If I already know the code range for the string.
<lopex> I know
<lopex> just pointing out there's different semantics going on in mri codebase
<nirvdrum> Maybe I can just special-case UTF-8. It's the most widely used encoding in Ruby.
<lopex> yeah
<nirvdrum> One of my gripes about MRI is how the lowest common denominator causes common cases to be unnecessarily slower.
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
<lopex> nirvdrum: for utf-8 there's fast utf-8 counter but it requires unsafe
<lopex> goes four bytes at a time
<lopex> actually I forgot why it was disabled in jruby though
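For illustration, a sketch of a four-bytes-at-a-time UTF-8 character counter. This is not the disabled JRuby routine lopex refers to; it is an assumed variant that avoids Unsafe by reading words through ByteBuffer, and the class name is hypothetical. It counts continuation bytes (10xxxxxx) per 32-bit word and subtracts them from the byte length, which is only correct for input already known to be valid UTF-8.

```java
import java.nio.ByteBuffer;

final class Utf8WordCount {
    // Counts characters in bytes already known to be valid UTF-8 by
    // subtracting continuation bytes (10xxxxxx) from the byte length.
    static int charCount(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int continuations = 0;
        int i = 0;
        // Word loop: four bytes per iteration.
        for (; i + 4 <= bytes.length; i += 4) {
            int w = buf.getInt(i);
            // Keep bit 7 of each byte only where bit 7 is set and bit 6 is clear.
            int cont = w & 0x80808080 & ~(w << 1);
            continuations += Integer.bitCount(cont);
        }
        // Tail: the remaining 0 to 3 bytes, one at a time.
        for (; i < bytes.length; i++) {
            if ((bytes[i] & 0xC0) == 0x80) continuations++;
        }
        return bytes.length - continuations;
    }
}
```

Byte order does not matter here because the mask treats all four bytes of the word identically.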
<lopex> that's what we need
<lopex> enebo: ^^
<nirvdrum> I haven't.
<nirvdrum> Even more reason for me to dislike emoji.
<enebo> heh lobster and flats heh I suppose every noun/thing will eventually have an emoji
<enebo> in 10,000 years they will find a working flash drive and it will be from some adolescent gamer. They will think we all communicated with a pictographic language
<lopex> implemented in javascript of course
<lopex> yeah
<lopex> reminded me old times
<lopex> enebo: still have whole shelf of mp3 cds
<enebo> lopex: if this guy uses one of the desktop app wrappers for JS apps this will just end up being a thing
<enebo> lopex: winamp never ran on linux did it?
<lopex> enebo: there's been some replacements
<enebo> lopex: yeah that I think is the normative site for the jordaneldredge site
<enebo> lopex: or at least that is where the github link on bottom corner takes you to
<lopex> and it would take 2 gigs
<enebo> hahah you think?
<lopex> no idea
<enebo> gnome3 runs chromium right now
<lopex> but who cares
<enebo> I tweeted to captbaritone he should make it a gnome shell extension
<enebo> then it would just run within that process I guess
<enebo> lopex: I do partially worry about having 6-7 isolated chromium processes to display JS+HTML
<lopex> enebo: I know nothing about it
<enebo> since we spied a lobster
jrafanie has joined #jruby
<lopex> enebo: 21 bits wont be enough soon
<enebo> Bill Gates thinks otherwise
<enebo> funny though how limits push by faster than many think
<lopex> every string will have to contain an emoji mode in the header
<enebo> I do recall several saying 128bit computers will never happen
<enebo> ODINI
<enebo> lopex: predict when petabyte desktops become common
<lopex> enebo: dunno, they might just skip peta :P
<enebo> we could do a disk/memory/core prediction pool
<enebo> but many cores was done and dead and will now come back
<enebo> maybe not in traditional desktop though
<nirvdrum> enebo: You've heard my old man spiel about how I spend longer trying to figure out what an emoji is than it would take to read.
<nirvdrum> Lobsters are obvious. But I don't talk about one enough to warrant a special character for it.
<enebo> well you don't
<enebo> nirvdrum: it is weird how someone is able to argue an emoji as necessary
<enebo> do we have emoji for the 50 states yet?
<enebo> I mean outlines
<enebo> funny we seem to have flags but more people would recognize outlines of states instead of their state flag
claudiuinberlin has joined #jruby
rrutkowski has joined #jruby
rrutkowski has quit [Remote host closed the connection]
rrutkowski has joined #jruby
jrafanie has quit [Quit: Textual IRC Client: www.textualapp.com]
<nirvdrum> lopex: Part of what motivated this line of thought is I've been looking at the fast_blank gem. headius did a very straightforward port for JRuby at https://github.com/SamSaffron/fast_blank/pull/21
<nirvdrum> But both do a lot of unnecessary work if you already know the code range.
<lopex> nirvdrum: btw, joni now accepts cr7 bit in search options and it chooses different interpreter loop with faster opcodes
<lopex> and there
<lopex> case insensitive matching can also be sped up a lot, I have some ideas
<nirvdrum> lopex: Ooh. What version of joni is that?
<lopex> nirvdrum: 2.1.14
<lopex> until now it only used some faster opcodes for singlebyte encodings
<nirvdrum> Nice. I'll have to check that out.
<nirvdrum> I'd love to take a good pass over our regexp code. But I don't understand a good chunk of it.
<lopex> nirvdrum: oh, and new jcodings supports casemapping
<lopex> but havent landed mri core code yet
<nirvdrum> What's casemapping?
<lopex> downcase, upcase, casecmp etc
<lopex> I remember you asking about it here
<nirvdrum> Oh. I thought you meant in joni.
<lopex> joni doesnt use it
<lopex> I guess it's the only encoding function joni doesnt use
<nirvdrum> I think MRI 2.5 or 2.6 started doing something similar to what I added for ASCII chars: https://github.com/oracle/truffleruby/blob/master/src/main/java/org/truffleruby/core/string/StringSupport.java#L1381-L1395
<nirvdrum> I haven't looked to see if they carried that through to String methods or not.
<lopex> you mean the fast paths ?
codefinger has quit []
codefinger has joined #jruby
Puffball has quit [Remote host closed the connection]
<nirvdrum> I mean they stopped using encoding table lookups for rb_isspace and things like that.
jeremyevans has quit [Quit: Lost terminal]
shellac has joined #jruby
rrutkowski has quit [Quit: rrutkowski]
jeremyevans has joined #jruby
rrutkowski has joined #jruby
<enebo> <"wrong constant name \"String\\u0000\""> expected but was
<enebo> <"wrong constant name String\u0000">.
<enebo> lopex: nirvdrum: you guys recall if there is a nice method for displaying non-printable RubyString characters and adding \" around it only in that case
<nirvdrum> enebo: Encoding has #isPrint on it.
<enebo> nirvdrum: yeah I know it does but I don't want to make another String display method if we have one
<nirvdrum> String#dump uses it, IIRC.
<enebo> yeah I did see that one ... hmm ... let me look at it again
<nirvdrum> There's String#scrub, too.
<nirvdrum> But I think that one might be the opposite of what you're looking for.
<enebo> dumpCommon in StringSupport maybe?
<enebo> ok dumpCommon is the logic I want
<enebo> I have the condition I am printing out a name from a symbol (usually) and it needs to use this string/nostring weird pattern
<enebo> err quote/no-quote
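A rough Java sketch of the quote/no-quote pattern enebo describes: print a name bare when every character is printable, otherwise wrap it in quotes with escapes. This is not JRuby's dumpCommon; the class, method names, and printability test are assumptions, and it works on java.lang.String code points rather than an encoded ByteList, so it only mirrors the UTF-8 case.

```java
final class QuoteIfUnprintable {
    // Hypothetical helper: returns the name unchanged when every code point
    // is printable, otherwise a quoted form with backslash-u escapes.
    static String display(String name) {
        boolean printable = name.codePoints().allMatch(QuoteIfUnprintable::isPrintable);
        if (printable) return name;

        StringBuilder out = new StringBuilder("\"");
        name.codePoints().forEach(cp -> {
            if (cp == '"' || cp == '\\') {
                out.append('\\').appendCodePoint(cp);     // escape quote and backslash
            } else if (isPrintable(cp)) {
                out.appendCodePoint(cp);                  // printable: copy as-is
            } else {
                out.append(String.format("\\u%04X", cp)); // simplified escape form
            }
        });
        return out.append('"').toString();
    }

    // Simplified stand-in for an encoding-aware isPrint check.
    static boolean isPrintable(int cp) {
        return Character.isDefined(cp) && !Character.isISOControl(cp);
    }
}
```

With this sketch, a plain name such as String passes through untouched, while a name containing a NUL character comes back quoted with the NUL escaped.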
<enebo> Nice that this is working: TypeError: can't dump anonymous class #<Module:0x670b40af>::T⏰⏳
<nirvdrum> Heh.
<enebo> but it means fixing every single error message which refers to types :|
shellac has quit [Ping timeout: 260 seconds]
<nirvdrum> I thought you had your Ruby exception logic fairly centralized.
<enebo> nirvdrum: well we do but class.getName() is complicated
<enebo> nirvdrum: I could split that j.l.String apart and look up each segment in symbol table to get proper string but I am instead calling a new method rubyName -> RubyString
<enebo> nirvdrum: which means fixing all callsites which generate errors
<enebo> it is even a tad more complicated since that might be an anonymous class name so it is not just splitting
<enebo> hahahaha noooo NameError: wrong constant name "String\x00"
<enebo> ok it must not see this as utf-8
<nirvdrum> What is it you're doing?
<enebo> oh hmm
<enebo> well something is doing const_get?("String\0") and it raises name error
<enebo> since it has \0 it needs to quote wrap since it has unprintable value
<nirvdrum> I mean in general.
<enebo> I am making all encodings work for all things
<nirvdrum> I'm wondering if we inherited whatever bug you're fixing :-)
<nirvdrum> Or whether this is a Ruby 2.4+ change.
<enebo> this is just fixing so mbc works for all the things
<enebo> so the String for the class name is no longer just printed out
<enebo> I use that as a key back to the symbol table which retrieves properly encoded identifier
<enebo> so I am getting an utf-8 String\0 RubySymbol/bytelist
<enebo> but the message I build up did not quite generate things as MRI wants them
<enebo> looking at dumpCommon, I suspect this may just be that the check if (MBCLEN_CHARFOUND_LEN(n) > 0) { fails because the length is actually 0
<enebo> but I will trace through this
<nirvdrum> Okay. Null bytes are handled specially in various places.
<enebo> nirvdrum: and this is literally only for error display
<nirvdrum> String#rstrip will strip them, but String#lstrip will not, for instance.
<enebo> there is some validateConstant method which looks at \0
<enebo> so probably that was the unfucking code to make it look ok
<enebo> dumpCommon just does not have some path perhaps
<enebo> god it is hard to believe that \0 stuff is one-off in RubyModule.validateConstant?
<enebo> I can probably work around this by putting it into my error handling code but that is some weird shit
<enebo> And of course I wrote it in 14
<nirvdrum> lopex: Are the 7bit joni changes purely internal? Or is there a CR argument to pass?
<lopex> nirvdrum: Option.CR_7_BIT as match options
<nirvdrum> Thanks.
<lopex> a bit experimental still though
<nirvdrum> If I'm reading this right, you also look at the regexp's encoding, which defaults to US-ASCII.
<lopex> yeah, if it's singlebyte then also goes faster route
<nirvdrum> In those cases, I wouldn't need to pass the option, would I?
<nirvdrum> Got it.
<lopex> nirvdrum: I try to cover that mode in tests by double testing if the string is 7bit: https://github.com/jruby/joni/blob/master/test/org/joni/test/Test.java#L128
<lopex> the !encoding().isSingleByte() is there to avoid testing the same thing twice
<nirvdrum> lopex: Hmm... I need to investigate more, but I'm seeing a pretty big performance regression when running: https://github.com/SamSaffron/fast_blank/blob/master/benchmark
<nirvdrum> This is going joni 2.1.12 -> 2.1.14
<nirvdrum> It's not Truffle trickery because we have a boundary around those calls.
<lopex> oh, I'll look into that then
<lopex> nirvdrum: that might be other changes since lots of joni was rewritten to catch up with onigmo
<nirvdrum> Yeah. Maybe a bug fix affected things.
<nirvdrum> You can comment out the native extension parts of that benchmark.
* nirvdrum leaves for dinner
<lopex> yeah, I gather
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
claudiuinberlin has joined #jruby
atambo has quit []
atambo has joined #jruby
cremes has quit [Quit: cremes]
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
cremes has joined #jruby
cremes has quit [Client Quit]