<nirvdrum>
lopex: I'm looking at the caseMap functionality in jcodings right now. Is it possible that changing the case of a character produces a new one with a different byte length?
<nirvdrum>
I wasn't trying to make a thing of this :-P
<lopex>
nirvdrum: that's that linked buffer list
<lopex>
nirvdrum: you might actually keep those as ropes but it would have pathological cases
<nirvdrum>
On these calls, I flatten all ropes and just work with byte buffers anyway.
<lopex>
but generally it's just to amortize allocations
<nirvdrum>
I think I'm okay with being slow but functional for now.
<nirvdrum>
I'm just trying to wrap my head around the API. Having to allocate the output buffer without having an idea how large the output could be is tricky.
<lopex>
nirvdrum: actually there's everything you need
<lopex>
option handling etc
<headius>
w00t, new class reification is in for 9.2
<headius>
specialize all the shapes!
<lopex>
cool
<headius>
after 9.2 I'll be making array specialization generator plus long and double support
<lopex>
headius: so every new ivar switches as in hidden classes ?
<headius>
should be good with graal PEA
<nirvdrum>
lopex: Cool. But that 20 byte extra is just arbitrarily chosen, isn't it?
<lopex>
nirvdrum: yes, looks like it
<headius>
lopex: right now it's a static inspection of methods in the class for ivars
<headius>
in the future it can be per-object by moving the variable shaper thingy down into each object
<nirvdrum>
lopex: I'm debating whether taking an educated guess and handling the ArrayIndexOutOfBoundsException on a bad, small guess, might be worth pursuing.
<lopex>
nirvdrum: it's to have an additional length for a single buffer
<lopex>
nirvdrum: ok, I recalled, it's the single max character length increase
<lopex>
after that new buffer with (end - left) + CASE_MAPPING_ADDITIONAL_LENGTH is created
<guyboertje>
More people talking now. I asked earlier, asking again. What is the JRuby state of play regarding O_DIRECT support? Does jnr have support? For Logstash, I'm looking to read from an NFS mount directly.
<headius>
guyboertje: still new to me!
<headius>
:-D
<guyboertje>
hehe
<headius>
so "support" really is just whether you have the constant value somewhere, like in jnr-constants
<headius>
if it's not there then you can help us add it, but it's just a numeric flag you can pass to open(2)
<headius>
I don't know exactly what it does so I can't comment on the change in semantics
<nirvdrum>
It just caught me off-guard. I had allocated a buffer smaller than this slack value, so the new toEnd went negative and the loop never executed.
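A minimal sketch of how a reserved slack region can make such a loop a silent no-op. This is not the jcodings code: the class, the method, and the loop shape are hypothetical, and the value 20 is only taken from the "20 byte extra" mentioned earlier in this log.

```java
final class SlackLoopDemo {
    // Hypothetical stand-in for the non-exposed slack constant discussed in the log.
    static final int SLACK = 20;

    // Copies bytes from `from` into `to`, but stops SLACK bytes before the end of `to`.
    // Returns the number of bytes written.
    static int copyWithSlack(byte[] from, byte[] to) {
        // If the caller's buffer is shorter than SLACK, this bound is negative...
        int toEnd = to.length - SLACK;
        int fromPos = 0;
        int toPos = 0;
        // ...and the loop condition is false on the first check, so nothing is written.
        while (fromPos < from.length && toPos < toEnd) {
            to[toPos++] = from[fromPos++];
        }
        return toPos;
    }

    public static void main(String[] args) {
        byte[] input = {1, 2, 3, 4, 5};
        System.out.println(copyWithSlack(input, new byte[10])); // destination smaller than SLACK
        System.out.println(copyWithSlack(input, new byte[30])); // destination has room beyond SLACK
    }
}
```

Under these assumptions, a caller avoids the problem by sizing the destination as the input length plus the slack, which is only possible if the slack value is known to the caller.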
<lopex>
nirvdrum: according to mri it's /* length in bytes for three characters in UTF-32; e.g. needed for ffi (U+FB03) */
<lopex>
nirvdrum: actually those buffer lists should all be moved to jcodings
<nirvdrum>
It just seems odd that it would override what I'm specifying as the end point of my destination buffer.
<lopex>
nirvdrum: in mri they always add that additional length to each buffer allocation
<nirvdrum>
I don't know how anyone is supposed to know to make that buffer larger than some non-exposed constant.
<lopex>
and each call to caseMap is on a newly allocated buffer
<nirvdrum>
But that CASE_MAPPING_SLACK constant is local to UnicodeEncoding, isn't it?
<lopex>
yes
<nirvdrum>
So how should a caller ensure whatever they supply is larger than that value?
<lopex>
nirvdrum: I'd have to dig through mri commits
<nirvdrum>
No worries.
<lopex>
but yeah, it was weird for me too
<nirvdrum>
I just didn't know if there were another jcodings constant available that I should be using.
<nirvdrum>
For now, I've just arbitrarily made my value 100.
<lopex>
nirvdrum: only that additional length, from what I recall
<lopex>
nirvdrum: all other variable length encodings do just ascii case mapping
<nirvdrum>
I'm probably going about this strangely, but I'm going to attempt translating one character at a time.
<lopex>
it will have much more overhead that way
<nirvdrum>
I'm not convinced of that at the moment.
<nirvdrum>
I suspect the vast majority of case change operations will take up the same number of bytes and can thus be written back to the original buffer.
<lopex>
yeah, for 7 bit you can use ASCIIEncoding.INSTANCE.caseMap with the same buffer
<lopex>
but for the general case you don't know
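For 7-bit input, upcasing never changes the byte length, so the source buffer can double as the destination. The sketch below is a plain-Java stand-in for that idea and does not call jcodings; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class AsciiUpcase {
    // Upcases ASCII letters in bytes[start, end) in place.
    // Bytes outside 'a'..'z' (including any non-ASCII bytes) are left untouched.
    static void upcaseInPlace(byte[] bytes, int start, int end) {
        for (int i = start; i < end; i++) {
            byte b = bytes[i];
            if (b >= 'a' && b <= 'z') {
                bytes[i] = (byte) (b - ('a' - 'A'));
            }
        }
    }

    public static void main(String[] args) {
        byte[] buffer = "jruby 9.2".getBytes(StandardCharsets.US_ASCII);
        upcaseInPlace(buffer, 0, buffer.length);
        System.out.println(new String(buffer, StandardCharsets.US_ASCII));
    }
}
```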
<nirvdrum>
Yeah. I may very well be wrong.
<lopex>
unless you want to fall back to reallocation on first such char
<nirvdrum>
lopex: I just did a quick check. Of ~1,114,112 Unicode characters, only 124 have a different bytesize from their upcased versions. It's approximate because I just skipped if the codepoint was invalid.
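A quick check of this kind can be approximated with the Java standard library alone. This sketch assumes UTF-8 as the encoding and uses `String.toUpperCase(Locale.ROOT)` rather than jcodings, so the count it prints depends on the Unicode version of the running JDK and need not match the figure quoted above. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class UpcaseSizeSurvey {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when upcasing this single code point changes its UTF-8 byte length.
    static boolean changesSize(int codePoint) {
        String original = new String(Character.toChars(codePoint));
        String upcased = original.toUpperCase(Locale.ROOT);
        return utf8Length(original) != utf8Length(upcased);
    }

    // Counts code points whose upcased form has a different UTF-8 byte length.
    static int countSizeChanging() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip surrogates (not encodable on their own) and unassigned code points.
            boolean surrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (surrogate || !Character.isDefined(cp)) {
                continue;
            }
            if (changesSize(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("size-changing code points: " + countSizeChanging());
    }
}
```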
<nirvdrum>
Of course, you get into character frequency and all that.
<lopex>
so fall back to allocation on first such char ?
<lopex>
and why mri didn't bother
<nirvdrum>
That's the idea.
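The approach discussed here (write back in place while sizes match, reallocate on the first character whose size changes) could look roughly like the sketch below. It assumes well-formed UTF-8 input and uses `String.toUpperCase(Locale.ROOT)` as a stand-in for the jcodings case mapper; all names are hypothetical and the per-character `String` allocations are for clarity, not speed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class InPlaceUpcase {
    // Byte length of a UTF-8 sequence, derived from its lead byte (input assumed well-formed).
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xF0) return 4;
        if (b >= 0xE0) return 3;
        return 2;
    }

    // Upcases the buffer in place when possible and returns it.
    // On the first character whose upcased form has a different byte length,
    // falls back to building a new buffer and returns that instead.
    static byte[] upcaseUtf8(byte[] bytes) {
        int pos = 0;
        while (pos < bytes.length) {
            int len = sequenceLength(bytes[pos]);
            String ch = new String(bytes, pos, len, StandardCharsets.UTF_8);
            byte[] mapped = ch.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
            if (mapped.length != len) {
                return upcaseWithReallocation(bytes, pos);
            }
            System.arraycopy(mapped, 0, bytes, pos, len);
            pos += len;
        }
        return bytes;
    }

    // Slow path: keep the already-upcased prefix and upcase the rest into a new buffer.
    private static byte[] upcaseWithReallocation(byte[] bytes, int pos) {
        String rest = new String(bytes, pos, bytes.length - pos, StandardCharsets.UTF_8);
        byte[] tail = rest.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        byte[] result = new byte[pos + tail.length];
        System.arraycopy(bytes, 0, result, 0, pos);
        System.arraycopy(tail, 0, result, pos, tail.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] input = "stra\u00DFe".getBytes(StandardCharsets.UTF_8);
        byte[] output = upcaseUtf8(input);
        System.out.println(new String(output, StandardCharsets.UTF_8) + " in place: " + (output == input));
    }
}
```

The caller has to use the returned array rather than assume the input was updated, since the slow path hands back a different buffer.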
<nirvdrum>
It certainly has demon cases.
<lopex>
also most usages are non-bang versions
<lopex>
although mri doesn't use that
<nirvdrum>
I'm a bit worried with this approach. Surely someone else has thought of it.
<lopex>
same here
<nirvdrum>
But I'm willing to try it and report back results.
<lopex>
nirvdrum: the most common might be 'ß'
<nirvdrum>
But that's a 2 byte character that's replaced by 2 single-byte characters :-P
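The 'ß' case can be checked with the Java standard library, assuming UTF-8 and `String.toUpperCase(Locale.ROOT)` in place of jcodings. The sketch also prints two characters whose upcased form is shorter, for contrast; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class CaseByteSizes {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Upcase with a fixed locale so the result does not depend on the default locale.
    static String upcase(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // ß (U+00DF), dotless i (U+0131), long s (U+017F)
        String[] samples = { "\u00DF", "\u0131", "\u017F" };
        for (String s : samples) {
            String up = upcase(s);
            System.out.printf("U+%04X: %d byte(s) -> \"%s\": %d byte(s)%n",
                    s.codePointAt(0), utf8Length(s), up, utf8Length(up));
        }
    }
}
```

In UTF-8, 'ß' stays at 2 bytes because it becomes two single-byte characters, while U+0131 and U+017F each shrink from 2 bytes to 1.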