<headius> did I break the build?
<headius> oops, I did!
<headius> hooray!
<GitHub22> [jruby] headius pushed 1 new commit to jruby-9.1: https://git.io/vxpbr
<GitHub22> jruby/jruby-9.1 f25b8be Charles Oliver Nutter: Delete unused method that no longer has a path.
<GitHub176> [jruby] headius pushed 1 new commit to master: https://git.io/vxpbi
<GitHub176> jruby/master bb8e0ad Charles Oliver Nutter: Merge branch 'jruby-9.1'
<travis-ci> jruby/jruby (jruby-9.1:f25b8be by Charles Oliver Nutter): The build passed. (https://travis-ci.org/jruby/jruby/builds/365882593)
drbobbeaty has quit [Ping timeout: 245 seconds]
<headius> zing
bga57 has quit [Ping timeout: 264 seconds]
bga57 has joined #jruby
akp has joined #jruby
oxddmr has joined #jruby
oxddmr has quit [Remote host closed the connection]
niKorigan has joined #jruby
niKorigan has quit [Remote host closed the connection]
eschwartz737C7X has joined #jruby
sidx64 has joined #jruby
eschwartz737C7X has quit [Remote host closed the connection]
james41382288YQX has joined #jruby
james41382288YQX has quit [Remote host closed the connection]
BizarreFruit has joined #jruby
<BizarreFruit> Good morning!
<BizarreFruit> Is anyone actually up yet? :)
BizarreFruit has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
<GitHub199> [jruby] perlun opened pull request #5140: spec/ruby: Bring in some changes from upstream (master...spec/update-with-sigterm-specs) https://git.io/vxhLb
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
sidx64 has joined #jruby
olle has joined #jruby
sidx64 has quit [Read error: Connection reset by peer]
sidx64 has joined #jruby
shellac has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
claudiuinberlin has joined #jruby
shellac has joined #jruby
olle has quit [Quit: olle]
olle has joined #jruby
olle has quit [Quit: olle]
<kares> enebo: rdubya: well go for it but that is just weird - was under the impression that millis/nanos worked properly at some point
drbobbeaty has joined #jruby
olle has joined #jruby
Osho has quit [Ping timeout: 260 seconds]
me- has quit [Ping timeout: 264 seconds]
Osho has joined #jruby
me has joined #jruby
me is now known as Guest1483
shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
<kares> rdubya: so I have looked into your complaint about binary - that is definitely not a recent thing
<kares> it was simply hiding due to tests - also the CI results somehow seem unreliable in this sense ;(
<kares> which is almost 2 months old and it is still failing
<kares> it will probably go even further down the road, the problem is that I'm getting the max_identifier_length problem with these older commits
<kares> so its not easy to get down to - but I have a feeling like this might have been the same thing as with the binary 'in-house' test suite failure
<kares> ... so really someone needs to chase it down :)
<rdubya> kares: ah, I thought we were good with binary data as long as prepared statements were disabled previously
<kares> its true that I am 'unemployed' for a few months already :) but I need to focus on stuff I care for the moment
<kares> rdubya: hey! seems not
<rdubya> were those tests being skipped or something?
<kares> maybe its due to the AR version upgrade ... who knows
<kares> they were not but they're failing for me now as I check out older commits
<kares> + the CI outcome I am not reproducing ;( so this is not something I can help out with quickly ... it seems
<kares> (I mean the older CI outcome - it's failing for me always)
<rdubya> with prepared statements off? I know they've always failed with them on (sorry if I'm being dense, I just don't remember seeing them fail previously)
<kares> even on that old commit of yours I posted above
<kares> hmm yes I had an env set
<kares> will double check
<kares> its on by default for PG right?
<rdubya> ok thanks, guess if its all the same problem I can try digging in again
<rdubya> yeah its on by default
<kares> oh well you're right - was checking the wrong thing ;(
<kares> used PS=false
<kares> but that wasn't supported all the way back with the Rails suite
<kares> so I guess I'll redo a few tests again
<rdubya> ok thanks for checking it out
<kares> its looking better
<kares> it seems to be pointing to the ByteaUtils update
<kares> so I guess revert... ;(
<kares> that will bring up some new failures on Rails' suite
<kares> that directly test escaping
<kares> feel free to revert that commit and push to 50-stable
<rdubya> ok, thanks for tracking it down
Guest1483 has quit [Ping timeout: 246 seconds]
Osho has quit [Ping timeout: 264 seconds]
Osho has joined #jruby
me_ has joined #jruby
me_ has quit [Ping timeout: 268 seconds]
Osho has quit [Ping timeout: 268 seconds]
Osho has joined #jruby
me_ has joined #jruby
me_ has quit [Ping timeout: 240 seconds]
Osho has quit [Ping timeout: 260 seconds]
me_ has joined #jruby
Osho has joined #jruby
Osho has quit [Ping timeout: 256 seconds]
me_ has quit [Ping timeout: 256 seconds]
mee has joined #jruby
mee has quit [Client Quit]
ebarrett has quit [Quit: WeeChat 1.9.1]
<enebo> kares: rdubya: output = RubyTimeOutputFormatter.formatNumber(dt.getMillisOfSecond(), 3, '0');
ebarrett has joined #jruby
<enebo> This is what strftime does in JRuby itself
<enebo> So the way we make that will return "000" onto the front and it then formats weird
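The zero-padding behavior enebo describes can be sketched as follows. This is a hypothetical stand-in for JRuby's `RubyTimeOutputFormatter.formatNumber` (the class name comes from the pasted call above; the body here is illustrative, not JRuby's actual implementation): left-pad a number to a fixed width, which is why an unset millisecond field in the `DateTime` comes out as `"000"`.

```java
// Illustrative re-creation of the padding behavior discussed above;
// not JRuby's actual RubyTimeOutputFormatter source.
public class PadDemo {
    // Left-pads `value` to `width` digits using `pad`, as strftime's
    // %L (milliseconds) formatting does.
    static String formatNumber(long value, int width, char pad) {
        StringBuilder sb = new StringBuilder(Long.toString(value));
        while (sb.length() < width) sb.insert(0, pad);
        return sb.toString();
    }

    public static void main(String[] args) {
        // 7 ms of a second prints as "007"; if millis were never set
        // on the DateTime, the field collapses to "000".
        System.out.println(formatNumber(7, 3, '0'));
        System.out.println(formatNumber(0, 3, '0'));
    }
}
```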
<kares> yeah it felt like its being hacked around ;(
<enebo> kares: I guess since nano is not in DateTime this is a big part of the problem
<kares> enebo: well it was handled before
<kares> there's places where its set onto the RubyTime
<kares> so changing DateTimeUtils to format it did not seem right ...
<kares> but thought you would figure it out sooner or later since you know JRuby's internals :)
<enebo> kares: hmm so the formatting code is from 13 so perhaps it is createTime which changed
<kares> be careful to watch for test failures (regressions) in all the suites ...
<kares> time handling is quite tricky territory
<kares> I can probably count time spent in weeks at this point
<kares> + did some of the things twice already :)
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<enebo> kares: well so long as I understand the actual format this is just how we call jruby itself (or I fix jruby itself but how we call it seems reasonable to me)
me has joined #jruby
me is now known as Guest23490
Osho has joined #jruby
<rdubya> enebo: sorry, just trying to catch up, are you saying we should fix the issue in arjdbc or that it is something that needs fixed in jruby internal?
<enebo> rdubya: well it is possible I fix it in JRuby itself so it no longer has this limitation but for all current jruby versions this calling convention does not work for strftime itself (maybe other methods) so I think we make a millis and nanos change to arjdbc
<enebo> Not sure if this is actually right though
<enebo> rdubya: from a basic logic perspective I think we just make a millis value and a nanos value but I will try and verify what JRuby really expects here
<enebo> so my first sentence may be unclear. We should change ARJDBC for sure since not everyone will immediately use 9.1.17 but that if I can fix it on JRuby itself I will also correct it there as well (so older uses of RubyTime will work)
<enebo> I should almost go back and see when this did work in the past
<enebo> strftime itself came into existence as Java code in 2013 so unless setNano logic in RubyTime affected DateTime instance this probably never worked for strftime
<enebo> setNTime is 2012 so unless constructor does it?
<rdubya> enebo: sounds good
<enebo> yeah constructor is even older I highly doubt strftime ever worked but it appears pretty much everything else does
<enebo> but this is wacky on jruby side
<enebo> we have two forms of constructors and one does not take nsec at all and does not muck with nsec as a field
<enebo> Logically how ARJDBC sets this up makes sense to me though. We set a 10^9 value of nanoseconds; milliseconds in DateTime should just be the first 3 of those digits
<enebo> so if we set 0 and a 10^9 should we be setting millis in DateTime also? Should we always set the nanos based on DateTime if we do not pass in a nanos
<enebo> On one hand I think milliseconds maybe is important in a DateTime? Some weird timezone or something? Don't know but having it be 0 when it is not 0 by itself seems weird. But when passed with nanos it seems like nanos could be more relevant number.
<enebo> rdubya: I think my PR is silly btw. We can calculate nanos as it was but then just calculate millis by dividing by 10^6
<enebo> rdubya: are you willing to give that a quick shot for your tests?
<rdubya> enebo: yeah I'm trying to wrap some stuff up here but I'll try to get to it
<enebo> rdubya: ok
shellac has quit [Quit: Computer has gone to sleep.]
<enebo> kares: you advocated moving 9.2 back to java.util.Date right?
<enebo> kares: it may be late but we are mandating Java 8 so it does look very attractive. It solves this nanosecond thing so long as it is as accurate for historical dates as joda is
olle has quit [Quit: olle]
olle has joined #jruby
olle has quit [Client Quit]
<rdubya> enebo: I had to tweak it a little bit (start + 5 => start + 4), but it looks like that fixes it
<enebo> rdubya: oh ok weird after I made that I figured that would not work because nanos is just a 10^6 number
<enebo> rdubya: so semantically in JRuby's constructor I am confused by what nanos even means now...everything beyond ms perhaps?
<enebo> rdubya: I would double check the fields on that time object as well as strftime (like asking for milliseconds and nsecs
<enebo> )
<rdubya> yeah i verified them
<rdubya> actually is there a good way to get milliseconds? I see usec and nsec but don't see a method for milliseconds…
<enebo> rdubya: I don't know offhand
<enebo> long milliseconds = nanoseconds / 1000000;
<enebo> long extraNanoseconds = nanoseconds % 1000000;
<enebo> return RubyTime.newTime(runtime, new DateTime(milliseconds), extraNanoseconds);
<enebo> This is how we handle a sole long value for making a Time instance
<enebo> so I guess that method implies nanos is not full nanos but just fraction of nanos after the millis
<enebo> I will at a minimum add some comments around the fields as to what nsec means
<enebo> We have a really confusing type with no documentation explaining what we expect here
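The convention implied by the pasted `RubyTime.newTime` snippet can be shown in isolation: a full nanosecond-of-second value is split into the millisecond part (which Joda's `DateTime` carries) and an "extra nanos" remainder below 10^6. This sketch only demonstrates the arithmetic, not JRuby's actual classes.

```java
// Sketch of the millis/nanos split discussed above: Joda DateTime
// holds milliseconds, and the RubyTime "nanos" field holds only the
// sub-millisecond remainder (always < 1,000,000).
public class TimeSplit {
    static long millisPart(long nanosOfSecond) {
        return nanosOfSecond / 1_000_000;
    }

    static long extraNanos(long nanosOfSecond) {
        return nanosOfSecond % 1_000_000;
    }

    public static void main(String[] args) {
        long nanos = 123_456_789L; // 0.123456789 of a second
        System.out.println(millisPart(nanos)); // millisecond part: 123
        System.out.println(extraNanos(nanos)); // remainder: 456789
    }
}
```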
<rdubya> I'll run the full suite but it looks like its generating it correctly now
<enebo> (times/dates are crazy sauce in general -- not necessarily our time/date stuff :) )
<enebo> rdubya: sweetness
<rdubya> kares: it looks like your change for the bytea stuff was correct, it must have just uncovered that those two other tests should have been breaking but weren't
<rdubya> and it gives me a path to consider for debugging those ones, so thanks for pointing me in that direction
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
<kares> enebo: I advocated what?
<kares> moving to Java 8 date-time APIs
<kares> its a dead end - don't go there
<enebo> kares: wasn't that you?
<kares> would break a lot
<kares> well I asked, the truffle guys mentioned it should be easy
<kares> but I do not think so
<enebo> kares: ok. yeah I sort of assumed that but I was not sure if something changed with Java Date since we abandoned it last and I think I misattributed it to you
<kares> however the thing I will implement is Date/Time/DateTime toJava conversion do new Java date-time types
<kares> enebo: with java.util.Date?
<enebo> kares: yeah
<kares> you mean the nanos thing
<kares> ?
<kares> Date does not have either
<kares> am really out of context
<enebo> kares: heh
<kares> and its a late friday night over here
<enebo> kares: I think nanos is in Java 8 date
<kares> RubyTime is fine - the nanopart is confusing but works
<kares> well
<kares> it just need some Java API for users
<kares> but that is beyond the current discussion here
<enebo> kares: yeah I am happy to not move away from joda as we have it working but I just remembered some older conversation wrong
<enebo> kares: but the reason I even remembered it was that Java 8 has nano resolution now
<enebo> kares: so ignore all that talk
<enebo> I am not advocating anything
<kares> btw. JODA APIs are I believe 'better'than what JDK did
<kares> more flexible for sure
<kares> + a common DateTime type which you do not have in 8
<kares> you have LocalDateTime etc
<kares> it certainly isn't a direct port - so far as I looked (haven't been using the 8 stuff much myself)
<enebo> kares: if only you could just drop in a newer version and have it just work...but maybe that is true now. You did it recently
<kares> yy it seems fine so far
<kares> but a release will really tell
<enebo> historically something seemed to always break on each joda update. They probably have finished breaking changes
<enebo> and I guess data is another issue
<enebo> we will find out some babylonian date bug in new version or something :P
xardion has quit [Remote host closed the connection]
<kares> its very stable - went through the changelogs
<kares> mostly adapting to newer TZ rules
<kares> which is still not sure for me whether JRuby needs to regenerate
<kares> ... mean that piece of pom project that JRuby has
<enebo> ah
<enebo> well we will find out :)
xardion has joined #jruby
guilleiguaran has joined #jruby
claudiuinberlin has joined #jruby
<GitHub33> [jruby] lopex pushed 1 new commit to master: https://git.io/vxjG9
<GitHub33> jruby/master dac6261 Marcin Mielzynski: split upcase/downcase/swapcase/capitalize arities and optimize String#casecmp?
<GitHub71> [jruby] lopex pushed 1 new commit to master: https://git.io/vxjZ3
<GitHub71> jruby/master 3691861 Marcin Mielzynski: restore accidentally modified poms
subbu is now known as subbu|lunch
<GitHub137> [jruby] lopex pushed 1 new commit to master: https://git.io/vxjCC
<GitHub137> jruby/master a7b0833 Marcin Mielzynski: fix typos
subbu|lunch is now known as subbu
<lopex> enebo: it seems corerange isnt propagated in Symbol -> String
<lopex> *coderange even
<enebo> lopex: oh
<enebo> lopex: yeah that is something I had not considered
<lopex> enebo: just realized casemapping goes slow paths for it
<enebo> so perhaps we should save CR in Rubysymbol
<lopex> enebo: easily done from what I can see
<lopex> yes
<enebo> lopex: bytelist_love though uses Symbol a lot
<enebo> lopex: and in most places it is only used as a RubyString for exception messages
<enebo> lopex: so I am not sure my branch is in jeopardy
<enebo> lopex: where have you seen this crop up?
<enebo> lopex: I mean have you see a common case where we make a symbol into a string and then hit the slow path for something
<lopex> enebo: :foo.upcase had a bug in jcodings jruby currently uses
<lopex> enebo: it went us-ascii.caseMap
<lopex> enebo: and not a specialized version was run
<enebo> lopex: heh. but that should have worked
<enebo> lopex: so you just found a bug and happened to notice that was slow
<lopex> enebo: yes
<enebo> lopex: I am not against the idea of having CR in RubySymbol but I am wondering if this is a real performance issue at the same time
<enebo> lopex: I mean it obviously can be a performance problem but is it in practice
<lopex> enebo: hmm
<enebo> lopex: another thing of interest is most symbols are tiny so I also wonder how much faster the fast path is
<lopex> enebo: like "ąsss".foo
<lopex> it will go the slowest path possible
<lopex> er
<lopex> wait
<enebo> heh
<enebo> lopex: afk for about 15 minutes I need to buy coffee beans
<lopex> kk
<GitHub30> [jruby] lopex pushed 2 new commits to master: https://git.io/vxjRY
<GitHub30> jruby/master de4e6cd Marcin Mielzynski: Merge branch 'master' of https://github.com/jruby/jruby
<GitHub30> jruby/master f20c4e6 Marcin Mielzynski: update jcodings
<GitHub187> [jruby] lopex pushed 1 new commit to master: https://git.io/vxjRV
<GitHub187> jruby/master fbfc388 Marcin Mielzynski: untag String#test_casecmp? and Symbol#test_casecmp?
sidx64 has joined #jruby
sidx64_ has joined #jruby
sidx64 has quit [Ping timeout: 240 seconds]
<enebo> lopex: I should not have even asked the question about performance. It just makes sense that since we scanned the bytes in a symbol we can leverage CR
<lopex> enebo: like :foo.upcase
<lopex> it doesnt rescan but goes slow path
<lopex> well, like mri
<enebo> lopex: yeah my original question was mostly does that happen but it is not a fair question since I cannot think of when that would happen either (but it might)
<lopex> enebo: but can a symbol end up utf-8 with unscanned 7 bit cr ?
<enebo> no I doubt it
<lopex> I mean, a string from symbol
<enebo> I think any string from a symbol which is 7bit is almost guaranteed to be 7bit clean
<enebo> so only mbc will end up utf-8 generally
<enebo> unless it is a non-ascii supporting encoding
<lopex> enebo: since there's no force_encoding on symbol
<enebo> yeah if it can be ascii from any ascii encoding it is just made into ascii encoding
<lopex> which invalidates cr
<enebo> err any 7bit ascii will be US-ASCII
<lopex> enebo: but still, it omits 7 bit cr specializations
<enebo> as a symbol even if it comes from UTF-8
<lopex> er
<enebo> if it is US-ASCII we can probably just mark CR as valid and 7bit
<enebo> err 7bit
<enebo> lopex: another thing we could do is maybe not use CR but merely length in bytes
<lopex> it's bug ?
<enebo> symbol.bytes == length && ascii is 7bit
<lopex> why not 7bit for both ?
<enebo> yeah good question
<lopex> plus ascii compatible
<enebo> maybe for non-scanned sequences?
<lopex> and why care about utf8?
<enebo> I don't get this if stmt though
<enebo> yeah how can they casemap utf8
<enebo> I guess it will only try to casemap ascii chars in utf8?
<lopex> enebo: jcodings has casemap ascii only flag
<enebo> I don't quite get it though. I mean to be utf-8 I think means it must contain non-7bit chars
<enebo> they just ignore them here don't they?
<enebo> lopex: just seems like a weird feature
<enebo> lopex: so this is their fast path for string
<lopex> enebo: and only for upcase/downcase
<enebo> ok
<enebo> yeah I see this is an option the user asks for
<lopex> enebo: we can do for swap/cap too
<enebo> perhaps this is super common with strings where anything non-ascii is not cased data
<enebo> so if that flag is set it will use any single byte encoding or utf-8 OR any 7bit which is not turkish
<lopex> enebo: well, the spec is for upcase/downcase
<lopex> :ascii is supported for all of those
<enebo> lopex: and the bug you are working through is this fast path is ignored because we make a symbol into a string which is 7bit but we do not know that because we lost cr
<lopex> enebo: that bug was fixed in jcodings some time ago
<lopex> just havent been updated
<enebo> lopex: even without passing in CR for new string we can just set it to 7BIT if it is US-ASCII right?
<lopex> enebo: but it seems because we lost cr
<enebo> because symbols must be valid CR
<enebo> symbols cannot be binary data
<enebo> just adding 7BIT would eliminate 99.999% of all slow paths just doing that tweak
<enebo> without having to change all symbol creation paths to accept or calc CR
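The tweak enebo proposes can be sketched as a small decision rule. Everything here is hypothetical and illustrative (the constant values and method names are not JRuby's actual API): when a symbol's bytes are converted to a string, an encoding of US-ASCII is enough to mark the result 7-bit clean, since symbols cannot hold broken data.

```java
// Hypothetical sketch of the discussed tweak: derive an initial code
// range for a symbol-to-string conversion from the encoding alone.
// Constant values and names are illustrative, not JRuby's real ones.
public class SymbolCr {
    static final int CR_UNKNOWN = 0; // not yet scanned
    static final int CR_7BIT    = 1; // every byte is 7-bit clean

    // Symbols are always valid, so US-ASCII implies 7-bit content;
    // anything else stays unknown until scanned.
    static int initialCodeRange(String encodingName) {
        return "US-ASCII".equals(encodingName) ? CR_7BIT : CR_UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(initialCodeRange("US-ASCII")); // 1 (7-bit)
        System.out.println(initialCodeRange("UTF-8"));    // 0 (unknown)
    }
}
```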
<headius> good afternoon
<lopex> enebo: well I thought newShared could
<lopex> enebo: because it's the most frequent producer of such strings
<lopex> enebo: not that I'm saying it's worth the penny
<enebo> lopex: but you saw my point about just marking 7bit if bytelist is US-ASCII?
<lopex> enebo: otoh newShared is used on vast majority of symbol methods
<lopex> enebo: yeah
<enebo> lopex: I think that eliminates all common uses of symbols
<lopex> enebo: still in newShared ?
<enebo> maybe
<lopex> enebo: this way you dont have to add cr there
<enebo> lopex: it could be done in to_s(context) too
<lopex> enebo: I'm saying about return newSymbol(runtime, newShared(runtime).downcase(context).getByteList());
<enebo> ah
<lopex> lots of those
<enebo> newShared may be best place then
<lopex> headius: looks like test/mri/ruby/enc isnt run at all ?
<enebo> lopex: I did see that to_s calls RubyString.newStringShared and not newShared
<enebo> lopex: so you may want to audit to make sure all symbol to string conversions call through that
<enebo> heh
<enebo> only to_s(runtime) does that
<enebo> everything else seems to use newShared
<lopex> which is RubyString.newStringShared(runtime, symbolBytes);
<lopex> hmm
<lopex> enebo: what if the encoding instance had a cr field ?
<lopex> and 7 bit for singlebyte encodings
<lopex> enebo: we wouldnt have to have anyconditions for initing cr
<lopex> and unknown for utf-8 for example
<lopex> it would simplify quite a few bits there
<enebo> lopex: so if I make something which is mutable then its encoding might change
<enebo> lopex: but I guess so would the CR
<enebo> if we did not have this
<lopex> yeah
<enebo> It would somewhat mean if I did Encoding theEncoding = someStringIwillMutate.getEncoding(); may not be valid later in the method
<lopex> er, that's a question what does it change
<lopex> enebo: just for initial setting the cr
<enebo> well it will be valid for what think of today as encoding but we would have to know not to trust the cr status
<enebo> so we can look at it initially but not later?
<lopex> dunno
<lopex> maybe we could
<enebo> yeah something "feels" wrong :)
<lopex> but what ?
<enebo> we store encoding as temp local a lot
<lopex> it might not matter
<lopex> that's why I'm saying for init
<enebo> if we ask for CR from it or even think we can after a particular point then we will get a bad answer
<enebo> lopex: but if I don't know the codebase how do I know that?
<enebo> I mean RubyString and RubySymbol will have cr field/methods but if you are just jumping around through source code you will see Encoding has it too
<lopex> not if encoding wont change
<enebo> yeah true
<lopex> and changing enc invalidates cr anyways
<enebo> I don't mean changing enc though
<enebo> I mean if I have 7bit CR on utf-8 encoding instance for a String and I add an mbc then I need to at a minimum change the encoding right?
<enebo> if we want encoding to match what the string has set as a cr field
<headius> lopex: maybe not...I know I ran them when implementing the rest of transcoding though
<enebo> so cr as advisory coming in and we replace the encoding in constructor with non cr holding encoding?
<enebo> lopex: perhaps that was what you meant
<lopex> enebo: another question
<lopex> if we force_encoding(:ascii-8bit)
<lopex> can we say cr is valid anyways ?
<enebo> I don't know :)
<enebo> probably valid
<enebo> it is binary at that point
<enebo> we cannot walk it wrong at that point
<enebo> I don't know if valid means that specifically though
<enebo> perhaps nirvdrum can tell me :)
<lopex> and force_encoding("iso-8859-1") ?
<lopex> s = "ą";s.force_encoding("iso-8859-1");p s.length
<enebo> seems reasonable it will print >1
<lopex> now it doesnt matter that there's no 7bit
<enebo> lopex: yeah but why are you asking?
<enebo> I look at both of those force_encodings as basically saying, fuck it I don't care what the date is...it will be a stream of bytes now
<enebo> s/date/data
<enebo> too much ardjbc debugging :)
<lopex> enebo: and then there's tons of specializations for encodings based on their instances
<lopex> and other properties like max length
<enebo> well I think passing CR as subtype to actual encoding is just a hack to get around needing to change signatures. I am not sure if are talking about that now though
<lopex> if cr would have more priority then maybe it could be simplified ?
<lopex> enebo: well, it would be "the lowest possible cr"
<enebo> lopex: if we talk about solving this we should not even be using CR as bit flags
<lopex> yeah, that's a different story
<enebo> I think putting CR into Encoding would make it harder to remove
<enebo> Same for bytelist for that matter
<enebo> If I had my way since we almost always scan bytelists I would add a charlength field
<enebo> if not scanned it would be -1 or something like that. If valid it would just be number of chars. If invalid it would be another special number -2
<lopex> yeah, we discussed that I think
<enebo> CR_VALID != -2, CR_UNKNOWN == -1, CR_7BIT (realSize == characterLength)
<enebo> yeah but getting back to original problem this would be in the bytelist
<enebo> so we would not need to pass in CR since we could derive it entirely from the bytelist
<enebo> but it would obviously have other significant benefits like immediately know it's length
<enebo> main downside would be more memory
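The charLength-encodes-code-range idea sketched above can be made concrete. This is a hypothetical scheme under the assumptions in the conversation (sentinel values and names are illustrative): negative sentinels mark unscanned or broken content, any non-negative value is a valid character count, and 7-bit follows for free whenever the character count equals the byte count.

```java
// Hypothetical single-field code range encoding, per the discussion:
// charLength == -1 means not yet scanned, -2 means broken bytes, and
// any value >= 0 is a valid character count. 7-bit is implied when
// charLength == realSize (every character occupies one byte).
public class CharLen {
    static final int UNKNOWN = -1;
    static final int BROKEN  = -2;

    static boolean isValid(int charLength) {
        return charLength >= 0;
    }

    static boolean is7Bit(int charLength, int realSize) {
        return charLength == realSize;
    }

    public static void main(String[] args) {
        System.out.println(isValid(5));       // valid, 5 characters
        System.out.println(isValid(UNKNOWN)); // not scanned yet
        System.out.println(is7Bit(5, 5));     // pure ASCII: chars == bytes
        System.out.println(is7Bit(3, 6));     // multibyte: chars < bytes
    }
}
```

A side benefit mentioned above falls out directly: `String#length` becomes a field read whenever the bytes have already been scanned.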
<nirvdrum> Storing char length makes a lot of sense.
<enebo> nirvdrum: I know you have thought about this a lot
<enebo> nirvdrum: other than memory is there really any downside over this bit stuff
<nirvdrum> I haven't put much thought into using special length values for code range encoding. We actually use an enum right now.
<nirvdrum> But I wouldn't mind getting rid of another field.
<enebo> I like the idea of string.length just returning immediately as well
<nirvdrum> Your definition of CR_7BIT doesn't work for ASCII-8BIT though.
<lopex> the best thing is cr/length invalidation in one shot
<enebo> unless it is some big creation from IO where we need to walk it but we would do that now
<nirvdrum> lopex: You could also get rid of that packed long you use now to encode CR and length.
<lopex> that's the point
<lopex> and more atomicity :P
<enebo> nirvdrum: ah well ASCII-8BIT can just be -2 right?
<nirvdrum> I think there might be one of the IBM encodings that's byte-wide and ASCII-compatible as well.
<enebo> realSize will still == length
<nirvdrum> ASCII-8BIT is always CR_7BIT or CR_VALID
<enebo> nirvdrum: so scan still has to happen so if not 7bit it is set to -2
<enebo> otherwise it is characterLength = realSize
<enebo> but in whether VALID or 7BIT realSize will be length
<nirvdrum> You wrote CR_VALID != -2
<nirvdrum> Maybe I'm confused.
<enebo> oh yeah I wrote that wrong :)
<enebo> so perhaps we have one more negative number
<enebo> or flip that but we have a lot of negative numbers
<nirvdrum> That works for ASCII-8BIT. But where would you store the char length for CR_VALID UTF-8?
<lopex> enebo: you have one more number since there's no -0 :P
<nirvdrum> This whole thing is kinda dumb. CR_BROKEN strings just shouldn't be allowed to propagate through the system.
<enebo> characterLength is validLength if positive
<lopex> er, less
<enebo> so non-valid is a special designator
<enebo> not known yet as another
<enebo> I guess we need to still know valid vs 7bit on 8bit ascii/raw
<nirvdrum> Don't you store CR in the object header?
<lopex> in flags
<enebo> nirvdrum: currently yeah
<enebo> at least in what I think of as part of ruby object header :)
<nirvdrum> So whether you encode CR in the length or not, you're still going to have to eat the cost of allocating a new int for charLength.
<enebo> nirvdrum: yep
<nirvdrum> I'd recommend starting by caching charLength and leave CR alone for the time being.
<nirvdrum> You're going to have to deal with essentially a new type of invalidation even without CoW.
<enebo> nirvdrum: this whole conversation started from realizing symbols made into strings lose their ability to know the CR
<nirvdrum> E.g., taking a substring of a ByteList means a new byte scan.
<enebo> nirvdrum: maybe anyways
<nirvdrum> Why would they lose that ability?
<enebo> we only pass in the bytelist when we make the string (or we make with a jlString)
<enebo> err make the symbol
<enebo> we do not pass CR in as part of that even if we already know it
<nirvdrum> Ahh.
<nirvdrum> You guys need to decide what ByteList should be :-)
<lopex> nirvdrum: and then intermediate string on :foo.upcase didnt have a cr
<nirvdrum> It's a nifty class for working with bytes, but since it already tracks an Encoding, it's really just the JRuby string representation.
<lopex> and upcase went slow path
<nirvdrum> Storing the CR in there makes sense to me.
<enebo> nirvdrum: I agree personally
<nirvdrum> But then you basically have mutable ropes (oxymoron, yes).
<enebo> Actually I suspect we all agree but history
<lopex> exaclty
<enebo> nirvdrum: obviously our other big issue with bytelist is encapsulation :P
<headius> I've tried several times to migrate CR and encoding logic into ByteList...it's not easy to do incrementally
<headius> I have a branch that at least makes CharSequence methods do proper encoding logic (toString, charAt, etc)
<enebo> are lambdas good enough to fix our issues?
<enebo> for encapsulation
<lopex> enebo: bytelist should be private github project :P
<enebo> headius: oh btw did you happen to notice subSequence was broken in bytelist 1.x
<headius> yes, I fixed that as well
<headius> kinda had to
<nirvdrum> I had migrated everything over to that CodeRangeable interface before we forked away.
<headius> but I haven't returned to that work
<nirvdrum> Maybe that would make things easier?
<enebo> yeah it is broken in arjdbc now
<headius> there are many methods in ByteList that are broken that we leave there because of legacy
<enebo> or a single usage of it is
<headius> but we need to make a clean break at some point
<nirvdrum> ByteList 2.0?
<enebo> yeah knowing it is 2.0 is fine since it is an internal dep
<enebo> I nearly have all bytelist.toString() killed in bytelist_love
<headius> yeah 2.0
<headius> i pushed a snapshot and a JRuby branch to see how it affected things but have not gotten back to it
<headius> pretty swamped right now
<GitHub168> [jruby] lopex pushed 1 new commit to master: https://git.io/vxjV5
<GitHub168> jruby/master e85f1f7 Marcin Mielzynski: update joni
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
sidx64_ has quit [Ping timeout: 264 seconds]
hbautista has joined #jruby
hbautista has quit [Remote host closed the connection]
drbobbeaty has quit [Ping timeout: 240 seconds]