<kostya>
hi, with the latest compiler the brainfuck sample just hangs
<kostya>
outputs only Z
<kostya>
oh, it just became very slow, in release
asterite1 has joined #crystal-lang
<asterite1>
Oh, really? Let me see
e_dub has joined #crystal-lang
<asterite1>
We recently changed strings to make them fully utf-8 aware, so maybe some methods are now slower
<asterite1>
but we should make them fast again if possible
<asterite1>
@kostya well, it's because String#length has to traverse all chars to count the length, and String#[](index) has to traverse the chars from the beginning to find it
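To illustrate why both operations are linear on a UTF-8 byte buffer, here is a minimal Python sketch. It is not Crystal's actual implementation: the helper names `utf8_length` and `utf8_char_at` are made up for this example, and the input is assumed to be valid UTF-8.

```python
def utf8_length(data: bytes) -> int:
    """Count code points by skipping continuation bytes (0b10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at `index`, scanning from the start (O(n))."""
    if index < 0:
        raise IndexError(index)
    pos = 0   # byte offset of the current code point
    seen = 0  # number of code points passed so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


data = "héllo, wörld".encode("utf-8")
print(len(data), utf8_length(data), utf8_char_at(data, 8))
```

Calling either helper inside a loop over the whole string turns a linear algorithm into a quadratic one, which would be consistent with the slowdown reported above.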
<asterite1>
I guess we'll have to do what C# does and what @waj wants: we'll store a bit in the string that says if the string is ascii-only. That way many algorithms can be done faster
<asterite1>
Both #length and #[] become O(1) that way, instead of O(n)
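A rough sketch of the ascii-only flag idea, in Python rather than Crystal. The class `Utf8String` and its method names are hypothetical and only model the behaviour described here; they are not the compiler's real data layout.

```python
class Utf8String:
    """UTF-8 bytes plus a flag computed once, when the string is created."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        # One pass at construction; afterwards the flag is free to consult.
        self.ascii_only = all(b < 0x80 for b in self.data)

    def bytesize(self) -> int:
        return len(self.data)  # always O(1)

    def length(self) -> int:
        if self.ascii_only:
            return len(self.data)  # one byte per char: O(1)
        # Fallback: count every byte that is not a continuation byte, O(n).
        return sum(1 for b in self.data if (b & 0xC0) != 0x80)

    def char_at(self, index: int) -> str:
        if self.ascii_only:
            return chr(self.data[index])  # direct byte access: O(1)
        # Fallback: decode the whole buffer, then index, O(n).
        return self.data.decode("utf-8")[index]


for s in (Utf8String("hello"), Utf8String("héllo")):
    print(s.ascii_only, s.bytesize(), s.length(), s.char_at(1))
```

The fast path only applies when every byte is below 0x80; any other string still pays the linear cost.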
<asterite1>
@kostya thanks for letting us know :)
<jhass>
asterite1: what are the general plans for encoding support?
<asterite1>
All strings will be utf-8 (unlike ruby). When you create an IO object you would specify the encoding, the default being utf-8, and strings will be read from that encoding and converted to utf-8
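The plan described here (declare the encoding on the IO object, convert to utf-8 while reading) can be sketched in Python with the standard `io` module. The function name `read_as_utf8` is an assumption for illustration and says nothing about what Crystal's IO API will look like.

```python
import io


def read_as_utf8(raw, encoding: str = "utf-8") -> bytes:
    """Read everything from a binary stream in `encoding`, return UTF-8 bytes."""
    # newline="" disables newline translation so the content is left untouched.
    reader = io.TextIOWrapper(raw, encoding=encoding, newline="")
    text = reader.read()
    return text.encode("utf-8")


latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
print(read_as_utf8(io.BytesIO(latin1_bytes), "latin-1"))
```

Everything past the IO boundary then only ever sees utf-8, so string methods need no per-string encoding checks.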
asterite has joined #crystal-lang
e_dub has joined #crystal-lang
bcardiff1 has joined #crystal-lang
bcardiff has joined #crystal-lang
<CraigBuchek1>
@asterite: Might it make sense to carry the length of the string around? I.e. determine it at initialization? In a lot of cases, I'd expect that we'd know that at initialization time. For example, for string literals, we could compute that at compile time. For string concatenation, we could add the 2 lengths together. I suppose we could have the option to initialize it, else calculate it lazily and memoize it.
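A minimal Python sketch of this suggestion: carry the character count when it is known, add counts on concatenation, otherwise compute lazily and memoize. `CountedString` is a hypothetical name used only for this example.

```python
class CountedString:
    """UTF-8 bytes with an optional, lazily computed character count."""

    def __init__(self, data: bytes, length=None):
        self.data = data
        self._length = length  # None means "not computed yet"

    @property
    def length(self) -> int:
        if self._length is None:
            # First access: count non-continuation bytes, then remember it.
            self._length = sum(1 for b in self.data if (b & 0xC0) != 0x80)
        return self._length

    def __add__(self, other: "CountedString") -> "CountedString":
        known = None
        if self._length is not None and other._length is not None:
            # Both counts are known, so the sum is known without a scan.
            known = self._length + other._length
        return CountedString(self.data + other.data, known)


a = CountedString("né".encode("utf-8"), 2)  # count supplied up front
b = CountedString(" à".encode("utf-8"))     # count left for later
print((a + b).length)
```

The cost is one extra field per string, and any operation that changes the bytes has to update or reset the cached count.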
<asterite>
strings already carry that information
<asterite>
a string right now is represented as a length and an array of bytes that end with \0
<asterite>
but it's the number of bytes, not the number of unicode characters
<asterite>
so String#length has to be recomputed over and over, but String#bytesize doesn't
<asterite>
That's why we might want to store a bit in the string if we know that the string is ascii only, because then String#length is the same as bytesize
<asterite>
Still, the brainfuck sample becomes a bit slower (from 5.6s to 6.8s)… but if you change that sample to use an array of chars instead of a string (for the tape), then it becomes fast again
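The array workaround can be sketched as a tiny interpreter in Python. This is an illustration only, not the actual sample: the function name `run_brainfuck` and the default tape size are assumptions, and input (`,`) is left out. Python's own `str` already indexes in O(1), so the lists here just mirror the idea of decoding once and indexing an array afterwards.

```python
def run_brainfuck(source: str, tape_size: int = 30000) -> str:
    program = list(source)   # decode once; every later lookup is an array index
    tape = [0] * tape_size   # the tape is a plain array of cells, not a string

    # Precompute matching bracket positions so jumps are O(1).
    jumps = {}
    stack = []
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    out = []
    pc = ptr = 0
    while pc < len(program):
        c = program[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # jump back to the loop start
        pc += 1
    return "".join(out)


print(run_brainfuck("++++++++[>++++++++<-]>+."))
```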