<companion_cube>
well I'm wondering if it's possible to have a union of structs with recursion, like Json::Any, but where the recursion doesn't use Array/Hash
<companion_cube>
yep, because of that
<FromGitter>
<Blacksmoke16> dunno
<companion_cube>
so I'm thinking there should be a Box/Ref that is like a one-element Array, basically
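A minimal sketch of the Box idea in Crystal, assuming a class-based wrapper is acceptable. `Box`, `Leaf`, `Node`, `Tree` and `sum` are hypothetical names made up for illustration; nothing here is part of the standard library.

```crystal
# Hypothetical one-slot reference cell. Because Box is a class, it is stored
# as a pointer, so a struct that holds a Box of its own union has finite size.
class Box(T)
  getter value : T

  def initialize(@value : T)
  end
end

struct Leaf
  getter n : Int32

  def initialize(@n : Int32)
  end
end

struct Node
  getter left : Box(Tree)
  getter right : Box(Tree)

  # Wrap both children so the recursion goes through a reference.
  def initialize(left : Tree, right : Tree)
    @left = Box(Tree).new(left)
    @right = Box(Tree).new(right)
  end
end

# Recursive union of structs; the recursion goes through Box, not Array/Hash.
alias Tree = Leaf | Node

def sum(leaf : Leaf) : Int32
  leaf.n
end

def sum(node : Node) : Int32
  sum(node.left.value) + sum(node.right.value)
end

tree = Node.new(Leaf.new(1), Node.new(Leaf.new(2), Leaf.new(3)))
puts sum(tree)
```

The trade-off is one heap allocation per boxed child, which is roughly what a one-element Array would cost as well.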
<FromGitter>
<spTorin> How can I keep a collection of unique values with a fast existence check? Like `[1, 5, 78, 12].includes?(12)`, but the array is big and I do this many times. I think a `Set` would be good for it, but `includes?` on a set is 10x slower than `includes?` on an array.
<FromGitter>
<spTorin> Like a hash table, but I only need the keys, not the values.
<Yxhuvud>
spTorin: That shouldn't be the case if the list is indeed large. Can you show us your code and benchmarks?
<FromGitter>
<spTorin> In my code the speed of `array.includes?` is enough. I'm just curious whether there are structures for fast element lookup. ⏎ `array.includes?` - O(n) ⏎ `hash.has_key?` - O(1) on average
<FromGitter>
<spTorin> I don't know why Set is slower. The docs say `Set uses Hash as storage`.
<Yxhuvud>
I'm not certain what is going on there - if I change the size to 100000000 it doesn't change the time of the array case, so it looks to be optimized out
<Yxhuvud>
(and also change the value to seek for to be at the end)
<FromGitter>
<spTorin> I've now varied the size of the array and the position of the searched element, and the speed doesn't really change. Why? The comparison should go element by element.
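For reference, a hedged sketch of how such a comparison could be benchmarked. spTorin's actual code is not shown in the log, so the collection size and the searched value below are assumptions chosen only for illustration.

```crystal
require "benchmark"

# Arbitrary illustrative data: 100_000 distinct even numbers.
values = Array.new(100_000) { |i| i * 2 }
set = values.to_set

# Searching for the last element is the worst case for a linear scan.
needle = values.last

Benchmark.ips do |x|
  x.report("Array#includes?") { values.includes?(needle) }
  x.report("Set#includes?") { set.includes?(needle) }
end
```

If the array timing does not grow with the array size, the lookup is probably being optimized away (for example because the input is a compile-time constant or the result is never used), which matches what Yxhuvud suspected above.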
<FromGitter>
<Blacksmoke16> i.e. the `#groups=` setter on the context handles adding that strategy, same idea if you were to do like `context.version = "1.25.0"`
<FromGitter>
<Blacksmoke16> whereas if you had custom ones it would be more lik
<FromGitter>
<Blacksmoke16> or ofc apply conditional logic etc
<FromGitter>
<Blacksmoke16> i mean it's flexible enough that the user can do it however they like, i'm just trying to think of all the ways it can be done to make sure everything works as expected
<FromGitter>
<jwoertink> oh, I guess there's a top level method `future`
<FromGitter>
<jwoertink> I'm assuming it's meant to use that instead
alex``` has quit [Quit: WeeChat 2.5]
<FromGitter>
<kingsleyh> @watzon hey sorry I'm not working on the Crystal Idea plugin anymore - too busy on SushiChain
<FromGitter>
<kingsleyh> It would take me several months to complete the original vision I had for it
<FromGitter>
<kingsleyh> I think the best bet would be to get Jetbrains to build it
<companion_cube>
hopefully a LSP server will emerge first
<FromGitter>
<kingsleyh> I wrote the D Language intellij plugin and it took a lot of help from Jetbrains devs to get it done and I handed it off to another team who are still constantly improving it
<FromGitter>
<kingsleyh> yes - Scry does seem to work pretty well with Intellij
<FromGitter>
<kingsleyh> if you take my base intellij plugin and install the LSP plugin and Scry
<FromGitter>
<kingsleyh> a fully functioning intellij plugin is awesome though - with full refactoring, hints, completion, project view, test runner, tooling for shards etc - that was my original vision
<FromGitter>
<watzon> It would be really nice if JetBrains built it
<FromGitter>
<watzon> But it will probably take them years if they ever do
<companion_cube>
not before 1.0 anyway
return0e has quit [Read error: Connection reset by peer]
return0e has joined #crystal-lang
dannyAAM has quit [Quit: znc.saru.moe : ZNC 1.6.2 - http://znc.in]
<FromGitter>
<watzon> Chances are it's because you're not setting the type of `&block`
<FromGitter>
<watzon> @didactic-drunk as far as your question, I have no clue
<FromGitter>
<watzon> I never use Atomics
<FromGitter>
<didactic-drunk> @asterite? Array of Atomic broken.
<FromGitter>
<watzon> What I'm wondering is why `Char#letter?` doesn't work for the entire unicode space. It seems to work with Cyrillic, but not Japanese, Chinese, Hindi, etc.
<FromGitter>
<kinxer> What do you mean by "errored out with not enough arguments"?
<FromGitter>
<JohnDowson> Uhh, I can't replicate it now. ⏎ https://play.crystal-lang.org/#/r/7jwt ⏎ this is sort of thing that wasn't working, but it clearly works now
<FromGitter>
<Blacksmoke16> 👍
<FromGitter>
<watzon> Ok this is driving me crazy. I need a way to tokenize a string, without using regex. The problem is I need it to be utf8 compatible (ie. it needs to work with hindi, japanese, etc) and it needs to remove tokens that contain anything that's not a letter character.
<FromGitter>
<watzon> Doing it with Regex is easy, but it's also extremely slow
<FromGitter>
<Blacksmoke16> tokenize meaning get array of all chars? or?
<FromGitter>
<stronny> utf8 is an encoding, but it looks like you need to parse it into unicode codepoints and then iterate over them?
<FromGitter>
<watzon> Tokenize meaning separate a string into words
<FromGitter>
<watzon> It looks like the slow part of my tokenization might have been the other part though, where I'm breaking each token up into characters, running `each_cons` on them, and then building smaller strings and adding them to an array
<FromGitter>
<watzon> I just wrapped that part in a `spawn` block and it made everything almost instantaneous
<FromGitter>
<stronny> well, does it work as expected?
<FromGitter>
<watzon> Yep, for now anyway haha
<FromGitter>
<stronny> are you sure?
<FromGitter>
<watzon> My previous tokenization method worked fine until I threw Hindi at it
<FromGitter>
<stronny> your fibers run after your method returns
<FromGitter>
<kinxer> It doesn't look like you're doing anything with the fibers you spawn.
<FromGitter>
<kinxer> Oh, no, actually I'm wrong. Listen to @stronny .
<FromGitter>
<watzon> You're right, I need to yield to the fiber
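A small sketch of the pitfall stronny describes and one way to wait for the spawned work, assuming the default single-threaded scheduler. The method names and the `downcase` workload are hypothetical stand-ins for watzon's real code, which is not shown in the log.

```crystal
# Pitfall: spawn only schedules a fiber. It does not run until the current
# fiber yields or blocks, so this method typically returns an empty array.
def collect_without_waiting(words : Array(String)) : Array(String)
  results = [] of String
  words.each { |word| spawn { results << word.downcase } }
  results
end

# One fix: send every result through a channel and receive exactly as many
# values as fibers were spawned. Each receive blocks until a fiber has sent.
def collect_with_channel(words : Array(String)) : Array(String)
  channel = Channel(String).new
  words.each { |word| spawn { channel.send(word.downcase) } }
  Array(String).new(words.size) { channel.receive }
end

p collect_with_channel(["Foo", "BAR", "baz"]).sort
```

Without multithreading this only restores correctness; the fibers still run one at a time, so it would not be expected to make CPU-bound tokenization any faster.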
<FromGitter>
<watzon> Doesn't seem to be an issue in my code for some reason, but running it by itself doesn't return anything
<FromGitter>
<stronny> what should it return?
<FromGitter>
<watzon> Eh nvm
<FromGitter>
<watzon> Spawn doesn't fix shit
<FromGitter>
<watzon> With MT it would
<FromGitter>
<stronny> with MT you'll have a massive race condition
<FromGitter>
<watzon> True, unfortunately
<FromGitter>
<stronny> do you really need i18n?
<FromGitter>
<stronny> I prefer to just ignore anything that's not English honestly
<FromGitter>
<watzon> Yeah it's kinda the purpose of this code
<FromGitter>
<watzon> I'm tokenizing files in other languages for training
<FromGitter>
<JohnDowson> split tokens among the threads, wait for threads, join results into groupings?
<FromGitter>
<stronny> well, I think the right way is to use external C libs
<FromGitter>
<watzon> @JohnDowson that would work
<FromGitter>
<stronny> it's not even about speed, it's about correctness
<FromGitter>
<watzon> Well I can get the correctness pretty easily, but I'd like to get a bit more performance out of it
<FromGitter>
<watzon> And I definitely don't want to resort to using C, that's not worth it
<FromGitter>
<stronny> I mean use them from inside Crystal
<FromGitter>
<watzon> Yeah I know, but if I have to write C code and write bindings for it I've already lost 😄
<FromGitter>
<stronny> why do you need each_cons? what are you doing there?
<FromGitter>
<watzon> It's part of the training. I'm building a language detector which uses a Bayes classifier. `each_cons` is breaking each token up into bite size pieces that my classifier can use to determine what letters are closer to each other in any given language.
<FromGitter>
<stronny> do you need a separate array for that? can you feed each grouping directly?
<FromGitter>
<watzon> I just need the result of `#tokenize` to be an array of Strings where each string is a piece of a token with a length no greater than 3.
<FromGitter>
<watzon> I wish there was a `map_cons` or something. I hate having to use `each_cons` and push each one to an array
<FromGitter>
<stronny> I think there are lots of allocations that you don't really need
<FromGitter>
<Blacksmoke16> try using the reader then map?
<FromGitter>
<Blacksmoke16> idk if that would help
<FromGitter>
<stronny> String.split can take a block, String.each_char is an Iterator
<FromGitter>
<Blacksmoke16> got some sample code we can benchmark @watzon ?
<FromGitter>
<watzon> I want all the tokens to be downcased for training, otherwise the same string could be seen differently by the classifier because the case is different
<FromGitter>
<stronny> do you need an array though?
<FromGitter>
<watzon> Yes
<FromGitter>
<asterite> @didactic-drunk I'm not Crystal :-)
<FromGitter>
<watzon> The classifier has to take that array and make a frequency table
<FromGitter>
<stronny> can you show the classifier?
<FromGitter>
<stronny> I suspect it will iterate the array, split the strings and inspect each char again
<FromGitter>
<watzon> It doesn't. I'm overriding the default Tokenizer it's using there, but it doesn't end up splitting the strings again. That's what the whole purpose of the tokenizer is.
<FromGitter>
<watzon> All it does is assign each token to a hash and count the occurrences. Then it does some math to determine the probability that a particular token belongs to a category.
<FromGitter>
<asterite> @watzon in your code you might be able to do `each_cons(@min_size, reuse: true)` to reuse the array that's yielded to the block, since anyway you just `join` and `downcase` it so it doesn't matter if it's the same array between each iteration. Might speed up things a bit. Also you probably want to `cons.map!(&.downcase)` first to avoid a second array
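A hedged sketch of what asterite's `reuse: true` suggestion might look like. watzon's tokenizer is not shown in the log, so the `ngrams` method, its `size` parameter and the choice to downcase the whole token up front are assumptions made for illustration.

```crystal
# Hypothetical helper: split one token into lowercase pieces of `size` chars.
def ngrams(token : String, size : Int32 = 3) : Array(String)
  pieces = [] of String
  # reuse: true yields the same window array on every iteration instead of
  # allocating a new one. That is safe here because the window is joined
  # into a fresh String right away and never stored.
  token.downcase.chars.each_cons(size, reuse: true) do |window|
    pieces << window.join
  end
  pieces
end

p ngrams("Hello")
```

Downcasing the token once before windowing avoids a per-window `map!`; tokens shorter than `size` simply produce no pieces.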
<FromGitter>
<watzon> Ooh nice!
ht_ has quit [Quit: ht_]
<FromGitter>
<watzon> I think you *are* Crystal, @asterite
<FromGitter>
<watzon> 😉
<FromGitter>
<stronny> I would refactor the library
<FromGitter>
<watzon> Tbh I wrote it a while ago and it does need some refactoring
<FromGitter>
<stronny> I think you are Crystal after all
<FromGitter>
<watzon> 😂
<FromGitter>
<watzon> At least there's 1294 pieces of him in Crystal
absolutejam1 has joined #crystal-lang
<FromGitter>
<watzon> Crystal is merely a horcrux
absolutejam1 has quit [Ping timeout: 276 seconds]
return0e has joined #crystal-lang
sorcus has quit [Ping timeout: 245 seconds]
<FromGitter>
<watzon> What would be the best way to emulate Ruby's `eval`?
<FromGitter>
<watzon> I know `eval` is generally a bad idea, but I have a good reason
<FromGitter>
<Blacksmoke16> *do you*
<FromGitter>
<asterite> > I think you are Crystal after all ⏎ lol :-)
<FromGitter>
<watzon> It's for an interpreter type environment, so yes :)
<FromGitter>
<watzon> Don't worry, it will be sandboxed
<FromGitter>
<asterite> What would be the best way to emulate Ruby's `eval`? --> `nil.not_nil!`
<FromGitter>
<watzon> 😂
<FromGitter>
<watzon> I know I could save the string as a file and then invoke the compiler and execute it, but I'm wondering if there's a more streamlined approach
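A rough sketch of the compile-and-run approach watzon mentions, assuming the `crystal` binary is on the PATH. It uses the compiler's `eval` subcommand rather than a temporary file. `run_snippet` is a hypothetical name, and the sketch does no sandboxing and has no timeout.

```crystal
# Hypothetical helper: compile and run a snippet via `crystal eval`.
# Returns whether the process succeeded plus the combined stdout and stderr.
# WARNING: this runs arbitrary code with the caller's privileges.
def run_snippet(code : String) : {Bool, String}
  output = IO::Memory.new
  status = Process.run("crystal", ["eval", code], output: output, error: output)
  {status.success?, output.to_s}
end

ok, text = run_snippet("puts 1 + 1")
puts ok
puts text
```

Every call pays the full compile time, which is the main reason this stays a workaround rather than a real `eval`.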
<FromGitter>
<stronny> no
<FromGitter>
<stronny> tell us the good reason
<FromGitter>
<watzon> > It's for an interpreter type environment, so yes :) ⏎ ⏎ Already did
<FromGitter>
<watzon> I'm writing an interpreter
<FromGitter>
<stronny> please clarify
<ukd1_>
\q
ukd1_ has quit [Quit: leaving]
<FromGitter>
<watzon> It's for a bot which can execute crystal code and return the result
<FromGitter>
<stronny> well, okay. there is the Play with its source, there are several I think repls, you can try investigating how it's done there
<FromGitter>
<watzon> I already know how it's done there, that's why I'm asking if there's a more streamlined approach. Those tend to be much more complicated than this is.
<FromGitter>
<watzon> But I figured it would be a no
<FromGitter>
<stronny> not easily at least
<FromGitter>
<watzon> Oh well, should be interesting to figure out
<FromGitter>
<stronny> after poking around I currently think there is no good way to do it
<FromGitter>
<stronny> you can't kill a fiber
<FromGitter>
<watzon> True
<FromGitter>
<stronny> you can't fork either
<FromGitter>
<stronny> go's model was a mistake imo
<FromGitter>
<stronny> what's inside the timeout?
<FromGitter>
<watzon> A lot about Go was a mistake
<FromGitter>
<stronny> there is no general solution, but there are partial ones
<FromGitter>
<stronny> I mean Crystal adopting Go's model was a mistake
<FromGitter>
<stronny> go itself is a lost cause
<FromGitter>
<watzon> > what's inside the timeout? ⏎ ⏎ Could theoretically be anything, in this case it would be executing an external process and waiting for it to complete