ChanServ changed the topic of #crystal-lang to: The Crystal programming language | http://crystal-lang.org | Crystal 0.19.4 | Fund Crystal's development: http://is.gd/X7PRtI | Paste > 3 lines of text to https://gist.github.com | GH: https://github.com/crystal-lang/crystal | Docs: http://crystal-lang.org/docs/ | API: http://crystal-lang.org/api/ | Logs: http://irclog.whitequark.org/crystal-lang
soveran has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Changing host]
soveran has quit [Ping timeout: 244 seconds]
pduncan has joined #crystal-lang
<Papierkorb> jots_twitter, also look at how the time is spent. The crystal program is waiting 40s just on I/O. Even if you only look at the user times, the crystal program is still slower than wc, but already faster than perl
vikaton has joined #crystal-lang
<FromGitter> <jots_twitter> interesting to see the different calls that the 3 different programs make: https://gist.github.com/anonymous/8386f531d69ece380b34efb42f1cd202
pduncan has quit [Ping timeout: 258 seconds]
<Papierkorb> you should get rid of the mem* functions if you get rid of the useless arrays
<FromGitter> <jots_twitter> yes. interesting that perl somehow gets away with it though.
soveran has joined #crystal-lang
<FromGitter> <drosehn> I'm pretty sure you posted it earlier, but where's the source for your crystal program? I might take a look at it tomorrow, if I have time. (I am not an expert at crystal, so don't expect that I'll get anywhere!)
soveran has quit [Ping timeout: 260 seconds]
<FromGitter> <jots_twitter> https://gist.github.com/anonymous/33d1973647b81cd5769a436775b816cc it is based on https://github.com/sferik/wc.cr (not me) with a tiny fix. It reads entire files into memory, so it is only suitable for smaller datasets.
shawn42 has quit [Quit: Connection closed for inactivity]
pduncan has joined #crystal-lang
matp has quit [Excess Flood]
matp has joined #crystal-lang
snsei has joined #crystal-lang
p0p0pr37 has quit [Changing host]
p0p0pr37 has joined #crystal-lang
pduncan has quit [Ping timeout: 258 seconds]
vikaton has quit [Quit: Connection closed for inactivity]
pawnbox has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Ping timeout: 256 seconds]
snsei_ has joined #crystal-lang
vikaton has joined #crystal-lang
snsei_ has quit [Remote host closed the connection]
snsei has quit [Ping timeout: 268 seconds]
snsei has joined #crystal-lang
Raimondii has joined #crystal-lang
Raimondi has quit [Ping timeout: 244 seconds]
Raimondii is now known as Raimondi
pawnbox has quit [Remote host closed the connection]
phase_ has joined #crystal-lang
pduncan has joined #crystal-lang
snsei_ has joined #crystal-lang
snsei has quit [Ping timeout: 260 seconds]
pduncan has quit [Ping timeout: 258 seconds]
<FromGitter> <jots_twitter> my latest try: ⏎ ⏎ ```x = 0 ⏎ while gets ⏎ x += 1 ⏎ end ⏎ puts x``` ⏎ ⏎ feel like i'm doing something wrong here. [https://gitter.im/crystal-lang/crystal?at=582a947ddf5ae966455ab70c]
mgarciaisaia has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Ping timeout: 252 seconds]
snsei has joined #crystal-lang
snsei_ has quit [Ping timeout: 265 seconds]
snsei has quit [Remote host closed the connection]
snsei has joined #crystal-lang
vikaton has quit [Quit: Connection closed for inactivity]
mgarciaisaia has quit [Quit: Leaving.]
pduncan has joined #crystal-lang
snsei_ has joined #crystal-lang
snsei_ has quit [Remote host closed the connection]
snsei_ has joined #crystal-lang
unshadow has quit [Ping timeout: 256 seconds]
snsei has quit [Ping timeout: 258 seconds]
pduncan has quit [Ping timeout: 260 seconds]
bjz has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Ping timeout: 240 seconds]
bjz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
snsei has joined #crystal-lang
snsei_ has quit [Ping timeout: 256 seconds]
bjz has joined #crystal-lang
bjz has quit [Client Quit]
bjz has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Changing host]
soveran has joined #crystal-lang
phase_ has quit [Quit: cya l8r alig8r]
pawnbox has joined #crystal-lang
unshadow has joined #crystal-lang
p0p0pr37 has quit [Quit: p0p0pr37]
p0p0pr37 has joined #crystal-lang
p0p0pr37 has joined #crystal-lang
Philpax has joined #crystal-lang
pduncan has joined #crystal-lang
pduncan has quit [Ping timeout: 258 seconds]
bjz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
bjz has joined #crystal-lang
j2k has joined #crystal-lang
mark_66 has joined #crystal-lang
bjz_ has joined #crystal-lang
bjz has quit [Ping timeout: 256 seconds]
j2k has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
j2k has joined #crystal-lang
bjz has joined #crystal-lang
bjz_ has quit [Ping timeout: 245 seconds]
p0p0pr37 has quit [Remote host closed the connection]
p0p0pr37 has joined #crystal-lang
p0p0pr37 has joined #crystal-lang
p0p0pr37 has quit [Client Quit]
p0p0pr37 has joined #crystal-lang
p0p0pr37 has joined #crystal-lang
bjz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
bjz has joined #crystal-lang
ome has joined #crystal-lang
pduncan has joined #crystal-lang
pduncan has quit [Ping timeout: 258 seconds]
soveran has quit [Remote host closed the connection]
soveran has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Changing host]
Raimondii has joined #crystal-lang
Raimondi has quit [Ping timeout: 244 seconds]
Raimondii is now known as Raimondi
<FromGitter> <luislavena> @jots_twitter actually `wc` counts lines by counting `LF` characters. There is another implementation of `wc` in Rust that might provide an efficient approach to scanning big text files: https://github.com/uutils/coreutils/blob/master/src/wc/wc.rs
<FromGitter> <luislavena> The ruby code might be faster because `gets` will read until `LF` is found, and return that as a string.
gloscombe has joined #crystal-lang
<FromGitter> <luislavena> Perhaps the issue is the allocation process. Removing the allocation of arrays and strings and just walking over the IO *might* be faster
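A minimal Crystal sketch of the idea luislavena describes: walk over the IO with one reusable buffer and count `LF` bytes, without building any strings or arrays. The method name `count_newlines` and the 64 KiB buffer size are assumptions made for illustration, and the sketch assumes a reasonably recent Crystal rather than 0.19.4. It has not been benchmarked against `wc`.

```crystal
# Count LF bytes in an IO by reusing one fixed-size buffer,
# so no String or Array is allocated per line.
# `count_newlines` and the default buffer size are illustrative choices.
def count_newlines(io : IO, buffer_size : Int32 = 64 * 1024) : Int64
  buffer = Bytes.new(buffer_size)
  count = 0_i64
  loop do
    read = io.read(buffer)
    break if read == 0 # IO#read returns 0 at end of input
    buffer[0, read].each do |byte|
      count += 1 if byte == 0x0a_u8 # 0x0a is LF
    end
  end
  count
end

puts count_newlines(IO::Memory.new("one\ntwo\nthree\n")) # prints 3
```

The same method should accept `STDIN` or an opened `File`, since both are `IO` objects.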
unshadow has quit [Quit: Lost terminal]
pawnbox has quit [Remote host closed the connection]
matp has quit [Read error: Connection reset by peer]
Philpax has quit [Ping timeout: 260 seconds]
pduncan has joined #crystal-lang
matp has joined #crystal-lang
bjz_ has joined #crystal-lang
bjz has quit [Ping timeout: 260 seconds]
ome has quit [Quit: Connection closed for inactivity]
pawnbox has joined #crystal-lang
pduncan has quit [Ping timeout: 260 seconds]
snsei has quit [Remote host closed the connection]
bjz_ has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<crystal-gh> [crystal] bmulvihill reopened pull request #3518: Enum from_value when Flags (master...enum-from-value) https://git.io/vX0n7
<crystal-gh> [crystal] asterite pushed 1 new commit to master: https://git.io/vX1b5
<crystal-gh> crystal/master c534cb4 Ary Borenszweig: Fixed #3548: Compiler crashes when calling class method of an alias
sooli has joined #crystal-lang
<FromGitter> <johnjansen> @jots_twitter are you implementing a complete `wc` or just `-l`?
pduncan has joined #crystal-lang
gloscombe has quit [Quit: Lost terminal]
gloscombe has joined #crystal-lang
pduncan has quit [Ping timeout: 258 seconds]
sooli has quit [Remote host closed the connection]
soveran has quit [Remote host closed the connection]
<travis-ci> crystal-lang/crystal#c534cb4 (master - Fixed #3548: Compiler crashes when calling class method of an alias): The build passed. https://travis-ci.org/crystal-lang/crystal/builds/176056859
<DeBot> https://github.com/crystal-lang/crystal/issues/3548 (Compiler crashes when calling class method of an alias.)
snsei has joined #crystal-lang
<FromGitter> <johnjansen> for anyone who cares (@jots_twitter), the following is ~6 times slower than ruby ⏎ ⏎ ```code paste, see link``` [https://gitter.im/crystal-lang/crystal?at=582b34cfe097df7575b53a7c]
pawnbox has quit [Ping timeout: 252 seconds]
<FromGitter> <sdogruyol> @johnjansen maybe it's something with `gets`?
soveran has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Changing host]
pawnbox has joined #crystal-lang
<FromGitter> <drosehn> Well, I have only had time to glance at @jots_twitter 's code, but I'll make the observation that in his code he processes the entire file once for each option. The call to `row << text.lines.size if options[:lines]` is going to start at byte #0 of the file, and go through every byte of it to count the number of lines. And then `row << text.split.size if options[:words]` is going to start back at byte #0, and process all of
<FromGitter> ... those same bytes, this time looking for word boundaries.
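As a rough illustration of drosehn's point, the three counts can be gathered in a single pass over the bytes instead of one pass per option. This is a sketch only: `count_all` is a hypothetical name, it treats only ASCII whitespace as a word separator (real `wc` is locale-aware), and it still takes the whole text as one `String`.

```crystal
# One pass over the text: lines, words and bytes are counted together
# instead of rescanning the input once per option.
# `count_all` is a hypothetical helper, not part of wc.cr.
def count_all(text : String) : Tuple(Int64, Int64, Int64)
  lines = 0_i64
  words = 0_i64
  in_word = false
  text.each_byte do |byte|
    lines += 1 if byte == 0x0a_u8 # LF ends a line
    # ASCII whitespace only: space, plus tab through carriage return
    if byte == 0x20_u8 || (byte >= 0x09_u8 && byte <= 0x0d_u8)
      in_word = false
    elsif !in_word
      in_word = true
      words += 1 # first non-whitespace byte after whitespace starts a word
    end
  end
  {lines, words, text.bytesize.to_i64}
end

lines, words, bytes = count_all("hello world\nfoo bar baz\n")
puts "#{lines} #{words} #{bytes}" # prints 2 5 24
```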
mark_66 has quit [Remote host closed the connection]
<FromGitter> <drosehn> And code which calls `gets()` will (at some level in the processing) do a read of probably BUFSIZ bytes, and then copy a single line up to your program. And that lower-level code will have to keep track of where it is in the larger buffer, so it needs to keep a pointer and update that pointer for each call to `gets()`.
soveran has quit [Ping timeout: 250 seconds]
mgarciaisaia has joined #crystal-lang
<crystal-gh> [crystal] samueleaton opened pull request #3550: Implement Hash notation for examples in docs (master...fix-hash-example) https://git.io/vXMET
<FromGitter> <asterite> @johnjansen Did you try compiling with --release ?
soveran has joined #crystal-lang
soveran has joined #crystal-lang
soveran has quit [Changing host]
maxpowa has quit [Ping timeout: 244 seconds]
pawnbox has quit [Remote host closed the connection]
pduncan has joined #crystal-lang
gloscombe has quit [Quit: Lost terminal]
soveran has quit [Remote host closed the connection]
kochev has joined #crystal-lang
pduncan has quit [Ping timeout: 256 seconds]
maxpowa has joined #crystal-lang
mgarciaisaia has left #crystal-lang [#crystal-lang]
<FromGitter> <drosehn> I started to make another minor observation from quick-skimming, when the comments made by several other people suddenly clicked in my head.
<FromGitter> <drosehn> When you call `text.lines`, crystal is building a full-blown Array(String). It is creating an Array object, and then going through `text` and adding each line that it finds as another element in that Array. You're then taking that array, and asking it "So, how many elements do you have?". And then you throw away that entire array. And then you do the *same* thing (building a completely different array) when you call
<FromGitter> ... `text.split`.
<Yxhuvud> that does seem a bit inefficient.
<Yxhuvud> creating lots of objects for lines, that is.
<FromGitter> <drosehn> You can see this, btw, if you add `printf " lines=%s\n", text.lines.class`. You really are creating a full-blown array which has copied data from the original `text` object into many `String` objects that are stored in that `Array(String)`.
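A small sketch of the difference, assuming a reasonably recent Crystal: `String#lines` builds the whole `Array(String)` only to be asked for its size, `String#each_line` yields one line at a time without keeping an array (each yielded line is still a `String`), and `String#count` with a `Char` argument just counts the newline characters. The sample text is made up for illustration.

```crystal
text = "first\nsecond\nthird\n"

# String#lines materialises an Array(String) just to ask for its size.
puts text.lines.class # prints Array(String)
puts text.lines.size  # prints 3

# String#each_line yields one line at a time; no array is kept around.
count = 0
text.each_line { count += 1 }
puts count # prints 3

# Counting the LF characters directly avoids creating per-line strings.
puts text.count('\n') # prints 3
```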
<FromGitter> <sdogruyol> @drosehn what's your suggestion then
<FromGitter> <drosehn> Well, I know what I'd do in C, and in fact I *have* written a program pretty similar to this in C. I'm not 100% sure how to translate all the tricks in my C program into crystal. So I need to do a few more experiments before making any claims that I'll regret later. :smiley:
soveran has joined #crystal-lang
<Yxhuvud> does it create new strings or references to inside the original string?
<FromGitter> <drosehn> Unfortunately I'm at work now, so I'll need to focus on work-related tasks at the moment!
<FromGitter> <drosehn> Yeah, I wondered that. Given that crystal keeps strings as immutable, it might not need to copy any of the data-bytes into the new `String` objects. However, it does have to do *something* which will create each of those string objects, even if that's just to create a pointer to the start and end of the characters as they exist in `text`.
<FromGitter> <johnjansen> @asterite yeah that was with release. It's not for me BTW, someone else is trying to duplicate `wc` in crystal, but that struck me as a little odd ;-)
<FromGitter> <drosehn> Consider, for instance, that you could also say `puts text.lines[143]`, and the crystal code will expect that it can go to element #143 of an Array, and pull out the string which matches that context.
<FromGitter> <drosehn> They're trying to duplicate `wc` as a simple exercise. The real goal is to understand how to process large data files as quickly as possible.
<FromGitter> <drosehn> For instance, my "wc-like" program written in C is not counting lines. It's finding line-boundaries, checking for lines that start with "%%" (Postscript comments), and then doing things based on what it finds on those lines.
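For readers wondering what that kind of scan might look like in Crystal, here is a hedged sketch. It is not drosehn's C program: `dsc_comments` is a hypothetical name, the sample PostScript text is invented, and it allocates a `String` per line, which a throughput-focused implementation would probably avoid.

```crystal
# Hypothetical sketch: collect the lines of an IO that start with "%%"
# (PostScript comments), reading one line at a time.
def dsc_comments(io : IO) : Array(String)
  found = [] of String
  io.each_line do |line|
    found << line if line.starts_with?("%%")
  end
  found
end

# Invented sample input, for illustration only.
sample = IO::Memory.new("%!PS-Adobe-3.0\n%%Pages: 2\n0 0 moveto\n%%EOF\n")
dsc_comments(sample).each { |line| puts line }
```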
<FromGitter> <drosehn> And given that our print servers may throw around several hundreds of gigabytes of postscript files per day, I really needed to do that as efficiently as possible.
<FromGitter> <sdogruyol> @drosehn that sounds interesting. What do you do? Fintech?
<FromGitter> <johnjansen> WOW @drosehn Postscript, you are bringing back some memories / nightmares from the past
<FromGitter> <drosehn> Oh, I guess I should not put the number-sign before numbers unless I'm talking about github issues!
<FromGitter> <johnjansen> @sdogruyol Postscript is a printer language for want of a better word … files are routinely enormous
<FromGitter> <johnjansen> @drosehn is in the academic world ;-)
<FromGitter> <drosehn> No, I work at a college. Over the last 15 years we've built up a pretty popular service for printing out large-format outputs. 3-foot-wide by many-feet-long. During the last two weeks of each semester, we'll print out more than a mile of 3-foot-wide paper.
<FromGitter> <sdogruyol> @johnjansen just learned that. Thank you :)
<FromGitter> <sdogruyol> @drosehn that's awesome
soveran has quit [Remote host closed the connection]
<FromGitter> <drosehn> It also means you have to be really obsessive about any processing of those postscript files!
<FromGitter> <johnjansen> im feeling sympathy for @drosehn, debugging PS is ... well … interesting ;-)
<FromGitter> <sdogruyol> :smile:
<FromGitter> <johnjansen> @drosehn did you guys build a RIP ?
<FromGitter> <drosehn> These days actual-debugging of postscript is nearly impossible. My program just helps us know exactly what the postscript file expects to do *before* we send it to the plotters. This is very valuable info, when you have a lot of plots and you need to be obsessive.
<FromGitter> <drosehn> Wow. No!! That's way beyond my abilities!!
soveran has joined #crystal-lang
<FromGitter> <drosehn> [That emphatic "no" is wrt building a RIP]
<FromGitter> <johnjansen> thank god …
<FromGitter> <drosehn> In any case, I need to get back to work...
<FromGitter> <johnjansen> ;-)
soveran has quit [Ping timeout: 260 seconds]
j2k has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
kochev has quit [Remote host closed the connection]
<FromGitter> <crisward> What does everyone use for mocks / spies / stubs in testing?
j2k has joined #crystal-lang
<FromGitter> <jwoertink> What's testing? Oh, that thing you do after you push your app to production?
mgarciaisaia has joined #crystal-lang
pduncan has joined #crystal-lang
<FromGitter> <johnjansen> anyone know the status of Crystal right now, like when's the next release?
pduncan has quit [Ping timeout: 245 seconds]
mgarciaisaia has left #crystal-lang [#crystal-lang]
<FromGitter> <drosehn> Here's another indication of the files that my program had to deal with. I notice the "wcg" program used `row = [] of Int32`. In my program, I have to use 64-bit integers, not 32-bit...
bjz has joined #crystal-lang
<FromGitter> <drosehn> yeah, if you send a file >2gig to the `wcg` program, it exits sideways with `negative capacity (ArgumentError)`. The call to `text = File.read(fd)` will need to create a single `String` object which needs to be more than 2-gig, and since `String.size` returns an Int32, I suspect it is impossible to create a single String which is larger than that.
<FromGitter> <drosehn> if it's any consolation, the first version of my `scanps` program was written in perl, and it worked fairly well for my 10-meg test files. I then put it in production, and when a postscript file larger than 200 meg (*meg*, not gig) arrived, the entire machine crashed. It had run out of memory. And swap space.
<FromGitter> <sdogruyol> @drosehn just what you'd expect from any code in production :)
<RX14> you really don't want to be using File.read on large files ever
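The arithmetic behind the 2-gig ceiling can be checked directly: `Int32::MAX` is 2147483647, one less than 2 GiB, so any size reported as an `Int32` cannot describe a larger string. The guard below is a hypothetical sketch of checking a file's size before slurping it; the name `small_enough_to_slurp?` and the 64 MiB limit are arbitrary assumptions, not anything from `wcg.cr`.

```crystal
# 2 GiB expressed as Int64, compared with the largest Int32.
two_gib = 2_i64 * 1024 * 1024 * 1024
puts Int32::MAX           # prints 2147483647
puts two_gib > Int32::MAX # prints true

# Hypothetical guard: only read a whole file into a String when it is small.
# The 64 MiB default limit is an arbitrary illustrative choice.
def small_enough_to_slurp?(path : String, limit : Int64 = 64_i64 * 1024 * 1024) : Bool
  File.size(path) <= limit
end

# __FILE__ is this source file, used here only so the example has a file to inspect.
puts small_enough_to_slurp?(__FILE__)
```

Files that fail such a check would be better served by a streaming approach that keeps its counters in `Int64`.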
j2k has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bjz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
soveran has joined #crystal-lang
bjz has joined #crystal-lang
bjz has quit [Read error: Connection reset by peer]
bjz has joined #crystal-lang
bjz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
Philpax has joined #crystal-lang
Philpax has quit [Ping timeout: 246 seconds]
<crystal-gh> [crystal] ysbaddaden pushed 1 new commit to master: https://git.io/vXDBT
<crystal-gh> crystal/master e5deb09 Sam Eaton: Implement Hash notation for examples in docs (#3550)
<FromGitter> <jots_twitter> @drosehn : postscript brings back memories of sun workstations running NeWS (display postscript) good times, good times :-)
soveran has quit [Remote host closed the connection]
am_ has joined #crystal-lang
am_ has quit [Ping timeout: 250 seconds]
<FromGitter> <johnjansen> Oh boy the postscript club is in session now next will be the Linotype vs compugraphic discussion
<FromGitter> <drosehn> My experience with postscript started with the first NeXTstation (not the original NeXT Cube).
<FromGitter> <johnjansen> wow now some PasteUp is all we need
<FromGitter> <drosehn> Getting back to rewriting `wc` in crystal, I have something which works much faster than `wcg.cr` for larger files (over 100-meg), and which isn't dangerous to run for very large files (say, over 6-gig). However it's *slower* than `wcg.cr` for files under 1-meg, it does not get word-counts correct, and if it's given arbitrary binary files (such as a disk-image file) then it can get totally wrong answers.
<RX14> Papierkorb, i'm still making progress on the select with both channels and IO. I'm pretty sure it can be done now.
<Papierkorb> kk
<Papierkorb> drosehn, well, if you feed wc binary data, it won't come up with something useful either
<FromGitter> <drosehn> well, it sometimes gets the word-count correct, depending on the file. I'm pretty sure I understand why the word-count is often wrong, but given all the other problems I'm not too concerned about fixing that.
<FromGitter> <drosehn> it'll come up with a correct byte-count, and it comes up with a line-count that's probably correct. Mine won't even get the byte-count correct!
<FromGitter> <drosehn> Hmm, maybe it is running into a x'0', and treating that as end-of-file.
<FromGitter> <drosehn> nope.
<FromGitter> <drosehn> in any case, it's clear I need to know more about crystal before I'll have something that works faster & better than the `wcg.cr` attempt. I won't have the time for that. I have learned a number of things, so this mini-project has benefitted me even though it didn't help anyone else!
<FromGitter> <drosehn> oops. I mean that I won't have time for that anytime soon. Maybe next weekend, maybe not.
<Papierkorb> drosehn, this one beats wc for me in word count: https://gist.github.com/Papierkorb/5ac54d2764d234e69bea244ae2d35111 It gives incorrect results at some point though, no idea why, and I don't want to investigate further. Line counts are also not implemented at all.
<Papierkorb> You can leave out the bit hack part, though it actually helps quite a bit
<Papierkorb> Tried on a 100Meg file of lorem ipsum
<Papierkorb> An amazingly accurate (*cough*) benchmark shows that it's ~15x faster than wc