<nuck>
That should just drop all non-ASCII characters
<devyn>
no, because it's ASCII-8BIT, which means that post-0x7F characters are allowed, IIRC
<devyn>
but if it was read in in UTF-8, what that would do is drop any non-UTF-8 characters, I think
<devyn>
because what they're doing is #encode as ASCII-8BIT
<devyn>
not #force_encoding
<devyn>
it's... weird
<nuck>
The lead dev says "I think it results from something in the notes."
<nuck>
Apparently xmllint will fix it all for us though
<devyn>
the weird thing to me is that *after* #encode → ASCII-8BIT, they're doing #split('') and #select with a regex that matches multiple bytes, which makes no sense; they should have split('') before encoding if they wanted to match multiple bytes
<devyn>
and really they should have used #force_encoding
<nuck>
"To filter out invalid characters that MAL sometimes emits in XML, which nokogiri doesn't like."
<devyn>
I don't even think this works
<nuck>
He claims he was entirely shitfaced when he wrote this
<nuck>
As he put it, "I can't work with MAL's stuff sober"
<nuck>
And the scariest part is that MAL is the *best* site besides HB and our primary competitor AniList
<devyn>
yep, this totally doesn't work; it ends up stripping anything non-ASCII
<nuck>
The right answer is probably to just cram it into xmllint
<devyn>
ASCII-8BIT can contain any octet, so not really; any encoding is also ASCII-8BIT :p
<devyn>
so
<nuck>
Well yes
<devyn>
he should have forced encoding, run his regexes
<devyn>
and then forced back to UTF-8
<devyn>
that would have worked, I think
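A minimal sketch of the "force encoding, run the regexes, force back" sequence. Which bytes the original filter was meant to remove is not stated in the log; this sketch assumes the target is the C0 control characters that XML 1.0 forbids, and the constant and method names are hypothetical.

```ruby
# Control characters that are not legal in XML 1.0 documents
# (everything below 0x20 except tab, line feed and carriage return).
XML_ILLEGAL_CONTROL = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

def strip_xml_control_bytes(raw)
  # 1. Relabel as a byte string so the regex works on single octets.
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  # 2. Remove the offending octets.
  cleaned = bytes.gsub(XML_ILLEGAL_CONTROL, "")
  # 3. Relabel the surviving bytes as UTF-8 again.
  cleaned.force_encoding(Encoding::UTF_8)
end

puts strip_xml_control_bytes("a\u0001b\u00E9\u0008c").inspect
```

This is safe for multi-byte UTF-8 characters because every byte of a multi-byte sequence is 0x80 or higher, so the pattern can never match in the middle of one.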
<nuck>
I might forward this onto him
<nuck>
See if it lets us avoid the pain of spawning xmllint
<devyn>
in fact
<nuck>
It turns out xmllint doesn't have a ruby lib
<devyn>
there's another problem I mentioned above; the use of split('')
<devyn>
how to do that better:
<devyn>
nuck: nah, this won't really work, never mind. if spawning xmllint is painful, write an FFI shim if you can
<nuck>
Yeah I was thinking of an FFI shim
<devyn>
Ruby has a good FFI library that's actually relatively painless to bind to C libs
<nuck>
He just sent me the broken XML
<nuck>
So I guess it's time to dig in
<devyn>
I remember when binding Ruby to C libs used to be horrifyingly painful haha
<nuck>
I don't! :D
<nuck>
I just got into Ruby like... 1 year ago ish?
<nuck>
Maybe?
<nuck>
I don't even remember
<nuck>
I think it was December 2012
<devyn>
well, you basically had to write a shared library of your own to act as the shim, and the .so would hook into Ruby internals and basically use Ruby's internal C API to expose everything
<devyn>
XML should probably never be generated with basic string-based templating (including PHP or ERB or EJS or whatever), that's why
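To make the templating point concrete, here is a small sketch contrasting raw interpolation with escaped interpolation. The `note` element name and both method names are made up for illustration; `String#encode` with `xml: :text` is Ruby's built-in text escaping.

```ruby
# Naive templating: whatever is in the text ends up in the markup.
def note_element_naive(text)
  "<note>#{text}</note>"
end

# Escaped templating: &, < and > become entities before interpolation.
def note_element_escaped(text)
  "<note>#{text.encode(xml: :text)}</note>"
end

user_text = "1 < 2 & 3"
puts note_element_naive(user_text)
puts note_element_escaped(user_text)
```

The naive version produces a stray `<` and a bare `&` inside the element, which a strict parser rejects; the escaped version stays well-formed whatever the user typed.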
<devyn>
lol
<purr>
lol
<devyn>
anyway this doesn't really look broken to me, glancing at it
<devyn>
I'll have to try to parse it lol
<nuck>
I'm thinking that too
<nuck>
But it's huge
<nuck>
So who knows
<devyn>
seems like 0 is used when NULL is really meant… a lot of 0000-00-00 dates around lol
<nuck>
This is probably close to the db
<devyn>
knowing PHP developers, it's probably literally just a DB query and a for loop
<nuck>
in some implementations, a date column with null is actually 0000-00-00 iirc
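If the zero dates do stand in for NULL, a consumer would probably want to map them to `nil` rather than try to parse them. A hedged sketch, assuming the export uses `YYYY-MM-DD` strings; the method name is hypothetical.

```ruby
require "date"

# Returns a Date, or nil for "0000-00-00" and anything else that is
# not a real calendar date in YYYY-MM-DD form.
def parse_db_date(text)
  match = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(text)
  return nil unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
end

p parse_db_date("0000-00-00")
p parse_db_date("2012-12-01")
```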
<nuck>
Oh that's absolutely what it is
<nuck>
Like, you're not just talking PHP, but PHP from about 2008 at its most recent
<devyn>
it's so unfortunate that PHP ever took off
<devyn>
nuck: so, uh, Nokogiri parsed that file just fine
<nuck>
Well then.
<nuck>
I guess my regression test is pointless
<devyn>
get some data from your colleague that actually fails to parse lol
<purr>
lol
<devyn>
because... this looks fine
<nuck>
Yeah I guess I'll have to lol
<nuck>
He's out right now, I think on a train
<nuck>
And it's india, trains don't have wifi
<devyn>
though, I guarantee that it will totally break if a string contains ]]>, because they're using CDATA
<nuck>
I should go test if I can break it with that
<nuck>
brb adding that as a note on MAL
<devyn>
]]> alone might not break it; try doing ]]><
<nuck>
I mostly wanna see if they're smart and escape it or not
<nuck>
They might
<nuck>
They thought to use CDATA at all, which tells me *something*
<devyn>
it's impossible to escape in CDATA; the only thing you can do is change it to something else
<devyn>
lol
<devyn>
CDATA doesn't have any escape sequences, which is why the terminator is relatively long
<nuck>
Can't just do ]]&gt;&lt; ?
<devyn>
you could but it would come out that way; the XML parser wouldn't turn the entities into ><
<devyn>
that's sorta the point of CDATA; no parsing at all until ]]> is found
<nuck>
Probably what they would do I'd guess
<joelteon>
you have to split ]]> across two CDATA sections right
<devyn>
but you don't always want to escape > and <
<devyn>
if they always escaped > and < with &gt; and &lt; it would always turn out that way
<devyn>
which would be … completely wrong
<nuck>
This is PHP
<nuck>
"completely wrong" is par for the course
<devyn>
joelteon: <![CDATA[hello]]>]]&gt;<![CDATA[world]]> would be pretty much the only way to do it
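The split-across-two-sections approach joelteon mentions can be sketched like this. Both helpers are hypothetical; `cdata_unwrap` is a toy reader used only to check the round trip, not a real XML parser.

```ruby
# Wrap text in CDATA, ending the section in the middle of every "]]>"
# and reopening a new one, so the terminator never appears intact.
def cdata_wrap(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

# Toy reader: concatenate the contents of consecutive CDATA sections.
def cdata_unwrap(xml_fragment)
  xml_fragment.scan(/<!\[CDATA\[(.*?)\]\]>/m).flatten.join
end

wrapped = cdata_wrap("a]]>b")
puts wrapped
puts cdata_unwrap(wrapped)
```

The first section ends after `]]` and the second begins with `>`, so a parser that concatenates adjacent character data gets the original text back.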
<joelteon>
huehue
<devyn>
nuck: I would say it's far more likely that they wouldn't have even considered it, given how common SQLi vulnerabilities are in PHP scripts, and that's exactly the same kind of flaw
<devyn>
even if it doesn't, if they're not turning & into &amp; as well, people could put whatever entities they want in
<nuck>
They've done a shitty job
<devyn>
ok
<devyn>
not surprising
<devyn>
:p
<nuck>
But it's at least a shitty job that produces valid XML
<nuck>
Which makes me wonder wtf we're doing all this for
<devyn>
maybe it used to be worse
<nuck>
hahahahahahahahahaha
<devyn>
maybe they actually fixed it
<devyn>
lol
<nuck>
That would imply MAL actually has programmers
<nuck>
They don't
<devyn>
huh
<devyn>
lol
<nuck>
Literally, they're sitting back and collecting money
<nuck>
All the people working on MAL left, it's now run by some other company and it's got zero development
<devyn>
in any case, I bet you that the problem is if you get something with EUC-JP or Shift-JIS instead of UTF-8
<devyn>
the XML isn't actually malformed aside from the encoding of the data being bad, but <![CDATA[]]> just contains octets anyway; no particular encoding is required IIRC
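What the encoding mismatch looks like from Ruby's side can be sketched as follows. This only shows Shift_JIS bytes being read as UTF-8; whether MAL actually emits such data is devyn's guess, not something the log confirms.

```ruby
original = "\u65E5\u672C"                    # two kanji, as UTF-8
sjis_bytes = original.encode("Shift_JIS").b  # the same text as raw Shift_JIS octets

# Reading those octets as if they were UTF-8 yields an invalid string.
mislabelled = sjis_bytes.dup.force_encoding(Encoding::UTF_8)
puts mislabelled.valid_encoding?

# Labelling them correctly first and then transcoding recovers the text.
recovered = sjis_bytes.dup.force_encoding("Shift_JIS").encode(Encoding::UTF_8)
puts recovered == original
```

A filter that simply deletes "invalid" bytes would destroy this text, whereas relabelling it with the right source encoding keeps it intact.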
<nuck>
I should paste in some japanese characters and see what happens
<nuck>
Whether it's mangled or passed through safely
<nuck>
I know that's true
<nuck>
I've seen images embedded in CDATA
<devyn>
yeah
<devyn>
so really it's Nokogiri's fault for parsing CDATA as UTF-8 when really it should be treating it as ASCII-8BIT
<devyn>
haha
<nuck>
I hate that "ASCII-8BIT" means "bytestring"
<nuck>
It's silly
<devyn>
anyway nuck, throw in some actually invalid UTF-8, not just any Japanese chars
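For a test fixture, invalid UTF-8 is easy to build from raw bytes. A small sketch with a hypothetical helper name; `String#scrub` needs Ruby 2.1 or later.

```ruby
# Builds "ok" + a stray 0xFF byte + "!" and labels it UTF-8.
# 0xFF can never appear in well-formed UTF-8.
def invalid_utf8_sample
  [0x6F, 0x6B, 0xFF, 0x21].pack("C*").force_encoding(Encoding::UTF_8)
end

sample = invalid_utf8_sample
puts sample.valid_encoding?
puts sample.scrub("").inspect # drops only the invalid byte
```

Unlike transcoding to ASCII-8BIT, `scrub` leaves valid non-ASCII characters alone and removes only the byte sequences that are not well-formed.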
<nuck>
I doubt I can do that in a browser and I'm too lazy to curl it into place lol
<purr>
lol
<joelteon>
what's wrong with bytestrings
<nuck>
Nothing
<nuck>
I just wish people didn't label them as "ASCII"
<nuck>
When they're more accurately encodingless
<devyn>
joelteon: it's that Ruby labels "no encoding" as "ASCII-8BIT"
<devyn>
pretty weird
<devyn>
lol
<joelteon>
oh ok
<nuck>
The same thing is in GLib iirc
<devyn>
nuck: there is also, for the record, ASCII-7BIT, which is properly ASCII
<nuck>
lol
<nuck>
But nobody uses 7 bit things anymore
<nuck>
Since all the machines this runs on are 8bit
<devyn>
no, it's not that the bytes are interpreted in 7 bit groups; that would be... wayy too much work
<devyn>
it just means that the 8th bit can only be 0
<nuck>
Just zeroed
<nuck>
mmmm
<joelteon>
anyway, nix
<joelteon>
is awesome
<devyn>
any char that has bit 8 set will be considered invalid according to the encoding
<joelteon>
free distributed builds
<devyn>
ASCII is a 7bit-on-8bit encoding which is exactly what this describes, anyway
<devyn>
it goes from 0 to 127; 128..255 is undefined
<joelteon>
who needs the other 128 though
<nuck>
Russia
<devyn>
well, it's actually excellent, because it means other encodings could hijack that space and still maintain ASCII backward-compatibility
<devyn>
including beloved UTF-8 :D
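The "8th bit can only be 0" rule, and the way UTF-8 uses the free upper half, can be checked directly. The helper name is hypothetical.

```ruby
# True when every byte has its high bit clear, i.e. the data is plain ASCII.
def seven_bit?(text)
  text.bytes.all? { |byte| (byte & 0x80).zero? }
end

ascii = "hello"
accented = "h\u00E9llo"

puts seven_bit?(ascii)
puts seven_bit?(accented)

# ASCII text has identical bytes whether labelled US-ASCII or UTF-8.
puts ascii.encode("US-ASCII").bytes == ascii.encode("UTF-8").bytes

# Every byte of a non-ASCII UTF-8 character lives in 128..255.
puts "\u00E9".bytes.all? { |byte| byte >= 0x80 }
```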
<nuck>
You wanna see horrifying in a different way?
<devyn>
nuck: hah, I've done things like that; it's not that bad though if you know the HTML output is always going to be predictable because the site has no programmers :p
<nuck>
Exactly
<nuck>
We're not sure why it's in models/
<nuck>
The guy who put it there according to git blame has no idea either
<devyn>
it makes sense to me; it is technically a "model"
<devyn>
models are really any data source
<nuck>
Kind of, but we don't use it as a data source for any rendering, it's only used by a side thing
<nuck>
Since all MAL scraping is Sidekiq'd
<nuck>
I'm gonna be cleaning that sucker up today and replacing it with AnimeNewsNetwork
<devyn>
oh but how would that be malformed; that looks generated by an XML library instead of a template
<nuck>
No clue, but it doesn't use CDATA anywhere
<devyn>
an XML lib should properly escape things though... unless it's still pumping the <XX> shit out with no escaping
<devyn>
that would suck a lot
<nuck>
No clue
<nuck>
But, I think we've figured out the answer to our woes: just stop doing anything
<nuck>
Don't look a gift horse in the mouth, as they say
<prophile>
don't trust them
<prophile>
they like bob marley too much
* devyn
puffs
<devyn>
ELLIOTTCABLE: so I'm thinking now that Paws could actually be performant and none of the aforementioned blockers to parallelism really matter too much as long as we can let the reactors do more than one thing in a tick
prophile has quit [Quit: The Game]
<devyn>
Paws implementations can't really do anything about bad code, but good code with larger-ticked reactors should run just fine and in parallel too
<devyn>
and I'm thinking ultimately this comes down to allowing native ops to decide whether they want to produce a staging to be completed immediately by the reactor without even touching the queue (unless a mask can't be acquired)
<devyn>
if they do return to the reactor with a staging to be completed immediately (and there will be a distinction made)
<devyn>
then the reactor will just do that; it won't go back to the queue
<devyn>
a native op can, of course, produce a staging to be completed immediately and also add things to the queue
<devyn>
that's fine too
<devyn>
ELLIOTTCABLE: nvm, saw your concerns in the google drive doc, still thinking
<devyn>
ELLIOTTCABLE: basically my idea would be to change Paws to introduce, essentially, synchronicity-by-alien, but I think that kind of goes against the fundamental philosophy of Paws
<devyn>
ELLIOTTCABLE: so then… the question is, is the fundamental philosophy of Paws, having everything always be asynchronous, flawed?
<devyn>
ELLIOTTCABLE: and honestly, I have a feeling it might be; time-sharing i.e. traditional preemptive multitasking just seems like a more efficient idea
<devyn>
ELLIOTTCABLE: I think we should go back to what you think the original benefits of Paws would be. what is there to be gained from this programming model?
<devyn>
ELLIOTTCABLE: in any case, I think as you said, having the default receiver execute synchronously is a pretty good idea
<ELLIOTTCABLE>
if I introduce synchronicity, it will be available to libside, too.
<ELLIOTTCABLE>
never forget that Paws is abstractive. There are too many truly fundamental operations that will be implemented libside; and if the conclusion is come to that "fundamental operations simply must be executed unordered" (or rather, synchronously, so "simply ordered"), then that statement equally much applies to abstractive fundamental operations as to alien
<ELLIOTTCABLE>
fundamental operations
<ELLIOTTCABLE>
there's a lot on my mind about this, because I also have thoughts about revamping it to be **more** asynchronous, in some ways. Ones that notably don't restrict us from having fully-synchronous paths of execution.
<devyn>
at the moment, the best course of action I can see is to introduce synchronicity by allowing aliens to produce combinations to be reacted immediately, jumping the queue. don't skip responsibility checks; if responsibility checks fail then push to the queue
<devyn>
aliens as well as core ops, I mean
<devyn>
I would like to hear about the other idea though
<devyn>
in fact I don't think that it's dangerous to jump the queue, because responsibility should take care of any situations in which it would be dangerous, I think
<devyn>
but of course this also means a `yield` alien would inevitably be added, which makes it basically like any other cooperative multitasking system
<devyn>
(yield being, of course, stage caller on queue instead of immediately)
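The queue-jumping idea can be modelled with a toy scheduler. This is not Paws and not any Paws implementation; `Staging`, `ToyReactor`, the `immediate` flag and the mask callback are all invented here to illustrate the proposal, with the mask callback standing in for the responsibility check.

```ruby
# A unit of work: a name, an op to run, and whether it asks to run
# immediately instead of going through the queue.
Staging = Struct.new(:name, :op, :immediate)

class ToyReactor
  attr_reader :log

  # The block stands in for the responsibility/mask check.
  def initialize(&mask_available)
    @queue = []
    @log = []
    @mask_available = mask_available || ->(_staging) { true }
  end

  def push(staging)
    @queue << staging
  end

  def run
    while (current = @queue.shift)
      while current
        @log << current.name
        follow_up = current.op.call(self)
        if follow_up && follow_up.immediate && @mask_available.call(follow_up)
          current = follow_up # jump the queue
        else
          @queue << follow_up if follow_up # ordinary staging, or mask unavailable
          current = nil
        end
      end
    end
    @log
  end
end

def demo(immediate:, mask_ok: true)
  reactor = ToyReactor.new { |_staging| mask_ok }
  b = Staging.new("b", ->(_r) { nil }, immediate)
  reactor.push(Staging.new("a", ->(_r) { b }, false))
  reactor.push(Staging.new("c", ->(_r) { nil }, false))
  reactor.run
end

p demo(immediate: true)                 # => ["a", "b", "c"]
p demo(immediate: false)                # => ["a", "c", "b"]
p demo(immediate: true, mask_ok: false) # => ["a", "c", "b"]
```

In this model a `yield` is simply an op that returns its follow-up with `immediate` set to false, which sends it to the back of the queue.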
sharkbot has quit [Remote host closed the connection]
sharkbot has joined #elliottcable
<alexgordon>
hi ELLIOTTCABLE
yorick has joined #elliottcable
prophile has joined #elliottcable
<alexgordon>
lol I just googled "can't be arsed" and elliott's face came up
<purr>
lol
TheMathNinja has joined #elliottcable
eligrey has joined #elliottcable
oldskirt_ has joined #elliottcable
oldskirt_ has quit [Changing host]
oldskirt_ has joined #elliottcable
oldskirt has quit [Ping timeout: 240 seconds]
prophile has quit [Quit: The Game]
<joelteon>
can someone who's good at networking tell me why video streams often stop streaming video
<cloudhead>
heh
<cloudhead>
well it's not easy to resume in case of a connection problem
<cloudhead>
and if there is not enough bandwidth left, they might have to drop a client
<cloudhead>
happens with webpages too it's just that it's more noticeable with videos
<cloudhead>
for ex if a packet is lost, all subsequent packets will have to wait for the lost one, the client starts buffering new packets because it's waiting for the old ones, after a while it tells the server it can't handle more packets
<cloudhead>
the server drops the client
<joelteon>
oh
<cloudhead>
this wouldn't be a problem with say, UDP
<cloudhead>
but RTMP which is used for streaming is TCP based
<cloudhead>
so it can't really "skip" a frame
<cloudhead>
it has to wait
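The waiting behaviour cloudhead describes can be illustrated with a small simulation. It models only the ordering rule, not real TCP or RTMP; the method names and the arrival sequence are made up.

```ruby
# arrivals: sequence numbers in the order they reach the client.
# Returns, for each arrival, which packets the application gets to see.
def in_order_delivery(arrivals)
  buffered = {}
  next_seq = arrivals.min
  arrivals.map do |seq|
    buffered[seq] = true
    released = []
    # Release packets only while the next expected one is present.
    while buffered.delete(next_seq)
      released << next_seq
      next_seq += 1
    end
    released
  end
end

# Without an ordering guarantee, each packet is handed over on arrival.
def unordered_delivery(arrivals)
  arrivals.map { |seq| [seq] }
end

arrivals = [1, 3, 4, 5, 2] # packet 2 was lost and retransmitted late
p in_order_delivery(arrivals)  # => [[1], [], [], [], [2, 3, 4, 5]]
p unordered_delivery(arrivals) # => [[1], [3], [4], [5], [2]]
```

With ordered delivery, packets 3 to 5 sit in the buffer doing nothing until packet 2 shows up, which is the stall a viewer sees as buffering.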
<joelteon>
i wish it was UDP
oldskirt_ has quit [Ping timeout: 240 seconds]
<cloudhead>
do your videos stop often?
<joelteon>
yeah
<cloudhead>
hm
<joelteon>
i'm watching the BBC stream of argentina vs bosnia, i get about 5 seconds of gameplay at a time
<joelteon>
then it hangs for 10 seconds
<cloudhead>
oh wow
<devyn>
should really use something like μTP
<glowcoil>
hi
<purr>
glowcoil: hi!
<katlogic>
cloudhead: easier said than done. it is tough to encode h264 stream to have it withstand gop packet skips
<cloudhead>
oh I wasn't suggesting UDP would be better
<cloudhead>
just that front of the line blocking wouldn't happen
<cloudhead>
there are good reasons to use TCP
<katlogic>
things like rtmfp/webrtc p2p offloading seem to be like dead end too
<katlogic>
because the situation is um like, 3/4 of folks streaming from crappy comcast/verizon
<cloudhead>
yea
<cloudhead>
I hope webrtc picks up though
<katlogic>
and naturally those two have congested aggregated last mile, so the p2p part makes it only worse
<katlogic>
cloudhead: there already are webrtc swarm implementations