ec changed the topic of #elliottcable to: a 𝕯𝖊𝖓 𝖔𝖋 𝕯𝖊𝖙𝖊𝖗𝖒𝖎𝖓𝖊𝖉 𝕯𝖆𝖒𝖘𝖊𝖑𝖘 slash s͔̞u͕͙p͙͓e̜̺r̼̦i̼̜o̖̬r̙̙ c̝͉ụ̧͘ḷ̡͙ţ͓̀ || #ELLIOTTCABLE is not about ELLIOTTCABLE
Sgeo_ has quit [Read error: Connection reset by peer]
Sgeo has joined #elliottcable
<ec>
i wrote a thing, if anybody wants to proofread my technical writing, lol
<ec>
'specially if you don't Know Unicode Crap that well.
<ec>
sigh i love and miss programming
<ljharb>
ec: interesting. so this is for when you’re calling directly into Ocaml-compiled native modules?
<ec>
ehhhhhhm, close. I see where I went wrong with the word "native"
<ec>
there's no native component here; but there's code that was *written for* native platforms.
<ljharb>
but invoked in node? or, js compiled to run in Ocaña
<ec>
which you, the (ab)user, are compiling via BuckleScript to JavaScript instead, and thus need my little horror to adapt it.
<ljharb>
ocaml
<ljharb>
so is this lib something the compiler could inject?
<ec>
I wish lol
<ljharb>
but like ideally
<ec>
so now that I have it working, and can publish working libraries for Current Real-World BuckleScript stuff, as I need to,
<ec>
I'm definitely going to go complain to People
<ljharb>
presumably you could write a babel transform that could be applied to the js output tho
<ec>
but part of the problem is that this is a Can't-Make-Everyone-Happy situation
<ljharb>
so that nobody ever has to manually use your lib
<ec>
out of BuckleScript users, there's "people writing their JavaScript in OCaml", and then there's "people compiling their OCaml to JavaScript" — which sound similar, but aren't the same.
<ec>
the former group know, and expect, one string-semantic; the latter group another
<ec>
the real problem here, it boils down, is not BuckleScript, or JavaScript; it's OCaml. OCaml *doesn't have* a Unicode-handling story. All the UTF-8 handling stuff is very … ad-hoc, and ‘just what people do.’
<ec>
there actually *is* no String type (in the JavaScript, fully-featured, Unicode-aware sense) in OCaml; only a "char array" type that's *named* `string`.
<ec>
unfortunately, people expect to. y'know. use strings. and do string-y stuff with them. so, BuckleScript took a reasonable-if-annoying stance of "We're gonna leverage all of the JavaScript string-machinery, so most of the time, things function as you expect … and so code transpiles to clean, minimal, obvious operations"
<ec>
but, yeah, that totally fucks up Unicode-handling in all these ancient rickety OCaml libraries.
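A minimal TypeScript sketch of the mismatch described here, assuming the OCaml library walks a `string` expecting UTF-8 bytes while the compiled output hands it a JavaScript string of UTF-16 code units. `utf8Bytes` is a hypothetical helper written only for illustration; it is not part of BuckleScript or of any library mentioned in this log.

```typescript
// Hypothetical helper: encode a JavaScript (UTF-16) string into the UTF-8
// bytes that byte-oriented OCaml code expects to find inside a `string`.
// Unpaired surrogates are encoded as-is (three bytes), which is not valid UTF-8.
function utf8Bytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a high/low surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return out;
}

const sample = "\u00e9"; // "é", U+00E9

// What index-based code sees under JavaScript string semantics:
const codeUnit = sample.charCodeAt(0); // 0xE9, a single code unit

// What a UTF-8-assuming OCaml library was written to see:
const bytes = utf8Bytes(sample); // [0xC3, 0xA9], two bytes
```

A decoder written against UTF-8 bytes would read that lone `0xE9` as the lead byte of a three-byte sequence and then find nothing after it, which is the kind of breakage being complained about.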
<ec>
in an ideal world it's not BuckleScript, or me, that comes up with a solution, but the *OCaml* community.
<ec>
I'm trying to find a venerable GitHub Issue about this
<ec>
but yeah *ideally* we'd collectively stop using, and maybe even eventually deprecate, the `string` type. (we've already started this in a different direction, for a different reason, with the new `bytes` type.) and have real, type-level encoding information and tooling ......
<ec>
which is exactly the painful transition Ruby made from 1.8 to 1.9, btw. this is a well-documented growing pain for language designers: turns out, you can't make a language without already knowing Literally Everything about encoding and human language and ughhhhhhhhhhhhhhhhhh; otherwise, you're just, just *gonna* have to rebuild everything from scratch after community input from people who Actually Know Encoding
<ec>
Thingies™
<ljharb>
i mean tho, how did the ocaml designers not know about this
<ec>
you mean, in the '80s, before Unicode existed? :P
<ljharb>
is ocaml that old?
<ec>
that's a somewhat facetious response, of course; OCaml, as opposed to the progenitor languages it extended, is younger than that … but also *not* so much, because Unicode also wasn't actually, well, universal, for a long time
<ec>
*but* that said it's not just a matter of knowing Unicode exists. It's more … 'how do we allow developers to ergonomically deal with the real-world landscape of encodings?'
<ec>
which is just a specific instantiation of the single, only Language Development Question that encompasses all language decisions: ‘How much do we hide from our user? How much do we abstract, how much power do we take away for their safety?’
<ec>
what it boils down to is people building *programming* languages are somewhat rarely *human*-language nerds; and tend to belong to the tribe of programmers born of Silicon Valley: "eh, I can type 'LOL', it's good enough"
<ec>
aaaaaaaaand then their languages grow and gain users that have to deal with real-world things like higher-plane glyphs, combining characters, legacy encodings or even outright malformed input, interoperability with systems that won't transit *well*-formed output … and those users get pissed, and kinda by definition-of-the-problem the language is now popular and established enough that those mistakes can't be unmade …
<ec>
aaaaaaaaaaaand now your popular tool is a part of that ecosystem-of-other-shitty-tools-making-encoding-horrible-for-everyone, doing its very darnedest to make everything worse for everybody. great!
<ec>
tl;dr I strongly respect Ruby for literally making the first large breaking-backwards-compatibility (1.0 to 2.0, after what, fifteen years? woah.) because the maintainers finally realised how important this was to The World As A Whole, lol
<ec>
ANYWAY re: ocaml specifically: this is fixable of course, but OCaml is a community of crotchety academics, prolly mostly white, prolly mostly male, not exactly brimming with SJW culture and wokeness … everyone seems to think "uhhh just install Camomile if you have to 'deal with' some unicode crap ... idk? worry about it when it breaks." is good enough
<jfhbrook>
lgpl huh?
<ec>
hm?
<jfhbrook>
idk in python I have to use byte strings and unicode strings
<jfhbrook>
and like ok you have to pull in a lgpl library (camomile) to get unicode strings, but now you have unicode strings and it's fine, right?
<jfhbrook>
though to be fair in your case
<jfhbrook>
probably every library is written to use bytestrings so you'd have to convert in and out all the time anyway
<ec>
nnnnnot quite — it's more "what's the interop story". Are all 'unicode strings' UTF-8 bytes in a byte-array? that should be something the language standardises (and, ideally, provides alternatives/escape-hatches to, as well), not something Some library authors Sometimes do.
<jfhbrook>
it's fair to say that standardization is useful
<jfhbrook>
when something's already a de facto standard is when you need it the least tho
<ec>
Daniel Bünzli had a thing on this that I'm trying to find, in the docs to one of his Unicode-handling modules
<ec>
ugh anyway I've already spent too long on this today, time to actually *use* this effort I put in, back in the place where I unearthed the bug, and get something shipped 🙄
<jfhbrook>
hah, I hear that
<jfhbrook>
work's been a little hectic lately
<jfhbrook>
I mean hectic is the wrong word - busy I guess, stressful
<ec>
oh but anyway you can see one of the fallout effects of that sort of agnosticism-based choice, right now
<jfhbrook>
but I've been learning emacs, that's fun!
<ec>
if BuckleScript were working off of a base that *inherently* differentiated, then it could sanely compile the two things two different ways.
<jfhbrook>
predictably malformed - that's good
<ec>
"the byte-string type" gets compiled to array-handling JavaScript, effort can be expended to maintain semantics for existing byte-array-manipulating-OCaml-code *and* produce idiomatic output; whereas "the user-input-string type" gets compiled with encoding/decoding machinery to massage it into JavaScript UCS-2 yadda yadda yadda.
<ec>
but with this design? from the *language* perspective, the two are indistinguishable. there's no way to satisfy both requirements.
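A rough TypeScript sketch of what "inherently differentiated" could look like, assuming a tagged split between raw bytes and human text. All of the names here (`ByteString`, `TextString`, `lengthOf`, and so on) are hypothetical; nothing like this exists in OCaml or BuckleScript as discussed in this log.

```typescript
// Hypothetical types: a tagged pair standing in for a language-level split
// between "raw bytes" and "human-readable text".
type ByteString = { readonly kind: "bytes"; readonly data: readonly number[] };
type TextString = { readonly kind: "text"; readonly value: string };
type AnyString = ByteString | TextString;

const bytesOf = (data: number[]): ByteString => ({ kind: "bytes", data });
const textOf = (value: string): TextString => ({ kind: "text", value });

// With the tag available, "length" can mean the right thing for each case.
// That is the choice a compiler cannot make when both are just `string`.
function lengthOf(s: AnyString): number {
  switch (s.kind) {
    case "bytes":
      return s.data.length; // number of bytes
    case "text":
      return s.value.length; // number of UTF-16 code units
  }
}

const raw = bytesOf([0xc3, 0xa9]); // UTF-8 encoding of U+00E9
const shown = textOf("\u00e9"); // the same character as JavaScript text

const rawLength = lengthOf(raw); // 2
const shownLength = lengthOf(shown); // 1
```

The same character reports two different lengths depending on which type carries it, and the type is what tells the reader (or a compiler) which answer was intended.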
<ec>
this exact thing is playing out with mutation — having mutable strings was causing serious problems for both the compiler and the community;
<ec>
sure, you can just say "hey this is a string, and we're not gonna mutate it, and you shouldn't either", and document that at the library-level, maybe mint a type,
<ec>
but that's just not the same.
<ec>
finally things snapped in favour of breaking backwards-compatibility (in a really well-thought-out way, btw, imo!)
<ec>
OCaml 4.02 introduced a new type, `bytes`, for mutable strings, by default just an alias to `string` … alongside an optional compiler-flag, `-safe-string`, to make `string` immutable, thus opting-in to breaking code that should have already switched from `string` to the explicit `bytes` type if they needed mutation ...
<ec>
then 4.06 swapped the default, leaving iirc `-unsafe-string` to make legacy code work, but defaulting to `string` being an immutable type … and finally 5.0 removed the flag, breaking code that wasn't fixed in the intervening years
<ec>
I might be off by one on all those numbers idk lmao
<ec>
but. I appreciated that careful approach. I think processes like that are a good candidate for a replacement for the effectively-defunct SemVer, may ye rest in peace
<ec>
anybody know if you can export/import typescript types? I still don't use typescript often enough to keep any of it in my head between forays ;_;
englishm has quit [Excess Flood]
englishm has joined #elliottcable
Rurik has joined #elliottcable
<ljharb>
ec: lol things that don't follow semver make me facepalm so hard
<ljharb>
ec: yes, you can import and export type space values
<ljharb>
ec: sadly, TS doesn't have `import type` like flow does, so you have no way of knowing lexically at the callsite
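A small TypeScript sketch of exporting types alongside values, assuming a hypothetical module `shapes.ts`; the names `Point`, `Pair` and `midpoint` are invented for illustration. Later TypeScript releases (3.8 onward) did add an `import type` form, so the consumer-side comment shows both spellings.

```typescript
// shapes.ts (hypothetical module name)

// Types are exported with the same `export` keyword as values.
export interface Point {
  x: number;
  y: number;
}

export type Pair<T> = [T, T];

// A value-space export that uses the exported types.
export function midpoint([a, b]: Pair<Point>): Point {
  return { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 };
}

// A consumer module would import types and values with one syntax:
//   import { Point, Pair, midpoint } from "./shapes";
// and, on TypeScript 3.8 or newer, could mark the type-only part explicitly:
//   import type { Point, Pair } from "./shapes";
```

With the plain `import { ... }` form nothing at the import site says which names are types and which are runtime values, which is the lexical ambiguity mentioned above.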
Rurik has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]