xkapastel has quit [Quit: Connection closed for inactivity]
Phoenixwater[m] has left #picolisp [#picolisp]
<Regenaxer> I don't want to implement those huge unicode tables for 'uppc' and 'lowc' again for pil21
<Regenaxer> Now I'm very surprised to see that there seems to be no portable way (i.e. a C function)
<Regenaxer> At least it seems overkill
<tankf33der> i think you should generate the tables once in separate file(s) and just use them
<Regenaxer> The problem is that I don't understand this case conversion
<tankf33der> but this is already done in src64, right?
<Regenaxer> for example, there is now support for uppercase ß (german s) in unicode
<Regenaxer> yes, but not correct now
<Regenaxer> needs to update for upper case ß
<Regenaxer> How to do it?
<Regenaxer> I don't want to maintain such stuff too
<Regenaxer> Really surprising why there is no standard support
<Regenaxer> C only has toupper/tolower for ascii or wide chars
<Regenaxer> not for utf8
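The point about byte-oriented case functions can be illustrated with a minimal Python sketch. Python is used here only as a neutral illustration, and both helper names are hypothetical, not part of PicoLisp or any C library. An ASCII-only case function applied to raw UTF-8 bytes leaves multi-byte characters untouched, so the bytes have to be decoded to code points before any case mapping can apply.

```python
def upper_ascii_bytes(data: bytes) -> bytes:
    # bytes.upper() only touches ASCII a-z, much like a byte-wise toupper loop.
    return data.upper()


def upper_utf8(data: bytes) -> bytes:
    # Decode to code points, map case there, then re-encode as UTF-8.
    return data.decode("utf-8").upper().encode("utf-8")


sample = "äb".encode("utf-8")
print(upper_ascii_bytes(sample))  # the "ä" bytes are left unchanged
print(upper_utf8(sample))         # both characters are converted
```

The second helper relies on the case tables bundled with the Python interpreter, so it sidesteps rather than answers the question of where the tables should come from.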
<Regenaxer> I can't even find a clear description of the algorithm for doing *correct* case conversion in UTF-8
<Regenaxer> Unicode consortium
<Regenaxer> all very confusing
<Regenaxer> What do you think about the above glib?
<Regenaxer> portable?
<Regenaxer> overkill?
<Regenaxer> It supports tons of functions
<Regenaxer> I have to link them all into pil just to get uppc and lowc
<tankf33der> i believe you should not link to glib
<tankf33der> or musl
<Regenaxer> What I really want is up-to-date tables plus a clear description of how to handle them
<tankf33der> let me check the myrlang implementation
<Regenaxer> myrlang?
<tankf33der> a language no one cares about, as usual
<Regenaxer> Myrddin?
<tankf33der> i saw tables somewhere and thought picolisp has the same
<Regenaxer> I took them from some free Java project
<Regenaxer> 25 years ago or so
<Regenaxer> "Kaffee" project
<Regenaxer> But I never understood those tables
<Regenaxer> GNU Kaffe Project
<Regenaxer> (see comment in src/sym.c)
<Regenaxer> I could easily convert them to pil21 syntax
<Regenaxer> no problem
<Regenaxer> But how to handle new things?
<Regenaxer> I could even just copy/paste from pico/src/sym.c to pil21/src/lib.c
<Regenaxer> But I don't like this
<Regenaxer> Having to roll everything yourself for such a standard thing like utf8
<Regenaxer> How do we know these tables and algos are correct, or better than src/sym.c?
<Regenaxer> "plan 9's runetype.c" is that even still supported?
<tankf33der> the problem is only in up/low conversion?
<tankf33der> because current utf8 is simple, tested by me
<tankf33der> because current utf8 is correct, tested by me
<Regenaxer> yes, only for uppc and lowc
<Regenaxer> All other utf8 is already in pil21
<tankf33der> solution: create a *full* test vector with python and test.
<Regenaxer> General testing is perhaps not needed
<Regenaxer> only *new* characters in unicode
<Regenaxer> like upper-case ß
<Regenaxer> Is in unicode recently
<Regenaxer> and perhaps other characters
<Regenaxer> Unicode is changing all the time
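The test-vector idea could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `simple_case_vector` is hypothetical, the vector keeps only one-to-one mappings (so "ß", whose uppercase is the two-character "SS", is deliberately left out), and the result reflects whatever Unicode version the running Python interpreter bundles, which may be older than the latest release.

```python
import sys
import unicodedata


def simple_case_vector():
    """Map code point -> (upper, lower) for one-to-one case mappings only."""
    vector = {}
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters, skip them
        ch = chr(cp)
        up, lo = ch.upper(), ch.lower()
        # Keep only entries where both mappings are single characters
        # and at least one of them differs from the input.
        if len(up) == 1 and len(lo) == 1 and (up != ch or lo != ch):
            vector[cp] = (ord(up), ord(lo))
    return vector


if __name__ == "__main__":
    vec = simple_case_vector()
    print("Unicode version of this Python:", unicodedata.unidata_version)
    print("entries:", len(vec))
```

Such a vector could be dumped to a file and compared against the output of 'uppc' and 'lowc'. Restricting the comparison to code points added in recent Unicode versions would match the narrower testing suggested above.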
<Regenaxer> Ideal would be some library published by the unicode consortium
<Regenaxer> some *official* code
<Regenaxer> Not everybody rolling his own
<tankf33der> not portable, even libffi may be a problem
<Regenaxer> libffi too?
<Regenaxer> I thought it looks very portable
<tankf33der> maybe.
<tankf33der> so you already have a dependency
<tankf33der> i dont trust glib, who will port glib to riscv? :)
<tankf33der> linux distro maintainers?
<Regenaxer> clang maintainers
<Regenaxer> What we really need is support in clang
<Regenaxer> pil21 should use only clang for system calls
<tankf33der> wow, some utf8 may be invalid sequences
<Regenaxer> Ah, yes, of course
<Regenaxer> utf8 has a special byte format
<Regenaxer> So almost any random byte sequence is illegal
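The remark about illegal byte sequences can be made concrete with a small Python sketch. The helper name `is_valid_utf8` is hypothetical, and the check simply relies on Python's strict UTF-8 decoder rather than on anything in pil21.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A few representative cases:
print(is_valid_utf8("ß".encode("utf-8")))  # well-formed two-byte sequence
print(is_valid_utf8(b"\xc3"))              # lead byte without its continuation
print(is_valid_utf8(b"\x80"))              # continuation byte without a lead byte
print(is_valid_utf8(b"\xc0\xaf"))          # overlong encoding of "/"
```

Only the first of these four inputs is accepted. The others show the structural rules (lead bytes, continuation bytes, shortest-form encoding) that random bytes rarely satisfy.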
<tankf33der> damn, unicode 13 is coming.
<Regenaxer> Perhaps ask in some clang forum?
<Regenaxer> I think it should be the duty of clang to maintain such stuff
<tankf33der> and python and ruby and dlang and so on
<tankf33der> also we have this one
<Regenaxer> We need it across Linux, BSD, Mac, Android and iOS
<Regenaxer> yeah, wide.l
<Regenaxer> forgot that one
<tankf33der> also checking all links from this:
<Regenaxer> Tons of docs, yes, but which is the "right" one? ;)
<tankf33der> no one knows until you start doing something
<tankf33der> hunting for simple pages like this:
<Regenaxer> very good explanations
<Regenaxer> the second link has "One-to-many: (ß → SS )"
<Regenaxer> So this is the case where we have a new (single) char now
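The one-to-many case and the newer single capital letter can be observed with a short Python sketch. The helper name `case_round_trip` is hypothetical, and the behaviour shown is that of Python's built-in string methods, not necessarily what 'uppc' and 'lowc' should do.

```python
def case_round_trip(ch: str):
    """Return (uppercased, uppercased-then-lowercased) for a string."""
    up = ch.upper()
    return up, up.lower()


for ch in ("ä", "ß", "\u1e9e"):  # U+1E9E is LATIN CAPITAL LETTER SHARP S
    up, back = case_round_trip(ch)
    print(repr(ch), "->", repr(up), "->", repr(back))
```

In Python, "ß" uppercases to the two-character "SS" and does not come back as "ß", while U+1E9E lowercases to "ß" but is not what "ß" uppercases to. A single-character 'uppc' therefore has to pick a policy for "ß" instead of just following a symmetric table.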
<Regenaxer> And that link also shows how complicated it all is. So *not* everybody should have to roll his own
<Regenaxer> There must be some reference implementation somewhere ...
_whitelogger has joined #picolisp
<tankf33der> found how musl does case things
<Regenaxer> Looks quite short
<Regenaxer> What does musl do with "ß"?
<tankf33der> unknown yet.
<Regenaxer> (same place as EastAsianWidth.txt)
<Regenaxer> So I will study this. Perhaps we'll do it similar to the wide char stuff
<tankf33der> sounds good
<Regenaxer> yeah, at least easy
<Regenaxer> just lookup
<tankf33der> analysis about my dlang bigint multiplication
<Regenaxer> ah, yeah
<Regenaxer> buffer size bug
<Regenaxer> hehe "got undetected for so long"!
<Regenaxer> The CaseFolding table has it: 1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
<Regenaxer> But the other direction maps to "SS": 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
<Regenaxer> Problem is that I don't know from the table which one *is* already upper or lower
<Regenaxer> It just maps to the other case
<Regenaxer> I need to split that into two tables it seems
<tankf33der> like musl, right? this function also ignores a lot of ranges
<Regenaxer> Not sure
<tankf33der> static wchar_t __towcase(wchar_t wc, int lower)
<Regenaxer> wchar_t is not helpful as far as I understand
<Regenaxer> And I don't understand the CaseFolding table
<Regenaxer> The left column contains lowercase and some uppercase
<Regenaxer> How to use it?
<Regenaxer> no, opposite: The left column contains uppercase but also *some* lowercase
<Regenaxer> ok, so I can use the text on the right side to filter! :)
<Regenaxer> "# LATIN SMALL LETTER"
<Regenaxer> If the tables are too big, I put it all into a shared library, loaded only when really needed
<Regenaxer> i.e. when 'lowc' or 'uppc' is called
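The filtering idea mentioned above (using the character name in the trailing comment to decide which side of the table an entry belongs to) might look roughly like this in Python. It is a sketch under assumptions: the function names are hypothetical, the sample contains only the two lines quoted in this log plus the entry for "A", and treating every name without "SMALL LETTER" as the uppercase side is a simplification that a real generator would need to check against the full file.

```python
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
"""


def parse_line(line):
    """Parse 'code; status; mapping; # name' into a tuple, or None."""
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 3 or not fields[0]:
        return None  # blank line or pure comment
    code = int(fields[0], 16)
    mapping = tuple(int(x, 16) for x in fields[2].split())
    return code, fields[1], mapping, comment.strip()


def split_tables(text):
    """Split entries by whether the source character is named SMALL LETTER."""
    from_upper, from_lower = {}, {}
    for line in text.splitlines():
        entry = parse_line(line)
        if entry is None:
            continue
        code, _status, mapping, name = entry
        target = from_lower if "SMALL LETTER" in name else from_upper
        target[code] = mapping
    return from_upper, from_lower


from_upper, from_lower = split_tables(SAMPLE)
print({hex(k): [hex(m) for m in v] for k, v in from_upper.items()})
print({hex(k): [hex(m) for m in v] for k, v in from_lower.items()})
```

Note that the mapping for 00DF in the quoted line is 0073 0073, i.e. two lowercase "s" characters, because CaseFolding maps toward a folded form rather than toward uppercase.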
<Regenaxer> Must have a script somewhere, but I can't find it
<tankf33der> generator.l
<Regenaxer> indeed! You are great!!
<tankf33der> we did it together to update to latest version :)
<Regenaxer> So I found it here too, in opt/genWide.l
<Regenaxer> did not know what to search for
_whitelogger has joined #picolisp
DerGuteMoritz has quit [Ping timeout: 268 seconds]
DerGuteMoritz has joined #picolisp