<beneroth>
Regenaxer, what is the name/where do I find the bulk insert stuff?
<beneroth>
single user db, bulk insert
<Regenaxer>
ah
<Regenaxer>
hmm :)
<Regenaxer>
'create' ?
<beneroth>
!!
<Regenaxer>
yay
<beneroth>
yeees
<beneroth>
I mixed it up with blk
<Regenaxer>
:)
<beneroth>
but that is for reading
<Regenaxer>
yes, different thing
<beneroth>
a bit unlucky naming, I think ;-)
<Regenaxer>
'blk'?
<beneroth>
Reads raw object data from the cnt'th block in the file open on descriptor fd.
<beneroth>
raw pilDB reads from a foreign pilDB file
<Regenaxer>
yes, block(s)
<beneroth>
I think "blk" seems much closer to "bulk insert" than "create"
<beneroth>
of course "blk" could also stand for bulk read
<Regenaxer>
true
<beneroth>
but 'blk' isn't really bulk read either
<Regenaxer>
T, not at all
<beneroth>
How did you arrive at the name for 'blk' ? not via "bulk" ?
<Regenaxer>
The internal vars are all like BlkXxxx
<beneroth>
I don't argue for change
<Regenaxer>
yes, no problem
<beneroth>
BlkXxxx ?
<Regenaxer>
cnt'th block
<Regenaxer>
(BlkIndex) etc
<Regenaxer>
in src64/db.l
<Regenaxer>
(BlkLink)
<beneroth>
aah
<Regenaxer>
it is all about "Blk"s :)
<beneroth>
so the etymology of 'blk' is indeed 'block', not "bulk"
<beneroth>
the database block
<Regenaxer>
yep, as the ref says, cnt'th block
<beneroth>
yeah, obviously xD
<beneroth>
ok
<Regenaxer>
it is an advanced function anyway
<Regenaxer>
for the advanced user I mean
<beneroth>
T
<beneroth>
I rest my case, the create and blk names are a bit unlucky
<beneroth>
but not wrong
<Regenaxer>
yes, not so many short names left
<beneroth>
well I recommend you take vowels into your alphabet
<beneroth>
:P
<beneroth>
I see how and why you got your naming habits
<Regenaxer>
ok :)
<beneroth>
but for my projects so far I strive for more descriptive names. for maintainability, meaning the developer who has to read it in 2-3 years
<beneroth>
(even if it is me)
<beneroth>
(but not exclusively, if my plans work out)
<beneroth>
slightly more overhead, yes, but in most cases negligible I believe
<beneroth>
though I fully understand your principle, I think you do it as a habit, so that you never have to adapt your habits, regardless of whether you're doing an embedded project or a system on a big powerful machine
<Regenaxer>
It is not about efficiency, but readability
<beneroth>
:)
<Regenaxer>
I hate verbose code
<beneroth>
T
<Regenaxer>
Sorry, I'm in a tel conf
<Regenaxer>
can't concentrate
<beneroth>
np
<beneroth>
I agree with you in spirit, but I'm a bit less pure on the dense <-> verbose scale than you :)
<Regenaxer>
np :)
<beneroth>
Regenaxer, when you are back I would have a little question about (create)
<Seteeri>
Would it be idiomatic to set the scl for the reader (and entire env explicitly) like so: (load "-scl 16" "file.l")
<beneroth>
hi Seteeri
<beneroth>
I think there is *Scl
<beneroth>
and I would do the load more like (scl 16 (load "file.l"))
<beneroth>
then *Scl is 16 during load, and will be reset to its previous value afterwards
<beneroth>
so basically the same as (let *Scl 16 (load "file.l"))
<beneroth>
you can also use (scl 16) to set *Scl globally
<beneroth>
same as (setq *Scl 16), I guess
<beneroth>
so (scl) is just syntax sugar to generalize interaction with *Scl
<beneroth>
Seteeri, (load "-scl 16") will try to find a file named "-scl 16"
<beneroth>
(load) is not starting up a separate pil process or the like :-)
<beneroth>
setting *Scl globally (for the whole lifetime of your picolisp process and its children) is absolutely idiomatic :-)
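The (scl) usage described above can be sketched like this (a minimal sketch; "file.l" is a placeholder name):

```picolisp
# Bind *Scl to 16 only while loading the file; the previous
# value is restored afterwards:
(scl 16
   (load "file.l") )

# Roughly equivalent to binding the global directly:
(let *Scl 16
   (load "file.l") )

# Or set it globally for the rest of the process:
(scl 16)   # same effect as (setq *Scl 16)
```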
<Regenaxer>
yes, setting in load may be fine
xkapastel has joined #picolisp
<Regenaxer>
Usually I set it in the main file
<Seteeri>
@beneroth (load "lib.l" "-* 1 2 3") outputs 6, but it should work regardless of the order?
<Seteeri>
cool
<Regenaxer>
well, it just returns 6, not printing
<beneroth>
wuuuuut
<Regenaxer>
the repl prints it
mtsd has quit [Quit: mtsd]
<Regenaxer>
(load "lib.l" "-* 1 2 3") I mean
<beneroth>
I'm sorry Seteeri, I was not aware that load can interpret -arguments
<Seteeri>
Based on the docs, it says if the argument is a sym with 1st char a hyphen then it is passed as an exec list
<Regenaxer>
it is as on the command line
<Seteeri>
no worries
* beneroth
is flabbergasted
<beneroth>
ok, didn't know, I believed that's only true for invocation of the picolisp binary
<Regenaxer>
I call -'symbols ...' in some cases
<Regenaxer>
for debugging
<Seteeri>
I see
<Seteeri>
it's a lot more convenient than having an entire file to execute a simple expression :)
<Regenaxer>
beneroth, right, but the command line is just 'load'ed
<beneroth>
ah, because the arguments are still evaluated in the context of the loaded file, Regenaxer ?
<beneroth>
so (file) would be the loaded file :)
<beneroth>
Regenaxer, I see. makes sense.
<beneroth>
Regenaxer, I'm just shocked to learn something new about a piece of picolisp I believed to know :O
<Regenaxer>
hehe :)
<beneroth>
ok, I go update my *Humility
<Seteeri>
I have main.l which calls 'load recursively, so I figure instead of passing the scale to 'load every time, just do it once at the beginning
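The exec-list behaviour of 'load' discussed above can be sketched like this ("lib.l" and "file.l" are placeholder names):

```picolisp
# Arguments starting with a hyphen are executed, just as on the
# pil command line:
(load "lib.l" "-* 1 2 3")   # loads lib.l, then evaluates (* 1 2 3) -> 6

# So the scale can indeed be set via a load argument:
(load "-scl 16" "file.l")   # sets *Scl, then loads file.l
```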
<beneroth>
Regenaxer, if you got time, question about (create).. it prefers to have single user mode
<beneroth>
Regenaxer, what about doing a (dbSync) (commit 'upd) around it, what is the problem besides the problems natural with a long-running transaction?
<Regenaxer>
Yes, but I'm with BTG the next hours it seems
<beneroth>
ok
<beneroth>
we can also discuss later :)
<Regenaxer>
yes, thanks
<beneroth>
or I look at the implementation and waste less of your time :)
<beneroth>
good
<Regenaxer>
Sorry :) Need to concentrate here
<beneroth>
it is absolutely alright
peterhil` has quit [Read error: Connection reset by peer]
peterhil has joined #picolisp
<Regenaxer>
A pause at BTG at the moment ... 'create' does repeated commits for the big accumulated sets of inserts, so it cannot be synced
<Regenaxer>
Even if, other processes will be blocked for long times
<beneroth>
I see
<beneroth>
so after initial imports, I would plan the following steps:
<Regenaxer>
This is also why the name "create" fits
<beneroth>
T
<beneroth>
1) preprocess the data to be imported, maybe even by turning it into a plio file (no interaction with live application required)
<Regenaxer>
It is to build a new DB
<Regenaxer>
ok
<beneroth>
2) put the live application into maintenance mode, start it in single user, import the prepared file using (create), go back to normal operation
<beneroth>
that sounds feasible?
<Regenaxer>
yes
<Regenaxer>
I did this with the OSM DB
<beneroth>
ah
<beneroth>
yeah, similar use case
<beneroth>
I have big amounts of data (multiple 10k records), coming from another database (non-pil)
<Regenaxer>
after that I never used it again ;)
<Regenaxer>
ok, though multiple 10k is not big, so create may be overkill
<beneroth>
I'd like to periodically make a new export from the other system (= complete data, not differential), and import it with the steps above into my pil application
<beneroth>
T
<beneroth>
I'm aware, I worked with huge databases, but not with pil :)
<beneroth>
and I like to optimize
<beneroth>
so plan is to have some standard functionality/library to do the steps above
<Regenaxer>
If the db fits mostly into memory, a naive import may be the same speed
<Regenaxer>
or faster
<beneroth>
ok
<Regenaxer>
as no temp files are managed
<beneroth>
I see
<beneroth>
I agree with naming it (create) :D
<Regenaxer>
:)
aw- has joined #picolisp
<beneroth>
Regenaxer, in case you have time
<beneroth>
(class +Foo +Entity)
<beneroth>
(trace 'put> '+Foo)
<beneroth>
(rel a (+Key +Number))
<beneroth>
(pool "create.db")
<beneroth>
(rel b (+String))
<beneroth>
(trace 'isa)
<beneroth>
(let "XS" '((1 "one") (2 "two"))
<beneroth>
(create '(+Foo) 'a '(b)
<beneroth>
(and "XS" (pop '"XS")) ) )
<beneroth>
: (show '{3})
<beneroth>
{3} (NIL (1 NIL) (2 NIL))
<beneroth>
-> {3}
<beneroth>
Entity counter is correct, but index tree misses the external symbols
<beneroth>
I don't see the mistake
<Regenaxer>
Will take a look soon
<beneroth>
I can wait, your other stuff takes priority
<beneroth>
thank you
<beneroth>
no hurry :)
<Regenaxer>
perfect
<Regenaxer>
ok
<Regenaxer>
I think you need 'dbs'
<Regenaxer>
'create' does (new (meta Typ 'Dbf 1) Typ))
<Regenaxer>
So it assumes it is not a single-file DB
<beneroth>
ha, I thought about that and tried so
<beneroth>
let me try again, a moment
<Regenaxer>
yeah
<Regenaxer>
other functions do (or (meta Typ 'Dbf 1) 1)
<Regenaxer>
defaulting to 1
<Regenaxer>
But 'create' is useful only in big DBs
<beneroth>
T
<beneroth>
yeah (dbs) solved it
<Regenaxer>
cool! :)
<beneroth>
I tested it before, but apparently I didn't do that properly
<beneroth>
the obj is created using (new (meta Typ 'Dbf 1) Typ), but the index update only uses (meta Typ Var 'dbf)
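A minimal sketch of the fix discussed here: declare the database files with 'dbs' before opening the pool (the file layout and block size scales are illustrative, following the +Foo example above):

```picolisp
(class +Foo +Entity)
(rel a (+Key +Number))
(rel b (+String))

(dbs
   (1 +Foo)          # +Foo objects in the first file
   (2 (+Foo a)) )    # the 'a' index tree in a second file

(pool "db/" *Dbs)    # with *Dbs, 'pool' takes a directory of files
```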
<beneroth>
many thanks Regenaxer
<beneroth>
should have found it myself
<Regenaxer>
no problem
<beneroth>
maybe make the implementation consistent ?
<Regenaxer>
Good that I looked at it again. Almost forgot we havs such a powerful function
<beneroth>
hehe
<beneroth>
this will become very handy for my work, eventually
<beneroth>
I'm gradually taking on bigger and bigger projects :)
<Regenaxer>
Great!
<beneroth>
I suggest the reference help entry of (create) should at least have a note that (dbs) is required
xkapastel has quit [Quit: Connection closed for inactivity]
<beneroth>
it absolutely makes sense etc., but should be mentioned, for completeness
<Regenaxer>
right
<Regenaxer>
the ref
<beneroth>
the implementation of (create) is good as it is. not entirely consistent (requiring dbs for indices, but not objects), but I think this is alright
<beneroth>
can always use index rebuild, and at least nothing is lost this way
<Regenaxer>
Why rebuild here?
<Regenaxer>
when adding dbs later?
<beneroth>
nevermind
<beneroth>
you are absolutely right
<beneroth>
I cannot read code
<Regenaxer>
:)
* beneroth
blames lack of sleep
<beneroth>
I thought the object creation in (create) would not require (dbs)
<beneroth>
well no
<beneroth>
it doesn't
<beneroth>
or is it because Dbf is set by default?
<Regenaxer>
The idea is that the db model is optimized as far as possible, then create is used
* beneroth
talks about (new (meta Typ 'Dbf 1) Typ)
<beneroth>
ah
<Regenaxer>
by default NIL iirc
<beneroth>
or is it because (new) defaults
<beneroth>
aye
<beneroth>
you are right
<beneroth>
T
<beneroth>
all perfect, as usual
<Regenaxer>
ok :)
<beneroth>
picolisp problems are nearly always just stupid users :)
<beneroth>
so here too :)
<Regenaxer>
not stupid
<Regenaxer>
it is all very terse
<beneroth>
T
<beneroth>
but still :)
<beneroth>
picolisp stupid is still pretty clever :P
<Regenaxer>
exactly
<Regenaxer>
Not for the faint at heart ;)
<beneroth>
thank you for your time & patience :)
<beneroth>
aye, and that is how we like it!
<Regenaxer>
Always welcome!
<beneroth>
programmers who prefer to be pampered can stay with java/c#/python
<Regenaxer>
T :)
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
<Regenaxer>
oops
<beneroth>
welcome back :)
<Regenaxer>
Ctrl-D again
<beneroth>
maybe you should catch it and ask for confirmation
<beneroth>
if possible
<Regenaxer>
true
<beneroth>
might also turn bad sometimes
<Regenaxer>
It uses the line editor in @lib/led.l
<beneroth>
ask for confirmation, and if not cancelled, proceed after 15 seconds
<Regenaxer>
iirc it is just a key definition in line edit
<beneroth>
so not equal to the usual Ctrl+D logout shortcut
<beneroth>
for shells etc
<Regenaxer>
Just same meaning
<Regenaxer>
line 384 (fkey "^D" (prinl) (bye))
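The confirmation idea suggested above could be sketched by changing that key definition (an untested sketch; it assumes the 'fkey' mechanism from @lib/led.l as quoted, and that '(key)' reads a single keystroke):

```picolisp
# Ask before exiting on Ctrl-D instead of leaving immediately:
(fkey "^D"
   (prinl)
   (prin "Really exit? (y/N) ")
   (if (= "y" (lowc (key)))
      (bye)              # confirmed: exit as before
      (prinl) ) )        # anything else: stay in the REPL
```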
<beneroth>
two more questions about (create)
<beneroth>
so it can be used without a key 'sym, right ?
<beneroth>
but when used with a key, 'prg should return the records sorted by 'sym ?
<beneroth>
why the sorted requirement?
<Regenaxer>
yes, without key is ok
<Regenaxer>
It is all about sorted inserts
<Regenaxer>
so that disk cache use is optimized
<beneroth>
ok, so not a requirement by (create), but hm.. meaningful
<beneroth>
yeah I understand how that makes sense :)
<Regenaxer>
I don't remember why the first key is important
<beneroth>
though it only makes sense for traditional "primary key" data
<Regenaxer>
not only
<beneroth>
I mean, in picolisp there is no preference of one +Key rel over another +Key rel on the same entity
<Regenaxer>
yes
<Regenaxer>
and +Ref's are fine here too
<Regenaxer>
and other indexes
<beneroth>
of course, if this +Key is mainly queried, then it makes sense to have them in an order on disk where nearby entries are together
<beneroth>
which certainly will be the case in most use cases for (create), as that would be data previously exported from other databases
<Regenaxer>
No, it is not the objects. Only the b-trees matter
<Regenaxer>
The trees are the bottleneck
<Regenaxer>
if the b-tree nodes are accessed at random, it clobbers all disk cache
<Regenaxer>
and starts thrashing
<beneroth>
ok, I thought obj block order on disk could also be better or worse for disk cache
<beneroth>
ah makes sense
<Regenaxer>
Doesnt matter here
<beneroth>
and (create) is writing the tree all over the place, and if it's in order, then the tree blocks will be less fragmented
<Regenaxer>
the objects are written sequentially when created
<beneroth>
this is the point, right?
<Regenaxer>
I would not say fragmented
<beneroth>
I mean in "filesystem fragmentation"
<Regenaxer>
No, file access is ok
<Regenaxer>
it is lseek
<Regenaxer>
it is really the random tree nodes
<beneroth>
ok
<Regenaxer>
you get *all* nodes in memory
<Regenaxer>
if you access random keys
<Regenaxer>
So actually *not* disk cache
<Regenaxer>
it is heap space
<Regenaxer>
the cached node symbols
<beneroth>
so this is about when the 'sym index does not fit into memory
<Regenaxer>
yep
<Regenaxer>
any index
<Regenaxer>
also the ones from lst
<beneroth>
so the disadvantage of unsorted 'sym would not be as bad when the 'sym index fits in memory (while the objects don't, I mean that could be plausible)
<beneroth>
yeah, you just take the biggest index for 'sym
<beneroth>
because that is the most affected
<beneroth>
which is more true for +Key than +Ref
<Regenaxer>
I dont remember well
<Regenaxer>
it concerns all indexes
<beneroth>
well, maybe not so true for +String indices, as the index has multiple entries per obj in many cases
<Regenaxer>
and they are imported one at a time
<beneroth>
aah
<beneroth>
you would do multiple (create) runs, one per indexed property
<beneroth>
yeah makes sense
<Regenaxer>
multiple are no problem either
<beneroth>
after the first using the update mechanism instead of create
<beneroth>
new objs
<beneroth>
still using (create) of course
<Regenaxer>
No, only one (create) run
<Regenaxer>
it processes each index after the other
<Regenaxer>
I think several runs are not intended
<beneroth>
but 'sym is just one of the indexed properties
<Regenaxer>
yes
<Regenaxer>
and lst are the others
<beneroth>
yeah
<beneroth>
what if one of the others is as big/bad as the 'sym one?
<Regenaxer>
for some reason sym is special
<beneroth>
only 'sym would be sorted
<beneroth>
e.g. having two +Key relations on a single entity
<beneroth>
two unique ids, but different numbers, because they come from two different third party systems
<Regenaxer>
I think the reason is only to avoid sorting already sorted data
<Regenaxer>
as you say, after an export
<Regenaxer>
otherwise sym and lst are equivalent
<beneroth>
that makes sense, but that is a different argument from the btree caching point
<Regenaxer>
yes
<beneroth>
so the disadvantage of using 'sym without sorted data is not so bad in many cases
<Regenaxer>
without sym, it sorts all
<beneroth>
not optimal, but not wrong
<beneroth>
ah better no 'sym then
<beneroth>
ok
<Regenaxer>
not wrong, but import may fail
<beneroth>
ok
<beneroth>
so in conclusion
<beneroth>
if the data to be imported happens to be sorted by one property (or several in parallel, not combined), then you can take one of them for 'sym for a slightly quicker/more optimized import
<Regenaxer>
exactly
<beneroth>
if the data is not sorted, or you cannot rely on that, then use (create) without the 'sym argument
<Regenaxer>
yep
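The conclusion can be sketched as follows (mirroring the call form from the example earlier in the log; 'Xs' is placeholder data, and the NIL-for-key variant is an assumption about the argument order):

```picolisp
# Data already sorted by 'a': pass 'a' as the key argument
(create '(+Foo) 'a '(b)
   (and Xs (pop 'Xs)) )

# Order unknown or unreliable: omit the key and let 'create'
# sort all indexes itself
(create '(+Foo) NIL '(a b)
   (and Xs (pop 'Xs)) )
```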
<beneroth>
!!
<beneroth>
thanks
<Regenaxer>
thank you, now I remembered :)
<Regenaxer>
you gave the hint
<Regenaxer>
ie the exported data
<beneroth>
thank you, without you I would never have guessed xD
<Regenaxer>
:)
<beneroth>
so then my point about multiple (create) runs is not completely false
<Regenaxer>
yes, it can re-use the objects
<beneroth>
e.g. for two extremely big +Key properties
<Regenaxer>
fin of lst
<beneroth>
yep
<beneroth>
it's questionable whether multiple runs would be better than one big one
<beneroth>
but depending on data and use case, it might be the case
<Regenaxer>
in osm I have 2 runs too, but because it is two classes
<beneroth>
that is another topic, yeah
<Regenaxer>
yes
<Regenaxer>
create imports one class only
<beneroth>
aye
<Regenaxer>
You can download the osm source iirc
<beneroth>
there might also be use cases where every obj creation actually would mean multiple creations, then maybe (create) is not so good
<beneroth>
e.g. a data schema with heavy data deduplication using many classes
<Regenaxer>
I think no problem
<beneroth>
then the "naive" approach, and many commits and pauses, might be better
<Regenaxer>
create is concerned only about the trees
<beneroth>
T
<beneroth>
aye
<beneroth>
you are right
<Regenaxer>
the "naive" approach may take forever
<Regenaxer>
if the data are large
<beneroth>
right
<beneroth>
if one create run would be bad (because of a very use-case-specific data schema), then multiple (create) runs, each with optimally prepared input files, would be the correct approach
<beneroth>
if a single (create) run cannot be done for some reason
<Regenaxer>
yes, probably no problem
<beneroth>
e.g. because it is not an initialization but an update on a live system you take down temporarily, and you want to reduce the downtime per run (increasing the overall downtime, but minimizing the length of each)
<Regenaxer>
right
<Regenaxer>
Sorry, tel
<beneroth>
all good, I think we're finished here
<beneroth>
thank you!
<beneroth>
much appreciated
<Regenaxer>
:)
aw- has quit [Ping timeout: 240 seconds]
emacsomancer has quit [Read error: Connection reset by peer]
emacsomancer has joined #picolisp
peterhil has quit [Read error: Connection reset by peer]
peterhil` has joined #picolisp
Seteeri has quit [Ping timeout: 272 seconds]
casaca has quit [Quit: leaving]
casaca has joined #picolisp
rob_w has quit [Quit: Leaving]
emacsomancer has quit [Read error: Connection reset by peer]
emacsomancer has joined #picolisp
peterhil` has quit [Quit: Must not waste too much time here...]