Regenaxer, what is the name/where do I find the bulk insert stuff?
single user db, bulk insert
hmm :)
'create' ?
I mixed it up with blk
but that is for reading
yes, different thing
a bit unlucky naming, I think ;-)
Reads raw object data from the cnt'th block in the file open on descriptor fd.
raw pilDB reads from a foreign pilDB file
yes, block(s)
I think "blk" seems much closer to "bulk insert" than "create"
of course "blk" could also stand for bulk read
but 'blk' isn't really bulk read either
T, not at all
How did you arrive at the name 'blk'? Not via "bulk"?
The internal vars are all like BlkXxxx
I don't argue for change
yes, no problem
BlkXxxx ?
cnt'th block
(BlkIndex) etc
in src64/db.l
it is all about "Blk"s :)
so the etymology of 'blk' is indeed 'block', not "bulk"
the database block
yep, as the ref says, cnt'th block
yeah, obviously xD
it is an advanced function anyway
for the advanced user I mean
I rest my case: the names 'create' and 'blk' are a bit unlucky
but not wrong
yes, not so many short names left
well, I recommend adding vowels to your alphabet
I see how and why you got your naming habits
ok :)
but in my projects so far I strive for more descriptive names, for maintainability, i.e. for the developer who has to read the code in 2-3 years
(even if it is me)
(but not exclusively, if my plans work out)
slightly more overhead, yes, but in most cases negligible I believe
though I fully understand your principle, I think you do it as a habit, so that you never have to adapt it, regardless of whether you are doing an embedded project or a system on a big, powerful machine
It is not about efficiency, but readability
I hate verbose code
Sorry, I'm in a tel conf
can't concentrate
I agree with you in spirit, but I'm a bit less pure on the dense <-> verbose scale than you :)
np :)
Regenaxer, when you are back I would have a little question about (create)
Would it be idiomatic to set the scl for the reader (and the entire env) explicitly like so: (load "-scl 16" "file.l")
hi Seteeri
I think there is *Scl
and I would do the load more like (scl 16 (load "file.l"))
then *Scl is 16 during load, and will be reset to its previous value afterwards
so basically the same as (let *Scl 16 (load "file.l"))
you can also use (scl 16) to set *Scl globally
same as (setq *Scl 16), I guess
so (scl) is just syntactic sugar for setting or temporarily binding *Scl
Seteeri, (load "-scl 16") will try to find a file named "-scl 16"
(load) is not starting up a separate pil process or the like :-)
setting *Scl globally (for the whole lifetime of your picolisp process and its children) is absolutely idiomatic :-)
yes, setting in load may be fine
Usually I set it in the main file
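A minimal sketch of the two forms of scl discussed above, assuming the usual behavior that the reader consults *Scl when it parses decimal literals. str is used only to make that read-time effect visible, since the string is parsed at run time inside the binding.

```picolisp
# Start from a known global scale.
(setq *Scl 0)

# With a body, scl binds *Scl only while the body runs
# (like (let *Scl 4 ...)) and returns the body's result.
# "1.5" is parsed inside the binding, so it is read with scale 4.
(println (scl 4 (str "1.5")))   # (15000)
(println *Scl)                  # 0, the previous value is restored

# Without a body, scl sets *Scl globally, like (setq *Scl 4).
(scl 4)
(println (str "1.5"))           # (15000)
```

The same mechanism is why (scl 16 (load "file.l")) works: the file is read while *Scl is 16.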
@beneroth (load "lib.l" "-* 1 2 3") outputs 6, but it should work regardless of the order?
well, it just returns 6, not printing
the repl prints it
(load "lib.l" "-* 1 2 3") I mean
I'm sorry Seteeri, I was not aware that load can interpret -arguments
The docs say that if an argument is a symbol whose first character is a hyphen, the rest is executed as a list
it is as on the command line
no worries
* beneroth is flabbergasted
ok, didn't know, I believed that's only true for invocation of the picolisp binary
I call -'symbols ...' in some cases
for debugging
I see
it's a lot more convenient than having an entire file just to execute a simple expression :)
beneroth, right, but the command line is just 'load'ed
ah, because the arguments are still evaluated in the context of the loaded file, Regenaxer ?
so (file) would be the loaded file :)
Regenaxer, I see. makes sense.
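A small illustration of the point above that command-line arguments are just 'load'ed: an argument starting with "-" is not treated as a file name, its remainder is read as a list and evaluated. The expressions are only examples.

```picolisp
# "-* 1 2 3" is evaluated as (* 1 2 3), so load returns 6.
(println (load "-* 1 2 3"))   # 6

# Same idea with a side effect: evaluates (println 'hello).
(load "-println 'hello")      # prints hello
```

On the command line, pil -'println 123' uses exactly this path, which is what makes it handy for quick debugging.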
Regenaxer, I'm just shocked to learn something new about a piece of picolisp I believed to know :O
hehe :)
ok, I go update my *Humility
I have main.l, which calls 'load recursively, so I figured instead of wrapping every 'load in scl, I'd just set it once at the beginning
Regenaxer, if you have time, a question about (create): it prefers single-user mode
Regenaxer, what about doing a (dbSync) (commit 'upd) around it, what is the problem besides the problems natural with a long-running transaction?
Yes, but I'm with BTG the next hours it seems
we can also discuss later :)
yes, thanks
or I look at the implementation and waste less of your time :)
Sorry :) Need to concentrate here
it is absolutely alright
A pause at BTG at the moment ... 'create' does repeated commits for the big accumulated sets of inserts, so it cannot be synced
Even if, other processes will be blocked for long times
I see
so after initial imports, I would plan the following steps:
This is also why the name "create" fits
1) preprocess the data to be imported, maybe even by turning it into a plio file (no interaction with live application required)
It is to build a new DB
2) put the live application into maintenance mode, start it in single user, import the prepared file using (create), go back to normal operation
that sounds feasible?
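Step 1 of the plan above could look roughly like this. It is only a sketch: the file name "import.plio", the record layout and the helper names exportPlio/readPlio are hypothetical; pr and rd are PicoLisp's PLIO write and read functions.

```picolisp
# Step 1 (offline, no live application involved):
# write each record as one PLIO item.
(de exportPlio (File Records)
   (out File
      (for R Records
         (pr R) ) ) )

# Reading the items back: (rd) returns NIL at end of file,
# which matches how a 'create' body signals "no more records".
(de readPlio (File)
   (in File
      (make
         (while (rd)
            (link @) ) ) ) )

(exportPlio "import.plio" '((1 "one") (2 "two")))
(println (readPlio "import.plio"))   # ((1 "one") (2 "two"))

# Step 2 (assumption, not verified): in single-user mode, with a
# (dbs) declaration in place and a hypothetical +Item class,
# the import could read the same file item by item:
# (in "import.plio" (create '(+Item) 'nr '(nm) (rd)))
```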
I did this with the OSM DB
yeah, similar use case
I have large amounts of data (several tens of thousands of records) coming from another (non-pil) database
after that I never used it again ;)
ok, though several 10k records is not big, so create may be overkill
I like to make periodically a new export from the other system (= complete data, not differential), and import it with the steps above into my pil application
I'm aware, I worked with huge databases, but not with pil :)
and I like to optimize
so plan is to have some standard functionality/library to do the steps above
If the db fits mostly into memory, a naive import may be the same speed
or faster
as no temp files are managed
I see
I agree with naming it (create) :D
Regenaxer, in case you have time
(class +Foo +Entity)
(rel a (+Key +Number))
(rel b (+String))
(pool "create.db")
(trace 'put> '+Foo)
(trace 'isa)
(let "XS" '((1 "one") (2 "two"))
   (create '(+Foo) 'a '(b)
      (and "XS" (pop '"XS")) ) )
: (show '{3})
{3} (NIL (1 NIL) (2 NIL))
-> {3}
Entity counter is correct, but index tree misses the external symbols
I don't see the mistake
Will take a look soon
I can wait, your other stuff takes priority
thank you
no hurry :)
I think you need 'dbs'
'create' does (new (meta Typ 'Dbf 1) Typ)
So it assumes it is not a single-file DB
ha, I thought about that and tried so
let me try again, a moment
other functions do (or (meta Typ 'Dbf 1) 1)
defaulting to 1
But 'create' is useful only in big DBs
yeah (dbs) solved it
cool! :)
I tested it before, but apparently I didn't do that properly
the obj is created using (new (meta Typ 'Dbf 1) Typ), but the index update only uses (meta Typ Var 'dbf)
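For reference, a version of the snippet above with the missing (dbs) declaration added. This is a sketch under assumptions: the block size factors, the assignment of objects to file 1 and the 'a' index to file 2, and the directory name "create.db/" are arbitrary choices, and the trace calls are left out.

```picolisp
(class +Foo +Entity)
(rel a (+Key +Number))
(rel b (+String))

# Distribute the DB over several files. Each entry is
# (<block size factor> <class or (class rel)> ...).
# Declaring both sets (meta '+Foo 'Dbf) for the objects and the
# 'dbf' of the 'a' relation for its index tree, which 'create' expects.
(dbs
   (1 +Foo)
   (2 (+Foo a)) )

(call "mkdir" "-p" "create.db/")   # hypothetical DB directory
(pool "create.db/" *Dbs)           # one file per *Dbs entry

# Single-user import, same body as in the original snippet.
(let "XS" '((1 "one") (2 "two"))
   (create '(+Foo) 'a '(b)
      (and "XS" (pop '"XS")) ) )
```

dbs is called after the rel definitions, since it attaches the file numbers to the already defined class and relation.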
many thanks Regenaxer
should have found it myself
no problem
maybe make the implementation consistent?
Good that I looked at it again. Almost forgot we have such a powerful function
this will become very handy for my work, eventually
I'm gradually taking on bigger and bigger projects :)
I suggest the reference help entry of (create) should at least have a note that (dbs) is required
it absolutely makes sense etc., but should be mentioned, for completeness
the ref
the implementation of (create) is good as it is. Not quite consistent (requiring dbs for indices, but not for objects), but I think this is alright
can always use index rebuild, and at least nothing is lost this way
Why rebuild here?
when adding dbs later?
you are absolutely right
I cannot read code
* beneroth blames lack of sleep
I thought the object creation in (create) would not require (dbs)
well no
it doesn't
or is it because Dbf is set by default?
The idea is that the db model is optimized as far as possible, then create is used
* beneroth talks about (new (meta Typ 'Dbf 1) Typ)
by default NIL iirc
or is it because (new) defaults
you are right
all perfect, as usual
ok :)
picolisp problems are nearly always just stupid users :)
so here too :)
not stupid
it is all very terse
but still :)
picolisp stupid is still pretty clever :P
Not for the faint at heart ;)
thank you for your time & patience :)
aye, and that is how we like it!
Always welcome!
programmers who prefer to be pampered can stay with java/c#/python
T :)
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
welcome back :)
Ctrl-D again
maybe you should catch it and ask for confirmation
if possible
might also turn bad sometimes
It uses the line editor in @lib/led.l
ask for confirmation, and if not cancelled, proceed after 15 seconds
iirc it is just a key definition in line edit
so not equal to the usual Ctrl+D logout shortcut
for shells etc
Just same meaning
line 384 (fkey "^D" (prinl) (bye))
two more questions about (create)
so it can be used without a key 'sym, right ?
but when used with a key, 'prg should return the records sorted by 'sym ?
why the sorted requirement?
yes, without key is ok
It is all about sorted inserts
so that disk cache use is optimized
ok, so not a requirement by (create), but hm.. meaningful
yeah I understand how that makes sense :)
I don't remember why the first key is important
though it only makes sense for traditional "primary key" data
not only
I mean, in picolisp no +Key rel is preferred over another +Key rel on the same entity
and +Ref's are fine here too
and other indexes
of course, if this +Key is mainly queried, then it makes sense to store the entries in an order on disk where nearby entries are close together
which certainly will be the case in most use cases for (create), as that would be data previously exported from other databases
No, it is not the objects. Only the b-trees matter
The trees are the bottleneck
if the b-tree nodes are accessed at random, it clobbers all disk cache
and starts thrashing
ok, I thought the order of obj blocks on disk could also matter for the disk cache
ah makes sense
Doesnt matter here
and (create) is writing the tree all over the place, and if it's in order, then the tree blocks will be less fragmented
the objects are written sequentially when created
this is the point, right?
I would not say fragmented
I mean in "filesystem fragmentation"
No, file access is ok
it is lseek
it is really the random tree nodes
you get *all* nodes in memory
if you access random keys
So actually *not* disk cache
it is heap space
the cached node symbols
so this is about when the 'sym index does not fit into memory
any index
also the ones from lst
so the disadvantage of an unsorted 'sym would not be as bad when the 'sym index fits in memory (even if the objects don't; that seems plausible)
yeah, you just take the biggest index for 'sym
because that is the most affected
which is more true for +Key than +Ref
I dont remember well
it concerns all indexes
well, maybe not so true for +String indices, as the index has multiple entries per obj in many cases
and they are imported one at a time
you would do multiple (create) runs, one per indexed property
yeah makes sense
multiple are no problem either
after the first using the update mechanism instead of create
new objs
still using (create) of course
No, only one (create) run
it processes each index after the other
I think several runs are not intended
but 'sym is just one of the indexed properties
and lst are the others
what if one of the others is as big/bad as the 'sym one?
for some reason sym is special
only 'sym would be sorted
e.g. having two +Key relations on a single entity
two unique ids, but different numbers, because they come from two different third party systems
I think the reason is only to avoid sorting already sorted data
as you say, after an export
otherwise sym and lst are equivalent
that makes sense, but is another argument than the btree caching point
so the disadvantage of using 'sym with unsorted data is not so bad in many cases
without sym, it sorts all
not optimal, but not wrong
ah better no 'sym then
not wrong, but import may fail
so in conclusion
if the data to be imported happens to be sorted by one (or multiple in parallel, not combined) properties, then you can take one of them for 'sym for a bit quicker/more optimized import
if the data is not sorted, or you cannot reliably count on it, then use (create) without the 'sym argument
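The conclusion above depends on knowing whether the input really is sorted by the candidate 'sym field. A small check for an in-memory list of records might look like this; sortedBy? is a hypothetical helper, and for a large export you would rather check while streaming it.

```picolisp
# Hypothetical helper: T if the records in Lst are in non-decreasing
# order of their N'th field (the field you would pass as 'sym).
(de sortedBy? (N Lst)
   (or
      (not (cdr Lst))   # zero or one record: trivially sorted
      (apply <= (mapcar '((R) (car (nth R N))) Lst)) ) )

(println (sortedBy? 1 '((1 "one") (2 "two"))))   # T
(println (sortedBy? 1 '((2 "two") (1 "one"))))   # NIL
```

If it returns NIL, the safe choice per the discussion is to leave 'sym out and let create sort everything itself.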
thank you, now I remembered :)
you gave the hint
ie the exported data
thank you, without you I would never have guessed xD
so but then my point about multiple 'create runs is not completely false
yes, it can re-use the objects
e.g. for two extremely big +Key properties
fin of lst
it's questionable whether multiple runs would be better than one big one
but depending on data and use case, it might be the case
in osm I have 2 runs too, but because it is two classes
that is another topic, yeah
create imports one class only
You can download the osm source iirc
there might also be use cases where every obj creation actually would mean multiple creations, then maybe (create) is not so good
e.g. a data schema with heavy data deduplication across many classes
I think no problem
then the "naive" approach, and many commits and pauses, might be better
create is concerned only about the trees
you are right
the "naive" approach may take forever
if the data are large
if one create would be bad (because of a very use-case-specific data schema), then multiple (create) runs, each with an optimally prepared input file, would be the correct approach
if a single (create) run cannot be done for some reason
yes, probably no problem
e.g. because it is not an initialization, but update on a live system you take temporarily down for it, and you want to reduce this downtime (increasing the overall one, but minimizing the length per downtime)
Sorry, tel
all good, I think we're finished here
thank you!
much appreciated