<freemint>
So calling dbs is not an optimization but a necessity for the pilDB to work?
<Regenaxer>
It is not necessary, but then you have a single file with factor 2
<Regenaxer>
(like (dbs (2)))
<Regenaxer>
'dbs' modifies the relations, and sets *Dbs
<Regenaxer>
which is then passed to 'pool'
<rick42>
hello peeps (don't let me interrupt you though :)
<Regenaxer>
So it is for tuning the DB
<freemint>
ok and after the byte which contains 2 starts the first block of a size of (<< 64 2)
<Regenaxer>
Hi rick42!
<rick42>
Hi Regenaxer and freemint!
<freemint>
Hi rick42
<freemint>
Hi alexshendi
<Regenaxer>
No, all blocks are same size, also the first
<Regenaxer>
So the first data block starts at 256
<Regenaxer>
if factor is 2
<rick42>
ah alexshendi! hello!
<freemint>
or 64 if the factor is 0?
<Regenaxer>
yep
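A minimal Python model of the rule just confirmed: every block, including block 0, is `64 << factor` bytes, so the first data block starts at that same offset. The helper names are hypothetical, and this models only the arithmetic, not PicoLisp's own code.

```python
# Hypothetical helpers modelling the block-size rule from the discussion.
MIN_BLOCK = 64  # smallest possible block size in bytes (factor 0)


def block_size(factor: int) -> int:
    """Every block, block 0 included, is MIN_BLOCK shifted left by the factor."""
    return MIN_BLOCK << factor


def first_data_block_offset(factor: int) -> int:
    """Block 0 holds the header but is full-size, so block 1 starts here."""
    return block_size(factor)


for factor in range(7):
    print(factor, block_size(factor), first_data_block_offset(factor))
```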
<Regenaxer>
This factor is *used* only upon file creation
<Regenaxer>
Later *Dbs is ignored
<Regenaxer>
and the factor in the root block is used
<Regenaxer>
After that, *Dbs is used only to know how many files to open
<freemint>
so the factor in the root block says how to read the file regardless of *Dbs
<Regenaxer>
the sizes are taken from the files
<Regenaxer>
yes
<rick42>
metadata nice
<Regenaxer>
The size cannot be changed
<Regenaxer>
after the file was created
<freemint>
(not without rebuilding the file or not at all?)
<Regenaxer>
You need to 'dump' the objects, delete the file, and import
<Regenaxer>
May be impossible if the new block size is smaller
<Regenaxer>
well, no, possible
<Regenaxer>
but leads to more fragmentation
<freemint>
I just followed the same train of thought
<Regenaxer>
@lib/db32-64.l in fact does something like that
<Regenaxer>
I used it to port old 64-only DBs to the new system
<Regenaxer>
Single-file DBs with 64 blocks
<Regenaxer>
64-byte blocks
<Regenaxer>
oh, no
<Regenaxer>
this is porting 32-bit DB to pil64
<Regenaxer>
But similar
<freemint>
what was the reason to allow flexible block sizes?
<Regenaxer>
Efficiency
<Regenaxer>
Large objects may take too many blocks
<Regenaxer>
possibly fragmented
<freemint>
have you seen a lot of fragmentation?
<Regenaxer>
I never measured
<Regenaxer>
But it surely will
<freemint>
have you measured performance improvements?
<Regenaxer>
Not exactly
<Regenaxer>
Imagine a B-tree node of 4 KiB
<Regenaxer>
or 1 K
<freemint>
i understand the theory
<Regenaxer>
it would be 16 blocks
<Regenaxer>
possibly spread over the full file size
<Regenaxer>
means 16 seeks and 16 reads
<Regenaxer>
more for larger symbols
<Regenaxer>
so a flexible size is a *must*
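The seek argument can be made concrete by counting how many blocks one serialized object occupies for each factor. This Python sketch assumes each block starts with a 6-byte link field (the same 6-byte field size discussed for Free and Next), so the usable payload is slightly less than the block size; that overhead is an assumption of the model.

```python
# Hypothetical model: how many blocks does one serialized object occupy?
MIN_BLOCK = 64   # smallest block size in bytes
LINK_BYTES = 6   # assumption: each block begins with a 6-byte link field


def blocks_needed(obj_bytes: int, factor: int) -> int:
    payload = (MIN_BLOCK << factor) - LINK_BYTES  # usable bytes per block
    return max(1, (obj_bytes + payload - 1) // payload)  # ceiling division


for factor in (0, 2, 4, 6):
    # Each extra block is potentially one more seek plus one more read.
    print(factor, blocks_needed(1024, factor), blocks_needed(4096, factor))
```

With 64-byte blocks a 1 KiB node comes out at 18 blocks rather than the round 16 mentioned above, because of the assumed link overhead.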
<freemint>
It is just that when i do stuff like that you cry "premature optimization" ;)
<Regenaxer>
no, it is different
<Regenaxer>
I know the limits
<Regenaxer>
used single file DB for 10 years
<freemint>
hey, that was a joke
<Regenaxer>
and hit limits for huge DBs
<Regenaxer>
yeah :)
<freemint>
anyway. I now have a better understanding of how the DB works at the header level.
<Regenaxer>
So it was not premature, but too slow for the Smapper projects
<Regenaxer>
ok
<freemint>
also, it cleared up my confusion about *Dbs and why it is not set on import
<Regenaxer>
Also very important is to have critical indexes in their private file
<Regenaxer>
Which import do you mean?
<freemint>
on pool
<freemint>
Can i try to summarize?
<Regenaxer>
So "open"
<Regenaxer>
yes
<freemint>
Oh, a question came up: what are NEXT and FREE good for? I think Next is the offset of the root cell
<Regenaxer>
Next is the next free block, ie. the end of the file
<Regenaxer>
Free is the start of the free list
<Regenaxer>
ie deleted blocks
<Regenaxer>
Next is redundant for normal files, but you can also use a /dev/ device directly (without a filesystem), and then there is no EOF
<freemint>
So Next is roughly the size of the file (number of blocks) and Free is a pointer to the next free cell
<Regenaxer>
yes, to the *first* free cell perhaps
<Regenaxer>
a linked list of free blocks
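The Free/Next pair behaves like a classic free-list allocator. The Python sketch below is a toy model under stated assumptions (block numbers instead of shifted pointers, Free == 0 meaning "empty", deleted blocks reused before the file grows); it is not taken from PicoLisp's source.

```python
class BlockAllocator:
    """Toy model of the header's Free and Next fields, in block numbers.

    Assumptions (not taken from PicoLisp's source): block 0 is the header,
    Free == 0 means the free list is empty, a deleted block's link points
    to the previous list head, and allocation reuses the free list before
    growing the file at Next.
    """

    def __init__(self):
        self.free = 0    # head of the free ("avail") list; 0 = empty
        self.next = 1    # block count; block 0 is already the header
        self.links = {}  # free block -> next free block

    def alloc(self) -> int:
        if self.free:                       # reuse a deleted block first
            blk = self.free
            self.free = self.links.pop(blk)
            return blk
        blk = self.next                     # otherwise extend the file
        self.next += 1
        return blk

    def release(self, blk: int) -> None:
        self.links[blk] = self.free         # push onto the free list
        self.free = blk


alloc = BlockAllocator()
print([alloc.alloc() for _ in range(3)])   # grows the file: blocks 1, 2, 3
alloc.release(2)
alloc.release(1)
print(alloc.alloc(), alloc.alloc(), alloc.alloc(), alloc.next)
```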
<freemint>
what to the first free cell perhaps?
<Regenaxer>
How do you mean that?
<Regenaxer>
the "what"
<freemint>
Free is the index of the first free cell or Next is the index of the first free cell?
<Regenaxer>
blocks, not cells here
<Regenaxer>
Free is the avail list
<freemint>
s/cells/blocks
<Regenaxer>
Next is size of the file divided by blocksize
<freemint>
thanks
<freemint>
that makes perfect sense when explained that way
<Regenaxer>
all these "pointers" must be shifted by the scale factor
<freemint>
i am getting something wrong or can a file only contain 2^5 blocks?
<Regenaxer>
no
<Regenaxer>
2**42 (4 Tera) Blocks per file
<freemint>
2^(6*8-1)
<Regenaxer>
2**16 Files -> 256 Peta objects
<Regenaxer>
T
<freemint>
i think it is only 2^41
<Regenaxer>
It is 42
<Regenaxer>
6 * 7
<freemint>
i made a mistake
<freemint>
question why only 6*(8-1) and not (6*8)-1 ?
<Regenaxer>
It is 6 bytes (48), but 6 bits reserved
<freemint>
ahh
<freemint>
i thought only the LSB was always 0
longshi has quit [Ping timeout: 252 seconds]
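The limits quoted above follow from a few constants: a 6-byte field gives 48 bits, 6 of them are reserved, and up to 2^16 files are allowed. A short Python check of that arithmetic (the 2^58 figure is an upper bound that assumes every object fits in one block):

```python
# Address-space arithmetic from the discussion (constants taken from the chat).
LINK_BYTES = 6      # Free, Next and block links are 6-byte fields
RESERVED_BITS = 6   # lowest 6 bits carry block markers, not address bits
FILE_BITS = 16      # up to 2**16 database files

block_bits = LINK_BYTES * 8 - RESERVED_BITS        # 48 - 6 = 42
blocks_per_file = 2 ** block_bits                   # "4 Tera" blocks
max_objects = blocks_per_file * 2 ** FILE_BITS      # 2**58

print(block_bits, blocks_per_file, max_objects)
print(max_objects == 256 * 2 ** 50)  # 256 "Peta" (binary prefixes)
```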
<freemint>
         +-------------+-+-------------+-+----+
<freemint>
Block 0: |   Free       0|   Next       0| << |
<freemint>
         +-------------+-+-------------+-+----+
<freemint>
suggests that only the last bit is reserved in Free
<Regenaxer>
I don't remember now
<freemint>
well, that limit is unlikely to be hit.
<Regenaxer>
The lowest 6 bits in a pointer are reserved
<Regenaxer>
marker for first block, and following
<Regenaxer>
eg see "ID-Block:"
<Regenaxer>
It results from the unshifted minimum of 64 bytes
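One way to read "the lowest 6 bits are reserved" plus "all these pointers must be shifted by the scale factor": a link holds the block number above 6 marker bits, and clearing those bits and shifting by the factor yields a byte offset. This Python sketch is an interpretation of the chat, not PicoLisp's implementation; the helper names are hypothetical.

```python
# Hypothetical model of a 6-byte block link: block number above 6 marker bits.
MARK_BITS = 6
MARK_MASK = (1 << MARK_BITS) - 1  # 0x3F, the reserved low bits


def make_link(block: int, mark: int) -> int:
    if not 0 <= mark <= MARK_MASK:
        raise ValueError("marker must fit in 6 bits")
    return (block << MARK_BITS) | mark


def split_link(link: int) -> tuple[int, int]:
    """Return (block number, marker)."""
    return link >> MARK_BITS, link & MARK_MASK


def byte_offset(link: int, factor: int) -> int:
    """Clear the marker bits (leaving block * 64), then apply the scale factor."""
    return (link & ~MARK_MASK) << factor


link = make_link(3, 1)          # block 3, marked as first block of a symbol
print(split_link(link), byte_offset(link, 2))
```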
<freemint>
summary time
<freemint>
A PicoLisp DB file contains a header inside its first block; since the header is always smaller than 64 bytes (the smallest possible block size), there is no problem.
<Regenaxer>
right
<freemint>
The header contains several flags, the offset of the first free block (which in turn points to the next free block, creating a list of blocks to recycle, which may be fragmented), and the current block count of the allocated space for the database (stored in Next).
<freemint>
and the block size.
<Regenaxer>
correct
<Regenaxer>
"block count" is more clear than "next"
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
<Regenaxer>
oops :)
<freemint>
The layout of the header is as follows: Free spans 6 bytes, but 6 bits of that are reserved, giving a pilDB file a total address space of 2^42 blocks.
<Regenaxer>
right
<Regenaxer>
the 'n' in
<Regenaxer>
EXT-Block: | Link n| Data
<Regenaxer>
is max 63
<Regenaxer>
so if a symbol has more blocks, they all have 63
<Regenaxer>
The 'n' is used only by 'dbck'
<Regenaxer>
consistency check
<freemint>
The block count (referred to as NEXT in the documentation) is also 6 bytes with 6 bits reserved for flags. Again 2^42 blocks
<Regenaxer>
If it is 0, it is used in several places
<Regenaxer>
yes
<Regenaxer>
0 is used by 'seq' to find the next ID block
<Regenaxer>
Skipping EXT-Blocks
<Regenaxer>
So it mostly checks for zero or non-zero
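The marker 'n' can be modelled as a counter along one symbol's block chain: 1 for the ID block, then 2, 3, ... saturating at 63. This Python sketch also shows a scan that picks out ID blocks while skipping EXT blocks and free blocks (assumed to carry 0); it is a rough stand-in for what 'seq' needs, not its implementation.

```python
# Hypothetical model of the per-block marker 'n' discussed above.
MAX_MARK = 63  # the 6 reserved bits can hold at most 63


def chain_marks(n_blocks: int) -> list[int]:
    """Marker for each block of one symbol: ID block 1, then 2, 3, ... capped at 63."""
    return [min(i + 1, MAX_MARK) for i in range(n_blocks)]


def id_blocks(marks: list[int]) -> list[int]:
    """Block numbers (starting at 1, after header block 0) that start a symbol.

    EXT blocks (2..63) and free blocks (assumed 0) are skipped.
    """
    return [blk for blk, mark in enumerate(marks, start=1) if mark == 1]


print(chain_marks(5), chain_marks(70)[-3:])
print(id_blocks([1, 2, 0, 1, 63]))
```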
<freemint>
The next byte (byte 12 when counting from 0) encodes the block size. It is used as a shift factor which shifts the smallest possible block size (64 bytes) to the left. If a value of 2 (the default) is picked, the block size is 256 bytes. It is rare to see values larger than 7.
<Regenaxer>
Correct
<Regenaxer>
even 6 is seldom
<Regenaxer>
I never used 7
<Regenaxer>
but who knows?
<freemint>
Is there any information in the header i did not document (other than the flags?)
<Regenaxer>
No, that's all
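Putting the summarized header together, a parser for the 13 header bytes might look like this Python sketch. The little-endian byte order and the "block number above 6 marker bits" encoding of Free and Next are assumptions of the model, not confirmed in this log.

```python
# Hypothetical parser for the 13-byte header at the start of block 0.
# Assumptions: the 6-byte fields are little-endian and hold block * 64
# (block number above the 6 reserved marker bits).

def parse_header(block0: bytes) -> dict:
    free_link = int.from_bytes(block0[0:6], "little")
    next_link = int.from_bytes(block0[6:12], "little")
    factor = block0[12]                      # byte 12: block-size scale factor
    size = 64 << factor
    block_count = next_link >> 6             # "Next": blocks in the file
    return {
        "factor": factor,
        "block_size": size,
        "free_block": free_link >> 6,        # head of the free list, 0 = empty
        "block_count": block_count,
        "file_size": block_count * size,
    }


# Example: empty free list, 4 blocks, default factor 2
header = (0).to_bytes(6, "little") + (4 << 6).to_bytes(6, "little") + bytes([2])
print(parse_header(header))
```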
<freemint>
with 7 you get blocks bigger than most hard disk sectors
<Regenaxer>
the sectors are pretty irrelevant, not even known
<Regenaxer>
it reads full tracks
<Regenaxer>
and caches them
<Regenaxer>
Logical sector size is 512 still (?)
<Regenaxer>
Unix blocksize is 8192 usually
<Regenaxer>
buffer size used by stdio etc.
<freemint>
But these disk sectors are only relevant to performance and have only a minor impact
<Regenaxer>
Not even relevant to performance any more I suspect
<Regenaxer>
The sectors are completely transparent as I see it
<freemint>
As opposed to the block size, which can result in a lot of jumping around when too small (hurting performance a lot), or in a lot of uselessly transferred zeros, increased file size and more RAM use
<freemint>
when too big
<Regenaxer>
Yep
<freemint>
Since you want to store records of different sizes in a database, and these are stored in different files, I am curious how pilDB refers to objects in other databases. I suspect 'ext' is involved
<Regenaxer>
'ext' is for other DBs (not the one opened by 'pool')
<freemint>
ohh
<freemint>
A PicoLisp DB may be either a file with a single block size, or a folder of different DB files (with differing or identical block sizes) and a folder for blobs (binary objects not stored in the DB).
<Regenaxer>
correct
<freemint>
So how can i refer from a large object in file A to a small one in B?
<Regenaxer>
The size does not matter. It is simply in the data
<Regenaxer>
Typically a +Link or +Joint
<Regenaxer>
An external symbol is a first class object
<freemint>
My problem is that with the offset we can only point to a location in the current file.
<Regenaxer>
no, it encodes both file AND block
<Regenaxer>
{A7}
<freemint>
ahh so refering to other databases does not happen on the storage level but at the content level.
<Regenaxer>
A = 1 (hax notation), so file 1 (starting with '@' = zero)
<Regenaxer>
No idea about storage or content. The symbol itself encodes its location
<freemint>
you are too quick for me here. Now that we have blocks, how do we store information in them?
<Regenaxer>
Using PLIO
<Regenaxer>
Moment, brb
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
<Regenaxer>
ret
<Regenaxer>
The blocks are only used to store the PLIO
<freemint>
That is the same PLIO format used by 'rd' and 'pr', which is referred to as "encoded binary format" in the docs?
<Regenaxer>
The point is how the symbols interact in the heap
<Regenaxer>
yes
<Regenaxer>
So the blocks in the files are only used to fetch the data, and write back modifications (persistence)
<freemint>
"The point is how the symbols interact in the heap" Do you mean that there is no magic translation layer? Just that you jump to a certain offset, 'rd' the pilIO from there, put it into the address space (heap) of the PicoLisp program, and the PicoLisp program is responsible for what it makes of it?
<Regenaxer>
The program logic itself uses normal symbols
<Regenaxer>
Even simpler
<freemint>
oh tell me how it can be even simpler?
<Regenaxer>
The "certain offset" is used only to fetch the data
<freemint>
Yes
<Regenaxer>
after that you have *normal* symbols
<Regenaxer>
no idea of file and block
<freemint>
or a number?
<freemint>
or a cell?
<Regenaxer>
only symbols
<freemint>
ok
<Regenaxer>
the val or prop of such a symbol can contain anything
<Regenaxer>
numbers, lists, other (also external) syms
<freemint>
except nil as key in the property list ;)
<Regenaxer>
right
<Regenaxer>
And *not* certain graph structures ;)
<freemint>
/me tries to look like he is not the perpetrator
<Regenaxer>
hehe, no
<freemint>
So the pilIO in a database is only symbols "containing" lists, numbers and other symbols?
<Regenaxer>
right
<Regenaxer>
It stores a single value and one property list in a single block(-list)
<freemint>
as symbols do in picolisp (except NIL, T)
<Regenaxer>
yes
<Regenaxer>
NIL and T *may* have properties, just the value is protected
<freemint>
I played around with 'pr and used 'hd to inspect the resulting file. I tried to serialize a symbol with 'pr but it did not work out can you take a look?
<freemint>
: (setq Sym 'Value)
<freemint>
-> Value
: (out "sym" (pr Sym))
-> Value
<Regenaxer>
'pr' does not serialize a complete symbol
<freemint>
: (hd "sym")
<Regenaxer>
only prints an expression
<freemint>
Ahhh that explains my problem
<freemint>
how do I serialize a symbol then?
<freemint>
so pr is only for lists and numbers?
<Regenaxer>
you could do (out "sym" (pr (val Sym) (getl Sym)))
<Regenaxer>
no, also internal, external, transient symbols
<freemint>
so when are two symbols the same, the name does not cut it since there are symbols which have no name
<freemint>
i have played around and i noticed that there are two different "sames"
<Regenaxer>
yes, '=' and '==' ?
<freemint>
some things are = but not ==
<Regenaxer>
yes
<Regenaxer>
'==' is exactly the same item
<Regenaxer>
ie the same address
<Regenaxer>
"pointer equality"
<freemint>
so they have the same reference
<Regenaxer>
T
longshi has joined #picolisp
<Regenaxer>
Comparison with '==' is fast
<Regenaxer>
'=' needs to traverse the structures
<Regenaxer>
(name characters, or list elements)
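As a rough analogy only (Python, not PicoLisp semantics): Python's `==` compares structure like PicoLisp '=', while `is` checks for the very same object, like the pointer equality of PicoLisp '=='.

```python
# Python analogy only: '==' compares structure like PicoLisp '=',
# 'is' compares identity (same address) like PicoLisp '=='.
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b, a is b)  # True False: equal contents, different objects
print(a == c, a is c)  # True True: the very same object
```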
<freemint>
Addresses point at cells
<freemint>
?
orivej has quit [Ping timeout: 245 seconds]
<Regenaxer>
yes
<Regenaxer>
they point to the first, 4th or 8th byte of a cell (in pil64)
<Regenaxer>
first is pair, 4th is bignum and 8th is symbol
<freemint>
Why that?
<freemint>
ah
<Regenaxer>
so for a symbol it points to the value
<Regenaxer>
(car Sym) is the same as (val Sym)
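The tag-in-the-pointer idea can be sketched in Python: the offset within a cell (0, 4 or 8, as Regenaxer lists them) tells the observer what the cell means. The 16-byte cell size (two 8-byte words in pil64) and the function name are assumptions of this model. For a symbol, the pointer lands on the second word, which holds the value, so reading the "car" through that pointer reads the value.

```python
# Hypothetical model of pil64-style tagged pointers into 16-byte cells.
CELL = 16  # two 8-byte words per cell
KINDS = {0: "pair", 4: "bignum", 8: "symbol"}  # byte offset within the cell


def classify(ptr: int) -> tuple[str, int]:
    """Return (kind, cell address) for a pointer into the heap."""
    offset = ptr & (CELL - 1)
    if offset not in KINDS:
        raise ValueError(f"not a cell pointer: {ptr:#x}")
    return KINDS[offset], ptr & ~(CELL - 1)


for p in (0x1000, 0x1004, 0x1008):
    print(hex(p), classify(p))
```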
<freemint>
Is it possible to have two "different" addresses point to the same cell? So that one address says you find a number and the other says you find a list?
<Regenaxer>
It would never be the same address, because the tag bits modify it
<Regenaxer>
Would be possible if the tag systematics would be different
<Regenaxer>
(outside the actual pointer)
<freemint>
so having two different addresses for a single cell only causes chaos?
<Regenaxer>
Not chaos, you can do that with 'adr'
<Regenaxer>
Lets say it is a bit surprising
<freemint>
is there any use in it?
<Regenaxer>
and may crash easily
<Regenaxer>
Some debugging
<Regenaxer>
or brute force poking in the heap
<freemint>
let me try to summarize about address and cells
<Regenaxer>
good
<freemint>
PicoLisp has a heap made out of cells where it stores all symbols, numbers and lists?
<Regenaxer>
correct
<freemint>
cells themselves are not aware whether they are a number or a symbol; you need an address containing the type to get meaning out of a cell
<Regenaxer>
exactly
<Regenaxer>
It is in the eye of the observer
<Regenaxer>
a cell just *is* ;)
<freemint>
i always thought there were flags for types in the cell. Guess i was very mistaken.
<Regenaxer>
yes, only the mark bits
<Regenaxer>
gc and circ
<freemint>
that reminds me of evil structures i built.
<Regenaxer>
the graphs?
<freemint>
yes
<freemint>
anyway is there more to say about data storage in ram, than there are typeless cells and the type is in the pointer?
<freemint>
Oh, there is garbage collection ... but that is a topic for another time.
<Regenaxer>
Perhaps that the heap is segmented in chunks
<Regenaxer>
they are linked together
<Regenaxer>
Each heap segment is 1 MiB
<freemint>
i think that is not important at the application level. it is at the GC level
<Regenaxer>
yes
<freemint>
tracking back we started out with the database now that i got a better understanding of cells can you explain again the "simple" way how we load {A7}.
<Regenaxer>
the name has two parts
<Regenaxer>
the first one is in "hax" notation, @ - O
<Regenaxer>
so it is a hex number encoding the file
<Regenaxer>
the other is in octal notation for the block number
<Regenaxer>
pil32 used a syntax like {123-45}
<freemint>
ah, that is why I got into trouble when trying to get {9}
<Regenaxer>
yes
<Regenaxer>
pil32 was inefficient
<Regenaxer>
the name was really in ASCII
<Regenaxer>
and needed a delimiter "-"
<Regenaxer>
with hax/octal it is clear which part is which
<Regenaxer>
and internally it is stored as interleaved bit patterns
<Regenaxer>
so few files with few objects result in shorter names
<Regenaxer>
and thus less space on disk
<Regenaxer>
(in the heap the size is always exactly one cell)
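The printed name splits cleanly because the file part uses the letters '@'..'O' ("hax", '@' = 0, 'A' = 1, ...) and the block part uses octal digits. A Python sketch of that parsing, with hypothetical function names; a name without a hax part is taken to be in the first file (file 0).

```python
import re


def hax_to_int(s: str) -> int:
    """'@'..'O' are the hex digits 0..15 ('@' = 0, 'A' = 1, ...)."""
    n = 0
    for ch in s:
        d = ord(ch) - ord("@")
        if not 0 <= d <= 15:
            raise ValueError(f"not a hax digit: {ch!r}")
        n = n * 16 + d
    return n


def parse_ext_name(name: str) -> tuple[int, int]:
    """Split e.g. '{A11}' into (file, block): hax file part, octal block part."""
    m = re.fullmatch(r"\{([@A-O]*)([0-7]+)\}", name)
    if not m:
        raise ValueError(f"not a pil64 external symbol name: {name!r}")
    hax, octal = m.groups()
    return hax_to_int(hax), int(octal, 8)


print(parse_ext_name("{A7}"), parse_ext_name("{A11}"), parse_ext_name("{7}"))
```

A name like `{A9}` is rejected, since 9 is not an octal digit.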
<freemint>
i still do not understand the role of the interleaved bit pattern
<freemint>
where do we use it?
<Regenaxer>
the name is technically a number
<Regenaxer>
a short num
<Regenaxer>
using a variable size in PLIO
<freemint>
so external symbols have a name
<Regenaxer>
Do hd on (pr 12) vs (pr 123456789)
<Regenaxer>
yes
<Regenaxer>
it is the bit pattern
<freemint>
is that name {A7} or the bitpattern of
<Regenaxer>
in the TAIL part of the sym
<Regenaxer>
A7 gives a small number
<Regenaxer>
even smaller are objects in the first file
<Regenaxer>
{7}
<freemint>
so the bit pattern is used in the heap as address?
<Regenaxer>
use only 2 bytes
<Regenaxer>
not in the heap
<Regenaxer>
only in PLIO
<Regenaxer>
in the heap it is used to locate the block
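The interleaved bit pattern can be sketched as packing the file and object numbers into one number, so that small files and low block numbers give short names. The bit layout in this Python sketch is an assumption; only its low part (object bits 0..19, file bits 20..27) is consistent with the `0F 09 00 10` hd output of `{A11}` in this log.

```python
# Hypothetical packing of (file, object) into one number; layout is an assumption.
def ext_name_number(file: int, obj: int) -> int:
    """Interleave file and object bits so small values give short numbers.

    Assumed layout:
      bits  0..19  object, low 20 bits
      bits 20..27  file, low 8 bits
      bits 28..39  object, next 12 bits
      bits 40..47  file, next 8 bits
      bits 48..    object, remaining bits
    """
    n = obj & 0xFFFFF
    n |= (file & 0xFF) << 20
    n |= ((obj >> 20) & 0xFFF) << 28
    n |= ((file >> 8) & 0xFF) << 40
    n |= (obj >> 32) << 48
    return n


def byte_len(n: int) -> int:
    """Bytes needed for n (at least one)."""
    return max(1, (n.bit_length() + 7) // 8)


print(hex(ext_name_number(1, 9)), byte_len(ext_name_number(1, 9)))  # {A11}
print(hex(ext_name_number(0, 7)), byte_len(ext_name_number(0, 7)))  # {7}
```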
<freemint>
ah, that is how we refer to blocks in different files
<Regenaxer>
for read and later for write
<Regenaxer>
yes
<Regenaxer>
always a file and a block offset
<Regenaxer>
The application does not use this name
<Regenaxer>
Only when printing it during debugging
<freemint>
so when I want to point somewhere, I have a symbol at a block offset, which is encoded as the value first and then the property list, and the value is an interleaved bit pattern?
<freemint>
this is {A9}?
<Regenaxer>
no, the name is the bit pattern
<Regenaxer>
9 is not legal octal
<freemint>
sorry
<Regenaxer>
Would be {A11}
<freemint>
where is the name stored? I thought we only serialized the value and the p-list into the blocks with pilIO
<Regenaxer>
The name of *that* symbol does not need to be stored
<Regenaxer>
but if the data *in* the symbol refer to other objects, it is stored there
<freemint>
because it was implicit in the position?
<Regenaxer>
yes
<Regenaxer>
try (out "a" (pr '{A11})) (hd "a")
<freemint>
You mean:but if the data *in* the symbol refer to other symbols, the name of the other symbol is stored there?
<Regenaxer>
yes
<Regenaxer>
encoded as external symbol in PLIO
<freemint>
00000000 0F 09 00 10 ....
<freemint>
0F is A
<freemint>
09 is 11 in octal
<Regenaxer>
just as an internal symbol 'a' would be encoded as INTERN + "a"
<freemint>
what was 00 10 for?
<Regenaxer>
0F is EXTERN
<freemint>
INTERN being some constant from pilIO, right?
<Regenaxer>
the lowest 2 bits of the first byte encode the type
<Regenaxer>
then 3 for the length
<freemint>
05 is intern?
<Regenaxer>
3 << 2 | 3
<Regenaxer>
In C: enum {NUMBER, INTERN, TRANSIENT, EXTERN};
<Regenaxer>
The last line shows how many bytes are needed in PLIO
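The first PLIO byte as described: the type (from the C enum) in the lowest 2 bits and the byte count above it, so `3 << 2 | 3` gives `0F` for a 3-byte external name. This Python sketch writes the name number least significant byte first, which matches the hd output above; it handles only short items, and the function names are hypothetical.

```python
# Hypothetical sketch of the first PLIO byte and a short external name.
NUMBER, INTERN, TRANSIENT, EXTERN = range(4)  # the C enum quoted above


def plio_header(kind: int, count: int) -> int:
    """Type in the lowest 2 bits, byte count in the bits above."""
    return (count << 2) | kind


def plio_type(first_byte: int) -> tuple[int, int]:
    """Split a first byte back into (type, byte count)."""
    return first_byte & 3, first_byte >> 2


def plio_extern(name_number: int) -> bytes:
    """Header byte followed by the name number, least significant byte first.

    Only short items are modelled; longer ones are out of scope here.
    """
    count = max(1, (name_number.bit_length() + 7) // 8)
    if count >= 63:
        raise ValueError("long items are not modelled in this sketch")
    return bytes([plio_header(EXTERN, count)]) + name_number.to_bytes(count, "little")


print(plio_extern(0x100009).hex(" "))  # 0f 09 00 10, as in the hd output above
```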
<freemint>
ahh, so when a symbol {B2} refers to {A11}, then there is a block at B2 which, when read with 'rd', gives {A11}?
<Regenaxer>
So A11 needs 3 bytes
<Regenaxer>
hmm, no
<Regenaxer>
when a symbol {B2} refers to {A11} then the data (some property) have {A11}
<Regenaxer>
(show '{B2})
<freemint>
data = value?
<Regenaxer>
you see {A11} somewhere then
<freemint>
mhh, so it depends on how i refer to it
<Regenaxer>
For DB symbols the value usually holds the classes
<freemint>
if i use classes it is in the p-list of course
<Regenaxer>
But in B-tree nodes *all* is in the value
<Regenaxer>
yes
<freemint>
ah and if we have really many files or objects we need 4 or 5 bytes?
<Regenaxer>
exactly
<freemint>
Regenaxer: that is so the value does not need to be skipped?
<freemint>
(B tree)
<Regenaxer>
Where would it be skipped?
<freemint>
if B trees used the p-list instead of the value
<freemint>
and we serialize the value first, as per convention
<freemint>
and the value is useless, we would need to skip it
<freemint>
which is less efficient than it has to be
<freemint>
is that the reason why the b tree uses the value?
Regenaxer has quit [Ping timeout: 268 seconds]
Regenaxer has joined #picolisp
<Regenaxer>
The btree node needs no properties
<Regenaxer>
it is a list structure
<freemint>
ok
<Regenaxer>
searched with 'rank'
<freemint>
i came across something weird while playing around
<freemint>
finish your thought
<Regenaxer>
no, done
<freemint>
? (pool "g")
-> T
? (set (print (new T)) 9)
{2}-> 9
? (commit)
-> T
? (bye)
joto@l148:~$ pil +
: (pool "g")
-> T
: {2}
-> NIL
<freemint>
I tried to store a number in the value of {2}
<Regenaxer>
it is not fetched this way
<Regenaxer>
you need (val '{2})
<freemint>
there is my 9
<Regenaxer>
yes
<freemint>
i am happy
<Regenaxer>
The lowest-level eval does not trigger fetching of the symbol
<freemint>
I thought that if X is a symbol (= X (val X))
<Regenaxer>
It would be very expensive
<Regenaxer>
and never happens
<Regenaxer>
as externals should never be directly in the code
<Regenaxer>
except *DB
<freemint>
It worked when I was just using the heap, but it no longer holds true for the DB. You are right about not using {2} in code.
<Regenaxer>
So you always have 'val' or 'get' or derived
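The behaviour seen above (plain `{2}` gives NIL, `(val '{2})` gives 9) resembles lazy loading: holding the symbol does not fetch it, explicit access does. This is a toy Python model with a hypothetical class, a dict standing in for the database file; it is not how PicoLisp implements external symbols.

```python
class ExtSym:
    """Toy stand-in for an external symbol whose data lives in a store."""

    def __init__(self, name: str, store: dict):
        self.name = name
        self._store = store
        self._loaded = False
        self._value = None  # what an unfetched symbol holds (like NIL)

    def raw(self):
        """Plain access without fetching, like evaluating {2} directly."""
        return self._value

    def val(self):
        """Explicit access that fetches first, like 'val' or 'get'."""
        if not self._loaded:
            self._value = self._store[self.name]
            self._loaded = True
        return self._value


db = {"{2}": 9}
sym = ExtSym("{2}", db)
print(sym.raw(), sym.val(), sym.raw())  # None 9 9
```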
<Regenaxer>
yes, the name is not known
<Regenaxer>
and *if* code holds such a symbol, it would not be gc'ed and fill up the heap
<freemint>
I think we got all the basics about the storage side of data in the DB and in the heap
<Regenaxer>
on a single symbol potentially the whole DB may hang
<Regenaxer>
OK, good :)
<freemint>
or have we missed anything?
<Regenaxer>
So lets stop for today, I need some stuff to clean up
<Regenaxer>
probably
<Regenaxer>
but you can ask again
<Regenaxer>
or investigate a little
<freemint>
Another day it might be interesting to build an understanding of actual DB usage with classes from scratch
<Regenaxer>
ok, yes, the next layer
<freemint>
There are still some question marks in my mind there.
<freemint>
For example how does picoLisp know where the index starts when there is no explicit reference to the index starting external symbol in the code
<Regenaxer>
it all hangs on the *DB value
<Regenaxer>
on {1}
<Regenaxer>
we saw last time
<freemint>
so {1} has a property for the indexes in the DB?
<Regenaxer>
Entities are properties in {1}
<freemint>
I'll ask you what entities are next time
<Regenaxer>
We saw last time with (edit *DB)
<Regenaxer>
ok
<freemint>
it was really enlightening
<Regenaxer>
Great! :)
<Regenaxer>
Have a good night!
<freemint>
good night
<Regenaxer>
I'm afk now
<Regenaxer>
bye! :)
<freemint>
You earned it
<freemint>
beneroth aw- alexshendi razzy rick42 tankf33der and all others. Do you have thoughts on what we just did?
razzy has quit [Ping timeout: 250 seconds]
shpx has joined #picolisp
freemint has quit [Quit: Page closed]
shpx has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]