So calling dbs is not an optimization but a necessity for the pilDB to work?
It is not necessary, but then you have a single file with factor 2
(like (dbs (2)))
'dbs' modifies the relations, and sets *Dbs
which is then passed to 'pool'
hello peeps (don't let me interrupt you though :)
So it is for tuning the DB
ok and after the byte which contains 2 starts the first block, with a size of 64 << 2 = 256 bytes
Hi rick42!
Hi Regenaxer and freemint!
Hi rick42
Hi alexshendi
No, all blocks are same size, also the first
So the first data block starts at 256
if factor is 2
ah alexshendi! hello!
or 64 if the factor is 0?
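The block-size arithmetic being discussed can be sketched in a few lines of Python. This is only an illustration of the numbers in the conversation (64-byte minimum, shifted left by the factor); the names `block_size` and `block_offset` are hypothetical and not part of PicoLisp.

```python
BASE_BLOCK_SIZE = 64  # smallest block size in bytes (factor 0), as stated in the log


def block_size(factor):
    """Block size in bytes for a given shift factor."""
    return BASE_BLOCK_SIZE << factor


def block_offset(index, factor):
    """Byte offset of block `index`, assuming all blocks have the same size."""
    return index * block_size(factor)


if __name__ == "__main__":
    for factor in (0, 2, 6):
        # Block 0 holds the header, so block 1 is the first data block.
        print(factor, block_size(factor), block_offset(1, factor))
```

With factor 2 the first data block starts at byte 256, and with factor 0 it starts at byte 64, because block 0 has the same size as every other block.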
This factor is *used* only upon file creation
Later *Dbs is ignored
and the factor in the root block is used
*Dbs is used only then to know how many files to open
so the factor in the root block says how to read the file regardless of *Dbs
the sizes are taken from the files
metadata nice
The size cannot be changed
after the file was created
(not without rebuilding the file or not at all?)
You need to 'dump' the objects, delete the file, and import
May be impossible if the new block size is smaller
well, no, possible
but leads to more fragmentation
I just followed the same train of thought
@lib/db32-64.l in fact does something like that
I used it to port old 64-only DBs to the new system
Single-file DBs with 64 blocks
64-byte blocks
oh, no
this is porting 32-bit DB to pil64
But similar
what was the reason to allow a flexible block size?
Large objects may take too many blocks
possibly fragmented
have you seen a lot of fragmentation?
I never measured
But it surely will
have you measured performance improvements?
Not exactly
Imagine a B-tree node of 4 KiB
or 1 K
i understand the theory
it would be 16 blocks
possibly spread over the full file size
means 16 seeks and 16 reads
more for larger symbols
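A rough Python sketch of the "16 blocks" estimate follows. It assumes the whole block is payload, which ignores the per-block link bytes, so real counts would be slightly higher; `blocks_needed` is a hypothetical helper name.

```python
def blocks_needed(object_bytes, factor):
    """Number of blocks an object occupies, ignoring per-block link overhead."""
    size = 64 << factor
    return -(-object_bytes // size)  # ceiling division


if __name__ == "__main__":
    print(blocks_needed(1024, 0))   # 1 KiB node in 64-byte blocks
    print(blocks_needed(4096, 2))   # 4 KiB node in 256-byte blocks
    print(blocks_needed(4096, 6))   # 4 KiB node in 4096-byte blocks
```

In the worst case each of those blocks sits somewhere else in the file, so the block count is also an upper bound on the number of seeks needed to read the object.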
so a flexible size is a *must*
It is just that when i do stuff like that you cry "premature optimization" ;)
no, it is different
I know the limits
used single file DB for 10 years
hey, that was a joke
and hit limits for huge DBs
yeah :)
anyway. I now have a better understanding of how the DB works at the header level.
So it was not premature, but too slow for the Smapper projects
also it cleared up my confusion about *Dbs and why it is not set on import
Also very important is to have critical indexes in their private file
Which import do you mean?
on pool
Can i try to summarize?
So "open"
Oh a question appeared: what are NEXT and FREE good for? I think Next is the offset of the root cell
Next is the next free block, ie. the end of the file
Free is the start of the free list
ie deleted blocks
Next is redundant for normal files, but you can also use a /dev/ device directly (without filesystem), so there is no EOF
So Next is roughly the size of the file (number of blocks) and free is a pointer to the next free cell
yes, to the *first* free cell perhaps
a linked list of free blocks
what to the first free cell perhaps?
How do you mean that?
the "what"
Free is the index of the first free cell or Next is the index of the first free cell?
blocks, not cells here
Free is the avail list
Next is size of the file divided by blocksize
that makes perfect sense when explained that way
all these "pointers" must be shifted by the scale factor
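The roles of Free and Next can be modelled with a toy allocator in Python. This is a sketch under assumptions: it works with plain block indices instead of shifted pointers, and it assumes that a new block is taken from the free list first and from the end of the file otherwise. The class `BlockFile` and its methods are hypothetical.

```python
class BlockFile:
    """Toy model of the Free / Next bookkeeping in block 0."""

    def __init__(self):
        self.free = 0    # head of the free list; 0 means the list is empty
        self.next = 1    # first never-used block (block 0 is the header)
        self.link = {}   # for each free block: index of the next free block

    def alloc(self):
        """Reuse a deleted block if there is one, else extend the file."""
        if self.free:
            blk = self.free
            self.free = self.link.pop(blk)
            return blk
        blk = self.next
        self.next += 1
        return blk

    def release(self, blk):
        """Put a deleted block at the front of the free list."""
        self.link[blk] = self.free
        self.free = blk


if __name__ == "__main__":
    f = BlockFile()
    blocks = [f.alloc() for _ in range(3)]
    f.release(2)
    print(blocks, f.free, f.next, f.alloc())
```

In this model Next only grows, so it always equals the file size divided by the block size, while Free changes whenever a block is deleted or reused.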
i am getting something wrong or can a file only contain 2^5 blocks?
2**42 (4 Tera) Blocks per file
2**16 Files -> 256 Peta objects
i think it is only 2^41
It is 42
6 * 7
i made a mistake
question why only 6*(8-1) and not (6*8)-1 ?
It is 6 bytes (48), but 6 bits reserved
i thought only the LSB was always 0
longshi has quit [Ping timeout: 252 seconds]
Block 0: | Free 0| Next 0| << |
suggests that only the last bit is reserved in free
I don't remember now
well that limit is unlikely to be hit.
The lowest 6 bits in a pointer are reserved
marker for first block, and following
eg see "ID-Block:"
It results from the non-shifted minimum of 64 bytes
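The limits quoted above follow from simple arithmetic, shown here as a small Python check. The constant names are made up for illustration, and "objects" is an upper bound that assumes one block per object.

```python
POINTER_BYTES = 6    # a block pointer spans 6 bytes
RESERVED_BITS = 6    # the lowest 6 bits are reserved (64 = 2**6)
FILE_BITS = 16       # up to 2**16 files, as stated in the log

block_bits = POINTER_BYTES * 8 - RESERVED_BITS
blocks_per_file = 2 ** block_bits
max_objects = blocks_per_file * 2 ** FILE_BITS

if __name__ == "__main__":
    print(block_bits)                  # bits available for the block number
    print(blocks_per_file // 2 ** 40)  # blocks per file, in units of 2**40
    print(max_objects // 2 ** 50)      # objects in total, in units of 2**50
```

So 48 bits minus 6 reserved bits leaves 42 bits, which is where both the "4 Tera" and the "256 Peta" figures come from.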
summary time
A PicoLisp DB file contains a header inside its first block. Since the header is always smaller than 64 bytes (the smallest possible block size), there is no problem.
The header contains several flags, the offset of the first free block (which in turn points to the next free block, creating a list of blocks to recycle which may be fragmented), and the current block count in the allocated space for the database (stored in Next).
and the block size.
"block count" is more clear than "next"
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
oops :)
The layout of the header is as follows: Free spans 6 bytes, but 6 bits of that are reserved, giving a pilDB file a total address space of 2^42 blocks.
the 'n' in
EXT-Block: | Link n| Data
is max 63
so if a symbol has more blocks, they all have 63
The 'n' is used only by 'dbck'
consistency check
The block count (referred to as NEXT in the documentation) is also 6 bytes with 6 bits reserved for flags. Again 2^42 blocks
If it is 0, it is used in several places
0 is used by 'seq' to find the next ID block
Skipping EXT-Blocks
So it mostly checks for zero or non-zero
The next byte (byte 12 when starting to count with 0) encodes the block size. It is used as a shift factor which shifts the smallest possible block size (64 bytes) to the left. If a value of 2 (the default value) is picked, the block size is 256 bytes. It is rare to see values larger than 7.
even 6 is seldom
I never used 7
but who knows?
Is there any information in the header i did not document (other than the flags?)
No, that's all
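The header summary can be turned into a small Python parser for block 0. This is a sketch under loud assumptions: the 6-byte fields are read as little-endian, the low 6 bits are treated as the reserved tag bits, and the remaining value is shifted by the factor to get a byte offset, following the remarks in this conversation. None of this is checked against the PicoLisp sources, and `parse_header` is a hypothetical name.

```python
TAG_MASK = 0x3F  # lowest 6 bits of a pointer are reserved


def parse_header(block0):
    """Decode Free, Next and the shift factor from the first 13 bytes."""
    free_raw = int.from_bytes(block0[0:6], "little")   # assumed byte order
    next_raw = int.from_bytes(block0[6:12], "little")  # assumed byte order
    factor = block0[12]
    return {
        "factor": factor,
        "block_size": 64 << factor,
        "free_offset": (free_raw & ~TAG_MASK) << factor,
        "next_offset": (next_raw & ~TAG_MASK) << factor,
        "block_count": next_raw >> 6,
    }


if __name__ == "__main__":
    # Synthetic header: empty free list, 3 blocks in the file, factor 2.
    header = (0).to_bytes(6, "little") + (3 * 64).to_bytes(6, "little") + bytes([2])
    print(parse_header(header))
```

For the synthetic header the byte offset in `next_offset` equals `block_count * block_size`, which is the "Next is the size of the file divided by the block size" relation stated earlier.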
with 7 you get bigger than most hard disk sectors
the sectors are pretty irrelevant, not even known
it reads full tracks
and caches them
Logical sector size is 512 still (?)
Unix blocksize is 8192 usually
buffer size used by stdio etc.
But these disk sectors are only relevant to performance and have only a minor impact
Not even relevant to performance any more I suspect
The sectors are completely transparent as I see it
Opposed to the block size, which can result in a lot of jumping when too small (slowing performance a lot) or in a lot of uselessly transferred zeros, increased file size and more RAM use
when too big
Since you want to refer to records of different sizes in a database, and these are stored in different files, I am curious how pilDB refers to objects in other databases. I suspect 'ext' is involved
'ext' is for other DBs (not the one opened by 'pool')
A PicoLisp DB may be either a file with a single block size or a folder of different DB files (with differing or the same block sizes) and a folder for blobs (binary objects not stored in the DB).
So how can i refer from a large object in file A to a small one in B
The size does not matter. It is simply in the data
Typically a +Link or +Joint
An external symbol is a first class object
My problem is that with the offset we can only point to location in the current file.
no, it encodes both file AND block
ahh so referring to other databases does not happen on the storage level but at the content level.
A = 1 (hax notation), so file 1 (starting with '@' = zero)
No idea about storage or content. The symbol itself encodes its location
you are too quick for me here. Now that we have blocks, how do we store information in them?
Using PLIO
Moment, brb
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
The blocks are only used to store the PLIO
That is the same PLIO format used by (rd) and 'pr', which is referred to as "encoded binary format" in the docs?
The point is how the symbols interact in the heap
So the blocks in the files are only used to fetch the data, and write back modifications (persistence)
"The point is how the symbols interact in the heap" Do you want to say that there is no magic translation layer, just that you jump to a certain offset, rd the PLIO from there, put it into the address space (heap) of the PicoLisp program, and the PicoLisp program is responsible for what it makes out of it?
The program logic itself uses normal symbols
Even simpler
oh tell me how it can be even simpler?
The "certain offset" is used only to fetch the data
after that you have *normal* symbols
no idea of file and block
or a number?
or a cell?
only symbols
the val or prop of such a symbol can contain anything
numbers, lists, other (also external) syms
except nil as key in the property list ;)
And *not* certain graph structures ;)
/me tries to look like he is not the perpetrator
hehe, no
So the pilIO in a database is only symbols "containing" lists, numbers and other symbols?
It stores a single value and one property list in a single block(-list)
as symbols do in picolisp (except NIL, T)
NIL and T *may* have properties, just the value is protected
I played around with 'pr and used 'hd to inspect the resulting file. I tried to serialize a symbol with 'pr but it did not work out can you take a look?
: (setq Sym 'Value)
-> Value
: (out "sym" (pr Sym))
-> Value
'pr' does not serialize a complete symbol
: (hd "sym")
only prints an expression
Ahhh that explains my problem
how do i serialize a symbol then?
so pr is only for lists and numbers?
you could do (out "sym" (pr (val Sym) (getl Sym)))
no, also internal, external, transient symbols
so when are two symbols the same, the name does not cut it since there are symbols which have no name
i have played around and i noticed that there are two different "sames"
yes, '=' and '==' ?
some things are = but not ==
'==' is exactly the same item
ie the same address
"pointer equality"
so they have the same reference
longshi has joined #picolisp
Comparison with '==' is fast
'=' needs to traverse the structures
(name characters, or list elements)
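The two kinds of "same" have a close analogue in Python, which may help as a mental model. Note the naming trap: Python's `==` corresponds to PicoLisp's '=' (structural comparison), while Python's `is` corresponds to PicoLisp's '==' (pointer equality). This is an analogy only, not PicoLisp code.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same elements
c = a           # a second reference to the very same list

structurally_equal = (a == b)  # like PicoLisp '=': compares the elements
same_object_ab = (a is b)      # like PicoLisp '==': compares the addresses
same_object_ac = (a is c)

if __name__ == "__main__":
    print(structurally_equal, same_object_ab, same_object_ac)
```

The identity check is a single address comparison, while the structural check has to walk both lists, which mirrors the remark that '==' is fast and '=' needs to traverse the structures.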
Addresses point at cells
orivej has quit [Ping timeout: 245 seconds]
they point to the first, 4th or 8th byte of a cell (in pil64)
first is pair, 4th is bignum and 8th is symbol
Why that?
so for a symbol it points to the value
(car Sym) is the same as (val Sym)
Is it possible to have two "different" addresses point to the same cell? As one address says you find a number the other says you find a list?
It would never be the same address, because the tag bits modify it
Would be possible if the tag systematics would be different
(outside the actual pointer)
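The idea that the type lives in the pointer rather than in the cell can be sketched in Python. The sketch uses only the three offsets named above (first, 4th and 8th byte) and assumes 16-byte cells aligned to 16 bytes; short numbers and any other tag combinations are left out. `classify` and `TAGS` are hypothetical names.

```python
CELL_SIZE = 16  # assumed: a pil64 cell is two 64-bit words

# Offset into the cell, as described in the log, mapped to what the pointer means.
TAGS = {0: "pair", 4: "bignum", 8: "symbol"}


def classify(pointer):
    """Return (kind, cell_base) for a tagged pointer."""
    offset = pointer % CELL_SIZE
    kind = TAGS.get(offset)
    if kind is None:
        raise ValueError("offset %d is not handled in this sketch" % offset)
    return kind, pointer - offset


if __name__ == "__main__":
    base = 0x1000
    for ptr in (base, base + 4, base + 8):
        print(hex(ptr), classify(ptr))
```

All three pointers resolve to the same cell base but carry different types, which is why two references to one cell with different meanings necessarily have different address values.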
so having two different kinds of address to a single cell only causes chaos?
Not chaos, you can do that with 'adr'
Let's say it is a bit surprising
is there any use in it?
and may crash easily
Some debugging
or brute force poking in the heap
let me try to summarize about address and cells
PicoLisp has a heap made out of cells where it stores all symbols, numbers and lists?
cells themselves are not aware whether they are a number or a symbol, you need an address containing the type to get meaning out of a cell
It is in the eye of the observer
a cell just *is* ;)
i always thought there were flags for types in the cell. Guess i was very mistaken.
yes, only the mark bits
gc and circ
that reminds me of evil structures i built.
the graphs?
anyway is there more to say about data storage in ram, than there are typeless cells and the type is in the pointer?
Oh there is garbage collection ... but that is a topic for another time.
Perhaps that the heap is segmented in chunks
they are linked together
Each heap segment is 1 MiB
i think that is not important at application level. it is at the GC level
tracking back we started out with the database now that i got a better understanding of cells can you explain again the "simple" way how we load {A7}.
the name has two parts
the first one is in "hax" notation, @ - O
so it is a hex number encoding the file
the other is in octal notation for the block number
pil32 used a syntax {123-45}
ah, that is why i ran into trouble when trying to get {9}
pil32 was inefficient
the name was really in ASCII
and needed a delimiter "-"
with hax/octal it is clear which part is which
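The name format can be made concrete with a small Python decoder. It handles only the simple form seen in this conversation: an optional run of "hax" digits ('@' to 'O', standing for 0 to 15) followed by octal digits for the block number. It says nothing about how the name is packed internally, and `decode_name` is a hypothetical helper.

```python
HAX = "@ABCDEFGHIJKLMNO"  # hax digit -> value 0..15, with '@' as zero


def decode_name(name):
    """Split an external symbol name like '{A7}' into (file, block)."""
    body = name.strip("{}")
    i = 0
    file_no = 0
    # Leading hax digits form the file number (absent means file 0).
    while i < len(body) and body[i] in HAX:
        file_no = file_no * 16 + HAX.index(body[i])
        i += 1
    octal = body[i:]
    if not octal or any(ch not in "01234567" for ch in octal):
        raise ValueError("block part must be octal: %r" % octal)
    return file_no, int(octal, 8)


if __name__ == "__main__":
    for name in ("{2}", "{A7}", "{B2}", "{A11}"):
        print(name, decode_name(name))
```

Because the two parts use disjoint character sets, no delimiter is needed, and a name such as {A9} is rejected since 9 is not an octal digit.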
and internally it is stored as interleaved bit patterns
so few files with few objects result in shorter names
and thus less space on disk
(in the heap the size is always exactly one cell)
i still do not understand the role of the interleaved bit pattern
where do we use it?
the name is technically a number
a short num
using a variable size in PLIO
so external symbols have a name
Do hd on (pr 12) vs (pr 123456789)
it is the bit pattern
is that name {A7} or the bitpattern of
in the TAIL part of the sym
A7 gives a small number
even smaller are objects in the first file
so the bit pattern is used in the heap as address?
use only 2 bytes
not in the heap
only in PLIO
in the heap it is used to locate the block
ah, that is how we refer to blocks in different files
for read and later for write
always a file and a block offset
The application does not use this name
Only when printing it during debugging
so when i want to point somewhere i have a symbol at a block offset, which is encoded value first, then the property list, and the value is an interleaved bit pattern?
this is {A9}?
no, the name is the bit pattern
9 is not legal octal
Would be {A11}
where is the name stored? i thought we only serialized value and p-list into the blocks with PLIO
The name of *that* symbol does not need to be stored
but if the data *in* the symbol refer to other objects, it is stored there
because it was implicit in the position?
try (out "a" (pr '{A11})) (hd "a")
You mean: if the data *in* the symbol refer to other symbols, the name of the other symbol is stored there?
encoded as external symbol in PLIO
00000000 0F 09 00 10 ....
0F is A
09 is 11 in octal
just as an internal symbol 'a' would be encoded as INTERN + "a"
what was 00 01 for?
INTERN being some constant from PLIO, right?
the lowest 2 bits of the first byte encode the type
then 3 for the length
05 is intern?
3 << 2 | 3
The last line shows how many bytes are needed in PLIO
ahh so when a symbol {B2} refers to {A11} then there is a block at b2, which when rd is {A11} ?
So A11 needs 3 bytes
hmm, no
when a symbol {B2} refers to {A11} then the data (some property) have {A11}
(show '{B2})
data = value?
you see {A11} somewhere then
mhh so it depends how i refer to it
For DB symbols the value usually holds the classes
if i use classes it is in the p-list of course
But in B-tree nodes *all* is in the value
ah and if we have really many files or objects we need 4 or 5 bytes?
Regenaxer: that is so the value does not need to be skipped?
(B tree)
Where would it be skipped?
if B trees used the p-list instead of the value
and we serialize the value first, as per convention
and the value is useless, we would need to skip it
which is more inefficient than it has to be
is that the reason why the b tree uses the value?
Regenaxer has quit [Ping timeout: 268 seconds]
Regenaxer has joined #picolisp
The btree node needs no properties
it is a list structure
searched with 'rank'
i came across something weird while playing around
finish your thought
no, done
? (pool "g")
-> T
? (set (print (new T)) 9)
{2}-> 9
? (commit)
-> T
? (bye)
joto@l148:~$ pil +
: (pool "g")
-> T
: {2}
-> NIL
I tried to store a number in the value of {2}
it is not fetched this way
you need (val '{2})
there is my 9
i am happy
The lowest-level eval does not trigger fetching of the symbol
I thought that if X is a symbol (= X (val X))
It would be very expensive
and never happens
as externals should never be directly in the code
except *DB
It worked when i was just using the heap. But it no longer holds true for the DB, you are right about not using {2} in code.
So you always have 'val' or 'get' or derived
yes, the name is not known
and *if* code holds such a symbol, it would not be gc'ed and fill up the heap
I think we got all the basics about the storage side of data in the DB and in the heap
on a single symbol potentially the whole DB may hang
OK, good :)
or have we missed anything?
So let's stop for today, I need some stuff to clean up
but you can ask again
or investigate a little
Another day it might be interesting to build an understanding of actual DB usage with classes from scratch
ok, yes, the next layer
There are still some question marks in my mind there.
For example how does picoLisp know where the index starts when there is no explicit reference to the index starting external symbol in the code
it all hangs on the *DB value
on {1}
we saw last time
so {1} has a property for the indexes in the DB?
Entities are properties in {1}
i ask you what entities are next time
We saw last time with (edit *DB)
it was really enlightening
Great! :)
Have a good night!
good night
I'm afp now
bye! :)
You earned it
beneroth aw- alexshendi razzy rick42 tankf33der and all others. Do you have thoughts on what we just did?
razzy has quit [Ping timeout: 250 seconds]
shpx has joined #picolisp
freemint has quit [Quit: Page closed]
shpx has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]