<
Regenaxer>
Hi tankf33der!
<
tankf33der>
morning all
<
Regenaxer>
I'm still testing the new DB 'create' function
<
Regenaxer>
but will release soon
<
Regenaxer>
Do you think you (we) can test with your taxi data?
<
tankf33der>
i think so
<
Regenaxer>
What is the volume?
<
Regenaxer>
ie. number of objects
<
tankf33der>
counting one month.
<
tankf33der>
this is root file.
<
tankf33der>
i work with yellow cabs
<
tankf33der>
this is my last workflow.
<
Regenaxer>
Is it one billion objects?
<
Regenaxer>
This will still take time to import
<
tankf33der>
all of them, yes
<
tankf33der>
yellow cabs one month ~14M trips
<
Regenaxer>
100 million are no problem
<
Regenaxer>
For small objects (3 indexes) it takes 2 hours here
<
Regenaxer>
But time increases rapidly for larger sets, as it needs more passes over the data
<
razzy>
why would you need multiple passes when loading data?
<
razzy>
for indexing?
<
Regenaxer>
Sorting
<
razzy>
why not sort in "free" time after the data is loaded, perhaps as a coroutine?
<
Regenaxer>
What is free time?
<
razzy>
every database that is actually in use needs constant sorting anyway :]
<
razzy>
when there are no other jobs?
<
razzy>
problem suggesting picolisp scheduler :]
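[Editor's note] The exchange above asks why an import needs several passes over the data. The sketch below is a generic external merge sort in Python, not the PicoLisp 'create' implementation, whose internals are not shown in this log. It assumes integer keys and a hypothetical `chunk_size` memory limit: pass one writes sorted runs to temporary files, pass two merges them. When there are more runs than can be merged at once, further merge passes would be needed, which is one plausible reason import time grows quickly for larger sets.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(keys, chunk_size):
    """Yield integer keys in sorted order, holding at most chunk_size keys
    in memory while the sorted runs are being built."""
    run_files = []
    it = iter(keys)

    # Pass 1: cut the input into chunks, sort each chunk in memory,
    # and write it out as a sorted "run" in a temporary file.
    while True:
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            break
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{key}\n" for key in chunk)
        run.seek(0)
        run_files.append(run)

    # Pass 2: merge all runs; heapq.merge reads them lazily, line by line.
    streams = [(int(line) for line in run) for run in run_files]
    try:
        yield from heapq.merge(*streams)
    finally:
        for run in run_files:
            run.close()


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

Sorting "later, in free time" does not remove this cost; it only moves the same passes to another moment.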
<
Regenaxer>
tankf33der, do we need all the indexes?
<
Regenaxer>
eg on time?
<
Regenaxer>
the +Aux with date is not enough?
<
Regenaxer>
(each index is expensive)
<
tankf33der>
i dont remember details
<
Regenaxer>
We can try anyway
<
tankf33der>
i believe only you could do it
<
Regenaxer>
I will suggest a slightly modified 'dbs'
<
Regenaxer>
Your solution is almost perfect
<
Regenaxer>
I would just put each index into a separate file
<
Regenaxer>
(a requirement by 'create' when importing in parallel)
<
tankf33der>
i still dont understand it, i cant help you here at all!
<
Regenaxer>
no problem
<
Regenaxer>
Instead of making more files, I suggest we remove the +Ref's from 'pt' and 'dt' and see later whether we need them
<
Regenaxer>
or are they needed?
<
Regenaxer>
eg find all trips between 10:00 and 10:30 independent of the day?
<
Regenaxer>
I think this would be a very special case only
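[Editor's note] The question above is whether a combined date+time index is enough, or whether a separate time index is needed. The Python sketch below illustrates the trade-off with sorted lists and `bisect`. It is not PicoLisp code and does not model +Aux or +Ref; the trip data and function names are invented for illustration. A key sorted by (date, time) answers "this day, this time window" with one range lookup, while "this time window on any day" needs a key sorted by time alone.

```python
from bisect import bisect_left, bisect_right

# Hypothetical trips: (pickup date, pickup time), identified by list position.
trips = [
    ("2019-01-01", "09:50"),
    ("2019-01-01", "10:05"),
    ("2019-01-01", "10:45"),
    ("2019-01-02", "10:20"),
    ("2019-01-02", "11:00"),
]

# Combined index: trip ids ordered by (date, time).
by_date_time = sorted(range(len(trips)), key=lambda i: trips[i])
date_time_keys = [trips[i] for i in by_date_time]

# Separate index: trip ids ordered by time only.
by_time = sorted(range(len(trips)), key=lambda i: trips[i][1])
time_keys = [trips[i][1] for i in by_time]


def trips_on_day_in_window(date, lo, hi):
    """One range lookup in the combined index."""
    start = bisect_left(date_time_keys, (date, lo))
    end = bisect_right(date_time_keys, (date, hi))
    return by_date_time[start:end]


def trips_in_window_any_day(lo, hi):
    """One range lookup in the time-only index."""
    start = bisect_left(time_keys, lo)
    end = bisect_right(time_keys, hi)
    return sorted(by_time[start:end])


if __name__ == "__main__":
    print(trips_on_day_in_window("2019-01-02", "10:00", "10:30"))
    print(trips_in_window_any_day("10:00", "10:30"))
```

Without the time-only index, the second query would have to visit every day in the combined index, which is acceptable if the query is as rare as suggested above.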
<
Regenaxer>
Any index we don't need saves a *lot* of time
<
Regenaxer>
CPU resources
<
Regenaxer>
You need about a GiB RAM per index
<
Regenaxer>
razzy: Hint, this is for sorting
<
Regenaxer>
tankf33der: What machine can you use?
<
Regenaxer>
How many cores?
<
Regenaxer>
The indexes are imported by parallel processes, so one core per index would be good
<
Regenaxer>
plus enough RAM
<
tankf33der>
i don't have a powerful machine
<
Regenaxer>
So better use fewer indexes ;)
<
Regenaxer>
One more point: It would be very helpful if *one* of the data columns is already sorted
<
Regenaxer>
Is that the case? E.g. date/time?
<
Regenaxer>
Then we save one index process
<
Regenaxer>
hmm, I just see that I'm not ready yet
<
Regenaxer>
Need to modify 'create' a little ;)
<
Regenaxer>
I have no way to import NON-indexed data. Like 'amnt'
<
beneroth>
Good morning all
<
Regenaxer>
Good morning beneroth
<
beneroth>
Regenaxer, so you centralise work on one index to a single process?
<
beneroth>
I see. nice.
<
beneroth>
probably not relevant for you, Regenaxer - else I will tell you when I eventually read it.
<
aw->
hi beneroth, Regenaxer
<
beneroth>
hi aw- o/
orivej has joined #picolisp
<
Regenaxer>
Thanks beneroth for the link. Downloaded it
<
aw->
just curious, what Let's Encrypt client are you guys using?
<
beneroth>
mainly getssl (bash implementation). on one server I use the official certbot.
<
Regenaxer>
I'm using certbot certonly --standalone
<
beneroth>
I don't like the big dependencies of certbot, installing python on machines which don't need it otherwise :/
<
beneroth>
I would like to have a picolisp implementation, but so far I haven't found the time to do it :)
<
Regenaxer>
yes, python installations are a nightmare
<
aw->
ok so everyone's in the same boat as me
<
aw->
i was using something designed for OpenBSD, but the author recently decided to EOL his tool because it was officially integrated into OpenBSD
<
aw->
dropped support for other OS's/platforms
freemint has joined #picolisp
<
aw->
would be nice to have a picolisp alternative :)
xificurC has joined #picolisp
<
freemint>
I use letsacme running on python and i am happy
<
Regenaxer>
Perfect! Now 'create' can also directly import non-index data
<
Regenaxer>
Testing again
<
Regenaxer>
But I think I can release
<
Regenaxer>
I replaced the +Aux with a +Bag
<
tankf33der>
i will check, but all this is white noise to me.
<
Regenaxer>
An +Aux would work too
<
tankf33der>
i didnt progress in picodb :/
<
Regenaxer>
Where is the problem? ;)
<
Regenaxer>
The piece you did looked perfect
<
Regenaxer>
I just modified a little for import
<
Regenaxer>
One thing is important:
<
Regenaxer>
The date+time in the CSV should be sorted
<
Regenaxer>
is that the case?
<
Regenaxer>
If not we must slightly change the 'create' call
<
Regenaxer>
and it gets slower (a very little bit)
<
Regenaxer>
But normally you have at least one column already sorted, so I want to take advantage of it
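[Editor's note] Whether the date+time column of the CSV is already sorted can be checked in one streaming pass before importing. The Python sketch below is one way to do that. The column name `pickup_datetime` and the sample rows are assumptions for illustration, not taken from the actual taxi files, and the string comparison relies on the timestamps being in ISO-like `YYYY-MM-DD HH:MM:SS` form.

```python
import csv
import io


def column_is_sorted(rows, column):
    """Return True if the values of `column` never decrease from row to row.

    `rows` is any iterable of dicts, e.g. a csv.DictReader; only the
    previous value is kept in memory.
    """
    previous = None
    for row in rows:
        value = row[column]
        if previous is not None and value < previous:
            return False
        previous = value
    return True


# Hypothetical sample data standing in for a real trip file.
SAMPLE = """pickup_datetime,amnt
2019-01-01 00:05:00,7.5
2019-01-01 00:07:30,12.0
2019-01-01 00:07:30,5.0
2019-01-01 00:09:10,9.25
"""

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE))
    print(column_is_sorted(reader, "pickup_datetime"))
```

For a real file, the `io.StringIO` object would be replaced by an open file handle.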
<
Regenaxer>
tankf33der, note that I did not bother to search for a proper CSV, so the above paste is not tested at all