<raggi>
evan: drbrain: i'm also thinking those dep indexes have enough info to avoid loading gemspecs for activation - it might be worth considering in place of https://github.com/rubygems/rubygems/pull/435
<evan>
raggi: I think the biggest bang for the buck is per-gem indexes and/or incremental updates
<raggi>
drbrain: no worries, i was going to do the real impl. in a PR anyway, and not planning to delete the old indexers yet
<raggi>
evan: yep
<raggi>
evan: this serves both
<evan>
I don't see a reason for your names file
<raggi>
evan: mirrors
<raggi>
that's the only use case
<evan>
why do they need that?
<raggi>
discovery
<raggi>
oh
<raggi>
they can use versions
<raggi>
nvm
<raggi>
sorry, was thinking of a different version- k, i'll remove that
<raggi>
i was thinking to implement IndexCommand to support both incremental and full modes, and implement incremental inside rubygems.org, with maybe a rake task we can run if/when we want to recompress
<raggi>
evan: drbrain: do you have any objections to the concept of leaving the webserver to handle gzip, and making the fetcher send the appropriate negotiation headers, etc?
<drbrain>
we'd need to add code to RubyGems to do it for 1.9 and older
<drbrain>
(which is no objection)
<raggi>
drbrain: right, the fetcher client would need work to consume this anyway, so i was going to do what i'd view as a "complete" client solution in the same pass through
<evan>
raggi: how about incremental updates?
<drbrain>
yup
<raggi>
evan: so, the client is going to parse the file by appending to a hash like the read examples do, so incremental updates can just be line-appends to the various indices
<evan>
but how does the client get them?
<drbrain>
range requests?
<raggi>
evan: the client can make a conditional HTTP Range request to get data after what's on disk
<evan>
just use http byte-ranges?
<raggi>
yeah
<raggi>
followed by a local checksum
<drbrain>
that gets tricky with gzip compression
<raggi>
if the checksum fails, refetch whole file
<drbrain>
the range request is on the gzip content
<raggi>
drbrain: did i space something there, i thought it was on the resource content
<raggi>
drbrain: well, i'd still preference range requests over gzip, so if that has to lose gzip, that seems ok?
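A minimal Ruby sketch of the scheme raggi describes above: parse by appending to a hash so incremental updates are just line-appends, fetch only the new bytes with a Range request, and refetch the whole file if a checksum check fails. The one-line-per-gem format and helper names here are assumptions for illustration, not the real client code.

```ruby
require "net/http"
require "digest"
require "uri"

# Parse index lines of the assumed form "NAME VERSION VERSION ..." by
# appending into a hash, so newly fetched lines simply extend the index.
def append_lines(index, data)
  data.each_line do |line|
    name, *versions = line.split
    next unless name
    (index[name] ||= []).concat(versions)
  end
  index
end

# Fetch only the bytes after what is already on disk, then verify the
# whole local file against a published checksum. Returns true when the
# local copy is good; on mismatch the file is deleted so the caller can
# retry with a full fetch.
def incremental_fetch(uri, local_path, expected_sha1)
  offset = File.exist?(local_path) ? File.size(local_path) : 0
  req = Net::HTTP::Get.new(uri)
  req["Range"] = "bytes=#{offset}-" if offset > 0
  res = Net::HTTP.start(uri.host, uri.port,
                        use_ssl: uri.scheme == "https") { |h| h.request(req) }

  if res.is_a?(Net::HTTPPartialContent)
    File.open(local_path, "ab") { |f| f.write(res.body) }
  else
    # Server ignored the Range header (or sent 200); take the full body.
    File.binwrite(local_path, res.body)
  end

  ok = Digest::SHA1.file(local_path).hexdigest == expected_sha1
  File.delete(local_path) unless ok
  ok
end
```

As discussed above, the checksum is what makes blind byte-appends safe: any divergence between the local prefix and the server's file is caught and resolved by a full refetch.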
<evan>
so
<evan>
i don't really care about incremental updates to the full index
<evan>
if there is a deps/rails
<evan>
file
<drbrain>
all the servers I've done Range + Accept-Encoding: gzip on give partial gzip output, not full gzip streams
<evan>
that lists all the versions and info
<evan>
because 99% of rubygems usage keys off the gem name
<raggi>
oh, yeah, that was the other use case for names
<raggi>
the search stuff, but that can be done with the versions list too ofc
<evan>
I think hoping for the clients to do range requests with gzip is too much
<raggi>
(the hamming distance / metaphone stuff we inherited)
<evan>
I don't think we can get it working well.
<raggi>
if we can't that's totally fine
<raggi>
if it's pulling the whole file, it can pull a gzip
<raggi>
if it's pulling incremental, it can pull plain
KewinWang has joined #rubygems
<raggi>
increments will be tiny
<evan>
if we have deps/<name>
<evan>
I don't see using the full index often at all
<evan>
because how often do you not know the name you're looking for?
<evan>
it would only be search
yut148 has quit [Read error: Connection reset by peer]
<raggi>
i agree, it's search and mirrors basically
<drbrain>
it's my dinner time
<raggi>
no worries, thanks for the input
yut148 has joined #rubygems
<drbrain>
I think you and evan know the needs of the index better than I do anyhow :D
<raggi>
it sounds like there's a few other things to consider, but i'll start getting a more concrete implementation together so there's more real stuff as a subject
<raggi>
right now, deps is ideal for bundler usage, and argument-less rubygems usage
<raggi>
the clients would need to pull gemspecs in order to resolve development dependencies, though
<raggi>
for gem(1), that's actually just pulling the canonical name's gemspec, as --development-dependencies is specifically not recursive
<raggi>
which is why i set it up like this
<raggi>
in some ways, including all the deps and having them typed in the index "seems more correct"
<raggi>
but it's also much less used
<raggi>
whichever separator we end up with in the index, either the indexer or the Specification validation should explicitly disallow that separator in names and platforms (versions are already tight) - none of the current proposals/options have any real world collisions though
<raggi>
so, on signing
<raggi>
I'm waiting to speak with the TUF guys about a few of my concerns
<raggi>
but it's looking like TUF is a good solution to generally handling signing of files available on the server
<raggi>
conveniently, TUF sits outside of those files, not embedded, so the signing of the index files won't need to be inside the format
<evan>
sorry, Zoe needed food.
<evan>
I'd say only runtime are needed
<evan>
development aren't often used.
<raggi>
that is likely going to take a little longer to get underway, so i was considering dropping a .sha1 and .md5 alongside the index files, particularly if we use ranges
<raggi>
no worries :)
<raggi>
getting back to the gzip vs. range thing, it is interesting: these files compress so well that it may be simpler to just gzip them and always fetch them whole
<evan>
I was thinking the same.
<raggi>
on the server side, we can keep a non-gzip'd file that we append to, then gzip and upload to s3, to get incremental write performance
<evan>
pre-gzip them
<raggi>
i have to shoot in a second
<evan>
k
<evan>
one thing to think about
<evan>
if we care
<evan>
should we consider if/how we should deal with growth
<raggi>
yeah, i've been considering that too
<raggi>
right now, pre-gzip'd files grow slow on the transport side
<evan>
ie, the full index contains A LOT of data that never changes
<raggi>
so i'm not worried there
<raggi>
but parsing could slow down
<raggi>
equally though, this is currently just used on install
<evan>
sure, but i'd rather build in a mechanism to rotate the "end" into a new file
<evan>
so that we don't end up with a 1G file for the index at some point
<evan>
(yes, very optimistic)
havenwood has quit [Remote host closed the connection]
<raggi>
yeah
<raggi>
i'll have a think about that
<raggi>
ok, gotta run, but i'll bbl/tomorrow
<evan>
later
<postmodern>
users of bundler-audit reported that rubygems 1.6.x could not match versions against ~> 1.2.3 requirements
<postmodern>
was giving false-positives
<drbrain>
rubygems 1.6 is not supported
<postmodern>
is this a known issue or can I workaround it
<postmodern>
or should i set required_rubygems_version to avoid 1.6.x
<drbrain>
upgrade to 1.8.x
<postmodern>
alright
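For reference, the pessimistic-constraint behavior postmodern reports as broken in 1.6.x can be checked directly against Gem::Requirement on a current RubyGems:

```ruby
require "rubygems"

# "~> 1.2.3" is the pessimistic operator: it means >= 1.2.3 and < 1.3.0,
# so patch-level releases match but the next minor release does not.
req = Gem::Requirement.new("~> 1.2.3")

req.satisfied_by?(Gem::Version.new("1.2.9")) # expected: true
req.satisfied_by?(Gem::Version.new("1.3.0")) # expected: false
```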
<raggi>
evan: drbrain: do you think we want to keep the old rubygems index code around in the rubygems distribution? (obviously we want to keep the rubygems.org code for now)
postmodern has left #rubygems ["Leaving"]
<drbrain>
raggi: the 1.2+ version? yes
<raggi>
k, np at all
<drbrain>
people won't immediately upgrade their in-house gem servers
* raggi
nods
<evan>
they need to stick around
<evan>
at least 3 years
<evan>
maybe more.
<evan>
I need to put together the official policy about data format migration
<drbrain>
does the original index code still exist in 1.8?
<evan>
yes.
<drbrain>
or was it removed after 1.3?
<drbrain>
ok, then I must have ripped it out for 2.0
<evan>
i'd have to go back and check
<evan>
I thought it was in 1.8
<raggi>
i'll fork off master, and not break any of the existing index code
<evan>
k
<raggi>
on the client side, want support to fall back to consuming old index type?
terceiro has quit [Read error: Connection reset by peer]
<evan>
yeah
<evan>
for private gem servers
<drbrain>
that's how we did it when transferring to the 1.2 index format
* raggi
nods
terceiro has joined #rubygems
<raggi>
heh
<raggi>
so "build_modern" seems like a misnomer all of a sudden
* raggi
grins
<drbrain>
:D
<evan>
change it to build_beta_indexes
<evan>
the new ones will be called gamma indexes
<raggi>
isn't alpha better than beta?
<drbrain>
evan: 1.4 still had it, but 1.7 doesn't seem to
<evan>
k
<drbrain>
I thought delta came after beta
<raggi>
"Index ME Edition"
<evan>
whatever is after beta :)
<drbrain>
evan: you're right
<drbrain>
star trek has taught me nothing
_maes_ has quit [Ping timeout: 272 seconds]
<drbrain>
at the current rate this gives us 120 years before we run out of names for index formats
<drbrain>
give or take a decade
<evan>
thats pretty good.
eighthbit has joined #rubygems
_maes_ has joined #rubygems
hahuang65 has joined #rubygems
crandquist has joined #rubygems
<raggi>
what is Builder::XChar, i can't remember
<drbrain>
I think it does UTF-8 byte packing or something like that
<raggi>
ah
<drbrain>
I never bothered figuring out how to remove the builder dependency