rmgraham | 4 years ago
There are definitely ways to do it without those problems, though. They just wouldn't be quite as simple as the approach taken to support zip.
remram|4 years ago
klauspost|4 years ago
This may be feasible for small tar files, and for a single PutObject you could index while uploading. However, for multipart objects the parts can arrive in any order, so you are forced to read the object back. This would lead to unpredictable response times.
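A minimal sketch of the "index while uploading" idea, in Python (the thread has no code, so the language is my choice). It assumes plain ustar entries with no pax or GNU long-name headers, so every member is one 512-byte header followed by its data padded to 512 bytes. `index_tar_stream` is a hypothetical helper, not part of any S3 API. It only ever calls `read()` and never seeks, which is why it works on a single in-order upload but not on multipart parts arriving out of order.

```python
BLOCK = 512  # tar headers and data are both laid out in 512-byte blocks


def index_tar_stream(stream):
    """Build {name: (data_offset, size)} reading a tar strictly front to back.

    Assumes plain ustar members (no pax / GNU long-name headers).
    """
    index = {}
    pos = 0  # absolute byte offset in the archive
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            break  # end-of-archive zero block, or truncated input
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        prefix = header[345:500].split(b"\0", 1)[0].decode("utf-8")
        if prefix:
            name = prefix + "/" + name  # ustar splits long paths into prefix/name
        # size is an octal ASCII field, NUL- or space-terminated
        size = int(header[124:136].split(b"\0", 1)[0].strip() or b"0", 8)
        pos += BLOCK
        index[name] = (pos, size)
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        pos += len(stream.read(padded))  # skip the data without keeping it
    return index
```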
Compare that to reading the directory of a zip, which even on big files is maybe a couple of megabytes at most.
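For comparison, a zip keeps its directory at the end of the file, so a reader only needs the tail. A minimal Python sketch, assuming no ZIP64 and that the tail passed in contains the End Of Central Directory record (it always sits within the last 22 + 65535 bytes, the fixed record plus the maximum comment length). `read_zip_directory_location` is a hypothetical helper name; the record layout is from the zip format itself.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # End Of Central Directory signature
EOCD_LEN = 22             # fixed-size part of the EOCD record


def read_zip_directory_location(tail: bytes):
    """Return (entry_count, cd_size, cd_offset) from the tail of a zip file.

    Assumes no ZIP64 and that `tail` contains the EOCD record.
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < EOCD_LEN:
        raise ValueError("EOCD record not found")
    (_sig, _disk, _cd_disk, _entries_here, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4s4H2LH", tail[pos:pos + EOCD_LEN])
    return total_entries, cd_size, cd_offset
```

`cd_offset` is an absolute file offset, so one more range request for `cd_offset .. cd_offset + cd_size - 1` fetches the whole directory.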
On top of that, tar.gz requires you to decompress from the start to reach any offset. You could recompress while indexing, but an object store mutating your data is IMO a no-no.
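To make the tar.gz point concrete, here is a small Python sketch using the standard `zlib` module: a plain gzip stream has no seek points, so reading a range means inflating and discarding everything before it. `read_gzip_range` is a hypothetical helper; it also returns how many uncompressed bytes it had to produce, which is the cost that grows with the offset.

```python
import zlib


def read_gzip_range(gz, offset, length, chunk=64 * 1024):
    """Return (data, inflated) for uncompressed bytes [offset, offset + length).

    `inflated` counts the uncompressed bytes produced to get there; when the
    range lies inside the data it is always at least offset + length.
    """
    d = zlib.decompressobj(wbits=31)  # 16 + 15: expect gzip header/trailer
    out = bytearray()
    inflated = 0
    for i in range(0, len(gz), chunk):
        data = d.decompress(gz[i:i + chunk])
        out += data[max(offset - inflated, 0):]  # drop bytes before offset
        inflated += len(data)
        if len(out) >= length:
            break
    else:
        tail = d.flush()  # any output still buffered after the last input
        out += tail[max(offset - inflated, 0):]
        inflated += len(tail)
    return bytes(out[:length]), inflated
```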
danudey|4 years ago
Then he just had to write some code to index article names based on which chunk(s) they were in, and boom, random-access compressed archive.
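A rough Python sketch of the chunked approach described above, assuming fixed-target chunks that are each compressed independently with `zlib`. To keep it short, every article lives inside a single chunk (the original mentions articles spanning "chunk(s)", which this does not handle). `CHUNK_TARGET`, `build_chunked_archive` and `read_article` are hypothetical names.

```python
import zlib

CHUNK_TARGET = 64 * 1024  # hypothetical: uncompressed bytes per chunk


def build_chunked_archive(articles):
    """Pack articles into independently compressed chunks.

    Returns (chunks, index); index maps name -> (chunk_no, start, length)
    within that chunk's *uncompressed* bytes.
    """
    chunks, index = [], {}
    buf = bytearray()
    for name, body in articles.items():
        if buf and len(buf) + len(body) > CHUNK_TARGET:
            chunks.append(zlib.compress(bytes(buf)))  # seal the current chunk
            buf = bytearray()
        index[name] = (len(chunks), len(buf), len(body))
        buf += body
    if buf:
        chunks.append(zlib.compress(bytes(buf)))
    return chunks, index


def read_article(chunks, index, name):
    chunk_no, start, length = index[name]
    raw = zlib.decompress(chunks[chunk_no])  # only this one chunk is inflated
    return raw[start:start + length]
```

In a real archive the chunks would be concatenated into one file and the index would also record each chunk's byte offset, so a reader can fetch just that chunk.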
blacha|4 years ago
I work on serving tiled geospatial data [2] (Mapbox vector tiles) to our users as slippy maps, which means serving millions of small (mostly <100KB) files. Our data only changes weekly, so we precompute all the tiles and store them in a tar file in S3.
We compute an index for the tar file, then use S3 range requests to serve the tiles to our users. This means we can generally fetch a tile from S3 with 2 requests (or 1 if the index is cached), usually in ~20-50ms.
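A Python sketch of the request pattern described above, with an in-memory stand-in for S3 so it runs without network access. The separate index object, its JSON layout, and the names `FakeObjectStore` and `TileReader` are all assumptions for illustration, not how cotar actually stores its index. With real S3 the tile fetch would be a GetObject call with a `Range: bytes=start-end` header (for example boto3's `get_object(..., Range=...)`).

```python
import json


def range_header(offset: int, size: int) -> str:
    # HTTP byte ranges are inclusive on both ends
    return f"bytes={offset}-{offset + size - 1}"


class FakeObjectStore:
    """Hypothetical stand-in for S3 that just counts requests."""

    def __init__(self, objects):
        self.objects = objects
        self.requests = 0

    def get(self, key, range_=None):
        self.requests += 1
        body = self.objects[key]
        if range_ is None:
            return body
        start, end = map(int, range_[len("bytes="):].split("-"))
        return body[start:end + 1]


class TileReader:
    def __init__(self, store, tar_key, index_key):
        self.store, self.tar_key, self.index_key = store, tar_key, index_key
        self.index = None  # cached after the first lookup

    def get_tile(self, name):
        if self.index is None:  # first request: fetch the index (request 1)
            self.index = json.loads(self.store.get(self.index_key))
        offset, size = self.index[name]
        # one range request for the tile bytes themselves
        return self.store.get(self.tar_key, range_header(offset, size))
```

The first `get_tile` costs two requests; later ones cost one because the index stays in memory.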
Full coverage of the world with Mapbox vector tiles is around 270M tiles, a ~90GB tar file, which can be computed from OpenStreetMap data [3].
> Though even that would only work with a subset of compression methods or no compression.
We compress the individual files as a workaround. There are options for indexing a compressed (gzip) tar file, but the benefits of a compressed tar over compressed files are small for our use case.
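A Python sketch of the "compress the individual files" workaround: each tile is gzipped on its own and stored in an uncompressed tar, so a range request returns a complete gzip member that can be decompressed (or passed through with `Content-Encoding: gzip`). It uses only the standard `tarfile` and `gzip` modules; `build_tar_of_gzipped_tiles` is a hypothetical name, and the offset calculation assumes ustar format with short names, i.e. exactly one 512-byte header per file.

```python
import gzip
import io
import tarfile


def build_tar_of_gzipped_tiles(tiles):
    """Store each tile gzip-compressed on its own inside a plain tar.

    Returns (tar_bytes, index) with index: name -> (data_offset, size).
    """
    buf = io.BytesIO()
    index = {}
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in tiles.items():
            gz = gzip.compress(data)  # compress this tile independently
            info = tarfile.TarInfo(name)
            info.size = len(gz)
            header_start = buf.tell()
            tar.addfile(info, io.BytesIO(gz))
            # assumption: ustar + short name => one 512-byte header, then data
            index[name] = (header_start + 512, len(gz))
    return buf.getvalue(), index
```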
[1] https://github.com/linz/cotar (or the WIP Rust version, https://github.com/blacha/cotar-rs)
[2] https://github.com/linz/basemaps or https://basemaps.linz.govt.nz
[3] https://github.com/onthegomap/planetiler