top | item 40764749

1egg0myegg0 | 1 year ago

Howdy! I work at MotherDuck and DuckDB Labs (part time as a blogger). At MotherDuck, we have both client-side and server-side compute! So the initial reduction from PB/TB to GB/MB can happen server side, and the results can be sliced and diced at top speed in your browser!

victor106 | 1 year ago

Does DuckDB work with Delta files?

code_biologist | 1 year ago

Please spend a sentence or two explaining the server-side filtering mechanism and link to the documentation! I would like to know the conditions required for streaming queries! From the sibling comment and a search of the docs, it seems like this is a Parquet-only feature, which seems pretty important to note!

FridgeSeal | 1 year ago

Parquet is designed with predicate pushdown in mind. Partitions are laid out on disk, and then blocks within files are laid out so that consumers can very, very easily narrow in on which files they need to read, before doing any more IO than a directory listing or a small metadata read.

Once you know what you are reading, many Parquet/Arrow libraries support streaming reads and aggregations, so the client doesn't need to load the whole working set into memory.