top | item 34771959

stagg | 3 years ago

We're using it to migrate data pipelines in AWS that previously ran on Glue over to Lambda with DuckDB. Glue was too heavyweight, slow, and expensive for our GB-scale data volumes. We consume CSV files, using a Lambda and DuckDB to convert them to Parquet. Then another Lambda loads those Parquet files, applies our transformation logic (deduplication, enrichment, cleanup, etc.), and writes the results back out as Parquet files.

smt88|3 years ago

Hmm, interesting... so basically DuckDB works in this case because there's no way to parallelize the migration of a single volume anyway?

This is definitely a pretty niche case, though, so there must be something more general that this was built to do.