item 35388511

erikcw | 2 years ago

Ideally you’d use Parquet or ORC if you're querying from Athena/Trino/Presto. Because they are columnar formats, you'll get considerably faster queries and lower query costs for most query patterns other than “select *…”: the query engine can read just the columns your query needs instead of entire rows.
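A rough way to see why: below is a toy model, not how Athena or Parquet actually work internally. It assumes a hypothetical 4-column table with fixed-width values and ignores compression and encoding. It just counts the bytes read for a query that needs 2 of the 4 columns, once for a row-oriented file and once for a columnar one.

```python
# Toy model (not engine internals): bytes read for a query needing only
# some columns, row-oriented vs columnar storage. Compression is ignored.
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    width_bytes: int  # hypothetical fixed width per value


# Hypothetical schema and row count, chosen for illustration only.
SCHEMA = [
    Column("user_id", 8),
    Column("event_time", 8),
    Column("url", 200),
    Column("user_agent", 150),
]
NUM_ROWS = 1_000_000


def bytes_scanned_row_format(schema, num_rows):
    # Row-oriented files (e.g. CSV/JSON) are read whole: every column of every row.
    return num_rows * sum(c.width_bytes for c in schema)


def bytes_scanned_columnar(schema, num_rows, needed):
    # Columnar files let the engine read only the columns the query references.
    return num_rows * sum(c.width_bytes for c in schema if c.name in needed)


needed = {"user_id", "event_time"}
row = bytes_scanned_row_format(SCHEMA, NUM_ROWS)
col = bytes_scanned_columnar(SCHEMA, NUM_ROWS, needed)
print(f"row format: {row:,} bytes")
print(f"columnar:   {col:,} bytes ({col / row:.1%} of row format)")
```

With “select *” the needed set is every column, so the columnar layout reads the same amount as the row layout, which is why that pattern doesn't benefit.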

mateo411 | 2 years ago

100 percent correct.

Ideally you'll also partition the data so your queries can prune partitions, i.e. skip reading the partitions their filters exclude. Athena charges $5 per terabyte of data scanned, so it's important to get this right.
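A back-of-the-envelope sketch of what pruning saves, under made-up assumptions: one Hive-style `dt=YYYY-MM-DD` partition per day for March, 50 GB each, the $5/TB figure above, and 1 TB = 10^12 bytes. The bucket path and helper names are hypothetical, and it ignores billing details like per-query minimums and rounding.

```python
# Toy model of partition pruning with Hive-style "dt=YYYY-MM-DD" prefixes
# and a flat per-TB scan price. Paths, sizes, and helpers are hypothetical.
from datetime import date

PRICE_PER_TB_USD = 5.0   # figure quoted above; check current AWS pricing
BYTES_PER_TB = 10**12    # assumption: decimal terabytes

# Hypothetical layout: one partition per day in March 2023, 50 GB each.
partitions = {
    f"s3://example-bucket/events/dt={date(2023, 3, d).isoformat()}/": 50 * 10**9
    for d in range(1, 32)
}


def partition_date(path):
    # Pull the value of the "dt=" key out of the partition prefix.
    key_value = path.rstrip("/").rsplit("/", 1)[-1]
    return date.fromisoformat(key_value.split("=", 1)[1])


def bytes_scanned(parts, start=None, end=None):
    # No filter on the partition column: every partition is read.
    # With a date range, partitions outside [start, end] are skipped entirely.
    total = 0
    for path, size in parts.items():
        d = partition_date(path)
        if start is not None and d < start:
            continue
        if end is not None and d > end:
            continue
        total += size
    return total


def scan_cost_usd(num_bytes):
    return num_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


full = bytes_scanned(partitions)
pruned = bytes_scanned(partitions, date(2023, 3, 1), date(2023, 3, 7))
print(f"whole month: {full:,} bytes, ${scan_cost_usd(full):.2f}")
print(f"first week:  {pruned:,} bytes, ${scan_cost_usd(pruned):.2f}")
```

In this made-up setup, a query filtered to one week reads 7 of 31 partitions instead of all of them. Combined with a columnar format, the engine skips both the unneeded partitions and the unneeded columns.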