erikcw|2 years ago
Ideally you’d use Parquet or ORC if querying from Athena/Trino/Presto. Since they are columnar formats, you will get considerably faster queries and lower query costs for most query patterns other than “select *…”, because the query engine can retrieve just the columns your query needs instead of entire rows.
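To make the column-pruning point concrete, here is a rough back-of-the-envelope sketch. The table, its column names, and the per-value sizes are all made up for illustration, and the estimate ignores compression, encoding, and file metadata, so treat the numbers as relative rather than exact.

```python
# Hypothetical table: average uncompressed bytes per value for each column.
COLUMN_BYTES = {"user_id": 8, "event_time": 8, "url": 120, "payload": 2000}


def bytes_scanned(num_rows, needed_columns, columnar):
    """Estimate bytes read; ignores compression, encoding, and file metadata."""
    if columnar:
        # A columnar reader fetches only the columns the query references.
        per_row = sum(COLUMN_BYTES[c] for c in needed_columns)
    else:
        # A row-oriented reader has to read every full row.
        per_row = sum(COLUMN_BYTES.values())
    return num_rows * per_row


rows = 1_000_000
wanted = ["user_id", "event_time"]
row_scan = bytes_scanned(rows, wanted, columnar=False)
col_scan = bytes_scanned(rows, wanted, columnar=True)
print(f"row-oriented: {row_scan:,} bytes, columnar: {col_scan:,} bytes")
```

With these invented sizes the two-column query reads a small fraction of the data in the columnar case, while a query that needs every column reads the same amount either way, which is the “select *…” exception.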
mateo411|2 years ago
Ideally you will also partition the data, so that queries filtering on the partition columns scan only the matching partitions. Athena charges $5 per terabyte of data scanned, so it's important to get this right.
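A quick sketch of what that pricing means for partitioned versus unpartitioned scans. It uses the $5 per terabyte figure quoted above; the date-partitioned layout and the partition sizes are hypothetical, treating a terabyte as 1024^4 bytes is an assumption, and any per-query minimums or rounding are ignored.

```python
PRICE_PER_TB_USD = 5.0     # figure quoted in the comment above
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
GIB = 1024 ** 3


def query_cost_usd(scanned_bytes):
    """Cost of one query given the number of bytes it scans."""
    return scanned_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


def scanned_with_pruning(partition_sizes, wanted_keys):
    """Bytes scanned when only the partitions in wanted_keys are read."""
    return sum(size for key, size in partition_sizes.items() if key in wanted_keys)


# Hypothetical layout: 30 daily partitions of 100 GiB each.
partitions = {f"2024-01-{day:02d}": 100 * GIB for day in range(1, 31)}

full_scan = sum(partitions.values())
one_day = scanned_with_pruning(partitions, {"2024-01-15"})

print(f"full scan: ${query_cost_usd(full_scan):.2f}")
print(f"one day:   ${query_cost_usd(one_day):.2f}")
```

Under these assumptions a query restricted to a single day costs about one thirtieth of a full scan, and the saving compounds with every query that can filter on the partition key.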