top | item 41768371

dmw_ng | 1 year ago

That's been a feature of S3 for quite a long time now, called S3 Select https://docs.aws.amazon.com/AmazonS3/latest/userguide/select...

Despite it being an awesome feature I've been itching to use, I've never actually found a use for it beyond messing around. Most places where S3 Select might make sense seem to be subsumed (for my uses) by Athena. Athena has a rather large amount of conceptual and actual boilerplate to get up and running with, though, whereas S3 Select requires no upfront planning beyond building a fancy query string (or using their SDK wrappers).

Where S3 Select is likely to become fiddly is anywhere multiple files are involved. Athena makes querying large collections of CSVs (etc) straightforward, and handles all the scheduling and results merging for you.
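
For illustration, a rough sketch of what that looks like through the Python SDK, assuming a boto3 S3 client is passed in and the objects are uncompressed CSVs with a header row. The helper names `select_csv` and `select_csv_many` are made up for this example, as are the bucket and key names in the usage comment; only `select_object_content` and its parameters come from boto3. The second helper is the "fiddly" part: with several files you loop and merge the results yourself.

```python
def select_csv(client, bucket, key, expression):
    """Run one S3 Select query against one CSV object and return the matching rows as text."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is a plain CSV whose first line holds column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the raw bytes before decoding, since a chunk may end mid-character.
    return b"".join(chunks).decode("utf-8")


def select_csv_many(client, bucket, keys, expression):
    """Hypothetical fan-out: one request per object, results concatenated in key order."""
    return "".join(select_csv(client, bucket, key, expression) for key in keys)


# Usage (hypothetical bucket and key):
# import boto3
# s3 = boto3.client("s3")
# print(select_csv(s3, "my-bucket", "logs/2024-10-01.csv",
#                  "SELECT s.path FROM S3Object s WHERE s.status = '500'"))
```

Anything beyond plain concatenation (sorting, aggregating across files, retries, parallelism) is left to the caller here, which is exactly the work Athena takes off your hands.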

ju-st|1 year ago

S3 Select is no longer available to new customers. Athena with a columnar file format (e.g. Parquet) in S3, partitioned via the Glue Data Catalog, is the solution for OP's problem. These kinds of queries cost very little because you pay only for the data actually scanned. With a columnar format Athena reads only the columns the query needs, and column data is usually compressed, so the amount of data scanned is smaller still.
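
A minimal sketch of how that looks from Python, assuming a boto3 Athena client is passed in and a table already registered in the Glue Data Catalog. The table `access_logs`, its partition column `dt`, and the helper name `run_athena_query` are hypothetical; the boto3 calls used are `start_query_execution` and `get_query_execution`. The helper returns the bytes scanned, which is the figure the bill is based on.

```python
import time

# Hypothetical table, partitioned by dt and stored as Parquet.
# Naming only the needed columns and filtering on the partition column
# keeps the amount of data Athena has to read small.
SQL = """
SELECT status, count(*) AS hits
FROM access_logs
WHERE dt = '2024-10-01'
GROUP BY status
"""


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for it to finish, and return (query id, bytes scanned)."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")
    scanned = execution.get("Statistics", {}).get("DataScannedInBytes")
    return query_id, scanned


# Usage (hypothetical database and results bucket):
# import boto3
# athena = boto3.client("athena")
# print(run_athena_query(athena, SQL, "weblogs", "s3://my-athena-results/"))
```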

twic|1 year ago

> Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual.

But you and patrickthebold are spot on in pointing out Athena. I've always thought of it as a database you load via S3, but of course it's equally a tool for querying data in S3.