This looks very neat. I deal with a lot of plaintext data from a variety of sources, so I find ack/grep and csvkit efficient enough for exploration. I love SQL and SQLite but rarely use them for "fun"; that is, I'll use them once I've committed to building a project, but not for exploration. This seems like it could reduce that friction quite a bit.
If anyone from AWS is here: how is this used internally at Amazon?
The real question is whether Amazon will contribute back to open source. Presto itself is plenty proven and scalable: after all, it was created at Facebook.
It looks really interesting, but I'm surprised they launched it with the CREATE TABLE flow broken. The query you see here was generated by their wizard...
It would be useful if Avro files were supported. That way the data could also be imported into Redshift if needed (Redshift does support Avro).
Other formats are schema-less (JSON, CSV, etc.) or not supported by Redshift (ORC, Parquet). Avro might be less efficient for some queries (it is not a columnar format), but it would still be useful.
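Why a row format can cost more per query when you pay for bytes scanned: a columnar reader can skip the columns a query doesn't touch, while a row reader can't. A toy byte-count sketch, assuming uncompressed UTF-8 fields with no file headers or encoding overhead; the record layout and function names are hypothetical.

```python
# Toy model only: real Avro/ORC/Parquet files add headers, encodings and
# compression, which this ignores. Field names below are hypothetical.

def record_bytes(record: dict) -> int:
    """Size of one record if its fields are stored back to back (UTF-8)."""
    return sum(len(value.encode("utf-8")) for value in record.values())

def bytes_read_row_format(records, columns):
    # Row-oriented (e.g. Avro, CSV): a record's fields sit together, so
    # reading any column means reading every whole record. `columns` is
    # accepted only to mirror the columnar signature; it cannot reduce I/O here.
    return sum(record_bytes(r) for r in records)

def bytes_read_columnar(records, columns):
    # Column-oriented (e.g. ORC, Parquet): each column is stored separately,
    # so only the requested columns are read.
    return sum(len(r[c].encode("utf-8")) for r in records for c in columns)

records = [
    {"user_id": "u001", "url": "/index.html", "user_agent": "Mozilla/5.0 (X11)"}
    for _ in range(1000)
]

print(bytes_read_row_format(records, ["user_id"]))  # 32000
print(bytes_read_columnar(records, ["user_id"]))    # 4000
```

The gap grows with the number of columns a table has that a typical query ignores.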
Just like with BigQuery, a carefully thought-out partitioning scheme is critical, or your queries need to be carefully locked down to prevent excessive table scanning. I burned through my BigQuery trial credit fast by not using partitions during a quick-and-dirty test.
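Partitioning pays off because a filter on the partition column lets the engine skip whole S3 prefixes instead of scanning the full table. A toy sketch of that pruning, assuming Hive-style `dt=YYYY-MM-DD` key prefixes (a layout Athena can use for partitions); the key layout, object sizes and function names are hypothetical, and this models the effect, not Athena's actual planner.

```python
from dataclasses import dataclass

@dataclass
class S3Object:
    key: str      # hypothetical layout, e.g. "logs/dt=2016-11-30/part-0000.csv"
    size_mb: int

def partition_value(key: str, column: str):
    """Return the value of a Hive-style `column=value` path segment, or None."""
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    return None

def scanned_mb(objects, column=None, wanted=None):
    """MB read by a query: every object without a partition filter,
    only objects in the matching partition with one."""
    if column is None:
        return sum(o.size_mb for o in objects)
    return sum(o.size_mb for o in objects if partition_value(o.key, column) == wanted)

# Hypothetical table: 30 daily partitions of 1,000 MB each
objects = [S3Object(f"logs/dt=2016-11-{day:02d}/part-0000.csv", 1000) for day in range(1, 31)]

print(scanned_mb(objects))                      # 30000: full table scan
print(scanned_mb(objects, "dt", "2016-11-30"))  # 1000: one partition
```

Since billing is per byte scanned, the pruned query here would cost 1/30 of the unfiltered one.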
Wondering if I could use this like SQLite for Lambdas. I'd like to build some serverless apps, but the commitment to a monthly fee from DynamoDB puts me off. Could I use Athena to drive down my cost to zero as long as the app is unused?
Note the 10 MB minimum "charge" per query. For small datasets under 10 MB, you'd only get up to 200 queries for the minimum billable $0.01. That's a fairly small number of queries, so probably not that useful. Plus you'd have all kinds of consistency issues if your data were dynamic (S3 is a blob store, not a database; normal S3 consistency guarantees still apply).
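The arithmetic behind the 200 figure: at $5 per TB scanned with a 10 MB per-query minimum, one query costs at least 10 / 10^6 TB × $5 = $0.00005, so one cent covers 200 queries. A minimal sketch, assuming decimal units (1 TB = 10^6 MB) and that scanned data is rounded up to a whole megabyte; the function names are hypothetical. `Fraction` keeps the division exact so the floor isn't thrown off by float rounding.

```python
import math
from fractions import Fraction

PRICE_PER_TB = Fraction(5)   # USD per TB scanned (the $5/TB rate quoted in this thread)
MIN_BILLED_MB = 10           # minimum billed per query
MB_PER_TB = 1_000_000        # assumption: decimal units, 1 TB = 10**6 MB

def query_cost(scanned_mb) -> Fraction:
    """Exact USD cost of one query, billing at least MIN_BILLED_MB."""
    # assumption: scanned data is rounded up to a whole megabyte
    billed_mb = max(MIN_BILLED_MB, math.ceil(scanned_mb))
    return Fraction(billed_mb, MB_PER_TB) * PRICE_PER_TB

def queries_for_budget(budget_usd: str, scanned_mb) -> int:
    """How many such queries fit in a budget given as a decimal string, e.g. "0.01"."""
    return math.floor(Fraction(budget_usd) / query_cost(scanned_mb))

print(query_cost(1))                    # 1/20000, i.e. $0.00005
print(queries_for_budget("0.01", 1))    # 200
print(queries_for_budget("0.01", 500))  # 4
```

Any dataset at or under 10 MB hits the same floor, which is why the count tops out at 200 per cent.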
I'm confused, though. The monthly fees for DynamoDB only apply after you exceed the free tier, and for someone who can't commit to a monthly fee because they expect low usage, shouldn't the free tier be sufficient? (Honest question: I'm looking at using DynamoDB, but comments like this make me think I'm missing something.)
DynamoDB is like $5 or $10 a month? But I understand the need to keep it to a minimum.
Athena is really interesting, and if it lives up to its billing as "Serverless SQL", then they've got a killer product in the pipeline: a future where developers no longer spend time scaling, configuring, maintaining, or planning deployments, but simply upload code and instantly reap the benefits of serverless tech.
The only missing component, which would be a killer feature, is an answer to Azure's Active Directory. It would be nice to have serverless, plug-and-play user authentication and access control that integrates with Lambda and Athena.
I'd imagine some sort of "RoR on Serverless" framework that scaffolds out CRUD, user management and a REST API is in the works as well.
The only potential downside I see at the moment for serverless is the uncertainty around cold boots, which directly affects user experience. It's fine when you've got enough traffic to keep things "warm", but there shouldn't be a dead zone where a call to API Gateway takes many seconds waiting for the Lambda function to fire.
danso|9 years ago
ktamura|9 years ago
kermatt|9 years ago
I wonder if this is essentially a Presto SaaS product?
maslam|9 years ago
spullara|9 years ago
https://www.dropbox.com/s/s4cw5x7yyrdl3ch/Screenshot%202016-...
jakozaur|9 years ago
Even the pricing is the same: $5 / TB of data scanned.
estefan|9 years ago
buremba|9 years ago
nimrody|9 years ago
dhananjayc|9 years ago
nodesocket|9 years ago
neximo64|9 years ago
raghavsethi|9 years ago
cdevs|9 years ago
bsg75|9 years ago
nulagrithom|9 years ago
asteadman|9 years ago
brilliantcode|9 years ago
balls187|9 years ago
balls187|9 years ago
asafm|9 years ago
intrasight|9 years ago
justinsaccount|9 years ago
Simply point to your data in Amazon S3
mrwnmonm|9 years ago