ozataman | 8 years ago
For example, you may need to provision a 20-node cluster only because you need the 10+ terabytes of storage across several datasets you have to keep "hot" for sporadic use throughout the day/week, but you don't need nearly that much computational capacity around the clock. Unlike BigQuery, Redshift doesn't separate storage from querying. Redshift also doesn't offer a practically acceptable way to scale up/down: resizes at that scale can take up to a day, deleting and restoring datasets creates a lot of administrative overhead, and even capacity tuning between multiple users is a frequent concern.
Making matters worse, it is common for a small number of tables to be the large "source of truth" tables that you need to keep around to re-populate various intermediate tables, even if they themselves don't get queried that often. In Redshift, you end up provisioning a large cluster just to keep them around, even though 99% of your queries will hit one of the smaller tables.
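To illustrate the pattern, a rough sketch in Redshift SQL (table and column names are made up):

    -- Rebuild a small, frequently queried rollup from the large
    -- "source of truth" table that otherwise sits mostly idle.
    DROP TABLE IF EXISTS daily_revenue;
    CREATE TABLE daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders            -- the multi-terabyte table you keep around
    GROUP BY order_date;

The raw_orders table mostly just occupies disk between rebuilds, but the cluster has to be sized for it anyway.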
That said, I haven't tried the relatively new "query data on S3" Redshift functionality. It doesn't seem to be quite the equivalent of what BigQuery does, but it may alleviate this issue.
Sidenote: I have been a huge Redshift fan pretty much since its release on AWS. I do, however, think that it is starting to lose its edge and show its age against the recent advances in the space; I have been increasingly impressed with the ease of use (including intra-team and even inter-team collaboration) in the BigQuery camp.
joeharris76 | 8 years ago
Spectrum extends that even further, letting you keep recent and reference data stored locally while archival data sits in S3, available for query at any time.
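For anyone who hasn't used it, the setup looks roughly like this (the schema name, IAM role ARN, and bucket are all hypothetical):

    -- Register an external schema backed by the data catalog.
    CREATE EXTERNAL SCHEMA archive
    FROM DATA CATALOG
    DATABASE 'archive_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;

    -- Archival data stays in S3 as Parquet; no cluster disk is used.
    CREATE EXTERNAL TABLE archive.events_2015 (
        event_id   BIGINT,
        user_id    BIGINT,
        event_time TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://my-bucket/events/2015/';

    -- Joins against local Redshift tables work transparently.
    SELECT u.name, COUNT(*) AS events
    FROM archive.events_2015 e
    JOIN users u ON u.id = e.user_id
    GROUP BY u.name;

You pay per terabyte scanned on the S3 side, so the cluster itself only needs to be sized for the hot data.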