top | item 6684961

(no title)

rbolkey | 12 years ago

Any best practices or gotchas about query authoring in comparison to a typical relational database? What part of the traditional relational db mentality needs to be changed or thrown out as I write queries. For example, should I avoid using this aggregate function or that join in a way I'm familiar with in the context of something like Postgres.

discuss

electrum|12 years ago

When running against Hive data, Presto is similar to many analytic databases in that queries typically perform a full table or partition scan, so "point lookups" that search for one or a few records will be much less efficient than they would be in an OLTP system like PostgreSQL that has a precomputed index for that query. (This is actually a property of the data source and not the Presto query engine. For example, we are writing an HBase connector that can take advantage of HBase's indexes for very efficient queries.)

In general, you should be able to write your query in the simplest and most readable way, and Presto should execute it efficiently. We already have the start of an advanced optimizer that supports equality inference, full predicate move-around, etc. This means that you don't need to write redundant predicates everywhere as is required with some query engines.

Also, if you are familiar with PostgreSQL, you should feel right at home using Presto. When making decisions for things not covered by ANSI SQL, the first thing we look at is "what does PostgreSQL do".