
Database Benchmarks Lie (If You Let Them)

15 points | exagolo | 1 month ago | exasol.com

12 comments


ugamarkj|1 month ago

This got me curious about our Exasol environment, which we've been running since 2016 at Piedmont Healthcare. We average 2 million queries per day (DDL/DML/DQL). Our query failure rate is ~0.1%. Only 7% of those failures were due to hitting resource limits. The rest were SQL issues: constraint errors, data type issues, etc. Average connected users is ~400. Average concurrent queries is ~7 with a daily max average of ~78 concurrent queries. Avg query time across DQL statements is around 10 seconds, which is only that high due to some extreme outliers -- I have users that like to put 200k values in a WHERE clause IN statement, and Tableau sometimes likes to write gnarly SQL with LOD calcs and relationship models.
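The "200k values in a WHERE clause IN statement" pattern above can be sketched quickly. This is a minimal, hypothetical illustration using sqlite3 with made-up tables and sizes (not Exasol, and not the commenter's actual schema): inlining a huge literal list into the SQL text versus staging the values in a temp table and joining, which is usually kinder to the planner and to statement-size limits.

```python
import sqlite3

# Hypothetical data: a small stand-in for a large fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(i, float(i % 100)) for i in range(10_000)])

# Stand-in for a user's huge value list (kept small here so it runs anywhere).
wanted = list(range(0, 10_000, 20))  # 500 values

# Anti-pattern: inline every value as a placeholder in one giant IN list.
placeholders = ",".join("?" * len(wanted))
inline = conn.execute(
    f"SELECT COUNT(*) FROM sales WHERE id IN ({placeholders})", wanted
).fetchone()[0]

# Alternative: load the values into a temp table, then join against it.
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_ids VALUES (?)", [(i,) for i in wanted])
joined = conn.execute(
    "SELECT COUNT(*) FROM sales s JOIN wanted_ids w ON s.id = w.id"
).fetchone()[0]

print(inline, joined)  # both approaches return the same count
```

Both queries return the same rows; the join form just gives the optimizer a real relation to work with instead of a 200k-term predicate.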

TPC-H benchmarks are what convinced us to purchase Exasol 10 years ago. Still happy with that decision! Congrats to the Exasol team on these results vs ClickHouse.

asteroidtunnel|1 month ago

Very interesting. What are the bottlenecks you've faced with Exasol?

"200k values in a WHERE clause IN statement"? What is that column about?

Average concurrent query is ~7 in what time period?

asteroidtunnel|1 month ago

When I search for "high performance analytical database" on Bing, the AI-summarized results say that ClickHouse, Apache Druid, SingleStore, Couchbase, and Apache Pinot are considered among the best databases for real-time analytics due to their low query latency and high performance.

On Google, the AI-summarized results are ClickHouse, StarRocks, Snowflake, and Google BigQuery.

ClickHouse appears in both, and Exasol is not mentioned. If these claims were relevant, why is it not in the limelight?

ClickHouse is known for ingesting and analyzing massive volumes of time-series data in real time. How good is Exasol for this use case?

dashdoesdata|25 days ago

You need to look at use-case alignment as well as performance.

Apache Pinot, Druid, and ClickHouse are designed for low-latency analytical queries at high concurrency with continuous ingestion. Pinot is popular because of its native integration with streaming systems like Kafka, its varied indexing, and its ability to scale efficiently. They're widely used in observability and user-facing analytics – which is how "real-time analytics databases" are commonly perceived today.

Exasol (and SingleStore, Snowflake, BigQuery, etc.) is more focused on enterprise BI and complex SQL analytics than on application serving or ultra-high-ingest workloads. It performs well for structured analytical queries and joins, but it's less commonly deployed for user-facing analytics or high-volume serving.

There's a good rundown from Tim Berglund in this video: https://startree.ai/resources/what-is-real-time-analytics/

exagolo|1 month ago

If you have a single table with time-series data, then ClickHouse will typically perform better; it's very much optimized for that type of use case. Once you are joining tables and running more advanced analytics, Exasol will easily outperform it.

Exasol has been a performance leader in the market for more than 15 years, as you can see in the official TPC-H publications, but it hasn't yet gotten broader market attention. We are trying to change that now and have recently been more active in developer communities. We also just launched a completely free Exasol Personal edition that can be used for production use cases.

dataDominSA|1 month ago

This is the article I wish existed when we were evaluating platforms. "Reliability under realistic conditions is the first scalability constraint". Speed means nothing if queries don't finish.

exagolo|1 month ago

Traditional database benchmarks focus on throughput and latency – how many queries per second can be processed, how execution time changes as hardware resources increase. This benchmark revealed something different: reliability under realistic conditions is the first scalability constraint.
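The point about measuring reliability alongside throughput can be sketched as a toy benchmark loop. Everything here is hypothetical (the `run_query` simulator, the failure model, the numbers); the shape of the harness is what matters: record failure rate as a first-class metric next to latency instead of reporting latency over successful queries only.

```python
import random
import statistics

def run_query(concurrency: int) -> float:
    """Simulated query: hypothetical model where failures grow with concurrency."""
    if random.random() < min(0.001 * concurrency, 0.5):
        raise RuntimeError("resource limit hit")
    return random.uniform(0.05, 0.2)  # simulated latency in seconds

def benchmark(concurrency: int, n: int = 1000) -> dict:
    """Run n simulated queries and report failure rate alongside latency."""
    latencies, failures = [], 0
    for _ in range(n):
        try:
            latencies.append(run_query(concurrency))
        except RuntimeError:
            failures += 1
    return {
        "concurrency": concurrency,
        "failure_rate": failures / n,
        "median_latency_s": statistics.median(latencies) if latencies else None,
    }

random.seed(42)
for c in (10, 100, 500):
    print(benchmark(c))
```

In a harness like this, a system can post excellent median latency while its failure rate climbs with concurrency, which is exactly the blind spot throughput-only benchmarks have.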

hero-24|1 month ago

From my experience, planning is often the first headache I have to deal with (join order, hash sizing, operator choice), before concurrency and memory even come into play.

exagolo|1 month ago

You mean the "execution plan" for your queries? Ideally, those types of decisions are automatically done by the database.