(no title)
Ameo | 5 months ago
It might be the case that SQLite has a simpler or less sophisticated query planner than other databases like Postgres or MariaDB, but in my experience those DBs struggle a lot with good querying planning as well. I’ve spent many hours in the past with issues like Postgres suddenly starting to ignore an index entirely because its computed table data distribution statistics got out of balance, or having to add manual annotations to MariaDB queries like STRAIGHT_JOIN in order to get a query to run faster.
I’m guessing that this is a really hard problem since it doesn’t seem to be really “solved” by any major DB vendor I’ve seen. A lot of modern DB engines like Clickhouse tend to just work around this problem by being so fast at full table scans that they don’t even need any sophisticated indexing set up at all.
crazygringo|5 months ago
This doesn't appear to be true at all.
The order of WHERE conditions does not matter; the order of columns in an index does.
Everything you're describing is pretty much just how indexes fundamentally work in all databases. Which is why you're saying it hasn't been "solved" by anyone.
Indexes aren't magic -- if you understand how they work as a tree, it becomes very clear what can be optimized and what can't.
It is true that occasionally query planners get it wrong, but it's also often the case that your query was written in a non-obvious way that is equivalent in terms of its formal results, but is not idiomatic -- and making it more idiomatic means the query planner can more easily understand which indexes to use where.
Ameo|5 months ago
The order of conditions in a WHERE definitely does matter, especially in cases where the conditions are on non-indexed columns or there are CPU-intensive search operations like regex, string ops, etc.
I just ran this test locally with a table I created that has 50 million rows:
``` » time sqlite3 test.db "select count() from test WHERE a != 'a' AND a != 'b' AND a != 'c' AND a != 'd' AND b != 'c' AND d != 'd' AND e != 'f' AND f = 'g'" sqlite3 test.db 5.50s user 0.72s system 99% cpu 6.225 total » time sqlite3 test.db "select count() from test WHERE f = 'g' AND a != 'a' AND a != 'b' AND a != 'c' AND a != 'd' AND b != 'c' AND d != 'd' AND e != 'f'" sqlite3 test.db 1.51s user 0.72s system 99% cpu 2.231 total ```
The only difference is swapping the `f = 'g'` condition from last to first. That condition never matches in this query, so it's able to fail fast and skip all of the work of checking the other conditions.
SkiFire13|5 months ago
There's only so much you can do with this approach due how to the algorithmic complexity scales as more joins are added. At some points you'll need some additional data structures to speed things up, though they not be indexes in name (e.g. materialized views)
setr|5 months ago
Their problem is instead that getting back to a row, even within a table, is essentially a join. Which is why they fundamentally suck at point lookups, and they strongly favor analytic queries that largely work column-wise.
emschwartz|5 months ago
Ameo|5 months ago
I just ran this test locally with a table I created that has 50 million rows:
``` » time sqlite3 test.db "select count() from test WHERE a != 'a' AND a != 'b' AND a != 'c' AND a != 'd' AND b != 'c' AND d != 'd' AND e != 'f' AND f = 'g'" sqlite3 test.db 5.50s user 0.72s system 99% cpu 6.225 total » time sqlite3 test.db "select count() from test WHERE f = 'g' AND a != 'a' AND a != 'b' AND a != 'c' AND a != 'd' AND b != 'c' AND d != 'd' AND e != 'f'" sqlite3 test.db 1.51s user 0.72s system 99% cpu 2.231 total ```
The only difference is swapping the `f = 'g'` condition from last to first. That condition never matches in this query, so it's able to fail fast and skip all of the work of checking the other conditions.