(no title)
nunwuo|10 years ago
That's unfortunately naive. For example, for a certain type of query there are two possible plans. Which one is better depends on the query's parameters and the distribution of the data, and getting the plan wrong can be the difference between the query finishing and the query not finishing. With a stability guarantee you can't even try to make an intelligent choice based on the parameters.
crazygringo|10 years ago
But that's exactly the point. If SQLite got the plan right at first, you know it will always "get it right" in the future, i.e. the query will finish. Maybe it'll go from being optimal to somewhat less optimal, but it'll still be fine -- or at worst you'll see a very, very gradual slowdown (as your tables grow) that you'll have time to investigate later.
With MySQL or Postgres, that can change suddenly: in your example, it can go from the query finishing to the query not finishing, because the database abruptly chooses a different plan. That can be a catastrophic production failure, when a query that used to run in 10ms now takes 10s.
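The stability argument rests on being able to see which plan SQLite picked and trust that it sticks. A minimal sketch with the stdlib `sqlite3` module (the table, column, and index names are hypothetical; the exact plan wording varies between SQLite versions):

```python
import sqlite3

# Inspect the plan SQLite chooses for a parameterized query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
con.execute("CREATE INDEX idx_customer ON orders(customer)")

# EXPLAIN QUERY PLAN returns rows whose last column is a human-readable
# description, e.g. "SEARCH orders USING INDEX idx_customer (customer=?)".
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", (42,)
).fetchall()
for row in plan:
    print(row[-1])
```

Running `EXPLAIN QUERY PLAN` in tests is one way to notice a plan change before it notices you.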
anarazel|10 years ago
Reality is much more complex. Unless your data set size and the distribution continuously stay the same, the algorithm that made complete sense initially (say an index nested loop) will often not make any sense anymore after some growth. Imagine a nested loop over a couple hundred thousand rows where the inner side's index lookup always has to hit storage. That just won't perform and will, if run concurrently, slow down the entire system to a standstill because it consumes all the IOPS. In many cases that slowdown won't be gradual at all - it'll be fast as long as the working set fits entirely into the available memory, and will fall off a cliff very soon afterwards (depending on value distribution).
barrkel|10 years ago
For some data sets, it's reasonably quick (100ms), for others it's quite slow (1600ms).
Whereas if I add 'force index for order by' to the query, I get a predictable 150ms across the board.
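For readers following along in SQLite rather than MySQL, the rough analog of `FORCE INDEX FOR ORDER BY` is the `INDEXED BY` clause, which pins the planner to a named index (or errors out if it can't use it). A minimal sketch with a hypothetical schema:

```python
import sqlite3

# INDEXED BY forces SQLite to use idx_ts; here the index also satisfies
# the ORDER BY, so no separate sort step is needed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, kind TEXT)")
con.execute("CREATE INDEX idx_ts ON events(ts)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events INDEXED BY idx_ts "
    "WHERE kind = ? ORDER BY ts", ("click",)
).fetchall()
for row in plan:
    print(row[-1])
```

The trade-off is the one under discussion: you give up the planner's ability to pick a better plan later in exchange for a predictable one now.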
Predictability is often more important than optimality. Variance in service response time is hard to work around, especially when there's a UI involved. Algorithmically, that often means always using the logarithmic data structure even when its constant overhead is higher on small data.
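A tiny illustration of that last point: membership via a sorted list and binary search costs O(log n) comparisons no matter where the element sits, whereas a linear scan is cheap or expensive depending on position. (This is a generic sketch, not code from the thread.)

```python
import bisect

def contains_sorted(sorted_items, x):
    # Binary search: ~log2(n) comparisons for every lookup, hit or miss.
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

data = sorted(range(0, 1_000_000, 2))  # even numbers only
assert contains_sorted(data, 0)          # best case for a linear scan
assert contains_sorted(data, 999_998)    # worst case for a linear scan
assert not contains_sorted(data, 3)      # misses cost the same as hits
```

A linear scan beats this on tiny inputs, but its response time spans a huge range; the logarithmic structure trades a little average speed for a tight worst-case bound.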
hinkley|10 years ago
Another place where this happens is oversubscribing a server: putting four services that each need 20-30% of the system's memory or network on one box, then not understanding why throughput drops by 90% once the system saturates and starts swapping or dropping packets.