top | item 39940775

tlocke | 1 year ago

I'd like to see an option for automatically adding indexes for performance.

Perhaps a background process could re-run the query planner for frequent queries and add an appropriate index if there's a big speedup.
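A minimal sketch of what such a background tuner's decision rule might look like. Everything here is hypothetical (function names, thresholds, cost numbers); in the real world, PostgreSQL's HypoPG extension lets you ask the planner for cost estimates against hypothetical indexes without building them.

```python
# Hedged sketch of the proposed background tuner's decision rule.
# All names, thresholds, and cost numbers are illustrative, not a real
# planner API.

def should_add_index(freq_per_hour, cost_now, cost_with_index,
                     speedup_threshold=10.0, min_freq=100):
    """Add the index only if the query is frequent AND the planner
    estimates a large speedup with the hypothetical index in place."""
    if freq_per_hour < min_freq:
        return False            # rare query: not worth the write-path cost
    if cost_with_index <= 0:
        return False            # guard against nonsense estimates
    return (cost_now / cost_with_index) >= speedup_threshold

# A hot query that a hypothetical index makes 50x cheaper: add it.
print(should_add_index(5000, cost_now=10_000, cost_with_index=200))  # True
# The same query seen only 3 times an hour: skip it.
print(should_add_index(3, cost_now=10_000, cost_with_index=200))     # False
```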

codegeek|1 year ago

As a developer who is not a DB expert, I always wonder why indexes can't be added automatically based on the queries. So let's say I have a table: once it starts filling with data and the engine sees the typical queries coming in (say, filtering on user id or email), why not just add the index?

FrancoisBosun|1 year ago

Adding an index means the write path suddenly becomes slower. Sometimes a covering index will be very beneficial. « It depends » is usually the right answer. I'm fine with the engine telling me « this query is usually slow » so I can investigate why. I'm not OK with the engine adding indices willy-nilly so that suddenly writing to the DB is 10x slower.

INeedAnother|1 year ago

Indexes aren't free. They take up storage space and reduce the performance of inserts. On top of that, if an index gets too big it can be really slow; at that point the table needs partitioning instead.

In some applications you might value insert performance a lot more than select performance, and not want to pay for the extra storage of an index.
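The write-path cost the two comments above describe can be made concrete with a toy model: every secondary index turns one logical INSERT into one extra physical write. The class and numbers below are purely illustrative.

```python
# Toy model of index write amplification: each secondary index adds one
# physical write per logical INSERT. Illustrative only; real engines
# batch, buffer, and defer this work.

class Table:
    def __init__(self, num_indexes):
        self.num_indexes = num_indexes
        self.physical_writes = 0

    def insert(self, row):
        self.physical_writes += 1                 # heap/table write
        self.physical_writes += self.num_indexes  # one write per index

plain = Table(num_indexes=0)
heavy = Table(num_indexes=9)
for i in range(1000):
    plain.insert(i)
    heavy.insert(i)

print(plain.physical_writes)  # 1000
print(heavy.physical_writes)  # 10000 -> the "10x slower" worry above
```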

mamcx|1 year ago

I work on building an RDBMS, but more importantly, I've watched people misuse RDBMSs for more than 20 years.

People write terrible schemas. ORMs hide complexities and make terrible queries.

Some people refuse to use sophisticated features like DateTime types for storing dates, using strings instead (and I'm not talking about formatting!).

They also don't use JOINs, VIEWs, or FUNCTIONs, or even add indexes.

They build MongoDB-style schemas on top of PostgreSQL, and then build their own query engine on top, one that is nothing like what an RDBMS is happy to deal with.

In short: "trash in, trash out", but for queries, and that is exactly what would trigger a system like this.

Plus:

Query engine optimization IS the HARDEST aspect of building a DB. Example:

https://db.in.tum.de/~radke/papers/hugejoins.pdf

> The largest real-world query that we are aware of accesses more than 4,000 relations.

And you have a lot of constraints:

- More indexes: more data, less RAM, less SPACE.

- More indexes: more query optimization paths, so more complex query optimization.

- Adding more indexes could create suboptimal access patterns.

- And all of that changes over time.

- And those changes could happen within a second. And then you can create an unintentional delay that WILL impact somebody's pocket.
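To put a number on the "more query optimization paths" point above, here is some back-of-the-envelope arithmetic (the scenario is hypothetical, the math is not): every subset of candidate columns is a possible index configuration, and multi-column indexes are order-sensitive on top of that.

```python
# How the tuner's search space blows up: with n candidate single-column
# indexes there are 2**n possible index configurations, before even
# counting multi-column indexes. Illustrative arithmetic only.

from math import perm

def single_col_configs(n):
    # every subset of the n candidate columns is a valid configuration
    return 2 ** n

def multi_col_indexes(n, width):
    # a multi-column index is an ORDERED choice of `width` columns,
    # since (a, b) and (b, a) serve different queries
    return perm(n, width)

print(single_col_configs(20))    # 1048576 configurations from 20 columns
print(multi_col_indexes(20, 3))  # 6840 distinct 3-column indexes
```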

Doing this stuff "silently" is a sure way to add problems that will be very hard to solve or debug. Sure, without it some of the problems remain, but at least it will be possible to see why they happen and to know when they get solved.

If anyone manages to create a solid implementation of this, they will deserve a Nobel Prize, an Oscar, a place in the hall of fame, and will get rich.

skeeter2020|1 year ago

I've always found that this is a great idea that doesn't hold up to the reality of DB tuning. There's no "typical" query: should it optimize for the more common but reasonably performant query, or for the rarer, very slow one? What type of client are you most concerned about? Both? Is that even possible? Which queries are worth the (total) cost of the index? Does a specific index hurt some other workload, like automated integrations, or is it OK to have slow updates in the middle of the night? You can definitely profile over time, and there are tools that help you capture data and make recommendations, but auto-indexing would be a limited-to-poor solution a lot of the time.
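The point that there is no single "typical" query can be sketched numerically: the very same index is a net win under one workload mix and a net loss under another. All numbers below are made up for illustration.

```python
# Sketch of the workload-dependence argument: one index, two workloads,
# opposite verdicts. All timings are invented for illustration.

def net_benefit(reads, writes, read_saving_ms=50.0, write_cost_ms=5.0):
    """Total time saved (positive) or added (negative) by one index."""
    return reads * read_saving_ms - writes * write_cost_ms

analytics = net_benefit(reads=10_000, writes=1_000)    # read-heavy
ingest    = net_benefit(reads=100,    writes=200_000)  # write-heavy

print(analytics > 0)  # True: the index pays for itself
print(ingest > 0)     # False: the "typical query" heuristic backfires
```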