adamzochowski | 1 year ago
- The DB will generate new plans as necessary when row counts and values change. Putting in hints makes the plan rigid, likely leading to headaches down the line.
- As new Postgres versions come out, its planner will do a better job. Forcing a specific plan might lock the planner into an optimization that is no longer optimal.
In my experience, developers almost never come back to a query with hints to double-check whether the hints are still needed.
Famously, Oracle has query hints that no longer do anything and are simply ignored, but Oracle can't remove them from the query language because that would break too many existing queries.
I like Postgres's stance: if the query planner doesn't do a good job, the DBA should first update table/column statistics, and if things are truly bad, submit a bug to Postgres so the query optimizer itself can be improved.
Saying all that, hint support through a Postgres extension is a good compromise. The Postgres developers don't need to bother with hints, since it's a third-party feature. And DBAs/users who really need hints now have them.
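For illustration, the third-party pg_hint_plan extension embeds hints in a comment at the start of the query. A minimal sketch, assuming a hypothetical table `orders` with an index `orders_customer_idx`:

```sql
-- Hypothetical example using the pg_hint_plan extension;
-- table and index names are made up for illustration.
LOAD 'pg_hint_plan';

/*+ IndexScan(o orders_customer_idx) */
SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = 42;
```

Deleting the `/*+ ... */` comment hands control back to the planner, so the hint can be treated as a removable patch rather than a permanent part of the query.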
williamdclt|1 year ago
That’s a very unhelpful stance when I’m having an incident in production because PG decided to use a stupid query plan. Waiting months to years for a bugfix (which might not even be backported to my PG version) is not a solution.
I agree that hints are a very dangerous thing, but they’re invaluable as an emergency measure
phamilton|1 year ago
Before using a hint or rewriting a query to force a specific plan, I try to push the team to do these things:
1. Run `vacuum analyze` and tune the autovacuum settings. This fixes issues surprisingly often.
2. Increase the statistics targets on the table.
3. Tweak planner settings globally or just for the query. Something like `set local join_collapse_limit = 1` can fix a bad plan. This is pretty similar to hinting, so it's not a huge argument that this is better, beyond not requiring an extension.
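The three steps above can be sketched in SQL; the table and column names here (`orders`, `customer_id`) are hypothetical:

```sql
-- 1. Refresh statistics and reclaim dead tuples.
VACUUM ANALYZE orders;

-- 2. Collect more detailed statistics for a skewed column
--    (the default statistics target is 100), then re-analyze.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;

-- 3. Constrain the planner for one query only. Inside a transaction,
--    SET LOCAL reverts at commit/rollback; join_collapse_limit = 1
--    forces the join order to follow the order written in the query.
BEGIN;
SET LOCAL join_collapse_limit = 1;
SELECT * FROM orders WHERE customer_id = 42;  -- the misbehaving query
COMMIT;
```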
tempest_|1 year ago
Is this a common occurrence that I just have not encountered before?
ahachete|1 year ago
> - The DB will generate new plans as necessary when row counts and values change. Putting in hints makes the plan rigid, likely leading to headaches down the line.
> - As new Postgres versions come out, its planner will do a better job. Forcing a specific plan might lock the planner into an optimization that is no longer optimal.
It depends on what your priority is. Most production environments want to favor predictability over raw performance. I'd rather trade a 10% average performance degradation for consistent query performance.
Even if fresher statistics or new versions could come with 10% better plans, I prefer that my query's performance is predictable and doesn't show high p90s, or, even worse, plan flips that turn your 0.2s, 10K qps query into a 40s query.
kevincox|1 year ago
It would be very interesting if there were some sort of system that ensured query plans don't change "randomly", but that, when it noticed a new plan was expected to be better for most queries, surfaced this in an interface where you could test and approve it.
Maybe it could also support gradual rollout: try the new plan on 1% of queries first, then fully approve it if it does improve performance.
It would be extra cool if this were automatic: new plans could be slowly rolled out by the DB and rolled back if they aren't actually better.
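No such feature exists in Postgres; as a thought experiment, the canary-rollout idea could be sketched as a small controller that routes a fraction of executions to the candidate plan and promotes it once observed latencies favor it (all names and thresholds here are invented for illustration):

```python
import random

class PlanRollout:
    """Hypothetical controller: gradually shift traffic to a candidate
    query plan, promoting it only if it proves faster. A sketch of the
    idea, not a real Postgres mechanism."""

    def __init__(self, canary_fraction=0.01, sample_size=100):
        self.canary_fraction = canary_fraction  # share of traffic on the canary
        self.sample_size = sample_size          # evidence needed before promoting
        self.current = []    # latencies observed under the current plan
        self.candidate = []  # latencies observed under the candidate plan
        self.promoted = False

    def choose_plan(self, rng):
        # Route a small fraction of executions to the candidate plan.
        if self.promoted:
            return "candidate"
        return "candidate" if rng.random() < self.canary_fraction else "current"

    def record(self, plan, latency_ms):
        # Accumulate latency samples per plan.
        (self.candidate if plan == "candidate" else self.current).append(latency_ms)
        # Promote once both plans have enough samples and the candidate
        # has a lower mean latency; otherwise it simply never wins traffic.
        if (not self.promoted
                and len(self.candidate) >= self.sample_size
                and len(self.current) >= self.sample_size):
            cand_mean = sum(self.candidate) / len(self.candidate)
            curr_mean = sum(self.current) / len(self.current)
            if cand_mean < curr_mean:
                self.promoted = True
```

A real system would also need automatic rollback on regression, which the sketch omits.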
jandrewrogers|1 year ago
Many of the issues people run into with the Postgres query planner/optimizer are side effects of architectural peculiarities with no straightforward fix. Some of these issues, such as the limitations of Postgres statistics collection, have been unaddressed for decades because they are extremely difficult to address without major architectural surgery that will have its own side effects.
In my opinion, Postgres either needs to allow people to disable/bypass the query planner with hints or whatever, or commit to the difficult and unpleasant work of fixing the architecture so that these issues no longer occur. This persistent state of affairs of neither fixing the core problems nor allowing a mechanism to work around them is not user friendly. I like and use Postgres but this aspect of the project is quite broken.
The alternative is to put a disclaimer in the Postgres documentation that it should not be used for certain types of common workloads or large data models; many of these issues don't show up in force until databases become quite large.
This is a chronic source of pain in large-scale Postgres installations.
pgaddict|1 year ago
The way I see it, the issues we have in query planning/optimization are largely independent of the overall architecture. They're simply a consequence of relying on statistics too much. But there will always be inaccuracies, because the whole point of statistics is that they're a compact/lossy approximation of the data. No matter how much more we collect, some details will always be missing, until the statistics are far too large to be practical.
There are discussions about considering how "risky" a given plan is (some plan shapes may be more problematic), but I don't think anyone has submitted a patch so far. And even that would not be a perfect solution; no approach relying solely on a priori information can be.
IMHO the only way forward is to work on making plans more robust to this kind of mistake. But that's very hard too.
nextaccountic|1 year ago
What's needed, then, is a benchmark suite that tests whether those hints still give better performance than the hintless query
thesnide|1 year ago
It is certainly a bad practice, but on very particular occasions it is invaluable.
perrygeo|1 year ago
Your data model might contain obvious mistakes. Statistics can be out of date too, e.g. if a table was bulk loaded and never analyzed; `ANALYZE tablename`, done. Sometimes removing unused indexes can improve things. TL;DR: it's always something you need to tune in your own database. When in doubt, the query planner is right and you are wrong. Good engineering means having the intellectual curiosity to exhaust these possibilities before resorting to hints.
Hints are an extreme measure. You're basically saying that you know better than the query planner, from now until eternity, and you choose to optimize by hand. That may be the case, but it requires detailed knowledge of your tables and access patterns. The vast majority of misbehaving query plans just need updated statistics or a better index.
jupp0r|1 year ago
Finding out what went wrong in the query plan by looking at optimizer traces is a lot of work. I did so recently and the trace alone was 317MB.
jandrewrogers|1 year ago
Statistics collection is a weak spot in Postgres and query optimization relies on that information to do its job.
bsdpufferfish|1 year ago
Nope, you just have to know it's fixing a real problem today.
Having a query regress in performance below a KPI would be worse than missing out on a future optimization because of an out-of-date hint.
> Good engineering means having the intellectual curiosity to exhaust these possibilities before resorting to hints
Why is that better?
Luckily we don't have to rely on such grandiose claims. Just try it out. If you find a query that you can tune better than the planner for your data set, then that's a better outcome.
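"Just try it out" amounts to an A/B comparison with `EXPLAIN ANALYZE`. A minimal sketch, assuming a hypothetical `orders` table; `enable_seqscan` here stands in for whatever planner setting steers the plan you want to test:

```sql
-- Plan A: the planner's own choice, with actual run time and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Plan B: discourage sequential scans for this session only,
-- then compare the reported execution times.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
RESET enable_seqscan;
```

If plan B is consistently faster on representative data, that's concrete evidence the planner's choice can be beaten for this query, whatever the general claims about good engineering.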
justinclift|1 year ago
While it's great that "good engineering" exists, it often requires a bunch more time and effort than people have for the task right then.
Being inflexible and always demanding that time is taken is an extremely poor approach, and often doesn't lead to a good quality result.
Instead it often leads to the inflexible thing being retired and a more flexible alternative being used from then on.
RaftPeople|1 year ago
Absolutely.
A query planner does not analyze the complete and precise solution space; none of them do. The query planner will sometimes be extremely wrong. The only logical solution is to provide a mechanism to guide the planner toward the correct solution.
I've worked with DB2, Oracle and MS SQL Server over the last 3 decades and the story is always the same, the planner is not perfect.
heavenlyblue|1 year ago
Yes, I do know better than some heuristic mechanism what data is going to be in the database I am managing.