-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggest to create an extended statistics entry about a functional dependency between columns #7
Comments
Hi Stefan, Thanks for your suggestion. :-D Stay safe! |
Thank you for this good news! I always wondered about users complaining about planners having no clue - while they leave Extended Statistics unused. I've learned something from Julien and you (and also from @laurenz ). BTW: So far only option dependencies from Extended Statistics has been mentioned. After all, automatic optimization using Extended Statistics finally got a place ;-). |
Thanks @sfkeller ! Adding such something to detect correlated column is already on my todo list, especially since I added the selectivity estimation error in v2, but you can definitely add an issue to make sure it doesn't get forgotten. |
I guess almost users don't care about a query plan whether it is efficient or not. When they faced performance trouble, they investigated and realized it. From the perspective of query performance, it would be better to check a query plan on performance tests before production.
Hmm... I didn't think them deeply. However, as you might know, you can create ndistinct and mcv of extended statistics at the same time. #Before #After
Thanks to @rjuju! Yours, |
Closed. Thanks! |
After some discussion over here ankane/dexter#32 I'v following suggestion:
Optimization assumes that columns are independent, so a query plan can go wrong if there are functional dependencies (FD) between columns which he doesn't know. Real world datasets often have FDs between columns because of denormalization or just because the data is like this.
Actually, finding FDs for all columns could become a rather hard problem to solve. But hopefully this challenge becomes feasible if applied only to columns as part of given slow queries.
Expected behaviour:
PostgreSQL offers CREATE STATISTICS in order to inform the planner about FDs. So, looking e.g. at the freely available database "ATP tennis tour", and given a slow query involving tourney_id, tourney_name, pg_plan_advsr could make a specific suggestion like this:
The text was updated successfully, but these errors were encountered: