Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

Closed
sfkeller opened this issue Apr 13, 2020 · 2 comments

Comments

@sfkeller
Copy link

sfkeller commented Apr 13, 2020

This is perhaps out-of-scope but still worth mentioning as a nice feature:

Optimization assumes that column are independent, so a query plan can go wrong not because of a missing index, but because the optimizer doesn't know about functional dependencies (FD) between columns.

Real world datasets often have FD between columns because of denormalization or just because the data is like this.

PostgreSQL offers CREATE STATISTICS on functional dependencies and there are attempts to detect those, like fdtool (with paper) or this tool. Actually, this potentially hard problem to find FDs in a table/entity becomes linear, if applied only to given slow queries - as Dexter does!

Expected behaviour:

Looking e.g. at this test database from ATP tennis tour given a slow query involving tourney_id, tourney_name in the WHERE clause, Dexter could suggest to do a CREATE STATISTICS on tourney_id, tourney_name like this:

create statistics atp_matches_2019_tourney_id_tourney_name (dependencies) 
	on tourney_id, tourney_name 
	from atp_matches_2019;

analyze atp_matches_2019; -- update extended statistics

select * from pg_statistic_ext; -- now shows the entry of "atp_matches_2019_tourney_id_tourney_name"

Thus, hoping that the planner next time considers this dependency.

BTW: CREATE STATISTICS currently also supports ndistinct, which enables n-distinct statistics, and mcv which enables most-common values lists.

@sfkeller sfkeller changed the title Suggest to create extended statistics about dependencies Dexter to suggest to create or to create an extended statistics entry about a functional dependency Apr 13, 2020
@ankane
Copy link
Owner

ankane commented Apr 14, 2020

Hey @sfkeller, thanks for the suggestion and the links! If there's an extension to add hypothetical statistics, I think Dexter could use the same approach it uses for indexes. I'm not sure how difficult hypothetical statistics are, but may be worth suggesting to HypoPG.

@ankane ankane closed this as completed May 4, 2020
@sfkeller
Copy link
Author

sfkeller commented May 4, 2020

I'll suggest hypothetical statistics to HypoPG (see HypoPG/hypopg#49
) - and eventually come back to this issue, if there's any response over at HypoPG.

-- Stefan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants