You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is perhaps out-of-scope but still worth mentioning as a nice feature:
Optimization assumes that column are independent, so a query plan can go wrong not because of a missing index, but because the optimizer doesn't know about functional dependencies (FD) between columns.
Real world datasets often have FD between columns because of denormalization or just because the data is like this.
PostgreSQL offers CREATE STATISTICS on functional dependencies and there are attempts to detect those, like fdtool (with paper) or this tool. Actually, this potentially hard problem to find FDs in a table/entity becomes linear, if applied only to given slow queries - as Dexter does!
Expected behaviour:
Looking e.g. at this test database from ATP tennis tour given a slow query involving tourney_id, tourney_name in the WHERE clause, Dexter could suggest to do a CREATE STATISTICS on tourney_id, tourney_name like this:
create statistics atp_matches_2019_tourney_id_tourney_name (dependencies)
on tourney_id, tourney_name
from atp_matches_2019;
analyze atp_matches_2019; -- update extended statistics
select * from pg_statistic_ext; -- now shows the entry of "atp_matches_2019_tourney_id_tourney_name"
Thus, hoping that the planner next time considers this dependency.
BTW: CREATE STATISTICS currently also supports ndistinct, which enables n-distinct statistics, and mcv which enables most-common values lists.
The text was updated successfully, but these errors were encountered:
sfkeller
changed the title
Suggest to create extended statistics about dependencies
Dexter to suggest to create or to create an extended statistics entry about a functional dependency
Apr 13, 2020
Hey @sfkeller, thanks for the suggestion and the links! If there's an extension to add hypothetical statistics, I think Dexter could use the same approach it uses for indexes. I'm not sure how difficult hypothetical statistics are, but may be worth suggesting to HypoPG.
I'll suggest hypothetical statistics to HypoPG (see HypoPG/hypopg#49
) - and eventually come back to this issue, if there's any response over at HypoPG.
This is perhaps out-of-scope but still worth mentioning as a nice feature:
Optimization assumes that column are independent, so a query plan can go wrong not because of a missing index, but because the optimizer doesn't know about functional dependencies (FD) between columns.
Real world datasets often have FD between columns because of denormalization or just because the data is like this.
PostgreSQL offers CREATE STATISTICS on functional dependencies and there are attempts to detect those, like fdtool (with paper) or this tool. Actually, this potentially hard problem to find FDs in a table/entity becomes linear, if applied only to given slow queries - as Dexter does!
Expected behaviour:
Looking e.g. at this test database from ATP tennis tour given a slow query involving
tourney_id, tourney_name
in the WHERE clause, Dexter could suggest to do a CREATE STATISTICS ontourney_id, tourney_name
like this:Thus, hoping that the planner next time considers this dependency.
BTW: CREATE STATISTICS currently also supports
ndistinct
, which enables n-distinct statistics, andmcv
which enables most-common values lists.The text was updated successfully, but these errors were encountered: