Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

sfkeller · 2020-04-13T21:04:08Z

This is perhaps out-of-scope but still worth mentioning as a nice feature:

Optimization assumes that column are independent, so a query plan can go wrong not because of a missing index, but because the optimizer doesn't know about functional dependencies (FD) between columns.

Real world datasets often have FD between columns because of denormalization or just because the data is like this.

PostgreSQL offers CREATE STATISTICS on functional dependencies and there are attempts to detect those, like fdtool (with paper) or this tool. Actually, this potentially hard problem to find FDs in a table/entity becomes linear, if applied only to given slow queries - as Dexter does!

Expected behaviour:

Looking e.g. at this test database from ATP tennis tour given a slow query involving tourney_id, tourney_name in the WHERE clause, Dexter could suggest to do a CREATE STATISTICS on tourney_id, tourney_name like this:

create statistics atp_matches_2019_tourney_id_tourney_name (dependencies) 
	on tourney_id, tourney_name 
	from atp_matches_2019;

analyze atp_matches_2019; -- update extended statistics

select * from pg_statistic_ext; -- now shows the entry of "atp_matches_2019_tourney_id_tourney_name"

Thus, hoping that the planner next time considers this dependency.

BTW: CREATE STATISTICS currently also supports ndistinct, which enables n-distinct statistics, and mcv which enables most-common values lists.

The text was updated successfully, but these errors were encountered:

ankane · 2020-04-14T00:55:14Z

Hey @sfkeller, thanks for the suggestion and the links! If there's an extension to add hypothetical statistics, I think Dexter could use the same approach it uses for indexes. I'm not sure how difficult hypothetical statistics are, but may be worth suggesting to HypoPG.

sfkeller · 2020-05-04T00:21:52Z

I'll suggest hypothetical statistics to HypoPG (see HypoPG/hypopg#49
) - and eventually come back to this issue, if there's any response over at HypoPG.

-- Stefan

sfkeller changed the title ~~Suggest to create extended statistics about dependencies~~ Dexter to suggest to create or to create an extended statistics entry about a functional dependency Apr 13, 2020

ankane closed this as completed May 4, 2020

sfkeller mentioned this issue May 4, 2020

Add functional dependency between columns from extended statistics HypoPG/hypopg#49

Closed

sfkeller mentioned this issue May 29, 2020

Suggest to create an extended statistics entry about a functional dependency between columns ossc-db/pg_plan_advsr#7

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

sfkeller commented Apr 13, 2020 •

edited

Loading

ankane commented Apr 14, 2020

sfkeller commented May 4, 2020 •

edited

Loading

Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

Dexter to suggest to create or to create an extended statistics entry about a functional dependency #32

Comments

sfkeller commented Apr 13, 2020 • edited Loading

ankane commented Apr 14, 2020

sfkeller commented May 4, 2020 • edited Loading

sfkeller commented Apr 13, 2020 •

edited

Loading

sfkeller commented May 4, 2020 •

edited

Loading