Suggest to create an extended statistics entry about a functional dependency between columns #7

sfkeller · 2020-05-29T21:00:06Z

After some discussion over here ankane/dexter#32 I'v following suggestion:

Optimization assumes that columns are independent, so a query plan can go wrong if there are functional dependencies (FD) between columns which he doesn't know. Real world datasets often have FDs between columns because of denormalization or just because the data is like this.

Actually, finding FDs for all columns could become a rather hard problem to solve. But hopefully this challenge becomes feasible if applied only to columns as part of given slow queries.

Expected behaviour:

PostgreSQL offers CREATE STATISTICS in order to inform the planner about FDs. So, looking e.g. at the freely available database "ATP tennis tour", and given a slow query involving tourney_id, tourney_name, pg_plan_advsr could make a specific suggestion like this:

create statistics atp_matches_2019_tourney_id_tourney_name (dependencies) 
	on tourney_id, tourney_name 
	from atp_matches_2019;

The text was updated successfully, but these errors were encountered:

yamatattsu · 2020-06-02T06:56:23Z

Hi Stefan,

Thanks for your suggestion. :-D
I knew Extended statistics are useful for creating a good query plan, but I couldn't get enough time to implement it in my extension. However, as you might know, there is a more appropriate extension to get extended statistics than mine. It is pg_qualstats by Julien (@rjuju).
I gave a joint talk about pg_qualstats and pg_plan_advsr with him on PGCon 2020 [1], and also I discuss this topic as you suggested after the event. Probably, He will implement an Extended statistics advising feature in his extension. :-)

[1] https://www.pgcon.org/events/pgcon_2020/schedule/session/134-building-automatic-adviser-and-performance-tuning-tools-in-postgresql/

Stay safe!
Tatsuro Yamada

sfkeller · 2020-06-02T07:24:35Z

Thank you for this good news!

I always wondered about users complaining about planners having no clue - while they leave Extended Statistics unused. I've learned something from Julien and you (and also from @laurenz ).

BTW: So far only option dependencies from Extended Statistics has been mentioned.
=> What about ndistinct and mcv options?

After all, automatic optimization using Extended Statistics finally got a place ;-).
=> Should I open an issue over at https://github.com/powa-team/pg_qualstats/issues ?

rjuju · 2020-06-02T07:42:59Z

Thanks @sfkeller !

Adding such something to detect correlated column is already on my todo list, especially since I added the selectivity estimation error in v2, but you can definitely add an issue to make sure it doesn't get forgotten.

yamatattsu · 2020-06-02T08:46:28Z

Hi @sfkeller and @rjuju,

I always wondered about users complaining about planners having no clue

I guess almost users don't care about a query plan whether it is efficient or not. When they faced performance trouble, they investigated and realized it. From the perspective of query performance, it would be better to check a query plan on performance tests before production.
However, it depends on the culture or custom of a developing system.

What about ndistinct and mcv options?

Hmm... I didn't think them deeply. However, as you might know, you can create ndistinct and mcv of extended statistics at the same time.
The default option of create extended statistics syntax includes dependencies, ndistinct, and mcv (see below). For now, I'm not sure this approach is appropriate or not.

#Before
create statistics atp_matches_2019_tourney_id_tourney_name (dependencies)
on tourney_id, tourney_name
from atp_matches_2019;

#After
create statistics atp_matches_2019_tourney_id_tourney_name
on tourney_id, tourney_name
from atp_matches_2019;

Adding such something to detect correlated column is already on my todo list,

Thanks to @rjuju!

Yours,
Tatsuro Yamada

yamatattsu · 2020-06-03T00:31:36Z

Closed. Thanks!

sfkeller mentioned this issue Jun 2, 2020

Detect columns with dependencies and suggest to create extended statistics on it powa-team/pg_qualstats#32

Open

yamatattsu closed this as completed Jun 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggest to create an extended statistics entry about a functional dependency between columns #7

Suggest to create an extended statistics entry about a functional dependency between columns #7

sfkeller commented May 29, 2020

yamatattsu commented Jun 2, 2020

sfkeller commented Jun 2, 2020

rjuju commented Jun 2, 2020

yamatattsu commented Jun 2, 2020

yamatattsu commented Jun 3, 2020

Suggest to create an extended statistics entry about a functional dependency between columns #7

Suggest to create an extended statistics entry about a functional dependency between columns #7

Comments

sfkeller commented May 29, 2020

yamatattsu commented Jun 2, 2020

sfkeller commented Jun 2, 2020

rjuju commented Jun 2, 2020

yamatattsu commented Jun 2, 2020

yamatattsu commented Jun 3, 2020