Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

michaelpq · 2024-04-30T23:29:41Z

Hi all,

I have that on my tablets for some time. Let's do a new release of this module at the same time as PostgreSQL 17. I have on my TODO list a few things, as of:

Finish any compatibility work required for HEAD.
Fork a new branch called PG17, with changes on HEAD to plug in PG18.
Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.
Feature: Introduce a proper parser for the hint strings. bfb4544 has done most of the work to be able to extract the hints, but we still need to translate the hints to binary trees, which is something that the code now does with some manual steps in the hint type callbacks.
A round table of existing bugs, to see what could be addressed.
More work and analysis is needed for a3646e1, which is still on HEAD. This has been reverted from the back-branches following plan instabilities as of bfaabf1, and I think it's not quite safe yet in terms of parallel handling.
This will be the last release for PG12, with PostgreSQL 12 compatibility, following upstream that will EOL it. So let's be very careful to not break anything on this branch.

The release would be planned for September, potentially in its 2nd week but let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.

If @yamatattsu , @rjuju or any others have any comments, feel free.

The text was updated successfully, but these errors were encountered:

andyatkinson · 2024-05-01T00:37:34Z

Looking forward to trying out new changes! I'm sure @FranckPachot would be interested as well.

yamatattsu · 2024-05-01T09:23:36Z

Hi Michael-san!

3. Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead? The release would be planned for September, potentially in its 2nd week but

let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4. If @yamatattsu <https://github.com/yamatattsu> , @rjuju <https://github.com/rjuju> or any others have any comments, feel free.

For now, The schedule seems fine for me. I'd like to add this item to the TODO list: 6. Add new Japanese translation file (*.po) for the document Message ID: ***@***.***>

…

samimseih · 2024-05-01T18:37:41Z

Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.

#3 is a really important feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.

Just a thought on this feature:

I do think there needs to be a column that tracks the last time the hint was used. IMO, there are existing reasons a last_used is important to have, but especially with queryId changing if a table is dropped and recreated, pg_dump, etc., the user will need to perform maintenance on the table and remove no longer valid queryIds. Of course there is an overhead to continually updating a last_used, and I don't think that accuracy is required here ( maybe update last_used every so often ).

Regards,

Sami Imseih

fakdaddy75 · 2024-05-01T21:43:41Z

Cant wait for the query_id enhancement. thx

michaelpq · 2024-05-01T21:57:04Z

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?

Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

yamatattsu · 2024-05-03T06:58:32Z

Hi Michael-san and Sami, Yes, it means to wipe out entirely this dependency from the tree. It would

still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

Ah, I see. In other words, pg_stat_statements is required as a prerequisite for using the hint table of pg_hint_plan. Right? #3 <#3> is a really important

feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.

As an extra information, pg_plan_advsr automatically generates normalized query strings and automatically registers them in the hint table for use. I don't understand the issue with normalized query strings in detail, but if you are having trouble with it, you might want to try pg_plan_advsr.

…

On Thu, May 2, 2024 at 6:57 AM Michael Paquier ***@***.***> wrote: Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead? Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring. — Reply to this email directly, view it on GitHub <#190 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADSHQG2VEVZNHSZFDQKYB7DZAFQMLAVCNFSM6AAAAABHBEXSC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBZGE4TQMRUGM> . You are receiving this because you were mentioned.Message ID: ***@***.***>

michaelpq · 2024-05-07T03:01:44Z

I don't understand the issue with normalized query strings in detail, but if you are having trouble with it,

Duplication. The normalization code in pg_hint_plan is a copy-paste of pg_stat_statements.

I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join? @rjuju , thoughts?

rjuju · 2024-05-07T03:44:25Z

well, there's an assumption that some normalisation stuff is installed, as now pg_stat_statements can work with custom algorithm. so in theory if your use case is to add hints on queries touching temp tables because you don't issue vacuum and you know that a certain plan is better it could be a good idea to rely on a custom jumbling approach that computes a query identifier based on the relation names for temp tables so that you can actually have a stable queryid.
I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

michaelpq · 2024-05-07T04:03:26Z

I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

pg_hint_plan uses the normalized query to do lookups of the hint table, using the same normalization as pg_stat_statements where the comments don't matter. There's a point that this reduces debugging if relying only on the normalized string if looking at the hint table, but honestly I'm going to hang on my argument about "let's just use pg_stat_statements" for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.

rjuju · 2024-05-07T04:29:02Z

my point was more with the curent plan of relying on the queryid only. if we do that there's no need for the query string or normalised query string right? so there would be no need for pgss either, just enable core queryid computation or install a custom one that better suits your needs.

yamatattsu · 2024-05-07T11:01:20Z

I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join?

Ah, right, pg_plan_advsr uses pg_hint_plan's normalize_query.h. I thought the problem was that it was difficult to create a normalized query string, so I proposed to use pg_plan_advsr. But I understand that it is not the problem on this thread. I'm going to hang on my argument about "let's just use pg_stat_statements"

for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.

The problem here is two things like: 1. reducing the bloat of the hint table by long query text 2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems. P.S. pg_plan_advsr uses queryid by using compute_query_id since PG14, so I guess it's okay to use it in the hint table.

samimseih · 2024-05-07T13:31:07Z

feature IMO as the current hint_table is hard to use when dealing with
large sql text or the same sql text that could have differences in comments
and white space.
As an extra information, pg_plan_advsr automatically generates normalized
query strings and
automatically registers them in the hint table for use.

I don't understand the issue with normalized query strings in detail, but
if you are having trouble with it,
you might want to try pg_plan_advsr.

Here is a repro of an example that I have seen on the field a few times. In the below case, my hints table includes the following query text

 SELECT * FROM t1 WHERE t1.id = ?;

postgres=# select * from hint_plan.hints;
 id |         norm_query_string         | application_name |    hints    
----+-----------------------------------+------------------+-------------
  2 | SELECT * FROM t1 WHERE t1.id = ?; |                  | SeqScan(t1)
(1 row)

However, an application might issue a variant ( or variants ) of the query text in which a comment is added.

/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;

In this case, the hints table will only work for the uncommented version of the query text, even though they are semantically similar queries.

postgres=# SELECT * FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x1b]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1c]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1b]: planner
LOG:  pg_hint_plan[qno=0x1b]: setup_hint_enforcement index deletion: relation=16404(t1), inhparent=0, current_hint_state=0x11a015578, hint_inhibit_level=0, scanmask=0x1
LOG:  pg_hint_plan[qno=0x1b]: HintStateDump: {used hints:SeqScan(t1)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}
 id 
----
  1
(1 row)

postgres=# /*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1; 
LOG:  pg_hint_plan[qno=0x1d]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x1e]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
INFO:  pg_hint_plan: hint syntax error at or near " APPLICATION 2 "
DETAIL:  Unrecognized hint keyword "APPLICATION".
LOG:  pg_hint_plan[qno=0x1d]: planner: no valid hint
 id 
----
  1
(1 row)


select query, queryid, calls from pg_stat_statements where query like '%t1%';

               query               |       queryid        | calls 
-----------------------------------+----------------------+-------
 SELECT * FROM t1 WHERE t1.id = $1 | -7122654414306258210 |     2
(1 row)

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

postgres=# SELECT *
postgres-# FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x3b]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3c]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3b]: planner: no valid hint
 id 
----
  1
(1 row)

Regards,

Sami

michaelpq · 2024-05-08T03:10:48Z

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

From the point where the hint table requires the values to be inserted, providing a 64b value is just going to be less error-prone because there are less contents to manipulate..

michaelpq · 2024-05-08T03:12:41Z

The problem here is two things like: 1. reducing the bloat of the hint table by long query text 2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems.

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

fakdaddy75 · 2024-05-08T17:42:33Z

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then,
tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

From my lonely perspective this is perfect. Works like oracle using a statement id or hash.

Here's a real world example - no-one has been able to make it work. Meaning - postgres user groups, myself and a consultancy
Never finds a text match in hint_table ...... noone knows what it should be as we cant expose the normalized query $1 the recursive optimizer is looking for.

#180

thanks Michaelpq !!

michaelpq · 2024-05-10T07:56:45Z

I have begun my butcher work for this release, and began with two things:

Stabilization of the regression tests. A few things have changed in upstream, for the best actually. I have applied these directly to allow the code to run installcheck.
As promised, a patch to integrate query IDs in the hint table: Integrate query IDs with the hint table #193.

yamatattsu · 2024-05-11T04:56:56Z

/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;

postgres=# SELECT *

postgres-# FROM t1 WHERE t1.id = 1;

Thanks for the clarification! I understand the problems.

yamatattsu · 2024-05-14T12:37:19Z

Michael-san,

2. As promised, a patch to integrate query IDs in the hint table: #193 <#193>.

FYI: I talked about using query_id in the hint table with Horiguchi-san today, and He has no objections. So, It's okay from us. :-D

michaelpq · 2024-05-14T22:42:10Z

So, It's okay from us. :-D

If you have any questions, we can talk about that on Friday in live.

yamatattsu · 2024-05-15T23:57:28Z

If you have any questions, we can talk about that on Friday in live.

Sounds good! Thanks.

devrimgunduz · 2024-07-17T15:56:24Z

What is the status in here? v17 beta2 is already out.

michaelpq · 2024-07-19T05:51:27Z

What is the status in here? v17 beta2 is already out.

Still need a bit of time, but I am towards a release by the end of August with the core changes pushed by the end of next week, before brushing what I can from the bug list. I don't have more plans for in-core changes with the module for this release, only bug fixes.

michaelpq · 2024-07-21T23:10:13Z

Looking at my calendar, I am proposing the 29th of August for the tarball release. @devrimgunduz

michaelpq · 2024-07-22T22:49:19Z

The branches have been structured and prepared for the upcoming release, and a new PG17 branch has been created with a github action job for automated tests. At this point it would be only a matter of tagging and creating the tarballs.

Now is time for the real action: clean up of the existing bug list and address issues.
I also have on my list to reassess a3646e1 as it has been impacting the plan stability of some customer cases. This may finish by being reverted.

devrimgunduz · 2024-07-30T13:40:30Z

So, ping :-)

michaelpq · 2024-08-01T13:42:53Z

So, ping :-)

Yep. I'm aware of that. But my schedule does not give me the possibility to get back to that until the 17th of August. So.

yamatattsu · 2024-08-16T06:51:43Z

Hi Michael-san, I finished updating .po files for the next release on PG17 branch. 7757374 Thanks, Tatsuro Yamada

michaelpq · 2024-08-18T07:26:43Z

I finished updating .po files for the next release on PG17 branch.

Thanks Yamada-san!

SeinoYuki · 2024-08-20T06:05:10Z

Thanks.

Looking at my calendar, I am proposing the 29th of August for the tarball release.

When the release page is created, @shinyaaa will upload the RPMs for RHEL 8/9.

michaelpq · 2024-08-21T05:51:39Z

Here is an update. I have most of the things I wanted to tackle for the moment in the tree, remains:

Instability of pg_hint_plan.sql test in PG16 + master on a slow box (or when testing with valgrind). #161, this is not really mandatory for this release, and it requires more careful redesign of the tests.
leading hint + join methods hint cannot totally force the join order. #195, where I've asked for more input, because there is no way I am going to spend time second-guessing things.
Query Errors if pg_hint_plan.enable_hint_table and enable_hint is enabled but hints table does not exist #191, where the patch should be simple, and involve a check that the extension exists.
Hints do not work on subqueries with DISTINCT #187, this one is something I'd rather treat as a new feature. I'm working with someone else to see what we could do here, only for v18~ so no impact for the upcoming release.

Hence in all that, I just have #191 that should get its backpatch.

michaelpq · 2024-08-29T03:56:23Z

The tags have been pushed, the release notes have been written, the release has been pushed to PGXN and the release announcement has been sent to pgsql-announce, currently being hold for moderation.

So my job here is done. @shinyaaa, could you upload the RPMs, please?

shinyaaa · 2024-08-29T10:18:16Z

@michaelpq Thank you! I've uploaded the RPMs.

michaelpq · 2024-08-29T22:46:32Z

Cool, issue closed.

michaelpq added help wanted release labels Apr 30, 2024

This was referenced May 8, 2024

Cant get normalized text to work with pg plan hint #180

Closed

Hint table doesn't seem to have effect on query plans #183

Closed

michaelpq self-assigned this Jul 22, 2024

michaelpq closed this as completed Jul 22, 2024

michaelpq reopened this Jul 23, 2024

michaelpq closed this as completed Aug 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

michaelpq commented Apr 30, 2024 •

edited

Loading

andyatkinson commented May 1, 2024

yamatattsu commented May 1, 2024 via email

samimseih commented May 1, 2024

fakdaddy75 commented May 1, 2024

michaelpq commented May 1, 2024

yamatattsu commented May 3, 2024 via email

michaelpq commented May 7, 2024

rjuju commented May 7, 2024

michaelpq commented May 7, 2024

rjuju commented May 7, 2024

yamatattsu commented May 7, 2024 via email

samimseih commented May 7, 2024

michaelpq commented May 8, 2024

michaelpq commented May 8, 2024

fakdaddy75 commented May 8, 2024

michaelpq commented May 10, 2024

yamatattsu commented May 11, 2024 via email

yamatattsu commented May 14, 2024 via email

michaelpq commented May 14, 2024

yamatattsu commented May 15, 2024 via email

devrimgunduz commented Jul 17, 2024

michaelpq commented Jul 19, 2024

michaelpq commented Jul 21, 2024

michaelpq commented Jul 22, 2024

devrimgunduz commented Jul 30, 2024

michaelpq commented Aug 1, 2024

yamatattsu commented Aug 16, 2024 via email

michaelpq commented Aug 18, 2024

SeinoYuki commented Aug 20, 2024 •

edited

Loading

michaelpq commented Aug 21, 2024

michaelpq commented Aug 29, 2024

shinyaaa commented Aug 29, 2024

michaelpq commented Aug 29, 2024

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Comments

michaelpq commented Apr 30, 2024 • edited Loading

andyatkinson commented May 1, 2024

yamatattsu commented May 1, 2024 via email

samimseih commented May 1, 2024

fakdaddy75 commented May 1, 2024

michaelpq commented May 1, 2024

yamatattsu commented May 3, 2024 via email

michaelpq commented May 7, 2024

rjuju commented May 7, 2024

michaelpq commented May 7, 2024

rjuju commented May 7, 2024

yamatattsu commented May 7, 2024 via email

samimseih commented May 7, 2024

michaelpq commented May 8, 2024

michaelpq commented May 8, 2024

fakdaddy75 commented May 8, 2024

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

michaelpq commented May 10, 2024

yamatattsu commented May 11, 2024 via email

yamatattsu commented May 14, 2024 via email

michaelpq commented May 14, 2024

yamatattsu commented May 15, 2024 via email

devrimgunduz commented Jul 17, 2024

michaelpq commented Jul 19, 2024

michaelpq commented Jul 21, 2024

michaelpq commented Jul 22, 2024

devrimgunduz commented Jul 30, 2024

michaelpq commented Aug 1, 2024

yamatattsu commented Aug 16, 2024 via email

michaelpq commented Aug 18, 2024

SeinoYuki commented Aug 20, 2024 • edited Loading

michaelpq commented Aug 21, 2024

michaelpq commented Aug 29, 2024

shinyaaa commented Aug 29, 2024

michaelpq commented Aug 29, 2024

michaelpq commented Apr 30, 2024 •

edited

Loading

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then,
tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

SeinoYuki commented Aug 20, 2024 •

edited

Loading