Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Closed
michaelpq opened this issue Apr 30, 2024 · 33 comments
Closed

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

michaelpq opened this issue Apr 30, 2024 · 33 comments

Comments

@michaelpq
Copy link
Collaborator

michaelpq commented Apr 30, 2024

Hi all,

I have that on my tablets for some time. Let's do a new release of this module at the same time as PostgreSQL 17. I have on my TODO list a few things, as of:

  1. Finish any compatibility work required for HEAD.
  2. Fork a new branch called PG17, with changes on HEAD to plug in PG18.
  3. Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.
  4. Feature: Introduce a proper parser for the hint strings. bfb4544 has done most of the work to be able to extract the hints, but we still need to translate the hints to binary trees, which is something that the code now does with some manual steps in the hint type callbacks.
  5. A round table of existing bugs, to see what could be addressed.
  6. More work and analysis is needed for a3646e1, which is still on HEAD. This has been reverted from the back-branches following plan instabilities as of bfaabf1, and I think it's not quite safe yet in terms of parallel handling.
  7. This will be the last release for PG12, with PostgreSQL 12 compatibility, following upstream that will EOL it. So let's be very careful to not break anything on this branch.

The release would be planned for September, potentially in its 2nd week but let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.

If @yamatattsu , @rjuju or any others have any comments, feel free.

@andyatkinson
Copy link

Looking forward to trying out new changes! I'm sure @FranckPachot would be interested as well.

@yamatattsu
Copy link
Member

yamatattsu commented May 1, 2024 via email

@samimseih
Copy link

Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.

#3 is a really important feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.

Just a thought on this feature:

I do think there needs to be a column that tracks the last time the hint was used. IMO, there are existing reasons a last_used is important to have, but especially with queryId changing if a table is dropped and recreated, pg_dump, etc., the user will need to perform maintenance on the table and remove no longer valid queryIds. Of course there is an overhead to continually updating a last_used, and I don't think that accuracy is required here ( maybe update last_used every so often ).

Regards,

Sami Imseih

@fakdaddy75
Copy link

Cant wait for the query_id enhancement. thx

@michaelpq
Copy link
Collaborator Author

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?

Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

@yamatattsu
Copy link
Member

yamatattsu commented May 3, 2024 via email

@michaelpq
Copy link
Collaborator Author

I don't understand the issue with normalized query strings in detail, but if you are having trouble with it,

Duplication. The normalization code in pg_hint_plan is a copy-paste of pg_stat_statements.

I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join? @rjuju , thoughts?

@rjuju
Copy link
Collaborator

rjuju commented May 7, 2024

well, there's an assumption that some normalisation stuff is installed, as now pg_stat_statements can work with custom algorithm. so in theory if your use case is to add hints on queries touching temp tables because you don't issue vacuum and you know that a certain plan is better it could be a good idea to rely on a custom jumbling approach that computes a query identifier based on the relation names for temp tables so that you can actually have a stable queryid.
I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

@michaelpq
Copy link
Collaborator Author

I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

pg_hint_plan uses the normalized query to do lookups of the hint table, using the same normalization as pg_stat_statements where the comments don't matter. There's a point that this reduces debugging if relying only on the normalized string if looking at the hint table, but honestly I'm going to hang on my argument about "let's just use pg_stat_statements" for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.

@rjuju
Copy link
Collaborator

rjuju commented May 7, 2024

my point was more with the curent plan of relying on the queryid only. if we do that there's no need for the query string or normalised query string right? so there would be no need for pgss either, just enable core queryid computation or install a custom one that better suits your needs.

@yamatattsu
Copy link
Member

yamatattsu commented May 7, 2024 via email

@samimseih
Copy link

feature IMO as the current hint_table is hard to use when dealing with
large sql text or the same sql text that could have differences in comments
and white space.
As an extra information, pg_plan_advsr automatically generates normalized
query strings and
automatically registers them in the hint table for use.

I don't understand the issue with normalized query strings in detail, but
if you are having trouble with it,
you might want to try pg_plan_advsr.

Here is a repro of an example that I have seen on the field a few times. In the below case, my hints table includes the following query text

 SELECT * FROM t1 WHERE t1.id = ?; 
postgres=# select * from hint_plan.hints;
 id |         norm_query_string         | application_name |    hints    
----+-----------------------------------+------------------+-------------
  2 | SELECT * FROM t1 WHERE t1.id = ?; |                  | SeqScan(t1)
(1 row)

However, an application might issue a variant ( or variants ) of the query text in which a comment is added.

/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?; 

In this case, the hints table will only work for the uncommented version of the query text, even though they are semantically similar queries.

postgres=# SELECT * FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x1b]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1c]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1b]: planner
LOG:  pg_hint_plan[qno=0x1b]: setup_hint_enforcement index deletion: relation=16404(t1), inhparent=0, current_hint_state=0x11a015578, hint_inhibit_level=0, scanmask=0x1
LOG:  pg_hint_plan[qno=0x1b]: HintStateDump: {used hints:SeqScan(t1)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}
 id 
----
  1
(1 row)

postgres=# /*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1; 
LOG:  pg_hint_plan[qno=0x1d]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x1e]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
INFO:  pg_hint_plan: hint syntax error at or near " APPLICATION 2 "
DETAIL:  Unrecognized hint keyword "APPLICATION".
LOG:  pg_hint_plan[qno=0x1d]: planner: no valid hint
 id 
----
  1
(1 row)


select query, queryid, calls from pg_stat_statements where query like '%t1%';

               query               |       queryid        | calls 
-----------------------------------+----------------------+-------
 SELECT * FROM t1 WHERE t1.id = $1 | -7122654414306258210 |     2
(1 row)

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

postgres=# SELECT *
postgres-# FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x3b]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3c]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3b]: planner: no valid hint
 id 
----
  1
(1 row)

Regards,

Sami

@michaelpq
Copy link
Collaborator Author

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

From the point where the hint table requires the values to be inserted, providing a 64b value is just going to be less error-prone because there are less contents to manipulate..

@michaelpq
Copy link
Collaborator Author

The problem here is two things like: 1. reducing the bloat of the hint table by long query text 2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems.

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

@fakdaddy75
Copy link

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then,
tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

From my lonely perspective this is perfect. Works like oracle using a statement id or hash.

Here's a real world example - no-one has been able to make it work. Meaning - postgres user groups, myself and a consultancy
Never finds a text match in hint_table ...... noone knows what it should be as we cant expose the normalized query $1 the recursive optimizer is looking for.

#180

thanks Michaelpq !!

@michaelpq
Copy link
Collaborator Author

I have begun my butcher work for this release, and began with two things:

  1. Stabilization of the regression tests. A few things have changed in upstream, for the best actually. I have applied these directly to allow the code to run installcheck.
  2. As promised, a patch to integrate query IDs in the hint table: Integrate query IDs with the hint table #193.

@yamatattsu
Copy link
Member

yamatattsu commented May 11, 2024 via email

@yamatattsu
Copy link
Member

yamatattsu commented May 14, 2024 via email

@michaelpq
Copy link
Collaborator Author

So, It's okay from us. :-D

If you have any questions, we can talk about that on Friday in live.

@yamatattsu
Copy link
Member

yamatattsu commented May 15, 2024 via email

@devrimgunduz
Copy link

What is the status in here? v17 beta2 is already out.

@michaelpq
Copy link
Collaborator Author

What is the status in here? v17 beta2 is already out.

Still need a bit of time, but I am towards a release by the end of August with the core changes pushed by the end of next week, before brushing what I can from the bug list. I don't have more plans for in-core changes with the module for this release, only bug fixes.

@michaelpq
Copy link
Collaborator Author

Looking at my calendar, I am proposing the 29th of August for the tarball release. @devrimgunduz

@michaelpq michaelpq self-assigned this Jul 22, 2024
@michaelpq
Copy link
Collaborator Author

The branches have been structured and prepared for the upcoming release, and a new PG17 branch has been created with a github action job for automated tests. At this point it would be only a matter of tagging and creating the tarballs.

Now is time for the real action: clean up of the existing bug list and address issues.
I also have on my list to reassess a3646e1 as it has been impacting the plan stability of some customer cases. This may finish by being reverted.

@michaelpq michaelpq reopened this Jul 23, 2024
@devrimgunduz
Copy link

So, ping :-)

@michaelpq
Copy link
Collaborator Author

So, ping :-)

Yep. I'm aware of that. But my schedule does not give me the possibility to get back to that until the 17th of August. So.

@yamatattsu
Copy link
Member

yamatattsu commented Aug 16, 2024 via email

@michaelpq
Copy link
Collaborator Author

I finished updating .po files for the next release on PG17 branch.

Thanks Yamada-san!

@SeinoYuki
Copy link
Contributor

SeinoYuki commented Aug 20, 2024

Thanks.

Looking at my calendar, I am proposing the 29th of August for the tarball release.

When the release page is created, @shinyaaa will upload the RPMs for RHEL 8/9.

@michaelpq
Copy link
Collaborator Author

Here is an update. I have most of the things I wanted to tackle for the moment in the tree, remains:

Hence in all that, I just have #191 that should get its backpatch.

@michaelpq
Copy link
Collaborator Author

The tags have been pushed, the release notes have been written, the release has been pushed to PGXN and the release announcement has been sent to pgsql-announce, currently being hold for moderation.

So my job here is done. @shinyaaa, could you upload the RPMs, please?

@shinyaaa
Copy link
Contributor

@michaelpq Thank you! I've uploaded the RPMs.

@michaelpq
Copy link
Collaborator Author

Cool, issue closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants