[YSQL][ASH] Refactor PG Active Session History code #20180

Closed
1 task done
abhinab-yb opened this issue Dec 6, 2023 · 0 comments
Labels: area/ysql (Yugabyte SQL (YSQL)), kind/bug (This issue is a bug), priority/medium (Medium priority issue)

abhinab-yb commented Dec 6, 2023

Jira Link: DB-9125

Description

Currently, the intended way to enable ASH is the gflag TEST_yb_enable_ash. However, because some of the code lives in the yb_ash extension, the feature can be turned on partially by turning off the gflag and adding yb_ash to the shared_preload_libraries gflag. This starts the collector and allocates shared memory, but the rest of the feature doesn't work.

The code should be moved out of the extension and the gflag should be the only way to turn on/off ASH.
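
A minimal sketch of the failure mode described above, not the actual YugabyteDB code. It models the two switches as booleans and reports which parts of ASH come up for each combination. All names here (`AshComponentsSketch`, `ash_components_before`, `ash_components_after`, `ash_is_partial`) are hypothetical, and the sketch assumes that turning on the gflag also brings up the collector and shared memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of which parts of ASH are active. */
typedef struct AshComponentsSketch
{
	bool collector_running;	/* background worker started */
	bool shmem_allocated;	/* shared memory allocated */
	bool rest_of_feature;	/* everything else guarded by the gflag */
} AshComponentsSketch;

/*
 * Before the refactor: preloading the extension alone starts the collector
 * and allocates shared memory, while the rest follows the gflag.
 * Assumption: the gflag also implies the collector and shared memory.
 */
static AshComponentsSketch
ash_components_before(bool gflag_enabled, bool extension_preloaded)
{
	AshComponentsSketch c;

	c.collector_running = gflag_enabled || extension_preloaded;
	c.shmem_allocated = gflag_enabled || extension_preloaded;
	c.rest_of_feature = gflag_enabled;
	return c;
}

/* After the refactor: the gflag is the only switch; the preload is ignored. */
static AshComponentsSketch
ash_components_after(bool gflag_enabled, bool extension_preloaded)
{
	AshComponentsSketch c;

	(void) extension_preloaded;
	c.collector_running = gflag_enabled;
	c.shmem_allocated = gflag_enabled;
	c.rest_of_feature = gflag_enabled;
	return c;
}

/* True when some components are on but the feature as a whole is not. */
static bool
ash_is_partial(const AshComponentsSketch *c)
{
	assert(c != NULL);
	return (c->collector_running || c->shmem_allocated) && !c->rest_of_feature;
}
```

With the gflag off and the extension preloaded, `ash_components_before` yields a partial state, while `ash_components_after` yields a fully disabled one.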

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@abhinab-yb abhinab-yb added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Dec 6, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Dec 6, 2023
abhinab-yb added a commit that referenced this issue Dec 7, 2023
…k it with Perform RPCs

Summary:
This diff adds Active Session History (ASH) metadata to the outgoing `Perform` RPCs from `pggate` to the local tserver so that the wait states in the tserver layer have enough context. The ASH metadata consists of the following fields:

- `root_request_id` (16 bytes): A unique id corresponding to a YSQL query.
- `yql_endpoint_tserver_uuid` (16 bytes): UUID of the node where the YSQL query originated in bytes.
- `rpc_request_id` (8 bytes): A single YSQL query can generate multiple RPCs, and `rpc_request_id` distinguishes them. It is not globally unique and is generated for incoming requests. For PG, it's the same as the last 8 bytes of `root_request_id`.
- `query_id` (8 bytes): Query id as seen on pg_stat_statements to identify identical normalized queries. There might be many queries with different root_request_id but with the same query_id.
- `client_addr` (16 bytes): IP address of the client which sent the query to PG.
- `client_port` (2 bytes): Port of the client which sent the query to PG.
- `addr_family` (1 byte): Stores the address family of the socket.

`client_addr` and `client_port` are null if `addr_family` is not `AF_INET` or `AF_INET6`.
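
The field list above can be sketched as a C struct. This is an illustration under stated assumptions, not the definition used in the diff: the struct and function names (`AshMetadataSketch`, `ash_rpc_request_id_from_root`, `ash_client_endpoint_is_set`) are hypothetical, and the byte order used when copying the last 8 bytes of `root_request_id` is simply the host's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/* Hypothetical layout mirroring the field sizes listed above. */
typedef struct AshMetadataSketch
{
	uint8_t		root_request_id[16];			/* unique per YSQL query */
	uint8_t		yql_endpoint_tserver_uuid[16];	/* originating node */
	uint64_t	rpc_request_id;					/* per-RPC, not globally unique */
	uint64_t	query_id;						/* as in pg_stat_statements */
	uint8_t		client_addr[16];				/* large enough for IPv6 */
	uint16_t	client_port;
	uint8_t		addr_family;					/* socket address family */
} AshMetadataSketch;

/*
 * For PG, rpc_request_id equals the last 8 bytes of root_request_id.
 * Assumption: the bytes are copied as-is, in host byte order.
 */
static uint64_t
ash_rpc_request_id_from_root(const uint8_t *root_request_id)
{
	uint64_t	id;

	assert(root_request_id != NULL);
	memcpy(&id, root_request_id + 8, sizeof(id));
	return id;
}

/* client_addr and client_port carry a value only for IPv4/IPv6 sockets. */
static bool
ash_client_endpoint_is_set(const AshMetadataSketch *m)
{
	assert(m != NULL);
	return m->addr_family == AF_INET || m->addr_family == AF_INET6;
}
```

A reader of the metadata would check `ash_client_endpoint_is_set` before interpreting `client_addr` and `client_port`, for example when the backend is connected over a Unix domain socket.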

The ASH metadata is stored in `PGPROC` because the ASH collector should be able to read this data. Another reason is that `wait_event_info` is also stored inside `PGPROC` which will be used by ASH to determine the wait event.

For utility statements, the query id is always set to zero and needs special handling; this will be addressed in a future diff. The ASH metadata instrumentation will be polished in future diffs.

This diff also updates the `yb_enable_ash` GFlag from a preview flag to a test flag.

Additionally, ASH can currently be enabled or disabled by means other than the `yb_enable_ash` GFlag, which is not intended. See issue #20180 for details.

Upgrade/Rollback safety: Safe to upgrade/downgrade. The proto changes only affect the new functionality (ASH) and do not interfere with existing functionality. ASH will be unavailable if downgraded. Moreover, the `yb_enable_ash` GFlag guards the feature.
Jira: DB-7933

Test Plan: Jenkins: compile-only

Reviewers: hbhanawat, amitanand, myang, jason

Reviewed By: jason

Subscribers: jason, yql, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28659
@hbhanawat hbhanawat removed the status/awaiting-triage Issue awaiting triage label Dec 8, 2023
@hbhanawat hbhanawat changed the title [YSQL][ActiveHistory] Refactor PG Active Session History code [YSQL][ASH] Refactor PG Active Session History code Jan 3, 2024
abhinab-yb added a commit that referenced this issue Jan 9, 2024
Summary:
D29238 / f1f252a introduced ASH as an extension with a background worker that
collects ASH samples from PG and the local tserver, and a circular buffer in
shared memory. It was possible to turn the feature on partially by turning off
the gflag `TEST_yb_enable_ash` and adding the extension `yb_ash` to the
`shared_preload_libraries` GUC. This starts the collector and allocates shared
memory, but the rest of the feature doesn't work.

This diff moves the code from extension to core PG so that the only way to turn on/off
the feature is by using the gflag.

List of changes
- The ASH GUC variables are initialized in `guc.c`
- The hooks to instrument ASH metadata are installed while initializing the postgres backend
- Shared memory for ASH is allocated and initialized during postmaster start
- A new builtin tranche id is added for the circular buffer lock
- The background worker is added to the list of internal workers as it starts under postmaster
- The library name of the background worker is updated to `postgres` so that it starts as an internal worker (some PG checks require this)
- `YbStoreAshSamples` no longer needs a callback; it can call the `yb_ash.h` function directly
- The external functions of `yb_ash.h` are renamed to PascalCase. The internal functions will be updated in a separate diff
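
The circular buffer mentioned above can be illustrated with a small fixed-capacity ring. This is a sketch under assumptions, not the code from the diff: the type and function names (`AshSampleSketch`, `AshRingSketch`, `ash_ring_store`, `ash_ring_get`) and the capacity are hypothetical, the sample holds only two illustrative fields, it assumes the oldest sample is overwritten once the buffer is full, and it omits the lock that protects the real buffer in shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASH_SKETCH_CAPACITY 4	/* hypothetical; tiny for illustration */

/* Hypothetical sample with two illustrative fields. */
typedef struct AshSampleSketch
{
	uint64_t	query_id;
	uint32_t	wait_event_info;
} AshSampleSketch;

typedef struct AshRingSketch
{
	AshSampleSketch samples[ASH_SKETCH_CAPACITY];
	size_t		next;	/* slot the next store writes to */
	size_t		count;	/* valid samples, at most the capacity */
} AshRingSketch;

static void
ash_ring_init(AshRingSketch *r)
{
	assert(r != NULL);
	r->next = 0;
	r->count = 0;
}

/* Store a sample; once the ring is full, the oldest sample is overwritten. */
static void
ash_ring_store(AshRingSketch *r, AshSampleSketch s)
{
	assert(r != NULL);
	r->samples[r->next] = s;
	r->next = (r->next + 1) % ASH_SKETCH_CAPACITY;
	if (r->count < ASH_SKETCH_CAPACITY)
		r->count++;
}

/* Return the i-th retained sample (0 is the oldest), or NULL if out of range. */
static const AshSampleSketch *
ash_ring_get(const AshRingSketch *r, size_t i)
{
	size_t		oldest;

	assert(r != NULL);
	if (i >= r->count)
		return NULL;
	oldest = (r->next + ASH_SKETCH_CAPACITY - r->count) % ASH_SKETCH_CAPACITY;
	return &r->samples[(oldest + i) % ASH_SKETCH_CAPACITY];
}
```

In a multi-process setting such as PG, every `ash_ring_store` and `ash_ring_get` call would need to run under the buffer's lock, which is what the new builtin tranche id is for.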
Jira: DB-9125

Test Plan: Jenkins

Reviewers: jason

Reviewed By: jason

Subscribers: yql, hbhanawat, amitanand

Differential Revision: https://phorge.dev.yugabyte.com/D31068