[YSQL][ActiveHistory] Instrument the query paths so that the AUH metadata is generated and is passed to tserver #19135

Closed
1 task done
hbhanawat opened this issue Sep 17, 2023 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@hbhanawat
Contributor

hbhanawat commented Sep 17, 2023

Jira Link: DB-7933

Description

- Fetch the TServer UUID using an API call.
- Set/reset the query id, top-level request id, client node IP, and top-level node id for all queries.
- Use AUH hooks wherever possible.
- Update RPCs to carry AUH metadata.
- Add a random number generator for the top-level request id.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@hbhanawat hbhanawat added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage area/active-history Active history of sessions labels Sep 17, 2023
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue labels Sep 17, 2023
@yugabyte-ci yugabyte-ci removed area/active-history Active history of sessions status/awaiting-triage Issue awaiting triage labels Sep 17, 2023
@hbhanawat hbhanawat added the area/active-history Active history of sessions label Sep 18, 2023
@yugabyte-ci yugabyte-ci removed the area/active-history Active history of sessions label Oct 13, 2023
abhinab-yb added a commit to abhinab-yb/yugabyte-db that referenced this issue Nov 7, 2023
… and pack it with Perform RPCs

    Summary:
    This diff adds Active Universe History (AUH) metadata to the outgoing `Perform` RPCs from `pggate` to the local tserver so that the wait states in the tserver layer have enough context. The following fields make up the AUH metadata -

    - `top_level_request_id` (16 bytes): A unique ID generated per query.
    - `top_level_node_id` (16 bytes): PG node where the YSQL query is being executed.
    - `client_node_ip` (8 bytes): Client node from where the query is generated.
    - `current_request_id` (6 bytes): Request ID per Perform RPC. This is not globally unique.
    - `query_id` (8 bytes): Query id as seen on `pg_stat_statements`.

    The AUH metadata is maintained in two places: in the `PGPROC` struct, so that the `yb_auh` extension can read it, and in the `PgSession` class, so that it can be easily packed into the outgoing Perform RPCs. The two copies are always kept identical.
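
A minimal sketch of the "two copies, always identical" idea, written in Python for brevity rather than the C/C++ of the actual change. The `AuhMetadataSketch` and `BackendSketch` names are hypothetical stand-ins for `PGPROC` and `PgSession`, and the integer field types are assumptions. The point is only that a single update path writes both copies, so they cannot drift apart.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AuhMetadataSketch:
    """Hypothetical container for the AUH fields listed in the summary."""
    top_level_request_id: bytes = bytes(16)
    top_level_node_id: bytes = bytes(16)
    client_node_ip: int = 0
    current_request_id: int = 0
    query_id: int = 0


class BackendSketch:
    """Stand-in for one PG backend holding both copies of the metadata."""

    def __init__(self) -> None:
        initial = AuhMetadataSketch()
        self.pgproc_metadata = initial    # copy read by the extension
        self.session_metadata = initial   # copy packed into Perform RPCs

    def update(self, **changes) -> None:
        # Build the new value once, then publish it to both places.
        updated = replace(self.pgproc_metadata, **changes)
        self.pgproc_metadata = updated
        self.session_metadata = updated
```

Calling `BackendSketch().update(query_id=42)` changes both copies in one step, which is one simple way to uphold the invariant described above.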

    `top_level_request_id` is generated by the newly introduced `XorshiftRandomNumberGenerator` class, which uses the Xorshift algorithm to generate random numbers quickly. `top_level_node_id` is the same as the local tserver UUID. A new RPC, `GetTserverUuid`, fetches the tserver UUID from the local tserver and stores it in the PG backend every time a new PG process starts. `client_node_ip` is extracted the same way as in `pg_stat_activity`; only IPv4 addresses are supported for now. `current_request_id` is an increasing counter that is reset to `0` every time a new `top_level_request_id` is generated. `query_id` is set using hooks in the `yb_auh` extension. For utility statements, `query_id` is not yet set properly; this will be addressed in future revisions.
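
The ID generation described above can be illustrated with a short Python sketch. It is not the `XorshiftRandomNumberGenerator` from the diff: the summary does not say which Xorshift variant is used, so this assumes Marsaglia's 64-bit xorshift with the shift triple (13, 7, 17). The class names, the big-endian byte order, and the choice that the first RPC after a reset gets the value `0` are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


class XorshiftSketch:
    """64-bit xorshift (shifts 13, 7, 17); an assumed variant, not the real class."""

    def __init__(self, seed: int) -> None:
        seed &= MASK64
        if seed == 0:
            raise ValueError("xorshift state must be non-zero")
        self.state = seed

    def next_u64(self) -> int:
        x = self.state
        x ^= (x << 13) & MASK64   # mask emulates 64-bit overflow
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self.state = x
        return x


class RequestIdsSketch:
    """Tracks a 16-byte top-level request ID and a per-query RPC counter."""

    def __init__(self, rng: XorshiftSketch) -> None:
        self.rng = rng
        self.top_level_request_id = bytes(16)
        self.current_request_id = 0

    def start_query(self) -> bytes:
        # Two 64-bit draws give the 16 bytes of top_level_request_id.
        hi = self.rng.next_u64()
        lo = self.rng.next_u64()
        self.top_level_request_id = hi.to_bytes(8, "big") + lo.to_bytes(8, "big")
        self.current_request_id = 0   # counter resets with every new top-level ID
        return self.top_level_request_id

    def next_rpc(self) -> int:
        # Return the current counter value, then advance it.
        rid = self.current_request_id
        self.current_request_id += 1
        return rid
```

Xorshift is a non-cryptographic generator, so IDs produced this way are cheap to compute but should be treated as identifiers only, not as secrets.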

    `query_id` is set from three different hooks because the code paths differ between prepared and unprepared statements and between reads and writes.

    Depends on D29238

    Test Plan: `./yb_build.sh --cxx-test pgwrapper_pg_auh-test`

    Reviewers: hbhanawat, amitanand, myang, jason

    Subscribers: jason, yql, ybase, bogdan

    Differential Revision: https://phorge.dev.yugabyte.com/D28659
abhinab-yb added a commit that referenced this issue Dec 7, 2023
…k it with Perform RPCs

Summary:
This diff adds Active Session History (ASH) metadata to the outgoing `Perform` RPCs from `pggate` to the local tserver so that the wait states in the tserver layer have enough context. The following fields make up the ASH metadata -

- `root_request_id` (16 bytes): A unique id corresponding to a YSQL query.
- `yql_endpoint_tserver_uuid` (16 bytes): UUID of the node where the YSQL query originated in bytes.
- `rpc_request_id` (8 bytes): A single YSQL query can generate multiple RPCs. Each RPC can be differentiated with this rpc_request_id. This is not globally unique. This is generated for incoming requests. For PG, it's the same as the last 8 bytes of `root_request_id`.
- `query_id` (8 bytes): Query id as seen on pg_stat_statements to identify identical normalized queries. There might be many queries with different root_request_id but with the same query_id.
- `client_addr` (16 bytes): IP address of the client which sent the query to PG.
- `client_port` (2 bytes): Port of the client which sent the query to PG.
- `addr_family` (1 byte): Stores the address family of the socket.

`client_addr` and `client_port` are null if `addr_family` is neither `AF_INET` nor `AF_INET6`.
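
As a rough illustration of the field rules above, the following Python sketch builds an ASH metadata record for a PG-originated query. It is not the code from the diff. The `AshMetadataSketch` and `build_pg_metadata` names, the big-endian interpretation of the last 8 bytes, and the zero-padding of IPv4 addresses to 16 bytes are assumptions.

```python
import ipaddress
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AshMetadataSketch:
    """Hypothetical mirror of the ASH fields listed in the summary."""
    root_request_id: bytes
    yql_endpoint_tserver_uuid: bytes
    rpc_request_id: int
    query_id: int
    addr_family: int
    client_addr: Optional[bytes]
    client_port: Optional[int]


def build_pg_metadata(root_request_id: bytes, tserver_uuid: bytes, query_id: int,
                      addr_family: int, client_host: Optional[str] = None,
                      client_port: Optional[int] = None) -> AshMetadataSketch:
    if len(root_request_id) != 16 or len(tserver_uuid) != 16:
        raise ValueError("root_request_id and tserver UUID must be 16 bytes")

    # For PG, rpc_request_id equals the last 8 bytes of root_request_id.
    # Big-endian decoding is an assumption of this sketch.
    rpc_request_id = int.from_bytes(root_request_id[-8:], "big")

    if addr_family in (socket.AF_INET, socket.AF_INET6) and client_host is not None:
        packed = ipaddress.ip_address(client_host).packed
        addr: Optional[bytes] = packed.ljust(16, b"\x00")  # assumed padding to 16 bytes
        port = client_port
    else:
        # Any other address family: both fields are null.
        addr = None
        port = None

    return AshMetadataSketch(root_request_id, tserver_uuid, rpc_request_id,
                             query_id, addr_family, addr, port)
```

With `socket.AF_INET` and a host such as `"10.0.0.5"`, the record carries a 16-byte address and the port. With any other family, for example a Unix-domain socket connection, both fields come back as `None`.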

The ASH metadata is stored in `PGPROC` because the ASH collector must be able to read it. In addition, `wait_event_info`, which ASH uses to determine the wait event, is also stored in `PGPROC`.

For utility statements, the query id is always set to zero and needs special handling; this will be addressed in a future diff. The ASH metadata instrumentation will be polished in future diffs.

This diff also updates the `yb_enable_ash` GFlag from a preview flag to a test flag.

Additionally, ASH can currently be enabled or disabled by means other than the `yb_enable_ash` GFlag, which is not intended. See #20180 for details.

Upgrade/Rollback safety: Safe to upgrade/downgrade. The proto changes affect only the new ASH functionality and do not interfere with existing functionality. ASH will be unavailable after a downgrade. Moreover, the `yb_enable_ash` GFlag guards the feature.
Jira: DB-7933

Test Plan: Jenkins: compile-only

Reviewers: hbhanawat, amitanand, myang, jason

Reviewed By: jason

Subscribers: jason, yql, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28659