[DocDB] Turning on pg_client_use_shared_memory causes regression in scan workloads #23999

spolitov · 2024-09-18T04:20:41Z

Description

Currently we use the page size to create the shared memory segment for pg client communication.
By default it is just 4KB.
When a response does not fit into the segment, it is transferred via RPC.
This fallback increases latency by nearly 10ms per 1000 requests.

During a scan we fetch data in chunks of 1000 rows.
So nearly all scan read responses do not fit into 4KB, and we fall back to RPC every time.
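
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the DocDB implementation: the function names, the plain size comparison, and the 16-byte row size are hypothetical. Only the 4KB segment size, the 1000-row chunk, and the figure of nearly 10ms per 1000 requests come from this issue.

```python
# Values taken from the issue description.
SEGMENT_SIZE = 4096                   # default shared memory segment: one 4KB page
ROWS_PER_CHUNK = 1000                 # scans fetch data in chunks of 1000 rows
FALLBACK_OVERHEAD_MS_PER_1000 = 10.0  # "nearly 10ms per 1000 requests"

# Hypothetical value, chosen only for illustration.
ASSUMED_BYTES_PER_ROW = 16


def transport_for(response_size: int, segment_size: int = SEGMENT_SIZE) -> str:
    """Pick the transport for one response (assumed rule: fits in segment or not)."""
    return "shared_memory" if response_size <= segment_size else "rpc"


def fallback_overhead_ms(response_sizes, segment_size: int = SEGMENT_SIZE) -> float:
    """Estimate extra latency from responses that fall back to RPC."""
    fallbacks = sum(
        1 for size in response_sizes if transport_for(size, segment_size) == "rpc"
    )
    return fallbacks * FALLBACK_OVERHEAD_MS_PER_1000 / 1000.0


# One scan chunk under the assumed row size: 1000 * 16 = 16000 bytes.
chunk_size = ROWS_PER_CHUNK * ASSUMED_BYTES_PER_ROW
```

Under these assumptions, even a narrow 16-byte row makes a 1000-row chunk roughly four times larger than the segment, so every scan response takes the RPC path.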

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

Summary: Currently we use the page size to create the shared memory segment for pg client communication. By default it is just 4KB. When a response does not fit into the segment, it is transferred via RPC. This fallback increases latency by nearly 10ms per 1000 requests. During a scan we fetch data in chunks of 1000 rows, so nearly all scan read responses do not fit into 4KB and we fall back to RPC every time.

This diff introduces intermediate shared memory buffers that are larger in size and can be reused by different postgres connections.

Read time changes in the newly added test (PgSingleTServerTest.ScanOneColumn):
Don't use shared memory at all: 1.17s
Only 4KB segments: 1.35s
With intermediate big buffers: 1.06s

Jira: DB-12886
Test Plan: PgSingleTServerTest.ScanOneColumn
Reviewers: rthallam, esheng
Reviewed By: esheng
Subscribers: yql, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D38149
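
The idea of larger intermediate buffers that different connections can reuse might look roughly like the pool below. This is a hedged sketch only: the class name `BigBufferPool`, its methods, the 1MB buffer size, and the use of `bytearray` in place of real shared memory are all hypothetical and are not taken from the diff.

```python
class BigBufferPool:
    """Hypothetical pool of large reusable buffers (stand-in for shared memory)."""

    def __init__(self, buffer_size: int, max_buffers: int):
        self._buffer_size = buffer_size
        self._max_buffers = max_buffers
        self._free = []        # buffers returned by connections, ready for reuse
        self._allocated = 0    # total buffers created so far

    def acquire(self, needed: int):
        """Return a buffer that can hold `needed` bytes, or None to signal RPC fallback."""
        if needed > self._buffer_size:
            return None                      # too large even for a big buffer
        if self._free:
            return self._free.pop()          # reuse a buffer released earlier
        if self._allocated < self._max_buffers:
            self._allocated += 1
            return bytearray(self._buffer_size)
        return None                          # pool exhausted

    def release(self, buf) -> None:
        """Hand a buffer back so another connection can reuse it."""
        self._free.append(buf)


# Assumed sizes for illustration: one 1MB buffer shared by two "connections".
pool = BigBufferPool(buffer_size=1 << 20, max_buffers=1)
first = pool.acquire(16000)    # connection A gets the big buffer
second = pool.acquire(16000)   # connection B finds the pool exhausted
pool.release(first)
third = pool.acquire(16000)    # connection B now reuses the same buffer
```

In this sketch a `None` result stands for the existing RPC path, so the pool only adds a faster option and never blocks a request.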

Summary:
12b2c40 [#23999] DocDB: Big shared memory segments
b1e6329 [PLAT-15279] Add gzip compression to core dumps from DB.
06472d5 [#24050] docdb: Fix re-packing rows after alter table add column with default value
9009d11 [#23837] YSQL: Temporarily disable some tests with Connection Manager enabled
11acca7 [#23325][#23326] yugabyted: Support for adding new databases for xCluster replication (Phase 2)
96703da [PLAT-15465][PLAT-15466] Minor fixes in YNP
c5aca3b [PLAT-14924][PLAT-12829][PLAT-15446] - ui bugs and improvements
6e82692 [#23770] [#23797] YSQL: Stabilise some test failures with Connection Manager enabled
b50bd1b [PLAT-15279] Adjusting the core pattern to create the cores with the core_ prefix for collect cores to catch it
f692a60 [PLAT-14045] UBI-8 images don't have hostname
d6a19da [PLAT-15377] Adding a global uncaught exception handler to yugaware
acbb1ba [PLAT-15225] Verify there is no running master on nodes selected for master replacement

Excluded: 3e93354 [#23686] YSQL: Build relcache foreign key list from YB catcache

Test Plan: Jenkins: rebase: pg15-cherrypicks
Reviewers: tfoucher, fizaa, telgersma
Differential Revision: https://phorge.dev.yugabyte.com/D38503

Summary: Currently we use the page size to create the shared memory segment for pg client communication. By default it is just 4KB. When a response does not fit into the segment, it is transferred via RPC. This fallback increases latency by nearly 10ms per 1000 requests. During a scan we fetch data in chunks of 1000 rows, so nearly all scan read responses do not fit into 4KB and we fall back to RPC every time.

This diff introduces intermediate shared memory buffers that are larger in size and can be reused by different postgres connections.

Read time changes in the newly added test (PgSingleTServerTest.ScanOneColumn):
Don't use shared memory at all: 1.17s
Only 4KB segments: 1.35s
With intermediate big buffers: 1.06s

Original commit: 12b2c40/D38149
Jira: DB-12886
Test Plan: PgSingleTServerTest.ScanOneColumn
Reviewers: rthallam, esheng
Reviewed By: esheng
Subscribers: ybase, yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D38684

spolitov added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels Sep 18, 2024

spolitov self-assigned this Sep 18, 2024

yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels and removed the status/awaiting-triage (Issue awaiting triage) label Sep 18, 2024

rthallamko3 added the 2024.2 Backport Required and 2024.1 Backport Required labels Sep 30, 2024

qvad mentioned this issue Oct 3, 2024

[DocDB] Shared memory related core dumps in tserver/pg_gate/postgres processes in aggressive cgroups test #24263

Closed

1 task

rthallamko3 removed the 2024.1 Backport Required label Oct 14, 2024

rthallamko3 closed this as completed Oct 14, 2024
