lightning: add send-kv-size to avoid oom when each kv is large on default config #43870

D3Hunter · 2023-05-16T07:52:34Z

What problem does this PR solve?

Issue Number: close #43853

Problem Summary:

What is changed and how it works?

Add a send-kv-size config that works together with the existing send-kv-pairs config to control how much data is accumulated before it is sent to TiKV.
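As a rough illustration of the idea (not the code in this PR), a batch can be flushed as soon as either the pair count or the accumulated byte size reaches its limit. The types `kvPair` and `batchLimits` and the function `splitBatches` are hypothetical names used only for this sketch, and the exact flush condition in Lightning may differ.

```go
package main

// kvPair is a hypothetical stand-in for a key-value pair to be written.
type kvPair struct {
	Key, Val []byte
}

// batchLimits holds the two thresholds discussed in this PR.
// The field names are assumptions for illustration only.
type batchLimits struct {
	MaxPairs int   // plays the role of send-kv-pairs
	MaxBytes int64 // plays the role of send-kv-size
}

// splitBatches groups pairs into batches. A batch is closed as soon as
// either the pair count or the accumulated key+value size reaches its limit.
func splitBatches(pairs []kvPair, lim batchLimits) [][]kvPair {
	var (
		batches [][]kvPair
		cur     []kvPair
		size    int64
	)
	for _, p := range pairs {
		cur = append(cur, p)
		size += int64(len(p.Key) + len(p.Val))
		if len(cur) >= lim.MaxPairs || size >= lim.MaxBytes {
			batches = append(batches, cur)
			cur, size = nil, 0
		}
	}
	// Flush whatever is left after the last full batch.
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}
```

With rows of about 20K and a 16 KiB size limit, every pair reaches the size limit on its own, so each batch holds a single pair regardless of the pair-count limit.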

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
- the source data contains a single file of 1G, and each row is about 20K
- set range-concurrency=1 (2 threads writing concurrently); use a separate membuf.Pool for writeToTiKV and disable the CGO version of the membuf pool so that its memory usage is visible; add a sleep before sending to TiKV so that a heap profile can be taken
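To see why large rows matter here, the arithmetic can be sketched as below. It assumes 32768 pairs per batch (the KVWriteBatchCount value quoted in the config.go diff in this thread) and treats 20K as 20 KiB; both function names are hypothetical, and the second function assumes a batch is flushed once its size reaches the limit.

```go
package main

// worstCaseBatchBytes estimates an upper bound on the bytes accumulated
// before a flush when only a pair-count limit applies.
func worstCaseBatchBytes(pairsPerBatch, bytesPerPair int64) int64 {
	return pairsPerBatch * bytesPerPair
}

// sizeBoundedBatchBytes estimates the bytes accumulated before a flush
// when a byte-size limit applies and the batch is flushed once the
// accumulated size reaches maxBytes (an assumption of this sketch).
func sizeBoundedBatchBytes(maxBytes, bytesPerPair int64) int64 {
	if bytesPerPair <= 0 {
		return 0
	}
	// Number of pairs needed to reach the limit, rounded up.
	pairs := (maxBytes + bytesPerPair - 1) / bytesPerPair
	return pairs * bytesPerPair
}
```

Under these assumptions, a count-only limit allows up to 32768 * 20 KiB = 640 MiB per writer before a flush, while a 16 KiB size limit flushes after a single 20 KiB pair.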

before this PR:

after this PR:

No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Add the `send-kv-size` config, which works together with the existing `send-kv-pairs` config to control how much data is accumulated before it is sent to TiKV.

ti-chi-bot · 2023-05-16T07:52:36Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

gozssky
lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

D3Hunter · 2023-05-16T07:53:07Z

br/pkg/lightning/backend/local/local.go

@@ -395,7 +395,9 @@ type BackendConfig struct {
 	ConnCompressType config.CompressionType
 	// concurrency of generateJobForRange and import(write & ingest) workers
 	WorkerConcurrency int
-	KVWriteBatchSize  int


renamed to KVWriteBatchCount

We'd better use a new name rather than change the meaning of the existing one. If we debug across different release versions, this is confusing.

This name has only existed since 7.1; if we cherry-pick this to 7.1, I think it's OK.

D3Hunter · 2023-05-16T08:35:57Z

/run-integration-br-test

D3Hunter · 2023-05-16T09:11:12Z

/run-integration-br-test

D3Hunter · 2023-05-16T09:59:51Z

/run-integration-br-test

D3Hunter · 2023-05-16T10:58:05Z

/run-integration-br-test

purelind · 2023-05-17T01:32:55Z

/run-integration-br-test

D3Hunter · 2023-05-17T02:18:47Z

https://ci.pingcap.net/blue/organizations/jenkins/tidb_ghpr_integration_br_test/detail/tidb_ghpr_integration_br_test/12581/pipeline
CI passed

br/pkg/lightning/backend/local/region_job.go

lance6716

rest lgtm

lance6716 · 2023-05-17T03:18:20Z

br/pkg/lightning/backend/local/local.go

@@ -395,7 +395,9 @@ type BackendConfig struct {
 	ConnCompressType config.CompressionType
 	// concurrency of generateJobForRange and import(write & ingest) workers
 	WorkerConcurrency int
-	KVWriteBatchSize  int


We'd better use a new name rather than change the meaning of the existing one. If we debug across different release versions, this is confusing.

lance6716 · 2023-05-17T03:23:00Z

br/pkg/lightning/config/config.go

+	KVWriteBatchCount = 32768
+	// KVWriteBatchSize batch size when write to TiKV.
+	// this is the default value of linux send buffer size(net.ipv4.tcp_wmem) too.
+	KVWriteBatchSize        = 16 * units.KiB


It's too small considering a KV from a large row. Maybe we should use 4M, which is its maximum value?

It depends on how often customers tune network parameters, that is, whether users actually set it to 4M.
Comparing (256 * (accumulate 16k + serialize + append to Linux send buffer)) + send to network with (accumulate 4M + serialize + append to Linux send buffer) + send to network, there seems to be little difference. I will test the default tcp_wmem with 16k and 4m batch sizes in the QA env.

I misunderstood; it works like this:

When a new TCP connection is established, a Send Buffer will be created using the default value (16KB); the buffer size will then be automatically adjusted within the maximum and minimum boundaries as needed and based on usage.
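The factor of 256 in the comparison above is simply the number of 16 KiB batches needed to carry 4 MiB. A minimal sketch of that arithmetic, with a hypothetical helper name:

```go
package main

// batchesNeeded returns how many batches of batchBytes are needed to
// carry totalBytes, using ceiling division. It returns 0 for a
// non-positive batch size.
func batchesNeeded(totalBytes, batchBytes int64) int64 {
	if batchBytes <= 0 {
		return 0
	}
	return (totalBytes + batchBytes - 1) / batchBytes
}
```

For example, `batchesNeeded(4<<20, 16<<10)` evaluates to 256.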

D3Hunter · 2023-05-17T07:03:03Z

No speed difference during write/ingest whether the default 16k or 4m is used (573.1GiB source data):

sh-5.1# grep 'send-kv-size' /tmp/sorted-kv-dir/lightning-default.log|sed -e 's/.*\(send-kv-size\\":\w*\).*/\1/g'
send-kv-size\":16384
send-kv-size\":16384
sh-5.1# grep 'import completed' /tmp/sorted-kv-dir/lightning-default.log
[2023/05/17 05:13:57.019 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:1] [engineUUID=6a3dab49-c31c-5ac7-bde9-6b69aff9c175] [retryCnt=0] [takeTime=26m34.527426011s] []
[2023/05/17 05:13:57.959 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:2] [engineUUID=c3eca481-19b1-54ea-96e9-8c81442d8797] [retryCnt=0] [takeTime=26m35.267945546s] []
[2023/05/17 05:13:58.719 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:0] [engineUUID=37838caf-8257-5e42-98b2-a4c68350cf98] [retryCnt=0] [takeTime=26m39.689968454s] []
[2023/05/17 05:13:58.803 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:5] [engineUUID=1e94b89f-1c47-5334-a1b7-2ec9883de322] [retryCnt=0] [takeTime=26m39.756035012s] []
[2023/05/17 05:13:58.854 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:4] [engineUUID=91ff3623-ac29-5a10-b0b8-41236476f52d] [retryCnt=0] [takeTime=26m35.24200116s] []
[2023/05/17 05:13:58.914 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:3] [engineUUID=1f8172f3-a74c-575e-a955-d25336615364] [retryCnt=0] [takeTime=26m35.506627828s] []
[2023/05/17 05:14:04.846 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:-1] [engineUUID=d5b7ce1d-48da-51e0-ab64-c572d2662ff5] [retryCnt=0] [takeTime=4.795442123s] []
sh-5.1# grep 'send-kv-size' /tmp/sorted-kv-dir/lightning.log|sed -e 's/.*\(send-kv-size\\":\w*\).*/\1/g'
send-kv-size\":4194304
sh-5.1# grep 'import completed' /tmp/sorted-kv-dir/lightning.log
[2023/05/17 06:51:16.150 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:2] [engineUUID=c3eca481-19b1-54ea-96e9-8c81442d8797] [retryCnt=0] [takeTime=26m20.269360517s] []
[2023/05/17 06:51:18.395 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:4] [engineUUID=91ff3623-ac29-5a10-b0b8-41236476f52d] [retryCnt=0] [takeTime=26m18.665723666s] []
[2023/05/17 06:51:18.641 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:5] [engineUUID=1e94b89f-1c47-5334-a1b7-2ec9883de322] [retryCnt=0] [takeTime=26m30.520199639s] []
[2023/05/17 06:51:18.672 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:1] [engineUUID=6a3dab49-c31c-5ac7-bde9-6b69aff9c175] [retryCnt=0] [takeTime=26m22.901256154s] []
[2023/05/17 06:51:18.681 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:0] [engineUUID=37838caf-8257-5e42-98b2-a4c68350cf98] [retryCnt=0] [takeTime=26m29.639917829s] []
[2023/05/17 06:51:18.699 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:3] [engineUUID=1f8172f3-a74c-575e-a955-d25336615364] [retryCnt=0] [takeTime=26m23.120782966s] []
[2023/05/17 06:51:23.438 +00:00] [INFO] [backend.go:350] ["import completed"] [engineTag=`sysbench`.`user_data1`:-1] [engineUUID=d5b7ce1d-48da-51e0-ab64-c572d2662ff5] [retryCnt=0] [takeTime=3.719838598s] []

D3Hunter · 2023-05-17T12:28:41Z

/retest

D3Hunter · 2023-05-17T15:28:44Z

/retest

D3Hunter · 2023-05-18T02:21:41Z

/merge

ti-chi-bot · 2023-05-18T02:21:44Z

This pull request has been accepted and is ready to merge.

Commit hash: 48d321d

D3Hunter · 2023-05-18T03:02:19Z

/retest

D3Hunter · 2023-05-18T03:32:19Z

/retest

D3Hunter · 2023-05-18T03:57:26Z

/retest

D3Hunter · 2023-05-18T04:23:18Z

/retest

D3Hunter · 2023-05-18T06:04:57Z

/retest

D3Hunter · 2023-05-18T07:19:34Z

unstable cases will be fixed in #43880

D3Hunter · 2023-05-18T09:15:08Z

/merge

ti-chi-bot · 2023-05-18T09:15:12Z

This pull request has been accepted and is ready to merge.

Commit hash: 86b6e11

ti-chi-bot · 2023-05-18T09:50:19Z

In response to a cherrypick label: new pull request created to branch release-7.1: #43964.

…ault config (pingcap#43870) close pingcap#43853

ti-chi-bot · 2023-05-18T09:51:02Z

In response to a cherrypick label: new pull request created to branch release-6.5: #43965.

Signed-off-by: ti-chi-bot <[email protected]>

D3Hunter added 2 commits May 16, 2023 12:10

change

aec2099

change

96cd35d

ti-chi-bot bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 16, 2023

D3Hunter requested review from sleepymole and lance6716 May 16, 2023 07:52

ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 16, 2023

D3Hunter commented May 16, 2023

D3Hunter added the component/lightning This issue is related to Lightning of TiDB. label May 16, 2023

ti-chi-bot bot added needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. labels May 16, 2023

sleepymole reviewed May 17, 2023

br/pkg/lightning/backend/local/region_job.go

lance6716 reviewed May 17, 2023

sleepymole approved these changes May 17, 2023

ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 17, 2023

ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2023

Merge remote-tracking branch 'origin/master' into send-kv-size

48d321d

ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2023

lance6716 approved these changes May 18, 2023

ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 18, 2023

ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 18, 2023

Merge remote-tracking branch 'origin/master' into send-kv-size

86b6e11

ti-chi-bot bot removed the status/can-merge Indicates a PR has been approved by a committer. label May 18, 2023

ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 18, 2023

ti-chi-bot bot merged commit ca62944 into master May 18, 2023

ti-chi-bot bot deleted the send-kv-size branch May 18, 2023 09:49

ti-chi-bot mentioned this pull request May 18, 2023

lightning: add send-kv-size to avoid oom when each kv is large on default config (#43870) #43964

Closed

12 tasks

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request May 18, 2023

lightning: add send-kv-size to avoid oom when each kv is large on def…

bb45f8b

…ault config (pingcap#43870) close pingcap#43853

ti-chi-bot mentioned this pull request May 18, 2023

lightning: add send-kv-size to avoid oom when each kv is large on default config (#43870) #43965

Open

12 tasks

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request May 18, 2023

This is an automated cherry-pick of pingcap#43870

fe71788

Signed-off-by: ti-chi-bot <[email protected]>
