don't hold lock during flushing #27

Merged
merged 1 commit into from
Mar 1, 2022

@YuJuncen YuJuncen commented Feb 28, 2022

Currently, we hold the tasks mutex for the whole flush procedure, but the same mutex must also be acquired each time we save an event to the local file. This causes a long tail in the time taken to consume a batch:

image

We should release the lock sooner to avoid that. The test result shows that this is effective:

image
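
As a rough illustration of the idea (not the code in this PR), the sketch below swaps the pending events out while the lock is held and performs the slow flush only after the guard is dropped. `Router`, `TaskBuffer`, `on_event`, and `flush` are hypothetical names invented for this example, and a plain `std::sync::Mutex` stands in for whatever lock the real router uses.

```rust
use std::mem;
use std::sync::Mutex;

/// Hypothetical stand-in for the per-task state guarded by the tasks mutex.
#[derive(Default)]
struct TaskBuffer {
    pending: Vec<Vec<u8>>,
}

/// Hypothetical router; the real one is more involved.
#[derive(Default)]
struct Router {
    tasks: Mutex<TaskBuffer>,
}

impl Router {
    /// Writers hold the lock only long enough to append one event.
    fn on_event(&self, event: Vec<u8>) {
        self.tasks.lock().unwrap().pending.push(event);
    }

    /// Take the pending events under the lock, then flush without it.
    /// Returns the number of bytes handed to `upload`.
    fn flush(&self, mut upload: impl FnMut(&[u8])) -> usize {
        let batch = {
            let mut guard = self.tasks.lock().unwrap();
            mem::take(&mut guard.pending)
        }; // The guard is dropped here, before any slow I/O starts.

        let mut bytes = 0;
        for event in &batch {
            upload(event); // Slow work happens while writers can still append.
            bytes += event.len();
        }
        bytes
    }
}
```

With this shape, a call to `on_event` that arrives in the middle of a flush waits only for the short swap, not for the upload.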

This PR also skips an event entirely when it is empty (perhaps because it contains only operations on the Lock CF). Such events are more frequent than expected; the log below covers only 5 minutes of events:

root@tikvs:/tidb-deploy/tikv-20160/log# cat tikv.log | grep 'the apply events' | grep 'len=0' |  wc -l
323499

NOTE: Flushing takes a little longer than expected (~30s on average):

[2022/02/28 15:33:37.131 +08:00] [INFO] [endpoint.rs:386] ["flushing and refreshing checkpoint ts."] [checkpoint_ts=431497828298129485]
[2022/02/28 15:37:08.984 +08:00] [INFO] [router.rs:401] ["try flushing task"] [size=134225043] [task=test2]
[2022/02/28 15:37:43.754 +08:00] [INFO] [router.rs:822] ["flush done"] [cost=34.768s]
[2022/02/28 15:37:43.754 +08:00] [INFO] [endpoint.rs:386] ["flushing and refreshing checkpoint ts."] [checkpoint_ts=431497891763191864]
[2022/02/28 15:41:17.751 +08:00] [INFO] [router.rs:401] ["try flushing task"] [size=134218417] [task=test2]
[2022/02/28 15:41:50.731 +08:00] [INFO] [router.rs:822] ["flush done"] [cost=32.981s]
[2022/02/28 15:41:50.731 +08:00] [INFO] [endpoint.rs:386] ["flushing and refreshing checkpoint ts."] [checkpoint_ts=431497956840439886]
[2022/02/28 15:45:31.012 +08:00] [INFO] [router.rs:401] ["try flushing task"] [size=134218065] [task=test2]
[2022/02/28 15:46:05.932 +08:00] [INFO] [router.rs:822] ["flush done"] [cost=34.917s]
[2022/02/28 15:46:05.933 +08:00] [INFO] [endpoint.rs:386] ["flushing and refreshing checkpoint ts."] [checkpoint_ts=431498023372587109]

@3pointer 3pointer left a comment

LGTM

@joccau joccau left a comment

LGTM

@joccau joccau merged commit 9c3c279 into 3pointer:br-stream Mar 1, 2022
YuJuncen added a commit that referenced this pull request Apr 26, 2022
…h `br-stream`:

- backup-stream: update kvproto to master [by @3pointer] (#48)

- Added retry for initial scanning and some metrics [by @YuJuncen] (#47)

- Added a new observer type 'pitr' [by @YuJuncen] (#46)

- backup-stream: fix some bugs in log backup [by @3pointer] (#45)

- Refactor resolver [by @YuJuncen] (#44)

- report error to PD server [by @YuJuncen] (#43)

- br: support pause/resume stream task [by @joccau] (#42)

- Adapt error code for `endpoint::Error` and implement the contextual error [by @YuJuncen] (#41)

- set uuid for header to prevent raftstore merging [by @YuJuncen] (#40)

- fix the upload part of S3 storage [by @YuJuncen] (#39)

- Use min ts of mem lock [by @YuJuncen] (#38)

- Fix size leakage and build [by @YuJuncen] (#37)

- eliminate the block call in ticker [by @YuJuncen] (#36)

- br: support checkSum during stream restore dml kv-events [by @joccau] (#35)

- refine br-stream to backup-stream [by @3pointer] (#34)

- Allow local storage support directory and partition the log files by table [by @YuJuncen] (#33)

- br-stream: added store error to the store [by @YuJuncen] (#32)

- Scan on Leader Change [by @YuJuncen] (#31)

- use local thread pool for downloading [by @YuJuncen] (#30)

- display the error when failed to get snapshot [by @YuJuncen] (#29)

- br-stream: remove duplicate entry in apply kv file [by @3pointer] (#28)

- don't hold lock during flushing [by @YuJuncen] (#27)

- fix listener on follower if region changed [by @YuJuncen] (#26)

- Update Service GC Safe Point after Flushing [by @YuJuncen] (#25)

- br-stream: use raft router to apply kv files for sst_importer [by @3pointer] (#24)

- added integration test framework [by @YuJuncen] (#22)

- br-stream: add restore ts to filter data out of range. [by @3pointer] (#20)

- *: batch write to temp file [by @YuJuncen] (#19)

- omit coping in EventIterator, use write batch for apply [by @YuJuncen] (#18)

- br: don't flush to externStorage periodically when have empty kv-record [by @joccau] (#17)

- added SegmentMap to replace the BTreeMap [by @YuJuncen] (#16)

- added resolved ts related metrics [by @YuJuncen] (#15)

- Added resolved timestamp uploading [by @YuJuncen] (#14)

- add flush tick [by @3pointer] (#13)

- implement stream restore for tikv side. [by @3pointer] (#12)

- added metrics [by @YuJuncen] (#11)

- update kvproto [by @YuJuncen] (#9)

- Initial scanning && Error reporting [by @YuJuncen] (#8)

- resolve conflict [by @3pointer] (#7)

- br: backup stream: support flushing temp files to ExternalStorage [by @joccau] (#6)

- encoder: move encoder to a independent mod [by @3pointer] (#5)

- br-stream: don't clone the key & value in encode_event() [by @kennytm] (#4)

- br: backup stream: Modify log print format [by @joccau] (#3)

- br-stream: reduce the lock of tables; added some metrics [by @YuJuncen] (#2)

For more details of these commits, please check the origin feature branch at https://github.com/3pointer/tikv/tree/br-stream.

Signed-off-by: Yu Juncen <[email protected]>