CDC cloud: 60k tables & 9 captures, CDC sync task impact tidb performance more than 10 times #2448

Closed
Tammyxia opened this issue Aug 3, 2021 · 1 comment · Fixed by tikv/tikv#10666
Labels
priority/P0 The issue has P0 priority. subject/new-feature Denotes an issue or pull request adding a new feature. subject/performance Denotes an issue or pull request is related to replication performance.

Comments

Tammyxia commented Aug 3, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
  • 60k tables, 1 changefeed, 9 captures
  • workload: $go-ycsb run mysql -P ${workload_dir}/betting -p operationcount=50000000 -p mysql.host=$up_cluster_ip -p mysql.port=4000 --threads 200 -p dbnameprefix=test_betting_ -p databaseproportions=1.0 -p unitnameprefix=unit$i_ -p unitscount=100 -p tablecount=60000
  2. What did you expect to see?
  • The CDC sync task degrades TiDB performance by no more than 20%.
  3. What did you see instead?
  • The CDC sync task degrades TiDB performance by more than 10 times.

  • While the CDC sync task is running, TiDB 95th-percentile duration is 1.46s - 3.89s; after pausing the CDC sync task, it is 1ms - 13.7ms.

  • While the CDC sync task is running, TiDB QPS is 1-5; after pausing the CDC sync task, it is 249-556.

  4. Versions of the cluster

    • Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

      4.0.14
      
    • TiCDC version (execute cdc version):

      4.0.14 patched 7/30
      
Tammyxia added the priority/P0 and subject/new-feature labels Aug 3, 2021
Tammyxia changed the title from "CDC cloud: 6w tables + 9 capture, CDC sync task impact tidb I/O performance 10 times" to "CDC cloud: 6w tables & 9 captures, CDC sync task impact tidb performance more than 10 times" Aug 3, 2021
amyangfei added the subject/performance label Aug 3, 2021
@overvenus
Member

This issue is likely caused by the resolved ts messages sent by TiKV being too large. In the current implementation, TiKV broadcasts the same resolved ts message (which contains all region IDs) to every connection, so the total size is O(R * T), where R is the number of captured regions and T is the number of tables in the TiKV.

https://github.com/tikv/tikv/blob/28292ee1f3810c9e00127d59ddb7d73fe3ca02cf/components/cdc/src/endpoint.rs#L630-L645

A quick solution is to let TiKV send each connection a resolved ts message that contains only the region IDs captured by that connection. This reduces the total size to O(R).
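
To make the size argument concrete, here is a minimal Python model of the two strategies. It is only an illustration, not TiKV's implementation: the function names, the connection names, and the idea of measuring message size as a count of region IDs are all assumptions made for this sketch.

```python
from typing import Dict, List, Set

# Hypothetical model: each connection maps to the set of region IDs it captures.
Subscriptions = Dict[str, Set[int]]


def broadcast_total(subscriptions: Subscriptions) -> int:
    """Total region IDs sent when every connection receives the full region list."""
    all_regions = set().union(*subscriptions.values())
    return len(all_regions) * len(subscriptions)


def filtered_messages(
    subscriptions: Subscriptions, resolved: Set[int]
) -> Dict[str, List[int]]:
    """Per-connection messages holding only the regions that connection captures."""
    return {
        conn: sorted(regions & resolved)
        for conn, regions in subscriptions.items()
    }


def filtered_total(subscriptions: Subscriptions) -> int:
    """Total region IDs sent when each message is filtered per connection."""
    resolved = set().union(*subscriptions.values())
    messages = filtered_messages(subscriptions, resolved)
    return sum(len(ids) for ids in messages.values())


if __name__ == "__main__":
    # Three assumed connections, each capturing two distinct regions.
    subs = {"conn-a": {1, 2}, "conn-b": {3, 4}, "conn-c": {5, 6}}
    print("broadcast:", broadcast_total(subs))  # 6 regions * 3 connections
    print("filtered:", filtered_total(subs))    # each region sent once
```

In this model the broadcast cost grows with the product of regions and connections, while the filtered cost grows only with the number of captured regions, provided each region is captured by a single connection.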

overvenus self-assigned this Aug 4, 2021
amyangfei changed the title from "CDC cloud: 6w tables & 9 captures, CDC sync task impact tidb performance more than 10 times" to "CDC cloud: 60k tables & 9 captures, CDC sync task impact tidb performance more than 10 times" Aug 5, 2021