
[DocDB] Support 100k tables on a single colocated database #14031

Open
lingamsandeep opened this issue Sep 16, 2022 · 2 comments

lingamsandeep commented Sep 16, 2022

Jira Link: DB-3529

Description

This is a tracking task used to capture all the work needed to support ~100k tables on a single colocated database:

⬜️ Create Table latency improvements
⬜️ Optimize caching on yb-master
⬜️ Backup/Restore improvements for handling 100k objects
⬜️ XCluster Setup/Teardown optimizations

Create Table Improvements

  • Create table latency on a colocated database increases as more tables are created. This is a result of the coupling of table and tablet metadata: the combined metadata is rewritten every time a new table is created, so latency grows with the number of existing tables. A potential optimization is to decouple table and tablet metadata and persist them independently (see the sketch below).
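
A minimal conceptual sketch of the coupled vs. decoupled write pattern (plain Python, illustrative only, not YugabyteDB's actual metadata code; persist_blob is a hypothetical stand-in for whatever writes the metadata to disk):

# Coupled: every table's metadata lives inside one tablet superblock, so each
# CREATE TABLE rewrites the metadata of all previously created tables.
def add_table_coupled(superblock, table_meta, persist_blob):
    superblock["tables"].append(table_meta)
    persist_blob("tablet-superblock", superblock)               # O(total tables) bytes written per create

# Decoupled: each table's metadata is persisted as its own record, so the
# superblock does not need to be rewritten when a table is added.
def add_table_decoupled(table_meta, persist_blob):
    persist_blob("tablemeta/" + table_meta["id"], table_meta)   # O(1) bytes written per create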

Caching on yb-master

  • Currently, TableInfo for all tables is cached on the yb-master. At 100k tables, this can be optimized to page in TableInfo on demand and keep only the 'hot' tables in the cache (a sketch of such a bounded cache follows below).
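
A minimal sketch of the kind of bounded, page-in-on-demand cache this describes (illustrative Python only; TableInfoCache and the loader callback are hypothetical names, not yb-master APIs):

from collections import OrderedDict

class TableInfoCache:
    """Keeps only the most recently used TableInfo entries resident;
    everything else is paged in from persistent storage on demand."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # max number of 'hot' tables kept in memory
        self.loader = loader          # callback that reads TableInfo from the sys catalog
        self.entries = OrderedDict()  # table_id -> TableInfo, in LRU order

    def get(self, table_id):
        if table_id in self.entries:
            self.entries.move_to_end(table_id)    # mark as most recently used
            return self.entries[table_id]
        info = self.loader(table_id)              # cache miss: page in from storage
        self.entries[table_id] = info
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict the coldest entry
        return info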

Backup/Restore for 100k tables

  • Currently, several of the APIs that Backup/Restore rely on, such as CreateDatabaseSnapshot/ImportDatabaseSnapshot, either take extremely long or time out due to OOM on the yb-master.

XCluster Setup/Teardown

  • XCluster config tracks individual tables involved in replication. For colocated databases with ~100k tables, consider enabling database-scoped replication with no per-table tracking.
@lingamsandeep lingamsandeep added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Sep 16, 2022
@lingamsandeep lingamsandeep self-assigned this Sep 16, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Sep 16, 2022
@kmuthukk
Collaborator

Noting some observations on current status:

Setup:

This test script was used to create 60K tables against a c5.xlarge RF=3, 3-node cluster.

The release used was 2.13.3-b45, but this aspect probably hasn't changed much in more recent releases. Nevertheless, it might be worth double-checking against a newer release as well.

Summary: Creating 60K tables in the colocated database took about 11 hours. The initial tables took about 100ms each; creation got progressively slower, and by the 60K-th table each create was taking about 1 second.
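
For reference, a rough equivalent of the measurement loop (illustrative Python with psycopg2; connection details, database name, and table names are assumptions, not the linked script):

import time
import psycopg2

# Assumes a colocated database named 'colo_db' already exists and YSQL is on the default port.
conn = psycopg2.connect(host="127.0.0.1", port=5433, dbname="colo_db", user="yugabyte")
conn.autocommit = True
cur = conn.cursor()

NUM_TABLES = 60000
for i in range(NUM_TABLES):
    start = time.time()
    cur.execute("CREATE TABLE t_%d (k INT PRIMARY KEY, v TEXT)" % i)
    elapsed_ms = (time.time() - start) * 1000
    if i % 1000 == 0:
        print("table %d: %.0f ms" % (i, elapsed_ms))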

Node graphs
[screenshots: node-level metrics during the 60K-table creation run]

yb-master graphs
[screenshots: yb-master metrics, including SSTable size growth]

yb-tserver graphs
[screenshots: yb-tserver metrics]

On the yb-master, the system catalog tablet-00000000000000000000000000000000 had grown to about:

[yugabyte@ip-172-151-29-225 table-sys.catalog.uuid]$ du -hs ./*
693M    ./tablet-00000000000000000000000000000000
3.1M    ./tablet-00000000000000000000000000000000.intents
0       ./tablet-00000000000000000000000000000000.snapshots

We can see the same from yb-master's SSTable size graphs above.

On the yb-tserver, the SSTables hold the data of the user tables, so they don't grow much. However, the tablet metadata for the colocated tablet (in which all 60K tables reside) keeps growing, reaching ~10MB.

[yugabyte@ip-172-151-29-225 tablet-meta]$ du -hs /mnt/d0/yb-data/tserver/tablet-meta/d24b79fa2f91443a86923c0e695d7084
9.8M    /mnt/d0/yb-data/tserver/tablet-meta/d24b79fa2f91443a86923c0e695d7084

One reason for the progressively increasing create table times inside a colocated database is that the metadata for the colocated tablet keeps growing (it is not sharded today) and is written back in its entirety each time. So we write back an increasing amount of data as the number of tables increases.
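
A back-of-the-envelope check of why that makes the run so long: if each table adds a roughly fixed amount of metadata and the whole superblock is rewritten on every create, the cumulative bytes written grow quadratically with the number of tables (numbers are illustrative, using the ~10MB-at-60K-tables figure observed above):

num_tables = 60_000
final_meta_bytes = 10 * 1024 * 1024                 # ~10MB superblock at 60K tables
per_table_bytes = final_meta_bytes / num_tables     # ~175 bytes of metadata per table

# Creating the i-th table rewrites a superblock of roughly i * per_table_bytes.
total_written = sum(i * per_table_bytes for i in range(1, num_tables + 1))
print("%.0f GiB written in total" % (total_written / 1024**3))   # ~293 GiB, vs ~10MB of live metadata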

@arpang arpang self-assigned this Sep 22, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 5, 2022

arpang commented Oct 20, 2022

@kmuthukk a quick update on storing table metadata in DocDB instead of superblock:

We implemented a prototype to assess the perf improvements.

Since our users will be using Tablegroups and not colocated databases post-Tablegroup GA, the numbers are for Tablegroups.

For RF 1, the total time for creating 60K colocated tables on a local setup:

  • Baseline: 6.3 hours
  • With all TableInfoPBs in DocDB: 3.4 hours

Source of gains: In the colocated create table path, the master makes an AddTableToTablet RPC to the tserver. Without the optimization this RPC took ~350 ms for the 60K-th table; post-optimization it is down to ~1 ms. This is because flushing the entire protobuf container file to disk (disk IO) has been replaced with writing a single TableInfoPB to DocDB (a memtable write).

Currently the design document is under review.

This specific task of storing table metadata in DocDB is being tracked at #14221. This issue, i.e. #14031, represents the larger umbrella effort.

Additional not-too-critical finding:

There is a significant difference in the performance of table creation in tablegroups and colocated databases due to implementation differences.

For RF 1, without the above optimization, the total creation time for 60K tables on a local setup was:

  • With tablegroups: 6.3 hours
  • With colocated database: 11.7 hours.

The performance difference is being partially addressed via D20132.
