
[DocDB] Support 100k tables on a single colocated database #14031

Open
lingamsandeep opened this issue Sep 16, 2022 · 2 comments

lingamsandeep commented Sep 16, 2022

Jira Link: DB-3529

Description

This is a tracking task used to capture all the work needed to support ~100k tables on a single colocated database:

⬜️ Create Table latency improvements
⬜️ Optimize caching on yb-master
⬜️ Backup/Restore improvements for handling 100k objects
⬜️ XCluster Setup/Teardown optimizations

Create Table Improvements

  • Create table latency on a colocated database increases as more tables are created. This is a result of the coupling of table and tablet metadata: the combined metadata is rewritten every time a new table is created, so latency grows with the number of existing tables. A potential optimization is to decouple table and tablet metadata and persist them independently (see the sketch below).
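
A minimal conceptual sketch of the coupled vs. decoupled write pattern (plain Python, illustrative only, not YugabyteDB's actual metadata code; persist_blob is a hypothetical stand-in for whatever writes the metadata to disk):

# Coupled: every table's metadata lives inside one tablet superblock, so each
# CREATE TABLE rewrites the metadata of all previously created tables.
def add_table_coupled(superblock, table_meta, persist_blob):
    superblock["tables"].append(table_meta)
    persist_blob("tablet-superblock", superblock)               # O(total tables) bytes written per create

# Decoupled: each table's metadata is persisted as its own record, so the
# superblock does not need to be rewritten when a table is added.
def add_table_decoupled(table_meta, persist_blob):
    persist_blob("tablemeta/" + table_meta["id"], table_meta)   # O(1) bytes written per create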

Caching on yb-master

  • Currently, TableInfo for all tables is cached on the yb-master. At 100k tables, this can be optimized to page in TableInfo on demand and keep only the 'hot' tables in the cache (a sketch of such a bounded cache follows below).
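
A minimal sketch of the kind of bounded, page-in-on-demand cache this describes (illustrative Python only; TableInfoCache and the loader callback are hypothetical names, not yb-master APIs):

from collections import OrderedDict

class TableInfoCache:
    """Keeps only the most recently used TableInfo entries resident;
    everything else is paged in from persistent storage on demand."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # max number of 'hot' tables kept in memory
        self.loader = loader          # callback that reads TableInfo from the sys catalog
        self.entries = OrderedDict()  # table_id -> TableInfo, in LRU order

    def get(self, table_id):
        if table_id in self.entries:
            self.entries.move_to_end(table_id)    # mark as most recently used
            return self.entries[table_id]
        info = self.loader(table_id)              # cache miss: page in from storage
        self.entries[table_id] = info
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict the coldest entry
        return info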

Backup/Restore for 100k tables

  • Currently, several of the APIs that Backup/Restore rely on, such as CreateDatabaseSnapshot/ImportDatabaseSnapshot, either take extremely long or time out due to OOM on the yb-master.

XCluster Setup/Teardown

  • XCluster config tracks individual tables involved in replication. For colocated databases with ~100k tables, consider enabling database-scoped replication with no per-table tracking.
@lingamsandeep lingamsandeep added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Sep 16, 2022
@lingamsandeep lingamsandeep self-assigned this Sep 16, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Sep 16, 2022
@kmuthukk
Collaborator

Noting some observations on current status:

Setup:

This test script was used to create 60K tables against a c5.xlarge RF=3, 3-node cluster.

The release used was 2.13.3-b45, but this aspect probably hasn't changed much in more recent releases. Nevertheless, it might be worth double-checking against a newer release as well.

Summary: Creating 60K tables in the colocated database took about 11 hours. The initial tables took about 100ms each; creation got progressively slower, and by the 60K-th table each create was taking about 1 second.
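
For reference, a rough equivalent of the measurement loop (illustrative Python with psycopg2; connection details, database name, and table names are assumptions, not the linked script):

import time
import psycopg2

# Assumes a colocated database named 'colo_db' already exists and YSQL is on the default port.
conn = psycopg2.connect(host="127.0.0.1", port=5433, dbname="colo_db", user="yugabyte")
conn.autocommit = True
cur = conn.cursor()

NUM_TABLES = 60000
for i in range(NUM_TABLES):
    start = time.time()
    cur.execute("CREATE TABLE t_%d (k INT PRIMARY KEY, v TEXT)" % i)
    elapsed_ms = (time.time() - start) * 1000
    if i % 1000 == 0:
        print("table %d: %.0f ms" % (i, elapsed_ms))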

Node graphs
[screenshots: node-level metrics during the 60K-table creation run]

yb-master graphs
[screenshots: yb-master metrics, including SSTable size growth]

yb-tserver graphs
[screenshots: yb-tserver metrics]

On the yb-master, the system catalog tablet-00000000000000000000000000000000 had grown to about:

[yugabyte@ip-172-151-29-225 table-sys.catalog.uuid]$ du -hs ./*
693M    ./tablet-00000000000000000000000000000000
3.1M    ./tablet-00000000000000000000000000000000.intents
0       ./tablet-00000000000000000000000000000000.snapshots

We can see the same from yb-master's SSTable size graphs above.

On the yb-tserver, the SSTables hold the data of the user tables, so they don't grow much. However, the tablet metadata for the colocated tablet (in which all 60K tables reside) keeps growing, reaching ~10MB.

[yugabyte@ip-172-151-29-225 tablet-meta]$ du -hs /mnt/d0/yb-data/tserver/tablet-meta/d24b79fa2f91443a86923c0e695d7084
9.8M    /mnt/d0/yb-data/tserver/tablet-meta/d24b79fa2f91443a86923c0e695d7084

One reason for the progressively increasing create table times inside a colocated database is that the metadata for the colocated tablet keeps growing (it is not sharded today) and is written back in its entirety each time. So we write back an increasing amount of data as the number of tables increases.
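
A back-of-the-envelope check of why that makes the run so long: if each table adds a roughly fixed amount of metadata and the whole superblock is rewritten on every create, the cumulative bytes written grow quadratically with the number of tables (numbers are illustrative, using the ~10MB-at-60K-tables figure observed above):

num_tables = 60_000
final_meta_bytes = 10 * 1024 * 1024                 # ~10MB superblock at 60K tables
per_table_bytes = final_meta_bytes / num_tables     # ~175 bytes of metadata per table

# Creating the i-th table rewrites a superblock of roughly i * per_table_bytes.
total_written = sum(i * per_table_bytes for i in range(1, num_tables + 1))
print("%.0f GiB written in total" % (total_written / 1024**3))   # ~293 GiB, vs ~10MB of live metadata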

@arpang arpang self-assigned this Sep 22, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 5, 2022

arpang commented Oct 20, 2022

@kmuthukk a quick update on storing table metadata in DocDB instead of superblock:

We implemented a prototype to assess the perf improvements.

Since our users will be using Tablegroups and not colocated databases post-Tablegroup GA, the numbers are for Tablegroups.

For RF 1, the total time for creating 60K colocated tables on a local setup:

  • Baseline: 6.3 hours
  • With all TableInfoPBs in DocDB: 3.4 hours

Source of gains: In the colocated create table path, the master makes an AddTableToTablet RPC to the tserver. Without the optimization this RPC took ~350 ms for the 60K-th table; post-optimization it is down to ~1 ms. This is because flushing the entire protobuf container file to disk (disk IO) has been replaced with writing a single TableInfoPB to DocDB (a memtable write).

Currently the design document is under review.

This specific task of storing table metadata in DocDB is being tracked at #14221. This issue, i.e. #14031, represents the larger umbrella effort.

Additional not-too-critical finding:

There is a significant difference in the performance of table creation in tablegroups and colocated databases due to implementation differences.

For RF 1, without the above optimization, the total creation time for 60K tables on a local setup was:

  • With tablegroups: 6.3 hours
  • With colocated database: 11.7 hours.

The performance difference is being partially addressed via D20132.
