MetaCache can issue excessive RPCs when looking up a co-located table's tablet by key #7413

Closed
ttyusupov opened this issue Mar 1, 2021 · 0 comments

@ttyusupov
Contributor

The fix for #6890 introduced a potential performance issue for the first lookup of a tablet by key for colocated tables. Instead of sending one RPC on the first lookup for a colocated table and then reusing the result for all tables co-located with it, MetaCache sends one RPC per table (only the first time each table is looked up; after that it reuses the result).
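
To make the pattern concrete, here is a minimal, self-contained C++ sketch of the pre-fix behavior. The names (`LookupCache`, `FetchLocationsFromMaster`) are hypothetical stand-ins for illustration, not the actual `MetaCache` API; the point is only that a cache keyed by table ID alone takes one miss, and therefore one master RPC, per co-located table:

```
// A minimal sketch, assuming hypothetical names; not the actual MetaCache API.
#include <iostream>
#include <map>
#include <string>

struct LookupCache {
  // Entries are keyed by table ID only, so each co-located table
  // starts with its own empty entry.
  std::map<std::string, std::string> tablet_by_table_;
  int rpc_count_ = 0;

  // Stands in for the master RPC: every co-located table resolves to
  // the same shared tablet, but this cache does not exploit that.
  std::string FetchLocationsFromMaster(const std::string& /*table_id*/) {
    ++rpc_count_;
    return "Tablet0";
  }

  std::string LookupTabletByKey(const std::string& table_id) {
    auto it = tablet_by_table_.find(table_id);
    if (it != tablet_by_table_.end()) {
      return it->second;  // Cache hit: no RPC.
    }
    // Cache miss: one RPC per table, even for co-located tables.
    const std::string tablet = FetchLocationsFromMaster(table_id);
    tablet_by_table_[table_id] = tablet;
    return tablet;
  }
};

int main() {
  LookupCache cache;
  cache.LookupTabletByKey("Table1");  // miss: RPC #1
  cache.LookupTabletByKey("Table2");  // miss: RPC #2, although Tablet0 is already known
  std::cout << "master RPCs issued: " << cache.rpc_count_ << "\n";  // prints 2
}
```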

@ttyusupov ttyusupov added the kind/bug This issue is a bug label Mar 1, 2021
@ttyusupov ttyusupov self-assigned this Mar 1, 2021
ttyusupov added a commit that referenced this issue Mar 5, 2021
…itions list for co-located tables

Summary:
The fix for #6890 introduced a potential performance issue for the first lookup of a tablet by key for colocated tables. Instead of sending one RPC on the first lookup for a colocated table and then reusing the result for all tables co-located with it, MetaCache sends one more RPC each time another table co-located with the first is queried to resolve a tablet by key.
Since all colocated tables share the same tablet, we can cache the locations from the first RPC to any co-located table and then reuse the result for `MetaCache::LookupTabletByKey` calls on any other table co-located with the one already queried.

Suppose we have colocated tables `Table1` and `Table2` sharing `Tablet0`. Without the fix, the behavior is as follows:
1. Someone calls `MetaCache::LookupTabletByKey` for `Table1` and `partition_key=p`
2. `MetaCache` checks that it doesn’t have `TableData` for `Table1`, initializes `TableData` for `Table1` with the list of partitions for `Table1`, and sends an RPC to the master
3. Master returns tablet locations for both `Table1` and `Table2`, because they are colocated and share the same set of tablets
4. `MetaCache` updates `TableData::tablets_by_partition` for `Table1`
5. Caller gets `Tablet0` as a response to `MetaCache::LookupTabletByKey`
6. Someone calls `MetaCache::LookupTabletByKey` for `Table2` and `partition_key=p`
7. `MetaCache` checks that it doesn’t have `TableData` for `Table2` and sends an RPC to the master

With the fix, at step 4 `MetaCache` also initializes `TableData` for `Table2` using the same partitions list that was used for `Table1`, and updates `TableData::tablets_by_partition` for both tables. So at step 7, `MetaCache` already has `TableData` for `Table2` and responds with the tablet without an RPC to the master.
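
Under the same hypothetical names as the sketch above (this is not the real `MetaCache` code), the fix corresponds to populating a cache entry for every table mentioned in the master's response, since the response for a co-located table already carries locations for all tables sharing the tablet (step 3):

```
// A minimal sketch of the fixed behavior, assuming hypothetical
// LookupCache/MasterResponse types; not the real MetaCache code.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One (table_id, tablet_id) pair per table mentioned in the response.
using MasterResponse = std::vector<std::pair<std::string, std::string>>;

struct LookupCache {
  std::map<std::string, std::string> tablet_by_table_;
  int rpc_count_ = 0;

  // Stands in for the master RPC: the response reports locations for
  // every co-located table (step 3 above).
  MasterResponse FetchLocationsFromMaster(const std::string& /*table_id*/) {
    ++rpc_count_;
    return {{"Table1", "Tablet0"}, {"Table2", "Tablet0"}};
  }

  std::string LookupTabletByKey(const std::string& table_id) {
    auto it = tablet_by_table_.find(table_id);
    if (it != tablet_by_table_.end()) {
      return it->second;  // Table2's lookup lands here after the fix.
    }
    // One RPC, but cache an entry for every table in the response:
    // the analogue of initializing TableData for both tables at step 4.
    for (const auto& entry : FetchLocationsFromMaster(table_id)) {
      tablet_by_table_[entry.first] = entry.second;
    }
    return tablet_by_table_.at(table_id);
  }
};

int main() {
  LookupCache cache;
  cache.LookupTabletByKey("Table1");  // RPC #1 fills entries for Table1 and Table2
  cache.LookupTabletByKey("Table2");  // cache hit, no RPC
  std::cout << "master RPCs issued: " << cache.rpc_count_ << "\n";  // prints 1
}
```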

- Fixed `MetaCache::ProcessTabletLocations` to reuse the partitions list for co-located tables
- Added `ClientTest.ColocatedTablesLookupTablet` (sketched below)
- Moved the most frequent VLOGs from level 4 to level 5 for `MetaCache`
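
The real test runs against a full client/cluster fixture that this issue does not show, so the following is only a rough, self-contained GoogleTest sketch of the property it guards, written against an invented `StubMetaCache` with the post-fix behavior:

```
// Illustrative only: StubMetaCache is invented for this sketch and is
// not the yb::client test fixture.
#include <gtest/gtest.h>

#include <map>
#include <string>

// Stub with the post-fix behavior: the first lookup for the colocated
// group fills entries for every table sharing the tablet.
struct StubMetaCache {
  std::map<std::string, std::string> tablet_by_table;
  int master_rpcs = 0;

  std::string LookupTabletByKey(const std::string& table_id) {
    if (tablet_by_table.count(table_id) == 0) {
      ++master_rpcs;
      tablet_by_table["Table1"] = "Tablet0";
      tablet_by_table["Table2"] = "Tablet0";
    }
    return tablet_by_table.at(table_id);
  }
};

TEST(ColocatedTablesLookupTabletSketch, SecondTableNeedsNoMasterRpc) {
  StubMetaCache cache;
  EXPECT_EQ(cache.LookupTabletByKey("Table1"), "Tablet0");
  EXPECT_EQ(cache.LookupTabletByKey("Table2"), "Tablet0");
  // The regression being guarded: one master round trip for the whole group.
  EXPECT_EQ(cache.master_rpcs, 1);
}
```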

Test Plan:
For ASAN/TSAN/release/debug:
```
ybd --gtest_filter ClientTest.ColocatedTablesLookupTablet -n 100 -- -p 1
```

Reviewers: mbautin, bogdan

Reviewed By: mbautin, bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D10755
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
…use partitions list for co-located tables