MetaCache can issue excessive RPCs when looking up a co-located table's tablet by key #7413

Closed
ttyusupov opened this issue Mar 1, 2021 · 0 comments

@ttyusupov
Contributor

The fix for #6890 introduced a potential performance issue for the first lookup of a tablet by key for colocated tables. Instead of sending one RPC on the first lookup for a colocated table and then reusing the result for all tables co-located with it, MetaCache sends one RPC per table (only the first time each table is looked up; after that it reuses the result).
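
To make the pattern concrete, here is a minimal, self-contained C++ sketch of the pre-fix behavior. The names (`LookupCache`, `FetchLocationsFromMaster`) are hypothetical stand-ins for illustration, not the actual `MetaCache` API; the point is only that a cache keyed by table ID alone takes one miss, and therefore one master RPC, per co-located table:

```
// A minimal sketch, assuming hypothetical names; not the actual MetaCache API.
#include <iostream>
#include <map>
#include <string>

struct LookupCache {
  // Entries are keyed by table ID only, so each co-located table
  // starts with its own empty entry.
  std::map<std::string, std::string> tablet_by_table_;
  int rpc_count_ = 0;

  // Stands in for the master RPC: every co-located table resolves to
  // the same shared tablet, but this cache does not exploit that.
  std::string FetchLocationsFromMaster(const std::string& /*table_id*/) {
    ++rpc_count_;
    return "Tablet0";
  }

  std::string LookupTabletByKey(const std::string& table_id) {
    auto it = tablet_by_table_.find(table_id);
    if (it != tablet_by_table_.end()) {
      return it->second;  // Cache hit: no RPC.
    }
    // Cache miss: one RPC per table, even for co-located tables.
    const std::string tablet = FetchLocationsFromMaster(table_id);
    tablet_by_table_[table_id] = tablet;
    return tablet;
  }
};

int main() {
  LookupCache cache;
  cache.LookupTabletByKey("Table1");  // miss: RPC #1
  cache.LookupTabletByKey("Table2");  // miss: RPC #2, although Tablet0 is already known
  std::cout << "master RPCs issued: " << cache.rpc_count_ << "\n";  // prints 2
}
```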

@ttyusupov ttyusupov added the kind/bug This issue is a bug label Mar 1, 2021
@ttyusupov ttyusupov self-assigned this Mar 1, 2021
ttyusupov added a commit that referenced this issue Mar 5, 2021
…itions list for co-located tables

Summary:
The fix for #6890 introduced a potential performance issue for the first lookup of a tablet by key for colocated tables. Instead of sending one RPC on the first lookup for a colocated table and then reusing the result for all tables co-located with it, MetaCache sends one more RPC each time another table co-located with the first is queried to resolve a tablet by key.
Since all colocated tables share the same tablet, we can cache the locations from the first RPC to any co-located table and then reuse the result for `MetaCache::LookupTabletByKey` calls on any other table co-located with the one already queried.

Suppose we have colocated tables `Table1` and `Table2` sharing `Tablet0`. Without the fix, the behavior is as follows:
1. Someone calls `MetaCache::LookupTabletByKey` for `Table1` and `partition_key=p`
2. `MetaCache` checks that it doesn’t have `TableData` for `Table1`, initializes `TableData` for `Table1` with the list of partitions for `Table1`, and sends an RPC to the master
3. Master returns tablet locations for both `Table1` and `Table2`, because they are colocated and share the same set of tablets
4. `MetaCache` updates `TableData::tablets_by_partition` for `Table1`
5. Caller gets `Tablet0` as a response to `MetaCache::LookupTabletByKey`
6. Someone calls `MetaCache::LookupTabletByKey` for `Table2` and `partition_key=p`
7. `MetaCache` checks that it doesn’t have `TableData` for `Table2` and sends an RPC to the master

With the fix, at step 4 `MetaCache` also initializes `TableData` for `Table2` using the same partitions list that was used for `Table1`, and updates `TableData::tablets_by_partition` for both tables. So at step 7, `MetaCache` already has `TableData` for `Table2` and responds with the tablet without an RPC to the master.
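
Under the same hypothetical names as the sketch above (this is not the real `MetaCache` code), the fix corresponds to populating a cache entry for every table mentioned in the master's response, since the response for a co-located table already carries locations for all tables sharing the tablet (step 3):

```
// A minimal sketch of the fixed behavior, assuming hypothetical
// LookupCache/MasterResponse types; not the real MetaCache code.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One (table_id, tablet_id) pair per table mentioned in the response.
using MasterResponse = std::vector<std::pair<std::string, std::string>>;

struct LookupCache {
  std::map<std::string, std::string> tablet_by_table_;
  int rpc_count_ = 0;

  // Stands in for the master RPC: the response reports locations for
  // every co-located table (step 3 above).
  MasterResponse FetchLocationsFromMaster(const std::string& /*table_id*/) {
    ++rpc_count_;
    return {{"Table1", "Tablet0"}, {"Table2", "Tablet0"}};
  }

  std::string LookupTabletByKey(const std::string& table_id) {
    auto it = tablet_by_table_.find(table_id);
    if (it != tablet_by_table_.end()) {
      return it->second;  // Table2's lookup lands here after the fix.
    }
    // One RPC, but cache an entry for every table in the response:
    // the analogue of initializing TableData for both tables at step 4.
    for (const auto& entry : FetchLocationsFromMaster(table_id)) {
      tablet_by_table_[entry.first] = entry.second;
    }
    return tablet_by_table_.at(table_id);
  }
};

int main() {
  LookupCache cache;
  cache.LookupTabletByKey("Table1");  // RPC #1 fills entries for Table1 and Table2
  cache.LookupTabletByKey("Table2");  // cache hit, no RPC
  std::cout << "master RPCs issued: " << cache.rpc_count_ << "\n";  // prints 1
}
```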

- Fixed `MetaCache::ProcessTabletLocations` to reuse the partitions list for co-located tables
- Added `ClientTest.ColocatedTablesLookupTablet` (sketched below)
- Moved the most frequent VLOGs from level 4 to level 5 for `MetaCache`
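
The real test runs against a full client/cluster fixture that this issue does not show, so the following is only a rough, self-contained GoogleTest sketch of the property it guards, written against an invented `StubMetaCache` with the post-fix behavior:

```
// Illustrative only: StubMetaCache is invented for this sketch and is
// not the yb::client test fixture.
#include <gtest/gtest.h>

#include <map>
#include <string>

// Stub with the post-fix behavior: the first lookup for the colocated
// group fills entries for every table sharing the tablet.
struct StubMetaCache {
  std::map<std::string, std::string> tablet_by_table;
  int master_rpcs = 0;

  std::string LookupTabletByKey(const std::string& table_id) {
    if (tablet_by_table.count(table_id) == 0) {
      ++master_rpcs;
      tablet_by_table["Table1"] = "Tablet0";
      tablet_by_table["Table2"] = "Tablet0";
    }
    return tablet_by_table.at(table_id);
  }
};

TEST(ColocatedTablesLookupTabletSketch, SecondTableNeedsNoMasterRpc) {
  StubMetaCache cache;
  EXPECT_EQ(cache.LookupTabletByKey("Table1"), "Tablet0");
  EXPECT_EQ(cache.LookupTabletByKey("Table2"), "Tablet0");
  // The regression being guarded: one master round trip for the whole group.
  EXPECT_EQ(cache.master_rpcs, 1);
}
```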

Test Plan:
For ASAN/TSAN/release/debug:
```
ybd --gtest_filter ClientTest.ColocatedTablesLookupTablet -n 100 -- -p 1
```

Reviewers: mbautin, bogdan

Reviewed By: mbautin, bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D10755
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
…use partitions list for co-located tables