The PartitionKeyRangeCache uses the AsyncCache to cache one CollectionRoutingMap per container RID. When a split or a similar event occurs, it uses the change feed to fetch only the new partition key ranges and creates a new CollectionRoutingMap by applying the returned values to the existing one.
The issue is that if an exception occurs, such as a timeout or some other transient failure, the entire CollectionRoutingMap is removed from the cache. The SDK then has to recreate the entire CollectionRoutingMap by reading the entire change feed again. As a result, if a transient issue occurs during a split, all requests are blocked until the new CollectionRoutingMap is created.
This problem is worse because users can currently disable 429 retries, as described in #3055. If a 429 is hit, the CollectionRoutingMap is removed from the cache and has to be rebuilt by reading all the ranges again.
Solution:
The PartitionKeyRangeCache should be converted to use the new NonBlockingAsyncCache, which the address cache already uses.
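To illustrate the behavior being asked for, here is a minimal sketch in Python (not the SDK's C# code). `StaleTolerantCache`, `get`, and `refresh` are hypothetical names made up for this example and are not the NonBlockingAsyncCache API. The only point it shows is that a failed refresh surfaces the error to the caller that triggered it but leaves the previously cached value in place.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class StaleTolerantCache(Generic[K, V]):
    """Hypothetical cache: a failed refresh never evicts the old value."""

    def __init__(self) -> None:
        self._values: Dict[K, V] = {}

    async def get(self, key: K, factory: Callable[[], Awaitable[V]]) -> V:
        # First read for a key: nothing to fall back on, so errors propagate.
        if key not in self._values:
            self._values[key] = await factory()
        return self._values[key]

    async def refresh(
        self, key: K, update: Callable[[Optional[V]], Awaitable[V]]
    ) -> V:
        # The update callback receives the previous value so it can apply
        # an incremental change instead of rebuilding from scratch.
        previous = self._values.get(key)
        # If update raises, the assignment never happens and the old
        # entry stays in the cache for other callers.
        self._values[key] = await update(previous)
        return self._values[key]


async def demo() -> dict:
    cache: StaleTolerantCache[str, dict] = StaleTolerantCache()

    async def initial() -> dict:
        return {"0": "range-0", "1": "range-1"}

    async def failing_update(previous: Optional[dict]) -> dict:
        raise TimeoutError("simulated transient failure")

    await cache.get("containerRid", initial)
    try:
        await cache.refresh("containerRid", failing_update)
    except TimeoutError:
        pass  # the caller that triggered the refresh sees the failure
    # Other callers still get the previously cached routing map.
    return await cache.get("containerRid", initial)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the current behavior described above, the equivalent of `failing_update` would remove the entry, and the next `get` would have to rebuild the whole map.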
Test Case:
If the container has 5 partitions and a split occurs, the 4 other partitions should always be accessible, even if the call to get the new partitions fails or is slow.
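The scenario can be sketched as a small simulation. This is only an illustration in Python under simplifying assumptions: the key space is the integers 0 to 99 rather than the SDK's effective partition key ranges, and `route`, `apply_split`, and `try_refresh` are hypothetical helpers, not SDK methods.

```python
import bisect
from typing import Callable, List, Tuple

# Each entry is (min_inclusive, range_id), kept sorted by min_inclusive.
RoutingMap = List[Tuple[int, str]]


def route(routing_map: RoutingMap, hashed_key: int) -> str:
    """Return the id of the range whose min is the largest one <= hashed_key."""
    mins = [m for m, _ in routing_map]
    return routing_map[bisect.bisect_right(mins, hashed_key) - 1][1]


def apply_split(
    routing_map: RoutingMap, parent_id: str, children: RoutingMap
) -> RoutingMap:
    """Build a new map by replacing the parent range with its children."""
    kept = [entry for entry in routing_map if entry[1] != parent_id]
    return sorted(kept + children)


def try_refresh(
    current: RoutingMap, fetch_changes: Callable[[], Tuple[str, RoutingMap]]
) -> RoutingMap:
    """Apply incremental changes; on a transient failure keep the current map."""
    try:
        parent_id, children = fetch_changes()
    except TimeoutError:
        return current
    return apply_split(current, parent_id, children)


def failing_fetch() -> Tuple[str, RoutingMap]:
    raise TimeoutError("simulated transient failure")


def working_fetch() -> Tuple[str, RoutingMap]:
    # Partition "2" (keys 40..59) split into "5" (40..49) and "6" (50..59).
    return "2", [(40, "5"), (50, "6")]


five_partitions: RoutingMap = [(0, "0"), (20, "1"), (40, "2"), (60, "3"), (80, "4")]

# The refresh fails: the four untouched partitions must still be routable.
after_failure = try_refresh(five_partitions, failing_fetch)
unaffected = [route(after_failure, k) for k in (10, 25, 65, 85)]

# A later refresh succeeds: only the split range changes.
after_success = try_refresh(after_failure, working_fetch)
```

Here `unaffected` is computed against the stale map after the failed refresh, which is the property the test case asks for. An actual test would run against the SDK with fault injection on the partition key range read.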