Deleting and recreating a container => stale container cache in other microservices #3092

Closed
Mortana89 opened this issue Mar 17, 2022 · 8 comments

Comments

@Mortana89

Version: 3.23.0

Describe the bug
When deleting and recreating a container in one microservice, the other microservices still hold an 'invalid' reference to the container.
This is because the cache contains a path to a container that no longer exists.

To Reproduce
Delete and recreate a container, and execute a query on that container

Expected behavior
Collection cache should, after a miss (404) in the cache, re-resolve the container path by querying the Cosmos DB by collection name.

Actual behavior
404

Environment summary
SDK Version: 3.23.0
OS Version (e.g. Windows, Linux, MacOSX): windows

@sourabh1007 added the bug and needs-investigation labels on Mar 17, 2022
@j82w
Contributor

j82w commented Mar 17, 2022

@Mortana89 are you sure you didn't send a request while the container was deleted? There is a test validating that this is handled correctly:

public async Task ContainterReCreateStatelessTest(bool operationBetweenRecreate, bool isQuery)

@Mortana89
Author

Hi @j82w,
Sorry, I was wrong, I meant a 403 forbidden (using managed identities):
Response status code does not indicate success: Forbidden (403); Substatus: 5301; ActivityId: 44a1aa97-8ca5-4553-9a8e-8f35224b25c2; Reason: (Response status code does not indicate success: Forbidden (403); Substatus: 5301; ActivityId: 44a1aa97-8ca5-4553-9a8e-8f35224b25c2; Reason: (Response status code does not indicate success: Forbidden (403); Substatus: 5301; ActivityId: 44a1aa97-8ca5-4553-9a8e-8f35224b25c2; Reason: (Request blocked by Auth portal-p-sql : Request is blocked because principal [c75829b0-14ce-401b-8757-b3df28245bc7] does not have required RBAC permissions to perform action [Microsoft.DocumentDB/databaseAccounts/readMetadata] on resource [ZOtHAO0XKAA=]. Learn more: https://aka.ms/cosmos-native-rbac.
ActivityId: 44a1aa97-8ca5-4553-9a8e-8f35224b25c2, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Windows/10.0.14393 cosmos-netstandard-sdk/3.23.1);););

@ealsur
Member

ealsur commented Mar 18, 2022

This is not a case for the SDK to handle. A 403 / 5301 from the service is not a retryable scenario; it means the service is rejecting the request. My hunch is that the service is failing with 403 instead of returning 404. The resource ZOtHAO0XKAA= is the one that was deleted. I believe this is more of a service issue and requires a support ticket, @j82w?

@j82w
Contributor

j82w commented Mar 18, 2022

I agree with @ealsur. Based on the error message the SDK did retry on the new container, but that principal did not have access to it yet.

@Mortana89
Author

So to clarify, do you expect us to log something somewhere or?

Thanks!

@ealsur
Member

ealsur commented Mar 18, 2022

Service side issues need to be reported to the service: https://aka.ms/azure-support
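
When opening a support ticket, it helps to include the ActivityId and the request diagnostics from the failing call. The following is a minimal sketch, assuming the .NET v3 SDK; the `SupportDiagnostics` class and `ReadAndCaptureAsync` method are hypothetical names used for illustration, and the substatus value 5301 is taken from the error message earlier in this thread.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper (not part of the SDK): surfaces the details a support
// ticket usually needs, then rethrows without retrying.
public static class SupportDiagnostics
{
    // Substatus reported in the 403 error quoted in this thread.
    private const int RbacBlockedSubStatus = 5301;

    public static async Task<T> ReadAndCaptureAsync<T>(
        Container container, string id, PartitionKey partitionKey)
    {
        try
        {
            ItemResponse<T> response = await container.ReadItemAsync<T>(id, partitionKey);
            return response.Resource;
        }
        catch (CosmosException ex)
            when (ex.StatusCode == HttpStatusCode.Forbidden
                  && ex.SubStatusCode == RbacBlockedSubStatus)
        {
            // The ActivityId lets the service team locate the request on their side.
            Console.Error.WriteLine($"403/5301 ActivityId={ex.ActivityId}");
            Console.Error.WriteLine(ex.Diagnostics?.ToString());
            throw; // Not retryable per the discussion above, so propagate.
        }
    }
}
```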

@Mortana89
Author

Small update here from Azure Support:

This is a known issue that happens when the client uses AAD auth with Direct mode and tries to access or modify data in a collection that was recently deleted and recreated. We are working on a fix, but it is expected to take 1 or 2 months to fully deploy. In the meantime, we recommend working around the issue in one of two ways:

- If deleting and recreating the collection is uncommon, restarting the client should fix the issue.
- If deleting and recreating the collection is a common scenario, consider switching to Gateway mode, which does not have this issue.
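
For reference, switching to Gateway mode in the .NET v3 SDK is a client option. This is a minimal sketch, assuming AAD auth through `DefaultAzureCredential` from the Azure.Identity package; the endpoint is a placeholder, not a value from this thread.

```csharp
using Azure.Identity;
using Microsoft.Azure.Cosmos;

// Placeholder endpoint (assumption); replace with your account's URI.
const string accountEndpoint = "https://<your-account>.documents.azure.com:443/";

CosmosClientOptions options = new CosmosClientOptions
{
    // The .NET v3 SDK defaults to ConnectionMode.Direct.
    ConnectionMode = ConnectionMode.Gateway
};

// DefaultAzureCredential picks up a managed identity when one is available.
using CosmosClient client = new CosmosClient(
    accountEndpoint, new DefaultAzureCredential(), options);
```

Gateway mode routes requests through the service gateway over HTTPS, which can add latency compared with Direct mode, so it may be worth measuring before adopting it permanently.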

@ealsur removed the bug label on Apr 11, 2022
@ealsur
Member

ealsur commented May 26, 2022

Closing, as this is related to a service-side issue and not the SDK.

@ealsur closed this as completed on May 26, 2022