403, substatus 5301 #3110

Closed
festivus opened this issue Mar 22, 2022 · 6 comments

Comments

@festivus

We are continuously addressing and improving the SDK. If possible, make sure the problem persists in the latest SDK version.

Describe the bug
We are using RBAC for data plane access. We have microservices running inside Service Fabric on a VM scale set. The scale set has a system-assigned managed identity. We have granted this managed identity the role with definition ID 00000000-0000-0000-0000-000000000002. We create a single Cosmos client for each instance of a service. Sometimes a service will fail on all calls to Cosmos with the following error:
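
For readers following along, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: the endpoint, the application name, and the `CosmosClientFactory` class are hypothetical, and the use of `ManagedIdentityCredential` (rather than, say, `DefaultAzureCredential`) is an assumption about how the system-assigned identity is consumed.

```csharp
using System;
using Azure.Identity;
using Microsoft.Azure.Cosmos;

public static class CosmosClientFactory
{
    // Hypothetical endpoint; replace with the real account URI.
    private const string AccountEndpoint = "https://my-account.documents.azure.com:443/";

    // One client per service instance, created lazily and reused for all calls.
    private static readonly Lazy<CosmosClient> LazyClient = new Lazy<CosmosClient>(() =>
        new CosmosClient(
            AccountEndpoint,
            // Assumption: the system-assigned identity of the scale set is used,
            // so no client ID is passed to the credential.
            new ManagedIdentityCredential(),
            new CosmosClientOptions { ApplicationName = "my-service" }));

    public static CosmosClient Client => LazyClient.Value;
}
```

With this shape, every call in the process shares the same client and therefore the same cached AAD token, which matches the observation that a failing instance keeps failing until it is restarted.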

Response status code does not indicate success: Forbidden (403); Substatus: 5301; ActivityId: XXXX; Reason: (Request blocked by Auth XXXX : Request is blocked because principal [XXXX] does not have required RBAC permissions to perform action [Microsoft.DocumentDB/databaseAccounts/readMetadata] on resource [3TxEAO5CNTo=]. Learn more: https://aka.ms/cosmos-native-rbac.
ActivityId: XXXX, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Windows/10.0.17763 cosmos-netstandard-sdk/3.24.1);

But other services running on the same machine in the same scale set work just fine. Shutting down the affected service and starting it back up usually fixes the problem.

So, two open questions:

  1. Why is Cosmos not getting a proper token to begin with?
  2. When Cosmos refreshes the token behind the scenes, why isn't a valid token retrieved?

To Reproduce
This is very hard to reproduce, so I can't publish a repo that reproduces it.

Expected behavior
We expect cosmos to get a token from AAD (either initially or on refresh) with the proper permissions.

Actual behavior
Sometimes AAD does not initially return a token with the proper permissions, and when Cosmos refreshes the token behind the scenes it does not appear to get a proper one either. I'm not sure whether the refresh only updates the expires-on value or actually fetches a new token with the proper permissions.

Environment summary
SDK Version: 3.24.1
OS Version: Windows Server Core 2019


@j82w
Contributor

j82w commented Mar 22, 2022

Do you have the full exception.ToString()? Looking for the diagnostics from the SDK to better understand what the client was doing.
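
A minimal sketch of how the requested output could be captured is shown below. The `CosmosDiagnosticsLogging` class and `WithDiagnosticsAsync` helper are hypothetical names introduced for illustration; `CosmosException`, its `StatusCode`, `SubStatusCode`, `ActivityId`, and `Diagnostics` members come from the SDK. Writing to `Console.Error` is a stand-in for whatever logger the service uses.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CosmosDiagnosticsLogging
{
    // Hypothetical helper: runs a Cosmos operation and logs full details on a 403.
    public static async Task<T> WithDiagnosticsAsync<T>(Func<Task<T>> operation)
    {
        try
        {
            return await operation();
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.Forbidden)
        {
            // ToString() carries the message, the stack trace, and the SDK diagnostics.
            Console.Error.WriteLine(ex.ToString());
            // 5301 is the substatus reported in this issue.
            Console.Error.WriteLine($"SubStatus={ex.SubStatusCode}, ActivityId={ex.ActivityId}");
            // Rethrow so the caller's error handling is unchanged.
            throw;
        }
    }
}
```

A call site would wrap the failing operation, for example `await CosmosDiagnosticsLogging.WithDiagnosticsAsync(() => container.CreateItemAsync(item))`, where `container` and `item` are the caller's own objects.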

@festivus
Author

Just the call stack. But the exception itself isn't really the issue. It's thrown because the token the client holds says it does not have the proper permissions. The issue is why the token says it doesn't have the proper permissions when in fact it does.

Microsoft.Azure.Cosmos.CosmosException:
at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.ProcessMessage (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.CreateItemResponse (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Cosmos.ContainerCore+d__541.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Cosmos.ClientContextCore+<RunWithDiagnosticsHelperAsync>d__391.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Cosmos.ClientContextCore+d__30`1.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.24.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)

@ealsur
Member

ealsur commented Mar 22, 2022

One scenario where the service might be having an issue is reported in #3092. Are you by any chance deleting and recreating the container/database?

@festivus
Author

> One scenario where the service might be having an issue is reported in #3092. Are you by any chance deleting and recreating the container/database?

No, we are not. It's something to do with the AAD token. Either:

  1. AAD isn't returning the proper role assignments on the token, or
  2. AAD is returning the proper role assignments on the token but Cosmos doesn't think so.

I also don't understand why, if AAD doesn't initially return the proper role assignments, the Cosmos refresh of the token doesn't get them either. This instance of the Cosmos client will throw the 403 error forever, until I restart the executable or create a new Cosmos client.
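
One way to narrow down the two possibilities is to request a token from the same process and look at what it carries. The sketch below is only an illustration under stated assumptions: `TokenInspector`, `DecodePayload`, and `DumpAsync` are hypothetical names, the credential type is assumed to be `ManagedIdentityCredential`, and the scope format `https://<account-host>/.default` is an assumption about what the SDK requests. As far as I understand, Cosmos data-plane role assignments are evaluated by the service against the principal ID rather than listed as claims in the token, so the useful comparison is the `oid` claim against the principal shown in the 403 message.

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Identity;

public static class TokenInspector
{
    // Decodes the payload (second segment) of a JWT. The signature is NOT validated.
    public static JsonElement DecodePayload(string jwt)
    {
        string[] parts = jwt.Split('.');
        if (parts.Length < 2) throw new ArgumentException("Not a JWT.", nameof(jwt));

        // Convert base64url to standard base64 and restore the stripped padding.
        string b64 = parts[1].Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');

        string json = Encoding.UTF8.GetString(Convert.FromBase64String(b64));
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.Clone();
    }

    // accountHost is e.g. "my-account.documents.azure.com" (hypothetical).
    public static async Task DumpAsync(string accountHost)
    {
        var credential = new ManagedIdentityCredential();
        // Assumed scope format for the Cosmos account.
        var context = new TokenRequestContext(new[] { $"https://{accountHost}/.default" });
        AccessToken token = await credential.GetTokenAsync(context, CancellationToken.None);

        Console.WriteLine($"ExpiresOn: {token.ExpiresOn:O}");
        JsonElement claims = DecodePayload(token.Token);
        // Compare this object ID with the principal named in the 403 message.
        if (claims.TryGetProperty("oid", out JsonElement oid))
        {
            Console.WriteLine($"oid: {oid.GetString()}");
        }
    }
}
```

Running `DumpAsync` in both a healthy and a failing service instance would show whether the two processes resolve to the same principal and whether the expiry actually moves forward after a refresh.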

@j82w
Contributor

j82w commented Mar 23, 2022

Please upgrade to the latest SDK. The following bug exists in 3.24.0 and has since been fixed. I don't think this will fix the root issue, but it should remove the need to recreate the client.
#3027 Initialization: Fixes the SDK to retry if the initialization fails due to transient errors.

@festivus
Author

OK, I will try it. I'll close this for now because it'll take a while to verify that it works.
