[BUG] Azure Function with UserAssigned ManagedIdentity has a 16% chance to result in Azure.Identity.CredentialUnavailableException #10238

Open
jsquire opened this issue Jun 21, 2024 · 4 comments

Comments

@jsquire
Member

jsquire commented Jun 21, 2024

Issue Transfer

This issue has been transferred from the Azure SDK for .NET repository, #44693.

Please be aware that @nols-neulsen is the author of the original issue and include them for any questions or replies.

Azure SDK triage

The error indicates that the local managed identity endpoint on the host is unavailable or inaccessible to HTTP traffic when the application starts running and the Identity library attempts to acquire a token. Neither the credential nor the application has insight into or influence over this. It requires investigation of the host environment.

Details

Describe the bug

I have a Windows-hosted Function App (Consumption plan) with a single HTTP trigger function.
This function initializes an ArmClient, using ManagedIdentityCredential, to spawn Container App Jobs.
In a test of 902 invocations, the function succeeded only 84% of the time; the other 16% failed with Azure.Identity.CredentialUnavailableException.
Running locally, everything works 100% of the time if I provide an AzureCliCredential. VisualStudioCredential (with sync active) also seems to fail some of the time.

Function App:

  • net8.0
  • dotnet-isolated
  • AZURE_CLIENT_ID = obfuscated
  • ...

Packages:

<PackageReference Include="Azure.Identity" Version="1.12.0" />
<PackageReference Include="Azure.ResourceManager.AppContainers" Version="1.1.1" />
<PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.22.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.2.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore" Version="1.3.2" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.2" />
<PackageReference Include="Microsoft.ApplicationInsights.WorkerService" Version="2.22.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.ApplicationInsights" Version="1.2.0" />

User Assigned Managed Identity role assignments:

"assignableScopes": [
    "/subscriptions/<obfuscated>",
    "/subscriptions/<obfuscated>/resourceGroups/<obfuscated>"
],
"permissions": [
    {
        "actions": [
            "Microsoft.Resources/subscriptions/read",
            "Microsoft.Resources/subscriptions/resourceGroups/read",
            "microsoft.app/jobs/read",
            "microsoft.app/jobs/stop/action",
            "microsoft.app/jobs/start/action"
        ],
        "notActions": [],
        "dataActions": [],
        "notDataActions": []
    }
]

Code:

    try
    {
        var userManagedIdentityId = Environment.GetEnvironmentVariable("AZURE_CLIENT_ID");
        ArgumentException.ThrowIfNullOrEmpty(userManagedIdentityId);
        var resourceIdString = Environment.GetEnvironmentVariable(...);
        ArgumentException.ThrowIfNullOrEmpty(resourceIdString);
        var environment = Environment.GetEnvironmentVariable("Environment");
        
        ...
        var resourceId = new ResourceIdentifier(resourceIdString);
        var subscriptionId = resourceId.SubscriptionId;

        ArmClient armClient;
        switch (environment)
        {
            case "NPRD":
                ...
                armClient = new ArmClient(new ManagedIdentityCredential(userManagedIdentityId), subscriptionId);
                break;
            case "CN":
                ...
                armClient = new ArmClient(
                    new ManagedIdentityCredential(userManagedIdentityId, new TokenCredentialOptions { AuthorityHost = AzureAuthorityHosts.AzureChina }), 
                    subscriptionId, 
                    new ArmClientOptions { Environment = ArmEnvironment.AzureChina });
                break;
            default:
                ...
                armClient = new ArmClient(new DefaultAzureCredential(), subscriptionId); // Pick any available credential, info https://learn.microsoft.com/en-us/dotnet/api/azure.identity.defaultazurecredential?view=azure-dotnet
                break;
        }

        var caj = armClient.GetContainerAppJobResource(resourceId);
        ...
        var template = ...;
        await caj.StartAsync(Azure.WaitUntil.Started, template);

        return ...;
    }
    catch (Exception ex)
    {
        _logger.LogInformation("{Message}", ex.Message);
        _logger.LogInformation("{StackTrace}", ex.StackTrace);
        throw;
    }
}

Error:

Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. 
Multiple attempts failed to obtain a token from the managed identity endpoint.

System.Net.Sockets.SocketException (10013): An attempt was made to access a socket in a way forbidden by its access permissions.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
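
For triage, it can help to pull the underlying socket error out of the wrapped exception so that failures can be grouped by cause in logs. The following is a minimal sketch, not part of the original report. `SocketErrorProbe` and `FindSocketError` are hypothetical names, and it assumes the `SocketException` is reachable through the `InnerException` chain, which the output above suggests but does not confirm. Error 10013 corresponds to `SocketError.AccessDenied`.

```csharp
#nullable enable
using System;
using System.Net.Sockets;

// Hypothetical helper (not from the original report).
public static class SocketErrorProbe
{
    // Walks the InnerException chain, descending into AggregateException
    // children, and returns the first SocketError found, or null if none.
    public static SocketError? FindSocketError(Exception? ex)
    {
        while (ex is not null)
        {
            if (ex is SocketException socketEx)
            {
                return socketEx.SocketErrorCode;
            }

            if (ex is AggregateException aggregate)
            {
                foreach (var inner in aggregate.InnerExceptions)
                {
                    var found = FindSocketError(inner);
                    if (found is not null)
                    {
                        return found;
                    }
                }

                return null;
            }

            ex = ex.InnerException;
        }

        return null;
    }
}
```

In the catch block, `SocketErrorProbe.FindSocketError(ex)` could be logged next to the message, so that access-denied socket failures are distinguishable from other causes of `CredentialUnavailableException`.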

Expected behavior

Retrieving the credential succeeds 100% of the time.

Actual behavior

In 16% of invocations, execution fails with Azure.Identity.CredentialUnavailableException.

Reproduction Steps

Hosting info and code provided in bug description

@1oglop1

1oglop1 commented Aug 1, 2024

@jsquire is this related to #8037?
I just wasted over 8 hours debugging why my user-assigned identity did not have permissions, until I "randomly" stumbled upon this:

https://github.com/Azure/azure-sdk-for-js/blob/65faad76f8091d2e1ce7deca3b79e030347f93ea/sdk/identity/identity/samples/AzureIdentityExamples.md?plain=1#L146-L164

As mentioned there, I need to set the managedIdentityClientId property or the AZURE_CLIENT_ID variable to use the managed identity.

IMO the dev experience would be much better if I did not need to fiddle with environment variables or anything else when calling new identity.DefaultAzureCredential. Having written over a thousand AWS Lambdas and GCP functions, I expected the environment to contain all the necessary data so that the SDK could "just work".
I'd expect to have to set the variable only when more than one identity is assigned, and I'd expect to be able to retrieve the necessary IDs from the runtime, similar to https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=windows.

PS. In case this is not the right place, please redirect me.

@jsquire
Member Author

jsquire commented Aug 5, 2024

@1oglop1 : You'll need to address that question to a member of the Functions team, who own the Functions host environment. This is the correct repository for those conversations, which is why I transferred this issue here.

@stebet

stebet commented Jan 7, 2025

I'm having the same problem, but with a System Managed Identity. It works fine on a staging slot but fails intermittently on the production slot with this exception:

Azure.RequestFailedException: An attempt was made to access a socket in a way forbidden by its access permissions. (169.254.169.254:80)
 ---> System.Net.Http.HttpRequestException: An attempt was made to access a socket in a way forbidden by its access permissions. (169.254.169.254:80)

This endpoint is unreachable. The credential should be using the endpoint defined in the IDENTITY_ENDPOINT environment variable (which is set), but for some requests, intermittently, it doesn't seem to use that variable.
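
To illustrate why a missing variable would send traffic to 169.254.169.254, here is a deliberately simplified model of the endpoint selection. It is a sketch under stated assumptions, not the Identity library's actual code: the real library checks several more hosting environments. `ManagedIdentityEnvironment`, `ManagedIdentityEndpointKind`, and `Classify` are hypothetical names, and IDENTITY_HEADER is not mentioned in this thread; it is assumed here to be the companion variable that App Service-style hosts set alongside IDENTITY_ENDPOINT.

```csharp
#nullable enable
using System;

// Hypothetical names; a simplified model, not the library's implementation.
public enum ManagedIdentityEndpointKind
{
    AppServiceStyle, // host-provided endpoint from IDENTITY_ENDPOINT
    ImdsFallback     // default instance metadata address
}

public static class ManagedIdentityEnvironment
{
    // Address seen in the exception above.
    public const string ImdsAddress = "169.254.169.254";

    // getEnv is injected so the logic can be exercised without touching
    // the real process environment.
    public static ManagedIdentityEndpointKind Classify(Func<string, string?> getEnv)
    {
        bool hasEndpoint = !string.IsNullOrEmpty(getEnv("IDENTITY_ENDPOINT"));
        bool hasHeader = !string.IsNullOrEmpty(getEnv("IDENTITY_HEADER"));

        // Assumption: both variables must be visible to the worker process;
        // otherwise the request falls through to the IMDS address.
        return hasEndpoint && hasHeader
            ? ManagedIdentityEndpointKind.AppServiceStyle
            : ManagedIdentityEndpointKind.ImdsFallback;
    }
}
```

Calling `ManagedIdentityEnvironment.Classify(Environment.GetEnvironmentVariable)` at startup and logging the result per process would show whether the failing instances are the ones that classify as `ImdsFallback`.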

@stebet

stebet commented Jan 10, 2025

@richlander, @liliankasem, @fabiocav Can we get someone to look at this? This is preventing function apps from using more secure methods (managed identities) to connect to other Azure services instead of user/pass connection strings.

I'm running a pretty big function app on a Consumption plan using .NET 9 isolated workers, configured for System Managed Identity. Random instances of the function fail to run SQL queries because the required function host environment variables do not seem to be properly propagated to the function process.

Specifically, this concerns the IDENTITY_ENDPOINT variable, which is set when the managed identity is configured (not by us; it is a system-managed environment variable). It should exist in the app process as well so that, for example, Microsoft.Data.SqlClient can fetch a token to connect to an Azure SQL Server.

In our function startup we run the snippet below to check whether the variable is missing. The error is logged for some function processes and not for others, so something is broken during host/process startup.

if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable("IDENTITY_ENDPOINT")))
{
  logger.LogError("Required environment variable IDENTITY_ENDPOINT is missing! We will be unable to use System Managed Identity connections to external services!");
}
