Lambda gradual deployment fails under the load when runtime is updated from dotnetcore2 to dotnetcore3 #1641

Closed
zllvm opened this issue Jul 1, 2020 · 2 comments

zllvm commented Jul 1, 2020

Description:

We are updating a .NET application hosted on Lambda from .NET Core 2.1 to .NET Core 3.1 using the following CloudFormation resource:

Lambda:
    Type: AWS::Serverless::Function
    Properties:
      Timeout: 40
      MemorySize: !Ref MemorySize
      Runtime: dotnetcore3.1
      Role: !GetAtt SomeRole.Arn
      Handler: some.handler
      CodeUri: 
        Bucket: !Ref Bucket
        Key: !Ref Key
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5
      DeploymentPreference:
        Type: Canary10Percent5Minutes
        Alarms:
          - !Ref ErrorLevelGreaterThanZeroAlarm
        Hooks:
          PreTraffic: !Ref PreTrafficHook
      Environment:
        Variables:
          ASPNETCORE_ENVIRONMENT: !Ref Environment
          LOG_LEVEL: !Ref LogLevel
          CurrentVersion: !Ref Version 

Hence, the runtime was updated from dotnetcore2.1 to dotnetcore3.1.

This type of update has been executed successfully for many of our services in production. However, while updating a service that had a constant load on it (about 40 requests per second), the canary deployment failed while 10% of the traffic was being shifted.
The following error was logged in CloudWatch Logs:

It was not possible to find any compatible framework version
The specified framework 'Microsoft.AspNetCore.App', version '3.1.0' was not found.
- Check application dependencies and target a framework version installed at:
/var/lang/bin/
- Installing .NET Core prerequisites might help resolve this problem:
https://go.microsoft.com/fwlink/?LinkID=798306&clcid=0x409
- The .NET Core framework and SDK can be installed from:
https://aka.ms/dotnet-download
- The following versions are installed:

2.1.15 at [/var/lang/bin/shared/Microsoft.AspNetCore.App]
2.1.15 at [/var/lang/bin/shared/Microsoft.AspNetCore.App]

Failed to execute the Lambda function. The dotnet CLI failed to start with the provided deployment package. Please check CloudWatch logs for this Lambda function to get detailed information about this failure.: LambdaException

It seems that the new version of the function was still using the old runtime instead of the new one during the canary deployment.
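To make that inference easier to check, here is a minimal sketch that pulls the requested framework version and the installed versions out of the error text and reports whether any installed version shares the requested major version. The function names and the `SAMPLE` constant are hypothetical; the regular expressions are only matched against the message format shown above.

```python
import re

# Excerpt of the error text from the CloudWatch log above.
SAMPLE = """The specified framework 'Microsoft.AspNetCore.App', version '3.1.0' was not found.
- The following versions are installed:

2.1.15 at [/var/lang/bin/shared/Microsoft.AspNetCore.App]
2.1.15 at [/var/lang/bin/shared/Microsoft.AspNetCore.App]
"""


def parse_framework_error(log_text):
    """Extract the requested framework/version and the installed versions."""
    requested = re.search(
        r"The specified framework '([^']+)', version '([^']+)' was not found",
        log_text,
    )
    installed = re.findall(
        r"^\s*(\d+(?:\.\d+)+) at \[([^\]]+)\]", log_text, re.MULTILINE
    )
    return {
        "framework": requested.group(1) if requested else None,
        "requested_version": requested.group(2) if requested else None,
        # De-duplicate, since the log can list the same version twice.
        "installed_versions": sorted({version for version, _ in installed}),
    }


def major_version_mismatch(parsed):
    """True when no installed version shares the requested major version."""
    if parsed["requested_version"] is None:
        return False
    wanted_major = parsed["requested_version"].split(".")[0]
    return all(
        version.split(".")[0] != wanted_major
        for version in parsed["installed_versions"]
    )


if __name__ == "__main__":
    parsed = parse_framework_error(SAMPLE)
    print(parsed, major_version_mismatch(parsed))
```

A mismatch here is consistent with the package built for 3.1 being started on the 2.1 runtime, but it does not by itself show why that happened.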

To work around the issue, we manually deployed a second Lambda with the runtime set to dotnetcore3.1 and switched all the traffic in API Gateway to it. While the original Lambda was receiving no traffic, we were able to update it with no issue, and then we switched the traffic back.

Steps to reproduce the issue:

  1. Create a dotnetcore2.1 lambda with AWS::Serverless::Function and DeploymentPreference set to Canary10Percent5Minutes
  2. Put the Lambda under a constant load of >= 40 req/s
  3. Try to update the runtime to dotnetcore3.1
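For step 2, a rough load generator could look like the sketch below. It is not the tool we used; the URL is a placeholder for whatever API Gateway endpoint fronts the function, and `send_request` and `run_load` are hypothetical helper names. It paces requests at a fixed rate and counts the status codes that come back.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def send_request(url):
    """Issue one GET and return the HTTP status code (0 on connection failure)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; keep their status code.
        return error.code
    except Exception:
        return 0


def run_load(url, rate_per_second=40, duration_seconds=60,
             send=send_request, sleep=time.sleep, clock=time.monotonic):
    """Send roughly `rate_per_second` requests per second; return status counts."""
    interval = 1.0 / rate_per_second
    total = int(rate_per_second * duration_seconds)
    counts = {}
    with ThreadPoolExecutor(max_workers=rate_per_second) as pool:
        futures = []
        start = clock()
        for i in range(total):
            # Wait until this request's scheduled send time.
            delay = start + i * interval - clock()
            if delay > 0:
                sleep(delay)
            futures.append(pool.submit(send, url))
        for future in futures:
            status = future.result()
            counts[status] = counts.get(status, 0) + 1
    return counts


if __name__ == "__main__":
    # Placeholder URL: replace with the real endpoint before running.
    print(run_load("https://example.invalid/", rate_per_second=40, duration_seconds=300))
```

Run it for the whole canary window (five minutes for Canary10Percent5Minutes) and start the stack update while it is running.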

Observed result:
The error above, followed by a CloudFormation stack rollback.

Expected result:
Successful deployment.


mokele commented Jul 1, 2020

You might also be interested in this issue I raised this morning regarding the use of !Ref within environment variables when using AutoPublishAlias: #1640

Contributor

jfuss commented Mar 3, 2022

@zllvm Thanks for reporting the issue. This doesn't relate to SAM but does relate to AWS Lambda. After talking with some engineers on Lambda, we think this may be a race condition between updating the Function's code and updating the Function's config (i.e. the runtime). By the time this happens, SAM has already run (SAM runs before resources are deployed in CloudFormation).

I did pass this along to the correct team internally, but I'm going to close this as there is no action for the SAM team.
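If the race described above is what happened, one possible guard is to have the PreTraffic hook confirm that the newly published version reports the expected runtime before any traffic is shifted. The sketch below is an assumption about how such a check might look, not something SAM or Lambda does for you. `version_runtime_ready` is a hypothetical helper; it expects a client that exposes `get_function_configuration`, such as the boto3 Lambda client, and it is untested against the actual race.

```python
def version_runtime_ready(lambda_client, function_name, qualifier, expected_runtime):
    """Return True when the version or alias reports the expected runtime
    and its last configuration update finished successfully."""
    config = lambda_client.get_function_configuration(
        FunctionName=function_name, Qualifier=qualifier
    )
    runtime_ok = config.get("Runtime") == expected_runtime
    # LastUpdateStatus is one of "Successful", "Failed" or "InProgress";
    # treat a missing field as successful.
    update_ok = config.get("LastUpdateStatus", "Successful") == "Successful"
    return runtime_ok and update_ok
```

In a hook this could be called as `version_runtime_ready(boto3.client("lambda"), function_name, new_version, "dotnetcore3.1")`, with the hook reporting `Failed` to CodeDeploy when it returns False so the deployment stops before the 10% shift.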

jfuss closed this as completed Mar 3, 2022