(aws-logs): log retention update can race against the target lambda's log group creation resulting in OperationAbortedException #15709
Comments
Easiest way to fix this is to update the Lambda to be resilient against this occurring.
How would you recommend approaching this? Would adding the error code to this exception help? Or a global retry?

I would say retry on the specific error.
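Retrying on the specific error can be sketched as a small helper that rethrows everything except the named retryable error. This is an illustrative sketch, not the actual CDK provider code; `retryOnError`, `maxAttempts`, and `delayMs` are hypothetical names.

```typescript
// Hypothetical helper: retry an async operation only when it fails with a
// specific error name (e.g. 'OperationAbortedException'); rethrow otherwise.
async function retryOnError<T>(
  fn: () => Promise<T>,
  errorName: string,
  maxAttempts = 5,
  delayMs = 100,
): Promise<T> {
  let lastError: Error | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err as Error;
      // Only the specific, retryable error is retried; anything else bubbles up.
      if ((err as Error).name !== errorName) {
        throw err;
      }
      // Simple linear backoff between attempts.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```

A global retry, by contrast, risks masking genuinely non-retryable failures, which is why retrying only on the named error is the safer choice here.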
Please, could someone review the proposed fix? We have 2 bugs in `packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts`.
Fixes: #15709

When creating a Lambda function with log retention, CDK actually creates two Lambda functions. The second function alters the log retention of the first function's log group as well as the retention of its own log group. Because log group creation is asynchronous, the log retention function tries to pre-create both log groups to guarantee it has an object to work on. If a normal execution of the first function creates the related log group at the same time, an "OperationAbortedException: ... Please retry" error is returned. The existing code handles this situation for the log retention function's own log group but not for the first function's. This fix adds the retry pattern to the general log group creation code.

The existing code also had a second bug: if OperationAbortedException is hit, the error is hidden, but the retention policy is skipped and never actually applied. This fix addresses that bug as well.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
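The corrected flow the PR describes (retry creation on the race, and never silently skip applying retention) can be sketched against a minimal client interface. `LogsClient` and `ensureRetention` are hypothetical names for illustration; the real provider calls the CloudWatch Logs API directly.

```typescript
// Minimal interface standing in for the CloudWatch Logs calls the provider makes.
interface LogsClient {
  createLogGroup(logGroupName: string): Promise<void>;
  putRetentionPolicy(logGroupName: string, retentionInDays: number): Promise<void>;
}

// Sketch of the fixed flow: retry creation when it races against the Lambda
// service creating the same group, then always apply the retention policy.
async function ensureRetention(
  client: LogsClient,
  logGroupName: string,
  retentionInDays: number,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await client.createLogGroup(logGroupName);
      break; // created successfully
    } catch (err) {
      const name = (err as Error).name;
      if (name === 'ResourceAlreadyExistsException') break; // group already exists: fine
      if (name !== 'OperationAbortedException' || attempt === maxAttempts) throw err;
      // Lost the race against the Lambda service creating the same group: retry.
    }
  }
  // Apply the retention policy unconditionally, instead of silently skipping
  // it when creation raced (the second bug described above).
  await client.putRetentionPolicy(logGroupName, retentionInDays);
}
```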
Includes fix for aws/aws-cdk#15709

* bump cdk to v1.124.0 (includes fix for aws/aws-cdk#15709)
* update unit tests
* match aws-cdk version in CI
@steved we still have this error when adding logRetention. CDK version is 1.124.0.
I am also experiencing this issue when using
My team is hitting this with version
We have some bash scripting that allows us to deploy multiple CDK apps in parallel that share some base infrastructure, and we are very consistently hitting this issue. It actually seems worse with the new fix: before, it would fail silently and let the deployment continue, but now it fails the whole CDK deployment because the Lambda function throws errors instead of swallowing them. I haven't captured logs from the Lambda yet, but I am seeing failures like this in the
This leads me to believe the retry logic isn't being hit at all, since the code for that looks like it should throw a new error with the message
We're also getting this error with
Here's the stack we're getting:
We've been hitting this as well, `"@aws-cdk/core": "1.126.0"`

EDIT: Same problem with
I've opened a new ticket, since many people have been reporting that the issue still seems to be happening. I believe I've made some findings about why it might still be causing problems: #17546
When a Lambda function runs, a log group and stream are created if they do not already exist. When creating a CDK Lambda function, the log retention update can then race with this background log group creation, resulting in an OperationAbortedException error.

Reproduction Steps
Hard to reproduce reliably, as it relies on a race condition between AWS and the log retention lambda.
What did you expect to happen?
The log retention policy is properly applied.
What actually happened?
Environment
Other
AWS log group creation CloudTrail event:
Log retention API call:
#2237 seems to be a fix for the same potential error on the actual log retention lambda's log group, but not the target.