@aws-cdk/aws-s3: notification lambda function does not handle transient s3 errors well #16811
Comments
Thanks for reporting this. I am unassigning and marking this issue as We use +1s to help prioritize our work, and are happy to re-evaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.
Related to #16811: there is sometimes an issue when multiple operations are performed on the same bucket. To get around this in the integration test, I created an additional bucket for the import test.
This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
It doesn't look like this has been addressed. Commenting to keep the issue open.
Hi team, could you please take a look at this?
This problem occurs nearly every time. We have already spent a long time on it, but it seems we can do nothing but wait for your team to fix this issue.
The biggest problem with this comes from the mentioned issue #24762, which was closed as a duplicate of this one. There's a race condition, which makes creating and tearing down these resources in an automated / E2E manner difficult.
Here is some reproduction code. cc: @pahud @otaviomacedo

from aws_cdk import (
    RemovalPolicy,
    aws_lambda as _lambda,
    aws_s3 as _s3,
    aws_iam as iam,
    aws_s3_notifications,
    App, Duration, Stack
)


class TestStackForAWS(Stack):
    def __init__(self, app: App, id: str) -> None:
        super().__init__(app, id)

        function = _lambda.Function(
            self, "function",
            code=_lambda.InlineCode(
                "exports.handler = function (event, context) {context.succeed('hello world');};"
            ),
            handler="index.handler",
            timeout=Duration.seconds(300),
            runtime=_lambda.Runtime.NODEJS_18_X,
        )

        bucket = _s3.Bucket(self, "s3bucket", removal_policy=RemovalPolicy.DESTROY)

        # Two event notifications on the same bucket, both targeting the function.
        notification = aws_s3_notifications.LambdaDestination(function)
        bucket.add_event_notification(_s3.EventType.OBJECT_CREATED_PUT, notification)
        bucket.add_event_notification(_s3.EventType.OBJECT_CREATED_POST, notification)

        # A bucket policy on the same bucket as the notifications.
        bucket.add_to_resource_policy(
            iam.PolicyStatement(
                actions=["s3:*"],
                principals=[iam.AnyPrincipal()],
                resources=[
                    bucket.bucket_arn,
                    bucket.arn_for_objects('*')
                ]
            )
        )


app = App()
TestStackForAWS(app, "ExampleStack")
app.synth()

Attempting to
If you comment out the Crucially, the same thing happens when we are deleting stacks that have both a Policy and Notification configured. The delete fails as described in #24762. I hope this helps!
Reprioritizing as p1, since there is no workaround for this. Note to whoever starts working on this: maybe we should use wait_until_exists to put the notification configuration only after we are sure the bucket has been created.
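The suggestion above could look roughly like the following. This is only a sketch, not the actual handler code: the function name `put_notification_when_bucket_exists` and its parameters are hypothetical, and it uses the boto3 client-level `bucket_exists` waiter, which plays the same role as the resource-level `wait_until_exists` mentioned in the comment.

```python
def put_notification_when_bucket_exists(s3_client, bucket_name, notification_configuration):
    """Apply a notification configuration only once the bucket is confirmed to exist.

    `s3_client` is expected to behave like a boto3 S3 client (hypothetical helper,
    not part of CDK or boto3).
    """
    # boto3's built-in "bucket_exists" waiter polls HeadBucket until it succeeds,
    # and raises if the bucket still cannot be found after its retry budget.
    s3_client.get_waiter("bucket_exists").wait(Bucket=bucket_name)

    # Only now write the notification configuration to the bucket.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=notification_configuration,
    )
```

With boto3 installed, the first argument would typically be `boto3.client("s3")`. Waiting for the bucket only addresses the "bucket not yet created" case; it would not by itself resolve conflicts between concurrent operations on an existing bucket.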
Thank you @otaviomacedo. Is there any kind of guess on SLA for a p1 of this nature? It would be good to be able to set expectations internally.
Hi,
I am adding the policy after creating the bucket.
Any update on the issue, or any workarounds?
For what it's worth, adding a dependency between the bucket policy and the notifications worked for me.
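The original snippet for this workaround is not preserved here, so the following is only a sketch of what such a dependency might look like in CDK Python. The helper name `order_notifications_after_policy` is hypothetical, and the construct ID `"Notifications"` is an assumption about how CDK names the notifications construct under the bucket; verify it against your synthesized construct tree before relying on it.

```python
def order_notifications_after_policy(bucket):
    """Make the bucket's notifications construct depend on its bucket policy.

    Returns True if the dependency was added, False if either side is missing.
    Hypothetical helper; `bucket` is expected to be an aws_cdk.aws_s3.Bucket.
    """
    # bucket.policy is None until add_to_resource_policy() has been called.
    policy = bucket.policy
    if policy is None:
        return False

    # ASSUMPTION: the notifications construct lives under the bucket with the
    # ID "Notifications". This is a CDK implementation detail and may change.
    notifications = bucket.node.try_find_child("Notifications")
    if notifications is None:
        return False

    # Ask CloudFormation to deploy the policy before the notifications
    # (and, on deletion, to remove the notifications before the policy).
    notifications.node.add_dependency(policy)
    return True
```

In the reproduction stack above, this would be called as `order_notifications_after_policy(bucket)` after both `add_event_notification` and `add_to_resource_policy` have run.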
I see the same issue intermittently, but when deleting a stack that contains buckets with both a policy and notifications. It doesn't seem to happen on stack creation. The stack deletion is called from a Lambda. Before calling the stack deletion I also tried deleting the bucket notifications and waiting until they were all successfully deleted, but I still get the intermittent issue, and CloudFormation still triggers a notification deletion.
This workaround worked for me as well. Perhaps, in addition to better error handling or as a more immediate fix, these dependencies could be picked up by CDK automatically?
Proposed fix by adding dependency on Bucket Policy: #30053
What is the problem?
A deployment failed for our team using a CDK-defined infrastructure package due to a failure in the bucket notification lambda function. After looking at the logs in our account, it appears that it was caused by missing error handling within the bucket notification lambda function. The line that failed was this one, which is called here.
Error logs:
Reproduction Steps
Have multiple notifications attempt to update the bucket? I am not quite sure!
What did you expect to happen?
The bucket notification succeeds. To do this, adding error handling and a retry method here would be great. For transient errors like this, more robust error handling that retries the failed call would be very beneficial. That way, all consumers of AWS CDK get a smoother experience here.
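As a rough illustration of the kind of error handling and retry requested here, a generic wrapper with exponential backoff might look like this. It is a minimal sketch, not the handler's actual code: `call_with_retries`, `is_transient`, and the default attempt and delay values are all hypothetical, and deciding which S3 errors count as transient is left to the caller.

```python
import time


def call_with_retries(operation, is_transient, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` and retry with exponential backoff on transient errors.

    `is_transient(error)` decides whether an exception is worth retrying.
    `sleep` is injectable so the backoff can be observed without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt, or immediately on non-transient errors.
            if attempt == max_attempts or not is_transient(error):
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In the notification handler, `operation` would wrap the call that puts the bucket notification configuration, and `is_transient` would inspect the error returned by S3.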
What actually happened?
The bucket notification failed, causing the stack to get into an UPDATE_ROLLBACK_FAILED state.
CDK CLI Version
1.25.0
Framework Version
n/a
Node.js Version
n/a
OS
linux
Language
Python
Language Version
3.8
Other information
List of events: