Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage-collect orphaned stacks #13676

Open
sandwi opened this issue Mar 18, 2021 · 20 comments
Open

Garbage-collect orphaned stacks #13676

sandwi opened this issue Mar 18, 2021 · 20 comments
Labels
@aws-cdk/core Related to core CDK functionality effort/large Large work item – several weeks of effort feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. feature-request A feature should be added or improved. p1

Comments

@sandwi
Copy link

sandwi commented Mar 18, 2021

❓ General Issue

The Question

cdk deploy leaves stack deployed if a stack is removed from application post deployment and cdk app is deployed again.

Steps to Reproduce

  1. Create a CDK app with two stacks
  2. Run: cdk deploy --all
  3. Delete one stack from the CDK app
  4. Run: cdk deploy --all

Expected Results
The deleted stack is gone.

Actual Results
The deleted stack exists and has been "leaked".

i.e. the stacks are procedural rather than declarative. This doesn't fit into a managed pipeline model as stacks will accidentally leak and have to be manually cleaned, while the behavior makes sense if you're familiar with the implementation details of CDK, this isn't the right developer experience.

Environment

  • CDK CLI Version: 1.73.0
  • Module Version:
  • Node.js Version: 12.x and 15.2.0
  • OS: macOS Mojave and CodeBuild Ubuntu (not used other OSes)
  • Language (Version): TypeScript (4.0.7)

Other information

@sandwi sandwi added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Mar 18, 2021
@peterwoodworth peterwoodworth self-assigned this Mar 22, 2021
@peterwoodworth
Copy link
Contributor

Hey @sandwi, thank you for the submission.

This is presently intended behavior, is this something you would like to see changed in the future?

@peterwoodworth peterwoodworth added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Mar 22, 2021
@sandwi
Copy link
Author

sandwi commented Mar 23, 2021

@peterwoodworth The use case is that of a managed pipeline-as-a-service where developers own the cdk app and push changes and pipeline executes the changes. The developer may delete a stack from CDK app (source code), making an assumption that CDK will be able to deduce (a) that stack(s) has/have been deleted from CDK app, hence in generated cloud assembly but a prior version of the CDK app had the deleted stack and is still deployed, (b) thus cdk deploy of this new version should delete the deployed stack(s) that are no longer part of the app. That is the intent of the developer. We understand that there are interesting scenarios with resource retention policy and how to reason about them. Perhaps we should provide guidelines and best practices around these scenarios. The stack deletion is something of value in development and test phases when developers are experimenting with new features. It is also of value in Blue/Green deployment. This is like versioned cdk app where stack(s) may change between versions. Git (or any vcs) has the version tags at source level. Its the cloud assembly versioning and deployed resources (by a version of assembly) pieces that need enhancements.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 24, 2021
@NGL321 NGL321 added effort/large Large work item – several weeks of effort feature-request A feature should be added or improved. p2 and removed guidance Question that needs advice or information. labels Mar 24, 2021
@NGL321
Copy link
Contributor

NGL321 commented Mar 24, 2021

Hey @sandwi,

You make a solid case for the feature, but I'm not entirely certain about timeframe. I will ask around about prioritization, but if you really believe in this, best way to ensure it gets a more timely resolution is to increase engagement on this issue!

😸 😷

@NGL321 NGL321 added the package/tools Related to AWS CDK Tools or CLI label Mar 24, 2021
@NGL321 NGL321 assigned rix0rrr and eladb and unassigned peterwoodworth and rix0rrr Mar 24, 2021
@NGL321 NGL321 added @aws-cdk/core Related to core CDK functionality and removed package/tools Related to AWS CDK Tools or CLI labels Mar 24, 2021
@eladb
Copy link
Contributor

eladb commented Mar 25, 2021

This is a pretty tricky challenge. When a stack is deleted from a CDK app, the app no longer knows that the stack ever existed and therefore it cannot delete.

The only way I could think this can be solved is by keeping some inventory in the environment of all CDK stacks (e.g. by tagging them) and then running some kind of garbage collection for each environment after deployment which will identify all registered stacks that are not accounted for.

Reassigning to @rix0rrr for effort estimation and prioritization.

@eladb eladb assigned rix0rrr and unassigned eladb Mar 25, 2021
@eladb eladb changed the title (aws-cdk): AWS CDK deploy "leaks" stacks Removed stacks stay orphaned after deployment Mar 25, 2021
@eladb eladb changed the title Removed stacks stay orphaned after deployment Garbage collect orphaned stacks Mar 25, 2021
@eladb eladb changed the title Garbage collect orphaned stacks Garbage-collect orphaned stacks Mar 25, 2021
@ericzbeard
Copy link
Contributor

This could be another good use case for AppRegistry integration.

@eladb
Copy link
Contributor

eladb commented Mar 29, 2021

This could be another good use case for AppRegistry integration.

I am intrigued. What do you mean by that?

@ericzbeard
Copy link
Contributor

ericzbeard commented Mar 29, 2021

We have discussed using AppRegistry as a way to store a stable "AppId" for CDK apps, but this use case is more expansive. If we register with the Service Catalog App Registry, we could reference the AppRegistry stack as a picture of the entire app as it exists before we do the next deployment.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-servicecatalogappregistry-resourceassociation.html

If one of the stacks in an app is deleted from the code, it will show up in the diff for the AppRegistry stack, which we can detect during synthesis, and take action on after successful deployment of the modified app.

It also has the side benefit of giving us a "free" user interface for CDK applications in the console.

Something like:

const app = new cdk.App({
    appRegistry: {
        register: true, 
        deleteOrphanedStacks: true,
    },
});

@ericzbeard ericzbeard added the feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. label Apr 1, 2021
@rix0rrr rix0rrr removed their assignment Jun 3, 2021
@tomas-mazak
Copy link
Contributor

tomas-mazak commented May 9, 2022

Hi, I would like to revive this topic. In our company, we use CDK extensively and especially with CDK pipelines (we have 120+ pipelines ATM) the orphan stacks are becoming an issue.

The advantage of CDK pipeline is that once you instantiate it, you can forget about applying CDK changes - you simply push to git and changes are applied by a pipeline. However, if you add a pipeline "stage" and later remove it (or just rename it), the stage stacks will stay up and will be "orphaned".

We tackle this by assigning a set of tags to each stack that uniquely identify the CDK app. We provide a tool to application owners that synthesizes the app and gets the list of "declared" stacks, then retrieves the "running" stacks - stacks with corresponding tags from CFN API - and deleting "running" that are not "declared".

However, this tool needs to be run manually by application owners. Also, I believe many people must be having the same issue. Would it make sense to resolve this within CDK?

The AppRegistry idea looks interesting. Potentially, simpler approach with tagging and UUIDs would work too:

  • on first deploy, CDK app is assigned an auto-generated "app id" - UUID for example
  • the app id is then stored in CDK context for persistence
  • all stacks deployed by the app are tagged with for example aws-cdk:app-id=<app-id>

Then, a new CLI can be provided. I can see many options to implement this, for example cdk apply that would deploy new/changed stacks and delete stacks that are no longer "declared". Or simply cdk deploy --clear-orphans. or cdk destroy --orphans-only...

@eladb what do you think? If there's an interest, I would be happy to look into this

@zhelyan
Copy link

zhelyan commented Nov 8, 2022

The only way I could think this can be solved is by keeping some inventory in the environment of all CDK stacks

An App is a container for one or more stacks: it serves as each stack's scope

Can we track the App state in a dedicated "app" stack?

Edit: Sorry, I am not being clear. Make App deploy an "app" stack which then uses e.g. #13676 (comment) to track the state of the stacks included in the app. In future, the dedicated app stack can also wrap any app management features e.g. creating a resource group for the app, running periodic drift detection on the app stacks etc.

@aldex32
Copy link

aldex32 commented Feb 10, 2023

Unfortunately, after 2 years this issue is still opened. As far as I can understand, the App is a construct that wraps a set of other constructs, for instance the Stack. So I would say, it is the responsibility of the construct to know what resources to provision and what to delete. What about having something like a beforeDeploy hook to list the current deployed stacks and an afterDeploy to delete the orphaned ones?

@zhelyan
Copy link

zhelyan commented Feb 11, 2023

I don't think this would work. The problem is that unlike e.g terraform the cdk does not maintain state. I guess it could use a "metadata" stack for that i.e. a state stack that isn't directly editable by end users.

@danielsharvey
Copy link
Contributor

Just adding my voice. This feature remains of significant value to CDK users, as explained above.

Does AWS have a best practice to recommend as an alternative?

For my 2c, I could see an CDK construct specifically for the purpose of deleting stacks:

// previous definition, now removed by developer
// const networkingStack = new NetworkingStack(app, 'networking-stack', props);

// replace with
const networkingStack = new DeleteThisStack(app, 'networking-stack', props);

This construct would delete the stack if it exists and issue a warning otherwise.

@gshpychka
Copy link
Contributor

Does AWS have a best practice to recommend as an alternative?

Probably nested stacks

@Kirizan
Copy link

Kirizan commented Feb 21, 2024

Is there a reason the pipeline construct can't just have a variable that tells the mutating step to replace Execute a change set to delete a stack and then rerun the pipeline? Just like it would if someone added or removed a stage?

Yeah, it would take more time to destroy the package, but then it would actually do what cdk destroy is supposed to do, destroy the app.

@gshpychka
Copy link
Contributor

Is there a reason the pipeline construct can't just have a variable that tells the mutating step to replace Execute a change set to delete a stack and then rerun the pipeline? Just like it would if someone added or removed a stage?

Yeah, it would take more time to destroy the package, but then it would actually do what cdk destroy is supposed to do, destroy the app.

It doesn't know which stack to delete - CDK does not keep state. What do you mean by "just like it would if someone [..] removed a stage"? Because the pipeline would not destroy the stack(s) in that stage for the same reason.

@Kirizan
Copy link

Kirizan commented Feb 22, 2024

It doesn't need to keep a state, it can use the existing code pipeline API to see the current state. It should also be able to use cdk diff to see if there is going to be a change to the pipeline.

Once the mutate step knows there is now going to be a change in the pipeline, rather then just removing that stage, it could first change that stage to delete a stack and run that stage. When that has all been completed, the mutate step would then remove the stage like it would normally.

@github-actions github-actions bot added p1 and removed p2 labels Apr 28, 2024
Copy link

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

@ricky-lim
Copy link

any update on this?

@aviv-imijatovic
Copy link

+1
Affecting us as well.

@para0056
Copy link

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/core Related to core CDK functionality effort/large Large work item – several weeks of effort feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. feature-request A feature should be added or improved. p1
Projects
None yet
Development

No branches or pull requests