Allow filtering out IP exhausted Subnets during subnet selection [ec2, rds, VpcLookup] - SubnetSelection #8310
Added an Edit addendum to the description above after re-reading some docs and playing around a bit more with CDK context.
I'm happy you've been able to work out your use case for your particular organization. Having said that, I think functionality like this is great for organizations that need it, but it's not necessarily something that should be ported into the base CDK. Instead, the CDK should probably be providing mechanisms for you to plug your own context providers into.

As for the specific use case: I'm a little concerned that you would need to keep on refreshing the context which says "yes, IP space available / no, IP space not available". It also seems there needs to be a way to say "no, IP space not available, HOWEVER one of the IPs in there was mine already, so that's fine". I guess by fact of having recorded a subnet with (Obligatory note that this will break if you
All valid points. Perhaps another underlying issue worth pointing out that leads to this need is "well, everyone picked our us-east-1a subnet, and now it's full while our other two subnets are basically empty." That's such a psychologically predictable pattern that AWS saw it coming with AZ selection.

BYO context provider is interesting. It feels potentially heavy-handed, but I suppose that depends entirely on the implementation. You probably have a much better vision of it than I do.

I don't actually imagine myself refreshing that often. Generally, once a subnet fills up, it stays full, with the exception of 1 or 2 IPs that fluctuate from stack deletes and redeploys. There's definitely a chance that I steal an IP while someone is deleting a stack, or that I simul-synth alongside someone else. In both situations, someone "loses" and has to reset their context. Those are the situations under which I'd have to refresh context. But in theory, I shouldn't have to refresh again until I encounter another conflict.

My thought is that just the presence of a subnet in context alongside a successful deployment is enough information to say that one of those subnet IPs is mine. Even if there are 0 IPs available in reality, my context hasn't changed, so my deployment should proceed without issue. Managing your context according to guidelines would be critical in this case to avoid fluctuating subnet selections.

It just feels bad to make devs start defining specific subnets for a group after only 1 Subnet runs out of IPs, especially if you decide to expand the existing group with new subnets rather than adding a whole new group. 1 subnet running out of IPs effectively means I can no longer select subnets using subnetGroupName or subnetType. This is even more painful in account configurations like ours, where we have non-prod and prod accounts. If I have to start hardcoding subnets, then all of a sudden I have to do it for all accounts.

I'd love to just be able to rely on there being standard subnet groups available to me. A context reset every once in a while to resolve a conflict is totally worth the tradeoff in my mind. Even being able to set an IP limit threshold that's used to filter out subnets would mean new stacks would be less likely to swipe IP addresses during existing stack deletes and redeploys.
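The threshold idea above could be sketched roughly as follows. This is an illustrative sketch only: `SubnetInfo`, `selectUsableSubnets`, and `minAvailableIps` are invented names, not CDK API.

```typescript
// Illustrative sketch of an IP-threshold subnet filter. All names here are
// invented for the example; this is not part of the CDK.
interface SubnetInfo {
  subnetId: string;
  availabilityZone: string;
  availableIpAddressCount: number;
}

// Keep only subnets with at least `minAvailableIps` addresses free, so new
// stacks are less likely to swipe IPs freed briefly by a delete/redeploy.
function selectUsableSubnets(subnets: SubnetInfo[], minAvailableIps = 1): SubnetInfo[] {
  return subnets.filter(s => s.availableIpAddressCount >= minAvailableIps);
}

const allSubnets: SubnetInfo[] = [
  { subnetId: 'subnet-aaa', availabilityZone: 'us-east-1a', availableIpAddressCount: 0 },
  { subnetId: 'subnet-bbb', availabilityZone: 'us-east-1b', availableIpAddressCount: 120 },
  { subnetId: 'subnet-ccc', availabilityZone: 'us-east-1c', availableIpAddressCount: 3 },
];

// With a threshold of 5, only subnet-bbb survives.
console.log(selectUsableSubnets(allSubnets, 5).map(s => s.subnetId));
```

With the default threshold of 1 this reduces to plain "exhausted" filtering; raising it builds in the safety margin against in-flight deletes described above.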
Perhaps I'm talking myself further into the idea of a custom context provider where rules like this could be defined. 😄 Out of curiosity, where do you think an appropriate entry point would be for that custom context provider?
It would need a lot of validation around it, and it would be a pretty big undertaking too. The simplest mechanics I could think of is "a context provider is an NPM module", so you would write one, and your Construct that uses it would write the name of the NPM module into the manifest, so that the CLI can go and fetch the module and have it run the query. I'm going to make a call and say it needs to include a version specifier too, to be safe. (*)

But I'm just spitballing off the top of my head here, and we'd need to create team buy-in first. Seems like a chunky enough feature that it'd probably be a good idea for someone to write an RFC first. I'd do it myself but I'm a little swamped at the moment...
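The "context provider as an NPM module" mechanics being spitballed might look something like the shape below. This is purely speculative: none of these fields exist in the CDK manifest today, and the module name and props are invented.

```typescript
// Speculative shape for a "bring your own context provider" manifest entry,
// per the discussion above. Nothing here is real CDK API; the module name
// and props are invented for illustration.
interface CustomContextQuery {
  provider: string;                // NPM module the CLI would fetch and run
  version: string;                 // version specifier, for reproducible synths
  props: Record<string, unknown>;  // arguments passed to the module's query
}

const query: CustomContextQuery = {
  provider: '@acme/subnet-capacity-provider',
  version: '^1.0.0',
  props: { vpcId: 'vpc-0123', minAvailableIps: 5 },
};

console.log(JSON.stringify(query, null, 2));
```

Pinning a version specifier, as suggested above, keeps two developers synthing the same app from resolving different provider behavior.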
Oof, I realized I totally misspoke above. I was describing the situation as though the subnets had been filtered out during the lookup, which means they'd never end up in your context or would be removed during a context reset, which is obviously not what the implementation I have does. 🤦 Filtering during lookup would make for more stable context. I didn't go that route because I didn't really understand CDK context at the time.
Does this being part of the Lookup layer change your mind at all about the feature? It still seems like putting resources into an unavailable subnet is a universal concern. |
The CDK's goal is to be a platform for building these kinds of features as much as (or more than) delivering them out of the box. Especially if the feature is going to have caveats and there are multiple reasonable implementations imaginable, this is something that should be solved in the broader CDK ecosystem, rather than as a single "one-size-fits-all" solution that we are supposed to vend and maintain. (Especially since doing so would more likely than not translate into a slew of "my use case is slightly different than what you support, can you add a single flag just for me, pretty please?" feature requests.)

For now, I would recommend you build your own context-provider-like system outside the regular CDK build. Store some state in JSON files, read them in your application, and configure your app accordingly. How you push state around is then completely under your control and as flexible as you need it to be. When the open provider framework materializes, the same code can then be more nicely integrated and re-used, but fundamentally it doesn't change much about how this would work.

Thanks for the energy behind the idea! Closing this issue as not really appropriate for CDK itself, but I wish you the best of luck implementing this and releasing it to the broader community if you feel it fills an important need.
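The JSON-file approach suggested above might be sketched like this. It is a minimal sketch under assumed names: the file layout, `SubnetState` shape, and `loadUsableSubnetIds` helper are all invented for illustration, and the refresh mechanism is left to whatever out-of-band process you choose.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Minimal sketch of a context-provider-like system outside the CDK build:
// subnet capacity is recorded in a JSON file by a process you control, and
// the app reads it at synth time. All names/shapes here are invented.
interface SubnetState {
  subnetId: string;
  availableIps: number;
}

function loadUsableSubnetIds(statePath: string): string[] {
  const state: SubnetState[] = JSON.parse(fs.readFileSync(statePath, 'utf8'));
  return state.filter(s => s.availableIps > 0).map(s => s.subnetId);
}

// Demo: write a sample state file, then read it back.
const statePath = path.join(os.tmpdir(), 'subnet-state.json');
fs.writeFileSync(statePath, JSON.stringify([
  { subnetId: 'subnet-aaa', availableIps: 0 },
  { subnetId: 'subnet-bbb', availableIps: 42 },
]));
console.log(loadUsableSubnetIds(statePath)); // only the non-exhausted subnet
```

The resulting subnet IDs could then be handed to an explicit subnet selection in your app, keeping the filtering policy (and when to refresh the file) entirely under your control, as suggested.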
Expose a SubnetSelection option that allows filtering out Subnets with 0 available IPs (IP "exhausted" subnets).
Use Case
I work in a massive organization where we all share a set of accounts, managed by central resources and deployed with aging Terraform configs. We've reached the point where one of our 3 private subnets has run out of IPs. The other two subnets still have quite a bit of IP space left.
It would be nice to be able to filter out IP-unavailable Subnets when creating resources, to avoid failed deploys to those Subnets. For example, when creating a DBCluster, I want to avoid including unavailable Subnets when the DbSubnetGroup is created.
Proposed Solution
I have implemented this change by exposing the `availableIpAddressCount` Subnet property from the context provider, and exposing a `filterOutIpExhausted` `SubnetSelection` option, defaulted to false. Exhausted subnets are still pulled back as part of the vpc lookup; this just allows consumers to filter them out during subnet selection.

I'm a CDK codebase noob, so there are cases I'm not sure about, particularly regarding context management (e.g. stuff like #6929). If an existing (deployed) resource is ignoring IP exhausted Subnets, and one of the subnets it's referencing runs out of IPs, then what happens the next time the resource is deployed? If the context is version controlled, will the subnet still show as available?
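The proposed behavior can be sketched as below. Hedged heavily: `filterOutIpExhausted` is the option from the PR described here, not part of the published `SubnetSelection` API, and the CDK types are modeled with plain interfaces for the example.

```typescript
// Sketch of the proposed opt-in flag. `filterOutIpExhausted` is from the PR
// described in this issue and is NOT published CDK API; types are modeled
// locally so the example is self-contained.
interface SubnetAttrs {
  subnetId: string;
  availableIpAddressCount: number;
}

interface SelectionOptions {
  filterOutIpExhausted?: boolean;  // proposed option, defaulted to false
}

function applySelection(subnets: SubnetAttrs[], opts: SelectionOptions = {}): SubnetAttrs[] {
  // Default false: exhausted subnets are still returned by the lookup and
  // are only dropped when the consumer explicitly opts in.
  if (!opts.filterOutIpExhausted) {
    return subnets;
  }
  return subnets.filter(s => s.availableIpAddressCount > 0);
}

const lookedUp: SubnetAttrs[] = [
  { subnetId: 'subnet-full', availableIpAddressCount: 0 },
  { subnetId: 'subnet-ok', availableIpAddressCount: 9 },
];

console.log(applySelection(lookedUp).length);                                 // 2: default keeps all
console.log(applySelection(lookedUp, { filterOutIpExhausted: true }).length); // 1: exhausted dropped
```

Defaulting the flag to false keeps existing stacks' subnet selections stable, which matters given the context-stability questions raised above.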
Edit: After re-reading the CDK context/environment docs and Rico's super helpful post in #6929, the questions I had above don't seem concerning anymore, provided people are managing context according to the supplied guidelines. Playing around with my own stack, which is deployed to 3 different accounts, has also assuaged some fears I had. Now I'm just dying to know what I've missed and what the experts think. 😄
Other
The PR was really just a fun exercise to start getting familiar with CDK code. If it's useful, great. If not, no sweat.
I'm also aware of #5927 + #7720, but I don't see any major conflicts between the changes. There might, however, be impacts depending on what sort of other rework is under consideration.
Cheers!
This is a 🚀 Feature Request