Limit number of NAT Gateways Created #2466

Open
faangbait opened this issue Sep 21, 2024 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@faangbait

Not sure if this is simply missing documentation or a feature request.

We're using the following configuration to deploy SNO clusters via hive.

    platform:
      aws:
        region: us-east-1
        zones:
          - us-east-1a
          - us-east-1b

I'd expect this to result in subnets/NAT Gateways to be created in just us-east-1a and us-east-1b, but the installer is creating gateways in 1c through 1f as well.

Since the OpenShift installer defines this as an acceptable configuration parameter, I'm assuming the issue lies in Hive, but I'm happy to repost this to the installer repository if developers here can confirm that is a better place for it.

@2uasimojo
Member

Those gateways are definitely created by installer. I recall a discussion around this related to cost, where the crux was that you can save money by deploying smaller clusters into “smaller” regions (those with fewer AZs) since these gateways are always created and they’re expensive. But I don’t know if the topic was ever approached from the perspective of being able to restrict day 0 to just using the AZs in the install-config.

Definitely something the installer team would need to answer. @patrickdillon ?

@patrickdillon
Contributor

This logic is definitely the responsibility of the installer side--not Hive. @faangbait feel free to open a bug against the installer. I have asked @mtulio to assess as well.

Also note the config you posted is not valid. We don't have any section in the install config where region and zones are at the same level. I think the intent here was:

    platform:
      aws:
        region: us-east-1
        defaultMachinePlatform:
          zones:
            - us-east-1a
            - us-east-1b

But AFAICT correcting the config still does not fix the issue, so I suspect we do need some changes in the installer logic.

@patrickdillon
Contributor

> But AFAICT correcting the config still does not fix the issue, so I suspect we do need some changes in the installer logic.

Oops, I double-checked and I was looking in the wrong place. The manifests are indeed generated correctly. Can you check whether fixing your config resolves the issue?

@mtulio

mtulio commented Sep 23, 2024

> This logic is definitely the responsibility of the installer side--not Hive. @faangbait feel free to open a bug against the installer. I have asked @mtulio to assess as well.

That's it.

@faangbait The main idea is that platform.aws.defaultMachinePlatform.zones limits the zones used to the ones you list, instead of all the zones discovered through the AWS APIs. In your example, rather than using all available zones, the installer will use your defaults (us-east-1a and us-east-1b) to build your VPC/infrastructure.

It's worth mentioning that the compute pool definitions take precedence over your defaults. So if zones are defined in any of the compute pools (compute[.name==worker].platform.aws.zones or controlPlane.platform.aws.zones) in install-config.yaml, the installer will use those too.
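As a rough illustration of the three places mentioned above, here is a minimal sketch of an install-config.yaml fragment that sets the same zones in the defaults and in both pools. It assumes a single-node cluster (one control plane replica, zero workers); the baseDomain and metadata.name values are placeholders, other required fields such as pullSecret are omitted, and nothing in this thread confirms that this layout actually limits NAT gateway creation.

    apiVersion: v1
    baseDomain: example.com        # placeholder, not from this thread
    metadata:
      name: sno-example            # placeholder cluster name
    controlPlane:
      name: master
      replicas: 1                  # assumed SNO: single control plane node
      platform:
        aws:
          zones:                   # pool-level zones take precedence over the defaults
            - us-east-1a
            - us-east-1b
    compute:
      - name: worker
        replicas: 0                # assumed SNO: no dedicated workers
        platform:
          aws:
            zones:                 # set explicitly so the worker pool is not left empty
              - us-east-1a
              - us-east-1b
    platform:
      aws:
        region: us-east-1
        defaultMachinePlatform:
          zones:                   # defaults used when a pool does not list zones
            - us-east-1a
            - us-east-1b

Repeating the zones in every pool is redundant if the defaults are honored, but it removes any ambiguity about which level wins while debugging.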

@faangbait
Author

faangbait commented Sep 26, 2024

To clarify, I tried it the other way (as defined in the manifest) first. Forgot to hit the undo button before copying and pasting over here -- but I can confirm that neither worked.

> It's worth mentioning that the compute pool definitions take precedence over your defaults.

This is a good lead. Since we're not defining worker nodes, it's probably overriding the control plane config with the empty config provided to workers.

If that doesn't solve it, we'll document deleting the extra AZs as a day 1 problem. Easy enough to script.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2024
@2uasimojo
Member

@faangbait are we good to close this issue, or is there still work/investigation to be done on the hive and/or installer side?


5 participants