
Add ability to assign custom name to NEG via annotation. #919

Closed · jaceq opened this issue Oct 29, 2019 · 36 comments
Labels: kind/feature, lifecycle/rotten

Comments

@jaceq
Contributor

jaceq commented Oct 29, 2019

As per the title, I'd like to be able to assign a custom, user-provided name to a NEG via annotation (NegAttributes).

It seems there is a mention of this in code: https://godoc.org/k8s.io/ingress-gce/pkg/annotations#NegAttributes
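For context, here is roughly what the existing annotation looks like today; it only chooses which ports get standalone NEGs, and the resulting NEG names are auto-generated (the service name, selector, and ports below are illustrative):

```yaml
# Today's behaviour: the annotation only selects which Service ports get
# standalone NEGs; the NEG names themselves are auto-generated by the controller.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80": {}}}'
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```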

@rramkumar1 rramkumar1 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 19, 2019
@jaceq
Contributor Author

jaceq commented Feb 7, 2020

@freehan @rramkumar1 Any update on this?

@zachdaniel

This would be really great!

@deedf

deedf commented Feb 23, 2020

If it let the user specify a stable, pre-existing NEG name and then kept that NEG in sync, it would be really helpful.

@bowei
Member

bowei commented Feb 23, 2020

@freehan -- what do you think? This seems like low-hanging fruit.

@deedf

deedf commented Feb 25, 2020

The intended usage for me is to register a bunch of NEGs as backends to a GCLB backend service, and have the membership of those known NEGs be managed automatically.

I think I need to do this because there currently does not seem to be a pure k8s solution to multi-cluster ingress.
There is currently GoogleCloudPlatform/gke-autoneg-controller, which can do something similar, but you need to run a custom controller, and I think this could be handled entirely at the NEG level without caring about the Backend part.

There is also a nice writeup at https://blog.jetstack.io/blog/container-native-multi-cluster-glb/ where the optimal setup ends up being a single global LB with k8s services as backends, and they have to jump through hoops to inject the automatically generated NEG names into their Terraform config.

If the name could be specified in advance for an existing NEG, there would be a really clean solution to multi-cluster deployments on GCP using global container native load balancing, pending a pure k8s solution.

Of course there are numerous implementation details to discuss if there is further interest.

@spencerhance
Contributor

@deedf https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-for-anthos might also work for you

@deedf

deedf commented Feb 28, 2020

@deedf https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-for-anthos might also work for you

Looks like it might, thanks !

Also, thinking about it more, for the lightweight use case where you just want your deployments to register to an externally configured load balancer, I think the approach used by GoogleCloudPlatform/gke-autoneg-controller is actually exactly right, since using pre-defined NEGs would force the user to create them in every possible zone beforehand.

@zachdaniel

I didn’t know about https://github.com/GoogleCloudPlatform/gke-autoneg-controller, and I’ll probably use it to solve for this 😍

@bowei
Member

bowei commented Mar 4, 2020

Note: we are looking at the autoneg use case -- so there will likely be something more "official" integrated into ingress/service.

@zachdaniel

I've tried out the gke-autoneg-controller and I wasn't able to get it working in the time I had to spend on it. Someone with more ops chops than me might be able to though. But my ideal would absolutely be just allowing us to name it in the annotation :)

@jaceq
Contributor Author

jaceq commented Mar 5, 2020

I tried with autoneg but didn't get it to work... it seems complicated, and the documentation isn't great...
Overall, as I understand it, the steps needed are:
-> enable the cloud-platform oauth scope
-> install the autoneg deployment, roles etc.
-> create LB backend services (without groups)
-> create compatible healthchecks
-> create services with matching autoneg / neg annotations (see the sketch below)
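
For that last step, a rough sketch of what such a Service could look like. The exact autoneg annotation key and payload below are from memory and may differ between controller versions (check the gke-autoneg-controller README); the service, port, and backend-service names are placeholders.

```yaml
# Hypothetical sketch: a Service exposing a standalone NEG on port 80 and asking
# the autoneg controller to attach it to a pre-existing backend service.
# The autoneg annotation key/payload and all names are illustrative -- verify
# against the gke-autoneg-controller documentation for your version.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80": {}}}'
    anthos.cft.dev/autoneg: '{"name": "my-backend-service", "max_rate_per_endpoint": 100}'
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```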

Also, in my specific case (I use Terraform a lot), using autoneg introduces 'hidden' (from Terraform's point of view) dependencies.
In my case, just being able to name a NEG via annotation would resolve my issues and allow me to get this to work in no time.

ps. I gave up on autoneg when I noticed how many components I'd have to rewrite from .yaml to Terraform just to get it to run (again, I use Terraform a lot)

@bowei
Member

bowei commented Mar 5, 2020

cc: @mark-church

@mark-church

@deedf @jaceq @zachdaniel I'm interested to know more about autoneg vs NEG naming and how they impact your use-cases. We are looking at solutions for implementation of this in the GKE Svc/NEG controller. Here are the pros and cons in my mind. Would love to know your thoughts:

  • NEG naming (allowing the NEG name to be specified in service.yaml)
    • Pros: much faster to implement and ship; a flexible and straightforward solution
    • Cons: puts the burden of unique name generation on the user; conflicts would fail, so checking for existing NEGs might be required; still requires the step of adding the NEG to the backend service
  • Autoneg (allowing the BES to be specified in service.yaml)
    • Pros: automatically connects the NEG to the BES without an extra step; requires fewer manual steps from the user
    • Cons: some corner cases need to be figured out, such as how to handle backend addition/removal if the BES or NEG is deleted, so there is some additional complexity that will take more time (@freehan can probably add more detail).

Also, what are the specific use-cases and which LBs (internal/external) do you have for standalone NEGs - multi-cluster, using LB features not supported in GKE, or doing manual deployment just for fun?

@deedf

deedf commented Mar 8, 2020

Mark, thanks for your interest.

My use case is configuring a global backend service in front of multiple regional backends. I did not know about Ingress for Anthos despite doing quite a bit of searching; the S/N ratio wasn't that good. Still, it seems a bit top-heavy from skimming through the docs, but I need to look at it more, so I'll leave it aside.

The original idea was to have a single global backend service configured outside of k8s, and a way to automatically register deployments as backends when and where they were scheduled manually or automatically. It certainly was not for fun, just to work around the k8s cluster split brain syndrome in the face of global resources, and to use features not supported by k8s.

Trying to expand on your points:

  • NEG naming (allowing NEG name to be specified in service.yaml)

    • Cons: requires pre-registering the NEGs in every possible zone in advance, since NEGs are zone-bound AIUI. Then you have to keep them up to date with zone turnups or teardowns, etc.
  • Autoneg (allowing BES to be specified in service.yaml)

    • Pros: acts at schedule/deschedule time instead of requiring preparation in advance.

@jaceq
Contributor Author

jaceq commented Mar 9, 2020

@mark-church I am using GCLB with features unsupported by ingress-gce (Cloud Armor for IP whitelisting, and custom health checks, since basic auth is in place and some of my endpoints do not return 200 without Authorization headers).

Honestly, I think it would make sense to have both options (named NEGs and autoneg) so there would be an option to choose the best solution.

Also, in my specific case, I use IaC (Terraform) extensively, and using named NEGs would simply make my life easier, given that the dependency graph would be handled by TF only and not mixed. With autoneg I introduce 'silent' dependencies: at the time of service creation my backend has to already be in place (otherwise I get no registration), and that is not guaranteed in Terraform, as it is not aware of that dependency.

@samschlegel

+1 to NEG naming

We also use Terraform extensively, and are currently looking into moving services from MIGs into Kubernetes. Not having a way to know the name of the NEG in advance means we run into similar issues of having to create the Service, pull the auto-created NEG name down, and then manually pass in the name to our Terraform config. I'd also feel much safer having control over this NEG, as I'm not sure what would cause a potential recreation of the NEG, leading to a name change and service downtime.

@samschlegel

Perhaps what we'd need is less custom naming and more the ability to provide a self-link to an externally created NEG that the controller should manage. We're currently running into issues trying to use the Internal HTTP Load Balancer in a Shared VPC, as all the resources for that must live in the host project, but the NEG this controller creates lives in the service project.

@jaceq
Contributor Author

jaceq commented Mar 17, 2020

@samschlegel Basically there are 2 options in our case.

  1. Pass a custom name to the NEG
    OR
  2. Be able to read the randomly generated name back after NEG creation; in fact I opened a ticket for that too: Add service attribute that would return name of a NEG hashicorp/terraform-provider-kubernetes#668

@deedf

deedf commented Mar 17, 2020

I think the naming problem is also solved by having an annotation in a Service that tells it which Backend Service to register to, just like gke-autoneg does. Then you don't care if the NEG name is generated or not.

I also don't understand how people who advocate using pre-registered NEGs plan to keep them in sync with zone turnups/turndowns (NEGs are per-zone).

@zachdaniel

zachdaniel commented Mar 18, 2020

We’re using kustomize + cloud config connector, so we can easily generate resources with unique names (using a prefix/suffix). Being able to name the NEG would be perfect for us.

We want to register that NEG with a compute URL map.

@samschlegel

samschlegel commented Mar 20, 2020

@deedf We define services in Bazel and generate both our k8s manifests and our terraform configs there as well, so spinning up or tearing down a zone is just modifying a list we pass in when generating configs

I currently don't like the idea of having the backend service not managed by Terraform as that's how we manage all of our MIG-related infrastructure, so something like gke-autoneg is out of the picture for us

@deedf

deedf commented Mar 21, 2020

@samschlegel the "how" to spin up or tear down a zone is trivial, the problem is the "when". How do you know that a GCP zone has been spun up or torn down ? How do you hook your update process to these changes ?
Also I don't really understand the argument about not wanting the backend service not managed by Terraform since what gke-autoneg does IIUC is to keep the NEGs in sync with a preexisting backend service that can very well be created by Terraform. What gke-autoneg manages is backends not backend services.

@samschlegel

samschlegel commented Mar 26, 2020

Ah, looking at it more, autoneg would probably work for us; it just means we need to make Terraform non-authoritative over which backends a backend service has.

re: "when" to spin up or tear down a zone, perhaps I'm misunderstanding what you mean there. All of our zone spin-ups and tear-downs are manual, so the "when" is just part of our deployment scripts.

@tonybenchsci

We’re using kustomize + cloud config connector, so we can easily generate resources with unique names (using a prefix/suffix). Being able to name the NEG would be perfect for us.

We want to register that NEG with a compute URL map.

Same for our company: we use kustomize + KCC, but we have to copy the autogenerated NEG name after creating application services, which is kind of a pain.

@bert-laverman

@deedf https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-for-anthos might also work for you

This controller contains several errors, the worst of which is that the example uses a Backend Service name with an underscore in it, which is not allowed. I tried this before realising the mistake, and now I have a Service that cannot be deleted because the controller won't progress past the error in the name, and it blocks.

I am no Go (or k8s internals) developer, so I do not know how to solve this. I cannot even throw away the test namespace because it waits for that service to disappear.

@bert-laverman

Update: It appears you can actually delete stuff by manually editing the state of the Service and removing the finalizer pointing at the failing code.
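
A minimal sketch of that workaround, assuming the finalizer sits in the Service's metadata (the finalizer string and service name here are placeholders, not the controller's actual identifiers):

```yaml
# Roughly what "kubectl edit svc my-service" shows for the stuck Service.
# Deleting the offending entry from metadata.finalizers and saving lets the
# Service (and the namespace) finish deleting. The finalizer value is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  finalizers:
    - example.com/stuck-finalizer   # remove this entry, then save
```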

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2020
@sho-abe

sho-abe commented Aug 11, 2020

I think the feature of header/query-based routing without addons (e.g. Istio) is useful.
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 11, 2020
@mark-church

Hi all, custom NEG naming will be coming fairly soon. We'll be releasing it to GKE 1.18 in the late Aug-Sept timeframe.

@tonybenchsci

@mark-church Any updates?

@mark-church

Yup - it's currently targeted to roll out to GKE 1.18 as Beta functionality in the first week of October. Please don't be surprised if things are off by a week or two :)

Because there are some major changes to the ingress controller in this rollout it's unlikely that we will be able to safely backport to older GKE versions. It is likely that this will just be available in GKE 1.18 and newer releases.

@deedf

deedf commented Oct 19, 2020

Yup - it's currently targeted to roll out to GKE 1.18 as Beta functionality in the first week of October. Please don't be surprised if things are off by a week or two :)

@mark-church Any news?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 17, 2021
@jaceq
Contributor Author

jaceq commented Jan 18, 2021

Let's not close that, dear @fejta-bot

@freehan
Contributor

freehan commented Jan 29, 2021

This feature is in public preview. https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg#create_a_service

Feel free to try it out and report any issues. Thanks!
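
For reference, the released syntax adds a per-port name field to the NEG annotation, roughly as below (the port, service name, and NEG name are illustrative; the linked docs are authoritative):

```yaml
# Custom NEG naming as shipped in GKE 1.18+: a per-port "name" field in the
# NEG annotation. Port, service name, and NEG name below are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80": {"name": "my-custom-neg"}}}'
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```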

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 28, 2021
@freehan freehan closed this as completed Mar 1, 2021