
Split agones controller into leader elected pods #1296

Closed
markmandel opened this issue Jan 25, 2020 · 8 comments
Labels
area/operations (Installation, updating, metrics etc), duplicate (Duplicate ticket), kind/feature (New features for Agones)

Comments

@markmandel
Collaborator

Is your feature request related to a problem? Please describe.
There is a single pod for the Agones controller. This is fine most of the time: if it crashes, it will be recreated by the Deployment. But it can cause issues when a node goes down, or if the controller keeps crashing for an extended period.

Describe the solution you'd like
Have multiple controller replicas that are leader elected, ideally with a preference to run on different nodes from each other, so that if one controller has any kind of downtime, leadership can move to another controller quickly, providing extra redundancy.

Describe alternatives you've considered
Leaving things as they currently are. Deployments do make the controller fairly robust, as they will bring a controller back up again fairly quickly if there is a failure.

As part of this work, we should look at how quickly leader election happens if the current leader has an issue, and make sure it's acceptable.

Additional context

markmandel added the kind/feature (New features for Agones) and area/operations (Installation, updating, metrics etc) labels on Jan 25, 2020
@nanasi880
Contributor

nanasi880 commented Feb 2, 2021

I am hosting Agones on Amazon EKS.
The agones-controller is annotated with "safe-to-evict: false", and I want to run the controller on Fargate. (The reason for "safe-to-evict: false" is to prevent ClusterAutoscaler from replacing EC2 nodes out from under it.)
Fargate takes about a minute from the time a pod is requested to the time it is actually provisioned, so if there is only one pod, automatic repair by the Deployment is not as fast as it could be.
In a case like this, wouldn't it be beneficial to have multiple pods pre-provisioned?

@markmandel
Collaborator Author

I think you're going to run into a lot of trouble running the Agones controller outside of Kubernetes -- especially around the admission webhooks.

Not sure if this is worth the extra complexity to implement, since Kubernetes Deployments solve so many problems.

Can you explain your impetus?

@nanasi880
Contributor

My explanation was lacking.

There are two types of worker nodes in Amazon EKS.
The EC2 type provisions VMs in advance, like GCP's Compute Engine.
The Fargate type provisions Fargate resources as each Pod is launched, like Cloud Run, and is a fully managed environment.
Both of these work as worker nodes in Kubernetes. (Not outside of Kubernetes.)

The problem with the EC2 launch type is that VM maintenance is a user responsibility. (Like the recent sudo vulnerability.)
In such a case, we want to drain the node and start a new worker node from a fresh machine image, but ClusterAutoscaler cannot move the agones-controller because of the "safe-to-evict: false" annotation.
Therefore, we are planning to use Fargate-type Kubernetes worker nodes, which do not require such management.
That way, all pods except GameServers can be relocated by ClusterAutoscaler.

One problem with the Fargate launch type is that provisioning of the machine is slow.
The Fargate launch type provisions Fargate resources after a Pod request is made, so there is a lag of approximately one minute.
This means that when the Deployment recreates a Pod, there will also be a lag of about a minute.

In situations like this, where provisioning takes a long time, we thought it would be valuable to be able to launch multiple Pods in advance.

@markmandel
Collaborator Author

markmandel commented Feb 2, 2021

If you want to have automatic updates (which means you also don't control when your short downtimes are), you could always set .Values.agones.controller.safeToEvict to false in your scenario -- rather than adding a lot more complexity to the Agones system.

Personally, I'm not a fan of having automatic updates to nodes when the system decides, at least not without being able to set automatic maintenance windows -- but the choice is up to you.

@nanasi880
Contributor

I found an additional scenario for this idea.

I was faced with the need to update the Agones controller.
What I wanted to do was simple: I just wanted to stop the controller from outputting a lot of metrics.
However, I found it impossible to deploy the change without downtime.

The reason for this is that the Agones Deployment specifies the Recreate strategy instead of RollingUpdate, which means all Pods are shut down when the Deployment starts updating.
Is the fact that the Agones controller can only run one Pod what prevents it from supporting RollingUpdate?

While it's important to have a planned maintenance window, it's unfortunate that the Agones controller becomes the bottleneck for every small fix like this.

If you want to have automatic updates (which means you also don't control when your short downtimes are), you could always set .Values.agones.controller.safeToEvict to false in your scenario

I understand that safe-to-evict=false is specified by default, which means you don't want the Agones controller Pod to be moved while it is running.
Again, with or without safe-to-evict, there are Kubernetes infrastructures like Fargate and GKE Autopilot that allocate machine resources only after a Pod is requested, and in such environments, starting Pods ahead of time continues to be valuable.

@markmandel
Collaborator Author

However, I found it impossible to deploy without downtime.

I don't see it ever being likely that we'll look to support an Agones upgrade without downtime; the testing complexity is essentially infinite.

If you want no downtime between updates, we recommend using multiple clusters:
https://agones.dev/site/docs/installation/upgrading/

@nanasi880
Contributor

OK, I don't think I'm getting my point across well.
Regarding Agones upgrades, I agree that multiple clusters should be used.
By "upgrade" here I mean a minor or major version upgrade.

However, for other updates to the Deployment, it should be possible to work within a single cluster.
This is the case for quickly incorporating very minor fixes, such as a patch version change or the recent metrics issue ( #2424 ).
Even for something as simple as turning off Prometheus output via Helm parameters, the current setup shuts down all controller pods.

We hope to improve the situation where the disruption of a rollout is far larger than the change we actually want to make.

One question: if I am able to address this issue, is there any chance it could be merged?
Since Kubernetes has a standard leader election mechanism, I am wondering if it could be used to run the controller with multiple pods. (I am about to learn how.)
https://github.com/kubernetes/client-go/blob/master/examples/leader-election/main.go

Again, environments like EKS Fargate and GKE Autopilot do not launch pods fast enough, so I think it is worthwhile to be able to pre-launch multiple pods.

zmerlynn added the duplicate (Duplicate ticket) label on Nov 18, 2022
@zmerlynn
Collaborator

There is a design for this in #2797 now; I propose we take further discussion there.
