Split agones controller into leader elected pods #1296
Comments
I am hosting Agones on Amazon EKS.
I think you're going to run into a lot of trouble running the Agones controller outside of Kubernetes -- especially around the admission webhooks. I'm not sure this is worth the extra complexity to implement, since Kubernetes Deployments solve so many problems. Can you explain your impetus?
My explanation was lacking. There are two types of worker nodes in Amazon EKS. The problem with the EC2 launch type is that VM maintenance is the user's responsibility (like the recent sudo vulnerability). One problem with the Fargate launch type is that provisioning a machine is slow. In situations like this, where provisioning takes a long time, we thought it would be valuable to be able to launch multiple Pods in advance.
If you want to have automatic updates (which also means you don't control when your short downtimes happen), that option is available to you. Personally, I'm not a fan of having nodes update automatically whenever the system decides, at least not without being able to set maintenance windows -- but the choice is up to you.
I found an additional scenario for this idea. I was faced with the need to update the Agones controller. The Agones Deployment specifies a Recreate strategy instead of RollingUpdate, which means that all Pods are shut down when the Deployment starts updating. While it's important to have a planned maintenance window, it's unfortunate that the Agones controller is the bottleneck for every fix like this.
I understand that safe-to-evict=false is specified by default, which means you don't want the Agones controller Pod to be moved while it is running.
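For illustration, the difference between the two update strategies can be sketched as a Deployment fragment. This is a simplified sketch using standard Kubernetes fields, not the actual Agones install manifest:

```yaml
# Simplified sketch of a controller Deployment (not the real Agones manifest).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agones-controller
spec:
  replicas: 1
  strategy:
    type: Recreate        # current behavior: the old Pod is stopped before the new one starts
  # With RollingUpdate, a new Pod would come up before the old one is removed:
  # strategy:
  #   type: RollingUpdate
  #   rollingUpdate:
  #     maxUnavailable: 0
  #     maxSurge: 1
```

With Recreate there is always a window with zero controller Pods during an update; RollingUpdate avoids that window, but only makes sense once multiple concurrent controllers (e.g. via leader election) are safe.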
I don't see it ever being likely that we'll look to support an Agones upgrade without downtime; the testing complexity is essentially infinite. If you want no downtime between updates, we recommend using multiple clusters.
OK, I don't think my point is coming across well. For other updates to the Deployment, however, it should be possible to work within a single cluster. We hope to improve the situation where the impact is too large relative to the changes we want to make. One question: if I am able to address this issue, is there any chance it could be merged? Again, environments like EKS Fargate and GKE Autopilot do not launch Pods quickly enough, so I think pre-launching multiple Pods is still worthwhile.
There is a design for this in #2797 now, I propose we take further discussion there. |
Is your feature request related to a problem? Please describe.
There is a single Pod for the Agones controller. This is fine most of the time: if it crashes, it will be recreated by the Deployment. But it can cause issues when a node goes down, or if a crash persists for an extended period.
Describe the solution you'd like
Have multiple controllers that are leader elected, ideally with a preference to run on different nodes from each other, so that if one controller experiences any kind of downtime, leadership can move to another controller quickly, providing extra redundancy.
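The node-spreading preference described above maps to Kubernetes pod anti-affinity. A minimal sketch, assuming the controller Pods carry a label such as app: agones-controller (the label name is illustrative):

```yaml
# Sketch: prefer scheduling controller replicas on different nodes.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: agones-controller
              topologyKey: kubernetes.io/hostname
```

Using preferredDuringScheduling (rather than required) keeps both replicas schedulable even on a single-node cluster, while spreading them when multiple nodes are available.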
Describe alternatives you've considered
Leaving things as they currently are. Deployments do make the controller fairly robust, as they will bring a controller back up again fairly quickly if there is a failure.
As part of this work, we should look at how quickly leader election takes place if the current leader has an issue, and make sure it's acceptable.
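As a rough guide to that failover time, Kubernetes-style lease-based leader election is bounded by the lease timings. The numbers below mirror the kube-controller-manager defaults (an assumption here; Agones would tune its own values), and the worst case for an ungraceful leader death is roughly the lease duration plus one retry interval:

```python
# Back-of-envelope failover time for lease-based leader election.
# Default-style timings (assumption, mirroring kube-controller-manager flags):
LEASE_DURATION = 15.0  # seconds a lease stays valid without renewal
RENEW_DEADLINE = 10.0  # how long the leader keeps trying to renew before giving up
RETRY_PERIOD = 2.0     # how often standby candidates try to acquire the lease

def worst_case_failover(lease_duration: float, retry_period: float) -> float:
    """If the leader dies without releasing the lease, a standby can only
    take over once the lease expires, plus up to one retry interval."""
    return lease_duration + retry_period

print(worst_case_failover(LEASE_DURATION, RETRY_PERIOD))  # 17.0 seconds
```

So with these default-style timings, the controller could be leaderless for around 15-17 seconds after a hard node failure; shortening the lease reduces that window at the cost of more API-server traffic.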
Additional context