Slow game servers deletion #540
Comments
Oh that's fun. I'm almost willing to bet that what happens is:
So we should look at whether, when you delete a GameServer, we can remove the finaliser if it's at Scheduled or before. (There is a possible concern with race conditions, but that should be the general gist.) |
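To make the suggestion above concrete, here is a minimal sketch of the "drop the finaliser at Scheduled or before" check. It is not the Agones implementation: the `State` type, the state ordering, and the helper names `CanDropFinalizer` and `RemoveFinalizer` are all assumptions made for illustration, and the sketch does not address the race-condition concern mentioned above.

```go
package main

// State is a hypothetical, simplified stand-in for a GameServer lifecycle state.
type State string

const (
	StatePortAllocation State = "PortAllocation"
	StateCreating       State = "Creating"
	StateStarting       State = "Starting"
	StateScheduled      State = "Scheduled"
	StateRequestReady   State = "RequestReady"
	StateReady          State = "Ready"
)

// stateOrder ranks the states in the order assumed for this sketch.
var stateOrder = map[State]int{
	StatePortAllocation: 0,
	StateCreating:       1,
	StateStarting:       2,
	StateScheduled:      3,
	StateRequestReady:   4,
	StateReady:          5,
}

// CanDropFinalizer reports whether a GameServer in state s is at Scheduled
// or earlier. Unknown states return false, so the finaliser is kept.
func CanDropFinalizer(s State) bool {
	rank, ok := stateOrder[s]
	if !ok {
		return false
	}
	return rank <= stateOrder[StateScheduled]
}

// RemoveFinalizer returns a copy of finalizers with every entry equal to
// name removed. The input slice is not modified.
func RemoveFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}
```

A delete handler could call `CanDropFinalizer` on the current state and, when it returns true, persist the result of `RemoveFinalizer` so the object can be garbage collected immediately.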
I thought we have to release the port regardless? So the finalizer has to stay. Right? |
Something I'd appreciate: we delete the fleet only once its replica count is down to 0. WDYT? |
So the port only physically gets released once the Pod is gone -- hence the finaliser. At PortAllocated, we assign a port from our in-memory registry of available ports, but we don't do anything at that stage to physically lock a specific port on the network. So if we delete the GameServer before the Pod goes up, the PortAllocator will just free up the port in the registry and make it available again. @Kuqd you can always do this yourself by doing a |
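The in-memory registry described above can be pictured with a small sketch. This is an illustration under assumptions, not the actual PortAllocator: the `PortRegistry` type, its methods, and the port range used in the usage note are hypothetical. It shows why releasing a port before any Pod exists is only a bookkeeping change.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNoPorts is returned when every port in the range is taken.
var ErrNoPorts = errors.New("no free ports in range")

// PortRegistry is a hypothetical in-memory record of which ports are handed
// out. Nothing here touches the network; it is bookkeeping only.
type PortRegistry struct {
	mu       sync.Mutex
	min, max int
	inUse    map[int]string // port -> owning GameServer name
}

// NewPortRegistry creates a registry for the inclusive range [min, max].
func NewPortRegistry(min, max int) *PortRegistry {
	return &PortRegistry{min: min, max: max, inUse: make(map[int]string)}
}

// Allocate hands the lowest free port to owner.
func (r *PortRegistry) Allocate(owner string) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for p := r.min; p <= r.max; p++ {
		if _, taken := r.inUse[p]; !taken {
			r.inUse[p] = owner
			return p, nil
		}
	}
	return 0, ErrNoPorts
}

// Release frees every port held by owner, making them available again.
func (r *PortRegistry) Release(owner string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for p, o := range r.inUse {
		if o == owner {
			delete(r.inUse, p)
		}
	}
}
```

For example, with `NewPortRegistry(7000, 7001)`, allocating for two GameServers exhausts the range; calling `Release` for the first one makes port 7000 available to the next `Allocate` call without any Pod having been involved.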
@markmandel I'll give foregroundDeletion a try; it's for an API. |
Some more hints: it seems that gs/pods are still being created and then deleted. |
I tried I think I've pinned down the issue, we should stop enqueuing for the GameServerSet when it's being deleted. |
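A rough sketch of the idea in the comment above: skip enqueuing work for an owner that is already being deleted. The `Meta` and `Queue` types and the `MarkDeleted` helper are hypothetical simplifications, not Agones or client-go types; the only assumption borrowed from Kubernetes is that a non-nil deletion timestamp means the object is being deleted.

```go
package main

import "time"

// Meta is a hypothetical, trimmed-down stand-in for Kubernetes object metadata.
type Meta struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once deletion has been requested
}

// BeingDeleted reports whether deletion has been requested for the object.
func (m Meta) BeingDeleted() bool {
	return m.DeletionTimestamp != nil
}

// MarkDeleted returns a copy of m with its deletion timestamp set to now.
func MarkDeleted(m Meta) Meta {
	now := time.Now()
	m.DeletionTimestamp = &now
	return m
}

// Queue is a minimal in-memory work queue used only for this illustration.
type Queue struct {
	Keys []string
}

// EnqueueOwner adds the owner's name to the queue unless the owner is being
// deleted. It returns true when the key was enqueued.
func (q *Queue) EnqueueOwner(owner Meta) bool {
	if owner.BeingDeleted() {
		return false
	}
	q.Keys = append(q.Keys, owner.Name)
	return true
}
```

With this guard in place, events from child GameServers no longer trigger a sync of a set that is on its way out, so the sync loop has no chance to create replacements.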
I saw similar behavior when perf testing fleet scaling up and down. When I tested with GKE Autoscaling, at some point fleet autoscaling got stuck (a different problem), game servers got stuck in the Scheduled state, and deletion later took some time. @Kuqd, did you observe this error as well, or is it a different issue? |
I believe this issue is also resolved (it was similar to issue #543). I tested by creating 4000 GS and it seems fine. Can someone else confirm so we can close it? I believe the recent GSS and Delete improvements resolved the issue. |
I'll investigate. |
@Kuqd can we close now? |
Gentle bump - @Kuqd can we close this now? |
I'm going to close this, please feel free to reopen if it rears its head again. |
Hello, I'd like to reopen this, specifically the part about foreground deletion. I can reproduce it with Agones 1.14.0 and a fleet that has 2 or more replicas: it keeps deleting and starting new game servers in a loop. Maybe there is some work to do to stop creating replicas for a fleet when it has a finalizer / deletion timestamp? EDIT: I've tested the same sort of thing with a Deployment and ReplicaSet, which in this sense are supposed to be equivalent to a Fleet and GameServerSet, and deletion works fine, so whatever is done for Deployments to handle this could apply as a solution for fleets. |
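One way to read the proposal above is as a guard in the replica calculation. The function below is a hypothetical sketch, not Agones code: `GameServersToCreate` and its parameters are invented for illustration, and `beingDeleted` stands for "the owner has a deletion timestamp".

```go
package main

// GameServersToCreate returns how many new GameServers a controller should
// create to reach the desired replica count. It returns 0 when the owner is
// being deleted, so removed replicas are not replaced during foreground
// deletion, and 0 when there are already enough replicas.
func GameServersToCreate(desired, current int, beingDeleted bool) int {
	if beingDeleted || current >= desired {
		return 0
	}
	return desired - current
}
```

Without the `beingDeleted` check, a fleet with 2 desired replicas that has just lost both would ask for 2 new ones on every sync, which matches the delete-and-recreate loop described in the comment.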
@tenevdev can you create a new issue, with reproducible steps that are run against the latest Agones? Otherwise it's hard to track exactly what your issue is, as this ticket has wandered a little from its original intent. |
If you create a fleet using the tutorial command :
Then over-provision the fleet (say, 40 replicas) using kubectl edit; a lot of pods will get stuck in scheduling. However, if you delete the fleet, it disappears instantly and the pods are deleted within seconds, but the gameservers can hang for up to 10 minutes before ultimately disappearing.
We should investigate the root cause of this.