-
Notifications
You must be signed in to change notification settings - Fork 829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dangling Game Servers left behind if allocation fails at the wrong moment #607
Comments
Oh yep - this is fun. And in reality, likely to happen in any distributed system, because weird stuff always does. 2 thoughts on the above:
But this probably really goes to a wider conversation about player tracking, and how we want to manage that. |
Adding a mandatory call from the SDK when the game start seems a good solution without having to handle player tracking for now. And I agree, we need opt-in or opt-out, I think opt-out is better since we want our users to not have dangling GameServer by default. |
Maybe we should consider having another state called "InUse". After the GS state is set to Allocated, game running on the GS should call Agones to change the state from Allocated to InUse. If the GS is in Allocated state for X minutes, system can shut it down and release the resources. |
At some point in the near future, we will need to track at the very least - max number of players a DGS can handle, and the number of players currently playing - to make it easier to support backfill for matchmakers (or even just front-fill). This feels all related to be. |
I'm wondering if the game server could handle this itself by shutting down after a given timeout. |
I'm wondering if we (optionally) track player counts/ids per game server, we could have some kind of timeout to shutdown if allocated but no players connect after a certain period? (Also optional) |
Hi. I found the same problem. In games where a small number of players match, the server that fails the allocation is still the dangling server. Currently I am implementing PlayerCount management and @jkowalski 's solution on the GameServer side that If PlayerCount is zero for a certain period in Allocated state, Shutdown is called automatically. This issue can occur in a variety of games, so it would be nice if Agones could provide a solution! |
We are part of the way there, with Player Tracking being in alpha right now! Not sure if we should wait until GA for this feature, but what would you like this to look like? Some sort of additional configuration option on a GameServer yaml? |
I agree. It is a good idea to avoid the dangling server by writing it in GameServer yaml. |
This still seems like a good idea, but will get moved to #2716 once it's implemented. |
'This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions ' |
Added |
There's a rare race condition in Game Server Allocation which leads to dangling game servers that will never be cleaned up.
Basically if a call to allocate the server partially succeeds by marking the GS as
Allocated
but failing to return the result to the user (for example because of a call timeout or a connection getting dropped at the wrong moment), the user will never know which server was allocated in their call. Even worse, such server will never be in use and will stay Allocated forever, preventing fleet scaling down, etc.One solution would be to purge Allocated game servers that don't have actual game played on them after a long while. To do that we need to know which GS are active and which are not.
How about the following strategy:
we add new call to the SDK that indicates the game is really being played (perhaps we just extend SDK health reporting to periodically pass extra data such as number of players in the game, perhaps some latency statistics and maybe average frame time, tick rate, etc.)
any DGS that does not receive health signal indicating that the game is being played within X amount of time will automatically move back to
Ready
or perhaps evenShutdown
This would obviously be a breaking change for current DGS who don't report the correct health information yet, so we'd need an opt-in or opt-out mechanism for the transition period to avoid getting those GS killed.
The text was updated successfully, but these errors were encountered: