Garbage Collection hang since upgrading to .NET 9 #112203
Comments
Tagging subscribers to this area: @dotnet/gc
Does this deadlock happen on startup? The fix for #105780 should be included in the latest .NET 9 servicing release. The fix for #110350 is not done yet. A temporary workaround should be to disable background GC.
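For anyone following along, this is a minimal sketch of the usual way to disable background (concurrent) GC, assuming the standard `System.GC.Concurrent` setting applies to this app; the same thing can be done with the `DOTNET_gcConcurrent=0` environment variable or directly in `runtimeconfig.json`:

```xml
<!-- .csproj: disable background (concurrent) GC for this application.
     Equivalent to DOTNET_gcConcurrent=0 or "System.GC.Concurrent": false in runtimeconfig.json. -->
<PropertyGroup>
  <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
</PropertyGroup>
```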
It's not specifically a startup issue, and we should already be on that January servicing release of .NET 9. However, it looks like one of those tickets had its fix rolled back anyway. I'll run some tests with background GC disabled and report back the results next week. It will take a few days, as I'm not expecting much load on these servers until the middle of the week and it only seems to happen when they are under high load.
OK, sounds good. Looks related to #110350 then. Will look at working on a fix for that.
So I have set the environment variable DOTNET_gcConcurrent = 0 on all the processing servers. We had no issues for the last week, but last night 3 processes hung on the same server (all at different times). Initial analysis looks the same as before:
Looking at the stack traces, I do still see a few threads referencing bgc_thread_function - does this suggest background GC is still enabled and maybe my environment variable hasn't worked?
Yeah, it appears the BGC was still enabled. You can check the env. vars using !peb in WinDbg.
Thanks, yes, !peb confirms that the DOTNET_gcConcurrent environment variable was not present, which is a bit bizarre, so I'll need to dig into that. Elsewhere things seem to be fairly solid now, so we seem to be on the right track.
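Since the environment variable apparently never reached the process, one option is to have the service log its effective GC configuration at startup. The sketch below is illustrative (the log destination and wording are assumptions, not part of the original report); `GCSettings.LatencyMode` defaults to Interactive when background GC is enabled and Batch when it is disabled, so it doubles as a sanity check:

```csharp
using System;
using System.Runtime;

// Illustrative startup check: log the GC configuration this process actually ended up with.
// LatencyMode is Interactive when background (concurrent) GC is enabled and Batch when it is not.
var gcConcurrent = Environment.GetEnvironmentVariable("DOTNET_gcConcurrent") ?? "<not set>";
Console.WriteLine($"Server GC:           {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode:        {GCSettings.LatencyMode}");
Console.WriteLine($"DOTNET_gcConcurrent: {gcConcurrent}");
```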
I think we faced the same issue. I upgraded an application to .NET 9 a few weeks ago. We have around 60 different deployments with 2-3 pods each. Afterwards, on average every 3 or so days some random pod would apparently just freeze entirely - no HTTP routes reachable anymore, logs stopping entirely - and then be quickly killed by the k8s liveness probes. For the time being, we've tried reverting to the old GC. Thanks for looking into this; we're keenly awaiting the fix as well, as I'm eager to revert to the current GC (and am hesitant to move further services onto .NET 9 for the time being).
We have a fix in the works which should be included in the next servicing release for .NET 9. Thanks
Oh sorry, yeah, I was referring to the post from @scotttho-datacom. @fabianoliver you are running into a different issue if it's on Linux (we had some fixes in the latest .NET 9 servicing releases, assuming you are running the latest). If so, please create a separate issue with details (a dump or stack trace of the process would be helpful).
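If it helps anyone preparing such a report, a dump from a hung Linux pod can usually be collected and inspected with the dotnet-dump tool; the PID and dump file name below are placeholders:

```console
# Collect a full dump from the hung process (1234 is a placeholder PID)
dotnet-dump collect -p 1234 --type Full

# Open the dump and look at the managed threads and their stacks
dotnet-dump analyze core_20250101_000000
> threads
> clrstack -all
```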
Description
Since upgrading our application to .NET 9, we are seeing processes lock up and hang indefinitely (stuck for several hours before being killed manually). Analysis of the memory dumps suggests a garbage collection issue.
Have collected many memory dumps and can probably share a memory dump from our dev environment privately if needed.
WinDbg output:
stacks.txt
Reproduction Steps
We have not been able to reproduce this on demand, but are seeing it on a nightly basis
Expected behavior
Application not to hang
Actual behavior
Application hangs indefinitely
Regression?
Seems to have come in since upgrading to .NET 9
Known Workarounds
None
Configuration
Some background on our setup:
Application Servers (Windows x64)
Processing Servers (Windows x64)
Have collected many memory dumps, and the hang is often in the same area of code (Entity Framework), which looks like it is doing quite a lot of allocations.
Have tried applying the same GC settings to the processing servers and have had mixed results: no hangs in our dev environment after 2 days, but 2 hangs in production on the first day after applying the settings.
Other information
Possibly related:
#110350
#105780
Apologies if this is a duplicate of either of those; it's a bit hard for me to tell, so I figured a separate issue report might be best.