-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add randomized allocation sampling #100356
Conversation
…ampling feature. This commit is only to gather feedback on the direction and is not intended to be checked in. The is a resumption of some of the original work that occured in dotnet#85750 although this is a different implementation approach. The overall goal is to do lightweight allocation sampling that can produce unbiased approximations of what was actually allocated. Our current ETW AllocationTick sampling is periodic but not randomized. Extrapolating from the AllocationTick samples can lead to unbounded estimation errors in theory and also substantial estimation error observed in practice. This commit is primarily to get feedback on adding a new adjustable limit pointer on the coreclr EE side of allocation contexts. This allows breaking out of the fast path allocation helpers at an arbitrarily selected sampling point that need not align with the standard GC allocation quantum for SOH. With sampling off the fast_helper_alloc_limit would always be identical to alloc_limit, but when sampling is on we would occasionally have them diverge. The key changes are: gcheaputilities.h - defines a new ee_alloc_context which augments the existing gc_alloc_context with the extra fast helper limit field gcheaputilities.h and threads.h - replace the global and per-thread storage of gc_alloc_context with the expanded ee_alloc_context to store the new field. jithelpers.cpp, jitinterfacex86.cpp, and amd64 asm stuff - refactors fast path alloc helpers to use the new limit field instead gchelpers.cpp - Updated Alloc() function recognizes when fast helper limit was exceeded to do some kind of sampling callback and update the fast helper limit when needed. This commit doesn't contain any logic that actually turns the sampling on, determines the limit values needed for sampling, or issues callbacks for sampled objects. That would come later if folks think this is a reasonable approach to intercept allocations.
You may consider peeling off some of the changes into separate PR to make this easier to land. For example, the changes in the GC proper around the 64-bit aligned allocs can be peeled off into a separate PR. |
Split into how many pieces @jkotas?
|
I think the following split may work better:
Smaller PRs are easier to stabilize, review and merge that locks in forward progress. |
I'm not sure to understand the difference between the last 2 items: emitting the event is a tiny part of the work; the second item is the big one. Also, without the sampling itself, the code will be different (mostly what I did at the beginning of NAOT when I did not want to change the behaviour - combined_limit = alloc_limit) |
The second item (Introduce the secondary allocation limit (for both CoreCLR and NAOT)) would be a mechanical change. It does not need any tests. It should not introduce any observable behavior changes. It is just touching a lot of different places and it is easy to miss a few of those somewhere. I do not have strong opinions about the best way to split this, but I think this will need to split into multiple PRs to land it - giving how long it takes to stabilize it. |
@@ -0,0 +1,131 @@ | |||
#pragma once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be move to src\native\minipal and CoreCLR impl switched to it as well?
In any case, if you agree that the 64-aligned allocs cleanup would be a good independent PR, we can start with that. |
_ASSERTE(allocPtr <= allocContext->alloc_limit); | ||
if (size > static_cast<SIZE_T>(allocContext->alloc_limit - allocPtr)) | ||
_ASSERTE(allocPtr <= eeAllocContext->combined_limit); | ||
if ((allocPtr == nullptr) || (size > static_cast<SIZE_T>(eeAllocContext->combined_limit - allocPtr))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not want to adding extra checks to the allocation helper fast paths. It would cause undesirable performance regressions.
Tagging subscribers to this area: @tommcdon |
@noahfalk I believe this PR is superseded by:
If yes, @chrisnas @noahfalk any objections if we close this PR? |
This PR adds a new feature, randomized allocation sampling, that can be used for low overhead allocation sampling with good probabilistic error bounds. We expect APM tools and other scenarios that do memory profiling in development or production will be interested in this. See docs/design/features/RandomizedAllocationSampling.md for more details about what this is and how it works.
This PR is a continuation from work originally in #98167. @Maoni0 and @jkotas had some discussion at that point to agree on the overall development direction.
There are a variety of test results showing some performance comparisons and statistical sampling distributions in src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/. These are available for reviewers to see but will be deleted prior to merging the PR.
Current status:
Note:
An option could be to merge the Core part now to secure the feature in .NET 9 and continue the NativeAOT that might not be finished for the cut. Note that the fix to emit AllocationTick in NativeAOT is now merged so it could help if the new randomized sampling is not available.
Guide to the changes: