JIT: Try to retain entry weight during profile synthesis #111971
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show numerous methods where we can compute a more precise call count, rather than falling back to …
There's kind of a chicken-and-egg problem here, because we're about to recompute the weights of all the blocks that are preds of the entry block, so relying on the existing weights of those blocks seems a bit odd. Can you post a simple(-ish) non-OSR example where this changes things? If the entry is a loop head (not sure that is possible anymore with the omnipresent scratch BB) there is a …
I had #110693 where I tried to compute …
Thanks for pointing this out; this seems better than relying on the old weights. One thing I notice with this approach is …
Sure. For …

Whereas if we derive the entry weight from the loop's cyclic probability: …
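A hedged, self-contained sketch of the distinction (all names and numbers are illustrative, not from the PR): if the entry block heads a loop, its synthesized weight is the external inflow scaled by the loop's cyclic probability, so the call count has to be recovered by dividing that factor back out.

```cpp
#include <cstdio>

int main()
{
    // Weight entering the entry block from outside the loop; this is the
    // method's true call count.
    double callCount = 100.0;

    // Likelihood of taking the loop's back edge on each iteration.
    double backEdgeLikelihood = 0.99;

    // Cyclic probability of the loop, as profile synthesis would compute it.
    double cyclicProbability = 1.0 / (1.0 - backEdgeLikelihood);

    // The loop header's weight is the external inflow scaled by the cyclic
    // probability, so reading the call count straight off the header would
    // overstate it by 100x in this example.
    double headerWeight = callCount * cyclicProbability;

    printf("header weight: %.1f, derived entry weight: %.1f\n",
           headerWeight, headerWeight / cyclicProbability);
    return 0;
}
```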
I'm tempted to revive this, since we'd ideally compute this early, when the profile is still consistent: I suspect some of the diffs you got on that PR had to do with OSR methods having nonsensical weights on …
Sounds right. IMO it would be best to go that route, since otherwise we may just end up churning things twice...
I think we still run into an ordering problem where we don't know how much flow makes it back into the entry block until profile synthesis has run, but perhaps we can make profile synthesis responsible for updating/setting …
> If you have a …

If the entry block has backedges, I believe we'll always find a loop there, since there is no other possible entry (ignoring OSR for the time being).
So in the case where the entry block is also a loop header, we have to cache the block's original weight before running …
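A minimal sketch of that ordering constraint, using invented stand-ins (`BasicBlock` and `SynthesizeWeights` below are simplified illustrations, not the actual JIT code):

```cpp
#include <cstdio>

using weight_t = double;

// Simplified stand-in for a flow graph block; not RyuJIT's BasicBlock.
struct BasicBlock
{
    weight_t bbWeight;
    bool     isLoopHeader;
};

// Stand-in for the synthesis pass: a loop-header entry block's weight gets
// scaled by the loop's cyclic probability, clobbering the original value.
void SynthesizeWeights(BasicBlock& entry, weight_t cyclicProbability)
{
    if (entry.isLoopHeader)
    {
        entry.bbWeight *= cyclicProbability;
    }
}

int main()
{
    BasicBlock entry{100.0, /* isLoopHeader */ true};

    // Cache the original weight first: once synthesis runs, the header's
    // weight reflects loop iterations rather than the method call count.
    const weight_t cachedEntryWeight = entry.bbWeight;
    SynthesizeWeights(entry, 50.0);

    printf("post-synthesis header weight: %.1f, retained entry weight: %.1f\n",
           entry.bbWeight, cachedEntryWeight);
    return 0;
}
```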
As a follow-up, it seems like this value should be propagated to `fgCalledCount`.
That's my plan; I'll take a look at the diffs in another PR.
Part of #107749. Follow-up to #111971 and #110693.

For methods without profile data, ensure the default call count is available throughout compilation (this had no diffs for me locally). For methods with profile data, compute the call count after synthesis runs to ensure it is available early and reasonably accurate.

I'm only seeing diffs in OSR methods locally, due to the logic in `fgFixEntryFlowForOSR` (which runs right after profile incorporation) no longer affecting `fgCalledCount`. This method guesses that the loop iterates about 100x the method call count, and scales the method entry block's weight down accordingly. This gives the impression later on that `fgCalledCount` is much lower than what we calculated using `fgEntryBB`.

The actual diffs seem to manifest largely in LSRA, which uses `fgCalledCount` to normalize block weights, though there are a few other phases that use `BasicBlock::getBBWeight` in lieu of the raw weight as well. I think we ought to consolidate our block weight strategy at some point, especially if we have newfound faith in `fgCalledCount`. For example, instead of this check in if conversion:

```
if (m_startBlock->getBBWeight(m_comp) > BB_UNITY_WEIGHT * 1.05)
```

Perhaps we could do:

```
if (m_startBlock->bbWeight > fgCalledCount * 1.05)
```

But that's for another PR.
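Assuming `getBBWeight` normalizes raw block weights against `fgCalledCount` (scaling so that the entry maps to `BB_UNITY_WEIGHT`), the two checks above are algebraically equivalent. A hedged sketch under that assumption (the helper below is a simplified model, not the real JIT code, which has extra guards):

```cpp
#include <cstdio>

using weight_t = double;

// RyuJIT's canonical entry weight is 100; the helper below is a simplified
// model of getBBWeight-style normalization.
constexpr weight_t BB_UNITY_WEIGHT = 100.0;

weight_t NormalizedWeight(weight_t bbWeight, weight_t fgCalledCount)
{
    return bbWeight / fgCalledCount * BB_UNITY_WEIGHT;
}

int main()
{
    weight_t fgCalledCount = 800.0;  // synthesized method call count
    weight_t startWeight   = 1000.0; // raw weight of the block under test

    // The existing check, phrased on normalized weights...
    bool viaNormalized =
        NormalizedWeight(startWeight, fgCalledCount) > BB_UNITY_WEIGHT * 1.05;

    // ...and the proposed check on raw weights: multiplying both sides of
    // the first comparison by fgCalledCount / BB_UNITY_WEIGHT yields it.
    bool viaRaw = startWeight > fgCalledCount * 1.05;

    printf("normalized: %d, raw: %d\n", viaNormalized, viaRaw); // both print 1
    return 0;
}
```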
Part of #107749. Prerequisite to #111915. Regardless of the profile synthesis option used, we ought to maintain the method's entry weight, which is computed by summing all non-flow weight into the entry block. Ideally, we'd use `fgCalledCount` here, but this isn't computed until after morph, and we need to tolerate the existence of multiple entry blocks for methods with OSR pre-morph.
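A hedged sketch of that computation, with simplified stand-ins for the JIT's flow-graph types (not the actual implementation): the weight that enters the entry block but does not arrive via a flow edge must be attributable to calls into the method.

```cpp
#include <cstdio>
#include <vector>
#include <algorithm>

using weight_t = double;

// Simplified stand-ins for RyuJIT's FlowEdge/BasicBlock, illustration only.
struct FlowEdge
{
    weight_t likelyWeight; // predecessor weight scaled by edge likelihood
};

struct BasicBlock
{
    weight_t              bbWeight;
    std::vector<FlowEdge> predEdges; // flow into this block, including back edges
};

// Entry weight = the entry block's weight minus all weight arriving on flow
// edges; whatever remains must have entered via a call to the method.
weight_t ComputeEntryWeight(const BasicBlock& entry)
{
    weight_t entryWeight = entry.bbWeight;
    for (const FlowEdge& pred : entry.predEdges)
    {
        entryWeight -= pred.likelyWeight;
    }
    // Clamp: an inconsistent profile can make the incoming flow exceed the
    // block's own weight.
    return std::max(entryWeight, weight_t(0));
}

int main()
{
    // Entry block with weight 10100: 100 from calls, 10000 from a back edge.
    BasicBlock entry{10100.0, {{10000.0}}};
    printf("entry weight: %.1f\n", ComputeEntryWeight(entry)); // 100.0
    return 0;
}
```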