-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TailCallStress mode. #40698
Fix TailCallStress mode. #40698
Conversation
jitstress run: https://dev.azure.com/dnceng/public/_build/results?buildId=766638&view=results |
e860baf
to
e2ba3b5
Compare
Pushed a fix and re-started stress jobs. |
@AndyAyersMS These changes include switching to always dispatching tail calls via helpers with tailcallstress. Because of that I'm hitting #35687 : JIT\opt\OSR\tailrecursetry\tailrecursetry.cmd fails. #35687 was pushed to future. What do you recommend I should do with this failure? One option is to disable this test under stress. |
Yes, you should disable under stress. |
e2ba3b5
to
949fe31
Compare
Disabled tailrecursetry under stress and rebased. |
I think this is ready for review. @AndyAyersMS @dotnet/jit-contrib PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this seems like a much better approach.
There is likely some code in the importer we can clean up as a result (see passedStressModeValidation
), but no need to do this now.
} | ||
|
||
assg = fgMorphTree(assg); | ||
CORINFO_CLASS_HANDLE structHandle = call->gtRetClsHnd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erozenfeld Can you please expain what this part does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was marking long
on x86 as lvIsMultiRegRet
that was not expected by decomposeLong
(firing an assert).
Right now decomposeLong has its own mechanism to deal with such lclVars.
A long term solution could be to mark long lclVar as lvIsMultiRegRet
on x86 and delete the logic from decomposeLong but for now it is not necessary.
@erozenfeld FYI, related issue #8017 |
PREFIX_VOLATILE = 0x00001000, | ||
PREFIX_UNALIGNED = 0x00010000, | ||
PREFIX_CONSTRAINED = 0x00100000, | ||
PREFIX_READONLY = 0x01000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these flags only use every forth bit?
i.e. 0x1000, 0x2000, 0x4000, 0x8000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason for that. I was just following the existing pattern.
949fe31
to
eb1a3a0
Compare
@echesakovMSFT Thank you for pointing that out. I pushed a change to enable |
This failure in libraries-jitstress :
seems to be unrelated. I was able to repro it locally without my changes and without tailcallstress set. It succeeded once but failed with the same "Stack overflow" on subsequent runs. The stack trace in the debugger is
|
@erozenfeld anything else you're planning on doing before merging? |
@AndyAyersMS Yes, there are 3 failures in libraries-jitstress (Linux x64 and Linux arm64) that I need to investigate. I suspect these may be product bugs exposed by my change to always dispatch tail calls via helpers under tailcallstress. If that's the case I'll remove that change from this PR and address those bugs separately. |
Need any help? |
No, at the moment I'm not blocked on anything. |
If you want, take a look at the issue I described in #40698 (comment) . I was able to repro it on a clean clone without my changes but I only saw it in CI with my changes. It appears to be non-deterministic but fails almost 100% for me. |
It may be something similar to #10015 |
Improve validation of tail calls that are not tail-prefixed in the IL but are marked as such because of TailCallStress. We now do the same correctness validation in morph for such tail calls as we do for implicit tail calls. That blocks tail calls when we have address-taken locals, struct promoted params, and pinned vars. Fixes dotnet#39398. Fixes dotnet#39309. Fixes dotnet#38892. Fixes dotnet#38889. Fixes dotnet#38887. Fixes dotnet#37117. Fixes dotnet#8017.
eb1a3a0
to
415e98e
Compare
I confirmed that the libraries-jitstress failures were exposed by switching to always dispatching tail calls via helpers under tailcallstress. I removed that change from this PR and will address those failures separately. Re-started stress jobs. jitstress: https://dev.azure.com/dnceng/public/_build/results?buildId=771604&view=results @AndyAyersMS PTAL at the current changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just to sum up, the impact is that under tail call stress, we will now
- validate correctness of "stress" tail calls like we do implicit tail calls
- dispatch stress tail calls as tail calls via a helper (or as fast tail calls)
- dispatch implicit tail calls as fast tail calls (reject if requires a helper)
One would hope the parenthetical sets above are empty, and the behavior change from tail call stress mode is to now allow slow implicit tail calls in addition to the tail calls the jit would normally perform. Is that right?
It's almost the case. This change ensures that the validation we do in morph for implicit tail calls also applies to stress calls.
In the current state that is correct. My initial attempt was to always dispatch stress tail calls via a helper. That uncovered some issues in the new tail calls via helpers mechanism that I'm investigating. Once they are fixed we should switch to dispatching all stress tail calls via a helper.
Correct, that part is not changed. |
We gain two things from tailcallstress:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again. This is a big improvement in tail call stress reliability.
jitstress failures are #40751. |
Improve validation of tail calls that are not tail-prefixed in the IL
but are marked as such because of TailCallStress. We now do the same
correctness validation in morph for such tail calls as we do for
implicit tail calls. That blocks tail calls when we have address-taken
locals, struct promoted params, and pinned vars.
Fixes #39398.
Fixes #39309.
Fixes #38892.
Fixes #38889.
Fixes #38887.
Fixes #37117.
Fixes #8017.