Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not copy XState other than AVX when redirecting for GC stress #65825

Merged
merged 6 commits into from
Feb 25, 2022

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Feb 24, 2022

GC stress mechanism relies on exceptions thrown from random locations in managed code.

The context that OS passes for the exception location may contain whatever XState that happens to be dirty. However only AVX matters in managed code.
Besides, the redirection context that stores the state may not have space to save other XState features.

To avoid dealing with the state that we do not expect, we will indicate via the feature mask that only AVX state is valid in the provided context.

Fixes: #65776

@VSadov
Copy link
Member Author

VSadov commented Feb 24, 2022

@janvorli - could you check that the fix actually resolves the issue? I could not reproduce it. I think the issue requires your other changes.

@janvorli
Copy link
Member

It doesn't work, the CopyContext still fails with this change.

@janvorli
Copy link
Member

janvorli commented Feb 24, 2022

It seems that the SetXStateFeatureMask is an "or" operation. I've called GetXStateFeatureMask after the SetXStateFeatureMask you've added and the mask was the same as before the set (0xA3 in my case).
I was wrong, the reason why it doesn't work is different. The mask that the GetXStateFeaturesMask returned - the 0xA3 - doesn't contain XSTATE_MASK_AVX. This mask contains:
XSTATE_LEGACY_FLOATING_POINT
XSTATE_LEGACY_SSE
XSTATE_AVX512_KMASK
XSTATE_AVX512_ZMM

@VSadov
Copy link
Member Author

VSadov commented Feb 24, 2022

Ah, yes, since it is picking whatever is dirty, it is possible that there is XState, but without AVX. I need to apply the mask unconditionally.

@janvorli
Copy link
Member

But what would it copy then when the source context doesn't contain AVX? And it would drop the AVX512 state possibly resulting in floating point state corruption, wouldn't it? It still seems to me that allocating the context with everything is a better way.

@VSadov
Copy link
Member Author

VSadov commented Feb 24, 2022

Since IP is in managed code, only AVX could have live data. If that is not present, we will only copy the legacy part.

The AVX2 vs. AVX512 is confusing. I keep mixing them up. I believe managed code may use AVX2 (i.e. AVX256), but that is included in AVX feature in XState, but not AVX512.

When AVX512 is supported we will have to make the context larger and start saving/restoring AVX512 feature as well.

@@ -1997,15 +1997,6 @@ CONTEXT* AllocateOSContextHelper(BYTE** contextBuffer)
pfnInitializeContext2(buffer, context, &pOSContext, &contextSize, xStateCompactionMask):
InitializeContext(buffer, context, &pOSContext, &contextSize);

// if AVX is supported set the appropriate features mask in the context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this removed, the code above that sets the supportAVX based on the FeatureMask can be removed too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, even before supportAVX was redundant, since SetXStateFeaturesMask performs the same check - to filter out what is not supported.

@janvorli
Copy link
Member

I can confirm that the current state of the change works.

// This should not normally fail.
// The system silently ignores any feature specified in the FeatureMask
// which is not enabled on the processor.
bRes &= SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only have one nit - I would prefer failures / asserts on each of the failure instead of combining the results together.

Copy link
Member Author

@VSadov VSadov Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetXStateFeaturesMask and SetXStateFeaturesMask basically cannot fail. They are very tolerant to the input.
To get failures you'd need something like pass context flags for a wrong architecture (like arm64). As-is that would not compile.

Realistically only CopyContext is an API with complex requirements and validation that could fail here due to some mistake.
I think in this particular case individual asserts/returns would not add a lot of value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just found those would fail also when passed context without any XSTATE present. It seems like something that could possibly happen for the usage in Thread::RedirectCurrentThreadAtHandledJITCase where we pass it a general context that we get from the vectored exception handler. So it seems that it would be nice to actually make that work in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my read on that was that Set(Get) would be a noop, but I think you are right, if no XState is set it may fail. Arguably a bug in the validation of Set not accepting 0.

Yes, we should make this case work.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@VSadov
Copy link
Member Author

VSadov commented Feb 25, 2022

linux DAC build failures happen in other PRs as well.

@VSadov VSadov merged commit 1857d04 into dotnet:main Feb 25, 2022
@VSadov VSadov deleted the gcCov branch February 25, 2022 01:23
VSadov added a commit to VSadov/runtime that referenced this pull request Mar 2, 2022
…net#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback
agocke pushed a commit to agocke/runtime that referenced this pull request Mar 8, 2022
…net#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback
carlossanlop pushed a commit that referenced this pull request Mar 8, 2022
…#66120)

* Use CopyContext to restore saved context on X86 (#65490)

* Use CopyContext to restore saved context on X86

* PR feedback

* more PR feedback

* Do not copy XState other than AVX when redirecting for GC stress (#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback

* null-check the redirect context before using. (#65910)

* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.

* EE Suspension on x86 should use RtlRestoreContext when available (#65878)

* RestoreContextSimulated

* probe for RtlRestoreContext

* ntdll.dll

* restore self-trap sequence

* PR feedback

* Clarify CopyContext in RedirectedHandledJITCaseExceptionFilter

* simpler indentation.

* restore last error on the legacy path.

* Update src/coreclr/vm/threads.h

Co-authored-by: Dan Moseley <[email protected]>

Co-authored-by: Dan Moseley <[email protected]>
carlossanlop pushed a commit that referenced this pull request Mar 15, 2022
* Use CopyContext to restore saved context on X86 (#65490)

* Use CopyContext to restore saved context on X86

* PR feedback

* more PR feedback

* Do not copy XState other than AVX when redirecting for GC stress (#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback

* null-check the redirect context before using. (#65910)

* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.

* EE Suspension on x86 should use RtlRestoreContext when available (#65878)

* RestoreContextSimulated

* probe for RtlRestoreContext

* ntdll.dll

* restore self-trap sequence

* PR feedback

* Clarify CopyContext in RedirectedHandledJITCaseExceptionFilter

* simpler indentation.

* restore last error on the legacy path.

* Update src/coreclr/vm/threads.h

Co-authored-by: Dan Moseley <[email protected]>

Co-authored-by: Vladimir Sadov <[email protected]>
Co-authored-by: Dan Moseley <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CopyContext in Thread::RedirectCurrentThreadAtHandledJITCase sometimes fails
2 participants