Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address APX requirements for GC and VM stubs #112587

Open
anthonycanino opened this issue Feb 14, 2025 · 14 comments
Open

Address APX requirements for GC and VM stubs #112587

anthonycanino opened this issue Feb 14, 2025 · 14 comments
Assignees
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@anthonycanino
Copy link
Contributor

There are a number of asm stubs in some of the GC, PInvoke, and NativeAOT code.

I'm working to establish if APX extended GPRs (EGPRS) need to be added to some of the state saving --- for what I currently see, most of the registers that are manipulated in some of this code, src/coreclr/nativeaot/Runtime/amd64/PInvoke.asm and src/coreclr/nativeaot/Runtime/amd64/GcProbe.asm only need to address non-volatile registers --- all of the EGPRs are volatile.

My question is, is there any other asm points in the code, perhaps on gc transitions etc., where register state may need to be explictly saved for Windows? I am aware most of the state saving is done with xsave, but looking to make sure all dependencies are handled.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 14, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 14, 2025
@anthonycanino
Copy link
Contributor Author

Also, are there any similar requirements with gcinfo? I see that some of the debugging dump code dumps all registers.

@anthonycanino
Copy link
Contributor Author

@dotnet/avx512-contrib

@jkotas @janvorli @MichalStrehovsky is this something you can help me gain some clarity on?

@BruceForstall BruceForstall added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI apx Related to the Intel Advanced Performance Extensions (APX) labels Feb 14, 2025
@BruceForstall BruceForstall added this to the 10.0.0 milestone Feb 14, 2025
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Feb 14, 2025
@BruceForstall BruceForstall removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 14, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Copy link
Member

Also, are there any similar requirements with gcinfo? I see that some of the debugging dump code dumps all registers.

It looks like gc info encoding/decoding is resilient to variable number of registers (other platforms that use the common gc info encoder have different numbers of registers). You will probably find cases like

_ASSERTE(regNum >= 0 && regNum <= 16);
where there is an assert that the register number is <= 16. It is good that the new eGPRs are defined with numbers in the JIT consecutive above the previous GPRs. There must be code where the GC info is decoded to get a register number, and takes that register number and extracts the corresponding register value from a saved VM context record, to use that value as a GC root during GC processing. I don't know where that lookup is.

@anthonycanino
Copy link
Contributor Author

Thanks helpful thanks Bruce. Looks to have some encoding from the JIT here:

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gcencode.cpp#L4639-L4661

I will have to dig more to understand the decode side during GC.

@BruceForstall
Copy link
Member

Thanks helpful thanks Bruce. Looks to have some encoding from the JIT here:

Yes, and note that the JIT register numbers are assumed to be equivalent to the register numbers used by the VM.

@jkotas
Copy link
Member

jkotas commented Feb 15, 2025

There must be code where the GC info is decoded to get a register number, and takes that register number and extracts the corresponding register value from a saved VM context record

This code is here:

OBJECTREF* GcInfoDecoder::GetRegisterSlot(
int regNum,
PREGDISPLAY pRD
)
{
_ASSERTE(regNum >= 0 && regNum <= 30);
_ASSERTE(regNum != 18); // TEB
#ifdef FEATURE_NATIVEAOT
PTR_uintptr_t* ppReg = &pRD->pX0;
return (OBJECTREF*)*(ppReg + regNum);
#else
DWORD64 **ppReg;
if(regNum <= 17)
{
ppReg = &pRD->volatileCurrContextPointers.X0;
return (OBJECTREF*)*(ppReg + regNum);
}
else if(regNum == 29)
{
return (OBJECTREF*) pRD->pCurrentContextPointers->Fp;
}
else if(regNum == 30)
{
return (OBJECTREF*) pRD->pCurrentContextPointers->Lr;
}
ppReg = &pRD->pCurrentContextPointers->X19;
return (OBJECTREF*)*(ppReg + regNum-19);
#endif
}

You will also need to expand the context tracking structures. You can use Arm64 handling or X0-X17 registers as a blue-print:

PDWORD64 X0;
. NativeAOT variant: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime/regdisplay.h#L132-L142 .

An alternative to doing all this work would be avoid storing GC tracked references in these registers. SIMD registers are on a plan like that. I assume that you do not want to do this - just mentioning this option for completeness.

@BruceForstall
Copy link
Member

fyi, the x64 GetRegisterSlot is here:

OBJECTREF* GcInfoDecoder::GetRegisterSlot(
int regNum,
PREGDISPLAY pRD
)
{
_ASSERTE(regNum >= 0 && regNum <= 16);
_ASSERTE(regNum != 4); // rsp
#ifdef FEATURE_NATIVEAOT
PTR_uintptr_t* ppRax = &pRD->pRax;
if (regNum > 4) regNum--; // rsp is skipped in NativeAOT RegDisplay
#else
// The fields of KNONVOLATILE_CONTEXT_POINTERS are in the same order as
// the processor encoding numbers.
ULONGLONG **ppRax = &pRD->pCurrentContextPointers->Rax;
#endif
return (OBJECTREF*)*(ppRax + regNum);
}

An alternative to doing all this work would be avoid storing GC tracked references in these registers.

That's possibly a reasonable short-term solution if necessary, but it's likely we want to treat the eGPR as equivalent to the current GPR in all ways, long-term.

@anthonycanino
Copy link
Contributor Author

Thank you for the information @jkotas . I see what would need to be updated.

As part of my digging, I came across https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/amd64/Context.asm#L19-L46
which looks to be an asm stub for restoring register I assume on a context switch.

I am slightly confused by the naming as ClrRestoreNonvolatileContextWorker seems to imply non-volatile registers, but some of the registers being restore look to be volatile https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#callercallee-saved-registers .

I ask because, if the EGPRs would also need to be restored in these kinds of stubs, we may need an windows assembler with APX support.

@jkotas
Copy link
Member

jkotas commented Feb 19, 2025

I am slightly confused by the naming as ClrRestoreNonvolatileContextWorker seems to imply non-volatile registers, but some of the registers being restore look to be volatile

This method is used to jump to catch funclets among other things. The catch funclets take arguments in rcx and rdx (https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#the-pspsym-and-funclet-parameters). That's why this method restores these two extra volatile registers.

I agree that the naming is confusing. Could you please add some comments to the code for next person wondering about this?

@anthonycanino
Copy link
Contributor Author

I agree that the naming is confusing. Could you please add some comments to the code for next person wondering about this?

Done: #112758.

I have been looking to see if there are any stubs or code that would need to be updated regarding pinvoke: it's possible that an invoked function could trash the extended registers (which are volatile). Is it up to the invoking function of a pinvoke call to handle this as part of the codegen?

@jkotas
Copy link
Member

jkotas commented Feb 20, 2025

Is it up to the invoking function of a pinvoke call to handle this as part of the codegen?

Right, nothing special here. Same rules as method calls in regular C/C++.

@anthonycanino
Copy link
Contributor Author

Thanks @jkotas . This was very helpful.

@BruceForstall going to suggest to keep the issue open as it will track these changes we have to make for APX (I can see its already listed as a work item on our main APX issue).

@JulieLeeMSFT
Copy link
Member

@anthonycanino, please feel free to reassign.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

4 participants