Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Devirtualization and inlining for GVM #112353

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

hez2010
Copy link
Contributor

@hez2010 hez2010 commented Feb 10, 2025

We use FindOrCreateAssociatedMethodDesc with allowInstParam=false to get the exact MethodDesc for a devirted GVM.

While the resulted MethodDesc may be an instantiation stub, so either

  • an InstParam that can be fetched from WrappedMethodDesc if it doesn't require a runtime lookup
  • a RUNTIMELOOKUP node we created for ldvirtftn if it requires a runtime lookup (in this case FindOrCreateAssociatedMethodDesc will yield us a method handle that has shared generics in the method instantiation).

So if we see a RUNTIMELOOKUP, we push it to the call args as the InstParam, otherwise we can use the instantiated entry of the WrappedMethodDesc as the InstParam.

Also extend the late devirt a bit to support devirting generic virtual methods.

To make sure we can get the method handle or RUNTIMELOOKUP, we need to stop spilling ldvirtftn. However, NAOT and lowering is not happy about gtCallAddr being a CALL, so we split the call after inlining is done.

NativeAOT/R2R is not support yet (due to fat function pointers).

Example:

VirtualGenericClass c = new IntProcessor();
c.Process(42);

public class VirtualGenericClass
{
    public virtual void Process<T>(T item)
    {
        Console.WriteLine(item.ToString());
    }
}

public class IntProcessor : VirtualGenericClass
{
    public override void Process<T>(T item)
    {
        Console.WriteLine(item.ToString());
    }
}

Before:

G_M27646_IG01:  ;; offset=0x0000
       push     rbx
       sub      rsp, 32
                                                ;; size=5 bbWeight=1 PerfScore 1.25
G_M27646_IG02:  ;; offset=0x0005
       mov      rcx, 0x7FF7C5D68548      ; IntProcessor
       call     CORINFO_HELP_NEWSFAST
       mov      rbx, rax
       mov      rcx, rbx
       mov      rdx, 0x7FF7C5D68350      ; VirtualGenericClass
       mov      r8, 0x7FF7C5D686B8      ; token handle
       call     CORINFO_HELP_VIRTUAL_FUNC_PTR
       mov      rcx, rbx
       mov      edx, 42
       call     rax
       nop
                                                ;; size=57 bbWeight=1 PerfScore 7.00
G_M27646_IG03:  ;; offset=0x003E
       add      rsp, 32
       pop      rbx
       ret
                                                ;; size=6 bbWeight=1 PerfScore 1.75

After:

G_M27646_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M27646_IG02:  ;; offset=0x0004
       mov      ecx, 42
       call     [System.Number:Int32ToDecStr(int):System.String]
       mov      rcx, rax
       call     [System.Console:WriteLine(System.String)]
       nop
                                                ;; size=21 bbWeight=1 PerfScore 6.75
G_M27646_IG03:  ;; offset=0x0019
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

/cc: @jakobbotsch @AndyAyersMS

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 10, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 10, 2025
@hez2010
Copy link
Contributor Author

hez2010 commented Feb 10, 2025

@MihuBot

@EgorBo
Copy link
Member

EgorBo commented Feb 10, 2025

Looks like CI failures are related? also, the diffs look more like inliner actually no longer inlines what it used to?

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 10, 2025

Looks like CI failures are related?

They should be fixed with 4b809c8, where we must not allow a method InstParam present on the devirted call when the method itself is generic.

the diffs look more like inliner actually no longer inlines what it used to

Not sure about this for now, may need more investigation. Let's see the latest run.

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 10, 2025

Seems that the VM can dispatch a generic method if there's no method InstParam, so that this would enable us devirt while not inlining a GVM.

@hez2010 hez2010 changed the title JIT: Initial devirtualization and inlining for GVM JIT: Devirtualization and inlining for GVM Feb 10, 2025
@hez2010
Copy link
Contributor Author

hez2010 commented Feb 10, 2025

Devirtualizing GVM without inlining is supported now.
@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 10, 2025

@MihuBot

@EgorBo
Copy link
Member

EgorBo commented Feb 11, 2025

No spmi diffs and only 1 method affected in jit-diffs. Can't tell it's a big improvement for that method, but it definitely implies a poor test coverage for this optimization. Since changes around method resolving are always risky - do we want to proceed? Is there a real-world motivation behind this?

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 11, 2025

No spmi diffs and only 1 method affected in jit-diffs.

Isn't it because the large missing contexts?

@EgorBo
Copy link
Member

EgorBo commented Feb 11, 2025

No spmi diffs and only 1 method affected in jit-diffs.

Isn't it because the large missing contexts?

Fair point. Although, missing contexts alone don't tell anything about the diffs. So far, we only have a single evidence. Presumably you can run a local collection of a SPMI collection where you do all the calls to create contexts for them

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 11, 2025

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 11, 2025

I need to revise the implementation, turning to draft for now.

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 13, 2025

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 13, 2025

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 14, 2025

I think it's in a great form now, while seems that it still has some bad interactions with boxed structs during inlining, need to investigate...

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 14, 2025

Seems that lowering on windows-x86 is unhappy about not spilling the CORINFO_HELP_VIRTUAL_FUNC_PTR call:

Assert failure(PID 61084 [0x0000ee9c], Thread: 40756 [0x9f34]): Assertion failed 'node->GetRegNum() != REG_NA' in 'GenInstance`2[System.__Canon,int]:VirtForward[System.__Canon,System.__Canon](System.__Canon,int,System.__Canon,System.__Canon):System.String:this' during 'Lowering nodeinfo' (IL size 23; hash 0xb3dc36bf; FullOpts)

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 15, 2025

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 19, 2025

@jakobbotsch can you trigger a pri1/outerloop and pgostress run on this PR? I would like to verify my prototype doesn't break anything so that I can move forward with a polished implementation.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 20, 2025

coreclr outerloop failures are unrelated (as there's no GVM at all).
libraries-jitstress failures are related, need further investigation.

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 20, 2025

Minimal repro:

ParallelQuery<int> query = Array.Empty<int>().AsParallel();
var cast1 = query.Cast<string>();
var cast2 = cast1.Cast<int>();
Use(cast2);

[MethodImpl(MethodImplOptions.NoInlining)]
static void Use<T>(T obj) { }

It will throw

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Linq.ParallelQuery`1.Cast[TCastTo]() in D:\runtime\src\libraries\System.Linq.Parallel\src\System\Linq\Parallel\Enumerables\ParallelQuery.cs:line 104
   at Program.Main() in D:\source\repos\Test\GvmDevirt\Program.cs:line ***

The tree after inlining is

------------ BB01 [0000] [000..000) -> BB03(1) (always), preds={} succs={BB03}

***** BB01 [0000]
STMT00008 ( 0x000[E-] ... ??? )
               [000031] DAC-G------                         *  STORE_LCL_VAR ref    V01 tmp1
               [000002] --C-G------                         \--*  CALL      ref    System.Linq.ParallelEnumerable:AsParallel[int](System.Collections.Generic.IEnumerable`1[int]):System.Linq.ParallelQuery`1[int]
               [000017] --CXG------ arg0                       \--*  COMMA     ref
               [000016] H-CXG------                               +--*  CALL help byref  CORINFO_HELP_GET_GCSTATIC_BASE
               [000015] H---------- arg0                          |  \--*  CNS_INT(h) long   0x7ff830c4f868 class System.Array+EmptyArray`1[int]
               [000013] I---G------                               \--*  IND       ref
               [000012] H----------                                  \--*  CNS_INT(h) long   0x1d83f800df0 static Fseq[Value]

------------ BB03 [0003] [000..001) -> BB05(0.5),BB04(0.5) (cond), preds={BB01} succs={BB04,BB05}

***** BB03 [0003]
STMT00009 ( INL03 @ 0x000[E-] ... ??? ) <- INL02 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000035] -----------                         *  JTRUE     void
               [000034] -----------                         \--*  NE        int
               [000020] -----------                            +--*  LCL_VAR   ref    V01 tmp1
               [000033] -----------                            \--*  CNS_INT   ref    null

------------ BB04 [0004] [000..001) -> BB05(1) (always), preds={BB03} succs={BB05}

***** BB04 [0004]
STMT00010 ( INL03 @ 0x003[E-] ... ??? ) <- INL02 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000037] --C-G------                         *  CALL      void   System.ArgumentNullException:Throw(System.String)
               [000036] ----------- arg0                    \--*  CNS_STR   ref   <string constant>

------------ BB05 [0005] [000..001) -> BB02(1) (always), preds={BB03,BB04} succs={BB02}

------------ BB02 [0006] [000..000) -> BB07(1) (always), preds={BB05} succs={BB07}

***** BB02 [0006]
STMT00013 ( 0x000[E-] ... ??? )
               [000051] DACXG------                         *  STORE_LCL_VAR ref    V02 tmp2
               [000024] --CXG------                         \--*  CALL nullcheck ref    System.Linq.ParallelQuery`1[int]:Cast[System.__Canon]():System.Linq.ParallelQuery`1[System.__Canon]:this
               [000025] ----------- this                       +--*  LCL_VAR   ref    V01 tmp1
               [000029] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff830c4fdb0 method System.Linq.ParallelQuery`1[int]:Cast[System.String]():System.Linq.ParallelQuery`1[System.String]:this

------------ BB07 [0008] [000..001) -> BB09(0.5),BB08(0.5) (cond), preds={BB02} succs={BB08,BB09}

***** BB07 [0008]
STMT00014 ( INL05 @ 0x000[E-] ... ??? ) <- INL04 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000055] -----------                         *  JTRUE     void
               [000054] -----------                         \--*  NE        int
               [000040] -----------                            +--*  LCL_VAR   ref    V02 tmp2
               [000053] -----------                            \--*  CNS_INT   ref    null

------------ BB08 [0009] [000..001) -> BB09(1) (always), preds={BB07} succs={BB09}

***** BB08 [0009]
STMT00015 ( INL05 @ 0x003[E-] ... ??? ) <- INL04 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000057] --C-G------                         *  CALL      void   System.ArgumentNullException:Throw(System.String)
               [000056] ----------- arg0                    \--*  CNS_STR   ref   <string constant>

------------ BB09 [0010] [000..001) -> BB06(1) (always), preds={BB07,BB08} succs={BB06}

------------ BB06 [0011] [000..01A) (return), preds={BB09} succs={}

***** BB06 [0011]
STMT00004 ( 0x000[E-] ... ??? )
               [000009] --C-G------                         *  CALL      void   Program:Use[System.__Canon](System.__Canon)
               [000010] H---------- gctx                    +--*  CNS_INT(h) long   0x7ff830c4f6e0 method Program:Use[System.Linq.ParallelQuery`1[int]](System.Linq.ParallelQuery`1[int])
               [000044] --CXG------ arg1                    \--*  CALL nullcheck ref    System.Linq.ParallelQuery`1[System.__Canon]:Cast[int]():System.Linq.ParallelQuery`1[int]:this
               [000045] ----------- this                       +--*  LCL_VAR   ref    V02 tmp2
               [000049] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff830c80748 method System.Linq.ParallelQuery`1[System.__Canon]:Cast[int]():System.Linq.ParallelQuery`1[int]:this

***** BB06 [0011]
STMT00005 ( 0x019[E-] ... ??? )
               [000011] -----------                         *  RETURN    void

I don't see any problem with it.

If I insert a no-op between cast1 and cast2 like Use(42), the issue will disappear:

ParallelQuery<int> query = Array.Empty<int>().AsParallel();
var cast1 = query.Cast<string>();
Use(42);
var cast2 = cast1.Cast<int>();
Use(cast2);

The tree this time after inlining is:

------------ BB01 [0000] [000..000) -> BB03(1) (always), preds={} succs={BB03}

***** BB01 [0000]
STMT00010 ( 0x000[E-] ... ??? )
               [000035] DAC-G------                         *  STORE_LCL_VAR ref    V02 tmp2
               [000002] --C-G------                         \--*  CALL      ref    System.Linq.ParallelEnumerable:AsParallel[int](System.Collections.Generic.IEnumerable`1[int]):System.Linq.ParallelQuery`1[int]
               [000021] --CXG------ arg0                       \--*  COMMA     ref
               [000020] H-CXG------                               +--*  CALL help byref  CORINFO_HELP_GET_GCSTATIC_BASE
               [000019] H---------- arg0                          |  \--*  CNS_INT(h) long   0x7ff827edf958 class System.Array+EmptyArray`1[int]
               [000017] I---G------                               \--*  IND       ref
               [000016] H----------                                  \--*  CNS_INT(h) long   0x2d007400df0 static Fseq[Value]

------------ BB03 [0003] [000..001) -> BB05(0.5),BB04(0.5) (cond), preds={BB01} succs={BB04,BB05}

***** BB03 [0003]
STMT00011 ( INL03 @ 0x000[E-] ... ??? ) <- INL02 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000039] -----------                         *  JTRUE     void
               [000038] -----------                         \--*  NE        int
               [000024] -----------                            +--*  LCL_VAR   ref    V02 tmp2
               [000037] -----------                            \--*  CNS_INT   ref    null

------------ BB04 [0004] [000..001) -> BB05(1) (always), preds={BB03} succs={BB05}

***** BB04 [0004]
STMT00012 ( INL03 @ 0x003[E-] ... ??? ) <- INL02 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000041] --C-G------                         *  CALL      void   System.ArgumentNullException:Throw(System.String)
               [000040] ----------- arg0                    \--*  CNS_STR   ref   <string constant>

------------ BB05 [0005] [000..001) -> BB02(1) (always), preds={BB03,BB04} succs={BB02}

------------ BB02 [0006] [000..016) -> BB07(1) (always), preds={BB05} succs={BB07}

***** BB02 [0006]
STMT00004 ( 0x000[E-] ... ??? )
               [000009] DAC--------                         *  STORE_LCL_VAR ref    V01 tmp1
               [000028] --CXG------                         \--*  CALL nullcheck ref    System.Linq.ParallelQuery`1[int]:Cast[System.__Canon]():System.Linq.ParallelQuery`1[System.__Canon]:this
               [000029] ----------- this                       +--*  LCL_VAR   ref    V02 tmp2
               [000033] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff827edfea0 method System.Linq.ParallelQuery`1[int]:Cast[System.String]():System.Linq.ParallelQuery`1[System.String]:this

***** BB02 [0006]
STMT00003 ( 0x000[E-] ... ??? )
               [000008] --C-G------                         *  CALL      void   Program:Use[int](int)
               [000007] ----------- arg0                    \--*  CNS_INT   int    42

------------ BB07 [0008] [016..017) -> BB09(0.5),BB08(0.5) (cond), preds={BB02} succs={BB08,BB09}

***** BB07 [0008]
STMT00015 ( INL05 @ 0x000[E-] ... ??? ) <- INL04 @ 0x000[E-] <- INLRT @ 0x016[--]
               [000057] -----------                         *  JTRUE     void
               [000056] -----------                         \--*  NE        int
               [000010] -----------                            +--*  LCL_VAR   ref    V01 tmp1
               [000055] -----------                            \--*  CNS_INT   ref    null

------------ BB08 [0009] [016..017) -> BB09(1) (always), preds={BB07} succs={BB09}

***** BB08 [0009]
STMT00016 ( INL05 @ 0x003[E-] ... ??? ) <- INL04 @ 0x000[E-] <- INLRT @ 0x016[--]
               [000059] --C-G------                         *  CALL      void   System.ArgumentNullException:Throw(System.String)
               [000058] ----------- arg0                    \--*  CNS_STR   ref   <string constant>

------------ BB09 [0010] [016..017) -> BB06(1) (always), preds={BB07,BB08} succs={BB06}

------------ BB06 [0011] [016..021) (return), preds={BB09} succs={}

***** BB06 [0011]
STMT00006 ( 0x016[--] ... ??? )
               [000013] --C-G------                         *  CALL      void   Program:Use[System.__Canon](System.__Canon)
               [000014] H---------- gctx                    +--*  CNS_INT(h) long   0x7ff827edf7d0 method Program:Use[System.Linq.ParallelQuery`1[int]](System.Linq.ParallelQuery`1[int])
               [000047] --CXG------ arg1                    \--*  CALL nullcheck ref    System.Linq.ParallelQuery`1[System.__Canon]:Cast[int]():System.Linq.ParallelQuery`1[int]:this
               [000048] ----------- this                       +--*  LCL_VAR   ref    V01 tmp1
               [000052] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff827f10898 method System.Linq.ParallelQuery`1[System.String]:Cast[int]():System.Linq.ParallelQuery`1[int]:this

***** BB06 [0011]
STMT00007 ( 0x020[E-] ... ??? )
               [000015] -----------                         *  RETURN    void

I don't see there's any subtle difference between the two trees, so I'm puzzled.
Can this be exposing an preexisting JIT bug? @jakobbotsch

@jakobbotsch
Copy link
Member

I don't see there's any subtle difference between the two trees, so I'm puzzled.

The generic context for the second Cast call looks wrong in the first case.

-               [000049] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff830c80748 method System.Linq.ParallelQuery`1[System.__Canon]:Cast[int]():System.Linq.ParallelQuery`1[int]:this
+               [000052] H---------- gctx                       \--*  CNS_INT(h) long   0x7ff827f10898 method System.Linq.ParallelQuery`1[System.String]:Cast[int]():System.Linq.ParallelQuery`1[int]:this

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 20, 2025

The generic context for the second Cast call looks wrong in the first case.

Yeah I found it out right after you post the observation :)

The codegen diff is https://www.diffchecker.com/6hQxtIfb/

Especially,

-       mov      rcx, rbx
-       mov      rdx, 0x7FF830C50898      ; System.Linq.ParallelQuery`1[System.String]:Cast[int]():System.Linq.ParallelQuery`1[int]:this
+       mov      rcx, rax
+       mov      rdx, 0x7FF830C60748      ; System.Linq.ParallelQuery`1[System.__Canon]:Cast[int]():System.Linq.ParallelQuery`1[int]:this

It seems that we somehow lost the concrete type context in some cases, the generic context being passed to impDevirtualizeCall may be incorrect. Now figuring out a fix for it.

@hez2010
Copy link
Contributor Author

hez2010 commented Feb 20, 2025

Doesn't seem like a jit issue, instead it should be handled in VM, where we need to fetch type instantiations for the base method based on generic context. This issue can happen when the derived method table is not generic while the base method table is a shared generics.

@hez2010
Copy link
Contributor Author

hez2010 commented Mar 3, 2025

@jakobbotsch The failing tests are the ones I previously mentioned on Discord. All of them are caused by stopping spilling ldvirtftn, and can be resolved by splitting GVM calls that fail to devirtualize, rather than keeping the helper call in gtCallAddr.

  1. NativeAOT failure
    Changing CHECK_SPILL_NONE to CHECK_SPILL_ALL around // TODO-Bug: CHECK_SPILL_NONE here looks wrong. doesn't help.
Unhandled exception. System.AggregateException: An error occurred while writing to logger(s). (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|14_0[TState](LogLevel, EventId, ILogger, Exception, Func`3, List`1&, TState&) + 0xa5
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1) + 0x32
   at Microsoft.Extensions.Logging.Logger.Log[TState](LogLevel, EventId, TState, Exception, Func`3) + 0xb4
   at Microsoft.Extensions.Logging.LoggerExtensions.Log(ILogger, LogLevel, EventId, Exception, String, Object[]) + 0xc8
   at Program.<Main>d__0.MoveNext() + 0x1b0
--- End of stack trace from previous location ---
   at Program.<Main>() + 0x28
  1. Lower assertion Assertion failed 'node->GetRegNum() != REG_NA' while lowering tail call with jit helper.
Assert failure(PID 10808 [0x00002a38], Thread: 12248 [0x2fd8]): Assertion failed 'node->GetRegNum() != REG_NA' in 'Runtime_87393.Bar:M2[int](int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int' during 'Lowering nodeinfo' (IL size 41; hash 0x12b37abb; FullOpts)

  File: D:\a\_work\1\s\src\coreclr\jit\lower.cpp:8702
  Image: C:\h\w\BA6909B4\p\corerun.exe

Otherwise this PR should be ready (and jitstress should pass as well).

Stop spilling ldvirtftn also introduces a lot of regressions especially on tier-0, should we always spill it when optimization is disabled? Or should we simply split all GVM calls that are failed to devirtualize in a separate phase so that we can avoid the regression and the lower issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants