Isolating JITDbl2Ulng helper changes #86175

Closed

@khushal1996 khushal1996 commented May 12, 2023

Draft PR for testing purposes. No need for review at this time.

This PR isolates the helper function changes from draft PR #84384 so that the helper can be verified on its own.

This PR optimizes the following cases:


| Case | Previous Code | Optimized Instruction |
| --- | --- | --- |
| float -> ulong | CORINFO_HELP_DBL2ULNG helper | vcvttss2usi |

public static UInt64 FloatToULong(float val)
{
    return (UInt64)val;
}

Assembly before optimization

G_M22196_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
       C5F877               vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25
G_M22196_IG02:              ;; offset=0007H
       62F17E085AC0         vcvtss2sd xmm0, xmm0
       E87E57815E           call     CORINFO_HELP_DBL2ULNG
       90                   nop      
						;; size=12 bbWeight=1 PerfScore 5.25
G_M22196_IG03:              ;; offset=0013H
       4883C428             add      rsp, 40
       C3                   ret      
						;; size=5 bbWeight=1 PerfScore 1.25

Assembly after optimization

G_M22196_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M22196_IG02:              ;; offset=0003H
       62F1FE0878C0         vcvttss2usi rax, xmm0
						;; size=6 bbWeight=1 PerfScore 6.00
G_M22196_IG03:              ;; offset=0009H
       C3                   ret

| Case | Previous Code | Optimized Instruction |
| --- | --- | --- |
| double -> ulong | CORINFO_HELP_DBL2ULNG helper | vcvttsd2usi |

public static UInt64 DoubleToULong(double val)
{
    return (UInt64)val;
}

Assembly before optimization

G_M30068_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
       C5F877               vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25
G_M30068_IG02:              ;; offset=0007H
       E874577F5E           call     CORINFO_HELP_DBL2ULNG
       90                   nop      
						;; size=6 bbWeight=1 PerfScore 1.25
G_M30068_IG03:              ;; offset=000DH
       4883C428             add      rsp, 40
       C3                   ret

Assembly after optimization

G_M30068_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M30068_IG02:              ;; offset=0003H
       62F1FF0878C0         vcvttsd2usi rax, xmm0
						;; size=6 bbWeight=1 PerfScore 5.00
G_M30068_IG03:              ;; offset=0009H
       C3                   ret

| Case | Previous Code | Optimized Instruction |
| --- | --- | --- |
| ulong -> double | vcvtsi2sd | vcvtusi2sd |

public static double UIntToDouble(UInt64 val)
{
    return (double)val;
}

Assembly before optimization

G_M33997_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M33997_IG02:              ;; offset=0003H
       62F17C0857C0         vxorps   xmm0, xmm0
       62F1FF082AC1         vcvtsi2sd  xmm0, rcx
       4885C9               test     rcx, rcx
       7D0A                 jge      SHORT G_M33997_IG03
       62F1FF08580502000000 vaddsd   xmm0, qword ptr [reloc @RWD00]
						;; size=27 bbWeight=1 PerfScore 12.58
G_M33997_IG03:              ;; offset=001EH
       C3                   ret

Assembly after optimization

G_M33997_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M33997_IG02:              ;; offset=0003H
       62F1FF087BC1         vcvtusi2sd xmm0, rcx
						;; size=6 bbWeight=1 PerfScore 4.00
G_M33997_IG03:              ;; offset=0009H
       C3                   ret
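
The "before" listing for ulong -> double converts the bits as a signed value and then patches up inputs whose top bit is set. A minimal C# sketch of that two-step sequence is shown here for illustration. The class and method names are hypothetical, and the value 2^64 for the constant at `[reloc @RWD00]` is an assumption, since the listing does not show it.

```csharp
using System;

// Hypothetical illustration of the two-step sequence; this is not the JIT's own code.
static class UnsignedToDouble
{
    // 2^64 as a double. Assumed to be the constant that vaddsd loads from [reloc @RWD00].
    private const double TwoPow64 = 18446744073709551616.0;

    public static double ViaSignedConvert(ulong val)
    {
        // Step 1: reinterpret the 64 bits as signed and convert, as vcvtsi2sd does.
        long asSigned = unchecked((long)val);
        double result = (double)asSigned;

        // Step 2: inputs with the top bit set were seen as negative, so the result is
        // 2^64 too small. Add it back (the test / jge / vaddsd tail of the listing).
        if (asSigned < 0)
        {
            result += TwoPow64;
        }

        return result;
    }
}
```

For example, `UnsignedToDouble.ViaSignedConvert(0x8000000000000000UL)` first yields -2^63 and then, after the fix-up, 2^63. With `vcvtusi2sd` the conversion is a single instruction with no branch or fix-up.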

…vtsd2usi uses ulong.max_value to show FPE for negative, NaN and ulong_max + 1 values.
…architecture. This is because we have changed the JITDbl2Ulng helper function to mimic the new IEEE compliant AVX512 instruction vcvtsd2usi. In the process, we needed to update the library test case because the default Floating Point Error (FPE) value for the new instruction is different from the default MSVC FPE value i.e. 0.
…not changing the library test case but the API to make sure NaN cases are handled.
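
The commit notes above say the helper should mimic the AVX512 instruction by returning `ulong.MaxValue` for inputs that cannot be converted. A rough C# sketch of that behavior might look like the following. The class and method names are hypothetical and this is not the runtime's helper. The notes say only "negative"; treating values strictly between -1 and 0 as truncating to 0 is an assumption based on truncate-toward-zero semantics.

```csharp
using System;

// Hypothetical stand-in for the runtime helper; name and shape are illustrative only.
static class SaturatingConvert
{
    // 2^64 as a double: the smallest value that is too large for ulong.
    private const double TwoPow64 = 18446744073709551616.0;

    public static ulong DoubleToULong(double val)
    {
        // NaN, values at or below -1, and values at or above 2^64 have no ulong
        // representation after truncation, so return the all-ones sentinel.
        if (double.IsNaN(val) || val <= -1.0 || val >= TwoPow64)
        {
            return ulong.MaxValue;
        }

        // Assumption: values strictly between -1 and 0 truncate toward zero.
        if (val < 0.0)
        {
            return 0UL;
        }

        // In-range, non-negative input: the cast truncates toward zero.
        return (ulong)val;
    }
}
```

Under these assumptions, `SaturatingConvert.DoubleToULong(double.NaN)` and `SaturatingConvert.DoubleToULong(-1.0)` both return `ulong.MaxValue`, which is why a test that expected 0 for such inputs would need attention.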
@dotnet-issue-labeler bot added the area-CodeGen-coreclr label May 12, 2023
@ghost added the community-contribution label May 12, 2023
ghost commented May 12, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


… a special handling for vcvttss2usi64 to make sure we read only dword instead of qword for float to ulong conversion
@ghost ghost locked as resolved and limited conversation to collaborators Jun 18, 2023