Evaluate RyuJIT logic for generating return blocks #8406

Open
erozenfeld opened this issue Jun 26, 2017 · 2 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization Priority:3 Work that is nice to have tenet-performance Performance related issue

@erozenfeld
Member

erozenfeld commented Jun 26, 2017

RyuJIT may generate multiple return blocks unnecessarily. The current logic allows a method to have up to 4 separate return blocks unless a profiler hook is needed, the method is marked synchronized, the method calls unmanaged code, or REVERSE_PINVOKE_{ENTER,EXIT} helper calls need to be inserted in the prolog/epilog.
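
The rule described above can be modelled as a small decision function. This is only a sketch for readers: `ReturnBlockPolicy`, `MaxReturnBlocks`, and all parameter names are hypothetical and do not come from the RyuJIT sources, and it assumes that each listed condition forces a single merged return block, which the description implies but does not state outright.

```csharp
using System;

// Hypothetical model of the policy described in this issue.
// None of these names come from the RyuJIT sources.
static class ReturnBlockPolicy
{
    // Limit quoted in the issue text.
    public const int MaxSeparateReturnBlocks = 4;

    public static int MaxReturnBlocks(
        bool needsProfilerHook,
        bool isSynchronized,
        bool callsUnmanagedCode,
        bool needsReversePInvokeHelpers)
    {
        // Assumption: any one of these conditions forces a single merged return block.
        bool mustMerge = needsProfilerHook
            || isSynchronized
            || callsUnmanagedCode
            || needsReversePInvokeHelpers;

        return mustMerge ? 1 : MaxSeparateReturnBlocks;
    }

    static void Main()
    {
        Console.WriteLine(MaxReturnBlocks(false, false, false, false)); // prints 4
        Console.WriteLine(MaxReturnBlocks(false, true, false, false));  // prints 1
    }
}
```

Under this model, the method in the example below hits none of the listed conditions, so it is allowed to keep both of its return blocks.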

The legacy 64-bit JIT tried to generate a single return path whenever possible. Here is an example where that would make sense:

using System;
using System.Diagnostics;

namespace getChars
{
    abstract class Encoding
    {
        abstract public unsafe int GetBytes(char* charsStart, int charCount, byte* bytes, int byteCount);
    }

    class MyEncoding : Encoding
    {
        public override unsafe int GetBytes(char* charsStart, int charCount, byte* bytes, int byteCount)
        {
            return charCount;
        }
    }

    class MyXMLNodeWriter
    {
        private Encoding encoder;
        public MyXMLNodeWriter()
        {
            encoder = new MyEncoding();
        }

        public unsafe int UnsafeGetUTF8Chars(char *chars, int charCount, byte[] buffer, int offset)
        {
            if (charCount > 0)
            {
                fixed(byte* _bytes = &buffer[offset])
                {
                    byte* bytes = _bytes;
                    byte* bytesMax = &bytes[buffer.Length - offset];
                    char* charsMax = &chars[charCount];

                    while (true)
                    {
                        while (chars < charsMax && *chars < 0x80)
                        {
                            *bytes = (byte)*chars;
                            bytes++;
                            chars++;
                        }

                        if (chars >= charsMax)
                            break;

                        char* charsStart = chars;
                        while (chars < charsMax && *chars >= 0x80)
                        {
                            chars++;
                        }

                        bytes += encoder.GetBytes(charsStart, (int)(chars - charsStart), bytes, (int)(bytesMax - bytes));

                        if (chars >= charsMax)
                            break;
                    }

                    return (int)(bytes - _bytes);
                }
            }

            return 0;
        }
    }
    class Program
    {
        static unsafe void Main(string[] args)
        {
            MyXMLNodeWriter nw = new MyXMLNodeWriter();
            const int charCount = 800;
            char* chars = stackalloc char[charCount];
            const int seed = 101;
            Random rand = new Random(seed);
            for (int i = 0; i < charCount; ++i)
            {
                chars[i] = (char)(rand.Next() % 256);
            }

            byte[] buffer = new byte[charCount];
            int result = nw.UnsafeGetUTF8Chars(chars, charCount, buffer, 0);
            Console.WriteLine(result);

            Stopwatch watch = Stopwatch.StartNew();
            double total = 0;
            for (int k = 1; k < 1000; ++k)
            {
                for (int i = 0; i < charCount; ++i)
                {
                    chars[i] = (char)(rand.Next() % 0x80);
                }
                watch.Restart();
                result = nw.UnsafeGetUTF8Chars(chars, charCount, buffer, 0);
                total += watch.Elapsed.TotalMilliseconds;
            }
            Console.WriteLine("{0}", total);

            return;
        }
    }
}

Here is the code generated for getChars.MyXMLNodeWriter:UnsafeGetUTF8Chars:

; Assembly listing for method getChars.MyXMLNodeWriter:UnsafeGetUTF8Chars(long,int,ref,int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T06] (  3,  6   )     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T00] ( 16,136.50)    long  ->  rsi
;  V02 arg2         [V02,T08] (  4,  3.50)     int  ->   r8
;  V03 arg3         [V03,T07] (  6,  4   )     ref  ->   r9         class-hnd
;  V04 arg4         [V04,T10] (  6,  3   )     int  ->  rcx
;  V05 loc0         [V05    ] (  4,  2   )   byref  ->  [rsp+0x28]   must-init pinned
;  V06 loc1         [V06,T02] ( 11, 24   )    long  ->  rbx
;  V07 loc2         [V07,T09] (  3,  5   )    long  ->  rbp
;  V08 loc3         [V08,T01] (  5, 72.50)    long  ->  r14
;  V09 loc4         [V09,T05] (  3, 12   )    long  ->  rdx
;  V10 tmp0         [V10,T12] (  2,  2   )    long  ->  rbx
;  V11 tmp1         [V11,T13] (  2,  2   )    long  ->  rax
;  V12 OutArgs      [V12    ] (  1,  1   )  lclBlk (40) [rsp+0x00]
;  V13 cse0         [V13,T11] (  6,  3   )     int  ->  rdx
;  V14 rat0         [V14,T04] (  3, 24   )    long  ->   r8
;  V15 rat1         [V15,T03] (  3, 24   )     ref  ->  rcx
;
; Lcl frame size = 48

G_M133_IG01:
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC30             sub      rsp, 48
       33C0                 xor      rax, rax
       4889442428           mov      qword ptr [rsp+28H], rax
       488BF9               mov      rdi, rcx
       488BF2               mov      rsi, rdx
       8B8C2480000000       mov      ecx, dword ptr [rsp+80H]

G_M133_IG02:
       4585C0               test     r8d, r8d
       0F8EB8000000         jle      G_M133_IG11
       418B5108             mov      edx, dword ptr [r9+8]
       3BCA                 cmp      ecx, edx
       0F83B9000000         jae      G_M133_IG13
       4863C1               movsxd   rax, ecx
       4D8D4C0110           lea      r9, bword ptr [r9+rax+16]
       4C894C2428           mov      bword ptr [rsp+28H], r9
       488B5C2428           mov      rbx, bword ptr [rsp+28H]
       2BD1                 sub      edx, ecx
       4863CA               movsxd   rcx, edx
       488D2C19             lea      rbp, [rcx+rbx]
       4D63C0               movsxd   r8, r8d
       4E8D3446             lea      r14, [rsi+2*r8]
       EB0F                 jmp      SHORT G_M133_IG04

G_M133_IG03:
       440FB606             movzx    r8, byte  ptr [rsi]
       448803               mov      byte  ptr [rbx], r8b
       48FFC3               inc      rbx
       4883C602             add      rsi, 2

G_M133_IG04:
       493BF6               cmp      rsi, r14
       7307                 jae      SHORT G_M133_IG05
       66813E8000           cmp      word  ptr [rsi], 128
       72E6                 jb       SHORT G_M133_IG03

G_M133_IG05:
       493BF6               cmp      rsi, r14
       7352                 jae      SHORT G_M133_IG09
       488BD6               mov      rdx, rsi
       EB04                 jmp      SHORT G_M133_IG07

G_M133_IG06:
       4883C602             add      rsi, 2

G_M133_IG07:
       493BF6               cmp      rsi, r14
       7307                 jae      SHORT G_M133_IG08
       66813E8000           cmp      word  ptr [rsi], 128
       73F0                 jae      SHORT G_M133_IG06

G_M133_IG08:
       4C8BC5               mov      r8, rbp
       4C2BC3               sub      r8, rbx
       4489442420           mov      dword ptr [rsp+20H], r8d
       4C8BC6               mov      r8, rsi
       4C2BC2               sub      r8, rdx
       498BC8               mov      rcx, r8
       48C1E93F             shr      rcx, 63
       4903C8               add      rcx, r8
       48D1F9               sar      rcx, 1
       4C8BC1               mov      r8, rcx
       488B4F08             mov      rcx, gword ptr [rdi+8]
       4C8BCB               mov      r9, rbx
       488B01               mov      rax, qword ptr [rcx]
       488B4048             mov      rax, qword ptr [rax+72]
       FF5020               call     qword ptr [rax+32]getChars.Encoding:GetBytes(long,int,long,int):int:this
       4863C0               movsxd   rax, eax
       4803D8               add      rbx, rax
       493BF6               cmp      rsi, r14
       729D                 jb       SHORT G_M133_IG04

G_M133_IG09:
       488B442428           mov      rax, bword ptr [rsp+28H]
       482BD8               sub      rbx, rax
       488BC3               mov      rax, rbx

G_M133_IG10:
       4883C430             add      rsp, 48
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret

G_M133_IG11:
       33C0                 xor      eax, eax

G_M133_IG12:
       4883C430             add      rsp, 48
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret

G_M133_IG13:
       E82069375F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

; Total bytes of code 241, prolog size 30 for method getChars.MyXMLNodeWriter:UnsafeGetUTF8Chars(long,int,ref,int):int:this
; ============================================================

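The two return sequences (G_M133_IG10 and G_M133_IG12) are identical, so there is no need for this code bloat: G_M133_IG11 could zero eax and jump to G_M133_IG10 instead of carrying its own copy of the epilog.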

category:cq
theme:block-layout
skill-level:intermediate
cost:medium
impact:small

@briansull
Contributor

Whether it's a win or a loss depends on the size of the epilog region.
On x86 the epilog regions tend to be very small; epilogs of 1-4 bytes are very common on x86.
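
To make that trade-off concrete, here is a rough size model. It assumes that sharing an epilog replaces each duplicated copy with one unconditional jump to the shared copy; `EpilogSharing` and `BytesSaved` are hypothetical names, and the model looks at code size only, not at the cost of the extra taken jump. The jump sizes are the standard x86/x64 encodings (`jmp rel8` is 2 bytes, `jmp rel32` is 5 bytes), and 11 is the byte count of each epilog in the listing in this issue.

```csharp
using System;

// Hypothetical size model; not taken from the JIT sources.
static class EpilogSharing
{
    public const int ShortJumpBytes = 2; // x86/x64 `jmp rel8`
    public const int NearJumpBytes = 5;  // x86/x64 `jmp rel32`

    // Bytes saved by replacing one duplicated epilog with a jump to a shared one.
    // A negative result means the method gets larger.
    public static int BytesSaved(int epilogBytes, int jumpBytes) => epilogBytes - jumpBytes;

    static void Main()
    {
        // The 11-byte x64 epilog from the listing in this issue.
        Console.WriteLine(BytesSaved(11, ShortJumpBytes)); // prints 9
        // A 1-byte x86 epilog (a bare `ret`).
        Console.WriteLine(BytesSaved(1, ShortJumpBytes));  // prints -1
        // A 4-byte x86 epilog when the target is out of short-jump range.
        Console.WriteLine(BytesSaved(4, NearJumpBytes));   // prints -1
    }
}
```

By this measure, merging pays off for the x64 method above but can be a net loss for the very small epilogs that are common on x86.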

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@kunalspathak kunalspathak added Priority:3 Work that is nice to have and removed JitUntriaged CLR JIT issues needing additional triage labels Dec 23, 2022
@kunalspathak
Member

This is very uncommon, and I don't think we know the perf impact of having multiple return blocks.

5 participants