
Int128 support on Windows [llvm bug] #3081

Closed · vtjnash opened this issue May 11, 2013 · 24 comments

Labels: bug (Indicates an unexpected problem or unintended behavior), system:windows (Affects only Windows), upstream (The issue is with an upstream dependency, e.g. LLVM)

@vtjnash (Member) commented May 11, 2013

64-bit Windows segfaults when attempting div, mod, or rem on 128-bit numbers (it tries to call the libgcc function __umodti3). This is because LLVM does not use the correct calling convention for 128-bit numbers on Windows.

@vtjnash (Member Author) commented May 11, 2013

Attached is C code demonstrating the issue. Observe that gcc (correctly) passes the argument to test on the stack, whereas clang (incorrectly) passes the argument split across the ecx and edx registers.

```c
// clang -S test.c -O2 -m64 -o test-clang.s
// x86_64-w64-mingw32-gcc -S test.c -O2 -m64 -o test-gcc.s

int __attribute__((noinline)) test(__int128 x) {
    return x + 19432;
}

int main() {
    return test(23521);
}
```

gcc-generated assembly (correct):

```asm
        .file   "test.c"
        .text
        .p2align 4,,15
        .globl  test
        .def    test;   .scl    2;      .type   32;     .endef
        .seh_proc       test
test:
        .seh_endprologue
        movq    (%rcx), %rax
        addl    $19432, %eax
        ret
        .seh_endproc
        .def    __main; .scl    2;      .type   32;     .endef
        .section        .text.startup,"x"
        .p2align 4,,15
        .globl  main
        .def    main;   .scl    2;      .type   32;     .endef
        .seh_proc       main
main:
        subq    $56, %rsp
        .seh_stackalloc 56
        .seh_endprologue
        call    __main
        leaq    32(%rsp), %rcx
        movq    $23521, 32(%rsp)
        movq    $0, 40(%rsp)
        call    test
        addq    $56, %rsp
        ret
        .seh_endproc
        .ident  "GCC: (GNU) 4.8.1 20130329 (prerelease)"
```

clang-generated assembly (incorrect):

```asm
        .def     test;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  test
        .align  16, 0x90
test:
        .cfi_startproc
        leal    19432(%rcx), %eax
        ret
        .cfi_endproc

        .def     main;
        .scl    2;
        .type   32;
        .endef
        .globl  main
        .align  16, 0x90
main:
        .cfi_startproc
        pushq   %rbp
.Ltmp2:
        .cfi_def_cfa_offset 16
.Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
.Ltmp4:
        .cfi_def_cfa_register %rbp
        callq   __main
        movl    $23521, %ecx
        xorl    %edx, %edx
        popq    %rbp
        jmp     test
        .cfi_endproc
```

@ViralBShah (Member) commented:
It would be good to have tests to catch all these issues that we keep discovering on Windows. With the effort to use Intel C++ Composer on Windows, it would be nice not to fall into the same trap again.

@vharavy (Contributor) commented May 12, 2013

@vtjnash I am not sure that your assembler listings confirm the issue. 64-bit Windows and 64-bit Linux use different calling conventions, so comparing the output of Clang and MinGW is not fair. If you check with GCC (not MinGW) you will get the same result as Clang (I just checked).

We need some way to verify that LLVM generates code incorrectly. It may also be related to this LLVM issue.

@vtjnash (Member Author) commented May 12, 2013

@vharavy What makes you think I'm not running clang for a 64-bit Windows target? This should be a completely fair comparison. To prove my point: if you look more carefully, the assembly generated by clang (and gcc) is PE, not ELF, meaning it was generated to run on 64-bit Windows (just not correctly). I am well aware of the difference in calling conventions and looked up the Windows calling convention before declaring gcc correct (e.g. you should get the same result if you test with the Intel or Microsoft compilers).

Demonstrating that this bug does not exist on your Linux box is not productive. I could have used gcc without the mingw prefix on my Windows box in the same way I used clang. This does demonstrate that LLVM generates incorrect code on Windows. I'm also not sure how the inability of LLVM to generate a certain assembly-instruction optimization would affect its choice of calling convention (although I had already read through the list of missing and broken features in the source code to check).

@vharavy (Contributor) commented May 12, 2013

@vtjnash OK, I apologize. I did not understand your post clearly.

LLVM is now in its bug-fixing phase, and release 3.3 is scheduled in about 3 weeks. We can try the new version to see whether the problem has been resolved. Here is the link to the LLVM 3.3 SVN branch.

@vtjnash (Member Author) commented May 13, 2013

@vharavy No need to apologize; I was simply making the observation that the outputs are not the same even beyond the superficial differences (including the use of different registers). It might be helpful if you could confirm that the Intel and Microsoft compilers are compliant with the Windows calling convention here.

Out of curiosity, do those compilers have __umodti3 natively? I assume they must, since LLVM is supposed to compile using those compilers, but it seemed to me like an unusual function name.

(I was hoping to avoid the need to compile clang 3.3 for Windows, and was just going to submit it as an issue after working out my thoughts here. But perhaps drangon's script https://code.google.com/p/mingw-w64-dgn/ will be able to help me.)

@vharavy (Contributor) commented May 13, 2013

@vtjnash Unfortunately, neither the Microsoft compilers nor Intel C++ Composer for Windows defines a type __int128 (or similar). But the calling convention for 64-bit Windows states that any structure larger than 8 bytes is passed on the stack. So having something like:

```c
typedef struct _int128_t
{
    __int64 hi;
    __int64 lo;
} int128_t;

int __declspec(noinline) test(int128_t x)
{
    return x.lo + 19432;
}

int main()
{
    int128_t x;

    x.hi = 0;
    x.lo = 23521;

    return test(x);
}
```

produces the following result for Microsoft compiler (VS 2012 Update 2):

```asm
test:
 mov         rax,qword ptr [rcx+8]
 add         rax,4BE8h
 ret

main:
 sub         rsp,38h
 mov         qword ptr [rsp+20h],0
 mov         qword ptr [rsp+28h],5BE1h
 movaps      xmm0,xmmword ptr [rsp+20h]
 lea         rcx,[rsp+20h]
 movdqa      xmmword ptr [rsp+20h],xmm0
 call        000000013F7A1000
 add         rsp,38h
 ret
```

and for Intel C++ Composer 2013 Update 2 for Windows:

```asm
main:
 sub         rsp,40h
 xor         ecx,ecx
 mov         rax,qword ptr [3F943000h]
 xor         rax,rsp
 mov         qword ptr [rsp+30h],rax
 call        000000013F9418A0
 xor         eax,eax
 lea         rcx,[rsp+20h]
 mov         edx,5BE1h
 mov         qword ptr [rcx],rax
 mov         qword ptr [rcx+8],rdx
 call        000000013F941050
 mov         rcx,qword ptr [rsp+30h]
 mov         r15d,eax
 xor         rcx,rsp
 call        000000013F941070
 mov         eax,r15d
 add         rsp,40h
 pop         r15
 ret

test:
 mov         eax,dword ptr [rcx+8]
 add         eax,4BE8h
 ret
```

As you can see, in both cases int128_t is passed on the stack, and a pointer to it in register rcx (the register for the 1st argument).

P.S. Intel C++ Composer for Windows integrates very tightly with the Microsoft compilers. It cannot be used without the Microsoft SDK or some version of Visual Studio installed, and while compiling it defines both _MSC_VER and __INTEL_COMPILER.

@vtjnash (Member Author) commented May 13, 2013

I believe the type name is supposed to be __m128: http://msdn.microsoft.com/en-us/library/94z15h2c.aspx
Although you are right that it doesn't really matter, since the calling convention doesn't distinguish by the type of the object the way it does on Linux.

x86_64 is little-endian, so the lo half should come first. Taking that into account, the assembly seems to agree with GCC:

```c
typedef struct _int128_t
{
    __int64 lo;
    __int64 hi;
} int128_t;
```

Were you able to check whether the compiler defines __umodti3 (and other functions in the family, like __umodsi3 for modulus of 32-bit integers)?

@vtjnash (Member Author) commented May 13, 2013

Never mind, I was able to confirm that __umodti3 is part of libgcc. This is tangential, but I believe that LLVM-compiled code will not work without some version of this library loaded (MSVC is listed among the compilers that don't work with LLVM, apparently because of linking issues). However, the LLVM project does provide an alternative: http://compiler-rt.llvm.org/. I bring this up because I'm wondering if someone knows whether it would help with fixing the libm issues on 32-bit Windows.

@vharavy (Contributor) commented May 13, 2013

@vtjnash It seems not. There are some x64 intrinsics like _mul128, _umul128, __mulh, and __umulh, but they are not functions.

__m128 and similar types (like __m128i) are XMM/SSE/SSE2 data types, so I am afraid that parameters of such types will be passed in XMM registers.

As for __umodti3 and family, we can implement them ourselves if required, or we can use libgcc.dll (actually libgcc_s_seh-1.dll or another variant). According to the disassembler, it expects arguments the way they should be passed.

@vtjnash (Member Author) commented May 13, 2013

The spec is very clear and consistent (although it's a bit hard to find all of the pieces). __m128 is the same as the struct you created above. http://msdn.microsoft.com/en-us/library/zthk2dkh.aspx

@vharavy (Contributor) commented May 13, 2013

Yes, I can see that now. It is defined as follows:

```c
typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128 {
    float               m128_f32[4];
    unsigned __int64    m128_u64[2];
    __int8              m128_i8[16];
    __int16             m128_i16[8];
    __int32             m128_i32[4];
    __int64             m128_i64[2];
    unsigned __int8     m128_u8[16];
    unsigned __int16    m128_u16[8];
    unsigned __int32    m128_u32[4];
} __m128;
```

P.S. By the way, I don't really see why Julia needs support for 128-bit integers. My opinion is that everything except the basic integer and floating-point types, arrays, and strings should be moved into separate packages. I like the way R does it: you can specify which packages you need during start-up and they will be loaded automatically; other packages can always be loaded manually. Julia defines almost everything in Base and thus has a 6-megabyte sys.ji file and a huge number of functions that are mostly undocumented. No wonder it takes a lot of time to add support for different compilers, since Base relies on over a dozen libraries that are difficult to port or replace.

@JeffBezanson (Member) commented:

Complaining about our documentation in this context is totally gratuitous. It has nothing to do with this. And as if 6mb is too big. Really.


@vharavy (Contributor) commented May 14, 2013

@JeffBezanson You may agree that there are a lot of things to worry about (and to work on) besides 128-bit integers. Anyway, it was just a cry from the heart: it needs to be done, and we will get it done.

@vtjnash (Member Author) commented May 28, 2013

Submitted upstream as http://llvm.org/bugs/show_bug.cgi?id=16168

@Keno (Member) commented Jun 27, 2013

This is actually not terribly hard to fix. I'll have a go at it.

@ghost ghost assigned Keno Jun 27, 2013
@vtjnash (Member Author) commented Jun 28, 2013

@loladiro Fixing arbitrary int128 functions is easy (I assume you already have that on your branch). The annoying part is that all of the LLVM math intrinsics call these functions with the wrong ABI, and we don't have a way to control that, short of reimplementing them.

@Keno (Member) commented Jun 28, 2013

I know what's involved and where to fix the llvm intrinsics.

@vtjnash (Member Author) commented Aug 12, 2013

This patch fixes the tests.

```diff
--- llvm-3.3.src/lib/CodeGen/SelectionDAG/TargetLowering.cpp    2013-02-12 16:21:59.000000000 -0500
+++ llvm-3.3/lib/CodeGen/SelectionDAG/TargetLowering.cpp    2013-08-12 12:10:25.187393937 -0400
@@ -74,19 +74,37 @@
   TargetLowering::ArgListTy Args;
   Args.reserve(NumOps);

+  int Win64 = 1; //DAG.TLI.???
+  SDValue Store = DAG.getEntryNode();
   TargetLowering::ArgListEntry Entry;
   for (unsigned i = 0; i != NumOps; ++i) {
     Entry.Node = Ops[i];
     Entry.Ty = Entry.Node.getValueType().getTypeForEVT(*DAG.getContext());
     Entry.isSExt = isSigned;
     Entry.isZExt = !isSigned;
+    if (Win64) {
+      IntegerType *ity = dyn_cast<IntegerType>(Entry.Ty);
+      if (ity && ity->getBitWidth() > 64) {
+        printf("libcall arg %d i128*\n", i);
+        SDValue StackPtr = DAG.CreateStackTemporary(Entry.Node.getValueType(), 16);
+        int SPFI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+        MachinePointerInfo PtrInfo = MachinePointerInfo::getFixedStack(SPFI);
+        // Emit a store to the stack slot.
+        Store = DAG.getStore(Store, dl, Entry.Node, StackPtr, PtrInfo,
+                                     false, false, 16);
+        Entry.Node = StackPtr;
+        Entry.Ty = PointerType::get(Entry.Ty,0);
+        Entry.isSExt = false;
+        Entry.isZExt = false;
+      }
+    }
     Args.push_back(Entry);
   }
   SDValue Callee = DAG.getExternalSymbol(getLibcallName(LC), getPointerTy());

   Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext());
   TargetLowering::
-  CallLoweringInfo CLI(DAG.getEntryNode(), RetTy, isSigned, !isSigned, false,
+  CallLoweringInfo CLI(Store, RetTy, isSigned, !isSigned, false,
                     false, 0, getLibcallCallingConv(LC),
                     /*isTailCall=*/false,
                     /*doesNotReturn=*/false, /*isReturnValueUsed=*/true,
```
However, the same change would also be needed in SelectionDAGLegalize::ExpandLibCall and SelectionDAGLegalize::ExpandChainLibCall, since those appear to be places where the authors of LLVM duplicated this function.

vtjnash added a commit that referenced this issue Aug 13, 2013
@vtjnash vtjnash reopened this Aug 14, 2013
@vtjnash (Member Author) commented Aug 14, 2013

By some cruel trick, my copy of GCC had managed to place the return values exactly where LLVM thought they should be, making this patch succeed only on my machine. In fact, LLVM is unable to model the int128 return type on Windows and so will typically fail even with this patch; another patch will be forthcoming.

@Keno (Member) commented Aug 28, 2013

I'm still seeing failing numbers tests on Windows. Should this have fixed it?

@vtjnash (Member Author) commented Aug 28, 2013

Did you force LLVM to rebuild? (cd deps/llvm-3.3/build_Release && make install)

@Keno (Member) commented Aug 28, 2013

I had, but somehow it didn't propagate. I just rebuilt everything from scratch, though, and it worked fine.

@vtjnash (Member Author) commented Aug 28, 2013

Oh, and you need to delete libjulia.
