
Int128 support on Windows [llvm bug] #3081

Closed · vtjnash opened this issue May 11, 2013 · 24 comments

Labels: bug (Indicates an unexpected problem or unintended behavior), system:windows (Affects only Windows), upstream (The issue is with an upstream dependency, e.g. LLVM)

@vtjnash (Member) commented May 11, 2013

64-bit Windows segfaults when attempting div, mod, or rem on 128-bit numbers (it tries to call the libgcc function __umodti3). This is because LLVM does not use the correct calling convention for 128-bit numbers on Windows.

@vtjnash (Member Author) commented May 11, 2013

Attached is C code demonstrating the issue. Observe that gcc (correctly) passes the argument to test on the stack, whereas clang (incorrectly) passes the argument split across the ecx and edx registers.

```c
// clang -S test.c -O2 -m64 -o test-clang.s
// x86_64-w64-mingw32-gcc -S test.c -O2 -m64 -o test-gcc.s

int __attribute__((noinline)) test(__int128 x) {
    return x + 19432;
}

int main() {
    return test(23521);
}
```

gcc-generated assembly (correct):

```asm
        .file   "test.c"
        .text
        .p2align 4,,15
        .globl  test
        .def    test;   .scl    2;      .type   32;     .endef
        .seh_proc       test
test:
        .seh_endprologue
        movq    (%rcx), %rax
        addl    $19432, %eax
        ret
        .seh_endproc
        .def    __main; .scl    2;      .type   32;     .endef
        .section        .text.startup,"x"
        .p2align 4,,15
        .globl  main
        .def    main;   .scl    2;      .type   32;     .endef
        .seh_proc       main
main:
        subq    $56, %rsp
        .seh_stackalloc 56
        .seh_endprologue
        call    __main
        leaq    32(%rsp), %rcx
        movq    $23521, 32(%rsp)
        movq    $0, 40(%rsp)
        call    test
        addq    $56, %rsp
        ret
        .seh_endproc
        .ident  "GCC: (GNU) 4.8.1 20130329 (prerelease)"
```

clang-generated assembly (incorrect):

```asm
        .def     test;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  test
        .align  16, 0x90
test:
        .cfi_startproc
        leal    19432(%rcx), %eax
        ret
        .cfi_endproc

        .def     main;
        .scl    2;
        .type   32;
        .endef
        .globl  main
        .align  16, 0x90
main:
        .cfi_startproc
        pushq   %rbp
.Ltmp2:
        .cfi_def_cfa_offset 16
.Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
.Ltmp4:
        .cfi_def_cfa_register %rbp
        callq   __main
        movl    $23521, %ecx
        xorl    %edx, %edx
        popq    %rbp
        jmp     test
        .cfi_endproc
```

@ViralBShah (Member) commented:
It would be good to have tests to catch all these issues that we keep discovering on Windows. With the effort to use Intel C++ Composer on Windows, it would be nice not to fall into the same trap again.

@vharavy (Contributor) commented May 12, 2013

@vtjnash I am not sure that your assembler listings confirm the issue. 64-bit Windows and 64-bit Linux use different calling conventions, so comparing the output of Clang and MinGW is not fair. If you check with GCC (not MinGW) you will get the same result as Clang (I just checked).

We need some way to verify that LLVM generates code incorrectly. It may also be related to this LLVM issue.

@vtjnash (Member Author) commented May 12, 2013

@vharavy What makes you think I'm not running clang for a 64-bit Windows target? This should be a completely fair comparison. To prove my point: if you look more carefully, the assembly generated by clang (and gcc) is PE, not ELF, meaning it was generated to run on 64-bit Windows (just not correctly). I am well aware of the difference in calling conventions and looked up the Windows calling convention before declaring gcc correct (e.g. you should get the same result if you test with the Intel or Microsoft compilers).

Demonstrating that this bug does not exist on your Linux box is not productive. I could have used gcc without the mingw prefix on my Windows box in the same way I used clang. This does demonstrate that LLVM generates incorrect code on Windows. I'm also not sure how the inability of LLVM to generate a certain assembly-instruction optimization would affect its choice of calling convention (although I had already read through the list of missing and broken features in the source code to check).

@vharavy (Contributor) commented May 12, 2013

@vtjnash OK, I apologize. I did not understand your post clearly.

LLVM is now in its bug-fixing phase, and release 3.3 is scheduled in about 3 weeks. We can try the new version to see whether the problem has been resolved. Here is the link to the LLVM 3.3 SVN branch.

@vtjnash (Member Author) commented May 13, 2013

@vharavy No need to apologize; I was simply making the observation that the outputs are not the same even beyond the superficial differences (including the use of different registers). It might be helpful if you could confirm that the Intel and Microsoft compilers are compliant with the Windows calling convention here.

Out of curiosity, do those compilers have __umodti3 natively? I assume they must, since LLVM is supposed to compile using those compilers, but it seemed to me like an unusual function name.

(I was hoping to avoid the need to compile clang 3.3 for Windows, and was just going to submit it as an issue after working out my thoughts here. But perhaps drangon's script https://code.google.com/p/mingw-w64-dgn/ will be able to help me.)

@vharavy (Contributor) commented May 13, 2013

@vtjnash Unfortunately, neither the Microsoft compilers nor Intel C++ Composer for Windows defines a type __int128 (or similar). But the calling convention for 64-bit Windows states that any structure larger than 8 bytes is passed on the stack. So having something like:

```c
typedef struct _int128_t
{
    __int64 hi;
    __int64 lo;
} int128_t;

int __declspec(noinline) test(int128_t x)
{
    return x.lo + 19432;
}

int main()
{
    int128_t x;

    x.hi = 0;
    x.lo = 23521;

    return test(x);
}
```

produces the following result for Microsoft compiler (VS 2012 Update 2):

```asm
test:
 mov         rax,qword ptr [rcx+8]
 add         rax,4BE8h
 ret

main:
 sub         rsp,38h
 mov         qword ptr [rsp+20h],0
 mov         qword ptr [rsp+28h],5BE1h
 movaps      xmm0,xmmword ptr [rsp+20h]
 lea         rcx,[rsp+20h]
 movdqa      xmmword ptr [rsp+20h],xmm0
 call        000000013F7A1000
 add         rsp,38h
 ret
```

and for Intel C++ Composer 2013 Update 2 for Windows:

```asm
main:
 sub         rsp,40h
 xor         ecx,ecx
 mov         rax,qword ptr [3F943000h]
 xor         rax,rsp
 mov         qword ptr [rsp+30h],rax
 call        000000013F9418A0
 xor         eax,eax
 lea         rcx,[rsp+20h]
 mov         edx,5BE1h
 mov         qword ptr [rcx],rax
 mov         qword ptr [rcx+8],rdx
 call        000000013F941050
 mov         rcx,qword ptr [rsp+30h]
 mov         r15d,eax
 xor         rcx,rsp
 call        000000013F941070
 mov         eax,r15d
 add         rsp,40h
 pop         r15
 ret

test:
 mov         eax,dword ptr [rcx+8]
 add         eax,4BE8h
 ret
```

As you can see, in both cases int128_t is passed on the stack, and a pointer to it in register rcx (the register for the 1st argument).

P.S. Intel C++ Composer for Windows integrates very tightly with the Microsoft compilers. It cannot be used without the Microsoft SDK or some version of Visual Studio installed, and while compiling it defines both _MSC_VER and __INTEL_COMPILER.

@vtjnash (Member Author) commented May 13, 2013

I believe the type name is supposed to be __m128: http://msdn.microsoft.com/en-us/library/94z15h2c.aspx
Although you are right that it doesn't really matter, since the calling convention doesn't distinguish by the type of the object the way it does on Linux.

x86_64 is little-endian, so the lo half should come first. Taking that into account, the assembly seems to agree with GCC:

```c
typedef struct _int128_t
{
    __int64 lo;
    __int64 hi;
} int128_t;
```

Were you able to check whether the compiler defines __umodti3 (and other functions in the family, like __umodsi3 for modulus of 32-bit integers)?

@vtjnash (Member Author) commented May 13, 2013

Never mind, I was able to confirm that __umodti3 is part of libgcc. This is tangential, but I believe that LLVM-compiled code will not work without some version of this library loaded (MSVC is listed among the compilers that don't work with LLVM, apparently because of linking issues). However, the LLVM project does provide an alternative: http://compiler-rt.llvm.org/. I bring this up because I'm wondering if someone knows whether it would help with fixing the libm issues on 32-bit Windows.

@vharavy (Contributor) commented May 13, 2013

@vtjnash It seems not. There are some x64 intrinsics like _mul128, _umul128, __mulh, and __umulh, but they are not functions.

__m128 and similar types (like __m128i) are XMM/SSE/SSE2 data types, so I am afraid that parameters of such types will be passed in XMM registers.

As for __umodti3 and family, we can implement them ourselves if required, or we can use libgcc.dll (actually libgcc_s_seh-1.dll or another variant). According to the disassembler, it expects arguments the way they should be passed.

@vtjnash (Member Author) commented May 13, 2013

The spec is very clear and consistent (although it's a bit hard to find all of the pieces). __m128 is the same as the struct you created above. http://msdn.microsoft.com/en-us/library/zthk2dkh.aspx

@vharavy (Contributor) commented May 13, 2013

Yes, I can see that now. It is defined as follows:

```c
typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128 {
    float               m128_f32[4];
    unsigned __int64    m128_u64[2];
    __int8              m128_i8[16];
    __int16             m128_i16[8];
    __int32             m128_i32[4];
    __int64             m128_i64[2];
    unsigned __int8     m128_u8[16];
    unsigned __int16    m128_u16[8];
    unsigned __int32    m128_u32[4];
} __m128;
```

P.S. By the way, I don't really see why Julia needs support for 128-bit integers. My opinion is that everything except the basic integer and floating-point types, arrays, and strings should be moved into separate packages. I like the way R does it: you can specify which packages you need during start-up and they will be loaded automatically; other packages can always be loaded manually. Julia defines almost everything in Base and thus has a 6-megabyte sys.ji file and a huge number of functions that are mostly undocumented. No wonder it takes a lot of time to add support for different compilers, since Base relies on over a dozen libraries that are difficult to port or replace.

@JeffBezanson (Member) commented:

Complaining about our documentation in this context is totally gratuitous. It has nothing to do with this. And as if 6mb is too big. Really.


@vharavy (Contributor) commented May 14, 2013

@JeffBezanson You may agree that there are a lot of things to worry about (and to work on) besides 128-bit integers. Anyway, it was just a cry from the heart: it needs to be done, and we will get it done.

@vtjnash (Member Author) commented May 28, 2013

Submitted upstream as http://llvm.org/bugs/show_bug.cgi?id=16168

@Keno (Member) commented Jun 27, 2013

This is actually not terribly hard to fix. I'll have a go at it.

@ghost ghost assigned Keno Jun 27, 2013
@vtjnash (Member Author) commented Jun 28, 2013

@loladiro Fixing arbitrary int128 functions is easy (I assume you already have that on your branch). The annoying part is that all of the LLVM math intrinsics call these functions with the wrong ABI, and we don't have a way to control that, short of reimplementing them.

@Keno (Member) commented Jun 28, 2013

I know what's involved and where to fix the llvm intrinsics.

@vtjnash (Member Author) commented Aug 12, 2013

This patch fixes the tests.

```diff
--- llvm-3.3.src/lib/CodeGen/SelectionDAG/TargetLowering.cpp    2013-02-12 16:21:59.000000000 -0500
+++ llvm-3.3/lib/CodeGen/SelectionDAG/TargetLowering.cpp    2013-08-12 12:10:25.187393937 -0400
@@ -74,19 +74,37 @@
   TargetLowering::ArgListTy Args;
   Args.reserve(NumOps);

+  int Win64 = 1; //DAG.TLI.???
+  SDValue Store = DAG.getEntryNode();
   TargetLowering::ArgListEntry Entry;
   for (unsigned i = 0; i != NumOps; ++i) {
     Entry.Node = Ops[i];
     Entry.Ty = Entry.Node.getValueType().getTypeForEVT(*DAG.getContext());
     Entry.isSExt = isSigned;
     Entry.isZExt = !isSigned;
+    if (Win64) {
+      IntegerType *ity = dyn_cast<IntegerType>(Entry.Ty);
+      if (ity && ity->getBitWidth() > 64) {
+        printf("libcall arg %d i128*\n", i);
+        SDValue StackPtr = DAG.CreateStackTemporary(Entry.Node.getValueType(), 16);
+        int SPFI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+        MachinePointerInfo PtrInfo = MachinePointerInfo::getFixedStack(SPFI);
+        // Emit a store to the stack slot.
+        Store = DAG.getStore(Store, dl, Entry.Node, StackPtr, PtrInfo,
+                                     false, false, 16);
+        Entry.Node = StackPtr;
+        Entry.Ty = PointerType::get(Entry.Ty,0);
+        Entry.isSExt = false;
+        Entry.isZExt = false;
+      }
+    }
     Args.push_back(Entry);
   }
   SDValue Callee = DAG.getExternalSymbol(getLibcallName(LC), getPointerTy());

   Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext());
   TargetLowering::
-  CallLoweringInfo CLI(DAG.getEntryNode(), RetTy, isSigned, !isSigned, false,
+  CallLoweringInfo CLI(Store, RetTy, isSigned, !isSigned, false,
                     false, 0, getLibcallCallingConv(LC),
                     /*isTailCall=*/false,
                     /*doesNotReturn=*/false, /*isReturnValueUsed=*/true,
```
However, the same change would also be needed in SelectionDAGLegalize::ExpandLibCall and SelectionDAGLegalize::ExpandChainLibCall, since those appear to be places where the authors of LLVM duplicated this function.

vtjnash added a commit that referenced this issue Aug 13, 2013
@vtjnash vtjnash reopened this Aug 14, 2013
@vtjnash (Member Author) commented Aug 14, 2013

By some cruel trick, my copy of GCC had managed to place the return values exactly where LLVM thought they should be, making this patch succeed only on my machine. In fact, LLVM is unable to model the int128 return type on Windows and so will typically fail even with this patch; another patch will be forthcoming.

@Keno (Member) commented Aug 28, 2013

I'm still seeing failing numbers tests on Windows. Should this have fixed it?

@vtjnash (Member Author) commented Aug 28, 2013

Did you force LLVM to rebuild? (cd deps/llvm-3.3/build_Release && make install)

@Keno (Member) commented Aug 28, 2013

I had, but somehow it didn't propagate. I just rebuilt everything from scratch, though, and it worked fine.

@vtjnash (Member Author) commented Aug 28, 2013

Oh, and you need to delete libjulia.
