[x64] Mono SIMD codegen bug #97176

tannergooding · 2024-01-18T20:53:53Z

The following program fails on Mono and returns +∞ instead of NaN as expected. This is likely an error somewhere in the branch handling edge case inputs at the end of the Invoke function. It was caught as part of #97114

// This code is based on `vrd2_exp` from amd/aocl-libm-ose
// Copyright (C) 2019-2020 Advanced Micro Devices, Inc. All rights reserved.
//
// Licensed under the BSD 3-Clause "New" or "Revised" License
// See THIRD-PARTY-NOTICES.TXT for the full license text

// Implementation Notes
// ----------------------
// 1. Argument Reduction:
//      e^x = 2^(x/ln2) = 2^(x*(64/ln(2))/64)     --- (1)
//
//      Choose 'n' and 'f', such that
//      x * 64/ln2 = n + f                        --- (2) | n is integer
//                            | |f| <= 0.5
//     Choose 'm' and 'j' such that,
//      n = (64 * m) + j                          --- (3)
//
//     From (1), (2) and (3),
//      e^x = 2^((64*m + j + f)/64)
//          = (2^m) * (2^(j/64)) * 2^(f/64)
//          = (2^m) * (2^(j/64)) * e^(f*(ln(2)/64))
//
// 2. Table Lookup
//      Values of (2^(j/64)) are precomputed, j = 0, 1, 2, 3 ... 63
//
// 3. Polynomial Evaluation
//   From (2),
//     f = x*(64/ln(2)) - n
//   Let,
//     r  = f*(ln(2)/64) = x - n*(ln(2)/64)
//
// 4. Reconstruction
//      Thus,
//        e^x = (2^m) * (2^(j/64)) * e^r

using System.Runtime.Intrinsics;

var res = Invoke(Vector128.Create(double.NaN));
Console.WriteLine(res);

const ulong V_ARG_MAX = 0x40862000_00000000;
const ulong V_DP64_BIAS = 1023;

const double V_EXPF_MIN = -709.782712893384;
const double V_EXPF_MAX = +709.782712893384;

const double V_EXPF_HUGE = 6755399441055744;
const double V_TBL_LN2 = 1.4426950408889634;

const double V_LN2_HEAD = +0.693359375;
const double V_LN2_TAIL = -0.00021219444005469057;

const double C3 = 0.5000000000000018;
const double C4 = 0.1666666666666617;
const double C5 = 0.04166666666649277;
const double C6 = 0.008333333333559272;
const double C7 = 0.001388888895122404;
const double C8 = 0.00019841269432677495;
const double C9 = 2.4801486521374483E-05;
const double C10 = 2.7557622532543023E-06;
const double C11 = 2.7632293298250954E-07;
const double C12 = 2.499430431958571E-08;

static Vector128<double> Invoke(Vector128<double> x)
{
    // x * (64.0 / ln(2))
    Vector128<double> z = x * Vector128.Create(V_TBL_LN2);

    Vector128<double> dn = z + Vector128.Create(V_EXPF_HUGE);

    // n = (int)z
    Vector128<ulong> n = dn.AsUInt64();

    // dn = (double)n
    dn -= Vector128.Create(V_EXPF_HUGE);

    // r = x - (dn * (ln(2) / 64))
    // where ln(2) / 64 is split into Head and Tail values
    Vector128<double> r = x - (dn * Vector128.Create(V_LN2_HEAD)) - (dn * Vector128.Create(V_LN2_TAIL));

    Vector128<double> r2 = r * r;
    Vector128<double> r4 = r2 * r2;
    Vector128<double> r8 = r4 * r4;

    // Compute polynomial
    Vector128<double> poly = ((Vector128.Create(C12) * r + Vector128.Create(C11)) * r2 +
                               Vector128.Create(C10) * r + Vector128.Create(C9)) * r8 +
                             ((Vector128.Create(C8) * r + Vector128.Create(C7)) * r2 +
                              (Vector128.Create(C6) * r + Vector128.Create(C5))) * r4 +
                             ((Vector128.Create(C4) * r + Vector128.Create(C3)) * r2 + (r + Vector128<double>.One));

    // m = (n - j) / 64
    // result = polynomial * 2^m
    Vector128<double> ret = poly * ((n + Vector128.Create(V_DP64_BIAS)) << 52).AsDouble();

    // Check if -709 < vx < 709
    if (Vector128.GreaterThanAny(Vector128.Abs(x).AsUInt64(), Vector128.Create(V_ARG_MAX)))
    {
        // (x > V_EXPF_MAX) ? double.PositiveInfinity : x
        Vector128<double> infinityMask = Vector128.GreaterThan(x, Vector128.Create(V_EXPF_MAX));

        ret = Vector128.ConditionalSelect(
            infinityMask,
            Vector128.Create(double.PositiveInfinity),
            ret
        );

        // (x < V_EXPF_MIN) ? 0 : x
        ret = Vector128.AndNot(ret, Vector128.LessThan(x, Vector128.Create(V_EXPF_MIN)));
    }

    return ret;
}

The text was updated successfully, but these errors were encountered:

ghost · 2024-01-18T20:54:02Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The following program fails on Mono and returns +∞ instead of NaN as expected. This is likely an error somewhere in the branch handling edge case inputs at the end of the Invoke function. It was caught as part of #97114

// This code is based on `vrd2_exp` from amd/aocl-libm-ose
// Copyright (C) 2019-2020 Advanced Micro Devices, Inc. All rights reserved.
//
// Licensed under the BSD 3-Clause "New" or "Revised" License
// See THIRD-PARTY-NOTICES.TXT for the full license text

// Implementation Notes
// ----------------------
// 1. Argument Reduction:
//      e^x = 2^(x/ln2) = 2^(x*(64/ln(2))/64)     --- (1)
//
//      Choose 'n' and 'f', such that
//      x * 64/ln2 = n + f                        --- (2) | n is integer
//                            | |f| <= 0.5
//     Choose 'm' and 'j' such that,
//      n = (64 * m) + j                          --- (3)
//
//     From (1), (2) and (3),
//      e^x = 2^((64*m + j + f)/64)
//          = (2^m) * (2^(j/64)) * 2^(f/64)
//          = (2^m) * (2^(j/64)) * e^(f*(ln(2)/64))
//
// 2. Table Lookup
//      Values of (2^(j/64)) are precomputed, j = 0, 1, 2, 3 ... 63
//
// 3. Polynomial Evaluation
//   From (2),
//     f = x*(64/ln(2)) - n
//   Let,
//     r  = f*(ln(2)/64) = x - n*(ln(2)/64)
//
// 4. Reconstruction
//      Thus,
//        e^x = (2^m) * (2^(j/64)) * e^r

using System.Runtime.Intrinsics;

var res = Invoke(Vector128.Create(double.NaN));
Console.WriteLine(res);

const ulong V_ARG_MAX = 0x40862000_00000000;
const ulong V_DP64_BIAS = 1023;

const double V_EXPF_MIN = -709.782712893384;
const double V_EXPF_MAX = +709.782712893384;

const double V_EXPF_HUGE = 6755399441055744;
const double V_TBL_LN2 = 1.4426950408889634;

const double V_LN2_HEAD = +0.693359375;
const double V_LN2_TAIL = -0.00021219444005469057;

const double C3 = 0.5000000000000018;
const double C4 = 0.1666666666666617;
const double C5 = 0.04166666666649277;
const double C6 = 0.008333333333559272;
const double C7 = 0.001388888895122404;
const double C8 = 0.00019841269432677495;
const double C9 = 2.4801486521374483E-05;
const double C10 = 2.7557622532543023E-06;
const double C11 = 2.7632293298250954E-07;
const double C12 = 2.499430431958571E-08;

static Vector128<double> Invoke(Vector128<double> x)
{
    // x * (64.0 / ln(2))
    Vector128<double> z = x * Vector128.Create(V_TBL_LN2);

    Vector128<double> dn = z + Vector128.Create(V_EXPF_HUGE);

    // n = (int)z
    Vector128<ulong> n = dn.AsUInt64();

    // dn = (double)n
    dn -= Vector128.Create(V_EXPF_HUGE);

    // r = x - (dn * (ln(2) / 64))
    // where ln(2) / 64 is split into Head and Tail values
    Vector128<double> r = x - (dn * Vector128.Create(V_LN2_HEAD)) - (dn * Vector128.Create(V_LN2_TAIL));

    Vector128<double> r2 = r * r;
    Vector128<double> r4 = r2 * r2;
    Vector128<double> r8 = r4 * r4;

    // Compute polynomial
    Vector128<double> poly = ((Vector128.Create(C12) * r + Vector128.Create(C11)) * r2 +
                               Vector128.Create(C10) * r + Vector128.Create(C9)) * r8 +
                             ((Vector128.Create(C8) * r + Vector128.Create(C7)) * r2 +
                              (Vector128.Create(C6) * r + Vector128.Create(C5))) * r4 +
                             ((Vector128.Create(C4) * r + Vector128.Create(C3)) * r2 + (r + Vector128<double>.One));

    // m = (n - j) / 64
    // result = polynomial * 2^m
    Vector128<double> ret = poly * ((n + Vector128.Create(V_DP64_BIAS)) << 52).AsDouble();

    // Check if -709 < vx < 709
    if (Vector128.GreaterThanAny(Vector128.Abs(x).AsUInt64(), Vector128.Create(V_ARG_MAX)))
    {
        // (x > V_EXPF_MAX) ? double.PositiveInfinity : x
        Vector128<double> infinityMask = Vector128.GreaterThan(x, Vector128.Create(V_EXPF_MAX));

        ret = Vector128.ConditionalSelect(
            infinityMask,
            Vector128.Create(double.PositiveInfinity),
            ret
        );

        // (x < V_EXPF_MIN) ? 0 : x
        ret = Vector128.AndNot(ret, Vector128.LessThan(x, Vector128.Create(V_EXPF_MIN)));
    }

    return ret;
}

Author:	tannergooding
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

tannergooding · 2024-01-18T20:54:27Z

CC. @fanyang-mono

tannergooding · 2024-01-18T20:58:18Z

Note that this does not repro for Vector64 which should be accelerated on Arm64. So this looks like an x86/x64 specific issue.

) * Adding vectorized implementations of Exp to Vector64/128/256/512 * Accelerate TensorPrimitives.Exp for double * Ensure the right allowedVariance is used for the vectorized exp tests * Ensure V128/256/512 defers to the next smaller vector size by operating on the lower/upper halves * Ensure the right allowedVariance amounts are used for the vectorized Exp(float) tests * Ensure we call Exp and that the methods are properly inlined * Skip the Exp test for Vector128/256/512 on Mono due to #97176

vargaz · 2024-01-20T06:07:29Z

I can repro this on x64, but not on arm64.

…net#97114) * Adding vectorized implementations of Exp to Vector64/128/256/512 * Accelerate TensorPrimitives.Exp for double * Ensure the right allowedVariance is used for the vectorized exp tests * Ensure V128/256/512 defers to the next smaller vector size by operating on the lower/upper halves * Ensure the right allowedVariance amounts are used for the vectorized Exp(float) tests * Ensure we call Exp and that the methods are properly inlined * Skip the Exp test for Vector128/256/512 on Mono due to dotnet#97176

steveisok · 2024-02-16T18:24:52Z

Since it's x64, that's less of a priority within mono. @vargaz @fanyang-mono is this an easy fix?

fanyang-mono · 2024-02-16T18:30:33Z

Not sure what the problem is exactly by looking at the repro program.

tannergooding · 2024-02-16T19:15:37Z

My guess is that ConditionalSelect is potentially being mishandled on x86/x64.

The exposed managed API is mask, left, right. On Arm64 this maps to bsl mask, left, right. On x64 this has to be emulated as (left & mask) | (right & ~mask) or if you know that mask is "AllBitsSet" or "Zero" on a per-element basis you can instead use pblendvb right, left, mask (so effectively parameter order is inverted).

Alternatively, it could be a bug with GreaterThan (double) since it has to be emulated using cmpltpd right, left (swapping the operands and using less than instead) on pre-AVX hardware (Mono notably doesn't support AVX today, to my knowledge).

GreaterThanAny (ulong) also needs to be emulated on pre SSE4.2 hardware so it could be that, but I expect that is already being handled.

…_FP (#104206) * Ensure mini-amd64 correctly handles CMP_GT and CMP_GE for OP_XCOMPARE_FP * Re-enable the tests disabled due to #97176

ghost added the untriaged New issue has not been triaged by the area owner label Jan 18, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 18, 2024

tannergooding added area-Codegen-JIT-mono and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jan 18, 2024

tannergooding added a commit to tannergooding/runtime that referenced this issue Jan 18, 2024

Skip the Exp test for Vector128/256/512 on Mono due to dotnet#97176

398c78e

steveisok removed the untriaged New issue has not been triaged by the area owner label Feb 16, 2024

steveisok added this to the Future milestone Feb 16, 2024

fanyang-mono changed the title ~~Mono SIMD codegen bug~~ [x64] Mono SIMD codegen bug Feb 16, 2024

tannergooding added a commit to tannergooding/runtime that referenced this issue Jun 30, 2024

Re-enable the tests disabled due to dotnet#97176

9a8dc5c

tannergooding mentioned this issue Jun 30, 2024

Ensure mini-amd64 correctly handles CMP_GT and CMP_GE for OP_XCOMPARE_FP #104206

Merged

dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jun 30, 2024

tannergooding added a commit that referenced this issue Jul 3, 2024

Ensure mini-amd64 correctly handles CMP_GT and CMP_GE for OP_XCOMPARE…

b652b39

…_FP (#104206) * Ensure mini-amd64 correctly handles CMP_GT and CMP_GE for OP_XCOMPARE_FP * Re-enable the tests disabled due to #97176

tannergooding closed this as completed in #104206 Jul 3, 2024

github-actions bot locked and limited conversation to collaborators Aug 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[x64] Mono SIMD codegen bug #97176

[x64] Mono SIMD codegen bug #97176

tannergooding commented Jan 18, 2024

ghost commented Jan 18, 2024

tannergooding commented Jan 18, 2024

tannergooding commented Jan 18, 2024

vargaz commented Jan 20, 2024

steveisok commented Feb 16, 2024

fanyang-mono commented Feb 16, 2024

tannergooding commented Feb 16, 2024

[x64] Mono SIMD codegen bug #97176

[x64] Mono SIMD codegen bug #97176

Comments

tannergooding commented Jan 18, 2024

ghost commented Jan 18, 2024

tannergooding commented Jan 18, 2024

tannergooding commented Jan 18, 2024

vargaz commented Jan 20, 2024

steveisok commented Feb 16, 2024

fanyang-mono commented Feb 16, 2024

tannergooding commented Feb 16, 2024