IEEE 754:2019 Compliance #1387

tannergooding · 2020-01-07T23:11:14Z

IEEE 754:2019 was published last year. It details the "required" and "recommended" operations for any conforming implementation:

Required:

IEEE API	.NET Double API	.NET Single API
sourceFormat roundToIntegralTiesToEven(source)	double Math.Round(double, MidpointRounding.ToEven)	float MathF.Round(float, MidpointRounding.ToEven)
sourceFormat roundToIntegralTiesToAway(source)	double Math.Round(double, MidpointRounding.AwayFromZero)	float MathF.Round(float, MidpointRounding.AwayFromZero)
sourceFormat roundToIntegralTowardZero(source)	double Math.Round(double, MidpointRounding.ToZero)	float MathF.Round(float, MidpointRounding.ToZero)
sourceFormat roundToIntegralTowardPositive(source)	double Math.Round(double, MidpointRounding.ToPositiveInfinity)	float MathF.Round(float, MidpointRounding.ToPositiveInfinity)
sourceFormat roundToIntegralTowardNegative(source)	double Math.Round(double, MidpointRounding.ToNegativeInfinity)	float MathF.Round(float, MidpointRounding.ToNegativeInfinity)
sourceFormat nextUp(source)	double Math.BitIncrement(double)	float MathF.BitIncrement(float)
sourceFormat nextDown(source)	double Math.BitDecrement(double)	float MathF.BitDecrement(float)
sourceFormat remainder(source, source)	double Math.IEEERemainder(double, double)	float MathF.IEEERemainder(float, float)
sourceFormat scaleB(source, logBFormat)	double Math.ScaleB(double, int)	float MathF.ScaleB(float, int)
logBFormat logB(source)	int Math.ILogB(double)	int MathF.ILogB(float)
formatOf-addition(source1, source2)	double = double + double	float = float + float
formatOf-subtraction(source1, source2)	double = double - double	float = float - float
formatOf-multiplication(source1, source2)	double = double * double	float = float * float
formatOf-division(source1, source2)	double = double / double	float = float / float
formatOf-squareRoot(source1)	double Math.Sqrt(double)	float MathF.Sqrt(float)
formatOf-fusedMultiplyAdd(source1, source2, source3)	double Math.FusedMultiplyAdd(double, double, double)	float MathF.FusedMultiplyAdd(float, float, float)
formatOf-convertFromInt(int)	double = (double)int	float = (float)int
intFormatOf-convertToIntegerTiesToEven(source)
intFormatOf-convertToIntegerTowardZero(source)	int = (int)double	int = (int)float
intFormatOf-convertToIntegerTowardPositive(source)
intFormatOf-convertToIntegerTowardNegative(source)
intFormatOf-convertToIntegerTiesToAway(source)
formatOf-convertFormat(source)	double = (double)float	float = (float)double
formatOf-convertFromDecimalCharacter(decimalCharacterSequence)	double double.Parse(string)	float float.Parse(string)
decimalCharacterSequence convertToDecimalCharacter(source, conversionSpecification)	string double.ToString()	string float.ToString()
formatOf-convertFromHexCharacter(hexCharacterSequence)
hexCharacterSequence convertToHexCharacter(source, conversionSpecification)
sourceFormat copy(source)	double = double	float = float
sourceFormat negate(source)	double = -double	float = -float
sourceFormat abs(source)	double Math.Abs(double)	float MathF.Abs(float)
sourceFormat copySign(source, source)	double Math.CopySign(double, double)	float MathF.CopySign(float, float)
boolean compareQuietEqual(source1, source2)	bool = double == double	bool = float == float
boolean compareQuietNotEqual(source1, source2)	bool = double != double	bool = float != float
boolean compareQuietGreater(source1, source2)	bool = double > double	bool = float > float
boolean compareQuietGreaterEqual(source1, source2)	bool = double >= double	bool = float >= float
boolean compareQuietLess(source1, source2)	bool = double < double	bool = float < float
boolean compareQuietLessEqual(source1, source2)	bool = double <= double	bool = float <= float
boolean compareQuietUnordered(source1, source2)
boolean compareQuietNotGreater(source1, source2)
boolean compareQuietLessUnordered(source1, source2)
boolean compareQuietNotLess(source1, source2)
boolean compareQuietGreaterUnordered(source1, source2)
boolean is754version1985(void)
boolean is754version2008(void)
boolean is754version2019(void)
enum class(source)
boolean isSignMinus(source)	bool double.IsNegative(double)	bool float.IsNegative(float)
boolean isNormal(source)	bool double.IsNormal(double)	bool float.IsNormal(float)
boolean isFinite(source)	bool double.IsFinite(double)	bool float.IsFinite(float)
boolean isZero(source)
boolean isSubnormal(source)	bool double.IsSubnormal(double)	bool float.IsSubnormal(float)
boolean isInfinite(source)	bool double.IsInfinity(double)	bool float.IsInfinity(float)
boolean isNaN(source)	bool double.IsNaN(double)	bool float.IsNaN(float)
boolean isSignaling(source)
enum radix(source)
boolean totalOrder(source, source)
boolean totalOrderMag(source, source)

The following IEEE APIs are also "required", but we do not support the IEEE floating-point exceptions, so they are equivalent to other APIs we expose:

sourceFormat roundToIntegralExact(source)
intFormatOf-convertToIntegerExactTiesToEven(source)
intFormatOf-convertToIntegerExactTowardZero(source)
intFormatOf-convertToIntegerExactTowardPositive(source)
intFormatOf-convertToIntegerExactTowardNegative(source)
intFormatOf-convertToIntegerExactTiesToAway(source)

The following IEEE APIs are also "required" but we do not support throwing for NaN inputs, so they are equivalent to other APIs we expose:

boolean compareSignalingEqual(source1, source2)
boolean compareSignalingGreater(source1, source2)
boolean compareSignalingGreaterEqual(source1, source2)
boolean compareSignalingLess(source1, source2)
boolean compareSignalingLessEqual(source1, source2)
boolean compareSignalingNotEqual(source1, source2)
boolean compareSignalingNotGreater(source1, source2)
boolean compareSignalingLessUnordered(source1, source2)
boolean compareSignalingNotLess(source1, source2)
boolean compareSignalingGreaterUnordered(source1, source2)

Recommended:

IEEE API	.NET Double API	.NET Single API
exp	double Math.Exp(double)	float MathF.Exp(float)
expm1
exp2
exp2m1
exp10
exp10m1
log	double Math.Log(double)	float MathF.Log(float)
log2	double Math.Log2(double)	float MathF.Log2(float)
log10	double Math.Log10(double)	float MathF.Log10(float)
logp1
log2p1
log10p1
hypot(x, y)
rSqrt
compound(x, n)
rootn(x, n)
pown(x, n)
pow(x, y)	double Math.Pow(double, double)	float MathF.Pow(float, float)
powr(x, y)
sin	double Math.Sin(double)	float MathF.Sin(float)
cos	double Math.Cos(double)	float MathF.Cos(float)
tan	double Math.Tan(double)	float MathF.Tan(float)
sinPi
cosPi
tanPi
asin	double Math.Asin(double)	float MathF.Asin(float)
acos	double Math.Acos(double)	float MathF.Acos(float)
atan	double Math.Atan(double)	float MathF.Atan(float)
atan2(y, x)	double Math.Atan2(double, double)	float MathF.Atan2(float, float)
asinPi
acosPi
atanPi
atan2Pi(y, x)
sinh	double Math.Sinh(double)	float MathF.Sinh(float)
cosh	double Math.Cosh(double)	float MathF.Cosh(float)
tanh	double Math.Tanh(double)	float MathF.Tanh(float)
asinh	double Math.Asinh(double)	float MathF.Asinh(float)
acosh	double Math.Acosh(double)	float MathF.Acosh(float)
atanh	double Math.Atanh(double)	float MathF.Atanh(float)
sourceFormat sum(source vector, integralFormat)
sourceFormat dot(source vector, source vector, integralFormat)
sourceFormat sumSquare(source vector, integralFormat)
sourceFormat sumAbs(source vector, integralFormat)
(sourceFormat, integralFormat) scaledProd(source vector, integralFormat)
(sourceFormat, integralFormat) scaledProdSum(source vector, source vector, integralFormat)
(sourceFormat, integralFormat) scaledProdDiff(source vector, source vector, integralFormat)
(sourceFormat, sourceFormat) augmentedAddition(source, source)
(sourceFormat, sourceFormat) augmentedSubtraction(source, source)
(sourceFormat, sourceFormat) augmentedMultiplication(source, source)
sourceFormat minimum(source, source)	double Math.Min(double, double)	float MathF.Min(float, float)
sourceFormat minimumNumber(source, source)
sourceFormat maximum(source, source)	double Math.Max(double, double)	float MathF.Max(float, float)
sourceFormat maximumNumber(source, source)
sourceFormat minimumMagnitude(source, source)	double Math.MinMagnitude(double, double)	float MathF.MinMagnitude(float, float)
sourceFormat minimumMagnitudeNumber(source, source)
sourceFormat maximumMagnitude(source, source)	double Math.MaxMagnitude(double, double)	float MathF.MaxMagnitude(float, float)
sourceFormat maximumMagnitudeNumber(source, source)
sourceFormat getPayload(source)
sourceFormat setPayload(source)
sourceFormat setPayloadSignaling(source)

The following IEEE APIs are also "recommended" but cover modifying the floating-point environment, which we don't currently support:

binaryRoundingDirection getBinaryRoundingDirection(void)
void setBinaryRoundingDirection(binaryRoundingDirection)
modeGroup saveModes(void)
void restoreModes(modeGroup)
void defaultModes(void)

tannergooding · 2020-01-07T23:25:26Z

From the required operations, we are notably missing:

Conversion from float-format to int-format using a specified rounding direction
Conversion from hex-string to float-format (and vice-versa)
Unordered comparisons (due to NaN, x >= y is not the opposite of x < y)
An explicit IsZero API (although users can use x == 0, this is meant to be a separate explicit API)
An API which classifies the floating-point type (this is meant to be separate from the other Is* APIs)
An API to determine if we are spec compliant
An API to determine if two inputs have "total order" (Although IComparable provides similar functionality, it doesn't handle edge cases like +/-0 as the spec defines)
An API to explicitly get the radix (this is always 2 for float/double)
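Several of the gaps above can be approximated today by composing existing APIs. The following is a minimal sketch under stated assumptions, not a proposed API shape: every type and method name below is hypothetical, the integer conversions assume the rounded value fits in a `long` (NaN and out-of-range inputs are not handled), and the total-order key is the common bit-pattern trick rather than anything .NET currently exposes.

```csharp
using System;

// Hypothetical helper names; none of these are existing .NET APIs.
public static class RequiredOpsSketch
{
    // convertToIntegerTiesToEven: round in the requested direction, then convert.
    // Assumes the rounded value fits in a long.
    public static long ConvertToIntegerTiesToEven(double x) =>
        (long)Math.Round(x, MidpointRounding.ToEven);

    // convertToIntegerTowardNegative: same idea, using Math.Floor.
    public static long ConvertToIntegerTowardNegative(double x) =>
        (long)Math.Floor(x);

    // compareQuietUnordered: true when at least one operand is NaN.
    public static bool CompareQuietUnordered(double x, double y) =>
        double.IsNaN(x) || double.IsNaN(y);

    // compareQuietNotLess: true when x >= y OR the operands are unordered.
    // This differs from x >= y, which is false whenever a NaN is involved.
    public static bool CompareQuietNotLess(double x, double y) => !(x < y);

    // isZero: true for both +0.0 and -0.0.
    public static bool IsZero(double x) => x == 0.0;

    // Maps the raw bits to a signed key whose integer ordering matches
    // the IEEE total order: negative values have their magnitude bits
    // flipped so that larger magnitudes sort lower.
    private static long TotalOrderKey(double x)
    {
        long bits = BitConverter.DoubleToInt64Bits(x);
        return bits < 0 ? bits ^ long.MaxValue : bits;
    }

    // totalOrder: unlike ==, this distinguishes -0.0 from +0.0
    // and gives NaN values a defined position.
    public static bool TotalOrder(double x, double y) =>
        TotalOrderKey(x) <= TotalOrderKey(y);
}
```

For example, `RequiredOpsSketch.CompareQuietNotLess(double.NaN, 1.0)` is true while `double.NaN >= 1.0` is false, and `RequiredOpsSketch.TotalOrder(0.0, -0.0)` is false even though `0.0 == -0.0` is true.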

For the recommended operations, which provide more accurate results than users can get by composing the computation manually, we are notably missing the following:

The p1 (+1) and m1 (-1) APIs (for example: log2(1 + x))
The trigonometric pi operations (for example: sin(pi * x))
An API to compute the hypotenuse of a triangle
An API to compute the reciprocal square root
An API to compute an arbitrary root
An API to compound values
An API which specially handles integral and positive powers
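To illustrate why dedicated versions of these operations matter, here is a small sketch of what goes wrong when they are composed manually from the existing `Math` APIs. The helper names are hypothetical, and the scaled `Hypot` is only one common way to avoid intermediate overflow, not a correctly rounded implementation.

```csharp
using System;

// Hypothetical helper names, used only to illustrate the accuracy problem.
public static class RecommendedOpsSketch
{
    // Naive logp1: for tiny x, 1.0 + x rounds to exactly 1.0,
    // so the result collapses to 0 even though log(1 + x) is close to x.
    public static double NaiveLogP1(double x) => Math.Log(1.0 + x);

    // Naive sinPi: Math.PI is only an approximation of pi, so the
    // argument is slightly off and sinPi(1) does not come out as 0.
    public static double NaiveSinPi(double x) => Math.Sin(Math.PI * x);

    // Naive hypot: x * x can overflow even when the true result is finite.
    public static double NaiveHypot(double x, double y) =>
        Math.Sqrt(x * x + y * y);

    // Scaled hypot: divide by the larger magnitude so the squared
    // term stays in [0, 1] and cannot overflow.
    public static double Hypot(double x, double y)
    {
        x = Math.Abs(x);
        y = Math.Abs(y);
        if (double.IsInfinity(x) || double.IsInfinity(y)) return double.PositiveInfinity;
        if (double.IsNaN(x) || double.IsNaN(y)) return double.NaN;

        double max = Math.Max(x, y);
        double min = Math.Min(x, y);
        if (max == 0.0) return 0.0;

        double ratio = min / max;
        return max * Math.Sqrt(1.0 + ratio * ratio);
    }
}
```

With these definitions, `NaiveLogP1(1e-17)` returns 0 although the mathematical result is about 1e-17, and `NaiveHypot(3e200, 4e200)` overflows to infinity while `Hypot(3e200, 4e200)` stays finite.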

There are also several new recommended APIs in IEEE 754:2019:

Reduction operations which take "vectors" (arrays)
Augmented arithmetic which return a tuple (the result and the error from rounding the result)
Min/Max number APIs which were "required" in IEEE 754:2008, but which didn't clearly define NaN propagation
APIs to get/set the payload of a NaN

tannergooding · 2020-01-07T23:25:48Z

CC. @dotnet/fxdc, since this came up in API review today

tannergooding · 2020-03-05T21:44:23Z

Closing as duplicate of #27204, which I've updated to include the new functions.

tannergooding added the design-discussion and area-System.Numerics labels Jan 7, 2020

Dotnet-GitSync-Bot added the untriaged label Jan 7, 2020

Gnbrkm41 mentioned this issue Jan 11, 2020

Add the ability to parse/format a float/double from/to a hexadecimal literal #1630

Open

tannergooding closed this as completed Mar 5, 2020

tannergooding mentioned this issue Mar 5, 2020

Provide the full set of IEEE Operations/Behaviors required for compliance #27204

Closed

tannergooding removed the untriaged label Sep 14, 2020

ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
