Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert.Try{From/To}HexString #86556

Merged
merged 28 commits into from
Oct 31, 2023
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 94 additions & 0 deletions src/libraries/System.Private.CoreLib/src/System/Convert.cs
Original file line number Diff line number Diff line change
Expand Up @@ -2960,6 +2960,65 @@ public static byte[] FromHexString(ReadOnlySpan<char> chars)
return result;
}

/// <summary>
/// Converts the string, which encodes binary data as hex characters, to an equivalent 8-bit unsigned integer span.
/// </summary>
/// <param name="source">The string to convert.</param>
/// <param name="destination">
/// The span in which to write the converted 8-bit unsigned integers. When this method returns false,
adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
/// either the span remains unmodified or contains an incomplete conversion of <paramref name="source"/>,
/// up to the last valid character.
/// </param>
/// <param name="bytesWritten">When this method returns, contains the number of bytes that were written in <paramref name="destination"/>.</param>
adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
/// <returns>true if the conversion was successful; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Passed string <paramref name="source"/> is null.</exception>
public static bool TryFromHexString(string source, Span<byte> destination, out int bytesWritten)
{
ArgumentNullException.ThrowIfNull(source);

return TryFromHexString(source.AsSpan(), destination, out bytesWritten);
}

/// <summary>
/// Converts the span of chars, which encodes binary data as hex characters, to an equivalent 8-bit unsigned integer span.
/// </summary>
/// <param name="source">The span to convert.</param>
/// <param name="destination">
/// The span in which to write the converted 8-bit unsigned integers. When this method returns false,
adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
/// either the span remains unmodified or contains an incomplete conversion of <paramref name="source"/>,
/// up to the last valid character.
/// </param>
/// <param name="bytesWritten">When this method returns, contains the number of bytes that were written in <paramref name="destination"/>.</param>
adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
/// <returns>true if the conversion was successful; otherwise, false.</returns>
public static bool TryFromHexString(ReadOnlySpan<char> source, Span<byte> destination, out int bytesWritten)
{
var length = source.Length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var length = source.Length;
int length = source.Length;

Use concrete types here. Same on other places.

Perf-wise this could be nuint, so all the division and modulo operations are treated as (fast) bit-operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf-wise this could be nuint, so all the division and modulo operations are treated as (fast) bit-operations.

That's interesting. We rarely use native width types except in interop contexts. @jkotas @EgorBo is it correct that in hot code of this type, it can have perf benefit for simple storage of counters and offsets?

Copy link
Member

@EgorBo EgorBo May 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afair, JIT knows that Span.Length (and Array.Length as well) are never negative so it doesn't need extra efforts from the C# side 🙂 e.g.:

int Foo(Span<int> s) => s.Length % 4;
; Method Foo(System.Span`1[int]):int:this
       mov      eax, dword ptr [rdx+08H]
       and      eax, 3
       ret      
; Total bytes of code: 7

(some of the changes landed in .NET 8.0 so might be not visible on sharplab yet)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right -- I'm not aware of patterns where nuint gives better code.

Copy link
Member

@gfoidl gfoidl May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT knows that Span.Length (and Array.Length as well) are never negative

Thanks for the reminder -- I'll forget sometimes about the new JIT-goodness.

We rarely use native width types except in interop contexts

Quite a lot of vectorized code uses the native width types, because memory operations don't need (sign-) extending moves then (e.g. Unsafe.Add(ref ptr, intVariable) vs. Unsafe.Add(ref ptr, nuintVariable) -- https://godbolt.org/z/v8E31Wdsx).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Math.DivRem be more efficient here? Especially once it becomes an intrinsic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Math.DivRem be more efficient here? Especially once it becomes an intrinsic.

I doubt it, since the JIT optimizes length % 2 != 0 to (length & 1) != 0.

Copy link
Member

@BrennanConroy BrennanConroy May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment wasn't just about the single length check, but also the division done a few lines later. DivRem in theory combines the two operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the division done a few lines later

The JIT can optimize (uint)length / 2 to (uint)length >> 1. On X86, DivRem would likely be implemented using a div instruction, which is an relatively expensive operation.


if (length % 2 != 0)
goto FalseResult;

if (length == 0)
{
bytesWritten = 0;
return true;
}

var halfLength = length / 2;

if (destination.Length < halfLength)
goto FalseResult;

if (!HexConverter.TryDecodeFromUtf16(source, destination))
goto FalseResult;

bytesWritten = halfLength;
return true;

FalseResult:
bytesWritten = 0;
return false;
MihaZupan marked this conversation as resolved.
Show resolved Hide resolved
}

/// <summary>
/// Converts an array of 8-bit unsigned integers to its equivalent string representation that is encoded with uppercase hex characters.
/// </summary>
Expand Down Expand Up @@ -3011,5 +3070,40 @@ public static string ToHexString(ReadOnlySpan<byte> bytes)

return HexConverter.ToString(bytes, HexConverter.Casing.Upper);
}


/// <summary>
/// Converts a span of 8-bit unsigned integers to its equivalent span representation that is encoded with uppercase hex characters.
/// </summary>
/// <param name="source">A span of 8-bit unsigned integers.</param>
/// <param name="destination">The span representation in hex of the elements in <paramref name="source"/>.</param>
/// <param name="charsWritten">When this method returns, contains the number of chars that were written in <paramref name="destination"/>.</param>
/// <returns>true if the conversion was successful; otherwise, false.</returns>
public static bool TryToHexString(ReadOnlySpan<byte> source, Span<char> destination, out int charsWritten)
{
var length = source.Length;

if (length > int.MaxValue / 2)
goto FalseResult;

if (length == 0)
{
charsWritten = 0;
return true;
}

int twiceLength = length * 2;
MihaZupan marked this conversation as resolved.
Show resolved Hide resolved

if (destination.Length < twiceLength)
goto FalseResult;

HexConverter.EncodeToUtf16(source, destination);
charsWritten = twiceLength;
return true;

FalseResult:
charsWritten = 0;
return false;
MihaZupan marked this conversation as resolved.
Show resolved Hide resolved
}
} // class Convert
} // namespace
Original file line number Diff line number Diff line change
Expand Up @@ -34,90 +34,80 @@ public static void CompleteValueRange()

private static void TestSequence(byte[] expected, string actual)
{
Assert.Equal(expected, Convert.FromHexString(actual));
}

[Fact]
public static void InvalidInputString_Null()
{
AssertExtensions.Throws<ArgumentNullException>("s", () => Convert.FromHexString(null));
}

[Fact]
public static void InvalidInputString_HalfByte()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("ABC"));
}

[Fact]
public static void InvalidInputString_BadFirstCharacter()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("x0"));
}

[Fact]
public static void InvalidInputString_BadSecondCharacter()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("0x"));
}

[Fact]
public static void InvalidInputString_NonAsciiCharacter()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("0\u0308"));
}
byte[] fromResult = Convert.FromHexString(actual);
Assert.Equal(expected, fromResult);

[Fact]
public static void InvalidInputString_ZeroWidthSpace()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("\u200B 000102FDFEFF"));
}
Span<byte> tryResult = stackalloc byte[actual.Length / 2];
adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
Assert.True(Convert.TryFromHexString(actual, tryResult, out int written));
Assert.Equal(fromResult.Length, written);
AssertExtensions.SequenceEqual(expected, tryResult);

adamsitnik marked this conversation as resolved.
Show resolved Hide resolved
[Fact]
public static void InvalidInputString_LeadingWhiteSpace()
{
Assert.Throws<FormatException>(() => Convert.FromHexString(" 000102FDFEFF"));
}

[Fact]
public static void InvalidInputString_TrailingWhiteSpace()
public static void InvalidInputString_Null()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("000102FDFEFF "));
AssertExtensions.Throws<ArgumentNullException>("s", () => Convert.FromHexString(null));
Assert.False(Convert.TryFromHexString(null, default, out _));
}

[Fact]
public static void InvalidInputString_WhiteSpace()
[Theory]
[InlineData("01-02-FD-FE-FF")]
[InlineData("00 01 02FD FE FF")]
[InlineData("000102FDFEFF ")]
[InlineData(" 000102FDFEFF")]
[InlineData("\u200B 000102FDFEFF")]
[InlineData("0\u0308")]
[InlineData("0x")]
[InlineData("x0")]
[InlineData("ABC")] // HalfByte
public static void InvalidInputString_FormatException_Or_FalseResult(string invalidInput)
{
Assert.Throws<FormatException>(() => Convert.FromHexString("00 01 02FD FE FF"));
}
Assert.Throws<FormatException>(() => Convert.FromHexString(invalidInput));

[Fact]
public static void InvalidInputString_Dash()
{
Assert.Throws<FormatException>(() => Convert.FromHexString("01-02-FD-FE-FF"));
Span<byte> buffer = stackalloc byte[invalidInput.Length / 2];
Assert.False(Convert.TryFromHexString(invalidInput.AsSpan(), buffer, out _));
}

[Fact]
public static void ZeroLength()
{
Assert.Same(Array.Empty<byte>(), Convert.FromHexString(string.Empty));

bool tryResult = Convert.TryFromHexString(string.Empty, Span<byte>.Empty, out int written);

Assert.True(tryResult);
Assert.Equal(0, written);
}

[Fact]
public static void ToHexFromHexRoundtrip()
{
for (int i = 1; i < 50; i++)
const int LoopCount = 50;
Span<char> buffer = stackalloc char[LoopCount * 2];
for (int i = 1; i < LoopCount; i++)
{
byte[] data = System.Security.Cryptography.RandomNumberGenerator.GetBytes(i);
byte[] data = Security.Cryptography.RandomNumberGenerator.GetBytes(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use a seeded Random so the tests are reproducible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Dan, but since the existing tests in this type were already using Security.Cryptography.RandomNumberGenerator it's acceptable to keep it.

string hex = Convert.ToHexString(data);
Assert.Equal(data, Convert.FromHexString(hex.ToLowerInvariant()));
Assert.Equal(data, Convert.FromHexString(hex.ToUpperInvariant()));

Span<char> currentBuffer = buffer.Slice(0, i);
bool tryHex = Convert.TryToHexString(data, currentBuffer, out int written);
Assert.True(tryHex);
AssertExtensions.SequenceEqual(hex.AsSpan(), currentBuffer);
Assert.Equal(hex.Length, written);

TestSequence(data, hex);
TestSequence(data, hex.ToLowerInvariant());
TestSequence(data, hex.ToUpperInvariant());

string mixedCase1 = hex.Substring(0, hex.Length / 2).ToUpperInvariant() +
hex.Substring(hex.Length / 2).ToLowerInvariant();
string mixedCase2 = hex.Substring(0, hex.Length / 2).ToLowerInvariant() +
hex.Substring(hex.Length / 2).ToUpperInvariant();
Assert.Equal(data, Convert.FromHexString(mixedCase1));
Assert.Equal(data, Convert.FromHexString(mixedCase2));

TestSequence(data, mixedCase1);
TestSequence(data, mixedCase2);

Assert.Throws<FormatException>(() => Convert.FromHexString(hex + " "));
Assert.Throws<FormatException>(() => Convert.FromHexString("\uAAAA" + hex));
}
Expand Down
3 changes: 3 additions & 0 deletions src/libraries/System.Runtime/ref/System.Runtime.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1119,7 +1119,9 @@ public static partial class Convert
public static byte[] FromBase64CharArray(char[] inArray, int offset, int length) { throw null; }
public static byte[] FromBase64String(string s) { throw null; }
public static byte[] FromHexString(System.ReadOnlySpan<char> chars) { throw null; }
public static bool TryFromHexString(System.ReadOnlySpan<char> source, Span<byte> destination, out int bytesWritten) { throw null; }
Copy link
Member

@adamsitnik adamsitnik Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that you have provided the PR before the API got approved: #78472 (comment)

In the approved proposal the methods have been renamed (no Try prefix) and they are supposed to return OperationStatus rather than bool. Please change the contract and implementation to reflect that.

They also include two out parameters. But since charsConsumed would be always equal bytesWritten I think it's reasonable to have only one parameter. We did exactly the same thing for Ascii:

https://github.com/dotnet/runtime/blob/24d7f5a58ac4bcee1ec5197427068e437f19552f/src/libraries/System.Runtime/ref/System.Runtime.cs#L14306-L14307

But I don't have the power to approve API changes without board review. @bartonjs can we have only one out parameter here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charsConsumed will be twice bytesWritten, unless the method supports whitespace (which I don't remember if it's supposed to, or not) and then it'll be charsConsumed >= 2 * bytesWritten.

The pattern for OperationStatus-returning members is OperationStatus [Method](ReadOnlySpan<TSource> source, Span<TDestination> destination, [required options], out {bytes|chars|items}Consumed, out {bytes|chars|items}Written, [defaulted parameters]), even when consumed and written will have a known/stable relationship.

public static byte[] FromHexString(string s) { throw null; }
public static bool TryFromHexString(string source, Span<byte> destination, out int bytesWritten) { throw null; }
public static System.TypeCode GetTypeCode(object? value) { throw null; }
public static bool IsDBNull([System.Diagnostics.CodeAnalysis.NotNullWhenAttribute(true)] object? value) { throw null; }
public static int ToBase64CharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut) { throw null; }
Expand Down Expand Up @@ -1265,6 +1267,7 @@ public static partial class Convert
public static string ToHexString(byte[] inArray) { throw null; }
public static string ToHexString(byte[] inArray, int offset, int length) { throw null; }
public static string ToHexString(System.ReadOnlySpan<byte> bytes) { throw null; }
public static bool TryToHexString(System.ReadOnlySpan<byte> source, System.Span<char> destination, out int charsWritten) { throw null; }
public static short ToInt16(bool value) { throw null; }
public static short ToInt16(byte value) { throw null; }
public static short ToInt16(char value) { throw null; }
Expand Down