-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Conversation
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.Escaping.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs
Show resolved
Hide resolved
f96562e
to
b9e8703
Compare
b9e8703
to
9fbd6ee
Compare
Note: the second commit addressing feedback was just rebased onto the original (due to local issues with out-of-sync history causing unnecessary commits being shown) |
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Show resolved
Hide resolved
using (var writer = new StringWriter()) | ||
{ | ||
System.Text.Encodings.Web.JavaScriptEncoder.Default.Encode(writer, "\U0001f4a9"); | ||
Assert.Equal("\\uD83D\\uDCA9", writer.GetStringBuilder().ToString()); | ||
} | ||
|
||
Span<char> destination = new char[12]; | ||
OperationStatus status = System.Text.Encodings.Web.JavaScriptEncoder.Default.Encode("\U0001f4a9".AsSpan(), destination, out int charsConsumed, out int charsWritten, isFinalBlock: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, we should probably fix up the namespace of these tests (rename Microsoft.Framework.WebEncoders
to System.Text.Encodings.Web.Tests
) (outside of this PR, ofc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/System.Text.Encodings.Web/tests/JavaScriptStringEncoderTests.cs
Outdated
Show resolved
Hide resolved
Great. We should get this in for 3.0, imo. This should bring back some of the perf that was regressed previously when we moved to JavascriptEncoder (#39415): cc @ericstj
What's the reason for that? Is there a way for us to bring some of this perf improvement into S.T.E.W itself? |
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Show resolved
Hide resolved
The reason S.T.E.W is slower is because there is a class hierarchy (JavaScriptEncoder : TextEncoder) where both the default encoder and custom share the same (slower) code which is driven off of a combination of valid and invalid ranges which are flexible and can be specified by the caller -- and this code can't have the micro-optimizations that we have in S.T.J. However we could create a new internal type (DefaultJavaScriptEncoder in S.T.E.W) and move code from S.T.J to S.T.E.W however using a custom encoder is still going to be 25% slower since it won't use that fast code. Also note that S.T.E.W was already made 3x-4x faster from the original (for Utf8). |
src/System.Text.Encodings.Web/src/System/Text/Unicode/UnicodeHelpers.cs
Outdated
Show resolved
Hide resolved
It'd be good to have a unit test validating the changes for the latest iteration 390e6c7, but otherwise LGTM. Thanks! |
For performance adds the new API for encoding char. Also changes the semantics of System.Text.Json to use the replacement character for Utf16 for bad surrogate pairs instead of throwing - this makes it consistent with Utf8 and the default behavior of System.Text.Encoding.Json.
@@ -607,7 +606,10 @@ internal static OperationStatus EncodeUtf8Shim(TextEncoder encoder, ReadOnlySpan | |||
return OperationStatus.DestinationTooSmall; | |||
} | |||
|
|||
return EncodeIntoBuffer(destinationPtr, destination.Length, sourcePtr, source.Length, out charsConsumed, out charsWritten, firstCharacterToEncode, isFinalBlock); | |||
fixed (char* destinationPtr = destination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of deferring creating this fixed block? In general, is it best to keep your fixed blocks as small as possible and is there no value in trying to batch them together with other fixed block setup?
cc @jkotas
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
// Cast to (byte) for performance; avoids array bounds check. | ||
private static bool NeedsEscaping(char value) => value > LastAsciiCharacter || AllowList[(byte)value] == 0; | ||
private static bool NeedsEscaping(char value) => value > LastAsciiCharacter || AllowList[value] == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: implement this in terms of NeedsEscapingNoBoundsCheck
private static bool NeedsEscaping(char value)
=> value > LastAsciiCharacter || NeedsEscapingNoBoundsCheck(value);
@@ -696,6 +697,7 @@ public static void AssertContentsNotEqual(string expectedValue, ArrayBufferWrite | |||
); | |||
|
|||
// Temporary hack until we can use the same escape algorithm on both sides and make sure we want uppercase hex. | |||
// Todo: create new AssertContentsNotEqualAgainJsonNet to avoid calling NormalizeToJsonNetFormat when not necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create tracking issues for TODOs like this. They are easier to track/manage then searching for "todo" in source. Also, increases visibility for others to pick up issues like this.
For performance adds the new API for encoding char. Also changes the semantics of System.Text.Json to use the replacement character for Utf16 for bad surrogate pairs instead of throwing - this makes it consistent with Utf8 and the default behavior of System.Text.Encoding.Json.
For performance adds the new API for encoding char. Also changes the semantics of System.Text.Json to use the replacement character for Utf16 for bad surrogate pairs instead of throwing - this makes it consistent with Utf8 and the default behavior of System.Text.Encoding.Json. Commit migrated from dotnet/corefx@0cb8c78
Fixes #39523
For performance adds the API linked above for encoding char. Also changes the semantics of System.Text.Json to use the replacement character for Utf16 for bad surrogate pairs instead of throwing - this makes it consistent with Utf8 and the default behavior of System.Text.Encoding.Json.
Improves performance of Utf16 encoding (with custom encode) by ~25% by avoiding Utf16->Utf8->Utf16 conversions:
Also by changing code in System.Text.Json, improves performance of Utf8 escaping when no escaping needs to occur (~15% faster), or occurs later in the string:
FWIW the ASCII escaping in System.Text.Json is ~25% faster than System.Text.Encoding.Web.JavaScriptEncoder - thus the reason why there is some duplicate functionality between the two; S.T.J will only call S.T.E.W for non-ASCII or when using a non-default encoder.