Converting from GB18030 encoded bytes to chars throws Exception #110521

Closed
uBpringlIoNaRys opened this issue Dec 9, 2024 · 7 comments

uBpringlIoNaRys commented Dec 9, 2024

Description

Converting GB18030-encoded bytes to a string throws an exception on .NET 9.

Reproduction Steps

This code throws the exception on .NET 9 but not on .NET 8.

using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var encoding = Encoding.GetEncoding("GB18030");

ReadOnlySpan<byte> encodedBytes = [0x95, 0x32, 0xB7, 0x37];

// This call throws; encoding.GetString and encoding.GetChars throw as well.
var actual = encoding.GetCharCount(encodedBytes);

Console.WriteLine($"CharCount: {actual}");

var bytes = encoding.GetBytes("𠈓");
Console.WriteLine($"EncodedBytes of character match encodedBytes span: {bytes.AsSpan().SequenceEqual(encodedBytes)}");

Expected behavior

The conversion from bytes to chars does not throw an exception.

Actual behavior

Unhandled exception. System.ArgumentException: The output char buffer is too small to contain the decoded characters, encoding 'Chinese Simplified (GB18030)' fallback 'System.Text.DecoderReplacementFallback'. (Parameter 'chars')
   at System.Text.EncodingNLS.ThrowCharsOverflow()
   at System.Text.EncodingNLS.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
   at System.Text.EncodingCharBuffer.AddChar(Char ch1, Char ch2, Int32 numBytes)
   at System.Text.GB18030Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
   at System.Text.GB18030Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
   at System.Text.Encoding.GetCharCount(ReadOnlySpan`1 bytes)
   at Program.<Main>$(String[] args) in C:\git\Receiver\SYC_Appliance\Test\GB18030DecodeFailure\Program.cs:line 12
   

Regression?

Yes

Known Workarounds

None

Configuration

.NET 9
Windows
x64

Other information

A change (861164c) modified the if statement in EncodingCharBuffer.AddChar. The logic for the uninitialized char buffer in "counting" mode may have changed as a result.
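For context, the sample bytes decode to a code point outside the Basic Multilingual Plane, which would explain why the two-char overload AddChar(Char ch1, Char ch2, Int32 numBytes) appears in the stack trace. The sketch below is only an illustration of that arithmetic, assuming the standard GB18030 four-byte linear mapping for supplementary planes (90 30 81 30 maps to U+10000). It does not use or reproduce the runtime's decoder, and the variable names are made up for the example.

```csharp
using System;

// Four-byte GB18030 sequence from the reproduction steps.
byte[] b = { 0x95, 0x32, 0xB7, 0x37 };

// Linear offset from the first supplementary-plane sequence 90 30 81 30.
// Byte ranges: b1 steps by 12600, b2 by 1260, b3 by 10, b4 by 1.
int linear = (b[0] - 0x90) * 12600
           + (b[1] - 0x30) * 1260
           + (b[2] - 0x81) * 10
           + (b[3] - 0x30);

// 90 30 81 30 corresponds to U+10000.
int codePoint = 0x10000 + linear;

// A supplementary-plane code point needs a UTF-16 surrogate pair.
string s = char.ConvertFromUtf32(codePoint);

Console.WriteLine($"U+{codePoint:X}");                    // U+20213
Console.WriteLine(s.Length);                              // 2
Console.WriteLine($"{(int)s[0]:X4} {(int)s[1]:X4}");      // D840 DE13
```

Under that assumption, the expected result of GetCharCount for these bytes is 2, so the decoder has to emit a surrogate pair even when it is only counting.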

dotnet-policy-service bot added the untriaged label Dec 9, 2024
@uBpringlIoNaRys
Author

Our product supports receiving GB18030 encoded text in DICOM data.

This error prevents us from migrating to .NET 9.

Will this breaking change be fixed, or is this encoding no longer supported?

Member

am11 commented Dec 10, 2024

Works with .NET 8 https://dotnetfiddle.net/BUQChY
Fails with .NET 9 https://dotnetfiddle.net/3N5Syf

Member

ericstj commented Jan 10, 2025

@tarekgh can you please have a look?

Contributor

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

tarekgh added this to the 9.0.x milestone Jan 10, 2025
tarekgh removed the untriaged label Jan 10, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Member

tarekgh commented Jan 10, 2025

This is a regression from change #97950. I'll work on fixing it.

CC @jkotas

Member

tarekgh commented Jan 14, 2025

This is now fixed by PR #111367. The fix will be released in the next 9.0 servicing release.

tarekgh closed this as completed Jan 14, 2025