Review all Encoding usage for BOM compatibility #1027
Comments
I have reviewed usages of Encoding (most commonly
The following cases ignore a BOM if present, and do not fail if there is not a BOM, and thus do not need to be changed to a BOM-less Encoding:
So you'll see in the PR that there are not very many changes to address BOM issues; that's because most usages fall into the buckets above.
Looks like you missed It has gone through several rounds of refactoring since then, but currently it has a
Given the fact that we added this field specifically because Side note: perhaps we should also rename
Is there an existing issue for this?
Task description
Java's StandardCharsets.UTF_8 does not write a Byte-Order Mark (BOM), while .NET's System.Text.Encoding.UTF8 does include a BOM by default. We have ensured that IOUtils.CHARSET_UTF_8 does not include a BOM to match Java, and as part of #1018 we've added an internal Support class to allow for using StandardCharsets.UTF_8, but we need to review all usage of System.Text.Encoding.UTF8 to determine if it should be replaced with StandardCharsets.UTF_8 or IOUtils.CHARSET_UTF_8 (whatever best matches the corresponding Java Lucene code) to avoid BOM issues.
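
For reference, the Java behavior that the port needs to match can be checked with a small standalone program. This is only a sketch: the class name Utf8BomDemo and its helper methods are hypothetical and are not part of Lucene or Lucene.NET. It encodes text through a Writer using StandardCharsets.UTF_8 and confirms that no BOM is written, then decodes bytes that start with a BOM to show that Java does not strip it on read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene.
final class Utf8BomDemo {
    // Encode text through a Writer, as typical file-writing code would.
    public static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True if the bytes begin with the UTF-8 BOM (EF BB BF).
    public static boolean startsWithBom(byte[] bytes) {
        return bytes.length >= 3
            && (bytes[0] & 0xFF) == 0xEF
            && (bytes[1] & 0xFF) == 0xBB
            && (bytes[2] & 0xFF) == 0xBF;
    }

    // Decode bytes as UTF-8; a leading BOM is kept as the character U+FEFF.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] written = encode("abc");
        System.out.println(written.length);          // 3: only the payload bytes
        System.out.println(startsWithBom(written));  // false

        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        String decoded = decode(withBom);
        System.out.println(decoded.length());             // 2: U+FEFF followed by 'a'
        System.out.println(decoded.charAt(0) == '\uFEFF'); // true
    }
}
```

On the .NET side, the closest built-in equivalent of this write behavior is a System.Text.UTF8Encoding constructed with encoderShouldEmitUTF8Identifier set to false, whereas the shared Encoding.UTF8 instance advertises a three-byte preamble that stream writers emit at the start of output.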