Unwanted/Risky UTF8 Byte Order marks at the start of the http responses. #847

myobis · 2022-03-25T14:57:53Z

After noticing minor glitches in some client code, I used Telerik Fiddler (HexView mode) to confirm that SoapCore emits a UTF-8 byte order mark (BOM) at the very beginning of its HTTP response bodies.

The BOM serves no purpose there and may be harmful to clients that do not expect it.

The case I noticed can easily be fixed in SoapEncoderOptions.cs by replacing
Encoding.UTF8 with new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).
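
For illustration, here is a minimal sketch of the difference between the two encodings. It is not taken from SoapCore, and the class name BomPreambleDemo and the helper PreambleHex are hypothetical. It only compares the preamble (BOM) bytes that each encoding instance advertises.

```csharp
using System;
using System.Text;

public static class BomPreambleDemo
{
    // Returns the preamble (BOM) bytes of an encoding as a hex string,
    // or an empty string when the encoding has no preamble.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // The shared Encoding.UTF8 instance advertises the 3-byte UTF-8 BOM.
        Console.WriteLine("[" + PreambleHex(Encoding.UTF8) + "]"); // prints "[EF-BB-BF]"

        // An instance created with encoderShouldEmitUTF8Identifier: false has no preamble.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        Console.WriteLine("[" + PreambleHex(utf8NoBom) + "]"); // prints "[]"
    }
}
```

Both instances encode text to the same bytes. Only the preamble differs, which is why the replacement should not change the payload itself.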

That said, there are plenty of other occurrences of Encoding.UTF8 in the code. Some are harmless, such as Encoding.UTF8.GetBytes(string), which never emits a BOM; others might be a problem.
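
As a rough sketch of why some usages matter and others do not (assuming, for illustration only, that the risky code paths hand the encoding to a stream writer), the following compares GetBytes with a StreamWriter. The names BomWriterDemo and WriteWithStreamWriter are hypothetical and do not come from SoapCore.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomWriterDemo
{
    // Writes text through a StreamWriter and returns the raw bytes that reached the stream.
    public static byte[] WriteWithStreamWriter(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        } // Disposing the writer flushes it; ToArray still works on the closed MemoryStream.
        return stream.ToArray();
    }

    public static void Main()
    {
        const string xml = "<a/>";

        // GetBytes encodes the characters only and never prepends the preamble.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(xml)));
        // prints "3C-61-2F-3E"

        // A StreamWriter writes the encoding's preamble at the start of the stream.
        Console.WriteLine(BitConverter.ToString(WriteWithStreamWriter(xml, Encoding.UTF8)));
        // prints "EF-BB-BF-3C-61-2F-3E"

        // With a BOM-less UTF8Encoding the writer output matches GetBytes.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        Console.WriteLine(BitConverter.ToString(WriteWithStreamWriter(xml, utf8NoBom)));
        // prints "3C-61-2F-3E"
    }
}
```

Under that assumption, the occurrences worth reviewing are those where the encoding object is passed to a writer, rather than those that call GetBytes directly.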


andersjonsson · 2022-03-25T15:49:03Z

@myobis Nice catch! Thanks

Mind checking out my PR to see if that fixes the issue?
I get nervous changing things like this, but I find it unlikely that anyone depends on the BOM being there.

myobis · 2022-03-27T19:55:11Z

@andersjonsson, thanks for the fix.

First, your fix does work for my client relying on UTF8: no more glitches 👌.

I also have the following comment about your PR:

I'm not a regular user of the Unicode and BigEndianUnicode encodings. However, if they behave like UTF8, I guess there should be no BOM at the start of the HTTP response bodies for these encodings either.
This dotnetfiddle (https://dotnetfiddle.net/t3J1xl) shows that they all emit BOMs and suggests replacing each of them with an instance created via new UnicodeEncoding(bool, bool).

You might want to adjust DefaultEncodings.cs accordingly.
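
The fiddle's contents are not reproduced in this thread. As an independent sketch of the same idea, the code below compares the built-in UTF-16 encodings with BOM-less instances created through the UnicodeEncoding(bool bigEndian, bool byteOrderMark) constructor. The class name Utf16BomDemo and the helper PreambleHex are hypothetical, and this is not the code from DefaultEncodings.cs.

```csharp
using System;
using System.Text;

public static class Utf16BomDemo
{
    // Returns the preamble (BOM) bytes of an encoding as a hex string,
    // or an empty string when the encoding has no preamble.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // The built-in UTF-16 instances both advertise a 2-byte BOM.
        Console.WriteLine("[" + PreambleHex(Encoding.Unicode) + "]");          // prints "[FF-FE]"
        Console.WriteLine("[" + PreambleHex(Encoding.BigEndianUnicode) + "]"); // prints "[FE-FF]"

        // Same byte order, but constructed with byteOrderMark: false.
        var utf16LeNoBom = new UnicodeEncoding(bigEndian: false, byteOrderMark: false);
        var utf16BeNoBom = new UnicodeEncoding(bigEndian: true, byteOrderMark: false);
        Console.WriteLine("[" + PreambleHex(utf16LeNoBom) + "]"); // prints "[]"
        Console.WriteLine("[" + PreambleHex(utf16BeNoBom) + "]"); // prints "[]"

        // The encoded payload keeps the expected byte order.
        Console.WriteLine(BitConverter.ToString(utf16LeNoBom.GetBytes("A"))); // prints "41-00"
        Console.WriteLine(BitConverter.ToString(utf16BeNoBom.GetBytes("A"))); // prints "00-41"
    }
}
```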

andersjonsson · 2022-03-28T06:47:30Z

> I'm not a regular user of Unicode and BigEndianUnicode encodings. However, if it is similar to UTF8, I guess there should be no BOM at the start of the http response bodies for these encodings as well.

Since the charset is set to utf-16LE or utf-16BE in those cases, I think you are correct.
From the Wikipedia page on UTF-16:
"For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark should not be used because the names of these character sets already determine the byte order."

itssimple mentioned this issue Mar 25, 2022

fix: Don't output the BOM, it can crash clients. #848

Closed

andersjonsson mentioned this issue Mar 25, 2022

use UTF8 without BOM as default #849

Merged

andersjonsson closed this as completed in #849 Mar 28, 2022
