invalid characters inserted #411
Comments
I can't repro. I have a feeling this is an encoding issue. How do you read the file into a string before sanitizing?
Hi Michael! Thanks for the fast reply! No, it's not an encoding issue. I wanted to provide a demo and found out that it occurs when you add "data" to the AllowedSchemes property.
I still can't repro. Can you provide a snippet of code that shows the issue?
Sure, it's based on .NET Framework 4.8
Still can't repro 🤷🏻‍♂️ Made a fresh console app and renamed the Program.txt to Program.cs. Had to rename the namespace to
Sorry, my fault. I forgot to check the HtmlSanitizer version in my test project. After updating to the latest version, it works with the code attached earlier.
This occurs due to a CSS rendering issue inside AngleSharp.Css, reported here: AngleSharp/AngleSharp.Css#123 The
Yes, we have already implemented a workaround. Thanks for analyzing and reporting!
This has been fixed in 8.0.691-beta. In addition to the bug in AngleSharp.Css, there was a bug in HtmlSanitizer that prevented this use case from working. This bug has been fixed in 8.0.692 as well, but note that this use case won't work in 8.0.692 due to the bug in AngleSharp.Css 0.17.0.
When sanitizing the HTML in the file below, the sanitizer inserts a special character into the style information.
dirtyhtml.txt
The inserted character is xFFFF (U+FFFF), which causes an exception when the result is passed to an XML serializer.
sanitiedhtml.txt
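For anyone hitting the same exception: U+FFFF is not a legal character in XML 1.0, which is why the serializer rejects the sanitized output. Below is a minimal sketch of a guard that strips such characters before serialization. `XmlCharGuard` and `StripInvalidXmlChars` are hypothetical names, not part of HtmlSanitizer, and the sample string is made up for illustration; it is not the content of the attached files, and this is not necessarily the workaround mentioned in the thread. It relies only on `XmlConvert.IsXmlChar` from the .NET base class library.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of HtmlSanitizer.
public static class XmlCharGuard
{
    // Returns a copy of the input without characters that XML 1.0 disallows
    // (for example U+FFFF). Valid surrogate pairs are kept intact.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a code point above U+FFFF.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (U+FFFF, lone surrogates, most control chars) is dropped.
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample resembling a style attribute with a stray U+FFFF.
        string sanitized = "<p style=\"color: red\uFFFF\">text</p>";
        string cleaned = XmlCharGuard.StripInvalidXmlChars(sanitized);
        Console.WriteLine(cleaned.IndexOf('\uFFFF')); // prints -1
    }
}
```

Stripping the character only hides the symptom; upgrading to a release that contains the fix is the better long-term option.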