
invalid characters inserted #411

Closed
NGemballa opened this issue Dec 2, 2022 · 9 comments

@NGemballa

When sanitizing the HTML in the file below, the sanitizer inserts a special character into the style information:
dirtyhtml.txt

The inserted character is U+FFFF, which causes an exception when the result is passed to an XML serializer.
sanitiedhtml.txt
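U+FFFF is not a legal character in XML 1.0 because it falls outside the spec's `Char` production, and that is why the serializer throws. Until the sanitizer output is clean, one possible stopgap is to detect or strip such characters before serializing. Below is a minimal Python sketch of that idea. The function names are hypothetical and are not part of HtmlSanitizer, and removing characters silently may hide the underlying bug:

```python
import re

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside these ranges (e.g. U+FFFF, U+0000) is illegal in XML 1.0.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character XML 1.0 does not allow."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that would make an XML serializer reject the text."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sanitized = 'font-family: "Arial\uffff'
    print(find_invalid_xml_chars(sanitized))  # [(19, 'U+FFFF')]
    print(strip_invalid_xml_chars(sanitized))  # font-family: "Arial
```

Reporting the offending positions first, rather than only stripping, makes it easier to attach a concrete reproduction to a bug report.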

@mganss
Owner

mganss commented Dec 2, 2022

I can't repro. I have a feeling this is an encoding issue. How do you read the file into a string before sanitizing?

@NGemballa
Author

Hi Michael!

Thanks for the fast reply!

No, it's not an encoding issue. While preparing a demo, I found out that it occurs when you add "data" to the AllowedSchemes property.

@mganss
Owner

mganss commented Dec 2, 2022

I still can't repro. Can you provide a snippet of code that shows the issue?

@NGemballa
Author

Sure, it's based on .NET Framework 4.8
Program.txt

@mganss
Owner

mganss commented Dec 2, 2022

Still can't repro 🤷🏻‍♂️ I made a fresh console app and renamed Program.txt to Program.cs. I had to change the namespace to Ganss.Xss to accommodate the latest version of HtmlSanitizer.

@NGemballa
Author

Sorry, my fault. I forgot to check the HtmlSanitizer version in my test project. After updating to the latest version, it works with the code attached earlier.
However, I still have the issue with the original code. I attached a demo project including the source HTML (stripped down to avoid sharing personal data). The special character is added in the sanitized HTML.
Hope that helps to reproduce the issue.
HtmlSanitizerTest.zip

@mganss
Owner

mganss commented Dec 13, 2022

This occurs due to a CSS rendering issue inside AngleSharp.Css reported here: AngleSharp/AngleSharp.Css#123

The " characters inside the style attribute are unbalanced, which may be what triggers the issue. Perhaps you can work around it by fixing this in the original source.
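To locate such attributes in the source HTML before sanitizing, here is a rough Python sketch that uses only the standard library's `html.parser`. It flags `style` values containing an odd number of `"` characters. The class and function names are hypothetical and not part of HtmlSanitizer or AngleSharp. This is a heuristic that only counts raw quotes, so it does not account for CSS escapes such as `\"`:

```python
from html.parser import HTMLParser


class UnbalancedStyleQuoteFinder(HTMLParser):
    """Collect style attribute values whose double quotes are unbalanced.

    Heuristic only: counts raw '"' characters and ignores CSS escaping.
    Attribute values are entity-decoded by HTMLParser, so &quot; counts too.
    """

    def __init__(self) -> None:
        super().__init__()
        # Each finding is (tag name, 1-based source line, style value).
        self.findings: list[tuple[str, int, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value is not None and value.count('"') % 2 == 1:
                line, _ = self.getpos()  # position of the start tag
                self.findings.append((tag, line, value))

    # Self-closing tags (<img ... />) go through handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


def find_unbalanced_style_quotes(html: str) -> list[tuple[str, int, str]]:
    finder = UnbalancedStyleQuoteFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    sample = (
        "<p style='font-family: \"Arial'>x</p>\n"
        '<div style="font-family:&quot;Calibri&quot;">y</div>'
    )
    for tag, line, style in find_unbalanced_style_quotes(sample):
        print(f"line {line}: <{tag}> style={style!r}")
```

Because attribute values are entity-decoded, a correctly escaped value like `&quot;Calibri&quot;` is treated as balanced and is not reported.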

@NGemballa
Author

Yes, we already did a workaround.

Thanks for analyzing and reporting!

mganss added a commit that referenced this issue Aug 3, 2023
mganss added a commit that referenced this issue Aug 3, 2023
@mganss mganss closed this as completed in ad12ca0 Aug 3, 2023
@mganss
Owner

mganss commented Aug 3, 2023

This has been fixed in 8.0.691-beta. In addition to the bug in AngleSharp.Css, there was a bug in HtmlSanitizer that prevented this use case from working. That bug has also been fixed in 8.0.692, but note that this use case won't work in 8.0.692 due to the bug in AngleSharp.Css 0.17.0.
