Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #482: "Error when encoded quotes are used in html attribute" #526

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dminkovsky
Copy link

@dminkovsky dminkovsky commented Feb 4, 2025

Fix #482

This PR adds an option to decode HTML entities in the style attribute before it is handed off to mensch for parsing. The problem is that although HTML entities are valid in style attribute values, they are not valid in CSS strings, resulting in a mensch parse that contains invalid properties/values, and ultimately causes juice to fail.

@@ -37,6 +37,7 @@
"dependencies": {
"cheerio": "^1.0.0",
"commander": "^12.1.0",
"entities": "^4.5.0",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already a transitive dependency through cheerio. I chose this version because our version of cheerio depends on 4.5.0.

@cossssmin
Copy link
Collaborator

Looks good to me. @jrit?

@dminkovsky
Copy link
Author

dminkovsky commented Feb 5, 2025

I was looking at this a bit more, because I would have expected cheerio to decode on parse. But the cheerio.load usage in juice does not decode entities on parse. I investigated, and determined this was because of the decodeEntities: false on line 11 here:

juice/lib/cheerio.js

Lines 9 to 13 in ff01a24

var cheerioLoad = function(html, options, encodeEntities) {
const { xmlMode, ...rest } = options;
options = Object.assign({ xml: { decodeEntities: false, xmlMode } }, rest);
html = encodeEntities(html);
return cheerio.load(html, options);

Now, you'd think that anything inside xml: {} would only apply if options.xmlMode is true, but that's not the case: if you set decodeEntities to false in xml like like juice does, then the entities are never decoded, regardless of whether options.xmlMode is true or false.

I don't know what to make of this, but this is longstanding juice behavior, so it's probably best to leave alone.

@jrit
Copy link
Collaborator

jrit commented Feb 6, 2025

Looks good to me. To answer the question of why it would be like that - it is probably because there were some old issues of people using juice in really unique ways to preprocess odd bits of code and the less juice modified those things by trying to be smart about encodings the better it seemed to support those scenarios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when encoded quotes are used in html attribute
3 participants