Multiple issues with numeric entities #33

raphlinus · 2015-05-06T06:07:22Z

Single-digit decimal entities are sometimes recognized and sometimes not. I believe the culprit is the size > 3 test at https://github.com/jgm/cmark/blob/master/src/houdini_html_u.c#L15. When, for example, &#9; appears at the end of a line, size == 3 and the test fails.
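To make the off-by-one concrete, here is a minimal sketch of the guard. It assumes, as the size == 3 case suggests, that src points just past the & and that size counts the remaining bytes. The helper name may_be_numeric_entity is hypothetical and is not a cmark function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not cmark's API). Assumes src points just past the
 * '&' and size is the number of bytes remaining in the buffer.
 * Returns nonzero if the buffer can hold the shortest numeric entity body:
 * '#', one digit, ';'. */
int may_be_numeric_entity(const uint8_t *src, size_t size)
{
    /* "#9;" is exactly 3 bytes, so a "size > 3" guard wrongly rejects it
     * whenever nothing follows the ';' (e.g. at the end of a line). */
    return size >= 3 && src[0] == '#';
}
```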

The entity &#0; is not recognized as an entity. This seems to be out of compliance with the current state of the spec, which asks for all 1-8 digit sequences to be recognized. For this case, perhaps the spec should be changed instead; there is a separate open issue, commonmark/commonmark-spec#323, about handling of NULL.

Invalid Unicode code points are passed through to the final render without replacement. For example, &#xd800; is rendered as b'<p>\xed\xa0\x80</p>\n'. These should be replaced with U+FFFD at parse time.
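A minimal sketch of the replacement being asked for, using hypothetical helpers sanitize_codepoint and utf8_encode (not cmark's API and not the code merged in #37). It maps NUL, UTF-16 surrogates (U+D800 to U+DFFF) and values above U+10FFFF to U+FFFD before UTF-8 encoding, so &#xd800; would produce the bytes EF BF BD instead of ED A0 80.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps a decoded numeric-entity value to the code
 * point that should be emitted. NUL, surrogates and values beyond
 * U+10FFFF become U+FFFD REPLACEMENT CHARACTER. */
uint32_t sanitize_codepoint(uint32_t cp)
{
    if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0xFFFD;
    return cp;
}

/* Hypothetical helper: encodes a valid scalar value as UTF-8 into out
 * (room for 4 bytes) and returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

Without the sanitize step, utf8_encode(0xD800, buf) yields exactly the ED A0 80 sequence reported above.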

Entities with more than 8 digits are interpreted as numeric entities. According to the spec, they should be treated as literal text.

Currently, during parsing of entities, the int codepoint is subject to integer overflow, which is undefined behavior in C (yes, I know this is insane, but when you lie down with C, you get up with UB). A sufficiently smart compiler could optimize away the if (cp < codepoint) test because negative values are impossible. This issue would be mitigated somewhat by using a maximum of 8 digits, but &#x80000000 would still provoke it. My recommendation is to use uint32_t and bail when the number of digits exceeds 8.
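The recommendation above (unsigned accumulator, bail after 8 digits) could look roughly like this sketch. parse_numeric_entity is a hypothetical name, and the input convention (src points just past the &) is an assumption. Because at most 8 digits are accepted, even 8 hex digits fit in a uint32_t, so &#x80000000; parses without any overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser (not cmark's code). src points just past the '&'.
 * On success, stores the value in *out and returns the number of bytes
 * consumed including the ';'. Returns 0 if this is not a numeric entity
 * with 1-8 digits; the caller then treats the text literally. */
size_t parse_numeric_entity(const uint8_t *src, size_t size, uint32_t *out)
{
    size_t i = 1;            /* skip '#' */
    unsigned base = 10;
    uint32_t cp = 0;         /* unsigned: no signed-overflow UB */
    size_t digits = 0;

    if (size < 3 || src[0] != '#')
        return 0;
    if (src[1] == 'x' || src[1] == 'X') {
        base = 16;
        i = 2;
    }
    for (; i < size; i++) {
        uint8_t c = src[i];
        unsigned d;
        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            break;
        if (++digits > 8)
            return 0;        /* bail: more than 8 digits is literal text */
        cp = cp * base + d;  /* 8 digits never exceed 32 bits */
    }
    if (digits == 0 || i >= size || src[i] != ';')
        return 0;
    *out = cp;
    return i + 1;
}
```

The returned value still needs the U+FFFD check for NUL, surrogates and values above U+10FFFF.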


nwellnhof · 2015-05-07T14:49:08Z

These are all good points. I can offer to make the following fixes to houdini_unescape_ent:

Accept numeric entities with a single digit.
Replace invalid Unicode code points and &#0; with U+FFFD.
Address integer overflow. It's enough to abort if the accumulated value exceeds 0x10FFFF.

I'll leave the issue of entities with more than 8 digits open for now.
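The proposal to bound the accumulated value rather than the digit count can be sketched as follows. accumulate_hex_capped is a hypothetical helper, and rather than aborting outright it saturates at 0x110000 while still counting digits; that variant is an assumption chosen here so a caller can also apply a digit limit. Since the value is at most 0x110000 before each step, cp * 16 + 15 stays at or below 0x110000F, far below INT_MAX, so even a plain signed int cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CODEPOINT 0x10FFFF

/* Hypothetical sketch: accumulates hex digits from s into a signed int,
 * saturating just above U+10FFFF so the value never grows without bound.
 * Returns the (possibly capped) value; *ndigits receives the number of
 * hex digits consumed. */
int accumulate_hex_capped(const uint8_t *s, size_t size, size_t *ndigits)
{
    int cp = 0;
    size_t i = 0;
    for (; i < size; i++) {
        int d;
        if (s[i] >= '0' && s[i] <= '9')
            d = s[i] - '0';
        else if (s[i] >= 'a' && s[i] <= 'f')
            d = s[i] - 'a' + 10;
        else if (s[i] >= 'A' && s[i] <= 'F')
            d = s[i] - 'A' + 10;
        else
            break;
        cp = cp * 16 + d;
        if (cp > MAX_CODEPOINT)
            cp = MAX_CODEPOINT + 1;  /* stays invalid, never grows further */
    }
    *ndigits = i;
    return cp;
}
```

A caller would then map any result above 0x10FFFF to U+FFFD.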


nwellnhof · 2015-05-07T15:36:06Z

The pull request also fixes the issue of entities with more than 8 digits.

nwellnhof added a commit to nwellnhof/cmark that referenced this issue May 7, 2015

Multiple issues with numeric entities

5f52f7b

This closes commonmark#33.

nwellnhof mentioned this issue May 7, 2015

Multiple issues with numeric entities #37

Merged

jgm closed this as completed in #37 May 7, 2015

christianselig mentioned this issue Jan 15, 2022

Allow more marker columns in table than header swiftlang/swift-cmark#32

Closed
