Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Most XML entities not being decoded by reader #99

Closed
kernwig opened this issue Jul 29, 2021 · 8 comments
Closed

Most XML entities not being decoded by reader #99

kernwig opened this issue Jul 29, 2021 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@kernwig
Copy link

kernwig commented Jul 29, 2021

Describe the bug
New XML entity decoding in v3.0 only works on & < and >, but not others. Interestingly, the XML builder does encode a few other characters like '.

To Reproduce

   const xmlResponse = create('<ssid>Me &amp; Myself&apos;s WiFi &copy;&#x1D306;</ssid>');
   const networkResponse = xmlResponse.end({format: 'object'});
   return networkResponse.ssid;

The output string is Me &amp; Myself&apos;s WiFi &amp;copy;&amp;#x1D306;

The first two entities were not touched, while the last two got double-encoded.

Expected behavior
Expected result to be Me & Myself's WiFi ©𝌆.

Version:

  • node.js: 12.16.1
  • xmlbuilder2: 3.0.1

Additional context
Related to #88

@kernwig kernwig added the bug Something isn't working label Jul 29, 2021
@oozcitak
Copy link
Owner

I'll take a look at &apos; it should be decoded as '. Also &#x1D306; should be decoded as 𝌆. But & needs to be escaped when serialized as &amp; per spec. &copy; is not an XML predefined entity, so it should throw a parser exception. Browsers throw on undefined entities, e.g. with Chrome:

const str = `<ssid>Me &amp; Myself&apos;s WiFi &copy;&#x1D306;</ssid>`;
const p = new DOMParser();
const s = new XMLSerializer();
s.serializeToString(p.parseFromString(str, "application/xml"));
<ssid>
  <parsererror xmlns=\"http://www.w3.org/1999/xhtml\" style=\"display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black\">
    <h3>This page contains the following errors:</h3>
      <div style=\"font-family:monospace;font-size:12px\">
         error on line 1 at column 41: Entity 'copy' not defined\n
      </div>
    <h3>Below is a rendering of the page up to the first error.</h3>
  </parsererror>
</ssid>

without &copy;:

const str = `<ssid>Me &amp; Myself&apos;s WiFi &#x1D306;</ssid>`;
const p = new DOMParser();
const s = new XMLSerializer();
s.serializeToString(p.parseFromString(str, "application/xml"));
<ssid>Me &amp; Myself's WiFi 𝌆</ssid>

oozcitak added a commit that referenced this issue Jul 30, 2021
@oozcitak
Copy link
Owner

oozcitak commented Jul 30, 2021

Instead of throwing on unknown entities, I opted to silently ignore them. Your code should return:

Me &amp; Myself's WiFi &copy;𝌆

@kernwig
Copy link
Author

kernwig commented Jul 30, 2021

Thank you. My driving bug had to do with &apos;. When I was working around the bug in xmlbuilder2 v2.x, I used the he library and tried this larger variety of characters just to be thorough. Looks like you are now covering the "predefined entities" of XML, not the massive list of "standard" entities in HTML 5. Reference

@kernwig
Copy link
Author

kernwig commented Aug 9, 2021

@oozcitak When might I expect the next release with this fix?

@oozcitak
Copy link
Owner

oozcitak commented Aug 9, 2021

Oh shoot sorry. Forgot to release. I'll prepare one tomorrow. Thanks for reminding.

@oozcitak
Copy link
Owner

Released v3.0.2 with the fix.

@kernwig
Copy link
Author

kernwig commented Aug 10, 2021

Tried it. Looked at the code. It looks like the code should encode/decode &amp;, but your unit test is not expecting to decode and I'm seeing that too.

& is the one character that absolutely must be encoded, since it's the start of the escape sequence. When decoding into a Javascript object, shouldn't all signs of XML encoding be gone so that it can be used in other contexts?

@kernwig
Copy link
Author

kernwig commented Aug 10, 2021

I see that < and > also get re-encoded, but not '.

   const xmlResponse = create('<ssid>Me &amp; Myself&apos;s &lt;WiFi&gt;&#x1D306;</ssid>');
   const networkResponse = xmlResponse.end({format: 'object'});
   return networkResponse.ssid;

Expected result: Me & Myself's <WiFi>𝌆

Actual result with v3.0.2: Me &amp; Myself's &lt;WiFi&gt;𝌆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants