Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

junitxml doesn't escape newlines in attribute values #1218

Closed
aivarannamaa opened this issue Dec 2, 2015 · 13 comments · Fixed by #7536
Closed

junitxml doesn't escape newlines in attribute values #1218

aivarannamaa opened this issue Dec 2, 2015 · 13 comments · Fixed by #7536
Labels
plugin: junitxml related to the junitxml builtin plugin topic: reporting related to terminal output and user-facing messages and errors type: bug problem that needs to be addressed
Milestone

Comments

@aivarannamaa
Copy link

If error message has newlines in it then <failure>-s attribute message contains them verbatim, but they should be encoded, otherwise XML parser will lose them.

@nicoddemus nicoddemus added type: bug problem that needs to be addressed topic: reporting related to terminal output and user-facing messages and errors labels Dec 2, 2015
@nicoddemus nicoddemus added this to the 2.8.4 milestone Dec 2, 2015
@RonnyPfannschmidt RonnyPfannschmidt modified the milestones: 2.8.4, 2.8.5 Dec 6, 2015
@RonnyPfannschmidt RonnyPfannschmidt modified the milestones: 2.8.5, 2.8.6 Dec 14, 2015
@nicoddemus nicoddemus added plugin: junitxml related to the junitxml builtin plugin and removed plugin: junitxml related to the junitxml builtin plugin labels Feb 2, 2017
asford pushed a commit to uw-ipd/tmol that referenced this issue Apr 19, 2018
Add post-run cleanup pass to propertly escape multiline error messages in test reports.

Workaround for pytest-dev/pytest#1218.
@dajose
Copy link
Contributor

dajose commented Mar 19, 2019

@nicoddemus has anyone check on this?

@nicoddemus
Copy link
Member

@dajose I don't think so. I would love to review a PR and merge this in though. 😉

@carlos-jenkins
Copy link
Contributor

I think I found the source of the issue:

https://github.com/pytest-dev/py/blob/master/py/_xmlgen.py#L236

A way to monkey patch and test:

>>> from _pytest import junitxml
>>> print(junitxml.Junit.failure(message=junitxml.bin_xml_escape('foo\nbar')).unicode())
<failure message="foo
bar"/>
>>> import re
>>> from py.xml import escape
>>> escape.escape['\n'] = '&#x0a;'
>>> escape.charef_rex = re.compile('|'.join(escape.escape.keys()))
>>> print(junitxml.Junit.failure(message=junitxml.bin_xml_escape('foo\nbar')).unicode())
<failure message="foo&#x0a;bar"/>

Basically, the escape class in the py._xmlgen._escape lacks the replacement in its mapping.

Not sure if this is a too intrusive change, as that function is used everywhere.

We could then change:

        self.escape = {
            u('"') : u('&quot;'), u('<') : u('&lt;'), u('>') : u('&gt;'),
            u('&') : u('&amp;'), u("'") : u('&apos;'),
            }

To:

        self.escape = {
            u('"') : u('&quot;'), 
            u('<') : u('&lt;'), 
            u('>') : u('&gt;'),
            u('&') : u('&amp;'), 
            u("'") : u('&apos;'),
            u("\n") : u('&#x0a;'),
        }

@nicoddemus
Copy link
Member

Thanks @carlos-jenkins for the investigation.

The patch seems reasonable. I wonder though if we can use xml.etree to write that instead; I suppose that module is more well tested and reliable.

@carlos-jenkins
Copy link
Contributor

carlos-jenkins commented Mar 20, 2019

The way to do it with xml.etree is something like this:

>>> from xml.etree import ElementTree as et
>>> e = et.Element('failure', attrib={'message': 'foo\nbar'})
>>> et.tostring(e).decode('utf-8')
'<failure message="foo&#10;bar" />'

A complete rewrite of this module to use xml.etree would be nice, but that is a task that requires a lot more effort than I can currently put on. Also there is always this noble intention of pytest to remain compatible with Python 2.7 that may complicate in particular the handling of the encoding.

@nicoddemus what do you think of just changing the bin_xml_escape to use xml.etree instead?

@carlos-jenkins
Copy link
Contributor

Interesting enough the xml.etree changes the newline to &#10; (in decimal) instead of &#x0a; (in hexadecimal), but both means the same.

@nicoddemus
Copy link
Member

what do you think of just changing the bin_xml_escape to use xml.etree instead?

Sure, sounds good; it is definitely a step in the right direction. 👍

@carlos-jenkins
Copy link
Contributor

carlos-jenkins commented Mar 20, 2019

Mmmm I'm having issues, as the example above with xml.etree returns the full node, while we need just the string for the attribute. Following the rabbit hole I got here:

https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L1075

The problem is that _escape_attrib is a private function, is not properly exposed. No idea on what should we do now.

Also explored the xml.sax.saxutils API but I found it inadequate as the escape function will not handle quotes and newlines correctly.

Thoughts?

@nicoddemus
Copy link
Member

Hmmm that's unfortunate, thanks again for tackling this. 👍

We could then change:

        self.escape = {
            u('"') : u('&quot;'), u('<') : u('&lt;'), u('>') : u('&gt;'),
            u('&') : u('&amp;'), u("'") : u('&apos;'),
            }

To:

        self.escape = {
            u('"') : u('&quot;'), 
            u('<') : u('&lt;'), 
            u('>') : u('&gt;'),
            u('&') : u('&amp;'), 
            u("'") : u('&apos;'),
            u("\n") : u('&#x0a;'),
        }

Perhaps the above would be simpler then? We don't even need a new pytest release for it, and should be simple to implement (which you already did actually) and test.

@RonnyPfannschmidt
Copy link
Member

attribute escape and text escape have different rules, we should start to use a real compliant xml generator instead of the hack we have in place which "works"

@nicoddemus
Copy link
Member

@RonnyPfannschmidt definitely, I mentioned this (see #1218 (comment)). I was about to create #4970 but got side tracked, thanks for commenting which put me back on track. 👍

@carlos-jenkins I still think that if you want to contribute that fix to py it would be great. It is simple and doesn't make current matters worse at all. But definitely we should find the time to use another xml generator for junitxml in the future.

@carlos-jenkins
Copy link
Contributor

carlos-jenkins commented Mar 26, 2019

What @RonnyPfannschmidt mentions is correct. The above monkey patch / quick patch has the side effect of escaping the newlines in the text of a tag, which is undesired.

xml.etree indeed has both functions, sadly as private:

For CDATA (tag text):

https://github.com/python/cpython/blob/20004959d23d07ac784eef51ecb161012180faa8/Lib/xml/etree/ElementTree.py#L1059

For attributes:

https://github.com/python/cpython/blob/20004959d23d07ac784eef51ecb161012180faa8/Lib/xml/etree/ElementTree.py#L1075

While I feel urged to import them I don't think is a good idea because, first, is private and may change in the future, and also there is a C version of ElementTree (previously "cElementTree") that may be imported and used instead of the Python one, and I don't know if that module has those functions, are importable or how the C extension load currently works.

One option may be to copy paste those functions as is, and use them in the junitxml file. Quick, dirty, horrible, but the best of both world. Or again, the best option, clean, beautiful, but longer one to rewrite the module to use xml.etree.

Regards

@nicoddemus
Copy link
Member

nicoddemus commented Mar 26, 2019

I'm OK with copying the functions and use them to fix this bug now: while it is dirty and hacky, I don't think anybody has the time to rewrite junitxml right now, which will make the bug to linger around for who knows how long. Fixing this bug and improving the test suite with a reproduction of this issue are all effects of this hack, but a step in the right direction regardless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plugin: junitxml related to the junitxml builtin plugin topic: reporting related to terminal output and user-facing messages and errors type: bug problem that needs to be addressed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants