Traceback when using certain namespace prefixes #108

Closed
jan-cerny opened this issue Apr 4, 2023 · 3 comments

@jan-cerny

Hi!

After updating xmldiff to 2.6, we started seeing tracebacks in our tests (tracked in ComplianceAsCode/content#10417) that come from xmldiff.

The problem is that when the XML content we pass to xmldiff contains namespace prefixes matching ns\d+, e.g. ns0 or ns1, xmldiff raises an exception.

The traceback goes away if we replace these prefixes with other, non-generic ones.

My understanding is that the ns\d+ prefixes are used by default when generating XML content with xml.etree.ElementTree if register_namespace isn't called. Therefore, I think xmldiff should support such inputs. Is that expectation correct? Or is the intention that xmldiff inputs will never contain these specific prefixes?
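To illustrate where such prefixes come from, here is a minimal sketch using only the standard library. The namespace URI is a made-up placeholder, not one from our content. Serializing a namespaced element without registering a prefix first should yield a generated ns0 prefix:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
DEMO_NS = "http://example.com/demo-ns"

# Build a namespaced element without registering any prefix for DEMO_NS.
elem = ET.Element("{%s}title" % DEMO_NS)
elem.text = "foo"

# ElementTree has no prefix on record for DEMO_NS, so it generates one.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)
```

Content produced this way can easily end up as xmldiff input, which is how we hit the problem.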

Version:

xmldiff 2.6, installed using pip3

Operating system:

Fedora 37

Steps to reproduce:

Create a Python script that diffs two XML documents whose namespace prefixes match ns\d+, e.g. ns0. I have this small reproducer script, xmldiff_bug.py:

import xmldiff.main

if __name__ == "__main__":
    left = '<ns0:title xmlns:ns0="http://checklists.nist.gov/xccdf/1.2">foo</ns0:title>'
    right = '<ns0:title xmlns:ns0="http://checklists.nist.gov/xccdf/1.2">bar</ns0:title>'
    diff = xmldiff.main.diff_texts(left, right)
    print(diff)

Actual results:

[jcerny@thinkpad ~]$ python3 xmldiff_bug.py
Traceback (most recent call last):
  File "/home/jcerny/xmldiff_bug.py", line 6, in <module>
    diff = xmldiff.main.diff_texts(left, right)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcerny/.local/lib/python3.11/site-packages/xmldiff/main.py", line 44, in diff_texts
    return _diff(
           ^^^^^^
  File "/home/jcerny/.local/lib/python3.11/site-packages/xmldiff/main.py", line 37, in _diff
    return diff_trees(
           ^^^^^^^^^^^
  File "/home/jcerny/.local/lib/python3.11/site-packages/xmldiff/main.py", line 27, in diff_trees
    return list(diffs)
           ^^^^^^^^^^^
  File "/home/jcerny/.local/lib/python3.11/site-packages/xmldiff/diff.py", line 435, in diff
    etree.register_namespace(k, v)
  File "src/lxml/etree.pyx", line 197, in lxml.etree.register_namespace
ValueError: Prefix format reserved for internal use

Expected results:

I would expect the output that I got with xmldiff 2.5:

[jcerny@thinkpad ~]$ python3 xmldiff_bug.py
[UpdateTextIn(node='/ns0:title[1]', text='bar')]
@regebro
Contributor

regebro commented Apr 4, 2023

lxml does indeed generate prefixes like that and reserves them for internal use. That seems to mean it's fine with parsing XML that uses such prefixes, but not with registering them.
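The parse-versus-register difference can be sketched with the standard library's xml.etree.ElementTree, used here as a stand-in on the assumption that it mirrors the lxml behaviour seen in the traceback. The namespace URI is a placeholder:

```python
import xml.etree.ElementTree as ET

# Placeholder document that already uses an ns0 prefix.
doc = '<ns0:title xmlns:ns0="http://example.com/demo-ns">foo</ns0:title>'

# Parsing works: the prefix is resolved to its namespace URI.
root = ET.fromstring(doc)
parsed_tag = root.tag

# Registering the same prefix globally is rejected with a ValueError.
try:
    ET.register_namespace("ns0", "http://example.com/demo-ns")
    error = None
except ValueError as exc:
    error = str(exc)

print(parsed_tag)
print(error)
```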

On the other hand, that might just mean we don't have to register them: if the prefix matches that format, we could skip the registration, which would be an easy fix. That needs investigating.
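A rough sketch of what skipping could look like. This is not necessarily the change that ends up in xmldiff. The helper name register_safe and the example namespace map are hypothetical, and the standard library's register_namespace is used as a stand-in for lxml's:

```python
import re
import xml.etree.ElementTree as ET

# Prefixes of the form "ns" followed by digits are treated as reserved.
RESERVED_PREFIX = re.compile(r"ns\d+$")


def register_safe(nsmap):
    """Register each prefix in nsmap unless it is reserved.

    Hypothetical helper: returns the list of prefixes that were skipped.
    A prefix of None (a default namespace) is skipped as well, since it
    cannot be registered under a name.
    """
    skipped = []
    for prefix, uri in nsmap.items():
        if prefix is None or RESERVED_PREFIX.match(prefix):
            skipped.append(prefix)
            continue
        ET.register_namespace(prefix, uri)
    return skipped


# Example namespace map with one reserved and one ordinary prefix.
skipped = register_safe({
    "ns0": "http://example.com/a",
    "demo": "http://example.com/b",
})
print(skipped)
```

Whether leaving those prefixes unregistered has side effects on the generated diff output is exactly the part that needs investigating.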

@regebro
Contributor

regebro commented Apr 5, 2023

2.6.1 released

@regebro regebro closed this as completed Apr 5, 2023
@ggbecker

ggbecker commented Apr 5, 2023

fixed by #109
