Commit

Better namespace handling (#107)
Adds support for adding and removing namespaces.
Changing URIs does not work, and will likely require big changes, because of lxml's namespace handling.
regebro authored Apr 3, 2023
1 parent f59fec8 commit 422528b
Showing 12 changed files with 158 additions and 30 deletions.
8 changes: 6 additions & 2 deletions CHANGES.rst
@@ -4,12 +4,16 @@ Changes
2.6 (unreleased)
----------------

- Nothing changed yet.

- Added `InsertNamespace` and `DeleteNamespace` actions for better handling
of changing namespaces. This should help with "Unknown namespace prefix"
errors. Changing the URI of a namespace prefix is not supported, and will
raise an error.

2.6b1 (2023-01-12)
------------------

- Used geometric mean for the node_ratio, for better handling of simple nodes.

- Added an experimental --best-match method that is slower, but generates
smaller diffs when you have many nodes that are similar.

14 changes: 10 additions & 4 deletions docs/source/advanced.rst
@@ -43,11 +43,13 @@ especially in the case where formatting is added:
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> left = '<body><p>My Fine Content</p></body>'
>>> right = '<body><p>My <i>Fine</i> Content</p></body>'
>>> right = '<body><p><b>My <i>Fine</i> Content</b></p></body>'
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p diff:insert="">My <i diff:insert="">Fine</i><diff:insert> Content</diff:insert></p>
<p diff:insert="">
<b diff:insert="" diff:rename="p">My <i diff:insert="">Fine</i><diff:insert> Content</diff:insert></b>
</p>
<p diff:delete="">My Fine Content</p>
</body>
<BLANKLINE>
@@ -66,7 +68,9 @@ The XMLFormatter supports a better handling of text with the ``text_tags`` and `
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p>My <i diff:insert-formatting="">Fine</i> Content</p>
<p>
<b diff:insert-formatting="">My <i diff:insert-formatting="">Fine</i> Content</b>
</p>
</body>

This gives a result that flags the ``<i>`` tag as new formatting.
@@ -134,7 +138,9 @@ Now use that formatter in the diffing:
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p>My <i class="insert-formatting">Fine</i> Content</p>
<p>
<b class="insert-formatting">My <i class="insert-formatting">Fine</i> Content</b>
</p>
</body>

You can then add into your CSS files classes that make inserted text green,
36 changes: 36 additions & 0 deletions docs/source/api.rst
@@ -448,6 +448,42 @@ Example:
[InsertComment(target='/document[1]', position=0, text=' A comment ')]


``InsertNamespace(prefix, uri)``
................................

Adds a new namespace to the XML document. You need to have this before
adding a node that uses a namespace that is not in the original XML tree.

Example:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> left = '<document></document>'
>>> right = '<document xmlns:new="http://theuri"></document>'
>>> main.diff_texts(left, right)
[InsertNamespace(prefix='new', uri='http://theuri')]


``DeleteNamespace(prefix)``
................................

Removes a namespace from the XML document. Strictly speaking, you don't need
to handle this, since nothing will break if there is an unused namespace,
but `xmldiff` will still return this action.

Example:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> left = '<document xmlns:new="http://theuri"></document>'
>>> right = '<document></document>'
>>> main.diff_texts(left, right)
[DeleteNamespace(prefix='new')]



The patching API
----------------

4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -30,7 +30,7 @@
 extensions = [
     "sphinx.ext.doctest",
     "sphinx.ext.coverage",
-    "sphinxarg.ext",
+    # "sphinxarg.ext",
 ]

 # Add any paths that contain templates here, relative to this directory.
@@ -66,7 +66,7 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command-line for these cases.
-language = None
+language = "en"

 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
6 changes: 3 additions & 3 deletions tests/test_data/all_actions.expected.xml
@@ -1,4 +1,4 @@
<document xmlns:diff="http://namespaces.shoobx.com/diff">
<document xmlns:diff="http://namespaces.shoobx.com/diff" xmlns:space="http://namespaces.shoobx.com/outerspace">
<node name="was updated" diff:update-attr="name:updated" newtribute="renamed" diff:rename-attr="attribute:newtribute" this="is new" diff:add-attr="this" diff:delete-attr="attr"><diff:delete>A bit of contained text</diff:delete><diff:insert>Modified</diff:insert></node><diff:delete>This is outside a tag</diff:delete><diff:insert>New tail content</diff:insert><node diff:delete="">
Here we have some text.
</node>
@@ -8,7 +8,7 @@
<nod diff:insert="" diff:rename="node">
Here we have some text.
</nod>
<new diff:insert=""/><tail diff:delete="">
<tail diff:delete="">
My last tag
</tail>
</document><!-- You can't delete top level comments with lxml -->
<space:name here="we are testing changing namespaces" diff:rename="{http://namespaces.shoobx.com/name}space"/></document><!-- You can't delete top level comments with lxml -->
4 changes: 3 additions & 1 deletion tests/test_data/all_actions.left.xml
@@ -1,4 +1,5 @@
<document>
<document
xmlns:name="http://namespaces.shoobx.com/name">
<node attribute="renamed" attr="deleted" name="updated">
A bit of contained text
</node>
@@ -12,5 +13,6 @@
<tail>
My last tag
</tail>
<name:space here="we are testing changing namespaces"/>
</document>
<!-- You can't delete top level comments with lxml -->
6 changes: 4 additions & 2 deletions tests/test_data/all_actions.right.xml
@@ -1,4 +1,5 @@
<document>
<document
xmlns:space="http://namespaces.shoobx.com/outerspace">
<!-- Insert a new comment -->
<node newtribute="renamed" name="was updated" this="is new">
Modified
@@ -10,5 +11,6 @@
<nod>
Here we have some text.
</nod>
<new/></document>
<space:name here="we are testing changing namespaces"/>
</document>
<!-- A different comment, which gets ignored! -->
6 changes: 4 additions & 2 deletions tests/test_formatting.py
@@ -488,6 +488,8 @@ def test_all_actions(self):
         formatter = formatting.XmlDiffFormatter()
         result = main.diff_files(lfile, rfile, formatter=formatter)
         expected = (
+            "[insert-namespace, space, http://namespaces.shoobx.com/outerspace]\n"
+            "[delete-namespace, name]\n"
             "[move-after, /document/node[2], /document/tag[1]]\n"
             "[insert-comment, /document[1], 0, Insert a new comment ]\n"
             '[update, /document/node[1]/@name, "was updated"]\n'
@@ -505,8 +507,8 @@ def test_all_actions(self):
             '[update, /document/node[1]/text()[2], "\\n '
             'New tail content\\n "]\n'
             "[rename, /document/node[2], nod]\n"
-            "[insert-after, /document/tail[1], \n"
-            "<new/>]\n"
+            "[rename, /document/name:space[1], {http://namespaces.shoobx.com/outerspace}name]\n"
+            '[update, /document/space:name[1]/text()[2], "\\n "]\n'
             "[remove, /document/tail[1]]"
         )
         self.assertEqual(result, expected)
3 changes: 3 additions & 0 deletions xmldiff/actions.py
@@ -15,3 +15,6 @@
 RenameAttrib = namedtuple("RenameAttrib", "node oldname newname")

 InsertComment = namedtuple("InsertComment", "target position text")
+
+InsertNamespace = namedtuple("InsertNamespace", "prefix uri")
+DeleteNamespace = namedtuple("DeleteNamespace", "prefix")
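Because the two new actions are plain namedtuples, they compare and unpack like ordinary tuples. A minimal sketch of that behaviour follows; it re-declares the two types locally instead of importing `xmldiff.actions`, purely so that it runs on its own.

```python
from collections import namedtuple

# Local re-declarations mirroring the two definitions added in this commit.
InsertNamespace = namedtuple("InsertNamespace", "prefix uri")
DeleteNamespace = namedtuple("DeleteNamespace", "prefix")

insert = InsertNamespace(prefix="new", uri="http://theuri")
delete = DeleteNamespace(prefix="new")

# Fields are available both by name and by position.
prefix, uri = insert
print(insert)
print(delete.prefix)
```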
21 changes: 21 additions & 0 deletions xmldiff/diff.py
@@ -426,6 +426,27 @@ def diff(self, left=None, right=None):
         if not self._matches:
             self.match(left, right)

+        # First, deal with namespaces:
+        rnsmap = self.right.nsmap
+        lnsmap = self.left.nsmap
+        for k, v in rnsmap.items():
+            # Make sure it's registered:
+            if k is not None:
+                etree.register_namespace(k, v)
+            if k not in lnsmap:
+                yield actions.InsertNamespace(k, v)
+            elif lnsmap[k] != v:
+                raise RuntimeError(
+                    "Sorry, we do not support changing the URI of namespaces in xmldiff"
+                )
+
+        for k, v in lnsmap.items():
+            # Make sure it's registered:
+            if k is not None:
+                etree.register_namespace(k, v)
+            if k not in rnsmap:
+                yield actions.DeleteNamespace(k)
+
         # The paper talks about the five phases, and then does four of them
         # in one phase, in a different order than described. This
         # implementation in turn differs in order yet again.
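The namespace comparison in this hunk can be tried in isolation. The sketch below is an illustration under assumptions, not the code from `xmldiff/diff.py`: the helper name `namespace_actions` is hypothetical, plain dicts stand in for lxml's `nsmap`, and the `etree.register_namespace` calls are left out so that it runs without lxml.

```python
from collections import namedtuple

InsertNamespace = namedtuple("InsertNamespace", "prefix uri")
DeleteNamespace = namedtuple("DeleteNamespace", "prefix")


def namespace_actions(lnsmap, rnsmap):
    """Yield the namespace actions that turn the left prefix map into the right one.

    Hypothetical helper: prefixes only on the right are inserted, prefixes only
    on the left are deleted, and a prefix bound to two different URIs is an error.
    """
    for prefix, uri in rnsmap.items():
        if prefix not in lnsmap:
            yield InsertNamespace(prefix, uri)
        elif lnsmap[prefix] != uri:
            raise RuntimeError("changing the URI of a namespace prefix is not supported")
    for prefix in lnsmap:
        if prefix not in rnsmap:
            yield DeleteNamespace(prefix)


# The prefix maps of the all_actions test files in this commit.
left = {"name": "http://namespaces.shoobx.com/name"}
right = {"space": "http://namespaces.shoobx.com/outerspace"}
result = list(namespace_actions(left, right))
print(result)
```

Inserts are emitted before deletes, which matches the order of the first two lines in the updated `test_all_actions` expectation.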
40 changes: 36 additions & 4 deletions xmldiff/formatting.py
@@ -335,14 +335,15 @@ def format(self, diff, orig_tree, differ=None):
         else:
             root = result

+        self._nsmap = [(DIFF_PREFIX, DIFF_NS)]
         etree.register_namespace(DIFF_PREFIX, DIFF_NS)

         for action in diff:
             self.handle_action(action, root)

         self.finalize(root)

-        etree.cleanup_namespaces(result, top_nsmap={DIFF_PREFIX: DIFF_NS})
+        etree.cleanup_namespaces(result, top_nsmap=dict(self._nsmap))
         return self.render(result)

     def render(self, result):
@@ -369,6 +370,11 @@ def _xpath(self, node, xpath):
         # one and exactly one element is found. This is to protect against
         # formatting a diff on the wrong tree, or against using ambiguous
         # edit script xpaths.
+
+        # First, make a namespace map that uses the left tree's URIs:
+        nsmap = dict(self._nsmap)
+        nsmap.update(node.nsmap)
+
         if xpath[0] == "/":
             root = True
             xpath = xpath[1:]
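The merged prefix map built above is what lets a path resolve prefixes that come either from the edit script or from the tree itself. A rough sketch of the same idea follows, using the standard library's `xml.etree.ElementTree` instead of lxml so that it runs anywhere. The `collected` and `declared` names are illustrative stand-ins for `self._nsmap` and `node.nsmap`; ElementTree elements have no `nsmap` attribute, so the second map is written out by hand.

```python
import xml.etree.ElementTree as ET

DIFF_PREFIX = "diff"
DIFF_NS = "http://namespaces.shoobx.com/diff"

# Prefixes collected while formatting, kept as a list of pairs.
collected = [
    (DIFF_PREFIX, DIFF_NS),
    ("space", "http://namespaces.shoobx.com/outerspace"),
]
# Prefixes declared on the tree being searched (assumed, written by hand).
declared = {"name": "http://namespaces.shoobx.com/name"}

nsmap = dict(collected)
nsmap.update(declared)  # the tree's own declarations win on a conflict

root = ET.fromstring(
    '<document xmlns:name="http://namespaces.shoobx.com/name">'
    '<name:space here="we are testing changing namespaces"/>'
    "</document>"
)

# Both old and new prefixes are now usable in path expressions.
matches = root.findall("name:space", namespaces=nsmap)
print(len(matches))
```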
@@ -393,11 +399,10 @@ def _xpath(self, node, xpath):
             path = "/" + path

         matches = []
-        for match in node.xpath(path, namespaces=node.nsmap):
+        for match in node.xpath(path, namespaces=nsmap):
             # Skip nodes that have been deleted
             if DELETE_NAME not in match.attrib:
                 matches.append(match)
-
         if index >= len(matches):
             raise ValueError(
                 "xpath {}[{}] not found at {}.".format(
@@ -632,6 +637,14 @@ def _handle_UpdateTextAfter(self, action, tree):

         return node

+    def _handle_InsertNamespace(self, action, tree):
+        # There is no way to mark this so it's visible, so we'll just update the tree
+        self._nsmap.append((action.prefix, action.uri))
+
+    def _handle_DeleteNamespace(self, action, tree):
+        # This will be handled by the namespace cleanup
+        pass
+
     # There is no InsertComment handler, as this formatter removes all comments


@@ -702,6 +715,19 @@ def _handle_InsertComment(self, action):
             json.dumps(action.text),
         )

+    def _handle_InsertNamespace(self, action):
+        return (
+            "insert-namespace",
+            action.prefix,
+            action.uri,
+        )
+
+    def _handle_DeleteNamespace(self, action):
+        return (
+            "delete-namespace",
+            action.prefix,
+        )
+

 class XmlDiffFormatter(BaseFormatter):
     """A formatter for an output trying to be xmldiff 0.6 compatible"""
@@ -792,4 +818,10 @@ def _handle_RenameNode(self, action, orig_tree):
         yield "rename", action.node, action.tag

     def _handle_InsertComment(self, action, orig_tree):
-        yield ("insert-comment", action.target, str(action.position), action.text)
+        yield "insert-comment", action.target, str(action.position), action.text
+
+    def _handle_InsertNamespace(self, action, orig_tree):
+        yield "insert-namespace", action.prefix, action.uri
+
+    def _handle_DeleteNamespace(self, action, orig_tree):
+        yield "delete-namespace", action.prefix
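To see how handlers of this shape produce the bracketed lines expected in `test_all_actions`, here is a small stand-alone sketch. It rests on assumptions: `MiniFormatter` is a hypothetical class, and looking handlers up by `"_handle_" + type(action).__name__` is assumed from the method names rather than shown in this diff. Only the `[a, b, c]` line format is taken from the test expectation.

```python
from collections import namedtuple

InsertNamespace = namedtuple("InsertNamespace", "prefix uri")
DeleteNamespace = namedtuple("DeleteNamespace", "prefix")


class MiniFormatter:
    """Hypothetical stand-in for a line-oriented formatter; not xmldiff's class."""

    def format(self, actions):
        lines = []
        for action in actions:
            # Assumed dispatch: look the handler up by the action's type name.
            handler = getattr(self, "_handle_" + type(action).__name__)
            for parts in handler(action):
                lines.append("[" + ", ".join(parts) + "]")
        return "\n".join(lines)

    def _handle_InsertNamespace(self, action):
        yield "insert-namespace", action.prefix, action.uri

    def _handle_DeleteNamespace(self, action):
        yield "delete-namespace", action.prefix


output = MiniFormatter().format(
    [
        InsertNamespace("space", "http://namespaces.shoobx.com/outerspace"),
        DeleteNamespace("name"),
    ]
)
print(output)
```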