-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: namespace rebinding in Memory
, SimpleMemory
and BerkelyDB
stores.
#1843
fix: namespace rebinding in Memory
, SimpleMemory
and BerkelyDB
stores.
#1843
Conversation
e2df424
to
b1a0d6a
Compare
Also add additional tests for namespace rebinding.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
- Added store specific tests for namespace binding and rebinding. - Copied @gjhiggins fix for `rdflib.plugins.stores.memory.Memory.bind` to `rdflib.plugins.stores.memory.SimpleMemory.bind`. - Changed `bind` on `Memory`, `SimpleMemory`, `BerkleyDB` stores to explicitly check for `is not None`/`is None` instead of falsey/truthy. - Added `pytest.util._coalesce` to do null coalescing, for more info see deferred PEP-505 https://peps.python.org/pep-0505/.
b1a0d6a
to
1f13226
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes and the tests for all options.
I agree with the comments that we need to decide if what is currently the cases is actually what we want. I keep tripping up over override & replace, so they seem overly complex to me. I'm keen to see all tests in place but then to consider an API change (for 7.0.0?) to simplify, perhaps to just a single option like override: If True
, the new bind()
will override anything that existed before. If False
, it will do nothing but spit out a warning.
I'm not convinced that's an achievable goal without oversimplifying the API. Here's what I have atm as an attempt to explain the nuances in a preliminary expanded version of Ah, I notice some duplication, I'll clean that up (it is a work-in-progress and atm basically just a dump of my own workings-out from the PR. ) Working with prefix=>namespace bindings
---------------------------------------
Prefix=>namespace bindings - `override` and `replace`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
import sys
from pathlib import Path
import pytest
from rdflib.term import URIRef
from rdflib import ConjunctiveGraph, Graph, Literal
from rdflib.plugins.stores.memory import Memory
from rdflib.namespace import Namespace, NAMESPACE_PREFIXES_CORE, NAMESPACE_PREFIXES_RDFLIB
from rdflib.namespace import OWL, RDFS
from test.data import tarek, michel, context0, context1, context2
foaf1_uri = URIRef("http://xmlns.com/foaf/0.1/")
foaf2_uri = URIRef("http://xmlns.com/foaf/2.0/")
FOAF1 = Namespace(foaf1_uri)
FOAF2 = Namespace(foaf2_uri)
def test_binding_strategy():
# The confusion here is in the two arguments “override” and “replace” and
# how they serve two different purposes --- changing a prefix already bound
# to a namespace versus changing a namespace already bound to a prefix.
g = Graph(bind_namespaces="none")
assert len(list(g.namespaces())) == 0
# 1. Changing the namespace of an existing namespace=>prefix binding:
# Say they release FOAF 2.0 and you want to change the binding of
# `http://xmlns.com/foaf/2.0/` to `foaf`.
# Up to now you had `http://xmlns.com/foaf/0.1/=>foaf` and `http://xmlns.com/foaf/2.0/=>foaf2`
# We'll just set up those two bindings ...
g.bind("foaf", FOAF1)
g.bind("foaf2", FOAF2)
assert len(list(g.namespaces())) == 2
assert list(g.namespaces()) == [
('foaf', foaf1_uri),
("foaf2", foaf2_uri)
]
# Now you want to "upgrade" to FOAF2=>foaf and try the obvious:
g.bind("foaf", FOAF2)
# But that doesn't happen, instead a different prefix, `foaf1` is bound:
assert list(g.namespaces()) == [
("foaf", foaf1_uri),
("foaf1", foaf2_uri)
]
# The rationale behind this behaviour is that the prefix "foaf" was already
# bound to the FOAF1 namespace, so a different prefix for the FOAF2 namespace
# was automatically created.
# You can achieve the intended result by setting `replace` to `True`:
g.bind("foaf", FOAF2, replace=True)
# Because the FOAF2 namespace was rebound to a different prefix, the old
# "foaf2" prefix has gone and because the "foaf" prefix was rebound to a
# different namespace, the old FOAF1 namespace has gone, leaving just:
assert list(g.namespaces()) == [
("foaf", foaf2_uri)
]
# Now, if you wish, you can re-bind the FOAF1.0 namespace to a prefix
# of your choice
g.bind("oldfoaf", FOAF1)
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
('oldfoaf', foaf1_uri)
]
# 2. Changing the prefix of an existing namespace=>prefix binding:
# The namespace manager preserves the existing bindings from any
# subsequent unwanted programmatic rebindings such as:
g.bind("foaf", FOAF1)
# Which, as things stand, results in:
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
('foaf1', foaf1_uri)
]
# In which the attempted change from `oldfoaf` to (the already
# bound-to-a different-namespace `foaf`) was intercepted and
# changed to a non-clashing prefix of `foaf1`
# So, after undoing the above unwanted rebinding ..
g.bind("oldfoaf", FOAF1, replace=True)
# The bindings are again as desired
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
('oldfoaf', foaf1_uri)
]
# Next time through, set override to False
g.bind("foaf", FOAF1, override=False)
# And the bindings will remain as desired
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
('oldfoaf', foaf1_uri)
]
# 3. Parsing data with prefix=>namespace bindings
# Let's see the situation regarding namespace bindings
# in parsed data - it can be a bit confusing ...
# Starting with a very likely example where `foaf` is a
# prefix for `FOAF1`
data = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<urn:example:a> foaf:name "Bob" ."""
g.parse(data=data, format="n3")
# The FOAF2 namespace is already bound to `foaf` so a
# non-clashing prefix of `foaf1` is rebound to FOAF1 in
# place of the existing `oldfoaf` prefix
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
('foaf1', foaf1_uri)
]
# Yes, namespace bindings in parsed data replace existing
# bindings (i.e. `oldfoaf` was replaced by `foaf1`), just
# live with it ...
# A different example of the same principle where `foaf2`
# replaces the exsting `foaf`
data = """\
@prefix foaf2: <http://xmlns.com/foaf/2.0/> .
<urn:example:b> foaf2:name "Alice" ."""
g.parse(data=data, format="n3")
# The already-bound namespace of `foaf=>FOAF2` is replaced
# by the `foaf2=>FOAF2` binding specified in the N3 source
assert list(g.namespaces()) == [
('foaf1', foaf1_uri),
("foaf2", foaf2_uri),
]
# Where a prefix-namespace binding in the data does
# not clash with the existing binding ...
data = """\
@prefix foaf1: <http://xmlns.com/foaf/0.1/> .
<urn:example:a> foaf1:name "Bob" ."""
g.parse(data=data, format="n3")
# The prefix `foaf1`, already bound to FOAF1 is
# used
assert list(g.namespaces()) == [
('foaf1', foaf1_uri),
("foaf2", foaf2_uri),
]
# Otherwise, the existing prefix binding is replaced
data = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<urn:example:b> foaf:name "Alice" ."""
g.parse(data=data, format="n3")
assert list(g.namespaces()) == [
("foaf2", foaf2_uri),
('foaf', foaf1_uri),
]
# Prefixes are bound to namespaces which in turn have a URIRef
# representation - so a different prefix=>namespace binding
# means a different predicate and thus a different statement:
expected = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix foaf2: <http://xmlns.com/foaf/2.0/> .
<urn:example:a> foaf:name "Bob" .
<urn:example:b> foaf:name "Alice" ;
foaf2:name "Alice" .
"""
s = g.serialize(format="n3")
for l in expected.split():
assert l in s
Prefix=>namespace binding when parsing data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
import pytest
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import Namespace, OWL, RDFS
foaf1_uri = URIRef("http://xmlns.com/foaf/0.1/")
foaf2_uri = URIRef("http://xmlns.com/foaf/2.0/")
FOAF1 = Namespace(foaf1_uri)
FOAF2 = Namespace(foaf2_uri)
def test_parsing_data():
g = Graph(bind_namespaces="none")
assert len(list(g.namespaces())) == 0
g.bind("foaf1", FOAF1)
g.bind("foaf", FOAF2)
# 3. Parsing data with prefix=>namespace bindings
# Let's see the situation regarding namespace bindings
# in parsed data - it can be a bit confusing ...
# Starting with a very likely example where `foaf` is a
# prefix for `FOAF1`
data = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<urn:example:a> foaf:name "Bob" ."""
g.parse(data=data, format="n3")
# The `foaf` prefix is replaced by the pre-existing
# `foaf1` binding
assert list(g.namespaces()) == [
('foaf1', foaf1_uri),
("foaf", foaf2_uri),
]
# A different example ...
data = """\
@prefix foaf2: <http://xmlns.com/foaf/2.0/> .
<urn:example:b> foaf2:name "Alice" ."""
g.parse(data=data, format="n3")
# Also, namespace bindings in parsed data replace existing
# bindings (i.e. `oldfoaf` was replaced by `foaf1`), just
# live with it ...
assert list(g.namespaces()) == [
('foaf1', foaf1_uri),
("foaf2", foaf2_uri),
]
# Where a prefix-namespace binding in the data does
# not clash with the existing:
data = """\
@prefix foaf1: <http://xmlns.com/foaf/0.1/> .
<urn:example:a> foaf1:name "Bob" ."""
g.parse(data=data, format="n3")
# No rebindings occur
assert list(g.namespaces()) == [
('foaf1', foaf1_uri),
("foaf2", foaf2_uri),
]
# Otherwise, the existing binding is replaced
data = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<urn:example:b> foaf:name "Alice" ."""
g.parse(data=data, format="n3")
assert list(g.namespaces()) == [
("foaf2", foaf2_uri),
('foaf', foaf1_uri),
]
# Prefixes are bound to namespaces which in turn have a URIRef
# representation - so a different prefix means a different
# predicate, therefore a different statement:
expected = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix foaf2: <http://xmlns.com/foaf/2.0/> .
<urn:example:a> foaf:name "Bob" .
<urn:example:b> foaf:name "Alice" ;
foaf2:name "Alice" .
"""
s = g.serialize(format="n3")
for l in expected.split():
assert l in s
More extensive illustration of prefix incrementing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A more extensive illustration of attempted rebinds of
`foaf` being intercepted and changed to a non-clashing
prefix of `foafN` (where N is an incrementing integer) ...
.. code-block:: python
def test_change_namespace_and_prefix():
g = Graph(bind_namespaces="none")
g.bind("foaf", FOAF1)
assert list(g.namespaces()) == [
('foaf', foaf1_uri)
]
g.bind("foaf", FOAF2, replace=True)
assert list(g.namespaces()) == [
("foaf", foaf2_uri)
]
g.bind("foaf1", FOAF1)
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
("foaf1", foaf1_uri)
]
foaf3_uri = URIRef("http://xmlns.com/foaf/3.0/")
FOAF3 = Namespace("http://xmlns.com/foaf/3.0/")
g.bind("foaf", FOAF3)
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
("foaf1", foaf1_uri),
("foaf2", foaf3_uri),
]
foaf4_uri = URIRef("http://xmlns.com/foaf/4.0/")
FOAF4 = Namespace("http://xmlns.com/foaf/4.0/")
g.bind("foaf", FOAF4)
assert list(g.namespaces()) == [
("foaf", foaf2_uri),
("foaf1", foaf1_uri),
("foaf2", foaf3_uri),
("foaf3", foaf4_uri),
]
Prefix=>namespace bindings in a multigraph situation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
def test_multigraph_bindings():
data = """@prefix friend-of-a-friend: <http://xmlns.com/foaf/0.1/> .
<urn:example:a> friend-of-a-friend:name "Bob" .
"""
store = Memory()
g1 = Graph(store, identifier=context1, bind_namespaces="none")
g1.bind('foaf', FOAF1)
assert list(g1.namespaces()) == [
('foaf', foaf1_uri)
]
assert list(store.namespaces()) == [
('foaf', foaf1_uri)
]
g1.add((tarek, FOAF1.name, Literal("tarek")))
assert list(store.namespaces()) == [
('foaf', foaf1_uri)
]
g2 = Graph(store, identifier=context2, bind_namespaces="none")
g2.parse(data=data, format="n3")
# The parser-caused rebind is in the underlying store and all objects
# that use the store see the changed binding:
assert list(store.namespaces()) == [
('friend-of-a-friend', foaf1_uri)
]
assert list(g1.namespaces()) == [
('friend-of-a-friend', foaf1_uri)
]
# Including newly-created objects that use the store
cg = ConjunctiveGraph(store=store)
assert ('foaf', foaf1_uri) not in list(cg.namespaces())
assert ('friend-of-a-friend', foaf1_uri) in list(cg.namespaces())
assert len(list(g1.namespaces())) == 6
assert len(list(g2.namespaces())) == 6
assert len(list(cg.namespaces())) == 6
assert len(list(store.namespaces())) == 6
cg.store.add_graph(g1)
cg.store.add_graph(g2)
assert "@prefix friend-of-a-friend: <http://xmlns.com/foaf/0.1/>" in cg.serialize(format="n3")
# In the notation3 format, the statement asserting tarek's name
# now references the changed prefix:
assert '<urn:example:tarek> friend-of-a-friend:name "tarek" .' in cg.serialize(format="n3")
# Add foaf2 binding if not already bound
cg.bind("foaf2", FOAF2, override=False)
assert ('foaf2', foaf2_uri) in list(cg.namespaces())
# Impose foaf binding ... if not already bound
cg.bind("foaf", FOAF1, override=False)
assert ('foaf', foaf1_uri) not in list(cg.namespaces()) |
Um, the change to |
Created a new issue for this #1880 |
Summary of changes
This patch fixes namespace rebinding in
Memory
,SimpleMemory
andBerkelyDB
so that when a prefix is bound to a new URI, that the old URI is also removed from the URI to prefix mapping.This patch also adds some graph specific tests for namespace (re-)binding and store specific tests for namespace (re-)binding. Two tests related to SPARQLStore and SPARQLUpdateStore are marked as xfail, mainly because they behave slightly different in one corner case from all the other stores, however it is possible to make the argument that they are the only stores that behave correctly and that all other stores behave incorrectly, we just have to choose the behaviour that makes the most sense and then update the documentation at some point and ensure all stores conform to that.
In future it would be good to see how we can unify the behaviour of
Store.{bind,namespace,prefix,namespaces}
across different implementations as they all more or less behave the same, however this can be done in a future PR, the fix @gjhiggins made is correct and the tests added should ensure all stores we have behave correctly.The original fix and test was written by @gjhiggins. Commits with @gjhiggins as author are taken from:
This is essentially #1800 with some additional changes.
Fixes #1826.
Checklist
the same change.
so maintainers can fix minor issues and keep your PR up to date.