Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: namespace rebinding in Memory, SimpleMemory and BerkelyDB stores. #1843

Merged

Conversation

aucampia
Copy link
Member

Summary of changes

This patch fixes namespace rebinding in Memory, SimpleMemory and BerkelyDB so that when a prefix is bound to a new URI, that the old URI is also removed from the URI to prefix mapping.

This patch also adds some graph specific tests for namespace (re-)binding and store specific tests for namespace (re-)binding. Two tests related to SPARQLStore and SPARQLUpdateStore are marked as xfail, mainly because they behave slightly different in one corner case from all the other stores, however it is possible to make the argument that they are the only stores that behave correctly and that all other stores behave incorrectly, we just have to choose the behaviour that makes the most sense and then update the documentation at some point and ensure all stores conform to that.

In future it would be good to see how we can unify the behaviour of Store.{bind,namespace,prefix,namespaces} across different implementations as they all more or less behave the same, however this can be done in a future PR, the fix @gjhiggins made is correct and the tests added should ensure all stores we have behave correctly.

The original fix and test was written by @gjhiggins. Commits with @gjhiggins as author are taken from:

This is essentially #1800 with some additional changes.

Fixes #1826.

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Added tests for any changes that have a runtime impact.
  • Checked that all tests and type checking passes.
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

@aucampia aucampia force-pushed the gjhiggins-20220409T1927-namespace_rebind branch 2 times, most recently from e2df424 to b1a0d6a Compare April 18, 2022 11:23
gjhiggins and others added 6 commits April 19, 2022 09:15
Also add additional tests for namespace rebinding.
- Added store specific tests for namespace binding and rebinding.
- Copied @gjhiggins fix for `rdflib.plugins.stores.memory.Memory.bind` to
  `rdflib.plugins.stores.memory.SimpleMemory.bind`.
- Changed `bind` on `Memory`, `SimpleMemory`, `BerkleyDB` stores to
  explicitly check for `is not None`/`is None` instead of
  falsey/truthy.
- Added `pytest.util._coalesce` to do null coalescing, for more info see
  deferred PEP-505 https://peps.python.org/pep-0505/.
@aucampia aucampia force-pushed the gjhiggins-20220409T1927-namespace_rebind branch from b1a0d6a to 1f13226 Compare April 19, 2022 07:15
Copy link
Member

@nicholascar nicholascar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes and the tests for all options.

I agree with the comments that we need to decide if what is currently the cases is actually what we want. I keep tripping up over override & replace, so they seem overly complex to me. I'm keen to see all tests in place but then to consider an API change (for 7.0.0?) to simplify, perhaps to just a single option like override: If True, the new bind() will override anything that existed before. If False, it will do nothing but spit out a warning.

@ghost
Copy link

ghost commented Apr 19, 2022

I keep tripping up over override & replace, so they seem overly complex to me. I'm keen to see all tests in place but then to consider an API change (for 7.0.0?) to simplify, perhaps to just a single option like override: If True, the new bind() will override anything that existed before. If False, it will do nothing but spit out a warning.

I'm not convinced that's an achievable goal without oversimplifying the API. Here's what I have atm as an attempt to explain the nuances in a preliminary expanded version of namespaces_and_bindings.rst (I've chosen to use a test-oriented approach with assertions rather than the usual >>> pycon approach)

Ah, I notice some duplication, I'll clean that up (it is a work-in-progress and atm basically just a dump of my own workings-out from the PR. )

Working with prefix=>namespace bindings
---------------------------------------

Prefix=>namespace bindings - `override` and `replace`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import sys
    from pathlib import Path
    import pytest

    from rdflib.term import URIRef
    from rdflib import ConjunctiveGraph, Graph, Literal
    from rdflib.plugins.stores.memory import Memory
    from rdflib.namespace import Namespace, NAMESPACE_PREFIXES_CORE, NAMESPACE_PREFIXES_RDFLIB
    from rdflib.namespace import OWL, RDFS
    from test.data import tarek, michel, context0, context1, context2

    foaf1_uri = URIRef("http://xmlns.com/foaf/0.1/")
    foaf2_uri = URIRef("http://xmlns.com/foaf/2.0/")

    FOAF1 = Namespace(foaf1_uri)
    FOAF2 = Namespace(foaf2_uri)

    def test_binding_strategy():

        # The confusion here is in the two arguments “override” and “replace” and
        # how they serve two different purposes --- changing a prefix already bound
        # to a namespace versus changing a namespace already bound to a prefix.

        g = Graph(bind_namespaces="none")
        assert len(list(g.namespaces())) == 0

        # 1. Changing the namespace of an existing namespace=>prefix binding:

        # Say they release FOAF 2.0 and you want to change the binding of
        # `http://xmlns.com/foaf/2.0/` to `foaf`.

        # Up to now you had `http://xmlns.com/foaf/0.1/=>foaf` and `http://xmlns.com/foaf/2.0/=>foaf2`

        # We'll just set up those two bindings ...
        g.bind("foaf", FOAF1)
        g.bind("foaf2", FOAF2)
        assert len(list(g.namespaces())) == 2
        assert list(g.namespaces()) == [
            ('foaf', foaf1_uri),
            ("foaf2", foaf2_uri)
        ]

        # Now you want to "upgrade" to FOAF2=>foaf and try the obvious:
        g.bind("foaf", FOAF2)

        # But that doesn't happen, instead a different prefix, `foaf1` is bound:
        assert list(g.namespaces()) == [
            ("foaf", foaf1_uri),
            ("foaf1", foaf2_uri)
        ]

        # The rationale behind this behaviour is that the prefix "foaf" was already
        # bound to the FOAF1 namespace, so a different prefix for the FOAF2 namespace 
        # was automatically created.

        # You can achieve the intended result by setting `replace` to `True`:
        g.bind("foaf", FOAF2, replace=True)

        # Because the FOAF2 namespace was rebound to a different prefix, the old 
        # "foaf2" prefix has gone and because the "foaf" prefix was rebound to a
        # different namespace, the old FOAF1 namespace has gone, leaving just:

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri)
        ]

        # Now, if you wish, you can re-bind the FOAF1.0 namespace to a prefix
        # of your choice
        g.bind("oldfoaf", FOAF1)

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ('oldfoaf', foaf1_uri)
        ]

        # 2. Changing the prefix of an existing namespace=>prefix binding:

        # The namespace manager preserves the existing bindings from any
        # subsequent unwanted programmatic rebindings such as:
        g.bind("foaf", FOAF1)

        # Which, as things stand, results in:

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ('foaf1', foaf1_uri)
        ]

        # In which the attempted change from `oldfoaf` to (the already 
        # bound-to-a different-namespace `foaf`) was intercepted and
        # changed to a non-clashing prefix of `foaf1`

        # So, after undoing the above unwanted rebinding ..
        g.bind("oldfoaf", FOAF1, replace=True)

        # The bindings are again as desired 
        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ('oldfoaf', foaf1_uri)
        ]

        # Next time through, set override to False
        g.bind("foaf", FOAF1, override=False)

        # And the bindings will remain as desired
        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ('oldfoaf', foaf1_uri)
        ]

        # 3. Parsing data with prefix=>namespace bindings 
        # Let's see the situation regarding namespace bindings
        # in parsed data - it can be a bit confusing ...

        # Starting with a very likely example where `foaf` is a
        # prefix for `FOAF1`

        data = """\
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .

        <urn:example:a> foaf:name "Bob" ."""

        g.parse(data=data, format="n3")

        # The FOAF2 namespace is already bound to `foaf` so a
        # non-clashing prefix of `foaf1` is rebound to FOAF1 in
        # place of the existing `oldfoaf` prefix

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ('foaf1', foaf1_uri)
        ]

        # Yes, namespace bindings in parsed data replace existing
        # bindings (i.e. `oldfoaf` was replaced by `foaf1`), just
        # live with it ...

        # A different example of the same principle where `foaf2`
        # replaces the exsting `foaf`

        data = """\
        @prefix foaf2: <http://xmlns.com/foaf/2.0/> .

        <urn:example:b> foaf2:name "Alice" ."""

        g.parse(data=data, format="n3")

        # The already-bound namespace of `foaf=>FOAF2` is replaced
        # by the `foaf2=>FOAF2` binding specified in the N3 source

        assert list(g.namespaces()) == [
            ('foaf1', foaf1_uri),
            ("foaf2", foaf2_uri),
        ]

        # Where a prefix-namespace binding in the data does
        # not clash with the existing binding ...

        data = """\
        @prefix foaf1: <http://xmlns.com/foaf/0.1/> .

        <urn:example:a> foaf1:name "Bob" ."""

        g.parse(data=data, format="n3")

        # The prefix `foaf1`, already bound to FOAF1 is
        # used

        assert list(g.namespaces()) == [
            ('foaf1', foaf1_uri),
            ("foaf2", foaf2_uri),
        ]

        # Otherwise, the existing prefix binding is replaced

        data = """\
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .

        <urn:example:b> foaf:name "Alice" ."""

        g.parse(data=data, format="n3")

        assert list(g.namespaces()) == [
            ("foaf2", foaf2_uri),
            ('foaf', foaf1_uri),
        ]

        # Prefixes are bound to namespaces which in turn have a URIRef
        # representation - so a different prefix=>namespace binding
        # means a different predicate and thus a different statement:

        expected = """
            @prefix foaf: <http://xmlns.com/foaf/0.1/> .
            @prefix foaf2: <http://xmlns.com/foaf/2.0/> .

            <urn:example:a> foaf:name "Bob" .

            <urn:example:b> foaf:name "Alice" ;
                foaf2:name "Alice" .

            """

        s = g.serialize(format="n3")

        for l in expected.split():
            assert l in s

Prefix=>namespace binding when parsing data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import pytest

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import Namespace, OWL, RDFS

    foaf1_uri = URIRef("http://xmlns.com/foaf/0.1/")
    foaf2_uri = URIRef("http://xmlns.com/foaf/2.0/")

    FOAF1 = Namespace(foaf1_uri)
    FOAF2 = Namespace(foaf2_uri)


    def test_parsing_data():

        g = Graph(bind_namespaces="none")
        assert len(list(g.namespaces())) == 0
        g.bind("foaf1", FOAF1)
        g.bind("foaf", FOAF2)

        # 3. Parsing data with prefix=>namespace bindings 
        # Let's see the situation regarding namespace bindings
        # in parsed data - it can be a bit confusing ...

        # Starting with a very likely example where `foaf` is a
        # prefix for `FOAF1`

        data = """\
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .

        <urn:example:a> foaf:name "Bob" ."""

        g.parse(data=data, format="n3")

        # The `foaf` prefix is replaced by the pre-existing
        # `foaf1` binding

        assert list(g.namespaces()) == [
            ('foaf1', foaf1_uri),
            ("foaf", foaf2_uri),
        ]


        # A different example ...

        data = """\
        @prefix foaf2: <http://xmlns.com/foaf/2.0/> .

        <urn:example:b> foaf2:name "Alice" ."""

        g.parse(data=data, format="n3")

        # Also, namespace bindings in parsed data replace existing
        # bindings (i.e. `oldfoaf` was replaced by `foaf1`), just
        # live with it ...

        assert list(g.namespaces()) == [
            ('foaf1', foaf1_uri),
            ("foaf2", foaf2_uri),
        ]

        # Where a prefix-namespace binding in the data does
        # not clash with the existing:

        data = """\
        @prefix foaf1: <http://xmlns.com/foaf/0.1/> .

        <urn:example:a> foaf1:name "Bob" ."""

        g.parse(data=data, format="n3")

        # No rebindings occur

        assert list(g.namespaces()) == [
            ('foaf1', foaf1_uri),
            ("foaf2", foaf2_uri),
        ]

        # Otherwise, the existing binding is replaced
        data = """\
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .

        <urn:example:b> foaf:name "Alice" ."""

        g.parse(data=data, format="n3")

        assert list(g.namespaces()) == [
            ("foaf2", foaf2_uri),
            ('foaf', foaf1_uri),
        ]

        # Prefixes are bound to namespaces which in turn have a URIRef
        # representation - so a different prefix means a different
        # predicate, therefore a different statement:

        expected = """
            @prefix foaf: <http://xmlns.com/foaf/0.1/> .
            @prefix foaf2: <http://xmlns.com/foaf/2.0/> .

            <urn:example:a> foaf:name "Bob" .

            <urn:example:b> foaf:name "Alice" ;
                foaf2:name "Alice" .

            """

        s = g.serialize(format="n3")

        for l in expected.split():
            assert l in s


More extensive illustration of prefix incrementing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A more extensive illustration of attempted rebinds of
`foaf` being intercepted and changed to a non-clashing
prefix of `foafN` (where N is an incrementing integer) ...

.. code-block:: python

    def test_change_namespace_and_prefix():


        g = Graph(bind_namespaces="none")

        g.bind("foaf", FOAF1)
        assert list(g.namespaces()) == [
            ('foaf', foaf1_uri)
        ]

        g.bind("foaf", FOAF2, replace=True)
        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri)
        ]

        g.bind("foaf1", FOAF1)

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ("foaf1", foaf1_uri)
        ]

        foaf3_uri = URIRef("http://xmlns.com/foaf/3.0/")
        FOAF3 = Namespace("http://xmlns.com/foaf/3.0/")

        g.bind("foaf", FOAF3)

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ("foaf1", foaf1_uri),
            ("foaf2", foaf3_uri),
        ]

        foaf4_uri = URIRef("http://xmlns.com/foaf/4.0/")
        FOAF4 = Namespace("http://xmlns.com/foaf/4.0/")

        g.bind("foaf", FOAF4)

        assert list(g.namespaces()) == [
            ("foaf", foaf2_uri),
            ("foaf1", foaf1_uri),
            ("foaf2", foaf3_uri),
            ("foaf3", foaf4_uri),
        ]


Prefix=>namespace bindings in a multigraph situation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    def test_multigraph_bindings():
        data = """@prefix friend-of-a-friend: <http://xmlns.com/foaf/0.1/> .

        <urn:example:a> friend-of-a-friend:name "Bob" .

        """

        store = Memory()

        g1 = Graph(store, identifier=context1, bind_namespaces="none")

        g1.bind('foaf', FOAF1)
        assert list(g1.namespaces()) == [
            ('foaf', foaf1_uri)
        ]
        assert list(store.namespaces()) == [
            ('foaf', foaf1_uri)
        ]

        g1.add((tarek, FOAF1.name, Literal("tarek")))

        assert list(store.namespaces()) == [
            ('foaf', foaf1_uri)
        ]

        g2 = Graph(store, identifier=context2, bind_namespaces="none")
        g2.parse(data=data, format="n3")

        # The parser-caused rebind is in the underlying store and all objects
        # that use the store see the changed binding:
        assert list(store.namespaces()) == [
            ('friend-of-a-friend', foaf1_uri)
        ]
        assert list(g1.namespaces()) == [
            ('friend-of-a-friend', foaf1_uri)
        ]

        # Including newly-created objects that use the store
        cg = ConjunctiveGraph(store=store)

        assert ('foaf', foaf1_uri) not in list(cg.namespaces())
        assert ('friend-of-a-friend', foaf1_uri) in list(cg.namespaces())

        assert len(list(g1.namespaces())) == 6
        assert len(list(g2.namespaces())) == 6
        assert len(list(cg.namespaces())) == 6
        assert len(list(store.namespaces())) == 6

        cg.store.add_graph(g1)
        cg.store.add_graph(g2)

        assert "@prefix friend-of-a-friend: <http://xmlns.com/foaf/0.1/>" in cg.serialize(format="n3")

        # In the notation3 format, the statement asserting tarek's name
        # now references the changed prefix:
        assert '<urn:example:tarek> friend-of-a-friend:name "tarek" .' in cg.serialize(format="n3")

        # Add foaf2 binding if not already bound
        cg.bind("foaf2", FOAF2, override=False)
        assert ('foaf2', foaf2_uri) in list(cg.namespaces())

        # Impose foaf binding ... if not already bound
        cg.bind("foaf", FOAF1, override=False)
        assert ('foaf', foaf1_uri) not in list(cg.namespaces())

@aucampia aucampia merged commit 3c87565 into RDFLib:master Apr 19, 2022
@ghost
Copy link

ghost commented May 3, 2022

Um, the change to Store.bind() breaks the public API, store plugins (such as oxrdflib, rdflib-leveldb, rdflib-kyotocabinet and rdflib-sqlachemy) are throwing exceptions on "bind got an unexpected keyword 'override'".

@aucampia
Copy link
Member Author

aucampia commented May 3, 2022

Um, the change to Store.bind() breaks the public API, store plugins (such as oxrdflib, rdflib-leveldb, rdflib-kyotocabinet and rdflib-sqlachemy) are throwing exceptions on "bind got an unexpected keyword 'override'".

Created a new issue for this #1880

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with namespace rebinding
3 participants