Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdflib 6.2.0 SPARQL parsing regression: multiple prefixes for same IRI not supported? #2077

Open
jclerman opened this issue Aug 11, 2022 · 8 comments
Labels
bug Something isn't working concept: namespace regression Something stopped working

Comments

@jclerman
Copy link
Contributor

I have a standard block of SPARQL prefixes that I use in some of my code, and I started getting test failures when I tried to update from rdflib 6.1.1 to rdflib 6.2.0. I noticed that the problem seems to be that somehow, support for multiple prefixes that resolve to the same URI seems to have been dropped.

Example block at the beginning of a SPARQL query:

PREFIX foo: <http://example.com#>
PREFIX bar: <http://example.com#>

The above should, as far as I know, be supported - but in rdflib 6.2.0, the foo: prefix will not be recognized when used in the query - and in fact, during parsing of the query, the foo: prefix is deleted.

The issue seems to be somewhere in rdfllib.namespace.__init__.py, in NamespaceManager.bind() - but I can't quite pin down what change is causing this new issue.

I do see this comment in the code, which was present in 6.1.1 but maybe it wasn't working as described?

        # Check if the bound_namespace contains a URI
        # and if so convert it into a URIRef for comparison
        # This is to prevent duplicate namespaces with the
        # same URI

however, if I correctly understand the intent behind that comment, it seems problematic - there is nothing wrong (as far as I know?) with multiple prefixes for the same URI in SPARQL.

@aucampia aucampia added the regression Something stopped working label Aug 11, 2022
@aucampia
Copy link
Member

however, if I correctly understand the intent behind that comment, it seems problematic - there is nothing wrong (as far as I know?) with multiple prefixes for the same URI in SPARQL.

I agree with you that I think it is problematic, I don't really see anything wrong with multiple prefixes either, maybe one concern is that we could end up with ambiguity in what the output of serializers that support namespaces will be, but I don't really think that is reason enough to not support multiple prefixes mapping to the same URI.

@aucampia aucampia reopened this Aug 11, 2022
@aucampia
Copy link
Member

Sorry for the accidental closure.

@jclerman
Copy link
Contributor Author

I've looked into this further, motivated by the issues I described in #2348.

The root of the problem is at least reflected in the fact that the Store class (in rdflib/store.py) defines the prefix method with the expectation that it'll return an Optional[str]. I think instead it should be returning a Set[str] (which might be empty).

All of the implementations of Store would need to be updated with that in mind, including updates of their implementations of bind(). For example, in the Memory store, we have

if override:
if bound_prefix is not None:
del self.__namespace[bound_prefix]
if bound_namespace is not None:
del self.__prefix[bound_namespace]
self.__prefix[namespace] = prefix
self.__namespace[prefix] = namespace

which might change to something like:

        if override:
            if bound_prefix is not None:
                # del self.__namespace[bound_prefix]
                pass
            if bound_namespace is not None:
                del self.__prefix[bound_namespace]
            self.__prefix[namespace].add(prefix)
            self.__namespace[prefix] = namespace

There are tests now that enforce the current behavior - bind("xyz", URI, override=True) is tested to verify that it results in removal of any previous prefix that mapped to the URI in question - that seems wrong. If the URI already has a mapping, override should allow the new mapping to be added without removing the old one - or alternatively (I might be confused about the intent of override), that's what should happen if override is False... or maybe we can just get rid of override, since we have replace for when we want to change the mapping for a prefix?

@jclerman
Copy link
Contributor Author

If the above seems reasonable, I'd be willing to open a PR to make the changes. It seems like a potentially-significant change though, so some discussion of the change seems indicated.

aucampia added a commit to jclerman/rdflib that referenced this issue May 18, 2023
* Changed the tests to use an in-module fixture instead of the
  `test_ontology.owl` graph and removed the `test_ontology.owl` file. The
  problem with arbitrary test data is that it is hard to evolve. If possible
  test data should be kept as terse as possible. In this case I just created a
  namespace manager with three namespaces added, of which only two will be
  effective because of <RDFLib#2077>.
* Added a warning to `NamespaceManager.curie` to warn users that it is not a
  pure function and explain the specific side effects.
aucampia pushed a commit that referenced this issue May 21, 2023
Added a `curie` method to `NamespaceManager`, which can be used to generate a
CURIE from a URI.

Other changes:

- Fixed `NamespaceManager.expand_curie` to work with CURIES that have blank
  prefixes (e.g. `:something`), which are valid according to [CURIE Syntax
  1.0](https://www.w3.org/TR/2010/NOTE-curie-20101216/).
- Added a test to confirm <#2077>.

Fixes <#2348>.

---------
Co-authored-by: Iwan Aucamp <[email protected]>
@aucampia aucampia added bug Something isn't working concept: namespace labels Jun 10, 2023
@aucampia
Copy link
Member

@severin-lemaignan
Copy link

I believe this is also the source of an issue with default prefixes:

This works:

PREFIX foo: <http://example.com#>
PREFIX : <http://example.com#>

SELECT ?a ?b
WHERE {
   ?a :bar ?b .
}

This causes: Exception: Unknown namespace prefix : foo

PREFIX foo: <http://example.com#>
PREFIX : <http://example.com#>

SELECT ?a ?b
WHERE {
   ?a foo:bar ?b .
}

@lsharma-202
Copy link

Following for this issue.

Has this been resolved yet? I too am not able to keep multiple IRI for different prefixes and need this feature.

@white-gecko
Copy link
Member

The pr #2800 introduces the tests for this issue.

My use case as:

  • create a graph with a set of default prefixes
  • parse some data into it
  • query it, while assuming the default prefixes are set

So this touches both, the issue of duplicate prefixes and of parsing/serialization side effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working concept: namespace regression Something stopped working
Projects
None yet
Development

No branches or pull requests

5 participants