Optional skolemize blank nodes on parse #2736
Comments
I'll look into this, but it seems to me that we would have to work on both the store and the parser for that. I haven't tried this, and I'm sure there are some problems with it, but something like this:
from typing import MutableMapping
from rdflib import BNode, Graph
from rdflib.compare import isomorphic
# Filled by the parser: blank node label in data.nt -> BNode created for it.
bnode_context_A: MutableMapping[str, BNode] = {}
in_graph = Graph().parse("data.nt", format="ntriples", bnode_context=bnode_context_A)
# Parsed BNode -> its skolem IRI.
bnode_context_B = {}
skolem_graph = Graph()
for ax in in_graph:
    for x in ax:
        # Only blank nodes are skolemized; IRIs and literals pass through.
        if isinstance(x, BNode) and x not in bnode_context_B:
            bnode_context_B[x] = x.skolemize()
    skolem_graph.add(tuple(bnode_context_B.get(x, x) for x in ax))
# Original label -> skolem IRI.
bnode_context = {k: bnode_context_B[v] for k, v in bnode_context_A.items()}
graph = Graph().parse("data.nt", format="ntriples")
assert isomorphic(in_graph, graph)
I haven't looked into how to get this to work:
# I can use skolem_graph across systems with the blank node identifiers preserved from the original data.nt file.
skolem_graph.serialize(format="ntriples")
But you should now be able to load with persistent skolemization:
# This should be the same graph as skolem_graph:
new_graph = Graph().parse("data.nt", format="ntriples", bnode_context=bnode_context)
Perhaps this runnable example will explain it more clearly.
from rdflib import Graph
from rdflib.compare import isomorphic
data = """
<urn:object> <urn:hasPart> _:internal-bnode-id-1 .
_:internal-bnode-id-1 <urn:value> "..." .
"""
skolem_graph = Graph().parse(data=data, format="ntriples").skolemize()
graph = Graph().parse(data=data, format="ntriples")
assert isomorphic(skolem_graph.de_skolemize(), graph)
# The output should contain the skolem IRI
# <https://rdflib.github.io/.well-known/genid/rdflib/internal-bnode-id-1>
# but instead, we get something like:
#
# <https://rdflib.github.io/.well-known/genid/rdflib/N19d54f84f7e84ba8a270ddb627e92cdb> <urn:value> "..." .
# <urn:object> <urn:hasPart> <https://rdflib.github.io/.well-known/genid/rdflib/N19d54f84f7e84ba8a270ddb627e92cdb> .
#
# where N19d54f84f7e84ba8a270ddb627e92cdb is the remapped blank node id by RDFLib.
skolem_graph.print(format="ntriples")
If we were able to skolemize blank nodes at parse time, we would expect output like this:
<urn:object> <urn:hasPart> <https://rdflib.github.io/.well-known/genid/rdflib/internal-bnode-id-1> .
<https://rdflib.github.io/.well-known/genid/rdflib/internal-bnode-id-1> <urn:value> "..." .
Essentially, without a change to the logic at parse time, it's impossible to skolemize blank nodes and preserve the identifiers in the original data.
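Short of changing the parser, the labels can still be carried over by rewriting the N-Triples text before it is parsed. The following is only a minimal sketch of that idea, not something RDFLib provides: `skolemize_ntriples` and `GENID_BASE` are hypothetical names, the base IRI is copied from the expected output above, and blank node labels are assumed to be ASCII-only.

```python
import re

# Assumption: same authority and path as the skolem IRIs shown in this thread.
GENID_BASE = "https://rdflib.github.io/.well-known/genid/rdflib/"

# IRIs, literals and comments are matched first so that a "_:" inside them
# is left alone; only a bare blank node label fills the "bnode" group.
_TOKEN = re.compile(
    r'<[^>]*>'                # an IRI
    r'|"(?:[^"\\]|\\.)*"'     # a string literal, honoring escapes
    r'|#.*'                   # a comment running to end of line
    r'|_:(?P<bnode>[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)'
)


def skolemize_ntriples(text: str) -> str:
    """Replace each blank node label with a skolem IRI that keeps the label."""

    def repl(match: re.Match) -> str:
        label = match.group("bnode")
        if label is None:
            # An IRI, literal or comment: return it unchanged.
            return match.group(0)
        return f"<{GENID_BASE}{label}>"

    return _TOKEN.sub(repl, text)


data = (
    '<urn:object> <urn:hasPart> _:internal-bnode-id-1 .\n'
    '_:internal-bnode-id-1 <urn:value> "..." .\n'
)
skolemized = skolemize_ntriples(data)
print(skolemized)
```

The rewritten text contains no blank nodes, so it can be handed to any N-Triples parser without the labels being remapped. The trade-off is that this works on text rather than on parsed terms, which is why the pattern has to step over literals and comments explicitly.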
Would it be enough to use an identity mapping for
from rdflib import Graph, BNode
from rdflib.compare import isomorphic
data = """
<urn:object> <urn:hasPart> _:internal-bnode-id-1 .
_:internal-bnode-id-1 <urn:value> "..." .
"""
from typing import MutableMapping
class IdMap(MutableMapping[str, BNode]):
    def __init__(self, dct=None):
        self.dct = {} if dct is None else dct
    def __getitem__(self, key: str) -> BNode:
        # Unseen labels get a BNode named after the label itself.
        return self.dct.setdefault(key, BNode(key))
    def __setitem__(self, key: str, value: BNode):
        self.dct[key] = value
    def __delitem__(self, key: str):
        return self.dct.__delitem__(key)
    def __iter__(self):
        return iter(self.dct)
    def __len__(self) -> int:
        return len(self.dct)
skolem_graph = Graph().parse(data=data, format="ntriples", bnode_context=IdMap())
for x in skolem_graph:
    print(x)
I'm not sure how to make a transparent implementation of skolemization during parsing. I would rather invest time in documenting skolemization in rdflib and have a recipe for this somewhere.
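Why an identity mapping like this can work, as a minimal sketch: `MutableMapping` builds `get()` on top of `__getitem__`, so a `__getitem__` that never raises makes every lookup succeed and return a node named after the label. Plain strings stand in for `BNode` here so the snippet runs without rdflib; `LabelMap` is a hypothetical name, and whether the parser looks labels up through `get()` or `[]` is an assumption.

```python
from collections.abc import MutableMapping


class LabelMap(MutableMapping):
    """Stand-in for IdMap: maps every label to a node named after it."""

    def __init__(self):
        self.dct = {}

    def __getitem__(self, key):
        # Never raises KeyError: unseen labels are created on first access.
        return self.dct.setdefault(key, f"_:{key}")

    def __setitem__(self, key, value):
        self.dct[key] = value

    def __delitem__(self, key):
        del self.dct[key]

    def __iter__(self):
        return iter(self.dct)

    def __len__(self):
        return len(self.dct)


plain = {}
labels = LabelMap()

# A plain dict reports a miss, which is where a parser would mint a fresh id.
miss = plain.get("internal-bnode-id-1")
# The identity mapping answers with the original label instead.
hit = labels.get("internal-bnode-id-1")
print(miss, hit)
```

Note that this only preserves the label on a blank node; it does not turn the node into a skolem IRI.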
Thank you @WhiteGobo for your example. I dug into the code a bit and looked into the history of why the I agree with you, the current API is not very transparent, and yes, it would be nice to have recipes, but I still think a change to the API is beneficial here.
I have a use case where I need to preserve the blank node identifiers when loading data into a Graph object. To do this, I'd like an option on the rdflib.Graph.parse method, either a custom format (like ntriples-skolem) or a flag on the parse method (skolemize=True), to skolemize blank nodes before adding the statements to the graph.
This is needed because RDF blank nodes are scoped to the local document. As soon as the document is read into a new system (like an RDFLib graph object), the blank nodes are remapped and assigned new identifiers. There's no guarantee that the original blank node identifiers are preserved.
Some pseudocode usage: