-
-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scope blank nodes to each federated source #624
Conversation
When federating over multiple sources, the outcoming blank nodes coming from each source will receive a distinct blank node, so that they can not be joined across different sources, even if they have the same blank node label. A reverse translation also takes place for incoming queries with blank nodes, so that these blank nodes will only match if they come from that source. Blank nodes coming from sources will receive a .skolemized field containing a named node. This named node can be queried again as an IRI, and this will be interpreted by Comunica as a blank node corresponding to the proper source, assuming that the array of sources remains the same. Closes #355 Required for LDflex/Query-Solid#34
Pull Request Test Coverage Report for Build 2163
💛 - Coveralls |
Note to self: test |
@@ -0,0 +1,22 @@ | |||
import * as RDF from "rdf-js"; | |||
|
|||
export class BlankNodeSkolemizable implements RDF.BlankNode { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
} | ||
|
||
public equals(other: RDF.Term | null | undefined): boolean { | ||
return other && other.termType === 'BlankNode' && other.value === this.value; |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
const match = /^urn:comunica_skolem:source_([0-9]+):(.+)$/.exec(term.value); | ||
if (match) { | ||
// We had a skolemized term | ||
if (parseInt(match[1], 10) === sourceId) { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
*/ | ||
public static deskolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | false { | ||
if (term.termType === 'NamedNode' || term.termType === 'BlankNode') { | ||
const match = /^urn:comunica_skolem:source_([0-9]+):(.+)$/.exec(term.value); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
public static skolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | BlankNodeSkolemizable { | ||
if (term.termType === 'BlankNode') { | ||
const skolemized = `urn:comunica_skolem:source_${sourceId}:${term.value}`; | ||
return new BlankNodeSkolemizable(skolemized, DataFactory.namedNode(skolemized)); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
* @param term Any RDF term. | ||
* @param sourceId A numerical source identifier. | ||
*/ | ||
public static deskolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | false { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
public getSourceId(source: IDataSource): number { | ||
let sourceId = this.sourceIds.get(source); | ||
if (sourceId === undefined) { | ||
sourceId = this.sourceIds.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you might want to give sources the opportunity to optionally define their own ID. Then, if a source is hidden behind a cache or something, it would still give the same ID.
FederatedQuadSource.nullToVariable(graph, 'g'), | ||
); | ||
return new PromiseProxyIterator(async () => { | ||
// Deskolemize terms, so we send the original blank nodes to each source. |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
); | ||
return new PromiseProxyIterator(async () => { | ||
// Deskolemize terms, so we send the original blank nodes to each source. | ||
const s = FederatedQuadSource.deskolemizeTerm(FederatedQuadSource.nullToVariable(subject, 's'), sourceId); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
Remove regex, parseInt, and optimize return types for V8
@RubenVerborgh Made some changes based on your comments. |
1c60d7d
to
68ca460
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
} | ||
|
||
public equals(other: RDF.Term | null | undefined): boolean { | ||
return other && other.termType === 'BlankNode' && other.value === this.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strictly !!other && …
, but that will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right...
I should really enable those strict null checks in TS, because that would have detected this problem at compile time.
const termSourceId = term.value.substr(FederatedQuadSource.SKOLEM_PREFIX.length, colonSeparator - FederatedQuadSource.SKOLEM_PREFIX.length); | ||
// We had a skolemized term | ||
if (termSourceId === sourceId) { | ||
// It can from the correct source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*came
// Deskolemize terms, so we send the original blank nodes to each source. | ||
// Note that some sources may not match bnodes by label. SPARQL endpoints for example consider them variables. | ||
const s = !subject ? DataFactory.variable('vs') : FederatedQuadSource.deskolemizeTerm(subject, sourceId); | ||
const p = !predicate ? DataFactory.variable('vp') : FederatedQuadSource.deskolemizeTerm(predicate, sourceId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Predicates shouldn't be blanks I guess, but better to be safe than sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. By keeping it generic, we can in the future have support for generalized RDF should that ever be needed.
When federating over multiple sources,
the outcoming blank nodes coming from each source
will receive a distinct blank node,
so that they can not be joined across different sources,
even if they have the same blank node label.
A reverse translation also takes place for incoming queries with blank nodes,
so that these blank nodes will only match if they come from that source.
Blank nodes coming from sources will receive a .skolemized field
containing a named node.
This named node can be queried again as an IRI, and this will
be interpreted by Comunica as a blank node corresponding to the proper
source, assuming that the array of sources remains the same.
Closes #355
Required for LDflex/Query-Solid#34
While the implementation is quite trivial, the impact is quite major, so this deserves a detailed review.