Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope blank nodes to each federated source #624

Merged
merged 8 commits into from
Mar 26, 2020
Merged

Conversation

rubensworks
Copy link
Member

When federating over multiple sources,
the outcoming blank nodes coming from each source
will receive a distinct blank node,
so that they can not be joined across different sources,
even if they have the same blank node label.

A reverse translation also takes place for incoming queries with blank nodes,
so that these blank nodes will only match if they come from that source.

Blank nodes coming from sources will receive a .skolemized field
containing a named node.
This named node can be queried again as an IRI, and this will
be interpreted by Comunica as a blank node corresponding to the proper
source, assuming that the array of sources remains the same.

Closes #355
Required for LDflex/Query-Solid#34

While the implementation is quite trivial, the impact is quite major, so this deserves a detailed review.

When federating over multiple sources,
the outcoming blank nodes coming from each source
will receive a distinct blank node,
so that they can not be joined across different sources,
even if they have the same blank node label.

A reverse translation also takes place for incoming queries with blank nodes,
so that these blank nodes will only match if they come from that source.

Blank nodes coming from sources will receive a .skolemized field
containing a named node.
This named node can be queried again as an IRI, and this will
be interpreted by Comunica as a blank node corresponding to the proper
source, assuming that the array of sources remains the same.

Closes #355
Required for LDflex/Query-Solid#34
@CLAassistant
Copy link

CLAassistant commented Mar 16, 2020

CLA assistant check
All committers have signed the CLA.

@coveralls
Copy link

coveralls commented Mar 16, 2020

Pull Request Test Coverage Report for Build 2163

  • 37 of 37 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 2142: 0.0%
Covered Lines: 4116
Relevant Lines: 4116

💛 - Coveralls

@rubensworks
Copy link
Member Author

Note to self: test BlankNodeSkolemizable#equals

@@ -0,0 +1,22 @@
import * as RDF from "rdf-js";

export class BlankNodeSkolemizable implements RDF.BlankNode {

This comment was marked as resolved.

}

public equals(other: RDF.Term | null | undefined): boolean {
return other && other.termType === 'BlankNode' && other.value === this.value;

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

const match = /^urn:comunica_skolem:source_([0-9]+):(.+)$/.exec(term.value);
if (match) {
// We had a skolemized term
if (parseInt(match[1], 10) === sourceId) {

This comment was marked as resolved.

*/
public static deskolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | false {
if (term.termType === 'NamedNode' || term.termType === 'BlankNode') {
const match = /^urn:comunica_skolem:source_([0-9]+):(.+)$/.exec(term.value);

This comment was marked as resolved.

public static skolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | BlankNodeSkolemizable {
if (term.termType === 'BlankNode') {
const skolemized = `urn:comunica_skolem:source_${sourceId}:${term.value}`;
return new BlankNodeSkolemizable(skolemized, DataFactory.namedNode(skolemized));

This comment was marked as resolved.

This comment was marked as resolved.

* @param term Any RDF term.
* @param sourceId A numerical source identifier.
*/
public static deskolemizeTerm(term: RDF.Term, sourceId: number): RDF.Term | false {

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

public getSourceId(source: IDataSource): number {
let sourceId = this.sourceIds.get(source);
if (sourceId === undefined) {
sourceId = this.sourceIds.size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you might want to give sources the opportunity to optionally define their own ID. Then, if a source is hidden behind a cache or something, it would still give the same ID.

FederatedQuadSource.nullToVariable(graph, 'g'),
);
return new PromiseProxyIterator(async () => {
// Deskolemize terms, so we send the original blank nodes to each source.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

);
return new PromiseProxyIterator(async () => {
// Deskolemize terms, so we send the original blank nodes to each source.
const s = FederatedQuadSource.deskolemizeTerm(FederatedQuadSource.nullToVariable(subject, 's'), sourceId);

This comment was marked as resolved.

@rubensworks
Copy link
Member Author

rubensworks commented Mar 24, 2020

@RubenVerborgh Made some changes based on your comments.
Could you have another look at it?
It's important that we get this part right :-)

@rubensworks rubensworks force-pushed the feature/skolemization branch from 1c60d7d to 68ca460 Compare March 24, 2020 12:43
Copy link
Contributor

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

}

public equals(other: RDF.Term | null | undefined): boolean {
return other && other.termType === 'BlankNode' && other.value === this.value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly !!other && …, but that will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...
I should really enable those strict null checks in TS, because that would have detected this problem at compile time.

const termSourceId = term.value.substr(FederatedQuadSource.SKOLEM_PREFIX.length, colonSeparator - FederatedQuadSource.SKOLEM_PREFIX.length);
// We had a skolemized term
if (termSourceId === sourceId) {
// It can from the correct source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*came

// Deskolemize terms, so we send the original blank nodes to each source.
// Note that some sources may not match bnodes by label. SPARQL endpoints for example consider them variables.
const s = !subject ? DataFactory.variable('vs') : FederatedQuadSource.deskolemizeTerm(subject, sourceId);
const p = !predicate ? DataFactory.variable('vp') : FederatedQuadSource.deskolemizeTerm(predicate, sourceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Predicates shouldn't be blanks I guess, but better to be safe than sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. By keeping it generic, we can in the future have support for generalized RDF should that ever be needed.

@rubensworks rubensworks merged commit cb7b9b2 into master Mar 26, 2020
@rubensworks rubensworks deleted the feature/skolemization branch March 26, 2020 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scope blank nodes to original sources when federating
4 participants