
Identifier as context #1505

Closed
wants to merge 23 commits into from

Conversation

@ghost ghost commented Dec 14, 2021

Fixes: a few issues (see below)

Proposed Changes

A rebase of the previously rebased PR #958, “Move Store API to work with identifiers, not graphs”.

This is a draft PR because the rebased #958 is basically just the start; a number of issues remain unresolved, even with the changes contained in these commits.

This change is the latest in a series of changes initiated in 2012 with #167 and reignited in 2013 by #301 (and friends). The subtle differences between the way the Dataset and ConjunctiveGraph models work have been discussed at length in various issue threads; in an attempt to get an overview, I collated all the early discussions in a gist.

Lest there be any misperception about the relative complexity of the exercise, gromgull summarised the, ahem, context in #307, including the assertion:

“For SPARQL, it is often interesting to on-the-fly define a new DataSet, i.e. selecting some subset of graphs from another DataSet, or even from several datasets. It would be nice to be able to do this without copying all the triples.”

but was moved to observe four years later in #698:

“A combination of https://github.com/RDFLib/rdflib/blob/master/rdflib/graph.py#L1698 and https://github.com/RDFLib/rdflib/blob/master/rdflib/graph.py#L2028 means you can pass an existing graph into the dataset. This will then be added directly. But there is no guarantee this new graph has the same store, and the triples will not be copied over. This is chaos, but wasn't flagged earlier since it's chaos all the way down :)”
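To make the hazard in that quote concrete, here is a toy sketch of the pattern it describes. These are hypothetical stand-in classes, not rdflib's actual implementation: a Dataset registers an externally created Graph object directly, without checking that it shares the dataset's store and without copying its triples, so the graph's contents are invisible to the dataset.

```python
# Toy model of the problem gromgull describes (stand-in classes, NOT rdflib's
# actual Store/Graph/Dataset): a graph backed by a foreign store is registered
# with the dataset, but its triples are never copied into the dataset's store.

class Store:
    """Minimal triple store keyed by context identifier."""
    def __init__(self):
        self.triples = {}  # context identifier -> set of triples

    def add(self, triple, context):
        self.triples.setdefault(context, set()).add(triple)

    def all_triples(self):
        return set().union(*self.triples.values()) if self.triples else set()


class Graph:
    def __init__(self, store, identifier):
        self.store = store
        self.identifier = identifier

    def add(self, triple):
        self.store.add(triple, self.identifier)


class Dataset:
    def __init__(self):
        self.store = Store()
        self.graphs = {}

    def add_graph(self, graph):
        # Registers the graph object directly: no check that it shares this
        # dataset's store, and no copying of its triples.
        self.graphs[graph.identifier] = graph
        return graph


ds = Dataset()
foreign = Graph(Store(), "urn:x-example:g1")  # backed by a *different* store
foreign.add(("s", "p", "o"))
ds.add_graph(foreign)

# The graph is registered, yet its triple is invisible to the dataset's store:
assert "urn:x-example:g1" in ds.graphs
assert ("s", "p", "o") not in ds.store.all_triples()
```

The "chaos all the way down" is exactly this silent divergence: queries against the dataset's store never see triples held by the foreign store.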

The issues list contains a couple of dozen issues related to the whole "identifier as context" business. I'm gradually working my way through them, establishing whether each remains extant or is fixed by this PR.

My modus operandi is to create a test (in test_dataset_anomaly_wip.py) and, once I've characterised the issue, migrate it to test_dataset_anomalies.py with annotations. I've also created a separate test module, test_identifier_as_context.py, which exercises all of the current Dataset docstrings.

Example: issue #939, “rdflib 4.2.2: ConjunctiveGraph.parse() return a Graph object”, is fixed by the changes in this PR, as attested by test_issue939_parse_return_inconsistent_type.
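The shape of the inconsistency reported in #939 can be sketched with hypothetical stand-in classes (not rdflib's implementation): parse() on the conjunctive graph hands back the plain context Graph it parsed into, so callers who expect the same type they called parse() on get a different one.

```python
# Toy sketch of the #939 return-type inconsistency (hypothetical classes,
# NOT rdflib's actual code). The "buggy" variant returns the per-context
# Graph; the "fixed" variant returns an object of the caller's own type.

class Graph:
    def parse(self, data):
        # ...parse data into this graph (elided in this sketch)...
        return self  # a plain Graph's parse returns the graph itself


class ConjunctiveGraph(Graph):
    def parse_buggy(self, data):
        context = Graph()           # per-context graph for the parsed data
        return context.parse(data)  # returns a Graph, not a ConjunctiveGraph

    def parse_fixed(self, data):
        context = Graph()
        context.parse(data)
        return self                 # consistent: same type as the caller


cg = ConjunctiveGraph()
assert type(cg.parse_buggy("...")) is Graph             # surprising to callers
assert type(cg.parse_fixed("...")) is ConjunctiveGraph  # consistent
```

The test named above presumably pins down exactly this: that parse() on the richer graph types no longer returns an object of a narrower type.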

A note to reviewers: in order to get the tests to pass without extensive skipping, I've been obliged to make a couple of changes in addition to the rebased PR, and for those I have provided a rationale, prefixed with DEVNOTE. I had to add three failing named-graph-related DAWG tests to the skippedtests list; SPARQL Update CLEAR remains an issue at the moment.


ghost commented Dec 15, 2021

The basics look quite good: serialization of the default graph and of the contexts looks sane and as expected.
Attached are results from the WIP, methodically iterating the serializers over the dataset object and then over its contexts.

anomalies-wip-results.txt


ghost commented Jan 2, 2022

This got too scrappy to be useful in its current state; closing, and will reopen when it's more coherent.

@ghost ghost closed this Jan 2, 2022