Establish clear provenance for all non original test data. #1840

Open
aucampia opened this issue Apr 17, 2022 · 2 comments
@aucampia
Member

All non-original test data should have clear provenance so we know that we are testing the right things; this is in part to mitigate problems like this. The best way to establish provenance is to programmatically download the test data, then make it possible to re-download it as part of our test run and ensure it has not changed.

It would be good to solve this before adding more test data.
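
As a rough illustration of the "re-download and ensure it has not changed" idea, here is a minimal Python sketch. The names `fetch_url` and `is_unchanged` are hypothetical and not part of rdflib; the sketch assumes a byte-for-byte comparison is enough, which may not hold for sources that vary their output between requests.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Optional


def fetch_url(url: str, headers: Optional[Dict[str, str]] = None) -> bytes:
    """Download a URL and return the response body as bytes."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return response.read()


def is_unchanged(
    local_path: Path,
    url: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> bool:
    """Return True if the file on disk is byte-identical to the remote resource.

    `fetch` is injectable (hypothetical design choice) so the check can be
    exercised without network access.
    """
    remote_digest = hashlib.sha256(fetch(url)).hexdigest()
    local_digest = hashlib.sha256(local_path.read_bytes()).hexdigest()
    return remote_digest == local_digest
```

A test run could call `is_unchanged` for each file with a known source URL and fail, or warn, when it returns `False`.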

@aucampia
Member Author

I started working on a Makefile for this, but I think doing this from Python may be more sensible, as people working on this library likely know Python better than GNU Make, and Python is much more portable and less quirky than GNU Make.

# This file exists mainly to declaratively establish the provenance of test data.
# Running this file with `make -B all` should redownload all test data with established provenance and should result in no changes to the files on disk.

all:

all: rdfs.ttl
rdfs.ttl:
	curl -L --header "Accept: text/turtle" http://www.w3.org/2000/01/rdf-schema# > $(@)

all: defined_namespaces/qb.ttl
defined_namespaces/qb.ttl:
	curl -L --header "Accept: text/turtle" http://purl.org/linked-data/cube > $(@)

all: suites/w3c/turtle/README
suites/w3c/turtle/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/TurtleTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/nquads/README
suites/w3c/nquads/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/N-QuadsTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/ntriples/README
suites/w3c/ntriples/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/N-TriplesTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/trig/README
suites/w3c/trig/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/TrigTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

# TODO FIXME: This directory contains additional files that should be removed:
# - Manifest.rdf
# - datatypes/test001.borked
all: suites/w3c/rdfxml/README
suites/w3c/rdfxml/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/RDFXMLTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

# TODO FIXME: This directory contains differences from upstream, it seems to be from an older source.
all: suites/DAWG/data-sparql11/manifest-all.ttl
suites/DAWG/data-sparql11/manifest-all.ttl:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2009/sparql/docs/tests/sparql11-test-suite-20121023.tar.gz \
		| tar -zxvf - --strip-components=1 -C $(dir $(@))
	find $(dir $(@)) -type f -print0 | xargs -0 chmod -v 644
	find $(dir $(@)) -type f -print0 | xargs -0 dos2unix
	find $(dir $(@)) -type d -print0 | xargs -0 chmod -v 755

@aucampia
Member Author

aucampia commented Apr 23, 2022

I'm working on this as part of #1807 and #1701 - as I want to download n3 test data from https://github.com/w3c/N3/tree/master/tests. I will write it in Python; it may be slightly more verbose than writing a Makefile, but Makefiles have their own host of problems.
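
For comparison with the `curl ... | tar -zxvf - --strip-components=1 -C <dir>` recipes in the Makefile, the following is a minimal Python sketch of the extraction step only. The function name `extract_stripped` is hypothetical and this is not the contents of `test/data/fetcher.py`; it assumes the archive bytes have already been downloaded, and it only writes regular files.

```python
import io
import tarfile
from pathlib import Path, PurePosixPath
from typing import List


def extract_stripped(archive: bytes, dest: Path, strip: int = 1) -> List[Path]:
    """Extract a .tar.gz archive into dest, dropping the first `strip` path components."""
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            member_path = PurePosixPath(member.name)
            parts = member_path.parts[strip:]
            # Skip entries that vanish after stripping and anything that is
            # not a regular file (directories are created on demand below).
            if not parts or not member.isfile():
                continue
            # Refuse absolute paths and parent references, so that nothing
            # is written outside dest.
            if member_path.is_absolute() or ".." in parts:
                continue
            target = dest.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            source = tar.extractfile(member)
            if source is None:
                continue
            target.write_bytes(source.read())
            extracted.append(target)
    return extracted
```

Unlike the Makefile recipes, this sketch does not delete the destination directory first, so files that are absent upstream would remain in place.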

aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch fixes two issues with the N3 serializer:
- The N3 serializer incorrectly considered a subject as already
  serialized if it has been serialized inside a quoted graph.
- The N3 serializer does not consider that the predicate of
  a triple can also be a graph.

Other changes included in this patch:
- Added the N3 test suite from https://github.com/w3c/N3/tree/master/tests
- Added `test/data/fetcher.py` which fetches remote test data.
- Changed `test.testutils.GraphHelper` to support nested graphs.

Fixes:
- RDFLib#1807
- RDFLib#1701

Related:
- RDFLib#1840
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch adds the N3 test suite from https://github.com/w3c/N3/tree/master/tests
and also adds `test/data/fetcher.py` which fetches remote test data.

Remotes are added for some data in the test data directory, more will be
added later and the data itself will be corrected.

I'm mainly doing this because I want N3 test data to test the fix I'm
making for these issues:
- RDFLib#1807
- RDFLib#1701

Related to:
- RDFLib#1840
@aucampia aucampia removed their assignment May 29, 2022
@aucampia aucampia self-assigned this Mar 25, 2023