-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test re-org #1838
test re-org #1838
Conversation
Will review tomorrow morning (CET) |
@gjhiggins merging master into this as it seems to be from an older master. Would be better to rebase rather than merge as that will make for cleaner diffs, but I think that may make things a bit messed up on your end if you don't expect it, so I will just go for merging. |
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
Oops, sorry about that, I fell behind the curve without noticing. Given that there are so few actual changes, might this be better managed if I close this PR and re-submit with it split into several commits so that github shows all the changes per commit instead of truncating them, as at present? |
@gjhiggins its fine as is, meged worked, almost done with review, I was busy creating a Makefile to (re-)download all test data so it is easier to validate provanance, but will put this in a subsequent PR. I will also fix any issues on windows. |
Seems windows is a bit freaked out by line endings, will see what to do about it. |
Ah, okay.
That makes a lot of sense.
May be related
|
You are right, but there is also I think a deeper issue here in that git is tampering with line endings, and our line endings are slightly different, just adjusting these checks should fix things though, will do it shortly. |
Er, those tests should be marked as Line 84 in 6f2c11c
same for
??? |
But the URI no longer matches, I think we can passs publicID in to make it match again. |
For some reason test run is a lot slower on this branch, not sure why: master: $ .venv/bin/python3 -m pytest ./test/test_parsers/test_turtle_w3c.py
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.clean, configfile: tox.ini
plugins: subtests-0.7.0, md-report-0.2.0, cov-3.0.0
collected 287 items
test/test_parsers/test_turtle_w3c.py ................................................................................................................................ [ 44%]
............................................................................................................................................................... [100%]
============================================================================ 287 passed in 0.79s ============================================================================ this branch: $ .venv/bin/python3 -m pytest ./test/test_parsers/test_turtle_w3c.py
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.cleanish, configfile: tox.ini
plugins: subtests-0.7.0, md-report-0.2.0, cov-3.0.0
collected 287 items
test/test_parsers/test_turtle_w3c.py ................................................................................................................................ [ 44%]
..............................................................
................................................................................................. [100%]
====================================================================== 287 passed in 107.51s (0:01:47) ====================================================================== |
looking into that first because it really makes it hard to change anything. |
Okay that only happens when I pass base into the test manifest. |
Will just work on URI suffix then - though this is basically hitting the same problem as here somewhat: #1804 - but I'm not sure entirely. |
Oh, okay, you fixed it. I'm still mystified but that's not unusual 😄 |
basically, when you synced test data, this changed: https://gist.github.com/aucampia/21671bccb12975d4d951156d6ffafa20#file-gistfile1-patch-L940-L946
-@prefix : <https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/tests-nt/manifest.ttl#> .
-
<> rdf:type mf:Manifest ;
mf:name "N-Triples tests" ;
mf:entries
( It should not be in there, because we should be using original test data, but because RDFLib resolves relative URIs to the file location this essentially now makes the URI dependent on where the file is, so a URI check as was done before does not work. passing publicID makes the tests several orders of manitiude slower, so i don't want to do that. We should figure out why that happens but for now it is best to just get things working and into master so we can work independently on other issues. |
For me there is a difference in the amount of executed tests on master and this branch: this branch:
master
|
That difference is notwithstanding the additional test added in |
coveralls also sees different coverage:
|
I think the missing tests are in test/test_roundtrip.py master
this branch
|
I'm guessing the difference is because of:
|
These files are now in a different directory.
Just waiting for latest coveralls report and doing some final double check, but I think it is good to go. |
There is still about 0.3% missing coverage:
Will track it down a bit later. |
Coverage diff: $ diff -U 300 /var/tmp/cover.master /var/tmp/cover.pr
--- /var/tmp/cover.master 2022-04-17 14:19:05.659948239 +0200
+++ /var/tmp/cover.pr 2022-04-17 14:20:43.598464576 +0200
@@ -1,121 +1,121 @@
---------- coverage: platform linux, python 3.9.12-final-0 -----------
Name Stmts Miss Branch BrPart Cover
--------------------------------------------------------------------------------
rdflib/__init__.py 28 6 2 1 77%
rdflib/collection.py 106 15 34 9 81%
rdflib/compare.py 334 18 183 10 93%
rdflib/compat.py 54 23 14 1 53%
rdflib/container.py 146 26 44 9 78%
rdflib/events.py 34 6 16 1 82%
rdflib/exceptions.py 14 1 6 0 95%
rdflib/extras/__init__.py 0 0 0 0 100%
rdflib/extras/cmdlineutils.py 38 38 12 0 0%
rdflib/extras/describer.py 52 52 10 0 0%
rdflib/extras/external_graph_libs.py 70 31 38 1 52%
rdflib/extras/infixowl.py 1010 439 553 54 51%
rdflib/graph.py 920 199 426 40 76%
rdflib/namespace/_BRICK.py 1201 0 2 0 100%
rdflib/namespace/_CSVW.py 88 0 2 0 100%
rdflib/namespace/_DC.py 20 0 2 0 100%
rdflib/namespace/_DCAM.py 9 0 2 0 100%
rdflib/namespace/_DCAT.py 40 0 2 0 100%
rdflib/namespace/_DCMITYPE.py 17 0 2 0 100%
rdflib/namespace/_DCTERMS.py 102 0 2 0 100%
rdflib/namespace/_DOAP.py 48 0 2 0 100%
rdflib/namespace/_FOAF.py 80 0 2 0 100%
rdflib/namespace/_GEO.py 43 0 2 0 100%
rdflib/namespace/_ODRL2.py 210 0 2 0 100%
rdflib/namespace/_ORG.py 50 0 2 0 100%
rdflib/namespace/_OWL.py 84 0 2 0 100%
rdflib/namespace/_PROF.py 14 0 2 0 100%
rdflib/namespace/_PROV.py 175 0 2 0 100%
rdflib/namespace/_QB.py 41 0 2 0 100%
rdflib/namespace/_RDF.py 28 0 2 0 100%
rdflib/namespace/_RDFS.py 20 0 2 0 100%
rdflib/namespace/_SDO.py 2771 0 2 0 100%
rdflib/namespace/_SH.py 184 0 2 0 100%
rdflib/namespace/_SKOS.py 37 0 2 0 100%
rdflib/namespace/_SOSA.py 40 0 2 0 100%
rdflib/namespace/_SSN.py 25 0 2 0 100%
rdflib/namespace/_TIME.py 99 0 2 0 100%
rdflib/namespace/_VANN.py 11 0 2 0 100%
rdflib/namespace/_VOID.py 36 0 2 0 100%
rdflib/namespace/_WGS.py 11 0 2 0 100%
rdflib/namespace/_XSD.py 76 0 2 0 100%
-rdflib/namespace/__init__.py 377 57 174 13 83%
+rdflib/namespace/__init__.py 377 58 174 15 83%
rdflib/parser.py 259 50 116 14 78%
rdflib/paths.py 210 36 152 15 80%
rdflib/plugin.py 137 15 28 4 85%
rdflib/plugins/__init__.py 0 0 0 0 100%
rdflib/plugins/parsers/RDFVOC.py 14 0 2 0 100%
rdflib/plugins/parsers/__init__.py 0 0 0 0 100%
rdflib/plugins/parsers/hext.py 47 2 30 4 92%
rdflib/plugins/parsers/jsonld.py 320 13 228 14 95%
rdflib/plugins/parsers/notation3.py 1193 196 548 101 80%
rdflib/plugins/parsers/nquads.py 39 2 12 2 92%
rdflib/plugins/parsers/ntriples.py 211 8 86 9 94%
-rdflib/plugins/parsers/rdfxml.py 404 22 166 13 94%
+rdflib/plugins/parsers/rdfxml.py 404 76 166 37 78%
rdflib/plugins/parsers/trig.py 97 8 48 7 90%
rdflib/plugins/parsers/trix.py 159 30 74 20 79%
rdflib/plugins/serializers/__init__.py 0 0 0 0 100%
rdflib/plugins/serializers/hext.py 71 7 46 8 87%
rdflib/plugins/serializers/jsonld.py 227 17 149 16 90%
rdflib/plugins/serializers/longturtle.py 203 27 86 19 83%
rdflib/plugins/serializers/n3.py 75 0 30 0 100%
rdflib/plugins/serializers/nquads.py 27 3 14 3 85%
rdflib/plugins/serializers/nt.py 44 9 20 3 78%
-rdflib/plugins/serializers/rdfxml.py 211 13 116 15 91%
+rdflib/plugins/serializers/rdfxml.py 211 12 116 14 92%
rdflib/plugins/serializers/trig.py 72 3 36 4 94%
rdflib/plugins/serializers/trix.py 54 5 32 4 87%
rdflib/plugins/serializers/turtle.py 290 24 124 10 89%
rdflib/plugins/serializers/xmlwriter.py 81 2 34 3 96%
rdflib/plugins/shared/__init__.py 0 0 0 0 100%
rdflib/plugins/shared/jsonld/__init__.py 0 0 0 0 100%
rdflib/plugins/shared/jsonld/context.py 361 9 212 10 97%
rdflib/plugins/shared/jsonld/errors.py 5 0 2 0 100%
rdflib/plugins/shared/jsonld/keys.py 22 0 0 0 100%
rdflib/plugins/shared/jsonld/util.py 58 16 32 4 69%
rdflib/plugins/sparql/__init__.py 22 5 10 4 66%
rdflib/plugins/sparql/aggregates.py 163 5 63 4 96%
rdflib/plugins/sparql/algebra.py 764 48 573 29 93%
rdflib/plugins/sparql/datatypes.py 21 3 8 1 86%
rdflib/plugins/sparql/evaluate.py 316 15 193 13 94%
rdflib/plugins/sparql/evalutils.py 77 13 50 4 78%
rdflib/plugins/sparql/operators.py 570 55 246 35 87%
rdflib/plugins/sparql/parser.py 314 28 92 14 89%
rdflib/plugins/sparql/parserutils.py 133 27 76 4 77%
rdflib/plugins/sparql/processor.py 39 0 10 0 100%
rdflib/plugins/sparql/results/__init__.py 0 0 0 0 100%
rdflib/plugins/sparql/results/csvresults.py 46 1 28 1 97%
rdflib/plugins/sparql/results/graph.py 8 8 2 0 0%
rdflib/plugins/sparql/results/jsonresults.py 80 12 50 8 82%
rdflib/plugins/sparql/results/rdfresults.py 36 3 24 3 90%
rdflib/plugins/sparql/results/tsvresults.py 42 2 16 3 91%
rdflib/plugins/sparql/results/txtresults.py 38 9 30 4 72%
rdflib/plugins/sparql/results/xmlresults.py 137 16 52 8 87%
rdflib/plugins/sparql/sparql.py 242 17 90 6 91%
rdflib/plugins/sparql/update.py 151 10 94 8 92%
rdflib/plugins/stores/__init__.py 0 0 0 0 100%
rdflib/plugins/stores/auditable.py 89 12 22 1 85%
rdflib/plugins/stores/berkeleydb.py 441 37 150 19 90%
rdflib/plugins/stores/concurrent.py 64 49 22 0 22%
rdflib/plugins/stores/memory.py 376 42 202 24 84%
rdflib/plugins/stores/regexmatching.py 83 57 36 0 25%
rdflib/plugins/stores/sparqlconnector.py 89 28 30 8 63%
rdflib/plugins/stores/sparqlstore.py 361 183 172 18 41%
rdflib/query.py 172 34 88 14 79%
rdflib/resource.py 112 66 38 0 32%
rdflib/serializer.py 16 1 6 1 91%
-rdflib/store.py 140 23 48 7 78%
+rdflib/store.py 140 22 48 6 79%
rdflib/term.py 749 148 393 47 77%
rdflib/tools/__init__.py 0 0 0 0 100%
rdflib/tools/csv2rdf.py 300 144 176 38 48%
rdflib/tools/defined_namespace_creator.py 38 4 14 3 87%
rdflib/tools/graphisomorphism.py 57 57 30 0 0%
rdflib/tools/rdf2dot.py 62 62 22 0 0%
rdflib/tools/rdfpipe.py 82 82 36 0 0%
rdflib/tools/rdfs2dot.py 49 49 22 0 0%
rdflib/util.py 156 30 86 8 77%
rdflib/void.py 76 76 30 0 0%
--------------------------------------------------------------------------------
-TOTAL 20345 2889 7321 778 82%
+TOTAL 20345 2942 7321 802 82%
|
To be slightly more accruate, the issue we are hitting is #677. |
Okay another double whammy I guess, hit by the same relative URI issue and then also it seems code that silently fails, though I'm looking more into it. Basically |
Just to re-iterate, this happens when just adding publicID to the graph, and the slowdown is 136x - so the only way to work around the issue caused by relative URI resolution is to make things 136x slower - this should be logged as a performance issue - clearly something is going horribly wrong somewhere.
|
I'm wrong about this, I can kind of get the tests running again with the patch below but some of them are failing $ git diff
diff --git a/test/test_parsers/test_rdfxml_w3c.py b/test/test_parsers/test_rdfxml_w3c.py
index c99069c5..2f4a5d37 100644
--- a/test/test_parsers/test_rdfxml_w3c.py
+++ b/test/test_parsers/test_rdfxml_w3c.py
@@ -45,7 +45,7 @@ class TestStore(Graph):
super(TestStore, self).add((s, p, o))
-TEST = Namespace("http://www.w3.org/2013/RDFXMLTests/testSchema#")
+TEST = Namespace("http://www.w3.org/2000/10/rdf-tests/rdfcore/testSchema#")
CACHE_DIR = os.path.join(TEST_DATA_DIR, "suites", "w3c", "rdfxml")
@@ -61,9 +61,12 @@ skipped = (
def cached_file(url):
+ # logging.debug("url = %s", url)
fname = url2pathname(relative(url))
+ # logging.debug("fname = %s", fname)
fpath = os.path.join(CACHE_DIR, fname)
+ # logging.debug("fpath = %s", fpath)
if not os.path.exists(fpath):
print("%s does not exist, fetching from %s" % (fpath, url))
folder = os.path.dirname(fpath)
@@ -77,7 +80,7 @@ def cached_file(url):
return fpath
-RDFCOREBASE = "http://www.w3.org/2013/RDFXMLTests/"
+RDFCOREBASE = "http://www.w3.org/2000/10/rdf-tests/rdfcore/"
def relative(url):
@@ -178,7 +181,7 @@ class ParserTestCase(unittest.TestCase):
self.manifest = manifest = Graph(store=self.store)
manifest.open(self.path)
manifest.parse(
- cached_file("http://www.w3.org/2013/RDFXMLTests/Manifest.rdf"),
+ cached_file("http://www.w3.org/2000/10/rdf-tests/rdfcore/Manifest.rdf"),
format="xml",
)
@@ -189,6 +192,8 @@ class ParserTestCase(unittest.TestCase):
manifest = self.manifest
num_failed = total = 0
negs = list(manifest.subjects(RDF.type, TEST["NegativeParserTest"]))
+ self.assertGreater(len(negs), 1)
+ logging.debug("negs = %s", len(negs))
negs.sort()
for neg in negs:
status = first(manifest.objects(neg, TEST["status"]))
@@ -201,6 +206,8 @@ class ParserTestCase(unittest.TestCase):
def testPositive(self):
manifest = self.manifest
uris = list(manifest.subjects(RDF.type, TEST["PositiveParserTest"]))
+ self.assertGreater(len(uris), 1)
+ logging.debug("uris = %s", len(uris))
uris.sort()
num_failed = total = 0
for uri in uris:
@@ -238,7 +245,7 @@ results.add((system, RDFS.comment, Literal("")))
if __name__ == "__main__":
manifest = Graph()
manifest.parse(
- cached_file("http://www.w3.org/2013/RDFXMLTests//Manifest.rdf"),
+ cached_file("http://www.w3.org/2000/10/rdf-tests/rdfcore/Manifest.rdf"),
format="xml",
)
import getopt |
Marking this as critical as no other PRs should be merged while this is pending as that will necessitate a re-review and there is just too much that already went into this PR. |
I'm reverting the xml test data to what it was before, because it just won't pass with the new data and I don't want to fix that now, the whole test suite processor has to be rewritten: #1839 |
The test processor is overpinned to weird quirks of the data and just won't pass with the correct test data. Can fix it later. See #1839
This revents to the IRIs from master branch as the tests only pass when using them. Also assert that there is more than 1 test found to prevent silent failures.
Okay I think coverage should be the same now, @gjhiggins please check the additional commits I made if you are fine with them. I will do one final re-check. I won't review or merge any other PR until this is merged. |
I guess so. But if the publicID behaviour changes, then so does the public API. Given that the test content remains the same and it's just a change to the manifest and the test suite is effectively unchanging, we could choose not re-sync the test suite, merely relocate the existing folders ( |
It's fine - the solution to this particular problem is hacky but the test runner is hacky anyway so it is what it is, we need to improve what is happening with the test suites to prevent regressions though, but the RDFXML one is in the most severe condition so that would be a good place to start. Also, I would rather have the hack in our test runner code than have it in the test data. |
Ah, okay. (And apologies for me requiring so much basic education in maintenance techniques, I can only hope to improve.) |
Okay, now this PR increases test coverage, as it should, because of https://github.com/gjhiggins/rdflib/blob/d741814bfc55019b8e10818276c37e463b66dbfd/test/test_parsers/test_broken_parse_data_from_jena.py
As for test executed: master - https://github.com/RDFLib/rdflib/runs/6046486517?check_suite_focus=true
latest PR branch - https://github.com/RDFLib/rdflib/runs/6054354792?check_suite_focus=true
I did another once over, it seems good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @gjhiggins, thanks, makes tests easier to reason about and maintain I think.
I will merge tomorrow morning (CET) if there is no further feedback, other than the weird quirkiness of some test suite runners this is a fairly straight forward change. |
Summary of changes
Collected variously-scattered test data files into a single
data
folder, refreshed the w3c test suite, using the published RDF 1.1. test data, updated involved test files to reference the relocated test suites, renamed some otherwise rather cryptic test data folder names to communicate more clearly the purpose / relevance. Migrated some more tests from thetest
directory into sub-folders.Checklist
so maintainers can fix minor issues and keep your PR up to date.