Latest RDF parser chokes on empty namespace URI #288

gklyne · 2013-05-21T12:05:48Z

In my ongoing efforts to bring my code up to the latest rdflib, I've found a new problem with the RDF/XML parser. The file concerned passes the W3C RDF validator, and contains xmlns="" in its rdf:RDF element.

The full test file is:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rdf:RDF [
    <!ENTITY rdf     "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY rdfs    "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY owl     "http://www.w3.org/2002/07/owl#" >
    <!ENTITY xsd     "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY xml     "http://www.w3.org/XML/1998/namespace" >
    <!ENTITY rdfg    "http://www.w3.org/2004/03/trix/rdfg-1/" >
    <!ENTITY ore     "http://www.openarchives.org/ore/terms/" >
    <!ENTITY ao      "http://purl.org/ao/" >
    <!ENTITY dcterms "http://purl.org/dc/terms/" >
    <!ENTITY foaf    "http://xmlns.com/foaf/0.1/" >
    <!ENTITY ro      "http://purl.org/wf4ever/ro#" >
    <!ENTITY wfprov  "http://purl.org/wf4ever/wfprov#" >
    <!ENTITY wfdesc  "http://purl.org/wf4ever/wfdesc#" >
]>

<rdf:RDF xmlns=""
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:rdfg="http://www.w3.org/2004/03/trix/rdfg-1/"
     xmlns:ore="http://www.openarchives.org/ore/terms/"
     xmlns:ao="http://purl.org/ao/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:ro="http://purl.org/wf4ever/ro#"
     xmlns:wfprov="http://purl.org/wf4ever/wfprov#"
     xmlns:wfdesc="http://purl.org/wf4ever/wfdesc#"
>

  <!-- Workflow instance -->

  <wfdesc:Workflow rdf:about="docs/mkjson.sh">
    <rdfs:label>ODS to JSON</rdfs:label>
    <rdfs:comment>Converts multiple ODS files to JSON for processing by Dexy</rdfs:comment>

    <wfdesc:hasInput>
      <wfdesc:Input>
        <wfdesc:hasArtifact rdf:resource="data/UserRequirements-gen.ods" />
      </wfdesc:Input>
    </wfdesc:hasInput>
    <wfdesc:hasInput>
      <wfdesc:Input>
        <wfdesc:hasArtifact rdf:resource="data/UserRequirements-astro.ods" />
      </wfdesc:Input>
    </wfdesc:hasInput>
    <wfdesc:hasInput>
      <wfdesc:Input>
        <wfdesc:hasArtifact rdf:resource="data/UserRequirements-bio.ods" />
      </wfdesc:Input>
    </wfdesc:hasInput>

    <wfdesc:hasOutput>
      <wfdesc:Output>
        <wfdesc:hasArtifact rdf:resource="docs/UserRequirements-gen.json" />
      </wfdesc:Output>
    </wfdesc:hasOutput>
    <wfdesc:hasOutput>
      <wfdesc:Output>
        <wfdesc:hasArtifact rdf:resource="docs/UserRequirements-astro.json" />
      </wfdesc:Output>
    </wfdesc:hasOutput>
    <wfdesc:hasOutput>
      <wfdesc:Output>
        <wfdesc:hasArtifact rdf:resource="docs/UserRequirements-bio.json" />
      </wfdesc:Output>
    </wfdesc:hasOutput>

  </wfdesc:Workflow>

</rdf:RDF>


gromgull · 2013-05-21T12:08:50Z

AFAIK the RDF/XML parser wasn't changed. How does it break?

gklyne · 2013-05-21T12:12:28Z

Sorry, forgot to include the traceback:

Traceback (most recent call last):
  File "TestEvalChecklist.py", line 437, in testEvaluateWfInputs
    args)
  File "../../rocommand/ro.py", line 184, in runCommand
    status  = run(configbase, options, args)
  File "../../rocommand/ro.py", line 55, in run
    status = ro_command.evaluate(progname, configbase, options, args)
  File "../../rocommand/ro_command.py", line 930, in evaluate
    ro_options["minim"], ro_options["target"], ro_options["purpose"])
  File "../../iaeval/ro_eval_minim.py", line 113, in evaluate
    rotitle      = ( rometa.getAnnotationValue(rouri, DCTERMS.title) or
  File "../../rocommand/ro_metadata.py", line 556, in getAnnotationValue
    return self._loadAnnotations().value(subject=resource, predicate=predicate, object=None)
  File "../../rocommand/ro_metadata.py", line 146, in _loadAnnotations
    self._readAnnotationBody(aref, self.roannotations)
  File "../../rocommand/ro_metadata.py", line 233, in _readAnnotationBody
    anngr.parse(annotationuri, format=annotationformat)
  File "build/bdist.macosx-10.8-intel/egg/rdflib/graph.py", line 1002, in parse
    parser.parse(source, self, **args)
  File "build/bdist.macosx-10.8-intel/egg/rdflib/plugins/parsers/rdfxml.py", line 570, in parse
    self._parser.parse(source)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 360, in start_namespace_decl
    self._cont_handler.startPrefixMapping(prefix, uri)
  File "build/bdist.macosx-10.8-intel/egg/rdflib/plugins/parsers/rdfxml.py", line 122, in startPrefixMapping
    self.store.bind(prefix, URIRef(namespace), override=False)
  File "build/bdist.macosx-10.8-intel/egg/rdflib/term.py", line 207, in __new__
    if not _is_valid_uri(value):
  File "build/bdist.macosx-10.8-intel/egg/rdflib/term.py", line 76, in _is_valid_uri
    if c in uri: return False
TypeError: argument of type 'NoneType' is not iterable

----------------------------------------------------------------------
Ran 1 test in 0.071s

FAILED (errors=1)

I haven't yet isolated this to a stand-alone test case. If there isn't an obvious problem here, get back to me and I'll try to isolate it further.
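The failing step can be isolated without rdflib. Below is a minimal sketch that uses only Python's standard-library SAX interface, which is the same expat driver that appears in the traceback, to record what gets passed to `startPrefixMapping`. The trimmed sample document and the `PrefixRecorder` class are illustrative assumptions and were not part of the original report.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Trimmed-down sample (hypothetical): only the xmlns="" declaration matters.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns=""
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="docs/mkjson.sh">
    <rdfs:label>ODS to JSON</rdfs:label>
  </rdf:Description>
</rdf:RDF>
"""


class PrefixRecorder(ContentHandler):
    """Records every (prefix, uri) pair the SAX driver reports."""

    def __init__(self):
        super().__init__()
        self.mappings = []

    def startPrefixMapping(self, prefix, uri):
        self.mappings.append((prefix, uri))


def record_prefix_mappings(data):
    parser = xml.sax.make_parser()
    # startPrefixMapping is only called when namespace processing is enabled,
    # which is how an RDF/XML parser has to run anyway.
    parser.setFeature(feature_namespaces, True)
    handler = PrefixRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.mappings


mappings = record_prefix_mappings(SAMPLE)
print(mappings)

# xmlns="" arrives as prefix=None, uri=None. A character check of the
# form `c in uri` then fails with TypeError, as in the traceback.
uri = dict(mappings)[None]
failed = False
try:
    any(c in uri for c in '<>"')
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```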

gromgull · 2013-05-21T12:19:05Z

Hmm - next time I will trust my feelings and only ever validate things on output. Line 122 of rdfxml.py needs to read:

self.store.bind(prefix, namespace or "", override=False)

Could you try it and see if it fixes your problem?
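The proposed change normalises the `None` that expat reports for xmlns="" into an empty string before it reaches the bind call. Here is a hedged sketch of the same idea in a plain SAX handler. `BindingHandler` and its `bindings` dict are hypothetical stand-ins for rdflib's handler and `store.bind`, not rdflib's API.

```python
from xml.sax.handler import ContentHandler


class BindingHandler(ContentHandler):
    """Hypothetical handler; `bindings` stands in for store.bind(...)."""

    def __init__(self):
        super().__init__()
        self.bindings = {}

    def startPrefixMapping(self, prefix, namespace):
        # xmlns="" is reported as namespace=None; store the empty string
        # (an empty URI reference) instead of passing None onwards.
        self.bindings[prefix] = namespace or ""


handler = BindingHandler()
handler.startPrefixMapping(None, None)  # what xmlns="" produces
handler.startPrefixMapping("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
print(handler.bindings)
```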

gklyne · 2013-05-21T12:42:01Z

That seems to fix the problem.

gromgull · 2013-05-21T13:23:53Z

Or, perhaps better: the problem is that rdfxml.py passes None to the URIRef constructor. It may be better to fix this in the constructor, in case someone else also decides to pass None.
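The alternative would guard against `None` at the constructor or validation level. In the sketch below, `is_valid_uri`, `make_uri` and the `INVALID_URI_CHARS` set are hypothetical. They are loosely modelled on the `_is_valid_uri` check visible in the traceback, not copied from rdflib.

```python
# Assumed set of characters rejected in URIs (illustrative, not rdflib's list).
INVALID_URI_CHARS = '<>" {}|\\^`'


def is_valid_uri(uri):
    """Return True if `uri` contains none of INVALID_URI_CHARS.

    None is rejected explicitly instead of crashing on `c in None`.
    """
    if uri is None:
        return False
    return not any(c in uri for c in INVALID_URI_CHARS)


def make_uri(value):
    """Hypothetical constructor helper: fail with a clear message on None."""
    if value is None:
        raise TypeError("URI value must be a string, not None")
    if not is_valid_uri(value):
        raise ValueError("invalid characters in URI: %r" % (value,))
    return value


print(is_valid_uri(""))    # True
print(is_valid_uri(None))  # False
```

Raising a clear `TypeError` in the constructor keeps the empty string (a legitimate relative reference) working, and still reports callers that pass `None` by mistake.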

gklyne · 2013-05-21T14:43:08Z

Arguably, both are problems: interpreting an empty string as None (*), and failing when None is passed to a URIRef constructor.

(*) Per RFC 3986, an empty string is a valid URI-reference:

URI-reference = URI / relative-ref
relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty
path-empty    = 0<pchar>
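You can check the grammar mechanically with the component-splitting regular expression from RFC 3986, Appendix B. The empty string matches with an empty path and no other components. The base URI in the last line is a made-up example.

```python
import re
from urllib.parse import urljoin

# Regular expression from RFC 3986, Appendix B, splitting a URI reference
# into scheme, authority, path, query and fragment.
URI_REF = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri_reference(ref):
    m = URI_REF.match(ref)  # always matches: every part is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# The empty string: path-empty, every other component undefined (None).
print(split_uri_reference(""))

# Resolving an empty reference against a base (no fragment) yields the base.
print(urljoin("http://example.org/ro/manifest.rdf", ""))
```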


gromgull · 2013-05-21T20:13:42Z

I fixed it. I wonder how big the memory/performance hit is, though: we now create another list in memory of ALL subject URIs... @osma, any thoughts?


2013/12/31 RELEASE 4.1
======================

This is a new minor version of RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer) - it is up to date with respect to the newest spec from: http://www.w3.org/TR/trig/
* The Turtle parser was brought up to date with the latest Turtle spec.
* Many more tests have been added - RDFLib now has over 2000 (passing!) tests. This is mainly thanks to the NT, Turtle, TriG, NQuads and SPARQL test-suites from W3C. This also included many fixes to the nt and nquad parsers.
* `ConjunctiveGraph` and `Dataset` now support directly adding/removing quads with `add/addN/remove` methods.
* `rdfpipe` command now supports datasets, and reading/writing context sensitive formats.
* Optional graph-tracking was added to the Store interface, allowing empty graphs to be tracked for Datasets. The DataSet class also saw a general clean-up, see: RDFLib/rdflib#309
* After long deprecation, `BackwardCompatibleGraph` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh
* The new `IOMemory` store was optimised a bit
* `SPARQL(Update)Store` has been made more generic.
* MD5 sums were never reinitialized in `rdflib.compare`
* Correct default value for empty prefix in N3 [#312](RDFLib/rdflib#312)
* Fixed tests when running in a non UTF-8 locale [#344](RDFLib/rdflib#344)
* Prefix in the original turtle have an impact on SPARQL query resolution [#313](RDFLib/rdflib#313)
* Duplicate BNode IDs from N3 Parser [#305](RDFLib/rdflib#305)
* Use QNames for TriG graph names [#330](RDFLib/rdflib#330)
* `\uXXXX` escapes in Turtle/N3 were fixed [#335](RDFLib/rdflib#335)
* A way to limit the number of triples retrieved from the `SPARQLStore` was added [#346](RDFLib/rdflib#346)
* Dots in localnames in Turtle [#345](RDFLib/rdflib#345) [#336](RDFLib/rdflib#336)
* `BNode` as Graph's public ID [#300](RDFLib/rdflib#300)
* Introduced ordering of `QuotedGraphs` [#291](RDFLib/rdflib#291)

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release. This is a bug-fixing release:

* the new URI validation caused lots of problems, but is necessary to avoid "RDF injection" vulnerabilities. In the spirit of "be liberal in what you accept, but conservative in what you produce", we moved validation to serialisation time.
* the `rdflib.tools` package was missing from the `setup.py` script, and was therefore not included in the PYPI tarballs.
* RDF parser choked on empty namespace URI [#288](RDFLib/rdflib#288)
* Parsing from `sys.stdin` was broken [#285](RDFLib/rdflib#285)
* The new IO store had problems with concurrent modifications if several graphs used the same store [#286](RDFLib/rdflib#286)
* Moved HTML5Lib dependency to the recently released 1.0b1, which supports Python 3

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in the core distribution. SPARQL 1.1 queries and updates should work out of the box.
* SPARQL paths are exposed as operators on `URIRefs`; these can then be used with graph.triples and friends:

  ```py
  # List names of friends of Bob:
  g.triples(( bob, FOAF.knows/FOAF.name , None ))

  # All super-classes:
  g.triples(( cls, RDFS.subClassOf * '+', None ))
  ```

* a new `graph.update` method will apply SPARQL update statements
* Several RDF 1.1 features are available:
  * A new `DataSet` class
  * `XMLLiteral` and `HTMLLiterals`
  * `BNode` (de)skolemization is supported through `BNode.skolemize`, `URIRef.de_skolemize`, `Graph.skolemize` and `Graph.de_skolemize`
* Handling of Literal equality was split into lexical comparison (for the normal `==` operator) and value space (using new `Node.eq` methods). This introduces some slight backwards-incompatible changes, but was necessary, as the old version had inconsistent hash and equality methods that could lead to literals not working correctly in dicts/sets. The new way is more in line with how SPARQL 1.1 works. For the full details, see: https://github.com/RDFLib/rdflib/wiki/Literal-reworking
* Iterating over `QueryResults` will generate `ResultRow` objects; these allow access to variable bindings as attributes or as a dict, i.e.

  ```py
  for row in graph.query('select ... ') :
      print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar: ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]  # generator over the names of Bob's friends
  ```

* The `SPARQLStore` and `SPARQLUpdateStore` are now included in the RDFLib core
* The documentation has been given a major overhaul, and examples for most features have been added.

Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))

  ```py
  >>> URIRef('http://example.org/')+'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just by plugin name: ([#277](RDFLib/rdflib#277))
* `Namespace` is no longer a subclass of `URIRef`
* URIRefs and Literal language tags are validated on construction, avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed
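The Minor Changes item about string operations returning URIRefs, not plain strings, comes down to a str subclass that overrides the operators it cares about. Below is a minimal sketch with a hypothetical `Uri` class, which is not rdflib's implementation and covers only concatenation.

```python
class Uri(str):
    """Hypothetical str subclass; illustrates the idea, not rdflib's URIRef."""

    def __add__(self, other):
        # Without this override, str.__add__ would return a plain str.
        return self.__class__(str(self) + other)

    def __repr__(self):
        return "%s(%r)" % (self.__class__.__name__, str(self))


base = Uri("http://example.org/")
term = base + "test"
print(repr(term))  # Uri('http://example.org/test')
```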

gromgull added a commit that referenced this issue May 21, 2013

moved uri validation to serialisation time - in compute_qname and n3 method. Validating all URIs on creation time was creating too many problems. Related #288, #285, #287, #279, #266

3f455fc

gromgull closed this as completed in 002d9b3 May 21, 2013
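Commit 3f455fc moves URI validation from construction time to serialisation time (the compute_qname and n3 methods). Here is a minimal sketch of that pattern. The `Term` class and its set of forbidden characters are assumptions for illustration; this is not rdflib's URIRef.

```python
class Term(str):
    """Hypothetical URI term: accepts any string on construction (liberal in
    what it accepts) and validates only when serialised (conservative in
    what it produces)."""

    INVALID_CHARS = '<>" {}|\\^`'  # assumed set of forbidden characters

    def n3(self):
        bad = [c for c in self.INVALID_CHARS if c in self]
        if bad:
            raise ValueError("cannot serialise %r: contains %r" % (str(self), "".join(bad)))
        return "<%s>" % self


ok = Term("http://example.org/doc")
empty = Term("")                          # empty reference: fine
broken = Term("http://example.org/a b")   # also accepted at construction

print(ok.n3())     # <http://example.org/doc>
print(empty.n3())  # <>
try:
    broken.n3()
except ValueError as exc:
    print(exc)
```

Parsing a document with odd URIs then never fails. The error only appears if someone tries to write an invalid URI back out.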

This was referenced Jan 16, 2017:

* Initial Update mozilla/addons-server#4303 (closed)
* Update rdflib to 4.2.1 mozilla/addons-server#4390 (closed)

This was referenced Mar 16, 2017:

* Initial Update mozilla/amo-validator#510 (closed)
* Update rdflib to 4.2.2 mozilla/amo-validator#515 (closed)
