sparql: Variables bound with initBindings not working in some functions #294

Closed
uholzer opened this issue May 30, 2013 · 18 comments · Fixed by #692
Labels: bug, SPARQL

@uholzer
Contributor

uholzer commented May 30, 2013

When I bind a variable with initBindings and apply a function like STR, UCASE or isLiteral to it, I get an unexpected result. Using a BIND or VALUES clause instead of initBindings gives the expected result.

Example:

from rdflib import ConjunctiveGraph, URIRef, Literal

g = ConjunctiveGraph()

print("=== without function ===")
print("initBindings:")
for r in g.query("SELECT ?target WHERE { }", initBindings={'target': Literal('example')}): print(r)
print("BIND:")
for r in g.query("SELECT ?target WHERE { BIND('example' AS ?target) }"): print(r)
print("VALUES:")
for r in g.query("SELECT ?target WHERE { } VALUES (?target) {('example')}"): print(r)
print("")
print("=== isLiteral ===")
print("initBindings:")
for r in g.query("SELECT (isLiteral(?target) AS ?r) WHERE { }", initBindings={'target': Literal('example')}): print(r)
print("BIND:")
for r in g.query("SELECT (isLiteral(?target) AS ?r) WHERE { BIND ('example' AS ?target) }"): print(r)
print("VALUES:")
for r in g.query("SELECT (isLiteral(?target) AS ?r) WHERE { } VALUES (?target) {('example')}"): print(r)
print("")
print("=== UCASE ===")
print("initBindings:")
for r in g.query("SELECT (UCASE(?target) AS ?r) WHERE { }", initBindings={'target': Literal('example')}): print(r)
print("BIND:")
for r in g.query("SELECT (UCASE(?target) AS ?r) WHERE { BIND('example' AS ?target) }"): print(r)
print("VALUES:")
for r in g.query("SELECT (UCASE(?target) AS ?r) WHERE { } VALUES (?target) {('example')}"): print(r)

output:

=== without function ===
initBindings:
(rdflib.term.Literal(u'example'),)
BIND:
(rdflib.term.Literal(u'example'),)
VALUES:
(rdflib.term.Literal(u'example'),)

=== isLiteral ===
initBindings:
(None,)
BIND:
(rdflib.term.Literal(u'true', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#boolean')),)
VALUES:
(rdflib.term.Literal(u'true', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#boolean')),)

=== UCASE ===
initBindings:
(None,)
BIND:
(rdflib.term.Literal(u'EXAMPLE'),)
VALUES:
(rdflib.term.Literal(u'EXAMPLE'),)

In order to bisect I used this testcase:

failed = False
try:
    from rdflib import ConjunctiveGraph, URIRef, Literal
    g = ConjunctiveGraph()

    a = set(g.query("SELECT (STR(?target) AS ?r) WHERE { }", initBindings={'target': URIRef('example:a')}))
    b = set(g.query("SELECT (STR(?target) AS ?r) WHERE { } VALUES (?target) {(<example:a>)}"))
    if a != b: failed = True

    a = set(g.query("SELECT (isIRI(?target) AS ?r) WHERE { }", initBindings={'target': URIRef('example:a')}))
    b = set(g.query("SELECT (isIRI(?target) AS ?r) WHERE { } VALUES (?target) {(<example:a>)}"))
    if a != b: failed = True

    a = set(g.query("SELECT (isBlank(?target) AS ?r) WHERE { }", initBindings={'target': URIRef('example:a')}))
    b = set(g.query("SELECT (isBlank(?target) AS ?r) WHERE { } VALUES (?target) {(<example:a>)}"))
    if a != b: failed = True

    a = set(g.query("SELECT (isLiteral(?target) AS ?r) WHERE { }", initBindings={'target': Literal('example')}))
    b = set(g.query("SELECT (isLiteral(?target) AS ?r) WHERE { } VALUES (?target) {('example')}"))
    if a != b: failed = True

    a = set(g.query("SELECT (UCASE(?target) AS ?r) WHERE { }", initBindings={'target': Literal('example')}))
    b = set(g.query("SELECT (UCASE(?target) AS ?r) WHERE { } VALUES (?target) {('example')}"))
    if a != b: failed = True

    a = set(g.query("SELECT ?target WHERE { }", initBindings={'target': Literal('example')}))
    b = set(g.query("SELECT ?target WHERE { } VALUES (?target) {('example')}"))
    if a != b: failed = True
except:
    print("Can't test")
    exit(125)

if failed:
    print("bad")
    exit(1)
else:
    print("good")
    exit(0)

I had trouble bisecting this. It could be that it only fails some of the time (or that I am unable to bisect). It fails starting with commit 1456fe7. Take a look at the diff: changing e = _eval(extend.expr, c.forget(ctx)) back to e = _eval(extend.expr, c) solves this problem. Of course, that breaks other things, for example the DAWG test case bind07 (explanation). That's all I found out.

@gromgull
Member

Thanks for the issue report WITH test-case, that's how I like it!

Not really the issue here, but I would expect:

for r in g.query("SELECT ?target WHERE { }", initBindings={'target': Literal('example')}): print(r)

to return no result rows. To me initBindings does not equal VALUES ... but that can probably be debated.

I'll dig in and debug...

@uholzer
Contributor Author

uholzer commented May 31, 2013

I have no clue what the correct result would be. After bisecting, I found out that I had mixed up VALUES and BIND. I would expect initBindings to have the same effect as BIND, which in my examples above behaves like VALUES. I think it boils down to the question of what WHERE { } evaluates to: it could be one row with no bindings or no rows at all. You know the algebra better than I do (a bad thing for a mathematician to say), so you must decide.

@gromgull
Member

hmm... for a VALUES clause after the query, the spec [1] says it should be a join between the results and the bindings given, the results in this case being those of a single empty BGP. RDFLib does this, and query simplification removes the join altogether:

python -m rdflib.plugins.sparql.algebra 'select * where {} values ?a { "cake" }'
[[], SelectQuery_{'where': GroupGraphPatternSub_{}, 'valuesClause': ValuesClause_{'var': [?a], 'value': [literal_{'string': rdflib.term.Literal(u'cake')}]}}]
SelectQuery(
    p = Project(
        p = ToMultiSet(
            p = values(
                res = [{?a: rdflib.term.Literal(u'cake')}]
                _vars = set([])
                )
            _vars = set([])
            )
        PV = []
        _vars = set([])
        )
    datasetClause = None
    PV = []
    _vars = set([])
    )

This yields a single solution with the binding we wanted

Jena (fuseki) does the same.

I think we should define initBindings to work exactly like this, as this makes the SPARQLStore implementation correct :)

[1] http://www.w3.org/TR/sparql11-query/#sparqlAlgebraFinalValues
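
The join described above can be sketched in a few lines of plain Python. This is a toy model, not RDFLib's implementation: solutions are modelled as dicts, and the helper names `compatible` and `join` are made up for this illustration. It relies on the SPARQL rule that the empty group pattern has exactly one solution, which binds nothing.

```python
from itertools import product


def compatible(mu1, mu2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu2[var] == val for var, val in mu1.items() if var in mu2)


def join(omega1, omega2):
    """Toy SPARQL-style join: merge every compatible pair of solutions."""
    return [
        {**mu1, **mu2}
        for mu1, mu2 in product(omega1, omega2)
        if compatible(mu1, mu2)
    ]


# The empty group pattern `{ }` yields one solution with no bindings.
empty_group = [{}]
# A trailing `VALUES ?a { "cake" }` contributes a single row.
values_rows = [{"a": "cake"}]

result = join(empty_group, values_rows)
print(result)  # [{'a': 'cake'}]

# If `{ }` yielded no rows at all, the join would be empty as well.
no_rows = join([], values_rows)
print(no_rows)  # []
```

Under this reading, returning one row for `WHERE { }` plus a single-row VALUES clause is exactly what the join produces.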

@uholzer
Contributor Author

uholzer commented May 31, 2013

Hmm. Is this equivalent to defining initBindings as WHERE { ... BIND(...) }? That is, is using BIND with constants (i.e. extend with constants) the same as using a single-row VALUES (i.e. join with a single row)?

Anyway, your suggestion sounds good to me.

(Hrmpf, I should have noticed all the if __name__ == '__main__' blocks in the sparql modules. An excellent way to learn and debug. Great code.)

@gromgull
Member

Hmm - annoying.

In optimising the SPARQL engine in the commits around the one you linked, I introduced generators everywhere, and also the ability to pass bindings "up" the query. This can help rule out lots of possible bindings early in the evaluation that would be filtered out later anyway. Consider the contrived example:

select * where { 
  ?p a foaf:Person ; 
      foaf:name "Bob" . 
  OPTIONAL { 
     ?p foaf:knows ?p2 .  
     ?p2 foaf:mbox <mailto:[email protected]> 
  }
}

According to the algebra, the main graph pattern and the optional clause should be evaluated independently and then joined, i.e. the optional would bind ALL people who know bill to ?p... but of course only "Bob" would be kept in the end. By passing ?p along we can save a lot of work.
HOWEVER, we must be careful with the scoping rules for expressions: the "upper" parts of the query must not see variables bound outside their scope.

For example, in the bind07 example, the ?z variable should never get a binding, since ?o is bound OUTSIDE the scope where the BIND happens.

This is what the ctx.forget does, removing all bindings bound outside our scope.

This causes trouble for initBindings: the variable-binding context is pre-populated with the binding, which is then deemed to have been bound outside the Extend operator.

I guess the simplest fix is to make initBindings actually add another bit to the algebra, as if it had been a VALUES clause. This will take more than a line though, so maybe not today :)
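
The effect described above can be illustrated with a toy model. `ToyContext` below is a hypothetical stand-in written for this note, not RDFLib's actual context class; it only mimics the idea that `forget` drops every variable that was already bound in the outer context.

```python
class ToyContext:
    """Hypothetical stand-in for a query evaluation context (not RDFLib's class)."""

    def __init__(self, bindings=None):
        self.bindings = dict(bindings or {})

    def bind(self, var, value):
        """Return a new context with one more binding."""
        extended = dict(self.bindings)
        extended[var] = value
        return ToyContext(extended)

    def forget(self, outer):
        """Drop every variable that was already bound in the outer context."""
        kept = {v: t for v, t in self.bindings.items() if v not in outer.bindings}
        return ToyContext(kept)

    def get(self, var):
        """Look up a variable; None means it is unbound in this scope."""
        return self.bindings.get(var)


# initBindings pre-populate the outermost context ...
outer = ToyContext({"target": "example"})
# ... and evaluation inside the scope adds a locally bound variable.
current = outer.bind("local", "bound inside")

# Before the expression is evaluated, outer bindings are forgotten.
scoped = current.forget(outer)
print(scoped.get("target"))  # None
print(scoped.get("local"))   # bound inside
```

In this model the pre-populated `target` looks like any other outer binding, so the expression sees it as unbound, which mirrors the `(None,)` rows in the original report.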

@uholzer
Contributor Author

uholzer commented May 31, 2013

Thanks, now I understand the problem. You do not need to hurry. Although this broke the build system of my website, it doesn't matter because preparing the next upload will take several days anyway.

@joernhees
Member

@gromgull bump? enough cake :p

@sebastiankruk

Any progress on this? I think I tried every possible way of using initBindings and none of them seems to work. The URIRefs I am passing never seem to be bound in the SPARQL query.

@drewp
Contributor

drewp commented Jun 28, 2014

I think this is the same issue:

# inline literal, filter works
In [24]: g.query('SELECT * { ?s ?p ?o . FILTER (?o > "2009-02-22T03:00:00-08:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>) }', initBindings={'recent': Literal("2009-02-22T03:00:00-08:00", datatype=XS['dateTime'])}).bindings
Out[24]: [{?o: rdflib.term.Literal(u'2009-02-22T12:00:00-08:00', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#dateTime')), ?s: rdflib.term.URIRef(u'http://example.com'), ?p: rdflib.term.URIRef(u'http://example.com/pred')}]

# use the value from initBindings, get nothing
In [25]: g.query('SELECT * { ?s ?p ?o . FILTER (?o > ?recent) }', initBindings={'recent': Literal("2009-02-22T03:00:00-08:00", datatype=XS['dateTime'])}).bindings
Out[25]: []

@pchampin
Contributor

pchampin commented Sep 7, 2014

Same problem here :-(
In the following code, I would expect query2+bindings to return the same result as query1 (i.e. same query, with explicit BIND clause in it).

from rdflib import Graph, Literal, Namespace

ns = Namespace("http://ex.org/")
g = Graph()
g.add((ns.a1, ns.p, Literal(1)))
g.add((ns.a2, ns.p, Literal(2)))
g.add((ns.a3, ns.p, Literal(3)))
g.add((ns.a4, ns.p, Literal(4)))
g.add((ns.a5, ns.p, Literal(5)))

query1 = """
    PREFIX : <http://ex.org/>
    SELECT ?x
    WHERE {
        ?x :p ?y .
        FILTER (?y >= ?miny)
        BIND (3 as ?miny)
    }
"""
print(list(g.query(query1)))
# returns :a3, :a4 and :a5

query2 = """
    PREFIX : <http://ex.org/>
    SELECT ?x
    WHERE {
        ?x :p ?y .
        FILTER (?y >= ?miny)
    }
"""
bindings = {
    "miny": Literal(3),
    }
print(list(g.query(query2, initBindings=bindings)))
# returns None

@rnd0101

rnd0101 commented Jun 19, 2015

Maybe it makes sense to mention in the documentation that initBindings are broken and to explain how to work around the issue? For example, just use string formatting and the .n3() method for literals, like
""" ...query... BIND (%(myvar)s AS ?myvar) ... """ % {'myvar': myvar.n3()}.

@joernhees joernhees added the bug Something isn't working label Jun 22, 2015
@joernhees joernhees added this to the rdflib 4.2.1 milestone Jun 22, 2015
@joernhees joernhees modified the milestones: rdflib 4.2.1, rdflib 4.2.2 Aug 11, 2015
@gromgull gromgull self-assigned this Nov 22, 2015
@gromgull
Member

@uholzer you said:

You do not need to hurry

So I waited 2.5 years ;)

BUT I think I've fixed this now. Now initBindings inserts a VALUES clause into the query algebra.

It passes the test-cases mentioned here, but may break something else.

It was more complicated than expected - I believe I've covered all of these cases: http://www.w3.org/TR/sparql11-query/#convertSolMod but some weird queries may have algebras where the MultiSet/Join ends up in the wrong place.

joernhees added a commit that referenced this issue Nov 28, 2015
Fix initBindings handling. Fixes #294
@uholzer
Contributor Author

uholzer commented Nov 30, 2015

Wait, does this mean I didn't update my website for 2.5 years? Oh dear.

Thanks for the fix.

@gromgull
Member

After sorting out my mess in #586, the fix in here DOES actually work, but negates this: cf9ccd9

now my queries take 300s :(

joernhees added a commit that referenced this issue Jan 28, 2016
* master: (49 commits)
  Update reference to "Emulating container types"
  Avoid class reference to imported function
  Prevent RDFa parser from failing on time elements with child nodes
  Second proposed fix for the broken top_level.txt
  make Prologue and Query new style classes
  DOC: minor typo in paramater
  DOC: unamed -> unnamed
  AuditableStore.commit does not call self.store.commit anymore
  ignore operations with no effect
  fixed trivial copy-paste bug
  added test cases for AuditableStore
  expanded path comparison ops in order to keep py2.6 support and not use total_ordering
  let paths be comparable against all nodes. Fixes #545
  re-introduces special handling for DCTERMS.title and test for it
  Fix initBindings handling. Fixes #294
  added .n3 methods for path objects
  Made ClosedNamespace (and _RDFNamespace) inherit from Namespace
  cleaned up trailing whitespace
  Small but nice SPARQL Optimisation fix
  test for #546 from_n3 trailing backslash
  ...
@gromgull gromgull reopened this Jan 18, 2017
@gromgull
Member

My fix for this was no good. I made it insert a VALUES clause, which is then joined to the result of the body. The SELECT projection is then done after the join, so all of @uholzer's tests work.

But ideally we want initBindings to be bound everywhere in the query. Also inside FILTER clauses and elsewhere.

This doesn't really map to any existing SPARQL component: BIND, VALUES, etc. all have special cases of sub-queries/groups where their values disappear from scope.

This will require implementing something new, probably reverting to the previous solution and making sure it's not overwritten by ctx.forget.

gromgull added a commit that referenced this issue Jan 19, 2017
initBindings are now special fixed bindings, they are always in scope
in all parts of the query.

Fixes #294 (once and for all)
@joernhees
Member

3.5 years of 🍰, thanks for digging through this ;)

@dcsouthwick

Hopefully I'm really misunderstanding the examples here...

Here is a sample test case:

from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DC, FOAF
from rdflib.plugins.sparql import prepareQuery

g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:[email protected]")) )

print(g.serialize(format='nt'))

q = prepareQuery('SELECT ?hash WHERE { ?hash <http://xmlns.com/foaf/0.1/name> "Donna Fales" .}')
person = "Donna Fales"
q2 = prepareQuery('SELECT ?hash WHERE { ?hash <http://xmlns.com/foaf/0.1/name> ?person .}')

#
# Example query works
#
qres = g.query(q)
if qres:
    print("Found ?hash equal to:")
    print(qres.result)
#
# Example initBindings does not work
#
qres2 = g.query(q2, initBindings={'person': person})
if qres2:
    print("Found ?hash equal to:")
    print(qres2.result)
else:
    print("{} not found in rdf - initBindings failed :(".format(person))

and the results:

_:N87622270e0574740bdea75e1bc609595 <http://xmlns.com/foaf/0.1/name> "Donna Fales" .
_:N87622270e0574740bdea75e1bc609595 <http://xmlns.com/foaf/0.1/mbox> <mailto:[email protected]> .
_:N87622270e0574740bdea75e1bc609595 <http://xmlns.com/foaf/0.1/nick> "donna"@foo .
_:N87622270e0574740bdea75e1bc609595 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .


Found ?hash equal to:
[(rdflib.term.BNode('N87622270e0574740bdea75e1bc609595'),)]
Donna Fales not found in rdf - initBindings failed :(
[Finished in 0.805s]

@gromgull
Member

gromgull commented Sep 2, 2017

Your person = "Donna Fales" is a plain string. You are expected to pass an RDFLib object into initBindings. Wrap the string in a Literal and it should work.
