-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic support of SERVICE clause #793
Conversation
c30c101
to
ade9752
Compare
690aa00
to
e504cc6
Compare
The LocalVocab used to be an alias for std::vector<std::string> (defined inside of the ResultTable class). New words were pushed to this vector without checking whether they were already in the vocabulary (exception: when a sequence of identical new words were encountered in a SPARQL expression result, the words was only added once per sequence). The basic data is still a std::vector<std::string>, but there is now a phase, started by LocalVocab::startConstructionPhase() and ended by LocalVocab::endConstrutionPhase(), where new words can be added and a words-to-ID map records which words have already been added. After that phase, the map is deleted and the local vocabulary is read-only. Check test/LocalVocabTest for how it works in prinicple. This is also used by the Values class in PR #793 (and I tested it already quite extensively and it works).
The LocalVocab used to be an alias for std::vector<std::string> (defined inside of the ResultTable class). New words were pushed to this vector without checking whether they were already in the vocabulary (exception: when a sequence of identical new words were encountered in a SPARQL expression result, the words was only added once per sequence). The basic data is still a std::vector<std::string>, but there is now a phase, started by LocalVocab::startConstructionPhase() and ended by LocalVocab::endConstrutionPhase(), where new words can be added and a words-to-ID map records which words have already been added. After that phase, the map is deleted and the local vocabulary is read-only. Check test/LocalVocabTest for how it works in prinicple. This is also used by the Values class in PR #793 (and I tested it already quite extensively and it works).
The LocalVocab used to be an alias for std::vector<std::string> (defined inside of the ResultTable class). New words were pushed to this vector without checking whether they were already in the vocabulary (exception: when a sequence of identical new words were encountered in a SPARQL expression result, the words was only added once per sequence). The basic data is still a std::vector<std::string>, but there is now a phase, started by LocalVocab::startConstructionPhase() and ended by LocalVocab::endConstrutionPhase(), where new words can be added and a words-to-ID map records which words have already been added. After that phase, the map is deleted and the local vocabulary is read-only. Check test/LocalVocabTest for how it works in prinicple. This is also used by the Values class in PR #793 (and I tested it already quite extensively and it works).
294124b
to
d8f9c32
Compare
db4774f
to
0026bae
Compare
I have rebased and adapted this PR to the current master and it works (it's currently active on https://qlever.cs.uni-freiburg.de/wikidata and https://qlever.cs.uni-freiburg.de/osm-germany). But the code still needs quite a bit of work. Here is a list of TODOs:
|
0026bae
to
473cb8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a first detailed look on this code and noticed,
- Some smaller suggestions
- A general conceptual question (how much of the remaining work will you do for this PR and how much should we leave to someone else)
- The lack of unit tests (start with the parsing, there it should be possible, and think about how we can test/ mock the HTTP stuff in the Service clause (but then again, we now have gained experience with writing unit tests for Http Client/Server pairs, so maybe here we can do something similar.
CMakeLists.txt
Outdated
###################################### | ||
# SSL | ||
###################################### | ||
find_package(OpenSSL REQUIRED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that the one with the heartbleed?
src/parser/GraphPatternOperation.h
Outdated
// The prologue (prefix definitions). | ||
std::string servicePrologue_; | ||
// The body of the SPARQL query for the remote endpoint. | ||
std::string serviceQueryBody_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, the graphPattern_
and the serviceQueryBody
contain the same content, but both are needed (one parsed, one as a string). This should be commented.
Also the names are confusing, maybe graphPattern_
and graphPatternAsSparql_
.
Parser::GraphGraphPatternContext* ctx) { | ||
reportNotSupported(ctx, "Named Graphs (FROM, GRAPH) are"); | ||
parsedQuery::Service Visitor::visit(Parser::ServiceGraphPatternContext* ctx) { | ||
std::string serviceVarOrIri = ctx->varOrIri()->getText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this (in the grammar) can really be a var
or an Iri
, then we should catch the var
case here (I don't think you support variable endpoints at this time, is that correct?
In this case we should check (maybe explicitly visit the var or Iri, and then check for a variable),
and report a useful error here.
Also then maybe the member in the Service
clause should be called serviceIri
if there is no implementation of a var.
(Or maybe, look up in the standard, maybe it is easy to implement, what are the semantics if that variable is bound to zero or more than one value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now consider this in SparqlQLeverVisitor.cpp
, but stumbled over the peculiarity that visit(VarOrIriContext*)
returns VarOrTerm
(which is a std::variant<Variable, GraphTerm>
, where GraphTerm
is a std::variant<Literal, BlankNode, Iri>
) and not the expected std::variant<Variable, Iri>
.
1463e12
to
4a348d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now already very clean and understandable.
Most of my suggestions are small structural improvements,
there is one discussion to be held,
and the unit tests of the actual SERVICE
functionality are yet missing.
4a348d4
to
64c695d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now very close to merging, I just left some comments on small details.
Let me know, for which aspects you need helpt.
test/ServiceTest.cpp
Outdated
std::string_view body = match.template get<2>(); | ||
std::string response = absl::StrCat( | ||
absl::StrJoin(absl::StrSplit(variables, " "), "\t"), "\n", body); | ||
// std::cout << "Reponse (TSV): " << std::endl << response; | ||
co_return co_await send(ad_utility::httpUtils::createOkResponse( | ||
response, request, ad_utility::MediaType::tsv)); | ||
}); | ||
httpServer.runInOwnThread(); | ||
Iri serviceIri{absl::StrCat("<http://localhost:", httpServer.getPort(), ">")}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should definitely write this test without using the Server.
Given your implementation of Service
(it takes a plain function pointer and is not templated) you would have to
write a free function here with the correct interface, and to specify the test query for multiple cases you would have to make it access a (file-scoped) local variable.
The alternative is to store not a plain function pointer in the Service class, but a std::function<istringstream(whatEverTheArguments)>
. This can hold a plain function pointer as well as a capturing lambda, then you can also work with a lambda that takes the query string and returns a lambda here.
(The overhead of std::function
is ok for the Service
class, since we have to send HttpRequests afterwards.
Codecov ReportBase: 69.28% // Head: 69.44% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #793 +/- ##
==========================================
+ Coverage 69.28% 69.44% +0.16%
==========================================
Files 238 240 +2
Lines 23481 23651 +170
Branches 3079 3093 +14
==========================================
+ Hits 16269 16425 +156
- Misses 5914 5919 +5
- Partials 1298 1307 +9
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Factored out from PR #793 (Basic support of SERVICE clause)
Factored out from PR #793 (Basic support of SERVICE clause)
New version rebased to the current master and with additional tests. The improvements of the handling of `Values` and `LocalVocab` (PR #880) and the improvements in `http/util` (PR #900) have already been factored out. So what is remaining here directly pertains to the implementation of SERVICE. TODO: Tests in `test/SparqlAntlrParserTest.cpp` are still missing, maybe Johannes can help me with that.
bb11f61
to
a152552
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is now very clean and also well-tested.
I have sent you a message about your questions on the missing coverage,
feel free to integrate them, otherwise this can be merged.
We can't, with a reasonable effort, help the other four places.
@joka921 The end of a mini journey, thank you for your valuable help, Johannes! |
First working implementation of SERVICE
Basic principle: Send the query inside the SERVICE clause to the remote endpoint and turn the result into a VALUES clause, which is then processed using the (in the meantime already committed) code from #820 and #822 and the (soon to be committed) code from #823.
Update
parsedQuery::Values
to hold a table ofTripleComponent
sinstead of
std::string
s as before. Modify theSparqlQleverVisitor
accordingly.Update the
Values : Operation
class to add OOV entries to theLocalVocab
(any row containing an oov entry was ignored so far).Update the JOIN operation to propagate the localVocab if only of the
operands has one (which is a frequent use case).
Used the occasion to create a separate
LocalVocab
class with basicfunctionality for adding words to a local vocabulary once.
Add a new class HttpClient which can open HTTP and HTTPS connections
to a given host, and which can then send GET or POST requests, receive the result synchronously, and write it to a (potentially very large)
string. Uses Boost.Beast, just like our HttpServer.
Wrote a test for our HttpServer and HttpClient. However, the test so far only tests the HTTP case because our HttpServer just does HTTP so
far. Also used the occasion to rename util/HttpServer to util/http
because httpServer was simply a misnomer. This change required very many
small changes (in includes and in the various CMakeLists.txt).