-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code for locating triples in an existing index #1000
base: master
Are you sure you want to change the base?
Conversation
This is the first part of a series of PRs split of from the large proof-of-concept PR #916, which realizes SPARQL 1.1 Update
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1000 +/- ##
===========================================
- Coverage 87.63% 74.21% -13.43%
===========================================
Files 308 256 -52
Lines 28008 24239 -3769
Branches 3138 3064 -74
===========================================
- Hits 24545 17989 -6556
- Misses 2324 5016 +2692
- Partials 1139 1234 +95 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not yet looked at the tests yet,
But I already have some comments about this and that.
// result. | ||
while (locatedTriple != locatedTriples.end() && | ||
locatedTriple->rowIndexInBlock == rowIndex && | ||
locatedTriple->existsInIndex == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the semantics of existsInIndex
always existingTripleThatShallBeDeleted
? If so, maybe the name can be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I indeed used names in this class like "to be inserted" or "to be deleted". However, I then realized that the semantics of the LocatedTriple
class is really independent from what you later do with it. So I prefer names that are close to what the class actually does and not what it is later used for.
auto locatedTripleMatches = [&]() { | ||
if constexpr (matchMode == MatchMode::MatchAll) { | ||
return true; | ||
} else if constexpr (matchMode == MatchMode::MatchId1) { | ||
return locatedTriple->id1 == id1; | ||
} else if constexpr (matchMode == MatchMode::MatchId1AndId2) { | ||
return locatedTriple->id1 == id1 && locatedTriple->id2 == id2; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can be simpler.
I think the ranges of matching locatedTriples
can be expressed as a range of contiguous iterators.
So you can write a function getRelevantTriples(blockIndex, matchMode, id1, id2)
that returns the two iterators and get rid of some complexity in the loops below.
Alternatively you can directly specify these two iterators as input to this function and perform this calculation previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And isn't there an inherent invariant between the block indices and the id1
and id2
and therefore the matching range inside the locatedTriples
. Those should be established at the same place (the binary search in the block and the binary search in the locatedTriples
.
for (size_t rowIndex = rowIndexInBlockBegin; rowIndex < rowIndexInBlockEnd; | ||
++rowIndex) { | ||
// Append triples that are marked for insertion at this `rowIndex` to the | ||
// result. | ||
while (locatedTriple != locatedTriples.end() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole algorithm looks to me like the zipper/merge join
algorithm with a twist.
Therefore I am a bit confused about the nested loop structure here.
I would also expect a tight loop that inserts all the elements from the block that do not appear at all in the locatedTriples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently thinking about using my zipperJoinWithUndef
abstraction for this,
but you would need to add the indices (which is doable). But I would first write this in the "canonical" zipper style.
#include "index/LocatedTriples.h" | ||
#include "index/Permutations.h" | ||
|
||
// TODO: Why the namespace here? (copied from `test/IndexMetaDataTest.cpp`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid very strange ODR-violation bugs when another .cpp
file also has a member V
and the two are linked together. Linking the tests together is necessary for the code coverage to work.
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also had an initial look at the tests.
Please also note the report from Sonarcloud
. Many suggestions there are similar to my suggestions.
std::copy(lts.begin(), lts.end(), | ||
std::ostream_iterator<LocatedTriple>(std::cout, " ")); | ||
std::ostream_iterator<LocatedTriple>(os, " ")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use std::ranges::copy
, then you don't need begin()
and end()
std::transform(ltpb.map_.begin(), ltpb.map_.end(), | ||
std::back_inserter(blockIndices), | ||
[](const auto& entry) { return entry.first; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::transform(ltpb.map_.begin(), ltpb.map_.end(), | |
std::back_inserter(blockIndices), | |
[](const auto& entry) { return entry.first; }); | |
// TODO<C++23> use `std::ranges::to<std::vector>` | |
std::ranges::transform(ltpb.map_ | std::views::keys, | |
std::back_inserter(blockIndices)); |
(needs the <ranges>
header.
// ____________________________________________________________________________ | ||
std::ostream& operator<<(std::ostream& os, const IdTable& idTable) { | ||
os << "{ "; | ||
std::copy(idTable.begin(), idTable.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::ranges::copy
// ____________________________________________________________________________ | ||
std::ostream& operator<<(std::ostream& os, | ||
const columnBasedIdTable::Row<Id>& idTableRow) { | ||
os << "("; | ||
for (size_t i = 0; i < idTableRow.numColumns(); ++i) { | ||
os << idTableRow[i] << (i < idTableRow.numColumns() - 1 ? " " : ")"); | ||
} | ||
return os; | ||
} | ||
|
||
// ____________________________________________________________________________ | ||
std::ostream& operator<<(std::ostream& os, const IdTable& idTable) { | ||
os << "{ "; | ||
std::copy(idTable.begin(), idTable.end(), | ||
std::ostream_iterator<columnBasedIdTable::Row<Id>>(os, " ")); | ||
os << "}"; | ||
return os; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not against implementing those, but they should be in the IdTable.h
header if you need them.
std::ostream& operator<<(std::ostream& os, | ||
const columnBasedIdTable::Row<Id>& idTableRow); | ||
std::ostream& operator<<(std::ostream& os, const IdTable& idTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated above, those should be in the correct header.
test/LocatedTriplesTest.cpp
Outdated
testWithGivenBlockSize( | ||
triplesInIndex, triplesToLocate, 16, | ||
{{2, "{ LT(2 0 V:2 V:14 V:20 0) LT(2 0 V:2 V:15 V:20 1) }"}, | ||
{4, "{ LT(4 0 V:2 V:20 V:10 1) }"}, | ||
{5, "{ LT(5 0 V:2 V:30 V:20 1) }"}, | ||
{6, "{ LT(6 0 V:2 V:30 V:30 1) }"}, | ||
{7, "{ LT(7 0 V:2 V:30 V:31 0) }"}, | ||
{8, "{ LT(8 x V:9 V:30 V:32 0) }"}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the structure of these tests in general. I would, as already stated, strictly prefer if there were proper objects for testing and not strings, e.g.
using LT = LocatedTriple;
LocatedTriplesMap expected {{2, {LT(2, 0, V(2), V(14), V(20, 0, LT{2, 0, V(2), V(15), V(20}}, ...}....
You can make this work in a typesafe way with a syntax that is still rather readable.
This PR is from June 2023. Since then, quite a bit of code that is relevant for this PR has been refactored. In particular: the `Permutation` class, the `CompressedRelationWriter` and the index building functions in `IndexImpl`. Mostly, this refactoring has made the code in this PR simpler. There is still some akwardness in `LocatedTriplesTest.cpp` because we want to build an index from `Id` triples there (and not from Turtle input). Now evertyhing compiles and runs through again. Various tests in `LocatedTriplesTest` fail, that's what I will look at next.
Quality Gate passedIssues Measures |
The code is based on the PRs ad-freiburg#916 and ad-freiburg#1000
This is the first part of a series of PRs split of from the large proof-of-concept PR #916, which realizes SPARQL 1.1 Update