Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code for locating triples in an existing index #1000

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

hannahbast
Copy link
Member

This is the first part of a series of PRs split of from the large proof-of-concept PR #916, which realizes SPARQL 1.1 Update

This is the first part of a series of PRs split of from the large
proof-of-concept PR #916,
which realizes SPARQL 1.1 Update
@hannahbast hannahbast requested a review from joka921 June 9, 2023 14:50
@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Attention: Patch coverage is 87.86611% with 29 lines in your changes are missing coverage. Please review.

Project coverage is 74.21%. Comparing base (f9c9ff4) to head (258231d).
Report is 25 commits behind head on master.

❗ Current head 258231d differs from pull request most recent head 59aae8e. Consider uploading reports for the commit 59aae8e to get more accurate results

Files Patch % Lines
src/index/LocatedTriples.cpp 88.63% 2 Missing and 23 partials ⚠️
src/index/LocatedTriples.h 76.47% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1000       +/-   ##
===========================================
- Coverage   87.63%   74.21%   -13.43%     
===========================================
  Files         308      256       -52     
  Lines       28008    24239     -3769     
  Branches     3138     3064       -74     
===========================================
- Hits        24545    17989     -6556     
- Misses       2324     5016     +2692     
- Partials     1139     1234       +95     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet looked at the tests yet,
But I already have some comments about this and that.

src/global/IdTriple.h Outdated Show resolved Hide resolved
src/index/IndexMetaData.h Outdated Show resolved Hide resolved
src/index/MetaDataHandler.h Outdated Show resolved Hide resolved
src/index/LocatedTriples.h Show resolved Hide resolved
src/index/LocatedTriples.h Show resolved Hide resolved
src/index/LocatedTriples.cpp Outdated Show resolved Hide resolved
// result.
while (locatedTriple != locatedTriples.end() &&
locatedTriple->rowIndexInBlock == rowIndex &&
locatedTriple->existsInIndex == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the semantics of existsInIndex always existingTripleThatShallBeDeleted? If so, maybe the name can be more explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I indeed used names in this class like "to be inserted" or "to be deleted". However, I then realized that the semantics of the LocatedTriple class is really independent from what you later do with it. So I prefer names that are close to what the class actually does and not what it is later used for.

Comment on lines +229 to +237
auto locatedTripleMatches = [&]() {
if constexpr (matchMode == MatchMode::MatchAll) {
return true;
} else if constexpr (matchMode == MatchMode::MatchId1) {
return locatedTriple->id1 == id1;
} else if constexpr (matchMode == MatchMode::MatchId1AndId2) {
return locatedTriple->id1 == id1 && locatedTriple->id2 == id2;
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be simpler.
I think the ranges of matching locatedTriples can be expressed as a range of contiguous iterators.
So you can write a function getRelevantTriples(blockIndex, matchMode, id1, id2) that returns the two iterators and get rid of some complexity in the loops below.

Alternatively you can directly specify these two iterators as input to this function and perform this calculation previously

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And isn't there an inherent invariant between the block indices and the id1 and id2 and therefore the matching range inside the locatedTriples. Those should be established at the same place (the binary search in the block and the binary search in the locatedTriples.

Comment on lines +254 to +258
for (size_t rowIndex = rowIndexInBlockBegin; rowIndex < rowIndexInBlockEnd;
++rowIndex) {
// Append triples that are marked for insertion at this `rowIndex` to the
// result.
while (locatedTriple != locatedTriples.end() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole algorithm looks to me like the zipper/merge join algorithm with a twist.
Therefore I am a bit confused about the nested loop structure here.
I would also expect a tight loop that inserts all the elements from the block that do not appear at all in the locatedTriples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently thinking about using my zipperJoinWithUndef abstraction for this,
but you would need to add the indices (which is doable). But I would first write this in the "canonical" zipper style.

#include "index/LocatedTriples.h"
#include "index/Permutations.h"

// TODO: Why the namespace here? (copied from `test/IndexMetaDataTest.cpp`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid very strange ODR-violation bugs when another .cpp file also has a member V and the two are linked together. Linking the tests together is necessary for the code coverage to work.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 23 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also had an initial look at the tests.
Please also note the report from Sonarcloud. Many suggestions there are similar to my suggestions.

std::copy(lts.begin(), lts.end(),
std::ostream_iterator<LocatedTriple>(std::cout, " "));
std::ostream_iterator<LocatedTriple>(os, " "));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use std::ranges::copy, then you don't need begin() and end()

Comment on lines +349 to +351
std::transform(ltpb.map_.begin(), ltpb.map_.end(),
std::back_inserter(blockIndices),
[](const auto& entry) { return entry.first; });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::transform(ltpb.map_.begin(), ltpb.map_.end(),
std::back_inserter(blockIndices),
[](const auto& entry) { return entry.first; });
// TODO<C++23> use `std::ranges::to<std::vector>`
std::ranges::transform(ltpb.map_ | std::views::keys,
std::back_inserter(blockIndices));

(needs the <ranges> header.

// ____________________________________________________________________________
std::ostream& operator<<(std::ostream& os, const IdTable& idTable) {
os << "{ ";
std::copy(idTable.begin(), idTable.end(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::ranges::copy

Comment on lines +360 to +377
// ____________________________________________________________________________
std::ostream& operator<<(std::ostream& os,
const columnBasedIdTable::Row<Id>& idTableRow) {
os << "(";
for (size_t i = 0; i < idTableRow.numColumns(); ++i) {
os << idTableRow[i] << (i < idTableRow.numColumns() - 1 ? " " : ")");
}
return os;
}

// ____________________________________________________________________________
std::ostream& operator<<(std::ostream& os, const IdTable& idTable) {
os << "{ ";
std::copy(idTable.begin(), idTable.end(),
std::ostream_iterator<columnBasedIdTable::Row<Id>>(os, " "));
os << "}";
return os;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not against implementing those, but they should be in the IdTable.h header if you need them.

Comment on lines +174 to +176
std::ostream& operator<<(std::ostream& os,
const columnBasedIdTable::Row<Id>& idTableRow);
std::ostream& operator<<(std::ostream& os, const IdTable& idTable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, those should be in the correct header.

test/LocatedTriplesTest.cpp Outdated Show resolved Hide resolved
Comment on lines 329 to 336
testWithGivenBlockSize(
triplesInIndex, triplesToLocate, 16,
{{2, "{ LT(2 0 V:2 V:14 V:20 0) LT(2 0 V:2 V:15 V:20 1) }"},
{4, "{ LT(4 0 V:2 V:20 V:10 1) }"},
{5, "{ LT(5 0 V:2 V:30 V:20 1) }"},
{6, "{ LT(6 0 V:2 V:30 V:30 1) }"},
{7, "{ LT(7 0 V:2 V:30 V:31 0) }"},
{8, "{ LT(8 x V:9 V:30 V:32 0) }"}});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the structure of these tests in general. I would, as already stated, strictly prefer if there were proper objects for testing and not strings, e.g.

using LT = LocatedTriple;
LocatedTriplesMap expected {{2, {LT(2, 0, V(2), V(14), V(20, 0, LT{2, 0, V(2), V(15), V(20}}, ...}....

You can make this work in a typesafe way with a syntax that is still rather readable.

This PR is from June 2023. Since then, quite a bit of code that is
relevant for this PR has been refactored. In particular: the
`Permutation` class, the `CompressedRelationWriter` and the index
building functions in `IndexImpl`. Mostly, this refactoring has made the
code in this PR simpler. There is still some akwardness in
`LocatedTriplesTest.cpp` because we want to build an index from `Id`
triples there (and not from Turtle input).

Now evertyhing compiles and runs through again. Various tests in
`LocatedTriplesTest` fail, that's what I will look at next.
Copy link

Quality Gate Passed Quality Gate passed

Issues
25 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Qup42 added a commit to Qup42/qlever that referenced this pull request Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants