Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spatial join area #1713

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

Jonathan24680
Copy link
Collaborator

@Jonathan24680 Jonathan24680 commented Jan 17, 2025

This merge request adds the possibility to use arbitrary WKT Geometries (i.e. areas) in SpatialJoin queries. Right now, areas can get approximated as the center point or the true distance between them is estimated. In addition to that a future PR will add pre computed bounding boxes to the index generation to enable even faster query times (like it already exists for geopoints)

@Jonathan24680 Jonathan24680 requested a review from joka921 January 17, 2025 10:00
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 93.03797% with 11 lines in your changes missing coverage. Please review.

Project coverage is 89.95%. Comparing base (acb6633) to head (ec75e89).
Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
src/engine/SpatialJoinAlgorithms.cpp 92.08% 8 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1713      +/-   ##
==========================================
+ Coverage   89.86%   89.95%   +0.08%     
==========================================
  Files         389      393       +4     
  Lines       37308    37765     +457     
  Branches     4204     4240      +36     
==========================================
+ Hits        33527    33970     +443     
- Misses       2485     2491       +6     
- Partials     1296     1304       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sparql-conformance
Copy link

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial general review of everything but the tests.
When you have addressed the initial principle suggestions I will have a closer look

Comment on lines +30 to +31
using AnyGeometry = boost::variant<Point, Linestring, Polygon, MultiPoint,
MultiLinestring, MultiPolygon>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between boos::variant and std::variant, and do we need the non-std one here?

Comment on lines +71 to +78
Id OnlyForTestingWrapperComputeDist(const IdTable* idTableLeft,
const IdTable* idTableRight,
size_t rowLeft, size_t rowRight,
ColumnIndex leftPointCol,
ColumnIndex rightPointCol) const {
return computeDist(idTableLeft, idTableRight, rowLeft, rowRight,
leftPointCol, rightPointCol);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, now those are really too many OnlyForTestingWrappers for const functions.
With const there is no harm in making the functions public (you cannot break the class), so just make them public, and get rid of the clutter.

Comment on lines +147 to +148
// This is different to the query box, which is a box, which contains the area
// where all results are contained in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence.

// This is different to the query box, which is a box, which contains the area
// where all results are contained in
BoostGeometryNamespace::Box calculateBoundingBoxOfArea(
const std::string& wktString) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a string? shouldn't you first parse the geometry, and then in a second step calculate the box etc?

Comment on lines +152 to +154
// calculates the midpoint of the given Box
BoostGeometryNamespace::Point calculateMidpointOfBox(
const BoostGeometryNamespace::Box& box) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can shorten (and thereby make more readable) this class by adding
using Point = BlaNamespace::Poin insde the SpatialJoinAlgoirthms class.

then you can here just write Point calculateMidpoint(const Bos& box) const
In the .cpp file you can write
auto SpatialJoinAlgoirithms::calculateMidpoint(const Box& box) const -> Point
(with the trailing return types you are inide the class namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this for every function in this class, it will make the code much more readable.

// ____________________________________________________________________________
Id SpatialJoinAlgorithms::computeDist(const IdTable* idTableLeft,
const IdTable* idTableRight,
size_t rowLeft, size_t rowRight,
ColumnIndex leftPointCol,
ColumnIndex rightPointCol) const {
auto getAreaString = [&](const IdTable* idtable, size_t row, size_t col) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your algorithm needs the parsing at multiple points
(to compute the box, and to compute the actual distance), so you should have functions that only do the parsing (and return your variant type), and then functions to compute the distance/the box/the whatnotelse from the given parsed type (the variant)

Comment on lines +83 to +85
} catch (...) {
return std::nullopt;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does boost geometry have a mode or overload of readWkt that doesn't use exceptions? then use that.

return Id::makeUndefined();
}
// convert to m and return. Note that the 78630 is an approximation, which
// does not provide accurate results for the poles.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is 78630?
Can you link to some doku/reference/the name of this thing?

Comment on lines +616 to +628
try {
std::string areastring = betweenQuotes(
ExportQueryExecutionTrees::idToStringAndType(
qec_->getIndex(), smallerResult->at(i, smallerResJoinCol), {})
.value()
.first);
bbox = calculateBoundingBoxOfArea(areastring);
} catch (...) {
printWarning();
continue;
}
} else {
// create a box with a side length of at most 1mm to approximate the point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated, separate the parsing from the remaining functionality.

Comment on lines +645 to +663
if (!geopoint) {
try {
std::string areastring = betweenQuotes(
ExportQueryExecutionTrees::idToStringAndType(
qec_->getIndex(), otherResult->at(i, otherResJoinCol), {})
.value()
.first);
auto areaBox = calculateBoundingBoxOfArea(areastring);
auto midpoint = calculateMidpointOfBox(areaBox);
bbox = computeBoundingBox(
midpoint,
getMaxDistFromMidpointToAnyPointInsideTheBox(areaBox, midpoint));
} catch (...) {
printWarning();
continue;
}
} else {
bbox = computeBoundingBox(
Point(geopoint.value().getLng(), geopoint.value().getLat()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an exact code dupliacation that can be extracted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants