Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement GEOSEARCH #1879

Closed
romange opened this issue Sep 18, 2023 · 12 comments
Closed

implement GEOSEARCH #1879

romange opened this issue Sep 18, 2023 · 12 comments
Labels
API-6 Redis 6 API enhancement New feature or request hacktoberfest Participates in Hacktoberfest help wanted Extra attention is needed

Comments

@romange
Copy link
Collaborator

romange commented Sep 18, 2023

see https://redis.io/commands/geosearch/

@romange romange added enhancement New feature or request API-6 Redis 6 API hacktoberfest Participates in Hacktoberfest labels Sep 18, 2023
@chakaz chakaz added the help wanted Extra attention is needed label Sep 19, 2023
@azuredream
Copy link
Contributor

Hi Romange,

I can start by implementing
GEOSEARCH key <FROMLONLAT longitude latitude> <BYRADIUS radius <MI>>

It appears that "georadiusbymember" and "GEORADIUS" are also under construction. Do I need to worry about confliction?
You mentioned that we don't have "client *c", need to find another way. People might have different ways to do that.

I'd appreciate it if you could give me more advice on the implementation. I'll do more research as well.

@romange
Copy link
Collaborator Author

romange commented Oct 18, 2023

@azuredream thanks ! you can also work on #1548 if you prefer - noone is working on it right now.

@romange
Copy link
Collaborator Author

romange commented Oct 18, 2023

You should look at other GEO functions that are already implemented in zset_family.cc to see how the code is structured. Usually we have a separate function for parsing and handling all the arguments and a function that actually performs the data structure operations. The reason for this is that these functions run in possibly diferrent threads due to our shared nothing multi-threaded architecture.

@azuredream
Copy link
Contributor

I've taken a thorough look at the GEOADD and GEOSEARCH implementations in Redis, as well as what Dragonfly has currently.
Do we need to move the Geo functions into a separate class GeoFamily, and setting it as a friend class to ZsetFamily?
I suppose that what's in geo.c and geo.h is just draft.

@romange
Copy link
Collaborator Author

romange commented Oct 19, 2023

geo.c and geo.h is redis code and we should use it if possible. If not possible we should write our own code.
I suggest to add all the geo functions inside zset_family like we do now just because otherwise we will need to introduce a bunch of header files to share the internal code located in zset_family.cc. In any case, we can do this refactoring later if we decide zset_family became too large.

@azuredream
Copy link
Contributor

Thanks for your reply. I spend some time to understand GeoSearch and its Redis implementation
Just an update, I've completed parsing arguments and getting neighbors boxes (reused Redis code), not tested anything yet.
membersOfAllNeighbors() accepts a robj as input. I plan to rewrite it to work with robj_wrapper .

I can write a Draft PR if you want to keep tracking the progress, but not sure if it make sense to push untested code.

@romange
Copy link
Collaborator Author

romange commented Oct 22, 2023

Thanks, please note you probably need to start rewriting geoGetPointsInRange. We have our own implementation for sorted sets in addition to SKIPLIST, so it complicates things a bit. In fact, I would even suggest to omit skiplist case and handle only listpack and bptree implementations - we will deprecate skiplist at some point.

@azuredream
Copy link
Contributor

Hi, I ran into some challenges, particularly with the zset search process. I'd appreciate your input before proceeding:

  1. The method ExtractListPack(const zrangespec& range) does not align with the GEOSEARCH requirements. Specifically, GEOSEARCH filters items based not just on range but also on distance from a central point.
  2. To accommodate GEOSEARCH, I considered overloading ExtractListPack with a zgeorangespec as its argument and then using reinterpret_cast to convert it to zrangespec. Or I have to overload a much more functions, including zzlFirstInRange(zrangespec, ..).
  3. The data structure ScoredArray IntervalVisitor::result_ isn't compatible with the results from GEOSEARCH. GeoPoint output should include score, xy[], distance, and member.

@romange
Copy link
Collaborator Author

romange commented Oct 24, 2023

Can you provide a pointer to the reference implementation in redis that does (1) and (3) ? Because, for example, with (3), I think (not sure), that zset score is in fact the location, so it should be enough to extract xy[]. Then it should be possible to compute the distance if you have xy[] and the reference point of the query. So given ScoredArray you should be able to derive other fields. Again i am guessing here because i have not looked at the reference code.

@azuredream
Copy link
Contributor

Please refer to geoGetPointsInrange()
if (geoWithinShape(shape, score, xy, &distance) == C_OK) {
/* Append the new element. */
char *member = (vstr == NULL) ? sdsfromlonglong(vlong) : sdsnewlen(vstr, vlen);
geoArrayAppend(ga, xy, distance, score, member);
}

If we fetch all the nodes in 9 boxes (center and 8 neighbors) then filter the intermediate result by distance, a large amount of time and space overhead would be applied.

The difference between the fetched superset and the actual result set can be significant. In the worst case, the fetched data can be 64 times larger than the actual desired result. This happens because if the radius_meters(In geohash_helper.c: geohashEstimateStepsByRadius) just exceeds the radius of a box, step moves to a scale that's 8 times larger. And when you account for the 8 neighbors, you get the 64 times factor. In the best case where radius_meters matches a box radius, the scale is just 8 times, corresponding to 8 neighbors.

Why we need 8 neighbors while the central box is larger than the result? Because the src node not always at the center of a Geohash box.

Plus, if any flag is set (e.g., to 1), the system should stop searching as soon as the first match is found

@romange
Copy link
Collaborator Author

romange commented Oct 24, 2023

I totally agree that this is less efficient but it seems like the "trivial" solution won't include any throw away work, therefore it can be considered as an intermediate step to what you have in mind. Meanwhile you will add all other missing pieces, plus tests. It's a lot of code. After the "trivial" PR will be merged we will continue with performance improvements and add specialized functions.

@chakaz
Copy link
Collaborator

chakaz commented Oct 31, 2023

Fixed via #2070

@chakaz chakaz closed this as completed Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-6 Redis 6 API enhancement New feature or request hacktoberfest Participates in Hacktoberfest help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants