-
Notifications
You must be signed in to change notification settings - Fork 699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont abort bidirectional a* search if only one direction is exhausted #2936
Conversation
c6c0a36
to
26ded41
Compare
src/thor/bidirectional_astar.cc
Outdated
forward_exhausted = true; | ||
LOG_WARN("Forward search exhausted: n = " + std::to_string(edgelabels_forward_.size()) + | ||
"," + std::to_string(edgelabels_reverse_.size())); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nitpick but i feel like at this point we should flip the logic around to be like:
if (!best_connections_.empty()) {
return FormPath(graphreader, options, origin, destination, forward_time_info, invariant);
}
// mark the forward direction exhausted so we dont expand from here anymore
forward_exhausted = true;
LOG_WARN("Forward search exhausted: n = " + std::to_string(edgelabels_forward_.size()) + "," + std::to_string(edgelabels_reverse_.size()));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
test::customize_edges(map.config, [¬_thru_edgeids](const GraphId& edgeid, DirectedEdge& edge) { | ||
if (std::find(not_thru_edgeids.begin(), not_thru_edgeids.end(), edgeid) != | ||
not_thru_edgeids.end()) { | ||
edge.set_not_thru(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its great to see you make use of that! i thought that one day opening that up for testing would be useful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is a valid test case. I believe that not_thru will only be set in one direction. A directed edge that enters an area with one entrance/exit is marked as not_thru but the opposing edge that exits that area will not be marked as not_thru. Perhaps the not_thru logic is too aggressive that it somehow is marking some edges as not_thru in both directions, but the design is that only one direction of an edge should ever be marked as not_thru.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree that its not true to what will actually get created in the data but its handy to be able to force the router into this situation even though its contrived. it would be better to show that it naturally happens for derived things like not_through, but at the same time the you have to figure out and spend tons of time designing the layout of the test to get it to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh you have no idea how much time I spent on trying to recreate a scenario in gurka which would mark an edhe as not_thru
! Thx to @genadz for pointing me to an existing test which was using this method.
BTW, customize_edges
does not work if we're also setting live traffic conditions on the edge. When using live traffic, we pass-in a cleanly (I believe to avoid caching) created "reader" which does not have the customizations created in customize_edges
. I think it has to do with live traffic & customize_edges
altering different in-memory representations of the tile (I have to dig in further for the exact root-cause though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is a valid test case. I believe that not_thru will only be set in one direction. A directed edge that enters an area with one entrance/exit is marked as not_thru but the opposing edge that exits that area will not be marked as not_thru. Perhaps the not_thru logic is too aggressive that it somehow is marking some edges as not_thru in both directions, but the design is that only one direction of an edge should ever be marked as not_thru.
You're right, this is a contrived scenario with not thru's. However, such a scenario can happen when routing out of closures (edges marked closed due to live traffic conditions). In this case, the reverse search will not be able to expand into a closure.
I originally hit this condition while testing something with closures. ButI wanted to keep the test simple without adding live traffic into the mix, so resorted to making a test with not_thru's.
// Expand from the search direction with lower sort cost | ||
// Note: If one direction is exhausted, we force search in the remaining | ||
// direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why do your comments seem to have a shorter line break than your code? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because once upon a time (long before twitter made 120-char limits fashionable), 80-char limit was the common std for line-wrapping. I still haven't fully shrugged off that habit 😄 . I'll fix this in the next PR update.
this code change looks good to solve the issue we face where we can find a route but dont. however we need to understand what this does to routes that can't find a path and shouldn't. i'm not sure if i was making sense to you over in #2926 (comment) but one of the things the code in master does now is if one side of the search is trapped in an island then the code bails fast (as soon as that side is exhausted). this is a good thing. for routes that should fail, we want them to fail as fast as possible. perhaps you can take a corpus of failed routes and run it through this branch to see how often they fail slower than before (because they wait until we do too many iterations in the other search direction instead of bailing immediately). ie long request time just to get back a no route is wasted CPU |
src/thor/bidirectional_astar.cc
Outdated
@@ -622,6 +622,8 @@ BidirectionalAStar::GetBestPath(valhalla::Location& origin, | |||
BDEdgeLabel fwd_pred, rev_pred; | |||
bool expand_forward = true; | |||
bool expand_reverse = true; | |||
bool forward_exhausted = false; | |||
bool reverse_exhausted = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need these additional variables ? we can check if forward_pred_idx
equals to kInvalidLabel
or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could (i initially used that in one of my initial iterations), but found the bools a bit more readable and make the intent more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a best of both worlds could be bool is_forward_exhausted() { return "forward_pred_idx == kInvalidLabel" }
?
ditto on @kevinkreiser comment about failing fast. I have seen cases where nearly the entire NA road network is expanded when unable to actually reach the other search direction. That can hang one thread in a server for a considerable time. I still argue that if one direction's search is exhausted without any connections being found then the route is not possible. Also, won't a second pass be generated that would relax conditions that might lead to a failure (hierarchy, not_thru)? |
Note that another case is when a data set is an extract (e.g. tiles within a bounding box for an embedded Valhalla implementation). In this case there can be regions that are disconnected from each other and this relies on quick failures (exhausting one direction). |
I think that concern (about no-route situations taking longer to complete) is valid. I'm not yet sure what other criteria can be used to abort (or extend) search from the other direction if one direction is exhausted (maybe dynamically change one of the termination thresholds like cost or number of iterations if one side exhausts?). As for the second pass, not all restrictions are relaxed/avoided. Eg not thru's and traffic closures are not ignored in the second pass (so the route would still fail). As @kevinkreiser asked above, let me get some numbers on time it takes for the no-route routes before/after this change. I'm open to dropping this change if the no-route take significantly longer and don't justify finding routes for very specific cases (routing out of small islands with high cost edges). |
I agree. Let me get some numbers for you and we can discuss it from there. |
Can you look at if the expansion from the non-exhausted side is making progress towards the destination or simply expanding further and further away from the destination? |
I can imagine some cases where even that might not be sufficient. eg doing a cross-country route across US, where start point is on the west coast and the destination lies in an unconnected island on the east coast. So the forward search, will be moving towards the destination for the most part (not many edges to explore in the other direction) and will continue for quite some time maybe if we can detect the side that finished was part of a small connected component then we can abort prematurely. |
Here are the performance numbers using
Dumping the iteration count for a failed route shows how many more we edges are searched: {"locations": [{"lat":26.596227982772817,"lon":-80.0191682907218},{"lat":26.49913918981666,"lon":-78.414660088092}], "costing":"auto"}
In both cases the number of edges in reverse direction is the same (3090), but the edges searched in the fwd direction is ~50 times more (in the first pass). I also tested a few working routes from the test file, and both master and this branch showed the same number of edges explored. I'm not sure whats contributing to the difference in 50th percentile as I'd expect the impact to be only in the no-route scenario. Either way, extending the search clearly has a big negative impact. Given this, I'm hesitant to move this forward with this PR, unless there's a better way to reduce the search space for no-route situations. @kevinkreiser @dnesbitt61 thoughts? |
I cannot think of a way to effectively limit the expansion short of breaking out after some number of iterations. My view is that it is best to fail quickly in this case, at least on the first pass of the algorithm. |
@mandeepsandhu i think something must be wrong here if im understanding you correctly. i dont know how many routes fail in the test set but it sounded like you didnt separate routes that fail from routes that didnt. i would first want to confirm that routes that dont fail see no performance impact. if they do (which it seemed like your numbers suggest) then we need to figure out why as this change should not affect that! then once that is worked out i would suggest we sequester the failing routes into their own file and see what performance diff we see on them. over all, since we have such a low route failure rate, an increase in time to fail might be tolerable. we just need to separate the two cases so we can understand what the code change is doing to each. again i expected no change to non-failure routes unless they were retry (2 pass) routes or something. |
@mandeepsandhu @kevinkreiser @dnesbitt61 I have an idea: right now there are 2 types of "not_thru" regions: marked as "not_thru" and marked as "closed" due to live traffic. So, in case one of the searches exhausts we should check if another one is still in "not_thru" region. If it's true, we should continue another search until we find a connection. |
@genadz i was saying that before in either the issue or a resolved comment. we would need to track whether expansion stopped because it hit not through regions on all paths or not. that seemed expensive to me. maybe you are saying if the other side is expanding in not through or closed, so tracking from the other search path? |
Okay, let me wean out the failing routes and see how many of those are there. Spot checking individual non-failing routes, as expected, I didn't see a difference in performance and the number of labels explored in fwd/rev direction were exactly the same between master & this branch (but these were just 3-4 out of the 2500 routes).
okay, I'll get back to you after I've removed the failing routes and measured them separately. |
Yeah we could use the state of not_thru/closure pruning to decide if we want to continue expanding in that direction if the other direction has exhausted. However, it could happen that the label on top of the priority queue when the other direction's search exhausts, does not have pruning disabled (maybe the edges with pruning disabled are slightly higher cost and therefore have not been expanded from yet). Maybe you're suggesting that if any edge in the currently explored set of labels has pruning disabled, we should continue with the search? |
hahaha i was saying exactly this to gena an hour or so ago 😄 |
It may happen only when search that was exhausted explored not all adjacency edge to "not_thru" region. if we want to handle this too we should check if we have unexplored edges from "not_thru" region. it requires additional logic to maintain number of unexplored "not_thru" edges, so probably not worth of it |
So I separated out the routes in @kevinkreiser as you expected, the non-2pass routes were un-touched in performance. Fast routes:
Slow 2-pass routes:
Example of 2-pass route: '{"costing":"auto","directions_options":{"units":"kilometers"},"locations":[{"city":"Frankfurt am Main","lat":50.111591,"lon":8.681067},{"city":"Koeln","lat":50.937962,"lon":6.960059}]}' We can look into conditionally expanding the search (when 1 side exhausts) based on if any of the expansions from the other side is within a closure/not_thru region or not. This will require some additional bookkeeping. I'm still debating if the added complexity is worth it or not. I'm also okay dropping this altogether and just live with certain route failures (these happen in very specific circumstances), unless you know of other legitimate cases where we'd like to extend the search. Let me know what you think @dnesbitt61 @danpat @genadz @kevinkreiser. |
Thinking about this a bit more, we have the following options:
Any other solution that we can think of? |
My preference would be to keep the current behavior on the first pass (fail quickly). This way if there is a closure and a path is found that avoids the closure it will be returned. I am assuming that not_thru behavior is not the main problem being solved here (let me know if you have found cases where not_thru logic alone is failing routes). On a subsequent pass, adjust closures to allow them (perhaps with a high penalty) and identify if the successful route used a closure. This way, the client can identify routes that succeed but with a closure taken - e.g. indicate the only way to complete the route is using the closure. |
quick recap of how routing through closures work:
The problem surfaced when I started penalizing the closures (so that the route can get out of it asap), since it forces search from only direction until the costs become equal. Although this is not restricted to closures, and the following 2 routes hit the exhaustion due to search hitting a not_thru region (theres no live traffic being used here): In your solution, we could possibly set |
Surprised at the not_thru cases. I tested these 2 cases with Valhalla data from earlier this year and they both succeed. This data does not have hierarchies and shortcuts and has driveways removed (it was intended for bicycles). I'll stay out of the closures discussion as I haven't thought enough about it to make any recommendations. My main point is that if you make these changes to extend searches, please make them conditional so that existing applications do not suddenly have long route times for some unexpected cases. I especially worry about the region extract case (mobile applications) where locations near the edge of the extract may be truly unreachable. |
I personally like options number 3 above where we only extend it in a very special circumstance. Also now that we do allow configuration passed into the algorithm, I think we can do what @dnesbitt61 is suggesting and make it something you can opt out of via a config parameter. what do you think about that @mandeepsandhu ? |
@kevinkreiser +1 to that. To recap and make sure we're all on the same page:
correct? |
@mandeepsandhu that would be my suggestion yep! |
26ded41
to
a5813fd
Compare
Performance numbers with latest master vs branch: With option to extend search disabled (
Fast routes with extended search enabled (
Slow 2-pass routes with extended search enabled (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I'll leave final approve to @kevinkreiser or @dnesbitt61 as they've been more active in the PR.
this means one that find a path yes?
this means ones that fail on the first pass and also fail on the second pass but now fail slower or do some of them pass now? or maybe we dont have any test routes that exhibit the failure scenario that is seeking to handle? |
Yes, in 1 pass.
There are no route failures. The slow 2-pass ones are the ones which fail in the first pass (search exhaustion in one direction) and then the second pass with relaxed restrictions passes. With extended search, its the first pass failure that takes longer (if the other end is behind not_thru). So there are no route failures finally, just that some specific 2-pass routes are slow now due to extended search. |
ok then one final question, can you run the full set with it enabled. i dont have a sense of how frequently we have to do a 2 pass approach so i cant tell how much overal numbers will be pulled toward the performance drop. i mean i love that its configurable so its a bit moot but its good to know! also im not sure how representative these DE routes are |
There are 36 routes in Here's the per comparison with
Here's another comparison of a set of "1k" test routes that I have locally, to hopefully give a more representative view of routes:
...and one more corpus of 20k routes:
(branch being faster is likely due to measurement noise)
|
3069b41
to
b2c6f47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, especially because its opt in. i do wonder if for closure we should have done the same thing we did with not through on the second pass and just allow them. are we thinking about that in the future?
@kevinkreiser do you mean we should set |
Test case illustrates the case where 1 directions of bidirectional expansion being exhausted causes route failures.
Instead, we force search in the remaining direction and fail if both directions exhaust or one of the terminating conditions become true.
b2c6f47
to
1071b73
Compare
…halla into pedestrian_crossing * 'pedestrian_crossing' of https://github.com/abhi2039/valhalla: build: Bail with error if non-existant build-type specified (valhalla#2965) nit: Enables compiler warnings in part of mjolnir module (valhalla#2922) fix missing comma scripts/valhalla_build_config (valhalla#2963) Remove duplicate names being used across different gurka tests (valhalla#2962) Dont abort bidirectional a* search if only one direction is exhausted (valhalla#2936) penalize uturns at pencil points and short internal edges (valhalla#2944) Remove unused IsNextEdgeInternal function (valhalla#2958) restore compatibility with gcc 6.3.0, libprotobuf 3.0.0, boost v1.62.0 (valhalla#2953) Add ability to build Valhalla modules as STATIC libraries (valhalla#2957) Allow disabling Werror (valhalla#2937) Enhanced logic for IsTurnChannelManeuverCombinable (valhalla#2952) Allow config object to be passed-in to path algorithms (valhalla#2949) Floating-point precision fix in meili::Project() fixes map-match issue (valhalla#2946) Use kServiceRoad edges while reclassifying ferry connections (valhalla#2933) ci: Update cache key to trigger clearing it (valhalla#2947)
Instead, we force search in the remaining direction and fail if both
directions exhaust or one of the terminating conditions become true.