Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update BoundingRect algorithm to return a Result #444

Closed
frewsxcv opened this issue Apr 28, 2020 · 14 comments
Closed

Update BoundingRect algorithm to return a Result #444

frewsxcv opened this issue Apr 28, 2020 · 14 comments

Comments

@frewsxcv
Copy link
Member

frewsxcv commented Apr 28, 2020

#443 (comment) see the conversation below

@frewsxcv
Copy link
Member Author

What should be the behavior if the geometry is invalid? For example, if all the coordinates are NaN values?

@michaelkirk
Copy link
Member

I know that I requested this, but in hindsight, I'm not really sure we should do it after all.

To me it seems reasonable that an empty geometry should return an Option<Rect>::None

What should be the behavior if the geometry is invalid? For example, if all the coordinates are NaN values?

I've brought this up other places too, and have said we should try to return None rather than a NaN result in some places. But thinking about it a bit more, we do floating point ops all over the place, if we map every NaN to a None we're going to be doing a lot of checking and have very few operations that don't return Options, which sounds tedious.

Currently I'm leaning towards just blasting https://www.youtube.com/watch?v=P-jQHQcu1x8 while we close this issue.

@urschrei
Copy link
Member

if we map every NaN to a None we're going to be doing a lot of checking and have very few operations that don't return Options, which sounds tedious.

This is both 100% true and also inevitable / our chickens coming home to roost because we've been putting it off for years.

@frewsxcv
Copy link
Member Author

If we're using Some to represent "the operation succeeded" and None to represent "the operation failed", my feeling is we should migrate it at least to Result<Rect, ()>. And we can probably just create a geo::algorithms::bounding_rect::Error enum with a couple variants expressing the types of failures that can occur, like invalid geometry

@urschrei
Copy link
Member

If we're using Some to represent "the operation succeeded" and None to represent "the operation failed", my feeling is we should migrate it at least to Result<Rect, ()>. And we can probably just create a geo::algorithms::bounding_rect::Error enum with a couple variants expressing the types of failures that can occur, like invalid geometry

This is actually far more sensible than my panicked reply (sorry!). FWIW I do think we should collectively think about a coherent strategy for validity at some point, even though nobody wants to.

@frewsxcv frewsxcv changed the title Update BoundingRect algorithm to always return Rect, not Option<Rect> Update BoundingRect algorithm to return a Result Nov 20, 2020
@frewsxcv
Copy link
Member Author

So to summarize, to complete this issue, we'll change the signature of BoundingRect to return Result<Rect, ()> instead of Option<Rect>.

@michaelkirk
Copy link
Member

So to summarize, to complete this issue, we'll change the signature of BoundingRect to return Result<Rect, ()> instead of Option.

I glazed over the Err(()) when initially reading this - did you literally mean to use ()?

Is there a meaningful distinction between Result::Err(()) and None ?

In my mind the difference between a Result and an Option is the presence of the associated data on the failure case, so you can add some context into why things didn't succeed.

But when the context we add is literally nil (()), it seems kind of baroque.

Or was () a stand-in for geo::algorithms::bounding_rect::Error? Without defining the Error's, IMO this seems not really worth the time.

And currently nothing exists in the pipeline to produce meaningful errors - e.g. nothing in this code validates. Would it make sense to park this issue for now?

@frewsxcv
Copy link
Member Author

So to summarize, to complete this issue, we'll change the signature of BoundingRect to return Result<Rect, ()> instead of Option.

I glazed over the Err(()) when initially reading this - did you literally mean to use ()?

Is there a meaningful distinction between Result::Err(()) and None ?

In my mind the difference between a Result and an Option is the presence of the associated data on the failure case, so you can add some context into why things didn't succeed.

But when the context we add is literally nil (()), it seems kind of baroque.

Or was () a stand-in for geo::algorithms::bounding_rect::Error? Without defining the Error's, IMO this seems not really worth the time.

And currently nothing exists in the pipeline to produce meaningful errors - e.g. nothing in this code validates. Would it make sense to park this issue for now?

Yes, effectively Option<T> and Result<T, ()> are the same! In either the case, the None and Err variants indicate we could not return a T, with no further information. But to me, there is a semantic difference between the two.

I would use Option<T> if it is reasonable for the operation to not result in a T. An example of this is str::find. If the method returns a None, there wasn't an underlying issue with our string or the operation., the pattern in the string just couldn't be found. There might be an error in application code that uses str::find, but the operation itself did not error.

I would use Result<T, E> if we expect the operation to succeed most of the time, especially if the user provides "valid" inputs. If an Err gets returned, we know the operation did not perform the way it is intended, which would be categorize as an "error".

But note that I am no authoritative source! I'm just emptying the info in my head, which might be wrong. I tried searching for more discussion about this, and all I found was this reddit thread. Besides that there are many instances of Result<(), E>, Result<T, ()> and Result<(), ()> in the Rust compiler, but I don't see why we should use that as a guiding light here.

So coming back to this issue, if we were to go with the definition that I laid out, it comes down to whether we expect BoundingRect to succeed most of the time, especially if the geometries are "valid". In this case, it comes down to whether we consider empty geometries to be "valid", as that's what causes BoundingRect to return None currently. If we think processing empty geometries is a valid thing for users to do, then 100% I agree we should return an Option here. If we think empty geometries are not okay, then I think we should indicate to the user that they provided an invalid input by returning an Err. And if that's the case, I'm also fine creating a specific error for this instead of () in the Err variant.

Ooops that got a little rant-y! I do not at all feel strongly about this, and am curious to hear what others think

@frewsxcv
Copy link
Member Author

One more thing– Just did some spelunking and found that the Result docs used to talk about this: rust-lang/rust@46cb598#diff-ad8236744863c291238276800604f95213121f845975531e1c8c2fc55fccd2a8R226-R246, but was removed in rust-lang/rust@16b9f67.

@michaelkirk
Copy link
Member

Ooops that got a little rant-y! I do not at all feel strongly about this, and am curious to hear what others think

I love it! I actually fully agree with the well articulated distinction between Result/Option as you've laid it out. Thanks for the references!

In this case, it comes down to whether we consider empty geometries to be "valid", as that's what causes BoundingRect to return None currently. If we think processing empty geometries is a valid thing for users to do, then 100% I agree we should return an Option here.

This reasoning makes sense to me and I think gets to the core of the matter. "empty geometries are valid" is a hidden assumption that I'd made and FWIW, it seems consistent with the ogc/jts/geos/postgis thinktank:

i.e.

SELECT ST_IsValid(ST_GeometryFromText('MULTIPOLYGON EMPTY'));
 st_isvalid 
------------
 t
(1 row)

@frewsxcv
Copy link
Member Author

Ooops that got a little rant-y! I do not at all feel strongly about this, and am curious to hear what others think

I love it! I actually fully agree with the well articulated distinction between Result/Option as you've laid it out. Thanks for the references!

In this case, it comes down to whether we consider empty geometries to be "valid", as that's what causes BoundingRect to return None currently. If we think processing empty geometries is a valid thing for users to do, then 100% I agree we should return an Option here.

This reasoning makes sense to me and I think gets to the core of the matter. "empty geometries are valid" is a hidden assumption that I'd made and FWIW, it seems consistent with the ogc/jts/geos/postgis thinktank:

i.e.

SELECT ST_IsValid(ST_GeometryFromText('MULTIPOLYGON EMPTY'));
 st_isvalid 
------------
 t
(1 row)

Well if the thinktank considers it valid, that's good enough for me! Thanks for pointing this out @michaelkirk

@b4l, sorry for sending you down that rabbit hole! I think we'll want to leave the trait signatures of BoundingRect alone.

@b4l
Copy link
Member

b4l commented Dec 16, 2020

@frewsxcv no problem!
If we can capture all cases and boil down to the two variants of an Option - sweet! If there could be faults, failures or invalid input we probably want some distinct error. The empty geometry being valid would suggest Result<Option, ()>?
One improvement that can be made nonetheless is to make the return type not depend on the input type. This makes matching for geometry type obsolete.

@michaelkirk
Copy link
Member

michaelkirk commented Dec 16, 2020

One improvement that can be made nonetheless is to make the return type not depend on the input type. This makes matching for geometry type obsolete.

I'm not sure I'm understanding, but if you're saying that we should have even infallable methods return a Result<Rect, _> rather than a Rect, so as to be consistent with the results from other types, then I disagree.

It's usually preferable to have the compiler help you know which cases can't fail rather than having to think about it as a human and sprinkling around potentially crashing unwraps.

If at the callsite you have some reason to turn a Rect into an Option<Rect>, that's easy and safe to do, whereas the reverse, unwrapping an Option/Result into a Rect could potentially crash.

@michaelkirk
Copy link
Member

I believe that in the end, we decided to change nothing here.

Variants which can be empty (Geometry, GeometryCollection, Multi*, LineString, Polygon) output Option<Rect>, while non-empty variants (Rect, Line, Triangle, Point) output a non-optional Rect

So closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants