Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of nonfinite ordinate values #885

Open
caspervdw opened this issue May 1, 2023 · 5 comments
Open

Handling of nonfinite ordinate values #885

caspervdw opened this issue May 1, 2023 · 5 comments
Milestone

Comments

@caspervdw
Copy link
Contributor

My PR in #881 sparked some discussion, so I think it is good to transfer it to an issue.

Has there been a discussion about blocking nonfinite values altogether?

@dr-jts wrote:

Yes, several times (probably on some issue, but don't have a reference to hand).
The conclusion is always to follow Postel's Law: Be lenient in what you accept, and strict in what you emit.
The rationale is that the library should be able to read everything, so that it can be validated and subsequently fixed as and how determined.

That's clear, so GEOS needs to deal with nonfinite values.

As Paul, says, there is a combinatorial explosion of possibilities if the lib tries to do clever things with handling non-finite values. It would be best to take a rigorous and simple approach to determining how to decide if things are EMPTY internally. It might be simplest to specify that any non-finite X or Y ordinate in the first coordinate indicates an EMPTY CoordinateSequence. That way most geometry operations will Just Work, since they generally always check for EMPTY and take some default action.

I am certainly on board with a rigorous and simple approach. Some thoughts:

  • My first thought is a hasNonFinite property. It's rigorous and simple. However it will impose an extra loop and it might cause performance degradation. And there is no chance of caching here as coordinate buffers may be user controlled.
  • If we would go with @dr-jts proposal and process geometries as EMPTY geometries if the first coordinate contains a nonfinite, wouldn't this cause wrong assumptions? EMPTY means: no coordinates, while there are coordinates, they are just nonfinite.
  • Secondly, it's not complete as a second coordinate might be nan and result in undefined behavior too.
  • What to do with Z coordinates being nan? If a user provides an XYZ buffer with NaN Z coordinates, should GEOS act like it is empty? Or coerce it to 2D? What if only some Z coordinates are none?
@pramsey
Copy link
Member

pramsey commented May 1, 2023

In the limit, the existing rules of the road are fine: check for validity with isValid(); expect things to fail in exciting ways if your geometry is not valid; fix nonvalid things with makeValid().

The only downside I have see with this so far, is that non-finite values can actually crash some of the algorithms. So we are having to check the coordinates before starting, which is a full pass through. Even having a flag isn't free, since some of the best performing constructors we have now, the copying from buffers, are straight mem copies, so a "quick" finite check would involve, yep, a whole scan.

I think having fast and easy checks on empty is both doable and largely done. Zero-length coordinate sequence or ring list is a big signal. Unfortunately for generic collections, it's possible to be fed a non-zero-length collection of empty things, and various nestings of same. But for ordinary collections and simple geometries the way to signal empty is pretty straight ahead.

@caspervdw
Copy link
Contributor Author

caspervdw commented May 29, 2023

Sorry for my late @pramsey, my bandwith for open-source related stuff is currently very limited. I do like to be rigorous about this, because these nonfinite ordinate values keep coming back and are costing a ridiculous amount of time.

So summarizing: GEOS does not give any meaning to nonfinite (NaN / Infinity / -Infinity) ordinate values (with one exception, see below). There is just no way to ensure absence of nonfinite values in a geometry because of the way GEOS constructs them. Most algorithms fail in exciting ways when passed geometries with nonfinite ordinates. Some algorithms would crash; as a consequence, these algorithms perform a nonfinite check for each ordinates to ensure the absence of nonfinite values.

Empty geometries are something different. An empty geometry has no coordinates and is considered valid by GEOS, while a geometry with nonfinite values has coordinates and is considered invalid.

The only exception to this rule is for the WKB representation of empty point geometries. Empty point geometries are serialized into a WKB with all-NaN coordinates. Vice versa, all-NaN WKB points are deserialized into empty points.

Z / M coordinates may also carry nonfinite values. Similarly to the XY coordinates, these carry no meaning. Algorithms that act on Z/M coordinates must ensure that they don't crash on nonfinites; but they must not give meaning to them.

I wonder @dr-jts @pramsey if you agree with above summary? If so I may have to revert some stuff in #881

Todo items:

  • ensure that make_valid actually handles nonfinite values
  • WKB/WKT/GeoJSON outputting NaN / Infinity / -Infinity falls into the category "fail in exciting ways" -> don't try to avoid that
  • check if algorithms give meaning to nonfinite values (Interpolate maybe?)

@pramsey
Copy link
Member

pramsey commented May 31, 2023

I think a few extra things need to be added to the summary.

On the utility of non-finite coordinates, I agree with the summary, there is none. However, we do probably have to accept them, if only to allow people who have them embedded in their data the opportunity to clean them with "make valid" (important TODO item, does make valid actually clean non-finite coordinates?)

On the handling of non-finite coordinates. Functions do have do a full pass to avoid allowing non-finite values into delicate parts of the algorithms, but sometimes they are making a pass anyways, to the overhead isn't in the pass, but in having to add in a non-finite check. This seems to be the price of admission.

Finally there is one place were non-finite values are actually meaningful, and that is in the WKB of a POINT EMPTY. Mind you, we don't need the non-finite values in the GEOS representation of an empty point, necessarily, we just need to recognize an incoming WKB empty point via the NaN values, and construct an appropriate object, and vice versa when writing WKB. (This is what happens in PostGIS, for example, when it is handed a WKB with NaN coordinates. The constructed object doesn't have NaN in it, it is just a POINT object with a zero-length list of coordinates.

@caspervdw
Copy link
Contributor Author

@pramsey Thanks for your reply, I've edited my summary and added some todos. Please check!

@pramsey
Copy link
Member

pramsey commented Jun 2, 2023

Looks good to me. Item 3 is a bit of an iceberg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants