Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update geo-traits #839

Merged
merged 12 commits into from
Oct 24, 2024
Merged

Update geo-traits #839

merged 12 commits into from
Oct 24, 2024

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron mentioned this pull request Oct 24, 2024
2 tasks
if let Some(coord) = point.coord() {
self.push_coord(&coord);
} else {
self.push([f64::NAN; D]);
Copy link

@michaelkirk michaelkirk Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely out of my depth with geoarrow, but this appears a little awkward. Does the push_point method even make sense anymore?

edit: as in, does it even need to exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little awkward, and potentially could be refactored in the future/moved to the point builder, but at some level there needs to be a conversion between empty points and the GeoArrow representation of those, which is with NAN values.

Copy link

@michaelkirk michaelkirk Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess this comes down to my unfamiliarity with geoarrow.

So it sounds like a PointArray needs to have an element in its CoordBuffer for each point, even if that point is POINT EMPTY.

Is the validity: NullBuffer used to encode which coords correspond to POINT EMPTY?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it sounds like a PointArray needs to have an element in its CoordBuffer for each point, even if that point is POINT EMPTY.

Correct.

The raw layout for a 2D point array with 4 points is:

[0, 2, 4, 5, NaN, NaN, 10, 2]

This translates to

Point(0, 2),
Point(4, 5),
Point EMPTY,
Point(10, 2)

Is the validity: NullBuffer used to encode which coords correspond to POINT EMPTY?

No... With GeoArrow we have three states of geometry:

  • null: this is a row that does not have a geometry
  • empty: this row has a geometry but its geometry is empty (e.g. as a result of an empty intersection)
  • valid and non-empty: a standard geometry

The validity: NullBuffer encodes the first of those. Every Arrow array has an optional nullability buffer, and that can say "this row is null".

When that validity buffer says that a row is valid, then we can look into the encoded geometry buffer, but it's still possible for that encoded geometry to be empty.

So for a 2D point, the encoding is one of these three states:

  • null: validity buffer indicates value is null. For a point array the memory still needs to be reserved so that the offsets of future points are correct, but the values of that point's coordinates are undefined.
  • empty: validity buffer indicates not null. But all values are f64::NAN.
  • valid: validity buffere not null, values are not f64::NAN.

@kylebarron kylebarron merged commit ee90acf into main Oct 24, 2024
22 checks passed
@kylebarron kylebarron deleted the kyle/update-traits-again branch October 24, 2024 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants