Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions to netCDF Driver to allow Reading from CF-1.8 Encoded Geometries (#1287) #1541

Merged
merged 108 commits into from
Jun 5, 2019
Merged

Conversation

wchen329
Copy link
Contributor

@wchen329 wchen329 commented May 17, 2019

What does this PR do?

This PR includes additions to the netCDF Vector Driver to allow support for retrieving OGRFeature types from a netCDF File encoded utilizing the CF-1.8 standard. It adds support the following six feature types (OGRPoint, OGRLineString, OGRPolygon) and "Multi" collections of those three that are mentioned in the following CF documentation: (https://github.com/cf-convention/cf-conventions/blob/master/ch07.adoc#geometries).

What about backwards compatibility?

This PR preserves backwards compatibility with the earlier WKT representation used for earlier versions of this driver. The new CF-1.8 features are only activated if the input netCDF File if the Conventions attribute is of the following { CF-1.x | (x ≥ 8) }. Otherwise, the driver will not even utilize the new additions to the driver, allowing for the driver to work as before.

Testing

Automatic testing is now available as a part of the netCDF-GDAL test suite. Please see the following directory for test cases: https://github.com/wchen329/gdal/tree/master/autotest/gdrivers/data/netcdf-sg

A few more test cases are still being added.

What are related issues/pull requests?

Issue #1287

Tasklist

  • Implement Reading Capability of the Six Feature Types Previously
  • Implement OGRField Awareness
  • Utilize Manual Testing to Verify Success Conditions
  • Utilizing GDAL/OGR CPL Functions, Improve Resilience
  • Verify Cases of Malformed Input
  • Further Optimize Memory Usage
  • Finish rigging to all relevent points in netCDFLayer
  • Add test case(s) to autotest
  • Review
  • Refactor Unused / Dead Code
  • Adjust for comments
  • Review (2)
  • Adjust for comments (2)
  • All CI builds and checks have passed

Environment

  • OS: openSUSE Tumbleweed (Primary), Debian Sid (Secondary) Microsoft Windows 7 Professional AMD64 (Tertiary)
  • Compiler: GCC 8.3.1 (Primary), MSVC++ 14.0 (Secondary)
  • Standard AMD64/x86-64 Machine, openSUSE under VirtualBox Guest under Intel Core i7-3770 (3 "cores" accessible to Guest) [little endian], n]

What Specifically Could Be Helpful

  • Information on getting automated tests running on CI
  • Information about detection of Big Endian through CPL interfaces or specific macros.
  • Related to Update GDAL NetCDF Vector Driver to support CF 1.8 geometries and Related Bugs / Feature Extensions #1287 - This PR does not include any CF-1.8 writing functionality. This functionality is saved for a future PR. Unlike the readers, the WKT writer and a CF-1.8 version cannot exactly co-exist without one having precedence over the other. Given this would it not be too inconvenient to deprecate WKT writing (that can be activated through a special flag) and allow for a new CF-1.8 writer to be the default action?

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 3, 2019

@rouault While I'm waiting for the last build to pass (or wait for another warning to clean up), what would say is still required for this PR to pass your review (since I made a few other changes since your initial review)? What other changes should be made to get this thing finally pulled up (assuming changes to make CI pass is trivial work that doesn't change the overall structure)?

EDIT: Okay so this time it seems like there's some failure from stuff that I didn't touch I'll try synching upstream hopefully this will fix it?

gdal/doc/source/drivers/vector/netcdf.rst Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdflayersg.cpp Outdated Show resolved Hide resolved
@rouault
Copy link
Member

rouault commented Jun 4, 2019

I've tested a bit the driver with the test_ogrsf utility (cd apps; make test_ogrsf to build it) and it reports a few errors, due to the attribute filter and spatial filters being ignored:

$ test_ogrsf -ro ../autotest/gdrivers/data/netcdf-sg/Yahara_alb.nc
INFO: Open of `../autotest/gdrivers/data/netcdf-sg/Yahara_alb.nc' using driver `netCDF' successful.
INFO: Testing layer geometry_container.
INFO: Feature count verified.
INFO: Feature/layer spatial ref. consistency verified.
INFO: Spatial filter inclusion seems to work.
ERROR: Attribute filter seems to be ignored in GetFeatureCount() when spatial filter is set.
ERROR: Spatial filter (0) failed to eliminatea feature unexpectedly!
INFO: Spatial filter is ignored by GetFeature() as expected.
INFO: Infinity spatial filter works as expected.
INFO: Huge coords spatial filter works as expected.
INFO: Attribute filter inclusion seems to work.
ERROR: GetFeatureCount() may not be taking attribute filter into account (nInclusiveCount = 71, nExclusiveCount = 71, nExclusiveCountWhileIterating = 0, nTotalCount = 71).
INFO: Attribute filter is ignored by GetFeature() as expected.
INFO: GetExtent() test passed.
INFO: Random read test passed.
INFO: SetNextByIndex() read test passed.
ERROR: ExecuteSQL() should have returned a layer without features.
ERROR: ExecuteSQL() should have returned a layer without features.
INFO: TestVirtualIO successful.

You should likely move the new code you've added in netCDFLayer::GetNextFeature() to the top of netCDFLayer::GetNextRawFeature() instead

@rouault
Copy link
Member

rouault commented Jun 4, 2019

ogrinfo ../autotest/gdrivers/data/netcdf-sg/point_test.nc reports 2 layers names_geometry and point_test with identical content. Only one of them should be reported.
Similarly with line_test.nc that reports names_geometry and line_test, but only the first one has geometries.

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 4, 2019

You should likely move the new code you've added in netCDFLayer::GetNextFeature() to the top of netCDFLayer::GetNextRawFeature() instead

ogrinfo ../autotest/gdrivers/data/netcdf-sg/point_test.nc reports 2 layers names_geometry and point_test with identical content. Only one of them should be reported.
Similarly with line_test.nc that reports names_geometry and line_test, but only the first one has geometries.

Huh interesting, I'll go look at this and see what's up; thanks for finding these issues! I'll go update the test cases with some layer counting assertions too.

Update:

  1. Concerning test_ogrsf: Thanks for the pointers on using the tool @rouault, it was quite a simple fix as you said, just moving things around.

  2. The second issue relates to how the driver is kind of several frankenstein'd drivers together. I verified through GDB trace that this extra empty layer is not created by my additions but probably a legacy portion that decided to "make a default empty layer somewhere, if all else fails" since I'm being permissive about having several generations of the driver co-exist in the same nc file. I will see if I can find which one exactly is adding this empty layer.

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 4, 2019

Okay so I found the culprit, some of the heuristics for the old implementation of vector layers kept detecting a variable (or some?) as Vector layers incompletely, so I disabled these if the dataset has CF >= 1.8. This also means however, that mixing a legacy Vector dataset with CF-1.8 geometries is not possible anymore.

Would it be beneficial if I go more fine-grained and manipulate the heuristics themselves and re-enable the ability read from both vector layers that are CF-1.8 and old-WKT style? My latest push has made them mutually exclusive.

EDIT: Actually it seems that the driver might have this behavior by default, as I tried this on my permissive implementation and got more or less the same result.

EDIT 2: Well actually actually it looks like both the old implementations may freak out when CF-1.8 geometries are added in but work well on their own. Perhaps a target of future refactoring..

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 5, 2019

By the way some random builds (see Travis below) are failing due to tests from other drivers or some random unrelated test seg. faulting (testing code that I didn't write), in these events, would it be useful to...

  1. Document these tests as subjects for further PRs
  2. Possibly restart the test? In some sense I feel this is wasting your guys' resources but it'd be nice to get all green checks

@rouault
Copy link
Member

rouault commented Jun 5, 2019

This also means however, that mixing a legacy Vector dataset with CF-1.8 geometries is not possible anymore.

I don't see this as a problem. I guess in the future we might want to generate CF-1.8 geometries, and completely deprecate/remove the WKT approach

By the way some random builds (see Travis below) are failing

Yes, those random failures are annoying. Part of them due to the CI insfrastructure itself. Some of them unexplained, perhaps coming from GDAL itself but hard to reproduce locally

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 5, 2019

I don't see this as a problem. I guess in the future we might want to generate CF-1.8 geometries, and completely deprecate/remove the WKT approach

Alright, 👍 so in that case I guess maybe this is the most elegant solution. Though, on the note of writing CF-1.8 geometries (and I guess this is more suited for #1287) if I were to make another PR for it, would you say that hiding the WKT implementation behind a command line option would be the most appropriate thing to do?

Yes, those random failures are annoying. Part of them due to the CI insfrastructure itself. Some of them unexplained, perhaps coming from GDAL itself but hard to reproduce locally

Some of this behavior at least seem a little characteristic of intermittent errors when being reckless with C-style memory management and so perhaps may be produced by buffer overflows? I agree though tracking down these type of errors with CI may be kind of spotty, given the fact that these errors kind of randomly pop up.

By the way what other changes would you say need to be made to this PR before I perhaps try to get another go at CI?

@rouault
Copy link
Member

rouault commented Jun 5, 2019

would you say that hiding the WKT implementation behind a command line option would be the most appropriate thing to do?

A layer creation option, yes. The existing FEATURE_TYPE one could be extended likely. But honestly, I'm not sure about the point of writing old-style WKT based geometries in WKT once CF-1.8 support is there. I'd largely prefer to see removal of write support for WKT, which is an obscure feature not even used by the people who made me implement it. And at some later point, we could drop the support for WKT reading. The less code, the better.

C-style memory management and so perhaps may be produced by buffer overflows?

Perhaps, although the gcc52_stdcpp14_sanitize CI target runs with -fsanitize=undefined -fsanitize=address . So it tracks most such issues. Failures seem to occur mostly on the trusty_32bit and osx configurations, which might have their own pecularities

By the way what other changes would you say need to be made to this PR before I perhaps try to get another go at CI?

I'd say we should probably be good once you've addressed my today comments.

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 5, 2019

A layer creation option, yes. The existing FEATURE_TYPE one could be extended likely. But honestly, I'm not sure about the point of writing old-style WKT based geometries in WKT once CF-1.8 support is there. I'd largely prefer to see removal of write support for WKT, which is an obscure feature not even used by the people who made me implement it. And at some later point, we could drop the support for WKT reading. The less code, the better.

Wow, so the WKT portion is underutilized to that extent. The stuff behind old WKT write capabilities do indeed seem quite a bit to maintain. Maybe another refactoring project :)

Perhaps, although the gcc52_stdcpp14_sanitize CI target runs with -fsanitize=undefined -fsanitize=address . So it tracks most such issues. Failures seem to occur mostly on the trusty_32bit and osx configurations, which might have their own pecularities

Hmm indeed a mystery then. But yeah! Those two builds seem to be the ones that randomly segfault.

I'd say we should probably be good once you've addressed my today comments.

Apologies if I read too fast, but I'm not sure which comments from today you are referring to. I did resolve the issues from earlier review however.

@rouault
Copy link
Member

rouault commented Jun 5, 2019

but I'm not sure which comments from today you are referring to

Ah, I see you've fixed them by your later commits. Restarting the OSX build, and we should be good

@wchen329
Copy link
Contributor Author

wchen329 commented Jun 5, 2019

EDIT: Nevermind I read incorrectly at first, thanks @rouault !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants