Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn Download_LIDAR_data into a parameterized script #4

Open
scottyhq opened this issue Jan 3, 2025 · 17 comments
Open

Turn Download_LIDAR_data into a parameterized script #4

scottyhq opened this issue Jan 3, 2025 · 17 comments
Labels

Comments

@scottyhq
Copy link
Member

scottyhq commented Jan 3, 2025

In order to generate DeepDEM products for a variety of STV study sites it would be great to turn the notebooks into separate parameterized scripts, starting with this one: https://github.com/uw-cryo/DeepDEM/blob/main/notebooks/0a_Download_LIDAR_data.ipynb

Took a quick stab at this here, https://github.com/uw-cryo/stv-aux-data/blob/main/pdal_pipeline.py, but I think it should live in the repo under the scripts folder.

Would be good to also document practical considerations, such as recommended computational resources for a given AOI size and result posting (e.g. 10m vs 1m).

@ShashankBice
Copy link
Member

Hi Scott, I think Karthik's script is the latest and greatest we have which we should work on for further improvements :) @kvenkman please upload your python script to internal repo, and then we can take a stab.

I am getting an account on MAAP, NASA's internal planetary computer. Making the 3DEP DSM there using the pdal pipeline is going to be one of my first tests of the platform 😊 I'll keep you all posted!

@ShashankBice
Copy link
Member

The projection info in the EPT file is an issue. For the Mt. Baker site, we did not have to specify the NAVD88 geoid in the a_srs vertical projection info, and the output DEM had no geoid offset with respect to WGS84 ellipsoid products from ASP.
For the Colorado dataset, not specifying the NAVD88 vertical datum in the a_srs caused a geoid offset. The pdalinfo on both of the data set point cloud returned similar outputs, and we could not find a way to understand what is the vertical projection of the input EPT files.

@ShashankBice
Copy link
Member

Finally, from David's slack message, which I agree with:

  • we need to produce the DSM using 100, 98 and 95 percentiles, to decide which version provides the most accurate canopy heights.
  • We also need to investigate site specific point cloud classification, which we will have to use for point cloud filtering.

@dshean
Copy link
Member

dshean commented Jan 4, 2025

We need to regroup on this, centralize, perform a detailed review, and define updated requirements.

I'm not sure this should live in the DeepDEM repo. Moving it elsewhere eliminates need for pdal dependency. It could end up in coincident repo, as we will need to prepare 3DEP (and other) lidar data. Or it could be a standalone repo. The output lidar DSM products are needed by Shashank's co-registration scripts, to prepare aligned inputs for DeepDEM.

@dshean
Copy link
Member

dshean commented Jan 4, 2025

I don't understand why points classified as roads are removed: https://github.com/uw-cryo/stv-aux-data/blob/main/pdal_pipeline.py#L117. This may be specific to Mt. Baker 3DEP data, and should be removed from the general processing pipeline.

@dshean
Copy link
Member

dshean commented Jan 4, 2025

I have some concerns about the percentile filtering implementation and default use of idw gridding. I believe this is the cause of the unusual/unexpected offsets between the lidar "DSM" and other DSM sources over vegetation (@ShashankBice @Jack-Hayes).

I see notes about filtering noisy Z values: https://github.com/uw-cryo/stv-aux-data/blob/main/pdal_pipeline.py#L252C3-L252C81. That is not what this is supposed to do.

The objective of this percentile filter option was to allow users to create a gridded raster with output values using a relative height percentile (like RH98, where output value is the 98th percentile of the distribution for all height values for points within some radius of that output grid cell). This is a workaround, as the core functionality is not currently available in PDAL: PDAL/PDAL#1596.

@kvenkman , the workaround we discussed was to remove the upper X% of point height values (not necessarily noise, these could be real returns from highest part of canopy within the cell), and then use max when creating output DSM grid. Right now, as far as I can tell, the code in Cell 5 (https://github.com/uw-cryo/DeepDEM/blob/main/notebooks/0a_Download_LIDAR_data.ipynb) is still using idw gridding after this percentile filter to create a DSM. This will include points from the ground, and canopy returns, NOT isolating points from the top of canopy that we want for a DSM. The resulting output "DSM" values will be biased low over vegetation or any other cells with multiple returns. This will also impact any previous training for DeepDEM, as the lidar DSM "truth" is not really a DSM.

If we are creating a DTM, and filters have isolated ground returns, then idw is an appropriate gridding method. But for a DSM, we must either use max on unfiltered points (RH100, susceptible to noise), or max after the relative percentile filter, which effectively gives us RHX, where X is the corresponding percentile).

Hopefully that makes sense. This is critical and should be fixed ASAP, as it is affecting multiple projects.

@dshean dshean added bug Something isn't working priority and removed bug Something isn't working labels Jan 5, 2025
@kvenkman
Copy link
Member

kvenkman commented Jan 6, 2025

@dshean I wrote in the option to filter out road classifications as in some scenes they seemed to contribute to noise. We can switch it off for scenes where unnecessary - or remove the code altogether if its extraneous.

For the current iteration of DeepDEM experiments, the percentile filter has been switched OFF (running tests to double-check), so we have been using RH100. We can extend the notebook to have the option to specify gridding methods as well.

Lastly, I agree with the general consensus that the code to download, process and generate DSMs for various projects should not live in the DeepDEM repository - I wrote up this notebook (borrowing heavily from the OT version) to serve only as a starting point for folks who jump directly to using the repo, since there was no code for this purpose. Ideally this code would live/be maintained in another repo, and referenced / periodically be updated here.

@dshean
Copy link
Member

dshean commented Jan 6, 2025

OK, but we don't want to use RH100. We want to use RH98, which can be accomplished using the PDAL percentile filter workaround and max. Are we on the same page here?

@dshean
Copy link
Member

dshean commented Jan 6, 2025

If you switched the percentile filter OFF, but are still using idw, then you are not creating an output DSM with RH100, and the values will be biased low over vegetation or any areas with multiple returns. Hopefully that is clear.
It is critical to define the appropriate filtering steps and gridding method for each pipeline for DSM, DTM, and the intensity raster. This is not really about allowing options for gridding methods in notebooks, we need to carefully define and test the correct pipeline for each output.

@kvenkman
Copy link
Member

kvenkman commented Jan 6, 2025

Understood - the ground returns would be biasing the DSM low when using idw. I'll rerun the DSM processing

@dshean
Copy link
Member

dshean commented Jan 6, 2025

OK. Can you please confirm what you mean by "rerun the DSM processing"? Want to make sure we are on the same page around the filters and parameters to use, so we don't have to do this again.

It's not just the ground returns. The number of ground returns is relatively low. Most of the valid returns are coming from within the canopy. We want to limit our DSM values to our best estimate for the top of canopy returns.

I appreciate centralization, and a notebook with single function and flexible options, but we really just need need separate pipelines for these products (DSM, DTM, intensity raster). We need to isolate 'first,only' returns and use idw for DSM creation. Or if using all returns, use a metric like RH98 and max.

I've shared this in the past, but here it is again https://github.com/uw-cryo/EarthLab_AirQuality_UAV/blob/d9b37831b6b6b3aab4cbd476735fdf55b0379560/notebooks/EarthLab_AQ_lidar_download_processing_function.ipynb:
EarthLab_AQ_lidar_download_processing_function ipynb at d9b37831b6b6b3aab4cbd476735fdf55b0379… 2025-01-06 09-45-40

@dshean
Copy link
Member

dshean commented Jan 6, 2025

Not a priority at the moment, but for everyone to better understand these issues, it would be valuable to run some tests for small bboxes over different vegetation types at PCD sites using different combinations of return filters, class filters, percentile filters and gridding methods.

Analyzing differences between these outputs will illustrate the importance of these processing choices.

Can then compare the PDAL output with the vendor-provided DTM products, and when available, the vendor-provided DSM products. For example, see observed differences in Cells 57 and 58 here: https://github.com/uw-cryo/EarthLab_AirQuality_UAV/blob/d9b37831b6b6b3aab4cbd476735fdf55b0379560/notebooks/EarthLab_AQ_lidar_analysis_viz_old.ipynb.

Shean_DSI-21-0008_STV_AnnualReview_Year2_20240927 - Google Slides 2025-01-06 09-53-58

@kvenkman
Copy link
Member

kvenkman commented Jan 6, 2025

Re. reprocessing the DSM, I was going to turn ON the percentile filter with a threshold of 98%, and set the gridding method to 'max'. Does that sound right?

@dshean
Copy link
Member

dshean commented Jan 6, 2025

You can try that approach (remove upper 2% of points, then compute max), but I also think the more traditional ALS processing approach (isolating 'first,only' returns, then idw) should be implemented and tested. How much the two differ will depend on a number of factors, including point density, grid cell size, vegetation characteristics.

@kvenkman
Copy link
Member

kvenkman commented Jan 7, 2025

After extensive testing over the Mt Baker site, I can confirm that using only the first returns along with idw gridding yields a DSM that is on average 10 m higher over dense vegetation, with very little change over bare terrain. The previous DSM I had generated was gridded using idw but considered all returns.

I have reprocessed the DSM for the entire site and will be re-running training of the models tomorrow.

@scottyhq
Copy link
Member Author

scottyhq commented Jan 8, 2025

I'm not sure this should live in the DeepDEM repo. Moving it elsewhere eliminates need for pdal dependency. It could end up in coincident repo, as we will need to prepare 3DEP (and other) lidar data. Or it could be a standalone repo. The output lidar DSM products are needed by Shashank's co-registration scripts, to prepare aligned inputs for DeepDEM.

Thanks everyone for all the clarifications and workflow refinement! Revisiting the initial focus of this thread, I think there is consensus to move the LiDAR processing into it's own repo, that will greatly simplify environment management for the PDAL dependency (#14), and also we can have tighter scope around each piece of software.

If it's to be a standalone installable package we should have a short and sweet unique name (not one of these https://pypi.org/search/?q=lidar), and be sure to clarify the focus of the workflows compared to other existing tools (e.g. https://github.com/opentopography). @dshean, @kvenkman, or @ShashankBice if one of you could create a new repo in this org with a suggested name I'll start migrating the latest code into script form.

@ShashankBice
Copy link
Member

Here is a repo. Feel free to experiment with the different functions and parameters, tweak names etc 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Todo
Development

No branches or pull requests

4 participants