abstractions and implementation for remote dataset support #25

Open
stuckyb opened this issue Jan 7, 2022 · 3 comments
Labels
enhancement: New feature or request
just needs testing: Mostly implemented/fixed, but needs to have testing before we close.

Comments

@stuckyb
Owner

stuckyb commented Jan 7, 2022

Currently, the dataset abstractions (e.g., the dataset abstract base classes) are geared toward local datasets. A similar framework needs to be designed and implemented for remote datasets and unified with the local dataset abstractions. This issue is therefore both a major software engineering task and an implementation task.
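One possible shape for the unified framework, as a minimal sketch: a single abstract base class that both local and remote datasets derive from, so calling code does not need to know where the data live. All names here (`Dataset`, `LocalDataset`, `RemoteDataset`, `location()`, `is_remote()`) are hypothetical and are not taken from the project's code.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Dataset(ABC):
    """Hypothetical common base for local and remote datasets."""

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id

    @abstractmethod
    def location(self) -> str:
        """Return where the data live: a filesystem path or a URL."""

    @abstractmethod
    def is_remote(self) -> bool:
        """Return True if the data must be fetched over the network."""


class LocalDataset(Dataset):
    def __init__(self, root: str, dataset_id: str):
        super().__init__(dataset_id)
        # Local data: a pathlib.Path is a natural fit.
        self.ds_path = Path(root) / dataset_id

    def location(self) -> str:
        return str(self.ds_path)

    def is_remote(self) -> bool:
        return False


class RemoteDataset(Dataset):
    def __init__(self, base_url: str, dataset_id: str):
        super().__init__(dataset_id)
        # Keep the URL as a plain string rather than a filesystem path object.
        self.base_url = base_url.rstrip('/')

    def location(self) -> str:
        return f'{self.base_url}/{self.dataset_id}'

    def is_remote(self) -> bool:
        return True
```

With this split, catalog and subsetting code could depend only on `Dataset`, and the local-versus-remote difference becomes a subclass detail.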

@HeatherSavoy-USDA
Collaborator

Will we have pre-populated catalog entries for remote datasets, as we do for the local ones? Or will we generate/search metadata as needed? If the former, could we at least have ingestion processes specific to each data provider and/or data access protocol (#34)?

@HeatherSavoy-USDA
Collaborator

I've been exploring how to support the remote MODIS NDVI dataset (#24). The roadblocks I've run into have been mentioned in various issues, but I've listed them here:

  1. [Minor] It would be nice to have a general self.ds_path that supports remote data URLs (mentioned in configuration system #44).
  2. Currently getData() opens the remote dataset once for every time increment, but it would be more efficient to point to the data store once (stacking instead of explicitly looping over layers in the same dataset? #31).
  3. The data are sparse daily (sub-monthly but non-daily datasets #35), so requests for a specific date with known data currently work, but requests for a range of dates do not because of roadblock 2.
  4. I'm slicing the x/y dimensions based on the bounds of subset_geom.geom to limit the amount of data downloaded. It would be better/safer if the geometry buffer were implemented (reproject clipping polygons to dataset CRS #38).
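Roadblocks 2 to 4 can be illustrated with a minimal pure-Python sketch, assuming the time and x coordinate values are read once from the already-opened remote store and kept as sorted lists. Instead of probing every daily increment, the sparse dates that actually exist are filtered to the requested range, and the x/y bounds become index slices padded by a small buffer of cells. The helper names (`dates_in_range`, `index_slice`) and the `buffer_cells` parameter are hypothetical, not part of the project.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def dates_in_range(available, start, end):
    """Return the available (sparse) dates that fall within [start, end].

    `available` must be sorted; it is assumed to come from the time
    coordinate of a data store that was opened only once.
    """
    lo = bisect_left(available, start)
    hi = bisect_right(available, end)
    return available[lo:hi]


def index_slice(coords, vmin, vmax, buffer_cells=1):
    """Return a slice covering coords within [vmin, vmax], padded by buffer_cells.

    Assumes `coords` is sorted ascending (e.g., x values of grid cell centers).
    Descending coordinates (common for y) would need to be reversed first.
    """
    lo = bisect_left(coords, vmin)
    hi = bisect_right(coords, vmax)
    # Pad on both sides so cells partly covered by the geometry are kept,
    # clamped to the valid index range.
    lo = max(lo - buffer_cells, 0)
    hi = min(hi + buffer_cells, len(coords))
    return slice(lo, hi)


# Example: three sparse observation dates in a hypothetical dataset.
AVAILABLE = [date(2020, 1, 1), date(2020, 1, 17), date(2020, 2, 2)]
```

Here `dates_in_range(AVAILABLE, date(2020, 1, 5), date(2020, 2, 2))` returns the two observations on January 17 and February 2, and each selected date would then be read from the same open store rather than reopening it per increment.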

HeatherSavoy-USDA added a commit that referenced this issue Feb 28, 2022
But see #25 for pending issues
HeatherSavoy-USDA added a commit that referenced this issue Mar 7, 2022
For sparse daily data #35
And addresses #25
@HeatherSavoy-USDA
Collaborator

I think the only remote-data-specific issue I've run into with the MODIS NDVI dataset that hasn't been resolved yet is that self.ds_path doesn't play nice with URLs pointing to data stores. I'm currently just not using it, so it's not a roadblock, but I'd like it to work to make the code look nice. I can pass a URL into the initializer like

super().__init__('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac', '1299')

But the '//' gets stripped out, so the result doesn't play nice with open_url(). It seems like a simple fix with several possible approaches, but choosing among them is a Python-specific style decision that I'll leave to someone more experienced with Python.
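A likely cause, assuming `self.ds_path` is built with `pathlib` (the thread does not confirm this): `pathlib` collapses repeated slashes inside a path, so `https://` becomes `https:/`. One possible fix is sketched below: treat anything with a URL scheme and network location as a plain string and only use `pathlib` for local paths. The helper name `make_ds_path` is hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Why the '//' disappears: pathlib normalizes repeated slashes inside a path.
COLLAPSED = str(PurePosixPath('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac') / '1299')


def make_ds_path(base, dataset_dir):
    """Join a dataset location with its subdirectory, keeping URLs intact."""
    parsed = urlparse(base)
    if parsed.scheme and parsed.netloc:
        # Remote data store: join as a string so 'https://' is preserved
        # and the result can be handed to a URL-based opener.
        return base.rstrip('/') + '/' + dataset_dir.strip('/')
    # Local dataset: a pathlib.Path is still appropriate.
    return Path(base) / dataset_dir
```

Requiring a non-empty network location as well as a scheme keeps Windows drive paths such as `C:\data` on the local branch, since `urlparse` reports the drive letter as a scheme.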

@HeatherSavoy-USDA added the "just needs testing" label (Mostly implemented/fixed, but needs to have testing before we close.) on May 3, 2022

2 participants