Subsetting before Caching #47
Comments
Sorry for not replying sooner.
Martin, thanks for your input! In my use case, the region of interest is much smaller (geographically) than a single geotiff image, so I am unable to subset based on a file list. Please let me know if I am misinterpreting your suggestion. The experimental CachingFileSystem option would be excellent, assuming xarray does not trigger an entire file download. At the moment I am having an issue getting it to work with intake. Below is my current effort with the traceback. Will the fsspec class be able to "drop in" as a new caching mechanism?
{'file': intake.source.cache.FileCache,
Traceback:
Catalog:
At the moment, I think the only solution would be to use this filesystem implementation, or s3 directly, via FUSE, which is usually painful and slower than you might think. Maybe worth a go, though? It does give me an idea: fsspec/filesystem_spec#102, which would not be too hard, I think.
I think for zarr only the chunks requested are actually cached. For nc files the whole file is cached. @observingClouds
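To make the difference between chunk-level and whole-file caching concrete, here is a toy sketch of a block cache. It is not fsspec's or intake's implementation; `BlockCachedReader`, `fetch_range`, and the in-memory `remote` bytes are hypothetical stand-ins for a remote object and its range requests. It only illustrates the idea that a small read should pull in just the blocks it touches.

```python
class BlockCachedReader:
    """Toy reader that fetches and caches fixed-size blocks on demand."""

    def __init__(self, fetch_range, size, block_size):
        self.fetch_range = fetch_range  # callable(start, end) -> bytes
        self.size = size                # total size of the remote object
        self.block_size = block_size
        self.blocks = {}                # block index -> cached bytes

    def read(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        first = start // self.block_size
        last = (end - 1) // self.block_size
        # Fetch only the blocks that overlap the request and are not cached.
        for i in range(first, last + 1):
            if i not in self.blocks:
                lo = i * self.block_size
                hi = min(lo + self.block_size, self.size)
                self.blocks[i] = self.fetch_range(lo, hi)
        data = b"".join(self.blocks[i] for i in range(first, last + 1))
        offset = start - first * self.block_size
        return data[offset:offset + (end - start)]


remote = bytes(range(32))  # hypothetical stand-in for a remote file
fetched = []               # log of the byte ranges actually requested


def fetch_range(lo, hi):
    fetched.append((lo, hi))
    return remote[lo:hi]


reader = BlockCachedReader(fetch_range, size=len(remote), block_size=8)
window = reader.read(10, 14)  # touches only the second 8-byte block
again = reader.read(12, 15)   # served from the cache, no new fetch
```

With whole-file caching, the first read would instead have transferred all 32 bytes. Whether a real read behaves like this depends on the file format and on how the reading library issues its requests.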
Question:
I am wondering if it is possible to subset a dataset (via the .sel method) before the data is cached.
Reasoning:
My use case: I would like to cache all the landsat8 data from the s3 repository for a small (~10 km x 10 km) research station. Currently my catalog looks like this (note that it subsets the landsat8 tiffs to 60 in total, but eventually I would want to use the entire time series):
Being able to subset before caching would reduce the amount of storage significantly (see example below for my 60 tiff subset):
Is there currently a way of doing this? If not, how difficult would it be to add this functionality? Is it adding an extra argument to the intake-xarray driver (which would look like: slice: {x:(xmin,xmax),y:(ymin,ymax)} in the catalog) or would it need to include modifications to the caching mechanisms deeper in intake?
Thanks!
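For illustration, the proposed `slice: {x: (xmin, xmax), y: (ymin, ymax)}` catalog argument could be translated into `.sel` indexers roughly as follows. This is only a sketch of the argument handling, not an existing intake-xarray option; `build_indexers`, the `descending` parameter, and the coordinate values are hypothetical. It says nothing about whether the subset is applied before the data is downloaded, which depends on the caching layer.

```python
def build_indexers(spec, descending=()):
    """Turn {'x': (xmin, xmax), ...} into {'x': slice(xmin, xmax), ...}.

    Dimensions listed in `descending` get their bounds reversed, because
    label-based slicing on a decreasing coordinate (often the case for
    the y axis of a geotiff) needs slice(high, low).
    """
    indexers = {}
    for dim, (lo, hi) in spec.items():
        lo, hi = min(lo, hi), max(lo, hi)
        indexers[dim] = slice(hi, lo) if dim in descending else slice(lo, hi)
    return indexers


# Hypothetical bounds for a ~10 km x 10 km window in projected coordinates.
catalog_slice = {"x": (500000.0, 510000.0), "y": (4100000.0, 4110000.0)}
indexers = build_indexers(catalog_slice, descending=("y",))

# With xarray available, the subset would then be taken as:
#     subset = data_array.sel(**indexers)
```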