-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Xarray sub-package #1013
Add Xarray sub-package #1013
Conversation
if "x" not in da.dims and "y" not in da.dims: | ||
try: | ||
latitude_var_name = next( | ||
x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to support other variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say no. Dataset with other names will likely not be regular lat/lon grids and fail in other ways.
|
||
|
||
@define(kw_only=True) | ||
class TilerFactory(BaseTilerFactory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By sub-classing titiler.core.factory.TilerFactory we avoid re-writing code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we need to redefine all of the class attributes (e.g. stats_dependency
) that are already defined in titiler.core.factory.TilerFactory
?
|
||
# remove some attribute from init | ||
img_preview_dependency: Type[DefaultDependency] = field(init=False) | ||
add_preview: bool = field(init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we remove those 2 attributes because we don't support /preview
endpoints
return Response(content, media_type=media_type) | ||
|
||
# custom /statistics endpoints (remove /statistics - GET) | ||
def statistics(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ IMO having a full dataset /statistics in a bit dangerous (as for the /preview endpoints) which is why we support only geojson statistics
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work here, @vincentsarago!
This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr
, which would open Zarr and virtual Zarr datasets using xarray.open_zarr
. The second route would be md
, which would opening any dataset readable by xarray.open_dataset
.
The primary reason I think we should do this is that it would enable us to incentivize virtualizing datasets into zarr, which would lead to much faster tile generation. We could do this by:
- Having all query parameters in the zarr route only relevant for
open_zarr
, simplifying API usage. - Automatically detect virtual datasets, removing the need for the
reference
parameter. - Lightening the image size for titiler-xarray deployments only using zarr because other readers would not be installed (and eventually obstore and/or icechunk could be used instead of the fsspec dependencies)
This would also simplify non-zarr usage for the following reasons:
- Zarr specific parameters (e.g., group, consolidated) would not be included in endpoints in the
md
route - We could use xarray's automatic backend detection rather than writing our own in
titiler/xarray/io.py
I also think isolating Zarr usage would simplify the eventual support of the GeoZarr and multiscales specifications.
Thanks @maxrjones 🙏 I see what you're saying. The goal of having a single Reader was to handle all the non-COG dataset so splitting in to two separate reader/set of endpoints would not meat the goal.
We can absolutely use
If I follow your think, it seems we would need a What if we make the dependencies optional? I'm going to open a PR on top of this one to try some things |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! The concept of creating pyramids in a zarr store was new to me, then I googled around and found @maxrjones's notebook 😆.
It is great to have the io
methods standardized here so we can import them in titiler.cmr
and other applications.
Co-authored-by: Henry Rodman <[email protected]>
* use xarray.open_zarr and make aiohttp and s3fs optional * add support for references * tests prefixed protocol * use tmp_dir for reference * add parquet support * remove kerchunk support
…o feature/add-xarray-package
): | ||
"""return available variables.""" | ||
with self.dataset_opener(src_path, **io_params.as_dict()) as ds: | ||
return list(ds.data_vars) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easier to put this into an extension than to have a list_variable
in a Class Method (because in the class method the dataset_opener
is customizable
... | ||
|
||
|
||
def xarray_open_dataset( # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved everything within the xarray_open_dataset
function to ease the customization and also because I've made som dependencies optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is a fine default option for some applications, but it is easier to customize now because all the user has to do is write a new dataset_opener
- nice!
variable: str = attr.ib() | ||
|
||
# xarray.Dataset options | ||
opener: Callable[..., xarray.Dataset] = attr.ib(default=xarray_open_dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now the opener
MUST be a callable that take 4 arguments:
- src_path: str
- group: Any
- decode_times: bool
cache_client
we might change this in the future
* remove cache layer * Update src/titiler/xarray/README.md Co-authored-by: Aimee Barciauskas <[email protected]> * add tile example --------- Co-authored-by: Aimee Barciauskas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great, thanks for all of your work moving xarray support further upstream! I left a few small comments but nothing blocking.
... | ||
|
||
|
||
def xarray_open_dataset( # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is a fine default option for some applications, but it is easier to customize now because all the user has to do is write a new dataset_opener
- nice!
description="RasterIO resampling algorithm. Defaults to `nearest`.", | ||
), | ||
] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the default behavior get set somewhere else or do we need to set it to something besides None
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it defaults to what rio-tiler has (nearest
)
dataset_dependency: Type[DefaultDependency] = DatasetParams | ||
|
||
# Tile/Tilejson/WMTS Dependencies (Not used in titiler.xarray) | ||
tile_dependency: Type[DefaultDependency] = DefaultDependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this default to TileParams
instead of DefaultDependency
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no because there is not buffer nor padding for the XarrayReader's tile method
|
||
|
||
@define(kw_only=True) | ||
class TilerFactory(BaseTilerFactory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we need to redefine all of the class attributes (e.g. stats_dependency
) that are already defined in titiler.core.factory.TilerFactory
?
@hrodmn for most of them we don't have to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the io logic, please see comments below. Nothing blocking.
if "x" not in da.dims and "y" not in da.dims: | ||
try: | ||
latitude_var_name = next( | ||
x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say no. Dataset with other names will likely not be regular lat/lon grids and fail in other ways.
Co-authored-by: Jonas <[email protected]>
Co-authored-by: Jonas <[email protected]>
Co-authored-by: Jonas <[email protected]>
…o feature/add-xarray-package
da6ce9b
to
b69591c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the pyramid and reference components of the code. Is the current plan to include those in this PR or save them for later development? fwiw I would suggest the latter since there's been movement in GeoZarr multiscales and virtualizarr/icechunk since the original extension was developed
Co-authored-by: Max Jones <[email protected]>
@maxrjones we've removed the |
"consolidated": False, | ||
"backend_kwargs": {"consolidated": False}, | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxrjones oh you were talking about this! I think we can let this go 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a new issue to add kerchunk reference!
overtake #1007
ref: developmentseed/titiler-xarray#68
To Do