
use factory from titiler.xarray #72

Merged · 12 commits merged into dev on Jan 9, 2025
Conversation

@hrodmn (Contributor) commented Nov 26, 2024

Now we can import features from titiler.xarray and just make a few modifications instead of defining custom methods for everything 🥳

  • reader.py just adds a cache to the default titiler.xarray.io.Reader
  • factory.py just adds /variables and /histogram endpoints to titiler.xarray.factory.TilerFactory, and has the custom /map endpoint template.
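
The caching layer in reader.py is not shown here, but the idea can be sketched with a stdlib stand-in (the function body and names are illustrative, not the actual implementation):

```python
from functools import lru_cache
from typing import Optional

opened = []  # track how many times the underlying opener actually runs


@lru_cache(maxsize=32)
def open_dataset(src_path: str, group: Optional[int] = None):
    """Stand-in for a dataset opener with a cache in front of it."""
    opened.append((src_path, group))
    return {"path": src_path, "group": group}  # placeholder for an xarray.Dataset


# repeated requests for the same store hit the cache instead of re-opening
open_dataset("s3://bucket/store.zarr")
open_dataset("s3://bucket/store.zarr")
assert len(opened) == 1
```

The real reader would wrap titiler.xarray's opener; the point is only that identical (url, group) requests reuse the already-opened dataset.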

There are a few things that will change in this application as a result, though.

dropped features

If we want to keep either of these we will need to keep the custom factory methods rather than recycle the existing factory methods from titiler.xarray and rio-tiler:

smaller changes

  • tileMatrixSetId is now a required parameter for most endpoints (the default of WebMercatorQuad has been removed)
  • the minzoom and maxzoom parameters have been removed from the /info response
  • band_metadata and band_description are no longer empty in the /info response
  • The bounds in the tilejson.json response are no longer capped to -180, -90, 180, 90, because this clamping code is not present in the rio-tiler tilejson method:
    minx, miny, maxx, maxy = zip(
        [-180, -90, 180, 90], list(src_dst.geographic_bounds)
    )
    bounds = [max(minx), max(miny), min(maxx), min(maxy)]
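
For reference, the clamp works by pairing each world-extent value with the corresponding dataset value, then taking the tighter of the two on each side. A minimal runnable sketch of the dropped behavior:

```python
def clamp_bounds(geographic_bounds):
    """Clamp dataset bounds to the WGS84 world extent (the dropped behavior).

    zip() pairs each world-extent value with the corresponding dataset value,
    so max()/min() pick the tighter of the two on each side.
    """
    minx, miny, maxx, maxy = zip([-180, -90, 180, 90], list(geographic_bounds))
    return [max(minx), max(miny), min(maxx), min(maxy)]


# out-of-range bounds get capped to the world extent
assert clamp_bounds([-200, -95, 190, 100]) == [-180, -90, 180, 90]
# in-range bounds pass through unchanged
assert clamp_bounds([-10.0, -5.0, 10.0, 5.0]) == [-10.0, -5.0, 10.0, 5.0]
```

Without this, a dataset whose geographic bounds exceed the world extent (see the linked rioxarray issue in the diff) now reports those raw bounds in tilejson.json.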

I will need to update the test expectations to match these feature changes but want to check with others first!

Unfortunately the renaming operation (titiler/xarray -> titiler/xarray_api) makes the PR diff less useful than it could be for factory.py :/ so I will post a diff here:

factory.py diff

3,4c3
< from dataclasses import dataclass
< from typing import Dict, List, Literal, Optional, Type, Union
---
> from typing import List, Literal, Optional, Type, Union
9,11c8,9
< from fastapi import Depends, Path, Query
< from pydantic import conint
< from rio_tiler.models import Info
---
> from attrs import define
> from fastapi import Depends, Query
13c11
< from starlette.responses import HTMLResponse, Response
---
> from starlette.responses import HTMLResponse
17,19c15
< from titiler.core.dependencies import ColorFormulaParams
< from titiler.core.factory import BaseTilerFactory, img_endpoint_params
< from titiler.core.models.mapbox import TileJSON
---
> from titiler.core.dependencies import ColorFormulaParams, DefaultDependency
22,23c18,20
< from titiler.core.utils import render_image
< from titiler.xarray.reader import ZarrReader
---
> from titiler.xarray.dependencies import XarrayIOParams, XarrayParams
> from titiler.xarray.factory import TilerFactory as BaseTilerFactory
> from titiler.xarray_api.reader import XarrayReader
42,44c39,41
< @dataclass
< class ZarrTilerFactory(BaseTilerFactory):
<     """Zarr Tiler Factory."""
---
> @define(kw_only=True)
> class XarrayTilerFactory(BaseTilerFactory):
>     """Xarray Tiler Factory."""
46c43,44
<     reader: Type[ZarrReader] = ZarrReader
---
>     reader: Type[XarrayReader] = XarrayReader
>     reader_dependency: Type[DefaultDependency] = XarrayParams
49a48,52
>         super().register_routes()
>         self.variables()
> 
>     def variables(self) -> None:
>         """Register /variables endpoint"""
56,84c59,61
<         def variable_endpoint(
<             url: Annotated[str, Query(description="Dataset URL")],
<             group: Annotated[
<                 Optional[int],
<                 Query(
<                     description="Select a specific zarr group from a zarr hierarchy. Could be associated with a zoom level or dataset."
<                 ),
<             ] = None,
<             reference: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
<             decode_times: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="decode_times",
<                     description="Whether to decode times",
<                 ),
<             ] = True,
<             consolidated: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="consolidated",
<                     description="Whether to expect and open zarr store with consolidated metadata",
<                 ),
<             ] = True,
---
>         def get_variables(
>             src_path=Depends(self.path_dependency),
>             io_params=Depends(XarrayIOParams),
87,88c64,67
<             return self.reader.list_variables(
<                 url, group=group, reference=reference, consolidated=consolidated
---
>             return XarrayReader.list_variables(
>                 src_path=src_path,
>                 group=io_params.group,
>                 decode_times=io_params.decode_times,
91,423c70,72
<         @self.router.get(
<             "/info",
<             response_model=Info,
<             response_model_exclude_none=True,
<             response_class=JSONResponse,
<             responses={200: {"description": "Return dataset's basic info."}},
<         )
<         def info_endpoint(
<             url: Annotated[str, Query(description="Dataset URL")],
<             variable: Annotated[
<                 str,
<                 Query(description="Xarray Variable"),
<             ],
<             group: Annotated[
<                 Optional[int],
<                 Query(
<                     description="Select a specific zarr group from a zarr hierarchy, can be for pyramids or datasets. Can be used to open a dataset in HDF5 files."
<                 ),
<             ] = None,
<             reference: Annotated[
<                 bool,
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
<             decode_times: Annotated[
<                 bool,
<                 Query(
<                     title="decode_times",
<                     description="Whether to decode times",
<                 ),
<             ] = True,
<             drop_dim: Annotated[
<                 Optional[str],
<                 Query(description="Dimension to drop"),
<             ] = None,
<             show_times: Annotated[
<                 Optional[bool],
<                 Query(description="Show info about the time dimension"),
<             ] = None,
<             consolidated: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="consolidated",
<                     description="Whether to expect and open zarr store with consolidated metadata",
<                 ),
<             ] = True,
<         ) -> Info:
<             """Return dataset's basic info."""
<             with self.reader(
<                 url,
<                 variable=variable,
<                 group=group,
<                 reference=reference,
<                 decode_times=decode_times,
<                 drop_dim=drop_dim,
<                 consolidated=consolidated,
<             ) as src_dst:
<                 info = src_dst.info().model_dump()
<                 if show_times and "time" in src_dst.input.dims:
<                     times = [str(x.data) for x in src_dst.input.time]
<                     info["count"] = len(times)
<                     info["times"] = times
< 
<             return Info(**info)
< 
<         @self.router.get(r"/tiles/{z}/{x}/{y}", **img_endpoint_params)
<         @self.router.get(r"/tiles/{z}/{x}/{y}.{format}", **img_endpoint_params)
<         @self.router.get(r"/tiles/{z}/{x}/{y}@{scale}x", **img_endpoint_params)
<         @self.router.get(r"/tiles/{z}/{x}/{y}@{scale}x.{format}", **img_endpoint_params)
<         @self.router.get(r"/tiles/{tileMatrixSetId}/{z}/{x}/{y}", **img_endpoint_params)
<         @self.router.get(
<             r"/tiles/{tileMatrixSetId}/{z}/{x}/{y}.{format}", **img_endpoint_params
<         )
<         @self.router.get(
<             r"/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x", **img_endpoint_params
<         )
<         @self.router.get(
<             r"/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x.{format}",
<             **img_endpoint_params,
<         )
<         def tiles_endpoint(  # type: ignore
<             z: Annotated[
<                 int,
<                 Path(
<                     description="Identifier (Z) selecting one of the scales defined in the TileMatrixSet and representing the scaleDenominator the tile.",
<                 ),
<             ],
<             x: Annotated[
<                 int,
<                 Path(
<                     description="Column (X) index of the tile on the selected TileMatrix. It cannot exceed the MatrixHeight-1 for the selected TileMatrix.",
<                 ),
<             ],
<             y: Annotated[
<                 int,
<                 Path(
<                     description="Row (Y) index of the tile on the selected TileMatrix. It cannot exceed the MatrixWidth-1 for the selected TileMatrix.",
<                 ),
<             ],
<             url: Annotated[str, Query(description="Dataset URL")],
<             variable: Annotated[
<                 str,
<                 Query(description="Xarray Variable"),
<             ],
<             tileMatrixSetId: Annotated[  # type: ignore
<                 Literal[tuple(self.supported_tms.list())],
<                 f"Identifier selecting one of the TileMatrixSetId supported (default: '{self.default_tms}')",
<             ] = self.default_tms,
<             scale: Annotated[  # type: ignore
<                 conint(gt=0, le=4), "Tile size scale. 1=256x256, 2=512x512..."
<             ] = 1,
<             format: Annotated[
<                 ImageType,
<                 "Default will be automatically defined if the output image needs a mask (png) or not (jpeg).",
<             ] = None,
<             multiscale: Annotated[
<                 bool,
<                 Query(
<                     title="multiscale",
<                     description="Whether the dataset has multiscale groups (Zoom levels)",
<                 ),
<             ] = False,
<             reference: Annotated[
<                 bool,
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
<             decode_times: Annotated[
<                 bool,
<                 Query(
<                     title="decode_times",
<                     description="Whether to decode times",
<                 ),
<             ] = True,
<             drop_dim: Annotated[
<                 Optional[str],
<                 Query(description="Dimension to drop"),
<             ] = None,
<             datetime: Annotated[
<                 Optional[str], Query(description="Slice of time to read (if available)")
<             ] = None,
<             post_process=Depends(self.process_dependency),
<             rescale=Depends(self.rescale_dependency),
<             color_formula=Depends(ColorFormulaParams),
<             colormap=Depends(self.colormap_dependency),
<             render_params=Depends(self.render_dependency),
<             consolidated: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="consolidated",
<                     description="Whether to expect and open zarr store with consolidated metadata",
<                 ),
<             ] = True,
<             nodata=Depends(nodata_dependency),
<         ) -> Response:
<             """Create map tile from a dataset."""
<             tms = self.supported_tms.get(tileMatrixSetId)
<             with self.reader(
<                 url,
<                 variable=variable,
<                 group=z if multiscale else None,
<                 reference=reference,
<                 decode_times=decode_times,
<                 drop_dim=drop_dim,
<                 datetime=datetime,
<                 tms=tms,
<                 consolidated=consolidated,
<             ) as src_dst:
<                 image = src_dst.tile(
<                     x,
<                     y,
<                     z,
<                     tilesize=scale * 256,
<                     nodata=nodata if nodata is not None else src_dst.input.rio.nodata,
<                 )
< 
<             if post_process:
<                 image = post_process(image)
< 
<             if rescale:
<                 image.rescale(rescale)
< 
<             if color_formula:
<                 image.apply_color_formula(color_formula)
< 
<             content, media_type = render_image(
<                 image,
<                 output_format=format,
<                 colormap=colormap,
<                 **render_params,
<             )
< 
<             return Response(content, media_type=media_type)
< 
<         @self.router.get(
<             "/tilejson.json",
<             response_model=TileJSON,
<             responses={200: {"description": "Return a tilejson"}},
<             response_model_exclude_none=True,
<         )
<         @self.router.get(
<             "/{tileMatrixSetId}/tilejson.json",
<             response_model=TileJSON,
<             responses={200: {"description": "Return a tilejson"}},
<             response_model_exclude_none=True,
<         )
<         def tilejson_endpoint(  # type: ignore
<             request: Request,
<             url: Annotated[str, Query(description="Dataset URL")],
<             variable: Annotated[
<                 str,
<                 Query(description="Xarray Variable"),
<             ],
<             tileMatrixSetId: Annotated[  # type: ignore
<                 Literal[tuple(self.supported_tms.list())],
<                 f"Identifier selecting one of the TileMatrixSetId supported (default: '{self.default_tms}')",
<             ] = self.default_tms,
<             group: Annotated[
<                 Optional[int],
<                 Query(
<                     description="Select a specific zarr group from a zarr hierarchy, can be for pyramids or datasets. Can be used to open a dataset in HDF5 files."
<                 ),
<             ] = None,
<             reference: Annotated[
<                 bool,
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
<             decode_times: Annotated[
<                 bool,
<                 Query(
<                     title="decode_times",
<                     description="Whether to decode times",
<                 ),
<             ] = True,
<             drop_dim: Annotated[
<                 Optional[str],
<                 Query(description="Dimension to drop"),
<             ] = None,
<             datetime: Annotated[
<                 Optional[str], Query(description="Slice of time to read (if available)")
<             ] = None,
<             tile_format: Annotated[
<                 Optional[ImageType],
<                 Query(
<                     description="Default will be automatically defined if the output image needs a mask (png) or not (jpeg).",
<                 ),
<             ] = None,
<             tile_scale: Annotated[
<                 int,
<                 Query(
<                     gt=0, lt=4, description="Tile size scale. 1=256x256, 2=512x512..."
<                 ),
<             ] = 1,
<             minzoom: Annotated[
<                 Optional[int],
<                 Query(description="Overwrite default minzoom."),
<             ] = None,
<             maxzoom: Annotated[
<                 Optional[int],
<                 Query(description="Overwrite default maxzoom."),
<             ] = None,
<             post_process=Depends(self.process_dependency),
<             rescale=Depends(self.rescale_dependency),
<             color_formula=Depends(ColorFormulaParams),
<             colormap=Depends(self.colormap_dependency),
<             render_params=Depends(self.render_dependency),
<             consolidated: Annotated[
<                 Optional[bool],
<                 Query(
<                     title="consolidated",
<                     description="Whether to expect and open zarr store with consolidated metadata",
<                 ),
<             ] = True,
<             nodata=Depends(nodata_dependency),
<         ) -> Dict:
<             """Return TileJSON document for a dataset."""
<             route_params = {
<                 "z": "{z}",
<                 "x": "{x}",
<                 "y": "{y}",
<                 "scale": tile_scale,
<                 "tileMatrixSetId": tileMatrixSetId,
<             }
<             if tile_format:
<                 route_params["format"] = tile_format.value
<             tiles_url = self.url_for(request, "tiles_endpoint", **route_params)
< 
<             qs_key_to_remove = [
<                 "tilematrixsetid",
<                 "tile_format",
<                 "tile_scale",
<                 "minzoom",
<                 "maxzoom",
<                 "group",
<             ]
<             qs = [
<                 (key, value)
<                 for (key, value) in request.query_params._list
<                 if key.lower() not in qs_key_to_remove
<             ]
<             if qs:
<                 tiles_url += f"?{urlencode(qs)}"
< 
<             tms = self.supported_tms.get(tileMatrixSetId)
< 
<             with self.reader(
<                 url,
<                 variable=variable,
<                 group=group,
<                 reference=reference,
<                 decode_times=decode_times,
<                 tms=tms,
<                 consolidated=consolidated,
<             ) as src_dst:
<                 # see https://github.com/corteva/rioxarray/issues/645
<                 minx, miny, maxx, maxy = zip(
<                     [-180, -90, 180, 90], list(src_dst.geographic_bounds)
<                 )
<                 bounds = [max(minx), max(miny), min(maxx), min(maxy)]
< 
<                 return {
<                     "bounds": bounds,
<                     "minzoom": minzoom if minzoom is not None else src_dst.minzoom,
<                     "maxzoom": maxzoom if maxzoom is not None else src_dst.maxzoom,
<                     "tiles": [tiles_url],
<                 }
---
>     def statistics(self) -> None:
>         """Register /statistics and /histogram endpoints"""
>         super().statistics()
432,456c81,82
<             url: Annotated[str, Query(description="Dataset URL")],
<             variable: Annotated[
<                 str,
<                 Query(description="Xarray Variable"),
<             ],
<             reference: Annotated[
<                 bool,
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
<             consolidated: Annotated[
<                 bool,
<                 Query(
<                     title="consolidated",
<                     description="Whether to expect a consolidated dataset",
<                 ),
<             ] = True,
<             group: Annotated[
<                 Optional[int],
<                 Query(
<                     description="Select a specific zarr group from a zarr hierarchy, can be for pyramids or datasets. Can be used to open a dataset in HDF5 files."
<                 ),
<             ] = None,
---
>             src_path=Depends(self.path_dependency),
>             reader_params=Depends(self.reader_dependency),
459,463c85,89
<                 url,
<                 variable=variable,
<                 reference=reference,
<                 consolidated=consolidated,
<                 group=group,
---
>                 src_path=src_path,
>                 variable=reader_params.variable,
>                 group=reader_params.group,
>                 decode_times=reader_params.decode_times,
>                 datetime=reader_params.datetime,
477c103,105
<         @self.router.get("/map", response_class=HTMLResponse)
---
>     def map_viewer(self) -> None:
>         """Register /map endpoints"""
> 
481d108
<             url: Annotated[Optional[str], Query(description="Dataset URL")] = None,
484,485c111,113
<                 f"Identifier selecting one of the TileMatrixSetId supported (default: '{self.default_tms}')",
<             ] = self.default_tms,
---
>                 "Identifier selecting one of the supported TileMatrixSetIds",
>             ],
>             url: Annotated[Optional[str], Query(description="Dataset URL")] = None,
496,502d123
<             reference: Annotated[
<                 bool,
<                 Query(
<                     title="reference",
<                     description="Whether the dataset is a kerchunk reference",
<                 ),
<             ] = False,
545,547c166,167
<             templates = Jinja2Templates(
<                 directory="",
<                 loader=jinja2.ChoiceLoader([jinja2.PackageLoader(__package__, ".")]),
---
>             jinja2_env = jinja2.Environment(
>                 loader=jinja2.ChoiceLoader([jinja2.PackageLoader(__package__, ".")])
548a169,170
>             templates = Jinja2Templates(env=jinja2_env)
> 
551c173
<                     request, "tilejson_endpoint", tileMatrixSetId=tileMatrixSetId
---
>                     request, "tilejson", tileMatrixSetId=tileMatrixSetId

Resolves #71

@j08lue (Member) commented Nov 27, 2024

No more support for reference files

I see this was discussed here: developmentseed/titiler#1016 (comment)

So no more Kerchunk support? While I see how virtual Zarr is the better approach, I think some projects that are looking to use TiTiler for multidim datasets are still using Kerchunk.

For example on the UKRI / UK EO DataHub, this may be data they need to load with TiTiler: https://radiantearth.github.io/stac-browser/#/external/eocis.org/stac/collections/eocis-lst-slstrA-day

Example reference file: https://eocis.org/data/eocis-lst-slstrA-day-kerchunk/2024/10/EOCIS_-LST-L3C-LST-SLSTRA-0.01deg_1DAILY_DAY-20241020000000-fv4.00-kerchunk.json

Would there be an easy enough way for them to enable this in a custom runtime / fork of TiTiler-Xarray, if they need it?

Not a blocker for merging this, but we should point out ways forward to them.

I see VirtualiZarr and Icechunk have docs on migrating from Kerchunk / creating virtual datasets:

👌

@hrodmn (Contributor, Author) commented Nov 28, 2024

Thanks for the reviews @j08lue and @abarciauskas-bgse!

Would there be an easy enough way for them to enable this in a custom runtime / fork of TiTiler-Xarray, if they need it?

I don't think it would be too hard to keep that feature working in this application! If it is important to any users right now, we can define a custom version of open_xarray_dataset rather than importing it from titiler.xarray.

It will be very simple if we can identify kerchunk reference files from the url parameter. Can we assume that any file with a .json file extension will be a kerchunk reference? If so, we would only need to add some logic to a custom opener function. If we truly need an additional argument like reference we would need to redefine all of the factory endpoints to use it :/.
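
The detection heuristic under discussion could be as small as a URL check in a custom opener; a sketch with a hypothetical helper name (both the name and the exact rules are assumptions, not existing titiler.xarray API):

```python
def looks_like_kerchunk_reference(src_path: str) -> bool:
    """Hypothetical heuristic: treat .json files (or reference:// URLs) as
    kerchunk reference files, everything else as a regular zarr/NetCDF store."""
    return src_path.startswith("reference://") or src_path.lower().endswith(".json")


assert looks_like_kerchunk_reference("s3://bucket/refs.json")
assert looks_like_kerchunk_reference("reference://bucket/store")
assert not looks_like_kerchunk_reference("s3://bucket/store.zarr")
```

If the heuristic holds, only the opener needs to branch on it; no extra query parameter (and therefore no redefined factory endpoints) would be required.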

@j08lue (Member) commented Nov 28, 2024

It will be very simple if we can identify kerchunk reference files from the url parameter

If it is simple, either via .json or the reference:// prefix that @vincentsarago suggested here, I think we should add it for now.

Maybe mark Kerchunk support for deprecation already, to give existing users time to move to Icechunk or similar.

I guess this adds a few dependencies, though, which would be great to be able to avoid longer-term.
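
Marking the feature for deprecation could be as lightweight as a warning in the opener; a sketch (the function name is hypothetical):

```python
import warnings


def open_kerchunk_reference(src_path: str):
    """Hypothetical opener that still supports kerchunk but warns users."""
    warnings.warn(
        "kerchunk reference support is deprecated; consider migrating to "
        "Icechunk or VirtualiZarr",
        DeprecationWarning,
        stacklevel=2,
    )
    return {"src_path": src_path}  # placeholder for the opened dataset


# the warning is emitted once per call and is easy to surface in logs/tests
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    open_kerchunk_reference("s3://bucket/refs.json")
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```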

@vincentsarago (Member) commented:

Just a reminder: we dropped kerchunk reference support in titiler-xarray because it needed nested options (e.g. you could have your kerchunk file in one S3 bucket but the NetCDF in another, possibly private, bucket, meaning you need to be able to define kwargs for both when using fsspec). Maybe I'm wrong, but that's what I understood at first.

We could add it back, but with the assumption that the reference file will always sit next to the data.

@abarciauskas-bgse (Contributor) commented:

@j08lue I discussed briefly with @hrodmn - I agree we should add back support so we can support the UKRI / UK EO DataHub datasets. However, I'm not sure it makes sense to add it to this application specifically, since the deployment will only work with datasets in S3 (due to the lambda's subnet configuration) - so even if we added back reference support, it would not work for the example https://eocis.org/data/eocis-lst-slstrA-day-kerchunk/2024/10/EOCIS_-LST-L3C-LST-SLSTRA-0.01deg_1DAILY_DAY-20241020000000-fv4.00-kerchunk.json. When would you like to demonstrate this functionality to UKRI / UK EO DataHub? I think it would make the most sense to create a separate demo deployment for them which has this reference support.

@j08lue (Member) commented Dec 4, 2024

create a separate demo deployment for them which has this reference support.

Thanks for being so considerate. The UK EODH engineering team is deploying their own instance of TiTiler-Xarray (in their k8s cluster), so we do not need a (demo) deployment that works for their datasets with references.

It would just be great if they could keep using TiTiler-Xarray and did not need to build their own separate application for reference support - or, failing that, if we could describe how that is done.

If I understand you correctly, TiTiler-Xarray will get back support for references, but it is a matter of the deployment to make sure the service has access to the relevant resources.

@dwest77a commented Dec 6, 2024

I added zarr-developers/VirtualiZarr#321 for reference regarding using VirtualiZarr to create Icechunk stores.

There is also an issue converting an existing Kerchunk file to Icechunk using VirtualiZarr due to 'inline' data, where chunks are written as base64-encoded data inside the reference file instead of as references. Conversion is not currently possible for files where this is the case - which covers a significant portion of the CEDA kerchunk files.

@j08lue (Member) left a review comment:

Approving in terms of dependent projects we've been in touch with - all have been informed of the changes, and the paths forward should be clear: create your own application like titiler-xarray and add your custom reference file and access control logic. I am sure the maintainers here are happy to help - please use TiTiler Discussions.

@hrodmn (Contributor, Author) commented Jan 9, 2025

Thank you all for your detailed feedback and for discussing the downstream impacts of dropping kerchunk reference support.

I am merging this PR without the modifications required to read kerchunk files, because we could only implement that feature with limited support (dependent on the storage arrangement of the reference JSON and the actual assets). Users who wish to keep that feature can create a new application that implements the kerchunk capability in a custom Reader.

@hrodmn hrodmn merged commit 8e6d222 into dev Jan 9, 2025
4 checks passed
@hrodmn hrodmn deleted the feature/import-titiler-xarray branch January 9, 2025 13:04
@hrodmn hrodmn mentioned this pull request Jan 10, 2025
Successfully merging this pull request may close the following issue: Use official titiler.xarray to customize the application