Add Zarr and Xarray examples to docs #655

Merged
merged 1 commit into from
Jan 13, 2025
2 changes: 2 additions & 0 deletions docs/examples/index.md
@@ -8,5 +8,7 @@ maxdepth: 2
---
how-to-run
basic-array-ops
zarr
xarray
pangeo
```
76 changes: 76 additions & 0 deletions docs/examples/xarray.md
@@ -0,0 +1,76 @@
---
file_format: mystnb
kernelspec:
name: python3
---
# Xarray

Cubed can work with Xarray datasets via the [`cubed-xarray`](https://github.com/cubed-dev/cubed-xarray) package.

Install by running the following:

```shell
pip install cubed cubed-xarray xarray pooch netCDF4
```

Note that `pooch` and `netCDF4` are needed to access the Xarray tutorial datasets that we use in the example below.

## Open dataset

Start by importing Xarray. Note that we don't need to import Cubed or `cubed-xarray`: once `cubed-xarray` is installed, Xarray picks up the Cubed chunk manager automatically.

```{code-cell} ipython3
import xarray as xr

xr.set_options(display_expand_attrs=False, display_expand_data=True);
```

We open an Xarray dataset (in netCDF format) using `xr.tutorial.open_dataset`, which passes its keyword arguments through to the usual `open_dataset` function. Specifying `chunks={}` ensures that the dataset is chunked using the on-disk chunking (here, the netCDF file chunking). The `chunked_array_type` argument specifies which chunked array type to use (Cubed in this case).

```{code-cell} ipython3
ds = xr.tutorial.open_dataset(
    "air_temperature", chunked_array_type="cubed", chunks={}
)
ds
```

Notice that the `air` data variable is a `cubed.Array`. Since Cubed has a lazy computation model, this array is not loaded from disk until a computation is run.

## Convert to Zarr

We can use Cubed to convert the dataset to Zarr format by calling `to_zarr` on the dataset:

```{code-cell} ipython3
ds.to_zarr("air_temperature_cubed.zarr", mode="w", consolidated=True);
```

This will run a computation that loads the input data and writes it out to a Zarr store on the local filesystem.

## Compute the mean

We can also use Xarray's API to run computations on the dataset using Cubed. Here we find the mean air temperature over time, for each location:

```{code-cell} ipython3
mean = ds.air.mean("time", skipna=False)
mean
```

To run the computation we need to call `compute`:

```{code-cell} ipython3
mean.compute()
```

This is fine for outputs that fit in memory, like the one here, but sometimes we want to write the output of the computation to Zarr instead. We do this by calling `to_zarr` on the data array rather than `compute`:

```{code-cell} ipython3
mean.to_zarr("mean_air_temperature.zarr", mode="w", consolidated=True);
```

We can check that the Zarr store was created by loading it from disk using `xarray.open_dataset`:

```{code-cell} ipython3
xr.open_dataset(
    "mean_air_temperature.zarr", chunked_array_type="cubed", chunks={}
)
```
65 changes: 65 additions & 0 deletions docs/examples/zarr.md
@@ -0,0 +1,65 @@
---
file_format: mystnb
kernelspec:
name: python3
---
# Zarr

Cubed was designed to work seamlessly with Zarr data. The examples below demonstrate using {py:func}`cubed.from_zarr`, {py:func}`cubed.to_zarr` and {py:func}`cubed.store` to read and write Zarr data.

## Write to Zarr

We'll start by creating a small chunked array containing random data in Cubed and writing it to Zarr using {py:func}`cubed.to_zarr`. Note that the call to `to_zarr` executes eagerly.

```{code-cell} ipython3
import cubed
import cubed.random

# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))

# write to Zarr
cubed.to_zarr(a, "a.zarr")
```

## Read from Zarr

We can check that the Zarr array was written by loading it from disk using {py:func}`cubed.from_zarr`:

```{code-cell} ipython3
cubed.from_zarr("a.zarr")
```

## Multiple arrays

To write multiple arrays in a single computation, use {py:func}`cubed.store`:

```{code-cell} ipython3
import cubed
import cubed.random

# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
b = cubed.random.random((5000, 5000), chunks=(500, 500))

# write to Zarr
arrays = [a, b]
paths = ["a.zarr", "b.zarr"]
cubed.store(arrays, paths)
```

Then, to read the Zarr arrays back, we use {py:func}`cubed.from_zarr` for each array and perform whatever array operations we like on them. These operations are lazy: only when we call `to_zarr` is the whole computation executed.

```{code-cell} ipython3
import cubed.array_api as xp

# read from Zarr
a = cubed.from_zarr("a.zarr")
b = cubed.from_zarr("b.zarr")

# perform operation
c = xp.add(a, b)

# write to Zarr
cubed.to_zarr(c, store="c.zarr")
```
4 changes: 4 additions & 0 deletions docs/requirements.txt
@@ -15,6 +15,10 @@ tenacity
toolz
tqdm
zarr
cubed-xarray
xarray
pooch
netCDF4

# docs
sphinx-book-theme