This repository has been archived by the owner on Jun 30, 2022. It is now read-only.

Return a Dask array when loading Bedmap2 #45

Merged — 8 commits merged on Jun 24, 2019
Changes from 4 commits
1 change: 1 addition & 0 deletions doc/install.rst
@@ -23,6 +23,7 @@ Dependencies
* `xarray <https://xarray.pydata.org/>`__
* `pandas <https://pandas.pydata.org>`__
* `rasterio <https://rasterio.readthedocs.io>`__
* `dask <https://dask.org/>`__

Most of the examples in the :ref:`gallery` also use:

1 change: 1 addition & 0 deletions environment.yml
@@ -9,6 +9,7 @@ dependencies:
- xarray
- pandas
- rasterio
- dask
# Development requirements
- matplotlib
- cmocean
1 change: 1 addition & 0 deletions requirements.txt
@@ -2,3 +2,4 @@ pooch>=0.5
xarray
pandas
rasterio
dask
12 changes: 10 additions & 2 deletions rockhound/bedmap2.py
@@ -24,7 +24,7 @@
}


def fetch_bedmap2(datasets, *, load=True):
def fetch_bedmap2(datasets, *, load=True, chunks=100, **kwargs):
@santisoler (Member, Author) commented on May 28, 2019:

I set the default chunks value arbitrarily. Maybe we should increase it to speed up computations by reducing overhead.
Based on the Dask Best Practices for arrays, we could assign a chunk size of (1000, 1000), taking into account that several datasets may be loaded simultaneously.

What do you think?

A project member replied:

@santisoler sounds good to me. We can test what works best for this dataset.

@santisoler replied:

👍
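
A minimal sketch of the chunk-size trade-off discussed in this thread, assuming a local copy of one Bedmap2 GeoTIFF (the file name is illustrative and the grid size is approximate, not taken from this PR):

```python
import xarray as xr

# Bedmap2 grids are roughly 6667 x 6667 cells, so the chunk size controls
# how many Dask tasks are created when the data is eventually computed.
small_chunks = xr.open_rasterio("bedmap2_bed.tif", chunks=100)   # ~67 x 67 chunks
large_chunks = xr.open_rasterio("bedmap2_bed.tif", chunks=1000)  # ~7 x 7 chunks

# Both arrays stay lazy (Dask-backed); fewer, larger chunks mean fewer tasks
# and less scheduler overhead.
print(small_chunks.chunks)
print(large_chunks.chunks)
```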

"""
Fetch the Bedmap2 datasets for Antarctica.

@@ -70,6 +70,14 @@ def fetch_bedmap2(datasets, *, load=True):
Whether to load the data into an :class:`xarray.Dataset` or just return the
path to the downloaded data tiff files. If False, will return a list with the
paths to the files corresponding to *datasets*.
chunks : int, tuple or dict
Chunk sizes along each dimension. This argument is passed to the
:func:`xarray.open_rasterio` function in order to obtain
`Dask arrays <https://docs.dask.org/en/latest/array.html>`_ inside the
returned :class:`xarray.Dataset`.
This helps to read the dataset without loading it entirely into memory.
kwargs : dict
Extra parameters passed to the :func:`xarray.open_rasterio` function.

Returns
-------
@@ -88,7 +96,7 @@ def fetch_bedmap2(datasets, *, load=True):
return [get_fname(dataset, fnames) for dataset in datasets]
arrays = []
for dataset in datasets:
array = xr.open_rasterio(get_fname(dataset, fnames))
array = xr.open_rasterio(get_fname(dataset, fnames), chunks=chunks, **kwargs)
# Replace no data values with nans
array = array.where(array != array.nodatavals)
# Remove "band" dimension and coordinate
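
A minimal usage sketch of the keywords added in this pull request (the chunk size is arbitrary, and the variable names on the returned Dataset are assumed to match the requested dataset names):

```python
import rockhound as rh

# Open two Bedmap2 grids lazily, backed by Dask arrays with 1000 x 1000 chunks.
bedmap2 = rh.fetch_bedmap2(["bed", "surface"], chunks=1000)

# Extra keyword arguments are forwarded to xarray.open_rasterio, for example:
# bedmap2 = rh.fetch_bedmap2(["bed"], chunks=1000, parse_coordinates=True)

# Data is only read from disk when a computation is triggered:
mean_bed = bedmap2.bed.mean().compute()

# With load=False, only the paths to the downloaded GeoTIFF files are returned:
paths = rh.fetch_bedmap2(["bed", "surface"], load=False)
```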