This repository has been archived by the owner on Jun 30, 2022. It is now read-only.

Commit

Load Bedmap2 datasets as Dask arrays (#45)
Add chunks and kwargs arguments to fetch_bedmap2.
These arguments are passed to xr.open_rasterio() so that the requested
datasets are loaded as Dask arrays.
This helps reduce memory consumption by reading the files in chunks.
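
As a rough illustration of the change described above, the sketch below shows how a keyword-only `chunks` default and extra `**kwargs` can be forwarded to an opener. `fake_open_raster` and `fetch_sketch` are hypothetical stand-ins introduced here for illustration; they are not part of rockhound or xarray, and the stand-in only records what it receives instead of reading any file.

```python
def fake_open_raster(path, chunks=None, **kwargs):
    """Hypothetical stand-in for a raster opener; it only records its arguments."""
    return {"path": path, "chunks": chunks, "extra": kwargs}


def fetch_sketch(paths, *, chunks=1000, **kwargs):
    """Forward ``chunks`` and any extra keyword arguments to the opener."""
    return [fake_open_raster(path, chunks=chunks, **kwargs) for path in paths]


# Default: every file is opened with chunks=1000
default = fetch_sketch(["bed.tif"])
# Passing chunks=None is forwarded unchanged
eager = fetch_sketch(["bed.tif"], chunks=None)
# Extra keyword arguments travel through **kwargs untouched
custom = fetch_sketch(["bed.tif"], chunks={"x": 500, "y": 500}, cache=False)
```

Because `chunks` sits after the bare `*`, it can only be given by keyword, which keeps existing positional calls to the function working.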
santisoler authored Jun 24, 2019
1 parent ee19bf6 commit da79ea6
Showing 4 changed files with 18 additions and 4 deletions.
1 change: 1 addition & 0 deletions doc/install.rst
@@ -23,6 +23,7 @@ Dependencies
 * `xarray <https://xarray.pydata.org/>`__
 * `pandas <https://pandas.pydata.org>`__
 * `rasterio <https://rasterio.readthedocs.io>`__
+* `dask <https://dask.org/>`__
 
 Most of the examples in the :ref:`gallery` also use:
 
1 change: 1 addition & 0 deletions environment.yml
@@ -9,6 +9,7 @@ dependencies:
 - xarray
 - pandas
 - rasterio
+- dask
 # Development requirements
 - matplotlib
 - cmocean
1 change: 1 addition & 0 deletions requirements.txt
@@ -2,3 +2,4 @@ pooch>=0.5
 xarray
 pandas
 rasterio
+dask
19 changes: 15 additions & 4 deletions rockhound/bedmap2.py
@@ -24,7 +24,7 @@
 }
 
 
-def fetch_bedmap2(datasets, *, load=True):
+def fetch_bedmap2(datasets, *, load=True, chunks=1000, **kwargs):
     """
     Fetch the Bedmap2 datasets for Antarctica.
@@ -55,8 +55,11 @@ def fetch_bedmap2(datasets, *, load=True):
relative to EIGEN-GL04C geoid (to convert back to WGS84, add this grid)
     .. warning ::
-        Loading a great number of datasets may require a fair amount of memory that
-        could crash your system. We recommend loading only the needed datasets.
+        Loading datasets into memory may require a fair amount of memory.
+        In order to prevent this, the function loads the datasets as Dask arrays if
+        ``chunks`` is not ``None``.
+        Be careful when doing operations that load the entire datasets into memory,
+        like plotting or performing some computations.
     .. warning ::
         Loading any dataset along with ``thickness_uncertainty_5km`` would modify the
@@ -70,6 +73,14 @@ def fetch_bedmap2(datasets, *, load=True):
         Whether to load the data into an :class:`xarray.Dataset` or just return the
         path to the downloaded data tiff files. If False, will return a list with the
         paths to the files corresponding to *datasets*.
+    chunks : int, tuple or dict
+        Chunk sizes along each dimension. This argument is passed to the
+        :func:`xarray.open_rasterio` function in order to obtain
+        `Dask arrays <https://docs.dask.org/en/latest/array.html>`_ inside the
+        returned :class:`xarray.Dataset`.
+        This helps to read the dataset without loading it entirely into memory.
+    **kwargs
+        Extra parameters passed to the :func:`xarray.open_rasterio` function.
     Returns
     -------
@@ -88,7 +99,7 @@ def fetch_bedmap2(datasets, *, load=True):
         return [get_fname(dataset, fnames) for dataset in datasets]
     arrays = []
     for dataset in datasets:
-        array = xr.open_rasterio(get_fname(dataset, fnames))
+        array = xr.open_rasterio(get_fname(dataset, fnames), chunks=chunks, **kwargs)
         # Replace no data values with nans
         array = array.where(array != array.nodatavals)
         # Remove "band" dimension and coordinate
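
To get a feel for what an integer `chunks` value such as the new default of 1000 means, the sketch below splits each dimension into blocks of at most that size, which is how Dask typically treats an integer chunk size. The helper `chunk_layout` is hypothetical, and the 6667 x 6667 grid shape is an assumption about the Bedmap2 rasters, not something stated in this commit.

```python
def chunk_layout(shape, chunks):
    """Split each dimension of ``shape`` into blocks of at most ``chunks`` elements."""
    layout = []
    for size in shape:
        full, rest = divmod(size, chunks)
        # Full-size blocks first, then one smaller trailing block if needed
        blocks = (chunks,) * full + ((rest,) if rest else ())
        layout.append(blocks)
    return tuple(layout)


# Assumed grid shape (rows, columns); labeled as an assumption above
layout = chunk_layout((6667, 6667), 1000)
n_blocks = len(layout[0]) * len(layout[1])
largest_block = max(layout[0]) * max(layout[1])
```

Under these assumptions, no single block holds more than one million values, so an operation that works block by block never needs the whole grid in memory at once.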
