Add automatic chunking to open_rasterio #2255
This uses the automatic chunking in dask 0.18+ to chunk rasterio datasets in a nicely aligned way. Currently this doesn't implement tests due to a difficulty in creating chunked tiff images.
assert actual.chunks[0] == (1, 1, 1)
assert actual.chunks[1] == (256,) * 4
assert actual.chunks[2] == (256,) * 8
with xr.open_rasterio(tmp_file, chunks=(3, 'auto', 'auto')) as actual:
E501 line too long (86 > 79 characters)
import os
if not os.path.exists('myfile.tif'):
    import requests
    response = requests.get('https://oin-hotosm.s3.amazonaws.com/5abae68e65bd8f00110f3e42/0/5abae68e65bd8f00110f3e43.tif')
    with open('myfile.tif', 'wb') as f:
        f.write(response.content)

import dask
dask.config.set({'array.chunk-size': '1MiB'})

import xarray as xr
ds = xr.open_rasterio('myfile.tif', chunks=True)  # this only reads metadata to start

>>> ds.chunks
((1, 1, 1),
 (1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 136),
 (1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 995))

Also depends on dask/dask#3679. Without that PR it will use values that are similar, but don't precisely align with 1024.

Oh, I should point out that the image has tiles of size (512, 512).
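To make the alignment idea concrete, here is a minimal pure-Python sketch of splitting one dimension into chunks whose size is a multiple of the tile size. It is not dask's actual algorithm; the function name `aligned_chunks` and its `target` parameter are hypothetical, and the dimension lengths are simply the sums of the chunk tuples shown above.

```python
def aligned_chunks(dim_len, tile, target):
    """Split dim_len into chunks whose size is a multiple of tile.

    target is the desired chunk length (an assumption of this sketch);
    it is rounded down to a multiple of tile, but never below one tile.
    """
    if dim_len <= 0 or tile <= 0 or target <= 0:
        raise ValueError("dim_len, tile and target must be positive")
    size = max(tile, (target // tile) * tile)
    full, rest = divmod(dim_len, size)
    # The final chunk holds whatever is left over and may be unaligned.
    return (size,) * full + ((rest,) if rest else ())


# Lengths implied by the chunk listing above.
height = 10 * 1024 + 136
width = 9 * 1024 + 995
print(aligned_chunks(height, tile=512, target=1024))
print(aligned_chunks(width, tile=512, target=1100))
```

With 512-pixel tiles, a target of 1100 is rounded down to 1024, so every chunk except the last starts and ends on a tile boundary.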
previous_chunks = tuple((c,) for c in block_shape)
shape = (img.count, img.height, img.width)
dtype = img.dtypes[0]
chunks = dask.array.core.normalize_chunks(
Should we expose normalize_chunks() as a top-level API in dask.array, e.g., dask.array.normalize_chunks? I'm generally a little nervous about dipping into dask.array.core.
I have yet to run into a raster that varies dtypes and block shapes across bands. Most of the time, they are single band rasters. And if they are not, they have had the same dtype and block shape. So, I think your assumption is a good one for most use cases. Also, only a single dtype is allowed currently:
xarray/xarray/backends/rasterio_.py Lines 39 to 40 in 1d7bcbd
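The assumption discussed here (one dtype and one block shape for all bands) could be checked explicitly before chunking. The helper below is a hypothetical sketch, not code from the PR or from xarray; it takes plain sequences, which with rasterio would presumably come from the dataset's `dtypes` and `block_shapes` attributes.

```python
def uniform_band_layout(dtypes, block_shapes):
    """Return True when every band shares one dtype and one block shape."""
    if not dtypes or not block_shapes:
        raise ValueError("expected at least one band")
    same_dtype = len(set(dtypes)) == 1
    same_blocks = len(set(map(tuple, block_shapes))) == 1
    return same_dtype and same_blocks


# Two bands with an identical layout.
print(uniform_band_layout(["float32", "float32"], [(512, 512), (512, 512)]))
# A hypothetical file whose second band uses a different block shape.
print(uniform_band_layout(["float32", "float32"], [(512, 512), (1, 1024)]))
```

A caller could fall back to non-aligned chunks, or raise, when the check returns False.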
One thing I would like to note is that the automatic chunking would be useful whether the raster is tiled or not. I tested a raster that was not tiled, and it still had chunks, because the raster was written in stripes. So, I would recommend removing the restriction to only tiled rasters.

Also, to create a tiled raster:

import rasterio
import numpy
from affine import Affine

with rasterio.open(
    "tiled.tif",
    "w",
    driver="GTiff",
    count=2,
    width=1024,
    height=1024,
    crs="+init=epsg:4326",
    transform=Affine(0.0083333333, 0.0, -180.00416666665, 0.0, -0.0083333333, 75.00416666665),
    dtype=rasterio.float32,
    tiled=True,
    blockxsize=512,
    blockysize=512,
) as rds:
    rds.write((numpy.random.rand(2, 1024, 1024)*10).astype(numpy.float32))

Looks like they have this option in the tests:

open_kwargs=dict(
    tiled=True,
    blockxsize=512,
    blockysize=512
)
with create_tmp_geotiff(nx=1024, ny=1024, nz=2, open_kwargs=open_kwargs) as (tmp_file, expected):
    ....
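For the striped case mentioned above, the same alignment idea applies with blocks that span the full width. The following is a rough sketch under stated assumptions, not the PR's implementation: the function `strip_aligned_chunks`, the strip height of 2 rows, and the 1 MiB budget are all illustrative values.

```python
def strip_aligned_chunks(height, width, rows_per_strip, itemsize, budget_bytes):
    """Chunk a striped raster band so each chunk holds whole strips.

    Strips span the full width, so only the row dimension is split.
    """
    if min(height, width, rows_per_strip, itemsize, budget_bytes) <= 0:
        raise ValueError("all arguments must be positive")
    rows_in_budget = budget_bytes // (width * itemsize)
    # Round down to a whole number of strips, but keep at least one strip.
    rows = max(rows_per_strip, (rows_in_budget // rows_per_strip) * rows_per_strip)
    full, rest = divmod(height, rows)
    y_chunks = (rows,) * full + ((rest,) if rest else ())
    return y_chunks, (width,)


# A 1024 x 1024 float32 band (4 bytes per value) with hypothetical 2-row strips.
print(strip_aligned_chunks(1024, 1024, rows_per_strip=2, itemsize=4,
                           budget_bytes=1024 * 1024))
```

Because each chunk covers complete strips, reading one chunk never requires decoding a strip that belongs mostly to a neighbouring chunk.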
I've abandoned this PR. If anyone has time to pick it up, that would be welcome. I think that it would have a positive impact.
I appreciate you starting this! Based on this PR, I added the feature into rioxarray here: corteva/rioxarray#31 (released in version 0.0.9).
I'm glad to hear it! I'm curious, are there features in rioxarray that could be pushed upstream?
Depends on what the xarray maintainers would like to add. I would definitely like to see the |
Put another way: why don't we put all the logic in rioxarray and make rioxarray an optional dependency of xarray to open rio files?
That is an option. All of the logic has already been moved over.
with #4697 and the fact that this seems to have been included in |
This also uncovered some inefficiencies in how Dask doesn't align rechunking to existing chunk schemes.
whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
I could use help on how the following: