Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? #2042

Closed
ebo opened this issue Apr 7, 2018 · 31 comments

Comments

@ebo

ebo commented Apr 7, 2018

Matthew Rocklin wrote a gist https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd to demonstrate using XArray to read tiled GeoTIFF datasets, but I am still confused as to how to write them to a GeoTIFF. I can easily create a tiff with "rasterio.open(out, 'w', **src.profile)", but the following does not seem like the best/cleanest way to do this:

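import rasterio
import xarray as xr

# Open lazily with dask chunks, then copy band by band using the source profile
ds = xr.open_rasterio('myfile.tif', chunks={'band': 1, 'x': 2048, 'y': 2048})
with rasterio.open('myfile.tif', 'r') as src:
    with rasterio.open('new_myfile.tif', 'w', **src.profile) as dst:
        for i in range(1, src.count + 1):
            # rasterio band indexes are 1-based; the dask array is 0-based
            dst.write(ds.variable.data[i-1].compute(), i)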

Also, if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.
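When no source profile is at hand, the georeferencing would have to be rebuilt from the array's own coordinates. The following is a minimal sketch of that step, assuming the x and y coordinates mark pixel centres, are evenly spaced, and carry no rotation. The helper name `transform_from_centers` is hypothetical and only the standard library is used.

```python
def transform_from_centers(x, y):
    """Return (a, b, c, d, e, f) for evenly spaced pixel-centre coordinates.

    The coefficients follow the affine convention
    x_geo = a * col + b * row + c and y_geo = d * col + e * row + f,
    with (c, f) the outer corner of the first pixel. No rotation is assumed.
    """
    if len(x) < 2 or len(y) < 2:
        raise ValueError("need at least two coordinates per axis")
    x_res = (x[-1] - x[0]) / (len(x) - 1)
    y_res = (y[-1] - y[0]) / (len(y) - 1)  # negative for north-up rasters
    # Shift from the centre of the first pixel to its outer corner.
    return (x_res, 0.0, x[0] - x_res / 2, 0.0, y_res, y[0] - y_res / 2)


coeffs = transform_from_centers([0.5, 1.5, 2.5], [9.5, 8.5])
```

The six numbers are in the order taken by `affine.Affine(a, b, c, d, e, f)`, the type rasterio uses for the `transform` entry of a profile.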

@Schlump

Schlump commented Apr 8, 2018

https://github.com/robintw/XArrayAndRasterio/blob/master/rasterio_to_xarray.py

@fmaussion
Member

if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.

We have been adding new attributes like this recently (#1583 and #1740), so I don't see much trouble in adding a few more. Note that the rasterio object is available via the (undocumented) _file_obj attribute. So a quick workaround for you in the meantime would be to access the info you need directly via this object.

As for the to_rasterio method, I'm currently against it. I'm already starting to think that these kind of domain specific tools should exist in dedicated projects, not in the main xarray codebase. For rasterio in particular, it turns out that the geotiff/GDAL data model is fairly different from the xarray/NetCDF model. The rasterio folks have also shown only limited interest in our endeavor (rasterio/rasterio#920), which is understandable. I don't have a strong opinion though, and I am curious if the @pydata/xarray crew sees it differently.

@ebo
Author

ebo commented Apr 8, 2018 via email

@ebo
Author

ebo commented Apr 9, 2018 via email

@fmaussion
Member

I do not care about the to_rasterio but I do care about a ''to_tiff''

Yes sorry, I meant to_tiff

If xarray has no way to output tiffs then I cannot use xarray.

I'm not saying it shouldn't exist, I'm just asking whether it should be in the xarray codebase or elsewhere.

If you'd like to parse new attributes when opening the geotiff file this could be added easily. PRs are welcome!

@rabernat
Contributor

rabernat commented Apr 9, 2018

I'm already starting to think that these kind of domain specific tools should exist in dedicated projects, not in the main xarray codebase.

👍

@ebo
Author

ebo commented Apr 9, 2018 via email

@ebo
Author

ebo commented Apr 9, 2018 via email

@rabernat
Contributor

rabernat commented Apr 9, 2018

I really need to know what xarray can and is planning to do with tiff's
so that I can not only use them but also document stuff for a dozen or
more of my coworkers

@ebo, we are very glad to hear your input about how you might use xarray together with geotiff data. The majority of xarray developers are coming from a netCDF background, so this is somewhat new territory for us. It sounds like you have a real need for the computational tools that xarray provides. Engaging the geotiff community could potentially be very advantageous for xarray, since it could bring lots of new users. On the other hand, there are already lots of powerful tools in the geotiff space, and we have limited resources (i.e. time), so we need to be a bit conservative.

It would probably be useful to clarify how the decision making process works on things like this for open source projects. There is no xarray master plan that can provide a simple answer to your question of "what xarray is planning to do with tiffs". The main questions that have to be answered when deciding whether to add a big new feature are

  • Does this feature make sense within the "scope" of the project? (Can be difficult to answer--much discussion is usually required.)
  • Do the xarray developers have the time and expertise to implement and support such a feature?

The first item, regarding "scope," is being addressed now via this discussion. What are the pros and cons of attempting to add the new feature? Different people will have different opinions. Let's hear them out. A key question, as identified by @fmaussion, is whether the geotiff data model is compatible enough with the xarray data model to provide a full-featured writeable backend. In other words, can I write any arbitrary xarray dataset to geotiff and then read it back with no loss of information? If the answer is "no," then it will be hard to convince the xarray community that geotiff is a suitable candidate for a backend.

If you feel strongly that we need the ability to not only directly read (as we can already with open_rasterio) but also directly write geotiff, you should lay out your arguments persuasively, taking into account not only the immediate impacts on your personal project but the impact on xarray as a whole. There may be good ways to achieve what you want without making any changes to xarray, i.e. by creating a small standalone package to transform geotiff to / from xarray (as in @Schlump's example); that option needs to be considered seriously.

The second item (time) is a rather strong constraint: xarray is a volunteer effort. There are currently 369 open issues in xarray. Which ones should be the top priority? Will attempting to add a new feature lead to much more work down the line, in the form of unforeseen bugs?

Ultimately, what happens in xarray is determined by the needs of the xarray developers themselves, who use xarray heavily in their daily science work. This may sound exclusive, but it is the opposite, because anyone can become an xarray developer. The reason we can read geotiffs today is because, one year ago, @fmaussion rolled up his sleeves and wrote the rasterio backend (#1260).

That little number 1260 is a link to a merged pull request (aka "PR"). A PR is much more powerful than a feature request; it is an actual implementation of the feature someone wishes to see in xarray. Anyone is free to make a PR to xarray, although before doing so, it is good to discuss the possible new feature via the issue tracker, as described in the xarray contributing guide. As a full time programmer in a lab dealing with geospatial data, you yourself are already a prime candidate to implement your desired feature! 😉

As an example of how a new backend was incorporated into xarray, you can refer to #1905, in which @barronh implemented a backend for "pseudo-netCDF," a file format used by his research group. Skimming through that discussion will give you a good idea of some of the questions that arise in implementing new backend functionality.

Apologies for the long digression into open-source politics. I thought it would be useful to clarify these things.

@ebo
Author

ebo commented Apr 9, 2018 via email

@mrocklin
Contributor

When writing #2093 I came across this issue and thought I'd weigh in.

The GIS community seems like a fairly close neighbor to XArray's current community. Some API compatibility here might be a good way to expand the community. I definitely agree that GeoTiff does not implement the full XArray model, but it might be useful to support the subset of datasets that do fit it, just so that round-trip operations can occur. For example, it might be nice if the following worked:

dset = xr.open_rasterio(...)
# do modest modifications to dset
dset.to_rasterio(...)

My hope would be that the rasterio/GDAL data model would be consistent enough so that we could detect and err early if the dataset was not well-formed.
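A minimal sketch of what "detect and err early" could look like, assuming the layout that open_rasterio produces: dimensions ("band", "y", "x") or ("y", "x"), with evenly spaced x and y coordinates. The function name `check_raster_like` and the tolerance are hypothetical; nothing here is part of xarray or rasterio.

```python
import math
from typing import Dict, Sequence


def check_raster_like(dims: Sequence[str],
                      coords: Dict[str, Sequence[float]],
                      rel_tol: float = 1e-6) -> None:
    """Raise ValueError if the array layout cannot map onto a simple raster.

    Assumed convention: dims are ("y", "x") or ("band", "y", "x"),
    and the x and y coordinates are evenly spaced.
    """
    if tuple(dims) not in {("y", "x"), ("band", "y", "x")}:
        raise ValueError(f"unsupported dimensions: {tuple(dims)}")
    for name in ("y", "x"):
        values = coords[name]
        if len(values) < 2:
            continue  # a single row or column has no spacing to check
        steps = [b - a for a, b in zip(values, values[1:])]
        if steps[0] == 0 or not all(
            math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps
        ):
            raise ValueError(f"coordinate {name!r} is not evenly spaced")
```

For a DataArray `da`, the call might be `check_raster_like(da.dims, {"y": da.y.values.tolist(), "x": da.x.values.tolist()})`, run before any file is opened for writing.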

@jhamman
Member

jhamman commented Apr 30, 2018

I agree, it would be nice to have the to_rasterio functionality. My impression is that there will be some (significant) development challenges, particularly related to rasterio's support for many file formats, but those can probably be sorted out by a committed developer or by partnering with a rasterio developer.

This is a bit outside my area of expertise but I imagine it will be useful to see a prototype, that perhaps only supports a few rasterio file formats, before diving into the xarray backends to implement this full bore.

@mrocklin
Contributor

My impression is that there will be some (significant) development challenges

If you're able to expand on this that would be welcome.

that perhaps only supports a few rasterio file formats

My hope would be that rasterio/GDAL would handle the many-file-format issue for us if they support writing in chunks. I also lack experience here though.

@mrocklin
Contributor

My first attempt would be to use this API: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing
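A minimal sketch of how a raster could be split into windows for block-wise writing, in the spirit of the windowed-writing API linked above. The helper name `iter_windows` and the tuple layout are hypothetical; only the standard library is used, and no rasterio call is executed here.

```python
from typing import Iterator, Tuple


def iter_windows(height: int, width: int,
                 block_h: int, block_w: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, n_rows, n_cols) tiles covering a raster.

    Tiles on the bottom and right edges are clipped to the raster extent.
    """
    if min(height, width, block_h, block_w) <= 0:
        raise ValueError("raster and block sizes must be positive")
    for row_off in range(0, height, block_h):
        n_rows = min(block_h, height - row_off)
        for col_off in range(0, width, block_w):
            n_cols = min(block_w, width - col_off)
            yield row_off, col_off, n_rows, n_cols


# A 5 x 7 raster cut into blocks of at most 2 x 4 pixels.
windows = list(iter_windows(5, 7, 2, 4))
```

With rasterio, each tuple would map onto `rasterio.windows.Window(col_off, row_off, n_cols, n_rows)`, and the matching chunk would be passed to `dst.write(chunk, band_index, window=window)`.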

@ebo
Author

ebo commented Apr 30, 2018

As far as I have seen, open_rasterio takes care of most things out of the box. Besides how to deal with chunks, there is also the question of how to deal with several types of metadata:

  • the regular metadata, which rasterio exposes through either the meta or the profile attribute.
  • a user-defined metadata dictionary, which rasterio exposes through tags().
  • a per-band metadata dictionary, which rasterio exposes through tags(band).

Whether xarray/open_rasterio uses the same interface or not, there will be a need to deal with file metadata and per-band metadata.
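A minimal sketch of carrying both kinds of tags from one file to another. It relies only on the `count` attribute and the `tags()` / `update_tags()` methods of rasterio datasets; the function name `copy_tags` is hypothetical, and `FakeDataset` is an in-memory stand-in so the example runs without any file on disk.

```python
def copy_tags(src, dst):
    """Copy dataset-level and per-band tags from src to dst.

    src and dst are assumed to behave like open rasterio datasets:
    src.count, src.tags(), src.tags(band) and dst.update_tags(...) are used.
    """
    dst.update_tags(**src.tags())
    for band in range(1, src.count + 1):  # rasterio band indexes start at 1
        dst.update_tags(band, **src.tags(band))


class FakeDataset:
    """In-memory stand-in for a rasterio dataset, used only for this demo."""

    def __init__(self, count):
        self.count = count
        # Key 0 holds dataset-level tags; keys 1..count hold per-band tags.
        self._tags = {band: {} for band in range(count + 1)}

    def tags(self, bidx=0):
        return dict(self._tags[bidx])

    def update_tags(self, bidx=0, **kwargs):
        self._tags[bidx].update(kwargs)


src = FakeDataset(count=2)
src.update_tags(sensor="demo")
src.update_tags(1, units="K")
dst = FakeDataset(count=2)
copy_tags(src, dst)
```

With real files, `src` would be opened in read mode and `dst` in write mode, as in the snippet at the top of this issue. A tag literally named `bidx` or `ns` would clash with the keyword arguments of `update_tags` and would need separate handling.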

@ebo
Author

ebo commented Apr 30, 2018

@mrocklin it was the windowed-rw example that prompted a number of my early questions about dask.array and xarray equivalents. Maybe something along the lines of the following would also be helpful:

https://gis.stackexchange.com/questions/158527/is-it-possible-to-read-raster-files-by-block-with-rasterio/158528#158528

@ebo
Author

ebo commented Apr 30, 2018

@mrocklin gdal can read/write windows:

# Read a window of band 1 as a float64 array
banddataraster = raster.GetRasterBand(1)
dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount).astype(numpy.float64)

from: https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html

Also see BandReadAsArray and BandWriteArray in http://gdal.org/python/osgeo.gdal_array-module.html (which appear to wrap the gdal.Band.ReadAsArray and gdal.Band.WriteArray methods, respectively).

But there are some gotchas there, in that GDAL, as far as I recall, is not thread safe. I wonder how you got that to work other than setting up a slave read process that handles all reads.

@mrocklin
Contributor

gdal can read/write windows:

I'm aware. See this doc listed above for rasterio: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing

Background here is that rasterio more-or-less wraps around GDAL, but with interfaces that are somewhat more idiomatic to this community.

I wonder how you got that to work other than setting up a slave read process that handles all reads.

We've run into these issues before as well. Typically we handle them with locks of various types.
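A minimal sketch of the lock-based approach, assuming the only requirement is that one thread at a time touches the non-thread-safe target. `LockedWriter` is a hypothetical name, and the nested list `raster` stands in for an open GDAL or rasterio dataset.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedWriter:
    """Serialize writes to a target that is not thread safe.

    `target` is a plain list of rows standing in for an open raster
    dataset; the lock lets only one thread touch it at a time.
    """

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def write(self, row_off, col_off, block):
        with self._lock:
            for i, row in enumerate(block):
                self._target[row_off + i][col_off:col_off + len(row)] = row


raster = [[0] * 4 for _ in range(4)]
writer = LockedWriter(raster)

# Four 2 x 2 blocks, each filled with a distinct value, written from threads.
offsets = [(0, 0), (0, 2), (2, 0), (2, 2)]
jobs = [(r, c, [[k] * 2, [k] * 2]) for k, (r, c) in enumerate(offsets, start=1)]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda job: writer.write(*job), jobs))
```

Computation of each block can still run in parallel; only the final write is serialized. `dask.array.store` accepts a `lock` argument that serves the same purpose.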

@ebo
Author

ebo commented Apr 30, 2018

When I poked at this I could not figure out how to keep the internal cached states separate. That may have been because the processing loop was opening many different images, and not just one. I'm glad you found a way.

@djhoese
Contributor

djhoese commented Jul 20, 2018

I'd like to add to this discussion the issue I brought up in #2288. It is something that could/should probably result in a new xarray add-on package for doing these types of operations. For example, I work on the pyresample and satpy projects. Pyresample uses its own "AreaDefinition" objects to define the geolocation/projection information. SatPy uses these AreaDefinitions by setting DataArray.attrs['area'] and using them when necessary. This includes the ability to write geotiffs using rasterio and a custom array-like class for writing dask chunks to the geotiff between separate threads (does not work multiprocess, yet).

Edit: by "add-on" I mean something like "geoxarray" where it is an optional dependency for a user that depends completely on xarray.

@ebo
Author

ebo commented Jul 20, 2018 via email

@guillaumeeb

Has anyone made progress toward a to_rasterio or to_tiff implementation in Xarray? Or in a geo-xarray?

We're beginning experiments on Dask/Xarray/GeoTiff analysis at @CNES. Xarray/Dask is really useful and promising for temporal stack analysis on a given area, but using it with rasterio is still not quite out of the box.

@djhoese
Contributor

djhoese commented Jan 22, 2019

@guillaumeeb Not that I know of, but I'm not completely in the loop with xarray. There is the geoxarray project that I started (https://github.com/geoxarray/geoxarray), but I really haven't had any time to work on it. Otherwise you could look at the satpy library or its dependency library trollimage, which uses rasterio but assumes some things about how data is structured, including an 'area' in .attrs from pyresample. Sorry I don't have a better idea.

@ebo
Author

ebo commented Jan 22, 2019 via email

@guillaumeeb

Thanks @djhoese @ebo.

@ebo if you have some examples, that would be really cool!

@alexsalr

alexsalr commented Jan 23, 2019 via email

@ebo
Author

ebo commented Jan 25, 2019 via email

@snowman2
Contributor

A new project called rioxarray has a to_raster method with the default driver of GTiff.

You can use it like so:

import rioxarray
import xarray

xds = xarray.open_rasterio("myfile.tif")
wgs84_xds = xds.rio.reproject("EPSG:4326")
wgs84_xds.rio.to_raster("myfile_wgs84.tif")

It currently only supports 2d/3d DataArrays. So, you would have to iterate over your variables to export each one to a raster if you have a Dataset.
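A minimal sketch of that iteration, assuming `ds` is an xarray.Dataset and that `import rioxarray` has already run so that each DataArray exposes the `.rio` accessor. The function name `export_variables` and the one-file-per-variable naming scheme are hypothetical choices made for illustration.

```python
import os


def export_variables(ds, out_dir):
    """Write each data variable of `ds` to its own raster file.

    Assumes every variable is a 2d/3d DataArray with the `.rio`
    accessor available. Returns the list of paths written.
    """
    paths = []
    for name, data_array in ds.data_vars.items():
        path = os.path.join(out_dir, f"{name}.tif")  # hypothetical naming scheme
        data_array.rio.to_raster(path)
        paths.append(path)
    return paths
```

A call such as `export_variables(my_dataset, "out")` would then produce one GeoTIFF per variable in an existing `out` directory.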

@jwindhager

Hi! I just wrote this tiny tifffile wrapper for my own purposes, with support for xarray: https://pypi.org/project/xtiff. Not properly tested yet, but happy to take issues/pull requests (e.g. for additional write modes). Also, feel free to integrate it into xarray. The current version is xarray-agnostic, that's why I wrote it as an independent package.

@dcherian
Contributor

dcherian commented Apr 9, 2022

I think rioxarray is now the recommended solution.

@dcherian dcherian closed this as completed Apr 9, 2022
@ebo
Author

ebo commented Apr 9, 2022

Thanks for closing this dcherian, I had completely forgotten about it.
