Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? #2042
On Apr 8 2018 12:45 PM, Fabien Maussion wrote:
>> if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.
> We have been adding new attributes like this recently (#1583 and #1740), so I don't see much trouble in adding a few more. Note that the rasterio object is available via the (undocumented) ``_file_obj`` attribute. So a quick workaround for you in the meantime would be to access the info you need directly via this object.
> As for the ``to_rasterio`` method, I'm currently against it. I'm already starting to think that these kinds of domain-specific tools should exist in dedicated projects, not in the main xarray codebase. For rasterio in particular, it turns out that the geotiff/GDAL data model is fairly different from the xarray/NetCDF model. The rasterio folks have also shown only limited interest in our endeavor (rasterio/rasterio#920), which is understandable. I don't have a strong opinion though, and I am curious if the @pydata/xarray crew sees it differently.
I do not care about the ``to_rasterio``, but I do care about a ``to_tiff`` (even if I have to do all the geospatial stuff outside of xarray, as long as I can output the image data portion of the TIFF via xarray). I also do not overly care if the xarray interface is significantly different from the rasterio/GDAL API (however, someone will have to document the differences so that they do not continually trip people like me up; I should be able to help a little with this once it gets working). Basically, however it is handled, I have to be able to read a GeoTIFF, process it, and write it back out to a GeoTIFF. If xarray has no way to output TIFFs, then I cannot use xarray.
|
On Apr 8 2018 11:54 AM, Schlump wrote:
> https://github.com/robintw/XArrayAndRasterio/blob/master/rasterio_to_xarray.py
Ahhh... Now I understand Fabien Maussion's comment about to_rasterio. I read the posts out of order. I will see if this does the job for me (I will likely have to extend it a little, but I think this is a great start).
EBo --
|
👍 |
On Apr 9 2018 9:22 AM, Ryan Abernathey wrote:
>> I'm already starting to think that these kinds of domain-specific tools should exist in dedicated projects, not in the main xarray codebase.
> 👍
I am perfectly fine with that stance, but I also think it is reasonable to ask/expect that if you provide a reader for some format, you also provide writers for it, or at least document that you will not and why. Almost all of my current work is in GeoTIFF format. I have no choice, and many other people working in the geospatial domain will be hamstrung without it. Sitting down the pipeline from me are 1/2 million archived images (there are actually closer to 2 million images, but only 1/2 to 1 million are associated with our current projects, and they comprise several petabytes of data).
I really need to know what xarray can do, and is planning to do, with TIFFs so that I can not only use them but also document stuff for a dozen or more of my coworkers (heck, the next time we run the Python Bootcamp I would probably offer to teach this). If you plan not to support it, then fine. I will not spend any more time with xarray and will focus on dask.array or anything else that will work.
My question to you now is whether supporting basic TIFF I/O is in scope. If so, I can deal with all the rest of the rasterio/geospatial stuff outside of xarray.
I will start fleshing out the stuff that Matthew Rocklin and Schlump have provided.
|
On Apr 9 2018 12:49 AM, Fabien Maussion wrote:
>> I do not care about the ``to_rasterio`` but I do care about a ``to_tiff``
> Yes sorry, I meant ``to_tiff``
ah... got it.
>> If xarray has no way to output tiffs then I cannot use xarray.
> I'm not saying it shouldn't exist, I'm just asking whether it should be in the xarray codebase or elsewhere.
Fair enough. I just need them to play well enough together that I can read, process, and write a chunk/window at a time (whether that is with a simple xr.compute() or something else).
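Reading, processing, and writing "a chunk/window at a time" mostly comes down to enumerating windows that tile the raster. A minimal stdlib sketch, assuming fixed square blocks; the function name `iter_block_windows` and the 256-pixel default are hypothetical. Each tuple uses the `(col_off, row_off, width, height)` field order of `rasterio.windows.Window`.

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height): same field order as rasterio.windows.Window
WindowTuple = Tuple[int, int, int, int]


def iter_block_windows(height: int, width: int, block: int = 256) -> Iterator[WindowTuple]:
    """Yield windows that tile a height x width raster, row by row.

    Windows on the right and bottom edges are clipped to the raster size,
    so every pixel is covered exactly once.
    """
    if height <= 0 or width <= 0 or block <= 0:
        raise ValueError("height, width and block must all be positive")
    for row_off in range(0, height, block):
        win_h = min(block, height - row_off)
        for col_off in range(0, width, block):
            win_w = min(block, width - col_off)
            yield (col_off, row_off, win_w, win_h)
```

With rasterio, each tuple could become `Window(*w)` and be used as `src.read(window=...)` followed by `dst.write(result, window=...)`; that wiring assumes the output file has the same pixel grid as the input.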
> If you'd like to parse new attributes when opening the geotiff file this could be added easily. PRs are welcome!
What is a PR? Did you mean a functionality request?
I'm still not clear where dask.array, xarray, rasterio, and pangeo begin and end. I think I have posted an issue about extending the metadata/tags someplace, but I am sure it is not as clear as it should be, and for the life of me I am not sure where I posted that.
|
On Apr 9 2018 11:43 AM, Ryan Abernathey wrote:
>> I really need to know what xarray can and is planning to do with tiff's so that I can not only use them but also document stuff for a dozen or more of my coworkers
> @ebo, we are very glad to hear your input about how you might use xarray together with geotiff data. The majority of xarray developers are coming from a netCDF background, so this is somewhat new territory for us. It sounds like you have a real need for the computational tools that xarray provides. Engaging the geotiff community could potentially be very advantageous for xarray, since it could bring lots of new users. On the other hand, there are already lots of powerful tools in the geotiff space, and we have limited resources (i.e. time), so we need to be a bit conservative.
> It would probably be useful to clarify how the decision-making process works on things like this for open source projects. There is no xarray master plan that can provide a simple answer to your question of "what xarray is planning to do with tiffs". The main questions that have to be answered when deciding whether to add a big new feature are:
> - Does this feature make sense within the "scope" of the project? (Can be difficult to answer; much discussion is usually required.)
> - Do the xarray developers have the time and expertise to implement and support such a feature?
> The first item, regarding "scope," is being addressed now via this discussion. What are the pros and cons of attempting to add the new feature? Different people will have different opinions. Let's hear them out. A key question, as identified by @fmaussion, is whether the geotiff data model is compatible enough with the xarray data model to provide a full-featured writeable backend. In other words, can I write any arbitrary xarray dataset to geotiff and then read it back with no loss of information? If the answer is "no," then it will be hard to convince the xarray community that geotiff is a suitable candidate for a backend.
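One concrete place where the two data models diverge: an xarray coordinate can be any 1-D array, while a GeoTIFF georeferences its pixels with a single affine transform, so only evenly spaced x/y coordinates survive a write. A minimal stdlib sketch, assuming the coordinates label pixel centers; `geotransform_from_coords` is a hypothetical helper that returns the six numbers in GDAL's geotransform order, or raises if no affine transform fits.

```python
import math
from typing import Sequence, Tuple


def _spacing(values: Sequence[float], name: str) -> float:
    """Return the constant step between values, or raise if it is not uniform."""
    if len(values) < 2:
        raise ValueError(f"{name} needs at least two values to infer a pixel size")
    step = values[1] - values[0]
    if step == 0:
        raise ValueError(f"{name} has zero spacing")
    for a, b in zip(values, values[1:]):
        if not math.isclose(b - a, step, rel_tol=1e-9, abs_tol=1e-12):
            raise ValueError(f"{name} is not evenly spaced; no affine transform fits")
    return step


def geotransform_from_coords(x: Sequence[float], y: Sequence[float]) -> Tuple[float, ...]:
    """GDAL-ordered geotransform from pixel-center coordinates.

    Order: (x of first pixel's outer corner, pixel width, row rotation,
            y of first pixel's outer corner, column rotation, pixel height).
    Rotation terms are always 0 here; for north-up data y decreases,
    so the pixel height is negative and the corner is the upper-left one.
    """
    dx = _spacing(x, "x")
    dy = _spacing(y, "y")
    # Shift half a pixel from the first pixel's center to its outer corner.
    return (x[0] - dx / 2, dx, 0.0, y[0] - dy / 2, 0.0, dy)
```

If a rasterio `Affine` is needed instead, `affine.Affine.from_gdal(*gt)` converts from this ordering.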
> If you feel strongly that we need the ability to not only directly read (as we can already with `open_rasterio`) but also directly *write* geotiff, you should lay out your arguments persuasively, taking into account not only the immediate impacts on your personal project but the impact on xarray as a whole. There may be good ways to achieve what you want without making any changes to xarray, i.e. by creating a small standalone package to transform geotiff to / from xarray (as in @Schlump's example); that option needs to be considered seriously.
> The second item (time) is a rather strong constraint: xarray is a volunteer effort. There are currently [369 open issues](https://github.com/pydata/xarray/issues) in xarray. Which ones should be the top priority? Will attempting to add a new feature lead to much more work down the line, in the form of unforeseen bugs?
> Ultimately, what happens in xarray is determined by the needs of the xarray developers themselves, who use xarray heavily in their daily science work. This may sound exclusive, but it is the opposite, because **anyone can become an xarray developer**. The reason we can read geotiffs today is that, one year ago, @fmaussion rolled up his sleeves and wrote the rasterio backend (#1260).
> That little number 1260 is a link to a merged [pull request](https://help.github.com/articles/about-pull-requests/) (aka "PR"). A PR is much more powerful than a feature request; it is an actual implementation of the feature someone wishes to see in xarray. Anyone is free to make a PR to xarray, although before doing so, it is good to discuss the possible new feature via the issue tracker, as described in the [xarray contributing guide](http://xarray.pydata.org/en/stable/contributing.html). As a full-time programmer in a lab dealing with geospatial data, **you yourself** are already a prime candidate to implement your desired feature! 😉
> As an example of how a new backend was incorporated into xarray, you can refer to #1905, in which @barronh implemented a backend for "pseudo-netCDF", a file format used by his research group. Skimming through that discussion will give you a good idea of some of the questions that arise in implementing new backend functionality.
> Apologies for the long digression into open-source politics. I thought it would be useful to clarify these things.
No need to apologize for a long digression into open-source politics; I fully understand, and I am smack in the middle of that with at least 4 different projects. I also know about issue/commit numbers on GitHub/Bitbucket/Redmine/etc. NASA has formal rules about what can be released and when. My last open-source project took 9 months to get the software release authorized, but that was for an entire project of new code. For basic image I/O support I would not expect any problems, but I have to get permission before releasing anything beyond snippets and examples that do not include primary workflows. I will release as much as I can back into the public domain, but this starts to get complicated as the scope grows.
I do not remember seeing anyone use the acronym PR for "pull request" before, so sorry for that confusion. I just could not guess it from the context.
The argument for providing basic functionality for GeoTIFFs is that it is a common format used alongside NetCDF and HDF. I can, if you need me to, try to track down a stack of sites which provide images as GeoTIFFs, such as NASA's Giovanni, Digital Globe, and Planet Labs, just to name a few off the top of my head. How many folks here work in and around GIS?
I will have to post back later (probably to several separate issues) to
address several of the pointers raised above, but I have to get on to
fleshing some of this out.
|
When writing #2093 I came across this issue and thought I'd weigh in. The GIS community seems like a fairly close neighbor to XArray's current community. Some API compatibility here might be a good way to expand the community. I definitely agree that GeoTiff does not implement the full XArray model, but it might be useful to support the subset of datasets that do fit it, just so that round-trip operations can occur. For example, it might be nice if the following worked:
```python
dset = xr.open_rasterio(...)
# do modest modifications to dset
dset.to_rasterio(...)
```
My hope would be that the rasterio/GDAL data model would be consistent enough that we could detect and err early if the dataset was not well-formed. |
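The "detect and err early" idea could start as a plain check on dimension names, shape, and dtype before anything is handed to rasterio. A minimal sketch; `check_geotiff_compatible` is hypothetical, the accepted layouts `('y', 'x')` and `('band', 'y', 'x')` mirror what `open_rasterio` returns, and the dtype set is an assumption based on GDAL's classic GeoTIFF band types (newer GDAL releases accept a few more, such as 64-bit integers).

```python
from typing import Sequence, Tuple

# Assumed subset: numpy dtype names that map onto classic GDAL/GeoTIFF band types.
GTIFF_DTYPES = {
    "uint8", "uint16", "int16", "uint32", "int32",
    "float32", "float64", "complex64", "complex128",
}


def check_geotiff_compatible(dims: Sequence[str], shape: Sequence[int],
                             dtype_name: str) -> Tuple[int, int, int]:
    """Raise ValueError early if an array cannot be written as one GeoTIFF.

    Returns (count, height, width), the values a rasterio profile needs.
    """
    dims = tuple(dims)
    if len(dims) != len(shape):
        raise ValueError(f"dims {dims} do not match shape {tuple(shape)}")
    if dims == ("y", "x"):
        count, (height, width) = 1, shape
    elif dims == ("band", "y", "x"):
        count, height, width = shape
    else:
        raise ValueError(f"expected ('y', 'x') or ('band', 'y', 'x'), got {dims}")
    if dtype_name not in GTIFF_DTYPES:
        raise ValueError(f"dtype {dtype_name!r} has no GeoTIFF band type in this sketch")
    if min(count, height, width) < 1:
        raise ValueError("empty arrays cannot be written")
    return count, height, width
```

For a `DataArray` named `da`, the call would be `check_geotiff_compatible(da.dims, da.shape, da.dtype.name)`.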
I agree, it would be nice to have the This is a bit outside my area of expertise but I imagine it will be useful to see a prototype, that perhaps only supports a few rasterio file formats, before diving into the xarray backends to implement this full bore. |
If you're able to expand on this that would be welcome.
My hope would be that rasterio/GDAL would handle the many-file-format issue for us if they support writing in chunks. I also lack experience here though. |
My first attempt would be to use this API: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing |
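For a dask-backed array, the chunk sizes along y and x (dask's tuple-of-tuples `chunks`, e.g. `((256, 256, 88), (256, 44))`) already define one window per block, so each block could be written with rasterio's windowed `write`. A minimal stdlib sketch; `windows_from_chunks` is a hypothetical name, and only the two spatial axes are handled.

```python
from itertools import accumulate
from typing import Iterator, Sequence, Tuple

BlockIndex = Tuple[int, int]
WindowTuple = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def windows_from_chunks(row_chunks: Sequence[int],
                        col_chunks: Sequence[int]) -> Iterator[Tuple[BlockIndex, WindowTuple]]:
    """Yield ((block_row, block_col), window) for every chunk of a 2-D array."""
    # Offsets are the running sums of chunk sizes, starting at 0.
    row_offs = [0, *accumulate(row_chunks)][:-1]
    col_offs = [0, *accumulate(col_chunks)][:-1]
    for i, (r0, h) in enumerate(zip(row_offs, row_chunks)):
        for j, (c0, w) in enumerate(zip(col_offs, col_chunks)):
            yield (i, j), (c0, r0, w, h)
```

For a 2-D dask array `arr`, the block `arr.blocks[i, j]` would hold the data for window `(i, j)`.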
As far as I have run into, open_rasterio takes care of most things out of the box. Besides how to deal with chunks, there is also the question of how to deal with several types of metadata:
Whether xarray/open_rasterio uses the same interface or not, there will be a need to deal with file metadata and per-band metadata. |
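One piece of per-band metadata that is easy to regenerate on write is the band statistics, which GDAL-based tools such as QGIS read from `STATISTICS_*` metadata items. A minimal stdlib sketch; `band_statistics_tags` is hypothetical, NaNs are skipped, and the standard deviation is the population one (like `numpy.nanstd` with its default `ddof=0`). The resulting dict could be passed to rasterio as `dst.update_tags(band_index, **tags)`.

```python
import math
import statistics
from typing import Dict, Iterable


def band_statistics_tags(values: Iterable[float]) -> Dict[str, str]:
    """Return GDAL-style STATISTICS_* metadata for one band, ignoring NaNs."""
    finite = [float(v) for v in values if not math.isnan(v)]
    if not finite:
        raise ValueError("band has no valid (non-NaN) values")
    stats = {
        "STATISTICS_MINIMUM": min(finite),
        "STATISTICS_MAXIMUM": max(finite),
        "STATISTICS_MEAN": statistics.fmean(finite),
        # Population standard deviation, matching numpy.nanstd with ddof=0.
        "STATISTICS_STDDEV": statistics.pstdev(finite),
    }
    # Metadata items are stored as text, so format the numbers explicitly.
    return {key: repr(value) for key, value in stats.items()}
```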
@mrocklin it was the windowed-rw example that prompted a number of my early questions about dask.array and xarray equivalents. Maybe something along the lines of the following would also be helpful: |
@mrocklin GDAL can read/write windows:
from: https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html Also see BandReadAsArray and BandWriteAsArray in http://gdal.org/python/osgeo.gdal_array-module.html (which appear to correspond to the gdal.Band.ReadAsArray and gdal.Band.WriteArray methods, respectively). But there are some gotchas there: as far as I recall, GDAL is not thread-safe. I wonder how you got that to work other than by setting up a dedicated read process that handles all reads. |
I'm aware. See this doc listed above for rasterio: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing Background here is that rasterio more-or-less wraps around GDAL, but with interfaces that are somewhat more idiomatic to this community.
We've run into these issues before as well. Typically we handle them with locks of various types. |
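The lock-based approach can be sketched as a small array-like write target: `dask.array.store` writes each chunk with `target[region] = chunk`, so a target whose `__setitem__` takes a lock keeps only one thread on the file handle at a time. A minimal stdlib sketch; `LockedWindowWriter` and its `write_window` callback are hypothetical, and with rasterio the callback would wrap something like `dst.write(block, window=Window(col_off, row_off, width, height))`. This only serializes threads within one process, not multiple processes.

```python
import threading
from typing import Any, Callable, Tuple

# Hypothetical callback signature: write_window(block, col_off, row_off, width, height)
WriteWindow = Callable[[Any, int, int, int, int], None]


class LockedWindowWriter:
    """Array-like write target that funnels every write through one lock."""

    def __init__(self, shape: Tuple[int, ...], write_window: WriteWindow) -> None:
        self.shape = shape
        self.ndim = len(shape)
        self._write_window = write_window
        self._lock = threading.Lock()

    def __setitem__(self, key: Tuple[slice, ...], block: Any) -> None:
        # The last two slices are (rows, cols); a leading band slice is ignored here.
        row_off, row_stop, _ = key[-2].indices(self.shape[-2])
        col_off, col_stop, _ = key[-1].indices(self.shape[-1])
        with self._lock:  # only one thread touches the underlying file at a time
            self._write_window(block, col_off, row_off,
                               col_stop - col_off, row_stop - row_off)
```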
When I poked at this I could not figure out how to keep the internal cached states separate. That may have been because the processing loop was opening many different images, and not just one. I'm glad you found a way. |
I'd like to add to this discussion the issue I brought up here #2288. It is something that could/should probably result in a new xarray add-on package for doing these types of operations. For example, I work on the pyresample and satpy projects. Pyresample uses its own "AreaDefinition" objects to define the geolocation/projection information. SatPy uses these AreaDefinitions by setting `DataArray.attrs['area']` and using them when necessary. This includes the ability to write geotiffs using rasterio and a custom array-like class for writing dask chunks to the geotiff between separate threads (does not work multiprocess, yet). Edit: by "add-on" I mean something like "geoxarray", an optional package for users that depends completely on xarray. |
On Jul 20 2018 12:57 PM, David Hoese wrote:
> I'd like to add to this discussion the issue I brought up here #2288. It is something that could/should probably result in a new xarray add-on package for doing these types of operations. For example, I work on the pyresample and satpy projects. Pyresample uses its own "AreaDefinition" objects to define the geolocation/projection information. SatPy uses these AreaDefinitions by setting `DataArray.attrs['area']` and using them when necessary. This includes the ability to write geotiffs using rasterio and a custom array-like class for writing dask chunks to the geotiff between separate threads (does not work multiprocess, yet).
I would love to see these additions (or some recipes on how to do it as xarray stands). As a note, I figured out a rather simple way using `with rasterio.open(...,'w',**profile)` to effect the write. That might help in the short to medium term.
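The `rasterio.open(..., 'w', **profile)` pattern works as long as the profile copied from the source file is updated to describe the array actually being written. A minimal sketch of that bookkeeping as a plain dict operation; `profile_for_array` is hypothetical, and the overridden keys (`driver`, `count`, `height`, `width`, `dtype`, optionally `nodata`) are an assumption modelled on the keys in a rasterio profile. The `crs` and `transform` are carried over unchanged, which is only correct if the processing kept the pixel grid.

```python
from typing import Any, Dict, Sequence

_MISSING = object()  # sentinel so that nodata=None can be passed explicitly


def profile_for_array(template: Dict[str, Any], shape: Sequence[int], dtype_name: str,
                      nodata: Any = _MISSING) -> Dict[str, Any]:
    """Copy a template profile and overwrite the keys that describe the array."""
    if len(shape) == 2:
        count, height, width = 1, *shape
    elif len(shape) == 3:
        count, height, width = shape
    else:
        raise ValueError(f"expected a 2-D or 3-D shape, got {tuple(shape)}")
    profile = dict(template)  # shallow copy: the caller's template is left untouched
    profile.update(driver="GTiff", count=count, height=height, width=width,
                   dtype=dtype_name)
    if nodata is not _MISSING:
        profile["nodata"] = nodata
    return profile
```

The sentinel default lets a caller pass `nodata=None` to clear an inherited nodata value, while omitting the argument keeps the template's value.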
I am also interested in looking at your Pyresample as well as something similar to the morphological operators (in this context, specifically measure).
Best of success!
|
Has anyone advanced in the direction of a We're beginning experiments on Dask/Xarray/GeoTiff analysis at @CNES. Xarray/Dask is really useful and promising for temporal stack analysis on a given area, but still a bit out of the box with rasterio. |
@guillaumeeb Not that I know of, but I'm not completely in the loop with xarray. There is the geoxarray project that I started (https://github.com/geoxarray/geoxarray), but I really haven't had any time to work on it. Otherwise you could look at the [satpy library](https://satpy.readthedocs.io/en/latest/) or its dependency library [trollimage](https://trollimage.readthedocs.io/en/latest/xrimage.html), which uses rasterio but assumes some things about how data is structured, including an `'area'` in `.attrs` from [pyresample](https://pyresample.readthedocs.io/en/latest/). Sorry I don't have a better idea. |
I work with geotiff all the time. A separate to_tiff is not needed. The trick is that there are two separate sections/areas where the metadata is stored. You will need to know where/how to store that information. I do not have access to any of that code at the moment. If you cannot find the examples, I will try to hack an example or three once I get back to work.
|
Hi @guillaumeeb,
I have also created geotiff files from xarray using rasterio. I was working with a to_tiff method adapted to my workflow (https://github.com/alexsalr/ciat_monitor_crops/blob/master/b_Temporal_Stack/xr_eotemp.py), based on these methods: https://github.com/robintw/XArrayAndRasterio. This was with rasterio prior to 1.0, so I don't know if the new version changes the behaviour.
|
Here is an old chunk of code I wrote a while back to do this. Please note three things. There is the metadata attached to the file (I think it was through "tags"), the metadata attached to the "meta" variable, and some metadata that is attached on a per-band basis. It can be problematic when you assume that the info is global to the image and is embedded somehow (it took me weeks to figure some of this out). Also note that I do per-band and image statistics... Also, I did not keep good enough notes and cannot remember where I got some of the hints; they are just as likely to have come from published examples that have been hacked to marginally work. Also, the .xml weirdness has to do in part with historic artifacts of our particular dataset, which is over 3.5 petabytes, cannot easily be updated, and is easier to hack around in the code.
Hope this helps:
```python
import os

import numpy as np
import rasterio


def to_tiff(data, fname, template=None, **kwargs):
    # accept plain numpy arrays as well as xarray DataArrays
    data = np.asarray(data)
    # check and promote the number of dimensions for consistency
    if data.ndim == 2:
        # expand the array so that it is at least 3D (i.e. a stack of surfaces)
        data = np.expand_dims(data, axis=0)
    elif data.ndim != 3:  # nothing to do if it is already 3D
        raise ValueError("to_tiff can currently only deal with 2D and 3D data")

    # tags are written with update_tags() below; keep them out of the
    # profile so they are not passed to rasterio.open() as a creation option
    user_tags = kwargs.pop('tags', {})

    profile = {}
    tags = {}
    if template:
        # close the template as soon as its profile and tags are copied
        with rasterio.open(template, 'r') as tmpl:
            profile = tmpl.profile.copy()
            tags = tmpl.tags()

    # the metadata should be appended. Cache here to
    # simplify variable replacement below.
    meta = {}
    if 'meta' in profile:
        meta.update(profile['meta'])
    if 'meta' in kwargs:
        meta.update(kwargs['meta'])

    # overwrite anything inherited from the template with
    # user supplied args
    profile.update(kwargs)

    # overwrite bits that write the array as geotiff and
    # save the cached metadata
    profile['driver'] = 'GTiff'
    profile['count'] = data.shape[0]
    profile['width'] = data.shape[2]
    profile['height'] = data.shape[1]
    profile['meta'] = meta
    if 'dtype' not in profile:
        profile['dtype'] = data.dtype.name

    # if you do not remove the previously associated .xml file,
    # then the tags and metadata can get corrupted.
    for path in (fname, fname + ".xml"):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass

    # now create and save the array to a file
    with rasterio.open(fname, 'w', **profile) as out:
        for b in range(data.shape[0]):
            out.write(data[b].astype(profile['dtype']), b + 1)
            # calculate the stats for each band
            # (not sure what the proper name for per-band stats is in QGIS)
            stats = {
                'STATISTICS_MINIMUM': np.nanmin(data[b]),
                'STATISTICS_MAXIMUM': np.nanmax(data[b]),
                'STATISTICS_MEAN': np.nanmean(data[b]),
                'STATISTICS_STDDEV': np.nanstd(data[b])}
            out.update_tags(b + 1, **stats)
        # now calculate the stats across all the bands
        stats = {
            'STATISTICS_MINIMUM': np.nanmin(data),
            'STATISTICS_MAXIMUM': np.nanmax(data),
            'STATISTICS_MEAN': np.nanmean(data),
            'STATISTICS_STDDEV': np.nanstd(data)}
        out.update_tags(**tags)
        out.update_tags(**user_tags)
        out.update_tags(**stats)
```
|
A new project called rioxarray has a You can use it like so:
It currently only supports 2d/3d |
Hi! I just wrote this tiny tifffile wrapper for my own purposes, with support for xarray: https://pypi.org/project/xtiff. Not properly tested yet, but happy to take issues/pull requests (e.g. for additional write modes). Also, feel free to integrate it into xarray. The current version is xarray-agnostic, that's why I wrote it as an independent package. |
I think rioxarray is now the recommended solution. |
Thanks for closing this, dcherian; I had completely forgotten about it.
Matthew Rocklin wrote a gist https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd to demonstrate using XArray to read tiled GeoTIFF datasets, but I am still confused as to how to write them to a GeoTIFF. I can easily create a tiff with "rasterio.open(out, 'w', **src.profile)", but the following does not seem like the best/cleanest way to do this:
Also, if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.
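Propagating the profile and tags through `open_rasterio` could be as simple as storing them under prefixed attribute names, so that a later write can rebuild them without reopening the source file. A minimal stdlib sketch of one possible convention; the `profile_` / `tag_` prefixes and both function names are hypothetical. Values that are not plain numbers or strings (a CRS or `Affine` object) would need converting before being saved as netCDF attributes.

```python
from typing import Any, Dict, Tuple

PROFILE_PREFIX = "profile_"  # hypothetical naming convention
TAG_PREFIX = "tag_"          # hypothetical naming convention


def to_attrs(profile: Dict[str, Any], tags: Dict[str, str]) -> Dict[str, Any]:
    """Flatten a rasterio-style profile and tag dict into one attrs dict."""
    attrs = {PROFILE_PREFIX + k: v for k, v in profile.items()}
    attrs.update({TAG_PREFIX + k: v for k, v in tags.items()})
    return attrs


def from_attrs(attrs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Split attrs back into (profile, tags); unrelated attributes are ignored."""
    profile = {k[len(PROFILE_PREFIX):]: v for k, v in attrs.items()
               if k.startswith(PROFILE_PREFIX)}
    tags = {k[len(TAG_PREFIX):]: v for k, v in attrs.items()
            if k.startswith(TAG_PREFIX)}
    return profile, tags
```

With rasterio this might be used as `da.attrs.update(to_attrs(src.profile, src.tags()))` when reading and `profile, tags = from_attrs(da.attrs)` when writing.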