Propagating all dataset metadata #72
From @stuckyb: we could probably implement format-specific, generic metadata capture methods. For example, NetCDF files read through xarray can carry arbitrary attributes on each variable, and we could capture those as generic metadata. Unit tests would check that generic metadata capture works as expected.
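A minimal sketch of the generic capture idea, assuming the reader hands back an xarray-like object (global attributes in `.attrs`, variables in `.data_vars`, each with its own `.attrs`). The function name `capture_generic_metadata` and the nested `global`/`variables` layout are hypothetical, not part of the project. The test uses stand-in objects that mimic an `xarray.Dataset` so it runs without xarray installed.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def capture_generic_metadata(ds: Any) -> dict[str, Any]:
    """Copy global and per-variable attributes into plain, nested dicts.

    Assumes an xarray-like object: ``ds.attrs`` is a mapping of global
    attributes, and ``ds.data_vars`` maps variable names to objects that
    each expose their own ``.attrs`` mapping (as xarray.Dataset does).
    """
    return {
        # dict(...) copies, so later edits to the source don't leak in.
        "global": dict(ds.attrs),
        "variables": {
            name: dict(var.attrs) for name, var in ds.data_vars.items()
        },
    }


def test_capture_generic_metadata() -> None:
    # Hypothetical stand-in for a dataset opened with xarray.
    tas = SimpleNamespace(attrs={"units": "K", "long_name": "air temperature"})
    ds = SimpleNamespace(attrs={"title": "demo"}, data_vars={"tas": tas})

    md = capture_generic_metadata(ds)

    assert md["global"] == {"title": "demo"}
    assert md["variables"]["tas"] == {"units": "K", "long_name": "air temperature"}
    # The captured record is a copy, not a view of the source attributes.
    tas.attrs["units"] = "degC"
    assert md["variables"]["tas"]["units"] == "K"


if __name__ == "__main__":
    test_capture_generic_metadata()
```

With xarray installed, the same function could be applied to the result of `xarray.open_dataset(path)`; the pytest-style test function illustrates the kind of unit test the comment describes.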
Notes on
@MelanieVeron-USDA Could you help with this issue? We currently support the netCDF and GeoTIFF formats for rasters. Could you check which metadata entries each format can hold? Then we will see which are automatically read in by
For both file formats, there does not appear to be a set limit on how many global attributes a file can hold, or of what kind (metadata such as data set title, units, data description, URLs to the data source and to metadata/informational websites, miscellaneous comments, other custom metadata tags, etc.). That said, sources/file creators generally follow metadata standards (linked below), though the method/package used to add the metadata can vary.

For GeoTIFF files: Data Curation Network - GeoTIFF Primer - This primer gives a comprehensive summary of the GeoTIFF format and the kinds of metadata it can contain, including minimum/recommended metadata elements. Links to recommended metadata standards (ISO 19139, FGDC, OGC (Open Geospatial Consortium) GeoTIFF Standard v1.1, GeoBlacklight 1.0) are also provided. The packages rioxarray, xarray, and gdal (from the osgeo package) can be used to load GeoTIFF files and explore their metadata in Python. How closely the metadata shown (which depends on the package used to load the data) matches the metadata included in the file (as defined by its source/author) varies. The general pattern is that rioxarray retains/shows more attributes (metadata) than xarray, and gdal includes extra gdal-specific metadata that both rioxarray and xarray miss (at minimum, the AREA_OR_POINT tag, the meaning of which is briefly explained here). That said, gdal may omit certain metadata that rioxarray includes, so the best practice may be to use both rioxarray and gdal (but not xarray) to extract as much metadata as possible.

For NetCDF files: NetCDF Climate and Forecast (CF) Metadata Conventions - This page provides metadata standards for the NetCDF format. The packages rioxarray, xarray, gdal (from the osgeo package), and netCDF4 can be used to load NetCDF files and explore their metadata in Python.
As with GeoTIFF files, how closely the metadata shown (which depends on the package used to load the data) matches the metadata included in the file (as defined by its source/author) varies. Some NetCDF files fail to open with rioxarray (for unknown reasons) but do open with xarray. When a file can be opened with both rioxarray and xarray, xarray is the one that fully includes data set- and variable-level attributes/metadata (excluding gdal-specific metadata). The gdal and netCDF4 packages include all metadata from the file, including gdal-specific metadata. For extracting metadata in Python, it may be best to use xarray in conjunction with either gdal or netCDF4.

Link to GeoTIFF/NetCDF file metadata exploration Python scripts & HTML outputs (data files not included due to size limit): scripts.zip
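The advice to combine readers (rioxarray with gdal for GeoTIFF, xarray with gdal or netCDF4 for NetCDF), plus the observation that some files fail to open with one package, suggests collecting attributes from several readers and tolerating failures. The sketch below assumes each reader is a hypothetical callable that takes a file path and returns an attribute mapping; `collect_metadata` and the output layout are illustrative names, not project code. Each key records which readers reported it, so disagreements between packages stay visible instead of being silently overwritten.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping

# Hypothetical reader type: path in, attribute mapping out.
Reader = Callable[[str], Mapping[str, Any]]


def collect_metadata(path: str, readers: Mapping[str, Reader]) -> dict[str, Any]:
    """Merge attribute mappings returned by several readers for one file.

    Returns {"attributes": {key: {reader_name: value}}, "failed_readers":
    {reader_name: error message}}. A reader that raises is skipped, so
    one package failing to open a file does not block the others.
    """
    merged: dict[str, dict[str, Any]] = {}
    failures: dict[str, str] = {}
    for reader_name, read in readers.items():
        try:
            attrs = read(path)
        except Exception as exc:  # e.g. a NetCDF file one package cannot open
            failures[reader_name] = f"{type(exc).__name__}: {exc}"
            continue
        for key, value in attrs.items():
            # Keep every reader's value under the same key.
            merged.setdefault(key, {})[reader_name] = value
    return {"attributes": merged, "failed_readers": failures}
```

In practice, a reader might wrap `xarray.open_dataset(path).attrs` or `gdal.Open(path).GetMetadata()` from `osgeo`; a GDAL-based reader should raise explicitly if `gdal.Open` returns `None`, so the failure is recorded rather than surfacing later as an attribute error on `None`.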
To include in metadata output when the content doesn't fit elsewhere (per #72)
I added the first component in 888d8d7. For the second component, it is less straightforward:
This may be addressed in the implementation of #34.
Expand metadata about datasets to include: