Extract metadata from NetCDF and HDF5 files as XML in NcML format #9153
Comments
Just a quick update that I have some uncommitted code locally that extracts an NcML file (XML) from a NetCDF file and saves it as an auxiliary file: The code is very hacky and probably doesn't work with S3. I've been chatting with @landreev about the best place to put it. Probably a dedicated method that gets called right after |
Discussion/Notes: Part of what's left is deciding whether supporting direct upload is a goal. When you don't do direct upload, there is a temporary file. Two PRs will come out of this, one for the viewer and one for dataverse. The extraction is hard coded right now. It doesn't use a queue such as JMS (which we use for file ingest). Some installations will care about these formats; some won't. Is this the beginning of a new class of plugins? Size for this sprint was a 33. TODO: Phil to add a sentence or two laying out the scope of this work better than Mike did. |
added to sprint Dec 15, 2022 |
I demo'ed this during Tuesday's community call but here's an updated screenshot. The main difference is that I'm putting NcML in its own category in the UI ("XML from NetCDF/HDF5 (NcML)") instead of "Other": @mreekie captured a lot of the discussion from standup this morning (thanks!). I'll just add a bit about scope:
Here are a couple of screenshots of the poor UX if you enable an NcML preview tool when the NetCDF or HDF5 file exists but hasn't yet been processed (had its NcML extracted) or if the file cannot be processed (such as certain HDF5 files). You think you'll see a preview... but you don't because the NcML file doesn't exist (sad trombone) |
Some quick hacking and thoughts on the idea of extending the external tools framework so that tools can say if they need an aux file. (@landreev also just suggested maybe a custom mimetype like application/netcdf+auxfile or something)
The use case is an external tool that operates on aux files pulled out of NetCDF/HDF5 files.
I fixed this in 9edaf59 by adding a new "requirements" option for external tools. First, let's look at how the eyeball is hidden when the HDF5 file can't be parsed: For the good HDF5, we still see the preview: Here's how I documented the "requirements" option: I'll go ahead and make a pull request so I can get some feedback. I have the other PR on the previewers side to work on so I'll leave this issue assigned to me rather than taking it off the board. |
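To make the idea behind the "requirements" option concrete, here is a minimal sketch of the check it implies: only offer a tool when every aux file it needs already exists for the file. All class and method names below are hypothetical, and the "NcML" format tag and "0.1" version are assumed values for illustration. This is not the code from 9edaf59.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Dataverse code base.
final class AuxFileRequirements {

    /** Identifies an auxiliary file by format tag and version (values here are assumptions). */
    record AuxFileKey(String formatTag, String formatVersion) {}

    /** Returns true only if every aux file the tool requires already exists for the data file. */
    static boolean meetsRequirements(List<AuxFileKey> required, Set<AuxFileKey> existing) {
        return existing.containsAll(required);
    }

    public static void main(String[] args) {
        AuxFileKey ncml = new AuxFileKey("NcML", "0.1"); // assumed tag and version
        List<AuxFileKey> toolNeeds = List.of(ncml);

        // A file whose NcML was extracted: the preview button can be shown.
        System.out.println(meetsRequirements(toolNeeds, Set.of(ncml)));

        // A file that could not be parsed has no aux file: hide the button.
        System.out.println(meetsRequirements(toolNeeds, Set.of()));
    }
}
```

A tool with no requirements passes the check trivially, so existing tools would behave as before under this sketch.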
I just created a pull request to add an NcML previewer: I can't put that PR on our project board because it isn't under IQSS so I'll put this issue in "ready for review". Update: I'm taking this issue off the board (like usual, the main PR will close it). I just created this issue to track (on the board) the previewer PR: |
"Use a version like '4.11.0.1' in the example above where the previously released version was 4.11" -- dev guide That is, these scripts should have been 5.12.1.whatever since the last release was 5.12.1. Fixing. (They were 5.13.whatever.)
Assuming PR #9152 is merged we'll have a library in place to start extracting XML from NetCDF and HDF5 files.
The supported XML format is called NcML and is described here: https://docs.unidata.ucar.edu/netcdf-java/current/userguide/ncml_overview.html
Yesterday there was general agreement among devs that it would be fine to save the XML as a derivative or aux file.
This will open the door for previewing the file as raw XML to start.
Additionally, we could work on creating a dedicated previewer that shows the data in a nicer way than raw XML.
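As a rough illustration of what a nicer view could start from, the sketch below parses an NcML string with the JDK's built-in XML parser and lists the variable names. The class name, method name, and the sample document are made up for this example. The sample is not output from the test files in this issue, and a real previewer would more likely be written in JavaScript.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Dataverse or netcdf-java.
final class NcmlSummary {

    /** Returns the name attribute of every <variable> element, in document order. */
    static List<String> variableNames(String ncml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Uploaded files are untrusted input, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ncml)));
            // "*" matches any namespace, so the NcML namespace URI is not hard coded.
            NodeList nodes = doc.getElementsByTagNameNS("*", "variable");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse NcML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up NcML document for illustration only.
        String sample = """
            <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
              <dimension name="time" length="2"/>
              <variable name="temperature" shape="time" type="float"/>
              <variable name="pressure" shape="time" type="float"/>
            </netcdf>
            """;
        System.out.println(variableNames(sample));
    }
}
```

The same traversal could collect dimensions or attributes by changing the element name passed to `getElementsByTagNameNS`.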
The code we write will look something like this:
Here's the output for an HDF5 file at src/test/resources/hdf/hdf5/vlen_string_dset (from the PR above):

Here's part of the output for a NetCDF file at src/test/resources/netcdf/madis-raob.nc (also from the PR above):

Here's the full XML/NcML output: madis-ncml.xml.txt
2.5 years ago @qqmyers made some suggestions for previewing XML files at IQSS/dataverse.harvard.edu#70 (comment). Here's his comment:
"FWIW: Something like https://www.jqueryscript.net/other/tree-xml-viewer-formatter.html adapted with the wiki instructions at https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers/wiki/How-to-create-a-previewer might be a quick win. (I didn't search too hard for an XML viewer - there could be better libraries out there to start from.)"