
HDF5 2.0 Planning

Dana Robinson edited this page Oct 30, 2024 · 11 revisions

These are the topics and issues we plan to focus on in the next version of HDF5. For full transparency, we will try to track publicly visible features (i.e., not refactoring) in GitHub so that everyone can see how the release is progressing. The HDF5 2.0 issues we plan to address are organized into high-level parent issues (we're in the beta for this GitHub feature), with the specific issues as children. The parent issues are easy to spot because their titles start with "2.0:", and this document links to them.

NOTE: This list is ambitious, so realistically not everything here will make it into the release.

THIS IS A DRAFT

We should be done with our planning by early November (for a March 2025 release).

Major 2.0.0 changes

The biggest changes/features we'll be making to the library in HDF5 2.0 are:

Move to semantic versioning (https://semver.org/)

People have been asking for this for a long time and we get many complaints about our existing scheme. All future HDF5 versions will be major.minor.patch.
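
As a rough illustration of what major.minor.patch means for downstream code, here is a minimal sketch of parsing and comparing such version strings. The names `Version`, `parse_version`, and `is_compatible` are hypothetical helpers for this example, not part of HDF5, and the compatibility rule shown is the general semver convention (same major version, equal or newer minor/patch) rather than a statement of HDF5's own policy.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A major.minor.patch version (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.patch' string into a Version tuple."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def is_compatible(required: Version, available: Version) -> bool:
    """General semver rule: same major version, and available is not older."""
    return available.major == required.major and available >= required
```

Because `Version` is a tuple, ordinary comparison operators order versions field by field, so `parse_version("2.1.0") > parse_version("2.0.3")` holds without extra code.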

Update library defaults to provide better performance with cloud-optimized HDF5 and modern I/O hardware

We'll have more information about this in the winter, but we would like to revisit all the library defaults (cache sizes, etc.), do a few rounds of performance testing, and see if they still make sense in 2025.
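
To make the cache-size question concrete, the following back-of-the-envelope sketch computes how many chunks of a given shape fit in a chunk cache of a given size. It assumes the 1 MiB per-dataset chunk cache that, to our understanding, is the default in the 1.x series; the helper names are hypothetical, and the chunk shapes are made-up examples rather than measurements.

```python
from math import prod

# Assumption: 1 MiB per-dataset chunk cache, as in the HDF5 1.x defaults.
ASSUMED_CHUNK_CACHE_BYTES = 1024 * 1024


def chunk_bytes(chunk_shape, itemsize):
    """Size in bytes of one uncompressed chunk."""
    return prod(chunk_shape) * itemsize


def chunks_that_fit(cache_bytes, chunk_shape, itemsize):
    """Number of whole chunks that fit in a cache of cache_bytes."""
    return cache_bytes // chunk_bytes(chunk_shape, itemsize)
```

Under these assumptions, a 256 x 256 chunk of 8-byte values occupies 512 KiB, so only two such chunks fit in the cache, and a 1024 x 1024 chunk of 4-byte values does not fit at all. This kind of arithmetic is one reason to re-examine the defaults against current chunk sizes.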

Complex number support

We've been wanting to expand the type system for a long time. We added IEEE float16 support in the last release and now we'd like to add complex numbers.
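
This plan does not say how complex values will be laid out. For context, a common convention today (used by h5py, for example) is to store each complex value as a pair of floats, real part first. The sketch below shows that interleaved layout using only the Python standard library; `pack_complex` and `unpack_complex` are hypothetical names, and the little-endian float64 choice is an assumption made for the example.

```python
import struct


def pack_complex(values):
    """Pack complex numbers as little-endian float64 (real, imag) pairs."""
    flat = []
    for z in values:
        flat.extend((z.real, z.imag))
    return struct.pack(f"<{len(flat)}d", *flat)


def unpack_complex(buffer):
    """Inverse of pack_complex: rebuild complex numbers from the byte buffer."""
    count = len(buffer) // 8  # number of float64 values in the buffer
    flat = struct.unpack(f"<{count}d", buffer)
    return [complex(re, im) for re, im in zip(flat[0::2], flat[1::2])]
```

Each value takes 16 bytes in this layout, and a round trip through `pack_complex` and `unpack_complex` returns the original values.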

Drop Autotools support

Maintaining two build systems is unnecessary overhead. Keeping the two systems in sync, and reinventing the wheel whenever we need more complicated testing, takes non-trivial engineering resources. Starting with the 2.0.0 release, all Autotools files will be removed and Autotools builds will no longer be supported.

Remove the C++ wrappers

These have been (mostly) unmaintained for some time and we have neither the plans nor the resources to bring them up to speed with modern C++.

Remove the HDF5 <--> GIF tools

These tools have unfixed CVE issues, are not actively maintained, and are an odd fit for the library.

Other 2.0.0 changes

The heading for each of the following topics points to the GitHub parent issue for all the specific child issues.

CMake

Since we're dropping Autotools support, it's imperative that CMake work well. We'll make a pass over both build systems to ensure that CMake does everything the Autotools do, simplify the build system code, revamp the install docs, and work to fix all the open GitHub issues.

CI

Over the past few years, we've dramatically expanded our CI, and we'll continue to do that for HDF5 2.0.0. We now report to my.dash, where you can see the output of our GitHub CI under the GHDaily heading, as well as test results from many HPC systems. Improvements over the next few months will include testing our develop branch against the development trunks of both OpenMPI and MPICH, testing HDF5 with the HighFive C++ wrapper, and adding missing configurations.

HPC

HDF5 is heavily used on HPC systems, so we'll continue to fix bugs as they arise.

Compression

There are several bugs in the CMake code that deals with building the compression filters that we'd like to fix. We'll also be improving the hdf5_plugins repo.

Crash bugs

Library behavior bugs

Cloud-optimized HDF5

Floating-point

Windows

Documentation

Misc

Bugfixes and general quality

  • Fix all oss-fuzz and other segfault-related bugs (there are several in Jira) and harden surrounding code
  • Make the library clean under -fsanitize=memory/address/undefined and add those sanitizer builds to CI
  • Fix file image bugs and review code (#1915)

Cloud-Optimized HDF5

  • Review and possibly refactor ros3 VFD code
  • Performance testing

Support for new datatypes

  • Complex numbers
  • ML types (FP8, FP4)
  • Fix outstanding long double issues (mainly POWER)

Configure issues

  • Autotools pkg-config support
  • CMake support for MinGW + MSYS2

Performance

  • Additional subfiling performance profiling and fixing

Windows as a first-class citizen

  • Win32 VFD
  • Unicode/code page file name tests
  • H5Dwrite performance issue
  • Core VFD slowdown

CI/CD/Testing

  • Determine a final version naming/numbering scheme that works well with package management systems
  • Clean up CDash and get the public CDash all green
  • Schedule a pre-release and code freeze
  • Fix macOS signed binary issues
  • Develop a manual test process
  • Move VOL tool testing to main repo

Documentation

  • All “developer-level” public headers have full Doxygen (VOL, VFD, etc.)
  • Develop a list of user guide improvements that need to be made

Misc

  • Close PR #1387 (minor datatype optimization)
  • Go over plugin, etc. paths and make sure that everything is handled well and uniformly

Tools

  • H5L copy hack in h5repack

Fortran wrappers

  • Make sure cross-compiling with Fortran works well

  • https://github.com/HDFGroup/hdf5/issues/5038
  • https://github.com/HDFGroup/hdf5/issues/5039
  • https://github.com/HDFGroup/hdf5/issues/5040
  • https://github.com/HDFGroup/hdf5/issues/5041
  • https://github.com/HDFGroup/hdf5/issues/5042
  • https://github.com/HDFGroup/hdf5/issues/5043
  • https://github.com/HDFGroup/hdf5/issues/5044
  • https://github.com/HDFGroup/hdf5/issues/5045
  • https://github.com/HDFGroup/hdf5/issues/5046
  • https://github.com/HDFGroup/hdf5/issues/5047
  • https://github.com/HDFGroup/hdf5/issues/5048
  • https://github.com/HDFGroup/hdf5/issues/5049
