Akadio proposes to develop H5 Hermes, an enhancement to HDF5 that enables high-efficiency I/O for HDF5 applications that perform many related reads and writes across large data structures. Such I/O operations are often a serious bottleneck for large-scale simulations and significantly degrade overall performance.
There are existing HDF5-based solutions similar to what we propose here. However, these solutions are one-off implementations, implementations for specific computing environments, or both, and thus lack the broad applicability that a general HDF5 solution would have. With its proven record for scalability, HDF5 is the data store of choice for many of the most important applications at NASA, the DOE, and other high-end computing (HEC) facilities. Thus, an HDF5-based solution would become available immediately, with minimal modification, to many existing applications, and would be extremely easy to adapt in support of future applications.
H5 Hermes directly addresses the objectives of the solicitation. It will accelerate the integration of current and future high-end computing systems and data stores by enabling I/O processing to keep up with the demands of increasingly high-resolution simulations on massively parallel systems. Thus, H5 Hermes helps achieve four of the five objectives of solicitation S5.01, namely to:
H5 Hermes will also improve the ability to efficiently retrieve many pieces of related data from very large cloud-based data stores. This is another NASA priority, though not part of the S5.01 solicitation.
A wide range of NASA applications use HDF5 or a format based on HDF5, such as netCDF4 or CGNS. In earth science, the GMAO and NCCS use HDF5 or netCDF4 for HEC simulations. In flight-dynamics simulations, NASA uses HDF5, often on high-end systems requiring high-speed parallel I/O. Many astrophysics and astronomy codes, such as ENZO and FLASH, use HDF5 for simulation, modeling, and analysis on HEC systems. The Astrophysics Source Code Library lists several codes that use HDF5 and run on HEC systems.
Climate and weather modeling codes, such as the Ocean-Land-Atmosphere Model, use HDF5 or netCDF4. Astrophysics codes (e.g., ENZO and FLASH) use HDF5 for simulation and push I/O limits on HEC systems. All major aerospace companies, the finance industry, the oil and gas industry, and many others use HDF5 in simulations and other analyses on HEC systems. HDF5 is the most-used format for applications in the DOE laboratories, largely because it performs so well on the world’s biggest and fastest computers.