Non-Linear Machine Learning based Data Reduction Software for High-Performance Computing
Phone: (301) 294-4632
Phone: (301) 294-5221
DOE computing systems generate large amounts of floating-point data, and the volume and velocity of that data pose challenges for storage and, more importantly, for analysis. Data reduction has therefore been explored as a way to shrink data before it is stored. MGARD was recently developed as a new approach to data compression. However, this software tool needs further work before production use, e.g., to add or enhance software quality control, generality, and usability. Moreover, many existing High Performance Computing (HPC) data reduction software tools were designed for domain experts and hence require a high level of expertise to install and run. To address these issues, a tool called MINNION is proposed to harden HPC scientific data reduction software. MINNION will allow simulation data to be refactored so that users can perform exploratory data analysis progressively, adjusting fidelity to their accuracy needs and exploring the trade-off between storage size and data accuracy. This effort will provide the tool and insights needed to achieve application-specific data reduction and to reduce data input complexity. To meet the project objectives, the following will be performed: (1) identify data reduction algorithms and toolsets to integrate, and define system requirements; (2) design and develop the MINNION data reduction toolset; and (3) build a proof-of-concept prototype for demonstration and conduct a performance evaluation. Conversations will be initiated with multiple potential partners, including customers, market leaders, key suppliers, and critical sales and distribution channels, to pursue successful transition and commercialization of the proposed technology. The proposed techniques, tool, and software will greatly reduce the complexity of HPC data reduction for scientific applications, and hence the cost of data storage, data movement, and data analysis.
It will also substantially lower the level of expertise needed to use data reduction algorithms, while improving generality, usability, and scalability. Besides DOE, a wide range of organizations can benefit from the proposed tools, such as DOD and NSF computing centers as well as commercial computing centers. MINNION can be applied to a broad range of HPC centers and large-scale distributed computer systems in industry (such as IT, various science applications, and finance/economics), universities, and government agencies (such as defense and government labs).
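As an illustration of what progressive refactoring with adjustable fidelity can mean, the following is a minimal sketch in plain Python. It is not MINNION's or MGARD's actual algorithm or API: the functions `refactor` and `reconstruct`, the 1D grid, and the linear-interpolation scheme are all assumptions chosen for brevity. The data is split into a coarse grid plus per-level correction terms, and a reader can rebuild it using only as many correction levels as the desired accuracy requires.

```python
import math


def refactor(values, levels):
    """Split a 1D series into a coarse grid plus per-level detail terms.

    Hypothetical helper for illustration only. Assumes that
    len(values) - 1 is divisible by 2 ** levels.
    """
    details = []
    current = list(values)
    for _ in range(levels):
        coarse = current[0::2]  # keep every other sample
        # Detail = dropped sample minus the average of its two kept neighbours.
        detail = [
            current[2 * i + 1] - 0.5 * (coarse[i] + coarse[i + 1])
            for i in range(len(coarse) - 1)
        ]
        details.append(detail)
        current = coarse
    details.reverse()  # store the coarsest corrections first
    return current, details


def reconstruct(coarse, details, use_levels):
    """Rebuild the full-resolution series using only the first
    `use_levels` detail levels; missing details are treated as zero."""
    current = list(coarse)
    for level, detail in enumerate(details):
        fine = [0.0] * (2 * len(current) - 1)
        fine[0::2] = current
        for i in range(len(current) - 1):
            correction = detail[i] if level < use_levels else 0.0
            fine[2 * i + 1] = 0.5 * (current[i] + current[i + 1]) + correction
        current = fine
    return current


# Example: one period of a sine wave sampled at 65 points, three levels.
values = [math.sin(2 * math.pi * i / 64) for i in range(65)]
coarse, details = refactor(values, levels=3)

max_errors = []
for use_levels in range(4):
    approx = reconstruct(coarse, details, use_levels)
    max_errors.append(max(abs(a - b) for a, b in zip(approx, values)))
    print(use_levels, max_errors[-1])
```

In this sketch, reading more detail levels costs more storage and I/O but lowers the reconstruction error, which is the storage-versus-accuracy trade-off the abstract describes. A production tool would add error bounds, quantization, and lossless encoding of the detail terms, none of which are shown here.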
* Information listed above is at the time of submission. *