Next-Generation Technology for the Extremely Efficient Storage, Distribution, and Processing of Nuclear Physics Data
Phone: (954) 249-3152
Millions of dollars are spent every year generating the vast amounts of data needed to address nuclear physics grand challenge problems at DOE facilities. Due to inherent technical and financial limitations at the “last mile” of the data delivery pipeline, only a small fraction of these expensive petabytes of data is available for immediate access by end users at any given time. This not only makes users’ work inefficient, but also risks wasting significant amounts of information that could well contain the keys to answering grand challenge questions. This project aims to solve this problem by providing a ground-breaking data compression technology able to multiply the capacity of existing (and future) last-mile “live storage” facilities by factors larger than 4x and up to 9x without any additional investment in hardware. The technology, which will be packaged as an open-source set of tools, will be integrated with ROOT/IO and will give users full control through tools that are easy to understand, use, modify, and adapt according to the specific needs and technical sophistication of each user. The data compression technology is based on a novel and very rich theory of application-driven lossless compression that systematically analyzes the data in search of hidden redundancies that are not easily detected by traditional general-purpose compression. In addition, the theory provides fundamental mechanisms for achieving unprecedented lossless compression factors from a thorough understanding of the peculiar, and rich, interdependencies among the precisions of the data. This theory is complemented by very efficient and portable software implementations that we have developed over the last five years and that have already made a paradigm-shifting impact on programs at NASA and DoD.
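To make the idea of "hidden redundancies" in floating-point data concrete, here is a minimal sketch of one well-known, generic preprocessing step: byte shuffling, as used for example by the HDF5 shuffle filter. It is not the project's algorithm, which the abstract does not describe. The function names are hypothetical, and 32-bit little-endian floats are assumed. The sketch groups the bytes of every value by position, so that slowly varying sign and exponent bytes sit next to each other before a general-purpose compressor (zlib) sees them, and it restores the input bit for bit.

```python
import math
import struct
import zlib


def shuffle_bytes(raw: bytes, width: int = 4) -> bytes:
    """Group byte 0 of every value, then byte 1, and so on."""
    return b"".join(raw[i::width] for i in range(width))


def unshuffle_bytes(shuffled: bytes, width: int = 4) -> bytes:
    """Invert shuffle_bytes, restoring the original byte order."""
    count = len(shuffled) // width
    out = bytearray(len(shuffled))
    for i in range(width):
        out[i::width] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


def compress_float32(raw: bytes) -> bytes:
    """Shuffle packed float32 values (assumed layout), then deflate."""
    return zlib.compress(shuffle_bytes(raw), 9)


def decompress_float32(blob: bytes) -> bytes:
    """Inflate, then undo the shuffle; the result is bit-identical."""
    return unshuffle_bytes(zlib.decompress(blob))


if __name__ == "__main__":
    # Synthetic, smoothly varying signal; purely illustrative data.
    values = [100.0 * math.sin(0.001 * k) for k in range(10000)]
    raw = struct.pack(f"<{len(values)}f", *values)

    plain = zlib.compress(raw, 9)
    shuffled = compress_float32(raw)

    assert decompress_float32(shuffled) == raw  # lossless round trip
    print("raw bytes:      ", len(raw))
    print("zlib only:      ", len(plain))
    print("shuffle + zlib: ", len(shuffled))
```

How much the shuffle helps depends entirely on the data, so the script prints the sizes rather than promising a ratio. The factors quoted in this abstract come from the project's own application-driven methods, not from this sketch.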
Already successful for network-data and in-memory compression, the proposed technology will be applied in this project to nuclear physics I/O and storage. In Phase-I we surpassed our initial 2x compression target and successfully demonstrated factors as high as 3.9x on real-life datasets produced at the STAR experiment. The Phase-I compression factors are close to 3 times higher than what can be attained with the software available to the High Energy and Nuclear Physics (HENP) community today. Furthermore, the compression solution does not compromise the physics, as it is designed to maintain numerical integrity. In Phase-II we will capitalize on the substantial success delivered by Phase-I and further develop the technology to provide a robust software solution that could be used by other experiments around the world as a means of improving data management and accessibility. With the integration of additional compression optimizations conceived in Phase-I, the Phase-II technology is expected to deliver 4x-9x compression. Once brought to life, the proposed technology will seamlessly boost the capacity of live storage facilities, which will in turn accelerate the pace of discovery in nuclear physics by enabling immediate access to significantly larger collections of data. The nuclear physics community would benefit from this technology in a very short time thanks to an infusion strategy that targets direct integration into the ROOT/IO framework.

Commercial Applications and Other Benefits: IDC estimates that the amount of data that exists in the world doubles every 2 years. Even with declining storage costs, this enormous growth of data makes storage one of the biggest cost elements in markets beyond nuclear and high energy physics, such as cloud computing, aerospace design, oil & gas exploration, and weather modeling, among many others.
Data compression has emerged as a tool for seamlessly and inexpensively tackling the increasing demand for additional storage space. Prominent players from Oracle to Facebook to Google have already embraced data compression, contributing to a data compression software market that is expected to grow to approximately $864 million by 2023. The proposed technology is expected to positively disrupt this market with a specialized offering for large-scale applications that rely on floating-point data, precisely the type of information that our technology targets and that traditional data compression solutions have not addressed successfully.
* Information listed above is at the time of submission. *