E4S: Extreme-Scale Scientific Software Stack for Commercial Clouds

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0022502
Agency Tracking Number: 0000263388
Amount: $250,000.00
Phase: Phase I
Program: SBIR
Solicitation Topic Code: C53-02b
Solicitation Number: N/A
Timeline
Solicitation Year: 2021
Award Year: 2022
Award Start Date (Proposal Award Date): 2022-02-14
Award End Date (Contract End Date): 2023-02-13
Small Business Information
2836 Kincaid Street
Eugene, OR 97405-4156
United States
DUNS: 167172308
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
Name: Nicholas Chaimov
Phone: (503) 869-8513
Email: nchaimov@paratools.com
Business Contact
Name: Sameer Shende
Phone: (541) 913-8797
Email: sameer@paratools.com
Research Institution
N/A
Abstract

The software used in High Performance Computing (HPC) and Artificial Intelligence/Machine Learning (AI/ML) workloads is increasingly complex to maintain, install, and optimize. More problematic is the poor performance portability of applications between platforms, which forces site-specific re-engineering of codes. Existing solutions for deploying AI/ML workflows on commercial cloud environments are platform-specific, preventing migration from one cloud provider to another. This project proposes to address the problem by combining E4S, which provides multi-platform container images, with MVAPICH2, a high-performance, performance-portable MPI library for fast inter- and intra-node communication on AWS and other commercial cloud platforms. Phase I will evaluate the feasibility of this solution and build prototypes for evaluation. We will evaluate the use of MVAPICH2 to provide high-performance deployments of MPI applications on cloud platforms; build high-performance versions of commonly used Deep Learning frameworks for cloud deployment; make use of high-speed network adapters and GPUs within the cloud environments; and evaluate the creation of a web interface for one-click deployment of highly performant Deep Learning applications. The success of our Phase I project will deliver a productive platform for transitioning important HPC applications (many developed in DOE national laboratories) to more accessible cloud-based HPC platforms in a portable manner while retaining high performance. It will benefit practically all scalable HPC applications, from modeling and simulation to AI/ML, for which advanced message communication hardware and access to accelerator technologies are increasingly supported in commercial cloud systems. In particular, data analytics and deep learning are areas of high growth and of benefit to a broad range of industries.
High performance is critical for these codes: a poorly performing code wastes compute resources, keeping purchased hardware from being put to other uses, increasing a business’s costs for cloud computing resources, and lengthening time to solution. This project will especially benefit the deep learning market by making deployment of applications on cloud platforms easier, facilitating portability between cloud platforms while maintaining performance, and reducing training time for deep learning models. Efficient use of pay-per-core-hour resources such as public clouds reduces both costs to users and energy consumption.

* Information listed above is at the time of submission. *
