E4S: Extreme-Scale Scientific Software Stack for Commercial Clouds

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0022502
Agency Tracking Number: 0000271168
Amount: $1,600,000.00
Phase: Phase II
Program: SBIR
Solicitation Topic Code: C53-02b
Solicitation Number: N/A
Timeline
Solicitation Year: 2023
Award Year: 2023
Award Start Date (Proposal Award Date): 2023-04-03
Award End Date (Contract End Date): 2025-04-02
Small Business Information
2836 Kincaid Street
Eugene, OR 97405-4156
United States
DUNS: 167172308
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Nicholas Chaimov
Phone: (503) 869-8513
Email: nchaimov@paratools.com
Business Contact
 Allen Malony
Phone: (541) 913-8797
Email: malony@paratools.com
Research Institution
N/A
Abstract

C53-02b-271168
The software used in High Performance Computing (HPC) and Artificial Intelligence/Machine Learning (AI/ML) workloads is increasingly complex to maintain, install, and optimize. More problematic is the poor performance portability of applications between platforms, which forces site-specific re-engineering of codes. Existing solutions for deploying AI/ML workflows on commercial cloud environments are platform-specific, preventing migration from one cloud provider to another. This project proposes to address the problem by combining an integrated stack of HPC software, which provides multi-platform container images, with a highly performant and performance-portable Message Passing Interface (MPI) library for fast inter- and intra-node communication on four cloud platforms, made available through a platform-neutral interface. Phase I evaluated the feasibility of this solution and built prototypes for evaluation. We made improvements to an MPI implementation to support high-performance deployments of MPI applications on cloud platforms; built high-performance versions of commonly used Deep Learning frameworks for cloud deployment; made use of high-speed network adapters and accelerators within the cloud environments; and evaluated the use of a platform-neutral web interface for deployment on a cloud vendor. In Phase II, the proof-of-concept developed in Phase I will be ported to three additional cloud platforms, and the high-performance MPI implementation will be tuned for each of these platforms. All four supported platforms will be made available through a single platform-neutral interface. The ability to use the Phase II product to easily port an application from traditional HPC clusters to all four cloud platforms will be demonstrated.
If successful, the project will deliver a productive platform for transitioning important HPC applications (many developed in National Laboratories) to more accessible cloud-based HPC platforms in a portable manner while retaining high performance. It will benefit practically all scalable HPC applications, ranging from modeling and simulation to AI/ML, where advanced message communication hardware and access to accelerator technologies are increasingly supported in commercial cloud systems. In particular, data analytics and deep learning are areas of high growth and of benefit to a broad range of industries. High performance is critical for these codes: a poorly performing code wastes compute resources, prevents purchased hardware from being used for other purposes, increases a business’s costs for cloud computing resources, and increases time to solution.

* Information listed above is at the time of submission. *