Intelligent, real-time migration of scientific computing applications on commercial cloud-based HPC platforms

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0021862
Agency Tracking Number: 0000268076
Amount: $1,149,868.00
Phase: Phase II
Program: SBIR
Solicitation Topic Code: C52-04a
Solicitation Number: N/A
Timeline
Solicitation Year: 2022
Award Year: 2022
Award Start Date (Proposal Award Date): 2022-08-22
Award End Date (Contract End Date): 2024-08-21
Small Business Information
350 Duffield Hall Suite N
Ithaca, NY 14853-1719
United States
DUNS: 081207293
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
Name: Hakim Weatherspoon
Phone: (607) 229-3395
Email: hweather@exotanium.io
Business Contact
Name: Hakim Weatherspoon
Phone: (607) 257-4706
Email: hweather@exotanium.io
Research Institution
N/A
Abstract

Cloud computing has the potential to serve as a cost-effective and energy-efficient computing paradigm for scientists. Extensive use of commercial cloud resources in the scientific community could lower costs, accelerate research, and enhance collaboration. However, cloud utilization is often suboptimal: users typically overprovision to accommodate potential surges in server use and to ensure that stateful applications, which cannot tolerate any downtime, are not interrupted.
To reduce wasteful spending and enable more efficient use of cloud resources, a technology is being developed that consolidates idle workloads and oversized software containers to take advantage of deeply discounted server capacity such as the Spot Market. The technology combines two modules. The first spawns containers on discounted VM instances (Spot Instances) and dynamically relocates containers between such instances based on availability and price. The second packs idle containers onto a small number of VMs during idle periods and relocates them onto different VMs when workload increases, without any service interruption. This absence of service disruption is a fundamental departure from current market solutions that offer “cloud optimization” but require manually re-architecting cloud infrastructure, with significant downtime during testing and redeployment.
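The consolidation step described above can be pictured as a bin-packing problem. The abstract does not say which placement algorithm the product uses, so the following is only a minimal sketch that assumes a first-fit-decreasing heuristic; the `VM` class, the `pack_idle_containers` function, and the load figures are all hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical VM with a fixed capacity (e.g. percent of CPU)."""
    capacity: int
    used: int = 0
    containers: list[str] = field(default_factory=list)


def pack_idle_containers(loads: dict[str, int], vm_capacity: int) -> list[VM]:
    """Place containers on as few VMs as the first-fit-decreasing heuristic finds.

    This is an assumed heuristic, not the method described in the award.
    """
    vms: list[VM] = []
    # Consider the largest containers first, then reuse the first VM with room.
    for name, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > vm_capacity:
            raise ValueError(f"container {name!r} does not fit on any VM")
        for vm in vms:
            if vm.used + load <= vm.capacity:
                vm.containers.append(name)
                vm.used += load
                break
        else:
            # No existing VM has room, so start a new one.
            vms.append(VM(capacity=vm_capacity, used=load, containers=[name]))
    return vms


# Illustrative (made-up) idle loads, in percent of one VM's CPU.
idle_loads = {"a": 50, "b": 40, "c": 30, "d": 20}
placement = pack_idle_containers(idle_loads, vm_capacity=100)
for i, vm in enumerate(placement):
    print(i, vm.containers, vm.used)
```

In this toy case four containers that might otherwise occupy four VMs fit on two. A real system would also need the live-migration mechanism the abstract describes to move containers back out when their load rises, which this sketch does not model.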
In Phase I, live migration of government High-Performance Computing (HPC) workloads within a single public cloud was demonstrated. Measured savings were up to 80% compared with on-demand costs at the same performance (i.e., up to 5x the amount of compute for the same cost). The ability to perform similar migrations with similar value in other public clouds is required to address substantial commercial opportunities and DOE user needs. In addition, demonstrating a successful hybrid cloud live migration of workloads between an on-premises private cloud and the public cloud could lead to significant cost savings without changing a line of application code, presenting a potential approach for migrating to the cloud in an inexpensive and low-risk manner. Finally, extending the platform to support GPUs could allow a potentially large number of highly compute-intensive DOE applications to run in the spot market of multiple public GovClouds at significant cost savings. During the Phase II award, multi-public cloud support will be developed, hybrid cloud support established, and capabilities extended to GPU processing.
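The link between the Phase I figures above, 80% savings and 5x the compute for the same cost, is simple arithmetic: if each unit of compute costs a fraction (1 - s) of the on-demand price, a fixed budget buys 1 / (1 - s) times as much. A minimal sketch follows; the function name `compute_multiplier` is hypothetical.

```python
def compute_multiplier(savings_fraction: float) -> float:
    """Return how many times more compute a fixed budget buys at a given discount."""
    if not 0 <= savings_fraction < 1:
        raise ValueError("savings_fraction must be in [0, 1)")
    return 1.0 / (1.0 - savings_fraction)


# 80% savings relative to on-demand pricing, as reported for Phase I.
print(round(compute_multiplier(0.80), 2))
```

Because the reported savings are "up to" 80%, the 5x figure is likewise an upper bound; at 50% savings, for example, the same budget buys only twice the compute.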

* Information listed above is at the time of submission. *
