SBIR Phase I: Virtualization Tolerant, High Performance Computing in Servers, Using Compute Intensive Multicore Accelerators
National Science Foundation
Small Business Information
9617 Wendell Rd, Dallas, TX, 75243-5510
Abstract
The innovation addresses data I/O inefficiency and virtualization incompatibility in High Performance Computing (HPC). Conventional HPC methods are both inefficient and incompatible with the virtualization commonly used in servers, creating a roadblock to the supercomputing server farms needed for practical Artificial Intelligence (AI) applications. GPU (Graphics Processing Unit) and "many integrated x86 cores" (many-x86) accelerators, while offering high performance, generate excessive heat, are physically large, and do not offer direct, high-speed, low-latency I/O. For example, GPU and many-x86 accelerator boards are full-length, double-wide, consume up to 300 W, and do not connect high-speed, low-latency I/O directly between their compute cores and external networks. This project describes research and development based on a novel approach that combines arrays of high-performance, low-heat, compute-intensive multicore CPU accelerators with network I/O connected directly to the cores, programmed through a superset of the popular OpenMP standard for multicore programming. Results will demonstrate the first fundamentally virtualization-tolerant accelerator available on the market: a single 1U server with (a) 2.5 Teraflops of acceleration, (b) 450 W total power consumption, and (c) a virtualization-compatible, OpenMP-based programming model with a high degree of ease of use, suitable for rapid adoption by AI application developers and programmers. The broader/commercial impacts of the innovation include increased HPC efficiency and virtualization compatibility in supercomputing server farms, enabling new AI server-farm applications.
Commercial examples include practical AI applications that require real-time network data I/O, such as speech and face recognition on mobile devices; fast, automated analysis of drone and surveillance video (for example, detecting human behavior and inferring human intent in real time); financial data modeling and trading-network risk checks applied directly at the network edge; and real-time social media data analytics. Breakthroughs in HPC efficiency and virtualization will lead to greater scientific understanding of heterogeneous CPU systems, in this case x86 and ARM server motherboards combined with compute-intensive multicore accelerators. The OpenMP programming model will be enhanced to allow compute-intensive, direct-I/O cores and general-purpose x86 cores to coexist within a unified platform, under a standards-based model. A practical, scalable server acceleration paradigm will be demonstrated with four compute-intensive multicore CPU accelerators inserted in a 1U server, running video analytics and computational finance application examples.
* Information listed above is as of the time of submission.