Unified, Cross-Platform, Open-Source Library Package for High-Performance Computing
The last several years have seen the emergence of a new trend in HPC design: the rise of special-purpose add-on hardware optimized for efficient math computations, such as graphics processing units (GPUs) and compute accelerators. Using this hardware effectively is challenging and requires many new programmer skills. Despite the difficulty of programming it, this hardware provides dramatic performance benefits and is already used in three of the ten fastest supercomputers in the world, so the trend is unlikely to slow down in the near future. Amid this move to hybrid architectures and the growing diversity of hardware options, programmers often reduce the effort of common tasks by using software libraries. Commonly used examples include FFT libraries, BLAS, and LAPACK. The programmer expects optimized and functionally correct versions of these and other routines to be available for any given hardware. Ideally, the use of libraries reduces debugging time, eases porting to alternative or future hardware, and gives excellent performance because the code is tuned by experts in both the hardware and the algorithms. In reality, the library space is fragmented, which shifts a significant burden onto the software developer. For this project, EM Photonics is developing an open source, unified set of fundamental libraries for use on hybrid HPC systems. EM Photonics will provide baseline functionality for common library routines, written in OpenCL, that will be cross-platform and open source. We will also provide a framework that will allow others (researchers, hardware vendors, commercial library providers, etc.) to plug in alternate implementations. In Phase I, we prototyped a BLAS library, demonstrated LAPACK functionality, and created a high-performance matrix-multiplication routine using a robust automatic tuning system. 
We devised a system in which highly compatible kernels can be written without burdening the programmer and then tuned automatically, achieving both code clarity and high performance. In Phase II, we will expand the supported routines to include a commonly used subset of LAPACK, sparse BLAS, and FFT. We will also build out a framework for easier development and performance optimization of generic kernels, as well as facilities for adding device-specific code. Finally, this code will be thoroughly documented and released as open source to encourage community participation in development, as well as widespread use.
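The two ideas described above, a framework into which alternate implementations can be plugged and an automatic tuner that picks the best-performing one, can be illustrated with a minimal sketch. This is not the project's code: the project targets OpenCL kernels, whereas the example below uses pure Python as a stand-in, and every name in it (IMPLEMENTATIONS, register, make_tiled_gemm, autotune) is hypothetical. The tuner is assumed to work by exhaustive timing of each registered candidate, which is only one of several possible strategies.

```python
import time

# Hypothetical registry: routine name -> {implementation name: callable}.
IMPLEMENTATIONS = {}


def register(routine, name):
    """Decorator that plugs an implementation into the registry."""
    def decorator(fn):
        IMPLEMENTATIONS.setdefault(routine, {})[name] = fn
        return fn
    return decorator


@register("gemm", "naive")
def gemm_naive(a, b):
    """Baseline matrix multiply; plays the role of the generic routine."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def make_tiled_gemm(tile):
    """Build a blocked matrix multiply for one candidate tile size."""
    def gemm_tiled(a, b):
        n, k, m = len(a), len(b), len(b[0])
        c = [[0.0] * m for _ in range(n)]
        for i0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                for j0 in range(0, m, tile):
                    for i in range(i0, min(i0 + tile, n)):
                        row_a, row_c = a[i], c[i]
                        for p in range(p0, min(p0 + tile, k)):
                            a_ip, row_b = row_a[p], b[p]
                            for j in range(j0, min(j0 + tile, m)):
                                row_c[j] += a_ip * row_b[j]
        return c
    return gemm_tiled


# Each tile size is a separate tuning candidate (sizes chosen arbitrarily).
for tile in (4, 8, 16):
    register("gemm", f"tiled_{tile}")(make_tiled_gemm(tile))


def _time_once(fn, args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def autotune(routine, *args, repeats=3):
    """Time every registered candidate and return the name of the fastest."""
    best_name, best_time = None, float("inf")
    for name, fn in IMPLEMENTATIONS[routine].items():
        elapsed = min(_time_once(fn, args) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


# Tune once on a representative problem, then reuse the selected routine.
a = [[(i + j) % 7 for j in range(24)] for i in range(24)]
b = [[(i * j) % 5 for j in range(24)] for i in range(24)]
best = autotune("gemm", a, b)
gemm = IMPLEMENTATIONS["gemm"][best]
print("selected:", best)
```

Which candidate wins depends on the machine and the problem size, which is the reason for tuning at run time rather than hard-coding a choice. In this sketch a vendor or researcher would add a device-specific version by registering another callable under the same routine name.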
Small Business Information at Submission:
51 E Main St, Suite 203, Newark, DE 19711-4685
Number of Employees: