Open|SpeedShop Ease of Use Performance Analysis for Heterogeneous Processor Systems

Award Information
Agency: National Aeronautics and Space Administration
Branch: N/A
Contract: NNX14CA29P
Agency Tracking Number: 144669
Amount: $124,974.00
Phase: Phase I
Program: SBIR
Solicitation Topic Code: S5.01
Solicitation Number: N/A
Solicitation Year: 2014
Award Year: 2014
Award Start Date (Proposal Award Date): 2014-06-20
Award End Date (Contract End Date): 2014-12-19
Small Business Information
999 Windcroft Pl.
Annapolis, MD 21401-6578
United States
DUNS: 964379965
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 James Galarowicz
 Senior Computer Scientist
 (612) 644-3303
Business Contact
 Tom Brennan
Title: Business Official
Phone: (515) 598-2722
Research Institution

We propose building upon the modular, extensible architecture and existing capabilities of Open|SpeedShop to provide seamless, integrated performance analysis for heterogeneous processor systems. NVIDIA GPU and Intel Many Integrated Core (MIC) processors are increasingly important at high-performance computing (HPC) laboratories within NASA for use on NASA's high-end computing (HEC) projects because of their ability to accelerate scientific application performance. To understand what impact these accelerators are having on performance, tools must succinctly present heterogeneous processor performance information.

One of the key goals of this work is to develop innovative methods for presenting the performance information extracted from applications running on both traditional CPU and GPU/MIC processors. Specifically, we aim to provide command line interface (CLI) and graphical user interface (GUI) displays of the heterogeneous processor performance information that help users understand how their application utilizes the accelerator.

For this project, Phase I GPU-related research will include measuring the usefulness of a GPU kernel, including device utilization, data transfer rates, device efficiency metrics, internal device tiling factors, and device memory hierarchy usage, as well as additional factors outlined in the referenced material. When an application runs concurrently on both CPU and GPU processors, attributing time spent to the proper processor can be difficult, and the difficulty is exacerbated when multiple accelerators are in use at the same time. Measuring the interactions of multiple accelerator devices accurately is necessary to report the application's performance to the user correctly, but is difficult with the current set of accelerator interfaces. Developing techniques for mitigating these limitations will be another area of our GPU research.
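
To make the attribution problem concrete, the following is a minimal sketch in Python, not part of Open|SpeedShop and not the method proposed here. It assumes that busy periods for the CPU and for each accelerator have already been collected as (start, end) timestamps in seconds. All function names and the sample timings are hypothetical and chosen only for illustration. The sketch merges overlapping intervals so that concurrent kernels on one device are not double-counted, then derives per-device utilization, the time two processors were busy simultaneously, and a data transfer rate.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; hypothetical trace format


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping intervals so concurrent work is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def busy_time(intervals: List[Interval]) -> float:
    """Total time during which the device was doing any work."""
    return sum(end - start for start, end in merge(intervals))


def overlap_time(a: List[Interval], b: List[Interval]) -> float:
    """Time during which both processors were busy at once."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def utilization(intervals: List[Interval], wall_seconds: float) -> float:
    """Fraction of wall-clock time the device was busy."""
    return busy_time(intervals) / wall_seconds


def transfer_rate(bytes_moved: float, seconds: float) -> float:
    """Average host/device transfer rate in bytes per second."""
    return bytes_moved / seconds


# Illustrative (made-up) timings for one CPU and two accelerators.
wall = 10.0
cpu = [(0.0, 4.0), (6.0, 10.0)]
gpu0 = [(1.0, 3.0), (2.0, 5.0)]  # two overlapping kernels on the same device
gpu1 = [(4.0, 7.0)]

for name, trace in (("cpu", cpu), ("gpu0", gpu0), ("gpu1", gpu1)):
    print(name, "utilization:", utilization(trace, wall))
print("cpu/gpu0 overlap (s):", overlap_time(cpu, gpu0))
print("gpu0/gpu1 overlap (s):", overlap_time(gpu0, gpu1))
```

Summing raw kernel durations for gpu0 would give 5 seconds of work, while the merged busy time is 4 seconds. That gap is one small instance of why time attribution needs care when work runs concurrently, and the overlap figures show how much time would be counted twice if CPU and accelerator times were simply added together.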

* Information listed above is at the time of submission. *
