SpeedShop Ease of use Performance Analysis for Heterogeneous Processor Systems
Small Business Information
Argo Navis Technologies, LLC
999 Windcroft Pl, Annapolis, MD, 21401-6578
Abstract
We propose to build on the modular, extensible architecture and existing capabilities of Open|SpeedShop to examine the feasibility of providing seamless, integrated performance analysis support for heterogeneous processors, focusing on GPUs. Accelerators such as GPUs are becoming increasingly important at HPC laboratories within the DOE. However, the inability of tools to succinctly consolidate heterogeneous processor performance information makes it difficult to get an accurate understanding of the impact the accelerator is having on the performance of the user's application. The proposed innovations have two goals. The first is to detect source code snippets in the user's application that are candidates for conversion into accelerator kernels. Open|SpeedShop can be used to detect potential kernels now, but with this funding we would improve the usability of that functionality. The second is to research the best methods for extracting information about the performance of the accelerator kernel. Existing tool support, at this time, mainly focuses on measuring the rate at which data is transferred into the kernel and the time between kernel entry and exit. Our research would include these types of metrics, but would additionally give the user more information by examining the trade-offs in application performance between the cost of the data transfer and the cost of executing the kernel. Research will include NVIDIA GPUs, as well as Intel's Many Integrated Core (MIC) architecture with Knights Ferry and, in the future, Knights Corner. Our research will also include other accelerator-type processors that are deemed essential by the DOE personnel with whom we will consult. Dr. Richard L Graham, ORNL, has indicated that the proposed innovations described below are very much of interest to ORNL and has provided a letter of support, which is included in the proposal submission.
Beyond measuring the performance of the accelerator device itself, there are measurement complications: it is difficult to attribute time accurately to the proper processing unit when CPU and GPU processing overlap, and when multiple accelerators are employed. These accelerator performance analysis issues, and others described in the proposal, are what the proposed innovation research and subsequent commercialization are aimed at. In addition, the proposed innovations focus on usability and ease of use by integrating the heterogeneous performance analysis results into concise, integrated views. Performance tools are available to assist the application developer in identifying such issues; however, these tools are typically quite complicated and require a certain level of expertise and training to use. To meet the challenges of analyzing applications running on heterogeneous systems and to improve ease of use, our proposed innovations will provide a number of capabilities: the ability to detect sections of source code in the user's application that are candidates for accelerator kernels; integrated performance analysis for both CPU and GPU (or other accelerator types), with the ability to analyze the performance of the GPU kernels internally; and Open|SpeedShop experiments designed to analyze the trade-offs between CPU and GPU usage.
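The trade-off between data transfer cost and kernel execution cost mentioned above can be illustrated with a minimal sketch. It assumes a simple additive cost model in which offloading pays off only when kernel time plus unhidden transfer time is less than the CPU time for the same region. The class `OffloadEstimate` and its fields are hypothetical names chosen for illustration; they are not part of Open|SpeedShop or any vendor API, and the timings would have to come from actual measurements.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    """Hypothetical container for measured timings of one code region."""
    cpu_seconds: float             # time for the region when run on the CPU
    transfer_seconds: float        # host <-> accelerator data transfer time
    kernel_seconds: float          # time spent executing the accelerator kernel
    overlap_seconds: float = 0.0   # transfer time hidden behind other work

    def accelerator_seconds(self) -> float:
        # Assumed cost model: kernel time plus the part of the
        # transfer that is not overlapped with other processing.
        hidden = min(self.overlap_seconds, self.transfer_seconds)
        return self.kernel_seconds + self.transfer_seconds - hidden

    def speedup(self) -> float:
        total = self.accelerator_seconds()
        if total <= 0:
            raise ValueError("accelerator time must be positive")
        return self.cpu_seconds / total

    def worth_offloading(self) -> bool:
        # Offloading helps only if the accelerator path is faster overall.
        return self.accelerator_seconds() < self.cpu_seconds


# A fast kernel can still lose once the transfer cost is counted.
small_region = OffloadEstimate(cpu_seconds=1.0, transfer_seconds=1.5,
                               kernel_seconds=0.25)
large_region = OffloadEstimate(cpu_seconds=7.0, transfer_seconds=1.5,
                               kernel_seconds=0.25)
print(small_region.worth_offloading(), large_region.speedup())
```

In this model the first region is not worth offloading even though its kernel runs four times faster than the CPU version, because the transfer dominates. Overlapped execution and multiple accelerators, as noted above, make real attribution harder than this additive estimate suggests.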
Commercial Applications and Other Benefits: More efficient code can be produced accurately, and with less effort, allowing for increased scalability in codes used by both government and private institutions. Example HPC codes that can benefit from these innovations include energy, weather, financial, and geological applications at sites throughout the world.
* information listed above is at the time of submission.