A Scalable Targeted Debugger for Scientific and Commercial Computing

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-FG02-12ER86503
Agency Tracking Number: 98641
Amount: $149,881.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: 02 b
Solicitation Number: DE-FOA-0000577
Timeline
Solicitation Year: 2012
Award Year: 2012
Award Start Date (Proposal Award Date): 2012-02-20
Award End Date (Contract End Date): 2012-11-19
Small Business Information
999 Windcroft Pl
Annapolis, MD 21401-6578
United States
DUNS: 964379965
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Barton Miller
Title: Dr.
Phone: (608) 263-3378
Email: bart@cs.wisc.edu
Business Contact
 Tom Brennan
Title: Mr.
Phone: (515) 598-2722
Email: tjmbrennan@gmail.com
Research Institution
 University of Wisconsin
 
1210 W. Dayton
Madison, WI 53706-1613
United States

 Nonprofit College or University
Abstract

We propose to produce a powerful, lightweight commercial debugging tool, called Swat, which will be of use both to supercomputer application programmers and to programmers of cluster- and cloud-based parallel e-commerce and engineering systems and middleware. The tool will be based on the STAT stack-trace debugging tool produced jointly by the University of Wisconsin and Lawrence Livermore National Laboratory. The feasibility of STAT has been proven in demonstrations on the largest supercomputers, with up to 200,000 processes on both the IBM BlueGene and Cray XT systems. To bring this software to commercial use, we need to develop enhanced command, control, and display interfaces and ensure support for the main supercomputer and cluster programming models. The result of this work would be a tool of broad use internationally in the scientific, commercial, and cloud computing communities. We target the identification and diagnosis of program behavior, addressing questions such as: What is the application doing? Is it in a deadlock or an infinite loop? To solve these problems, we need to address three key technical challenges. First, in most parallel debuggers, a front-end process controls the interactions between back-end tool daemon processes and the debugged application's processes. The front end can spend unacceptably long times managing the connections to the back-end daemons at large process counts. Second, as the number of debugged processes increases, the volume of data becomes prohibitively expensive to gather. Third, even if the debug data can be gathered in acceptable time, the time to process and present it becomes excessive, often causing users to resort to targeted print statements. To address these challenges, we will bring the debugging technology developed under the Stack Trace Analysis Tool (STAT) project to the commercial marketplace to produce Swat, a lightweight, easy-to-use, scalable, cost-effective, and powerful debugger.
This debugger will manage the scalable collection, analysis, and visualization of stack trace profiles used to depict application behavior, providing the critical information needed to identify bugs in parallel programs in the scientific, commercial (e-commerce), and cloud computing domains. Identifying bugs in programs that run on hundreds, thousands, or even hundreds of thousands of processors is a daunting task. Even for experienced programmers, such bugs can take days or even weeks to find, severely hampering productivity. We will address the problem of finding such bugs in a way that encourages novices (an extremely difficult audience to reach) to use such a tool, and in a way that directly benefits the experienced parallel programmer. We will use a tree-based overlay network (TBON) as the basis for scalable data collection, analysis, reduction, and presentation. With a carefully designed graphical user interface and default usage modes, the tool will be readily usable with minimal training. For the advanced programmer, new diagnostic algorithms will help locate many of the most difficult parallel program bugs.

Commercial Applications: Multicore, multiple threads, and multiple processes are ubiquitous in every programming environment, from the largest scientific applications, data centers, and clouds, to medium-scale e-commerce and engineering problems, to the small laboratory or business. A tool such as Swat will have broad applicability in a variety of market spaces.
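
To make the tree-based reduction of stack traces mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the STAT or Swat implementation and uses no real tool API: the names `TraceNode`, `tree_from_trace`, `merge`, `reduce_over_tree`, and `equivalence_classes` are hypothetical, the overlay network is simulated in a single process, and the sample traces are invented for illustration. Each process's stack trace becomes a one-path prefix tree, internal nodes of the overlay merge the trees of their children, and the front end receives one merged tree that groups processes by where they are in the program.

```python
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One call frame in a merged prefix tree of stack traces (hypothetical type)."""
    ranks: set = field(default_factory=set)       # processes whose trace passes through this frame
    children: dict = field(default_factory=dict)  # callee name -> TraceNode


def tree_from_trace(rank, frames):
    """Leaf step: turn one process's stack trace (outermost frame first) into a one-path tree."""
    root = TraceNode({rank})
    node = root
    for frame in frames:
        node = node.children.setdefault(frame, TraceNode())
        node.ranks.add(rank)
    return root


def merge(a, b):
    """Internal-node step: combine two prefix trees into a new tree without mutating either."""
    out = TraceNode(a.ranks | b.ranks)
    empty = TraceNode()
    for name in sorted(a.children.keys() | b.children.keys()):
        out.children[name] = merge(a.children.get(name, empty),
                                   b.children.get(name, empty))
    return out


def reduce_over_tree(trees, fanout=2):
    """Simulate a reduction up an overlay tree: each level merges groups of `fanout` trees."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(trees)
    if not level:
        raise ValueError("need at least one trace tree")
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            merged = group[0]
            for tree in group[1:]:
                merged = merge(merged, tree)
            next_level.append(merged)
        level = next_level
    return level[0]


def equivalence_classes(node, path=()):
    """Map each call path to the sorted ranks whose trace ends exactly at that path."""
    classes = {}
    deeper = set()
    for name, child in node.children.items():
        deeper |= child.ranks
        classes.update(equivalence_classes(child, path + (name,)))
    stopped_here = node.ranks - deeper
    if stopped_here:
        classes[path] = sorted(stopped_here)
    return classes


# Invented sample data: four processes, one of which has not reached the barrier.
demo_traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}
demo_root = reduce_over_tree(
    [tree_from_trace(rank, frames) for rank, frames in demo_traces.items()]
)
for call_path, ranks in equivalence_classes(demo_root).items():
    print(" > ".join(call_path), ranks)
```

In this toy run, rank 2 is the only process that is not waiting in `MPI_Barrier`, which is the kind of outlier a merged view is meant to surface. Because each merge produces a tree whose size depends on the number of distinct call paths rather than the number of processes, the data arriving at the front end can stay small when many processes behave alike.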

* Information listed above is at the time of submission. *
