A Scalable Targeted Debugger for Scientific and Commercial Computing
Phone: (608) 263-3378
Phone: (515) 598-2722
Type: Nonprofit College or University
We propose to produce a powerful, lightweight commercial debugging tool, called Swat, which will be of use both to supercomputer application programmers and to programmers of cluster- and cloud-based parallel e-commerce and engineering systems and middleware. The tool will be based on the STAT stack-trace debugging tool produced jointly by the University of Wisconsin and Lawrence Livermore National Laboratory. The feasibility of STAT has been proven in demonstrations on the largest supercomputers, with up to 200,000 processes on both the IBM BlueGene and Cray XT systems. To bring this software to commercial use, we need to develop enhanced command, control, and display interfaces and ensure support of the main supercomputer and cluster programming models. The result of this work would be a tool of broad use internationally in the scientific, commercial, and cloud computing communities. We target the identification and diagnosis of program behavior, addressing questions like: What is the application doing? Is it in a deadlock or infinite loop? To solve these problems, we need to address three key technical challenges. First, in most parallel debuggers, a front-end process controls the interactions between back-end tool daemon processes and the debugged application's processes. The front-end can spend unacceptably long times managing the connections to the back-end daemons at large process counts. Second, as the number of debugged processes increases, the volume of data becomes prohibitively expensive to gather. Third, even if the debug data can be gathered in acceptable time, the time to process and present it becomes excessive, often causing users to resort to targeted print statements. To address these challenges, we will bring the debugging technology developed under the Stack Trace Analysis Tool (STAT) project to the commercial marketplace to produce Swat, a lightweight, easy-to-use, scalable, cost-effective, and powerful debugger.
This debugger will manage the scalable collection, analysis, and visualization of stack trace profiles used to depict application behavior, providing the critical information needed to identify bugs in parallel programs in the scientific, commercial (e-commerce), and cloud computing domains. Identifying bugs in programs that run on hundreds, thousands, or even hundreds of thousands of processors is a daunting task. Even for experienced programmers, such bugs can take days or even weeks to find, severely hampering productivity. We will address the problem of finding such bugs in a way that encourages novices (an extremely difficult audience to reach) to use such a tool, and in a way that directly benefits the experienced parallel programmer. We will use a tree-based overlay network (TBON) as the basis for scalable data collection, analysis, reduction, and presentation. With a carefully designed graphical user interface and default usage modes, the tool will be readily usable with minimal training. For the advanced programmer, new diagnostic algorithms will help locate many of the most difficult parallel program bugs.
Commercial Applications: Multicore processors, multiple threads, and multiple processes are ubiquitous in every programming environment, from the largest scientific applications, data centers, and clouds, to medium-scale e-commerce and engineering problems, to the small laboratory or business. A tool such as Swat will have broad applicability in a variety of market spaces.
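The idea of reducing stack traces over a tree-based overlay network can be illustrated with a small sketch. The Python below is not taken from STAT or Swat; every name in it (`TraceNode`, `tree_from_trace`, `merge`, `reduce_over_tree`, `equivalence_classes`) and the sample frame names are hypothetical, and the overlay is simulated in a single process. It assumes each process reports one stack trace as a list of frame names, and shows how pairwise merging at each level of a tree yields one call-prefix tree whose nodes record which ranks reached them.

```python
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One frame in a merged call-prefix tree, with the ranks that reached it."""
    frame: str
    ranks: set = field(default_factory=set)
    children: dict = field(default_factory=dict)


def tree_from_trace(rank, frames):
    """Build a single-path prefix tree from one process's stack trace."""
    root = TraceNode("<root>", {rank})
    node = root
    for frame in frames:
        child = TraceNode(frame, {rank})
        node.children[frame] = child
        node = child
    return root


def merge(a, b):
    """Merge tree b into tree a, unioning rank sets on shared call paths."""
    a.ranks |= b.ranks
    for name, b_child in b.children.items():
        if name in a.children:
            merge(a.children[name], b_child)
        else:
            a.children[name] = b_child
    return a


def reduce_over_tree(trees, fanout=2):
    """Combine a non-empty list of trees level by level, the way the
    internal nodes of an overlay tree would each merge their children."""
    level = list(trees)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            merged = group[0]
            for other in group[1:]:
                merged = merge(merged, other)
            next_level.append(merged)
        level = next_level
    return level[0]


def equivalence_classes(node, path=()):
    """Yield (call path, sorted ranks) for each group of ranks whose
    trace ends at the same place in the merged tree."""
    seen_deeper = set()
    for child in node.children.values():
        seen_deeper |= child.ranks
        yield from equivalence_classes(child, path + (child.frame,))
    stopped_here = node.ranks - seen_deeper
    if stopped_here:
        yield path, sorted(stopped_here)
```

With four simulated ranks, three waiting in a communication call and one still computing, the merged tree separates the outlier from the rest:

```python
traces = {
    0: ["main", "solve", "MPI_Waitall"],
    1: ["main", "solve", "MPI_Waitall"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Waitall"],
}
merged = reduce_over_tree([tree_from_trace(r, f) for r, f in traces.items()])
for path, ranks in equivalence_classes(merged):
    print(" > ".join(path), ranks)
```

Because each merge step handles only `fanout` inputs, no single node has to process every trace directly, which is the property the front-end bottleneck discussion above is concerned with.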
* Information listed above is at the time of submission. *