Automated malware understanding and classification


OBJECTIVE: Develop automated techniques for understanding and classifying the behavior of novel malware.

DESCRIPTION: The number of new malware samples encountered in the wild is increasing steadily and rapidly. Recent reports show that more than 5,000 new, unique malware samples are encountered daily. To keep pace in the arms race with malware creators, there is a dire need for a systematic, automated way to process this deluge of malware. When a malware sample is encountered, two questions need to be answered: (i) what does the malware do? (ii) is the malware a variant of already known malware? Addressing this challenging problem requires automated and effective techniques that combine static and dynamic analysis of executables, mining techniques for behaviors, and malware classification. The same techniques may also help in understanding the behavior of commercial off-the-shelf (COTS) software from untrusted and unknown sources.

Researchers are exploring new techniques that can address these questions, such as recent work on automatically constructing dependence graphs from executions of malware in order to understand and summarize its behavior. Researchers have also studied mining tools and techniques based on dependence graphs to extract the behavior of malware. Semi-automated specification generation techniques have been explored to help analysts construct detection mechanisms for newly discovered malware behaviors, so that those behaviors can be incorporated into behavior-based or cloud-based malware detectors. Some researchers (such as Bailey et al. 2007) have addressed the malware classification problem: classifying malware by type (e.g., Virus, Worm, Spyware), by family (e.g., Bagle, Netsky, MyDoom), and by whether it has been encountered before.

The current practice of analysts manually inspecting each incoming malware sample is not sustainable. There is a need for proven and deployable automated techniques that can process and analyze large volumes of malware binaries.
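The two questions posed in the description (what a sample does, and whether it is a variant of known malware) can be illustrated with a minimal sketch. It is not the method of any cited work. It assumes a hypothetical trace format in which each recorded call lists the values (such as handles or buffers) it consumes and produces, builds a simple data-dependence edge set from that trace, and compares edge sets with Jaccard similarity. The names Call, dependence_edges, jaccard, and classify, the family labels, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One recorded call in a hypothetical execution trace (assumed format)."""
    name: str       # call name, e.g. "open"
    inputs: tuple   # values (handles, buffers) the call consumes
    outputs: tuple  # values the call produces


def dependence_edges(trace):
    """Summarize a trace as a set of (producer, consumer) call-name pairs.

    An edge (a, b) is added when call b consumes a value whose most recent
    producer was call a. Concrete handle values are discarded, so renaming
    handles or inserting unrelated calls does not change the summary.
    """
    last_producer = {}  # value -> name of the call that last produced it
    edges = set()
    for call in trace:
        for value in call.inputs:
            if value in last_producer:
                edges.add((last_producer[value], call.name))
        for value in call.outputs:
            last_producer[value] = call.name
    return frozenset(edges)


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def classify(sample_edges, known_families, threshold=0.5):
    """Return (family, score) for the closest known family.

    The family is None when no known family reaches the (assumed) threshold,
    which is how this sketch flags a sample as not encountered before.
    """
    best_family, best_score = None, 0.0
    for family, edges in known_families.items():
        score = jaccard(sample_edges, edges)
        if score > best_score:
            best_family, best_score = family, score
    if best_score >= threshold:
        return best_family, best_score
    return None, best_score
```

In use, an analyst would compute dependence_edges once per known family from representative traces, then call classify on each incoming sample. A real system would need far richer graphs (argument values, control dependences, taint tracking) and a more robust similarity measure; the edge-set comparison here only shows the overall shape of the pipeline.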
PHASE I: 1) Research and develop automated malware understanding and classification technologies, based on recent techniques such as dependence graphs or symbolic execution, that can effectively and efficiently analyze and characterize malware behavior and defeat the use of obfuscation and polymorphism. 2) Demonstrate that the proposed techniques can be implemented successfully to classify behaviors for a large corpus of malware in near real time.

PHASE II: 1) Extend the techniques proposed in Phase I to mine or extract relevant behaviors of malware. 2) Develop and implement techniques for automatically transforming the extracted malware patterns and behaviors into policies or patterns that can be ported into existing malware detectors. 3) Validate the techniques under operational conditions. The goal of this phase is to demonstrate that a new malware sample can be analyzed in near real time: its behaviors should be analyzed, classified, and mined in less than five minutes with minimal human intervention.

PHASE III DUAL USE APPLICATIONS: Effective techniques for understanding and classifying malware are critical for both the military and commercial sectors. The developed system will be marketed as a malware-analysis platform that will be attractive to malware-detection companies and defense agencies. Agencies and companies can use the platform to develop faster defenses against zero-day attacks.

REFERENCES:
1. B. Acohido and J. Swartz. Zero Day Threat: The Shocking Truth of How Banks and Credit Bureaus Help Cyber Crooks Steal Your Money and Identity. Union Square Press, April 2008.
2. Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao, Farnam Jahanian, and Jose Nazario. Automated Classification and Analysis of Internet Malware. Proceedings of Recent Advances in Intrusion Detection (RAID'07), September 2007.
3. Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Mining Specifications of Malicious Behavior. ESEC/SIGSOFT FSE, 2007.
4. Matt Fredrikson, Mihai Christodorescu, Somesh Jha, Reiner Sailer, and Xifeng Yan. Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors. IEEE Symposium on Security and Privacy, 2010.
5. Lorenzo Martignoni, Elizabeth Stinson, Matt Fredrikson, Somesh Jha, and John C. Mitchell. A Layered Architecture for Detecting Malicious Behaviors. RAID, 2008.
6. Mila Dalla Preda, Mihai Christodorescu, Somesh Jha, and Saumya K. Debray. A Semantics-Based Approach to Malware Detection. ACM Trans. Program. Lang. Syst., 30(5), 2008.
7. David Brumley, Hao Wang, Somesh Jha, and Dawn Xiaodong Song. Creating Vulnerability Signatures Using Weakest Preconditions. CSF, 2007.
8. Hao Wang, Somesh Jha, and Vinod Ganapathy. NetSpy: Automatic Generation of Spyware Signatures for NIDS. ACSAC, 2006.