Development of Tools to Derive High Level Language Code Associated with Executable Software

Description:

TECHNOLOGY AREA(S): Info Systems 

OBJECTIVE: Reviewing programs for possible vulnerabilities is often complicated by the unavailability of the program’s source code. Although binary executable code can be decompiled back to source code for some languages, for most languages the process remains either unreliable or impossible with current technology. Tools exist that can detect vulnerabilities within binary machine code, and machine code can be disassembled into assembly language, but it is difficult to verify these findings without knowledge of machine code and assembly language or the ability to translate the machine code back to source code. It is proposed that a tool or set of tools be developed to expand the ability to revert binary machine code back to source code or to a language at a higher level than assembly language.

DESCRIPTION: There are a number of reasons why source code may be unavailable for software in use within DoD systems. These include limitations on contract data rights, the use of legacy code for which the vendor is no longer available, and the inclusion of third-party libraries. In these cases, it is still necessary to assess the applications in order to identify potential vulnerabilities and determine their security posture. With current technology, source code assessment can be achieved through numerous tools that parse the code and identify potential defects. Some tools can also parse binary code and identify defects, but they generate false positives, and further human analysis is required to eliminate those false positives and establish the true security posture. For software without source code, this analysis can be extremely time-consuming and requires specialized skills to understand the assembly language generated from the binary code.

PHASE I: Develop a white paper/prototype that documents a process for developing a robust automated tool set that recreates high-level source code from binary software. The tool shall be able to reverse engineer multiple programming languages and regenerate code in the language in which it was originally developed before compilation. The proposed solution shall regenerate higher-level language code, giving analysts the flexibility to effectively determine the overall security posture of the systems and accurately review the findings from binary analysis tools. The solution shall allow assessment of software defects without the need to manually review lower-level representations such as binary or machine code. All assessment will be performed in higher-level languages, for 100% of source code regardless of input language.

PHASE II: Develop a working prototype, based on the selected Phase I design, that demonstrates the capabilities of a tool or set of tools to expand the ability to revert machine code back to source code or to a language at a higher level than assembly language. The solution shall find potential vulnerabilities throughout the Software Development Lifecycle (SDLC) and recreate high-level source code from binary software wherever a potential security defect has been identified. Finding defects during the SDLC, rather than through problem reports after systems are fielded, can drastically reduce sustainment costs and enhance system readiness. The solution shall identify vulnerabilities in source code that are associated with the Common Weakness Enumeration (CWE) list. Upon identifying these defects, source code shall be generated in a higher-level language for 100% of the defects. The generated source code shall cover the full function or module in which the defect was identified and shall be generated regardless of the language in which the code was originally developed.

PHASE III: In conjunction with the Army, optimize the prototype created in Phase II. Implement a robust tool that can recreate high-level source code for test and evaluation, using commercially available technologies. The implementation should ensure that the system is interoperable with the existing system of systems. Perform the steps required to commercialize the system.

REFERENCES: 

1: Klocwork, "Developing Software in a Multicore and Multiprocessor World," Ottawa, ON, 2010.

2:  G. McGraw, Software Security: Building Security In, Addison-Wesley Professional, 2006.

3:  "Comparative Study of Risk Management in Centralized and Distributed Software Development Environment," Scientific International (Lahore), vol. 26, no. 4, pp. 1523-1528, 2014.

4:  G. Vasiliadis, M. Polychronakis and S. Ioannidis, "GPU-Assisted Malware," International Journal of Information Security, vol. 14, no. 3, pp. 289-297, 2015.

5:  M. Atighetchi, V. Ishakian, J. Loyall, P. Pal, A. Sinclair and R. Grant, "Metronome: Operating System Level Performance Management via Self-Adaptive Computing," in Proceedings of the 49th Annual Design Automation Conference, 2012.

KEYWORDS: Cyber Security, Commercial Off The Shelf (COTS), Malicious, Vulnerabilities, Software Development Lifecycle (SDLC), Binary Analysis, Source Code, False Positives, High Level Language Code

CONTACT(S): 

Huy Pham 

(443) 861-3218 

huy.x.pham.civ@mail.mil 

David Arty 

(732) 532-3338 
