Deep Reinforcement Learning Methods and Simulated Learning Environments for Counter-Unmanned Aircraft Systems (C-UAS) Applications


TECHNOLOGY AREA(S): Air Platform, Info Systems, Ground Sea 

OBJECTIVE: To develop advanced artificial intelligence (AI) for competitive offensive as well as defensive unmanned systems to counter hostile threats that lead to degraded performance. This topic seeks development of (1) computational methods that use a single reinforcement learning agent to solve complex, multi-task problems and (2) simulated learning environments that can be used to train as well as to evaluate putative solutions. 

DESCRIPTION: AI has recently been described by experts within the U.S. Department of Defense Autonomy Community of Interest (COI) as “the next arms race”. The implications of adversarial use of AI and its successively greater incorporation into unmanned and autonomous systems remain both unknown and a considerable source of apprehension. Exponential growth in the C-UAS industry, for example, points to mounting concerns regarding use of drones in civilian and military settings, and particular concern surrounds the potential for highly coordinated and disruptive attacks mediated by groups of small (i.e., “swarming”) unmanned systems. Increased domestic and global investment in such technologies also enhances the probability that swarming systems and other C-UAS technologies could be exploited by the adversary to thwart U.S. military operations for which UAS are preferentially used. Outcompeting increasingly intelligent unmanned vehicle technologies to ensure mission completion, especially where communications to a human operator are limited or non-existent, requires insertion of sophisticated computational architectures that provide means to autonomously perform new and different tasks while operating under rapidly changing conditions such as system failures, variable weather conditions, and adjustments to mission based on new information. Recent work in deep reinforcement learning methods has demonstrated impressive progress in this regard by developing computer programs that can solve progressively more complex tasks. Progress, though, has been limited primarily to single-task performance, and multiple days of training are required to become proficient in a single domain. For the present purpose, new methods are desirable that allow a single algorithm to demonstrate flexibility more closely mirroring human-like behavior by mastering multiple and diverse sets of tasks within a significantly reduced timeframe. 
The overarching aim of the topic is to exceed the current state of the art in deep reinforcement learning by challenging existing methods to move beyond single agent-single task performance and to more closely replicate learning, memory, and navigation skills that typify human intelligence. Proof-of-concept is provided by the recent development of IMPALA (Importance Weighted Actor-Learner Architecture), which tackles one of the major impediments to progress in this realm by incorporating scalability without the concomitant sacrifice of training stability or data efficiency [1]. IMPALA was evaluated on the DMLab-30 and Atari-57 challenge sets [2,3], which incorporate a variety and diversity of cognitive tasks that provide useful benchmark problems for deep learning. The architecture demonstrated superior performance “in terms of data efficiency, stability, and final performance” [1] as compared to A3C variants [4], achieving a human normalized score of 49.4% versus 23.8% on the DMLab-30 challenge set. Although IMPALA and other methods do not yet achieve human performance standards, they nonetheless provide clear evidence that implementation of artificial intelligence agents which can learn multiple domains without extensive resource requirements is conceivable. Of particular interest for the effort described herein is the development of computational architectures that enhance the ability of unmanned systems to avoid detection and interdiction during performance of military operations in potentially complex, communications-limited environments. Detection and tracking systems may utilize one or a combination of approaches that include radar systems as well as radio-frequency, electro-optical, infrared, and acoustic scanners. 
Interdiction systems likewise involve one or more approaches to virtually or physically intercept unmanned systems prior to mission completion and include radio-frequency and Global Navigation Satellite System jamming, spoofing to hijack communications links, lasers to destroy vital segments of the UAS airframe, nets, projectiles, and adversary drones or drone swarms [5]. Unmanned systems must be capable of performing “routine” tasks (e.g., collision avoidance and object tracking) while avoiding unexpected hazards like those delineated above, and thus could substantially benefit from a new architecture that confers the ability to efficiently operate in multi-task environments by performing and learning similar tasks concurrently. 
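
For readers unfamiliar with the human normalized score cited above, the following is a minimal sketch of how such a figure is commonly computed: each task's agent score is rescaled so that a random policy maps to 0% and a human reference maps to 100%, and the per-task values are then averaged. The function names and the example numbers are hypothetical and for illustration only; the exact variant used in the cited work (for example, whether per-task scores are capped) is not specified in this topic.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return the agent score as a percentage of the random-to-human gap."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


def mean_normalized_score(results: dict[str, tuple[float, float, float]]) -> float:
    """Average the per-task normalized scores over a suite of tasks."""
    return mean(human_normalized_score(a, r, h) for a, r, h in results.values())


# Hypothetical per-task (agent, random, human) returns; not real benchmark data.
example = {
    "task_a": (30.0, 10.0, 50.0),  # agent closes 50% of the gap
    "task_b": (5.0, 0.0, 20.0),    # agent closes 25% of the gap
}
print(mean_normalized_score(example))  # prints 37.5
```

Averaging normalized scores, rather than raw returns, keeps tasks with large reward scales from dominating a multi-task comparison.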

PHASE I: Leverage or create a web-based three-dimensional simulation environment that will serve as a challenge problem for training as well as a methodology for evaluating the performance of developed architectures as compared to benchmark agents. Tasks within the learning environment should reflect the diversity of tasks and goals embodied, explicitly and implicitly, in the unmanned systems mission(s) as broadly described above. They should vary visually and contain physically distinct settings to the extent that they reflect anticipated operating environments for conduct of military missions where UAS are reasonably anticipated to be used. The addition of autonomous (“bot”-like) programs that display their own unique, goal-oriented behaviors is desirable for some sub-environments. Performers will work jointly with the Government sponsor to identify environmental features and complexities that should be included. Initiate processes to develop (or extend existing) computational architectures that can resolve a collection of tasks with a single agent at a rate which is practical for purposes of scalability. Develop metrics to evaluate performance of the new architecture as compared to benchmark agents, including human performers, and to evaluate efficiency and data stability. Phase I deliverables will include (1) a final report and (2) demonstration of the training environment to the cognizant project officer. The report should also provide preliminary results on architecture performance and describe development including parameterization. The report should include plans for development of a user interface which will address Phase II expectations. Operating system, software (where applicable), and data compatibility should be specifically addressed, as should proposed location of the final interface. 
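
As one possible illustration of the Phase I evaluation methodology, the sketch below runs a single policy across several tasks and reports a mean return per task, so that a candidate agent can be compared with a benchmark agent on the same suite. Everything here is an assumption made for illustration: the `Task` interface, the toy `ReachTarget` task, and both policies are hypothetical stand-ins, not part of any required environment or library.

```python
from typing import Callable, Protocol


class Task(Protocol):
    """Hypothetical minimal episodic-task interface (not a real library API)."""
    name: str
    def reset(self) -> float: ...
    def step(self, action: int) -> tuple[float, float, bool]: ...


class ReachTarget:
    """Toy 1-D task: move from position 0 to `target` within `horizon` steps."""

    def __init__(self, name: str, target: int, horizon: int = 20) -> None:
        self.name, self.target, self.horizon = name, target, horizon

    def reset(self) -> float:
        self.pos, self.t = 0, 0
        return float(self.target - self.pos)  # observation: signed distance

    def step(self, action: int) -> tuple[float, float, bool]:
        self.pos += 1 if action > 0 else -1
        self.t += 1
        reached = self.pos == self.target
        done = reached or self.t >= self.horizon
        return float(self.target - self.pos), (1.0 if reached else 0.0), done


Policy = Callable[[float], int]


def evaluate(policy: Policy, tasks: list[Task], episodes: int = 5) -> dict[str, float]:
    """Run one policy on every task and return the mean episode return per task."""
    scores: dict[str, float] = {}
    for task in tasks:
        total = 0.0
        for _ in range(episodes):
            obs, done = task.reset(), False
            while not done:
                obs, reward, done = task.step(policy(obs))
                total += reward
        scores[task.name] = total / episodes
    return scores


def greedy(obs: float) -> int:
    """Candidate agent: always step toward the target."""
    return 1 if obs > 0 else -1


def always_right(obs: float) -> int:
    """Benchmark agent: ignores the observation."""
    return 1


tasks = [ReachTarget("near", 3), ReachTarget("far_left", -8)]
print(evaluate(greedy, tasks))        # {'near': 1.0, 'far_left': 1.0}
print(evaluate(always_right, tasks))  # {'near': 1.0, 'far_left': 0.0}
```

Reporting results per task, rather than as a single aggregate, makes it visible when an agent does well on some tasks and fails on others, which matters for the single-agent, multi-task goal.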

PHASE II: Phase II efforts will focus on iterative improvement to the proof-of-concept approach developed during Phase I. The performer will mature the architecture by refining the simulation environment to include, where needed and appropriate, additional and more advanced tasks and by improving architecture performance as compared to the preliminary architecture evaluated as part of the Phase I effort. The performer will identify weaknesses in performance that could be improved through additional inputs (e.g., additional sensor or coordinate data that allows more precise navigation) and will codify and relay observations to the project officer. Phase II deliverables will be a proof-of-concept demonstration; a report detailing (1) a description of the approach, including optimization techniques and performance outcomes, (2) testing and validation methods, and (3) advantages and disadvantages or limitations of the method; the source code; and a user interface with any associated executables. 

PHASE III: In addition to implementing further improvements that would enhance use of the developed product by the sponsoring office, identify and exploit features that would be attractive for commercial or other private sector UAS applications. 


1: Espeholt L et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. arXiv:1802.01561v3 [cs.LG], 2018.

2: Beattie C et al. DeepMind Lab. CoRR, abs/1612.03801, 2016.

3: Bellemare MG et al. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.

4: Mnih V et al. Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783v2 [cs.LG], 2016.

5: Michel AH. Counter-Drone Systems. Center for the Study of the Drone at Bard College, 2018.

KEYWORDS: Artificial Intelligence, Simulated Environments, UAS, C-UAS, Drones 
