
Explainable Reinforcement Learning (XRL) for Command and Control (C2)




The technology within this topic is restricted under the International Traffic in Arms Regulations (ITAR), 22 CFR Parts 120-130, which control the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulations (EAR), 15 CFR Parts 730-774, which control dual-use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised that foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.


OBJECTIVE: The objective of this topic is to develop an effective (SBIR Phase II) prototype that enables practical applications of Reinforcement Learning (RL) to be explained for: interpretability (generating explanations that are intuitive and understandable to humans); trust (verifying an agent's behavior); the performance-explanation trade-off (striking a balance between the performance of the RL agent and the quality of the explanations it provides); accountability and safety (holding RL agents accountable for their actions so that potential risks and errors in an agent's behavior can be identified and rectified); and, finally, human-AI collaboration (enabling effective communication and teaming between humans and RL agents). This topic supports the following Operational Imperatives:
o II - Achieving Operationally Optimized Advanced Battle Management Systems (ABMS) / Air Force Joint All-Domain Command & Control (AF JADC2)
o V - Defining optimized resilient basing, sustainment, and communications in a contested environment


DESCRIPTION: RL represents a groundbreaking technology with the ability to perform long-term decision-making in complex and dynamic domains at a level surpassing human capabilities [1]. Leveraging this capability holds immense strategic significance for the United States Department of Defense (DoD), given that RL-enabled systems have the potential to outperform even the most exceptional human minds across a wide range of tasks [2]. Despite remarkable improvements, however, RL's adoption in high-risk real-world domains such as military applications has been limited by the challenges of explaining RL agent decisions and establishing user trust in these agents. For instance, while AlphaStar competes against highly skilled StarCraft 2 players, comprehending its inner workings requires extensive and impractical empirical investigation [3]. This substantial and inhibiting constraint arises because current Explainable Reinforcement Learning (XRL) methods do not adequately account for the fact that autonomous decision-making agents can alter future data observations through their actions, nor do they effectively reason about long-term objectives aligned with the agent's mission. Effective XRL approaches that overcome these limitations are therefore imperative to unlocking the widespread utilization of RL's capabilities. Accordingly, we seek proposals for effective and efficient XRL models intended for the US Air Force's direct operational use.
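To make the explainability goal concrete, the following is a minimal, hypothetical sketch (not drawn from any system cited above): a tabular Q-learning agent on a toy five-state corridor, paired with a simple Q-value-margin "explanation" that tells a human reviewer which action the agent prefers in a state and by how much expected return it prefers it. The environment, reward, and `explain` helper are all illustrative assumptions; operational XRL for C2 would require far richer explanations.

```python
import random

N = 5               # corridor states 0..4; reaching state 4 yields reward +1
ACTIONS = (-1, +1)  # step left / step right

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the corridor."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N - 1:
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), N - 1)          # clamp to corridor ends
            r = 1.0 if s2 == N - 1 else 0.0
            # Standard Q-learning update toward the bootstrapped target.
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2
    return Q

def explain(Q, s):
    """Return the greedy action in state s and its Q-value margin over the
    alternative action -- a crude 'why this action' statement for a human."""
    best = max(ACTIONS, key=lambda act: Q[(s, act)])
    margin = Q[(s, best)] - Q[(s, -best)]
    return best, margin

Q = train()
a, margin = explain(Q, 0)
print(f"In state 0 the agent moves {'right' if a == 1 else 'left'} "
      f"(expected-return margin {margin:.2f})")
```

The margin is the simplest possible contrastive explanation ("why right rather than left"); richer XRL methods would additionally account for how the chosen action reshapes the agent's future observations, which is precisely the gap this topic highlights.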


PHASE I: As this is a Direct-to-Phase-II (D2P2) topic, no Phase I awards will be made as a result of this topic. To qualify for this D2P2 topic, the Government expects the Offeror to demonstrate feasibility by means of a prior "Phase I-type" effort that does not constitute work undertaken as part of a prior SBIR/STTR funding agreement. The Offeror is required to provide detail and documentation in the D2P2 proposal demonstrating accomplishment of a "Phase I-type" effort: a case study or prototype of explainable reinforcement learning applied to a practical application, in which the Offeror provided intuitive, human-understandable explanations based on AI/ML inference findings in order to verify an agent's behavior.


PHASE II: This Phase II topic seeks 6.2 explainable AI/ML solutions using reinforcement learning for command and control applications. Proposals should include development, installation, integration, demonstration, and test and evaluation of the proposed prototype system. The prototype must verify agent behavior, characterize the performance-explanation trade-off, build trust, and provide quality explanations that ultimately translate into intuitive interpretability: a human understanding of how the agent arrived at a given decision.


PHASE III DUAL USE APPLICATIONS: Phase III efforts will focus on transitioning the developed technology into a working commercial or warfighter solution. The Offeror will identify the transition partners. The technology will meet a minimum of TRL 6 and will be mature and operationally ready. The solution will be configured, tailored, and further developed to match the customer's requirements and the specific environment configuration for deployment. A transition plan will be required to be developed and delivered. Phase III efforts are not competed; it is therefore the responsibility of the Offeror to seek funding opportunities.



REFERENCES:
1. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, and S. Petersen, "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, Feb. 2015.


KEYWORDS: Reinforcement Learning interpretability; Reinforcement Learning explanations
