
Transfer Learning of Control Policies Pitch Day for Trusted AI


TECH FOCUS AREAS: Autonomy; Artificial Intelligence/Machine Learning
TECHNOLOGY AREAS: Information Systems
OBJECTIVE: The objective of this topic is to explore the development of explainable decision-making algorithms that address the challenges of deriving explanations of autonomous behavior in decision-making systems. In particular, there is (i) the challenge that autonomous decision-making agents can change future observations of data through the actions they take and (ii) the challenge of reasoning over the long-term objectives of the underlying agent mission. The results of this work may be applied to the development of military recommender systems as part of a Phase III effort by enabling human-interpretable explanations of behavior in automated or autonomous planning solutions. The work may also find applications in the commercial autonomous driving sector, where high-performing solutions still lead to accidents and fatalities for which explanations are difficult to derive. In such settings, explainability not only eases understanding of learning outcomes but can also be used to develop more effective machine learning algorithms. This topic addresses challenges in the DoD technology area of Artificial Intelligence and Machine Learning as outlined in the National Defense Strategy and, more specifically, the focus area of Autonomy as listed in the USD R&E modernization priorities. This topic will reach companies that can complete a feasibility study and prototype validated concepts on accelerated Phase II schedules. This topic is specifically aimed at later-stage development rather than earlier-stage basic science and research.
DESCRIPTION: Current simulation environments are too slow for the adoption of state-of-the-art machine learning approaches to decision-making, such as those proposed in the areas of reinforcement learning and planning. Recent noteworthy examples include Monte Carlo Tree Search and actor-critic architectures, which have been used to yield superhuman performance in games of precision and perception such as Go and StarCraft II. These approaches require tens of millions of simulation runs or tens of thousands of years’ worth of simulation data. As a point of reference, Air Force simulation environments such as AFSIM and AWSIM, or other complex operational environments, execute an individual simulation on the order of minutes and hours, respectively. These runtimes preclude the adoption of such data-hungry methods.
However, the tremendous success of transfer learning in image classification and, more recently, natural language processing gives us hope that learned information can be transferred from a fast surrogate simulation environment (e.g., PySC2, Lab2D, RAND’s AFGYM) to our existing slow simulators (e.g., AFSIM, AWSIM). This remains an open problem in the aforementioned fields of reinforcement learning and planning.
To this end, we seek to develop a transfer learning approach that can transfer decision-making information from a surrogate simulation environment to a target environment. This includes the development of similarity metrics by which a user can determine whether transfer between two environments is feasible or useful. Performance in the target simulation environment after transferring decision-making information learned in the surrogate should reflect the similarity metrics developed. That is, low similarity should lead to low or unpredictable performance when transferring from the surrogate to the target. Conversely, high similarity should yield high performance on the target.
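The relationship between a similarity metric and post-transfer performance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not part of this topic: the surrogate and target are small random tabular MDPs, the `similarity` function (one minus the average of a total-variation distance over transitions and a mean absolute reward gap) is a hypothetical metric, and "transfer" simply means reusing the surrogate's optimal policy unchanged in the target.

```python
import numpy as np

def random_mdp(n_states, n_actions, rng):
    """Random tabular MDP: P[s, a] is a next-state distribution, R[s, a] is in [0, 1]."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def mix(mdp_a, mdp_b, eps):
    """Convex mix of two MDPs: eps=0 returns mdp_a, eps=1 returns mdp_b."""
    (P_a, R_a), (P_b, R_b) = mdp_a, mdp_b
    return (1 - eps) * P_a + eps * P_b, (1 - eps) * R_a + eps * R_b

def similarity(P_a, R_a, P_b, R_b):
    """Hypothetical similarity in [0, 1] (an assumption, not a standard metric)."""
    tv = 0.5 * np.abs(P_a - P_b).sum(axis=-1).mean()  # mean total-variation distance
    dr = np.abs(R_a - R_b).mean()                     # mean absolute reward gap
    return 1.0 - 0.5 * (tv + dr)

def value_iteration(P, R, gamma=0.95, iters=500):
    """Greedy policy obtained from value iteration on a tabular MDP."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return (R + gamma * P @ V).argmax(axis=1)

def policy_value(P, R, policy, gamma=0.95):
    """Exact discounted value of a deterministic policy, averaged over start states."""
    idx = np.arange(P.shape[0])
    P_pi, R_pi = P[idx, policy], R[idx, policy]
    V = np.linalg.solve(np.eye(len(idx)) - gamma * P_pi, R_pi)
    return V.mean()

rng = np.random.default_rng(0)
surrogate = random_mdp(20, 4, rng)   # stands in for the fast simulator
unrelated = random_mdp(20, 4, rng)   # used only to build dissimilar targets
pi_src = value_iteration(*surrogate)

results = []
for eps in (0.0, 0.25, 0.5, 1.0):
    target = mix(surrogate, unrelated, eps)   # stands in for the slow simulator
    pi_tgt = value_iteration(*target)
    # Regret: value lost in the target by reusing the surrogate policy as-is.
    regret = policy_value(*target, pi_tgt) - policy_value(*target, pi_src)
    sim = similarity(*surrogate, *target)
    results.append((eps, sim, regret))
    print(f"eps={eps:.2f}  similarity={sim:.3f}  transfer regret={regret:.4f}")
```

In this toy setting the target's optimal policy is cheap to compute, so regret can be measured directly. For a slow target simulator that is exactly what is unavailable, which is why a metric computable without solving the target would be needed.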
The performance metric depends on the environments chosen and can include maximizing a reward signal, yielding explainable actions, or establishing robust control policies, among others. While the foregoing is motivated from an Air Force perspective, prospective performers may choose non-military surrogate and target environments in developing their transfer learning approach and transfer similarity metrics. There is no required use of government materials, equipment, data, or facilities.
PHASE I: Phase I should completely document 1) the AI-driven explainability requirements the proposed solution addresses; 2) the approach to model, quantify, and analyze the representation, effectiveness, and efficiency of the explainable decision-making solution; and 3) the feasibility of developing or simulating a prototype architecture.
PHASE II: Develop, install, integrate, and demonstrate a prototype system determined to be the most feasible solution during the Phase I feasibility study. This demonstration should focus specifically on:
1. Evaluating the proposed solution against the objectives and measurable key results as defined in the Phase I feasibility study.
2. Describing in detail how the solution can be scaled to be adopted widely (i.e., how it can be modified for scale).
3. A clear transition path for the proposed solution that takes into account input from all affected stakeholders, including but not limited to end users, engineering, sustainment, contracting, finance, legal, and cyber security.
4. Specific details about how the solution can integrate with other current and potential future solutions.
5. How the solution can be sustained (i.e., supportability).
6. Clearly identifying other specific DoD or governmental customers who want to use the solution.
PHASE III DUAL USE APPLICATIONS: The contractor will pursue commercialization of the various technologies developed in Phase II for transitioning expanded mission capability to a broad range of potential government and civilian users and alternate mission applications. Direct access to end users and government customers will be provided, with opportunities to receive Phase III awards for providing the government additional research and development or direct procurement of products and services developed in coordination with the program.
PROPOSAL PREPARATION AND EVALUATION: Please follow the Air Force-specific Direct to Phase II instructions under the Department of Defense 21.2 SBIR Broad Agency Announcement when preparing proposals. Proposals under this topic will have a maximum value of $1,500,000 in SBIR funding and a maximum performance period of 18 months, including 15 months of technical performance and three months for reporting. Phase II proposals will be evaluated using a two-step process. After proposal receipt, an initial evaluation will be conducted in accordance with the criteria in the DoD 21.2 SBIR BAA, Sections 6.0 and 7.4. Based on the results of that evaluation, selectable companies will be provided an opportunity to participate in the Air Force Trusted AI Pitch Day, tentatively scheduled for 26-30 July 2021 (possibly virtual). Companies’ pitches will be evaluated using the initial proposal evaluation criteria. Selectees will be notified after the event via email. Companies must participate in the pitch event to be considered for award.
REFERENCES:
1. Da Silva, Felipe Leno, and Anna Helena Reali Costa. "A survey on transfer learning for multiagent reinforcement learning systems." Journal of Artificial Intelligence Research 64 (2019): 645-703.
2. Gamrian, Shani, and Yoav Goldberg. "Transfer learning for related reinforcement learning tasks via image-to-image translation." International Conference on Machine Learning. PMLR, 2019.
3. Hanlon, Nicholas, et al. "AFSIM Implementation and Simulation of the Active Target Defense Differential Game." 2018 AIAA Guidance, Navigation, and Control Conference. 2018.