AI-Based Autonomous Agents that Possess Human-like Cognitive Skills in a Real-Time Strategy Game Environment


TECHNOLOGY AREA(S): Info Systems

OBJECTIVE: Develop algorithm-based autonomous artificial agents that operate in a virtual game environment, possess human-like cognitive skills to learn complex human tasks (e.g., land navigation, strategy, course-of-action analysis), and function at varying scales, from individual agents to small teams to large groups, acting in a coordinated fashion both cooperatively and adversarially with other agents.

DESCRIPTION: Current state-of-the-art artificial agents perform human-like activities at or above professional human level in adversarial real-time strategy games such as Dota 2 and StarCraft II. These agents have demonstrated a variety of human-like abilities, such as learning, adapting, strategizing, and decision making, in extremely complex adversarial games. However, current methods for training autonomous agents in 3D simulations rely on game statistics, domain knowledge, and immense training time, which limits their applicability to more complex problem domains. These approaches also fall short in many areas when compared with the ease with which humans perform even the most complex cognitive activities. Shortfalls in current capabilities include: (1) the large computational cost of deep reinforcement learning reduces the feasibility of training large multi-agent systems; (2) agents do not possess temporal or long-term memory with which to keep, maintain, or improve skills; (3) agents cannot perform complex long-term planning, relying instead on extensive exploration to learn a policy; and (4) this lack of planning reduces agents' ability to cooperate effectively in multi-agent scenarios. The following innovative technical features are desired to achieve the topic objective:

a) Function over long time horizons: up to 24 hours, in a large, high-dimensional, continuous observation/action space with sparse feedback and delayed reward. Current state-of-the-art agents in Dota 2 perform over an average match length of 35 minutes. We are seeking agents that select optimal actions despite delayed rewards (action feedback) over long time horizons.
b) Cooperative planning: agents that are able to plan and coordinate policies with other agents to complete a task cooperatively.
c) Online learning: agents that are able to learn from immediate experience without catastrophic forgetting of important learned information, including the ability to adjust to changing environments and task circumstances.
d) Memory: agents that possess temporal memory. Example: humans can navigate to a desired location and easily retrace their return path to the starting point without photographically memorizing every feature along the way.
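The temporal-memory requirement in feature (d) can be made concrete with a minimal sketch: an agent that records its own action sequence as a lightweight episodic trace and returns to its start point by replaying inverted actions, with no map or landmark storage. The names here (GridAgent, OPPOSITE, MOVES) are invented for illustration only, not part of any required solution.

```python
# Sketch of feature (d): temporal memory as an ordered action trace.
# All identifiers are hypothetical; this is not a prescribed design.

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

class GridAgent:
    def __init__(self, x=0, y=0):
        self.pos = (x, y)
        self.trace = []  # temporal memory: the ordered actions taken

    def step(self, action):
        dx, dy = MOVES[action]
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        self.trace.append(action)

    def retrace(self):
        # Replay the remembered actions in reverse order, each inverted,
        # returning to the start without memorizing terrain features.
        for action in reversed(self.trace):
            dx, dy = MOVES[OPPOSITE[action]]
            self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        self.trace.clear()

agent = GridAgent()
for a in ["N", "N", "E", "N", "W", "W"]:
    agent.step(a)
outbound = agent.pos   # (-1, 3)
agent.retrace()
returned = agent.pos   # (0, 0)
```

The trace grows linearly with path length, which is what distinguishes this kind of temporal memory from the photographic feature memorization the example above rules out.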

PHASE I: Provide a written, innovative technical approach beyond the state of the art that demonstrates the feasibility of an autonomous agent learning to perform complex tasks in OpenAI's Neural MMO: A Massively Multiagent Game Environment. Technical approaches must demonstrate feasibility of meeting one or more of the stated technical features (a-d, above).
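For orientation, the kind of evaluation loop a feasibility demonstration might use can be sketched as follows, assuming a multi-agent interface in which observations, rewards, and done flags are keyed by agent id. The StubEnv and random policy below are placeholders illustrating the loop shape only; they are not the real Neural MMO API, and sparse, delayed reward (feature a) is mimicked by paying reward only on the final timestep.

```python
# Hypothetical multi-agent episode loop; StubEnv is a toy stand-in,
# not the Neural MMO environment.
import random

class StubEnv:
    """Toy multi-agent environment with a sparse, delayed reward."""
    def __init__(self, n_agents=4, horizon=10):
        self.agents = list(range(n_agents))
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {i: 0.0 for i in self.agents}  # trivial observations

    def step(self, actions):
        self.t += 1
        obs = {i: float(self.t) for i in self.agents}
        done = self.t >= self.horizon
        # Reward is zero until the episode's final timestep.
        rewards = {i: (1.0 if done else 0.0) for i in self.agents}
        return obs, rewards, done

def run_episode(env, policy):
    obs = env.reset()
    totals = {i: 0.0 for i in env.agents}
    done = False
    while not done:
        actions = {i: policy(o) for i, o in obs.items()}
        obs, rewards, done = env.step(actions)
        for i, r in rewards.items():
            totals[i] += r
    return totals

random.seed(0)
totals = run_episode(StubEnv(), policy=lambda o: random.choice([0, 1]))
```

Because every intermediate reward is zero, any learning signal must be propagated back across the whole horizon, which is the credit-assignment difficulty that feature (a) asks proposers to address at 24-hour scale.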

PHASE II: Develop algorithms which demonstrate autonomous agents that perform a variety of complex tasks, scale to the large sets of teams identified in Phase I, and can compete in adversarial games while possessing all of the required technical features: functioning over time horizons of at least twenty-four hours, cooperative planning, online learning without catastrophic forgetting, and temporal memory. These agents must function in OpenAI's Neural MMO: A Massively Multiagent Game Environment. Agent algorithms must be capable of being trained on a desktop- or server-class computer with a minimum of a 16-core CPU at 99th-percentile benchmark performance and a minimum of 8 GPUs at 99th-percentile benchmark performance. The agent algorithm must achieve a specified level of technical maturity.

PHASE III: Human-like agents that can perform human tasks at expert level or higher can be used for commercial factory automation, self-driving vehicles, and robot navigation. Government applications would include large scale virtual or constructive wargame simulations, cooperative drone swarms, and large-scale military logistics planning and support. 


KEYWORDS: Artificial Intelligence, Deep Learning, Reinforcement Learning, Real-Time Strategy Games, Meta-Learning, Deep Neuroevolution
