
Automated Imagery Annotation and Segmentation for Military Tactical Objects


TECHNOLOGY AREA(S): Information Systems

OBJECTIVE: Develop and demonstrate a capability to automatically generate image annotation and segmentation data from Full Motion Video (FMV) of complex military tactical objects.

DESCRIPTION: There is a growing need to expedite the manual image annotation and segmentation process that precedes algorithm development for vision-based sensor systems. Annotation (defining regions within an image) and segmentation (labeling pixels within an image) are data prerequisites to the development of computer vision-aided Automatic Target Recognition (ATR) algorithms, Machine Learning (ML), and Artificial Intelligence (AI) capabilities. Before ATR/ML/AI algorithms can be developed, FMV containing new content of interest must be meticulously annotated and segmented by a human-in-the-loop so that the algorithms “understand” the FMV content. This is an extremely expensive, labor-intensive task widely recognized as the single greatest bottleneck hindering algorithm development, ML, and AI. This effort will significantly reduce the level of effort required to manually annotate and segment tactically relevant information in FMV. Tactical military objects pose unique, additional challenges that commercial annotation and segmentation products do not address. Commercial computer vision-based autonomous systems designed for object detection are focused on autonomous vehicle technology, which emphasizes an entirely different application space: most tactical objects are designed to blend into the surrounding environment, are devoid of textual content, appear in unexpected locations and positions, and are dissimilar in appearance to the objects commercial products tend to focus on (i.e., text, persons, cars). Many advances have occurred in automated annotation and segmentation of FMV in commercial industry, driven by the requirements of self-driving automobiles; while similarities exist, annotation and segmentation of military tactical objects emphasize a different application space.
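The distinction the description draws between annotation (defining regions) and segmentation (labeling pixels) can be illustrated with a minimal sketch; the frame and object below are hypothetical toy data, not representative FMV content:

```python
# Hypothetical 8x8 binary frame from an FMV clip (1 = object pixel).
frame = [[0] * 8 for _ in range(8)]
for y in range(2, 5):
    for x in range(3, 7):
        frame[y][x] = 1

# Annotation: define a region (here, a bounding box) enclosing the object.
coords = [(x, y) for y, row in enumerate(frame)
          for x, v in enumerate(row) if v]
xs, ys = zip(*coords)
bbox = (min(xs), min(ys), max(xs), max(ys))  # (xmin, ymin, xmax, ymax)

# Segmentation: a per-pixel label map (here the binary frame itself
# serves as the mask; real masks carry one class label per pixel).
object_pixels = sum(sum(row) for row in frame)

print("bbox:", bbox)                    # (3, 2, 6, 4)
print("object pixels:", object_pixels)  # 12
```

A real system would produce both products per frame of FMV: region-level annotations for detection/identification and pixel-level masks for feature extraction.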
Although the application space is different, advances in state-of-the-art deep learning models for optical flow computation and semantic segmentation in the commercial sector suggest a strong possibility of success in performing autonomous annotation and segmentation with sufficient accuracy (>95%) for military applications. Typical annotation time varies by individual, but statistical studies indicate an average of 35 seconds per image for a given annotator. With existing semi-automated tools and various methods, an average time of approximately 7 seconds is achievable, but with an accuracy of no greater than 70%, which is too low for military applications. The optimal solution must automatically analyze high-resolution FMV of military tactical objects and produce XML metadata files that accurately annotate and segment the objects’ tactically relevant “features,” which are used by ATR/ML/AI algorithms operating on similar content of interest. Annotation/segmentation must support algorithms designed to confidently and consistently report attributes such as object classification, identification, and tactically relevant “features” such as the number of wheels, dimensions, track indicators, barrel length, antenna type/configuration, armament, camouflage, and other object attributes discernible by electro-optical and infrared imaging sensors. The capability must output XML data products consumable in many system architectures. The delivered capability should offer the user options to tailor the system’s processing to specific attributes sought by the algorithm developer. It may be acceptable to preload the system with known attributes of the objects within the FMV file and of the geospatial environment in which the FMV was captured.
Prioritized requirements for this capability include: 1) autonomously annotate and segment military tactical objects within FMV files, 2) extract target features from the object that enable ATR/ML/AI development, and 3) minimize the amount of time a person must invest in pre- and post-processing the FMV.
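As a rough illustration of the XML metadata product described above, the sketch below emits a Pascal VOC-style annotation file per frame. The element names and attribute keys (e.g. `track_indicators`) are illustrative assumptions, not a mandated schema:

```python
import xml.etree.ElementTree as ET

def write_annotation(filename, frame_id, objects):
    """Emit a Pascal VOC-style XML annotation file for one FMV frame.

    `objects` is a list of dicts, each with a class label, a bounding
    box, and free-form tactical attributes (wheels, barrel length, ...).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "frame").text = str(frame_id)
    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["name"]
        box = ET.SubElement(node, "bndbox")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            ET.SubElement(box, key).text = str(obj["bbox"][key])
        attrs = ET.SubElement(node, "attributes")
        for key, value in obj.get("attributes", {}).items():
            ET.SubElement(attrs, key).text = str(value)
    ET.ElementTree(root).write(filename)

# Hypothetical single-object frame.
write_annotation("frame_0001.xml", 1, [{
    "name": "tracked_vehicle",
    "bbox": {"xmin": 120, "ymin": 80, "xmax": 340, "ymax": 210},
    "attributes": {"wheels": 0, "track_indicators": "present"},
}])
```

Because the output is plain XML built from standard elements, it remains consumable by downstream ATR/ML/AI toolchains regardless of system architecture.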

PHASE I: The research effort shall explore technologies for automated image segmentation and annotation. Investigate and determine the characteristics of a solution that meets the requirements. Using a standard dataset (Pascal VOC) of 10,000 images, create a semi-automated solution that meets the following requirements: 1) 6-second average annotation time per image; 2) 95% average annotation accuracy across the entire 10,000-image dataset; 3) resulting annotated images must enable ATR/ML/AI engines to identify “cropped” objects with 5% or less non-object content; 4) segmentation must indicate the specified target features 95% of the time that those attributes are present in any image frame of the FMV; 5) output data products as XML metadata files that accurately annotate and segment the object’s tactically relevant “features” used by ATR/ML/AI algorithms. The primary deliverables are detailed design and analysis documentation demonstrating a proposed system that meets the requirements, and a demonstration of the research including the software components, capabilities, and methods to be used to achieve the solution. Develop proposal documentation for the solution for Phase II consideration.
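The Phase I metrics can be checked with straightforward scoring code. The sketch below assumes bounding-box annotations scored by intersection-over-union (one common reading of “annotation accuracy”; the solicitation does not fix the metric) and measures the 5%-or-less non-object-content requirement as the fraction of a crop not covered by the ground-truth box. All boxes and timings are hypothetical:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def non_object_fraction(crop_box, truth_box):
    """Fraction of a cropped region not covered by the true object."""
    ix = max(0, min(crop_box[2], truth_box[2]) - max(crop_box[0], truth_box[0]))
    iy = max(0, min(crop_box[3], truth_box[3]) - max(crop_box[1], truth_box[1]))
    crop_area = (crop_box[2] - crop_box[0]) * (crop_box[3] - crop_box[1])
    return 1.0 - (ix * iy) / crop_area

# Phase I thresholds, taken from the requirements above.
MAX_AVG_TIME_S = 6.0
MIN_AVG_ACCURACY = 0.95

# Hypothetical per-image results: (seconds, predicted box, ground-truth box).
results = [
    (5.2, (10, 10, 110, 110), (12, 12, 108, 108)),
    (6.1, (50, 40, 150, 140), (50, 40, 150, 142)),
]
avg_time = sum(r[0] for r in results) / len(results)
avg_acc = sum(iou(r[1], r[2]) for r in results) / len(results)
print(avg_time <= MAX_AVG_TIME_S, avg_acc >= MIN_AVG_ACCURACY)  # True True
```

Per-image time may exceed 6 seconds (as in the second result) so long as the dataset-wide average stays within the threshold, which is how the requirement is phrased.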

PHASE II: Phase II research should demonstrate the solution required to enable the capability. The focus of the demonstration must be the solution’s ability to achieve the requirements specified in Phase I using three different standard datasets, each with a minimum of 10,000 images. Additionally, research, design, develop, and integrate a fully automated (no human-in-the-loop) solution that meets the Phase I requirements, and demonstrate it using three different standard datasets, each with a minimum of 10,000 images. Deliver one semi-automated and one fully automated prototype to ARL for testing to validate that the fully automated system is capable of meeting the specified performance, including each of the primary requirements, along with updated documentation specifying all hardware, software, and firmware subsystems that define the entire solution. The system must meet all system performance specifications.

PHASE III: Further develop the platform into a fully functional product that can reliably perform fully automated (no human-in-the-loop) image annotation and segmentation, output data in the prescribed format, and provide the user effective options to precondition the system to produce tailored output. Given 10,000 images from FMV of five different tactical targets, the Phase III system must collectively demonstrate the requirements specified in Phase II, with a repeatability rate of 99% or better when exposed to different FMV image datasets of the same target. Commercial applications include the medical field, e.g., accurately screening patients for diseases such as cancer.
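The 99% repeatability requirement is not given a formal definition; one plausible reading, sketched below with hypothetical data, is the rate at which the system reports a consistent classification for the same target across different FMV datasets:

```python
def repeatability(labels_per_dataset, reference):
    """Fraction of per-frame labels, across all datasets, that agree
    with the reference classification of the target."""
    hits = sum(1 for labels in labels_per_dataset
               for label in labels if label == reference)
    total = sum(len(labels) for labels in labels_per_dataset)
    return hits / total

# Two hypothetical FMV datasets of the same target: 200 frames total,
# one frame misclassified.
runs = [["tank"] * 99 + ["truck"], ["tank"] * 100]
rate = repeatability(runs, "tank")
print(rate >= 0.99)  # True: 199/200 = 0.995
```

Other readings (e.g. repeatability of extracted feature attributes rather than class labels) are possible; an offeror would need to agree on the metric with the Government evaluator.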

KEYWORDS: Image annotation and segmentation, machine vision, ATR, machine learning, artificial intelligence

