Algorithm Performance Evaluation with Low Sample Size


TECHNOLOGY AREA(S): Information Systems


Develop novel techniques and metrics for evaluating machine learning-based computer vision algorithms with few examples of labeled overhead imagery.


The National Geospatial-Intelligence Agency (NGA) produces timely, accurate, and actionable geospatial intelligence (GEOINT) to support U.S. national security. To exploit the growing volume and diversity of data, NGA is seeking a solution to evaluate the performance of a class of algorithms for which only limited quantities of training and evaluation data samples are available. This is important because the statistical significance of evaluation results is directly tied to the size of the evaluation dataset. While significant effort has been put forth to train algorithms with low sample sizes of labeled data [1-2], open questions remain about the most representative evaluation techniques under the same constraint.
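As an illustration of how evaluation-set size limits what can be concluded, the sketch below computes a Wilson score interval for an observed detection rate at several sample sizes. It is a minimal example, not a method prescribed by this topic: the function name `wilson_interval`, the 80% observed rate, and the sample sizes are all assumptions chosen for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical scenario: the same 80% observed detection rate,
# measured on evaluation sets of increasing size.
for n in (20, 200, 2000):
    lo, hi = wilson_interval(round(0.8 * n), n)
    print(f"n={n:5d}  observed=0.80  95% interval=({lo:.3f}, {hi:.3f})")
```

The interval narrows as the number of labeled evaluation samples grows, so the same observed rate supports a much weaker conclusion at n=20 than at n=2000.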

Of specific interest to this solicitation are innovative approaches to rapid evaluation of computer vision algorithms at scale that use small quantities of labeled data samples and support extrapolation to larger data populations. The central challenge is evaluating performance across the proper range and dimension of data characteristics when the labeled data represents only a small portion of the potential operating conditions. For example, performance may need to be evaluated as a function of lighting conditions when most of the labeled data was collected under full sun.
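The lighting example can be made concrete with a small sketch. Assuming each evaluation result is tagged with its operating condition, the code below reports the per-condition rate and sample count, then compares the pooled rate with a rate reweighted to an assumed operational mix of conditions (simple post-stratification). The condition names, counts, and target mix are hypothetical, and the helper names are illustrative only.

```python
from collections import defaultdict


def per_condition_rates(records):
    """Return {condition: (success rate, sample count)}.

    records: iterable of (condition, hit) pairs with hit in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [hits, total]
    for condition, hit in records:
        counts[condition][0] += hit
        counts[condition][1] += 1
    return {c: (h / t, t) for c, (h, t) in counts.items()}


def reweighted_rate(rates, target_mix):
    """Post-stratified rate: weighted average of per-condition rates."""
    missing = set(target_mix) - set(rates)
    if missing:
        raise ValueError(f"no labeled samples for: {sorted(missing)}")
    total_weight = sum(target_mix.values())
    return sum(w * rates[c][0] for c, w in target_mix.items()) / total_weight


# Hypothetical results: most labeled samples were collected under full sun.
records = (
    [("full_sun", 1)] * 90 + [("full_sun", 0)] * 10
    + [("overcast", 1)] * 6 + [("overcast", 0)] * 4
    + [("dusk", 1)] * 2 + [("dusk", 0)] * 3
)
rates = per_condition_rates(records)
pooled = sum(hit for _, hit in records) / len(records)

# Assumed mix of conditions in the operational data population.
target_mix = {"full_sun": 0.4, "overcast": 0.4, "dusk": 0.2}
adjusted = reweighted_rate(rates, target_mix)

for condition, (rate, count) in rates.items():
    print(f"{condition:9s} rate={rate:.2f}  n={count}")
print(f"pooled={pooled:.3f}  reweighted={adjusted:.3f}")
```

The pooled figure is dominated by the well-sampled condition, while the reweighted figure depends heavily on strata with only a handful of samples, which is exactly where the uncertainty is largest.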

The study will be based on panchromatic electro-optical (EO) imagery using a subset (selected by the STTR participants) of the xView detection dataset, although extension to other sensing modalities is encouraged. Solutions with a mathematical basis are desired.


Develop and demonstrate methods and metrics to evaluate machine learning-based computer vision algorithm performance with low sample sizes of labeled EO imagery. The characteristics of the selected data subset should include variation across at least two operating conditions, such as geographic diversity and object size. Offerors should state the characteristics that will vary in their selected dataset. Offerors should detail the anticipated challenges associated with this problem and how they will address them, together with methods to provide uncertainty estimates for assessment results. Phase I will result in a proof-of-concept performance assessment on the selected dataset. Phase I will deliver all data collected or curated, and a final report that contains a description of the technical approach, the assessment results, and identified methods for extending the approach to different data sources and conditions.
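One common way to attach an uncertainty estimate to an assessment result is a percentile bootstrap over the evaluation samples. The sketch below is a minimal illustration under stated assumptions, not a required approach: the per-image scores are hypothetical, the function name `bootstrap_interval` is illustrative, and resampling is done at the image level on the assumption that images, rather than individual detections, are the independent units.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_interval(scores, stat=mean, n_resamples=2000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic of per-image scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores)
    resampled = sorted(
        stat([scores[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return resampled[lo_idx], resampled[hi_idx]


# Hypothetical per-image scores (for example, per-image F1) on a small set.
scores = [0.90, 0.70, 0.85, 0.40, 0.95, 0.60,
          0.80, 0.75, 0.30, 0.88, 0.92, 0.55]
low, high = bootstrap_interval(scores)
print(f"mean={mean(scores):.3f}  95% bootstrap interval=({low:.3f}, {high:.3f})")
```

With very few samples, percentile bootstrap intervals tend to be too narrow, so a result like this is better read as a lower bound on the uncertainty than as a calibrated interval.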


Develop refinements to address deficiencies identified in Phase I. Extend Phase I capabilities through application to video, infrared, or multi-spectral sensor imagery, and demonstrate against an operational dataset for both EO panchromatic imagery and the additional sensing type(s). Extend the Phase I dataset to include more sparsely represented data characteristics and additional variation. Deliverables include assessment results and code.


Virtually all domains face a lack of labeled data, so better prediction and understanding of an algorithm's likely performance, and of the potential range of that performance, given few examples for empirical evaluation, will have wide-ranging military and commercial applications. Military applications include assessing algorithms for automated tracking, search and rescue, and hazardous target detection; commercial applications also include tracking, search and rescue, and agriculture.

KEYWORDS: Performance Evaluation; Algorithm Assessment; Low Sample Size; Machine Learning; Deep Learning; Few Shot Learning; Unsupervised Learning


1. L. Fei-Fei, R. Fergus, and P. Perona. "One-Shot Learning of Object Categories." IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594-611, 2006.

2. W. Wang, et al. "A Survey of Zero-Shot Learning: Settings, Methods, and Applications." ACM Transactions on Intelligent Systems and Technology, 10(2), article 13, 2019.
