Self-Supervised Training in Geospatial Applications with a Robust Hierarchical Vision Transformer (STAR)
Phone: (301) 345-8664
Email: kemal.davaslioglu@gmail.com
Phone: (301) 345-8664
Email: cheeks@ut-services.com
Contact: Melissa Allen
Address:
Phone: (301) 345-8664
Type: Domestic Nonprofit Research Organization
Satellite imagery in Geospatial Intelligence (GEOINT), in conjunction with imagery intelligence (IMINT), geospatial information, and other intelligence sources, has greatly improved the capabilities of warfighters and decision makers, enabling them to gain a more comprehensive perspective, an in-depth understanding, and a cross-functional awareness of the operational environment. Artificial Intelligence (AI) and Deep Learning (DL) techniques have been among the main drivers of improved GEOINT capabilities because they extract knowledge from geospatial data at scale. However, current DL techniques require training on massive labeled datasets, which are costly and time-consuming to produce for satellite imagery, where a single image generally covers several square kilometers. In addition, many DL-based models (e.g., object detectors and semantic segmentation models) rely on a Convolutional Neural Network (CNN) backbone pretrained on an image classification task to extract visual features from the input image. Although these backbone networks play an important role in the performance of object detectors, they are pretrained on ground-level images, and the learned features do not transfer well to overhead satellite imagery. Recent advances in computer vision show a shift from CNN backbones to Transformer-based backbones to achieve state-of-the-art (SoTA) performance. However, there is currently no geospatial computer vision pipeline software that supports pretraining a model on satellite imagery while leveraging the benefits of self-supervised learning and hierarchical vision transformers. Such a solution would enable prototyping SoTA models for almost all geospatial downstream tasks. Thus, innovative solutions are needed to improve the training of DL-based computer vision algorithms for geospatial tasks using satellite imagery.
* Information listed above is at the time of submission. *