Multi-Modal Knowledge Acquisition from Documents

Award Information
Agency: Department of Defense
Branch: Navy
Contract: N00014-10-M-0296
Agency Tracking Number: N10A-019-0065
Amount: $69,908.00
Phase: Phase I
Program: STTR
Awards Year: 2010
Solicitation Year: 2010
Solicitation Topic Code: N10A-T019
Solicitation Number: 2010.A
Small Business Information
ObjectVideo
11600 Sunrise Valley Drive, Suite 290, Reston, VA, 20191
DUNS: 038732173
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
 Gaurav Aggarwal
 Principal Investigator
 (703) 654-9300
 gaggarwal@objectvideo.com
Business Contact
 Paul Brewer
Title: VP, New Technology
Phone: (703) 654-9314
Email: pbrewer@objectvideo.com
Research Institution
 University of Arizona
 Kobus Barnard
 1040 E. 4th Street
Gould-Simpson Building
Tucson, AZ, 85721
 (520) 621-4632
 Nonprofit college or university
Abstract
Images with associated text are now available in vast quantities and provide a rich resource for mining the relationship between visual information and the semantics encoded in language. In particular, the quantity of such data means that sophisticated machine learning approaches can be applied to learn effective models for objects, backgrounds, and scenes. Such understanding can then be used to (1) understand, label, and index images that do not have text, and (2) augment the semantic understanding of images that do have text. This points to great potential for searching, browsing, and mining documents that contain image data. To this end, this STTR effort proposes a pipeline-based framework that focuses on the difficult task of text-image alignment (or correspondence). The proposed pipeline will take images and their associated text as input and reduce correspondence ambiguity in stages. The framework will include both feed-forward and feedback controls that pass partially inferred information from one stage to another, enriching the information available at each stage and potentially providing inputs for learning and understanding novel objects and concepts. Ideas from both stochastic grammar representations and (joint) probabilistic representations will be investigated to facilitate the modeling of text-image associations and the visual modeling of objects, scenes, etc.

* Information listed above is as of the time of submission.
