Multi-Modal Knowledge Acquisition from Documents
Title: Principal Investigator
Phone: (703) 654-9300
Email: gaggarwal@objectvideo.com
Title: VP, New Technology
Phone: (703) 654-9314
Email: pbrewer@objectvideo.com
Contact: Kobus Barnard
Address:
Phone: (520) 621-4632
Type: Nonprofit College or University
Images with associated text are now available in vast quantities and provide a rich resource for mining the relationship between visual information and the semantics encoded in language. In particular, the sheer quantity of such data means that sophisticated machine learning approaches can be applied to learn effective models of objects, backgrounds, and scenes. Such models can then be used to (1) understand, label, and index images that have no text, and (2) augment the semantic understanding of images that do. This points to great potential for searching, browsing, and mining documents that contain image data. To this end, this STTR effort proposes a pipeline-based framework that focuses on the difficult task of text-image alignment (or correspondence). The proposed pipeline will take images and their associated text as input and reduce correspondence ambiguity in stages. The framework will include both feed-forward and feedback controls that pass partially inferred information from one stage to another, enriching the information available at each stage and potentially providing inputs for learning and understanding novel objects and concepts. Ideas from both stochastic grammar representations and (joint) probabilistic representations will be investigated to facilitate the modeling of text-image associations and the visual modeling of objects, scenes, etc.
* Information listed above is at the time of submission. *