Context-sensitive Content Extraction and Scene Understanding
Agency / Branch:
DOD / NAVY
Over the last few decades, progress toward automatic video understanding has been driven almost exclusively by improvements in bottom-up image and video analysis. As was the case with the earlier top-down approaches, purely bottom-up approaches have run into fundamental limits. This proposal builds on recent advances in top-down/bottom-up information fusion to exploit the syntactic and semantic information inherent in both approaches. Bottom-up information describes scene elements, moving objects, and their interactions; top-down information encapsulates object- and domain-specific knowledge and will be represented using stochastic attribute grammars. Grammars, studied mostly in the context of natural language, are known for their expressive power. Transferring the idea of a grammar from language to vision, ObjectVideo proposes to define a visual vocabulary of pixels, primitives, parts, objects, and scenes, and to specify their spatio-temporal and compositional relations. A stochastic bottom-up/top-down inference strategy will use this representation for efficient and accurate content extraction, and the resulting detailed representation will be used to automatically annotate imagery with its content and context. At the completion of Phase I, ObjectVideo will demonstrate a proof-of-concept system for syntactic, semantic, and conceptual content extraction from maritime and urban imagery.
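The abstract does not publish the proposed grammar itself, but the core idea of a stochastic grammar over a visual vocabulary can be sketched in a few lines. The sketch below is purely illustrative: the symbols (Scene, Vessel, Hull, etc.) and all probabilities are invented stand-ins for whatever vocabulary and learned weights such a system would actually use. Each nonterminal maps to weighted expansions, and a derivation is sampled top-down.

```python
import random

# Hypothetical stochastic grammar over a visual vocabulary.
# All symbols and probabilities below are invented for illustration.
# Each nonterminal maps to a list of (probability, expansion) pairs;
# the probabilities for each nonterminal sum to 1.
RULES = {
    "Scene":   [(0.6, ["Water", "Vessel"]), (0.4, ["Road", "Vehicle"])],
    "Vessel":  [(0.7, ["Hull", "Wake"]),    (0.3, ["Hull"])],
    "Vehicle": [(1.0, ["Body", "Shadow"])],
}

def expand(symbol, rng):
    """Recursively sample a derivation; symbols without rules are terminals."""
    if symbol not in RULES:
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, rhs in RULES[symbol]:
        acc += prob
        if r <= acc:
            return [t for s in rhs for t in expand(s, rng)]
    # Guard against floating-point rounding: fall back to the last rule.
    return [t for s in RULES[symbol][-1][1] for t in expand(s, rng)]

# Sample one scene description (seeded for reproducibility).
print(expand("Scene", random.Random(0)))
```

In a full system the expansions would also carry attributes (position, time, appearance) and inference would run both ways: bottom-up detections propose terminals, and the grammar scores and completes them top-down, rather than sampling blindly as this toy does.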
Small Business Information at Submission:
11600 Sunrise Valley Drive Suite # 290 Reston, VA 20191
Number of Employees: