Generic Automatic Recognition System for Handwritten Arabic-Style Script Documents

Award Information
Agency: Department of Defense
Branch: Army
Amount: $729,973.00
Award Year: 2009
Program: SBIR
Phase: Phase II
Contract: W911QX-09-C-0096
Agency Tracking Number: A072-057-2621
Solicitation Year: 2007
Solicitation Topic Code: A07-057
Solicitation Number: 2007.2
Small Business Information
Optimal Synthesis Inc.
95 First Street, Suite 240, Los Altos, CA, 94022
HUBZone Owned: N
Socially and Economically Disadvantaged: Y
Woman Owned: N
DUNS: 829385509
Principal Investigator
Hui-Ling Lu
Title: Director, Signal Processi
Phone: (650) 559-8585
Email: vicky@optisyn.com
Business Contact
P. Menon
Title: President
Phone: (650) 559-8585
Email: menon@optisyn.com
Research Institution
N/A
Abstract
This project addresses the development of a generic system framework for automatically recognizing handwritten text in non-Arabic languages that use Arabic-style script, such as Urdu or Pashto. The goal of the Phase II work is to develop a prototype of the generic handwritten Arabic-style script recognition system that can screen Urdu documents, such as personal letters, for key terms and general subject matter. The proposed Phase II SBIR builds on a successful feasibility demonstration of a handwritten Urdu word recognition system carried out during the Phase I SBIR project. The Phase I project demonstrated that a generic recognition framework for non-Arabic languages using Arabic-style script can be built on the Hidden Markov Model (HMM) approach. We developed and evaluated several types of feature extraction methods within the HMM recognition framework. In particular, we developed a novel Contourlet-based feature extraction algorithm that exploits the cursive nature of Arabic-style scripts. To further improve recognition performance, we also developed more elaborate feature extraction approaches that integrate the Contourlet-based features with Graph-based features. The script recognition system was evaluated on a handwritten Urdu database collected during Phase I. Experimental results show that both the Contourlet-based feature extraction method and the integrated Contourlet- and Graph-based feature extraction methods outperform state-of-the-art baseline approaches. Building on this Phase I feasibility study, the Phase II work will develop a prototype system capable of recognizing personal letters written in Urdu, and will determine its basic performance. The prototype will serve as the baseline system for integration with the software packages being developed under the Army's Sequoyah Machine Language Translation program.
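The abstract states that recognition is built on the Hidden Markov Model approach. It does not describe the model topology, the feature quantization, or the decoding details. The following is a minimal sketch of generic whole-word HMM classification under stated assumptions. Each vocabulary word has its own left-to-right HMM. The extracted features (for example, Contourlet- or Graph-based vectors) are assumed to be already quantized into a small codebook of discrete symbols. An input word image is assigned to the model with the highest forward-algorithm log-likelihood. The word labels `WORD_A` and `WORD_B` and all probabilities are hypothetical toy values. This is not the project's implementation.

```python
import math

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) for a discrete-output HMM via the forward algorithm.

    obs:   sequence of codebook symbol indices (assumed quantized features)
    start: start[i] = P(first state = i)
    trans: trans[i][j] = P(next state = j | current state = i)
    emit:  emit[i][k] = P(symbol k | state i)
    """
    n = len(start)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1), kept in log space.
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1}).
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    # Termination: P(obs) = sum_i alpha_T(i).
    return logsumexp(alpha)

def recognize(obs, word_models):
    """Pick the word whose HMM gives the observation sequence the highest likelihood."""
    return max(word_models, key=lambda w: forward_log_likelihood(obs, *word_models[w]))

# Hypothetical 2-state left-to-right models over a 3-symbol codebook.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "WORD_A": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "WORD_B": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))  # WORD_A
    print(recognize([2, 2, 1, 1], MODELS))  # WORD_B
```

A practical system would more likely use continuous (for example, Gaussian-mixture) emissions and character-level models concatenated into words, as is common in HMM handwriting recognition. The discrete whole-word form above is chosen only to keep the sketch short.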

* Information listed above is as of the time of submission.
