
SBIR Phase II: Incorporation of Knowledge Base into Statistical Machine Translation

Award Information
Agency: National Science Foundation
Branch: N/A
Contract: 0548763
Agency Tracking Number: 0441891
Amount: $500,000.00
Phase: Phase II
Program: SBIR
Solicitation Topic Code: IT
Solicitation Number: NSF 04-551
Timeline
Solicitation Year: 2004
Award Year: 2006
Award Start Date (Proposal Award Date): N/A
Award End Date (Contract End Date): N/A
Small Business Information
465 Fairchild Drive
Mountain View, CA 94043
United States
DUNS: N/A
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Yookyung Kim
Title: Dr
Phone: (650) 864-9900
Email: kim@sehda.com
Business Contact
 Farzad Ehsani
Title: Mr
Phone: (650) 864-9900
Email: farzad@sehda.com
Research Institution
N/A
Abstract

This Small Business Innovation Research (SBIR) Phase II project proposes an innovative approach to machine translation. The proposed model aims to overcome two important bottlenecks in the development of a high-quality statistical machine translation (SMT) system: (1) the inability to handle structural problems and (2) the dependence on huge amounts of parallel text. The inability of statistical methods to adequately handle grammatical problems such as word order becomes more evident when the two languages differ greatly in structure and morphology, as English and Korean do. The dependence on huge amounts of parallel text is a particular challenge for speech translation. Building on successful tests in the Phase I project, this project proposes a method that automatically learns from input the linguistic knowledge crucial to handling word order and non-local dependencies, and incorporates it into SMT along with simple transformations. This maximizes the strengths of both knowledge-based and statistical approaches and minimizes the need for ever-increasing amounts of bilingual data. The proposed approach aims to build a syntactic-phrase-based statistical machine translation engine that is not only more accurate than existing word-based engines but also less dependent on large data sources.
The primary impact of the proposed project is the potential to achieve automatic translation quality as high as that of the best knowledge-based machine translation engines, but with minimal handcrafting of knowledge and therefore at a much lower cost in development time and human resources. While the research is specifically concerned with MT between English and Korean, the resulting translation models would potentially be usable for translation between any pair of languages.
The results of the research will be used to develop a speech translation device, in particular to overcome language barriers in communication with patients in hospitals. It will provide a key technology that will accelerate the development of speech translation applications, reducing costs for healthcare providers and enhancing the quality of healthcare. Additionally, the proposed method of learning linguistic features will have an impact on many different applications, including speech recognition, search engines, genre and topic detection, and document search and query. Finally, the proposed research will have beneficial impacts nationally and globally by helping to solve the 'automatic translation' problem, an area of paramount importance to the economic welfare and security of the United States and the rest of the world.

* Information listed above is at the time of submission. *
