Flexible NLP system for MEDLINE information extraction

Award Information
Agency:
Department of Health and Human Services
Branch:
n/a
Amount:
$100,000.00
Award Year:
2003
Program:
SBIR
Phase:
Phase I
Contract:
1R43GM067276-01A1
Award Id:
66515
Agency Tracking Number:
GM067276
Solicitation Year:
n/a
Solicitation Topic Code:
n/a
Solicitation Number:
n/a
Small Business Information
ARIADNE GENOMICS, INC., 9700 GREAT SENECA HWY, ROCKVILLE, MD, 20850
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
n/a
Principal Investigator:
NIKOLAI DARASELIA
(240) 453-6296
NIKOLAI@ARIADNEGENOMICS.COM
Business Contact:
ILYA MAZO
(240) 453-6296
MAZOILYA@ARIADNEGENOMICS.COM
Research Institute:
n/a
Abstract
DESCRIPTION (provided by applicant): This Small Business Innovation Research Phase I project focuses on the development of a fully automatic system for extracting protein function information from MEDLINE abstracts and converting it into the form of a conceptual graph. All existing protein function databases depend on human experts, who cannot keep up with the exponential growth of protein function information freely available in MEDLINE. There is an urgent need for an automatic system capable of extracting protein function information from the literature. The proposed system will be based on advanced natural language processing (NLP) technologies, used as a fast and reliable way to extract information about protein function from human-readable sources. To this end, we have developed and tested MedScan, a prototype of such a system that parses scientific abstracts and converts protein function information into the form of a conceptual graph. It consists of a preprocessor module that selects candidate sentences from MEDLINE, an NLP module that uses a proprietary linguistic model to parse the selected sentences, and an information extraction module that uses an ontology we developed to extract and validate protein function information. The results of the MedScan evaluation indicate that it is a feasible candidate for the proposed task. In Phase II, the software system will be developed to help researchers quickly access, search, and navigate MEDLINE content, and to visualize and analyze large volumes of protein function data. We will also extend our approach to other areas, including pharmacogenomics and the extraction of clinically relevant information.
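
The three-module design described in the abstract (preprocessor, NLP parser, ontology-based extraction into a conceptual graph) can be illustrated with a toy sketch. This is not MedScan's implementation: the protein list, the relation verbs, the function names, and the simple "X verb Y" token pattern standing in for the proprietary linguistic model are all hypothetical, chosen only to show how the stages might connect.

```python
import re

# Hypothetical toy ontology (an assumption, not MedScan's ontology):
# a set of known protein names and a map from verbs to relation types.
PROTEINS = {"p53", "MDM2", "BRCA1"}
RELATIONS = {"binds": "binding", "inhibits": "inhibition", "activates": "activation"}


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def select_candidates(abstract):
    """Preprocessor stage: keep sentences naming at least two known proteins."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [s for s in sentences if len(set(tokenize(s)) & PROTEINS) >= 2]


def parse(sentence):
    """Stand-in for the NLP stage: collect (left, verb, right) token triples."""
    tokens = tokenize(sentence)
    return [
        (tokens[i - 1], tokens[i], tokens[i + 1])
        for i in range(1, len(tokens) - 1)
        if tokens[i] in RELATIONS
    ]


def extract(triples):
    """Extraction stage: keep only triples whose arguments are in the ontology."""
    return [
        (subj, RELATIONS[verb], obj)
        for subj, verb, obj in triples
        if subj in PROTEINS and obj in PROTEINS
    ]


def build_graph(abstract):
    """Run all three stages and return the graph as an adjacency mapping."""
    graph = {}
    for sentence in select_candidates(abstract):
        for subj, relation, obj in extract(parse(sentence)):
            graph.setdefault(subj, []).append((relation, obj))
    return graph


if __name__ == "__main__":
    text = "MDM2 inhibits p53. The weather is nice. BRCA1 binds p53 in vitro."
    print(build_graph(text))
```

In this sketch the graph is a plain dictionary from a source protein to a list of (relation, target) pairs; a real system would need a far richer parser and ontology than the token pattern used here.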

* Information listed above is as of the time of submission.
