Extracting Semantic Knowledge from Clinical Reports

Award Information
Agency: Department of Health and Human Services
Branch: N/A
Contract: 9R44RR024929-02
Agency Tracking Number: LM008974
Amount: $852,683.00
Phase: Phase II
Program: SBIR
Awards Year: 2008
Solicitation Year: 2008
Solicitation Topic Code: N/A
Solicitation Number: PHS2007-2
Small Business Information
DUNS: 144814790
HUBZone Owned: Y
Woman Owned: Y
Socially and Economically Disadvantaged: Y
Business Contact
Phone: (317) 274-4829
Email: pjamieson@logicalsemantics.com
DESCRIPTION (provided by applicant): Analyzing and processing free-text medical reports for data mining and clinical data interchange is one of the most challenging problems in medical informatics, yet it is crucial for continued research advances and improvements in clinical care. Natural language processing (NLP) is an important enabling technology, but its progress has been held back because understanding human language is difficult and requires extensive domain knowledge. In Phase I, we developed new statistical and machine learning methods that apply domain-specific knowledge to the semantic analysis of free-text radiology reports. These methods enabled the creation of two new prototype applications: a SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) coding service called SnomedCoder, and a text mining tool for analyzing a large corpus of medical reports, called DataMiner. In Phase II, we will accomplish the following specific aims: 1) improve the semantic extraction methods developed in Phase I; 2) expand the semantic knowledge base and classify at least two million new unique sentences from multiple medical institutions; 3) provide a SNOMED CT auto-coding service (alpha service) to participating Indiana Health Information Exchange hospitals; and 4) build a commercial version of the DataMiner software and test its functionality with researchers at the Regenstrief Institute. These scientific innovations will revolutionize the ability of health care researchers to analyze vast repositories of clinical information currently locked up in electronic medical records, and to correlate these data with new biomedical discoveries in proteomics and genomics. The ability to codify text rapidly will extend the potential for clinical decision support beyond its narrow base of numeric and structured medical data, and will enable SNOMED CT to become a useful coding standard.
Phase III will offer coding and data mining services to healthcare payers (both private and government), pharmaceutical companies, and academic researchers. A key advantage of our approach over other NLP systems is that we attempt to codify all of the information in a report rather than a limited subset, and we require expert validation, which provides a high degree of confidence in the accuracy of the coded data.

* Information listed above is as of the time of submission.