Clinical Text Automatic De-Identification to Support Large Scale Data Reuse and Sharing

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 2R42GM116479-02A1
Agency Tracking Number: R42GM116479
Amount: $1,484,562.00
Phase: Phase II
Program: STTR
Solicitation Topic Code: 400
Solicitation Number: PA18-575
Timeline
Solicitation Year: 2018
Award Year: 2020
Award Start Date (Proposal Award Date): 2020-02-05
Award End Date (Contract End Date): 2022-01-31
Small Business Information
307 W 200 S, SUITE 3004
Salt Lake City, UT 84101-1282
United States
DUNS: 078649023
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 STEPHANE MEYSTRE
 Phone: (843) 792-0015
 Email: meystre@musc.edu
Business Contact
 STEPHANE MEYSTRE
Phone: (801) 784-8184
Email: smeystre@clinacuity.com
Research Institution
 MEDICAL UNIVERSITY OF SOUTH CAROLINA
 
1 SOUTH PARK CIRCLE - BUILDING 1, SUITE 506
CHARLESTON, SC 29407-4636
United States

 Nonprofit College or University
Abstract

The adoption of Electronic Health Record (EHR) systems is growing at a fast pace in the U.S., and this
growth results in very large quantities of patient clinical data becoming available in electronic format with
tremendous potential but an equally large concern for patient confidentiality breaches. Secondary use of
clinical data is essential to fulfill the potential for high quality healthcare, improved healthcare management,
and effective clinical research. NIH expects that larger research projects share their research data in a way
that protects the confidentiality of research subjects. De-identification of patient data has been proposed as a
solution to both facilitate secondary use of clinical data and protect patient data confidentiality. The majority of
clinical data in the EHR is stored as narrative clinical notes, and manual de-identification of clinical text is
tedious and costly. Automated approaches based on Natural Language Processing (NLP) have been implemented and
evaluated, offering higher accuracy and much faster de-identification than manual approaches.
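As a rough illustration of automated text de-identification, the following sketch replaces a few identifier types in a clinical note with category tags using hand-written regular expressions. It is a simplified rule-based example only, not the Clinacuity system (this abstract does not describe its methods), and the `PHI_PATTERNS` table and `deidentify` function are hypothetical; real systems cover many more identifier categories and formats.

```python
import re

# Hypothetical, simplified patterns for three identifier categories.
# Assumption: MRNs appear with an explicit "MRN" label before the digits.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:# ]*\d{6,10}\b", re.IGNORECASE)),
]


def deidentify(text: str) -> str:
    """Replace each pattern match with a bracketed category tag."""
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen 02/05/2020, call (843) 792-0015, MRN: 1234567."
print(deidentify(note))  # Seen [DATE], call [PHONE], [MRN].
```

Pattern lists like this are brittle (names and free-form dates are hard to capture with regular expressions), which is one reason NLP systems typically combine rules with machine-learned models.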
Clinacuity, Inc. proposes to advance a text de-identification system from a prototype to an accurate,
adaptable, and robust system, integrated into the research infrastructure at our implementation and testing
site (Medical University of South Carolina, Charleston, SC), and ready for commercialization efforts. To
accomplish this undertaking, we will focus on the following specific aims and related objectives, while
continuing to prepare for commercialization of the integrated system through detailed market analysis,
commercial roadmap development, and modern media communication:
1) Enhance the text de-identification system's performance, scalability, and quality to produce an
enterprise-grade solution ready for deployment.
2) Enable the use of structured data for enhanced text de-identification (when structured PII is available) and
for complete patient record de-identification (i.e., records combining structured and unstructured data). This
aim also includes implementing “one-way” pseudo-identifier cryptographic hashing to enable secure linking of
already de-identified patient records.
3) Integrate the text de-identification system with a research data capture and management system. This
includes implementing the de-identification system as a secure web service with standards-based access and
integration.
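The “one-way” pseudo-identifier hashing in aim 2 can be sketched as follows, assuming a keyed hash (HMAC-SHA256 from Python's standard library) and a secret key held only by the party performing de-identification. A keyed hash is chosen here because an unkeyed hash of a record number or birth date could be reversed by hashing every plausible value; the abstract does not state which construction the project uses. The `pseudo_identifier` function, the fields hashed, and the normalization rules are hypothetical.

```python
import hashlib
import hmac


def pseudo_identifier(secret_key: bytes, *fields: str) -> str:
    """Derive a one-way, keyed pseudo-identifier from identifying fields."""
    # Normalize so trivial formatting differences ("Doe " vs "doe") still link.
    # Assumption: fields never contain "|"; a real design needs an unambiguous encoding.
    normalized = "|".join(f.strip().lower() for f in fields)
    # HMAC-SHA256: without the key, the token cannot be recomputed or inverted.
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-shared-secret"  # hypothetical key, for illustration only
token_a = pseudo_identifier(key, "Jane", "Doe", "1970-01-01")
token_b = pseudo_identifier(key, " jane", "DOE ", "1970-01-01")
print(token_a == token_b)  # True: both records map to the same token
```

Records de-identified separately with the same key yield the same token for the same patient, so they can be linked without exposing the underlying identifiers, while a different key produces unrelated tokens.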
This de-identification system has potential commercial applications in clinical research and in healthcare
settings. It will improve access to richer, more detailed, and more accurate clinical data (in clinical text) for
clinical researchers. It will ease research data sharing (as expected for larger NIH-funded research projects)
and help healthcare organizations protect patient data confidentiality. It will also offer significant time
savings, with a process at least 200 to 1,000 times faster than manual de-identification.

The adoption of Electronic Health Record systems is growing at a fast pace in the U.S., and this growth
results in very large quantities of patient clinical data becoming available in electronic format, with
tremendous potential, but also equally growing concern for patient confidentiality breaches. De-identification
of patient data has been proposed as a solution to both facilitate secondary uses of clinical data and protect
patient data confidentiality. This project will advance a text de-identification system from a prototype to an
accurate, adaptable, and robust system allowing for complete patient record de-identification, integrated into
the research infrastructure at our implementation and testing site and ready for commercialization efforts. It
will improve access to richer, more detailed, and more accurate clinical data for clinical researchers, ease
research data sharing and help healthcare organizations protect patient data confidentiality.

* Information listed above is at the time of submission. *
