Clinical Text Automatic De-Identification to Support Large Scale Data Reuse and Sharing
Phone: (843) 792-0015
Email: meystre@musc.edu
Phone: (801) 784-8184
Email: smeystre@clinacuity.com
Address:
Type: Nonprofit College or University
The adoption of Electronic Health Record (EHR) systems is growing at a fast pace in the U.S., and this
growth results in very large quantities of patient clinical data becoming available in electronic format with
tremendous potential but an equally large concern for patient confidentiality breaches. Secondary use of
clinical data is essential to fulfill the potential for high quality healthcare, improved healthcare management,
and effective clinical research. NIH expects that larger research projects share their research data in a way
that protects the confidentiality of research subjects. De-identification of patient data has been proposed as a
solution to both facilitate secondary use of clinical data and protect patient data confidentiality. The majority of
clinical data found in the EHR is represented as narrative text clinical notes, and de-identification of clinical
text is a tedious and costly manual endeavor. Automated approaches based on Natural Language
Processing have been implemented and evaluated, allowing for higher accuracy and much faster de-
identification than manual approaches.
Clinacuity, Inc. proposes to advance a text de-identification system from a prototype to an accurate,
adaptable, and robust system, integrated into the research infrastructure at our implementation and testing
site (Medical University of South Carolina, Charleston, SC), and ready for commercialization efforts. To
accomplish this undertaking, we will focus on the following specific aims and related objectives, while
continuing to prepare the commercialization of the integrated system, with detailed market analysis,
commercial roadmap development, and modern media communication: 1) Enhance the text de-identification
system performance, scalability, and quality to produce an enterprise-grade solution ready for deployment; 2)
Enable use of structured data for enhanced text de-identification (when structured PII is available) and for
complete patient records de-identification (i.e., records combining structured and unstructured data). This aim
also includes implementing “one-way” pseudo-identifier cryptographic hashing to enable securely linking
already de-identified patient records; 3) Integrate the text de-identification system with a research data
capture and management system. This includes implementation of the de-identification system as a secure
web service, with standards-based access and integration.
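The second aim can be illustrated with a minimal sketch. It is not the Clinacuity implementation: the abstract names only "one-way" pseudo-identifier cryptographic hashing, so the choice of HMAC-SHA256, the function names `pseudo_id` and `redact_known_pii`, the key, and the sample note are all assumptions made for illustration. The sketch shows how known structured PII values could drive redaction of the note text, and how a keyed hash of a record number could give the same pseudo-identifier on every run, so that already de-identified records remain linkable.

```python
import hashlib
import hmac
import re


def pseudo_id(identifier: str, secret_key: bytes) -> str:
    """Return a one-way, keyed pseudo-identifier (HMAC-SHA256, hex-encoded).

    HMAC-SHA256 is an assumed choice; the source only says "one-way" hashing.
    """
    # Normalize so that trivially different spellings map to the same hash.
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def redact_known_pii(text: str, known_values: dict[str, str]) -> str:
    """Replace each known PII value found in the text with a [CATEGORY] tag."""
    for category, value in known_values.items():
        tag = f"[{category}]"
        pattern = re.compile(re.escape(value), re.IGNORECASE)
        # A function replacement avoids any escape processing of the tag.
        text = pattern.sub(lambda _match, tag=tag: tag, text)
    return text


# Hypothetical inputs: a site-held secret key and one structured record.
SECRET_KEY = b"site-specific-secret"
structured_pii = {"NAME": "John Smith", "MRN": "12345"}
note = "John Smith (MRN 12345) was seen today."

redacted_note = redact_known_pii(note, structured_pii)
linkage_id = pseudo_id(structured_pii["MRN"], SECRET_KEY)

print(redacted_note)  # [NAME] (MRN [MRN]) was seen today.
print(linkage_id)     # 64 hexadecimal characters, stable for this key
```

A keyed hash is used here rather than a plain digest because record numbers come from a small value space and an unkeyed hash could be reversed by enumeration. Exact string matching against structured values is only a complement to, not a substitute for, the Natural Language Processing approach described above, since it misses misspellings and PII absent from the structured data.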
This de-identification system has potential commercial applications in clinical research and in healthcare
settings. It will improve access to richer, more detailed, and more accurate clinical data (in clinical text) for
clinical researchers. It will ease research data sharing (as expected for larger NIH-funded research projects)
and help healthcare organizations protect patient data confidentiality. It will also offer significant time
savings, with a process at least 200 to 1,000 times faster than manual de-identification.
The adoption of Electronic Health Record systems is growing at a fast pace in the U.S., and this growth
results in very large quantities of patient clinical data becoming available in electronic format, with
tremendous potential, but also an equally growing concern for patient confidentiality breaches. De-identification
of patient data has been proposed as a solution to both facilitate secondary uses of clinical data and protect
patient data confidentiality. This project will advance a text de-identification system from a prototype to an
accurate, adaptable and robust system allowing for complete patient records de-identification, integrated in
the research infrastructure at our implementation and testing site and ready for commercialization efforts. It
will improve access to richer, more detailed, and more accurate clinical data for clinical researchers, ease
research data sharing and help healthcare organizations protect patient data confidentiality.
* Information listed above is at the time of submission. *