
De-Identification Software Tools and Pipelines for Cancer Imaging Research


Fast-Track proposals will NOT be accepted.
Direct-to-Phase II proposals will NOT be accepted.
Number of Anticipated Awards: 3-5
Phase I: up to $400,000 for up to 9 months
Phase II: up to $2,000,000 for up to 2 years
PROPOSALS THAT EXCEED THE BUDGET OR PROJECT DURATION LISTED ABOVE MAY NOT BE FUNDED.

Summary

Imaging data is a core component in the development of the National Cancer Data Ecosystem, important in areas from basic research to diagnostics and surveillance. Sharing any data collected from patients, however, first requires the removal of Protected Health Information (PHI) and personally identifiable information (PII), which can be used to identify the individual from whom the data were collected. Image de-identification, or anonymization, refers to the removal of PHI/PII from imaging data.

In digital pathology, PHI can be found in the slide label as well as the file header. In clinical imaging, where images are commonly in the DICOM (Digital Imaging and Communications in Medicine) format, PHI is contained in the header of each image file and at times may be embedded in the image itself. For example, the imaging acquisition software may insert PHI in the image field, or the patient may be wearing identifiable jewelry or a personal tag that is captured in an image. Both the file header and the image field itself must be examined for information that could link the file to a specific individual. In headers, PHI is found in patient identifier fields, such as patient name, patient number, and date of birth, and at times in fields not intended to contain such information. In addition, in certain instances, individuals may be identifiable by PII obtained through 3D reconstruction of the face or body surface from tomographic data such as computed tomography (CT) or magnetic resonance imaging (MRI).

The complexity of the de-identification problem means that a substantial amount of human curation is required to ensure proper and complete removal of PHI from images. The need for extensive human participation in the de-identification process impedes the generation of anonymized image collections suitable for public distribution and sharing, including deposition into components of the National Cancer Data Ecosystem like The Cancer Imaging Archive (TCIA) and the Imaging Data Commons of the Cancer Research Data Commons. The goal of this concept is to support the development of software tools that comprehensively de-identify images by removing PHI and PII from image files generated by clinical imaging and/or whole slide imaging (WSI) modalities while retaining the metadata needed for interoperability.

Project Goals

While multiple tools exist to remove protected data from image files, particularly DICOM radiology files, they may not thoroughly remove PHI from unexpected DICOM fields or from the image field itself. In addition, other image formats, such as proprietary WSI files and other microscope image formats, also contain PHI. Proper de-identification of patient imaging files requires careful analysis and remediation of two components of those files: the header and the image field. The goal of this contract topic is to support development and sustainment of software tools and pipelines for image de-identification, specifically for images produced by radiologic and pathology imaging modalities. Within that goal, the following objectives should be met:
1) Removal of PHI from expected fields in multiple imaging formats
2) Scanning for PHI in fields not designed to hold it, followed by identification and removal
3) Scanning of image data for PHI and PII, followed by identification, labeling, and resolution
4) Production of processed images that meet a threshold level of de-identification

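As a rough illustration of objectives 1 and 2, the sketch below blanks a set of expected PHI fields, scans free-text fields for PHI-like patterns, and leaves research metadata untouched. It models a file header as a plain Python dictionary rather than parsing a real image file. The field lists, the regular expressions, and the function name deidentify_header are assumptions made for this example and are not part of the topic; a real tool would need a much broader detector and format-specific readers.

```python
import re

# Hypothetical policy for this sketch: fields expected to hold PHI,
# and free-text fields that should be scanned for stray PHI.
PHI_FIELDS = {"PatientName", "PatientID", "PatientBirthDate"}
FREE_TEXT_FIELDS = {"StudyDescription", "ImageComments"}

# Illustrative patterns only; they catch a small fraction of real PHI.
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "record_number": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def deidentify_header(header):
    """Return (clean_header, log) without modifying the input header.

    The log records field names and actions, never the PHI values.
    """
    clean = dict(header)
    log = []

    # Objective 1: blank the fields that are expected to contain PHI.
    for name in sorted(PHI_FIELDS):
        if clean.get(name):
            clean[name] = ""
            log.append((name, "removed", "expected PHI field"))

    # Objective 2: redact PHI-like text found in fields not meant for it.
    for name in sorted(FREE_TEXT_FIELDS):
        if name not in clean:
            continue
        text = clean[name]
        for label, pattern in PHI_PATTERNS.items():
            text, count = pattern.subn("[REDACTED]", text)
            if count:
                log.append((name, "altered", f"{count} {label} match(es)"))
        clean[name] = text

    # All other fields (e.g. Modality, SliceThickness) are retained as-is.
    return clean, log
```

Calling deidentify_header on a header that carries a patient name plus a comment such as "Seen 03/14/2021, MRN: 998877" would return a copy with the name blanked, the comment redacted, and fields like Modality preserved, together with a log of which fields were removed or altered.
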
Brute force methods for de-identification (e.g., erasing all header information) are not acceptable. A successful de-identification algorithm would not simply remove data from all elements, but would remove PHI while retaining the information required for research studies. While fully automated image de-identification tools are desired, the proposed solutions should be able to flag suspicious cases that require human intervention for human-in-the-loop remediation. Furthermore, to broaden the community of users and developers, offerors are encouraged to consider leveraging open standards to the degree that this is possible and does not prevent the development of commercial solutions. Moreover, the de-identification algorithms should be vendor agnostic, particularly for WSI file types, where each vendor has its own proprietary format. Development of cloud-ready solutions is also encouraged. To build upon existing resources in medical image de-identification, the TCIA de-identification knowledge base could serve as a foundation.

The final delivery in each phase requires the vendor to provide its de-identification tool to NCI for a final validation. For this purpose, NCI, possibly in collaboration with TCIA or another contractor, will need to run the tool on selected validation datasets that include PHI in various places in the header and the image field, and confirm that the tool has successfully de-identified the collection.

Offerors must identify the eventual customers for this tool. While NCI may be a potential future customer, this is not assured or certain. Offerors are expected to obtain their own datasets. NIH and TCIA will not provide data with PHI to the offeror. The TCIA database has free downloadable imaging and digital pathology collections that provide examples of the final product.

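One way to picture the human-in-the-loop requirement is a triage step that sits after automated detection. The sketch below assumes a detector has already produced findings, each with a confidence score, and routes a file to human review whenever a finding is too uncertain to handle automatically. The Finding and FileReport types, the two thresholds, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    location: str      # e.g. "header:ImageComments" or "pixels"
    confidence: float  # detector's confidence that this is PHI, 0.0 to 1.0


@dataclass
class FileReport:
    path: str
    findings: list = field(default_factory=list)


# Hypothetical thresholds: at or above AUTO_REDACT_AT the tool redacts on
# its own; below IGNORE_BELOW the finding is treated as noise; anything in
# between is considered suspicious and needs a person to look at it.
AUTO_REDACT_AT = 0.90
IGNORE_BELOW = 0.10


def triage(report):
    """Return "review" if any finding is in the uncertain band, else "release"."""
    for finding in report.findings:
        if IGNORE_BELOW <= finding.confidence < AUTO_REDACT_AT:
            return "review"
    return "release"


def review_queue(reports):
    """List the paths of files that should be flagged for human remediation."""
    return [r.path for r in reports if triage(r) == "review"]
```

Keeping the routing rule separate from the detector means the thresholds can be tightened or loosened per collection without retraining or rewriting the detection logic.
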
In general, NCI encourages the development of de-identification tools for building imaging or digital pathology databases. This contract is not meant to be a service contract to de-identify images for TCIA or NCI. Successful companies must provide NCI with a version of the software developed as part of this contract, along with a user manual, for user acceptance testing (UAT). A UAT report will be provided back to the company.

Phase I Activities and Deliverables:
• Identify different clinical imaging or WSI file types and the fields that contain PHI (i.e., conduct a landscape analysis)
• Ability to recognize and open multiple clinical imaging or WSI file formats
• Display PHI field variable values
• Remove or alter PHI field values
• Produce a log of removed and altered PHI and PII parameters
• Delivery of tool along with required software documentation and user manual to NCI for acceptance testing/validation study
• Include funds in budget ($15K) to present Phase I findings and for NCI to complete User Acceptance Testing/validation study

Phase II Activities and Deliverables:
• Detect PHI in non-PHI fields (e.g., comment fields that may contain PHI)
• Alert user, allow user to edit detected field
• Detection of PHI within image
• Masking of PHI and PII within image
• Masking of PII that may be obtained through 3D reconstruction or other manipulation of the image collection
• Generation of de-identified images with provenance of process
• Flag and report suspicious cases and allow for human-in-the-loop remediation
• Validation with a test data set should demonstrate successful PHI/PII removal from image and image file metadata for ≥95% of test files
• Include funds in budget ($20K) to present Phase I findings and for NCI to complete User Acceptance Testing/validation study
• Statistical analysis of validation testing will be provided to NCI
• In the first year of the contract, provide the Program and Contract officers with a letter(s) of commercial interest
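
The ≥95% validation target and the statistical analysis deliverable could be summarized along the following lines. The topic states only the 95% target; reporting a Wilson score confidence interval alongside the observed pass rate is an assumption made here, as are the function names and the choice to compare the point estimate (rather than the lower bound) against the threshold.

```python
import math


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    p = passed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def validation_summary(passed, total, threshold=0.95):
    """Summarize a validation run: observed rate, interval, and threshold check."""
    low, high = wilson_interval(passed, total)
    rate = passed / total
    return {
        "rate": rate,
        "ci_low": low,
        "ci_high": high,
        # Assumption: the threshold is applied to the observed rate.
        "meets_threshold": rate >= threshold,
    }
```

For example, validation_summary(970, 1000) reports the observed rate for 970 successfully de-identified files out of 1,000 together with its interval. A stricter reading of the requirement would compare ci_low against the threshold instead of the observed rate.
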