
AI/ML Data Management Software System for NSRCs

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0022413
Agency Tracking Number: 0000271273
Amount: $1,146,472.00
Phase: Phase II
Program: STTR
Solicitation Topic Code: C53-15a
Solicitation Number: N/A
Solicitation Year: 2023
Award Year: 2023
Award Start Date (Proposal Award Date): 2023-04-03
Award End Date (Contract End Date): 2025-04-02
Small Business Information
520 E Main Street STE 200
Carnegie, PA 15106-2051
United States
DUNS: 016990771
HUBZone Owned: Yes
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Maria Chan
 (630) 252-4811
Business Contact
 Alexander Heit
Phone: (412) 615-4372
Research Institution
 Argonne National Laboratory (ANL)
 Diane Hart
9700 South Cass Ave
Lemont, IL 60439-4803
United States

 Federally Funded R&D Center (FFRDC)

C53-15a-271273
Research labs are unable to effectively collaborate and fully utilize microscopy data due to data silos, insufficient data interoperability, and a lack of common data management standards. To address this problem, this innovation will increase the impact of scientific datasets by delivering (1) an extensible software system and (2) microscopy tools which support findability, accessibility, interoperability, and reuse of data in multi-tool, multi-user scientific research facilities. Challenges within microscopy labs are heightened due to complex and powerful microscopy techniques. While localized efforts have addressed disparate data standards, no large-scale, cloud-enabled collaborative tool exists, and the manual annotation of imagery makes data difficult to index, organize, and reference. Few standard data-sharing tools are available, and data-sharing principles in microscopy are severely lacking. Conventional, time-consuming practices remain prevalent, including sharing research findings solely via publication and only upon request. Some researchers have built their own home-grown solutions. Metadata is often incomplete and data management suffers, as discovered during Phase I interviews conducted with scientists, researchers, and other potential end-users. Working with scientific research facilities, an architecture for the tool, designed to maximize the findability of data, was developed. A cloud-deployable web application to store metadata on microscopy datasets was designed, and wireframes for a user interface were approved. Using a machine learning model, a data ingestion pipeline was implemented to minimize collaboration obstacles between researchers. Phase II will focus on a second pipeline component, an automatic tagging machine learning model.
The platform has been designed to implement the following key characteristics in the Phase II effort: automated ingestion, applicability across file types, cloud storage and access, automated metadata tagging, data cataloging, automated attribution, user access control, automated curation, and accessibility to previously “dark” data. Improved collaboration and data availability will save time and money, and will also help to validate research results, enabling the combination of data types and the reuse of hard-to-generate data, accelerating ideas for future research, and benefiting data sharers. Transcribing and anonymizing data may take up to one hour per minute fragment. Data documentation, including adding descriptive metadata, may take four hours per experiment and require 60 metadata fields. The proposed innovation could reduce the time spent documenting data by 50%, increasing researcher efficiency and saving laboratory money. Additionally, to use microscopy data more efficiently, networks should transfer and store data at a speed of at least 1 gigabit per second and provide centralized storage. This technology will provide both centralized storage and software that performs at that speed.
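
The figures quoted in the abstract can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only and is not part of the award record: the function names, the count of 100 experiments, and the 50-gigabyte dataset size are assumptions chosen for the example, and the transfer estimate ignores protocol overhead.

```python
# Figures taken from the abstract; everything else is an illustrative assumption.
HOURS_PER_EXPERIMENT = 4.0   # documentation time per experiment
METADATA_FIELDS = 60         # metadata fields required per experiment
REDUCTION = 0.50             # projected fraction of documentation time saved
LINK_GBIT_PER_S = 1.0        # minimum network speed named in the abstract


def documentation_hours_saved(experiments: int) -> float:
    """Hours saved across a number of experiments at the projected reduction."""
    return experiments * HOURS_PER_EXPERIMENT * REDUCTION


def minutes_per_field() -> float:
    """Average documentation time per metadata field, in minutes."""
    return HOURS_PER_EXPERIMENT * 60 / METADATA_FIELDS


def transfer_seconds(dataset_gigabytes: float) -> float:
    """Ideal transfer time: 1 gigabyte = 8 gigabits, no protocol overhead."""
    return dataset_gigabytes * 8 / LINK_GBIT_PER_S


if __name__ == "__main__":
    # 100 experiments and a 50 GB dataset are hypothetical inputs.
    print(documentation_hours_saved(100))
    print(minutes_per_field())
    print(transfer_seconds(50))
```

Under these assumptions, documenting one experiment averages four minutes per metadata field, and halving that effort saves two hours per experiment.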

* Information listed above is at the time of submission. *
