Infrastructure Requirements, Strategy and Architecture to Enable Scalable Scientific Data and Metadata Acquisition and Curation in Support of the Materials Genome Initiative

Description:

As a part of the Materials Genome Initiative, NIST is charged with developing a materials innovation infrastructure. Key aspects of this infrastructure include the real-time acquisition and curation of experimental and simulation data and associated metadata, and the control of scientific equipment over a network. To accomplish this, NIST needs research and development on the core requirements and on an overall strategy and software architecture that would enable control of diverse and geographically distributed experimental equipment (e.g., SEM, TEM, x-ray diffractometers, dilatometers, differential scanning calorimeters) and computational resources (e.g., workstations, clusters, demonstration code), together with the automatic capture and curation of their acquired scientific data and associated metadata across a network using backend systems such as the NIST-developed Materials Data Curation System and the National Data Service’s Materials Data Facility.

An infrastructure is needed to push results and metadata from instruments into a data curation system or platform. The goal of the project is to discover and document core requirements and to develop an overall strategy and software architecture that, when implemented, will allow for the control of geographically distributed research equipment and computational resources and their integration with scientific informatics backends, including the NIST Materials Data Curator and the National Data Service’s Materials Data Facility. Both the Materials Data Curator and the Materials Data Facility have REST APIs to facilitate automated data curation. The project will provide documented requirements and develop a specific strategy and software architecture for controlling scientific instruments and computational resources and interfacing them with scientific informatics backends, in a format amenable to implementation by software engineers.
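As an illustration of what pushing results and metadata to a REST backend could involve, the following minimal sketch packages one instrument result as a JSON POST request. It is not the API of the Materials Data Curator or the Materials Data Facility: the base URL, the `/records` path, the field names, and the function `build_curation_request` are all assumptions made for this example, and the request is built but not sent.

```python
import base64
import hashlib
import json
import urllib.request
from datetime import datetime, timezone


def build_curation_request(base_url, instrument_id, raw_data, metadata):
    """Package one instrument result as a JSON POST request (not sent here).

    The endpoint path and record fields are hypothetical placeholders.
    """
    record = {
        "instrument_id": instrument_id,
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata,
        # A checksum lets the backend verify the payload after transfer.
        "sha256": hashlib.sha256(raw_data).hexdigest(),
        # Raw bytes are base64-encoded so they fit inside a JSON document.
        "data_base64": base64.b64encode(raw_data).decode("ascii"),
    }
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/records",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a small x-ray diffraction result with descriptive metadata.
request = build_curation_request(
    "https://curation.example.org/api",  # placeholder host
    "xrd-01",
    b"2theta,counts\n10.0,152\n",
    {"technique": "x-ray diffraction", "operator": "example"},
)
print(request.get_method(), request.full_url)
```

In a real deployment the request would be submitted with `urllib.request.urlopen(request)` or an equivalent HTTP client, with authentication and a record schema dictated by the chosen backend.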

Phase I activities and expected results:
Discover, validate, and document requirements for a system to enable scientific equipment control and scalable scientific data and metadata acquisition and curation as described in the project goals. Using the documented requirements, develop and document an overall strategy, and then develop and document a software architecture that, when implemented, will meet the project goals. We believe that a successful architecture would have several key properties:
1) It would be structured in independent layers. The top-most layer would present a high-level user interface to allow unified user access and control, while the lowest layer would provide connectivity to the scientific equipment or computational output.
2) It would rely on two public interfaces, one at the highest level and one at the lowest level, that allow the components to interact as a single application.
3) It would include the notion of a default scripting language and provisions for integrated development environments, to facilitate customization and extension of a system implementing the architecture in a standardized fashion.
4) It would be highly modular and include the concepts of plugins and a generalized, abstract command set that facilitates interaction with the scientific equipment.
5) Its public interfaces and abstract command set would be language neutral, allowing users to control and extend a system implementing the architecture from a large variety of commonly used programming languages, including Python, Java, and C++.
6) It would provide for the capture of scientific provenance and system configuration to facilitate reproducibility.
7) It would support the concept of scientific workflows.
We have been largely inspired by the Micro-Manager project (https://www.micro-manager.org/wiki/Micro-Manager%20Project%20Overview) and recommend that awardees review this project.
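Several of the properties above (independent layers, plugins, an abstract command set, and provenance capture) can be sketched in a few lines. The example below is only one possible reading of those properties, not a design prescribed by this project or taken from Micro-Manager. The class names `InstrumentPlugin`, `SimulatedDilatometer`, and `Controller`, the command names `configure` and `acquire`, and the returned values are all hypothetical.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class InstrumentPlugin(ABC):
    """Lowest layer: one plugin per instrument or computational resource."""

    name = "unnamed"

    @abstractmethod
    def execute(self, command, **params):
        """Run one abstract command and return its result as a dict."""


class SimulatedDilatometer(InstrumentPlugin):
    """Stand-in plugin that fabricates readings instead of driving hardware."""

    name = "dilatometer-sim"

    def __init__(self):
        self.config = {"heating_rate_K_per_min": 5.0}

    def execute(self, command, **params):
        if command == "configure":
            self.config.update(params)
            return dict(self.config)
        if command == "acquire":
            points = params.get("points", 3)
            # Placeholder data: a linear ramp in nanometres.
            return {"length_change_nm": [100 * i for i in range(points)]}
        raise ValueError(f"unsupported command: {command}")


class Controller:
    """Top layer: dispatches abstract commands and records provenance."""

    def __init__(self):
        self._plugins = {}
        self.provenance = []

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def run(self, instrument, command, **params):
        result = self._plugins[instrument].execute(command, **params)
        # Each call is logged so that a run can later be reproduced.
        self.provenance.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "instrument": instrument,
            "command": command,
            "params": dict(params),
        })
        return result


controller = Controller()
controller.register(SimulatedDilatometer())
controller.run("dilatometer-sim", "configure", heating_rate_K_per_min=10.0)
scan = controller.run("dilatometer-sim", "acquire", points=4)
print(scan, len(controller.provenance))
```

Because the controller only ever calls `execute` with a command name and parameters, the same dispatch could in principle be exposed over a language-neutral protocol, which is one way to approach property 5.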

Phase II activities and expected results:
Develop an extensible infrastructure for the development of APIs to facilitate data curation of materials data from dilatometers, x-ray diffractometers, scanning electron microscopes (e.g., EDS composition scans, EBSD patterns), transmission electron microscopes, differential scanning calorimeters, and tensile testing machines.

NIST staff familiar with the various instruments (SEM, TEM, optical microscopes, dilatometer, x-ray diffractometer) and simulations may be available to work with the awardee to discover the requirements and develop the metadata schemas needed to collect the data. NIST staff responsible for the development of the Materials Data Curation System (MDCS) may be made available to help the awardee understand the architecture and capabilities of the MDCS.