Deep Learning Enabled FAIR Data Management for Center for Functional Nanomaterials

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0022425
Agency Tracking Number: 0000263239
Amount: $200,000.00
Phase: Phase I
Program: SBIR
Solicitation Topic Code: C53-15a
Solicitation Number: N/A
Timeline
Solicitation Year: 2021
Award Year: 2022
Award Start Date (Proposal Award Date): 2022-02-14
Award End Date (Contract End Date): 2022-12-13
Small Business Information
1500 Stony Brook Road
Stony Brook, NY 11794-2852
United States
DUNS: 080960253
HUBZone Owned: No
Woman Owned: Yes
Socially and Economically Disadvantaged: Yes
Principal Investigator
Name: Yu Sun
Phone: (631) 507-3608
Email: yu.sun@sunriseaitech.com
Business Contact
Name: Yu Sun
Phone: (631) 507-3608
Email: yu.sun@sunriseaitech.com
Research Institution
N/A
Abstract

Modern materials science experiments are pushing towards several frontiers: 1) faster data collection rates (for example, the Center for Functional Nanomaterials (CFN) and National Synchrotron Light Source II (NSLS-II) at BNL alone generate data in the range of 500 TB/week), 2) larger arrays of detectors with higher pixel counts, 3) multi-modal measurement of elemental, structural, chemical, and physical properties at different lengths and scales of a material system, and 4) sophisticated reconstruction of the shape, density, and strain field inside a crystal. These trends, once combined, highlight an enormous challenge concerning the scale of data analytics. It is clear that modern scientific facilities must transition to automated data management and analysis that exploit domain-specific expert knowledge. Science can take advantage of the rapid pace of advancement in machine learning, especially as this field shifts from traditional, naive black-box statistical methods towards deep neural network models where multiple tasks are used to define the meaning and semantic content of different portions of a neural network. Machine-learning methods will only see widespread uptake by the scientific community when they can provide both reliable endpoint performance (high-quality predictions) and a deep understanding of the internal representations being used by the network. In this proposal, we aim to address the data challenges at the Center for Functional Nanomaterials and Department of Energy synchrotron light sources by developing multi-task deep autoencoder software to handle the multi-dimensional material datasets that are generated by x-ray beamlines and/or other material study experiments, and to enable rapid, productive, and accurate scientific discovery from the unprecedented availability of large-scale heterogeneous scientific images.
The power of multi-task learning is that it integrates multiple training signals from the different tasks that a single network is being asked to optimize. In a scientific context, these tasks can be selected in such a way that the network generates a physically meaningful internal representation, i.e., the features of the associated materials and measurement problems. Moreover, the intermediate layers of the multi-task neural network can explicitly yield experimentally useful data representations (latent image embeddings) that are flexible enough to encode multiple physics concepts, are not bound to a particular task or learning problem, and can adapt to new tasks and even unknown physics problems. During the Phase I project, we will apply deep autoencoder algorithms to x-ray scattering image management and analysis, extracting feature maps that recognize physically grounded signatures and patterns, and will demonstrate their potential for classifying and searching science images and for automating data indexing, organization, and navigation (Findability). We propose to continue improving and hardening our physics-aware machine learning methods, which translate our understanding of the underlying physics into a set of correlated tasks (Interoperability) that guide and regularize the learning of representations that capture and approximate the corresponding real-world physical processes in material design. The Python-based deep learning software will follow industry standards for quality assurance and will ultimately be packaged as commercial software-as-a-service (SaaS, for Accessibility) on Google Cloud for broad subscription and adoption by science communities and industry users (Reusability). The project also has the potential to transform the fields of computer vision and medical imaging. For example, when a patient has X-ray images taken at one hospital and is later admitted to a different hospital, our software will generate new diagnostic images without the X-rays having to be retaken.
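The multi-task idea described in the abstract can be illustrated with a minimal sketch. It is not the awardee's software: the class `MultiTaskAutoencoder`, the function `multitask_loss`, the layer sizes, the single classification task, and the `task_weight` value are all hypothetical choices made for illustration. A shared encoder produces a latent embedding, one head reconstructs the image, a second head predicts a label, and the two losses are summed so that both training signals would shape the shared representation. Only the forward pass and the loss are shown; training is omitted.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskAutoencoder:
    """Hypothetical model: one shared encoder feeding two task heads."""

    def __init__(self, n_pixels, n_latent, n_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_pixels, n_latent, rng)
        self.decoder = init_layer(n_latent, n_pixels, rng)      # task 1: reconstruction
        self.classifier = init_layer(n_latent, n_classes, rng)  # task 2: labelling

    def encode(self, image):
        """Shared latent embedding; tanh keeps each value in [-1, 1]."""
        return [math.tanh(v) for v in linear(image, *self.encoder)]

    def forward(self, image):
        latent = self.encode(image)
        reconstruction = linear(latent, *self.decoder)
        class_probs = softmax(linear(latent, *self.classifier))
        return latent, reconstruction, class_probs


def multitask_loss(image, label, reconstruction, class_probs, task_weight=0.5):
    """Mean squared reconstruction error plus weighted cross-entropy."""
    mse = sum((r - p) ** 2 for r, p in zip(reconstruction, image)) / len(image)
    cross_entropy = -math.log(class_probs[label])
    return mse + task_weight * cross_entropy


# Usage on a made-up 16-pixel "image" with a made-up label.
model = MultiTaskAutoencoder(n_pixels=16, n_latent=4, n_classes=3)
image = [i / 16 for i in range(16)]
latent, reconstruction, class_probs = model.forward(image)
loss = multitask_loss(image, 1, reconstruction, class_probs)
```

In a sketch like this, the `latent` vector is the part that could be stored and compared across images for indexing and search, since it is shared by every task rather than tied to one of them.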

* Information listed above is at the time of submission. *
