Open Reproducible Electron Microscopy Data Analysis
Phone: (518) 881-4404
Phone: (518) 881-4413
Electron microscopy (EM) is a powerful technique for understanding structure and composition at the nano- to atomic scale across a range of scientific domains. In biology, EM is a cornerstone technique for studying structure-property relationships, because molecular structure is directly correlated with function. Technological advances mean that data sets are being acquired at ever-increasing rates and sizes. These advances present enormous opportunities for researchers to understand complex systems. However, processing the resulting large-scale, complex data in a reproducible and shareable way is a real challenge. Building, managing, and maintaining complex workflows reproducibly requires extensive knowledge in several areas outside researchers' core skill sets, such as software engineering, data science, and high-performance computing (HPC). We seek to expose advanced algorithms and tools through user-friendly interfaces so that end users can apply them without expert programming skills.

Kitware will work with industry and the Federal Government to create an open source, permissively licensed, high-performance, web-based platform for the experimental data community. The platform will enable reproducible, scalable, shareable pipelines for the analysis and visualization of EM data, with a focus on interoperability so that packages and tools already developed by the community can be reused. First, we will develop interoperable data formats for the major EM modalities, allowing independent processing modules to work together. Second, we will use these formats to compose pipelines of processing modules encapsulated in modern container technologies. We plan to implement processing modules behind a language-agnostic interface, freeing developers to leverage the community's existing work.
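To make the idea of composable, language-agnostic modules concrete, here is a minimal sketch, assuming a hypothetical interchange record (a JSON object with an `image` field and a `metadata` field) and a hypothetical convention that every module is an external command that reads one record on stdin and writes one record on stdout. Neither convention comes from the project; the names `make_dataset`, `run_module`, `run_pipeline`, `NORMALIZE`, and `THRESHOLD` are illustrative only. The two modules here are small Python programs run with the current interpreter, but because the contract is just "JSON in, JSON out", they could equally be written in any language, or be invoked inside a container (for example via `docker run -i <image>`).

```python
import json
import subprocess
import sys

# Hypothetical interchange record: an image stored as a list of rows plus
# free-form metadata. A real EM data format would carry far more structure.
def make_dataset(image, **metadata):
    return {"image": image, "metadata": metadata}

# Hypothetical module 1: min-max normalize pixel values to the range [0, 1].
NORMALIZE = r"""
import json, sys
d = json.load(sys.stdin)
flat = [v for row in d["image"] for v in row]
lo, hi = min(flat), max(flat)
span = (hi - lo) or 1  # avoid dividing by zero on a constant image
d["image"] = [[(v - lo) / span for v in row] for row in d["image"]]
d["metadata"]["normalized"] = True
json.dump(d, sys.stdout)
"""

# Hypothetical module 2: binarize the image with a fixed 0.5 threshold.
THRESHOLD = r"""
import json, sys
d = json.load(sys.stdin)
d["image"] = [[1 if v >= 0.5 else 0 for v in row] for row in d["image"]]
d["metadata"]["thresholded"] = True
json.dump(d, sys.stdout)
"""

def run_module(command, dataset):
    """Run one module as a subprocess: send JSON on stdin, parse JSON from stdout."""
    result = subprocess.run(
        command,
        input=json.dumps(dataset),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the module fails
    )
    return json.loads(result.stdout)

def run_pipeline(commands, dataset):
    """Compose modules by feeding each module's output into the next one."""
    for command in commands:
        dataset = run_module(command, dataset)
    return dataset

if __name__ == "__main__":
    pipeline = [
        [sys.executable, "-c", NORMALIZE],
        [sys.executable, "-c", THRESHOLD],
    ]
    data = make_dataset([[10, 20], [30, 50]], modality="hypothetical")
    out = run_pipeline(pipeline, data)
    print(out["image"])     # [[0, 0], [1, 1]]
    print(out["metadata"])
```

The design choice the sketch illustrates is that the pipeline runner knows nothing about what each module does; it only relies on the shared data format, which is what lets independently developed modules be chained together.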
These composable pipelines will execute portably across a range of computational resources, from the desktop to HPC, allowing them to scale along with the data. Finally, the analysis pipelines will be driven through an intuitive web application capable of visualizing pipeline execution results at scale. This project will leverage the investments of the DOE and other agencies to provide a powerful software platform for EM data analysis and visualization. The platform will remove the steep learning curve of running scalable, reproducible pipelines, freeing researchers to focus on discovery. The community will be able to contribute their advanced algorithms, making them available to a broader audience. Permissive open source licensing will facilitate collaboration across institutions and spur the development of an ecosystem around the platform with equal access for all. The platform's open, extensible nature will also create opportunities for software services that help integrate algorithms, deploy the platform, and provide the other customization typically offered for open source software platforms.
* Information listed above is at the time of submission. *