Description:
Today’s manufacturing systems are able to collect vast amounts of data; however, much of that data is never used unless and until there is a known problem with the equipment. Sometimes the problem will not even be detected until the product is being used in the field, implying that the manufacturing problem may have persisted for several generations of the product. Advances in data visualization, which is a fundamental means of observing data and discovering problems, hold the potential for faster detection of issues and more rapid improvements. However, integrating data visualization with the systems that generate the data still requires considerable effort [1].
Current approaches (drag-and-drop dashboards, tableaus, etc.) to visualizing smart and sustainable manufacturing enterprises are limited and suffer from many drawbacks. Substantial manual effort from experienced practitioners is required. In some cases, skilled programming is necessary. In other cases, significant visualization expertise is necessary. Understanding large amounts of data, often stored as combinations of relational and non-relational data in a variety of quasi-federated databases or being streamed directly from machines and not well understood by anyone in an enterprise, adds further difficulty. Combining all of these skills in a single person is costly and is likely to remain out of reach, particularly for small manufacturers. (Large manufacturers have similar problems although for different reasons – while visualization teams exist, inordinately larger data sets make visualization harder in other ways.)
Currently, even the best results are inflexible: they cannot adapt to in-process schema changes or to schema-less databases. The resulting software suffers either from “bit rot”, as schemas and databases change out from under the visualization software, or from an inability to incorporate new data to improve visualizations.
Manufacturing data also has distinctive characteristics. For instance, correlations between time and spatial coordinates are one fundamental concept for assessing manufacturing performance. Performance is often plagued by the interaction of variables along multiple dimensions, rather than by a simple two-factor correlation. In response, some solutions focus on prioritizing dimensions or mathematically reducing dimensionality to fit practical visualizations. However, such data transformations can lead to a loss of context and information. Other unique characteristics exist. All in all, the manufacturing environment has become data rich but information poor.
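The loss of context from dropping dimensions can be illustrated with a minimal sketch. The process below is entirely hypothetical (the variable names and the defect rule are assumptions made for illustration): a defect occurs only when exactly one of two settings is high. Viewed one variable at a time, neither setting shows any correlation with defects, while the two-dimensional view shows the pattern immediately.

```python
from itertools import product

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical runs: (temp_high, pressure_high, defect), where a defect
# occurs only when exactly one of the two settings is high.
runs = [(t, p, t ^ p) for t, p in product([0, 1], repeat=2)] * 25
temp_high = [r[0] for r in runs]
pressure_high = [r[1] for r in runs]
defect = [r[2] for r in runs]

# One dimension at a time: no visible relationship with defects.
corr_temp = pearson(temp_high, defect)
corr_pressure = pearson(pressure_high, defect)

# Both dimensions together: defect rate for each (temp, pressure) cell.
cell_rates = {}
for t, p in product([0, 1], repeat=2):
    cell = [d for tt, pp, d in runs if (tt, pp) == (t, p)]
    cell_rates[(t, p)] = sum(cell) / len(cell)

print(corr_temp, corr_pressure)
print(cell_rates)
```

A visualization tool that reduced this data to a single "most important" variable would hide the defect pattern entirely, which is the kind of information loss described above.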
The goal is to make available manufacturing visualization software that is more flexible, powerful, and easy to use than existing tools. The project will study fundamental concepts that are relevant to manufacturing data, develop procedures for automatically applying visualization techniques to those concepts, and provide a natural language-based user interface to allow manufacturers to quickly assemble their own visualizations based on their datasets. The solution is expected to make use of accepted and practical visualization principles, such as the proper mapping of visual variables to their target data [2], and apply these principles to create a manufacturing-focused toolset.
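One way to picture the mapping of visual variables to data is a small rule table that assigns each data field a visual channel according to its measurement level. The sketch below is illustrative only: the preference order, the channel names, and the field names are assumptions made for this example and are not taken from [2].

```python
# Illustrative preference order of visual channels per measurement level
# (an assumption for this sketch, not a table taken from [2]).
CHANNEL_PREFERENCES = {
    "quantitative": ["x", "y", "size", "color_saturation"],
    "ordinal": ["x", "y", "color_saturation", "size"],
    "nominal": ["x", "y", "color_hue", "shape"],
}

def assign_channels(fields):
    """Greedily give each field the most preferred channel that is still free.

    `fields` is a list of (name, measurement_level) pairs, most important first.
    """
    used = set()
    assignment = {}
    for name, level in fields:
        for channel in CHANNEL_PREFERENCES[level]:
            if channel not in used:
                used.add(channel)
                assignment[name] = channel
                break
        else:
            raise ValueError(f"no free visual channel for field {name!r}")
    return assignment

# Hypothetical manufacturing fields.
encoding = assign_channels([
    ("cycle_time_s", "quantitative"),
    ("spindle_temp_c", "quantitative"),
    ("machine_id", "nominal"),
    ("scrap_count", "quantitative"),
])
print(encoding)
```

Here the two most important quantitative fields take the position channels, the nominal field falls through to hue, and the remaining quantitative field falls through to size. A real toolset would need a richer model, but the same idea supports both recommending suitable encodings and discouraging unsuitable ones.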
Additional features of the toolset may include a natural language-based front-end, user guidance on types of visualizations to apply to a given dataset, and data crawling capabilities. A natural language-based front-end will be a helpful component and, for some users, a superior interface to traditional drag-and-drop techniques. User guidance may come in the form of proffering certain visualization techniques that are recognizably appropriate for a dataset, dissuading the use of visualization techniques that are inappropriate for given data, and explaining visualizations that are not immediately obvious. The tool should offer and suggest appropriate choices to deal with challenging data such as high-dimension data. The software should include an expandable library of plugin visualization components, allowing for inclusion of new visualization technologies as they become available. A backend data crawler may adapt to new data as it becomes available within the enterprise, with or even without explicit schemas.
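As a minimal sketch of how a crawler might cope with data that has no explicit schema, the following example infers field names and value types directly from records and detects fields that appear only in later data. The record layout and field names are hypothetical, and a production crawler would need to handle nesting, units, and much larger volumes.

```python
from collections import defaultdict

def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = defaultdict(set)
    for record in records:
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)

# Hypothetical machine records; the second batch adds a field the first lacked.
batch_1 = [{"machine": "M1", "temp_c": 71.5}, {"machine": "M2", "temp_c": 69}]
batch_2 = [{"machine": "M1", "temp_c": 70.2, "vibration_mm_s": 1.8}]

before = infer_schema(batch_1)
after = infer_schema(batch_1 + batch_2)

# Fields that appeared only after the new batch arrived.
new_fields = sorted(set(after) - set(before))
print(before)
print(new_fields)
```

Because the schema is recomputed from the data itself, a newly reported field such as the hypothetical `vibration_mm_s` becomes available to the visualization layer without any manual schema update.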
Phase I expected results:
Phase I of this subtopic will demonstrate the feasibility of developing software that produces visualizations from limited natural language input, drawing on a library of visualizations for manufacturing-specific “big data” (large and varied databases).
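To make "limited natural language" concrete, the following sketch parses a single fixed sentence pattern into a visualization request that selects an entry from a small chart library. The sentence pattern, the keywords, and the output format are all assumptions made for illustration; the subtopic does not prescribe any particular grammar.

```python
import re

# Hypothetical library of visualizations, keyed by the word that selects them.
CHART_KEYWORDS = {
    "trend": "line",
    "comparison": "bar",
    "distribution": "histogram",
}

# Accepted form: "show <kind> of <measure> [by <group>]".
PATTERN = re.compile(
    r"show (?P<kind>\w+) of (?P<measure>\w+)(?: by (?P<group>\w+))?$"
)

def parse_request(text):
    """Return a chart specification, or None if the request is not understood."""
    match = PATTERN.match(text.strip().lower())
    if match is None or match["kind"] not in CHART_KEYWORDS:
        return None
    return {
        "chart": CHART_KEYWORDS[match["kind"]],
        "measure": match["measure"],
        "group_by": match["group"],  # None when no "by ..." clause is given
    }

print(parse_request("Show trend of scrap_rate by machine"))
print(parse_request("Show distribution of cycle_time"))
print(parse_request("make it pretty"))
```

Requests outside the fixed pattern return `None`, which is where a richer interface of the kind envisioned for later work would instead ask for clarification or suggest alternatives.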
Phase II expected results:
Phase II of the project will focus on richer natural language interfaces, techniques to recommend visualizations based on data, and automated assistance in understanding novel visualizations. The end goal of Phase II will be a user interface that accepts natural language as input and produces interactive visualizations as output.
At the end of the project, non-visualization specialists should be able to interact with the system and produce visualizations that are better than those from Excel and at least as good as those from R, Wolfram, D3.js, etc., but much more quickly and without the development time or skills required by current visualization software.
NIST may be available to work collaboratively with the awardee, providing consultation and input on activities and directions as well as data and scenarios.