Real-time Characterization of Variable-rate Streaming Data

OBJECTIVE: Develop methods and tools for the characterization of underlying structure, trends, and events in streaming data sets in order to aid analysts in discovery and understanding. Methods and tools, applicable over a broad range of streaming data bandwidths -- from kbps up to and beyond 100 Gbps -- will leverage established principles of statistical analysis, visualization, and cognitive science.

DESCRIPTION: Streaming data are either not stored, because storage is too costly at very high data rates, or are stored and processed with a delay that reduces their operational value. Increasing data generation and collection make a streaming data model inevitable for some streams; the question becomes the data rate of the stream and the class of computational operations that are applicable to it. Current techniques for streaming data analysis use ad hoc sampling and data decimation, leaving the overwhelming bulk of the collected data unexamined and its value lost. Tasks of streaming data analysis include trend analysis, event detection, and discovery of underlying structure. Human cognitive abilities and the visual system are ideally suited to these tasks. Data visualization techniques leverage the human visual system to organize and structure data, using visual primitives (e.g., shape, color, intensity, size, position) to encode massive amounts of data and reveal relationships, anomalies, correlations, and associated uncertainties. However, current data visualization techniques, like many current analytical processes, rely on post-processing of stored data. Therefore, a new approach is required that enables analysis of data as it passes through system memory by converting the data stream, based on its rate/bandwidth, into appropriate visual elements that encode and characterize salient features of the data with real-time visualization processing. That is, the visualization process needs to be resident with the data as it passes through the system, and must be systematically driven by statistical characteristics of the data stream. The visual elements generated in this way should incrementally capture base statistics (e.g., counts, distributions, frequencies) and higher-order statistical measures (e.g., autocorrelation functions, probability distributions, time- and frequency-domain measures) and, when combined, provide insight into underlying structure, relationships, and trends in the data stream. The design of the visual elements should take into account cognitive abilities and biases in segmentation, grouping (e.g., gestalt measures), chunking, and user expertise and training. Additionally, the visual elements should capture sufficient statistics and structure so that reconstruction of the data stream is possible to some level of precision. The real-time visualization tools need to be tunable to the available processing resources and data throughput while maintaining analytical utility. The goal of this research topic is the application of established statistical and cognitive principles in the development and demonstration of a real-time system that can generate data visualizations capturing the structures and relationships in data streams (from kbps up to and beyond 100 Gbps, where methods may be uniform or may differ across this bandwidth spectrum). Visual primitives that leverage human visual processing will need to be defined based on cognitive principles of streaming information.
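To make the notion of incremental encoding concrete, the sketch below maintains base statistics over a scalar stream in a single pass and maps them to simple visual primitives. It is a minimal, illustrative sketch rather than a specified design: the StreamStat and to_visual_element names, the use of Welford's online algorithm, and the position/size/color mapping are assumptions introduced here, not requirements of the topic.

```python
import math

class StreamStat:
    """Incrementally maintained base statistics for one scalar stream.

    Uses Welford's online algorithm so each sample is touched once,
    in O(1) time and memory, as in-memory stream processing requires.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

def to_visual_element(stat: StreamStat, x: float) -> dict:
    """Map running statistics to illustrative visual primitives:
    position encodes the value, size encodes spread, and color flags
    samples far from the running mean (a simple event/anomaly cue)."""
    sigma = math.sqrt(stat.variance) or 1.0
    z = abs(x - stat.mean) / sigma
    return {
        "y": x,                               # position: raw value
        "size": sigma,                        # size: current spread estimate
        "color": "red" if z > 3 else "gray",  # hue: outlier cue
    }

stat = StreamStat()
for x in [1.0, 1.2, 0.9, 1.1, 5.0]:  # stand-in for a live stream
    stat.update(x)
    print(to_visual_element(stat, x))
```

At Gbps rates this per-sample update would have to live in the data path (vectorized, parallelized, or on dedicated hardware); the sketch shows only the single-pass, constant-memory form of the encoding.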
Algorithmic, statistical, or rule-based definition of the combination of visual and analytical primitives is desirable. The system should be modifiable on the fly by human operators to handle new salient features or to highlight discovered correlations. Streaming data may be open source, purchased, or synthetically generated. The techniques should be broadly applicable.

PHASE I: Task 1: Develop an approach for incrementally encoding statistical measures in visual elements. The visual elements should be combinable into complex visualizations that leverage human cognitive abilities for pattern recognition and correlation. In-situ visualization run-time code should be tunable to differences in system configuration (single- vs. multi-core) and data bandwidth (an illustrative sketch of such tuning follows the Phase III description below). Task 2: Develop an approach for the application and combination of the visual elements from Task 1. Task 3: Develop an architecture and conceptual design for the implementation of a dynamic system based on the elements and principles developed in Tasks 1 and 2. Task 4: Implement a minimal proof-of-concept real-time system that processes a representative data set and generates visualizations constructed from the visual elements of Tasks 1 and 2. Phase I deliverables include a Final Phase I Report containing: (1) a detailed description of the approach (or algorithm) for applying statistical and cognitive principles to a specific data set; (2) a detailed system architecture and design; and (3) code and a demonstration of the approach using the proof-of-concept system.

PHASE II: Develop, demonstrate, and validate a proof-of-concept design of the real-time visualization generation tool. The required deliverables for Phase II include the full prototype system; demonstration and testing of the prototype system with high-bandwidth data streams (on the order of Gbps); and a Final Report. The Final Report will include (1) a detailed design of the prototype tool, (2) the experimental results from the tool, and (3) a plan for Phase III.

PHASE III: Phase III will consist of the delivery of systems to analysts in DoD and/or commercial operational settings. Within DoD and the intelligence community, real-time visualization tools for variable data rates are generically applicable across a broad array of analyses applied to multi-INT data. It is anticipated that the final product will handle multiple data types, such as structured and unstructured data, imagery, and video, with differing characteristics such as noise and reliability acquired through sensors. In the commercial space as well, streaming data is burgeoning, with wide variation across receiving devices, from handhelds to cloud computing. In Phase III, the commercial opportunity is to provide principled and effective visualization technology for this growing market.
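One way the reconstruction and throughput-tuning requirements in the DESCRIPTION and Phase I Task 1 might be prototyped is with a fixed-memory sketch whose resolution is the tuning knob. The following is a minimal sketch under that assumption; the StreamHistogram name, its parameters, and the bin-count encoding are hypothetical illustrations, not part of the topic requirements.

```python
import random

class StreamHistogram:
    """Fixed-memory histogram sketch for a bounded scalar stream.

    Bin counts are the sufficient statistics retained as data passes
    through memory; resolution (n_bins) is the tuning knob trading
    fidelity against processing cost and memory, and sample()
    reconstructs an approximate stream to bin-width precision.
    """
    def __init__(self, lo: float, hi: float, n_bins: int = 64):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.width = (hi - lo) / n_bins
        self.counts = [0] * n_bins

    def update(self, x: float) -> None:
        # Clamp to range, then index the bin; O(1) per sample.
        x = min(max(x, self.lo), self.hi)
        i = min(int((x - self.lo) / self.width), self.n_bins - 1)
        self.counts[i] += 1

    def sample(self, k: int) -> list[float]:
        """Draw k values from the encoded distribution -- an approximate
        reconstruction of the original stream, up to bin-width precision."""
        bins = random.choices(range(self.n_bins), weights=self.counts, k=k)
        return [self.lo + (i + random.random()) * self.width for i in bins]

h = StreamHistogram(0.0, 10.0, n_bins=32)
for x in [1.0, 1.2, 0.9, 7.5, 1.1]:  # stand-in for a live stream
    h.update(x)
print(h.sample(3))
```

Coarsening the binning lowers per-sample cost and memory at the expense of reconstruction precision, which is the kind of resource/fidelity tunability the topic asks the run-time code to expose.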
REFERENCES: 1. Ware, C., Purchase, H., Colpoys, L., and McGill, M., "Cognitive Measurements of Graph Aesthetics," Information Visualization, 1(2), June 2002, pp. 103-110. 2. Agrawala, M., Li, W., and Berthouzoz, F., "Design Principles for Visual Communication," Communications of the ACM, 54(4), April 2011, pp. 60-69. 3. Mackinlay, J., "Automating the Design of Graphical Presentations of Relational Information," ACM Transactions on Graphics, 5(2), 1986, pp. 110-141. 4. Roth, S. F., and Mattis, J., "Data Characterization for Intelligent Graphics Presentation," Proc. SIGCHI '90, Seattle, WA, ACM, 1990, pp. 193-200. 5. Kosslyn, S. M., "Graph Design for the Eye and Mind," Oxford University Press, 2006. 6. Tufte, E. R., "Visual Explanations: Images and Quantities, Evidence and Narrative," Graphics Press, February 1997.