
Knowledge-aided Interface for Big Data Streams


OBJECTIVE: Develop an innovative cognitive knowledge-aided interface and supporting information processing techniques to exploit very large data streams over wide areas and autonomously highlight areas of interest for tactical decisions without a priori knowledge of the area and/or location of high value.
DESCRIPTION: Big data challenges across Department of Defense (DOD) domains are increasingly problematic for tactical-level decision making. Data collections in open source and military channels are growing at such a staggering rate that they exceed our ability to store and manage them, perform computation and analysis, and maintain data security [1, 2]. Indeed, one author refers to the inability to handle big data as the new "helplessness age" [1], a reaction to the inability of information processing algorithms to rapidly extract key elements of information to aid decision making in time-constrained environments. Architectural limitations are a major constraint on discovering knowledge from big data stores that represent complex combinations of many data types [3]. Given the exponential increase in data and the limitations in processing capability, it is unlikely that Warfighters operating in uncertain and unfamiliar cultural environments will benefit from knowledge discovery capabilities any time soon. To reduce vulnerability and risk for Warfighters from unknown threats, new and innovative approaches are needed for data collection, processing, and user interface design. Promising approaches in this space include data streams [4], interactive exploration and hypothesis testing of data [5], and temporal segmentation of large text corpora [6]. Addressing data stream computation recognizes that tactical decisions require only a very small subset of all data available in military databases, and that valuable data may often lie outside the traditional hard-sciences approach of persistent collection and quantitative analysis [5].
In data stream processing, data arrives in continuous, high-volume, fast, and time-varying streams [4]. Clustering, classification, and association algorithms may be useful for mining data streams, but transferring results over a wireless network with limited bandwidth could prove challenging for tactical units [4]. Interactive exploration and hypothesis testing of data streams could serve to filter large amounts of information for specific tactical knowledge requirements. Frequently, Warfighters don't know the right questions to ask and have very limited opportunities to explore options for potential outcomes. Bio-inspired applications for interface design and collaboration in a visual domain could improve interface designs. Biological features that might be adapted to interfaces include autonomy, scalability, adaptability, and robustness [7], each designed to detect data patterns, identify anomalies, and extract knowledge from enormous volumes of data. The key to success in this problem area is ensuring that computer and social scientists work closely together [1] to develop sufficiently robust algorithms, with greater reliance on reasoning, that allow a domain-relevant interpretation of actionable patterns of behavior and meaning for informed decision making [2]. Temporal segmentation of large text corpora may provide a method by which data can be filtered at tactical levels for rapid processing and knowledge extraction. Using open text sources (e.g., newspapers, blogs, Tweets, Facebook posts) would provide Warfighters with near-real-time insight into the semantic tone of localized text [6]. The potential value of this approach is that it would allow users to infer a timeline of factors correlated with ideas identified from analysis of public discussion in text corpora. The challenges with this topic include the storage and management of big data, which may contribute to an inability to validate and qualify each data item.
Also, careful design of systems is necessary to match user needs with the technologies used for analytics and visual display of information. In addition, accessing very large quantities of semi-structured or unstructured data is problematic and limited by available storage applications and hardware. Finally, user needs must be supported by computational processes, and these must be expressed in ways that are consistent with the larger social system in which the user operates. Frequently, user studies concentrate on the micro-system of the individual user and fail to consider the wider range of opportunities, challenges, and constraints. The current topic seeks to address those challenges by focusing on a new and innovative data collection, storage, and processing method that can reduce noise in large data sets while keeping relevant data streams for processing. It will also explore interactive user interface designs that allow temporal segmentation or other useful algorithms, and these designs should consider bio-inspired applications. Finally, placing the user within the larger social system when developing filtering and visual methods will provide a unitary perspective for knowledge discovery and dissemination.
PHASE I: Design an integrated approach for interactive exploration of big data streams that allows users to meaningfully interact with data and apply a variety of algorithmic filters designed to facilitate rapid knowledge extraction. Define requirements for developing and implementing a technique that is noticeably different from current fusion methods and that is useful for large data streams. Define a user scenario that considers a user in a tactical setting and incorporates the larger social system that bounds the knowledge extraction and dissemination process. Provide theoretically based and mathematically sound foundations for proposed approaches that incorporate social and computational science.
Requirements definition must include: a description of the model components and the supporting relationships; the computational processing technique that will be used and a description of the integration mechanisms; a determination of the types and characteristics of the metrics that will be captured and used; a detailed discussion of the specific domain to be represented; and a discussion of the analysis and assessment techniques to be used. Phase II plans should also be provided, including key component technological milestones and plans for testing and validation of the proposed system and its components.
PHASE II: Produce a prototype system based on the preliminary design from Phase I. All appropriate engineering testing will be performed, and a critical design review will be conducted to finalize the design. Phase II deliverables will include a working prototype of the system, a specification for its development, and a demonstration and validation of the ability to accurately represent both the soft information fusion model and the collaborative visual analytics representation of the data.
PHASE III: This technology will have broad application in military, government, and commercial settings. Within the military and government, there is an increasing emphasis on understanding and forecasting group behaviors from social media and online social communities in foreign nations that are potentially hostile to US and Coalition interests. Currently, fusing information from these sources is extremely labor intensive and time consuming. Developing interactive interfaces that can explore dynamic data streams and extract knowledge rapidly will be a powerful addition to tactical decision making.
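To make the OBJECTIVE's phrase "autonomously highlight areas of interest ... without a priori knowledge" more concrete, the following is a minimal Python sketch. It is an illustration only, not a required or endorsed method. It assumes that incoming events have already been binned into per-area counts for each time window. It keeps a one-pass running baseline for every area, and reports only the areas whose current count is far above their own history. The class names, the cell labels, the z-score threshold, the warm-up length, and the standard deviation floor are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class RunningStats:
    """Welford's one-pass mean and variance: constant memory per cell."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class StreamHighlighter:
    """Flags cells whose event count in the current time window is far
    above that cell's own running baseline. All names are hypothetical."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5,
                 min_std: float = 1.0) -> None:
        self.threshold = threshold  # z-score needed to flag a cell (assumed)
        self.warmup = warmup        # windows to observe before flagging
        self.min_std = min_std      # floor so quiet cells do not divide by ~0
        self.windows_seen = 0
        self.stats: dict[str, RunningStats] = {}

    def observe_window(self, counts: dict[str, int]) -> list[str]:
        """Score one window of per-cell counts, then fold it into the baseline."""
        flagged = []
        for cell in set(self.stats) | set(counts):
            x = float(counts.get(cell, 0))  # a silent cell counts as zero
            stats = self.stats.get(cell)
            if stats is None:
                # A cell never seen before had zero events in every earlier window.
                stats = self.stats[cell] = RunningStats(n=self.windows_seen)
            if stats.n >= self.warmup:
                z = (x - stats.mean) / max(stats.std(), self.min_std)
                if z > self.threshold:
                    flagged.append(cell)
            stats.update(x)
        self.windows_seen += 1
        return sorted(flagged)
```

For example, after six quiet windows in which cell "A" reports about 10 events and cell "B" about 5, calling `observe_window({"A": 40, "B": 5, "C": 8})` returns `["A", "C"]`: the spike in "A" and the previously silent cell "C" are highlighted, while "B" is not. Only the short list of flagged cells would need to cross a bandwidth-limited link, and no prior labeling of high-value areas is required. A real system would need a more careful statistical model and validation against the user scenario defined in Phase I.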