
Data Extractor for Event Pattern Archiving

Description:

TECHNOLOGY AREA(S): Info Systems, Human Systems 

OBJECTIVE: Develop technology to auto-extract data relevant to significant events and to archive patterns. The Science and Technology (S&T) challenge is to create intelligent algorithms for building event context awareness to associate areas, timelines and data sources of relevance to preserve essential data collection in a form for efficient recall and analytics. 

DESCRIPTION: The military reports events such as insurgent attacks or terrorist bombings in messages. There is a need to improve the archiving of events to better recognize recurring or analogous threats and to aid forensic study. Attacks can take place in different contexts yet share many features. It is important to record an event description, its spatial-temporal pattern, and associated relevant data. Once recorded, this information needs to be archived in a condensed form that preserves entities, relationships and context, such as a graph structure or an embedding space. Because large attacks can be rare ("Black Swans"), it is necessary not to rely solely on machine learning or statistical methods but also to employ artificial intelligence algorithms that use reasoning to build context awareness. For example, the 9/11 attack used planes, the planes had common route properties, passengers had associates, etc.
Today, Naval Intelligence, Surveillance and Reconnaissance (ISR) systems collect large amounts of data. Data is typically kept for a period of time and then discarded. The motivation for this research topic is the desire to capture important information before it is lost. Achieving this goal in a cost-effective manner requires automation. Future systems will be based on cloud computing enterprises that have access to tactical sensor data, both semi-structured reports and unstructured documents. Data science offers potential solutions. The fundamental questions that need to be answered for this research and development topic are: What data is important to keep? How can important data be best preserved? Work on this topic can best be bounded by the target customer data and applications.
Background studies provide a useful starting point for this topic. There has been steady advancement in knowledge discovery and data mining. Surveys of methods and tools provide insight into the resources available and potential system designs [1, 2].
There are common steps for acquiring knowledge and a recognized need for process iteration to optimize systems. A great deal of time is consumed in data preparation, and this can be reduced by the use of open standards for data sharing. Experience in extracting content from text shows the value of external machine-readable dictionaries and semantic relationship resources [3]. A survey of methods used for contrast set mining, emerging pattern mining and subgroup discovery provides insight into pattern discovery and visualization [4]. Pattern abstraction and constraints are useful for dealing with disparate data and for data reduction [5]. Pattern forms include database schemas, relational views and classification hierarchies. Selection of temporal sequences and granularity of data are important for describing events [6]. S&T areas to consider for this topic are as follows: 1) Novel search methods to locate, review and collect data sources; 2) Data mining algorithms to reduce content by association to event attributes (e.g., by clustering, regression and rules); 3) Use of common reference databases to ground the data content used, such as geographic places, entity names, and concept semantics; 4) Means to assess data patterns for rate of occurrence and for generalization for predictive value; 5) Best means to store event patterns in a form that is descriptive and keeps essential data in formats accessible to data mining tools, such as CSV, JSON or others. Use of open standards for relationship building (ontology) and graph structures is encouraged.
Capabilities to demonstrate for this topic are as follows: 1) Starting with customer events/activities of interest, identify entities, relationships and context relevant for analysis; 2) Show a means to process data sources and extract content from structured and/or unstructured data sources; 3) Show an efficient means to identify and store relevant patterns; 4) Validate that the machine algorithms used uncover patterns that are rational and human-understandable; 5) Show methods to apply patterns in a naval cloud computing environment to trigger user-defined (domain-specific) alerts; and 6) Enable operator participation in refining the process and in visualizing patterns in a manner that is instructive to the user.
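As an illustration of the condensed archive form discussed above, the following is a minimal sketch of an event pattern stored as a small graph (entities as nodes, relationships as edges) and serialized to JSON. The class name `EventPattern`, its fields, and all sample values are hypothetical placeholders chosen for illustration; this topic does not prescribe a schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EventPattern:
    """Hypothetical condensed event record: entities as nodes, relationships as edges."""
    event_id: str
    description: str
    start: str       # ISO 8601 timestamp (assumed convention)
    end: str
    location: dict   # e.g. {"lat": ..., "lon": ...}
    nodes: dict = field(default_factory=dict)  # node id -> attributes
    edges: list = field(default_factory=list)  # source/relation/target records

    def add_entity(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def relate(self, source, relation, target):
        # Refuse edges that point at entities that were never recorded.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append({"source": source, "relation": relation, "target": target})

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; not drawn from any real event.
pattern = EventPattern(
    event_id="evt-0001",
    description="Illustrative event",
    start="2020-01-01T08:00:00Z",
    end="2020-01-01T09:30:00Z",
    location={"lat": 0.0, "lon": 0.0},
)
pattern.add_entity("vehicle-1", "vehicle", role="transport")
pattern.add_entity("person-1", "person")
pattern.add_entity("person-2", "person")
pattern.relate("person-1", "boarded", "vehicle-1")
pattern.relate("person-1", "associate_of", "person-2")

archived = pattern.to_json()     # condensed, tool-readable form
restored = json.loads(archived)  # round trip back into plain dicts and lists
```

Keeping relationships as explicit source/relation/target records is one way to make the archive readable both by graph tools and by general-purpose data mining tools that accept JSON.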

PHASE I: Determine the feasibility of developing a Data Extractor for Event Pattern Archiving. Identify an application for pattern archiving of value to government or commercial markets. For this application, provide a method for extracting relevant entities, relationships, and content associations. Show a means to construct a pattern based on open sources/standards. Provide a product description, identify potential customers, and demonstrate the feasibility of the capability. During the Phase I effort, performers are expected to identify metrics to validate the performance of the analytic process, with the goal of reducing the technical risk associated with building a working prototype, should work progress. Performers should produce Phase II plans with a technology roadmap and milestones.
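Precision and recall, which Phase III names as example metrics, are one candidate for validating extraction performance. The sketch below assumes that extracted relationships are compared against a hand-labeled reference set as (source, relation, target) triples; the function name `precision_recall` and the sample triples are illustrative assumptions, not requirements of this topic.

```python
def precision_recall(extracted, reference):
    """Compare extracted items against a reference set of correct items."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of correct items that were extracted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder triples for illustration only.
reference = {
    ("person-1", "boarded", "vehicle-1"),
    ("person-1", "associate_of", "person-2"),
    ("person-2", "owns", "vehicle-1"),
    ("vehicle-1", "departed_from", "place-1"),
}
extracted = {
    ("person-1", "boarded", "vehicle-1"),
    ("person-1", "associate_of", "person-2"),
    ("person-2", "boarded", "vehicle-1"),  # not in the reference set
}

precision, recall = precision_recall(extracted, reference)
```

Here two of the three extracted triples are correct (precision 2/3) and two of the four reference triples were found (recall 1/2).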

PHASE II: Produce a prototype system based on the preliminary design from Phase I. The prototype should enable users to infer information not overtly evident in the data and provide measures of effectiveness. In Phase II, the small business may be given data by the Government to validate capabilities. An offeror should assume that the prototype system will need to run as a distributed application in a cloud architecture that could scale to millions of nodes and billions of edges, and should have matured a design for a responsive human-computer interface. Phase II deliverables will include a working prototype of the system, software documentation including a user's manual, and a demonstration using operational data or accurate surrogates of operational data.

PHASE III: Produce a final design system capable of deployment. The system should be adapted to transition as a component of a larger system or as a standalone commercial product. The small business should provide a means for performance evaluation with metrics for analysis (e.g., precision and recall) and a method for operator assessment of product interactions (e.g., display visualizations). The Phase III system should have an intuitive human-computer interface that provides operator engagement without work overload. The software and hardware should be modified and documented in accordance with guidelines provided by engaged multi-intelligence and command and control programs of record. Researchers are encouraged to publish S&T contributions.
Private Sector Commercial Potential: Internet search engines would benefit from the maturation of data retrieval based on an embedding space showing relationships of content. Currently, information retrieval is limited to word searches with some support for graph searches. Information retrieval based on second or higher order association (degrees of separation) would transform content delivery.
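Retrieval by degrees of separation can be sketched as a breadth-first search over an undirected graph of content associations. The function name `within_degrees` and the sample edges are hypothetical. A deployed system at the scale described in Phase II would need a distributed graph store rather than an in-memory dictionary.

```python
from collections import deque


def within_degrees(edges, start, max_degree):
    """Return {item: degrees of separation} for items within max_degree hops of start."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_degree:
            continue  # do not expand beyond the requested order of association
        for nxt in neighbors.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    del distance[start]  # report only the associated items, not the query itself
    return distance


# Placeholder association chain: doc-A - doc-B - doc-C - doc-D
edges = [("doc-A", "doc-B"), ("doc-B", "doc-C"), ("doc-C", "doc-D")]
result = within_degrees(edges, "doc-A", 2)
# result == {"doc-B": 1, "doc-C": 2}; doc-D is three hops away and is excluded
```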

REFERENCES: 

1. Lukasz Kurgan and Petr Musilek, A survey of Knowledge Discovery and Data Mining process models, The Knowledge Engineering Review, 21:1, 2006.

2. Sreenivas Sukumar, Open Research Challenges with Big Data - A Data-Scientist's Perspective, IEEE International Conference on Big Data (Big Data), pp. 1272-1278, 2015. DOI: 10.1109/BigData.2015.7363882.

3. Alain Auger and Caroline Barrière, Pattern-based Approaches to Semantic Relation Extraction: A State-of-the-Art, Terminology 14:1, 2008. http://nparc.cisti-icist.nrc-cnrc.gc.ca/eng/view/accepted/?id=3b37c957-2b29-47bd-9786-3bfc0669a8dd

4. Petra Novak, et al., Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining, Journal of Machine Learning Research 10, 2009.

5. Andreia Silva, et al., Constrained pattern mining in the new era, Knowledge and Information Systems, 47:3, 2016. DOI: 10.1007/s10115-015-0860-5.

6. Chuanren Liu, et al., Sequential Pattern Analysis with Right Granularity, IEEE International Conference on Data Mining Workshop (ICDMW), 2014. DOI: 10.1109/ICDMW.2014.164.

 

KEYWORDS: Knowledge Discovery, Data Mining, Data Science, Machine Learning, Pattern Analysis, Cloud Computing 
