Conflicting, Suspicious, and Inconsistent Information Detection (CSI-Info)

Description:

OBJECTIVE: Investigate and develop techniques to detect and resolve conflicting, inconsistent, suspicious, and deceptive content (i.e., misinformation) present within multiple sources of information.

DESCRIPTION: We face an adversary that is adept at using ubiquitous forms of communications, transactions, and movements to organize and execute operations. These activities generate diverse types of information, which can include a considerable amount of conflicting, incomplete, incorrect, or even deceptive content. Some of this misinformation may be due in part to errors in research, reporting, translation, or transmission. In other cases, incorrectness is due to deliberate attempts to deceive, based on motives ranging from personal to criminal. Better techniques are needed to help reduce the uncertainty in analysis associated with this misinformation.

There has been a great deal of progress in the past decade in developing new techniques and representations for reasoning under uncertainty. Many approaches have also been explored for modeling suspicious, illegitimate, and/or deceptive behavior, including agent-based approaches, graphical models, guilt-by-association, and expert systems. We are seeking methods that leverage semantic and syntactic techniques as well as pedigree and lineage (provenance), and that provide salience (weights) to the association assertions made within and across sources. Bayesian techniques and graphical models have been widely accepted, and there has been a great deal of recent progress on probabilistic logic, using approaches such as Markov Logic Networks. Similarity scoring methods that can incorporate new evidence with uncertainty and credibility are also needed.

Of key interest to this topic will be an ability to discriminate between legitimate and illegitimate information. This can be very difficult when individuals and groups use deception to disguise their behavior.
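As a purely illustrative sketch of the kind of capability sought, and not a prescribed approach, the Python fragment below groups assertions about the same entity and attribute, flags disagreements between sources, and suggests the value with the greatest credibility-weighted support. The function name `resolve_conflicts`, the source names, the credibility values, and the default weight of 0.5 are all assumptions made for this example. Simple weighted voting stands in for the Bayesian and probabilistic-logic methods mentioned above.

```python
from collections import defaultdict


def resolve_conflicts(assertions, credibility, default_credibility=0.5):
    """Flag conflicting assertions and suggest a resolution.

    assertions: iterable of (source, entity, attribute, value) tuples.
    credibility: dict mapping source name to a weight in [0, 1]
                 (hypothetical values; a real system would estimate these).
    """
    # Sum the credibility of every source backing each candidate value.
    scores_by_key = defaultdict(lambda: defaultdict(float))
    for source, entity, attribute, value in assertions:
        weight = credibility.get(source, default_credibility)
        scores_by_key[(entity, attribute)][value] += weight

    report = {}
    for key, scores in scores_by_key.items():
        total = sum(scores.values())
        best = max(scores, key=scores.get)
        report[key] = {
            # More than one distinct value means the sources disagree.
            "conflict": len(scores) > 1,
            "suggested": best,
            # Share of total credibility behind the suggested value.
            "support": scores[best] / total if total > 0 else 0.0,
        }
    return report


# Hypothetical structured assertions, as if already extracted from sources.
assertions = [
    ("registry_db", "person_17", "birth_year", 1975),
    ("travel_log", "person_17", "birth_year", 1975),
    ("web_profile", "person_17", "birth_year", 1982),
    ("registry_db", "person_17", "city", "Lyon"),
]
credibility = {"registry_db": 0.9, "travel_log": 0.6, "web_profile": 0.3}

report = resolve_conflicts(assertions, credibility)
for key, result in report.items():
    print(key, result)
```

A scheme of this kind assumes that sources report independently, an assumption that fails when a single actor deliberately plants mutually corroborating records.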
For example, an adversary might create numerous false identities or relationships in the virtual community to mask their true connections or project a false perception, enabling them to operate clandestinely. Tactics such as the use of pseudonyms, varied travel patterns, frequent location changes, and indirect methods of communication can make it difficult or impossible for authorities to detect their presence and monitor their activities. Unfortunately, there are few, if any, techniques that the research community has subjected to replicable experiments to address this problem.

Regardless of the approach or application, there are common issues. For instance, one common problem is that the data is typically unbalanced and noisy. For the purpose of this topic, information is limited to semi-structured and structured sources, such as web pages or information stored in databases. Natural Language Processing (NLP) technology dealing with unstructured sources such as open-source text should not be considered. It is assumed that entities and relationships present in natural language will have already been extracted using NLP methods. Domain expert involvement and feedback will be critical for validation and improvement.

The goal of this SBIR topic is an automated (or semi-automated) capability that can examine information associated with entities, events, and the relationships that exist between them, identify misinformation, and suggest possible resolutions if they exist. This capability will help reduce uncertainty and lead to more accurate analysis, better situation awareness, and enhanced decision making.

PHASE I: Research and develop an innovative approach to meet the SBIR Topic requirements, and assess its feasibility. Develop the initial design for a prototype and demonstrate its application. A proof of concept is required to demonstrate feasibility of the approach.

PHASE II: Develop the required technologies and prototype, per the Phase I design.
Develop and demonstrate prototype tools and techniques for monitoring activities and trends of entities in domains of interest for Air Force users using real-world data. A working prototype is required.

PHASE III: Disciplines such as Human Intelligence (HUMINT) and Document Exploitation (DOCEX) lack well-established methods for detecting and countering misinformation. Business and law enforcement applications could include money laundering, arms trafficking, fraud, cybercrime, and identity theft.

REFERENCES:
1. E. Montagu. The Man Who Never Was. J. B. Lippincott Company, Philadelphia, PA, 1954.
2. W. Winkler. Overview of record linkage and current research directions. Technical report, Statistical Research Division, U.S. Bureau of the Census, 2006.
3. W. W. Cohen, P. Ravikumar, and S. E. Fienberg. "A comparison of string distance metrics for name-matching tasks." In Proc. of IIWEB, pages 73-78, 2003.
4. X. L. Dong. Data Fusion Resolving Data Conflicts in Integration. www2.research.att.com/~lunadong/talks/dataFusion_ndbc.pptx.