
Domain-Specific Text Analysis

Description:

TECHNOLOGY AREA(S): Human Systems, Information Systems

OBJECTIVE:

Develop text analysis software that leverages current Natural Language Processing (NLP) algorithms and techniques (e.g., Bayesian algorithms, word embeddings, recurrent neural networks) to accurately conduct content and sentiment analysis, as well as dictionary development.
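
As one illustration of the Bayesian family named above, a multinomial naive Bayes sentiment classifier with add-one (Laplace) smoothing is sketched below. It is a minimal sketch that assumes labeled example comments are available. The class name `NaiveBayesSentiment`, the `pos`/`neg` labels, and the toy training sentences are hypothetical. A fielded tool would need far larger, DoD-representative training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lower-case the text and keep alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)      # documents per label (priors)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
model = NaiveBayesSentiment().fit(
    ["great leadership and good morale", "good training",
     "bad housing and low morale", "terrible bad pay"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("morale is good"))  # pos
print(model.predict("bad pay"))         # neg
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.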

DESCRIPTION:

The United States Department of Defense (DoD) collects large amounts of text data from its personnel in a variety of formats, including opinion/climate surveys, memoranda, incident reports, standard forms, and transcripts of focus group/sensing sessions. Much of this data is used operationally. However, recent interest in leveraging text data to glean insight into personnel trends/behaviors/intentions has prompted more research in NLP. Various DoD research arms have also explored topic modeling and sentiment analysis. Nevertheless, two foundational hurdles must be addressed before these techniques can realistically be applied across the DoD:


First, the varied use of jargon, nomenclature, and acronyms across the DoD and the Service Branches must be understood more comprehensively. In addition, developing a "DoD Dictionary" should enable the fluid handling of both existing and newly created jargon, phrases, and sayings as their use evolves over time.
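
The core behavior of such a dictionary can be sketched as an updatable mapping from acronyms to plain-language expansions, applied before analysis. This is a minimal sketch under stated assumptions. The `DomainDictionary` class and its methods are hypothetical, and only single-token terms are handled. Multi-word sayings would need phrase matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DomainDictionary:
    """Hypothetical updatable mapping from DoD acronyms/jargon to expansions."""

    expansions: dict[str, str] = field(default_factory=dict)

    def add_term(self, term: str, expansion: str) -> None:
        # Re-adding an existing term overwrites it, so meanings can be revised.
        self.expansions[term] = expansion

    def expand(self, text: str) -> str:
        # Replace each whole-word token that exactly (case-sensitively)
        # matches a known term; all other text is left unchanged.
        return re.sub(
            r"\b\w+\b",
            lambda m: self.expansions.get(m.group(0), m.group(0)),
            text,
        )


terms = DomainDictionary()
terms.add_term("PCS", "permanent change of station")
terms.add_term("TDY", "temporary duty")
print(terms.expand("My PCS was delayed by a TDY."))
# My permanent change of station was delayed by a temporary duty.
```

Case-sensitive matching is a deliberate choice here. It keeps an ordinary lower-case word from being mistaken for an acronym that happens to share its spelling.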


Second, the emergent nature and rapid innovation of NLP techniques have made it difficult to bridge the technical gap between DoD analysts and the available tools. Non-technical leadership finds it especially difficult to understand and interpret NLP techniques. No standard format or package currently exists for analyzing and visualizing text data in a way that meets operational leadership's needs when making decisions about personnel policies or actions.

PHASE I:

Expectations for this Phase I feasibility study include, but are not limited to, a white paper detailing software designed to assist the user in:


  • Summarizing key content across a range of sources or in a single document
  • Capturing the sentiment germane to each document by assessing its tone, intent, and social content
  • Determining the reasons for themed statements
  • Identifying relationships among themes
  • Effectively parsing and combining findings, such as aggregating results by service, occupation, or other demographics, where possible
  • Accommodating the wide range of DoD, Service, and DoD civilian nomenclature, jargon, and acronyms

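Two of the capabilities listed above are domain-aware sentiment and aggregation by service or other demographics. Both can be sketched with a simple lexicon approach in which domain entries override a general-purpose lexicon. This is a minimal sketch, not a recommended method. Both lexicons, their polarity values, and the sample records are hypothetical. The domain entry "squared" is there to reflect the military usage "squared away." A lexicon average also ignores the negation and context that the NLP techniques named in the objective would capture.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical general-purpose lexicon: word -> polarity in [-1, 1].
GENERAL_LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.7, "delayed": -0.4, "fair": 0.3}
# Hypothetical domain lexicon; its entries take precedence over the general one.
DOMAIN_LEXICON = {"squared": 0.6}


def score(text: str) -> float:
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    values = []
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in DOMAIN_LEXICON:
            values.append(DOMAIN_LEXICON[token])
        elif token in GENERAL_LEXICON:
            values.append(GENERAL_LEXICON[token])
    return mean(values) if values else 0.0


def aggregate(records: list[dict], group_key: str) -> dict[str, float]:
    """Average comment score per group (e.g., per service), rounded to 3 places."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(score(record["text"]))
    return {group: round(mean(scores), 3) for group, scores in groups.items()}


# Hypothetical survey comments tagged with a demographic field.
records = [
    {"service": "Army", "text": "Great leadership; the unit is squared away."},
    {"service": "Army", "text": "Housing was bad and PCS orders were delayed."},
    {"service": "Navy", "text": "Good training and a fair schedule."},
]
print(aggregate(records, "service"))  # {'Army': 0.1, 'Navy': 0.5}
```
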

The user interface may be primarily icon-driven and should be intuitive and easy to navigate for users with limited technical experience. At the same time, the program should expose accessible syntax that uses, or is derived from, one or more open source programming languages. This provides transparency and customization for more technically adept users. The effort should also address how the software could run a preliminary analysis of the text and, based on it, suggest candidate issues/topics and candidate contexts for users to consider including in the detailed analysis.
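
The topic hints described above could start from something as simple as document frequency. Terms that recur across many comments are candidate issues to offer the user. The sketch below is a hypothetical heuristic, not topic modeling. The stopword list, the `min_docs` threshold, and the sample comments are assumptions. A fielded tool would more likely pair a topic model with the DoD dictionary.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "my", "no", "not", "of",
             "on", "our", "the", "to", "was", "were", "with"}


def suggest_topics(documents: list[str], top_n: int = 3, min_docs: int = 2) -> list[str]:
    """Rank non-stopword terms by how many documents mention them."""
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each term counts at most once per document.
        doc_freq.update(set(re.findall(r"[a-z']+", doc.lower())) - STOPWORDS)
    candidates = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    # Most widespread first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in candidates[:top_n]]


# Hypothetical free-text comments.
comments = [
    "The barracks have mold and the housing office is slow.",
    "Housing office was slow to fix the mold.",
    "Pay was late and the barracks were cold.",
    "Mold in the barracks again.",
]
print(suggest_topics(comments))  # ['barracks', 'mold', 'housing']
```
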

PHASE II:

The Phase II effort shall carry the white paper solution through development and a software pilot, addressing the following key requirements in implementation:


  1. Accommodating domain-specific terms (words, phrases, sayings) in a comprehensive and flexible dictionary. The dictionary should be updatable regularly or continuously with the sentiment associated with DoD-specific terms. It should also capture any emerging or widespread meanings/sentiment that otherwise universal words or terms carry in the DoD context.
  2. A maintainable and updatable software solution for conducting NLP text analysis and briefing the results using domain-specific sentiment/understanding. This could be a GUI or other easy-to-use "dashboard" that lets non-technical users identify, track, and communicate potential trends. Where possible, it should also forecast areas of concern (i.e., user-identified "hot button" topics) in personnel opinions, attitudes, or contemplated or disclosed behaviors that may require the attention of non-technical leadership.
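
For the dashboard's tracking of user-identified "hot button" topics, one simple approach is to compare how often each topic is mentioned in consecutive reporting periods and flag large increases. The sketch below assumes hypothetical topic keyword sets (lower-case, single words) and a fixed 10-percentage-point threshold. An operational system would likely replace the fixed threshold with a statistical test or forecasting model.

```python
import re


def topic_rates(comments: list[str], topics: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of comments that mention at least one keyword of each topic."""
    token_sets = [set(re.findall(r"[a-z']+", c.lower())) for c in comments]
    n = len(comments) or 1  # avoid division by zero for an empty period
    return {
        topic: sum(1 for tokens in token_sets if tokens & keywords) / n
        for topic, keywords in topics.items()
    }


def flag_emerging(previous: list[str], current: list[str],
                  topics: dict[str, set[str]], min_increase: float = 0.10) -> list[str]:
    """Topics whose mention rate rose by at least min_increase between periods."""
    before = topic_rates(previous, topics)
    after = topic_rates(current, topics)
    return sorted(t for t in topics if after[t] - before[t] >= min_increase)


# Hypothetical hot-button topics defined by the user (keywords in lower case).
topics = {"housing": {"housing", "barracks", "mold"}, "pay": {"pay", "allowance"}}
q1 = ["Pay was late again.", "Training was solid.", "Barracks are fine.", "Good leadership."]
q2 = ["Mold in the barracks.", "Housing office never answers.", "Pay is fine.", "Barracks need repairs."]
print(flag_emerging(q1, q2, topics))  # ['housing']
```

Here the housing mention rate rises from 0.25 to 0.75 while pay stays at 0.25, so only housing is flagged.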

PHASE III:

Examples of Phase III military applications include a persistently running text-analysis platform that automatically identifies emerging patterns or areas of concern in any of the DoD's free-text data collection efforts. These efforts may include, but are not limited to, personnel satisfaction surveys, standard forms, and incident reports. Examples of commercial applications include a flexible software platform for corporate-level analysis of text data that identifies emerging trends in personnel attitudes/behaviors. Such data could include opinion/climate surveys, HR forms, or complaint reports.

KEYWORDS: ARTIFICIAL INTELLIGENCE SOFTWARE, NATURAL LANGUAGE PROCESSING SOFTWARE, AUTOMATED TEXT SUMMARIZATION, TEXT ANALYTICS, PREDICTIVE MODELING, CORPUS, WORD RECOGNITION, TOPIC MODELING, CONCEPT DRIFT

References:

https://patents.google.com/patent/US7197449B2/en; https://www.aclweb.org/anthology/W14-6002.pdf
