Joint Learning of Text-based Categories

Description:

J9CXQ faces the challenge of identifying and extracting evidential information from complex and ambiguous text. An automated extraction system is being developed that will detect and characterize categories of entities, relations, events, and topics. The extracted information will be stored in a knowledge base that will enable automatically finding patterns and searching for critical information. These detection and extraction algorithms depend on well-formed definitions of the elements (entities, relations, events), which are further disambiguated using context such as the topics found in the documents. These definitions are typically expressed as probability distributions. Since the elements are not known beforehand, the algorithms must not only characterize them but also discover them in the first place. The task of creating these characterizations is too large to do manually, and even when the elements are known beforehand, the cost of annotation is prohibitive. Hence it is necessary to automate the process, both to discover the categories and to represent them probabilistically. Traditionally this is done using an unsupervised learning algorithm such as Latent Dirichlet Allocation (LDA).

Currently, although the elements (topics, entities, relations, and events) are highly interrelated, each is learned independently. For example, topics are identified in an unsupervised way, while entity classes are found separately, either by clustering in vector spaces in an unsupervised way or by named entity recognition (e.g. using conditional random fields) in a supervised way. What is needed is to learn all of these elements in an interrelated manner, because a better characterization of one will improve the characterization of the others. Not only will a joint learning approach improve accuracy, but it will also enable tighter, more specific classes, which should make the overall analysis of text much more powerful.
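As a toy illustration of why a better characterization of one element can sharpen another, the sketch below conditions a small joint distribution over topic and entity class on a chosen topic. Everything in it is an assumption made for illustration: the topic and entity-class names, the probability values, and the helper functions are hypothetical and do not come from any J9CXQ system or data. It is not a joint learning method, only a demonstration of how fixing one variable changes the distribution of another once the two are modeled together.

```python
import math

# Hypothetical joint distribution P(topic, entity_class).
# The labels and numbers are invented for illustration only.
JOINT = {
    ("topic_a", "organization"): 0.30,
    ("topic_a", "person"): 0.10,
    ("topic_a", "material"): 0.05,
    ("topic_b", "organization"): 0.05,
    ("topic_b", "person"): 0.30,
    ("topic_b", "material"): 0.20,
}


def marginal_entity(joint):
    """Sum out the topic to get P(entity_class)."""
    out = {}
    for (_topic, entity), p in joint.items():
        out[entity] = out.get(entity, 0.0) + p
    return out


def condition_on_topic(joint, topic):
    """Return P(entity_class | topic) by selecting one topic and renormalizing."""
    selected = {e: p for (t, e), p in joint.items() if t == topic}
    total = sum(selected.values())
    if total == 0:
        raise ValueError(f"topic {topic!r} has zero probability")
    return {e: p / total for e, p in selected.items()}


def entropy(dist):
    """Shannon entropy in bits; lower means a more specific distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


marginal = marginal_entity(JOINT)
given_a = condition_on_topic(JOINT, "topic_a")

print("P(entity):", marginal)
print("P(entity | topic_a):", given_a)
print("entropy without topic:", entropy(marginal))
print("entropy given topic_a:", entropy(given_a))
```

In this made-up table, fixing the topic lowers the entropy of the entity-class distribution, which is the kind of tightening a joint model is expected to provide. A real system would have to learn the joint distribution from text rather than take it as given.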
J9CXQ is seeking research in the area of knowledge representation and reasoning systems that can support the following combination of requirements:

(1) The method should infer classes of entities, relationships, and contextual topics in a joint manner to account for interdependencies.
(2) The method should not require extensive annotation, and should be unsupervised or require a minimal amount of human input.
(3) No classes should be predefined; they should be discovered through learning.
(4) The method should readily adapt or transfer to new domains.
(5) The method should allow an analyst to set the topic or the distribution of entities or relations and see the effect on the remaining variables.
(6) Ideally, the method will include context information and should not be solely based on a bag-of-words language model.
(7) The method should produce flexible output that facilitates data analysis and visualization.
(8) Given the novelty of the method, it should be well documented both within the source code and in auxiliary supporting documentation.

PHASE I: Building on basic research, develop and demonstrate a proof of concept. Research and develop methodologies that infer classes of entities, relationships, and contextual topics in a joint manner to account for interdependencies. At the conclusion of Phase I, produce a conceptual architecture design identifying the hardware and software necessary to create a system, and identify technology gaps that must be resolved prior to building a system. Develop a proof-of-concept demonstration to support the architecture design.

PHASE II: Build a prototype system that will support testing and evaluation. Develop, demonstrate, and validate a prototype system based on the preliminary design from Phase I. All appropriate engineering testing will be performed, and a critical design review will be performed to finalize the design. The Phase II deliverable will include a working prototype of the software, a specification for its development, and a demonstration of the eight specified requirements.
PHASE III: Integrate into the J9CXQ CWMD Analyst Reasoning Environment to provide a new inference capability over extracted events, entities, and relationships. Optimize the prototype system and demonstrate it at full scale. This technology will have broad application in military, government, and commercial settings. Within the military and government, there is an increasing emphasis on technologies that aid decision-makers while managing big data. Developing tools that can rapidly integrate information and provide a process for analyzing data to complement a user's decision-making process will be a powerful addition to strategic, operational, and tactical decision making.