
Large-scale Entity Linking and Disambiguation with DeepDive

Award Information
Agency: Department of Defense
Branch: Navy
Contract: N00014-16-P-2049
Agency Tracking Number: N16A-016-0152
Amount: $79,870.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: N16A-T016
Solicitation Number: 2016.0
Timeline
Solicitation Year: 2016
Award Year: 2016
Award Start Date (Proposal Award Date): 2016-07-06
Award End Date (Contract End Date): 2017-02-06
Small Business Information
460 California Ave
Palo Alto, CA 94306
United States
DUNS: 79865638
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
Michael Cafarella
Phone: (206) 257-9657
Email: michael.cafarella@lattice.io
Business Contact
John Redgrave
Phone: (847) 436-4044
Email: redgrave@lattice.io
Research Institution
Stanford University
Christopher Re
1700 Lomas Blvd. NE, Ste 2200 353 Serra Mall
Albuquerque, NM 87131
United States
Nonprofit College or University
Abstract

DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but cannot be exploited by standard relational tools. If the information in dark data (scientific papers, Web classified ads, customer service notes, and so on) were instead in a relational database, analysts would gain access to a massive and highly valuable new body of "big data". In this proposal, we describe our plan to enrich both the data and the extractions by linking textual mentions (noun phrases) to the real-world entities they refer to and disambiguating among candidate entities. This enables previously impossible analyses over much richer knowledge extracted from text. The main technical challenges are (1) how to efficiently disambiguate an entity mention against the millions of entities in a typical knowledge base (e.g., Wikipedia); (2) how to resolve ambiguity when the real-world entity is absent from the input knowledge bases; and (3) how to effectively leverage contextual information to make accurate linking predictions. We will present designs for entity linking and entity resolution systems that address these challenges.
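The three challenges in the abstract can be made concrete with a toy entity linker. This is a minimal sketch under stated assumptions, not DeepDive's implementation or the design proposed here: it assumes a hypothetical in-memory knowledge base where each `Entity` has a list of aliases and a bag of descriptive words, uses an exact-match alias index for candidate generation (challenge 1), substitutes simple Jaccard word overlap for a learned context model (challenge 3), and returns `None` (a "NIL" link) when no candidate clears a hypothetical `nil_threshold` (challenge 2). All class, function, and entity names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical knowledge-base entry: an ID, surface aliases, descriptive words."""
    entity_id: str
    aliases: list[str]
    context_words: set[str] = field(default_factory=set)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    return {w.lower().strip(".,;:!?\"'") for w in text.split()} - {""}


class ToyLinker:
    """Illustrative linker; not the method of DeepDive or of this proposal."""

    def __init__(self, entities: list[Entity], nil_threshold: float = 0.1) -> None:
        self.entities = {e.entity_id: e for e in entities}
        self.nil_threshold = nil_threshold  # hypothetical cutoff for NIL links
        # Challenge 1: map each lowercase alias to its candidate entity IDs, so a
        # mention is scored against a handful of entities instead of the whole KB.
        self.alias_index: dict[str, set[str]] = defaultdict(set)
        for e in entities:
            for alias in e.aliases:
                self.alias_index[alias.lower()].add(e.entity_id)

    @staticmethod
    def context_score(entity: Entity, context: set[str]) -> float:
        # Challenge 3: Jaccard overlap between the sentence context and the
        # entity's descriptive words (a stand-in for a learned context model).
        union = entity.context_words | context
        if not union:
            return 0.0
        return len(entity.context_words & context) / len(union)

    def link(self, mention: str, sentence: str) -> str | None:
        context = tokenize(sentence) - tokenize(mention)
        candidates = self.alias_index.get(mention.lower(), set())
        best_id, best_score = None, 0.0
        for cid in sorted(candidates):  # sorted for deterministic tie-breaking
            score = self.context_score(self.entities[cid], context)
            if score > best_score:
                best_id, best_score = cid, score
        # Challenge 2: if no candidate is supported by the context, return None
        # (NIL), treating the referent as possibly absent from the knowledge base.
        if best_score < self.nil_threshold:
            return None
        return best_id


# Usage with a tiny hypothetical knowledge base.
kb = [
    Entity("apple_inc", ["Apple", "Apple Inc."], {"iphone", "company", "technology", "cupertino"}),
    Entity("apple_fruit", ["apple"], {"fruit", "tree", "orchard", "eat"}),
    Entity("paris_france", ["Paris"], {"france", "capital", "city"}),
]
linker = ToyLinker(kb)
print(linker.link("Apple", "Apple released a new iPhone today."))       # -> apple_inc
print(linker.link("apple", "She ate an apple from the orchard tree."))  # -> apple_fruit
print(linker.link("Paris", "Paris Hilton attended the party."))         # -> None (NIL)
```

The split into cheap candidate generation followed by more expensive context scoring is what keeps per-mention cost independent of knowledge-base size; a realistic system would likely replace the exact-match alias table with fuzzy retrieval and the Jaccard score with a trained model, while keeping the NIL decision as an explicit step.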

* Information listed above is at the time of submission. *
