Large-scale Entity Linking and Disambiguation with DeepDive
Phone: (206) 257-9657
Email: michael.cafarella@lattice.io
Phone: (847) 436-4044
Email: redgrave@lattice.io
Contact: Christopher Re
Address:
Type: Nonprofit College or University
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but cannot be exploited by standard relational tools. If the information in dark data (scientific papers, Web classified ads, customer service notes, and so on) were instead in a relational database, it would give analysts access to a massive and highly valuable new set of "big data" to exploit. In this proposal, we describe our plan to enhance the data, as well as the extractions, by linking and disambiguating textual mentions (noun phrases) to their real-world entities. This enables analysis that was not previously possible, drawing on much richer knowledge extracted from text. The main technical challenges are: (1) how to efficiently disambiguate an entity mention to one of the millions of entities in a typical knowledge base (e.g., Wikipedia); (2) how to resolve ambiguity when the real-world entity is absent from the input knowledge bases; and (3) how to effectively leverage contextual information to make accurate link predictions. We will present designs of entity linking and resolution systems that address these challenges.
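To make the three challenges concrete, the following is a minimal sketch of a generic entity-linking loop. It is not DeepDive's design or API. The toy knowledge base `KB`, the functions `build_alias_index` and `link`, and the `nil_threshold` parameter are all hypothetical names chosen for illustration. An alias index narrows the search to a few candidates (challenge 1), a score threshold returns no link when no entity fits (challenge 2), and word overlap with the surrounding sentence stands in for contextual features (challenge 3).

```python
from collections import defaultdict

# Hypothetical toy knowledge base: entity id -> (aliases, description words).
# A real knowledge base such as Wikipedia would hold millions of entries.
KB = {
    "Q1": ({"jaguar"}, {"cat", "animal", "jungle", "predator"}),
    "Q2": ({"jaguar", "jaguar cars"}, {"car", "british", "luxury", "vehicle"}),
}


def build_alias_index(kb):
    """Map each lowercased alias to the set of entity ids that use it."""
    index = defaultdict(set)
    for entity_id, (aliases, _description) in kb.items():
        for alias in aliases:
            index[alias.lower()].add(entity_id)
    return index


def link(mention, context, kb, index, nil_threshold=0.2):
    """Return the best-matching entity id, or None when no candidate fits."""
    # Challenge 1: look up candidates by alias instead of scanning the whole KB.
    candidates = index.get(mention.lower(), set())
    context_words = {w.strip(".,;:!?") for w in context.lower().split()}

    best_id, best_score = None, 0.0
    for entity_id in sorted(candidates):
        description = kb[entity_id][1]
        # Challenge 3: score by the fraction of description words seen in context.
        score = len(description & context_words) / len(description)
        if score > best_score:
            best_id, best_score = entity_id, score

    # Challenge 2: below the threshold, treat the mention as absent from the KB.
    return best_id if best_score >= nil_threshold else None


INDEX = build_alias_index(KB)
animal = link("Jaguar", "The jaguar is a predator that lives in the jungle.", KB, INDEX)
car = link("Jaguar", "She bought a luxury British car.", KB, INDEX)
unknown = link("Jaguar", "Jaguar released a new album.", KB, INDEX)
```

Here `animal` resolves to `"Q1"`, `car` to `"Q2"`, and `unknown` to `None`, because the album context shares no words with either description. A production system would replace the word-overlap score with learned features and a probabilistic model; the sketch shows only the overall control flow.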
* Information listed above is at the time of submission. *