Bootstrapping Background Knowledge to Arbitrate Data Integrity Issues Within Large Volumes of Data
Title: Principal Investigator
Phone: (206) 545-1478
Email: goan@stottlerhenke.com
Title: Contracts Manager
Phone: (650) 931-2700
Email: maxwell@stottlerhenke.com
As intelligence and sensor data acquisition technologies improve and expand, the difficulties of maintaining data integrity across vast amounts of data continue to plague researchers. Although this is generally considered a problem of computational scalability, we recognize that a much greater challenge lies in developing and maintaining the background knowledge needed to move beyond traditional data integrity checks and to identify and resolve more complex inconsistencies. Our proposed system, Arbiter, seeks to exploit the hidden opportunity that very large data sources present in three ways: (1) constructing a pseudo-genome for each entity instance to rapidly identify likely matches, applying lightweight ontology alignment heuristics to efficiently find high-confidence alignment opportunities; (2) using data redundancy to autonomously learn the background knowledge needed to detect complex relational inconsistencies; and (3) validating entity instance matches with a wide range of heuristics, combined with the acquired background knowledge, to resolve higher levels of uncertainty. Phase I prototyping will draw on existing software components, allowing rapid progress.
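The abstract does not say how a pseudo-genome is built or compared. The following is a minimal sketch of step (1) under stated assumptions, not Arbiter's actual design. It assumes a pseudo-genome is a set of field-tagged character trigrams ("genes"), that lightweight ontology alignment can be approximated by a field-name alias map, and that likely matches are found by comparing only records that share at least one gene (an inverted index), then scoring them with Jaccard similarity. The names `pseudo_genome`, `candidate_matches`, `field_aliases`, `threshold`, and `max_postings` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pseudo_genome(record, field_aliases=None, q=3):
    """Hypothetical pseudo-genome: a set of (field, q-gram) 'genes' for one record."""
    field_aliases = field_aliases or {}
    genes = set()
    for field, value in record.items():
        # Assumed lightweight alignment: map source-specific field names to shared names.
        canonical = field_aliases.get(field, field)
        # Normalize case and whitespace, then pad so word boundaries become genes too.
        text = "#" + " ".join(str(value).lower().split()) + "#"
        if len(text) < q:
            genes.add((canonical, text))
        else:
            for i in range(len(text) - q + 1):
                genes.add((canonical, text[i:i + q]))
    return genes


def jaccard(a, b):
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def candidate_matches(records, field_aliases=None, threshold=0.5, max_postings=1000):
    """Return (id_a, id_b, score) for likely matches, highest score first."""
    genomes = {rid: pseudo_genome(rec, field_aliases) for rid, rec in records.items()}
    # Inverted index: gene -> ids of the records that carry it.
    index = defaultdict(set)
    for rid, genes in genomes.items():
        for gene in genes:
            index[gene].add(rid)
    # Only records sharing at least one gene are compared; very common genes
    # (posting lists longer than max_postings) are skipped, much like stop words.
    pairs = set()
    for rids in index.values():
        if len(rids) <= max_postings:
            pairs.update(combinations(sorted(rids), 2))
    scored = []
    for a, b in pairs:
        score = jaccard(genomes[a], genomes[b])
        if score >= threshold:
            scored.append((a, b, score))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Illustrative records from two sources with different field names (made-up data).
records = {
    "a1": {"name": "John Smith", "city": "Seattle"},
    "b7": {"full_name": "Jon Smith", "town": "Seattle"},
    "c3": {"name": "Maria Lopez", "city": "Palo Alto"},
}
aliases = {"full_name": "name", "town": "city"}
print(candidate_matches(records, aliases))
```

Here "a1" and "b7" share 14 of 19 distinct genes once `town` and `full_name` are aliased, so they surface as the only candidate pair (score 14/19, about 0.74), while "c3" shares no genes with either and is never compared. Pairs found this way would still need the heavier validation described in step (3).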
* Information listed above is as of the time of submission. *