Information Integration of Heterogeneous Data Sources

INFOTECH SOFT, INC., 1201 Brickell Ave, MIAMI, FL, 33131
Phone: (305) 371-5111
DESCRIPTION (provided by applicant): The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this pool of rapidly expanding information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data that can be subsequently analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles have to be overcome in order to attain an effective integration of knowledge from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution building on the core components developed during Phase I, incorporating a highly usable interface for query creation and execution, a mechanism for registration, sharing, and reuse of information using Web Services standards, a mechanism for determining quality of data and query reliability, and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated to specifically address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.
PUBLIC HEALTH RELEVANCE: The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.

