Dynamic Mining and Contextualization of the Scientific Literature

This project creates interactive science articles and collects data metrics, accelerating scientific discovery and reproducibility.

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 1R43HG009631-01
Agency Tracking Number: R43HG009631
Amount: $211,668.00
Phase: Phase I
Program: SBIR
Solicitation Topic Code: 172
Solicitation Number: PA16-302
Timeline
Solicitation Year: 2016
Award Year: 2017
Award Start Date (Proposal Award Date): 2017-04-01
Award End Date (Contract End Date): 2018-08-22
Small Business Information
651 W 21ST AVE, Eugene, OR, 97405-2418
DUNS: 034449576
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
Name: KAREN YOOK
Phone: (415) 306-4150
Email: karen@wormbase.org
Business Contact
Name: KAREN YOOK
Phone: (415) 306-4150
Email: karen@wormbase.org
Research Institution
N/A
Abstract
The proposed Dynamic Mining and Contextualization of the Scientific Literature (DMCSL) provides an open lane of communication between authors, science journals, readers, and databases. The outcome of this communication portal will be a database containing mineable metadata for researchers and for reagent supply and biotech companies. Data will be available to companies through individualized subscription models. This pipeline identifies biological entities (e.g., genes, alleles) and embeds hyperlinks from these entities to NHGRI-funded, curated Model Organism Databases (MODs). DMCSL is an enhancement of a markup pipeline that has been in effect since and has linked biological entities in over research articles in GENETICS and G3, published by the Genetics Society of America (GSA), to pages in MODs (WormBase, FlyBase, and the Saccharomyces Genome Database).

This proposal seeks funding to expand the scope of the GSA markup pipeline in all aspects: biological entities linked, authoritative databases linked to (Rat Genome Database, Mouse Genome Informatics, Zebrafish Model Organism Database, and the fission yeast genome database), and journals linked from. This expansion will also include collecting information on supplies and equipment described in the Materials and Methods sections of articles, along with supplier information. The DMCSL will collect and store link information along with author and journal metadata and link access statistics. By doing so, the DMCSL will provide valuable metrics to all stakeholders, including biotech companies and life science vendors, as well as a comprehensive and queryable view of biology not currently available.

In Phase I, we will develop code that is flexible enough to scale the pipeline to link more lexica and more databases within a single article, within the strict turnaround time limit set by the publisher's production process. We will also test the software in linking publications of other journals and develop tools to query and data-mine relationships identified through the data extraction process. We will develop basic APIs: a core API to serve as a database resource, a linking API to store created links and monitor link activity, and, using modern API management, a portal for key-based access to other API data.

After proving the stability and flexibility of the software based on current parameters, in Phase II we will work in collaboration with a wider range of stakeholders (more journals, more databases, including expanding to human biomedical databases, and more companies) to develop experience-based APIs for each stakeholder group. These APIs will be intuitively designed based on how each group interacts with the basic API developed in Phase I and will be used to develop subscription-based access for commercial companies; access for academic stakeholders and collaborating journals will remain free.

The Dynamic Mining and Contextualization of the Scientific Literature (DMCSL) accelerates the rate of scientific discovery and reproducibility by creating interactive science articles and collecting data metrics valuable to all stakeholders: researchers, journals, databases, biotech research companies, and life science vendors. The DMCSL creates a communication bridge between authors and authoritative databases, allowing databases to enforce the use of standardized nomenclature, thereby promoting scientific provenance and reproducibility.
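The abstract describes identifying biological entities in article text, embedding hyperlinks to database pages, and storing link information with article metadata. The following is a minimal sketch of that idea, not the project's actual implementation. It assumes a simple dictionary lexicon and exact-string matching; the names `LEXICON`, `LinkRecord`, and `link_entities` are hypothetical, and the `example.org` URLs are placeholders rather than real database addresses.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: entity name -> (entity type, database, placeholder URL).
LEXICON = {
    "unc-22": ("gene", "WormBase", "https://example.org/wormbase/gene/unc-22"),
    "e66": ("allele", "WormBase", "https://example.org/wormbase/allele/e66"),
}


@dataclass
class LinkRecord:
    """One created link, stored alongside article metadata."""
    entity: str
    entity_type: str
    database: str
    url: str
    article_id: str
    start: int  # character offset of the entity in the original text


def link_entities(text, article_id, lexicon=LEXICON):
    """Wrap each lexicon entity found in `text` in a hyperlink.

    Returns the marked-up text and a list of LinkRecord objects.
    """
    # Try longer names first so a long name is preferred over a shorter prefix.
    names = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookarounds reject matches embedded in a longer token,
    # e.g. "unc-22" inside "unc-222".
    pattern = re.compile(r"(?<![\w-])(" + alternatives + r")(?![\w-])")

    records = []

    def replace(match):
        name = match.group(1)
        entity_type, database, url = lexicon[name]
        records.append(
            LinkRecord(name, entity_type, database, url, article_id, match.start())
        )
        return f'<a href="{url}">{name}</a>'

    return pattern.sub(replace, text), records


if __name__ == "__main__":
    marked_up, links = link_entities(
        "The unc-22 allele e66 was used; unc-222 is not a match.",
        article_id="article-001",  # hypothetical identifier
    )
    print(marked_up)
    for link in links:
        print(link)
```

A production pipeline would presumably draw its lexica from the curated databases and handle synonyms and ambiguous names, which this sketch does not attempt. The returned records are the kind of link information a linking API could store and later combine with access statistics.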

* Information listed above is at the time of submission. *
