CLUSTER COMPARISON METHODS & THE NCI EXPRESSION DATASET

Award Information
Agency:
Department of Health and Human Services
Branch:
N/A
Amount:
$98,438.00
Award Year:
2002
Program:
SBIR
Phase:
Phase I
Contract:
1R43CA096179-01
Agency Tracking Number:
CA096179
Solicitation Year:
N/A
Solicitation Topic Code:
N/A
Solicitation Number:
N/A
Small Business Information
ANVIL INFORMATICS, INC.
ANVIL INFORMATICS, INC., 600 SUFFOLK ST, 5TH FL N, LOWELL, MA, 01854
Hubzone Owned:
N
Socially and Economically Disadvantaged:
N
Woman Owned:
N
Duns:
N/A
Principal Investigator
 JOHN HOTCHKISS
 (781) 272-1600
 JHOTCHKISS@ANVILINFO.COM
Business Contact
 MICHAEL MCMANUS
Phone: (978) 934-8821
Email: mmcmanus@anvilinfo.com
Research Institution
N/A
Abstract
There is a significant commercial and academic need for new tools that provide quantitative cluster comparison metrics. Pharmaceutical and biotechnology companies need to be able to critically evaluate the utility of different clustering techniques on large, high-dimensional datasets so that they can make the best-informed decisions based on the clustering results. We propose to evaluate and build cluster comparison metrics and to integrate them with high-dimensional visualization techniques, so that not only an overall score but also the cluster distributions can be compared in an intuitive visual fashion. In carrying out our analysis, we will focus on the NCI gene expression dataset analyzed by Scherf et al. (2000), specifically the subset of 118 compounds with known mechanisms of action drawn from the approximately 1,400 NCI compounds. In a follow-on Phase II SBIR proposal, we will create a robust software package for commercial release in which cluster comparison metrics are integrated with the most valuable visualization tools identified in the Phase I research.

PROPOSED COMMERCIAL APPLICATIONS: The Specific Aims of this Phase I proposal will allow us to create new tools in which cluster comparison metrics are integrated with high-dimensional visualization techniques, so that not only an overall score but also the cluster distributions can be compared in an intuitive visual fashion. We will use the publicly available NCI DIS compound subset gene expression dataset of Scherf et al. (2000) to carry out these aims, and we will also mine this dataset for new discoveries.
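As one illustration of what a quantitative cluster comparison metric can compute, the sketch below implements the adjusted Rand index, a widely used score for agreement between two partitions of the same items. This is an assumption for illustration only: the abstract does not name the metrics the project will evaluate, and the function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same items.

    Hypothetical helper for illustration. Returns 1.0 for identical
    partitions (up to renaming of cluster labels) and values near 0.0 when
    agreement is no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    total = comb(len(labels_a), 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Contingency table: items in cluster i of A and cluster j of B.
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Correct the raw pair-agreement count for agreement expected by chance.
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every item in one cluster
    return (sum_ij - expected) / (max_index - expected)


# Toy labels (hypothetical): two clusterings of the same six items.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))  # 1.0
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

A single value like this summarizes overall agreement but hides where two clusterings differ; the contingency table built inside the function is the kind of per-cluster breakdown that the abstract proposes to pair with visualization of cluster distributions.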

* Information listed above is as of the time of submission.
