VERY HIGH DIMENSIONAL VISUAL MINING OF THE NCI DATASET
Small Business Information
ANVIL INFORMATICS, INC., 600 SUFFOLK ST, 5TH FL N, LOWELL, MA, 01854
Abstract
There is a significant academic and commercial need for new tools that couple high-dimensional data visualization with analytical data mining techniques. We believe that visualization is the interface to analysis and provides guidance in the discovery process. As a major aim, we will investigate and evaluate new visualization tools, some of which are proprietary, capable of displaying an arbitrary number of dimensions of data simultaneously. To do this, we will use the large public NCI DIS compound dataset, whose compounds have been tested against a battery of 60 cancer cell lines. In addition to tool evaluation using this dataset, a lesser aim will be knowledge discovery in the dataset. We propose calculation of the Molconn-Z chemical descriptors and the combined data mining of these descriptors and the associated cell line data. This activity is aimed at the discovery of new compound cancer activity patterns that may be useful in a clinical setting. In a follow-on Phase II research study, we will integrate the selected visualization and analytic tools into a robust integrated data mining package for commercial use.
* information listed above is at the time of submission.