Encapsulating Generalized Principal Components and Support Vector Machines in a Nonlinear Statistical Pattern Recognition Toxicology Tool
Small Business Information
6 New England Executive Park, Burlington, MA, 01803
Abstract
Identifying toxic-substance exposure at low, subtoxic concentrations requires interpretation of complex, time-related changes in gene, protein, and metabolite expression patterns. We propose a statistical pattern recognition software tool based on flexible and adaptable modules. In Phase I, we will produce the tool framework and Principal Components Analysis (PCA) and Support Vector Machine (SVM) modules. The tool framework preprocesses genomic, proteomic, and metabonomic data sets from clinical sources. PCA linearly transforms each type of data to an orthogonal space of significantly reduced dimension in which the expression patterns of like toxins cluster. Through Statistical Learning Theory, the SVM adapts and estimates a nonlinear mapping function from the expression-data input space to a decision feature space, using data for which ground truth has been independently established. A similarity kernel in feature space induces a metric on the input space by selecting key feature components, and produces a nonlinear decision boundary. Selection of the kernel can range from a simple distance metric to neural networks (multi-layer perceptrons, radial/elliptical basis function networks, etc.) to fuzzy membership functions. By changing the kernel, the performance of the SVM decision boundaries can be optimized over a range of kernel similarity metrics, feature mappings, and feature selection. Other modules may be added in the future to implement a wider suite of solutions in a user-friendly analysis environment.
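The PCA-then-SVM flow described in the abstract can be illustrated with a minimal Python sketch. It is not the proposed tool: the data are synthetic stand-ins for expression profiles, all function names (`pca_fit`, `pca_transform`, `rbf_kernel`, `train_svm_dual`) are hypothetical, and the SVM is a simplified bias-free soft-margin formulation whose dual is solved by projected gradient ascent rather than a production solver. The kernel function is passed around as an ordinary argument, which is one way the "change the kernel" idea could be expressed.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt from the SVD of the centered data are orthonormal directions,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the reduced orthogonal space."""
    return (X - mean) @ components.T


def rbf_kernel(A, B, gamma=0.5):
    """Radial basis function similarity between the rows of A and B."""
    sq_dist = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def train_svm_dual(K, y, C=1.0, n_iter=500):
    """Bias-free soft-margin SVM dual: maximize sum(a) - 0.5 a^T Q a, 0 <= a <= C.

    Assumption: dropping the bias removes the equality constraint, so plain
    projected gradient ascent on the box [0, C] is sufficient.
    """
    Q = (y[:, None] * y[None, :]) * K
    step = 1.0 / np.linalg.eigvalsh(Q).max()  # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        alpha = np.clip(alpha + step * (1.0 - Q @ alpha), 0.0, C)
    return alpha


# Synthetic stand-in data: 80 samples, 50 "expression" features, two classes
# labelled -1 / +1 that differ in the first five features.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 40)
X = rng.normal(size=(80, 50))
X[:, :5] += 3.0 * y[:, None]

# Shuffle, then hold out 20 samples to play the role of unseen data.
perm = rng.permutation(80)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]

# Dimension reduction is fitted on the training data only.
mean, components = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)

# Train on labelled ("ground truth") samples, then classify the held-out ones.
kernel = rbf_kernel  # swap in another similarity function to change the kernel
alpha = train_svm_dual(kernel(Z_train, Z_train), y_train)
scores = kernel(Z_test, Z_train) @ (alpha * y_train)
accuracy = float(np.mean(np.sign(scores) == y_test))
print("held-out accuracy:", accuracy)
```

Because the classes in this toy data are deliberately well separated, the sketch says nothing about performance on real genomic, proteomic, or metabonomic measurements; it only shows how the two modules would hand data to each other.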
* Information listed above is at the time of submission.