A Cloud based Application for the Joint Analysis of Multiple Big Data Types

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 1R41GM125435-01
Agency Tracking Number: R41GM125435
Amount: $150,000.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: 400
Solicitation Number: PA14-157
Solicitation Year: 2014
Award Year: 2017
Award Start Date (Proposal Award Date): 2017-08-01
Award End Date (Contract End Date): 2018-03-31
Small Business Information
708 KENT CT, Southlake, TX, 76092-8868
DUNS: 079718159
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
Phone: (808) 292-6671
Business Contact
Phone: (808) 358-9101
Research Institution
 BOX 368, 2440 CAMPUS ROAD
HONOLULU, HI, 96822-2234
 Nonprofit college or university
Project Summary
Technology advances now enable the cost-effective acquisition of K distinct data types from a common set of N bio-samples, where (i) the kth data type is represented by a data matrix with columns containing Pk measurements in N samples for k = 1, ..., K, and (ii) at least one of the data types is "big", i.e., Pk is much bigger than N for some k. The rapid accumulation of such multi-modal data sets (MMDS) in private and public databases has slowed the development of a more predictive, precise, and personalized approach to detecting and treating cancer and other complex diseases. This problem is due in large part to the lack of easily accessible, computationally efficient software that can identify small sets of biologically informative variables (i.e., signatures) in MMDS that are also predictive of clinical outcomes.

The primary aim of this project is to develop a cloud-based application based on a novel algorithm called the Joint Analysis of Many Matrices via ITeration (JAMMIT), which exploits a key property of genomic signatures called sparsity to enhance their detection in big data matrices. The sparsity assumption asserts that the number of variables needed to explain a key biological and/or clinical attribute of the samples constitutes only a very small fraction of the thousands measured. JAMMIT computes sparse low-rank matrix approximations that automatically "zoom in" on sparse signatures shared by the data matrices of an MMDS. False discovery rate (FDR) is used to select the best signatures for further downstream data reduction and modeling. The JAMMIT algorithm has been validated in data simulations and on real experimental data for ovarian and liver cancer.

A novel cloud-based platform called AlgorithmHub will be used to implement JAMMIT as a secure, computationally efficient Software as a Service (SaaS) on Amazon Web Services (AWS). Researchers will be able to access the application from any device with internet access to upload, pre-process, and analyze up to three big data types in a timely manner. Post-processing tools will be implemented in AWS that facilitate the training of neural network (NN) predictors on eigen-wavelet (EW) features extracted from JAMMIT-derived signatures, using genetic programming and backpropagation to optimize network topology and connection weights, respectively. Signatures derived by JAMMIT as a SaaS will be compared with published results generated by a version of JAMMIT implemented on local servers in Matlab. NNs trained on raw signature and EW features will be assessed and compared using ROC curves, confusion matrices, and cross-validation. The Phase I implementation of JAMMIT as a cloud-based application will set the stage for a Phase II effort to extend JAMMIT to handle an arbitrary number of data matrices, automate the selection of the best sparsity parameter based on FDR, enhance ease of use based on user feedback, and utilize genetic programming to optimize both EW features and network topology along with network connection weights.

Project Narrative
Multi-modal data sets (MMDS) composed of multiple big data types obtained from a common set of samples are rapidly accumulating in private and public databases, thus posing a major bottleneck in the translation of such data into useful clinical applications. The Joint Analysis of Many Matrices via ITeration (JAMMIT) algorithm uses sparse signal processing methods to detect small sets of variables (i.e., signatures) in MMDS that are biologically informative and predictive of clinical outcomes. The JAMMIT algorithm will be implemented as a computationally efficient, user-friendly software application in Amazon Web Services (AWS) using AlgorithmHub, a platform for developing and deploying complex algorithms as cloud-based applications.
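
The sparsity idea described in the summary can be illustrated with a minimal Python sketch. This is not the JAMMIT algorithm, whose details are not given in this record. It is a generic alternating soft-thresholding scheme for a sparse rank-1 approximation, applied to data matrices stacked along the variable axis so that they share one sample pattern. The function names, the threshold `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank1_joint(matrices, lam, n_iter=100, tol=1e-8):
    """Sparse rank-1 approximation of K matrices sharing N sample columns.

    matrices: list of arrays, each of shape (P_k, N)
    lam:      soft-threshold level (hypothetical sparsity parameter)
    Returns (loadings, v): one sparse loading vector per input matrix and
    the unit-norm sample pattern v of length N.
    """
    stacked = np.vstack(matrices)  # shape (sum of P_k, N)
    # Start from the leading right singular vector of the stacked data.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    v = vt[0]
    u = np.zeros(stacked.shape[0])
    for _ in range(n_iter):
        # Variable loadings: project onto v, then zero out weak variables.
        u_new = soft_threshold(stacked @ v, lam)
        if not np.any(u_new):  # lam too large: nothing survives
            u = u_new
            break
        # Sample pattern: project onto the sparse loadings and renormalise.
        v_new = stacked.T @ u_new
        v_new /= np.linalg.norm(v_new)
        converged = np.linalg.norm(u_new - u) < tol
        u, v = u_new, v_new
        if converged:
            break
    # Split the stacked loadings back into one vector per data type.
    boundaries = np.cumsum([m.shape[0] for m in matrices])[:-1]
    return np.split(u, boundaries), v


# Synthetic example: two data types, 20 samples, a shared signature
# planted in the first 5 variables of each matrix (illustrative values).
rng = np.random.default_rng(0)
n_samples = 20
pattern = rng.choice([-1.0, 1.0], size=n_samples)
data = []
for n_vars in (200, 300):
    m = rng.normal(size=(n_vars, n_samples))
    m[:5] += 5.0 * pattern
    data.append(m)

loadings, sample_pattern = sparse_rank1_joint(data, lam=6.0)
for k, u_k in enumerate(loadings):
    print(f"data type {k}: selected variables {np.flatnonzero(u_k).tolist()}")
```

Stacking the matrices forces a single sample pattern across all data types, which is one simple way to look for a signature that they share. In this sketch the threshold is fixed by hand. The summary describes using false discovery rate to choose among signatures, which is not shown here.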

* Information listed above is at the time of submission. *
