A Cloud based Application for the Joint Analysis of Multiple Big Data Types

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 1R41GM125435-01
Agency Tracking Number: R41GM125435
Amount: $150,000.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: 400
Solicitation Number: PA14-157
Timeline
Solicitation Year: 2014
Award Year: 2017
Award Start Date (Proposal Award Date): 2017-08-01
Award End Date (Contract End Date): 2018-03-31
Small Business Information
708 KENT CT
Southlake, TX 76092-8868
United States
DUNS: 079718159
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 GORDON OKIMOTO
 (808) 292-6671
 gokimoto@crch.hawaii.edu
Business Contact
 ALEXANDER CABELLO
Phone: (808) 358-9101
Email: alex.cabello@algorithmhub.com
Research Institution
 UNIVERSITY OF HAWAII AT MANOA
 
BOX 368, 2440 CAMPUS ROAD
HONOLULU, HI 96822-2234
United States

 Nonprofit college or university
Abstract

Project Summary
Technology advances now enable the cost-effective acquisition of K > 1 distinct data types from a common
set of N biosamples, where (i) the kth data type is represented by a data matrix with columns containing P_k
measurements in N samples for k = 1, ..., K, and (ii) at least one of the data types is big, i.e., P_k is
much bigger than N for some k. The rapid accumulation of such multi-modal data sets (MMDS) in private and
public databases has slowed the development of a more predictive, precise, and personalized approach to
detecting and treating cancer and other complex diseases. This problem is due in large part to the lack of
easily accessible, computationally efficient software that can identify small sets of biologically informative
variables (i.e., signatures) in MMDS that are also predictive of clinical outcomes.

The primary aim of this project is to develop a cloud-based application based on a novel algorithm called the
Joint Analysis of Many Matrices via ITeration (JAMMIT), which exploits a key property of genomic signatures
called sparsity to enhance their detection in big data matrices. The sparsity assumption asserts that the
number of variables needed to explain a key biological and/or clinical attribute of the samples constitutes
only a very small fraction of the many thousands measured. JAMMIT computes sparse rank-1 matrix
approximations that automatically zoom in on sparse signatures that are shared by the data matrices of an
MMDS. False discovery rate (FDR) is used to select the best signatures for further downstream data reduction
and modeling. The JAMMIT algorithm has been validated in data simulations and on real experimental data for
ovarian and liver cancer.

A novel cloud-based platform called AlgorithmHub will be used to implement JAMMIT as a secure,
computationally efficient Software as a Service (SaaS) on Amazon Web Services (AWS). Researchers will be
able to access the application from any device with internet access to upload, pre-process, and analyze up to
three big data types in a timely manner. Post-processing tools will be implemented in AWS that facilitate the
training of neural network (NN) predictors on eigen-wavelet (EW) features extracted from JAMMIT-derived
signatures, using genetic programming and backpropagation to optimize network topology and connection
weights, respectively. Signatures derived by JAMMIT as a SaaS will be compared with published results
generated by a version of JAMMIT implemented on local servers in Matlab. NNs trained on raw signature and
EW features will be assessed and compared using ROC curves, confusion matrices, and cross-validation.

The Phase I implementation of JAMMIT as a cloud-based application will set the stage for a Phase II effort to
extend JAMMIT to handle an arbitrary number of data matrices, automate the selection of the best sparsity
parameter based on FDR, enhance ease of use based on user feedback, and utilize genetic programming to
optimize both EW features and network topology along with network connection weights.

Project Narrative
Multi-modal data sets (MMDS) composed of multiple big data types obtained from a common set of samples
are rapidly accumulating in private and public databases, thus posing a major bottleneck in the translation of
such data into useful clinical applications. The Joint Analysis of Many Matrices via ITeration (JAMMIT)
algorithm uses sparse signal processing methods to detect small sets of variables (i.e., signatures) in MMDS
that are biologically informative and predictive of clinical outcomes. The JAMMIT algorithm will be implemented
as a computationally efficient, user-friendly software application in Amazon Web Services (AWS) using
AlgorithmHub, a platform for developing and deploying complex algorithms as cloud-based applications.
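
The abstract does not spell out the JAMMIT iteration, so the following is only an illustrative sketch of the general idea of a sparse rank-1 approximation shared across several data matrices. It is not the JAMMIT algorithm itself. It assumes each data type is stored with variables as rows and the N common samples as columns, stacks the matrices, and alternates a soft-thresholded update of the variable loadings with a normalized update of the sample profile (a generic L1-penalized power iteration). The penalty lam, the function names, and the toy data are all hypothetical, and the FDR-based selection of the sparsity parameter is not shown.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; anything within [-lam, lam] becomes 0."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def normalize(vec):
    """Scale vec to unit Euclidean length (returned unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sparse_rank1(matrices, lam, n_iter=50):
    """Sparse rank-1 approximation u v^T of vertically stacked matrices.

    matrices: list of K matrices (lists of rows), each P_k x N, sharing
              the same N sample columns (an assumed layout).
    lam:      hypothetical L1 penalty; larger values give a sparser u.
    Returns (u, v): sparse loadings over all stacked variables and a
    unit-length profile over the N samples.
    """
    stacked = [row for m in matrices for row in m]
    n_samples = len(stacked[0])
    # Deterministic start: the direction of the largest-norm row.
    v = normalize(max(stacked, key=lambda r: sum(x * x for x in r)))
    u = [0.0] * len(stacked)
    for _ in range(n_iter):
        # u <- soft-threshold(X v): variables with weak support drop to zero.
        u = [soft_threshold(sum(a * b for a, b in zip(row, v)), lam)
             for row in stacked]
        if not any(u):
            break  # penalty too strong: no variable survives
        # v <- X^T u, rescaled to unit length.
        v = normalize([sum(u[i] * stacked[i][j] for i in range(len(stacked)))
                       for j in range(n_samples)])
    return u, v


def split_support(u, matrices):
    """Map nonzero entries of u back to row indices within each matrix."""
    supports, start = [], 0
    for m in matrices:
        supports.append([i for i in range(len(m)) if u[start + i] != 0.0])
        start += len(m)
    return supports


# Toy data: one sample profile shared by a few rows of two data types.
signal = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]


def make_matrix(n_rows, loadings, offset):
    """Rows listed in `loadings` carry the signal; all rows get small
    deterministic noise so the example is reproducible."""
    return [[loadings.get(i, 0.0) * s + 0.1 * math.sin(offset + 7 * i + 3 * j)
             for j, s in enumerate(signal)]
            for i in range(n_rows)]


A = make_matrix(5, {0: 3.0, 1: 3.0}, offset=0)   # data type 1: 5 variables
B = make_matrix(4, {2: -2.0}, offset=1)          # data type 2: 4 variables

u, v = sparse_rank1([A, B], lam=1.0)
supports = split_support(u, [A, B])
print(supports)
```

In this toy setting the nonzero entries of u form the "signature": the rows of each matrix that share the common sample profile v. A real analysis would involve far more variables than samples and a principled choice of the sparsity parameter, which the abstract says is based on FDR.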

* Information listed above is at the time of submission. *
