
GoBig: A Unified Interface to Big Data Systems

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0013252
Agency Tracking Number: 222440
Amount: $999,999.00
Phase: Phase II
Program: SBIR
Solicitation Topic Code: 01c
Solicitation Number: DE-FOA-0001405
Solicitation Year: 2016
Award Year: 2016
Award Start Date (Proposal Award Date): 2016-04-11
Award End Date (Contract End Date): 2018-04-10
Small Business Information
28 Corporate Drive
Clifton Park, NY 12065-8688
United States
DUNS: 010926207
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Jeffrey Baumes
 (518) 371-3971
Business Contact
 Charles Weatherford
Phone: (518) 371-3971
Research Institution

A researcher dealing with big data today is met with a maze of languages, programming environments, data storage and query systems, and compute engines. Pursuing a new path in this space may take years and millions of dollars of investment, only to discover that a new and more applicable big data paradigm has emerged. Costs include learning programming languages, storage systems, and computing paradigms, as well as significant hardware and administrative costs of setting up and maintaining the needed environments for data storage, transfer, and computation.

How this problem is being addressed
GoBig unifies and simplifies big data tools in two important areas: a unified user interface to big data software and hardware stacks, and streamlined deployment and modularity across various types of cloud and HPC systems. Data is managed through the extensible Girder data framework, an open-source project started at Kitware that provides a unified interface to many distributed storage systems along with access control and extensible plugins. Romanesco manages analyses and workflows that span programming language boundaries. The results are then persisted in Girder so that they are available for further analysis or visualization. Rather than maintaining separate user endpoints for each big data toolchain, user management and authorization for multiple systems can be handled through GoBig’s account credentials.

What is to be done in Phase I
To demonstrate the feasibility of the GoBig system in Phase I, we will show system modularity by extending computation support in GoBig to Hadoop, HPC clusters running MPI, a queueing system, and a distributed data system. We will also add Julia, Java, and Scala to the analytic programming languages supported in GoBig, and demonstrate the applicability of GoBig to a computational science domain. Our Phase I work will also demonstrate ease of deployment, including provisioning of arbitrary systems and easy installation on cloud services such as OpenStack and Amazon Web Services (AWS). All of this work will follow Kitware’s proven practices for agile, durable, and sustainable software.

Commercial applications and other benefits
Because GoBig is open-source and extensible, the community that grows around these tools will foster agility and innovation while reducing maintenance cost over time. The development model used for open-source projects has also been proven to scale to thousands of developers while maintaining a high standard for quality. We will encourage the participation of developers who can add abstractions for more data storage and processing systems. GoBig’s flexibility and ease of use will ultimately impact a broad range of data analysts who require a low barrier to entry to distributed compute services, including those in government, academia, and the business community.

* Information listed above is at the time of submission. *
