Parallel Algorithms for Processing Huge Sparsely Labeled Datasets on Clusters of Multicore Processors for Healthcare and Manufacturing Applications


Many healthcare and manufacturing applications are unavoidably spatiotemporal and generate large amounts of data. Because expert interaction and measurement are costly for these datasets, only a very small fraction of the data can be labeled in a way that lets models or results be presented in a form that supports human understanding. In general, the underlying spatial and spatiotemporal relationships in these data can be represented as three-dimensional grid or graph structures. Traditional machine learning algorithms are not applicable because they tend to be sample-based, requiring labeling of a significant fraction of the data. Therefore, a pressing need exists for algorithms that can handle very large datasets with only a small fraction of the data labeled. Additionally, the data volume and speed of data acquisition require that such algorithms effectively exploit networked multicore, GPU, and parallel computing resources. The underlying technology should have broad applicability in spatiotemporal big data applications.

The goal of the proposed research is to develop fast, parallel semi-supervised machine learning (ML) algorithms that address the challenges of very large datasets and applications in the domains of healthcare and manufacturing. Such ML algorithms should be effective for datasets having millions to billions of data points, with only a few thousand data points labeled.
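To make the problem setting concrete, the following is a minimal sketch of label propagation, one well-known semi-supervised technique, applied to a three-dimensional grid in which only a few points carry labels. It is an illustration only and not a method prescribed by this topic. The function names (`grid_neighbors`, `propagate_labels`), the 6-connected neighborhood, the fixed iteration count, and the toy grid size are all assumptions made for the example.

```python
from itertools import product


def grid_neighbors(node, shape):
    """Yield the 6-connected neighbors of `node` that lie inside the grid."""
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(node)
            neighbor[axis] += step
            if 0 <= neighbor[axis] < shape[axis]:
                yield tuple(neighbor)


def propagate_labels(shape, seeds, num_classes, iterations=50):
    """Spread a few seed labels over a 3D grid by repeated neighbor averaging.

    shape: (nx, ny, nz) grid dimensions (hypothetical toy input).
    seeds: dict mapping a labeled grid point to its class index.
    Returns a dict mapping every grid point to a predicted class index.
    """
    nodes = list(product(*(range(n) for n in shape)))

    # Labeled points start as one-hot scores; unlabeled points start uniform.
    scores = {}
    for node in nodes:
        if node in seeds:
            scores[node] = [1.0 if c == seeds[node] else 0.0
                            for c in range(num_classes)]
        else:
            scores[node] = [1.0 / num_classes] * num_classes

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            neighbors = list(grid_neighbors(node, shape))
            if node in seeds or not neighbors:
                # Known labels stay clamped; isolated points keep their scores.
                updated[node] = scores[node]
                continue
            # Each update reads only the previous iteration's scores.
            updated[node] = [
                sum(scores[nb][c] for nb in neighbors) / len(neighbors)
                for c in range(num_classes)
            ]
        scores = updated

    return {node: max(range(num_classes), key=lambda c: scores[node][c])
            for node in nodes}


# Toy example: a 4 x 4 x 4 grid (64 points) with only two labeled points.
seeds = {(0, 0, 0): 0, (3, 3, 3): 1}
labels = propagate_labels((4, 4, 4), seeds, num_classes=2)
```

Because each point's update depends only on the scores from the previous iteration, the points can be partitioned into blocks and updated independently within an iteration, which is the property a multicore or cluster implementation would rely on.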

Phase I expected results:
Develop novel machine learning algorithms that can work effectively for sparsely labeled datasets. Demonstrate their parallelization capability on hundreds of traditional cores.

Phase II expected results:
Demonstrate the effectiveness of the machine learning algorithms developed in Phase I on real-world applications. Demonstrate the scalability of these algorithms on large clusters of GPU and multicore machines. Phase II results should lead to a commercialization path for applications of the technology in specific domains (for example, healthcare and manufacturing).

NIST may be available to assist the awardee in both a consultative and a collaborative capacity.
