Description:
OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Advanced Computing and Software
OBJECTIVE: Extend the scientific Python programming ecosystem to incorporate a rich set of matrix and tensor operations supported by a compilation system capable of automatic efficient code generation for execution on a range of run-time hardware architectures, including hardware with features that optimize or accelerate operations on sparse data.
DESCRIPTION: Most of the nearly 44 zettabytes of data humans have gathered to date is collected, processed, and stored as multi-dimensional arrays, or tensors. Since the earliest computers and the first FORTRAN compilers, we have steadily refined the storage and processing of dense tensor data. However, a large portion of the data, whether obtained from nature or generated by humans, is sparse. The Python programming environment is increasingly the tool of choice for scientific processing, but it has poor support for sparsity: current practice is either to ignore the sparsity and treat the data as dense, incurring unnecessary overhead, or to transform the data into unnatural dense formats, making the programs much more complicated. Capitalizing on sparsity can greatly increase the efficiency of scientific computations, but the challenge is that there are many sparse data formats, each of which takes advantage of particular features of the data, and the algorithms that operate on these formats are specialized to exploit those features. An additional challenge is the variety of hardware accelerators for matrix and tensor computations, ranging from features in conventional CPUs and GPUs to new hardware capabilities developed specifically to accelerate computation on sparse data. Managing this complexity is beyond the reach of most programmers. The objective of this effort is to make the processing of sparse data as accessible and efficient as working with dense data; specifically, to make the Python APIs for array processing (in NumPy, SciPy, PyTorch, and scikit-learn, used by tens of millions of people) work seamlessly with sparse data. ESPy will build on the current Python scientific library approach (e.g., NumPy and SciPy) to minimize source-code changes and encourage community uptake, adding sparsity awareness and the ability to target multiple hardware architectures, including at least one architecture with features aimed at accelerating computation on sparse data. A framework or domain-specific-language approach can be used to encapsulate the Python application, or the Python environment can be extended with an array/tensor algebra compiler; the selected approach should require minimal source-code changes. A compilation intermediate representation should be devised or adapted that incorporates sparse semantics, such as degree of sparsity and data formatting, and that can be efficiently targeted at multiple run-time hardware architectures. Data-formatting transformations should be minimized.
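To make the dense-versus-sparse overhead concrete, the sketch below contrasts a dense NumPy matrix-vector product with the same expression over SciPy's CSR format, and shows a few of the interchangeable sparse storage formats the description refers to. It is purely illustrative: the matrix size, density, and variable names are hypothetical, and scipy.sparse (see reference [6] for an n-dimensional generalization built on top of it) is only one of the existing Python approaches ESPy would build on.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Hypothetical 5,000 x 5,000 matrix with ~0.1% nonzeros (~25,000 entries).
n, density = 5_000, 0.001
A = sp.random(n, n, density=density, format="csr", random_state=rng)
x = rng.standard_normal(n)

# Dense path: materializes all n*n float64 entries (~200 MB here) and
# spends almost every multiply-add on a zero.
y_dense = A.toarray() @ x

# Sparse path: the same expression, but time and memory scale with the
# stored nonzeros rather than with n*n.
y_sparse = A @ x
assert np.allclose(y_dense, y_sparse)

# The same data admits many storage formats, each favoring different
# access patterns; choosing and converting among them is part of the
# complexity ESPy is meant to hide from the programmer.
A_csc = A.tocsc()  # efficient column slicing
A_coo = A.tocoo()  # convenient incremental construction
```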
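The description also calls for an intermediate representation that carries sparse semantics. The toy sketch below suggests, very loosely, what that might mean: an IR-level tensor type recording storage format and an estimate of the degree of sparsity, so a compiler could select kernels and avoid gratuitous format conversions. Every name here is hypothetical; this is a thought experiment, not a proposed design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class StorageFormat(Enum):
    DENSE = "dense"
    CSR = "csr"
    CSC = "csc"
    COO = "coo"

@dataclass(frozen=True)
class TensorIRType:
    """Toy IR-level tensor type carrying sparse semantics."""
    shape: Tuple[int, ...]
    dtype: str
    fmt: StorageFormat
    nnz_estimate: Optional[int] = None  # degree of sparsity, if known

def prefer_sparse_kernel(t: TensorIRType, threshold: float = 0.05) -> bool:
    """Select a sparse kernel only when the estimated fill ratio is low."""
    if t.fmt is StorageFormat.DENSE or t.nnz_estimate is None:
        return False
    size = 1
    for d in t.shape:
        size *= d
    return t.nnz_estimate / size < threshold
```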
PHASE I: This topic is soliciting Direct to Phase II (DP2) proposals only. Selected performers will have an established and documented background in most or all of the following: the design and implementation of Python-based libraries, frameworks, and domain-specific languages; compilers, compilation intermediate representations, and code generation and optimization; and a range of target computer architectures, including CPUs, GPUs, and hardware accelerators. Performers should also be able to document familiarity with sparse and dense tensor algebra, including comprehensive coverage of the many sparse and dense data structures and formats.
PHASE II: The goal of ESPy is to establish an efficient and easy-to-use sparse tensor algebra capability in the Python scientific ecosystem. The capability can be based on a library approach, or on a domain-specific language or framework, as long as minimal code changes are required of the Python programmer. There is a multitude of sparse data structures and formats; ESPy should support as many of these as possible and be extensible to additional sparse data formats as needed. ESPy should also support multiple target architectures, including at least one architecture implementation with features that can be exploited to accelerate sparse vector or matrix processing. ESPy's support for tensor algebra shall extend beyond addition and multiplication to a rich set of operators and functions, such as convolutions and semiring operations (a minimal semiring sketch follows the metrics list). Expected Phase II metrics:
• At least 100X performance improvement on a sparse array- and tensor-processing benchmark versus generic Python processing using the standard scientific libraries, with fewer than 5% of lines of code changed
• Benchmark composition to be proposed and implemented by the proposer with DARPA’s agreement
• Two or more target hardware architectures in Phase II (base)
• Three or more target architectures, including at least one with hardware support for sparse data processing, in Phase II (option)
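As one concrete example of the "beyond addition and multiplication" requirement, the sketch below implements matrix multiplication over the (min, +) tropical semiring, where one application relaxes all two-hop shortest paths in a weighted graph. It is a minimal dense NumPy illustration with hypothetical names and data; a real ESPy implementation would operate directly on sparse storage, with the semiring's additive identity (+inf here) playing the role that zero plays in ordinary sparse algebra.

```python
import numpy as np

def minplus_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """C[i, j] = min over k of (A[i, k] + B[k, j]) in the (min, +) semiring."""
    # Broadcast to shape (m, k, n), then reduce with min in place of sum.
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

INF = np.inf
# Hypothetical 4-node weighted graph; INF marks an absent edge.
W = np.array([[0.0, 3.0, INF, 7.0],
              [3.0, 0.0, 2.0, INF],
              [INF, 2.0, 0.0, 1.0],
              [7.0, INF, 1.0, 0.0]])

two_hop = minplus_matmul(W, W)   # shortest paths using at most two edges
assert two_hop[0, 2] == 5.0      # 0 -> 1 -> 2 beats the absent direct edge
```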
Schedule/Milestones/Deliverables
• Month 1: Kick-Off meeting; technical approach report that outlines the Python user interface (library/framework/DSL), intermediate representation, and target architecture code generation strategy; proposed tensor-processing benchmark review
• Month 3: PI meeting with PowerPoint presentations of accomplishments since the previous review and plans for the next period
• Month 6: PI meeting with PowerPoint presentations of accomplishments since the previous review and plans for the next period; demonstrations of work so far
• Month 9: PI review, including final design review, demonstration plans, and projected metrics achievements
• Month 12: Final PI review and final report, including quantitative metrics regarding performance and usability (percentage of code changes needed); demonstration of support for multiple sparse data formats; demonstration of support for multiple target architectures; proposal for the Phase II option, including an additional target architecture and a demonstration application (based on transition-partner requirements); the final report shall also document any scientific advances achieved under the program (a brief statement of claims supplemented by publication material will meet this requirement); delivery of software executables, source code (if applicable), and documentation (publishing as open source is acceptable as delivery)
PHASE II OPTION: Phase II option activities should include the addition of a target architecture and a real-world demonstration in collaboration with a transition partner.
Schedule/Milestones/Deliverables
• Month 1: Kick-Off meeting; technical approach report outlining demonstration application based on transition partner requirements and additional target architecture(s)
• Month 3: PI meeting with PowerPoint presentations of accomplishments since the previous review and plans for the next period
• Month 6: PI meeting with PowerPoint presentations of accomplishments since the previous review and plans for the next period; demonstrations of work so far
• Month 9: PI review, including final design review, demonstration plans, and projected metrics
• Month 12: Final review/report including demonstration with transition partner and demonstration of additional target architecture; delivery of software executables, source code (if applicable), and documentation (publishing as open source is acceptable as delivery)
PHASE III DUAL USE APPLICATIONS: Sparse arrays are used extensively in commercial, scientific, and DoD/military applications, such as social media, financial transactions, network traffic analysis, partial differential equation solvers, optimization, and machine learning. A likely pathway forward for the developers is to publish the software as open source, thereby encouraging uptake and recruiting additional developers, and then to offer support and consulting services based on an intimate understanding of the ESPy design and implementation.
REFERENCES:
[1] Shail Dave et al., "Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights," https://ieeexplore.ieee.org/document/9507542.
[2] Jie-Fang Zhang et al., "SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference," https://web.eecs.umich.edu/~zhengya/papers/zhang_jssc21.pdf.
[3] Shaohuai Shi, Qiang Wang, and Xiaowen Chu, "Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format," arxiv.org.
[4] Kartik Hegde et al., "ExTensor: An Accelerator for Sparse Tensor Algebra," csail.mit.
[5] R. Henry et al., "Compilation of Sparse Array Programming Models," DSpace@MIT.
[6] https://sparse.pydata.org/en/stable/ - implementation of sparse arrays of arbitrary dimension on top of numpy and scipy.sparse.
[7] http://tensor-compiler.org/ - a fast and versatile compiler-based library for sparse linear and tensor algebra.
KEYWORDS: Sparse; Dense; Python; Array; Matrix; Tensor