Department of Energy
August 12, 2013
SBIR / 2014
October 15, 2013
NOTE: The Solicitations and topics listed on this site are copies from the various SBIR agency solicitations and are not necessarily the latest and most up-to-date. For this reason, you should use the agency link listed below which will take you directly to the appropriate agency server where you can read the official version of this solicitation and download the appropriate forms and rules.
The official link for this solicitation is: http://science.doe.gov/grants/pdf/SC_FOA_0000969.pdf
Large scale data storage and processing systems are needed to store, access, retrieve, distribute, and process data from experiments conducted at large facilities, such as Brookhaven National Laboratory's Relativistic Heavy Ion Collider (RHIC) and the Thomas Jefferson National Accelerator Facility (TJNAF). In addition, data acquisition for the Facility for Rare Isotope Beams (FRIB) requires unprecedented speed and flexibility in collecting data from new flash-ADC-based detectors. The experiments at such facilities are extremely complex, involving thousands of detector elements that produce raw experimental data at rates up to a GB/sec, resulting in the annual production of data sets containing hundreds of Terabytes (TB) to Petabytes (PB). Many tens to hundreds of TB of data per year are distributed to institutions across the U.S. and in other countries for analysis. Research on large scale data management systems and high speed, distributed data acquisition is required to support these large nuclear physics experiments. All grant applications must explicitly show relevance to the nuclear physics program.
A recent trend in nuclear physics is to construct data handling and distribution systems using web services or data grid infrastructure software, such as Globus, Condor, SRB, and xrootd, for large scale data processing and distribution. Grant applications are sought for (1) hardware and/or software techniques to improve the effectiveness and reduce the costs of storing, retrieving, and moving such large volumes of data, including, but not limited to, automated data replication coupled with application-level knowledge of data usage; data transfers to Tier 2 and Tier 3 centers from multiple sources (data provenance), with the aim of least wait time and maximal coordination of otherwise chaotic transfers (a toy illustration of this coordination idea appears at the end of this subtopic); distributed storage systems built from commercial off-the-shelf (COTS) hardware; storage buffers coupled to 10 Gbps (or faster) networks; and end-to-end monitoring and diagnostics of WAN file transport; (2) hardware and/or software techniques to improve the effectiveness of computational and data grids for nuclear physics; examples include integrating storage and data management services with scalable distributed data repositories such as xrootd, and developing application-level monitoring services for status and error diagnosis; (3) effective new approaches to data mining, automatic structuring of data and information, and facilitated information retrieval; (4) new tools for configuring and scheduling compute and storage resources for data-intensive high performance computing tasks, such as user analyses in which repeated passes over large datasets require fast turnaround times; and (5) distributed authorization and identity management systems that enable single sign-on access to data distributed across many sites. Proposed infrastructure software solutions should consider and address the advantages of integrating closely with relevant components of Grid middleware, such as those contained in the software stack of the Open Science Grid, the foundation used by nuclear physics and other science communities. Applicants proposing data distribution and processing projects are encouraged to establish their relevance to, and possible future migration strategies into, existing infrastructures.

Grant applications are also sought (1) to provide redundancy and increased reliability for servers employing parallel architectures, so that they are capable of handling large numbers of simultaneous requests from multiple users; (2) for hardware and software to improve remote user access to computer facilities at nuclear physics research centers, while providing adequate security to protect the servers from unauthorized access; and (3) for hardware and software to significantly improve the energy efficiency and reduce the operating costs of computer facilities at nuclear physics research centers.
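To make the transfer-coordination idea in item (1) concrete, the following is a minimal, illustrative Python sketch of a least-wait-time replica selection heuristic. All names and figures in it (Source, pick_source, the bandwidth values) are hypothetical and are not part of any existing Grid middleware API; a real scheduler would also fold in application-level knowledge of data usage, live network monitoring, and failure handling.

# Toy least-wait-time source selection for WAN transfers (illustrative only).
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    bandwidth_gbps: float   # usable WAN bandwidth toward the Tier 2/3 site
    queued_bytes: int = 0   # bytes already scheduled from this replica

    def wait_time_s(self, size_bytes: int) -> float:
        """Estimated completion time if this transfer is appended here."""
        rate = self.bandwidth_gbps * 1e9 / 8        # bytes per second
        return (self.queued_bytes + size_bytes) / rate

def pick_source(sources, size_bytes):
    """Greedy choice: fetch the file from the replica that finishes soonest."""
    best = min(sources, key=lambda s: s.wait_time_s(size_bytes))
    best.queued_bytes += size_bytes                 # book the transfer
    return best

if __name__ == "__main__":
    replicas = [Source("site_A", 10.0), Source("site_B", 4.0)]
    for size in (500e9, 500e9, 200e9):              # three dataset files
        print(f"{size / 1e9:.0f} GB <- {pick_source(replicas, int(size)).name}")

Running the sketch sends the first two files from the faster site and diverts the third to the slower one once the faster site's queue grows, which is the basic coordination behavior sought above.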
Grid deployments such as the Open Science Grid (OSG) in the U.S. and the Worldwide Large Hadron Collider (LHC) Computing Grid (WLCG) in Europe provide standardized infrastructures for scientific computing across large numbers of distributed facilities. To support these infrastructures, two computing paradigms have emerged: (1) Grid Computing, sometimes called computing on demand, supports highly distributed and intensive scientific computing for nuclear physics (and other sciences); and (2) Cloud Computing, often referred to as elastic computing, can offer experiments a fast turn-around resource provisioning solution via virtual machines containing an application-specific computing environment, services, and software stack. Accordingly, there is a need for compatible software distribution and installation mechanisms that can be automated and scaled to the large numbers (hundreds) of computing facilities distributed around the country and the globe, including platform-independent applications as well as solutions that support the provisioning of resources to multiple experiments at a given site. Grant applications are sought to (a) develop mechanisms and tools that enable efficient and rapid packaging, distribution, and installation of nuclear physics application software on distributed computing facilities such as the OSG and WLCG; (b) design innovative solutions for the apportionment of resources and for resource sharing among many experiments and groups in both public and private Cloud environments; and (c) leverage industry standards such as the Hadoop file system or the MapReduce paradigm (see the sketch after this paragraph) to enhance the capabilities of Cloud stacks. Software solutions should enable rapid access to computing resources as they become available to users who do not have the necessary application software environment installed.
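As one concrete illustration of the MapReduce paradigm mentioned in item (c), here is a minimal Hadoop Streaming mapper/reducer pair in Python that counts event records per run number. The input layout (one record per line, run number in the first field) is a hypothetical example rather than a standard nuclear physics format; only the Streaming convention itself (line-oriented stdin/stdout, with mapper output sorted by key before reduction) is real.

# mr.py - toy Hadoop Streaming job: count event records per run (sketch).
import sys

def mapper():
    # Emit "run_number <TAB> 1" for every event record on stdin.
    for line in sys.stdin:
        fields = line.split()
        if fields:
            print(f"{fields[0]}\t1")

def reducer():
    # Streaming delivers mapper output sorted by key; sum counts per run.
    current_run, count = None, 0
    for line in sys.stdin:
        run, value = line.rstrip("\n").split("\t")
        if run != current_run:
            if current_run is not None:
                print(f"{current_run}\t{count}")
            current_run, count = run, 0
        count += int(value)
    if current_run is not None:
        print(f"{current_run}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

The pair can be exercised locally with "cat events.txt | python mr.py map | sort | python mr.py reduce", which mirrors how Hadoop Streaming drives the two stages across a cluster.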
Modern data acquisition systems are becoming more heterogeneous and distributed, which presents new challenges in synchronizing the different elements of this event-driven architecture. The building blocks of a data acquisition system are digitizers, either flash digitizers or integrating digitizers of time, pulse height, or charge. These elements respond in real time to convert electrical signals from detectors into digital form. The data from each detector element are labeled with a precisely synchronized time and transmitted to buffers. The total charge, the number of coincident elements, or other summary information is used to determine whether something interesting has happened, that is, to form a trigger. If the trigger justifies it, the data from the elements are assembled into a time-correlated event for later analysis, a process called Event Building. At present, the elements tend to be connected by buses (VME, cPCI), custom interconnects, or serial connections (USB).

In certain types of experiments at FRIB, low event rates of 1 to 10 kevents/s are anticipated, with dense data streams from FADC-based detector systems. The large latencies possible in highly buffered flash ADC architectures can be used to advantage in the design of the architecture. The concept for the next generation data acquisition system is that it will ultimately be composed of separate ADCs for each detector element, connected by commercial network or serial technology. Development is required to implement the elements of this distributed data acquisition over commercially available network technologies such as 10 Gb Ethernet or the Advanced Telecommunications Computing Architecture (ATCA). The initial work needed is to develop a software architecture for a system that works efficiently within the available network bandwidth and latencies.

The elements desired in the architecture are to (1) synchronize time to a sufficient precision, 10 ns or better to support Flash Analog-to-Digital Converter (FADC) clock synchronization and 100 ns or better to support trigger formation and event building; (2) determine a global trigger from information transmitted by the individual components; (3) notify the elements of a successful trigger, so that they locally store the current information; (4) collect event data from the individual elements for assembly into events (a toy sketch of this step appears at the end of this subtopic); and (5) provide software tools to validate the functioning of the synchronization, triggering, and event building during normal operation. The synchronization of time is critical to the success of this architecture, as is the constant validation of that synchronization.

Grant applications are sought for any of (1) development of the software architecture that specifies a functional model for the individual elements of the system, the high-level network protocols, and the requirements on the communications fabric for given data rates and system latencies, including a portable software implementation of the elements of the architecture; (2) hardware modules that implement the detector digitizer on Ethernet; and (3) time distribution protocols and hardware to support this architecture. Such an architecture and its implementation could form the basis of a standard for next generation data acquisition in nuclear physics, particularly at FRIB.
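As a toy sketch of the event-building step, the following illustrative Python fragment merges time-stamped fragments from independent digitizer channels and groups those falling within a coincidence window, subject to a simple multiplicity trigger. The window width, threshold, and record layout are placeholders, not FRIB specifications.

# Illustrative event builder: group time-correlated fragments (toy example).
import heapq
from collections import namedtuple

Fragment = namedtuple("Fragment", "timestamp_ns channel payload")

def build_events(streams, window_ns=100, min_multiplicity=2):
    """Merge per-channel streams (each already time-ordered) and group
    fragments arriving within window_ns of the first hit of an event."""
    event = []
    for frag in heapq.merge(*streams):          # global time ordering
        if event and frag.timestamp_ns - event[0].timestamp_ns > window_ns:
            if len(event) >= min_multiplicity:  # simple multiplicity trigger
                yield event
            event = []
        event.append(frag)
    if len(event) >= min_multiplicity:
        yield event

if __name__ == "__main__":
    ch0 = [Fragment(10, 0, "a"), Fragment(500, 0, "c")]
    ch1 = [Fragment(55, 1, "b"), Fragment(530, 1, "d")]
    for ev in build_events([ch0, ch1]):
        print([f.payload for f in ev])          # ['a', 'b'] then ['c', 'd']

The sketch presumes the synchronized timestamps that elements (1) and (5) above are meant to provide and validate; in a real system the per-channel streams would arrive over the network from the distributed digitizers.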
Computationally demanding theory calculations, as well as detector simulations and data analysis tasks, can be significantly accelerated by the use of general purpose Graphics Processing Units (GPUs). The ability to exploit these accelerators is constrained by the effort required to port the software to the GPU environment. More capable cross-compilation or source-to-source translation tools are needed that can ingest very complicated templatized C++ code and produce high performance code for heterogeneous architectures. Early work by the USQCD (US Quantum Chromodynamics) collaboration has demonstrated the power of clusters of GPUs in Lattice QCD calculations. This early work was labor intensive but yielded a large return on investment through hand optimization of critical numerical kernels, achieving performance gains of up to 60x with 4 GPUs. However, realizing the full potential of accelerators across the full code base can only be achieved through a capable and performant automated tool chain.
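The flavor of such kernel optimization can be suggested with a deliberately simplified NumPy stand-in: a naive per-site Python loop over a one-dimensional nearest-neighbor stencil (a toy analogue of a lattice operator) versus the same operator expressed as whole-array operations. On a GPU the analogous step is rewriting the kernel in CUDA or OpenCL; the sizes and the stencil here are arbitrary examples, not USQCD code.

# Toy analogue of hand-optimizing a lattice kernel (illustrative only).
import numpy as np

def stencil_naive(phi):
    # One nearest-neighbor update per site, with periodic boundaries.
    n = len(phi)
    out = np.empty_like(phi)
    for i in range(n):
        out[i] = phi[(i - 1) % n] + phi[(i + 1) % n] - 2.0 * phi[i]
    return out

def stencil_vectorized(phi):
    # Same operator as whole-array operations: np.roll shifts the lattice
    # with periodic wrap-around, moving the work out of the interpreter.
    return np.roll(phi, 1) + np.roll(phi, -1) - 2.0 * phi

if __name__ == "__main__":
    phi = np.random.rand(100_000)
    assert np.allclose(stencil_naive(phi), stencil_vectorized(phi))

An automated source-to-source tool of the kind sought here would perform transformations of this character, but on templatized C++ kernels and targeting accelerator hardware rather than a vector library.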
In addition to the specific subtopics listed above, the Department invites grant applications in other areas that fall within the scope of the topic description above.