OBJECTIVE: Develop a distributed fusion engine that can reason over information held in cloud environments to inform decision tools. Meeting this objective requires hybrid processing (time-delayed cloud and real-time non-cloud), services capable of maintaining data and belief net models across distributed nodes, and inferencing algorithms that can be implemented as MapReduce jobs to assemble fused products tailored to user needs (e.g., area, time and interests).
DESCRIPTION: Data volumes are now so large that humans cannot assimilate data sets and create analytic products without machine assistance. Cloud architecture offers a means to share data and services across nodes. The open-source Hadoop and MapReduce frameworks enable data to be structured for parallel processing. Accumulo, based on Google's Bigtable design, is being studied for use by many agencies. These technologies could serve as a basis for distributed data fusion to address information requirements across all warfighting functions. The challenge is developing a complex data fusion methodology that works within cloud technology. While cloud architectures offer unprecedented access to large data, legacy applications, such as fusion algorithms that were developed to run on single servers, will not easily be rehosted as distributed applications. To replicate past success, technology is needed that can maintain data models, reason about patterns and belief nets, and enable machine learning, all across geographically separated cloud nodes. A goal of this topic is to develop technology that contextually filters content for a specific area and time of interest in support of a fusion engine that can address user-specified contextual questions. Challenges exist, however, in architecting algorithms that can run as both standalone and distributed services.
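As a minimal sketch of contextual filtering expressed in map and reduce steps, the Python below keeps only reports that fall inside a user-specified area and time window and then counts them per entity. It is an illustration under stated assumptions, not a proposed design: the `Report` and `Context` record layouts, the function names, and the sample data are all hypothetical, and the shuffle between map and reduce is simulated in a single process rather than run on Hadoop.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical record format; the topic does not define one."""
    entity: str
    lat: float
    lon: float
    hour: int  # hours since an arbitrary epoch


@dataclass(frozen=True)
class Context:
    """Hypothetical user-specified area and time of interest."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    hour_start: int
    hour_end: int

    def contains(self, r: Report) -> bool:
        return (self.lat_min <= r.lat <= self.lat_max
                and self.lon_min <= r.lon <= self.lon_max
                and self.hour_start <= r.hour <= self.hour_end)


def map_filter(report, ctx):
    """Map step: emit (entity, 1) only for reports inside the context."""
    if ctx.contains(report):
        yield report.entity, 1


def reduce_count(entity, counts):
    """Reduce step: total number of in-context reports for one entity."""
    return entity, sum(counts)


def run_job(node_partitions, ctx):
    """Simulate the shuffle locally; a cluster would do this across nodes."""
    grouped = defaultdict(list)
    for partition in node_partitions:
        for report in partition:
            for key, value in map_filter(report, ctx):
                grouped[key].append(value)
    return dict(reduce_count(k, v) for k, v in grouped.items())


# Two hypothetical nodes, each holding a different slice of the data.
node_a = [Report("vessel-1", 10.0, 20.0, 5), Report("vessel-2", 50.0, 20.0, 5)]
node_b = [Report("vessel-1", 10.5, 20.5, 7), Report("vessel-1", 10.0, 20.0, 99)]
ctx = Context(9.0, 11.0, 19.0, 21.0, 0, 24)
result = run_job([node_a, node_b], ctx)
```

Because the map step looks at one report at a time, the same functions could run unchanged on a single standalone node or on many nodes; only the grouping in `run_job` would be replaced by the cluster's shuffle.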
Data fusion algorithms are key to translating raw data into a disambiguated data layer and situational understanding, but maintaining data models and belief nets across a cloud is a research challenge that must be addressed in order to realize the power of the cloud for the tactical warfighter. The key technical challenges inherent to the topic are how to maintain a common data model when data enrichment (e.g., entity extraction) is occurring across many distributed nodes, and how to implement probabilistic reasoning as a MapReduce task. Applications currently authored as MapReduce jobs generally do not require deep collaboration between nodes. For example, a state-of-the-art MapReduce task characterizes the data content of each node as n-grams (a static ontology) and performs searches as MapReduce jobs that do not require feedback to the distributed nodes (independent node searches). This topic will expand the state of the art by enabling distributed nodes to work together on a data model and probabilistic reasoning. A successful prototype will implement both level 1 fusion (entity and metadata disambiguation) and level 2 fusion (inferencing) algorithms across a set of cloud nodes.
PHASE I: Develop techniques to implement data fusion algorithms in a cloud environment; identify key technical risks associated with the development of a prototype; implement a design strategy to measure algorithm performance over time. The technical approach should address the challenges of maintaining data models, inferring patterns and enabling machine learning. Identify a specific application and use case for a customer (military and commercial) and outline a plan for going forward with research. The final Phase I brief should include a proof-of-concept demonstration and show plans for a Phase II.
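To make the idea of probabilistic reasoning as a map reduce task concrete, the sketch below fuses evidence about a hypothesis from several nodes by summing log-likelihood ratios. It is deliberately much simpler than a belief net: it assumes every observation is conditionally independent given the hypothesis (a naive Bayes assumption that this topic does not state), and the tuple format, function names, and probabilities are hypothetical. The shuffle is again simulated in one process.

```python
import math
from collections import defaultdict


def map_evidence(observation):
    """Map step: turn one observation into (hypothesis, log-likelihood ratio).

    observation is a hypothetical tuple:
    (hypothesis, P(obs | hypothesis true), P(obs | hypothesis false)).
    """
    hypothesis, p_if_true, p_if_false = observation
    yield hypothesis, math.log(p_if_true / p_if_false)


def reduce_posterior(prior, log_ratios):
    """Reduce step: add the evidence to the prior log-odds, return a posterior."""
    log_odds = math.log(prior / (1.0 - prior)) + sum(log_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


def fuse(node_partitions, prior=0.5):
    """Simulate the shuffle locally and compute one posterior per hypothesis."""
    grouped = defaultdict(list)
    for partition in node_partitions:
        for observation in partition:
            for key, value in map_evidence(observation):
                grouped[key].append(value)
    return {h: reduce_posterior(prior, ratios) for h, ratios in grouped.items()}


# Two hypothetical nodes, each with its own evidence about the same hypothesis.
node_a = [("track-7-is-fishing-vessel", 0.8, 0.2)]
node_b = [("track-7-is-fishing-vessel", 0.6, 0.3)]

distributed = fuse([node_a, node_b])
single_node = fuse([node_a + node_b])
```

Under the independence assumption the reduce step needs only a sum, so the distributed result matches the result from a single node holding all the data. Belief nets with dependencies between variables would need message passing between nodes, which is the harder collaboration problem this topic describes.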
PHASE II: Produce a prototype system capable of running level 1 (data resolution) and level 2 (inference) fusion algorithms across geographically separate cloud nodes, each holding different data sources, some streaming. The prototype system should be able to maintain data models and inferences about behavior while allowing machine learning from a distributed cloud architecture. Validate that level 1 and level 2 fusion results derived from at least two small computer clusters are similar to what is possible from a single node having access to all the data. The prototype should present the context and pedigree of information used by the fusion engine for operator review, independent of which cluster it was sourced from. During the Phase II effort, the transition path should be strengthened by focusing on data and use cases of interest.
PHASE III: Produce an application or set of applications that can be generalized to N cloud nodes, with relevance to Navy and Marine Corps use cases. The Phase III product(s) should be capable of running on program-of-record cloud systems such as DCGS-N, using existing services to run against operational data. Developed applications must have relevance to amphibious, anti-submarine and integrated air-missile defense warfare mission areas. During this phase the performer should concentrate on operational relevance and transition.
PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: The use of cloud architectures is becoming prevalent in both the DoD and the private sector. Law enforcement and news services are sectors that also need to move beyond capabilities that enable data discovery in distributed clouds to systems that can implement complex data fusion algorithms. Data stored in clouds are already being used by these sectors to assess trends and discover events and activities of interest.
REFERENCES: 1.
Sustaining US Global Leadership: Priorities for the 21st Century Defense, Jan 2012, http://www.defense.gov/news/Defense_Strategic_Guidance.pdf
2. Hadoop and MapReduce, http://hadoop.apache.org/mapreduce/
3. Accumulo, http://accumulo.apache.org/
4. Scott C. McGirr, "Building a Single Integrated Picture Over Networks", MSS National Symposium on Sensor and Data Fusion, McLean, VA, June 6-8, 2006.
5. Richard Antony and J. A. Karakowski, "Fusion of HUMINT and Conventional Multi-Source Data", MSS National Symposium on Sensor and Data Fusion, July 2007.
6. Chee-Yee Chong, D. Hall, M. Liggins and J. Llinas (editors), Distributed Data Fusion for Network-Centric Operations, CRC Press, Nov 13, 2012.