Rapid Autonomous Data Ingest Algorithms (RADIA)

Description:

TECHNOLOGY AREA(S): Info Systems, Battlespace 

OBJECTIVE: The Distributed Common Ground System-Navy Increment 2 (DCGS-N Inc 2) program seeks to employ novel machine learning techniques to optimize data ingest of multiple heterogeneous data types into anticipated Navy program data repositories (e.g., Accumulo). Automated data ingest must aid the DCGS-N Inc 2 system in facilitating real-time analytical processing, post-event analytics, and nodal analysis, and must support a host of other Navy Intelligence mission functions, e.g., Intelligence Preparation of the Operational Environment (IPOE). 

DESCRIPTION: To maintain maritime supremacy, the U.S. Navy must collect and understand ever-increasing volumes and varieties of sensor and intelligence information to ensure proper force application across greater distances under ever-compressing time constraints. DCGS-N Inc 2 is the intelligence system principally responsible for providing Navy commanders that understanding. To this end, DCGS-N Inc 2 must quickly aggregate, correlate, and fuse ‘All Source Intelligence’ to produce the current and predictive, operational-to-tactical battlespace awareness information required to make better decisions faster. With an exponential increase expected in the data sources available to the DCGS-N Inc 2 Analyst, the intention of this topic is to provide an automated ingest engine that optimizes information aggregation, fusion, and exploitation of unstructured, heterogeneous data streams to aid the DCGS-N Inc 2 Analyst. Additionally, known data producers commonly make minor changes to their data protocols and present the updated formats to ingest interfaces that have not received the new format definitions; because current ingest protocols are rigid and brittle, the result is data loss. Current ingest methodologies fail to keep pace with the volume, variety, variability, velocity, and veracity required of the DCGS-N Inc 2 system. This SBIR topic seeks to advance current state-of-the-art data ingest methodologies to mitigate these problems. Optimally, the developed ingest engine will leverage Commercial-off-the-Shelf (COTS) and Government-off-the-Shelf (GOTS) tools and services, including the large data storage and analytics processes employed in DCGS-N Inc 2. Ingest interfaces will enable the automated combining of high volumes of data from differing intelligence communities, National Technical Means (NTM) systems, and network feeds to aid DCGS-N Inc 2 in building a more coherent view of the battlespace. 
The ingestion process must be able to handle multiple data sources arriving simultaneously at differing nodes (ashore and afloat) and accommodate varying volumes, velocities, and varieties, including data bursts/blooms. Critical to this effort will be the capacity for the ingest engine to ‘self-learn’ in order to ingest new, previously ‘unseen’ data and adapt to new data sources and formats. The engine will be able to process data for storage and use by DCGS-N Inc 2 analytics or other key system functions. Data tagging and normalization must be accomplished through the ingest process in accordance with the eXtensible Markup Language (XML) Data Encoding Specification for Intelligence Community (IC) Enterprise Data Header (EDH), V4, 6 Sep 13. The ingest process must send a copy of the original message plus the EDH to be persisted and indexed. The volume and velocity of data coming into the system vary widely; the system must dynamically adjust to these changes. The goal is for ingest and preprocessing not to exceed 60 seconds from the start of ingest to consumer availability. For estimation purposes, traffic will be measured in ‘messages’ at 10KB per message at DCGS-N Inc 2 specified ingest rates. It is also critical that the data ingest indexing mechanism enable rapid retrieval (within 2 seconds) of stored data to meet the demands of operators in a tactical environment. The ingest engine needs to be flexible in handling a combination of streaming, bulk, and standing order data, with emphasis on minimizing the time from data acquisition to consumer availability without system degradation. The process also needs to be able to cleanse, de-duplicate, and re-ingest data in the event of data ingestion errors. The system should also be scalable in a virtualized/cloud environment, capable of ingesting multiple data sets in parallel and handling inconsistent loads, and able to synchronize, replicate, and federate. 
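
As an illustration only, the de-duplicate, persist-original-plus-header, and re-ingest behavior described above might be sketched as follows. Everything in this sketch is an assumption made for illustration: the names `Envelope`, `IngestStore`, and `parse_kv`, the use of a SHA-256 content hash to detect duplicates, and the in-memory dictionary standing in for the repository. The `header` dictionary is a placeholder and does not follow the IC EDH schema.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical record: the untouched original message plus header metadata."""
    original: bytes
    header: dict  # placeholder metadata, NOT the IC EDH schema


def parse_kv(raw: bytes) -> dict:
    """Hypothetical normalizer for 'key=value;key=value' messages.

    Raises ValueError (including decode errors) on anything it cannot parse.
    """
    text = raw.decode("utf-8")
    if "=" not in text:
        raise ValueError("not a key=value message")
    return dict(pair.split("=", 1) for pair in text.split(";"))


@dataclass
class IngestStore:
    """Toy in-memory stand-in for the persisted, indexed repository."""
    records: dict = field(default_factory=dict)  # content digest -> Envelope
    failed: list = field(default_factory=list)   # (raw, source) awaiting re-ingest

    def ingest(self, raw: bytes, source: str, normalize) -> bool:
        """Return True if stored; False if a duplicate or deferred after an error."""
        digest = hashlib.sha256(raw).hexdigest()
        if digest in self.records:  # de-duplicate on the content hash
            return False
        try:
            fields = normalize(raw)
        except ValueError:
            # Keep the raw message so it can be re-ingested later.
            self.failed.append((raw, source))
            return False
        header = {
            "source": source,
            "sha256": digest,
            "ingested_at": time.time(),
            "fields": fields,
        }
        self.records[digest] = Envelope(original=raw, header=header)
        return True

    def reingest(self, normalize) -> int:
        """Retry every deferred message; return how many were stored this time."""
        pending, self.failed = self.failed, []
        return sum(self.ingest(raw, source, normalize) for raw, source in pending)
```

In this sketch a message that fails normalization is never dropped: it stays in `failed` until `reingest` is called, for example with an updated normalizer. Indexing on the content digest keeps the duplicate check to a single lookup, but it only detects byte-identical duplicates.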

PHASE I: Working in conjunction with the DCGS-N Inc 2 Government team, generate a novel design/design approach for a machine learning methodology to address feasibility of automated ingest for the DCGS-N Inc 2 system. Proposed design must be capable of ingesting varying types and formats of data in varying volumes, velocities, variability, and veracity. Examples of data include Navy Message traffic, electronic intelligence (ELINT), communications intelligence (COMINT), acoustical intelligence (ACINT), etc. Proposed design must also be able to adjust (self-learn) to process new data types, and handle changes in formats/fields of existing data types/feeds. 
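
As a rough illustration of tolerating changes in the fields of an existing feed, the sketch below maps drifted field names onto a known schema instead of rejecting the record. It is a simple string-similarity heuristic from the Python standard library, not a machine learning method and not a proposed design. The schema in `KNOWN_FIELDS`, the function name `map_fields`, and the 0.6 similarity cutoff are all assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical target schema; real feeds would define their own fields.
KNOWN_FIELDS = ["timestamp", "latitude", "longitude", "emitter_id"]


def map_fields(record: dict, known=KNOWN_FIELDS, cutoff: float = 0.6):
    """Map incoming field names to known ones; keep the rest for later review.

    Returns (mapped, unmapped). Nothing is discarded, so a renamed or new
    field does not cause data loss.
    """
    mapped, unmapped = {}, {}
    for name, value in record.items():
        # Normalize case and separators before comparing names.
        key = name.strip().lower().replace("-", "_").replace(" ", "_")
        match = get_close_matches(key, known, n=1, cutoff=cutoff)
        if match and match[0] not in mapped:
            mapped[match[0]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


if __name__ == "__main__":
    drifted = {
        "TimeStamp": "t0",
        "Latitude_deg": 12.5,
        "emitter-id": "E7",
        "remarks": "new",
    }
    mapped, unmapped = map_fields(drifted)
    print(mapped)
    print(unmapped)
```

Fields that match nothing in the schema are returned in `unmapped` rather than dropped, so they remain available for an analyst or a learning component to classify later.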

PHASE II: The selected company must develop a cloud-enabled, virtualized machine learning ingest capability based on the Phase I proposal. Phase II should produce machine learning algorithms employed for the DCGS-N Inc 2 Program of Record (PoR). Phase II work should include the development of additional data types/feeds. Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. Owned and Operated with no Foreign Influence as defined by DoD 5220.22-M, National Industrial Security Program Operating Manual, unless acceptable mitigating procedures can be and have been implemented and approved by the Defense Security Service (DSS). The selected contractor and/or subcontractor must be able to acquire and maintain a secret-level facility and Personnel Security Clearances, as set forth by DSS and SPAWAR, in order to perform on advanced phases of this contract and to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material IAW DoD 5220.22-M during the advanced phases of this contract. 

PHASE III: Continue Phase II research and development (complete necessary engineering, system integration, packaging, and testing) to field the capability into the DCGS-N Inc 2 computing infrastructure. Commercialize the capability for technology transition to the wider defense and intelligence communities and the broader commercial Business Intelligence (BI) marketplace. The self-learning data ingest optimization engine described in this topic could have significant commercial potential for any BI/Enterprise Content Management/Cloud Data Services enterprise regardless of business concern. 

REFERENCES: 

1: Balazinska, M., A. Deshpande, M. Franklin, P. Gibbons, J. Gray, S. Nath, M. Hansen, M. Liebhold, A. Szalay, and V. Tao. "Data Management in the Worldwide Sensor Web," IEEE Pervasive Computing, www.computer.org/pervasive (ISSN: 1536-1268), 2007. https://www.computer.org/csdl/mags/pc/2007/02/b2030.html

2: Paprotny, A., and M. Thess. "Realtime Data Mining: Self-Learning Techniques for Recommendation Engines," Applied and Numerical Harmonic Analysis, Springer International Publishing (2013). http://www.springer.com/us/book/9783319013206

3: Minelli, M., M. Chambers, and A. Dhiraj. "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses," John Wiley & Sons, Inc. (2013). https://play.google.com/store/books/details/Michael_Minelli_Big_Data_Big_Analytics?id=Mg3WvT8uHV4C

4: eXtensible Markup Language (XML) Data Encoding Specification for Intelligence Community (IC) Enterprise Data Header (EDH), V4, 6 Sep 13.

 

KEYWORDS: Data Ingest, Cloud Data Services, Data / Machine / Deep Machine Learning, Artificial Intelligence 