TECHNOLOGY AREA(S): Info Systems
OBJECTIVE: Develop transformative knowledge navigation and document discovery software with the ability to analyze complex, multi-faceted data sets and provide the user with an intuitive interface that shows patterns and connections within the data regardless of data size or file type.
PROPOSALS ACCEPTED: Phase I and DP2. Please see the 17.1 DoD Program Solicitation and the DARPA 17.1 Direct to Phase II Instructions for DP2 requirements and proposal instructions.
DESCRIPTION: There is a critical DoD need for new software tools that can rapidly ingest and index large archived data sets and give users methods to quickly survey and harvest pertinent information. The growing number and size of data archives have produced vast stockpiles of information. While the extent of archiving is viewed positively, the retrieval tools that enable rapid survey and use of archived information have not kept pace. This data explosion opens new opportunities to extract more value from data collected by the military, academia and industry. According to research by the McKinsey Global Institute (MGI) and McKinsey's Business Technology Office, big data analysis is becoming a key basis of competition because of the increasing volume and detail of information captured by enterprises, multimedia, social media and the internet. In the commercial sector, big data can be transformative through the collection of product performance, consumer and market trend information. The ability to cultivate, analyze and display this information as meaningful output will enable organizations to make decisions about future investment areas. The effort to make archives useful, however, remains underdeveloped. Search engines are one of the main tools for culling archived data, but they yield only lists of results; they do not help the user rapidly understand the topics embedded within an archive or make connections between those topics. Some available content management tools can search for keywords and return a list of matching files, but they often require users to know precisely what they are seeking, and they do not let users explore the archive in an organized way. These shortcomings are particularly acute for archival systems designed to meet security requirements.
Existing tools for parsing and searching large datasets have not been able to accommodate growing datasets or to deliver applicable, useful information through visualization techniques that achieve the desired level of interactivity. To deal with the rapid increase in archived data, a software tool is needed that can adapt to datasets of different volumes and compositions, provide the user with the desired output and use visualization techniques that make the system accessible. Tools that can analyze data and produce output showing patterns, connections and actionable information will be increasingly useful. The proposed system would create a collaborative platform that is not only content-rich, intuitive and useful, but also widely applicable and customizable. The platform will rapidly analyze structured and unstructured datasets in order to query, identify and visualize hidden value.
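One simple way to surface the "connections" between topics or categories described above is to count how often pairs of categories are assigned to the same document; pairs that co-occur frequently are candidates for highlighting in the interface. The sketch below is illustrative only. It assumes each document has already been tagged with a set of category labels (the tagging step, the example data and the function names are hypothetical), and it is not a prescribed design for the proposed platform.

```python
from collections import Counter
from itertools import combinations


def category_connections(doc_categories):
    """Count how many documents share each pair of categories.

    doc_categories: mapping of document ID -> iterable of category labels
    (a hypothetical input format assumed for this sketch).
    Returns a Counter keyed by sorted (category_a, category_b) tuples.
    """
    pair_counts = Counter()
    for labels in doc_categories.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        unique = sorted(set(labels))
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def strongest_connections(doc_categories, top_n=3):
    """Return the top_n most frequently co-occurring category pairs."""
    return category_connections(doc_categories).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical example archive: document IDs tagged with categories.
    docs = {
        "report-001": ["logistics", "maintenance", "aviation"],
        "report-002": ["logistics", "maintenance"],
        "memo-003": ["medical", "logistics"],
        "memo-004": ["aviation", "maintenance"],
    }
    for (a, b), count in strongest_connections(docs):
        print(f"{a} <-> {b}: {count} shared documents")
```

Pair counts are the simplest possible connection measure; a real system might normalize them (for example, by how common each category is overall) so that very frequent categories do not dominate every connection.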
PHASE I: Analyze existing archival systems and visualization techniques that can be leveraged and improved to meet the topic objective. Using an existing large dataset, conduct an analysis and build a model of a data visualization application that estimates the minimum number of assets required to create a viable, interactive and scalable system. Phase I deliverables include application source code, preliminary performance results and a final report.
PHASE II: Create a data visualization software application prototype with the following capabilities: 1) deployment in environments with different levels of security; 2) easy ingestion of datasets ranging from 10,000 to 10,000,000 documents of varied file types; 3) indexing of the datasets with support for daily, weekly or monthly updates; and 4) an innovative indexing feature adaptable to both structured and unstructured datasets. The user interface for the application must meet the following requirements: 1) intuitive to navigate with no training; 2) display the data in organized categories; 3) enable users to modify the number and type of categories; 4) highlight connections between categories; and 5) display trends in the data. Conduct a market analysis of two potential areas of insertion, including a description of how the targeted users influenced the design and functionality of the system. Phase II deliverables will include a product ready for beta release and market testing, a preliminary commercialization plan and a final report.
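Capabilities 3 (periodic updates) and 4 (indexing both structured and unstructured data) can be illustrated with a small in-memory inverted index that accepts batches of new or changed documents without rebuilding the whole index. This is a minimal sketch under stated assumptions: the class and function names, the simple alphanumeric tokenizer and the choice to index structured records as "field value" text are all hypothetical. At the 10,000,000-document scale named above, a real system would need persistent, sharded storage rather than Python dictionaries.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def to_text(record):
    """Flatten a document into indexable text.

    Unstructured documents are plain strings; structured records are dicts
    whose keys and values are both indexed (an assumption of this sketch).
    """
    if isinstance(record, dict):
        return " ".join(f"{key} {value}" for key, value in record.items())
    return str(record)


class IncrementalIndex:
    """Hypothetical inverted index that can be updated in batches (e.g., daily)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.doc_terms = {}               # document ID -> set of its terms

    def remove(self, doc_id):
        """Drop a document's postings so it can be re-indexed or deleted."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # keep the index free of empty terms

    def update(self, batch):
        """Add or replace documents from a mapping of ID -> record."""
        for doc_id, record in batch.items():
            self.remove(doc_id)  # replace any stale version first
            terms = set(tokenize(to_text(record)))
            self.doc_terms[doc_id] = terms
            for term in terms:
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every query term (AND query)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = IncrementalIndex()
    # First batch: one unstructured memo and one structured record.
    index.update({
        "memo-1": "Maintenance schedule for aviation assets",
        "row-42": {"unit": "logistics", "status": "maintenance overdue"},
    })
    print(sorted(index.search("maintenance")))  # ['memo-1', 'row-42']
    # Next periodic batch replaces memo-1 without rebuilding the index.
    index.update({"memo-1": "Revised medical supply plan"})
    print(sorted(index.search("maintenance")))  # ['row-42']
```

Keeping a per-document term set (`doc_terms`) is what makes replacement cheap here: an updated document's old postings can be removed directly instead of scanning every term in the index.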
PHASE III: At the conclusion of the SBIR effort, potential military partners, such as the Office of Naval Research and the Army Research Office, should be contacted to gauge their interest in adopting the software platform to increase access to pertinent information embedded in archived data. Other military organizations, such as the US Army Medical Research and Materiel Command (USAMRMC), could also use the technology to rapidly index their large datasets and provide a user interface that lets users extract more information than current indexing tools allow. In the commercial sector, large businesses that collect purchasing information could use this technology to parse their large quantities of data and display consumer information in a more meaningful and useful way.
REFERENCES:
1: I.A.T. Hashem, I. Yaqoob, N.B. Anuar, S. Mokhtar, A. Gani, S.U. Khan, The rise of "big data" on cloud computing: Review and open research issues, Information Systems 47 (2015) 98-115.
2: J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. Hung Byers, Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011.
3: R.L. Villars, C.W. Olofson, M. Eastwood, Big data: What it is and why you should care, White Paper, IDC, MA, USA, 2011.
KEYWORDS: Data Visualization, Knowledge Management, Data Archival Software, Information Design, Knowledge Navigation, Document Search, Design of Data