Distributed Mining Tool for Large-Scale DOE Science and Technical Information
Small Business Information
15400 Calhoun Drive, Suite 400, Rockville, MD, 20855
Abstract
As documents and data sets accumulate in science and technology, analyzing this information to facilitate federated research becomes a challenging task. This project will develop efficient and scalable data mining solutions for data integration, anomaly detection, and correlation analysis. The key innovation of the proposed tool, the DSTMiner (Distributed Science and Technology Miner), is the execution of cutting-edge data mining algorithms for documents, topics, and related experiment or benchmark data. These algorithms include hierarchical clustering, massive link analysis, association rule mining, anomaly detection, and correlation mining.
Commercial Applications and Other Benefits as described by the awardee
The technology should find application in the analysis of mixed documents and scientific data sets, significantly improving the quality of knowledge access within the millions of documents and data sets collected by DOE's Office of Scientific and Technical Information. In addition, the system could be customized for other related technical information repositories, such as NIH's PubMed, U.S. patent examination, FDA's review of pharmaceutical companies, and CDC's assessments of healthcare quality and policy impacts.
* Information listed above is as of the time of submission.