Automated Categorization of Web-Based Content Along Multiple Dimensions
Both private companies and government agencies face an enormous number of text documents distributed over networks of computers and databases, and the ability to search for specific documents is essential to the success of these organizations. Websites such as ScienceEducation.gov show that categorizing content makes it easier for users to find relevant information. However, the ongoing categorization of text documents and web pages is done manually, which is tedious and resource-intensive. Moreover, because the quality of the categorization depends on human judgment, a high degree of subjectivity is introduced. This project will develop, implement, and test the key components of a software system to categorize text documents and web pages. Supervised and unsupervised learning techniques from the fields of machine learning and text mining will be used to develop next-generation categorization tools, which will be integrated into an existing search engine.
Commercial Applications and Other Benefits as described by the awardee: Finding information in a timely manner is increasingly important in the face of information overload. The new system will give users quicker access to the most relevant information, enabling organizations to improve the effectiveness of their searches.
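As an illustration only, the idea of supervised categorization along multiple dimensions can be sketched by training one classifier per dimension on hand-labeled pages. The abstract does not name an algorithm; multinomial Naive Bayes is swapped in here because it is a common text-mining baseline. The dimension names ("topic", "audience"), the training pages, and all function names are hypothetical and are not taken from the awardee's system.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)        # number of documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled training pages: (text, topic, audience).
TRAINING = [
    ("photosynthesis lesson plan for classroom teachers", "biology", "educator"),
    ("fun plant cell coloring activity for kids", "biology", "student"),
    ("teaching guide for newton laws of motion in the classroom", "physics", "educator"),
    ("kids experiment with magnets and gravity", "physics", "student"),
]


def train_per_dimension(rows):
    """Train one independent classifier for each categorization dimension."""
    texts = [r[0] for r in rows]
    return {
        "topic": NaiveBayes().fit(texts, [r[1] for r in rows]),
        "audience": NaiveBayes().fit(texts, [r[2] for r in rows]),
    }


def categorize(models, text):
    """Return one predicted label per dimension for a new page."""
    return {dim: model.predict(text) for dim, model in models.items()}


models = train_per_dimension(TRAINING)
result = categorize(models, "classroom guide to gravity and motion for teachers")
print(result)  # {'topic': 'physics', 'audience': 'educator'}
```

Training each dimension separately keeps the label sets independent, so a search engine could filter on any combination of them. A real system would need far more training data and would likely pair this with unsupervised methods such as clustering for unlabeled content.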
Small Business Information at Submission:
Deep Web Technologies, LLC
301 North Guadalupe, Suite 201, Santa Fe, NM 87501
Number of Employees: