SBIR Phase II: Software to Automate the Detection of Websites that are Fraudulent or Otherwise Harmful to Consumers

Award Information
Agency:
National Science Foundation
Branch:
n/a
Amount:
$500,000.00
Award Year:
2011
Program:
STTR
Phase:
Phase II
Contract:
1127567
Award Id:
n/a
Agency Tracking Number:
1127567
Solicitation Year:
2011
Solicitation Topic Code:
Phase II
Solicitation Number:
n/a
Small Business Information
3150 18th Street, Suite 318, Box 515, San Francisco, CA, 94110-2076
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
020107209
Principal Investigator:
Michael Lai
(415) 894-5806
fastlane@sitejabber.com
Business Contact:
Michael Lai
BS
(415) 894-5806
fastlane@sitejabber.com
Research Institute:
GGL Projects, Inc.




Abstract
This Small Business Innovation Research (SBIR) Phase II project will develop software to automatically detect a broad spectrum of websites that are fraudulent or otherwise harmful to consumers. Much work has been done on software that detects websites hosting malware or engaged in phishing. However, software does not yet exist that can detect a broader array of harmful websites, including those selling counterfeits, selling illegal drugs, and hosting weight-loss scams, to name just a few. The challenge lies in selecting the features of fraudulent sites that, in isolation or in combination, are good indicators of a site's harmfulness. Using these features, a machine learning classifier can be trained on data from known harmful websites. Unknown websites can then be run through the classifier to evaluate their potential for harm. Additional challenges involve gathering sufficient data to properly train the classifier, making the classifier general enough to detect a range of harmful sites while still maintaining accuracy, and updating the classifier in real time so that it improves with ongoing human feedback and additional data.
The principal impact of this project is the protection of consumers from online fraud. Today, consumers lack reliable resources to evaluate unfamiliar websites. Most use familiar sites like Amazon or take a gamble on Google search results. These gambles frequently result in fraud. It is believed that there are now over 250 million websites and that $100 billion is lost yearly to online fraud. While the statistics cover many types of fraud, examples of risky sites include online counterfeiters, pharmacies, and retailers. The software developed in this project will greatly improve transparency around websites and protect millions from fraud.
The technical achievements in this project involve the use of a vector space model to convert non-discrete features of fraudulent sites into data that can be fed into a machine learning classifier. Additionally, this technology will include innovative feature choices, access to high-quality data, and the creation of a general classifier capable of improving itself in real time and detecting a broad array of heretofore undetectable fraudulent sites.
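
The abstract does not describe the project's actual implementation. As a rough illustration of the general idea only, the sketch below maps site text into a vector space (normalized term counts), trains a simple logistic classifier on labeled examples, scores an unknown site, and then applies one incremental update from human feedback. Everything here is an assumption made for illustration: the toy training texts, the function and class names (tokenize, build_vocabulary, vectorize, OnlineLogisticClassifier), the choice of logistic regression, and the learning rate are all hypothetical and are not taken from the award record.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token a dimension index in the vector space."""
    vocab = {}
    for doc in documents:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def vectorize(text, vocab):
    """Map free text to a unit-length term-count vector (vector space model)."""
    counts = Counter(t for t in tokenize(text) if t in vocab)
    vec = [0.0] * len(vocab)
    for tok, n in counts.items():
        vec[vocab[tok]] = float(n)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class OnlineLogisticClassifier:
    """Logistic regression trained one example at a time (hypothetical choice)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that the site is harmful."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """Take one gradient step on a single labeled example (1 = harmful)."""
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Invented toy data: (site text, label), where 1 = harmful and 0 = benign.
TRAINING = [
    ("cheap replica designer handbags free shipping", 1),
    ("miracle pills lose weight fast no prescription", 1),
    ("replica watches discount no prescription pills", 1),
    ("public library opening hours and events", 0),
    ("university course catalog and campus events", 0),
    ("local weather forecast and news", 0),
]

vocab = build_vocabulary(text for text, _ in TRAINING)
model = OnlineLogisticClassifier(len(vocab))

# Initial training: repeated passes over the known examples.
for _ in range(200):
    for text, label in TRAINING:
        model.update(vectorize(text, vocab), label)

# Score previously unseen sites.
score_harmful = model.predict_proba(
    vectorize("discount replica handbags miracle pills", vocab)
)
score_benign = model.predict_proba(
    vectorize("library events and campus news", vocab)
)

# Ongoing feedback: a reviewer labels an ambiguous site as harmful,
# and the model is updated without retraining from scratch.
feedback = vectorize("free shipping on campus events news", vocab)
before = model.predict_proba(feedback)
model.update(feedback, 1)
after = model.predict_proba(feedback)
```

In this sketch the vocabulary is fixed after initial training, so terms that first appear in later feedback are ignored. A production system would need some way to grow or hash the feature space, and would use features beyond page text.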

* Information listed above is as of the time of submission.
