Detecting Identity of Authors from Lexical Elements and Cognitive Topics (DIALECT)

Award Information
Agency:
Department of Defense
Amount:
$79,999.00
Program:
SBIR
Contract:
N00014-12-M-0205
Solicitation Year:
2012
Solicitation Number:
2012.1
Branch:
Navy
Award Year:
2012
Phase:
Phase I
Agency Tracking Number:
N121-080-0319
Solicitation Topic Code:
N121-080
Small Business Information
Aptima, Inc.
12 Gill Street, Suite 1400, Woburn, MA, -
Hubzone Owned:
N
Woman Owned:
N
Socially and Economically Disadvantaged:
N
Duns:
967259946
Principal Investigator
 Charlotte Shabarekh
Title: Senior Research Scientist
Phone: (781) 496-2465
Email: cshabarekh@aptima.com
Business Contact
 Thomas McKenna
Title: Chief Financial Officer
Phone: (781) 496-2443
Email: mckenna@aptima.com
Research Institution
N/A
Abstract
Exploiting the anonymous nature of the internet, terrorists are able to cloak their identities when authoring blogs, posting to chatrooms, and sending tweets by using pseudonyms and creating multiple usernames. This makes it difficult to ascertain the true author of a web post and to determine whether posts made under different profiles, across websites, can be attributed to the same author. Detecting Identity of Authors from Lexical Elements and Cognitive Topics (DIALECT) addresses the challenge of authorship attribution facing intelligence analysts working with Open-Source Intelligence (OSINT). Using an inherently language-independent approach, DIALECT automatically learns a profile of linguistic, idiosyncratic, and content-based features that form a unique fingerprint for an author. Additionally, DIALECT uses social science theory to influence the core machine learning algorithm's selection of dialectal and semantic features for use in distinguishing which cultural, tribal, religious, or political groups the author belongs to. By associating authors with their socio-cultural group, DIALECT provides insight into the authors' cognitive processes, such as their political leanings and ideological affiliations. By modeling feature sets at both the individual author and group levels, DIALECT is able to attribute documents to groups, even when it is unable to determine the specific author.
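
For readers unfamiliar with authorship attribution, the general idea of a language-independent author "fingerprint" can be illustrated with a minimal sketch. This is not DIALECT's algorithm, which the abstract does not specify. It assumes a simple, commonly used stand-in: character n-gram frequency profiles compared by cosine similarity. The function names (char_ngram_profile, cosine, attribute) and the sample texts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return relative frequencies of overlapping character n-grams.

    Character n-grams need no tokenizer or dictionary, which is why they
    are often used when an approach must not depend on the language.
    """
    text = " ".join(text.lower().split())  # normalize case and whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(document, known_texts, n=3):
    """Score a document against pooled texts for each label.

    known_texts maps a label (an author, or a group of authors) to a
    non-empty list of texts. Returns (best_label, scores_by_label).
    """
    doc_profile = char_ngram_profile(document, n)
    scores = {
        label: cosine(doc_profile, char_ngram_profile(" ".join(texts), n))
        for label, texts in known_texts.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy corpus; real profiles need far more text per label.
    corpus = {
        "author_1": ["aaaa bbbb aaaa"],
        "author_2": ["xyz xyz xyz"],
    }
    best, scores = attribute("aaaa aaaa", corpus)
    print(best, scores)
```

Because the labels in known_texts can name groups as well as individuals, the same comparison can be run at the group level by pooling the texts of a group's known members, which loosely mirrors the abstract's point that a document may be attributable to a group even when the specific author cannot be determined.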

* information listed above is at the time of submission.
