Detecting Identity of Authors from Lexical Elements and Cognitive Topics (DIALECT)
Exploiting the anonymity of the internet, terrorists can cloak their identity when authoring blogs, posting to chatrooms and sending tweets by using pseudonyms and creating multiple usernames. This makes it difficult to ascertain the true author of a web post, and to determine whether posts under different profiles and across websites can be attributed to the same author. Detecting Identity of Authors from Lexical Elements and Cognitive Topics (DIALECT) addresses the challenge of authorship attribution facing intelligence analysts working with Open-Source Intelligence (OSINT). Using an inherently language-independent approach, DIALECT automatically learns a profile of linguistic, idiosyncratic and content-based features that form a unique fingerprint for an author. Additionally, DIALECT uses social science theory to guide the core machine learning algorithm's selection of dialectal and semantic features for distinguishing which cultural, tribal, religious or political groups the author belongs to. By associating authors with their socio-cultural group, DIALECT provides insight into the authors' cognitive processes, such as their political leanings and ideological affiliations. By modeling feature sets at both the individual author and group levels, DIALECT can attribute documents to groups even when it cannot determine the specific author.
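The two-level attribution idea described above can be illustrated with a minimal sketch. It is not DIALECT's actual algorithm or feature set, which the abstract does not specify. It assumes character trigram counts as a stand-in for language-independent features and cosine similarity as the matching rule, and every name in it (char_ngrams, build_profiles, attribute, author_threshold) is hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; no tokenizer, so no language-specific rules."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(samples):
    """samples: iterable of (author, group, text). Returns (author profiles, group profiles)."""
    authors, groups = {}, {}
    for author, group, text in samples:
        grams = char_ngrams(text)
        authors.setdefault(author, Counter()).update(grams)
        groups.setdefault(group, Counter()).update(grams)
    return authors, groups


def attribute(text, authors, groups, author_threshold=0.6):
    """Return (author, group); author is None when no author profile is close enough.

    The threshold value is an arbitrary assumption chosen for illustration.
    """
    grams = char_ngrams(text)
    best_author = max(authors, key=lambda a: cosine(grams, authors[a]))
    best_group = max(groups, key=lambda g: cosine(grams, groups[g]))
    if cosine(grams, authors[best_author]) >= author_threshold:
        return best_author, best_group
    return None, best_group  # group-level attribution only
```

In this sketch a post is always assigned to its closest group profile, while an author is named only when the best author-level similarity clears the threshold, which mirrors the fallback to group attribution described in the abstract.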
Small Business Information at Submission:
12 Gill Street Suite 1400 Woburn, MA -
Number of Employees: