Authorship attribution in the wild

Authors: Koppel, Moshe (1); Schler, Jonathan (2); Argamon, Shlomo (3)

Source: Language Resources and Evaluation, Volume 45, Number 1, March 2011, pp. 83-94(12)

Publisher: Springer

Abstract:

Most previous work on authorship attribution has focused on the case in which we need to attribute an anonymous document to one of a small set of candidate authors. In this paper, we consider authorship attribution as found in the wild: the set of known candidates is extremely large (possibly many thousands) and might not even include the actual author. Moreover, the known texts and the anonymous texts might be of limited length. We show that even in these difficult cases, we can use similarity-based methods along with multiple randomized feature sets to achieve high precision. Moreover, we show the precise relationship between attribution precision and four parameters: the size of the candidate set, the quantity of known text by the candidates, the length of the anonymous text, and a certain robustness score associated with an attribution.
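
As an illustration of what "similarity-based methods along with multiple randomized feature sets" could look like in practice, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the features, the similarity measure, or any parameter values, so everything concrete here is an assumption: character 4-grams as features, cosine similarity, and the hypothetical names `attribute`, `iterations`, `fraction`, and `threshold`. The share of random feature subsets on which one candidate is the closest match serves as a stand-in for the robustness score, and the function declines to answer when that share is too low, which covers the case where the true author is not among the candidates.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two count vectors restricted to `features`."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(anonymous, known, iterations=100, fraction=0.4,
              threshold=0.9, seed=0):
    """Return (author or None, score).

    `known` maps each candidate author to their known text. The score is
    the fraction of random feature subsets on which the winning candidate
    was the most similar one. All default values are illustrative.
    """
    rng = random.Random(seed)
    anon_vec = char_ngrams(anonymous)
    known_vecs = {author: char_ngrams(text) for author, text in known.items()}
    vocabulary = sorted(set(anon_vec).union(*known_vecs.values()))
    if not vocabulary or not known_vecs:
        return None, 0.0

    sample_size = max(1, int(fraction * len(vocabulary)))
    wins = Counter()
    for _ in range(iterations):
        # Draw a fresh random subset of the feature set each round.
        subset = rng.sample(vocabulary, sample_size)
        sims = {a: cosine(anon_vec, v, subset) for a, v in known_vecs.items()}
        best = max(sims, key=sims.get)
        # A round with no similarity to anyone counts for nobody.
        if sims[best] > 0:
            wins[best] += 1

    if not wins:
        return None, 0.0
    author, count = wins.most_common(1)[0]
    score = count / iterations
    # Abstain ("don't know") when the top match is not robust enough.
    return (author if score >= threshold else None), score
```

Calling `attribute(text, {"author_a": "...", "author_b": "..."})` returns the best candidate together with its score, or `None` as the author when no candidate wins often enough. Raising `threshold` trades recall for precision, which is the kind of trade-off the abstract's four-parameter analysis is concerned with.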

Keywords: Authorship attribution; Open candidate set; Randomized feature set

Document Type: Research Article

DOI: http://dx.doi.org/10.1007/s10579-009-9111-2

Affiliations:
1: Bar-Ilan University, Ramat-Gan, Israel, Email: moishk@gmail.com
2: Bar-Ilan University, Ramat-Gan, Israel, Email: schler@gmail.com
3: Illinois Institute of Technology, Chicago, IL, USA, Email: argamon@iit.edu

Publication date: March 1, 2011
