Reasoning about unstructured data de-identification


Authors: Thaine, Patricia¹; Penn, Gerald²

Source: Journal of Data Protection & Privacy, Volume 3 / Number 3 / Summer 2020, pp. 299-309(11)

Publisher: Henry Stewart Publications

We frame the problem of de-identifying unstructured text within the broader landscape of privacy-enhancing technologies. We then examine what background knowledge can be inferred from the stylistic information in a written document alone, and how research on authorship attribution and author profiling can improve our understanding of the inferences that can be drawn from an otherwise de-identified text. Finally, we provide a risk score for estimating the likelihood that a message will be attributed to a particular author within a dataset using only author-profiling tools.
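The abstract does not state how the risk score is computed, so the following is only a minimal sketch of one way such a score could work, not the authors' formula. It assumes (hypothetically) that an author-profiling model predicts attributes such as age group or native language for a message, that each prediction is correct independently with a known per-attribute accuracy, and that an adversary guesses uniformly among the dataset authors whose known profile matches every prediction. The function name `attribution_risk` and the attribute names are illustrative placeholders.

```python
from typing import Dict, List

# Hypothetical representation: a profile maps attribute names to values.
Profile = Dict[str, str]


def attribution_risk(
    predicted: Profile,
    accuracy: Dict[str, float],
    authors: List[Profile],
) -> float:
    """Illustrative probability that a message is attributed to its true author.

    Assumptions (not taken from the paper): each predicted attribute is
    correct independently with probability accuracy[attr], and the adversary
    picks uniformly among authors whose profile matches all predictions.
    """
    # Authors in the dataset consistent with every predicted attribute.
    matches = [
        a for a in authors
        if all(a.get(attr) == val for attr, val in predicted.items())
    ]
    if not matches:
        return 0.0
    # Probability that all predictions are correct (independence assumed);
    # only then is the true author guaranteed to be among the matches.
    p_correct = 1.0
    for attr in predicted:
        p_correct *= accuracy[attr]
    # Uniform guess among the matching candidates.
    return p_correct / len(matches)


if __name__ == "__main__":
    authors = [
        {"age_group": "18-24", "native_language": "en"},
        {"age_group": "18-24", "native_language": "fr"},
        {"age_group": "25-34", "native_language": "en"},
        {"age_group": "18-24", "native_language": "en"},
    ]
    predicted = {"age_group": "18-24", "native_language": "en"}
    accuracy = {"age_group": 0.5, "native_language": 0.8}
    print(attribution_risk(predicted, accuracy, authors))  # -> 0.2
```

In this toy example two authors match the predicted profile and both predictions are jointly correct with probability 0.4, giving a risk of 0.2. The sketch makes the general point visible: the score falls as more authors share the inferred profile and as the profiler's accuracy drops.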

Keywords: anonymisation; author profiling; authorship attribution; de-identification; risk; unstructured data

Document Type: Research Article

Affiliations: 1: PhD Candidate, University of Toronto; Co-Founder & CEO, Private AI. 2: Professor of Computer Science, University of Toronto; Co-Founder & Chief Science Officer, Private AI.

Publication date: 01 June 2020
