Reasoning about unstructured data de-identification

We frame the problem of de-identifying unstructured text within the broader landscape of privacy-enhancing technologies. We then cover what background knowledge can be gained from the stylistic information in a written document alone, and how research on authorship attribution and author profiling can improve our understanding of the inferences that can be made from an otherwise de-identified text. Finally, we provide a risk score for determining the likelihood that a message will be attributed to a particular author within a dataset using only author profiling tools.
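
The article's actual risk score is not reproduced in this abstract. As a rough illustration of the general idea only, the sketch below assumes a hypothetical profiling tool has already assigned each author in a dataset a tuple of inferred attributes, and treats the attribution risk for an author as 1/k, where k is the number of authors sharing that profile. The function name, the attribute choices and the 1/k formulation are all assumptions made for this example, not the authors' method.

```python
from collections import Counter

def attribution_risk(profiles):
    """Return a 1/k risk per author, where k is the number of authors
    in the dataset that share the same inferred profile.

    This equivalence-class formulation is an illustrative assumption,
    not the risk score defined in the article.
    """
    class_sizes = Counter(profiles.values())
    return {author: 1.0 / class_sizes[profile]
            for author, profile in profiles.items()}

# Hypothetical profiler output: (age band, native language) per author.
profiles = {
    "author_a": ("18-24", "English"),
    "author_b": ("18-24", "English"),
    "author_c": ("35-49", "French"),
}
risk = attribution_risk(profiles)
```

Under these assumptions, an author whose inferred profile is unique in the dataset gets the maximum risk of 1.0, while authors who share a profile with others get a proportionally lower value.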

Keywords: anonymisation; author profiling; authorship attribution; de-identification; risk; unstructured data

Document Type: Research Article

Affiliations: 1: PhD Candidate, University of Toronto; Co-Founder & CEO, Private AI. 2: Professor of Computer Science, University of Toronto; Co-Founder & Chief Science Officer, Private AI.

Publication date: 01 June 2020

Journal: Journal of Data Protection & Privacy, which publishes in-depth, peer-reviewed articles, case studies and applied research on all aspects of data protection, information security and privacy issues across the European Union and other jurisdictions, in the wake of the EU General Data Protection Regulation (GDPR).