Quantifying the Effects of Mask Metadata Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data
Abstract: The availability of individual-level health data presents opportunities for monitoring the distribution and spread of emergent, acute, and chronic conditions, as well as challenges with respect to maintaining the anonymity of persons with health conditions. Particularly when such data are mapped as point locations, concerns arise regarding the ease with which individual identities may be determined by linking geographic coordinates to digital street networks, then determining residential addresses and, finally, names of occupants at specific addresses. The utility of such data sets must therefore be balanced against the requirements of protecting the confidentiality of individuals whose identities might be revealed through the availability of precise and accurate locational data. Recent literature has pointed toward geographic masking as a means for striking an appropriate balance between data utility and confidentiality. However, questions remain as to whether certain characteristics of the mask (mask metadata) should be disclosed to data users and whether two or more distinct masked versions of the data can be released without breaching confidentiality. In this article, we address these questions by quantifying the extent to which the disclosure of mask metadata and the release of multiple masked versions may affect confidentiality, with a view toward providing guidance to custodians of health data sets. The masks considered include perturbation, areal aggregation, and their combination. Confidentiality is measured by the areas of confidence regions for individuals' locations, which are derived under the probability models governing the masks, conditioned on the disclosed mask metadata.
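As a rough illustration of the confidence-region idea, the Python sketch below assumes one simple perturbation mask that the abstract does not specify: each release displaces the true location by an independent random vector drawn uniformly from a disk of radius r, and r is disclosed as mask metadata. It further assumes that masked points can be matched to the same individual across releases. Under these assumptions, a single release confines the true location to a disk of radius r around the masked point, while two releases confine it to the intersection of two such disks. The function names and the example values are hypothetical and are not taken from the article.

```python
import math


def disk_area(r):
    """Area of the 100% confidence region from one release (assumed model):
    the true location lies within distance r of the masked point."""
    return math.pi * r * r


def lens_area(r, d):
    """Area of the intersection of two disks of equal radius r whose
    centers (the two masked points) are a distance d apart.

    Under the assumed model the true location must lie in both disks,
    so this is the 100% confidence region after two releases.
    """
    if d >= 2 * r:
        # Disks touch or do not overlap; cannot occur under the assumed
        # model, but guard so math.acos stays within its domain.
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)


# Hypothetical example: disclosed radius of 500 m, masked points 400 m apart.
r = 500.0
d = 400.0
one_release = disk_area(r)
two_releases = lens_area(r, d)
print(f"one release:  {one_release:.0f} m^2")
print(f"two releases: {two_releases:.0f} m^2")
print(f"fraction of area remaining: {two_releases / one_release:.2f}")
```

The further apart the two masked points fall, the smaller the intersection becomes, which is one way to see why a second release can erode confidentiality when the mask radius is known. Other masks, such as areal aggregation or a combination of masks, would require different region calculations.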
Document Type: Research Article
Publication date: 2008-01-01