TY - JOUR
T1 - A Globally Optimal k-Anonymity Method for the De-Identification of Health Data
AU - El Emam, Khaled
AU - Dankar, Fida Kamal
AU - Issa, Romeo
AU - Jonker, Elizabeth
AU - Amyot, Daniel
AU - Cogo, Elise
AU - Corriveau, Jean Pierre
AU - Walker, Mark
AU - Chowdhury, Sadrul
AU - Vaillancourt, Regis
AU - Roffey, Tyson
AU - Bottomley, Jim
N1 - Funding Information: The authors thank Bradley Malin (Vanderbilt University) for reviewing an earlier version of this paper. This work was partially funded by the Canadian Institutes of Health Research and The Natural Sciences and Engineering Research Council of Canada.
PY - 2009/9
Y1 - 2009/9
N2 - Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. Design: Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated. Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
UR - http://www.scopus.com/inward/record.url?scp=69549114557&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69549114557&partnerID=8YFLogxK
U2 - 10.1197/jamia.M3144
DO - 10.1197/jamia.M3144
M3 - Article
C2 - 19567795
AN - SCOPUS:69549114557
SN - 1067-5027
VL - 16
SP - 670
EP - 682
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 5
ER -