Tree Traversal to Achieve Generalization for Data De-identification
Abstract
Every day data is published of different types and
from various sources. Data de-identification protects the privacy
of most of this data before its publication. Over the recent years,
a technique proposed by Dr. Sweeney, known as kanonymization
as a means for privacy protection has gained
great popularity. There has been intensive research involving this
method and many alterations, in the hope to find an optimal
solution in real-time to the generalization problem. To achieve
either generalization or suppression, researchers have used
different types of heuristics, most of them being tree-based.
Although this is a heavily investigated area, there is no simple
method to prepare data for generalization; in theory, there are
infinite methods for data preparation and partitioning. In this
research, we first propose the use of commonly known algorithms
to prepare data and achieve generalization. We then introduce
the use of tree-based algorithms and tree traversal as the
mechanism to achieve data generalization. We further investigate
them, by comparing the quality of generalization sets obtained in
each traversal method, in the hope to determine which method is
best.
Full Text:
PDFReferences
K. Yang, X. Jia, K. Zhang, and others, “Privacy-Preserving Data
Publish-Subscribe Service on Cloud-based Platforms,” 2014.
L. Jiang, Proceedings of the 2011 International Conference on
Informatics, Cybernetics, and Computer Engineering (ICCE2011)
November 19-20, 2011, Melbourne, Australia: Volume 1: Intelligent
Control and Network Communication, vol. 110. Springer Science &
Business Media, 2011.
R. Turn, “Information privacy issues for the 1990s,” in Research in
Security and Privacy, 1990. Proceedings., 1990 IEEE Computer Society
Symposium on, 1990, pp. 394–400.
V. S. Iyengar, “Transforming data to satisfy privacy constraints,” in
Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining, 2002, pp. 279–288.
J. Nin and J. Herranz, Privacy and Anonymity in Information
Management Systems. Springer, 2010.
A. Manta, “Literature survey on privacy preserving mechanisms for data publishing,” 2013.
A. Saroha, “Survey of k-Anonymity,” 2014.
C. Thomas and D. Thomas, “A Survey on Privacy Preservation in Data
Publishing.”
O. Heffetz and K. Ligett, “Privacy and data-based research,” Journal of Economic Perspectives, vol. 28, no. 2, pp. 75–98, 2014.
J. Cao and P. Karras, “Publishing microdata with a robust privacy
guarantee,” Proceedings of the VLDB Endowment, vol. 5, no. 11, pp.
–1399, 2012.
S. L. Garfinkel, “De-identification of personal information,” NISTIR,
vol. 8053, pp. 1–46, 2015.
M. Callahan, “Us dhs handbook for safeguarding sensitive personally
identifiable information,” Washington, DC, 2012.
G. Navarro-Arribas and V. Torra, Advanced research in data privacy,
vol. 567. Springer, 2014.
F. Kohlmayer, F. Prasser, and K. A. Kuhn, “The cost of quality:
Implementing generalization and suppression for anonymizing
biomedical data with minimal information loss,” Journal of biomedical
informatics, vol. 58, pp. 37–48, 2015.
B. C. Fung, K. Wang, A. W.-C. Fu, and S. Y. Philip, Introduction to
privacy-preserving data publishing: Concepts and techniques. Chapman
and Hall/CRC, 2010.
B. C. Fung, K. Wang, and P. S. Yu, “Top-down specialization for
information and privacy preservation,” in Data Engineering, 2005.
ICDE 2005. Proceedings. 21st International Conference on, 2005, pp.
–216.
Z. Huang, “Extensions to the k-means algorithm for clustering large data sets with categorical values,” Data mining and knowledge discovery,
vol. 2, no. 3, pp. 283–304, 1998.
D. Riboni, L. Pareschi, and C. Bettini, “Preserving Privacy in Sequential Data Release against Background Knowledge Attacks,” arXiv preprint arXiv:1010.0924, 2010.
E. McCallister, Guide to protecting the confidentiality of personally
identifiable information. Diane Publishing, 2010.
L. Sweeney, “Achieving k-anonymity privacy protection using
generalization and suppression,” International Journal of Uncertainty,
Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 571–588, 2002.
M. Press, Microsoft Press computer dictionary: the comprehensive
standard for business, school, library, and home. Microsoft Pr, 1994.
K. El Emam and F. K. Dankar, “Protecting privacy using k-anonymity,” Journal of the American Medical Informatics Association, vol. 15, no. 5, pp. 627–637, 2008.
L. Sweeney, “k-anonymity: A model for protecting privacy,”
International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems, vol. 10, no. 05, pp. 557–570, 2002.
A. et al. Machanavajjhala, “l-diversity: Privacy beyond k-anonymity,”
ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 1,
no. 1, p. 3, 2007.
P. Samarati, “Protecting respondents identities in microdata release,” IEEE transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010–1027, 2001.
W. E. Yancey, W. E. Winkler, and R. H. Creecy, “Disclosure risk
assessment in perturbative microdata protection,” in Inference control in
statistical databases, Springer, 2002, pp. 135–152.
P. Samarati and L. Sweeney, “Generalizing data to provide anonymity when disclosing information,” in PODS, 1998, vol. 98, p. 188.
J.-W. Byun, A. Kamra, E. Bertino, and N. Li, “Efficient kanonymization using clustering techniques,” in International Conference
on Database Systems for Advanced Applications, 2007, pp. 188–200.
L. Willenborg and T. De Waal, Elements of statistical disclosure
control, vol. 155. Springer Science & Business Media, 2012.
S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee, “Toward
privacy in public databases,” in Theory of Cryptography Conference, 2005, pp. 363–385.
A. Meyerson and R. Williams, “On the complexity of optimal kanonymity,” in Proceedings of the twenty-third ACM SIGMODSIGACT-
SIGART symposium on Principles of database systems, 2004, pp. 223–228.
M. E. Kabir, H. Wang, and E. Bertino, “Efficient systematic clustering
method for k-anonymization,” Acta Informatica, vol. 48, no. 1, pp. 51– 66, 2011.
V. Ciriani, S. D. C. Di Vimercati, S. Foresti, and P. Samarati, “kanonymous data mining: A survey,” in Privacy-preserving data mining,
Springer, 2008, pp. 105–136.
W. K. Wong, N. Mamoulis, and D. W. L. Cheung, “Non-homogeneous
generalization in privacy preserving data publishing,” in Proceedings of
the 2010 ACM SIGMOD International Conference on Management of
data, 2010, pp. 747–758.
S. Pigeon, “Huffman coding,” Lossless Compression Handbook, pp. 79–100, 2003.
S. S. Skiena, The algorithm design manual: Text, vol. 1. Springer
Science & Business Media, 1998.
K. Mehlhorn and P. Sanders, Algorithms and data structures: The basic toolbox. Springer Science & Business Media, 2008.
D. A. Huffman, “A method for the construction of minimumredundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms. MIT press, 2009.
S. Martello, “Knapsack problems: algorithms and computer
implementations,” Wiley-Interscience series in discrete mathematics and
optimization, 1990.
E. Bertino, D. Lin, and W. Jiang, “A survey of quantification of privacy preserving data mining algorithms,” in Privacy-preserving data mining, Springer, 2008, pp. 183–205.
J.-L. Lin and M.-C. Wei, “An efficient clustering method for kanonymization,” in Proceedings of the 2008 international workshop on
Privacy and anonymity in information society, 2008, pp. 46–50.
R. J. Bayardo and R. Agrawal, “Data privacy through optimal kanonymization,” in Data Engineering, 2005. ICDE 2005. Proceedings.
st International Conference on, 2005, pp. 217–228.
A. Solanas, F. Sebé, and J. Domingo-Ferrer, “Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond,” in Proceedings of the 2008 international workshop on Privacy and anonymity in information society, 2008, pp. 61–69.
G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis, “Fast data
anonymization with low information loss,” in Proceedings of the 33rd
international conference on Very large data bases, 2007, pp. 758–769.
J. Soria-Comas, “Improving data utility in differential privacy and kanonymity,” arXiv preprint arXiv:1307.0966, 2013.
C. Fang and E.-C. Chang, “Information leakage in optimal anonymized and diversified data,” in International Workshop on Information Hiding, 2008, pp. 30–44.
K. B. Frikken and Y. Zhang, “Yet another privacy metric for publishing micro-data,” in Proceedings of the 7th ACM workshop on Privacy in the electronic society, 2008, pp. 117–122.
S. M. Morton, “An Improved Utility Driven Approach Towards kanonymity Using Data Constraint Rules,” 2013.
T. Li and N. Li, “Towards optimal k-anonymization,” Data &
Knowledge Engineering, vol. 65, no. 1, pp. 22–39, 2008.
G. T. Duncan and S. Mukherjee, “Optimal disclosure limitation strategy in statistical databases: Deterring tracker attacks through additive noise,” Journal of the American Statistical Association, vol. 95, no. 451, pp. 720–729, 2000.
“UC Irvine Machine Learning Repository: Adult Data Set:” [Online].
Available: http://archive.ics.uci.edu/ml/datasets/Adult. [Accessed: 04- Jul-2016].
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, “Incognito: Efficient
full-domain k-anonymity,” in Proceedings of the 2005 ACM SIGMOD
international conference on Management of data, 2005, pp. 49–60.
Refbacks
- There are currently no refbacks.
Abava Кибербезопасность IT Congress 2024
ISSN: 2307-8162