Tree Traversal to Achieve Generalization for Data De-identification
Every day, data of different types is published from various
sources. Data de-identification protects the privacy of most of
this data before its publication. In recent years, k-anonymization,
a privacy-protection technique proposed by Dr. Sweeney, has gained
great popularity. It has been the subject of intensive research and
many variations, in the hope of finding an optimal real-time
solution to the generalization problem. To achieve either
generalization or suppression, researchers have used different
types of heuristics, most of them tree-based.
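To make the terms concrete: a released table is k-anonymous when every record shares its quasi-identifier values with at least k-1 other records, and generalization coarsens those values until that holds. The following is a minimal sketch, not the authors' method. It assumes ZIP code and age are the quasi-identifiers, uses a hypothetical toy table, and uses hypothetical helpers (`is_k_anonymous`, `generalize_zip`, `generalize_age`, `generalize_table`). Suppression is not shown.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())

def generalize_zip(zip_code, level):
    """Replace the last `level` digits of a ZIP code with '*' (level 0 keeps it intact)."""
    level = max(0, min(level, len(zip_code)))
    return zip_code[:len(zip_code) - level] + "*" * level

def generalize_age(age, width):
    """Map an exact age to a range of `width` years, e.g. 28 -> '20-29' for width 10."""
    if width <= 1:
        return str(age)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_table(records, zip_level, age_width):
    """Return a copy of the table with both (assumed) quasi-identifiers generalized."""
    return [
        {**r, "zip": generalize_zip(r["zip"], zip_level), "age": generalize_age(r["age"], age_width)}
        for r in records
    ]

# Hypothetical toy table: ZIP code and age are quasi-identifiers, disease is sensitive.
RECORDS = [
    {"zip": "13053", "age": 28, "disease": "Flu"},
    {"zip": "13068", "age": 29, "disease": "Cancer"},
    {"zip": "13068", "age": 21, "disease": "Flu"},
    {"zip": "13053", "age": 23, "disease": "Cold"},
]

if __name__ == "__main__":
    qi = ["zip", "age"]
    print(is_k_anonymous(RECORDS, qi, 2))                           # False
    print(is_k_anonymous(generalize_table(RECORDS, 2, 10), qi, 2))  # True
```

In the raw table every (ZIP, age) pair is unique, so it is not 2-anonymous. After masking two ZIP digits and bucketing ages by decade, all four records fall into the same group ("130**", "20-29").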
Although this is a heavily investigated area, there is no simple
method to prepare data for generalization; in theory, there are
infinitely many methods for data preparation and partitioning. In
this research, we first propose the use of commonly known
algorithms to prepare data and achieve generalization. We then
introduce tree-based algorithms and tree traversal as the
mechanism to achieve data generalization. We further investigate
them by comparing the quality of the generalization sets obtained
with each traversal method, in the hope of determining which method is
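One way tree traversal can drive generalization is top-down specialization over a value generalization hierarchy. The walk starts at the root (fully generalized) and replaces a node with its children only while every populated child still covers at least k records. The nodes where the walk stops form the generalization set. The following is a minimal sketch under stated assumptions, not the paper's algorithm. It uses a single quasi-identifier (ZIP code), a hypothetical hierarchy `TREE`, hypothetical record counts `COUNTS`, hypothetical helpers (`leaf_count`, `specialize`, `generalize_value`), and an assumed split rule. The walk order is switched between breadth-first (queue) and depth-first (stack).

```python
from collections import deque

# Hypothetical value generalization hierarchy for ZIP codes: parent -> children.
TREE = {
    "Any": ["130**", "148**"],
    "130**": ["13053", "13068"],
    "148**": ["14850", "14853"],
}
# Hypothetical number of records holding each leaf value.
COUNTS = {"13053": 3, "13068": 2, "14850": 1, "14853": 1}

def leaf_count(tree, counts, node):
    """Number of records whose value lies under `node` in the hierarchy."""
    children = tree.get(node)
    if not children:
        return counts.get(node, 0)
    return sum(leaf_count(tree, counts, c) for c in children)

def specialize(tree, counts, k, root="Any", order="bfs"):
    """Top-down specialization; returns (generalization set, visit order)."""
    frontier = deque([root])
    cut, visited = set(), []
    while frontier:
        # BFS takes from the front (queue); DFS takes from the back (stack).
        node = frontier.popleft() if order == "bfs" else frontier.pop()
        visited.append(node)
        populated = [c for c in tree.get(node, []) if leaf_count(tree, counts, c) > 0]
        if populated and all(leaf_count(tree, counts, c) >= k for c in populated):
            # Safe to split: every populated child still covers >= k records.
            # For DFS, push in reverse so the leftmost child is popped first.
            frontier.extend(populated if order == "bfs" else reversed(populated))
        else:
            cut.add(node)  # stop here: this node is a generalized value
    return cut, visited

def generalize_value(tree, cut, leaf):
    """Replace a leaf value with its ancestor that belongs to the generalization set."""
    parent = {c: p for p, cs in tree.items() for c in cs}
    node = leaf
    while node not in cut:
        node = parent[node]
    return node

if __name__ == "__main__":
    for order in ("bfs", "dfs"):
        cut, visited = specialize(TREE, COUNTS, k=2, order=order)
        print(order, sorted(cut), visited)
```

With this purely local split rule, both traversals visit the same nodes and produce the same generalization set ({"13053", "13068", "148**"}); only the visit order differs. The order starts to matter once the traversal is constrained, for example by a budget or by a global quality measure evaluated as the walk proceeds.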