Darshan B. Patel, Dheeraj Kumar Singh
Abstract: Data mining techniques are becoming increasingly important for assisting decision-making processes and, more generally, for extracting hidden knowledge from massive data collections in the form of patterns, models, and trends. During this extraction of hidden knowledge, the privacy of the data is a major concern. PPDM (Privacy Preserving Data Mining) approaches protect data by modifying them to mask or erase the original sensitive values that should not be revealed. PPDM approaches are evaluated against two principles: loss of privacy, which measures how well the original data can be estimated from the modified data, and loss of information, which measures the loss of accuracy in the data. The main goal of these approaches is therefore to provide a trade-off between privacy and accuracy. In this paper we show that l-diversity has a number of limitations; in particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t).
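The t-closeness requirement stated above can be sketched in a few lines of code. The example below is an illustrative check only: it uses total variation distance as the distance measure between distributions (the choice of distance metric, like the helper names here, is an assumption of this sketch, not necessarily the one adopted in the paper).

```python
from collections import Counter

def distribution(values):
    # Empirical distribution of a sensitive attribute over a set of records.
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    # Total variation distance between two discrete distributions
    # (half the L1 distance over the joint support).
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(table_values, equivalence_classes, t):
    # table_values: sensitive values of the overall table.
    # equivalence_classes: list of lists, one per equivalence class.
    # An anonymized table has t-closeness if every class's distribution
    # is within distance t of the overall distribution.
    overall = distribution(table_values)
    return all(
        variational_distance(distribution(cls), overall) <= t
        for cls in equivalence_classes
    )

# Toy data (hypothetical): each class mirrors the overall distribution,
# so the table satisfies t-closeness for any t >= 0.
table = ["flu", "flu", "cancer", "flu", "cancer", "flu"]
classes = [["flu", "flu", "cancer"], ["flu", "cancer", "flu"]]
print(satisfies_t_closeness(table, classes, 0.2))
```

With both equivalence classes having the same flu/cancer proportions as the table, the distance is zero in each class and the check succeeds.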
Keywords: Data mining, PPDM, l-diversity, t-closeness