System and method for data anonymization using hierarchical data clustering and perturbation

Hierarchical data clustering and perturbation, applied in the field of data anonymization, addresses two problems: l-diversity has a number of limitations and is neither necessary nor sufficient to prevent attribute disclosure, and achieving k-anonymity by generalization is not feasible for high-dimensional datasets.

Active Publication Date: 2015-09-15
ELECTRIFAI LLC

AI Technical Summary

Problems solved by technology

However, achieving k-anonymity by generalization is not feasible in cases of high-dimensional datasets because there are many attributes and unique combinations even after the generalization of some attributes.
It has been shown using two simple attacks that a k-anonymized dataset has some subtle, but severe, privacy problems.
However, research shows that l-diversity has a number of limitations and is neither necessary nor sufficient to prevent attribute disclosure.
However, this metric destroys the correlations among different attributes, which may cause statistical inferences from the data to no longer be valid.
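
The first problem above can be made concrete with a small check. The sketch below assumes the standard definition of k-anonymity (every record shares its quasi-identifier values with at least k-1 other records) and reports the largest k a dataset satisfies. The function name `k_anonymity_level` and the sample records are hypothetical, chosen only for illustration.

```python
from collections import Counter

def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    identical values on all the given quasi-identifiers."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values())

# Hypothetical, already-generalized records (zip code is masked).
records = [
    {"zip": "130**", "age": 28, "nationality": "Russian"},
    {"zip": "130**", "age": 29, "nationality": "American"},
    {"zip": "130**", "age": 28, "nationality": "Russian"},
    {"zip": "130**", "age": 29, "nationality": "Japanese"},
]

k_zip = k_anonymity_level(records, ["zip"])
k_zip_age = k_anonymity_level(records, ["zip", "age"])
k_all = k_anonymity_level(records, ["zip", "age", "nationality"])
print(k_zip, k_zip_age, k_all)
```

In this toy data the achievable k falls from 4 to 2 to 1 as quasi-identifiers are added, which is the effect that makes generalization alone impractical when there are many attributes.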

Examples

First embodiment

[0031] FIG. 6 is a diagram illustrating a first anonymized dataset 600 obtained after applying the perturbation step 240 of FIG. 2 on the dataset 500 of FIG. 5. This embodiment can be referred to as an “assign” method. The assign method achieves k-anonymity in a k-sized cluster (here, 4-sized) by randomly assigning the attribute value of one record to all the records in that cluster. For example, the first cluster 510 of FIG. 5 is converted to an anonymized cluster 610 by assigning the circled values in the first cluster 510 to all the records of the first cluster, resulting in the first anonymized cluster 610. This is done for quasi-identifiers, which are the set of attributes that can be linked with external datasets to identify individuals. In the current example, zip code, age, and nationality are quasi-identifiers, while disease is a sensitive attribute. Similarly, the second and third clusters 520, 530 of FIG. 5 are converted to second and third anonymized clusters 620, 630 of ...
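
A minimal sketch of the assign method as described in this paragraph is given below. It assumes that the donor record is picked independently for each quasi-identifier, which the text does not state explicitly. The function name `assign_perturb` and the sample cluster values are hypothetical and are not taken from FIG. 5.

```python
import random

def assign_perturb(cluster, quasi_identifiers, rng=None):
    """For each quasi-identifier, copy the value of one randomly chosen
    record to every record in the cluster. Other attributes are untouched."""
    rng = rng or random.Random()
    perturbed = [dict(record) for record in cluster]  # leave the input intact
    for attr in quasi_identifiers:
        # Assumption: a separate random donor is drawn per attribute.
        chosen = rng.choice(cluster)[attr]
        for record in perturbed:
            record[attr] = chosen
    return perturbed

# Hypothetical 4-sized cluster; "disease" is the sensitive attribute.
cluster = [
    {"zip": "13053", "age": 28, "nationality": "Russian", "disease": "Heart Disease"},
    {"zip": "13068", "age": 29, "nationality": "American", "disease": "Heart Disease"},
    {"zip": "13068", "age": 21, "nationality": "Japanese", "disease": "Viral Infection"},
    {"zip": "13053", "age": 23, "nationality": "American", "disease": "Viral Infection"},
]
quasi_identifiers = ["zip", "age", "nationality"]
anonymized = assign_perturb(cluster, quasi_identifiers, random.Random(0))
for record in anonymized:
    print(record)
```

After the call, all four records carry identical quasi-identifier values, so each record is indistinguishable from the other three on those attributes, while the disease column keeps its original values.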

Second embodiment

[0032] FIG. 7 is a diagram illustrating a second anonymized dataset 700 obtained after applying the perturbation step 240 of FIG. 2 on the dataset 500 of FIG. 5. This embodiment can be referred to as a “shuffle” method. In the shuffle method, the values of an attribute are shuffled among the records in each k-sized cluster (here, 4-sized) by a random permutation. For example, the first cluster 510 is converted to a first anonymized cluster 710 by randomly shuffling the record values among each other. The shuffling is done only for quasi-identifiers, as in the case of the assign method described above. Similarly, the second and third clusters 520, 530 of FIG. 5 are converted to second and third anonymized clusters 720, 730 of FIG. 7. While this second method does not achieve k-anonymity, it is still advantageous for at least two reasons. First, it does not change the attribute-wise statistical distributions of data within each cluster; and second, it preserves the statistical relationshi...
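
The shuffle method described in this paragraph can be sketched as follows. The sketch assumes each quasi-identifier is permuted independently of the others; the function name `shuffle_perturb` and the sample cluster values are hypothetical and are not taken from FIG. 5.

```python
import random

def shuffle_perturb(cluster, quasi_identifiers, rng=None):
    """Randomly permute the values of each quasi-identifier among the
    records of one cluster. Other attributes are untouched."""
    rng = rng or random.Random()
    perturbed = [dict(record) for record in cluster]  # leave the input intact
    for attr in quasi_identifiers:
        # Assumption: an independent permutation is drawn per attribute.
        values = [record[attr] for record in cluster]
        rng.shuffle(values)
        for record, value in zip(perturbed, values):
            record[attr] = value
    return perturbed

# Hypothetical 4-sized cluster; "disease" is the sensitive attribute.
cluster = [
    {"zip": "13053", "age": 28, "nationality": "Russian", "disease": "Heart Disease"},
    {"zip": "13068", "age": 29, "nationality": "American", "disease": "Heart Disease"},
    {"zip": "13068", "age": 21, "nationality": "Japanese", "disease": "Viral Infection"},
    {"zip": "13053", "age": 23, "nationality": "American", "disease": "Viral Infection"},
]
quasi_identifiers = ["zip", "age", "nationality"]
shuffled = shuffle_perturb(cluster, quasi_identifiers, random.Random(0))
for record in shuffled:
    print(record)
```

Because each column is only permuted, the set of values per attribute inside the cluster is the same before and after, which matches the first advantage named in the paragraph.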

Abstract

A system and method for data anonymization using hierarchical data clustering and perturbation is provided. The system includes a computer system and an anonymization program executed by the computer system. The system converts the data of a high-dimensional dataset to a normalized vector space and applies clustering and perturbation techniques to anonymize the data. The conversion results in each record of the dataset being converted into a normalized vector that can be compared to other vectors. The vectors are divided into disjoint, small-sized clusters using hierarchical clustering processes. Multi-level clustering can be performed using suitable algorithms at different clustering levels. The records within each cluster are then perturbed such that the statistical properties of the clusters remain unchanged.
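
The normalization and clustering steps in the abstract can be illustrated with a rough sketch. The abstract does not name a specific normalization or clustering algorithm, so the code below substitutes min-max scaling and a simple top-down median bisection as stand-ins; it also uses the same rule at every level, whereas the abstract allows different algorithms per level. The names `normalize` and `split_clusters` and the sample rows are hypothetical.

```python
def normalize(rows):
    """Min-max scale each numeric column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def split_clusters(vectors, indices, k):
    """Recursively bisect a cluster at the median of its widest dimension
    until it can no longer be split into two parts of at least k records."""
    if len(indices) < 2 * k:
        return [indices]

    def spread(dim):
        values = [vectors[i][dim] for i in indices]
        return max(values) - min(values)

    widest = max(range(len(vectors[0])), key=spread)
    ordered = sorted(indices, key=lambda i: vectors[i][widest])
    mid = len(ordered) // 2
    return (split_clusters(vectors, ordered[:mid], k)
            + split_clusters(vectors, ordered[mid:], k))

# Hypothetical numeric records: (age, zip code as a number).
rows = [
    (28, 13053), (29, 13068), (21, 13068), (23, 13053), (50, 14853),
    (55, 14853), (47, 14850), (49, 14850), (31, 13053), (37, 13053),
]
k = 4
vectors = normalize(rows)
clusters = split_clusters(vectors, list(range(len(rows))), k)
print(clusters)
```

Each resulting cluster is a list of record indices containing at least k records (given at least k input records), and the clusters are disjoint; a perturbation step such as assign or shuffle would then be applied inside each cluster.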

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Patent Application No. 61/659,178 filed on Jun. 13, 2012, which is incorporated herein in its entirety by reference and made a part hereof.

BACKGROUND

[0002] 1. Field of the Invention

[0003] The present invention relates generally to data anonymization. More specifically, the present invention relates to a system and method for data anonymization using hierarchical data clustering and perturbation.

[0004] 2. Related Art

[0005] In today's digital society, record-level data has increasingly become a vital source of information for businesses and other entities. For example, many government agencies are required to release census and other record-level data to the public, to make decision-making more transparent. Although transparency can be a significant driver for economic activity, care must be taken to safeguard the privacy of individuals and to prevent sensitive information from falling into...

Application Information

Patent Type & Authority: Patents (United States)
IPC: IPC(8): G06F17/30
CPC: G06F17/30569, G06F16/2228, G06F16/258
Inventors: GOYAL, KANAV; PRAGYA, CHAYANIKA; GARG, RAHUL
Owner: ELECTRIFAI LLC