System and method for data anonymization using hierarchical data clustering and perturbation

Hierarchical data clustering and perturbation technology, applied in the field of data anonymization. It addresses the limitations of l-diversity, which is neither necessary nor sufficient to prevent attribute disclosure, and the infeasibility of achieving k-anonymity by generalization in high-dimensional datasets.

Active Publication Date: 2015-09-15

ELECTRIFAI LLC

Citations: cites 1; cited by 7

AI Technical Summary

Problems solved by technology

Achieving k-anonymity by generalization is not feasible for high-dimensional datasets, because they have many attributes and many unique value combinations remain even after some attributes are generalized.
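
To make the k-anonymity requirement concrete, the following Python sketch checks whether every combination of quasi-identifier values is shared by at least k records. The function names and the toy records are hypothetical illustrations, not taken from the patent. The second check shows the high-dimensional problem described above: adding one more attribute splits every equivalence class apart, so the generalized data is no longer 2-anonymous.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(count >= k for count in sizes.values())


# Hypothetical records already generalized on zip code, age and nationality.
records = [
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "flu"},
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "nationality": "*", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "nationality": "*", "disease": "heart"},
]
qi = ["zip", "age", "nationality"]
print(is_k_anonymous(records, qi, 2))  # True

# One extra quasi-identifier makes every combination unique.
for r, sex in zip(records, ["M", "F", "M", "F"]):
    r["sex"] = sex
print(is_k_anonymous(records, qi + ["sex"], 2))  # False
```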

Two simple attacks have been shown to expose subtle but severe privacy problems in k-anonymized datasets.

The l-diversity model was proposed in response, but research shows that it has a number of limitations and is neither necessary nor sufficient to prevent attribute disclosure.
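
For reference, the simplest form of the l-diversity requirement (distinct l-diversity) asks that every equivalence class contain at least l distinct values of the sensitive attribute. The sketch below is a hypothetical illustration with made-up records and a made-up function name. Its first class is 2-anonymous, but both records in it have the same disease (the situation exploited by the homogeneity attack), so it fails a 2-diversity check.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        classes[key].add(r[sensitive])
    return all(len(values) >= l for values in classes.values())


# Hypothetical 2-anonymous data: the first class is homogeneous in "disease".
records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
print(is_distinct_l_diverse(records, ["zip", "age"], "disease", 2))  # False
```

Passing this check does not by itself rule out attribute disclosure, which is consistent with the limitation noted above.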

However, this metric destroys the correlations among different attributes, which may invalidate statistical inferences drawn from the data.

Method used

Examples

First embodiment

[0031]FIG. 6 is a diagram illustrating a first anonymized dataset 600 obtained after applying the perturbation step 240 of FIG. 2 on the dataset 500 of FIG. 5. This embodiment can be referred to as an "assign" method. The assign method achieves k-anonymity in a k-sized cluster (here, 4-sized) by randomly assigning the attribute value of one record to all the records in that cluster. For example, the first cluster 510 of FIG. 5 is converted to the first anonymized cluster 610 by assigning the circled values in the first cluster 510 to all the records of that cluster. This is done for quasi-identifiers, which are the set of attributes that can be linked with external datasets to identify individuals. In the current example, zip code, age, and nationality are quasi-identifiers, while disease is a sensitive attribute. Similarly, the second and third clusters 520, 530 of FIG. 5 are converted to second and third anonymized clusters 620, 630 of ...
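
A minimal Python sketch of the assign method, assuming records are dictionaries and the cluster has already been formed. The paragraph does not say whether one donor record supplies all quasi-identifiers or a donor is drawn separately for each attribute; this sketch assumes the latter (one random draw per attribute). The name assign_perturb and the sample records are hypothetical. Either reading gives every record in the cluster identical quasi-identifier values, which is what makes a k-sized cluster k-anonymous, while the sensitive attribute is left untouched.

```python
import random


def assign_perturb(cluster, quasi_identifiers, rng=random):
    """Copy one randomly chosen record's value of each quasi-identifier to every
    record in the cluster. Sensitive attributes are not modified."""
    perturbed = [dict(r) for r in cluster]  # work on copies, keep input intact
    for attr in quasi_identifiers:
        donor = rng.choice(cluster)  # assumption: a fresh donor per attribute
        for r in perturbed:
            r[attr] = donor[attr]
    return perturbed


# Hypothetical 4-record cluster (k = 4); zip, age, nationality are quasi-identifiers.
qi = ["zip", "age", "nationality"]
cluster = [
    {"zip": "13053", "age": 28, "nationality": "Russian", "disease": "heart disease"},
    {"zip": "13068", "age": 29, "nationality": "American", "disease": "heart disease"},
    {"zip": "13068", "age": 21, "nationality": "Japanese", "disease": "viral infection"},
    {"zip": "13053", "age": 23, "nationality": "American", "disease": "viral infection"},
]
anon = assign_perturb(cluster, qi, rng=random.Random(7))
for r in anon:
    print(r)
```

Passing a seeded random.Random instance makes a run reproducible; omitting it falls back to the module-level generator.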

Second embodiment

[0032]FIG. 7 is a diagram illustrating a second anonymized dataset 700 obtained after applying the perturbation step 240 of FIG. 2 on the dataset 500 of FIG. 5. This embodiment can be referred to as a "shuffle" method. In the shuffle method, the values of an attribute are shuffled among the records in each k-sized cluster (here, 4-sized) by a random permutation. For example, the first cluster 510 is converted to a first anonymized cluster 710 by randomly shuffling the record values among each other. As with the assign method described above, the shuffling is done only for quasi-identifiers. Similarly, the second and third clusters 520, 530 of FIG. 5 are converted to second and third anonymized clusters 720, 730 of FIG. 7. While this second method does not achieve k-anonymity, it is still advantageous for at least two reasons. First, it does not change the attribute-wise statistical distributions of data within each cluster; and second, it preserves the statistical relationshi...
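
A corresponding sketch of the shuffle method, under the assumption that each quasi-identifier column is permuted independently within the cluster (the paragraph says only that the values of an attribute are shuffled by a random permutation). The function name shuffle_perturb and the records are hypothetical. Because a permutation only reorders values, each attribute keeps exactly the same multiset of values within the cluster, which is why the attribute-wise distributions are unchanged.

```python
import random


def shuffle_perturb(cluster, quasi_identifiers, rng=random):
    """Return a copy of the cluster in which each quasi-identifier column is
    randomly permuted among the cluster's records (sensitive values stay put)."""
    perturbed = [dict(r) for r in cluster]  # copies, so the input is unchanged
    for attr in quasi_identifiers:
        column = [r[attr] for r in cluster]
        rng.shuffle(column)  # assumption: independent permutation per attribute
        for r, value in zip(perturbed, column):
            r[attr] = value
    return perturbed


# Hypothetical 4-record cluster (k = 4).
qi = ["zip", "age", "nationality"]
cluster = [
    {"zip": "13053", "age": 28, "nationality": "Russian", "disease": "heart disease"},
    {"zip": "13068", "age": 29, "nationality": "American", "disease": "heart disease"},
    {"zip": "13068", "age": 21, "nationality": "Japanese", "disease": "viral infection"},
    {"zip": "13053", "age": 23, "nationality": "American", "disease": "viral infection"},
]
anon = shuffle_perturb(cluster, qi, rng=random.Random(7))
for r in anon:
    print(r)
```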

Abstract

A system and method for data anonymization using hierarchical data clustering and perturbation is provided. The system includes a computer system and an anonymization program executed by the computer system. The system converts the data of a high-dimensional dataset to a normalized vector space and applies clustering and perturbation techniques to anonymize the data. The conversion results in each record of the dataset being converted into a normalized vector that can be compared to other vectors. The vectors are divided into disjoint, small clusters using hierarchical clustering processes. Multi-level clustering can be performed using suitable algorithms at different clustering levels. The records within each cluster are then perturbed such that the statistical properties of the clusters remain unchanged.
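
The abstract names neither the normalization scheme nor the clustering algorithm. The Python sketch below is therefore only one plausible reading, under stated assumptions: numeric attributes are min-max scaled to [0, 1], categorical attributes are one-hot encoded, and clustering is a simple divisive (top-down) hierarchical split along the widest dimension that stops once each cluster holds between k and 2k - 1 records. The names normalize and divisive_clusters, and the sample records, are hypothetical.

```python
def normalize(records, numeric, categorical):
    """Convert each record to a vector: numeric attributes are min-max scaled
    to [0, 1] and categorical attributes are one-hot encoded."""
    bounds = {a: (min(r[a] for r in records), max(r[a] for r in records)) for a in numeric}
    categories = {a: sorted({r[a] for r in records}) for a in categorical}
    vectors = []
    for r in records:
        vec = []
        for a in numeric:
            lo, hi = bounds[a]
            vec.append((r[a] - lo) / (hi - lo) if hi > lo else 0.0)
        for a in categorical:
            vec.extend(1.0 if r[a] == c else 0.0 for c in categories[a])
        vectors.append(vec)
    return vectors


def divisive_clusters(vectors, k, indices=None):
    """Top-down hierarchical clustering: split a group at the midpoint of its
    sorted widest dimension until every cluster holds between k and 2k - 1
    records. Assumes len(vectors) >= k."""
    if indices is None:
        indices = list(range(len(vectors)))
    if len(indices) < 2 * k:
        return [indices]

    def spread(d):
        values = [vectors[i][d] for i in indices]
        return max(values) - min(values)

    widest = max(range(len(vectors[0])), key=spread)
    ordered = sorted(indices, key=lambda i: vectors[i][widest])
    mid = len(ordered) // 2  # both halves keep at least k records
    return (divisive_clusters(vectors, k, ordered[:mid])
            + divisive_clusters(vectors, k, ordered[mid:]))


# Hypothetical records: age is numeric, zip code is treated as categorical.
ages = [51, 21, 55, 23, 57, 25, 53, 27]
records = [{"age": a, "zip": "14850" if a > 40 else "13053"} for a in ages]
vectors = normalize(records, numeric=["age"], categorical=["zip"])
print(divisive_clusters(vectors, k=4))  # [[1, 3, 5, 7], [0, 6, 2, 4]]
```

Splitting at the midpoint guarantees that no cluster falls below k records, which a k-sized perturbation step relies on; a different algorithm could be substituted at each level to mimic multi-level clustering.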

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001]This application claims priority to U.S. Provisional Patent Application No. 61/659,178 filed on Jun. 13, 2012, which is incorporated herein in its entirety by reference and made a part hereof.

BACKGROUND

[0002]1. Field of the Invention

[0003]The present invention relates generally to data anonymization. More specifically, the present invention relates to a system and method for data anonymization using hierarchical data clustering and perturbation.

[0004]2. Related Art

[0005]In today's digital society, record-level data has increasingly become a vital source of information for businesses and other entities. For example, many government agencies are required to release census and other record-level data to the public to make decision-making more transparent. Although transparency can be a significant driver of economic activity, care must be taken to safeguard the privacy of individuals and to prevent sensitive information from falling into...

Application Information

Patent Type & Authority: Patents (United States)

IPC (8): G06F17/30

CPC: G06F17/30569; G06F16/2228; G06F16/258

Inventors: GOYAL, KANAV; PRAGYA, CHAYANIKA; GARG, RAHUL

Owner: ELECTRIFAI LLC