A method and system for realizing data set desensitization based on k-anonymity algorithm

A k-anonymity algorithm and data set technology applied in the field of data desensitization. It addresses two problems: the distance between adjacent data is not considered, and the availability of the obtained data is reduced. It alleviates the area-corner problem and increases data availability.

Active Publication Date: 2022-03-29
JINAN UNIVERSITY
Cites: 9. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0017] The prior art does not take into account the distance between adjacent data in a temporary anonymous group, so information is likely to be lost when the data set is generated. In addition, existing partitioning produces rectangles with area corners, which reduces the usability of the resulting data. To solve these problems, a method and system for desensitizing data sets based on the k-anonymity algorithm is provided. It alleviates the area-corner problem and, while privacy is protected, yields more anonymous groups. The more anonymous groups there are, the lower the degree of data generalization and the greater the availability of the data.

Method used



Examples


Embodiment 1

[0092] A method for realizing data set desensitization based on k-anonymity algorithm, said method comprising the following steps:

[0093] S1: Input the original data set T and set the parameters τ, P and k. P is the set of records in the data table, and τ is the hypersphere region occupied by that record set. k is the anonymity parameter of the algorithm: each record has the same quasi-identifier values as at least k − 1 other records, so every group holds at least k records and any record can be re-identified with a probability of at most 1/k;
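
The k-anonymity condition in S1 can be checked directly by counting how many records share each combination of quasi-identifier values. The following is a minimal sketch, not part of the patent. The function names (`is_k_anonymous`, `max_reidentification_probability`) and the example attributes (`age`, `zip`, `disease`) are hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, List, Sequence


def _group_sizes(records: Sequence[Dict], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records: Sequence[Dict], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return all(size >= k for size in _group_sizes(records, quasi_identifiers).values())


def max_reidentification_probability(records: Sequence[Dict], quasi_identifiers: Sequence[str]) -> float:
    """Worst-case chance of singling out a record: 1 / size of the smallest group."""
    sizes = _group_sizes(records, quasi_identifiers)
    return 1.0 / min(sizes.values()) if sizes else 0.0


# Hypothetical, already generalized table (illustration only).
table: List[Dict] = [
    {"age": "20-30", "zip": "510**", "disease": "flu"},
    {"age": "20-30", "zip": "510**", "disease": "cold"},
    {"age": "30-40", "zip": "511**", "disease": "flu"},
    {"age": "30-40", "zip": "511**", "disease": "asthma"},
]
print(is_k_anonymous(table, ["age", "zip"], k=2))  # True
print(max_reidentification_probability(table, ["age", "zip"]))  # 0.5
```

With k = 3, the same table fails the check, because each quasi-identifier group holds only two records.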

[0094] S2: Delete the explicit identifiers of each record in the original data set. Define an order on the value domain of each attribute in the quasi-identifier so that it becomes an ordered domain, then map each ordered domain one-to-one onto the real number domain;
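
Step S2 can be illustrated with a small sketch, assuming a hypothetical schema in which `name` and `id_number` are the explicit identifiers and `education` is a categorical quasi-identifier with a declared value order. Numeric values keep their natural order. Categorical values are mapped to their rank, which is one-to-one and order-preserving. This is one possible mapping, not necessarily the one used in the patent.

```python
from typing import Dict, List, Sequence

# Hypothetical schema, for illustration only.
EXPLICIT_IDENTIFIERS = {"name", "id_number"}
DOMAIN_ORDERS: Dict[str, List[str]] = {
    # categorical quasi-identifier with a declared order of its values
    "education": ["primary", "secondary", "bachelor", "master", "doctorate"],
}


def strip_explicit_identifiers(records: Sequence[Dict]) -> List[Dict]:
    """Drop attributes that identify a person directly."""
    return [{a: v for a, v in r.items() if a not in EXPLICIT_IDENTIFIERS} for r in records]


def to_real(attribute: str, value) -> float:
    """One-to-one, order-preserving map from an ordered domain to the reals."""
    order = DOMAIN_ORDERS.get(attribute)
    if order is None:
        return float(value)  # numeric attribute: its natural order is kept
    return float(order.index(value))  # categorical: rank in the declared order


def to_points(records: Sequence[Dict], quasi_identifiers: Sequence[str]) -> List[List[float]]:
    """Turn each record's quasi-identifier values into a point in R^d."""
    return [[to_real(a, r[a]) for a in quasi_identifiers]
            for r in strip_explicit_identifiers(records)]


people = [
    {"name": "A", "id_number": "001", "age": 34, "education": "master"},
    {"name": "B", "id_number": "002", "age": 27, "education": "bachelor"},
]
print(to_points(people, ["age", "education"]))  # [[34.0, 3.0], [27.0, 2.0]]
```

A value missing from the declared order raises `ValueError` from `list.index`. In this sketch, that is treated as a schema error rather than silently mapped.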

[0095] S3: Set range = τ and TMP = P, and express |P| as a linear function of k, |P| = αk + β, where β is a non-negative number smaller than k, and α represents the relationship between the number of records in the anony...
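
Because β must satisfy 0 ≤ β < k, the pair (α, β) in S3 is unique: α and β are the quotient and remainder of |P| divided by k. The function name `decompose` is a hypothetical illustration. The remark on the maximum number of groups is our own observation and is not taken from the truncated text above.

```python
from typing import Tuple


def decompose(num_records: int, k: int) -> Tuple[int, int]:
    """Write |P| = alpha * k + beta with 0 <= beta < k.

    The constraint on beta makes (alpha, beta) unique: they are the quotient
    and remainder of |P| divided by k.
    """
    if k <= 0 or num_records < 0:
        raise ValueError("need k > 0 and a non-negative record count")
    return divmod(num_records, k)


alpha, beta = decompose(23, 5)
print(alpha, beta)  # 4 3  (23 = 4 * 5 + 3)
# Every anonymous group needs at least k records, so 23 records
# can form at most alpha = 4 groups of size >= 5.
```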



Abstract

The invention discloses a method and system for desensitizing a data set based on a k-anonymity algorithm, comprising the following steps:

- obtaining a data set that has not been desensitized;
- deleting the explicit identifiers of the data set;
- ordering the value domain of each quasi-identifier attribute to make it an ordered domain;
- mapping each ordered domain one-to-one onto the real number domain;
- defining the distance between data points in space and computing their relative distances;
- determining the division points of the data set with a partitioning algorithm that combines the relative distances with the density of the projection area;
- recursively computing the division points at every level and finally establishing the hypersphere groups;
- generalizing the point information contained in each hypersphere of the hypersphere groups so that all records in it have the same quasi-identifier values, which completes the desensitization.

The invention alleviates the area-corner problem of rectangular partitions and takes into account the distance between adjacent points in the temporary anonymous group. As a result, more anonymous groups are obtained while privacy protection is ensured, the degree of data generalization is lower, and the availability of the data is greater.
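
The pipeline in the abstract can be sketched in Python: division points, then recursive splitting, then hypersphere groups, then generalization. This is a hedged sketch, not the patented algorithm. The excerpt does not give the rule that combines relative distance with projection-area density, so a plain median cut along the widest dimension (the Mondrian-style split) stands in for the division point. All function names are hypothetical. Each group is summarized by a sphere centred at its centroid whose radius reaches the farthest member. This sphere covers the group but is not, in general, the minimum enclosing ball.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Sphere = Tuple[Point, float]  # (center, radius)


def widest_dimension(points: Sequence[Point]) -> int:
    """Index of the coordinate with the largest value range."""
    return max(range(len(points[0])),
               key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))


def partition(points: List[Point], k: int) -> List[List[Point]]:
    """Recursively split points into groups of at least k points.

    Stand-in division rule (NOT the patent's density-based rule):
    cut at the median of the widest dimension.
    """
    if len(points) < 2 * k:
        return [points]  # any further cut would leave a group with fewer than k points
    d = widest_dimension(points)
    ordered = sorted(points, key=lambda p: p[d])
    mid = len(ordered) // 2  # len >= 2k, so both halves keep at least k points
    return partition(ordered[:mid], k) + partition(ordered[mid:], k)


def covering_sphere(group: Sequence[Point]) -> Sphere:
    """Sphere centred at the group centroid, reaching its farthest member.

    It covers every point but is not, in general, the minimum enclosing ball.
    """
    dims = len(group[0])
    center = tuple(sum(p[i] for p in group) / len(group) for i in range(dims))
    radius = max(math.dist(center, p) for p in group)
    return center, radius


def desensitize(points: List[Point], k: int) -> List[Tuple[List[Point], Sphere]]:
    """Pair each anonymous group with the sphere that all its records are published as."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(points) < k:
        raise ValueError("fewer than k records: k-anonymity cannot be met")
    return [(group, covering_sphere(group)) for group in partition(points, k)]


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
for group, (center, radius) in desensitize(pts, k=3):
    print(len(group), center, round(radius, 3))
```

The sketch publishes the same (center, radius) pair for every record in a group. That makes their quasi-identifier values identical, which is what the generalization step requires. Swapping in the patent's own division-point rule would only change `partition`.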

Description

Technical field

[0001] The present invention relates to the technical field of data desensitization and, more specifically, to a method and system for desensitizing data sets based on a k-anonymity algorithm.

Background art

[0002] The common approach to anonymizing private data originates from data processing in statistical databases. It deliberately incurs information loss in the attribute values of the published data in exchange for lowering the accuracy with which individuals can be re-identified from those values. At the same time, it guarantees the availability of the published data, balancing the accuracy of the published data against privacy protection.

[0003] As far as current technology is concerned, the partitioning strategy for anonymous groups disclosed in [1] is the "anonymous algorithm based on rounding and division" (RPF), and a "k-anonymous algorithm based on vertex and edge modification" is disclosed in docume...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F21/62; G06F17/18
CPC: G06F21/6254; G06F17/18
Inventors: Chen Cheng (陈成), Lai Zhaorong (赖兆荣)
Owner: JINAN UNIVERSITY