A privacy protection data publishing method based on conditional probability distribution

A technology combining conditional probability distributions with data release, applied in the field of information security and privacy protection, which addresses problems such as restricted flexibility of data space division, privacy leakage, and large data distribution errors, thereby protecting user data privacy and controlling the distribution error.

Active Publication Date: 2019-03-29
FUDAN UNIV

Problems solved by technology

[0006] (1) In order to achieve privacy protection, the l-diversity and t-closeness methods impose restrictions on the distribution of sensitive attribute values in each equivalence class, which limits the flexibility of data space division and in turn affects the query accuracy of the data. At the same time, both methods assume that the attacker has the same prior knowledge in all transactions, ignoring prior knowledge such as public common sense, which easily leads to privacy leakage;
[0007](2) The generalized data set is released in a non-standard form, which prevents many existing data mining tools from performing complex analysis on the data;
[0008] (3) The privacy-preserving analysis of generalization methods is often limited to the case of a single release. In practice, a query transaction often involves multiple data sets; even if one published data set does not reveal personal privacy, joining and combining multiple published data sets may lead to privacy leakage;
[0009] (4) Many generalization methods require users to select values for privacy control parameters, which provides users with sufficient flexibility but at the same time leaves them in difficulty over how to choose appropriate values;
[0012] (1) The perturbation of sensitive attribute values is completely random, which easily leads to a large error between the data distribution of the published data set and that of the original data set, resulting in poor data utility;


Embodiment Construction

[0027] The present invention is further described below from three aspects: the single sensitive attribute algorithm, the multi-sensitive attribute algorithm, and the improved multi-sensitive attribute algorithm.

[0028] 1. Single sensitive attribute algorithm

[0029] Only the case where the dataset has a single sensitive attribute is considered here (i.e., d_S = 1). The algorithm consists of two stages (see Appendix 1 for the algorithm code).

[0030] In the first stage, each record in the input dataset T is traversed and its benchmark distribution is calculated. Taking record t as an example: based on the conditional probability distribution p(A_S | A_QI), the prior knowledge model M^(t) is learned from the dataset T \ {t}, and M^(t)(A_S | t[A_QI]) is taken as the benchmark distribution for record t;

[0031] In the second stage, using the benchmark distribution M^(t)(A_S | t[A_QI]), a sensitive attribute value is randomly sampled for each record and replaces the original sensitive attribute value, after which the dataset is published.
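As an informal illustration of these two stages (not the patented implementation or the code referred to in Appendix 1), the following Python sketch approximates the prior knowledge model by conditional frequency counts of the sensitive attribute within each quasi-identifier combination; the attribute names, the frequency estimator, and the fall-back to the marginal distribution when a quasi-identifier combination is unseen are all assumptions made for the sketch.

```python
import random
from collections import Counter, defaultdict

def learn_model(records, qi_attrs, s_attr):
    """Estimate the conditional distribution p(A_S | A_QI) by counting how
    often each sensitive value occurs for each quasi-identifier combination."""
    counts = defaultdict(Counter)
    for r in records:
        qi_key = tuple(r[a] for a in qi_attrs)
        counts[qi_key][r[s_attr]] += 1
    return counts

def benchmark_distribution(model, record, qi_attrs, fallback):
    """Benchmark distribution M^(t)(A_S | t[A_QI]) for one record; fall back
    to the marginal distribution if the record's QI combination is unseen."""
    qi_key = tuple(record[a] for a in qi_attrs)
    return model.get(qi_key) or fallback

def publish(records, qi_attrs, s_attr):
    """Two-stage publication: learn M^(t) from T \\ {t}, then sample a
    replacement sensitive value from the benchmark distribution."""
    marginal = Counter(r[s_attr] for r in records)
    published = []
    for i, t in enumerate(records):
        # Stage 1: learn the prior knowledge model from T \ {t}.
        model = learn_model(records[:i] + records[i + 1:], qi_attrs, s_attr)
        dist = benchmark_distribution(model, t, qi_attrs, marginal)
        # Stage 2: sample a sensitive value and replace the original one.
        values, weights = zip(*dist.items())
        new_t = dict(t)
        new_t[s_attr] = random.choices(values, weights=weights, k=1)[0]
        published.append(new_t)
    return published

if __name__ == "__main__":
    data = [
        {"age": "30-39", "zip": "021*", "disease": "flu"},
        {"age": "30-39", "zip": "021*", "disease": "diabetes"},
        {"age": "40-49", "zip": "020*", "disease": "flu"},
    ]
    print(publish(data, qi_attrs=["age", "zip"], s_attr="disease"))
```

Re-learning the model from T \ {t} for every record follows paragraph [0030] literally and is quadratic in the dataset size; a practical implementation could build the counts once and subtract record t's contribution before sampling.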


Abstract

The invention belongs to the technical field of information security and privacy protection, and is a privacy protection data publishing method based on conditional probability distribution. According to the conditional probability distribution, the attacker's prior knowledge is modeled so that the attacker has different prior knowledge in different transactions. The constructed model and the quasi-identifier attribute values are then used to predict the sensitive attribute value of each record; the original value is replaced with the predicted value, and the dataset is then published. There is no direct correlation between the published (predicted) sensitive attribute values and the original values, which effectively protects the privacy of user data. The predicted distribution of sensitive attribute values is similar to the real distribution, which effectively controls the distribution error and gives the published dataset better utility than the generalization and randomized response methods. The invention can provide a privacy protection mechanism for data release in various social fields such as medical care, finance, credit reporting and transportation, and supports the use of data in scientific research and social services while protecting user data privacy.
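For illustration only, the claimed control of the distribution error can be checked by comparing the sensitive-value distribution of the published data set with that of the original one; the total variation distance used below is an assumed metric for this sketch, not one prescribed by the invention, and `s_attr` is a placeholder attribute name.

```python
from collections import Counter

def total_variation(original, published, s_attr):
    """Total variation distance between the sensitive-value distributions of
    the original and published data sets (0 = identical, 1 = disjoint)."""
    def dist(records):
        counts = Counter(r[s_attr] for r in records)
        n = sum(counts.values())
        return {v: c / n for v, c in counts.items()}
    p, q = dist(original), dist(published)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)
```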

Description

technical field

[0001] The invention belongs to the technical field of information security and privacy protection, and in particular relates to a privacy protection method in a data publishing scenario.

Background technique

[0002] The goal of privacy protection research is to find data processing methods that do not prevent third parties from accessing data sets containing sensitive information, while avoiding the disclosure of private information. For example, hospitals hold a large amount of patient medical data. On the one hand, allowing researchers to analyze and mine these data can promote the development of medicine and health care; on the other hand, when disclosing these data, personal privacy (such as the diseases suffered by patients) should be protected. Simply deleting personally identifiable information from the data set and cutting off the connection between individuals and sensitive information is far from enough to protect data privacy, because if certai...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F21/62
CPC: G06F21/6245
Inventor: 周水庚, 关佶红, 刘朝斌
Owner: FUDAN UNIV