Big data generation method and system for preventing privacy leakage

A big data generation technology, applied to methods and systems for preventing privacy leakage, that addresses the problems of incomplete data, hindered in-depth analysis, and increased big data analysis overhead.

Active Publication Date: 2020-09-08
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0005] Although the existing common methods above can solve the privacy leakage problem caused by sensitive data, they also have a negative impact on big data analysis.
For example, data desensitization deletes some information from the original data, leaving the data incomplete, which hinders in-depth analysis. Moreover, desensitization only targets the more obvious private information (such as addresses and phone numbers); data that is not desensitized (such as user browsing records and purchase preferences) can still reveal some of a user's private habits.
With encryption or random perturbation of data, the sensitive information is no longer visible, but the encryption and perturbation operations add overhead to big data analysis algorithms and reduce their efficiency. The original information is also modified, which causes difficulties for data users and degrades the results of data analysis.
Therefore, there is still no method that maintains the integrity of the original data to the greatest extent and prevents the leakage of private information without increasing the overhead of big data analysis.

Examples

Embodiment Construction

[0070] The present invention is further described below through embodiments with reference to the accompanying drawings, which do not limit the scope of the invention in any way.

[0071] The present invention provides a big data generation method for preventing privacy leakage. The method estimates the probability distribution of the features, generates corresponding random numbers as feature values to form random samples, and then verifies the samples with a nearest neighbor model, thereby obtaining synthetic data that contains no real private information.
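
The steps in [0071] can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices in it are assumptions that the text above does not state: each feature's distribution is estimated with an independent histogram, the nearest neighbor check uses brute-force Euclidean distance, and a random sample is accepted only when its distance to the closest raw record lies between two hypothetical thresholds, `d_min` (so it is not a near copy of a real record) and `d_max` (so it still resembles the raw data). All function and parameter names are illustrative.

```python
import numpy as np


def estimate_marginals(data, bins=10):
    """Estimate each feature's distribution as a histogram (assumed estimator)."""
    return [np.histogram(data[:, j], bins=bins) for j in range(data.shape[1])]


def draw_samples(marginals, n, rng):
    """Draw n random samples, one random value per feature, from the histograms."""
    columns = []
    for counts, edges in marginals:
        probs = counts / counts.sum()
        idx = rng.choice(len(counts), size=n, p=probs)    # pick a bin per sample
        columns.append(rng.uniform(edges[idx], edges[idx + 1]))  # value in that bin
    return np.column_stack(columns)


def nearest_distance(candidates, data):
    """Brute-force distance from each candidate to its nearest raw record."""
    diffs = candidates[:, None, :] - data[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def synthesize(data, n, d_min, d_max, seed=0, max_rounds=100):
    """Generate up to n synthetic samples that pass the assumed distance check."""
    rng = np.random.default_rng(seed)
    marginals = estimate_marginals(data)
    kept, total = [], 0
    for _ in range(max_rounds):
        candidates = draw_samples(marginals, n, rng)
        dist = nearest_distance(candidates, data)
        accepted = candidates[(dist >= d_min) & (dist <= d_max)]  # assumed rule
        kept.append(accepted)
        total += len(accepted)
        if total >= n:
            break
    return np.vstack(kept)[:n]
```

Sampling each feature independently is the simplest reading of "estimating the probability distribution of features"; under that assumption the nearest neighbor check is what filters out feature combinations that do not occur near any real record.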

[0072] Figure 1 is a flow chart of the big data generation method for preventing privacy leakage; Figure 2 is a structural block diagram of the big data generation system for preventing privacy leakage.

[0073] The big data generation system for preventing privacy leakage provided in the embodiment takes the raw data and tag-type feature marks as input, and includes a data processing module, a random sample generation module, and a random sample verifi...

Abstract

The invention discloses a big data generation method and system for preventing privacy leakage, and relates to the technical fields of privacy protection and data mining. Data synthesis is used to generate data that is similar to the original data but contains no real sensitive information. The generated synthetic data can be used by data analysis algorithms, and running those algorithms on the synthetic data prevents privacy leakage during big data analysis. The method includes: preprocessing the raw data; estimating the probability distribution of the features; generating a nearest neighbor model; generating random samples; validating the random samples; and postprocessing. The system includes a data processing module, a random sample generation module, and a random sample verification module. By generating big data as synthetic data, the invention prevents leakage of private information in the original data and preserves the integrity of the data samples without increasing the cost of big data analysis.

Description

Technical field

[0001] The invention relates to the technical fields of privacy protection and data mining, in particular to a big data generation method and system for preventing privacy leakage.

Background

[0002] In recent years, with the development of big data technologies, big data analysis has been widely adopted across many fields and industries. With the help of big data analysis, shopping websites can recommend products of interest to users and increase sales revenue; scenic spots can predict visitor flow peaks and take countermeasures in advance to ensure service quality; banks can analyze each transaction record to prevent unauthorized transactions. Although big data analysis has brought many conveniences to our life and work, it has also brought some privacy issues.

[0003] In order to conduct big data analysis, a large amount of user-related data, including user personal information, preferences, browsing recor...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F21/62, G06F17/18, G06K9/62
CPC: G06F17/18, G06F21/6245, G06F18/24143
Inventor: 李影, 岳阳, 易可欣, 吴中海
Owner: PEKING UNIV