Big data generation method and system for preventing privacy leakage

The technology relates to privacy protection and big data, and is applied to big data generation methods and systems that prevent privacy leakage. It addresses problems of existing approaches: extra overhead for algorithms related to big data analysis, reduced algorithm efficiency, and degraded data analysis results.

Active Publication Date: 2018-04-06
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0005] Although the existing common methods above can solve the privacy leakage problem caused by sensitive data, they also have a negative impact on big data analysis.
For example, data desensitization deletes some information from the original data, leaving the data incomplete, which hinders in-depth analysis. Moreover, desensitization targets only the more obvious private information (such as addresses and phone numbers), while data that is not desensitized (such as user browsing records and purchase preferences) can still reveal some of a user's private habits.
With encryption or random perturbation of the data, the sensitive information is no longer visible, but the encryption and perturbation operations add overhead to algorithms related to big data analysis and reduce their efficiency. The original information is also modified, which causes difficulties for data users and thus affects the results of data analysis.
Therefore, there is still no method that preserves the integrity of the original data as far as possible, adds no overhead to big data analysis, and at the same time prevents the leakage of private information.


Examples

Embodiment Construction

[0070] The present invention is further described below through embodiments, with reference to the accompanying drawings, without limiting the scope of the invention in any way.

[0071] The present invention provides a big data generation method for preventing privacy leakage. The method estimates the probability distribution of each feature, generates corresponding random numbers as feature values to form random samples, and then verifies the samples with a nearest neighbor model. The result is synthetic data that does not contain real private information.
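The three steps in this paragraph (estimate per-feature distributions, draw random feature values, verify with a nearest neighbor model) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent text: features are treated as independent and numeric, each distribution is an equal-width histogram, and the verification rule accepts a candidate only when its Euclidean distance to the nearest original record lies between the hypothetical thresholds `min_dist` (so it does not reproduce a real record) and `max_dist` (so it stays plausible). The function names are likewise hypothetical.

```python
import math
import random


def estimate_histogram(values, bins=10):
    """Estimate one feature's distribution as an equal-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: fall back to width 1
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return lo, width, counts


def sample_feature(hist, rng):
    """Pick a bin in proportion to its count, then a uniform point inside it."""
    lo, width, counts = hist
    idx = rng.choices(range(len(counts)), weights=counts)[0]
    return lo + (idx + rng.random()) * width


def nearest_distance(sample, records):
    """Euclidean distance from a sample to its nearest original record."""
    return min(math.dist(sample, r) for r in records)


def generate_synthetic(records, n, min_dist, max_dist, seed=0, max_tries=100_000):
    """Return up to n random samples that pass the assumed verification rule."""
    rng = random.Random(seed)
    hists = [estimate_histogram(col) for col in zip(*records)]
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = tuple(sample_feature(h, rng) for h in hists)
        # Assumed rule: not a near-copy of a real record, yet not an outlier.
        if min_dist <= nearest_distance(candidate, records) <= max_dist:
            accepted.append(candidate)
    return accepted


# Toy original data: five records with two numeric features each.
records = [(1.0, 2.0), (2.0, 3.5), (3.0, 1.0), (4.0, 4.0), (5.0, 2.5)]
synthetic = generate_synthetic(records, n=5, min_dist=0.1, max_dist=1.5)
```

The brute-force nearest neighbor search keeps the sketch short; on real data volumes an index structure would be the more likely choice, and the acceptance band would need tuning against the privacy and utility requirements.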

[0072] Figure 1 is a flow chart of the big data generation method for preventing privacy leakage; Figure 2 is a structural block diagram of the big data generation system for preventing privacy leakage.

[0073] The big data generation system for preventing privacy leakage provided in the embodiment takes the raw data and the tag-type feature marks as input, and includes a data processing module, a random sample generation module, and a random sample verifi...

Abstract

The invention discloses a big data generation method and system for preventing privacy leakage, and relates to the technical fields of privacy protection and data mining. The method uses data synthesis to generate data that is similar to the original data but does not contain real sensitive information. The synthetic data can be used in data analysis algorithms in place of the original data, which prevents privacy leakage during big data analysis. The method comprises the following steps: pre-processing the original data; estimating the probability distribution of the features; generating a nearest neighbor model; generating random samples; verifying the random samples; and post-processing. The system comprises a data processing module, a random sample generation module and a random sample verification module. By generating big data through data synthesis, the method and system prevent leakage of private information in the original data, preserve the integrity of the data samples, and do not increase the overhead of big data analysis.
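The abstract lists pre-processing and post-processing steps without detailing them in this excerpt. One plausible reading, stated here purely as an assumption, is that tag-type (categorical) features are encoded as integer codes before their distribution is estimated, and decoded back to labels after sampling. The following Python sketch shows that round trip; the function names and the toy `column` are hypothetical.

```python
import random
from collections import Counter


def preprocess(column):
    """Assumed pre-processing: map each distinct label to an integer code."""
    labels = sorted(set(column))
    to_code = {label: i for i, label in enumerate(labels)}
    return [to_code[v] for v in column], labels


def estimate_frequencies(codes, n_labels):
    """Estimate the feature's distribution as relative label frequencies."""
    counts = Counter(codes)
    return [counts[i] / len(codes) for i in range(n_labels)]


def sample_codes(freqs, n, rng):
    """Draw n random codes according to the estimated frequencies."""
    return rng.choices(range(len(freqs)), weights=freqs, k=n)


def postprocess(codes, labels):
    """Assumed post-processing: turn integer codes back into labels."""
    return [labels[c] for c in codes]


# Toy tag-type feature, for example a purchase category per record.
column = ["book", "food", "book", "toys", "book", "food"]
codes, labels = preprocess(column)
freqs = estimate_frequencies(codes, len(labels))
synthetic = postprocess(sample_codes(freqs, 1000, random.Random(1)), labels)
```

Because the synthetic labels are drawn only from the estimated frequencies, they follow the original proportions without being tied to any individual record.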

Description

Technical field

[0001] The invention relates to the technical fields of privacy protection and data mining, and in particular to a big data generation method and system for preventing privacy leakage.

Background

[0002] In recent years, with the development of big data technologies, big data analysis has been widely adopted across many fields and industries. With the help of big data analysis, shopping websites can recommend products of interest to users and increase sales revenue; scenic spots can predict peaks in visitor flow and take countermeasures in advance to ensure service quality; banks can analyze each transaction record to prevent unauthorized transactions. Although big data analysis has brought many conveniences to our life and work, it has also brought some privacy issues.

[0003] In order to conduct big data analysis, a large amount of user-related data, including user personal information, preferences, browsing recor...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/62, G06F17/18, G06K9/62
CPC: G06F17/18, G06F21/6245, G06F18/24143
Inventor: 李影, 岳阳, 易可欣, 吴中海
Owner: PEKING UNIV