MapReduce-based distributed data anonymity processing method

A distributed data processing technique, applied in the field of data processing, that addresses insufficient server storage and computing capacity and improves the efficiency of processing massive data

Active Publication Date: 2017-04-26
徐工汉云技术股份有限公司

AI Technical Summary

Problems solved by technology

Existing approaches provide no feasible solution for global anonymity


Examples


Embodiment

[0041] Taking a data table with four quasi-identifiers as an example, the specific implementation process is as follows:

[0042] Step 1: first, determine the quasi-identifiers of the data table. Take a data table with four quasi-identifiers (Supplier, Code, Price, Time) as an example for data processing: S0 (supplier), C0 (material code), P0 (material price), and T0 (process time) are the quasi-identifiers, that is, the attributes to be generalized. According to prior knowledge, formulate corresponding generalization rules for the quasi-identifiers:

[0043] For example, generalize the supplier attribute {a certain limited company in Xuzhou City, a certain limited company in Beijing, a certain limited company in Hefei, a certain limited company in Suzhou,...} from level h=0 to level h=1, where it becomes {Jiangsu Province, Beijing, Anhui Province,...}, and from h=1 to h=2, where it becomes {China}; generalize the specific time (T) of the process to {≤30min, >30m...
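The supplier hierarchy described in [0043] can be sketched as a small lookup, shown here only as an illustration. The function name `generalize_supplier` and the dictionary `SUPPLIER_LEVEL_1` are hypothetical and do not come from the patent; the supplier strings are the placeholder names used in the text. The process-time rule is left out because the source text is cut off.

```python
from typing import Dict

# Hypothetical level-1 rule: supplier (h=0) -> province-level region (h=1).
# The keys are the placeholder supplier names from paragraph [0043].
SUPPLIER_LEVEL_1: Dict[str, str] = {
    "a certain limited company in Xuzhou City": "Jiangsu Province",
    "a certain limited company in Beijing": "Beijing",
    "a certain limited company in Hefei": "Anhui Province",
    "a certain limited company in Suzhou": "Jiangsu Province",
}


def generalize_supplier(value: str, level: int) -> str:
    """Return the supplier value generalized to hierarchy level h = `level`."""
    if level == 0:
        return value                    # h=0: raw supplier name
    if level == 1:
        return SUPPLIER_LEVEL_1[value]  # h=1: province-level region
    if level == 2:
        return "China"                  # h=2: country
    raise ValueError(f"unsupported generalization level: {level}")


if __name__ == "__main__":
    name = "a certain limited company in Xuzhou City"
    for h in range(3):
        print(h, generalize_supplier(name, h))
```

Each quasi-identifier would get its own rule of this kind, and a lattice node is then one choice of level per attribute.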



Abstract

The invention discloses a MapReduce-based distributed data anonymity processing method involving a server side and computer terminals. The original data table is stored on the server side, which performs global generalization on the data and produces a generalization lattice of nodes that may satisfy k-anonymity. The server side uses the bisection method to allocate a computation node to each computer terminal; the computer terminals compute in parallel and each returns a value to the server side according to its result. If the returned value does not satisfy k-anonymity, the server side sends each computer terminal a descendant node determined by the bisection method; otherwise, it sends an ancestor node determined by the bisection method. Each computer terminal recalculates on the new node given by the server side until all nodes that satisfy k-anonymity are found. The method resolves the conflict between explosive data growth and the limited storage and computing capacity of existing servers, and improves the efficiency of massive data processing.
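The following single-process sketch illustrates the two ideas in the abstract: a map-and-count check for k-anonymity, and a bisection over generalization levels. It is not the patented procedure. It assumes one linear chain of levels rather than a full lattice, it assumes that k-anonymity is monotone along that chain (once a level satisfies it, every coarser level does too), and it simply moves to coarser levels on failure without adopting the patent's ancestor/descendant orientation. The records, `CITY_TO_PROVINCE`, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

Record = Tuple[str, int]        # hypothetical record: (supplier city, price)
GeneralizedKey = Tuple[str, str]

CITY_TO_PROVINCE = {
    "Xuzhou": "Jiangsu Province",
    "Suzhou": "Jiangsu Province",
    "Hefei": "Anhui Province",
    "Beijing": "Beijing",
}


def generalize(record: Record, level: int) -> GeneralizedKey:
    """Hypothetical rule: level 0 = raw, 1 = province + price band, 2 = fully suppressed."""
    city, price = record
    if level == 0:
        return (city, str(price))
    if level == 1:
        return (CITY_TO_PROVINCE[city], f"{price // 100 * 100}+")
    return ("China", "*")


def is_k_anonymous(records: Sequence[Record],
                   rule: Callable[[Record, int], GeneralizedKey],
                   level: int, k: int) -> bool:
    # "Map": project each record onto its generalized quasi-identifier tuple.
    keys = (rule(r, level) for r in records)
    # "Reduce": count the records in each equivalence class.
    counts = Counter(keys)
    return all(c >= k for c in counts.values())


def lowest_k_anonymous_level(records: Sequence[Record],
                             rule: Callable[[Record, int], GeneralizedKey],
                             max_level: int, k: int) -> Optional[int]:
    """Bisect over levels 0..max_level; return the least general level that passes."""
    lo, hi, best = 0, max_level, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_k_anonymous(records, rule, mid, k):
            best, hi = mid, mid - 1   # passes: try a less general level
        else:
            lo = mid + 1              # fails: move to a more general level
    return best


RECORDS = [("Xuzhou", 120), ("Suzhou", 150), ("Hefei", 310),
           ("Hefei", 320), ("Beijing", 90)]

if __name__ == "__main__":
    print(lowest_k_anonymous_level(RECORDS, generalize, 2, 2))
```

In a distributed setting, the counting step is the part that would run in parallel across the computer terminals, with the server side choosing the next level to test.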

Description

Technical field

[0001] The invention relates to a MapReduce-based distributed data anonymity processing method and belongs to the technical field of data processing.

Background

[0002] Because of the needs of knowledge-based decision-making, information sharing, and scientific research, data owners need to release data externally. To reduce the possibility of privacy leakage during data release, the data owner must apply privacy-protection processing to the data before release.

[0003] Sweeney, Samarati, et al. proposed the k-anonymity privacy protection model. The k-anonymity model can prevent linking attacks and effectively protect private data, but it takes no effective protective measures for sensitive attribute information, so a risk of private data leakage remains. In the case of homogeneity attack, background knowledge attack, simil...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/62, G06F21/57
CPC: G06F21/57, G06F21/6254
Inventor: 黄凯, 张启亮
Owner: 徐工汉云技术股份有限公司