A data security method and system for MapReduce calculation

MapReduce framework and data security technology, applied to digital data protection, computing, electronic digital data processing, etc.; can solve the problem of unpredictable time consumption

Active Publication Date: 2019-04-26
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

The time-consuming process of ran

Method used



Examples


Embodiment approach 1

[0072] As shown in Figure 3, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution process in two ways: it modifies the partition function of the map phase and adds another reduce phase. The added reduce stage, which runs before the standard reduce, is called reduce1, and the standard reduce is renamed reduce2 in the present invention. The rewritten MapReduce satisfies the indistinguishability of the map output. It also satisfies the indistinguishability of the input at the reduce end, in that every reduce task receives the same amount of data.

[0073] Let D be the input data set the user submits with the job, |D| the input data size, M the number of map tasks, and R the number of reduce tasks in reduce1 and also in reduce2 (that is, the two stages have the same number of reduce tasks). The processing methods of the map phase, the reduce1 phase,...
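
The modified partition function described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `random_even_partition`, the shuffle followed by round-robin assignment, and the sample records are all assumptions chosen for illustration. The point it shows is that the destination of a record no longer depends on its key, and that the R reduce1 tasks receive bucket sizes that differ by at most one record. How the patent handles a remainder when the record count is not divisible by R is not shown in this excerpt.

```python
import random
from typing import Hashable, List, Sequence, Tuple

KV = Tuple[Hashable, int]


def random_even_partition(pairs: Sequence[KV], r: int, rng: random.Random) -> List[List[KV]]:
    """Spread one map task's output over r reduce1 tasks, ignoring the key.

    Hypothetical helper: shuffle, then deal records round-robin.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    shuffled = list(pairs)
    rng.shuffle(shuffled)  # random order removes any key-dependent pattern
    buckets: List[List[KV]] = [[] for _ in range(r)]
    for i, pair in enumerate(shuffled):
        # Round-robin keeps bucket sizes within one record of each other.
        buckets[i % r].append(pair)
    return buckets


# Six records from one map task, three reduce1 tasks (illustrative data).
map_output = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
buckets = random_even_partition(map_output, r=3, rng=random.Random(0))
sizes = [len(b) for b in buckets]
print(sizes)  # [2, 2, 2]
```

A standard hash partitioner would send all three `"a"` records to the same task, which exposes the key distribution through the traffic volume; the sketch avoids that by construction.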

Embodiment approach 2

[0079] This embodiment adds fake data to keep the data confidential and, at the same time, protects the number of key types K. In this embodiment, MapReduce is rewritten to satisfy the indistinguishability of the map output. The amounts of input data at the reduce end are not equal, but because fake data is randomly added and marked, the observed amounts are of no use for after-the-fact statistical inference.

[0080] As shown in Figure 4, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution process in two ways: it modifies the partition function of the map phase and adds another reduce phase. The added reduce stage, which runs before the standard reduce, is called reduce1, and the standard reduce is renamed reduce2 in the present invention.

[0081] Let D be the input data set the user submits with the job, |D| the input data size, M the number of map tasks, and R the number of reduce tasks in redu...
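
The idea of randomly adding marked fake data in this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the patent's procedure: the names `add_fake_records` and `drop_fake_records`, the boolean marker, the `max_fake` bound, and the fake key format are all hypothetical. The marker is shown in the clear only for readability; a real deployment would need to hide it from an observer.

```python
import random
from typing import List, Tuple

# (key, value, is_fake): the boolean marker is an assumed encoding.
Record = Tuple[str, int, bool]


def add_fake_records(real: List[Tuple[str, int]], max_fake: int, rng: random.Random) -> List[Record]:
    """Mark real pairs, then append a random number (0..max_fake) of fake pairs."""
    marked: List[Record] = [(k, v, False) for k, v in real]
    n_fake = rng.randint(0, max_fake)  # inclusive bounds
    for _ in range(n_fake):
        fake_key = f"fake-{rng.getrandbits(32):08x}"  # hypothetical key format
        marked.append((fake_key, 0, True))
    rng.shuffle(marked)  # position must not reveal which records are fake
    return marked


def drop_fake_records(records: List[Record]) -> List[Tuple[str, int]]:
    """Receiver side: keep only the pairs that are not marked as fake."""
    return [(k, v) for k, v, is_fake in records if not is_fake]


real_pairs = [("a", 3), ("b", 2)]
sent = add_fake_records(real_pairs, max_fake=5, rng=random.Random(1))
recovered = drop_fake_records(sent)
```

Because the number of fake records is random, the size of `sent` varies between runs with different seeds, so the volume an observer sees does not track the number of real keys.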

Embodiment approach 3

[0090] This embodiment adds fake data to keep the data confidential and, at the same time, protects the number of key types K. In this embodiment, MapReduce is rewritten to satisfy the indistinguishability of the map output. It also satisfies the indistinguishability of the input at the reduce end, in that every reduce task receives the same amount of data.

[0091] As shown in Figure 5, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution process in two ways: it modifies the partition function of the map phase and adds another reduce phase. The added reduce stage, which runs before the standard reduce, is called reduce1, and the standard reduce is renamed reduce2 in the present invention.

[0092] Let D be the input data set the user submits with the job, |D| the input data size, M the number of map tasks, and R the number of reduce tasks in reduce1 and the number of reduce ...
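
One way to obtain equal input sizes at the reduce end with fake data is to pad every outgoing bucket up to the size of the largest one. The sketch below assumes that approach; the excerpt does not state how the patent chooses the padding amount, so the target size, the function name `pad_to_equal_size`, the empty fake key, and the boolean marker are all assumptions.

```python
import random
from typing import Dict, List, Tuple

# (key, value, is_fake): the boolean marker is an assumed encoding.
Record = Tuple[str, int, bool]


def pad_to_equal_size(
    outgoing: Dict[int, List[Tuple[str, int]]], r: int, rng: random.Random
) -> Dict[int, List[Record]]:
    """Pad the data bound for each of r reduce tasks to a common length."""
    # Assumed target: the largest real bucket. Other targets are possible.
    target = max((len(outgoing.get(t, [])) for t in range(r)), default=0)
    padded: Dict[int, List[Record]] = {}
    for t in range(r):
        records: List[Record] = [(k, v, False) for k, v in outgoing.get(t, [])]
        records += [("", 0, True)] * (target - len(records))  # marked fake pairs
        rng.shuffle(records)  # mix fake and real records
        padded[t] = records
    return padded


outgoing = {0: [("a", 1), ("b", 2), ("c", 3)], 1: [("d", 4)], 2: []}
padded = pad_to_equal_size(outgoing, r=3, rng=random.Random(0))
lengths = [len(padded[t]) for t in range(3)]
print(lengths)  # [3, 3, 3]
```

Each receiving task would then drop the records whose marker is set before processing, so the padding does not change the job result.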


Abstract

The invention discloses a data security method and system for MapReduce calculation. The method comprises the following steps: adding another reduce stage, reduce1, before the standard reduce stage, reduce2, of MapReduce; writing a random distribution function into the function of the map stage, so that the data of each map task is distributed evenly across the reduce tasks of the reduce1 stage; sending the key-value pair data merged by each reduce task in the reduce1 stage to each reduce task in the reduce2 stage, either directly or after adding fake key-value pair data to it; and processing the data, discarding key-value pair data that does not belong to the merged output of the reduce tasks in the reduce1 stage. The invention thereby protects the confidentiality of MapReduce job data on a cloud computing platform.
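
The steps in the abstract can be traced end to end with a small in-memory word count. This is a sketch under assumptions, not the patented system: the job (word count), the function names, the random starting offset for the even distribution, and the CRC32 key routing between reduce1 and reduce2 are illustrative choices, and the fake-data step is omitted. It shows that inserting reduce1 as a partial merge does not change the final result.

```python
import random
import zlib
from collections import Counter
from typing import Dict, List, Tuple

KV = Tuple[str, int]


def map_task(line: str) -> List[KV]:
    """Map: emit (word, 1) for each word in one input line."""
    return [(word, 1) for word in line.split()]


def reduce_merge(pairs: List[KV]) -> List[KV]:
    """Merge values by key; used for both the reduce1 and reduce2 stages."""
    merged: Counter = Counter()
    for key, value in pairs:
        merged[key] += value
    return list(merged.items())


def run_job(lines: List[str], r: int, rng: random.Random) -> Dict[str, int]:
    # Map stage: distribute each map task's output evenly, independent of key.
    reduce1_in: List[List[KV]] = [[] for _ in range(r)]
    for line in lines:
        out = map_task(line)
        rng.shuffle(out)
        offset = rng.randrange(r)  # assumed: random start avoids favouring task 0
        for i, pair in enumerate(out):
            reduce1_in[(offset + i) % r].append(pair)
    # reduce1 stage: each task merges whatever it happened to receive.
    reduce1_out = [reduce_merge(pairs) for pairs in reduce1_in]
    # Route merged pairs to reduce2 by key (assumed: CRC32 modulo r).
    reduce2_in: List[List[KV]] = [[] for _ in range(r)]
    for partial in reduce1_out:
        for key, value in partial:
            reduce2_in[zlib.crc32(key.encode("utf-8")) % r].append((key, value))
    # reduce2 stage: the standard reduce produces the final counts.
    result: Dict[str, int] = {}
    for pairs in reduce2_in:
        result.update(reduce_merge(pairs))
    return result


counts = run_job(["a b a", "b c", "a"], r=2, rng=random.Random(0))
print(sorted(counts.items()))  # [('a', 3), ('b', 2), ('c', 1)]
```

Each key reaches exactly one reduce2 task, so `result.update` never overwrites a partial count, and the output matches what a single-stage reduce would produce.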

Description

Technical field

[0001] The present invention relates to the technical field of cloud computing data confidentiality, and in particular to a data confidentiality method and system for MapReduce computing. It protects data and privacy on the MapReduce framework in remote execution environments and prevents an application's private data from being obtained by malicious observers.

Background technique

[0002] MapReduce is a parallel programming model for parallel computation over large-scale data sets. It has characteristics of functional and vector programming languages and provides data partitioning, computing task scheduling, system optimization, and error detection and recovery, which makes MapReduce suitable for log analysis, machine learning, distributed sorting, and other applications. A MapReduce job is a unit of work that the user wants executed: it includes input data, MapReduce programs, ...

Claims


Application Information

IPC(8): G06F21/60, G06F21/62
CPC: G06F21/602, G06F21/6245
Inventor: 王永智, 沈玉龙, 马佳文, 张小宇
Owner: XIDIAN UNIV