Data secrecy method and system for MapReduce computing

A data confidentiality technology, applied to transmission systems, electrical components, and related fields, that addresses problems such as high performance overhead and lack of support for user-defined shuffle functions.

Active Publication Date: 2020-05-15
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0011] 1. Existing solutions impose strict requirements on the operation records of the experimental data set, which limits the applicability of the existing technology.
[0012] 2. Compared with the standard MapReduce framework, existing schemes incur higher performance overhead. For example, the SHUFFLE-IN-THE-MIDDLE scheme incurs 191% to 205% performance overhead, and the SHUFFLE&BALANCE ...


Examples


Embodiment 1

[0077] As shown in Figure 4, compared with the standard MapReduce process (see Figure 3, the standard MapReduce framework), this embodiment defines three new sub-phases in the Mapping phase layer (the Map phase, the Combine phase, and the Partition phase) and defines the Reduce sub-phase in the Reducing phase layer. The scheme satisfies indistinguishability of the inputs and outputs of the map tasks and indistinguishability of the inputs and outputs on the reduce side. Each reduce task receives and stores the same amount of data.
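The ordering of the sub-phases named above can be illustrated with a minimal sketch in plain Python. It shows only the data flow Map, Combine, Partition, Reduce on a word-count job and contains none of the secrecy mechanisms of the invention. The names `map_function`, `combine`, `partition`, `reduce_function`, and `run_job`, the word-count workload, and the CRC32-based partitioning are all assumptions made for illustration; they are not taken from the patent text.

```python
import zlib
from collections import defaultdict


def map_function(record):
    # Hypothetical Map_function: emit (word, 1) for every word in a line.
    return [(word, 1) for word in record.split()]


def combine(pairs):
    # Combine sub-phase: merge values that share a key inside one map task.
    merged = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return dict(merged)


def partition(combined, num_reducers):
    # Partition sub-phase: route each key to a reduce task with a stable hash.
    buckets = [dict() for _ in range(num_reducers)]
    for key, value in combined.items():
        idx = zlib.crc32(key.encode("utf-8")) % num_reducers
        buckets[idx][key] = value
    return buckets


def reduce_function(key, values):
    # Hypothetical Reduce_function: sum the partial counts for one key.
    return key, sum(values)


def run_job(splits, num_reducers):
    # One map task per input split; each task runs Map, Combine, Partition.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [kv for record in split for kv in map_function(record)]
        for idx, bucket in enumerate(partition(combine(pairs), num_reducers)):
            for key, value in bucket.items():
                reducer_inputs[idx][key].append(value)
    # Reduce sub-phase: each reduce task processes the keys routed to it.
    result = {}
    for grouped in reducer_inputs:
        for key, values in grouped.items():
            out_key, out_value = reduce_function(key, values)
            result[out_key] = out_value
    return result
```

For example, `run_job([["a b a"], ["b c"]], 2)` runs two map tasks and two reduce tasks and counts each word across both splits. In this unprotected form the number of pairs each reduce task receives depends on the data, which is exactly the kind of observable difference the embodiment is designed to remove.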

[0078] Assume that the input data set the user submits with the job is D, where |D| denotes the size of the input data and |.| denotes the size of encrypted data; M denotes the number of map tasks and R the number of reduce tasks; Map_function is the main map function in each map task and Reduce_function is the main reduce function in each reduce task; the user-defined allocation function User_Partition_function and the ...

Embodiment 2

[0086] This embodiment achieves data confidentiality by adding fake data, and at the same time protects K, the number of distinct key values. MapReduce is rewritten so that the inputs and outputs of the map tasks are indistinguishable, the inputs and outputs on the reduce side are indistinguishable, and every reduce task receives the same amount of data.

[0087] As shown in Figure 5, compared with the standard MapReduce process (see Figure 3, the standard MapReduce framework), this embodiment defines three new sub-phases in the Mapping phase layer (the Map phase, the Combine phase, and the Partition phase) and defines the Reduce sub-phase in the Reducing phase layer. The scheme satisfies indistinguishability of the inputs and outputs of the map tasks and indistinguishability of the inputs and outputs on the reduce side. Each reduce task receives and stores the same amount of data.
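The idea of adding fake data so that every reduce task receives the same amount can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the embodiment: the marker `DUMMY_KEY`, the functions `pad_partitions` and `strip_dummies`, and the choice of padding target are hypothetical. In a real deployment the records would also be encrypted so that an observer cannot tell a dummy from a real pair; the sketch covers only the counting.

```python
DUMMY_KEY = "__dummy__"  # hypothetical marker; assumed to be hidden by encryption


def pad_partitions(buckets, target=None):
    """Pad each reduce task's list of (key, value) pairs to the same length.

    If no target is given, the largest real partition is used. A fixed target
    chosen independently of the data is an assumption a caller may prefer,
    since the maximum itself depends on the input.
    """
    if target is None:
        target = max((len(bucket) for bucket in buckets), default=0)
    padded = []
    for bucket in buckets:
        if len(bucket) > target:
            raise ValueError("target is smaller than a real partition")
        fakes = [(DUMMY_KEY, 0)] * (target - len(bucket))
        padded.append(list(bucket) + fakes)
    return padded


def strip_dummies(bucket):
    # Reduce side: discard the fake pairs before applying the reduce function.
    return [(key, value) for key, value in bucket if key != DUMMY_KEY]
```

With `buckets = [[("a", 1), ("b", 2)], [("c", 3)], []]`, `pad_partitions(buckets)` returns three lists of two pairs each, and `strip_dummies` recovers the real pairs on the reduce side.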

[0088] Assume th...


Abstract

The invention belongs to the technical field of cloud computing data secrecy and discloses a data secrecy method and system for MapReduce computing. The method comprises: proposing Full Shuffle and Safe Shuffle under the standard MapReduce computing framework; organizing the input data into key-value pairs as preprocessing; combining the key-value pairs in each map task by key in the Map sub-stage; distributing the combined key-value pairs to the reduce tasks of the Reducing stage layer; processing the data received by each reduce task in the Reduce sub-stage; and making the output data of every reduce task equal in size. The invention protects data and privacy for MapReduce-based applications running in a remote execution environment and prevents malicious observers from obtaining the application's private data through side-channel attacks.
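The last step, making the output of every reduce task equal in size, can be illustrated by padding serialized results to a fixed length. The sketch below is an assumption about one way to do this and is not taken from the patent: the functions `pad_output`, `unpad_output`, and `equalize_outputs`, the 4-byte length prefix, and the JSON serialization are all hypothetical. Such padding would be applied before encryption, so that the ciphertexts an observer sees have identical length.

```python
import json
import struct


def pad_output(payload: bytes, size: int) -> bytes:
    # Layout: 4-byte big-endian payload length, the payload, then zero bytes.
    if len(payload) + 4 > size:
        raise ValueError("payload does not fit in the fixed output size")
    filler = b"\x00" * (size - 4 - len(payload))
    return struct.pack(">I", len(payload)) + payload + filler


def unpad_output(blob: bytes) -> bytes:
    # Read the length prefix and return only the real payload bytes.
    (length,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + length]


def equalize_outputs(results, size):
    # Serialize each reduce task's result and pad all of them to `size` bytes.
    return [
        pad_output(json.dumps(result, sort_keys=True).encode("utf-8"), size)
        for result in results
    ]
```

For example, `equalize_outputs([{"a": 2}, {"b": 2, "c": 1}, {}], 64)` yields three byte strings of 64 bytes each, and `json.loads(unpad_output(blob))` restores each original result.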

Description

Technical field

[0001] The invention belongs to the technical field of cloud computing data security, and in particular relates to a data security method and system for MapReduce computing.

Background

[0002] The closest existing technology is as follows. MapReduce is a parallel programming model for parallel computation over large-scale data sets. It has characteristics of functional and vector programming languages and provides data partitioning, task scheduling, system optimization, and error detection and recovery, which makes it suitable for applications such as log analysis, machine learning, and distributed sorting. A MapReduce job is a unit of work that the user wants executed: it includes the input data, the MapReduce program, and configuration information. MapReduce runs a job by dividing it into tasks, of which there are two types: map tasks and reduce tasks. The standard MapReduce data flow of multiple r...


Application Information

IPC(8): H04L29/06
CPC: H04L63/0428, H04L63/1416, H04L63/1441
Inventors: 王永智, 张小宇
Owner: XIDIAN UNIV