A data security method and system for MapReduce computing

A data-confidentiality technology, applied in transmission systems, electrical components, etc. It addresses three problems of existing schemes: they do not support user-defined shuffling functions, they incur high performance overhead, and they place strict requirements on the operation records of the experimental data set. It achieves indistinguishability.

Active Publication Date: 2021-08-31
XIDIAN UNIV
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0011] 1. Existing solutions place strict requirements on the operation records of the experimental data set, which limits their applicability.
[0012] 2. Compared with the standard MapReduce framework, existing schemes incur higher performance overhead. For example, the SHUFFLE-IN-THE-MIDDLE scheme causes 191% to 205% performance overhead, and the SHUFFLE&BALANCE scheme causes 95% to 101% performance overhead in the online stage.
[0013] 3. Existing solutions support only the default Partition function, not user-defined shuffling functions.
[0014] 4. Existing schemes succeed only if the probability distribution of the input data is estimated accurately; an inaccurate estimate causes the scheme to fail and incurs unnecessarily high performance overhead.

Examples

Embodiment 1

[0077] As shown in Figure 4, compared with the standard MapReduce process (Figure 3, the standard MapReduce framework), this embodiment defines three new sub-phases in the Mapping phase layer (the Map, Combine, and Partition phases) and a Reduce sub-phase in the Reducing phase layer. The scheme makes the input and output of each map task indistinguishable and the input and output at the reduce end indistinguishable, and every reduce task receives and stores an equal amount of data.
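
The sub-phases of Embodiment 1 can be illustrated with a minimal Python sketch. It assumes a word-count job, a CRC32 hash partitioner, and a worst-case padding rule in which each per-reducer bucket of a map task is filled with dummy records up to that map task's number of merged pairs. None of these choices is taken from the patent, and the encryption that would make dummies indistinguishable from real records is omitted.

```python
import zlib
from collections import defaultdict

# A record is (key, value, is_dummy). In the real scheme records would be
# encrypted, so an observer could not tell dummies from real records.
DUMMY = ("", 0, True)

def map_phase(lines):
    """Map sub-phase: emit (word, 1) for every word (word count is an assumed job)."""
    return [(w, 1) for line in lines for w in line.split()]

def combine_phase(pairs):
    """Combine sub-phase: merge one map task's pairs by key."""
    merged = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return merged

def partition_phase(merged, num_reducers):
    """Partition sub-phase: route merged pairs to reducers, then pad every
    bucket with dummies so all buckets have the same length."""
    # Assumed worst-case bound: every merged pair could hash to one reducer.
    bucket_size = len(merged)
    buckets = [[] for _ in range(num_reducers)]
    for key, count in merged.items():
        r = zlib.crc32(key.encode("utf-8")) % num_reducers  # deterministic hash
        buckets[r].append((key, count, False))
    for b in buckets:
        b.extend([DUMMY] * (bucket_size - len(b)))
    return buckets

def reduce_phase(inputs):
    """Reduce sub-phase: sum real values per key, skipping dummies."""
    out = defaultdict(int)
    for key, value, is_dummy in inputs:
        if not is_dummy:
            out[key] += value
    return dict(out)
```

With input `["a b a", "c"]` and three reducers, each bucket holds exactly three records whatever the hash values are. This bound is simple but costly (a map task emits R times its merged size); the excerpt does not state the patent's own padding rule.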

[0078] Assume that D is the input data set the user submits with the job, |D| denotes the size of the input data, |·| denotes the size of encrypted data, M denotes the number of map tasks, R denotes the number of reduce tasks, Map_function is the main map function in each map task, Reduce_function is the main reduce function in each reduce task, and the user-defined allocation function User_Partition_function and the ...
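
The notation of [0078] can be mirrored in a hypothetical job specification. The sketch keeps the symbol names D, M, R, Map_function, Reduce_function and User_Partition_function from the text, but the `JobSpec` class, the round-robin split of D, and the public padding bound `pad_to` are assumptions for illustration only. It shows how an arbitrary user-defined partition function can be plugged in while every reduce task still receives the same number of records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Record = Tuple[Any, Any, bool]  # (key, value, is_dummy)

@dataclass
class JobSpec:  # hypothetical container for the symbols in [0078]
    D: List[str]                                            # input data set
    M: int                                                  # number of map tasks
    R: int                                                  # number of reduce tasks
    Map_function: Callable[[str], Iterable[Tuple[Any, Any]]]
    Reduce_function: Callable[[Any, List[Any]], Any]
    User_Partition_function: Callable[[Any, int], int]      # key, R -> reducer index
    pad_to: int  # assumed public bound on records per (map task, reduce task) pair

def run_job(job: JobSpec):
    splits = [job.D[i::job.M] for i in range(job.M)]  # assumed round-robin split
    reduce_inputs: List[List[Record]] = [[] for _ in range(job.R)]
    for split in splits:  # one iteration per map task
        buckets: List[List[Record]] = [[] for _ in range(job.R)]
        for item in split:
            for k, v in job.Map_function(item):
                buckets[job.User_Partition_function(k, job.R)].append((k, v, False))
        for r, b in enumerate(buckets):
            if len(b) > job.pad_to:
                raise ValueError("pad_to is smaller than a real bucket")
            b.extend([(None, None, True)] * (job.pad_to - len(b)))  # dummy padding
            reduce_inputs[r].extend(b)
    outputs = []
    for inp in reduce_inputs:  # one iteration per reduce task
        groups = defaultdict(list)
        for k, v, is_dummy in inp:
            if not is_dummy:
                groups[k].append(v)
        outputs.append({k: job.Reduce_function(k, vs) for k, vs in groups.items()})
    return reduce_inputs, outputs
```

Because every (map task, reduce task) bucket is padded to `pad_to`, each reduce task receives exactly M × `pad_to` records regardless of how the user's partition function distributes keys.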

Embodiment 2

[0086] In this embodiment, data confidentiality is achieved by adding fake data, which also protects K, the number of distinct key values. MapReduce is rewritten so that the input and output of each map task are indistinguishable, the input and output at the reduce end are indistinguishable, and every reduce task receives an equal amount of data.
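
One way to picture how fake data hides K is to pad each map task's merged key groups up to a public upper bound with fake groups. The helper `pad_key_groups` and the bound `k_max` are hypothetical; the excerpt does not say how the patent chooses the number or form of fake records. A real deployment would also encrypt keys, so the `fake-` prefix used here for readability would not be visible.

```python
import secrets

def pad_key_groups(merged: dict, k_max: int) -> list:
    """Return exactly k_max (key, value, is_fake) groups.

    Real groups come from `merged`; the rest are fake groups with random keys,
    so the group count no longer reveals K. k_max is an assumed public bound.
    """
    if len(merged) > k_max:
        raise ValueError("k_max must be at least the number of distinct keys")
    groups = [(k, v, False) for k, v in merged.items()]
    while len(groups) < k_max:
        # Random 64-bit token; collisions with real keys are negligible here.
        groups.append(("fake-" + secrets.token_hex(8), 0, True))
    # Shuffle so real and fake groups are not separated by position.
    secrets.SystemRandom().shuffle(groups)
    return groups
```

A map task with three distinct words and `k_max=8` always emits eight groups, five of them fake; the reduce side simply discards groups flagged as fake.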

[0087] As shown in Figure 5, compared with the standard MapReduce process (Figure 3, the standard MapReduce framework), this embodiment defines three new sub-phases in the Mapping phase layer (the Map, Combine, and Partition phases) and a Reduce sub-phase in the Reducing phase layer. The scheme makes the input and output of each map task indistinguishable and the input and output at the reduce end indistinguishable, and every reduce task receives and stores an equal amount of data.

[0088] Assume th...

Abstract

The invention belongs to the field of cloud computing data security. It discloses a data security method and system for MapReduce computing and proposes Full Shuffle and Safe Shuffle within the standard MapReduce computing framework. The input data is preprocessed into key-value pairs; in the Map sub-stage, the key-value pairs in each map task are merged by key; the merged key-value pairs are distributed to the reduce tasks in the Reducing stage layer; in the Reduce sub-stage, each reduce task processes the data it receives; and every reduce task outputs data of equal size. The invention protects data and privacy for the MapReduce framework in remote execution environments and prevents malicious observers from obtaining an application's private data through side-channel attacks.
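
The last step of the abstract, equal output size for every reduce task, can be sketched by padding each reduce task's output to a fixed length after dropping dummy inputs. The function name `reduce_with_padded_output` and the public bound `out_size` are assumptions for illustration, not the patent's procedure.

```python
def reduce_with_padded_output(inputs, out_size):
    """Sum real values per key, then pad the output so that every reduce task
    emits exactly out_size (key, value, is_dummy) records."""
    totals = {}
    for key, value, is_dummy in inputs:
        if not is_dummy:
            totals[key] = totals.get(key, 0) + value
    if len(totals) > out_size:
        raise ValueError("out_size is smaller than the real output")
    out = [(k, v, False) for k, v in sorted(totals.items())]
    out.extend([(None, None, True)] * (out_size - len(out)))
    return out
```

Two reduce tasks with one and two real output keys respectively both emit `out_size` records, so output volume does not reveal how many keys each task handled.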

Description

Technical field

[0001] The invention belongs to the technical field of cloud computing data security, and in particular relates to a data security method and system for MapReduce computing.

Background

[0002] At present, the closest existing technology is as follows. MapReduce is a parallel programming model for computing over large-scale data sets. It combines features of functional and vector programming languages and provides data partitioning, computing-task scheduling, system optimization, and error detection and recovery, which makes it suitable for applications such as log analysis, machine learning, and distributed sorting. A MapReduce job is a unit of work that the user wants executed; it consists of input data, the MapReduce program, and configuration information. MapReduce runs a job by dividing it into tasks, of which there are two types: map tasks and reduce tasks. The standard MapReduce data flow of multiple r...
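
The side channel that motivates the invention is visible in an ordinary, unprotected word-count job: the number of records each reduce task receives depends on the key distribution of the input. The sketch below is a minimal illustration under assumed choices: word count as the job, and CRC32 as a stand-in for the default hash partitioner (Python's built-in `hash` of strings is randomized per process).

```python
import zlib

def standard_shuffle_sizes(lines, num_reducers):
    """Unprotected MapReduce: map emits (word, 1), a hash partitioner routes
    each pair, and we count how many records each reduce task receives."""
    sizes = [0] * num_reducers
    for line in lines:
        for word in line.split():
            sizes[zlib.crc32(word.encode("utf-8")) % num_reducers] += 1
    return sizes
```

Input dominated by one word sends every record to a single reduce task, so an observer who sees only transfer sizes learns the skew. This is the kind of leakage the padded shuffle in the embodiments is meant to hide.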


Application Information

Patent Type & Authority: Patents (China)
IPC (8): H04L29/06
CPC: H04L63/0428; H04L63/1416; H04L63/1441
Inventors: 王永智 (Wang Yongzhi), 张小宇 (Zhang Xiaoyu)
Owner: XIDIAN UNIV