A data security method and system for MapReduce computing

A MapReduce framework and data confidentiality technology, applied in digital data protection, computing, and electrical digital data processing. It addresses the problem that randomizing records is time-consuming and unpredictable, and it achieves indistinguishability.

Active Publication Date: 2021-05-07
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

Randomizing records is time-consuming, and the time it takes is unpredictable.


Examples


Embodiment approach 1

[0072] As shown in Figure 3, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution flow in two ways: it modifies the partition function of the map stage, and it adds another reduce stage. The reduce stage added before the standard reduce is called reduce1, and the standard reduce, after rewriting, is called reduce2. The rewritten MapReduce satisfies both the indistinguishability of the map output and the indistinguishability of the reduce input: every reduce task receives the same amount of data.

[0073] Assume that the input data set of the job submitted by the user is D, where |D| denotes the size of the input data, M denotes the number of map tasks, and R denotes the number of reduce tasks in reduce1, which equals the number of reduce tasks in reduce2. The processing methods of the map phase, the reduce1 phase, and the reduce2 p...
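The effect of modifying the partition function can be illustrated with a small sketch. The source text does not spell out the partition rule at this point, so the round-robin assignment below is an assumption, and the names `even_partition` and `hash_partition` are hypothetical. The sketch only contrasts a key-based partition, whose bucket sizes reveal the key distribution, with a key-independent one that gives each of the R reduce1 tasks the same number of records.

```python
def hash_partition(records, num_reducers):
    """Standard-style partition: the bucket depends on the key.

    A deterministic byte sum stands in for a real hash function.
    """
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[sum(key.encode()) % num_reducers].append((key, value))
    return buckets


def even_partition(records, num_reducers):
    """Assumed rewrite: assign records round-robin, ignoring the key.

    When len(records) is a multiple of num_reducers, every bucket
    has exactly the same size, whatever the key distribution is.
    """
    buckets = [[] for _ in range(num_reducers)]
    for i, record in enumerate(records):
        buckets[i % num_reducers].append(record)
    return buckets


# Skewed map output: key "a" dominates.
records = [("a", 1)] * 7 + [("b", 1)] * 2 + [("c", 1)] * 3
skewed_sizes = [len(b) for b in hash_partition(records, 3)]
even_sizes = [len(b) for b in even_partition(records, 3)]
```

With the skewed input above, the key-based bucket sizes differ from one reduce task to the next, while the round-robin sizes are all equal, so an observer counting records per reduce1 task learns nothing about how often each key occurs.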

Embodiment approach 2

[0079] In this embodiment, fake data is added to keep the data confidential, and at the same time the number K of distinct key values is protected. After MapReduce is rewritten, the map output is indistinguishable. The amounts of data arriving at the reduce side are not equal, but because fake data has been randomly added and marked, those amounts carry no significance for after-the-fact statistical inference.

[0080] As shown in Figure 4, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution flow in two ways: it modifies the partition function of the map stage, and it adds another reduce stage. The reduce stage added before the standard reduce is called reduce1, and the standard reduce, after rewriting, is called reduce2.

[0081] Assume that the input data set of the job submitted by the user is D, where |D| denotes the size of the input data, M denotes the num...

Embodiment approach 3

[0090] In this embodiment, fake data is added to keep the data confidential, and at the same time the number K of distinct key values is protected. After MapReduce is rewritten, both the indistinguishability of the map output and the indistinguishability of the reduce input are satisfied, and every reduce task receives the same amount of data.

[0091] As shown in Figure 5, compared with the standard MapReduce process, this embodiment rewrites the MapReduce execution flow in two ways: it modifies the partition function of the map stage, and it adds another reduce stage. The reduce stage added before the standard reduce is called reduce1, and the standard reduce, after rewriting, is called reduce2.

[0092] Assume that the input data set of the job submitted by the user is D, where |D| denotes the size of the input data, M denotes the number of map tasks, R denotes the number of reduce tasks in reduce1...
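The idea of padding with fake data so that every reduce task receives the same amount can be sketched as follows. The detailed procedure is truncated in the text above, so this is only an illustration under stated assumptions: the marker `FAKE_KEY`, the three-field record layout, and the helpers `pad_outgoing` and `drop_fakes` are all hypothetical. A real deployment would also need to hide the fake flag from an observer, which this sketch does not attempt.

```python
FAKE_KEY = "__fake__"  # hypothetical marker key, not taken from the patent


def pad_outgoing(real_by_target, num_targets):
    """Pad the records one reduce1 task sends to each reduce2 task.

    real_by_target maps a reduce2 task index to a list of (key, value)
    pairs. Every target ends up with the same number of records; the
    third field marks a record as fake (True) or real (False).
    """
    width = max((len(v) for v in real_by_target.values()), default=0)
    padded = {}
    for target in range(num_targets):
        real = [(k, v, False) for k, v in real_by_target.get(target, [])]
        fakes = [(FAKE_KEY, 0, True)] * (width - len(real))
        padded[target] = real + fakes
    return padded


def drop_fakes(received):
    """reduce2 side: discard marked fake records before aggregating."""
    return [(k, v) for k, v, is_fake in received if not is_fake]


outgoing = pad_outgoing({0: [("a", 3)], 2: [("b", 1), ("c", 2)]}, 3)
sizes = [len(outgoing[t]) for t in range(3)]
```

In this example every reduce2 task is sent the same number of records, including the task that would otherwise have received nothing, and `drop_fakes` recovers exactly the real pairs on the receiving side.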


Abstract

The invention discloses a data security method and system for MapReduce computation, comprising: adding another reduce stage, reduce1, before the standard MapReduce reduce stage, reduce2; sending the data in each map task evenly to every reduce task in the reduce1 stage; sending the key-value pair data merged by each reduce task in the reduce1 stage to the reduce tasks in the reduce2 stage, either as is or after adding fake key-value pair data; and processing the data while discarding key-value pair data that does not belong to the data merged by the reduce tasks in the reduce1 stage. The invention protects the confidentiality of MapReduce job data on a cloud computing platform.
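As a rough illustration of the two-stage flow described in the abstract, the following sketch runs a word count through an even map-side distribution, a merging reduce1 stage, and a final reduce2 stage. It is a single-process toy, not the patented implementation: the function name `run_two_stage`, the slicing rule used for even distribution, and the byte-sum routing to reduce2 are assumptions made for the example, and the fake-data variant is left out.

```python
from collections import Counter


def run_two_stage(words, num_reducers):
    """Word count with an extra reduce1 stage before the final reduce2."""
    # Map output is spread evenly over reduce1 tasks, independent of key.
    reduce1_inputs = [words[i::num_reducers] for i in range(num_reducers)]

    # reduce1: each task merges the pairs it received by key.
    merged = [Counter(chunk) for chunk in reduce1_inputs]

    # Merged pairs are routed to reduce2 tasks by key.
    reduce2_inputs = [[] for _ in range(num_reducers)]
    for partial in merged:
        for word, count in partial.items():
            target = sum(word.encode()) % num_reducers
            reduce2_inputs[target].append((word, count))

    # reduce2: final aggregation per key.
    result = Counter()
    for pairs in reduce2_inputs:
        for word, count in pairs:
            result[word] += count
    return reduce1_inputs, dict(result)


reduce1_inputs, counts = run_two_stage(["a", "b", "a", "c", "a", "b"], 2)
```

Each reduce1 task here receives three words even though the key "a" is three times as frequent as "c", and the final counts match what a standard single-stage word count would produce.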

Description

Technical field

[0001] The present invention relates to the technical field of cloud computing data security, and in particular to a data security method and system for MapReduce computing. It protects data and privacy on the MapReduce framework in remote execution environments and prevents malicious observers from obtaining an application's private data.

Background

[0002] MapReduce is a parallel programming model for parallel computation over large-scale data sets. It borrows features from functional programming languages and vector programming languages, and it provides data partitioning, computing task scheduling, system optimization, and error detection and recovery, which makes MapReduce suitable for applications such as log analysis, machine learning, and distributed sorting. A MapReduce job is a unit of work that the user wishes to have executed: it includes the input data, the MapReduce program, and configuration information. ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F21/60, G06F21/62
CPC: G06F21/602, G06F21/6245
Inventors: 王永智, 沈玉龙, 马佳文, 张小宇
Owner: XIDIAN UNIV