Data processing method and device based on MapReduce

A data processing technology for data sets, applied in the field of cloud computing, that addresses problems such as low connection efficiency. Its effects are avoiding repeated reading and transmission, improving operation efficiency, and saving I/O overhead.

Inactive Publication Date: 2019-07-09
中国移动通信集团陕西有限公司 +1

AI Technical Summary

Problems solved by technology

[0011] The embodiment of the present application provides a MapReduce-based data processing method and device to solve the problem in the prior art that connection (join) efficiency is relatively low when the MapReduce framework connects multiple data sets.

Examples

Embodiment 1

[0032] As shown in Figure 2, the MapReduce-based data processing method provided by the embodiment of the present application includes the following steps:

[0033] S201: Receive multi-way data sets and the connection field information used to perform associated queries on those data sets.

[0034] Here, the multi-way data sets are, for example, R, S, and T, and the connection fields used in the associated query are, for example, age and name.

[0035] S202: Perform a mapping operation on each data set to obtain multiple intermediate result sets. For each intermediate result set, determine at least one Reduce node corresponding to it according to the partition function set for each connection field, and send the intermediate result set to each determined Reduce node.

[0036] In the specific implementation process, multiple intermediate result sets obtained by performing the mapping operation are key-value (key-val...
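The routing in step S202 can be illustrated with a minimal Python sketch. The text available here does not spell out the partition functions, so everything below is an assumption made for illustration: the Reduce nodes are imagined as a grid with one dimension per connection field, `bucket` (a CRC32 hash modulo a bucket count) stands in for the per-field partition function, and a record that lacks a connection field is replicated along that field's dimension. The names `bucket`, `target_reducers`, and `buckets_per_field` are hypothetical.

```python
import zlib
from itertools import product


def bucket(value, num_buckets):
    """Assumed partition function for one connection field: CRC32 hash modulo bucket count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def target_reducers(record, buckets_per_field):
    """Return the grid coordinates of every Reduce node that should receive `record`.

    buckets_per_field maps each connection field to its bucket count (hypothetical layout).
    A field present in the record pins that coordinate through the field's partition
    function; a missing field means the record is replicated to every bucket on that axis.
    """
    axes = []
    for field, num_buckets in buckets_per_field.items():
        if field in record:
            axes.append([bucket(record[field], num_buckets)])
        else:
            axes.append(range(num_buckets))
    return list(product(*axes))


# Example layout: 2 buckets for "name" and 3 for "age", so 6 Reduce nodes in total.
buckets = {"name": 2, "age": 3}
r_targets = target_reducers({"name": "Ann", "city": "Xi'an"}, buckets)  # has name only
t_targets = target_reducers({"name": "Ann", "age": 30}, buckets)        # has both fields
```

Under these assumptions, the record carrying both fields goes to exactly one Reduce node, and that node is among the nodes receiving the record that carries only `name`, so records that can match always meet on at least one node.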

Embodiment 2

[0051] To address the large disk I/O overhead and high network communication cost incurred by the traditional MapReduce framework when connecting multi-way data sets, the embodiment of the present application modifies the partition function interface of the existing MapReduce framework. After the modification, a single MapReduce task can complete the connection of multiple data sets: the intermediate result sets that satisfy all connection fields are sent to the same Reduce node, which saves I/O overhead and network resources and significantly improves the efficiency of the algorithm.

[0052] The basic idea of the embodiment of the present application is as follows: when the MapReduce framework is used to connect multiple data sets, the intermediate result sets that meet the connection conditions in those data sets can be sent to the same Reduce node for connection processing, rather than having to split the connection task of this multi-way da...
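The idea of finishing a multi-way connection in one job can be sketched as a small in-memory simulation in Python. This is not the patented implementation and not Hadoop code: it assumes the simplified case in which all data sets share a single connection field, and the helper names `partition` and `multiway_join` are hypothetical. Records from R, S, and T with the same key are routed to the same simulated Reduce node, which then combines them locally.

```python
import zlib
from collections import defaultdict
from itertools import product


def partition(key, num_reducers):
    """Assumed partition function: CRC32 hash of the key modulo the reducer count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def multiway_join(datasets, field, num_reducers=4):
    """Simulate one MapReduce job that joins several data sets on a shared field."""
    # Map and shuffle: tag each record with its source data set and route it by key.
    reducers = [defaultdict(lambda: defaultdict(list)) for _ in range(num_reducers)]
    for name, records in datasets.items():
        for rec in records:
            key = rec[field]
            reducers[partition(key, num_reducers)][key][name].append(rec)

    # Reduce: for each key, combine one record from every data set.
    results = []
    for groups in reducers:
        for by_source in groups.values():
            if len(by_source) < len(datasets):
                continue  # at least one data set has no record with this key
            for combo in product(*(by_source[name] for name in datasets)):
                merged = {}
                for rec in combo:
                    merged.update(rec)
                results.append(merged)
    return results


R = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}]
S = [{"name": "Ann", "city": "Xi'an"}]
T = [{"name": "Ann", "dept": "ops"}, {"name": "Bob", "dept": "dev"}]
joined = multiway_join({"R": R, "S": S, "T": T}, "name")
```

A cascade of two-way joins would instead run one job for R and S, write the result out, and read it back in a second job with T; the single-pass version touches each input record once.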

Embodiment 3

[0070] Based on the same inventive concept, the embodiment of this application also provides a MapReduce-based data processing device corresponding to the MapReduce-based data processing method. Because the principle by which the device solves the problem is similar to that of the data processing method, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.

[0071] As shown in Figure 5, the MapReduce-based data processing device provided by the embodiment of the present application includes:

[0072] A receiving module 501, configured to receive multi-way data sets and connection field information for performing associated queries on the multi-way data sets;

[0073] The sending module 502 is configured to perform a mapping operation on each data set to obtain multiple intermediate result sets, and for each intermediate result set, determine at least one Reduce node corresponding to the intermediate result set according...

Abstract

The invention relates to the technical field of cloud computing, in particular to a data processing method and device based on MapReduce, and aims to solve the problem in the prior art that connection efficiency is low when a MapReduce framework is used to connect multiple data sets. The data processing method provided by the embodiment of the invention comprises the following steps: receiving multi-way data sets and connection field information; performing a mapping operation on each data set to obtain a plurality of intermediate result sets; for each intermediate result set, determining at least one Reduce node corresponding to it according to the partition function set for each connection field, and sending the intermediate result set to each determined Reduce node; and summarizing the intermediate result sets in each Reduce node to obtain the data that satisfies all the connection fields in the multi-way data sets. Because each intermediate result set is sent to every Reduce node that may need it, it does not need to be read and transmitted repeatedly, which saves disk I/O overhead and reduces network communication cost.

Description

Technical field

[0001] The present application relates to the technical field of cloud computing, in particular to a data processing method and device based on MapReduce.

Background

[0002] In the field of cloud computing, MapReduce is an important computing framework. It provides a large but well-designed parallel computing software framework that automatically parallelizes computing tasks, automatically partitions the data and the computing tasks, automatically assigns and executes tasks on cluster nodes, and collects the results. It hands the many complex low-level details involved in parallel computing, such as distributed data storage, data communication, and fault-tolerant processing, over to the system, which greatly reduces the burden on software developers.

[0003] However, by analyzing the association method of multi-channel data sets under the traditional MapReduce f...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48
CPC: G06F9/4881
Inventor: 王晓春, 马军
Owner: 中国移动通信集团陕西有限公司