
A data processing method and device based on MapReduce

A data processing technology, applied in the field of cloud computing, that addresses the loss of system performance caused by frequent disk read and write operations. It reduces the number of MapReduce jobs and the frequency of disk reads and writes, which improves overall performance.

Active Publication Date: 2018-08-17
CHINA MOBILE COMM GRP CO LTD
Cites: 3, Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The embodiment of the present invention provides a data processing method and device based on MapReduce, which is used to solve the problem that the MapReduce process in the prior art will frequently perform disk read and write operations when processing data, thereby causing additional overhead and seriously affecting system performance.


Examples


Embodiment Construction

[0063] In the prior art, the MapReduce process frequently performs disk read and write operations when processing data, which causes additional overhead and seriously affects system performance. To solve this problem, the present invention studies the execution of the prior-art MapReduce process in depth and reaches the following conclusion: when a MapReduce process is executed on data, every MapReduce job must read from and write to the disk. Therefore, if the number of MapReduce jobs in the MapReduce process can be reduced without affecting the final execution result, the number of disk read and write operations performed during the process is reduced as well, which ultimately improves the overall performance of the system.
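The reasoning above can be illustrated with a small cost model. This is only a sketch: the function name `disk_operations` and the assumption that each job performs exactly one disk read and one disk write are illustrative and are not taken from the patent text.

```python
def disk_operations(num_jobs, reads_per_job=1, writes_per_job=1):
    """Total disk operations for a MapReduce process.

    Assumption (not from the source): every job performs a fixed
    number of disk reads and writes, by default one of each.
    """
    return num_jobs * (reads_per_job + writes_per_job)


# A process with 5 jobs versus the same process after merging down to 3 jobs.
before = disk_operations(5)
after = disk_operations(3)
saved = before - after
print(f"before={before}, after={after}, saved={saved}")
```

Under this model the saving grows linearly with the number of jobs that are merged away, which is why the job count is the quantity worth reducing.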

[0064] According to the basic idea above, the present invention proposes a data processing scheme based on MapReduce. In this technical solution, the second MapReduce process is obtained by...


Abstract

The invention discloses a data processing method and a data processing device based on MapReduce. The method and the device solve the problem that, in the prior art, the MapReduce process frequently performs disk access operations when processing data, which causes additional overhead and seriously affects system performance. The method comprises the following steps: determining a first MapReduce process to be executed on the data to be processed, wherein the first MapReduce process includes a plurality of MapReduce jobs; merging the MapReduce jobs in the first MapReduce process that meet a preset merging rule to obtain a second MapReduce process, wherein the merging rule requires that the execution result of the MapReduce jobs before merging is identical to the execution result of the MapReduce job after merging; and executing the second MapReduce process on the data to be processed.
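The three steps in the abstract can be sketched as follows. The abstract does not state the actual merging rule, so this sketch assumes a hypothetical one: a map-only job (a job with no reduce step) is fused into the job that follows it by composing their map functions, which leaves the result unchanged. The names `Job`, `merge_jobs`, and `run`, and the one-read-one-write disk counter, are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    map_fn: Callable                      # applied to every record
    reduce_fn: Optional[Callable] = None  # applied to the mapped list; None = map-only


def compose(f, g):
    # Apply f first, then g, to a single record.
    return lambda record: g(f(record))


def merge_jobs(jobs):
    """Hypothetical merging rule: fuse each map-only job into the next job."""
    merged = []
    pending = None  # map function carried over from preceding map-only jobs
    for job in jobs:
        map_fn = job.map_fn if pending is None else compose(pending, job.map_fn)
        if job.reduce_fn is None:
            pending = map_fn              # keep fusing into the next job
        else:
            merged.append(Job(map_fn, job.reduce_fn))
            pending = None
    if pending is not None:
        merged.append(Job(pending))       # trailing map-only jobs become one job
    return merged


def run(jobs, data):
    """Execute jobs in order, counting one disk read and one write per job."""
    disk_ops = 0
    for job in jobs:
        disk_ops += 1                     # job reads its input
        data = [job.map_fn(record) for record in data]
        if job.reduce_fn is not None:
            data = job.reduce_fn(data)
        disk_ops += 1                     # job writes its output
    return data, disk_ops


# Step 1: the first MapReduce process, consisting of three jobs.
first_process = [
    Job(lambda x: x + 1),
    Job(lambda x: x * 2),
    Job(lambda x: x, lambda xs: [sum(xs)]),
]
# Step 2: merge jobs that satisfy the (assumed) rule.
second_process = merge_jobs(first_process)
# Step 3: execute the second process on the same data.
result_first, ops_first = run(first_process, [1, 2, 3])
result_second, ops_second = run(second_process, [1, 2, 3])
print(result_first, ops_first, result_second, ops_second)
```

Fusing per-record map functions is one of the simplest rules that satisfies the abstract's requirement of identical results before and after merging; the rule used in the patent itself may differ.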

Description

Technical field

[0001] The present invention relates to the technical field of cloud computing, in particular to a data processing method and device based on MapReduce.

Background

[0002] Hadoop is a basic distributed-system framework capable of processing large amounts of data; it is reliable, efficient, and scalable. It mainly consists of the Hadoop Distributed File System (HDFS) and MapReduce. MapReduce is a distributed computing framework used mainly for parallel computing over large-scale data sets. It is divided into a Map stage (the Map operation process) and a Reduce stage (the Reduce operation process), whose processing logic corresponds to the Map function and the Reduce function respectively. The general idea of its parallel computing is: divide the file into many small files and process them on the individual nodes (that is, the Map operation process), and the running results are temporarily saved locall...
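The Map stage and Reduce stage described in the background can be sketched in memory with a word count. This is a minimal single-process illustration, not Hadoop code; the function names `map_phase`, `reduce_phase`, and `run_mapreduce` are hypothetical, and no disk or cluster is involved.

```python
from collections import defaultdict


def map_phase(split):
    # Map function: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in split.split()]


def reduce_phase(key, values):
    # Reduce function: combine all values that share a key.
    return key, sum(values)


def run_mapreduce(splits):
    # Map stage: each split is processed independently
    # (on a real cluster this happens in parallel on different nodes).
    intermediate = [pair for split in splits for pair in map_phase(split)]
    # Shuffle: group intermediate values by key before the Reduce stage.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce stage: one call per distinct key.
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = run_mapreduce(["a b a", "b c"])
print(counts)
```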


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 邓超, 熊龙, 徐萌, 钱岭, 孙少陵
Owner: CHINA MOBILE COMM GRP CO LTD