
A MapReduce execution process optimization method for processing data source updates

A data processing and execution process technology applied in the computer field. It addresses the reduced execution efficiency of MapReduce when the data source is updated, and achieves low storage space cost, reduced execution time, and improved operating efficiency.

Active Publication Date: 2021-05-28
POWER DISPATCHING CONTROL CENT OF GUANGDONG POWER GRID CO LTD

AI Technical Summary

Problems solved by technology

In this process, the Reducer task starts only after the Map tasks have finished. A Map task may run for anywhere from a few minutes to several hours. If new data arrives in the data source while the Map tasks are executing, all Map tasks must be re-executed, that is, the whole MapReduce job must be restarted, which greatly reduces the execution efficiency of MapReduce.

Method used

A Monitor task and a Rule judgment task are introduced to monitor the Map tasks. When the data source is updated, only the Map tasks whose data source slices were updated are restarted, while the other Map tasks continue to execute.


Examples


Embodiment Construction

[0028] The present invention will be further described below in conjunction with the accompanying drawings. It should be noted that the accompanying drawings are only for illustrative purposes and should not be construed as limitations on this patent.

[0029] As shown in Figure 1, a MapReduce execution process optimization method for processing data source updates includes a Map task and a Reducer task. During execution of the Map task, a Monitor task and a Rule judgment task are started;

[0030] At every interval T_μ, the Monitor task records a snapshot of the data source slice being processed by the Map task;

[0031] The Rule task calculates the difference between the latest snapshot of the data source slice and the snapshot of the data source slice processed by the Map task, and decides whether to restart the Map task.
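The Monitor and Rule steps described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the snapshot format (byte length plus SHA-256 digest), the names SliceSnapshot, take_snapshot and should_restart, and the rule "restart on any difference" are all assumptions made for illustration, since the source does not specify how a snapshot is stored or how the difference is evaluated.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceSnapshot:
    """Hypothetical snapshot of one data source slice (assumed format)."""
    length: int   # number of bytes in the slice
    digest: str   # SHA-256 hex digest of the slice content


def take_snapshot(data: bytes) -> SliceSnapshot:
    # Monitor step: record the current state of the slice.
    return SliceSnapshot(len(data), hashlib.sha256(data).hexdigest())


def should_restart(processed: SliceSnapshot, latest: SliceSnapshot) -> bool:
    # Rule step (assumed rule): restart the Map task if the latest snapshot
    # differs in any way from the snapshot the Map task is working from.
    return processed != latest


# Snapshot taken when the Map task began reading its slice.
slice_at_start = b"a,1\nb,2\n"
processed_snapshot = take_snapshot(slice_at_start)

# One monitoring interval later, the slice is unchanged: no restart.
unchanged = should_restart(processed_snapshot, take_snapshot(slice_at_start))

# A later interval finds an appended record: the Map task should restart.
changed = should_restart(processed_snapshot, take_snapshot(slice_at_start + b"c,3\n"))

print(unchanged, changed)
```

Storing a digest rather than a full copy of the slice keeps each snapshot small; a real Rule task could apply a threshold or a finer-grained comparison instead of strict equality.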

[0032] The specific execution process of the present invention is described below with reference to Figure 2. In this embodiment, T_μ = 3 min:

[0033] S1: The Map ta...


Abstract

The present invention relates to the field of computer technology, and more specifically to a MapReduce execution process optimization method for processing data source updates. The method introduces a Monitor task and a Rule judgment task to monitor the Map tasks. When the data source is updated, the entire MapReduce job is not restarted; only the Map tasks whose data source slices were updated are restarted, while the other Map tasks continue to execute. This uses the resources of the Hadoop cluster more effectively and improves the operating efficiency of the MapReduce job. The invention thus satisfies the requirement for data updates while improving the efficiency of program execution.
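The selective restart described in the abstract can be sketched as follows. This is an illustrative assumption, not the patented scheduler: the function plan_restarts, the task and slice identifiers, and the mapping of one Map task to one slice are hypothetical names and simplifications introduced here.

```python
from __future__ import annotations


def plan_restarts(
    task_slices: dict[str, str], updated_slices: set[str]
) -> tuple[list[str], list[str]]:
    """Split Map tasks into (tasks to restart, tasks left running).

    task_slices maps a Map task id to the id of the slice it processes
    (one slice per task is an assumption of this sketch).
    """
    restart = [t for t, s in task_slices.items() if s in updated_slices]
    keep = [t for t, s in task_slices.items() if s not in updated_slices]
    return restart, keep


# Three Map tasks, each reading one slice of the data source.
tasks = {"map-0": "split-0", "map-1": "split-1", "map-2": "split-2"}

# Only split-1 received new data while the job was running.
restart, keep = plan_restarts(tasks, {"split-1"})
print(restart)  # ['map-1']
print(keep)     # ['map-0', 'map-2']
```

In this example only one of the three Map tasks repeats its work, whereas restarting the whole job would repeat all three.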

Description

Technical field

[0001] The invention relates to the field of computer technology, and more specifically to a method for optimizing the MapReduce execution process for processing data source updates.

Background

[0002] With the development of big data technology, the storage, analysis and processing of massive data on Hadoop clusters is becoming more and more widespread. Hadoop is an open-source implementation modeled on Google's distributed storage and computing designs. The two core components of Hadoop are the distributed file system HDFS (to store massive data) and the parallel computing framework MapReduce (to perform distributed parallel computing).

[0003] The MapReduce computing framework is usually used to analyze and process massive data. During MapReduce execution, the data source is first sliced into several DataSplits, and Mapper tasks are started on different nodes in the cluster to read the data source slices (DataSplits). After the Map task is executed, the ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/182, G06F9/48
CPC: G06F9/485, G06F16/182
Inventor: 郭文鑫, 曾坚永, 赵瑞锋, 姚珺玉, 张锐, 邓大为, 徐展强, 卢建刚, 李波
Owner: POWER DISPATCHING CONTROL CENT OF GUANGDONG POWER GRID CO LTD