
MapReduce report task execution method based on task granularity

A task execution technology, applied in the fields of resource allocation and multiprogramming arrangements, that addresses performance degradation, wasted Hadoop computing performance, and the lack of multi-task optimization, and that improves computing efficiency.

Active Publication Date: 2014-04-02
SHENZHEN INST OF ADVANCED TECH

AI Technical Summary

Problems solved by technology

However, in actual use, the execution of report calculation tasks under the MapReduce framework is constrained by the processing capacity of the Hadoop cluster and by the data transmission speed between nodes in the cluster. When multiple tasks execute at the same time, competition between MapReduce tasks is inevitable.
[0003] Report calculation tasks for the same data set often have the following characteristics: (1) Because the calculations are based on the same data set, the same data may be read by multiple MapReduce tasks in exactly the same way. The read/write performance of the Hadoop Distributed File System is one of the key factors affecting MapReduce computing performance, so multiple report calculation tasks that repeatedly read the same data set degrade performance. (2) Hadoop's MapReduce task execution mechanism lacks a corresponding strategy: a native Hadoop cluster does not optimize the execution of multiple MapReduce tasks, so identical or reusable report calculation tasks are still executed multiple times. (3) Report calculation tasks based on the same data set are usually based on the same calculation conditions, so some calculation subtasks of multiple report calculation tasks can be merged.
[0004] The common existing approach performs no optimization, which wastes Hadoop computing performance. Another approach is to use tools such as Pig or Hive to optimize the report calculation process. However, both Pig and Hive optimize a single task and cannot optimize a task queue as a whole, and their optimization depends on how the script statements are written, which places higher demands on script writers.
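Characteristic (1) can be made concrete with a small count. The sketch below is only an illustration, not part of the patent: the job names and dataset paths are hypothetical, and it assumes each job reads exactly one dataset. It counts how many dataset reads would be avoidable if jobs that read the same data shared a single scan.

```python
from collections import Counter


def redundant_reads(job_inputs):
    """Return how many dataset reads could be avoided by sharing scans.

    job_inputs maps a (hypothetical) job name to the dataset path it reads.
    Every read of a dataset beyond the first is counted as redundant.
    """
    reads = Counter(job_inputs.values())
    return sum(count - 1 for count in reads.values())


# Hypothetical report jobs; three of them scan the same log partition.
example_jobs = {
    "daily_pv": "/logs/2014-04-01",
    "daily_uv": "/logs/2014-04-01",
    "top_pages": "/logs/2014-04-01",
    "revenue": "/orders/2014-04-01",
}
```

With these inputs, four independent jobs issue four reads over two distinct datasets, so two of the reads repeat work.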



Embodiment Construction

[0011] The present invention will be described in further detail below in conjunction with specific embodiments and accompanying drawings. Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are only used to explain the technical solution of the present invention, and should not be construed as limiting the present invention.

[0012] In the description of the present invention, the orientations or positional relationships indicated by terms such as "inner", "outer", "longitudinal", "transverse", "upper", "lower", "top", and "bottom" are based on the orientations or positional relationships shown in the drawings. They are used only for convenience in describing the invention and do not require ...



Abstract

The invention relates to a MapReduce report task execution method based on task granularity. The method comprises the following steps: S10, verifying the validity and the priority of each data report task Job_i and placing it into a job queue; S20, partitioning each Job_i in the queue, in sequence, into sub-tasks by minimum-granularity segmentation to obtain Set 1, which contains all sub-tasks; S30, removing the repeated sub-tasks from Set 1 to obtain Set 2; S40, performing maximum-granularity merging on the sub-tasks in Set 2 to obtain Set 3; S50, creating operation units according to the instantaneous computing capacity of Hadoop and the number of sub-tasks in Set 3; and S60, executing the sub-tasks in Set 3 with the operation units. The method adaptively finds the reusable sub-tasks in the report calculation task queue, segments and merges them, and thereby improves computing efficiency.
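Steps S10 to S50 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not define what a sub-task looks like, so the sketch assumes a sub-task is one metric computed over one dataset under one filter condition, that "maximum-granularity merging" means grouping sub-tasks that share a dataset and condition, and that the instantaneous computing capacity is given as a number of free slots. The names `SubTask`, `free_slots`, and all function names are hypothetical.

```python
from collections import OrderedDict, namedtuple

# Hypothetical sub-task shape: one metric over one dataset under one filter.
SubTask = namedtuple("SubTask", ["dataset", "condition", "metric"])


def validate_and_queue(jobs):
    """S10: drop invalid jobs, then order the rest by priority (lower first)."""
    valid = [j for j in jobs if j.get("dataset") and j.get("metrics")]
    return sorted(valid, key=lambda j: j.get("priority", 0))


def split_min_granularity(queue):
    """S20: split every queued job into its smallest sub-tasks (Set 1)."""
    return [
        SubTask(job["dataset"], job.get("condition", ""), metric)
        for job in queue
        for metric in job["metrics"]
    ]


def remove_duplicates(set1):
    """S30: drop repeated sub-tasks, keeping first-seen order (Set 2)."""
    return list(OrderedDict.fromkeys(set1))


def merge_max_granularity(set2):
    """S40: merge sub-tasks that share a dataset and condition (Set 3)."""
    merged = OrderedDict()
    for st in set2:
        merged.setdefault((st.dataset, st.condition), []).append(st.metric)
    return [(ds, cond, metrics) for (ds, cond), metrics in merged.items()]


def create_units(set3, free_slots):
    """S50: create operation units, capped by the assumed free capacity,
    and assign the merged tasks to them round-robin."""
    n_units = max(1, min(free_slots, len(set3)))
    units = [[] for _ in range(n_units)]
    for i, task in enumerate(set3):
        units[i % n_units].append(task)
    return units


# Hypothetical report jobs; the last one is invalid and is dropped in S10.
jobs = [
    {"dataset": "logs", "condition": "day=2014-04-01",
     "metrics": ["pv", "uv"], "priority": 1},
    {"dataset": "logs", "condition": "day=2014-04-01",
     "metrics": ["uv", "bounce"], "priority": 2},
    {"dataset": "orders", "metrics": ["revenue"], "priority": 0},
    {"dataset": "", "metrics": []},
]

queue = validate_and_queue(jobs)
set1 = split_min_granularity(queue)
set2 = remove_duplicates(set1)
set3 = merge_max_granularity(set2)
units = create_units(set3, free_slots=4)
```

In this example the duplicate `uv` sub-task is removed in S30, and the three remaining `logs` metrics collapse into one merged task in S40, so a single scan of that dataset can serve both report jobs. Step S60 would hand each unit's tasks to the cluster and is left out because the abstract does not describe how execution is dispatched.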

Description

【Technical field】
[0001] The invention relates to a method for executing MapReduce report tasks based on task granularity.
【Background art】
[0002] In the era of big data, the volume of data has exploded, which places extremely high demands on computing, processing, and effectively storing data. The Hadoop ecosystem provides a powerful tool for large-scale computing over, and distributed reliable storage of, massive data. In Hadoop, MapReduce is a reliable, easy-to-use, and scalable key component for batch analysis and calculation of massive data, and it is especially widely used in report calculation based on massive log data. However, in actual use, the execution of report calculation tasks under the MapReduce framework is constrained by the processing capacity of the Hadoop cluster and by the data transmission speed between nodes in the cluster. Especially when multiple tasks execute at the same time, competition between MapReduce tasks inevitably arises....


Application Information

IPC(8): G06F9/50
Inventor: 邹瑜斌, 张帆, 白雪, 闫茜, 须成忠
Owner SHENZHEN INST OF ADVANCED TECH