
Reduce task execution method and device in Spark framework, equipment and storage medium

A task execution technique for the Spark framework, applied in the computer field. It addresses limited Spark system performance, low task execution efficiency, and suboptimal communication efficiency, with the aim of reducing communication delay and improving task execution performance.

Pending Publication Date: 2022-03-11
NANHUA UNIV

AI Technical Summary

Problems solved by technology

In the Reduce phase, each task fetches its portion of the intermediate data from all Map tasks for processing, which is an all-to-all communication pattern. The data transmission delay largely determines the running time of the job and greatly limits the performance of the Spark system.
The node on which each Executor is started has a large influence on the data transmission delay, and existing methods cannot achieve optimal communication efficiency, which lowers task execution efficiency.
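To illustrate why Executor placement matters in an all-to-all pattern, the following minimal sketch counts how many shuffle fetches cross node boundaries for a given placement. The function name `shuffle_fetches` and the node labels are hypothetical and chosen only for illustration; this is a simplified model, not Spark's actual shuffle code.

```python
def shuffle_fetches(map_nodes, reduce_nodes):
    """Count shuffle fetches in an all-to-all pattern.

    map_nodes:    node label for each Map task (hypothetical input)
    reduce_nodes: node label for each Reduce task (hypothetical input)
    Returns (total_fetches, remote_fetches).
    """
    # Every Reduce task fetches one partition from every Map task.
    total = len(map_nodes) * len(reduce_nodes)
    # A fetch is remote when the Map output lives on a different node.
    remote = sum(1 for m in map_nodes for r in reduce_nodes if m != r)
    return total, remote


# Three Map tasks and two Reduce tasks spread over two nodes.
total, remote = shuffle_fetches(["n1", "n1", "n2"], ["n1", "n2"])
print(total, remote)
```

Under this model, moving Reduce tasks onto the nodes that hold most of the Map output lowers the remote count, which is the effect the placement strategy targets.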




Embodiment Construction

[0044] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0045] In the prior art, Spark provides two Executor allocation methods: SpreadOut and NoSpreadOut. The former distributes Executors across as many nodes as possible, and the latter packs Executors onto as few nodes as possible. Neither considers the impact of Executor location on data communication between tasks, so either may cause a large amount of cross-node or cross-rack network traffic, or consume too much time due to long transmission distance, th...
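The difference between the two allocation methods can be sketched with a simplified model. The function `allocate` and its slot-count input are hypothetical; this is not Spark's scheduler code, only an illustration of spreading versus packing. In Spark's standalone mode the choice between the two is governed by the `spark.deploy.spreadOut` setting.

```python
def allocate(free_slots, num_executors, spread_out):
    """Assign Executors to nodes under a simplified slot model.

    free_slots:    dict mapping node name -> free Executor slots (assumed input)
    num_executors: number of Executors requested
    spread_out:    True for SpreadOut-style, False for NoSpreadOut-style
    Returns a dict mapping node name -> Executors assigned.
    """
    remaining = dict(free_slots)
    assigned = {node: 0 for node in free_slots}
    left = num_executors
    if spread_out:
        # Round-robin: place one Executor per node per pass.
        progressed = True
        while left > 0 and progressed:
            progressed = False
            for node in remaining:
                if left > 0 and remaining[node] > 0:
                    remaining[node] -= 1
                    assigned[node] += 1
                    left -= 1
                    progressed = True
    else:
        # Packing: fill each node completely before moving to the next.
        for node in remaining:
            take = min(remaining[node], left)
            assigned[node] += take
            left -= take
    return assigned


slots = {"a": 2, "b": 2, "c": 2}
print(allocate(slots, 3, spread_out=True))
print(allocate(slots, 3, spread_out=False))
```

Neither branch looks at the distance between nodes, which is the gap the paragraph above points out.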


Abstract

The invention discloses a method, device, equipment and storage medium for executing a Reduce task in a Spark framework. The method comprises: obtaining a first number, namely the number of Executors required to execute a Reduce task of an application program, and determining a second number, namely the number of available Executors on the Spark framework nodes; selecting the first number of Executors from the second number of available Executors on the basis of the communication distance between the available Executors, so as to obtain a target Executor set that corresponds to the application program and has low communication delay, wherein the communication distance represents the communication delay between the nodes on which the available Executors are located; and starting the Executors in the target Executor set on their corresponding Spark framework nodes, so that the Reduce task of the application program is executed by the started Executors. Because communication distance is the main factor in the selection, Executors are started on nodes that are close to one another in the Spark framework, which reduces communication delay between Reduce tasks while the job runs and improves task execution performance.
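The selection step can be sketched as follows. The abstract does not state which selection algorithm is used, so the greedy heuristic below is an assumption for illustration only, as are the names `select_executors`, `node_of`, and `dist` and the example distance values. It picks the required number of Executors so that the sum of pairwise communication distances is small.

```python
from itertools import combinations


def total_distance(selected, node_of, dist):
    """Sum of pairwise node distances over the selected Executors."""
    return sum(dist[node_of[a]][node_of[b]] for a, b in combinations(selected, 2))


def select_executors(node_of, dist, k):
    """Greedy heuristic (an assumption, not the patented procedure).

    node_of: dict mapping Executor id -> node name
    dist:    nested dict, dist[u][v] = communication distance between nodes
    k:       number of Executors required (the "first number")
    Returns (sorted list of chosen Executor ids, total pairwise distance).
    """
    executors = sorted(node_of)
    if k < 1 or k > len(executors):
        raise ValueError("k must be between 1 and the number of available Executors")
    best = None
    for seed in executors:
        chosen = [seed]
        candidates = [e for e in executors if e != seed]
        while len(chosen) < k:
            # Add the candidate closest to the Executors chosen so far.
            nxt = min(
                candidates,
                key=lambda e: sum(dist[node_of[e]][node_of[c]] for c in chosen),
            )
            chosen.append(nxt)
            candidates.remove(nxt)
        cost = total_distance(chosen, node_of, dist)
        if best is None or cost < best[0]:
            best = (cost, sorted(chosen))
    return best[1], best[0]


# Hypothetical cluster: n1 and n2 share a rack, n3 is in another rack.
# Assumed distances: same node 0, same rack 2, cross rack 4.
dist = {
    "n1": {"n1": 0, "n2": 2, "n3": 4},
    "n2": {"n1": 2, "n2": 0, "n3": 4},
    "n3": {"n1": 4, "n2": 4, "n3": 0},
}
node_of = {"e1": "n1", "e2": "n1", "e3": "n2", "e4": "n3", "e5": "n3"}
print(select_executors(node_of, dist, 3))
```

Trying every Executor as a seed keeps the sketch simple; a greedy choice like this is not guaranteed to find the set with the smallest total distance.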

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a method, device, equipment and storage medium for executing a Reduce task in a Spark framework.

Background

[0002] In Spark, an application usually contains one or more jobs, and each job consists of many stages. In Spark's DAG (Directed Acyclic Graph) execution model, stages are executed sequentially: the tasks in a later stage can start only after the tasks in the previous stage have completed, and the intermediate output of the previous stage serves as the input of the next stage. Because multiple tasks within a stage execute in parallel, a large amount of data communication is required while the job runs. In particular, as shown in Figure 1, in the Map phase each task reads a data block, processes it, and writes the intermediate data to the local disk. In the Reduce phase, the task obtains its part of the inter...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F9/50
CPC: G06F9/5027, G06F9/5072, G06F2209/502
Inventor: 付仲明, 何梦思, 罗凌云, 丁平尖, 朱涛, 万亚平
Owner: NANHUA UNIV