Executor allocation method and device based on Spark framework, equipment and storage medium

An Executor allocation method for the Spark framework, applied in the computer field, that addresses problems such as prolonged task running time, cluster network congestion, and degraded system performance, and that improves data locality while reducing network traffic and data access delay.

Pending Publication Date: 2022-03-11
NANHUA UNIV

AI Technical Summary

Problems solved by technology

The large volume of data transmitted by a Spark application while executing its computation logic prolongs task running time, congests the cluster network, and degrades system performance.

Method used

Executors are allocated to the idle nodes close to the input data blocks: the communication cost for the Map tasks to obtain their data blocks is determined for each idle node, the idle nodes are sorted by that cost, and Executors are assigned to the sorted nodes in order until the required number of Executors is reached.

Examples

Embodiment Construction

[0036] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0037] In the prior art, Spark provides two Executor allocation algorithms, spreadOut and noSpreadOut, to determine on which nodes Executors start. However, unlike in the Hadoop framework, tasks in Spark run in parallel as threads within an Executor. Because the Executor is the execution container for tasks, its location directly affects the data locality that the tasks can achieve. Neither spreadOut nor noSpreadOut fully considers the data locality facto...
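The difference between the two prior-art strategies can be illustrated with a simplified sketch. It is not Spark's actual scheduler code: the function name `allocate`, the per-node slot counts, and the one-Executor-per-step round robin are assumptions made for illustration only. What it shows is that both strategies look only at free capacity and never at where the input data blocks are stored.

```python
def allocate(free_slots, needed, spread_out=True):
    """Simplified illustration of spreadOut vs. noSpreadOut placement.

    free_slots: dict mapping node name -> number of Executors it can still host
                (hypothetical input; real Spark reasons about cores and memory).
    needed:     number of Executors the application asks for.
    Returns a dict mapping node name -> number of Executors placed there.
    """
    remaining = dict(free_slots)
    assigned = {node: 0 for node in free_slots}
    left = needed

    if spread_out:
        # spreadOut: hand out one Executor per node in round-robin order,
        # so Executors end up on as many nodes as possible.
        while left > 0 and any(remaining.values()):
            for node in remaining:
                if left == 0:
                    break
                if remaining[node] > 0:
                    remaining[node] -= 1
                    assigned[node] += 1
                    left -= 1
    else:
        # noSpreadOut: fill one node completely before moving to the next,
        # so Executors end up on as few nodes as possible.
        for node in remaining:
            take = min(remaining[node], left)
            assigned[node] += take
            left -= take
            if left == 0:
                break
    return assigned


slots = {"a": 2, "b": 2, "c": 2}
print(allocate(slots, 3, spread_out=True))   # {'a': 1, 'b': 1, 'c': 1}
print(allocate(slots, 3, spread_out=False))  # {'a': 2, 'b': 1, 'c': 0}
```

In both branches the node order is arbitrary with respect to the data, which is the gap the paragraph above points out.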

Abstract

The invention discloses an Executor allocation method and device based on the Spark framework, as well as equipment and a storage medium. The method comprises: determining, for each first idle node in the Spark framework, the communication cost for all Map tasks in the Map stage to obtain their corresponding data blocks, so as to obtain the first communication cost corresponding to each first idle node; sorting all first idle nodes by first communication cost, and allocating first Executors on the sorted first idle nodes in sequence, each node receiving up to its maximum available number of Executors; and, when the total number of allocated first Executors reaches the first required number of Executors, stopping the allocation to obtain a first Executor set, which contains the currently allocated first Executors and is used to execute the Map tasks in the Map stage. Because Executors are allocated to nodes close to the input data blocks, data locality in Spark task scheduling is improved, and the network traffic and data access delay of the tasks are effectively reduced.
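The allocation steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `allocate_by_cost` and both input dictionaries are hypothetical, the per-node first communication cost is taken as already computed (the abstract does not say how), nodes are assumed to be visited in ascending order of cost, and the last node visited is assumed to receive only as many Executors as are still required.

```python
def allocate_by_cost(comm_cost, max_executors, required):
    """Greedy, locality-aware Executor placement (illustrative sketch).

    comm_cost:     dict node -> first communication cost, i.e. the cost for all
                   Map tasks to obtain their data blocks from that node
                   (assumed to be precomputed).
    max_executors: dict node -> maximum number of Executors the idle node can host.
    required:      first required number of Executors.
    Returns a dict node -> number of Executors allocated on that node.
    """
    allocation = {}
    remaining = required

    # Assumption: lower cost means closer to the input data, so sort ascending.
    for node in sorted(comm_cost, key=comm_cost.get):
        if remaining == 0:
            break  # required number reached: stop allocating
        # Assumption: cap the last node at what is still required.
        count = min(max_executors[node], remaining)
        if count > 0:
            allocation[node] = count
            remaining -= count
    return allocation


cost = {"n1": 12, "n2": 4, "n3": 7}
capacity = {"n1": 2, "n2": 2, "n3": 3}
print(allocate_by_cost(cost, capacity, required=4))  # {'n2': 2, 'n3': 2}
```

In this example the cheapest node `n2` is filled first, `n3` supplies the remaining two Executors, and the most expensive node `n1` receives none.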

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to an Executor allocation method, device, equipment and storage medium based on the Spark framework.

Background technique

[0002] As applications in the era of big data demand ever faster responses, the emerging Spark distributed computing framework has attracted great attention and, owing to its excellent characteristics, has been widely adopted, for example by Google, Yahoo!, Baidu, and Tencent. Compared with Hadoop and other distributed computing frameworks, Spark introduces the concept of the Resilient Distributed Dataset (RDD), which allows jobs, especially iterative computations, to be executed efficiently in memory. The large amount of data transmission generated by a Spark application while executing its computation logic prolongs task running time and causes cluster network congestion, which affects the performance of the system...

Application Information

IPC(8): G06F9/50
CPC: G06F9/5027, G06F9/5072, G06F2209/502
Inventor: 付仲明, 何梦思, 罗凌云, 丁平尖, 朱涛, 万亚平
Owner NANHUA UNIV