
A task scheduling method and device for online optimized partitioning of a Spark cluster system

A task scheduling and cluster system technology, applied in the fields of multiprogramming arrangements, resource allocation, and program control design. It addresses the problems that task execution plans become complex and that the optimal number of partitions differs from stage to stage, and it achieves faster job execution.

Active Publication Date: 2019-07-12
UNIV OF ELECTRONICS SCI & TECH OF CHINA
8 cites, 0 cited by

AI Technical Summary

Problems solved by technology

As more computational stages are added, the resulting task execution plan can become very complex.
Furthermore, because each stage performs a different computation, the optimal number of partitions may differ from stage to stage, which complicates the problem further.




Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] This embodiment provides a task scheduling method for online optimized partitioning of a Spark cluster system. Its flow chart is shown in Figure 2, and it includes the following steps:

[0042] Step 1. Collect the output data size of the upstream stage, the total number of CPU cores participating in job execution, the total amount of memory, and the proportion of memory used to pull data;

[0043] Step 2. Based on the relationship between the output data size and the memory capacity used to pull data, calculate the number of rounds of task execution and set the number of data partitions;

[0044] Step 3. Monitor the average CPU utilization and memory utilization of the computing nodes, and evaluate the resource utilization level of each computing node;

[0045] Step 4. Arrange the resource utilization leve...
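
Steps 1 and 2 can be illustrated with a minimal Python sketch. The formulas here are assumptions made for illustration, since the excerpt does not show the patent's exact expressions: the capacity for pulling data is taken as total memory times the pull fraction, the number of rounds as the output size divided by that capacity (rounded up), and the partition count as rounds times total CPU cores. The function name `optimized_partitions` and its parameters are hypothetical.

```python
import math


def optimized_partitions(output_bytes, total_cores, total_memory_bytes, fetch_fraction):
    """Return (rounds, partitions) for the downstream stage.

    Assumed model (not taken from the patent text):
      pull_capacity = total_memory_bytes * fetch_fraction
      rounds        = ceil(output_bytes / pull_capacity), at least 1
      partitions    = rounds * total_cores
    """
    if total_cores <= 0 or total_memory_bytes <= 0:
        raise ValueError("total_cores and total_memory_bytes must be positive")
    if not 0 < fetch_fraction <= 1:
        raise ValueError("fetch_fraction must be in (0, 1]")

    # Memory that all executors together may use to pull upstream output.
    pull_capacity = total_memory_bytes * fetch_fraction

    # Each round lets every core process one partition that fits in memory.
    rounds = max(1, math.ceil(output_bytes / pull_capacity))
    return rounds, rounds * total_cores


GIB = 2 ** 30
# 10 GiB of upstream output, 16 cores, 8 GiB of memory, 25% usable for pulling.
print(optimized_partitions(10 * GIB, 16, 8 * GIB, 0.25))  # (5, 80)
```

Under these assumptions, the partition count is always a multiple of the core count, so each round keeps every core busy while the data pulled in one round stays within the memory budget.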


Abstract

The invention discloses a task scheduling method for online optimized partitioning of a Spark cluster system, and belongs to the technical field of online cluster resource scheduling. The method includes the following steps: collecting the output data size of the upstream stage, the total number of CPU cores participating in job execution, the total amount of memory, and the proportion of memory used to pull data; calculating the number of rounds of task execution and setting the optimized number of partitions according to the relationship between the output data size and the memory capacity used to pull data; monitoring the average CPU utilization and memory utilization of the computing nodes and evaluating the resource utilization level of each computing node; sorting the resource utilization levels of all nodes in descending order and preferentially scheduling tasks to the node with the highest resource utilization level; and repeating the above steps until all tasks are scheduled. The invention can automatically configure the optimized number of partitions, improve the resource utilization of the cluster, and speed up the execution of Spark jobs.
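
The ranking-and-dispatch loop described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the excerpt does not define how the resource utilization level is derived, so the hypothetical `utilization_level` function treats a node with more idle CPU and memory as having a higher level. The `Node` fields, the `per_task_load` parameter, and the slot accounting are also assumptions, not part of the patent text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_util: float   # average CPU utilization in [0, 1]
    mem_util: float   # memory utilization in [0, 1]
    free_slots: int   # tasks the node can still accept (assumed bookkeeping)


def utilization_level(node):
    # Hypothetical scoring: more idle capacity gives a higher level.
    return 1.0 - (node.cpu_util + node.mem_util) / 2.0


def schedule(tasks, nodes, per_task_load=0.1):
    """Assign each task to the currently highest-level node with a free slot."""
    assignment = {}
    pending = list(tasks)
    while pending:
        candidates = [n for n in nodes if n.free_slots > 0]
        if not candidates:
            break  # no capacity left; remaining tasks stay unscheduled
        # Re-rank on every iteration, highest level first.
        best = sorted(candidates, key=utilization_level, reverse=True)[0]
        task = pending.pop(0)
        assignment[task] = best.name
        best.free_slots -= 1
        # Assumed effect of a new task on the node's CPU utilization.
        best.cpu_util = min(1.0, best.cpu_util + per_task_load)
    return assignment


nodes = [Node("a", 0.2, 0.2, 2), Node("b", 0.5, 0.5, 2)]
print(schedule(["t1", "t2", "t3"], nodes))
# {'t1': 'a', 't2': 'a', 't3': 'b'}
```

Re-ranking inside the loop mirrors the "repeat the above steps until all tasks are scheduled" wording: each placement changes a node's load, so the ordering is recomputed before the next task is dispatched.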

Description

Technical field

[0001] The invention belongs to the technical field of online cluster resource scheduling, and in particular relates to a task scheduling method and device for online optimized partitioning of a Spark cluster system.

Background

[0002] Spark is an in-memory computing framework for distributed processing of large amounts of data in a reliable, efficient, and scalable manner. The main components of a Spark cluster deployment are the Spark Client, SparkContext, ClusterManager, Worker, and Executor, as shown in Figure 1. The Spark Client is used by users to submit applications to the Spark cluster. SparkContext communicates with the ClusterManager, applies for resources, assigns and monitors tasks, and is responsible for life cycle management of job execution. The ClusterManager provides resource allocation and management, and plays different roles in different operating modes. After SparkContext divides the running job and allocates ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/50, G06F9/48
CPC: G06F9/4881, G06F9/5016, G06F9/5038, G06F2209/484, G06F2209/5021
Inventor 田文洪叶宇飞王金许凌霄匡平
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA