Spark partition load balancing method

A load balancing and partitioning technique applied in the field of big data. It addresses the problem of long application running times and achieves shorter completion time, alleviated data skew, and balanced partition load.

Status: Inactive; Publication Date: 2020-11-20
GUANGDONG POLYTECHNIC NORMAL UNIV +1

AI Technical Summary

Problems solved by technology

[0003] The embodiment of the present invention provides a load balancing method and device based on linear regression partition prediction. It addresses the technical problem that existing solutions to data skew in Spark cause application programs to run for too long. The aim is to make Spark's partition load more balanced, alleviate data skew, and shorten application completion time.

Examples

Detailed Description of Embodiments

[0028] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0029] Referring to Figure 1, an embodiment of the present invention provides a load balancing method, including:

[0030] S1. After the Map task starts, collect and aggregate operation information through the partition monitor to obtain operation statistics;

[0031] S2. After obtaining the operation statistics, use the partition size predictor to calculate the amount of intermediate data generated by each partition after 100% of the mapping task is co...
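
The prediction in step S2 can be illustrated with a minimal sketch. It assumes the partition monitor reports, for each partition, pairs of (fraction of map work completed, intermediate data size so far), and that a straight line fitted by ordinary least squares is extrapolated to 100% completion. The function names, the sample format, and the sample values are hypothetical; the source only states that linear regression is used for partition prediction.

```python
from typing import Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching (x, y) samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_final_size(progress: Sequence[float], sizes: Sequence[float]) -> float:
    """Extrapolate a partition's intermediate data size to 100% map progress."""
    slope, intercept = fit_line(progress, sizes)
    return slope * 1.0 + intercept


# Hypothetical monitor samples per partition id:
# (fractions of map work done, intermediate data size observed at each point).
samples: Dict[int, Tuple[Sequence[float], Sequence[float]]] = {
    0: ([0.1, 0.2, 0.3], [10.0, 20.0, 30.0]),
    1: ([0.1, 0.2, 0.3], [50.0, 100.0, 150.0]),
}
predicted = {pid: predict_final_size(p, s) for pid, (p, s) in samples.items()}
```

With these samples, partition 1 is predicted to end up roughly five times larger than partition 0, which is the kind of signal a later skew check could act on before the map stage finishes.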

Abstract

The invention discloses a load balancing method comprising the following steps: after a Map task is started, collecting and aggregating operation information through a partition monitor to obtain operation statistics; after the operation statistics are obtained, calculating, through the partition size predictor, the intermediate data volume that each partition will have generated once 100% of the map task load is completed; according to the intermediate data volume of the partitions, determining through a data skew detection model whether skewed partitions exist among the partitions; and if so, sorting the data in the skewed partitions in descending order through a resource scheduler and dynamically adjusting the original partition file to balance the load of the Spark partitions. The method makes Spark's partition load more balanced, alleviates data skew, and shortens the completion time of the application program.
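
The detection and adjustment steps can be sketched as follows. The abstract does not specify the skew detection model or the adjustment rule, so both are assumptions here: a partition is treated as skewed when its size exceeds `factor` times the mean, and the key clusters of skewed partitions are sorted in descending order and greedily reassigned to the currently lightest partition. All names and the example data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Cluster = Tuple[str, float]  # (key, data size of that key's records)


def find_skewed(sizes: Dict[int, float], factor: float = 1.5) -> List[int]:
    """Assumed rule: a partition is skewed if its size exceeds factor * mean."""
    mean = sum(sizes.values()) / len(sizes)
    return sorted(pid for pid, size in sizes.items() if size > factor * mean)


def rebalance(clusters: Dict[int, List[Cluster]],
              factor: float = 1.5) -> Dict[int, List[Cluster]]:
    """Redistribute the clusters of skewed partitions, largest first."""
    sizes = {pid: sum(s for _, s in items) for pid, items in clusters.items()}
    skewed = set(find_skewed(sizes, factor))
    # Non-skewed partitions keep their clusters; skewed ones are emptied.
    result = {pid: ([] if pid in skewed else list(items))
              for pid, items in clusters.items()}
    pool = [c for pid in sorted(skewed) for c in clusters[pid]]
    pool.sort(key=lambda c: c[1], reverse=True)  # descending by size
    # Min-heap of (current load, partition id) picks the lightest partition.
    heap = [(sum(s for _, s in items), pid) for pid, items in result.items()]
    heapq.heapify(heap)
    for key, size in pool:
        load, pid = heapq.heappop(heap)
        result[pid].append((key, size))
        heapq.heappush(heap, (load + size, pid))
    return result


# Hypothetical predicted cluster sizes per partition.
clusters = {
    0: [("a", 60.0), ("b", 30.0), ("c", 10.0)],
    1: [("d", 10.0)],
    2: [("e", 10.0)],
}
balanced = rebalance(clusters)
```

In this example the largest partition shrinks from 100 to 60 units while the total stays at 120. A single key cluster is never split in this sketch, so one very large key still bounds how even the result can be.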

Description

Technical field

[0001] The invention relates to the technical field of big data, and in particular to a Spark partition load balancing method.

Background

[0002] With the advent of the big data era, the rise of various network technologies, and the rapid expansion of information data, traditional processing and storage systems struggle to cope with massive data. For currently popular big data analysis platforms such as Hadoop and Spark, data skew severely affects performance. At present, most solutions to data skew are based on research on the Hadoop platform, and relatively few studies address data skew on the Spark platform. In Spark, the stage before the Shuffle is called the Map stage, and the stage after it is called the Reduce stage. When the data distribution is uneven, however, the default Spark partitioning algorithm causes data skew after the Shuffle operation. ...
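
The effect described in the background can be reproduced with a small simulation. Spark's default hash partitioner assigns a key to a partition by the non-negative modulo of the key's hash code; the sketch below mimics that rule in plain Python with integer keys (whose Python hash is the integer itself). The record set is hypothetical and chosen so that one key dominates.

```python
from collections import Counter
from typing import Dict, Iterable


def hash_partition(key: int, num_partitions: int) -> int:
    """Non-negative modulo of the key's hash, in the style of a hash partitioner."""
    return hash(key) % num_partitions


def partition_loads(keys: Iterable[int], num_partitions: int) -> Dict[int, int]:
    """Count how many records land in each partition."""
    counts = Counter(hash_partition(k, num_partitions) for k in keys)
    return {p: counts.get(p, 0) for p in range(num_partitions)}


# Hypothetical uneven input: key 7 appears 90 times, keys 0..9 once each.
records = [7] * 90 + list(range(10))
loads = partition_loads(records, 4)
print(loads)  # {0: 3, 1: 3, 2: 2, 3: 92}
```

Every record with the hot key maps to the same partition, so one reduce task receives 92 of the 100 records while the others receive two or three each.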

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/505, G06F9/5077, G06F9/5083
Inventors: 谢桂园, 黄子纯, 廖信海, 魏文国
Owner: GUANGDONG POLYTECHNIC NORMAL UNIV