
A balanced load processing method and device for data skew

A load balancing method in the field of data processing. It addresses long operation times and degraded system performance caused by data skew, and is intended to mitigate data skew, reduce operation time, and improve system performance.

Active Publication Date: 2020-05-08
NAT UNIV OF DEFENSE TECH
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

When data skew concentrates intermediate data on a few reducers, the task queue is blocked on these heavily loaded reducers. This increases the completion time of running jobs and degrades system performance.
Because delays in Spark Streaming can accumulate from one batch to the next, skew readily leads to further delay and congestion.
[0004] The existing technology offers no effective solution to the long operation time and poor system performance caused by data skew.
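To make the reducer bottleneck concrete, here is a minimal Python sketch. It assumes a simplified partitioner that sends each key to reducer `hash(key) mod n` (Spark's default `HashPartitioner` works along these lines). The character-sum hash and the skewed batch are illustrative assumptions, not data from the patent. Because every record with the same key lands on the same reducer, one hot key concentrates most of the work on a single reducer, and the stage cannot finish until that reducer does.

```python
def simple_hash(key: str) -> int:
    # Illustrative deterministic hash (sum of code points); Python's built-in
    # hash() for str is randomized per process, so it is avoided here.
    return sum(ord(c) for c in key)


def reducer_loads(keys: list[str], num_reducers: int) -> list[int]:
    # Count how many records each reducer receives under plain hash partitioning.
    loads = [0] * num_reducers
    for key in keys:
        loads[simple_hash(key) % num_reducers] += 1
    return loads


# Hypothetical skewed batch: one "hot" key accounts for 90% of the records.
batch = ["hot"] * 900 + [f"k{i}" for i in range(100)]
loads = reducer_loads(batch, 4)
mean_load = sum(loads) / len(loads)
print("records per reducer:", loads)
# The stage finishes only when the busiest reducer does, so max/mean
# approximates the slowdown relative to a perfectly balanced split.
print("max/mean load ratio:", round(max(loads) / mean_load, 2))
```

Adding reducers does not help here: the hot key still maps to exactly one reducer, which is why a skew-aware assignment is needed.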




Embodiment Construction

[0046] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0047] It should be noted that the terms "first" and "second" in the embodiments of the present invention are used only to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used merely for convenience of expression and should not be construed as limiting the embodiments of the present invention; this note is not repeated in the subsequent embodiments.

[0048] Based on the above purpose, the first aspect of the embodiments of the present invention proposes a first embodiment of a balanced load processing method for data skew, which can perform balanced load processing for data skew for different ...



Abstract

The invention discloses a balanced load processing method and device for data skew. The method comprises the following steps: collecting a batch of streaming data and mapping it into intermediate results; generating a duplicate of the intermediate results, analyzing the duplicate, and updating a reference table; distributing the intermediate results into a plurality of buckets according to a hash algorithm and the reference table; and extracting the intermediate results from the buckets and converting them. The invention can mitigate data skew, reduce operation time, and improve system performance.
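The four steps in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented method. The map step is a word count; "analyzing the duplicate" is taken to mean counting key frequencies; keys whose share of the batch reaches a hypothetical `heavy_fraction` threshold are treated as heavy; and the reference table pins each heavy key to the currently least-loaded bucket while all other keys fall back to hashing. The abstract does not specify how the reference table is built, so the greedy rule and all function names here (`update_reference_table`, `distribute`, and so on) are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_batch(lines: Iterable[str]) -> List[Pair]:
    # Step 1: map a batch of streaming records into intermediate (key, value) pairs.
    return [(word, 1) for line in lines for word in line.split()]


def stable_hash(key: str) -> int:
    # Illustrative deterministic hash (sum of code points).
    return sum(ord(c) for c in key)


def update_reference_table(duplicate: List[Pair], num_buckets: int,
                           heavy_fraction: float) -> Dict[str, int]:
    # Step 2: analyze a copy of the intermediate results and rebuild the
    # reference table (hypothetical rule: pin heavy keys to light buckets).
    counts = Counter(key for key, _ in duplicate)
    total = sum(counts.values())
    loads = [0] * num_buckets
    heavy = []
    for key, n in counts.items():
        if n >= heavy_fraction * total:
            heavy.append((key, n))
        else:
            # Light keys keep their hash bucket; record the load they add.
            loads[stable_hash(key) % num_buckets] += n
    table: Dict[str, int] = {}
    # Place the heaviest keys first, each into the currently least-loaded bucket.
    for key, n in sorted(heavy, key=lambda kv: kv[1], reverse=True):
        target = loads.index(min(loads))
        table[key] = target
        loads[target] += n
    return table


def distribute(pairs: List[Pair], table: Dict[str, int],
               num_buckets: int) -> List[List[Pair]]:
    # Step 3: the reference table overrides the hash for keys it contains.
    buckets: List[List[Pair]] = [[] for _ in range(num_buckets)]
    for key, value in pairs:
        buckets[table.get(key, stable_hash(key) % num_buckets)].append((key, value))
    return buckets


def reduce_buckets(buckets: List[List[Pair]]) -> Dict[str, int]:
    # Step 4: each key lives in exactly one bucket, so per-bucket sums are final.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for key, value in bucket:
            result[key] = result.get(key, 0) + value
    return result


lines = ["a a a a b c c c c d"] * 10  # hypothetical skewed batch
pairs = map_batch(lines)
table = update_reference_table(list(pairs), num_buckets=2, heavy_fraction=0.2)
plain = distribute(pairs, {}, 2)
balanced = distribute(pairs, table, 2)
print("hash only:      ", [len(b) for b in plain])     # [20, 80]
print("with ref. table:", [len(b) for b in balanced])  # [60, 40]
print(reduce_buckets(balanced))
```

With plain hashing, both heavy keys `a` and `c` collide in bucket 1. The reference table moves `c` to the lighter bucket, shrinking the largest bucket from 80 to 60 records. Pinning whole keys cannot help when a single key exceeds a bucket's fair share. Handling that case would require splitting the key across buckets and merging partial results, which this sketch does not attempt.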

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a load balancing method and device for data skew.

Background

[0002] In recent years, with the spread of the Internet, many fields such as e-commerce and social networking have had to cope with exponential growth in data. More and more enterprises and academic institutions choose Spark to handle cloud-based big data processing problems. Spark is a fast, general-purpose engine for large-scale data processing: it runs programs up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk. Owing to this performance, Spark has been widely adopted by companies such as Yahoo, eBay, Twitter, Amazon, and Alibaba. In academia, Spark had more than 1,000 contributors in 2015, making it one of the most active projects in the Apache Software Foundation and one of the most active open source big data project...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F9/50
CPC: G06F9/5083
Inventors: 朱晓敏 (Zhu Xiaomin), 陈黄科 (Chen Huangke), 刘桂鹏 (Liu Guipeng)
Owner: NAT UNIV OF DEFENSE TECH