A distribution-aware skew optimization method and system for binary equi-joins

An optimization method applied in database distribution/replication, resource allocation, and structured data retrieval, addressing problems such as high overhead and load skew on the Reduce side.

Active Publication Date: 2019-11-22
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] In view of the above deficiencies of the prior art and the need to improve on it, the present invention provides a distribution-aware skew optimization method and system for binary equi-joins. It aims to solve two technical problems of existing binary join methods in distributed systems: the Reduce side is prone to load skew, and obtaining the total size of the original data sets incurs high overhead.

Embodiment Construction

[0057] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0058] By studying the problem of data skew in equi-joins of large tables, the invention proposes a distribution-aware skew optimization method for binary equi-joins of large tables.

[0059] The core idea of the present invention is that, according to the I/O cost, the skewed keys of the first table in the binary equi-join are distributed to some partitions...
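The paragraph above is cut off in the source, so the exact distribution rule is not available here. As an illustration only, the sketch below shows one common way to spread a skewed key over several partitions: rows of the main table with that key are dealt round-robin to a group of partitions, and the matching rows of the auxiliary table are replicated to every partition in the group so the join stays correct. The function names, the fixed `fanout` parameter (standing in for a choice that the source says depends on I/O cost), and the toy data are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from itertools import cycle


def target_partitions(key, fanout, num_partitions):
    """Partitions that share one skewed key (assumes fanout <= num_partitions)."""
    base = hash(key) % num_partitions
    return [(base + i) % num_partitions for i in range(fanout)]


def partition_tables(main, aux, skewed_keys, fanout, num_partitions):
    """Spread skewed main-table rows; replicate the matching auxiliary rows."""
    main_parts, aux_parts = defaultdict(list), defaultdict(list)
    rotation = {k: cycle(target_partitions(k, fanout, num_partitions))
                for k in skewed_keys}
    for key, value in main:
        if key in skewed_keys:
            part = next(rotation[key])         # round-robin over the group
        else:
            part = hash(key) % num_partitions  # ordinary hash partitioning
        main_parts[part].append((key, value))
    for key, value in aux:
        if key in skewed_keys:
            parts = target_partitions(key, fanout, num_partitions)
        else:
            parts = [hash(key) % num_partitions]
        for part in parts:                     # one copy per target partition
            aux_parts[part].append((key, value))
    return main_parts, aux_parts


def join_in_partitions(main_parts, aux_parts):
    """Equi-join performed independently inside each partition."""
    joined = []
    for part, rows in main_parts.items():
        lookup = defaultdict(list)
        for key, value in aux_parts.get(part, []):
            lookup[key].append(value)
        for key, value in rows:
            joined.extend((key, value, other) for other in lookup[key])
    return joined


# Hypothetical toy data: key 7 dominates the main table.
main = [(7, f"r{i}") for i in range(8)] + [(1, "r8"), (2, "r9")]
aux = [(7, "s0"), (1, "s1"), (2, "s2")]
main_parts, aux_parts = partition_tables(main, aux, {7}, fanout=4, num_partitions=4)
joined = join_in_partitions(main_parts, aux_parts)
loads = [len(main_parts.get(p, [])) for p in range(4)]
```

With plain hash partitioning all eight rows of key 7 would land in one partition; here `loads` is `[2, 3, 3, 2]`, at the price of copying the single auxiliary row for key 7 to four partitions. That replication cost is the kind of trade-off a cost model would have to weigh.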

Abstract

The invention discloses a distribution-aware skew optimization method for binary equi-joins. The method comprises: sampling the two data sets R and S that are to be joined to obtain two sample sets; calculating the weights and sizes of the skewed keys in each sample set; comparing these weights and sizes across the two sample sets, and selecting the data set with the larger comparison result as the main table and the other as the auxiliary table; obtaining a cost estimate for each cluster in the selected main and auxiliary tables; dividing the clusters into a big-cluster set and a small-cluster set according to the cost estimates; generating an RDD partition scheme for each of the two sets; partitioning the main table and the auxiliary table with these schemes; and completing the join within each partition on the Reduce side according to the partitioning result. The method addresses the technical problem that existing binary join methods are prone to load skew on the Reduce side.
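The first steps of the abstract (weighing skewed keys in the samples, picking the main table, and splitting clusters into big and small) can be sketched roughly as follows. The abstract does not give the formulas, so every rule here is an assumption made for illustration: a key counts as skewed when it holds more than `skew_fraction` of a sample, a cluster's cost is estimated as the product of its key counts in the two samples, and a cluster is "big" when its cost exceeds an even per-partition share. All function names are hypothetical.

```python
from collections import Counter


def skew_weight(sample_keys, skew_fraction=0.2):
    """Return (skewed keys, share of the sample they occupy). Assumed rule."""
    counts = Counter(sample_keys)
    total = len(sample_keys)
    skewed = {k for k, c in counts.items() if c / total > skew_fraction}
    weight = sum(counts[k] for k in skewed) / total
    return skewed, weight


def choose_main_table(sample_r, sample_s):
    """Pick the more heavily skewed data set as the main table."""
    _, weight_r = skew_weight(sample_r)
    _, weight_s = skew_weight(sample_s)
    return ("R", "S") if weight_r >= weight_s else ("S", "R")


def split_clusters(sample_main, sample_aux, num_partitions):
    """A cluster is all tuples sharing one join key.

    Cost is assumed to be main_count * aux_count, a stand-in for the
    patent's cost estimate, which this excerpt does not specify.
    """
    main_counts, aux_counts = Counter(sample_main), Counter(sample_aux)
    costs = {k: main_counts[k] * aux_counts[k]
             for k in main_counts if k in aux_counts}
    fair_share = sum(costs.values()) / num_partitions
    big = {k for k, c in costs.items() if c > fair_share}
    small = set(costs) - big
    return costs, big, small


# Hypothetical samples of join keys drawn from R and S.
sample_r = ["a"] * 6 + ["b", "c", "d", "e"]
sample_s = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]

main, aux = choose_main_table(sample_r, sample_s)
costs, big, small = split_clusters(sample_r, sample_s, num_partitions=4)
```

In this toy input key `a` fills 60% of the R sample, so R becomes the main table, and `a` is the only cluster whose estimated cost (12) exceeds the per-partition share (5). A partition scheme would then treat `big` and `small` differently, as the abstract describes.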

Description

Technical field

[0001] The invention belongs to the technical field of parallel computing, and more specifically relates to a distribution-aware skew optimization method and system for binary equi-joins.

Background

[0002] At present, a join on the Spark platform in a distributed system first partitions the tables and then performs the join within each partition. The join is built on the platform's basic data structure, the Resilient Distributed Dataset (RDD): each data table can be converted into an RDD of <key, value> pairs. The existing partitioning methods of the Spark platform are the hash partitioning algorithm and the simple range partitioning algorithm; the corresponding binary join methods are the hash join algorithm and the range join algorithm.

[0003] However, the binary co...
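To make the background concrete, the sketch below imitates the two partitioning strategies in plain Python (it does not call Spark) and counts how many records each partition receives when one key is very frequent. The helper names, the range boundaries, and the data are hypothetical. Integer keys are used because CPython hashes small non-negative integers to themselves, which keeps the example deterministic.

```python
from bisect import bisect_left
from collections import Counter


def hash_partition(key, num_partitions):
    """Hash partitioning: the partition index is hash(key) mod partition count."""
    return hash(key) % num_partitions


def range_partition(key, upper_bounds):
    """Range partitioning: index of the first range whose upper bound is >= key."""
    return bisect_left(upper_bounds, key)


def partition_sizes(keys, assign, num_partitions):
    """Count the records that each partition receives."""
    counts = Counter(assign(k) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed input: key 7 appears 90 extra times, keys 0..9 once each.
keys = [7] * 90 + list(range(10))

hash_sizes = partition_sizes(keys, lambda k: hash_partition(k, 4), 4)
range_sizes = partition_sizes(keys, lambda k: range_partition(k, [2, 5, 8]), 4)
```

Here `hash_sizes` is `[3, 3, 2, 92]` and `range_sizes` is `[3, 3, 93, 1]`. Under both strategies every record with the same key goes to the same partition, so a single hot key overloads one partition no matter how the others are balanced.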

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/27, G06F9/50
CPC: G06F9/5083
Inventor: 周可, 杨永坤, 乔宏永
Owner: HUAZHONG UNIV OF SCI & TECH