Data processing method and device for distributed relational database

A data processing method and device for a distributed relational database, applied in the database field. It addresses the low processing efficiency of the master node, time-consuming planning, and the high cost of data redistribution, with the aim of improving data processing efficiency, reducing physical load, and maximizing parallelism.

Active Publication Date: 2017-05-03
北京华胜信泰数据技术有限公司

AI Technical Summary

Problems solved by technology

[0007] First, collecting data distribution statistics and planning the data redistribution are generally time-consuming, and both are performed on a single node. This serializes the processing and results in low processing efficiency on the master node.
[0008] Second, existing data migration algorithms generally seek to redistribute the data of every table to every node, which makes the cost of data redistribution too high.


Examples

Embodiment 1

[0046] The source data in the distributed relational database is horizontally partitioned by a distribution key and stored as the fragmented data of a first table and the fragmented data of a second table. The join fields of the first table and the second table are, respectively, a first attribute field and a second attribute field of the source data, and neither the first attribute field nor the second attribute field is the distribution key. The distributed relational database includes a master node and multiple child nodes, and the join operation between the first table and the second table is referred to as the original join operation.
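The situation described in this paragraph can be illustrated with a small sketch. It is not taken from the patent: the modulo mapping, the three-node cluster, the field names `id`, `x`, `y`, and the sample rows are all assumptions chosen for illustration. It shows that when two tables are sharded by a distribution key but joined on other fields, a join evaluated only on each node's local fragments misses matches that the original join would find.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def shard_by_distribution_key(rows, key, num_nodes=NUM_NODES):
    """Horizontally split rows across nodes (assumed mapping: key modulo node count)."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[key] % num_nodes].append(row)
    return dict(shards)


def join_pairs(left_rows, right_rows):
    """Return (left id, right id) pairs where left.x equals right.y."""
    return [(l["id"], r["id"]) for l in left_rows for r in right_rows if l["x"] == r["y"]]


# Both tables use "id" as the distribution key; the join fields are "x" and "y".
first_table = [{"id": i, "x": i % 2} for i in range(6)]
second_table = [{"id": i, "y": (i + 1) % 2} for i in range(6)]

first_shards = shard_by_distribution_key(first_table, "id")
second_shards = shard_by_distribution_key(second_table, "id")

# The original join, evaluated over all data.
global_pairs = join_pairs(first_table, second_table)

# A join evaluated node by node, using only the fragments stored on each node.
local_pairs = [
    pair
    for node in first_shards
    for pair in join_pairs(first_shards[node], second_shards.get(node, []))
]

print(len(global_pairs), len(local_pairs))
```

In this toy data set the node-local join finds only a subset of the pairs, which is why one of the tables has to be redistributed on its join field before the join can run in parallel on the child nodes.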

[0047] Figure 1 shows a schematic flowchart of Embodiment 1 of the data processing method for a distributed relational database according to the present invention.

[0048] As shown in Figure 1, the data processing method of the distributed relational database according to the embodiment of the present...

Embodiment 2

[0061] Figure 2 shows a schematic flowchart of Embodiment 2 of the data processing method for a distributed relational database according to the present invention.

[0062] As shown in Figure 2, the data processing method of the distributed relational database according to the embodiment of the present invention includes: step 202, the master node designates the two tables participating in the join as the S table and the R table, and determines which table needs to be redistributed (for example, the R table); step 204, each child node builds the histogram of the x field on the R table and distributes the elements of the histogram; step 206, after receiving the histogram elements, each child node determines a data redistribution plan and distributes it; step 208, after receiving the data redistribution plan, each child node obtains the first temporary table R1 and executes the data redistribution plan; step 210, each child node performs a semi-join operation and ...
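Steps 204 to 208 can be sketched roughly as follows. The excerpt does not say how histogram elements are routed or how a destination is chosen, so two assumptions are made here and marked in the code: the planner for a value `v` is node `v % num_nodes`, and the destination for a value is the node that already holds the most rows with that value. All function names are hypothetical, and the semi-join of step 210 is left out because its description is truncated.

```python
from collections import Counter, defaultdict


def local_histogram(rows, field):
    """Step 204, first half: count rows per value of the join field on one child node."""
    return Counter(row[field] for row in rows)


def scatter_histograms(shards, field, num_nodes):
    """Step 204, second half: route each histogram element to the node that plans its value.

    Assumption: the planner for value v is node v % num_nodes.
    """
    inbox = defaultdict(lambda: defaultdict(dict))  # planner -> value -> {source node: count}
    for node, rows in shards.items():
        for value, count in local_histogram(rows, field).items():
            inbox[value % num_nodes][value][node] = count
    return inbox


def make_plan(inbox):
    """Step 206: every planner chooses a destination node for each of its values.

    Assumption: the destination is the node holding the most rows (ties go to the lowest id).
    """
    plan = {}
    for values in inbox.values():
        for value, counts in values.items():
            plan[value] = min(counts, key=lambda node: (-counts[node], node))
    return plan


def redistribute(shards, field, plan):
    """Step 208: move every row to the destination listed for its join value."""
    result = defaultdict(list)
    for rows in shards.values():
        for row in rows:
            result[plan[row[field]]].append(row)
    return dict(result)


# Hypothetical fragments of the R table on three child nodes.
r_shards = {
    0: [{"id": 0, "x": 7}, {"id": 3, "x": 7}, {"id": 6, "x": 8}],
    1: [{"id": 1, "x": 7}, {"id": 4, "x": 8}],
    2: [{"id": 2, "x": 8}, {"id": 5, "x": 8}],
}

plan = make_plan(scatter_histograms(r_shards, "x", num_nodes=3))
r1_shards = redistribute(r_shards, "x", plan)
print(plan)
```

Because each value is planned by exactly one node, the planning work is spread over the child nodes instead of being serialized on the master node, which matches the motivation stated in paragraph [0007].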

Embodiment 3

[0082] Figure 3 shows a schematic block diagram of a data processing device for a distributed relational database according to an embodiment of the present invention.

[0083] The source data in the distributed relational database is horizontally partitioned by a distribution key and stored as the fragmented data of a first table and the fragmented data of a second table. The join fields of the first table and the second table are, respectively, a first attribute field and a second attribute field of the source data, and neither the first attribute field nor the second attribute field is the distribution key. The distributed relational database includes a master node and multiple child nodes, and the join operation between the first table and the second table is referred to as the original join operation.

[0084] As shown in Figure 3, the data processing device 300 of the distributed relational database according to the embodiment of the ...

Abstract

The invention provides a data processing method and a data processing device for a distributed relational database. The data processing method comprises: determining the first table to be the table to be redistributed when the data size of the first table is detected to be smaller than that of the second table; determining distribution information of a first attribute field on the first table, and sending the distribution information to the corresponding sub-nodes according to a preset mapping relation, so that a data redistribution plan between any two sub-nodes is made; and controlling the parallel redistribution, between any two sub-nodes, of the fragmented data to which the first attribute field belongs according to the data redistribution plan. With this technical scheme, all redistribution operations are executed in parallel, and the data processing efficiency of the distributed relational database is improved.
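The first and last steps of the abstract can be sketched as below. This is only an illustration under stated assumptions: the table sizes, the `(source, destination, fragment)` tuple format for planned moves, and the `send` callback are hypothetical, and a thread pool stands in for whatever mechanism the child nodes actually use to exchange data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def choose_table_to_redistribute(table_sizes):
    """Pick the table with the smaller data size as the one to redistribute."""
    return min(table_sizes, key=table_sizes.get)


def group_moves_by_node_pair(moves):
    """Group planned moves by (source, destination) so each node pair is independent."""
    transfers = defaultdict(list)
    for source, destination, fragment in moves:
        if source != destination:  # data already on its destination does not move
            transfers[(source, destination)].append(fragment)
    return dict(transfers)


def run_transfers_in_parallel(transfers, send):
    """Run one transfer task per node pair; `send` is a hypothetical callback."""
    with ThreadPoolExecutor() as pool:
        futures = {
            pair: pool.submit(send, pair, fragments)
            for pair, fragments in transfers.items()
        }
        return {pair: future.result() for pair, future in futures.items()}


# Hypothetical sizes (row counts) and planned moves.
table_sizes = {"first_table": 1_000, "second_table": 50_000}
chosen = choose_table_to_redistribute(table_sizes)

moves = [(0, 1, "a"), (0, 1, "b"), (2, 0, "c"), (1, 1, "d")]
transfers = group_moves_by_node_pair(moves)

# Stand-in for a real transfer: report how many fragments each pair would send.
sent = run_transfers_in_parallel(transfers, lambda pair, fragments: len(fragments))
print(chosen, sent)
```

Redistributing only the smaller table keeps the volume of moved data low, and grouping by node pair makes it explicit that transfers between different pairs do not depend on one another.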

Description

Technical field

[0001] The present invention relates to the technical field of databases, in particular to a data processing method for a distributed relational database and a data processing device for a distributed relational database.

Background

[0002] To store massive amounts of data, enterprise-level database systems use distributed databases and data warehouses. The data is split according to a specified method and stored on the nodes of the database system. Commonly, a record is mapped to a node by the key value of a specified field in the table, or by an ID range. The advantage of such a mapping is that the data can be distributed as evenly as possible across the nodes of the system, so that each node can complete certain operations in parallel and the entire SQL (Structured Query Language) can be completed in parallel, that is, the execution o...
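The two record-to-node mappings mentioned in the background, by key value and by ID range, might look like the following minimal sketch. The CRC32 hash, the node count, and the range boundaries are assumptions made for illustration; the source does not specify a particular hash function or range layout.

```python
import bisect
import zlib


def node_for_key(key, num_nodes):
    """Map a record to a node by hashing the key value of the specified field.

    Assumption: CRC32 is used here only because it is stable across runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def node_for_id(record_id, upper_bounds):
    """Map a record to a node by ID range.

    upper_bounds[i] is the exclusive upper ID bound of node i, in ascending order.
    """
    node = bisect.bisect_right(upper_bounds, record_id)
    if node >= len(upper_bounds):
        raise ValueError(f"ID {record_id} is outside every configured range")
    return node


# Hypothetical layout: node 0 holds IDs below 1000, node 1 below 2000, node 2 below 3000.
upper_bounds = [1000, 2000, 3000]

print(node_for_id(999, upper_bounds), node_for_id(1000, upper_bounds))
print(node_for_key("customer-42", 3))
```

A hash mapping tends to spread records evenly regardless of key order, while a range mapping keeps neighboring IDs on the same node; either way, a join on a field other than the mapped one finds its matching rows scattered across nodes.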

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2282, G06F16/2456, G06F16/27, G06F16/284
Inventor: 余鹏
Owner: 北京华胜信泰数据技术有限公司