Data partition method and device

A data partitioning technique for the database field that addresses inefficient join operations, wasted network and storage resources, and large volumes of data transmission.

Active Publication Date: 2016-04-20
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

If the relationships between tables are not considered when partitioning, a join (join query) operation between two tables generates a large amount of network transmission, and the join operation is inefficient.
[0003] Oracle proposes a reference partitioning (Reference Partitioning) scheme that partitions from the root data table downward according to the relationships between the data tables. A column in the root data table is first specified as the partition column, and the related data in the child tables is partitioned along with the root table, so that related data is placed on the same node and the join can be completed locally. However, this solution applies only to hierarchically structured data relationships, and



Examples


Embodiment 1

[0063] This embodiment of the present invention provides a data partition method which, as shown in Figure 1, includes:

[0064] 101. The device partitions the dimension tables in the distributed database.

[0065] The device here can be a computer with its own disk and central processing unit (CPU). The application scenario of this embodiment may be the data distribution problem in a distributed massively parallel processing (MPP) database.

[0066] Specifically, in a distributed MPP database, when two tables have a join (join query) relationship, the dimension table can be partitioned according to a general algorithm, for example a hash algorithm. Suppose the dimension table is the order table, its primary key is the O_PK order column, and its foreign key is the C_PK customer column. The dimension table can then be partitioned according to C_PK. Because the default C_PK is assigned to differen...
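The hash partitioning of the order table by C_PK described in [0066] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the CRC32-modulo hash, the three-node cluster, and the sample rows are all assumptions made for the example, and the helper names `node_for` and `hash_partition` are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Map a key to a node index with a stable hash (CRC32 of the key's text).

    The choice of CRC32 is an assumption; the source only says "a hash algorithm".
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def hash_partition(rows, column, num_nodes):
    """Group rows by the node that the hash of `column` selects."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[node_for(row[column], num_nodes)].append(row)
    return dict(partitions)


# Hypothetical order rows: O_PK is the primary key, C_PK the customer foreign key.
orders = [
    {"O_PK": 1, "C_PK": 10},
    {"O_PK": 2, "C_PK": 11},
    {"O_PK": 3, "C_PK": 10},
    {"O_PK": 4, "C_PK": 12},
]

order_partitions = hash_partition(orders, "C_PK", num_nodes=3)
for node, rows in sorted(order_partitions.items()):
    print(node, rows)
```

Because the node is a pure function of C_PK, all orders of the same customer end up on the same node, whichever node that is.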

Embodiment 2

[0075] This embodiment of the present invention provides a data partition method which, as shown in Figure 2, includes:

[0076] 201. The device partitions the dimension tables in the distributed database.

[0077] The device here can be a computer applied to a distributed MPP database to solve the data distribution problem. The MPP architecture can distribute data to multiple nodes, which process it in parallel, thereby increasing the data processing speed.

[0078] When there is a join query operation between two tables, the dimension table can be partitioned according to a general algorithm, for example a hash algorithm. The dimension table stores the attributes of the object data in the fact table.

[0079] For example, the test standard of a general benchmarking organization can be used:

[0080] The TPC-H benchmark (Transaction Processing Performance Council-H) introduces the data sche...


Abstract

The embodiment of the invention provides a data partition method and device in the database field. The method can eliminate remote join operations, reduce the network bandwidth consumed during data queries, and create backup data. The method includes the steps of: partitioning a dimension table in a distributed database; building a partition mapping table from the partition characteristics of the dimension table according to a preset algorithm; partitioning the fact table corresponding to the dimension table according to the partition mapping table; and backing up the data in the fact table according to the partition mapping table. During partitioning, the preset algorithm causes records to conflict, and the conflicts produce redundant records that serve as data backups. The embodiment of the invention is used for data partitioning in a distributed database.
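The first three steps of the abstract (partition the dimension table, build a partition mapping table, partition the fact table by that mapping) can be sketched as below. This is an illustration under stated assumptions only: the excerpt does not specify the preset algorithm, so the mapping here is simply "dimension key to the node that holds it", and the backup step is left out because its rule is not given. The function names, the `L_PK` column, and the sample rows are hypothetical.

```python
def build_partition_mapping(dimension_partitions, key_column):
    """Record which node holds each dimension key (assumed form of the mapping table)."""
    mapping = {}
    for node, rows in dimension_partitions.items():
        for row in rows:
            mapping[row[key_column]] = node
    return mapping


def partition_fact_table(fact_rows, foreign_key, mapping):
    """Place each fact row on the node that holds the dimension row it joins to."""
    partitions = {}
    for row in fact_rows:
        node = mapping[row[foreign_key]]
        partitions.setdefault(node, []).append(row)
    return partitions


# Hypothetical state: the order (dimension) table is already spread over two nodes.
order_partitions = {
    0: [{"O_PK": 1, "C_PK": 10}, {"O_PK": 3, "C_PK": 10}],
    1: [{"O_PK": 2, "C_PK": 11}],
}

# Hypothetical fact rows that reference orders through O_PK.
fact_rows = [
    {"L_PK": 100, "O_PK": 1},
    {"L_PK": 101, "O_PK": 2},
    {"L_PK": 102, "O_PK": 3},
    {"L_PK": 103, "O_PK": 2},
]

mapping = build_partition_mapping(order_partitions, "O_PK")
fact_partitions = partition_fact_table(fact_rows, "O_PK", mapping)
print(mapping)
print(fact_partitions)
```

With this placement every fact row sits on the same node as the order it references, so a join on O_PK needs no remote lookup in this toy setup.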

Description

Technical field

[0001] The invention relates to the field of databases, and in particular to a data partition method and device.

Background technique

[0002] In an analytical database such as an Online Analytical Processing (OLAP) system, the amount of data processed is relatively large, and the performance of a single machine can no longer meet demand. A massively parallel processing (MPP) architecture can distribute data to multiple nodes, which process it in parallel, thereby increasing the processing speed. For each node to process a certain amount of data in parallel, a table generally needs to be split horizontally and placed on different nodes. However, if the relationships between tables are not considered, a join (join query) operation between two tables causes a large amount of network transmission, and the join operation is inefficient.

[0003] Oracle proposes a reference partitioning (Reference Partitioning) solu...
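The network cost described in [0002] can be made concrete by counting how many fact rows would need a remote lookup under two placements. This is a rough sketch, not a measurement: the CRC32-modulo hash, the four-node cluster, the `L_PK` column, and the generated rows are assumptions, and `remote_join_rows` is a hypothetical helper.

```python
import zlib


def node_for(key, num_nodes):
    """Stable hash placement (CRC32 modulo node count); an assumed scheme."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def remote_join_rows(fact_rows, fact_partition_col, join_col, dim_node_of, num_nodes):
    """Count fact rows whose join partner is stored on a different node."""
    remote = 0
    for row in fact_rows:
        fact_node = node_for(row[fact_partition_col], num_nodes)
        if fact_node != dim_node_of[row[join_col]]:
            remote += 1
    return remote


NUM_NODES = 4

# Hypothetical dimension rows, placed by hashing their own key O_PK.
orders = [{"O_PK": k} for k in range(1, 9)]
dim_node_of = {o["O_PK"]: node_for(o["O_PK"], NUM_NODES) for o in orders}

# Hypothetical fact rows, each referencing one order.
facts = [{"L_PK": 100 + i, "O_PK": 1 + i % 8} for i in range(40)]

# Fact table partitioned on its own key, ignoring the table relationship.
independent = remote_join_rows(facts, "L_PK", "O_PK", dim_node_of, NUM_NODES)
# Fact table partitioned on the join column, following the dimension table.
colocated = remote_join_rows(facts, "O_PK", "O_PK", dim_node_of, NUM_NODES)

print("remote rows, independent partitioning:", independent)
print("remote rows, co-located partitioning:", colocated)
```

The co-located count is zero by construction, since both tables use the same function of O_PK. The independent count depends on how the hash happens to spread the keys.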


Application Information

IPC IPC(8): A99Z99/00
Inventor 时家幸, 黄乐, 王玉虎
Owner HUAWEI TECH CO LTD