Data storage optimization method for hash join

A data storage optimization technique applied in database models, multi-dimensional databases, digital data processing, and related fields. It addresses problems such as hash-clustered storage being unsuitable for full table scans, and it eliminates the cost of physical partitioning and improves join performance.

Active Publication Date: 2014-07-23
RENMIN UNIVERSITY OF CHINA

AI Technical Summary

Problems solved by technology

Oracle Database supports hash cluster tables, but hash clustering is suitable neither for storing frequently growing tables nor for full table scans.

Embodiment

[0038] As shown in Figure 1, the database allocates storage space for a table in units of pages. Records are stored in the pages sequentially, and the pages form a page linked list. During a full table scan, every record in every page is accessed sequentially by following the page linked list.
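The page linked list described in [0038] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `Page`, `Table`, and `PAGE_CAPACITY`, and the capacity of 2 records per page, are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

PAGE_CAPACITY = 2  # assumed number of records per page (illustrative only)


@dataclass
class Page:
    """One storage page: a few records plus a link to the next page."""
    records: List[dict] = field(default_factory=list)
    next: Optional["Page"] = None


class Table:
    """Hypothetical table stored as a singly linked list of pages."""

    def __init__(self) -> None:
        self.head: Optional[Page] = None
        self.tail: Optional[Page] = None

    def insert(self, record: dict) -> None:
        # Allocate a new page when there is none yet or the last one is full.
        if self.tail is None or len(self.tail.records) >= PAGE_CAPACITY:
            page = Page()
            if self.tail is None:
                self.head = page
            else:
                self.tail.next = page
            self.tail = page
        self.tail.records.append(record)

    def full_scan(self) -> Iterator[dict]:
        # Follow the page links and visit every record in every page.
        page = self.head
        while page is not None:
            yield from page.records
            page = page.next


table = Table()
for k in range(5):
    table.insert({"key": k})
scanned = [r["key"] for r in table.full_scan()]
```

With five inserted records and two records per page, the scan visits three pages and returns the records in insertion order.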

[0039] As shown in Figure 2, the hash-join-oriented data storage optimization method of the present invention first selects the key to be physically partitioned according to the database schema and the characteristics of the query workload. In Figure 2, the customer table is large, so the fact table is stored in radix hash partitions on its foreign key lo_custkey, which supports the radix hash join between the fact table and the customer table. In this embodiment, the low 2 bits of lo_custkey are used for partitioning, so each record's lo_custkey value maps to one of four radix hash groups: 00, 01, 10, and 11. Assuming that each page stores 2 records, the r...
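A rough sketch of the partitioned storage in [0039] follows, assuming 2 radix bits and 2 records per page as in the embodiment. The class `RadixTable`, the use of a Python list index as a stand-in for a page address, and the sample key values are all hypothetical; they are not the data shown in Figure 2.

```python
RADIX_BITS = 2      # low 2 bits of the key select the radix hash group
PAGE_CAPACITY = 2   # records per page, as assumed in the embodiment


class RadixTable:
    """Hypothetical table whose pages each hold records of one radix group."""

    def __init__(self, key, bits=RADIX_BITS, capacity=PAGE_CAPACITY):
        self.key = key
        self.mask = (1 << bits) - 1
        self.capacity = capacity
        # Physical page chain in allocation order; the list index plays
        # the role of a page address in this sketch.
        self.pages = []
        # One queue of page addresses per radix hash group.
        self.queues = [[] for _ in range(1 << bits)]

    def insert(self, record):
        group = record[self.key] & self.mask
        queue = self.queues[group]
        # Allocate a new page when the group has none or its last page is full.
        if not queue or len(self.pages[queue[-1]]) >= self.capacity:
            self.pages.append([])
            queue.append(len(self.pages) - 1)
        self.pages[queue[-1]].append(record)

    def scan_partition(self, group):
        # Visit only the pages listed in this group's queue.
        for addr in self.queues[group]:
            yield from self.pages[addr]

    def full_scan(self):
        # Ignore the queues and walk pages in physical order.
        for page in self.pages:
            yield from page


fact = RadixTable(key="lo_custkey")
for custkey in [5, 2, 8, 3, 9, 6, 13]:   # illustrative keys only
    fact.insert({"lo_custkey": custkey})

group_01 = [r["lo_custkey"] for r in fact.scan_partition(0b01)]
all_keys = [r["lo_custkey"] for r in fact.full_scan()]
```

Here keys 5, 9, and 13 all end in bits 01, so they land in the pages queued for group 01, while a full scan still reaches every record by walking the pages in allocation order.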

Abstract

The invention relates to a data storage optimization method for hash joins. The method comprises: selecting, on the fact table, the foreign key to be used for radix hash storage and determining the corresponding dimension table; setting the number of radix bits n, which determines 2^n hash partitions, and setting up 2^n radix hash partition queues; storing the fact table and the dimension table as page linked lists, with PAX column storage inside each page; when a record is inserted, radix-hashing its hash partition attribute value on the low n bits and storing the record in a page of the corresponding partition; dynamically allocating a new page when the current page of a radix hash partition is full; accessing a specific radix hash partition queue directly by the hash value of the join key, and reaching all records of that partition through the page addresses stored in the queue; accessing records in the original physical page link order of the table during a full table scan; storing the small table R and the large table S in partitioned form; applying a column-wise join method to the fact table during hash joins in the database; and increasing or decreasing the number of radix hash partition bits to grow or shrink the set of hash partitions dynamically.
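The last step, changing the number of radix bits, rests on a simple arithmetic property: a key whose low n bits equal g has low n+1 bits equal to either g or g + 2^n. The sketch below illustrates only that property. The abstract does not say how the invention reorganizes pages when n changes, so `radix_group` and `split_partition` are hypothetical helpers, not the patented procedure.

```python
def radix_group(key: int, bits: int) -> int:
    """Radix hash group of a key: its low `bits` bits."""
    return key & ((1 << bits) - 1)


def split_partition(keys, group: int, bits: int) -> dict:
    """Split one n-bit partition into its two (n+1)-bit partitions."""
    low, high = [], []
    for k in keys:
        if radix_group(k, bits) != group:
            raise ValueError(f"key {k} does not belong to group {group}")
        # Bit number `bits` decides which of the two new groups gets the key.
        (high if (k >> bits) & 1 else low).append(k)
    return {group: low, group | (1 << bits): high}


# Group 01 under 2 bits splits into groups 001 and 101 under 3 bits.
halves = split_partition([1, 5, 9, 13], group=0b01, bits=2)
```

Reducing the bit count is the reverse mapping: groups g and g + 2^n under n+1 bits both collapse to group g under n bits.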

Description

Technical field

[0001] The invention relates to a method for implementing database storage, in particular to a hash-join-oriented data storage optimization method in the technical field of database storage and query optimization.

Background

[0002] Hash join is a typical join technique in databases. It is widely used in data warehouses that rely on primary key and foreign key referential integrity constraints, and it is an important determinant of OLAP (online analytical processing) performance. Radix join (a hash join algorithm based on radix partitioning) partitions the two join tables R and S by radix over multiple passes and then performs hash joins between the corresponding partitions of R and S. Radix join is the current mainstream technique for multi-core parallel joins, but its partitioning step physically reorganizes the data of tables R and S, which not only increases the memory stor...
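For context, the conventional radix join described in the background can be sketched as below, using a single partitioning pass for brevity. The function names and the columns `c_custkey`, `c_name`, and `lo_revenue` are assumptions made for the example. Note that `radix_partition` copies every row into a new partition list, which is the physical reorganization cost the background refers to.

```python
from collections import defaultdict


def radix_partition(rows, key, bits):
    """Copy rows into 2**bits partitions by the low bits of the join key."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for row in rows:
        parts[row[key] & mask].append(row)  # physical reorganization
    return parts


def radix_join(r_rows, s_rows, r_key, s_key, bits=2):
    """Partition both inputs, then hash-join each pair of partitions."""
    r_parts = radix_partition(r_rows, r_key, bits)
    s_parts = radix_partition(s_rows, s_key, bits)
    out = []
    for r_part, s_part in zip(r_parts, s_parts):
        # Build a hash table on the R partition.
        table = defaultdict(list)
        for r in r_part:
            table[r[r_key]].append(r)
        # Probe it with the matching S partition.
        for s in s_part:
            for r in table.get(s[s_key], ()):
                out.append({**r, **s})
    return out


customers = [{"c_custkey": k, "c_name": f"c{k}"} for k in range(1, 5)]
lineorder = [
    {"lo_custkey": 3, "lo_revenue": 10},
    {"lo_custkey": 1, "lo_revenue": 20},
    {"lo_custkey": 3, "lo_revenue": 30},
    {"lo_custkey": 7, "lo_revenue": 40},  # no matching customer
]
joined = radix_join(customers, lineorder, "c_custkey", "lo_custkey")
```

Because matching keys always share the same low bits, each S partition only needs to probe the hash table built from the R partition with the same index.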

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2255, G06F16/2456, G06F16/283
Inventors: 张延松, 张宇, 王珊
Owner: RENMIN UNIVERSITY OF CHINA