Multi-core parallel hash partitioning optimizing method based on column storage

A partition optimization and column storage technology, applied in the field of data processing. It addresses the inability of existing methods to use the parallel resources of multi-core processors efficiently and to handle skewed input data well. Its effects are improved cache efficiency and overall performance and the resolution of write conflicts.

Active Publication Date: 2014-11-05
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

In the face of new hardware architectures, traditional parallel hash partitioning algorithms cannot efficiently utilize the parallel resources of multi-core processors, and cannot handle skewed input data well.


Embodiment Construction

[0030] In order to better understand the present invention, the present invention will be described in detail below in conjunction with the accompanying drawings.

[0031] Referring to Figure 1, the implementation steps of the present invention are as follows:

[0032] Step 1, read the column store dataset.

[0033] The column-store dataset entered by the user is saved in a .txt text file, with each key-value pair occupying one line of the file;

[0034] The dataset is read line by line from the .txt file. Its data format is a key-value pair of the form (Key, Value), where each pair is 16 bytes in size and consists of an 8-byte serial-number Key and an 8-byte stored Value;
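The reading step in [0033] and [0034] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `read_column_store` and the exact text layout of a line (`key,value`, optionally wrapped in parentheses) are assumptions, since the source only states that each line holds one (Key, Value) pair of two 8-byte fields.

```python
from array import array


def read_column_store(lines):
    """Parse text lines into two parallel columns of 8-byte integers.

    Assumption: each line looks like "key,value" or "(key,value)".
    """
    keys = array("q")    # 'q' = signed 64-bit integer, 8 bytes per item
    values = array("q")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key_text, value_text = line.strip("()").split(",")
        keys.append(int(key_text))
        values.append(int(value_text))
    return keys, values


# Usage: any iterable of lines works, including an open text file.
keys, values = read_column_store(["(1,100)", "(2,200)", "3,300"])
```

Keeping keys and values in separate arrays mirrors a column-store layout: each pair still accounts for 16 bytes in total, but a pass that only hashes keys touches only the key column.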

[0035] Select either the traditional hash storage structure or the optimized hash storage structure for the column-store dataset that was read.

[0036] Step 2, split the column storage data set input by the user.

[0037] Divide ...

Abstract

The invention discloses a multi-core parallel hash partitioning optimization method based on column storage. The method mainly solves the problem that existing parallel hash partitioning algorithms cannot efficiently use the resources of a multi-core processor. In the technical scheme, data partitioning tasks are dynamically distributed to multiple cores for execution by means of a map-and-reduce (MapReduce) parallel programming model, and a strategy for avoiding write conflicts is selected according to the storage structure of the column-store dataset. Map threads carry out a first hash partitioning pass; after data-skew optimization, the first-pass result is sent to reduce threads for a second hash partitioning pass; the final hash partitioning result is then returned. The method makes good use of the fact that tasks can be executed in parallel on a multi-core processor, is suitable for input data with various distributions, improves the cache efficiency and overall performance of the multi-core processor, and can be used for multi-core parallel multi-pass hash partitioning of column-store datasets.
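The two-pass flow described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: the function names, the multiplicative hash, the bit widths, and the use of per-worker private buckets to avoid write conflicts are all hypothetical choices, and the data-skew optimization step is omitted. Python threads do not run CPU-bound code in parallel, so the sketch shows the structure of the computation rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

MASK64 = 0xFFFFFFFFFFFFFFFF


def hash_bits(key, shift, bits):
    """Return `bits` bits of a 64-bit multiplicative hash, starting at `shift`."""
    h = (key * 0x9E3779B97F4A7C15) & MASK64
    return (h >> shift) & ((1 << bits) - 1)


def map_partition(chunk, bits1):
    """First pass: a map worker fills its own private buckets (no shared writes)."""
    buckets = [[] for _ in range(1 << bits1)]
    for key, value in chunk:
        buckets[hash_bits(key, 64 - bits1, bits1)].append((key, value))
    return buckets


def reduce_partition(pairs, bits1, bits2):
    """Second pass: a reduce worker refines one first-pass partition."""
    buckets = [[] for _ in range(1 << bits2)]
    for key, value in pairs:
        buckets[hash_bits(key, 64 - bits1 - bits2, bits2)].append((key, value))
    return buckets


def two_pass_partition(pairs, workers=4, bits1=2, bits2=2):
    """Partition (key, value) pairs into 2**bits1 groups of 2**bits2 buckets."""
    chunk_size = max(1, -(-len(pairs) // workers))  # ceiling division
    chunks = [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda c: map_partition(c, bits1), chunks))
        # Merge the private buckets of all map workers, partition by partition.
        merged = [[p for m in mapped for p in m[i]] for i in range(1 << bits1)]
        return list(pool.map(lambda m: reduce_partition(m, bits1, bits2), merged))


result = two_pass_partition([(k, k * 10) for k in range(100)])
```

Because each map worker writes only to its own buckets and each reduce worker owns exactly one first-pass partition, no two workers ever write to the same list, which is one simple way to sidestep write conflicts.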

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a multi-core parallel hash partitioning optimization method that can be used for data partitioning in a column-store database.

Background

[0002] Partitioning is an important database operation and a building block of other database operations such as joins, aggregation, and sorting. Partitioning divides a larger task into several smaller subtasks. The total time taken to process the subtasks is often less than the time taken to process the one larger task, because smaller tasks make efficient use of cache and memory. Partitioning has been studied extensively in different applications, mainly for database operations. In join and aggregation operations, partitioning can significantly improve performance; in parallel sorting operations, partitioning is also an important...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/38, G06F17/30
Inventor 黄鑫刘志镜袁通刘慧王梓徐曾强波李宗利邱龙滨王鹏
Owner XIDIAN UNIV