Data-driven parallel sorting system and method

A data-driven parallel sorting technology, applied in data classification, input data processing, electronic digital data processing, and related fields. It addresses problems such as a slow sorting process and reduced memory blocks, with the effect of improving performance, avoiding centralized competition for resources, and optimizing resource utilization.

Active Publication Date: 2014-10-29
IBM CORP

AI Technical Summary

Problems solved by technology

That is to say, when each partition, driven by a data record newly allocated to it, sorts all the data records in its memory block, the partitions compete for CPU resources at the same time, which causes the peak shown in Figure 3.
In addition, due to the above reasons, when the sorting in each partition is completed, in order to write the ordered data blocks to the hard disk, each partition will also compete for input and output (IO) resources at the same time, which will also cause waiting and delay.
This makes the use of system resources inefficient.

Examples

Embodiment Construction

[0021] In the prior-art parallel sorting method shown in Figure 3, as soon as a data record is assigned to a partition, the partition sorts that record together with the already-sorted data records (if any) in the partition's memory block. Once a partition's memory block becomes full, the data block in that memory block is written to disk. This parallel sorting method is therefore a data-driven method.
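The prior-art behaviour described in [0021] can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: the function name `prior_art_flush_rounds`, the use of `bisect.insort` to keep each memory block ordered, and the in-memory list standing in for the disk are all hypothetical and do not come from the patent text. The sketch records how many records had been distributed when each partition wrote out its block.

```python
import bisect

def prior_art_flush_rounds(records, n, m):
    """Simulate the prior-art scheme: round-robin distribution to n
    partitions, each keeping its memory block (capacity m) sorted and
    writing it out only when the block is full. Hypothetical helper."""
    blocks = [[] for _ in range(n)]
    flushes = []  # (records distributed so far, partition index, ordered chunk)
    for i, rec in enumerate(records):
        p = i % n                      # round-robin assignment
        bisect.insort(blocks[p], rec)  # re-sort on every arrival (data-driven)
        if len(blocks[p]) == m:        # block full: write it out and empty it
            flushes.append((i + 1, p, blocks[p]))
            blocks[p] = []
    return flushes

flushes = prior_art_flush_rounds(list(range(24, 0, -1)), n=3, m=4)
for distributed, partition, chunk in flushes:
    print(distributed, partition, chunk)
```

In this simulation every partition fills at almost the same moment, so the write-outs arrive in bursts of n consecutive records, which is consistent with the simultaneous competition for CPU and IO resources described in the summary above.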

[0022] The inventors of the present invention have found that the decrease in system resource utilization is mainly due to the following two reasons:

[0023] 1. The peaks in CPU resource demand of the partitions occur at almost the same time. Each peak lasts from the point at which the amount of data in the memory block exceeds a certain threshold until the memory block becomes full (because the more data there is in the memory block, the more comparisons the sorting process needs, and the more CPU resources are used).

[0024] 2. Due to the round-robin data allocation, the m...

Abstract

A data-driven parallel sorting method includes distributing input data records to n partitions one by one in a circular manner. Each partition corresponds to a parallel sorting process with an allocated memory chunk sized to store m data records. The method also includes sorting, in parallel, the current data records in the respective memory chunks of the respective partitions. The method also includes, in response to distribution of data records of ⌊m/n⌋ rounds, circularly controlling one of the n partitions, writing the data records that have been sorted in the memory chunk of that partition into a mass storage as an ordered data chunk, and emptying the memory chunk. The method also includes, in response to all data records being distributed, writing the data chunks that have been sorted in the respective memory chunks into the mass storage, and performing a merge sort on all ordered data chunks in the mass storage.
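The steps in the abstract can be sketched as follows. This is a minimal single-threaded simulation, not the patented implementation, and it rests on several assumptions that are not stated in the text: a "round" is taken to mean one record handed to each of the n partitions, a write-out is assumed to recur after every ⌊m/n⌋ rounds, m is assumed to be at least n, `bisect.insort` stands in for each partition's sorting process, a Python list stands in for the mass storage, and `heapq.merge` stands in for the final merge sort. The name `staggered_parallel_sort` is hypothetical.

```python
import bisect
import heapq

def staggered_parallel_sort(records, n, m):
    """Round-robin distribution with staggered write-outs: after every
    floor(m/n) rounds, the next partition in circular order writes its
    ordered chunk to storage and empties its memory chunk."""
    step = m // n  # floor(m/n) rounds between write-outs
    if step < 1:
        raise ValueError("this sketch assumes m >= n")
    blocks = [[] for _ in range(n)]
    storage = []        # stand-in for mass storage: list of ordered chunks
    flush_log = []      # (round number, partition index)
    next_to_flush = 0
    for i, rec in enumerate(records):
        bisect.insort(blocks[i % n], rec)   # partition keeps its chunk sorted
        if (i + 1) % n == 0:                # one full round has been distributed
            rounds = (i + 1) // n
            if rounds % step == 0:
                p = next_to_flush
                if blocks[p]:
                    storage.append(blocks[p])
                    blocks[p] = []
                flush_log.append((rounds, p))
                next_to_flush = (p + 1) % n  # circular choice of partition
    # All records distributed: write out whatever remains in each partition.
    storage.extend(b for b in blocks if b)
    return list(heapq.merge(*storage)), flush_log

data = list(range(24, 0, -1))
ordered, log = staggered_parallel_sort(data, n=3, m=6)
print(log)
```

Under these assumptions each partition is written out once every n·⌊m/n⌋ rounds, so it never holds more than m records, and the write-outs are spread evenly over time instead of arriving together.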

Description

Technical field

[0001] The present invention relates to the field of parallel computing and, more specifically, to a data-driven parallel sorting system and method.

Background

[0002] Parallel sorting algorithms were proposed to improve sorting efficiency once the parallel computing capability of computers had developed substantially. Parallel sorting is widely used in databases, data extraction, transformation and loading (Extraction-Transformation-Loading, ETL), and other fields. A parallel sorting algorithm is a typical application of divide and conquer: the sequence to be sorted is divided into several subsequences, each subsequence is sorted, and the ordered subsequences are then merged to obtain a completely ordered sequence.

[0003] In parallel sorting, data is distributed into multiple partitions, each partition corresponding to a sorting process, such as a process or thread. Fo...

Application Information

IPC(8): G06F17/30
CPC: G06F17/30309, G06F16/278, G06F7/24, G06F7/36, G06F16/24554, G06F16/254
Inventor: 韦东杰, 杨新颖, 刘尔浩, 布莱恩·康菲尔德
Owner: IBM CORP