Data processing method based on hardware sorting MapReduce

A data processing and hardware sorting technology, applied in electrical digital data processing, special-purpose data processing applications, and digital data processing components. It addresses two problems: existing optimization methods cannot be reused, and they improve MapReduce performance only to a limited extent. Its effects are broad applicability and higher processing speed.

Inactive Publication Date: 2017-08-29
青岛蓝云信息技术有限公司
Cites 3; cited by 11

AI Technical Summary

Problems solved by technology

[0004] Optimization methods tied to the user program require the user to take part in task allocation and to be thoroughly familiar with both the program's processing flow and GPU programming conventions. Moreover, such optimizations cannot be reused across different user programs.
Existing MapReduce optimization methods based on GPU sorting, however, optimize only the first sorting process: they replace the CPU-based quick sort with a GPU-based quick sort or a GPU-based bitonic sort, and leave the other three sorting operations untouched, so the improvement in MapReduce performance is limited.



Examples


Specific Embodiment 1

[0037] The GPU-based quick sort process is shown in Figure 2. The specific steps are as follows:

[0038] 201. Sequence division: store the data in the GPU's global memory and divide it into m non-overlapping data blocks, each of which is processed by one thread block;

[0039] 202. Thread traversal: the m thread blocks traverse their corresponding data blocks in parallel; within each thread block, the n threads each traverse part of that data block in parallel and record the number of elements greater than and less than the boundary value (the pivot);

[0040] 203. Traversal count: compute the relative count value of each thread in turn. For example, if the count values of the threads in thread block $B_k$ ($1 \le k \le m$) are $L_{k,1}, \dots, L_{k,n}$ and $R_{k,1}, \dots, R_{k,n}$, then the relative count values of the $i$-th thread ($1 \le i \le n$) are … and …, respectively;

[0041] where $L_{k,1}$ and $R_{k,1}$ respectively represent, in thread block $B_k$, among the data traversed by the first thre...
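The partition pass in steps 201 to 203 can be sketched sequentially in Python. This is a minimal sketch under stated assumptions, not the patented GPU kernel: each (thread block, thread) pair is simulated by a slice of a Python list, and because the formulas for the relative count values are elided above, the sketch assumes they are exclusive prefix sums of the per-thread counts L and R taken in block-major order, a common way to give every thread a conflict-free write offset. Step 202 counts only elements less than and greater than the pivot, so the sketch fills the leftover middle slots with the pivot value. The names `split_ranges`, `gpu_style_partition`, and `gpu_style_quicksort` are hypothetical.

```python
from itertools import accumulate


def split_ranges(length, parts):
    """Split [0, length) into `parts` contiguous, non-overlapping ranges."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def gpu_style_partition(data, pivot, m, n):
    """Sequentially simulate one partition pass with m thread blocks of n threads.

    Returns (output, number of elements < pivot, number of elements > pivot).
    """
    # Step 201: m non-overlapping data blocks; each block's n threads get a slice.
    slices = []  # block-major order: (B_1, t_1), ..., (B_1, t_n), (B_2, t_1), ...
    for b_start, b_end in split_ranges(len(data), m):
        for t_start, t_end in split_ranges(b_end - b_start, n):
            slices.append((b_start + t_start, b_start + t_end))

    # Step 202: each simulated thread counts elements below / above the pivot.
    L = [sum(1 for x in data[s:e] if x < pivot) for s, e in slices]
    R = [sum(1 for x in data[s:e] if x > pivot) for s, e in slices]

    # Step 203 (ASSUMED form): exclusive prefix sums of L and R act as the
    # relative count values, i.e. each thread's private write offset.
    left_off = [0] + list(accumulate(L))[:-1]
    right_off = [0] + list(accumulate(R))[:-1]
    total_l, total_r = sum(L), sum(R)

    # Scatter: "<" elements go to the front, ">" elements to the back; the
    # untouched middle slots keep the pivot value (elements equal to the pivot).
    out = [pivot] * len(data)
    right_base = len(data) - total_r
    for (s, e), lo, ro in zip(slices, left_off, right_off):
        for x in data[s:e]:
            if x < pivot:
                out[lo] = x
                lo += 1
            elif x > pivot:
                out[right_base + ro] = x
                ro += 1
    return out, total_l, total_r


def gpu_style_quicksort(data, m=2, n=4):
    """Recursive quick sort built on the simulated partition pass."""
    data = list(data)
    if len(data) <= 1:
        return data
    pivot = data[len(data) // 2]  # pivot is taken from the data, so recursion shrinks
    out, nl, nr = gpu_style_partition(data, pivot, m, n)
    left = gpu_style_quicksort(out[:nl], m, n)
    right = gpu_style_quicksort(out[len(out) - nr:], m, n)
    return left + out[nl:len(out) - nr] + right
```

For example, `gpu_style_partition([5, 3, 8, 1, 9, 2, 7, 5, 4, 6], 5, 2, 2)` returns `([3, 1, 2, 4, 5, 5, 8, 9, 7, 6], 4, 4)`: four elements below the pivot, two copies of the pivot, and four elements above it. Because every thread writes only inside its own prefix-sum range, the scatter loop needs no locking when the threads run in parallel.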

Specific Embodiment 2

[0049] The GPU-based merge sort process is shown in Figure 3. The specific steps are as follows:

[0050] 301. Sequence grouping: group the sequences to be merged in pairs, forming m groups, and each time merge the two sequences $A_1$ and $B_1$ of one group;

[0051] 302. Divide subsequences: divide $A_1$ and $B_1$ each into n subsequences; each thread block merges one subsequence of $A_1$ with one subsequence of $B_1$, and a total of $\log_2 n + 1$ rounds of subsequence merging are required to merge $A_1$ and $B_1$ into one ordered sequence;

[0052] 303. Subsequence merge: the $k$-th round of merging $A_1$ and $B_1$ ($0 \le k \le \log_2 n$) generates $n/2^k$ merged results $R_1, R_2, \dots, R_{k_r}$, and each merged result $R_i$ ($1 \le i \le k_r$) is divided into $2^k$ subsequences $R_{i,1}, R_{i,2}, \dots, R_{i,k_n}$;

[0053] where $k_r$ ($k_r = n/2^k$) denotes, for sequences $A_1$ and $B_1$, the number of merged results generated by the k-th r...
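Steps 301 to 303 can be read as a merge tree: in round 0, thread block $i$ merges the $i$-th subsequence of $A_1$ with the $i$-th subsequence of $B_1$, and each later round merges adjacent results pairwise, so round $k$ holds $n/2^k$ results and the whole merge takes $\log_2 n + 1$ rounds. The Python sketch below is one reading consistent with those counts, simulated sequentially, with each call to `merge_two` standing in for one thread block's work. It assumes n is a power of two and does not model how each round-$k$ result is cut into $2^k$ subsequences to keep the thread blocks busy. All function names are hypothetical.

```python
def merge_two(a, b):
    """Sequential two-way merge; stands in for the work of one thread block."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def chunks(seq, n):
    """Split `seq` into n contiguous (possibly empty) subsequences."""
    base, extra = divmod(len(seq), n)
    parts, start = [], 0
    for p in range(n):
        end = start + base + (1 if p < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts


def merge_pair(a1, b1, n):
    """Merge sorted a1 and b1 in log2(n) + 1 rounds (steps 302-303, as read here).

    Returns the merged list and the number of results left after each round.
    """
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    # Round k = 0: pair up the i-th subsequences of a1 and b1 -> n results.
    results = [merge_two(a, b) for a, b in zip(chunks(a1, n), chunks(b1, n))]
    per_round = [len(results)]
    # Rounds k = 1 .. log2(n): merge adjacent results, leaving n / 2**k of them.
    while len(results) > 1:
        results = [merge_two(results[i], results[i + 1])
                   for i in range(0, len(results), 2)]
        per_round.append(len(results))
    return results[0], per_round


def merge_all(sequences, n):
    """Step 301: pair up sorted sequences and merge each pair, repeating
    until one remains (an odd sequence out is carried to the next pass)."""
    seqs = [list(s) for s in sequences]
    while len(seqs) > 1:
        paired = [merge_pair(seqs[i], seqs[i + 1], n)[0]
                  for i in range(0, len(seqs) - 1, 2)]
        if len(seqs) % 2:
            paired.append(seqs[-1])
        seqs = paired
    return seqs[0] if seqs else []
```

For example, `merge_pair([1, 4, 6, 9], [2, 3, 7, 8], 4)` returns `([1, 2, 3, 4, 6, 7, 8, 9], [4, 2, 1])`: 4, 2, and 1 results after rounds 0, 1, and 2, which matches $\log_2 4 + 1 = 3$ rounds. The pairwise shape matters because every merge within a round is independent of the others and can be assigned to its own thread block.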


Abstract

The invention discloses a data processing method based on hardware sorting MapReduce. The method comprises the following steps: CPU-based quick sort is replaced by GPU-based quick sort; CPU-based merge sort is replaced by GPU-based merge sort; and CPU-based heap sort is replaced by GPU-based merge sort. By replacing CPU-based sorting algorithms with GPU-based ones, the method makes full use of the GPU's powerful computing capability, speeds up the processing of intermediate data, and improves MapReduce performance. It is especially suitable for the big data field.
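The third replacement in the abstract swaps a heap-based sort for a merge sort. Assuming that heap-based step is a k-way merge of already sorted runs (an assumption; the text does not spell this out), the Python sketch below contrasts a min-heap k-way merge, which emits one element per heap pop, with a pairwise merge tree, whose merges within a pass are independent and therefore suit parallel hardware. Both produce the same ordered output. The function names are hypothetical; only `heapq.heapify`, `heapq.heappush`, and `heapq.heappop` are standard library calls.

```python
import heapq


def heap_kway_merge(runs):
    """k-way merge using a min-heap of (value, run index, position)."""
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, r, i = heapq.heappop(heap)
        out.append(value)
        if i + 1 < len(runs[r]):  # refill from the run the value came from
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out


def merge_two(a, b):
    """Plain two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def pairwise_merge_tree(runs):
    """Merge runs two at a time per pass; the merges in one pass are
    independent of each other, so they could run in parallel."""
    runs = [list(r) for r in runs]
    if not runs:
        return []
    while len(runs) > 1:
        nxt = [merge_two(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            nxt.append(runs[-1])  # odd run is carried to the next pass
        runs = nxt
    return runs[0]


runs = [[1, 5, 9], [2, 6], [], [3, 4, 8]]
print(heap_kway_merge(runs))      # [1, 2, 3, 4, 5, 6, 8, 9]
print(pairwise_merge_tree(runs))  # [1, 2, 3, 4, 5, 6, 8, 9]
```

The heap version is inherently sequential (each pop depends on the previous one), whereas the merge tree exposes one independent merge per pair of runs in every pass, which is the property a GPU-based merge sort can exploit.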

Description

Technical field

[0001] The invention relates to the field of computers, in particular to cloud computing and parallel computing, and specifically to a data processing method based on hardware sorting MapReduce.

Background art

[0002] The MapReduce framework is a distributed programming framework commonly used in cloud computing and big data processing. MapReduce is usually applied to big data, and when the data volume is large, sorting takes a long time. As shown in Figure 1, in the traditional MapReduce framework the processing of intermediate result data is very complicated and requires four sorting operations on the data, but CPU-based sorting performs poorly, which limits the overall performance of MapReduce.

[0003] Because of its powerful computing capability and relatively low cost for that performance, the GPU is often used in place of the CPU to improve task processing performance. There...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F7/36; G06F17/30
CPC: G06F7/36; G06F16/182
Inventors: 计晓斐, 李建波, 刘亮
Owner: 青岛蓝云信息技术有限公司