GPU-based distributed big data parallel computing method

A parallel computing and big data technology, applied in computing, electrical digital data processing, resource allocation, etc. It addresses problems such as complex management, the high cost of working nodes, and an insufficient number of working nodes, and improves the efficiency of distributed computing.

Inactive Publication Date: 2019-08-30
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Adding working nodes by adding CPUs to the network or adding physical CPU cores is costly and complex to manage, and the resulting number of working nodes is still far too small to reach, or even approach, parallelism at the key-value-pair level.

Examples

Embodiment 1

[0035] Embodiment 1 implements the GPU-based distributed big data parallel computing method proposed by the present invention; its data flow diagram is shown in Figure 1.

[0036] This embodiment is based on the design ideas of Google MapReduce. The most direct way to improve the efficiency of the Map and Reduce phases is to increase the number of working nodes and further subdivide the parallel granularity. However, increasing the number of working nodes by adding CPUs to the network or adding physical CPU cores is costly and complicated to manage, and the resulting number of working nodes is still far too small to reach, or even approach, parallelism at the key-value-pair level.
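The granularity argument can be illustrated with a small idealised calculation. The sketch below is not taken from the patent: the function name `sequential_rounds` and the numbers in the usage note are hypothetical, and the model ignores scheduling and memory-transfer costs. It only shows how many key-value pairs each working node must process one after another when the pairs are split evenly.

```python
import math


def sequential_rounds(num_pairs: int, num_workers: int) -> int:
    """Return how many key-value pairs the busiest worker handles in sequence.

    Idealised model (an assumption, not the patent's analysis): pairs are
    divided evenly among workers and every pair costs the same.
    """
    if num_pairs < 0:
        raise ValueError("num_pairs must be non-negative")
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Ceiling division: the last worker may receive a partial share.
    return math.ceil(num_pairs / num_workers)
```

For example, with one million pairs, `sequential_rounds(1_000_000, 16)` gives 62500 pairs per worker, whereas `sequential_rounds(1_000_000, 1_000_000)` gives 1, which is the key-value-pair level of parallelism the paragraph refers to.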

[0037] A GPU is massively parallel computing hardware whose thread architecture and storage structure can be abstracted as the structure shown in Figure 2. Each compute device (Compute Device) has several compute units (Compute Unit), and...

Abstract

The invention relates to a GPU-based distributed big data parallel computing method comprising Map, Group, and Reduce steps. In the Map step, a user program is executed on each input key-value pair to convert it into intermediate key-value pairs. In the Group step, all intermediate key-value pairs are sorted and grouped. In the Reduce step, the user program processes the grouped intermediate key-value pairs to obtain the final calculation result. In the Map and Reduce steps, each working node corresponds to one GPU thread, and the input key-value pairs are submitted to different GPU threads for parallel processing. The GPU is used as a distributed working node for big data parallel computing, and device memory, thread scheduling, and data sorting are managed and optimized during the distributed computation, so that the efficiency of distributed computing is improved.
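The three steps can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the patented GPU implementation: a Python thread pool stands in for GPU threads, and the names `map_word`, `reduce_count`, `group_pairs`, and `map_group_reduce`, as well as the word-count workload, are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_word(pair):
    """Hypothetical user Map program: (doc_id, text) -> [(word, 1), ...]."""
    _doc_id, text = pair
    return [(word, 1) for word in text.split()]


def reduce_count(group):
    """Hypothetical user Reduce program: (key, values) -> (key, total)."""
    key, values = group
    return key, sum(values)


def group_pairs(pairs):
    """Group step: sort intermediate pairs by key, then collect values per key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return list(grouped.items())


def map_group_reduce(inputs, map_fn, reduce_fn, workers=4):
    """Run Map, Group, and Reduce with one task per key-value pair or group.

    The thread pool is only a stand-in for GPU threads (an assumption made
    so the sketch runs anywhere).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map: each input key-value pair is submitted as its own task.
        mapped = pool.map(map_fn, inputs)
        intermediate = [kv for output in mapped for kv in output]
        # Group: sort and group all intermediate key-value pairs.
        groups = group_pairs(intermediate)
        # Reduce: each group is submitted as its own task.
        return dict(pool.map(reduce_fn, groups))
```

Calling `map_group_reduce([(0, "a b a"), (1, "b c")], map_word, reduce_count)` counts each word across both inputs. The Group step runs on a single thread here for simplicity; the abstract states that data sorting is one of the parts the method optimizes.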

Description

Technical field

[0001] The invention relates to a parallel computing method, in particular to a GPU-based distributed big data parallel computing method.

Background technique

[0002] MapReduce was first proposed by Google as a parallel computing model and method for large-scale data processing. Google published the basic principles and main design ideas of MapReduce in two papers. Apache Hadoop is a set of open-source software utilities that is essentially an open-source implementation of Google's MapReduce framework.

[0003] The idea of MapReduce itself is not complicated; its core is to process data at each stage in the form of key-value pairs. MapReduce is generally divided into three stages: Map, Group, and Reduce. The inputs, outputs, and processing flow of the three stages are as follows:

[0004] The input of the Map stage is a data set of key-value pairs in a prescribed form. The input phase of Map has no special requirements for...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/5027
Inventor: 黄天羽, 毛续锟, 丁刚毅, 李鹏
Owner: BEIJING INSTITUTE OF TECHNOLOGY