FFT (Fast Fourier Transform) paralleling method based on GPU (Graphics Processing Unit) multi-core platform

A technique based on a GPU multi-core platform and the flattening of a two-dimensional array into a one-dimensional array, applied to FFT parallelization, that reduces communication time and improves operating efficiency and operation accuracy.

Inactive Publication Date: 2011-01-05
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

However, FFT operations are rarely implemented in parallel on GPUs with powerfu


Detailed Description of the Embodiment

[0019] Taking the imaging of a 4096×4096-point target in a SAR imaging system as an example, the method of this patent is implemented mainly through the following steps:

[0020] Step 1: In the SAR imaging algorithm, the original data is a 4096×4096 two-dimensional array whose elements are floating-point numbers. An FFT is performed on each row of this matrix, that is, 4096 one-dimensional 4096-point FFTs. Label the rows idata1, idata2, ..., idata4096, then concatenate them end to end in ascending label order, turning the two-dimensional array into a one-dimensional array denoted idata;
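The row concatenation in Step 1 amounts to a row-major flattening. The following is a minimal sketch on a tiny 3×4 stand-in for the 4096×4096 array; the helper names `flatten_rows` and `row_slice` are hypothetical and do not come from the patent.

```python
def flatten_rows(rows):
    """Concatenate equal-length rows end to end (row-major order)."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    flat = []
    for r in rows:
        flat.extend(r)
    return flat


def row_slice(flat, row, width):
    """Return the elements of one original row from the flattened array."""
    start = row * width
    return flat[start:start + width]


# Tiny 3x4 stand-in for the 4096x4096 array (assumed values, for illustration).
rows = [[float(r * 10 + c) for c in range(4)] for r in range(3)]
idata = flatten_rows(rows)

# Row i of the original array occupies idata[i*width : (i+1)*width].
second_row = row_slice(idata, 1, 4)
```

With this layout, each one-dimensional FFT reads one contiguous segment of `idata`, which is what allows all rows to be handed over in a single buffer.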

[0021] Step 2: Calculate the size of the storage space occupied by idata: mem_size=sizeof(float)*4096*4096;
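As a quick check of the Step 2 arithmetic, the sketch below evaluates the same expression, assuming a 4-byte single-precision float (the usual size of a C `float`). The constant names are illustrative only.

```python
import struct

N_ROWS = 4096
N_COLS = 4096

# Standard size of a single-precision float in bytes (assumed to match sizeof(float)).
FLOAT_BYTES = struct.calcsize("<f")

# Same expression as in Step 2: sizeof(float) * 4096 * 4096.
mem_size = FLOAT_BYTES * N_ROWS * N_COLS
mem_size_mib = mem_size / (1024 * 1024)
```

Under that assumption the buffer is 67,108,864 bytes, or 64 MiB. The source specifies float elements; a layout storing complex samples as two floats would double this figure.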

[0022] Step 3: Allocate mem_size bytes of GPU global memory, denoted idata_gpu, and then copy the data idata from host memory to idata_gpu in GPU global memory thr...


Abstract

The invention discloses an FFT (Fast Fourier Transform) parallelization method based on a GPU (Graphics Processing Unit) multi-core platform. On the storage side, the method follows the principle of "communicate once, compute in bulk": a single transfer suffices to complete N M-point FFT operations, which greatly reduces communication overhead. Using the high-speed cache inside each thread block, i.e. shared memory, further reduces communication time and improves operating efficiency. Through careful overall arrangement, the invention processes the data in parallel on hundreds of processing cores, maximizing the degree of parallelism, completing the operation efficiently, and improving operation accuracy.
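The idea of completing N M-point FFTs from one contiguous buffer can be sketched as follows. This is a sequential CPU illustration of the data layout only, not the patented GPU kernel: on a GPU the per-row transforms would run in parallel, and the function names `fft` and `batched_fft` are hypothetical. It uses a textbook recursive radix-2 FFT as a stand-in for whatever FFT the method actually applies.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def batched_fft(flat, n_rows, m):
    """Apply an independent m-point FFT to each of n_rows contiguous segments."""
    if len(flat) != n_rows * m:
        raise ValueError("buffer length must equal n_rows * m")
    out = []
    for r in range(n_rows):
        out.extend(fft(flat[r * m:(r + 1) * m]))
    return out


# Two 4-point rows in one buffer: an impulse and a constant signal.
flat = [1.0, 0.0, 0.0, 0.0,
        1.0, 1.0, 1.0, 1.0]
spectra = batched_fft(flat, n_rows=2, m=4)
```

The impulse row yields a flat spectrum and the constant row puts all of its energy in bin 0, which confirms that the two segments are transformed independently even though they arrive in a single buffer.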

Description

Technical field

[0001] The invention relates to an FFT parallelization method based on a GPU many-core platform and its application in engineering practice.

Background

[0002] FFT, the fast Fourier transform, is widely used in engineering and is an important factor affecting engineering efficiency. Although a variety of algorithms exist to implement the FFT, they all perform serial processing on the CPU. As the traditional core processor, the CPU has unquestionably powerful instruction control and data processing capabilities. However, because nearly 75% of the CPU chip area is used for storage, its level of core integration is relatively low; among current processors, the best-performing and most popular is the 8-core CELL processor jointly launched by IBM and Sony. In recent years, the performance of the graphics processing unit (GPU) has improved greatly. Unlike the CPU, the GPU is a parallel vector processor that can integrate hundreds or tho...


Application Information

IPC(8): G06F17/14, G06F15/167
Inventor: 姚迪, 龙腾, 靳星星, 刘峰
Owner: BEIJING INSTITUTE OF TECHNOLOGY