Method for realizing high performance of radix-2 one-dimensional FFT (Fast Fourier Transform) based on domestic SW 26010 processor

An implementation method and processor technology, applied in the field of Fourier transforms, that addresses the low performance of existing FFT libraries on this platform by relieving the memory access bandwidth limitation and improving computational performance.

Active Publication Date: 2017-07-07
INST OF SOFTWARE - CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

[0008] Problem solved by the invention: the prior-art open-source FFTW library performs poorly when applied directly to this platform. The invention overcomes this by providing a high-performance implementation method for the radix-2 one-dimensional fast Fourier transform on the domestic Shenwei 26010 processor. It designs several high-performance optimization methods and proposes a two-layer decomposition FFT algorithm structure that applies effectively to radix-2 one-dimensional FFT computation and substantially improves the performance of the FFT library.

Examples

Embodiment Construction

[0031] As shown in Figure 1, the present invention is a high-performance implementation method for the radix-2 one-dimensional FFT on the domestic Shenwei 26010 processor. The design framework comprises four layers: the interface layer, the master core layer, the slave core layer, and the kernel layer. The calling order is interface layer, master core layer, slave core layer, kernel layer, and the slave core layer calls the kernel layer multiple times. The interface layer creates a descriptor containing information such as the input data size and data dimension. The master core layer acts on the descriptor information: when the input data size is greater than or equal to 512, it decomposes the input sequence; when the input data size is less than or equal to 256, it performs the FFT calculation directly on the master core. The slave core layer is responsible for reading and storing main memory data and local memory data according to the data decomposition results of the master ...
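
The size-based dispatch described in this paragraph can be sketched roughly as follows. This is only an illustration: the thresholds 512 and 256 come from the text, while the function names and return labels are hypothetical and do not come from the patent.

```python
# Hypothetical sketch of the master core layer's size-based dispatch.
# The thresholds (>= 512 and <= 256) are taken from the description;
# every name below is illustrative only.

def is_power_of_two(n: int) -> bool:
    """Radix-2 FFT sizes are powers of two."""
    return n > 0 and (n & (n - 1)) == 0


def choose_path(n: int) -> str:
    """Return which path handles a radix-2 input of size n."""
    if not is_power_of_two(n):
        raise ValueError("radix-2 FFT requires a power-of-two size")
    if n >= 512:
        # The master core decomposes the sequence for the slave cores.
        return "decompose"
    # n <= 256: the FFT is computed directly on the master core.
    return "master"
```

Because radix-2 sizes are powers of two, no size falls between 256 and 512, so the two conditions in the text cover every valid input.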

Abstract

The invention provides a high-performance implementation method for the radix-2 one-dimensional FFT (Fast Fourier Transform) on the domestic SW26010 processor. For the SW26010 platform, it designs several optimization techniques: a row/column register communication mechanism among the slave cores, a double-buffering mechanism that overlaps memory access with computation, and 256-bit SIMD (single instruction, multiple data) vector operations. It also proposes a Stockham FFT computation framework based on a two-layer decomposition whose decomposition rule is the Cooley-Tukey algorithm, and designs a four-layer framework (interface layer, master core layer, slave core layer, and kernel layer) for computing the radix-2 one-dimensional FFT. Together these effectively relieve the memory access bandwidth limitation of FFT computation and improve the performance of the radix-2 one-dimensional FFT. Compared with the open-source FFTW (Fastest Fourier Transform in the West) library on this platform, performance is greatly improved: measured in floating-point operations per second, the average speedup is 34.4 and the maximum speedup reaches 50.3.
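
As a rough illustration of a two-layer Cooley-Tukey decomposition with a Stockham kernel, the sketch below splits a length-N transform into N2 sub-transforms of length N1, applies twiddle factors, and then runs N1 sub-transforms of length N2. It is a generic textbook reading of the abstract, not the patented implementation: the function names and the choice of N1 are assumptions, and it models none of the slave-core parallelism, register communication, double buffering, or SIMD.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (out of place, no bit reversal)."""
    n = len(x)
    src, dst = list(x), [0j] * n
    length, stride = n, 1
    while length > 1:
        half = length // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / length)  # twiddle factor
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src  # ping-pong between the two buffers
        length, stride = half, stride * 2
    return src


def two_layer_fft(x, n1):
    """Cooley-Tukey split N = n1 * n2; both factors must be powers of two."""
    n = len(x)
    if n1 <= 0 or n % n1 != 0:
        raise ValueError("n1 must divide the input length")
    n2 = n // n1
    # Layer 1: n2 FFTs of length n1 over the strided sub-sequences.
    cols = [stockham_fft(x[j::n2]) for j in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Twiddle multiplication between the two layers.
        row = [cols[j][k1] * cmath.exp(-2j * cmath.pi * j * k1 / n)
               for j in range(n2)]
        # Layer 2: one FFT of length n2 per k1.
        res = stockham_fft(row)
        for k2 in range(n2):
            out[k1 + n1 * k2] = res[k2]
    return out
```

For example, `two_layer_fft(x, 8)` on a 64-point input performs eight 8-point FFTs, a twiddle pass, and eight more 8-point FFTs. In a setting like the one the abstract describes, each sub-FFT would be small enough to fit in a core's local memory.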

Description

Technical field

[0001] The invention belongs to the field of Fourier transforms, and in particular relates to a high-performance implementation method for the radix-2 one-dimensional FFT on the domestic Shenwei 26010 processor.

Background art

[0002] The Fast Fourier Transform (FFT) is a fast algorithm for computing the discrete Fourier transform. The Discrete Fourier Transform (DFT) is the form of the Fourier transform that is discrete in both the time domain and the frequency domain; it transforms time-domain samples of a signal into frequency-domain samples of its discrete-time Fourier transform. The DFT converts continuous, complex problems in natural science and engineering into discrete, simple operations. For a one-dimensional input sequence of size N, the DFT is computed as follows:

[0003] [formula not reproduced in the source text]

[0004] where $\omega_N$ is the twiddle factor, $\omega_N = e^{-i 2\pi / N}$, and $e^{ix} = \cos x + i \sin x$. It can be seen fro...
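
For reference, the DFT of a length-N sequence is conventionally written as below, using the twiddle factor defined in paragraph [0004]. The symbols $x(n)$, $X(k)$, $n$, and $k$ are assumed here, since the patent's own formula is not reproduced in this text and may use different notation.

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\,\omega_N^{nk},
\qquad k = 0, 1, \dots, N-1,
\qquad \omega_N = e^{-i 2\pi / N}
```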

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/14
CPC: G06F17/142
Inventors: 张佳佳, 杨超, 尹万旺, 赵玉文, 魏迪, 刘芳芳, 袁欣辉
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI