Optimization method for efficient parallel FFT implementation on the Loongson 3 processor

An optimization method for the Loongson 3 processor, applied in the field of complex mathematical operations. It addresses the problem that existing parallel FFT implementations do not achieve a good speedup on this processor, and has the effect of improving the cache hit rate, increasing computation speed, and improving memory access performance.

Status: Inactive
Publication Date: 2014-03-26
合肥康捷信息科技有限公司
Cites: 7. Cited by: 4.

AI Technical Summary

Problems solved by technology

The parallel FFT algorithm currently used in practice has not been specifically optimized for the Loongson 3 processor, so the


Examples

Embodiment Construction

[0023] The present invention is a method for optimizing an efficient parallel FFT implementation on the Loongson 3 processor. First, a parallel FFT algorithm based on shared-memory programming is implemented on the Loongson 3A / 3B. Next, the source input vector is divided into smaller sub-vectors so that the data block processed by each thread is between half the size of the first-level cache and half the size of the second-level cache. Then, over a given range of data lengths, a performance test is run for every candidate block size between half of the first-level cache and half of the second-level cache, and the block size that is most suitable across those lengths is selected. Finally, the principle of data locality is used to optimize the parallel FFT algorithm.
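
The block-size search described in [0023] can be sketched as follows. This is a minimal illustration, not the patent's procedure: the cache sizes (64 KB first-level, 4 MB second-level), the restriction to power-of-two candidates, and the names `candidate_block_sizes`, `pick_block_size`, and `cost` are all assumptions introduced here. In a real test, `cost` would time the parallel FFT for input length `n` and block size `b`.

```python
from typing import Callable, Iterable, List

def candidate_block_sizes(l1_bytes: int, l2_bytes: int) -> List[int]:
    """Power-of-two block sizes from half of L1 up to half of L2 (inclusive)."""
    lo, hi = l1_bytes // 2, l2_bytes // 2
    size = 1
    while size < lo:              # round the lower bound up to a power of two
        size *= 2
    sizes = []
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes

def pick_block_size(lengths: Iterable[int], sizes: List[int],
                    cost: Callable[[int, int], float]) -> int:
    """Return the block size with the lowest total cost over all test lengths."""
    lengths = list(lengths)
    return min(sizes, key=lambda b: sum(cost(n, b) for n in lengths))

# Assumed cache sizes, for illustration only.
L1_BYTES = 64 * 1024
L2_BYTES = 4 * 1024 * 1024
SIZES = candidate_block_sizes(L1_BYTES, L2_BYTES)
```

Summing the cost over all tested lengths is one possible way to pick a single block size that works for any length; other criteria (for example, the worst case) would fit the same structure.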

[0024] In this embodiment, the improved parallel FFT algorithm based on OpenMP programming is first ported to the Loongson 3 processor; then the source vector is divided according to a block size of 32*1024...
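
One way blocking can improve data locality in a radix-2 FFT is sketched below. It is an illustrative assumption, not the embodiment's code: after bit-reversal, the early decimation-in-time stages only touch elements inside one aligned block, so each block can be processed while it stays in cache (and, in an OpenMP version, by its own thread). The function names and the plain sequential Python are hypothetical; both `len(x)` and `block` are assumed to be powers of two.

```python
import cmath

def bit_reverse_permute(x):
    """Return a copy of x in bit-reversed index order (len(x) a power of two)."""
    n = len(x)
    bits = n.bit_length() - 1
    return [x[int(format(i, "0{}b".format(bits))[::-1], 2)] for i in range(n)]

def butterflies(a, start, stop, span):
    """Apply one radix-2 stage with half-size `span` to a[start:stop] in place."""
    for base in range(start, stop, 2 * span):
        for k in range(span):
            w = cmath.exp(-2j * cmath.pi * k / (2 * span))
            u, v = a[base + k], w * a[base + k + span]
            a[base + k], a[base + k + span] = u + v, u - v

def blocked_fft(x, block):
    """Radix-2 FFT that runs the cache-local stages block by block."""
    n = len(x)
    a = bit_reverse_permute(x)
    block = min(block, n)
    # Phase 1: stages whose butterflies fit inside one block. The blocks are
    # independent of each other, so each could be handed to a separate thread.
    for start in range(0, n, block):
        span = 1
        while 2 * span <= block:
            butterflies(a, start, start + block, span)
            span *= 2
    # Phase 2: the remaining stages combine data across blocks.
    span = block
    while span < n:
        butterflies(a, 0, n, span)
        span *= 2
    return a
```

With `block == len(x)` this reduces to an ordinary iterative radix-2 FFT; smaller blocks change only the order in which butterflies are executed, not the result.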

Abstract

The invention discloses an optimization method for the efficient parallel implementation of FFT on the Loongson 3 processor. The method uses radix-2 butterfly operations and comprises the following steps: first, initial parameters are set; second, the number of stages of the FFT is obtained; third, all twiddle factors are computed; fourth, the input is divided into sub-vectors and it is determined whether blocking is to be applied; fifth, blocking is carried out. The method addresses the low speedup of existing parallel FFT algorithms on the Loongson 3 processor and achieves an efficient parallel FFT on that processor.
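
The first steps listed in the abstract (parameters, stage count, twiddle factors, and the blocking decision) can be sketched as below. This is a hedged illustration only: `make_plan`, `fft_radix2`, the `block_elems` parameter, and the rule "block when the input is longer than one block" are assumptions made here, not details given in the abstract.

```python
import cmath

def make_plan(n, block_elems):
    """Return (stages, twiddles, use_blocking) for a length-n radix-2 FFT."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1                  # number of stages = log2(n)
    # All twiddle factors W_n^k = exp(-2*pi*i*k/n), computed once up front.
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    use_blocking = n > block_elems               # hypothetical decision rule
    return stages, twiddles, use_blocking

def fft_radix2(x, twiddles, stages):
    """Iterative radix-2 FFT that looks up twiddles in the shared table."""
    n = len(x)
    # Bit-reversal reordering of the input.
    a = [x[int(format(i, "0{}b".format(stages))[::-1], 2)] for i in range(n)]
    span = 1
    for _ in range(stages):
        stride = n // (2 * span)                 # step through the twiddle table
        for base in range(0, n, 2 * span):
            for k in range(span):
                u = a[base + k]
                v = twiddles[k * stride] * a[base + k + span]
                a[base + k], a[base + k + span] = u + v, u - v
        span *= 2
    return a
```

For example, `make_plan(8, 4)` gives three stages and four twiddle factors, and flags the input for blocking because it is longer than the assumed block of four elements.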

Description

Technical field

[0001] The invention belongs to the technical field of electrical digital data processing, and in particular relates to an optimization method for the efficient parallel implementation of FFT on the Loongson 3 processor.

Background

[0002] The Loongson 3 processor is a high-performance general-purpose RISC processor developed in China by the Institute of Computing Technology, Chinese Academy of Sciences. It is based on the MIPS instruction set and features high integration, high performance, low power consumption, and low cost. The Loongson 3 family includes the quad-core Loongson 3A and the eight-core Loongson 3B, which mainly target high-performance computers and high-end servers. The Fast Fourier Transform (FFT) is one of the most effective algorithms used in computer and digital systems, and is widely used in speech signal processing, image processing, power spectrum e...

Application Information

IPC(8): G06F17/14
Inventors: 顾乃杰, 江国荐, 任开新
Owner: 合肥康捷信息科技有限公司