FFTW3 optimization method based on the Loongson 3B processor

An FFTW3 optimization method for the Loongson (Godson) 3B processor, in the field of complex mathematical operations. It addresses the problem that general-purpose FFTW3 is not optimized for the Loongson 3B processor and does not make good use of its hardware characteristics.

Active Publication Date: 2014-07-02
INST OF ADVANCED TECH UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0003] Currently, FFTW3 is not optimized for the Loongson 3B processor, so the general FFTW3 simply transpla




Embodiment Construction

[0046] The purpose of the present invention is to propose an FFTW3 optimization method that overcomes the low running performance of general-purpose FFTW3, which is not optimized for the hardware characteristics of the Loongson 3B processor.

[0047] The Loongson 3B processor supports the MIPS64 instruction set and the Loongson extended instruction set. It has a 9-stage super-pipelined, four-issue out-of-order architecture with 2 fixed-point units, 2 floating-point units and 1 memory access unit; each floating-point unit supports 256-bit vector operations. The present invention ports FFTW3, version fftw-3.3.3 (the latest release at the time), to the Loongson 3B processor.
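As a rough consistency check, the hardware figures in this paragraph can be reconciled with the 8 cores, 1 GHz clock and 128 GFLOPS peak quoted in the Background paragraph [0002]. The sketch below is illustrative arithmetic only. It assumes, beyond what the source states, that the unit counts are per core, that each vector lane holds a 64-bit double, and that each lane completes one fused multiply-add per cycle, counted as 2 floating-point operations.

```python
# Figures quoted in this document for Loongson 3B.
CORES = 8                # "8-core processor"
CLOCK_GHZ = 1.0          # "main frequency of 1GHz"
FP_UNITS_PER_CORE = 2    # "2 floating-point units" (assumed to be per core)
VECTOR_BITS = 256        # "256-bit vector operations"

# Assumptions not stated in the source: 64-bit double-precision lanes,
# and one fused multiply-add per lane per cycle, counted as 2 flops.
LANE_BITS = 64
FLOPS_PER_LANE_PER_CYCLE = 2

lanes = VECTOR_BITS // LANE_BITS
flops_per_core_per_cycle = FP_UNITS_PER_CORE * lanes * FLOPS_PER_LANE_PER_CYCLE
peak_gflops = CORES * flops_per_core_per_cycle * CLOCK_GHZ

print(lanes, flops_per_core_per_cycle, peak_gflops)
```

Under these assumptions each vector instruction handles 4 doubles, and the computed peak matches the 128 GFLOPS figure, which suggests that FFT code must keep both vector units busy to approach peak throughput.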

[0048] The FFTW3 optimization method for Loongson 3B combines vector instructions, the Cooley-Tukey algorithm, and separate processing of the real and imaginary parts to optimize the discrete Fourier transform functions, according to the following cases:
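For readers unfamiliar with the Cooley-Tukey algorithm named above, the following is a minimal sketch of the general idea: a DFT of composite size n = n1 * n2 is split into n1 smaller DFTs of size n2, whose results are recombined with twiddle factors. This is a textbook formulation, not the patented implementation, and it uses no vector instructions. The function names `naive_dft`, `smallest_factor` and `cooley_tukey_dft` are hypothetical.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT, used as the base case and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def smallest_factor(n):
    """Return the smallest factor of n that is >= 2, or n itself if none."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def cooley_tukey_dft(x):
    """Mixed-radix decimation-in-time DFT for composite lengths."""
    n = len(x)
    n1 = smallest_factor(n)
    if n1 == n:
        # Prime (or trivial) length: no factorization available.
        return naive_dft(x)
    n2 = n // n1
    # n1 interleaved sub-sequences x[r], x[r + n1], ..., each of length n2.
    subs = [cooley_tukey_dft(x[r::n1]) for r in range(n1)]
    out = [0j] * n
    for k in range(n):
        k2 = k % n2  # sub-transforms are periodic with period n2
        out[k] = sum(
            subs[r][k2] * cmath.exp(-2j * cmath.pi * r * k / n)
            for r in range(n1)
        )
    return out


signal = [complex(i, -i / 2) for i in range(12)]  # 12 = 2 * 2 * 3
fast = cooley_tukey_dft(signal)
reference = naive_dft(signal)
max_error = max(abs(a - b) for a, b in zip(fast, reference))
```

The recombination loop is where an optimized library would typically apply vector instructions, since the same twiddle-factor multiplication is repeated across many output indices.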

[0049] Situation 1: Op...



Abstract

The invention discloses an FFTW3 optimization method based on the Loongson 3B processor. For complex discrete Fourier transforms whose size is a composite number, the method optimizes the computation using vector instructions and the Cooley-Tukey algorithm. For real discrete Fourier transforms, it optimizes the computation using vector instructions and separate processing of the real and imaginary parts. The method effectively improves the running performance of FFTW3 on the Loongson 3B processor, yielding an efficient FFTW3 on that platform.
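The record does not spell out how the real and imaginary parts are processed separately. One plausible reading, sketched below as an assumption rather than the patented method, is that the real and imaginary parts of the output are accumulated in two separate arrays instead of as interleaved complex values. For real input the output is conjugate-symmetric, so only the first n // 2 + 1 bins need to be computed. The function name `real_dft_split` is hypothetical.

```python
import math


def real_dft_split(x):
    """DFT of a real sequence, returning separate real and imaginary arrays.

    Only bins 0 .. n // 2 are computed; the remaining bins follow from
    X[n - k] = conj(X[k]) for real input.
    """
    n = len(x)
    half = n // 2 + 1
    re = [0.0] * half
    im = [0.0] * half
    for k in range(half):
        for j in range(n):
            angle = 2.0 * math.pi * j * k / n
            re[k] += x[j] * math.cos(angle)  # real part: cosine terms
            im[k] -= x[j] * math.sin(angle)  # imaginary part: sine terms
    return re, im


re_part, im_part = real_dft_split([1.0, 2.0, 0.0, -1.0])
```

Keeping the two parts in separate contiguous arrays is a common layout choice for fixed-width vector units, because each vector register then holds values of a single kind and the cosine and sine accumulations proceed as independent streams.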

Description

Technical field

[0001] The invention belongs to the technical field of electrical digital data processing, and in particular relates to an FFTW3 optimization method on the Loongson 3B processor.

Background

[0002] Loongson 3B is the first domestic (Chinese) commercial 8-core processor. It has a main frequency of 1 GHz, supports vector computing acceleration, and has a peak computing capacity of 128 GFLOPS with a high performance-to-power ratio. Loongson 3B is mainly used in high-performance computers, high-performance servers, digital signal processing and other fields. FFTW (the Fastest Fourier Transform in the West) is a standard C library for fast computation of the discrete Fourier transform (DFT), developed by M. Frigo and S. Johnson of MIT. It computes DFTs of one-dimensional or multi-dimensional real and complex data of arbitrary size. FFTW3 is a new version developed on the basis of FFTW. It adds parallel transform...


Application Information

IPC(8): G06F17/14
Inventors: 顾乃杰, 王小乐, 张明, 任开新
Owner: INST OF ADVANCED TECH UNIV OF SCI & TECH OF CHINA