Optimization method based on 4-bit common convolution calculation

An optimization method for convolution calculation, applied in the field of image recognition. It addresses slow computation speed and achieves a speed improvement with simple steps, optimizing existing techniques.

Pending Publication Date: 2022-06-03
北京君正集成电路股份有限公司

AI Technical Summary

Problems solved by technology

And on the Ingenic chip, for example, if you us...

Examples

Embodiment 1

[0353] At this point, the SIMD instruction algorithm is about 20 times faster than the pure C algorithm.

[0354] Specifically, as shown in Figure 8, the method according to the first embodiment of the present invention includes the following steps:

[0355] S1, let the input data indata be a set of data with input depth in_depth = 32, width in_width = 512, and height in_height = 512; let the convolution kernel data filter_data be a set of data with output depth out_depth = 128, input depth in_depth = 32 (the same as the depth of the input data), convolution kernel width ft_w = 3, and convolution kernel height ft_h = 3;

[0356] Let the output data be the feature map outdata, with depth out_depth, width out_width, and height out_height; the convolution calculation uses a step size, denoted stride;
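The setup in S1 can be sketched as a pure C baseline, the kind of implementation the quoted speedup is measured against. This is a minimal sketch under stated assumptions, not the patent's code: the source does not specify padding or memory layout, so zero padding and the array layouts named in the comments are hypothetical, as are the helper names conv_out_size and conv_ref. Each 4-bit value is assumed to be stored in its own int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Output extent along one axis, assuming NO padding (an assumption;
   the source does not state the padding scheme). */
static int conv_out_size(int in_size, int ft_size, int stride) {
    assert(stride > 0 && in_size >= ft_size);
    return (in_size - ft_size) / stride + 1;
}

/* Pure C reference convolution.
   Assumed (hypothetical) layouts:
     indata      [in_height][in_width][in_depth]
     filter_data [out_depth][ft_h][ft_w][in_depth]
     outdata     [out_height][out_width][out_depth]
   4-bit values are assumed to be stored one per int8_t. */
static void conv_ref(const int8_t *indata, const int8_t *filter_data,
                     int32_t *outdata,
                     int in_depth, int in_width, int in_height,
                     int out_depth, int ft_w, int ft_h, int stride) {
    int out_width = conv_out_size(in_width, ft_w, stride);
    int out_height = conv_out_size(in_height, ft_h, stride);

    for (int oh = 0; oh < out_height; oh++) {
        for (int ow = 0; ow < out_width; ow++) {
            for (int od = 0; od < out_depth; od++) {
                int32_t sum = 0;
                /* Innermost loops: multiply and accumulate over the window. */
                for (int fh = 0; fh < ft_h; fh++) {
                    for (int fw = 0; fw < ft_w; fw++) {
                        int ih = oh * stride + fh;
                        int iw = ow * stride + fw;
                        const int8_t *in = indata + (ih * in_width + iw) * in_depth;
                        const int8_t *ft = filter_data +
                            ((od * ft_h + fh) * ft_w + fw) * in_depth;
                        for (int d = 0; d < in_depth; d++) {
                            sum += (int32_t)in[d] * (int32_t)ft[d];
                        }
                    }
                }
                outdata[(oh * out_width + ow) * out_depth + od] = sum;
            }
        }
    }
}
```

With the dimensions given in S1 and a stride of 1, this assumption yields out_width = out_height = 510; with padding the figures would differ.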

[0357] Set simd type variables: sum_0, sum_1, in_value, in_0, ft_0, vrt1, vrt2, muls, mul_0, mul_1; other param...

Embodiment 2

[0418] S7.3, perform fn=fn+1, and return to step S7.1.

[0419] As shown in Figure 11, in the second embodiment of the present invention, the method may also comprise the following steps:

[0420] S1, let the input data indata be a set of data with input depth in_depth = 32, width in_width = 512, and height in_height = 512; let the convolution kernel data filter_data be a set of data with output depth out_depth = 128, input depth in_depth = 32 (the same as the depth of the input data), convolution kernel width ft_w = 3, and convolution kernel height ft_h = 3;

[0421] Let the output data be the feature map outdata, with depth out_depth, width out_width, and height out_height; the convolution calculation uses a step size, denoted stride;

[0422] Set simd type variables: sum_0, sum_1, sum_20, sum_21, in_value, in_value1, in_0, in_1, ft_0, vrt1, vrt2, mul_0, other parameters are pointers or specific conventional da...

Abstract

The invention provides an optimization method based on 4-bit common convolution calculation. Within a complete convolution calculation, SIMD instructions are added to the innermost loop of the algorithm. Data are loaded in the innermost loop by a data-loading SIMD instruction; once loaded, the data remain in the register and are not loaded again. In the innermost loop, the data are reused by means of a copy SIMD instruction, and 8 pieces of 16-bit data are finally stored in a 128-bit register by means of a multiplication SIMD instruction, a selection SIMD instruction and a shift SIMD instruction. The method loads 16 pieces of data at a time and calculates 16 results at a time: one piece of each loaded batch is copied into a SIMD variable, an 8-bit multiplication SIMD instruction is executed, and after conversion to 16 bits the results are accumulated by a SIMD instruction, so that multiplication and accumulation are both carried out in the innermost loop of the algorithm. The method is simple, and is about 10 to 20 times faster than a pure C algorithm.
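The load, copy, multiply, widen and accumulate pattern described in the abstract can be illustrated with a scalar emulation in plain C. This is only a sketch under stated assumptions and does not use the chip's real SIMD intrinsics: the types vec8x16 and vec16x8, the helper functions, the weight layout (16 output channels interleaved per step) and the signed 4-bit value range [-8, 7] are all hypothetical. Under that range each product lies in [-56, 64], which is why an 8-bit multiply suffices before widening to 16 bits. The widening is emulated here by simple sign extension, whereas the source performs it with multiplication, selection and shift instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated 128-bit registers (plain structs, NOT real intrinsics). */
typedef struct { int8_t  v[16]; } vec8x16;  /* 16 lanes of 8 bits */
typedef struct { int16_t v[8];  } vec16x8;  /* 8 lanes of 16 bits */

/* Emulates a data-loading SIMD instruction: 16 bytes at once. */
static vec8x16 vec_load(const int8_t *p) {
    vec8x16 r;
    for (int i = 0; i < 16; i++) r.v[i] = p[i];
    return r;
}

/* Emulates a copy (broadcast) SIMD instruction: one value to all lanes. */
static vec8x16 vec_broadcast(int8_t x) {
    vec8x16 r;
    for (int i = 0; i < 16; i++) r.v[i] = x;
    return r;
}

/* Emulates a lane-wise 8-bit multiply. Operands are assumed to be
   signed 4-bit values, so every product fits in 8 bits. */
static vec8x16 vec_mul8(vec8x16 a, vec8x16 b) {
    vec8x16 r;
    for (int i = 0; i < 16; i++) {
        assert(a.v[i] >= -8 && a.v[i] <= 7 && b.v[i] >= -8 && b.v[i] <= 7);
        r.v[i] = (int8_t)(a.v[i] * b.v[i]);
    }
    return r;
}

/* Widens 8 of the 16 products to 16 bits and accumulates them.
   offset = 0 takes the low half, offset = 8 the high half. */
static void vec_acc16(vec16x8 *sum, vec8x16 mul, int offset) {
    for (int i = 0; i < 8; i++) sum->v[i] += (int16_t)mul.v[offset + i];
}

/* Computes 16 results at once: out[lane] = sum_i in[i] * ft[i*16 + lane]. */
static void mac16(const int8_t *in, const int8_t *ft, int n, int16_t out[16]) {
    vec16x8 sum_0 = {{0}};  /* results for lanes 0..7  */
    vec16x8 sum_1 = {{0}};  /* results for lanes 8..15 */
    for (int i = 0; i < n; i++) {
        vec8x16 in_0 = vec_broadcast(in[i]);    /* reuse one input value */
        vec8x16 ft_0 = vec_load(ft + i * 16);   /* 16 weights per load   */
        vec8x16 mul_0 = vec_mul8(in_0, ft_0);
        vec_acc16(&sum_0, mul_0, 0);
        vec_acc16(&sum_1, mul_0, 8);
    }
    for (int i = 0; i < 8; i++) {
        out[i] = sum_0.v[i];
        out[8 + i] = sum_1.v[i];
    }
}
```

Because a 128-bit register holds only 8 values of 16 bits, the 16 results occupy two accumulators, named sum_0 and sum_1 here. Under the assumed value range, a 3 x 3 x 32 window accumulates at most 288 x 64 = 18432 in magnitude, which stays within a signed 16-bit lane.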

Description

Technical field

[0001] The invention relates to the technical field of image recognition, in particular to an optimization method based on 4-bit ordinary convolution calculation.

Background art

[0002] With the development of the times, image recognition technology has become increasingly common, and there are various optimization methods for it, in particular for convolution calculation. Current optimization methods include, for example, methods designed around the SIMD instruction sets of T- and X-series chips such as the Beijing Ingenic T30 and T31. Such algorithms are suited to vector instructions. The registers of the T30 and T31 are 128-bit registers and their number is limited, so the number of registers must be considered in the optimization design. Moreover, on Beijing Ingenic chips, for example, the speed will be very slow if the C program is used direc...

Application Information

IPC(8): G06T1/00
CPC: G06T1/00, Y02D10/00
Inventors: 田凤彬, 于晓静
Owner: 北京君正集成电路股份有限公司