An image target recognition method based on a deep learning neural network model

CN119580068B · Active · Publication Date: 2026-06-23 · BEIJING AEROSPACE AUTOMATIC CONTROL RES INST

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING AEROSPACE AUTOMATIC CONTROL RES INST
Filing Date
2024-11-21
Publication Date
2026-06-23


Abstract

The application provides an image target recognition method, comprising: inputting images in an image set into a trained floating-point deep neural network model, and obtaining the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model; determining the parameters of an asymmetric quantization relationship between the floating-point numbers output by each hidden layer and fixed-point numbers, according to that value range in combination with the target bit width of the quantized fixed-point numbers; determining the asymmetric quantization relationship between the floating-point numbers output by each hidden layer and the fixed-point numbers; replacing the ReLU function after each hidden layer of the floating-point deep neural network model with the asymmetric quantization relationship corresponding to that hidden layer to obtain a fixed-point neural network model; and inputting a to-be-detected image into the fixed-point neural network model, converting the floating-point numbers output by each hidden layer in the fixed-point neural network model into fixed-point numbers, and completing image target recognition.

Description

Technical Field

[0001] This invention belongs to the field of image recognition technology, and specifically relates to an image target recognition method based on a deep learning neural network model.

Background Technology

[0002] Deep learning network models contain a large number of parameters and computations. During training on a server (e.g., on the ground), to ensure the accuracy of the deep learning network model, the initial convolutional kernels of each convolutional layer, the initial bias matrices of each convolutional layer, the initial weight matrices of fully connected layers, and the initial bias vectors of fully connected layers typically use 32-bit floating-point data format. When using a trained deep learning network model at the edge (e.g., in an aircraft), due to strict limitations in storage space and computing resources, it is necessary to use low-bit-width fixed-point data format for inference computation to reduce computational consumption while maintaining accuracy. The method of converting high-bit-width data in a deep learning network model into low-bit-width data is called quantization.

[0003] In deep learning neural networks, activation functions play a crucial role in activating hidden nodes to produce more desirable outputs. The Rectified Linear Unit (ReLU) is frequently used for the outputs of hidden neurons in neural networks. Especially in edge-deployed networks, ReLU computation is simple to implement, efficiently adaptable to hardware, and its sparse activation property helps reduce redundancy in the neural network, making it more efficient. Therefore, ReLU is often used in neural network algorithms deployed at the edge.

[0004] In traditional edge deep learning neural networks, as shown in Figure 1, the ReLU function and the quantization module are two separate steps, and each convolutional layer requires both a ReLU module and a quantization module. This results in a complex neural network structure, more computational steps, and low computational efficiency. Deep learning neural networks are widely used in image target recognition (for example, on infrared and SAR images). Image processing involves large amounts of data and requires more computing power to meet the real-time requirements of target recognition. Therefore, it is necessary to provide efficient image target recognition methods.

Summary of the Invention

[0005] In order to overcome the shortcomings of the existing technology, the inventors have conducted intensive research and provided an image target recognition method based on a deep learning neural network model. By converting the floating-point deep learning neural network model into a fixed-point neural network model, the method can ensure the efficiency of image target recognition while also achieving high accuracy.

[0006] The technical solution provided by this invention is as follows:

[0007] In a first aspect, an image target recognition method includes:

[0008] Images from the image set are input into the trained floating-point deep neural network model to obtain the value range of floating-point numbers output by each hidden layer of the floating-point deep neural network model; the floating-point deep neural network model is a deep neural network model whose model parameters use floating-point format and whose output data is in floating-point format;

[0009] Based on the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model, and combined with the target bit width of the quantized fixed-point number, the parameters in the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer are determined:

[0010] $S = \frac{R_{max}}{2^{b} - 1}$

[0011] $Z = -2^{b-1}$

[0012] where $S$ represents the quantization step size, $Z$ represents the quantization zero point, $b$ represents the target bit width of the quantized fixed-point number, and $R_{max}$ represents the maximum floating-point number output by the hidden layer;

[0013] The asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer is determined as $x_{float} = S(x_{int} - Z)$, where $x_{float}$ is the floating-point number and $x_{int}$ is the fixed-point number;

[0014] After each hidden layer of the floating-point deep neural network model, the ReLU function is replaced by the asymmetric quantization relationship between the output floating-point number and fixed-point number corresponding to each hidden layer to obtain the fixed-point neural network model.

[0015] The image to be tested is input into a fixed-point neural network model. The floating-point numbers output by each hidden layer in the fixed-point neural network model are converted into fixed-point numbers, and the image target recognition is completed.
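The parameter step above can be illustrated with a short Python sketch. It assumes the step size is S = R_max / (2^b - 1), as in the 8-bit example given later in the description, and that floating-point and fixed-point values are related by x_float = S(x_int - Z). The function names `relu_fused_params` and `dequantize` and the value R_max = 127.5 are hypothetical, chosen only so that the step size is exactly representable.

```python
from typing import Tuple


def relu_fused_params(r_max: float, bits: int) -> Tuple[float, int]:
    """Return (S, Z) for a hidden layer whose largest observed output is r_max.

    Assumes S = R_max / (2^b - 1) and Z = -2^(b-1), so that the
    floating-point value 0 lands on the smallest fixed-point code.
    """
    if r_max <= 0:
        raise ValueError("r_max must be positive")
    step = r_max / (2 ** bits - 1)
    zero_point = -(2 ** (bits - 1))
    return step, zero_point


def dequantize(x_int: int, step: float, zero_point: int) -> float:
    """Map a fixed-point code back to floating point: x_float = S * (x_int - Z)."""
    return step * (x_int - zero_point)


# Hypothetical calibration result for one hidden layer.
step, zero = relu_fused_params(127.5, 8)
print(step, zero)                    # 0.5 -128
print(dequantize(-128, step, zero))  # 0.0   (smallest code maps to 0)
print(dequantize(127, step, zero))   # 127.5 (largest code maps to R_max)
```

With these parameters the full fixed-point range [-128, 127] covers exactly the floating-point interval [0, R_max], which is the range a ReLU output can occupy.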

[0016] In conjunction with the first aspect, in the step of inputting the images in the image set into the trained floating-point deep neural network model, the images in the image set and the images to be tested are of the same type.

[0017] In conjunction with the first aspect, the step of inputting the image to be tested into the fixed-point neural network model and converting the floating-point numbers output by each hidden layer of the fixed-point neural network model into fixed-point numbers includes:

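[0018] Affine transformation: $x_1 = \frac{x_{float}}{S} + Z$;

[0019] Rounding: $x_2 = \mathrm{Round}(x_1)$, where Round represents the rounding function;

[0020] Truncation: $x_{int} = \min(\max(x_2, -2^{b-1}), 2^{b-1} - 1)$;

[0021] where $x_1$ and $x_2$ are intermediate process variables.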
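The three conversion steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the affine step is taken to be x1 = x_float / S + Z (the inverse of x_float = S(x_int - Z)), the truncation bounds are taken to be -2^(b-1) and 2^(b-1) - 1, and rounding is implemented as round-half-up because the text does not specify a rounding mode. The function name `quantize` and the sample values are hypothetical.

```python
import math


def quantize(x_float: float, step: float, zero_point: int, bits: int) -> int:
    """Convert one floating-point value to a b-bit fixed-point code."""
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    x1 = x_float / step + zero_point   # 1. affine transformation
    x2 = math.floor(x1 + 0.5)          # 2. rounding (half-up, an assumption)
    return max(q_min, min(q_max, x2))  # 3. truncation to [q_min, q_max]


# Hypothetical parameters: R_max = 127.5 and b = 8 give S = 0.5, Z = -128.
S, Z, BITS = 0.5, -128, 8
codes = [quantize(x, S, Z, BITS) for x in (-3.0, 0.0, 1.2, 127.5, 200.0)]
print(codes)  # [-128, -128, -126, 127, 127]
```

Negative inputs and zero both land on the lowest code, and inputs above R_max saturate at the highest code.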

[0022] In a second aspect, a method for target recognition in aircraft-side images includes:

[0023] Images from the image set are input into a floating-point deep neural network model trained on the server side to obtain the value range of floating-point numbers output by each hidden layer of the floating-point deep neural network model; the floating-point deep neural network model is a deep neural network model whose model parameters use floating-point number format and whose output data is in floating-point number format;

[0024] Based on the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model, and combined with the target bit width of the quantized fixed-point number, the parameters in the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer are determined:

[0025] $S = \frac{R_{max}}{2^{b} - 1}$

[0026] $Z = -2^{b-1}$

[0027] where $S$ represents the quantization step size, $Z$ represents the quantization zero point, $b$ represents the target bit width of the quantized fixed-point number, and $R_{max}$ represents the maximum floating-point number output by the hidden layer;

[0028] The asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer is determined as $x_{float} = S(x_{int} - Z)$, where $x_{float}$ is the floating-point number and $x_{int}$ is the fixed-point number;

[0029] After each hidden layer of the floating-point deep neural network model, the ReLU function is replaced by the asymmetric quantization relationship between the output floating-point number and fixed-point number corresponding to each hidden layer to obtain the fixed-point neural network model.

[0030] A fixed-point neural network model is configured on the aircraft. The aircraft inputs the image to be tested into the fixed-point neural network model. The floating-point numbers output by each hidden layer in the fixed-point neural network model are converted into fixed-point numbers, and the image target recognition is completed.
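One way to picture the split between server and aircraft is that only the per-layer quantization parameters need to travel with the model. The sketch below is purely illustrative: the JSON layout, the layer names `conv1` and `conv2`, and the functions `export_params` and `load_params` are hypothetical and are not described in the source.

```python
import json
from typing import Dict


def export_params(layer_r_max: Dict[str, float], bits: int) -> str:
    """Server side: turn each layer's observed R_max into (S, Z, b) and serialize."""
    params = {}
    for name, r_max in layer_r_max.items():
        params[name] = {
            "S": r_max / (2 ** bits - 1),  # quantization step size
            "Z": -(2 ** (bits - 1)),       # quantization zero point
            "b": bits,                     # target bit width
        }
    return json.dumps(params, sort_keys=True)


def load_params(blob: str) -> Dict[str, Dict[str, float]]:
    """Aircraft side: read back the per-layer parameters."""
    return json.loads(blob)


# Hypothetical value ranges gathered on the server.
server_ranges = {"conv1": 127.5, "conv2": 63.75}
blob = export_params(server_ranges, 8)
onboard = load_params(blob)
print(onboard["conv1"])  # {'S': 0.5, 'Z': -128, 'b': 8}
```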

[0031] In conjunction with the second aspect, in the step of inputting the images in the image set into the floating-point deep neural network model trained on the server, the images in the image set and the images to be tested are of the same type.

[0032] In conjunction with the second aspect, the step in which the aircraft inputs the image to be tested into the fixed-point neural network model and the floating-point numbers output by each hidden layer of the fixed-point neural network model are converted into fixed-point numbers includes:

[0033] Affine transformation: $x_1 = \frac{x_{float}}{S} + Z$;

[0034] Rounding: $x_2 = \mathrm{Round}(x_1)$, where Round represents the rounding function;

[0035] Truncation: $x_{int} = \min(\max(x_2, -2^{b-1}), 2^{b-1} - 1)$;

[0036] where $x_1$ and $x_2$ are intermediate process variables.

[0037] The image target recognition method based on a deep learning neural network model provided by the present invention has the following beneficial effects:

[0038] (1) The image target recognition method based on a deep learning neural network model provided by this invention recognizes that the ReLU activation function after each hidden layer of the deep neural network model is similar to the lower-boundary truncation in the quantization truncation formula. By requiring that the minimum floating-point value 0 be mapped to $Q_{min}$ after the "affine transformation-rounding-truncation" quantization steps, specific values of the quantization step size and quantization zero point are obtained, which yields an updated asymmetric quantization relationship between floating-point and fixed-point numbers. Replacing the ReLU function with this updated relationship lets it serve as both the activation function and the quantization module, improving the efficiency of transmission and subsequent calculations.

[0039] (2) The image target recognition method based on a deep learning neural network model provided by this invention replaces the ReLU function with the updated asymmetric quantization relationship between floating-point and fixed-point numbers, converting the floating-point deep neural network model into a fixed-point neural network model. The fixed-point neural network model inherits the high accuracy of the floating-point deep neural network model, so the efficiency of image target recognition improves while recognition accuracy remains high.

Attached Figure Description

[0040] Figure 1 shows the data processing flow for neurons in a traditional convolutional neural network.

[0041] Figure 2 shows the optimized data processing flow for convolutional neural network neurons in this invention.

Detailed Implementation

[0042] The features and advantages of the present invention will become clearer and more apparent from the following detailed description.

[0043] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0044] When running forward inference of a neural network at the edge, quantization is a necessary step for resource-constrained hardware, and ReLU is a common activation function in deep learning neural networks. The inventors have discovered that for hardware platforms that support asymmetric quantization, the ReLU activation function can be replaced by truncation computation during quantization, thereby reducing the number of network layers during forward inference, decreasing data transmission and computation, and improving the speed of forward inference.

[0045] For the asymmetric quantization part, the relationship between floating-point numbers and fixed-point numbers under uniform asymmetric quantization is as follows:

[0046] $x_{float} = S(x_{int} - Z)$

[0047] The conversion from floating-point numbers to fixed-point numbers includes the following steps:

[0048] 1. Affine transformation: $x_1 = \frac{x_{float}}{S} + Z$;

[0049] 2. Rounding: $x_2 = \mathrm{Round}(x_1)$, where Round represents the rounding function;

[0050] 3. Truncation: $x_{int} = \min(\max(x_2, Q_{min}), Q_{max})$;

[0051] where $S$ represents the quantization step size, $Z$ represents the quantization zero point, $x_{float}$ is the floating-point number, $x_{int}$ is the fixed-point number, $x_1$ and $x_2$ are intermediate process variables, and $b$ is the target bit width of the quantized fixed-point number. $S$ and $Z$ are calculated as follows:

[0052] $S = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}}$

[0053] $Z = Q_{max} - \frac{R_{max}}{S}$

[0054] $R_{max}$ and $R_{min}$ represent the maximum and minimum floating-point values in the current tensor, respectively; $Q_{max}$ ($2^{b-1} - 1$) and $Q_{min}$ ($-2^{b-1}$) represent the maximum and minimum values in the quantized fixed-point tensor.
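For comparison with the ReLU-specific case discussed later, here is a hedged sketch of general asymmetric quantization in which the tensor also contains negative values. It assumes the commonly used forms S = (R_max - R_min) / (Q_max - Q_min) and Z = Q_max - R_max / S, with Z rounded to an integer. These forms, the rounding of Z, the function names and the sample range are assumptions made for illustration.

```python
import math
from typing import Tuple


def general_params(r_min: float, r_max: float, bits: int) -> Tuple[float, int]:
    """Assumed general form: S = (R_max - R_min)/(Q_max - Q_min), Z = Q_max - R_max/S."""
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    step = (r_max - r_min) / (q_max - q_min)
    zero_point = math.floor(q_max - r_max / step + 0.5)  # rounded to an integer
    return step, zero_point


def quantize(x_float: float, step: float, zero_point: int, bits: int) -> int:
    """Affine transformation, rounding (half-up, an assumption), truncation."""
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    x2 = math.floor(x_float / step + zero_point + 0.5)
    return max(q_min, min(q_max, x2))


# Hypothetical tensor range that includes negative values.
step, zero = general_params(-32.0, 31.75, 8)
codes = [quantize(x, step, zero, 8) for x in (-32.0, -1.0, 0.0, 31.75)]
print(step, zero, codes)  # 0.25 0 [-128, -4, 0, 127]
```

In this general case negative inputs keep distinct codes, so the quantizer alone does not behave like ReLU. It takes on the ReLU role only when R_min is fixed at 0.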

[0055] In deep neural network models, activation functions are typically located after the output of each neuron, such as after convolutional layers and fully connected layers. Since computation at the edge requires quantization of the neuron's output to improve transmission and subsequent computation efficiency, asymmetric quantization can be used to replace the ReLU activation function. The truncation operation during quantization introduces nonlinearity, which can be considered a specific form of activation function. This invention introduces this truncation nonlinearity after the neuron's output to replace the traditional ReLU activation function, as shown in Figure 2. This approach preserves the model's expressive power to some extent while reducing computational costs.

[0056] The ReLU activation function is calculated as follows:

[0057] $\mathrm{ReLU}(x) = \max(0, x)$

[0058] It can be observed that the process is similar to the truncation process in quantization, with the lower boundary being truncated. Therefore, it is worth considering adjusting the lower boundary of the truncation to replace the ReLU activation function.

[0059]

[0060] The input image data first passes through convolutional layers for computation. The output data of the convolutional layers is then processed by the ReLU activation function to obtain the neuron activation values. The neuron activation values are typically 32-bit floating-point data, which are converted into b-bit fixed-point data through a quantization module (including affine transformation, rounding, and truncation). Considering that the ReLU activation function is usually connected after convolutional layers and batch normalization layers, the input tensor of the ReLU activation function typically follows a Gaussian distribution. Quantization usually strikes a balance between value range and precision, thereby optimizing the overall performance of the network.

[0061] To make the value range after lower-boundary truncation coincide with the value range of the ReLU activation function, the minimum floating-point value 0 is required to map to $Q_{min}$ after the "affine transformation-rounding-truncation" quantization process. The quantization parameters $S$ and $Z$ are then calculated as follows:

[0062] $S = \frac{R_{max}}{2^{b} - 1}$

[0063] $Z = -2^{b-1}$
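As a worked check, the specialized values S = R_max / (2^b - 1) and Z = -2^(b-1) can be obtained by substituting R_min = 0 into general asymmetric quantization parameters. The derivation below assumes the general parameters take the common form S = (R_max - R_min) / (Q_max - Q_min) and Z = Q_max - R_max / S, with Q_max = 2^(b-1) - 1 and Q_min = -2^(b-1). The general form is an assumption here, not a quotation of the source.

```latex
\begin{aligned}
S &= \frac{R_{max} - 0}{(2^{b-1} - 1) - (-2^{b-1})} = \frac{R_{max}}{2^{b} - 1} \\
Z &= Q_{max} - \frac{R_{max}}{S} = (2^{b-1} - 1) - (2^{b} - 1) = -2^{b-1} = Q_{min} \\
x_{float} = 0 \;&\Rightarrow\; x_1 = \frac{0}{S} + Z = Q_{min} \\
x_{float} < 0 \;&\Rightarrow\; x_1 < Q_{min} \;\Rightarrow\; x_{int} = Q_{min} \text{ after truncation}
\end{aligned}
```

The last two lines show why the lower truncation bound plays the role of ReLU: zero and every negative input end up on the same fixed-point code.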

[0064] By using specific values for S and Z, the lower boundary of quantization can be combined with the nonlinear part of the ReLU activation function, thereby replacing the ReLU activation function in the computation.
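The claim that the quantizer can absorb ReLU can be checked numerically. The sketch below compares the traditional two-step path (ReLU, then quantization) with the fused path (quantization only) over a grid of inputs. It assumes x1 = x_float / S + Z for the affine step, round-half-up rounding, and the hypothetical values b = 8 and R_max = 127.5.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def quantize(x_float: float, step: float, zero_point: int, bits: int) -> int:
    """Affine transformation, rounding (half-up, an assumption), truncation."""
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    x2 = math.floor(x_float / step + zero_point + 0.5)
    return max(q_min, min(q_max, x2))


BITS = 8
R_MAX = 127.5                  # hypothetical calibrated maximum
S = R_MAX / (2 ** BITS - 1)    # quantization step size
Z = -(2 ** (BITS - 1))         # zero point equals Q_min, so 0.0 maps to Q_min

samples = [i * 0.25 for i in range(-800, 801)]                  # -200.0 .. 200.0
separate = [quantize(relu(x), S, Z, BITS) for x in samples]     # ReLU, then quantize
fused = [quantize(x, S, Z, BITS) for x in samples]              # quantize only
print(separate == fused)  # True
```

Every negative input gives x1 below Q_min and is truncated to Q_min, which is the same code that ReLU followed by quantization assigns to 0, so the separate ReLU step is redundant under these parameters.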

[0065] Let's take 8-bit asymmetric quantization as an example for explanation.

[0066] In the first step, image data is input into the server-side floating-point deep neural network model, and the data range of each hidden layer's output is collected, including the minimum value $R_{min}$ and the maximum value $R_{max}$. The parameters of the server-side floating-point deep neural network model are all in 32-bit floating-point format, and the output data is also in 32-bit floating-point format.

[0067] In the second step, based on $R_{max}$ and the target bit width $b$ of the quantized fixed-point number, the quantization step size $S$ and quantization zero point $Z$ in the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer are determined: $S = R_{max} / (2^{8} - 1)$, $Z = -128$.

[0068] The third step is to use the trained floating-point deep neural network model at the edge, and use the S and Z from the second step to determine the asymmetric quantization relationship between the floating-point and fixed-point outputs of each hidden layer, and replace the ReLU activation function after each hidden layer of the floating-point deep neural network model to obtain the updated fixed-point neural network model.

[0069] The fourth step involves inputting the image to be tested into the updated fixed-point neural network model. The floating-point numbers output by each hidden layer in the fixed-point neural network model are converted into fixed-point numbers, and the image target recognition is completed.
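The four steps can be tied together in a toy end-to-end sketch. Everything concrete here is hypothetical: `hidden_layer` is a fixed 2x2 linear map standing in for a real convolutional layer, the calibration inputs are made-up vectors rather than images, and rounding is round-half-up. The sketch only illustrates the order of operations.

```python
import math
from typing import Callable, List, Sequence


def hidden_layer(x: Sequence[float]) -> List[float]:
    """Hypothetical hidden layer: a fixed linear map standing in for a convolution."""
    weights = [[1.0, -1.0], [0.5, 0.5]]
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def calibrate_r_max(images: Sequence[Sequence[float]]) -> float:
    """Step 1: run the floating-point layer on the image set and record R_max."""
    return max(max(hidden_layer(img)) for img in images)


def make_fused_activation(r_max: float, bits: int) -> Callable[[Sequence[float]], List[int]]:
    """Steps 2-3: derive S and Z, then build the quantizer that replaces ReLU."""
    step = r_max / (2 ** bits - 1)
    zero = -(2 ** (bits - 1))       # also the lower truncation bound Q_min
    q_max = 2 ** (bits - 1) - 1

    def activation(values: Sequence[float]) -> List[int]:
        return [max(zero, min(q_max, math.floor(v / step + zero + 0.5)))
                for v in values]

    return activation


calibration_images = [[2.0, 1.0], [63.75, 0.0], [0.0, 8.0]]
r_max = calibrate_r_max(calibration_images)   # step 1
fused = make_fused_activation(r_max, 8)       # steps 2 and 3
codes = fused(hidden_layer([1.0, 5.0]))       # step 4: fixed-point layer output
print(r_max, codes)  # 63.75 [-128, -116]
```

The test input produces the floating-point outputs -4.0 and 3.0. The negative value is truncated to -128, the same code that represents 0, and 3.0 becomes -116 with S = 0.25.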

[0070] The present invention has been described in detail above with reference to specific embodiments and exemplary examples; however, these descriptions should not be construed as limiting the present invention. Those skilled in the art will understand that various equivalent substitutions, modifications, or improvements can be made to the technical solutions and embodiments of the present invention without departing from the spirit and scope of the invention, and all such modifications and improvements fall within the scope of the present invention. The scope of protection of the present invention is defined by the appended claims.

[0071] The contents not described in detail in this specification are common knowledge to those skilled in the art.

Claims

1. An image target recognition method, characterized in that it comprises: inputting images from an image set into a trained floating-point deep neural network model to obtain the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model, wherein the floating-point deep neural network model is a deep neural network model whose model parameters use floating-point format and whose output data is in floating-point format; determining, based on the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model and combined with the target bit width of the quantized fixed-point number, the parameters in the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer: $S = \frac{R_{max}}{2^{b} - 1}$, $Z = -2^{b-1}$, where $S$ represents the quantization step size, $Z$ represents the quantization zero point, $b$ represents the target bit width of the quantized fixed-point number, and $R_{max}$ represents the maximum floating-point number output by the hidden layer; determining the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer as $x_{float} = S(x_{int} - Z)$, where $x_{float}$ is the floating-point number and $x_{int}$ is the fixed-point number; replacing the ReLU function after each hidden layer of the floating-point deep neural network model with the asymmetric quantization relationship between the output floating-point number and the fixed-point number corresponding to that hidden layer to obtain a fixed-point neural network model; and inputting the image to be tested into the fixed-point neural network model, converting the floating-point numbers output by each hidden layer in the fixed-point neural network model into fixed-point numbers, and completing image target recognition;
wherein the step of inputting the image to be tested into the fixed-point neural network model and converting the floating-point numbers output by each hidden layer of the fixed-point neural network model into fixed-point numbers includes: affine transformation: $x_1 = \frac{x_{float}}{S} + Z$; rounding: $x_2 = \mathrm{Round}(x_1)$, where Round represents the rounding function; truncation: $x_{int} = \min(\max(x_2, -2^{b-1}), 2^{b-1} - 1)$; where $x_1$ and $x_2$ are intermediate process variables.

2. The image target recognition method according to claim 1, characterized in that, in the step of inputting the images from the image set into the trained floating-point deep neural network model, the images in the image set and the images to be tested are of the same type.

3. A method for target recognition in aircraft-side images, characterized in that it comprises: inputting images from an image set into a floating-point deep neural network model trained on the server side to obtain the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model, wherein the floating-point deep neural network model is a deep neural network model whose model parameters use floating-point format and whose output data is in floating-point format; determining, based on the value range of the floating-point numbers output by each hidden layer of the floating-point deep neural network model and combined with the target bit width of the quantized fixed-point number, the parameters in the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer: $S = \frac{R_{max}}{2^{b} - 1}$, $Z = -2^{b-1}$, where $S$ represents the quantization step size, $Z$ represents the quantization zero point, $b$ represents the target bit width of the quantized fixed-point number, and $R_{max}$ represents the maximum floating-point number output by the hidden layer; determining the asymmetric quantization relationship between the output floating-point number and the fixed-point number of each hidden layer as $x_{float} = S(x_{int} - Z)$, where $x_{float}$ is the floating-point number and $x_{int}$ is the fixed-point number; replacing the ReLU function after each hidden layer of the floating-point deep neural network model with the asymmetric quantization relationship between the output floating-point number and the fixed-point number corresponding to that hidden layer to obtain a fixed-point neural network model; and configuring the fixed-point neural network model on the aircraft side, wherein the aircraft side inputs the image to be tested into the fixed-point neural network model, the floating-point numbers output by each hidden layer in the fixed-point neural network model are converted into fixed-point numbers, and image target recognition is completed;
wherein the step in which the aircraft side inputs the image to be tested into the fixed-point neural network model and the floating-point numbers output by each hidden layer of the fixed-point neural network model are converted into fixed-point numbers includes: affine transformation: $x_1 = \frac{x_{float}}{S} + Z$; rounding: $x_2 = \mathrm{Round}(x_1)$, where Round represents the rounding function; truncation: $x_{int} = \min(\max(x_2, -2^{b-1}), 2^{b-1} - 1)$; where $x_1$ and $x_2$ are intermediate process variables.

4. The aircraft-side image target recognition method according to claim 3, characterized in that, in the step of inputting the images in the image set into the floating-point deep neural network model trained on the server side, the images in the image set and the images to be tested are of the same type.