Image processing methods, apparatus, electronic devices and storage media

By combining a feature extraction layer, an attention layer, and a normalization layer, the normalized sub-image features are weighted using feature weights, which addresses the loss of neural network accuracy caused by normalization layers and improves image processing accuracy.

CN115830429B · Active · Publication Date: 2026-06-30 · BEIJING XIAOMI MOBILE SOFTWARE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Filing Date
2022-12-09
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, while normalization layers in neural networks can help accelerate the training process, too many can lead to reduced accuracy and affect image processing performance.

Method used

By combining a feature extraction layer, an attention layer, and a normalization layer, the normalized sub-image features are weighted using feature weights, which reduces the regularization effect of the normalization layer and improves the accuracy of the image processing network.

Benefits of technology

It improves the accuracy and image processing performance of neural networks, reduces the impact of regularization in the normalization layer, and enables the network to converge faster.

✦ Generated by Eureka AI based on patent content.

Abstract

This disclosure relates to an image processing method, apparatus, electronic device, and storage medium. The image processing method includes: extracting initial sub-image features corresponding to each feature channel from an image to be processed through multiple feature channels of a feature extraction layer; inputting the initial sub-image features into an attention layer to obtain feature weights corresponding to the initial sub-image features; normalizing the initial sub-image features using a normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features; weighting the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features; and inputting each weighted sub-image feature into a feature processing layer for feature processing to obtain the image processing result of the image to be processed. This disclosure can improve the accuracy of a target image processing network and enhance the image processing effect using the target image processing network.

Description

Technical Field

[0001] This disclosure relates to the field of computer vision technology, and in particular to an image processing method, apparatus, electronic device and storage medium.

Background Technology

[0002] Computer vision, a crucial technology in the field of artificial intelligence, uses neural networks to process images and extract their feature information, thereby replacing the human eye in detecting, classifying, tracking, and measuring target objects. In related technologies, normalization layers within neural networks can address the internal covariate shift problem during training, making the neural network easier to train and faster to converge.

[0003] However, normalization layers have a certain regularization effect. Too many normalization layers can actually reduce the accuracy of the neural network, resulting in poor image processing performance when using the neural network.

Summary of the Invention

[0004] To overcome the problems existing in related technologies, this disclosure provides an image processing method, apparatus, electronic device, and storage medium, which can improve the accuracy of a target image processing network and enhance the effect of image processing using the target image processing network.

[0005] According to a first aspect of the present disclosure, an image processing method is provided, the method comprising:

[0006] The image to be processed is input into the feature extraction layer of the target image processing network for feature extraction. Through multiple feature channels of the feature extraction layer, initial sub-image features corresponding to each feature channel are extracted from the image to be processed.

[0007] The initial sub-image features are input into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0008] The initial sub-image features are input into the normalization layer of the target image processing network, and the initial sub-image features are normalized using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0009] Based on the feature weights corresponding to the initial sub-image features, the intermediate sub-image features corresponding to the initial sub-image features are weighted to obtain weighted sub-image features;

[0010] The weighted sub-image features are input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0011] In some embodiments, the step of weighting the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features includes:

[0012] The weighted feature parameters are obtained by multiplying the feature weights corresponding to the initial sub-image features with the feature parameters of the intermediate sub-image features corresponding to the initial sub-image features.

[0013] The weighted sub-image features are obtained based on each of the weighted feature parameters.

[0014] In some embodiments, the normalization process performed on the initial sub-image features using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features includes:

[0015] Using the normalization layer, based on the feature channels corresponding to each of the initial sub-image features, normalization parameters corresponding to each of the initial sub-image features are determined from a preset database; wherein the preset database stores the normalization parameters corresponding to each of the feature channels, and the normalization parameters are determined when the target image processing network is trained;

[0016] The feature parameters of the initial sub-image features are adjusted based on the normalization parameters corresponding to the initial sub-image features to obtain the intermediate sub-image features; wherein the feature parameters of the intermediate sub-image features conform to a preset distribution law.
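As a non-limiting illustration of this embodiment, the following sketch assumes that the preset database is a dictionary keyed by channel index and that the stored normalization parameters are a mean and a variance per channel. The names `NORM_PARAMS`, `normalize_channel`, and `EPS`, and all numeric values, are hypothetical and do not come from the disclosure.

```python
import math

# Hypothetical "preset database": per-channel statistics fixed at training time.
NORM_PARAMS = {
    0: {"mean": 2.0, "var": 4.0},
    1: {"mean": -1.0, "var": 0.25},
}

EPS = 1e-5  # small constant guarding against division by zero (assumed value)


def normalize_channel(feature, channel):
    """Normalize one 2-D sub-image feature with its channel's stored statistics."""
    params = NORM_PARAMS[channel]
    std = math.sqrt(params["var"] + EPS)
    return [[(v - params["mean"]) / std for v in row] for row in feature]


# Initial sub-image feature extracted by channel 0 (illustrative values).
initial = [[0.0, 2.0], [4.0, 6.0]]
intermediate = normalize_channel(initial, channel=0)
```

Because the statistics are looked up rather than recomputed, the same input always yields the same intermediate sub-image feature at inference time.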

[0017] In some embodiments, the step of inputting the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features includes:

[0018] Using the attention layer, the compressed feature parameters corresponding to the initial sub-image features are determined based on the average value of the feature parameters of the initial sub-image features;

[0019] The compressed feature parameters corresponding to the initial sub-image features are input into the fully connected layer in the attention layer to obtain the feature weights corresponding to the initial sub-image features.
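The two steps of this embodiment can be sketched as follows. This is an illustration under stated assumptions: the sigmoid activation is borrowed from squeeze-and-excitation style blocks and is not specified in the disclosure, and the function names and fully connected layer values are hypothetical.

```python
import math


def squeeze(features):
    """Compress each 2-D sub-image feature to the mean of its feature parameters."""
    return [sum(sum(row) for row in f) / (len(f) * len(f[0])) for f in features]


def fully_connected(x, fc_weights, fc_bias):
    """Dense layer: one output per row of `fc_weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(fc_weights, fc_bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_weights(features, fc_weights, fc_bias):
    compressed = squeeze(features)
    # Assumption: a sigmoid maps the dense-layer outputs to weights in (0, 1).
    return [sigmoid(z) for z in fully_connected(compressed, fc_weights, fc_bias)]


features = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
fc_w = [[0.5, 0.0], [0.0, 0.5]]  # illustrative values, not trained
fc_b = [0.0, 0.0]
feature_weights = attention_weights(features, fc_w, fc_b)
```

Each channel receives exactly one weight, so the output list has the same length as the list of initial sub-image features.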

[0020] In some embodiments, the network parameters of the target image processing network include: a target scaling parameter and a target offset parameter, wherein the target scaling parameter represents the scaling ratio of the initial feature parameters of the weighted sub-image features; and the target offset parameter represents the offset amount of the initial feature parameters of the weighted sub-image features; the method includes:

[0021] Based on the target scaling parameter, the initial feature parameters of the weighted sub-image features are scaled to obtain the intermediate feature parameters of the weighted sub-image features.

[0022] Based on the target offset parameter, the intermediate feature parameters of the weighted sub-image features are offset to obtain the target feature parameters of the weighted sub-image features;

[0023] Based on the target feature parameters of the weighted sub-image features, the adjusted weighted sub-image features are obtained;

[0024] The step of inputting the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes:

[0025] The adjusted weighted sub-image features are input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.
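The scaling and offset steps of this embodiment amount to an affine adjustment of each feature parameter. A minimal sketch follows, assuming a single scalar scaling parameter and a single scalar offset parameter per weighted sub-image feature; the names `gamma` and `beta` are conventional labels chosen here, not terms from the disclosure.

```python
def scale_and_offset(weighted, gamma, beta):
    """Scale each feature parameter by gamma, then shift it by beta."""
    # Intermediate feature parameters: scaled values.
    intermediate = [[gamma * v for v in row] for row in weighted]
    # Target feature parameters: scaled values plus the offset.
    return [[v + beta for v in row] for row in intermediate]


weighted = [[0.5, -1.0], [2.0, 0.0]]  # illustrative weighted sub-image feature
adjusted = scale_and_offset(weighted, gamma=2.0, beta=0.5)
```

With `gamma = 1.0` and `beta = 0.0` the adjustment leaves the weighted sub-image feature unchanged.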

[0026] In some embodiments, the method further includes:

[0027] Each labeled training image in the training image set is sequentially input into the initial image processing network to obtain the prediction processing result of each labeled training image.

[0028] Based on the difference between the annotation labels of the labeled training images and the prediction processing results, the network loss value corresponding to the labeled training images is determined.

[0029] Based on the network loss values corresponding to each of the labeled training images, the total loss value of the initial image processing network is determined;

[0030] The network parameters of the initial image processing network are adjusted based on the total loss value to obtain the target image processing network.
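The training procedure above can be illustrated with a deliberately small stand-in. In this sketch a single scalar replaces each training image, a one-parameter linear model replaces the initial image processing network, the per-image loss is a squared error, the total loss is the mean of the per-image losses, and the parameter adjustment is plain gradient descent. All of these choices are assumptions made for illustration; the disclosure does not fix the loss function or the optimizer.

```python
def predict(w, x):
    """Toy 'network': a single trainable parameter w."""
    return w * x


def sample_loss(prediction, label):
    """Per-image loss: squared difference between prediction and label."""
    return (prediction - label) ** 2


def total_loss(w, dataset):
    """Total loss: mean of the per-image losses."""
    losses = [sample_loss(predict(w, x), y) for x, y in dataset]
    return sum(losses) / len(losses)


def train(w, dataset, lr=0.1, epochs=50):
    """Adjust w by gradient descent on the total loss."""
    for _ in range(epochs):
        grad = sum(2 * (predict(w, x) - y) * x for x, y in dataset) / len(dataset)
        w -= lr * grad
    return w


dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, label) pairs
w_initial = 0.0
w_target = train(w_initial, dataset)
```

The trained parameter plays the role of the target image processing network: it is the initial parameter after adjustment by the total loss.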

[0031] In some embodiments, the feature processing layer includes a classifier; the step of inputting the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes:

[0032] The weighted sub-image features are input into the classifier for classification processing to obtain the classification result of the image to be processed.

[0033] According to a second aspect of the present disclosure, an image processing apparatus is provided, the apparatus comprising:

[0034] The feature extraction module is configured to input the image to be processed into the feature extraction layer of the target image processing network for feature extraction, and extract initial sub-image features corresponding to each feature channel from the image to be processed through multiple feature channels of the feature extraction layer.

[0035] The weight determination module is configured to input the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0036] The normalization module is configured to input the initial sub-image features into the normalization layer of the target image processing network, and use the normalization layer to normalize the initial sub-image features to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0037] The weighting module is configured to perform weighted processing on the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features, to obtain weighted sub-image features;

[0038] The feature processing module is configured to input the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0039] In some embodiments, the weighting module is configured as follows:

[0040] The weighted feature parameters are obtained by multiplying the feature weights corresponding to the initial sub-image features with the feature parameters of the intermediate sub-image features corresponding to the initial sub-image features.

[0041] The weighted sub-image features are obtained based on each of the weighted feature parameters.

[0042] In some embodiments, the normalization module is configured as follows:

[0043] Using the normalization layer, based on the feature channels corresponding to each of the initial sub-image features, normalization parameters corresponding to each of the initial sub-image features are determined from a preset database; wherein the preset database stores the normalization parameters corresponding to each of the feature channels, and the normalization parameters are determined when the target image processing network is trained;

[0044] The feature parameters of the initial sub-image features are adjusted based on the normalization parameters corresponding to the initial sub-image features to obtain the intermediate sub-image features; wherein the feature parameters of the intermediate sub-image features conform to a preset distribution law.

[0045] In some embodiments, the weight determination module is configured as follows:

[0046] Using the attention layer, the compressed feature parameters corresponding to the initial sub-image features are determined based on the average value of the feature parameters of the initial sub-image features;

[0047] The compressed feature parameters corresponding to the initial sub-image features are input into the fully connected layer in the attention layer to obtain the feature weights corresponding to the initial sub-image features.

[0048] In some embodiments, the network parameters of the target image processing network include: a target scaling parameter and a target offset parameter, wherein the target scaling parameter represents the scaling ratio of the initial feature parameters of the weighted sub-image features; and the target offset parameter represents the offset amount of the initial feature parameters of the weighted sub-image features; the apparatus includes:

[0049] The scaling module is configured to scale the initial feature parameters of the weighted sub-image features based on the target scaling parameter to obtain the intermediate feature parameters of the weighted sub-image features.

[0050] The offset module is configured to perform offset processing on the intermediate feature parameters of the weighted sub-image features based on the target offset parameter, so as to obtain the target feature parameters of the weighted sub-image features;

[0051] The processing module is configured to obtain adjusted weighted sub-image features based on the target feature parameters of the weighted sub-image features;

[0052] The feature processing module is configured to input the adjusted weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0053] In some embodiments, the apparatus further includes:

[0054] The prediction module is configured to sequentially input each labeled training image in the training image set into the initial image processing network to obtain the prediction processing result of each labeled training image.

[0055] The first loss value determination module is configured to determine the network loss value corresponding to the labeled training image based on the difference between the label of the labeled training image and the prediction processing result.

[0056] The second loss value determination module is configured to determine the total loss value of the initial image processing network based on the network loss value corresponding to each of the labeled training images;

[0057] The parameter adjustment module is configured to adjust the network parameters of the initial image processing network based on the total loss value to obtain the target image processing network.

[0058] In some embodiments, the feature processing layer includes a classifier;

[0059] The feature processing module is configured to input the features of each weighted sub-image into the classifier for classification processing to obtain the classification result of the image to be processed.

[0060] According to a third aspect of the present disclosure, an electronic device is provided, comprising:

[0061] A processor;

[0062] A memory for storing processor-executable instructions;

[0063] wherein the processor is configured to implement the steps of any of the image processing methods in the first aspect above when executing the instructions.

[0064] According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the steps of any of the image processing methods in the first aspect described above.

[0065] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects:

[0066] In this disclosure, firstly, the normalization layer in the target image processing network can improve the accuracy of the target image processing network and make it more convenient for the target image processing network to process data; secondly, by determining the feature weights corresponding to each initial sub-image feature through the attention layer in the target image processing network, and by weighting the intermediate sub-image features corresponding to each initial sub-image feature, the regularization effect brought by the normalization layer can be mitigated, so that initial sub-image features of different importance can play different roles in the image processing process, thereby making the target image processing network more accurate and achieving better image processing results using the target image processing network.

[0067] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0068] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0069] Figure 1 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure;

[0070] Figure 2 is a partial structural schematic diagram of a target image processing network according to an exemplary embodiment of the present disclosure;

[0071] Figure 3 shows loss curves of the original ResNet network, the ResNet network with some normalization layers removed, and the target ResNet network according to an exemplary embodiment of this disclosure;

[0072] Figure 4 shows accuracy curves of the original ResNet network, the ResNet network with some normalization layers removed, and the target ResNet network according to an exemplary embodiment of this disclosure;

[0073] Figure 5 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;

[0074] Figure 6 is a first hardware structure block diagram of an electronic device according to an exemplary embodiment of the present disclosure;

[0075] Figure 7 is a second hardware structure block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Implementation

[0076] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0077] Figure 1 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. As shown in Figure 1, the image processing method mainly includes the following steps:

[0078] Step 110: Input the image to be processed into the feature extraction layer of the target image processing network for feature extraction. Through the multiple feature channels of the feature extraction layer, extract the initial sub-image features corresponding to each feature channel from the image to be processed.

[0079] Step 120: Input the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0080] Step 130: Input the initial sub-image features into the normalization layer of the target image processing network, and use the normalization layer to normalize the initial sub-image features to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0081] Step 140: Based on the feature weights corresponding to the initial sub-image features, perform weighted processing on the intermediate sub-image features corresponding to the initial sub-image features to obtain weighted sub-image features;

[0082] Step 150: Input the weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.
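Steps 110 to 150 can be sketched end to end as follows. This is a simplified illustration, not the network of the disclosure: a single-channel image and small fixed kernels stand in for the feature extraction layer, the attention layer is reduced to a sigmoid of each channel mean, the normalization statistics are computed on the fly, and the feature processing layer is a linear scorer. All function names and values are hypothetical.

```python
import math


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def flatten(feature):
    return [v for row in feature for v in row]


def attention(features):
    """One weight per channel: sigmoid of the channel mean (simplified)."""
    means = [sum(flatten(f)) / len(flatten(f)) for f in features]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]


def normalize(feature, eps=1e-5):
    """Shift and scale one channel to zero mean and (near) unit variance."""
    values = flatten(feature)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in feature]


def process(image, kernels, classifier_weights):
    initial = [conv2d(image, k) for k in kernels]        # step 110
    weights = attention(initial)                         # step 120
    intermediate = [normalize(f) for f in initial]       # step 130
    weighted = [[[w * v for v in row] for row in f]      # step 140
                for w, f in zip(weights, intermediate)]
    flat = [v for f in weighted for v in flatten(f)]     # step 150
    scores = [sum(c * v for c, v in zip(row, flat)) for row in classifier_weights]
    return scores.index(max(scores)), weighted


image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
kernels = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
classifier_weights = [[1.0] * 8, [-1.0] * 8]
label, weighted = process(image, kernels, classifier_weights)
```

Note that the attention weights are computed from the initial sub-image features, while the weighting itself is applied to the normalized (intermediate) features, which matches the order of steps 120 to 140.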

[0083] It should be noted that the image processing method proposed in this disclosure can be applied to electronic devices or servers. Here, electronic devices can include terminal devices, such as mobile terminals or fixed terminals. Mobile terminals can include devices such as mobile phones, tablets, and laptops. Fixed terminals can include desktop computers or smart TVs.

[0084] Here, the image to be processed can be any type of image. For example, it can be a grayscale image or a color image; it can be an image containing people or an image containing buildings; it can be an image containing text or an image without text.

[0085] In some embodiments, when the image processing method is applied to an electronic device, the image to be processed can be obtained from the local database of the electronic device, obtained from a cloud server, or received from other electronic devices via a communication module.

[0086] The target image processing network can be a pre-trained neural network, which can be used to perform image processing such as classification, target detection, target recognition, or image segmentation on the input image to be processed. In some embodiments, the target image processing network can be a residual network (ResNet), a convolutional neural network (CNN), or a region-CNN (R-CNN), etc.

[0087] The target image processing network may include a feature extraction layer, which can be a convolutional layer or a combination of convolutional and pooling layers. The feature extraction layer is used to extract features from the input image to obtain its image features. Image features may include color features, texture features, shape features, or spatial relationship features. These image features can be represented as feature vectors, feature maps, or two-dimensional matrices corresponding to the feature maps.

[0088] It is understood that the feature extraction layer can include multiple feature extraction channels (e.g., convolutional kernels), and different image features can be extracted through different feature extraction channels. Therefore, in this embodiment, initial sub-image features corresponding to each feature channel can be extracted from the image to be processed through the multiple feature channels of the feature extraction layer.

[0089] In some embodiments, initial sub-image features in matrix form corresponding to each convolution kernel in the feature extraction layer can be obtained by performing convolution calculations with the pixel matrix of the image to be processed.

[0090] In some embodiments, multiple initial sub-image features can be combined to form a multi-dimensional initial image feature. For example, each feature channel can extract an initial sub-image feature in the form of a two-dimensional matrix, so three feature channels can extract three initial sub-image features in the form of two-dimensional matrices, and these three initial sub-image features in the form of two-dimensional matrices can be combined to form an initial image feature in the form of a three-dimensional matrix.

[0091] After extracting initial sub-image features corresponding to each feature channel from the image to be processed through multiple feature channels of the feature extraction layer, the extracted initial sub-image features can be input into the attention layer of the target image processing network.

[0092] Here, the attention layer can be a visual attention layer. In some embodiments, the attention layer may include a Squeeze-and-Excitation Network (SEnet) or a Convolutional Block Attention Module (CBAM).

[0093] The attention layer can be used to determine the feature weights corresponding to each initial sub-image feature. After inputting each initial sub-image feature into the attention layer of the target image processing network, the attention layer can determine the feature weights corresponding to each initial sub-image feature based on the feature parameters in each input initial sub-image feature and the correlation between the initial sub-image features. Here, each initial sub-image feature can include multiple feature parameters. For example, when the initial sub-image feature is a two-dimensional matrix, this two-dimensional matrix can be composed of multiple feature parameters.

[0094] It is understandable that the feature weights corresponding to each initial sub-image feature can characterize the importance of each initial sub-image feature in the image processing process. Initial sub-image features with higher importance can correspond to larger feature weights, while initial sub-image features with lower importance can correspond to smaller feature weights.

[0095] The feature weights determined by the attention layer can be used to weight the sub-image features. This allows initial sub-image features of different importance to play different roles in the image processing process. For example, features with higher importance can play a greater role in the image processing process, thereby making the target image processing network more accurate and achieving better image processing results.

[0096] In some embodiments, the features of each initial sub-image can be input into the normalization layer of the target image processing network for normalization processing. Here, the normalization layer may include a batch normalization layer (BN), a layer normalization layer (LN), or an instance normalization layer (IN), etc.

[0097] The normalization layer can be used to normalize the initial sub-image features of the input. For example, it can adjust each feature parameter in the initial sub-image features to a preset value range, such as adjusting each feature parameter to (0, 1); or it can adjust each feature parameter in the initial sub-image features so that each feature parameter conforms to a preset distribution law, such as making each feature parameter conform to a distribution law with a mean of 0 and a variance of 1.
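The two kinds of normalization mentioned above can be sketched as follows. This is an illustration only: the min-max variant maps values to the closed range [0, 1] rather than the open range named in the text, the statistics are computed from the feature itself, and the function names and the `eps` value are assumptions.

```python
import math


def min_max_scale(feature):
    """Rescale feature parameters to the range [0, 1]."""
    values = [v for row in feature for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant feature carries no spread; map it to zeros.
        return [[0.0 for _ in row] for row in feature]
    return [[(v - lo) / span for v in row] for row in feature]


def standardize(feature, eps=1e-5):
    """Shift and scale feature parameters to mean 0 and variance close to 1."""
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in feature]


feature = [[1.0, 2.0], [3.0, 4.0]]
scaled = min_max_scale(feature)
standardized = standardize(feature)
```

The first function corresponds to adjusting parameters to a preset value range, the second to making them conform to a preset distribution.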

[0098] After inputting the initial sub-image features into the normalization layer, the normalization layer normalizes each initial sub-image feature to obtain the intermediate sub-image features corresponding to each initial sub-image feature, and outputs each intermediate sub-image feature. Here, the intermediate sub-image features are the normalized initial sub-image features.

[0099] When using a target image processing network, normalizing the initial sub-image features through a normalization layer can make subsequent feature weighting or feature processing more convenient.

[0100] Understandably, while normalization layers have a certain regularization effect, too many normalization layers can actually slow down the convergence speed of the neural network, resulting in lower accuracy. However, in this embodiment, by determining the feature weights corresponding to each initial sub-image feature through an attention layer, and then using the feature weights corresponding to the initial sub-image features to weight the intermediate sub-image features corresponding to the initial sub-image features, the regularization effect of the normalization layer can be mitigated. This allows the convergence speed of the target image processing network to be accelerated through the normalization layer, while simultaneously improving the accuracy of the target image processing network.

[0101] After obtaining the normalized initial sub-image features (i.e., intermediate sub-image features), the intermediate sub-image features can be weighted using the feature weights obtained from the attention layer for the corresponding initial sub-image features, to obtain weighted sub-image features. Here, the weighted sub-image features are the weighted intermediate sub-image features.

[0102] In some embodiments, the step of weighting the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features may include:

[0103] The weighted feature parameters are obtained by multiplying the feature weights corresponding to the initial sub-image features with the feature parameters of the intermediate sub-image features corresponding to the initial sub-image features.

[0104] The weighted sub-image features are obtained based on each of the weighted feature parameters.

[0105] Here, each initial sub-image feature can correspond to a feature weight, and each initial sub-image feature can correspond to an intermediate sub-image feature. Furthermore, the feature weight corresponding to the initial sub-image feature can correspond to the intermediate sub-image feature corresponding to the initial sub-image feature. In other words, the feature parameters in the intermediate sub-image feature can be weighted using the feature weight corresponding to the intermediate sub-image feature.

[0106] In this embodiment, each feature parameter in the intermediate sub-image features can be multiplied by a corresponding feature weight to obtain a weighted feature parameter. After weighting multiple feature parameters in the intermediate sub-image features to obtain multiple weighted feature parameters, the weighted sub-image features can be obtained based on the multiple weighted feature parameters.

[0107] Taking the intermediate sub-image features and weighted sub-image features as two-dimensional matrix forms as an example, the feature weights corresponding to the intermediate sub-image features (i.e., the intermediate feature matrix) can be multiplied with each feature parameter in the intermediate feature matrix to obtain a weighted feature matrix (i.e., the weighted sub-image features) containing each weighted feature parameter. The position of each weighted feature parameter in the weighted feature matrix corresponds to the position of each feature parameter in the intermediate feature matrix.

[0108] For example, when the intermediate feature matrix contains the feature parameters a, b, c, and d, and the feature weight corresponding to the intermediate feature matrix is t, t can be multiplied by each of the feature parameters a, b, c, and d to obtain the weighted feature parameters ta, tb, tc, and td. The weighted feature matrix can then be obtained from these weighted feature parameters. The position of ta in the weighted feature matrix corresponds to the position of a in the intermediate feature matrix, the position of tb corresponds to the position of b, the position of tc corresponds to the position of c, and the position of td corresponds to the position of d.

[0109] In this way, by multiplying the feature weight corresponding to each initial sub-image feature with each feature parameter in the corresponding intermediate sub-image feature, every feature parameter in the intermediate sub-image features is weighted, thus obtaining weighted sub-image features. This allows each weighted sub-image feature to play a different role in the image processing process, thereby improving the accuracy of the target image processing network and achieving better image processing results using the target image processing network.
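The channel-wise weighting described above can be sketched as follows. This is a minimal illustration only, assuming each intermediate sub-image feature is a two-dimensional list of numbers and each feature weight is a single scalar; the function names `weight_feature` and `weight_features` and the sample values are hypothetical and do not come from the disclosure.

```python
def weight_feature(intermediate, weight):
    """Multiply every feature parameter of one feature matrix by one scalar weight."""
    # Positions are preserved: element (r, c) of the result is weight * element (r, c).
    return [[weight * value for value in row] for row in intermediate]


def weight_features(intermediates, weights):
    """Apply one feature weight per feature channel (assumed layout: list of matrices)."""
    if len(intermediates) != len(weights):
        raise ValueError("expected exactly one feature weight per feature channel")
    return [weight_feature(matrix, t) for matrix, t in zip(intermediates, weights)]


# Two feature channels, each with a 2x2 intermediate feature matrix (illustrative values).
intermediates = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [2.0, 0.0]]]
weights = [0.5, 2.0]
weighted = weight_features(intermediates, weights)
```

Each channel keeps its matrix shape, so the weighted feature parameter at a given position always derives from the feature parameter at the same position.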

[0110] After obtaining each weighted sub-image feature, the weighted sub-image features can be input into the feature processing layer of the target image processing network. The feature processing layer then performs feature processing on the input weighted sub-image features to obtain the processing result of the image to be processed.

[0111] For example, when the target image processing network is used for image classification, the feature processing layer can be a classifier, which outputs the classification result of the image to be processed based on the input weighted sub-image features. When the target image processing network is used for object detection, the feature processing layer can be an object detection layer, which outputs the object detection result of the image to be processed based on the input weighted sub-image features. When the target image processing network is used for image segmentation, the feature processing layer can be an image segmentation layer, which outputs the image segmentation result of the image to be processed based on the input weighted sub-image features.

[0112] In some embodiments, the feature processing layer includes a classifier; the step of inputting the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes:

[0113] The weighted sub-image features are input into the classifier for classification processing to obtain the classification result of the image to be processed.

[0114] Here, after inputting the weighted sub-image features into the classifier, the classifier performs a non-linear combination of at least one weighted sub-image feature to obtain the classification result of the image to be processed.

[0115] In some embodiments, the classifier may include a fully connected layer and a logistic regression layer, such as a Softmax logistic regression layer. After inputting the features of each weighted sub-image into the classifier, the fully connected layer of the classifier performs a non-linear combination of at least one weighted sub-image feature to obtain a feature vector of the image to be processed. This feature vector is then input into the logistic regression layer. Based on the feature vector of the image to be processed, the logistic regression layer obtains the predicted probability distribution of the image to be processed for each of the preset types, thereby obtaining the classification result.

[0116] In some embodiments, the classification result may be the predicted probability distribution of the image to be processed over the preset types, or it may be the type corresponding to the maximum probability value in the predicted probability distribution.
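As a rough sketch of the classifier described above, the following assumes the weighted sub-image features have already been flattened into a single list and that the classifier consists of one fully connected layer followed by a Softmax layer. The weights, biases, and function names are hypothetical values chosen for illustration, not parameters from the disclosure.

```python
import math


def fully_connected(features, weights, biases):
    """One output per preset type: weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits into a probability distribution (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Return the predicted probability distribution and the most probable type index."""
    probabilities = softmax(fully_connected(features, weights, biases))
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    return probabilities, predicted


# Flattened weighted features and hypothetical classifier parameters for 3 preset types.
features = [0.5, 1.0, 1.5, 2.0]
fc_weights = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2], [0.0, 0.0, 0.0, 0.0]]
fc_biases = [0.0, 0.1, 0.0]
probabilities, predicted_type = classify(features, fc_weights, fc_biases)
```

Either return value can serve as the classification result: the full distribution, or the index of the type with the maximum probability.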

[0117] Thus, since each weighted sub-image feature is an image feature obtained by weighting each intermediate sub-image feature (i.e., the normalized initial sub-image feature) with the corresponding feature weight, firstly, normalizing the initial sub-image features through the normalization layer can improve the accuracy of the target image processing network; secondly, the classifier can more accurately classify images based on weighted sub-image features with different weights.

[0118] In this disclosure, the image to be processed is input into the feature extraction layer of the target image processing network for feature extraction. Initial sub-image features corresponding to each feature channel are extracted from the image to be processed through multiple feature channels of the feature extraction layer. These initial sub-image features are then input into the attention layer of the target image processing network to obtain feature weights corresponding to the initial sub-image features. The initial sub-image features are then input into the normalization layer of the target image processing network to perform normalization processing on the initial sub-image features, obtaining intermediate sub-image features corresponding to the initial sub-image features. Based on the feature weights corresponding to the initial sub-image features, the intermediate sub-image features corresponding to the initial sub-image features are weighted to obtain weighted sub-image features. Finally, each weighted sub-image feature is input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0119] In this disclosure, firstly, the normalization layer in the target image processing network can improve the accuracy of the target image processing network and make it more convenient for the target image processing network to process data; secondly, by determining the feature weights corresponding to each initial sub-image feature through the attention layer in the target image processing network, and by weighting the intermediate sub-image features corresponding to each initial sub-image feature, the regularization effect brought by the normalization layer can be mitigated, so that initial sub-image features of different importance can play different roles in the image processing process, thereby making the target image processing network more accurate and achieving better image processing results using the target image processing network.

[0120] In some embodiments, the method further includes:

[0121] Each labeled training image in the training image set is sequentially input into the initial image processing network to obtain the prediction processing result of each labeled training image.

[0122] Based on the difference between the annotation labels of the labeled training images and the prediction processing results, the network loss value corresponding to the labeled training images is determined.

[0123] Based on the network loss values corresponding to each of the labeled training images, the total loss value of the initial image processing network is determined;

[0124] The network parameters of the initial image processing network are adjusted based on the total loss value to obtain the target image processing network.

[0125] In some embodiments, the labeled training images can be divided into multiple training image sets, each with the same number of images. The initial image processing network is then trained in batches using each training image set; that is, the network is trained once for each input training image set. Here, the labels on the labeled training images can be the actual processing results, such as the actual classification results.

[0126] During the training of the initial image processing network, each labeled training image in the training image set can be input into the initial image processing network in sequence to obtain the prediction processing result of each labeled training image. Then, the difference between the label of each labeled training image (i.e. the actual processing result) and the prediction processing result of each labeled training image is used to determine the network loss value corresponding to each labeled training image. Finally, based on the network loss value corresponding to each labeled training image, the total loss value in this training process (i.e. the total loss value corresponding to the training image set) is determined.

[0127] In some embodiments, the average value of the network loss values corresponding to multiple labeled training images in the training image set can be determined as the total loss value corresponding to the training image set. Alternatively, the maximum value among the network loss values corresponding to multiple labeled training images in the training image set can be determined as the total loss value corresponding to the training image set. Or, the sum of the network loss values corresponding to multiple labeled training images in the training image set can be determined as the total loss value corresponding to the training image set.
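The three aggregation options above can be illustrated with a short sketch. The disclosure does not name a specific per-image loss function, so cross-entropy is used here purely as an assumption; the function names, the `mode` parameter, and the sample predictions are likewise hypothetical.

```python
import math


def cross_entropy(probabilities, label):
    """Per-image network loss (assumed here to be cross-entropy against the annotation label)."""
    return -math.log(probabilities[label])


def total_loss(losses, mode="mean"):
    """Combine per-image loss values into one total loss for the training image set."""
    if not losses:
        raise ValueError("at least one network loss value is required")
    if mode == "mean":
        return sum(losses) / len(losses)
    if mode == "max":
        return max(losses)
    if mode == "sum":
        return sum(losses)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Predicted distributions for three labeled training images and their annotation labels.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 2]
losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
```

Calling `total_loss(losses, "mean")`, `total_loss(losses, "max")`, or `total_loss(losses, "sum")` selects among the three alternatives; which one suits a given network is a training design choice.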

[0128] After obtaining the total loss value of the initial image processing network during this training process, the network parameters of the initial image processing network can be adjusted through backpropagation to optimize the initial image processing network until the convergence condition is met, thus obtaining the target image processing network.

[0129] The initial image processing network can include feature extraction layers, attention layers, normalization layers, and feature processing layers. Understandably, the normalization layer in the initial image processing network can avoid the problem of internal covariate shift during training, making the initial image processing network easier to train and converge faster.

[0130] In some embodiments, after inputting multiple labeled training images from the training image set into the initial image processing network, initial sub-training features corresponding to each feature channel can be extracted from the labeled training images through multiple feature channels of the feature extraction layer. Each initial sub-training feature of each labeled training image is then input into the attention layer of the initial image processing network to obtain the feature weights corresponding to each initial sub-training feature.

[0131] After extracting the initial sub-training features corresponding to each feature channel from each labeled training image, the initial sub-training features corresponding to all labeled training images in the training image set can be input into the normalization layer of the initial image processing network. The normalization layer of the initial image processing network can normalize the input initial sub-training features according to the corresponding feature channels to obtain intermediate sub-training features.

[0132] In the process of normalizing the initial sub-training features of the input, the normalization layer of the initial image processing network can first determine the mean and variance of the feature parameters of all initial sub-training features corresponding to the same feature channel, and then use the mean and variance as training normalization parameters to normalize each feature parameter in each initial sub-training feature corresponding to that feature channel.

[0133] For example, the initial sub-training features corresponding to a certain feature channel that are input into the normalization layer include $x_1, x_2, \ldots, x_m$, where $m$ represents the number of labeled training images in the training image set. The average value of the feature parameters of all initial sub-training features corresponding to this feature channel can be determined by the following formula:

[0134] $$\mu = \frac{1}{m} \sum_{i=1}^{m} x_i \quad (1)$$

[0135] In formula (1), $\mu$ represents the average value of the initial sub-training features $x_1, x_2, \ldots, x_m$, $m$ represents the number of initial sub-training features corresponding to that feature channel, and $x_i$ represents the i-th initial sub-training feature.

[0136] The variance of the feature parameters of all initial sub-training features corresponding to this feature channel can be determined using the following formula:

[0137] $$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2 \quad (2)$$

[0138] In formula (2), $\sigma^2$ represents the variance of the initial sub-training features $x_1, x_2, \ldots, x_m$, $\mu$ represents the average value of the initial sub-training features $x_1, x_2, \ldots, x_m$, $m$ represents the number of initial sub-training features corresponding to that feature channel, and $x_i$ represents the i-th initial sub-training feature.

[0139] The initial sub-training features can be normalized using the following formula:

[0140] $$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}} \quad (3)$$

[0141] In formula (3), $\hat{x}_i$ represents the i-th initial sub-training feature after normalization (i.e., the intermediate sub-training feature), $x_i$ represents the i-th initial sub-training feature, $\mu$ represents the average value of the initial sub-training features $x_1, x_2, \ldots, x_m$, $\sigma^2$ represents the variance of the initial sub-training features $x_1, x_2, \ldots, x_m$, and $\varepsilon$ is a non-zero constant.
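The mean, variance, and normalization steps described for formulas (1) to (3) can be sketched for a single feature channel as follows. The sketch assumes the feature parameters of that channel, pooled over all labeled training images in the set, are given as one flat list; the function names, the sample values, and the default value of `eps` are hypothetical.

```python
import math


def channel_statistics(values):
    """Mean and (population) variance of one channel's feature parameters."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((x - mean) ** 2 for x in values) / m
    return mean, variance


def normalize(values, mean, variance, eps=1e-5):
    """Subtract the mean and divide by sqrt(variance + eps); eps avoids division by zero."""
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in values]


# Feature parameters of one channel (illustrative values).
channel_values = [1.0, 2.0, 3.0, 4.0]
mu, sigma2 = channel_statistics(channel_values)
normalized = normalize(channel_values, mu, sigma2)
```

After normalization the values are centered on zero with close to unit variance, which is the property the training normalization parameters are meant to enforce per feature channel.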

[0142] After normalizing the initial sub-training features to obtain intermediate sub-training features, the intermediate sub-training features corresponding to the initial sub-training features are weighted using the feature weights corresponding to the initial sub-training features to obtain weighted sub-training features. Then, each weighted sub-training feature is input into the feature processing layer of the initial image processing network for feature processing to obtain the prediction processing results for the labeled training images. Based on the difference between the annotation labels of the labeled training images and the prediction processing results, the network loss value corresponding to the labeled training images is determined. Based on the network loss values corresponding to each labeled training image, the total loss value of the initial image processing network is determined. Based on the total loss value, the network parameters of the initial image processing network are adjusted to obtain the target image processing network.

[0143] It is understandable that for each input training image set, the normalization layer can determine the training normalization parameters (i.e., the mean and variance of the feature parameters of all initial sub-training features corresponding to each feature channel in all initial sub-training features input to the normalization layer).

[0144] After training, the average value of the training normalization parameters corresponding to each feature channel determined during multiple training processes can be used as the normalization parameter corresponding to each feature channel. Then, the normalization parameters corresponding to each feature channel are stored in a preset database.
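One way to picture the averaging step above is the sketch below. It assumes the training normalization parameters of each training pass are kept as a dictionary mapping a feature channel index to a `(mean, variance)` pair, and it uses a plain dictionary as a stand-in for the preset database; these structures and names are illustrative assumptions only.

```python
def average_normalization_parameters(per_pass):
    """Average the (mean, variance) pairs of each feature channel over all training passes."""
    if not per_pass:
        raise ValueError("at least one training pass is required")
    count = len(per_pass)
    return {
        channel: (
            sum(p[channel][0] for p in per_pass) / count,
            sum(p[channel][1] for p in per_pass) / count,
        )
        for channel in per_pass[0]
    }


# Training normalization parameters from two training passes (illustrative values).
per_pass_parameters = [
    {0: (1.0, 4.0), 1: (0.0, 1.0)},
    {0: (3.0, 2.0), 1: (2.0, 3.0)},
]
# A dictionary stands in for the preset database in this sketch.
preset_database = average_normalization_parameters(per_pass_parameters)
```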

[0145] After training the target image processing network, the network can be used to process the input image to be processed, yielding the processing result. As described in the above embodiment, during the image processing of the input image to be processed, initial sub-image features corresponding to each feature channel can be extracted from the image to be processed through multiple feature channels in the feature extraction layer of the target image processing network. The initial sub-image features can also be input into the normalization layer of the target image processing network, where they are normalized to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0146] In some embodiments, the normalization process performed on the initial sub-image features using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features includes:

[0147] Using the normalization layer, based on the feature channels corresponding to each of the initial sub-image features, normalization parameters corresponding to each of the initial sub-image features are determined from a preset database; wherein, the preset database stores the normalization parameters corresponding to each of the feature channels, and the normalization parameters are determined when the target image processing network is trained;

[0148] The feature parameters of the initial sub-image features are adjusted based on the normalization parameters corresponding to the initial sub-image features to obtain the intermediate sub-image features; wherein the feature parameters of the intermediate sub-image features conform to a preset distribution law.

[0149] When normalizing the input initial sub-image features using the normalization layer in the target image processing network, the feature channels corresponding to the initial sub-image features can be determined first. Then, the normalization parameters corresponding to the feature channels in the preset database can be determined as the normalization parameters corresponding to the initial sub-image features. Then, the feature parameters of the initial sub-image features can be adjusted based on the normalization parameters corresponding to the initial sub-image features to obtain intermediate sub-image features.

[0150] The feature parameters of the initial sub-image features can be adjusted based on the normalization parameters corresponding to the initial sub-image features in the manner given by formula (3).

[0151] In this way, by normalizing the feature parameters of the initial sub-image features based on the normalization parameters determined during the training of the target image processing network, it is not necessary to redetermine the normalization parameters, which can improve the data processing efficiency of the target image processing network and thus improve the efficiency of the target image processing network in image processing.
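The lookup-then-normalize procedure described above might look like the following at inference time. The sketch assumes the preset database is a dictionary keyed by feature channel index holding a stored `(mean, variance)` pair, and that an initial sub-image feature is a two-dimensional list; `PRESET_DATABASE`, `normalize_initial_feature`, and all values are hypothetical.

```python
import math

# Stand-in for the preset database: feature channel index -> (mean, variance).
PRESET_DATABASE = {0: (2.0, 4.0), 1: (0.0, 1.0)}


def normalize_initial_feature(feature, channel, database, eps=1e-5):
    """Normalize one initial sub-image feature with the stored parameters of its channel."""
    if channel not in database:
        raise KeyError(f"no normalization parameters stored for channel {channel}")
    mean, variance = database[channel]
    scale = math.sqrt(variance + eps)
    # No statistics are recomputed here; only the stored parameters are applied.
    return [[(value - mean) / scale for value in row] for row in feature]


initial_feature = [[2.0, 4.0], [0.0, 6.0]]
intermediate = normalize_initial_feature(initial_feature, 0, PRESET_DATABASE)
```

Because the mean and variance are read rather than recomputed, the cost per image is a single pass over the feature parameters.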

[0152] In some embodiments, the step of inputting the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features includes:

[0153] Using the attention layer, the compressed feature parameters corresponding to the initial sub-image features are determined based on the average value of the feature parameters of the initial sub-image features;

[0154] The compressed feature parameters corresponding to the initial sub-image features are input into the fully connected layer in the attention layer to obtain the feature weights corresponding to the initial sub-image features.

[0155] Here, the compressed feature parameters, determined based on the average value of the feature parameters of the initial sub-image features, possess a global receptive field to a certain extent, and can characterize the global distribution of the response of the initial sub-image features on the corresponding feature channels. In other words, the overall feature information of the initial sub-image features can be represented by a single compressed feature parameter.

[0156] In this way, after inputting the compressed feature parameters corresponding to each initial sub-image feature into the fully connected layer in the attention layer, the correlation between the feature channels corresponding to each initial sub-image feature can be determined through each compressed feature parameter, and then the feature weights corresponding to each initial sub-image feature (i.e. the feature weights corresponding to the feature channels corresponding to each initial sub-image feature) can be determined.

[0157] Understandably, the fully connected layers in the attention layer can learn during network training how to determine the feature weights corresponding to each initial sub-image feature based on the compressed feature parameters corresponding to each initial sub-image feature in the input.
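A minimal sketch of this attention step is given below. The disclosure specifies averaging followed by a fully connected layer; the single-layer structure, the sigmoid used to keep each weight between 0 and 1, and the parameter values are assumptions made for illustration, as are the function names.

```python
import math


def squeeze(feature):
    """Compressed feature parameter: the average of all feature parameters in one channel."""
    values = [value for row in feature for value in row]
    return sum(values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(features, fc_weights, fc_biases):
    """Feature weight per channel from a fully connected layer over the compressed parameters."""
    compressed = [squeeze(feature) for feature in features]
    logits = [sum(w * c for w, c in zip(row, compressed)) + b
              for row, b in zip(fc_weights, fc_biases)]
    # Sigmoid is an assumed activation; it maps each logit to a weight in (0, 1).
    return [sigmoid(v) for v in logits]


# Two feature channels with 2x2 initial sub-image features (illustrative values).
initial_features = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
fc_weights = [[0.5, 0.0], [0.0, 1.0]]  # hypothetical learned parameters
fc_biases = [-1.0, 0.0]
weights = channel_weights(initial_features, fc_weights, fc_biases)
```

Since every output of the fully connected layer can depend on every compressed feature parameter, the weight assigned to one channel can reflect its relationship to the other channels.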

[0158] In some embodiments, the network parameters of the target image processing network include: a target scaling adjustment parameter and a target offset adjustment parameter, wherein the target scaling adjustment parameter represents the scaling ratio of the initial feature parameters of the weighted sub-image features; and the target offset adjustment parameter represents the offset amount of the initial feature parameters of the weighted sub-image features; the method includes:

[0159] Based on the target scaling adjustment parameter, the initial feature parameters of the weighted sub-image features are scaled to obtain the intermediate feature parameters of the weighted sub-image features.

[0160] Based on the target offset adjustment parameter, the intermediate feature parameters of the weighted sub-image features are offset to obtain the target feature parameters of the weighted sub-image features;

[0161] Based on the target feature parameters of the weighted sub-image features, the adjusted weighted sub-image features are obtained;

[0162] The step of inputting the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes:

[0163] The adjusted weighted sub-image features are input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0164] Understandably, normalization layers can address the issue of internal covariate shift during network training. However, a network whose features have been normalized may not be able to learn the distribution of those features effectively. Therefore, after obtaining the weighted sub-image features, scaling and shifting these features can, to some extent, restore the feature distribution that normalization altered.

[0165] Here, the target scaling adjustment parameter and the target offset adjustment parameter can be network parameters that can be learned in the target image processing network. The target scaling adjustment parameter and the target offset adjustment parameter can be obtained by adjusting them multiple times during the training of the target image processing network.

[0166] In some embodiments, the weighted sub-image features can be adjusted using the following formula:

[0167] $$y_i = \gamma z_i + \beta \quad (4)$$

[0168] In formula (4), $y_i$ can represent the adjusted weighted sub-image features (specifically, the target feature parameters), $z_i$ can represent the weighted sub-image features (specifically, the initial feature parameters in the weighted sub-image features), $\gamma$ can represent the target scaling adjustment parameter, $\beta$ can represent the target offset adjustment parameter, and $\gamma z_i$ can represent the intermediate feature parameters.

[0169] In this way, by adjusting the weighted sub-image features, the normalized image features can be restored to a certain extent, making the adjusted weighted sub-image features more accurate. Then, feature processing is performed based on the adjusted weighted sub-image features, resulting in more accurate image processing results.
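The scale-then-offset adjustment of formula (4) can be sketched as below, assuming a weighted sub-image feature is a two-dimensional list and that the scaling and offset adjustment parameters are plain scalars. In a trained network these two values would be learned; the values and the function name here are hypothetical.

```python
def scale_and_shift(weighted, gamma, beta):
    """Scale each initial feature parameter by gamma, then offset the result by beta."""
    adjusted = []
    for row in weighted:
        intermediate_row = [gamma * z for z in row]           # intermediate feature parameters
        adjusted.append([v + beta for v in intermediate_row])  # target feature parameters
    return adjusted


weighted_feature = [[0.5, 1.0], [1.5, 2.0]]
adjusted_feature = scale_and_shift(weighted_feature, gamma=2.0, beta=0.5)
```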

[0170] Figure 2 is a partial structural diagram of a target image processing network according to an exemplary embodiment of the present disclosure. As shown in Figure 2, in this embodiment of the present disclosure, the image to be processed 210, with C1 image channels and a pixel matrix of size h1*W1, can be input into the feature extraction layer of the target image processing network. The initial image features 220, composed of C2 initial sub-image features, are extracted through the C2 feature channels of the feature extraction layer. The feature matrix of each initial sub-image feature has a size of h2*W2.

[0171] The initial image features 220 are input into the attention layer of the target image processing network to obtain the feature weights corresponding to each initial sub-image feature in the initial image features 220. The initial image features 220 are input into the normalization layer of the target image processing network, and the normalization layer is used to normalize each initial sub-image feature in the initial image features 220 to obtain intermediate image features 230, which includes intermediate sub-image features corresponding to each initial sub-image feature.

[0172] The intermediate sub-image features corresponding to the initial sub-image features are weighted based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features. Then, weighted image features 240 are obtained based on multiple weighted sub-image features, and weighted image features 240 include multiple weighted sub-image features.

[0173] The weighted sub-image features are adjusted based on the target scaling adjustment parameter and the target offset adjustment parameter to obtain the adjusted weighted sub-image features. Then, the adjusted weighted image features 250 are obtained based on the adjusted weighted sub-image features. The adjusted weighted image features 250 include multiple adjusted weighted sub-image features.

[0174] After obtaining the adjusted weighted image features 250, the adjusted weighted image features 250 can be input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed 210.
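The flow from initial image features 220 to adjusted weighted image features 250 can be condensed into one hedged sketch. It starts from already extracted initial sub-image features and assumes stored per-channel `(mean, variance)` pairs, a single fully connected layer with a sigmoid in the attention branch, and scalar scaling and offset parameters shared by all channels; every name and value is illustrative rather than taken from the disclosure.

```python
import math


def forward(initial_features, stats, fc_weights, fc_biases, gamma, beta, eps=1e-5):
    """Attention weights, normalization, weighting, then scale and offset, per channel."""
    # Attention branch: one compressed parameter per channel, then FC + sigmoid (assumed).
    compressed = [
        sum(v for row in f for v in row) / sum(len(row) for row in f)
        for f in initial_features
    ]
    weights = [
        1.0 / (1.0 + math.exp(-(sum(w * c for w, c in zip(row, compressed)) + b)))
        for row, b in zip(fc_weights, fc_biases)
    ]
    adjusted = []
    for feature, (mean, variance), t in zip(initial_features, stats, weights):
        scale = math.sqrt(variance + eps)
        # Normalize, weight by the channel's feature weight, then scale and offset.
        adjusted.append(
            [[gamma * (t * (v - mean) / scale) + beta for v in row] for row in feature]
        )
    return adjusted


initial_features = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
stats = [(4.0, 5.0), (2.0, 1.0)]           # stored (mean, variance) per channel
fc_weights = [[0.0, 0.0], [0.0, 0.0]]      # zero weights give a feature weight of 0.5
fc_biases = [0.0, 0.0]
adjusted_features = forward(initial_features, stats, fc_weights, fc_biases, 1.0, 0.1)
```

The output would then be passed to the feature processing layer. Note that both branches read the same initial features, so the attention weights are computed from the features before normalization.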

[0175] In some embodiments, the target image processing network may be a target ResNet network.

[0176] Figure 3 is a graph showing the loss values of the original ResNet network, the ResNet network with some normalization layers removed, and the target ResNet network according to an exemplary embodiment of this disclosure. Figure 4 is a graph showing the accuracy of the original ResNet network, the ResNet network with some normalization layers removed, and the target ResNet network according to an exemplary embodiment of this disclosure.

[0177] In Figure 3, the horizontal axis represents the number of iterations during network training, and the vertical axis represents the network's loss value. In Figure 4, the horizontal axis represents the number of iterations during network training, and the vertical axis represents the network's accuracy.

[0178] The original ResNet network can be any ResNet network without attention layers, while the target ResNet network can include normalization layers and attention layers. As can be seen from Figure 3 and Figure 4, the ResNet network with some normalization layers removed converges faster than the original ResNet network, and its accuracy is also higher. This indicates that excessive normalization layers can over-regularize the network and affect its convergence. Furthermore, the target ResNet network converges faster than both the original ResNet network and the ResNet network with some normalization layers removed, and its accuracy is also higher than both. Therefore, attention layers can mitigate the regularization effect of normalization layers. A target ResNet network including both normalization and attention layers can accelerate the convergence speed of the target image processing network while improving its accuracy.

[0179] In some embodiments, the target image processing network can be a target classification network. Table 1 is a comparison table of the target classification network and the preset classification network. The preset classification network can be any neural network used for image classification.

[0180] Table 1 Comparison between the target classification network and the preset classification network

[0181]
Neural network | Loss value | Accuracy | Parameters
Preset classification network | 0.410865 | 86.500000 | 4.04 trillion
Target classification network | 0.393565 | 86.900000 | 4.24 trillion

[0182] As shown in Table 1, the loss value of the target classification network is smaller than that of the preset classification network, the accuracy of the target classification network is higher than that of the preset classification network, and the number of parameters of the target classification network is larger than that of the preset classification network.

[0183] In some embodiments, the target image processing network can be a target detection network. Table 2 is a comparison table of the target detection network and the preset detection network. The preset detection network can be any neural network used for target detection.

[0184] Table 2 Comparison of Target Detection Network and Preset Detection Network

[0185]
Neural network | Loss value | Parameters
Preset detection network | 0.0753 | 20.3 trillion
Target detection network | 0.0712 | 21.8 trillion

[0186] As shown in Table 2, the loss value of the target detection network is smaller than that of the preset detection network, and the number of parameters of the target detection network is larger than that of the preset detection network.

[0187] In other words, in this disclosure, the target image processing network includes both a normalization layer and an attention layer, which can accelerate the convergence speed of the target image processing network while improving its accuracy.

[0188] Figure 5 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure. As shown in Figure 5, the image processing apparatus 500 mainly includes:

[0189] The feature extraction module 510 is configured to input the image to be processed into the feature extraction layer of the target image processing network for feature extraction, and extract initial sub-image features corresponding to each feature channel from the image to be processed through multiple feature channels of the feature extraction layer.

[0190] The weight determination module 520 is configured to input the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0191] The normalization module 530 is configured to input the initial sub-image features into the normalization layer of the target image processing network, and use the normalization layer to normalize the initial sub-image features to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0192] The weighting module 540 is configured to perform weighted processing on the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features, so as to obtain weighted sub-image features;

[0193] The feature processing module 550 is configured to input the features of each of the weighted sub-images into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0194] In some embodiments, the weighting module 540 is configured to:

[0195] The weighted feature parameters are obtained by multiplying the feature weights corresponding to the initial sub-image features with the feature parameters of the intermediate sub-image features corresponding to the initial sub-image features.

[0196] The weighted sub-image features are obtained based on each of the weighted feature parameters.

[0197] In some embodiments, the normalization module 530 is configured to:

[0198] Using the normalization layer, based on the feature channels corresponding to each of the initial sub-image features, normalization parameters corresponding to each of the initial sub-image features are determined from a preset database; wherein, the preset database stores the normalization parameters corresponding to each of the feature channels, and the normalization parameters are determined when the target image processing network is trained;

[0199] The feature parameters of the initial sub-image features are adjusted based on the normalization parameters corresponding to the initial sub-image features to obtain the intermediate sub-image features; wherein the feature parameters of the intermediate sub-image features conform to a preset distribution law.
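
A minimal sketch of this lookup-and-adjust step is given below. It assumes, without confirmation from the disclosure, that the stored normalization parameters are a per-channel mean and variance and that the preset distribution law is zero mean and unit variance; the dictionary `NORMALIZATION_PARAMS`, its values, and the function `normalize_channel` are hypothetical.

```python
import math

# Hypothetical "preset database": per-channel statistics fixed when the network was trained.
NORMALIZATION_PARAMS = {
    0: {"mean": 2.0, "var": 4.0},
    1: {"mean": -1.0, "var": 0.25},
}

def normalize_channel(channel_index, feature_params, eps=1e-5):
    """Adjust one channel's feature parameters using its stored mean and variance."""
    params = NORMALIZATION_PARAMS[channel_index]   # lookup keyed by feature channel
    std = math.sqrt(params["var"] + eps)           # eps guards against division by zero
    return [(p - params["mean"]) / std for p in feature_params]

# eps=0.0 is used here only to keep the printed numbers exact.
print(normalize_channel(0, [0.0, 2.0, 4.0], eps=0.0))     # [-1.0, 0.0, 1.0]
print(normalize_channel(1, [-1.5, -1.0, -0.5], eps=0.0))  # [-1.0, 0.0, 1.0]
```

Because the statistics are read from storage rather than computed from the current input, the same input always yields the same output at inference time.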

[0200] In some embodiments, the weight determination module 520 is configured to:

[0201] Using the attention layer, the compressed feature parameters corresponding to the initial sub-image features are determined based on the average value of the feature parameters of the initial sub-image features;

[0202] The compressed feature parameters corresponding to the initial sub-image features are input into the fully connected layer in the attention layer to obtain the feature weights corresponding to the initial sub-image features.
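
The two steps above (averaging each channel into a compressed feature parameter, then passing the compressed parameters through a fully connected layer) can be sketched as follows. The sigmoid activation, the single fully connected layer, and all function names and weight values are assumptions made for illustration; the disclosure as quoted here does not fix them.

```python
import math

def squeeze(channels):
    """Compressed feature parameter per channel: the mean of its feature parameters."""
    return [sum(c) / len(c) for c in channels]

def fully_connected(inputs, weights, biases):
    """Plain dense layer: each output mixes the compressed parameters of all channels."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(channels, fc_weights, fc_biases):
    compressed = squeeze(channels)
    # Assumed activation: sigmoid keeps every feature weight strictly between 0 and 1.
    return [sigmoid(z) for z in fully_connected(compressed, fc_weights, fc_biases)]

channels = [[1.0, 3.0], [0.0, 0.0]]      # two feature channels
identity = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical trained weights
feature_weights = channel_attention(channels, identity, [0.0, 0.0])
```

Because every output of the fully connected layer sees the compressed parameters of all channels, a non-identity weight matrix lets each feature weight depend on the other channels as well as its own.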

[0203] In some embodiments, the network parameters of the target image processing network include: a target scaling parameter and a target offset parameter, wherein the target scaling parameter represents the scaling ratio of the initial feature parameters of the weighted sub-image features; and the target offset parameter represents the offset amount of the initial feature parameters of the weighted sub-image features; the apparatus includes:

[0204] The scaling module is configured to scale the initial feature parameters of the weighted sub-image features based on the target scaling parameter to obtain the intermediate feature parameters of the weighted sub-image features.

[0205] The offset module is configured to perform offset processing on the intermediate feature parameters of the weighted sub-image features based on the target offset parameter, so as to obtain the target feature parameters of the weighted sub-image features;

[0206] The processing module is configured to obtain adjusted weighted sub-image features based on the target feature parameters of the weighted sub-image features;

[0207] The feature processing module 550 is configured to input the adjusted weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.
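
The scale-then-offset adjustment can be sketched as below. The sketch assumes a single scalar scaling parameter and a single scalar offset parameter shared by all channels; whether the disclosed parameters are scalar or per-channel is not stated in this passage, and the function name `scale_and_offset` is hypothetical.

```python
def scale_and_offset(weighted_features, scale, offset):
    """Apply the learned scaling first, then the learned offset, to every feature parameter."""
    adjusted = []
    for channel in weighted_features:
        intermediate = [scale * p for p in channel]           # scaling step
        adjusted.append([q + offset for q in intermediate])   # offset step
    return adjusted

print(scale_and_offset([[1.0, -2.0], [0.5, 0.0]], scale=2.0, offset=1.0))
# [[3.0, -3.0], [2.0, 1.0]]
```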

[0208] In some embodiments, the apparatus further includes:

[0209] The prediction module is configured to sequentially input each labeled training image in the training image set into the initial image processing network to obtain the prediction processing result of each labeled training image.

[0210] The first loss value determination module is configured to determine the network loss value corresponding to the labeled training image based on the difference between the label of the labeled training image and the prediction processing result.

[0211] The second loss value determination module is configured to determine the total loss value of the initial image processing network based on the network loss value corresponding to each of the labeled training images;

[0212] The parameter adjustment module is configured to adjust the network parameters of the initial image processing network based on the total loss value to obtain the target image processing network.
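
The training procedure above follows the usual supervised pattern: predict, compute a per-image loss, aggregate into a total loss, and adjust the parameters. The sketch below illustrates that pattern under heavy simplification. A two-parameter linear model stands in for the initial image processing network, each "image" is a single number, the loss is squared error, the total loss is the mean of the per-image losses, and the update is plain gradient descent. All of these choices are assumptions and none is taken from the disclosure.

```python
def predict(weight, bias, x):
    # Stand-in for the initial image processing network.
    return weight * x + bias

def train_step(weight, bias, samples, lr=0.1):
    """One pass over the labeled set; returns updated parameters and the total loss."""
    losses, grad_w, grad_b = [], 0.0, 0.0
    for x, label in samples:
        err = predict(weight, bias, x) - label   # prediction vs. label
        losses.append(err ** 2)                  # network loss value for this sample
        grad_w += 2.0 * err * x
        grad_b += 2.0 * err
    n = len(samples)
    total_loss = sum(losses) / n                 # total loss value (mean, an assumption)
    weight -= lr * grad_w / n                    # parameter adjustment
    bias -= lr * grad_b / n
    return weight, bias, total_loss

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # labels follow y = 2x + 1
w, b = 0.0, 0.0
history = []
for _ in range(500):
    w, b, loss = train_step(w, b, samples)
    history.append(loss)
```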

[0213] In some embodiments, the feature processing layer includes a classifier;

[0214] The feature processing module 550 is configured to input each of the weighted sub-image features into the classifier for classification processing to obtain the classification result of the image to be processed.
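
One possible form of such a classifier is sketched below. The disclosure as quoted does not specify the classifier's structure, so the average pooling of each channel, the linear layer, the softmax, and all names and weight values here are assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weighted_features, class_weights, class_biases):
    """Pool each channel to one value, apply a linear layer, return (class index, probabilities)."""
    pooled = [sum(c) / len(c) for c in weighted_features]
    logits = [
        sum(w * v for w, v in zip(row, pooled)) + b
        for row, b in zip(class_weights, class_biases)
    ]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

label, probs = classify(
    [[1.0, 3.0], [0.0, 2.0]],                        # two channels of weighted features
    class_weights=[[1.0, 0.0], [0.0, 1.0]],          # hypothetical trained weights
    class_biases=[0.0, 0.0],
)
```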

[0215] Figure 6 is a hardware structure block diagram of an electronic device according to an exemplary embodiment of the present disclosure. For example, electronic device 600 can be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.

[0216] Referring to Figure 6, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

[0217] Processing component 602 typically controls the overall operation of electronic device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 602 may include one or more modules to facilitate interaction between processing component 602 and other components. For example, processing component 602 may include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.

[0218] Memory 604 is configured to store various types of data to support the operation of electronic device 600. Examples of this data include instructions for any application or method operating on electronic device 600, contact data, phonebook data, messages, pictures, videos, etc. Memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0219] Power supply component 606 provides power to various components of electronic device 600. Power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 600.

[0220] Multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 608 includes a front-facing camera and / or a rear-facing camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0221] Audio component 610 is configured to output and / or input audio signals. For example, audio component 610 includes a microphone (MIC) configured to receive external audio signals when electronic device 600 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 604 or transmitted via communication component 616. In some embodiments, audio component 610 also includes a speaker for outputting audio signals.

[0222] I / O interface 612 provides an interface between processing component 602 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0223] Sensor component 614 includes one or more sensors for providing state assessments of various aspects of electronic device 600. For example, sensor component 614 can detect the on/off state of electronic device 600, the relative positioning of components such as the display and keypad of electronic device 600, changes in position of electronic device 600 or a component of electronic device 600, the presence or absence of user contact with electronic device 600, orientation or acceleration/deceleration of electronic device 600, and temperature changes of electronic device 600. Sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 614 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.

[0224] Communication component 616 is configured to facilitate wired or wireless communication between electronic device 600 and other devices. Electronic device 600 can access wireless networks based on communication standards, such as Wi-Fi, 4G, or 5G, or combinations thereof. In one exemplary embodiment, communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 616 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0225] In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the image processing method described above.

[0226] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 604 including instructions, which can be executed by the processor 620 of the electronic device 600 to complete the image processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

[0227] A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform an image processing method, comprising:

[0228] The image to be processed is input into the feature extraction layer of the target image processing network for feature extraction. Through multiple feature channels of the feature extraction layer, initial sub-image features corresponding to each feature channel are extracted from the image to be processed.

[0229] The initial sub-image features are input into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0230] The initial sub-image features are input into the normalization layer of the target image processing network, and the initial sub-image features are normalized using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0231] Based on the feature weights corresponding to the initial sub-image features, the intermediate sub-image features corresponding to the initial sub-image features are weighted to obtain weighted sub-image features;

[0232] The weighted sub-image features are input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0233] Figure 7 is a hardware structure block diagram of an electronic device according to an exemplary embodiment. For example, electronic device 700 can be provided as a server. Referring to Figure 7, the electronic device 700 includes a processing component 722, which further includes one or more processors, and memory resources represented by a memory 732 for storing instructions, such as application programs, that can be executed by the processing component 722. The application programs stored in the memory 732 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 722 is configured to execute instructions to perform an image processing method, including:

[0234] The image to be processed is input into the feature extraction layer of the target image processing network for feature extraction. Through multiple feature channels of the feature extraction layer, initial sub-image features corresponding to each feature channel are extracted from the image to be processed.

[0235] The initial sub-image features are input into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features;

[0236] The initial sub-image features are input into the normalization layer of the target image processing network, and the initial sub-image features are normalized using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features.

[0237] Based on the feature weights corresponding to the initial sub-image features, the intermediate sub-image features corresponding to the initial sub-image features are weighted to obtain weighted sub-image features;

[0238] The weighted sub-image features are input into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

[0239] Electronic device 700 may also include a power supply component 726 configured to perform power management of electronic device 700, a wired or wireless network interface 750 configured to connect electronic device 700 to a network, and an input / output (I / O) interface 758. Electronic device 700 may operate on an operating system stored in memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0240] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.

[0241] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. An image processing method, characterized in that the method includes: inputting the image to be processed into the feature extraction layer of the target image processing network for feature extraction, and extracting, through multiple feature channels of the feature extraction layer, initial sub-image features corresponding to each feature channel from the image to be processed; inputting the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features, wherein the step of inputting the initial sub-image features into the attention layer to obtain the feature weights corresponding to the initial sub-image features includes: inputting the initial sub-image features into the attention layer, the attention layer determining the feature weights corresponding to each initial sub-image feature based on the feature parameters in each of the input initial sub-image features and the correlation between each of the initial sub-image features; inputting the initial sub-image features into the normalization layer of the target image processing network, and normalizing the initial sub-image features using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features, wherein the normalization includes: the normalization layer normalizing the input initial sub-image features by adjusting each feature parameter in the initial sub-image features to a preset value range, or by adjusting each feature parameter in the initial sub-image features to conform to a preset distribution law, thereby obtaining the intermediate sub-image features; weighting the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features; and inputting the weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

2. The method of claim 1, wherein the step of weighting the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features to obtain weighted sub-image features includes: multiplying the feature weights corresponding to the initial sub-image features by the feature parameters of the intermediate sub-image features corresponding to the initial sub-image features to obtain weighted feature parameters; and obtaining the weighted sub-image features based on each of the weighted feature parameters.

3. The method of claim 1, wherein the step of normalizing the initial sub-image features using the normalization layer to obtain intermediate sub-image features corresponding to the initial sub-image features includes: determining, using the normalization layer and based on the feature channels corresponding to each of the initial sub-image features, normalization parameters corresponding to each of the initial sub-image features from a preset database, wherein the preset database stores the normalization parameters corresponding to each of the feature channels, and the normalization parameters are determined when the target image processing network is trained; and adjusting the feature parameters of the initial sub-image features based on the normalization parameters corresponding to the initial sub-image features to obtain the intermediate sub-image features, wherein the feature parameters of the intermediate sub-image features conform to a preset distribution law.

4. The method of claim 1, wherein the step of inputting the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features includes: determining, using the attention layer, the compressed feature parameters corresponding to the initial sub-image features based on the average value of the feature parameters of the initial sub-image features; and inputting the compressed feature parameters corresponding to the initial sub-image features into the fully connected layer in the attention layer to obtain the feature weights corresponding to the initial sub-image features.

5. The method according to claim 1, characterized in that the network parameters of the target image processing network include a target scaling parameter and a target offset adjustment parameter, wherein the target scaling parameter represents the scaling ratio of the initial feature parameters of the weighted sub-image features, and the target offset adjustment parameter represents the offset amount of the initial feature parameters of the weighted sub-image features; the method includes: scaling the initial feature parameters of the weighted sub-image features based on the target scaling parameter to obtain the intermediate feature parameters of the weighted sub-image features; offsetting the intermediate feature parameters of the weighted sub-image features based on the target offset adjustment parameter to obtain the target feature parameters of the weighted sub-image features; and obtaining the adjusted weighted sub-image features based on the target feature parameters of the weighted sub-image features; and the step of inputting the weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes: inputting the adjusted weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

6. The method according to claim 1, characterized in that the method further includes: sequentially inputting each labeled training image in the training image set into the initial image processing network to obtain the prediction processing result of each labeled training image; determining the network loss value corresponding to the labeled training image based on the difference between the annotation label of the labeled training image and the prediction processing result; determining the total loss value of the initial image processing network based on the network loss values corresponding to each of the labeled training images; and adjusting the network parameters of the initial image processing network based on the total loss value to obtain the target image processing network.

7. The method according to claim 1, characterized in that the feature processing layer includes a classifier, and the step of inputting the weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed includes: inputting the weighted sub-image features into the classifier for classification processing to obtain the classification result of the image to be processed.

8. An image processing apparatus, characterized in that the apparatus includes: a feature extraction module configured to input the image to be processed into the feature extraction layer of the target image processing network for feature extraction, and extract initial sub-image features corresponding to each feature channel from the image to be processed through multiple feature channels of the feature extraction layer; a weight determination module configured to input the initial sub-image features into the attention layer of the target image processing network to obtain the feature weights corresponding to the initial sub-image features, wherein, specifically, the initial sub-image features are input into the attention layer, and the attention layer determines the feature weights corresponding to each initial sub-image feature based on the feature parameters in each of the input initial sub-image features and the correlation between each initial sub-image feature; a normalization module configured to input the initial sub-image features into the normalization layer of the target image processing network, and use the normalization layer to normalize the initial sub-image features to obtain intermediate sub-image features corresponding to the initial sub-image features, wherein, specifically, the normalization layer normalizes the input initial sub-image features by adjusting each feature parameter in the initial sub-image features to a preset value range, or by adjusting each feature parameter in the initial sub-image features so that each feature parameter conforms to a preset distribution law, so as to obtain the intermediate sub-image features; a weighting module configured to perform weighted processing on the intermediate sub-image features corresponding to the initial sub-image features based on the feature weights corresponding to the initial sub-image features, to obtain weighted sub-image features; and a feature processing module configured to input each of the weighted sub-image features into the feature processing layer of the target image processing network for feature processing to obtain the image processing result of the image to be processed.

9. An electronic device, characterized in that it includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement, when executing the instructions, the steps of the image processing method of any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the steps of the image processing method of any one of claims 1 to 7.