A method, apparatus, system, and storage medium for image processing.

By calculating the conflict coefficient and information richness of target images, high-value and information-rich images are selected for training, solving the problem of low training efficiency caused by low-value and information-poor images in a large number of images, and improving the efficiency of model training.

CN116152775B · Active · Publication Date: 2026-06-30 · NEUSOFT REACH AUTOMOBILE TECH (SHENYANG) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NEUSOFT REACH AUTOMOBILE TECH (SHENYANG) CO LTD
Filing Date: 2022-12-23
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In training target object recognition models, existing technologies must handle massive image sets that contain a large number of low-value images with low information richness, which results in poor training performance and low efficiency.

Method used

The conflict coefficient and information richness of each target image are obtained and used to calculate its value; images whose value falls below a preset threshold are deleted, so that high-value, information-rich images are selected for training.

Benefits of technology

While the accuracy of model training is maintained, the training efficiency of the target object recognition model is improved and the number of low-value, information-poor images is reduced.

✦ Generated by Eureka AI based on patent content.

Figure CN116152775B_ABST

Abstract

This application discloses a method, apparatus, system, and storage medium for image processing, applicable to the automotive field. The method includes: obtaining a target conflict coefficient and a target image information richness corresponding to a target image. The target conflict coefficient indicates the difficulty of recognizing a target object in the target image. The target image information richness indicates the degree of information richness contained in the target image. Based on the target conflict coefficient and the target image information richness, a target value score is obtained for the target image; this score reflects both the difficulty of recognizing the target object and the degree of image information richness. When the target value score is low, the target image is deleted. Thus, by using the target value score, images with low target object recognition difficulty and low image information richness are filtered out, reducing the size of the massive image set and improving model training efficiency while ensuring the training accuracy of the target object recognition model.

Description

Technical Field

[0001] This application relates to the field of vehicle recognition, and in particular to a method, apparatus, system and storage medium for image processing.

Background Technology

[0002] With the development of automotive technology, object recognition models have become an important technical means because they can identify objects, such as vehicles, in complex scenes. They are used for tasks such as detecting traffic congestion and optimizing traffic flow.

[0003] Current technologies often involve pre-collecting massive amounts of image data before training object recognition models. However, these massive image sets may contain images with low information richness and / or low-value images. For example, an image may only contain information about a vehicle type, indicating low information richness; or an image may not contain any vehicle information at all, making it a low-value image. Inputting these information-poor and low-value images into the object recognition model has virtually no effect on the training results and lowers training efficiency.

[0004] Low-value images are those whose vehicle-related information values are below a preset threshold. High-value images are those whose vehicle-related information values are equal to or higher than the preset threshold.

Summary of the Invention

[0005] In view of this, this application provides a method, apparatus, system and storage medium for image processing, which aims to improve the training efficiency of the target object recognition model by selecting images with high information richness and high value from a large number of pre-collected images.

[0006] Firstly, this application provides a method for image processing, the method comprising:

[0007] In response to receiving a target image, obtain a target conflict coefficient, which indicates the difficulty of identifying the target object, and a target image information richness, which indicates the richness of the image information.

[0008] The target value of the target image is determined based on the target conflict coefficient and the target image information richness.

[0009] When the target value is lower than a preset value threshold, the target image is deleted.

[0010] Optionally, the target value of the target image is determined based on the target conflict coefficient and the target image information richness, including:

[0011] Based on the target conflict coefficient and the target image information richness, and according to a preset mapping relationship, the target value is determined.

[0012] The mapping relationship is that the value is positively correlated with the target conflict coefficient and positively correlated with the information richness of the target image.

[0013] Optionally, the mapping relationship is a linear mapping relationship;

[0014] The linear mapping relationship is that the value is directly proportional to the target conflict coefficient and also directly proportional to the information richness of the target image.

[0015] Optionally, obtaining the target conflict coefficient, which indicates the difficulty of identifying the target object, includes:

[0016] Determine multiple bounding boxes corresponding to the target image and the predicted values within those bounding boxes;

[0017] Calculate the bounding box deviation of the multiple bounding boxes based on the multiple bounding boxes;

[0018] Calculate the in-box prediction deviation of the multiple bounding boxes based on their in-box predicted values;

[0019] Determine the conflict coefficient of the target image based on the bounding box deviation and the in-box prediction deviation of the multiple bounding boxes.

[0020] Optionally, the information richness of the target image is obtained in the following way:

[0021] Determine multiple feature maps of the target image and a weight coefficient for each feature map; the weight coefficient is used to indicate the degree of influence of the feature map on the recognition result of the vehicle recognition model;

[0022] The image information richness of the target image is determined based on the weight coefficient of each feature map among the multiple feature maps.

[0023] Optionally, the method further includes:

[0024] When the target value is not lower than a preset value threshold, the target image is added to the training sample library; the training sample library is used to store image data to be trained in order to train the target object recognition model.

[0025] Secondly, this application provides an image processing apparatus, the apparatus comprising:

[0026] The response unit is used to respond to the received target image by acquiring the target conflict coefficient, which indicates the difficulty of identifying the target object, and the target image information richness, which indicates the richness of image information.

[0027] The determining unit is used to determine the target value of the target image based on the target conflict coefficient and the target image information richness.

[0028] The filtering unit is used to delete the target image when the target value is lower than a preset value threshold.

[0029] Optionally, the determining unit is specifically used for:

[0030] Based on the target conflict coefficient and the target image information richness, and according to a preset mapping relationship, the target value is determined.

[0031] The mapping relationship is that the value is positively correlated with the target conflict coefficient and positively correlated with the information richness of the target image.

[0032] Thirdly, this application provides a vehicle system including an image processing apparatus as described in the second aspect.

[0033] Fourthly, this application provides a computer storage medium storing code, wherein, when the code is executed, a device executing the code implements the method described in any implementation of the first aspect above.

[0034] This application discloses a method, apparatus, system, and storage medium for image processing. When executing the method: a target conflict coefficient and a target image information richness corresponding to the target image are obtained. The target conflict coefficient indicates the difficulty of identifying the target object. The target image information richness represents the information richness of the target image. Based on the target conflict coefficient and target image information richness, a target value of the target image is obtained. Here, the target value is related to both the difficulty of identifying the target object and the information richness of the image. When the target value is low, the target image is deleted. Thus, by using the target value, images with low relevance to the target object and low information richness are filtered out, reducing the number of images and improving the training efficiency of the model while ensuring the training accuracy of the target object recognition model.

Attached Figure Description

[0035] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and components and elements are not necessarily drawn to scale.

[0036] Figure 1 is a flowchart of an image processing method provided in an embodiment of this application;

[0037] Figure 2 is a flowchart of a method for obtaining bounding box deviations provided in an embodiment of this application;

[0038] Figure 3 is a schematic diagram of obtaining first bounding boxes and second bounding boxes provided in an embodiment of this application;

[0039] Figure 4 is a schematic diagram of bounding box calculation provided in an embodiment of this application;

[0040] Figure 5 is a flowchart of a method for obtaining weight coefficients of feature maps provided in an embodiment of this application;

[0041] Figure 6 is a schematic diagram of an image processing apparatus provided in an embodiment of this application.

Detailed Implementation

[0042] The term "comprising" and its variations as used herein are open-ended inclusions, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.

[0043] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0044] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0045] As mentioned earlier, current object recognition model training involves pre-collecting massive amounts of image data related to the training samples. This image data may include low-value images or images with limited information richness. Inputting these images into the object recognition model contributes little to the training result and also lowers training efficiency.

[0046] Based on this, this application provides an image processing method aimed at selecting high-value images with rich information content, thereby improving the training efficiency of the object recognition model while ensuring its training accuracy.

[0047] To better illustrate the image processing method provided in this application, the following description, in conjunction with the accompanying drawings, provides a detailed explanation of the image processing method provided in the embodiments of this application.

[0048] Refer to Figure 1, which is a flowchart of an image processing method provided in an embodiment of this application. This method can be applied to a vehicle detection system and is executed by the detection server within the detection system. The method includes the following steps:

[0049] S101: The detection server obtains a massive number of images.

[0050] The detection system's server receives massive amounts of image data. These images may include both high-value and low-value images.

[0051] High-value images are those whose target-object-related information values are higher than a preset relevance threshold; low-value images are those whose target-object-related information values are lower than that threshold. For example, if the target object is a vehicle, low-value images are images with few or no vehicles, while high-value images are images with many vehicles.

[0052] A large number of images may include images with low information richness and images with high information richness. For example, an image that only contains vehicle model information has low information richness. If the image includes multiple kinds of vehicle information, such as vehicle model, vehicle color, and vehicle status (e.g., parked or moving), then the image has high information richness.

[0053] During the training of an object recognition model, low-value images and / or images with low information richness have virtually no impact on the training results and lead to low training efficiency. Therefore, filtering out and deleting low-value images and / or images with low information richness can improve the training efficiency of the model while ensuring its training accuracy.

[0054] S102: Obtain the target conflict coefficient and the target image information richness corresponding to the target image.

[0055] The conflict coefficient represents the difficulty of object detection in a target image and is positively correlated with that difficulty: the greater the difficulty, the larger the conflict coefficient. High-value images contain more relevant information, so object detection in them is more difficult and their conflict coefficient is larger.

[0056] The embodiments of this application can determine the conflict coefficient of a target image based on the bounding box deviation of multiple bounding boxes.

[0057] One possible implementation uses the bounding box deviation as the conflict coefficient of the target image. This is because high-value images have more labeled bounding boxes and larger bounding box deviations, while low-value images have almost no labeled bounding boxes and smaller bounding box deviations. Therefore, the bounding box deviation can be directly used as the conflict coefficient of the target image.

[0058] In another possible implementation, the conflict coefficient of the target image is determined using bounding box deviation and in-bounding box prediction deviation. Specifically, the bounding box deviation and in-bounding box prediction deviation can be substituted into a preset conflict coefficient calculation formula to calculate the conflict coefficient of the target image. The preset conflict coefficient calculation formula is positively correlated with both the bounding box deviation and the in-bounding box prediction deviation.

[0059] Assuming the in-box prediction deviation is P, the bounding box deviation is W, and the conflict coefficient of the target image is C, the preset conflict coefficient formula can be:

[0060] $C = a_1 P + b_1 W \quad (1)$

[0061] Wherein, $a_1$ and $b_1$ are positive parameters, so that C increases with both P and W; those skilled in the art can adjust them as needed.

[0062] Furthermore, the preset conflict coefficient formula can be directly proportional to the square of the prediction deviation within the bounding box, and also directly proportional to the square of the bounding box deviation. Other positive correlations are also possible, and those skilled in the art can adjust them as needed.
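The conflict coefficient formula (1) and its squared variant can be sketched in Python as follows. This is a minimal illustration, assuming the deviations P and W have already been computed and are non-negative; the function name `conflict_coefficient` and the default values a1 = b1 = 1.0 are hypothetical placeholders, not values given in this application.

```python
def conflict_coefficient(p_dev: float, w_dev: float,
                         a1: float = 1.0, b1: float = 1.0,
                         squared: bool = False) -> float:
    """Conflict coefficient C of a target image (illustrative sketch).

    p_dev: in-box prediction deviation P
    w_dev: bounding box deviation W
    squared=False -> formula (1): C = a1*P + b1*W
    squared=True  -> squared variant: C = a1*P**2 + b1*W**2
    The defaults a1 = b1 = 1.0 are placeholders, not values from the text.
    """
    if a1 <= 0 or b1 <= 0:
        # Positive coefficients keep C positively correlated with P and W.
        raise ValueError("a1 and b1 must be positive")
    if squared:
        return a1 * p_dev ** 2 + b1 * w_dev ** 2
    return a1 * p_dev + b1 * w_dev


# Usage with illustrative numbers only
c = conflict_coefficient(p_dev=0.4, w_dev=0.6, a1=0.5, b1=1.0)
```

Requiring positive coefficients is what keeps C positively correlated with both deviations, as [0058] requires; the squared variant preserves that property only for non-negative P and W.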

[0063] The in-box prediction deviation is obtained as follows:

[0064] The softmax outputs of the multiple bounding boxes are grouped by target, giving the softmax outputs of each of the multiple targets;

[0065] For each target, the prediction deviation is calculated from the softmax outputs belonging to that target, based on cross-entropy;

[0066] The prediction deviations of the multiple targets are averaged to obtain the in-box prediction deviation of the multiple bounding boxes.
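Steps [0064] to [0066] could look like the following sketch. The text does not say exactly how cross-entropy is applied, so this assumes that each bounding box has already been assigned to a target (for example by matching the boxes of the two preset models) and that the per-target deviation is the symmetric cross-entropy averaged over every pair of softmax outputs for that target. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum_k p_k * log(q_k); eps guards against log(0)."""
    return -sum(pk * math.log(max(qk, eps)) for pk, qk in zip(p, q))


def in_box_prediction_deviation(box_outputs: List[Tuple[Hashable, Sequence[float]]]) -> float:
    """box_outputs: (target_id, softmax_vector) pairs, one per bounding box."""
    # [0064]: group the softmax outputs of all bounding boxes by target.
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for target_id, probs in box_outputs:
        groups[target_id].append(probs)

    per_target: List[float] = []
    for outputs in groups.values():
        # [0065] (assumed form): symmetric cross-entropy averaged over
        # every pair of outputs that describe the same target.
        pair_terms = [
            0.5 * (cross_entropy(p, q) + cross_entropy(q, p))
            for i, p in enumerate(outputs)
            for q in outputs[i + 1:]
        ]
        if pair_terms:  # a target covered by a single box has no pair to compare
            per_target.append(sum(pair_terms) / len(pair_terms))

    # [0066]: average the per-target deviations.
    return sum(per_target) / len(per_target) if per_target else 0.0
```

With this choice, two confident outputs that agree give a small value, while confident outputs that disagree give a large one, which matches the intent that harder images receive a larger conflict coefficient.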

[0067] Compared with computing the conflict coefficient from the bounding box deviation alone, this approach selects high-value images more accurately.

[0068] For the calculation of the bounding box deviation, refer to Figure 2 and the related description below; it is not elaborated further here.

[0069] In this embodiment of the application, the image information richness of the target image can be determined based on the weight coefficients of multiple feature maps corresponding to the target image.

[0070] Image information richness refers to the degree of information richness in an image: the richer the image information, the higher the image information richness; conversely, the poorer the image information, the lower the image information richness.

[0071] In one possible implementation, a weight distribution can be constructed from the weight coefficients of the multiple feature maps corresponding to the target image; the weight distribution represents how the weight coefficients are distributed across the feature maps. The image information richness is then determined from this distribution. For example, a mapping relationship between image information richness and the uniformity of the weight distribution can be preset, where uniformity reflects the dispersion of the weight coefficients: the more uniform the distribution (i.e., the smaller the dispersion of the weight coefficients), the higher the image information richness.
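One way to turn the uniformity idea of [0071] into a number is sketched below. The choice of the coefficient of variation as the dispersion measure, and of 1 / (1 + cv) as the mapping to richness, are assumptions made for illustration; the application only states that smaller dispersion should give higher richness. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def richness_from_uniformity(weights: Sequence[float]) -> float:
    """Map the uniformity of feature-map weight coefficients to richness.

    Dispersion is measured by the coefficient of variation (population
    standard deviation / mean), an assumed choice. 1 / (1 + cv) is a
    hypothetical mapping that returns 1.0 for perfectly uniform weights
    and decreases as the weights become more dispersed.
    """
    if not weights:
        raise ValueError("need at least one weight coefficient")
    mean = statistics.fmean(weights)
    if mean <= 0:
        raise ValueError("weight coefficients are assumed to be positive")
    cv = statistics.pstdev(weights) / mean
    return 1.0 / (1.0 + cv)
```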

[0072] In one possible implementation, the information entropy of the target image can be calculated based on the weight coefficients of the feature map, and the information richness of the image can be determined based on the information entropy.

[0073] Information entropy is used to evaluate the amount of information in a piece of information, and its specific definition is as follows:

[0074] Let X be a discrete random variable with m possible outcomes, where outcome i occurs with probability $p_i$ (i = 1, 2, 3, ..., m). Its information entropy H(X) is:

[0075] $H(X) = -\sum_{i=1}^{m} p_i \log p_i \quad (2)$

[0076] The higher the information entropy, the more information it carries, meaning the richer the information content of the image.
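The entropy-based option of [0072] to [0076] can be sketched by normalizing the feature-map weight coefficients into a probability distribution and applying the Shannon entropy H(X) = -Σ p_i log p_i defined in [0074]. The normalization step, the base-2 default, and the function name are assumptions; the application does not specify them.

```python
import math
from typing import Sequence


def information_entropy(weights: Sequence[float], base: float = 2.0) -> float:
    """Entropy of feature-map weight coefficients treated as a distribution.

    Each weight is divided by the total so the values sum to 1 (assumed
    normalization); zero weights contribute nothing, since p*log(p) -> 0.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weight coefficients must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight coefficient must be positive")
    entropy = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            entropy -= p * math.log(p) / math.log(base)
    return entropy
```

Equal weights across four feature maps give the maximum of 2 bits, while a single dominant feature map drives the entropy toward 0, in line with "the higher the information entropy, the richer the image".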

[0077] For how the weight coefficients of the feature maps are obtained, refer to Figure 5 and the related description below; it is not elaborated upon here.

[0078] The detection server can first obtain the target conflict coefficient corresponding to the target image, and then obtain the target image information richness. Alternatively, the detection server can first obtain the target image information richness corresponding to the target image, and then obtain the target conflict coefficient. Or, the detection server can simultaneously obtain both the target conflict coefficient and the target image information richness. Those skilled in the art can adjust these settings as needed.

[0079] S103: Determine the target value of the target image based on the target conflict coefficient and the information richness of the target image.

[0080] The target value score reflects both the information richness of a target image and the difficulty of identifying the target object in it. Assuming the target object is a vehicle, in practical applications some target images make vehicles difficult to identify but have low information richness, possibly containing only one type of information; inputting such an image into a target object recognition model has little effect on model training. Conversely, some target images have high information richness but contain few or virtually no vehicles, and also contribute little to model training.

[0081] Therefore, the target conflict coefficient and the target image information richness need to be considered together, which is what the target value does. Based on the target value, images that are both high-value and rich in image information are selected.

[0082] In one possible implementation, the target value is determined based on the target conflict coefficient and the target image information richness using a pre-defined mapping relationship. The mapping relationship is such that the value is positively correlated with both the conflict coefficient and the image information richness.

[0083] Example illustration: Assuming the value is V, the target conflict coefficient is C, and the target image information richness is R, the mapping relationship can be:

[0084] $V = a_2 C^2 + b_2 R^2 \quad (3)$

[0085] Wherein, $a_2$ and $b_2$ are adjustable parameters, which can be set by those skilled in the art as needed.

[0086] In one possible implementation, the mapping relationship can be a linear mapping relationship, where the value is directly proportional to the target conflict coefficient and directly proportional to the target image information richness.

[0087] Example explanation: Assume the value is V, the target conflict coefficient is C, the target image information richness is R, and the linear mapping relationship is:

[0088] $V = fC + nR + p \quad (4)$

[0089] Where f is the proportionality coefficient of the target conflict coefficient, n is the proportionality coefficient of the target image information richness, and p is a constant term. Those skilled in the art can adjust f, n, and p as needed.
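Mappings (3) and (4) can be written as two small functions. This is a sketch only: the function names and default coefficients are placeholders, and for (3) the positive correlation stated in [0082] holds only when C and R are non-negative and the two coefficients are positive.

```python
def value_quadratic(c: float, r: float, a2: float = 1.0, b2: float = 1.0) -> float:
    """Mapping (3): V = a2*C**2 + b2*R**2 (placeholder coefficients)."""
    return a2 * c ** 2 + b2 * r ** 2


def value_linear(c: float, r: float, f: float = 1.0, n: float = 1.0, p: float = 0.0) -> float:
    """Linear mapping (4): V = f*C + n*R + p, with f > 0 and n > 0."""
    return f * c + n * r + p
```

The linear form is easier to tune because each coefficient directly scales one input, whereas the quadratic form rewards images in which either the conflict coefficient or the richness is unusually large.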

[0090] S104: Determine whether the target value is lower than the preset value threshold. If yes, proceed to S105; otherwise, proceed to S106.

[0091] The detection server determines whether the target value score is lower than the preset value score threshold. Assuming the preset value score threshold is 9, if the obtained target value score is 8.198, the target value score is lower than the threshold. If instead the obtained target value score is 9.235, it is not lower than the threshold.

[0092] S105: Delete target image

[0093] When the target value is lower than the preset value threshold, the target image is of low value and / or low image information richness, and it is deleted.

[0094] S106: Use the target image as the image for training the target object recognition model.

[0095] When the target value is not lower than the preset value threshold, the target image is considered to be of high value with rich image information. It is then added to the training sample set to train the target object recognition model.

[0096] Then, the detection server repeatedly executes steps S101 to S106 until all images in the pre-collected massive image set have been processed, obtaining the training sample set. At this point, the training sample set only contains images with relatively rich image information.
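The overall loop of steps S102 to S106 over the pre-collected images can be sketched as below, assuming a hypothetical callback `value_of` that wraps S102 and S103 and returns the target value of one image. The threshold 9 and the two scores come from the example in [0091]; the function name `build_training_set` is also hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def build_training_set(images: Iterable[T],
                       value_of: Callable[[T], float],
                       threshold: float = 9.0) -> Tuple[List[T], List[T]]:
    """Apply steps S102 to S106 to every pre-collected image."""
    kept: List[T] = []
    deleted: List[T] = []
    for image in images:
        value = value_of(image)          # S102 + S103: target value of the image
        if value < threshold:            # S104: lower than the value threshold?
            deleted.append(image)        # S105: delete the target image
        else:
            kept.append(image)           # S106: add it to the training sample set
    return kept, deleted


# Usage with the two example scores from [0091]
scores = {"img_a": 8.198, "img_b": 9.235}
kept, deleted = build_training_set(scores, lambda name: scores[name])
```

Here `img_a` ends up in `deleted` and `img_b` in `kept`; an image scoring exactly 9 would be kept, because the deletion condition is "lower than" the threshold.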

[0097] This application provides an image processing method that first obtains the target conflict coefficient and target image information richness corresponding to the target image. The target conflict coefficient indicates the difficulty of identifying the target object. The target image information richness represents the information richness of the target image. Based on the target conflict coefficient and target image information richness, the target value of the target image is obtained. Here, the target value is related to both the difficulty of identifying the target object and the information richness of the image. When the target value is low, the target image is deleted. In this way, by using the target value, images with low relevance to the target object and low image information richness are filtered out, reducing the size of the massive image set and improving the training efficiency of the model while ensuring the training accuracy of the target object recognition model.

[0098] Refer to Figure 2, which shows a method for obtaining bounding box deviations provided in an embodiment of this application. The method includes the following steps:

[0099] S201: The detection server determines multiple bounding boxes corresponding to the target image.

[0100] The detection server receives a massive number of images, including the target image. In this embodiment, the target image can be any image from the massive number of images.

[0101] To select high-value images and filter out low-value images, one possible implementation is to process the target image to obtain multiple bounding boxes corresponding to the target image.

[0102] In the embodiments of this application, multiple bounding boxes can be obtained in various ways, such as by using different recognition models to obtain multiple bounding boxes corresponding to the target object in the target image and the target prediction value within the bounding boxes.

[0103] In one possible implementation, the multiple bounding boxes include multiple first bounding boxes and multiple second bounding boxes. The bounding boxes represent the positions of objects in the target image and can be rectangular boxes or other shapes. No limitations are placed on the bounding boxes here. The following explanation will only use rectangular boxes as an example.

[0104] In this embodiment of the application, the first bounding box and the second bounding box are bounding boxes corresponding to the target image obtained by different methods.

[0105] In one possible implementation, the first bounding boxes can be obtained by inputting the target image into a first preset model, and the second bounding boxes by inputting the target image into a second preset model. The two preset models work on different principles, but both take the target image as input and output bounding boxes.

[0106] In one possible implementation, the first preset model is a one-stage (first-order) object recognition model, such as the YOLO series of algorithms, and the second preset model is a two-stage (second-order) object recognition model, such as the Faster R-CNN model.

[0107] In another possible implementation, the first preset model is a two-stage object recognition model, and the second preset model is a one-stage object recognition model.

[0108] It is worth noting that the first and second bounding boxes can also be obtained in other ways in the embodiments of this application, and those skilled in the art can adjust them as needed.

[0109] To improve the efficiency and accuracy of filtering massive amounts of image data, the Faster R-CNN model can be selected as the two-stage object recognition model, while the YOLOv7 model can be selected as the one-stage object recognition model.

[0110] YOLOv7 uses an architecture based on E-ELAN (Extended Efficient Layer Aggregation Network) together with compound model scaling for concatenation-based models, and employs a trainable bag-of-freebies (BoF) that includes planned re-parameterized convolution. The compound model scaling considers factors such as resolution (i.e., the size of the input image), width (i.e., the number of channels), and depth (i.e., the number of network layers).

[0111] Model weights can also be re-parameterized at the model level in two ways: train multiple models with the same settings on different training data and average their weights to obtain the final model, or take the average of the model weights at different epochs.
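The weight averaging described in [0111] amounts to an element-wise mean of corresponding parameters. The sketch below assumes each model is represented as a dictionary mapping parameter names to flattened lists of floats; a real framework would average tensors, and the function name `average_weights` is hypothetical. The same function applies whether the inputs are models trained on different data or checkpoints of one model taken at different epochs.

```python
from typing import Dict, List, Sequence


def average_weights(state_dicts: Sequence[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise mean of same-named parameters across several models."""
    if not state_dicts:
        raise ValueError("need at least one model")
    names = set(state_dicts[0])
    if any(set(sd) != names for sd in state_dicts):
        raise ValueError("all models must share the same parameter names")
    averaged: Dict[str, List[float]] = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*tensors) groups the i-th element of every model's parameter.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return averaged
```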

[0112] Therefore, compared with other models such as YOLOv4, YOLOv7 offers higher processing speed and accuracy.

[0113] The specific implementation process for obtaining the bounding box in YOLOv7 is as follows:

[0114] S2011: Set the number of categories.

[0115] For large images, such as 640*640 or 1280*1280, first preset the number of grids to divide the image into, and the number of bounding boxes predicted for each grid. For example, divide the image into 7*7 grids, and predict one bounding box for each grid.

[0116] S2012: Scale the image to 448*448, input it into a convolutional neural network (CNN) for processing, and output bounding boxes.

[0117] The CNN (Convolutional Neural Network) processes the image through convolution and pooling operations, producing 7x7x2 values, and then outputs the corresponding bounding boxes.

[0118] S2013: Set a threshold to filter the bounding boxes.

[0119] Calculate the vehicle score of every bounding box from the bounding box confidence and the bounding box grid position. Obtain the highest score for each bounding box and record its index. Filter the boxes by a threshold to create a mask, and use the mask to output the threshold-filtered scores, bounding boxes, and classes (the recorded indices).
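Step S2013 can be sketched as follows. It assumes YOLO-style outputs in which each box has a confidence and a list of per-class probabilities, and that the grid position has already been decoded into box corners; the per-class score is taken as confidence times class probability. The function name `filter_boxes` and the threshold default are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_boxes(boxes: Sequence[Box],
                 confidences: Sequence[float],
                 class_probs: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> Tuple[List[float], List[Box], List[int]]:
    # Score of every class in every box = box confidence * class probability.
    box_scores = [[conf * p for p in probs] for conf, probs in zip(confidences, class_probs)]
    # Highest score per box and the index (class) where it occurs.
    best_classes = [max(range(len(s)), key=s.__getitem__) for s in box_scores]
    best_scores = [s[c] for s, c in zip(box_scores, best_classes)]
    # Boolean mask of boxes whose best score reaches the threshold.
    mask = [score >= threshold for score in best_scores]
    # Apply the mask to the scores, boxes and classes.
    scores = [s for s, keep in zip(best_scores, mask) if keep]
    kept_boxes = [b for b, keep in zip(boxes, mask) if keep]
    classes = [c for c, keep in zip(best_classes, mask) if keep]
    return scores, kept_boxes, classes
```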

[0120] S2014: Perform non-maximum suppression.

[0121] Select the bounding box with the highest score, add it to the output list, and remove it from the bounding box list. Calculate the Intersection over Union (IoU) between this bounding box and each remaining candidate box, and remove candidate boxes whose IoU is greater than a set IoU threshold. Repeat this process until the bounding box list is empty.
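The non-maximum suppression loop of S2014 can be sketched in plain Python as follows, with boxes given as (x1, y1, x2, y2) corners. This follows the standard greedy NMS procedure; the 0.5 default IoU threshold and the function names are assumptions, not values from the application.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    # Candidate list sorted by descending score.
    candidates = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while candidates:
        best = candidates.pop(0)          # highest-scoring remaining box
        keep.append(best)                 # move it to the output list
        # Drop candidates that overlap the chosen box by more than the threshold.
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, with boxes (0, 0, 10, 10), (1, 1, 10, 10) and (20, 20, 30, 30) scored 0.9, 0.8 and 0.7, the second box overlaps the first with IoU 0.81 and is suppressed, so indices 0 and 2 are kept.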

[0122] S2015: Display the bounding box of the final output list in the target image.

[0123] The Faster R-CNN network framework can be divided into two parts according to its model function: feature extraction and decision-making. The feature extraction part generates high-quality region proposal candidate boxes, which are then judged and refined using a classification function and a bounding box regression function to initially locate the target.

[0124] The target image is first subjected to feature extraction. Proposed feature blocks of the same size are extracted and input into the decision part. Then, the classification function is used to calculate the category of the proposed feature blocks, and the bounding box regression function is used to accurately detect the position of the bounding box.

[0125] Example illustration: Refer to Figure 3, which illustrates how multiple first bounding boxes and multiple second bounding boxes are obtained in an embodiment of this application. The white bounding boxes are the second bounding boxes, obtained using Faster R-CNN, while the gray bounding boxes are the first bounding boxes, obtained using YOLOv7.

[0126] In this embodiment, the color, thickness, and text style of the frame are independent of the model and are all settings related to image processing programs. Those skilled in the art can adjust them as needed.

[0127] It is worth noting that the first and second bounding boxes are only illustrative; a third, fourth, and other bounding boxes can also be obtained from the target image. This application does not limit the number of types of bounding boxes that can be obtained.

[0128] In addition, while acquiring the multiple bounding boxes, this embodiment of the application can simultaneously acquire the predicted value within each of the bounding boxes. These in-box predicted values are used to determine the in-box prediction deviation, which is then combined with the bounding box deviation to obtain the conflict coefficient of the target image.

[0129] In one possible implementation, the predicted value within the bounding box is the output of the softmax layer. That is, the output layer is combined with the softmax activation function to output a probability value between 0 and 1.

[0130] Alternatively, the predicted value within the bounding box can be the sigmoid output, that is, the probability value produced by the output layer combined with the sigmoid activation function. The predicted value within the bounding box is therefore not limited to the softmax output.
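The two options for the in-box predicted value can be illustrated with standard numerically stable softmax and sigmoid functions. This is a minimal sketch that assumes the model exposes raw class logits for each box. The function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Softmax over one box's class logits; outputs lie in [0, 1] and sum to 1."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit: float) -> float:
    """Independent per-class probability, computed in a numerically stable way."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)                          # avoids overflow for large negative logits
    return e / (1.0 + e)
```

Softmax fits the case where each box belongs to exactly one class. Sigmoid scores each class independently, so the class probabilities for a box need not sum to 1.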

[0131] S202: Calculate the bounding box deviations of multiple bounding boxes.

[0132] The detection server calculates the bounding box deviations of the multiple bounding boxes based on the acquired bounding boxes.

[0133] In one possible implementation, the multiple bounding boxes include multiple first bounding boxes obtained from a first preset model and multiple second bounding boxes obtained from a second preset model. The multiple first bounding boxes and multiple second bounding boxes are input into a preset bounding box deviation calculation formula, and the bounding box deviations of the multiple bounding boxes can be calculated.

[0134] The bounding box deviation formula is positively correlated with the degree of difference among the multiple bounding boxes. For a given target image, different methods of labeling the target object produce different bounding boxes. For low-value images, such as those containing no target object, no bounding boxes are produced, so the results of the different methods are indistinguishable. The degree of difference is low and the bounding box deviation is almost zero. For high-value images, such as those containing multiple image types, the bounding boxes obtained differ more. The degree of difference is high and the bounding box deviation is large. Therefore, the difference among bounding boxes can be used to determine whether a target image is high-value.

[0135] In one possible implementation, the bounding box deviation is calculated using the generalized IoU (GIoU) formula.

[0136] Let the first bounding box be A and the second bounding box be B. Figure 4 is a diagram illustrating a bounding box calculation method according to an embodiment of this application. First, the smallest enclosing box C that contains both A and B is obtained. Then:

[0137] $$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

[0138] $$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$$

[0139] Where A∩B represents the intersection area of the first and second bounding boxes, A∪B represents their union area, C\(A∪B) is the area of C minus the area of A∪B, and |C| is the area of C.

[0140] The bounding box deviation of the multiple bounding boxes can be obtained by averaging the GIoU values of the multiple bounding boxes.
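A minimal sketch of the GIoU calculation and its average follows. It assumes boxes are given as `(x1, y1, x2, y2)` with positive area. It also assumes that each first bounding box is already paired with its corresponding second bounding box by index, because the matching rule is not specified here. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1 and y2 > y1


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the fraction of C not covered by A union B."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih                            # |A ∩ B|
    union = area(a) + area(b) - inter          # |A ∪ B|
    # C: the smallest box enclosing both A and B
    c = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    c_area = area(c)
    return inter / union - (c_area - union) / c_area


def mean_giou(firsts: List[Box], seconds: List[Box]) -> float:
    """Average GIoU over (first, second) box pairs matched by index (assumption)."""
    values = [giou(a, b) for a, b in zip(firsts, seconds)]
    return sum(values) / len(values)
```

GIoU ranges over (-1, 1] and is largest when the two boxes agree. A deviation that grows with disagreement would therefore presumably use a decreasing transform such as 1 - GIoU. That transform is an assumption, not something stated above.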

[0141] In one possible implementation, Distance-IoU (DIoU) can be used to represent the conflict coefficient. Compared with GIoU, DIoU offers better convergence speed and accuracy, so the resulting conflict coefficient is more precise.

[0142] $$\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2}$$

[0143] Where $\rho(b, b^{gt})$ denotes the Euclidean distance between $b$ and $b^{gt}$. Here $b$ is the center point of the first predicted bounding box, $b^{gt}$ is the center point of the second predicted bounding box, and $\rho^2$ is the squared distance between the two center points. $c$ is the length of the diagonal of the smallest rectangle enclosing both boxes. If the two boxes overlap perfectly, IoU = 1 and DIoU = 1 - 0 = 1. If the two boxes are far apart, IoU = 0 and the distance term approaches 1, so DIoU approaches 0 - 1 = -1. Therefore, the value range of DIoU is [-1, 1].

[0144] The bounding box deviation of the multiple bounding boxes can be obtained by averaging the DIoU values of the multiple bounding boxes.
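The DIoU formula and its average can be sketched as follows. As with the GIoU sketch, this assumes `(x1, y1, x2, y2)` boxes with positive area and first/second boxes paired by index, which is an assumption. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU: IoU - rho^2(b, b_gt) / c^2."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # rho^2: squared Euclidean distance between the two box centers
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest rectangle enclosing both boxes
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - rho2 / c2


def mean_diou(firsts: List[Box], seconds: List[Box]) -> float:
    """Average DIoU over (first, second) box pairs matched by index (assumption)."""
    values = [diou(a, b) for a, b in zip(firsts, seconds)]
    return sum(values) / len(values)
```

Unlike GIoU, the penalty term depends on the center distance directly. Two non-overlapping boxes far apart therefore score close to -1, matching the value range given above.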

[0145] In the embodiments of this application, other formulas for calculating the conflict coefficient may also be used, and those skilled in the art can adjust them as needed.

[0146] Figure 5 is a flowchart illustrating a method for obtaining the weight coefficients of feature maps according to an embodiment of this application. The method includes:

[0147] S501: Determine multiple feature maps corresponding to the target image.

[0148] The detection server receives a massive number of images, including the target image. In this embodiment, the target image can be any image from the massive number of images.

[0149] In this embodiment, it is first necessary to determine multiple feature maps corresponding to the target image. These feature maps are images that correspond to the features of the target image.

[0150] In one possible implementation, the target image is input into a convolutional neural network, and feature maps corresponding to the target image are determined through convolution and pooling. The number of feature maps corresponds to the number of convolution kernels.
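The relationship between convolution kernels and feature maps can be illustrated with a toy single-channel example. This is a minimal sketch, not a real backbone. It uses "valid" cross-correlation (what CNN layers call convolution) followed by non-overlapping 2×2 max pooling. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def feature_maps(image: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    """Produce one feature map per kernel: convolution followed by pooling."""
    return [max_pool2x2(conv2d_valid(image, k)) for k in kernels]
```

Passing three kernels yields exactly three feature maps, which is the correspondence stated above.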

[0151] In one possible implementation, the detection server inputs the target image into the backbone network used for feature extraction in the vehicle recognition model, obtaining multiple feature maps corresponding to the target image. For example, the ResNet residual network in a Faster R-CNN model can be used to determine these feature maps. The number of feature maps is determined by the number of output channels (neurons) in the last layer of the backbone network. For example, if the last layer has 1280 channels, there are 1280 feature maps.

[0152] In one possible implementation, the target image can be input into the backbone network of a VGG series version, such as the VGG16 backbone network, to extract multiple feature maps corresponding to the target image.

[0153] It is worth noting that this application does not limit the specific acquisition of multiple feature maps corresponding to the target image, and those skilled in the art can adjust the acquisition method as needed.

[0154] S502: Determine the weight coefficients for each feature map among multiple feature maps.

[0155] The detection server determines a weight coefficient for each of the acquired feature maps. This weight coefficient represents the influence of the feature map on the recognition result of the vehicle recognition model. The greater the influence, the larger the weight coefficient.

[0156] In one possible implementation, the weight coefficients of each feature map are obtained in a non-perturbative manner. Specifically, the weight coefficients of the feature maps can be determined through Singular Value Decomposition (SVD). SVD will be discussed in detail below.

[0157] Example: Assume the feature map is represented by a matrix A of dimension m×n, that is, A contains m data points and n features. Then the SVD of A is:

[0158] $$A = \sum_{i=1}^{k} \lambda_i \, p_i \, q_i^{T}$$

[0159] Where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the singular values of matrix A. $p = (p_1, p_2, \ldots, p_k)$ is called the left singular matrix, with dimensions m×m. $q = (q_1^T, q_2^T, \ldots, q_k^T)^T$ is called the right singular matrix, with dimensions n×n. Here k ≤ m and k ≤ n.

[0160] In this embodiment, the first row of the right singular matrix q, that is, $q_1^T$, can be used as an approximation of the weights of the feature activation maps. Compared with other methods, this way of obtaining the weight coefficients of each feature map is more efficient.
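The non-perturbative weighting can be sketched without a linear-algebra library. This is a minimal illustration. It assumes the feature maps are flattened so that each row of A is one spatial position (a data point) and each column is one feature map (a feature). Instead of a full SVD, it computes only the first right singular vector, using power iteration on AᵀA, whose dominant eigenvector is that vector. This substitution is a choice made for the sketch. The function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def svd_feature_weights(a: Matrix, iters: int = 200) -> List[float]:
    """Approximate q_1 (first right singular vector of A, m x n) as n feature-map weights.

    Power iteration on G = A^T A converges to q_1, assuming the largest
    singular value is strictly larger than the second one.
    """
    m, n = len(a), len(a[0])
    # Gram matrix G = A^T A (n x n)
    g = [[sum(a[r][i] * a[r][j] for r in range(m)) for j in range(n)] for i in range(n)]
    v = [1.0 + 0.1 * i for i in range(n)]   # non-uniform start, unlikely to be orthogonal to q_1
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n                 # A is (numerically) the zero matrix
        v = [x / norm for x in w]
    # Singular vectors are defined only up to sign; orient so the weights sum positive
    if sum(v) < 0:
        v = [-x for x in v]
    return v
```

With a numerical library available, a full SVD routine would return the same vector (up to sign) as the first row of its right singular matrix.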

[0161] In one possible implementation, the weight coefficients of each feature map are obtained through perturbation. Specifically, a weight acquisition function is preset, and the weight coefficients of each feature map in the plurality of feature maps are determined based on the multiple feature maps and the weight acquisition function. The weight acquisition function is related to the object recognition function of the feature maps in the target object recognition model.

[0162] Specifically, the weight acquisition function can be constructed from the probability distribution deviation of the recognition result and the rate of change of the object bounding box area. For example, the weight acquisition function is:

[0163] $$\lambda(x, y) = \delta_1 x + \delta_2 y \qquad (9)$$

[0164] Where x is the offset rate of the object bounding box (for example, based on IoU), y is the probability distribution deviation of the recognition result (for example, cross-entropy), and δ1 and δ2 are weighting parameters that can be adjusted as needed.

[0165] Example: Set all values of one of the feature maps to 0, and input the feature maps corresponding to the target image into the subsequent steps of the vehicle recognition model. Suppose the probability distribution deviation of the resulting object recognition result is y = 1 and the area change rate of the object bounding box is x = 50%. With δ1 = 0.3 and δ2 = 0.5, formula (9) gives a weight coefficient of 0.3 × 0.5 + 0.5 × 1 = 0.65 for this feature map.
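Formula (9) can be expressed directly in code. This is a minimal sketch. The cross-entropy helper is an assumption about how y might be measured: it compares the recognition probability distribution before zeroing the feature map (p) with the distribution after zeroing it (q). Both function names are hypothetical.

```python
import math
from typing import List


def cross_entropy(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum(p_i * log(q_i)); q is clamped to eps to avoid log(0)."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


def perturbation_weight(x: float, y: float, delta1: float, delta2: float) -> float:
    """Formula (9): lambda(x, y) = delta1 * x + delta2 * y.

    x: bounding-box offset/area-change rate after the feature map is zeroed
    y: deviation of the recognition probability distribution (e.g. cross-entropy)
    """
    return delta1 * x + delta2 * y
```

In practice x and y would come from rerunning the recognition model with the selected feature map zeroed. That rerun is outside the scope of this sketch.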

[0166] In one possible implementation, reference weight coefficients for the multiple feature maps are first obtained using the non-perturbative method described above. They are arranged in a preset order, such as from largest to smallest or vice versa. A portion of the feature maps is then selected as target feature maps according to this order, and their weights are recalculated using the perturbative method. The remaining, unperturbed feature maps keep the weights obtained by the non-perturbative method. This approach balances processing efficiency and accuracy.
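The hybrid scheme can be sketched as a small selection routine. This is a minimal illustration. It assumes descending order and a fixed count `top_k` of maps to refine. `perturb_weight` is a hypothetical callable standing in for "zero feature map i, rerun the model, and evaluate the weight function".

```python
from typing import Callable, List


def hybrid_weights(
    reference: List[float],
    perturb_weight: Callable[[int], float],
    top_k: int,
) -> List[float]:
    """Refine the top_k feature maps (by reference weight) with a perturbative method.

    reference: weights from a cheap non-perturbative method (e.g. SVD-based)
    perturb_weight: hypothetical callable returning the perturbative weight of map i
    """
    order = sorted(range(len(reference)), key=lambda i: reference[i], reverse=True)
    weights = list(reference)              # unperturbed maps keep their reference weights
    for i in order[:top_k]:
        weights[i] = perturb_weight(i)     # recompute only the selected target maps
    return weights
```

Only `top_k` model reruns are needed instead of one per feature map, which is where the efficiency gain comes from.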

[0167] In one possible implementation, backpropagation can be used to calculate gradients across multiple feature maps, and the weights of each feature map can be determined based on the gradient magnitude. The gradient is related to the degree of influence of the feature map on the object recognition result of the vehicle recognition model.

[0168] In one possible implementation, the weight coefficients of the feature maps are obtained based on IoU and its variants (such as GIoU).

[0169] Example: Before the feature map is artificially perturbed, the bounding box obtained is A. After the feature map is artificially perturbed (that is, all of its values are set to 0), the bounding box obtained is B.

[0170] $$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

[0171] Where A∩B represents the intersection area of bounding boxes A and B, and A∪B represents their union area. Compared with the non-perturbative method, the perturbative method yields feature map weight coefficients with higher accuracy.
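The IoU between the boxes obtained before and after perturbation can be turned into a weight as follows. This is a minimal sketch. The mapping weight = 1 - IoU(A, B) is an assumption: it encodes "the more the box moves when this feature map is zeroed, the more influential the map". The source does not specify how IoU becomes a weight. The function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_change_weight(box_before: Box, box_after: Box) -> float:
    """Hypothetical weight: 1 - IoU(A, B); a larger box shift means a more influential map."""
    return 1.0 - iou(box_before, box_after)
```

An unchanged box gives weight 0. A box that moves to a completely non-overlapping position gives weight 1.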

[0172] In the embodiments of this application, the weight coefficients of multiple feature maps corresponding to the target image can also be obtained in other ways, and those skilled in the art can adjust them as needed.

[0173] In addition, embodiments of this application also provide an image processing apparatus. Figure 6 is a schematic diagram of an image processing apparatus 600 provided in an embodiment of this application. The apparatus 600 includes:

[0174] The response unit 601 is configured to acquire, in response to a received target image, the target conflict coefficient indicating the difficulty of identifying the target object and the target image information richness indicating the richness of image information;

[0175] The determining unit 602 is used to determine the target value of the target image based on the target conflict coefficient and the target image information richness.

[0176] The filtering unit 603 is used to delete the target image when the target value is lower than a preset value threshold.

[0177] The determining unit 602 is specifically configured to:

[0178] Based on the target conflict coefficient and the richness of target image information, and using a preset mapping relationship, the target value is determined.

[0179] Among them, the mapping relationship is that the value is positively correlated with the target conflict coefficient and positively correlated with the information richness of the target image.

[0180] Optionally, the mapping relationship can be a linear mapping relationship;

[0181] The linear mapping relationship is that the value is directly proportional to the target conflict coefficient and also directly proportional to the information richness of the target image.
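The linear mapping and the threshold decision can be sketched together. This is a minimal illustration. It reads "directly proportional to both" as a weighted sum with positive coefficients `alpha` and `beta`, which is an assumption; the coefficients and the image records are hypothetical. Images whose value is lower than the threshold are deleted, and the rest are kept for the training sample library.

```python
from typing import List, Tuple


def target_value(conflict: float, richness: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed linear mapping: value rises with both inputs (alpha, beta > 0)."""
    return alpha * conflict + beta * richness


def screen_images(
    images: List[Tuple[str, float, float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split (image_id, conflict, richness) records into kept and deleted image ids."""
    training_library: List[str] = []
    deleted: List[str] = []
    for image_id, conflict, richness in images:
        if target_value(conflict, richness) < threshold:
            deleted.append(image_id)             # lower than the threshold: delete
        else:
            training_library.append(image_id)    # not lower: add to the training samples
    return training_library, deleted
```

An image whose value exactly equals the threshold is kept, matching the "not lower than" condition.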

[0182] Optionally, obtaining the target conflict coefficient indicating the difficulty of identifying the target object includes:

[0183] Determining multiple bounding boxes corresponding to the target image and the predicted values within the bounding boxes;

[0184] Calculating the bounding box deviation of the multiple bounding boxes based on the multiple bounding boxes;

[0185] Calculating the in-box prediction deviation of the multiple bounding boxes based on their in-box predicted values;

[0186] Determining the conflict coefficient of the target image based on the bounding box deviation and the in-box prediction deviation of the multiple bounding boxes.
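The final combination step can be sketched as follows. Both formulas here are assumptions made for illustration, because the text does not give them. The in-box prediction deviation is taken as the mean absolute difference between two models' in-box probabilities for matched boxes. The conflict coefficient is taken as a weighted sum of the two deviations, with hypothetical weights `w_box` and `w_pred`.

```python
from typing import List


def in_box_prediction_deviation(probs_a: List[float], probs_b: List[float]) -> float:
    """Hypothetical: mean absolute difference between two models' in-box probabilities."""
    return sum(abs(p - q) for p, q in zip(probs_a, probs_b)) / len(probs_a)


def conflict_coefficient(
    box_deviation: float,
    prediction_deviation: float,
    w_box: float = 0.5,
    w_pred: float = 0.5,
) -> float:
    """Hypothetical weighted sum: more disagreement gives a larger conflict coefficient."""
    return w_box * box_deviation + w_pred * prediction_deviation
```

Any combination that increases with both deviations would fit the description above. The weighted sum is simply the easiest to tune.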

[0187] Optionally, the information richness of the target image is obtained in the following ways:

[0188] Determine multiple feature maps of the target image and a weight coefficient for each feature map; the weight coefficient is used to indicate the degree of influence of the feature map on the recognition result of the vehicle recognition model;

[0189] The image information richness of the target image is determined based on the weight coefficient of each feature map among the multiple feature maps.
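One possible way to turn per-feature-map weights into a single richness score is sketched below. The aggregation is not specified in the text, so this uses a swapped-in proxy that is purely an assumption: the Shannon entropy (in bits) of the normalized absolute weights. Many comparably influential feature maps give high entropy, while a few dominant maps give low entropy. The function name is hypothetical.

```python
import math
from typing import List


def information_richness(weights: List[float]) -> float:
    """Hypothetical proxy: Shannon entropy (bits) of the normalized |weights|."""
    mags = [abs(w) for w in weights]
    total = sum(mags)
    if total == 0.0:
        return 0.0                                   # no informative feature maps
    probs = [m / total for m in mags if m > 0]       # skip zeros (0 * log 0 := 0)
    return -sum(p * math.log2(p) for p in probs)
```

With n equal weights this returns log2(n), its maximum. A single non-zero weight returns 0.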

[0190] Optionally, the apparatus 600 further includes:

[0191] The addition unit is used to add the target image to the training sample library when the target value is not lower than the preset value threshold; the training sample library is used to store the image data to be trained in order to train the target object recognition model.

[0192] This application discloses an image processing apparatus. A response unit 601 acquires the target conflict coefficient and target image information richness corresponding to a target image. The target conflict coefficient represents the difficulty of identifying the target object. The target image information richness represents the information richness of the target image. A determination unit 602 is used to acquire the target value of the target image based on the target conflict coefficient and target image information richness. Here, the target value is related to both the difficulty of identifying the target object and the information richness of the image. A filtering unit 603 deletes the target image when the target value is low. Thus, by using the target value, images with low identification difficulty and low information richness are filtered out, reducing the number of images and improving the training efficiency of the model while ensuring the training accuracy of the target object recognition model.

[0193] This application also provides a vehicle system including the image processing apparatus described above.

[0194] This application also provides corresponding devices and computer-readable storage media for implementing the solutions provided in this application.

[0195] The device includes a memory and a processor. The memory stores instructions or code, and the processor executes the instructions or code to cause the device to perform an image processing method according to any embodiment of this application.

[0196] In practical applications, the computer-readable storage medium can be any combination of one or more computer-readable media. The computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium can be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this embodiment, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0197] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0198] Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0199] Computer program code for performing the operations of this invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0200] It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0201] The above description is merely one specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for image processing, characterized in that the method includes: In response to a received target image, a target conflict coefficient indicating the difficulty of identifying the target object and a target image information richness indicating the richness of image information are obtained; the target image is any image from the massive number of images received by the detection server for training the target object recognition model; The target value of the target image is determined based on the target conflict coefficient and the target image information richness. When the target value is lower than a preset value threshold, the target image is deleted from the massive image database. When the target value is not lower than the preset value threshold, the target image is added to the training sample library; the training sample library is used to store image data to be trained in order to train the target object recognition model. The acquisition of the target conflict coefficient, which indicates the difficulty of identifying the target object, includes: Determine multiple bounding boxes corresponding to the target image and the predicted values within the bounding boxes; Calculate the bounding box deviation of the multiple bounding boxes based on the multiple bounding boxes; Calculate the in-box prediction deviation of the multiple bounding boxes based on their in-box predicted values; The conflict coefficient of the target image is determined based on the bounding box deviation and the in-box prediction deviation of the plurality of bounding boxes; The information richness of the target image is obtained through the following methods: Determine multiple feature maps of the target image and a weight coefficient for each feature map; the weight coefficient is used to indicate the degree of influence of the feature map on the recognition result of the vehicle recognition model; The image information richness of the target image is determined based on the weight coefficient of each feature map among the multiple feature maps.

2. The method according to claim 1, characterized in that, The target value of the target image is determined based on the target conflict coefficient and the target image information richness, including: Based on the target conflict coefficient and the target image information richness, and according to a preset mapping relationship, the target value is determined. The mapping relationship is that the value is positively correlated with the target conflict coefficient and positively correlated with the information richness of the target image.

3. The method according to claim 2, characterized in that, The mapping relationship is a linear mapping relationship; The linear mapping relationship is that the value is directly proportional to the target conflict coefficient and also directly proportional to the information richness of the target image.

4. An image processing apparatus, characterized in that the apparatus includes: The response unit is used to respond to the received target image by acquiring the target conflict coefficient, which indicates the difficulty of identifying the target object, and the target image information richness, which indicates the richness of image information. The determining unit is used to determine the target value of the target image based on the target conflict coefficient and the target image information richness. A filtering unit is used to delete the target image when the target value is lower than a preset value threshold; The acquisition of the target conflict coefficient, which indicates the difficulty of identifying the target object, includes: Determine multiple bounding boxes corresponding to the target image and the predicted values within the bounding boxes; Calculate the bounding box deviation of the multiple bounding boxes based on the multiple bounding boxes; Calculate the in-box prediction deviation of the multiple bounding boxes based on their in-box predicted values; The conflict coefficient of the target image is determined based on the bounding box deviation and the in-box prediction deviation of the plurality of bounding boxes; The information richness of the target image is obtained through the following methods: Determine multiple feature maps of the target image and a weight coefficient for each feature map; the weight coefficient is used to indicate the degree of influence of the feature map on the recognition result of the vehicle recognition model; The image information richness of the target image is determined based on the weight coefficient of each feature map among the multiple feature maps.

5. The apparatus according to claim 4, characterized in that, The determining unit is specifically used for: Based on the target conflict coefficient and the target image information richness, and according to a preset mapping relationship, the target value is determined. The mapping relationship is that the value is positively correlated with the target conflict coefficient and positively correlated with the information richness of the target image.

6. A vehicle system comprising the image processing apparatus as described in claim 4 or 5.

7. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores code that, when executed, performs the steps of the method as described in any one of claims 1-3.