A model training method, a target detection method, a device and an electronic device

By assigning higher weights to targets that are few in number and adjusting the parameters of the object detection model accordingly, the method keeps the model from over-focusing on the features of the most numerous targets during training, thereby improving detection accuracy and generalization.

CN122244590A · Pending · Publication Date: 2026-06-19 · INTELLINDUST INFORMATION TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
INTELLINDUST INFORMATION TECH (SHENZHEN) CO LTD
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing object detection models suffer from low detection accuracy because the number of objects of each type in the sample images varies significantly during training. As a result, the models focus on the features of the more numerous object types and ignore the features of the less numerous ones.

Method used

The loss value of each target is calculated and assigned a weight, the loss values are weighted accordingly, and the model parameters of the target detection model are adjusted based on the weighted loss, so that the model pays more attention to the features of the less numerous targets.

Benefits of technology

This improves the detection accuracy and generalization of the target detection model, enabling the model to better learn the features of the less numerous targets and thus enhancing detection performance.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244590A_ABST

Abstract

This invention provides a model training method, an object detection method, an apparatus, and an electronic device, relating to the field of image processing technology. The model training method includes: acquiring a sample image and ground truth information of each object in the sample image; inputting the sample image into an object detection model to be trained for object detection, obtaining prediction information for each object; calculating the loss value of each object based on the ground truth information and the prediction information; obtaining the weight corresponding to each object based on its loss value; performing a weighted calculation on the loss values of the objects based on their corresponding weights to obtain the loss value of the sample image; and adjusting the model parameters of the object detection model based on the loss value of the sample image. This solution can therefore improve the detection accuracy of the object detection model.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a model training method, a target detection method, an apparatus, and an electronic device.

Background Technology

[0002] In various scenarios such as industrial production and monitoring, target detection of objects such as parts and cigarette butts is required, and appropriate processing is performed based on the detection results. Furthermore, due to the increasingly widespread application of neural network models in various fields in recent years, related technologies often train neural network models for different scenarios to obtain target detection models, and then use these models in practical applications to identify targets in images or videos corresponding to various scenarios.

[0003] However, the sample images used for model training contain targets of different types, and the number of targets of each type may vary significantly. As a result, a target detection model trained on these sample images focuses more on the features of the more numerous target types and ignores the features of the less numerous ones, which lowers the detection accuracy of the target detection model.

Summary of the Invention

[0004] The purpose of this invention is to provide a model training method, an object detection method, a device, and an electronic device to improve the detection accuracy of object detection models. The specific technical solution is as follows:

[0005] In a first aspect, embodiments of the present invention provide a model training method, the method comprising:

[0006] Acquire sample images and ground truth information of each target in the sample images; wherein, the ground truth information of any target includes: the ground truth position of the target and the ground truth category of the target;

[0007] The sample image is input into the target detection model to be trained for target detection to obtain the prediction information of each target; wherein, the prediction information of any target includes: the predicted location of the target and the predicted category of the target;

[0008] Based on the ground truth information and prediction information of each target, the loss value of each target is calculated; wherein, the loss value of any target includes: the position loss value of the target calculated based on the difference between the ground truth position and the prediction position of the target, and the category loss value of the target calculated based on the difference between the ground truth category and the prediction category of the target.

[0009] Based on the loss value of each target, the weight corresponding to each target is obtained. The loss value within the set range is positively correlated with the weight. The weight corresponding to any target includes: the weight of the position loss value for that target and the weight of the category loss value for that target.

[0010] The loss value of the sample image is obtained by weighting the loss value of each target based on the weight corresponding to each target; wherein, the loss value of the sample image is: the loss value determined based on the weighted position loss value and the weighted category loss value of each target.

[0011] The model parameters of the target detection model are adjusted based on the loss value of the sample images.

[0012] In a second aspect, embodiments of the present invention provide a target detection method, the method comprising:

[0013] Acquire the image to be detected;

[0014] The image to be detected is input into the target detection model for target detection to obtain the target detection result; wherein, the target detection model is trained based on any of the model training methods described above, and the target detection result includes: the location information and category information of the target in the image to be detected.

[0015] In a third aspect, embodiments of the present invention provide a model training apparatus, the apparatus comprising:

[0016] The first acquisition module is used to acquire a sample image and the ground truth information of each target in the sample image; wherein, the ground truth information of any target includes: the ground truth position of the target and the ground truth category of the target;

[0017] The first input module is used to input the sample image into the target detection model to be trained for target detection and obtain prediction information for each target; wherein, the prediction information for any target includes: the predicted location of the target and the predicted category of the target;

[0018] The first calculation module is used to calculate the loss value of each target based on the true value information and the predicted information of each target; wherein, the loss value of any target includes: the position loss value of the target calculated based on the difference between the true value position and the predicted position of the target, and the category loss value of the target calculated based on the difference between the true value category and the predicted category of the target;

[0019] The acquisition module is used to obtain the weight corresponding to each target based on the loss value of each target. The loss value within a set range is positively correlated with the weight. The weight corresponding to any target includes: the weight of the position loss value for that target and the weight of the category loss value for that target.

[0020] The weighted calculation module is used to perform weighted calculation on the loss value of each target based on the weight corresponding to each target, so as to obtain the loss value of the sample image; wherein, the loss value of the sample image is: the loss value determined based on the weighted position loss value and the weighted category loss value of each target;

[0021] The adjustment module is used to adjust the model parameters of the target detection model based on the loss value of the sample image.

[0022] In a fourth aspect, embodiments of the present invention provide a target detection device, the device comprising:

[0023] The second acquisition module is used to acquire the image to be detected;

[0024] The second input module is used to input the image to be detected into the target detection model for target detection and obtain the target detection result; wherein, the target detection model is trained based on the model training device described above, and the target detection result includes: the location information and category information of the target in the image to be detected.

[0025] In a fifth aspect, embodiments of the present invention provide an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

[0026] Memory, used to store computer programs;

[0027] The processor, when executing the program stored in memory, implements the above-mentioned model training method or the above-mentioned object detection method.

[0028] Beneficial effects of the embodiments of the present invention:

[0029] For each target in any sample image, a corresponding weight is obtained, and the loss value of each target is weighted by that weight, so that the loss values of different targets are weighted differently; within the set range, the loss value is positively correlated with the weight. Specifically, the numbers of targets of different types differ significantly, and the detection difficulty also differs significantly among targets of the same type. After multiple rounds of training, the target detection model learns the features of the more numerous targets better: the loss value of each such target is lower, but because there are many of them their total loss value is larger, so their loss values can be assigned lower weights. Conversely, the model learns the features of the less numerous targets worse: the loss value of each such target is higher, but their total loss value is smaller, so their loss values can be assigned higher weights, which increases the loss value of sample images containing the less numerous targets. When the model parameters are then adjusted based on the loss value of the sample images, the target detection model focuses more on the features of the less numerous targets and learns them better, thereby improving the detection accuracy and generalization of the target detection model.

[0030] Of course, implementing any product or method of the present invention does not necessarily require achieving all of the advantages described above at the same time.

Attached Figure Description

[0031] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other embodiments can be obtained based on these drawings.

[0032] Figure 1 is a line graph of the loss values of targets in sample images, as provided in the related art;

[0033] Figure 2 is a schematic flowchart of a model training method provided in an embodiment of the present invention;

[0034] Figure 3 is a schematic flowchart of another model training method provided in an embodiment of the present invention;

[0035] Figure 4 is a schematic flowchart of a target detection method provided in an embodiment of the present invention;

[0036] Figure 5 is a schematic diagram of the structure of a model training device provided in an embodiment of the present invention;

[0037] Figure 6 is a schematic diagram of the structure of a target detection device provided in an embodiment of the present invention;

[0038] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art based on the present invention are within the scope of protection of the present invention.

[0040] First, a brief introduction to some technical terms used in the embodiments of this invention:

[0041] Gradient: It is the core driving force of neural network training (backpropagation), and it tells the model how to adjust the parameters to reduce the loss value.

[0042] Secondly, to better understand this solution, the problems existing in the model training methods of related technologies will be introduced:

[0043] During the training of an object detection model, there may be numerous sample images, each containing a large number of objects, and the number of objects of different types may vary significantly. When objects of one type are numerous, the cumulative loss value of that type may significantly exceed that of the other types, so that this type dominates the model's training direction and ultimately causes low detection accuracy. This is illustrated in Figure 1:

[0044] In Figure 1, the horizontal axis represents the target loss value (0-0.7), the left vertical axis represents the number of targets (0-35000), and the right vertical axis represents the cumulative loss value of all targets having the same loss value (0-5000). The solid line 110 is the curve of the number of targets: the higher the loss value, the fewer the targets. The dashed line 120 is the curve of the cumulative loss value: the higher the loss value, the lower the cumulative loss value; the peak of dashed line 120 lies at a loss value of approximately 0.19. Figure 1 also contains a solid line 130, which marks the mean of the loss values of all targets. The solid line 110 and the dashed line 120 are both left-skewed curves, that is, both the number of targets and the cumulative loss value are concentrated at low loss values.
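The two curves described for Figure 1 can be sketched as a per-bin count and a per-bin sum of loss values. The following is only an illustrative sketch: the function name `loss_histogram`, the bin width of 0.1, and the sample losses are assumptions made here, not values from the patent.

```python
def loss_histogram(losses, bin_width=0.1, max_loss=0.7):
    """Return (counts, cumulative) per loss bin.

    counts[k]     -> number of targets whose loss falls in bin k (cf. line 110)
    cumulative[k] -> sum of the losses of those targets (cf. line 120)
    Bin width and range are assumed values chosen for illustration.
    """
    n_bins = round(max_loss / bin_width)
    counts = [0] * n_bins
    cumulative = [0.0] * n_bins
    for loss in losses:
        # Clamp losses at or above max_loss into the last bin.
        idx = min(int(loss / bin_width), n_bins - 1)
        counts[idx] += 1
        cumulative[idx] += loss
    return counts, cumulative


# Hypothetical per-target losses: most targets sit at low loss values.
example_losses = [0.05, 0.12, 0.15, 0.15, 0.65]
counts, cumulative = loss_histogram(example_losses)
```

With real training data, plotting `counts` and `cumulative` against the bin centers would give curves of the kind the paragraph describes, with both concentrated at low loss values.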

[0045] For example, suppose there are 100 sample images, of which 95 are images of kittens and 5 are images of bicycle riding. Because the kitten-related targets are so numerous, they dominate the training direction of the target detection model.

[0046] Accordingly, the loss value calculated based on 100 sample images is:

[0047] $L_1 = \sum_{i \in E} l_i + \sum_{j \in H} l_j = |E|\,\bar{l}_E + |H|\,\bar{l}_H$;

[0048] Where $L_1$ is the loss value calculated from the 100 sample images, $E$ is the set of kitten-type targets, $H$ is the set of car-type targets, $l_i$ is the loss value of the i-th kitten-type target, $l_j$ is the loss value of the j-th car-type target, $\bar{l}_E$ is the mean loss value of the kitten-type targets, and $\bar{l}_H$ is the mean loss value of the car-type targets. Therefore, when kitten-type targets are numerous, they dominate the training direction of the target detection model, resulting in higher accuracy for kitten-type targets and lower accuracy for car-type targets.
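To make the dominance effect concrete, the short sketch below computes each type's share of the unweighted total loss. The per-target loss values (0.1 for kitten-type targets, 0.6 for car-type targets) are hypothetical numbers chosen only for illustration; the patent gives no such values.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical per-target losses (assumed, not from the patent):
# 95 well-learned kitten-type targets and 5 poorly learned car-type targets.
kitten_losses = [0.1] * 95
car_losses = [0.6] * 5

# Total loss of each type = number of targets * mean loss of that type.
kitten_total = len(kitten_losses) * mean(kitten_losses)
car_total = len(car_losses) * mean(car_losses)

# Fraction of the unweighted total loss contributed by the kitten type.
kitten_share = kitten_total / (kitten_total + car_total)
```

Under these assumed numbers the kitten type contributes roughly three quarters of the total loss even though each kitten target has a much lower loss than each car target, which is the imbalance the weighting scheme is meant to counteract.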

[0049] To address the aforementioned issues, embodiments of the present invention provide a model training method, an object detection method, an apparatus, and an electronic device.

[0050] The following describes a model training method provided by an embodiment of the present invention.

[0051] The model training method provided in this embodiment of the invention can be applied to electronic devices, such as computers or other terminal devices used to train object detection models, or servers used to train object detection models. This embodiment of the invention does not limit the specific form of the electronic device.

[0052] Of course, the entity executing a model training method can be a model training device. For example, a model training device can be functional software running on a terminal device to execute the model training method; a model training device can also be a functional module in a server to execute the model training method.

[0053] In one possible implementation, as shown in Figure 2, a model training method includes: acquiring a sample image and ground truth information of each target in the sample image, wherein the ground truth information of any target includes the ground truth position of the target and the ground truth category of the target (step S201); inputting the sample image into the target detection model to be trained for target detection to obtain prediction information of each target, wherein the prediction information of any target includes the predicted position of the target and the predicted category of the target (step S202); calculating the loss value of each target based on the ground truth information and prediction information of each target, wherein the loss value of any target includes the position loss value of the target, calculated based on the difference between the ground truth position and the predicted position of the target, and the category loss value of the target, calculated based on the difference between the ground truth category and the predicted category of the target (step S203); obtaining the weight corresponding to each target based on the loss value of each target, wherein the loss value within the set range is positively correlated with the weight, and the weight corresponding to any target includes the weight of the position loss value of the target and the weight of the category loss value of the target (step S204); weighting the loss value of each target based on the weight corresponding to each target to obtain the loss value of the sample image, wherein the loss value of the sample image is determined based on the weighted position loss value and the weighted category loss value of each target (step S205); and adjusting the model parameters of the target detection model according to the loss value of the sample image (step S206).

[0054] Therefore, for each target in any sample image, a corresponding weight is obtained, and the loss value of each target is weighted by that weight, so that the loss values of different targets are weighted differently; within the set range, the loss value is positively correlated with the weight. Specifically, the numbers of targets of different types differ significantly, and the detection difficulty also differs significantly among targets of the same type. After multiple rounds of training, the target detection model learns the features of the more numerous targets better: the loss value of each such target is lower, but because there are many of them their total loss value is larger, so their loss values can be assigned lower weights. Conversely, the model learns the features of the less numerous targets worse: the loss value of each such target is higher, but their total loss value is smaller, so their loss values can be assigned higher weights, which increases the loss value of sample images containing the less numerous targets. When the model parameters are then adjusted based on the loss value of the sample images, the target detection model focuses more on the features of the less numerous targets and learns them better, thereby improving the detection accuracy and generalization of the target detection model.

[0055] For step S201, the sample image can be any image in the dataset used for training the object detection model, and the sample image contains various objects. For example, sample image 1 can be an image about a road, and the cars in sample image 1 can be considered as objects. The ground truth information of any object can include: the ground truth location of the object and the ground truth category of the object. The ground truth location of the object can be the location of the object in the ground truth region in the sample image, and the ground truth category of the object can be the ground truth category to which the object belongs. The ground truth location and ground truth category of any object can be manually labeled. The ground truth region can also be called the ground truth box.

[0056] The above dataset can be constructed by collecting multiple sample images from the network in advance.

[0057] For step S202, the prediction information of any target may include the predicted location of the target and the predicted category of the target. The predicted location of any target may be the location of the target in the predicted region of the sample image predicted by the target detection model, and the predicted category of any target may be the category to which the target belongs predicted by the target detection model.

[0058] Each target in the sample image corresponds to a predicted location and a predicted category; the predicted region can also be called a predicted bounding box.

[0059] The process of predicting the location can be considered as the target detection model setting a predefined region (also called an anchor box) for each anchor point in the sample image, and selecting the prediction region of each target from the set predefined region, thereby determining the predicted location of each target.

[0060] In one implementation, after the object detection model performs object detection on the sample image, it can output not only the predicted location and predicted category of each object, but also the predicted result value that distinguishes between positive and negative samples; and subsequently, a loss value for the predicted result value used to distinguish between positive and negative samples can be determined, and the object detection model can be trained based on the loss value.

[0061] Regarding step S203, the greater the difference between the true position and the predicted position of any target, the greater the calculated position loss value of that target; conversely, the smaller the difference between the true position and the predicted position of any target, the smaller the calculated position loss value of that target.

[0062] The greater the difference between the true class and the predicted class of any target, the greater the calculated class loss value of that target; conversely, the smaller the difference between the true class and the predicted class of any target, the smaller the calculated class loss value of that target.

[0063] In one implementation, based on the ground truth position and predicted position of each target, the distance between the center point of the ground truth region represented by the ground truth position and the center point of the predicted region represented by the predicted position can be calculated to obtain the center-point distance of each target. The position loss value of each target is then calculated from its center-point distance: the larger the center-point distance of any target, the larger the position loss value of that target.
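A minimal sketch of such a center-distance position loss follows. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates and uses the Euclidean distance itself as the loss; both are assumptions made here, since the text only requires that a larger distance yield a larger loss. The function names are hypothetical.

```python
import math


def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2); format is assumed."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def center_distance_loss(gt_box, pred_box):
    """Position loss as the Euclidean distance between the two box centers.

    Using the raw distance is the simplest monotonic choice; any increasing
    function of the distance would satisfy the description equally well.
    """
    gx, gy = box_center(gt_box)
    px, py = box_center(pred_box)
    return math.hypot(gx - px, gy - py)


gt_box = (10, 10, 50, 50)    # center (30, 30)
near_box = (12, 10, 52, 50)  # center (32, 30)
far_box = (40, 50, 80, 90)   # center (60, 70)

near_loss = center_distance_loss(gt_box, near_box)
far_loss = center_distance_loss(gt_box, far_box)
```

The prediction whose center lies farther from the ground truth center receives the larger position loss, as the paragraph requires.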

[0064] Regarding step S204, to enable the object detection model to focus more on the features of a smaller number of targets, each target in this scheme has a corresponding weight, and the weight of each target is related to the target's loss value. Specifically, the weight for the target's positional loss value is related to the target's positional loss value, and the weight for the target's category loss value is related to the target's category loss value. For example, the positional loss value of target 1 is 0.5, the positional loss value of target 2 is 0.6, the weight for the positional loss value of target 1 is 1, and the weight for the positional loss value of target 2 is 1.1.

[0065] The set range can be the entire range of position loss values / category loss values, meaning that position loss values / category loss values over the entire range are positively correlated with their corresponding weights; the set range can also be a partial range of position loss values / category loss values, meaning that only position loss values / category loss values within that partial range are positively correlated with their corresponding weights.

[0066] In one implementation, a mapping relationship between the loss value of the target and its corresponding weight can be pre-defined. After calculating the loss value of each target, the weight corresponding to each target can be obtained based on the above mapping relationship.

[0067] In step S205, the loss value of each target is weighted by its corresponding weight, so that the loss values of different targets are weighted differently, yielding the weighted loss value of each target. The loss value of the sample image can then be calculated from the weighted loss values of the targets.

[0068] By considering the positional loss value of each target and its weight, the weighted positional loss value of each target can be calculated. Similarly, by considering the category loss value of each target and its weight, the weighted category loss value of each target can be calculated. After calculating the weighted positional loss value and the weighted category loss value of each target, the loss value of the sample image can be determined.

[0069] There are several ways to calculate the loss value of a sample image. One example is described below: for each target, calculate the product of each weight corresponding to the target and the matching loss value of the target to obtain the weighted position loss value and the weighted category loss value of the target. Calculate the average of the weighted position loss values and the average of the weighted category loss values over all targets. Then sum the two averages to obtain the loss value of the sample image.
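The averaging variant just described can be sketched as follows. The function name and the category losses and weights in the example are hypothetical; the position losses 0.5 and 0.6 with weights 1 and 1.1 reuse the example values given for step S204.

```python
def sample_image_loss_mean(pos_losses, pos_weights, cls_losses, cls_weights):
    """Sample-image loss: mean weighted position loss + mean weighted category loss."""
    n = len(pos_losses)
    if n == 0 or not (len(pos_weights) == len(cls_losses) == len(cls_weights) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    # Per-target weighted losses (weight * loss).
    weighted_pos = [w * l for w, l in zip(pos_weights, pos_losses)]
    weighted_cls = [w * l for w, l in zip(cls_weights, cls_losses)]
    # Average each kind over the targets, then sum the two averages.
    return sum(weighted_pos) / n + sum(weighted_cls) / n


image_loss = sample_image_loss_mean(
    pos_losses=[0.5, 0.6],
    pos_weights=[1.0, 1.1],
    cls_losses=[0.1, 0.2],   # hypothetical values
    cls_weights=[0.9, 1.0],  # hypothetical values
)
```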

[0070] Regarding step S206, after obtaining the loss value of the sample image, backpropagation can be performed on the target detection model to adjust the model parameters of the target detection model.

[0071] In one implementation, after adjusting the model parameters of the object detection model, an object detection model with adjusted model parameters can be obtained. Then, it is checked whether the object detection model has converged. If it has converged, the trained object detection model can be obtained.

[0072] In one possible implementation, weighting the loss value of each target based on its corresponding weight to obtain the loss value of the sample image includes: calculating the weighted position loss value of each target based on the weight of the position loss value of the target and the position loss value of the target (step A1); calculating the weighted category loss value of each target based on the weight of the category loss value of the target and the category loss value of the target (step A2); and determining the loss value of the sample image based on the weighted position loss value and the weighted category loss value of each target (step A3). A weighted loss value is thus calculated for both the position loss value and the category loss value of each target, and the loss value of the sample image is obtained from them. The loss value of the sample image reflects the proportions of the weighted position loss value and the weighted category loss value of each target, so that the target detection model focuses more on the features of targets with high loss values, thereby improving the detection accuracy of the target detection model.

[0073] Regarding step A1, each target has a position loss value and a corresponding weight for that position loss value. For each target, the weight of its position loss value is multiplied by its position loss value to obtain the weighted position loss value of that target. For example, if the weight of target 1's position loss value is 0.8 and the position loss value of target 1 is 0.5, the weighted position loss value of target 1 is 0.4.

[0074] Regarding step A2, each target has a category loss value and a corresponding weight for that category loss value. For each target, the weight of its category loss value is multiplied by its category loss value to obtain the weighted category loss value of that target. For example, if the weight of target 1's category loss value is 0.9 and the category loss value of target 1 is 0.1, the weighted category loss value of target 1 is 0.09.

[0075] For step A3, after the weighted position loss value and the weighted category loss value of each target have been calculated, the sum of the weighted position loss values of the targets and the sum of the weights of the position loss values of the targets can be calculated, and the ratio of the former to the latter is taken as the weighted position loss value over all targets; likewise, the sum of the weighted category loss values of the targets and the sum of the weights of the category loss values of the targets can be calculated, and the ratio of the former to the latter is taken as the weighted category loss value over all targets; the sum of these two ratios is the loss value of the sample image.

[0076] One possible implementation, based on step A2 above, is to obtain the loss value of the sample image according to the following expression:

[0077] $L = \dfrac{\sum_{i=1}^{N} w_i^{pos}\, l_i^{pos}}{\sum_{i=1}^{N} w_i^{pos}} + \dfrac{\sum_{i=1}^{N} w_i^{cls}\, l_i^{cls}}{\sum_{i=1}^{N} w_i^{cls}}$;

[0078] where $L$ is the loss value of the sample image, $N$ is the total number of targets in the sample image, $w_i^{pos}$ is the weight of the position loss value of the i-th target, $l_i^{pos}$ is the position loss value of the i-th target, $w_i^{cls}$ is the weight of the category loss value of the i-th target, and $l_i^{cls}$ is the category loss value of the i-th target.

[0079] As can be seen, the loss value of the sample image can be obtained according to the above expression. The loss value of the sample image reflects the proportions of the weighted position loss value and the weighted category loss value of each target, so that the target detection model pays more attention to the features of targets with high loss values, thereby improving the detection accuracy of the target detection model.

[0080] $w_i^{pos}\, l_i^{pos}$ can be considered as the weighted position loss value of the i-th target, that is, the product of the weight of the position loss value of the i-th target and the position loss value of the i-th target. $w_i^{cls}\, l_i^{cls}$ can be considered as the weighted category loss value of the i-th target, that is, the product of the weight of the category loss value of the i-th target and the category loss value of the i-th target. $\sum_{i=1}^{N} w_i^{pos}\, l_i^{pos}$ can be considered as the sum of the weighted position loss values of the targets, and $\sum_{i=1}^{N} w_i^{cls}\, l_i^{cls}$ as the sum of the weighted category loss values of the targets. $\sum_{i=1}^{N} w_i^{pos}$ can be considered as the sum of the weights of the position loss values of the targets, and $\sum_{i=1}^{N} w_i^{cls}$ as the sum of the weights of the category loss values of the targets.

[0081] Furthermore, the above expression can be considered as: calculating the ratio of the cumulative weighted position loss value of each target to the cumulative weight of the position loss value for each target, and calculating the ratio of the cumulative weighted class loss value of each target to the cumulative weight of the class loss value for each target, and calculating the sum of the two ratios as the loss value of the sample image.
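
The aggregation described in [0075] and [0081] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `sample_image_loss` and the plain-list inputs are hypothetical, and the per-target loss values and weights are assumed to have been computed already.

```python
from typing import Sequence


def sample_image_loss(
    pos_losses: Sequence[float],
    pos_weights: Sequence[float],
    cls_losses: Sequence[float],
    cls_weights: Sequence[float],
) -> float:
    """Sum of two weighted averages: position losses and category losses."""
    n = len(pos_losses)
    if n == 0 or not (len(pos_weights) == len(cls_losses) == len(cls_weights) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    pos_weight_sum = sum(pos_weights)
    cls_weight_sum = sum(cls_weights)
    if pos_weight_sum <= 0 or cls_weight_sum <= 0:
        raise ValueError("the weights of each kind must sum to a positive value")

    # Cumulative weighted loss divided by cumulative weight, once per loss kind.
    weighted_pos = sum(w * l for w, l in zip(pos_weights, pos_losses))
    weighted_cls = sum(w * l for w, l in zip(cls_weights, cls_losses))
    return weighted_pos / pos_weight_sum + weighted_cls / cls_weight_sum


# Two targets: position part (0.9*0.1 + 0.1*0.5) / 1.0, category part 0.4 / 2.0.
loss = sample_image_loss([0.1, 0.5], [0.9, 0.1], [0.2, 0.2], [1.0, 1.0])
```

Because each part is divided by the sum of its own weights, scaling all weights of one kind by a common factor leaves the result unchanged; only the relative weights between targets matter.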

[0082] In one implementation, any weight among the weights corresponding to any target is obtained according to the following relationship:

[0083] ;

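[0084] where the relationship yields any weight among the weights corresponding to the i-th target from the position loss value or the category loss value in the loss value of the i-th target, and its two parameters are constants, the first being greater than 1 and the second being greater than or equal to 0;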

[0085] In another implementation, any weight among the weights corresponding to any target is obtained according to the following relationship:

[0086] ;

[0087] where the relationship yields any weight among the weights corresponding to the i-th target from the position loss value or the category loss value in the loss value of the i-th target, and a and b are constants with a < 0 and b > 0.

[0088] It can be seen that any weight among the weights corresponding to any target can be obtained from the first relationship. In that case, targets with high loss values have higher weights and contribute more to the loss value of the sample image, so training is biased towards targets with high loss values and the target detection model pays more attention to their features, thereby improving the detection accuracy of the target detection model. Any weight among the weights corresponding to any target can also be obtained from the second relationship. In that case, targets with medium loss values have higher weights and contribute more to the loss value of the sample image, so training is biased towards targets with medium loss values and the target detection model pays more attention to their features, thereby improving the detection accuracy of the target detection model.

[0089] Here, the loss value used in either relationship can be the position loss value of the i-th target or the category loss value of the i-th target. When it is the position loss value of the i-th target, the resulting weight is the weight for the position loss value of the i-th target; when it is the category loss value of the i-th target, the resulting weight is the weight for the category loss value of the i-th target.

[0090] Regarding the first implementation, because the first constant is greater than 1 and the second constant is greater than or equal to 0, the relationship can be considered a monotonically increasing function. Therefore, when any weight among the weights corresponding to any target is obtained from this relationship, the set range can be taken to include the entire range of loss values: the loss value of a target is positively correlated with its weight, and the larger the loss value of the target, the larger the corresponding weight.

[0091] It is important to emphasize that when the desired training direction of the target detection model is biased towards targets with high loss values, the first relationship can be used to obtain any weight among the weights corresponding to any target. If the position loss value of a target is high, the weight for its position loss value is also high; if the category loss value of a target is high, the weight for its category loss value is also high; the weighted loss value of that target is therefore high. Targets with high loss values contribute more to the loss value of the sample image, so when the model parameters of the target detection model are adjusted based on the loss value of the sample image, the target detection model pays more attention to the features of targets with high loss values.

[0092] For the second implementation, since a < 0 and b > 0, the relationship can be considered a quadratic function whose graph opens downwards (a concave quadratic function). Therefore, when any weight among the weights corresponding to any target is obtained from this relationship, the set range can be taken to include only part of the range of loss values. Within the set range, the loss value of a target is positively correlated with its weight: the larger the loss value of the target, the larger the corresponding weight. Outside the set range, the loss value of a target is negatively correlated with its weight: the larger the loss value of the target, the smaller the corresponding weight.

[0093] It is important to emphasize that when the desired training direction of the target detection model is biased towards targets with medium loss values (targets that still have potential and are moderately difficult to train), the second relationship can be used to obtain any weight among the weights corresponding to any target. If the position loss value of a target is either high or low, the weight for its position loss value is low; if the category loss value of a target is either high or low, the weight for its category loss value is also low. Targets with high loss values and targets with low loss values therefore have lower weights, while targets with medium loss values contribute more to the loss value of the sample image. When the model parameters of the target detection model are adjusted based on the loss value of the sample image, the target detection model pays more attention to the features of targets with medium loss values.

[0094] Furthermore, depending on the desired training direction, either of the two relationships above can be selected to obtain any weight among the weights corresponding to any target. Of course, the relationships used to obtain weights are not limited to these two, and other relationships may also be used, for example another monotonically increasing function of the loss value.
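
The two weighting behaviours can be illustrated with a short Python sketch. The exact expressions are not reproduced in the text, so the forms below are assumptions chosen only to satisfy the stated constraints: `increasing_weight` uses a hypothetical exponential form `base ** loss + offset` with `base > 1` and `offset >= 0`, and `mid_focus_weight` uses a hypothetical downward-opening quadratic `a * loss**2 + b * loss` with `a < 0` and `b > 0`. All function names and default constants are illustrative.

```python
def increasing_weight(loss: float, base: float = 2.0, offset: float = 0.0) -> float:
    """Assumed monotonically increasing form: higher loss, higher weight."""
    if base <= 1.0 or offset < 0.0:
        raise ValueError("requires base > 1 and offset >= 0")
    return base ** loss + offset


def mid_focus_weight(loss: float, a: float = -4.0, b: float = 4.0) -> float:
    """Assumed downward-opening quadratic: medium losses get the highest weight."""
    if a >= 0.0 or b <= 0.0:
        raise ValueError("requires a < 0 and b > 0")
    return a * loss * loss + b * loss


# Low, medium and high loss values, e.g. position losses of the form 1 - IoU.
losses = [0.1, 0.5, 0.9]
increasing = [increasing_weight(l) for l in losses]
mid_focus = [mid_focus_weight(l) for l in losses]
```

With the quadratic sketch, the weight peaks at a loss of `-b / (2 * a)` (0.5 for the defaults) and becomes non-positive once the loss reaches `-b / a`, so the constants would have to be chosen to suit the range of loss values actually encountered.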

[0095] In one possible implementation, as shown in Figure 3, after the model parameters of the target detection model are adjusted, the method further includes: acquiring a test image and ground truth information of each target in the test image (step S301); inputting the test image into the adjusted target detection model for target detection to obtain prediction information of each target (step S302); calculating a first score based on the ground truth information and prediction information of each target, wherein the first score is a score of the similarity between the ground truth information and the prediction information of each target (step S303); if a predetermined condition is met, stopping the training of the target detection model, wherein the predetermined condition is that the first score is lower than a second score and the second score is lower than the first score calculated after the model parameters were last adjusted before the second score was obtained, the second score being the first score calculated after the previous adjustment of the model parameters (step S304); and if the predetermined condition is not met, checking whether a preset convergence condition has been reached, and if the preset convergence condition has not been reached, returning to the step of acquiring a sample image and ground truth information of each target in the sample image (step S305). In other words, the predetermined condition is met when the score has fallen after each of the two most recent adjustments.
As can be seen, after the model parameters of the target detection model are adjusted, the test image and the ground truth information of each target in the test image can be acquired, the test image can be input into the adjusted target detection model to obtain the prediction information of each target, and the first score can be calculated from the ground truth information and prediction information of each target. If the predetermined condition is met (the detection accuracy of the adjusted target detection model keeps decreasing), training is stopped directly to ensure the stability of the target detection model after the parameter adjustment and to reduce the probability that its detection accuracy continues to decrease. If the predetermined condition is not met (the detection accuracy of the adjusted target detection model does not keep decreasing), it is checked whether the preset convergence condition has been reached. If it has not been reached, the process returns to the step of acquiring a sample image and ground truth information of each target in the sample image, so that the target detection model continues to be trained and its detection accuracy improves.

[0096] For step S301, the test image can be any image used to test the adjusted target detection model, and the test image contains each target; the test image is different from the sample image mentioned above.

[0097] The ground truth information of each target can include the position of the real region of each target in the test image and the real category of each target. The ground truth information of each target can be manually labeled.

[0098] For step S302, the prediction information of each target is produced by the adjusted target detection model and can include the position of the predicted region of each target in the test image and the predicted category of each target; each target in the test image has a corresponding predicted position and predicted category.

[0099] Regarding step S303, the first score can be a score about the similarity between the ground truth information and the predicted information of each target. That is, it is a score obtained by combining the similarity between the ground truth position and the predicted position of each target and the similarity between the ground truth category and the predicted category. The higher the first score, the higher the detection accuracy of the target detection model and the better the training effect of the target detection model. Conversely, the lower the first score, the lower the detection accuracy of the target detection model and the worse the training effect of the target detection model.

[0100] In one implementation, AP (Average Precision) can be calculated as the accuracy of the object detection model. Specifically, AP can be obtained by plotting the PR (Precision-Recall) curve and calculating the area under the PR curve, and the calculated AP can be used as the first score.
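
As a rough illustration of the AP variant, the following Python sketch computes the area under a precision-recall curve. It rests on several assumptions that the text does not specify: detections have already been matched to ground truth (each is flagged as a true or false positive), the curve is built by sweeping the confidence threshold from high to low, and precision is made non-increasing before integration, which is one common convention. The function name `average_precision` is hypothetical.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """Area under the PR curve for (confidence, is_true_positive) detections."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Sweep the confidence threshold from the most to the least confident detection.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    true_pos = 0
    false_pos = 0
    for _, is_true_positive in ordered:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)

    # Replace each precision by the best precision at any equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas between consecutive recall values.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


first_score = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```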

[0101] In one implementation, the distance between the center point of the real region represented by the ground truth location of each target and the center point of the predicted region represented by the predicted location can be determined first to obtain the distance to be utilized for each target. The average value of the distance to be utilized is calculated. Based on the average value of the distance to be utilized, the similarity between the ground truth location and the predicted location of each target is calculated to obtain a first similarity. The similarity between the ground truth category and the predicted category of each target is calculated to obtain a second similarity. A first score is calculated based on the first similarity and the second similarity. The lower the first similarity and the second similarity, the smaller the calculated first score. Correspondingly, the higher the first similarity and the second similarity, the larger the calculated first score.

[0102] In another implementation, the first score can be determined based solely on the similarity between the ground truth position and the predicted position. For example, the test image includes target 3 and target 4. The distance to be utilized for target 3 is 0.5 cm and the distance to be utilized for target 4 is 0.7 cm, so the average distance to be utilized is 0.6 cm. Using the formula y = -20x + 100, where x is the average distance to be utilized and y is the first score, the first score is calculated to be 88.

[0103] In another implementation, the intersection-union ratio (IoU) between the real region represented by the ground truth position and the predicted region represented by the predicted position of each target can be determined first to obtain the target IoU of each target. The average of the target IoUs is then calculated, and the first score is calculated based on this average: the higher the average IoU, the higher the first score, and vice versa. For example, in a test image containing target a and target b, the IoU of target a is 0.8 and the IoU of target b is 0.7, so the average IoU is 0.75. Using the formula y = 100x, where x is the average IoU and y is the first score, the first score is calculated to be 75. Of course, the above implementation is merely an illustrative example and does not limit the scope of the present invention.
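
The two worked examples in [0102] and [0103] can be reproduced with the small Python sketch below. The linear mappings y = -20x + 100 and y = 100x are taken from those examples; the function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def center_distance(truth_box: Box, predicted_box: Box) -> float:
    """Distance between the center points of the real and predicted regions."""
    tx, ty = (truth_box[0] + truth_box[2]) / 2, (truth_box[1] + truth_box[3]) / 2
    px, py = (predicted_box[0] + predicted_box[2]) / 2, (predicted_box[1] + predicted_box[3]) / 2
    return math.hypot(tx - px, ty - py)


def score_from_distances(distances: Sequence[float]) -> float:
    """First score from the average center distance, using y = -20x + 100."""
    average = sum(distances) / len(distances)
    return -20.0 * average + 100.0


def score_from_ious(ious: Sequence[float]) -> float:
    """First score from the average IoU, using y = 100x."""
    average = sum(ious) / len(ious)
    return 100.0 * average


distance_score = score_from_distances([0.5, 0.7])  # targets 3 and 4
iou_score = score_from_ious([0.8, 0.7])            # targets a and b
```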

[0104] Step S304 covers the case in which the predetermined condition is met, namely, the first score is lower than the second score, and the second score is lower than the first score calculated after the model parameters of the target detection model were last adjusted before the second score was obtained, where the second score is the first score calculated after the previous adjustment of the model parameters. In this case, it can be considered that the detection accuracy of the adjusted target detection model has decreased compared with the model before adjustment and that the detection accuracy of the target detection model is decreasing continuously, which has a negative impact on training. Therefore, the training of the target detection model can be stopped immediately, and the model parameters can be adjusted by the person supervising the model training to ensure the stability of the target detection model after the parameter adjustment, thereby reducing the probability that the detection accuracy continues to decrease.

[0105] Step S305 covers the case in which the predetermined condition is not met, for example, when the first score is not lower than the second score. In this case, it can be considered that the detection accuracy of the adjusted target detection model is not decreasing continuously and that training is not having a negative effect. Therefore, it is checked whether the preset convergence condition has been reached. If it has not been reached, the process returns to the step of acquiring a sample image and ground truth information of each target in the sample image. Conversely, if the preset convergence condition has been reached, training of the target detection model is determined to be complete, and the trained target detection model is obtained.

[0106] The preset convergence condition can be that the loss value of the test image is lower than a predetermined threshold; if it is, the preset convergence condition is deemed to have been reached. This embodiment of the invention does not specifically limit this.

[0107] In one implementation, if the preset convergence condition has not been reached, the model parameters related to the learning rate of the target detection model can be increased before the process returns to the step of acquiring a sample image and ground truth information of each target in the sample image. With a higher learning rate, the target detection model can reach the preset convergence condition more quickly in subsequent training rather than being trained indefinitely, thereby improving the efficiency of model training.
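
The decision taken after each parameter adjustment (steps S304 and S305) can be sketched as follows. This is an illustrative reading rather than a definitive implementation: the function names, the list of first scores kept in chronological order, and the string return values are assumptions, and the convergence check uses the loss threshold example from [0106].

```python
from typing import List


def accuracy_keeps_falling(first_scores: List[float]) -> bool:
    """True when the score fell after each of the two most recent adjustments."""
    if len(first_scores) < 3:
        return False  # not enough history to compare three consecutive scores
    before_previous, previous, current = first_scores[-3:]
    return current < previous < before_previous


def next_action(first_scores: List[float], test_loss: float, loss_threshold: float) -> str:
    """Return 'stop', 'converged' or 'continue' for the training loop."""
    if accuracy_keeps_falling(first_scores):
        return "stop"          # step S304: predetermined condition met
    if test_loss < loss_threshold:
        return "converged"     # preset convergence condition reached
    return "continue"          # step S305: go back and fetch the next sample image


action = next_action([80.0, 75.0, 70.0], test_loss=0.5, loss_threshold=0.1)
```

A single drop in the score does not stop training under this reading; only two consecutive drops do.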

[0108] In one possible implementation, calculating the position loss value of each target includes: obtaining the intersection-union ratio (IoU) between the real region and the predicted region of each target based on the ground truth position and the predicted position of each target (step B1); and determining the position loss value of each target based on that IoU (step B2). The IoU between the real region and the predicted region of a target directly reflects the difference between the two regions. The smaller the IoU of a target, the greater the difference between its real region and predicted region, and the larger the position loss value determined for that target. Conversely, the larger the IoU, the smaller the difference between its real region and predicted region, and the smaller the position loss value determined for that target. Thus, an accurate position loss value can be calculated for each target.

[0109] Regarding step B1, the ground truth region of any target can be considered as the ground truth bounding box of the target. In the sample image, the target is located within the ground truth region of the target, and the predicted region of any target can be considered as the predicted bounding box of the target.

[0110] The intersection-union ratio between the real and predicted regions of each target is the ratio between the intersection region and the union region of the real and predicted regions of each target.

[0111] For step B2, the range of the intersection-union ratio between the real region and the predicted region of each target is [0, 1], so its maximum value is 1. The position loss value of any target can be 1 minus the intersection-union ratio between the real region and the predicted region of that target.

[0112] To better understand this section, we will explain it in conjunction with expressions below:

[0113] $$L_{IoU} = 1 - IoU$$;

[0114] where $L_{IoU}$ is the position loss value of any target in the sample image, and $IoU$ is the intersection-union ratio between the real region and the predicted region of that target.
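
Steps B1 and B2 (position loss = 1 minus the intersection-union ratio, as stated in [0111]) can be sketched in Python as below. The axis-aligned `(x1, y1, x2, y2)` box layout and the function names are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(box_a: Box, box_b: Box) -> float:
    """Step B1: intersection area divided by union area of two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def iou_position_loss(truth_box: Box, predicted_box: Box) -> float:
    """Step B2: position loss as 1 minus the IoU."""
    return 1.0 - iou(truth_box, predicted_box)


# Overlap of 1 unit of area out of a union of 7, so the loss is 6/7.
loss = iou_position_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

For two boxes that do not overlap at all, this loss is 1 regardless of how far apart they are, which is the limitation the alternative expressions in the following paragraphs address.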

[0115] Of course, the above expression is only an example, and the position loss value of each target can also be calculated using other expressions below;

[0116] (1) According to the following expression, the position loss value of each target is:

[0117] ;

[0118] ;

[0119] where the quantities in the above expressions are: the intersection-union ratio (IoU) between the real region and the predicted region of the i-th target; the Euclidean distance between the center of the predicted region and the center of the real region of the i-th target; the predicted region of the i-th target; the real region of the i-th target; c, the diagonal length of the minimum bounding region of the predicted region and the real region, where the minimum bounding region is the smallest rectangle that contains both the predicted region and the real region; L, the position loss value of each target; and N, the total number of targets in the sample image.

[0120] As can be seen, the above loss function can solve the gradient vanishing problem caused by the non-intersection of the target's true region and the predicted region.
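
The quantities listed for expression (1) match the widely used Distance-IoU form, in which the squared center distance divided by the squared diagonal of the minimum bounding region is added to 1 minus the IoU. The Python sketch below assumes that form; it is not confirmed as the exact expression of the invention. The function names and the `(x1, y1, x2, y2)` box layout are hypothetical, and the IoU is passed in as a precomputed value.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def center_distance_term(predicted_box: Box, truth_box: Box) -> float:
    """Squared center distance over squared diagonal of the minimum bounding region."""
    px, py = (predicted_box[0] + predicted_box[2]) / 2, (predicted_box[1] + predicted_box[3]) / 2
    tx, ty = (truth_box[0] + truth_box[2]) / 2, (truth_box[1] + truth_box[3]) / 2
    center_dist_sq = (px - tx) ** 2 + (py - ty) ** 2

    # Smallest rectangle that contains both regions.
    enclose_w = max(predicted_box[2], truth_box[2]) - min(predicted_box[0], truth_box[0])
    enclose_h = max(predicted_box[3], truth_box[3]) - min(predicted_box[1], truth_box[1])
    diagonal_sq = enclose_w ** 2 + enclose_h ** 2
    return center_dist_sq / diagonal_sq if diagonal_sq > 0 else 0.0


def distance_iou_loss(iou_value: float, predicted_box: Box, truth_box: Box) -> float:
    """Assumed Distance-IoU form: 1 - IoU + center distance term."""
    return 1.0 - iou_value + center_distance_term(predicted_box, truth_box)


# Non-overlapping boxes (IoU = 0) at two different distances from the real region.
near = distance_iou_loss(0.0, (2.0, 0.0, 3.0, 1.0), (0.0, 0.0, 1.0, 1.0))
far = distance_iou_loss(0.0, (4.0, 0.0, 5.0, 1.0), (0.0, 0.0, 1.0, 1.0))
```

Unlike the plain 1 minus IoU loss, the two non-overlapping cases produce different values, so the loss still indicates in which direction the predicted region should move.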

[0121] (2) According to the following expression, the position loss value of each target is:

[0122] ;

[0123] ;

[0124] where the quantities in the above expressions are: the intersection-union ratio (IoU) between the real region and the predicted region of the i-th target; the Euclidean distance between the center of the predicted region and the center of the real region of the i-th target; the predicted region of the i-th target; the real region of the i-th target; c, the diagonal length of the minimum bounding region of the predicted region and the real region; v, the difference in aspect ratio between the predicted region and the real region; the width of the real region; the height of the real region; the width of the predicted region; the height of the predicted region; the weight applied to the aspect-ratio difference between the predicted region and the real region; L, the position loss value of each target; and N, the total number of targets in the sample image.

[0125] As can be seen, the above loss function can solve the gradient vanishing problem caused by the non-intersection of the target's real region and the predicted region, and takes into account the consistency of the aspect ratio between the target's real region and the predicted region.
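
The additional aspect-ratio term of expression (2) resembles the one used in the widely known Complete-IoU loss. The Python sketch below assumes that standard form (v built from the difference of two arctangents, and a weight equal to v divided by (1 - IoU) + v); the text does not confirm that the invention uses exactly this form. The function name and the `(x1, y1, x2, y2)` box layout are hypothetical, and the IoU is passed in as a precomputed value.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def aspect_ratio_term(iou_value: float, predicted_box: Box, truth_box: Box) -> float:
    """Weighted aspect-ratio difference, to be added to the distance-based loss."""
    pred_w, pred_h = predicted_box[2] - predicted_box[0], predicted_box[3] - predicted_box[1]
    true_w, true_h = truth_box[2] - truth_box[0], truth_box[3] - truth_box[1]
    if pred_h <= 0 or true_h <= 0:
        raise ValueError("boxes must have positive height")

    # v: aspect-ratio difference, scaled so that it lies in [0, 1].
    v = (4.0 / math.pi ** 2) * (math.atan(true_w / true_h) - math.atan(pred_w / pred_h)) ** 2

    # Weight of the aspect-ratio difference; zero when there is nothing to correct.
    denominator = (1.0 - iou_value) + v
    weight = v / denominator if denominator > 0 else 0.0
    return weight * v


# A 2x1 prediction against a 1x1 real region; the two boxes have an IoU of 0.5.
term = aspect_ratio_term(0.5, (0.0, 0.0, 2.0, 1.0), (0.0, 0.0, 1.0, 1.0))
```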

[0126] (3) According to the following expression, the position loss value of each target is:

[0127] ;

[0128] ;

[0129] where the quantities in the above expressions are: the intersection-union ratio (IoU) between the real region and the predicted region of the i-th target; the Euclidean distance between the center of the predicted region and the center of the real region of the i-th target; the predicted region of the i-th target; the real region of the i-th target; c, the diagonal length of the minimum bounding region of the predicted region and the real region; the width of the real region; the height of the real region; the width of the predicted region; the height of the predicted region; the width of the minimum bounding region; the height of the minimum bounding region; L, the position loss value of each target; and N, the total number of targets in the sample image.

[0130] As can be seen, the above loss function can solve the gradient vanishing problem caused by the non-intersection of the target's real region and the predicted region. Furthermore, considering the consistency of the aspect ratio between the target's real region and the predicted region, it also avoids the problem of the gradient of the target's aspect ratio being coupled with the gradient of the intersection-union ratio between the target's real region and the predicted region, making the gradient more stable.
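
The quantities listed for expression (3) are those of the commonly cited Efficient-IoU form, which penalises width and height differences separately, each normalised by the corresponding side of the minimum bounding region. The Python sketch below assumes that form and shows only the width and height term; it is an illustration, not a confirmed reproduction of the expression. The function name and the `(x1, y1, x2, y2)` box layout are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def width_height_term(predicted_box: Box, truth_box: Box) -> float:
    """Squared width and height differences, normalised by the bounding region."""
    pred_w, pred_h = predicted_box[2] - predicted_box[0], predicted_box[3] - predicted_box[1]
    true_w, true_h = truth_box[2] - truth_box[0], truth_box[3] - truth_box[1]

    # Width and height of the smallest rectangle that contains both regions.
    enclose_w = max(predicted_box[2], truth_box[2]) - min(predicted_box[0], truth_box[0])
    enclose_h = max(predicted_box[3], truth_box[3]) - min(predicted_box[1], truth_box[1])
    if enclose_w <= 0 or enclose_h <= 0:
        raise ValueError("minimum bounding region must have positive width and height")

    width_part = (pred_w - true_w) ** 2 / enclose_w ** 2
    height_part = (pred_h - true_h) ** 2 / enclose_h ** 2
    return width_part + height_part


# A 2x1 prediction against a 1x1 real region: only the width differs.
term = width_height_term((0.0, 0.0, 2.0, 1.0), (0.0, 0.0, 1.0, 1.0))
```

Because width and height enter as separate terms rather than through a single aspect ratio, each side length receives its own gradient, which is consistent with the decoupling described in [0130].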

[0131] The inventors' research revealed that the most basic IoU loss function often yields the best training results for object detection models.

[0132] In one possible implementation, as shown in Figure 4, this embodiment of the invention also provides a target detection method, which includes:

[0133] S401, acquire the image to be detected;

[0134] S402, Input the image to be detected into the target detection model to perform target detection and obtain the target detection result;

[0135] The target detection model is trained based on any of the above model training methods, and the target detection results include: the location information and category information of the target in the image to be detected.

[0136] The target detection method provided in this embodiment of the invention can acquire an image to be detected and input the image to be detected into a target detection model for target detection to obtain a target detection result. The target detection model is a target detection model trained according to any of the above-mentioned model training methods. Using a pre-trained target detection model with high detection accuracy can improve the accuracy of target detection.

[0137] In this embodiment of the invention, the image to be detected can be acquired, and a target detection model trained according to any of the above-described model training methods can be obtained. The target detection model can then be applied to the image to be detected to determine the location information and category information of the target in the image (i.e., the target detection result). Because a pre-trained target detection model with high detection accuracy is used, the accuracy of target detection can be improved.

[0138] Based on the above method embodiments, this invention also provides a model training device. As shown in Figure 5, the device includes:

[0139] The first acquisition module 510 is used to acquire a sample image and the ground truth information of each target in the sample image; wherein, the ground truth information of any target includes: the ground truth position of the target and the ground truth category of the target;

[0140] The first input module 520 is used to input the sample image into the target detection model to be trained for target detection and obtain prediction information of each target; wherein, the prediction information of any target includes: the predicted position of the target and the predicted category of the target;

[0141] The first calculation module 530 is used to calculate the loss value of each target based on the true value information and the predicted information of each target; wherein, the loss value of any target includes: the position loss value of the target calculated based on the difference between the true value position and the predicted position of the target, and the category loss value of the target calculated based on the difference between the true value category and the predicted category of the target;

[0142] The module 540 is used to obtain the weights corresponding to each target based on the loss values of each target, wherein loss values within a set range are positively correlated with the weights, and the weights corresponding to any target include: the weight for the position loss value of that target and the weight for the category loss value of that target;

[0143] The weighted calculation module 550 is used to perform weighted calculation on the loss value of each target based on the weight corresponding to each target to obtain the loss value of the sample image; wherein, the loss value of the sample image is: the loss value determined based on the weighted position loss value and the weighted category loss value of each target;

[0144] The adjustment module 560 is used to adjust the model parameters of the target detection model based on the loss value of the sample image.

[0145] Optionally, the weighted calculation module is specifically used for:

[0146] Based on the weights of the position loss values of each target and the position loss values of each target, calculate the weighted position loss value of each target;

[0147] Based on the weights of the category loss values of each target and the category loss values of each target, calculate the weighted category loss value of each target;

[0148] The loss value of the sample image is determined based on the weighted position loss value and the weighted category loss value of each target.

[0149] Optionally, the loss value of the sample image can be obtained according to the following expression:

[0150] $$Loss = \frac{\sum_{i=1}^{N} w_i^{pos} L_i^{pos}}{\sum_{i=1}^{N} w_i^{pos}} + \frac{\sum_{i=1}^{N} w_i^{cls} L_i^{cls}}{\sum_{i=1}^{N} w_i^{cls}}$$;

[0151] where $Loss$ is the loss value of the sample image, $N$ is the total number of targets in the sample image, $w_i^{pos}$ is the weight for the position loss value of the i-th target, $L_i^{pos}$ is the position loss value of the i-th target, $w_i^{cls}$ is the weight for the category loss value of the i-th target, and $L_i^{cls}$ is the category loss value of the i-th target.

[0152] Optionally, any weight among the weights corresponding to any target can be obtained according to the following relationship:

[0153] ;

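[0154] where the relationship yields any weight among the weights corresponding to the i-th target from the position loss value or the category loss value in the loss value of the i-th target, and its two parameters are constants, the first being greater than 1 and the second being greater than or equal to 0;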

[0155] or,

[0156] Any weight among the weights corresponding to any target is obtained according to the following relationship:

[0157] ;

[0158] where the relationship yields any weight among the weights corresponding to the i-th target from the position loss value or the category loss value in the loss value of the i-th target, and a and b are constants with a < 0 and b > 0.

[0159] Optionally, the device further includes:

[0160] The third acquisition module is used to acquire a test image and ground truth information of each target in the test image after adjusting the model parameters of the target detection model.

[0161] The third input module is used to input the test image into the adjusted target detection model for target detection and obtain prediction information for each target.

[0162] The second calculation module is used to calculate a first score based on the ground truth information and predicted information of each target; wherein, the first score is a score on the similarity between the ground truth information and predicted information of each target.

[0163] A stop module is used to stop training the object detection model if a predetermined condition is met; wherein the predetermined condition is: the first score is lower than the second score, and the second score is lower than the first score calculated after the model parameters of the object detection model were last adjusted before the second score was obtained, and the second score is the first score calculated after the model parameters of the object detection model were last adjusted.

[0164] The detection module is used to detect whether a preset convergence condition has been reached if the predetermined condition is not met, and if the preset convergence condition is not reached, return to the step of obtaining the sample image and the ground truth information of each target in the sample image.

[0165] Optionally, the position loss value of each target is calculated as follows:

[0166] Based on the ground truth position and the predicted position of each target, the intersection-union ratio (IoU) between the real region and the predicted region of each target is obtained.

[0167] The location loss value of each target is determined based on the intersection-union ratio between the real and predicted regions of each target.

[0168] Based on the above method embodiments, this invention also provides a target detection device. As shown in Figure 6, the device includes:

[0169] The second acquisition module 610 is used to acquire the image to be detected;

[0170] The second input module 620 is used to input the image to be detected into the target detection model for target detection and obtain the target detection result; wherein, the target detection model is trained based on the above-mentioned model training device, and the target detection result includes: the location information and category information of the target in the image to be detected.

[0171] This invention also provides an electronic device. As shown in Figure 7, it includes a processor 701, a communication interface 702, a memory 703, and a communication bus 704, wherein the processor 701, the communication interface 702, and the memory 703 communicate with each other through the communication bus 704.

[0172] Memory 703 is used to store computer programs;

[0173] The processor 701, when executing the program stored in the memory 703, implements any of the above-mentioned model training methods / object detection methods.

[0174] The communication bus mentioned in the above electronic devices can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used to represent it in the diagram, but this does not mean that there is only one bus or one type of bus.

[0175] The communication interface is used for communication between the aforementioned electronic devices and other devices.

[0176] The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

[0177] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0178] In another embodiment of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements any of the above-described model training methods / object detection methods.

[0179] In another embodiment of the present invention, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to execute any of the model training methods / object detection methods described in the above embodiments.

[0180] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are produced. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).

[0181] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0182] The embodiments in this specification are described in a related manner, and identical or similar parts of the embodiments can be cross-referenced. Each embodiment focuses on its differences from the other embodiments. In particular, the device embodiments are basically similar to the method embodiments, so their description is relatively brief; for the relevant parts, refer to the descriptions of the method embodiments.

[0183] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention are included within the scope of protection of the present invention.

Claims

1. A model training method, characterized in that the method includes: acquiring a sample image and ground truth information of each target in the sample image, wherein the ground truth information of any target includes the ground truth position of the target and the ground truth category of the target; inputting the sample image into a target detection model to be trained for target detection to obtain prediction information of each target, wherein the prediction information of any target includes the predicted position of the target and the predicted category of the target; calculating the loss value of each target based on the ground truth information and the prediction information of each target, wherein the loss value of any target includes a position loss value calculated based on the difference between the ground truth position and the predicted position of the target, and a category loss value calculated based on the difference between the ground truth category and the predicted category of the target; obtaining the weight corresponding to each target based on the loss value of each target, wherein a loss value within a set range is positively correlated with the weight, and the weight corresponding to any target includes the weight of the position loss value of that target and the weight of the category loss value of that target; performing a weighted calculation on the loss values of each target based on the weights corresponding to each target to obtain the loss value of the sample image, wherein the loss value of the sample image is a loss value determined based on the weighted position loss value and the weighted category loss value of each target; and adjusting the model parameters of the target detection model based on the loss value of the sample image.

2. The method according to claim 1, characterized in that performing a weighted calculation on the loss values of each target based on the weights corresponding to each target to obtain the loss value of the sample image includes: calculating the weighted position loss value of each target based on the weight of the position loss value of each target and the position loss value of each target; calculating the weighted category loss value of each target based on the weight of the category loss value of each target and the category loss value of each target; and determining the loss value of the sample image based on the weighted position loss value and the weighted category loss value of each target.

3. The method according to claim 2, characterized in that the loss value of the sample image is obtained according to the following expression: [expression missing]; where the quantities in the expression are: the loss value of the sample image; N, the total number of targets in the sample image; the weight of the position loss value of the i-th target; the position loss value of the i-th target; the weight of the category loss value of the i-th target; and the category loss value of the i-th target.
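By way of non-limiting illustration, the weighting of claims 1 to 3 can be sketched in Python as follows. This is only one possible reading, not the claimed expression: combining the weighted terms by a plain sum over the N targets is an assumption (a mean would also be consistent with the wording), and the names `sample_image_loss` and `example_weight`, as well as the weight rule `1 + loss`, are hypothetical and chosen only because that rule is positively correlated with the loss value.

```python
from typing import Callable, Sequence


def example_weight(loss: float) -> float:
    """Hypothetical weight rule: grows with the loss value."""
    return 1.0 + loss


def sample_image_loss(
    position_losses: Sequence[float],
    category_losses: Sequence[float],
    weight_fn: Callable[[float], float] = example_weight,
) -> float:
    """Combine per-target losses into one loss for the sample image.

    Each target contributes its weighted position loss plus its weighted
    category loss; the contributions are summed (an assumption).
    """
    if len(position_losses) != len(category_losses):
        raise ValueError("one position loss and one category loss per target")

    total = 0.0
    for pos, cat in zip(position_losses, category_losses):
        # Separate weights for the position term and the category term.
        total += weight_fn(pos) * pos + weight_fn(cat) * cat
    return total
```

Under this sketch, a target whose loss values are large, as tends to happen for targets the model has seen less often, contributes disproportionately to the loss value of the sample image.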

4. The method according to any one of claims 1-3, characterized in that any weight among the weights corresponding to any target is obtained according to the following formula: [formula missing]; where the formula gives the weight in question for the i-th target as a function of the position loss value or the category loss value in the loss value of the i-th target, and the remaining symbols are constants, one constrained to be greater than 1 and one constrained to be greater than or equal to 0; or, any weight among the weights corresponding to any target is obtained according to the following formula: [formula missing]; where the formula gives the weight in question for the i-th target as a function of the position loss value or the category loss value in the loss value of the i-th target, and a and b are constants with a < 0 and b > 0.

5. The method according to any one of claims 1-3, characterized in that, after adjusting the model parameters of the target detection model, the method further includes: acquiring a test image and ground truth information of each target in the test image; inputting the test image into the adjusted target detection model for target detection to obtain prediction information of each target; calculating a first score based on the ground truth information and the prediction information of each target, wherein the first score measures the similarity between the ground truth information and the prediction information of each target; if a predetermined condition is met, stopping the training of the target detection model, wherein the predetermined condition is that the first score is lower than a second score and the second score is lower than the first score calculated after the adjustment of the model parameters that immediately preceded the one from which the second score was obtained, the second score being the first score calculated after the previous adjustment of the model parameters of the target detection model; if the predetermined condition is not met, checking whether a preset convergence condition has been reached; and, if the preset convergence condition has not been reached, returning to the step of acquiring a sample image and ground truth information of each target in the sample image.
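By way of non-limiting illustration, the stopping rule of claim 5 can be read as "stop once the test score has fallen after two consecutive parameter adjustments". The Python sketch below follows that reading. The names `should_stop` and `train`, the callbacks `adjust`, `evaluate` and `converged`, and the `max_rounds` safety cap are all hypothetical and introduced only for this example.

```python
from typing import Callable, List


def should_stop(scores: List[float]) -> bool:
    """scores holds the first score computed after each adjustment, oldest first.

    Returns True when the newest score is lower than the previous one (the
    "second score"), and that one is in turn lower than the score before it.
    """
    if len(scores) < 3:
        return False
    return scores[-1] < scores[-2] < scores[-3]


def train(
    adjust: Callable[[], None],
    evaluate: Callable[[], float],
    converged: Callable[[], bool],
    max_rounds: int = 1000,  # safety cap, not part of the claim
) -> List[float]:
    """Repeat adjust/evaluate until the stop rule or convergence is reached."""
    scores: List[float] = []
    for _ in range(max_rounds):
        adjust()                    # one parameter adjustment on a sample image
        scores.append(evaluate())   # first score on the test image
        if should_stop(scores):     # predetermined condition met
            break
        if converged():             # preset convergence condition reached
            break
    return scores
```

Requiring two consecutive drops, rather than one, makes the rule less sensitive to a single noisy evaluation while still halting training when the score keeps deteriorating.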

6. The method according to any one of claims 1-3, characterized in that the position loss value of each target is calculated by: obtaining, based on the ground truth position and the predicted position of each target, the intersection over union (IoU) between the ground truth region and the predicted region of each target; and determining the position loss value of each target based on the intersection over union between the ground truth region and the predicted region of that target.

7. A target detection method, characterized in that the method includes: acquiring an image to be detected; and inputting the image to be detected into a target detection model for target detection to obtain a target detection result, wherein the target detection model is trained based on the model training method described in any one of claims 1-6, and the target detection result includes the position information and category information of the target in the image to be detected.

8. A model training device, characterized in that the device includes: a first acquisition module, used to acquire a sample image and ground truth information of each target in the sample image, wherein the ground truth information of any target includes the ground truth position of the target and the ground truth category of the target; a first input module, used to input the sample image into a target detection model to be trained for target detection to obtain prediction information of each target, wherein the prediction information of any target includes the predicted position of the target and the predicted category of the target; a first calculation module, used to calculate the loss value of each target based on the ground truth information and the prediction information of each target, wherein the loss value of any target includes a position loss value calculated based on the difference between the ground truth position and the predicted position of the target, and a category loss value calculated based on the difference between the ground truth category and the predicted category of the target; an acquisition module, used to obtain the weight corresponding to each target based on the loss value of each target, wherein a loss value within a set range is positively correlated with the weight, and the weight corresponding to any target includes the weight of the position loss value of that target and the weight of the category loss value of that target; a weighted calculation module, used to perform a weighted calculation on the loss values of each target based on the weights corresponding to each target to obtain the loss value of the sample image, wherein the loss value of the sample image is a loss value determined based on the weighted position loss value and the weighted category loss value of each target; and an adjustment module, used to adjust the model parameters of the target detection model based on the loss value of the sample image.

9. A target detection device, characterized in that the device includes: a second acquisition module, used to acquire an image to be detected; and a second input module, used to input the image to be detected into a target detection model for target detection to obtain a target detection result, wherein the target detection model is trained based on the model training device described in claim 8, and the target detection result includes the position information and category information of the target in the image to be detected.

10. An electronic device, characterized in that it includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor, when executing the program stored in the memory, implements the method of any one of claims 1-6 or the method of claim 7.