Image recognition and neural network model training method, device and system

A neural network model and image recognition technology, applied in the field of image processing, which solves the problem of the unbalanced performance of traditional models across data sets and achieves the effect of balanced image recognition performance

Active Publication Date: 2019-08-30
MEGVII BEIJINGTECH CO LTD


Abstract

The invention relates to an image recognition and neural network model training method, device and system, and a readable storage medium. The method comprises the steps of: obtaining a to-be-identified image; inputting the to-be-identified image into a neural network model and outputting a target image feature of the to-be-identified image, wherein the neural network model is trained on sample images belonging to a plurality of training data sets, the difference between the data set feature distances corresponding to any two training data sets is smaller than a preset threshold value, and the data set feature distance is an inter-class feature distance or an intra-class feature distance of a data set; and performing image recognition processing on the target image feature according to a judgment threshold corresponding to the neural network model to obtain an image recognition result of the to-be-recognized image. For different data sets, the method exhibits relatively balanced image recognition performance.

Application Domain

Character and pattern recognition, Neural architectures +1

Technology Topic

Computer vision, Training data sets +7


Examples


Example Embodiment

[0047] In order to make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
[0048] The image recognition method provided in this application can be applied, without limitation, in the application environment shown in Figure 1. The photographing device 12 captures an image to be recognized of an object to be recognized and sends it to the computer device 11; the computer device 11 extracts a target image feature from the image to be recognized and performs image recognition processing based on the target image feature, such as image verification, image search, and image clustering. The computer device 11 can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, server, and the like.
[0049] In one embodiment, as shown in Figure 2a, an image recognition method is provided. Taking its application to the computer device in Figure 1 as an example, the method includes the following steps:
[0050] S201: Acquire an image to be recognized.
[0051] The image to be recognized can be one received by the computer device from other equipment (such as a photographing device or another computer device), one stored locally on the computer device, or one from another source. The computer device needs to extract image features from the image to be recognized and then recognize the image based on those features. Application scenarios include, but are not limited to, image recognition tasks such as identity authentication, facial recognition, and image similarity comparison.
[0052] Image recognition can include, but is not limited to: image verification (verifying whether multiple target face pictures correspond to the same object), image search (finding the image closest to a query image among multiple target images), and image clustering (classifying multiple target images). Objects to be identified may include, but are not limited to: people, flowers, scenes, objects, etc.
[0053] Of course, after S201, this embodiment can also perform various types of preprocessing on the image to be recognized and then input the preprocessed image into the neural network model. The preprocessing includes, but is not limited to, at least one of the following: mean subtraction, extraction of a region of interest (for example, extracting a facial image from the image to be recognized through face detection, or even further extracting key-point regions from the facial image, such as an eye image or nose image), batch normalization, and so on; this embodiment is not limited in this respect.
[0054] S202: Input the image to be recognized into the neural network model, and output the target image feature of the image to be recognized. The neural network model is trained on sample images belonging to multiple training data sets, and the difference between the data set feature distances corresponding to any two training data sets is less than a preset threshold. The data set feature distance is either an inter-class feature distance or an intra-class feature distance of a data set: the inter-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set but to different categories, while the intra-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set and to the same category.
[0055] It is important to note that the image to be recognized can come from different test data sets, but there is no need to pre-determine the data set label of the test data set to which it belongs; accordingly, the neural network model in this embodiment is trained on different training data sets. The data set labels of all test data sets can be included among the data set labels of all training data sets; in general, the test data sets and training data sets correspond one to one, and corresponding test and training data sets carry the same data set label. In this embodiment, unless otherwise specified, descriptions of a data set apply to both training data sets and test data sets.
[0056] For example, when the images to be recognized are of people, the different data sets can be composed of images of different skin-color groups, each data set corresponding to one group (its data set label), such as a yellow race data set, a white race data set, and a black race data set, and the category can be a person's identity (determination as a specific person). The image recognition method of this embodiment can then recognize images of people of different skin colors, for example identifying which person an image to be recognized corresponds to. Likewise, when the images are of flowers, the different data sets can be composed of images of different flower families, each data set corresponding to one family (its data set label), such as an orchid family data set, a Rosaceae data set, a chrysanthemum family data set, and a honeysuckle family data set, and the category can be the flower variety. The image recognition method of this embodiment can then recognize images of flowers from different families, for example recognizing that an image to be recognized corresponds to a rose (belonging to Rosaceae).
[0057] It can be understood that, from another perspective, a category can be regarded as the classification label used when classifying the images within a data set, i.e., a finer granularity level than that of the data set. In addition, the categories corresponding to different data sets generally do not intersect, but in practical applications a category may belong to more than one data set; in short, this embodiment does not limit this.
[0058] The inter-class feature distance of a data set characterizes the degree of dispersion, in the feature space, between feature points that belong to the same data set but to different categories. It can be the distance between any two such feature points, or a statistic (maximum, average, median, minimum, etc.) of the distances between all such feature points. The smaller the inter-class feature distance, the smaller the degree of dispersion; the larger it is, the greater the dispersion. Correspondingly, the intra-class feature distance of a data set characterizes the degree of aggregation, in the feature space, between feature points that belong to the same data set and to the same category. It can be the distance between any two such feature points, or a statistic (maximum, average, median, minimum, etc.) of the distances between all such feature points. The smaller the intra-class feature distance, the greater the degree of aggregation; the larger it is, the smaller the aggregation.
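For concreteness, the following is a minimal NumPy sketch of these two statistics for a single data set. Euclidean distance, the minimum as the inter-class statistic, and the maximum as the intra-class statistic are example choices among those listed above; the function name and arguments are illustrative, not part of the patent.

```python
import numpy as np

def dataset_feature_distances(feats, cat_labels, ds_labels, ds):
    """Illustrative inter-class and intra-class feature distances for data
    set `ds`. Assumes Euclidean distance and min/max statistics; the text
    equally allows max/average/median/min."""
    idx = np.where(ds_labels == ds)[0]
    f, c = feats[idx], cat_labels[idx]
    # Pairwise Euclidean distances between all feature points of this data set.
    d = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
    same_cat = c[:, None] == c[None, :]
    off_diag = ~np.eye(len(idx), dtype=bool)
    # Inter-class: same data set, different categories (dispersion; larger is better).
    inter = d[~same_cat].min()
    # Intra-class: same data set, same category (aggregation; smaller is better).
    intra = d[same_cat & off_diag].max()
    return inter, intra
```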
[0059] It is understandable that the feature points of image features belonging to the same category should cluster together as much as possible in the feature space, while the feature points of image features belonging to different categories should be as scattered as possible. Therefore, the larger the inter-class feature distance of each data set the better, and the smaller the intra-class feature distance of each data set the better. Obviously, the inter-class feature distance of a data set is greater than its intra-class feature distance.
[0060] It should be noted that, for a given data set, when the distance between two image features belonging to the data set is greater than or equal to the judgment threshold, the two image features can generally be judged to belong to different categories; when the distance is less than the judgment threshold, they can generally be judged to belong to the same category.
[0061] When the inter-class feature distance of a data set is larger, the feature points of image features belonging to different categories within the data set are farther apart in the feature space, so the probability that the distance between two image features belonging to the same data set but to different categories falls below the judgment threshold is small; hence the probability of wrongly judging the two image features to belong to the same category is small, i.e., the false acceptance rate is small. The inter-class feature distance of a data set is therefore negatively correlated with its false acceptance rate. Correspondingly, when the intra-class feature distance of a data set is smaller, the feature points of image features belonging to the same category within the data set are closer together, so the probability that the distance between two same-category image features reaches the judgment threshold is small; hence the probability of wrongly judging them to belong to different categories is small, i.e., the false rejection rate is small. The intra-class feature distance of a data set is therefore positively correlated with its false rejection rate. In general, both rates are required to be small; for example, a face recognition payment scene may require a false acceptance rate within 0.0001% and a false rejection rate within 2%, while an unlocking scene may require a false acceptance rate within 0.001%.
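The following is a minimal sketch of how the two rates follow from the decision rule of [0060]; precomputed pair distances and the function name are assumptions made for illustration.

```python
import numpy as np

def far_frr(distances, same_category, threshold):
    """False acceptance rate / false rejection rate for one judgment threshold.
    `distances`: feature distances of image pairs; `same_category`: boolean
    array, True where a pair truly belongs to the same category. Pairs with
    distance < threshold are judged "same category", as in [0060]."""
    accept = distances < threshold
    # Falsely accepted: different-category pairs judged to be the same category.
    far = np.mean(accept[~same_category])
    # Falsely rejected: same-category pairs judged to be different categories.
    frr = np.mean(~accept[same_category])
    return far, frr
```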
[0062] Figure 2b shows a schematic diagram of the inter-class and intra-class feature distances corresponding to different data sets. The reference point may be the origin of the feature space, and the multidimensional sphere may represent the unit vectors of the feature space. For simplicity of description, data set A includes three images whose image features correspond to the feature points A1, A2, and A3 in the feature space, and data set B includes three images whose image features correspond to the feature points B1, B2, and B3. If a conventional neural network model is used to extract features from the images in the data sets, the positions of A1, A2, A3 and B1, B2, B3 in the feature space may be as shown in Figure 2b. If A1 and A3 belong to one category and A2 to another, the intra-class feature distance of data set A can be A_1 and its inter-class feature distance A_0; correspondingly, if B1 and B2 belong to one category and B3 to another, the intra-class feature distance of data set B can be B_1 and its inter-class feature distance B_0. This is only a simple example and does not limit the number of images or categories in each data set.
[0063] For example, during testing, suppose the judgment threshold is chosen to be A_0. For test images belonging to data set A, the probability that the distance between same-category test image features exceeds the judgment threshold is small, i.e., the false rejection rate is low. For data set B, because its intra-class feature distance B_1 is greater than A_0, the probability that the distance between same-category test image features exceeds the judgment threshold is large, i.e., the false rejection rate is high. That is, with a conventional neural network model, the false rejection rates for data sets A and B differ considerably. Similarly, the false acceptance rates for different data sets may differ considerably, which will not be repeated here.
[0064] With the neural network model of this embodiment, because a data set's inter-class feature distance is negatively correlated with its false acceptance rate and its intra-class feature distance is positively correlated with its false rejection rate, the data set feature distances of the training data sets can be constrained according to actual performance requirements when the neural network model is trained on the sample images of multiple training data sets. The trained model then imposes the corresponding constraints on the data set feature distances of different test data sets when processing their test images, thereby controlling the false acceptance rate and/or the false rejection rate across different data sets.
[0065] The preset threshold constrains the difference between the data set feature distances of any two training data sets, thereby enforcing the constraint that the data set feature distances of all training data sets be as close as possible. The preset threshold can therefore be set based on actual needs and experience, and can even be set dynamically; for example, it can be obtained from statistics of the training data sets' feature distances: at certain stages of training (say every 50 or 100 steps), compute the average of the current data set feature distances of the training data sets and use q times this average as the preset threshold, where q is a positive number less than 1, such as 0.1, 0.2, 0.3, or 0.5. Training of the neural network model can be considered complete when the constraint is satisfied. The preset threshold may also serve only to enforce the constraint without an exact fixed value: for the neural network model trained later with a loss function that includes the feature distance transformation loss, the constraint can be satisfied automatically (see the description below). Similarly, the constraint can also be considered satisfied when the variance of the data set feature distances of the training data sets is less than a preset variance threshold; see the sketch below.
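A minimal sketch of this dynamic constraint check, assuming the max-min gap as a proxy for all pairwise differences and q = 0.2 as a default; the function name and interface are illustrative only.

```python
import numpy as np

def constraint_satisfied(ds_feature_distances, q=0.2):
    """Illustrative check of the constraint in [0065]: every pairwise
    difference between the data set feature distances of the training data
    sets must stay below the preset threshold, here taken dynamically as
    q times their current mean (q < 1, e.g. 0.1, 0.2, 0.3 or 0.5)."""
    d = np.asarray(ds_feature_distances)  # one feature distance per training data set
    preset_threshold = q * d.mean()       # dynamic preset threshold
    # The largest pairwise difference is max - min, so checking it covers all pairs.
    return (d.max() - d.min()) < preset_threshold
```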
[0066] Because the constraint is that the difference between the data set feature distances of any two training data sets is less than the preset threshold, the trained neural network model keeps the data set feature distances of different test data sets close when processing their test images. Accordingly, when the data set feature distance is the inter-class feature distance, since a data set's inter-class feature distance is negatively correlated with its false acceptance rate, the model exhibits a more balanced false acceptance rate across data sets; when the data set feature distance is the intra-class feature distance, since a data set's intra-class feature distance is positively correlated with its false rejection rate, the model exhibits a more balanced false rejection rate across data sets.
[0067] In particular, if the data set feature distance of each data set is calculated in the same way, then when the data set feature distance is the inter-class feature distance, the false acceptance rates of the data sets are close or equal, and when it is the intra-class feature distance, the false rejection rates of the data sets are close or equal; that is, the performance of the neural network model is very balanced across data sets.
[0068] Of course, the performance of the neural network model of this embodiment can also be measured by indicators other than the false rejection rate and false acceptance rate; correspondingly, such indicators are also related to the data set feature distance, so this embodiment applies to them as well and they will not be repeated here.
[0069] The neural network model performs feature extraction on the image to be recognized to extract the target image feature, which can take the form of a tensor, matrix, or vector. The neural network model can be any neural network capable of feature extraction, such as VGG (Visual Geometry Group network), ResNet (residual neural network), MobileNet (a lightweight convolutional neural network based on depthwise separable convolutions), MobileNet_v2 (an improved lightweight convolutional neural network based on MobileNet), ShuffleNet, etc.
[0070] In an embodiment, the loss function of the neural network model may include a loss between the data set feature distances of the training data sets; for example, this loss can be the variance of the data set feature distances of the training data sets, and training of the neural network model can be realized based on this loss function. Of course, to improve the accuracy of the extracted image features, the loss function may generally also include classification loss, triplet loss, and other losses, which will not be repeated here. A minimal sketch of such a variance term follows.
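The variance form below is the example named in [0070]; combining it with a classification loss via a weight is our assumption, not the patent's prescription.

```python
import torch

def distance_balance_loss(ds_feature_distances):
    """Variance of the per-training-data-set feature distances ([0070]);
    minimal when all data set feature distances are equal.
    `ds_feature_distances`: 1-D tensor, one value per training data set."""
    return torch.var(ds_feature_distances)

# Possible combined objective (the weight lambda_b is an assumption):
# total_loss = classification_loss + lambda_b * distance_balance_loss(dists)
```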
[0071] S203: Perform image recognition processing on the target image feature according to the judgment threshold corresponding to the neural network model to obtain an image recognition result of the image to be recognized.
[0072] During testing, different candidate judgment thresholds can be tried to obtain performance indicators such as the false acceptance rate and false rejection rate for different test data sets, and a unified judgment threshold meeting the performance requirements is selected. Understandably, if different judgment thresholds were selected for different data sets, suitable thresholds would have to be chosen per data set during testing, the data set to which each image to be processed belongs would have to be identified during use, and the neural network model would need corresponding additional processing logic, all of which increases workload and causes considerable inconvenience.
[0073] Different types of image recognition tasks can then be performed based on the extracted target image features. The image verification task amounts to checking whether the distance in the feature space between the points corresponding to multiple images is less than the judgment threshold, for example whether the distance between the target image feature of the image to be recognized and a base-library image feature is below the judgment threshold. The image search task amounts to finding, among the points corresponding to multiple images in the feature space, the point closest to that of the query image, for example finding the base-library image whose feature is closest to the target image feature of the image to be recognized. The image clustering task amounts to clustering the points in the feature space with an algorithm such as k-means, for example clustering the target image features of multiple images to be recognized, the classification of each image being that of its target image feature; alternatively, multiple images to be recognized whose pairwise distances are below a clustering threshold can be grouped into one class. Of course, all these image features can be extracted with the neural network model of this embodiment.
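A minimal sketch of the verification and search tasks just described, assuming L2 distance; function names are illustrative, and clustering could likewise be done with any k-means implementation.

```python
import numpy as np

def verify(feat_a, feat_b, threshold):
    """Image verification ([0073]): same object iff the feature distance
    is below the judgment threshold."""
    return np.linalg.norm(feat_a - feat_b) < threshold

def search(query_feat, gallery_feats):
    """Image search ([0073]): index of the base-library image whose feature
    is closest to the query's target image feature."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return int(np.argmin(dists))
```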
[0074] In short, in the image recognition method of this embodiment, because the constraint on the neural network model is that the difference between the data set feature distances of any two training data sets is less than the preset threshold, the trained model keeps the data set feature distances of different test data sets close when processing their test images. When the data set feature distance is the inter-class feature distance, since a data set's inter-class feature distance is negatively correlated with its false acceptance rate, the model's false acceptance rate is more balanced across data sets; when it is the intra-class feature distance, since a data set's intra-class feature distance is positively correlated with its false rejection rate, the model's false rejection rate is more balanced across data sets. In particular, if the data set feature distance of each data set is calculated in the same way, the false acceptance rates (for inter-class distances) or false rejection rates (for intra-class distances) of the data sets are close or equal; that is, the model's performance is very balanced across data sets. In short, for different data sets, the image recognition method of this embodiment can exhibit relatively balanced image recognition performance.
[0075] Referring to Figure 3a, and taking a neural network model that includes a feature extraction network and a distance transformation network as an example, a specific process by which the neural network model extracts the target image feature from the image to be recognized is shown; that is, S202 may include:
[0076] S301: Input the image to be recognized into the feature extraction network for feature extraction processing to obtain a reference image feature of the image to be recognized.
[0077] The idea of this embodiment is to perform feature extraction on the image to be recognized with the feature extraction network to obtain the reference image feature, and then perform distance transformation on the reference image feature, so that the target image feature obtained after the distance transformation satisfies the constraint of this embodiment on the data set feature distance of the data set to which the image to be processed belongs.
[0078] S302: Input the reference image feature into the distance transformation network, calculate the distance transform coefficient corresponding to the reference image feature, perform distance transformation on the reference image feature according to the distance transform coefficient, and output the target image feature obtained after the distance transformation.
[0079] Referring to Figure 3b, a schematic structural diagram of a network G (the feature extraction network) is shown. The network G may include at least one convolutional layer (CNN) and at least one fully connected layer (FC). The convolutional layers perform convolution on the input image to be recognized to extract image features at different depth levels, which can be expressed at least as (C, H, W) three-dimensional data, where C is the number of channels, H the pixel height, and W the pixel width; the fully connected layer performs fully connected processing on these image features to obtain the reference image feature. Correspondingly, the network G can be trained on the sample images of different training data sets and can serve as the structure of the neural network model in S201~S203.
[0080] In this embodiment, referring to Figure 3c, a schematic structural diagram of the neural network model is shown in which a network D (the distance transformation network) is added after the network G. The network G implements the feature extraction of S301; the input of the network D is the output of the network G (i.e., the reference image feature), and it implements the distance transformation of S302. The network D may include a fully connected layer and a distance transformation processing layer: the fully connected layer corresponds to a distance function and performs fully connected processing on the reference image feature to output its distance transform coefficient; the distance transformation processing layer takes the reference image feature and its distance transform coefficient as input and outputs their product (equivalent to scaling the reference image feature) as the target image feature obtained after distance transformation.
[0081] Of course, the distance transform coefficient can also be calculated as follows: compute the change in the scaling ratio from the reference image feature, and add 1 to this change to obtain the distance transform coefficient. In practical applications, the distance function is not limited to a single fully connected layer; it can also be realized with multiple fully connected layers, a convolutional layer plus a fully connected layer, a sparsely connected layer, or in other ways. In short, this embodiment does not limit this. A minimal sketch of networks G and D follows.
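The PyTorch-style sketch below follows the structure described in [0079]~[0081]: convolutional layers plus a fully connected layer for G, and a single fully connected layer as the distance function for D. Layer sizes, the feature dimension, and the class names are assumptions.

```python
import torch
import torch.nn as nn

class NetworkG(nn.Module):
    """Sketch of the feature extraction network G ([0079]): convolutional
    layers followed by a fully connected layer that outputs the reference
    image feature. Channel sizes are illustrative only."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, img):                 # img: (batch, C, H, W)
        x = self.conv(img).flatten(1)
        return self.fc(x)                   # reference image feature

class NetworkD(nn.Module):
    """Sketch of the distance transformation network D ([0080]): a fully
    connected layer computes a scalar distance transform coefficient from
    the reference image feature; the transform layer scales the feature."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)    # distance function: one FC layer

    def forward(self, ref_feat):            # ref_feat: (batch, feat_dim)
        coeff = self.fc(ref_feat)           # distance transform coefficient
        # Variant from [0081]: predict the change of the scaling ratio
        # instead and use coeff = 1 + delta.
        return ref_feat * coeff             # target image feature
```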
[0082] Referring to Figure 3d, on the basis of Figure 2b, the feature points of the reference image features A1, A2, A3, B1, B2, and B3 are also shown. For simplicity, assume the distance transform coefficients of A1, A2, and A3 are all 1, so their positions in the feature space are unchanged after the distance transformation, while the distance transform coefficients of B1, B2, and B3 are less than 1, so their positions change after the distance transformation, as shown by B1', B2', and B3' in the figure. Therefore, if the constraint is that the difference between the intra-class feature distances of any two training data sets is less than the preset threshold, then after the distance transformation the intra-class feature distance B_1' of data set B is close to A_1 and must be less than A_0. Hence, when the judgment threshold is again chosen as A_0, for test images belonging to data set B, because B_1' is less than A_0, the probability that the distance between same-category test image features exceeds the judgment threshold is small, i.e., the false rejection rate is also low. That is, with the same judgment threshold, the neural network model's false rejection rate is low for both data set A and data set B.
[0083] In short, this embodiment performs distance transformation on the reference image feature with a simple distance transform coefficient, and the coefficient is calculated by feeding the reference image feature into the distance transformation network of the neural network model. Because the parameters of the neural network model are obtained through continuous training, the trained model can extract the reference image feature from the input image to be recognized and compute an appropriate distance transform coefficient from it, so that the target image feature obtained after distance transformation satisfies the constraint on the data set feature distance of the data set to which the image to be processed belongs. Thus, for multiple images to be processed from different data sets, balanced control of image recognition performance across data sets can be realized.
[0084] It should be noted that, in this embodiment, the distance transform coefficient is a function of the reference image feature, and the target image feature is the product of the reference image feature and the distance transform coefficient, so the target image feature is also a function of the reference image feature. The reference image feature can be expressed as a multidimensional vector with a certain modulus and azimuth angle, and the target image feature is reflected in the position of its feature point in the feature space; therefore, the distance transform coefficient, and hence the position of the feature point in the feature space, is related to both the modulus and the azimuth angle of the reference image feature.
[0085] However, referring to Figure 4a, consider the case where the feature points of data set A and data set B are located close to each other in the feature space, i.e., the azimuth angles of the reference image features of data sets A and B are close. Generally, the moduli of the reference image features of data set A differ from those of data set B, so the following scenario exists: the feature points of data sets A and B are close in direction but differ considerably in their distances from the reference point, and can thus be distinguished. However, because the distance transform coefficient depends on both the modulus and the azimuth angle of the reference image feature, the coefficients of data set A differ from those of data set B, and after the distance transformation there may be a scenario in which the feature points of data sets A and B are close in both direction and distance from the reference point, making them hard to distinguish. As in Figure 4a, for simplicity, assume the distance transform coefficients of data set A are all 1 and those of data set B are all less than 1. After the distance transformation, the feature points A1, A2, A3 of data set A and B1', B2', B3' of data set B may be mixed together, and the distance between A2 and B3' may even be less than A_1 and less than B_1'. This does not accord with reality, because A2 and B3' belong to different data sets and to different categories, and may lead to incorrect recognition in subsequent image recognition.
[0086] Referring to Figure 4b, to avoid this drawback, the reference image feature can be normalized before the distance transform coefficient is calculated. Taking a distance transformation network including at least one fully connected layer as an example, S302 may specifically include:
[0087] S401: Perform normalization processing on the reference image feature to obtain the normalized reference image feature.
[0088] For example, the reference image feature may take the form of a multidimensional vector, and the normalization may include: calculating the modulus of the multidimensional vector corresponding to the reference image feature, and taking the quotient of the multidimensional vector and its modulus as the normalized reference image feature.
[0089] S402: Input the normalized reference image feature into the at least one fully connected layer for fully connected processing to obtain the distance transform coefficient corresponding to the reference image feature.
[0090] It can be understood that the moduli of the normalized reference image features are all equal, so the distance transform coefficient calculated from the normalized reference image feature depends only on the azimuth angle of the reference image feature. For the rest, refer to the description of S302, which will not be repeated here. The fully connected part used to calculate the distance transform coefficient in the distance transformation network can be one layer or multiple layers; in practical applications, the distance transformation network can also include an activation layer, etc., which first performs activation processing on the normalized reference image feature before it is input into the fully connected layer. In short, this embodiment is not limited in this respect.
[0091] S403: Perform distance transformation on the reference image feature from before normalization according to the distance transform coefficient, to obtain the target image feature.
[0092] Referring to Figure 4c, when the azimuth angles of the reference image features of data sets A and B are close, their distance transform coefficients are also close, so after the distance transformation the feature points of data sets A and B are scaled by similar amounts in the feature space. For data sets A and B that are originally far apart, after the distance transformation the feature points A1', A2', A3' of data set A and B1', B2', B3' of data set B still remain far apart and are not mixed together. For the rest, refer to the description of S302, which will not be repeated here.
[0093] In this embodiment, before the distance transform coefficient is calculated, the reference image features are normalized so that their moduli are equal, and the coefficient is computed from the normalized reference image feature, making it depend only on the azimuth angle of the reference image feature. When the azimuth angles of the reference image features of any two data sets are close, their distance transform coefficients are close, so after the distance transformation the feature points of the two data sets are scaled by the same amount in the feature space, and feature points of two data sets that are originally far apart are not mixed together. This avoids the aforementioned drawback and improves the stability of the image recognition method. A minimal sketch follows.
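The sketch below follows S401~S403, again assuming a single-FC distance function as in the sketch after [0081]; only the coefficient computation changes, operating on the normalized feature while the original feature is the one scaled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedNetworkD(nn.Module):
    """Sketch of S401~S403: normalize the reference image feature so the
    coefficient depends only on its azimuth angle, compute the distance
    transform coefficient from the normalized feature, then scale the
    pre-normalization feature. Layer sizes are assumptions."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, ref_feat):
        unit_feat = F.normalize(ref_feat, p=2, dim=1)  # S401: v / ||v||
        coeff = self.fc(unit_feat)                     # S402: coefficient from unit vector
        return ref_feat * coeff                        # S403: transform the original feature
```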
[0094] Understandably, for the two embodiments shown above in Figure 3a and Figure 4b, the neural network model can be trained with a loss function that includes the loss between the data set feature distances of the training data sets. Another embodiment is disclosed here in which the neural network model is trained with a loss function that includes a feature distance transformation loss. For example, in one embodiment, a weighted sum of the feature extraction loss and the feature distance transformation loss is used as the loss function to train the entire neural network model; in another embodiment, the feature extraction network is pre-trained (for example with the feature extraction loss), and after its training is complete, the distance transformation network is trained with a loss function that includes the feature distance transformation loss. The feature extraction loss may include at least one of the following: classification loss, triplet loss, or other losses.
[0095] In short, the distance transformation network can also be trained with a loss function that includes the feature distance transformation loss. The feature distance transformation loss is the loss between the desired feature distance and the transformation feature distances of the reference image features, where the desired feature distance is the reference value for the transformation feature distance, and the transformation feature distance of a reference image feature is the product of its sample feature distance and its distance transform coefficient; the reference image feature is extracted from a sample image by the feature extraction network of the neural network model.
[0096] The data set feature distance is related to the sample feature distances of the reference image features belonging to the same data set. The sample feature distance is either an inter-class sample feature distance or an intra-class sample feature distance: the inter-class sample feature distance characterizes the distance in the feature space between a reference image feature and other reference image features that belong to the same data set but to different categories; the intra-class sample feature distance characterizes the distance in the feature space between a reference image feature and other reference image features that belong to the same data set and to the same category.
[0097] Similarly, for a given reference image feature, its inter-class reference image features are the other reference image features that belong to the same data set but to different categories. Its inter-class sample feature distance therefore characterizes the degree of dispersion in the feature space between the reference image feature and its inter-class reference image features, and can be the distance between the reference image feature and any one of its inter-class reference image features, or a statistic (maximum, average, median, minimum, etc.) of the distances between the reference image feature and all of its inter-class reference image features. Correspondingly, its intra-class reference image features are the other reference image features that belong to the same data set and to the same category, so its intra-class sample feature distance characterizes the degree of aggregation in the feature space between the reference image feature and its intra-class reference image features, and can be the distance between the reference image feature and any one of its intra-class reference image features, or a statistic (maximum, average, median, minimum, etc.) of the distances between the reference image feature and all of its intra-class reference image features.
[0098] It is understandable that, for a given data set, the smaller the inter-class sample feature distances of the reference image features in the data set, the smaller the data set's inter-class feature distance, and the larger these sample distances, the larger the data set's inter-class feature distance. Correspondingly, the smaller the intra-class sample feature distances of the reference image features in the data set, the smaller the data set's intra-class feature distance, and the larger they are, the larger the data set's intra-class feature distance. That is, this embodiment constrains the sample feature distances of the reference image features in each data set in order to constrain the data set feature distance of each data set; evidently, a data set's inter-class feature distance is related to the inter-class sample feature distances of the reference image features belonging to it, and its intra-class feature distance is related to their intra-class sample feature distances. Because the transformation feature distance of each reference image feature is equivalent to its sample feature distance after the distance transformation, it has the same properties.
[0099] It should be noted that the desired feature distance is the desired sample feature distance of the reference image features after the distance transformation, and is a reference value, so it can be any determined value, even zero.
[0100] As the loss function decreases during training, i.e., the feature distance transformation loss decreases, the transformation feature distance of each reference image feature approaches the desired feature distance. Because the desired feature distance is the same for all reference image features, the transformation feature distances of the reference image features approach one another, i.e., the sample feature distances after the distance transformation approach one another. Because the reference image features belong to different training data sets, the data set feature distances of the different training data sets also approach one another, so for different data sets the image recognition method of this embodiment likewise exhibits more balanced image recognition performance.
[0101] Accordingly, referring to Figure 5, for the neural network model trained with a loss function that includes the feature distance transformation loss, the training method can be as follows:
[0102] S501: Obtain sample images belonging to different training data sets, each sample image being marked with a category label and a data set label.
[0103] S502: Input each sample image into the feature extraction network of the initial neural network model for feature extraction to obtain the reference image feature of each sample image; and input each reference image feature into the distance transformation network of the initial neural network model to calculate its distance transform coefficient.
[0104] S503: Calculate the feature distance transformation loss according to the reference image features, the distance transform coefficient, category label, and data set label corresponding to each reference image feature, and the judgment threshold alignment strategy, and calculate the value of the loss function of the initial neural network model according to the feature distance transformation loss.
[0105] S504: Adjust the parameters to be trained of the initial neural network model according to the value of the loss function to obtain the neural network model; the parameters to be trained include the parameters in the distance transformation network.
[0106] The loss function of the neural network model in this embodiment includes the above feature distance transformation loss and can also include other losses, such as classification-based cross-entropy loss and triplet loss; the parameters to be trained of the initial neural network model are then adjusted, for example by gradient descent. The parameters to be trained include, but are not limited to, the convolution kernels of the convolutional layers and the weights of the fully connected layers.
[0107] It is understandable that the feature extraction network can be pre-trained, so that only the distance transformation network needs to be trained; that is, the feature distance transformation loss can serve as the value of the loss function of the initial neural network model, and the parameters to be trained of the distance transformation network are adjusted according to this value to obtain the neural network model. In this way there is no need for joint training of the feature extraction network and the distance transformation network, which reduces training complexity, improves training efficiency, and reduces the number of sample images required from the different training data sets, so that fewer sample images suffice for the training process. A sketch of such a training loop follows.
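The sketch below follows S501~S504 under the frozen-extractor variant of [0107]. `loader` is assumed to yield (images, category_labels, dataset_labels); `net_d.coefficient`, `sample_feature_distances` and `feature_distance_transform_loss` are hypothetical helpers, not the patent's API (realizations of the latter two are sketched after [0114] and [0121]).

```python
import torch

def train_distance_network(net_g, net_d, loader, optimizer, epochs=10):
    """Sketch of S501~S504 with a pre-trained, frozen feature extraction
    network: only the distance transformation network's parameters are
    adjusted, using the feature distance transformation loss."""
    net_g.eval()                                 # feature extractor is frozen
    for _ in range(epochs):
        for images, cat_labels, ds_labels in loader:
            with torch.no_grad():
                ref_feats = net_g(images)        # S502: reference image features
            coeffs = net_d.coefficient(ref_feats)  # S502: F(x_i), hypothetical helper
            r = sample_feature_distances(ref_feats, cat_labels, ds_labels)  # S503: R_i
            loss = feature_distance_transform_loss(r, coeffs)               # S503: L
            optimizer.zero_grad()
            loss.backward()                      # S504: adjust D's parameters
            optimizer.step()

# e.g. optimizer = torch.optim.SGD(net_d.parameters(), lr=0.01)
```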
[0108] For parts of the above steps, refer to the description above; the following example is used for illustration. Taking face images as the recognition objects, suppose there are 400 sample images from different training data sets, numbered 1 to 400, specifically including 200 face images of the yellow race data set numbered 1~200, whose data set labels are all "Asian"; among them, the face images numbered 1~50 are Zhang San's, so their category labels are all "Zhang San". The category labels and data set labels corresponding to the different numbers are shown in Table 1 below:
[0109] Table 1

| Numbering | Data set label | Category label |
| --------- | -------------- | -------------- |
| 1~50      | Asian          | Zhang San      |
| 51~100    | Asian          | Li Si          |
| 101~150   | Asian          | Wang Wu        |
| 151~200   | Asian          | Zhao Liu       |
| 201~250   | White people   | James          |
| 251~300   | White people   | Green          |
| 301~350   | Black people   | Smith          |
| 351~400   | Black people   | Mandela        |
[0110] Optionally, the desired feature distance may be a dynamically changing value, namely the average (mean) of the sample feature distances of the reference image features. This can significantly reduce the feature distance transformation loss during training and thus help the neural network model converge, improving training efficiency. Specifically, S503 may include: calculating the sample feature distance of each reference image feature according to the reference image features, the category label and data set label corresponding to each reference image feature, and the judgment threshold alignment strategy; calculating the product of each reference image feature's sample feature distance and its distance transform coefficient, and taking the product as its transformation feature distance; calculating the mean of the sample feature distances of the reference image features as the desired feature distance; and determining the feature distance transformation loss according to the loss between the desired feature distance and the transformation feature distances of the reference image features.
[0111] Specifically, the feature distance transformation loss L can be calculated with the following relational expression or a variant of it:
[0112] $L = \frac{1}{N}\sum_{i=1}^{N}\left|F(x_i)\,R_i - R_c\right|$

[0113] where N is the total number of sample images, x_i is the reference image feature of the i-th sample image, F(x_i) is the distance transform coefficient of the reference image feature of the i-th sample image, R_i is the sample feature distance of the reference image feature of the i-th sample image, and R_c is the desired feature distance.
[0114] In the above relational expression, the feature distance transformation loss is the average of the absolute differences between the desired feature distance and the transformation feature distances of the reference image features. With the training data sets shown in Table 1, N = 400, and for each sample image numbered 1 to 400 the absolute value of the difference between the transformation feature distance of its reference image feature and the desired feature distance is computed, then averaged. Of course, the above relational expression is only an example: any relational expression that attains its minimum when the transformation feature distances of all reference image features equal the desired feature distance can be used to calculate the feature distance transformation loss. In practical applications, the sum of the differences between the transformation feature distances of the reference image features can also be used as the feature distance transformation loss, and the variance of the transformation feature distances can also be used directly as the feature distance transformation loss. In short, this embodiment does not limit this. A sketch of the loss follows.
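The sketch below implements the relational expression of [0112] with the dynamic desired feature distance of [0110] (R_c as the mean of the R_i); the function name and argument layout are assumptions.

```python
import torch

def feature_distance_transform_loss(sample_dists, coeffs):
    """L = (1/N) * sum_i |F(x_i) * R_i - R_c|  ([0112]), with the desired
    feature distance R_c taken as the mean of the sample feature distances
    R_i ([0110]). `sample_dists` holds R_i (one per sample image, shape (N,));
    `coeffs` holds F(x_i) (shape (N,))."""
    transformed = coeffs * sample_dists     # transformation feature distances F(x_i) * R_i
    r_c = sample_dists.mean()               # desired feature distance R_c
    return (transformed - r_c).abs().mean() # mean absolute difference
```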
[0115] As mentioned earlier, different constraints can be imposed on the data set feature distances of the training data sets during training, so that the corresponding constraints hold for the data set feature distances of different test data sets, thereby controlling the false acceptance rate and/or the false rejection rate across data sets. In this embodiment, different constraints can likewise be imposed during training on the sample feature distances of the reference image features corresponding to each training data set, to realize the same control of the false acceptance rate and/or false rejection rate across data sets.
[0116] It should be emphasized that different constraints on the sample feature distances of the reference image features of each training data set can be embodied in different decision threshold alignment strategies, and different strategies correspond to different ways of calculating the sample feature distance. The decision threshold alignment strategies may at least include a false acceptance rate alignment strategy and a false rejection rate alignment strategy. The false acceptance rate alignment strategy balances the false acceptance rate of the neural network model across data sets; specifically, it can impose a fixed ratio between false acceptance rates, for example the false acceptance rate of data set A is W times that of data set B, where W can be any positive number; in particular, when W=1, the false acceptance rate of data set A equals that of data set B. The false rejection rate alignment strategy balances the false rejection rate of the neural network model across data sets; specifically, it can impose a fixed ratio between false rejection rates, for example the false rejection rate of data set A is V times that of data set B, where V can also be any positive number; in particular, when V=1, the false rejection rate of data set A equals that of data set B.
[0117] Exemplarily, as shown in Figure 6, when the decision threshold alignment strategy is the false acceptance rate alignment strategy, the sample feature distance is the inter-class sample feature distance. For a given reference image feature, its sample feature distance can be calculated as follows:
[0118] S601: For each reference image feature, determine multiple inter-class reference image features of the reference image feature, and calculate the distances in the feature space between the reference image feature and these inter-class reference image features; an inter-class reference image feature belongs to the same data set as the reference image feature but to a different category.
[0119] For example, for the reference image feature of the sample image numbered 81, the reference image features of the 150 sample images numbered 1-50 and 101-200 are all its inter-class reference image features. The distance in the feature space between the reference image feature and an inter-class reference image feature may be the L1 norm, the L2 norm, etc. between the multidimensional vectors corresponding to the two image features.
[0120] S602: Sort the distances in the feature space between the reference image feature and the multiple inter-class reference image features in ascending order, and determine the inter-class sample feature distance of the reference image feature according to the ranking.
[0121] For example, the value of one of the smallest distances is selected as the inter-class sample feature distance of the reference image feature; for instance, the m-th smallest distance (rank m) can be selected, where m can be any positive integer, in particular m=1, and m can be the same or different for different reference image features. Alternatively, the average of several of the smallest distances is selected as the inter-class sample feature distance; for instance, the M smallest distances can be selected and their average used, where M can be any positive integer greater than 1, and M can be the same or different for different reference image features.
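For illustration only, S601-S602 can be sketched as follows, assuming L2 distance between feature vectors and NumPy arrays for the features and labels (both assumptions of this sketch, not of the method):

```python
import numpy as np

def inter_class_sample_feature_distance(anchor, feats, labels, datasets,
                                        anchor_label, anchor_dataset,
                                        m=1, M=None):
    """S601-S602: distances to same-dataset, different-category features,
    sorted ascending; return the m-th smallest, or the mean of the M smallest."""
    mask = (datasets == anchor_dataset) & (labels != anchor_label)
    dists = np.linalg.norm(feats[mask] - anchor, axis=1)  # L2 distances
    dists.sort()                                          # ascending order
    if M is not None:
        return dists[:M].mean()   # average of the M smallest distances
    return dists[m - 1]           # m-th smallest distance (rank m)
```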
[0122] Calculating the inter-class sample feature distance by the method corresponding to the false acceptance rate alignment strategy of this embodiment balances the false acceptance rate of the neural network model across data sets. In particular, when the inter-class sample feature distances of reference image features from different data sets are calculated in the same way, for example when the above m or M are equal across data sets, the false acceptance rates of the neural network model for the different data sets can be controlled to be equal.
[0123] Further, when the false acceptance rate alignment strategy is that the false acceptance rate of data set A is W times that of data set B, the value of the distance ranked mW can be selected for reference image features belonging to data set A as their inter-class sample feature distance, and the value of the distance ranked m can be selected for reference image features belonging to data set B as their inter-class sample feature distance, where m is any positive integer and W is any positive number. In particular, when mW is not an integer, the value of the distance ranked mW can be estimated by interpolation.
[0124] Because the inter-class sample feature distance of a reference image feature in data set A is the mW-th smallest distance, the first mW distances in the ranking are less than or equal to the selected distance; likewise, the inter-class sample feature distance in data set B is the m-th smallest distance, so the first m distances are less than or equal to the selected distance. Correspondingly, over a large number of tests, for a given decision threshold: if, among all distances between two image features that belong to data set A and to different categories, the decision threshold is the kW-th smallest (i.e., the first kW distances in the ranking are all less than or equal to it), then among all distances between two image features that belong to data set B and to different categories, the decision threshold is the k-th smallest (i.e., the first k distances are less than or equal to it). Therefore, the probability that the distance between two different-category image features of data set A is below the decision threshold is W times the corresponding probability for data set B; that is, the false acceptance rate of data set A is W times that of data set B.
[0125] Furthermore, to improve the stability of the inter-class sample feature distance and thereby the stability of the false acceptance rate control, for reference image features belonging to data set A, the average of the distances ranked from mW-d/2 to mW+d/2 can be selected as the inter-class sample feature distance; for reference image features belonging to data set B, the average of the distances ranked from m-d/2 to m+d/2 is selected. Here d can be any non-zero even number. For example, when d=4, the average of the 5 distances ranked from mW-2 to mW+2 (centered on the distance at rank mW) is the inter-class sample feature distance for data set A; similarly, the average of the 5 distances ranked from m-2 to m+2 (centered on the distance at rank m) is the inter-class sample feature distance for data set B.
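For illustration only, the rank selection and the stabilized windowed variant can be sketched as follows; the use of NumPy, the linear interpolation for a fractional rank, and the variable names are assumptions of this sketch rather than part of the claimed method:

```python
import numpy as np

def rank_value(sorted_dists, rank):
    """Value at a (possibly fractional) 1-based rank of an ascending list,
    estimated by linear interpolation when the rank is not an integer."""
    ranks = np.arange(1, len(sorted_dists) + 1)
    return np.interp(rank, ranks, sorted_dists)

def windowed_rank_value(sorted_dists, rank, d):
    """Average of the d+1 distances ranked rank-d/2 .. rank+d/2 (rank is
    assumed to be an integer here; d is a non-zero even number)."""
    lo = max(rank - d // 2, 1)
    hi = min(rank + d // 2, len(sorted_dists))
    return sorted_dists[lo - 1:hi].mean()

# Data set A selects rank m*W, data set B selects rank m (ascending order):
m, W, d = 10, 1.5, 4
dists_a = np.sort(np.random.rand(1000))
dists_b = np.sort(np.random.rand(1000))
da = rank_value(dists_a, m * W)          # interpolated value at rank 15
db = windowed_rank_value(dists_b, m, d)  # mean of the values at ranks 8..12
```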
[0126] It should be noted that when the decision threshold alignment strategy is the false acceptance rate alignment strategy and the target false acceptance rate corresponding to each data set is less than a preset false acceptance rate threshold, i.e., the requirement on the false acceptance rate is extremely strict, the inter-class sample feature distance of each reference image feature can be calculated as follows in order to control the target false acceptance rate precisely: for each reference image feature in each data set, determine its multiple inter-class reference image features from the reference image features belonging to that data set, and calculate the distances in the feature space between the reference image feature and these inter-class reference image features (an inter-class reference image feature belongs to a different category from the reference image feature); for all reference image features in each data set, sort the distances between each reference image feature and its inter-class reference image features in ascending order, and count the total number of ranked distances per data set; for each data set, compute the product of the target false acceptance rate and the number of ranked distances, and select the value of the distance whose rank matches that product as the inter-class sample feature distance of every reference image feature in the data set. Exemplarily, the preset false acceptance rate threshold may be 0.01%.
[0127] It can be understood that the above embodiment is equivalent to calculating one inter-class sample feature distance per data set: among all distances between the data set's reference image features and their inter-class reference image features, the distance whose rank equals the product of the target false acceptance rate and the number of ranked distances. Exemplarily, if the target false acceptance rate is 0.001% and the number of ranked distances is one million, the product is 10; that is, among all distances between the reference image features of data set A and their inter-class reference image features, the probability that any distance is less than or equal to the inter-class sample feature distance of data set A is 10 in one million, i.e., 0.001%. Over a large number of tests, if the inter-class sample feature distance of data set A is selected as the decision threshold, then among all distances between two image features that belong to data set A and to different categories, the probability that any distance is less than or equal to the threshold approaches 0.001%; that is, the false acceptance rate of data set A approaches the target false acceptance rate. By the same reasoning, the false acceptance rates of the other data sets also approach the target false acceptance rate.
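For illustration only, this per-data-set selection amounts to taking an empirical quantile of the pooled inter-class distances; the following is a minimal sketch under that reading:

```python
import numpy as np

def dataset_inter_class_distance(all_inter_class_dists, target_far):
    """Pick the distance whose ascending rank equals target_far * count,
    i.e. the empirical target_far-quantile of the data set's pooled
    inter-class distances; this value is shared by every reference
    image feature of the data set."""
    dists = np.sort(all_inter_class_dists)         # ascending order
    count = len(dists)
    rank = max(int(round(target_far * count)), 1)  # product of rate and count
    return dists[rank - 1]

# e.g. a target FAR of 0.001% over one million ranked distances -> rank 10
thr = dataset_inter_class_distance(np.random.rand(1_000_000), 1e-5)
```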
[0128] Exemplarily, as shown in Figure 7, when the decision threshold alignment strategy is the false rejection rate alignment strategy, the sample feature distance is the intra-class sample feature distance. For a given reference image feature, its sample feature distance can be calculated as follows:
[0129] S701: For each reference image feature, determine multiple intra-class reference image features of the reference image feature, and calculate the distances in the feature space between the reference image feature and these intra-class reference image features; an intra-class reference image feature belongs to the same data set and the same category as the reference image feature.
[0130] For example, for the reference image feature of the sample image numbered 81, the reference image features of the 50 sample images numbered 51 to 100 are all its intra-class reference image features. The distance in the feature space between the reference image feature and an intra-class reference image feature may be the L1 norm, the L2 norm, etc. between the multidimensional vectors corresponding to the two image features.
[0131] S702: Sort the distances in the feature space between the reference image feature and the multiple intra-class reference image features in descending order, and determine the intra-class sample feature distance of the reference image feature according to the ranking.
[0132] For example, the value of one of the largest distances is selected as the intra-class sample feature distance of the reference image feature; for instance, the n-th largest distance (rank n) can be selected, where n can be any positive integer, in particular n=1, and n can be the same or different for different reference image features. Alternatively, the average of several of the largest distances is selected as the intra-class sample feature distance; for instance, the M largest distances can be selected and their average used, where M can be any positive integer greater than 1, and M can be the same or different for different reference image features.
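For illustration only, S701-S702 mirror the inter-class sketch above with the sort direction reversed; the same illustrative assumptions apply, and `feats` is assumed to exclude the anchor feature itself:

```python
import numpy as np

def intra_class_sample_feature_distance(anchor, feats, labels, datasets,
                                        anchor_label, anchor_dataset,
                                        n=1, M=None):
    """S701-S702: distances to same-dataset, same-category features,
    sorted descending; return the n-th largest, or the mean of the M largest."""
    mask = (datasets == anchor_dataset) & (labels == anchor_label)
    dists = np.sort(np.linalg.norm(feats[mask] - anchor, axis=1))[::-1]  # descending
    if M is not None:
        return dists[:M].mean()   # average of the M largest distances
    return dists[n - 1]           # n-th largest distance (rank n)
```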
[0133] Calculating the intra-class sample feature distance by the method corresponding to the false rejection rate alignment strategy of this embodiment balances the false rejection rate of the neural network model across data sets. In particular, when the intra-class sample feature distances of reference image features belonging to different data sets are calculated in the same way, for example when the above n or M are equal across data sets, the false rejection rates of the neural network model for the different data sets can be controlled to be equal.
[0134] Further, when the false rejection rate alignment strategy is that the false rejection rate of data set A is V times that of data set B, the value of the distance ranked nV can be selected for reference image features belonging to data set A as their intra-class sample feature distance, and the value of the distance ranked n can be selected for reference image features belonging to data set B as their intra-class sample feature distance, where n is any positive integer and V is any positive number. In particular, when nV is not an integer, interpolation can be used to estimate the value of the distance ranked nV.
[0135] Because the intra-class sample feature distance of a reference image feature in data set A is the nV-th largest distance, the first nV distances in the ranking are greater than or equal to the selected distance; likewise, the intra-class sample feature distance in data set B is the n-th largest distance, so the first n distances are greater than or equal to the selected distance. Correspondingly, over a large number of tests, for a given decision threshold: if, among all distances between two image features that belong to data set A and to the same category, the decision threshold is the pV-th largest (i.e., the first pV distances in the ranking are all greater than or equal to it), then among all distances between two image features that belong to data set B and to the same category, the decision threshold is the p-th largest (i.e., the first p distances are greater than or equal to it). Therefore, the probability that the distance between two same-category image features of data set A exceeds the decision threshold is V times the corresponding probability for data set B; that is, the false rejection rate of data set A is V times that of data set B.
[0136] Furthermore, to improve the stability of the intra-class sample feature distance and thereby the stability of the false rejection rate control, for reference image features belonging to data set A, the average of the distances ranked from nV-e/2 to nV+e/2 can be selected as the intra-class sample feature distance; for reference image features belonging to data set B, the average of the distances ranked from n-e/2 to n+e/2 is selected. Here e can be any non-zero even number. For example, when e=2, the average of the 3 distances ranked from nV-1 to nV+1 (centered on the distance at rank nV) is the intra-class sample feature distance for data set A; similarly, the average of the 3 distances ranked from n-1 to n+1 (centered on the distance at rank n) is the intra-class sample feature distance for data set B.
[0137] It should be noted that when the decision threshold alignment strategy is the false rejection rate alignment strategy and the target false rejection rate corresponding to each data set is less than a preset false rejection rate threshold, i.e., the requirement on the false rejection rate is extremely strict, the intra-class sample feature distance of each reference image feature can be calculated as follows in order to control the target false rejection rate precisely: for each reference image feature in each data set, determine its multiple intra-class reference image features from the reference image features belonging to that data set, and calculate the distances in the feature space between the reference image feature and these intra-class reference image features (an intra-class reference image feature belongs to the same category as the reference image feature); for all reference image features in each data set, sort the distances between each reference image feature and its intra-class reference image features in descending order, and count the total number of ranked distances per data set; for each data set, compute the product of the target false rejection rate and the number of ranked distances, and select the value of the distance whose rank matches that product as the intra-class sample feature distance of every reference image feature in the data set. Exemplarily, the preset false rejection rate threshold may be 5%.
[0138] It can be understood that the calculation of the intra-class sample feature distance under the above decision threshold alignment strategy is similar to the description given for the inter-class sample feature distance and will not be repeated here.
[0139] In addition, because the sources of sample images are diverse in practical applications, each sample image can simply be divided into base library sample images and snapshot sample images according to image resolution, where the resolution of a base library sample image is higher than that of a snapshot sample image; correspondingly, its image quality is better and it characterizes object features more faithfully. Exemplarily, an image with resolution higher than or equal to a preset resolution may be classed as a base library sample image, and an image with resolution lower than the preset resolution as a snapshot sample image; for example, the preset resolution may be 800×600. Generally, base library sample images are the minority among the sample images, and snapshot sample images the majority.
[0140] In one embodiment, the reference image feature corresponds to a base library sample image; the inter-class reference image features or intra-class reference image features correspond to snapshot sample images; and the resolution of the base library sample image is higher than the resolution of the snapshot sample images.
[0141] That is to say, for any of the above decision threshold alignment strategies, when calculating the inter-class sample feature distance, the reference image feature of a base library sample image can be used as the base point, the reference image features of multiple snapshot sample images that belong to the same data set as the base library sample image but to different categories are taken as the multiple inter-class reference image features, and the inter-class sample feature distance of the base library sample image's reference image feature is calculated from the distances between it and the reference image features of the multiple snapshot sample images. Similarly, for any of the above decision threshold alignment strategies, the intra-class sample feature distance is calculated analogously and will not be repeated here.
[0142] In this way, this embodiment uniformly constrains the image quality of any two reference image features involved in a feature distance calculation, and uses base library sample images of better image quality as the base points for calculating sample feature distances. Compared with using sample images of varying image quality as base points, this introduces a more reasonable constraint, makes the calculated sample feature distances more credible, improves the performance of the neural network model, and reduces the false acceptance rate and the false rejection rate.
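For illustration only, the resolution-based split and the base-point pairing can be expressed as simple filters; the width/height arrays and the flag name are hypothetical:

```python
import numpy as np

def is_base_library(widths, heights, preset=(800, 600)):
    """Class an image as a base library sample when its resolution is at
    least the preset resolution (e.g. 800x600), else as a snapshot sample."""
    return (widths >= preset[0]) & (heights >= preset[1])

# Sample feature distances are then computed with base library features as
# base points and snapshot features as the inter-/intra-class candidates:
# anchors    = feats[is_base_library(w, h)]
# candidates = feats[~is_base_library(w, h)]
```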
[0143] In one embodiment, as shown in Figure 8, a training method for a neural network model is provided, including:
[0144] S801: Obtain each sample image belonging to multiple training data sets; each sample image is labeled with a category label and a data set label;
[0145] S802: Input each training sample image into the initial neural network model to obtain the reference image feature of each training sample image;
[0146] S803: Calculate the value of the loss function of the initial neural network model according to each reference image feature, and the category label and data set label corresponding to each reference image feature;
[0147] S804: According to the value of the loss function, adjust the parameters to be trained of the initial neural network model to obtain the neural network model;
[0148] Among them, when training of the initial neural network model is completed, the difference between the data set feature distances corresponding to any two training data sets is less than a preset threshold. The data set feature distance is the data set inter-class feature distance or the data set intra-class feature distance: the data set inter-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set but to different categories, and the data set intra-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set and to the same category.
[0149] It is understandable that the condition for completing training of the initial neural network model may be that the value of the loss function is less than a preset loss function threshold, or another condition. Exemplarily, when the loss function includes the loss between the data set feature distances of the training data sets, or includes the feature distance transformation loss, then when the value of the loss function is less than the preset loss function threshold, the data set feature distances of the training data sets are close, i.e., the constraint that the difference between the data set feature distances of any two training data sets is less than the preset threshold is satisfied.
[0150] Optionally, taking as an example a neural network model that includes a feature extraction network and a distance transformation network, where the feature extraction network has been pre-trained: S802 may include inputting each training sample image into the feature extraction network of the initial neural network model for feature extraction processing to obtain the reference image feature of each training sample image, and inputting the reference image feature of each sample image into the distance transformation network of the initial neural network model to calculate the distance transformation coefficient of each reference image feature. Correspondingly, S803 may include calculating the feature distance transformation loss according to each reference image feature and the distance transformation coefficient, category label and data set label corresponding to each reference image feature, and using the feature distance transformation loss as the value of the loss function of the initial neural network model. S804 may include adjusting the parameters to be trained of the distance transformation network according to the value of the loss function to obtain the neural network model.
[0151] Among them, the feature distance transformation loss is the loss between the desired feature distance and the transformation feature distance of each reference image feature; the desired feature distance is the reference value of the transformation feature distance; and the transformation feature distance of a reference image feature is the product of its sample feature distance and its distance transformation coefficient. The sample feature distance is the inter-class sample feature distance or the intra-class sample feature distance: the inter-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set but to different categories, and the intra-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set and to the same category.
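For illustration only, the following is a minimal PyTorch-style sketch of S801-S804 for the pre-trained-backbone case, training only the distance transformation network; `backbone`, `transform_net`, `loader` and the callable `sample_dist_fn` (standing for whichever sample feature distance computation the chosen decision threshold alignment strategy prescribes) are hypothetical names:

```python
import torch

def train_distance_transform(backbone, transform_net, loader, sample_dist_fn,
                             epochs=10, lr=1e-3):
    """S801-S804 for the pre-trained-backbone case: only the distance
    transformation network's parameters are updated (S804)."""
    backbone.eval()  # frozen, pre-trained feature extraction network
    opt = torch.optim.Adam(transform_net.parameters(), lr=lr)
    for _ in range(epochs):
        for images, cls_labels, set_labels in loader:        # S801
            with torch.no_grad():
                ref = backbone(images)                       # S802: reference features
            coeffs = transform_net(ref).squeeze(-1)          # distance transform coefficients
            r = sample_dist_fn(ref, cls_labels, set_labels)  # S803: sample feature distances
            loss = (coeffs * r - r.mean()).abs().mean()      # feature distance transformation loss
            opt.zero_grad(); loss.backward(); opt.step()     # S804
```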
[0152] For the description of the training method of the above neural network model, please refer to the description of the above image recognition method, which will not be repeated here.
[0153] It should be understood that although the steps in the flowcharts of Figures 2a, 3a, 4b, and 5-8 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in Figures 2a, 3a, 4b, and 5-8 may include multiple sub-steps or stages, which are not necessarily executed at the same time and may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
[0154] In one embodiment, as shown in Figure 9, an image recognition device is provided, including an image acquisition module 91, a feature extraction module 92, and an image recognition module 93, wherein:
[0155] The image acquisition module 91 is used to acquire the image to be recognized;
[0156] The feature extraction module 92 is used to input the image to be recognized into the neural network model and output the target image feature of the image to be recognized. The neural network model is trained based on sample images belonging to multiple training data sets, and the difference between the data set feature distances corresponding to any two training data sets is less than a preset threshold. The data set feature distance is the data set inter-class feature distance or the data set intra-class feature distance: the former characterizes the distance in the feature space between any two feature points that belong to the same data set but to different categories, and the latter characterizes the distance in the feature space between any two feature points that belong to the same data set and to the same category;
[0157] The image recognition module 93 is used to perform image recognition processing on the target image features according to the judgment threshold corresponding to the neural network model to obtain the image recognition result of the image to be recognized.
[0158] Optionally, the neural network model includes a feature extraction network and a distance transformation network, and the feature extraction module 92 may include:
[0159] The reference feature extraction unit is configured to input the image to be recognized into the feature extraction network for feature extraction processing to obtain the reference image feature of the image to be recognized;
[0160] The distance transformation unit is used to input the reference image feature into the distance transformation network to calculate the distance transformation coefficient corresponding to the reference image feature, perform distance transformation processing on the reference image feature according to the distance transformation coefficient, and output the target image feature obtained after the distance transformation processing.
[0161] Optionally, the distance transformation network includes at least one fully connected layer, and the distance transformation unit is specifically configured to: normalize the reference image feature to obtain the normalized reference image feature; input the normalized reference image feature into the at least one fully connected layer for fully connected processing to obtain the distance transformation coefficient corresponding to the reference image feature; and perform distance transformation processing on the reference image feature before normalization according to the distance transformation coefficient to obtain the target image feature obtained after the distance transformation processing.
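For illustration only, the distance transformation unit can be sketched as a small PyTorch module; the layer sizes are arbitrary, and the final step assumes (as one plausible reading of "distance transformation processing") that the pre-normalization reference feature is scaled by the scalar coefficient:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceTransformNet(nn.Module):
    """Normalize -> fully connected layer(s) -> scalar coefficient ->
    scale the original (pre-normalization) reference feature."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),        # distance transformation coefficient
        )

    def forward(self, ref_feat):
        normed = F.normalize(ref_feat, dim=-1)  # unit modulus, so the coefficient
        coeff = self.fc(normed)                 # depends only on feature direction
        return coeff * ref_feat                 # target image feature (assumed form)
```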
[0162] Optionally, the distance transformation network is trained based on a loss function that includes the feature distance transformation loss. The feature distance transformation loss is the loss between the desired feature distance and the transformation feature distance of each reference image feature; the desired feature distance is the reference value of the transformation feature distance; and the transformation feature distance of a reference image feature is the product of its sample feature distance and its distance transformation coefficient. The reference image feature is extracted from the sample image by the feature extraction network of the neural network model, and the data set feature distance is related to the sample feature distances of the reference image features belonging to the same data set. The sample feature distance is the inter-class sample feature distance or the intra-class sample feature distance: the inter-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set but to different categories, and the intra-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set and to the same category.
[0163] Optionally, the feature distance transformation loss L is calculated using the following relation:

[0164] L = \frac{1}{N} \sum_{i=1}^{N} \left| F(x_i) \, R_i - R_c \right|

[0165] where N is the total number of sample images, x_i is the reference image feature of the i-th sample image, F(x_i) is the distance transformation coefficient of that reference image feature, R_i is its sample feature distance, and R_c is the desired feature distance.
[0166] Optionally, the device may further include a neural network model training module, and the neural network model training module may include:
[0167] The sample image acquisition unit is used to acquire each sample image belonging to multiple training data sets; each sample image is labeled with a category label and a data set label;
[0168] The sample feature extraction unit is used to input each sample image into the feature extraction network of the initial neural network model for feature extraction processing to obtain the reference image feature of each sample image, and to input the reference image feature of each sample image into the distance transformation network of the initial neural network model to calculate the distance transformation coefficient of each reference image feature;
[0169] The loss function calculation unit is used to calculate the feature distance transformation loss according to each reference image feature, the distance transformation coefficient, category label and data set label corresponding to each reference image feature, and the decision threshold alignment strategy, and to calculate the value of the loss function of the initial neural network model according to the feature distance transformation loss;
[0170] The neural network training unit is used to adjust the training parameters of the initial neural network model according to the value of the loss function to obtain the neural network model; the training parameters include the parameters in the distance transformation network.
[0171] Optionally, the feature extraction network has been pre-trained; the loss function calculation unit is further configured to use the feature distance transformation loss as the value of the loss function of the initial neural network model, and the neural network training unit is further configured to adjust the parameters to be trained of the distance transformation network according to the value of the loss function to obtain the neural network model.
[0172] Optionally, the loss function calculation unit is specifically configured to: calculate the sample feature distance of each reference image feature according to each reference image feature, the category label and data set label corresponding to each reference image feature, and the decision threshold alignment strategy; calculate the product of the sample feature distance of each reference image feature and its distance transformation coefficient, and use the product as the transformation feature distance of each reference image feature; calculate the mean of the sample feature distances of the reference image features as the desired feature distance; and determine the feature distance transformation loss according to the loss between the desired feature distance and the transformation feature distance of each reference image feature.
[0173] Optionally, when the decision threshold alignment strategy is the false acceptance rate alignment strategy, the loss function calculation unit is further configured to: for each reference image feature, determine multiple inter-class reference image features of the reference image feature, and calculate the distances in the feature space between the reference image feature and the multiple inter-class reference image features, where an inter-class reference image feature belongs to the same data set as the reference image feature but to a different category; sort the distances in the feature space between the reference image feature and the multiple inter-class reference image features in ascending order; and determine the inter-class sample feature distance of the reference image feature according to the ranking.
[0174] Optionally, the loss function calculation unit is further configured to select the value of one of the smallest distances, or the average of several of the smallest distances, as the inter-class sample feature distance of the reference image feature.
[0175] Optionally, when the false acceptance rate alignment strategy is that the false acceptance rate of data set A is W times the false acceptance rate of data set B, the loss function calculation unit is further configured to: for reference image features belonging to data set A, select the average of the distances ranked from mW-d/2 to mW+d/2, or the value of the distance ranked mW, as the inter-class sample feature distance of the reference image feature, where d is a non-zero even number, m is a positive integer, and W is a positive number; and for reference image features belonging to data set B, select the average of the distances ranked from m-d/2 to m+d/2, or the value of the distance ranked m, as the inter-class sample feature distance of the reference image feature.
[0176] Optionally, when the decision threshold alignment strategy is the false acceptance rate alignment strategy and the target false acceptance rate corresponding to each data set is less than the preset false acceptance rate threshold, the sample feature distance is the inter-class sample feature distance, and the loss function calculation unit is further configured to: for each reference image feature in each data set, determine its multiple inter-class reference image features from the reference image features belonging to that data set, and calculate the distances in the feature space between the reference image feature and the multiple inter-class reference image features, where an inter-class reference image feature belongs to a different category from the reference image feature; for all reference image features in each data set, sort the distances between each reference image feature and its inter-class reference image features in ascending order, and count the number of ranked distances per data set; and for each data set, calculate the product of the target false acceptance rate and the number of ranked distances, and select the value of the distance whose rank matches the product as the inter-class sample feature distance of each reference image feature in the data set.
[0177] Optionally, when the decision threshold alignment strategy is the false rejection rate alignment strategy, the loss function calculation unit is further configured to: for each reference image feature, determine multiple intra-class reference image features of the reference image feature, and calculate the distances in the feature space between the reference image feature and the multiple intra-class reference image features, where an intra-class reference image feature belongs to the same data set and the same category as the reference image feature; sort the distances in the feature space between the reference image feature and the multiple intra-class reference image features in descending order; and determine the intra-class sample feature distance of the reference image feature according to the ranking.
[0178] Optionally, the loss function calculation unit is further configured to select the value of one of the largest distances, or the average of several of the largest distances, as the intra-class sample feature distance of the reference image feature.
[0179] Optionally, when the false rejection rate alignment strategy is that the false rejection rate of data set A is V times the false rejection rate of data set B, the loss function calculation unit is further configured to: for reference image features belonging to data set A, select the average of the distances ranked from nV-e/2 to nV+e/2, or the value of the distance ranked nV, as the intra-class sample feature distance of the reference image feature, where e is a non-zero even number, n is a positive integer, and V is a positive number; and for reference image features belonging to data set B, select the average of the distances ranked from n-e/2 to n+e/2, or the value of the distance ranked n, as the intra-class sample feature distance of the reference image feature.
[0180] Optionally, when the decision threshold alignment strategy is the false rejection rate alignment strategy and the target false rejection rate corresponding to each data set is less than the preset false rejection rate threshold, the sample feature distance is the intra-class sample feature distance, and the loss function calculation unit is further configured to: for each reference image feature in each data set, determine its multiple intra-class reference image features from the reference image features belonging to that data set, and calculate the distances in the feature space between the reference image feature and the multiple intra-class reference image features, where an intra-class reference image feature belongs to the same category as the reference image feature; for all reference image features in each data set, sort the distances between each reference image feature and its intra-class reference image features in descending order, and count the number of ranked distances per data set; and for each data set, calculate the product of the target false rejection rate and the number of ranked distances, and select the value of the distance whose rank matches the product as the intra-class sample feature distance of each reference image feature in the data set.
[0181] Optionally, the reference image feature corresponds to a base library sample image; the inter-class reference image features or intra-class reference image features correspond to snapshot sample images; and the resolution of the base library sample image is higher than the resolution of the snapshot sample images.
[0182] In one embodiment, as shown in Figure 10, a training device for a neural network model is provided, including a sample image acquisition module 101, a sample feature extraction module 102, a loss function calculation module 103, and a neural network training module 104, wherein:
[0183] The sample image acquisition module 101 is used to acquire each sample image belonging to multiple training data sets; each sample image is labeled with a category label and a data set label;
[0184] The sample feature extraction module 102 is used to input each training sample image into the initial neural network model to obtain the reference image feature of each training sample image;
[0185] The loss function calculation module 103 is configured to calculate the value of the loss function of the initial neural network model according to each reference image feature, and the category label and data set label corresponding to each reference image feature;
[0186] The neural network training module 104 is configured to adjust the parameters to be trained of the initial neural network model according to the value of the loss function to obtain the neural network model;
[0187] Among them, when training of the initial neural network model is completed, the difference between the data set feature distances corresponding to any two training data sets is less than a preset threshold. The data set feature distance is the data set inter-class feature distance or the data set intra-class feature distance: the data set inter-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set but to different categories, and the data set intra-class feature distance characterizes the distance in the feature space between any two feature points that belong to the same data set and to the same category.
[0188] Optionally, the neural network model includes a feature extraction network and a distance transformation network. The feature extraction network has been pre-trained, and the sample feature extraction module 102 may include:
[0189] The sample feature extraction unit is used to input each training sample image into the feature extraction network of the initial neural network model for feature extraction processing to obtain the reference image feature of each training sample image, and to input the reference image feature of each sample image into the distance transformation network of the initial neural network model to calculate the distance transformation coefficient of each reference image feature;
[0190] Correspondingly, the loss function calculation module 103 may include:
[0191] The loss calculation unit is used to calculate the feature distance transformation loss according to each reference image feature and the distance transformation coefficient, category label and data set label corresponding to each reference image feature, and to use the feature distance transformation loss as the value of the loss function of the initial neural network model;
[0192] The neural network training module 104 may include:
[0193] The neural network training unit is used to adjust the training parameters of the distance transformation network according to the value of the loss function to obtain the neural network model.
[0194] Among them, the feature distance transformation loss is the loss between the desired feature distance and the transformation feature distance of each reference image feature; the desired feature distance is the reference value of the transformation feature distance; and the transformation feature distance of a reference image feature is the product of its sample feature distance and its distance transformation coefficient. The sample feature distance is the inter-class sample feature distance or the intra-class sample feature distance: the inter-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set but to different categories, and the intra-class sample feature distance characterizes the distance in the feature space between the reference image feature and other reference image features that belong to the same data set and to the same category.
[0195] For the specific definition of the image recognition device, please refer to the above definition of the image recognition method; for the specific definition of the neural network model training device, please refer to the above definition of the neural network model training method; they will not be repeated here. The various modules in the above image recognition device and neural network model training device can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
[0196] In one embodiment, a readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the image recognition method provided in the first aspect can be implemented, and the neural network model training method provided in the second aspect can also be implemented.
[0197] Referring to Figure 11, this embodiment proposes an image recognition system, including a photographing device 111 and a computer device 112. The photographing device 111 is used to capture an image to be recognized and send it to the computer device 112 for image recognition. The computer device 112 includes a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, the image recognition method provided in the first aspect can be implemented, and the neural network model training method provided in the second aspect can also be implemented.
[0198] The computer device may be, but is not limited to, a terminal, a server, etc. Taking a terminal as an example, the computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image recognition method and a neural network model training method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a button, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
[0199] Those skilled in the art can understand that the structure shown in Figure 11 is only a block diagram of part of the structure related to the solution of this application, and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
[0200] A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by instructing the relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0201] The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features of the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered within the scope described in this specification.
[0202] The above embodiments only express several implementations of this application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
