Image classification, image processing method, device and storage medium
By calculating the similarity of image features and error information to iteratively update the initial feature extraction model, the problem of low update efficiency of artificial intelligence models is solved, and efficient image processing is achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2021-09-09
- Publication Date: 2026-06-26
AI Technical Summary
In existing technologies, the low efficiency of artificial intelligence model updates leads to low image processing efficiency.
The current training image set is acquired, and its images are input into the trained feature extraction model and the initial feature extraction model to be trained, respectively, to extract features. Similarities are calculated among the trained features and among the features to be trained, and the initial feature extraction model is iteratively updated using the error information between the two similarity sets until the training completion condition is met, thus obtaining the first target feature extraction model.
This improves model update efficiency by eliminating the need to obtain independent image samples and labels, and thereby improves image processing efficiency.
Figure CN115797990B_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to an image classification method, an image processing method, an apparatus, a computer device, a storage medium, and a computer program product.
Background Technology
[0002] With the development of artificial intelligence (AI) technology, image processing technology has emerged. Image processing typically involves using AI models to extract features from images, obtaining feature vectors, and then performing subsequent tasks such as image classification and recognition. Currently, AI models are continuously iterated and updated as business needs change. This involves updating older AI models with new, labeled images to obtain new AI models. However, updating AI models using independent image samples and labels in this way is inefficient.
Summary of the Invention
[0003] Therefore, to address the above technical problem, it is necessary to provide an image classification method, an image processing method, an apparatus, a computer device, a storage medium, and a computer program product that improve the efficiency of model updates and thereby the efficiency of image processing.
[0004] An image classification method, the method comprising:
[0005] Obtain the current training image set, which is determined from a preset training image set;
[0006] Each training image in the current training image set is input into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0007] Calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained;
[0008] The error information between the similarity set to be trained and the similarity set already trained is calculated, and the initial feature extraction model to be trained is updated based on the error information. The step of obtaining the current training image set is iteratively executed until the training completion condition is met. The initial feature extraction model that has been trained is used as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image, and image content classification is performed based on the features corresponding to the input image.
[0009] An image classification device, the device comprising:
[0010] The image acquisition module is used to acquire the current training image set, which is determined from a preset training image set.
[0011] The feature extraction module is used to input each training image in the current training image set into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0012] The similarity calculation module is used to calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and to calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained.
[0013] The iterative update module is used to calculate the error information between the similarity set to be trained and the similarity set already trained, and update the initial feature extraction model to be trained based on the error information. It then returns to the step of obtaining the current training image set and iteratively executes it until the training completion condition is met. The initial feature extraction model that has been trained is used as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image and perform image content classification based on the features corresponding to the input image.
[0014] A computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:
[0015] Obtain the current training image set, which is determined from a preset training image set;
[0016] Each training image in the current training image set is input into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0017] Calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained;
[0018] The error information between the similarity set to be trained and the similarity set already trained is calculated, and the initial feature extraction model to be trained is updated based on the error information. The step of obtaining the current training image set is iteratively executed until the training completion condition is met. The initial feature extraction model that has been trained is used as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image, and image content classification is performed based on the features corresponding to the input image.
[0019] A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the following steps:
[0020] Obtain the current training image set, which is determined from a preset training image set;
[0021] Each training image in the current training image set is input into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0022] Calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained;
[0023] The error information between the similarity set to be trained and the similarity set already trained is calculated, and the initial feature extraction model to be trained is updated based on the error information. The step of obtaining the current training image set is iteratively executed until the training completion condition is met. The initial feature extraction model that has been trained is used as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image, and image content classification is performed based on the features corresponding to the input image.
[0024] A computer program product includes a computer program, characterized in that, when executed by a processor, the computer program performs the following steps:
[0025] Obtain the current training image set, which is determined from a preset training image set;
[0026] Each training image in the current training image set is input into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0027] Calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained;
[0028] The error information between the similarity set to be trained and the similarity set already trained is calculated, and the initial feature extraction model to be trained is updated based on the error information. The step of obtaining the current training image set is iteratively executed until the training completion condition is met. The initial feature extraction model that has been trained is used as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image, and image content classification is performed based on the features corresponding to the input image.
[0029] In the aforementioned image classification method, apparatus, computer device, storage medium, and computer program product, each training image in the current training image set is input into a trained feature extraction model and an initial feature extraction model to be trained, respectively, to obtain the trained features and the features to be trained corresponding to each training image. Then, the similarity between the trained features corresponding to each training image is calculated to obtain a trained similarity set, and the similarity between the features to be trained corresponding to each training image is calculated to obtain a similarity set to be trained. Finally, the error information between the similarity set to be trained and the trained similarity set is calculated, and the initial feature extraction model to be trained is updated based on the error information. The step of obtaining the current training image set is repeated iteratively until the training completion condition is met, and the trained initial feature extraction model is then used as the first target feature extraction model. In other words, the image feature space of the trained feature extraction model is mined and transferred to the initial feature extraction model, yielding the first target feature extraction model. This eliminates the need to obtain independent image samples and labels to update the model, improving model update efficiency.
[0030] An image processing method, the method comprising:
[0031] Obtain the image to be evaluated and the set of already evaluated images;
[0032] The image to be evaluated and the set of evaluated images are input into a trained feature extraction model to extract features, thereby obtaining the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. The similarity between the features to be evaluated and the set of evaluated features is calculated to obtain the first similarity set.
[0033] The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, which yields the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target features to be evaluated and the set of evaluated target features is calculated to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0034] The evaluation is performed based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated. Based on the evaluation information corresponding to the image to be evaluated, the similarity evaluation result corresponding to the image to be evaluated is determined.
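The evaluation flow of paragraphs [0031] to [0034] can be sketched as follows. This is only an illustration under stated assumptions: the text at this point does not say which similarity measure is used or how the two similarity sets are combined, so cosine similarity, the mean absolute gap, and the `tolerance` threshold are hypothetical choices, as are all function names and the two toy extractors.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Extractor = Callable[[Vector], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def similarity_set(extract: Extractor, query: Vector,
                   evaluated: Sequence[Vector]) -> List[float]:
    """Similarity between the query image's features and each evaluated image's features."""
    q = extract(query)
    return [cosine(q, extract(img)) for img in evaluated]


def evaluate(first: List[float], second: List[float],
             tolerance: float = 0.1) -> Tuple[float, bool]:
    """Hypothetical evaluation: mean absolute gap between the two similarity sets."""
    gap = sum(abs(a - b) for a, b in zip(first, second)) / len(first)
    return gap, gap <= tolerance


# Toy stand-ins for the trained model and the distilled target model.
trained_model: Extractor = lambda img: list(img)
target_model: Extractor = lambda img: [2.0 * v for v in img]

image_to_evaluate = [1.0, 0.0]
evaluated_images = [[1.0, 1.0], [0.0, 1.0]]

first = similarity_set(trained_model, image_to_evaluate, evaluated_images)
second = similarity_set(target_model, image_to_evaluate, evaluated_images)
gap, consistent = evaluate(first, second)
```

Because the toy target model only rescales the features, both models yield the same cosine similarities and the gap is essentially zero; a real target model would generally produce a nonzero gap.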
[0035] An image processing apparatus, comprising:
[0036] The evaluation image acquisition module is used to acquire the image to be evaluated and the set of evaluated images;
[0037] The image to be evaluated and the set of evaluated images are input into a trained feature extraction model to extract features, thereby obtaining the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. The similarity between the features to be evaluated and the set of evaluated features is calculated to obtain the first similarity set.
[0038] The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, which yields the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target features to be evaluated and the set of evaluated target features is calculated to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0039] The evaluation is performed based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated. Based on the evaluation information corresponding to the image to be evaluated, the similarity evaluation result corresponding to the image to be evaluated is determined.
[0040] A computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:
[0041] Obtain the image to be evaluated and the set of already evaluated images;
[0042] The image to be evaluated and the set of evaluated images are input into a trained feature extraction model to extract features, thereby obtaining the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. The similarity between the features to be evaluated and the set of evaluated features is calculated to obtain the first similarity set.
[0043] The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, which yields the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target features to be evaluated and the set of evaluated target features is calculated to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0044] The evaluation is performed based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated. Based on the evaluation information corresponding to the image to be evaluated, the similarity evaluation result corresponding to the image to be evaluated is determined.
[0045] A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the following steps:
[0046] Obtain the image to be evaluated and the set of already evaluated images;
[0047] The image to be evaluated and the set of evaluated images are input into a trained feature extraction model to extract features, thereby obtaining the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. The similarity between the features to be evaluated and the set of evaluated features is calculated to obtain the first similarity set.
[0048] The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, which yields the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target features to be evaluated and the set of evaluated target features is calculated to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0049] The evaluation is performed based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated. Based on the evaluation information corresponding to the image to be evaluated, the similarity evaluation result corresponding to the image to be evaluated is determined.
[0050] A computer program product includes a computer program, characterized in that, when executed by a processor, the computer program performs the following steps:
[0051] Obtain the image to be evaluated and the set of already evaluated images;
[0052] The image to be evaluated and the set of evaluated images are input into a trained feature extraction model to extract features, thereby obtaining the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. The similarity between the features to be evaluated and the set of evaluated features is calculated to obtain the first similarity set.
[0053] The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, which yields the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target features to be evaluated and the set of evaluated target features is calculated to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0054] The evaluation is performed based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated. Based on the evaluation information corresponding to the image to be evaluated, the similarity evaluation result corresponding to the image to be evaluated is determined.
[0055] The aforementioned image processing method, apparatus, computer equipment, storage medium, and computer program product acquire an image to be evaluated and a set of evaluated images, and then input these images into a trained feature extraction model and a target feature extraction model for evaluation processing. Since the target feature extraction model is trained using knowledge distillation from the trained feature extraction model, the efficiency of processing the image to be evaluated is improved. Furthermore, by using the target feature extraction model and the trained feature extraction model to extract features from the image to be evaluated and the set of evaluated images, a first similarity set and a second similarity set are determined. Then, the first and second similarity sets are used to perform evaluation calculations to determine the similarity evaluation result corresponding to the image to be evaluated, thereby improving the accuracy of the similarity evaluation result.
Attached Figure Description
[0056] Figure 1 is a diagram illustrating the application environment of an image classification method in one embodiment;
[0057] Figure 2 is a flowchart illustrating an image classification method in one embodiment;
[0058] Figure 3 is a flowchart illustrating the process of calculating similarity in one embodiment;
[0059] Figure 4 is a schematic diagram of the process for updating model parameters in one embodiment;
[0060] Figure 5 is a flowchart illustrating the process of obtaining error information in one embodiment;
[0061] Figure 6 is a flowchart illustrating the process of obtaining the second target feature extraction model in one embodiment;
[0062] Figure 7 is a flowchart illustrating the process of obtaining the fourth target feature extraction model in one embodiment;
[0063] Figure 8 is a flowchart illustrating the process of obtaining the audit result in one embodiment;
[0064] Figure 9 is a schematic diagram of the framework for training a feature extraction model in one embodiment;
[0065] Figure 10 is a flowchart illustrating an image processing method in one embodiment;
[0066] Figure 11 is a flowchart illustrating the process of obtaining the similarity evaluation result in one embodiment;
[0067] Figure 12 is a schematic diagram of the framework of an image processing method in one embodiment;
[0068] Figure 13 is a flowchart illustrating an image classification method in a specific embodiment;
[0069] Figure 14 is a schematic diagram of the application scenario in a specific embodiment;
[0070] Figure 15 is a structural block diagram of an image classification device in one embodiment;
[0071] Figure 16 is a structural block diagram of an image processing device in one embodiment;
[0072] Figure 17 is an internal structural diagram of a computer device in one embodiment;
[0073] Figure 18 is an internal structural diagram of a computer device in another embodiment.
Detailed Implementation
[0074] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0075] Computer vision (CV) is a science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes in recognizing and measuring targets, and then performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), autonomous driving, intelligent transportation, and common biometric recognition technologies such as facial recognition and fingerprint recognition.
[0076] The solutions provided in this application involve technologies such as image processing in artificial intelligence, and are specifically illustrated through the following embodiments:
[0077] The image classification method provided in this application can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. Server 104 receives training instructions sent by terminal 102 and retrieves the current training image set from database 106 according to these instructions. The current training image set is determined from a preset training image set. Server 104 inputs each training image in the current training image set into a trained feature extraction model and an initial feature extraction model to be trained for feature extraction, obtaining the trained features and the features to be trained for each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model. Server 104 calculates the similarity between the trained features corresponding to each training image, obtaining a trained similarity set, and calculates the similarity between the features to be trained corresponding to each training image, obtaining a similarity set to be trained. Server 104 calculates the error information between the similarity set to be trained and the trained similarity set, updates the initial feature extraction model to be trained based on the error information, and iteratively executes the step of obtaining the current training image set until the training completion condition is met. The trained initial feature extraction model is then used as the first target feature extraction model, which is used to extract features corresponding to the input image. Image content classification is performed based on these features. Server 104 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. Terminal 102 can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, in-vehicle terminal, smart TV, etc., but is not limited to these. The terminal and server can be directly or indirectly connected via wired or wireless communication, which is not limited herein.
[0078] In one embodiment, as shown in Figure 2, an image classification method is provided. The method is described here as applied to the server in Figure 1; it can be understood that the method can also be applied to a terminal, or to a system that includes both a terminal and a server and is implemented through interaction between them. The method includes the following steps:
[0079] Step 202: Obtain the current training image set, which is determined from the preset training image set.
[0080] The current training image set includes at least two current training images, which are the images used in the current training iteration. The preset training image set is a predefined collection of training images used during training, and the current training image set is a subset of it. The training images in the preset training image set can be images acquired after training of the trained feature extraction model was completed.
[0081] Specifically, the server can directly retrieve the current training image set from the database, which is determined from a preset training image set. That is, the current training image set is a subset of the training images in the preset training image set. In one embodiment, the server can retrieve the preset training image set, divide the training images in the preset training image set into pre-set batches to obtain training images for each batch, thus obtaining the current training image set. The server can retrieve the preset training image set from the Internet, from a business server, or from a database.
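The batching described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `iter_current_training_sets`, the shuffling, and the `seed` parameter are assumptions added for the example. The only constraint taken from the text is that each current training image set is a subset of the preset set and contains at least two images.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_current_training_sets(preset_set: Sequence[T], batch_size: int,
                               seed: int = 0) -> Iterator[List[T]]:
    """Yield successive current training image sets (batches) from the preset set."""
    if batch_size < 2:
        raise ValueError("a current training image set needs at least two images")
    order = list(range(len(preset_set)))
    random.Random(seed).shuffle(order)  # shuffling is an assumption, not from the text
    for start in range(0, len(order), batch_size):
        batch = [preset_set[i] for i in order[start:start + batch_size]]
        if len(batch) >= 2:  # a trailing single image cannot form a similarity pair
            yield batch


images = [f"img_{i}" for i in range(10)]  # placeholder image identifiers
batches = list(iter_current_training_sets(images, batch_size=4))
```

One full pass over `batches` corresponds to one training round over the preset training image set, except that a leftover single image is skipped in this sketch.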
[0082] Step 204: Input each training image in the current training image set into the trained feature extraction model and the initial feature extraction model to be trained respectively to extract features, and obtain the trained features and the features to be trained corresponding to each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0083] The trained feature extraction model is a model for extracting image features, obtained by training a neural network on historical training images; it is the model that now needs to be updated. The initial feature extraction model to be trained is a feature extraction model, which can be built using a neural network, whose parameters are initialized for training; it can also be obtained directly by initializing the parameters of the trained feature extraction model. The parameter initialization can be random initialization, Gaussian distribution initialization, zero initialization, etc. In one embodiment, the model parameters of the trained feature extraction model are used as the initialization parameters of the initial feature extraction model to be trained. Trained features are the features of the training images extracted by the trained feature extraction model. Features to be trained are the features of the training images extracted by the initial feature extraction model to be trained.
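The initialization options mentioned in this paragraph can be illustrated with a small sketch. It assumes, purely for the example, that a model's parameters are a single weight matrix stored as nested lists; the function name `init_student_weights`, the `mode` strings, and the Gaussian standard deviation of 0.01 are hypothetical.

```python
import copy
import random
from typing import List

Matrix = List[List[float]]


def init_student_weights(teacher_weights: Matrix, mode: str = "copy",
                         seed: int = 0) -> Matrix:
    """Build initial parameters with the same shape as the trained model's."""
    rng = random.Random(seed)
    if mode == "copy":      # reuse the trained model's parameters as the starting point
        return copy.deepcopy(teacher_weights)
    if mode == "gaussian":  # Gaussian initialization; std 0.01 is an assumed value
        return [[rng.gauss(0.0, 0.01) for _ in row] for row in teacher_weights]
    if mode == "zero":      # zero initialization
        return [[0.0 for _ in row] for row in teacher_weights]
    raise ValueError(f"unknown initialization mode: {mode}")


teacher = [[0.2, -0.1], [0.4, 0.3]]
copied = init_student_weights(teacher, "copy")
zeros = init_student_weights(teacher, "zero")
gauss = init_student_weights(teacher, "gaussian")
```

The deep copy matters in the `"copy"` mode: later updates to the model being trained must not modify the trained model's parameters.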
[0084] Specifically, the server inputs each training image in the current training image set into the trained feature extraction model for feature extraction, obtaining the trained features corresponding to each training image in the current training image set. It also inputs each training image in the current training image set into the initial feature extraction model to be trained, obtaining the features to be trained corresponding to each training image in the current training image set. In one embodiment, the trained feature extraction model is a teacher network model, and the initial feature extraction model to be trained is a student network model.
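As a rough illustration of step 204, the sketch below passes the same batch through a teacher and a student extractor. The single linear layer standing in for each neural network, the flattened-vector image representation, and the names `extract_features` and `extract_both` are all assumptions made for the example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def extract_features(weights: Matrix, image: Vector) -> Vector:
    """Toy feature extractor: one linear layer applied to a flattened image."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def extract_both(teacher_w: Matrix, student_w: Matrix,
                 batch: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Return (trained features, features to be trained) for every image in the batch."""
    trained = [extract_features(teacher_w, img) for img in batch]
    to_be_trained = [extract_features(student_w, img) for img in batch]
    return trained, to_be_trained


teacher_w = [[1.0, 0.0], [0.0, 1.0]]
student_w = [[2.0, 0.0], [0.0, 2.0]]
batch = [[1.0, 2.0], [3.0, 4.0]]
trained, to_be_trained = extract_both(teacher_w, student_w, batch)
```

Both feature lists are indexed by the same image order, which is what allows the two similarity sets computed later to be compared entry by entry.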
[0085] Step 206: Calculate the similarity between the trained features corresponding to each training image to obtain the trained similarity set, and calculate the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained.
[0086] Specifically, the trained similarity set consists of trained similarities, each of which characterizes the similarity between the trained features of two different training images. The similarity set to be trained consists of similarities to be trained, each of which characterizes the similarity between the features to be trained of two different training images. To compute them, each training image in the current training image set is traversed, and the similarity between the current training image and every other training image in the set is calculated. Trained similarities are calculated from the trained features, yielding the trained similarity set, which characterizes the feature space that the trained feature extraction model produces for the current training image set. Similarities to be trained are calculated from the features to be trained, yielding the similarity set to be trained, which characterizes the feature space that the initial feature extraction model to be trained produces for the current training image set. In one embodiment, both sets can be represented as matrices: the trained similarity matrix is calculated from the trained features, and the similarity matrix to be trained is calculated from the features to be trained.
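The matrix form of a similarity set can be sketched as below. The paragraph does not name a similarity measure at this point, so cosine similarity is an assumption, and `cosine_similarity` and `similarity_matrix` are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def similarity_matrix(features: List[Vector]) -> Matrix:
    """Entry [i][j] is the similarity between the features of image i and image j."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = similarity_matrix(features)
```

The same function would be applied once to the trained features and once to the features to be trained. The diagonal entries compare an image with itself and carry no information about pairs of different images, so an implementation could equally exclude them.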
[0087] Step 208: Calculate the error information between the similarity set to be trained and the similarity set already trained, update the initial feature extraction model to be trained based on the error information, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met. Then, use the trained initial feature extraction model as the first target feature extraction model. The first target feature extraction model is used to extract the features corresponding to the input image and perform image content classification based on the features corresponding to the input image.
[0088] The error information is used to characterize the difference between the similarity set to be trained and the trained similarity set.
[0089] Specifically, the server can calculate the error between each similarity to be trained in the similarity set to be trained and the corresponding trained similarity in the trained similarity set, and then sum all of these errors to obtain the error information. The server then uses the error information to update the model parameters of the initial feature extraction model by gradient descent with backpropagation, obtaining an updated feature extraction model. The updated model is used as the initial feature extraction model to be trained, the next batch is obtained as the current training image set, and the process is executed iteratively until the training completion condition is met. The trained initial feature extraction model is then used as the first target feature extraction model. One round consists of traversing all training images in the preset training image set. The training completion condition is the condition under which training of the initial feature extraction model is considered finished; it may include reaching the maximum number of training iterations, reaching the maximum number of training rounds, the error information falling below a preset error threshold, the model parameters no longer changing, and so on. The first target feature extraction model is used to extract the features corresponding to an input image and to perform image content classification based on those features. For example, the classification can be at the semantic level across species, such as distinguishing cats from dogs, or it can be fine-grained classification at the subclass level, such as distinguishing different kinds of birds.
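The training completion conditions listed above can be sketched as a single check. The following Python fragment is only an illustration: the function name, the parameter names, and all default thresholds are assumptions made for the example and are not specified in this application.

```python
def training_complete(iteration, epoch, error, param_change,
                      max_iterations=10_000, max_epochs=50,
                      error_threshold=1e-4, change_tolerance=1e-8):
    """Return True when any one of the completion conditions holds.

    All default limits are hypothetical values chosen for illustration.
    """
    return (
        iteration >= max_iterations          # maximum number of training iterations reached
        or epoch >= max_epochs               # maximum number of training rounds reached
        or error < error_threshold           # error information below the preset threshold
        or param_change <= change_tolerance  # model parameters have effectively stopped changing
    )
```

A training loop would call such a check once per batch, after the error information for that batch has been computed.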
[0090] In one embodiment, the first target feature extraction model can perform image content recognition based on features corresponding to the input image. In another embodiment, the first target feature extraction model can also perform image content segmentation, etc., based on features corresponding to the input image.
[0091] In the aforementioned image classification method, each training image in the current training image set is input into both the trained feature extraction model and the initial feature extraction model to be trained, yielding the trained features and the features to be trained for each training image. The similarity between the trained features is calculated to obtain the trained similarity set, and the similarity between the features to be trained is calculated to obtain the similarity set to be trained. The error information between the similarity set to be trained and the trained similarity set is then calculated, and the initial feature extraction model to be trained is updated based on this error information. The step of obtaining the current training image set is repeated iteratively until the training completion condition is met, and the trained initial feature extraction model is used as the first target feature extraction model, with which image content classification is performed. In other words, the image feature space of the trained feature extraction model is mined and transferred to the initial feature extraction model to be trained. This eliminates the need to obtain independent image samples and labels for model updates, improving model update efficiency.
[0092] In one embodiment, as shown in Figure 3, step 206, namely calculating the similarity between the trained features corresponding to each training image to obtain the trained similarity set, includes:
[0093] Step 302: Obtain the trained feature matrix based on the trained features corresponding to each training image, and normalize the trained feature matrix to obtain the trained normalized matrix.
[0094] In the trained feature matrix, each row is the feature vector of one trained feature.
[0095] Specifically, the server constructs the trained feature matrix from the trained features corresponding to each training image, and then normalizes it. A normalization algorithm such as L2 norm normalization can be used, in which each element of a vector is divided by the vector's L2 norm, to obtain the trained normalized matrix. In a specific embodiment, normalization can be performed using formulas (1), (2), and (3) shown below.
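[0096] $\lVert x \rVert_2 = \sqrt{\sum_{i=1}^{d} x_i^2}$ Formula (1)
[0097] $\hat{x} = \dfrac{x}{\lVert x \rVert_2}$ Formula (2)
[0098] $\hat{F}_t = [\hat{x}_1; \hat{x}_2; \ldots; \hat{x}_N] \in \mathbb{R}^{N \times d}$ Formula (3)
[0099] Where $x$ represents the feature vector, $x_i$ represents its $i$-th element, and $d$ represents the dimension of the feature vector. $\lVert x \rVert_2$ represents the L2 norm of the feature vector $x$. $\hat{x}$ represents the normalized trained feature. $\hat{F}_t$ represents the trained normalized matrix, whose rows are the normalized trained features, and $N$ represents the number of trained features. The trained normalized matrix is calculated using formulas (1), (2), and (3) above.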
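The L2 norm normalization described above, in which each element of a feature vector is divided by the vector's L2 norm, can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the small `eps` guard against all-zero vectors is an assumption that does not appear in this application.

```python
import math

def l2_norm(x):
    """L2 norm of a feature vector: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in x))

def l2_normalize(x, eps=1e-12):
    """Divide each element by the L2 norm (eps is an assumed guard for zero vectors)."""
    norm = max(l2_norm(x), eps)
    return [v / norm for v in x]

def normalize_rows(features):
    """Normalize every row of an N x d feature matrix."""
    return [l2_normalize(row) for row in features]

features = [[3.0, 4.0], [0.0, 2.0]]   # N = 2 feature vectors of dimension d = 2
print(normalize_rows(features))       # [[0.6, 0.8], [0.0, 1.0]]
```

After normalization every row has unit length, so the inner product of two rows equals their cosine similarity.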
[0100] Step 304: Calculate the transpose matrix corresponding to the trained normalized matrix to obtain the trained transpose matrix. Calculate the product of the trained transpose matrix and the trained normalized matrix to obtain the trained similarity set.
[0101] The trained transpose matrix refers to the matrix obtained by transposing the trained normalized matrix.
[0102] Specifically, the server transposes the trained normalized matrix to obtain the trained transpose matrix, and then calculates the product of the trained transpose matrix and the trained normalized matrix, i.e., performs matrix multiplication. This multiplies every trained feature with every other trained feature, giving the similarity between each pair of trained features. Traversing all trained features in this way yields the trained similarity set.
[0103] In a specific embodiment, the trained similarity set can be calculated using the formula (4) shown below.
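[0104] $S_t = \hat{F}_t \hat{F}_t^{\top}$ Formula (4)
[0105] Where $\hat{F}_t$ represents the trained normalized matrix, whose rows are the normalized trained features, $\hat{F}_t^{\top}$ represents the trained transpose matrix, and $S_t$ represents the trained similarity set.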
[0106] In one embodiment, the server can use a similarity algorithm to calculate the trained similarity set between the trained transpose matrix and the trained normalized matrix. Specifically, it can calculate the cosine similarity between the trained transpose matrix and the trained normalized matrix, or it can calculate distance similarity, etc., to obtain the trained similarity set.
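The computation of a similarity set as the product of a row-normalized feature matrix and its transpose can be illustrated as follows. This is a minimal sketch in plain Python under the assumption that each row of the matrix is one feature vector; the helper names are hypothetical, and a production system would more likely use a numerical library for the matrix product.

```python
import math

def normalize_rows(features):
    """L2-normalize each row; an all-zero row is left unchanged (assumed guard)."""
    out = []
    for row in features:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def similarity_matrix(features):
    """N x N matrix whose entry (i, j) is the cosine similarity of features i and j."""
    normalized = normalize_rows(features)
    return matmul(normalized, transpose(normalized))

sim = similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in sim:
    print([round(v, 4) for v in row])
```

Entry (i, j) of the result is the similarity between training images i and j, so the matrix is symmetric with ones on the diagonal. The same function applies unchanged to the features to be trained.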
[0107] In one embodiment, as shown in Figure 3, step 206, namely calculating the similarity between the features to be trained corresponding to each training image to obtain the similarity set to be trained, includes:
[0108] Step 306: Obtain the feature matrix to be trained based on the features to be trained corresponding to each training image, and normalize the feature matrix to be trained to obtain the normalized matrix to be trained.
[0109] In the feature matrix to be trained, each row is the feature vector of one feature to be trained.
[0110] Specifically, the server builds the feature matrix to be trained from the features to be trained corresponding to each training image, and then normalizes it. Various normalization algorithms can be used, such as 0-1 normalization, min-max normalization, zero-mean normalization, and L2 norm normalization. With L2 norm normalization, for example, each element of a vector is divided by the vector's L2 norm to obtain the normalized matrix to be trained. In a specific embodiment, the server can also use formulas (1), (2), and (3) above to obtain the normalized matrix to be trained.
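For comparison with L2 norm normalization, two of the alternative normalization algorithms named above can be sketched as follows. This application does not define them in detail, so the per-vector definitions used here (min-max scaling to the range 0 to 1, and zero-mean scaling with unit variance) are assumptions based on their common meaning, and the function names are hypothetical.

```python
import math

def min_max_normalize(x):
    """Scale a vector linearly so that its smallest element is 0 and its largest is 1."""
    lo, hi = min(x), max(x)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in x]          # constant vector: assumed convention
    return [(v - lo) / span for v in x]

def zero_mean_normalize(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0 for _ in x]          # constant vector: assumed convention
    return [(v - mean) / std for v in x]

print(min_max_normalize([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]
```

Unlike L2 norm normalization, these two variants do not produce unit-length vectors, so the subsequent matrix product would no longer equal the cosine similarity.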
[0111] Step 308: Calculate the transpose matrix corresponding to the normalized matrix to be trained, obtain the transpose matrix to be trained, calculate the product of the transpose matrix to be trained and the normalized matrix to be trained, and obtain the similarity set to be trained.
[0112] Specifically, the server transposes the normalized matrix to be trained to obtain the transpose matrix to be trained, and then calculates the product of the transpose matrix to be trained and the normalized matrix to be trained, i.e., performs matrix multiplication. This multiplies every feature to be trained with every other feature to be trained, giving the similarity between each pair of features to be trained and thereby the similarity set to be trained. In a specific embodiment, the server can also use formula (4) to calculate the similarity set to be trained.
[0113] In the above embodiments, by performing transpose calculation to obtain the transpose matrix, and then calculating the product of the normalized matrix and the transpose matrix, the similarity degree is obtained, thereby improving the efficiency of obtaining the similarity degree.
[0114] In one embodiment, as shown in Figure 4, step 208, namely calculating the error information between the similarity set to be trained and the trained similarity set, updating the initial feature extraction model to be trained based on the error information, and returning to the step of obtaining the current training image set for iterative execution, includes:
[0115] Step 402: Calculate the mean square error between the similarity set to be trained and the similarity set already trained to obtain the initial loss information, and use the initial loss information as the error information.
[0116] Specifically, the server uses the mean squared error algorithm to calculate the loss between the similarity set to be trained and the trained similarity set, obtaining the initial loss information, which characterizes the error between the trained similarity set and the similarity set to be trained.
[0117] Step 404: Update the model parameters of the initial feature extraction model by backpropagation based on the error information to obtain the updated feature extraction model.
[0118] Specifically, the server uses the error information to calculate the gradient and backpropagates the gradient to the initial feature extraction model, updating the model parameters in the initial feature extraction model to obtain the updated feature extraction model.
[0119] Step 406: Use the updated feature extraction model as the initial feature extraction model, and return to iteratively execute the step of obtaining the current training image set.
[0120] Specifically, the server uses the updated feature extraction model as the initial feature extraction model and iteratively executes the step of obtaining the current training image set until the training completion condition is met, thus obtaining the trained feature extraction model.
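Steps 402 to 406 can be illustrated with a toy training loop. The sketch below makes several assumptions that are not part of this application: the model to be trained is reduced to a single linear layer, gradients are estimated by central finite differences as a stand-in for backpropagation, and the learning rate, iteration limit, and error threshold are arbitrary example values.

```python
import math

def similarity_matrix(features):
    """L2-normalize each row, then take all pairwise inner products."""
    rows = []
    for f in features:
        norm = math.sqrt(sum(v * v for v in f)) or 1.0
        rows.append([v / norm for v in f])
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def mse(a, b):
    """Mean squared error over all entries of two equally sized matrices (step 402)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def student_features(weights, images):
    """Toy model to be trained (assumption): features = W @ image."""
    return [[sum(w * x for w, x in zip(row, img)) for row in weights] for img in images]

def loss_fn(weights, images, teacher_sim):
    return mse(similarity_matrix(student_features(weights, images)), teacher_sim)

def numeric_gradient(weights, images, teacher_sim, h=1e-5):
    """Central finite differences; stands in for backpropagation in this sketch."""
    grad = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + h
            up = loss_fn(weights, images, teacher_sim)
            row[j] = original - h
            down = loss_fn(weights, images, teacher_sim)
            row[j] = original
            grad[i][j] = (up - down) / (2 * h)
    return grad

def train(weights, images, teacher_sim, lr=0.5, max_iters=200, threshold=1e-6):
    """Repeat: compute error, update parameters (step 404), reuse updated model (step 406)."""
    losses = [loss_fn(weights, images, teacher_sim)]
    for _ in range(max_iters):
        if losses[-1] < threshold:      # one possible training completion condition
            break
        grad = numeric_gradient(weights, images, teacher_sim)
        weights = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(weights, grad)]
        losses.append(loss_fn(weights, images, teacher_sim))
    return weights, losses

images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_sim = similarity_matrix([[1.0, 0.2], [0.1, 1.0], [0.9, 1.1]])  # made-up trained features
weights, losses = train([[0.5, -0.3], [0.2, 0.8]], images, teacher_sim, max_iters=20)
print(losses[0], losses[-1])
```

No image labels appear anywhere in this loop: the only supervision signal is the trained similarity matrix, which is the point of the method described above.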
[0121] In one embodiment, as shown in Figure 5, step 402, namely calculating the mean squared error between the similarity set to be trained and the trained similarity set to obtain the error information, includes:
[0122] Step 502: Calculate the mean squared error between the similarity set to be trained and the trained similarity set to obtain the initial loss information.
[0123] Specifically, the server calculates the Euclidean distance between each similarity to be trained in the similarity set to be trained and the corresponding trained similarity in the trained similarity set to obtain the initial loss information.
[0124] Step 504: Obtain the number of training images corresponding to the current training image set, calculate the ratio of the initial loss information to the number of training images, and obtain the average loss information.
[0125] Specifically, the server can directly obtain the total number of training images in the current training image set, i.e., the number of training images, or it can count the number of training images. Then, it calculates the ratio of the initial loss information to the number of training images to obtain the average loss information.
[0126] Step 506: Obtain preset balance parameters, and perform balance calculation on the average loss information based on the preset balance parameters to obtain balanced loss information.
[0127] Among them, the preset balance parameter refers to the hyperparameter that is set in advance to weigh the distillation loss and classification loss.
[0128] Specifically, the server uses preset balancing parameters to weight the average loss information to obtain balanced loss information, which is the weighted loss information.
[0129] Step 508: Obtain the classification loss information corresponding to the initial feature extraction model to be trained, and calculate the sum of the classification loss information and the balance loss information to obtain the error information.
[0130] Here, the classification loss information refers to the loss incurred by the initial feature extraction model to be trained when it performs the subsequent image content classification task after feature extraction. In one embodiment, when the initial feature extraction model to be trained performs an image content recognition task, the loss can be a recognition loss; when it performs an image content segmentation task, the loss can be a segmentation loss; and so on.
[0131] Specifically, the server obtains the classification loss information corresponding to the initial feature extraction model to be trained. The server can obtain the image classification labels and use the initial feature extraction model to be trained to classify image content based on the features corresponding to the training images, obtaining an initial classification result. It then calculates the classification loss between the initial classification result and the image classification labels, for example with the cross-entropy loss function, to obtain the classification loss information. Finally, the sum of the classification loss information and the balanced loss information is calculated to obtain the error information.
[0132] In a specific embodiment, the error information can be calculated using the formula (5) shown below.
[0133] $L = L_{cls} + \lambda \cdot \dfrac{1}{N}\,\mathrm{MSE}(S_s, S_t)$ Formula (5)
[0134] Where $L$ represents the error information and $L_{cls}$ represents the classification loss information. $\lambda$ is the preset balance parameter, used to balance the classification loss and the distillation loss. $\frac{1}{N}\mathrm{MSE}(S_s, S_t)$ is the average loss information, where $\mathrm{MSE}$ indicates the mean squared error, $S_t$ represents the trained similarity set, $S_s$ represents the similarity set to be trained, and $N$ represents the number of training images.
[0135] In the above embodiment, the initial loss information is obtained by calculating the mean squared error. The loss is then averaged over the number of training images, weighted with the preset balance parameter, and summed with the classification loss information to obtain the error information, making the obtained error information more accurate.
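Steps 502 to 508 can be traced with a short numerical sketch. The Python below is illustrative only: the function names are hypothetical, the classification loss is taken to be cross-entropy averaged over the batch (the averaging is an assumption), and the balance value used in the example is arbitrary.

```python
import math

def mse(a, b):
    """Mean squared error over all entries of two equally sized matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of one predicted probability vector against an integer label."""
    return -math.log(max(probs[label], eps))

def total_error(student_sim, teacher_sim, class_probs, labels, balance=1.0):
    n = len(labels)                                  # number of training images N
    initial_loss = mse(student_sim, teacher_sim)     # step 502: initial loss information
    average_loss = initial_loss / n                  # step 504: average loss information
    balanced_loss = balance * average_loss           # step 506: balanced loss information
    cls_loss = sum(cross_entropy(p, y) for p, y in zip(class_probs, labels)) / n
    return cls_loss + balanced_loss                  # step 508: error information

student_sim = [[1.0, 0.5], [0.5, 1.0]]
teacher_sim = [[1.0, 0.0], [0.0, 1.0]]
probs = [[1.0, 0.0], [0.0, 1.0]]                     # perfectly confident, correct predictions
print(total_error(student_sim, teacher_sim, probs, [0, 1], balance=2.0))  # 0.125
```

In this example the classification loss is zero, so the whole error comes from the distillation term: the mean squared error is 0.125, dividing by N = 2 gives 0.0625, and the balance value 2.0 scales it back to 0.125.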
[0136] In one embodiment, as shown in Figure 6, step 208, namely calculating the error information between the similarity set to be trained and the trained similarity set, updating the initial feature extraction model to be trained based on the error information, returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and using the trained initial feature extraction model as the first target feature extraction model, includes:
[0137] Step 602: Input the similarity set to be trained into the initial mapping network for feature mapping to obtain the target mapping feature set.
[0138] The initial mapping network refers to a mapping network whose parameters have been initialized. It transforms the similarity set to be trained, thereby narrowing the semantic gap, and is built using a neural network. The target mapping feature set refers to the set of features obtained after mapping.
[0139] Specifically, the server inputs each similarity to be trained in the similarity set to be trained into the initial mapping network for feature mapping, thereby obtaining the target mapping feature set output by the initial mapping network.
[0140] Step 604: Calculate the mean square error between the target mapping feature set and the trained similarity set to obtain target error information. Based on the target error information, update the initial mapping network and the initial feature extraction model in reverse to obtain the updated mapping network and the updated feature extraction model.
[0141] Step 606: Use the updated mapping network as the initial mapping network and the updated feature extraction model as the initial feature extraction model, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met. Then, obtain the second target feature extraction model based on the trained initial feature extraction model and the trained initial mapping network.
[0142] Specifically, the server calculates the mean squared error between the target mapping feature set and the trained similarity set to obtain the target error information. It then uses the target error information to update, by backpropagation, the network parameters of the initial mapping network and the model parameters of the initial feature extraction model, obtaining an updated mapping network and an updated feature extraction model. The updated mapping network is then used as the initial mapping network, and the updated feature extraction model is used as the initial feature extraction model. The step of obtaining the current training image set is executed iteratively until the training completion condition is met. Finally, the second target feature extraction model is obtained based on the trained initial feature extraction model and the trained initial mapping network. In other words, the second target feature extraction model includes both the trained initial feature extraction model and the trained mapping network.
[0143] In the above embodiments, by adding an initial mapping network after the initial feature extraction model, and training the initial feature extraction model and the initial mapping network together, a second target feature extraction model is obtained. Using the second target feature extraction model can improve the accuracy of feature extraction.
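The role of the mapping network in steps 602 and 604 can be illustrated with a deliberately tiny stand-in. In the sketch below the mapping network is reduced to a two-parameter affine map applied to every similarity value, which is an assumption made purely for illustration; this application only states that the mapping network is built using a neural network. Only the gradients for the mapping parameters are shown, and the update of the feature extraction model is omitted.

```python
def map_similarities(sim, a, b):
    """Hypothetical mapping network: element-wise affine map a * s + b."""
    return [[a * s + b for s in row] for row in sim]

def target_error_and_grads(student_sim, teacher_sim, a, b):
    """Mean squared error between mapped similarities and trained similarities,
    plus its analytic gradients with respect to the mapping parameters a and b."""
    mapped = map_similarities(student_sim, a, b)
    count = sum(len(row) for row in mapped)
    err = grad_a = grad_b = 0.0
    for m_row, s_row, t_row in zip(mapped, student_sim, teacher_sim):
        for m, s, t in zip(m_row, s_row, t_row):
            diff = m - t
            err += diff * diff
            grad_a += 2.0 * diff * s      # d(diff^2)/da
            grad_b += 2.0 * diff          # d(diff^2)/db
    return err / count, grad_a / count, grad_b / count

student_sim = [[1.0, 0.5], [0.5, 1.0]]
teacher_sim = [[1.0, 0.0], [0.0, 1.0]]
err, grad_a, grad_b = target_error_and_grads(student_sim, teacher_sim, a=1.0, b=0.0)
print(err, grad_a, grad_b)   # 0.125 0.25 0.5
```

A gradient descent step would then subtract a learning rate times each gradient from `a` and `b`, while the same target error would also be propagated back into the feature extraction model.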
[0144] In one embodiment, the initial feature extraction model is an initial generation model; step 208, namely, calculating the error information between the similarity set to be trained and the already trained similarity set, updating the initial feature extraction model to be trained based on the error information, and iteratively executing the step of obtaining the current training image set until the training completion condition is met, and using the trained initial feature extraction model as the first target feature extraction model, includes the following steps:
[0145] The similarity set to be trained and the similarity set already trained are input into the initial discriminant network for discrimination to obtain the similarity discrimination result. The initial discriminant network and the initial feature extraction model are updated based on the similarity discrimination result, and the step of obtaining the current training image set is returned and iteratively executed until the training completion condition is met. The trained initial feature extraction model is then used as the third target feature extraction model.
[0146] In this training process, adversarial learning can be used to train the initial feature extraction model. The initial feature extraction model serves as the initial generative model and generates the features corresponding to the training images. The initial discriminative network is used to distinguish whether its input was extracted by the trained feature extraction model or by the initial feature extraction model. That is, features extracted by the trained feature extraction model are treated as the positive class, and features extracted by the initial feature extraction model are treated as the negative class. The initial feature extraction model is trained so that the features it extracts become more similar to the features extracted by the trained model, thus deceiving the discriminative network. Simultaneously, the discriminative network is trained to distinguish between the two classes of features. The two networks are thus trained against each other, which constitutes adversarial learning. When training is complete, the feature space extracted by the trained initial feature extraction model more closely approximates the feature space extracted by the trained feature extraction model, and the discriminative network can better distinguish between them.
[0147] Specifically, the server inputs the similarity set to be trained and the trained similarity set into the initial discriminative network for discrimination, obtaining the similarity discrimination result. The similarity discrimination result indicates whether the initial discriminative network judges an input similarity to belong to the positive class or to the negative class. The error is then calculated from the actual result, that is, the class to which the input similarity actually belongs, and the similarity discrimination result. Based on the error, the gradient descent algorithm is used to update the initial discriminative network and the initial feature extraction model by backpropagation, and the step of obtaining the current training image set is executed iteratively until the training completion condition is met. At this point, the trained initial feature extraction model is used as the third target feature extraction model.
[0148] In the above embodiments, adversarial learning is performed by adding an initial discriminative network, and the trained initial feature extraction model is then used as the third target feature extraction model, which can improve the accuracy of the trained third target feature extraction model.
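The two opposing objectives in this adversarial embodiment can be sketched as follows. This application does not specify the form of the discriminative network or of its error, so the sketch assumes a hypothetical logistic discriminator over one row of a similarity matrix and the usual binary cross-entropy objectives; all names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminate(sim_row, w, b):
    """Hypothetical discriminator: probability that a similarity row
    comes from the trained feature extraction model (positive class)."""
    return sigmoid(sum(wi * si for wi, si in zip(w, sim_row)) + b)

def adversarial_losses(teacher_rows, student_rows, w, b, eps=1e-12):
    """Return (discriminator loss, generator loss) as binary cross-entropies."""
    d_loss = 0.0
    g_loss = 0.0
    for row in teacher_rows:                      # positive class: trained model
        d_loss -= math.log(max(discriminate(row, w, b), eps))
    for row in student_rows:                      # negative class: model to be trained
        p = discriminate(row, w, b)
        d_loss -= math.log(max(1.0 - p, eps))     # discriminator wants p close to 0
        g_loss -= math.log(max(p, eps))           # generator wants p close to 1
    n = len(teacher_rows) + len(student_rows)
    return d_loss / n, g_loss / len(student_rows)

teacher_rows = [[1.0, 0.1], [0.1, 1.0]]
student_rows = [[1.0, 0.6], [0.6, 1.0]]
print(adversarial_losses(teacher_rows, student_rows, w=[0.0, 0.0], b=0.0))
```

With all-zero discriminator parameters every input is scored 0.5, so both losses equal ln 2; training would alternately lower the first loss by updating the discriminator and the second by updating the feature extraction model.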
[0149] In one embodiment, the current training image set includes image triplets, and each image triplet includes a positive image pair and a negative image pair. As shown in Figure 7, the image classification method further includes the following steps:
[0150] Step 702: Input each image triplet into the trained feature extraction model and the initial feature extraction model to be trained, respectively, to extract features and obtain the trained triplet features and the untrained triplet features corresponding to each image triplet.
[0151] The current training image set includes image triplets, and each image triplet contains a positive image pair and a negative image pair. That is, two training images in a triplet belong to the same class and form the positive image pair, while the third training image belongs to a different class; either of the two same-class images paired with the different-class image forms a negative image pair. Triplet features are obtained by extracting the features of each training image in an image triplet and concatenating them. Trained triplet features refer to the features extracted from an image triplet by the trained feature extraction model. Triplet features to be trained refer to the features extracted from an image triplet by the initial feature extraction model to be trained.
[0152] Specifically, the server inputs each image triplet into the trained feature extraction model for feature extraction, obtaining the trained triplet features corresponding to each image triplet. Simultaneously, it inputs each image triplet into the initial feature extraction model to be trained for feature extraction, obtaining the triplet features to be trained corresponding to each image triplet.
[0153] Step 704: Calculate the triplet loss based on the trained triplet features and the triplet features to be trained to obtain the initial triplet loss information. Update the initial feature extraction model in reverse based on the initial triplet loss information, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met. Then, use the trained initial feature extraction model as the fourth target feature extraction model.
[0154] The initial triplet loss information is used to characterize the errors corresponding to the trained triplet features and the triplet features to be trained.
[0155] Specifically, the server uses the trained triplet features to calculate a triplet loss, obtaining the trained triplet loss, and uses the triplet features to be trained to calculate a triplet loss, obtaining the triplet loss to be trained. Next, it calculates the error between the trained triplet loss and the triplet loss to be trained to obtain the initial triplet loss information. Then, based on the initial triplet loss information, it uses the gradient descent algorithm to update the initial feature extraction model by backpropagation, and iteratively executes the step of obtaining the current training image set until the training completion condition is met. Finally, the trained initial feature extraction model is used as the fourth target feature extraction model.
[0156] In the above embodiments, each image triplet is input into the trained feature extraction model and the initial feature extraction model to be trained respectively for feature extraction, obtaining the trained triplet features and the triplet features to be trained corresponding to each image triplet. The triplet loss is calculated based on the trained triplet features and the triplet features to be trained to obtain the initial triplet loss information. Based on the initial triplet loss information, the initial feature extraction model is updated by backpropagation, and the step of obtaining the current training image set is executed iteratively until the training completion condition is met. The trained initial feature extraction model is then used as the fourth target feature extraction model, thereby improving the accuracy of the obtained feature extraction model.
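The calculation of the initial triplet loss information can be sketched as follows. The sketch makes assumptions not stated in this application: each triplet is kept as three separate feature vectors (anchor, positive, negative) rather than one concatenated vector, the triplet loss uses Euclidean distance with a hypothetical margin of 0.2, and the error between the two losses is taken as their squared difference.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss (the margin value is an assumption)."""
    return max(distance(anchor, positive) - distance(anchor, negative) + margin, 0.0)

def initial_triplet_loss_info(teacher_triplets, student_triplets, margin=0.2):
    """Mean squared difference between the triplet loss of the features to be
    trained and the triplet loss of the trained features, over all triplets."""
    errors = []
    for teacher, student in zip(teacher_triplets, student_triplets):
        trained_loss = triplet_loss(*teacher, margin=margin)
        to_be_trained_loss = triplet_loss(*student, margin=margin)
        errors.append((to_be_trained_loss - trained_loss) ** 2)
    return sum(errors) / len(errors)

# One triplet: (anchor, positive, negative) features from each model.
teacher_triplets = [([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])]   # well separated, loss 0
student_triplets = [([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])]   # not separated, loss equals margin
print(initial_triplet_loss_info(teacher_triplets, student_triplets))
```

Here the trained features already satisfy the margin while the features to be trained do not, so the resulting error is the squared margin, about 0.04.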
[0157] In one embodiment, as shown in Figure 8, after step 208, that is, after calculating the error information between the similarity set to be trained and the trained similarity set, updating the initial feature extraction model to be trained based on the error information, returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and using the trained initial feature extraction model as the first target feature extraction model, the method further includes:
[0158] Step 802: Obtain the image to be reviewed, input the image to be reviewed into the first target feature extraction model for feature extraction, and obtain the features to be reviewed.
[0159] Here, "images to be reviewed" refers to images that require review, which involves examining the content of the images for violations, illegality, abnormalities, sensitivity, etc. "Features to be reviewed" refers to the image characteristics corresponding to the images to be reviewed.
[0160] Specifically, the server obtains the image to be reviewed. This image can be uploaded to the server from a user terminal, obtained from a business server, or retrieved from the internet. Then, the image to be reviewed is input into a first target feature extraction model for feature extraction to obtain the features to be reviewed. In other words, the server deploys and uses the trained first target feature extraction model. When using it, the first target feature extraction model is directly invoked to extract features, thereby obtaining the features to be reviewed.
[0161] Step 804: Obtain the reviewed features corresponding to the reviewed image library, and calculate the similarity between the feature to be reviewed and the reviewed features;
[0162] Step 806: Determine the review result corresponding to the image to be reviewed based on the degree of similarity.
[0163] The reviewed image library stores all reviewed images and their corresponding reviewed features. Reviewed images refer to images whose content contains violations, illegal information, abnormalities, or sensitive information. Reviewed features refer to the image characteristics corresponding to reviewed images.
[0164] Specifically, the server pre-establishes a library of reviewed images, which stores the reviewed features corresponding to the reviewed images. The server retrieves the reviewed features from the library and calculates the similarity between the feature to be reviewed and the reviewed features. When the similarity exceeds a preset similarity threshold, it indicates that the image to be reviewed contains illegal or sensitive content, and the review result for the image to be reviewed is "rejected." When the similarity between the feature to be reviewed and each reviewed feature in the library does not exceed the preset similarity threshold, it indicates that the image to be reviewed does not contain illegal or sensitive content, and the review result for the image to be reviewed is "approved." At this point, subsequent business processing can proceed, such as classifying, recognizing, and segmenting the image to be reviewed.
[0165] In the above embodiments, the first target feature extraction model is used to extract features from the image to be reviewed, and the similarity between the features to be reviewed and each reviewed feature in the reviewed image library is then calculated. Based on the similarity, the review result corresponding to the image to be reviewed is determined, which improves the efficiency of obtaining the review result.
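Steps 804 and 806 amount to a thresholded nearest-match check against the reviewed feature library. The following Python sketch assumes cosine similarity as the similarity measure and a hypothetical threshold of 0.9; the function names and the sample feature vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def review(feature, reviewed_features, threshold=0.9):
    """Reject if the feature matches any reviewed feature above the threshold."""
    for reviewed in reviewed_features:
        if cosine_similarity(feature, reviewed) > threshold:
            return "rejected"
    return "approved"

library = [[1.0, 0.0], [0.0, 1.0]]        # reviewed features of known violating images
print(review([0.99, 0.05], library))      # rejected
print(review([1.0, 1.0], library))        # approved
```

The first query is nearly parallel to a library entry and is rejected; the second has a similarity of about 0.71 to both entries, which stays below the threshold, so it is approved and can proceed to subsequent processing.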
[0166] In a specific embodiment, Figure 9 illustrates a training framework for a feature extraction model. Specifically, the server performs distillation learning, inputting the training image set into both a teacher network and a student network, both of which are feature extraction networks. The teacher network outputs a feature vector for each training image, giving the teacher feature vectors. The similarity between these teacher feature vectors is then calculated pairwise to obtain a teacher similarity matrix. Similarly, the student network outputs a feature vector for each training image, giving the student feature vectors, and the similarity between these student feature vectors is calculated pairwise to obtain a student similarity matrix. A loss function is then used to calculate the loss between the teacher and student similarity matrices. This loss is backpropagated as a gradient to update the student network, and the process is repeated iteratively until training is complete, at which point the trained student network becomes the final feature extraction model.
[0167] In one embodiment, as shown in Figure 10, an image processing method is provided. The method is described by taking its application to the server in Figure 1 as an example; it is understood that the method can also be applied to a terminal, or to a system that includes both a terminal and a server and is implemented through interaction between the terminal and the server. The method includes the following steps:
[0168] Step 1002: Obtain the image to be evaluated and the set of evaluated images.
[0169] The evaluated image set refers to all evaluated images that were evaluated using a trained feature extraction model. The image to be evaluated is the image that needs to be evaluated, and can be an image from the evaluated image set. After iterative updates to the trained feature extraction model, the evaluated images need to be re-evaluated. The image to be evaluated can also be an image that has never been evaluated.
[0170] Specifically, the server can retrieve the image to be evaluated and the set of evaluated images from the database. The image to be evaluated can be selected sequentially from the set of evaluated images, obtained from the business server, obtained from the Internet, etc.
[0171] Step 1004: Input the image to be evaluated and the set of evaluated images into the trained feature extraction model to extract features, obtain the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images, and calculate the similarity between the features to be evaluated and the set of evaluated features to obtain the first similarity set.
[0172] Here, the trained feature extraction model refers to a feature extraction model trained on historical training images using a neural network algorithm. The first similarity set includes the individual first similarities, each of which characterizes the similarity between the feature to be evaluated and one evaluated feature in the evaluated feature set. The feature to be evaluated refers to the image feature corresponding to the image to be evaluated. The evaluated feature set includes the image features corresponding to each evaluated image.
[0173] Specifically, the server invokes a trained feature extraction model, inputting both the image to be evaluated and the set of evaluated images into the model for feature extraction, resulting in the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images. Then, a similarity algorithm can be used to calculate the similarity between the features to be evaluated and each evaluated feature in the set of evaluated features, thus obtaining the first similarity set.
[0174] Step 1006: Input the image to be evaluated and the set of evaluated images into the target feature extraction model for feature extraction, obtain the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images, and calculate the similarity between the target features to be evaluated and the set of evaluated target features to obtain the second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0175] The target feature extraction model is obtained through knowledge distillation training from the trained feature extraction model. Alternatively, the target feature extraction model can be obtained using any of the above-described image classification methods. The target feature to be evaluated refers to the image feature of the image to be evaluated as extracted by the target feature extraction model. The evaluated target feature set includes the image features extracted from each evaluated image using the target feature extraction model. The second similarity set includes the individual second similarities, each of which characterizes the similarity between the target feature to be evaluated and one evaluated target feature in the evaluated target feature set.
[0176] Specifically, the server invokes the target feature extraction model, inputting both the image to be evaluated and the set of evaluated images into the model for feature extraction, resulting in the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. Then, a similarity algorithm can be used to calculate the similarity between the target features to be evaluated and each evaluated target feature in the set of evaluated target features, resulting in a second similarity set.
[0177] Step 1008: Perform evaluation calculations based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated, and determine the similarity evaluation result corresponding to the image to be evaluated based on the evaluation information corresponding to the image to be evaluated.
[0178] The evaluation information, which can be an evaluation score, characterizes the similarity between the image to be evaluated and the set of previously evaluated images. The similarity evaluation result characterizes the evaluation result corresponding to the image to be evaluated, and this result includes similar and dissimilar results.
[0179] Specifically, the server calculates the error between the first similarity set and the second similarity set, and obtains the evaluation score corresponding to the image to be evaluated based on the error. When the evaluation score exceeds a preset similarity threshold, the evaluation result of the image to be evaluated is "similar". In this case, the server can use the target feature corresponding to the image to be evaluated to replace the evaluated feature corresponding to the same image in the evaluated image set. When the evaluation score does not exceed the preset similarity threshold, the evaluation result of the image to be evaluated is "dissimilar". In this case, the image to be evaluated can be further sent to a manual evaluation terminal for manual evaluation.
[0180] The aforementioned image processing method, apparatus, computer equipment, storage medium, and computer program product acquire an image to be evaluated and a set of evaluated images, and then input these images into a trained feature extraction model and a target feature extraction model for evaluation processing. Since the target feature extraction model is trained using knowledge distillation from the trained feature extraction model, the efficiency of processing the image to be evaluated is improved. Furthermore, by using the target feature extraction model and the trained feature extraction model to extract features from the image to be evaluated and the set of evaluated images, a first similarity set and a second similarity set are determined. Then, the first and second similarity sets are used to perform evaluation calculations to determine the similarity evaluation result corresponding to the image to be evaluated, thereby improving the accuracy of the similarity evaluation result.
[0181] In one embodiment, step 1004, which involves calculating the similarity between the feature to be evaluated and the set of evaluated features to obtain a first similarity set, includes the following steps:
[0182] The features to be evaluated are normalized to obtain normalized features to be evaluated, and the set of evaluated features is normalized to obtain a normalized set of evaluated features. The normalized set of evaluated features is transposed to obtain an evaluated transpose matrix. The product of the normalized set of evaluated features and the evaluated transpose matrix is calculated to obtain the first similarity set.
[0183] Here, the normalized feature to be evaluated refers to the feature to be evaluated after normalization. The normalized evaluated feature set refers to the evaluated feature set after normalization. The evaluated transpose matrix refers to the matrix obtained by transposing the normalized evaluated feature set.
[0184] Specifically, the server uses a normalization algorithm to normalize the features to be evaluated, obtaining normalized features to be evaluated, and normalizes the evaluated feature set, obtaining a normalized evaluated feature set. The L2 norm normalization algorithm can be used for normalization, i.e., formulas (1), (2), and (3) can be used. Then, the server transposes the normalized evaluated feature set to obtain the evaluated transpose matrix, and uses a similarity algorithm to calculate the similarity between the normalized evaluated feature set and the evaluated transpose matrix, obtaining a first similarity set. For example, the product of the normalized evaluated feature set and the evaluated transpose matrix can be calculated to obtain the first similarity set, or the distance similarity between the normalized evaluated feature set and the evaluated transpose matrix can be calculated to obtain the first similarity set, and so on.
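The normalize, transpose and multiply sequence can be sketched as follows. This sketch assumes one reading of the step: the normalized feature to be evaluated (a 1 x D row) is multiplied by the evaluated transpose matrix (D x N), giving one similarity per evaluated image. L2 normalization is used as stated above; the function names are hypothetical, and features are assumed to be non-zero.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def transpose(matrix):
    """Transpose an N x D matrix (list of rows) into a D x N matrix."""
    return [list(column) for column in zip(*matrix)]

def similarity_set(feature_to_evaluate, evaluated_features):
    """Similarity of one feature against each feature in the evaluated set."""
    q = l2_normalize(feature_to_evaluate)                # 1 x D
    e = [l2_normalize(f) for f in evaluated_features]    # N x D
    e_t = transpose(e)                                   # D x N
    # Row-by-matrix product: entry j is the dot product of q with column j.
    return [sum(q[d] * e_t[d][j] for d in range(len(q))) for j in range(len(e))]

first_similarity_set = similarity_set([1.0, 0.0], [[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]])
print(first_similarity_set)
```

Because both sides are L2-normalized before the product, each entry is a cosine similarity. The second similarity set would be obtained in the same way from the features extracted by the target feature extraction model.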
[0185] In one embodiment, step 1006, which involves calculating the similarity between the target feature to be evaluated and the set of already evaluated target features to obtain a second similarity set, includes:
[0186] The features of the target to be evaluated are normalized to obtain normalized target features, and the set of evaluated target features is normalized to obtain normalized set of evaluated target features. The normalized set of evaluated target features is transposed to obtain the transpose matrix of evaluated targets. The product of the normalized set of evaluated target features and the transpose matrix of evaluated targets is calculated to obtain the second similarity set.
[0187] Here, the normalized target feature to be evaluated refers to the target feature to be evaluated after normalization. The normalized evaluated target feature set refers to the evaluated target feature set after normalization. The transpose matrix of evaluated targets refers to the matrix obtained by transposing the normalized evaluated target feature set.
[0188] Specifically, the server uses a normalization algorithm to normalize the features of the target to be evaluated, obtaining normalized features of the target to be evaluated, and normalizes the set of evaluated target features, obtaining a normalized set of evaluated target features. The L2 norm normalization algorithm can be used for normalization, i.e., formulas (1), (2), and (3) can be used. Then, the server transposes the normalized set of evaluated target features to obtain the transposed matrix of the evaluated target. A similarity algorithm is then used to calculate the similarity between the normalized set of evaluated target features and the transposed matrix of the evaluated target, obtaining a second similarity set. For example, the product of the normalized set of evaluated target features and the transposed matrix of the evaluated target can be calculated to obtain the second similarity set, or the distance similarity between the normalized set of evaluated target features and the transposed matrix of the evaluated target can be calculated to obtain the second similarity set, and so on.
[0189] In one embodiment, as shown in Figure 11, step 1008, which involves performing evaluation calculations based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated, and determining the similarity evaluation result based on the evaluation information, includes:
[0190] Step 1102: Calculate the mean squared error between the first similarity set and the second similarity set to obtain the target loss information;
[0191] Step 1104: Count the total number of images corresponding to the image to be evaluated and the already evaluated image set, calculate the ratio of target loss information to the total number of images, and determine the evaluation information corresponding to the image to be evaluated based on the ratio.
[0192] Specifically, the server uses the mean squared error loss function to calculate the error between the first similarity set and the second similarity set, obtaining the target loss information. This target loss information is used to characterize the similarity error between the first and second similarity sets. Then, the total number of images corresponding to the image to be evaluated and the already evaluated image sets is counted; that is, the total number of evaluated images in the already evaluated image set is added to the number of images to be evaluated to obtain the total number of images. The target loss information is then compared with the total number of images to obtain a ratio. The evaluation information corresponding to the image to be evaluated is then determined based on this ratio. This evaluation information is used to characterize the relative position change of the image to be evaluated in the feature space of the trained feature extraction model relative to the feature space of the target feature extraction model.
[0193] In a specific embodiment, the evaluation information corresponding to the image to be evaluated can be calculated using the formula (6) shown below.
[0194] Formula (6)
[0195] Where S represents the evaluation information corresponding to the image to be evaluated, i.e., the score, and N is the total number of images corresponding to the image to be evaluated and the evaluated image set. The mean squared error term is the target loss information, calculated between the first similarity set and the second similarity set.
[0196] Step 1106: When the evaluation information exceeds the preset evaluation threshold, the similarity evaluation pass result corresponding to the image to be evaluated is obtained.
[0197] The preset evaluation threshold refers to a preset threshold for passing the similarity evaluation. A pass result for the similarity evaluation means that the image features of the image to be evaluated can replace the evaluated features of the same image in the evaluated image set.
[0198] Specifically, the server determines whether the evaluation information exceeds the preset evaluation threshold. If it does, the server obtains a pass result for the similarity evaluation of the image to be evaluated. If the evaluation information does not exceed the preset evaluation threshold, the server obtains a fail result for the similarity evaluation of the image to be evaluated. In this case, the image to be evaluated can be sent to a manual evaluation terminal for manual evaluation.
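Steps 1102 to 1106 can be sketched as follows. This is a hedged sketch only: it computes the mean squared error and its ratio to the total number of images, but it does not assume how formula (6) turns that ratio into the score S, so the score is passed to the decision function by the caller. A single image to be evaluated is assumed, and all function names are hypothetical.

```python
def mean_squared_error(first, second):
    """Mean of squared element-wise differences between two similarity sets."""
    return sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)

def loss_ratio(first_similarities, second_similarities, num_evaluated_images):
    """Ratio of the target loss information to the total number of images."""
    # Total = evaluated images plus the one image to be evaluated (assumption).
    total_images = num_evaluated_images + 1
    target_loss = mean_squared_error(first_similarities, second_similarities)
    return target_loss / total_images

def similarity_evaluation(score, threshold):
    """Pass when the score exceeds the threshold, otherwise route to manual evaluation."""
    return "pass" if score > threshold else "manual_evaluation"

ratio = loss_ratio([1.0, 0.5, 0.0], [1.0, 0.0, 0.0], num_evaluated_images=3)
print(ratio)
print(similarity_evaluation(score=0.9, threshold=0.8))
```

The comparison is strict, matching the wording that a score which does not exceed the threshold yields a fail result.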
[0199] In a specific embodiment, as shown in Figure 12, a schematic diagram of an image processing framework is provided. Specifically:
[0200] The server obtains the image to be evaluated and the anchor image set, which serves as a reference for evaluating the image to be evaluated. The image to be evaluated is an image that has already been evaluated using the feature extraction teacher network. When the feature extraction student network is trained, this image needs to be re-evaluated. The server inputs the image to be evaluated and the anchor image set into the feature extraction teacher network for feature extraction, obtaining the teacher feature vector corresponding to the image to be evaluated and the teacher feature vector matrix corresponding to the anchor image set. The similarity between this feature vector and this feature vector matrix is calculated to obtain the teacher similarity vector. The image to be evaluated and the anchor image set are likewise input into the feature extraction student network for feature extraction, obtaining the student feature vector corresponding to the image to be evaluated and the student feature vector matrix corresponding to the anchor image set, and the similarity between them is calculated to obtain the student similarity vector. The evaluation score corresponding to the image to be evaluated is then calculated from the teacher similarity vector and the student similarity vector using formula (6). This evaluation score characterizes the change in the relative position of the image to be evaluated in the student network feature space relative to the teacher network feature space. When the evaluation score exceeds the preset evaluation threshold, the image to be evaluated passes the evaluation and can then be used to update the evaluated image database.
[0201] In a specific embodiment, as shown in Figure 13, an image classification method is provided, which specifically includes the following steps:
[0202] Step 1302: Obtain the current training image set, which is determined from a preset training image set;
[0203] Step 1304: Input each training image in the current training image set into the trained feature extraction model and the initial feature extraction model to be trained respectively to extract features, and obtain the trained features corresponding to each training image and the features to be trained corresponding to each training image.
[0204] Step 1306: Obtain the trained feature matrix based on the trained features corresponding to each training image, and normalize the trained feature matrix to obtain the trained normalized matrix; calculate the transpose matrix corresponding to the trained normalized matrix to obtain the trained transpose matrix; calculate the product of the trained transpose matrix and the trained normalized matrix to obtain the trained similarity set.
[0205] Step 1308: Obtain the to-be-trained feature matrix based on the features to be trained corresponding to each training image, and normalize the to-be-trained feature matrix to obtain the to-be-trained normalized matrix; calculate the transpose matrix corresponding to the to-be-trained normalized matrix to obtain the to-be-trained transpose matrix; calculate the product of the to-be-trained transpose matrix and the to-be-trained normalized matrix to obtain the similarity set to be trained.
[0206] Step 1310: Calculate the mean squared error between the similarity set to be trained and the similarity set already trained to obtain the initial loss information; obtain the number of training images corresponding to the current training image set, calculate the ratio of the initial loss information to the number of training images, and obtain the average loss information.
[0207] Step 1312: Obtain preset balance parameters, perform balance calculation on the average loss information based on the preset balance parameters to obtain balance loss information; obtain the classification loss information corresponding to the initial feature extraction model to be trained, and calculate the sum of the classification loss information and the balance loss information to obtain error information.
[0208] Step 1314: Based on the error information, update the model parameters in the initial feature extraction model in reverse to obtain the updated feature extraction model; use the updated feature extraction model as the initial feature extraction model, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met. Then, use the trained initial feature extraction model as the target feature extraction model. The target feature extraction model is used to extract the features corresponding to the input image, and perform image content classification based on the features corresponding to the input image.
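The error information of steps 1310 and 1312 can be sketched as follows. This sketch makes two assumptions that the steps above do not spell out: the "balance calculation" is taken to be multiplication by the preset balance parameter, and the similarity sets are given as N x N matrices. The function name and the example values are hypothetical, and the classification loss is taken as an already computed number.

```python
def distillation_error(trained_similarity, to_train_similarity,
                       num_training_images, balance, classification_loss):
    """Combine the similarity distillation loss with the classification loss."""
    n = len(trained_similarity)
    # Initial loss information: mean squared error over all matrix entries.
    initial_loss = sum(
        (trained_similarity[i][j] - to_train_similarity[i][j]) ** 2
        for i in range(n) for j in range(n)
    ) / (n * n)
    # Average loss information: ratio to the number of training images.
    average_loss = initial_loss / num_training_images
    # Balanced loss information (assumption: scale by the balance parameter).
    balanced_loss = balance * average_loss
    # Error information: sum of classification loss and balanced loss.
    return classification_loss + balanced_loss

trained = [[1.0, 0.0], [0.0, 1.0]]
to_train = [[1.0, 1.0], [1.0, 1.0]]
print(distillation_error(trained, to_train, num_training_images=2,
                         balance=4.0, classification_loss=0.3))
```

Under this reading, the balance parameter controls how strongly the student is pulled toward the teacher's similarity structure relative to the classification objective.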
[0209] This application also provides an application scenario to which the aforementioned feature extraction model training method and image processing method are applied. As shown in Figure 14, a framework diagram of the application scenario is provided. Specifically, in a content moderation scenario based on image retrieval, a sensitive content image database needs to be established. Representative images without serious false positives are added to the database and are then matched against the images to be reviewed to filter sensitive content. After the feature extraction teacher network model has been in use for a period of time, it needs to be updated. At this point, an initial feature extraction student network model can be established, and distillation learning is performed based on the feature extraction teacher network model and the initial feature extraction student network model. That is, the current training image set is obtained, the current training image set being determined from a preset training image set. Each training image in the current training image set is input into the feature extraction teacher network model and the initial feature extraction student network model respectively for feature extraction, resulting in the trained features and the features to be trained corresponding to each training image. The similarity between the trained features corresponding to each training image is calculated to obtain a trained similarity set, and the similarity between the features to be trained corresponding to each training image is calculated to obtain a similarity set to be trained. Error information between the similarity set to be trained and the trained similarity set is calculated, the initial feature extraction student network model is updated based on this error information, and the process returns to the step of obtaining the current training image set for iterative execution until the training completion condition is met. The trained initial feature extraction student network model is then used as the target feature extraction student network model. The target feature extraction student network model is then used to clean the seed images in the sensitive content image database, i.e., to re-evaluate them. Images with evaluation scores exceeding a preset threshold are saved to the updated sensitive image database. Images whose evaluation scores do not exceed the preset threshold are manually evaluated, and images that pass the manual evaluation are also saved to the updated sensitive image database, thus obtaining the updated sensitive image database.
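The database cleaning step can be sketched as follows. This is an illustrative sketch, assuming that the evaluation scores have already been computed for each seed image and that manual evaluation is represented by a caller-supplied function; `clean_image_database`, `manual_review` and the example image identifiers are hypothetical.

```python
def clean_image_database(scores, threshold, manual_review):
    """Return the identifiers of images kept in the updated database.

    scores: mapping from image identifier to evaluation score.
    manual_review: callable returning True when an image passes manual evaluation.
    """
    updated_database = []
    for image_id, score in scores.items():
        # Images above the threshold are kept directly; the rest are kept
        # only if they pass manual evaluation.
        if score > threshold or manual_review(image_id):
            updated_database.append(image_id)
    return updated_database

seed_scores = {"img_a": 0.9, "img_b": 0.4, "img_c": 0.2}
kept = clean_image_database(seed_scores, threshold=0.5,
                            manual_review=lambda image_id: image_id == "img_b")
print(kept)
```

Because of short-circuit evaluation, `manual_review` is only called for images whose score does not exceed the threshold, which mirrors the goal of limiting manual work to uncertain cases.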
[0210] In one specific embodiment, this application also provides an application scenario to which the above-described feature extraction model training method is applied. In a face recognition scenario, the server acquires a face image to be recognized and inputs it into a first target feature extraction model for feature extraction to obtain the features of the face image to be recognized. The first target feature extraction model is obtained as follows: a current training image set is acquired, the current training image set being determined from a preset training image set. Each training image in the current training image set is input into a trained feature extraction model and an initial feature extraction model to be trained respectively for feature extraction, resulting in the trained features and the features to be trained corresponding to each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model. The similarity between the trained features corresponding to each training image is calculated to obtain a trained similarity set, and the similarity between the features to be trained corresponding to each training image is calculated to obtain a similarity set to be trained. The error information between the similarity set to be trained and the trained similarity set is calculated, the initial feature extraction model to be trained is updated based on the error information, and the process returns to the step of obtaining the current training image set for iterative execution until the training completion condition is met. The trained initial feature extraction model is then used as the first target feature extraction model. The similarity between the features of the face image to be recognized and the existing face image features in the face database is then calculated. When an existing face image feature whose similarity exceeds a preset face similarity threshold is found, the face identity information corresponding to that existing face image feature is obtained, thereby obtaining the face identity information of the face image to be recognized.
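The face database lookup can be sketched as follows. This is a sketch under stated assumptions: cosine similarity is used, the face database is modeled as a mapping from identity to stored feature, and when several stored features exceed the threshold the most similar one is returned (the text above only requires that a feature exceeding the threshold be found). The names `identify_face` and `face_database` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify_face(face_feature, face_database, threshold):
    """Return the identity of the best match above the threshold, or None."""
    best_identity, best_similarity = None, threshold
    for identity, stored_feature in face_database.items():
        similarity = cosine_similarity(face_feature, stored_feature)
        # Keep only candidates strictly above the threshold / current best.
        if similarity > best_similarity:
            best_identity, best_similarity = identity, similarity
    return best_identity

face_database = {"person_1": [1.0, 0.0], "person_2": [0.0, 1.0]}
print(identify_face([0.9, 0.1], face_database, threshold=0.8))
print(identify_face([1.0, 1.0], face_database, threshold=0.8))
```

Returning `None` when no stored feature exceeds the threshold leaves it to the caller to decide how an unrecognized face is handled.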
[0211] It should be understood that, although the steps in the flowcharts of Figures 2 to 13 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise explicitly stated in this document, there is no strict restriction on the order in which these steps are executed, and they can be performed in other orders. Furthermore, at least some of the steps in Figures 2 to 13 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times. Nor are they necessarily executed sequentially; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
[0212] In one embodiment, as shown in Figure 15, an image classification device 1500 is provided. This device can be implemented as a software module, a hardware module, or a combination of both, forming part of a computer device. Specifically, the device includes: an image acquisition module 1502, a feature extraction module 1504, a similarity calculation module 1506, and an iterative update module 1508, wherein:
[0213] The image acquisition module 1502 is used to acquire the current training image set, which is determined from a preset training image set;
[0214] The feature extraction module 1504 is used to input each training image in the current training image set into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features corresponding to each training image and the features to be trained corresponding to each training image. The initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model.
[0215] The similarity calculation module 1506 is used to calculate the similarity between the trained features corresponding to each training image to obtain a set of trained similarity, and to calculate the similarity between the features to be trained corresponding to each training image to obtain a set of to be trained similarity.
[0216] The iterative update module 1508 is used to calculate the error information between the similarity set to be trained and the similarity set already trained, update the initial feature extraction model to be trained based on the error information, and return to iteratively execute the step of obtaining the current training image set until the training completion condition is met. Then, the initial feature extraction model that has been trained is used as the first target feature extraction model, which is used to extract the features corresponding to the input image.
[0217] In one embodiment, the similarity calculation module 1506 is further configured to obtain a trained feature matrix based on the trained features corresponding to each training image, and normalize the trained feature matrix to obtain a trained normalized matrix; calculate the transpose matrix corresponding to the trained normalized matrix to obtain a trained transpose matrix; and calculate the product of the trained transpose matrix and the trained normalized matrix to obtain a trained similarity set.
[0218] In one embodiment, the similarity calculation module 1506 is further configured to obtain a to-be-trained feature matrix based on the features to be trained corresponding to each training image, and normalize the to-be-trained feature matrix to obtain a to-be-trained normalized matrix; calculate the transpose matrix corresponding to the to-be-trained normalized matrix to obtain a to-be-trained transpose matrix; and calculate the product of the to-be-trained transpose matrix and the to-be-trained normalized matrix to obtain the similarity set to be trained.
[0219] In one embodiment, the iterative update module 1508 is further configured to calculate the mean squared error between the similarity set to be trained and the similarity set already trained, obtain initial loss information, use the initial loss information as error information; update the model parameters in the initial feature extraction model in reverse based on the error information, obtain the updated feature extraction model; use the updated feature extraction model as the initial feature extraction model, and return to iteratively execute the step of obtaining the current training image set.
[0220] In one embodiment, the iterative update module 1508 is further configured to calculate the mean squared error between the similarity set to be trained and the similarity set already trained to obtain initial loss information; obtain the number of training images corresponding to the current training image set, calculate the ratio of the initial loss information to the number of training images to obtain average loss information; obtain a preset balance parameter, perform a balance calculation on the average loss information based on the preset balance parameter to obtain balanced loss information; obtain the classification loss information corresponding to the initial feature extraction model to be trained, and calculate the sum of the classification loss information and the balanced loss information to obtain error information.
[0221] In one embodiment, the iterative update module 1508 is further configured to input the similarity set to be trained into the initial mapping network for feature mapping to obtain the target mapping feature set; calculate the mean square error between the target mapping feature set and the trained similarity set to obtain target error information; update the initial mapping network and the initial feature extraction model in reverse based on the target error information to obtain the updated mapping network and the updated feature extraction model; use the updated mapping network as the initial mapping network and the updated feature extraction model as the initial feature extraction model, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and obtain the second target feature extraction model based on the trained initial feature extraction model and the trained initial mapping network.
[0222] In one embodiment, the initial feature extraction model is an initial generated model; the iterative update module 1508 is further configured to input the similarity set to be trained and the similarity set already trained into the initial discriminant network for discrimination, and obtain the similarity discrimination result; update the initial discriminant network and the initial feature extraction model based on the similarity discrimination result, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and use the trained initial feature extraction model as the third target feature extraction model.
[0223] In one embodiment, the current training image set includes image triples, and the image triples include positive and negative image pairs; the image classification device 1500 further includes:
[0224] The contrastive learning module is used to input each image triplet into the trained feature extraction model and the initial feature extraction model to be trained, respectively, to extract features and obtain the trained triplet features and the initial triplet features corresponding to each image triplet. Based on the trained triplet features and the initial triplet features, the triplet loss is calculated to obtain the initial triplet loss information. Based on the initial triplet loss information, the initial feature extraction model is updated in reverse, and the step of obtaining the current training image set is iteratively executed until the training completion condition is met. The trained initial feature extraction model is then used as the fourth target feature extraction model.
[0225] In one embodiment, the image classification device 1500 further includes:
[0226] The model usage module is used to acquire the image to be reviewed, input the image to be reviewed into the first target feature extraction model for feature extraction, and obtain the features to be reviewed; acquire the reviewed features corresponding to the reviewed image library, and calculate the similarity between the features to be reviewed and the reviewed features; and determine the review result corresponding to the image to be reviewed based on the similarity.
[0227] In one embodiment, as shown in Figure 16, an image processing apparatus 1600 is provided. This apparatus can be a software module, a hardware module, or a combination of both, integrated into a computer device. Specifically, the apparatus includes: an evaluation image acquisition module 1602, a first extraction module 1604, a second extraction module 1606, and an evaluation module 1608, wherein:
[0228] The evaluation image acquisition module 1602 is used to acquire the image to be evaluated and the set of evaluated images;
[0229] The first extraction module 1604 is used to input the image to be evaluated and the set of evaluated images into a trained feature extraction model for feature extraction, to obtain the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images, and to calculate the similarity between the features to be evaluated and the set of evaluated features to obtain a first similarity set.
[0230] The second extraction module 1606 is used to input the image to be evaluated and the set of evaluated images into the target feature extraction model for feature extraction, to obtain the target features to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images, and to calculate the similarity between the target features to be evaluated and the set of evaluated target features to obtain a second similarity set. The target feature extraction model is obtained by knowledge distillation training through the trained feature extraction model.
[0231] The evaluation module 1608 is used to perform evaluation calculations based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated, and to determine the similarity evaluation result corresponding to the image to be evaluated based on the evaluation information corresponding to the image to be evaluated.
[0232] In one embodiment, the first extraction module 1604 is further configured to normalize the features to be evaluated to obtain normalized features to be evaluated, and normalize the evaluated feature set to obtain a normalized evaluated feature set; transpose the normalized evaluated feature set to obtain an evaluated transpose matrix; and calculate the product of the normalized features to be evaluated and the evaluated transpose matrix to obtain the first similarity set.
[0233] In one embodiment, the second extraction module 1606 is further configured to normalize the target features to be evaluated to obtain normalized target features to be evaluated, and normalize the set of evaluated target features to obtain a normalized set of evaluated target features; transpose the normalized set of evaluated target features to obtain a transposed matrix of evaluated targets; and calculate the product of the normalized target features to be evaluated and the transposed matrix of evaluated targets to obtain the second similarity set.
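The normalize, transpose and multiply steps described for the two extraction modules can be sketched as below. L2 normalization is an assumption (the text says only "normalize"), and the function names are hypothetical. With L2 normalization, each entry of the product is the cosine similarity between the image to be evaluated and one evaluated image.

```python
import math


def l2_normalize_rows(matrix):
    """L2-normalize each row; all-zero rows are left unchanged."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of a (m x d) and b (d x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def similarity_set(feature_to_evaluate, evaluated_features):
    """Similarity between one feature (length d) and n evaluated features (n x d)."""
    q = l2_normalize_rows([feature_to_evaluate])  # 1 x d
    e = l2_normalize_rows(evaluated_features)     # n x d
    e_t = transpose(e)                            # d x n
    return matmul(q, e_t)[0]                      # n similarities
```

The same function would serve both modules: once with features from the trained feature extraction model (first similarity set) and once with features from the target feature extraction model (second similarity set).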
[0234] In one embodiment, the evaluation module 1608 is further configured to calculate the mean square error between the first similarity set and the second similarity set to obtain target loss information; count the total number of images corresponding to the image to be evaluated and the set of evaluated images, calculate the ratio of target loss information to the total number of images, determine the evaluation information corresponding to the image to be evaluated based on the ratio; and obtain the similarity evaluation pass result corresponding to the image to be evaluated when the evaluation information exceeds a preset evaluation threshold.
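A minimal sketch of this evaluation calculation follows. The text does not say how the evaluation information is derived from the ratio; the sketch assumes `1 - ratio`, so that a larger value means closer agreement between the two models, which is consistent with passing when the threshold is exceeded. That mapping, the `threshold` value and the function names are assumptions for illustration.

```python
def mean_squared_error(first, second):
    """Mean squared error between two equally sized similarity sets."""
    if len(first) != len(second) or not first:
        raise ValueError("similarity sets must be non-empty and equally sized")
    return sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)


def evaluate_similarity(first_set, second_set, threshold=0.9):
    """Return (evaluation_info, passed) for one image to be evaluated."""
    target_loss = mean_squared_error(first_set, second_set)
    # Total image count: the evaluated images plus the image to be evaluated.
    total_images = len(first_set) + 1
    ratio = target_loss / total_images
    # Assumption: map the ratio to a score where higher means more consistent.
    evaluation_info = 1.0 - ratio
    return evaluation_info, evaluation_info > threshold
```

Identical similarity sets give a ratio of zero and therefore the highest score under this assumed mapping.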
[0235] Specific limitations regarding the feature extraction model training device and image processing device can be found in the limitations regarding the feature extraction model training method and image processing method above, and will not be repeated here. Each module in the aforementioned feature extraction model training device and image processing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0236] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 17. The computer device includes a processor, memory, and a network interface connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores training image data or evaluated image data. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements a feature extraction model training method or an image processing method.
[0237] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 18. The computer device includes a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a feature extraction model training method or an image processing method. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad mounted on the computer device casing, or an external keyboard, touchpad, or mouse.
[0238] Those skilled in the art will understand that the structures shown in Figure 17 and Figure 18 are merely block diagrams of the portions of the structure related to the present application and do not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figures, or combine certain components, or have different component arrangements.
[0239] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0240] In one embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0241] In one embodiment, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, causing the computer device to perform the steps in the above method embodiments.
[0242] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), etc.
[0243] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0244] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. An image classification method, characterized in that the method includes: obtaining a current training image set, which is determined from a preset training image set; inputting each training image in the current training image set into a trained feature extraction model and an initial feature extraction model to be trained for feature extraction, thereby obtaining the trained features corresponding to each training image and the features to be trained corresponding to each training image, wherein the initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model; calculating the similarity between the trained features corresponding to each training image to obtain a trained similarity set, and calculating the similarity between the features to be trained corresponding to each training image to obtain a similarity set to be trained, wherein the trained similarity set is used to characterize the feature space corresponding to the current training image set obtained by the trained feature extraction model, and the similarity set to be trained is used to characterize the feature space corresponding to the current training image set obtained by the initial feature extraction model to be trained; and calculating the error information between the similarity set to be trained and the trained similarity set, updating the initial feature extraction model to be trained based on the error information, and returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and then using the trained initial feature extraction model as the first target feature extraction model; wherein the first target feature extraction model is used to extract features corresponding to an input image and to perform image content classification based on the features corresponding to the input image; the first target feature extraction model is obtained by mining the image feature space corresponding to the trained feature extraction model and transferring the mined image feature space to the initial feature extraction model; and calculating the error information includes calculating the error between each similarity to be trained in the similarity set to be trained and the corresponding trained similarity in the trained similarity set, and calculating the sum of all errors to obtain the error information.
2. The method according to claim 1, characterized in that, The step of calculating the similarity between the trained features corresponding to each training image to obtain a trained similarity set includes: Based on the trained features corresponding to each training image, a trained feature matrix is obtained, and the trained feature matrix is normalized to obtain a trained normalized matrix. Calculate the transpose matrix corresponding to the trained normalized matrix to obtain the trained transpose matrix; The product of the trained transpose matrix and the trained normalized matrix is calculated to obtain the trained similarity set.
3. The method according to claim 1, characterized in that the step of calculating the similarity between the features to be trained corresponding to each training image to obtain a similarity set to be trained includes: obtaining a feature matrix to be trained based on the features to be trained corresponding to each training image, and normalizing the feature matrix to be trained to obtain a normalized matrix to be trained; calculating the transpose matrix corresponding to the normalized matrix to be trained to obtain a transpose matrix to be trained; and calculating the product of the transpose matrix to be trained and the normalized matrix to be trained to obtain the similarity set to be trained.
4. The method according to claim 1, characterized in that, The step of calculating the error information between the similarity set to be trained and the similarity set already trained, updating the initial feature extraction model to be trained based on the error information, and returning to obtain the current training image set is executed iteratively, including: Calculate the mean squared error between the set of similarity to be trained and the set of similarity already trained to obtain initial loss information, and use the initial loss information as the error information; Based on the error information, the model parameters in the initial feature extraction model are updated in reverse to obtain the updated feature extraction model; The updated feature extraction model is used as the initial feature extraction model, and the steps to obtain the current training image set are iteratively executed.
5. The method according to claim 4, characterized in that, The step of calculating the mean square error between the similarity set to be trained and the similarity set already trained to obtain the error information includes: Calculate the mean squared error between the set of similarities to be trained and the set of similarities already trained to obtain the initial loss information; Obtain the number of training images corresponding to the current training image set, calculate the ratio of the initial loss information to the number of training images, and obtain the average loss information; Obtain preset balance parameters, and perform balance calculation on the average loss information based on the preset balance parameters to obtain balanced loss information; Obtain the classification loss information corresponding to the initial feature extraction model to be trained, and calculate the sum of the classification loss information and the balance loss information to obtain the error information.
6. The method according to claim 1, characterized in that, The step of calculating the error information between the similarity set to be trained and the similarity set already trained, updating the initial feature extraction model to be trained based on the error information, and returning to the step of obtaining the current training image set is iteratively executed until the training completion condition is met, and the trained initial feature extraction model is used as the first target feature extraction model, includes: The set of similarities to be trained is input into the initial mapping network for feature mapping to obtain the target mapping feature set; Calculate the mean squared error between the target mapping feature set and the trained similarity set to obtain target error information. Based on the target error information, update the initial mapping network and the initial feature extraction model in reverse to obtain the updated mapping network and the updated feature extraction model. The updated mapping network is used as the initial mapping network, and the updated feature extraction model is used as the initial feature extraction model. The step of obtaining the current training image set is returned and iteratively executed until the training completion condition is met. Then, the second target feature extraction model is obtained based on the trained initial feature extraction model and the trained initial mapping network.
7. The method according to claim 1, characterized in that the initial feature extraction model is an initial generation model; and the step of calculating the error information between the similarity set to be trained and the trained similarity set, updating the initial feature extraction model to be trained based on the error information, and returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and using the trained initial feature extraction model as the first target feature extraction model, includes: inputting the similarity set to be trained and the trained similarity set into an initial discrimination network for discrimination to obtain a similarity discrimination result; and updating the initial discrimination network and the initial feature extraction model based on the similarity discrimination result, and returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and then using the trained initial feature extraction model as the third target feature extraction model.
8. The method according to claim 1, characterized in that the current training image set includes image triplets, and each image triplet includes a positive image pair and a negative image pair; the method further includes: inputting each image triplet into the trained feature extraction model and the initial feature extraction model to be trained, respectively, for feature extraction, thereby obtaining the trained triplet features and the triplet features to be trained corresponding to each image triplet; calculating the triplet loss based on the trained triplet features and the triplet features to be trained to obtain initial triplet loss information; and updating the initial feature extraction model in reverse based on the initial triplet loss information, and returning to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and then using the trained initial feature extraction model as the fourth target feature extraction model.
9. The method according to claim 1, characterized in that, After iteratively executing the steps of calculating the error information between the similarity set to be trained and the similarity set already trained, updating the initial feature extraction model to be trained based on the error information, and returning to obtain the current training image set, until the training completion condition is met, and then using the trained initial feature extraction model as the first target feature extraction model, the method further includes: Obtain the image to be reviewed, and input the image to be reviewed into the first target feature extraction model for feature extraction to obtain the features to be reviewed; Obtain the reviewed features corresponding to the reviewed image library, and calculate the similarity between the feature to be reviewed and the reviewed features; The review result corresponding to the image to be reviewed is determined based on the degree of similarity.
10. An image processing method, characterized in that, The method includes: Obtain the image to be evaluated and the set of already evaluated images; The image to be evaluated and the set of evaluated images are input into a trained feature extraction model for feature extraction, to obtain the features to be evaluated corresponding to the image to be evaluated and the set of evaluated features corresponding to the set of evaluated images, and the similarity between the features to be evaluated and the set of evaluated features is calculated to obtain a first similarity set; The image to be evaluated and the set of evaluated images are input into the target feature extraction model for feature extraction, thereby obtaining the target feature to be evaluated corresponding to the image to be evaluated and the set of evaluated target features corresponding to the set of evaluated images. The similarity between the target feature to be evaluated and the set of evaluated target features is calculated to obtain a second similarity set. The target feature extraction model is obtained based on any one of the method claims in claims 1-9. Evaluation calculations are performed based on the first similarity set and the second similarity set to obtain evaluation information corresponding to the image to be evaluated, and the similarity evaluation result corresponding to the image to be evaluated is determined based on the evaluation information corresponding to the image to be evaluated.
11. The method according to claim 10, characterized in that, The step of calculating the similarity between the feature to be evaluated and the set of evaluated features to obtain a first similarity set includes: The features to be evaluated are normalized to obtain normalized features to be evaluated, and the set of evaluated features is normalized to obtain a normalized set of evaluated features. The normalized evaluated feature set is transposed to obtain the evaluated transpose matrix. The product of the normalized feature to be evaluated and the evaluated transpose matrix is calculated to obtain the first similarity set.
12. The method according to claim 10, characterized in that, The calculation of the similarity between the target feature to be evaluated and the set of already evaluated target features yields a second similarity set, including: The features of the target to be evaluated are normalized to obtain normalized features of the target to be evaluated, and the set of features of the evaluated targets is normalized to obtain a set of normalized features of the evaluated targets. The normalized set of evaluated target features is transposed to obtain the transposed matrix of evaluated targets. The product of the normalized target features to be evaluated and the transposed matrix of evaluated targets is calculated to obtain the second similarity set.
13. The method according to claim 10, characterized in that, The evaluation calculation based on the first similarity set and the second similarity set to obtain the evaluation information corresponding to the image to be evaluated, and the determination of the similarity evaluation result corresponding to the image to be evaluated based on the evaluation information, includes: Calculate the mean squared error between the first similarity set and the second similarity set to obtain the target loss information; The total number of images corresponding to the image to be evaluated and the set of evaluated images is counted, the ratio of the target loss information to the total number of images is calculated, and the evaluation information corresponding to the image to be evaluated is determined based on the ratio. When the evaluation information exceeds the preset evaluation threshold, the similarity evaluation result corresponding to the image to be evaluated is obtained.
14. An image classification device, characterized in that the device includes: an image acquisition module, used to acquire the current training image set, which is determined from a preset training image set; a feature extraction module, used to input each training image in the current training image set into the trained feature extraction model and the initial feature extraction model to be trained for feature extraction, so as to obtain the trained features corresponding to each training image and the features to be trained corresponding to each training image, wherein the initial feature extraction model is obtained by initializing the parameters of the trained feature extraction model; a similarity calculation module, used to calculate the similarity between the trained features corresponding to each training image to obtain a trained similarity set, and to calculate the similarity between the features to be trained corresponding to each training image to obtain a similarity set to be trained, wherein the trained similarity set is used to characterize the feature space corresponding to the current training image set obtained by the trained feature extraction model, and the similarity set to be trained is used to characterize the feature space corresponding to the current training image set obtained by the initial feature extraction model to be trained; and an iterative update module, used to calculate the error information between the similarity set to be trained and the trained similarity set, update the initial feature extraction model to be trained based on the error information, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and then use the trained initial feature extraction model as the first target feature extraction model; wherein the first target feature extraction model is used to extract features corresponding to an input image and to perform image content classification based on the features corresponding to the input image; the first target feature extraction model is obtained by mining the image feature space corresponding to the trained feature extraction model and transferring the mined image feature space to the initial feature extraction model; and calculating the error information includes calculating the error between each similarity to be trained in the similarity set to be trained and the corresponding trained similarity in the trained similarity set, and calculating the sum of all errors to obtain the error information.
15. The apparatus according to claim 14, characterized in that, The similarity calculation module is also used to obtain a trained feature matrix based on the trained features corresponding to each training image, and to normalize the trained feature matrix to obtain a trained normalized matrix. Calculate the transpose matrix corresponding to the trained normalized matrix to obtain the trained transpose matrix; The product of the trained transpose matrix and the trained normalized matrix is calculated to obtain the trained similarity set.
16. The apparatus according to claim 14, characterized in that the similarity calculation module is also used to obtain a feature matrix to be trained based on the features to be trained corresponding to each training image, and to normalize the feature matrix to be trained to obtain a normalized matrix to be trained; calculate the transpose matrix corresponding to the normalized matrix to be trained to obtain a transpose matrix to be trained; and calculate the product of the transpose matrix to be trained and the normalized matrix to be trained to obtain the similarity set to be trained.
17. The apparatus according to claim 14, characterized in that, The iterative update module is further used to calculate the mean square error between the similarity set to be trained and the similarity set already trained, to obtain initial loss information, and to use the initial loss information as the error information; Based on the error information, the model parameters in the initial feature extraction model are updated in reverse to obtain the updated feature extraction model; the updated feature extraction model is used as the initial feature extraction model, and the step of obtaining the current training image set is iteratively executed.
18. The apparatus according to claim 17, characterized in that, The iterative update module is also used to calculate the mean squared error between the similarity set to be trained and the similarity set already trained to obtain initial loss information; obtain the number of training images corresponding to the current training image set, calculate the ratio of the initial loss information to the number of training images, and obtain average loss information; Obtain preset balance parameters, perform balance calculation on the average loss information based on the preset balance parameters to obtain balance loss information; obtain classification loss information corresponding to the initial feature extraction model to be trained, and calculate the sum of the classification loss information and the balance loss information to obtain the error information.
19. The apparatus according to claim 14, characterized in that, The iterative update module is also used to input the similarity set to be trained into the initial mapping network for feature mapping to obtain the target mapping feature set; The mean squared error between the target mapping feature set and the trained similarity set is calculated to obtain target error information. Based on the target error information, the initial mapping network and the initial feature extraction model are updated in reverse to obtain the updated mapping network and the updated feature extraction model. The updated mapping network is used as the initial mapping network, and the updated feature extraction model is used as the initial feature extraction model. The step of obtaining the current training image set is returned and iteratively executed until the training completion condition is met. Based on the trained initial feature extraction model and the trained initial mapping network, a second target feature extraction model is obtained.
20. The apparatus according to claim 14, characterized in that the initial feature extraction model is an initial generation model; the iterative update module is also used to input the similarity set to be trained and the trained similarity set into the initial discrimination network for discrimination to obtain the similarity discrimination result; and to update the initial discrimination network and the initial feature extraction model based on the similarity discrimination result, and return to the step of obtaining the current training image set for iterative execution until the training completion condition is met, and then use the trained initial feature extraction model as the third target feature extraction model.
21. The apparatus according to claim 14, characterized in that, The current training image set includes image triplets, and each image triplet includes a pair of positive and negative images; the device further includes: The contrastive learning module is used to input each image triplet into the trained feature extraction model and the initial feature extraction model to be trained, respectively, to extract features and obtain the trained triplet features and the initial triplet features corresponding to each image triplet. Based on the trained triplet features and the initial triplet features, the module calculates the triplet loss to obtain the initial triplet loss information. Based on the initial triplet loss information, the module updates the initial feature extraction model in reverse and returns to the step of obtaining the current training image set for iterative execution until the training completion condition is met. Then, the trained initial feature extraction model is used as the fourth target feature extraction model.
22. The apparatus according to claim 14, characterized in that the apparatus further includes a model use module configured to: acquire an image to be reviewed, and input the image to be reviewed into the first target feature extraction model for feature extraction to obtain features to be reviewed; acquire the reviewed features corresponding to a reviewed image library, and calculate the similarity between the features to be reviewed and the reviewed features; and determine the review result corresponding to the image to be reviewed based on the similarity.
23. An image processing apparatus, characterized in that the apparatus includes: an evaluation image acquisition module, configured to acquire an image to be evaluated and a set of evaluated images; a first extraction module, configured to input the image to be evaluated and the set of evaluated images into a trained feature extraction model for feature extraction to obtain the feature to be evaluated corresponding to the image to be evaluated and the evaluated feature set corresponding to the set of evaluated images, and to calculate the similarity between the feature to be evaluated and the evaluated feature set to obtain a first similarity set; a second extraction module, configured to input the image to be evaluated and the set of evaluated images into a target feature extraction model for feature extraction to obtain the target feature to be evaluated corresponding to the image to be evaluated and the evaluated target feature set corresponding to the set of evaluated images, and to calculate the similarity between the target feature to be evaluated and the evaluated target feature set to obtain a second similarity set, wherein the target feature extraction model is obtained by the method according to any one of claims 1 to 9; and an evaluation module, configured to perform an evaluation calculation based on the first similarity set and the second similarity set to obtain evaluation information corresponding to the image to be evaluated, and to determine the similarity evaluation result corresponding to the image to be evaluated based on the evaluation information.
24. The apparatus according to claim 23, characterized in that the first extraction module is further configured to: normalize the feature to be evaluated to obtain a normalized feature to be evaluated, and normalize the evaluated feature set to obtain a normalized evaluated feature set; transpose the normalized evaluated feature set to obtain an evaluated transpose matrix; and calculate the product of the normalized feature to be evaluated and the evaluated transpose matrix to obtain the first similarity set.
25. The apparatus according to claim 23, characterized in that the second extraction module is further configured to: normalize the target feature to be evaluated to obtain a normalized target feature to be evaluated, and normalize the evaluated target feature set to obtain a normalized evaluated target feature set; transpose the normalized evaluated target feature set to obtain an evaluated target transpose matrix; and calculate the product of the normalized target feature to be evaluated and the evaluated target transpose matrix to obtain the second similarity set.
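The normalize, transpose, and multiply steps of claims 24 and 25 can be sketched as below. With L2 normalization, which is an assumption here because the claims do not name the norm, the product is the cosine similarity between the image to be evaluated and each evaluated image. The function name `similarity_set` and the `eps` guard against zero-length vectors are likewise illustrative.

```python
import numpy as np

def similarity_set(query, gallery, eps=1e-12):
    """Similarity of one feature vector against a set of feature vectors.

    query:   feature to be evaluated, shape (dim,)
    gallery: evaluated feature set, shape (n, dim)
    Returns an array of n similarities in [-1, 1].
    """
    q = query / max(np.linalg.norm(query), eps)                # normalized query
    norms = np.linalg.norm(gallery, axis=1, keepdims=True)
    g = gallery / np.maximum(norms, eps)                       # normalized set
    return q @ g.T                                             # product with the transpose
```

Running the same function on the outputs of the trained model and of the target model would give the first and second similarity sets, respectively.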
26. The apparatus according to claim 23, characterized in that the evaluation module is further configured to: calculate the mean squared error between the first similarity set and the second similarity set to obtain target loss information; count the total number of images corresponding to the image to be evaluated and the set of evaluated images, calculate the ratio of the target loss information to the total number of images, and determine the evaluation information corresponding to the image to be evaluated based on the ratio; and, when the evaluation information exceeds a preset evaluation threshold, obtain a similarity evaluation pass result corresponding to the image to be evaluated.
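A minimal sketch of the evaluation in claim 26 follows. The claim states only that the evaluation information is determined "based on the ratio" and that the image passes when that information exceeds a threshold. The mapping `score = 1 - ratio`, the default `threshold`, and the function name `evaluate` are assumptions introduced here so that a smaller loss yields a higher score; the claim does not confirm any of them.

```python
import numpy as np

def evaluate(first_sims, second_sims, threshold=0.99):
    """Compare two similarity sets and decide pass/fail (illustrative)."""
    first = np.asarray(first_sims, dtype=float)
    second = np.asarray(second_sims, dtype=float)
    loss = float(np.mean((first - second) ** 2))   # target loss information (MSE)
    total_images = first.size + 1                  # evaluated images plus the image to be evaluated
    ratio = loss / total_images
    score = 1.0 - ratio                            # ASSUMPTION: mapping from ratio to evaluation information
    return score, score > threshold
```

For example, identical similarity sets give a loss of zero and therefore the maximum score under this assumed mapping.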
27. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 13.
28. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 13.
29. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 13.