Age recognition model training method, recognition method, device, apparatus, and medium

By combining labeled and unlabeled image data and employing a self-supervised learning method, the model parameters are updated using pseudo-labels from unlabeled data. This solves the problem of poor accuracy in age classification models caused by the scarcity of labeled data and achieves higher accuracy in age classification.

CN119888808B · Active · Publication Date: 2026-06-19 · SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD
Filing Date
2024-12-12
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Due to the scarcity of labeled data, existing face age classification models perform poorly on face images of different age groups and cannot effectively utilize rich and diverse training data to capture features of different age groups.

Method used

By combining labeled and unlabeled image data, a self-supervised learning method is adopted, and the model parameters are updated using pseudo-labels from unlabeled data. Specifically, the model is trained with labeled images for N iterations and with unlabeled images for M iterations. This cycle repeats, using saliency-based augmentation and pseudo-label construction, until the training stopping condition is met.

Benefits of technology

It significantly improves the accuracy of the age classification model, achieving higher age classification precision by effectively utilizing a combination of unlabeled and labeled data.

✦ Generated by Eureka AI based on patent content.

Figure CN119888808B_ABST

Abstract

This application relates to the field of image recognition technology, and in particular to an age recognition model training method, recognition method, device, equipment, and medium. The method constructs a first age classification model and a second age classification model. The second age classification model has the same architecture as the first age classification model, and the parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model. The second age classification model is used to construct pseudo-labels for unlabeled data. The age classification model is trained using both unlabeled and labeled data, and the model parameters are updated using age labels and pseudo-labels identified by the model for unlabeled data, respectively. This effectively improves the model's age classification accuracy through self-supervised learning by accurately constructing pseudo-labels.

Description

Technical Field

[0001] This application relates to the field of image recognition technology, and in particular to an age recognition model training method, recognition method, device, equipment and medium.

Background Technology

[0002] Currently, in research on face age classification, the scarcity of labeled data has a significant negative impact on the generalization ability of models. Although there are a large number of publicly available face recognition datasets in related fields, these datasets cannot be directly used for age classification tasks due to the lack of age annotations. Since age classification models typically rely on rich and diverse training data to capture features of different age groups, insufficient data leads to poor model performance on face images of different age groups.

[0003] Therefore, how to combine labeled and unlabeled image data to train an age classification model and improve its accuracy has become an urgent problem.

Summary of the Invention

[0004] In view of this, embodiments of this application provide an age recognition model training method, recognition method, apparatus, device, and medium. They address how to combine labeled and unlabeled image data to train an age classification model so as to improve the accuracy of age classification.

[0005] In a first aspect, embodiments of this application provide a method for training an age recognition model, including:

[0006] Obtain labeled and unlabeled datasets, wherein the labeled dataset includes at least one labeled image and a corresponding age label, and the unlabeled dataset includes at least one unlabeled image;

[0007] In N iterations, the labeled image is input into the first age classification model to obtain the first age classification result. Based on the first age classification result and the age label, the parameters of the first age classification model are updated to obtain the updated first age classification model. N is an integer greater than zero.

[0008] In M iterations, the unlabeled image is input into the updated first age classification model to obtain the second age classification result, and the unlabeled image is input into the updated second age classification model to obtain the third age classification result. The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero.

[0009] Based on the second age classification result and the third age classification result, the updated first age classification model is updated to obtain a further updated first age classification model. This further updated model is then used as the first age classification model. The method returns to the step of inputting the labeled image into the first age classification model to obtain the first age classification result. It repeats until the training stopping condition is met, yielding the trained first age classification model.

[0010] In a second aspect, embodiments of this application provide a recognition method for facial age recognition, comprising:

[0011] Acquire the image to be recognized;

[0012] Classify the image to be recognized using the first age classification model trained by the age recognition model training method described in the first aspect, to obtain the age classification result corresponding to the image to be recognized.

[0013] In a third aspect, embodiments of this application provide an age recognition model training device, comprising:

[0014] The data acquisition module is used to acquire labeled datasets and unlabeled datasets. The labeled datasets include at least one labeled image and a corresponding age label, and the unlabeled datasets include at least one unlabeled image.

[0015] The labeled image training module is used to input the labeled image into the first age classification model in N iterations to obtain the first age classification result, and update the parameters of the first age classification model according to the first age classification result and the age label to obtain the updated first age classification model, where N is an integer greater than zero;

[0016] The unlabeled image training module is used to input the unlabeled image into the updated first age classification model in M iterations to obtain a second age classification result. It also inputs the unlabeled image into the updated second age classification model to obtain a third age classification result. The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero.

[0017] The iterative training module is used to update the updated first age classification model based on the second age classification result and the third age classification result, to obtain a further updated first age classification model, and to use the further updated first age classification model as the first age classification model. Then, it returns to the step of inputting the labeled image into the first age classification model to obtain the first age classification result, until the training stopping condition is met, and a trained first age classification model is obtained.

[0018] In a fourth aspect, embodiments of this application provide a recognition device for facial age recognition, comprising:

[0019] The image acquisition module is used to acquire the image to be recognized;

[0020] The age classification and recognition module classifies the image to be recognized using the trained first age classification model obtained by the age recognition model training method described in the first aspect. This yields the age classification result corresponding to the image to be recognized.

[0021] In a fifth aspect, embodiments of this application provide a computer device. The computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements either the age recognition model training method described in the first aspect or the recognition method for facial age recognition described in the second aspect.

[0022] In a sixth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program. When executed by a processor, the program implements either the age recognition model training method described in the first aspect or the recognition method for facial age recognition described in the second aspect.

[0023] The beneficial effects of the embodiments in this application compared with the prior art are:

[0024] The method of this application involves inputting labeled images into a first age classification model in N iterations to obtain a first age classification result. Based on the first age classification result and age labels, the parameters of the first age classification model are updated to obtain an updated first age classification model. In M iterations, unlabeled images are input into the updated first age classification model to obtain a second age classification result. Unlabeled images are input into the updated second age classification model to obtain a third age classification result. Based on the second and third age classification results, the updated first age classification model is updated again to obtain a further updated first age classification model. This further updated first age classification model is used as the first age classification model and iterated until the training stopping condition is met to obtain a trained first age classification model. This trained first age classification model is used to classify the image to be recognized to obtain the age classification result of the corresponding image.

[0025] The model is trained using both unlabeled and labeled data. The model parameters are then updated using age labels and pseudo-labels identified by the model from unlabeled data. This approach effectively improves the accuracy of age classification by enabling self-supervised learning through accurate pseudo-label construction.

Attached Figure Description

[0026] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0027] Figure 1 is a schematic diagram of an application environment for the age recognition model training method or the recognition method for face age recognition provided in Embodiment 1 of this application;

[0028] Figure 2 is a flowchart illustrating an age recognition model training method provided in Embodiment 2 of this application;

[0029] Figure 3 is a flowchart illustrating an age recognition model training method provided in Embodiment 3 of this application;

[0030] Figure 4 is a flowchart illustrating an age recognition model training method provided in Embodiment 4 of this application;

[0031] Figure 5 is a flowchart illustrating an age recognition model training method provided in Embodiment 5 of this application;

[0032] Figure 6 is a flowchart illustrating a facial age recognition method provided in Embodiment 6 of this application;

[0033] Figure 7 is a schematic structural diagram of an age recognition model training device provided in Embodiment 7 of this application;

[0034] Figure 8 is a schematic structural diagram of a facial age recognition device provided in Embodiment 8 of this application;

[0035] Figure 9 is a schematic structural diagram of a computer device provided in Embodiment 9 of this application.

Detailed Implementation

[0036] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.

[0037] It should be understood that, when used in this application specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

[0038] It should also be understood that the term "and/or" as used in this application specification and the appended claims refers to any one of the associated listed items and to all possible combinations of one or more of them, and includes such combinations.

[0039] As used in this application specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if detected [the described condition or event]" may be interpreted, depending on the context, as meaning "once determined," "in response to determination," "once detected [the described condition or event]," or "in response to detection [the described condition or event]."

[0040] Furthermore, in the description of this application and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0041] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.

[0042] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0043] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.

[0044] It should be understood that the sequence number of each step in the following embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0045] To illustrate the technical solution of this application, specific embodiments are described below.

[0046] The age recognition model training method or the recognition method for face age recognition provided in Embodiment 1 of this application can be applied, for example, in the application environment shown in Figure 1. In that environment, the client communicates with the server. The client can send images to be classified to the server, and the server then performs classification and recognition.

[0047] Both the age recognition model training method and the recognition method for face age recognition can be applied to the server. The server is used to deploy and implement the models corresponding to the age recognition model training method and the recognition method for face age recognition. The model corresponding to the age recognition model training method is an untrained model, which is trained by acquiring image data from the client or database. The model corresponding to the face age recognition method is a model trained using the age recognition model training method, which is used to obtain client instructions for retrieval or to build an image database for retrieval.

[0048] The client includes, but is not limited to, handheld computers, desktop computers, laptops, ultra-mobile personal computers (UMPCs), netbooks, cloud terminal devices, and personal digital assistants (PDAs). The server can be a standalone server or a cloud server. A cloud server provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms.

[0049] Figure 2 is a flowchart illustrating an age recognition model training method provided in Embodiment 2 of this application. The age recognition model training method can be applied to the server in Figure 1. The server connects to the client, the database, and other resources to obtain and process the relevant data.

[0050] As shown in Figure 2, the age recognition model training method may include the following steps:

[0051] Step S201: Obtain the labeled dataset and the unlabeled dataset.

[0052] The labeled dataset includes at least one labeled image and its corresponding age label, while the unlabeled dataset includes at least one unlabeled image.

[0053] In this embodiment, the age recognition model is essentially an age classification model configured to recognize the age of a face. The age is estimated by analyzing facial features and is intended for use only in compliant scenarios. The face images and age data are likewise collected through compliant means.

[0054] To address the scarcity of datasets containing age-labeled images, this embodiment also trains the model on unlabeled images that have no age labels. Through pseudo-labels, these unlabeled images can also supervise the training of the age classification model.

[0055] The age classification model may predict the age directly, which is a regression task. Alternatively, instead of predicting the exact age of the face, it may predict the age range of the face. Ages are divided into eight ranges:

- infant (0-2 years)
- toddler (3-5 years)
- child (6-12 years)
- adolescent (13-17 years)
- young adult (18-24 years)
- middle-aged (25-39 years)
- middle-aged and elderly (40-59 years)
- elderly (60 years and above)

This transforms the regression problem into a classification problem.
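The conversion from a regression target to a class label can be sketched as a simple lookup over the eight ranges listed above. This is a minimal illustration, not the applicant's implementation. It assumes ages are whole years. The names `AGE_RANGES` and `age_to_class` are hypothetical.

```python
# Eight age ranges from the embodiment: (name, inclusive lower bound, inclusive upper bound or None).
# Assumption: ages are integer years, so the integer bounds leave no gaps.
AGE_RANGES = [
    ("infant", 0, 2),
    ("toddler", 3, 5),
    ("child", 6, 12),
    ("adolescent", 13, 17),
    ("young adult", 18, 24),
    ("middle-aged", 25, 39),
    ("middle-aged and elderly", 40, 59),
    ("elderly", 60, None),  # open-ended upper range
]


def age_to_class(age: int) -> int:
    """Map an integer age in years to the index (0..7) of its age range."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for index, (_, low, high) in enumerate(AGE_RANGES):
        if age >= low and (high is None or age <= high):
            return index
    raise ValueError(f"no age range covers {age}")
```

With this mapping, each age label in the labeled dataset becomes a class index, for example `age_to_class(30)` gives the index of "middle-aged". The classifier then needs only an eight-way output.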

[0056] The labeled and unlabeled datasets can be stored in a database for use during server-side training. The two datasets can also be combined into a single dataset to train other models. For example, images from both datasets can be used to train a face recognition model that predicts face locations in an image, face classes, and confidence scores. This face recognition model can be used to compute image saliency. It can also be used to initialize the age classification model to improve its accuracy on faces, and training then proceeds from this initialization.

[0057] In step S202, during N iterations, the labeled image is input into the first age classification model to obtain the first age classification result. Based on the first age classification result and the age label, the parameters of the first age classification model are updated to obtain the updated first age classification model.

[0058] Here, N is an integer greater than zero.

[0059] In this embodiment, the first age classification model is first trained on labeled images from the labeled dataset for iterations 1, 2, ..., N. It is then trained on unlabeled images from the unlabeled dataset. N can be set according to requirements, for example N = 5.

[0060] The first age classification model is trained on the labeled dataset with supervised losses such as contrastive loss and cross-entropy loss. The age classification result of each image is compared with its age label, and a loss is calculated. This loss is used to adjust the parameters of the backbone network in the first age classification model, yielding an updated model. For example, the backbone network of the age classification model may include an encoder and a decoder.

[0061] In one implementation, stochastic gradient descent (SGD) is used to optimize the parameter updates. SGD updates the model parameters using the gradient of only one sample in each iteration. Other optimizers can be used instead of SGD to improve optimization performance. Examples include momentum optimizers and adaptive learning rate optimizers such as Adagrad, RMSprop, and Adam.
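Paragraphs [0060]-[0061] describe a supervised update in which a cross-entropy loss drives an SGD step on one sample. The step can be sketched in pure Python under a simplifying assumption: the backbone network is replaced by a hypothetical toy linear softmax classifier over a hand-made feature vector. The function names `softmax`, `cross_entropy`, and `sgd_step` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss of one sample with an integer class label."""
    return -math.log(probs[label])


def sgd_step(weights, features, label, lr=0.1):
    """One SGD update of a toy linear softmax classifier on a single labeled sample.

    weights: per-class weight vectors (num_classes x num_features), standing in
    for the backbone parameters. Returns (updated weights, loss before the update).
    """
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = cross_entropy(probs, label)
    new_weights = []
    for c, row in enumerate(weights):
        # dLoss/dlogit_c = p_c - 1[c == label]; dlogit_c/dw_ck = x_k
        grad_logit = probs[c] - (1.0 if c == label else 0.0)
        new_weights.append([w - lr * grad_logit * x for w, x in zip(row, features)])
    return new_weights, loss
```

With eight age classes and all-zero weights, the initial loss is log 8. One step on the same sample then lowers the loss. Swapping in a momentum or Adam update would change only how `grad_logit * x` is turned into a parameter change.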

[0062] Optionally, after the parameters of the first age classification model are updated based on the first age classification result and the age label to obtain the updated first age classification model, the method further includes:

[0063] Obtain the current parameters of the second age classification model and the first updated parameters of the updated first age classification model;

[0064] The current parameters and the first updated parameters are weighted and averaged to obtain the second updated parameters. The second updated parameters are then used to update the parameters of the second age classification model, resulting in the updated second age classification model.

[0065] The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model. Therefore, in this embodiment, the current parameters of the second age classification model and the first updated parameters of the updated first age classification model are weighted and averaged to obtain the second updated parameters. The second updated parameters are then used to update the second age classification model to obtain the updated second age classification model.

[0066] The formula for updating the parameters of the second age classification model is as follows:

[0067] $\theta' = \delta\,\theta + (1 - \delta)\,\theta_1$

[0068] In the formula, $\theta_1$ is the first updated parameter, $\theta'$ is the second updated parameter, $\theta$ is the current parameter, and $\delta \in [0, 1]$ is the decay rate, for example δ = 0.99 or 0.9.
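The second model's parameter update is a weighted average controlled by the decay rate δ. A minimal sketch follows, assuming that δ weights the current parameters of the second model and (1 - δ) weights the first updated parameters, as in a standard exponential moving average. It also assumes both models' parameters are flattened into equal-length lists, since the two models share one architecture. `ema_update` is a hypothetical name.

```python
def ema_update(current, first_updated, decay=0.99):
    """Second updated parameters = decay * current + (1 - decay) * first_updated.

    current:       parameters of the second age classification model (flattened)
    first_updated: parameters of the updated first age classification model (flattened)
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if len(current) != len(first_updated):
        raise ValueError("both models must share the same architecture")
    return [decay * t + (1.0 - decay) * s for t, s in zip(current, first_updated)]
```

A decay rate close to 1 (e.g. 0.99) makes the second model change slowly. Its predictions are therefore smoother than those of the first model, which is why it is used to produce pseudo-labels.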

[0069] Initially, the first and second age classification models have the same parameters. They can be two copies of the same model, or both can be initialized with the parameters of the face recognition model.

[0070] In step S203, during the M iterations, the unlabeled image is input into the updated first age classification model to obtain the second age classification result, and the unlabeled image is input into the updated second age classification model to obtain the third age classification result.

[0071] The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero.

[0072] In this embodiment, after the first age classification model has been trained on the labeled dataset, it is trained on unlabeled images from the unlabeled dataset for iterations 1, 2, ..., M. Labeled images from the labeled dataset are then used for training again, forming a loop. M can be set according to requirements, for example M = 1.

[0073] Training the first age classification model requires a supervised loss such as contrastive loss or cross-entropy loss. Unlabeled images have no age labels. This embodiment therefore uses the updated second age classification model to predict their age classification results (the third age classification results). These results serve as the corresponding age labels, that is, as pseudo-labels. In other words, the second age classification result of an image (the output of the first age classification model) is compared with the image's pseudo-label (the third age classification result). The resulting loss is used to readjust the parameters of the backbone network in the first age classification model.
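Paragraph [0073] compares the first model's output with the second model's output through a loss. The text does not say whether the third age classification result is used as a soft probability distribution or as a hard class. The sketch below therefore shows both, using cross-entropy as the loss. It assumes both models output probability vectors over the same age classes. The function names are hypothetical.

```python
import math


def soft_cross_entropy(student_probs, teacher_probs, eps=1e-12):
    """H(teacher, student) = -sum_c q_c * log p_c for one sample.

    student_probs: second age classification result (first model's output)
    teacher_probs: third age classification result (second model's output, used as pseudo-label)
    """
    return -sum(q * math.log(max(p, eps)) for p, q in zip(student_probs, teacher_probs))


def hard_pseudo_label(teacher_probs):
    """Most probable class predicted by the second model, usable as a hard pseudo-label."""
    return max(range(len(teacher_probs)), key=lambda c: teacher_probs[c])
```

With a hard pseudo-label, the loss reduces to the ordinary cross-entropy `-log p[hard_pseudo_label(q)]`. The soft form keeps the second model's uncertainty between neighbouring age ranges.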

[0074] Step S204: Based on the second age classification result and the third age classification result, update the updated first age classification model to obtain a further updated first age classification model. Use the further updated model as the first age classification model. Return to the step of inputting the labeled image into the first age classification model to obtain the first age classification result. Repeat until the training stopping condition is met, thereby obtaining the trained first age classification model.

[0075] In this embodiment, the updated first age classification model is trained on unlabeled images to obtain a further updated first age classification model. This process of training the first age classification model on the labeled and unlabeled datasets is repeated to obtain the trained first age classification model.
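The alternating schedule of steps S202 to S204 can be sketched as a loop over hypothetical hooks. These are `labeled_step` (one supervised iteration), `teacher_update` (refreshing the second model from the first), `unlabeled_step` (one pseudo-label iteration), and `should_stop` (the training stopping condition). The sketch assumes the second model is refreshed once after the N labeled iterations of each cycle. The text only states that it is updated after the first model is updated on labeled images.

```python
from typing import Callable


def train_alternating(
    labeled_step: Callable[[], None],
    teacher_update: Callable[[], None],
    unlabeled_step: Callable[[], None],
    should_stop: Callable[[int], bool],
    n_labeled: int = 5,
    m_unlabeled: int = 1,
) -> int:
    """Alternate N supervised iterations and M pseudo-label iterations per cycle.

    All four callables are hypothetical hooks. Returns the number of completed cycles.
    """
    if n_labeled <= 0 or m_unlabeled <= 0:
        raise ValueError("N and M must be integers greater than zero")
    cycle = 0
    while not should_stop(cycle):
        for _ in range(n_labeled):
            labeled_step()       # S202: update the first model with age labels
        teacher_update()         # assumed: refresh the second model once per cycle
        for _ in range(m_unlabeled):
            unlabeled_step()     # S203/S204: update the first model with pseudo-labels
        cycle += 1
    return cycle
```

In the example values from the text (N = 5, M = 1), each cycle performs five labeled iterations followed by one unlabeled iteration. A cycle-count limit is used here only as a stand-in for the unspecified stopping condition.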

[0076] This embodiment of the application involves inputting labeled images into a first age classification model in N iterations to obtain a first age classification result. Based on the first age classification result and age labels, the parameters of the first age classification model are updated to obtain an updated first age classification model. In M iterations, unlabeled images are input into the updated first age classification model to obtain a second age classification result. Unlabeled images are then input into the updated second age classification model to obtain a third age classification result. Based on the second and third age classification results, the updated first age classification model is updated again to obtain a further updated first age classification model. This further updated first age classification model is used as the first age classification model and iterated until the training stopping condition is met to obtain a trained first age classification model. This trained first age classification model is used to classify images to be identified, obtaining the corresponding age classification result for the images. Specifically, the age classification model is trained using both unlabeled and labeled data, and the model parameters are updated using age labels and pseudo-labels identified by the model from unlabeled data. This effectively improves the model's age classification accuracy through self-supervised learning by accurately constructing pseudo-labels.

[0077] Figure 3 is a flowchart illustrating an age recognition model training method provided in Embodiment 3 of this application. Step S203 above inputs the unlabeled image into the updated first age classification model to obtain the second age classification result. It also inputs the unlabeled image into the updated second age classification model to obtain the third age classification result. As shown in Figure 3, this step may include the following steps:

[0078] Step S301: Obtain two unlabeled images from the unlabeled dataset, and perform image fusion augmentation on the two unlabeled images to obtain the augmented image.

[0079] Step S302: Input the augmented image into the updated first age classification model to obtain the second age classification result.

[0080] Step S303: Input the two unlabeled images into the updated second age classification model respectively to obtain the classification results of the corresponding unlabeled images. Then, fuse the classification results of the two unlabeled images to obtain the third age classification result.

[0081] In this embodiment, two unlabeled images are used to form an augmented image to train the first age classification model, and the same method is used to form the age pseudo-label of the augmented image, so that the quality of the pseudo-label is higher.

[0082] The pseudo-label of the augmented image is obtained by fusing the classification results of the two unlabeled images from which the augmented image was formed. Both unlabeled images are classified by the updated second age classification model. This model is updated once after the first age classification model has been updated and is not updated during the remaining computations. As a result, its classification results draw on what was learned from the labeled images and are more accurate.

[0083] In this embodiment, two unlabeled images are used to augment the data to form more accurate training data, so as to obtain pseudo-labels accurately and achieve accurate classification of the first age classification model.

[0084] Figure 4 is a flowchart illustrating an age recognition model training method provided in Embodiment 4 of this application. Step S301 above performs image fusion augmentation on the two unlabeled images to obtain an augmented image. As shown in Figure 4, this step may include the following steps:

[0085] Step S401: Obtain the saliency weights of the two unlabeled images.

[0086] Step S402: Based on the saliency weights of the two unlabeled images, perform saliency-based augmentation on the two unlabeled images to obtain the augmented image.

[0087] The step S303 above, which involves fusing the classification results of two unlabeled images to obtain a third age classification result, may include the following steps:

[0088] Step S403: Based on the saliency weights of the two unlabeled images, fuse the classification results of the two unlabeled images to obtain the third age classification result.

[0089] This embodiment uses a saliency-based augmentation method built on Mixup. Mixup augmentation can improve the generalization ability and robustness of a model. It generates new synthetic samples by linearly mixing two images and their corresponding labels. This increases the diversity of the training data and helps the model learn smoother decision boundaries.
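For reference, standard Mixup blends two inputs and their label vectors with a single coefficient λ. In the original Mixup formulation, λ is drawn from a Beta(a, a) distribution. The pure-Python sketch below assumes images and labels are flattened into equal-length lists. The name `mixup` and the default `alpha=0.2` are illustrative choices, not values from this application.

```python
import random


def mixup(x_i, x_j, y_i, y_j, alpha=0.2, rng=random):
    """Standard Mixup: lam ~ Beta(alpha, alpha), then a convex combination of inputs and labels.

    x_i, x_j: flattened images; y_i, y_j: label (probability) vectors of equal length.
    Returns (mixed image, mixed label, lam).
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam
```

The same λ is applied to every pixel regardless of content. That uniform weighting is what [0090] identifies as a source of error when the two faces differ in pose or occlusion.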

[0090] Face recognition and face age prediction models face a specific problem when two face images differ substantially. Examples include pose (frontal versus profile), occlusion, and whether the person is wearing glasses, a hat, or a mask. In such cases, applying the plain Mixup formula introduces large errors. The generated image may fail to reflect the facial age information accurately and become inconsistent with its label. This can lead to poorer model convergence and weaken the model's generalization to real samples.

[0091] This embodiment uses Mixup augmentation based on saliency maps, as shown in the following formula:

[0092]

$$\tilde{x} = \alpha\, x_i + \beta\, x_j, \qquad \tilde{y} = \alpha\, \hat{y}_i + \beta\, \hat{y}_j$$

[0093] In the formula:
- $\tilde{x}$ denotes the augmented image.
- $x_i$ denotes the i-th unlabeled image, and $x_j$ the j-th unlabeled image.
- $\tilde{y}$ denotes the third age classification result.
- $\hat{y}_i$ denotes the classification result corresponding to the i-th unlabeled image, and $\hat{y}_j$ the classification result corresponding to the j-th unlabeled image.

The weights α and β are as follows:

[0094]

$$\alpha = \frac{S_i}{S_i + S_j}, \qquad \beta = \frac{S_j}{S_i + S_j}$$

[0095] In the formula, $S_i$ is the saliency proportion of unlabeled image $x_i$, and $S_j$ is the saliency proportion of unlabeled image $x_j$.
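The saliency-weighted mixing can be sketched in a few lines of Python. This is a minimal sketch, not the applicant's implementation, and it rests on three assumptions:
- α and β are the normalized saliency proportions, α = S_i/(S_i+S_j) and β = S_j/(S_i+S_j).
- Images are flattened pixel lists, and classification results are class-probability lists.
- The function name and the 50/50 fallback for two images with zero saliency are hypothetical.

```python
from typing import List, Tuple


def saliency_mixup(x_i: List[float], x_j: List[float],
                   y_i: List[float], y_j: List[float],
                   s_i: float, s_j: float) -> Tuple[List[float], List[float]]:
    """Mix two images and their classification results with saliency-based weights.

    Assumption: alpha = s_i / (s_i + s_j) and beta = s_j / (s_i + s_j), so alpha + beta = 1.
    """
    total = s_i + s_j
    if total > 0:
        alpha, beta = s_i / total, s_j / total
    else:
        alpha = beta = 0.5  # hypothetical fallback when neither image has salient pixels
    x_mix = [alpha * a + beta * b for a, b in zip(x_i, x_j)]  # augmented image
    y_mix = [alpha * a + beta * b for a, b in zip(y_i, y_j)]  # third age classification result
    return x_mix, y_mix


# Image i is three times as salient as image j, so it dominates both pixels and label.
x_mix, y_mix = saliency_mixup([0.0, 1.0], [1.0, 0.0],
                              [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.375, 0.125)
# x_mix == [0.25, 0.75], y_mix == [0.75, 0.25, 0.0]
```

Under this assumption, an image with fewer salient pixels (for example, a heavily occluded face) contributes less to both the mixed image and the mixed label. This is the concern raised in [0090] about plain Mixup.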

[0096] Optionally, before the saliency weights of the two unlabeled images are obtained, the method further includes the following steps:

[0097] For any unlabeled image in the unlabeled dataset, construct an initial image with the same dimensions as the unlabeled image.

[0098] In the current iteration, anchor points are selected from the unlabeled image, and each anchor point is added to the initial image to obtain a synthetic image. The trained face recognition model is used to recognize the unlabeled image and the synthetic image respectively, and the first recognition result of the corresponding unlabeled image and the second recognition result of the corresponding synthetic image are obtained.

[0099] Calculate the similarity between the first recognition result and the second recognition result. Then take the synthesized image as the new initial image and return to the step of selecting anchor points from the unlabeled image and adding each anchor point to the initial image to obtain a synthesized image;

[0100] If the difference in similarity between two consecutive iterations is less than the preset difference, the iteration is stopped, and the synthesized image from the last iteration is determined to be the saliency image.

[0101] The saliency image is binarized to obtain a binarized saliency image. The proportion of pixels with value 1 in the binarized saliency image, relative to all pixels of the unlabeled image, is taken as the saliency weight of that unlabeled image. All unlabeled images are traversed to obtain their saliency weights.

[0102] Specifically, a face recognition model is trained using an existing face recognition dataset to extract saliency maps from face images. The aforementioned face recognition dataset can include both labeled and unlabeled datasets.

[0103] The method for obtaining the saliency map is as follows:

[0104] Initialize an all-zero image $B_0$ as the initial image, with the same size as the face image B. Take anchor points evenly over the face image. For each anchor point, take the circular area of radius R around it (for example, a default of 5) as its receptive area. The receptive areas may overlap.

[0105] Iterate over the anchor points. In each iteration, the circular region corresponding to one anchor point is added to the image $B_{t-1}$, and the similarity between the resulting synthesized image and the original face image is calculated. The synthesized image with the highest similarity is taken as the synthesized image $B_t$ of the t-th iteration.

[0106] The iteration stops when the maximum number of iterations is reached or the similarity difference is less than the threshold, and $B_t$ is output as the saliency map. The saliency map is then binarized with a threshold of 0: pixels greater than 0 are assigned a value of 1, and pixels less than or equal to 0 are assigned a value of 0.
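The greedy construction in [0104] to [0106], together with the stopping rule of [0099] and [0100], can be sketched as follows. This is a minimal pure-Python sketch, not the applicant's implementation. The following are all assumptions:
- the anchor stride;
- cosine similarity as the similarity measure;
- the tolerance `eps` and the iteration cap;
- using each anchor at most once;
- all helper names.

`f` is a hypothetical stand-in for the trained face recognition model and is assumed to return a feature vector.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def circle_pixels(h: int, w: int, cy: int, cx: int, r: int) -> List[Tuple[int, int]]:
    """Pixel coordinates inside the circular receptive area of radius r around (cy, cx)."""
    return [(y, x)
            for y in range(max(0, cy - r), min(h, cy + r + 1))
            for x in range(max(0, cx - r), min(w, cx + r + 1))
            if (y - cy) ** 2 + (x - cx) ** 2 <= r * r]


def greedy_saliency_map(face: Image, f: Callable[[Image], Sequence[float]],
                        stride: int = 5, radius: int = 5,
                        eps: float = 1e-3, max_iters: int = 100) -> List[List[int]]:
    """Greedily paste receptive areas of `face` into an all-zero image, then binarize."""
    h, w = len(face), len(face[0])
    target = f(face)                                    # recognition result of B
    anchors = [(y, x) for y in range(0, h, stride) for x in range(0, w, stride)]
    b_prev = [[0.0] * w for _ in range(h)]              # B_0: all zeros
    s_prev = cosine(f(b_prev), target)
    for _ in range(min(max_iters, len(anchors))):
        best = None                                     # (similarity, anchor, candidate)
        for anchor in anchors:
            cand = [row[:] for row in b_prev]
            for y, x in circle_pixels(h, w, anchor[0], anchor[1], radius):
                cand[y][x] = face[y][x]                 # add this anchor's area to B_{t-1}
            s = cosine(f(cand), target)
            if best is None or s > best[0]:
                best = (s, anchor, cand)
        s_t, chosen, b_t = best
        anchors.remove(chosen)                          # assumption: each anchor used once
        delta = abs(s_t - s_prev)                       # similarity difference
        b_prev, s_prev = b_t, s_t
        if delta < eps:
            break
    # Binarize B_t: pixels > 0 -> 1, pixels <= 0 -> 0.
    return [[1 if v > 0 else 0 for v in row] for row in b_prev]
```

Each iteration calls `f` once per remaining anchor, so the cost grows roughly quadratically with the number of anchors. A practical version would batch the candidate images through the recognition model.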

[0107] The similarity difference is calculated as follows:

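[0108]

$$\Delta_t = \left| \operatorname{sim}\big(f(B_t), f(B)\big) - \operatorname{sim}\big(f(B_{t-1}), f(B)\big) \right|$$

[0109] In the formula, $f(\cdot)$ is the face recognition model, and $\operatorname{sim}(\cdot,\cdot)$ is the similarity between two recognition results.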

[0110] The above operation is performed on all images in the face recognition dataset, especially those in the unlabeled dataset, to obtain the corresponding saliency map. The saliency weight S of each image is given by the following formula:

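[0111]

$$S = \frac{1}{H \times W} \sum_{u=1}^{H} \sum_{v=1}^{W} B_t(u, v)$$

[0112] Here, $B_t$ is the binarized saliency map of face image B, with height H and width W. The proportion of pixels with value 1 in $B_t$ is used as the saliency weight of the face image.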
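Computing the saliency weight from a binarized map amounts to counting pixels. This is a minimal sketch, assuming the binarized map is a nested list of 0/1 values. The function name and the image names in the example are hypothetical.

```python
from typing import List


def saliency_weight(binary_map: List[List[int]]) -> float:
    """Proportion of pixels equal to 1 in the binarized saliency map B_t."""
    total = sum(len(row) for row in binary_map)
    ones = sum(1 for row in binary_map for v in row if v == 1)
    return ones / total if total else 0.0


# Hypothetical per-image weights for a tiny unlabeled set.
weights = {name: saliency_weight(m) for name, m in {
    "img_a": [[1, 0], [0, 1]],
    "img_b": [[1, 1], [1, 0]],
}.items()}
# weights == {"img_a": 0.5, "img_b": 0.75}
```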

[0113] For example, Figure 5 is a flowchart of an age recognition model training method provided in Embodiment 5 of this application. The process is as follows:

[0114] 1) Train a face recognition model on the face recognition training set, and use this recognition model to obtain the saliency weights of the images in the training set;

[0115] 2) Initialize the face age classification model from the above recognition model, and maintain two copies of it. The main model's parameters are updated by SGD. The auxiliary model's parameters are updated from those of the main model by EMA (exponential moving average). At initialization, the two models have identical parameters.

[0116] 3) Train the face age classification model. The main model is trained alternately on labeled and unlabeled data: 5 iterations on labeled data, then 1 iteration on unlabeled data. The loss on labeled data is the cross-entropy loss, and the parameters are optimized with the SGD optimizer. The auxiliary model's parameters are updated by EMA in every iteration.

[0117] 4) For unlabeled data, saliency-mixup augmentation is applied to pairs of images.
- The augmented image is fed into the main model (i.e., the first age classification model) to predict its age category.
- The two unaugmented images are fed into the auxiliary model (i.e., the second age classification model), which outputs age predictions for them.
- If the scores that both images receive for a category exceed the threshold (0.4 by default), that category is used as their pseudo-label.
- The label of the augmented image is computed from these pseudo-labels in the same way as the saliency-mixup augmentation. The loss is the cross-entropy between this label and the main model's prediction.

[0118] 5) When the loss value is lower than the set threshold or the number of iterations reaches the maximum value, training is stopped, and the main model is used for age prediction.
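Step 4) above can be sketched as follows. The reading of the pseudo-label rule is an assumption:
- Each of the two images receives its own pseudo-label, namely the auxiliary model's top class, but only if that class scores above the 0.4 threshold.
- A pair that lacks two confident pseudo-labels is skipped.

α and β are the saliency mixing weights. The function names are hypothetical, and the inputs are plain probability lists rather than real model outputs.

```python
import math
from typing import List, Optional


def pseudo_label(probs: List[float], thr: float = 0.4) -> Optional[int]:
    """Top class of the auxiliary model's output if its score exceeds thr, else None."""
    k = max(range(len(probs)), key=lambda c: probs[c])
    return k if probs[k] > thr else None


def mixed_target(p_i: List[float], p_j: List[float], alpha: float, beta: float,
                 thr: float = 0.4) -> Optional[List[float]]:
    """Saliency-mixed one-hot pseudo-labels; None when either image is not confident."""
    c_i, c_j = pseudo_label(p_i, thr), pseudo_label(p_j, thr)
    if c_i is None or c_j is None:
        return None  # assumption: the pair contributes no unlabeled loss
    target = [0.0] * len(p_i)
    target[c_i] += alpha
    target[c_j] += beta
    return target


def soft_cross_entropy(target: List[float], pred: List[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a soft target and the main model's predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


aux_i = [0.8, 0.1, 0.1]  # auxiliary-model output for x_i
aux_j = [0.1, 0.1, 0.8]  # auxiliary-model output for x_j
target = mixed_target(aux_i, aux_j, alpha=0.75, beta=0.25)  # [0.75, 0.0, 0.25]
loss = soft_cross_entropy(target, [0.6, 0.1, 0.3])  # main-model output on the mixed image
```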

[0119] As training iterates, EMA continuously updates the auxiliary model and, with it, the pseudo-labels. This improves pseudo-label accuracy and reduces the interference caused by error accumulation.
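The auxiliary-model update in step 2) above can be sketched as an exponential moving average. Paragraph [0149] describes the same update as a weighted average of the current and updated parameters. This is a minimal plain-Python sketch. The decay value and the dictionary-of-lists parameter store are assumptions. A deep-learning framework would apply the same formula to each parameter tensor.

```python
import copy
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical flat parameter store: name -> values


def ema_update(aux: Params, main: Params, decay: float = 0.999) -> None:
    """In place: aux <- decay * aux + (1 - decay) * main, for every parameter."""
    for name, main_vals in main.items():
        aux_vals = aux[name]
        for k, m in enumerate(main_vals):
            aux_vals[k] = decay * aux_vals[k] + (1.0 - decay) * m


main_model: Params = {"w": [0.2, -0.4], "b": [0.1]}
aux_model = copy.deepcopy(main_model)  # identical parameters at initialization
main_model["w"][0] = 1.2               # pretend one SGD step changed the main model
ema_update(aux_model, main_model, decay=0.9)
# aux_model["w"][0] is now approximately 0.9 * 0.2 + 0.1 * 1.2 = 0.3
```

A decay close to 1 makes the auxiliary model change slowly. That stability is what lets it act as a steadier source of pseudo-labels than the main model.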

[0120] By combining saliency analysis with self-supervised learning, this application proposes a face age classification method based on saliency augmentation. The method makes effective use of both unlabeled and labeled data to generate high-quality pseudo-labels and augmented samples, thereby significantly improving the model's age classification accuracy.

[0121] Figure 6 is a flowchart of a facial age recognition method provided in Embodiment 6 of this application. The method can be applied to the server side shown in Figure 1, which connects to the client, database, and other resources to obtain and process the relevant data.

[0122] As shown in Figure 6, the facial age recognition method may include the following steps:

[0123] Step S601: Obtain the image to be recognized.

[0124] Step S602: Classify the image to be recognized using the trained first age classification model obtained by the age recognition model training method, to obtain the age classification result of the image to be recognized.

[0125] In this embodiment of the application, training proceeds as follows. Over N iterations, labeled images are input into a first age classification model to obtain a first age classification result. The parameters of the first age classification model are then updated based on the first age classification result and the age labels, giving an updated first age classification model. Over M iterations, unlabeled images are input into the updated first age classification model to obtain a second age classification result, and into the updated second age classification model to obtain a third age classification result. Based on the second and third age classification results, the updated first age classification model is updated again. The resulting model is used as the first age classification model, and the process repeats until the training stopping condition is met, yielding a trained first age classification model. This trained model is used to classify images to be recognized and obtain their age classification results.

In short, the age classification model is trained on both unlabeled and labeled data. Its parameters are updated using the age labels and the pseudo-labels that the model assigns to unlabeled data, respectively. By constructing accurate pseudo-labels, this self-supervised learning scheme effectively improves the model's age classification accuracy.
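The alternating schedule summarized above (N labeled iterations, then M unlabeled iterations, repeated until a stopping condition holds) can be sketched as a small driver loop. This is a hypothetical sketch:
- `labeled_step` and `unlabeled_step` stand for one parameter update each, including the EMA update of the second model, and return a loss.
- The defaults N = 5 and M = 1 follow step 3) of Embodiment 5.
- Checking the stopping condition after every step against the latest loss is an assumption, as are the threshold and cap values.

```python
from typing import Callable, List

Step = Callable[[], float]  # runs one update and returns its loss


def run_training(labeled_step: Step, unlabeled_step: Step,
                 n_labeled: int = 5, m_unlabeled: int = 1,
                 loss_threshold: float = 1e-3, max_iters: int = 10_000) -> List[str]:
    """Alternate N labeled and M unlabeled steps; return the sequence of step kinds."""
    cycle = ["L"] * n_labeled + ["U"] * m_unlabeled
    if not cycle:
        raise ValueError("at least one labeled or unlabeled step is required")
    log: List[str] = []
    while True:
        for kind in cycle:
            loss = labeled_step() if kind == "L" else unlabeled_step()
            log.append(kind)
            # Stop when the latest loss is below the threshold or the cap is reached.
            if loss < loss_threshold or len(log) >= max_iters:
                return log


losses = iter([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.0005])
print("".join(run_training(lambda: next(losses), lambda: next(losses))))  # LLLLLUL
```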

[0126] Corresponding to the age recognition model training method in the above embodiment, Figure 7 shows a structural block diagram of the age recognition model training device provided in Embodiment 7 of this application. The device can be applied to the server side shown in Figure 1, which connects to the client, database, etc. via a corresponding computer device to obtain and process the data. For ease of explanation, only the parts relevant to the embodiments of this application are shown.

[0127] As shown in Figure 7, the age recognition model training device includes:

[0128] Data acquisition module 71 is used to acquire labeled datasets and unlabeled datasets. The labeled dataset includes at least one labeled image and a corresponding age label, and the unlabeled dataset includes at least one unlabeled image.

[0129] The labeled image training module 72 is used to input the labeled image into the first age classification model in N iterations to obtain the first age classification result, and update the parameters of the first age classification model according to the first age classification result and age label to obtain the updated first age classification model, where N is an integer greater than zero;

[0130] The unlabeled image training module 73 is used to input unlabeled images into the updated first age classification model in M iterations to obtain the second age classification result, and to input unlabeled images into the updated second age classification model to obtain the third age classification result. The second age classification model has the same architecture as the first age classification model. The parameters of the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero.

[0131] The loop training module 74 is used to update the updated first age classification model again based on the second and third age classification results, and to use the resulting model as the first age classification model. It then returns to the step of inputting the labeled image into the first age classification model to obtain the first age classification result, until the training stopping condition is met, thereby obtaining the trained first age classification model.

[0132] Optionally, the unlabeled image training module 73 includes:

[0133] The augmentation processing unit is used to acquire two unlabeled images from the unlabeled dataset, perform image fusion augmentation on the two unlabeled images, and obtain an augmented image.

[0134] An augmented image classification unit is used to input the augmented image into an updated first age classification model to obtain a second age classification result.

[0135] The classification fusion unit is used to input two unlabeled images into the updated second age classification model to obtain the classification results of the corresponding unlabeled images, and to fuse the classification results of the two unlabeled images to obtain the third age classification result.

[0136] Optionally, the augmentation processing unit includes:

[0137] The weight acquisition subunit is used to obtain the saliency weights of the two unlabeled images;

[0138] The augmentation processing subunit is used to perform saliency augmentation on the two unlabeled images based on their saliency weights to obtain an augmented image.

[0139] The classification and fusion unit includes:

[0140] The classification fusion subunit is used to fuse the classification results of the two unlabeled images, weighted by their saliency weights, to obtain the third age classification result.

[0141] Optionally, the age recognition model training device also includes:

[0142] The initial image construction module is used to construct an initial image of the same size as the unlabeled image for any unlabeled image in the unlabeled dataset before obtaining the saliency weights of two unlabeled images.

[0143] The face recognition module is used to select anchor points from the unlabeled image in the current iteration, add each anchor point to the initial image to obtain a synthetic image, and use the trained face recognition model to recognize the unlabeled image and the synthetic image respectively to obtain the first recognition result of the corresponding unlabeled image and the second recognition result of the corresponding synthetic image.

[0144] The loop processing module is used to calculate the similarity between the first recognition result and the second recognition result, take the synthesized image as the new initial image, and return to the step of selecting anchor points from the unlabeled image and adding each anchor point to the initial image to obtain a synthesized image;

[0145] The saliency determination module is used to stop the iteration if the similarity difference after two consecutive iterations is less than a preset difference, and to determine the synthesized image corresponding to the last iteration as the saliency image;

[0146] The proportion calculation module is used to binarize the saliency image to obtain a binarized saliency image. It takes the proportion of pixels with value 1 in the binarized saliency image, relative to all pixels of the unlabeled image, as the saliency proportion (saliency weight) of that unlabeled image. The module traverses all unlabeled images to obtain their saliency proportions.

[0147] Optionally, the age recognition model training device also includes:

[0148] The parameter acquisition module operates after the parameters of the first age classification model have been updated based on the first age classification result and age label to produce the updated first age classification model. It is used to obtain the current parameters of the second age classification model and the first updated parameters of the updated first age classification model.

[0149] The second model update module is used to calculate the weighted average of the current parameters and the first updated parameters as the second updated parameters. It then updates the parameters of the second age classification model with the second updated parameters, obtaining the updated second age classification model.

[0150] It should be noted that the information interaction and execution process between the above modules, units, and sub-units are based on the same concept as the method embodiments of this application. For details on their specific functions and technical effects, please refer to the method embodiments section, and they will not be repeated here.

[0151] Corresponding to the facial age recognition method in the above embodiments, Figure 8 shows a structural block diagram of the facial age recognition device provided in Embodiment 8 of this application. The facial age recognition device can be applied to the server side shown in Figure 1, which connects to the client, database, etc. via a corresponding computer device to obtain and process the data. For ease of explanation, only the parts relevant to the embodiments of this application are shown.

[0152] As shown in Figure 8, the facial age recognition device includes:

[0153] Image acquisition module 81 is used to acquire the image to be recognized;

[0154] The age classification and recognition module 82 is used to execute the trained first age classification model obtained by the age recognition model training method of the above embodiments, classify the image to be recognized, and obtain the age classification result of the corresponding image to be recognized.

[0155] It should be noted that the information interaction and execution process between the above modules, units, and sub-units are based on the same concept as the method embodiments of this application. For details on their specific functions and technical effects, please refer to the method embodiments section, and they will not be repeated here.

[0156] Figure 9 is a schematic structural diagram of a computer device provided in Embodiment 9 of this application. As shown in Figure 9, the computer device of this embodiment includes at least one processor (only one is shown in Figure 9), a memory, and a computer program stored in the memory and executable on the at least one processor. When the processor executes the computer program, it implements the steps of any of the above embodiments of the age recognition model training method or the facial age recognition method.

[0157] The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that Figure 9 is merely an example of a computer device and does not limit it. A computer device may include more or fewer components than shown, combinations of certain components, or different components, such as network interfaces, displays, and input devices.

[0158] The processor referred to can be a CPU, but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.

[0159] Memory includes readable storage media, internal memory, etc., wherein internal memory can be the RAM of a computer device, providing an environment for the operation of the operating system and computer-readable instructions stored in the readable storage media. The readable storage media can be the hard drive of a computer device, or in other embodiments, it can be an external storage device of the computer device, such as a plug-in hard drive, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card. Furthermore, memory can include both internal storage units and external storage devices of the computer device. Memory is used to store the operating system, applications, bootloader, data, and other programs, such as program code for computer programs. Memory can also be used to temporarily store data that has been output or will be output.

[0160] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above device can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the above method embodiments. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. 
A computer-readable medium can include at least: any entity or device capable of carrying computer program code, a recording medium, a computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media. Examples include USB flash drives, portable hard drives, magnetic disks, or optical disks. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electrical carrier signals or telecommunication signals.

[0161] The implementation of all or part of the processes in the methods of the above embodiments can also be accomplished by a computer program product. When the computer program product is run on a computer device, it enables the computer device to execute the steps in the above method embodiments.

[0162] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0163] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0164] In the embodiments provided in this application, it should be understood that the disclosed apparatus / computer devices and methods can be implemented in other ways. For example, the apparatus / computer device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0165] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0166] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.

Claims

1. A method for training an age recognition model, characterized in that it comprises: Obtain labeled and unlabeled datasets, wherein the labeled dataset includes at least one labeled image and a corresponding age label, and the unlabeled dataset includes at least one unlabeled image; In N iterations, the labeled image is input into the first age classification model to obtain the first age classification result, and based on the first age classification result and the age label, the parameters of the first age classification model are updated to obtain the updated first age classification model, where N is an integer greater than zero. In M iterations, the unlabeled image is input into the updated first age classification model to obtain the second age classification result, and the unlabeled image is input into the updated second age classification model to obtain the third age classification result. The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero. Based on the second age classification result and the third age classification result, the updated first age classification model is updated to obtain a further updated first age classification model. The further updated first age classification model is used as the first age classification model, and the method returns to the step of inputting the labeled image into the first age classification model to obtain the first age classification result, until the training stopping condition is met, thereby obtaining the trained first age classification model. 
The step of inputting the unlabeled image into the updated first age classification model to obtain a second age classification result, and inputting the unlabeled image into the updated second age classification model to obtain a third age classification result, includes: Two unlabeled images are obtained from the unlabeled dataset, and the two unlabeled images are fused and augmented to obtain an augmented image; The augmented image is input into the updated first age classification model to obtain the second age classification result; The two unlabeled images are respectively input into the updated second age classification model to obtain the classification results of the corresponding unlabeled images. The classification results of the two unlabeled images are then fused to obtain the third age classification result.

2. The age recognition model training method according to claim 1, characterized in that the step of fusing and augmenting the two unlabeled images to obtain an augmented image includes: Obtain the saliency weights of the two unlabeled images; Based on the saliency weights of the two unlabeled images, perform saliency augmentation on the two unlabeled images to obtain the augmented image; and the step of fusing the classification results of the two unlabeled images to obtain a third age classification result includes: Based on the saliency weights of the two unlabeled images, fuse the classification results of the two unlabeled images by saliency weighting to obtain the third age classification result.

3. The age recognition model training method according to claim 2, characterized in that, before obtaining the saliency weights of the two unlabeled images, the method further includes: For any unlabeled image in the unlabeled dataset, construct an initial image with the same image size as the unlabeled image. In the current iteration, anchor points are selected from the unlabeled image, and each anchor point is added to the initial image to obtain a synthetic image. The trained face recognition model is used to recognize the unlabeled image and the synthetic image respectively to obtain a first recognition result corresponding to the unlabeled image and a second recognition result corresponding to the synthetic image. Calculate the similarity between the first recognition result and the second recognition result, use the synthesized image as the initial image, and return to the step of selecting anchor points from the unlabeled image and adding each anchor point to the initial image to obtain the synthesized image; If the similarity difference between two consecutive iterations is less than the preset difference, the iteration is stopped, and the synthesized image corresponding to the last iteration is determined to be the saliency image. The saliency image is binarized to obtain a binarized saliency image. The proportion of pixels with a value of 1 in the binarized saliency image, relative to all pixels of the unlabeled image, is determined as the saliency weight of the corresponding unlabeled image. All unlabeled images are traversed to obtain the saliency weights of all unlabeled images.

4. The age recognition model training method according to any one of claims 1 to 3, characterized in that, after updating the parameters of the first age classification model based on the first age classification result and the age label to obtain the updated first age classification model, the method further includes: Obtain the current parameters of the second age classification model and the first updated parameters of the updated first age classification model; Calculate the weighted average of the current parameters and the first updated parameters to obtain the second updated parameters, and use the second updated parameters to update the parameters of the second age classification model, obtaining the updated second age classification model.

5. A method for facial age recognition, characterized in that it comprises: Acquire the image to be recognized; Classify the image to be recognized using the trained first age classification model obtained by the age recognition model training method as described in any one of claims 1 to 4, thereby obtaining the age classification result corresponding to the image to be recognized.

6. An age recognition model training device, characterized in that it comprises: The data acquisition module is used to acquire labeled datasets and unlabeled datasets. The labeled datasets include at least one labeled image and a corresponding age label, and the unlabeled datasets include at least one unlabeled image. The labeled image training module is used to input the labeled image into the first age classification model in N iterations to obtain the first age classification result, and update the parameters of the first age classification model according to the first age classification result and the age label to obtain the updated first age classification model, where N is an integer greater than zero; An unlabeled image training module is used to input the unlabeled image into the updated first age classification model in M iterations to obtain a second age classification result, and input the unlabeled image into the updated second age classification model to obtain a third age classification result. The second age classification model has the same architecture as the first age classification model. The parameters in the updated second age classification model are calculated based on the parameters of the updated first age classification model, and M is an integer greater than zero. The loop training module is used to update the updated first age classification model based on the second age classification result and the third age classification result, to obtain a newly updated first age classification model, to use the newly updated first age classification model as the first age classification model, and to return to the step of inputting the labeled image into the first age classification model to obtain the first age classification result, until the training stopping condition is reached, and the trained first age classification model is obtained. 
The unlabeled image training module includes: An augmentation processing unit is used to obtain two unlabeled images from the unlabeled dataset, perform image fusion augmentation on the two unlabeled images, and obtain an augmented image; An augmented image classification unit is used to input the augmented image into the updated first age classification model to obtain a second age classification result; The classification fusion unit is used to input the two unlabeled images into the updated second age classification model respectively to obtain the classification results of the corresponding unlabeled images, and to fuse the classification results of the two unlabeled images to obtain the third age classification result.

7. A recognition device for facial age recognition, characterized in that it comprises: The image acquisition module is used to acquire the image to be recognized; An age classification and recognition module is used to classify the image to be recognized by executing the trained first age classification model obtained by the age recognition model training method as described in any one of claims 1 to 4, and obtain the age classification result corresponding to the image to be recognized.

8. A computer device, characterized in that the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, it implements the age recognition model training method as described in any one of claims 1 to 4, or the recognition method for facial age recognition as described in claim 5.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the age recognition model training method as described in any one of claims 1 to 4, or the recognition method for facial age recognition as described in claim 5.