Identity state consensus method, device, medium and product based on multi-task learning

CN122244912A · Pending · Publication Date: 2026-06-19 · ONE SIDE RESEARCH & TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: ONE SIDE RESEARCH & TECHNOLOGY CO LTD
Filing Date: 2024-12-18
Publication Date: 2026-06-19


Abstract

This application discloses an identity-state consensus method, device, medium, and product based on multi-task learning, relating to the field of image recognition technology. The method includes: inputting an acquired facial image into an identity-state consensus model to obtain identity recognition results and state recognition results; the identity-state consensus model is trained under a multi-task learning framework, and the training method is as follows: constructing a training dataset, where each sample includes sample facial image data and corresponding label data; constructing a target network under the multi-task learning framework, including a basic neural network, an identity recognition network, and a state recognition network; inputting each sample from the training dataset into the target network to obtain predicted identity recognition results and predicted state recognition results; determining the total loss based on the predicted recognition results and corresponding label data, and updating the parameters of the target network to obtain the identity-state consensus model. This application can ensure the accuracy and efficiency of identity recognition and state recognition.

Description

Technical Field

[0001] This application relates to the field of image recognition technology, and in particular to an identity state consensus method, device, medium and product based on multi-task learning.

Background Technology

[0002] With the development of digital imaging technology, facial recognition and analysis technologies have been widely applied in various fields such as healthcare and security, helping to identify individuals or analyze facial features to obtain relevant information. However, existing facial recognition and data analysis technologies still have shortcomings in terms of accuracy and efficiency. Therefore, an improved technology is needed to address these issues.

Summary of the Invention

[0003] The purpose of this application is to provide an identity state consensus method, device, medium, and product based on multi-task learning, which can improve the accuracy and efficiency of identity recognition and state recognition.

[0004] To achieve the above objectives, this application provides the following solution:

[0005] Firstly, this application provides an identity state consensus method based on multi-task learning, including:

[0006] Acquire facial images;

[0007] The facial image is input into the identity state consensus model to obtain identity recognition results and state recognition results; the state recognition results include physiological state recognition results and emotional state recognition results.

[0008] The identity state consensus model is trained within a multi-task learning framework, and the training method is as follows:

[0009] Construct a training dataset; each sample in the training dataset includes sample facial image data, as well as corresponding identity labels, physiological state labels, emotional state labels, and environmental state labels;

[0010] A target network is constructed under a multi-task learning framework; the target network includes a basic neural network, an identity recognition network, and a state recognition network; wherein, the basic neural network is used to extract features from sample facial image data to obtain shared features;

[0011] Each sample in the training dataset is input into the target network to obtain the predicted recognition result; the predicted recognition result includes the predicted identity recognition result and the predicted state recognition result.

[0012] Based on the predicted identification results and the corresponding label data, the total loss is determined, and the parameters of the target network are updated according to the total loss to obtain the identity state consensus model; the label data includes the identity label, physiological state label, emotional state label and environmental state label.

[0013] Secondly, this application provides a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the identity state consensus method based on multi-task learning described above.

[0014] Thirdly, this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the aforementioned identity state consensus method based on multi-task learning.

[0015] Fourthly, this application provides a computer program product, including a computer program that, when executed by a processor, implements the identity state consensus method based on multi-task learning described above.

[0016] According to the specific embodiments provided in this application, the following technical effects are disclosed:

[0017] This application provides an identity state consensus method, device, medium, and product based on multi-task learning. The method includes: inputting an acquired facial image into an identity state consensus model to obtain an identity recognition result and a state recognition result; the state recognition result includes a physiological state recognition result and an emotional state recognition result; the identity state consensus model is trained under a multi-task learning framework, and the training method is as follows: constructing a training dataset; each sample in the training dataset includes sample facial image data, and corresponding identity labels, physiological state labels, emotional state labels, and environmental state labels; constructing a target network under the multi-task learning framework; the target network includes a basic neural network, an identity recognition network, and a state recognition network; wherein, the basic neural network is used to extract features from the sample facial image data to obtain shared features; inputting each sample in the training dataset into the target network to obtain a predicted recognition result; determining the total loss based on the predicted recognition result and the corresponding label data, and updating the parameters of the target network based on the total loss to obtain the identity state consensus model.

[0018] This application constructs an identity state consensus model based on a multi-task learning framework. Identity recognition and state recognition mutually reinforce each other, enabling the model to learn features from both tasks simultaneously when processing facial images, thereby improving overall recognition accuracy. Furthermore, the basic neural network in this application provides shared feature representations for both the identity recognition network and the state recognition network. This sharing mechanism reduces the number of model parameters and improves training efficiency.

Description of the Drawings

[0019] To illustrate the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application. Those skilled in the art can obtain other drawings from these drawings without creative effort.

[0020] Figure 1 is an application environment diagram of an identity state consensus method based on multi-task learning in one embodiment of this application;

[0021] Figure 2 is a flowchart of an identity state consensus method based on multi-task learning provided in an embodiment of this application;

[0022] Figure 3 is a schematic diagram illustrating the training method of the identity state consensus model;

[0023] Figure 4 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Description

[0024] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0025] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0026] The identity state consensus method provided in the embodiments of this application can be applied, for example, in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be set up independently, integrated into server 104, or placed in the cloud or on another server. Terminal 102 can send the facial image to be recognized to server 104. After receiving the facial image, server 104 inputs it into the identity state consensus model to obtain the identity recognition result and the state recognition result, and can then return these results to terminal 102. In some embodiments, the identity state consensus method can also be implemented by server 104 or terminal 102 alone. For example, terminal 102 can directly perform identity and state recognition on the facial image to be recognized, or server 104 can obtain the facial image to be recognized from the data storage system and perform identity and state recognition on it.

[0027] Terminal 102 can be, but is not limited to, a desktop computer, laptop, smartphone, tablet, IoT device, or portable wearable device. Server 104 can be implemented as an independent server, as a server cluster composed of multiple servers, or as a cloud server.

[0028] In one exemplary embodiment, as shown in Figures 2 and 3, an identity state consensus method based on multi-task learning is provided. The method is executed by a computer device; specifically, it may be executed by a terminal or a server alone, or jointly by a terminal and a server. In this embodiment, the method is described as applied to server 104 in Figure 1 and includes the following steps 201 to 203:

[0029] Step 201: Obtain facial image.

[0030] Step 202: Input the facial image into the identity state consensus model to obtain the identity recognition result and the state recognition result; the state recognition result includes the physiological state recognition result, the emotional state recognition result and the environmental state recognition result.

[0031] Step 203: Display the identity recognition result and the state recognition result on the user interface to provide immediate feedback.
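Steps 201 to 203 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the trained model is replaced by a stub (`fake_model`) that returns one probability vector per output head, the identity names are hypothetical placeholders, and the state label lists reuse the examples given in the description.

```python
from typing import Callable, Dict, List

# Label vocabularies; the state labels follow the examples given in the
# description, the identity names are hypothetical placeholders.
LABELS: Dict[str, List[str]] = {
    "identity": ["person_a", "person_b", "person_c"],
    "physiological": ["fatigue", "stress", "relaxation"],
    "emotional": ["happiness", "sadness", "anger", "surprise"],
    "environmental": ["indoor", "outdoor", "daytime", "nighttime"],
}

Probabilities = Dict[str, List[float]]


def recognize(image: object, model: Callable[[object], Probabilities]) -> Dict[str, str]:
    """Step 202: run the model and pick the most probable label per head."""
    probs = model(image)
    result = {}
    for head, vocab in LABELS.items():
        p = probs[head]
        best = max(range(len(p)), key=p.__getitem__)
        result[head] = vocab[best]
    return result


def fake_model(image: object) -> Probabilities:
    """Stand-in for the trained identity state consensus model (assumption)."""
    return {
        "identity": [0.1, 0.7, 0.2],
        "physiological": [0.6, 0.3, 0.1],
        "emotional": [0.05, 0.05, 0.1, 0.8],
        "environmental": [0.5, 0.2, 0.2, 0.1],
    }


def render(result: Dict[str, str]) -> str:
    """Step 203: format the results for display on a user interface."""
    return "; ".join(f"{head}: {label}" for head, label in result.items())
```

In a deployment such as the one in Figure 1, `recognize` would run on server 104 and the string produced by `render` would be returned to terminal 102 for display.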

[0032] Furthermore, the identity state consensus model is trained within a multi-task learning framework, and the training method includes the following steps 301 to 304:

[0033] Step 301, construct the training dataset; specifically including:

[0034] Obtain each facial image to be identified.

[0035] Each facial image to be identified is preprocessed to obtain the sample facial image data. The preprocessing operations include image scaling, normalization, and data augmentation. Image scaling uniformly resizes the facial image to a specified size, such as 224×224 pixels. Normalization maps the pixel values of the facial image to the range 0 to 1, or standardizes them according to the needs of the target network (e.g., subtracting the mean and dividing by the standard deviation). Data augmentation randomly rotates, translates, scales, and flips the facial image and adjusts its brightness to increase the diversity of the data and the robustness of the model.
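The preprocessing operations can be illustrated with a small, dependency-free sketch. It assumes a grayscale image stored as nested lists of 8-bit values; the nearest-neighbour resize, the 50% flip probability, and the ±20% brightness range are illustrative choices that the application does not specify, and rotation and translation are omitted for brevity.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def resize_nearest(img: Image, height: int, width: int) -> Image:
    """Scale an image to (height, width) with nearest-neighbour sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def normalize(img: Image) -> Image:
    """Map 8-bit pixel values into the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in img]


def standardize(img: Image, mean: float, std: float) -> Image:
    """Alternative to normalize: subtract the mean, divide by the std."""
    return [[(p - mean) / std for p in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Randomly flip horizontally and jitter brightness by up to 20%."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    factor = rng.uniform(0.8, 1.2)
    # Clamp so that brightened pixels stay inside [0, 1].
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]


def preprocess(img: Image, size: int = 224, seed: int = 0) -> Image:
    """Scale, normalize, then augment one facial image."""
    rng = random.Random(seed)
    return augment(normalize(resize_nearest(img, size, size)), rng)
```

Augmentation is normally applied only to training samples; at inference time the scaling and normalization steps alone would be used.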

[0036] Each sample facial image data is labeled to obtain the corresponding identity label, physiological state label, emotional state label, and environmental state label. Physiological state labels include fatigue, stress, and relaxation; emotional state labels include happiness, sadness, anger, and surprise; and environmental state labels include indoor, outdoor, daytime, and nighttime. These labels help the model distinguish facial features in different contexts during training, thereby improving its accuracy and robustness in practical applications. Because the accuracy of the label data is crucial to model performance, the labeling operation combines automatic labeling tools with manual labeling to ensure accurate and consistent labels.

[0037] A training dataset is constructed based on the facial image data of each sample, as well as the corresponding identity label, physiological state label, emotional state label, and environmental state label.
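One way to represent a training sample and its four labels is sketched below. The `Sample` class, the `build_dataset` helper, and the record field names are hypothetical; the state vocabularies reuse the examples given in the description, and each label is encoded as an integer class index so that it can be used with a softmax output.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

STATE_VOCAB: Dict[str, List[str]] = {
    "physiological": ["fatigue", "stress", "relaxation"],
    "emotional": ["happiness", "sadness", "anger", "surprise"],
    "environmental": ["indoor", "outdoor", "daytime", "nighttime"],
}


@dataclass(frozen=True)
class Sample:
    """One training sample: image data plus four class-index labels."""
    image: Sequence[Sequence[float]]
    identity: int
    physiological: int
    emotional: int
    environmental: int


def build_dataset(records: List[dict], identity_vocab: List[str]) -> List[Sample]:
    """Encode string labels as indices; unknown labels raise ValueError."""
    dataset = []
    for rec in records:
        dataset.append(
            Sample(
                image=rec["image"],
                identity=identity_vocab.index(rec["identity"]),
                physiological=STATE_VOCAB["physiological"].index(rec["physiological"]),
                emotional=STATE_VOCAB["emotional"].index(rec["emotional"]),
                environmental=STATE_VOCAB["environmental"].index(rec["environmental"]),
            )
        )
    return dataset
```

Failing loudly on an unknown label is a deliberate choice here: it surfaces inconsistencies between automatic and manual labeling before training starts.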

[0038] Step 302: Construct the target network under the multi-task learning framework; the target network includes a basic neural network, an identity recognition network, and a state recognition network.

[0039] Step 303: Input each sample of the training dataset into the target network to obtain the predicted recognition result; the predicted recognition result includes the predicted identity recognition result and the predicted state recognition result.

[0040] Furthermore, the step of inputting each sample from the training dataset into the target network to obtain the predicted identity recognition result and the predicted state recognition result specifically includes:

[0041] A pre-trained convolutional neural network (such as ResNet, VGG, or Inception) is used as the base neural network to extract low-level and high-level features of the face (such as edge features, texture features, and semantic features). The pre-trained convolutional neural network includes multiple convolutional layers, pooling layers, and non-linear activation function layers. The sample facial image data is sequentially input into these layers to extract multi-level features, resulting in shared features. Simultaneously, this feature extraction network is used as a shared layer in the target network within a multi-task learning framework to extract general features for identity recognition and state recognition.

[0042] The shared features are input into the identity recognition network, which includes several first fully connected layers and a first softmax layer. The shared features are sequentially input into the several first fully connected layers for feature integration to obtain a first feature vector. The first feature vector is input into the first softmax layer for identity recognition to obtain the predicted identity recognition result.

[0043] The shared features are input into the state recognition network, which includes several second fully connected layers, a second softmax layer, a third softmax layer, and a fourth softmax layer. The shared features are sequentially input into the several second fully connected layers for feature integration to obtain a second feature vector. The second feature vector is input into the second softmax layer for physiological state recognition to obtain a predicted physiological state recognition result, into the third softmax layer for emotional state recognition to obtain a predicted emotional state recognition result, and into the fourth softmax layer for environmental state recognition to obtain a predicted environmental state recognition result.
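The two task networks that sit on top of the shared features can be sketched in plain Python as follows. This is an illustration under several assumptions: the convolutional backbone is omitted and its output is passed in as a flat feature vector, each network uses a single fully connected layer with ReLU, and each softmax layer is taken to include a linear projection to the number of classes. The class name `MultiTaskHeads`, the layer sizes, and the weight initialization are all hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def make_layer(out_dim: int, in_dim: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def affine(x: Vector, layer: Layer) -> Vector:
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskHeads:
    """Identity network and state network fed by the same shared features."""

    def __init__(self, feat_dim: int, hidden: int, n_ids: int,
                 n_phys: int, n_emo: int, n_env: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.id_fc = make_layer(hidden, feat_dim, rng)     # first FC layer
        self.state_fc = make_layer(hidden, feat_dim, rng)  # second FC layer
        self.out = {
            "identity": make_layer(n_ids, hidden, rng),        # first softmax
            "physiological": make_layer(n_phys, hidden, rng),  # second softmax
            "emotional": make_layer(n_emo, hidden, rng),       # third softmax
            "environmental": make_layer(n_env, hidden, rng),   # fourth softmax
        }

    def forward(self, shared: Vector) -> Dict[str, Vector]:
        v1 = relu(affine(shared, self.id_fc))     # first feature vector
        v2 = relu(affine(shared, self.state_fc))  # second feature vector
        preds = {}
        for head, layer in self.out.items():
            features = v1 if head == "identity" else v2
            preds[head] = softmax(affine(features, layer))
        return preds
```

The point of the structure is that the three state outputs all read the same second feature vector, while the identity output reads its own first feature vector, and both vectors derive from one shared backbone output.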

[0044] Step 304: Based on the predicted recognition results and the corresponding label data, determine the total loss, and update the parameters of the target network according to the total loss to obtain the identity state consensus model; the label data includes the identity label, physiological state label, emotional state label and environmental state label.

[0045] Furthermore, the step of determining the total loss based on the predicted identification results and corresponding label data, and updating the parameters of the target network based on the total loss to obtain the identity state consensus model, specifically includes:

[0046] The first loss value between the predicted identity recognition result and the corresponding identity label is calculated using the first cross-entropy loss function; the second loss value between the predicted physiological state recognition result and the corresponding physiological state label is calculated using the second cross-entropy loss function; the third loss value between the predicted emotional state recognition result and the corresponding emotional state label is calculated using the third cross-entropy loss function; the fourth loss value between the predicted environmental state recognition result and the corresponding environmental state label is calculated using the fourth cross-entropy loss function; and the first, second, third, and fourth loss values are combined as a weighted sum using a weighted loss function to obtain the total loss.
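The total loss described above has the form L_total = w1·L1 + w2·L2 + w3·L3 + w4·L4, where each Li is a cross-entropy loss. A minimal sketch for a single sample is given below; the application does not state the weight values, so the weights passed in are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def cross_entropy(probs: List[float], target: int, eps: float = 1e-12) -> float:
    """Cross-entropy for one sample: -log of the probability of the true class."""
    return -math.log(max(probs[target], eps))  # eps guards against log(0)


def total_loss(preds: Dict[str, List[float]],
               labels: Dict[str, int],
               weights: Dict[str, float]) -> float:
    """Weighted sum of the four per-task cross-entropy losses."""
    return sum(
        weights[head] * cross_entropy(preds[head], labels[head])
        for head in weights
    )
```

The loss weights are among the hyperparameters that the description later says are tuned on the validation set, so they are best treated as configuration rather than constants.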

[0047] Backpropagation is performed based on the total loss to calculate the gradient of the target network parameters.

[0048] Based on the gradient of the target network parameters, the Adam optimizer is used to update the parameters of the target network, and the above operation is repeated until the performance of the target network on the validation set reaches the expected standard, thereby obtaining the identity state consensus model.
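For reference, the Adam update rule applied to a flat list of parameters is sketched below. The hyperparameter defaults (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly used ones and are assumptions here; the application does not specify them. In practice a deep learning framework's built-in optimizer would be used instead of a hand-written one.

```python
import math
from typing import List


class Adam:
    """Adam optimizer over a flat list of scalar parameters."""

    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.t = 0                 # number of update steps taken

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return updated parameters given the current gradients."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction compensates for the zero initialization.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

As a usage example, repeatedly calling `step` with the gradient `2 * (x - 3)` of the function (x − 3)² drives `x` toward 3, which mirrors how the total loss gradient drives the target network parameters during training.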

[0049] In addition, the training method also includes:

[0050] A validation dataset is set up. Based on this dataset, the performance of identity recognition and state recognition is evaluated using metrics such as accuracy, precision, recall, and F1 score. Hyperparameters such as learning rate, batch size, and loss weights are adjusted according to the validation results to optimize model performance. Cross-validation is employed: the dataset is divided into K parts, and in each of K runs one part serves as the validation set while the remaining K-1 parts form the training set. This checks the model's consistency and stability across different data partitions and reduces the risk of overfitting. The trained multi-task learning model is then deployed to a high-performance computing server.
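The evaluation metrics named above can be computed per task from lists of true and predicted class indices. The sketch below uses macro averaging for the multi-class F1 score, which is one common choice and an assumption here, since the application does not say how per-class scores are aggregated. The function names are hypothetical.

```python
from typing import Dict, List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_scores(y_true: List[int], y_pred: List[int],
                     n_classes: int) -> List[Dict[str, float]]:
    """Precision, recall, and F1 for each class (0.0 when undefined)."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append({"precision": precision, "recall": recall, "f1": f1})
    return scores


def macro_f1(y_true: List[int], y_pred: List[int], n_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred, n_classes)
    return sum(s["f1"] for s in scores) / n_classes
```

These functions would be applied separately to the identity, physiological, emotional, and environmental outputs, giving one set of metrics per task.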

[0051] Furthermore, the trained model parameters and structure are exported in a deployable format (such as ONNX or TensorFlow SavedModel). A deep learning inference environment is then configured on the high-performance computing server, and the exported model is loaded.

[0052] Compared with existing technologies, the above-described identity state consensus method based on multi-task learning has the following advantages:

[0053] 1. Multi-task learning model: Existing technologies typically perform identity recognition or state recognition separately, employing a single-task learning model. This application instead uses a multi-task learning model that relies on shared features to perform identity recognition and state recognition simultaneously. This approach not only improves data utilization and training efficiency but also enhances the model's generalization ability and robustness. Furthermore, the overall performance of the model is optimized by adjusting the weights of the identity recognition loss and the state recognition losses.

[0054] 2. Optimization of Deep Learning Network Architecture: Existing technologies use general-purpose deep learning architectures (such as VGG, ResNet, etc.) for identity recognition or state recognition, but these are not necessarily optimized for specific tasks. This application, however, optimizes and adjusts a pre-trained deep learning architecture for specific tasks. For example, the number of network layers, filter size, and activation function are adjusted according to the specific needs of identity recognition or state recognition to improve the effectiveness of feature extraction; the multi-layered structure of the deep network extracts features at different levels, enabling the model to capture more complex and nuanced facial features.

[0055] 3. Data Augmentation and Diversity Strategies: Existing technologies may only perform basic data augmentation (such as rotation and translation) to increase the diversity of the dataset. This application employs more comprehensive and diverse data augmentation strategies to ensure stable model performance in various real-world scenarios. For example, adjusting image brightness and contrast simulates facial images under different lighting conditions; randomized expression and pose variations are implemented to increase the diversity of the training data.

[0056] 4. Adaptive Learning Rate and Optimizer Selection: Existing technologies may use a fixed learning rate and a conventional optimizer (such as SGD) for model training. This application, however, employs an adaptive learning rate and an advanced optimizer (such as Adam) for model training to accelerate convergence and improve model performance.

[0057] 5. Rigorous Model Validation and Tuning: Existing technologies may not be rigorous enough in model validation and tuning, leading to unstable model performance in practical applications. This application, however, ensures high model performance across various scenarios through a rigorous validation process and hyperparameter tuning. For example, multiple metrics such as accuracy, precision, recall, and F1 score are used to evaluate model performance, comprehensively measuring its effectiveness; cross-validation is employed to ensure consistency and stability of the model across different data partitions.
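The K-fold cross-validation mentioned in item 5 can be sketched as an index generator. The function name `k_fold_indices` and the shuffling seed are hypothetical, and the application does not state a value for K.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each sample appears in exactly one validation fold, so averaging a metric over the K runs gives an estimate that does not depend on a single lucky or unlucky split.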

[0058] This application also provides an application scenario for the above-described identity state consensus method. Specifically, the method provided in this embodiment can be applied in a facial image recognition scenario, which includes an identity and state recognition stage and a stage in which the identity and state recognition results are displayed.

[0059] In one exemplary embodiment, a computer device is provided, which may be a server or a terminal, and whose internal structure may be as shown in Figure 4. The computer device includes a processor, memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected via a system bus, and the communication interface is connected to the system bus via the I/O interface. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media store the operating system, computer programs, and a database. The internal memory provides the environment in which the operating system and computer programs stored in the non-volatile storage media run. The database stores facial image processing data. The I/O interface is used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When executed by the processor, the computer program implements an identity state consensus method based on multi-task learning.

[0060] Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of the portion of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.

[0061] In one exemplary embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.

[0062] In one exemplary embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above-described method embodiments.

[0063] In one exemplary embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above-described method embodiments.

[0064] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.

[0065] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

[0066] The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, or quantum computing-based data processing logic devices.

[0067] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0068] This document uses specific examples to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A consensus method for identity state based on multi-task learning, characterized in that, The identity state consensus method based on multi-task learning includes: Acquire facial images; The facial image is input into the identity state consensus model to obtain identity recognition results and state recognition results; the state recognition results include physiological state recognition results, emotional state recognition results, and environmental state recognition results. The identity state consensus model is trained within a multi-task learning framework, and the training method is as follows: Construct a training dataset; each sample in the training dataset includes sample facial image data, as well as corresponding identity labels, physiological state labels, emotional state labels, and environmental state labels; A target network is constructed under a multi-task learning framework; the target network includes a basic neural network, an identity recognition network, and a state recognition network; wherein, the basic neural network is used to extract features from sample facial image data to obtain shared features; Each sample in the training dataset is input into the target network to obtain the predicted recognition result; the predicted recognition result includes the predicted identity recognition result and the predicted state recognition result. Based on the predicted identification results and the corresponding label data, the total loss is determined, and the parameters of the target network are updated according to the total loss to obtain the identity state consensus model; the label data includes the identity label, physiological state label, emotional state label and environmental state label.

2. The identity state consensus method based on multi-task learning according to claim 1, characterized in that, Constructing the training dataset specifically includes: Acquire each face image to be identified; Each facial image to be identified is preprocessed to obtain facial image data for each sample; the preprocessing operations include image scaling, normalization, and data augmentation. Each sample facial image data is labeled to obtain the corresponding identity label, physiological state label, emotional state label and environmental state label for each sample facial image data. A training dataset is constructed based on the facial image data of each sample, as well as the corresponding identity label, physiological state label, emotional state label, and environmental state label.

3. The identity state consensus method based on multi-task learning according to claim 1, characterized in that, Each sample from the training dataset is input into the target network to obtain predicted identity recognition results and predicted state recognition results, specifically including: The sample facial image data is input into a basic neural network for feature extraction to obtain shared features; The shared features are input into an identity recognition network for identity recognition to obtain a predicted identity recognition result. The shared features are input into a state recognition network for state recognition to obtain a predicted state recognition result.

4. The identity state consensus method based on multi-task learning according to claim 3, characterized in that, The base neural network is a pre-trained convolutional neural network; the pre-trained convolutional neural network includes multiple convolutional layers, pooling layers, and non-linear activation function layers; the step of inputting sample facial image data into the base neural network for feature extraction to obtain shared features specifically includes: The sample facial image data is sequentially input into multiple convolutional layers, pooling layers, and nonlinear activation function layers to extract multi-level features from the sample facial image data, thereby obtaining the shared features.

5. The identity state consensus method based on multi-task learning according to claim 3, characterized in that the identity recognition network includes several first fully connected layers and one first softmax layer; and inputting the shared features into the identity recognition network for identity recognition to obtain a predicted identity recognition result specifically includes: sequentially inputting the shared features into the several first fully connected layers for feature integration to obtain a first feature vector; and inputting the first feature vector into the first softmax layer for identity recognition to obtain the predicted identity recognition result.

6. The identity state consensus method based on multi-task learning according to claim 3, characterized in that the state recognition network includes several second fully connected layers, a second softmax layer, a third softmax layer, and a fourth softmax layer; and inputting the shared features into the state recognition network for state recognition to obtain a predicted state recognition result specifically includes: sequentially inputting the shared features into the several second fully connected layers for feature integration to obtain a second feature vector; inputting the second feature vector into the second softmax layer for physiological state recognition to obtain a predicted physiological state recognition result; inputting the second feature vector into the third softmax layer for emotional state recognition to obtain a predicted emotional state recognition result; and inputting the second feature vector into the fourth softmax layer for environmental state recognition to obtain a predicted environmental state recognition result.
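The state recognition network of claim 6 can be sketched as one shared fully connected stage feeding three softmax outputs. The sketch below is illustrative only and rests on several assumptions not stated in the claims: a single fully connected layer stands in for the "several" second fully connected layers, a ReLU activation follows it, each softmax layer is modeled as a linear projection to the class count followed by softmax, and the weights are random rather than trained. The class `StateHead`, the helper names, and the layer sizes are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class StateHead:
    """Shared fully connected stage followed by one softmax output per state task."""

    def __init__(self, n_shared, n_hidden, n_classes, rng):
        self.fc = init_linear(n_shared, n_hidden, rng)
        self.out = {task: init_linear(n_hidden, k, rng) for task, k in n_classes.items()}

    def forward(self, shared_features):
        # "Second feature vector" of claim 6, computed once and reused by all three outputs.
        second_vector = relu(linear(shared_features, *self.fc))
        return {
            task: softmax(linear(second_vector, *params))
            for task, params in self.out.items()
        }


# Hypothetical sizes: 8 shared features, 4 hidden units, 3/5/2 classes per task.
head = StateHead(
    8, 4, {"physiological": 3, "emotional": 5, "environmental": 2}, random.Random(0)
)
state_probs = head.forward([0.5] * 8)
```

Computing the second feature vector once and branching only at the output stage is what lets the three state tasks share parameters, which is the efficiency argument behind the multi-task design.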

7. The identity state consensus method based on multi-task learning according to claim 1, characterized in that determining the total loss based on the predicted recognition results and corresponding label data, and updating the parameters of the target network based on the total loss to obtain the identity state consensus model, specifically includes: calculating, using a first cross-entropy loss function, a first loss value between the predicted identity recognition result and the corresponding identity label; calculating, using a second cross-entropy loss function, a second loss value between the predicted physiological state recognition result and the corresponding physiological state label; calculating, using a third cross-entropy loss function, a third loss value between the predicted emotional state recognition result and the corresponding emotional state label; calculating, using a fourth cross-entropy loss function, a fourth loss value between the predicted environmental state recognition result and the corresponding environmental state label; weighting the first, second, third, and fourth loss values using a weighted loss function to obtain the total loss; performing backpropagation based on the total loss to calculate the gradients of the target network parameters; and updating the parameters of the target network using the Adam optimizer based on the gradients of the target network parameters to obtain the identity state consensus model.
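The loss weighting and parameter update of claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the weighted loss function is taken to be a plain weighted sum of the four cross-entropy values, the task weights are arbitrary example values, and the Adam hyperparameters are the commonly used defaults rather than values given in the application. Backpropagation is omitted, so the gradients passed to the optimizer here are placeholders. The names `cross_entropy`, `total_loss`, and `Adam` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(max(probs[label], 1e-12))


def total_loss(predictions, labels, task_weights):
    """Weighted sum of the per-task cross-entropy losses (assumed weighting form)."""
    return sum(
        task_weights[task] * cross_entropy(predictions[task], labels[task])
        for task in task_weights
    )


class Adam:
    """Standard Adam update with bias correction, applied to a flat parameter list."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0                 # step counter

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Hypothetical predictions, labels, and task weights, for illustration only.
predictions = {
    "identity": [0.5, 0.5],
    "physiological": [0.25, 0.75],
    "emotional": [1.0, 0.0],
    "environmental": [0.5, 0.5],
}
labels = {"identity": 0, "physiological": 1, "emotional": 0, "environmental": 1}
task_weights = {"identity": 1.0, "physiological": 0.5, "emotional": 0.5, "environmental": 0.5}
loss = total_loss(predictions, labels, task_weights)

# Placeholder gradients; in training they would come from backpropagating `loss`.
optimizer = Adam(n_params=2)
new_params = optimizer.step([1.0, 1.0], [1.0, -2.0])
```

Because a single scalar loss is backpropagated, gradients from all four tasks reach the basic neural network together, while each recognition network receives gradients only from its own task or tasks.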

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the identity state consensus method based on multi-task learning according to any one of claims 1-7.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the identity state consensus method based on multi-task learning according to any one of claims 1-7.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the identity state consensus method based on multi-task learning according to any one of claims 1-7.