A method and device for facial expression recognition based on multi-scale attention mechanism
A neural network model with a multi-scale attention mechanism extracts features at different scales, which reduces the impact of lighting, occlusion, and pose on facial expression recognition in existing technologies and thereby improves recognition accuracy and efficiency.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SOUTH CHINA NORMAL UNIV
- Filing Date: 2022-06-15
- Publication Date: 2026-06-30
AI Technical Summary
Existing facial expression recognition algorithms are affected by factors such as lighting, occlusion, and posture in real-world environments, resulting in poor recognition performance.
A neural network model based on a multi-scale attention mechanism is adopted, including a feature extraction module, a multi-scale attention mechanism module, and a feature classification module. The multi-scale attention mechanism extracts features at different scales, reducing the influence of lighting, occlusion, and pose factors.
It improves the accuracy and efficiency of facial expression recognition and reduces the impact of lighting, occlusion and posture factors on recognition.
Figure CN115273171B_ABST
Description
Technical Field
[0001] This invention relates to the field of signal processing technology, and in particular to a method, apparatus, device, and storage medium for facial expression recognition based on a multi-scale attention mechanism.
Background Technology
[0002] Facial expressions contain rich feature information and play an important role in interpersonal communication, serving as the most direct and natural signal of human emotion in daily interactions. With the rapid development of artificial intelligence technology, facial expression recognition has been widely applied in fields such as human-computer interaction, driver fatigue detection, intelligent healthcare, and distance education. Although many facial expression recognition algorithms have been proposed, their recognition accuracy in real-world environments still needs improvement because of factors such as lighting, occlusion, and pose.
Summary of the Invention
[0003] Based on this, the purpose of the present invention is to provide a facial expression recognition method, device, equipment, and storage medium based on a multi-scale attention mechanism. By using the multi-scale attention mechanism, features at different scales are extracted, and facial expression recognition is performed on facial expression images based on these features. This avoids the influence of factors such as lighting, occlusion, and pose, thereby improving the accuracy and efficiency of expression recognition.
[0004] In a first aspect, embodiments of this application provide a facial expression recognition method based on a multi-scale attention mechanism, comprising the following steps:
[0005] A neural network model based on a multi-scale attention mechanism is constructed, wherein the neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence, and the multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module.
[0006] A facial expression image to be identified is obtained, and the facial expression image is input into the neural network model. Based on the feature extraction module, the basic features of the facial expression image are obtained.
[0007] The basic features are input into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0008] The first multi-scale attention weight parameter and the first multi-scale weighted feature corresponding to the basic feature are input into the second multi-scale attention module of the multi-scale attention mechanism module to obtain the second multi-scale weighted feature output by the second multi-scale attention module;
[0009] Based on the second multi-scale weighted features and the feature classification module, the expression recognition probability of the facial expression image is obtained, and based on the expression recognition probability, the expression recognition result of the facial expression image is obtained.
[0010] In a second aspect, embodiments of this application provide a facial expression recognition device based on a multi-scale attention mechanism, comprising:
[0011] The model building module is used to build a neural network model based on a multi-scale attention mechanism. The neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence. The multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module.
[0012] The basic feature acquisition module is used to obtain the facial expression image to be identified, input the facial expression image into the neural network model, and obtain the basic features of the facial expression image according to the feature extraction module.
[0013] The first multi-scale weighted feature acquisition module is used to input the basic features into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0014] The second multi-scale weighted feature acquisition module is used to input the first multi-scale attention weight parameters and the first multi-scale weighted features corresponding to the basic features into the second multi-scale attention module of the multi-scale attention mechanism module, and obtain the second multi-scale weighted features output by the second multi-scale attention module.
[0015] The expression recognition module is used to obtain the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and to obtain the expression recognition result of the facial expression image based on the expression recognition probability.
[0016] In a third aspect, embodiments of this application provide a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the facial expression recognition method based on a multi-scale attention mechanism as described in the first aspect.
[0017] In a fourth aspect, embodiments of this application provide a storage medium storing a computer program that, when executed by a processor, implements the steps of the facial expression recognition method based on a multi-scale attention mechanism as described in the first aspect.
[0018] In this application embodiment, a method, apparatus, device, and storage medium for facial expression recognition based on a multi-scale attention mechanism are provided. By using the multi-scale attention mechanism, features at different scales are extracted, and facial expression recognition is performed based on these features. This avoids the influence of factors such as lighting, occlusion, and pose, thereby improving the accuracy and efficiency of expression recognition.
[0019] To better understand and implement this invention, a detailed description is provided below in conjunction with the accompanying drawings.
Attached Figure Description
[0020] Figure 1 is a flowchart of a facial expression recognition method based on a multi-scale attention mechanism provided in one embodiment of this application;
[0021] Figure 2 is a schematic diagram of step S2 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application;
[0022] Figure 3 is a schematic diagram of step S3 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application;
[0023] Figure 4 is a schematic diagram of the structure of the first multi-scale attention module of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application;
[0024] Figure 5 is a schematic diagram of step S3 of the facial expression recognition method based on a multi-scale attention mechanism provided in another embodiment of this application;
[0025] Figure 6 is a schematic diagram of step S4 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application;
[0026] Figure 7 is a schematic diagram of step S5 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application;
[0027] Figure 8 is a schematic diagram of the structure of a facial expression recognition device based on a multi-scale attention mechanism provided in one embodiment of this application;
[0028] Figure 9 is a schematic diagram of the structure of a computer device provided in one embodiment of this application.
Detailed Implementation
[0029] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0030] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “an,” and “the” used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
[0031] It should be understood that although the terms first, second, third, etc., may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to a determination."
[0032] Referring to Figure 1, which is a flowchart of a facial expression recognition method based on a multi-scale attention mechanism according to an embodiment of this application, the method includes the following steps:
[0033] S1: Construct a neural network model based on a multi-scale attention mechanism.
[0034] The execution subject of the facial expression recognition method based on the multi-scale attention mechanism is a recognition device for that method (hereinafter referred to as the recognition device). In an optional embodiment, the recognition device may be a computer device, a server, or a server cluster composed of multiple computer devices.
[0035] The neural network model is an algorithmic mathematical model that mimics the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Neural network structures include ResNet-series convolutional neural network structures, Transformer- and BERT-series self-attention neural network structures, and LSTM- and ELMo-series sequence neural network structures, among others.
[0036] In this embodiment, the recognition device adopts a ResNet18 convolutional neural network structure to construct a neural network model based on a multi-scale attention mechanism. The neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence.
[0037] The feature extraction module includes a 7×7 convolutional layer, a pooling layer, and several residual blocks connected in sequence. The residual blocks include a sub-convolutional layer containing two 3×3 kernels, a batch normalization (BN) layer, and a ReLU activation function connected in sequence.
[0038] The multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module. The first multi-scale attention module includes a first spatial information extraction module and a first channel information extraction module. The second multi-scale attention module includes a second spatial information extraction module and a second channel information extraction module. Both the first spatial information extraction module and the second spatial information extraction module include 3×3 convolutional layers.
[0039] S2: Obtain the facial expression image to be identified, input the facial expression image into the neural network model, and obtain the basic features of the facial expression image according to the feature extraction module.
[0040] The facial expression image reflects the facial expression of the current person, such as happy, angry, sad, surprised, grief, disgust, or neutral.
[0041] In this embodiment, the recognition device can acquire a facial expression image to be recognized input by the user, or it can acquire it from a preset database. The acquired facial expression image is input into the neural network model. According to the feature extraction module, the basic features of the facial expression image are obtained. Specifically, the facial expression image passes through a 7×7 convolutional layer to obtain an output feature map, which is then input into a pooling layer for dimensionality reduction processing. The dimensionality-reduced output feature map is then obtained and processed through several residual blocks for residual connection processing to prevent gradient vanishing and network degradation. The resulting output feature map after residual connection processing serves as the basic features of the facial expression image.
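The dimensionality reduction performed by the 7×7 convolutional layer and the pooling layer can be illustrated with a short size calculation. This is a sketch only: the strides, paddings, and the 3×3 pooling window below are assumptions taken from the standard ResNet-18 stem and are not stated in this application, and the 224×224 input is the crop size given for the preprocessing step.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution or pooling window slides over the input."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed values (standard ResNet-18 stem, not stated in the text):
# 7x7 convolution with stride 2 and padding 3,
# followed by 3x3 pooling with stride 2 and padding 1.
after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)
```

Under these assumptions the feature map is halved twice before it reaches the residual blocks.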
[0042] Referring to Figure 2, which is a schematic diagram of step S2 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application, step S2 includes steps S201 to S202, as follows:
[0043] S201: Preprocess the facial expression image to obtain a preprocessed facial expression image, wherein the preprocessing includes image face cropping and image horizontal flipping.
[0044] In this embodiment, the recognition device preprocesses the facial expression image to obtain a preprocessed facial expression image. The preprocessing includes image face cropping and image horizontal flipping. Specifically, the recognition device performs face detection and face alignment on the facial expression image, then crops the detected faces and further adjusts them to the same 256×256 size facial image. Next, to prevent overfitting, random cropping is used to randomly crop the facial image to a size of 224×224, and random horizontal flipping is used for data augmentation.
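The random cropping and random horizontal flipping described above can be sketched as follows. Face detection, alignment, and resizing to 256×256 are assumed to have been done already; the image is represented as a nested list of pixel values, and the helper names and the 0.5 flip probability are illustrative assumptions rather than values given in this application.

```python
import random


def random_crop(image, crop_size, rng):
    """Cut a crop_size x crop_size window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror the image left to right with the given probability (assumed 0.5)."""
    if rng.random() < probability:
        return [row[::-1] for row in image]
    return image


rng = random.Random(0)
# Stand-in for an aligned 256x256 grayscale face image.
face = [[(r * 256 + c) % 255 for c in range(256)] for r in range(256)]
augmented = random_horizontal_flip(random_crop(face, 224, rng), rng)
print(len(augmented), len(augmented[0]))
```

Because the crop position and the flip are drawn anew for every sample, the model sees a slightly different 224×224 view of the same face on each pass.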
[0045] S202: Input the preprocessed facial expression image into the neural network model, and obtain the basic features of the preprocessed facial expression image according to the feature extraction module.
[0046] In this embodiment, the recognition device inputs the preprocessed facial expression image into the neural network model, and obtains the basic features of the preprocessed facial expression image according to the feature extraction module, thereby further improving the efficiency and accuracy of basic feature acquisition.
[0047] S3: Input the basic features into the first multi-scale attention module of the multi-scale attention mechanism module, and obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0048] In this embodiment, the recognition device inputs the basic features into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0049] In this embodiment, the recognition device inputs the basic features into the first multi-scale attention module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0050] Referring to Figures 3 and 4: Figure 3 is a schematic diagram of step S3 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application, which includes steps S301 to S304, and Figure 4 is a schematic diagram of the structure of the first multi-scale attention module of the same method. The steps are as follows:
[0051] S301: Divide the basic features into several basic feature subsets, input the several basic feature subsets into the first spatial information extraction module, obtain the first spatial information corresponding to the first basic feature subset according to the preset first spatial information calculation algorithm, and accumulate the first spatial information corresponding to each basic feature subset to obtain the first spatial information corresponding to each basic feature subset.
[0052] The first spatial information extraction module includes several channels, each channel including a first spatial information calculation algorithm for spatial information calculation. In this embodiment, the recognition device divides the basic features into as many basic feature subsets as there are channels in the first spatial information extraction module, and inputs each basic feature subset into the corresponding channel of the first spatial information extraction module. According to the preset first spatial information calculation algorithm, the first spatial information corresponding to the first basic feature subset is obtained, and the first spatial information corresponding to each basic feature subset is accumulated to obtain the first spatial information corresponding to each basic feature subset. The first spatial information calculation algorithm is as follows:
[0053]
[0054] In the formula, $y_i$ is the first spatial information corresponding to the i-th basic feature subset, $y_{i-1}$ is the first spatial information corresponding to the (i-1)-th basic feature subset, $f_i(\cdot)$ is the convolution function, $x_i$ is the i-th basic feature subset, and $s$ is the number of basic feature subsets.
[0055] By calculating the first spatial information corresponding to each set of basic feature subsets through different channels, feature information at different scales can be obtained, resulting in multi-scale spatial information and improving the accuracy of facial expression recognition.
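The accumulation across subsets can be sketched as follows. The formula itself is not reproduced in this text, so the recurrence used here, $y_1 = f_1(x_1)$ and $y_i = f_i(x_i + y_{i-1})$ for $1 < i \le s$, is an assumption inferred from the symbol definitions and the word "accumulate". The feature layout (a list of one-dimensional channels), the fixed 3-tap kernel standing in for a learned 3×3 convolution, and all function names are likewise illustrative.

```python
def conv3(signal, kernel=(0.25, 0.5, 0.25)):
    """Stand-in for f_i: a 3-tap convolution with zero padding (fixed, not learned)."""
    padded = [0.0] + list(signal) + [0.0]
    return [sum(k * padded[j + t] for t, k in enumerate(kernel))
            for j in range(len(signal))]


def split_into_subsets(channels, s):
    """Divide the channel list into s equally sized subsets x_1..x_s."""
    size = len(channels) // s
    return [channels[i * size:(i + 1) * size] for i in range(s)]


def hierarchical_spatial_info(channels, s):
    """Assumed recurrence: y_1 = f_1(x_1), y_i = f_i(x_i + y_{i-1})."""
    outputs, previous = [], None
    for subset in split_into_subsets(channels, s):
        if previous is not None:
            # Add the previous subset's output before convolving.
            subset = [[a + b for a, b in zip(ch, prev_ch)]
                      for ch, prev_ch in zip(subset, previous)]
        previous = [conv3(ch) for ch in subset]
        outputs.append(previous)
    return outputs


# 8 channels of length 6, split into s = 4 subsets of 2 channels each.
basic_features = [[float(c + j) for j in range(6)] for c in range(8)]
y = hierarchical_spatial_info(basic_features, s=4)
print(len(y), len(y[0]), len(y[0][0]))
```

Under this assumed recurrence, each later subset passes through one more convolution than the one before it, so its output covers a wider neighbourhood of the input; this is one way the different scales mentioned above could arise. The sketch requires the channel count to be divisible by `s`.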
[0056] S302: Input the first spatial information corresponding to each set of basic feature subsets into the first channel information extraction module, and obtain the first channel information corresponding to each set of basic feature subsets according to the preset first channel information calculation algorithm.
[0057] The first channel information extraction module is used to calculate multi-scale attention weight parameters for the first spatial information output by each channel in the first spatial information extraction module, including a first channel information calculation algorithm, wherein the first channel information calculation algorithm is as follows:
[0058] $F_i = \sigma(W_2\,\rho(W_1 y_i))$
[0059] In the formula, $F_i$ is the first channel information corresponding to the i-th basic feature subset, σ() is the sigmoid function, ρ() is the ReLU activation function, and $W_1$ and $W_2$ are the preset first weight coefficient and second weight coefficient, respectively.
[0060] In this embodiment, the recognition device inputs the first spatial information corresponding to each set of basic feature subsets into the first channel information extraction module, and obtains the first channel information corresponding to each set of basic feature subsets according to the preset first channel information calculation algorithm.
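A minimal sketch of the first channel information calculation follows, taking σ as the sigmoid function and ρ as the ReLU activation function, as stated for the second channel information calculation algorithm. How the spatial information $y_i$ is reduced to a vector is not stated in this application, so the sketch simply assumes a per-channel descriptor vector; the matrix sizes and values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def sigmoid(vector):
    return [1.0 / (1.0 + math.exp(-v)) for v in vector]


def channel_info(y_i, w1, w2):
    """F_i = sigmoid(W2 * relu(W1 * y_i))."""
    return sigmoid(matvec(w2, relu(matvec(w1, y_i))))


# Hypothetical sizes: a 4-value descriptor reduced to 2 values and expanded back to 4.
y_i = [0.5, -1.0, 2.0, 0.0]
w1 = [[0.1, 0.2, 0.3, 0.4], [-0.4, 0.3, -0.2, 0.1]]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
f_i = channel_info(y_i, w1, w2)
print(f_i)
```

The sigmoid keeps every entry of `f_i` strictly between 0 and 1, so each entry can act as a weight on one channel.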
[0061] S303: According to the preset first splicing algorithm, the first channel information corresponding to each set of basic feature subsets is spliced to obtain the first multi-scale attention weight parameter.
[0062] The first channel information extraction module also includes a first concatenation algorithm for concatenating multi-scale attention weight parameters, wherein the first concatenation algorithm is:
[0063] $F_P = \mathrm{Concat}([F_1, F_2, F_3, \ldots, F_{i-1}])$
[0064] In the formula, $F_P$ is the first multi-scale attention weight parameter, and Concat() is the concatenation function;
[0065] In this embodiment, the recognition device splices the first channel information corresponding to each set of basic feature subsets according to a preset first splicing algorithm to obtain the first multi-scale attention weight parameters, so that the obtained multi-scale attention weight parameters are more comprehensive.
[0066] S304: Input the basic features and the corresponding first multi-scale attention weight parameters of the basic features into the first feature calculation module, and obtain the first multi-scale weighted features according to the preset first feature extraction algorithm.
[0067] The first feature calculation module includes a first feature extraction algorithm for performing feature calculation, wherein the first feature extraction algorithm is:
[0068] $X_P = F_P \odot X$
[0069] In the formula, $X_P$ is the first multi-scale weighted feature, $F_P$ is the first multi-scale attention weight parameter, $X$ is the basic feature, and $\odot$ is the dot product symbol;
[0070] In this embodiment, the recognition device inputs the basic features and the corresponding first multi-scale attention weight parameters to the first feature calculation module, and obtains the first multi-scale weighted features according to the preset first feature extraction algorithm.
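Steps S303 and S304 can be sketched together. The sketch assumes that the first feature extraction algorithm multiplies the basic feature by the concatenated weights element-wise, with one weight per channel applied to every position of that channel; the function names and the numeric values are illustrative only.

```python
def concat(parts):
    """Concat([F_1, F_2, ...]): join the per-subset weights into one vector."""
    return [value for part in parts for value in part]


def apply_channel_weights(weights, features):
    """Scale every value of a channel by that channel's weight (assumed broadcasting)."""
    return [[w * v for v in channel] for w, channel in zip(weights, features)]


channel_infos = [[0.5, 1.0], [0.0, 0.25]]  # F_1 and F_2 for two subsets
basic_features = [[2.0, 4.0], [1.0, 3.0], [5.0, 6.0], [8.0, 4.0]]  # X with 4 channels

f_p = concat(channel_infos)
x_p = apply_channel_weights(f_p, basic_features)
print(f_p)
print(x_p)
```

A weight near 0 suppresses a channel (the third channel here), while a weight near 1 passes it through unchanged (the second channel).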
[0071] In an optional embodiment, the multi-scale attention mechanism module further includes a residual connection module, wherein the first multi-scale attention module is connected to the residual connection module, and the residual connection module is connected to the second multi-scale attention module. Referring to Figure 5, which is a schematic diagram of step S3 of the facial expression recognition method based on a multi-scale attention mechanism provided in another embodiment of this application, step S3 further includes step S305, as follows:
[0072] S305: Input the first multi-scale weighted feature into the residual connection module, perform residual connection processing, and obtain the first multi-scale weighted feature after residual connection processing.
[0073] In this embodiment, the recognition device inputs the first multi-scale weighted feature into the residual connection module to perform residual connection processing, preventing gradient vanishing and network degradation, and obtaining the first multi-scale weighted feature after residual connection processing.
[0074] S4: Input the first multi-scale attention weight parameter and the first multi-scale weighted feature corresponding to the basic feature into the second multi-scale attention module of the multi-scale attention mechanism module to obtain the second multi-scale weighted feature output by the second multi-scale attention module.
[0075] In this embodiment, the recognition device inputs the first multi-scale attention weight parameter and the first multi-scale weighted feature corresponding to the basic feature into the second multi-scale attention module of the multi-scale attention mechanism module to obtain the second multi-scale weighted feature output by the second multi-scale attention module. The second multi-scale attention module includes a second spatial information extraction module, a second channel information extraction module and a second feature calculation module connected in sequence.
[0076] Referring to Figure 6, which is a schematic diagram of step S4 of the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application, step S4 includes steps S401 to S405, as follows:
[0077] S401: Divide the first multi-scale weighted feature into several groups of first multi-scale weighted feature subsets, input the several groups of first multi-scale weighted feature subsets into the second spatial information extraction module, obtain the second spatial information corresponding to the first group of first multi-scale weighted feature subsets according to the preset second spatial information calculation algorithm, and accumulate the second spatial information corresponding to each group of first multi-scale weighted feature subsets to obtain the second spatial information corresponding to each group of first multi-scale weighted feature subsets.
[0078] The second spatial information extraction module includes several channels. In this embodiment, the recognition device divides the first multi-scale weighted features into as many groups of first multi-scale weighted feature subsets as there are channels in the second spatial information extraction module, and inputs each group into the corresponding channel of the second spatial information extraction module. Based on a preset second spatial information calculation algorithm, the second spatial information corresponding to the first group of first multi-scale weighted feature subsets is obtained. The second spatial information corresponding to each group of first multi-scale weighted feature subsets is then accumulated. The second spatial information calculation algorithm is as follows:
[0079]
[0080] In the formula, $a_l$ is the second spatial information corresponding to the l-th group of first multi-scale weighted feature subsets, $f_l(\cdot)$ is the convolution function, $b_l$ is the l-th group of first multi-scale weighted feature subsets, and $d$ is the number of groups of first multi-scale weighted feature subsets. The second spatial information corresponding to each group of first multi-scale weighted feature subsets is calculated through different channels to further obtain feature information at different scales, thereby obtaining multi-scale spatial information and improving the accuracy of expression recognition.
[0081] S402: Input the second spatial information corresponding to each group of first multi-scale weighted feature subsets into the second channel information extraction module, and obtain the second channel information corresponding to each group of first multi-scale weighted feature subsets according to the preset second channel information calculation algorithm.
[0082] The second channel information extraction module is used to calculate multi-scale attention weight parameters for the second spatial information output by each channel in the second spatial information extraction module, including a second channel information calculation algorithm, wherein the second channel information calculation algorithm is as follows:
[0083] $F_l = \sigma(W_2\,\rho(W_1 a_l))$
[0084] In the formula, $F_l$ is the second channel information corresponding to the l-th group of first multi-scale weighted feature subsets, σ() is the sigmoid function, ρ() is the ReLU activation function, and $W_1$ and $W_2$ are the preset first weight coefficient and second weight coefficient, respectively;
[0085] In this embodiment, the recognition device inputs the second spatial information corresponding to each group of first multi-scale weighted feature subsets into the second channel information extraction module, and obtains the second channel information corresponding to each group of first multi-scale weighted feature subsets according to the preset second channel information calculation algorithm.
[0086] S403: According to the preset second splicing algorithm, the second channel information corresponding to each group of first multi-scale weighted feature subsets is spliced to obtain the second multi-scale attention weight parameters.
[0087] The second concatenation algorithm is as follows:
[0088] $F_C = \mathrm{Concat}([F_1, F_2, F_3, \ldots, F_{l-1}])$
[0089] In the formula, $F_C$ is the second multi-scale attention weight parameter, and Concat() is the concatenation function;
[0090] In this embodiment, the recognition device splices the second channel information corresponding to each group of first multi-scale weighted feature subsets according to a preset second splicing algorithm to obtain the second multi-scale attention weight parameters, so that the obtained multi-scale attention weight parameters are more comprehensive.
[0091] S404: Input the first multi-scale attention weight parameter into the second channel information extraction module, and obtain the fused multi-scale attention weight parameter according to the first multi-scale attention weight parameter, the second multi-scale attention weight parameter and the preset weight fusion algorithm.
[0092] The weight fusion algorithm is as follows:
[0093] $F_{P+C} = \lambda (W_3 F_P) + (1 - \lambda) F_C$
[0094] In the formula, $F_{P+C}$ is the fused multi-scale attention weight parameter, $F_P$ is the first multi-scale attention weight parameter, $F_C$ is the second multi-scale attention weight parameter, λ is the preset fusion coefficient, and $W_3$ is the preset third weight coefficient.
[0095] In this embodiment, the recognition device inputs the first multi-scale attention weight parameter into the second channel information extraction module. Based on the first multi-scale attention weight parameter, the second multi-scale attention weight parameter, and the preset weight fusion algorithm, the device obtains the fused multi-scale attention weight parameter. The multi-scale attention weight parameter is then fused across layers to further improve the accuracy of feature information acquisition, enabling the sharing of feature information at different stages and improving the accuracy of expression recognition.
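The weight fusion algorithm is a convex combination of the projected first-stage weights and the second-stage weights, which can be sketched as follows. The vector length, the identity matrix used for $W_3$, and the value of λ are illustrative assumptions; in practice $W_3$ and λ are preset as described above.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def fuse_weights(f_p, f_c, w3, lam):
    """F_{P+C} = lam * (W3 * F_P) + (1 - lam) * F_C."""
    projected = matvec(w3, f_p)
    return [lam * p + (1.0 - lam) * c for p, c in zip(projected, f_c)]


f_p = [0.2, 0.8, 0.4]  # first multi-scale attention weight parameter
f_c = [0.6, 0.4, 1.0]  # second multi-scale attention weight parameter
w3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity, for illustration
f_pc = fuse_weights(f_p, f_c, w3, lam=0.5)
print(f_pc)
```

Setting λ to 0 keeps only the second-stage weights and setting it to 1 keeps only the projected first-stage weights, so λ controls how much attention information is carried across layers.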
[0096] S405: Input the first multi-scale weighted feature and the fused multi-scale attention weight parameters into the second feature calculation module, and obtain the second multi-scale weighted feature according to the preset second feature extraction algorithm.
[0097] The second feature extraction algorithm is as follows:
[0098] $X_C = F_{P+C} \odot X_P$
[0099] In the formula, $X_C$ is the second multi-scale weighted feature, $F_{P+C}$ is the fused multi-scale attention weight parameter, and $X_P$ is the first multi-scale weighted feature.
[0100] In this embodiment, the recognition device inputs the first multi-scale weighted feature and the fused multi-scale attention weight parameter into the second feature calculation module, and obtains the second multi-scale weighted feature according to the preset second feature extraction algorithm.
[0101] S5: Based on the second multi-scale weighted features and the feature classification module, obtain the expression recognition probability of the facial expression image, and based on the expression recognition probability, obtain the expression recognition result of the facial expression image.
[0102] In this embodiment, the recognition device obtains the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and obtains the expression recognition result of the facial expression image based on the expression recognition probability. The feature classification module includes a fully connected layer and a classification layer connected in sequence.
[0103] Please refer to Figure 7, which is a schematic flow diagram of step S5 in the facial expression recognition method based on a multi-scale attention mechanism provided in an embodiment of this application. Step S5 includes steps S501 to S502, as follows:
[0104] S501: Input the second multi-scale weighted feature into the fully connected layer for dimensionality reduction processing to obtain the dimensionality-reduced second multi-scale weighted feature output by the fully connected layer.
[0105] In this embodiment, the recognition device inputs the second multi-scale weighted feature into the fully connected layer for dimensionality reduction processing to obtain the dimensionality-reduced second multi-scale weighted feature output by the fully connected layer, which is then used for subsequent expression recognition processing in the classification layer.
[0106] S502: Input the second multi-scale weighted feature after dimensionality reduction into the classification layer, obtain the expression polarity probability distribution vector corresponding to the facial expression image according to the preset expression recognition algorithm, obtain the expression polarity corresponding to the dimension with the highest probability according to the expression polarity probability distribution vector, and use the expression polarity as the expression recognition result of the facial expression image.
[0107] The facial expression recognition algorithm is as follows:
[0108] $Y = \mathrm{softmax}(W_4 X_C + b)$
[0109] In the formula, Y is the probability distribution vector of the facial expression polarity, softmax() is the normalized activation function, W4 is the preset fourth weight coefficient, and b is the preset bias coefficient.
[0110] In this embodiment, the recognition device inputs the second multi-scale weighted feature after dimensionality reduction into the classification layer, obtains the expression polarity probability distribution vector corresponding to the facial expression image according to the preset expression recognition algorithm, obtains the expression polarity corresponding to the dimension with the highest probability according to the expression polarity probability distribution vector, and uses the expression polarity as the expression recognition result of the facial expression image.
[0111] Specifically, when $Y = [Y_{\text{positive}}, Y_{\text{negative}}, Y_{\text{neutral}}] = [0.1, 0.7, 0.2]$ is calculated, $Y_{\text{negative}}$ has the highest probability, so the expression polarity corresponding to the dimension with the highest probability is negative. This expression polarity is then used as the expression recognition result of the facial expression image.
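The classification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dimensionality-reduced feature is treated as a plain vector, W4 as a matrix, and b as a per-class bias vector. The names `softmax`, `classify`, and `LABELS` and the toy parameters are hypothetical. The toy bias values are chosen so that the softmax output reproduces the worked example [0.1, 0.7, 0.2].

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    x_c: Sequence[float],
    w4: Sequence[Sequence[float]],
    b: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """Return the label with the highest probability and the full distribution."""
    # Y = softmax(W4 @ X_C + b)
    logits = [sum(w * x for w, x in zip(row, x_c)) + bias for row, bias in zip(w4, b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


LABELS = ("positive", "negative", "neutral")

# Toy parameters (assumed): zero weights and log-probability biases, so the
# softmax output equals the distribution from the worked example.
x_c = [1.0]
w4 = [[0.0], [0.0], [0.0]]
b = [math.log(0.1), math.log(0.7), math.log(0.2)]
label, probs = classify(x_c, w4, b, LABELS)
```

Selecting the index of the largest probability corresponds to taking the expression polarity of the dimension with the highest probability, as described in step S502.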
[0112] Please refer to Figure 8, which is a schematic diagram of a facial expression recognition device based on a multi-scale attention mechanism according to an embodiment of this application. The device can be implemented entirely or partially through software, hardware, or a combination of both. The device 8 includes:
[0113] The model building module 81 is used to build a neural network model based on a multi-scale attention mechanism. The neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence. The multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module.
[0114] The basic feature acquisition module 82 is used to acquire a facial expression image to be identified, input the facial expression image into the neural network model, and obtain the basic features of the facial expression image according to the feature extraction module.
[0115] The first multi-scale weighted feature acquisition module 83 is used to input the basic features into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module.
[0116] The second multi-scale weighted feature acquisition module 84 is used to input the first multi-scale attention weight parameter and the first multi-scale weighted feature corresponding to the basic feature into the second multi-scale attention module of the multi-scale attention mechanism module to obtain the second multi-scale weighted feature output by the second multi-scale attention module.
[0117] The expression recognition module 85 is used to obtain the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and to obtain the expression recognition result of the facial expression image based on the expression recognition probability.
[0118] In this embodiment, the model building module constructs a neural network model based on a multi-scale attention mechanism. The neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence, and the multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module. The basic feature acquisition module obtains a facial expression image to be recognized, inputs it into the neural network model, and obtains the basic features of the facial expression image according to the feature extraction module. The first multi-scale weighted feature acquisition module inputs the basic features into the first multi-scale attention module of the multi-scale attention mechanism module and obtains the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module. The second multi-scale weighted feature acquisition module inputs the first multi-scale attention weight parameters and the first multi-scale weighted features corresponding to the basic features into the second multi-scale attention module of the multi-scale attention mechanism module and obtains the second multi-scale weighted features output by the second multi-scale attention module. The expression recognition module obtains the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and obtains the expression recognition result of the facial expression image based on the expression recognition probability. By using the multi-scale attention mechanism to extract features at different scales and performing expression recognition on facial expression images based on these features, the influence of factors such as lighting, occlusion, and pose is reduced, improving the accuracy and efficiency of expression recognition.
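The data flow between the device modules can be sketched as a simple composition of callables. This is only an illustrative outline: the class name `ExpressionRecognizer`, its field names, and the toy stand-in functions are hypothetical, and each stand-in replaces a trained network component that the embodiment would supply.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class ExpressionRecognizer:
    """Chains the stages in the order described for modules 82 to 85."""

    extract_features: Callable[[Vector], Vector]
    first_attention: Callable[[Vector], Tuple[Vector, Vector]]
    second_attention: Callable[[Vector, Vector], Vector]
    classify: Callable[[Vector], Vector]
    labels: Sequence[str]

    def recognize(self, image: Vector) -> str:
        base = self.extract_features(image)      # basic features
        f_p, x_p = self.first_attention(base)    # first weights and weighted features
        x_c = self.second_attention(f_p, x_p)    # second weighted features
        probs = self.classify(x_c)               # expression recognition probability
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best]


# Toy stand-ins (assumed) so that the pipeline can be run end to end.
recognizer = ExpressionRecognizer(
    extract_features=lambda img: list(img),
    first_attention=lambda x: ([0.5] * len(x), [0.5 * v for v in x]),
    second_attention=lambda f_p, x_p: [w * v for w, v in zip(f_p, x_p)],
    classify=lambda x: [v / sum(x) for v in x],
    labels=("positive", "negative", "neutral"),
)
result = recognizer.recognize([1.0, 6.0, 3.0])
```

Keeping each stage behind its own callable mirrors the modular division of the device, so any stage can be replaced without changing the order in which the first weights, the first weighted features, and the second weighted features are passed along.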
[0119] Please refer to Figure 9, which is a schematic diagram of the structure of a computer device provided in one embodiment of this application. The computer device 10 includes a processor 91, a memory 92, and a computer program 93 stored in the memory 92 and executable on the processor 91. The computer device can store multiple instructions, which are adapted to be loaded by the processor 91 to execute the method steps of the embodiments illustrated in Figures 1 to 3 and Figures 5 to 7. For the specific execution process, refer to the descriptions of the embodiments illustrated in Figures 1 to 3 and Figures 5 to 7, which will not be repeated here.
[0120] The processor 91 may include one or more processing cores. The processor 91 connects to various parts of the server using various interfaces and lines. It executes various functions and processes data of the facial expression recognition device 8 based on a multi-scale attention mechanism by running or executing instructions, programs, code sets, or instruction sets stored in the memory 92, and by calling data from the memory 92. Optionally, the processor 91 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 91 may integrate one or a combination of several of the following: a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and a modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required to be displayed on the touch screen; and the modem handles wireless communication. It is understood that the modem may also be implemented separately as a single chip without being integrated into the processor 91.
[0121] The memory 92 may include random access memory (RAM) or read-only memory. Optionally, the memory 92 may include a non-transitory computer-readable storage medium. The memory 92 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 92 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch instructions), instructions for implementing the various method embodiments described above, etc.; the data storage area may store data involved in the various method embodiments described above, etc. Optionally, the memory 92 may also be at least one storage device located remotely from the aforementioned processor 91.
[0122] This application also provides a storage medium that can store multiple instructions. These instructions are adapted to be loaded by a processor to execute the method steps described in Embodiments 1 to 4 above. For details of the execution process, please refer to the specific descriptions of Embodiments 1 to 4, which will not be repeated here.
[0123] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0124] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0125] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the algorithm. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0126] In the embodiments provided by this invention, it should be understood that the disclosed apparatus / terminal devices and methods can be implemented in other ways. For example, the apparatus / terminal device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0127] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0128] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0129] If the integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms.
[0130] This invention is not limited to the above-described embodiments. If any modifications or variations to this invention do not depart from the spirit and scope of this invention, and if such modifications and variations fall within the scope of the claims and equivalent technologies of this invention, then this invention also intends to include such modifications and variations.
Claims
1. A facial expression recognition method based on a multi-scale attention mechanism, characterized in that it includes the following steps:
A neural network model based on a multi-scale attention mechanism is constructed, wherein the neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence; the multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module; and the first multi-scale attention module includes a first spatial information extraction module and a first channel information extraction module.
A facial expression image to be recognized is obtained and input into the neural network model, and the basic features of the facial expression image are obtained according to the feature extraction module.
The basic features are input into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module. This step includes the following steps:
The basic features are divided into several groups of basic feature subsets, and these groups are input into the first spatial information extraction module. According to a preset first spatial information calculation algorithm, the first spatial information corresponding to the first group of basic feature subsets is obtained, and the first spatial information is then accumulated across the groups to obtain the first spatial information corresponding to each group of basic feature subsets. The first spatial information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first spatial information corresponding to the i-th group of basic feature subsets; the first spatial information corresponding to the (i-1)-th group of basic feature subsets; the convolution function; the i-th group of basic feature subsets; and the number of groups of basic feature subsets.
The first spatial information corresponding to each group of basic feature subsets is input into the first channel information extraction module. According to a preset first channel information calculation algorithm, the first channel information corresponding to each group of basic feature subsets is obtained. The first channel information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first channel information corresponding to the i-th group of basic feature subsets; the sigmoid function; the ReLU activation function; and the preset first weight coefficient and second weight coefficient.
According to a preset first concatenation algorithm, the first channel information corresponding to each group of basic feature subsets is concatenated to obtain the first multi-scale attention weight parameters. The first concatenation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first multi-scale attention weight parameter; and the concatenation function.
The first multi-scale attention weight parameters and the first multi-scale weighted features corresponding to the basic features are input into the second multi-scale attention module of the multi-scale attention mechanism module to obtain the second multi-scale weighted features output by the second multi-scale attention module.
Based on the second multi-scale weighted features and the feature classification module, the expression recognition probability of the facial expression image is obtained, and based on the expression recognition probability, the expression recognition result of the facial expression image is obtained.
2. The facial expression recognition method based on a multi-scale attention mechanism according to claim 1, characterized in that, The step of inputting the facial expression image into the neural network model and obtaining the basic features of the facial expression image based on the feature extraction module includes the following steps: The facial expression image is preprocessed to obtain a preprocessed facial expression image, wherein the preprocessing includes image face cropping and image horizontal flipping; The preprocessed facial expression image is input into the neural network model, and the basic features of the preprocessed facial expression image are obtained according to the feature extraction module.
3. The facial expression recognition method based on a multi-scale attention mechanism according to claim 1, characterized in that:
The first multi-scale attention module further includes a first feature calculation module, which is connected to the first channel information extraction module.
The step of inputting the basic features into the first multi-scale attention module of the multi-scale attention mechanism module, and obtaining the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module, further includes the following steps:
The basic features and the corresponding first multi-scale attention weight parameters are input into the first feature calculation module, and the first multi-scale weighted features are obtained according to a preset first feature extraction algorithm. The first feature extraction algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first multi-scale weighted feature; the first multi-scale attention weight parameter; the basic features; and the element-wise product symbol.
4. The facial expression recognition method based on a multi-scale attention mechanism according to claim 3, characterized in that: The multi-scale attention mechanism module further includes a residual connection module, wherein the first multi-scale attention module is connected to the residual connection module, and the residual connection module is connected to the second multi-scale attention module; Before inputting the first multi-scale attention weight parameter corresponding to the basic feature and the first multi-scale weighted feature into the second multi-scale attention module of the multi-scale attention mechanism module, and obtaining the second multi-scale weighted feature output by the second multi-scale attention module, the method further includes the following steps: The first multi-scale weighted feature is input into the residual connection module for residual connection processing to obtain the first multi-scale weighted feature after residual connection processing.
5. The facial expression recognition method based on a multi-scale attention mechanism according to claim 3 or 4, characterized in that:
The second multi-scale attention module includes a second spatial information extraction module, a second channel information extraction module, and a second feature calculation module connected in sequence.
The step of inputting the first multi-scale attention weight parameter corresponding to the basic feature and the first multi-scale weighted feature into the second multi-scale attention module of the multi-scale attention mechanism module, and obtaining the second multi-scale weighted feature output by the second multi-scale attention module, includes the following steps:
The first multi-scale weighted feature is divided into several groups of first multi-scale weighted feature subsets, and these groups are input into the second spatial information extraction module. According to a preset second spatial information calculation algorithm, the second spatial information corresponding to the first group of first multi-scale weighted feature subsets is obtained, and the second spatial information is then accumulated across the groups to obtain the second spatial information corresponding to each group of first multi-scale weighted feature subsets. The second spatial information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the second spatial information corresponding to the l-th group of first multi-scale weighted feature subsets; the second spatial information corresponding to the (l-1)-th group of first multi-scale weighted feature subsets; the convolution function; the l-th group of first multi-scale weighted feature subsets; and the number of groups of first multi-scale weighted feature subsets.
The second spatial information corresponding to each group of first multi-scale weighted feature subsets is input into the second channel information extraction module. According to a preset second channel information calculation algorithm, the second channel information corresponding to each group of first multi-scale weighted feature subsets is obtained. The second channel information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the second channel information corresponding to the l-th group of first multi-scale weighted feature subsets; the sigmoid function; the ReLU activation function; and the preset first weight coefficient and second weight coefficient.
According to a preset second concatenation algorithm, the second channel information corresponding to each group of first multi-scale weighted feature subsets is concatenated to obtain the second multi-scale attention weight parameters. The second concatenation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the second multi-scale attention weight parameter; and the concatenation function.
The first multi-scale attention weight parameters are input into the second channel information extraction module. Based on the first multi-scale attention weight parameters, the second multi-scale attention weight parameters, and a preset weight fusion algorithm, the fused multi-scale attention weight parameters are obtained. The weight fusion algorithm is as follows: $F_{P+C} = \lambda (W_3 F_P) + (1 - \lambda) F_C$. In the formula, $F_{P+C}$ is the fused multi-scale attention weight parameter, $F_P$ is the first multi-scale attention weight parameter, $\lambda$ is the preset fusion coefficient, and $W_3$ is the preset third weight coefficient.
The first multi-scale weighted feature and the fused multi-scale attention weight parameters are input into the second feature calculation module, and the second multi-scale weighted feature is obtained according to a preset second feature extraction algorithm. The second feature extraction algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the second multi-scale weighted feature; the first multi-scale weighted feature; and the element-wise product symbol.
6. The facial expression recognition method based on a multi-scale attention mechanism according to claim 5, characterized in that:
The feature classification module includes a fully connected layer and a classification layer connected in sequence.
The step of obtaining the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and obtaining the expression recognition result of the facial expression image based on the expression recognition probability, includes the following steps:
The second multi-scale weighted feature is input into the fully connected layer for dimensionality reduction processing to obtain the dimensionality-reduced second multi-scale weighted feature output by the fully connected layer.
The dimensionality-reduced second multi-scale weighted feature is input into the classification layer. According to a preset expression recognition algorithm, the expression polarity probability distribution vector corresponding to the facial expression image is obtained. According to the expression polarity probability distribution vector, the expression polarity corresponding to the dimension with the highest probability is obtained, and this expression polarity is used as the expression recognition result of the facial expression image. The expression recognition algorithm is as follows: $Y = \mathrm{softmax}(W_4 X_C + b)$. In the formula, $Y$ is the expression polarity probability distribution vector, $\mathrm{softmax}(\cdot)$ is the normalized activation function, $W_4$ is the preset fourth weight coefficient, and $b$ is the preset bias coefficient.
7. A facial expression recognition device based on a multi-scale attention mechanism, characterized in that it includes:
A model building module, used to build a neural network model based on a multi-scale attention mechanism, wherein the neural network model includes a feature extraction module, a multi-scale attention mechanism module, and a feature classification module connected in sequence; the multi-scale attention mechanism module includes a first multi-scale attention module and a second multi-scale attention module; and the first multi-scale attention module includes a first spatial information extraction module and a first channel information extraction module.
A basic feature acquisition module, used to obtain the facial expression image to be recognized, input the facial expression image into the neural network model, and obtain the basic features of the facial expression image according to the feature extraction module.
A first multi-scale weighted feature acquisition module, used to input the basic features into the first multi-scale attention module of the multi-scale attention mechanism module to obtain the first multi-scale attention weight parameters and the first multi-scale weighted features output by the first multi-scale attention module. This step includes the following steps:
The basic features are divided into several groups of basic feature subsets, and these groups are input into the first spatial information extraction module. According to a preset first spatial information calculation algorithm, the first spatial information corresponding to the first group of basic feature subsets is obtained, and the first spatial information is then accumulated across the groups to obtain the first spatial information corresponding to each group of basic feature subsets. The first spatial information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first spatial information corresponding to the i-th group of basic feature subsets; the first spatial information corresponding to the (i-1)-th group of basic feature subsets; the convolution function; the i-th group of basic feature subsets; and the number of groups of basic feature subsets.
The first spatial information corresponding to each group of basic feature subsets is input into the first channel information extraction module. According to a preset first channel information calculation algorithm, the first channel information corresponding to each group of basic feature subsets is obtained. The first channel information calculation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first channel information corresponding to the i-th group of basic feature subsets; the sigmoid function; the ReLU activation function; and the preset first weight coefficient and second weight coefficient.
According to a preset first concatenation algorithm, the first channel information corresponding to each group of basic feature subsets is concatenated to obtain the first multi-scale attention weight parameters. The first concatenation algorithm is as follows: [formula not reproduced]. In the formula, the terms denote, in order: the first multi-scale attention weight parameter; and the concatenation function.
A second multi-scale weighted feature acquisition module, used to input the first multi-scale attention weight parameters and the first multi-scale weighted features corresponding to the basic features into the second multi-scale attention module of the multi-scale attention mechanism module, and obtain the second multi-scale weighted features output by the second multi-scale attention module.
An expression recognition module, used to obtain the expression recognition probability of the facial expression image based on the second multi-scale weighted features and the feature classification module, and to obtain the expression recognition result of the facial expression image based on the expression recognition probability.
8. A computer device, characterized in that it includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the facial expression recognition method based on a multi-scale attention mechanism as described in any one of claims 1 to 6.
9. A storage medium, characterized in that: The storage medium stores a computer program, which, when executed by a processor, implements the steps of the facial expression recognition method based on a multi-scale attention mechanism as described in any one of claims 1 to 6.