Micro-expression recognition method based on multi-motion feature fusion

By fusing multiple motion features (optical flow field, LBP-TOP, facial key point trajectories and HOG features) in a feature fusion model that combines an LSTM with an attention mechanism, the method addresses two weaknesses of existing micro-expression recognition technologies: insufficient capture of instantaneous changes and high computational resource consumption. The result is efficient and accurate micro-expression recognition.

CN119942613B · Active · Publication Date: 2026-06-23 · FOURTH MILITARY MEDICAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
FOURTH MILITARY MEDICAL UNIVERSITY
Filing Date
2025-01-06
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies are insufficient in processing subtle muscle movement features in two-dimensional images, and three-dimensional reconstruction cannot fully capture the instantaneous changes in micro-expressions. The improved Inception network module is not sensitive enough in processing short-term and local changes, and the complex model structure leads to high computational resource requirements, which may result in increased latency, especially in real-time applications.

Method used

A multi-motion feature fusion method is adopted, which uses a convolutional neural network to detect micro-expression time periods, extracts optical flow field, local binary pattern histogram (LBP-TOP), facial key point trajectory and inter-frame motion vector, and combines a feature fusion model with a long short-term memory network (LSTM) and attention mechanism to perform feature fusion and recognition. Data augmentation and lightweight techniques are used to optimize the model.

Benefits of technology

It improves the accuracy and robustness of micro-expression recognition, reduces computational complexity, enhances the model's recognition accuracy under different lighting conditions and the feasibility of real-time applications, simplifies the model structure, and reduces training time and deployment costs.

✦ Generated by Eureka AI based on patent content.

[Figure CN119942613B_ABST]

Abstract

The application discloses a micro-expression recognition method based on multi-motion feature fusion, comprising the following steps: S1, acquiring a video sequence; S2, detecting a micro-expression time period; S3, extracting multi-motion features; S4, fusing feature vectors; S5, recognizing a micro-expression category; and S6, post-processing a recognition result, wherein the accuracy and robustness of recognition are improved through multi-feature extraction and deep learning technology. The video sequence to be analyzed is acquired, and a pre-constructed micro-expression detection model is used to detect a time period containing micro-expression in the video sequence. Multi-motion features are extracted from the detected time period, including an optical flow field, a local binary pattern histogram LBP-TOP, facial key point trajectories and inter-frame motion vectors. The features are input into a feature fusion model based on a long short-term memory network LSTM, a fused feature vector is obtained, and an attention mechanism is introduced to enhance the recognition of key frames. A pre-established micro-expression category database is used to recognize a micro-expression category, and the result is post-processed.

Description

Technical Field

[0001] This invention relates to the field of computer vision technology, and in particular to a micro-expression recognition method based on multi-motion feature fusion.

Background Technology

[0002] With the development of computer vision and machine learning technologies, micro-expression recognition has gradually become an important research field. It aims to identify people's true emotional states accurately by using automated methods to capture and analyze micro-expressions, which are brief and subtle facial changes.

[0003] A search revealed Chinese patent CN114333002A, which discloses a micro-expression recognition method based on graph deep learning and 3D facial reconstruction. The method includes the following steps: constructing a graph feature learning module that produces a one-dimensional feature vector through graph feature analysis; constructing an optical flow feature learning module that produces a one-dimensional feature vector through optical flow feature extraction; constructing a 3D detail reconstruction module that produces a one-dimensional feature vector; and constructing a multi-stream OGC-FL network model that obtains the micro-expression classification result through multi-stream fusion. Compared with a single strategy, that invention's multi-strategy generation of optical flow features can select the most advantageous generation strategy for micro-expression recognition tasks. The multi-stream OGC-FL network model of CN114333002A leverages the consistency between facial key point information and dense image information in recognizing micro-expressions: the sparse spatial information of key points is used by the graph feature learning (GFL) module to determine the general state of a micro-expression, while the dense image information highlights subtle facial muscle movements, providing more detailed information for micro-expression recognition (MER).

[0004] However, although the above invention uses 3D reconstruction and graph feature extraction to capture the sparse spatial information of facial key points, it is insufficient for processing subtle muscle movement features in 2D images, and 3D reconstruction cannot fully capture the instantaneous changes of micro-expressions. Although the improved Inception network module can capture motion information by processing optical flow features, it is not sensitive enough to short-term and local changes and is notably less robust under varying lighting conditions. In addition, the model structure is relatively complex, leading to high computational resource requirements and potentially increased latency, especially in real-time applications.

[0005] Therefore, a micro-expression recognition method based on multi-motion feature fusion is proposed.

Summary of the Invention

[0006] The purpose of this invention is to address the shortcomings of existing technologies in processing subtle muscle movement features in two-dimensional images, the inability of three-dimensional reconstruction to fully capture instantaneous changes in micro-expressions, and the fact that while the improved Inception network module can capture motion information by processing optical flow features, it is not sensitive enough to short-term and local changes, especially exhibiting poor robustness under different lighting conditions. Furthermore, the complex model structure leads to high computational resource requirements, potentially resulting in increased latency in real-time applications. Therefore, this invention proposes a micro-expression recognition method based on the fusion of multiple motion features.

[0007] To achieve the above objectives, the present invention adopts the following technical solution:

[0008] The micro-expression recognition method based on multi-motion feature fusion comprises:

[0009] S1: Obtain the video sequence to be analyzed;

[0010] S2: Input the video sequence into a pre-built micro-expression detection model, and use the micro-expression detection model to detect the time periods in the video sequence that contain micro-expressions. The micro-expression detection model is trained using a convolutional neural network (CNN), and the training dataset contains video segments labeled with the times when micro-expressions occur.

[0011] S3: Extract multiple motion features from the time period, including the optical flow field, local binary pattern histogram (LBP-TOP), facial key point trajectories and inter-frame motion vectors. The optical flow field is calculated using the Farneback optical flow algorithm.

[0012] S4: Input the multi-motion features into the pre-trained feature fusion model to obtain the fused feature vector. The feature fusion model is a deep learning model based on the Long Short-Term Memory (LSTM) network, and an attention mechanism is introduced during training to enhance the recognition of key frames.

[0013] S5: Based on the fused feature vector, using a pre-established micro-expression category database, identify the micro-expression category within the time period. The micro-expression category database contains multiple micro-expression types and their corresponding typical feature vectors.

[0014] S6: Post-process the recognition results.

[0015] The above technical solution further includes:

[0016] Preferably, the micro-expression detection model in step S2 is trained through the following steps:

[0017] A standard micro-expression video dataset is collected, containing video clips of micro-expressions in various emotional states. Each video clip is annotated by experts to indicate whether micro-expressions are present and the time period in which they occur;

[0018] The micro-expression detection model is trained on the labeled dataset. During training, the cross-entropy loss function is used to optimize the model parameters until the detection accuracy exceeds 95%;

[0019] During the training of the micro-expression detection model, data augmentation techniques are employed, including random cropping, flipping, and adding Gaussian noise;

[0020] After training, cross-validation is used to evaluate the model's generalization ability, and hyperparameters are tuned to optimize model performance.

[0021] Preferably, the extraction of multiple motion features in step S3 includes:

[0022] The Farneback optical flow algorithm is used to calculate the inter-frame optical flow field, and optical flow dynamics at different scales are obtained through multi-scale analysis;

[0023] The Local Binary Pattern Histogram (LBP-TOP) algorithm is used to extract facial texture features;

[0024] Facial key points are extracted using a facial key point detection algorithm, and their trajectory changes are tracked;

[0025] Facial contour and texture features are extracted using the HOG feature descriptor.

[0026] Preferably, the feature fusion model in step S4 employs a deep learning architecture, including:

[0027] In the feature fusion stage, a multi-layer LSTM structure is used to capture long-term dependencies;

[0028] An attention mechanism is introduced to dynamically assign importance weights to different features, thereby improving recognition accuracy.

[0029] During training, enhancement techniques such as random masking and data perturbation are used to improve the robustness of the model.

[0030] Residual connections are added to the feature fusion model to alleviate the gradient vanishing problem.

[0031] Preferably, the micro-expression category database in step S5 is constructed through the following steps:

[0032] Micro-expression samples are collected from different individuals, and each sample is labeled by experts with its specific micro-expression type and feature vector;

[0033] Establish a mapping relationship between micro-expression categories and fused feature vectors, and use principal component analysis (PCA) to reduce the dimensionality of the feature vectors;

[0034] After the database is built, the newly extracted feature vectors are classified using the K-nearest neighbor algorithm or the support vector machine (SVM) classifier.

[0035] The database is continuously updated by incorporating new micro-expression samples;

[0036] Clustering algorithms are used to cluster feature vectors in the database to discover potential micro-expression types.

[0037] Preferably, step S1 includes the following steps:

[0038] Preprocess the video sequence;

[0039] Use a Gaussian filter to denoise the video sequence;

[0040] Histogram equalization is used to standardize the brightness of video frames.

[0041] Scale the video frames to unify their size;

[0042] Perform color space conversion.

[0043] Preferably, the micro-expression detection model in step S2 includes:

[0044] During the training process, a transfer learning approach is adopted. First, pre-training is performed on a large-scale facial expression dataset, and then fine-tuning is performed on a specific micro-expression dataset.

[0045] During training, a class balancing strategy is used to ensure that various micro-expression samples are fully represented;

[0046] A multi-task learning framework is used to optimize the micro-expression detection and classification tasks jointly.

[0047] Preferably, the feature fusion model in step S4 includes:

[0048] By monitoring performance metrics on the validation set, training is stopped when performance stops improving, in order to prevent overfitting.

[0049] During training, save the best-performing model version periodically for later use;

[0050] A multi-task learning framework is adopted, which optimizes both micro-expression detection and classification tasks.

[0051] During training, dropout is used to reduce the risk of overfitting the model.

[0052] Preferably, the post-processing in step S6 includes:

[0053] Kalman filtering is used to smooth the recognition results;

[0054] Set a threshold and filter out recognition results that are below the threshold;

[0055] Morphological operations are used to further optimize the recognition results, removing isolated points and small areas;

[0056] A time window smoothing technique is used to perform a consistency check on the recognition results of consecutive frames.

[0057] Preferably, the method further includes:

[0058] In the feature fusion model, an adaptive learning rate adjustment strategy is introduced to accelerate model convergence;

[0059] In the post-processing step, a threshold filtering mechanism is used to screen the recognition results in order to improve the accuracy of the recognition results.

[0060] In the feature extraction step, an environmental factor correction algorithm is used, including correction for changes in lighting conditions and non-uniform backgrounds;

[0061] In the feature extraction step, a head pose correction algorithm is used to handle problems such as head tilt and occlusion.

[0062] In the feature extraction step, a lightweight processing algorithm is used to address the low resolution of video acquisition devices and the limited processing capability of computers.

[0063] The present invention has the following beneficial effects:

[0064] 1. This invention combines multiple feature extraction methods, such as optical flow field, LBP-TOP, facial key point trajectory, and HOG features, to ensure the capture of instantaneous changes in micro-expressions and subtle muscle movements, thus improving the comprehensiveness and accuracy of feature extraction. Secondly, various data augmentation techniques are employed to enhance the model's robustness under different conditions, enabling it to maintain high recognition accuracy under varying lighting conditions and facial expression changes. Furthermore, the combination of LSTM and attention mechanisms with PCA dimensionality reduction simplifies the model structure, reduces computational complexity, improves the feasibility of real-time applications, and also reduces training time and deployment costs.

[0065] 2. In this invention, the dataset contains a sufficient number of micro-expression samples covering multiple emotional states, and a class balancing strategy ensures that all types of micro-expression samples are adequately represented, thereby improving the model's generalization ability. Furthermore, the attention mechanism, early stopping and dropout are introduced to prevent overfitting, improving the model's performance on new data and enhancing its robustness.

Attached Figure Description

[0066] Figure 1 is a flowchart of the micro-expression recognition method based on multi-motion feature fusion proposed in this invention;

[0067] Figure 2 is a diagram of the multi-feature fusion framework of this invention;

[0068] Figure 3 is a framework diagram for handling interference factors in this invention.

Detailed Implementation

[0069] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0070] As shown in Figure 1, the micro-expression recognition method based on multi-motion feature fusion proposed in this invention includes:

[0071] S1: Obtain the video sequence to be analyzed;

[0072] S2: Input the video sequence into a pre-built micro-expression detection model, and use the micro-expression detection model to detect the time periods in the video sequence that contain micro-expressions. The micro-expression detection model is trained using a convolutional neural network (CNN), and the training dataset contains video segments labeled with the times when micro-expressions occur.

[0073] S3: Extract multiple motion features from the time period, including the optical flow field, local binary pattern histogram (LBP-TOP), facial key point trajectories and inter-frame motion vectors. The optical flow field is calculated using the Farneback optical flow algorithm.

[0074] S4: Input the multi-motion features into the pre-trained feature fusion model to obtain the fused feature vector. The feature fusion model is a deep learning model based on the Long Short-Term Memory (LSTM) network, and an attention mechanism is introduced during training to enhance the recognition of key frames.

[0075] S5: Based on the fused feature vector, using a pre-established micro-expression category database, identify the micro-expression category within the time period. The micro-expression category database contains multiple micro-expression types and their corresponding typical feature vectors.

[0076] S6: Post-process the recognition results.

[0077] In one embodiment, the video sequence to be analyzed is obtained.

[0078] First, the video sequences to be analyzed need to be obtained. These video sequences can come from camera recordings, stored files, or other video sources. After obtaining the video sequences, the following preprocessing steps are performed:

[0079] Denoising: A Gaussian filter is used to denoise the video frames to eliminate high-frequency noise. The specific steps are as follows:

[0080] Use the `cv2.GaussianBlur()` function from the OpenCV library with a 3×3 kernel and a standard deviation of 1.5.

[0081] Apply a Gaussian filter to each frame of video until all video frames have been processed.

[0082] Brightness normalization: Histogram equalization is used to normalize the brightness of video frames to ensure consistency in feature extraction. The specific steps are as follows:

[0083] Use the `cv2.equalizeHist()` function from the OpenCV library to perform histogram equalization on each frame of the video.

[0084] Ensure that the histogram of each processed frame is approximately uniform.

[0085] Color space conversion: Converting the color space of video frames from RGB to HSV to enhance the expression of color features. The specific steps are as follows:

[0086] Use the `cv2.cvtColor()` function from the OpenCV library to convert an RGB image to HSV format.

[0087] The V (value, i.e. brightness) channel is extracted for use in subsequent feature extraction.

[0088] Scaling: The video frames are scaled to uniform frame size to accommodate subsequent feature extraction. The specific steps are as follows:

[0089] Use the `cv2.resize()` function from the OpenCV library to resize all video frames to a fixed size, such as 640×480 pixels.

[0090] In one embodiment, micro-expression detection

[0091] Next, the preprocessed video sequence is input into a pre-built micro-expression detection model, which detects the time periods in the video sequence that contain micro-expressions.

[0092] Data Collection: A standard micro-expression video dataset was collected. Each video segment was annotated by experts to indicate the presence of micro-expressions and the time periods in which they occurred. The specific steps are as follows:

[0093] Select appropriate data from public datasets, such as SAMM or CASME II, ensuring that the dataset contains at least 1,000 micro-expression samples covering different emotional states.

[0094] Data annotation: Use specialized annotation tools (such as Labelbox or VGG Image Annotator) to annotate the video dataset. The specific steps are as follows:

[0095] Examine the video frame by frame and mark the time periods containing micro-expressions.

[0096] Label the specific type of each micro-expression (e.g., surprise, disgust, happiness, etc.).

[0097] Data augmentation: Data augmentation techniques, including random cropping, flipping, and adding Gaussian noise, are used to improve the robustness of the model. The specific steps are as follows:

[0098] Using random cropping technology, different regions of the video frame are cropped each time, with a cropping ratio of 80%-120% of the original image.

[0099] Use horizontal flipping techniques to increase dataset diversity.

[0100] Add Gaussian noise with a standard deviation of 0.1.
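
A NumPy sketch of these augmentations follows. It assumes grayscale clips stored as float arrays of shape (frames, height, width) with values in [0, 1], so that a noise standard deviation of 0.1 is meaningful. It applies the same crop and flip to every frame of a clip so the facial motion stays coherent, and it reads the 80%-120% cropping ratio as a scale factor, zero-padding the clip when the factor exceeds 1. The function name `augment_clip` and the flip probability of 0.5 are assumptions.

```python
import numpy as np


def augment_clip(clip, rng, crop_range=(0.8, 1.2), flip_prob=0.5, noise_std=0.1):
    """Random crop + horizontal flip (shared by all frames) + Gaussian noise.

    clip: float array (T, H, W), values assumed to lie in [0, 1].
    rng:  a numpy.random.Generator.
    """
    _, h, w = clip.shape
    # crop window side = scale * original side (assumed reading of 80%-120%)
    scale = rng.uniform(*crop_range)
    ch, cw = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    # zero-pad when the window is larger than the frame
    ph, pw = max(0, ch - h), max(0, cw - w)
    if ph or pw:
        clip = np.pad(clip, ((0, 0), (ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
    y = rng.integers(0, clip.shape[1] - ch + 1)
    x = rng.integers(0, clip.shape[2] - cw + 1)
    out = clip[:, y:y + ch, x:x + cw]
    # horizontal flip reverses the width axis of every frame
    if rng.random() < flip_prob:
        out = out[:, :, ::-1]
    # additive Gaussian noise (standard deviation 0.1 in the text)
    return out + rng.normal(0.0, noise_std, size=out.shape)
```

Because the crop size varies, clips would normally be resized to the network's input size afterwards.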

[0101] Model training: A Convolutional Neural Network (CNN) is used as the basic model architecture. The specific steps are as follows:

[0102] Build a CNN model using TensorFlow or PyTorch framework, including 3 convolutional layers, 2 max pooling layers, and 2 fully connected layers.

[0103] All three convolutional layers use 3×3 kernels with a stride of 1 and "same" padding.

[0104] Both max pooling layers use 2×2 pooling kernels with a stride of 2.

[0105] The number of nodes in the fully connected layers are 128 and 64, respectively, and the ReLU activation function is used.

[0106] The output layer uses Softmax as the activation function.

[0107] The micro-expression detection model was trained using a labeled dataset with a batch size of 32, a learning rate of 0.001, and 100 iterations.

[0108] Optimize the model parameters using the cross-entropy loss function until the detection accuracy reaches 95% or higher.
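
The detection network and training procedure above can be sketched in PyTorch as follows. The channel counts (32, 64, 64), the 64×64 single-channel input, the two output classes (micro-expression present or absent), the placement of the pooling layers and the use of the Adam optimizer are all assumptions, since the text does not specify them. The final layer returns logits because `nn.CrossEntropyLoss` applies log-softmax internally; `predict_proba` applies the Softmax explicitly for inference. For brevity, the stopping check uses training accuracy, whereas the 95% target would normally be checked on held-out data.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class MicroExpressionDetector(nn.Module):
    """3 conv layers (3x3, stride 1, 'same' padding), 2 max-pooling layers
    (2x2, stride 2), fully connected layers of 128 and 64 units, output layer."""

    def __init__(self, in_channels=1, num_classes=2, input_size=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=1, padding="same"), nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(32, 64, 3, stride=1, padding="same"), nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(64, 64, 3, stride=1, padding="same"), nn.ReLU(),
        )
        side = input_size // 4  # two 2x2 poolings halve each side twice
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * side * side, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),  # logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

    def predict_proba(self, x):
        # Softmax output layer, applied explicitly at inference time
        return torch.softmax(self.forward(x), dim=1)


def fit(model, loader, max_epochs=100, target_acc=0.95, lr=1e-3):
    """Train with cross-entropy until accuracy reaches target_acc."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the logits
    acc = 0.0
    for _ in range(max_epochs):
        model.train()
        correct = total = 0
        for x, y in loader:
            optimizer.zero_grad()
            logits = model(x)
            loss_fn(logits, y).backward()
            optimizer.step()
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.numel()
        acc = correct / total
        if acc >= target_acc:
            break
    return acc


if __name__ == "__main__":
    # random stand-in data, batches of 32, learning rate 0.001
    data = TensorDataset(torch.randn(64, 1, 64, 64), torch.randint(0, 2, (64,)))
    print(fit(MicroExpressionDetector(), DataLoader(data, batch_size=32, shuffle=True), max_epochs=2))
```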

[0109] Transfer learning: During the training process of the micro-expression detection model, transfer learning technology is adopted. The model is first pre-trained on a large-scale facial expression dataset and then fine-tuned on a specific micro-expression dataset to improve the generalization ability of the model.

[0110] Model Evaluation: After training, cross-validation is used to evaluate the model's generalization ability, and hyperparameters are tuned to optimize model performance. The specific steps are as follows:

[0111] The dataset was divided into a training set (70%), a validation set (15%), and a test set (15%).

[0112] The model performance was evaluated using 5-fold cross-validation.
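
The text specifies both a fixed 70/15/15 split and 5-fold cross-validation. One possible reading, assumed in the NumPy sketch below, is to hold out the 15% test set and run 5-fold cross-validation on the remaining 85% when tuning hyperparameters. The function names are hypothetical, and a split stratified by micro-expression category would usually be preferred for imbalanced data.

```python
import numpy as np


def split_70_15_15(n, seed=0):
    """Shuffle sample indices and split them 70/15/15 into train/val/test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = n * 70 // 100, n * 15 // 100  # integer arithmetic avoids rounding drift
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold(indices, k=5):
    """Yield (train, validation) index pairs for k-fold cross-validation."""
    folds = np.array_split(indices, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]
```

For example, `k_fold(np.concatenate([train_idx, val_idx]), k=5)` yields the five train/validation partitions of the development data while `test_idx` stays untouched until the final evaluation.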

[0113] Adjust the hyperparameters based on the performance on the validation set.

[0114] In one embodiment, multi-motion feature extraction

[0115] Multi-motion features were extracted from the time period containing micro-expressions, mainly including optical flow field, local binary pattern histogram (LBP-TOP), facial key point trajectory, and inter-frame motion vector.

[0116] Optical flow field calculation: The Farneback optical flow algorithm is used to calculate the inter-frame optical flow field, and the optical flow dynamics at different scales are obtained through multi-scale analysis. The specific steps are as follows:

[0117] The optical flow field is calculated using the `cv2.calcOpticalFlowFarneback()` function from the OpenCV library.

[0118] Multi-scale analysis of the optical flow field was performed to extract features at different scales.

[0119] The formula for calculating the optical flow field is as follows:

[0120] $$F = \mathrm{calcOpticalFlowFarneback}\left(I_t,\ I_{t+1},\ \mathrm{pyr\_scale}=0.5,\ \mathrm{levels}=3,\ \mathrm{winsize}=15,\ \mathrm{iterations}=3,\ \mathrm{poly\_n}=5,\ \mathrm{poly\_sigma}=1.2,\ \mathrm{flags}=0\right)$$

[0122] where $I_t$ and $I_{t+1}$ are two consecutive frames.

[0123] Local Binary Pattern Histogram (LBP-TOP): The LBP-TOP algorithm is applied to extract facial texture features. The specific steps are as follows:

[0124] Use the `localBinaryPatternsHistograms` function from the OpenCV library to extract LBP features.

[0125] Construct an LBP-TOP histogram to capture facial texture information.

[0126] The formula represents LBP feature extraction:

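[0127] $$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^{p}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

[0128] where P is the number of neighboring points, R is the radius of the circle, $g_c$ is the gray value of the center pixel, $g_p$ is the gray value of the p-th neighboring point, and s is the sign (thresholding) function.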
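
LBP-TOP applies the LBP operator on three orthogonal planes of the video volume (XY, XT and YT) and concatenates the three code histograms. The NumPy sketch below implements the formula with P = 8 and R = 1 using integer pixel offsets, so the bilinear interpolation needed for other radii is omitted. It computes the codes directly rather than calling a library routine, and the function names are hypothetical.

```python
import numpy as np

# the 8 neighbors at radius 1, in a fixed circular order (bit p = neighbor p)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(plane):
    """LBP_{8,1} codes for the interior pixels of a 2-D array."""
    plane = plane.astype(np.int32)
    h, w = plane.shape
    g_c = plane[1:-1, 1:-1]
    codes = np.zeros_like(g_c)
    for p, (dy, dx) in enumerate(OFFSETS):
        g_p = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (g_p >= g_c).astype(np.int32) << p   # s(g_p - g_c) * 2^p
    return codes


def lbp_top(volume):
    """Concatenated normalized histograms from the XY, XT and YT planes.

    volume: array of shape (T, H, W) with T, H, W >= 3. Returns 3 * 256 values.
    """
    T, H, W = volume.shape
    plane_groups = (
        [volume[t] for t in range(T)],        # XY planes (one per frame)
        [volume[:, y, :] for y in range(H)],  # XT planes (one per row)
        [volume[:, :, x] for x in range(W)],  # YT planes (one per column)
    )
    hists = []
    for group in plane_groups:
        hist = np.zeros(256)
        for plane in group:
            hist += np.bincount(lbp_codes(plane).ravel(), minlength=256)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)
```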

[0129] Facial key point trajectory: The trajectory changes of facial key points (such as eyebrows, eyes, nose, mouth, cheeks, etc.) are extracted using a facial key point detection algorithm and then tracked. The specific steps are as follows:

[0130] Use the `face_alignment` Python library to detect facial key points.

[0131] Track the trajectory changes of key points in a video sequence and extract motion features.

[0132] Use the Dense Optical Flow method to track key point trajectories.
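
Once key points have been located in each frame, trajectory features can be derived from their coordinates. The NumPy sketch below assumes the detector or tracker output is already available as an array of shape (frames, points, 2). For each frame it computes each point's displacement from the first frame and its frame-to-frame speed, both normalized by the diagonal of the first frame's key point bounding box to remove dependence on face size. The normalization choice and the name `trajectory_features` are assumptions.

```python
import numpy as np


def trajectory_features(landmarks):
    """Per-frame key point motion features.

    landmarks: array (T, K, 2) of (x, y) key point coordinates per frame.
    Returns an array (T, 3K): K normalized (dx, dy) displacements from the
    first frame, followed by K normalized frame-to-frame speeds.
    """
    pts = np.asarray(landmarks, dtype=float)
    T, K, _ = pts.shape
    # face-size normalization: diagonal of the first frame's bounding box
    span = pts[0].max(axis=0) - pts[0].min(axis=0)
    scale = np.linalg.norm(span) or 1.0
    disp = (pts - pts[0]) / scale                          # displacement from first frame
    vel = np.diff(pts, axis=0, prepend=pts[:1]) / scale    # motion since previous frame
    speed = np.linalg.norm(vel, axis=2)                    # (T, K)
    return np.concatenate([disp.reshape(T, 2 * K), speed], axis=1)
```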

[0133] HOG Features: Facial contour and texture features are extracted using the HOG feature descriptor. The specific steps are as follows:

[0134] Use the `HOGDescriptor` class from the OpenCV library to extract HOG features.

[0135] Set appropriate cell and block sizes to extract HOG features from the facial region.

[0136] The cell size is 8×8 pixels, and the block size is 2×2 cells.

[0137] Environmental factor correction:

[0138] Illumination correction: Histogram equalization is used to standardize the brightness of video frames, reducing the impact of illumination changes on feature extraction.

[0139] Background correction: Using background subtraction or image segmentation techniques to remove background interference and improve the accuracy of feature extraction.

[0140] Head posture correction:

[0141] Head tilt correction: Use geometric transformation techniques such as affine transformation or perspective transformation to correct head tilt.

[0142] Occlusion handling: The deep learning model Mask R-CNN is used to detect and repair occluded regions to ensure the integrity of feature extraction.

[0143] Lightweight processing:

[0144] Low-resolution processing: Use the super-resolution technology ESRGAN to improve the clarity of low-resolution videos.

[0145] Lightweight models: Use lightweight neural network architectures such as MobileNet or ShuffleNet to reduce the computational complexity of the model and improve processing speed.

[0146] In one embodiment, feature fusion

[0147] The aforementioned multi-motion features are input into a pre-trained feature fusion model, and the fused feature vector is obtained through the feature fusion model.

[0148] Feature fusion model: This model uses a deep learning architecture based on Long Short-Term Memory (LSTM) networks, and incorporates an attention mechanism during training to enhance keyframe recognition. The specific steps are as follows:

[0149] Build an LSTM model using the TensorFlow or PyTorch framework, containing two LSTM layers, with each LSTM layer having 128 hidden units.

[0150] An attention mechanism is introduced into the LSTM model, and the attention weights are calculated using the Softmax function.

[0151] Set appropriate hidden layer size and number of layers to optimize the model structure.

[0152] The formula represents the calculation of the LSTM layer:

[0153] $$h_t = \mathrm{LSTM}(h_{t-1}, x_t)$$

[0154] where $h_t$ is the hidden state at the current time step, $h_{t-1}$ is the hidden state at the previous time step, and $x_t$ is the input feature vector at the current time step.

[0155] Data augmentation: During training, augmentation techniques such as random masking and data perturbation are used to enhance the robustness of the model. The specific steps are as follows:

[0156] A portion of the input features is randomly masked, with a proportion of 20%.

[0157] Perturb the data, for example by adding random noise with a standard deviation of 0.05.
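
Random masking and noise perturbation of the input features can be sketched in NumPy as follows. The sketch assumes masking is applied to individual feature entries, rather than to whole frames or whole feature types, and adds the noise before masking so that masked entries stay exactly zero. The name `perturb_features` is hypothetical.

```python
import numpy as np


def perturb_features(x, rng, mask_ratio=0.2, noise_std=0.05):
    """Add Gaussian noise, then zero out a random 20% of the feature entries.

    x: feature array of any shape (left unmodified); rng: numpy Generator.
    """
    out = x + rng.normal(0.0, noise_std, size=x.shape)   # new array
    n_mask = int(round(mask_ratio * x.size))
    flat = out.reshape(-1)                               # view into `out`
    flat[rng.choice(x.size, size=n_mask, replace=False)] = 0.0
    return out
```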

[0158] Residual connections: Residual connections are added to the feature fusion model to alleviate the vanishing gradient problem. The specific steps are as follows:

[0159] Add residual blocks to the LSTM model and use skip connections.

[0160] Set an appropriate number of residual blocks to optimize model performance.

[0161] Adaptive learning rate adjustment strategy: In the feature fusion model, an adaptive learning rate adjustment strategy, the Adam optimizer, is introduced to accelerate model convergence.
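
The feature fusion model of this embodiment can be sketched in PyTorch as below: two stacked LSTM layers of 128 hidden units, a skip connection from the projected input to the LSTM output, Softmax attention weights over frames, dropout and the Adam optimizer. The input projection, the dropout rate of 0.3, the linear classification head used to supervise training and the learning rate of 0.001 are assumptions not stated in the text, and the class and function names are hypothetical.

```python
import torch
from torch import nn


class AttentiveLSTMFusion(nn.Module):
    """Two-layer LSTM with a residual connection and softmax attention pooling."""

    def __init__(self, in_dim, hidden=128, num_layers=2, dropout=0.3):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden)   # concatenated features -> LSTM width
        self.lstm = nn.LSTM(hidden, hidden, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.score = nn.Linear(hidden, 1)       # one attention score per frame
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                       # x: (batch, frames, in_dim)
        z = self.proj(x)
        h, _ = self.lstm(z)                     # h_t = LSTM(h_{t-1}, x_t)
        h = h + z                               # residual (skip) connection
        w = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (batch, frames)
        fused = (w.unsqueeze(-1) * h).sum(dim=1)             # weighted sum over frames
        return self.drop(fused), w


def train_step(model, head, optimizer, x, y):
    """One Adam update; `head` maps the fused vector to category logits."""
    model.train()
    head.train()
    optimizer.zero_grad()
    fused, _ = model(x)
    loss = nn.functional.cross_entropy(head(fused), y)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = AttentiveLSTMFusion(in_dim=96)   # 96: hypothetical concatenated feature size
    head = nn.Linear(128, 5)                 # 5: hypothetical number of categories
    opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
    print(train_step(model, head, opt, torch.randn(8, 30, 96), torch.randint(0, 5, (8,))))
```

The returned attention weights `w` show which frames received the most weight, which is how the attention mechanism emphasizes key frames.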

[0162] In one embodiment, micro-expression category recognition

[0163] Based on the fused feature vectors, the micro-expression categories within the time period are identified using a pre-established micro-expression category database.

[0164] Database construction: Micro-expression samples from different individuals were collected, and each sample was labeled by experts with the specific micro-expression type and its associated feature vector. The specific steps are as follows:

[0165] An initial database is built from a labeled dataset containing at least 1,000 micro-expression samples.

[0166] Ensure that the database contains samples of various types of micro-expressions.

[0167] Feature mapping: Establishing a mapping relationship between micro-expression categories and fused feature vectors, and using principal component analysis (PCA) to reduce the dimensionality of the feature vectors to decrease computational complexity. The specific steps are as follows:

[0168] The PCA algorithm is used to reduce the dimensionality of the fused feature vectors, retaining the first 50 principal components.

[0169] Construct a mapping relationship between feature vectors and micro-expression categories.

[0170] The formula represents PCA dimensionality reduction:

[0171] $$Y = XW$$

[0172] where X is the mean-centered original feature matrix, W is the projection matrix whose columns are the retained principal components (here the first 50), and Y is the feature matrix after dimensionality reduction.
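
The PCA step can be sketched in NumPy via the singular value decomposition of the mean-centered feature matrix: the columns of W are the first 50 right singular vectors, so Y = XW holds for the centered X. The function names are hypothetical; `sklearn.decomposition.PCA(n_components=50)` yields an equivalent projection, up to the sign of each component.

```python
import numpy as np


def pca_fit(X, n_components=50):
    """Return the feature mean and the projection matrix W (d x n_components)."""
    mean = X.mean(axis=0)
    # rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def pca_transform(X, mean, W):
    """Y = XW applied to the mean-centered features."""
    return (X - mean) @ W
```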

[0173] Classification Validation: After the database is built, classifiers such as K-nearest neighbors or support vector machines (SVM) are used to classify the newly extracted feature vectors to verify the effectiveness of the database. The specific steps are as follows:

[0174] Train the classifier using the `KNeighborsClassifier` or `SVC` class from the Scikit-learn library.

[0175] Set appropriate hyperparameters to optimize classifier performance.

[0176] Cross-validation was used to evaluate the classifier's performance.
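
A minimal scikit-learn sketch of this classification check follows. The choice of k = 5 neighbors, an RBF kernel with `C=1.0` and 5-fold cross-validation are assumptions; in practice these would be the hyperparameters tuned in the step above. `X` holds the (PCA-reduced) fused feature vectors and `y` the expert-labeled categories; the helper name `evaluate_classifiers` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def evaluate_classifiers(X, y, cv=5):
    """Mean cross-validated accuracy of a k-NN and an RBF-kernel SVM."""
    classifiers = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    }
    # for classifiers, an integer cv uses stratified folds
    return {name: float(np.mean(cross_val_score(clf, X, y, cv=cv)))
            for name, clf in classifiers.items()}
```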

[0177] Continuous updates: By continuously updating the database and incorporating new micro-expression samples, the timeliness and accuracy of the database are maintained. The specific steps are as follows:

[0178] Regularly collect new micro-expression samples and update the database.

[0179] Retrain the classifier using new data to maintain classification performance.

[0180] In one embodiment, post-processing

[0181] The recognition results are post-processed, including but not limited to removing false positives and smoothing, to improve the reliability of the final recognition results.

[0182] Smoothing: Kalman filtering is used to smooth the recognition results to reduce the impact of noise. The specific steps are as follows:

[0183] The recognition results are smoothed using the Kalman filter algorithm.

[0184] Set appropriate filter parameters to optimize the smoothing effect.

[0185] The formula for Kalman filtering is as follows:

[0186] $$\hat{x}_t = A\hat{x}_{t-1} + K_t\left(z_t - HA\hat{x}_{t-1}\right)$$

[0187] where $\hat{x}_t$ is the state estimate at the current moment, $\hat{x}_{t-1}$ is the state estimate at the previous moment, A is the state transition matrix, $K_t$ is the Kalman gain, $z_t$ is the observed value, and H is the observation matrix.
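A minimal sketch of the standard Kalman update $\hat{x}_t = A\hat{x}_{t-1} + K_t(z_t - HA\hat{x}_{t-1})$, assuming each frame's recognition confidence is a scalar state so that A, H and the noise variances are scalars. The name `kalman_smooth` is hypothetical, and the process-noise variance q and measurement-noise variance r are illustrative assumptions, not values from the document; they are the "filter parameters" of paragraph [0184].

```python
import numpy as np


def kalman_smooth(z, A=1.0, H=1.0, q=1e-3, r=1e-1, p0=1.0):
    """Scalar Kalman filter over per-frame recognition scores z.

    q: process-noise variance, r: measurement-noise variance (assumed values).
    """
    z = np.asarray(z, dtype=float)
    x, p = z[0], p0  # initialize the state with the first observation
    out = np.empty_like(z)
    for t, z_t in enumerate(z):
        x_pred = A * x                          # predicted state A * x_{t-1}
        p_pred = A * p * A + q                  # predicted estimate variance
        K = p_pred * H / (H * p_pred * H + r)   # Kalman gain K_t
        x = x_pred + K * (z_t - H * x_pred)     # x_t = A x_{t-1} + K_t (z_t - H A x_{t-1})
        p = (1.0 - K * H) * p_pred              # updated estimate variance
        out[t] = x
    return out
```

A smaller q relative to r trusts the model more and smooths harder, at the cost of reacting more slowly to genuine changes in the score.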

[0188] Threshold filtering: A threshold is set, and recognition results whose confidence falls below it are filtered out, thus removing false positives. The specific steps are as follows:

[0189] Set an appropriate recognition threshold, typically 0.5 or higher.

[0190] Filter out results below the threshold and retain only high-confidence recognition results.
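A short sketch of the threshold filter, assuming each recognition result is a hypothetical `(frame_index, label, confidence)` tuple; the function name and the sample labels are illustrative only. Results exactly at the threshold are kept, matching "filter out results below the threshold".

```python
def threshold_filter(results, threshold=0.5):
    """Drop (frame, label, confidence) results whose confidence is below threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [r for r in results if r[2] >= threshold]


results = [(0, "surprise", 0.91), (1, "surprise", 0.32), (2, "disgust", 0.50)]
kept = threshold_filter(results)  # frame 1 is discarded as a likely false positive
```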

[0191] Morphological operations: Morphological operations further refine the recognition results by removing isolated points and small regions. The specific steps are as follows:

[0192] Use the `morphologyEx` function from the OpenCV library to perform the morphological operations.

[0193] Set an appropriate kernel size to optimize the results.

[0194] Use the opening operation (erosion followed by dilation) to remove outliers.
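To show what the opening does, the sketch below applies it to a per-frame 0/1 detection mask in pure Python, under the assumption (not stated in the document) that recognition results are laid out along the time axis. The helper names are hypothetical. In OpenCV the same role is played by `cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)` on a 1 x N `uint8` array with a 1 x size kernel.

```python
def _window(mask, i, size):
    # Centered window of odd width `size`, truncated at the sequence ends
    half = size // 2
    return mask[max(0, i - half): i + half + 1]


def erode(mask, size=3):
    # A frame stays 1 only if every frame in its window is 1
    return [int(all(_window(mask, i, size))) for i in range(len(mask))]


def dilate(mask, size=3):
    # A frame becomes 1 if any frame in its window is 1
    return [int(any(_window(mask, i, size))) for i in range(len(mask))]


def opening(mask, size=3):
    """Morphological opening (erosion followed by dilation) of a 0/1 sequence."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    return dilate(erode(mask, size), size)


mask = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
cleaned = opening(mask)  # isolated detections at frames 1 and 10 are removed
```

Runs shorter than the kernel size are erased entirely, so the kernel size sets the minimum duration a detection must last to survive.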

[0195] A threshold filtering mechanism is used to screen the recognition results, further improving their accuracy.

[0196] Temporal window smoothing: Temporal window smoothing is used to check the consistency of the recognition results across consecutive frames. The specific steps are as follows:

[0197] Check the consistency of the recognition results within consecutive time windows.

[0198] Remove inconsistent results so that the recognition results remain consistent.
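One way to realize the consistency check is a sliding-window majority vote, sketched below as an assumption rather than the document's exact rule: a frame's label is kept only if it is the majority label in its centered window and that majority reaches a minimum agreement ratio; otherwise it is removed (marked `None`). The function name, window width and agreement ratio are hypothetical.

```python
from collections import Counter


def temporal_consistency(labels, window=5, min_agreement=0.6):
    """Keep a frame's label only if it agrees with the majority in its centered window."""
    half = window // 2
    n = len(labels)
    out = []
    for i in range(n):
        win = labels[max(0, i - half): min(n, i + half + 1)]
        majority, count = Counter(win).most_common(1)[0]
        consistent = labels[i] == majority and count / len(win) >= min_agreement
        out.append(labels[i] if consistent else None)  # None marks a removed result
    return out


frames = ["happy"] * 4 + ["fear"] + ["happy"] * 4
checked = temporal_consistency(frames)  # the lone "fear" frame is removed
```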

[0199] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A micro-expression recognition method based on multi-motion feature fusion, characterized in that it comprises: S1: Obtain the video sequence to be analyzed; S2: Input the video sequence into a pre-built micro-expression detection model, and use the micro-expression detection model to detect the time periods in the video sequence that contain micro-expressions. The micro-expression detection model is trained using a convolutional neural network (CNN), and the training dataset contains video segments labeled with the times at which micro-expressions occur; S3: Extract multiple motion features from the time period, including the optical flow field, local binary pattern histogram (LBP-TOP), facial key point trajectory, and inter-frame motion vector. The optical flow field is calculated using the Endo optical flow algorithm; S4: Input the multi-motion features into the pre-trained feature fusion model, and obtain the fused feature vector through the feature fusion model. The feature fusion model is a deep learning model based on a long short-term memory (LSTM) network, and an attention mechanism is introduced during training to enhance the recognition of key frames; S5: Based on the fused feature vector, identify the micro-expression category within the time period using a pre-established micro-expression category database. The micro-expression category database contains multiple micro-expression types and their corresponding typical feature vectors; S6: Post-process the recognition results. The micro-expression detection model in step S2 is trained through the following steps: A standard micro-expression video dataset is collected, containing video clips of micro-expressions in various emotional states. Each video clip is annotated by experts to indicate whether micro-expressions are present and the time period in which they occur. The micro-expression detection model is trained using the labeled dataset.
During training, the cross-entropy loss function is used to optimize the model parameters until the detection accuracy exceeds 95%. During training of the micro-expression detection model, data augmentation techniques are employed, including random cropping, flipping, and adding Gaussian noise. After training, cross-validation is used to evaluate the model's generalization ability, and hyperparameters are tuned to optimize model performance. The extraction of multiple motion features in step S3 includes: using the Endo optical flow algorithm to calculate the inter-frame optical flow field and obtaining the optical flow dynamics at different scales through multi-scale analysis; using the local binary pattern histogram (LBP-TOP) algorithm to extract facial texture features; extracting facial key points with a facial key point detection algorithm and tracking changes in their trajectories; and using the HOG feature descriptor to extract facial contour and texture features. The feature fusion model in step S4 employs a deep learning architecture, including: in the feature fusion stage, a multi-layer LSTM structure is used to capture long-term dependencies; an attention mechanism is introduced to dynamically assign importance weights to different features, thereby improving recognition accuracy; during training, augmentation techniques such as random masking and data perturbation are used to improve the robustness of the model; and residual connections are added to the feature fusion model to alleviate the vanishing gradient problem. The micro-expression category database in step S5 is constructed through the following steps: micro-expression samples are collected from different individuals, and each sample is labeled by experts with its specific micro-expression type and feature vector.
A mapping relationship is established between micro-expression categories and fused feature vectors, and principal component analysis (PCA) is used to reduce the dimensionality of the feature vectors; after the database is built, newly extracted feature vectors are classified using a k-nearest neighbor (KNN) or support vector machine (SVM) classifier; the database is continuously updated to incorporate new micro-expression samples; and clustering algorithms are used to cluster the feature vectors in the database to discover potential micro-expression types. The steps preceding step S1 include: preprocessing the video sequence; using a Gaussian filter to denoise the video sequence; using histogram equalization to standardize the brightness of the video frames; scaling the video frames to a uniform size; and performing color space conversion. The micro-expression detection model in step S2 includes: during training, a transfer learning approach is adopted, in which the model is first trained on a large-scale facial expression dataset and then fine-tuned on a micro-expression dataset; during training, a class balancing strategy is used to ensure that all micro-expression classes are adequately represented; and a multi-task learning framework is used to jointly optimize the micro-expression detection and classification tasks. The feature fusion model in step S4 includes: performance metrics on the validation set are monitored, and training is stopped when performance stops improving, to prevent overfitting; during training, the best-performing model version is saved periodically for later use; and a multi-task learning framework is adopted to jointly optimize the micro-expression detection and classification tasks.
During training, dropout is used to reduce the risk of overfitting. The post-processing in step S6 includes: using Kalman filtering to smooth the recognition results; setting a threshold and filtering out recognition results below the threshold; using morphological operations to further refine the recognition results by removing isolated points and small regions; and using a time window smoothing technique to check the consistency of the recognition results across consecutive frames. The method further comprises: in the feature fusion model, introducing an adaptive learning rate adjustment strategy to accelerate model convergence; in the post-processing step, using a threshold filtering mechanism to screen the recognition results and improve their accuracy; in the feature extraction step, using an environmental factor correction algorithm, including correction for changes in lighting conditions and non-uniform backgrounds; in the feature extraction step, using a head pose correction algorithm to handle problems such as head tilt and occlusion; and in the feature extraction step, using a lightweight processing algorithm to address low-resolution video acquisition devices and limited computer processing capability.