A method for simulating facial expressions in an interactive humanoid robot based on artificial intelligence

By collecting and processing users' brainwave signals, using deep convolutional neural networks for emotion classification and intensity quantification, and combining biomimetic muscle control and physical deformation of the outer coating, simulated micro-expressions are generated. This solves the problems of insufficient real-time performance and realism in emotion recognition in existing technologies, and realizes the bio-realism and adaptive optimization of humanoid robot expressions.

CN121093995B · Active · Publication Date: 2026-06-30 · BEIJING YUNSHANGHUI INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING YUNSHANGHUI INFORMATION TECH CO LTD
Filing Date: 2025-08-28
Publication Date: 2026-06-30

Abstract

This invention discloses an artificial intelligence-based method for simulating facial expressions in an interactive humanoid robot, belonging to the field of facial expression simulation technology. The method includes: acquiring and preprocessing user brainwave signals to generate power spectral density features; inputting the power spectral density features into a pre-trained emotion classification model to output the user's emotion category and intensity value; mapping the emotion category and intensity value to an emotional expression and outputting biomimetic muscle contraction commands; adjusting the facial muscle fibers of the humanoid robot according to the biomimetic muscle contraction commands to dynamically adjust the surface texture details of the humanoid robot and generate simulated micro-expressions; capturing user facial feedback data through camera eyes, evaluating the user's acceptance of the simulated micro-expressions based on the user's facial feedback data, and outputting an expression optimization strategy. This invention achieves bio-realistic micro-expressions in a humanoid robot through a dual closed-loop modeling approach of deformation matching and user feedback-driven expression optimization.

Description

Technical Field

[0001] This invention relates to the field of facial expression simulation technology, and in particular to a method for simulating facial expressions in an interactive humanoid robot based on artificial intelligence.

Background Technology

[0002] In recent years, humanoid robot emotional interaction technology has gradually evolved towards multimodal perception and dynamic expression simulation. Mainstream solutions rely on visual sensors to capture user facial expressions, combined with convolutional neural networks to classify emotions, driving a pre-programmed mechanical expression library to output standardized feedback. Deep learning methods have achieved significant breakthroughs in emotion recognition; micro-expression analysis models based on facial motion coding can identify basic emotions. Reinforcement learning has been introduced to optimize expression generation strategies, dynamically adjusting expression amplitude parameters based on user feedback data. Advances in EEG signal analysis technology have provided new pathways for emotion recognition. The development of bionic muscle actuation technology has enabled humanoid robots to achieve multiple degrees of freedom control of the face, and physical deformation simulation algorithms for silicone outer layers have significantly improved the naturalness of expressions.

[0003] However, existing technologies fall short in real-time emotion recognition and facial expression realism. Traditional visual emotion recognition depends on ambient lighting conditions and cannot capture emotional changes when the user's face is expressionless, leading to interaction delays. Pre-programmed expression libraries lack dynamic intensity adaptation mechanisms, resulting in a mismatch between facial expression feedback and the user's actual emotional intensity. Bionic muscle control has not established an anatomical-level topological mapping relationship, so generated micro-expressions look mechanical. User feedback optimization relies on manual rule bases and lacks quantitative correlation analysis between EEG features and acceptance levels, making it difficult to achieve continuous adaptive iteration of expression strategies.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides an artificial intelligence-based method for simulating facial expressions in interactive humanoid robots to address the deficiencies in the real-time performance of emotion recognition and the realism of facial expressions.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] In a first aspect, the present invention provides an artificial intelligence-based method for simulating facial expressions in an interactive humanoid robot. The method includes: acquiring and preprocessing user brainwave signals to generate power spectral density features; inputting the power spectral density features into a pre-trained emotion classification model, performing emotion classification and intensity quantification on the power spectral density features using a deep convolutional neural network, and outputting the user's emotion category and intensity value; mapping the emotion category and intensity value to an emotional expression to generate expression amplitude parameters; adjusting the humanoid robot's facial layout based on the expression amplitude parameters and outputting biomimetic muscle contraction commands; adjusting the humanoid robot's facial muscle fibers according to the biomimetic muscle contraction commands, dynamically adjusting the surface texture details of the humanoid robot by combining the physical deformation characteristics of the robot's outer layer, and generating simulated micro-expressions; capturing user facial feedback data through camera eyes, evaluating the user's acceptance of the simulated micro-expressions based on the user's facial feedback data, comparing the acceptance level with the power spectral density features using Pearson correlation analysis, and outputting an expression optimization strategy.

[0008] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method described in this invention, the specific steps of inputting the power spectral density features into a pre-trained emotion classification model, performing emotion classification and intensity quantification on the power spectral density features through a deep convolutional neural network, and outputting the user's emotion category and emotion intensity value are as follows:

[0009] The power spectral density features are input into a pre-trained emotion classification model. The emotion classification model performs multi-scale feature convolution on the independent band features in the power spectral density features to generate emotion feature vectors.

[0010] The emotion feature vector is input into the fully connected layer of a deep convolutional neural network to perform emotion category probability assignment and output the emotion category probability distribution.

[0011] Perform category transformation on the probability distribution of emotion categories and output the emotion category;

[0012] Extract the maximum probability value from the probability distribution of emotion categories, and perform a numerical conversion of the maximum probability value to the emotion intensity value, outputting the emotion intensity value.

[0013] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method of the present invention, the specific steps of mapping emotion categories and emotion intensity values to generate facial expression amplitude parameters are as follows:

[0014] The system retrieves the corresponding basic expression template from the preset emotion expression mapping table according to the emotion category.

[0015] Based on the emotion intensity value, the standard contraction strength in the basic expression template is proportionally mapped, and the contraction strength value is output.

[0016] The duration baseline value in the basic expression template is scaled in linear proportion to the emotion intensity value, and the duration value is output.

[0017] Based on the target muscle region in the basic facial expression template, the contraction strength value and duration value are matched to generate facial expression amplitude parameters.

[0018] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method of the present invention, the specific steps of adjusting the facial layout of the humanoid robot based on facial expression amplitude parameters and outputting bionic muscle contraction commands are as follows:

[0019] Extract the target muscle region of the humanoid robot from the facial expression amplitude parameters, number the facial muscle relationships in the target muscle region, and output the facial control number.

[0020] Convert the contraction strength value in the facial expression amplitude parameter into a strength duty cycle and output the duty cycle parameter;

[0021] The duration value in the facial expression amplitude parameter is mapped to a drive period, and the time duration parameter is output.

[0022] The facial control number, duty cycle parameter, and time duration parameter are encapsulated into instructions to output biomimetic muscle contraction instructions.

[0023] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method described in this invention, the specific steps of adjusting the facial muscle fibers of the humanoid robot according to bionic muscle contraction commands, and dynamically adjusting the surface texture details of the humanoid robot by combining the physical deformation characteristics of the humanoid robot's outer layer to generate simulated micro-expressions, are as follows:

[0024] Based on the bionic muscle contraction command, the facial components of the humanoid robot are activated, the physical position of the facial muscle fibers of the humanoid robot is adjusted, and surface deformation data is generated.

[0025] Extracting the elastic modulus and strain limit of the outer coating material of the humanoid robot from the outer coating layer;

[0026] By performing deformation matching on the surface deformation data, elastic modulus and strain limit, dynamic deformation trajectory of the humanoid robot's outer coating is generated;

[0027] Adjusting the surface texture details of the humanoid robot according to the dynamic deformation trajectory generates simulated micro-expressions.

[0028] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method of the present invention, the specific steps of performing deformation matching on the surface deformation data, elastic modulus, and strain limit to generate the dynamic deformation trajectory of the outer coating are as follows:

[0029] Based on the elastic modulus and strain limit, a deformation relationship equation is established;

[0030] Substitute the surface deformation data into the deformation relationship equation to obtain the stress-strain response of the humanoid robot's facial expression;

[0031] The displacement vector distribution of the humanoid robot's outer coating is obtained based on the stress-strain response. The displacement vector distribution is then integrated according to the time series of the deformation data to generate a dynamic deformation trajectory.

[0032] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method described in this invention, the specific steps of capturing user facial feedback data through camera eyes and assessing the user's acceptance of simulated micro-expressions based on the user's facial feedback data are as follows:

[0033] Capture user facial dynamic data through camera eyes;

[0034] Perform feature recognition on user facial dynamic data to obtain standardized facial movements;

[0035] Standardized facial movements are matched with intensity levels to quantify the user's facial intensity value and generate user facial feedback data.

[0036] The user's facial feedback data is weighted and corrected to assess the user's acceptance of simulated micro-expressions.

[0037] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method of the present invention, the specific steps of performing feature recognition on user facial dynamic data to obtain standardized facial movements are as follows:

[0038] The user's facial dynamic data is aligned to generate a standardized image sequence;

[0039] Extract the user's facial coordinates from a standardized image sequence, and construct a facial motion trajectory based on the user's facial coordinates;

[0040] Facial expression recognition is performed on the changes in facial coordinate displacement in the facial movement trajectory to obtain standardized facial movements.

[0041] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method of the present invention, the specific steps of comparing the acceptance level with the power spectral density features using Pearson correlation analysis and outputting a facial expression optimization strategy are as follows:

[0042] The power spectral density features are time-matched with the user's acceptance of simulated micro-expressions, and a feature scoring relationship table is output.

[0043] A normal distribution test is performed on the power spectral density features in the feature scoring relationship table and on the user's acceptance of simulated micro-expressions, and the normally distributed data are output.

[0044] Perform correlation matching on normally distributed data and output feature-rating pairs;

[0045] Based on the feature-rating pairs, the expression optimization direction is determined, the humanoid robot's expression is adjusted according to the expression optimization direction, and the expression optimization strategy is output.

[0046] As a preferred embodiment of the artificial intelligence-based interactive humanoid robot facial expression simulation method described in this invention, the facial expression optimization direction refers to enhancing or suppressing the power spectral density feature according to whether the correlation between the power spectral density feature and the degree of acceptance in the feature-rating pair is positive or negative, and combining the emotion intensity value of the emotion classification model to generate a facial expression optimization strategy that adjusts the facial expression amplitude parameters of the humanoid robot.
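The correlation step in [0041]-[0046] can be illustrated with a minimal sketch. It assumes the feature scoring relationship table is already time-matched, so each band's power values and the acceptance scores are equal-length sequences. The function names, the `threshold` of 0.3, and the "keep" outcome for weak correlations are illustrative assumptions rather than part of the described method, and the normal distribution test is omitted here.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def optimization_direction(band_features, acceptance, threshold=0.3):
    """Label each band 'enhance', 'suppress', or 'keep' from its correlation sign.

    `threshold` is a hypothetical cut-off below which the correlation is
    treated as too weak to act on.
    """
    directions = {}
    for band, values in band_features.items():
        r = pearson_r(values, acceptance)
        if r >= threshold:
            directions[band] = "enhance"
        elif r <= -threshold:
            directions[band] = "suppress"
        else:
            directions[band] = "keep"
    return directions
```

A positive correlation marks a band whose power rises together with acceptance, and a negative one marks a band whose power rises as acceptance falls; how the resulting direction changes the expression amplitude parameters is left to the strategy described in the text.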

[0047] The beneficial effects of this invention are as follows. Through a dual closed-loop modeling approach of deformation matching and user feedback-driven expression optimization, bio-realistic micro-expressions are achieved in humanoid robots. A deep convolutional neural network performs multi-scale feature convolution on the independent band features in the power spectral density features to generate emotion feature vectors, and a fully connected layer then performs probability assignment to output the emotion category and intensity value, achieving a precise quantitative mapping from EEG signals to emotional states. This overcomes the sensitivity of traditional visual recognition to environmental interference. Using the elastic modulus and strain limit of the humanoid robot's outer coating material, the surface deformation data generated by the biomimetic muscle contraction commands is substituted into the deformation relationship equation to obtain the stress-strain response, and the displacement vector distribution is integrated to generate a dynamic deformation trajectory. This drives real-time adjustment of surface texture details, synchronizing the dynamic deformation of the silicone material with muscle movement at an anatomical level and thus reducing the stiffness of mechanical expressions.

Description of the Drawings

[0048] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0049] Figure 1 is a flowchart of the method for simulating facial expressions in an interactive humanoid robot based on artificial intelligence.

[0050] Figure 2 is a flowchart of emotion classification and intensity quantification.

[0051] Figure 3 is a flowchart of emotion expression mapping.

[0052] Figure 4 is a flowchart of simulated micro-expression generation.

Detailed Implementation

[0053] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0054] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0055] The term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor does it refer to a separate or alternative embodiment that is mutually exclusive with other embodiments.

[0056] Referring to Figures 1-4, as one embodiment of the present invention, this embodiment provides a method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence, comprising the following steps:

[0057] S1. Collect the user's brainwave signals and preprocess them to generate power spectral density features.

[0058] Preprocessing includes noise reduction and frequency domain transformation of the user's EEG signal to generate power spectral density features;

[0059] Specifically, when collecting users' brainwave signals, the humanoid robot captures signals by using an electrode collector that fits against the user's head. The electrode collector uses a flexible material to contact the user's scalp and obtains the user's electrical signals through microcurrent sensing. The user's electrical signals are initially amplified by the signal enhancement circuit built into the electrode collector and output as user brainwave signals.

[0060] After acquiring the user's EEG signal, it is necessary to perform noise reduction processing to eliminate interference components and ensure the accuracy of subsequent analysis. Noise reduction is achieved through multi-stage filtering technology: First, a bandpass filter is used to remove slowly drifting signals and high-frequency noise; second, eye movement and electromyography signals are acquired simultaneously as reference signals, and the similarity between the reference signal and the user's EEG signal is compared in real time. Real-time comparison is achieved through adaptive filtering algorithms (such as LMS or RLS), which statistically analyze the error value between the reference signal and the EEG signal in real time, and dynamically adjust the filtering parameters according to the magnitude of the error to match the current interference characteristics (such as high-frequency jumps caused by blinking or sudden waveforms of muscle contraction); finally, the filtered signal is smoothed in the time domain to eliminate long-term baseline shifts (such as changes in electrode contact resistance) and periodic interference (such as 50Hz power frequency noise), ultimately outputting a continuous and stable user EEG signal;
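The adaptive filtering stage described above can be sketched with a basic LMS canceller. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a single reference channel (for example an eye-movement recording), and the function name `lms_cancel`, the tap count `n_taps`, and the step size `mu` are hypothetical choices.

```python
import numpy as np

def lms_cancel(eeg, reference, n_taps=8, mu=0.01):
    """Remove the part of `eeg` that is linearly predictable from `reference`.

    The error between the EEG sample and the filtered reference is used both
    as the cleaned output and as the signal that adapts the filter weights.
    """
    eeg = np.asarray(eeg, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    cleaned = np.zeros_like(eeg)
    for n in range(eeg.size):
        # Most recent n_taps reference samples, newest first, zero-padded at the start.
        recent = reference[max(0, n - n_taps + 1): n + 1][::-1]
        x = np.zeros(n_taps)
        x[:recent.size] = recent
        estimate = weights @ x           # estimated interference in the EEG channel
        error = eeg[n] - estimate        # residual after removing the estimate
        weights += 2.0 * mu * error * x  # LMS update driven by the error magnitude
        cleaned[n] = error
    return cleaned
```

The step size trades convergence speed against residual noise; an RLS variant would replace the weight update but keep the same reference-and-error structure.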

[0061] Based on the noise-reduced user EEG signal, power spectral density features are extracted through frequency domain transformation to quantify the energy distribution of the user EEG signal in different frequency bands. In specific operation, the continuous user EEG signal is segmented into non-overlapping time windows of fixed duration (e.g., 1 second). A Fast Fourier Transform (FFT) is then performed on each segment to convert the time domain data into a complex frequency domain representation, and the squared magnitude at each frequency point gives the power spectrum. The power spectrum is further normalized to power spectral density, and random fluctuations are reduced by the moving average method. The average power value of each frequency band (such as alpha, beta, and theta waves) is extracted as the power spectral density feature.
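The windowing, FFT, normalization, smoothing, and band-averaging steps can be sketched as follows. The band edges in `BANDS`, the periodogram normalization by `fs * n`, and the choice to apply the moving average across consecutive windows (rather than across frequency points) are assumptions; the text does not fix any of them.

```python
import numpy as np

# Assumed band edges in Hz; the text names the bands but not their limits.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_power_features(signal, fs, window_s=1.0, smooth=3, bands=BANDS):
    """Return the mean power spectral density per band for each smoothed window."""
    signal = np.asarray(signal, dtype=float)
    n = int(round(window_s * fs))                 # samples per window
    n_windows = signal.size // n
    segments = signal[: n_windows * n].reshape(n_windows, n)
    spectrum = np.fft.rfft(segments, axis=1)      # complex frequency-domain data
    psd = np.abs(spectrum) ** 2 / (fs * n)        # squared magnitude, normalized
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    if smooth > 1 and n_windows >= smooth:
        # Moving average over `smooth` consecutive windows; output has
        # n_windows - smooth + 1 rows.
        padded = np.vstack([np.zeros((1, psd.shape[1])), psd])
        csum = np.cumsum(padded, axis=0)
        psd = (csum[smooth:] - csum[:-smooth]) / smooth
    return {
        name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
        for name, (lo, hi) in bands.items()
    }
```

Stacking the returned arrays row by row gives a "number of frequency bands × number of time windows" matrix of the kind the later classification step takes as input.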

[0062] S2. Input the power spectral density features into the pre-trained emotion classification model, and use a deep convolutional neural network to classify and quantify the intensity of the power spectral density features to output the user's emotion category and emotion intensity value.

[0063] Specifically, the training and generation steps of the pre-trained emotion classification model are as follows: First, emotion state samples are extracted from historical EEG data. The emotion state samples must contain clear emotion labels (such as positive, negative, neutral) and corresponding power spectral density features. The historical EEG data needs to be cleaned to remove outliers and low-quality samples to ensure the representativeness and diversity of the samples. Then, the historical EEG data is divided in a preset proportion into a training set, a validation set, and a test set, used respectively for training the emotion classification model, parameter adjustment, and performance evaluation.

[0064] An emotion classification model is constructed based on a deep convolutional neural network. The model consists of multi-scale convolutional layers, pooling layers, fully connected layers, and an output layer. The multi-scale convolutional layers extract local frequency band patterns from the power spectral density features using convolutional kernels of different sizes (such as 3×3 and 5×5), enhancing the model's ability to perceive multi-frequency band features. The pooling layers use max pooling to reduce feature dimensionality and computational complexity. The fully connected layers use non-linear activation functions (such as ReLU) to achieve high-order combinations of features and output the probability distribution of emotion categories. The output layer uses the Softmax function to normalize the probability distribution, ensuring that the output values are between 0 and 1 and sum to 1.

[0065] The emotion classification model is iteratively trained using the training set. The model's parameters are initialized, and optimization algorithms (such as Adam) and learning rates (such as 3e-4) are set. Parameters are updated using backpropagation to minimize the loss function (such as cross-entropy loss). Regularization techniques (such as L2 regularization) are introduced during training to prevent overfitting, and early stopping is used to monitor validation set performance and prevent performance degradation in the later stages of training. After each training round, the hyperparameters of the emotion classification model (such as learning rate decay and batch size) are adjusted using the validation set to gradually improve the model's generalization ability. After training is complete, the model's performance is evaluated on the test set, ultimately generating the trained emotion classification model.

[0066] The power spectral density features are input into a pre-trained emotion classification model. The emotion classification model performs multi-scale feature convolution on the independent band features in the power spectral density features to generate emotion feature vectors.

[0067] Specifically, the power spectral density feature includes the average power values of the alpha, beta, and theta frequency bands. The power spectral density feature is organized into a two-dimensional feature matrix with a fixed dimension (e.g., number of frequency bands × number of time windows) as input data. A pre-trained emotion classification model receives the two-dimensional feature matrix and extracts local patterns from independent frequency band features in the power spectral density feature through multi-scale convolutional layers, generating intermediate feature representations with hierarchical semantics. The multi-scale convolutional layers of the emotion classification model sequentially apply convolutional kernels of different sizes (e.g., 3×3 and 5×5) to the input two-dimensional feature matrix using a sliding window operation, extracting local energy change patterns of each frequency band feature within different time windows. After the convolution operation, an activation function enhances the nonlinear expressive power of the features. Finally, the convolution results at each scale are concatenated into a high-dimensional feature vector, which serves as the emotion feature vector. This emotion feature vector integrates local pattern information from multiple frequency bands and time scales.
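The multi-scale convolution and concatenation described above can be sketched in plain NumPy. This is only a structural illustration: in the described model the kernels are learned during training, whereas the kernels passed in here are whatever the caller supplies, and pooling is omitted. The function names and the zero-padding choice are assumptions.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Sliding-window 2-D correlation with zero padding; output has the shape of x.

    Assumes an odd-sized kernel such as 3x3 or 5x5.
    """
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def multiscale_features(psd_matrix, kernels):
    """Apply each kernel, pass the result through ReLU, and concatenate all scales."""
    x = np.asarray(psd_matrix, dtype=float)
    branches = []
    for kernel in kernels:
        activated = np.maximum(conv2d_same(x, kernel), 0.0)  # ReLU activation
        branches.append(activated.ravel())
    return np.concatenate(branches)  # stands in for the emotion feature vector
```

With one 3×3 and one 5×5 kernel on a bands × windows matrix, the result is a vector twice the size of the input matrix, one half per scale.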

[0068] The emotion feature vector is input into the fully connected layer of a deep convolutional neural network to perform emotion category probability assignment and output the emotion category probability distribution.

[0069] The emotion feature vector is fed into a fully connected layer whose weight matrix is updated by backpropagation during the training phase of the pre-trained emotion classification model. The initial values of the weight matrix are generated based on the mapping relationship between emotion state samples and power spectral density features in the training set. The fully connected layer gradually compresses the feature dimension through linear transformation (i.e., multiplying the emotion feature vector by the weight matrix) and nonlinear activation functions (such as ReLU) to obtain higher-order correlations of emotion categories. Finally, the output is normalized using the Softmax function to generate an emotion category probability distribution (such as over positive, negative, and neutral), with each emotion category assigned a probability value.

[0070] Perform category transformation on the probability distribution of emotion categories and output the emotion category;

[0071] Based on the probability distribution of emotion categories, the category with the highest probability value is selected as the final emotion category; for example, if the probability of the positive category is 0.8, the probability of the negative category is 0.1, and the probability of the neutral category is 0.1, then "positive" is output as the emotion category.

[0072] Extract the maximum probability value from the probability distribution of emotion categories, and perform a numerical conversion of the maximum probability value to the emotion intensity value, outputting the emotion intensity value.

[0073] The maximum probability value (e.g., 0.8) is extracted from the probability distribution of emotion categories. The maximum probability value directly reflects the confidence of the emotion classification model in the current emotion category. Then, the maximum probability value is numerically converted into emotion intensity through linear scaling or a non-linear function (e.g., exponential function). For example, if the maximum probability value is 0.8, the emotion intensity value is 80, indicating that the current emotion intensity is "strong positive". Finally, the emotion intensity value is output, completing the entire process of emotion classification and intensity quantification.
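The category selection and intensity conversion can be sketched as below, using the linear scaling option (probability × 100) from the example in the text. The label order in `LABELS` and the function names are assumptions made for illustration.

```python
import numpy as np

# Assumed label order; the text lists these three categories as examples.
LABELS = ("positive", "negative", "neutral")

def softmax(logits):
    """Normalize raw scores into probabilities that lie in [0, 1] and sum to 1."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def classify_emotion(probabilities, labels=LABELS):
    """Return (emotion category, emotion intensity on a 0-100 scale)."""
    p = np.asarray(probabilities, dtype=float)
    idx = int(np.argmax(p))               # category with the highest probability
    intensity = 100.0 * float(p[idx])     # linear scaling of the maximum probability
    return labels[idx], intensity
```

Calling `classify_emotion([0.8, 0.1, 0.1])` selects "positive" with an intensity of about 80, matching the worked example in the text. Note that this intensity is the classifier's confidence rescaled, so it measures how sure the model is rather than an independent estimate of emotional strength.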

[0074] S3. Map the emotion category and emotion intensity value to the emotion expression, generate the expression amplitude parameter, adjust the facial layout of the humanoid robot based on the expression amplitude parameter, and output the bionic muscle contraction command.

[0075] The system retrieves the corresponding basic expression template from the preset emotion expression mapping table according to the emotion category.

[0076] Specifically, emotion categories include explicit emotion labels. The preset emotion expression mapping table is a structured two-dimensional table, where rows represent different emotion categories and columns represent the attributes of the basic expression template (such as standard contraction strength, duration baseline value, and target muscle area). Each row of the basic expression template stores facial expression parameters corresponding to a specific emotion category. For example, the basic expression template for a positive emotion category may include a standard contraction strength of 50% and a duration baseline value of 2 seconds for the target muscle areas of the zygomaticus major and orbicularis oculi muscles. By using the current emotion category as input, the complete record of the corresponding row is matched from the emotion expression mapping table to extract all parameters of the basic expression template.

[0077] Based on the emotion intensity value, the standard contraction strength in the basic expression template is proportionally mapped, and the contraction strength value is output.

[0078] The emotion intensity value represents the confidence level of the current emotion category (e.g., a numerical range of 0-100). The standard contraction strength in the basic expression template is a preset fixed value (e.g., 50% of the maximum muscle contraction strength) used to describe the muscle contraction strength of the emotion category; this preset fixed value is set based on the natural contraction strength of human muscles under different emotional states. The emotion intensity value and the standard contraction strength are converted by linear proportion: the emotion intensity value is treated as a proportional factor that acts directly on the standard contraction strength, so that the muscle contraction strength is dynamically adjusted as the emotion intensity changes. For example, when the emotion intensity value is 100, the contraction strength value reaches its maximum value, and when the emotion intensity value is 0, the contraction strength value is 0%. The proportional mapping ensures an intuitive correspondence between expression intensity and emotion intensity, avoiding overly stiff or exaggerated facial expressions.

[0079] The duration baseline value in the basic expression template is scaled in linear proportion to the emotion intensity value, and the duration value is output.

[0080] The duration baseline value in the basic expression template is a fixed duration (e.g., 2 seconds) that represents the duration of the expression for each emotion category. A dynamically adjusted duration value is generated by linearly scaling the duration baseline value with the emotion intensity value. The emotion intensity value is treated as a time-scaling factor, so the expression duration is dynamically adjusted according to changes in emotion intensity. For example, when the emotion intensity value is 100, the duration value is 2 seconds, while when the emotion intensity value is 50, the duration value is 1 second. This proportional adjustment ensures the coordination between expression duration and emotion intensity; expressions last longer under strong emotions and fade faster under weak emotions.

[0081] Based on the target muscle regions in the basic facial expression template, region matching is performed on the contraction strength value and duration value to generate facial expression amplitude parameters;

[0082] The target muscle region is defined by the basic expression template, which includes the numbers of multiple facial muscle groups (such as the orbicularis oculi and zygomaticus major) and their corresponding standard contraction patterns. The dynamically adjusted contraction strength and duration values are mapped to each muscle group in the target muscle region. The target muscle region, contraction strength, and duration values are combined into structured data to form the expression amplitude parameters.

[0083] Extract the target muscle region of the humanoid robot from the facial expression amplitude parameters, number the facial muscle relationships in the target muscle region, and output the facial control number.

[0084] Specifically, the facial expression amplitude parameters include target muscle region information, which is converted into facial control numbers that the humanoid robot can execute; based on the facial muscle layout of the humanoid robot, each muscle group in the target muscle region is mapped to the corresponding facial control number; by comparing the correspondence between the target muscle region and the robot's facial layout one by one, a complete list of facial control numbers is generated; by converting muscle names in biological anatomy into numbers that the robot can recognize, the accuracy of control commands is ensured.

[0085] Convert the contraction strength value in the facial expression amplitude parameter into a strength duty cycle and output the duty cycle parameter;

[0086] The contraction strength value represents the relative contraction intensity of each muscle group, while the duty cycle parameter is defined as the proportion of each cycle during which the electronic signal is at a high level, directly controlling the degree of contraction of the bionic muscle. By mapping the contraction strength value to the duty cycle (e.g., 0%-100%) at a fixed ratio (e.g., 1:1), the corresponding duty cycle parameter is generated. For example, if the contraction strength value is 40%, the duty cycle parameter is 40%, indicating that the drive signal to the motor is at a high level for 40% of each cycle, driving the bionic muscle to contract. The strength-to-duty-cycle conversion ensures a linear correspondence between mechanical action and emotional intensity.

[0087] The duration value in the facial expression amplitude parameter is mapped to a driving cycle, and the duration parameter is output.

[0088] The duration value represents the duration of facial expression maintenance for each muscle group (e.g., 1.6 seconds), which needs to be converted into the driving cycle parameter of the electronic signal (e.g., 1600 milliseconds). The driving cycle parameter is defined as the complete cycle length of the electronic signal, controlling the frequency of contraction and relaxation alternation of the bionic muscle. By mapping the duration value to the driving cycle (e.g., 0-5000 milliseconds) at a fixed ratio (e.g., 1:1), the corresponding duration parameter is generated. The driving cycle mapping process ensures precise control of the duration of facial expression maintenance. For example, a longer driving cycle parameter makes the muscle contraction state last longer, and a shorter driving cycle parameter makes the muscle relax faster, thereby achieving a natural transition of dynamic facial expressions.

[0089] The facial control number, duty cycle parameter and time duration parameter are encapsulated into instructions to output biomimetic muscle contraction instructions.

[0090] The facial control number, duty cycle parameter, and duration parameter correspond to the control target, contraction intensity, and duration of the bionic muscle, respectively. By combining the facial control number, duty cycle parameter, and duration parameter in a fixed format, a bionic muscle contraction command is generated. The bionic muscle contraction command transforms abstract emotion parameters into specific motor control signals, ensuring that the humanoid robot can execute facial expressions that match the current emotion category and emotion intensity value.
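The encapsulation step can be sketched as below, using the 1:1 mappings given as examples in the text (strength percentage to duty cycle percentage, seconds to milliseconds, with a 0-5000 ms driving cycle range). The `MuscleCommand` structure, the `build_command` function, and the clamping of out-of-range values are assumptions made for illustration; the patent only states that the three fields are combined in a fixed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuscleCommand:
    """Hypothetical fixed-format bionic muscle contraction command."""
    control_number: int    # facial control number of the target muscle group
    duty_cycle_pct: float  # share of each cycle at high level, 0-100
    period_ms: int         # driving cycle length in milliseconds


def build_command(control_number, strength_pct, duration_s, max_period_ms=5000):
    """Map strength to duty cycle (1:1) and duration to driving cycle (s -> ms)."""
    duty = min(max(float(strength_pct), 0.0), 100.0)           # clamp to 0-100 %
    period = int(round(duration_s * 1000))                      # 1:1 in milliseconds
    period = min(max(period, 0), max_period_ms)                 # clamp to 0-5000 ms
    return MuscleCommand(control_number, duty, period)


if __name__ == "__main__":
    # Example from the text: 40 % strength held for 1.6 s.
    print(build_command(control_number=3, strength_pct=40.0, duration_s=1.6))
```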

[0091] S4. Adjust the facial muscle fibers of the humanoid robot according to the bionic muscle contraction command, and dynamically adjust the surface texture details of the humanoid robot by combining the physical deformation characteristics of the humanoid robot's outer layer to generate simulated micro-expressions.

[0092] Based on the bionic muscle contraction command, the facial components of the humanoid robot are activated, the physical position of the facial muscle fibers of the humanoid robot is adjusted, and surface deformation data is generated.

[0093] Specifically, the bionic muscle contraction command includes a facial control number, a duty cycle parameter, and a duration parameter. The facial control number indicates the facial muscle area to be activated, the duty cycle parameter defines the intensity ratio of the muscle contraction, and the duration parameter controls the duration for which the muscle maintains the contracted state. By inputting the bionic muscle contraction command into the drive center of the humanoid robot, the facial motor of the facial muscle component with the corresponding number is triggered to start. The facial motor adjusts the current output according to the duty cycle parameter, causing the bionic muscle fibers to stretch or contract in a fixed direction, changing their physical position. By recording the positional changes of each muscle fiber (such as displacement and angular offset), surface deformation data matching the current emotion category and emotion intensity value is generated. The surface deformation data includes the dynamic adjustment process and final shape of the muscle fibers.

[0094] Extracting the elastic modulus and strain limit of the outer coating material of the humanoid robot from the outer coating layer;

[0095] The outer layer of the humanoid robot is made of a highly elastic polymer material. The elastic modulus and strain limit are obtained through the physical properties of the highly elastic polymer material. The elastic modulus reflects the stiffness of the highly elastic polymer material under stress, that is, the ability of the highly elastic polymer material to resist deformation. The strain limit represents the maximum deformation that the highly elastic polymer material can withstand before fracture.

[0096] Based on the elastic modulus and strain limit, a deformation relationship equation is established;

[0097] The elastic modulus and strain limit are used to construct the deformation relationship equation, which describes the stress-strain characteristics of highly elastic polymer materials under stress. The deformation relationship equation is based on Hooke's law and extends it to consider the nonlinear deformation characteristics of highly elastic polymer materials (such as plastic deformation near the strain limit). When highly elastic polymer materials approach the strain limit, the stress-strain relationship no longer follows a linear proportion, but exhibits hardening or softening characteristics. For example, the stress gradually decreases with increasing strain, or it enters the plastic flow stage. At this time, the deformation relationship equation needs to introduce nonlinear terms (such as power law relationship), and combine the hardening of highly elastic polymer materials (such as isotropic hardening and kinematic hardening) and yield stress to quantify the cumulative effect of plastic deformation. By integrating the elastic modulus, yield stress, hardening conditions and strain limit of highly elastic polymer materials, the deformation relationship equation can dynamically adjust the stress-strain response, thereby more accurately predicting the deformation behavior of materials under complex stress states, especially the nonlinear performance near the failure limit.

[0098] The equation for the deformation relationship is as follows:

[0099] [Equation not preserved in this text.]

[0100] The quantities in the equation are, in order: the stress response of the highly elastic polymer material under load; the elastic modulus of the highly elastic polymer material; the relative deformation (strain) of the highly elastic polymer material, a dimensionless quantity; the critical stress value at which the material begins to undergo plastic deformation; the critical strain value at which the material transitions from elastic to plastic behavior; the parameter controlling the degree of nonlinearity of stress growth during the plastic stage, a dimensionless quantity; and the maximum strain value before the material fractures, i.e., the strain limit.
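Because the equation itself does not survive in this text, the following is only a sketch of one form that would be consistent with the listed quantities and with the power-law extension of Hooke's law described earlier. It is an assumption, not the patent's equation: stress is taken as linear (elastic modulus times strain) up to the yield strain, and as the yield stress multiplied by (strain / yield strain) raised to the nonlinearity parameter beyond it, up to the strain limit. The yield stress is derived as elastic modulus times yield strain so that the curve is continuous. All names are hypothetical.

```python
def stress_response(strain, elastic_modulus, yield_strain, nonlinearity, strain_limit):
    """Piecewise stress-strain sketch: Hooke's law, then an assumed power law.

    strain, yield_strain, strain_limit and nonlinearity are dimensionless;
    the result has the same unit as elastic_modulus.
    """
    if strain < 0 or strain > strain_limit:
        raise ValueError("strain must lie between 0 and the strain limit")
    if strain <= yield_strain:
        # Elastic stage: stress proportional to strain.
        return elastic_modulus * strain
    # Plastic stage: continuous at the yield point by construction.
    yield_stress = elastic_modulus * yield_strain
    return yield_stress * (strain / yield_strain) ** nonlinearity


if __name__ == "__main__":
    # Illustrative numbers only; not material data from the patent.
    for eps in (0.05, 0.10, 0.40):
        print(eps, stress_response(eps, 2.0, 0.10, 0.5, 1.0))
```

A nonlinearity parameter below 1 makes stress grow more slowly than in the elastic stage, and a value above 1 makes it grow faster, which matches the softening and hardening cases mentioned in the description.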

[0101] Substitute the surface deformation data into the deformation relationship equation to obtain the stress-strain response of the humanoid robot's facial expression;

[0102] Specifically, the surface deformation data contains changes in the physical position of muscle fibers, which need to be converted into stress-strain input for the humanoid robot's outer coating material. By substituting each deformation event (such as the stretching / compression of different muscle groups) in the surface deformation data into the deformation relationship equation, the stress-strain response at the corresponding location is generated. Finally, the stress-strain response is presented in the form of spatial distribution, including the stress state (such as tensile stress and compressive stress) and deformation of each region of the humanoid robot's outer coating material.

[0103] The displacement vector distribution of the humanoid robot's outer coating is obtained based on the stress-strain response. The displacement vector distribution is then integrated according to the time series of the deformation data to generate a dynamic deformation trajectory.

[0104] The stress-strain response describes the stress state and deformation of the outer coating material in each region. By arranging the displacement vectors of all regions according to their spatial positions, a complete displacement vector distribution map is generated. Subsequently, based on the time series of the surface deformation data (such as the onset and duration of muscle contraction), the displacement vector distribution is integrated according to the time step. For example, at the beginning, the orbicularis oculi muscle begins to contract, causing the eye displacement vector to gradually increase; at the end, the displacement vector stabilizes. Finally, the dynamic deformation trajectory describes the deformation process of the humanoid robot's outer coating material in both time and space dimensions, providing dynamic input for subsequent adjustments to surface texture details.

[0105] Adjust the surface texture details of the humanoid robot according to the dynamic deformation trajectory to generate simulated micro-expressions;

[0106] The adjustment process is achieved through a high-precision micro-motor, which uses micro-actuators (such as piezoelectric ceramics) to deform the microstructure of the surface texture (such as protrusions and grooves) in real time. Each time step of the dynamic deformation trajectory corresponds to a set of texture adjustment instructions. The surface texture details change synchronously with the dynamic deformation trajectory, generating a simulated micro-expression that is highly matched with the current emotion category and intensity value. For example, under a strong positive emotion, the wrinkles at the corners of the eyes deepen and the cheeks puff out slightly, forming a natural smiling expression.

[0107] S5. Capture user facial feedback data through camera eyeballs, assess user acceptance of simulated micro-expressions based on user facial feedback data, compare acceptance with power spectral density features using Pearson correlation, and output expression optimization strategy.

[0108] Capture user facial dynamic data through camera eyeballs;

[0109] Specifically, during the interaction between the user and the humanoid robot, the humanoid robot uses high-resolution camera eyes to capture the user's face in real time. The real-time capture continuously records the dynamic changes of the user's face at a fixed frame rate (such as 30 frames / second). The acquired dynamic data of the user's face includes the user's facial muscle movements, expression changes and micro-expression features. The dynamic data of the user's face is stored in an uncompressed format, retaining complete pixel information and timestamps.

[0110] The user's facial dynamic data is aligned to generate a standardized image sequence;

[0111] The system collects dynamic facial data from camera eyeballs, extracts standard coordinate points for the user's face through facial detection, and covers the main expression control areas such as eyebrows, eyes, nose, and mouth. Based on the extracted standard coordinate points, an affine transformation algorithm is used to perform geometric correction on each frame of the image to ensure that the position, angle, and scale of the user's face remain consistent across all images. The corrected image sequence is then normalized to crop the user's facial area to a fixed size (e.g., 256×256 pixels) and adjust the brightness and contrast to a uniform level. The output format of the normalized image sequence is a continuous frame image stream, with each frame containing a timestamp and corresponding facial coordinate point information.

[0112] Extract the user's facial coordinates from a standardized image sequence, and construct a facial motion trajectory based on the user's facial coordinates;

[0113] Based on the standard facial coordinates of each frame in a standardized image sequence, a time-series facial motion trajectory is constructed. The motion trajectory is stored in the form of a three-dimensional coordinate array, and the timestamp of each coordinate point is strictly synchronized with the acquisition time of the image frame. The missing coordinate points caused by the image acquisition interval are filled in by the interpolation algorithm to ensure the continuity of the trajectory. The construction process of the motion trajectory also records the displacement changes of key areas of the user's face, such as the contraction amplitude of the orbicularis oculi muscle and the stretching direction of the zygomaticus major muscle. The completed facial motion trajectory serves as the basic description of the user's facial expression dynamics, providing kinematic data support for subsequent expression recognition.
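The gap-filling step for one tracked coordinate point can be sketched as follows. The text says only that an interpolation algorithm is used; linear interpolation between the nearest known frames, and holding the nearest known value at the ends of the track, are assumptions, as is the function name `fill_missing`.

```python
def fill_missing(track):
    """Fill None entries in a per-frame list of (x, y, z) points.

    Interior gaps are linearly interpolated between the nearest known frames;
    leading or trailing gaps copy the nearest known point.
    """
    known = [i for i, point in enumerate(track) if point is not None]
    if not known:
        raise ValueError("track contains no known points")
    filled = list(track)
    for i, point in enumerate(track):
        if point is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            filled[i] = track[nxt]
        elif nxt is None:
            filled[i] = track[prev]
        else:
            t = (i - prev) / (nxt - prev)  # fractional position inside the gap
            filled[i] = tuple(a + t * (b - a) for a, b in zip(track[prev], track[nxt]))
    return filled


if __name__ == "__main__":
    print(fill_missing([(0.0, 0.0, 0.0), None, (2.0, 4.0, 6.0)]))
```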

[0114] Facial coordinate displacement changes in facial motion trajectories are used to perform expression recognition and obtain standardized facial movements.

[0115] Based on coordinate displacement data in facial motion trajectories, facial coordinate displacements are classified. The classification process involves acquiring the displacement amplitude and direction of standard coordinate points and matching the displacement features with actions defined in the FACS standard. The FACS standard decomposes human facial expressions into multiple standardized muscle activity patterns based on the smallest observable movements of the facial muscles, called action units (AUs). Each AU corresponds to a specific muscle contraction or relaxation action (such as frowning, raising eyebrows, and lifting the corners of the mouth), and the degree of expression is described by intensity grading (0-5 levels). The intensity of each action is quantified by the absolute value of the displacement, and the quantified action intensity value is bound to a timestamp to form a data record containing the action type, action intensity value, and time interval. By analyzing the combination relationships of multiple AUs, such as the co-activation of zygomaticus major muscle elevation and orbicularis oculi muscle contraction, composite expression labels for users, such as smiling or surprised, are derived. The final output standardized facial movements are stored in structured data form, including the classification result of each action, action intensity value, and time interval, for subsequent user feedback analysis.

[0116] Standardized facial movements are matched with intensity levels to quantify the user's facial intensity value and generate user facial feedback data.

[0117] Specifically, based on the motion intensity values in standardized facial movements, and combined with the intensity grading rules of the FACS standard, the motion intensity values are converted into quantitative scores. The conversion and quantification process is achieved through linear mapping, for example, mapping a motion intensity value of 5 to 100% and a motion intensity value of 0 to 0%. The quantified scores are then matched with the emotion category of the user's current interaction scenario to generate user facial feedback data.

[0118] The user's facial feedback data is weighted and corrected to assess the user's acceptance of simulated micro-expressions;

[0119] The quantitative scores from user facial feedback data are combined with a preset emotion weight matrix for correction. The emotion weight matrix is set according to the sensitivity differences of each action under different emotion categories. By analyzing the performance of each facial action under different emotions and combining the differences in users' natural perception of expressions in real scenarios, the relative importance of each action in a specific emotion is determined. For example, in the emotion of surprise, AU25 (lips part) has a higher weight than AU1 (inner brow raiser). The weighted and corrected intensity values are normalized to generate the user's acceptance level of simulated micro-expressions.
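The scoring and weighted correction can be sketched as follows. The linear 0-5 to 0-100% mapping comes from the text; treating the acceptance level as a weighted mean of the scores normalized to the range 0 to 1, the dictionary layout, and the weight values in the example are assumptions.

```python
def intensity_to_score(level):
    """Linearly map a 0-5 action intensity level to a 0-100 % score."""
    if not 0 <= level <= 5:
        raise ValueError("level must be in [0, 5]")
    return level / 5 * 100


def acceptance(action_levels, weights):
    """Weighted mean of action scores, normalized to [0, 1].

    action_levels: {action unit name: intensity level 0-5}
    weights: {action unit name: weight for the current emotion category}
    """
    total_weight = sum(weights[au] for au in action_levels)
    if total_weight == 0:
        raise ValueError("weights of the observed actions sum to zero")
    weighted = sum(weights[au] * intensity_to_score(level)
                   for au, level in action_levels.items())
    return weighted / total_weight / 100


if __name__ == "__main__":
    # Hypothetical weights for one emotion category.
    print(acceptance({"AU6": 0, "AU12": 5}, {"AU6": 1.0, "AU12": 3.0}))
```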

[0120] The power spectral density features are time-matched with the user's acceptance of simulated micro-expressions, and a feature scoring relationship table is output.

[0121] The time series of power spectral density features is synchronized and aligned with the time series of user acceptance ratings to ensure the correspondence between power spectral density features and user acceptance of simulated micro-expressions on the time axis. The aligned power spectral density features and user acceptance of simulated micro-expressions are divided into time segments using a sliding window method, and the power spectral density features in each time segment are associated with the corresponding acceptance rating. The associated data is stored in tabular form, with each row of the table recording the power spectral density features of the time segment and the corresponding acceptance rating.

[0122] Normal distribution tests are performed on the power spectral density features in the feature scoring relationship table and the user's acceptance of simulated micro-expressions, and normal distribution data are output.

[0123] Normality tests are performed on the power spectral density features and the acceptance values in the feature scoring relationship table. The Kolmogorov-Smirnov test is used to compare the goodness of fit between the data distribution and the standard normal distribution. Specifically, the maximum vertical distance (D statistic) between the actual data distribution of the power spectral density features and acceptance values and the standard normal distribution curve is calculated, and the degree of matching in terms of symmetry, central tendency, and tail thickness is observed to determine whether the data conform to a normal distribution. Power spectral density features and acceptance values that are not normally distributed are processed with the Box-Cox transformation so that they satisfy the normality assumption. The Box-Cox transformation adjusts the data distribution to bring the original data closer to a normal distribution; its core is to apply a power function or logarithmic function as a nonlinear transformation, thereby reducing data skew and stabilizing variance. Because the transformation requires strictly positive inputs, when the data contain zero or negative values, all values must first be shifted to positive values through a translation operation.
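The Box-Cox step, including the translation applied to non-positive data, can be sketched as follows for a given transformation parameter. How that parameter is chosen (commonly by maximum likelihood) is not stated in the text and is not shown here; the function name `box_cox`, the parameter name `lam`, and the default shift of one minus the smallest value are assumptions.

```python
import math


def box_cox(values, lam, shift=None):
    """Box-Cox transform: log(x) when lam == 0, otherwise (x**lam - 1) / lam.

    If the data contain zero or negative values, they are first translated so
    that the smallest value becomes 1 (an assumed choice of shift).
    """
    if shift is None:
        lowest = min(values)
        shift = 0.0 if lowest > 0 else 1.0 - lowest
    shifted = [v + shift for v in values]
    if any(v <= 0 for v in shifted):
        raise ValueError("shifted data must be strictly positive")
    if lam == 0:
        return [math.log(v) for v in shifted]
    return [(v ** lam - 1) / lam for v in shifted]


if __name__ == "__main__":
    # Data with a negative value are translated before the transform.
    print(box_cox([-1.0, 0.0, 3.0], lam=0.5))
```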

[0124] Perform correlation matching on normally distributed data and output feature-rating pairs;

[0125] For the normally distributed power spectral density features and acceptance values, the Pearson correlation coefficient between the two is obtained. The Pearson correlation coefficient is the covariance of the power spectral density features and the acceptance values divided by the product of their standard deviations, with a result ranging from -1 to 1. The larger the absolute value of the Pearson correlation coefficient, the stronger the linear relationship between the power spectral density features and acceptance. Feature-rating pairs are selected through significance tests (such as t-tests).

[0126] Based on the feature-rating pair, determine the direction for expression optimization, adjust the humanoid robot's expression according to the expression optimization direction, and output the expression optimization strategy;
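The correlation and its significance statistic can be sketched in plain Python as follows. The Pearson coefficient is the covariance divided by the product of the standard deviations, and the t statistic uses the standard form r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function names are hypothetical, and the threshold used to select feature-rating pairs is not given in the text, so none is applied here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (dev_x * dev_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    psd = [0.2, 0.4, 0.5, 0.9]          # illustrative feature values
    rating = [0.1, 0.5, 0.4, 0.8]       # illustrative acceptance values
    r = pearson_r(psd, rating)
    print(r, t_statistic(r, len(psd)))
```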

[0127] The direction of expression optimization is determined by the sign of the correlation between the power spectral density features and acceptance in the feature-rating pair. When the Pearson correlation coefficient is positive, higher energy in the power spectral density feature is associated with higher user acceptance, so the power of the humanoid robot's facial actuation signal is increased. When the correlation coefficient is negative, lower energy in the power spectral density feature is associated with higher user acceptance, so the power of the humanoid robot's facial actuation signal is reduced. The power adjustment range is dynamically scaled by combining the emotion intensity value of the emotion classification model. For example, in high emotion intensity scenarios, the power of relevant features is enhanced to amplify the expression effect; in low emotion intensity scenarios, the power of relevant features is suppressed to avoid over-exaggeration. Finally, the adjusted power parameters are mapped back to the humanoid robot's expression amplitude parameters to generate an expression optimization strategy that includes the target muscle region, duty cycle parameters, and duration parameters.
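The sign-based adjustment can be sketched as follows. Only the rule "positive correlation raises power, negative correlation lowers it, scaled by emotion intensity" comes from the text. The normalized power range of 0 to 1, the maximum step size, the scaling of the step by intensity/100, and the name `adjust_drive_power` are assumptions made for illustration.

```python
def adjust_drive_power(current_power, r, intensity, max_step=0.2):
    """Return an adjusted actuation power in [0, 1].

    r: Pearson correlation coefficient of the selected feature-rating pair.
    intensity: emotion intensity value in [0, 100], used to scale the step.
    """
    if r > 0:
        direction = 1    # positive correlation: increase power
    elif r < 0:
        direction = -1   # negative correlation: reduce power
    else:
        direction = 0    # no linear relationship: leave power unchanged
    step = max_step * (intensity / 100.0)
    return min(max(current_power + direction * step, 0.0), 1.0)


if __name__ == "__main__":
    print(adjust_drive_power(0.5, r=0.8, intensity=100))
    print(adjust_drive_power(0.5, r=-0.8, intensity=50))
```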

[0128] This embodiment also provides a computer device applicable to the artificial intelligence-based interactive humanoid robot facial expression simulation method, comprising: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to realize the artificial intelligence-based interactive humanoid robot facial expression simulation method proposed in the above embodiment.

[0129] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0130] This embodiment also provides a storage medium storing a computer program, which, when executed by a processor, implements the method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0131] In summary, this invention achieves bio-realistic micro-expressions in humanoid robots through a dual closed-loop modeling approach of deformation matching and user feedback-driven expression optimization. By using a deep convolutional neural network to perform multi-scale feature convolution on independent band features in the power spectral density characteristics to generate emotion feature vectors, and then using a fully connected layer to probability-assign emotion categories and intensity values, it achieves precise quantitative mapping from EEG signals to emotional states, overcoming the sensitivity limitations of traditional visual recognition to environmental interference. Combining the elastic modulus and strain limit of the humanoid robot's outer coating material, it substitutes the surface deformation data generated by biomimetic muscle contraction commands into the deformation relationship equation to obtain the stress-strain response and integrate displacement vector distribution to generate a dynamic deformation trajectory. This drives real-time adjustment of surface texture details, achieving anatomical-level synchronization between the dynamic deformation of the silicone material and muscle movement, thus reducing the stiffness of mechanical expressions.

[0132] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence, characterized by comprising: collecting and preprocessing user brainwave signals to generate power spectral density features; inputting the power spectral density features into a pre-trained emotion classification model, using a deep convolutional neural network to perform emotion classification and intensity quantification on the power spectral density features, and outputting the user's emotion category and emotion intensity value; performing emotion-expression mapping on the emotion category and emotion intensity value to generate expression amplitude parameters, adjusting the facial layout of the humanoid robot based on the expression amplitude parameters, and outputting a bionic muscle contraction command; adjusting the facial muscle fibers of the humanoid robot according to the bionic muscle contraction command and, in combination with the physical deformation characteristics of the humanoid robot's outer layer, dynamically adjusting the surface texture details of the humanoid robot to generate simulated micro-expressions; and capturing user facial feedback data through camera eyeballs, evaluating the user's acceptance of the simulated micro-expressions based on the user facial feedback data, comparing the acceptance with the power spectral density features using Pearson correlation, and outputting an expression optimization strategy; wherein capturing user facial feedback data through the camera eyeballs and evaluating the user's acceptance of the simulated micro-expressions based on the user facial feedback data comprises: capturing user facial dynamic data through the camera eyeballs; performing feature recognition on the user facial dynamic data to obtain standardized facial movements; matching the standardized facial movements against intensity levels to quantify the user's facial intensity value and generate the user facial feedback data; and applying weighted correction to the user facial feedback data to evaluate the user's acceptance of the simulated micro-expressions; and wherein performing feature recognition on the user facial dynamic data to obtain standardized facial movements comprises: aligning the user facial dynamic data to generate a standardized image sequence; extracting the user's facial coordinates from the standardized image sequence and constructing a facial motion trajectory based on the user's facial coordinates; and performing expression recognition on the facial coordinate displacement changes in the facial motion trajectory to obtain the standardized facial movements.

2. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 1, characterized in that inputting the power spectral density features into the pre-trained emotion classification model, using the deep convolutional neural network to perform emotion classification and intensity quantification on the power spectral density features, and outputting the user's emotion category and emotion intensity value comprises: inputting the power spectral density features into the pre-trained emotion classification model, the emotion classification model performing multi-scale feature convolution on the independent band features in the power spectral density features to generate an emotion feature vector; inputting the emotion feature vector into the fully connected layer of the deep convolutional neural network to perform emotion category probability assignment and output an emotion category probability distribution; performing category conversion on the emotion category probability distribution and outputting the emotion category; and extracting the maximum probability value from the emotion category probability distribution, numerically converting the maximum probability value into the emotion intensity value, and outputting the emotion intensity value.

3. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 1, characterized in that performing emotion-expression mapping on the emotion category and emotion intensity value to generate expression amplitude parameters comprises: retrieving the corresponding basic expression template from a preset emotion expression mapping table according to the emotion category; proportionally mapping the standard contraction strength in the basic expression template based on the emotion intensity value, and outputting the contraction strength value; scaling the duration baseline value in the basic expression template in linear proportion to the emotion intensity value, and outputting the duration value; and performing region matching on the contraction strength value and duration value based on the target muscle region in the basic expression template to generate the expression amplitude parameters.

4. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 1, characterized in that: the specific steps of adjusting the facial layout of the humanoid robot based on the facial expression amplitude parameters and outputting the bionic muscle contraction commands are as follows: extracting the target muscle region of the humanoid robot from the facial expression amplitude parameters, numbering the facial muscle relationships in the target muscle region, and outputting a facial control number; converting the contraction strength value in the facial expression amplitude parameters into a strength duty cycle and outputting a duty cycle parameter; applying a drive period mapping to the duration value in the facial expression amplitude parameters and outputting a time duration parameter; and encapsulating the facial control number, the duty cycle parameter, and the time duration parameter into an instruction to output the bionic muscle contraction command.
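
As a non-limiting illustration, the encapsulation of claim 4 might be sketched as follows. The numbering table, the 20 ms drive period, and the reading of the period mapping as "duration expressed as a whole number of drive periods" are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical numbering of target muscle regions.
REGION_TO_CONTROL_NUMBER = {"frontalis": 1, "zygomatic": 3}
PWM_PERIOD_S = 0.02  # assumed drive period (50 Hz)


@dataclass(frozen=True)
class MuscleCommand:
    control_number: int    # facial control number
    duty_cycle_pct: float  # duty cycle parameter, in percent
    periods: int           # time duration parameter, in drive periods


def build_command(region, contraction, duration_s):
    """Encapsulate one bionic muscle contraction command."""
    duty = min(max(contraction, 0.0), 1.0) * 100.0
    periods = max(1, round(duration_s / PWM_PERIOD_S))
    return MuscleCommand(REGION_TO_CONTROL_NUMBER[region], duty, periods)
```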

5. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 1, characterized in that: the specific steps of adjusting the facial muscle fibers of the humanoid robot according to the bionic muscle contraction commands, and dynamically adjusting the surface texture details of the humanoid robot in combination with the physical deformation characteristics of the outer coating of the humanoid robot to generate simulated micro-expressions are as follows: activating the facial components of the humanoid robot based on the bionic muscle contraction commands, adjusting the physical position of the facial muscle fibers of the humanoid robot, and generating surface deformation data; extracting the elastic modulus and the strain limit of the outer coating material from the outer coating of the humanoid robot; performing deformation matching on the surface deformation data, the elastic modulus, and the strain limit to generate a dynamic deformation trajectory of the outer coating of the humanoid robot; and adjusting the surface texture details of the humanoid robot according to the dynamic deformation trajectory to generate the simulated micro-expressions.

6. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 5, characterized in that: the specific steps of performing deformation matching on the surface deformation data, the elastic modulus, and the strain limit to generate the dynamic deformation trajectory of the outer coating are as follows: establishing a deformation relationship equation based on the elastic modulus and the strain limit; substituting the surface deformation data into the deformation relationship equation to obtain the stress-strain response of the humanoid robot's facial expression; obtaining the displacement vector distribution of the outer coating of the humanoid robot based on the stress-strain response; and integrating the displacement vector distribution according to the time series of the surface deformation data to generate the dynamic deformation trajectory.
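
As a non-limiting illustration, the deformation matching of claim 6 might be sketched in one dimension as follows. The claim does not give the deformation relationship equation; this example assumes linear elasticity (stress = elastic modulus × strain) with the strain clamped to the strain limit, treats the surface deformation data as time-stamped commanded strains, and interprets the integration step as assembling the displacements in time order. All names and units are hypothetical.

```python
def deformation_trajectory(samples, elastic_modulus_pa, strain_limit,
                           rest_length_m):
    """Build a time-ordered trajectory of (time, displacement, stress).

    samples: iterable of (time_s, commanded_strain) pairs, in any order.
    """
    trajectory = []
    for t, strain in sorted(samples):
        # Deformation relationship (assumed): clamp to the strain limit,
        # then apply Hooke's law for the stress-strain response.
        strain = max(-strain_limit, min(strain, strain_limit))
        stress = elastic_modulus_pa * strain
        # Displacement of a coating segment of the given rest length.
        displacement = strain * rest_length_m
        trajectory.append((t, displacement, stress))
    return trajectory
```

A real outer coating would require a spatial displacement field and probably a nonlinear material model; the clamp only shows how the strain limit can bound the commanded deformation.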

7. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 1, characterized in that: the specific steps of comparing the acceptance with the power spectral density features using Pearson correlation to output the expression optimization strategy are as follows: time-matching the power spectral density features with the user's acceptance of the simulated micro-expressions and outputting a feature scoring relationship table; performing normal distribution tests on the power spectral density features and the user's acceptance of the simulated micro-expressions in the feature scoring relationship table and outputting normally distributed data; performing correlation matching on the normally distributed data and outputting feature-rating pairs; and determining an expression optimization direction based on the feature-rating pairs, adjusting the expression of the humanoid robot according to the expression optimization direction, and outputting the expression optimization strategy.
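
As a non-limiting illustration, the time matching and Pearson correlation of claim 7 might be sketched as follows. Matching on identical timestamps and the function names are assumptions made for this example, and the normal distribution test is omitted here (a Shapiro-Wilk test such as `scipy.stats.shapiro` would be one option).

```python
import math


def time_match(psd_by_time, acceptance_by_time):
    """Pair a PSD feature with the acceptance score at the same timestamp."""
    common = sorted(set(psd_by_time) & set(acceptance_by_time))
    return [(psd_by_time[t], acceptance_by_time[t]) for t in common]


def pearson(pairs):
    """Pearson correlation coefficient of (feature, acceptance) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)
```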

8. The method for simulating facial expressions of an interactive humanoid robot based on artificial intelligence as described in claim 7, characterized in that: the expression optimization direction refers to enhancing or suppressing the power spectral density feature according to whether the correlation between the power spectral density feature and the acceptance in the feature-rating pair is positive or negative, and combining this with the emotion intensity value output by the emotion classification model to generate an expression optimization strategy that adjusts the facial expression amplitude parameters of the humanoid robot.
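
As a non-limiting illustration, the optimization direction of claim 8 might be sketched as follows. The correlation threshold, the step size, the "hold" outcome for weak correlations, and the multiplicative update of the contraction strength are assumptions made for this example; the claim states only that the direction follows the sign of the correlation and is combined with the emotion intensity value.

```python
def optimization_direction(r, threshold=0.3):
    """Map a Pearson coefficient to an expression optimization direction."""
    if r >= threshold:
        return "enhance"
    if r <= -threshold:
        return "suppress"
    return "hold"  # assumed: weak correlations leave the expression as is


def adjust_contraction(contraction, r, intensity, step=0.2, threshold=0.3):
    """Scale the contraction strength value in the chosen direction."""
    sign = {"enhance": 1.0, "suppress": -1.0, "hold": 0.0}[
        optimization_direction(r, threshold)
    ]
    updated = contraction * (1.0 + sign * step * intensity)
    return min(max(updated, 0.0), 1.0)  # keep within the assumed 0..1 range
```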