Facial Expression Control Method and System for Facial Expression Robots
By combining intelligent recognition of the micro-amplitude range with a hybrid compensation mechanism, and using a lightweight diffusion model to optimize the trajectory, the method addresses the dead zone, quantization, hysteresis, and jitter problems in robot micro-expression control, achieving natural and robust micro-expressions and improving the robot's interactive experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI GUOKE EMBODIED INTELLIGENT ROBOT CO LTD
- Filing Date
- 2025-12-19
- Publication Date
- 2026-06-30
AI Technical Summary
Existing robot facial expression control technology struggles to address dead zones, quantization, hysteresis, and jitter in the micro-amplitude range without relying on specialized hardware. As a result, micro-expression control fails or is distorted, making human-like micro-expressions difficult to achieve.
The method combines intelligent recognition of the micro-amplitude range and a hybrid compensation mechanism with the trajectory optimization capability of a lightweight diffusion model: the hybrid compensation strategy generates intermediate target increments, and the lightweight diffusion denoising model generates a smooth final execution trajectory, avoiding motion failure or distortion caused by mechanical dead zones and quantization errors.
It achieves highly natural and robust micro-expression control without the need for additional special hardware, enhancing the emotional expressiveness and realism of robot-human interaction.
Figure CN121696952B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of robot control technology, and more specifically, to a method and system for controlling the facial expressions of an expression robot.
Background Technology
[0002] With the widespread application of robotics in service, companionship, and human-computer interaction, the naturalness and subtlety of robot facial expressions have become crucial for enhancing the interactive experience. Currently, robot facial expression control mainly relies on servo motors, stepper motors, or wire-driven mechanisms to drive facial action units (AUs). For facial expressions of normal amplitude, traditional control methods (such as PID control and trajectory planning algorithms) can achieve relatively stable performance. However, when robots need to present human-like "micro-expressions"—that is, subtle changes in facial expressions with extremely small amplitude and slow speed—existing control methods face a series of prominent problems caused by inherent defects in mechanical transmission systems.
[0003] First, within the micro-motion range, dead zones caused by gear backlash and transmission line slack are particularly pronounced. Tiny control commands cannot overcome the dead zone threshold, resulting in complete unresponsiveness of the motion unit. For example, the command to "slightly raise the corners of the mouth" fails to execute because it cannot overcome the dead zone, severely impacting the accuracy of micro-expression transmission. Second, limited by the quantization error of the servo motor or encoder, extremely small angular changes may be discarded because they are below the resolution threshold, or unexpected positional jumps may occur due to quantization amplification, disrupting the continuity of expressions. Furthermore, motion hysteresis is more pronounced in low-speed, short-stroke control, manifesting as delayed motion initiation or sluggish response, causing micro-expressions to lose their inherent immediacy and natural fluidity. Finally, in low-speed micro-motion states, the motor controller is prone to introducing micro-oscillations and jitter, further weakening the stability and reliability of mechanical execution.
[0004] While some existing technologies attempt to improve small-range control performance through high-precision sensors or complex hardware compensation mechanisms, such solutions often lead to a significant increase in system cost and structural complexity, and are difficult to apply to robot facial expression control systems with stringent real-time and embedded deployment requirements. Some methods also employ general motion smoothing algorithms or traditional nonlinear compensation strategies, but their effectiveness is often limited when facing the crucial contradiction of "dead zone breakthrough and balance of motion subtlety" in micro-expression control: simply increasing the command output can destroy the "micro" characteristics, while over-reliance on smoothing filters can exacerbate latency and response lag.
[0005] Therefore, existing robot facial expression control technology lacks an efficient solution that can systematically identify and intelligently compensate for the dead zone, quantization, hysteresis, and jitter problems unique to the "micro-amplitude range" without relying on special hardware, making it difficult to achieve truly human-like micro-expression capabilities.
Summary of the Invention
[0006] In response, the present invention provides a method and system for controlling the facial expressions of an expression robot, so as to at least partially solve the above-mentioned technical problems.
[0007] This invention provides a method for controlling the facial expressions of an expression robot, comprising the following steps:
[0008] S1: receive a facial expression semantic instruction and generate a corresponding target amplitude and target change rate for at least one facial action unit of the robot;
[0009] S2: based on the target amplitude and target change rate, calculate the amplitude normalization ratio and rate normalization ratio of the current action unit, and determine whether the action unit enters the micro-amplitude range control mode by comparing the two ratios with preset thresholds;
[0010] S3: when it is determined that the action unit enters the micro-amplitude range control mode, generate an intermediate target increment using a hybrid compensation strategy, according to the relationship between the target increment of the action unit and a preset dead zone threshold. If the absolute value of the target increment is not greater than the sum of the dead zone threshold and a safety margin, a minimum breakthrough step size with a specific direction is output as the intermediate target increment. If the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, the target increment is processed by a smooth transition function to generate the intermediate target increment;
[0011] S4: perform short-time trajectory planning based on the intermediate target increment to generate an initial control trajectory;
[0012] S5: input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and generate a smooth final execution trajectory through few-step sampling inference;
[0013] S6: convert the final execution trajectory into a drive signal and send it to the corresponding actuator to drive the facial action unit.
[0014] In one possible embodiment, determining whether to enter the micro-amplitude range control mode further includes setting a lag time for entering or exiting the micro-amplitude range control mode: the action unit is switched to the micro-amplitude range control mode only when it meets the determination condition for entering the micro-amplitude range for a duration exceeding a first lag time threshold; and the action unit exits the micro-amplitude range control mode only when it fails to meet that determination condition for a duration exceeding a second lag time threshold.
[0015] In one possible embodiment, the training process of the lightweight diffusion denoising model includes: constructing a training dataset consisting of ideal micro-expression trajectories; injecting noise into the ideal micro-expression trajectories based on a noise model containing dead zones, quantization errors, hysteresis, and mechanical jitter to generate noisy training samples; using the noisy training samples as input and the corresponding ideal micro-expression trajectories as training targets, and performing supervised training of the diffusion denoising model using a composite loss function that combines trajectory position error, velocity coherence error, acceleration smoothness error, and expression semantic preservation error.
[0016] In one possible embodiment, the core architecture of the lightweight diffusion denoising model is a residual temporal network; the network contains multiple residual blocks connected in series, each residual block consisting of at least a one-dimensional temporal convolutional layer, a normalization layer, and an activation function; the network has a temporal encoding module at the input end, which is used to generate an encoding vector for each time step of the input trajectory, and the encoding vector is combined with the noisy initial control trajectory as the input of the first residual block, so that the denoising process can perceive and maintain the temporal continuity of the action.
[0017] In one possible embodiment, the lightweight diffusion denoising model employs a deterministic or stochastic few-step sampling algorithm during inference, wherein the number of sampling steps is set to a fixed value much smaller than the number of diffusion steps during training, in order to meet the real-time requirements of robot facial expression control.
[0018] In one possible embodiment, in the step of generating intermediate target increments using a hybrid compensation strategy, the smooth transition function is an S-shaped curve function, used to map the target increment into a continuous and smoothly changing output value within the interval defined by the minimum breakthrough step size and the switching threshold, thereby achieving a natural transition from discrete dead zone breakthrough to continuous control.
[0019] In another aspect, this application also provides an expression control system for an expression robot, comprising:
[0020] The instruction processing module is used to receive facial expression semantic instructions and generate corresponding target amplitude and target change rate for at least one facial action unit of the robot.
[0021] The judgment module is used to calculate the amplitude normalization ratio and the rate normalization ratio of the current action unit based on the target amplitude and the target rate of change, and to determine whether the action unit has entered the micro-amplitude range control mode based on the comparison result of the two ratios with a preset threshold.
[0022] The intermediate target increment generation module is used to generate an intermediate target increment by adopting a hybrid compensation strategy when it is determined that the action unit enters the micro-amplitude range control mode, based on the relationship between the target increment of the action unit and the preset dead zone threshold. If the absolute value of the target increment is not greater than the sum of the dead zone threshold and a safety margin, a minimum breakthrough step size with a specific direction is output as the intermediate target increment. If the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, the target increment is processed by a smooth transition function to generate the intermediate target increment.
[0023] The initial control trajectory generation module is used to perform short-time trajectory planning based on the intermediate target increment and generate the initial control trajectory.
[0024] The final execution trajectory generation module is used to input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and generate a smooth final execution trajectory through a few steps of sampling inference.
[0025] The execution module is used to convert the final execution trajectory into a drive signal and send it to the corresponding actuator to drive the facial motion unit to move.
[0026] This application also provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the facial expression control method for a facial expression robot as described above.
[0027] In another aspect, this application provides a computer-readable storage medium having computer program instructions stored thereon, which can be executed by a processor to implement the facial expression control method for a facial expression robot as described above.
[0028] Another aspect of this application provides a computer program product, including a computer program that, when executed by a processor, implements the facial expression control method for a facial expression robot as described above.
[0029] This application addresses the core challenge of robot micro-expression control by introducing intelligent recognition of the micro-amplitude range and a hybrid compensation mechanism, combined with the trajectory optimization capability of a lightweight diffusion model. First, the system can accurately identify facial expressions that fall into the micro-amplitude range because of small target amplitude and low speed, and automatically trigger a dedicated control strategy, avoiding motion failure or distortion caused by mechanical dead zones and quantization errors. Second, the continuous-discrete hybrid compensation curve combines a discrete breakthrough step size with an S-shaped smoothing function, ensuring reliable breakthrough of transmission dead zones while maintaining the subtle changes and motion continuity of micro-expressions, and eliminating the abrupt changes or jitter that traditional methods tend to produce near dead zones. Third, applying the lightweight diffusion model to the denoising and generation of control trajectories filters out quantization noise, hysteresis, and jitter introduced by the mechanical system, generating a natural, smooth motion trajectory that more closely resembles the dynamic characteristics of human micro-expressions. Finally, the method achieves natural and robust micro-expression control entirely at the algorithm level, without adding any special hardware sensors, improving the emotional expressiveness and realism of robot-human interaction.
Attached Figure Description
[0030] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0031] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0032] Figure 1 is a schematic diagram of an expression control method for an expression robot provided in an embodiment of the present invention.
[0033] Figure 2 is a schematic diagram of the intermediate target increment generation process provided in an embodiment of the present invention.
[0034] Figure 3 is a schematic diagram of the model training process provided in an embodiment of the present invention.
[0035] Figure 4 is a schematic diagram of the model architecture provided in an embodiment of the present invention.
[0036] Figure 5 is a schematic diagram of the structure of an expression control system for an expression robot provided in an embodiment of the present invention.
[0037] Figure 6 is a schematic diagram of the structure of a device provided in an embodiment of the present invention.
Detailed Implementation
[0038] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0039] It should be noted that all user information (including but not limited to user device information, user personal information, object information corresponding to device usage data, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, device usage data, etc.) involved in all embodiments of this disclosure are information and data authorized by the user or fully authorized by all parties.
[0040] The following detailed description, in conjunction with specific embodiments, illustrates the implementation process of the facial expression control method for the facial expression robot described in this invention. It should be noted that these embodiments are merely illustrative of the invention and not intended to limit its scope. Any conventional adjustments or substitutions made by those skilled in the art to the steps without departing from the inventive concept should be included within the scope of protection of this invention.
[0041] As shown in Figure 1, this invention discloses a method for controlling the facial expressions of an expression robot, comprising the following steps:
[0042] S1: receive a facial expression semantic instruction and generate a corresponding target amplitude and target change rate for at least one facial action unit of the robot;
[0043] S2: based on the target amplitude and target change rate, calculate the amplitude normalization ratio and rate normalization ratio of the current action unit, and determine whether the action unit enters the micro-amplitude range control mode by comparing the two ratios with preset thresholds;
[0044] S3: when it is determined that the action unit enters the micro-amplitude range control mode, generate an intermediate target increment using a hybrid compensation strategy, according to the relationship between the target increment of the action unit and a preset dead zone threshold. If the absolute value of the target increment is not greater than the sum of the dead zone threshold and a safety margin, a minimum breakthrough step size with a specific direction is output as the intermediate target increment. If the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, the target increment is processed by a smooth transition function to generate the intermediate target increment;
[0045] S4: perform short-time trajectory planning based on the intermediate target increment to generate an initial control trajectory;
[0046] S5: input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and generate a smooth final execution trajectory through few-step sampling inference;
[0047] S6: convert the final execution trajectory into a drive signal and send it to the corresponding actuator to drive the facial action unit.
[0048] In some embodiments, step S1 involves converting abstract facial expression semantic instructions into quantized target parameters executable by the robot's facial action unit, providing basic input for subsequent control logic. Specifically, the implementation process of this step is as follows:
[0049] First, the system receives facial expression semantic instructions from external input. For example, these instructions can be obtained through various interaction methods. They can be natural language instructions such as "slight smile," "mild confusion," or "mild alertness" input by the user through a human-computer interaction interface (such as a touchscreen or voice assistant), or standardized semantic instructions issued by a higher-level control system (such as a robot emotion decision module) (e.g., a slight upturn of the mouth coded AU12). It is understood that regardless of the instruction form, its core must include clearly defined "expression type" and "intensity level" information to ensure the accuracy of subsequent target parameter generation.
[0050] Secondly, the facial expression semantic instructions are parsed and quantified to generate the target amplitude and target change rate corresponding to at least one facial action unit (Action Unit, hereinafter referred to as "AU").
[0051] Specifically, parsing relies on a pre-defined "semantic-action mapping library," which is pre-calibrated through experiments and human facial expression databases (such as the CK+ and JAFFE expression databases). The library stores the associations and parameter ranges between different semantic expressions and their corresponding AUs. For example, for the semantic instruction "slight smile," the library pre-defines the primary associated AU as the "corner of the mouth lifting unit (AU12)" and the secondary associated AU as the "orbicularis oculi muscle unit (AU6, contracted slightly to present a natural smile)." For the intensity level "slight," the library calibrates the target amplitude range of AU12 as 5% to 8% of its maximum movable amplitude (if the maximum movable angle of AU12 is 20°, the target amplitude is 1° to 1.6°), and the target change rate range as 0.5°/s to 1°/s (ensuring that the movement is slow and delicate).
[0052] During parameter generation, it is preferable to fine-tune the parameters based on the real-time state of the robot's facial mechanism. For example, if the built-in encoder detects an initial offset of 0.3° in the current position of AU12 (due to mechanical fatigue or residual actions from previous movements), the offset will be automatically compensated when generating the target amplitude. That is, if the original target amplitude is 1.2°, the final target amplitude will be adjusted to 1.5° to ensure that the actual action effect matches the expected semantic instructions. Finally, the parameter format output by the high-level target generator is "{AU identifier, target amplitude (unit: ° or mm, depending on the type of transmission mechanism), target change rate (unit: ° / s or mm / s)}", such as "{AU12, 1.5°, 0.8° / s}" and "{AU6, 0.5°, 0.3° / s}", which will be used as the input data for step S2.
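The parameter generation described for step S1 can be illustrated with a minimal Python sketch. The table contents, the names `MAPPING`, `MAX_AMPLITUDE_DEG`, `AUTarget`, and `generate_targets`, and the amplitude fractions are hypothetical values chosen to reproduce the "{AU12, 1.5°, 0.8°/s}" and "{AU6, 0.5°, 0.3°/s}" examples; they are not taken from the patent's actual mapping library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AUTarget:
    au_id: str            # AU identifier, e.g. "AU12"
    amplitude_deg: float  # target amplitude in degrees
    rate_deg_s: float     # target change rate in degrees per second


# Hypothetical semantic-action mapping library:
# (expression, intensity) -> [(AU id, fraction of max amplitude, rate in deg/s)]
MAPPING = {
    ("smile", "slight"): [("AU12", 0.06, 0.8), ("AU6", 0.0625, 0.3)],
}

# Maximum movable amplitude per AU (the text gives 20 deg for AU12, 8 deg for AU6).
MAX_AMPLITUDE_DEG = {"AU12": 20.0, "AU6": 8.0}


def generate_targets(expression, intensity, offsets_deg=None):
    """Look up the AUs for a semantic instruction and add any measured offset."""
    offsets = offsets_deg or {}
    targets = []
    for au_id, fraction, rate in MAPPING[(expression, intensity)]:
        amplitude = fraction * MAX_AMPLITUDE_DEG[au_id] + offsets.get(au_id, 0.0)
        targets.append(AUTarget(au_id, round(amplitude, 3), rate))
    return targets


# AU12 has a measured initial offset of 0.3 deg, as in the example in the text.
targets = generate_targets("smile", "slight", offsets_deg={"AU12": 0.3})
```

Here the offset is simply added to the nominal amplitude, mirroring the 1.2° to 1.5° adjustment in the text; a real system might apply a different correction rule.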
[0053] In some embodiments, step S2 involves using quantitative calculation and hysteresis determination to accurately identify whether the action unit needs to enter the micro-range control mode, thereby avoiding control strategy mismatch caused by misjudgment of small-amplitude actions.
[0054] Specifically, according to an embodiment of the present invention, for each AU output in step S1, its amplitude normalization ratio (denoted $R_{A,i}$) and rate normalization ratio (denoted $R_{v,i}$) are first calculated separately. This eliminates the incomparability of parameters caused by differences in "maximum movable range" and "maximum response speed" among different AUs.
[0055] The amplitude normalization ratio $R_{A,i}$ is calculated as "the ratio of the target amplitude of the current AU to the maximum movable amplitude of the AU", and the specific formula is as follows:
[0056] $$R_{A,i} = \frac{\left| A_{\text{target},i} - A_{\text{cur},i} \right|}{A_{\max,i}}$$
[0057] where $A_{\text{target},i}$ is the target amplitude of the $i$-th AU generated in step S1 (if it is a relative increment, the increment value is taken directly); $A_{\text{cur},i}$ is the current actual position of the $i$-th AU (read by the built-in encoder of the facial expression robot; if the encoder is unavailable, it is estimated by forward mapping of the kinematic model); $A_{\max,i}$ is the maximum movable amplitude of the $i$-th AU (obtained in advance through mechanical calibration, e.g., 20° for AU12 and 8° for AU6); and $|\cdot|$ denotes the absolute value, which ensures that the ratio is non-negative.
[0058] For example, for AU12 with $A_{\max,i} = 20^\circ$, a target increment of $\left| A_{\text{target},i} - A_{\text{cur},i} \right| = 1.2^\circ$ gives an amplitude normalization ratio of $R_{A,i} = 1.2^\circ / 20^\circ = 0.06$ (i.e., 6%).
[0059] The rate normalization ratio $R_{v,i}$ is calculated as the ratio of the current AU's target rate of change to the AU's maximum response rate, and the specific formula is as follows:
[0060] $$R_{v,i} = \frac{v_{\text{target},i}}{v_{\max,i}}$$
[0061] where $v_{\text{target},i}$ is the target rate of change of the $i$-th AU generated in step S1, and $v_{\max,i}$ is the maximum response rate of the $i$-th AU (obtained in advance through driver performance testing, e.g., 5°/s for AU12 and 3°/s for AU6).
[0062] Continuing with the example above, if AU12 has $v_{\text{target},i} = 0.8^\circ/\text{s}$ and $v_{\max,i} = 5^\circ/\text{s}$, then the rate normalization ratio is $R_{v,i} = 0.8 / 5 = 0.16$ (i.e., 16%).
[0063] Next, a preliminary judgment is made based on threshold comparison. Specifically, after the amplitude normalization ratio $R_{A,i}$ and the rate normalization ratio $R_{v,i}$ are obtained, they are compared with a preset amplitude threshold (denoted $\theta_A$) and a preset rate threshold (denoted $\theta_v$), respectively. If both "$R_{A,i} \le \theta_A$" and "$R_{v,i} \le \theta_v$" hold, it is preliminarily determined that the AU needs to enter the micro-amplitude range control mode; otherwise, it is preliminarily determined that the AU remains in the normal control mode.
[0064] In some embodiments, the preset thresholds $\theta_A$ and $\theta_v$ need to be calibrated based on the characteristics of the robot's facial drive mechanism (such as dead zone size and encoding resolution). Preferably, $\theta_A$ takes a value in the range 0.05 to 0.12 (that is, the target amplitude is 5% to 12% of the maximum movable amplitude), and $\theta_v$ takes a value in the range 0.10 to 0.25 (that is, the target rate of change is 10% to 25% of the maximum response rate).
[0065] For example, with thresholds set such that $\theta_A \ge 0.06$ and $\theta_v \ge 0.16$, the AU12 above satisfies both $R_{A,i} = 0.06 \le \theta_A$ and $R_{v,i} = 0.16 \le \theta_v$, so it is preliminarily determined to need the micro-amplitude range control mode. If a certain AU's $R_{A,i}$ exceeds $\theta_A$, then even if its $R_{v,i}$ satisfies $R_{v,i} \le \theta_v$, it is preliminarily determined to remain in the normal mode.
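As a rough illustration of the normalization and preliminary threshold comparison in step S2, the following Python sketch computes both ratios and applies the two-condition test. The function names, the AU12 target and current positions (1.5° and 0.3°), and the threshold values 0.10 and 0.20 are assumptions chosen for illustration; the thresholds lie within the preferred ranges stated in the text, and the use of "not greater than" for the comparison is likewise assumed.

```python
def amplitude_ratio(target_deg, current_deg, max_amplitude_deg):
    """Absolute target increment divided by the AU's maximum movable amplitude."""
    return abs(target_deg - current_deg) / max_amplitude_deg


def rate_ratio(target_rate, max_rate):
    """Target rate of change divided by the AU's maximum response rate."""
    return target_rate / max_rate


def is_micro_candidate(r_amp, r_rate, amp_threshold, rate_threshold):
    """Preliminary determination: both ratios must be within their thresholds."""
    return r_amp <= amp_threshold and r_rate <= rate_threshold


# AU12 with assumed positions: target 1.5 deg, current 0.3 deg, maximum 20 deg.
r_amp = amplitude_ratio(1.5, 0.3, 20.0)   # about 0.06
r_rate = rate_ratio(0.8, 5.0)             # about 0.16
candidate = is_micro_candidate(r_amp, r_rate, amp_threshold=0.10, rate_threshold=0.20)
```

Because both conditions must hold, an AU with a small amplitude but a fast target rate stays in the normal control mode, and vice versa.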
[0066] Optionally, to avoid frequent mode switching caused by mechanical vibration or sensor noise (e.g., an AU's amplitude normalization ratio $R_{A,i}$ fluctuating slightly around the amplitude threshold $\theta_A$, causing the mode to switch repeatedly between the micro-amplitude and normal states), this invention adds lag time determination logic on top of the preliminary determination. Specifically, this logic includes the following two aspects:
[0067] On the one hand, for an AU that is preliminarily determined to need to "enter the micro-amplitude range control mode", the system starts a timer. If the AU continues to satisfy both "$R_{A,i} \le \theta_A$" and "$R_{v,i} \le \theta_v$" for a duration exceeding the first lag time threshold (denoted $T_{\text{in}}$), the AU is finally determined to switch to the micro-amplitude range control mode; if the AU no longer meets these conditions while the timer is running, the timer is reset and the preliminary determination is performed again. Preferably, $T_{\text{in}}$ takes a value in the range 40 ms to 80 ms, for example 50 ms, to ensure that the judgment result is not affected by transient noise.
[0068] On the other hand, for an AU that is preliminarily determined to need to "exit the micro-amplitude range control mode" (i.e., one that was in the micro-amplitude mode but currently does not satisfy both "$R_{A,i} \le \theta_A$" and "$R_{v,i} \le \theta_v$"), the system also starts a timer. If the duration for which the AU does not meet these conditions exceeds the second lag time threshold (denoted $T_{\text{out}}$), the AU exits the micro-amplitude range control mode; if the conditions are met again during the timing period, the timer is reset to zero and the micro-amplitude mode is maintained. Preferably, the value range of $T_{\text{out}}$ is the same as that of $T_{\text{in}}$, i.e., 40 ms to 80 ms, for example 50 ms, to ensure smooth mode switching.
[0069] For example, suppose a certain AU's $R_{A,i}$ fluctuates between 0.09 and 0.11 because of mechanical vibration, straddling the amplitude threshold $\theta_A$. When $R_{A,i}$ drops below the threshold, the AU is preliminarily determined to have entered the micro-amplitude mode, but this state lasts only 20 ms (not exceeding $T_{\text{in}}$) before $R_{A,i}$ rises back to 0.11, at which point the timer is reset to zero and no mode switching occurs. The AU switches to the micro-amplitude mode only after $R_{A,i}$ has remained stably below 0.09 for 50 ms.
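The lag time logic can be sketched as a small state machine. This is a minimal illustration rather than the patented implementation: the class name `MicroModeHysteresis`, the fixed-period `update` interface, and the 10 ms control period in the usage lines are assumptions, while the 50 ms hold times follow the example values in the text.

```python
class MicroModeHysteresis:
    """Switch modes only after the condition has persisted longer than a hold time."""

    def __init__(self, enter_hold_ms=50.0, exit_hold_ms=50.0):
        self.enter_hold_ms = enter_hold_ms  # first lag time threshold
        self.exit_hold_ms = exit_hold_ms    # second lag time threshold
        self.in_micro_mode = False
        self._elapsed_ms = 0.0

    def update(self, condition_met, dt_ms):
        """Feed one control tick; returns True while in micro-amplitude mode."""
        # A switch is pending only when the condition disagrees with the current mode.
        if condition_met == self.in_micro_mode:
            self._elapsed_ms = 0.0          # condition reverted: reset the timer
            return self.in_micro_mode
        self._elapsed_ms += dt_ms
        hold = self.exit_hold_ms if self.in_micro_mode else self.enter_hold_ms
        if self._elapsed_ms > hold:         # duration must exceed the threshold
            self.in_micro_mode = not self.in_micro_mode
            self._elapsed_ms = 0.0
        return self.in_micro_mode


h = MicroModeHysteresis()
brief = [h.update(True, 10.0) for _ in range(2)]     # 20 ms: too short to enter
h.update(False, 10.0)                                # noise: timer resets
entering = [h.update(True, 10.0) for _ in range(6)]  # enters once 50 ms is exceeded
exiting = [h.update(False, 10.0) for _ in range(6)]  # exits once 50 ms is exceeded
```

With a 10 ms tick the switch happens on the sixth consecutive tick, because the accumulated time must strictly exceed the hold time.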
[0070] In some embodiments, step S3 adopts a hybrid compensation strategy of "discrete breakthrough + continuous smoothing" based on the relationship between the target increment of the AU and the dead zone threshold, so that the mechanical dead zone is broken through while unnatural facial expressions caused by sudden changes in movement are avoided.
[0071] Please see Figure 2, a schematic diagram of the intermediate target increment generation process provided in this embodiment of the invention. Specifically, in S201, the target increment and dead zone threshold are determined. First, the system calculates the target increment of the $i$-th AU (denoted $\Delta A_i$) as follows:
[0072] $$\Delta A_i = A_{\text{target},i} - A_{\text{cur},i}$$
[0073] where $A_{\text{target},i}$ and $A_{\text{cur},i}$ (the target amplitude and the current actual position of the $i$-th AU) are defined as in step S2. The sign of $\Delta A_i$ indicates the direction of the action (for example, positive means "upward" or "open", and negative means "downward" or "contract").
[0074] Secondly, the system retrieves the preset "dead zone threshold" (denoted $D_i$). This threshold is the inherent dead zone width of the driver corresponding to the $i$-th AU, determined by manufacturing tolerances (such as gear backlash) or engineering experience. Preferably, $D_i$ takes a value in the range 0.3° to 1.0° (for angle drive mechanisms) or 0.1 mm to 0.5 mm (for linear drive mechanisms).
[0075] For example, if AU12's $D_i = 0.5^\circ$, then when $|\Delta A_i| \le 0.5^\circ$, conventional control commands cannot drive the AU, and a compensation strategy is needed to overcome the dead zone.
[0076] In addition, to ensure the reliability of dead zone breakthrough (avoiding insufficient compensation due to dead zone threshold calibration errors), the system also introduces a safety margin (denoted $\delta_i$). Preferably, $\delta_i$ takes a value in the range 0.1° to 0.3°; for example, it can be set to 0.2°.
[0077] In S202, a hybrid compensation strategy is executed according to different cases to generate corresponding intermediate target increments. Specifically, according to an embodiment of the present invention, the hybrid compensation strategy is based on... The size is divided into the following three cases for execution, generating the corresponding intermediate target increment (denoted as ). )
[0078] Scenario 1: The target increment falls into the dead zone ( )
[0079] when Not greater than " When the target increment is completely within the mechanical dead zone, conventional instructions cannot drive the AU action. In this case, the "discrete breakthrough" strategy is adopted, outputting a minimum breakthrough step size with a specific direction (denoted as ). As .
[0080] Specifically, The calculation formula is:
[0081]
[0082] Here, the sign function keeps the direction of the intermediate target increment consistent with that of the target increment (if the target increment is positive, the intermediate target increment is also positive; if it is negative, the intermediate target increment is also negative). The minimum identifiable angle (or minimum identifiable displacement) of each AU is determined by the driver's encoding resolution (e.g., if the encoding resolution is 0.1°, the minimum identifiable angle is 0.1°) and ensures that the intermediate target increment can be recognized and executed by the driver.
[0083] For example, if the target increment of AU12 is in the positive direction (i.e., the corners of the mouth turn up) and falls within the dead zone, the intermediate target increment is 0.7°. This step size ensures that the 0.5° dead zone is broken through, while the direction remains consistent with the original target.
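The discrete breakthrough rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the formula of the specification: it assumes the step magnitude is the larger of "dead zone threshold + safety margin" and the minimum identifiable angle, which is consistent with the 0.7° result for a 0.5° dead zone and a 0.2° safety margin. The function name `breakthrough_step`, its parameter names, and the 0.3° target increment are illustrative.

```python
import math

def breakthrough_step(target_inc, dead_zone, margin, resolution):
    """Return a signed minimum breakthrough step (assumed form).

    Magnitude: max(dead_zone + margin, resolution), so the step clears the
    dead zone and is never smaller than what the driver can resolve.
    Sign: same as the original target increment.
    """
    if target_inc == 0.0:
        return 0.0  # no motion requested, nothing to break through
    magnitude = max(dead_zone + margin, resolution)
    return math.copysign(magnitude, target_inc)

# Illustrative AU12 values: 0.3 deg requested, 0.5 deg dead zone,
# 0.2 deg safety margin, 0.1 deg encoder resolution.
step = breakthrough_step(0.3, dead_zone=0.5, margin=0.2, resolution=0.1)
print(round(step, 3))  # 0.7
```

A negative target increment yields a step of the same magnitude in the opposite direction, which is the direction-preserving behavior the text attributes to the sign function.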
[0084] Scenario 2: The target increment is within a small range above the dead zone (its absolute value is greater than the sum of the dead zone threshold and the safety margin, but not greater than the switching threshold)
[0085] When the absolute value of the target increment is greater than "dead zone threshold + safety margin" but not greater than the "switching threshold", the target increment has partially exceeded the dead zone but is still within the range of micro-amplitude movements. In this case, a "continuous smoothing" strategy is adopted: a smooth transition function processes the target increment to generate the intermediate target increment, achieving a natural transition from "discrete breakthrough" to "continuous control".
[0086] In some embodiments, the switching threshold is the upper limit for micro-motion, and its value is calibrated based on the actuator characteristics and the sensitivity of the facial mechanism. Preferably, its value range is 3° to 6° (for angle-driven mechanisms). For example, when the absolute value of the target increment of AU12 exceeds the sum of its dead zone threshold and safety margin but does not exceed its switching threshold, the compensation logic for this case is entered.
[0087] Optionally, the smooth transition function is an S-shaped curve function, which maps the input target increment to a continuously and smoothly changing output value and avoids abrupt changes. Preferably, the S-shaped curve function is a logistic function, whose expression is:
[0088] $$S(x) = \frac{L}{1 + e^{-k\,(x - x_0)}}$$
[0089] Here, $x$ is the portion of the target increment that exceeds "dead zone threshold + safety margin"; $L$ is the maximum output amplitude of the curve, chosen so that the curve covers the entire small range; $k$ is the steepness parameter of the curve, preferably in the range 10 to 30, where a larger value makes the curve change more steeply near the midpoint; and $x_0$ is the half-saturation point of the curve, set to the midpoint of the small range to ensure the symmetry of the curve.
[0090] Based on the aforementioned S-shaped curve function, the intermediate target increment is calculated as follows:
[0091]
[0092] For example, continuing with the parameters of AU12 and a target increment in the positive direction, substituting the excess portion and the curve parameters into the expression for S(x) gives:
[0093]
[0094] The resulting intermediate target increment not only overcomes the dead zone but also achieves a smooth transition through the S-shaped curve, avoiding the abrupt changes in motion that could result from using the target increment directly.
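The continuous smoothing case can be sketched as below. The logistic curve itself follows the text; how its output is combined into the intermediate target increment is not recoverable from this text, so the sketch assumes the output is added on top of "dead zone threshold + safety margin" and given the sign of the target increment. The amplitude is assumed equal to the width of the small range, and the 4.0° switching threshold is an illustrative value inside the stated 3° to 6° range. All function and parameter names are hypothetical.

```python
import math

def logistic(x, amplitude, steepness, midpoint):
    """Logistic S-curve: amplitude / (1 + exp(-steepness * (x - midpoint)))."""
    return amplitude / (1.0 + math.exp(-steepness * (x - midpoint)))

def smooth_increment(target_inc, dead_zone, margin, switch_thr, steepness=10.0):
    """Scenario 2 sketch: pass a small above-dead-zone increment through an S-curve."""
    base = dead_zone + margin
    span = switch_thr - base        # assumed width of the micro-motion range
    x = abs(target_inc) - base      # part exceeding dead zone + safety margin
    if not 0.0 < x <= span:
        raise ValueError("target increment is outside the Scenario 2 range")
    s = logistic(x, amplitude=span, steepness=steepness, midpoint=span / 2.0)
    # Assumption: the output never drops below the breakthrough level `base`.
    return math.copysign(base + s, target_inc)

# Illustrative AU12 values: 0.5 deg dead zone, 0.2 deg margin, 4.0 deg switch.
print(round(smooth_increment(2.35, 0.5, 0.2, 4.0), 3))  # midpoint of the range
```

Under these assumptions the output approaches the breakthrough level at the lower edge of the range and the switching threshold at the upper edge, so the three cases join without a jump.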
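[0095] Scenario 3: The target increment exceeds the micro-amplitude range (its absolute value is greater than the switching threshold)
[0096] When the absolute value of the target increment is greater than the switching threshold, the target increment has exceeded the range of micro-movements and no special compensation is needed. In this case, the original target increment $\Delta\theta$ is used directly as the intermediate target increment $\Delta\theta^{*}$, i.e.:
[0097] $$\Delta\theta^{*} = \Delta\theta$$
[0098] For example, if the absolute value of the target increment of AU12 is greater than its switching threshold, the intermediate target increment equals the target increment, and subsequent handling follows the standard control strategy.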
[0099] In some embodiments, step S4 converts the "discrete" intermediate target increment generated in step S3 into a "continuous" initial control trajectory, providing structured input for the subsequent diffusion model denoising. This step falls under conventional trajectory planning and is relatively simple to implement, as follows:
[0100] According to an embodiment of the present invention, the system first determines the time window for trajectory planning (denoted as $T_w$). This time window needs to match the target rate of change so that the time length of the trajectory is reasonable. Specifically, $T_w$ is calculated as:
[0101] $$T_w = \frac{\lvert \Delta\theta^{*} \rvert}{v}$$
[0102] Here, $\lvert \Delta\theta^{*} \rvert$ is the absolute value of the intermediate target increment and $v$ is the target rate of change generated in step S1. For example, for the intermediate target increment and target rate of change of AU12 in this example, the time window for trajectory planning is 1.9 seconds.
[0103] Secondly, the system employs a short-time trajectory planning algorithm to generate the initial control trajectory within the time window, as a function of the time variable. In some embodiments, the trajectory planning algorithm is preferably a cubic spline interpolation algorithm or a least squares curve fitting algorithm with a jerk penalty. Both algorithms ensure the continuity of the trajectory's position and velocity and avoid mechanical impact.
[0104] Specifically, if a cubic spline interpolation algorithm is used, the system sets multiple interpolation nodes within the time window (e.g., one node every 10 ms, for a total of 190 nodes in 1.9 s) and determines the target position of each node from the starting position and the intermediate target increment. At the starting node (the beginning of the time window), the position is the starting position and the velocity is 0 (assuming initial rest); at the termination node (the end of the time window), the position is the starting position offset by the intermediate target increment, with a prescribed end velocity. The positions and velocities of the intermediate nodes are obtained through cubic spline interpolation, ultimately forming a continuous initial control trajectory.
[0105] For example, for AU12, once the position of the termination node is fixed, cubic spline interpolation generates a position sequence of 190 nodes, with each node corresponding to the control position at a given time point. This sequence constitutes the initial control trajectory.
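The time window and node generation can be sketched with a single cubic Hermite segment, which is the simplest special case of cubic spline interpolation between two nodes with prescribed position and velocity. The sketch assumes the window is the increment divided by the target rate and an end velocity of zero; the numbers 0.3°, 1.52° and 0.8°/s are illustrative values chosen only so that the window comes out at 1.9 s. `plan_window` and `cubic_trajectory` are hypothetical names.

```python
def plan_window(mid_increment, target_rate):
    """Time window in seconds: |intermediate increment| / target rate (assumed)."""
    return abs(mid_increment) / target_rate

def cubic_trajectory(p0, p1, v0, v1, duration, dt=0.01):
    """Sample a cubic Hermite segment from (p0, v0) to (p1, v1) every dt seconds."""
    n = max(1, round(duration / dt))
    points = []
    for i in range(1, n + 1):
        s = i / n  # normalized time in (0, 1]
        h00 = 2 * s**3 - 3 * s**2 + 1   # weight of the start position
        h10 = s**3 - 2 * s**2 + s       # weight of the start velocity
        h01 = -2 * s**3 + 3 * s**2      # weight of the end position
        h11 = s**3 - s**2               # weight of the end velocity
        points.append(h00 * p0 + h10 * duration * v0
                      + h01 * p1 + h11 * duration * v1)
    return points

window = plan_window(1.52, 0.8)                         # about 1.9 s
traj = cubic_trajectory(0.3, 0.3 + 1.52, 0.0, 0.0, window)
print(len(traj))  # 190 nodes at 10 ms spacing
```

With both boundary velocities at zero the segment rises monotonically from the start to the end position, so position and velocity stay continuous across the window.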
[0106] In some embodiments, step S5 uses a pre-trained lightweight diffusion denoising model to denoise and optimize the initial control trajectory generated in step S4, generating a natural and smooth final execution trajectory. This step involves two stages, model training and inference, and the specific implementation process is as follows.
[0107] According to embodiments of the present invention, the core of model training is to construct a mapping relationship of "noisy trajectory → ideal trajectory", ensuring that the model is able to filter out mechanical noise and quantization errors. Referring to Figure 3, a schematic diagram of the model training process provided in this embodiment of the invention, the training includes the following steps:
[0108] In step S301, a training dataset is constructed. Specifically, the system first constructs a training dataset consisting of ideal micro-expression trajectories, which are generated in the following manner:
[0109] 1. Extract micro-expression data of different types and intensities (such as slight smile, slight frown, slight surprise) from human facial expression databases (such as CK+, JAFFE) and obtain the corresponding AU motion trajectories (location-time series).
[0110] 2. Based on the kinematic model of the robot's facial structure, the trajectory of human micro-expressions is mapped to the ideal trajectory of the robot AU (e.g., mapping a 1mm upward tilt of the human mouth corner to a 0.8° upward tilt of the robot AU12).
[0111] 3. Smooth the mapped trajectory (e.g., using Gaussian filtering) to eliminate unintentional jitter in human facial expressions, forming a "clean" ideal trajectory;
[0112] 4. Construct the training dataset in the format "{AU identifier, ideal trajectory, target rate of change}", with a preferred size of 10,000 to 50,000 entries to cover different AUs and different micro-expression types.
[0113] In S302, noisy training samples are generated to simulate the mechanical noise in actual robot operation. The system uses a preset noise model to inject noise into the ideal trajectory, generating noisy training samples. Specifically, the noise model includes the following four typical noise components, the parameters of which are calibrated based on actual driver test data:
[0114] Dead zone noise: If the increment of a certain segment in the ideal trajectory falls within the dead zone, the position value of that segment is either fixed at the position of the previous moment (simulating the action not being executed due to the dead zone) or made to jump randomly (simulating the randomness of breaking through the dead zone), with probabilities set at 70% and 30% respectively;
[0115] Quantization noise: Quantization error is superimposed on each position value of the ideal trajectory, with a quantization step size $q$ equal to the driver's encoding resolution; that is, each position value $x$ becomes $q \cdot \mathrm{round}(x / q)$, where $\mathrm{round}(\cdot)$ is a rounding function;
[0116] Hysteresis noise: Based on the Preisach hysteresis model, different mapping relationships are used for the "ascending segment" (position increasing) and "descending segment" (position decreasing) of the ideal trajectory. For example, the actual position of the ascending segment lags behind the ideal position by 0.1°~0.3°, and the descent segment lags behind by 0.2°~0.4°.
[0117] Mechanical vibration noise: Based on the above noise, high-frequency white noise (frequency range of 5Hz~20Hz) and low-frequency drift noise (frequency range of 0.1Hz~0.5Hz) are superimposed. The amplitude range of the white noise is 0.05°~0.1° (simulating micro-oscillation of motor), and the amplitude range of the low-frequency drift is 0.1°~0.2° (simulating position drift caused by mechanical fatigue).
[0118] Finally, the noisy training samples are the result of superimposing the above four types of noise and have the same format as the ideal trajectory, i.e., "{AU identifier, noisy trajectory, target rate of change}".
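Part of the noise model above can be sketched as follows. This is a simplified illustration that covers only dead zone holding, quantization, and uniform jitter; the hysteresis and low-frequency drift components are omitted. In the 30% case the sketch simply lets the sample follow the quantized ideal value, which is an assumption, since the jump target is not given in this text. The default values and the names `quantize` and `inject_noise` are hypothetical.

```python
import random

def quantize(x, step):
    """Round a position to the nearest multiple of the encoder step."""
    return round(x / step) * step

def inject_noise(ideal, dead_zone=0.5, step=0.1, jitter=0.05,
                 hold_prob=0.7, rng=None):
    """Return a noisy copy of an ideal trajectory (a list of positions)."""
    rng = rng or random.Random(0)
    held = [quantize(ideal[0], step)]
    for prev, cur in zip(ideal, ideal[1:]):
        # Dead zone: a small increment leaves the output stuck with hold_prob.
        stuck = abs(cur - prev) < dead_zone and rng.random() < hold_prob
        held.append(held[-1] if stuck else quantize(cur, step))
    # Vibration is superimposed last, on top of the other components.
    return [v + rng.uniform(-jitter, jitter) for v in held]

ideal = [0.01 * i for i in range(10)]
noisy = inject_noise(ideal)
print(len(noisy) == len(ideal))  # True
```

Passing an explicit `random.Random` instance keeps sample generation reproducible, which helps when calibrating the noise parameters against driver test data.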
[0119] In S303, the model architecture is designed. Specifically, the lightweight diffusion denoising model adopted in this invention uses a residual temporal network as its core architecture. Its design goal is to reduce the number of parameters and the computational complexity while ensuring denoising effectiveness, thus meeting the real-time control requirements of the robot. Referring to Figure 4, a schematic diagram of the model architecture provided in this embodiment of the invention, the specific components of the model architecture are as follows:
[0120] 1. Input layer: The input data includes the noisy trajectory (a sequence with one position value per time step), the AU identifier (converted through one-hot encoding into a vector whose dimension equals the total number of AUs of the robot's face), and the target rate of change (a scalar after normalization);
[0121] 2. Time Encoding Module: This module generates a time-encoded vector for each time step of the input trajectory to enhance the model's ability to perceive temporal information. Specifically, the time encoding uses sinusoidal embedding: for each time step, each element of the encoding vector is calculated by the following formula:
[0122]
[0123] The generated temporal encoding vector and the noisy trajectory are combined through a concatenation operation to form the temporal feature vector;
[0124] 3. Residual Temporal Network Layer: This layer consists of multiple residual blocks connected in series. Preferably, the number of residual blocks is 3 to 5, and the structure of each residual block includes:
[0125] One-dimensional temporal convolutional layer: kernel size is 3~5, output channels are 64~128, used to extract temporal features.
[0126] Layer Normalization: Normalizes the convolutional output to accelerate model training;
[0127] Activation function: Use ReLU or GELU functions to introduce nonlinear transformation capability;
[0128] Residual connection: The input of the residual block is directly added to the output to avoid the gradient vanishing problem.
[0129] In addition, the one-hot encoded vector of the AU identifier and the target change rate scalar are mapped to a vector with the same dimension as the number of output channels of the residual block through a fully connected layer. Then, the vector is injected into each residual block through the feature-wise linear modulation (FiLM) mechanism to achieve "conditional denoising" (i.e., adjusting the denoising strategy according to different AUs and different rates).
[0130] 4. Output Layer: Consists of a one-dimensional convolutional layer (kernel size 1), which maps the output of the residual temporal network layer (whose channel dimension equals the number of output channels of the residual blocks) back to the dimension of the trajectory. The result is the denoised trajectory, i.e., the predicted trajectory of the model.
[0131] In S304, supervised training is performed based on a composite loss function. Specifically, after the model architecture is determined, the system performs supervised training with the noisy trajectory as input and the ideal trajectory as label. During training, a composite loss function is used to optimize the model parameters, ensuring that the denoised trajectory simultaneously meets the requirements of "accurate position", "consistent velocity", "smooth motion" and "semantic consistency".
[0132] Specifically, the composite loss function is obtained by weighted summation of the following four parts:
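The time encoding and one residual block from the architecture above can be sketched in plain Python. Several choices here are assumptions rather than details of the specification: the encoding uses the common Transformer-style sine and cosine pairs with base 10000, the block applies its operations in the order convolution, layer normalization, FiLM, ReLU, residual addition, and the FiLM scale and shift (`gamma`, `beta`) are passed in directly instead of being produced by the fully connected layer. All names are hypothetical.

```python
import math

def time_encoding(t, dim, base=10000.0):
    """Sinusoidal embedding of step t as interleaved sin/cos pairs (dim even)."""
    vec = []
    for j in range(dim // 2):
        freq = base ** (-2.0 * j / dim)
        vec += [math.sin(t * freq), math.cos(t * freq)]
    return vec

def conv1d_same(x, w, b):
    """Zero-padded temporal convolution. x: [T][C_in], w: [K][C_in][C_out]."""
    pad = len(w) // 2
    out = []
    for t in range(len(x)):
        row = list(b)
        for k in range(len(w)):
            src = t + k - pad
            if 0 <= src < len(x):
                for ci, xv in enumerate(x[src]):
                    for co in range(len(b)):
                        row[co] += xv * w[k][ci][co]
        out.append(row)
    return out

def layer_norm(row, eps=1e-5):
    """Normalize one time step across its channels."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def film_residual_block(x, w, b, gamma, beta):
    """conv -> layer norm -> FiLM (gamma * h + beta) -> ReLU -> add input."""
    out = []
    for x_row, h_row in zip(x, conv1d_same(x, w, b)):
        h_row = layer_norm(h_row)
        h_row = [g * v + s for g, v, s in zip(gamma, h_row, beta)]
        h_row = [max(0.0, v) for v in h_row]
        out.append([xi + hi for xi, hi in zip(x_row, h_row)])
    return out

# Concatenate each position with the encoding of its time step.
trajectory = [0.30, 0.31, 0.33]
features = [[p] + time_encoding(t, 4) for t, p in enumerate(trajectory)]
print(len(features), len(features[0]))  # 3 5
```

The residual addition requires the block's input and output channel counts to match; a real model would first project the concatenated features to the 64 to 128 channels mentioned above.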
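[0133] 1. Trajectory position error loss ($\mathcal{L}_{pos}$): The mean squared error (MSE) is used to calculate the positional difference between the predicted trajectory and the ideal trajectory. The formula is as follows:
[0134] $$\mathcal{L}_{pos} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{x}_t - x_t\right)^2$$
[0135] Here, $\hat{x}_t$ is the trajectory position predicted by the model at step $t$, $x_t$ is the corresponding position of the ideal trajectory, and $T$ is the number of time steps;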
[0136] 2. Velocity continuity error loss: The velocity difference between the predicted trajectory and the ideal trajectory is calculated using MSE to ensure velocity continuity. The formula is:
[0137]
[0138] Here, the predicted velocity is obtained from the position difference between two adjacent steps of the predicted trajectory and is compared with the ideal velocity;
[0139] 3. Acceleration smoothness error loss: A norm of the jerk (the derivative of acceleration) of the predicted trajectory is used as the loss in order to suppress severe jitter. The formula is as follows:
[0140]
[0141] Here, the predicted jerk is computed from the predicted trajectory;
[0142] 4. Expression semantic preservation error loss: The predicted trajectory and the ideal trajectory are mapped to facial geometric features (such as mouth corner height and eye fissure width) through a kinematic model. The geometric difference between the two is calculated using MSE to ensure that denoising does not change the semantics of the expression. The formula is as follows:
[0143]
[0144] Here, the kinematic mapping function converts AU positions into facial geometric features; for example, for every 1° that AU12 tilts upwards, the height of the corner of the mouth increases by 0.5 mm.
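[0145] The final expression of the composite loss function is:
[0146] $$\mathcal{L} = \lambda_1 \mathcal{L}_{pos} + \lambda_2 \mathcal{L}_{vel} + \lambda_3 \mathcal{L}_{acc} + \lambda_4 \mathcal{L}_{sem}$$
[0147] Here, $\mathcal{L}_{pos}$, $\mathcal{L}_{vel}$, $\mathcal{L}_{acc}$ and $\mathcal{L}_{sem}$ are the position, velocity, acceleration smoothness and semantic preservation losses described above, and $\lambda_1$ to $\lambda_4$ are the loss weights, whose preferred values are chosen to balance the impact of each loss term.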
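The four-term composite loss can be sketched as below. The sketch makes several assumptions: velocity and jerk are taken as first and third finite differences of the position sequence, the jerk term uses a mean squared value, the kinematic mapping is the linear 0.5 mm per degree relation from the AU12 example, and the weights are placeholders because the preferred values are not recoverable from this text. The function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def diff(seq):
    """First finite difference between adjacent steps."""
    return [b - a for a, b in zip(seq, seq[1:])]

def composite_loss(pred, ideal, weights=(1.0, 0.5, 0.1, 0.5), mm_per_deg=0.5):
    """Weighted sum of position, velocity, jerk and semantic terms.

    Needs at least four time steps so that the jerk sequence is non-empty.
    """
    w_pos, w_vel, w_acc, w_sem = weights
    l_pos = mse(pred, ideal)
    l_vel = mse(diff(pred), diff(ideal))
    jerk = diff(diff(diff(pred)))                 # third difference of position
    l_acc = sum(j * j for j in jerk) / len(jerk)
    # Assumed linear kinematic map from AU angle to mouth-corner height.
    geometry = lambda traj: [mm_per_deg * x for x in traj]
    l_sem = mse(geometry(pred), geometry(ideal))
    return w_pos * l_pos + w_vel * l_vel + w_acc * l_acc + w_sem * l_sem

ideal = [0.0, 1.0, 2.0, 3.0, 4.0]
print(composite_loss(ideal, ideal))  # 0.0 for a perfect prediction
```

A constant offset between prediction and ideal trajectory is penalized only by the position and semantic terms, while the velocity and jerk terms react to differences in shape.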
[0148] During training, the model uses the Adam optimizer with a preset initial learning rate, and the learning rate is gradually reduced using a learning rate decay strategy (such as cosine annealing). Training preferably runs for 50 to 100 epochs, until the loss function converges on the validation set (e.g., the loss does not decrease for 5 consecutive validation rounds). This results in a fully trained lightweight diffusion denoising model.
[0149] According to an embodiment of the present invention, during the actual operation phase of the robot, the inference process in step S5 feeds the initial control trajectory generated in step S4 into the trained model and generates the final execution trajectory through few-step sampling. The specific process is as follows:
[0150] First, the system preprocesses the initial control trajectory, the corresponding AU identifier (e.g., AU12), and the target rate of change according to the training format: the trajectory is normalized (mapped to a fixed interval), the AU identifier is one-hot encoded, and the target rate of change is normalized (mapped to a fixed interval), forming the input data for the model.
[0151] Secondly, in order to meet the real-time requirements of robot facial expression control (usually requiring a response time of less than 100ms), the model uses a deterministic or stochastic few-step sampling algorithm for inference. The number of sampling steps is set to a fixed value that is much smaller than the number of diffusion steps during training (usually 100~1000 steps during training).
[0152] Preferably, the sampling algorithm adopts the DDIM (Denoising Diffusion Implicit Models) algorithm, and the number of sampling steps is in the range of 8 to 12. For example, setting it to 10 steps can ensure the denoising effect while keeping the inference time within 50 ms.
[0153] Specifically, the core of DDIM sampling is to gradually denoise using the following recursive formula:
[0154]
[0155] Here, the sampling step index decreases from the number of sampling steps to 1. Each iteration takes the noisy trajectory of the current step and produces that of the next step, and the last iteration yields the final denoised trajectory. The diffusion coefficients are predefined (determined during training), and the noise prediction function of the model takes the conditional information (the AU identifier and the target rate of change) as additional input.
[0156] Initially, the noisy trajectory of the first sampling step is the preprocessed initial control trajectory (with a small amount of initial noise added to start the denoising process). Starting from this step, the model calls the noise prediction function at each step to predict the noise in the current trajectory and updates the trajectory using the recursive formula until the denoised trajectory is obtained.
[0157] Finally, the system performs inverse normalization on the denoised trajectory (mapping it back to the original angle or displacement range) and applies amplitude upper limit protection (ensuring the trajectory position does not exceed the maximum movable range of the AU) to obtain the final execution trajectory. For example, the initial control trajectory of AU12 contains quantization error and micro-jitter of 0.1° to 0.2°. After 10 steps of DDIM sampling by the model, the output trajectory is free of this jitter, trajectory smoothness improves by more than 90%, and the positional error is kept within 0.05°, meeting the naturalness requirements of micro-expressions.
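The few-step sampling loop can be sketched with the standard deterministic DDIM update (no added randomness). This is the textbook form of DDIM and is assumed here, since the recursive formula of the specification is not reproduced in this text. `alpha_bars` is a hypothetical short schedule of cumulative signal coefficients, and `predict_noise` stands in for the trained model.

```python
import math

def ddim_sample(x_init, alpha_bars, predict_noise, cond=None):
    """Deterministic DDIM over a short schedule.

    alpha_bars[k] is the cumulative signal coefficient at sampling step k,
    with alpha_bars[0] == 1.0 (clean) and smaller values for noisier steps.
    predict_noise(x, k, cond) returns the estimated noise in trajectory x.
    """
    x = list(x_init)
    for k in range(len(alpha_bars) - 1, 0, -1):
        a_t, a_prev = alpha_bars[k], alpha_bars[k - 1]
        eps = predict_noise(x, k, cond)
        # Clean trajectory implied by the current sample and predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
              for xi, ei in zip(x, eps)]
        # Step to the previous, less noisy level along the deterministic path.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x

# Toy check with an oracle predictor that knows the injected noise.
clean, noise = [0.3, 0.5, 0.9], [0.1, -0.2, 0.05]
schedule = [1.0, 0.9, 0.7, 0.4]
start = [math.sqrt(0.4) * c + math.sqrt(0.6) * n for c, n in zip(clean, noise)]
result = ddim_sample(start, schedule, lambda x, k, cond: noise)
print([round(v, 6) for v in result])  # recovers the clean trajectory
```

A 10-step run as described in the text would use a schedule with 11 entries; the conditional information would be passed through `cond` to the model.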
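The post-processing step can be sketched as a linear inverse mapping followed by a clamp. The normalization interval is not given in this text, so the sketch assumes the model works on values in [-1, 1]; the range bounds and the name `denormalize_and_clamp` are illustrative.

```python
def denormalize_and_clamp(traj_norm, lo, hi, limit_lo, limit_hi):
    """Map values from [-1, 1] back to [lo, hi], then clamp to the AU limits."""
    out = []
    for v in traj_norm:
        angle = lo + (v + 1.0) * 0.5 * (hi - lo)           # inverse normalization
        out.append(min(max(angle, limit_lo), limit_hi))    # amplitude protection
    return out

# Normalized range 0..10 degrees, mechanical limit 0..8 degrees.
print(denormalize_and_clamp([-1.0, 0.0, 1.0], 0.0, 10.0, 0.0, 8.0))  # [0.0, 5.0, 8.0]
```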
[0158] In some embodiments, step S6 is the final step, which converts the control trajectory into a mechanical action, as detailed below:
[0159] According to an embodiment of the present invention, the system first converts the final execution trajectory generated in step S5 into a drive signal that the actuator can recognize. Specifically, the format of the drive signal varies depending on the type of robot face actuator (such as a servo motor, stepper motor, or linear actuator):
[0160] For angle-driven servo motors, the drive signal is an "angle command sequence"; that is, the trajectory value at each time step is used directly as the target angle command for the servo motor;
[0161] For linearly driven actuators (such as wire-driven mechanisms), the drive signal is a "displacement command sequence"; the trajectory value (an angle) is converted to a linear displacement using a kinematic model (e.g., with the length of the connecting rod as the radius).
[0162] For example, if AU12 is driven by a servo motor and the final execution trajectory is an angle sequence of 190 time steps (0.3°→0.35°→…→1.82°), then the drive signal is this angle sequence, and each time step corresponds to one target angle command of the servo motor.
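The conversion for linearly driven actuators can be sketched with an arc-length relation. The exact kinematic formula is not recoverable from this text, so the sketch assumes displacement equals radius times angle in radians; the 20 mm radius and the function names are illustrative.

```python
import math

def angle_to_displacement(angle_deg, radius_mm):
    """Arc-length approximation: displacement = radius * angle in radians."""
    return radius_mm * math.radians(angle_deg)

def to_displacement_commands(angles_deg, radius_mm):
    """Convert an angle command sequence into a displacement command sequence."""
    return [angle_to_displacement(a, radius_mm) for a in angles_deg]

commands = to_displacement_commands([0.3, 0.35, 1.82], radius_mm=20.0)
print([round(c, 4) for c in commands])
```

A mechanism with a different linkage geometry would replace `angle_to_displacement` with its own kinematic model while keeping the same sequence conversion.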
[0163] Secondly, the system sends the drive signal to the corresponding actuator (such as the servo motor of AU12) through a communication interface (such as CAN bus or RS485) and monitors the feedback signal of the actuator (such as the actual position of the encoder) in real time. If the deviation between the feedback signal and the drive signal exceeds a preset threshold (such as 0.1°), the system triggers a retry mechanism to resend the drive signal to ensure the accuracy of the action execution.
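The feedback check and retry mechanism can be sketched as below. The callables `send` and `read_position` are hypothetical stand-ins for the bus write (e.g., over CAN or RS485) and the encoder read, and the bound on the number of retries is an assumption added so that the loop always terminates.

```python
def send_with_retry(command, send, read_position, tolerance=0.1, max_retries=3):
    """Send one target command and resend while the feedback deviates too much.

    Returns True if the measured position ends up within `tolerance`
    of the command, False if all attempts are used up.
    """
    for _ in range(1 + max_retries):
        send(command)
        if abs(read_position() - command) <= tolerance:
            return True
    return False

# Simulated actuator that drops the first command it receives.
state = {"pos": 0.0, "calls": 0}

def fake_send(cmd):
    state["calls"] += 1
    if state["calls"] >= 2:
        state["pos"] = cmd

print(send_with_retry(1.0, fake_send, lambda: state["pos"]))  # True
```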
[0164] Finally, the actuator drives the facial motion unit to move according to the drive signal, realizing the corresponding micro-expression. For example, the AU12's servo motor rotates gradually according to the angle command sequence, causing the corners of the mouth to slowly rise, presenting a "slight smile" micro-expression. The entire movement is seamless with no dead zones or noticeable jitter, and its naturalness is close to that of human micro-expressions.
[0165] Referring to Figure 5, a schematic diagram of the structure of an expression control system 500 for an expression robot provided in an embodiment of this application. As shown in Figure 5, the system 500 includes:
[0166] The instruction processing module 501 is used to receive facial expression semantic instructions and generate corresponding target amplitude and target change rate for at least one facial action unit of the robot.
[0167] The judgment module 502 is used to calculate the amplitude normalization ratio and the rate normalization ratio of the current action unit based on the target amplitude and the target rate of change, and to determine whether the action unit has entered the micro-amplitude range control mode based on the comparison result of the two ratios with a preset threshold.
[0168] The intermediate target increment generation module 503 is used to generate an intermediate target increment by adopting a hybrid compensation strategy when it is determined that the action unit enters the micro-amplitude range control mode, based on the relationship between the target increment of the action unit and the preset dead zone threshold. If the absolute value of the target increment is not greater than the sum of the dead zone threshold and a safety margin, then a minimum breakthrough step size with a specific direction is output as the intermediate target increment. If the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, then the target increment is processed by a smooth transition function to generate the intermediate target increment.
[0169] The initial control trajectory generation module 504 is used to perform short-time trajectory planning based on the intermediate target increment and generate an initial control trajectory.
[0170] The final execution trajectory generation module 505 is used to input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and generate a smooth final execution trajectory through a few steps of sampling inference.
[0171] The execution module 506 is used to convert the final execution trajectory into a drive signal and send it to the corresponding actuator to drive the facial motion unit to move.
[0172] Those skilled in the art will clearly understand that the technical solutions of the embodiments of this application can be implemented by means of software and / or hardware. In this specification, "unit" and "module" refer to software and / or hardware that can independently complete or cooperate with other components to complete a specific function, wherein the hardware may be, for example, a field-programmable gate array (FPGA), an integrated circuit (IC), etc.
[0173] Each processing unit and / or module in the embodiments of this application can be implemented by an analog circuit that implements the functions described in the embodiments of this application, or by software that executes the functions described in the embodiments of this application.
[0174] Referring to Figure 6, a schematic diagram of the structure of an electronic device according to an embodiment of this application, which can be used to implement the method in the embodiment illustrated in Figure 1. As shown in Figure 6, the electronic device 600 may include:
[0175] At least one processor 601, at least one network interface 604, a user interface 603, a memory 605, and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between the components. The user interface 603 may include buttons, and optionally include a standard wired or wireless interface. The network interface 604 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, etc.
[0176] The processor 601 may include one or more processing cores and connect to various parts within the device 600 via various interfaces and lines. It implements the various functions and data processing of the device 600 by running or executing instructions, programs, code sets, or instruction sets stored in the memory 605, and by accessing data in the memory 605. Optionally, the processor 601 may be implemented using at least one hardware form of DSP, FPGA, or PLA. The processor 601 may also integrate one or more combinations of CPU, GPU, and modem. The CPU is mainly used to handle the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem is used for wireless communication. It is understood that the modem may not be integrated into the processor 601, but may be implemented through a separate chip.
[0177] Memory 605 may include random access memory (RAM) or read-only memory (ROM). Optionally, memory 605 includes a non-transitory computer-readable medium for storing instructions, programs, code, code sets, or instruction sets. Memory 605 may be divided into a program storage area and a data storage area, wherein the program storage area may be used to store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, audio playback functionality, image playback functionality, etc.), and instructions for implementing the foregoing method embodiments; the data storage area may be used to store data involved in the relevant method embodiments. Memory 605 may also be at least one storage device located remotely from processor 601. As shown in Figure 6, the memory 605, which serves as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and program instructions.
[0178] In particular, the methods and / or embodiments in this application can be implemented as computer software programs. For example, the embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowchart. When the computer program is executed by processor 601, it performs the functions defined in the methods of this application.
[0179] Another embodiment of this application provides a computer-readable storage medium having computer program instructions stored thereon, which can be executed by a processor to implement the methods and / or technical solutions of any one or more embodiments of this application described above.
[0180] The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, as well as magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic cards or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and / or data.
[0181] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
Claims
1. A method for controlling the facial expressions of an expression robot, characterized in that the method includes the following steps: S1, receive facial expression semantic instructions containing clear expression type and intensity level information, and generate a corresponding target amplitude and target change rate for at least one facial action unit of the robot; S2, based on the target amplitude and the target rate of change, calculate the amplitude normalization ratio and the rate normalization ratio of the current action unit, and determine whether the action unit has entered the micro-amplitude range control mode based on the comparison results of the two ratios with a preset threshold; S3, when it is determined that the action unit enters the micro-amplitude range control mode, generate an intermediate target increment using a hybrid compensation strategy based on the relationship between the target increment of the action unit and a preset dead zone threshold, wherein the dead zone threshold is the inherent dead zone width of the driver corresponding to the action unit, and its value is calibrated by manufacturing tolerances or engineering experience; to avoid insufficient compensation due to dead zone threshold calibration errors, a safety margin is introduced; if the absolute value of the target increment is not greater than the sum of the dead zone threshold and the safety margin, a minimum breakthrough step size with a specific direction is output as the intermediate target increment; if the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, the target increment is processed by a smooth transition function to generate the intermediate target increment; the switching threshold is the upper bound of the micro-motion, and its value is calibrated according to the characteristics of the actuator and the sensitivity of the facial mechanism;
S4, perform short-time trajectory planning based on the intermediate target increment to generate an initial control trajectory; S5, input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and generate a smooth final execution trajectory through a few steps of sampling inference; S6, convert the final execution trajectory into a driving signal and send it to the corresponding actuator to drive the facial motion unit to move.
2. The facial expression control method for an expression robot according to claim 1, characterized in that: The determination of whether to enter the micro-amplitude range control mode also includes: setting a lag time for the action of entering or exiting the micro-amplitude range control mode; only when the action unit meets the determination condition for entering the micro-amplitude range and the duration exceeds the first lag time threshold, will it officially switch to the micro-amplitude range control mode; only when the action unit does not meet the determination condition for entering the micro-amplitude range and the duration exceeds the second lag time threshold will it exit the micro-amplitude range control mode.
3. The facial expression control method for an expression robot according to claim 1, characterized in that: The training process of the lightweight diffusion denoising model includes: constructing a training dataset consisting of ideal micro-expression trajectories; injecting noise into the ideal micro-expression trajectories based on a noise model containing dead zones, quantization errors, hysteresis, and mechanical jitter to generate noisy training samples; using the noisy training samples as input and the corresponding ideal micro-expression trajectories as training targets, and using a composite loss function that combines trajectory position error, velocity coherence error, acceleration smoothness error, and expression semantic preservation error to supervise the training of the diffusion denoising model.
4. The facial expression control method for an expression robot according to claim 3, characterized in that, Also includes: The core architecture of the lightweight diffusion denoising model is a residual temporal network. This network contains multiple residual blocks connected in series. Each residual block consists of at least a one-dimensional temporal convolutional layer, a normalization layer, and an activation function. The network has a temporal encoding module at the input end, which generates an encoding vector for each time step of the input trajectory. This encoding vector is combined with the noisy initial control trajectory and used as the input of the first residual block, so that the denoising process can perceive and maintain the temporal continuity of the action.
5. The facial expression control method for an expression robot according to claim 1, characterized in that the method further includes: the lightweight diffusion denoising model employs a deterministic or stochastic few-step sampling algorithm during inference, and the number of sampling steps is set to a fixed value that is much smaller than the number of diffusion steps used during training, in order to meet the real-time requirements of robot facial expression control.
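One common way to realize deterministic few-step sampling is a DDIM-style update over a strided subset of the training timesteps. The claim does not name a specific sampler, so the Python sketch below is an assumption on that point; the linear beta schedule, the function names, and the `predict_noise` callable (a stand-in for the trained denoising model) are likewise hypothetical.

```python
import math


def alpha_bar_schedule(train_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    bars, prod = [], 1.0
    for t in range(train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (train_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def few_step_sample(x, predict_noise, alpha_bars, num_steps=4):
    """Deterministic DDIM-style sampling; requires 2 <= num_steps << len(alpha_bars)."""
    total = len(alpha_bars)
    # Evenly spaced timesteps from total - 1 down to 0, e.g. [999, 666, 333, 0].
    ts = [round((total - 1) * (num_steps - 1 - i) / (num_steps - 1))
          for i in range(num_steps)]
    for i, t in enumerate(ts):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[ts[i + 1]] if i + 1 < num_steps else 1.0
        eps = predict_noise(x, t)
        # Estimate the clean trajectory implied by the predicted noise.
        x0 = [(xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t)
              for xi, ei in zip(x, eps)]
        # Move to the next, less noisy timestep without adding fresh noise.
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x
```

With 1000 training steps and `num_steps=4`, the model is evaluated only four times per trajectory, which is the kind of reduction the claim relies on for real-time control. A stochastic variant would add scaled Gaussian noise in the update step.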
6. The facial expression control method for an expression robot according to claim 1, characterized in that the method further includes: in the step of generating intermediate target increments using a hybrid compensation strategy, the smooth transition function is an S-shaped curve function, which maps the target increment to a continuous and smoothly changing output value within the interval defined by the minimum breakthrough step size and the switching threshold, thereby realizing a natural transition from discrete dead zone breakthrough to continuous control.
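A minimal Python sketch of such an S-shaped mapping follows. The claim only requires an S-shaped curve, so the cubic smoothstep used here is one possible choice and is an assumption, as are the function and parameter names. The mapping is built so that the output equals the minimum breakthrough step at the lower edge of the transition band (dead zone threshold plus safety margin) and equals the switching threshold at the upper edge.

```python
def smoothstep(u):
    """Cubic S-curve on [0, 1] with zero slope at both ends."""
    u = min(1.0, max(0.0, u))
    return u * u * (3.0 - 2.0 * u)


def s_curve_transition(magnitude, lower, switch, min_step):
    """Map |target increment| in [lower, switch] to an output in [min_step, switch].

    lower    : dead zone threshold plus safety margin
    switch   : switching threshold (upper limit of the micro-amplitude action)
    min_step : minimum breakthrough step size
    """
    u = (magnitude - lower) / (switch - lower)
    return min_step + (switch - min_step) * smoothstep(u)


# Example with assumed values in normalized actuator units.
for m in (0.025, 0.0625, 0.1):
    print(m, s_curve_transition(m, lower=0.025, switch=0.1, min_step=0.03))
```

Because the curve meets both neighboring regimes at their boundary values, the output magnitude has no jump when the target increment crosses either the dead zone band or the switching threshold.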
7. An expression control system for an expression robot, characterized in that it comprises:
an instruction processing module, used to receive facial expression semantic instructions containing explicit expression type and intensity level information, and to generate a corresponding target amplitude and target change rate for at least one facial action unit of the robot;
a judgment module, used to calculate the amplitude normalization ratio and the rate normalization ratio of the current action unit based on the target amplitude and the target change rate, and to determine whether the action unit has entered the micro-amplitude range control mode based on the result of comparing the two ratios with preset thresholds;
an intermediate target increment generation module, used to generate an intermediate target increment by adopting a hybrid compensation strategy when it is determined that the action unit has entered the micro-amplitude range control mode, based on the relationship between the target increment of the action unit and a preset dead zone threshold, wherein: the dead zone threshold is the inherent dead zone width of the driver corresponding to the action unit, and its value is calibrated from manufacturing tolerances or engineering experience; a safety margin is introduced to avoid insufficient compensation caused by calibration error in the dead zone threshold; if the absolute value of the target increment is not greater than the sum of the dead zone threshold and the safety margin, a minimum breakthrough step size with a specific direction is output as the intermediate target increment; if the absolute value of the target increment is greater than the sum of the dead zone threshold and the safety margin but not greater than a switching threshold, the target increment is processed by a smooth transition function to generate the intermediate target increment; and the switching threshold is the upper limit of the micro-amplitude action, and its value is calibrated according to the characteristics of the driver and the sensitivity of the facial mechanism;
an initial control trajectory generation module, used to perform short-time trajectory planning based on the intermediate target increment and to generate the initial control trajectory;
a final execution trajectory generation module, used to input the initial control trajectory, the identification information of the action unit, and the target change rate as conditions into a pre-trained lightweight diffusion denoising model, and to generate a smooth final execution trajectory through few-step sampling inference; and
an execution module, used to convert the final execution trajectory into a drive signal and send it to the corresponding actuator to drive the facial action unit.
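The judgment module and the branching logic of the intermediate target increment generation module can be sketched in Python as follows. Several details are assumptions because this claim does not fix them: each normalization ratio is taken as the target value divided by a full-scale value, both ratios must fall below their thresholds for the micro-amplitude mode, the breakthrough direction is the sign of the target increment, a zero increment yields zero output, and increments above the switching threshold pass through unchanged. The `transition` argument is a hypothetical hook for the smooth transition function.

```python
import math


def in_micro_range(target_amp, target_rate, full_amp, full_rate,
                   amp_threshold=0.1, rate_threshold=0.1):
    """Judgment module: compare both normalization ratios with preset thresholds."""
    amp_ratio = abs(target_amp) / full_amp      # amplitude normalization ratio
    rate_ratio = abs(target_rate) / full_rate   # rate normalization ratio
    return amp_ratio <= amp_threshold and rate_ratio <= rate_threshold


def intermediate_increment(delta, dead_zone, margin, min_step, switch, transition):
    """Hybrid compensation: select the regime from |delta| and return a signed step."""
    magnitude = abs(delta)
    if magnitude == 0.0:
        return 0.0                              # assumed: no motion requested
    sign = math.copysign(1.0, delta)
    lower = dead_zone + margin                  # dead zone threshold plus safety margin
    if magnitude <= lower:
        return sign * min_step                  # discrete dead zone breakthrough
    if magnitude <= switch:
        return sign * transition(magnitude, lower, switch, min_step)
    return delta                                # assumed: beyond the micro-amplitude range


def linear_transition(magnitude, lower, switch, min_step):
    """Placeholder transition; an S-shaped curve would normally replace it."""
    return min_step + (switch - min_step) * (magnitude - lower) / (switch - lower)


if in_micro_range(target_amp=0.5, target_rate=1.0, full_amp=10.0, full_rate=20.0):
    print(intermediate_increment(0.01, 0.02, 0.005, 0.03, 0.1, linear_transition))
```

Note that the minimum breakthrough step (0.03 in the example) is chosen larger than the dead zone threshold plus margin (0.025), otherwise the first branch could not move the actuator.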
8. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
9. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions can be executed by a processor to implement the method as described in any one of claims 1-6.
10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1-6.