A method and system for generating empathic responses based on the evolution of emotional states

By continuously modeling and predicting the trends of users' emotional trajectories, and dynamically adjusting empathy strategies, this technology addresses the problem of neglecting the emotional evolution process in existing technologies. It enables proactive preventative intervention, improves the timeliness and accuracy of empathic responses, and enhances the quality of human-computer emotional interaction.

CN122309680A · Pending · Publication Date: 2026-06-30 · YUANYU HUANYU ARTIFICIAL INTELLIGENCE TECHNOLOGY (SUZHOU) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
YUANYU HUANYU ARTIFICIAL INTELLIGENCE TECHNOLOGY (SUZHOU) CO LTD
Filing Date
2026-04-28
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing methods for generating empathic responses ignore the evolution of users' emotions and lack proactive intervention mechanisms based on evolutionary trends. This results in poor adaptability between empathic strategies and emotional dynamics, making it impossible to effectively predict emotional escalation and intervene in a timely manner.

Method used

By continuously modeling the emotional evolution trajectory in users' multi-turn dialogues, using a sliding window and exponentially weighted moving average smoothing mechanism, and combining features such as the rate of change of emotional intensity, cumulative duration, and acceleration, trend prediction is performed and graded early warnings are triggered to generate proactive intervention information and dynamically adjust empathy strategies to generate responses.

Benefits of technology

This represents a leap from passive, reactive responses to proactive, preventative interventions, improving the timeliness and accuracy of empathetic responses and significantly enhancing the quality of human-computer emotional interaction and user experience.


Abstract

This invention discloses a method and system for generating empathic responses based on the evolution of emotional states, belonging to the fields of artificial intelligence and affective computing. The method includes: acquiring multimodal emotional state vectors from multi-turn dialogues and storing them in a short-term emotional trajectory memory; smoothing the emotional intensity sequence using an exponentially weighted moving average to construct an emotional evolution trajectory and calculate evolutionary feature parameters; predicting the emotional intensity of the next round using an evolutionary trend prediction model and calculating the intervention urgency; comparing the intervention urgency with a dual warning threshold to trigger an active intervention mode; selecting an appropriate strategy from an empathic strategy library through a strategy selection function, encoding it as a structured prompt, and inputting it along with the emotional evolution trajectory summary and active intervention information into a large language model to generate an active intervention-style empathic response. This invention achieves a shift from passive reaction to active prevention, solving the problems of lack of perception of the emotional accumulation process and inability to actively intervene in existing empathic dialogues, significantly improving the quality of human-computer emotional interaction.

Description

Technical Field

[0001] This invention belongs to the fields of artificial intelligence, affective computing and natural language processing, and specifically relates to a method and system for generating empathic responses based on the evolution of emotional states.

Background Technology

[0002] Empathetic Response Generation (ERG) is a fundamental task in the field of emotional intelligence research and an important pathway to achieving emotionally intelligent human-computer interaction. Empathic dialogue systems take user utterances as input, identify and detect user emotions, and then generate responses containing empathic sentiments. Existing technologies have made some progress in empathetic response generation: MoEL designs independent decoders for each individual emotion, softly combining the outputs to generate empathetic responses; EmpDG explores the interaction and combination methods between emotions from a multi-scale perspective; MKERG introduces the commonsense knowledge graph ConceptNet to deeply understand the user emotions implicit in the context. However, existing technologies still have the following fundamental shortcomings:

[0003] First, there is a lack of modeling and utilization of the user's emotional evolution process. Most existing solutions rely solely on the user's current emotional state (such as identifying the "anger" label) to generate responses, ignoring the dynamic evolution of emotions over multiple rounds of dialogue. A user's emotions typically accumulate gradually from calm or mild dissatisfaction to a strong negative state; this gradual process contains rich predictive information. Ignoring emotional evolution trends leads to a lack of system ability to predict emotional escalation, missing the optimal opportunity for early intervention.

[0004] Second, the selection of empathic response strategies lacks guidance from the trend of emotional evolution. Existing solutions often employ static rules or simple mappings based on single-point emotions, failing to dynamically adjust according to the rate of increase, amplitude, and direction of fluctuation in user emotions. Different evolutionary trends (such as slow accumulation vs. instantaneous outburst) require drastically different empathic strategies; treating them uniformly inevitably leads to mechanical and misaligned responses.

[0005] Third, there is a lack of threshold-based early warning and proactive intervention mechanisms based on evolving trends. Existing systems only begin "passive-reactive" reassurance when users' emotions have already reached a high level, significantly reducing the effectiveness of the response. Proactive intervention before critical points in the emotional evolution can more effectively resolve users' negative emotions, and this technological approach has not yet been fully developed.

Summary of the Invention

[0006] This invention addresses the technical problems of existing empathic response generation methods, such as neglecting the user's emotional evolution process, lacking proactive intervention mechanisms based on evolution trends, and having poor adaptability between empathic strategies and emotional dynamics. It provides an empathic response generation method based on the evolution of emotional states. By continuously modeling, predicting trends, and providing threshold warnings for the emotional evolution trajectory in multi-turn dialogues, it achieves a technological leap from passive reactive responses to proactive preventative interventions, thereby significantly improving the timeliness, accuracy, and user experience of empathic responses in human-computer emotional interaction. Furthermore, it can capture, model, and utilize the user's emotional evolution trajectory, enabling proactive intervention based on evolution trend prediction and threshold warnings.

[0007] According to one aspect of the present invention, a method for generating empathic responses based on the evolution of emotional states is provided, comprising:

[0008] The interaction data of each round of dialogue between the user and the intelligent companion robot is acquired, and the emotional state vector of each round of dialogue is obtained based on the interaction data. The emotional state vector is expressed as $E_t = (c_t, I_t, \tau_t)$, where $c_t \in C$ is the emotion category label, with $C$ = {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}; $I_t$ is the emotional intensity value; and $\tau_t$ is the Unix timestamp;

[0009] The emotional state vector of each round of dialogue is added to the short-term emotional trajectory memory bank. The emotional evolution trajectory is obtained based on the short-term emotional trajectory memory bank, and the evolution feature parameters are calculated based on the emotional evolution trajectory.

[0010] The emotional intensity sequence in the emotional evolution trajectory is input into the pre-trained evolution trend prediction model, which outputs the next round of emotional intensity prediction value. The intervention urgency is calculated by combining the average rate of change, emotional accumulation duration and emotional acceleration in the evolution feature parameters, and an active intervention information vector is generated based on the intervention urgency.

[0011] The emotional state vector, evolution feature parameters, intervention urgency, and intervention level in the active intervention information vector of the current round of dialogue are used as input data to input the strategy selection function to obtain the strategy weight. The strategy with the highest weight or several empathy strategies ranked from high to low are selected from the preset empathy strategy library.

[0012] The selected empathy strategies and strategy weights are jointly encoded into structured prompts, which are then input into the large language model along with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the proactive intervention information vector to generate empathic responses.

[0013] Furthermore, based on the short-term emotional trajectory memory bank, the emotional evolution trajectory is obtained, including:

[0014] The emotional intensity sequence and its corresponding timestamp sequence are obtained based on the short-term emotional trajectory memory bank;

[0015] The emotional intensity sequence is smoothed by an exponentially weighted moving average to obtain the smoothed emotional intensity sequence.

[0016] The trajectory of emotional evolution is constructed based on timestamp sequences and smoothed emotional intensity series.

[0017] Furthermore, the formula for calculating the urgency of the intervention is as follows:

[0018]

[0019] where the formula combines the intervention urgency, the emotional acceleration $a$, an indicator function, the weighting coefficients (which are subject to a constraint on their values), the predicted emotional intensity value, the average rate of change, and the emotional accumulation duration.

[0020] Furthermore, an active intervention information vector is generated based on the urgency of the intervention, including:

[0021] If the urgency of intervention is greater than or equal to the first warning threshold and less than the second warning threshold, then the first-level active intervention mode is triggered.

[0022] If the urgency of intervention is greater than or equal to the second warning threshold, the second-level active intervention mode is triggered;

[0023] Otherwise, maintain passive response mode;

[0024] When the first-level active intervention mode or the second-level active intervention mode is triggered, an active intervention information vector is generated.

[0025] Furthermore, the expression for the active intervention information vector is:

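[0026] $P = (\mathit{level},\ \mathit{trend},\ \mathit{action})$

[0027] where $\mathit{level}$ is the intervention level, $\mathit{trend}$ is a description of the evolution trend, and $\mathit{action}$ is the recommended soothing action.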

[0028] Furthermore, the policy selection function employs a learnable policy selection matrix.

[0029] Furthermore, after generating an empathetic response, it also includes:

[0030] Obtain the actual emotional state vector for the next round, calculate the actual emotional intensity, and compare it with the predicted emotional intensity value.

[0031] If the difference between the comparisons exceeds a preset threshold, the online update mechanism of the policy selection function is triggered to adjust the learnable policy selection matrix.

[0032] Furthermore, the interactive data includes at least text modal data, and at least one of voice modal data or visual modal data.

[0033] Furthermore, the short-term emotional trajectory memory bank adopts a sliding window mechanism, retaining only the emotional state of the most recent L rounds of dialogue.

[0034] According to one aspect of the present invention, an empathic response generation system based on the evolution of emotional states is provided, comprising:

[0035] The first processing module is used to acquire interaction data of each round of dialogue between the user and the intelligent companion robot, and to obtain an emotional state vector for each round of dialogue based on the interaction data. The emotional state vector is expressed as $E_t = (c_t, I_t, \tau_t)$, where $c_t \in C$ is the emotion category label, with $C$ = {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}; $I_t$ is the emotional intensity value; and $\tau_t$ is the Unix timestamp;

[0036] The second processing module is used to add the emotional state vector of each round of dialogue to the short-term emotional trajectory memory bank, obtain the emotional evolution trajectory based on the short-term emotional trajectory memory bank, and calculate the evolution feature parameters based on the emotional evolution trajectory.

[0037] The third processing module is used to input the emotional intensity sequence in the emotional evolution trajectory into the pre-trained evolution trend prediction model, output the next round of emotional intensity prediction value, and calculate the intervention urgency by combining the average rate of change, emotional accumulation duration and emotional acceleration in the evolution feature parameters, and generate an active intervention information vector based on the intervention urgency.

[0038] The fourth processing module is used to input the emotional state vector of the current round of dialogue, the evolution feature parameters, the intervention urgency, and the intervention level in the active intervention information vector into the strategy selection function as input data to obtain the strategy weights, and then to select from the preset empathy strategy library either the strategy with the highest weight or several empathy strategies ranked by weight from high to low.

[0039] The fifth processing module encodes the selected empathy strategy and strategy weights into a structured prompt, which, along with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the active intervention information vector, is input into the large language model to generate an empathic response.

[0040] Compared with the prior art, the beneficial effects of the present invention are:

[0041] 1. This invention modifies emotional intensity by introducing a contradiction penalty factor, enabling the system to recognize complex emotional states. Compared to traditional single-modal or late-fusion methods, the F1 score is improved by 18% to 22% on the contradiction emotion recognition task.

[0042] 2. This invention effectively suppresses single-round recognition noise through a sliding window and exponentially weighted moving average smoothing mechanism, making the emotion evolution trajectory more stable and reliable. Experiments show that the correlation coefficient between the smoothed sequence and the manually labeled emotion change trend reaches more than 0.92.

[0043] 3. This invention proposes six types of features, including the rate of change of emotional intensity, average rate, direction indicator, fluctuation amplitude, cumulative duration, and acceleration, to comprehensively depict the dynamic characteristics of emotional evolution.

[0044] 4. Based on LSTM, this invention predicts evolution trends and calculates the urgency of intervention, enabling graded early warning and allowing the system to take proactive interventions of varying degrees of urgency.

[0045] 5. The strategy selection function proposed in this invention integrates multi-dimensional information such as emotional state, evolutionary characteristics, and intervention urgency, and outputs a strategy weight vector, supporting weighted combination of multiple strategies.

[0046] 6. The present invention inputs structured strategy prompts, emotion evolution summaries, and intervention information into a large language model, guiding the LLM to generate responses that highly match the current emotion evolution state. This closed-loop feedback mechanism enables the system to learn from each interaction, continuously optimize strategy selection, and improve user satisfaction.

[0047] 7. This invention achieves a leap from passive reactive responses to proactive preventative interventions through a complete technical chain of emotional state quantification, evolution trajectory modeling, trend prediction, hierarchical early warning, strategy fusion, and closed-loop optimization. It can be widely applied to scenarios such as intelligent customer service, mental health care, in-vehicle emotional assistants, and companion robots, significantly improving the quality of human-computer emotional interaction and user experience.

Attached Figure Description

[0048] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0049] Figure 1 is a flowchart of the empathic response generation method based on the evolution of emotional states, according to an embodiment of the present invention.

[0050] Figure 2 is a flowchart illustrating the emotional state sequence storage process according to an embodiment of the present invention.

[0051] Figure 3 is a flowchart illustrating the strategy selection and empathic response generation process in an embodiment of the present invention.

Detailed Implementation

[0052] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0053] Example 1

[0054] As shown in Figure 1, this embodiment of the invention proposes a method for generating empathic responses based on the evolution of emotional states, including: acquiring interaction data of each round of dialogue between the user and the intelligent companion robot, and obtaining an emotional state vector for each round of dialogue based on the interaction data, where the emotional state vector is expressed as $E_t = (c_t, I_t, \tau_t)$, in which $c_t \in C$ is the emotion category label, with $C$ = {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}, $I_t$ is the emotional intensity value, and $\tau_t$ is the Unix timestamp; adding the emotional state vector of each round of dialogue to a short-term emotional trajectory memory bank, obtaining the emotional evolution trajectory based on the short-term emotional trajectory memory bank, and calculating evolution feature parameters based on the emotional evolution trajectory; inputting the emotional intensity sequence in the emotional evolution trajectory into a pre-trained evolution trend prediction model, outputting the predicted emotional intensity of the next round, calculating the intervention urgency by combining the average rate of change, emotional accumulation duration, and emotional acceleration in the evolution feature parameters, and generating an active intervention information vector based on the intervention urgency; inputting the emotional state vector of the current round of dialogue, the evolution feature parameters, the intervention urgency, and the intervention level in the active intervention information vector into a strategy selection function to obtain strategy weights, and selecting from a preset empathy strategy library either the strategy with the highest weight or several empathy strategies ranked by weight from high to low; and encoding the selected empathy strategies and strategy weights together into a structured prompt, which is input into a large language model together with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the active intervention information vector to generate an empathic response.

[0055] Specifically, embodiments of the present invention provide content related to multimodal emotion state recognition:

[0056] The interaction data of the Tth round of dialogue between the user and the intelligent companion robot is obtained. The interaction data includes at least text modal data and at least one of voice modal data or visual modal data.

[0057] A multimodal emotion recognition model is employed to extract emotion features from the data of each modality. This model uses a Transformer-based cross-modal attention fusion network: pre-trained text encoders (e.g., BERT), speech encoders (e.g., wav2vec 2.0), and visual encoders (e.g., ResNet-50) extract initial feature vectors for each modality. Subsequently, through stacked cross-attention layers, text features are used as queries and speech and visual features are used as keys and values to calculate cross-modal attention weights, achieving deep interaction and alignment between modalities. Finally, a gated fusion unit dynamically fuses the features from each modality into a multimodal emotion joint representation vector (i.e., the fused features). The fused features are then input into an emotion classifier, which outputs the emotional state vector for that round of dialogue:

[0058] $E_t = (c_t, I_t, \tau_t)$ (1)

[0059] where $c_t \in C$ is the emotion category label, with $C$ = {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}; $I_t \in [0, 1]$ is the emotional intensity, with 0 indicating no emotion at all and 1 indicating extremely strong emotion; and $\tau_t$ is the Unix timestamp, accurate to the millisecond.

[0060] When fusing multimodal data, if there are inconsistencies in text sentiment features, speech sentiment features, or visual sentiment features (e.g., the text says "I'm fine" but the speech has a low rhythm or the face shows a depressed expression), it is marked as an "ambivalent sentiment state," and the sentiment intensity value is adjusted according to the following formula:

[0061] (2)

[0062] where the contradiction penalty factor takes a value in the range 0.1 to 0.3.

[0063] Specifically, as shown in Figure 2, this embodiment of the invention provides for emotional state sequence storage and sliding window management:

[0064] The emotional state vector of each round is added to the short-term emotional trajectory memory bank in chronological order. This memory bank uses a sliding window mechanism, retaining only the emotional state of the most recent L rounds of conversation:

[0065] $M_T = \{E_{T-L+1}, \ldots, E_{T-1}, E_T\}$ (3)

[0066] where $M_T$ is the set of emotional state vectors stored in the short-term emotional trajectory memory bank after the $T$-th round of dialogue; $T$ is the current dialogue round number; $L$ is the window size, i.e., the number of most recent dialogue rounds retained in the memory bank, preferably ranging from 3 to 10; and $E_{T-L+1}, \ldots, E_T$ are the emotional state vectors from round $T-L+1$ to round $T$. When the window is full, the earliest emotional state vector is removed.
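The sliding-window storage described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the names `EmotionState` and `TrajectoryMemory` are hypothetical, and a fixed-length `collections.deque` is assumed as the eviction mechanism.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    """One round's emotional state: (category, intensity, timestamp)."""
    category: str       # emotion category label, e.g. "Anxiety"
    intensity: float    # 0.0 (no emotion) to 1.0 (extremely strong)
    timestamp_ms: int   # Unix timestamp in milliseconds


class TrajectoryMemory:
    """Short-term memory that keeps only the most recent L rounds (hypothetical name)."""

    def __init__(self, window_size: int = 5) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen drops the oldest entry automatically once full.
        self._states = deque(maxlen=window_size)

    def append(self, state: EmotionState) -> None:
        self._states.append(state)

    def intensities(self) -> list:
        return [s.intensity for s in self._states]

    def timestamps_ms(self) -> list:
        return [s.timestamp_ms for s in self._states]

    def __len__(self) -> int:
        return len(self._states)


# Seven rounds pushed into a window of five: rounds 1 and 2 are evicted.
memory = TrajectoryMemory(window_size=5)
for turn in range(1, 8):
    memory.append(EmotionState("Anxiety", turn / 10, 1_700_000_000_000 + turn * 60_000))
```

After the loop, the memory holds rounds 3 through 7 only, which mirrors the rule that the earliest vector is removed when the window is full.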

[0067] Specifically, embodiments of the present invention provide for constructing the emotional evolution trajectory:

[0068] The emotional intensity sequence $\{I_k\}$ and its corresponding timestamp sequence $\{\tau_k\}$ are read from the short-term emotional trajectory memory bank. The timestamp sequence provides the time difference information needed to calculate rate, acceleration, and cumulative duration. The emotional evolution trajectory is composed of (timestamp, intensity value) point pairs. To reduce the noise of single-round recognition, the emotional intensity sequence is smoothed using an exponentially weighted moving average (EWMA) to obtain the smoothed emotional intensity sequence:

[0069] $\tilde{I}_k = \alpha I_k + (1 - \alpha)\,\tilde{I}_{k-1}$ (4)

[0070] where $\tilde{I}_k$ is the smoothed emotional intensity of round $k$, defined together with an initial value for the first round in the sequence, and $\alpha$ is the smoothing coefficient. Combined with the timestamp sequence, the smoothed emotional intensity sequence constitutes the emotional evolution trajectory $\{(\tau_k, \tilde{I}_k)\}$, where $\tau_k$ is the timestamp of round $k$ and $\tilde{I}_k$ is the smoothed emotional intensity of round $k$.
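A small Python sketch may help make the smoothing step concrete. It assumes the common EWMA convention in which the smoothing coefficient weights the newest observation, and it assumes the first smoothed value is seeded with the first raw intensity; neither choice is confirmed by the text, and the function name `ewma` and the default `alpha=0.5` are illustrative only.

```python
def ewma(values, alpha=0.5):
    """Exponentially weighted moving average of an intensity sequence.

    Assumed convention: smoothed[k] = alpha * values[k] + (1 - alpha) * smoothed[k-1],
    seeded with smoothed[0] = values[0].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # assumed seed: first raw intensity
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


raw = [0.2, 0.6, 1.0]
smooth = ewma(raw, alpha=0.5)  # approximately [0.2, 0.4, 0.7]
```

A smaller `alpha` suppresses single-round noise more strongly but also delays the trajectory's reaction to a genuine escalation, so the coefficient trades stability against responsiveness.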

[0071] Specifically, embodiments of the present invention provide the content for calculating the characteristic parameters of emotion evolution:

[0072] Based on the emotional evolution trajectory, the following evolution feature parameters are calculated:

[0073] 1. Rate of change of emotional intensity (the instantaneous rate of change of round $k$ relative to round $k-1$):

[0074] $v_k = \dfrac{\tilde{I}_k - \tilde{I}_{k-1}}{\tau_k - \tau_{k-1}}$ (5)

[0075] 2. Average rate of change:

[0076] (6)

[0077] 3. Emotional evolution direction indicator:

[0078] (7)

[0079] where the two direction thresholds are expressed in intensity per hour.

[0080] 4. Amplitude of emotional fluctuations A:

[0081] (8)

[0082] 5. Duration of emotional accumulation: the time span from the first detection of negative emotion (intensity ≥ 0.3) to the current round.

[0083] 6. Emotional acceleration $a$ (the rate of change of the rate of change): it can provide 2-3 rounds of early warning of "accelerated deterioration" of emotions, thus buying valuable time for proactive intervention.

[0084] (9)

[0085] The sign of acceleration is used to determine whether the escalation of emotions is "accelerating" or "decelerating," and it is indicative of the urgency of intervention.
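The feature list above can be illustrated with a short Python sketch. Several definitions here are assumptions, since the corresponding formulas are not reproduced in the text: the average rate is taken endpoint to endpoint across the window, the amplitude is taken as maximum minus minimum, the acceleration is the difference of the last two instantaneous rates divided by the last time step, and the accumulation duration starts at the first point with intensity ≥ 0.3 regardless of category. The direction indicator is omitted because its thresholds are not given. The function name `evolution_features` is hypothetical.

```python
MS_PER_HOUR = 3_600_000
NEGATIVE_ONSET = 0.3  # intensity level named in the text as the onset of negative emotion


def evolution_features(timestamps_ms, intensities):
    """Compute trajectory features in units of intensity per hour (assumed definitions)."""
    if len(timestamps_ms) != len(intensities) or len(intensities) < 3:
        raise ValueError("need at least three aligned (timestamp, intensity) points")
    hours = [t / MS_PER_HOUR for t in timestamps_ms]
    if any(later <= earlier for earlier, later in zip(hours, hours[1:])):
        raise ValueError("timestamps must be strictly increasing")

    # Instantaneous rate of each round relative to the previous round.
    rates = [
        (intensities[k] - intensities[k - 1]) / (hours[k] - hours[k - 1])
        for k in range(1, len(intensities))
    ]
    # Assumption: average rate measured from the first to the last point.
    average_rate = (intensities[-1] - intensities[0]) / (hours[-1] - hours[0])
    # Assumption: fluctuation amplitude as the range of the window.
    amplitude = max(intensities) - min(intensities)
    # Assumption: acceleration as the change between the last two rates per hour.
    acceleration = (rates[-1] - rates[-2]) / (hours[-1] - hours[-2])
    # Time since the first point at or above the onset level, if any.
    onset = next((h for h, x in zip(hours, intensities) if x >= NEGATIVE_ONSET), None)
    accumulation_hours = 0.0 if onset is None else hours[-1] - onset

    return {
        "rates": rates,
        "average_rate": average_rate,
        "amplitude": amplitude,
        "acceleration": acceleration,
        "accumulation_hours": accumulation_hours,
    }


features = evolution_features([0, 3_600_000, 7_200_000], [0.2, 0.4, 0.8])
```

In this toy trajectory the rate doubles from one step to the next, so the acceleration is positive, which is the "accelerating escalation" case the text associates with higher intervention urgency.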

[0086] Specifically, embodiments of the present invention provide for predicting emotional intensity:

[0087] The smoothed emotional intensity sequence is input into the pre-trained evolution trend prediction model. The model employs a Long Short-Term Memory (LSTM) network architecture, and its forward propagation process can be represented as:

[0088] (10)

[0089] (11)

[0090] where the formulas involve the hidden state of the $k$-th step, the learnable parameters of the LSTM network, and the weight and bias of the output layer. The model outputs the predicted emotional intensity of the next round and, at the same time, predicts the direction of evolution:

[0091] (12)

[0092] That is, if the predicted intensity increases, it is a negative reinforcement direction (+1), and if it decreases, it is a positive mitigation direction (-1).

[0093] Specifically, embodiments of the present invention provide for dual-threshold early warning and triggering of the active intervention modes:

[0094] Two emotional intensity warning thresholds are preset: the first warning threshold takes a value in the range 0.65 to 0.75, and the second warning threshold takes a value in the range 0.85 to 0.95.

[0095] The intervention urgency is defined based on the predicted emotional intensity value, the average rate of change, the emotional accumulation duration, and the emotional acceleration indicator:

[0096] (13)

[0097] where $a$ is the emotional acceleration; the indicator function takes the value 1 when $a > 0$ and 0 otherwise; and the weighting coefficients are subject to a constraint and have preferred values.

[0098] The intervention urgency is compared with the thresholds: if it is greater than or equal to the first warning threshold and less than the second warning threshold, the first-level active intervention mode is triggered; if it is greater than or equal to the second warning threshold, the second-level active intervention mode is triggered; otherwise, the system remains in passive response mode. When either the first-level or the second-level active intervention mode is triggered, the system generates an active intervention information vector:

[0099] $P = (\mathit{level},\ \mathit{trend},\ \mathit{action})$ (14)

[0100] where $\mathit{level}$ is the intervention level, $\mathit{trend}$ is a description of the evolution trend (e.g., "accelerated rise", "fluctuating increase"), and $\mathit{action}$ is the recommended reassurance action (such as "prioritize treatment", "issue compensation", "guide deep breathing").
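The dual-threshold rule can be sketched as follows. The threshold defaults 0.7 and 0.9 are illustrative picks from the stated ranges (0.65 to 0.75 and 0.85 to 0.95), and the names `Intervention`, `intervention_level`, and `build_intervention` are hypothetical. The urgency value is taken as an input here, since its formula is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intervention:
    """Active intervention information: (level, trend description, recommended action)."""
    level: int
    trend: str
    action: str


def intervention_level(urgency: float, first: float = 0.7, second: float = 0.9) -> int:
    """Return 0 (passive response), 1 (first-level) or 2 (second-level intervention)."""
    if not first < second:
        raise ValueError("the first threshold must be below the second")
    if urgency >= second:
        return 2
    if urgency >= first:
        return 1
    return 0


def build_intervention(urgency: float, trend: str, action: str) -> Optional[Intervention]:
    """Create the intervention vector only when an active mode is triggered."""
    level = intervention_level(urgency)
    if level == 0:
        return None  # passive response mode: no intervention vector is generated
    return Intervention(level=level, trend=trend, action=action)


alert = build_intervention(0.93, "accelerated rise", "guide deep breathing")
```

Checking the second threshold first keeps the two bands mutually exclusive without repeating the upper bound in the first-level condition.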

[0101] Specifically, embodiments of the present invention provide for strategy selection based on the evolution feature parameters and the intervention level:

[0102] An empathy strategy library is built, in which the strategies include: emotional affirmation, emotional normalization, self-disclosure, open-ended questions, problem solving, shifting attention, advance compensation, and humor to defuse.

[0103] A strategy selection function is designed. Its inputs are the emotional state vector of the current round of dialogue, a set of key parameters selected from the evolution feature parameters, the intervention urgency, and the intervention level; its output is a strategy weight vector:

[0104] (15)

[0105] where the function is applied to the concatenated vector of input features using the learnable strategy selection matrix. Finally, the K strategies with the highest strategy weights (K is usually 1 to 3) are selected and combined, that is, either the single strategy with the highest weight or several empathy strategies ranked by weight from high to low.
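As a rough illustration of the selection step, the sketch below scores each strategy with one row of a selection matrix applied to the feature vector, normalizes the scores with a softmax, and keeps the top K. The softmax normalization is an assumption, since the text only states that a learnable matrix maps the concatenated features to a weight vector. The names `softmax` and `select_strategies` and the feature layout are hypothetical.

```python
import math

# The eight strategies named in the empathy strategy library.
STRATEGIES = [
    "emotional affirmation",
    "emotional normalization",
    "self-disclosure",
    "open-ended questions",
    "problem solving",
    "shifting attention",
    "advance compensation",
    "humor to defuse",
]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_strategies(matrix, features, k=2):
    """Return the k highest-weighted (strategy, weight) pairs.

    matrix: one row of coefficients per strategy (stand-in for the learnable matrix).
    features: concatenated input feature vector.
    """
    if len(matrix) != len(STRATEGIES):
        raise ValueError("matrix needs one row per strategy")
    if any(len(row) != len(features) for row in matrix):
        raise ValueError("each matrix row must match the feature vector length")
    scores = [sum(w * x for w, x in zip(row, features)) for row in matrix]
    weights = softmax(scores)  # assumed normalization, not confirmed by the text
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(STRATEGIES[i], weights[i]) for i in ranked[:k]]


# Toy matrix: the first feature favors later strategies, the second is ignored.
toy_matrix = [[0.1 * i, 0.0] for i in range(len(STRATEGIES))]
picks = select_strategies(toy_matrix, [1.0, 5.0], k=2)
```

Returning the weights together with the names lets the downstream prompt encoder pass each strategy's strength to the language model.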

[0106] Examples of specific decision rules (which can be hard-coded or learned): if … and …, the strategies … and … are forcibly included (in a customer service scenario); if the contradictory emotional state flag is true, the weights of … and … are increased; if …, the weight of … is increased, and … determines whether … or … is additionally stacked.

[0107] Specifically, as shown in Figure 3, embodiments of the present invention provide for strategy selection and empathic response generation:

[0108] The selected strategies and their weights are encoded as a structured prompt. The prompt template is as follows:

[0109] [Strategy]: {Strategy Name}

[0110] [Intent]: {Strategic Intent Description}

[0111] [Example Sentence Structure]: {1-2 example response fragments matching this strategy}

[0112] [Strength]: {Weight}

[0113] Combine the above prompt with the following information:

[0114] 1. Dialogue history context, the expression is as follows:

[0115] (16)

[0116] 2. Summary of the emotional evolution trajectory, for example: "Your emotions rose from mild anxiety (0.35) to moderate dissatisfaction (0.55) and then to anger (0.82) in the past three rounds of conversation, showing an accelerating upward trend." The text summary generated from the emotional evolution trajectory is used as part of the prompt for the large language model.

[0117] 3. Proactive intervention information (if triggered). The final input sequence is:

[0118] (17)

[0119] The input sequence is fed into the large language model (parameter count ≥ 7B, instruction fine-tuned) to generate a natural language response, expressed as follows:

[0120] (18)

[0121] The generated responses semantically meet the requirements of the selected strategy, emotionally match the user's current and predicted emotional state, and in the proactive intervention mode contain explicit reassurance, compensation, or guidance content.
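The prompt assembly described above can be sketched as plain string building. The four bracketed strategy fields follow the template in the text; the section labels `[Dialogue History]`, `[Emotional Evolution Summary]`, and `[Proactive Intervention]`, the ordering of the parts, and the function names are assumptions made for illustration. No language model call is shown.

```python
def encode_strategy(name, intent, examples, weight):
    """Render one selected strategy using the four-field template from the text."""
    return "\n".join([
        f"[Strategy]: {name}",
        f"[Intent]: {intent}",
        f"[Example Sentence Structure]: {' | '.join(examples)}",
        f"[Strength]: {weight:.2f}",
    ])


def build_llm_input(strategy_prompts, history, trajectory_summary, intervention=None):
    """Concatenate strategy prompts, dialogue history, trajectory summary and,
    only if an active mode was triggered, the intervention information."""
    parts = list(strategy_prompts)
    parts.append("[Dialogue History]\n" + "\n".join(history))            # hypothetical label
    parts.append("[Emotional Evolution Summary]\n" + trajectory_summary)  # hypothetical label
    if intervention is not None:
        parts.append("[Proactive Intervention]\n" + intervention)         # hypothetical label
    return "\n\n".join(parts)


strategy_block = encode_strategy(
    "emotional affirmation",
    "acknowledge the user's feeling before offering help",
    ["It makes sense that you feel this way."],
    0.6,
)
llm_input = build_llm_input(
    [strategy_block],
    ["User: This rain just won't stop, I feel so lethargic."],
    "Intensity rose over the last three rounds, showing an accelerating upward trend.",
    "level 1; accelerated rise; guide deep breathing",
)
```

Keeping the intervention section optional matches the text's "if triggered" condition, so passive-mode turns produce a shorter input.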

[0122] Specifically, embodiments of the present invention provide content on post-response emotion tracking and strategy effectiveness evaluation (closed-loop feedback):

[0123] After the generated response is presented to the user, the user's emotional state vector for the next round is obtained. The actual emotional intensity is calculated and compared with the predicted value. If the deviation exceeds the preset deviation threshold, the online update mechanism of the strategy selection function is triggered, and the learnable strategy selection matrix is adjusted using a reinforcement learning signal. This enables adaptive optimization of strategy selection.
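The feedback trigger can be sketched as a deviation check followed by a small matrix adjustment. The text does not specify the reinforcement learning update, so the rule below is a deliberately simple stand-in and not the patent's method: it treats "actual intensity lower than predicted" as a positive reward and nudges the rows of the strategies that were used. The deviation threshold 0.2, the learning rate, and both function names are assumptions.

```python
def needs_policy_update(predicted, actual, deviation_threshold=0.2):
    """True when the prediction error exceeds the preset deviation threshold."""
    return abs(actual - predicted) > deviation_threshold


def update_selection_matrix(matrix, chosen_rows, features, predicted, actual,
                            learning_rate=0.1):
    """Stand-in update: reinforce chosen strategies when emotion ended lower
    than predicted, and weaken them when it ended higher."""
    reward = predicted - actual  # positive if the user calmed more than expected
    for i in chosen_rows:
        matrix[i] = [
            w + learning_rate * reward * x
            for w, x in zip(matrix[i], features)
        ]
    return matrix


weights = [[0.0, 0.0], [0.0, 0.0]]
if needs_policy_update(predicted=0.8, actual=0.5):
    update_selection_matrix(weights, chosen_rows=[1], features=[1.0, 2.0],
                            predicted=0.8, actual=0.5)
```

Gating the update on the deviation threshold means the matrix changes only when the outcome was genuinely surprising, which limits drift from ordinary prediction noise.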

[0124] Example 2

[0125] This invention provides a mental health companion robot for interacting with users whose emotions fluctuate frequently due to multiple life pressures.

[0126] Step S1: Acquisition and storage of emotional states in multi-turn dialogues

[0127] User profile background: This user is a 28-year-old working professional living alone. They are sensitive to rainy weather and tend to feel down after consecutive rainy days. The robot has created a long-term emotional profile for them, recording "rainy days" and "low light" as triggers for their emotions. Currently, the local area has experienced three consecutive days of rain, and the weather forecast remains cloudy turning overcast.

[0128] First round of interaction (T=1)

[0129] User input (text + facial image): "This rain just won't stop, I feel so lethargic."

[0130] Speech characteristics: The mean fundamental frequency decreased by 8%, the energy decreased by 12%, and the speech rate slowed down by 10%.

[0131] Visual features: Facial action unit AU4 (frowning) activation intensity 0.3, AU15 (drooping corner of mouth) activation intensity 0.2, eyelid ptosis (AU7 intensity 0.3).

[0132] Multimodal fusion model output: emotion category = "depressed", together with an intensity value; the contradiction flag is set to false. The system generates an emotional state vector and adds it to the short-term emotional trajectory memory bank (sliding window L=5; the window currently contains only this entry).

[0133] Second round of interaction (T=2)

[0134] User input (text): "Sigh, there's no way to make the weather clear up anyway, I'll just have to bear with it."

[0135] Text sentiment analysis: The words "sigh" and "bear with it" imply helplessness and repression, showing acceptance on the surface but unreleased inner emotions.

[0136] Acoustic characteristics: The user speaks with a heavy breath and a slight sigh at the end of the sentence (the energy briefly peaks and then decays), which is a non-verbal emotional expression.

[0137] Overall judgment of the multimodal fusion model: emotion category = "repressed calm" (a contradictory state of outward acceptance but inward repression), with an initial intensity. At the same time, the "semantic-acoustic contradiction" flag is set to true, and the contradiction penalty factor is applied to obtain the corrected intensity.

[0138] The system stores the resulting emotional state vector.

[0139] Third round of interaction (T=3)

[0140] User input (text + facial image): "It's nothing really, it's just a cloudy day, I'm fine."

[0141] Textual semantics: Expresses indifference, self-comfort, and an attempt to downplay emotions.

[0142] Visual characteristics: AU12 (upturned corners of the mouth) activation intensity of 0.4 (slight smile) was detected, but AU6 (orbicularis oculi muscle) was not activated, and AU4 (frowning) remained at 0.2, while the eyelids drooped (AU7 intensity 0.25), forming a typical "forced smile" pattern.

[0143] Multimodal fusion model output: emotion category = "hidden depression", with an initial intensity. The visual-semantic contradiction (smiling vs. frowning + drooping eyelids) triggers the contradiction penalty, which yields the corrected intensity.

[0144] .

[0145] At this time, the short-term emotional trajectory memory bank stores the following sequence: .
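A sliding-window memory bank of this kind can be sketched with a bounded deque. The class and field names below are assumptions introduced for illustration; only the window length L=5 and the vector fields (category, intensity, timestamp, contradiction flag) come from the description.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    """One per-round emotional state vector (field names are hypothetical)."""
    category: str
    intensity: float
    timestamp: int              # Unix time in seconds
    contradiction: bool = False


class ShortTermMemory:
    """Keeps only the most recent `window` emotional states."""

    def __init__(self, window: int = 5):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._states = deque(maxlen=window)

    def add(self, state: EmotionState) -> None:
        self._states.append(state)

    def intensities(self):
        return [s.intensity for s in self._states]

    def timestamps(self):
        return [s.timestamp for s in self._states]

    def contradiction_count(self) -> int:
        return sum(1 for s in self._states if s.contradiction)

    def __len__(self) -> int:
        return len(self._states)
```

Using `deque(maxlen=L)` means no explicit eviction logic is needed when a sixth round arrives.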

[0146] The robot receives input from three sources: text, voice, and vision. After cross-modal attention fusion, it outputs an emotional state vector e_t.

[0147] Step S2: Modeling and Feature Extraction of Emotional Evolution Trajectory

[0148] Trajectory smoothing: Take a sliding window of L=3 and a smoothing coefficient of α=0.3.

[0149]

[0150]

[0151]

[0152] Smoothed trajectory points: the smoothed intensities are 0.42, 0.447, and 0.493. Adjacent time intervals are measured in hours (based on actual conversation intervals, corresponding to one interaction per day during consecutive rainy days).

[0153] Evolution feature parameters are extracted, including the rate of change between each pair of adjacent rounds, in intensity units per hour.

[0154] Average rate of change, in intensity units per hour (positive value)

[0155] Acceleration, in intensity units per hour² (a positive value indicates an accelerating increase)
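The smoothing step can be illustrated with a standard exponentially weighted moving average. In the sketch below, the recurrence s_t = α·x_t + (1 − α)·s_{t−1} with s_1 = x_1 is an assumption about the exact form used, and the raw intensities 0.42, 0.51 and 0.60 are back-computed assumptions chosen because, with α = 0.3, they reproduce the smoothed values 0.42, 0.447 and 0.493 that are later fed to the prediction model.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Assumed raw intensities for rounds 1 to 3 (not stated in the text).
raw = [0.42, 0.51, 0.60]
print([round(s, 3) for s in ewma(raw, alpha=0.3)])  # [0.42, 0.447, 0.493]
```

A small α such as 0.3 weights history more heavily than the newest reading, which damps single-round spikes at the cost of slower reaction.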

[0156] Fluctuation range

[0157] The duration of emotional accumulation τ_accum = the time span from the first round (intensity ≥ 0.3) to the third round = 44 hours

[0158] Direction indicator: because the average rate of change is positive and the acceleration is greater than 0, the trajectory is defined as moving in the negative-emotion reinforcement direction, d=+1.

[0159] Contradiction feature marker: modal contradictions exist in both the second and third rounds, so the cumulative contradiction flag is 2.
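The feature extraction above can be sketched with finite differences. Everything about the exact formulas is an assumption here: the acceleration is taken as the change in rate divided by the latest interval, the fluctuation range as max minus min of the smoothed intensities, and the direction rule for cases other than d=+1 is hypothetical. The intervals of 20 and 24 hours are also hypothetical; the text only fixes their sum at 44 hours.

```python
def evolution_features(smoothed, gaps_hours, onset=0.3):
    """Compute trajectory features from smoothed intensities and time gaps.

    smoothed:   chronological smoothed intensities (length n >= 2)
    gaps_hours: hours between adjacent rounds (length n - 1)
    onset:      intensity from which accumulation is counted (0.3 in the text)
    """
    rates = [(b - a) / dt
             for a, b, dt in zip(smoothed, smoothed[1:], gaps_hours)]
    mean_rate = sum(rates) / len(rates)
    # Assumed finite-difference form of the acceleration.
    accel = (rates[-1] - rates[-2]) / gaps_hours[-1] if len(rates) >= 2 else 0.0
    fluctuation = max(smoothed) - min(smoothed)
    # Accumulation starts at the first round whose intensity reaches `onset`.
    start = next((i for i, s in enumerate(smoothed) if s >= onset), None)
    accum_hours = sum(gaps_hours[start:]) if start is not None else 0.0
    if mean_rate > 0 and accel > 0:
        direction = 1        # negative emotion reinforcing
    elif mean_rate < 0:
        direction = -1       # hypothetical: emotion easing
    else:
        direction = 0        # hypothetical: no clear direction
    return {
        "rates": rates,
        "mean_rate": mean_rate,
        "acceleration": accel,
        "fluctuation": fluctuation,
        "accum_hours": accum_hours,
        "direction": direction,
    }
```

With the smoothed values 0.42, 0.447, 0.493 and the assumed gaps, this gives a positive mean rate, a positive acceleration, an accumulation duration of 44 hours, and d=+1.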

[0160] Step S3: Evolution Trend Prediction and Threshold Judgment

[0161] LSTM prediction: the smoothed intensities of the last 3 rounds, [0.42, 0.447, 0.493], are input to the model, which outputs the predicted emotional intensity for the next round (T=4).

[0162] Intervention urgency calculation (U):

[0163] First item:

[0164] Second item:

[0165] Third item:

[0166]

[0167] Threshold comparison: the system presets a first warning threshold and a second warning threshold.

[0168] The current urgency does not meet the conditions for active intervention. However, the system detects that the predicted intensity has exceeded the threshold (0.65) and that the contradiction features are significant. Therefore, the strategy selector can still use "predicted intensity exceeding the threshold" as an auxiliary condition to initiate the pre-intervention mode (lightweight proactive care). To fully demonstrate proactive intervention, it is assumed that the intensity continues to rise after the user's actual input in the next round (T=4), and that U then exceeds the first warning threshold.

[0169] Suppose in the fourth round (T=4), the user inputs (it's been raining for the fourth consecutive day, and the user is looking out the window): "When will this rain end? I feel like I'm going to get moldy, and I don't want to do anything."

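[0170] The identified emotion category is "depression", with intensity 0.78; after storage, the trajectory is smoothed again.

[0171] Recalculating U: a new prediction is obtained, the average rate of change is updated to 0.0025 intensity units per hour, the accumulation duration becomes 68 hours, and U = 0.691, which triggers the Level 1 active intervention mode.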
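The graded warning logic can be sketched as a simple threshold comparison. The threshold values are assumptions: 0.65 appears in the threshold discussion above but its symbol is lost, and 0.85 is purely hypothetical. The `Intervention` fields follow the three components named for the active intervention information vector (level, trend description, recommended soothing actions); the names themselves are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intervention:
    """Active intervention information (field names are hypothetical)."""
    level: int
    trend: str
    actions: List[str]


def intervention_level(urgency, theta1=0.65, theta2=0.85):
    """Map urgency U to 0 (passive), 1 or 2. Threshold values are assumed."""
    if urgency >= theta2:
        return 2
    if urgency >= theta1:
        return 1
    return 0


def make_intervention(urgency, trend, actions) -> Optional[Intervention]:
    """Return None in passive mode, otherwise the intervention information."""
    level = intervention_level(urgency)
    if level == 0:
        return None
    return Intervention(level=level, trend=trend, actions=list(actions))
```

Under these assumed thresholds, U = 0.691 maps to Level 1, consistent with the outcome described for the fourth round.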

[0172] Step S4: Selecting Empathy Strategies Based on Evolutionary Trends

[0173] Current mood category = "Depressed", intensity = 0.78

[0174] Evolution features: average rate 0.0025 (slow but continuously increasing), positive acceleration, fluctuation range 0.159, accumulated contradiction markers (repressed calm → forced smile → depression), cumulative duration 68 hours.

[0175] The intervention urgency level U=0.691, triggering a Level 1 active intervention.

[0176] Simulation experiments show that after triggering Level 1 intervention, the probability of a user's emotion escalating to Level 2 decreases by 40%; after triggering Level 2 intervention, the probability of escalation decreases by 65%.

[0177] The policy selector input feature vector x_concat=[0.78,0.691,0.0025,1,0.159,68,contradiction flag=2,...]. The policy weight vector is calculated using the trained policy weight matrix W_policy (12×8), and the results are shown in Table 1.

[0178] Table 1. Example of policy weight output

[0179] The selected strategy combination is: emotional confirmation + emotional normalization + attention diversion. Compared to the previous two examples, this scenario increases the weight of "attention diversion" because when weather affects emotions, guiding users to focus on indoor activities or creating a comfortable environment is more effective than simply expressing emotions.
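The strategy selection step can be pictured as a matrix-vector product followed by normalization and top-k selection. The sketch below is an assumption throughout: the source gives only the shape of W_policy (12×8) and the input features, so the softmax normalization, the top-3 cutoff, and any matrix values used when calling the function are hypothetical, and the eighth feature (elided in the text) must be supplied by the caller.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_strategies(w_policy, features, names, top_k=3):
    """Score each strategy row against the feature vector and keep the top_k.

    w_policy: list of rows, one per strategy (12 rows of 8 weights in the text)
    features: concatenated feature vector (8 entries in the text)
    names:    strategy names, aligned with the rows of w_policy
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in w_policy]
    weights = softmax(scores)
    ranked = sorted(zip(names, weights), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]
```

In practice the raw duration feature (68 hours) is far larger than the other entries, so some feature scaling before the product would likely be needed; the source does not describe one.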

[0180] Step S5: Generating an active, intervention-based empathic response

[0181] Structured hint encoding:

[0182] [Strategy 1]: Emotional confirmation. [Intent]: Acknowledge and accept the user's low mood caused by the weather. [Example Sentence Structure]: "After so many days without seeing the sun, it is easy to feel down."

[0183] [Strategy 2]: Emotional normalization. [Intent]: Let the user know that it is common for weather to affect emotions. [Example Sentence Structure]: "Many people have similar feelings on rainy days, this is not your problem."

[0184] [Strategy 3]: Attention diversion. [Intent]: Guide the user to shift attention from the unchangeable weather to controllable indoor activities. [Example Sentence Structure]: "Would you like to make a cup of hot tea and listen to some relaxing music?" [Dialogue History]: In the first three rounds, the user expressed listlessness, resignation, and a forced smile. In the fourth round, the user said they felt as if they were about to get moldy.

[0185] [Summary of Emotional Evolution]: Your low mood has slowly intensified over the past four days as the rain has continued, from a slight depression to a noticeable sense of unease, and there is a contradiction between saying "I'm fine" on the surface and feeling uncomfortable inside.

[0186] [Proactive Intervention Information]: Level 1 proactive intervention. The negative emotion is predicted to continue intensifying. It is recommended to alleviate it through environmental adjustments and activity guidance.

[0187] Large language model generates responses:

[0188] "It's been raining for four days straight, and I haven't seen the sun. Even I feel the air is damp, so it's no wonder you feel listless and like you're about to get moldy (emotional confirmation). Actually, it's natural for the weather to affect people's moods. Many people feel similarly during long periods of rain, and it's not your fault, nor does it mean you're fragile (normalization). While we can't make the sky clear immediately, we can make this little corner a bit more comfortable. Would you like me to make you a cup of hot tea? Or maybe we can put on some rain-themed white noise, turn on a warm-colored lamp, and listen to some relaxing music together? Sometimes changing your environment can improve your mood (attention diversion + open-ended choice)."

[0189] Robot behavior output:

[0190] The speech synthesis uses a gentle, soothing tone (speech rate 0.9 times, pitch maintained neutral).

[0191] Facial expression: The eyebrows are relaxed (AU4 intensity is reduced to 0.1), the corners of the mouth are naturally closed, and the eyelids are normally open (simulating calmness).

[0192] The robot simultaneously performs preset actions: its head slightly turns towards the indoor light, and its arm makes a gesture of "inviting someone to make tea".

[0193] The strategy selection function Φ outputs a strategy weight vector based on the current emotional state, evolutionary feature parameters, and intervention level. It selects an appropriate strategy from the empathy strategy library, encodes it with structured prompts, and then sends it to the large language model to generate a response.

[0194] In this embodiment of the invention, the robot models the user's emotional evolution over three rounds of dialogue, identifying the trajectory caused by continuous rainy weather, from "depressed" to "repressed calm" and then to "hidden depression," and uses contradiction detection (sighing, forced smile) to provide early warning. When the user's emotion escalates to "depression" in the fourth round, the system triggers a Level 1 proactive intervention. The generated response avoids common ineffective comforting phrases like "go out for a walk" or "it'll be better when the weather clears up," instead employing a combination of empathy strategies: emotional confirmation, normalization, and attention diversion. Actual user testing feedback shows that this response reduces the user's immediate emotional intensity to 0.60 in the next round.

[0195] Based on the above embodiments, the present invention provides an empathic response generation system based on the evolution of emotional state, which is used to execute the empathic response generation method based on the evolution of emotional state in the above method embodiments.

[0196] The system includes: a first processing module, used to acquire interaction data of each round of dialogue between the user and the intelligent companion robot, and to obtain an emotional state vector for each round of dialogue based on the interaction data, wherein the emotional state vector comprises an emotion category tag taken from the set {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}, an emotional intensity value, and a Unix timestamp; a second processing module, used to add the emotional state vector of each round of dialogue to a short-term emotional trajectory memory bank, obtain the emotional evolution trajectory based on the short-term emotional trajectory memory bank, and calculate evolution feature parameters based on the emotional evolution trajectory; a third processing module, used to input the emotional intensity sequence in the emotional evolution trajectory into a pre-trained evolution trend prediction model, output the predicted emotional intensity value for the next round, calculate the intervention urgency by combining the average rate of change, emotional accumulation duration, and emotional acceleration in the evolution feature parameters, and generate an active intervention information vector based on the intervention urgency; a fourth processing module, used to input the emotional state vector of the current round of dialogue, the evolution feature parameters, the intervention urgency, and the intervention level in the active intervention information vector into a strategy selection function to obtain strategy weights, and to select, from a preset empathy strategy library, the strategy with the highest weight or several empathy strategies ranked by weight from high to low; and a fifth processing module, used to encode the selected empathy strategies and strategy weights together into a structured prompt, and to input it, together with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the active intervention information vector, into a large language model to generate an empathic response.

[0197] The empathic response generation system based on emotional state evolution provided in this invention addresses the technical problems of existing empathic response generation methods, such as neglecting the user's emotional evolution process, lacking proactive intervention mechanisms based on evolution trends, and poor adaptability of empathic strategies to emotional dynamics. It employs several modules to continuously model, predict trends, and provide threshold warnings for the emotional evolution trajectory in multi-turn dialogues, achieving a technological leap from passive reactive responses to proactive preventative interventions. This significantly improves the timeliness, accuracy, and user experience of empathic responses in human-computer emotional interaction; it can capture, model, and utilize the user's emotional evolution trajectory to achieve proactive intervention based on evolution trend prediction and threshold warnings.

[0198] Finally, it should be noted that the above specific embodiments are merely representative examples of the present invention. Obviously, the present invention is not limited to the above specific embodiments and many variations are possible. Any simple modifications, equivalent changes, and alterations made to the above specific embodiments based on the technical essence of the present invention should be considered within the protection scope of the present invention.

Claims

1. A method for generating empathic responses based on the evolution of emotional states, characterized in that it comprises: acquiring interaction data for each round of dialogue between the user and the intelligent companion robot, and obtaining an emotional state vector for each round of dialogue based on the interaction data, wherein the emotional state vector comprises an emotion category tag taken from the set {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}, an emotional intensity value, and a Unix timestamp; adding the emotional state vector of each round of dialogue to a short-term emotional trajectory memory bank, obtaining the emotional evolution trajectory based on the short-term emotional trajectory memory bank, and calculating evolution feature parameters based on the emotional evolution trajectory; inputting the emotional intensity sequence in the emotional evolution trajectory into a pre-trained evolution trend prediction model, which outputs the predicted emotional intensity value for the next round, calculating the intervention urgency by combining the average rate of change, emotional accumulation duration, and emotional acceleration in the evolution feature parameters, and generating an active intervention information vector based on the intervention urgency; inputting the emotional state vector of the current round of dialogue, the evolution feature parameters, the intervention urgency, and the intervention level in the active intervention information vector into a strategy selection function to obtain strategy weights, and selecting, from a preset empathy strategy library, the strategy with the highest weight or several empathy strategies ranked by weight from high to low; and jointly encoding the selected empathy strategies and strategy weights into a structured prompt, which is input into a large language model together with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the active intervention information vector to generate an empathic response.

2. The empathic response generation method based on the evolution of emotional states according to claim 1, characterized in that obtaining the emotional evolution trajectory based on the short-term emotional trajectory memory bank includes: obtaining the emotional intensity sequence and its corresponding timestamp sequence from the short-term emotional trajectory memory bank; smoothing the emotional intensity sequence with an exponentially weighted moving average to obtain the smoothed emotional intensity sequence; and constructing the emotional evolution trajectory from the timestamp sequence and the smoothed emotional intensity sequence.

3. The empathic response generation method based on the evolution of emotional states according to claim 1, characterized in that the intervention urgency is calculated by a formula in which U represents the intervention urgency and a represents the emotional acceleration, the formula further comprising an indicator function, weighting coefficients that satisfy a preset constraint, the predicted emotional intensity value, the average rate of change, and the emotional accumulation duration.

4. The empathic response generation method based on the evolution of emotional states according to claim 1, characterized in that generating the active intervention information vector based on the intervention urgency includes: if the intervention urgency is greater than or equal to the first warning threshold and less than the second warning threshold, triggering the first-level active intervention mode; if the intervention urgency is greater than or equal to the second warning threshold, triggering the second-level active intervention mode; otherwise, maintaining the passive response mode; and, when the first-level or second-level active intervention mode is triggered, generating the active intervention information vector.

5. The empathic response generation method based on the evolution of emotional state according to claim 1, characterized in that the active intervention information vector comprises an intervention level, a description of the evolution trend, and recommended soothing actions.

6. The empathic response generation method based on the evolution of emotional state according to claim 1, characterized in that the strategy selection function employs a learnable strategy selection matrix.

7. The empathic response generation method based on the evolution of emotional state according to claim 6, characterized in that, after generating the empathic response, the method further includes: obtaining the actual emotional state vector for the next round, calculating the actual emotional intensity, and comparing it with the predicted emotional intensity value; and, if the difference exceeds a preset threshold, triggering the online update mechanism of the strategy selection function to adjust the learnable strategy selection matrix.

8. The empathic response generation method based on the evolution of emotional state according to claim 1, characterized in that the interaction data includes at least text modal data, and at least one of voice modal data or visual modal data.

9. The empathic response generation method based on the evolution of emotional state according to claim 1, characterized in that the short-term emotional trajectory memory bank adopts a sliding window mechanism, which retains only the emotional states of the most recent L rounds of dialogue.

10. A system for generating empathic responses based on the evolution of emotional states, characterized in that it comprises: a first processing module, used to acquire the interaction data of each round of dialogue between the user and the intelligent companion robot, and to obtain the emotional state vector of each round of dialogue based on the interaction data, wherein the emotional state vector comprises an emotion category tag taken from the set {Anger, Sadness, Happiness, Fear, Surprise, Neutral, Depression, Anxiety, Repressed Calm, Hidden Depression}, an emotional intensity value, and a Unix timestamp; a second processing module, used to add the emotional state vector of each round of dialogue to the short-term emotional trajectory memory bank, obtain the emotional evolution trajectory based on the short-term emotional trajectory memory bank, and calculate the evolution feature parameters based on the emotional evolution trajectory; a third processing module, used to input the emotional intensity sequence in the emotional evolution trajectory into the pre-trained evolution trend prediction model, output the predicted emotional intensity value for the next round, calculate the intervention urgency by combining the average rate of change, emotional accumulation duration, and emotional acceleration in the evolution feature parameters, and generate an active intervention information vector based on the intervention urgency; a fourth processing module, used to input the emotional state vector of the current round of dialogue, the evolution feature parameters, the intervention urgency, and the intervention level in the active intervention information vector into the strategy selection function to obtain the strategy weights, and to select, from the preset empathy strategy library, the strategy with the highest weight or several empathy strategies ranked by weight from high to low; and a fifth processing module, used to encode the selected empathy strategies and strategy weights into a structured prompt, which, along with the dialogue history context, the emotional evolution trajectory summary generated based on the emotional evolution trajectory, and the active intervention information vector, is input into the large language model to generate an empathic response.