Emotion action recognition method and system based on electronic skin
By extracting multi-dimensional features from electronic skin pressure distribution data and combining classification models with external state information for verification, the accuracy and response speed issues of emotion and action recognition in existing technologies have been resolved. This has enabled high-precision, high-real-time emotion and action recognition, which is suitable for emotional interaction applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TUJIAN TECH (BEIJING) CO LTD
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies cannot accurately identify the emotional differences behind touch actions in haptic interaction, and their slow response speed affects user experience and the timeliness of problem solving.
By acquiring pressure distribution data of electronic skin when touched, multi-dimensional features such as spatiotemporal trajectory, force change, frequency and duration are extracted and fused. A preset classification model is used to classify emotional actions, and external state information is combined to verify confidence and output highly reliable emotional action categories.
It achieves high-precision and real-time emotion and action recognition, accurately captures emotional information, meets the instant response requirements of emotional interaction applications, and improves the accuracy and reliability of recognition.
Figure CN122244516A_ABST
Description
Technical Field
[0001] This invention relates to the field of robot emotional interaction technology, and more specifically, to an emotional action recognition method and system based on electronic skin.
Background Technology
[0002] Emotion and action recognition is a core technology for robots to achieve human-like and intelligent interaction. It enables robots to better understand the emotional information conveyed by human actions, thus achieving more natural and efficient human-computer interaction. In today's technological development, accurate emotion and action recognition capabilities are crucial for improving the practicality of robots and the user experience.
[0003] While artificial intelligence has made significant progress in many fields, it still has limitations. When handling tactile interaction, traditional technologies stop at the physical recognition of touch actions, such as simple pressing and swiping. This superficial recognition ignores the emotional dimension of tactile interaction: in real life, the same pressing action could be a loving comfort or an outburst of anger, but traditional technologies cannot discern the underlying emotional difference. Existing technologies also respond with significant latency. In scenarios requiring immediate feedback, such as real-time emotional interaction and emergency handling, this latency severely impacts user experience and the timeliness of problem-solving.
Summary of the Invention
[0004] In view of this, the present invention provides an emotion and action recognition method based on electronic skin, comprising:
acquiring pressure distribution data of the electronic skin when it is touched by a target object;
extracting and fusing the spatiotemporal trajectory features, force variation features, frequency features, and duration features of the pressure distribution data to obtain a multimodal emotion feature vector;
classifying the multimodal emotion feature vector with a preset classification model to obtain an initial emotional action category;
verifying the initial emotional action category based on the multimodal emotion feature vector, the initial emotional action category, and external state information of the target object to obtain a confidence level for the emotional action category, wherein the external state information of the target object characterizes the external conditions that affect how emotional actions are performed; and
comparing the confidence level of the emotional action category with a confidence threshold, and taking the initial emotional action category whose confidence level is higher than the threshold as the true emotional action category of the target object.
[0005] Optionally, extracting the spatiotemporal trajectory features from the pressure distribution data includes: selecting, according to a preset window length, frame sequence pressure data of the corresponding length from the pressure distribution data; calculating the centroid position of each frame from the frame sequence pressure data to obtain the centroid positions of several frames; and calculating the average and maximum moving speeds of the centroid trajectory from those centroid positions to obtain the spatiotemporal trajectory feature vector.
[0006] Optionally, extracting the force variation features from the pressure distribution data includes: calculating total pressure data from the frame sequence pressure data and determining the total force peak from the total pressure data; calculating the force standard deviation and the force change rate from the frame sequence pressure data; and determining the force variation feature vector from the total force peak, the force standard deviation, and the force change rate.
[0007] Optionally, extracting the frequency features from the pressure distribution data includes: applying a Fourier transform to the frame sequence pressure data and calculating the low-frequency energy and high-frequency energy to obtain the frequency feature vector.
[0008] Optionally, extracting the duration features from the pressure distribution data includes: calculating the contact duration of the target object from the time information of the pressure distribution data to obtain a duration feature vector.
[0009] Optionally, classifying the multimodal emotion feature vector with the preset classification model to obtain the initial emotional action category includes: using the preset classification model to calculate the emotional intensity of the multimodal emotion feature vector to obtain an emotional intensity value; and using the preset classification model to compare the emotional intensity value against an emotion threshold to obtain the initial emotional action category.
[0010] Optionally, the emotion threshold is determined by the target object type.
[0011] Optionally, the emotion and action recognition method based on electronic skin provided by the present invention further includes preprocessing the pressure distribution data: calibrating the pressure distribution data against a dynamic pressure baseline value to obtain calibrated pressure distribution data; retaining, from the calibrated pressure distribution data, the data that exceed the environmental noise threshold to obtain effective pressure distribution data; and smoothing the effective pressure distribution data to obtain smoothed pressure distribution data.
[0012] Optionally, the external state information of the target object includes the contact medium between the target object and the electronic skin, and the environmental noise threshold is determined by the contact medium between the target object and the electronic skin.
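The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes calibration is a per-cell baseline subtraction, noise suppression zeroes readings at or below the threshold, and smoothing is a 3×3 spatial mean filter. The function name `preprocess`, the dictionary `NOISE_THRESHOLD_BY_MEDIUM`, and all numeric values are hypothetical.

```python
import numpy as np

# Hypothetical noise thresholds per contact medium (illustrative values only).
NOISE_THRESHOLD_BY_MEDIUM = {"bare_hand": 0.05, "glove": 0.10}


def preprocess(frames, baseline, noise_threshold):
    """Calibrate, denoise, and smooth a pressure sequence.

    frames: array of shape (T, H, W) with raw pressure readings.
    baseline: array of shape (H, W), the dynamic pressure baseline.
    """
    frames = np.asarray(frames, dtype=float)
    # 1. Baseline calibration (assumed: subtract baseline, clip negatives).
    calibrated = np.clip(frames - baseline, 0.0, None)
    # 2. Noise suppression: keep only readings above the threshold.
    effective = np.where(calibrated > noise_threshold, calibrated, 0.0)
    # 3. Spatial smoothing (assumed: 3x3 mean filter with edge padding).
    padded = np.pad(effective, ((0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = effective.shape[1:]
    smoothed = sum(
        padded[:, dy:dy + h, dx:dx + w]
        for dy in range(3)
        for dx in range(3)
    ) / 9.0
    return smoothed
```

A caller would look up the threshold from the contact medium reported by the environmental sensor, for example `preprocess(frames, baseline, NOISE_THRESHOLD_BY_MEDIUM["glove"])`.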
[0013] A second aspect of the present invention provides an emotion and action recognition system based on electronic skin, the system comprising: a pressure sensor array, an environmental sensor, a data acquisition module, and a processor; wherein, the pressure sensor array is used to acquire pressure distribution data of the electronic skin when it is touched by a target object, the environmental sensor is used to acquire external state information of the target object, the data acquisition module is used to acquire pressure distribution data, and the processor executes the above-described emotion and action recognition method based on electronic skin.
[0014] The invention acquires pressure distribution data of the electronic skin when it is touched by a target object, without relying on vision or complex physiological sensors. It extracts and fuses multi-dimensional features from that data (spatiotemporal trajectory, force variation, frequency, and duration) into a multimodal emotion feature vector, constructing a direct mapping from physical features to emotional states that reflects the emotional information carried by the touch. A preset classification model then classifies the vector into an initial emotional action category. To assess the reliability of that result, confidence verification combines the multimodal emotion feature vector, the initial emotional action category, and the external state information of the target object, and outputs a confidence level for the category. The confidence level is compared with a threshold, and only an initial category whose confidence level exceeds the threshold is taken as the true emotional action category, ensuring the reliability of the output. The process uses lightweight models, such as a CNN-LSTM-style classifier and XGBoost, with which sub-millisecond emotion recognition response and confidence verification can be achieved, meeting the instant-response needs of scenarios such as emotional companion robots. Overall, the process achieves high-precision, real-time emotion recognition and provides timely support for emotional interaction applications.
[0015] Through an individual baseline calibration step, the invention removes the inherent force differences between target objects from the pressure distribution data, so that data from different target objects can be analyzed under a unified standard and compared directly. The dynamic noise suppression step removes environmental noise interference, improving data quality and preventing noise from affecting subsequent analysis. The spatial smoothing step preserves the trajectory continuity of emotional actions, so that the data better match how emotional actions actually change. Together, these preprocessing steps provide a reliable, stable data foundation for subsequent electronic-skin-based emotional action recognition, improve its accuracy and reliability, and allow recognition to be personalized to the user.
Attached Figure Description
[0016] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0017] Figure 1 is a schematic diagram of the structure of an emotion and action recognition system based on electronic skin according to an embodiment of the present invention; Figure 2 is a flowchart illustrating an emotion and action recognition method based on electronic skin according to an embodiment of the present invention.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0019] As shown in Figure 1, this embodiment of the invention provides an emotion and action recognition system based on electronic skin, including: a pressure sensor array 11, an environmental sensor 12, a data acquisition module 13, and a processor 14; wherein the pressure sensor array 11 is used to acquire pressure array data of the electronic skin when it is touched by a target object, the environmental sensor 12 is used to acquire external state information of the target object, the data acquisition module 13 is used to acquire pressure distribution data, and the processor 14 is used to execute the emotion and action recognition method based on electronic skin shown in Figure 2 and described below.
[0020] For example, the pressure sensor array 11 employs a 64×64-channel piezoresistive sensor array connected to the data acquisition module 13 via a high-speed SPI (Serial Peripheral Interface) bus and is set to a high sampling rate (e.g., 200 Hz). This allows denser and more precise acquisition of pressure data, supporting data integrity and accuracy. The processor 14 uses an NVIDIA Jetson Orin Nano (8 GB memory) with TensorRT quantization acceleration enabled to speed up model inference, reducing data processing time so that the system can respond and output results more quickly.
[0021] As shown in Figure 2, this embodiment of the invention provides an emotion and action recognition method based on electronic skin, specifically including: S1, acquire pressure distribution data of the electronic skin when it is touched by a target object.
[0022] The electronic skin in this embodiment is a robotic component that integrates sensors and other functions, enabling it to collect pressure array data and environmental data from multiple channels when touched by a target object. The pressure distribution data acquired using the pressure sensor array 11 is primarily used for in-depth analysis of the emotional information contained within the target object (which could be an adult or a child) during the contact with the electronic skin, in order to respond promptly and provide appropriate feedback based on this emotional information.
[0023] S2, extract and fuse the spatiotemporal trajectory features, force variation features, frequency features, and duration features of the pressure distribution data to obtain a multimodal emotion feature vector.
[0024] This embodiment extracts features from pressure distribution data across multiple dimensions, including spatiotemporal trajectory features reflecting the intent of emotional actions, intensity variation features reflecting the strength of emotions, frequency features reflecting the rhythm of emotions, and duration features reflecting the focus of emotions. After extracting each feature, these different types of features are fused to generate a multimodal emotional feature vector, such as a multimodal feature vector containing at least 12 dimensions. This vector integrates information from multiple aspects, helping to more comprehensively and accurately analyze the emotional information contained in the target object's contact with the electronic skin.
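As a rough illustration of the fusion step, the sketch below treats fusion as simple concatenation of the four feature groups, which is an assumption since the text does not specify the fusion operator. The function name `fuse_features` and the sample values are hypothetical. The four groups named in the text contribute eight values here; reaching the "at least 12 dimensions" mentioned above would require further features appended in the same way.

```python
def fuse_features(trajectory, force, frequency, duration):
    """Concatenate the per-group feature lists into one flat vector."""
    vector = []
    for group in (trajectory, force, frequency, duration):
        vector.extend(float(v) for v in group)
    return vector


# Illustrative values only; units and scales are assumptions.
vec = fuse_features(
    trajectory=[12.0, 30.5],     # average speed, maximum speed
    force=[850.0, 42.0, 310.0],  # total force peak, std, change rate
    frequency=[0.8, 0.2],        # low-band energy, high-band energy
    duration=[0.45],             # contact duration in seconds
)
```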
[0025] S3. Use a pre-defined classification model to classify the multimodal emotional feature vectors into emotional actions to obtain the initial emotional action categories.
[0026] The preset classification model in this embodiment can adopt a lightweight sequence modeling classifier structure similar to CNN-LSTM (Convolutional Neural Network; Long Short-Term Memory). The model's function is to classify emotional actions from multimodal emotional feature vectors, mapping the input feature vectors to predefined emotional action categories and outputting initial emotional action categories to represent the emotional state of the target object. These categories can include feelings such as pleasure, calmness, tension, and anger.
[0027] S4. Based on the multimodal emotion feature vector, the initial emotion action category, and the external state information of the target object, the confidence of the initial emotion action category is verified to obtain the confidence of the emotion action category. The external state information of the target object is used to characterize the external conditions that affect the performance of the emotion action.
[0028] In this embodiment, the confidence verification can be performed using a pre-trained XGBoost (eXtreme Gradient Boosting) model to verify the reliability of the initial emotional action category output by the preset classification model.
[0029] Specifically, the confidence verification is based on the multimodal emotion feature vector extracted in step S2 and the initial emotion action category output by the preset classification model. The multimodal emotion feature vector is obtained by deep mining and fusion of pressure distribution data when the electronic skin is touched, covering information such as spatiotemporal trajectory, force change, frequency, and duration, reflecting the target object's touch behavior characteristics from multiple dimensions, laying the foundation for emotion analysis. The initial emotion action category is a preliminary judgment made by the classification model based on these feature vectors, using learned classification rules and patterns. In addition, the external state information of the target object obtained by the environmental sensor 12 is also used for verification, which is used to characterize the external conditions that affect the emotional action performance, such as whether the target object is wearing gloves when touched. These external conditions will significantly affect the target object's emotional action performance, and thus affect the accuracy of the pressure distribution data and the classification model. For example, wearing gloves will change the pressure transmission. The XGBoost model comprehensively evaluates the matching degree between the initial emotion action category and the actual situation based on the above data, and outputs a confidence score of 0-1. The closer the score is to 1, the more reliable the result; the closer it is to 0, the lower the reliability.
[0030] S5, compare the confidence level of the emotional action category with the confidence threshold of the emotional action category, and take the initial emotional action category whose confidence level is higher than that threshold as the true emotional action category of the target object.
[0031] The decision to output a final recognition result is based on the confidence score. If the confidence score is higher than the emotional action category confidence threshold (e.g., 0.75), a result containing the emotional action category, the emotional intensity value, and the confidence score is output; that is, the initial emotional action category is regarded as the true emotional action category of the target object, providing reliable information for subsequent interactions and decisions. If the score is not higher than the threshold, data may need to be collected again or the evaluation repeated.
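The gating logic of steps S4 and S5 can be sketched as below. The verifier is passed in as a plain callable so the sketch stays independent of any particular library; in the embodiment this role is played by a pre-trained XGBoost model. The names `Recognition`, `gate`, and `toy_verifier` are hypothetical, and `toy_verifier` is a stand-in that returns fixed scores, not a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass
class Recognition:
    category: str
    intensity: float
    confidence: float


def gate(
    category: str,
    intensity: float,
    features: Sequence[float],
    external_state: Mapping[str, object],
    verifier: Callable[[Sequence[float], str, Mapping[str, object]], float],
    threshold: float = 0.75,  # example threshold from the text
) -> Optional[Recognition]:
    """Return the result only if its confidence exceeds the threshold."""
    confidence = verifier(features, category, external_state)
    if confidence > threshold:
        return Recognition(category, intensity, confidence)
    return None  # caller may re-acquire data or repeat the evaluation


def toy_verifier(features, category, external_state):
    """Stand-in scorer: lower confidence when a glove is reported."""
    return 0.6 if external_state.get("glove") else 0.9
```

Returning `None` rather than a low-confidence result leaves the retry policy to the caller, which matches the text's note that data may need to be collected again.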
[0032] This embodiment acquires pressure distribution data of the electronic skin when it is touched by a target object, without relying on vision or complex physiological sensors. It extracts and fuses multi-dimensional features from that data (spatiotemporal trajectory, force variation, frequency, and duration) into a multimodal emotion feature vector, constructing a direct mapping from physical features to emotional states that reflects the emotional information carried by the touch. A preset classification model then classifies the vector into an initial emotional action category. To assess the reliability of that result, confidence verification combines the multimodal emotion feature vector, the initial emotional action category, and the external state information of the target object, and outputs a confidence level for the category. The confidence level is compared with a threshold, and only an initial category whose confidence level exceeds the threshold is taken as the true emotional action category, ensuring the reliability of the output. The process uses lightweight models, such as a CNN-LSTM-style classifier and XGBoost, with which sub-millisecond emotion recognition response and confidence verification can be achieved, meeting the immediate-response needs of scenarios such as emotional companion robots. Overall, the process achieves high-precision, real-time emotion recognition and provides timely support for emotional interaction applications.
[0033] The above embodiments of emotional action recognition can be applied to various scenarios. For example, in child companion robots, the system can be integrated into the robot's back or abdomen and can identify the child's emotional state based on different touch actions: when the child user "gently strokes" the robot (identified as "pleasure", intensity > 80), the robot will trigger positive feedback, such as emitting a "happy" laugh and playing upbeat music; when the child user performs "steady, long-term pressing" (identified as "calm", intensity > 60), the robot judges that the child may be in a relaxed or seeking comfort state and will play a soothing lullaby; when the child user performs "rapid, large-amplitude patting" (identified as "tense" or "angry"), the system can capture this negative emotion in real time, and the robot will immediately stop the current activity and actively emit soothing language, such as "Baby, are you unhappy? Let me tell you a story."
[0034] For example, in mental health monitoring devices such as smart cushions or wearable wristbands, the system can non-invasively record the unconscious touching behavior of the target subject over a long period, analyze the trend of changes in emotional patterns, issue warnings for an increase in the frequency of "tension"-type actions, and assess the target subject's positive emotional level by the duration and frequency of "pleasure"-type actions. For instance, if the system detects a significant recent increase in the frequency of "tension"-type actions (large fluctuations in intensity and high frequency) in the target subject, it can issue a warning to the target subject or the psychologist, indicating that they may be facing significant stress or an increased level of anxiety; by analyzing the duration and frequency of "pleasure"-type actions (slow and gentle), it can serve as an objective indicator for assessing the target subject's relaxation and positive emotional level.
[0035] In some optional embodiments of this example, step S2, which involves extracting spatiotemporal trajectory features from the pressure distribution data, includes: S211, Select the frame sequence pressure data of the corresponding length in the pressure distribution data according to the preset window length.
[0036] In this embodiment, the frame sequence length of the feature extraction window can be set to 15 frames, corresponding to a 75ms time window, in order to balance the integrity of emotional features with system latency, ensuring that sufficient emotional information can be obtained while avoiding excessive processing time that would slow down the system response.
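The window arithmetic and frame selection can be sketched as follows. The 200 Hz rate and 15-frame window come from the text; treating the window as a sliding buffer over the incoming frame stream is an assumption, and the names `window_duration_ms` and `FrameWindow` are hypothetical.

```python
from collections import deque

SAMPLE_RATE_HZ = 200  # example sampling rate from the hardware description
WINDOW_FRAMES = 15    # window length from this embodiment


def window_duration_ms(frames=WINDOW_FRAMES, rate_hz=SAMPLE_RATE_HZ):
    """Time span covered by the window, in milliseconds."""
    return 1000.0 * frames / rate_hz


class FrameWindow:
    """Keeps the most recent `size` frames (assumed sliding window)."""

    def __init__(self, size=WINDOW_FRAMES):
        self._buf = deque(maxlen=size)

    def push(self, frame):
        """Add a frame; return the full window once enough frames exist."""
        self._buf.append(frame)
        if len(self._buf) == self._buf.maxlen:
            return list(self._buf)
        return None
```

With the default values, `window_duration_ms()` gives 75.0, matching the 75 ms window stated above.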
[0037] S212, calculate the corresponding centroid position based on the frame sequence pressure data to obtain the centroid positions of several frames.
[0038] The centroid position is calculated for the pressure data of the 64×64 layout in each frame. That is, the distribution of pressure data of each 64×64 array is analyzed to determine the location of its centroid, and finally the centroid positions of each frame are obtained.
[0039] S213, calculate the average and maximum moving speed of the centroid trajectory based on the centroid positions of several frames, so as to obtain the spatiotemporal trajectory feature vector.
[0040] This embodiment performs a series of calculations on the centroid trajectory based on the obtained centroid positions of several frames. The average movement speed reflects the overall speed of the centroid's movement over a period of time. For example, gentle emotional actions are often accompanied by a slower average movement speed because the movements are more relaxed; while excited emotional actions may correspond to a faster average movement speed, indicating more abrupt movements. The maximum movement speed of the centroid trajectory reflects the fastest speed at which the centroid moves at a certain moment. For example, in an excited state, there may be a sudden, large, and rapid movement, resulting in a larger maximum movement speed of the centroid; while in a calm or gentle state, the maximum movement speed is relatively smaller. In addition to the average and maximum movement speeds, the rate of change of the centroid's movement direction or its acceleration can also be calculated. The rate of change of movement direction reflects the frequency and magnitude of changes in the centroid's movement direction, while acceleration reflects the speed of change in the centroid's velocity.
[0041] By selecting frame sequence pressure data of a preset window length, this embodiment balances the integrity of emotional features against system latency, so that emotional information is acquired both promptly and fully. Calculating the centroid position of each frame, together with the average and maximum moving speeds of the centroid trajectory, reflects the intent of an emotional action from several perspectives. These features provide a solid foundation for subsequent emotion analysis and help improve the accuracy and reliability of emotional action recognition.
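Steps S212 and S213 can be sketched with NumPy as below. The centroid is taken as the pressure-weighted mean of cell coordinates, and speed is the distance between consecutive centroids multiplied by the frame rate, expressed in sensor cells per second. These definitions, the handling of empty frames, and the function names are assumptions of this sketch.

```python
import numpy as np


def centroid_trajectory(frames):
    """Pressure-weighted centroid (row, col) of each frame.

    frames: array of shape (T, H, W). Frames with zero total
    pressure yield NaN coordinates.
    """
    frames = np.asarray(frames, dtype=float)
    _, h, w = frames.shape
    rows, cols = np.mgrid[0:h, 0:w]
    total = frames.sum(axis=(1, 2))
    with np.errstate(invalid="ignore", divide="ignore"):
        cy = (frames * rows).sum(axis=(1, 2)) / total
        cx = (frames * cols).sum(axis=(1, 2)) / total
    return np.stack([cy, cx], axis=1)


def trajectory_features(frames, rate_hz=200.0):
    """Return [average speed, maximum speed] in cells per second."""
    centroids = centroid_trajectory(frames)
    steps = np.linalg.norm(np.diff(centroids, axis=0), axis=1)
    speeds = steps * rate_hz
    speeds = speeds[~np.isnan(speeds)]  # skip steps touching empty frames
    if speeds.size == 0:
        return np.array([0.0, 0.0])
    return np.array([speeds.mean(), speeds.max()])
```

The same centroid sequence could also feed the optional direction-change and acceleration features mentioned in the text.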
[0042] In some optional embodiments of this example, step S2 involves extracting force variation features from the pressure distribution data, including: S221, calculate the total pressure data based on the frame sequence pressure data, and determine the total force peak based on the total pressure data.
[0043] This step first calculates the total pressure data from the frame sequence pressure data by summing the pressure values of the 64×64 array in each frame. The maximum of this total pressure data is then taken as the total force peak. The peak reflects the force applied when the emotion is at its strongest; for example, the total force peak is usually higher during angry actions.
[0044] S222, calculate the standard deviation of force and the rate of change of force based on the pressure data of the frame sequence.
[0045] Two calculations are performed on the frame sequence pressure data. First, the force standard deviation is calculated, reflecting the dispersion of the force data around its average value; for example, force fluctuates during tense actions, resulting in a larger standard deviation. Second, the force change rate is calculated, indicating how quickly the force changes over time; for instance, force rises more rapidly in moments of surprise, resulting in a higher change rate. Other quantities can also be calculated, such as the rise or fall time of the force or the average slope of the force change. Together these describe the speed and magnitude of force changes under different emotional states, from both a temporal and an overall-magnitude perspective, allowing emotions to be identified more accurately.
[0046] S223, determine the force change feature vector based on the total force peak value, force standard deviation and force change rate.
[0047] The total force peak obtained in step S221 and the force standard deviation and force change rate calculated in step S222 are combined to form the force variation feature vector, which comprehensively reflects the strength of the emotion.
[0048] This embodiment reflects the intensity of emotion from different perspectives by calculating the total peak force, standard deviation of force, and rate of change of force in the pressure distribution data. The total peak force highlights the force expression when the emotion is strong, the standard deviation of force reflects the fluctuation of force, and the rate of change of force shows the trend of force change over time. Combining these indicators into a force change feature vector provides rich and comprehensive feature information for emotion analysis, which helps to more accurately identify different emotional states and improve the performance and accuracy of the system in application scenarios such as emotion action recognition and emotion interaction.
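Steps S221 to S223 can be sketched as follows. The text does not say whether the standard deviation and change rate are taken per cell or on the per-frame total; this sketch assumes the per-frame total force series, and defines the change rate as the mean absolute frame-to-frame difference scaled to units per second. The function name `force_features` is hypothetical.

```python
import numpy as np


def force_features(frames, rate_hz=200.0):
    """Return [total force peak, force std, force change rate].

    frames: array of shape (T, H, W) with pressure readings.
    """
    frames = np.asarray(frames, dtype=float)
    total = frames.sum(axis=(1, 2))  # total force per frame
    peak = total.max()
    std = total.std()                # population standard deviation
    if total.size > 1:
        # Mean absolute change between consecutive frames, per second.
        rate = np.abs(np.diff(total)).mean() * rate_hz
    else:
        rate = 0.0
    return np.array([peak, std, rate])
```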
[0049] In some optional embodiments of this example, step S2, which involves extracting frequency features from the pressure distribution data, includes: S231, perform a Fourier transform on the frame sequence pressure data and calculate the low-frequency energy and high-frequency energy to obtain the frequency feature vector.
[0050] This embodiment performs a Fourier transform on the frame sequence pressure data, converting it from the time domain to the frequency domain and decomposing it into components of different frequencies. Low-frequency energy (e.g., 0-5 Hz, corresponding to slow actions such as soothing) and high-frequency energy (e.g., 5-20 Hz, corresponding to rapid actions such as excitement) are then calculated separately. Combining these energy values yields the frequency feature vector, which reflects how the pressure data are distributed across frequencies. Because the energy in different frequency ranges corresponds to different emotional states and behavioral characteristics, this feature helps identify emotional states more accurately and improves the system's performance in applications such as emotion recognition and emotional interaction.
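Step S231 can be sketched with NumPy's real FFT as below. The sketch assumes the transform is applied to the per-frame total force series, removes the mean before transforming, and uses the 0-5 Hz and 5-20 Hz example bands from the text. The function name `frequency_features` and its parameters are hypothetical.

```python
import numpy as np


def frequency_features(total_force, rate_hz=200.0, split_hz=5.0, max_hz=20.0):
    """Return [low-band energy, high-band energy] of a force series."""
    x = np.asarray(total_force, dtype=float)
    x = x - x.mean()  # drop the constant offset (assumption)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / rate_hz)
    low = spectrum[(freqs > 0) & (freqs < split_hz)].sum()
    high = spectrum[(freqs >= split_hz) & (freqs <= max_hz)].sum()
    return np.array([low, high])


# Usage: one second of a slow 2 Hz press rhythm sampled at 200 Hz.
t = np.arange(200) / 200.0
low_energy, high_energy = frequency_features(np.sin(2 * np.pi * 2.0 * t))
```

The spacing between FFT bins is `rate_hz / N`, so a 15-frame window at 200 Hz gives bins roughly 13.3 Hz apart and cannot separate the two bands. The usage example therefore uses a 1 s series, which is an assumption of this sketch rather than something the text specifies.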
[0051] In some optional embodiments of this example, step S2 involves extracting duration features from the pressure distribution data, including: S241, calculate the duration of the target object's contact based on the time of the pressure distribution data to obtain the duration feature vector.
[0052] This embodiment measures the whole touch, from the moment the target object first applies pressure until the pressure disappears at the end of the touch. This interval describes the target object's tactile behavior over time, and its duration reflects the level of emotional focus. Because contact duration often differs with emotional state (for example, intimate physical contact lasts longer), this feature provides time-based evidence for emotion analysis and helps distinguish different emotional states more accurately.
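Step S241 can be sketched as below, assuming each frame carries a timestamp in seconds and that contact is present whenever the total force exceeds a small threshold. The function name `contact_duration` and the `contact_threshold` parameter are hypothetical.

```python
def contact_duration(timestamps, total_force, contact_threshold=0.0):
    """Seconds between the first and last sample with force present."""
    active = [
        t for t, f in zip(timestamps, total_force)
        if f > contact_threshold
    ]
    if not active:
        return 0.0
    return active[-1] - active[0]
```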
[0053] In some optional implementations of this embodiment, step S3 uses a preset classification model to classify the multimodal emotion feature vectors into emotion-action categories to obtain initial emotion-action categories, including: S31, use a preset classification model to calculate the emotional intensity of the multimodal emotional feature vectors to obtain the emotional intensity value.
[0054] This embodiment uses a classification model with a lightweight CNN-LSTM sequence-modeling classifier structure to calculate the emotional intensity of the multimodal emotional feature vectors. These vectors contain emotional information in multiple dimensions. The model processes them and outputs an emotional intensity value, for example in the range of 0-100, which quantifies the strength of the emotion.
[0055] Typically, a CNN-LSTM can be composed of a data input module, a CNN module, an LSTM module, and an output module. S32, use the preset classification model to compare the emotional intensity value against the emotional threshold to obtain the initial emotion-action category.
[0056] The classification model then compares the calculated emotional intensity value with a set of preset emotional threshold intervals. Based on the comparison results, the initial emotional action category is determined, such as "pleasure," "calm," "tension," or "anger." The emotional thresholds are determined by the target object type; that is, the threshold intervals can be dynamically adjusted according to the type of target object (such as adults or children). For example, the threshold interval for children can be multiplied by a correction coefficient less than 1 to adapt to the characteristics of different user groups.
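The threshold comparison of step S32 can be sketched as a lookup over interval boundaries. The text does not specify the boundary values, the order in which the categories map to intensity, or the correction coefficient. The values 25/50/75, the ordering below, and the coefficient 0.8 are therefore purely hypothetical, as is the function name `classify_intensity`.

```python
from bisect import bisect_right

# Hypothetical ordering of the categories named in the text, from low to high intensity.
CATEGORIES = ["calm", "pleasure", "tension", "anger"]
# Hypothetical interval boundaries on the 0-100 intensity scale.
BASE_BOUNDARIES = [25.0, 50.0, 75.0]
# Hypothetical correction coefficients; the text only requires the child value to be < 1.
CORRECTION = {"adult": 1.0, "child": 0.8}


def classify_intensity(intensity, target_type="adult"):
    """Map an intensity value to a category using type-adjusted threshold intervals."""
    coeff = CORRECTION[target_type]
    boundaries = [b * coeff for b in BASE_BOUNDARIES]
    # bisect_right returns how many boundaries are <= intensity,
    # which is the index of the interval the value falls into.
    return CATEGORIES[bisect_right(boundaries, intensity)]


adult_label = classify_intensity(45.0, "adult")
child_label = classify_intensity(45.0, "child")
```

With these assumed values, the same intensity of 45 falls in the second interval for an adult but in the third for a child, because the child boundaries shrink to 20/40/60.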
[0057] In terms of emotion intensity calculation, this embodiment can quantify complex multimodal emotion features into specific intensity values, providing a clear numerical basis for the analysis and comparison of emotional actions. The emotion action category determination step can not only accurately classify emotion categories based on intensity values, but also dynamically adjust the threshold range according to the target object type, making emotion classification more closely aligned with the actual situation of different target objects. When using a preset classification model for emotion action classification, it can initially determine the emotion action category, achieving efficient and accurate emotion action category recognition and intensity quantification. This helps to better understand and respond to human emotions in fields such as emotion analysis and human-computer interaction, improving the system's intelligence and personalization level.
[0058] The models used in steps S31 and S32 need to be prepared through dataset construction and model training. In dataset construction, people apply emotional actions on the device according to predetermined categories and intensities, and these actions are saved along with the predetermined categories and intensities. This method is repeated multiple times (typically 10-1000 times) by a large number of people (typically 10-1000 people). In model training, the dataset is reasonably divided, and a portion is selected as the training set. The model is trained using the training set data (for example, the training of the CNN-LSTM in step S31 is accomplished by dividing the data into batches and performing backpropagation on each batch to update the model parameters; for example, the classification model in step S32 is obtained by optimizing the boundary point with classification error as the optimization objective).
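One possible reading of "optimizing the boundary point with classification error as the optimization objective" is an exhaustive search over candidate cut points on labelled intensity values. The sketch below handles a single boundary between two classes. The function name `fit_boundary`, the 0/1 label encoding, and the use of midpoints as candidates are assumptions and are not taken from the original text.

```python
def fit_boundary(intensities, labels):
    """Pick the cut between two classes (labels 0 and 1) with the fewest training errors.

    Samples with intensity >= cut are predicted as class 1.
    """
    pairs = sorted(zip(intensities, labels))
    values = [v for v, _ in pairs]
    # Candidate cuts: midpoints between neighbouring distinct intensity values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]
    if not candidates:
        raise ValueError("need at least two distinct intensity values")

    def errors(cut):
        return sum((v >= cut) != bool(label) for v, label in pairs)

    return min(candidates, key=errors)


# Recorded intensities for a low-intensity class (0) and a high-intensity class (1).
cut = fit_boundary([10, 20, 30, 60, 70, 80], [0, 0, 0, 1, 1, 1])
```

A multi-category model would repeat this search for each adjacent pair of categories, or search all boundaries jointly. The text leaves this choice open.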
[0059] In some optional embodiments of this invention, the emotion and action recognition method based on electronic skin provided by this embodiment further includes preprocessing the pressure distribution data, specifically including: Step 1: Calibrate the pressure distribution data based on the dynamic pressure baseline value to obtain calibrated pressure distribution data.
[0060] To eliminate interference from individual differences and environmental noise, this embodiment performs adaptive preprocessing on the raw data. First, after the pressure distribution data is acquired, a dynamic baseline value related to the target object's historical data is subtracted from it. This baseline value is calculated dynamically, for example as the average force value of the target object's initial movement data (e.g., the first 500 frames, 2.5 seconds). This eliminates inherent force differences between different target objects (e.g., adults and children) and yields calibrated pressure distribution data that is more comparable and accurate for subsequent processing. Step 2: Filter the pressure distribution data that is higher than the environmental noise threshold from the calibrated pressure distribution data to obtain the effective pressure distribution data.
[0061] Then, the calibrated pressure distribution data is compared with an environmental noise threshold, which is determined based on the external state information of the target object acquired by the environmental sensor, particularly depending on the contact medium between the target object and the electronic skin, thus achieving adaptive adjustment. When the target object is in contact with the electronic skin with bare skin, the environmental noise threshold is set to 500 kPa; if it is in contact with ordinary cotton clothing, the threshold is set to 1000 kPa. Through this comparison process, noise signals below the threshold are effectively filtered out, while pressure distribution data above the threshold are retained, thus obtaining effective pressure distribution data and significantly improving the purity of the data.
[0062] Step 3: Smooth the effective pressure distribution data to obtain smoothed pressure distribution data.
[0063] Finally, Gaussian filtering (e.g., with the standard deviation set to 0.5) is applied to the effective pressure distribution data. The purpose is to preserve the continuity of the emotional action trajectory so that the data better reflects changes in emotional actions. The resulting smoothed pressure distribution data is more stable and continuous, which facilitates subsequent analysis.
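The three preprocessing steps (baseline calibration, noise thresholding, and Gaussian smoothing) can be sketched together as follows. This is an illustrative sketch under several assumptions. Frames are small 2-D grids of pressure values, the baseline is a single scalar average over the initial frames, values at or below the noise threshold are set to zero, and the Gaussian kernel is truncated at a radius of 2 cells with edge values repeated. All function names are hypothetical. In practice a library routine such as `scipy.ndimage.gaussian_filter` would likely replace the hand-written smoothing.

```python
import math


def gaussian_kernel(sigma=0.5, radius=2):
    """Normalized 1-D Gaussian weights; truncating at `radius` is an assumption."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Convolve a sequence with the kernel, repeating the edge values at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [sum(w * values[min(max(i + off, 0), last)]
                for off, w in zip(range(-radius, radius + 1), kernel))
            for i in range(len(values))]


def smooth_frame(frame, sigma=0.5):
    """Separable 2-D Gaussian: smooth every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = [smooth_1d(row, kernel) for row in frame]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def preprocess(frames, noise_threshold, baseline_frames=500):
    """Step 1: subtract baseline. Step 2: keep values above threshold. Step 3: smooth."""
    head = frames[:baseline_frames]
    cells = [v for frame in head for row in frame for v in row]
    baseline = sum(cells) / len(cells)  # scalar average of the initial frames
    result = []
    for frame in frames:
        calibrated = [[v - baseline for v in row] for row in frame]
        effective = [[v if v > noise_threshold else 0.0 for v in row]
                     for row in calibrated]
        result.append(smooth_frame(effective))
    return result


# Two idle frames followed by a touch at the centre cell (arbitrary units).
idle = [[1.0, 1.0, 1.0] for _ in range(3)]
touch = [[1.0, 1.0, 1.0], [1.0, 11.0, 1.0], [1.0, 1.0, 1.0]]
smoothed = preprocess([idle, idle, touch], noise_threshold=5.0, baseline_frames=2)
```

In this example the idle frames calibrate to zero and stay zero, while the touch frame keeps its centre peak and spreads a small share of it to neighbouring cells.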
[0064] This embodiment eliminates inherent force differences between different target objects in pressure distribution data through individual baseline calibration, enabling data from different target objects to be analyzed under a unified standard and enhancing data comparability. The dynamic noise suppression step effectively removes environmental noise interference, improves data quality, and avoids the impact of noise on subsequent analysis. The spatial smoothing step preserves the trajectory continuity characteristics of emotional actions, making the data more consistent with actual emotional action changes, providing a reliable, stable, and high-quality data foundation for subsequent electronic skin-based emotional action recognition. In summary, these preprocessing steps significantly improve the accuracy and reliability of emotional action recognition, providing users with personalized recognition services and more accurately identifying emotional actions.
[0065] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
[0066] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.
Claims
1. An emotion and action recognition method based on electronic skin, characterized in that, The method includes: Acquire pressure distribution data of electronic skin when it is touched by a target object; The spatiotemporal trajectory features, intensity change features, frequency features, and duration features of the pressure distribution data are extracted and fused to obtain a multimodal emotion feature vector; The multimodal emotion feature vector is classified into emotion and action categories using a preset classification model to obtain initial emotion and action categories; The confidence level of the initial emotional action category is verified based on the multimodal emotional feature vector, the initial emotional action category, and the external state information of the target object to obtain the confidence level of the emotional action category. The external state information of the target object is used to characterize the external conditions that affect the performance of emotional actions. The confidence level of the emotional action category is compared with the confidence threshold of the emotional action category, and the initial emotional action category that is higher than the confidence threshold of the emotional action category is taken as the true emotional action category of the target object.
2. The method according to claim 1, characterized in that, The spatiotemporal trajectory feature extraction of the pressure distribution data includes: The pressure data of the corresponding frame sequence length in the pressure distribution data is selected according to the preset window length; The centroid positions of several frames are obtained by calculating the corresponding centroid positions based on the frame sequence pressure data. The average and maximum moving speeds of the centroid trajectory are calculated based on the centroid positions of the aforementioned frames to obtain the spatiotemporal trajectory feature vector.
3. The method according to claim 2, characterized in that, Extracting force variation features from the pressure distribution data, including: Calculate the total pressure data based on the frame sequence pressure data, and determine the total force peak based on the total pressure data; Calculate the force standard deviation and force change rate based on the pressure data of the frame sequence; The force change feature vector is determined based on the total force peak value, the force standard deviation, and the force change rate.
4. The method according to claim 3, characterized in that, Frequency feature extraction is performed on the pressure distribution data, including: The frame sequence pressure data is subjected to Fourier transform, and the low-frequency energy and high-frequency energy are calculated to obtain the frequency feature vector.
5. The method according to claim 1, characterized in that, The duration feature is extracted from the pressure distribution data, including: The duration of contact with the target object is calculated based on the time of the pressure distribution data to obtain a duration feature vector.
6. The method according to claim 1, characterized in that, The step of classifying the multimodal emotional feature vectors into emotional action categories using a preset classification model to obtain initial emotional action categories includes: The pre-defined classification model is used to calculate the emotional intensity of the multimodal emotional feature vector to obtain the emotional intensity value; Using the preset classification model, the emotional intensity value is determined based on the emotional threshold to obtain the initial emotional action category.
7. The method according to claim 6, characterized in that, The emotional threshold is determined by the type of the target object.
8. The method according to claim 1, characterized in that, It also includes preprocessing the pressure distribution data: The pressure distribution data is calibrated based on the dynamic pressure baseline value to obtain calibrated pressure distribution data; Filter the pressure distribution data that are higher than the environmental noise threshold from the calibrated pressure distribution data to obtain the effective pressure distribution data; The effective pressure distribution data is smoothed to obtain smoothed pressure distribution data.
9. The method according to claim 8, characterized in that, The external state information of the target object includes the contact medium between the target object and the electronic skin, and the environmental noise threshold is determined by the contact medium between the target object and the electronic skin.
10. An emotion and action recognition system based on electronic skin, characterized in that, The system comprises a pressure sensor array, an environmental sensor, a data acquisition module, and a processor; wherein the pressure sensor array is used to acquire pressure distribution data of the electronic skin when it is touched by a target object, the environmental sensor is used to acquire external state information of the target object, the data acquisition module is used to acquire the pressure distribution data, and the processor is used to execute the emotion and action recognition method based on electronic skin as described in any one of claims 1-9.