A method, system, device, and medium for sensing the hand contact state of a humanoid robot based on multimodal tactile fusion.

By setting up multimodal tactile sensors on the hand of a humanoid robot, pressure and deformation signals are collected and fused to generate structured contact event results, solving the problem of incomplete contact state determination in existing technologies and realizing efficient perception and control linkage for complex grasping scenarios.

CN122299643A · Pending · Publication Date: 2026-06-30 · ANHUI BOYAN INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ANHUI BOYAN INFORMATION TECH CO LTD
Filing Date
2026-04-10
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing robot hand contact sensing solutions struggle to simultaneously address multi-area pressure distribution sensing, contact dynamic change capture, and continuous time-series structured output. This results in incomplete contact state determination in complex grasping scenarios, failing to generate structured results that can be directly invoked by the control system.

Method used

A multimodal tactile fusion method is adopted. By setting pressure tactile sensors and sliding tactile sensors on the dexterous hand of a humanoid robot, array pressure signals and deformation image sequences are collected and fused to generate multimodal tactile fusion features, which are used for contact state recognition and abnormal event determination, and output structured contact event results.

Benefits of technology

It improves the integrity and adaptability of contact perception in complex grasping scenarios, enhances the adaptability to complex objects and diverse surface materials, realizes continuous characterization of contact state and abnormal early warning, and improves the linkage capability of grasping early warning, grasping force correction and abnormal protection.


Abstract

This invention relates to the field of intelligent sensing and control technology, and discloses a method, system, device, and medium for sensing the contact state of a humanoid robot hand based on multimodal tactile fusion. The method includes: acquiring array pressure signals using pressure tactile sensors placed in the contact area of the humanoid robot's dexterous hand; acquiring a sequence of deformation images of the contact area using a sliding tactile sensor; fusing pressure branch features and slip branch features to obtain multimodal tactile fusion features and generating a single-moment recognition output result; merging the recognition output results from multiple consecutive moments to obtain continuous contact state segments; and using state parameter determination rules to determine abnormal contact events, generating a structured contact event output result, and sending it to the robot control system. This invention achieves continuous sensing, stable recognition, and structured output of the robot hand's contact state, improving the ability to recognize sliding trends and control linkage capabilities in complex operating scenarios.

Description

Technical Field

[0001] This invention relates to the field of intelligent sensing and control technology, and in particular to a method, system, device and medium for sensing the contact state of a humanoid robot's hand based on multimodal tactile fusion.

Background Technology

[0002] With the rapid development of humanoid robots and embodied intelligent systems, the application scenarios of robots in various tasks continue to expand. Compared to traditional industrial robotic arms, which primarily handle fixed gripping, repetitive trajectories, and regular workpieces, humanoid robot hands are designed for grasping tasks in more complex open environments. These tasks involve diverse objects, complex surface materials, and rapidly changing force states, and are prone to instability during grasping. Therefore, enabling robots to accurately perceive contact states, identify slippage trends, and promptly output structured state results that can be used by the control system during grasping has become a key issue in improving the reliability and safety of robot operation.

[0003] Current robotic grasping and perception solutions largely rely on visual information, joint current estimation, end effector force sensors, or single tactile sensors to determine the state. Among these, visual-based solutions are susceptible to occlusion, lighting variations, viewing angle limitations, and reflective surfaces. When the target object is obscured by a finger, the local contact area is invisible, or the object's surface texture is insufficient, the ability of visual information to represent the true contact state significantly decreases. While solutions based on joint current or overall end effector force feedback can reflect overall force changes to some extent, they typically struggle to accurately describe the local pressure distribution, contact migration path, and early signs of slippage in different contact areas of the robot's hand. Solutions based on single tactile signals often focus only on a single type of contact information, such as sensing only normal pressure or detecting only contact / non-contact states. They struggle to simultaneously consider contact intensity, contact range, contact dynamics, and slippage trend recognition in complex contact scenarios, resulting in limited adaptability to complex objects.

[0004] Furthermore, target objects with different shapes, materials, and surface friction properties exhibit significantly different tactile response patterns on the robot hand. Classification using only static single-frame tactile images is insufficient to describe the continuous evolution of contact states, failing to generate structured temporal results directly relevant to control, such as "stable contact → transitional contact → slipping contact." Simultaneously, tactile sensors on different robot platforms vary in deployment area, array size, sampling frequency, and signal structure. Existing recognition algorithms are highly dependent on single sensor specifications, requiring data re-collection and model adaptation after changing the hand shape or upgrading the sensor, resulting in high system deployment costs and limited technology transfer capabilities. Moreover, existing solutions often focus on contact determination at the current moment, lacking the ability to merge states across multiple timeframes, perform continuous statistics, extract abnormal events, and issue trend warnings, making it difficult to output structured results directly usable by the controller.

Summary of the Invention

[0005] In view of the aforementioned existing problems, the present invention is proposed.

[0006] Therefore, this invention provides a method, system, device and medium for sensing the contact state of a humanoid robot hand based on multimodal tactile fusion, which solves the problem that existing robot hand contact sensing schemes struggle to simultaneously achieve multi-area pressure distribution sensing, contact dynamic change capture, and continuous temporal structured output.

[0007] To solve the above-mentioned technical problems, the present invention provides the following technical solution. In a first aspect, the present invention provides a method for perceiving the contact state of a humanoid robot's hand based on multimodal tactile fusion, comprising: collecting array pressure signals through pressure tactile sensors placed in the contact area of the humanoid robot's dexterous hand, and collecting deformation image sequences of the contact area through a sliding tactile sensor; performing a first preprocessing operation on the array pressure signal to obtain pressure branch features, and performing a second preprocessing operation on the deformation image sequence to obtain slip branch features; fusing the pressure branch features and the slip branch features to obtain a multimodal tactile fusion feature, and generating a single-moment recognition output result based on the multimodal tactile fusion feature; merging the recognition output results at multiple consecutive moments to obtain a continuous contact state segment; and determining abnormal contact events based on the state parameter determination rules of the continuous contact state segment, generating a structured contact event output result, and sending the structured contact event output result to the robot control system.

[0008] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, the pressure tactile sensor adopts a flexible array piezoresistive tactile sensing structure, including a flexible substrate layer, a flexible electrode layer, a carbon black doped silicone rubber piezoresistive layer and a silicone rubber encapsulation layer, and multiple pressure sensing units are arranged in a preset spatial layout in the robot hand contact area. The pressure tactile sensor synchronously collects the resistance change signals corresponding to each pressure sensing unit during the contact between the robot hand and the target object, forming the array pressure signal. Based on the preset spatial layout, the array pressure signals are spatially mapped to construct a two-dimensional pressure distribution map of the contact area. The first preprocessing operation is performed based on the two-dimensional pressure distribution map to obtain the pressure distribution input tensor, and pressure branch features are extracted based on the pressure distribution input tensor.

[0009] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, the sliding tactile sensor includes a transparent silicone rubber elastomer layer, a reflective film layer or a surface microtexture marking layer disposed on the surface of the transparent silicone rubber elastomer layer, a light source disposed on the back side of the transparent silicone rubber elastomer layer, and an imaging unit. Under illumination by the light source, the imaging unit of the sliding tactile sensor continuously acquires local deformation images of the contact area during contact with the target object, forming a deformation image sequence. A second preprocessing operation is performed on the deformation image sequence to obtain the slip feature input. Based on the slip feature input, slip branch features are extracted, including local relative displacement changes, velocity changes, acceleration changes, slip ratio changes, and local contact abrupt change features in the contact area.

[0010] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, the step of generating a single-moment recognition output result based on the multimodal tactile fusion features includes: The multimodal tactile fusion features are input into a pre-trained multi-task recognition network. Through the forward computation of the multi-task recognition network, the contact state category, object category, material properties, and contact trend parameters at the current moment are output synchronously. The contact trend parameters include at least the probability of slip contact state and the contact score; the probability of slip contact state is calculated by normalizing the contact state category, and the contact score is constructed based on the two-dimensional pressure distribution map output by the pressure tactile sensor, which is used to characterize the contact intensity and contact coverage level.
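
The two contact trend parameters can be illustrated with a minimal sketch. It assumes that "normalizing the contact state category" means a softmax over per-class scores, and that the contact score is the mean of an intensity term and a coverage term; the class ordering, the function names, `active_threshold`, and `full_scale` are hypothetical and are not specified in the source.

```python
import math

# Hypothetical ordering of contact state classes; the source does not fix one.
STATES = ("stable", "transitional", "slip")


def slip_probability(logits):
    """Softmax-normalize per-class scores and return the slip-class probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return exps[STATES.index("slip")] / sum(exps)


def contact_score(pressure_map, active_threshold=0.05, full_scale=1.0):
    """Combine contact intensity and contact coverage into a score in [0, 1].

    Assumed form: the mean of (a) mean pressure over active cells relative to
    full scale and (b) the fraction of cells that are active.
    """
    cells = [p for row in pressure_map for p in row]
    active = [p for p in cells if p > active_threshold]
    if not active:
        return 0.0
    intensity = min(sum(active) / len(active) / full_scale, 1.0)
    coverage = len(active) / len(cells)
    return 0.5 * (intensity + coverage)
```

In this sketch a fully pressed half of the array scores 0.75 (intensity 1.0, coverage 0.5); other weightings of the two terms would be equally consistent with the text.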

[0011] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, the abnormal contact event determination based on the state parameter determination rules of the continuous contact state segment includes: The contact center offset, slip trend score, and average contact score of the continuous contact state segment are obtained by calculation. When the contact center offset exceeds the contact center offset threshold range and the slip trend score exceeds the slip trend score threshold range, the current continuous contact state segment is determined to be a slip abnormal event. When the contact center offset is lower than the contact center offset threshold range and the average contact score is continuously lower than the average contact score threshold range, the current continuous contact state segment is determined to be an insufficient contact abnormal event. When the slip trend score is below the slip trend score threshold range and the average contact score exceeds the average contact score threshold range, the current continuous contact state segment is determined to be a stable contact event. The contact center offset, slip trend score, and average contact score are all calculated over the set of valid recognition frames within the continuous contact state segment. When the number of valid frames in the continuous contact state segment is lower than the preset minimum frame number threshold, either no abnormal event determination is performed on the segment or only a low-confidence prompt result is output.
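
The determination rules above can be sketched as a small rule function. This is an illustration under stated assumptions: each "threshold range" is collapsed to a single scalar, the numeric defaults are placeholders, and the names `SegmentStats`, `Thresholds`, and `classify_segment` as well as the `"undetermined"` fallback are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentStats:
    center_offset: float       # contact center offset over the segment
    slip_trend: float          # slip trend score
    mean_contact_score: float  # average contact score
    valid_frames: int          # number of valid recognition frames


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values; the source gives no numbers.
    center_offset: float = 2.0
    slip_trend: float = 0.6
    mean_contact_score: float = 0.3
    min_frames: int = 5


def classify_segment(s, th=Thresholds()):
    """Apply the rules in the order they are stated in the text."""
    if s.valid_frames < th.min_frames:
        return "low_confidence"  # too few valid frames to judge
    if s.center_offset > th.center_offset and s.slip_trend > th.slip_trend:
        return "slip_anomaly"
    if (s.center_offset < th.center_offset
            and s.mean_contact_score < th.mean_contact_score):
        return "insufficient_contact"
    if s.slip_trend < th.slip_trend and s.mean_contact_score > th.mean_contact_score:
        return "stable_contact"
    return "undetermined"  # assumed fallback; no rule in the text matches
```

The minimum-frame check runs first so that short segments never trigger an abnormal event, which matches the last rule in the paragraph.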

[0012] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, after generating the structured contact event output result, the method further includes: Based on the event type and risk level carried in the structured contact event output result, the corresponding robot control command is matched from the preset control strategy mapping table. When the risk level is low risk, a prompt display instruction and a log recording instruction are generated. When the risk level is medium risk, a gripping force increment correction instruction and an end-effector execution speed reduction correction instruction are generated. When the risk level is high risk, a re-grasp instruction or an abnormal shutdown protection instruction is generated. The robot control commands are sent to the robot grasping controller, motion control unit, or strategy decision unit to adjust the current grasping behavior or execute safety protection.
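
A control strategy mapping table of this kind can be sketched as a plain lookup. The command strings, the event dictionary layout, and the choice to key the table on risk level alone are assumptions made for illustration; the source also allows matching on event type.

```python
# Hypothetical command names; the source names the instruction kinds only.
CONTROL_STRATEGY = {
    "low": ("show_prompt", "write_log"),
    "medium": ("increase_grip_force", "reduce_end_effector_speed"),
    "high": ("regrasp_or_protective_stop",),
}


def match_commands(event):
    """Look up control commands for a structured contact event.

    `event` is assumed to be a dict carrying at least "event_type" and
    "risk_level".
    """
    level = event["risk_level"]
    if level not in CONTROL_STRATEGY:
        raise ValueError(f"unknown risk level: {level!r}")
    return [
        {"command": name, "event_type": event["event_type"]}
        for name in CONTROL_STRATEGY[level]
    ]
```

For example, `match_commands({"event_type": "slip_anomaly", "risk_level": "medium"})` yields a grip force increase followed by a speed reduction, each tagged with the originating event type.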

[0013] As a preferred embodiment of the humanoid robot hand contact state perception method based on multimodal tactile fusion described in this invention, after generating the structured contact event output result, the method further includes: Visualization information is generated based on the structured contact event output result. The visualization information includes a pressure distribution map of the contact area constructed from the array pressure signals output by the pressure tactile sensor, and a sliding state indicator and a sliding trend curve generated from the dynamic tactile information output by the sliding tactile sensor. The visualization information is overlaid on the monitoring interface of the robot grasping process to show the contact state and changing trend between the robot hand and the target object. Based on the structured contact event output result, a grasping correction control command is generated, including a grasping force correction amount, a finger closure angle correction amount, an end effector speed correction amount, or a local contact posture correction amount. The grasping correction control command is sent to the robot control system to adjust the grasping strategy or perform anomaly protection operations.

[0014] In a second aspect, the present invention provides a humanoid robot hand contact state perception system based on multimodal tactile fusion, comprising: a data acquisition module, used to acquire array pressure signals through pressure tactile sensors set in the contact area of the humanoid robot's dexterous hand, and to acquire deformation image sequences of the contact area through sliding tactile sensors; a feature extraction module, used to perform a first preprocessing operation on the array pressure signal to obtain pressure branch features, and to perform a second preprocessing operation on the deformation image sequence to obtain slip branch features; a fusion recognition module, used to fuse the pressure branch features and the slip branch features to obtain multimodal tactile fusion features, and to generate a single-moment recognition output result based on the multimodal tactile fusion features; a time-series analysis module, used to merge the recognition output results from multiple consecutive moments to obtain continuous contact state segments; and an anomaly detection and output module, used to determine abnormal contact events based on the state parameter determination rules of the continuous contact state segment, generate structured contact event output results, and send the structured contact event output results to the robot control system.

[0015] Thirdly, the present invention provides an electronic device, including a memory and a processor; the memory is used to store computer-executable instructions, and the processor, when executing the computer-executable instructions, implements the steps of the method.

[0016] Fourthly, the present invention provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method.

[0017] Compared with existing technologies, the beneficial effects of this invention are as follows: By deploying pressure and slip tactile sensors in the fingertip, finger side, and palm areas of the humanoid robot's dexterous hand, this invention can simultaneously acquire contact pressure distribution and dynamic change information between the robot's hand and the target object. Therefore, compared to existing solutions that rely solely on visual information, joint current estimation, or a single tactile signal, this invention more accurately characterizes the force state, contact range, and early slip signs of the contact area, improving the completeness of contact perception in complex grasping scenarios. Furthermore, by jointly modeling arrayed pressure signals and dynamic tactile information into a multimodal tactile fusion module, this invention can output contact state category, slip probability, contact stability score, object category, material properties, and trend parameters in the same recognition process. This improves the system's adaptability to complex grasping objects and diverse surface materials compared to existing solutions that can only output a single contact category or a single force state, and enhances the interpretability of contact state determination results. This invention, by performing time-series segmentation, state merging, continuous statistics, and anomaly event determination on the identification results from multiple consecutive moments, can further transform single-moment contact identification results into structured contact event results that can be directly invoked by the robot control system. This enables continuous representation of the state evolution processes such as stable contact, transitional contact, continuous slippage, and local contact anomalies, improving the system's linkage capabilities in grasping early warning, grasping force correction, operation strategy switching, and anomaly protection. 
Furthermore, by generating contact heatmaps, contact center positions, slippage state overlay displays, pressure distribution maps, three-dimensional pressure surface maps, and continuous time-series state comparison results, and combining these with the system's overall control output of grasping correction commands or early warning information, this invention forms a closed-loop working mechanism of "tactile perception - state determination - result visualization - control linkage," thereby improving the system's engineering deployability, debugging convenience, and practical application value.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a schematic diagram of the overall process logic of a method provided in one embodiment of the present invention.

[0020] Figure 2 is a schematic diagram of a joint contact state recognition process based on array pressure signals and dynamic tactile information, provided in one embodiment of the present invention.

[0021] Figure 3 is a schematic diagram of the continuous contact state segmentation and structured contact event output process provided in one embodiment of the present invention.

[0022] Figure 4 is a schematic diagram of the overall system structure provided in one embodiment of the present invention.

[0023] Figure 5 is a schematic diagram of the layout structure of a humanoid robot dexterous hand, a pressure tactile sensor, and a sliding tactile sensor provided in one embodiment of the present invention.

[0024] Figure 6 is a schematic diagram of the layout structure and piezoresistive subarray of a pressure tactile sensor provided in one embodiment of the present invention.

[0025] Figure 7 is a schematic diagram of the layout structure of a sliding tactile sensor and a micro-optical sensor provided in one embodiment of the present invention.

[0026] Figure 8 is a schematic diagram illustrating the pressure tactile output results during the grasping process of a humanoid robot's dexterous hand on a scissor-like target object, according to one embodiment of the present invention.

[0027] Figure 9 is a schematic diagram illustrating the state recognition and trend output results of a humanoid robot dexterous hand on a scissor-like target object at different contact stages, as provided in one embodiment of the present invention.

[0028] In the figures: 1. Humanoid robot dexterous hand; 2. Pressure tactile sensor; 3. Sliding tactile sensor; 4. Multimodal tactile fusion module; 5. System control; 2.1. Piezoresistive subarray; 3.1. Micro-photometer.

Detailed Implementation

[0029] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.

[0030] Example 1. Referring to Figure 1, one embodiment of the present invention provides a method for perceiving the contact state of a humanoid robot's hand based on multimodal tactile fusion. As shown in Figure 1, the specific steps are as follows:
S100: Collect array pressure signals through pressure tactile sensors located in the contact area of the humanoid robot's dexterous hand, and collect deformation image sequences of the contact area through sliding tactile sensors;
S200: Perform a first preprocessing operation on the array pressure signal to obtain pressure branch features; perform a second preprocessing operation on the deformation image sequence to obtain slip branch features;
S300: Fuse the pressure branch features and the slip branch features to obtain the multimodal tactile fusion feature, and generate a single-moment recognition output result based on the multimodal tactile fusion feature;
S400: Merge the recognition output results from multiple consecutive moments to obtain a continuous contact state segment;
S500: Determine abnormal contact events based on the state parameter determination rules of the continuous contact state segment, generate a structured contact event output result, and send the structured contact event output result to the robot control system.

It should be noted that existing robot grasping perception solutions mostly rely on visual information, joint current estimation, end effector force sensors, or single tactile sensors to determine the state. However, existing solutions struggle to simultaneously perceive pressure distribution across multiple areas, capture dynamic changes in contact, identify early slippage trends, and provide continuous temporal structured output. This results in incomplete contact state determination in complex grasping scenarios, poor adaptability to differences in sensing platforms, and the inability to generate structured results that can be directly used by the control system.
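
Step S400 can be illustrated with a short sketch that groups consecutive frames sharing the same recognized state into one segment. The `(timestamp, state)` tuple layout and the segment dictionary keys are assumptions made for illustration; the source does not define the data structures.

```python
from itertools import groupby


def merge_segments(frames):
    """Merge time-ordered (timestamp, state) results into contiguous segments."""
    segments = []
    # groupby only groups adjacent items, which is what segment merging needs.
    for state, run in groupby(frames, key=lambda f: f[1]):
        run = list(run)
        segments.append({
            "state": state,
            "start": run[0][0],
            "end": run[-1][0],
            "frames": len(run),
        })
    return segments
```

A real implementation would likely also smooth out single-frame flickers before grouping; this sketch performs no such filtering.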

[0031] To address the aforementioned issues, this embodiment establishes a complete closed-loop processing flow through steps S100 to S500: By utilizing pressure and slip tactile sensors, it can simultaneously acquire contact pressure distribution information and contact dynamic change information, more accurately characterizing the stress state and early slip signs of the contact area compared to existing solutions, thus improving the integrity of contact perception in complex grasping scenarios. By extracting and fusing features from the array pressure signals and deformation image sequences, single-moment recognition output results can be generated synchronously within the same recognition process, enhancing the system's adaptability to complex grasping objects and diverse surface materials. By merging the recognition results from multiple consecutive moments and determining abnormal events, the single-moment recognition results are transformed into structured contact event results that can be directly invoked by the robot control system, achieving continuous characterization of state evolution processes such as stable contact, transitional contact, and slip contact, enhancing the system's linkage capabilities in grasping early warning, grasping force correction, and abnormal protection. Finally, by outputting the structured results to the robot control system, a closed-loop working mechanism is formed, improving the system's engineering deployability and practical application value.

[0032] Example 2. Referring to Figure 2 and Figure 3, and building on the previous embodiment, this embodiment provides a specific implementation of the method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion, in order to illustrate the technical means used in this method.

[0033] In this embodiment of the invention, steps S100 and S200 include the following sub-steps A1 and B1. In A1: The pressure tactile sensor adopts a flexible array piezoresistive tactile sensing structure, including a flexible substrate layer, a flexible electrode layer, a carbon black doped silicone rubber piezoresistive layer and a silicone rubber encapsulation layer. Multiple pressure sensing units are arranged in a preset spatial layout in the contact area of the robot hand. During contact between the robot hand and the target object, the pressure tactile sensor synchronously collects the resistance change signal of each pressure sensing unit, forming the array pressure signal. Based on the preset spatial layout, the array pressure signals are spatially mapped to construct a two-dimensional pressure distribution map of the contact area. The first preprocessing operation is performed based on the two-dimensional pressure distribution map to obtain the pressure distribution input tensor, and the pressure branch features are extracted based on the pressure distribution input tensor.

[0034] Specifically, pressure tactile sensors are installed on the fingertips, sides, palm, or other hand areas that come into contact with the target object in the dexterous hand of a humanoid robot. These sensors detect changes in normal pressure and output arrayed pressure signals during the robot's grasping, holding, handling, or manipulation of objects. The pressure tactile sensors employ a flexible array-type piezoresistive tactile sensing structure, including a flexible substrate layer, a flexible electrode layer disposed on the flexible substrate layer, a piezoresistive sensing layer disposed on the flexible electrode layer, and an elastic encapsulation layer covering the outside of the piezoresistive sensing layer. The piezoresistive sensing layer is a carbon black-doped silicone rubber piezoresistive layer, and the elastic encapsulation layer is a silicone rubber encapsulation layer.

[0035] Specifically, when the target object comes into contact with the pressure tactile sensor, the normal pressure acts on the carbon black-doped silicone rubber piezoresistive layer, causing changes in the local thickness or conductive path, which in turn causes changes in the resistance value of the corresponding array unit. The pressure tactile sensor then synchronously collects the corresponding resistance change signal to form an array pressure signal.

[0036] After signal acquisition, the system maps the array pressure signals to their spatial locations according to a preset spatial layout to construct a two-dimensional pressure distribution map of the contact area. This two-dimensional pressure distribution map visually presents the pressure magnitude and distribution pattern at various locations within the contact area, reflecting key information such as contact area, pressure center, and local peak pressure.
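
The spatial mapping step can be sketched as follows, assuming each pressure sensing unit has an identifier and the preset layout is a lookup from identifier to a (row, column) cell. The function name and both dictionary formats are hypothetical.

```python
def build_pressure_map(readings, layout, shape):
    """Place per-unit pressure readings onto a 2D grid using a preset layout.

    readings: {unit_id: pressure value}
    layout:   {unit_id: (row, col)}  # the preset spatial layout
    shape:    (rows, cols) of the contact-area grid
    """
    rows, cols = shape
    grid = [[0.0] * cols for _ in range(rows)]
    for unit_id, value in readings.items():
        r, c = layout[unit_id]
        grid[r][c] = value
    return grid
```

Cells with no reading stay at 0.0 here; the missing data compensation that the first preprocessing operation describes would replace those defaults.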

[0037] Furthermore, the system performs a first preprocessing operation on the two-dimensional pressure distribution map, including normalization, missing data compensation, size alignment, padding, cropping, resampling, or temporal organization, converting it into a pressure distribution input tensor with uniform specifications. Based on this, pressure branch features are extracted from the pressure distribution input tensor. These features characterize the static contact state between the robot hand and the target object, including at least the contact area, pressure center location, local peak pressure, regional pressure gradient, pressure diffusion trend, and geometric distribution characteristics of the contact area.
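
Three of the listed pressure branch features (contact area, pressure center, and local peak pressure) can be computed directly from a normalized pressure map. This is a hand-written sketch rather than the learned feature extractor the text may intend; `full_scale` and `active_threshold` are assumed parameters.

```python
def normalize(grid, full_scale):
    """Scale raw pressures into [0, 1], clipping out-of-range values."""
    return [[min(max(v / full_scale, 0.0), 1.0) for v in row] for row in grid]


def pressure_features(grid, active_threshold=0.05):
    """Return contact area (cell count), pressure center, and peak pressure."""
    total = sum_r = sum_c = peak = 0.0
    area = 0
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v > active_threshold:
                area += 1
            total += v
            sum_r += r * v  # pressure-weighted row coordinate
            sum_c += c * v  # pressure-weighted column coordinate
            peak = max(peak, v)
    center = (sum_r / total, sum_c / total) if total > 0 else None
    return {"contact_area": area, "pressure_center": center, "peak_pressure": peak}
```

The pressure center is the pressure-weighted centroid of the grid, so its frame-to-frame movement gives a contact center offset of the kind used in the abnormal event rules.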

[0038] In B1: Under illumination by the light source, the imaging unit of the sliding tactile sensor continuously acquires local deformation images of the contact area during contact with the target object, forming a deformation image sequence. A second preprocessing operation is performed on the deformation image sequence to obtain the slip feature input. Slip branch features are extracted based on the slip feature input; they include local relative displacement changes, velocity changes, acceleration changes, slip ratio changes, and local contact abrupt change features in the contact area.

[0039] Specifically, the sliding tactile sensor and the pressure tactile sensor are configured in conjunction to acquire a series of deformation images of the contact area over continuous time and output dynamic tactile signals characterizing minute relative displacements, velocity changes, acceleration changes, or sliding trends of the contact area. The sliding tactile sensor includes a transparent elastomer layer, a reflective film layer or a surface micro-texture marking layer disposed on the surface of the transparent elastomer layer, a light source disposed on the back side of the transparent elastomer layer, and an imaging unit.

[0040] Specifically, when the target object comes into contact with the transparent silicone rubber elastomer layer, the contact area deforms and is mapped into a continuous and observable local deformation image through the reflective film layer or the surface micro-texture marking layer. Under the illumination of the light source, the imaging unit continuously acquires the local deformation images to form a deformation image sequence.

[0041] After image acquisition, the system performs a second preprocessing operation on the deformation image sequence, including temporal arrangement, image enhancement, local displacement estimation, and dynamic feature extraction, converting it into a standardized slip feature input. Local displacement estimation calculates the relative displacement of points in the contact area between adjacent frames using image registration or optical flow methods.

[0042] Furthermore, based on the slip feature input, slip branch features are extracted. These features include local relative displacement changes, velocity changes, acceleration changes, slip ratio changes, and local contact abrupt change features in the contact area. Local relative displacement changes characterize the spatial migration of points in the contact area; velocity and acceleration changes reflect the severity and trend of slippage; slip ratio changes quantify the proportion of the contact area where slippage occurs; and local contact abrupt change features capture instantaneous abnormal fluctuations in the contact state. Through the extraction of these slip branch features, the system can accurately identify stable contact, local loosening, excessive slippage, continuous slippage, or grasping instability tendencies, providing dynamic tactile evidence for subsequent fusion recognition and anomaly determination.
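
The slip branch features described above can be sketched with simple finite differences. The example below is a hedged illustration only: it assumes the per-marker displacement magnitudes between adjacent frames have already been estimated (for example by optical flow), and the function name `slip_features`, the frame period `dt`, and the slip threshold of 0.5 displacement units are hypothetical choices, not values from this embodiment.

```python
from typing import Dict, List


def slip_features(disp_frames: List[List[float]], dt: float,
                  slip_thresh: float = 0.5) -> List[Dict[str, float]]:
    """Per-frame slip features from marker displacement magnitudes.

    disp_frames[k][m] is the displacement of marker m between frame k-1 and
    frame k; dt is the frame period in seconds.
    """
    feats: List[Dict[str, float]] = []
    prev_v = None
    for disp in disp_frames:
        mean_d = sum(disp) / len(disp)
        v = mean_d / dt                                    # mean speed over the frame
        a = 0.0 if prev_v is None else (v - prev_v) / dt   # finite-difference acceleration
        # Slip ratio: share of markers whose displacement exceeds the threshold.
        ratio = sum(1 for d in disp if d > slip_thresh) / len(disp)
        feats.append({"displacement": mean_d, "velocity": v,
                      "acceleration": a, "slip_ratio": ratio})
        prev_v = v
    return feats
```

With four markers of which two move by one unit in the second frame, the slip ratio for that frame is 0.5.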

[0043] In one feasible implementation, a unified preprocessing mapping is applied to the raw multi-channel tactile input output by the pressure tactile sensor and the sliding tactile sensor, which can be represented as: $\tilde{X}_t = \mathcal{P}(X_t)$, where $X_t$ denotes the raw multi-channel tactile input collected at time $t$, comprising at least the array pressure signal output by the pressure tactile sensor and the dynamic tactile information output by the sliding tactile sensor; $\mathcal{P}(\cdot)$ denotes the preprocessing mapping operator; and $\tilde{X}_t$ denotes the tactile input representation after unified preprocessing mapping.

[0044] The preprocessing mapping operator is executed when the system receives pressure tactile sensor and sliding tactile sensor inputs within the same sampling period. A complete preprocessing pass can also be triggered when the number of consecutive input frames reaches the preset timing window length, or when the system detects the start of a new grasping contact. The preprocessing is preferably executed in the following order: time synchronization and input alignment, spatial position mapping, normalization, size alignment, missing data compensation, padding or cropping, resampling, and temporal organization, to ensure that inputs from different sources are consistent in both the temporal and spatial dimensions.

[0045] In a feasible implementation, if the current input contains timestamp anomalies, local sampling loss, or input specification mismatch, time alignment, missing data compensation, and size alignment operations are performed first. Size alignment is preferably performed based on the actual layout structure of the humanoid robot's dexterous hand. This involves first organizing the original array pressure signals according to the piezoresistive subarrays of the palm region and the fingertips, then mapping them to a unified two-dimensional input grid according to the target model's input requirements, making the palm input and fingertips input comparable in spatial scale. The timing window length is set based on the number of continuous sampling frames of the sliding tactile sensor and the duration of the current grasping action to ensure that pressure change information and dynamic tactile change information are included simultaneously within the same timing window. When a local sampling unit experiences a short-term failure, communication loss, or local gap, missing data compensation can be performed by combining the spatial neighborhood response relationship of adjacent subarrays. Filling, cropping, and resampling are preferably performed around the center region of the palm, the current pressure center region, or the current active sliding region to retain the most relevant effective region to the current contact state and to unify the organization of pressure input and dynamic tactile input in both spatial and temporal dimensions.
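
Two of the preprocessing steps above, missing data compensation from the spatial neighborhood and normalization, can be sketched as follows. This is a simplified illustration under stated assumptions: missing samples are marked with `None`, compensation uses the mean of the valid 4-neighbors, and normalization is min-max scaling. The names `fill_missing` and `normalize` are hypothetical, and the embodiment may use a different neighborhood relationship or scaling.

```python
from typing import List, Optional


def fill_missing(grid: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace None cells with the mean of their valid 4-neighbors (0.0 if none)."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(h):
        for j in range(w):
            if grid[i][j] is None:
                nb = [grid[a][b]
                      for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                      if 0 <= a < h and 0 <= b < w and grid[a][b] is not None]
                out[i][j] = sum(nb) / len(nb) if nb else 0.0
    return out


def normalize(grid: List[List[float]]) -> List[List[float]]:
    """Min-max scale a grid to [0, 1]; a constant grid maps to all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in grid]
```

Compensation reads only from the original grid, so a filled value never feeds into the estimate of another missing cell.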

[0046] In a feasible implementation, the pressure branch features and slip branch features are extracted separately, which can be represented as: $f_t^{p} = \mathcal{F}_p(\tilde{X}_t^{p})$ and $f_t^{s} = \mathcal{F}_s(\tilde{X}_t^{s})$, where $f_t^{p}$ and $f_t^{s}$ denote the pressure branch features and slip branch features, respectively; $\mathcal{F}_p(\cdot)$ denotes the pressure feature extraction mapping; $\mathcal{F}_s(\cdot)$ denotes the slip feature extraction mapping; $\tilde{X}_t^{p}$ denotes the pressure representation input at time $t$; and $\tilde{X}_t^{s}$ denotes the slip representation input at time $t$.

[0047] It should be noted that steps S100 and S200 construct a two-dimensional pressure distribution map and a deformation image sequence through the coordinated deployment of pressure and sliding tactile sensors. Preprocessing operations are then performed to align the spatial and temporal dimensions, reducing differences in scale, frequency, and structure between the heterogeneous sensors. Pressure branch features characterize static contact distribution, while slip branch features characterize dynamic contact changes. These two types of features complement each other, providing a complete input foundation for subsequent multimodal fusion recognition and improving the system's comprehensive perception of contact states in complex grasping scenarios.

[0048] In this embodiment of the invention, step S300 includes the following sub-steps C1 and C2. In C1: The multimodal tactile fusion module fuses the pressure branch features and slip branch features in a unified feature space to obtain multimodal tactile fusion features, which can be represented as: $f_t^{\mathrm{fus}} = \Phi([f_t^{p};\, f_t^{s}])$, where $f_t^{\mathrm{fus}}$ denotes the multimodal tactile fusion feature at time $t$; $f_t^{p}$ and $f_t^{s}$ denote the pressure branch features and slip branch features at time $t$; $\Phi(\cdot)$ denotes the fusion mapping operator, which can be a linear mapping, nonlinear mapping, attention-weighted mapping, temporal aggregation mapping, or another feature fusion method; and $[\,\cdot\,;\,\cdot\,]$ denotes the feature splicing (concatenation) operation.
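
The simplest of the fusion options named above, feature splicing followed by a linear mapping, can be sketched as below. This is only one of the listed possibilities and is not asserted to be the fusion used in practice; the function name `fuse` and the explicit weight matrix and bias vector are assumptions for illustration, and in a trained model they would be learned parameters.

```python
from typing import List


def fuse(f_p: List[float], f_s: List[float],
         weights: List[List[float]], bias: List[float]) -> List[float]:
    """Concatenate pressure and slip features, then apply a linear map W x + b."""
    x = list(f_p) + list(f_s)  # feature splicing
    if any(len(row) != len(x) for row in weights) or len(weights) != len(bias):
        raise ValueError("weight/bias shapes do not match the spliced feature")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
```

Attention-weighted or temporal aggregation fusion would replace the linear map while keeping the same spliced input.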

[0049] In C2: Single-moment recognition output results are generated based on the multimodal tactile fusion features. Specifically, the multimodal tactile fusion features are input into a pre-trained multi-task recognition network. Through the forward computation of the multi-task recognition network, the current contact state category, object category, material properties, and contact trend parameters are output simultaneously. This process can be represented as: $(y_t^{\mathrm{st}},\, y_t^{\mathrm{obj}},\, y_t^{\mathrm{mat}},\, y_t^{\mathrm{tr}}) = \mathcal{G}(f_t^{\mathrm{fus}})$, where $\mathcal{G}(\cdot)$ denotes the multi-task output mapping; $f_t^{\mathrm{fus}}$ denotes the multimodal tactile fusion feature at time $t$; $y_t^{\mathrm{st}}$ denotes the contact state category output at time $t$; $y_t^{\mathrm{obj}}$ denotes the object category output; $y_t^{\mathrm{mat}}$ denotes the material property output; and $y_t^{\mathrm{tr}}$ denotes the trend parameter output. Through multi-task joint output, the system can not only determine whether slippage has occurred, but also provide auxiliary semantic information and dynamic parameter information related to the contacting object.

[0050] Specifically, the contact trend parameters include at least the slip contact state probability and the contact score. The slip contact state probability is calculated by normalizing the contact state category scores: $p_t^{\mathrm{slip}} = \dfrac{\exp(z_t^{\mathrm{slip}})}{\sum_{j=1}^{C} \exp(z_t^{j})}$, where $z_t^{\mathrm{slip}}$ denotes the discrimination score of the slip category in the contact state output at time $t$; $z_t^{j}$ denotes the discrimination score of the $j$-th class in the contact state output; $C$ denotes the total number of contact state categories; and $p_t^{\mathrm{slip}}$ denotes the probability that the input at time $t$ belongs to the slip contact state. This slip probability can be used for subsequent temporal state segmentation, abnormal contact detection, and control linkage. The contact score is constructed from the two-dimensional pressure distribution map output by the pressure tactile sensor to reflect the contact intensity and contact coverage level, and can be represented as: $S_t = \dfrac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} w_{ij}\, P_t(i,j)$, where $P_t(i,j)$ denotes the response value at row $i$, column $j$ of the pressure distribution map at time $t$; $w_{ij}$ denotes the weighting coefficient related to spatial location; $H$ and $W$ denote the height and width of the pressure distribution map, respectively; and $S_t$ denotes the contact score. The contact score can be used to characterize whether the current contact area is sufficient, whether the contact intensity is uniform, and whether there is a local attenuation trend in the contact. It can also serve as an auxiliary indicator for subsequent abnormal contact event assessment.
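
The two contact trend parameters described above can be sketched in a few lines. The slip probability below is a standard softmax over the contact state discrimination scores; the contact score is written as a position-weighted mean of the pressure map, where the division by the number of cells is an assumption of this example, since the exact normalization is not recoverable from the text. The function names are hypothetical.

```python
import math
from typing import List


def slip_probability(scores: List[float], slip_index: int) -> float:
    """Softmax probability of the slip class among C contact state scores."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[slip_index] / sum(exps)


def contact_score(p: List[List[float]], w: List[List[float]]) -> float:
    """Position-weighted mean response of an H x W pressure map (assumed form)."""
    h, wd = len(p), len(p[0])
    return sum(w[i][j] * p[i][j] for i in range(h) for j in range(wd)) / (h * wd)
```

With uniform weights of 1 and a normalized pressure map, the contact score stays within [0, 1], which is compatible with the 0.20 to 0.40 threshold range given later in the description.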

[0051] Furthermore, to simultaneously optimize the performance of contact state recognition, object recognition, material recognition, and dynamic parameter estimation, the multimodal tactile fusion module is trained using a joint loss function, which can be expressed as: $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{state}} + \lambda_2 \mathcal{L}_{\mathrm{obj}} + \lambda_3 \mathcal{L}_{\mathrm{mat}} + \lambda_4 \mathcal{L}_{\mathrm{reg}}$, where $\mathcal{L}$ denotes the joint optimization objective; $\mathcal{L}_{\mathrm{state}}$ denotes the contact state classification loss; $\mathcal{L}_{\mathrm{obj}}$ denotes the object category classification loss; $\mathcal{L}_{\mathrm{mat}}$ denotes the material property classification loss; $\mathcal{L}_{\mathrm{reg}}$ denotes the regression loss related to slip ratio, velocity, acceleration, or contact trend parameters; and $\lambda_1$ to $\lambda_4$ denote the weight coefficients of the respective loss terms. By jointly optimizing the loss terms, the model can maintain its ability to recognize contact states while also preserving accuracy in determining object attributes, material semantics, and dynamic trend parameters.

[0052] It should be noted that step S300 above integrates pressure branch features and slip branch features into multimodal tactile fusion features, and outputs contact state category, object category, material properties and contact trend parameters synchronously through a multi-task recognition network. This achieves joint modeling of static pressure distribution and dynamic contact changes, improves the system's adaptability to complex grasping objects and diverse surface materials, and enhances the interpretability of contact state determination results.

[0053] In this embodiment of the invention, step S400 includes the following sub-step D1: In D1: After completing the single-moment contact state recognition, the multimodal tactile fusion module performs temporal segmentation, state merging, and persistence statistics on the recognition output results of multiple consecutive moments to obtain the start and end times and duration of the stable contact segment, transition contact segment, and sliding contact segment.

[0054] Specifically, valid frames are first filtered from the recognition results at consecutive moments, which can be represented as: $\mathcal{V} = \{\, t \mid q_t \ge \tau_v \,\}$, where $\mathcal{V}$ denotes the set of valid recognition frames; $q_t$ denotes the highest class confidence in the recognition result at time $t$; and $\tau_v$ denotes the valid recognition threshold. Only when the recognition confidence at the current moment reaches the preset threshold does the system include that moment in subsequent time series analysis, which improves the stability of continuous state segmentation and abnormal event determination.

[0055] After completing the selection of valid frames, the multimodal tactile fusion module merges the states of the valid frames in chronological order to obtain continuous contact state segments, which can be defined as: $\mathcal{S}_k = \{\, t \mid t_k^{\mathrm{start}} \le t \le t_k^{\mathrm{end}},\ \hat{y}_t = s_k \,\}$, where $\mathcal{S}_k$ denotes the $k$-th continuous contact state segment; $\hat{y}_t$ denotes the contact state category label at time $t$; $s_k$ denotes the state category corresponding to the $k$-th continuous state segment; and $t_k^{\mathrm{start}}$ and $t_k^{\mathrm{end}}$ denote the start and end times of the state segment, respectively. The above definition indicates that within the same continuous state segment, the state category remains consistent at each time point or satisfies a preset state merging condition.

[0056] Correspondingly, the duration of the $k$-th continuous contact state segment can be expressed as: $T_k = \dfrac{t_k^{\mathrm{end}} - t_k^{\mathrm{start}} + 1}{f_{\mathrm{samp}}}$, where $t_k^{\mathrm{start}}$ and $t_k^{\mathrm{end}}$ denote the start and end sampling moments of the segment; $f_{\mathrm{samp}}$ denotes the sampling frequency; and $T_k$ denotes the duration of the $k$-th continuous contact state segment. The duration is used to determine whether a contact state is a transient disturbance, a short-term transition, or a continuous abnormal state, and can be used as one of the bases for subsequent risk level classification.
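
The valid frame filtering, state merging, and duration statistics described above can be sketched together. The following is a minimal illustration under stated assumptions: frames are indexed by integer sampling moments, low-confidence frames are skipped rather than breaking a segment (one possible "preset state merging condition"), and the duration counts both end frames. The function name `segment_states` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def segment_states(labels: List[str], confidences: List[float],
                   conf_thresh: float, sample_hz: float) -> List[Dict]:
    """Merge consecutive valid frames with equal labels into state segments."""
    segments: List[Dict] = []
    for t, (lab, conf) in enumerate(zip(labels, confidences)):
        if conf < conf_thresh:
            continue  # below the valid recognition threshold: frame is ignored
        if segments and segments[-1]["state"] == lab:
            segments[-1]["end"] = t
            segments[-1]["frames"].append(t)
        else:
            segments.append({"state": lab, "start": t, "end": t, "frames": [t]})
    for seg in segments:
        # Duration in seconds, counting both the start and the end frame.
        seg["duration_s"] = (seg["end"] - seg["start"] + 1) / sample_hz
    return segments
```

The `frames` list records which moments were actually valid, so later statistics can be computed over valid frames only even when a segment spans a skipped frame.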

[0057] In this embodiment of the invention, step S500 includes the following sub-steps E1 and E2: In E1: Abnormal contact events are determined based on the state parameter determination rules of continuous contact state segments, generating structured contact event output results, including: The contact center offset, slip trend score, and average contact score of the continuous contact state segment are obtained by calculation. When the contact center offset exceeds the contact center offset threshold range and the slip trend score exceeds the slip trend score threshold range, the current continuous contact state segment is determined to be a slip abnormal event. When the contact center offset is lower than the contact center offset threshold range and the average contact score is consistently lower than the average contact score threshold range, the current continuous contact state segment is determined to be an abnormal event of insufficient contact. When the slip trend score is below the slip trend score threshold range and the average contact score exceeds the average contact score threshold range, the current continuous contact state segment is determined to be a stable contact event. The contact center offset, slippage trend score, and average contact score are all calculated within the continuous contact state segment corresponding to the effective identification frame set. When the number of effective frames in the continuous contact state segment is lower than the preset minimum frame number threshold, no abnormal event judgment is performed on the continuous contact state segment or only a low confidence prompt result is output.

[0058] Specifically, based on the contact center position reconstructed from the pressure distribution of the pressure tactile sensor, the contact center offset within the continuous state segment is constructed as: $\Delta_k = \dfrac{1}{N_k} \sum_{t \in \mathcal{S}_k} \left\| \mathbf{c}_t - \bar{\mathbf{c}}_k \right\|_2$, where the sum runs over the valid frames of the $k$-th continuous contact state segment $\mathcal{S}_k$; $\mathbf{c}_t$ denotes the contact center position at time $t$; $\bar{\mathbf{c}}_k$ denotes the mean contact center position within the $k$-th continuous contact state segment; $N_k$ denotes the number of valid frames contained in the segment; $\|\cdot\|_2$ denotes the L2 norm; and $\Delta_k$ denotes the contact center offset of the $k$-th continuous contact state segment. $\Delta_k$ quantifies the dispersion of the contact area's center position within the current continuous state segment relative to its mean position within the segment. A larger $\Delta_k$ indicates a clear trend of contact migration, local loosening, or unstable contact support during the current grasping process.

[0059] Specifically, a slip trend score is constructed within the continuous state segment based on the slip probability derived from the dynamic tactile features of the sliding tactile sensor: $R_k = \dfrac{1}{N_k} \sum_{t \in \mathcal{S}_k} p_t^{\mathrm{slip}}$, where the sum runs over the $N_k$ valid frames of the $k$-th continuous contact state segment $\mathcal{S}_k$; $R_k$ denotes the slip trend score of the $k$-th continuous contact state segment; and $p_t^{\mathrm{slip}}$ denotes the probability that the input at time $t$ belongs to the slip state. $R_k$ quantifies the overall level of the slip probability within the current continuous state segment. When $R_k$ continues to rise, it indicates that the contact state is evolving from stable contact toward transitional contact or slip contact.

[0060] Specifically, the average contact score of each state segment is constructed from the contact scores within the continuous state segment: $\bar{S}_k = \dfrac{1}{N_k} \sum_{t \in \mathcal{S}_k} S_t$, where the sum runs over the $N_k$ valid frames of the $k$-th continuous contact state segment $\mathcal{S}_k$; $\bar{S}_k$ denotes the average contact score of the $k$-th continuous contact state segment; and $S_t$ denotes the contact score at time $t$. $\bar{S}_k$ quantifies the overall adequacy of contact strength and contact coverage within the current continuous state segment. When $\bar{S}_k$ remains consistently low, it indicates insufficient contact, local detachment, or a tendency toward grasping instability in the current contact area.
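
The three per-segment quantities described above (contact center offset, slip trend score, and average contact score) can be computed as plain averages over the valid frames of one segment. The sketch below assumes the per-frame contact centers, slip probabilities, and contact scores are already available as equal-length lists; the function name `segment_statistics` and the output keys are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def segment_statistics(centers: List[Tuple[float, float]],
                       slip_probs: List[float],
                       contact_scores: List[float]) -> Dict[str, float]:
    """Per-segment statistics over the valid frames of one contact state segment."""
    n = len(centers)
    if n == 0 or len(slip_probs) != n or len(contact_scores) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(c[0] for c in centers) / n
    mean_x = sum(c[1] for c in centers) / n
    # Mean L2 distance of each contact center from the segment's mean center.
    offset = sum(math.hypot(c[0] - mean_y, c[1] - mean_x) for c in centers) / n
    return {"center_offset": offset,
            "slip_trend": sum(slip_probs) / n,
            "avg_contact": sum(contact_scores) / n}
```

Because the offset is measured relative to the segment's own mean center, a hand that holds a steady but off-center contact yields a small offset, while a drifting contact yields a large one.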

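[0061] Furthermore, after completing the above trend calculation, the multimodal tactile fusion module can determine abnormal contact events, which can be represented as: $A_k = \mathbb{1}\big[(R_k > \tau_R \wedge \Delta_k > \tau_\Delta) \vee (\Delta_k < \tau_\Delta \wedge \bar{S}_k < \tau_S)\big]$, where $A_k$ denotes the abnormal contact determination result for the $k$-th continuous contact state segment; $R_k$, $\Delta_k$, and $\bar{S}_k$ denote the slip trend score, contact center offset, and average contact score of that segment; $\mathbb{1}[\cdot]$ denotes the indicator (characteristic) function; $\tau_R$ denotes the slip trend score threshold; $\tau_\Delta$ denotes the contact center offset threshold; and $\tau_S$ denotes the average contact score threshold.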

[0062] In one feasible implementation, the above three quantities can be used in combination for abnormal event determination: when the slip trend score $R_k$ and the contact center offset $\Delta_k$ are both high, the current continuous contact state segment can be determined to be a slip abnormal event; when $\Delta_k$ is low but the average contact score $\bar{S}_k$ remains consistently low, the current continuous contact state segment can be determined to be an insufficient-contact abnormal event; and when $R_k$ is low and $\bar{S}_k$ is high, the current continuous contact state segment can be determined to be a stable contact event.

[0063] In one feasible implementation, the slip trend score threshold $\tau_R$, the contact center offset threshold $\tau_\Delta$, and the average contact score threshold $\tau_S$ can be configured based on the dimensions of the dexterous hand structure, the resolution of the pressure tactile sensor, and the sampling characteristics of the sliding tactile sensor. Preferably, $\tau_R$ can be set within the range of 0.50 to 0.70; $\tau_\Delta$ can be set to 3% to 10% of the effective width of the contact area or, expressed in pixel coordinates, to 2 to 6 sampling units; and $\tau_S$ can be set within the range of 0.20 to 0.40. For different grasping objects, different contact areas, or different sensor resolutions, the specific threshold values can be determined or modified based on the statistical distribution of the sample data during the training phase, the calibration results on the validation set, or the calibration results during actual deployment, so that the abnormal contact determination is consistent with the actual grasping state.
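
The segment-level decision rules described in the preceding paragraphs can be sketched as a small rule function. The default thresholds below are picked from inside the preferred ranges stated above (0.60 for the slip trend score, 4 sampling units for the contact center offset, 0.30 for the average contact score); the minimum frame count of 5, the label strings, the "undetermined" fallback, and the function name `classify_segment` are assumptions for illustration.

```python
def classify_segment(center_offset: float, slip_trend: float, avg_contact: float,
                     n_valid: int, tau_r: float = 0.60, tau_d: float = 4.0,
                     tau_s: float = 0.30, min_frames: int = 5) -> str:
    """Rule-based contact event determination for one continuous state segment."""
    if n_valid < min_frames:
        return "low_confidence"        # too few valid frames to judge
    if center_offset > tau_d and slip_trend > tau_r:
        return "slip_anomaly"
    if center_offset < tau_d and avg_contact < tau_s:
        return "insufficient_contact"
    if slip_trend < tau_r and avg_contact > tau_s:
        return "stable_contact"
    return "undetermined"              # none of the described rules applies
```

The rules are checked in a fixed order, so a segment that satisfies more than one condition is reported under the first matching label.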

[0064] Furthermore, after completing the abnormal contact determination, the system master control generates a structured contact event payload based on the determination results of the multimodal tactile fusion module. Its core output can be expressed as: $E_k = \{\mathrm{type}_k,\ \mathrm{risk}_k,\ t_k^{\mathrm{start}},\ t_k^{\mathrm{end}},\ T_k,\ s_k,\ \mathrm{score}_k,\ \mathrm{basis}_k\}$, where $E_k$ denotes the $k$-th structured contact event payload; $\mathrm{type}_k$ denotes the event type; $\mathrm{risk}_k$ denotes the risk level; $t_k^{\mathrm{start}}$ and $t_k^{\mathrm{end}}$ denote the start and end times of the event, respectively; $T_k$ denotes the duration of the event; $s_k$ denotes the corresponding contact state category; $\mathrm{score}_k$ denotes the slip trend score, contact score, or overall risk score; and $\mathrm{basis}_k$ denotes the basis on which the event was triggered.
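
One way to hold the structured contact event payload described above is a small record type that serializes to JSON for the host computer or robot controller. The field names, the string encodings of event type and risk level, and the use of JSON are assumptions of this sketch; the description only specifies which pieces of information the payload carries.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ContactEvent:
    """Structured contact event payload (field names are illustrative)."""
    event_type: str      # e.g. "slip_anomaly", "insufficient_contact", "stable_contact"
    risk_level: str      # e.g. "low", "medium", "high"
    start_time: float    # seconds
    end_time: float      # seconds
    duration: float      # seconds
    contact_state: str   # contact state category of the segment
    score: float         # slip trend score, contact score, or overall risk score
    trigger_basis: str   # short description of the rule that fired


def to_json(event: ContactEvent) -> str:
    """Serialize the event with stable key order for logging or transmission."""
    return json.dumps(asdict(event), sort_keys=True)
```

A fixed schema of this kind lets the downstream control strategy table key directly on the event type and risk level.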

[0065] Furthermore, based on the event type and risk level carried in the structured contact event output, the corresponding robot control command is matched from a preset control strategy mapping table. Specifically: when the risk level is low, a prompt display instruction and a log recording instruction are generated; when the risk level is medium, a grasping force increment correction instruction and an end-effector speed reduction correction instruction are generated; when the risk level is high, a re-grasp instruction or an abnormal shutdown protection instruction is generated. The robot control commands are sent to the robot grasping controller, motion control unit, or strategy decision unit to achieve grasping warning, task replay, strategy correction, or anomaly protection.

[0066] Preferably, the grasping force increment can be set to 5% to 15% of the current grasping force, the deceleration amplitude can be set to 10% to 30% of the current execution speed, and the re-grasp and stop operations can be triggered after the risk level has persisted beyond a preset duration threshold. The specific values of the control amplitude, trigger threshold, and duration condition can be determined based on the sample data of the training phase, the calibration results on the validation set, and the safety constraints of the robot grasping control.
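
The risk-graded control strategy mapping described above can be sketched as a lookup that returns a list of commands. The default gain of 10% on grasping force and the 20% speed reduction are chosen from inside the preferred ranges stated above; the command names, the dictionary format, and the function name `commands_for` are hypothetical and would be replaced by the actual controller interface.

```python
from typing import Dict, List


def commands_for(risk_level: str, grip_force: float, speed: float,
                 force_gain: float = 0.10, speed_cut: float = 0.20) -> List[Dict[str, object]]:
    """Map a risk level to control commands (command names are illustrative)."""
    if risk_level == "low":
        return [{"cmd": "display_prompt"}, {"cmd": "write_log"}]
    if risk_level == "medium":
        return [{"cmd": "set_grip_force", "value": grip_force * (1.0 + force_gain)},
                {"cmd": "set_speed", "value": speed * (1.0 - speed_cut)}]
    if risk_level == "high":
        return [{"cmd": "regrasp_or_protective_stop"}]
    raise ValueError(f"unknown risk level: {risk_level!r}")
```

The persistence condition for high-risk actions (triggering only after the risk level has lasted beyond a duration threshold) is left to the caller in this sketch.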

[0067] In E2: After the structured contact event output is generated, the method further includes: generating visual information based on the structured contact event output, the visual information including a pressure distribution map of the contact area constructed from the array pressure signals output by the pressure tactile sensor, and a slip state indicator and slip trend curve generated from the dynamic tactile information output by the sliding tactile sensor; overlaying the visual information on the monitoring interface of the robot grasping process to show the contact state and changing trend between the robot hand and the target object; and generating, based on the structured contact event output, a grasping correction control command including a grasping force correction amount, a finger closing angle correction amount, an end effector speed correction amount, or a local contact posture correction amount. The grasping correction control command is sent to the robot control system to adjust the grasping strategy or perform anomaly protection operations.

[0068] Specifically, the multimodal tactile fusion module generates a normalized heat map from the pressure distribution output by the pressure tactile sensor, which can be represented as: $M_t = \mathcal{N}(P_t)$, where $M_t$ denotes the contact heat map at time $t$; $\mathcal{N}(\cdot)$ denotes the normalization mapping operator; and $P_t$ denotes the pressure distribution map. This step maps pressure distributions with different numerical ranges to a unified visual intensity space, allowing a clear display of the force distribution, stress concentration locations, and contact coverage within the current contact area. The system master control then overlays the heat map onto the current contact image or tactile image to create a visual overlay result, which can be represented as: $O_t = I_t + \alpha M_t$, where $O_t$ denotes the overlay display result at time $t$; $I_t$ denotes the original image or tactile background image at the current moment; and $\alpha$ denotes the intensity coefficient of the heat map overlay. Through this overlay, the system can display the current contact heat distribution, contact area, and changes in contact intensity on a unified interface without obscuring the main structure of the original image.
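
The heat map normalization and overlay described above can be sketched on plain grayscale grids. Both choices below are assumptions of this example rather than the stated method: the normalization is min-max scaling to [0, 1], and the overlay adds the heat map scaled by an intensity coefficient to the background and clips the result at 1.0. The function names and the default coefficient of 0.4 are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def to_heatmap(p: Grid) -> Grid:
    """Min-max normalize a pressure map to a [0, 1] intensity map."""
    flat = [v for row in p for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in p]


def overlay(background: Grid, heat: Grid, alpha: float = 0.4) -> Grid:
    """Add the scaled heat map to a [0, 1] background image, clipping at 1.0."""
    return [[min(1.0, b + alpha * h) for b, h in zip(rb, rh)]
            for rb, rh in zip(background, heat)]
```

A small intensity coefficient keeps the background structure visible; a color heat map would apply the same idea per channel.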

[0069] Specifically, the system's central control unit extracts typical frames from the continuous state segment results output by the multimodal haptic fusion module, forming a comparative display of stable contact, transitional contact, and sliding contact. For this comparative display, the central control unit can simultaneously overlay contact state text, sliding probability, contact score, object category, and material property information to create an interpretable interface suitable for debugging, evaluation, and demonstration. The interface can be displayed as a full-page report or in a miniaturized corner overlay format to adapt to different human-computer interaction and demonstration needs.

[0070] Specifically, the system master control generates grasping correction control quantities based on the recognition results and trend parameters output by the multimodal tactile fusion module, which can be represented as: $u_{t+1} = \Psi(u_t,\ p_t^{\mathrm{slip}},\ S_t,\ \Delta_k)$, where $u_t$ denotes the grasping control quantity at time $t$; $u_{t+1}$ denotes the grasping control quantity for the next moment; $\Psi(\cdot)$ denotes the control correction mapping; $p_t^{\mathrm{slip}}$ denotes the slip probability at the current moment; $S_t$ denotes the contact score at the current moment; and $\Delta_k$ denotes the contact center offset within the current continuous state segment. The control correction mapping is used to correct the robot's grasping force, finger closure degree, local contact posture, or operation strategy based on the current contact stability, contact strength, and contact migration trend, reducing the risk of object slippage, partial detachment, or grasping instability.

[0071] In a feasible implementation, the grasping correction control parameters may include one or more of the following: grasping force correction, finger closure angle correction, end effector speed correction, local contact posture correction, or an operation strategy switching flag. Preferably, the grasping force correction magnitude can be set to 5% to 15% of the current grasping force, the finger closure angle correction magnitude can be set to 2% to 10% of the current closure angle, and the end effector speed correction magnitude can be set to 10% to 30% of the current execution speed. The specific values of each correction parameter can be determined based on the sample data during the training phase, the validation set calibration results, and the robot's grasping control safety constraints. When the slip probability continues to increase and the contact score decreases, the system master control can prioritize outputting grip enhancement and deceleration control; when the contact center offset continues to increase, the system master control can further output local posture correction or re-grasping commands.
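
The prioritization rule described above can be sketched as a function over short recent histories of the slip probability, contact score, and contact center offset. The sketch treats "continues to increase" as the last value of the window exceeding the first, which is a simplification chosen for this example; the correction magnitudes of 10% and 20% come from inside the preferred ranges stated above, and the function and key names are hypothetical.

```python
from typing import Dict, List


def _rising(xs: List[float]) -> bool:
    return len(xs) >= 2 and xs[-1] > xs[0]


def _falling(xs: List[float]) -> bool:
    return len(xs) >= 2 and xs[-1] < xs[0]


def grasp_correction(slip_probs: List[float], contact_scores: List[float],
                     center_offsets: List[float], force: float, speed: float,
                     force_gain: float = 0.10, speed_cut: float = 0.20) -> Dict[str, object]:
    """Next-step grasp control quantities from recent trend windows."""
    out: Dict[str, object] = {"force": force, "speed": speed,
                              "posture_correction": False}
    # Rising slip probability with falling contact score: grip harder, slow down.
    if _rising(slip_probs) and _falling(contact_scores):
        out["force"] = force * (1.0 + force_gain)
        out["speed"] = speed * (1.0 - speed_cut)
    # Growing contact center offset: request local posture correction.
    if _rising(center_offsets):
        out["posture_correction"] = True
    return out
```

A production controller would additionally clamp the corrected force and speed to the robot's safety limits.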

[0072] It should be noted that step S500 above determines abnormal contact events from three quantities (contact center offset, slip trend score, and average contact score), distinguishes different failure modes such as slip anomalies and insufficient contact, generates structured event payloads, and matches graded control commands according to risk level, so that responses range from early warning and grip enhancement to re-grasping or protective shutdown. It also generates contact heat maps, slip trend curves, and other visual information, which are overlaid on the monitoring interface to close the loop from perception to control, improving the system's interpretability, response timeliness, and operational safety.

[0073] Example 3: Referring to Figures 4-9, this embodiment provides a system structure and workflow for the method of sensing the hand contact state of a humanoid robot based on multimodal tactile fusion. Specifically, the system includes the humanoid robot dexterous hand 1, pressure tactile sensor 2, sliding tactile sensor 3, multimodal tactile fusion module 4, and system master control 5. The pressure tactile sensors 2 are respectively deployed in the fingertip area, finger side area, palm area, or other hand areas of the humanoid robot dexterous hand 1 that come into contact with the target object. The pressure tactile sensor 2 includes a flexible substrate layer, a flexible electrode layer disposed on the flexible substrate layer, a piezoresistive sensing layer disposed on the flexible electrode layer, and an elastic encapsulation layer covering the outside of the piezoresistive sensing layer. The piezoresistive sensing layer is a carbon black doped silicone rubber piezoresistive layer, and the elastic encapsulation layer is a silicone rubber encapsulation layer. The pressure tactile sensor 2 is used to sense changes in normal pressure and output array pressure signals while the robot grasps, holds, carries, or manipulates objects. The sliding tactile sensors 3 are respectively deployed in the central area of the palm and the fingertip area of the last segment of each finger. Each includes a transparent elastomer layer, a reflective film layer or surface micro-texture marking layer on the surface of the transparent elastomer layer, a light source on the back side of the transparent elastomer layer, and an imaging unit, and is used to acquire deformation image sequences or dynamic tactile information of the contact area over continuous time. The multimodal tactile fusion module 4 is communicatively connected to the pressure tactile sensor 2 and the sliding tactile sensor 3, and is used for unified characterization, joint identification, and timing analysis of the array pressure signals and dynamic tactile information. The system master control 5 is connected to the pressure tactile sensor 2, the sliding tactile sensor 3, and the multimodal tactile fusion module 4, and is used for acquisition triggering, timing synchronization, data scheduling, parameter configuration, result recording, status output, and communication with the robot's control system or upper-level control platform. It also outputs the contact state recognition results to the grasping control, force control adjustment, operation strategy generation, or anomaly protection logic.

[0074] The overall system flow includes: First, pressure tactile sensor 2 and sliding tactile sensor 3 respectively collect array pressure signals and dynamic tactile information; then, the multimodal tactile fusion module 4 performs spatial mapping, normalization, size alignment, missing data compensation, resampling, and time sequence organization processing on the input, and constructs a two-dimensional pressure distribution map and dynamic tactile feature input; then, in a unified feature space, it completes the joint identification and output of contact state category, object category, material properties, sliding probability, and trend parameters; after completing the identification at a single moment, it performs effective frame filtering, state segmentation, state merging, continuous statistics, and abnormal contact judgment on the results of multiple consecutive moments; finally, the system master control 5 outputs structured contact events to the host computer, robot controller, or other external application terminals.

[0075] Further, as shown in Figure 5, in an optional embodiment, pressure tactile sensors 2 are respectively disposed in the palm region, palm-finger transition region, and fingertip regions of the humanoid robot dexterous hand 1, for synchronously acquiring multi-region contact pressure signals of the robot hand in a grasping state. The pressure tactile sensors 2 disposed in the finger and palm regions can form multiple piezoresistive subarrays 2.1, which together constitute an array pressure sensing network covering the palm-finger-pad region.

[0076] In another optional embodiment, sliding tactile sensors 3 are respectively disposed in the central area of the palm and the fingertip area of the last segment of each finger, for collecting deformation image sequences, local displacement change information, and slip trend information generated during contact between the target object and the robot hand. The sliding tactile sensors 3 disposed in the palm area are used to characterize local migration and slip risk on the overall contact surface, while the sliding tactile sensors 3 disposed at the fingertips are used to characterize minute slip changes at key contact points, thereby improving the system's ability to perceive local loosening, continuous slippage, and grasping instability tendencies.

[0077] Further, as shown in Figure 6, in one optional embodiment, pressure tactile sensors 2 are deployed in the palm area and fingertip areas of the humanoid robot dexterous hand 1 to collect array pressure signals generated by the robot during grasping, holding, carrying, or tactile exploration. The pressure tactile sensors 2 deployed in the palm and finger areas can form multiple piezoresistive subarrays 2.1, each arranged according to a preset spatial distribution, to form a two-dimensional pressure sampling network covering the palm-finger-pad area.

[0078] In one optional implementation, after the target object comes into contact with the pressure tactile sensor 2, the normal pressure acts on the corresponding piezoresistive subarray 2.1, causing a change in the electrical response of each sampling unit, thereby forming an array pressure signal corresponding to the contact intensity distribution. The multimodal tactile fusion module 4 reconstructs a two-dimensional pressure distribution map based on the array pressure signal, and further extracts features such as contact area, pressure center, local peak pressure, regional pressure gradient, and contact coverage degree to characterize the contact state between the robot hand and the target object.
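The feature extraction described above can be illustrated with a minimal sketch. It assumes the two-dimensional pressure distribution map is a list of rows of non-negative values; the function name `pressure_features`, the `contact_threshold` parameter, and the specific definitions (pressure-weighted centroid for the pressure center, largest difference between adjacent cells for the regional gradient) are assumptions chosen for illustration and are not specified in this description.

```python
from typing import Dict, List


def pressure_features(pmap: List[List[float]],
                      contact_threshold: float = 0.1) -> Dict[str, object]:
    """Extract simple contact features from a 2-D pressure map (rows x cols).

    Assumes non-negative pressures; a cell counts as "in contact" when its
    value exceeds contact_threshold (hypothetical parameter).
    """
    rows, cols = len(pmap), len(pmap[0])
    total = weighted_r = weighted_c = 0.0
    active = 0
    peak = 0.0
    for r in range(rows):
        for c in range(cols):
            p = pmap[r][c]
            peak = max(peak, p)
            if p > contact_threshold:
                active += 1
                total += p
                weighted_r += r * p
                weighted_c += c * p

    # Pressure center as the pressure-weighted centroid of the contact cells.
    center = (weighted_r / total, weighted_c / total) if total > 0 else None

    # Regional gradient approximated by the largest absolute difference
    # between horizontally or vertically adjacent cells.
    grad = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                grad = max(grad, abs(pmap[r][c + 1] - pmap[r][c]))
            if r + 1 < rows:
                grad = max(grad, abs(pmap[r + 1][c] - pmap[r][c]))

    return {
        "contact_area": active,                # number of cells in contact
        "coverage": active / (rows * cols),    # fraction of cells in contact
        "pressure_center": center,             # (row, col) or None
        "peak_pressure": peak,
        "max_gradient": grad,
    }
```

In a real system the contact area would typically be scaled by the physical size of a sensing unit; the cell count is used here only to keep the sketch unit-free.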

[0079] Further, as shown in Figure 7, in an optional embodiment, the sliding tactile sensor 3 is disposed in the central area of the palm of the humanoid robot's dexterous hand 1 and in the fingertip area of the last segment of each finger, for collecting dynamic tactile information generated during the contact between the target object and the robot hand. Each sliding tactile sensor 3 may include a macro light sensor 3.1 for optically acquiring deformation images, texture changes, or displacement changes of the local contact area.

[0080] In an optional implementation, when the target object comes into contact with the sliding tactile sensor 3 and a contact migration or sliding tendency occurs, the macro light sensor 3.1 collects the local changes of the contact area over continuous time, and the multimodal tactile fusion module 4 extracts displacement changes, velocity changes, acceleration changes, sliding ratio changes and local contact abrupt change features based on the collected dynamic tactile information to identify stable contact, transitional contact, continuous sliding or local abnormal contact state.
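As a hedged illustration of the dynamic features named above, the following sketch derives velocity, acceleration, and a slip ratio from a sequence of local displacement samples using finite differences. The function name `slip_kinematics`, the sampling interval `dt`, and the `slip_speed_threshold` parameter are assumptions; in particular, the slip ratio is taken here to be the fraction of time steps whose speed exceeds the threshold, which is one plausible definition and not one stated in this description.

```python
from typing import Dict, List


def slip_kinematics(displacement: List[float], dt: float,
                    slip_speed_threshold: float) -> Dict[str, float]:
    """Finite-difference velocity/acceleration statistics for one contact patch.

    displacement: local displacement samples (for example, in pixels or mm)
    dt: sampling interval between consecutive samples
    """
    if len(displacement) < 3 or dt <= 0:
        raise ValueError("need at least 3 samples and a positive dt")

    # First difference gives velocity, second difference gives acceleration.
    velocity = [(b - a) / dt for a, b in zip(displacement, displacement[1:])]
    accel = [(b - a) / dt for a, b in zip(velocity, velocity[1:])]

    speeds = [abs(v) for v in velocity]
    abs_accel = [abs(a) for a in accel]
    slipping = sum(1 for s in speeds if s > slip_speed_threshold)

    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_speed": max(speeds),
        "mean_abs_accel": sum(abs_accel) / len(abs_accel),
        "max_abs_accel": max(abs_accel),
        # Assumed definition: share of steps moving faster than the threshold.
        "slip_ratio": slipping / len(speeds),
    }
```

The displacement samples themselves would come from tracking the deformation images (for example, marker or texture tracking), which is outside the scope of this sketch.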

[0081] Specifically, the joint identification process of contact state based on array pressure signals and dynamic tactile information includes: first, the pressure tactile sensor 2 collects array pressure signals, and the slip tactile sensor 3 collects deformation image sequences or dynamic tactile information; then, the multimodal tactile fusion module 4 performs unified preprocessing mapping on the above inputs, including spatial position mapping, normalization, size alignment, missing data compensation, resampling, and time sequence organization processing, to construct a two-dimensional pressure distribution map and dynamic tactile feature inputs. The multimodal tactile fusion module 4 performs pressure branch feature extraction and dynamic tactile branch feature extraction on the two-dimensional pressure distribution map and dynamic tactile feature inputs respectively, and completes multimodal feature fusion and joint identification in a unified feature space, outputting contact state category, object category, material properties, slip probability, and trend parameters.
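The pressure-side preprocessing steps listed above (spatial position mapping, missing data compensation, and normalization) can be sketched as follows. The function name `build_pressure_map`, the representation of the preset layout as a dictionary from sensing-unit ID to grid cell, the neighbour-averaging compensation, and the min-max normalization are all assumptions made for illustration; the description does not fix any of these choices.

```python
from typing import Dict, List, Optional, Tuple


def build_pressure_map(readings: Dict[int, Optional[float]],
                       layout: Dict[int, Tuple[int, int]],
                       shape: Tuple[int, int]) -> List[List[float]]:
    """Map per-unit readings onto a 2-D grid, fill gaps, and normalize to [0, 1].

    readings: sensing-unit ID -> raw value, or None when the sample is missing
    layout:   sensing-unit ID -> (row, col) grid cell (the preset spatial layout)
    """
    rows, cols = shape
    grid: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]

    # Spatial position mapping: place each unit's reading in its grid cell.
    for unit_id, (r, c) in layout.items():
        grid[r][c] = readings.get(unit_id)

    # Missing data compensation: average of the valid 4-neighbours, else 0.
    # Cells that have no sensing unit at all are treated the same way.
    filled = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            if v is None:
                neigh = [grid[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols
                         and grid[rr][cc] is not None]
                v = sum(neigh) / len(neigh) if neigh else 0.0
            filled[r][c] = float(v)

    # Min-max normalization so downstream branches see values in [0, 1].
    lo = min(min(row) for row in filled)
    hi = max(max(row) for row in filled)
    span = hi - lo
    if span == 0:
        return [[0.0] * cols for _ in range(rows)]
    return [[(v - lo) / span for v in row] for row in filled]
```

Size alignment and resampling (for example, interpolating the grid to the network input size) would follow this step and are omitted here.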

[0082] In a specific engineering implementation, the model type used is cnn_lstm, the input size is 64×64, and the sequence length is 20. After training, the model can output the object category, material properties, contact state category and its corresponding confidence for the contact sequence of the target object, and simultaneously regress trend parameters such as slip ratio, average velocity, maximum velocity, average acceleration, maximum acceleration, attitude jump ratio and contact score, so as to be used for subsequent time series analysis and structured event generation.
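To make the stated configuration concrete, the sketch below records the three values given above (model type cnn_lstm, input size 64×64, sequence length 20) and shows one way a stream of preprocessed frames might be organized into fixed-length sequences for such a model. The names `ModelConfig` and `make_windows` and the `stride` parameter are hypothetical; the description does not say how consecutive sequences overlap.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ModelConfig:
    # Values taken from the engineering implementation described above.
    model_type: str = "cnn_lstm"
    input_size: Tuple[int, int] = (64, 64)
    sequence_length: int = 20


def make_windows(frames: Sequence[T], seq_len: int, stride: int = 1) -> List[List[T]]:
    """Split a frame stream into overlapping windows of exactly seq_len frames.

    stride is an assumed parameter: the offset between consecutive windows.
    Streams shorter than seq_len yield no windows.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    return [list(frames[i:i + seq_len])
            for i in range(0, len(frames) - seq_len + 1, stride)]


cfg = ModelConfig()
# Integers stand in for 64x64 frames to keep the example small.
windows = make_windows(list(range(25)), cfg.sequence_length, stride=5)
```

With 25 frames, a sequence length of 20, and a stride of 5, this yields two windows, covering frames 0 to 19 and 5 to 24.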

[0083] Specifically, the continuous contact state segmentation and structured contact event output proceed as follows. After completing the single-moment contact state recognition, the multimodal tactile fusion module 4 performs temporal analysis on the recognition results of multiple consecutive moments. The algorithm first filters the input frames for valid frames based on recognition confidence; it then merges the valid frames in chronological order to construct continuous contact state segments and extracts the start time, end time, and duration of each segment; on this basis, it combines the slip probability change trend, average contact score, contact center offset, and other local dynamic parameters to determine abnormal contact events; finally, the system master controller 5 generates the structured contact event output results. The output content of a structured contact event includes at least: event type, risk level, start time, end time, duration, corresponding contact state category, risk score, and triggering basis. The structured contact event results can be output by the system master controller 5 to the robot grasping controller, the host computer monitoring interface, the log recording module, or external application interfaces, to support grasping warnings, task playback, strategy correction, or anomaly protection.
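The valid-frame filtering and segment merging steps can be sketched as below. This is a minimal illustration under stated assumptions: the `Frame` and `Segment` types, the `min_confidence` cutoff of 0.6, and the rule that consecutive valid frames with the same state label extend the current segment are hypothetical choices, not details given in this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    t: float            # timestamp of the single-moment recognition result
    state: str          # recognized contact state category
    confidence: float   # recognition confidence in [0, 1]


@dataclass
class Segment:
    state: str
    start: float
    end: float
    n_frames: int

    @property
    def duration(self) -> float:
        return self.end - self.start


def segment_states(frames: List[Frame], min_confidence: float = 0.6) -> List[Segment]:
    """Filter low-confidence frames, then merge runs of equal state into segments."""
    # Valid-frame filtering based on recognition confidence.
    valid = sorted((f for f in frames if f.confidence >= min_confidence),
                   key=lambda f: f.t)

    segments: List[Segment] = []
    for f in valid:
        if segments and segments[-1].state == f.state:
            # Same state as the running segment: extend it.
            segments[-1].end = f.t
            segments[-1].n_frames += 1
        else:
            # State changed (or first frame): open a new segment.
            segments.append(Segment(f.state, f.t, f.t, 1))
    return segments
```

Note that a discarded low-confidence frame does not break a segment in this sketch: the valid frames on either side are merged if they share a state. Whether a gap should instead split the segment is a design choice the description leaves open.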

[0084] Specifically, the visualization output and control linkage include: the system master controller 5 can generate contact heatmaps, slip state overlay displays, continuous state comparison maps, and pressure 3D surface maps based on the recognition results output by the multimodal tactile fusion module 4, and display these visualization results on the monitoring interface, debugging interface, or host computer interface. Simultaneously, when the system detects abnormal contact events or an increased slip risk, the system master controller 5 can output control commands such as grasp correction, increased grip, speed limiting, re-grasping, or stop operation to the robot controller, forming a closed-loop mechanism of "tactile perception—state recognition—result display—control linkage." The system can output tactile overlay display maps, multi-state comparison maps, pressure heatmaps, and pressure 3D surface maps; among them, the multi-state comparison maps are used to display three typical states: stable contact, transitional contact, and slip contact, while the pressure heatmaps and pressure 3D surface maps are used to display the pressure distribution.

[0085] In a specific engineering implementation, Figure 8 shows the pressure tactile output while the humanoid robot's dexterous hand grasps a scissor-like object. The two-dimensional pressure heatmap on the left displays the pressure distribution within the contact area in a two-dimensional plane during the current grasping state. Areas with higher brightness or stronger color response indicate higher pressure at those locations, which can be used to characterize the contact center, local high-pressure areas, and contact coverage. The three-dimensional pressure surface diagram on the right shows the relative intensity of the pressure distribution in spatial coordinates at the same moment. The peak of the surface corresponds to a local pressure concentration area, and the extent of the surface corresponds to the contact area between the target object and the robot's dexterous hand. This schematic result visually reflects the location of force concentration, the overall contact distribution, and local pressure gradient changes while grasping the scissor-like object, thus providing a basis for contact scoring, grasping stability analysis, and subsequent control correction.

[0086] In a specific engineering implementation, Figure 9 shows the state recognition and trend output results for the humanoid robot's dexterous hand in contact with a scissor-like object at different contact stages. The upper part of the figure sequentially displays tactile images under three typical states: stable contact, transitional contact, and sliding contact. Each image corresponds to a representative moment in the local dynamic tactile interface and can be overlaid with information such as contact state category, object category, material properties, sliding probability, contact score, and dynamic parameters to characterize the differences in tactile response at different contact stages. The lower part of the figure shows the dynamic sliding trend, that is, the sliding probability over time during the same grasping process. The horizontal axis represents time, and the vertical axis represents the sliding risk level. The peak position in the figure corresponds to the moment when the sliding risk increases significantly. This schematic result visually represents how the contact with the scissor-like object evolves from stable contact to transitional contact and then to sliding contact during grasping, providing interpretable support for continuous state segmentation, abnormal contact judgment, and structured contact event output.

[0087] Example 4: This example provides a humanoid robot hand contact state perception system based on multimodal tactile fusion, comprising: a data acquisition module, used to acquire array pressure signals through pressure tactile sensors disposed in the contact area of the humanoid robot's dexterous hand, and to acquire deformation image sequences of the contact area through sliding tactile sensors; a feature extraction module, used to perform a first preprocessing operation on the array pressure signals to obtain pressure branch features, and to perform a second preprocessing operation on the deformation image sequences to obtain slip branch features; a fusion recognition module, used to fuse the pressure branch features and the slip branch features to obtain multimodal tactile fusion features, and to generate a single-moment recognition output result based on the multimodal tactile fusion features; a time-series analysis module, used to merge the recognition output results from multiple consecutive moments to obtain continuous contact state segments; and an anomaly detection and output module, used to determine abnormal contact events based on the state parameter determination rules of the continuous contact state segments, generate structured contact event output results, and send the structured contact event output results to the robot control system.

[0088] This embodiment also provides an electronic device, which includes a processor, a memory, a communication interface, a display screen, and an input device connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion.

[0089] This embodiment also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method proposed in the above embodiments.

[0090] The storage medium proposed in this embodiment belongs to the same inventive concept as the method proposed in the above embodiments. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.

[0091] Based on the above description of the implementation methods, those skilled in the art can clearly understand that the present invention can be implemented using software and necessary general-purpose hardware, and of course, it can also be implemented using hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory, random access memory, flash memory, hard disk, or optical disk, and includes several instructions to cause an electronic device (which may be a personal computer, server, or network device, etc.) to execute the method of the embodiments of the present invention.

[0092] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the present invention.

Claims

1. A method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion, characterized in that the method comprises: collecting array pressure signals by pressure tactile sensors disposed in the contact area of the humanoid robot's dexterous hand, and collecting deformation image sequences of the contact area by a sliding tactile sensor; performing a first preprocessing operation on the array pressure signals to obtain pressure branch features; performing a second preprocessing operation on the deformation image sequences to obtain slip branch features; fusing the pressure branch features and the slip branch features to obtain multimodal tactile fusion features, and generating a single-moment recognition output result based on the multimodal tactile fusion features; merging the recognition output results at multiple consecutive moments to obtain a continuous contact state segment; and, based on the state parameter determination rules of the continuous contact state segment, determining abnormal contact events, generating a structured contact event output result, and sending the structured contact event output result to the robot control system.

2. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 1, characterized in that: the pressure tactile sensor adopts a flexible array piezoresistive tactile sensing structure, including a flexible substrate layer, a flexible electrode layer, a carbon-black-doped silicone rubber piezoresistive layer, and a silicone rubber encapsulation layer; multiple pressure sensing units are arranged in a preset spatial layout in the contact area of the robot hand; the pressure tactile sensor synchronously collects the resistance change signals corresponding to each pressure sensing unit during the contact between the robot hand and the target object, forming the array pressure signals; based on the preset spatial layout, the array pressure signals are spatially mapped to construct a two-dimensional pressure distribution map of the contact area; and the first preprocessing operation is performed on the two-dimensional pressure distribution map to obtain a pressure distribution input tensor, from which the pressure branch features are extracted.

3. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 2, characterized in that: the sliding tactile sensor includes a transparent silicone rubber elastomer layer, a reflective film layer or a surface microtexture marking layer disposed on the surface of the transparent silicone rubber elastomer layer, a light source disposed on the back side of the transparent silicone rubber elastomer layer, and an imaging unit; under illumination by the light source, the imaging unit of the sliding tactile sensor continuously acquires local deformation images of the contact area during contact with the target object, forming the deformation image sequence; the second preprocessing operation is performed on the deformation image sequence to obtain a slip feature input; and, based on the slip feature input, the slip branch features are extracted, including local relative displacement changes, velocity changes, acceleration changes, slip ratio changes, and local contact abrupt change features in the contact area.

4. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 3, characterized in that generating the single-moment recognition output result based on the multimodal tactile fusion features includes: inputting the multimodal tactile fusion features into a pre-trained multi-task recognition network; and, through the forward computation of the multi-task recognition network, synchronously outputting the contact state category, object category, material properties, and contact trend parameters at the current moment; wherein the contact trend parameters include at least the probability of the slip contact state and the contact score; the probability of the slip contact state is obtained by normalizing the contact state category outputs; and the contact score is constructed from the two-dimensional pressure distribution map output by the pressure tactile sensor and is used to characterize the contact intensity and the contact coverage level.
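For illustration only, the two contact trend parameters named in claim 4 could be computed as in the following sketch. It assumes that the normalization is a softmax over the contact state category outputs and that the contact score is an equally weighted combination of mean contact intensity and contact coverage on a pressure map already normalized to [0, 1]; neither choice is stated in the claim, and the names `softmax`, `slip_probability`, and `contact_score`, the label strings, and the weights are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw category outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def slip_probability(logits: Sequence[float], classes: Sequence[str],
                     slip_label: str = "slip") -> float:
    """Probability assigned to the slip contact state (assumed: softmax)."""
    return softmax(logits)[list(classes).index(slip_label)]


def contact_score(pmap: List[List[float]], contact_threshold: float = 0.1,
                  w_intensity: float = 0.5, w_coverage: float = 0.5) -> float:
    """Assumed score: weighted sum of mean contact intensity and coverage.

    pmap is expected to be normalized to [0, 1], so the score is in [0, 1]
    when the two weights sum to 1.
    """
    cells = [v for row in pmap for v in row]
    active = [v for v in cells if v > contact_threshold]
    coverage = len(active) / len(cells)
    intensity = sum(active) / len(active) if active else 0.0
    return w_intensity * intensity + w_coverage * coverage
```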

5. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 4, characterized in that determining abnormal contact events based on the state parameter determination rules of the continuous contact state segment includes: calculating the contact center offset, the slip trend score, and the average contact score of the continuous contact state segment; when the contact center offset exceeds the contact center offset threshold range and the slip trend score exceeds the slip trend score threshold range, determining the current continuous contact state segment to be a slip abnormal event; when the contact center offset is lower than the contact center offset threshold range and the average contact score is continuously lower than the average contact score threshold range, determining the current continuous contact state segment to be an insufficient contact abnormal event; when the slip trend score is below the slip trend score threshold range and the average contact score exceeds the average contact score threshold range, determining the current continuous contact state segment to be a stable contact event; wherein the contact center offset, the slip trend score, and the average contact score are all calculated over the set of valid recognition frames corresponding to the continuous contact state segment; and when the number of valid frames in the continuous contact state segment is lower than a preset minimum frame number threshold, no abnormal event determination is performed on the continuous contact state segment, or only a low-confidence prompt result is output.
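The determination rules of claim 5 can be expressed as a short decision function. The sketch below is a simplification under stated assumptions: each "threshold range" is reduced to a single scalar threshold, the "continuously lower" condition is approximated by the segment average, and the names `Thresholds`, `classify_segment`, the returned label strings, and the fallback label `"undetermined"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Each threshold range in the claim is simplified to one scalar here.
    center_offset: float
    slip_trend: float
    avg_contact_score: float
    min_valid_frames: int


def classify_segment(center_offset: float, slip_trend: float,
                     avg_contact_score: float, n_valid_frames: int,
                     th: Thresholds) -> str:
    """Apply the segment-level rules in the order given in the claim."""
    # Too few valid frames: skip the judgment and flag low confidence.
    if n_valid_frames < th.min_valid_frames:
        return "low_confidence"
    # Large contact-center drift together with a high slip trend.
    if center_offset > th.center_offset and slip_trend > th.slip_trend:
        return "slip_abnormal"
    # Little drift but persistently weak contact.
    if center_offset < th.center_offset and avg_contact_score < th.avg_contact_score:
        return "insufficient_contact"
    # Low slip trend with a strong contact score.
    if slip_trend < th.slip_trend and avg_contact_score > th.avg_contact_score:
        return "stable_contact"
    # Combinations not covered by the rules above (assumed fallback).
    return "undetermined"
```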

6. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 1, characterized in that, after generating the structured contact event output result, the method further includes: based on the event type and risk level carried in the structured contact event output result, matching the corresponding robot control command from a preset control strategy mapping table; when the risk level is low risk, generating a prompt display instruction and a log recording instruction; when the risk level is medium risk, generating a gripping force increment correction instruction and an end-effector execution speed reduction correction instruction; when the risk level is high risk, generating a re-grasp instruction or an abnormal shutdown protection instruction; and sending the robot control commands to the robot grasping controller, motion control unit, or strategy decision unit to adjust the current grasping behavior or execute safety protection.
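A preset control strategy mapping table of the kind recited in claim 6 might look like the following sketch. The table keys, the command strings, the wildcard entry `"*"`, and the choice to issue a re-grasp for high-risk slip events and a protective stop for other high-risk events are all assumptions for illustration; the claim only fixes which kinds of instruction belong to each risk level.

```python
from typing import Dict, List, Tuple

# (event_type, risk_level) -> commands; "*" matches any event type.
STRATEGY_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("*", "low"): ["show_prompt", "write_log"],
    ("*", "medium"): ["increase_grip_force", "reduce_end_effector_speed"],
    ("slip_abnormal", "high"): ["regrasp"],   # assumed: slipping object is re-grasped
    ("*", "high"): ["protective_stop"],       # assumed default for other high-risk events
}


def match_commands(event_type: str, risk_level: str) -> List[str]:
    """Look up commands, preferring an event-specific entry over the wildcard."""
    for key in ((event_type, risk_level), ("*", risk_level)):
        if key in STRATEGY_TABLE:
            return list(STRATEGY_TABLE[key])  # copy so callers cannot mutate the table
    raise KeyError(f"no strategy for risk level {risk_level!r}")
```

Keeping the policy in a table rather than in branching code lets the mapping be tuned per robot or per task without changing the event-generation logic.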

7. The method for perceiving the hand contact state of a humanoid robot based on multimodal tactile fusion as described in claim 1, characterized in that, after generating the structured contact event output result, the method further includes: generating visualization information based on the structured contact event output result, the visualization information including a pressure distribution map of the contact area constructed from the array pressure signals output by the pressure tactile sensor, and a sliding state indicator and a sliding trend curve generated from the dynamic tactile information output by the sliding tactile sensor; overlaying the visualization information on the monitoring interface of the robot grasping process to show the contact state and changing trend between the robot hand and the target object; generating, based on the structured contact event output result, a grasping correction control command including a grasping force correction amount, a finger closure angle correction amount, an end-effector speed correction amount, or a local contact posture correction amount; and sending the grasping correction control command to the robot control system to adjust the grasping strategy or perform anomaly protection operations.

8. A humanoid robot hand contact state perception system based on multimodal tactile fusion, employing the humanoid robot hand contact state perception method based on multimodal tactile fusion as described in any one of claims 1 to 7, characterized in that the system comprises: a data acquisition module, used to acquire array pressure signals through pressure tactile sensors disposed in the contact area of the humanoid robot's dexterous hand, and to acquire deformation image sequences of the contact area through sliding tactile sensors; a feature extraction module, used to perform a first preprocessing operation on the array pressure signals to obtain pressure branch features, and to perform a second preprocessing operation on the deformation image sequences to obtain slip branch features; a fusion recognition module, used to fuse the pressure branch features and the slip branch features to obtain multimodal tactile fusion features, and to generate a single-moment recognition output result based on the multimodal tactile fusion features; a time-series analysis module, used to merge the recognition output results from multiple consecutive moments to obtain continuous contact state segments; and an anomaly detection and output module, used to determine abnormal contact events based on the state parameter determination rules of the continuous contact state segments, generate structured contact event output results, and send the structured contact event output results to the robot control system.

9. An electronic device comprising a memory and a processor, characterized in that: the memory is used to store computer-executable instructions, and when the processor executes the computer-executable instructions, it implements the steps of the humanoid robot hand contact state perception method based on multimodal tactile fusion as described in any one of claims 1 to 7.

10. A computer-readable storage medium having computer-executable instructions stored thereon, characterized in that: when the computer-executable instructions are executed by a processor, they implement the steps of the humanoid robot hand contact state perception method based on multimodal tactile fusion as described in any one of claims 1 to 7.