A visually adaptive dexterous hand massage strategy generation method for non-standard foot types
By reconstructing a four-dimensional spatiotemporal representation of non-standard feet from multi-view images and kinematic data, and by combining neural radiance fields with hierarchical deep reinforcement learning, the method addresses massage effectiveness and comfort for users with non-standard foot types and enables high-precision, personalized massage strategy generation for a dexterous hand.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHAANXI INTERESTING ARTIFICIAL INTELLIGENCE TECHNOLOGY CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies are ill-suited to the distinctive physiological structure of users with non-standard foot types, resulting in insufficient massage effectiveness and comfort. The dexterous hand itself occludes the visual sensors and disrupts visual guidance, traditional methods struggle to locate muscle origins and insertions and acupoints, and control strategies lack comprehensive decision-making based on global features.
A four-dimensional spatiotemporal representation is reconstructed from multi-view image sequences and kinematic data. The occluded region is completed using a neural radiance field model, a probabilistic feature map is constructed, and a massage strategy is generated by hierarchical deep reinforcement learning, which outputs target impedance parameters.
It achieves high-precision 3D reconstruction and massage strategy generation for users with non-standard foot types, improving the robustness and personalized adaptability of massage. The dexterous hand exhibits compliant and safe dynamic characteristics, adapting to unexpected contact and user preferences.
Figure CN122201617A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer vision technology, and in particular to a method for generating visually adaptive dexterous hand massage strategies for non-standard foot types.
Background Technology
[0002] With the integration of robotics and artificial intelligence, intelligent service robots are increasingly being used in fields such as healthcare and rehabilitation. Among these, robotic massage, as a technological approach capable of providing standardized and quantifiable therapeutic services, has attracted significant attention from the industry. To achieve precise physical interaction, these robots typically rely on computer vision to perceive the interaction target (such as a part of the human body) in three dimensions. Early 3D perception technologies, such as structured light scanning or binocular stereo vision, could acquire 3D point clouds or mesh models of targets in static scenes. In recent years, implicit neural representation methods, such as Neural Radiance Fields (NeRF), have achieved breakthroughs in the fidelity and detail of 3D reconstruction by learning continuous representations of scenes from multi-view 2D images, providing a new paradigm for constructing high-precision digital twins. At the robot control level, the generation of massage strategies has evolved from simple pre-programmed trajectories to control commands incorporating force-sensing feedback, aiming to improve the safety and comfort of the interaction.
[0003] However, these technologies still have limitations when applied to demanding, compliant, and personalized massage scenarios. First, traditional massage strategies are mostly planned on standard human anatomy models, which makes them difficult to adapt to different individuals, especially those with distinctive physiological structures such as flat feet or high arches, and significantly reduces massage effectiveness and comfort. Second, during massage tasks, the dexterous hand that performs the manipulation itself becomes a major source of dynamic occlusion, continuously obscuring the field of view of the visual sensors. Existing 3D reconstruction methods often produce severe geometric artifacts or model gaps when dealing with large-area dynamic occlusion caused by known objects, which breaks the visual guidance control loop and makes real-time, adaptive strategy adjustment impossible. Third, even with a personalized 3D model, robustly and accurately locating key biomechanical features such as muscle origins and insertions and acupoints on its non-Euclidean curved surface remains a problem that traditional methods such as template matching struggle to solve. In addition, existing control strategies often focus on local force and position control and lack an intelligent decision-making framework that can integrate global characteristics, balance multiple objectives (such as efficacy, comfort, and safety), and perform long-term planning.
Summary of the Invention
[0004] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.
[0005] In view of the aforementioned existing problems, this invention is proposed. Therefore, this invention provides a visually adaptive dexterous hand massage strategy generation method for non-standard foot types to solve the problems mentioned in the background art.
[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution: a method for generating a visually adaptive dexterous hand massage strategy for non-standard foot types, comprising:
[0007] Acquire multi-view image sequences from multiple visual sensors surrounding the foot, along with simultaneously acquired kinematic data that drives a dexterous hand to perform dynamic maneuvers;
[0008] Based on the multi-view image sequence and the kinematic data, a neural radiance field model is used to reconstruct the four-dimensional spatiotemporal representation of the foot under the dynamic occlusion of the dexterous hand. The neural radiance field model uses the occlusion information implied in the kinematic data as a prior condition for the reconstruction process to complete the occluded foot region.
[0009] Based on the four-dimensional spatiotemporal representation, a three-dimensional mesh model of the foot at a specific moment is extracted, and the three-dimensional mesh model is processed to construct a probabilistic feature map characterizing the individualized biomechanical features of non-standard foot types.
[0010] Using the probabilistic feature map and the real-time state of the dexterous hand as input, a pre-trained hierarchical deep reinforcement learning model is used to generate multi-scale policy instructions to drive the dexterous hand to perform massage tasks.
[0011] As a preferred embodiment of the visually adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the step of reconstructing, based on the multi-view image sequence and the kinematic data, the four-dimensional spatiotemporal representation of the foot under dynamic occlusion by the dexterous hand using a neural radiance field model, wherein the neural radiance field model uses the occlusion information implicit in the kinematic data as a prior condition for the reconstruction process to complete the occluded foot region, includes:
[0012] The kinematic data of the dexterous hand is converted into its time-varying geometric model in three-dimensional space, and the signed distance function from a spatial point to the time-varying geometric model is calculated to generate a differentiable occlusion field representing the occlusion relationship. The differentiable occlusion field is then fed to the neural radiance field model as one of its input conditions.
[0013] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, it further includes:
[0014] Spatiotemporal coherence regularization is applied to the deformation field that the neural radiance field model uses to model foot dynamics. The regularization aims to minimize the rate of change of the deformation field in the time dimension and the degree of non-rigid deformation in local regions in the spatial dimension.
[0015] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the construction of the probabilistic feature map includes:
[0016] At least one anatomical anchor point is defined on the three-dimensional mesh model, and one or more geodesic distance fields characterizing the intrinsic distance of the surface are calculated for each vertex on the mesh by solving a heat conduction equation based on the Laplace-Beltrami operator on the three-dimensional mesh model.
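The three stages of the heat-based geodesic computation can be illustrated on a deliberately simplified domain. The following is a minimal sketch, assuming a one-dimensional chain of nodes as a stand-in for the foot mesh; on a triangle mesh, a discrete Laplace-Beltrami operator and per-face gradients would take the place of the edge operator `G`. The function name, the time step `t = h*h`, and the chain itself are illustrative assumptions, not taken from the specification.

```python
import numpy as np

def heat_method_1d(n, source, h=1.0, t=None):
    """Heat-method geodesic distance on a chain of n nodes with spacing h."""
    t = h * h if t is None else t
    # Edge gradient operator, shape (n-1, n): (G u)_i = (u[i+1] - u[i]) / h
    G = (np.eye(n, k=1) - np.eye(n))[:-1] / h
    L = -G.T @ G                                   # Laplacian (negative semidefinite)
    delta = np.zeros(n)
    delta[source] = 1.0                            # heat source at the anchor point
    u = np.linalg.solve(np.eye(n) - t * L, delta)  # 1) short-time heat diffusion
    X = -np.sign(G @ u)                            # 2) unit field pointing away from the source
    # 3) Poisson step: least squares on G phi = X solves G^T G phi = G^T X
    phi = np.linalg.lstsq(G, X, rcond=None)[0]
    return phi - phi[source]                       # distance is zero at the anchor

dist = heat_method_1d(n=11, source=3, h=0.5)
```

On this chain the recovered field equals the exact intrinsic distance `0.5 * |i - 3|`; on a curved mesh the result is an approximation whose quality depends on the time step and the mesh resolution.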
[0017] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, it further includes:
[0018] The three-dimensional mesh model is abstracted into graph structure data, and the mesh vertices are used as graph nodes;
[0019] The graph structure data is processed using a graph neural network. The input node features of the graph neural network include the local geometric properties of the vertices and the geodesic distance field. The output is the membership probability of each node corresponding to a predefined biomechanical feature.
[0020] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the hierarchical deep reinforcement learning model includes:
[0021] A high-level policy network that takes the probabilistic feature map as input and generates parameterized subtask objectives at a low frequency;
[0022] A low-level policy network, which takes the sub-task objective and the real-time state of the dexterous hand as input, generates action instructions at a high frequency for completing the sub-task objective.
[0023] As a preferred embodiment of the visually adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the training process of the low-level policy network employs a composite reward function, which includes:
[0024] A therapeutic effect term, which is calculated based on the degree of matching between the pressure distribution applied by the dexterous hand and the probability distribution of target features in the probabilistic feature map;
[0025] A comfort term, which is calculated by penalizing the jerk of the motion of the dexterous hand end effector and the rate of change of the applied force;
[0026] A safety term, which applies a penalty when the force or joint torque applied by the dexterous hand approaches a preset safety threshold.
[0027] As a preferred embodiment of the visually adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the action command output by the low-level policy network is a set of target impedance parameters.
[0028] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the target impedance parameter includes:
[0029] The target stiffness matrix, target damping matrix, and target equilibrium point pose are used to set the dynamic characteristics of the interaction between the dexterous hand end and the foot.
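To make the role of the three impedance parameters concrete, the sketch below evaluates a common Cartesian spring-damper law, F = K (x_eq - x) - D ẋ. The specification does not state the control law used by the low-level controller, so this form is an assumption, and the pose is reduced to a 3D position for brevity. All numeric values are hypothetical.

```python
import numpy as np

def impedance_force(K, D, x_eq, x, x_dot):
    """Spring-damper law: F = K (x_eq - x) - D x_dot (translation only)."""
    return K @ (x_eq - x) - D @ x_dot

# Hypothetical parameters as a policy might output them.
K = np.diag([300.0, 300.0, 150.0])      # target stiffness, N/m (softer along z)
D = np.diag([20.0, 20.0, 10.0])         # target damping, N*s/m
x_eq = np.array([0.0, 0.0, -0.005])     # equilibrium point 5 mm below the contact surface
x = np.zeros(3)                         # current fingertip position
x_dot = np.array([0.0, 0.0, -0.01])     # fingertip moving into the tissue at 1 cm/s

F = impedance_force(K, D, x_eq, x, x_dot)
```

Because the commanded quantity is a stiffness, a damping, and an equilibrium point rather than a force, an unexpected foot movement changes `x` and `x_dot` and the resulting force adapts without a new policy output.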
[0030] As a preferred embodiment of the visual adaptive dexterous hand massage strategy generation method for non-standard foot types described in this invention, the method includes:
[0031] The user's individual identifier is used as contextual information and input into the hierarchical deep reinforcement learning model to adjust the generated massage strategy instructions online, thereby achieving personalized adaptation for specific users.
[0032] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0033] 1. By explicitly modeling the kinematic data of the dexterous hand as a differentiable occlusion field and using it as the geometric prior of the neural radiance field, the model can see through the occluding dexterous hand and perform robust, high-precision geometric and appearance completion of the occluded foot area, removing the bottleneck of visual guidance failure in dynamic interactive scenarios.
[0034] 2. This invention abandons the traditional rigid template matching method and uses a geodesic distance field based on heat conduction to construct an intrinsic coordinate system for the curved surface. It combines this with a graph neural network (GNN) that learns the mapping from intrinsic location and local geometry to biomechanical features. This allows the method to generate, in a non-rigid manner, probabilistic distribution maps of key regions (such as fascia and acupoints) on 3D foot models of arbitrary shape (including flat feet, high arches, etc.), greatly improving the robustness and generalization of feature localization.
[0035] 3. Furthermore, this invention outputs target impedance parameters at the control level rather than direct force or position commands, enabling the dexterous hand to exhibit compliant and safe dynamic characteristics when interacting with the foot and to adapt better to unexpected contact and user micro-movements. In addition, by using individual user identifiers as model input, the dexterous hand system can learn and generate personalized massage strategies tailored to specific user preferences, improving the therapeutic service experience.
Attached Figure Description
[0036] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:
[0037] Figure 1 is a flowchart of the overall process of the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
[0038] Figure 2 is a flowchart of the dynamic neural radiance field reconstruction and completion in the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
[0039] Figure 3 is a schematic diagram of the generation and execution of the hierarchical reinforcement learning massage strategy in the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
[0040] Figure 4 is a diagram showing the robustness verification results of the neural field reconstruction based on the SDF occlusion prior in the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
[0041] Figure 5 is a diagram showing the biomechanical feature localization map generated with the heat conduction equation in the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
[0042] Figure 6 is a diagram showing the adaptive impedance control results generated by hierarchical deep reinforcement learning in the visually adaptive dexterous hand massage strategy generation method for non-standard foot types according to an embodiment of the present invention.
Detailed Implementation
[0043] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.
[0044] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0045] Furthermore, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor does it refer to a separate or alternative embodiment that is mutually exclusive with other embodiments.
[0046] This invention is described in detail with reference to the schematic diagrams. When detailing the embodiments of this invention, for ease of explanation, the cross-sectional views illustrating the device structure may be partially enlarged, not adhering to the usual scale. Furthermore, the schematic diagrams are merely examples and should not be construed as limiting the scope of protection of this invention. In actual fabrication, the three-dimensional spatial dimensions of length, width, and depth should be included.
[0047] Furthermore, in the description of this invention, it should be noted that the terms "upper," "lower," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. These terms are used solely for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. In addition, the terms "first," "second," or "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
[0048] Unless otherwise explicitly specified and limited, the terms "installation," "connection," and "joining" in this invention should be interpreted broadly. For example, they can refer to fixed connections, detachable connections, or integral connections; similarly, they can refer to mechanical connections, electrical connections, or direct connections, or indirect connections through an intermediate medium, or internal connections between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0049] Example 1
[0050] Referring to Figures 1 to 3, this is the first embodiment of the present invention, which provides a method for generating a visually adaptive dexterous hand massage strategy for non-standard foot types, including:
[0051] S1. Acquire a multi-view image sequence from multiple visual sensors surrounding the foot, along with simultaneously acquired kinematic data that drives a dexterous hand to perform dynamic operations.
[0052] It should be noted that, to ensure comprehensive observation of the foot and its surrounding space, this embodiment employs at least four industrial-grade global shutter cameras as visual sensors. These visual sensors can be arranged in a ring or hemispherical array around the foot to be massaged. The global shutter cameras are chosen to avoid the rolling shutter effect caused by line-by-line exposure during potentially rapid hand and foot movements, thus ensuring the geometric consistency of each frame at the moment of a single exposure. Furthermore, the layout of the visual sensor array is designed to cover all key surfaces of the foot, including the sole, instep, and inner and outer sides.
[0053] Furthermore, before data acquisition begins, the system composed of multiple vision sensors must be rigorously calibrated to obtain the intrinsic and extrinsic parameters of each camera. The calibration can use the classic Zhang Zhengyou calibration method, in which a checkerboard calibration board is presented in different poses within the scene. After calibration, for the $i$-th camera, its intrinsic parameter matrix $\mathbf{K}_i$ and its extrinsic parameter matrix $[\mathbf{R}_i \mid \mathbf{t}_i]$ relative to the world coordinate system are obtained. The intrinsic parameter matrix $\mathbf{K}_i$ describes the internal optical characteristics of the camera and has the following form:
[0054] $$\mathbf{K}_i = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
[0055] where $f_x$ and $f_y$ are the focal lengths of the camera in the x and y directions, respectively, $(c_x, c_y)$ are the principal point coordinates, and $s$ is the axis skew parameter. The extrinsic parameter matrix $[\mathbf{R}_i \mid \mathbf{t}_i]$ defines the transformation from the world coordinate system to the coordinate system of the $i$-th camera and consists of a rotation matrix $\mathbf{R}_i$ and a translation vector $\mathbf{t}_i$.
[0056] It should be noted that the intrinsic parameter matrix and the extrinsic parameter matrix together constitute the projection matrix of the $i$-th camera, $\mathbf{P}_i = \mathbf{K}_i [\mathbf{R}_i \mid \mathbf{t}_i]$. This projection matrix is the geometric basis for volume rendering of the neural radiance field model.
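As an illustration of how the calibrated parameters are used, the following sketch composes a projection matrix of the form P = K [R | t] and projects world points to pixel coordinates. The calibration values are hypothetical placeholders, not values from this embodiment, and lens distortion is ignored.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Compose the 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X_world):
    """Project an (n, 3) array of world points to (n, 2) pixel coordinates."""
    X_h = np.hstack([X_world, np.ones((X_world.shape[0], 1))])  # homogeneous coordinates
    x = (P @ X_h.T).T
    return x[:, :2] / x[:, 2:3]                                 # perspective division

# Hypothetical calibration result for one 1920x1080 camera.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])      # world origin lies 2 m in front of the camera

P = projection_matrix(K, R, t)
pixels = project(P, np.array([[0.0, 0.0, 0.0],
                              [0.1, 0.0, 0.0]]))
```

The world origin lands on the principal point (960, 540), and a point 0.1 m to its side lands 50 pixels away, which is the focal length times the lateral offset divided by the depth.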
[0057] Specifically, during the acquisition process, all cameras continuously capture high-resolution (e.g., 1920×1080 pixels) RGB images at a preset frame rate (e.g., 30 frames per second), forming $N$ parallel image sequences $\{I_i^t \mid i = 1, \dots, N;\ t = 1, \dots, T\}$, where $N$ is the number of cameras, $T$ is the total number of frames in the sequence, and $I_i^t$ denotes the image frame captured by the $i$-th camera at timestamp $t$.
[0058] Furthermore, in parallel with visual data acquisition, the system records in real time the underlying data that drives the movements of the dexterous hand. In this embodiment, the dexterous hand is defined as a multi-degree-of-freedom robotic hand whose movements are driven by a robot controller. The acquired kinematic data mainly consists of the joint angle vector $\mathbf{q}_t \in \mathbb{R}^{D}$ of the dexterous hand at each timestamp $t$, where $D$ is the number of degrees of freedom of the dexterous hand. The joint angle vector can be represented as:
[0059] $$\mathbf{q}_t = [q_{t,1}, q_{t,2}, \dots, q_{t,D}]^{\top}$$
[0060] It should be noted that this kinematic data can be read directly from the real-time interface of the robot controller (for example via ROS or an EtherCAT bus) at high frequency (e.g., 100 Hz or higher). The joint angle vector $\mathbf{q}_t$ is the most direct description of the configuration of the dexterous hand at time $t$. Joint angles are collected rather than the end-effector pose because the forward kinematics model can then unambiguously compute the precise position and orientation of any link or fingertip of the dexterous hand in three-dimensional space, providing strong prior knowledge about the geometry and spatial position of the occluder for the neural radiance field model.
[0061] Furthermore, after the aforementioned visual and kinematic data are obtained, data synchronization is also necessary. Its purpose is to ensure that, for any timestamp $t$, the multi-view images $\{I_i^t\}_{i=1}^{N}$ and the joint angle vector $\mathbf{q}_t$ of the dexterous hand at that moment correspond precisely in time. Without synchronization, timing errors cause a mismatch between the position of the dexterous hand in the images and the position calculated from the kinematic data during subsequent reconstruction, severely degrading the accuracy of occluded-area completion. Therefore, this embodiment employs a hardware-triggered synchronization scheme. A master clock generator simultaneously sends synchronization pulse signals to all cameras and the robot controller. Upon receiving the pulse, each camera immediately starts its exposure, while the robot controller records the current joint angles at the instant the signal arrives. In this way, the timestamp error between the data streams can be kept at the microsecond level. Finally, a multimodal data tuple that is strictly aligned in space and time is output. For each discrete timestamp $t$, the system generates a data packet $\mathcal{D}_t$:
[0062] $$\mathcal{D}_t = \left( \{I_i^t\}_{i=1}^{N},\ \mathbf{q}_t \right)$$
[0063] where $I_i^t$ is the visual observation from the $i$-th viewpoint at timestamp $t$.
[0064] S2. Based on the multi-view image sequences and the kinematic data, a neural radiance field model is used to reconstruct the four-dimensional spatiotemporal representation of the foot under dynamic occlusion by the dexterous hand. The neural radiance field model uses the occlusion information implied in the kinematic data as a prior condition for the reconstruction process to complete the occluded foot region.
[0065] It should be noted that this step addresses the severe dynamic occlusion caused by the actuator itself in scenarios where a dexterous hand interacts with a human body. To this end, this invention incorporates the kinematic prior of the dexterous hand into a dynamic neural radiance field model to reconstruct the non-rigid dynamics of the foot itself. At the same time, it explicitly models the occluding object and uses this model to guide the completion of the occluded area (see Figure 2).
[0066] Furthermore, for the neural radiance field model to account for the occlusion, the kinematic data of the dexterous hand acquired in step S1 must first be transformed into a mathematical expression that describes the occlusion relationship in three-dimensional space. This transformation processes the occlusion information in a continuous and differentiable manner, so that it can be integrated directly into the gradient-based training of the neural network.
[0067] Specifically, first, for any timestamp $t$, the forward kinematics (FK) model of the dexterous hand and a pre-built 3D mesh model of the dexterous hand (usually described in URDF or SDF format) are used to map the joint angle vector $\mathbf{q}_t$ to the poses of all links of the dexterous hand in the world coordinate system. From these poses, the time-varying geometric model $\mathcal{M}_t$ of the dexterous hand at each moment is constructed. Then, to convert this mesh model into a continuous and differentiable field, this invention computes the signed distance function (SDF) from any point $\mathbf{x}$ in space to the surface of $\mathcal{M}_t$. This signed distance function $\phi(\mathbf{x}, t)$ is defined as:
[0068] $$\phi(\mathbf{x}, t) = s(\mathbf{x}, t)\, \min_{\mathbf{p} \in \partial\mathcal{M}_t} \lVert \mathbf{x} - \mathbf{p} \rVert_2$$
[0069] where $\mathbf{x}$ is the coordinate of the query point in space; $t$ is the timestamp; $\mathcal{M}_t$ is the three-dimensional geometric model of the dexterous hand at time $t$; $\mathbf{p}$ is a point on the surface $\partial\mathcal{M}_t$ of the model; $s(\mathbf{x}, t)$ is the sign, which is positive when the point $\mathbf{x}$ lies outside $\mathcal{M}_t$, negative when it lies inside, and zero on the surface; and $\lVert \cdot \rVert_2$ denotes the Euclidean distance.
[0070] It should be noted that this SDF field provides a continuous measure of whether a point in space is occupied by a dexterous hand.
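The path from joint angles to a signed distance field can be sketched with a toy model. The code below assumes a planar serial chain whose links are approximated by capsules; the embodiment itself uses the full hand mesh posed by its forward kinematics model, so the chain, the capsule radius, and all function names here are illustrative assumptions.

```python
import numpy as np

def forward_kinematics(q, link_lengths):
    """Planar serial chain: joint positions in the world frame, shape (len(q) + 1, 3)."""
    pts = [np.zeros(3)]
    angle = 0.0
    for qi, li in zip(q, link_lengths):
        angle += qi                                   # joint angles accumulate along the chain
        pts.append(pts[-1] + li * np.array([np.cos(angle), np.sin(angle), 0.0]))
    return np.array(pts)

def capsule_sdf(x, a, b, radius):
    """Signed distance from points x (n, 3) to a capsule around the segment a-b."""
    ab = b - a
    s = np.clip((x - a) @ ab / (ab @ ab), 0.0, 1.0)   # parameter of the closest point
    closest = a + s[:, None] * ab
    return np.linalg.norm(x - closest, axis=1) - radius

def hand_sdf(x, q, link_lengths, radius=0.01):
    """SDF of the union of link capsules (minimum over links).

    The minimum is exact outside the hand; inside overlapping capsules it is
    only a bound on the true signed distance."""
    joints = forward_kinematics(q, link_lengths)
    d = [capsule_sdf(x, joints[k], joints[k + 1], radius)
         for k in range(len(link_lengths))]
    return np.min(d, axis=0)

q_t = np.array([0.0, np.pi / 2])                      # hypothetical two-joint finger
queries = np.array([[0.025, 0.0, 0.0],                # on the axis of the first link
                    [0.05, 0.07, 0.0]])               # 3 cm beyond the fingertip
phi = hand_sdf(queries, q_t, link_lengths=[0.05, 0.04])
```

The first query lies inside the finger and receives a negative value; the second lies in free space and receives a positive one, matching the sign convention above.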
[0071] Furthermore, to better serve the volume rendering process of the neural radiance field and to keep it differentiable with respect to the model parameters, this invention converts the SDF into a differentiable occlusion field, denoted as $O(\mathbf{x}, t)$. The conversion uses a sigmoid function, which maps the SDF value $\phi(\mathbf{x}, t)$ to a value in $[0, 1]$ representing the probability that the point is occluded by the dexterous hand:
[0072] $$O(\mathbf{x}, t) = \frac{1}{1 + \exp\left(\beta\, \phi(\mathbf{x}, t)\right)}$$
[0073] where $\beta$ is a positive scalar hyperparameter that controls the sharpness of the SDF-to-probability transformation. A value that is too small blurs the boundary of the occlusion field, while a value that is too large causes vanishing gradients and harms training stability. In this embodiment, $\beta$ can be set in the range [10, 100], and is preferably 50. When $\phi(\mathbf{x}, t)$ is a large negative number (the point is deep inside the dexterous hand), $-\beta\, \phi(\mathbf{x}, t)$ is a large positive number and $O(\mathbf{x}, t)$ approaches 1; when $\phi(\mathbf{x}, t)$ is a large positive number (the point is far from the dexterous hand), $-\beta\, \phi(\mathbf{x}, t)$ is a large negative number and $O(\mathbf{x}, t)$ approaches 0.
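The SDF-to-occlusion mapping is small enough to show directly. This sketch evaluates the sigmoid of the negated, scaled SDF in a numerically stable way; the function name and the sample SDF values are assumptions, and the specification does not state the length unit of the SDF, so the inputs are treated as plain numbers.

```python
import numpy as np

def occlusion_field(sdf, beta=50.0):
    """O = sigmoid(-beta * sdf) = 1 / (1 + exp(beta * sdf)), evaluated without overflow."""
    z = -beta * np.asarray(sdf, dtype=float)
    e = np.exp(-np.abs(z))                       # always in (0, 1], never overflows
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

sdf_values = np.array([-1.0, -0.02, 0.0, 0.02, 1.0])
occ = occlusion_field(sdf_values)
# Gradient with respect to the SDF: dO/dphi = -beta * O * (1 - O)
grad = -50.0 * occ * (1.0 - occ)
```

The gradient expression shows why a very large sharpness value is harmful: away from the surface `O * (1 - O)` is almost zero, so almost no gradient reaches the model there.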
[0074] It should be noted that this differentiable occlusion field $O(\mathbf{x}, t)$ constitutes a key input condition: it directly informs the neural radiance field model of the likelihood that each point in space is occupied by the dexterous hand, thus forming a strong geometric prior.
[0075] Furthermore, this invention introduces a deformation field into the dynamic neural radiance field model to capture the non-rigid motion of the foot. The aim is to decompose the dynamic scene in four-dimensional spacetime $(x, y, z, t)$ into a static canonical space and a deformation field that maps points from the observation space to the canonical space.
[0076] Specifically, the dynamic neural radiance field model based on deformation fields mainly consists of two multilayer perceptron (MLP) networks:
[0077] The deformation network, denoted as $\Psi_{\mathrm{def}}$. This network is an MLP; in this embodiment it has 8 hidden layers with 256 neurons per layer and uses the ReLU activation function. Its input is the position encoding $\gamma(\mathbf{x})$ of the coordinates of a point $\mathbf{x}$ in the observation space together with the timestamp $t$. Its output is the displacement vector $\Delta\mathbf{x}$ of this point and a geometric feature vector $\mathbf{f}_{\mathrm{geo}}$. Through this deformation network, a point in the observation space is mapped to the corresponding point in the canonical space, $\mathbf{x}_c = \mathbf{x} + \Delta\mathbf{x}$. The geometric feature $\mathbf{f}_{\mathrm{geo}}$ primarily encodes time-dependent local geometric information, providing richer context for the canonical network.
[0078] The canonical network, denoted as $\Psi_{\mathrm{can}}$. This network uses the same MLP structure as the deformation network, namely 8 hidden layers with 256 neurons each. Its input includes: the position encoding $\gamma(\mathbf{x}_c)$ of the canonical point $\mathbf{x}_c$; the position encoding $\gamma(\mathbf{d})$ of the camera viewing direction $\mathbf{d}$; the value of the differentiable occlusion field $O(\mathbf{x}, t)$ calculated at the original location of the point; and the geometric feature $\mathbf{f}_{\mathrm{geo}}$ from the deformation network. The output of the canonical network is the volume density $\sigma$ at that point and the view-dependent color $\mathbf{c}$.
[0079] It should be noted that feeding the differentiable occlusion field to the canonical network allows the network to take into account whether a point is occupied by the dexterous hand when predicting its volume density. During training, if a point is occupied by the dexterous hand with high probability (i.e., the value of the differentiable occlusion field is close to 1), the network is guided to predict a very low volume density. This allows the network to "see through" the dexterous hand during rendering and focus on learning the geometry of the foot behind it. Conversely, if a point is not occluded (the value of the differentiable occlusion field is close to 0), the network learns the density of that point normally from the visual signal. In this way, the kinematic prior is integrated into the dynamic neural radiance field in an end-to-end differentiable form, achieving effective geometric and appearance completion of occluded foot regions.
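The data flow through the two networks can be sketched with untrained weights. The layer count and width follow the embodiment (8 hidden layers of 256 ReLU units); everything else is an assumption made for illustration: the number of encoding frequencies, the size of the geometric feature vector, the output activations, and the simple concatenation of all canonical-network inputs at the first layer.

```python
import numpy as np

def posenc(v, n_freqs):
    """Sinusoidal encoding: [v, sin(2^k v), cos(2^k v)] for k < n_freqs."""
    out = [v]
    for k in range(n_freqs):
        out += [np.sin(2.0**k * v), np.cos(2.0**k * v)]
    return np.concatenate(out, axis=-1)

def init_mlp(rng, d_in, d_out, width=256, depth=8):
    """Random weights for an MLP with `depth` hidden layers of `width` units."""
    dims = [d_in] + [width] * depth + [d_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp(params, h):
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)          # ReLU hidden layers
    W, b = params[-1]
    return h @ W + b                            # linear output layer

F_GEO = 16                                      # assumed geometric feature size
D_POS = 3 * (1 + 2 * 10)                        # encoded position, 10 frequencies
D_DIR = 3 * (1 + 2 * 4)                         # encoded view direction, 4 frequencies
rng = np.random.default_rng(0)
deform_net = init_mlp(rng, D_POS + 1, 3 + F_GEO)
canon_net = init_mlp(rng, D_POS + D_DIR + 1 + F_GEO, 1 + 3)

def query(x, t, view_dir, occ):
    """x, view_dir: (n, 3); t, occ: (n, 1). Returns density (n,) and color (n, 3)."""
    out = mlp(deform_net, np.concatenate([posenc(x, 10), t], axis=-1))
    dx, f_geo = out[:, :3], out[:, 3:]
    x_c = x + dx                                # observation space -> canonical space
    feats = np.concatenate(
        [posenc(x_c, 10), posenc(view_dir, 4), occ, f_geo], axis=-1)
    raw = mlp(canon_net, feats)
    sigma = np.logaddexp(0.0, raw[:, 0])        # softplus keeps the density non-negative
    rgb = 1.0 / (1.0 + np.exp(-raw[:, 1:]))     # sigmoid keeps the color in [0, 1]
    return sigma, rgb
```

Note that the occlusion value is evaluated at the observed point `x`, not at the canonical point `x_c`, because the hand occupies the observation space at time `t`.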
[0080] Furthermore, the training objective of the model is to minimize the difference between the rendered image and the actually observed image while keeping the reconstruction physically plausible. For any camera viewpoint and timestamp $t$, a batch of pixels is randomly sampled from the corresponding image, and each pixel corresponds to a ray $\mathbf{r}(s) = \mathbf{o} + s\mathbf{d}$, where $\mathbf{o}$ is the center of the camera and $\mathbf{d}$ is the direction vector passing through that pixel. The color $\hat{C}(\mathbf{r})$ of that ray can be calculated using the discrete form of the volume rendering equation:
[0081] $$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i, \qquad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$
[0082] where $T_i$ is the cumulative transmittance, $\sigma_i$ and $\mathbf{c}_i$ are the volume density and color of the $i$-th sampling point on the ray, calculated via the deformation network and the canonical network, and $\delta_i$ is the distance between adjacent sampling points.
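The discrete volume rendering sum can be sketched for a single ray as follows. This is a minimal illustration rather than the implementation of this invention: the function name `render_ray` and the toy sample values are assumptions, and a real pipeline would obtain the densities and colors by querying the deformation and canonical networks.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Discrete volume rendering along one ray.

    sigmas: volume density at each sample along the ray
    colors: (r, g, b) color at each sample
    deltas: distance between adjacent samples
    """
    transmittance = 1.0            # T_1 = 1: nothing has absorbed light yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha             # running exp(-sum sigma_j * delta_j)
    return pixel, transmittance

# Toy ray (assumed values): an empty sample in front, e.g. a point inside the
# hand that was assigned zero density, followed by a dense red foot sample.
pixel, remaining = render_ray(
    [0.0, 50.0],
    [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    [0.1, 0.1],
)
```

Because the first sample has zero density, it contributes nothing to the pixel, which is the "see through" behavior the occlusion field is meant to encourage.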
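[0083] Furthermore, the training loss function is a composite function consisting mainly of the following parts:
[0084] The first part is the photometric loss, i.e., the L2 loss between the rendered color and the actual pixel color:
[0085] $$\mathcal{L}_{\text{photo}} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \right\|_2^2$$
[0086] where $\mathcal{R}$ is the set of all rays in a training batch, $\hat{C}(\mathbf{r})$ is the rendered color of ray $\mathbf{r}$, and $C(\mathbf{r})$ is the actual pixel color corresponding to that ray.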
[0087] The second part is the spatiotemporal coherence regularization loss. To ensure that the non-rigid deformation of the foot is physically smooth and continuous, and to avoid temporal flickering and spatial tearing, this invention applies regularization constraints to the deformation network. The spatiotemporal coherence regularization loss aims to minimize the rate of change of the deformation field in the time dimension and the degree of non-rigid deformation within local regions in the spatial dimension. It consists of a temporal coherence term and a spatial coherence (local rigidity) term.
[0088] Specifically, the temporal coherence term penalizes the time derivative of the deformation field, encouraging smooth foot motion across consecutive moments. In discrete time, this term is approximated by the difference in deformation displacement between adjacent frames:
[0089] $$\mathcal{L}_{\text{temp}} = \mathbb{E}_{t \sim \mathcal{T},\ \mathbf{x} \sim \Omega} \left[ \left\| \Delta(\mathbf{x}, t+1) - \Delta(\mathbf{x}, t) \right\|_2^2 \right]$$
[0090] where $\Delta(\mathbf{x}, t)$ is the deformation displacement predicted by the deformation network for point $\mathbf{x}$ at time $t$, $t \sim \mathcal{T}$ indicates that a time point $t$ is randomly selected from the set of time points $\mathcal{T}$, and $\mathbf{x} \sim \Omega$ indicates that a point is randomly selected from the spatial region $\Omega$.
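The temporal coherence term can be estimated by random sampling, as in the sketch below. The squared-difference form, the function names, and the two toy displacement fields are assumptions made for illustration; in practice the displacement would come from the deformation network.

```python
import random

def temporal_coherence_loss(displacement, times, points, num_samples=256, seed=0):
    """Monte Carlo estimate of E ||D(x, t_next) - D(x, t)||^2 over adjacent frames."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        i = rng.randrange(len(times) - 1)   # random frame that has a successor
        x = rng.choice(points)              # random point in the spatial region
        before = displacement(x, times[i])
        after = displacement(x, times[i + 1])
        total += sum((a - b) ** 2 for a, b in zip(after, before))
    return total / num_samples

# Hypothetical stand-ins for the deformation network.
def static_field(x, t):
    return (0.0, 0.0, 0.0)

def drifting_field(x, t):
    return (0.5 * t, 0.0, 0.0)              # displacement grows linearly in time

times = [0.0, 0.1, 0.2, 0.3]
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.05)]
static_loss = temporal_coherence_loss(static_field, times, points)
drift_loss = temporal_coherence_loss(drifting_field, times, points)
```

A field that does not change between frames incurs no penalty, while the drifting field is penalized in proportion to the squared per-frame displacement change.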
[0091] Specifically, the spatial coherence term penalizes the degree to which the Jacobian matrix of the deformation field deviates from a rigid transformation (i.e., a rotation). Within a small neighborhood, the deformation of the foot should be approximately rigid:
[0092] $$\mathcal{L}_{\text{spatial}} = \mathbb{E}_{t \sim \mathcal{T},\ \mathbf{x} \sim \Omega} \left[ \left\| J^{\top} J - I \right\|_F^2 \right]$$
[0093] where $J$ is the Jacobian matrix of the deformation with respect to the spatial position $\mathbf{x}$, $I$ is the identity matrix, and $\|\cdot\|_F$ is the Frobenius norm.
[0094] It should be noted that the spatial coherence term encourages the Jacobian matrix to be orthogonal, thereby making local deformations approximate rotations and suppressing implausible stretching and compression.
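The following sketch shows why an orthogonality penalty of the form $\|J^{\top}J - I\|_F^2$ leaves rotations unpenalized but reacts to stretching. The function name and the two example Jacobians are illustrative assumptions.

```python
import math

def rigidity_penalty(J):
    """Squared Frobenius norm of J^T J - I for a square Jacobian (nested lists)."""
    n = len(J)
    total = 0.0
    for a in range(n):
        for b in range(n):
            jtj = sum(J[k][a] * J[k][b] for k in range(n))   # entry (a, b) of J^T J
            target = 1.0 if a == b else 0.0
            total += (jtj - target) ** 2
    return total

theta = math.radians(30.0)
rotation = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta),  math.cos(theta), 0.0],
    [0.0,              0.0,             1.0],
]
stretch = [
    [2.0, 0.0, 0.0],   # doubles lengths along x
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
rotation_penalty = rigidity_penalty(rotation)
stretch_penalty = rigidity_penalty(stretch)
```

The rotation yields a penalty of zero up to rounding, whereas the stretch gives $J^{\top}J - I = \mathrm{diag}(3, 0, 0)$ and therefore a penalty of 9.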
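[0095] Furthermore, the spatiotemporal coherence regularization loss is:
[0096] $$\mathcal{L}_{\text{reg}} = \mathcal{L}_{\text{temp}} + \mathcal{L}_{\text{spatial}}$$
[0097] Furthermore, the final total loss function is:
[0098] $$\mathcal{L}_{\text{total}} = \lambda_{\text{photo}} \mathcal{L}_{\text{photo}} + \lambda_{\text{reg}} \mathcal{L}_{\text{reg}}$$
[0099] where $\mathcal{L}_{\text{temp}}$ and $\mathcal{L}_{\text{spatial}}$ are the temporal and spatial coherence terms, $\mathcal{L}_{\text{photo}}$ is the photometric loss, and $\lambda_{\text{photo}}$ and $\lambda_{\text{reg}}$ are preset weight hyperparameters with values in [0,1], used to balance reconstruction fidelity against physical realism. $\lambda_{\text{reg}}$ takes values in [0.01, 0.1], which effectively suppresses spatiotemporal noise while preserving geometric detail.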
[0100] Furthermore, once the total loss function is obtained, an optimizer such as Adam can be used to minimize it, thereby training the parameters of both the deformation network and the canonical network end-to-end. After training is complete, the continuous volume density field of the foot at any given timestamp can be obtained by querying the trained model. Finally, the classic Marching Cubes algorithm is used to extract an isosurface from the volume density field at that moment with a preset density threshold (e.g., 25.0), yielding a complete, unoccluded 3D mesh model of the foot at that moment.
[0101] S3. Based on the four-dimensional spatiotemporal representation, extract the three-dimensional mesh model of the foot at a specific moment, process the three-dimensional mesh model, and construct a probabilistic feature map that represents the individualized biomechanical characteristics of non-standard foot types.
[0102] It should be noted that, due to the significant differences between the geometry of non-standard foot types and standard models, traditional methods based on template matching or geometric registration are insufficient for robustly locating key regions (such as acupoints, muscle groups, and fascia). Therefore, this invention constructs a probabilistic feature map that can locate biomechanical features on any non-standard foot surface in a non-rigid, adaptive manner.
[0103] Furthermore, to establish a comparable coordinate system between foot models of different shapes, this invention uses the intrinsic geometry of the surface. Unlike Euclidean distance, geodesic distance is the length of the shortest path between two points along the surface. It is preserved under isometric non-rigid deformations such as bending and is therefore much less sensitive to shape changes than Euclidean distance. This property makes it an ideal measure for describing the relative positional relationships of points on non-standard foot shapes.
[0104] Specifically, professionals (such as traditional Chinese medicine practitioners or rehabilitation therapists) first interactively annotate one or more anatomically stable and easily identifiable anchor points on the 3D mesh model. For example, bony landmarks such as the calcaneal tuberosity and the tip of the medial malleolus can be selected as anchor points. These anchor points serve as the reference origins for calculating intrinsic distances. Subsequently, the heat equation defined by the Laplace-Beltrami operator of the 3D mesh model is solved to efficiently approximate the geodesic distances from all vertices on the mesh to the anchor points. It should be noted that, compared with the traditional Dijkstra graph search algorithm, the heat method produces a smoother distance field and is less sensitive to local topological noise in the mesh.
[0105] Specifically, the heat equation describes the diffusion of heat over a curved surface, and for short diffusion times the logarithm of its solution is proportional to the negative squared geodesic distance from the heat source. The specific calculation process is as follows:
[0106] S301. For the three-dimensional mesh model, the discrete Laplace-Beltrami operator $L$ is usually defined using cotangent weights. For the $i$-th vertex $v_i$ of the mesh, applying the operator to a vertex function $f$ gives:
[0107] $$(Lf)_i = \frac{1}{2A_i} \sum_{j \in N(i)} \left(\cot\alpha_{ij} + \cot\beta_{ij}\right)\left(f_j - f_i\right)$$
[0108] where $N(i)$ is the set of vertices adjacent to vertex $v_i$; $v_j$ is one such adjacent vertex; $\alpha_{ij}$ and $\beta_{ij}$ are the interior angles opposite the edge $(v_i, v_j)$ in the two triangles sharing that edge; and $A_i$ is the area of the Voronoi region, or of the mixed Voronoi region, of vertex $v_i$.
[0109] S302. The anchor point is taken as the initial heat source; that is, an initial heat distribution $u_0$ is defined whose value is 1 at the anchor point and 0 at all other vertices. Solving the heat equation in discrete (backward Euler) form then yields a system of linear equations:
[0110] $$(I - tL)\,u_t = u_0$$
[0111] where $I$ is the identity matrix, $L$ is the discrete Laplace-Beltrami operator, and $u_t$ is the heat distribution after time $t$.
[0112] It should be noted that this system of linear equations can be solved efficiently using a sparse matrix solver (such as Cholesky decomposition). Furthermore, the time step $t$ is a key parameter of the linear system because it controls the range of heat diffusion. Its value should be selected according to the average edge length and size of the mesh, so that the heat covers the entire model without being overly smoothed.
[0113] S303. After the heat distribution $u_t$ is obtained, the geodesic distance $d_i$ from each vertex $v_i$ to the anchor point can be approximated using the heat-geodesic formula:
[0114] $$d_i \approx \sqrt{-4t \ln u_t(v_i)}$$
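Steps S302 and S303 can be illustrated on the simplest possible "mesh", a chain of unit-spaced vertices, where the linear system is tridiagonal. This is a sketch under stated assumptions: the chain replaces the foot surface, the Thomas algorithm stands in for a sparse Cholesky solver, and the heat-to-distance mapping assumes the Varadhan-type relation $d \approx \sqrt{-4t \ln u}$. The resulting values increase with true distance but are not metrically exact.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0] and upper[-1] are unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def heat_distance_on_chain(n, anchor, t):
    """Solve (I - t L) u = u0 on a chain of n vertices, then map heat to distance."""
    lower = [0.0] + [-t] * (n - 1)
    upper = [-t] * (n - 1) + [0.0]
    diag = [1.0 + 2.0 * t] * n
    diag[0] = diag[-1] = 1.0 + t          # end vertices have a single neighbor
    u0 = [1.0 if i == anchor else 0.0 for i in range(n)]
    u = solve_tridiagonal(lower, diag, upper, u0)
    # Assumed Varadhan-type relation; u stays in (0, 1) for this small example.
    return [math.sqrt(max(0.0, -4.0 * t * math.log(ui))) for ui in u]

distances = heat_distance_on_chain(6, 0, 1.0)
```

Repeating the call once per anchor and stacking the results per vertex gives the kind of multi-anchor distance vector described in the text. For very small `t` or long chains the heat values can underflow, which is one reason the diffusion time has to be chosen relative to the mesh scale.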
[0115] It should be noted that by repeating steps S301 to S303 for each predefined anchor point, a geodesic distance field vector can be calculated for every vertex on the mesh. For example, if three anchor points are selected, the vector has three components, one for each anchor point. This geodesic distance field vector is the intrinsic coordinate representation of the vertex on the foot surface and is robust to changes in the overall shape of the foot.
[0116] Furthermore, once the intrinsic coordinate representation is available, a Graph Neural Network (GNN) can be used to learn the complex mapping relationship from local geometry and intrinsic location to biomechanical features, thereby generating the final probabilistic feature map. Moreover, based on the characteristics of GNNs, they are naturally suitable for processing non-Euclidean graph structure data such as 3D meshes. The graph neural network model is constructed as follows:
[0117] First, the obtained 3D mesh model is directly abstracted into a graph $G = (V, E)$, where the vertex set $V$ is the set of mesh vertices and the edge set $E$ is the set of mesh edges. Then, a rich input feature vector is defined for each node in the graph (i.e., each mesh vertex). In this invention, the input feature vector is composed of local geometric properties and the geodesic distance field.
[0118] Specifically, local geometric properties are mainly used to describe the shape of the local surface around a vertex. These can include vertex coordinates, vertex normal vectors, principal curvature / Gaussian curvature / mean curvature, and local shape descriptors such as Shape-DNA or HKS (Heat Kernel Signature).
[0119] Specifically, the geodesic distance field is the geodesic distance field vector calculated in the previous step.
[0120] Finally, these features are concatenated to obtain the initial node feature vector.
[0121] It should be noted that local geometric properties enable GNNs to perceive local details such as the concavity and convexity of the sole and the curvature of the arch, while the geodesic distance field provides GNNs with a global, intrinsic sense of position. Combining the two allows the model to not only "see" local details but also "know" the relative position of those details within the overall anatomical structure of the foot.
[0122] Furthermore, the graph neural network model of this invention employs a multi-layer graph convolutional network. Taking GraphSAGE as an example, its layer update rule is as follows:
[0123] $$h_v^{(l+1)} = \sigma\left( W^{(l)} \cdot \mathrm{CONCAT}\left( h_v^{(l)},\ \mathrm{AGG}\left( \{ h_u^{(l)} : u \in N(v) \} \right) \right) \right)$$
[0124] where $h_v^{(l)}$ is the feature representation of node $v$ at layer $l$; $N(v)$ is the set of neighbors of node $v$; $\mathrm{AGG}$ is an aggregation function (such as mean, max, or LSTM); $W^{(l)}$ is the learnable weight matrix of layer $l$; and $\sigma$ is a nonlinear activation function. By stacking multiple layers, each node can aggregate information from its multi-hop neighborhood, thereby learning more complex structural patterns. In this embodiment, the GNN model contains 3 to 5 graph convolutional layers.
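A single GraphSAGE-style update with a mean aggregator can be sketched in plain Python as below. The ReLU activation, the dictionary-based graph representation, and the toy weights are assumptions for illustration only; a practical model would use a graph learning library and learned weights.

```python
def sage_layer(features, neighbors, weight):
    """One GraphSAGE-style layer: concat(self, mean of neighbors) -> linear -> ReLU.

    features:  {node: [float, ...]} current node representations
    neighbors: {node: [node, ...]} adjacency taken from the mesh edges
    weight:    list of rows, shape out_dim x (2 * in_dim)
    """
    updated = {}
    for v, h_v in features.items():
        nbrs = neighbors[v]
        dim = len(h_v)
        if nbrs:
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim               # isolated node: no neighbor information
        concat = list(h_v) + mean
        updated[v] = [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight]
    return updated

# Toy graph: a single triangle with one scalar feature per vertex.
features = {0: [1.0], 1: [2.0], 2: [3.0]}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
weight = [[1.0, 1.0]]                        # hypothetical weights, not learned
layer_out = sage_layer(features, neighbors, weight)
```

Calling `sage_layer` repeatedly on its own output (with weights of matching shape) corresponds to stacking layers, so that each vertex sees a progressively larger neighborhood.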
[0125] Furthermore, after the last GNN layer, a fully connected layer and a Softmax activation function are added to output a probability distribution vector $\mathbf{p}_v$ for each node:
[0126] $$\mathbf{p}_v = \mathrm{Softmax}\left( W_{\text{out}}\, h_v^{(L)} + \mathbf{b}_{\text{out}} \right) \in \mathbb{R}^{C}$$
[0127] where $L$ is the total number of layers in the GNN, $W_{\text{out}}$ and $\mathbf{b}_{\text{out}}$ are the parameters of the fully connected layer, and $C$ is the number of output categories, namely the predefined biomechanical feature categories (for example, "origin of plantar fascia", "insertion of gastrocnemius muscle", "Yongquan acupoint", etc.) plus one "background" category. The element $p_{v,c}$ represents the probability that vertex $v$ belongs to the $c$-th biomechanical feature.
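The Softmax output stage can be sketched as follows. The helper names and the example scores are hypothetical; the per-class scores would in practice come from the fully connected layer after the last GNN layer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one vertex's class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def feature_map(vertex_logits):
    """Per-vertex probability vectors: a soft partition rather than hard labels."""
    return [softmax(row) for row in vertex_logits]

def heatmap_for(probabilities, category):
    """Probability of a single category at every vertex."""
    return [p[category] for p in probabilities]

# Two vertices, three categories (assumed scores).
probs = feature_map([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
category_zero = heatmap_for(probs, 0)
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.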
[0128] Furthermore, this GNN model requires supervised training on a dataset D of 3D foot models with expert annotations. In the training dataset, each vertex of each model is labeled with its corresponding biomechanical feature category.
[0129] Furthermore, in this embodiment, the method for constructing the foot 3D model dataset D with expert annotations is as follows:
[0130] First, 200 volunteers with different foot types (e.g., normal feet, flat feet, high arches, etc.) were recruited. A high-precision 3D scanner was used to perform static 3D scans on the left and right feet of each volunteer, obtaining initial high-resolution 3D mesh models. Then, the original scanned models underwent standard processing such as denoising, hole filling, and simplification to unify the mesh topology and align it to a standard coordinate system, forming a basic database containing 400 foot models. At least three senior TCM doctors or rehabilitation therapists were invited to annotate the biomechanical features of each foot model in the database using dedicated 3D annotation software (such as MeshLab with plugins). The annotation involved drawing predefined biomechanical feature regions (e.g., "plantar fascia origin," "Yongquan acupoint," etc.) on the 3D mesh and assigning a corresponding category label to each mesh vertex within the region. For regions with inconsistent annotations, the final labels were determined through majority voting or expert consultation. Finally, the annotated dataset D was randomly divided into training, validation, and test sets in an 8:1:1 ratio for training the GNN model. The model uses the standard cross-entropy loss function during training.
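[0131] $$\mathcal{L}_{\text{CE}} = -\sum_{v \in V} \sum_{c=1}^{C} y_{v,c} \log p_{v,c}$$
[0132] where $p_{v,c}$ is the predicted probability that vertex $v$ belongs to category $c$, and $y_{v,c}$ is the one-hot encoded ground-truth label.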
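With one-hot labels, the cross-entropy reduces to the negative log-probability assigned to the correct category at each vertex. The sketch below assumes the loss is summed over vertices (an averaged variant differs only by a constant factor); the function name and sample values are illustrative.

```python
import math

def cross_entropy(probabilities, labels, eps=1e-12):
    """Sum over vertices of -log p[v][label_v], where label_v is the true category index."""
    loss = 0.0
    for p, label in zip(probabilities, labels):
        loss -= math.log(max(p[label], eps))   # eps guards against log(0)
    return loss

# Two vertices, two categories (assumed predictions and expert labels).
predicted = [[0.5, 0.5], [0.25, 0.75]]
expert_labels = [0, 1]
ce_loss = cross_entropy(predicted, expert_labels)
```

Here the loss equals $-(\ln 0.5 + \ln 0.75)$; it approaches zero only as the predicted probability of every expert-assigned category approaches 1.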
[0133] It should be noted that after the model training is complete, for a new non-standard 3D mesh model, the final output after this step is a probabilistic feature map. This feature map is the set of probability distribution vectors of all vertices attached to the 3D mesh model. It is not a rigid, zero-or-one segmentation result, but a probabilistic soft partition. For example, for the feature "origin of plantar fascia", the feature map will give a series of high probability values in a small region near the calcaneal tuberosity, forming a probability "heatmap". This probabilistic representation is more robust, can tolerate individual differences and annotation uncertainty, and provides a smoother guiding signal for reinforcement learning policy generation.
[0134] S4. Using probabilistic feature maps and the real-time state of the dexterous hand as input, a pre-trained hierarchical deep reinforcement learning model is used to generate multi-scale policy instructions to drive the dexterous hand to perform massage tasks.
[0135] It should be noted that traditional dexterous hand control strategies struggle with global planning in long-horizon tasks and with the simultaneous handling of multiple interdependent objectives such as therapeutic efficacy, comfort, and safety. To address this, this invention constructs a Hierarchical Deep Reinforcement Learning (HRL) model. This model decomposes massage tasks into high-level decision-making and low-level execution, thereby achieving efficient planning and control of the strategy across different time scales (see Figure 3).
[0136] Furthermore, the reinforcement learning model consists of a high-level policy network and a low-level policy network, which work together to make decisions about what to do and how to do it, respectively.
[0137] Specifically, the high-level policy network operates at a lower frequency (e.g., making one decision every ... seconds) to perform macro-level planning. Its core task is to determine the next sub-task to be executed based on the user's overall foot biomechanical characteristics, such as "deeply pressing the plantar fascia area" or "cyclically pressing the acupoints below the medial malleolus." The input to this policy network is the high-level state space, which includes: the probabilistic feature map, the massage task history, and the user individual identifier embedding. The massage task history is a simplified record, such as the types of sub-tasks executed so far and their durations, used to avoid repetition and promote comprehensive coverage. The user individual identifier embedding can be implemented with an embedding lookup table. Assuming the system supports M users, this table is an M×E matrix, where E is the dimension of the embedding vector (e.g., E=16). When a massage is performed for the user with ID i, the i-th row of this matrix is taken as that user's embedding vector. This vector is concatenated with the other state features of the policy network to form an augmented state vector, which is then input into the subsequent fully connected layers. Through end-to-end training, the model can encode the specific preferences of different users into their corresponding embedding vectors. The output of the high-level policy network is a parameterized sub-task objective. Note that the actions of the high-level policy are not direct machine commands but guidance for the low-level policy. The parameterized sub-task objective takes the form of a tuple:
[0138] $$g = \left( c_{\text{target}},\ P_{\text{target}},\ m_{\text{mode}},\ T_{\text{dur}} \right), \qquad P_{\text{target}} \in [P_{\min}, P_{\max}], \qquad T_{\text{dur}} \in [T_{\min}, T_{\max}]$$
[0139] where $c_{\text{target}}$ is the category index of the target biomechanical feature, indicating which feature in the map the low-level policy should focus on (e.g., the index representing "plantar fascia"). $P_{\text{target}}$ is the target average pressure (in newtons per square meter), i.e., the desired applied pressure, which defines the intensity of the massage; $P_{\min}$ is the minimum average pressure and $P_{\max}$ is the maximum average pressure. $m_{\text{mode}}$ is the massage movement pattern, a discrete category that defines the basic massage technique. $T_{\text{dur}}$ is the duration of the sub-task; $T_{\min}$ is the minimum duration and $T_{\max}$ is the maximum duration.
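One way to represent the parameterized sub-task objective in code is an immutable record whose continuous fields are clamped to their allowed ranges. Everything specific here is hypothetical: the class and field names, the numeric limits, and the technique names are chosen only to make the sketch runnable and are not taken from this invention.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubtaskGoal:
    feature_index: int       # which biomechanical feature to target
    target_pressure: float   # desired average pressure (N/m^2)
    motion_mode: str         # discrete massage technique
    duration: float          # sub-task duration in seconds

# Hypothetical limits and technique names, for illustration only.
PRESSURE_RANGE = (5_000.0, 40_000.0)
DURATION_RANGE = (2.0, 30.0)
MOTION_MODES = ("press", "knead", "circle")

def clamp(value, low, high):
    return max(low, min(high, value))

def make_goal(feature_index, pressure, motion_mode, duration):
    """Build a goal, keeping pressure and duration inside their allowed ranges."""
    if motion_mode not in MOTION_MODES:
        raise ValueError(f"unknown motion mode: {motion_mode}")
    return SubtaskGoal(
        feature_index=feature_index,
        target_pressure=clamp(pressure, *PRESSURE_RANGE),
        motion_mode=motion_mode,
        duration=clamp(duration, *DURATION_RANGE),
    )

goal = make_goal(1, 1_000_000.0, "press", 0.5)
```

Clamping at construction time means the low-level policy never receives a pressure or duration outside the configured bounds, regardless of what the high-level network outputs.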
[0140] It should be noted that the high-level policy network itself can be an Actor-Critic structure with multiple fully connected layers (MLP) and trained using algorithms such as SAC (Soft Actor-Critic) or PPO (Proximal Policy Optimization).
[0141] Specifically, the low-level policy network operates at a higher frequency (e.g., once every 20 milliseconds, i.e., 50 Hz) to achieve real-time control. Its task is to generate specific motion commands that drive the dexterous hand under the constraints of the sub-task objective given by the high-level policy network. The input to this low-level policy network is its state space, which includes: the current sub-task objective; the real-time state of the dexterous hand (joint angles and joint velocities, the position, orientation and velocity of the end effector (massage head), and the external contact force measured by a six-dimensional force/torque sensor installed at the wrist or fingertip of the dexterous hand); a local probabilistic feature map, where, to improve efficiency, only the probabilistic feature map information within a neighborhood of the contact point of the end effector is input; and the user individual identifier embedding. The output of this low-level policy network is a set of target impedance parameters:
[0142] $$a = \left( K_d,\ D_d,\ \mathbf{x}_d \right)$$
[0143] where $K_d$ is the target stiffness matrix, which defines the stiffness of the end effector in the six degrees of freedom (three translations and three rotations) of Cartesian space. Higher stiffness means a greater corrective force for a given position error, and lower stiffness a smaller one. $D_d$ is the target damping matrix, which defines the dissipation characteristics of the dexterous hand. Appropriate damping suppresses vibration and makes the contact process more stable; it is usually set to critical damping to obtain the fastest response without overshoot. $\mathbf{x}_d$ is the target equilibrium pose, a virtual "spring" zero point. Because the dexterous hand generates a force that moves its end effector toward this equilibrium point, continuously changing $\mathbf{x}_d$ guides the massage head across the surface of the foot.
[0144] Furthermore, the reinforcement learning model is trained in a high-fidelity physical simulation environment, which includes an accurate model of the dexterous hand and a foot model reconstructed from step S2 and processed through step S3. The training process consists of two phases:
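[0145] Low-level policy training: First, a meaningful sub-task objective is fixed (or sampled from a predefined distribution) and only the low-level policy network is trained. The goal is for it to learn to complete the task effectively for any given sub-task objective. Its reward function $r_{\text{low}}$ is a composite function that quantifies what constitutes a "good" massage movement:
[0146] $$r_{\text{low}} = w_{\text{eff}} R_{\text{eff}} + w_{\text{comf}} R_{\text{comf}} + w_{\text{safe}} R_{\text{safe}}$$
[0147] where $R_{\text{eff}}$, $R_{\text{comf}}$ and $R_{\text{safe}}$ are the therapeutic effect, comfort and safety terms, and $w_{\text{eff}}$, $w_{\text{comf}}$ and $w_{\text{safe}}$ are weight hyperparameters with values in [0,1] and $w_{\text{eff}} + w_{\text{comf}} + w_{\text{safe}} = 1$, used to balance the importance of the different objectives. In this embodiment, $w_{\text{eff}}$ may be 0.3, $w_{\text{comf}}$ may be 0.2, and $w_{\text{safe}}$ may be 0.5. Within the comfort term, the weight of the jerk penalty may be 0.4 and the weight of the force-rate penalty may be 0.6. Within the safety term, the sharpness parameter may be 10.0 to produce a steep penalty gradient when approaching the threshold.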
[0148] Specifically, the therapeutic effect term rewards actions that are applied precisely to the target area. It is calculated from the degree of matching between the pressure distribution applied by the dexterous hand and the probability distribution of the target feature obtained in step S3. Given, at a particular time, the contact area between the dexterous hand and the foot, the pressure distribution applied over that area, and the probability map of the target feature, the therapeutic effect can be defined as a similarity measure between the two distributions. In this embodiment, the similarity measure is the negative Wasserstein distance or the negative KL divergence.
[0149]
[0150] It also includes a bonus for matching the average applied pressure to the target pressure.
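The negative-KL variant of the similarity measure can be sketched as below. The direction of the divergence (pressure relative to target), the smoothing constant `eps`, and the per-vertex list representation are assumptions; the Wasserstein variant is not shown.

```python
import math

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def negative_kl(pressure, target_prob, eps=1e-8):
    """-KL(p || q), with p the normalized pressure and q the normalized target map.

    Both inputs are per-vertex values over the same contact vertices.
    eps avoids log(0) and division by zero where either distribution vanishes.
    """
    p = normalize([x + eps for x in pressure])
    q = normalize([x + eps for x in target_prob])
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Four contact vertices; the target feature sits on the middle two (assumed values).
target = [0.0, 1.0, 1.0, 0.0]
aligned = negative_kl([0.0, 1.0, 1.0, 0.0], target)
misaligned = negative_kl([1.0, 1.0, 0.0, 0.0], target)
```

The reward is at most zero and reaches zero only when the normalized pressure distribution coincides with the target probability map, so pressing off-target is always scored lower than pressing on-target.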
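[0151] Specifically, the comfort term $R_{\text{comf}}$ penalizes unsmooth, abrupt movements to improve the user experience. It consists of two parts:
[0152] A penalty on the jerk of the end-effector motion: $P_{\text{jerk}} = \left\| \dddot{\mathbf{x}} \right\|^2$.
[0153] A penalty on the rate of change of the applied force: $P_{\text{force}} = \left\| \dot{\mathbf{F}} \right\|^2$.
[0154] Finally, $R_{\text{comf}} = -\left( \alpha P_{\text{jerk}} + \beta P_{\text{force}} \right)$, where $\alpha$ and $\beta$ are the weight coefficients of the corresponding penalty terms, taking values in the range [0,1].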
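A comfort term of this kind can be evaluated from a short history of end-effector positions and contact forces using finite differences. The squared-norm penalties, the function names, and the sample trajectories are assumptions; the default weights 0.4 and 0.6 follow the values quoted for the comfort term.

```python
def finite_difference(samples, dt):
    """First-order difference of a sequence of vectors sampled every dt seconds."""
    return [[(b - a) / dt for a, b in zip(p, q)] for p, q in zip(samples, samples[1:])]

def squared_norm(vector):
    return sum(x * x for x in vector)

def comfort_reward(positions, forces, dt, alpha=0.4, beta=0.6):
    """-(alpha * ||jerk||^2 + beta * ||dF/dt||^2) at the most recent time step.

    Needs at least four position samples and two force samples.
    """
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)
    force_rate = finite_difference(forces, dt)
    return -(alpha * squared_norm(jerk[-1]) + beta * squared_norm(force_rate[-1]))

dt = 0.02  # one 50 Hz control period
smooth = comfort_reward([[0.0], [0.1], [0.2], [0.3]], [[3.0], [3.0]], dt)
jerky = comfort_reward([[0.0], [0.0], [0.0], [0.3]], [[3.0], [5.0]], dt)
```

Constant-velocity motion under constant force is not penalized (up to rounding), while a sudden lunge combined with a force jump receives a strongly negative reward.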
[0155] Specifically, the safety term ensures that the robot's operation remains within safe limits at all times. This is achieved by imposing a significant penalty on behavior that approaches a safety threshold. In this embodiment, the penalty on contact forces or joint torques can be designed as a potential function:
[0156]
[0157] where $\tau_i$ is the torque of the $i$-th joint, $\tau_{\max}$ is the safe threshold for joint torque, and $k$ is a parameter that controls the sharpness of the penalty. When the torque is much less than the threshold, the safety term approaches zero; when it approaches or exceeds the threshold, the safety term provides a large negative reward.
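The exact potential function is not recoverable from the text, so the sketch below uses one plausible choice, an exponential barrier, purely as an assumption. It reproduces the described behavior: near zero far below the threshold and steeply negative at and beyond it, with the sharpness value 10.0 mentioned for the safety term.

```python
import math

def safety_reward(torques, torque_limit, sharpness=10.0):
    """Assumed exponential barrier: -sum_i exp(k * (|tau_i| / tau_max - 1)).

    Not the formula of this invention; one possible potential with the same shape.
    """
    return -sum(
        math.exp(sharpness * (abs(tau) / torque_limit - 1.0)) for tau in torques
    )

relaxed = safety_reward([0.0, 0.1], torque_limit=1.0)    # far below the limit
at_limit = safety_reward([1.0], torque_limit=1.0)        # exactly at the limit
over_limit = safety_reward([1.5], torque_limit=1.0)      # 50 % above the limit
```

With a sharpness of 10, the penalty at half the limit is below 0.01, equals 1 at the limit, and exceeds 100 at 1.5 times the limit, which gives the steep gradient near the threshold that the text calls for.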
[0158] High-level policy training: Once the low-level policy training converges, its parameters are frozen and the high-level policy is trained. For each sub-task objective issued by the high-level policy, the low-level policy network executes $N$ control steps. The reward of the high-level policy network, $R_{\text{high}}$, is the sum of the rewards obtained by the low-level policy network while executing the selected sub-task objective: $R_{\text{high}} = \sum_{k=1}^{N} r_{\text{low}}^{(k)}$, where $r_{\text{low}}^{(k)}$ is the reward of the low-level policy network at step $k$.
[0159] It should be noted that, through the above processing, the high-level policy network can learn how to maximize the long-term cumulative reward of the entire massage process by issuing a series of sub-task objectives.
[0160] For example, the high-fidelity physical simulation environment is constructed as follows:
[0161] First, an open-source physics engine that supports soft body dynamics and accurate contact force calculation, such as PyBullet, is selected. The 3D mesh model of the foot reconstructed in step S2 is imported into the simulation environment. To simulate the mechanical properties of the foot muscles and soft tissues, the foot model is modeled with the finite element method (FEM): the mesh is discretized into tetrahedral elements, and each tetrahedral element is assigned the corresponding material properties (such as Young's modulus and Poisson's ratio). These parameters can be taken from values reported for foot soft tissue in the biomechanics literature. The URDF model of the dexterous hand is imported into the environment, and its accurate kinematic and dynamic parameters (mass, center of mass, inertia matrix, etc.) are configured. Then, a virtual six-dimensional force/torque sensor is mounted at the wrist or fingertip of the dexterous hand model. This virtual sensor outputs, in real time, the force and torque generated by contact with the foot model; its readings are used to calculate the reward function and as state input. Finally, the foot model and the dexterous hand model are integrated into the same simulation scene, and the simulator is configured to perform physics calculations at a sufficiently high frequency (e.g., 1000 Hz) to support the 50 Hz control frequency of the low-level policy network.
[0162] It should be noted that this simulation environment can simulate the real massage interaction process, providing stable and repeatable interaction data for training the hierarchical reinforcement learning model.
[0163] Furthermore, in actual deployment, the trained reinforcement learning model will be run directly. The following operations will be performed during each high-frequency control cycle (50Hz):
[0164] Obtain the real-time state and local feature maps of the dexterous hand.
[0165] By combining the sub-task objectives given by the higher level with the embedding of individual user identifiers, a lower-level state is constructed.
[0166] The target impedance parameters are obtained by inputting the low-level state into the low-level policy network.
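[0167] Based on these parameters, the low-level controller of the dexterous hand calculates and applies the joint torques $\boldsymbol{\tau}$ using a standard Cartesian-space impedance control law:
[0168] $$\boldsymbol{\tau} = J^{\top}(\mathbf{q}) \left[ K_d \left( \mathbf{x}_d - \mathbf{x} \right) - D_d\, \dot{\mathbf{x}} \right]$$
[0169] where $J(\mathbf{q})$ is the Jacobian matrix of the dexterous hand, $K_d$ and $D_d$ are the target stiffness and damping matrices, $\mathbf{x}_d$ is the target equilibrium pose, and $\mathbf{x}$ and $\dot{\mathbf{x}}$ are the current pose and velocity of the end effector. This makes the dexterous hand behave like a spring-damper system that can be programmed online by a neural network, thereby achieving safe, comfortable and effective adaptive massage for non-standard foot types.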
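A Cartesian impedance law of the form $\boldsymbol{\tau} = J^{\top}[K(\mathbf{x}_d - \mathbf{x}) - D\dot{\mathbf{x}}]$ can be sketched with plain lists as follows. The two-dimensional task space, the example Jacobian, and the gains are assumptions made for brevity; gravity and friction compensation, which a real controller would normally add, are omitted.

```python
def mat_vec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def impedance_torques(jacobian, stiffness, damping, x_target, x, x_dot):
    """Joint torques that make the end effector act like a spring-damper about x_target."""
    error = [a - b for a, b in zip(x_target, x)]
    spring = mat_vec(stiffness, error)        # K (x_d - x)
    damper = mat_vec(damping, x_dot)          # D x_dot
    wrench = [s - d for s, d in zip(spring, damper)]
    return mat_vec(transpose(jacobian), wrench)

# Assumed 2-joint, 2-D example.
J = [[1.0, 0.5], [0.0, 1.0]]
K = [[100.0, 0.0], [0.0, 100.0]]              # N/m
D = [[20.0, 0.0], [0.0, 20.0]]                # N*s/m
tau_rest = impedance_torques(J, K, D, [0.1, 0.0], [0.0, 0.0], [0.0, 0.0])
tau_moving = impedance_torques(J, K, D, [0.1, 0.0], [0.0, 0.0], [0.5, 0.0])
```

At rest, the 0.1 m position error produces a 10 N restoring force that maps to joint torques of 10 and 5; when the end effector is already moving toward the target at 0.5 m/s, the damping force cancels the spring force and the commanded torques drop to zero.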
[0170] It should be noted that by embedding individual user identifiers as contextual information into two layers of the reinforcement learning model, the model can learn during training to associate different users' preferences (such as sensitivity to stress and speed) with that user's embedding vector. Therefore, during execution, only the current user's ID needs to be provided, and the entire policy (including the selection of subtasks and execution details) will be automatically adjusted, thus achieving true end-to-end personalized adaptation.
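The embedding lookup and state augmentation can be sketched as follows. The random initialization, function names, and feature values are assumptions; in training the table rows would be learned jointly with the policy networks.

```python
import random

def make_embedding_table(num_users, dim, seed=0):
    """An M x E table of user embeddings, randomly initialized before training."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]

def augment_state(state_features, user_id, table):
    """Concatenate the policy's state features with the row of the given user."""
    return list(state_features) + table[user_id]

table = make_embedding_table(num_users=4, dim=16)      # M = 4, E = 16
state = augment_state([0.1, 0.2, 0.3], user_id=2, table=table)
```

Switching users at run time only changes which row is appended, so the same network weights produce user-specific behavior.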
[0171] Example 2
[0172] Referring to Figures 4 to 6, this is the second embodiment of the present invention. This embodiment provides a visually adaptive dexterous hand massage strategy generation method for non-standard foot types. To verify the effectiveness of the present invention, a series of simulation experiments were conducted, and the results are as follows.
[0173] Referring to Figure 4, this experiment simulates the ability to recover the foot surface in the area beneath the forearm of the dexterous hand when the hand moves to the junction of the sole and the arch of the foot. The gray squares in the figure represent the depth point cloud data captured by the camera. (The horizontal axis interval is missing from the original text.) Within that interval there is a significant region of dynamic occlusion (indicated by a gray dashed arch, representing the location of the dexterous hand), in which true foot surface data is completely missing. The black dashed line represents a conventional neural field reconstruction that does not incorporate the SDF prior. Lacking observational data, the benchmark (traditional) algorithm tends to perform simple smooth interpolation between the occlusion edges. In particular, the actual surface of the foot contains a depression in this region (corresponding to the deep part of the arch), but the surface reconstructed by the benchmark algorithm floats above it. This means a geometric error exceeding 1.0 depth units and a complete loss of the curvature features of the arch. The black dotted line represents the reconstruction result of the method of this invention. Because the SDF field of the kinematic model of the dexterous hand is injected into the training process as prior information, the algorithm can distinguish the spatial relationship between the hand and the foot and use the learned geometric manifold features of the foot to complete the occluded area. In the core occlusion area, the reconstruction curve of the present invention closely matches the ground truth, and the depth error at the trough is within 0.05 units. Compared with the benchmark algorithm, this invention improves reconstruction accuracy by an order of magnitude under strong occlusion, effectively solving the "blind men and the elephant" problem caused by obstructed vision during massage.
(Reference) Figure 5In the upper left image, it can be seen that the arch height of a high-arched foot (approximately 0.5 on the Y-axis) is much higher than that of a flat foot (approximately 0.1 on the Y-axis). This significant morphological difference means that the linear Euclidean distance between the two points is not equivalent to the surface distance. In the upper right image, this comparison shows the distance measurement methods from the anchor point to the true location of the feature. The traditional linear distance increases linearly with changes in surface parameters, failing to reflect changes in curvature. However, the thermally conductive geodesic distance increases more rapidly due to propagation along a high-curvature surface. At the true location of the feature (the vertical dashed line), the two measurement methods produce numerical discrepancies, explaining why the traditional method fails on non-standard foot types. In the lower left image, the true location of the target feature is at the x-axis of 0.59. However, for a high-arched foot, the predicted location is around 0.35 to 0.72, showing a severe negative offset; for a flat foot, the predicted location is around 0.43 to 0.74. This is because traditional methods, which do not consider intrinsic geometry, have significant positioning errors in high-arched feet, potentially causing the massage head to act on the wrong bony prominences and triggering pain. In the lower right image, after applying the solution of this invention, the generated predicted location is smaller than that of traditional methods for both high-arched and flat feet. Specifically, the range is reduced by 0.01 (0.43 to 0.73) for flat feet and by 0.1 (0.35 to 0.62) for high-arched feet. This demonstrates that the positioning of the solution of this invention is more accurate. (Reference) Figure 6 In the top diagram, the traditional fixed stiffness is controlled at... 
At the moment of contact, a huge force impact is generated, with a peak value exceeding 10N, and then oscillates around 6.5N, far exceeding the set safety threshold (5N), which can easily cause user discomfort. With the solution of this invention, although the force curve shows a momentary rise at the moment of contact, it is quickly suppressed and converges and stabilizes near the 5N safety threshold within 0.2 seconds. This demonstrates that the dexterous hand system using this solution possesses excellent compliant contact capability. In the middle diagram, its output target impedance corresponds to the force control effect in the top diagram. Traditional control maintains a constant high stiffness (1000 N / m), while the solution of this invention... During the idle period, it maintains a stiffness of approximately 800 N / m to ensure motion accuracy; however, Upon detection of contact, the low-level strategy network rapidly reduces the stiffness to approximately 100 N / m. This soft-landing strategy actively absorbs the impact energy, and the stiffness subsequently dynamically adjusts between 100 and 800 N / m (exhibiting a wave-like pattern), simulating the kneading technique of a massage therapist's "press-release-press" motion. In the bottom image, the gray background area is marked as a lump / spasm, representing an abnormal induration under the skin of the foot. Traditional methods would mechanically track position commands, causing the end effector to deeply penetrate the lump area (the trajectory enters the gray area), which in practice would cause severe pain. Using the solution of this invention, [the following occurs]. When entering the hard block region from the left or right, although the target depth requires downward movement, the strategy network senses an abnormal increase in contact force and actively controls the trajectory to rise (solid line above dashed line), avoiding deep pressing into the hard block region. 
Therefore, while ensuring overall trajectory tracking, this invention also provides pain-point avoidance and adaptive compliance adjustment, achieving a balance between safety and treatment effectiveness.
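The gap between straight-line and surface distance described for Figure 5 can be checked numerically on a simplified case. The sketch below assumes a two-dimensional sinusoidal arch profile with heights 0.5 (high arch) and 0.1 (flat foot), an anchor at x = 0, and a target at x = 0.59. The profile shape, the anchor position, and the function names are hypothetical, and the along-curve length is computed by summing short segments rather than by the heat-conduction method used in the invention.

```python
import math


def arch_profile(x, arch_height):
    """Height of an assumed sinusoidal arch over the unit interval."""
    return arch_height * math.sin(math.pi * x)


def straight_line_distance(x0, x1, arch_height):
    """Euclidean distance between two points on the profile."""
    return math.hypot(x1 - x0,
                      arch_profile(x1, arch_height) - arch_profile(x0, arch_height))


def surface_distance(x0, x1, arch_height, segments=1000):
    """Along-curve distance, approximated by summing short straight segments."""
    total = 0.0
    prev_x, prev_y = x0, arch_profile(x0, arch_height)
    for i in range(1, segments + 1):
        x = x0 + (x1 - x0) * i / segments
        y = arch_profile(x, arch_height)
        total += math.hypot(x - prev_x, y - prev_y)
        prev_x, prev_y = x, y
    return total


ANCHOR_X, TARGET_X = 0.0, 0.59  # anchor position is an assumption
for label, height in (("flat foot", 0.1), ("high arch", 0.5)):
    chord = straight_line_distance(ANCHOR_X, TARGET_X, height)
    along = surface_distance(ANCHOR_X, TARGET_X, height)
    print(f"{label}: straight-line {chord:.3f}, along-surface {along:.3f}")
```

In this toy profile the along-surface distance exceeds the straight-line distance for both foot types, and the excess is larger for the high arch, which matches the qualitative argument made for the upper panels of Figure 5.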
[0174] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, this application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The solutions in the embodiments of this application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
[0175] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0176] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0177] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0178] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0179] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for generating visually adaptive dexterous hand massage strategies for non-standard foot types, characterized in that the method comprises:
acquiring multi-view image sequences from multiple visual sensors surrounding the foot, together with synchronously acquired kinematic data that drives a dexterous hand to perform dynamic manipulation;
based on the multi-view image sequences and the kinematic data, using a neural radiance field model to reconstruct a four-dimensional spatiotemporal representation of the foot under dynamic occlusion by the dexterous hand, wherein the neural radiance field model uses the occlusion information implied in the kinematic data as a prior condition for the reconstruction process in order to complete the occluded foot region;
based on the four-dimensional spatiotemporal representation, extracting a three-dimensional mesh model of the foot at a specific moment, and processing the three-dimensional mesh model to construct a probabilistic feature map characterizing the individualized biomechanical features of the non-standard foot type; and
taking the probabilistic feature map and the real-time state of the dexterous hand as input, using a pre-trained hierarchical deep reinforcement learning model to generate multi-scale policy instructions that drive the dexterous hand to perform massage tasks.
2. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 1, characterized in that using a neural radiance field model, based on the multi-view image sequences and the kinematic data, to reconstruct the four-dimensional spatiotemporal representation of the foot under dynamic occlusion by the dexterous hand, with the occlusion information implied in the kinematic data serving as a prior condition for completing the occluded foot region, comprises: converting the kinematic data of the dexterous hand into a time-varying geometric model of the hand in three-dimensional space; calculating the signed distance function from spatial points to the time-varying geometric model to generate a differentiable occlusion field representing the occlusion relationship; and feeding the differentiable occlusion field to the neural radiance field model as one of its input conditions.
3. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 1 or 2, characterized in that the method further comprises: applying spatiotemporal coherence regularization to a deformation field that models foot dynamics in the neural radiance field model, wherein the regularization minimizes the rate of change of the deformation field in the time dimension and the degree of non-rigid deformation of local regions in the spatial dimension.
4. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 1, characterized in that constructing the probabilistic feature map comprises: defining at least one anatomical anchor point on the three-dimensional mesh model, and computing, for each vertex of the mesh, one or more geodesic distance fields characterizing the intrinsic surface distance by solving a heat conduction equation based on the Laplace-Beltrami operator on the three-dimensional mesh model.
5. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 4, characterized in that the method further comprises:
abstracting the three-dimensional mesh model into graph-structured data, with the mesh vertices as graph nodes; and
processing the graph-structured data with a graph neural network, wherein the input node features of the graph neural network include the local geometric properties of the vertices and the geodesic distance field, and the output is the membership probability of each node for a predefined biomechanical feature.
6. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 1, characterized in that the hierarchical deep reinforcement learning model comprises:
a high-level policy network, which takes the probabilistic feature map as input and generates parameterized sub-task objectives at a low frequency; and
a low-level policy network, which takes the sub-task objective and the real-time state of the dexterous hand as input and generates, at a high frequency, action instructions for completing the sub-task objective.
7. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 6, characterized in that the training process of the low-level policy network employs a composite reward function comprising:
a therapeutic effect term, calculated from the degree of matching between the pressure distribution applied by the dexterous hand and the probability distribution of target features in the probabilistic feature map;
a comfort term, calculated by penalizing the jerk of the motion of the dexterous hand end effector and the rate of change of the applied force; and
a safety term, which applies a penalty when the force or joint torque applied by the dexterous hand approaches a preset safety threshold.
8. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 6 or 7, characterized in that the action instruction output by the low-level policy network is a set of target impedance parameters.
9. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in claim 8, characterized in that the target impedance parameters include a target stiffness matrix, a target damping matrix, and a target equilibrium-point pose, which are used to set the dynamic characteristics of the interaction between the dexterous hand end effector and the foot.
10. The method for generating visually adaptive dexterous hand massage strategies for non-standard foot types as described in any one of claims 1 to 9, characterized in that the method further comprises: inputting the user's individual identifier into the hierarchical deep reinforcement learning model as contextual information to adjust the generated massage strategy instructions online, thereby achieving personalized adaptation for a specific user.
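For illustration only, the three-term composite reward recited in claim 7 can be sketched as follows. The claims do not specify the functional form of any term, so every choice here is an assumption: cosine similarity as the "degree of matching", a quadratic penalty for jerk and force rate, a linear penalty that starts at 80% of the limit, and the weights and function names themselves.

```python
import math


def therapeutic_term(pressure, target_probability):
    """Cosine similarity between applied pressure and target feature probability,
    both given per surface region (assumed form of the matching degree)."""
    dot = sum(p * q for p, q in zip(pressure, target_probability))
    norm = (math.sqrt(sum(p * p for p in pressure))
            * math.sqrt(sum(q * q for q in target_probability)))
    return dot / norm if norm > 0.0 else 0.0


def comfort_term(jerk, force_rate):
    """Negative quadratic penalty on end-effector jerk and force rate of change."""
    return -(jerk ** 2 + force_rate ** 2)


def safety_term(force, force_limit, torque, torque_limit, margin=0.8):
    """Zero while both loads stay below margin * limit, then a penalty that
    grows linearly and reaches -1 when the larger load ratio hits its limit."""
    load_ratio = max(force / force_limit, torque / torque_limit)
    return -max(0.0, load_ratio - margin) / (1.0 - margin)


def composite_reward(pressure, target_probability, jerk, force_rate,
                     force, force_limit, torque, torque_limit,
                     w_therapy=1.0, w_comfort=0.01, w_safety=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_therapy * therapeutic_term(pressure, target_probability)
            + w_comfort * comfort_term(jerk, force_rate)
            + w_safety * safety_term(force, force_limit, torque, torque_limit))


reward = composite_reward(
    pressure=[0.2, 0.9, 0.4], target_probability=[0.1, 0.8, 0.3],
    jerk=0.5, force_rate=1.0,
    force=3.0, force_limit=5.0, torque=0.2, torque_limit=1.0)
print(f"reward = {reward:.3f}")
```

Keeping the safety term at zero until the load nears its limit lets the therapeutic and comfort terms dominate during normal operation, while the penalty takes over only close to the threshold. Other shapes, such as a barrier function, would serve the same purpose.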