A gaze estimation method for an assisted feeding robot, and an assisted feeding robot
By introducing a gaze estimation method and combining a rigid robotic arm with a flexible robotic arm in the feeding robot, the invention addresses the low positioning accuracy of existing feeding robots, makes the feeding process both precise and safe, and improves the user experience and working efficiency of the feeding robot.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA SHENZHEN INTERNATIONAL GRADUATE SCHOOL
- Filing Date
- 2024-01-16
- Publication Date
- 2026-06-23
AI Technical Summary
Existing feeding robots suffer from low positioning accuracy and cannot accurately identify errors, which forces a trade-off between safety and accuracy during the feeding process.
A gaze estimation method for an assisted feeding robot is adopted. The method acquires facial images through a camera module, detects the face position, performs face alignment and head pose estimation, corrects the facial image, extracts features, and combines rigid and flexible robotic arms to achieve gaze estimation and compliant control, ensuring feeding accuracy and safety.
It improves the positioning accuracy and safety of the feeding robot, reduces errors, enhances the safety and user experience of the feeding process, and reduces the learning cost for the elderly and the fatigue of staff.
Figure CN117901098B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of robotics, and in particular to a gaze estimation method for an assisted feeding robot and to the assisted feeding robot itself.
Background Technology
[0002] As the global population ages, the service and consumption market for the elderly is expanding, encompassing not only food and daily necessities but also quality of life, health, and home living. In this context, safe and efficient service robots have a significant market potential. Many elderly people face upper limb mobility issues due to muscle atrophy and joint damage, making independent feeding difficult and requiring external assistance to help them eat.
[0003] Currently, there are two main methods for assisting feeding: manual feeding and feeding robots.
[0004] 1. Manual feeding of elderly people with weakened functions is time-consuming and laborious, especially in nursing homes and similar settings where one caregiver looks after several people. Staff become fatigued from long hours of mechanical repetition. Furthermore, the people being cared for often feel a psychological burden, and may even feel that they have lost their dignity, which is detrimental to their mental health.
[0005] 2. Current feeding robots mostly employ rigid structures, largely developed from multi-axis rigid robotic arms. While these rigid robots offer good positioning accuracy, they interact poorly with users, can frighten them, and can easily injure the elderly or cause psychological distress if something unexpected happens during assisted feeding. Some feeding robots use flexible mechanisms, but their positioning accuracy is difficult to guarantee. Furthermore, existing feeding robots lack intelligent visual or voice interaction algorithms, which makes interaction poor and prevents a comfortable dining experience for the elderly. Manual feeding, for its part, is expensive, time-consuming, and labor-intensive, with variable results that are difficult to monitor. This invention is a rigid-flexible dual-arm assisted feeding robot that combines the precision of a rigid arm with the safety of a flexible arm. By incorporating a gaze detection algorithm, it can accurately pick up the target food, so that the elderly can eat easily and comfortably simply by looking at the food they want. This improves the safety of the feeding process, reduces the learning cost for the elderly, and enhances work efficiency and user experience.
[0006] The main body of an existing automatic feeding robot consists of a food tray and a spoon for feeding, which is also the current mainstream form of feeding robots. Automatic feeding robots require button operation, which can be difficult for users with hand and foot disabilities; the tray rotates after each feeding, and users cannot select the food themselves.
[0007] Another existing feeding robot can select dishes through voice recognition. While this gives users room to choose, it requires users to speak clearly and the food on the plate to be easy to describe, which greatly limits its daily use.
[0008] An existing soft feeding robot uses a flexible arm for feeding. Although the arm is flexible, it is pneumatically driven, resulting in poor positioning accuracy, low load capacity, and no sensors, making feedback control impossible.
[0009] The mainstream existing feeding robots adopt rigid structures and are generally developed from multi-axis robotic arms. Although they have certain advantages in positioning accuracy, they can easily injure users in the event of an accident. Some feeding robots use flexible mechanisms, but their positioning accuracy is difficult to guarantee, which makes it hard for them to complete the assisted feeding task.
Summary of the Invention
[0010] The purpose of this invention is to solve the technical problem of the low positioning accuracy of existing feeding robots by proposing a gaze estimation method for an assisted feeding robot and an assisted feeding robot.
[0011] The technical problem of this invention is solved by the following technical solution:
[0012] A method for estimating the gaze of an assisted feeding robot includes the following steps: S1, acquiring a face image through a camera module; S2, detecting the face position in the face image and obtaining facial feature points; S3, aligning the face based on the facial feature points and estimating the head pose; S4, correcting the face image based on the head pose estimation result; S5, extracting features from the corrected face image to obtain a real-time gaze estimation result, confirming that the food in the plate is the desired food for the person being fed based on the real-time gaze estimation result, and achieving compliant control of the robotic arm's end effector through a sensing unit.
[0013] In some embodiments, step S2 further includes the following steps: S21, detecting the face position in the face image using a face detection algorithm; S22, obtaining the transformation relationship between the current pose and the standard pose from the angle between the horizontal line and the line connecting the feature points of the current pose among the facial feature points, and performing the first face alignment.
[0014] In some embodiments, step S3 further includes the following steps: S31, performing a second face alignment based on the facial feature points using a real-time face alignment method; S32, finding the best-fitting 3D projective transformation between the feature points in the standard pose and those in the current pose through Procrustes analysis to obtain the head pose matrix.
[0015] In some embodiments, the head pose matrix satisfies the following relationship: $p_c = H\,p_s$, where $p_c$ is the position of the center of the fed person's eyes relative to the camera, $H$ is the head pose matrix, and $p_s$ is the center position of the eyes in the standard pose.
[0016] In some embodiments, in step S5, a user-specific gaze estimation network model is used to extract features from the corrected face image to estimate the gaze estimation vector in the standard pose. The gaze estimation vector in the current pose is obtained by rotating the gaze estimation vector in the standard pose using a head pose matrix. The sensing unit includes a tension sensor, which measures the resultant force of the drive rope and obtains the tension signal of the drive rope through geometric conversion. A digital transmitter is provided to convert the tension signal of the drive rope into a digital signal.
[0017] In some embodiments, the gaze estimation vector under the standard pose and the gaze estimation vector under the current pose have the following relationship: $g_c = H\,g_s$, where $g_s$ is the gaze estimation vector in the standard pose, $H$ is the head pose matrix, and $g_c$ is the gaze estimation vector under the current pose.
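The rotation of the standard-pose gaze vector into the current pose can be illustrated with a minimal sketch. It assumes that the head pose matrix is a pure 3x3 rotation and that the camera looks along the negative z axis; the function names, the yaw-only head rotation, and the numeric values are hypothetical and are not taken from the patent.

```python
import math

def rotation_about_y(yaw_rad):
    """3x3 rotation matrix for a head turn of yaw_rad about the vertical (y) axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]

def gaze_in_current_pose(head_pose, gaze_standard):
    """Rotate the standard-pose gaze vector by the head pose matrix and renormalize."""
    g = mat_vec(head_pose, gaze_standard)
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

# Assumed example: the network predicts "straight ahead" in the standard pose,
# while the head is turned by 30 degrees about the vertical axis.
H = rotation_about_y(math.radians(30))
g_standard = [0.0, 0.0, -1.0]
g_current = gaze_in_current_pose(H, g_standard)
print(g_current)
```

Because the network only has to predict the gaze in the standard pose, the head rotation is applied once, after inference, as a single matrix-vector product.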
[0018] In some embodiments, the training and optimization of the user-specific gaze estimation network model includes the following steps: S51, training the gaze estimation network model using a meta-learning framework, optimizing the estimation error caused by differences in the eyeball structure of different people, and obtaining a general gaze estimation network model; S52, calibrating the user by taking image samples of the user gazing at objects at known locations and fine-tuning the general gaze estimation network model to obtain a user-specific gaze estimation network model.
[0019] In some embodiments, the meta-learning method employs the MAML++ framework.
[0020] In some embodiments, step S51, training the gaze estimation network model, includes the following steps: S511, dividing the gaze estimation dataset used for training according to identity labels and sampling the divided dataset with n categories and k images per category, that is, sampling k+1 images from each of n different identity categories, of which k images are used for training and the remaining 1 image is used for testing during the training phase; S512, the n*k training images form a support set and the remaining n images (one per category) form a query set; a support set together with its query set constitutes a sample, and training the gaze estimation network model on the support set of a sample and testing it on the query set is called a task; S513, taking h samples as a batch for training and obtaining the losses of these h tasks; S514, training h copies of the gaze estimation network model on the h support sets, validating them on the corresponding query sets, and accumulating the h losses as the final loss; S515, computing the gradient of the final loss with respect to the initial parameters of the gaze estimation network model, updating the initial parameters, and performing multiple rounds of training; where n, k, and h are constants.
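The task sampling described in S511 to S513 can be sketched as follows. This is only an illustration of the data split, not the training loop itself: the dataset is assumed to be a list of `(image_id, identity_label)` pairs, one image per identity is held out for the query set as stated in S511, and all function names and the toy dataset are hypothetical.

```python
import random
from collections import defaultdict

def group_by_identity(samples):
    """Divide the dataset by identity label: {identity: [image_id, ...]}."""
    groups = defaultdict(list)
    for image_id, identity in samples:
        groups[identity].append(image_id)
    return groups

def sample_task(groups, n, k, rng):
    """Sample one task: n identities, k support images and 1 query image each."""
    eligible = [ident for ident, images in groups.items() if len(images) >= k + 1]
    identities = rng.sample(eligible, n)
    support, query = [], []
    for ident in identities:
        picked = rng.sample(groups[ident], k + 1)
        support.extend((image, ident) for image in picked[:k])  # k images for training
        query.append((picked[k], ident))                        # 1 image held out
    return support, query

def sample_batch(groups, n, k, h, rng):
    """A batch is h independent tasks; their query losses are summed in training."""
    return [sample_task(groups, n, k, rng) for _ in range(h)]

# Toy dataset: 5 identities with 6 images each (placeholder image ids).
toy_dataset = [(f"id{p}_img{i}", p) for p in range(5) for i in range(6)]
groups = group_by_identity(toy_dataset)
batch = sample_batch(groups, n=3, k=4, h=2, rng=random.Random(0))
for support, query in batch:
    print(len(support), len(query))
```

In a MAML-style outer loop, each task in the batch would adapt a copy of the model on its support set, and the summed query losses would drive the update of the shared initial parameters.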
[0021] This invention also proposes an assisted feeding robot, comprising an installation module, a rigid robotic arm, a flexible robotic arm, and a camera interaction system. The rigid and flexible robotic arms are fixedly mounted on the installation module. The camera interaction system includes a camera module and a gaze estimation module. The flexible robotic arm has a sensing unit inside for compliant control of the end effector. The camera module is fixedly mounted on the installation module to capture facial images and transmit them to the gaze estimation module. The gaze estimation module is used to implement the aforementioned gaze estimation method for the assisted feeding robot to obtain the real-time gaze estimation result. Based on the real-time gaze estimation result, the camera interaction system controls the rigid robotic arm to place the food tray in front of the person being fed and controls the flexible robotic arm to deliver the desired food to the person's mouth.
[0022] The beneficial effects of this invention compared to the prior art include:
[0023] This invention proposes a gaze estimation method for an assisted feeding robot. It acquires facial images via a camera module, performs head pose estimation and corrects the facial images, extracts features from the corrected facial images, and considers the influence of head pose on the estimated gaze vector. This allows the gaze estimation method to ignore the influence of head pose and primarily consider the gaze estimation accuracy under standard pose. Consequently, the gaze estimation method is well-adapted to the user's eye structure and the surrounding environment, effectively improving positioning accuracy and reducing errors. Furthermore, it achieves compliant control of the robotic arm's end effector through a sensing unit, enabling effective application in feeding tasks and ensuring feeding safety.
[0024] Furthermore, this invention proposes an assisted feeding robot in which a rigid robotic arm and a flexible robotic arm are fixedly mounted on an installation module, and the flexible robotic arm is equipped with a sensing unit. By combining the rigid and flexible robotic arms, the invention solves the problem that existing feeding robots cannot balance accuracy and safety. It uses the positioning accuracy of the rigid robotic arm together with the inherent passive compliance of the flexible robotic arm and active compliance control based on the measured tension. This improves safety from both the active and the passive side, implementing compliance control and force limiting so that the robot does not accidentally injure the person being fed.
[0025] Other beneficial effects of the embodiments of the present invention will be further described below.
Attached Figure Description
[0026] Figure 1 is a schematic diagram of the feeding robot in an embodiment of the present invention.
[0027] Figure 2 is a schematic diagram of the structure of the rope-driven spring plate flexible arm in an embodiment of the present invention.
[0028] Figure 3 is a top view of the rope-driven spring plate flexible arm in an embodiment of the present invention.
[0029] Figure 4 is an isometric view of the spring plate arm in an embodiment of the present invention.
[0030] Figure 5 is a schematic diagram of the drive module of the rope-driven spring plate flexible arm in an embodiment of the present invention.
[0031] Figure 6 is a schematic diagram of the spring plate sub-unit in an embodiment of the present invention.
[0032] Figure 7 is a schematic diagram of the drive kit in an embodiment of the present invention.
[0033] Figure 8 is a schematic diagram of the sensing kit in an embodiment of the present invention.
[0034] Figure 9 is a flowchart of the gaze estimation method in an embodiment of the present invention.
[0035] Figure 10 is a schematic diagram of the operation flow of the gaze estimation method in an embodiment of the present invention.
[0036] Figure 11 is a schematic diagram of the gaze estimation network model generated in an embodiment of the present invention.
[0037] Figure 12 is a schematic diagram of a rope-driven elastic rod flexible arm in another embodiment of the present invention.
[0038] Figure 13 is a schematic diagram of a spatial three-dimensional robotic arm in another embodiment of the present invention.
[0039] The attached figures are labeled as follows:
[0040] 1 Mounting plate, 2 Rigid robotic arm, 3 Rope-driven spring plate flexible arm, 31 Spring plate arm, 311 Spring plate sub-unit, 3111 Spring plate, 3112 Concave disc, 3113 Convex disc, 312 Retaining rope, 32 Drive box, 321 Drive kit, 3211 Cable reel, 3212 Retainer, 3213 Second motor module, 3214 Reducer, 3215 Encoder, 322 Sensor kit, 3221 Tension sensor, 3222 Digital transmitter, 3223 Guide pulley, 323 Driver mounting plate, 324 Drive box mounting post, 325 Arm adapter plate, 33 Drive rope, 4 Camera unit, 41 Camera, 42 Camera mounting bracket.
Detailed Implementation
[0041] The present invention will be further described below with reference to the accompanying drawings and preferred embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other.
[0042] It should be noted that the directional terms such as left, right, up, down, top, and bottom used in this embodiment are only relative concepts or are based on the normal use of the product, and should not be considered as restrictive.
[0043] The current method of fixing plates is to use a fixed base. Some improvements have added a rotating function to the fixed base, which can easily lead to the following two problems:
[0044] 1. The food tray is fixed in one position, which greatly reduces the feeding range and makes the robot less flexible.
[0045] 2. The plates cannot be easily replaced. Because of the added rotational degree of freedom, the plates must be fixed to the base and made of special materials, so they cannot be swapped out, which makes the design less versatile. In contrast, the rigid arm can hold bowls and trays, making it more flexible and convenient.
[0046] The present invention provides a rigid-flexible dual-arm assisted feeding robot, as shown in Figure 1. It addresses the difficulty existing feeding robots have in balancing accuracy and safety, and it enhances human-robot interaction by introducing gaze detection. The feeding robot comprises a mechanical part and an integrated gaze estimation algorithm. The mechanical part combines rigid and flexible arms and includes a mounting plate 1, a rigid robotic arm 2, a flexible robotic arm, and a camera interaction system. In this embodiment, the flexible robotic arm is a rope-driven spring plate flexible arm 3. The camera interaction system includes a camera unit 4, a gaze estimation module, and a controller. The flexible robotic arm has a sensing unit inside for compliant control of the end effector, ensuring feeding safety. The camera unit 4 captures facial images and transmits them to the gaze estimation module. The gaze estimation module implements the gaze estimation method for the assisted feeding robot proposed in this embodiment to obtain real-time gaze estimation results. Based on these results, the controller in the camera interaction system controls the rigid robotic arm to place the food tray in front of the person being fed and controls the flexible robotic arm to deliver the desired food to the person's mouth. In this embodiment of the invention, the controller is an STM32 microcontroller.
[0047] In this embodiment, camera unit 4 is camera 41. The rigid robotic arm 2, the rope-driven spring flexible arm 3, and the camera unit 4 are fixedly mounted on the mounting module with screws. The algorithm part includes a line-of-sight estimation method.
[0048] The rigid robotic arm 2 includes a rigid link and a rotating module. The rigid link and the rotating module are fixedly connected at intervals. When performing a feeding task, the rotating module provides driving force to drive the rigid robotic arm 2 to rotate. The rigid robotic arm 2 can place the food tray in front of the person being fed after it grasps the food tray with its end gripper.
[0049] Traditional feeding robots lack a gaze estimation module, requiring users to operate them via buttons, resulting in low automation, high operating costs, and a poor human-computer interaction experience. In this embodiment of the invention, the feeding robot uses a camera interaction system to control a rigid robotic arm 2 to grasp the food tray and place it in front of the person being fed. The gaze estimation method proposed in this embodiment confirms that the food tray contains the desired food, and then controls a flexible robotic arm to deliver the food to the person's mouth. Specifically, the camera unit 4 uses the gaze estimation method to confirm the desired food in the tray. This gaze estimation method relies on facial image data captured by the camera unit 4. The gaze estimation module receives facial image data captured by camera 41 as input to obtain real-time gaze estimation results.
[0050] The rope-driven spring flexible arm 3 in this embodiment includes an elastic arm, a drive module, and a drive rope 33. The drive module drives the elastic arm to move by pulling the drive rope 33. When performing a feeding task, the rope-driven spring flexible arm 3 uses the utensils installed at its end to deliver the target food to the mouth of the person being fed. The end compliance control is achieved through the sensing unit of the rope-driven spring flexible arm 3 to ensure feeding safety. At this point, the entire feeding task is completed.
[0051] The rope-driven spring plate flexible arm 3 in this embodiment has the ability to move in the horizontal plane. The specific steps are as follows:
[0052] 1. A rigid robotic arm 2 places the plate in the appropriate place;
[0053] 2. The rope-driven spring plate flexible arm 3 is bent into a "C" or "S" shape so that its end reaches a position near the plate;
[0054] 3. Camera unit 4 detects the gaze of the person being fed, determines the specific food they want, and converts it into a location signal;
[0055] 4. The rope-driven spring plate flexible arm 3 moves to the position of the specific food item mentioned above, and the end gripper picks up the food item.
[0056] 5. The rope-driven spring plate flexible arm 3 is straightened, and the end is brought close to the mouth of the person being fed, and the food is slowly brought into the mouth.
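Step 3 above, turning a gaze estimate into a location signal, can be sketched as a ray-plane intersection followed by a nearest-neighbour lookup. This is a minimal illustration under stated assumptions, not the patented procedure: the tray is modelled as a horizontal plane, the food positions are assumed to be known in the camera frame, and every name, coordinate, and threshold below is hypothetical.

```python
import math

def intersect_ray_with_plane(origin, direction, plane_point, plane_normal):
    """Return the point where the gaze ray hits the plane, or None if it never does."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # gaze is parallel to the tray plane
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t <= 0:
        return None  # the plane lies behind the eyes
    return [o + t * d for o, d in zip(origin, direction)]

def select_food(hit_point, food_positions, max_distance_m):
    """Pick the food whose assumed position is closest to the gaze hit point."""
    if hit_point is None:
        return None
    name = min(food_positions, key=lambda key: math.dist(hit_point, food_positions[key]))
    if math.dist(hit_point, food_positions[name]) > max_distance_m:
        return None  # the user is not looking at any food item
    return name

# Assumed geometry in metres: eyes 0.4 m above the tray plane y = 0.
eye_center = [0.0, 0.4, 0.5]
gaze_direction = [0.1, -0.4, -0.3]
tray_point, tray_normal = [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]
foods = {"rice": [0.1, 0.0, 0.2], "soup": [-0.1, 0.0, 0.2], "vegetables": [0.0, 0.0, 0.35]}

hit = intersect_ray_with_plane(eye_center, gaze_direction, tray_point, tray_normal)
target = select_food(hit, foods, max_distance_m=0.05)
print(hit, target)
```

The distance threshold keeps the arm from moving when the gaze lands on an empty part of the tray.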
[0057] It should be noted that: 1. The end gripper is not driven by the motor in the drive box, but is driven by a separate servo motor, just like the end of the rigid arm. 2. Taking one of the motors as an example, the rotation of the motor pulls the rope to shorten (or lengthen). The end of the rope is fixed to the concave-convex disc of the rope-driven spring-plate flexible arm 3 by the rope head lock, which pulls the rope-driven spring-plate flexible arm 3 to bend in the horizontal plane. Other motors are similar. The required rope length for a specific shape can be obtained through kinematic algorithms, which can then be converted into the number of revolutions required by the motor.
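The conversion from a required rope length to motor revolutions can be illustrated as follows. The sketch assumes a constant-curvature planar bend, in which a rope offset by a distance d from the arm's center line changes length by d times the bending angle, and a cable reel of fixed winding radius behind a reducer. The kinematic model, the function names, and all numeric values are assumptions made for illustration; the patent does not specify them.

```python
import math

def rope_length_changes(bend_angle_rad, rope_offset_m):
    """Length change of the inner and outer rope for a constant-curvature bend.

    Negative means the rope must be wound in, positive means paid out.
    """
    delta = rope_offset_m * bend_angle_rad
    return -delta, delta

def motor_revolutions(delta_length_m, reel_radius_m, gear_ratio):
    """Motor revolutions needed to wind delta_length_m on a reel behind a reducer."""
    reel_revolutions = delta_length_m / (2.0 * math.pi * reel_radius_m)
    return reel_revolutions * gear_ratio

# Assumed values: 90 degree bend, ropes 20 mm off-centre, 10 mm reel radius, 10:1 reducer.
inner_delta, outer_delta = rope_length_changes(math.pi / 2, rope_offset_m=0.02)
inner_revs = motor_revolutions(inner_delta, reel_radius_m=0.01, gear_ratio=10.0)
outer_revs = motor_revolutions(outer_delta, reel_radius_m=0.01, gear_ratio=10.0)
print(inner_revs, outer_revs)
```

With these assumed numbers, the motor on the inner rope winds in by five revolutions while the motor on the outer rope pays out by the same amount.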
[0058] In this embodiment, the installation module is the mounting plate 1, which consists of a metal fixing plate and an aluminum profile bracket, wherein the metal plate has corresponding positioning holes.
[0059] In this embodiment, the rigid robotic arm 2 consists of four rigid links and a rotating module. Specifically, the rotating module in this embodiment is a joint motor module, which includes three parallel joints and a first motor module. The first motor module is the motor module at the joint, and the joint is directly driven by the motor module at the joint. That is, the rotation motor at the joint directly drives the joint to rotate. Under this joint configuration, the rigid robotic arm 2 has three degrees of freedom, and its end effector can only move in the corresponding plane.
[0060] In this embodiment, the rope-driven spring plate flexible arm 3, as shown in Figure 2, includes an elastic arm, a drive module, and a drive rope 33. In this embodiment of the invention, the elastic arm is a spring plate arm 31, and the drive module is a drive box 32. The rope-driven spring plate flexible arm 3 is driven as follows: the second motor module 3213 shown in Figure 7 pulls the drive rope 33, thereby driving the spring plate arm 31 to perform planar motion, where the plane of motion is perpendicular to the rectangular spring plate 3111 and passes through its center point.
[0061] As shown in Figures 2 and 4, the spring plate arm 31 has a modular structure consisting of several spring assemblies and a retaining rope 312. In this embodiment, the spring assembly is a spring plate sub-unit 311. Specifically, the spring plate arm 31 consists of three short arm segments and one retaining rope 312; each short arm segment contains four spring plate sub-units 311, for a total of 12. As shown in Figure 6, each spring plate sub-unit 311 consists of a spring plate 3111, a concave disk, and a convex disk; in this embodiment these are the concave disc 3112 and the convex disc 3113. The pins on the convex disc 3113 pass through the holes in the spring plate 3111 and engage the holes in the concave disc 3112. In this embodiment, the convex disc 3113 has two pins that mate with the two holes in the concave disc 3112 for positioning, and the discs are then fixed by screws at the lifting lugs on both sides. One side of the arm adapter plate 325 is fixed to the support platform of the drive box 32 by bottom screws, and the other side is fixed to the spring plate arm 31 by screws, connecting the two.
[0062] In this embodiment, the retaining rope 312 is a steel wire rope. As shown in Figures 2 and 4, its two ends are fixed to the spring plate arm 31, specifically to the openings on the concave and convex discs of the first and last spring plate sub-units 311 of the spring plate arm 31. It holds the sub-units in place and greatly reduces the lateral bending of the spring plate arm 31 when the arm deflects through a large angle.
[0063] As shown in Figure 6, in this embodiment the spring plate 3111 is made of 45 steel with a thickness of 0.7 mm and is positioned by the pins between the concave disc 3112 and the convex disc 3113. Specifically, the convex disc has two pins (formed integrally with the disc) and the concave disc has two holes; the pins mate with the holes for positioning, and the two lifting lugs beside them are fixed by screws.
[0064] Because the spring plate 3111 has a certain thickness, the spring plates need to be installed alternately on the left and right so that the center of the spring plate arm 31 does not deviate to one side. If every spring plate were mounted on the same side, the center of the entire rope-driven spring plate flexible arm 3 would shift; alternating left and right installation avoids this. Specifically, the spring plate 3111 is positioned by the concave disc 3112 and the convex disc 3113, and the plane of motion of the spring plate arm 31 is always perpendicular to the spring plate 3111 and passes through the center point of the spring plate 3111. Both the concave disc 3112 and the convex disc 3113 have opening areas: on each disc, at least one opening area lets the drive rope 33 pass through, and at least one opening area lets the retaining rope 312 pass through. In this embodiment, the concave disc 3112 and the convex disc 3113 each have four opening areas: top, bottom, left, and right. As shown in Figure 2, the left and right openings allow the drive rope 33 to move; as shown in Figure 4, the upper and lower openings allow the retaining rope 312 to move.
[0065] As shown in Figures 2, 3, and 5, in this embodiment the drive box 32 of the rope-driven spring plate flexible arm 3 consists of a corresponding number of drive kits 321 and sensing kits 322 (the sensing kits 322 are the sensing units in this embodiment), a driver mounting plate 323, a drive box mounting post 324, and an arm adapter plate 325. Specifically, the drive box 32 consists of 6 drive kits 321, sensing kits 322, the driver mounting plate 323, the drive box mounting post 324, and the arm adapter plate 325. The drive kits 321 and sensing kits 322 are each fixedly mounted on the driver mounting plate 323 and are used together to measure the tension signal of the drive rope 33. One side of the arm adapter plate 325 is fixedly connected to the driver mounting plate 323, and the other side is fixedly connected to the spring assembly of the elastic arm. As shown in Figure 7, the drive kit 321 includes a cable reel 3211, a retainer 3212, a second motor module 3213, a reducer 3214, and an encoder 3215. As shown in Figure 8, the sensing kit 322 includes a tension sensor 3221, a digital transmitter 3222, and a guide pulley 3223. The bottom of the cable reel 3211 is connected to the output shaft of the second motor module 3213 via a key. The cable reel 3211 has a U-shaped groove; the end of the drive rope 33 is fixed inside the groove, and the drive rope 33 winds around it. The cable reel 3211 transmits the torque of the second motor module 3213, pulls the drive rope 33, takes up the excess length of the drive rope 33, and constrains its path. The retainer 3212 keeps the drive rope 33 at a specific angle, which supports the subsequent tension calculation (the calculation requires the geometric relationship of the path of the drive rope 33). The tension sensor 3221 measures the resultant force of the drive rope 33 at the guide pulley 3223, and the digital transmitter 3222 converts the tension signal of the drive rope 33 into a digital signal.
[0066] Starting from the end fixed on the cable reel 3211, the drive rope 33 follows the path: cable reel 3211, guide pulley 3223, spring plate sub-unit 311. The second motor module 3213, reducer 3214, and encoder 3215 form a common motor kit. The encoder 3215 is a sensor that provides feedback on the position, speed, and current of the motor (i.e., the second motor module 3213).
[0067] As shown in Figures 1, 2, 3, and 7, in this embodiment the drive rope 33 is a steel wire rope that serves as the transmission element. One end is fixed to the cable reel 3211 of the drive kit 321, and the other end is fixed to the corresponding spring plate sub-unit 311 of the spring plate arm 31. Each end is fixed by fitting a rope head and tightening it with screws, relying on friction. Each short segment of the spring plate arm 31 is driven by two drive ropes 33 and can only bend in a plane. The two drive ropes 33 are fixed at one end to the two sides of the spring plate sub-unit 311 at the end of the segment and at the other end to the corresponding second motor modules 3213. The second motor modules 3213 change the bending angle of the segment by tightening or loosening the drive ropes 33.
[0068] In this embodiment, the way the rope-driven flexible robot acquires rope tension data depends on the positional arrangement of the drive kit 321 and the sensing kit 322, as shown in Figure 3; the purpose of this arrangement is to measure the tension of the drive rope 33. The principle is to measure the resultant force of the drive rope 33 at the guide pulley 3223 using the tension sensor 3221, and then to obtain the tension signal of the drive rope 33 through geometric conversion. Specifically, the component force along the drive rope 33, which is the tension, is obtained from the angle of the rope segments on the two sides of the guide pulley 3223, where the geometric relationship is given by the common tangent of the retainer 3212 and the guide pulley 3223. The tension signal is then converted into a digital signal by the digital transmitter 3222. Based on the measured tension data, compliant control of the flexible robotic arm's end effector can be achieved.
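The geometric conversion from the resultant force at the guide pulley to the rope tension can be sketched with a standard statics relation. It assumes a frictionless pulley, so that both rope segments carry the same tension T, and a sensor that measures the full resultant R; with an angle α between the two rope segments leaving the pulley, R = 2T·cos(α/2). The function name and the numbers are hypothetical, and the actual geometry in the patent is fixed by the retainer and pulley positions, which are not given numerically.

```python
import math

def rope_tension_from_resultant(resultant_n, angle_between_segments_rad):
    """Recover rope tension T from the pulley resultant R using R = 2*T*cos(alpha/2).

    alpha is the angle between the two rope segments as they leave the pulley.
    """
    cos_half = math.cos(angle_between_segments_rad / 2.0)
    if cos_half < 1e-6:
        # Segments are (nearly) collinear: the rope barely loads the pulley,
        # so the tension cannot be recovered from the resultant.
        raise ValueError("rope segments are collinear; tension is not observable")
    return resultant_n / (2.0 * cos_half)

# Assumed example: rope turns through a right angle, sensor reads about 14.14 N.
measured_resultant = 2.0 * 10.0 * math.cos(math.pi / 4)
tension = rope_tension_from_resultant(measured_resultant, math.pi / 2)
print(round(tension, 6))
```

Because the retainer holds the rope angle constant, the conversion factor 1 / (2·cos(α/2)) is a fixed constant for each rope and can be computed once.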
[0069] This embodiment provides a real-time gaze estimation method and system based on facial and binocular features. It includes four steps: image acquisition and face detection, head pose correction, eye image extraction, and pupil detection and gaze estimation.
[0070] This system considers the impact of head pose on the gaze estimation vector and eliminates this impact by correcting the image, allowing the gaze estimation algorithm to ignore the influence of head pose and focus primarily on the gaze estimation accuracy under the standard pose. This part works together with the dual-arm feeding robot: by estimating the gaze, it detects the fed person's attention in real time so that the robot makes the correct food selection and feeding actions. The gaze estimation method proposed in this embodiment, shown in Figure 9, includes five steps: receiving face images, face detection and alignment, mesh fitting and head pose estimation, face image correction, and gaze estimation.
[0071] The specific process of the gaze estimation method, shown in Figure 10, is as follows:
[0072] S1. Obtain the face image captured by the camera through the camera module;
[0073] S2, Face Detection and Alignment Step: This step detects the face position in the face image and obtains facial feature points. The input target face image must have sufficient detail to distinguish facial contours and pupils. Specifically, it includes the following steps:
[0074] S21. Detect the face position of the face image using a face detection algorithm; specifically, detect the face position of the person from the input face image, crop the image of the face center and surrounding area and perform facial feature point detection. This process outputs the positions of 6 facial feature points through the BlazeFace (face detection) algorithm, including the centers of the two eyes, the two ears, the nose, and the mouth.
[0075] S22. Based on the angle between the horizontal line and the line connecting the feature points of the current pose, obtain the transformation relationship between the current pose and the standard pose, and perform the first face alignment. Specifically, a preliminary face alignment is performed based on the facial feature points. The pose in which the line connecting the left and right feature points is horizontal is the standard pose, and the pose represented by the facial feature points detected in the current input image is the current pose. Here, face alignment refers to finding the transformation relationship between the current pose and the standard pose, which can be calculated from the angle between the horizontal line and the line connecting the facial feature points of the current pose.
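The first alignment in S22 can be sketched as an in-plane rotation. The example below assumes that the two eye centers are used as the left and right feature points and that landmarks are 2D pixel coordinates. The function names and coordinates are hypothetical, and the sketch rotates landmark coordinates rather than the image itself.

```python
import math


def roll_angle(left_eye, right_eye):
    """Angle (radians) between the eye-to-eye line and the image horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)


def rotate_about(point, center, angle):
    """Rotate a 2D point about center by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)


def align_to_standard(landmarks, left_eye, right_eye):
    """Rotate all landmarks so that the eye line becomes horizontal."""
    angle = roll_angle(left_eye, right_eye)
    center = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)
    # Rotating by the negative roll angle undoes the in-plane head tilt.
    return [rotate_about(p, center, -angle) for p in landmarks]


# Example coordinates (assumed): the eye appearing on the right is 20 px lower.
left, right = (100.0, 120.0), (160.0, 140.0)
aligned_left, aligned_right = align_to_standard([left, right], left, right)
```

After alignment the two eye centers share the same vertical coordinate, and the distance between them is unchanged because the transform is a pure rotation.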
[0076] Step S3, Mesh Fitting and Head Pose Estimation, involves aligning the face based on facial feature points and estimating the head pose. Step S3 includes the following steps:
[0077] S31. Perform a second face alignment based on the facial feature points using a real-time face alignment method. Specifically, based on the 6 facial feature point positions obtained in S2, perform a second face alignment by further aligning the face using the real-time face alignment method FaceMesh, which outputs 468 face alignment points.
[0078] S32. The head pose matrix is obtained by iteratively finding the most suitable projected 3D transformation from feature points in the standard pose and the current pose using Procrustes analysis. Specifically, the pose with the head upright and facing the camera is called the standard pose, and the head pose in the current input face image is called the current pose. The current pose can be represented by the positions of 468 facial alignment points. Then, the most suitable projected 3D transformation is found iteratively from feature points in the standard pose and the current pose using Procrustes analysis. Here, Procrustes analysis is a method for finding 3D transformations between different poses. Using this method, combined with camera calibration, the current head pose matrix H can be obtained, and the following relationship holds:
[0079]
[0080] In the formula, the position of the center of the eyes of the person being fed relative to the camera is related, through the head pose matrix H, to the center positions of the eyes in the standard pose. The input image I_H is then warped using the head pose matrix H to obtain the image I_O corresponding to the standard pose. The location of the mouth center is given by the mouth feature point output by the BlazeFace face detection algorithm, rotated by the head pose matrix.
[0081] S4, the face image correction step, corrects the face image based on the head pose estimation result. Specifically, the face image is corrected using the head pose matrix H obtained in step S3, so that the face image is in the standard pose facing the screen. The corrected image is then used in the subsequent gaze estimation step. Specifically, based on the feature points at both eyes, the left-eye and right-eye images are extracted from image I_O.
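The rigid alignment that underlies a Procrustes fit has a well-known closed-form solution, written out below. This is a sketch under assumptions, not the procedure of this embodiment. The symbols p_i (the 468 alignment points in the standard pose), q_i (the same points in the current pose), R, and t are introduced here only for illustration. The iterative search for a projected 3D transformation combined with camera calibration, as described in S32, is not reproduced.

```latex
\begin{aligned}
(R^{*}, t^{*}) &= \arg\min_{R \in SO(3),\; t \in \mathbb{R}^{3}} \sum_{i=1}^{468} \lVert R\,p_i + t - q_i \rVert^{2}, \\
\bar{p} &= \tfrac{1}{468}\sum_{i} p_i, \qquad \bar{q} = \tfrac{1}{468}\sum_{i} q_i, \\
M &= \sum_{i} (q_i - \bar{q})(p_i - \bar{p})^{\mathsf{T}} = U \Sigma V^{\mathsf{T}}, \\
R^{*} &= U \,\mathrm{diag}\bigl(1,\; 1,\; \det(U V^{\mathsf{T}})\bigr)\, V^{\mathsf{T}}, \qquad t^{*} = \bar{q} - R^{*}\bar{p}, \\
H &= \begin{bmatrix} R^{*} & t^{*} \\ 0^{\mathsf{T}} & 1 \end{bmatrix}.
\end{aligned}
```

Under these assumptions, H maps a point expressed in the standard pose to its position in the current pose, which is consistent with transforming the standard-pose eye centers by H.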
[0082] S5. The gaze estimation step extracts features from the corrected face image to obtain the real-time gaze estimation result. Based on this result, the food on the plate that the person being fed wants is identified. The end effector of the robotic arm is compliantly controlled through the sensing unit to ensure feeding safety.
[0083] Specifically, the face image corrected in S4 is input into a trained user-specific gaze estimation network model, which extracts features from the corrected face image and estimates the gaze estimation vector in the standard pose. The gaze estimation vector for the current pose is obtained by rotating the gaze estimation vector in the standard pose using the head pose matrix. The sensing unit includes a tension sensor, which measures the resultant force of the drive rope and converts it into a tension signal through geometric relationships. A digital transmitter then converts this tension signal into a digital signal. The gaze estimation vector in the standard pose and the gaze estimation vector in the current pose satisfy the following relationship:
[0084]
[0085] In the formula, the gaze estimation vector in the current pose is obtained by applying the head pose matrix H to the gaze estimation vector in the standard pose.
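The rotation of the gaze vector by the head pose matrix can be sketched as follows. The example assumes that H is a 4x4 homogeneous matrix whose upper-left 3x3 block is a rotation and whose last column holds a translation. The axis convention, the helper names, and the example pose are all hypothetical. Directions such as the gaze vector use only the rotation block, while points such as the eye center also receive the translation.

```python
import math


def rotation_part(H):
    """Upper-left 3x3 block of a 4x4 homogeneous head pose matrix."""
    return [row[:3] for row in H[:3]]


def rotate_direction(H, v):
    """Rotate a direction vector (translation does not apply to directions)."""
    R = rotation_part(H)
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def transform_point(H, p):
    """Apply both the rotation and the translation of H to a 3D point."""
    rotated = rotate_direction(H, p)
    return [rotated[i] + H[i][3] for i in range(3)]


def yaw_pose(angle_rad, translation):
    """Hypothetical pose: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Example (assumed values): head turned 90 degrees, 0.5 m in front of the camera.
H = yaw_pose(math.radians(90.0), (0.0, 0.0, 0.5))
gaze_standard = [0.0, 0.0, -1.0]  # assumed convention: -z points at the camera
gaze_current = rotate_direction(H, gaze_standard)
eye_in_camera = transform_point(H, [0.03, 0.0, 0.0])
```

Keeping directions and points in separate helpers avoids accidentally adding the head translation to the gaze vector.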
[0086] Specifically, the left and right eye images extracted by S4 are input into a user-specific gaze estimation network model. The gaze estimation features for both eyes are extracted through a backbone network, which can be any network that can be trained to extract gaze estimation feature vectors. The extracted gaze estimation features are combined with the positions of the left and right eyes in the standard pose and input into a gaze multilayer perceptron to obtain gaze estimation vectors. The model is then trained using the error between these vectors and the actual gaze direction to obtain the user-specific gaze estimation network model.
[0087] The training and optimization of the user-specific gaze estimation network model in this embodiment includes the following steps:
[0088] S51. Optimization targets the estimation errors caused by differences in eyeball structure between individuals. During the training phase, the gaze estimation network model is trained using the meta-learning method MAML++, an improved version of MAML (Model-Agnostic Meta-Learning). This framework improves the generalization ability of the gaze estimation network model across the gaze estimation tasks of different individuals, yielding a general gaze estimation network model.
[0089] S52. On this basis, only the user of the feeding robot needs to be calibrated. By taking pictures of the user gazing at objects at known locations, the general gaze estimation network can be fine-tuned to obtain the user-specific gaze estimation network model. This reduces gaze estimation errors caused by differences in eye structure and environment.
[0090] The main idea behind the meta-learning method MAML++ is to let the network learn how to learn from different tasks. Here, "different tasks" are defined as gaze estimation tasks for different individuals. The detailed training method is shown in Figure 11.
[0091] More specifically, training the gaze estimation network model in step S51 includes the following steps:
[0092] S511. Divide the gaze estimation dataset used for training according to identity labels. Sample the divided dataset in an n-way (n categories), k-shot (k images per category) manner: from the data of n different identities, sample k+1 images per identity, using k images for training and the remaining 1 image for testing during the training phase. Here, n and k are constants.
[0093] S512. The n*k images used for training are called a Support set, and the remaining k images are called a Query set. The set consisting of a Support set and a Query set is called a sample. The gaze estimation network model is trained on the Support set of a sample and tested on the Query set, which is called a task.
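The sampling in S511 and S512 can be sketched as an episode sampler. The example is a minimal illustration with a toy dataset of placeholder file names. The function name, the data layout (a dictionary from identity label to image list), and the values of n and k are assumptions. In this sketch each sampled identity contributes k images to the support set and its one held-out image to the query set.

```python
import random


def sample_episode(images_by_identity, n, k, rng):
    """Draw one n-way k-shot sample: a support set and a query set.

    images_by_identity maps an identity label to a list of image records.
    For each of n identities, k + 1 images are drawn: k go to the support
    set and the remaining one goes to the query set.
    """
    eligible = [ident for ident, imgs in images_by_identity.items() if len(imgs) >= k + 1]
    if len(eligible) < n:
        raise ValueError("not enough identities with k + 1 images")
    support, query = [], []
    for ident in rng.sample(eligible, n):
        picked = rng.sample(images_by_identity[ident], k + 1)
        support.extend((ident, img) for img in picked[:k])
        query.append((ident, picked[k]))
    return support, query


# Toy dataset (assumed): 5 identities with 10 placeholder image names each.
dataset = {f"person_{i}": [f"img_{i}_{j}" for j in range(10)] for i in range(5)}
support_set, query_set = sample_episode(dataset, n=3, k=4, rng=random.Random(0))
```

Passing the random generator in explicitly keeps the episodes reproducible, which helps when comparing training runs.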
[0094] S513. Take h samples as a batch for training to obtain the loss of these h tasks. The structure of each gaze estimation network model used for training is the same as the original gaze estimation network model, which is called the same network structure model. Here, h is a constant.
[0095] S514. These h gaze estimation network models (with the same network structure) are trained on the support sets of the h samples. Each gaze estimation network model is called a meta-learner: the first is meta-learner 1, the i-th is meta-learner i, and so on, with the last being meta-learner h. After training on the support sets of the h samples, the h meta-learners are validated on the corresponding query sets, yielding h losses (loss_1 to loss_h). The sum of these losses is used as the final Loss (the sum of the losses between each meta-learner's predicted gaze and the gaze label).
[0096] S515. Finally, the gradient of the final Loss with respect to the initial parameters φ of the gaze estimation network model is computed and used to update φ, for a total of N rounds of training. The resulting general gaze estimation network model has the following property: when trained separately for several rounds on each of the sampled tasks, it attains the minimum sum of losses. That is, its initialization converges fastest across all tasks, which greatly improves the generalization of the general gaze estimation network model and also allows subsequent fine-tuning to converge quickly.
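The structure of steps S513 to S515 can be illustrated with a toy example. The sketch below uses plain MAML with a single inner gradient step and a scalar model y = w·x, for which the gradient through the inner step can be written by hand. It does not include the refinements of MAML++ and it is not a gaze estimation network. The task generator, learning rates, and slopes are hypothetical stand-ins for per-individual differences.

```python
import random


def moments(points):
    """Return mean(x*x) and mean(x*y) for a list of (x, y) pairs."""
    a = sum(x * x for x, _ in points) / len(points)
    b = sum(x * y for x, y in points) / len(points)
    return a, b


def mse(w, points):
    """Mean squared error of the scalar model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


def meta_step(phi, tasks, inner_lr, outer_lr):
    """One meta-update of phi over a batch of (support, query) tasks."""
    total_loss, total_grad = 0.0, 0.0
    for support, query in tasks:
        a_s, b_s = moments(support)
        a_q, b_q = moments(query)
        # Inner loop: one gradient step on the support set (the meta-learner).
        w = phi - inner_lr * 2.0 * (phi * a_s - b_s)
        # Outer loss: evaluate the adapted meta-learner on the query set.
        total_loss += mse(w, query)
        # Chain rule through the inner step: dL_q/dphi = dL_q/dw * dw/dphi.
        total_grad += 2.0 * (w * a_q - b_q) * (1.0 - 2.0 * inner_lr * a_s)
    return phi - outer_lr * total_grad, total_loss


def make_task(slope, rng, size=5):
    """Hypothetical task: noiseless samples of y = slope * x for one individual."""
    pts = [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(2 * size))]
    return pts[:size], pts[size:]


rng = random.Random(0)
tasks = [make_task(s, rng) for s in (0.8, 1.0, 1.2)]  # h = 3 tasks
phi, losses = 0.0, []
for _ in range(200):  # N rounds of training
    phi, loss = meta_step(phi, tasks, inner_lr=0.1, outer_lr=0.05)
    losses.append(loss)
```

In this toy setting the summed query loss falls over the rounds and φ settles between the per-task slopes, which mirrors the idea of an initialization that adapts quickly to each individual. A real implementation would rely on automatic differentiation instead of hand-derived gradients.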
[0097] The auxiliary feeding robot of this invention has the following effects:
[0098] I. Significantly improves safety while maintaining accuracy.
[0099] Traditional feeding robots use a single arm, which cannot control the position of the tray. The lack of coordination between the tray and utensils easily leads to spillage during food transport. Using only a rigid robotic arm compromises safety; using only a flexible robotic arm compromises accuracy. The accuracy of this invention is primarily reflected in the precise food-grabbing process. During food grabbing, the rigid and flexible arms work in conjunction with line-of-sight detection to accurately determine which dish the user wants to eat, such as radish or meat in radish stir-fry, or meat in green pepper stir-fry but not green peppers. A typical flexible arm cannot accomplish this task, but the rigid arm holding the tray can compensate. For example, when the flexible arm is near the target, the rigid arm controls the tray to move the target dish directly below the end of the flexible arm (via the camera unit), allowing the flexible arm to simply grab it. This collaborative approach ensures accuracy.
[0100] This invention proposes a novel rigid-flexible dual-arm system for assisted feeding. By combining a rigid robotic arm with a flexible robotic arm, it addresses the challenge of balancing precision and safety in existing feeding robots. Furthermore, it enhances safety primarily through both active and passive methods.
[0101] 1. Utilize the inherent passive compliance of the rope-driven spring flexible arm. Since the flexible arm itself is not a rigid connection, its springs have a certain degree of elasticity. Therefore, it will deform accordingly after touching the person being fed, so as to avoid accidentally injuring the person being fed or causing a bad feeding experience.
[0102] 2. A novel tension sensing scheme for a rope-driven flexible arm is proposed, and the obtained tension is used for active compliant control. This scheme involves alternating ropes, using tension sensors to measure the resultant force of the ropes at the guide pulleys, and limiting the position of each rope to maintain geometric relationships. The rope tension is then calculated from the resultant force using these geometric relationships. With the rope tension information, compliant control and force limiting can be implemented on the robot, preventing accidental injury to the person being fed and improving safety.
[0103] II. Improve the human-computer interaction experience
[0104] This invention uses gaze estimation technology to detect the desired food on the plate from the gaze of the person being fed. During this process, the person being fed only needs to move their eyes, with no other actions required, which significantly reduces the operational steps and learning cost, improves the dining experience, and enhances human-computer interaction. The algorithm offers two main benefits:
[0105] 1. Currently, the mainstream method for gaze estimation is to use eye trackers. However, specialized equipment such as eye trackers is expensive and requires an additional screen for gaze estimation. Gaze estimation significantly improves the user-friendliness of feeding tasks, especially for individuals with upper limb motor impairments and those with weakened functional abilities. Deploying eye trackers and screens for feeding tasks would greatly increase the cost and size of feeding robots, contradicting the initial goals of low-cost and user-friendly care.
[0106] 2. Deep learning-based methods for estimating gaze using face and eye images only require a standard RGB camera and sufficient computing power, making them simpler and less costly to deploy compared to eye-tracking methods. However, current methods on the same dataset have an error (the angle between the actual gaze and the estimated gaze) of around 3°. When trained and tested on different datasets without model generalization and personalization, the error can reach as high as 10°-20°. This makes image-based gaze estimation unusable in real-world environments due to excessive error, thus preventing its deployment on feeding robots. The gaze estimation method proposed in this invention comprehensively considers the influence of head pose on the estimated gaze vector, the model's generalization ability, and personalization. By rotating the image after estimating the head pose, the gaze estimation algorithm can ignore the influence of the head pose and mainly consider the gaze estimation accuracy under the standard pose. By adding the meta-learning framework of MAML++, the generalization of the model is improved, resulting in a general gaze estimation model. By using the user's image to fine-tune the general model, the model can be well adapted to the user's eye structure and the environment, thereby improving accuracy and reducing errors. This makes the image-based gaze estimation method effective for feeding tasks.
[0107] III. Save labor costs and improve work efficiency
[0108] The rigid-flexible dual-arm assisted feeding robot of this invention replaces manual labor by automating the feeding process, saving labor costs and improving work efficiency.
[0109] In other embodiments, the elastic arm can be replaced with other flexible robotic arms, such as the rope-driven elastic-rod flexible arm shown in Figure 12. Specifically, the spring plate in the above embodiment is replaced with an elastic rod, the concave-convex disc composed of a concave disc and a convex disc is replaced with a structural disc, and the rope is fixed at the fixing point.
[0110] In other embodiments, the spring sheet in the spring sheet arm 31 can be changed in size and material.
[0111] The rigid robotic arm 2 in the above embodiments is a planar robotic arm, which can only move in a plane. In a preferred embodiment, its degree-of-freedom configuration can be changed. For example, it can be replaced with the spatial robotic arm shown in Figure 13, which can move in space and can change the height of the spoon without changing the effect.
[0112] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, several equivalent substitutions or obvious modifications can be made without departing from the concept of the present invention, and all such modifications, achieving the same performance or purpose, should be considered within the scope of protection of the present invention.
Claims
1. A method for estimating the gaze of an assisted feeding robot, characterized in that it includes the following steps: S1. Acquire facial images through the camera module; S2. Detect the face position in the face image and obtain facial feature points; S3. Perform face alignment based on the facial feature points and estimate the head pose; S4. Correct the face image based on the head pose estimation result; S5. Extract the features of the corrected face image to obtain the real-time gaze estimation result; based on the real-time gaze estimation result, confirm which food on the plate the person being fed wants, and realize compliant control of the end of the robotic arm through the sensing unit. In step S5, a user-specific gaze estimation network model is used to extract features from the corrected face image, and the gaze estimation vector under the standard pose is estimated. The gaze estimation vector under the current pose is obtained by rotating the gaze estimation vector under the standard pose using the head pose matrix. The sensing unit includes a tension sensor, which measures the resultant force of the drive rope and obtains the tension signal of the drive rope through geometric conversion. A digital transmitter is then provided to convert the tension signal of the drive rope into a digital signal.
2. The gaze estimation method for an assisted feeding robot as described in claim 1, characterized in that step S2 further includes the following steps: S21. Detect the face position in the face image using a face detection algorithm; S22. Based on the angle between the horizontal line and the line connecting the current pose feature points among the facial feature points, obtain the transformation relationship between the current pose and the standard pose, and perform the first face alignment.
3. The gaze estimation method for an assisted feeding robot as described in claim 2, characterized in that step S3 further includes the following steps: S31. Perform a second face alignment based on the facial feature points using a real-time face alignment method; S32. Obtain the head pose matrix by iteratively finding, through Procrustes analysis, the most suitable projected 3D transformation from the feature points in the standard pose and the current pose.
4. The gaze estimation method for an assisted feeding robot as described in claim 3, characterized in that the head pose matrix satisfies a relationship in which the position of the center of the eye of the person being fed relative to the camera is related, through the head pose matrix, to the center position of the eye in the standard pose.
5. The gaze estimation method for an assisted feeding robot as described in claim 1, characterized in that the gaze estimation vector under the standard pose and the gaze estimation vector under the current pose have the following relationship: the gaze estimation vector under the current pose is obtained by applying the head pose matrix to the gaze estimation vector under the standard pose.
6. The gaze estimation method for an assisted feeding robot as described in claim 1, characterized in that the training and optimization of the user-specific gaze estimation network model includes the following steps: S51. The gaze estimation network model is trained using a meta-learning framework, optimizing for the estimation error caused by differences in eyeball structure among different people, to obtain a general gaze estimation network model; S52. The user is calibrated by taking pictures of the user gazing at objects at known locations and then fine-tuning the general gaze estimation network model to obtain a user-specific gaze estimation network model.
7. The gaze estimation method for an assisted feeding robot as described in claim 6, characterized in that the meta-learning framework is the MAML++ framework.
8. The gaze estimation method for an assisted feeding robot as described in claim 6, characterized in that, in step S51, training the gaze estimation network model includes the following steps: S511. Divide the gaze estimation dataset used for training according to identity labels. Sample the divided dataset by means of n categories and k images in each category. Sample k+1 images from the data of different identities in n categories. Use k of these images for training and use the remaining 1 image for testing during the training phase; S512. The n*k images form a support set, and the remaining k images form a query set. A set consisting of a support set and a query set is considered a sample. The gaze estimation network model is trained on the support set of a sample and tested on the query set, which is called a task; S513. Take h samples as a batch of samples for training and obtain the loss for these h tasks; S514. Train h of the gaze estimation network models on h support sets, validate them on the query sets to obtain h losses, and sum these losses as the final loss; S515. Compute the gradient of the final loss with respect to the initial parameters of the gaze estimation network model, update the initial parameters of the gaze estimation network model, and perform multiple rounds of training; where n, k, and h are constants.
9. An auxiliary feeding robot, characterized in that it includes a mounting module, a rigid robotic arm, a flexible robotic arm, and a camera interaction system; the rigid robotic arm and the flexible robotic arm are fixedly mounted on the mounting module; the camera interaction system includes a camera module and a gaze estimation module; the flexible robotic arm is equipped with a sensing unit to achieve compliant control of the end effector; the camera module is fixedly mounted on the mounting module, captures facial images, and transmits them to the gaze estimation module; the gaze estimation module is used to implement the gaze estimation method for the auxiliary feeding robot as described in any one of claims 1-8, so as to obtain the real-time gaze estimation result; based on the real-time gaze estimation result, the camera interaction system controls the rigid robotic arm to place the plate in front of the person being fed, and controls the flexible robotic arm to deliver the desired food to the person's mouth.