Instruction for a dual-arm robot using demonstrations of human hands.

A camera-based neural network system accurately detects and converts human hand movements into robot gripper motions, addressing the inefficiencies of conventional robot teaching methods and enabling precise dual-arm robot programming.

JP7880791B2 · Active · Publication Date: 2026-06-26 · FANUC LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
FANUC LTD
Filing Date
2022-10-14
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Conventional methods for teaching industrial robots, particularly dual-arm robots, are unintuitive, time-consuming, and expensive, and existing techniques for identifying left and right hands in human performers impose artificial constraints or require complex computational steps, making it difficult to accurately program robots for tasks involving both hands.

Method used

A method using a single camera to capture human hand movements, analyze hand poses with neural networks, and convert these poses into robot gripper movements, allowing for efficient and reliable detection of both hands without whole-body imaging or skeletal analysis, and enabling precise robot programming.

Benefits of technology

Enables accurate and efficient teaching of dual-arm robots by simplifying the programming process, reducing errors, and eliminating the need for expensive motion capture systems, while allowing natural hand movements without artificial constraints.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a technique of efficient and reliable dual arm robot teaching using dual hand detection in human demonstration.

SOLUTION: There is provided a method for dual arm robot teaching from dual hand detection in human demonstration. A camera image of the demonstrator's hands and workpieces is provided to a first neural network which determines the identity of the left and right hands from the image, and also provides cropped sub-images of the identified hands. The cropped sub-images are provided to a second neural network which detects the poses of both the left and right hands from the images. The dual hand pose data for an entire operation is converted to robot gripper pose data and used for teaching two robot arms to perform the operation on the workpieces, where each hand's motion is assigned to one robot arm. Edge detection from camera images may be performed in tasks requiring precision, such as inserting a part into an aperture.

SELECTED DRAWING: Figure 3

Description

Technical Field

[0001] Cross-reference to Related Applications: This application is a continuation-in-part of U.S. Utility Patent Application No. 17/018,674, titled "DUAL HAND DETECTION IN TEACHING FROM DEMONSTRATION", filed on September 11, 2020.

[0002] This disclosure relates to the field of industrial robot programming, and more particularly, to a method for identifying and determining the poses of a human demonstrator's left and right hands from a series of camera images, wherein the resulting hand motion traces are improved with respect to the accuracy of motion points and the smoothness of the path, and the improved motion traces are used to teach or program a dual-arm robot system to perform operations through human demonstration.

Background Art

[0003] It is well known to use industrial robots to repeatedly perform various operations such as manufacturing, assembly, and material movement. However, teaching a robot to perform even a very simple operation, such as picking up a workpiece at a random position and orientation on a conveyor and moving the workpiece to a container, has been problematic with conventional methods.

[0004] One traditional method of robot teaching involves an operator using a teach pendant to instruct the robot to perform stepwise actions, such as "move in the X direction" or "rotate the gripper around the local Z axis," until the robot and its gripper are in the correct position and orientation for the operation. This operation data is then saved and repeated multiple times. Another known technique for teaching robots to perform operations is the use of motion capture systems in conjunction with human demonstrations. Because robot programming using teach pendants and motion capture systems has proven to be unintuitive, time-consuming, and/or expensive, robot teaching techniques using human demonstrations with camera images have been developed.

[0005] In some types of operations, such as assembling devices composed of many parts, people naturally use both hands to perform the operation task. For accurate robot teaching in such cases, it is necessary to reliably detect the left and right hands of a human performer. One known method for identifying the left and right hands of a human performer involves providing a camera image of the person's entire body, performing anthropomorphic analysis of the image to identify the left and right arms, and then identifying the left and right hands based on the arm identification. However, this technique requires a separate camera image for arm/hand identification, distinct from the image needed for hand pose detection, and further requires a supplementary computational step for skeletal analysis of the body.

[0006] Other techniques that may be used to distinguish between the left and right hands of a human performer require that each hand maintain a relative position to the other, or that all teaching movements of each hand remain within boundaries. However, these techniques impose constraints that are difficult to maintain on the natural hand movements of a human performer, and there is a risk of misidentification of hands if these constraints are violated.

[0007] Furthermore, some robotic tasks involving moving parts or assembling products utilize two robotic arms, where the two arms perform two different operations simultaneously, or where they work together to grasp, position, and install parts. The difficulty and time required by the conventional teaching methods described above become even more pronounced when motions must be taught to two robotic arms.

[0008] In light of the above circumstances, there is a need for an efficient and reliable dual-arm robot teaching technique that uses detection of both hands in a human demonstration.

Summary of the Invention

[0009] A method for teaching a dual-arm robot by detecting both hands in a human demonstration, in accordance with the teachings of this disclosure, is described and illustrated. Camera images of the performer's hands and the workpiece are provided to a first neural network, which identifies the left and right hands from the images and provides cropped sub-images of the identified hands. The cropped sub-images are provided to a second neural network, which detects the poses of both the left and right hands from the images. The pose data of both hands for the overall operation is converted into pose data for the robot's grippers and used to teach the two robot arms to perform operations on the workpiece. The movement of each hand is assigned to one robot arm. For tasks requiring precision, such as inserting a part into an opening, it is also possible to improve the robot's movement by performing edge detection from the camera images to improve part position estimation.

[0010] Additional features of the apparatus and method disclosed herein will become apparent from the following description and accompanying claims, in conjunction with the attached drawings.

Brief Description of the Drawings

[0011]
[Figure 1] Figure 1 shows a method according to one embodiment of the present disclosure for analyzing an image of a human hand and determining the corresponding position and orientation of a finger-type robotic gripper.
[Figure 2] Figure 2 shows a method according to one embodiment of the present disclosure for analyzing an image of a human hand to determine the corresponding position and orientation of a magnetic or suction cup type robotic gripper.
[Figure 3] Figure 3 is a diagram of a system and process relating to one embodiment of the present disclosure for identifying the position and posture of hands from camera images of both hands of a human performer.
[Figure 4] Figure 4 is a diagram illustrating the process of training a neural network for hand detection and identification used in the system of Figure 3, according to one embodiment of the present disclosure.
[Figure 5] Figure 5 is a flowchart illustrating a method for identifying the position and posture of a human performer's hands from camera images of both hands, according to one embodiment of the present disclosure.
[Figure 6] Figure 6 is a flowchart illustrating a method for instructing a robot to perform operations using camera images of a human performer's hands and corresponding workpieces, according to one embodiment of the present disclosure.
[Figure 7] Figure 7 is a diagram of a system for robot operation based on human demonstration and teaching using both hands, according to one embodiment of the present disclosure.
[Figure 8] Figure 8 is a block diagram of a system and process relating to one embodiment of the present disclosure, which identifies the position and orientation of a human performer's hands from camera images of both hands, refines the motions of both hands, and uses the refined motions to teach two robotic arms.
[Figure 9] Figure 9 illustrates a multi-step technique for precise object position estimation using vision-based edge detection to provide improved starting and target positions for the movements of both hands, as shown in one of the steps of Figure 8.
[Figure 10] Figure 10 is a set of 3D graphs illustrating two different techniques for smoothing or simplifying hand motion traces, as shown in one of the steps of Figure 8.
[Figure 11] Figure 11 is a system diagram of the operation of a dual-arm robot based on human demonstration and teaching using both hands, according to one embodiment of the present disclosure.
[Figure 12] Figure 12 is a flowchart illustrating a method for teaching a dual-arm robot based on the detection of both hands in a human demonstration, according to one embodiment of the present disclosure.

Modes for Carrying Out the Invention

[0012] The following discussion relating to embodiments of this disclosure concerning the teaching of a dual-arm robot by demonstration of human hands is essentially illustrative and is not intended in any way to limit the disclosed apparatus and technology, or any use or application of such apparatus and technology.

[0013] Industrial robots are well known for their use in a variety of manufacturing, assembly, and material handling operations. One known type of robot operation is sometimes referred to as "pick, move, and place." The robot picks up a part or workpiece at a first position and moves it to a second position. The first position is often a conveyor belt where parts with random orientations, such as parts just removed from a mold, are flowing. The second position may be another conveyor or transport container leading to a different operation, but in either case, the part needs to be positioned in a specific location and oriented in a specific orientation at the second position. Similarly, other robot operations, such as assembling multiple components into a device like a computer enclosure, require taking parts from one or more sources and placing them in a precise position and orientation.

[0014] To perform the types of operations described above, it is usually necessary to determine the position and orientation of the incoming part using a camera and to teach the robot to grasp the part in a specific way using a finger gripper or a magnetic or suction cup gripper. Teaching the robot how to grasp the part depending on its orientation has typically been done by a human operator using a teach pendant. The teach pendant is used by the operator to instruct the robot to move step by step, such as "move in the X direction" or "rotate the gripper around the local Z axis," until the robot and its gripper are in the correct position and orientation to grasp the workpiece. The robot's configuration and the position and orientation of the workpiece are then recorded by the robot controller for use in the "pick" operation. Similar teach pendant commands then define the "move" and "place" operations. However, using a teach pendant to program a robot is often unintuitive, error-prone, and time-consuming, especially for non-expert operators.

[0015] Another known technique for teaching robots to perform pick, move, and place operations is the use of motion capture systems. Motion capture systems consist of multiple cameras positioned around a work cell, recording the position and orientation of the human operator and the workpiece as the operator manipulates it. To more accurately detect key positions of the operator and workpiece within the camera images as the operation is performed, uniquely recognizable marker dots may be attached to the operator and/or the workpiece. However, this type of motion capture system is expensive, and accurately setting up and configuring it to ensure accurate recorded positions is difficult and time-consuming.

[0016] Techniques have been developed to overcome the limitations of the above-described existing robot teaching methods. These techniques include a method of using a single camera to capture images of a person performing natural part grasping and movement operations, analyzing the images of the person's hand and the position of the hand with respect to the part, and generating robot programming instructions.

[0017] Figure 1 is a diagram showing a method of analyzing an image of a human hand and determining the corresponding position and orientation of a finger-type robot gripper according to an embodiment of the present disclosure. The hand 110 has a hand coordinate system 120 that is defined as fixed to the hand. The hand 110 includes a thumb 112 having a thumb tip 114 and an index finger 116 having an index finger tip 118. Other points on the thumb 112 and the index finger 116, such as the positions of the bases of the thumb 112 and the index finger 116, and the positions of the first joints of the thumb 112 and the index finger 116, can also be identified in the camera image.

[0018] The point 122 is located midway between the base of the thumb 112 and the base of the index finger 116, and the point 122 is defined as the origin of the hand coordinate system 120. The orientation of the hand coordinate system 120 can be defined using any rule suitable for correlating with the orientation of the robot gripper. For example, the Y-axis line of the hand coordinate system 120 can be defined as perpendicular to the plane of the thumb 112 and the index finger 116 (the same plane is defined by the points 114, 118, and 122). Thus, the X-axis line and the Z-axis line are within the plane of the thumb 112 and the index finger 116. Further, the Z-axis line can be defined as bisecting the angle (angle 114-122-118) made by the thumb 112 and the index finger 116. The orientation of the X-axis line can be found by the right-hand rule from the known Y-axis line and Z-axis line. As described above, the rules defined here are merely illustrative, and other coordinate system orientations may be used instead. What is important is that the position and orientation of the coordinate system can be defined based on the main recognizable points of the hand, and the position and orientation of the coordinate system can be correlated with the position and orientation of the robot's gripper.
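The example rule in paragraph [0018] can be sketched in a few lines of code. This is only an illustration under the example convention given above, not the implementation of the disclosure. The function name `hand_frame`, the tuple-based vectors, and the sign chosen for the Y axis (thumb direction crossed with index finger direction) are assumptions made for this sketch.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return scale(a, 1.0 / norm)

def hand_frame(thumb_base, index_base, thumb_tip, index_tip):
    """Return (origin, x_axis, y_axis, z_axis) of a hand coordinate frame.

    Assumed convention (one of many possible):
      origin: midway between the thumb base and the index finger base
      Y axis: normal to the plane through the origin and both fingertips
      Z axis: bisector of the angle thumb tip - origin - index tip
      X axis: completes a right-handed frame (X = Y x Z)
    """
    origin = scale(add(thumb_base, index_base), 0.5)
    to_thumb = unit(sub(thumb_tip, origin))
    to_index = unit(sub(index_tip, origin))
    y_axis = unit(cross(to_thumb, to_index))   # sign is an assumption
    z_axis = unit(add(to_thumb, to_index))     # bisects the two unit vectors
    x_axis = cross(y_axis, z_axis)
    return origin, x_axis, y_axis, z_axis
```

With fingertips that are collinear with the origin the plane is undefined, and the sketch raises `ValueError` rather than guessing an orientation.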

[0019] An image of the hand 110 can be provided using a camera (not shown in Figure 1; described later). The image is analyzed to determine the spatial positions of the thumb 112 and index finger 116, including the thumb tip 114, index finger tip 118, and finger joints, and from these the origin position 122 and the orientation of the hand coordinate system 120 (for example, within the coordinate system of the work cell). In Figure 1, the position and orientation of the hand coordinate system 120 are associated with the gripper coordinate system 140 of the gripper 150 mounted on the robot 160. The gripper coordinate system 140 has an origin 142 corresponding to the origin 122 of the hand coordinate system 120, and points 144 and 146 corresponding to the index finger tip 118 and thumb tip 114, respectively. Thus, the two fingers of the finger-type gripper 150 are in the X-Z plane of the gripper coordinate system 140, and the Z-axis bisects the angle 146-142-144.

[0020] The origin 142 of the gripper coordinate system 140 is also defined as the tool center point of the robot 160. The tool center point is a point whose position and orientation are recognized by the robot controller, and the controller can provide command signals to the robot 160 to move the tool center point and the coordinate system (gripper coordinate system 140) associated with the tool center point to the defined position and orientation.

[0021] Figure 2 is a diagram showing a method of analyzing an image of a human hand to determine the corresponding position and orientation of a magnetic or suction cup type robot gripper according to an embodiment of the present disclosure. Figure 1 shows a method of associating the hand posture with the orientation of a mechanical gripper having movable fingers, whereas Figure 2 shows a method of associating the hand posture with a flat (e.g., circular) gripper that picks up a part by a flat surface of the part using either suction force or magnetic force.

[0022] The hand 210 also includes a thumb 212 and an index finger 216. Point 214 is located where the thumb 212 makes contact with the part 220. Point 218 is located where the index finger 216 makes contact with the part 220. Point 230 is defined as being midway between points 214 and 218, and point 230 corresponds to the tool center point (TCP) 240 of the surface gripper 250 of the robot 260. In the case of the surface gripper 250 shown in Figure 2, the plane of the gripper 250 may be defined as the plane that contains the line 214-218 and is perpendicular to the plane of the thumb 212 and index finger 216, as determined from the detection of the finger joints and fingertips. The tool center point 240 of the gripper 250 corresponds to point 230 as described above. This fully defines the position and orientation of the surface gripper 250 corresponding to the position and orientation of the hand 210.
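The construction in paragraph [0022] can be illustrated with a small sketch. It assumes the gripper plane contains the line 214-218 and is perpendicular to the thumb and index finger plane, and that the finger plane can be fixed by one additional detected point such as a knuckle. The function name `surface_gripper_pose` and the choice of a single knuckle point are hypothetical and introduced only for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / norm for x in a)

def surface_gripper_pose(thumb_contact, index_contact, knuckle):
    """Return (tcp, gripper_normal) for a suction or magnetic gripper.

    tcp:            midpoint of the two contact points (point 230)
    gripper_normal: unit normal of the plane that contains the contact
                    line and is perpendicular to the finger plane
    """
    tcp = tuple((a + b) / 2.0 for a, b in zip(thumb_contact, index_contact))
    contact_line = sub(index_contact, thumb_contact)
    # Normal of the thumb/index plane, fixed by the extra knuckle point.
    finger_plane_normal = cross(contact_line, sub(knuckle, thumb_contact))
    # The gripper plane contains both the contact line and the finger
    # plane normal, so its own normal is their cross product. With this
    # ordering the normal points away from the knuckle, that is, from
    # the hand toward the part.
    gripper_normal = unit(cross(contact_line, finger_plane_normal))
    return tcp, gripper_normal
```

The sign of the returned normal is a convention of this sketch; a real system would choose whichever direction matches the approach axis of its gripper.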

[0023] A technique for teaching a robot to perform an operation based on a human demonstration, particularly based on the analysis of images of a human hand and a workpiece taken by a camera, is described in U.S. Patent Application No. 16/843,185, filed on April 8, 2020, by the same applicant as this application, entitled "ROBOT TEACHING BY HUMAN DEMONSTRATION". U.S. Patent Application No. 16/843,185 (hereinafter referred to as "Application 185") is incorporated herein by reference in its entirety. In particular, Application 185 discloses a technique for determining the 3D coordinates of key points of one hand (such as finger joints) from camera images of the hand.

[0024] In some types of operations, such as assembling a device composed of several components, human performers naturally use both hands to perform the operation task. For accurate robot teaching in such cases, it is necessary that the human performer's left and right hands are reliably identified in the images. One known method for identifying a human performer's left and right hands involves providing a camera image of the person's entire body, performing an anthropomorphic analysis of the body image to identify the left and right arms, and then identifying the left and right hands based on the arm identification. However, this technique requires a separate camera image from the one needed for hand pose detection to identify the arms/hands, and further requires an additional computational step for skeletal analysis of the body. Other two-handed teaching methods prohibit the human performer from crossing one hand over to the "opposite side" of the other hand.

[0025] This disclosure describes a technique for reliably determining the identification, position, and orientation of both hands of a human performer in a camera image using the key point detection method of Application 185, without imposing artificial restrictions on the use or movement of the performer's hands as required by existing methods, and without requiring whole-body imaging and analysis.

[0026] Figure 3 is a diagram of a system and process relating to one embodiment of the present disclosure for determining the position and orientation of a human performer's hands from camera images of both hands. Camera 310 provides an image of the training workspace; that is, camera 310 provides an image of the area occupied by the operator's hands during the teaching demonstration. The training workspace may be, for example, a tabletop on which the device is assembled. Camera 310 is preferably a two-dimensional (2D) camera that provides color images of the training workspace but, unlike a 3D camera, does not provide depth information.

[0027] Camera 310 provides image 312 as shown in Figure 3. The processing of image 312 is shown in detail in Figure 3. Camera 310 provides a continuous series of images, and each image is processed as shown in Figure 3 to provide a complete motion sequence for use by the robot, such as picking up a part, moving it to a new location, and positioning it in a desired orientation. Since the human performer is at the top of image 312, the right hand is shown on the left side of image 312 and the left hand is shown on the right side of image 312.

[0028] Image 312 is analyzed by a first neural network 320 to identify the left and right hands in image 312 and determine their respective positions. The first neural network 320 can identify the left and right hands in images of hands only (not the whole body), providing performance not available in conventional hand image analysis systems. The first neural network 320 identifies the left and right hands based on cues such as finger curvature (the fact that human fingers can only bend in one direction) and the relative position of each finger to the thumb, regardless of the relative positions of the hands in image 312. With appropriate training (see Figure 4 below), it has been demonstrated that the first neural network 320 can quickly and reliably identify and locate the left and right hands in image 312.

[0029] In box 330, based on the output of the first neural network 320, a cropped image 332 of the right hand and a cropped image 334 of the left hand are created. Here again, the image 332 of the right hand and the image 334 of the left hand are determined not simply from the positions of the hands in image 312, but from the actual identification of the hands through image analysis by the first neural network 320. That is, in some images, the hands may be crossed so that the left and right hands appear on the opposite side from their expected "normal" positions.
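The point of paragraph [0029], that the crops follow the identity reported by the first neural network rather than the position of each hand in the frame, can be sketched as follows. The detection format (a list of `(label, box)` pairs with boxes given as `(left, top, right, bottom)` pixel bounds) and the function names are assumptions made for this illustration. The image is modeled as a plain list of pixel rows.

```python
def crop(image, box):
    """Return the sub-image inside box = (left, top, right, bottom).

    The right and bottom bounds are exclusive, as in Python slicing.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def crop_hands(image, detections):
    """Build one cropped sub-image per identified hand.

    detections is a list of (label, box) pairs, where label is the
    identity reported by the detector ("left" or "right"). The label,
    not the box position, decides which crop is which.
    """
    crops = {}
    for label, box in detections:
        if label not in ("left", "right"):
            raise ValueError(f"unexpected hand label: {label!r}")
        crops[label] = crop(image, box)
    return crops
```

If the hands are crossed, the boxes simply swap sides while the labels stay attached to the correct hand, so the downstream pose step still receives the right-hand crop as the right hand.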

[0030] Image 332 of the right hand and image 334 of the left hand are cropped tightly around the hands as shown, to maximize the image resolution provided for subsequent analysis and minimize the amount of superfluous data. Image 332 of the right hand is provided to a second neural network 350 on line 342. The second neural network 350 analyzes image 332 to determine the three-dimensional (3D) coordinates of numerous key points of the right hand. Key points include fingertips, knuckles, thumb tips, and thumb joints. The second neural network 350 is trained using many images of a particular hand (here, for illustrative purposes, we assume a right hand). A technique for determining the 3D coordinates of key points of a hand from images of a hand of known identity (left or right) is disclosed in U.S. Patent Application No. 16/843,185, cited above.

[0031] The image 334 of the left hand is provided on line 344. If the second neural network 350 is trained to recognize the key points in images of a right hand, the image 334 of the left hand must be horizontally flipped in box 346 before being provided to the second neural network 350. The second neural network 350 analyzes the flipped version of image 334 to determine the three-dimensional (3D) coordinates of the numerous key points (fingertips, knuckles, etc.) of the left hand. Because image 334 is horizontally flipped, the second neural network 350 can analyze the flipped image 334 as if it were an image of a right hand.

[0032] Incidentally, the second neural network 350 may be trained using either left-hand or right-hand images. If the second neural network 350 is trained using right-hand images, left-hand images must be flipped for processing by the second neural network 350, and vice versa.

[0033] On line 362, a 3D "wireframe" structure of the right hand is provided to box 372. As described in detail in the previously cited U.S. Patent Application No. 16/843,185, the 3D wireframe structure of the hand output by the second neural network 350 includes key points of the hand structure and their connections (e.g., the bone segment of the index finger connecting the fingertip at coordinates X1/Y1/Z1 to the first joint at coordinates X2/Y2/Z2) to the extent that they can be determined based on visibility in the original image. In other words, the positions of fingers or parts of fingers that are bent downwards and hidden in the image cannot be resolved.

[0034] On line 364, the 3D wireframe structure of the left hand is output from the second neural network 350. The horizontal coordinates (usually the X coordinates) of the left hand's key points must be flipped in box 366 before being provided to box 374. The horizontal flip in box 366 must be about the same mirror plane (e.g., the Y-Z plane) as the flip of the original image in box 346.

[0035] As a result of the image analysis described above, box 372 contains the 3D wireframe structure of the right hand (3D coordinates of the tip and joint points of the fingers and thumb), and box 374 similarly contains the 3D wireframe structure of the left hand. Using the 3D coordinate data from the hands, the gripper coordinates can be calculated as shown in Figures 1 and 2 and described above. In this way, the gripper position and orientation are calculated and output on line 380.
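The flip in box 346 and the matching coordinate flip in box 366 can be sketched together. This is a hedged illustration that assumes the pose model is trained on right hands and that the X coordinate of each key point is expressed in the pixel frame of the cropped image, so that the mirror plane is the vertical center line of the crop. The callable `right_hand_model` and the toy `brightest_pixel_model` are hypothetical stand-ins for the second neural network.

```python
def flip_horizontal(image):
    """Mirror a 2D image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def mirror_x(points, mirror_plane_x):
    """Reflect 3D points about the plane x = mirror_plane_x."""
    return [(2 * mirror_plane_x - x, y, z) for x, y, z in points]

def left_hand_keypoints(left_crop, right_hand_model):
    """Estimate left-hand key points with a right-hand-only model.

    1. Flip the left-hand crop so it looks like a right hand.
    2. Run the right-hand model on the flipped crop.
    3. Reflect the X coordinates back about the same mirror plane.
    """
    flipped = flip_horizontal(left_crop)
    keypoints = right_hand_model(flipped)
    width = len(left_crop[0])
    return mirror_x(keypoints, (width - 1) / 2)

def brightest_pixel_model(image):
    """Toy stand-in for a pose network: reports the brightest pixel."""
    _, col, row = max((value, col, row)
                      for row, pixels in enumerate(image)
                      for col, value in enumerate(pixels))
    return [(float(col), float(row), 0.0)]
```

Reflecting the output about the same plane used for the image flip is what keeps the recovered key points aligned with the original, unflipped crop.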

[0036] Figure 4 is a diagram illustrating the steps for training a neural network 320 for hand detection and identification used in the system of Figure 3, according to an embodiment of the present disclosure. The first neural network 320 is shown in the center of Figure 4. The first neural network 320 is shown in Figure 3 and, as described above, is responsible for identifying and locating the left and right hands in an image. Training the first neural network 320 for left-hand versus right-hand recognition is achieved by providing the first neural network 320 with a large number of training images in which the left and right hands are in predetermined relative positions.

[0037] Image 410 is an example of a training image used to train the first neural network 320. Image 410 includes both hands of a human performer, with the left and right hands in known relative positions, such as being on a predetermined side of a dividing line or identified within a bounding box. One way to predetermine the positions of the left and right hands in Image 410 is that the hands are in their "normal" relative positions (not crossed at the wrists). Another way to predetermine the positions of the left and right hands in Image 410 is that the hands are positioned on each side of a dividing line 412. In Image 410, the dividing line 412 is in or near the center of the image, but does not necessarily have to be. If the hands cross at the wrists, the positions of the left and right hands are manually annotated with bounding boxes.

[0038] The first neural network 320 is a multilayer neural network, as is well known to those skilled in the art, and includes an input layer, an output layer, and typically two or more hidden layers. The first neural network 320 is trained to recognize images of hands and the structural characteristics that distinguish left hands from right hands. By combining several factors, such as finger curvature (fingers bend in only one direction, toward the palm) and the relative position of the thumb and fingers, it is possible to distinguish the top from the bottom and the left from the right of a particular hand. Because the left and right hand identities are known to the first neural network 320 before it analyzes each training image, the neural network 320 can automatically build up the structure of its layers and nodes to reliably correlate structural features with hand identity. Through training on many images, the first neural network 320 learns to recognize which structural features are characteristic of a right hand and which are characteristic of a left hand.

[0039] Output image 420 shows the result of training with image 410. A hand is detected and placed in box 422, and the first neural network 320 recognizes that the hand is a right hand based on the hand's position relative to the dividing line 412. (Since the person's body is at the top of image 410/420, the person's right hand is on the left side of image 410/420.) Similarly, a hand is detected and placed in box 424, and the first neural network 320 recognizes that the hand is a left hand based on the hand's position. As shown by boxes 422 and 424, a technique is employed to crop sub-images around the hand, for example, by cropping the sub-images to include the region containing all visible fingertips and thumb tips, as well as the position identified as the wrist joint.
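The two labeling options in paragraph [0037] can be illustrated with a short sketch. It assumes, as in Figure 3, that the performer is at the top of the image, so the performer's right hand appears on the left side of the frame. The function name `label_hand` and the box format `(left, top, right, bottom)` are hypothetical.

```python
def label_hand(box, divider_x, manual_label=None):
    """Assign a training label ("left" or "right") to one hand box.

    A manual annotation, if present, always wins; this covers images
    in which the hands are crossed. Otherwise the label follows the
    side of the dividing line on which the box center lies.
    """
    if manual_label is not None:
        if manual_label not in ("left", "right"):
            raise ValueError(f"unexpected hand label: {manual_label!r}")
        return manual_label
    left, _top, right, _bottom = box
    center_x = (left + right) / 2
    # Performer at the top of the image: image-left is the right hand.
    return "right" if center_x < divider_x else "left"
```

For example, with the dividing line at x = 50, a box centered at x = 20 is labeled as the right hand unless a manual annotation says otherwise.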
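The cropping rule mentioned at the end of paragraph [0039] amounts to taking a bounding box around the visible fingertips, thumb tip, and wrist point. A minimal sketch follows. The `margin` padding and the clamping to the image bounds are assumptions added for this example and are not stated in the disclosure.

```python
def bounding_box(points, margin, width, height):
    """Return (left, top, right, bottom) enclosing all (x, y) points.

    The box is padded by margin pixels and clamped to the image size.
    The right and bottom bounds are exclusive, as in Python slicing.
    """
    if not points:
        raise ValueError("at least one visible point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left = max(0, min(xs) - margin)
    top = max(0, min(ys) - margin)
    right = min(width, max(xs) + margin + 1)
    bottom = min(height, max(ys) + margin + 1)
    return left, top, right, bottom
```

Passing only the points that are actually visible keeps the crop tight, which matches the stated goal of maximizing resolution for the later pose analysis.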

[0040] Image 430 is another example of a training image used to train the first neural network 320. Image 430 also includes both hands of a human performer, with the left and right hands identified within bounding boxes. In Image 430, bounding box 432 is provided as an annotation or indexing property to identify the right hand. In Image 430, the performer's hands are crossed, so the right hand is located where the left hand would have been expected. However, for the purpose of bounding box identification, the first neural network 320 recognizes that the hand within bounding box 432 is the performer's right hand. Similarly, bounding box 434 is provided as an annotation or indexing property to identify the left hand.

[0041] Output image 440 shows the results of training with image 430. A hand is detected and placed in box 442, which is essentially the same as bounding box 432, and the first neural network 320 recognizes that the hand is a right hand based on the bounding box information, even though the hands are crossed. Similarly, a hand is detected and placed in box 444, and the first neural network 320 recognizes that the hand is a left hand based on the bounding box information. By analyzing the hands in boxes 442 and 444 in images 430/440, the first neural network 320 is progressively trained for hand identification and detection.

[0042] Image 430 is very different from Image 410. The input images include various performers, various components, operations and backgrounds, with and without gloves, and even slightly different camera angles (viewpoints). These differences in the input training images help train the first neural network 320 to robustly recognize hand structure and identification in images processed during the actual execution phase of robot teaching.

[0043] Many other input images 450 are provided to the first neural network 320 for training. Each input image 450 becomes an output image 460 in which the left and right hands are positioned and identified, as shown in Figure 4. After training, the first neural network 320 can be used to identify the left and right hands in image 312 (even when the hands are crossed) and provide a trimmed subimage containing the appropriately identified hands, as shown in Figure 3. A test system was developed to demonstrate the performance of neural networks such as the first neural network 320 in quickly and accurately identifying the right and left hands in images, precisely as described above, even when the left and right hands repeatedly overlap, cross, and uncross in a continuous series of images.

[0044] Figure 5 is a flowchart 500 of a method for identifying the position and posture of hands from camera images of both hands of a human performer, according to one embodiment of the present disclosure. The flowchart 500 shows the steps of the method corresponding to the system block diagram in Figure 3.

[0045] Box 502 provides images including both hands of a human performer. Images such as image 312 in Figure 3 preferably do not include the entire human body. Furthermore, the left and right hands in the images do not need to be in their "normal" or "expected" relative positions. The images show a human performer using both hands to grasp and position individual components of one or more workpieces, such as in the assembly of a multi-component device. In practice, images will be provided rapidly and sequentially (multiple images per second) to enable teaching a series of spatial grasping and positioning operations. In addition to hand identification, position, and orientation, the position and orientation of the workpiece will also be determined from the images and used in combination with hand ("gripper") data for robot teaching.

[0046] In box 504, the first neural network 320 is used to identify and position the left and right hands in the provided image. The operations performed in box 504 were described in detail earlier. In box 506, the original image is cropped into two sub-images, one containing the left hand and the other containing the right hand. Hand identification is provided in the sub-images.

[0047] In box 508, the subimage of the right hand is analyzed using the second neural network 350 to detect the structure of the fingers and the orientation of the hand. The operations performed in box 508 are as described above and are also described in detail in U.S. Patent Application No. 16 / 843,185 cited earlier. The second neural network 350 is trained to detect the structure of the hand using either a right-hand or left-hand image, so the subimage needs to be properly identified before analysis by the second neural network 350. In flowchart 500, it is assumed that the second neural network 350 is trained using a right-hand image. Therefore, the subimage of the right hand from box 506 is passed directly to box 508.

[0048] In box 510, the left-hand subimage is flipped horizontally before being provided to box 508 for analysis. Again, we assume that the second neural network 350 is trained using the right-hand image. Therefore, the left-hand subimage from box 506 must be flipped horizontally before being passed to box 508. The reverse procedure applies similarly. If the second neural network 350 is trained using the left-hand image, the right-hand subimage is flipped before analysis.

[0049] Box 512 uses the finger structure and hand posture data for the right hand (3D coordinates of key points in the hand skeleton) to calculate the corresponding gripper posture, and the gripper posture (along with the workpiece posture data) is output as a robot teaching step. The complete method for teaching the robot from images of a human demonstration (hand and workpiece) is described below.

[0050] In box 514, the horizontal coordinates (e.g., X-coordinates) of the finger structure and hand posture data for the left hand from box 508 are inverted. Then, in box 512, the data is used to calculate the corresponding gripper posture, and the gripper posture is output as a robot teaching step. The horizontal coordinate data must be inverted, that is, mirrored about the same vertical axis used for the flip in box 510, so that the 3D coordinate data of the hand returns to its correct position in the original input image.
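The flip in box 510 and the coordinate inversion in box 514 can be sketched as follows. This is an illustrative sketch only: the function names, the list-of-rows image representation, and the `crop_origin` argument (the position of the sub-image within the original image) are assumptions that do not appear in the source.

```python
def flip_rows(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def unmirror_keypoints(keypoints, sub_width, crop_origin):
    """Map key points found in a flipped sub-image back to the original image.

    keypoints: list of (x, y, z), with x and y in flipped sub-image pixels.
    sub_width: width of the sub-image in pixels.
    crop_origin: (left, top) of the sub-image within the original image
                 (hypothetical argument; the source only says it must be known).
    """
    left, top = crop_origin
    restored = []
    for x, y, z in keypoints:
        x_unflipped = (sub_width - 1) - x  # undo the horizontal flip
        restored.append((x_unflipped + left, y + top, z))
    return restored
```

Only the horizontal coordinate is mirrored; the vertical and depth coordinates pass through unchanged, apart from the crop offset.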

[0051] As those skilled in the art will understand, the calculation of the 3D coordinates of the hand posture requires that the positions of the left and right hand sub-images in the original input image be always known. Furthermore, the pixel coordinates of the provided original image must be mapped to the physical workspace where the demonstration is taking place. This allows the 3D position and posture of the gripper and workpiece to be calculated from the pixel coordinates of the image.

[0052] From box 512, the robot teaching step is output and recorded for robot programming. The teaching step includes the position and orientation of the grippers calculated from the coordinate data of the left and right hands, as well as the position and orientation of the corresponding workpieces. The process then loops back to box 502 to receive another input image.

[0053] Figure 6 is a flowchart 600 of a method for teaching a robot to perform operations using camera images of a human performer's hands and corresponding workpieces, according to one embodiment of the present disclosure. The flowchart 600 is arranged in three vertical columns corresponding to the pick process (right), the move process (center), and the place process (left). The three individual processes illustrate how to create a robot motion program by analyzing images of the hands and workpiece. Both-hand detection in the images is essential for these processes.

[0054] The picking process begins in start box 602. In box 604, the workpiece and hand are detected in the image from camera 310. The two-hand detection method, described in detail earlier, is used in box 604. The position and orientation of the workpiece coordinate system are determined from the analysis of the workpiece in the image, and the position and orientation of the corresponding hand coordinate system are determined from the analysis of the hand in the image.

[0055] The decision diamond 606 determines, for each hand, whether the fingertips (thumb tip 114 and index finger tip 118 in Figure 1) made contact with the workpiece. This is determined from the camera image. If the fingertips made contact with the workpiece, the gripping posture and position of the workpiece and hand are recorded in box 608. It is important to identify the posture and position of the hand relative to the workpiece. That is, the position and orientation of the hand coordinate system and the workpiece coordinate system must be defined relative to some global, fixed reference coordinate system, such as the work cell coordinate system. This allows the controller to determine how to position the gripper to grasp the workpiece in a later playback phase. This workpiece contact analysis is performed for both the right and left hands.

[0056] After the gripping posture and position of the workpiece and hand are recorded in box 608, the picking process ends in end box 610. Next, the process proceeds to the move process, which begins in box 622. The move process can be performed individually for each hand. In box 624, the workpiece is detected in the camera image. In the determination diamond 626, if the workpiece is not detected in the camera image, the process loops back to box 624 to capture another image. If the workpiece is detected in the camera image, the position (and optionally the posture) of the workpiece is recorded in box 628.

[0057] In box 634, a hand (either hand – the one performing the current move operation) is detected in the camera image. In the decision diamond 636, if no hand is detected in the camera image, the process loops back to box 634 to capture another image. If a hand is detected in the camera image, the hand's position (and optionally its orientation) is recorded in box 638. If both the workpiece position (from box 628) and the hand position (from box 638) are detected and recorded from the same camera image, the hand position and workpiece position are combined and recorded in box 640. To combine the hand position and workpiece position, one may simply take the average of the two. For example, if the midpoint of the thumb tip 114 and the index finger tip 118 needs to coincide with the center / origin of the workpiece, the average position can be calculated between the midpoint and the center of the workpiece.
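The averaging described in paragraph [0057] can be written out as a short sketch. The function names and the tuple representation of 3D points are illustrative assumptions.

```python
def midpoint(p, q):
    """Return the point halfway between two 3D points."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def combined_position(thumb_tip, index_tip, workpiece_center):
    """Average the fingertip midpoint with the workpiece center."""
    grasp_point = midpoint(thumb_tip, index_tip)
    return midpoint(grasp_point, workpiece_center)


# Example with made-up coordinates in an arbitrary fixed frame.
position = combined_position((0, 0, 0), (2, 0, 0), (1, 2, 0))
```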

[0058] Multiple positions along the move process are preferably recorded to define a smooth move path by repeating the operation from the "Move Start" box 622 to the "Combine Hand and Workpiece Positions" box 640. In box 640, the hand position and workpiece position are combined and recorded, and after no further positions are needed for the move process, the move process ends in the End box 642. The process then proceeds to the Place process, which begins in box 662.

[0059] In box 664, the position of the workpiece is detected within the image from camera 310. In determination diamond 666, it is determined whether the workpiece is found in the camera image and whether the workpiece is stationary. Alternatively, it can be determined whether the fingertip has finished contact with the workpiece. If it is determined that the workpiece is stationary, or the fingertip has finished contact with the workpiece, or both, the orientation and position of the workpiece at the time of arrival are recorded in box 668. The entire process of the placing and teaching phases ends in termination box 670.

[0060] The robot teaching process described in flowchart 600 of Figure 6 relies on robust detection of the human hand posture within the image. If the human demonstration involves the use of both hands, the two-hand detection methods and systems shown in Figures 3-5 become essential.

[0061] Figure 7 shows a system 700 for robot operation based on human demonstration teaching using both hands, according to one embodiment of the present disclosure. The human demonstrator 710 is positioned so that a camera 720 can capture images of the demonstrator's hands and the workpiece being operated on. Camera 720 corresponds to camera 310 in Figure 3. Camera 720 provides images to a computer 730, which analyzes the images as previously described in detail to identify the 3D wireframe coordinates of the hands along with the position of the corresponding workpiece. The analysis by the computer 730 includes a two-handed detection method shown in Figures 3 to 5.

[0062] A human demonstrator 710 demonstrates the complete operation, such as assembling multiple components to complete the device. A camera 720 provides a series of images, and a computer 730 analyzes the images and records the identified robot teaching commands. Each teaching step includes the gripper's pose calculated from the hand's pose, and the corresponding workpiece's position / pose. This recording of the teaching steps includes grasping and positioning operations performed by one or both hands of the human demonstrator 710.

[0063] Once the robot's operation is fully defined from the human demonstration, the robot program is transferred from the computer 730 to the robot controller 740. The controller 740 communicates with the robot 750. The controller 740 calculates robot motion commands for the robot 750 to move the robot's gripper 760 to the position and orientation in the gripper coordinate system identified from the image. The robot 750 moves the gripper 760 relative to the workpiece 770 according to the series of commands from the controller 740, thereby performing the operation demonstrated by the human demonstrator 710.

[0064] In the scenario shown in Figure 7, the gripper 760 grasps the workpiece 770 and performs some operation on the workpiece 770, such as moving the workpiece 770 to a different position or orientation, or both. The gripper 760 is shown as a finger-type gripper, but it may instead be a suction cup or magnetic surface gripper, as previously mentioned.

[0065] The system 700 in Figure 7 can be used in two different modes. In one mode, a human demonstrator pre-teaches the entire process of an operation, such as assembling the device, once, and then the robot repeatedly performs the assembly operation based on the movement instructions of the components taught by the human demonstrator. The other mode is known as remote operation, in which a human demonstrator works in real time in conjunction with the robot. In this mode, each movement of the hand grasping and moving a part is analyzed and immediately executed by the robot, and the robot's movements are visually fed back to the human operator. Both of these operating modes can benefit from the disclosed technique for detecting both hands of a human demonstrator.

[0066] The previous discussion described a technique for reliably detecting the movements of a human demonstrator's left and right hands and using the movements of both hands to define the gripper's motion in order to teach a robot to grasp and move a workpiece. This method can be extended to teach the coordinated motion of two robotic arms using the detection of both hands of a human demonstrator. This technique is described below.

[0067] Figure 8 is a block diagram of a system and process, according to one embodiment of the present disclosure, for identifying the position and orientation of the hands of a human performer from camera images of both hands, improving the motion traces of both hands, and using those motions to teach two robotic arms. The upper part of Figure 8 operates as described above with reference to Figure 3.

[0068] Camera 810 provides a continuous series of images of a human demonstration scene, including both hands of the performer and the workpiece being handled by the performer. Camera 810 provides the images to a first neural network 820, which is trained in box 830 to identify and segment the left and right hands in each image. As previously mentioned, the first neural network 820 can properly identify the left and right hands even if they are crossed. The cropped images of the left and right hands are provided to a second neural network 840 for analysis. The second neural network 840 is trained (on either the right or left hand images) to detect important features of the hands. The second neural network 840 is shown with two input paths: one for the cropped left hand image and another for the cropped right hand image. The same neural network 840 can be used to analyze both the left and right hands. As mentioned earlier, the image of the hand not used for training needs to be flipped before analysis by the neural network 840, and the resulting coordinates flipped back afterward. Box 850 provides key points for both hands (e.g., joints and fingertips). Everything from camera 810 to box 850 is as described above. For the entire operation performed by the human performer, both hands are detected for each image in a series (e.g., with an image interval of 0.1 seconds), resulting in the acquisition of continuous position / orientation data for each hand.

[0069] In box 860, pixel depth data from camera 810 is integrated with the key points of the hand to provide a 3D hand motion trace 862 of the performer's left hand and a trace 872 of the performer's right hand. Using pixel depth data from 3D camera 810 is a preferred technique for obtaining the 3D coordinates of points in motion traces 862 and 872. Other techniques for obtaining the 3D coordinates of key points of the hand are also possible, such as using more than one camera 810 or providing hand size data (length of each joint of each finger) in advance.
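One common way to combine a key point's pixel position with its depth value is the pinhole camera model, sketched below. The source does not state which camera model is used, so this is an assumption, as are the function name and the intrinsic parameters `fx`, `fy`, `cx`, and `cy` (focal lengths and principal point, in pixels). The result is expressed in the camera frame; transforming it into the workspace coordinate system requires an extrinsic calibration that is not shown here.

```python
def deproject(u, v, depth, fx, fy, cx, cy):
    """Convert a pixel (u, v) with a depth value into a 3D camera-frame point.

    fx, fy: focal lengths in pixels (assumed known from calibration).
    cx, cy: principal point in pixels (assumed known from calibration).
    depth:  distance along the optical axis, in the desired length unit.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Example: a key point 600 pixels to the right of the principal point, 1.0 away.
point = deproject(920, 240, 1.0, fx=600, fy=600, cx=320, cy=240)
```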

[0070] In a scenario discussed in more detail later, the left-hand trace 862 descends to grasp the memory card (point 864), then lifts the memory card, and descends again to position the card in place (point 866). At the same time, the right-hand trace 872 descends to push one end of the memory card into the slot (point 874), and then rises and moves away. Next, the left-hand trace 862 rises and moves to the opposite end of the memory card that is not yet inserted, pushing that end into the slot (point 868). Again, the 3D traces shown in box 860 are left-hand and right-hand motion traces demonstrated by a human, and these hand movements can be converted into the movements of the robotic grippers in a dual-arm robotic system (see Figures 1 and 2).

[0071] In Box 880, two different improvement processes are performed on the motion traces from Box 860. The first improvement is object position estimation, which uses visual edge detection techniques to minimize the placement error of an object (e.g., a memory card) relative to the device on which it is placed (e.g., a computer enclosure). Object position estimation is optional and is typically used only when precise placement is required (e.g., assembly tasks), and not typically used when simply placing an object in a container or on a conveyor belt. The second improvement is path smoothing or simplification, which is also optional. In this process, the hand motion trace is redefined in a way that eliminates small irregular movements so that the replacement trace is more suitable for robot programming. The improvement processes in Box 880 are described further below with reference to Figures 9 and 10.

[0072] In box 890, improved motion traces from the left and right hands of a human performer are provided to the dual-arm robot system, with the "left" robot arm performing the movements and tasks of the human left hand, and the "right" robot arm performing the movements and tasks of the human right hand. In Figure 8, the viewpoint of the figure is reversed, so the left hand and left robot arm are shown on the right side of the figure. As mentioned above, the hand analysis by the second neural network 840 provides the positions of the thumb and fingers, which can be used to determine both the hand movements and the gripper positions. Therefore, the robot commands provided in box 890 allow the robot controller to fully control the movement of each robot arm, the orientation of the gripper, and the gripper operations (grasping and releasing) for the execution of the tasks performed by the human.

[0073] Figure 9 illustrates a multi-step technique for precise object position estimation using vision-based edge detection, providing improved starting and target positions for bimanual motion, as shown in box 880 of Figure 8. Box 910 provides an RGB (color) image of a human demonstration scene before memory card insertion from camera 810. The image diagram in box 910 shows a computer housing 912 and a memory card 914. The memory card 914 is in a holder that holds the card 914 in a position where it can be grasped by a robot gripper, which then inserts the card 914 into a slot in the computer housing 912. Box 920 provides an edge image of the RGB image from box 910. The edge image in box 920 is provided by performing edge detection analysis of the vision (RGB) image in box 910 in a manner well known in the art. The edge detection analysis can be performed on the same computer as the hand pose analysis described above.

[0074] In box 930, an RGB image of the human demonstration scene after the memory card has been installed is provided by camera 810. In the image diagram in box 930, the computer casing 912 and the memory card 914 are still visible, but at this point the memory card 914 is in the installation position in the slot of the computer casing 912. In box 940, the edge image of the RGB image from box 930 is provided by performing edge analysis as described above.

[0075] In box 950, the edge image of box 920 is subtracted from the edge image of box 940. The only significant difference between the edge image of box 920 and the edge image of box 940 is the position of the memory card 914, which has moved from its position in the holder to its installation position in the (lateral) housing 912. The differential edge image is shown in box 960 (with the region defined by boxes 922 and 942 magnified), and the top edge of the memory card 914 is visible as a line 962 at its installation position, along with some noise pixels that are common in any edge analysis image. In box 970, the main feature of box 960 (the line 962 representing the top edge of the memory card 914) is shown transposed (magnified) onto the image of box 930.
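The subtraction in box 950 can be illustrated with a minimal sketch on binary edge images. The list-of-rows representation, the function names, and the decision to keep only pixels that are new in the "after" image are assumptions. The `strongest_row` helper assumes the card edge is roughly horizontal in the image, which is a simplification made here rather than something the source states.

```python
def edge_difference(after, before):
    """Keep edge pixels that are present in `after` but absent in `before`."""
    return [
        [1 if a and not b else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(after, before)
    ]


def strongest_row(diff):
    """Return (row_index, pixel_count) for the row with the most edge pixels."""
    counts = [sum(row) for row in diff]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts[best]


# Tiny example: the housing edge (row 0) appears in both images and cancels;
# the installed card edge (row 1) remains, along with one noise pixel (row 2).
before = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
after = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
diff = edge_difference(after, before)
row, count = strongest_row(diff)
```

Picking the dominant line out of the difference image is what separates the feature corresponding to line 962 from the scattered noise pixels; a practical system would likely use a more robust line-fitting method.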

[0076] As explained and shown in Figure 9, the start and end positions of the upper edge of the memory card 914 are adjusted to their precise positions in order to provide sufficient information to correct the hand motion traces 862 and 872. Specifically, the 3D coordinates of point 864 can be adjusted to the center of the upper edge of the memory card 914 in the pre-installation image in box 910 (or, if desired, slightly off-center), where the left robot arm will grasp the card 914 from the holder. Similarly, point 866 is adjusted (using the edge line 962) based on the precise installation position of the card 914 in box 970. Likewise, point 874 can be adjusted to lie on line 962 near one end of the card 914, and point 868 can be adjusted to lie on line 962 near the other end of the card 914.

[0077] It will be understood that the image edge analysis shown in Figure 9 is performed using a fixed coordinate system, which is the same coordinate system as the hand motion trace shown in box 860 of Figure 8. Therefore, in order to provide precise object positions at key motion points in the motion trace, the hand motion trace is corrected according to the edge detection object position estimation procedure in Figure 9 (box 880 of Figure 8). That is, the positions of actions such as picking up the component (memory card), placing the memory card in a precise location, and pressing the top edge of the memory card to secure it in the slot are obtained from the edge analysis / object position estimation in Figure 9. The angular direction of the card is also obtained from the edge analysis / object position estimation, not from the hand posture data.

[0078] Figure 10 is a set of 3D graphs illustrating two different techniques for smoothing or simplifying hand motion traces, as shown in box 880 of Figure 8. When a human demonstrator performs pick, move, and place operations for robot teaching, the human hand often makes unintended extraneous movements, which are often of small amplitude. The techniques disclosed herein use large hand movements (start and end points, as well as normal motion shapes) to provide a smoother motion profile with characteristics more suitable for programming robot motions.

[0079] Graph 1010 includes the original hand motion trace, as shown in box 860 of Figure 8. The 3D hand motion trace 862 depicts the motion of the performer's left hand, and trace 872 depicts the motion of the performer's right hand. In the scenario described above, the left hand trace 862 depicts the hand lowering (path arrow 1), grasping the memory card (point 864), lifting the memory card and then lowering it (path arrow 2), and placing the card in the desired position (point 866). At the same time, the right hand trace 872 descends (path arrow 3), pushing one end of the memory card into the slot (point 874), after which the right hand trace 872 lifts up and moves away. Next, the left hand trace 862 lifts up and moves to the opposite end of the memory card that is not yet inserted (path arrow 4), pushing that end of the memory card into the slot (point 868).

[0080] As explained in relation to Figure 9, the positions and orientations of operating points 864, 866, 868, and 874 were updated by edge detection-based object position estimation (for precise placement tasks such as assembly). For reference, memory card 914 is also shown on graph 1010. The 3D graphs in Figure 10 all represent the same fixed coordinate system described above for Figures 8 and 9.

[0081] Graph 1020 is a 3D graph containing smoothed motion traces 1022 and 1032 compared to the original corresponding traces 862 and 872, respectively. Traces 862 and 872 each follow a number of path points, and it is understood that each path point is determined from the hand pose analysis of one image, as shown in Figure 8. The smoothed motion trace 1022 is calculated from the original path points of trace 862 in two stages: least-squares interpolation first creates a new set of points by removing unwanted or extraneous deviations from the original points, and spline interpolation then calculates trace 1022 through the new set of points. The same applies to the smoothed motion trace 1032. The least-squares interpolation is not permitted to move the motion points 864, 866, 868, and 874. Therefore, the smoothed motion traces 1022 and 1032 contain motion points 864, 866, 868, and 874 at positions corrected by the edge detection object position estimation in Figure 9. The smoothed motion traces 1022 and 1032 do not include the small-amplitude jitter of the original hand motion traces, making them more suitable for robot motion programming.
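The constraint that smoothing must not move the motion points can be illustrated with a deliberately simpler stand-in. The sketch below uses a moving average, not the least-squares and spline interpolation described in paragraph [0081], and it pins a chosen set of point indices so they pass through unchanged. The function name, the `pinned` index set, and the `window` half-width are all hypothetical.

```python
def smooth_trace(points, pinned, window=1):
    """Moving-average smoothing that leaves pinned indices untouched.

    points: list of (x, y, z) path points.
    pinned: set of indices (e.g. motion points and trace ends) that must not move.
    window: hypothetical half-width of the averaging window, in points.
    """
    smoothed = []
    for i, p in enumerate(points):
        if i in pinned:
            smoothed.append(p)  # motion points keep their corrected positions
            continue
        lo = max(0, i - window)
        hi = min(len(points), i + window + 1)
        neighbours = points[lo:hi]
        smoothed.append(tuple(sum(c) / len(neighbours) for c in zip(*neighbours)))
    return smoothed


# Example: a zigzag in z between two pinned end points.
raw = [(0, 0, 0), (1, 0, 3), (2, 0, 0), (3, 0, 3), (4, 0, 0)]
result = smooth_trace(raw, pinned={0, 4})
```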

[0082] Graph 1040 is a 3D graph containing motion traces 1042 and 1052, which replace the original corresponding traces 862 and 872 with simplified line-segment-based traces. Using the original hand motion trace 862 as an example, several line segments that make up the simplified trace 1042 are defined, and the first set of three line segments is constructed using the original points on trace 862, points 864, 866, and the highest point (maximum z-coordinate). The first line segment is created by drawing straight up (same x and y coordinates) from point 864 to point 1044, which has the same z-coordinate as the highest point. The second line segment is created from point 1044 to point 1046, which is directly above point 866. The second line segment is horizontal and passes the highest point on its way from point 1044 to point 1046. The last, third line segment descends vertically from point 1046 to point 866. The three line segments defined in this way represent the first major movement of the original trace 862 (from point 864 to point 866). Using the same technique, a simplified line-based trace is defined that represents the second movement of the original trace 862. These line segments together constitute the simplified trace 1042.
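The three-segment construction in paragraph [0082] can be sketched as follows. The function name and the tuple representation are illustrative assumptions; the comments map the intermediate points to points 1044 and 1046 only by analogy.

```python
def simplify_move(trace, start, end):
    """Return the corner points of an up / across / down path.

    trace: original path points (x, y, z) of one movement.
    start, end: the motion points bounding the movement
                (analogous to points 864 and 866).
    """
    z_top = max(z for _, _, z in trace)        # height of the highest point
    above_start = (start[0], start[1], z_top)  # analogous to point 1044
    above_end = (end[0], end[1], z_top)        # analogous to point 1046
    return [start, above_start, above_end, end]


# Example with made-up coordinates.
original = [(0, 0, 0), (1, 1, 5), (2, 2, 3), (4, 0, 1)]
corners = simplify_move(original, start=(0, 0, 0), end=(4, 0, 1))
```

Consecutive pairs of the returned points define the vertical, horizontal, and vertical line segments. The horizontal segment shares its z-coordinate with the highest point of the original trace.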

[0083] The original hand motion trace 872 (the right hand simply performing the press-fitting task) can be replaced with a single line segment. Again, the simplified motion traces 1042 and 1052 include the operating points 864, 866, 868, and 874 at positions corrected by the edge detection object position estimation in Figure 9. The simplified motion traces 1042 and 1052 are clearly very suitable for robot motion programming. The simplified motion traces 1042 and 1052 can be made even more suitable for robot motion programming by, for example, rounding the square corners.

[0084] Figure 11 shows a dual-arm robot operation system 1100 based on human demonstration and instruction using both hands, according to one embodiment of the present disclosure. The human demonstrator demonstrates the operation (in this case, an assembly or installation task) in a tabletop workspace 1110. In the workspace 1110, camera 1120 images the demonstrator's hands (1112 / 1114) and the workpiece on which the operation is performed (computer housing 1116 and memory card 1118). Camera 1120 corresponds to camera 810 in Figure 8. Camera 1120 provides images to computer 1130, which analyzes the images and identifies key point coordinates of the hands using two pre-trained neural networks, along with the position of the corresponding workpiece, as detailed earlier. It will be understood by those skilled in the art that the connections between the devices in Figure 11 (e.g., camera and computer, controller and robot, etc.) may be wired, wireless, or a combination thereof, as in Figure 7. The analysis by computer 1130 includes the two-handed detection method shown in Figures 3 to 5 and Figure 8, resulting in motion traces of the left and right hands, including actions such as grasping, pressing, and releasing.

[0085] Computer 1130 applies depth data from camera 1120 to path points in the left and right hand motion traces from a human demonstration, as shown in box 860 in Figure 8. Computer 1130 also optionally applies two different refinement processes to the hand motion traces. The first refinement process is object position estimation using edge detection, which is advantageous when precise object position estimation is required, such as in an assembly process where one piece is inserted into the opening of another piece. This object position estimation technique corrects the position of motion points, such as picking up a memory card and inserting it into a slot, while the rest of the hand motion trace remains the same as that of a human performer. The second refinement process involves smoothing or simplifying the hand motion trace (after correction by object position estimation) to provide a motion trace more suitable for robot programming. Image analysis and path refinement by computer 1130 determine the left and right motion traces.

[0086] Computer 1130 provides the robot controller 1140 with established left and right motion traces. The controller 1140 communicates with camera 1150, which is positioned to capture images of the robot workspace 1160. The robot workspace 1160 contains the left robot 1162 and the right robot 1164, as well as the workpieces operated by robots 1162 / 1164. The workpieces correspond to the computer enclosure 1116 and memory card 1118 demonstrated in workspace 1110. Camera 1150 provides images of the workpieces to the controller 1140, which then provides control commands to robots 1162 / 1164 to perform the operations demonstrated by the human. The controller 1140 constantly monitors the position and orientation of each robot's gripper in the workspace coordinate system. Based on the image data of the workpiece, the controller 1140 can move the gripper using the determined left and right motion traces from the computer 1130, enabling it to perform precise part placement operations as demonstrated. The left robot 1162 performs the motion and operation of the left hand motion trace in the demonstration, and the right robot 1164 performs the motion and operation of the right hand motion trace in the demonstration.

[0087] The dual-arm robot teaching system shown in Figure 11 is also applicable to tasks where parts are moved or where parts are randomly placed in each task, such as parts arriving on a conveyor belt or a pile of parts being selected one at a time from a box. In this case, the teaching process includes both key point detection of the hand and posture detection of the workpiece, as detailed with reference to Figure 6, to determine the hand posture for grasping a part in a specific orientation.

[0088] Figure 12 is a flowchart 1200 of a method for teaching a dual-arm robot by detecting both hands in a human demonstration, according to one embodiment of the present disclosure. In box 1210, images of both hands of a human performer are provided by a 3D digital camera. The images are provided as a continuous stream so as to capture the movements and actions of both hands, as detailed earlier. In box 1220, a first trained neural network is used to determine the identity of the left and right hands in the images, even when the hands are "crossed" and deviate from their normal positions in the workspace demonstration scene. Cropped sub-images of the identified left and right hands are provided from box 1220.

[0089] In box 1230, the cropped sub-images are analyzed by a second trained neural network to detect the structure of the fingers of the left and right hands (coordinates of key points including fingertips and joints). In box 1240, depth data from the camera is added to obtain hand pose data, providing 3D path points for the motion trace of the left and right hands in the workspace coordinate system (the gripper configuration is also determined from the finger / thumb positions). In the decision diamond 1250, it is determined whether the task demonstration is complete. If the task is not complete, the process returns to providing images of the hand and workpiece, and the hand motion data continues to be captured as a series of path points.

[0090] Once the task is determined to be complete at decision diamond 1250, the left and right hand motion traces are also complete, and the process moves to box 1260, where the hand motion traces are optionally improved by object position estimation. The object position estimation in box 1260 preserves the overall shape of the left and right hand motion traces and corrects the positions of the motion points (e.g., pick, place, press) based on image edge analysis, as described above. In box 1270, the improved hand motion traces are optionally smoothed or simplified using one of the techniques of Figure 10. The optional refinements of boxes 1260 and 1270 yield the finalized left and right motion traces.
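The edge-based object position estimation is described in the aspects and claims as subtracting an after-demonstration edge image from a before-demonstration edge image. The following Python sketch illustrates that idea on small grayscale grids. It is a simplification under stated assumptions: the neighbor-difference edge detector is a stand-in for whatever edge detector is actually used, only a position (centroid) is estimated and not an orientation, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # grayscale image as rows of intensity values


def edge_image(gray: Gray, threshold: int = 50) -> Gray:
    """Mark a pixel as an edge (1) if it differs strongly from its right or lower neighbor."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < w else 0
            down = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < h else 0
            if max(right, down) > threshold:
                edges[y][x] = 1
    return edges


def difference_edges(first: Gray, second: Gray) -> Gray:
    """Subtract the second edge image from the first.

    +1 marks edges present only before the demonstration,
    -1 marks edges present only after it, 0 marks unchanged pixels.
    """
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]


def centroid(diff: Gray, sign: int) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of all difference pixels equal to `sign`, or None if there are none."""
    pts = [(x, y) for y, row in enumerate(diff) for x, v in enumerate(row) if v == sign]
    if not pts:
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
```

Static background edges cancel in the subtraction, so only the part that was moved remains: the `+1` pixels indicate where it was picked up, and the `-1` pixels indicate where it was placed.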

[0091] In box 1280, the finalized left and right motion traces are provided to the robot controller that operates the two robot arms. The robot controller receives images of the workpieces to be manipulated and uses the finalized left and right motion traces to command the left and right robot arms to perform the operation on the workpieces. The motion traces also include gripper movements determined from the hand pose data (such as the relative positions of the thumb and index finger).
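As a hedged illustration of how a gripper movement could be derived from the relative positions of the thumb and index finger, the Python sketch below maps the two fingertip positions to a grasp center and an opening width. The mapping itself, the function name `gripper_command`, and the 0.08 m stroke limit are assumptions made for this example and are not specified in the text.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def gripper_command(thumb_tip: Point3D, index_tip: Point3D,
                    max_opening: float = 0.08) -> Tuple[Point3D, float]:
    """Return (grasp center, gripper opening in meters).

    The grasp center is the midpoint between the two fingertips, and the
    opening is the fingertip distance clamped to the gripper's assumed stroke.
    """
    width = math.dist(thumb_tip, index_tip)
    center = tuple((a + b) / 2 for a, b in zip(thumb_tip, index_tip))
    return center, min(width, max_opening)
```

Evaluating this for every path point gives an opening value that shrinks as the demonstrator pinches a part and grows again on release.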

[0092] Throughout the foregoing discussion, various computers and controllers are described and implied. It should be understood that the software applications and modules of these computers and controllers run on one or more computing devices having a processor and a memory module. In particular, this includes the processors in the computers 730 and 1130 and in the robot controllers 740 and 1140. Specifically, the processor in the computer 1130 is configured to perform dual-arm robot teaching via human demonstration in the manner described above, including dual hand detection by the first and second neural networks and motion trace improvement.

[0093] As outlined above, the disclosed technique for dual hand detection in human demonstration for dual-arm robot teaching makes robot motion programming faster, easier, and more intuitive than conventional techniques, provides reliable detection of both hands of a human demonstrator while requiring only a single camera, and enables seamless programming of dual-arm robot systems even for assembly tasks requiring precise part placement.

[0094] While a number of exemplary aspects and embodiments of dual-arm robot teaching using dual hand detection in human demonstration have been discussed above, those skilled in the art will recognize modifications, permutations, additions, and sub-combinations thereof. Accordingly, the appended claims and any claims introduced hereafter are intended to be interpreted as including all such modifications, permutations, additions, and sub-combinations as fall within their true spirit and scope.

[Aspect 1] A method for teaching a dual-arm robot from a two-handed human demonstration, the method comprising: providing a series of images from a three-dimensional (3D) camera, the images including the left hand and the right hand of a person demonstrating an operation on one or more workpieces; analyzing each of the images, using a first neural network running on a computer having a processor and memory, to identify the left hand and the right hand in the image and determine their positions; analyzing sub-images of the left hand and the right hand, using a second neural network running on the computer, to determine coordinates of a plurality of key points on the left hand and the right hand; applying pixel depth data from the 3D camera to the key points to generate 3D left and right hand pose data for each of the images; combining the 3D left and right hand pose data for the series of images into 3D left and right hand motion traces that define the operation; improving the 3D left and right hand motion traces by performing object position estimation using edge detection, by replacing the motion traces with simplified versions, or both; and controlling, by a robot controller, a dual-arm robot system to perform the operation using the improved 3D left and right hand motion traces, wherein the left hand motion trace is executed by one robot arm and the right hand motion trace is executed by the other robot arm.
[Aspect 2] The method according to Aspect 1, wherein the first neural network is trained in a training process to distinguish between the left hand and the right hand, the training process provides the first neural network with a plurality of training images in which the left hand and the right hand are already identified, and the first neural network analyzes the training images to identify distinguishing features of the left hand and the right hand, including finger curvature and relative position.
[Aspect 3] The method according to Aspect 1, wherein each of the sub-images is cropped to include the left hand or the right hand with a predetermined margin.
[Aspect 4] The method according to Aspect 1, wherein the second neural network is trained using a plurality of training images of either left hands or right hands.
[Aspect 5] The method according to Aspect 4, wherein the analysis by the second neural network includes, when the second neural network is trained using right-hand training images, horizontally flipping the left hand sub-images before and after the analysis, and, when the second neural network is trained using left-hand training images, horizontally flipping the right hand sub-images before and after the analysis.
[Aspect 6] The method according to Aspect 1, wherein the plurality of key points on the left hand and the right hand include thumb tips, thumb joints, fingertips, and finger joints.
[Aspect 7] The method according to Aspect 1, wherein a gripper pose is determined based on the 3D left and right hand pose data in the 3D left and right hand motion traces.
[Aspect 8] The method according to Aspect 1, wherein the object position estimation includes converting a visual image of the one or more workpieces taken before the demonstration into a first edge image, converting a visual image of the one or more workpieces taken after the demonstration into a second edge image, creating a difference edge image by subtracting the second edge image from the first edge image, and determining the position and orientation of corresponding features of the one or more workpieces using features identified in the difference edge image.
[Aspect 9] The method according to Aspect 8, wherein the position and orientation of the corresponding features of the one or more workpieces are determined both before and after the demonstration, the position and orientation of the corresponding features before the demonstration are used to correct object grasping points in the 3D left and right hand motion traces, and the position and orientation of the corresponding features after the demonstration are used to correct object placement points in the 3D left and right hand motion traces.
[Aspect 10] The method according to Aspect 1, wherein replacing the motion traces with simplified versions includes creating a new set of path points from the original set of path points using least-squares interpolation and calculating a smoothed motion trace through the new set of path points using spline interpolation, or replacing each of the motion traces with a simplified trace consisting of horizontal and vertical line segments constructed using path reversal points and peak path points.
[Aspect 11] The method according to Aspect 1, wherein the robot controller receives an image of the robot workspace including the one or more workpieces and uses the improved 3D left and right hand motion traces to control the dual-arm robot system to perform the operation.
[Aspect 12] The method according to Aspect 11, wherein the robot controller transposes the improved 3D left and right hand motion traces such that the motion points in the motion traces coincide with the positions of the one or more workpieces in the image of the robot workspace, the motion points including the points at which a gripper grips, releases, or presses one of the workpieces.
[Aspect 13] A method for programming a dual-arm robot system to perform an operation from a two-handed human demonstration, the method comprising: demonstrating, by a person using both hands, the operation on workpieces; analyzing, by a computer, camera images of the hands demonstrating the operation on the workpieces to create demonstration data including gripper poses calculated from three-dimensional (3D) coordinates of key points of the hands, wherein the 3D coordinates of the key points are determined from the images by a first neural network used to identify the left hand and the right hand in the images and a second neural network used to calculate the 3D coordinates from sub-images of the identified left and right hands; improving the demonstration data by performing object position estimation that corrects motion points using edge detection, by replacing the demonstration data with a simplified version, or both; generating robot motion commands based on the demonstration data to cause the dual-arm robot system to perform the operation on the workpieces, wherein one robot arm performs the task demonstrated by one of the person's hands and the other robot arm performs the task demonstrated by the other hand; and performing the operation on the workpieces by the dual-arm robot system.
[Aspect 14] The method according to Aspect 13, wherein the demonstration data includes, for the gripping step of the operation, the position and orientation of a hand coordinate system, of a gripper coordinate system corresponding to the hand coordinate system, and of a workpiece coordinate system.
[Aspect 15] The method according to Aspect 13, wherein the first neural network is trained in a training process to distinguish between the left hand and the right hand, and in the training process a plurality of training images in which the left hand and the right hand are already identified are provided to the first neural network.
[Aspect 16] The method according to Aspect 13, wherein the second neural network is trained using a plurality of training images of either left hands or right hands, and wherein, when the second neural network is trained using right-hand training images, the left hand sub-images are horizontally flipped before and after analysis by the second neural network, and, when the second neural network is trained using left-hand training images, the right hand sub-images are horizontally flipped before and after analysis by the second neural network.
[Aspect 17] A system for teaching a dual-arm robot from a two-handed human demonstration, the system comprising: a three-dimensional (3D) camera; a computer having a processor and memory, the computer being configured to perform steps including: receiving a series of images from the 3D camera, the images including the left hand and the right hand of a person demonstrating an operation on one or more workpieces; analyzing each of the images, using a first neural network, to determine the identity and position of the left hand and the right hand in the image; analyzing sub-images of the left hand and the right hand, using a second neural network, to determine coordinates of a plurality of key points on the left hand and the right hand; applying pixel depth data from the 3D camera to the key points to generate 3D left and right hand pose data for each of the images; combining the 3D left and right hand pose data for the series of images into 3D left and right hand motion traces that define the operation; and improving the 3D left and right hand motion traces by performing object position estimation using edge detection, by replacing the motion traces with simplified versions, or both; and a robot controller in communication with the computer, the controller controlling a dual-arm robot system to perform the operation using the improved 3D left and right hand motion traces, wherein the left hand motion trace is executed by one robot arm and the right hand motion trace is executed by the other robot arm.
[Aspect 18] The system according to Aspect 17, wherein the first neural network is trained in a training process to distinguish between the left hand and the right hand, the training process provides the first neural network with a plurality of training images in which the left hand and the right hand are already identified, and the first neural network analyzes the training images to identify distinguishing features of the left hand and the right hand, including finger curvature and relative position.
[Aspect 19] The system according to Aspect 17, wherein the second neural network is trained using a plurality of training images of either left hands or right hands, and the analysis using the second neural network includes horizontally flipping the left hand sub-images before and after the analysis when the second neural network is trained using right-hand training images, and horizontally flipping the right hand sub-images before and after the analysis when the second neural network is trained using left-hand training images.
[Aspect 20] The system according to Aspect 17, wherein the plurality of key points on the left hand and the right hand include thumb tips, thumb joints, fingertips, and finger joints, and a gripper pose is determined based on the 3D left and right hand pose data in the 3D left and right hand motion traces.
[Aspect 21] The system according to Aspect 17, wherein the object position estimation includes converting a visual image of the one or more workpieces taken before the demonstration into a first edge image, converting a visual image of the one or more workpieces taken after the demonstration into a second edge image, creating a difference edge image by subtracting the second edge image from the first edge image, and determining the position and orientation of corresponding features of the one or more workpieces using features identified in the difference edge image.
[Aspect 22] The system according to Aspect 21, wherein the position and orientation of the corresponding features of the one or more workpieces are determined both before and after the demonstration, the position and orientation of the corresponding features before the demonstration are used to correct object grasping points in the 3D left and right hand motion traces, and the position and orientation of the corresponding features after the demonstration are used to correct object placement points in the 3D left and right hand motion traces.
[Aspect 23] The system according to Aspect 17, wherein replacing the motion traces with simplified versions includes creating a new set of path points from the original set of path points using least-squares interpolation and calculating a smoothed motion trace through the new set of path points using spline interpolation, or replacing each of the motion traces with a simplified trace consisting of horizontal and vertical line segments constructed using path reversal points and peak path points.
[Aspect 24] The system according to Aspect 17, wherein the robot controller receives an image of the robot workspace including the one or more workpieces and uses the improved 3D left and right hand motion traces to control the dual-arm robot system to perform the operation.
[Aspect 25] The system according to Aspect 24, wherein the robot controller transposes the improved 3D left and right hand motion traces such that the motion points in the motion traces coincide with the positions of the one or more workpieces in the image of the robot workspace, the motion points including the points at which a gripper grips, releases, or presses one of the workpieces.
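The horizontal flipping "before and after the analysis" recited in several of the aspects above can be illustrated with a short Python sketch. It assumes a key point network trained only on right hands and represented here by an arbitrary callable; `flip_horizontal`, `unflip_keypoints`, and `detect_keypoints` are hypothetical names, and mirroring the returned coordinates is one plausible reading of the flip that follows the analysis.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Keypoints = Dict[str, Tuple[int, int]]  # name -> (u, v) pixel coordinates


def flip_horizontal(image: Image) -> Image:
    """Mirror an image left to right."""
    return [list(reversed(row)) for row in image]


def unflip_keypoints(keypoints: Keypoints, width: int) -> Keypoints:
    """Map key points found in a mirrored image back to the original image."""
    return {name: (width - 1 - u, v) for name, (u, v) in keypoints.items()}


def detect_keypoints(sub_image: Image, hand: str,
                     network: Callable[[Image], Keypoints],
                     trained_on: str = "right") -> Keypoints:
    """Run a single-handedness key point network on either hand.

    A sub-image of the hand the network was trained on is passed through
    unchanged. A sub-image of the other hand is mirrored first so that it
    resembles the trained hand, and the resulting key points are mirrored back.
    """
    if hand == trained_on:
        return network(sub_image)
    flipped = flip_horizontal(sub_image)
    return unflip_keypoints(network(flipped), width=len(sub_image[0]))
```

With this arrangement a single network serves both hands, and the returned coordinates always refer to the original, unflipped sub-image.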

Claims

1. A method for teaching a dual-arm robot from a two-handed human demonstration, the method comprising: providing a series of images from a three-dimensional (3D) camera, the images including the left hand and the right hand of a person demonstrating an operation on one or more workpieces; analyzing each of the images, using a first neural network running on a computer having a processor and memory, to identify the left hand and the right hand in the image and determine their positions; analyzing sub-images of the left hand and the right hand, using a second neural network running on the computer, to determine coordinates of a plurality of key points on the left hand and the right hand; applying pixel depth data from the 3D camera to the key points to generate 3D left and right hand pose data for each of the images; combining the 3D left and right hand pose data for the series of images into 3D left and right hand motion traces that define the operation; improving the 3D left and right hand motion traces by performing object position estimation using edge detection, by replacing the motion traces with simplified versions, or both, wherein the object position estimation includes converting a visual image of the one or more workpieces taken before the demonstration into a first edge image, converting a visual image of the one or more workpieces taken after the demonstration into a second edge image, creating a difference edge image by subtracting the second edge image from the first edge image, and determining the position and orientation of corresponding features of the one or more workpieces using features identified in the difference edge image; and controlling, by a robot controller, a dual-arm robot system to perform the operation using the improved 3D left and right hand motion traces, wherein the left hand motion trace is executed by one robot arm and the right hand motion trace is executed by the other robot arm.

2. The method according to claim 1, wherein in the training process the first neural network is trained to distinguish between the left hand and the right hand, the training process provides the first neural network with a plurality of training images in which the left hand and the right hand are already identified, and the first neural network analyzes the training images to identify distinctive features of the left hand and the right hand, including finger curvature and relative position.

3. The method according to claim 1, wherein each of the sub-images is cropped to include the left hand or the right hand with a predetermined margin.

4. The method according to claim 1, wherein the second neural network is trained using a plurality of training images of either the left hand or the right hand.

5. The method according to claim 4, wherein the analysis by the second neural network includes, when the second neural network is trained using right-hand training images, horizontally flipping the left hand sub-images before and after the analysis, and, when the second neural network is trained using left-hand training images, horizontally flipping the right hand sub-images before and after the analysis.

6. The method according to claim 1, wherein the plurality of key points on the left hand and the right hand include thumb tips, thumb joints, fingertips, and finger joints.

7. The method according to claim 1, wherein a gripper pose is determined based on the 3D left and right hand pose data in the 3D left and right hand motion traces.

8. The method according to claim 1, wherein the position and orientation of the corresponding features of one or more workpieces are determined both before and after the demonstration, the position and orientation of the corresponding features before the demonstration is used to correct the object grasping points in the 3D left-hand and right-hand motion traces, and the position and orientation of the corresponding features after the demonstration is used to correct the object placement points in the 3D left-hand and right-hand motion traces.

9. The method according to claim 1, wherein replacing the motion trace with a simplified version includes creating a new set of path points from the original set of path points using least-squares interpolation, calculating a smoothed motion trace through the new set of path points using spline interpolation, or replacing each of the motion traces with a simplified trace consisting of horizontal and vertical line segments constructed using path reversal points and peak path points.

10. The method according to claim 1, wherein the robot controller receives an image of the robot workspace including one or more workpieces and uses the improved 3D left-hand and right-hand motion traces to control the dual-arm robot system to perform the operation.

11. The method according to claim 10, wherein the robot controller transposes the improved 3D left-hand and right-hand motion traces such that the motion points in the motion traces coincide with the positions of one or more workpieces in the image of the robot workspace, the motion points including points where the gripper grips, releases, or presses one of the workpieces.

12. A method for programming a dual-arm robot system to perform an operation from a two-handed human demonstration, the method comprising: demonstrating, by a person using both hands, the operation on workpieces; analyzing, by a computer, camera images of the hands demonstrating the operation on the workpieces to create demonstration data including gripper poses calculated from three-dimensional (3D) coordinates of key points of the hands, wherein the 3D coordinates of the key points are determined from the images by a first neural network used to identify the left hand and the right hand in the images and a second neural network used to calculate the 3D coordinates from sub-images of the identified left and right hands; improving the demonstration data by performing object position estimation that corrects motion points using edge detection, by replacing the demonstration data with a simplified version, or both, wherein the object position estimation includes converting a visual image of the workpieces taken before the demonstration into a first edge image, converting a visual image of the workpieces taken after the demonstration into a second edge image, creating a difference edge image by subtracting the second edge image from the first edge image, and determining the position and orientation of corresponding features of the workpieces using features identified in the difference edge image; generating robot motion commands based on the demonstration data to cause the dual-arm robot system to perform the operation on the workpieces, wherein one robot arm performs the task demonstrated by one of the person's hands and the other robot arm performs the task demonstrated by the other hand; and performing the operation on the workpieces by the dual-arm robot system.

13. The method according to claim 12, wherein the demonstration data includes the position and orientation of the hand coordinate system, the gripper coordinate system corresponding to the hand coordinate system, and the coordinate system of the workpiece in the gripping step of the operation.

14. The method according to claim 12, wherein in the training process, the first neural network is trained to distinguish between the left hand and the right hand, and in the training process, a plurality of training images in which the left hand and the right hand are already identified are provided to the first neural network.

15. The method according to claim 12, wherein the second neural network is trained using a plurality of training images of either left hands or right hands, and wherein, when the second neural network is trained using right-hand training images, the left hand sub-images are horizontally flipped before and after analysis by the second neural network, and, when the second neural network is trained using left-hand training images, the right hand sub-images are horizontally flipped before and after analysis by the second neural network.

16. A system for teaching a dual-arm robot from a two-handed human demonstration, the system comprising: a three-dimensional (3D) camera; a computer having a processor and memory, the computer being configured to perform steps including: receiving a series of images from the 3D camera, the images including the left hand and the right hand of a person demonstrating an operation on one or more workpieces; analyzing each of the images, using a first neural network, to determine the identity and position of the left hand and the right hand in the image; analyzing sub-images of the left hand and the right hand, using a second neural network, to determine coordinates of a plurality of key points on the left hand and the right hand; applying pixel depth data from the 3D camera to the key points to generate 3D left and right hand pose data for each of the images; combining the 3D left and right hand pose data for the series of images into 3D left and right hand motion traces that define the operation; and improving the 3D left and right hand motion traces by performing object position estimation using edge detection, by replacing the motion traces with simplified versions, or both, wherein the object position estimation includes converting a visual image of the one or more workpieces taken before the demonstration into a first edge image, converting a visual image of the one or more workpieces taken after the demonstration into a second edge image, creating a difference edge image by subtracting the second edge image from the first edge image, and determining the position and orientation of corresponding features of the one or more workpieces using features identified in the difference edge image; and a robot controller in communication with the computer, the controller controlling a dual-arm robot system to perform the operation using the improved 3D left and right hand motion traces, wherein the left hand motion trace is executed by one robot arm and the right hand motion trace is executed by the other robot arm.

17. The system according to claim 16, wherein in the training process the first neural network is trained to distinguish between the left hand and the right hand, the training process provides the first neural network with a plurality of training images in which the left hand and the right hand are already identified, and the first neural network analyzes the training images to identify distinctive features of the left hand and the right hand, including finger curvature and relative position.

18. The system according to claim 16, wherein the second neural network is trained using a plurality of training images of either the left hand or the right hand, and analysis using the second neural network includes, if the second neural network is trained using training images of the right hand, horizontally flipping the sub-images of the left hand before and after the analysis, and if the second neural network is trained using training images of the left hand, horizontally flipping the sub-images of the right hand before and after the analysis.

19. The system according to claim 16, wherein the plurality of key points on the left hand and the right hand include thumb tips, thumb joints, fingertips, and finger joints, and a gripper pose is determined based on the 3D left and right hand pose data in the 3D left and right hand motion traces.

20. The system according to claim 16, wherein the position and orientation of the corresponding features of one or more workpieces are determined both before and after the demonstration, the position and orientation of the corresponding features before the demonstration is used to correct the object grasping point in the 3D left-hand and right-hand motion trace, and the position and orientation of the corresponding features after the demonstration is used to correct the object placement point in the 3D left-hand and right-hand motion trace.

21. The system according to claim 16, wherein replacing the motion trace with a simplified version includes creating a new set of path points from the original set of path points using least-squares interpolation, calculating a smoothed motion trace through the new set of path points using spline interpolation, or replacing each of the motion traces with a simplified trace consisting of horizontal and vertical line segments constructed using path reversal points and peak path points.

22. The system according to claim 16, wherein the robot controller receives an image of the robot workspace including the one or more workpieces and uses the improved 3D left-hand and right-hand motion traces to control the dual-arm robot system to perform the operation.

23. The system according to claim 22, wherein the robot controller transposes the improved 3D left-hand and right-hand motion traces such that the motion points in the motion traces coincide with the positions of one or more workpieces in the image of the robot workspace, the motion points including points where the gripper grips, releases, or presses one of the workpieces.