Systems, apparatus, and methods for robots to learn and perform skills
The robotic system learns and adapts skills through human interaction, addressing the limitations of pre-programmed robots in unstructured environments by enabling autonomous task performance in dynamic settings.
Patent Information
- Authority / Receiving Office: JP · JP
- Patent Type: Patents
- Current Assignee / Owner: DILIGENT ROBOTICS INC
- Filing Date: 2023-10-02
- Publication Date: 2026-06-10
AI Technical Summary
Robots lacking manipulators struggle to perform tasks in unstructured environments due to the need for pre-programming and inability to adapt to dynamic changes, limiting their functionality in settings like hospitals and homes.
A robotic system capable of learning and adapting skills through human demonstrations and interactions, utilizing sensors and machine learning to navigate and manipulate objects without prior programming, enabling operation in unstructured environments.
Enables robots to perform tasks autonomously in dynamic, unstructured environments by learning from human interactions, enhancing their adaptability and task performance in complex settings such as hospitals and homes.
Abstract
Description
【Technical Field】 【0001】 Cross-Reference to Related Applications
【0001】 This application claims priority to U.S. Provisional Patent Application No. 62/463,628, titled "Method and System for Robotic Learning of Execution Processes Related to Perceptually Constrained Manipulation Skills", filed on February 25, 2017, and U.S. Provisional Patent Application No. 62/463,630, titled "Method and System for Robotic Execution of a Perceptually Constrained Manipulation Skill Learned via Human Interaction", also filed on February 25, 2017. The disclosure of each provisional patent application is hereby incorporated by reference in its entirety.
【0002】 Government Support
【0002】 This invention was made with government support under Grant No. 1621651 awarded by the National Science Foundation of the United States under the Small Business Innovation Research Program - Phase I. The United States government has certain rights in this invention.
【0003】 This disclosure generally relates to systems, devices, and methods for a robot to learn and execute skills. More specifically, this disclosure relates to a robotic device capable of learning and executing skills in an unstructured environment.
【Background Art】 【0004】
[0004] Robots can be used to perform and automate a variety of tasks. Robots can perform tasks by moving through environments such as office buildings or hospitals. Robots may be equipped with wheels, tracks, or other movable components that enable them to move autonomously within the environment. However, robots that do not have arms or other manipulators cannot manipulate objects in the environment. Consequently, the task-performing capabilities of these robots are limited, and for example, such robots may not be able to pick up or transport objects without a human being present to manipulate the objects at the pickup point or transport point. 【0005】
[0005] A robot including an arm or other manipulator may be capable of picking up and transporting objects to a location without human intervention. For example, a robot with an arm having an end effector such as a gripper can use the gripper to pick up one or more objects from different locations and transport those objects to a new location, all without human assistance. These robots can be used to automate specific tasks, thereby allowing human operators to focus on other tasks. However, most commercial robots do not include a manipulator due to the problems and complexities of programming the manipulator's movements. 【0006】
[0006] Furthermore, most commercial robots are designed to operate in structured environments, such as factories and warehouses. Unstructured environments, such as hospitals and homes, which involve human interaction, can impose additional challenges on robot programming. In unstructured environments, robots cannot rely on complete knowledge of their surroundings and must be able to perceive changes in their surroundings and adapt based on those changes. Therefore, in unstructured environments, robots need to continuously acquire information about their environment in order to be able to make autonomous decisions and perform tasks. Often, the robot's movements in the environment, such as the movement of its arm or end effector, are also constrained by objects and other obstacles in the environment, further exacerbating the problems of robot perception and operation. Due to the uncertain and dynamic nature of unstructured environments, robots usually cannot be pre-programmed to perform tasks. 【0007】
[0007] Therefore, there is a need for a robotic system that can perceive and adapt to dynamic, unstructured environments and perform tasks within those environments, without relying on pre-programmed operational skills. 【Summary of the Invention】 【0008】
[0008] Systems, apparatuses and methods for a robot to learn and perform skills are described. In some embodiments, the apparatus includes memory, a processor, an operating element, and a set of sensors. The processor is operably coupled to the memory, the operating element, and the set of sensors and can be configured to acquire a representation of the environment via a subset of sensors from the set of sensors; identify a plurality of markers in the representation of the environment, each marker from the plurality of markers being associated with a physical object from a plurality of physical objects placed in the environment; present information indicating the position of each marker from the plurality of markers in the representation of the environment; receive a set of markers selected from a plurality of markers associated with a set of physical objects from a plurality of physical objects; acquire sensor information associated with the operating element for each position from a plurality of positions associated with the movement of the operating element in the environment, the movement of the operating element being associated with a physical interaction between the operating element and a set of physical objects; and generate a model configured to define the movement of the operating element to perform a physical interaction between the operating element and a set of physical objects based on the sensor information. 【0009】
[0009] In some embodiments, the operating element may include a plurality of joints and an end effector. In some embodiments, the set of physical objects may include a human being. 【0010】
[0010] In some embodiments, the plurality of markers are fiducial markers, and the representation of the environment is a visual representation of the environment. 【0011】
[0011] In some embodiments, two or more markers from a plurality of markers can be associated with one physical object from a set of physical objects. Alternatively or additionally, in some embodiments, one marker from a plurality of markers can be associated with two or more physical objects from a set of physical objects. 【0012】
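The marker-to-object associations described in [0011] can run in both directions: several markers on one object, or one marker shared by several objects. A minimal sketch of such a many-to-many mapping follows. The marker IDs, object names, and the `associate` helper are hypothetical and chosen only for illustration; the disclosure does not specify a data structure.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# Two indexes (both hypothetical) so that lookups work in either direction.
marker_to_objects: DefaultDict[str, Set[str]] = defaultdict(set)
object_to_markers: DefaultDict[str, Set[str]] = defaultdict(set)


def associate(marker_id: str, object_name: str) -> None:
    """Record that a marker is associated with a physical object."""
    marker_to_objects[marker_id].add(object_name)
    object_to_markers[object_name].add(marker_id)


# Two markers associated with one physical object.
associate("marker_1", "drawer")
associate("marker_2", "drawer")
# One marker associated with two physical objects.
associate("marker_3", "tray")
associate("marker_3", "cup")

print(sorted(object_to_markers["drawer"]))
print(sorted(marker_to_objects["marker_3"]))
```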
[0012] In some embodiments, the method includes: acquiring a representation of the environment via a set of sensors; identifying a plurality of markers in the representation of the environment, wherein each marker from the plurality of markers is associated with a physical object from a plurality of physical objects placed in the environment; presenting information indicating the location of each marker from the plurality of markers in the representation of the environment; receiving, after presentation, a set of markers selected from a plurality of markers associated with a set of physical objects from a plurality of physical objects; acquiring sensor information associated with the operating element for each location from a plurality of locations associated with the movement of the operating element in the environment, wherein the movement of the operating element is associated with a physical interaction between the operating element and a set of physical objects; and generating a model configured to define the movement of the operating element to perform a physical interaction between the operating element and a set of physical objects based on the sensor information. 【0013】
[0013] In some embodiments, the method further includes receiving a first subset of features selected from a set of features, wherein the model is generated based on sensor data associated with the first subset of features and not based on sensor data associated with a second subset of features, from the set of features, that is not included in the first subset of features. 【0014】
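The feature-selection step in [0013] amounts to discarding sensor data for features the user did not select before the model is built. The sketch below illustrates that idea only. The keyframe layout, the feature names, and the `filter_features` helper are assumptions made for this example and do not come from the disclosure.

```python
from typing import Dict, Iterable, List

# Assumed layout: each recorded keyframe maps a feature name to a sensor value.
Keyframe = Dict[str, float]


def filter_features(keyframes: List[Keyframe], selected: Iterable[str]) -> List[Keyframe]:
    """Keep only the selected (first-subset) features in every keyframe."""
    keep = set(selected)
    return [
        {name: value for name, value in frame.items() if name in keep}
        for frame in keyframes
    ]


demonstration = [
    {"gripper_x": 0.10, "gripper_y": 0.42, "wrist_torque": 1.5},
    {"gripper_x": 0.12, "gripper_y": 0.40, "wrist_torque": 1.7},
]

# "wrist_torque" belongs to the second subset, so it is excluded from the
# data that would be passed on to model generation.
training_data = filter_features(demonstration, ["gripper_x", "gripper_y"])
print(training_data)
```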
[0014] In some embodiments, the method includes: acquiring a representation of the environment via a set of sensors; identifying a plurality of markers in the representation of the environment, wherein each marker from the plurality of markers is associated with a physical object from a plurality of physical objects placed in the environment; presenting information indicating the position of each marker from the plurality of markers in the representation of the environment; in response to receiving a set of markers selected from the plurality of markers and associated with a set of physical objects from the plurality of physical objects, identifying a model associated with performing a physical interaction between an operating element and the set of physical objects, wherein the operating element includes a plurality of joints and an end effector; and generating a trajectory of the operating element that defines the movement of the plurality of joints and the end effector associated with performing the physical interaction. 【0015】
[0015] In some embodiments, the method further includes displaying to a user the trajectory of the operating element in the representation of the environment, receiving input from the user after the display, and, in response to input indicating acceptance of the trajectory of the operating element, performing the physical interaction by moving the plurality of joints and the end effector. 【0016】
[0016] In some embodiments, the model is associated with (i) a stored set of markers, (ii) sensor information indicating at least one of the position or orientation of the operating element at points along a stored trajectory of the operating element associated with the stored set of markers, and (iii) sensor information indicating the configuration of the plurality of joints at the points along the stored trajectory. Generating the trajectory of the operating element includes: calculating a conversion function between the set of markers and the stored set of markers; for each point, converting at least one of the position or orientation of the operating element using the conversion function; for each point, determining a planned configuration of the plurality of joints based on the configuration of the plurality of joints along the stored trajectory; and, for each point, identifying a portion of the trajectory between that point and a consecutive point based on the planned configuration of the plurality of joints at that point. 【0017】
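One way to picture the conversion function in [0016] is as a rigid transform fitted between the stored marker positions and the marker positions observed now, which is then applied to each stored pose of the operating element. The sketch below makes several assumptions that are not in the disclosure: it works in 2D, it uses a closed-form least-squares fit, and the names `fit_rigid_transform` and `convert_pose` are hypothetical. It does not cover the joint-configuration planning steps.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Pose = Tuple[float, float, float]       # x, y, heading in radians
Transform = Tuple[float, float, float]  # rotation angle, x shift, y shift


def fit_rigid_transform(stored: List[Point], observed: List[Point]) -> Transform:
    """Least-squares 2D rotation and translation mapping stored onto observed markers."""
    if len(stored) != len(observed) or len(stored) < 2:
        raise ValueError("need at least two corresponding marker positions")
    n = len(stored)
    sx = sum(p[0] for p in stored) / n
    sy = sum(p[1] for p in stored) / n
    ox = sum(q[0] for q in observed) / n
    oy = sum(q[1] for q in observed) / n
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(stored, observed):
        ax, ay = px - sx, py - sy  # stored marker relative to its centroid
        bx, by = qx - ox, qy - oy  # observed marker relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    # The translation maps the rotated stored centroid onto the observed centroid.
    tx = ox - (math.cos(theta) * sx - math.sin(theta) * sy)
    ty = oy - (math.sin(theta) * sx + math.cos(theta) * sy)
    return theta, tx, ty


def convert_pose(pose: Pose, transform: Transform) -> Pose:
    """Apply the fitted transform to one stored pose of the operating element."""
    x, y, heading = pose
    theta, tx, ty = transform
    return (
        math.cos(theta) * x - math.sin(theta) * y + tx,
        math.sin(theta) * x + math.cos(theta) * y + ty,
        heading + theta,
    )


# Illustrative data: the observed markers are the stored markers rotated by
# 90 degrees and shifted by (2, 3).
stored_markers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed_markers = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
stored_keyframes = [(1.0, 1.0, 0.0), (0.5, 0.0, 0.0)]

transform = fit_rigid_transform(stored_markers, observed_markers)
planned_keyframes = [convert_pose(k, transform) for k in stored_keyframes]
print(planned_keyframes)
```

A real system would work with full 3D poses; the 2D case is used here only because it has a short closed-form solution.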
[0017] Other systems, processes, and features will become apparent to those skilled in the art after examining the following drawings and detailed description. All such additional systems, processes, and features are included in this description, within the scope of the present invention, and are intended to be protected by the appended claims. 【0018】
[0018] Those skilled in the art will understand that the drawings are primarily illustrative and are not intended to limit the scope of the inventive subject matter described herein. The drawings are not necessarily to scale, and in some cases, various aspects of the inventive subject matter disclosed herein may be exaggerated or enlarged in the drawings to facilitate understanding of different features. In the drawings, like reference characters generally refer to like features (e.g., functionally similar elements and/or structurally similar elements).
[Brief Description of the Drawings] 【0019】
[Figure 1] [0019] This is a block diagram showing the configuration of a system including a robotic device according to several embodiments.
[Figure 2] 【0020】 This is a block diagram showing the configuration of a robotic device according to several embodiments.
[Figure 3] 【0021】 This is a block diagram showing the configuration of a control unit associated with a robotic device according to several embodiments.
[Figure 4] 【0022】 This is a schematic diagram of the operating element of a robotic device according to several embodiments.
[Figure 5] 【0023】 This is a schematic diagram of a robotic device according to several embodiments.
[Figure 6A] 【0024】 This is a schematic diagram of objects in the environment as seen by a robotic device according to several embodiments.
[Figure 6B] [0024] This is a schematic diagram of objects in the environment as seen by a robotic device according to several embodiments.
[Figure 7A] 【0025】 This is a schematic diagram of objects in the environment as seen by a robotic device according to some embodiments.
[Figure 7B] 【0025】 This is a schematic diagram of objects in the environment as seen by a robotic device according to some embodiments.
[Figure 8] 【0026】 This is a flowchart showing an environment scanning method executed by a robotic device according to some embodiments.
[Figure 9] 【0027】 This is a flowchart showing a method of learning and executing skills executed by a robotic device according to some embodiments.
[Figure 10] 【0028】 This is a flowchart showing a method of learning skills executed by a robotic device according to some embodiments.
[Figure 11] 【0029】 This is a flowchart showing a method of executing skills executed by a robotic device according to some embodiments.
[Figure 12] 【0030】 This is a block diagram showing a system architecture for robot learning and execution, including user actions, according to some embodiments.
【Mode for Carrying Out the Invention】
【0020】 【0031】 Systems, apparatus, and methods for a robot to learn and execute skills are described herein. In some embodiments, the systems, apparatus, and methods described herein relate to a robotic device capable of learning skills through human demonstrations and interactions and executing the learned skills in an unstructured environment.
【0021】 【0032】 In some embodiments, the systems, apparatus, and methods described herein relate to a robot capable of learning skills (e.g., manipulative skills) through a Learning from Demonstration (LfD) process, in which a human demonstrates actions to the system through kinesthetic instruction (e.g., a human guiding the robot through actions physically and/or remotely) and/or by performing the actions themselves.
Such systems, apparatus, and methods do not require the robot to be pre-programmed with manipulative skills; rather, the robot is designed to adapt and learn skills through observation. For example, the robot can acquire and perform manipulative skills using machine learning techniques. After learning a skill, the robot can perform the skill in different environments. The robot can learn and/or perform skills based on visual data (e.g., perceived visual information). Alternatively or additionally, the robot can learn and/or perform skills using tactile data (e.g., torque, force, and other non-visual information). Robot learning can be performed in a factory before robot deployment, or in the field (e.g., in a hospital) after robot deployment. In some embodiments, users without robotics and/or programming training can teach skills to the robot, and/or the robot can adapt to operate in the environment. For example, the robot may have a learning algorithm that utilizes natural human behavior and may include tools that can guide the user through the demonstration process.
[0022] 【0033】 In some embodiments, robots can be designed to interact and collaborate with humans to perform tasks. In some embodiments, robots can use common social behaviors to operate among humans in a socially predictable and acceptable manner. Mobile robots can also be designed to navigate within an environment while interacting with humans in that environment. For example, a robot can be programmed to speak specific phrases while navigating among humans, move aside to allow humans to pass, and use gaze to convey intention while navigating. In some embodiments, a robot can have sensors that enable it to perceive and track humans in its surrounding environment and use that information to trigger gaze and other social behaviors.
[0023] 【0034】 In some embodiments, the robot can be designed to suggest options for achieving a goal or performing an action during the LfD process. For example, the robot can suggest a few different options for achieving a goal (e.g., picking up an object) and indicate which of those options is most likely to be efficient and/or effective in achieving the goal. In some embodiments, the robot can adapt the skill based on user input, for example, a user indicating features that are relevant to include in the skill model.
[0024] 【0035】In some embodiments, a robotic device may be capable of learning and / or performing skills in an unstructured environment, such as a dynamic and / or human environment in which the robotic device does not have complete prior information about the environment. An unstructured environment may include, for example, indoor and outdoor settings and may include one or more people or other objects that can move within the environment. Since most natural or real-world environments are unstructured, a robotic device that can adapt to and operate in an unstructured environment, such as the robotic devices and / or systems described herein, can offer a significant improvement over existing robotic devices that cannot adapt to unstructured environments. An unstructured environment may include indoor settings (e.g., buildings, offices, houses, rooms, etc.) and / or other types of enclosed spaces (e.g., aircraft, trains, and / or other types of mobile compartments) as well as outdoor settings (e.g., parks, beaches, fields, meadows). In one embodiment, the robotic device described herein may operate in an unstructured hospital environment.
[0025] 【0036】 Figure 1 is a high-level block diagram showing a system 100 according to several embodiments. System 100 can be configured to learn and perform skills, such as manipulative skills, in an unstructured environment. System 100 can be implemented as a single device or across multiple devices connected to a network 105. For example, as shown in Figure 1, System 100 can include one or more robotic devices such as robotic devices 102 and 110, a server 120, and one or more computing devices such as an additional computing device 150. Although four devices are shown, it should be understood that System 100 can include any number of computing devices, including computing devices not specifically shown in Figure 1.
[0026] 【0037】Network 105 is implemented as a wired and / or wireless network and can be any type of network used to operably connect the computing devices, including robotic devices 102 and 110, server 120, and computing device 150 (e.g., local area network (LAN), wide area network (WAN), virtual network, telecommunications network). As will be further detailed herein, in some embodiments, for example, the computing devices are computers connected to each other via an Internet Service Provider (ISP) and the Internet (e.g., network 105). In some embodiments, the connection can be defined between any two computing devices via network 105. For example, as shown in Figure 1, the connection can be defined between robotic device 102 and any one of robotic device 110, server 120, or additional computing device 150. In some embodiments, the computing devices can communicate with each other (e.g., send and / or receive data) and with network 105 via an intermediate network and / or alternative network (not shown in Figure 1). Such intermediate and / or alternative networks may be the same type of network as network 105 and / or a different type of network. Each computing device may be any type of device configured to transmit and / or receive data from one or more other computing devices via network 105.
[0027] 【0038】 In some embodiments, system 100 includes a single robotic device, for example, robotic device 102. Robotic device 102 can be configured to perceive information about the environment, learn skills through human demonstrations and interactions, and / or perform those skills in the environment. A more detailed diagram of an example robotic device is shown in Figure 2.
[0028] 【0039】In other embodiments, system 100 includes a plurality of robotic devices, for example, robotic devices 102 and 110. Robotic device 102 can transmit data to and / or receive data from robotic device 110 via network 105. For example, robotic device 102 can transmit information it perceives about the environment (e.g., the location of objects) to robotic device 110 and receive information about the environment from robotic device 110. Robotic devices 102 and 110 can also transmit and / or receive information from each other to learn and / or perform skills. For example, robotic device 102 can learn a skill in an environment and transmit a model representing the learned skill to robotic device 110, and upon receiving the model, robotic device 110 can use the model to perform the skill in the same or a different environment. Robotic device 102 may be in the same or a different location as robotic device 110. For example, robotic devices 102 and 110 can be placed in the same room of a building (e.g., a hospital building), thereby allowing them to learn and / or perform skills together (e.g., moving heavy or large objects). Alternatively, robotic device 102 can be placed on the first floor of a building (e.g., a hospital building), and robotic device 110 can be placed on the second floor of the building, and the two can communicate with each other to relay information about the different floors (e.g., where objects are located on those floors, where resources may be, etc.).
[0029] 【0040】In some embodiments, the system 100 includes one or more robotic devices, e.g., robotic devices 102 and / or 110, and a server 120. The server 120 may be a dedicated server for managing robotic devices 102 and / or 110. The server 120 may be located in the same or a different location as the robotic devices 102 and / or 110. For example, the server 120 may be located in the same building as the robotic devices (e.g., a hospital building) and managed by a local administrator (e.g., a hospital administrator). Alternatively, the server 120 may be located in a remote location (e.g., a location associated with the manufacturer or provider of the robotic devices).
[0030] 【0041】 In some embodiments, the system 100 includes one or more robotic devices, for example, robotic devices 102 and / or 110, and an additional computing device 150. The computing device 150 can be any suitable processing device configured to run and / or perform a specific function. For example, in a hospital setting, the computing device 150 could be a diagnostic and / or therapeutic device that connects to a network 105 and can communicate with other computing devices, including robotic devices 102 and / or 110.
[0031] 【0042】 Figure 2 schematically shows a robotic device 200 according to several embodiments. The robotic device 200 includes a control unit 202, a user interface 240, at least one operating element 250, and at least one sensor 270. Furthermore, in some embodiments, the robotic device 200 optionally includes at least one transport element 260. The control unit 202 includes a memory 220, a storage device 230, a processor 204, a system bus 206, and at least one input/output interface ("I/O interface") 208. The memory 220 can be, for example, random access memory (RAM), a memory buffer, a hard drive, a database, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and/or read-only memory (ROM). In some embodiments, the memory 220 stores instructions that cause the processor 204 to execute modules, processes, and/or functions associated with scanning or displaying the environment, learning skills, and/or performing skills. The storage device 230 can be, for example, a hard drive, a database, cloud storage, a network-attached storage device, or other data storage device. In some embodiments, the storage device 230 can store, for example, sensor data including state information about one or more components of the robotic device 200 (e.g., operating element 250), learned models, marker location information, and the like.
[0032] 【0043】 The processor 204 of the control unit 202 can be any suitable processing device configured to run and/or perform functions associated with displaying the environment, learning skills, and/or performing skills. For example, the processor 204 can be configured to generate a model of a skill based on sensor information, or to perform a skill by using the model to generate a trajectory for performing the skill, as further described herein. More specifically, the processor 204 can be configured to execute modules, functions, and/or processes. In some embodiments, the processor 204 can be a general-purpose processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and/or a digital signal processor (DSP), etc.
[0033] 【0044】 The system bus 206 can be any suitable component that enables the processor 204, memory 220, storage device 230, and / or other components of the control unit 202 to communicate with each other. The I / O interface 208 can be any suitable component that connects to the system bus 206 and enables communication between the internal components of the control unit 202 (e.g., the processor 204, memory 220, storage device 230) and external input / output devices such as the user interface 240, operating element 250, transport element 260, and sensor 270.
[0034] 【0045】 The user interface 240 may include one or more components configured to receive inputs and send outputs to other devices and/or to a user operating a device, such as a user operating the robotic device 200. For example, the user interface 240 may include a display device 242 (e.g., a display, touchscreen, etc.), an audio device 244 (e.g., a microphone, speaker), and optionally, one or more additional input/output devices ("I/O devices") 246 configured to receive inputs from and/or generate outputs to the user.
[0035] 【0046】 The operating element 250 can be any suitable component capable of manipulating and / or interacting with stationary objects and / or moving objects, including, for example, a human being. The operating element 250 may include a plurality of segments connected to one another via joints that can provide translation along one or more axes and / or rotation around one or more axes. The operating element 250 may also include an end effector that can engage and / or otherwise interact with objects in the environment. For example, the operating element may include a gripping mechanism that can releasably engage (e.g., grip) with an object in the environment to pick up and / or transport the object. Other examples of end effectors include, for example, a vacuum engagement mechanism, a magnetic engagement mechanism, a suction mechanism, and / or a combination thereof. A detailed diagram of an example operating element is shown in Figure 4.
[0036] 【0047】 The transport element 260 can be any suitable component configured to move, such as wheels or tracks. One or more transport elements 260 can be provided to the base of the robotic device 200 to enable the robotic device 200 to move around in the environment. For example, the robotic device 200 may include multiple wheels to enable it to navigate within a building, such as a hospital. The transport element 260 can be designed and / or sized to facilitate movement through narrow and / or constrained spaces (e.g., narrow corridors and passages, narrow rooms such as supply storage rooms, etc.).
[0037] 【0048】 The sensor 270 can be any suitable component that enables the robotic device 200 to capture information about the environment around the robotic device 200 and/or objects in the environment. The sensor 270 can include, for example, an image capture device (e.g., a camera such as a red-green-blue depth (RGB-D) camera or a webcam), an audio device (e.g., a microphone), a light sensor (e.g., a light detection and ranging (lidar) sensor or a color detection sensor), a proprioceptive sensor, a position sensor, a tactile sensor, a force or torque sensor, a temperature sensor, a pressure sensor, a motion sensor, a sound detector, and the like. For example, the sensor 270 can include at least one image capture device, such as a camera, that captures visual information about objects and the environment around the robotic device 200. In some embodiments, the sensor 270 can include a tactile sensor, such as a sensor that can convey force, vibration, touch, and other non-visual information to the robotic device 200.
[0038] 【0049】 In some embodiments, the robotic device 200 may have human-like features, such as a head, body, arms, legs, and / or base. For example, the robotic device 200 may include a face with eyes, a nose, a mouth, and other human-like features. Although not schematically shown, the robotic device 200 may also include actuators, motors, couplers, connectors, power supplies (e.g., onboard batteries), and / or other components that link, actuate, and / or drive different parts of the robotic device 200.
[0039] 【0050】Figure 3 is a schematic block diagram showing control unit 302 according to several embodiments. Control unit 302 may include the same components as control unit 202 and may be structurally and / or functionally similar to control unit 202. For example, control unit 302 may include a processor 304, memory 320, I / O interface 308, system bus 306, and storage device 330, which may be structurally and / or functionally similar to the processor 204, memory 220, I / O interface 208, system bus 206, and storage device 230, respectively.
[0040] 【0051】 Memory 320 stores instructions that cause the processor 304 to execute modules, processes, and/or functions, indicated as active scanning 322, marker identification 324, learning and model generation 326, trajectory generation and execution 328, and success monitoring 329. Each of these can be implemented as one or more programs and/or applications linked to hardware components (e.g., sensors, operating elements, I/O devices, processors, etc.), and each can be implemented by one or more robotic devices. For example, a robotic device can be configured to implement active scanning 322, marker identification 324, and trajectory generation and execution 328. As another example, the robotic device may be configured to implement active scanning 322, marker identification 324, optionally learning and model generation 326, and trajectory generation and execution 328. As yet another example, the robotic device may be configured to implement active scanning 322, marker identification 324, trajectory generation and execution 328, and optionally success monitoring 329. Although not shown, memory 320 may also store programs and/or applications associated with the operating system, as well as general robotic operations (e.g., power management, memory allocation, etc.).
[0041] 【0052】The storage device 330 stores information related to the learning and / or execution of skills. The storage device 330 stores, for example, internal state information 331, a model 334, object information 340, and a machine learning library 342. The internal state information 331 may include information about the state of a robotic device (e.g., robotic device 200) and / or the environment in which the robotic device is operating (e.g., a building such as a hospital). In some embodiments, the state information 331 may indicate the location of the robotic device in an environment such as a room, floor, or enclosed space. For example, the state information 331 may include a map 332 of the environment and indicate the location of the robotic device in that map 332. The state information 331 may also include the locations of one or more objects (or markers representing objects and / or markers associated with objects) in the environment, for example, in the map 332. Thus, the state information 331 can identify the location of the robotic device relative to one or more objects. The objects may include any type of physical object placed in the environment, and may be stationary or movable. Objects in an environment such as a hospital include, for example, equipment, supplies, instruments, tools, furniture, and / or people (e.g., nurses, doctors, patients, etc.).
[0042] 【0053】 The object information 340 may include information related to physical objects in the environment. For example, the object information may include information that identifies or quantifies various characteristics of an object, such as location, color, shape, and surface features. The object information may also identify codes, symbols, and other markers associated with the physical object, such as Quick Response or "QR" codes, barcodes, tags, etc. The object information can enable the control unit 302 to identify physical objects in the environment.
[0043] 【0054】The machine learning library 342 may include modules, processes, and / or functions related to different algorithms for machine learning and / or model generation of different skills. In some embodiments, the machine learning library may include methods such as Hidden Markov Models or "HMMs". An example of an existing machine learning library in Python is scikit-learn. The storage device 330 may also include additional software libraries related to, for example, robot simulation, motion planning and control, kinesthetic instruction, and perception.
[0044] 【0055】Model 334 is a model that is generated to perform different actions and represents a skill learned by a robotic device. In some embodiments, each model 334 is associated with a set of markers tied to different physical objects in the environment. Marker information 335 can indicate which markers are associated with a particular model 334. Each model 334 may also be associated with sensor information 336 collected, for example, through one or more sensors of the robotic device during kinesthetic teaching and / or other demonstrations of the skill. Sensor information 336 may include manipulator information 337 associated with the operating element of the robotic device as the operating element performs actions during the demonstration. Manipulator information 337 may include, for example, joint configuration, end effector position and configuration, and / or forces and torques acting on the joint and / or end effector. Manipulator information 337 may be recorded at specific points (e.g., keyframes) during the demonstration and / or execution of the skill, or alternatively, it may be recorded throughout the entire demonstration and / or execution of the skill. The sensor information 336 may also include information associated with the environment in which the skill was demonstrated and / or performed, such as the location of markers in the environment. In some embodiments, a success criterion 339 may be associated with each model 334. The success criterion 339 can be used to monitor the performance of the skill. In some embodiments, the success criterion 339 may include information associated with visual and tactile data perceived using one or more sensors, such as a camera, force / torque sensor, etc. The success criterion 339 may be linked to, for example, the visual detection of the motion of an object, the detection of forces acting on components of a robotic device (e.g., weight from an object), the detection of engagement between components of a robotic device and an object (e.g., changes in pressure or force acting on a surface), etc. An example of using tactile data for robotic learning of manipulation skills is described in the article entitled "Learning Haptic Affordances from Demonstration and Human-Guided Exploration" by Chu et al., published at the 2016 IEEE Haptics Symposium (HAPTICS), Philadelphia, PA, 2016, pp. 119-125, accessible at http://ieeexplore.ieee.org/document/7463165/, which is incorporated herein by reference. An example of using visual data for robotic learning of manipulation skills is described in the article "Simultaneously Learning Actions and Goals from Demonstration" by Akgun et al., published in Autonomous Robots, Volume 40, Issue 2, February 2016, pp. 211-227, accessible at https://doi.org/10.1007/s10514-015-9448-x ("Akgun article"), which is incorporated herein by reference.
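As an illustration only, the relationship between a model 334, its marker information 335, sensor information 336, manipulator information 337, and success criterion 339 could be sketched as the following data structure. All class and field names here are hypothetical and are not taken from this disclosure; the single-threshold success check is likewise an assumed simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ManipulatorSample:
    """One recorded sample of manipulator information (cf. 337)."""
    joint_config: Tuple[float, ...]        # joint positions (units assumed)
    end_effector_pose: Tuple[float, ...]   # end effector position/configuration
    joint_torques: Tuple[float, ...] = ()  # optional force/torque readings
    is_keyframe: bool = False              # True if captured at a keyframe


@dataclass
class SuccessCriterion:
    """Hypothetical success criterion (cf. 339): a sensed signal and a bound."""
    signal: str      # e.g. "gripper_force"
    minimum: float   # the skill counts as successful at or above this value

    def is_met(self, readings: Dict[str, float]) -> bool:
        # A missing reading is treated as a failure.
        return readings.get(self.signal, float("-inf")) >= self.minimum


@dataclass
class SkillModel:
    """Hypothetical stored record for one learned skill (cf. model 334)."""
    name: str
    marker_ids: List[str]  # cf. marker information 335
    samples: List[ManipulatorSample] = field(default_factory=list)
    marker_positions: Dict[str, Tuple[float, float, float]] = field(default_factory=dict)
    success: Optional[SuccessCriterion] = None

    def keyframes(self) -> List[ManipulatorSample]:
        """Return only the samples that were recorded at keyframes."""
        return [s for s in self.samples if s.is_keyframe]
```

A record of this kind keeps the keyframe samples and the continuously recorded samples in one list, so either recording style described above can be stored.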
[0045] 【0056】 Figure 4 schematically shows the operating element 350 according to several embodiments. The operating element 350 can form part of a robotic device, for example, robotic device 102 and / or 200. The operating element 350 can be implemented as an arm including two or more segments 352 joined together via a joint 354. The joint 354 can allow one or more degrees of freedom. For example, the joint 354 can provide translation along one or more axes and / or rotation around one or more axes. In one embodiment, the operating element 350 can have seven degrees of freedom provided by the joint 354. Although four segments 352 and four joints 354 are shown in Figure 4, those skilled in the art will understand that the operating element may include a different number of segments and / or joints.
[0046] 【0057】 The operating element 350 includes an end effector 356 that can be used to interact with objects in the environment. For example, the end effector 356 can be used to engage and / or manipulate different objects. Alternatively or additionally, the end effector 356 can be used to interact with movable or dynamic objects, including, for example, a human being. In some embodiments, the end effector 356 can be a gripper that can releasably engage or grip one or more objects. For example, the end effector 356 implemented as a gripper can pick up an object and move it from a first location (e.g., a supply storage room) to a second location (e.g., an office, room, etc.).
[0047] 【0058】Multiple sensors 353, 355, 357, and 358 can be placed on different components of the operating element 350, such as the segment 352, the joint 354, and / or the end effector 356. Sensors 353, 355, 357, and 358 can be configured to measure sensor information, including environmental information and / or operating element information. Examples of sensors include position encoders, torque and / or force sensors, touch and / or tactile sensors, image capture devices such as cameras, temperature sensors, pressure sensors, and light sensors. In some embodiments, sensor 353 located on segment 352 may be a camera configured to capture visual information about the environment. In some embodiments, sensor 353 located on segment 352 may be an accelerometer configured to measure acceleration and / or calculate the moving speed and / or position of segment 352. In some embodiments, sensor 355 located on joint 354 may be a position encoder configured to measure the position and / or configuration of joint 354. In some embodiments, the sensor 355 located on the joint 354 may be a force or torque sensor configured to measure the force and torque applied to the joint 354. In some embodiments, the sensor 358 located on the end effector 356 may be a position encoder and / or a force or torque sensor. In some embodiments, the sensor 357 located on the end effector 356 may be a touch or tactile sensor configured to measure the engagement between the end effector 356 and an object in the environment. Alternatively or additionally, one or more of the sensors 353, 355, 357, and 358 may be configured to record information about one or more objects and / or markers in the environment. For example, the sensor 358 located on the end effector 356 may be configured to track the location of an object in the environment and / or the position of an object relative to the end effector 356. In some embodiments, one or more of the sensors 353, 355, 357, and 358 may also track whether an object, such as a person, has moved in the environment. Sensors 353, 355, 357, and 358 can transmit recorded sensor information to a computing device located on the robot device (e.g., onboard control units such as control units 202 and / or 302), or sensors 353, 355, 357, and 358 can transmit sensor information to a remote computing device (e.g., a server such as server 120).
[0048] 【0059】 The operating element 350 may optionally include a coupling element 359 that allows the operating element 350 to be releasably coupled to a robotic device, such as any robotic device described herein. In some embodiments, the operating element 350 may be coupled to a fixed location on the robotic device and / or to multiple locations on the robotic device (e.g., the right or left side of the body of the robotic device, as shown in Figure 5). The coupling element 359 may include any type of mechanism that allows the operating element 350 to be coupled to a robotic device, such as a mechanical mechanism (e.g., a fastener, latch, mount), a magnetic mechanism, or a friction fit.
[0049] 【0060】 Figure 5 schematically shows a robotic device 400 according to several embodiments. The robotic device 400 includes a head 480, a body 488, and a base 486. The head 480 can be connected to the body 488 via a segment 482 and one or more joints (not shown). The segment 482 is movable and / or flexible, allowing the head 480 to move relative to the body 488.
[0050] 【0061】The head 480 includes one or more image capture devices 472 and / or other sensors 470. The image capture devices 472 and / or other sensors 470 (e.g., LiDAR sensors, motion sensors, etc.) can enable the robot device 400 to scan the environment and acquire a representation of the environment (e.g., a visual representation or other semantic representation). In some embodiments, the image capture device 472 may be a camera. In some embodiments, the image capture device 472 may be movable so that it can be used to focus on different areas of the environment around the robot device 400. The image capture devices 472 and / or other sensors 470 can collect sensor information and transmit it to a computing device or processor mounted on the robot device 400, for example, a control unit 202 or 302. In some embodiments, the head 480 of the robot device 400 may have a human-like shape and may include one or more human features, such as eyes, nose, mouth, ears, etc. In such embodiments, the image capture device 472 and / or other sensors 470 can be implemented as one or more human features. For example, the image capture device 472 can be implemented as an eye in the head 480.
[0051] 【0062】 In some embodiments, the robotic device 400 can use an image capture device 472 and / or other sensors 470 to scan the environment for information about objects in the environment, such as physical structures, devices, articles, people, etc. The robotic device 400 can engage in active scanning, or it can initiate scanning in response to a trigger (e.g., user input, event detection, or environmental change).
[0052] 【0063】In some embodiments, the robotic device 400 can engage in adaptive scanning, in which scans are performed based on stored knowledge and / or user input. For example, the robotic device 400 can identify areas of an environment to scan for objects based on prior information it has about those objects. Referring to Figure 6A, the robotic device 400 can scan a scene (e.g., an area of a room) and obtain a representation 500 of the scene. In the representation 500, the robotic device 400 identifies that a first object 550 is in area 510 and a second object 560 is in areas 510 and 530. The robotic device 400 can store the locations of objects 550 and 560 in an internally stored map of the environment, so that when it performs a future scan, it can use that information to find objects 550 and 560. For example, if the robotic device 400 returns to the scene and scans it a second time, it may obtain a different view of the scene, as shown in Figure 6B. When performing this second scan, the robotic device 400 can acquire a representation 502 of the scene. To find objects 550 and 560 in representation 502, the robotic device 400 can refer to the information about the locations of those objects that it stored when it acquired representation 500. Taking into account that its own location in the environment may have changed, the robotic device 400 can recognize that objects 550 and 560 may be in different areas of representation 502. Based on this information, the robotic device 400 can know to search area 510 for object 550, but to search areas 520 and 540 for object 560. By using previously stored information about the locations of objects 550 and 560, the robotic device 400 can automatically identify areas to scan more closely (e.g., by zooming in or by moving the camera slowly through those areas) in search of objects 550 and 560.
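The adaptive scanning described above can be sketched, under simplifying assumptions, as a lookup that maps each object's stored location onto the areas of the current view. In this minimal sketch the stored locations are axis-aligned boxes in map coordinates, the current view is a rectangle in the same coordinates, and the view is split into a 2x2 grid of areas; the function name, the box format, and the grid layout are all hypothetical and are not specified by this disclosure.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in map coordinates (assumed convention)
Box = Tuple[float, float, float, float]


def areas_to_scan(stored: Dict[str, Box], view: Box,
                  area_ids=(510, 520, 530, 540)) -> Dict[str, List[int]]:
    """Map each stored object to the view areas its last known box overlaps.

    The view is split into a 2x2 grid. area_ids are assigned in the order
    top-left, top-right, bottom-left, bottom-right (an assumed layout).
    """
    vx0, vy0, vx1, vy1 = view
    xm, ym = (vx0 + vx1) / 2.0, (vy0 + vy1) / 2.0
    quadrants = {
        area_ids[0]: (vx0, ym, xm, vy1),   # top-left
        area_ids[1]: (xm, ym, vx1, vy1),   # top-right
        area_ids[2]: (vx0, vy0, xm, ym),   # bottom-left
        area_ids[3]: (xm, vy0, vx1, ym),   # bottom-right
    }

    def overlaps(a: Box, b: Box) -> bool:
        # Strict inequality: boxes that only touch at an edge do not overlap.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    return {name: [aid for aid, q in quadrants.items() if overlaps(box, q)]
            for name, box in stored.items()}


# Example: the same stored boxes seen from a wide view and a narrower view.
stored_boxes = {"550": (1, 6, 2, 7), "560": (3, 2, 4, 8)}
first_view = areas_to_scan(stored_boxes, (0, 0, 10, 10))
second_view = areas_to_scan(stored_boxes, (0, 0, 5, 10))
```

With these example numbers, object 560 falls in areas 510 and 530 of the first view but in areas 520 and 540 of the second, which is the kind of shift the robotic device would use to decide where to scan more closely.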
[0053] 【0064】In some embodiments, the robotic device 400 may also know to scan different areas of the scene more closely based on human input. For example, a human may indicate to the robotic device 400 that a particular area of the scene contains one or more objects, and the robotic device 400 may scan those areas more closely to identify those objects. In such embodiments, the robotic device 400 may include an input / output device 440 such as a display and / or touchscreen having a keyboard or other input device, as schematically shown in Figure 5.
[0054] 【0065】 In some embodiments, the robotic device 400 can scan the environment and identify that an object, such as a human, is moving within the environment. For example, as shown in Figures 7A and 7B, object 660 may be moving within the environment while object 650 remains stationary. Figure 7A shows a scene representation 600 showing object 660 in areas 610 and 630, and Figure 7B shows a scene representation 602 showing object 660 in areas 620 and 640. In both representations 600 and 602, object 650 may remain in the same location within area 610. The robotic device 400 can identify that object 660 has moved within the scene and adjust its actions accordingly. For example, if the robotic device 400 is planned to interact with object 660, it may change its trajectory, for example, move closer to object 660, and / or change the trajectory of an operating element or other component configured to interact with object 660. Alternatively or additionally, if the robotic device 400 is planned to interact with object 650 (and / or another object in the scene), it can take into account the movement of object 660 while planning the course to interact with object 650. In some embodiments, the robotic device 400 can engage in active scanning so that its actions can be coordinated in near real-time.
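A minimal sketch of the comparison between two scene representations follows. It assumes each representation has already been reduced to a mapping from object identifier to the set of areas the object occupies; that reduction, and both function names, are assumptions made for illustration.

```python
from typing import Dict, Set


def moved_objects(before: Dict[str, Set[int]],
                  after: Dict[str, Set[int]]) -> Set[str]:
    """Return ids of objects whose occupied areas differ between two scans.

    Only objects seen in both scans are compared.
    """
    return {obj for obj in before.keys() & after.keys()
            if before[obj] != after[obj]}


def needs_replan(target: str, moved: Set[str]) -> bool:
    """Replan if the target itself moved or if any other object moved
    (a moving object may cross the planned path)."""
    return target in moved or bool(moved)


# Example following Figures 7A and 7B: 650 is stationary, 660 moves.
scan_600 = {"650": {610}, "660": {610, 630}}
scan_602 = {"650": {610}, "660": {620, 640}}
changed = moved_objects(scan_600, scan_602)
```

Comparing area sets is deliberately coarse: it cannot detect motion that stays inside one area, so a real system would likely compare estimated positions instead.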
[0055] 【0066】As schematically shown in Figure 5, the base 486 may optionally include one or more transport elements implemented as wheels 460. The wheels 460 can enable the robotic device 400 to move around an environment, for example, a hospital. The robotic device 400 also includes at least one operating element 450. The operating element 450 may be structurally and / or functionally similar to other operating elements described herein, for example, operating element 350. The operating element 450 may be fixedly attached to the body 488 of the robotic device 400, or optionally, the operating element 450 may be removably coupled to the body 488 via a coupling element (e.g., coupling element 359) that can be attached to the coupling portion 484 of the robotic device 400. The coupling portion 484 can be configured to engage with the coupling element 359 and provide an electrical connection between the operating element 450 and an onboard computing device (e.g., control unit 202 or 302), thereby enabling the onboard computing device to power and / or control the components of the operating element 450 and to receive information collected by sensors (e.g., sensors 353, 355, 357, and 358) located on the operating element 450.
[0056] 【0067】 Optionally, the robotic device 400 may also include one or more additional sensors 470 located on the segment 482, the body 488, the base 486, and / or other parts of the robotic device 400. The sensors 470 may be, for example, image-capturing devices, force or torque sensors, motion sensors, light sensors, pressure sensors, and / or temperature sensors. The sensors 470 may enable the robotic device 400 to capture visual and non-visual information about the environment.
[0057] 【0068】Figures 8 to 11 are flowcharts illustrating a method 700 that can be performed by a robotic system (e.g., robotic system 100) comprising one or more robotic devices, according to several embodiments. For example, all or part of method 700 can be performed by one robotic device, such as any robotic device described herein. Alternatively, all of method 700 can be performed sequentially by multiple robotic devices, each performing a part of method 700. Alternatively, all or part of method 700 can be performed simultaneously by multiple robotic devices.
[0058] 【0069】 As shown in Figure 8, in 702, the robotic device can scan the environment and acquire a representation of the environment. The robotic device can scan the environment using one or more sensors (e.g., sensors 270 or 470 and / or image capture device 472). In some embodiments, the robotic device can scan the environment using a movable camera, where the camera's position and / or focus can be adjusted to capture different areas of the scene. In 704, the robotic device can analyze the data collected during the scan to identify markers in the captured representation of the environment. Markers can be associated with one or more objects in the scene that are marked using visual or fiducial markers, such as QR codes, barcodes, tags, etc. Alternatively or additionally, the robotic device can identify markers associated with one or more objects in the environment via object recognition using object information (e.g., object information 340) stored on the robotic device (e.g., in storage device 330). The object information may include information indicating different characteristics of the object, such as location, color, shape, and surface features. In one embodiment, object information can be organized as numerical values representing different features of the object, which can be called a feature space.
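To illustrate the feature space mentioned above, the following sketch represents each known object as a vector of numerical feature values and matches an observation to the nearest known object. The feature choice, the Euclidean distance metric, and the `max_distance` cutoff are assumptions for illustration; this disclosure does not specify how object recognition is performed.

```python
import math
from typing import Dict, Optional, Sequence


def match_object(observed: Sequence[float],
                 known: Dict[str, Sequence[float]],
                 max_distance: float) -> Optional[str]:
    """Return the name of the closest known object in feature space,
    or None if nothing lies within max_distance of the observation."""
    best_name, best_dist = None, max_distance
    for name, features in known.items():
        d = math.dist(observed, features)  # Euclidean distance
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


# Hypothetical feature vectors: (hue, elongation, surface roughness),
# each already scaled to the range 0..1.
known_objects = {
    "supply_cart": (0.90, 0.10, 0.50),
    "iv_pole": (0.10, 0.80, 0.20),
}
```

For example, `match_object((0.85, 0.15, 0.5), known_objects, 0.3)` selects the supply cart, while an observation far from every stored vector returns `None` so that the robotic device can treat the object as unknown.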
[0059] 【0070】After identifying the marker, the robotic device may optionally represent the marker in the representation of the environment at 706. In some embodiments, the representation of the environment may be a visual representation, such as an augmented view of the environment. In such embodiments, the robotic device may, for example, display the visual representation of the environment on a display screen and display the location of the marker in the visual representation of the environment. Alternatively or additionally, the representation of the environment may be a semantic representation of the environment in which the location of the marker is represented by a semantic marker.
[0060] 【0071】 In some embodiments, in 708, the robot device can present a representation of the environment having a marker to the user and optionally prompt the user to accept or reject the marker in the representation of the environment, for example, via a user interface or other type of I / O device. If the user does not accept the marker (708: no), method 700 returns to 702, and the robot device can rescan the environment to obtain a second representation of the environment. If the user accepts the marker (708: yes), method 700 proceeds to 710, and the robot device can store information associated with the marker (e.g., location, features, etc.) in memory (e.g., storage device 330). For example, the robot device can store the location of the marker in an internal map of the environment (e.g., map 332).
[0061] 【0072】In some embodiments, in 704, the robotic device can identify a marker, and in 710, it can proceed directly to store the location of the marker and / or other information associated with the marker without prompting the user to accept the marker. In such embodiments, the robotic device can analyze the location of the marker before storing it. For example, the robotic device may have previously stored information about the marker's location (e.g., acquired during a previous scan of the environment and / or input to the robotic device by the user or a computing device), and can compare the marker's location with the previously stored information to check accuracy and / or identify changes in the marker's location. In particular, if the previously stored information indicates that a particular marker should be in a different location than the location identified by the robotic device, the robotic device may initiate an additional scan of the environment to verify the marker's location before storing it. Alternatively or additionally, the robotic device may send a notification to the user indicating that the marker's location has changed. In such a case, the robotic device may store the new location of the marker as well as a message indicating that there has been a change in the marker's location. The user or computing device can then review the message and accept the change in marker location at a later point in time.
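The comparison between a newly identified marker location and previously stored information could be sketched as follows. The 2-D coordinates, the distance tolerance, and the `MarkerUpdate` record are hypothetical; what happens after a change is flagged (an additional scan, a user notification, or both) is left to the caller.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # marker location in map coordinates (assumed 2-D)


@dataclass
class MarkerUpdate:
    marker_id: str
    location: Point
    changed: bool                  # True if the marker appears to have moved
    message: Optional[str] = None  # stored for later review when changed


def check_marker(marker_id: str, observed: Point,
                 stored_map: Dict[str, Point],
                 tolerance: float = 0.25) -> MarkerUpdate:
    """Compare an observed marker location with the stored one.

    A marker with no stored location, or one within `tolerance` of its
    stored location, is reported as unchanged.
    """
    previous = stored_map.get(marker_id)
    if previous is None or math.dist(previous, observed) <= tolerance:
        return MarkerUpdate(marker_id, observed, changed=False)
    note = f"marker {marker_id} moved from {previous} to {observed}"
    return MarkerUpdate(marker_id, observed, changed=True, message=note)
```

A caller might rescan when `changed` is true and only then write the new location and message to the map, so that the stored message can be reviewed and accepted later.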
[0062] 【0073】Optionally, method 700 can proceed to 712, in which the robotic device can prompt a user, for example via a user interface or other type of I / O device, to select a set of markers from the markers identified in a representation of the environment, as shown in Figure 9. In 714, the user can make a selection, and the robotic device can receive the selection from the user. Alternatively, in some embodiments, the robotic device can automatically select a set of markers instead of prompting a user to make a selection. The robotic device can be programmed to select markers based on certain predefined or learned rules and / or conditions. For example, the robotic device can be instructed to select markers associated with a particular type of object (e.g., supplies) at a particular time or when there is little foot traffic in the building. In the latter case, the robotic device can determine when foot traffic is low by actively moving around the building (e.g., patrolling and monitoring corridors and rooms) and scanning the environment. The robotic device can then know to select markers associated with a particular object when there is less foot traffic in the building than at most other times.
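One way to read "less foot traffic than at most other times" is as a comparison of the current hour against the other hours the robotic device has observed. The sketch below assumes the robotic device has already accumulated an average people count per hour while patrolling; the data layout, the majority test, and the function names are assumptions, not part of this disclosure.

```python
from typing import Dict, List


def is_low_traffic(hour: int, counts_by_hour: Dict[int, float]) -> bool:
    """True when `hour` sees less traffic than most other observed hours."""
    current = counts_by_hour[hour]
    others = [c for h, c in counts_by_hour.items() if h != hour]
    busier = sum(1 for c in others if c > current)
    return len(others) > 0 and busier > len(others) / 2


def select_markers(markers: Dict[str, str], wanted_type: str,
                   hour: int, counts_by_hour: Dict[int, float]) -> List[str]:
    """Select markers of one object type, but only during low-traffic hours.

    `markers` maps marker id to object type (e.g. "supplies").
    """
    if not is_low_traffic(hour, counts_by_hour):
        return []
    return sorted(m for m, kind in markers.items() if kind == wanted_type)


# Hypothetical observations: average people seen per patrol, by hour of day.
observed_traffic = {2: 1.0, 9: 30.0, 13: 40.0, 18: 25.0}
```

With these example counts, `select_markers({"m1": "supplies", "m2": "equipment"}, "supplies", 2, observed_traffic)` returns `["m1"]`, while the same call at hour 13 returns an empty list.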
[0063] 【0074】 After the robotic device receives a set of markers selected by the user and / or automatically selects a set of markers, method 700 can proceed to skill learning in 716 or skill execution in 718.
[0064] 【0075】For any specific skill, a robotic device can be taught the skill before it performs the skill. For example, to acquire a manipulation skill, a robotic device can be taught using LfD (e.g., kinesthetic teaching), in which a user or other robotic device demonstrates the skill to the robotic device. For example, an operating element of a robotic device, such as an arm, can be moved through a series of waypoints to interact with an object. In the case of kinesthetic teaching, a user can physically demonstrate the skill to the robotic device. Training or teaching can be performed, for example, in a mass production setting such as a manufacturing environment, where the robotic device can be taught using an aggregated model that represents the general performance of the skill. Alternatively or additionally, teaching can be performed in-situ after the robotic device has been deployed (e.g., in a hospital), so that the robotic device can learn to perform the skill in that specific environment. In some embodiments, the robotic device can be taught in an in-situ setting, and then the robotic device can transmit information associated with the learned skill to one or more additional robotic devices, so that those additional robotic devices also have knowledge of the taught skill when operating in the same in-situ setting. Such embodiments are useful when multiple robotic devices are deployed in one location. Each robotic device can then receive information from and transmit information to other robotic devices, thereby enabling the robotic devices to collectively learn a set of skills for their in-situ environment.
[0065] 【0076】In the learning mode shown in Figure 10, method 700 proceeds to 720-724, where the user can teach the robotic device a skill using the LfD teaching process. In one embodiment, a skill can be defined as grasping an object placed at a specific location, picking up the object, moving the object to a different location, and setting the object down at that location. In 720, the user (or another robotic device) can guide the robotic device, including the operating element (e.g., operating element 250, 350, or 450), through a movement. For example, the user can guide the operating element of the robotic device through a demonstration associated with a specific skill, e.g., human interaction, engagement with an object, and / or manipulation of an object. While guiding the operating element through the movement, the user can indicate to the robotic device when to capture information about the state of the operating element (e.g., joint configuration, joint force and / or torque, end effector configuration, end effector position) and / or the environment (e.g., the location of the object associated with the selected marker and / or other objects in the environment). For example, in 722, the robot device may receive a signal from the user to capture information about the operating element and / or the environment at a waypoint or keyframe during the movement of the operating element. In 724, in response to receiving the signal, the robot device may capture information about the operating element and / or the environment at that keyframe. Operating element information may include, for example, joint configuration, joint torque, end effector position, and / or end effector torque. Environment information may include, for example, the position of a selected marker relative to the end effector, which can indicate to the robot device when an object in the environment may have moved. If the movement is still in progress (728: no), the robot device may wait to capture information about the operating element and / or the environment at an additional keyframe. In some embodiments, the robot device may be programmed to capture keyframe information without receiving a signal from the user. For example, while an operating element is being moved by a user, the robot device can monitor changes in the segments and joints of the operating element. If these changes exceed a threshold or if there is a change in the direction of the segment or joint's trajectory, the robot device can automatically select that point as a keyframe and record information about the operating element and / or environment at that keyframe.
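The automatic keyframe selection described above can be sketched as follows, assuming the demonstration has been recorded as a sequence of joint-position samples. A sample is chosen as a keyframe when any joint has moved more than a threshold since the last keyframe, or when any joint reverses its direction of motion. The per-joint maximum, the specific reversal test, and the use of the following sample (which makes this an after-the-fact pass rather than an online one) are assumptions for illustration.

```python
from typing import List, Sequence


def select_keyframes(trajectory: Sequence[Sequence[float]],
                     threshold: float) -> List[int]:
    """Return indices of samples chosen as keyframes.

    trajectory[i][j] is the position of joint j at sample i.
    The first sample is always a keyframe.
    """
    if not trajectory:
        return []
    keyframes = [0]
    last = trajectory[0]
    for i in range(1, len(trajectory)):
        cur = trajectory[i]
        # Largest change of any joint since the last keyframe.
        moved = max(abs(c - l) for c, l in zip(cur, last))
        # Direction reversal: incoming and outgoing motion have opposite signs.
        reversed_dir = False
        if i < len(trajectory) - 1:
            prev, nxt = trajectory[i - 1], trajectory[i + 1]
            reversed_dir = any((c - p) * (n - c) < 0
                               for p, c, n in zip(prev, cur, nxt))
        if moved > threshold or reversed_dir:
            keyframes.append(i)
            last = cur
    return keyframes
```

For a single joint following `[[0.0], [0.1], [0.2], [0.1], [0.0]]` with a large threshold, only the turning point at index 2 is added to the initial keyframe at index 0.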
[0066] 【0077】 During the movement of the operating element, the robotic device in 730 may also continuously or periodically record sensor information, such as information about the operating element and / or the environment, without receiving signals from the user. For example, the robotic device may record information about the trajectories of segments and joints and their configuration as the user moves the operating element through a demonstration.
[0067] 【0078】 In some embodiments, the robotic device may include an audio device such as a microphone (e.g., 244), and the keyframe divisions can be controlled by speech commands. For example, a user may indicate to the robotic device that they plan to perform a demonstration by saying, "I will guide you." The demonstration can begin when the user indicates the first keyframe by saying, "start here." Intermediate keyframes can be indicated by saying, "go here." The final keyframe, representing the end of the demonstration, can be indicated by saying, "end here." Suitable examples of demonstration instruction are provided in the Akgun article.
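The speech-controlled keyframe sequence can be viewed as a small state machine. The sketch below uses the four phrases quoted above and assumes that speech has already been transcribed to text by some other component; the class, the state names, and the snapshot argument are hypothetical.

```python
from typing import Any, List, Tuple

# Phrases quoted in the text, mapped to hypothetical internal actions.
COMMANDS = {
    "i will guide you": "arm",
    "start here": "start",
    "go here": "intermediate",
    "end here": "end",
}


class DemonstrationRecorder:
    """Tiny state machine: idle -> armed -> recording -> done."""

    def __init__(self) -> None:
        self.state = "idle"
        self.keyframes: List[Tuple[str, Any]] = []

    def on_speech(self, phrase: str, snapshot: Any = None) -> bool:
        """Handle one recognized phrase; return True if a keyframe was stored.

        `snapshot` stands in for the operating element and environment
        information captured at that moment.
        """
        action = COMMANDS.get(phrase.strip().lower().rstrip(".!"))
        if action == "arm" and self.state == "idle":
            self.state = "armed"
            return False
        if action == "start" and self.state == "armed":
            self.state = "recording"
        elif action == "end" and self.state == "recording":
            self.state = "done"
        elif not (action == "intermediate" and self.state == "recording"):
            return False  # unknown phrase or command out of order: ignore
        self.keyframes.append((action, snapshot))
        return True
```

Commands that arrive out of order (for example "go here" before "start here") are ignored rather than raising an error, which seems a reasonable default for a spoken interface.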
[0068] 【0079】Once the movement or demonstration is complete (728: yes), the robotic device can generate a model of the demonstrated skill based on a subset of all recorded sensor information (e.g., operating element information, environmental information). For example, in 732, the robotic device may optionally prompt the user to select features related to learning the skill, for example, via a user interface or other type of I / O device, and in 734, the robotic device can receive the features selected by the user. Alternatively or additionally, the robotic device may know to select specific features to be used to generate the model based on previous commands from the user. For example, the robotic device may recognize, based on sensor information, that picking up an object is being demonstrated, and may automatically select one or more features of the sensor information (e.g., joint configuration, joint torque, end effector torque) to include as features for generating the skill model, for example, based on past demonstrations of picking up the same or different objects.
[0069] 【0080】In 736, the robotic device can generate a model of the skill using the selected features. The model can be generated using a stored machine learning library or algorithm (e.g., machine learning library 342). In some embodiments, the model can be represented as an HMM with parameters including, for example, a number of hidden states, a feature space (e.g., features contained in a feature vector), and an emission distribution for each state modeled as a normal distribution. In some embodiments, the model can be represented as a support vector machine or "SVM" model, which can include parameters such as kernel type (e.g., linear, radial, polynomial, sigmoid), cost parameter or function, weights (e.g., equal, class-balanced), loss type or function (e.g., hinge, squared hinge), and solver or problem type (e.g., dual, primal). The model can be associated with relevant sensor information and / or other sensor information recorded by the robotic device during the skill demonstration. The model can also be associated with marker information showing features associated with a set of markers manipulated during the skill demonstration and / or one or more physical objects linked to those markers. In 738, the robotic device can store the model in memory (for example, in storage device 230 or 330).
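To make the HMM parameters listed above concrete, the following sketch scores a sequence of observations under an HMM whose states each have a normal emission distribution, using the standard forward algorithm in log space. It assumes a one-dimensional feature and assumes the parameters are already known; fitting them from demonstrations (for example with Baum-Welch) is not shown. It is an illustration of the model family, not the model generation procedure of this disclosure.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def _log(p: float) -> float:
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def hmm_log_likelihood(obs: Sequence[float], start: Sequence[float],
                       trans: Sequence[Sequence[float]],
                       means: Sequence[float], stds: Sequence[float]) -> float:
    """Forward algorithm: log P(obs) under an HMM with normal emissions.

    start[s] is the initial probability of hidden state s and
    trans[p][s] is the probability of moving from state p to state s.
    """
    n = len(start)
    alpha: List[float] = [_log(start[s]) + gaussian_logpdf(obs[0], means[s], stds[s])
                          for s in range(n)]
    for x in obs[1:]:
        alpha = [_logsumexp([alpha[p] + _log(trans[p][s]) for p in range(n)])
                 + gaussian_logpdf(x, means[s], stds[s])
                 for s in range(n)]
    return _logsumexp(alpha)
```

With a two-state left-to-right model whose means are 0.0 and 1.0, a sequence that rises from 0 to 1 scores higher than the same sequence reversed, which is the kind of comparison a learned skill model could use to judge whether an execution resembles the demonstration.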
[0070] 【0081】 Optionally, at 740, the robotic device may determine whether the user will perform another demonstration of the skill. If another demonstration is to be performed (740: yes), method 700 may return to 720, and the user (or other robotic device) may guide the robotic device through the additional demonstration. If the demonstration is complete (740: no), method 700 may optionally return to the beginning and perform a new scan of the environment. Alternatively, in some embodiments, method 700 may terminate.
[0071] 【0082】In some embodiments, as described above, the robotic device can actively scan the surrounding environment to monitor environmental changes. Thus, during learning and / or execution, the robotic device can engage in continuous scanning of the environment and update the representation of the environment and stored environmental information accordingly.
[0072] 【0083】 In the execution mode shown in Figure 11, at 750, the robotic device can optionally prompt the user, for example via a user interface or other type of I / O device, to select a model, such as a model previously generated by the robotic device in learning mode. At 752, the robotic device can receive a model selection. In some embodiments, the robotic device can receive the model selection from the user, or alternatively, the robotic device can automatically select a model based on specific rules and / or conditions. For example, the robotic device can be programmed to select a model when it is in a specific area of a building (e.g., in a specific room or on a specific floor), at a specific date and time, etc. Alternatively or additionally, the robotic device may know to select a specific model based on the selected set of markers. At 754, the robotic device can decide whether to move closer to the selected markers before generating a trajectory with respect to the selected markers and performing the skill. For example, the robotic device can decide, based on the selected set of markers and the selected model, whether to move to a better position to perform the skill (e.g., closer to the markers, or facing the markers from a specific angle). The robotic device may make this decision based on sensor information recorded during the skill demonstration. For example, the robotic device may recognize that it was closer to the markers when the skill was demonstrated and adjust its position accordingly.
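As a non-limiting illustration, rule-based model selection (752) and the decision whether to move closer (754) can be sketched as follows. The dictionary keys, the function names, and the tolerance value are hypothetical and are assumed only for illustration.

```python
def select_model(models, location=None, markers=None):
    """Return the name of the first stored model whose rule matches, else None.

    models: list of dicts with a "name" and optional "location" / "markers" rules.
    """
    for model in models:
        if location is not None and model.get("location") == location:
            return model["name"]
        if markers and set(model.get("markers", ())) == set(markers):
            return model["name"]
    return None


def should_move_closer(current_distance, demonstrated_distance, tolerance=0.1):
    """Decide to reposition when the device is farther from the markers than it
    was during the demonstration, by more than an assumed tolerance (meters)."""
    return current_distance > demonstrated_distance + tolerance
```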
[0073] 【0084】If the robotic device decides to move relative to the selected markers (754: yes), it can move in 756 (e.g., adjust its location and / or orientation), and method 700 can return to 702, where the robotic device scans the environment again to obtain a representation of the environment. Method 700 can then proceed through the intervening steps to 754 again. If the robotic device decides not to move relative to the selected markers (754: no), it can generate a motion trajectory, for example, for the operating element of the robotic device.
[0074] 【0085】 In particular, in 758, the robotic device can compute a function that performs a transformation (e.g., a translation) between the selected set of markers and the set of markers associated with the selected model, which are referred to herein as “memorized markers” or the “memorized set of markers” (e.g., the markers selected when the robotic device learned the skill, i.e., when the selected model was generated). For example, the robotic device can be taught a skill using a first set of markers that are at a specific location and / or orientation relative to a part of the robotic device, such as an end effector, and later the robotic device can perform the skill using a second set of markers that are at a different location and / or orientation relative to that part of the robotic device. In such a case, the robotic device can compute a transformation function that transforms between the location and / or orientation of the first set of markers and the location and / or orientation of the second set of markers.
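As a non-limiting illustration of the computation in 758, the following sketch estimates a rigid transformation (a rotation and a translation) that maps the marker positions recorded during teaching onto the marker positions selected at execution time. It assumes, for brevity, planar (two-dimensional) marker positions with known correspondences and uses the closed-form least-squares solution; the function names are hypothetical, and the disclosure does not limit the transformation to this form.

```python
import math


def estimate_planar_transform(stored, selected):
    """Least-squares rotation angle and translation mapping stored -> selected.

    stored, selected: equal-length lists of corresponding (x, y) marker positions.
    """
    n = len(stored)
    scx = sum(p[0] for p in stored) / n
    scy = sum(p[1] for p in stored) / n
    tcx = sum(p[0] for p in selected) / n
    tcy = sum(p[1] for p in selected) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(stored, selected):
        ax, ay, bx, by = ax - scx, ay - scy, bx - tcx, by - tcy  # center both sets
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)          # best-fit rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated stored centroid onto the selected centroid.
    translation = (tcx - (c * scx - s * scy), tcy - (s * scx + c * scy))
    return theta, translation


def apply_transform(theta, translation, point):
    """Rotate a point by theta and then translate it."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])
```

The same `apply_transform` call can then be applied to a recorded end effector position at a keyframe to express it relative to the newly selected markers.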
[0075] 【0086】In 760, the robotic device can use the calculated transformation function to transform the position and orientation of a portion of the manipulator, such as the manipulator's end effector, at each keyframe recorded when the skill was taught. Optionally, in 762, the robotic device can use inverse kinematic equations or algorithms to determine the configuration of the manipulator's joints at each keyframe. The position and orientation of the end effector and of the set of markers can be provided in task space (e.g., the Cartesian space in which the robotic device operates), while the joint orientations can be provided in joint or configuration space (e.g., an n-dimensional space associated with the configuration of the manipulator, where the robotic device is represented as a point and n is the number of degrees of freedom of the manipulator). In some embodiments, the inverse kinematic calculation can be guided by joint configuration information recorded when the robotic device was taught the skill (e.g., joint configurations recorded during the teaching demonstration using the manipulator). For example, the joint configurations recorded at each keyframe can be used to seed the inverse kinematic calculation (e.g., to provide or bias its initial estimate). Additional conditions can also be imposed on the inverse kinematic calculation, such as a requirement that the calculated joint configuration not deviate from the joint configuration at an adjacent keyframe by more than a predetermined amount. In 764, the robotic device can plan the trajectory between the joint configuration at one keyframe and that at the next, for example in joint space, to generate the complete trajectory for the manipulator to perform the skill.
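As a non-limiting illustration of 762 and 764, the following sketch solves the inverse kinematics of a two-link planar arm, uses the joint configuration recorded at the keyframe as a seed to choose between the two analytic solutions, and then interpolates linearly in joint space between keyframes. The two-link arm, the function names, and the linear interpolation are assumptions made for brevity and are not taken from the disclosure; angle wrap-around is ignored when comparing a solution with the seed.

```python
import math


def two_link_ik(x, y, l1, l2, seed):
    """Return the (shoulder, elbow) angles nearest the seed, or None if unreachable."""
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_elbow) > 1.0 + 1e-9:
        return None                                   # target outside the workspace
    cos_elbow = max(-1.0, min(1.0, cos_elbow))        # absorb rounding error
    candidates = []
    for elbow in (math.acos(cos_elbow), -math.acos(cos_elbow)):
        shoulder = math.atan2(y, x) - math.atan2(
            l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
        candidates.append((shoulder, elbow))
    # Seeding: prefer the solution closest to the recorded joint configuration.
    return min(candidates, key=lambda q: sum((a - b) ** 2 for a, b in zip(q, seed)))


def forward(q, l1, l2):
    """Forward kinematics of the same arm, used here to check a solution."""
    return (l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1]),
            l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1]))


def interpolate(q_start, q_end, steps):
    """Joint-space waypoints from one keyframe configuration to the next."""
    return [tuple(a + (b - a) * k / steps for a, b in zip(q_start, q_end))
            for k in range(steps + 1)]
```

A limit on the deviation from an adjacent keyframe could be added by rejecting any candidate whose distance to that keyframe's configuration exceeds a chosen bound.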
[0076] 【0087】 In some embodiments, the robotic device can plan the trajectory of the operating element in task space after transforming the position and orientation of the portion of the operating element (e.g., the end effector). In such embodiments, method 700 can proceed directly from 760 to 764.
[0077] 【0088】In 766 and 768, the robotic device may optionally present the trajectory to the user and prompt the user to accept or reject it, for example, via a user interface or other I / O device. Alternatively, the robotic device may accept or reject the trajectory by analyzing relevant sensor information based on internal rules and / or conditions. If the trajectory is rejected (768: No), in 770, the robotic device may optionally modify one or more parameters of the selected model and generate a second trajectory in 758-764. The model parameters can be modified, for example, by selecting different features (e.g., different sensor information) to include in the model generation. In some embodiments, if the model is an HMM model, the robotic device may modify the model parameters based on determined success or failure, tracking the log-likelihoods of different models with different parameters and selecting the model with the highest log-likelihood. In some embodiments, if the model is an SVM model, the robotic device may modify the parameters by changing the feature space or the configuration parameters described above (e.g., kernel type, cost parameter or function, weights).
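As a non-limiting illustration of the loop through 766, 768, 770, and 758-764, the following sketch tries candidate models in order of decreasing log-likelihood until a generated trajectory is accepted. The function name, the `log_likelihood` key, and the two callbacks are hypothetical placeholders for the trajectory generation and the acceptance decision described above.

```python
def plan_until_accepted(candidates, generate_trajectory, accept):
    """Return (candidate, trajectory) for the first accepted trajectory.

    candidates: list of dicts, each holding model parameters and a
    "log_likelihood" score; higher scores are tried first.
    Returns (None, None) when every trajectory is rejected.
    """
    ranked = sorted(candidates, key=lambda c: c["log_likelihood"], reverse=True)
    for candidate in ranked:
        trajectory = generate_trajectory(candidate)   # corresponds to 758-764
        if accept(trajectory):                        # corresponds to 766/768
            return candidate, trajectory
    return None, None
```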
[0078] 【0089】 If the trajectory is accepted (768: yes), in 772, the robotic device may move the manipulator to execute the generated trajectory. While the manipulator is executing the planned trajectory, in 774, the robotic device may record and / or store sensor information, such as information about the manipulator and / or the environment, via one or more sensors on the manipulator and other components of the robotic device.
[0079] 【0090】Optionally, in 776, the robotic device may determine whether the execution of the skill was successful (e.g., whether the interaction with an object satisfies predefined success criteria). For example, the robotic device may scan the environment, determine the current state of the environment and of the robotic device, including, for example, the location of one or more objects and / or the position or orientation of those objects relative to the operating element or another component of the robotic device, and determine whether the current state matches the predefined success criteria. Predefined and / or learned success criteria may be provided by the user or, in some embodiments, by a different robotic device and / or computing device. Predefined and / or learned success criteria may indicate information about different characteristics of the environment and / or the robotic device that are associated with success. In some embodiments, the user may also provide the robotic device with input indicating that the execution was successful.
[0080] 【0091】 In a particular example, if the skill is defined as grasping and picking up an object at a specific location, the success of the skill may be defined as detecting that one or more markers associated with the object are in a specific relationship with each other and / or with the robotic device, or that the end effector or joint (e.g., wrist joint) of the operating element is receiving (or has received) sufficient force or torque to support the weight of the object and thus pick up the object. If the execution is unsuccessful (776: no), in 770, the robotic device may optionally change the parameters of the model and / or generate a new trajectory in 758-764. If the execution is successful (776: yes), data associated with the successful interaction (e.g., data indicating that the execution was successful and how successful) may be recorded, and method 700 may optionally return to the beginning and perform a new scan of the environment. Alternatively, in some embodiments, method 700 may terminate.
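As a non-limiting illustration of the success determination for a pickup skill, the following sketch checks that a marker on the object is close to the end effector and that the measured force is sufficient to support the weight of the object. The function name, the distance threshold, and the force margin are hypothetical values chosen for illustration; this sketch requires both conditions, whereas the description above also permits either condition alone.

```python
import math

GRAVITY = 9.81  # m/s^2


def pickup_succeeded(marker_pos, effector_pos, measured_force_n, object_mass_kg,
                     max_distance_m=0.05, margin=0.9):
    """Return True when the marker is near the end effector and the measured
    force is at least an assumed fraction of the object's weight."""
    near = math.dist(marker_pos, effector_pos) <= max_distance_m
    supported = measured_force_n >= margin * object_mass_kg * GRAVITY
    return near and supported
```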
[0081] 【0092】 Figure 12 is a block diagram showing a system architecture for robot learning and execution, including user-performed actions, according to several embodiments. System 800 can be configured for robot learning and execution. System 800 may include one or more robotic devices, such as any robotic devices described herein, and may perform modules, processes, and / or functions shown in Figure 12 as active scan 822, marker identification 824, learning and model generation 826, trajectory generation and execution 828, and success monitoring 829. Active scan 822, marker identification 824, learning and model generation 826, trajectory generation and execution 828, and success monitoring 829 may correspond to one or more steps performed by the robotic devices described with reference to Method 700, shown in Figures 8 to 11. For example, the active scan 822 may include step 702 of method 700, the marker identification 824 may include one or more of steps 704-710 of method 700, the learning and model generation 826 may include one or more of steps 712-738, the trajectory generation and execution 828 may include one or more of steps 712, 714, 718, and 750-774, and the success monitoring 829 may include one or more of steps 774 and 776.
[0082] 【0093】System 800 can connect to (communicate with) one or more devices, including, for example, a camera 872, an arm 850 (including a gripper 856 and a sensor 870), a display device 842, and a microphone 844. System 800 can receive inputs from the user associated with one or more user actions 880 via the display device 842, the microphone 844, and / or other I / O devices (not shown). User actions 880 may include, for example, 882: the user accepting markers or requesting a rescan of the environment, 884: the user selecting markers, 886: the user selecting information relevant to model generation, 888: the user selecting a model with which to perform a skill, 890: the user accepting a trajectory for performing a skill, 892: the user confirming the success of a performed skill, and 894: the user teaching a skill via sensorimotor learning.
[0083] 【0094】In the case of active scan 822, the system 800 scans the environment using camera 872 and records sensor information about the environment, including information associated with one or more markers in the environment. In the case of marker identification 824, the system 800 analyzes the sensor information to identify one or more markers in the environment and can receive input from the user, for example via display device 842, indicating 882: that the user has accepted the markers or requested a rescan of the environment. In the case of learning and model generation 826, the system 800 receives sensor information collected by camera 872 and / or sensor 870 on arm 850 and can use that information to generate a model of the skill. As part of learning and model generation 826, the system 800 can receive input from the user, for example via display device 842 and / or microphone 844, indicating 884: that the user has selected a set of markers for teaching the skill, 886: that the user has selected specific features of the recorded sensor information to use in generating the model, and / or 894: that the user is demonstrating the skill. In the case of trajectory generation and execution 828, the system 800 can generate a planned trajectory and execute the trajectory by controlling the movement of the arm 850. As part of trajectory generation and execution 828, the system 800 can receive input from the user, for example via the display device 842, indicating 888: that the user has selected a model for trajectory generation and / or 890: that the user has accepted or rejected the generated trajectory. In the case of success monitoring 829, the system 800 can determine whether the skill execution was successful by analyzing sensor information recorded by the sensor 870 during the skill execution. As part of success monitoring 829, the system 800 can receive input from the user, for example via the display device 842 and / or microphone 844, indicating 892: that the user has confirmed the success of the execution.
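As a non-limiting illustration, the relationship between the modules of system 800 and the user actions 880 that each module can receive may be summarized in a lookup table. The reference numerals are those of Figure 12; the dictionary names and the helper function are hypothetical.

```python
# User actions 880, keyed by reference numeral.
USER_ACTIONS = {
    882: "accept markers or request rescan",
    884: "select markers",
    886: "select features for model generation",
    888: "select model",
    890: "accept or reject trajectory",
    892: "confirm success",
    894: "demonstrate skill",
}

# Modules of system 800 and the user actions each one can receive as input.
MODULE_INPUTS = {
    822: (),                 # active scan
    824: (882,),             # marker identification
    826: (884, 886, 894),    # learning and model generation
    828: (888, 890),         # trajectory generation and execution
    829: (892,),             # success monitoring
}


def modules_for_action(action):
    """Return the module numerals that consume the given user action."""
    return [module for module, actions in MODULE_INPUTS.items() if action in actions]
```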
[0084] 【0095】While specific devices and / or connections between system 800 and those devices are shown in Figure 12, it will be understood that in any embodiment described herein, additional devices (not shown) can communicate with system 800 to receive information from and / or transmit information to system 800.
[0085] 【0096】 It should be understood that this disclosure may include any one and up to all of the following examples.
[0086] 【0097】 Example 1: A device comprising a memory, a processor, an operating element, and a set of sensors, wherein the processor is operably coupled to the memory, the operating element, and the set of sensors, and is configured to: acquire, via a subset of sensors from the set of sensors, a representation of an environment; identify a plurality of markers in the representation of the environment, each marker from the plurality of markers being associated with a physical object from a plurality of physical objects placed in the environment; present information indicating the position of each marker from the plurality of markers in the representation of the environment; receive a selection of a set of markers from the plurality of markers, the set of markers being associated with a set of physical objects from the plurality of physical objects; acquire sensor information associated with the operating element for each position from a plurality of positions associated with a movement of the operating element in the environment, the movement of the operating element being associated with a physical interaction between the operating element and the set of physical objects; and generate, based on the sensor information, a model configured to define movements of the operating element to perform the physical interaction between the operating element and the set of physical objects.
[0087] 【0098】 Example 2: The apparatus of Example 1, wherein the set of physical objects includes a human being.
[0088] 【0099】 Example 3: The apparatus of any of Examples 1-2, wherein the operating element includes an end effector configured to engage with a subset of physical objects from a set of physical objects.
[0089] 【0100】Example 4: The device of any one of Examples 1 to 3, wherein the subset of sensors is a first subset of sensors, and the processor is configured to acquire the sensor information via a second subset of sensors from the set of sensors, the second subset of sensors being different from the first subset of sensors.
[0090] 【0101】 Example 5: An apparatus from any one of Examples 1 to 4, wherein the operating element includes a plurality of movable components connected via a plurality of joints, and a set of sensors includes at least one sensor configured to measure the force acting on the joints from the plurality of joints, or a sensor configured to detect the engagement between the movable components from the plurality of movable components and a physical object from a set of physical objects.
[0091] 【0102】 Example 6: The apparatus of Example 5, wherein the set of sensors further includes a sensor configured to measure the position of the joint or the movable component relative to a part of the apparatus.
[0092] 【0103】 Example 7: The apparatus of Example 5, wherein the sensor set further includes at least one of a light sensor, a temperature sensor, an audio capture device, and a camera.
[0093] 【0104】 Example 8: The apparatus of Example 1, wherein the operating element includes (i) a plurality of joints and (ii) an end effector configured to move a physical object from the set of physical objects, and the set of sensors includes a sensor configured to measure, when the end effector moves the physical object, the force applied to at least one of the end effector or a joint from the plurality of joints that is coupled to the end effector.
[0094] 【0105】Example 9: The apparatus of any one of Examples 1 to 8, wherein the sensor information includes sensor data associated with a set of features, the processor is further configured to receive a selection of a first subset of features from the set of features, and the processor is configured to generate the model based on the sensor data associated with the first subset of features and not on sensor data associated with a second subset of features from the set of features that is not included in the first subset of features.
[0095] 【0106】 Example 10: The apparatus of Example 9, wherein the processor is further configured to prompt the user to select at least one feature from a set of features so that the processor receives a first subset of selected features in response to a selection made by the user.
[0096] 【0107】 Example 11: A device from any one of Examples 1 to 10, in which multiple markers are alignment markers and the representation of the environment is a visual representation of the environment.
[0097] 【0108】 Example 12: The device of any one of Examples 1 to 11, wherein the processor is further configured to store the model and information associated with the model in the memory, the information associated with the model including (i) the set of markers and (ii) the sensor information.
[0098] 【0109】 Example 13: The apparatus of Example 1, wherein the operating element includes multiple joints, and the sensor information includes information indicating the current state of each joint from the multiple joints for each position from a plurality of positions associated with the movement of the operating element.
[0099] 【0110】 Example 14: The device of any one of Examples 1 to 13, wherein the processor is further configured to prompt the user to select at least one marker from the plurality of markers so that the processor receives the selection of the set of markers in response to a selection made by the user.
[0100] 【0111】Example 15: Any one of the devices from Examples 1 to 14, wherein the processor is configured to acquire a representation of the environment by scanning a target area in the environment using a set of sensors.
[0101] 【0112】 Example 16: A method comprising: acquiring a representation of an environment via a set of sensors; identifying a plurality of markers in the representation of the environment, wherein each marker from the plurality of markers is associated with a physical object from a plurality of physical objects placed in the environment; presenting information indicating the position of each marker from the plurality of markers in the representation of the environment; receiving, after presentation, a set of markers selected from a plurality of markers associated with a set of physical objects from a plurality of physical objects; acquiring sensor information associated with an operating element for each position from a plurality of positions associated with the movement of the operating element in the environment, wherein the movement of the operating element is associated with a physical interaction between the operating element and a set of physical objects; and generating a model configured to define the movement of the operating element to perform a physical interaction between the operating element and a set of physical objects based on the sensor information.
[0102] 【0113】 Example 17: The method of Example 16, wherein the sensor information includes sensor data associated with a set of features, the method further includes receiving a selection of a first subset of features from the set of features, and the generating includes generating the model based on the sensor data associated with the first subset of features and not on sensor data associated with a second subset of features from the set of features that is not included in the first subset of features.
[0103] 【0114】 Example 18: The method of Example 17, further comprising prompting the user to select at least one feature from a set of features so that, in response to a selection made by the user, a first subset of the selected features is received.
[0104] 【0115】Example 19: The method of any one of Examples 16-18, wherein the plurality of markers are alignment markers and the representation of the environment is a visual representation of the environment.
[0105] 【0116】 Example 20: The method of any one of Examples 16 to 19, wherein the operating element includes a plurality of joints, and the sensor information includes information indicating the current state of each joint from the plurality of joints for each position from the plurality of positions associated with the movement of the operating element.
[0106] 【0117】 Example 21: The method of any one of Examples 16-20, further comprising prompting the user, after the presenting, to select at least one marker from the plurality of markers so that the selection of the set of markers is received in response to a selection made by the user.
[0107] 【0118】 Example 22: The method of any one of Examples 16-21, wherein acquiring the representation of the environment includes scanning a target area in the environment using the set of sensors.
[0108] 【0119】 Example 23: A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code including code that causes the processor to: acquire a representation of an environment via a set of sensors; identify a plurality of markers in the representation of the environment, each marker from the plurality of markers being associated with a physical object from a plurality of physical objects placed in the environment; present information indicating the position of each marker from the plurality of markers in the representation of the environment; in response to receiving a selection of a set of markers from the plurality of markers, the set of markers being associated with a set of physical objects from the plurality of physical objects, identify a model associated with the execution of a physical interaction between an operating element and the set of physical objects, the operating element including a plurality of joints and an end effector; and use the model to generate a trajectory of the operating element that defines the movements of the plurality of joints and the end effector associated with the execution of the physical interaction.
[0109] 【0120】 Example 24: The non-transitory processor-readable medium of Example 23, wherein the code that causes the processor to identify the model associated with the execution of the physical interaction includes code that causes the processor to prompt the user to identify the model.
[0110] 【0121】 Example 25: The non-transitory processor-readable medium of any one of Examples 23-24, wherein the code that causes the processor to identify the model associated with the execution of the physical interaction includes code that causes the processor to identify the model based on the selection of the set of markers.
[0111] 【0122】 Example 26: The non-transitory processor-readable medium of any one of Examples 23-25, further comprising code that causes the processor to display to the user the trajectory of the operating element in the representation of the environment, receive input from the user after the display, and, in response to the input indicating acceptance of the trajectory of the operating element, perform the physical interaction by moving the plurality of joints and the end effector.
[0112] 【0123】 Example 27: The non-transitory processor-readable medium of any one of Examples 23-26, wherein the trajectory is a first trajectory, and the non-transitory processor-readable medium further includes code that causes the processor, in response to an input that does not indicate acceptance of the trajectory of the operating element, to modify a set of parameters associated with the model to generate a modified model, and to generate a second trajectory of the operating element using the modified model.
[0113] 【0124】 Example 28: The non-transitory processor-readable medium of any one of Examples 23 to 26, wherein the trajectory is a first trajectory, the model is a first model, and the non-transitory processor-readable medium further includes code that causes the processor, in response to an input that does not indicate acceptance of the trajectory of the operating element, to generate a second model based on sensor data associated with a set of features different from the set of features used to generate the first model, and to generate a second trajectory of the operating element using the second model.
[0114] 【0125】Example 29: The non-transitory processor-readable medium of any one of Examples 23-28, further comprising code that causes the processor to perform the physical interaction by moving the plurality of joints and the end effector, to acquire sensor information associated with the performance of the physical interaction, and to determine, based on the sensor information, whether the performance of the physical interaction satisfies predefined and / or learned success criteria.
[0115] 【0126】 Example 30: The non-transitory processor-readable medium of Example 29, further comprising code that causes the processor, in response to a determination that the execution of the physical interaction satisfies the predefined and / or learned success criteria, to generate a signal indicating that the physical interaction was successful, and, in response to a determination that the execution of the physical interaction does not satisfy the predefined and / or learned success criteria, to modify the model based on the sensor information to generate a modified model and to generate a second trajectory of the operating element using the modified model.
[0116] 【0127】 Example 31: The non-transitory processor-readable medium of any one of Examples 23 to 30, wherein the model is associated with (i) a stored set of markers, (ii) sensor information indicating at least one of the position or orientation of the operating element at points along a stored trajectory of the operating element associated with the stored set of markers, and (iii) sensor information indicating the configuration of the plurality of joints at the points along the stored trajectory, and the code that causes the processor to generate the trajectory of the operating element includes code that causes the processor to: calculate a transformation function between the set of markers and the stored set of markers; for each point, transform at least one of the position or orientation of the operating element using the transformation function; for each point, determine a planned configuration of the plurality of joints based on the configuration of the plurality of joints at that point along the stored trajectory; and for each point, determine a portion of the trajectory between that point and a consecutive point based on the planned configuration of the plurality of joints at that point.
[0117] 【0128】Example 32: The non-transitory processor-readable medium of any one of Examples 23 to 30, wherein the model is associated with (i) a stored set of markers and (ii) sensor information indicating at least one of the position or orientation of the operating element at points along a stored trajectory of the operating element associated with the stored set of markers, and the code that causes the processor to generate the trajectory of the operating element includes code that causes the processor to: calculate a transformation function between the set of markers and the stored set of markers; for each point, transform at least one of the position or orientation of the operating element using the transformation function; for each point, determine a planned configuration of the plurality of joints; and for each point, determine a portion of the trajectory between that point and a consecutive point based on the planned configuration of the plurality of joints at that point.
[0118] 【0129】 Example 33: The non-transitory processor-readable medium of any one of Examples 23 to 32, further including code that causes the processor to determine whether to change the location of the operating element based on the distance between a first location of the operating element and the location of a physical object from the set of physical objects, and, in response to a determination to change the location of the operating element, to move the operating element from the first location to a second location closer to the location of the physical object, wherein the code that causes the processor to generate the trajectory of the operating element includes code that causes the processor, after the move, to generate the trajectory based on the location of the physical object relative to the second location of the operating element.
[0119] 【0130】While various embodiments of the present invention have been described and illustrated herein, those skilled in the art will readily envision various other means and / or structures for performing the functions described herein and / or obtaining one or more of the results and / or benefits described herein, and each such variation and / or modification is deemed to be within the scope of the embodiments of the present invention described herein. More generally, all parameters, dimensions, materials and configurations described herein are intended to be illustrative, and those skilled in the art will readily understand that the actual parameters, dimensions, materials and / or configurations will depend on the specific application or applications for which the teachings of the present invention are used. Those skilled in the art will be able to recognize or determine, using no more than routine experimentation, many equivalents to the particular embodiments of the present invention described herein. Accordingly, it should be understood that the above embodiments are presented merely as examples, and that, within the scope of the appended claims and their equivalents, embodiments of the present invention may be practiced otherwise than as specifically described and claimed. Embodiments of the present invention in this disclosure are directed to each individual feature, system, article, material, kit and / or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and / or methods is included within the scope of the present invention as long as such features, systems, articles, materials, kits, and / or methods are not mutually inconsistent.
[0120] 【0131】 Furthermore, various concepts of the present invention can be embodied as one or more methods, of which an example has been provided. The operations performed as part of the method can be ordered in any suitable way. Thus, embodiments can be constructed in which the operations are performed in an order different from that shown, which may include performing some operations simultaneously, even though they are shown as sequential operations in the exemplary embodiments.
[0121] 【0132】Some embodiments and / or methods described herein can be implemented by software (executed on hardware), hardware, or a combination thereof. Hardware modules may include, for example, general-purpose processors, field-programmable gate arrays (FPGAs), and / or application-specific integrated circuits (ASICs). Software modules (executed on hardware) can be expressed in a variety of software languages (e.g., computer code), including C, C++, Java®, Ruby, Visual Basic®, and / or other object-oriented, procedural, or other programming languages and development tools. Examples of computer code include, but are not limited to, microcode or microinstructions, machine instructions such as those generated by a compiler, code used to generate web services, and files containing higher-level instructions executed by a computer using an interpreter. For example, embodiments may be implemented using imperative programming languages (e.g., C, Fortran, etc.), functional programming languages (e.g., Haskell, Erlang, etc.), logic programming languages (e.g., Prolog), object-oriented programming languages (e.g., Java, C++, etc.), or other suitable programming languages and / or development tools. Further examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
[0122] 【0133】 While terms such as "first," "second," and "third" may be used in this specification to describe various elements, it will be understood that these elements should not be limited by these terms. These terms are used solely to distinguish one element from another. Accordingly, a first element described below could be referred to as a second element without departing from the teachings of this disclosure.
[0123] 【0134】 The terms used herein are intended solely to describe specific embodiments and are not intended to be limiting. Where used herein, the singular forms "a," "an," and "the" are intended to also include the plural forms unless the context clearly indicates otherwise. Where used herein, the terms "comprises" and/or "comprising" or "includes" and/or "including" specify the presence of the described features, regions, integers, steps, actions, elements, and/or components, but do not exclude the presence or addition of one or more other features, regions, integers, steps, actions, elements, components, and/or groups thereof.
Claims
[Claim 1] A robotic device comprising: a memory; a processor; an operating element; at least one first sensor; and at least one second sensor, wherein the processor is operably coupled to the memory, the operating element, the at least one first sensor, and the at least one second sensor, and the processor is configured to: obtain, via the at least one first sensor, a visual representation of an environment in which a plurality of physical objects are arranged; identify a plurality of markers in the visual representation of the environment by using recognition of a visual code marked on one of the plurality of physical objects or by using object recognition based on object information, wherein each of the plurality of markers is associated with a respective one of the plurality of physical objects placed in the environment; present to a user, via a user interface, information indicating the position of each of the plurality of markers in the visual representation of the environment; receive from the user, via the user interface, a selection of a set of markers from among the plurality of markers, the set of markers being associated with a set of physical objects from among the plurality of physical objects; acquire, via the at least one second sensor, sensor information at each of a set of keyframes during a demonstration of a skill in which the user moves the operating element, the skill including physical interaction between the operating element and the set of physical objects, wherein the sensor information includes sensor data associated with a set of features indicating the state of the operating element, and each of the set of keyframes indicates a specific point that is automatically selected when a change in a joint of the operating element exceeds a threshold or when there is a change in the direction of the trajectory of the joint of the operating element during the demonstration of the skill; present the set of features to the user via the user interface so that the user can select a subset of the set of features relevant to learning the skill; generate a model of the skill using the sensor data associated with the subset of features selected by the user; and associate, with the model, marker information indicating the set of markers and the sensor information.
[Claim 2] The robotic device according to claim 1, wherein the operating element includes an end effector configured to engage with a subset of at least one physical object from the set of physical objects.
[Claim 3] The robotic device according to claim 1, wherein the at least one second sensor is different from the at least one first sensor.
[Claim 4] The robotic device according to claim 1, wherein the operating element includes a plurality of movable components connected via a plurality of joints, the at least one second sensor includes at least one of a force sensor configured to measure the force acting on one of the plurality of joints, or an engagement sensor configured to detect engagement between one of the plurality of movable components and one of the set of physical objects, and the processor is configured to generate the model for performing the skill using at least sensor data indicating the force measured by the force sensor or the engagement detected by the engagement sensor.
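The keyframe criterion recited in claim 1 can be illustrated with a short sketch. It is not the claimed implementation: the function name `select_keyframes`, the representation of a demonstration as a list of joint-value tuples, the choice to always keep the first sample, and the measurement of change relative to the last keyframe are all assumptions made for illustration. A sample is marked as a keyframe when any joint has changed by more than the threshold since the last keyframe, or when any joint reverses its direction of travel at that sample.

```python
from typing import List, Sequence


def select_keyframes(samples: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return the indices of the samples chosen as keyframes.

    samples:   one sequence of joint values per time step (hypothetical format;
               the source does not specify how joint readings are stored).
    threshold: assumed minimum per-joint change, measured from the last keyframe.
    """
    if not samples:
        return []
    keyframes = [0]  # assumption: the first sample always opens the demonstration
    last_kf = samples[0]
    for i in range(1, len(samples)):
        cur = samples[i]
        # Criterion 1: some joint changed by more than the threshold.
        moved = any(abs(c - k) > threshold for c, k in zip(cur, last_kf))
        # Criterion 2: some joint reverses direction at this sample
        # (sign of the incoming step differs from the outgoing step).
        turned = False
        if i + 1 < len(samples):
            prev, nxt = samples[i - 1], samples[i + 1]
            turned = any((c - p) * (n - c) < 0 for p, c, n in zip(prev, cur, nxt))
        if moved or turned:
            keyframes.append(i)
            last_kf = cur
    return keyframes
```

For a single joint sampled as 0.0, 0.05, 0.2, 0.25, 0.1 with a threshold of 0.1, the sketch keeps the first sample, the sample at 0.2 (threshold exceeded), the sample at 0.25 (direction reverses), and the final sample (threshold exceeded again).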
[Claim 5] The robotic device according to claim 1, wherein the operating element includes (i) a plurality of joints and (ii) an end effector configured to move a physical object from the set of physical objects, the at least one second sensor includes a force sensor configured to measure the force applied to the end effector, or to at least one of the plurality of joints coupled to the end effector, when the end effector moves the physical object, and the processor is configured to generate the model for performing the skill using at least sensor data indicating the force measured by the force sensor.
[Claim 6] The robotic device according to claim 1, wherein the processor is configured to: select a Hidden Markov Model (HMM) algorithm or a Support Vector Machine (SVM) algorithm from a machine learning library stored in the memory; and generate the model for performing the skill by setting one or more parameters of the HMM or SVM algorithm using the sensor data associated with the subset of features selected by the user.
[Claim 7] The robotic device according to claim 1, wherein the processor is further configured to, after generating the model for performing the skill: obtain a second visual representation of the environment via the at least one first sensor; identify a plurality of second markers in the second visual representation of the environment by using the recognition of the visual code or the object recognition based on the object information, wherein each of the plurality of second markers is associated with a respective one of the plurality of physical objects placed in the environment; and generate, using the model and a set of markers from among the plurality of second markers, a trajectory of the operating element for performing the skill using the operating element.
[Claim 8] The robotic device according to claim 7, wherein the processor is further configured to: present the trajectory to the user via the user interface so that the user can accept or reject the trajectory; and optionally change one or more parameters of the model in response to receiving input from the user indicating rejection of the trajectory.
[Claim 9] The robotic device according to claim 7, wherein the processor is further configured to: perform the skill using the operating element; acquire data during the performance of the skill using the at least one first sensor and the at least one second sensor; receive input from the user, via the user interface, indicating whether the execution of the skill was successful; store in the memory, in response to the input indicating that the execution of the skill was successful, the data associated with the successful execution of the skill; and optionally change one or more parameters of the model in response to an input indicating that the execution of the skill was unsuccessful.
[Claim 10] A non-transitory processor-readable medium storing code representing instructions to be executed by a processor of a robotic device, the code comprising code to cause the processor to: acquire, via a set of sensors, a visual representation of an environment in which a plurality of physical objects are arranged; identify a plurality of markers in the visual representation of the environment by using recognition of a visual code marked on one of the plurality of physical objects or by using object recognition based on object information, wherein each of the plurality of markers is associated with a respective one of the plurality of physical objects placed in the environment; present to a user, via a user interface of the robotic device, information indicating the position of each of the plurality of markers in the visual representation of the environment, so that the user can select a set of markers from among the plurality of markers, the set of markers being associated with a set of physical objects from among the plurality of physical objects; identify, in response to receiving the selection of the set of markers, a model associated with the execution of a physical interaction between an operating element of the robotic device and the set of physical objects, wherein the operating element includes a plurality of joints and an end effector; generate, using the model, a trajectory that includes movement of the plurality of joints and the end effector associated with the execution of the physical interaction; present the trajectory to the user via the user interface so that the user can accept or reject the trajectory; perform, in response to receiving input from the user indicating acceptance of the trajectory, the movement of the plurality of joints and the end effector to execute the physical interaction; and, in response to receiving input from the user indicating rejection of the trajectory, modify one or more parameters of the model to generate a modified model and generate a second trajectory using the modified model, wherein the model is associated with (i) a stored set of markers, (ii) sensor information indicating at least one of the position or orientation of the operating element at a point along a stored trajectory of the operating element associated with the stored set of markers, and (iii) sensor information indicating the relative positions of the plurality of joints at the point along the stored trajectory, and wherein the point is automatically selected when a change in one of the plurality of joints exceeds a threshold or when there is a change in the direction of the trajectory of one of the plurality of joints.
[Claim 11] The non-transitory processor-readable medium according to claim 10, further comprising code to cause the processor to, in response to receiving the input from the user indicating acceptance of the trajectory: acquire, via the set of sensors, sensor information associated with the execution of the physical interaction; and determine, based on the sensor information, whether the execution of the physical interaction satisfies a success criterion.
[Claim 12] The non-transitory processor-readable medium according to claim 11, further comprising code to cause the processor to: generate, in response to a determination that the execution of the physical interaction satisfies the success criterion, a signal indicating that the physical interaction was successful; and, in response to a determination that the execution of the physical interaction does not satisfy the success criterion, modify the model based on the sensor information to generate a modified model and generate a second trajectory of the operating element using the modified model.
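The accept-or-reject cycle recited in claims 8 and 10 can be sketched as a small loop. This is an illustration under stated assumptions, not the claimed implementation: the names `plan_with_review`, `generate`, `review`, and `adjust` are hypothetical, a model is represented as a plain dictionary of parameters, a trajectory as a list of numbers, and the cap on the number of attempts (`max_attempts`) does not appear in the claims.

```python
from typing import Callable, Dict, List, Optional

Model = Dict[str, float]   # hypothetical: named model parameters
Trajectory = List[float]   # hypothetical: a flat list of waypoint values


def plan_with_review(
    model: Model,
    generate: Callable[[Model], Trajectory],
    review: Callable[[Trajectory], bool],
    adjust: Callable[[Model], Model],
    max_attempts: int = 3,
) -> Optional[Trajectory]:
    """Generate a trajectory, show it for review, and retry with a modified model.

    generate: builds a trajectory from the current model.
    review:   stands in for the user interface; True means the user accepts.
    adjust:   returns a modified model after a rejection.
    Returns the accepted trajectory, or None if every attempt was rejected.
    """
    for _ in range(max_attempts):
        trajectory = generate(model)
        if review(trajectory):
            return trajectory      # accepted: the caller would now execute it
        model = adjust(model)      # rejected: modify parameters and try again
    return None
```

In use, `review` would be backed by the user interface and `adjust` by whatever parameter update the model supports; both are left as callables here so the loop itself stays independent of those choices.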
[Claim 13] The non-transitory processor-readable medium according to claim 10, wherein the code to cause the processor to generate the trajectory includes code to cause the processor to: calculate a transformation function between the set of markers selected by the user and the stored set of markers; transform, for each point, at least one of the position or orientation of the operating element using the transformation function; determine, for each point, planned relative positions of the plurality of joints based on the relative positions of the plurality of joints at the point along the stored trajectory; and identify, for each point, the portion of the trajectory of the operating element between the point and the next point, based on the planned relative positions of the plurality of joints for that point.
[Claim 14] The non-transitory processor-readable medium according to claim 10, further comprising code to cause the processor to: determine, in response to receiving the selection of the set of markers, whether to move the robotic device with respect to the set of markers selected by the user; and, in response to a determination to move the robotic device with respect to the set of markers, move the robotic device using a transport element of the robotic device to adjust the position or orientation of the operating element, and obtain a second visual representation of the environment before generating the trajectory.
[Claim 15] The robotic device according to claim 1, wherein the processor is further configured to: monitor changes in the operating element while the user moves the operating element through the demonstration of the skill; and select, in response to a change in direction of the operating element, a point associated with the change in direction as a keyframe.
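The transformation between the stored set of markers and the set of markers selected by the user, recited in claim 13, can be illustrated for the planar case. The sketch below is an assumption-laden example rather than the claimed method: it treats markers as 2D points, assumes the two sets are listed in corresponding order, and fits a rigid transform (rotation plus translation) by least squares. The names `fit_rigid_transform` and `apply_transform` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_rigid_transform(stored: List[Point], selected: List[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation mapping stored onto selected."""
    n = len(stored)
    if n == 0 or n != len(selected):
        raise ValueError("marker sets must be non-empty and of equal length")
    # Centroids of both marker sets.
    scx = sum(x for x, _ in stored) / n
    scy = sum(y for _, y in stored) / n
    tcx = sum(x for x, _ in selected) / n
    tcy = sum(y for _, y in selected) / n
    # Accumulate dot and cross products of the centered correspondences.
    dot = cross = 0.0
    for (sx, sy), (tx, ty) in zip(stored, selected):
        ax, ay = sx - scx, sy - scy
        bx, by = tx - tcx, ty - tcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # optimal rotation angle in the plane
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated stored centroid onto the selected centroid.
    return theta, (tcx - (c * scx - s * scy), tcy - (s * scx + c * scy))


def apply_transform(theta: float, translation: Point, point: Point) -> Point:
    """Map a stored keyframe position into the frame of the selected markers."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])
```

Each stored keyframe position would then be passed through `apply_transform` before joint configurations are planned for it. A full implementation would work with 3D positions and orientations, which this planar sketch does not cover.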
[Claim 16] The robotic device according to claim 1, further comprising a transport element configured to move around the environment, wherein the processor is configured to execute instructions for moving the robotic device based on the model and/or the set of markers.