A robot satellite assembly method and system based on multi-modal reinforcement learning

By combining multimodal reinforcement learning with visual and force perception systems, a decision network is trained to generate adjustment strategies for the robotic arm, solving the problem of lack of correlation in perception methods for space robots and improving satellite assembly efficiency.

CN118514081B · Active · Publication Date: 2026-06-19 · UNIV OF SCI & TECH BEIJING

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: UNIV OF SCI & TECH BEIJING
Filing Date: 2024-06-26
Publication Date: 2026-06-19

Abstract

This invention provides a robotic satellite assembly method based on multimodal reinforcement learning, belonging to the field of robotics. The method comprises a multi-degree-of-freedom spatial robotic arm system, a force perception system, a visual perception system, and a reinforcement learning system based on multimodal perception. The visual perception system acquires QR code target information and axis image information of the satellite parts; the force perception system acquires force and torque information at the end effector of the robotic arm; the acquired dual-view visual perception information is transmitted to the reinforcement learning system based on multimodal perception, where multiple decision networks are trained using a multimodal reinforcement learning algorithm to obtain a trained decision network. An admittance control algorithm is then used to output position and attitude adjustment strategies; the multi-degree-of-freedom spatial robotic arm system controls the robotic arm to complete the satellite assembly task. Using this invention can improve the efficiency of space robot satellite assembly.
Description

Technical Field

[0001] This invention relates to the field of robotics, and in particular to a robot satellite assembly method and system based on multimodal reinforcement learning.

Background Technology

[0002] When space robots assemble satellite components, the environment is complex and unstructured, so acquiring comprehensive environmental information is crucial. The robot's ability to perceive its environment directly affects whether the assembly task can be completed. Continuous advances in sensor technology have equipped robots with a wider variety of sensors, enabling them to acquire information in multiple modalities, including force perception (force / torque information), visual perception (images and text), and auditory perception (auditory information). This multimodal approach mimics human perception. For a specific assembly task, however, relying on only one perception method and a single modality of information is insufficient to meet the demands of complex assembly tasks.

[0003] Existing technologies for manipulating satellite parts and components by space robots often break down complex tasks into multiple sub-tasks and complete these sub-tasks by acquiring different modal information through different sensors. While this approach allows the robot to acquire environmental information in different modalities within each sub-task, the lack of correlation between the perception methods results in low perception capabilities for the space robot and low efficiency in space robot satellite assembly.

Summary of the Invention

[0004] To address the technical problem of low perception capabilities and low efficiency in satellite assembly for space robots due to the lack of correlation between existing perception methods, this invention provides a robot satellite assembly method and system based on multimodal reinforcement learning. The technical solution is as follows:

[0005] In one aspect, a robotic satellite assembly method based on multimodal reinforcement learning is provided. The method is implemented by a robotic satellite assembly device based on multimodal reinforcement learning, and includes:

[0006] S1. The visual perception system acquires the QR code target information and axis image information of the satellite parts through a visual camera;

[0007] S2. The force sensing system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor;

[0008] S3. The QR code target information, axis image information, and force and torque information at the end of the robotic arm are transmitted to a multimodal perception-based reinforcement learning system. Multiple decision networks in the multimodal perception-based reinforcement learning system are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks. Based on the multiple trained decision networks, an admittance control algorithm is used to adjust the position and attitude of the robotic arm to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm.

[0009] S4. Transmit the position adjustment strategy and attitude adjustment strategy of the robotic arm to the multi-degree-of-freedom space robotic arm system to control the robotic arm to complete the satellite assembly task.
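The flow of steps S1 to S4 can be pictured as a single sense-decide-act pass. The following is a minimal sketch only: the `Observation` container, the function `assembly_step`, and the four callables are hypothetical names introduced for illustration and are not defined in the patent text.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Observation:
    """Hypothetical container for the multimodal inputs of steps S1 and S2."""
    target_offset: Sequence[float]  # positioning info from the QR code target (S1)
    deflection: Sequence[float]     # deflection angles from the axis images (S1)
    wrench: Sequence[float]         # force / torque at the end of the arm (S2)


def assembly_step(read_vision: Callable, read_wrench: Callable,
                  policy: Callable, send_to_arm: Callable):
    """Run one pass through S1-S4 using caller-supplied (assumed) interfaces."""
    target_offset, deflection = read_vision()             # S1: visual perception
    wrench = read_wrench()                                 # S2: force perception
    obs = Observation(target_offset, deflection, wrench)
    adjustment = policy(obs)                               # S3: decision network output
    send_to_arm(adjustment)                                # S4: hand over to the arm system
    return adjustment
```

In a real system the `policy` callable would wrap a trained decision network together with the admittance controller; here any function that maps an `Observation` to an adjustment can be plugged in for testing.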

[0010] Optionally, the multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose;

[0011] The force sensing system is used to detect displacement changes and output real-time force and torque information at the end of the robotic arm.

[0012] The visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information; wherein, the QR code target information is target-assisted positioning information and the axis image information is deflection angle.

[0013] The reinforcement learning system based on multimodal perception is used to model and generate policies for each assembly subtask using an admittance control algorithm.
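Admittance control, which the text names as the underlying controller, is commonly written as a virtual mass-damper-spring driven by the force error. The sketch below shows one discrete-time step for a single axis; the parameter values, the sign convention of the force error, and the function name `admittance_step` are assumptions made for illustration, not values from the patent.

```python
def admittance_step(x, v, f_measured, f_desired,
                    M=1.0, B=20.0, K=0.0, dt=0.01):
    """One semi-implicit Euler step of M*a + B*v + K*x = f_measured - f_desired.

    x, v: current position and velocity offsets of the end effector (one axis).
    M, B, K: virtual mass, damping, and stiffness (illustrative values only).
    Returns the updated (x, v).
    """
    f_err = f_measured - f_desired          # force error drives the motion
    a = (f_err - B * v - K * x) / M         # acceleration from the admittance model
    v_next = v + a * dt                     # integrate acceleration to velocity
    x_next = x + v_next * dt                # integrate velocity to position
    return x_next, v_next
```

With `K = 0` the velocity settles at `f_err / B`, so a constant force error produces a constant compliant velocity; a nonzero `K` would instead make the end effector settle at a fixed offset.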

[0014] Optionally, the plurality of decision networks include: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

[0015] Optionally, the step of training multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks includes:

[0016] S41. Perform MDP modeling on the satellite assembly task, and define the motion space, state space and reward function of the robotic arm.

[0017] S42. Based on the robotic arm's motion space, state space, and reward function, a multimodal reinforcement learning algorithm is used to train multiple decision networks to obtain multiple well-trained decision networks.

[0018] Optionally, the satellite assembly task includes: a hole-finding task, a hole-insertion task, and a screwing task.

[0019] In another aspect, a robotic satellite assembly system based on multimodal reinforcement learning is provided. This system is used to implement the robotic satellite assembly method based on multimodal reinforcement learning, and includes:

[0020] A visual perception system is used to acquire QR code target information and axis image information of satellite parts through a visual camera;

[0021] A force sensing system is used to acquire force and torque information at the end of the robotic arm through a six-dimensional force sensor;

[0022] A reinforcement learning system based on multimodal perception is used to receive the QR code target information, axis image information, and force and torque information at the end of the robotic arm; to train its multiple decision networks using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; and, based on the multiple trained decision networks, to use an admittance control algorithm to adjust the position and attitude of the robotic arm to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm;

[0023] A multi-degree-of-freedom space robotic arm system is used to receive the position adjustment strategy and attitude adjustment strategy of the robotic arm and to control the robotic arm to complete the satellite assembly task.

[0024] Optionally, the multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose;

[0025] The force sensing system is used to detect displacement changes and output real-time force and torque information at the end of the robotic arm.

[0026] The visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information; wherein, the QR code target information is target-assisted positioning information and the axis image information is deflection angle.

[0027] The reinforcement learning system based on multimodal perception is used to model and generate policies for each assembly subtask using an admittance control algorithm.

[0028] Optionally, the plurality of decision networks include: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

[0029] Optionally, the step of training multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks includes:

[0030] MDP modeling is performed for the satellite assembly task to define the motion space, state space, and reward function of the robotic arm;

[0031] Based on the robotic arm's motion space, state space, and reward function, a multimodal reinforcement learning algorithm is used to train multiple decision networks, resulting in multiple well-trained decision networks.

[0032] Optionally, the satellite assembly task includes: a hole-finding task, a hole-insertion task, and a screwing task.

[0033] In another aspect, a robotic satellite assembly device based on multimodal reinforcement learning is provided. The device includes: a processor; and a memory storing computer-readable instructions which, when executed by the processor, implement any one of the robotic satellite assembly methods based on multimodal reinforcement learning described above.

[0034] In another aspect, a computer-readable storage medium is provided, wherein at least one instruction is stored in the storage medium, the at least one instruction being loaded and executed by a processor to implement any one of the robotic satellite assembly methods based on multimodal reinforcement learning described above.

[0035] The beneficial effects of the technical solutions provided by the embodiments of the present invention include at least the following:

[0036] In this embodiment of the invention, firstly, the visual perception system acquires QR code target information and axis image information of satellite parts through a visual camera; secondly, the force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor; thirdly, the QR code target information, axis image information, and force and torque information at the end of the robotic arm are transmitted to a reinforcement learning system based on multimodal perception. Multiple decision networks in the reinforcement learning system based on multimodal perception are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; fourthly, based on the multiple trained decision networks, a position adjustment strategy and an attitude adjustment strategy for the robotic arm are obtained; finally, the position adjustment strategy and attitude adjustment strategy for the robotic arm are transmitted to a multi-degree-of-freedom spatial robotic arm system to control the robotic arm to complete the satellite assembly task.

[0037] The embodiments of the present invention can improve the perception ability and intelligence level of space robots. By combining visual and force perception, using admittance control as the underlying controller and multimodal reinforcement learning algorithm to train the decision network, the system outputs position adjustment strategy and attitude adjustment strategy. By adjusting the strategy, the robotic arm is controlled to complete the satellite assembly task. The present invention can improve the work efficiency of space robot satellite assembly.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1 is an architecture diagram of a robot satellite assembly system based on multimodal reinforcement learning provided by an embodiment of the present invention;

[0040] Figure 2 is a flowchart of a robot satellite assembly method based on multimodal reinforcement learning provided by an embodiment of the present invention;

[0041] Figure 3 is a block diagram of a robot satellite assembly system based on multimodal reinforcement learning provided in an embodiment of the present invention;

[0042] Figure 4 is a schematic diagram of the structure of a robotic satellite assembly device based on multimodal reinforcement learning provided in an embodiment of the present invention.

Detailed Implementation

[0043] The technical solution of the present invention will now be described with reference to the accompanying drawings.

[0044] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.

[0045] In the embodiments of this invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that, when the distinction between them is not emphasized, they convey the same meaning. Similarly, the terms "of," "relevant," and "corresponding" may sometimes be used interchangeably. It should be noted that, when the distinction between them is not emphasized, they convey the same meaning.

[0046] In the embodiments of this invention, a subscripted symbol such as $W_1$ may sometimes be written in non-subscript form such as W1. When the difference is not emphasized, the two forms express the same meaning.

[0047] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0048] This invention provides a robotic satellite assembly method based on multimodal reinforcement learning. The method can be implemented by a robotic satellite assembly device based on multimodal reinforcement learning, which can be a terminal or a server. Figure 1 shows an architecture diagram of a robotic satellite assembly system based on multimodal reinforcement learning provided by an embodiment of the present invention. In one feasible implementation, a visual perception system identifies and classifies satellite parts and acquires QR code target information and axis image information of the satellite parts through a visual camera, where the QR code target information is target-assisted positioning information and the axis image information is the deflection angle. A force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor. The QR code target information, the axis image information, and the force and torque information at the end of the robotic arm are transmitted to a reinforcement learning system based on multimodal perception, in which multiple decision networks are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks. Based on the multiple trained decision networks, an admittance control algorithm is used to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm. These strategies are transmitted to a multi-degree-of-freedom spatial robotic arm system, which controls the robotic arm to complete the satellite assembly task.
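The text states that the axis image information is a deflection angle but does not say how it is extracted from the images. As one possible illustration, the sketch below computes the tilt of the shaft centerline from two pixel points per view; the function names, the point-based input, and the assignment of the two views to the x and y rotations are all assumptions.

```python
import math


def deflection_angle(top, bottom):
    """Angle in radians between the shaft centerline and the image vertical.

    top, bottom: (u, v) pixel coordinates of two points on the centerline,
    with v increasing downward as in common image coordinates.
    """
    du = bottom[0] - top[0]
    dv = bottom[1] - top[1]
    return math.atan2(du, dv)   # 0 when the shaft appears perfectly vertical


def dual_view_deflection(view_a, view_b):
    """Assumed convention: view A gives the tilt about x, view B the tilt about y."""
    return deflection_angle(*view_a), deflection_angle(*view_b)
```

For example, a centerline running from pixel (100, 50) to pixel (100, 150) yields an angle of zero, while one running from (100, 50) to (200, 150) yields 45 degrees.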

[0049] Figure 2 shows a flowchart of the robot satellite assembly method based on multimodal reinforcement learning. The processing flow of this method may include the following steps:

[0050] S1. The visual perception system acquires QR code target information and axis image information of satellite parts through a visual camera.

[0051] Optionally, a multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose;

[0052] The force sensing system is used to detect changes in displacement and output real-time force and torque information at the end of the robotic arm.

[0053] A visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information; wherein, the QR code target information is target-assisted positioning information and the axis image information is deflection angle.

[0054] A reinforcement learning system based on multimodal perception is used to model and generate policies for each assembly subtask using admittance control algorithms.

[0055] S2. The force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor.

[0056] S3. Transmit the QR code target information, axis image information, and force and torque information at the end of the robotic arm to the multimodal perception-based reinforcement learning system. Train multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks. Based on the multiple trained decision networks, use an admittance control algorithm to adjust the position and attitude of the robotic arm to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm.

[0057] Optionally, multiple decision networks may be used, including: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

[0058] S4. Transmit the position adjustment strategy and attitude adjustment strategy of the robotic arm to the multi-degree-of-freedom space robotic arm system to control the robotic arm to complete the satellite assembly task.
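The arm system is described as solving for joint angles from a known end-effector pose, which is an inverse kinematics problem. A space manipulator has many joints and the patent gives no kinematic model, so the sketch below uses a planar two-link arm purely as a simplified stand-in; the link lengths and function names are illustrative assumptions.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Joint angles (q1, q2) placing a planar two-link arm's tip at (x, y).

    Returns the solution with q2 in [0, pi]; raises ValueError if unreachable.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target out of reach")
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def two_link_fk(q1, q2, l1, l2):
    """Forward kinematics, used here to check the inverse solution."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y
```

Feeding the result of `two_link_ik` back through `two_link_fk` should reproduce the requested tip position, which is a convenient sanity check before commanding any joint motion.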

[0059] Optionally, the step of training multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks includes:

[0060] S41. Perform MDP modeling on the satellite assembly task, and define the motion space, state space and reward function of the robotic arm.

[0061] The robotic arm's motion space (that is, its action space) is what the decision-making policy learns to output: the robotic arm selects actions to interact with the environment, and the action value is the output of the neural network. The robotic arm's actions are defined in a continuous space, so the robotic arm is controlled continuously throughout the satellite assembly task.

[0062] The state space includes the robotic arm's own state information and environmental information. During the robotic arm's policy learning process, the state variables serve as inputs for neural network training. The robotic arm continuously interacts with the environment, collecting state variables to update the decision model. The quality of the state space design directly affects the convergence, convergence speed, and performance of the decision network. Irrelevant state information can disrupt the training process, requiring manual selection of suitable state information. Furthermore, in the different sub-tasks of satellite component assembly, each sub-task has different objectives, necessitating the setting of appropriate state observation values.

[0063] The reward function is the reward that the robotic arm receives after performing an action based on its current state, which can guide the robotic arm to learn.

[0064] S42. Based on the robotic arm's motion space, state space, and reward function, a multimodal reinforcement learning algorithm is used to train multiple decision networks to obtain multiple well-trained decision networks.
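The MDP modeling in S41 amounts to fixing what the policy observes (state), what it outputs (a continuous action), and what reward it receives. The sketch below is a deliberately tiny stand-in: a one-dimensional environment in which the state is a shaft tilt and the action is a bounded tilt correction. The class `ToyInsertionEnv`, its parameters, and its reward values are hypothetical and do not come from the patent.

```python
import random


class ToyInsertionEnv:
    """Hypothetical 1-D MDP: state = shaft tilt (rad), action = tilt correction."""

    def __init__(self, tol=0.005, max_steps=50, limit=0.02, seed=0):
        self.tol, self.max_steps, self.limit = tol, max_steps, limit
        self.rng = random.Random(seed)
        self.tilt, self.steps = 0.0, 0

    def reset(self):
        self.tilt = self.rng.uniform(-0.1, 0.1)   # random initial misalignment
        self.steps = 0
        return (self.tilt,)

    def step(self, action):
        action = max(-self.limit, min(self.limit, action))  # bounded continuous action
        self.tilt += action
        self.steps += 1
        success = abs(self.tilt) < self.tol
        done = success or self.steps >= self.max_steps
        reward = 1.0 if success else -0.01        # success bonus, small time penalty
        return (self.tilt,), reward, done


def rollout(env, policy):
    """Run one episode and return the accumulated reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

A reinforcement learning algorithm would replace the hand-written policy with a trained network; a simple proportional policy such as `lambda s: -s[0]` is enough to exercise the interface.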

[0065] Optionally, the satellite assembly task includes: a hole-finding task, a hole-insertion task, and a screwing task.
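Because the assembly task is split into three sub-tasks, a controller needs some way to run them in order and to stop if one fails. The sketch below shows one such sequencer. The sub-task keys, the one-policy-per-sub-task mapping, and the `run_subtask` callback are assumptions made for illustration; the patent lists its decision networks separately and does not specify this dispatch logic.

```python
SUBTASKS = ("hole_finding", "hole_insertion", "screwing")


def run_assembly(policies, run_subtask):
    """Run the sub-tasks in order and return the names of those completed.

    policies: dict mapping each sub-task name to its (trained) policy.
    run_subtask: callable(name, policy) -> bool, True if the sub-task succeeded.
    Execution stops at the first sub-task that fails.
    """
    completed = []
    for name in SUBTASKS:
        if not run_subtask(name, policies[name]):
            break
        completed.append(name)
    return completed
```

Stopping at the first failure reflects the physical dependency between the sub-tasks: insertion cannot start before the hole is found, and screwing cannot start before insertion succeeds.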

[0066] In one feasible implementation method, the specific process of the robotic arm performing satellite assembly tasks includes:

[0067] (1) The robot collects force and torque information through force / torque sensors, and obtains the contact state between the shaft and the hole from the data after gravity compensation and zero-drift compensation. The robot also collects visual feature information in the task space through vision sensors; this visual feature information can reflect the deflection state of the hole, which improves the perception ability and insertion efficiency of the robotic arm.

[0068] (2) When the robotic arm has successfully found the hole and reached the top of the hole, a desired force is set; the desired force ensures that the robotic arm has a stable downward force in the z direction to move into the hole. The desired force is combined with the force sensor data after gravity compensation and zero-drift compensation. The velocity of the end of the robotic arm is adjusted according to the force / torque error, which changes the contact state between the robotic arm and its surroundings, and the resulting contact force is fed back to the robotic arm. The attitude of the shaft is adjusted using dual-view visual perception information, where the adjustment direction is one of: positive, zero, or negative. The robotic arm is thus controlled to perform the insertion operation using force / torque information and dual-view visual perception information. The state space can be represented by the following formula (1):

[0069] $$ s = [F_x,\ F_y,\ F_z,\ M_x,\ M_y,\ \theta_x,\ \theta_y] \tag{1} $$

[0070] where $F_x$ represents the contact force experienced by the end effector of the robotic arm in the x-direction; $F_y$ represents the contact force experienced by the end effector of the robotic arm in the y-direction; $F_z$ represents the contact force experienced by the end effector of the robotic arm in the z-direction; $M_x$ indicates the torque acting in the x-direction; $M_y$ indicates the torque acting in the y-direction; $\theta_x$ indicates the deflection angle of the shaft held by the robotic arm about the x-direction; and $\theta_y$ indicates the deflection angle of the shaft held by the robotic arm about the y-direction.

[0071] In one feasible implementation, during the insertion process, the robotic arm applies a desired force of 15 N along the axis perpendicular to the base, and adjusts the rotation angle of the shaft about the x-direction and about the y-direction while inserting it; the action definition can be expressed by the following formula (2):

[0072] $$ a = [\Delta\theta_x,\ \Delta\theta_y,\ d_x,\ d_y] \tag{2} $$

[0073] where $\Delta\theta_x$ represents the amount of rotation of the shaft about the x-axis at the next moment, and can be positive or negative; $\Delta\theta_y$ represents the amount of rotation of the shaft about the y-axis at the next moment, and can be positive or negative; $d_x$ indicates the direction of rotation about x; and $d_y$ indicates the direction of rotation about y. The insertion efficiency can be improved by modifying the range of rotation values.

[0074] (3) After each training session, based on the performance of the robotic arm, output the corresponding reward and penalty, where the reward function can be expressed by the following formula (3):

[0075]

[0076] where the terms of formula (3) denote, in order: the reward value for a successful insertion; the reward value for a successful insertion completed in a small number of steps; the penalty value applied when the force / torque exceeds the threshold; the time penalty during the training process; the penalty value for exceeding the deflection limit; and the penalty value for insertion failure.

[0077] (4) SolidWorks is used to build the satellite and space robot models, which are imported into the simulation environment together with the torque sensors, cameras, and end effectors to obtain the simulation environment for the satellite assembly task. Once the simulation environment is built, the robotic arm is controlled by the policy trained with the reinforcement learning algorithm.
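Items (1) to (3) above describe a chain from raw sensor data to state, action, and reward for the insertion sub-task. The sketch below strings these pieces together under stated assumptions: the function names, the clipping limit, the choice to drop the torque about z, the additive form of the reward, and every numeric reward value are hypothetical, since the text does not give them.

```python
def compensate(raw, zero_drift, gravity):
    """Subtract zero drift and tool gravity from a raw 6-D wrench reading."""
    return tuple(r - z - g for r, z, g in zip(raw, zero_drift, gravity))


def build_state(wrench, theta_x, theta_y):
    """Seven-element state: three forces, two torques, two deflection angles."""
    fx, fy, fz, mx, my, _mz = wrench      # assumption: torque about z is unused
    return (fx, fy, fz, mx, my, theta_x, theta_y)


def make_action(raw_x, raw_y, limit=0.01):
    """Clip raw policy outputs to +/-limit and attach a direction in {-1, 0, 1}."""
    def clip(value):
        return max(-limit, min(limit, value))

    def sign(value):
        return (value > 0) - (value < 0)

    dx, dy = clip(raw_x), clip(raw_y)
    return (dx, dy, sign(dx), sign(dy))


def insertion_reward(success, steps, wrench_exceeded, deflection_exceeded, failed,
                     r_success=10.0, r_fast=5.0, fast_steps=50,
                     p_force=-5.0, p_time=-0.01, p_deflect=-5.0, p_fail=-10.0):
    """Assumed additive reward built from the six terms described in the text."""
    reward = p_time                        # time penalty on every step
    if wrench_exceeded:
        reward += p_force                  # force / torque above threshold
    if deflection_exceeded:
        reward += p_deflect                # deflection limit exceeded
    if failed:
        reward += p_fail                   # insertion failed
    if success:
        reward += r_success                # insertion succeeded
        if steps <= fast_steps:
            reward += r_fast               # bonus for finishing in few steps
    return reward
```

The `limit` argument of `make_action` plays the role of the adjustable rotation range mentioned in the text: a larger limit allows faster corrections at the cost of a higher risk of exceeding the force or deflection thresholds.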

[0078] In this embodiment of the invention, firstly, the visual perception system acquires QR code target information and axis image information of satellite parts through a visual camera; secondly, the force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor; thirdly, the QR code target information, axis image information, and force and torque information at the end of the robotic arm are transmitted to a reinforcement learning system based on multimodal perception. Multiple decision networks in the reinforcement learning system based on multimodal perception are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; fourthly, based on the multiple trained decision networks, a position adjustment strategy and an attitude adjustment strategy for the robotic arm are obtained; finally, the position adjustment strategy and attitude adjustment strategy for the robotic arm are transmitted to a multi-degree-of-freedom spatial robotic arm system to control the robotic arm to complete the satellite assembly task.

[0079] The embodiments of the present invention can improve the perception ability and intelligence level of space robots. By combining visual and force perception, using admittance control as the underlying controller and multimodal reinforcement learning algorithm to train the decision network, the system outputs position adjustment strategy and attitude adjustment strategy. By adjusting the strategy, the robotic arm is controlled to complete the satellite assembly task. The present invention can improve the work efficiency of space robot satellite assembly.

[0080] Figure 3 is a block diagram of a robotic satellite assembly system based on multimodal reinforcement learning according to an exemplary embodiment. The system is used to implement the robotic satellite assembly method based on multimodal reinforcement learning. Referring to Figure 3, the system includes a visual perception system 310, a force perception system 320, a reinforcement learning system based on multimodal perception 330, and a multi-degree-of-freedom spatial robotic arm system 340, wherein:

[0081] The visual perception system 310 is used to acquire QR code target information and axis image information of satellite parts through a visual camera;

[0082] The force sensing system 320 is used to acquire force and torque information at the end of the robotic arm through a six-dimensional force sensor;

[0083] The multimodal perception-based reinforcement learning system 330 is used to receive the QR code target information, axis image information, and force and torque information at the end of the robotic arm; to train its multiple decision networks using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; and, based on the multiple trained decision networks, to use an admittance control algorithm to adjust the position and attitude of the robotic arm to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm.

[0084] The multi-degree-of-freedom space robotic arm system 340 is used to receive the position adjustment strategy and attitude adjustment strategy of the robotic arm and to control the robotic arm to complete the satellite assembly task.

[0085] Optionally, the multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose;

[0086] The force sensing system is used to detect displacement changes and output real-time force and torque information at the end of the robotic arm.

[0087] The visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information; wherein, the QR code target information is target-assisted positioning information and the axis image information is deflection angle.

[0088] The reinforcement learning system based on multimodal perception is used to model and generate policies for each assembly subtask using an admittance control algorithm.

[0089] Optionally, the plurality of decision networks include: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

[0090] Optionally, the step of training multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks includes:

[0091] MDP modeling is performed for the satellite assembly task to define the motion space, state space, and reward function of the robotic arm;

[0092] Based on the robotic arm's motion space, state space, and reward function, a multimodal reinforcement learning algorithm is used to train multiple decision networks, resulting in multiple well-trained decision networks.

[0093] Optionally, the satellite assembly task includes: a hole-finding task, a hole-insertion task, and a screwing task.

[0094] In this embodiment of the invention, firstly, the visual perception system acquires QR code target information and axis image information of satellite parts through a visual camera; secondly, the force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor; thirdly, the QR code target information, axis image information, and force and torque information at the end of the robotic arm are transmitted to a reinforcement learning system based on multimodal perception. Multiple decision networks in the reinforcement learning system based on multimodal perception are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; fourthly, based on the multiple trained decision networks, a position adjustment strategy and an attitude adjustment strategy for the robotic arm are obtained; finally, the position adjustment strategy and attitude adjustment strategy for the robotic arm are transmitted to a multi-degree-of-freedom spatial robotic arm system to control the robotic arm to complete the satellite assembly task.

[0095] The embodiments of the present invention can improve the perception ability and intelligence level of space robots. Visual perception and force perception are combined, admittance control serves as the underlying controller, and a multimodal reinforcement learning algorithm trains the decision networks, so that the system outputs a position adjustment strategy and an attitude adjustment strategy. The robotic arm is controlled according to these strategies to complete the satellite assembly task, which can improve the efficiency of satellite assembly by space robots.
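As background on the underlying controller, the following sketch shows a commonly used single-axis admittance law, M*a + D*v + K*x = f_err, integrated with a semi-implicit Euler step. It is an illustration under assumptions: the class name, the parameter values, and the discretisation are hypothetical and are not taken from this embodiment.

```python
from dataclasses import dataclass


@dataclass
class AdmittanceAxis:
    """One Cartesian axis of a virtual mass-spring-damper (assumed model)."""
    mass: float        # virtual inertia M
    damping: float     # virtual damping D
    stiffness: float   # virtual stiffness K
    x: float = 0.0     # displacement offset added to the reference pose
    v: float = 0.0     # rate of change of that offset

    def step(self, force_error: float, dt: float) -> float:
        """Advance M*a + D*v + K*x = force_error by dt and return the new offset."""
        a = (force_error - self.damping * self.v - self.stiffness * self.x) / self.mass
        self.v += a * dt        # update velocity first (semi-implicit Euler)
        self.x += self.v * dt   # then position, using the updated velocity
        return self.x


if __name__ == "__main__":
    axis = AdmittanceAxis(mass=1.0, damping=20.0, stiffness=100.0)
    for _ in range(5000):       # 5 s at a 1 kHz control rate
        offset = axis.step(force_error=5.0, dt=0.001)
    print(round(offset, 4))     # settles near force_error / stiffness
```

Under a constant force error the offset settles at the force error divided by the stiffness, which is the compliant behaviour an admittance controller provides during contact.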

[0096] Figure 4 is a schematic diagram of the structure of a robotic satellite assembly device based on multimodal reinforcement learning provided in an embodiment of the present invention. As shown in Figure 4, the robotic satellite assembly device based on multimodal reinforcement learning can include the robotic satellite assembly system based on multimodal reinforcement learning illustrated in Figure 3 above. Optionally, the robotic satellite assembly device 410 based on multimodal reinforcement learning may include a first processor 2001.

[0097] Optionally, the multimodal reinforcement learning-based robotic satellite assembly device 410 may also include a memory 2002 and a transceiver 2003.

[0098] The first processor 2001, memory 2002, and transceiver 2003 can be connected via a communication bus.

[0099] Each component of the multimodal reinforcement learning-based robotic satellite assembly equipment 410 is described in detail below with reference to Figure 4:

[0100] The first processor 2001 is the control center of the multimodal reinforcement learning-based robotic satellite assembly equipment 410. It can be a single processor or a collective term for multiple processing elements. For example, the first processor 2001 can be one or more central processing units (CPUs), application-specific integrated circuits (ASICs), or one or more integrated circuits configured to implement embodiments of the present invention, such as one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).

[0101] Optionally, the first processor 2001 can perform various functions of the multimodal reinforcement learning-based robotic satellite assembly equipment 410 by running or executing software programs stored in the memory 2002 and calling data stored in the memory 2002.

[0102] In a specific implementation, as one example, the first processor 2001 may include one or more CPUs, for example CPU0 and CPU1 shown in Figure 4.

[0103] In a specific implementation, as one example, the multimodal reinforcement learning-based robotic satellite assembly equipment 410 may also include multiple processors, for example the first processor 2001 and the second processor 2004 shown in Figure 4. Each of these processors can be a single-core processor or a multi-core processor. Here, a processor can refer to one or more devices, circuits, and / or processing cores used to process data (such as computer program instructions).

[0104] The memory 2002 is used to store the software program that implements the solution of the present invention, and its execution is controlled by the first processor 2001. For the specific implementation, refer to the above method embodiment; it is not repeated here.

[0105] Optionally, the memory 2002 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. The memory 2002 may be integrated with the first processor 2001, or may exist independently and be coupled to the first processor 2001 via an interface circuit (not shown in Figure 4) of the multimodal reinforcement learning-based robot satellite assembly device 410; this embodiment of the invention does not specifically limit this.

[0106] The transceiver 2003 is used to communicate with network devices or with terminal devices.

[0107] Optionally, the transceiver 2003 may include a receiver and a transmitter (not shown separately in Figure 4). The receiver is used to implement the receiving function, and the transmitter is used to implement the transmitting function.

[0108] Optionally, the transceiver 2003 can be integrated with the first processor 2001, or it can exist independently and be coupled to the first processor 2001 via an interface circuit (not shown in Figure 4) of the multimodal reinforcement learning-based robotic satellite assembly device 410; this embodiment of the invention does not specifically limit this.

[0109] It should be noted that the structure of the multimodal reinforcement learning-based robotic satellite assembly device 410 shown in Figure 4 does not constitute a limitation on the device. An actual robotic satellite assembly device may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0110] Furthermore, for the technical effects of the robot satellite assembly equipment 410 based on multimodal reinforcement learning, refer to the technical effects of the robot satellite assembly method based on multimodal reinforcement learning described in the above method embodiments; they are not repeated here.

[0111] It should be understood that the first processor 2001 in the embodiments of the present invention may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor, etc.

[0112] It should also be understood that the memory in the embodiments of the present invention can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous linked DRAM (SLDRAM), and direct rambus RAM (DR RAM).

[0113] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable system. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0114] It should be understood that the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. A and B can be singular or plural. Additionally, the character " / " in this article generally indicates an "or" relationship between the preceding and following related objects, but it can also represent an "and / or" relationship. Please refer to the context for a more accurate understanding.

[0115] In this invention, "at least one" means one or more, and "more than one" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be a single item or multiple items.
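The enumeration in this paragraph can be reproduced mechanically as the set of all non-empty combinations of the listed items. The helper name `at_least_one_of` is hypothetical and serves only as an illustration of the definition.

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    print(at_least_one_of(["a", "b", "c"]))
    # ['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```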

[0116] It should be understood that, in various embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0117] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0118] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the devices, systems, and units described above correspond to the processes in the foregoing method embodiments and are not repeated here.

[0119] In the embodiments provided by this invention, it should be understood that the disclosed devices, systems, and methods can be implemented in other ways. For example, the system embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between systems or units may be electrical, mechanical, or other forms.

[0120] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0121] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0122] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0123] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A robot satellite assembly method based on multi-modal reinforcement learning, characterized in that the method is implemented by a multi-degree-of-freedom spatial robotic arm system, a force perception system, a visual perception system, and a reinforcement learning system based on multimodal perception; the method includes:
S1. The visual perception system acquires the QR code target information and axis image information of the satellite parts through a visual camera;
S2. The force perception system acquires force and torque information at the end of the robotic arm through a six-dimensional force sensor;
S3. The QR code target information, axis image information, and force and torque information at the end of the robotic arm are transmitted to the reinforcement learning system based on multimodal perception; multiple decision networks in the reinforcement learning system based on multimodal perception are trained using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; based on the multiple trained decision networks, an admittance control algorithm is used to adjust the position and attitude of the robotic arm to obtain the position adjustment strategy and attitude adjustment strategy of the robotic arm;
S4. The position adjustment strategy and attitude adjustment strategy of the robotic arm are transmitted to the multi-degree-of-freedom spatial robotic arm system to control the robotic arm to complete the satellite assembly task.

2. The multi-modal reinforcement learning based robotic satellite assembly method of claim 1, wherein the multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose; the force perception system is used to detect displacement changes and output real-time force and torque information at the end of the robotic arm; the visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information, wherein the QR code target information is target-assisted positioning information and the axis image information is a deflection angle; and the reinforcement learning system based on multimodal perception is used to model each assembly subtask and generate a policy for it using an admittance control algorithm.

3. The multi-modal reinforcement learning based robotic satellite assembly method of claim 1, wherein the multiple decision networks include: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

4. The multi-modal reinforcement learning based robotic satellite assembly method of claim 3, wherein, in S4, training multiple decision networks in the multimodal perception-based reinforcement learning system using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks includes: S41. Perform Markov decision process (MDP) modeling on the satellite assembly task, and define the action space, state space, and reward function of the robotic arm; S42. Based on the robotic arm's action space, state space, and reward function, use a multimodal reinforcement learning algorithm to train the multiple decision networks to obtain multiple trained decision networks.

5. The multi-modal reinforcement learning based robotic satellite assembly method of claim 1, wherein the satellite assembly task includes: a hole-finding task, a hole-insertion task, and a screwing task.

6. A robot satellite assembly system based on multimodal reinforcement learning, used to implement the robot satellite assembly method based on multimodal reinforcement learning as described in any one of claims 1-5, characterized in that the system includes:
a visual perception system, used to acquire QR code target information and axis image information of satellite parts through a visual camera;
a force perception system, used to acquire force and torque information at the end of the robotic arm through a six-dimensional force sensor;
a reinforcement learning system based on multimodal perception, used to receive the QR code target information, axis image information, and force and torque information at the end of the robotic arm; to train multiple decision networks using a multimodal reinforcement learning algorithm to obtain multiple trained decision networks; and, based on the multiple trained decision networks, to use an admittance control algorithm to adjust the position and attitude of the robotic arm, obtaining the position adjustment strategy and attitude adjustment strategy of the robotic arm;
a multi-degree-of-freedom spatial robotic arm system, used to receive the position adjustment strategy and attitude adjustment strategy of the robotic arm and to control the robotic arm to complete the satellite assembly task.

7. The multimodal reinforcement learning based robotic satellite assembly system of claim 6, wherein the multi-degree-of-freedom spatial robotic arm system is used to solve for the corresponding joint angles based on the motion trajectory or the known end-effector pose; the force perception system is used to detect displacement changes and output real-time force and torque information at the end of the robotic arm; the visual perception system is used to perform depth perception on satellite parts and obtain QR code target information and axis image information, wherein the QR code target information is target-assisted positioning information and the axis image information is a deflection angle; and the reinforcement learning system based on multimodal perception is used to model each assembly subtask and generate a policy for it using an admittance control algorithm.

8. The multimodal reinforcement learning based robotic satellite assembly system of claim 6, wherein the multiple decision networks include: a hole-finding strategy network, a single-view hole-finding strategy network, a multi-view hole-finding strategy network, and a screwing strategy network.

9. A multi-modal reinforcement learning based robotic satellite assembly apparatus, characterized in that the apparatus includes: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the method as described in any one of claims 1 to 5.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains program code that can be invoked by a processor to execute the method as described in any one of claims 1 to 5.