Assembly task-oriented robot grasping and assembly control method and system

By predicting the object's posture and using deep reinforcement learning, combined with Markov decision processes, the robot arm can adjust the object's posture in complex scenarios. This solves the problem that the initial posture of the object does not meet the requirements of downstream tasks in existing technologies, and improves the success rate and efficiency of assembly tasks.

CN117359637B · Active · Publication Date: 2026-06-26 · SHANDONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANDONG UNIV
Filing Date
2023-11-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies struggle to adjust the posture of objects by robotic arms in complex scenarios, making them unsuitable for various assembly tasks. In particular, they cannot effectively complete assembly when the initial posture of the object does not meet the requirements of the downstream task.

Method used

By predicting the posture of the object to be grasped, a deep reinforcement learning policy network is used to map it into a sequence of grasping and placing actions. The posture of the object is adjusted by combining a Markov decision process, and an appropriate grasping and assembly strategy is selected. The posture adjustment network is trained using a self-supervised learning method.

Benefits of technology

It enables dynamic adjustment of object posture in complex scenarios, adapts to various assembly tasks, reduces manual annotation costs, and improves algorithm convergence speed and assembly task success rate.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application belongs to the technical field of robotic arm assembly control and provides a robotic arm grasping and assembly control method and system for assembly tasks. The technical scheme adjusts the pose of the object according to the downstream assembly task and finally selects a suitable grasping and assembly strategy. First, the pose of the object to be grasped is predicted; the prediction result and the task code jointly constitute the state information, which a deep reinforcement learning policy network maps into a sequence of consecutive grasping and placing actions, so that the robot adjusts the object to a pose suitable for the corresponding assembly task. Finally, a suitable grasp is selected and the assembly action is performed to complete the downstream task.

Description

Technical Field

[0001] This invention belongs to the field of robotic arm assembly control technology, and particularly relates to a robotic arm grasping and assembly control method and system for assembly tasks.

Background Technology

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] The problems of object grasping, placement, and assembly have been extensively studied in the fields of computer vision and robotics.

[0004] Current research primarily focuses on single-task scenarios. For example, in assembly tasks, researchers typically fix the workpiece to the robot's end effector, assuming the robot can stably grasp the target object in a fixed posture before assembly, and that there will be no relative displacement between the object and the gripper during assembly. This limits their application in real-world scenarios with complex, continuous tasks. Some pioneering work has developed task-oriented grasping or other robotic arm manipulation frameworks, enabling robots to perform sequential tasks, such as grasping and assembling workpieces, or grasping and using tools. However, these methods only allow for limited generalization to similar types of tasks and cannot be applied to more complex real-world scenarios.

Summary of the Invention

[0005] To address at least one of the technical problems mentioned above, this invention provides a robotic arm grasping and assembly control method and system for assembly tasks. This system can adjust the posture of an object according to the downstream assembly task and ultimately select a suitable grasping and assembly strategy for grasping and assembly.

[0006] To achieve the above objectives, the present invention adopts the following technical solution:

[0007] The first aspect of the present invention provides a robotic arm grasping and assembly control method for assembly tasks, comprising:

[0008] The pose of the object to be grasped is predicted to obtain a set of candidate grasping point poses;

[0009] The candidate grasping posture set is evaluated according to the assembly task type. If there is a grasping posture with a score higher than the threshold, the assembly action is executed; otherwise, the pose adjustment module is called to adjust the pose of the target object until a grasping posture with a score higher than the threshold is obtained and the assembly action is executed.

[0010] The step of invoking the pose adjustment module to adjust the pose of the target object includes:

[0011] The candidate grasp point pose set and the encoding of the assembly task constitute the state information;

[0012] The posture adjustment task is modeled as a Markov decision process. Through a deep reinforcement learning policy network, the state information is mapped into a sequence of consecutive grasping and placing actions. Based on this action sequence, the robotic arm is controlled to grasp and place the object, adjusting the object to a posture suitable for the corresponding assembly task.

[0013] A second aspect of the present invention provides a robotic arm gripping and assembly control system for assembly tasks, comprising:

[0014] The pose prediction module is used to predict the pose of the object to be grasped and obtain a set of candidate grasping point poses.

[0015] The grasping evaluation and posture adjustment module is used to evaluate the candidate grasping posture set according to the assembly task type. If there is a grasping posture with a score higher than the threshold, the assembly action is executed; otherwise, the pose adjustment module is called to adjust the posture of the target object until a grasping posture with a score higher than the threshold is obtained and the assembly action is executed.

[0016] The step of invoking the pose adjustment module to adjust the pose of the target object includes:

[0017] The candidate grasp point pose set and the encoding of the assembly task constitute the state information;

[0018] The posture adjustment task is modeled as a Markov decision process. Through a deep reinforcement learning policy network, the state information is mapped into a sequence of consecutive grasping and placing actions. Based on this action sequence, the robotic arm is controlled to grasp and place the object, adjusting the object to a posture suitable for the corresponding assembly task.

[0019] A third aspect of the present invention provides a computer-readable storage medium.

[0020] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the robotic arm grasping and assembly control method for assembly tasks as described above.

[0021] A fourth aspect of the present invention provides a computer device.

[0022] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the robotic arm gripping and assembly control method for assembly tasks as described above.

[0023] Compared with the prior art, the beneficial effects of the present invention are:

[0024] (1) This invention proposes a new framework. First, the posture of the object to be grasped is predicted. The prediction result and the task encoding together constitute state information. This information is mapped into a sequence of grasping and placing actions through a deep reinforcement learning policy network. This allows the robot to adjust the object to a posture suitable for performing the corresponding assembly task. Finally, the robot selects the appropriate grasping action and performs the assembly action to complete the downstream task. The posture of the object can be adjusted according to the downstream assembly task to complete the assembly task in the complex scenario. This framework can be used to learn common assembly tasks in home scenarios.

[0025] (2) This invention requires no additional data annotation. It trains the grasping posture evaluation model in a self-supervised manner in a simulation environment, responsible for generating the grasping quality and task relevance scores of the grasping posture; and trains the posture adjustment network based on deep reinforcement learning in the simulation environment. Neither training method requires manual construction and annotation of the dataset, effectively saving labor costs.

[0026] (3) The present invention designs an effective action mask, which effectively shields dangerous robot actions while dynamically compressing the action space of the posture adjustment task, avoiding blind exploration by the robot during the network training stage and accelerating the convergence speed of the algorithm.

[0027] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0028] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0029] Figure 1 is a scene diagram of a power adapter being plugged in;

[0030] Figure 2 is a framework diagram of the robotic arm grasping and assembly control method for assembly tasks provided in an embodiment of the present invention;

[0031] Figure 3 is the robot gripper pose representation established with the object coordinate system as the reference coordinate system, provided in an embodiment of the present invention;

[0032] Figure 4 is the self-supervised training method for the grasp point evaluation network provided in an embodiment of the present invention;

[0033] Figure 5 is the adjustment and assembly task process provided in an embodiment of the present invention;

[0034] Figure 6 is the process of mapping the grasp pose to the base coordinate system provided in an embodiment of the present invention;

[0035] Figure 7 is a schematic diagram of the pre-grasp pose provided in an embodiment of the present invention;

[0036] Figure 8 is the pose adjustment framework provided in an embodiment of the present invention;

[0037] Figure 9 shows the two assembly task scenarios provided in an embodiment of the present invention.

Detailed Implementation

[0038] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0039] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0040] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0041] In common assembly scenarios, different assembly tasks impose constraints on the selection of the robotic arm's grasping position and posture. A suitable grasp must be stable while the object is clamped and transported and must also suit the downstream task. However, a more complex situation is prevalent in real-world scenarios: none of the current candidate grasp points can meet the requirements of the downstream task. For example, consider the task of inserting a power adapter into a switch socket. When the initial posture of the power adapter is as shown in Figure 1(a), the robotic arm cannot, without external assistance, achieve the ideal grasp shown in Figure 1(b) through a single collision-free grasp. In this case, the robot needs to adjust the pose of the object until an ideal grasping state is achievable before it can perform the grasp and the subsequent task.

[0042] The above task includes the following four key aspects: (1) identifying the pose of the object to be grasped and the corresponding grasping point, (2) adjusting the pose of the target object according to an appropriate strategy, (3) grasping by combining task information and pose observation, and (4) completing the assembly task.

[0043] This invention proposes a novel framework for learning robot operations related to assembly tasks. It adjusts the object's posture based on the downstream assembly task and ultimately selects an appropriate grasping and assembly strategy. First, the posture of the object to be grasped is predicted. The prediction result, together with the task encoding, constitutes state information. This information is then mapped to a sequence of consecutive grasping and placement actions via a deep reinforcement learning policy network. This allows the robot to adjust the object to a suitable posture for the corresponding assembly task. Finally, the appropriate grasping action is selected and executed to complete the downstream task. To reduce the cost of manual annotation, a self-supervised learning method is used for training a portion of the network. Training labels are automatically generated in a simulation environment based on the robot's operation results.

[0044] Example 1

[0045] As shown in Figure 2, this embodiment provides a robotic arm grasping and assembly control method for assembly tasks, including the following steps:

[0046] Step 1: Predict the pose of the object to be grasped to obtain a set of candidate grasping point poses;

[0047] Step 2: Evaluate the candidate gripping point pose set for the assembly task;

[0048] Step 3: Adjust the object's posture and select an appropriate grasping and assembly strategy;

[0049] Step 4: Perform the grasping and assembly operations.

[0050] In step 2, the evaluation of the candidate grasping point pose set specifically includes:

[0051] The evaluation criteria for high-quality gripping points differ at different task stages: when adjusting the posture of the target object through gripping and placement, a gripping point is considered to meet the requirements if stable gripping and transportation can be achieved; however, when performing gripping before assembly, additional physical and semantic constraints specific to the current task must be met, such as the gripping position not obstructing assembly and the gripping force being greater than the static friction generated by interaction with the environment during assembly.

[0052] Therefore, in this embodiment, the grasp point evaluation model consists of two parts, which model the two evaluation criteria respectively.

[0053] For a given object to be grasped, a set of grasping postures G is obtained by uniformly sampling around it, covering the feasible grasping space around the object.

[0054] Here a grasp point g∈G is represented as

[0055] $g = [x, y, z, q_x, q_y, q_z, q_w]$

[0056] This is the pose representation of the robot gripper established with the object coordinate system {O} as the reference coordinate system, as shown in Figure 3(a) and (b). Here x, y, and z describe the position of the origin of the robot end-effector coordinate system in the reference coordinate system {O}, and $q_x$, $q_y$, $q_z$, $q_w$ describe the orientation of the end-effector coordinate system relative to {O} as a quaternion.

[0057] For a given target object pose, $G_G$ is the set of grasp points that enable stable grasping of the target object, and $G_T$ is the set of grasp points that can accomplish the assembly task.

[0058] In this embodiment, $Q_G$ is used to evaluate how well a grasp point g achieves stable grasping and transport; when $Q_G > \varepsilon_G$, g is considered to belong to $G_G$. $Q_T$ is used to evaluate the ability of g to perform the assembly task; when $Q_T > \varepsilon_T$, g is considered to belong to $G_T$. Here $\varepsilon_G$ and $\varepsilon_T$ are hyperparameters that determine the robustness of task-oriented grasping.

[0059] To predict $Q_G$ and $Q_T$, data is collected in a simulation environment. For an object to be grasped that is placed in a random pose o∈O, the robotic arm executes each sampled grasp pose g∈G for N trials. After each grasp, a lifting command and a translational motion command are executed in sequence. If the object is transported to the specified height and does not fall during the subsequent movement, the grasp is considered successful, and the number of successful grasps n(o,g) is counted. For each successful grasp, an assembly task is simulated to verify the task relevance of the grasp point. If the gripper does not obstruct the assembly, the shaft and hole make contact in a suitable posture, and no force or torque exceeding the threshold is generated during assembly, the assembly is considered successful, and the number of successful assemblies n(o,g,T) is counted. After sufficient data has been collected, $P_G(o,g) = n(o,g)/N$ is calculated to measure the ability of a grasp point to achieve a stable grasp, and $P_{T|G}(T,o,g) = n(o,g,T)/n(o,g)$ is calculated to measure the ability of a grasp point to complete the assembly task T∈{T1,T2}. $P_T(T,o,g) = P_{T|G}(T,o,g) \cdot P_G(o,g)$ evaluates whether a grasp point can both grasp stably and complete the task.
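The three scores above follow directly from the simulation counts. The following is a minimal sketch of that arithmetic; the `GraspStats` container and the choice to return 0 for $P_{T|G}$ when no grasp succeeded are assumptions made for illustration and are not stated in the source.

```python
from dataclasses import dataclass


@dataclass
class GraspStats:
    """Counts collected in simulation for one (object pose o, grasp g) pair."""
    trials: int            # N: number of grasp attempts
    grasp_successes: int   # n(o, g): attempts that lifted and moved the object
    task_successes: int    # n(o, g, T): successful grasps that also assembled


def grasp_scores(stats: GraspStats):
    """Return (P_G, P_T|G, P_T) for one grasp point."""
    if stats.trials <= 0:
        raise ValueError("trials must be positive")
    p_g = stats.grasp_successes / stats.trials
    # P_T|G is undefined when no grasp succeeded; using 0 here is an assumption.
    if stats.grasp_successes > 0:
        p_t_given_g = stats.task_successes / stats.grasp_successes
    else:
        p_t_given_g = 0.0
    p_t = p_t_given_g * p_g
    return p_g, p_t_given_g, p_t


# Hypothetical counts: 20 attempts, 16 stable grasps, 4 successful assemblies.
p_g, p_t_given_g, p_t = grasp_scores(GraspStats(20, 16, 4))
```

Note that $P_T$ reduces to n(o,g,T)/N, so the product form mainly serves to separate grasp stability from task relevance.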

[0060] Figure 4 shows the self-supervised training method for the grasp point evaluation network, which uses the collected data to train the grasp pose evaluation model.

[0061] To reduce the difficulty of transferring the network from the simulation environment to reality, the vector consisting of the current pose o of the target object and the grasping point g to be evaluated is input into the network, instead of directly using the visual sensor information as the input to the model.

[0062] In the simulation environment, o can be read directly. In real-world scenarios, o can be obtained through a trained pose recognition network model. In this embodiment, FFB6D is used.

[0063] The stable grasp evaluation network predicts the grasp quality $Q_G(o,g)$. The prediction is then compared with the discretized grasp score $P_G(o,g)$ to calculate a softmax cross-entropy loss.
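As an illustration of this loss, the sketch below discretizes a score in [0, 1] into a class index and evaluates the softmax cross-entropy of a logit vector against it. The number of bins (`NUM_BINS`) and the equal-width binning rule are assumptions; the source does not state how the score is discretized.

```python
import math

NUM_BINS = 5  # hypothetical number of score classes


def discretize(score, num_bins=NUM_BINS):
    """Map a score in [0, 1] to a class index in [0, num_bins - 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(int(score * num_bins), num_bins - 1)


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical example: label from P_G = 0.8, logits from the network output.
label = discretize(0.8)
loss = softmax_cross_entropy([0.1, 0.2, 0.3, 0.9, 2.5], label)
```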

[0064] To alleviate the sparsity of positive samples, the task grasp evaluation network predicts $Q_{T|G}(T,o,g)$ instead of $Q_T(T,o,g)$, using the discretized task-related grasp score $P_{T|G}(T,o,g)$ as labels. $Q_T$ is then calculated using equation (1).

[0065] $Q_T(T,o,g) = Q_{T|G}(T,o,g) \cdot Q_G(o,g)$ (1)

[0066] The resulting $Q_T$ is used to determine whether an ideal grasp pose exists.

[0067] If there exists a g such that $Q_T(T,o,g) > \varepsilon_T$, then g is considered a grasp point capable of completing the assembly task for the object in its current pose; otherwise, task-oriented grasping is not possible under the current object pose. In that case, the pose adjustment module needs to be invoked to change the state of the target object.
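The threshold test can be sketched as follows. This is an illustrative reading of equation (1) and the ε_T check; the function names and the choice to return the highest-scoring grasp are assumptions, since the source only states that a grasp above the threshold must exist.

```python
def task_grasp_quality(q_t_given_g, q_g):
    """Equation (1): Q_T = Q_T|G * Q_G."""
    return q_t_given_g * q_g


def select_task_grasp(candidates, eps_t):
    """Pick the best grasp whose Q_T exceeds eps_t.

    candidates: iterable of (grasp_id, Q_T|G, Q_G) tuples.
    Returns the grasp_id, or None when no grasp passes the threshold,
    which is the case where pose adjustment would be invoked.
    """
    best_id, best_q = None, eps_t
    for grasp_id, q_t_given_g, q_g in candidates:
        q_t = task_grasp_quality(q_t_given_g, q_g)
        if q_t > best_q:
            best_id, best_q = grasp_id, q_t
    return best_id


# Hypothetical network outputs for two candidate grasps.
candidates = [("g0", 0.9, 0.5), ("g1", 0.6, 0.9)]
chosen = select_task_grasp(candidates, eps_t=0.5)
```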

[0068] Figure 5 illustrates the pose adjustment and assembly process for a target object. Panels (a), (b), (c), and (d) show pose adjustment through a grasp-place-grasp-place sequence, (e) shows the task-related grasp, and (f) shows the assembly task. By flexibly selecting the grasp point and placement action, the target object can be brought into different poses. The goal often cannot be achieved through a single grasp and place; a sequence of consecutive grasp-place actions is more common.

[0069] The attitude adjustment task can be modeled as a Markov decision process (MDP):

[0070] At the current time t, the agent selects an action $a_t$ according to the state $s_t$ and its own policy $\pi(s_t)$, then transitions to the new state $s_{t+1}$ and receives the corresponding reward. This forms a sequence of states, actions, and rewards.

[0071] The goal of the agent is to find an optimal policy $\pi^*$ that maximizes the long-term cumulative reward, in which future returns are discounted by the factor γ.

[0072] In this embodiment, a greedy deterministic policy $\pi(s_t)$ is trained using deep Q-learning. This policy selects the action that maximizes the action-value function $Q_\pi(s_t,a_t)$, where $Q_\pi(s_t,a_t)$ is defined as the expected reward the agent receives for taking action $a_t$ in state $s_t$. The learning objective is to minimize the error $\delta_t$ between $Q_\pi(s_t,a_t)$ and the target value $y_t$, where

[0073] $\delta_t = |Q(s_t,a_t) - y_t|$ (2)

[0074]
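The definition of the target value $y_t$ did not survive in this text. As a hedged illustration only, the sketch below evaluates equation (2) using the standard one-step Q-learning target, reward plus the discounted maximum next-state Q-value; that target form is an assumption and may differ from the one the patent defines.

```python
def td_target(reward, next_q_values, gamma, done):
    """Standard one-step Q-learning target (assumed form, not from the source)."""
    if done or not next_q_values:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_values)


def td_error(q_sa, target):
    """Equation (2): absolute error between Q(s_t, a_t) and y_t."""
    return abs(q_sa - target)


# Hypothetical numbers for a single transition.
y_t = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
delta_t = td_error(q_sa=2.0, target=y_t)
```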

[0075] A 3D vector is used to encode the different tasks. Together with the position and orientation of the target object, it forms a 10D vector that serves as the state space S of the MDP, where the state $s_t \in S$ at time t is

[0076] $s_t = [task_1, task_2, task_3, x, y, z, q_x, q_y, q_z, q_w]$

[0077] The task encoding is predefined. Pose information is generated by a trained FFB6D network.
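A minimal sketch of assembling this 10D state follows. The one-hot task codes in `TASK_CODES` are hypothetical; the source only says that the 3D task encoding is predefined.

```python
# Hypothetical predefined task codes (the actual values are not given).
TASK_CODES = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
}


def build_state(task, position, quaternion):
    """Concatenate task code, position (x, y, z) and quaternion (qx, qy, qz, qw)."""
    if len(position) != 3 or len(quaternion) != 4:
        raise ValueError("expected a 3D position and a 4D quaternion")
    return [*TASK_CODES[task],
            *(float(v) for v in position),
            *(float(v) for v in quaternion)]


# Example: task T1 with an object pose as it might come from the pose network.
state = build_state("T1", (0.1, 0.2, 0.3), (0.0, 0.0, 0.0, 1.0))
```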

[0078] In this embodiment, each action $a_t \in A$ is defined as a basic motion unit that includes a grasp and a placement. The basic motion unit is constructed as follows:

[0079] Grasping: The grasp pose g is sampled from the grasp pose set G. Because g is described relative to the target object, it is not affected by changes in the target object's pose. This fixes the position within the action space of every action $a_t$ that g helps construct, and avoids the confusion that differing descriptions would cause.

[0080] Before grasping, the description needs to be mapped to the robot's Cartesian space using equation (4):

[0081]

[0082] In equation (4), the grasp point pose is expressed as a homogeneous transformation matrix obtained by converting g. The other terms are the homogeneous transformation matrix describing the pose of the gripper in the robot's base coordinate system; the homogeneous transformation matrix describing the pose relationship between the camera coordinate system and the robot base coordinate system, obtained through hand-eye calibration; and the pose of the object coordinate system in the visual sensor coordinate system, obtained from the pose prediction network. The relationships between them are shown in Figure 6.
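Equation (4) itself is missing from this text. The sketch below shows one consistent reading of the four matrices described above, chaining base-from-camera, camera-from-object, and object-from-gripper transforms. The multiplication order, function names, and (x, y, z, w) quaternion order are assumptions based on the frame definitions, not a reproduction of the patent's formula.

```python
import math


def quat_to_rot(qx, qy, qz, qw):
    """Rotation matrix of a quaternion given in (x, y, z, w) order."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def pose_to_matrix(pose):
    """Convert [x, y, z, qx, qy, qz, qw] to a 4x4 homogeneous transform."""
    x, y, z, qx, qy, qz, qw = pose
    r = quat_to_rot(qx, qy, qz, qw)
    return [r[0] + [x], r[1] + [y], r[2] + [z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def grasp_in_base(t_base_cam, t_cam_obj, g):
    """Gripper pose in the base frame from calibration, object pose and grasp g."""
    return matmul(matmul(t_base_cam, t_cam_obj), pose_to_matrix(g))
```

For example, with a camera offset of 1 m along the base X-axis, an object 2 m along the camera Y-axis, and a grasp 3 m along the object Z-axis (all with identity orientation), the gripper position in the base frame is (1, 2, 3).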

[0083] Once the grasping pose is described in Cartesian space, the robot can perform the grasping action. However, if the trajectory to the grasping point is not constrained, the robot may disturb the target object during its movement.

[0084] Therefore, a pre-grasp point $g_{pre}$ is defined. As shown in Figure 7, its orientation is the same as that of the grasp pose, and its position lies at a safe distance l along the negative Z-axis of the grasp point. The homogeneous transformation matrix expressing the pose relationship between the two is therefore:

[0085]

[0086] After the pre-grasp pose is converted to Cartesian space, the robotic arm is controlled to reach it. The arm then moves forward a distance l along the Z-axis of the end-effector coordinate system to reach the grasp point, and the gripper is closed to complete the grasp.
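The pre-grasp construction can be sketched as below. Since the matrix in the text is missing, this follows only the verbal description: same orientation, position shifted by l along the negative Z-axis of the grasp frame. The function name and nested-list matrix layout are illustrative assumptions.

```python
def pre_grasp(t_grasp, l):
    """Return the pre-grasp transform for a 4x4 grasp transform.

    Equivalent to right-multiplying t_grasp by a pure translation (0, 0, -l):
    the rotation block is unchanged and the position moves back along the
    grasp frame's own Z-axis.
    """
    t_pre = [row[:] for row in t_grasp]
    for i in range(3):
        # Column 2 of the rotation block is the grasp Z-axis in the parent frame.
        t_pre[i][3] = t_grasp[i][3] - l * t_grasp[i][2]
    return t_pre


# Hypothetical top-down grasp: gripper Z-axis points along world -Z.
t_grasp = [
    [1.0, 0.0, 0.0, 0.40],
    [0.0, -1.0, 0.0, 0.00],
    [0.0, 0.0, -1.0, 0.10],
    [0.0, 0.0, 0.0, 1.0],
]
t_pre = pre_grasp(t_grasp, l=0.05)
```

In this example the pre-grasp point sits 5 cm above the grasp point, so the final approach is a straight descent along the gripper axis.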

[0087] Placement: The placement action of the robotic arm is predefined as place∈P, $P = \{place_{vert}, place_{hori}\}$. It comprises two types of action, a downward admittance motion in a horizontal posture and in a vertical posture, as shown in Figure 5(b) and (d).

[0088] It is worth noting that the trajectory of the placement action is described directly in the robot's Cartesian coordinate system, without the need for coordinate transformation, which is different from the way the grasping posture is described.

[0089] The action space is defined as A = G × P, and an action $a_t \in A$ is:

[0090] $a_t = (g, place),\quad g \in G,\ place \in P$ (6)

[0091] In practice, only part of this action set is available in a given state. The set of available grasp points can be obtained by filtering with the stable grasp evaluation network.

[0092] In reinforcement learning, unavailable actions are kept from being selected by reducing their value. The specific method is shown in Figure 8: when the stable grasp evaluation network determines that the grasp pose g cannot achieve a stable grasp, the actions $(g, place_{vert})$ and $(g, place_{hori})$ are considered unavailable.

[0093] For each state $s_t$, an action mask $M(s_t,a_t)$ is constructed from the output of the stable grasp evaluation network. The agent uses $M(s_t,a_t) \times Q(s_t,a_t)$, rather than $Q(s_t,a_t)$ alone, as the basis for selecting an action. This ensures that the selected action can be executed stably during grasping and placing, reduces the time spent on trial and error during training, and proved feasible in the experiments.
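A minimal sketch of the mask follows, assuming a flat action index ordered grasp-major (every grasp followed by its placements) and a binary mask. Both the layout and the function names are assumptions. Note that a multiplicative mask only suppresses actions reliably when valid Q-values are positive; that is a property of this sketch, not a statement from the source.

```python
def action_mask(q_g_per_grasp, eps_g, num_places=2):
    """Binary mask over A = G x P built from stable-grasp scores Q_G."""
    mask = []
    for q_g in q_g_per_grasp:
        allowed = 1.0 if q_g > eps_g else 0.0
        mask.extend([allowed] * num_places)  # same flag for every placement of g
    return mask


def select_masked_action(q_values, mask):
    """Greedy choice over M(s, a) * Q(s, a)."""
    scored = [m * q for m, q in zip(mask, q_values)]
    return max(range(len(scored)), key=scored.__getitem__)


# Hypothetical values: three grasps, two placements each.
mask = action_mask([0.9, 0.2, 0.7], eps_g=0.5)
best = select_masked_action([0.1, 0.3, 0.9, 0.8, 0.4, 0.2], mask)
```

Here the highest raw Q-value (index 2) belongs to an unstable grasp, so the masked selection falls to index 4 instead.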

[0094] During execution, the robot receives a discrete action value, decodes it into two parts, the grasp pose g and the placement action place, and executes the grasp and placement commands in sequence.
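Decoding a discrete action value into its two parts might look like the sketch below. The grasp-major index layout and the placement names are assumptions chosen for illustration.

```python
PLACEMENTS = ("place_vert", "place_hori")  # order is an assumption


def encode_action(grasp_index, place_index, num_places=len(PLACEMENTS)):
    """Flatten a (grasp, placement) pair into one discrete action value."""
    return grasp_index * num_places + place_index


def decode_action(action, num_grasps, num_places=len(PLACEMENTS)):
    """Split a discrete action value into (grasp_index, placement_name)."""
    if not 0 <= action < num_grasps * num_places:
        raise ValueError("action index out of range")
    grasp_index, place_index = divmod(action, num_places)
    return grasp_index, PLACEMENTS[place_index]


# Example: action 5 in a space built from 4 sampled grasps.
grasp_index, placement = decode_action(5, num_grasps=4)
```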

[0095] After each grasp-and-place action, the pose of the target object changes, and the grasp evaluation module calculates $Q_T$ for each grasp point in the new state. The agent is rewarded according to whether high-value task-oriented grasp points exist. The reward function guides the agent to complete the pose adjustment task in as few steps as possible and is defined as:

[0096]

[0097] Here $step_{max}$ is the maximum number of steps allowed in a pose adjustment task, the step count is the number of steps taken to bring the target object to the ideal pose, and λ is a scaling factor.

[0098] Fully connected layers are used to fit the policy $\pi(s_t)$. The network maps the state space to the action space and predicts the Q-value of each basic motion unit. The agent selects actions with an ε-greedy policy to balance exploration and exploitation.
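The following sketch pairs a tiny fully connected Q-network with ε-greedy selection. The two-layer shape, ReLU activation, and parameter dictionary are assumptions made for illustration; the source gives no layer sizes or activation functions.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def q_network(state, params):
    """Two fully connected layers mapping a state to one Q-value per action."""
    hidden = relu(linear(state, params["w1"], params["b1"]))
    return linear(hidden, params["w2"], params["b2"])


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Toy parameters for a 2D state and 2 actions (real state is 10D).
toy_params = {
    "w1": [[1.0, 0.0], [0.0, -1.0]], "b1": [0.0, 0.0],
    "w2": [[2.0, 0.0], [0.0, 3.0]], "b2": [0.5, 0.5],
}
toy_q = q_network([1.0, 2.0], toy_params)
toy_action = epsilon_greedy(toy_q, epsilon=0.0, rng=random.Random(0))
```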

[0099] In summary, the working framework of the pose adjustment module is shown in Figure 8. During training, a target object appears in the workspace in a random pose and is randomly assigned an assembly task type. Guided by the action-value prediction network, the robot adjusts the target object's pose. Each adjusted pose is evaluated by the grasp evaluation module, which generates the corresponding reward. When the object reaches the ideal pose or the operation reaches its step limit, the object is placed back into the workspace in a random pose and a new task type is assigned. Throughout this process, the stable grasp evaluation network prunes the action space.

[0100] In step 4, the assembly task can be summarized into four stages: posture matching, approach, hole finding, and hole insertion.

[0101] For a given task scenario, the pose that the target object must maintain during assembly is unique. Once a stable grasp of the object is achieved, the relative pose between the robotic arm's end effector and the target object can be obtained directly from the grasp point information. From these two quantities, the end-effector pose required for the target object to reach its assembly pose is calculated, which completes the posture matching task.

[0102] During the approach process, after the robot obtains the approximate assembly position (which can be acquired through vision sensors or teaching), it moves the target object above the assembly position using position control. During the movement, the end-effector coordinate system keeps the orientation determined in the posture matching stage. Because of positioning errors or the limited accuracy of robot position control, shaft-hole alignment cannot be guaranteed during the approach. Force control therefore guides the robot downwards along the Z-axis, and once the object being assembled and the mating part come into contact, the desired contact force is maintained. After contact, the target object moves along a predetermined trajectory to locate the hole; the trajectory is an Archimedean spiral.
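As an illustration of the hole-search trajectory, the sketch below generates planar waypoints on an Archimedean spiral $r = b\theta$ centred on the contact point. The parameter names, the pitch-based parameterization, and the example values are assumptions; the source states only that the trajectory is an Archimedean spiral.

```python
import math


def spiral_search_waypoints(pitch, max_radius, step_angle):
    """Planar (x, y) offsets along the spiral r = (pitch / 2*pi) * theta.

    pitch: radial distance gained per full turn.
    max_radius: search stops once the spiral would exceed this radius.
    step_angle: angular spacing between consecutive waypoints, in radians.
    """
    if pitch <= 0 or max_radius < 0 or step_angle <= 0:
        raise ValueError("pitch and step_angle must be positive, max_radius non-negative")
    b = pitch / (2.0 * math.pi)
    waypoints = []
    theta = 0.0
    while b * theta <= max_radius:
        r = b * theta
        waypoints.append((r * math.cos(theta), r * math.sin(theta)))
        theta += step_angle
    return waypoints


# Hypothetical values: 2 mm pitch, 10 mm search radius, 16 points per turn.
search_path = spiral_search_waypoints(0.002, 0.010, math.pi / 8)
```

Each waypoint would be added to the contact position while the downward contact force is maintained, until the peg drops into the hole.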

[0103] After the object to be assembled is moved to the assembly position, it will enter the insertion process under the force control along the assembly axis. During this stage, the robot uses admittance control in other directions. When the sensor detects a force or torque on the end effector, it will guide the end effector to rotate accordingly to prevent jamming during the insertion process.
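The source does not give the admittance control law. As a hedged sketch, the code below integrates a common single-axis mass-damper admittance model, in which a measured force error is turned into a commanded velocity. The model form, the explicit Euler integration, and all parameter values are assumptions.

```python
def admittance_step(velocity, force_error, mass, damping, dt):
    """One explicit Euler step of  mass * dv/dt + damping * v = force_error."""
    acceleration = (force_error - damping * velocity) / mass
    return velocity + dt * acceleration


def simulate_admittance(force_error, mass, damping, dt, steps):
    """Commanded velocity after holding a constant force error for `steps` steps."""
    velocity = 0.0
    for _ in range(steps):
        velocity = admittance_step(velocity, force_error, mass, damping, dt)
    return velocity


# Hypothetical virtual mass 1 kg, damping 10 N*s/m, 2 N sustained force error.
steady_velocity = simulate_admittance(2.0, mass=1.0, damping=10.0, dt=0.01, steps=1000)
```

With a sustained force error the commanded velocity settles at force_error / damping, so a larger virtual damping makes the end effector yield more slowly to contact forces. The same form applies to torque and angular velocity for the rotational compliance described above.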

[0104] To verify the technical solution of this invention, simulation and verification were performed.

[0105] First, the experimental scenario was set up. This invention was tested in the PyBullet simulation environment. The goal was to control a robotic arm to complete the assembly task of a target object. The experimental scenario is shown in Figure 9. Two downstream assembly tasks were set up: (a) controlling the power adapter to insert the plug into the junction box, and (b) controlling the power adapter to place the base into the groove with the plug facing upwards. Initially, the power adapter was placed on the worktable in a random pose, similar to a typical everyday scenario. Each task involved two cases: if a suitable grasp point existed in the initial state, the robotic arm performed the grasp and assembly directly; if no suitable grasp point existed in the initial state, the robotic arm adjusted the pose of the target object through a sequence of consecutive grasping and placing actions until an ideal grasp point was generated, and then performed the grasp and assembly. Each grasping, placing, or assembling action of the robotic arm counted as one operation, and completing the assembly within 20 operations counted as a successful task.

[0106] Then, the experimental results were analyzed. To verify the role of the pose adjustment strategy in task-related grasping, the framework was compared with the following three benchmarks: (1) A task-related grasping framework without a pose adjustment process: after the task-related grasp evaluation network scores the candidate grasp poses, the grasp pose with the highest score is executed directly. (2) A task-related grasping framework in which pose adjustment is achieved by random grasping and placement: the pose adjustment action is not generated by a dedicated policy network but is selected at random from the grasp poses and placement actions that remain after filtering by the stable grasp evaluation network. (3) A task-related grasping framework with a pose adjustment policy network but without the action space mask.

[0107] The experimental results are shown in Table 1. For task-related grasping in such complex scenarios, the framework of this invention outperforms all of the benchmark settings. During testing, noise occasionally caused events such as the target object falling out of the gripper while the robotic arm was moving. As long as the object does not fall outside the workspace, the agent still generates reasonable actions from the new pose, and the only effect on the final result is an increase in the number of operations needed to complete the task.

[0108] Table 1 Test Results

[0109]

[0110]

[0111] The other benchmark algorithms fail to complete the task effectively because each lacks a necessary component. Framework (1), which has no pose adjustment module, can complete the task through a single grasp-and-assemble sequence only when the target object happens to be placed in a suitable initial pose, and therefore lacks generality. Framework (2), which adjusts the pose by random grasping and placement, fails mostly because it exceeds the maximum number of operations. Increasing the maximum number of operations would foreseeably improve its success rate, but the long execution time limits its practical value. Notably, scheme (3), which does not optimize the action space, performs only about as well as scheme (2). This is attributed to the scarcity of high-value actions, which makes the sparse positive rewards difficult to discover during training. Choosing a better-performing deep reinforcement learning algorithm or designing a more elaborate reward function might help with this problem, but that is not the focus of this study.
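To illustrate why restricting the action space helps when high-value actions are scarce, the following is a minimal sketch of action masking with epsilon-greedy selection. It assumes a flattened action index of the form `grasp_index * n_placements + placement_index`, a stability threshold of 0.5, and the function names shown; none of these are stated in the source, which only says that the mask is built from the output of the stable grasp evaluation network.

```python
import numpy as np


def build_action_mask(stable_scores, n_placements, threshold=0.5):
    """Boolean mask over the flattened (grasp, placement) action space.

    A grasp whose predicted stability is at or below the assumed threshold
    disables every action that uses that grasp.
    """
    graspable = np.asarray(stable_scores) > threshold
    return np.repeat(graspable, n_placements)


def select_action(q_values, mask, epsilon=0.0, rng=None):
    """Epsilon-greedy selection restricted to unmasked actions."""
    if rng is None:
        rng = np.random.default_rng()
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("no valid action in this state")
    if rng.random() < epsilon:
        # Exploration only samples actions that pass the mask.
        return int(rng.choice(valid))
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))
```

With masking, both exploration and exploitation are confined to grasps predicted to be stable, so fewer training steps are spent on actions that cannot lead to a positive reward.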

[0112] Example 2

[0113] This embodiment provides a robotic arm grasping and assembly control system for assembly tasks, including:

[0114] The pose prediction module is used to predict the pose of the object to be grasped and obtain a set of candidate grasping point poses.

[0115] The grasping evaluation and posture adjustment module is used to evaluate the candidate grasping posture set according to the assembly task type. If there is a grasping posture with a score higher than the threshold, the assembly action is executed; otherwise, the pose adjustment module is called to adjust the posture of the target object until a grasping posture with a score higher than the threshold is obtained and the assembly action is executed.

[0116] The step of invoking the pose adjustment module to adjust the pose of the target object includes:

[0117] The candidate grasp point pose set and the encoding of the assembly task constitute the state information;

[0118] The posture adjustment task is modeled as a Markov decision process. Through a deep reinforcement learning policy network, the state information is mapped into a sequence of consecutive grasping and placing actions. Based on this action sequence, the robotic arm is controlled to grasp and place the object, adjusting the object to a posture suitable for the corresponding assembly task.
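The composition of the state information described above can be sketched as follows. This is an illustrative assumption only: the source does not specify how poses or tasks are encoded, so the 7-value pose (position plus quaternion), the fixed number of candidate slots `max_grasps`, the one-hot task code, and the name `build_state` are all hypothetical.

```python
import numpy as np


def build_state(grasp_poses, task_id, n_tasks=2, max_grasps=8):
    """Concatenate candidate grasp poses and a task code into one vector.

    grasp_poses: sequence of 7-value poses (x, y, z, qx, qy, qz, qw); assumed.
    task_id: index of the assembly task, encoded one-hot; assumed.
    """
    poses = np.zeros((max_grasps, 7), dtype=np.float32)
    if len(grasp_poses) > 0:
        given = np.asarray(grasp_poses, dtype=np.float32).reshape(-1, 7)
        given = given[:max_grasps]          # drop candidates beyond the slots
        poses[: len(given)] = given         # unused slots stay zero-padded
    task = np.zeros(n_tasks, dtype=np.float32)
    task[task_id] = 1.0
    return np.concatenate([poses.ravel(), task])
```

A fixed-length vector of this kind is one common way to feed a variable number of candidates to a policy network; other encodings would serve equally well.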

[0119] Example 3

[0120] This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in the robotic arm grasping and assembly control method for assembly tasks as described above.

[0121] Example 4

[0122] This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in the robotic arm grasping and assembly control method for assembly tasks as described above.

[0123] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of hardware embodiments, software embodiments, or embodiments combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0124] The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0125] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0126] These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0127] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0128] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A robotic arm grasping and assembly control method for assembly tasks, characterized in that it comprises:

predicting the pose of the object to be grasped to obtain a set of candidate grasp point poses;

evaluating the candidate grasp pose set using a grasp point evaluation model, wherein the evaluation criteria for grasp points differ between task stages: in the stage of adjusting the posture of the target object through grasping and placement, a stable grasp evaluation network is used to predict the grasp quality, and the predicted grasp quality is compared with the discretized grasp score to calculate a softmax cross-entropy loss; when grasping before assembly, additional physical and semantic constraints specific to the current task must also be satisfied, a task grasp evaluation network is used to predict the ability to complete the assembly task, and the discrete task-related grasp score is used as the label;

wherein the training process of the grasp point evaluation model includes: for an object to be grasped that is placed in a randomly given pose, causing the robotic arm to repeatedly execute grasp poses obtained by sampling; after each grasp, executing a lifting motion command and a translation motion command in sequence; if the object is carried to the designated height and does not fall during the subsequent movement, the grasp is considered successful, and the number of successful grasps is counted; for each successful grasp, simulating the assembly task in simulation to verify the task relevance of that grasp point; if the gripper does not obstruct the assembly, the shaft and hole make contact in a suitable orientation, and no force or torque exceeding a threshold is generated during the assembly process, the assembly is considered successful, and the number of successful assembly tasks is counted; calculating a measure of the ability of a grasp point to achieve stable grasping, calculating a measure of the ability of a grasp point to complete the assembly task, and adopting a measure to evaluate whether a grasp point can both grasp stably and complete the task;

evaluating the candidate grasp pose set according to the assembly task type; if there is a grasp pose with a score higher than the threshold, executing the assembly action; otherwise, invoking the pose adjustment module to adjust the pose of the target object until a grasp pose with a score higher than the threshold is obtained, and then executing the assembly action;

wherein the step of invoking the pose adjustment module to adjust the pose of the target object includes: the candidate grasp point pose set and the encoding of the assembly task constitute the state information; for different states, an action mask is constructed based on the output of the stable grasp evaluation network, and the agent uses the corresponding value as the basis for selecting the action; the posture adjustment task is modeled as a Markov decision process, and, through a deep reinforcement learning policy network, the state information is mapped into a sequence of grasping and placing actions; based on this action sequence, the robotic arm is controlled to grasp and place the object, adjusting the object to a posture suited to the corresponding assembly task;

wherein a self-supervised learning method is used to train part of the networks, with training labels generated automatically in a simulation environment from the results of robot operations.
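The self-supervised labeling step recited in claim 1 can be illustrated with a minimal sketch that turns counted successes into a discretized label and evaluates a softmax cross-entropy loss against it. The formulas for the grasp measures are not legible in this text, so the plain success ratio, the five score bins, and all function names below are assumptions made for illustration and are not the claimed formulas.

```python
import numpy as np


def success_rate(successes, trials):
    """Empirical success ratio; which counts form each measure is assumed."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def discretize(rate, n_bins=5):
    """Map a rate in [0, 1] to a class index in [0, n_bins - 1]."""
    return int(min(n_bins - 1, np.floor(rate * n_bins)))


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()          # subtract max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])
```

For example, 3 successful grasps out of 10 simulated attempts give a rate of 0.3, which falls into class 1 of 5 under the assumed binning; the evaluation network's logits for that grasp point would then be trained against class 1.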

2. The robotic arm grasping and assembly control method for assembly tasks as described in claim 1, characterized in that, during training, a target object appears in the workspace with a random pose and an assembly task type is randomly assigned; guided by the action-value prediction network, the robot adjusts the pose of the target object; and the pose after each adjustment is judged by the grasp evaluation module, which generates a corresponding reward.

3. The robotic arm grasping and assembly control method for assembly tasks as described in claim 1, characterized in that the process of controlling the robotic arm to grasp and place objects based on this action sequence includes: before grasping, mapping the grasp pose description into the robot's Cartesian space; after the description of the grasp pose in Cartesian space is obtained, performing the grasping action; defining a grasping preparation point and, based on the spatial relationship between the grasping preparation point and the grasp pose, obtaining the homogeneous transformation matrix of the pose relationship between the two; based on this homogeneous transformation matrix, transforming the pose of the grasping preparation point into Cartesian space, controlling the robotic arm to reach that pose, and then moving the robotic arm along the Z-axis of the end-effector coordinate system to reach the grasp point; the predefined placement actions of the robotic arm include two types, horizontal and vertically downward admittance motion; and the trajectory of each placement action is described in the robot's Cartesian coordinate system.
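The relationship between the grasping preparation point and the grasp pose recited in claim 3 can be illustrated with homogeneous transforms. The sketch below assumes that the preparation point lies a fixed standoff distance behind the grasp point along the end-effector Z-axis; the standoff value of 0.10 m, the number of waypoints, and the function names are hypothetical.

```python
import numpy as np


def pregrasp_pose(T_base_grasp, standoff=0.10):
    """Pose of the preparation point in the robot base (Cartesian) frame.

    T_base_grasp: 4x4 homogeneous transform of the grasp pose in the base frame.
    The offset is expressed in the grasp (end-effector) frame, so it is
    applied by right-multiplication.
    """
    T_grasp_pre = np.eye(4)
    T_grasp_pre[2, 3] = -standoff
    return T_base_grasp @ T_grasp_pre


def approach_waypoints(T_base_grasp, standoff=0.10, steps=5):
    """Poses from the preparation point to the grasp point along local Z."""
    waypoints = []
    for d in np.linspace(-standoff, 0.0, steps):
        T_offset = np.eye(4)
        T_offset[2, 3] = d
        waypoints.append(T_base_grasp @ T_offset)
    return waypoints
```

For a top-down grasp whose Z-axis points toward the table, the preparation point computed this way lies directly above the grasp point, and the final waypoint coincides with the grasp pose.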

4. The robotic arm grasping and assembly control method for assembly tasks as described in claim 1, characterized in that, during execution, a discrete action value is received and decoded into two parts, a grasp pose and a placement action, and the grasping and placement commands are executed in sequence; after each grasp-and-place action, the pose of the target object changes; and the grasp evaluation module calculates the grasp score of each grasp point in the new state and rewards the agent according to whether high-value task-oriented grasp points exist.
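The decoding of a discrete action value into a grasp pose and a placement action can be sketched as follows. The flat index layout, the default of two placement actions, the 0.5 score threshold, and the 1.0/0.0 reward values are assumptions chosen for illustration; the source only states that the action is decoded into two parts and that the reward depends on whether high-value task-oriented grasp points exist.

```python
def encode_action(grasp_idx, place_idx, n_placements=2):
    """Flatten a (grasp, placement) pair into one discrete action value."""
    return grasp_idx * n_placements + place_idx


def decode_action(action, n_placements=2):
    """Split a discrete action value into (grasp index, placement index)."""
    return divmod(action, n_placements)


def reward(task_scores, threshold=0.5):
    """Assumed sparse reward: 1.0 if any task-oriented grasp score is high."""
    return 1.0 if any(s > threshold for s in task_scores) else 0.0
```

With three placement actions, for instance, action value 7 decodes to grasp index 2 and placement index 1, and encoding that pair returns 7.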

5. The robotic arm grasping and assembly control method for assembly tasks as described in claim 1, characterized in that the assembly task execution process includes four stages: posture matching, approach, hole finding, and insertion.

6. A robotic arm grasping and assembly control system for assembly tasks, employing the robotic arm grasping and assembly control method for assembly tasks as described in any one of claims 1-5, characterized in that it comprises: a pose prediction module, used to predict the pose of the object to be grasped and obtain a set of candidate grasp point poses; and a grasp evaluation and posture adjustment module, used to evaluate the candidate grasp pose set according to the assembly task type, wherein, if there is a grasp pose with a score higher than the threshold, the assembly action is executed, and otherwise the pose adjustment module is invoked to adjust the pose of the target object until a grasp pose with a score higher than the threshold is obtained and the assembly action is executed; wherein the step of invoking the pose adjustment module to adjust the pose of the target object includes: the candidate grasp point pose set and the encoding of the assembly task constitute the state information; the posture adjustment task is modeled as a Markov decision process, and, through a deep reinforcement learning policy network, the state information is mapped into a sequence of consecutive grasping and placing actions; based on this action sequence, the robotic arm is controlled to grasp and place the object, adjusting the object to a posture suitable for the corresponding assembly task.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps in the robotic arm grasping and assembly control method for assembly tasks as described in any one of claims 1-5.

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps in the robotic arm grasping and assembly control method for assembly tasks as described in any one of claims 1-5.