Autonomous Operation Decision-Making Methods for Harvesting Robots
By constructing fruit and branch models in a virtual scene, determining the target picking point and plane, and using a reward function for reinforcement learning, the problem of the picking robot rubbing against the fruit and branches during its operation was solved, thus achieving efficient picking operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INTELLIGENT EQUIPMENT RESEARCH CENTER BEIJING ACADEMY OF AGRICULTURE AND FORESTRY SCIENCES
- Filing Date
- 2023-09-07
- Publication Date
- 2026-06-30
AI Technical Summary
Existing harvesting robots are prone to rubbing against fruits and branches during operation, leading to harvesting failures.
The method constructs a virtual scene from sample images of fruits and branches, determines the target picking point and target picking plane of the end effector, performs reinforcement learning training with a reward function to determine the optimal picking action function, and then executes the picking task in a real environment.
This improved the success rate of harvesting, avoided damage to fruits and branches, and reduced the cost and risk of trial and error in experiments.
Figure CN117621046B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of robotic arm technology, and in particular to a method for autonomous operation decision-making of a harvesting robotic arm.
Background Technology
[0002] Developing a fully automated harvesting robot to replace or assist manual harvesting of fruits is an effective way to reduce production costs and improve industrial efficiency.
[0003] Currently, harvesting robots primarily target fruits supported by stems. Such fruits grow in highly random positions, and the shapes of fruits and stems vary greatly, so the robot must manipulate plant tissue precisely in a highly unstructured environment. Because of the complex shapes of fruits and stems, the robot is prone to rubbing against and damaging fruits and branches during harvesting, which can easily lead to harvesting failure.
Summary of the Invention
[0004] This invention provides an autonomous operation decision-making method for a harvesting robot, which solves the problem in the prior art that the robot easily rubs against the fruit and branches during operation, causing damage to the fruit and branches and resulting in harvesting failure.
[0005] This invention provides a method for autonomous operation decision-making of a harvesting robot, wherein the harvesting robot is equipped with an end effector for picking fruits from branches;
[0006] The autonomous operation decision-making method for the harvesting robot includes:
[0007] Collect sample images of fruits and branches to construct multiple virtual scenes; each virtual scene includes a model of a harvesting robot, a model of fruits, and a model of branches.
[0008] In the virtual scene, the target picking point and target picking plane of the end effector model are determined, and the orientation information of the target picking point and target picking plane is used as parameters input into the reward function;
[0009] The optimal picking action function is determined by performing reinforcement learning training on the picking action process of the picking robot in multiple virtual scenarios based on the reward function.
[0010] Based on the optimal picking action function, the picking robot is controlled to perform picking tasks in the actual environment.
[0011] According to the autonomous operation decision-making method for a harvesting robot provided by the present invention, the step of performing reinforcement learning training on the harvesting action flow of the harvesting robot in multiple virtual scenarios based on a reward function to determine the optimal harvesting action function includes:
[0012] Repeatedly simulate the picking action process of the harvesting robot in multiple virtual scenarios;
[0013] During the simulation, the motion information of the harvesting robot model is acquired, and the motion information is input into the reward function to calculate the total reward value;
[0014] Based on the total reward value, determine the optimal picking action process for each virtual scene;
[0015] Based on the virtual scene and the corresponding optimal picking action process, determine the optimal picking action function.
[0016] According to the autonomous operation decision-making method for the harvesting robot provided by the present invention, the motion information includes the position and orientation of the end effector model, whether the harvesting robot model collides with other models, and the motion parameters of each joint of the harvesting robot model;
[0017] The step of acquiring motion information of the harvesting robot model during the simulation and inputting the motion information into the reward function to calculate the total reward value includes:
[0018] Obtain the position of the end effector model, and calculate the proximity reward based on the position of the end effector model and the position of the target picking point;
[0019] Obtain the orientation of the end effector model, and calculate the picking posture reward based on the target picking plane and the orientation and position of the end effector model;
[0020] Obtain the motion parameters of each joint of the harvesting robot model and calculate the smooth trajectory reward;
[0021] Calculate obstacle avoidance reward based on whether the harvesting robot model collides with other models;
[0022] The total reward value is calculated based on the reward for approaching the target, the reward for picking posture, the reward for smooth trajectory, and the reward for obstacle avoidance.
[0023] According to the autonomous operation decision-making method for a harvesting robot provided by the present invention, the step of calculating the proximity reward based on the position of the end effector model and the position of the target harvesting point includes:
[0024] Calculate the first distance between the end effector model and the target picking point based on the position of the end effector model and the position of the target picking point;
[0025] When the first distance is greater than the first preset distance, the reward for getting closer to the target is calculated based on the first preset distance and the first distance.
[0026] When the first distance is less than or equal to the first preset distance, the reward for approaching the target is calculated based on the first distance.
[0027] According to the autonomous operation decision-making method for a harvesting robot provided by the present invention, the step of calculating the harvesting posture reward based on the orientation and position of the target harvesting plane and the end effector model includes:
[0028] Set up a target picking area that surrounds the target picking point;
[0029] When the end effector model is outside the target picking area, the picking posture reward is equal to the first negative constant.
[0030] When the end effector model is located within the target picking area, the picking posture reward is calculated based on the orientation of the end effector model and the orientation of the target picking plane.
[0031] According to the autonomous operation decision-making method for a harvesting robot provided by the present invention, the step of calculating obstacle avoidance reward based on whether the harvesting robot model collides with other models includes:
[0032] When the harvesting robot model collides with other models, the obstacle avoidance reward function is equal to the second negative constant.
[0033] When the harvesting robot model does not collide with other models, the obstacle avoidance reward function is zero.
[0034] According to the autonomous operation decision-making method for the harvesting robot provided by the present invention, the target harvesting area is located on the side of the branch model facing the harvesting robot model.
[0035] According to the autonomous operation decision-making method for harvesting robots provided by the present invention, the branch includes a fruit stalk and a main stem, and the two ends of the fruit stalk are respectively connected to the fruit and the main stem;
[0036] The branch model includes a first column and a second column; the first column has the same size and extension direction as the main stem and is used to simulate the main stem; the second column has the same size and extension direction as the fruit stalk and is used to simulate the fruit stalk.
[0037] The fruit model includes a third column, the size and extension direction of which are the same as those of the fruit, and is used to simulate the fruit.
[0038] According to the autonomous operation decision-making method for the harvesting robot provided by the present invention, the target harvesting point is on the second column; the target harvesting plane passes through the target harvesting point, the target harvesting plane is perpendicular to the plane where the branch model is located, and is parallel to the first column.
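The geometric definition of the target harvesting plane above can be sketched numerically. The following is a minimal illustration, assuming the plane "where the branch model is located" is the plane spanned by the axes of the first column (main stem) and the second column (fruit stalk). The function names and the NumPy representation are hypothetical and do not come from the patent.

```python
import numpy as np

def target_plane_normal(main_stem_dir, stalk_dir):
    """Unit normal of a plane that is parallel to the main stem axis and
    perpendicular to the plane spanned by the main stem and the fruit stalk.
    Assumption: both inputs are 3D direction vectors of the two columns."""
    u = np.asarray(main_stem_dir, dtype=float)
    v = np.asarray(stalk_dir, dtype=float)
    m = np.cross(u, v)  # normal of the plane containing the branch model
    if np.linalg.norm(m) < 1e-9:
        raise ValueError("main stem and fruit stalk directions must not be parallel")
    n = np.cross(u, m)  # orthogonal to u and m, so the target plane contains both
    return n / np.linalg.norm(n)

def signed_distance_to_plane(point, plane_point, normal):
    """Signed distance from point to the plane through plane_point."""
    diff = np.asarray(point, dtype=float) - np.asarray(plane_point, dtype=float)
    return float(np.dot(diff, normal))
```

For a vertical main stem and a horizontal fruit stalk, the resulting plane passes through the target harvesting point and cuts across the stalk, which matches the intent that the end effector approaches without touching the main stem.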
[0039] According to the autonomous operation decision-making method for harvesting robots provided by the present invention, the step of controlling the harvesting robot to perform harvesting tasks in a real environment based on the optimal harvesting action function includes:
[0040] Based on the images of the fruit and branches, determine the input parameters of the optimal picking action function;
[0041] Input the input parameters into the optimal action function to obtain the optimal picking action process;
[0042] Based on the optimal harvesting procedure, the harvesting robot is controlled to complete the harvesting.
[0043] The autonomous operation decision-making method for the harvesting robot of this invention constructs a virtual scene including a fruit model, a branch model, and a harvesting robot model based on images of the fruit and branches. This simulates the actual harvesting scenario, making it more flexible and efficient, and reducing the cost and risk of trial and error in actual harvesting scenarios. By determining the target harvesting point and target harvesting plane of the end effector model and inputting them into a reward function, the reward function can incentivize and guide the end effector model to reach the target harvesting point with the desired harvesting posture during reinforcement learning training. Finally, the harvesting robot's harvesting action process is trained using reinforcement learning based on the reward function to determine the optimal harvesting action function. The optimal harvesting action function can provide guidance and reference for the actual harvesting action of the harvesting robot, enabling it to reach the desired harvesting position with the desired harvesting posture and complete the harvesting, avoiding collision damage to the fruit and branches, thereby improving the harvesting success rate. This solves the problem in the prior art where robots easily rub against and damage the fruit and branches during operation, leading to harvesting failure.
Attached Figure Description
[0044] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0045] Figure 1 is the first flowchart of the autonomous operation decision-making method for the harvesting robot provided in an embodiment of the present invention;
[0046] Figure 2 is a schematic diagram of a virtual scene provided in an embodiment of the present invention;
[0047] Figure 3 is a schematic diagram of the fruit model and branch model provided in an embodiment of the present invention;
[0048] Figure 4 is the second flowchart of the autonomous operation decision-making method for the harvesting robot provided in an embodiment of the present invention;
[0049] Figure 5 is a schematic diagram of the autonomous operation decision-making system for the harvesting robot provided in an embodiment of the present invention;
[0050] Figure 6 is a schematic diagram of the autonomous operation decision-making system for the harvesting robot provided in an embodiment of the present invention;
[0051] Reference numerals:
[0052] 1. Fruit model; 2. Branch model; 3. Harvesting robot model; 4. Target harvesting point; 5. Target harvesting plane;
[0053] 11. Third column; 21. First column; 22. Second column; 31. End effector model;
[0054] 510. Scene Construction Module; 520. First Determination Module; 530. Reinforcement Learning Module; 540. Execution Module;
[0055] 610. Processor; 620. Communication interface; 630. Memory; 640. Communication bus.
Detailed Implementation
[0056] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0057] The autonomous operation decision-making method for a harvesting robot provided by the present invention is described below with reference to Figures 1 to 4.
[0058] As shown in Figures 1 to 3, harvesting robots are typically equipped with end effectors to pick fruits off branches.
[0059] The autonomous operation decision-making method for harvesting robots provided by this invention includes the following steps:
[0060] Step S101: Collect sample images of fruits and branches to construct multiple virtual scenes; each virtual scene includes a picking robot model, a fruit model, and a branch model.
[0061] Step S102: In the virtual scene, determine the target picking point and target picking plane of the end effector model, and input the orientation information of the target picking point and target picking plane as parameters into the reward function.
[0062] Step S103: Based on the reward function, perform reinforcement learning training on the picking action process of the picking robot in multiple virtual scenarios to determine the optimal picking action function.
[0063] Step S104: Based on the optimal picking action function, control the picking robot to perform the picking task in the actual environment.
[0064] In this embodiment, the autonomous operation decision-making method for the harvesting robot of the present invention is typically used to make decisions on the harvesting action flow of the harvesting robot before harvesting fruit. First, a sample of crop plants to be harvested is selected, and the fruit and branches of the sample are photographed using devices such as visual sensors and cameras. Based on the sample images of the fruit and branches, corresponding fruit model 1 and branch model 2 are constructed. According to the relative positions of the harvesting robot, fruit, and branches, fruit model 1, branch model 2, and harvesting robot model 3 are set in a virtual scene to simulate the actual harvesting scene, so as to carry out reinforcement learning training on the harvesting action flow. Specifically, a large number of crop plant samples can be selected to construct fruit model 1 and branch model 2 of different shapes and positions, thereby constructing different virtual scenes to simulate different actual harvesting scenarios; or, by directly adjusting the position, angle, etc. of the fruit model 1 and branch model 2 in the virtual scene, different virtual scenes can be constructed to simulate different actual harvesting scenarios.
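The second option described above, building different virtual scenes by directly adjusting the position and angle of the fruit and branch models, can be sketched as simple parameter randomization. This is only an illustration: the `VirtualScene` fields, the function name, and every numeric range below are hypothetical placeholders, not values from the patent.

```python
import random
from dataclasses import dataclass

@dataclass
class VirtualScene:
    # All fields are illustrative assumptions about how a scene might be parameterized.
    stem_position: tuple      # (x, y, z) of the branch model base, in metres
    stem_tilt_deg: float      # tilt of the main stem (first column)
    stalk_angle_deg: float    # angle between fruit stalk (second column) and main stem
    stalk_length: float       # length of the fruit stalk, in metres

def sample_scenes(n, seed=0):
    """Draw n randomized scene configurations; the ranges are placeholders."""
    rng = random.Random(seed)
    scenes = []
    for _ in range(n):
        scenes.append(VirtualScene(
            stem_position=(rng.uniform(0.3, 0.6),
                           rng.uniform(-0.2, 0.2),
                           rng.uniform(0.2, 0.5)),
            stem_tilt_deg=rng.uniform(-15.0, 15.0),
            stalk_angle_deg=rng.uniform(30.0, 90.0),
            stalk_length=rng.uniform(0.02, 0.06),
        ))
    return scenes
```

Each sampled configuration would then be used to place the fruit model 1 and branch model 2 relative to the harvesting robot model 3 in the simulator.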
[0065] After constructing the virtual scene, the target picking point 4 of the end effector model 31 is determined on the branch model 2. The target picking point 4 is the expected contact point between the end effector model 31 and the branch model 2 during simulated picking. The target picking point 4 is usually set on the branch of the branch model 2 that connects to the fruit model 1, so that the end effector model 31 can cut off the branch model 2 from the target picking point 4 and pick the fruit model 1. At the same time, there is also a certain distance between the target picking point 4 and the fruit model 1 to avoid the picking robot model 3 colliding with the fruit model 1 during the operation. After determining the target picking point 4, the target picking plane 5 is determined based on the target picking point 4. The target picking plane 5 is the plane where the end effector model 31 is in the expected picking posture when it reaches the target picking point 4. The target picking plane 5 usually avoids the fruit model 1 and the branch model 2 so that the end effector model 31 will not collide with the fruit model 1 and the branch model 2 when it moves within the target picking plane 5.
[0066] After determining the target picking point 4 and the target picking plane 5, the position parameters (such as spatial coordinates) of the target picking point 4 and the direction and position parameters (such as spatial coordinates and normal vectors) of the target picking plane 5 are input into the reward function. By reasonably designing the reward function, the reward function can motivate the end effector model 31 of the picking robot model 3 to be located in the target picking plane 5 when it reaches the target picking point 4 for picking, so that the end effector model 31 can reach the target picking point 4 in the desired picking posture.
[0067] Next, the picking action process of the picking robot is simulated in multiple virtual scenarios using the picking robot model 3. Reinforcement learning training is then conducted under the incentive guidance of the reward function on the simulated picking action process to determine the optimal action trajectory. This enables the picking robot model 3 to reach the target picking point 4 with the desired picking posture, and to avoid colliding with the branch model 2 and the fruit model 1 during the action. After repeated training in multiple virtual scenarios, the optimal action trajectory and the mapping relationship between various parameters in the virtual scenario are constructed, that is, the optimal picking action function is determined.
[0068] Finally, when harvesting in a real-world environment, the optimal harvesting trajectory can be obtained by inputting the corresponding parameters of the real-world environment into the optimal harvesting action function. Based on the optimal harvesting trajectory, the harvesting robot can be controlled to harvest, allowing it to reach the harvesting position in the desired posture without colliding with or damaging branches and fruits during the process.
[0069] The autonomous operation decision-making method for the harvesting robot of the present invention constructs a virtual scene including a fruit model 1, a branch model 2, and a harvesting robot model 3 based on images of the fruit and branches to simulate the actual harvesting scene. This method is more flexible and efficient, reducing the cost and risk of experimental trial and error in the actual harvesting scene. By determining the target harvesting point 4 and the target harvesting plane 5 of the end effector model 31 and inputting them into the reward function, the reward function can incentivize and guide the end effector model 31 to reach the target harvesting point 4 with the desired harvesting posture during reinforcement learning training. Next, based on the reward function, the picking action process of the harvesting robot is trained through reinforcement learning, and the optimal picking action function is determined. This function provides guidance and reference for the actual picking action of the harvesting robot, enabling it to reach the desired picking position with the desired picking posture while avoiding collisions that damage the fruit and branches. This improves the picking success rate and solves the problem in the prior art where robots easily rub against and damage the fruit and branches during their movements, leading to picking failures.
[0070] Specifically, in reinforcement learning of the harvesting action flow of the harvesting robot, the harvesting robot model 3 with end effector model 31 is typically treated as a whole, and its motion and posture are controlled by the HER-SAC policy algorithm of deep reinforcement learning. The entire harvesting action flow of the harvesting robot model 3 is planned without the need to plan intermediate action points. The HER-SAC algorithm uses experience replay and a designed reward function to accelerate the learning process, enabling the end effector model 31 of the harvesting robot model 3 to reach the target harvesting point 4 with the desired harvesting posture, while avoiding obstacles through smooth movements.
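The experience replay used by HER (hindsight experience replay) can be illustrated with its "final" relabeling strategy: transitions from an episode that missed the goal are stored again with the goal replaced by the position actually reached. The sketch below is a generic illustration of that idea, not the patent's implementation. The transition layout, the function names, and the sparse 0/-1 reward are assumptions chosen for brevity (the patent itself uses a designed multi-part reward).

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.01):
    """Assumed sparse reward: 0 within tol of the goal, otherwise -1."""
    dist = np.linalg.norm(np.asarray(achieved, dtype=float) - np.asarray(goal, dtype=float))
    return 0.0 if dist <= tol else -1.0

def her_relabel_final(episode, reward_fn=sparse_reward):
    """Return extra transitions whose goal is the last achieved position.

    episode: list of dicts with keys "obs", "action", "achieved", "goal",
    where "achieved" is the end effector position reached after the action.
    """
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for tr in episode:
        relabeled.append({
            "obs": tr["obs"],
            "action": tr["action"],
            "achieved": tr["achieved"],
            "goal": new_goal,  # hindsight goal replaces the original one
            "reward": reward_fn(tr["achieved"], new_goal),
        })
    return relabeled
```

Both the original and the relabeled transitions would be placed in the replay buffer that the SAC learner samples from, so that even unsuccessful simulated picks yield useful training signal.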
[0071] Specifically, in some embodiments, as shown in Figure 4, step S104 (performing reinforcement learning training on the picking action process of the picking robot in the virtual scenes based on the reward function to determine the optimal picking action function) includes:
[0072] Step S1041: Repeatedly simulate the picking action process of the picking robot in multiple virtual scenarios.
[0073] Step S1042: Acquire the motion information of the harvesting robot model during the simulation process, and input the motion information into the reward function to calculate the total reward value.
[0074] Step S1043: Determine the optimal picking action process for each virtual scene based on the total reward value.
[0075] Step S1044: Determine the optimal picking action function based on the virtual scene and the corresponding optimal picking action flow.
[0076] In this embodiment, the position parameters of the target picking point 4 and the direction and position parameters of the target picking plane 5 are input into the reward function. The picking action process of the picking robot model 3 is repeatedly simulated and tested in multiple virtual scenarios. Simultaneously, the motion information of the picking robot model 3 is acquired during its actions, and the corresponding total reward value is calculated based on this information. Specifically, a single picking action process can be divided into several time periods, and the motion information of the picking robot model 3 in the final state of each time period can be acquired to calculate the total reward value of the action process for that time period, thereby incentivizing and optimizing the picking action process for each time period; alternatively, the total reward value of the entire picking action process can be calculated to incentivize and guide the entire picking action process. Through repeated simulations for reinforcement learning training, the optimal picking action process in the current virtual scenario is determined. After reinforcement learning training has been completed in different virtual scenarios, a mapping relationship between the virtual scenarios and the optimal picking action process can be constructed, thereby obtaining the optimal picking action function. Specifically, the relationship between relevant parameters of the virtual scene (such as the position and shape parameters of the fruit model 1 and branch model 2) and the optimal picking action process can be constructed, so that the optimal picking action function can be applied in the actual scene to obtain the optimal picking action process.
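The per-period evaluation described above, where the reward is computed from the final state of each time period, can be sketched as follows. This is a simplified illustration that assumes a simulated picking run is available as a list of states and that `reward_fn` maps one state to a reward value; the names are hypothetical, and the selection of the best run stands in for the actual policy optimization.

```python
def period_end_indices(n_states, period_len):
    """Indices of the final state of each period.
    A trailing partial period is scored at the last state."""
    if period_len <= 0:
        raise ValueError("period_len must be positive")
    idx = list(range(period_len - 1, n_states, period_len))
    if n_states > 0 and (not idx or idx[-1] != n_states - 1):
        idx.append(n_states - 1)
    return idx

def period_rewards(states, period_len, reward_fn):
    """Reward of each period, evaluated on that period's final state."""
    return [reward_fn(states[i]) for i in period_end_indices(len(states), period_len)]

def best_rollout(rollouts, period_len, reward_fn):
    """Index and summed reward of the simulated run with the highest total."""
    totals = [sum(period_rewards(r, period_len, reward_fn)) for r in rollouts]
    best = max(range(len(rollouts)), key=totals.__getitem__)
    return best, totals[best]
```

In an actual reinforcement learning setup the per-period rewards would drive policy updates rather than a direct comparison of runs; the comparison here only makes the bookkeeping concrete.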
[0077] Specifically, in some embodiments, the motion information includes the position and orientation of the end effector model 31 of the picking robot model 3, whether the picking robot model 3 collides with other models, and the motion parameters of each joint of the picking robot model 3.
[0078] Step S1042: The step of acquiring the motion information of the harvesting robot model during the simulation and inputting the motion information into the reward function to calculate the total reward value includes:
[0079] Obtain the location of the end effector model, and calculate the proximity reward based on the location of the end effector model and the location of the target picking point.
[0080] Obtain the orientation of the end effector model, and calculate the picking posture reward based on the target picking plane and the orientation and position of the end effector model.
[0081] Obtain the motion parameters of each joint of the harvesting robot model and calculate the smooth trajectory reward.
[0082] The obstacle avoidance reward is calculated based on whether the harvesting robot model collides with other models.
[0083] The total reward value is calculated based on the reward for approaching the target, the reward for picking posture, the reward for smooth trajectory, and the reward for obstacle avoidance.
[0084] In this embodiment, the reward function includes four parts: the reward for approaching the target, the picking posture reward, the smooth trajectory reward, and the obstacle avoidance reward. The reward for approaching the target is calculated from the position of the end effector model 31 and the position of the target picking point 4; it guides and motivates the picking robot model 3 to move the end effector model 31 to the target picking point 4 to complete the picking. The picking posture reward is calculated from the target picking plane 5 and the orientation and position of the end effector model 31; it guides and motivates the picking robot model 3 to move the end effector model 31 into the target picking plane 5 so that the end effector model 31 finally achieves the desired picking posture. The smooth trajectory reward is calculated mainly from the motion parameters of each joint of the picking robot model 3; it evaluates whether the movement of the picking robot model 3 is smooth and guides the movement to be smoother, avoiding robot malfunctions caused by excessive movement amplitude. The obstacle avoidance reward evaluates whether the picking robot model 3 collides with other models during the operation; it guides the picking robot model 3 to avoid other models, so that it does not collide with and damage the fruit model 1 and the branch model 2 before reaching the target picking point 4. The reward function calculates the total reward value from these four parts, thereby guiding the picking action process of the picking robot model 3 in four respects. After reinforcement learning training, the optimal picking action process is determined, enabling the end effector model 31 to reach the target picking point 4 with the desired picking posture. At the same time, the picking robot model 3 moves smoothly and stably and avoids obstacles, improving the final picking success rate.
[0085] Specifically, in some embodiments, the total reward value is typically a weighted sum of four components: proximity reward, picking posture reward, smooth trajectory reward, and obstacle avoidance reward; the expression for the reward function is as follows:
[0086] $$r_t(s_t, a_t) = \omega_1 r_{goal} + \omega_2 r_{obs} + \omega_3 r_{ctrl} + \omega_4 r_{pos}$$
[0087] In the formula, $r_t(s_t, a_t)$ is the total reward value in state t; $s_t, a_t$ are the motion information of the harvesting robot model 3 in state t; $r_{goal}$, $r_{obs}$, $r_{ctrl}$, and $r_{pos}$ are the reward for approaching the target, the obstacle avoidance reward, the smooth trajectory reward, and the picking posture reward in state t, respectively; $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ are the task-adaptive parameters corresponding to the reward for approaching the target, the obstacle avoidance reward, the smooth trajectory reward, and the picking posture reward, respectively. The four task-adaptive parameters can be set according to actual needs to adjust how strongly each part influences the total reward, and thus the strength of its incentive effect.
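The weighted sum of the four reward parts can be written as a small function. This is a minimal sketch: the `RewardWeights` container and the default weight values are hypothetical, since the patent only states that the four task-adaptive parameters are chosen according to actual needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # Default values are placeholders, not values from the patent.
    goal: float = 1.0   # omega_1, weight of the reward for approaching the target
    obs: float = 1.0    # omega_2, weight of the obstacle avoidance reward
    ctrl: float = 0.1   # omega_3, weight of the smooth trajectory reward
    pos: float = 0.5    # omega_4, weight of the picking posture reward

def total_reward(r_goal, r_obs, r_ctrl, r_pos, w=RewardWeights()):
    """Weighted sum of the four reward parts for one state."""
    return w.goal * r_goal + w.obs * r_obs + w.ctrl * r_ctrl + w.pos * r_pos
```

Raising one weight relative to the others makes the corresponding behavior (for example, collision avoidance) dominate the training signal.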
[0088] Specifically, in some embodiments, the step of calculating the proximity reward based on the position of the end effector model and the position of the target picking point includes the following steps:
[0089] Calculate the first distance between the end effector model and the target picking point based on the position of the end effector model and the position of the target picking point.
[0090] When the first distance is greater than the first preset distance, the reward for getting closer to the target is calculated based on the first preset distance and the first distance.
[0091] When the first distance is less than or equal to the first preset distance, the reward for approaching the target is calculated based on the first distance.
[0092] In this embodiment, a first preset distance is set, and the reward for approaching the target is calculated in one of two ways depending on whether the first distance between the end effector model 31 and the target picking point 4 is greater than the first preset distance. When the end effector model 31 is far from the target picking point 4 (i.e., outside the first preset distance), the picking robot model 3 receives a negative reward with a larger absolute value (i.e., a larger penalty), which guides the picking robot model 3 to move so that the end effector model 31 quickly enters the range of the first preset distance. When the end effector model 31 gradually approaches the target picking point 4 within the range of the first preset distance, the picking robot model 3 receives a negative reward with a gradually decreasing absolute value (i.e., a gradually decreasing penalty), so that the end effector model 31 gradually approaches the target picking point 4 and avoids unstable oscillations.
[0093] In one specific embodiment, the reward for approaching the target, $r_{goal}$, is calculated as follows:
[0094]
[0095] In the formula, $d_{tar}$ is the first distance; the other parameter is the first preset distance, which can be adjusted according to actual needs.
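One form that is consistent with the two-case description of the reward for approaching the target is a Huber-style penalty: quadratic inside the first preset distance and linear outside it. The sketch below uses that form as an assumption; it is not taken from the patent, and the function name and the default threshold `delta` are hypothetical.

```python
def proximity_reward(d_tar, delta=0.1):
    """Huber-style negative reward (an assumed form, not the patent's formula).

    d_tar: first distance between end effector and target picking point.
    delta: first preset distance.
    """
    if d_tar < 0:
        raise ValueError("distance must be non-negative")
    if d_tar <= delta:
        # Near the target: penalty shrinks smoothly to zero, which damps oscillation.
        return -0.5 * d_tar ** 2
    # Far from the target: penalty depends on both delta and d_tar.
    return -delta * (d_tar - 0.5 * delta)
```

The two branches meet at `d_tar == delta`, so the reward has no jump at the boundary, and the penalty grows monotonically with distance.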
[0096] Specifically, in some embodiments, the step of calculating the picking posture reward based on the orientation and position of the target picking plane and the end effector model includes the following steps:
[0097] Set up a target picking area that surrounds the target picking point.
[0098] When the end effector model is outside the target picking area, the picking posture reward is equal to the first negative constant.
[0099] When the end effector model is located within the target picking area, the picking posture reward is calculated based on the orientation of the end effector model and the orientation of the target picking plane.
[0100] In this embodiment, a target picking area is set around the target picking point 4. When the end effector model 31 is outside the target picking area, the picking posture reward is equal to the first negative constant. The first negative constant is usually a negative constant with a large absolute value; that is, a large penalty is given to the picking robot model 3 to guide it to move quickly so that the end effector model 31 enters the target picking area, approaches the target picking point 4, and adjusts its posture. Once the end effector model 31 is inside the target picking area, the picking posture reward is calculated from the orientation of the end effector model 31 and the orientation of the target picking plane 5, guiding the picking robot model 3 to adjust the orientation of the end effector model 31 until it is parallel to the target picking plane 5 and the desired picking posture is achieved.
[0101] In one specific embodiment, the picking posture reward $r_{pos}$ is calculated as follows:
[0102]
[0103] In the formula, q is the three-dimensional vector of the end effector model 31; n is the normal vector of the target picking plane 5; Ω is the target picking area; and C1 is the first negative constant.
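The formula is not reproduced in this text, so the sketch below only illustrates the two-case structure described above. It assumes that q is a direction vector of the end effector that should lie in the target picking plane, so that the reward inside the target picking area is the negative absolute cosine between q and the normal n (zero when q is parallel to the plane). That alignment term, the function name `posture_reward`, and the flag `inside_area` are assumptions rather than the formula of the embodiment.

```python
import math


def posture_reward(q, n, inside_area: bool, c1: float = -1.0) -> float:
    """Hypothetical picking posture reward.

    q: 3-D direction vector of the end effector model (assumed meaning).
    n: normal vector of the target picking plane.
    inside_area: whether the end effector is within the target picking area.
    c1: first negative constant, returned outside the area.
    """
    if not inside_area:
        return c1
    dot = sum(a * b for a, b in zip(q, n))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_n = math.sqrt(sum(a * a for a in n))
    if norm_q == 0.0 or norm_n == 0.0:
        raise ValueError("q and n must be non-zero vectors")
    # |cos| of the angle between q and n: 0 when q lies in the plane,
    # 1 when q is perpendicular to the plane.
    return -abs(dot) / (norm_q * norm_n)


if __name__ == "__main__":
    print(posture_reward((1, 0, 0), (0, 0, 1), inside_area=True))
    print(posture_reward((0, 0, 2), (0, 0, 1), inside_area=True))
    print(posture_reward((0, 0, 2), (0, 0, 1), inside_area=False, c1=-5.0))
```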
[0104] Specifically, in some embodiments, the step of calculating obstacle avoidance rewards based on whether the harvesting robot model collides with other models includes the following steps:
[0105] When the harvesting robot model collides with other models, the obstacle avoidance reward function equals the second negative constant.
[0106] When the harvesting robot model does not collide with other models, the obstacle avoidance reward function is zero.
[0107] In this embodiment, when the picking robot model 3 collides with other models, the picking robot model 3 will be subject to a greater penalty in order to guide and incentivize the picking robot model 3 to avoid other models during the operation.
[0108] In one specific embodiment, a virtual environment is constructed using the MuJoCo environment. The fruit model 1, branch model 2, and harvesting robot model 3 within the MuJoCo environment can all be configured with collision detection attributes. During the movement of the harvesting robot model 3, the collision detection attributes can be used to detect whether the harvesting robot model 3 collides with the fruit model 1 or branch model 2, thereby calculating obstacle avoidance rewards. The specific calculation formula is as follows:
[0109]
[0110] In the formula, C2 is the second negative constant, and collision is the value of the collision detection attribute of the picking robot model 3, which is usually a Boolean value. When the picking robot model 3 collides, the collision value is True.
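The two cases described above can be sketched as follows. This is a minimal illustration, not the embodiment's code: it assumes the simulator exposes the active contacts as pairs of geometry identifiers (MuJoCo, for example, keeps a per-step contact array in its simulation data), and the function name `obstacle_reward`, the identifier sets, and the default value of `c2` are hypothetical.

```python
def obstacle_reward(contact_pairs, robot_ids, obstacle_ids, c2: float = -10.0) -> float:
    """Return c2 if any robot geometry touches a fruit or branch geometry, else 0.

    contact_pairs: iterable of (geom_a, geom_b) identifier pairs (assumed format).
    robot_ids: set of geometry identifiers belonging to the picking robot model.
    obstacle_ids: set of identifiers belonging to the fruit and branch models.
    c2: second negative constant.
    """
    for a, b in contact_pairs:
        robot_hits_obstacle = (a in robot_ids and b in obstacle_ids) or (
            b in robot_ids and a in obstacle_ids
        )
        if robot_hits_obstacle:
            return c2  # collision is True
    return 0.0  # collision is False


if __name__ == "__main__":
    robot, obstacles = {1, 2}, {7, 8}
    print(obstacle_reward([(1, 7)], robot, obstacles))   # robot link vs. fruit
    print(obstacle_reward([(1, 2)], robot, obstacles))   # robot self-contact only
```

Filtering by identifier sets means contacts among the robot's own links do not trigger the penalty; whether the embodiment treats self-contact as a collision is not stated in the source.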
[0111] In one specific embodiment, the smooth trajectory reward $r_{ctrl}$ is calculated as follows:
[0112]
[0113] In the formula, $n_{links}$ is the number of joints of the harvesting robot model 3, and $a_i$ is the motion parameter of the i-th joint, such as a rotation angle or a displacement distance. When the range of motion of a joint is too large, a large negative reward (i.e., a penalty) is generated for the harvesting robot model 3, which guides it to reduce the range of motion of each joint and makes the movement smoother.
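Since the formula is not reproduced in this text, the sketch below assumes a common control-cost form, the negative sum of squared joint motion parameters over all joints. The function name `smooth_trajectory_reward` and the `weight` parameter are hypothetical, and the embodiment's formula may use a different norm or scaling.

```python
def smooth_trajectory_reward(joint_actions, weight: float = 1.0) -> float:
    """Hypothetical smooth trajectory reward: -weight * sum(a_i ** 2).

    joint_actions: motion parameter a_i of each joint for the current step
                   (e.g. rotation angle or displacement), length n_links.
    weight: assumed scaling factor, not taken from the source.
    """
    return -weight * sum(a * a for a in joint_actions)


if __name__ == "__main__":
    print(smooth_trajectory_reward([0.1, 0.1, 0.1]))  # small, smooth step
    print(smooth_trajectory_reward([1.0, 2.0, 0.0]))  # large, jerky step
```

Squaring penalizes one large joint motion more than several small ones of the same total magnitude, which matches the stated aim of discouraging excessive motion of any single joint.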
[0114] Specifically, as shown in Figure 2 and Figure 3, in some embodiments, the target picking area is located on the side of the branch model 2 facing the picking robot model 3.
[0115] In this embodiment, by setting the target picking area on the side of the branch model 2 facing the picking robot model 3, the picking robot model 3 is encouraged and guided to move on that side of the branch model 2. This prevents the picking robot model 3 from moving past and behind the branch model 2, which would increase the risk of collision, and thus raises the success rate of the entire picking process.
[0116] In some embodiments, the target picking area is further provided with multiple conical grooves, the apexes of which are located at the target picking point 4. Part of the structure of the branch model 2 and the fruit model 1 are located in the conical grooves, so as to further motivate and guide the picking robot model 3 to avoid the fruit model 1 and the branch model 2 during the operation, thereby avoiding collisions between the picking robot model 3 and the fruit model 1 and the branch model 2 during the picking operation process, and thus improving the picking success rate.
[0117] As shown in Figure 3, in one specific embodiment, the target picking area is a hemisphere (the region enclosed by the dot matrix in the figure). The curved surface of the hemisphere faces the picking robot model 3, the flat face of the hemisphere lies in the plane of the branch model 2, and the target picking point 4 is located at the center of the hemisphere. Conical grooves are constructed on both the upper and lower sides of the hemisphere to avoid the fruit model 1 and the branch model 2.
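A membership test for such an area can be sketched as follows, as an illustration under stated assumptions only. It assumes the two conical grooves share a single axis through the target picking point (one opening upward, one downward) and the same half-angle; the function name and the parameters `facing`, `cone_axis`, `radius`, and `half_angle_deg` are hypothetical and do not come from the source.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        raise ValueError("direction vector must be non-zero")
    return tuple(x / length for x in v)


def in_target_picking_area(x, p, facing, cone_axis, radius, half_angle_deg):
    """True if point x lies in the hemisphere around p, outside both cones.

    p: target picking point (center of the hemisphere, apex of the cones).
    facing: direction from the branch plane toward the picking robot model.
    cone_axis: shared axis of the upper and lower conical grooves (assumed).
    """
    v = tuple(a - b for a, b in zip(x, p))
    dist = math.sqrt(_dot(v, v))
    if dist > radius:
        return False              # outside the sphere
    if dist == 0.0:
        return True               # exactly at the target picking point
    if _dot(v, _unit(facing)) < 0.0:
        return False              # behind the branch plane
    # |cos| of the angle to the cone axis covers both the upper and lower cone.
    cos_to_axis = abs(_dot(v, _unit(cone_axis))) / dist
    return cos_to_axis < math.cos(math.radians(half_angle_deg))


if __name__ == "__main__":
    args = dict(p=(0, 0, 0), facing=(1, 0, 0), cone_axis=(0, 0, 1),
                radius=0.1, half_angle_deg=30.0)
    print(in_target_picking_area((0.05, 0, 0), **args))     # in front of the branch
    print(in_target_picking_area((0.01, 0, 0.05), **args))  # inside the upper cone
```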
[0118] In some embodiments, as shown in Figure 3, the branches connecting the fruit typically include a fruit stalk and a main stem, with the two ends of the fruit stalk connected to the fruit and the main stem, respectively. The branch model 2 includes a first column 21 and a second column 22. The first column 21 has the same size and extension direction as the main stem and is used to simulate the main stem; the second column 22 has the same size and extension direction as the fruit stalk and is used to simulate the fruit stalk. The fruit model 1 includes a third column 11, which has the same size and extension direction as the fruit and is used to simulate the fruit.
[0119] In this embodiment, when constructing the fruit model 1 and the branch model 2, the actual fruit and branches are simplified. The branch model 2 is simplified into the first column 21 and the second column 22, and the fruit model 1 is simplified into the third column 11. This makes it easier to adjust in the virtual scene, so that the branch model 2 and the fruit model 1 can be adjusted for fruits and branches in different postures, which is more flexible and convenient, and can also reduce the amount of computer computation during reinforcement learning training.
[0120] Optionally, as shown in Figure 3, when the fruit stalk is strongly bent, multiple second columns 22 can be constructed and connected end to end at an angle to one another to simulate the bent fruit stalk.
[0121] Specifically, as shown in Figure 3, the target picking point 4 is on the second column 22; the target picking plane 5 passes through the target picking point 4, is perpendicular to the plane where the branch model 2 is located, and is parallel to the first column 21.
[0122] In this embodiment, during actual picking, the harvesting robot typically harvests the fruit by cutting the fruit stalk. Setting the target picking point 4 on the second column 22 therefore simulates actual picking. At the same time, because the target picking plane 5 is perpendicular to the plane where the branch model 2 is located and parallel to the first column 21, the end effector model 31 is guided and incentivized to move to the target picking plane 5 to perform picking, which keeps the end effector model 31 from colliding with the first column 21. This prevents the harvesting robot from colliding with and damaging the main stem during actual picking, thereby avoiding damage to the plant.
[0123] In one specific embodiment, the target picking plane 5 is perpendicular to the plane containing the first column 21 and the second column 22, and its distance from the first column 21 is 20 mm.
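The geometric constraints above determine the plane up to its offset. A plane that is parallel to the first column and perpendicular to the plane spanned by the two columns has a normal equal to the component of the second column's direction that is perpendicular to the first column. The sketch below computes that normal and a target picking point at a given offset. It assumes the offset is measured from the axis of the first column and that the two columns meet at a known junction point on that axis; the names `junction`, `d1`, `d2`, and `offset` are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        raise ValueError("direction vector must be non-zero")
    return tuple(x / length for x in v)


def picking_plane_normal(d1, d2):
    """Unit normal of the target picking plane.

    d1: direction of the first column (main stem).
    d2: direction of the second column (fruit stalk).
    """
    u1, u2 = _unit(d1), _unit(d2)
    along = _dot(u2, u1)
    # Remove from u2 its component along u1; the rest lies in the branch
    # plane and is perpendicular to the first column.
    perp = tuple(b - along * a for a, b in zip(u1, u2))
    if math.sqrt(_dot(perp, perp)) < 1e-9:
        raise ValueError("columns are parallel; the branch plane is undefined")
    return _unit(perp)


def target_picking_point(junction, d1, d2, offset):
    """Point on the second column whose distance from the first column's axis is offset."""
    n = picking_plane_normal(d1, d2)
    u2 = _unit(d2)
    step = offset / _dot(u2, n)  # distance to travel along the fruit stalk
    return tuple(j + step * c for j, c in zip(junction, u2))


if __name__ == "__main__":
    # Vertical main stem, fruit stalk at 45 degrees, plane 20 mm from the stem axis.
    print(picking_plane_normal((0, 0, 1), (1, 0, 1)))
    print(target_picking_point((0, 0, 0), (0, 0, 1), (1, 0, 1), offset=0.02))
```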
[0124] Specifically, in some embodiments, step S104: controlling the harvesting robot to perform harvesting tasks in the actual environment according to the optimal harvesting action function includes the following steps:
[0125] Based on images of the fruit and branches, determine the input parameters for the optimal picking action function.
[0126] Input the input parameters into the optimal picking action function to obtain the optimal picking action process.
[0127] Based on the optimal picking action process, the harvesting robot is controlled to complete the harvesting.
[0128] When performing harvesting tasks in a real-world environment, the input parameters of the optimal harvesting action function can be determined by using images of the fruit and branches to be harvested. These parameters include the relative positions of the fruit, branches, and harvesting robot, as well as the shapes of the fruit and branches. By inputting these parameters into the optimal action function, the optimal harvesting action flow for the current scenario can be obtained. Based on the robot program corresponding to the optimal harvesting action flow, the harvesting robot is controlled to approach the branch and complete the harvesting at the desired location according to the optimal action trajectory and harvesting posture. The movement is smooth during the harvesting process, avoiding collisions with the fruit and branches that could cause damage, thus improving the harvesting success rate.
[0129] The following describes the autonomous operation decision-making system for the harvesting robot provided by the present invention. The system described below and the autonomous operation decision-making method described above may be read with reference to each other.
[0130] As shown in Figure 5, the autonomous operation decision-making system of the harvesting robot includes: a scene construction module 510, a first determination module 520, a reinforcement learning module 530, and an execution module 540.
[0131] The scene construction module 510 is used to collect sample images of fruits and branches to construct multiple virtual scenes. Each virtual scene includes a picking robot model, a fruit model, and a branch model. The first determination module 520 is used to determine the target picking point and target picking plane of the end effector model in the virtual scene, and input the orientation information of the target picking point and target picking plane as parameters into the reward function. The reinforcement learning module 530 is used to perform reinforcement learning training on the picking action process of the picking robot in multiple virtual scenes according to the reward function, and determine the optimal picking action function. The execution module 540 is used to control the picking robot to perform picking tasks in the actual environment according to the optimal picking action function.
[0132] Figure 6 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communication interface 620, a memory 630, and a communication bus 640. The processor 610, communication interface 620, and memory 630 communicate with each other via the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute the autonomous operation decision-making method for the harvesting robot provided in the above embodiment. This method includes: acquiring sample images of fruits and branches to construct multiple virtual scenes, each of which includes a harvesting robot model, a fruit model, and a branch model; determining the target harvesting point and target harvesting plane of the end effector model in the virtual scene, and inputting the orientation information of the target harvesting point and target harvesting plane as parameters into a reward function; performing reinforcement learning training on the harvesting robot's harvesting action flow in multiple virtual scenes based on the reward function to determine the optimal harvesting action function; and controlling the harvesting robot to perform harvesting tasks in the actual environment based on the optimal harvesting action function.
[0133] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0134] On the other hand, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the autonomous operation decision-making method for the harvesting robot provided by the above methods. The method includes: collecting sample images of fruits and branches to construct multiple virtual scenes; wherein each of the multiple virtual scenes includes a harvesting robot model, a fruit model, and a branch model; in the virtual scenes, determining the target harvesting point and the target harvesting plane of the end effector model, and inputting the orientation information of the target harvesting point and the target harvesting plane as parameters into a reward function; performing reinforcement learning training on the harvesting action flow of the harvesting robot in multiple virtual scenes according to the reward function to determine the optimal harvesting action function; and controlling the harvesting robot to perform harvesting tasks in the actual environment according to the optimal harvesting action function.
[0135] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements the autonomous operation decision-making method for the harvesting robot provided by the above methods. The method includes: collecting sample images of fruits and branches to construct multiple virtual scenes; wherein each of the multiple virtual scenes includes a harvesting robot model, a fruit model, and a branch model; in the virtual scenes, determining the target harvesting point and the target harvesting plane of the end effector model, and inputting the orientation information of the target harvesting point and the target harvesting plane as parameters into a reward function; performing reinforcement learning training on the harvesting action flow of the harvesting robot in the multiple virtual scenes according to the reward function to determine the optimal harvesting action function; and controlling the harvesting robot to perform harvesting tasks in the actual environment according to the optimal harvesting action function.
[0136] The embodiments described above are merely illustrative, and some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement these embodiments without any creative effort.
[0137] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A picking robot autonomous operation decision method, characterized in that the harvesting robot is equipped with an end effector for picking the fruit off the branches, and the autonomous operation decision-making method for the harvesting robot includes: collecting sample images of fruits and branches to construct multiple virtual scenes, wherein each virtual scene includes a harvesting robot model, a fruit model, and a branch model; in the virtual scene, determining the target picking point and target picking plane of the end effector model, and inputting the orientation information of the target picking point and target picking plane as parameters into the reward function; performing reinforcement learning training on the picking action process of the picking robot in the multiple virtual scenes based on the reward function to determine the optimal picking action function; and controlling the picking robot to perform picking tasks in the actual environment based on the optimal picking action function; wherein the step of performing reinforcement learning training on the picking action process of the picking robot in the multiple virtual scenes based on the reward function to determine the optimal picking action function includes: during the simulation, acquiring the motion information of the harvesting robot model, and inputting the motion information into the reward function to calculate the total reward value; wherein the step of acquiring the motion information of the harvesting robot model during the simulation and inputting the motion information into the reward function to calculate the total reward value includes: calculating the first distance between the end effector model and the target picking point based on the position of the end effector model and the position of the target picking point; when the first distance is greater than the first preset distance, calculating the reward for approaching the target based on the first preset distance and the first distance; when the first distance is less than or equal to the first preset distance, calculating the reward for approaching the target based on the first distance; setting up a target picking area that surrounds the target picking point; when the end effector model is outside the target picking area, the picking posture reward is equal to the first negative constant; when the end effector model is located within the target picking area, the picking posture reward is calculated based on the orientation of the end effector model and the orientation of the target picking plane; when the harvesting robot model collides with other models, the obstacle avoidance reward function is equal to the second negative constant; when the harvesting robot model does not collide with other models, the obstacle avoidance reward function is zero; obtaining the motion parameters of each joint of the harvesting robot model and calculating the smooth trajectory reward, wherein a negative reward is generated for the harvesting robot model when the motion amplitude of a joint is too large; and calculating the total reward value based on the reward for approaching the target, the picking posture reward, the smooth trajectory reward, and the obstacle avoidance reward.
2. The picking robot autonomous operation decision method according to claim 1, characterized in that the step of performing reinforcement learning training on the harvesting action process of the harvesting robot in multiple virtual scenes based on the reward function to determine the optimal harvesting action function includes: repeatedly simulating the picking process of the robotic arm in the multiple virtual scenes; during the simulation, acquiring the motion information of the harvesting robot model, and inputting the motion information into the reward function to calculate the total reward value; based on the total reward value, determining the optimal picking action process for each virtual scene; and based on the virtual scenes and the corresponding optimal picking action processes, determining the optimal picking action function.
3. The picking robot autonomous operation decision method according to claim 1, characterized in that the target picking area is located on the side of the branch model facing the picking robot model.
4. The picking robot autonomous operation decision method according to claim 1, characterized in that the branch includes a fruit stalk and a main stem, with the two ends of the fruit stalk connected to the fruit and the main stem, respectively; the branch model includes a first column and a second column; the first column has the same size and extension direction as the main stem and is used to simulate the main stem; the second column has the same size and extension direction as the fruit stalk and is used to simulate the fruit stalk; and the fruit model includes a third column, which has the same size and extension direction as the fruit and is used to simulate the fruit.
5. The picking robot autonomous operation decision method according to claim 4, characterized in that the target picking point is on the second column; and the target picking plane passes through the target picking point, is perpendicular to the plane where the branch model is located, and is parallel to the first column.
6. The picking robot autonomous operation decision method according to claim 1, characterized in that the step of controlling the harvesting robot to perform harvesting tasks in the actual environment according to the optimal harvesting action function includes: based on the images of the fruit and branches, determining the input parameters of the optimal picking action function; inputting the input parameters into the optimal action function to obtain the optimal picking action process; and based on the optimal harvesting procedure, controlling the harvesting robot to complete the harvesting.