Path planning methods, devices, electronic equipment, storage media and software products
By using a CVAE-based path planning model and knowledge distillation and reinforcement learning, path planning waypoints are directly output, solving the problem of excessively long path planning time for high-degree-of-freedom robots and achieving fast path planning.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: MIDEA GROUP CO LTD
- Filing Date: 2026-02-11
- Publication Date: 2026-06-30
AI Technical Summary
Existing path planning algorithms require the entire search tree to be regenerated in high-degree-of-freedom robots, resulting in excessively long path planning times that cannot meet the needs of industrial applications.
By constructing a path planning model based on CVAE, and utilizing knowledge distillation and reinforcement learning, a path planning model is generated that directly outputs the waypoints required for path planning, avoiding the generation of search trees.
It effectively shortens the path planning time, enabling robots to quickly generate planned paths to meet the needs of industrial scenarios.
[Figure CN122306064A_ABST: abstract drawing]
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a path planning method, apparatus, electronic device, storage medium, and program product.
Background Technology
[0002] With the rapid development of industrial automation, intelligent services, and other fields, intelligent agents such as mobile robots and robotic arms face the need for efficient and safe movement in complex environments. Path planning, as a core component of autonomous decision-making, aims to find a collision-free path from the initial state to the target state in a given obstacle environment, and typically needs to meet certain optimization metrics, such as path length, smoothness, or planning time.
[0003] Currently, the path planning algorithms used in the robotics field are mainly sampling algorithms. These algorithms gradually construct a graph or tree by randomly sampling the state space, replacing analytical completeness with probabilistic completeness. They are suitable for high-dimensional, nonholonomic constraints, or complex geometric obstacle scenarios, and their core relies on nearest neighbor search, collision detection, and connection strategies. Taking a two-dimensional path search problem as an example, the sampling algorithm explores the feasible region randomly and generates a search graph or search tree until a feasible path is found. Typically, a maximum search time and a maximum number of failures are set; if the solution is not found within the specified time and number of failures, the planning task fails.
[0004] However, existing algorithms require regenerating the entire search tree for different path planning tasks of robots, meaning that each planning session needs to start from scratch to plan a feasible path. For low-dimensional tasks, the search space is relatively small, which usually meets the needs of industrial applications. However, as the degrees of freedom of the robotic arm increase, the search space expands exponentially with the degrees of freedom. At this point, a huge amount of time is required to explore the search space, which directly leads to excessively long path planning time for robots, making it unsuitable for industrial applications.
Summary of the Invention
[0005] This application aims to address at least one of the technical problems existing in the related art. To this end, this application proposes a path planning method that can effectively reduce the path planning time of robots to meet the needs of industrial applications.
[0006] This application also proposes a path planning device, electronic device, storage medium, and program product.
[0007] The path planning method according to the first aspect of this application includes:
obtaining the robot's path planning task to be processed, wherein the path planning task to be processed includes at least one of the robot's real target state information and real initial state information;
inputting the path planning task to be processed into the path planning model to obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on CVAE; any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; and the sample path planning task includes at least one of the following: robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information;
generating the robot's planned path based on each waypoint.
[0008] According to the path planning method of this application embodiment, a preset model is constructed in advance based on CVAE, and a sample path planning task is constructed based on information such as robot model, task environment information, robot sample initial state information, and robot sample target state information. Path planning is then performed on the sample path planning task to obtain sample paths. Furthermore, knowledge distillation is performed on the preset model using the sample path dataset constructed from the sample paths to obtain a path planning model. Thus, after obtaining the path planning task to be processed, which includes the robot's real target state information and real initial state information, it is input into the path planning model to obtain waypoints output by the path planning model. Based on each waypoint, the planned path of the robot can be generated. Since the path planning model has been obtained by knowledge distillation on the preset model based on the sample path dataset, when the path planning model performs path planning based on the path planning task to be processed, regardless of the number of robotic arms or degrees of freedom of the robot, there is no need to generate a search tree. This reduces the optimization time of the high-dimensional motion space to the time of one network feedforward, effectively shortening the time for generating waypoints. This allows for rapid generation of the robot's planned path based on each waypoint, thereby effectively reducing the robot's path planning time and meeting the needs of industrial applications.
[0009] According to one embodiment of this application, the path planning model is obtained based on the following method:
obtaining the sample path dataset;
removing the sample paths that fail the collision detection from the sample path dataset to obtain the target path dataset;
performing knowledge distillation on the preset model based on the target path dataset to obtain the path planning model.
[0010] According to one embodiment of this application, the sample path is generated based on the following method:
obtaining the sample path planning task;
performing path planning based on the sample path planning task to obtain at least one sample waypoint;
performing trajectory optimization on each of the sample waypoints to obtain the sample path.
[0011] According to one embodiment of this application, the path planning model is obtained by optimizing the decoder of CVAE in the preset model through reinforcement learning based on the pre-trained preset model.
[0012] According to one embodiment of this application, the variance required for the probability distribution of the action policy in the reinforcement learning is obtained from the decoder of the CVAE.
[0013] According to one embodiment of this application, the decoder of the CVAE includes an input interface for the preceding trajectory.
[0014] The path planning apparatus according to an embodiment of the second aspect of this application includes:
an acquisition module, used to acquire the robot's path planning task to be processed, wherein the path planning task to be processed includes at least one of the robot's real target state information and real initial state information;
a planning module, used to input the path planning task to be processed into a path planning model and obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on CVAE; any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; and the sample path planning task includes at least one of the following: robot model, task environment information, sample initial state information of the robot, and sample target state information of the robot;
a generation module, used to generate the planned path of the robot based on each waypoint.
[0015] An electronic device according to a third aspect of this application includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement any of the path planning methods described above.
[0016] According to a fourth aspect of this application, the storage medium is a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements any of the path planning methods described above.
[0017] A computer program product according to a fifth aspect of this application includes a computer program that, when executed by a processor, implements any of the path planning methods described above.
[0018] The above-described one or more technical solutions in the embodiments of this application have at least the following technical effects: By pre-constructing a preset model based on CVAE and building a sample path planning task based on the robot model, task environment information, robot sample initial state information, and robot sample target state information, a sample path is generated. Then, the preset model is subjected to knowledge distillation using the sample path dataset constructed from the sample paths, resulting in a path planning model. Thus, after obtaining the path planning task containing the robot's real target state information and real initial state information, it is input into the path planning model to obtain waypoints. Based on these waypoints, the planned path for the robot can be generated. Since the path planning model is pre-distilled based on the sample path dataset, it eliminates the need for search tree generation when planning paths based on the task, regardless of the number of robotic arms or degrees of freedom. This reduces the optimization time of the high-dimensional motion space to the time of a single network feedforward, effectively shortening the time for waypoint generation. This allows for rapid generation of the robot's planned path based on the waypoints, significantly reducing the robot's path planning time and meeting the needs of industrial applications.
[0019] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a flowchart illustrating the path planning method provided in the embodiments of this application.
[0022] Figure 2 is a schematic diagram of the process of knowledge distillation of a CVAE-based preset model based on a sample path dataset (or target path dataset) in the path planning method provided in the embodiments of this application.
[0023] Figure 3 is a schematic diagram of the path planning architecture and process of the robot in the path planning method provided in the embodiments of this application.
[0024] Figure 4 is a schematic diagram of the structure of the electronic device provided in this application.
Detailed Implementation
[0025] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0026] It should be noted that the field of robotics, in addition to traditional single-arm robotic arms and dual-arm collaborative robots, also includes super-humanoid robots with extremely high degrees of freedom, such as four-arm and six-arm super-humanoid robots. Furthermore, each arm of a super-humanoid robot has at least six degrees of freedom; therefore, in path planning tasks, the number of degrees of freedom for super-humanoid robots is far greater than that of single-arm robotic arms and dual-arm collaborative robots.
[0027] However, as the number of degrees of freedom increases, the success rate and planning speed of traditional path planning algorithms decrease, making them unable to meet the needs of industrial production.
[0028] Based on this, this application proposes a path planning method, device, electronic device, storage medium, and program product. It proposes a collaborative planning strategy based on knowledge distillation and knowledge reinforcement for the path planning problem of super humanoid robots, aiming to quickly plan the robot's path and improve the robot's working efficiency, thereby meeting the needs of industrial applications.
[0029] It should be noted that all actions involving the acquisition of signals, information, or data in this application are carried out in compliance with the relevant data protection laws and regulations of the locality and with authorization from the owner of the relevant device.
[0030] Figure 1 is a flowchart illustrating the path planning method provided in the embodiments of this application. As shown in Figure 1, the path planning method includes:
Step 110: Obtain the robot's path planning task to be processed; the path planning task to be processed includes at least one of the robot's real target state information and real initial state information.
[0031] Step 120: Input the path planning task to be processed into the path planning model to obtain at least one waypoint output by the path planning model; wherein, the path planning model is obtained by knowledge distillation of the preset model through the sample path dataset; the preset model is built based on CVAE; any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, robot sample initial state information and robot sample target state information.
[0032] Step 130: Generate the robot's planned path based on each waypoint.
[0033] It should be noted that the path planning method provided in this application embodiment can be applied to robots. These robots may include, but are not limited to, single-arm robots, dual-arm collaborative robots, and superhumanoid robots.
[0034] In one embodiment, the path planning method of this application can be specifically applied to a six-armed superhumanoid robot. This six-armed superhumanoid robot has six robotic arms, each with at least six degrees of freedom.
[0035] Specifically, this application can pre-construct different sample path planning tasks for the same type of robot in a simulation environment based on one or more of the following information: robot model, task environment information, robot sample initial state information, and robot sample target state information.
[0036] Among them, the initial state information (including sample initial state information and real initial state information) is the original state information of the robot when it is planning the path, and the target state information (including sample target state information and real target state information) is the final state information that the robot needs to reach when it is planning the path.
[0037] The robot model contains relevant information about the robot needed for path planning simulation, which may include, but is not limited to, the robot's size, the number of robotic arms, etc.
[0038] The task environment information can be related to the actual industrial use environment.
[0039] Furthermore, the robot's state information (including sample initial state information, sample target state information, and real target state information and real initial state information) in this application may include the robot's manipulator joint angles, the robot's manipulator end effector Cartesian position and orientation, and may also include a mixture of the robot's manipulator joint angles and the robot's manipulator end effector Cartesian position and orientation.
[0040] In one embodiment, a large number of different sample path planning tasks for the same type of robot can be constructed based on information such as robot model, task environment information, robot sample initial state information and robot sample target state information.
[0041] Furthermore, across the different sample path planning tasks for the same robot or for the same type of robot, the robot model and the task environment information are the same.
[0042] Furthermore, this application can perform path planning separately for each sample path planning task, thereby obtaining a large number of sample paths (in this application, sample paths can also be referred to as expert paths). Together, the sample paths constitute the sample path dataset, which serves as the expert path dataset.
[0043] After obtaining the sample path dataset, this application can perform knowledge distillation on a pre-built model based on a Conditional Variational Autoencoder (CVAE) using the sample path dataset. Specifically, the pre-built model acts as a student network for knowledge distillation from the sample path dataset, allowing it to learn the planning strategies for each sample path in the dataset. This results in a path planning model containing end-to-end path planning strategies after knowledge distillation. Notably, after knowledge distillation, the path planning model in this application does not output a complete path based on the input content, but rather a corresponding number (one or more) of discrete waypoints for the required path.
[0044] Therefore, this application can implicitly store information related to the expert paths of a superhumanoid robot through knowledge distillation using a CVAE with a generative architecture.
[0045] Furthermore, this path planning model can be deployed to the controllers or processors of the same or similar robots.
[0046] It should be noted that since each sample path planning task contains the same robot model and task environment information, the path planning model can also learn the task environment or have memory of the task environment during the knowledge distillation process. This means that after the path planning model is deployed, it only needs the robot's initial state information and target state information to perform path planning according to the learned or memorized task environment.
[0047] Therefore, the robot can receive path planning tasks containing the robot's real target state information and real initial state information as path planning tasks to be processed.
[0048] Furthermore, the path planning task to be processed can be input into the path planning model to obtain at least one waypoint output by the path planning model.
[0049] After obtaining each discrete waypoint, this application can perform time-optimal planning on each discrete waypoint to ultimately form the planned path of the robot.
[0050] In one embodiment, when performing time-optimal planning, in addition to each discrete waypoint, the maximum speed and maximum acceleration of each joint of the robot arm can also be input as constraints.
[0051] Furthermore, trajectory preprocessing can be performed on the trajectory waypoints (i.e., each discrete waypoint) to obtain a continuously differentiable trajectory expression, and then time optimization can be performed to ensure that the robot's robotic arm joints do not exceed the speed and acceleration limits during the trajectory movement, thereby ensuring safety.
[0052] The specific planning methods and processes are not described in detail in this application. Typical processing methods may include Time-Optimal Trajectory Generation (TOTG), Time-Optimal Path Parameterization (TOPP), and other methods.
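To illustrate how per-joint speed and acceleration limits constrain the timing of discrete waypoints, the following is a minimal sketch. It is not TOTG or TOPP: it assumes the arm comes to rest at every waypoint and uses a trapezoidal velocity profile per joint, which is more conservative than a time-optimal parameterization. The function names `segment_duration` and `time_stamps` are hypothetical.

```python
import math


def segment_duration(delta, v_max, a_max):
    """Minimum rest-to-rest time for one joint to travel |delta|
    under a trapezoidal (or triangular) velocity profile."""
    d = abs(delta)
    if d == 0.0:
        return 0.0
    if d >= v_max * v_max / a_max:
        # Long move: accelerate to v_max, cruise, then decelerate.
        return d / v_max + v_max / a_max
    # Short move: triangular profile, v_max is never reached.
    return 2.0 * math.sqrt(d / a_max)


def time_stamps(waypoints, v_max, a_max):
    """Assign a time stamp to each waypoint (a list of joint angles).
    Each segment lasts as long as its slowest joint needs."""
    stamps = [0.0]
    for prev, curr in zip(waypoints, waypoints[1:]):
        slowest = max(
            segment_duration(c - p, v, a)
            for p, c, v, a in zip(prev, curr, v_max, a_max)
        )
        stamps.append(stamps[-1] + slowest)
    return stamps
```

With two joints limited to 1 rad/s and 1 rad/s², a segment in which the first joint moves 1 rad and the second moves 0.25 rad is paced by the first joint. A real deployment would replace this with a proper time-optimal method so that the arm does not stop at each waypoint.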
[0053] According to the path planning method of this application embodiment, a preset model is constructed in advance based on CVAE, and a sample path planning task is constructed based on information such as robot model, task environment information, robot sample initial state information, and robot sample target state information. Path planning is then performed on the sample path planning task to obtain sample paths. Furthermore, knowledge distillation is performed on the preset model using the sample path dataset constructed from the sample paths to obtain a path planning model. Thus, after obtaining the path planning task to be processed, which includes the robot's real target state information and real initial state information, it is input into the path planning model to obtain waypoints output by the path planning model. Based on each waypoint, the planned path of the robot can be generated. Since the path planning model has been obtained by knowledge distillation on the preset model based on the sample path dataset, when the path planning model performs path planning based on the path planning task to be processed, regardless of the number of robotic arms or degrees of freedom of the robot, there is no need to generate a search tree. This reduces the optimization time of the high-dimensional motion space to the time of one network feedforward, effectively shortening the time for generating waypoints. This allows for rapid generation of the robot's planned path based on each waypoint, thereby effectively reducing the robot's path planning time and meeting the needs of industrial applications.
[0054] Based on the above embodiments, the sample path is generated in the following manner:
obtaining the sample path planning task;
performing path planning based on the sample path planning task to obtain at least one sample waypoint;
performing trajectory optimization on each sample waypoint to obtain the sample path.
[0055] Specifically, this application can, in a simulation environment, use the robot model r, the task environment information e, the robot's sample initial state information, and the robot's sample target state information to construct different sample path planning tasks for the same type of robot. The initial state information in this application can also be called the initial configuration, and the target state information can also be called the target configuration.
[0056] Since in practical applications robots often only need to switch between a few given configurations, this application can define a set of preset configurations for a six-armed superhumanoid robot. When constructing the target task (i.e., the sample path planning task), the initial configuration can be calculated using a formula (the formula and its symbols are not reproduced in this text) whose terms are: a uniform sample on [0,1]; a uniform sample within the feasible region of the joint space of the six-armed superhumanoid robot; a Gaussian distribution; a uniform sample from the set of preset configurations; and two adjustable parameters, representing the random probability and the covariance matrix, respectively.
[0057] The target configuration is calculated in a similar way to the initial configuration; the only difference is that two symbols in the formula are replaced with their counterparts for the target configuration (the symbols themselves are not reproduced in this text).
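Because the sampling formula itself is missing from this text, the following is only one plausible reading of the terms listed in [0056], offered as a hedged sketch: with the random probability, draw a configuration uniformly from the joint space; otherwise pick a preset configuration uniformly and perturb it with Gaussian noise. The branch structure, the box-shaped joint limits, the diagonal covariance (given as per-joint standard deviations) and the final clipping are all assumptions, as is the name `sample_configuration`.

```python
import random


def sample_configuration(presets, joint_limits, p_random, sigma, rng):
    """Sample one configuration (list of joint angles).

    presets      -- list of preset configurations (assumed given)
    joint_limits -- list of (low, high) per joint; a box stands in for the
                    feasible region of the joint space (assumption)
    p_random     -- probability of ignoring the presets entirely
    sigma        -- per-joint standard deviation (diagonal covariance, assumption)
    rng          -- a random.Random instance
    """
    if rng.random() < p_random:
        # Uniform sample over the joint space.
        return [rng.uniform(lo, hi) for lo, hi in joint_limits]
    # Uniformly chosen preset, perturbed by Gaussian noise.
    base = rng.choice(presets)
    noisy = [rng.gauss(q, s) for q, s in zip(base, sigma)]
    # Clip back into the joint limits (assumption; not stated in the text).
    return [min(max(q, lo), hi) for q, (lo, hi) in zip(noisy, joint_limits)]
```

Under this reading, the target configuration would be drawn by calling the same function with a second pair of values for `p_random` and `sigma`.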
[0058] Furthermore, for each sample path planning task, the generatePath function can be used to generate the corresponding sample path.
[0059] One feasible approach to the specific implementation of the generatePath function is to use the Open Motion Planning Library (OMPL) sampling method to obtain discrete waypoints (i.e., sample waypoints) and then perform targeted optimization through post-processing to obtain sample paths.
[0060] Specifically, after sampling to obtain the initial and target configurations and importing the task environment information into the simulation environment, path planning can be performed using the path planner AIT* (Adaptively Informed Trees) within a given time budget to obtain discrete sample waypoints. Furthermore, Covariant Hamiltonian Optimization for Motion Planning (CHOMP) can be used to optimize the trajectory formed by the sample waypoints, resulting in a smooth sample path, i.e., an expert path.
[0061] Here, AIT* is a sampling-based path planning algorithm designed to balance efficiency and optimality in high-dimensional robot planning.
[0062] It should be noted that this application can also use other path planners, such as Rapidly-exploring Random Tree (RRT) and Probabilistic Roadmap (PRM), for path planning.
[0063] This application generates numerous initial and target configurations for the superhumanoid robot within a reasonable task space, incorporating prior knowledge, to form sample path planning tasks. A path planner is then run on these tasks to obtain feasible paths, thereby generating an expert path dataset. The expert path dataset therefore stores a large number of trajectory points successfully planned for the superhumanoid robot. This allows the preset model to effectively learn path planning knowledge when knowledge distillation is performed on it based on the expert path dataset. Consequently, the path planning model can quickly output the discrete waypoints required for path planning, reducing the time required to generate waypoints and enabling rapid generation of the robot's planned path based on these waypoints. This effectively reduces the robot's path planning time and meets the needs of industrial applications.
[0064] Based on the above embodiments, the path planning model is obtained in the following way:
obtaining the sample path dataset;
removing the sample paths that fail the collision detection from the sample path dataset to obtain the target path dataset;
performing knowledge distillation on the preset model based on the target path dataset to obtain the path planning model.
[0065] Specifically, this application can obtain a sample path dataset pre-composed of each sample path.
[0066] After obtaining the sample path dataset, this application can perform collision detection on each sample path in the sample path dataset separately.
[0067] Specifically, for any sample path, this application can detect whether the initial configuration and target configuration of the sample path collide or overlap with the colliders in the task environment information. If the initial configuration or the target configuration collides or overlaps with the colliders in the task environment information, the sample path is determined to have failed the collision detection and is removed. If neither the initial configuration nor the target configuration collides or overlaps with the colliders in the task environment information, the sample path is determined to have passed the collision detection and is retained.
[0068] The sample path dataset after collision detection-based sample path removal is used as the target path dataset.
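The filtering step from [0064], in which sample paths that fail collision detection are removed, can be sketched as follows. This is a toy illustration only: configurations are treated as points and colliders as spheres given by a center and a radius, whereas a real system would check the full robot geometry in the simulator. The names `in_collision` and `filter_paths` are hypothetical.

```python
import math


def in_collision(point, colliders):
    """True if the point lies inside or on any spherical collider.
    Each collider is a (center, radius) pair (toy stand-in for a real checker)."""
    return any(math.dist(point, center) <= radius for center, radius in colliders)


def filter_paths(sample_paths, colliders):
    """Keep only sample paths whose initial and target configurations
    are both collision-free; everything else is dropped."""
    return [
        path
        for path in sample_paths
        if not in_collision(path[0], colliders)
        and not in_collision(path[-1], colliders)
    ]
```

The returned list plays the role of the target path dataset. Only the endpoints are checked here, matching the description in [0067].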
[0069] Furthermore, the preset model can be used as a student network to distill knowledge from the target path dataset. Since robot data is often multimodal, this application uses the CVAE-based preset model as the trajectory generation architecture of the student network to fit the expert paths in the target path dataset.
[0070] It should be noted that in traditional action generation architectures, the sequence of actions preceding any given action is not used as input. However, this application requires action sequence information as a decision-making basis for generating smooth trajectories in subsequent knowledge reinforcement (i.e., reinforcement learning). Therefore, the decoder in the traditional CVAE network structure design has been augmented with an input interface for the preceding trajectory. The preceding trajectory is the combination of all trajectories obtained up to the current time step.
[0071] Figure 2 is a schematic diagram illustrating the process of knowledge distillation of a CVAE-based preset model based on a sample path dataset (or target path dataset) in the path planning method provided in the embodiments of this application. As shown in Figure 2, this application can take the sample target state information of the robot in each sample path as the task target, the sample initial state information of the robot as the initial state, and the trajectory corresponding to the sample path as the instruction trajectory to form the input of the preset model. The states that follow the initial state can then be used as the current state.
[0072] Therefore, in the knowledge distillation process, the task objective, the current state (initially the initial state information of the sample), and the instruction trajectory can be embedded to obtain the corresponding vectors.
[0073] Then, the concatenated vectors are input into the self-attention Transformer backbone network of the encoder. The output of the backbone network is flattened and then passed through a fully connected (FC) layer in the encoder to obtain the mean μ and variance σ of the latent variables. The mean μ and variance σ of the latent variables define a Gaussian distribution, so the preset model can randomly sample from this distribution to obtain a specific latent variable z.
[0074] Furthermore, the latent variable z and the preceding trajectory are embedded and concatenated before being input into the Decoder's self-attention Transformer backbone network. The self-attention Transformer backbone network gradually deduces the subsequent trajectory sequence based on the input content, and its output is mapped through a fully connected (FC) layer to finally generate a new instruction trajectory.
[0075] Furthermore, the new instruction trajectory can be fed back to the decoder as a new "preceding trajectory" to generate the trajectory for the next moment, thereby achieving long-term, coherent trajectory generation.
[0076] After the above training (i.e., knowledge distillation), the resulting student network can be used as a path planning model for trajectory generation.
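The text does not state the training objective. As a point of reference only, the standard CVAE formulation that matches the encoder and decoder described above can be written as follows. Here the variance is written as σ² (the text denotes it σ), c stands for the condition (task target and current state), τ is the expert instruction trajectory, τ̂ is the decoder output, and β is a hypothetical weighting factor; all of these symbols other than μ, σ and z are assumptions.

```latex
% Encoder posterior and reparameterized sample of the latent variable
q_\phi(z \mid \tau, c) = \mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right), \qquad
z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)

% Reconstruction term plus KL divergence to the standard normal prior
\mathcal{L} = \lVert \tau - \hat{\tau} \rVert^2
  + \beta \cdot \frac{1}{2} \sum_i \left( \mu_i^2 + \sigma_i^2 - 1 - \log \sigma_i^2 \right)
```

The first term pulls the generated trajectory toward the expert path; the second keeps the latent distribution close to a standard normal prior.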
[0077] In the inference (i.e., model application) process, this application only needs to generate a sequence of actions at a fixed frequency, so the latent variable z can be set directly to zero during inference.
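The feedback loop from [0075], combined with a latent variable fixed at zero, can be sketched as an autoregressive rollout. The decoder is represented here by a hypothetical callable `decoder(z, goal, preceding)` that returns the next waypoint; `toy_decoder` is a stand-in that ignores z and moves halfway toward the goal, not the Transformer decoder described above.

```python
def rollout(decoder, start, goal, latent_dim, steps):
    """Generate a trajectory by repeatedly feeding the trajectory so far
    back into the decoder, with the latent variable held at zero."""
    z = [0.0] * latent_dim  # latent variable set to zero at inference
    trajectory = [list(start)]
    for _ in range(steps):
        # The full preceding trajectory is the decoder's extra input.
        trajectory.append(decoder(z, goal, trajectory))
    return trajectory


def toy_decoder(z, goal, preceding):
    """Stand-in decoder (assumption): step halfway from the last
    waypoint toward the goal, ignoring z."""
    last = preceding[-1]
    return [q + 0.5 * (g - q) for q, g in zip(last, goal)]
```

For example, `rollout(toy_decoder, [0.0, 0.0], [1.0, 2.0], latent_dim=4, steps=3)` produces four waypoints that approach the goal. If the usual standard normal prior is assumed, z = 0 is the mean of that prior, which makes the output deterministic.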
[0078] This application improves the path planning accuracy of the path planning model after knowledge distillation by performing collision detection on the sample paths in the sample path dataset, thus avoiding interference from abnormal samples.
[0079] Furthermore, by adding an input interface for the preceding trajectory, the corresponding action sequence information can be used as the basis for decision-making in subsequent knowledge reinforcement, which facilitates the generation of smooth trajectories.
[0080] In one embodiment, the path planning model is obtained by optimizing the decoder of CVAE in the pre-trained preset model through reinforcement learning.
[0081] Furthermore, the variance required for the probability distribution of action policies in reinforcement learning is obtained from the CVAE decoder.
[0082] Specifically, path planning does not require dynamics simulation and can be optimized directly in simulation. In addition, the pre-trained model in this application, after the input interface for the preceding trajectory is added to the CVAE, already has the ability to generate collision-free trajectories. This application can therefore use reinforcement learning to fine-tune the network of the pre-trained model and further improve the success rate and smoothness of trajectory generation. Concretely, the decoder of the CVAE in the pre-trained model is optimized using reinforcement learning.
[0083] Here, the action policy in reinforcement learning is a probability distribution, which can be modeled as a Gaussian distribution. The mean of the Gaussian distribution is provided by the output of the decoder, while the variance can be specified manually or obtained through autonomous exploration.
[0084] If obtained through autonomous exploration, a common approach is to add a head (specifically, a fully connected (FC) layer) to the output of the self-attention Transformer backbone network of the CVAE decoder as the output of the variance.
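The Gaussian policy with a learned variance head can be sketched as below. This is an illustration under stated assumptions: the backbone features are random placeholders, the head predicts a log-variance (a common choice to keep the variance positive, not something this application specifies), and the class name `GaussianPolicyHead` and all dimensions are hypothetical.

```python
import numpy as np

class GaussianPolicyHead:
    """Hypothetical mean and variance heads on top of decoder backbone features."""

    def __init__(self, feat_dim, act_dim, rng):
        self.w_mean = rng.standard_normal((feat_dim, act_dim)) * 0.1
        self.w_logvar = np.zeros((feat_dim, act_dim))
        self.b_logvar = np.full(act_dim, -2.0)  # small initial variance

    def distribution(self, features):
        mean = features @ self.w_mean
        log_var = features @ self.w_logvar + self.b_logvar
        return mean, np.exp(log_var)  # exp keeps the variance positive

def gaussian_log_prob(action, mean, var):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (action - mean) ** 2 / var)))

rng = np.random.default_rng(0)
head = GaussianPolicyHead(feat_dim=16, act_dim=7, rng=rng)
features = rng.standard_normal(16)  # placeholder for backbone output
mean, var = head.distribution(features)
action = mean + np.sqrt(var) * rng.standard_normal(7)
log_prob = gaussian_log_prob(action, mean, var)
```

The log-probability is the quantity a policy-gradient method needs in order to weight sampled actions.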
[0085] Since the pre-trained model network already possesses a strong ability to imitate the expert, only smoothness needs to be optimized. Therefore, this application can use the Proximal Policy Optimization (PPO) algorithm with appropriately designed rewards: jump and fluctuation terms in the network output are penalized, and a strong negative reward is added for collisions, so that smoothness optimization does not cause collisions by skipping necessary waypoints.
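One possible shape for such a reward is sketched below. The specific terms are assumptions made for illustration: first differences of the waypoints stand in for "jumps", second differences stand in for "fluctuation", and the weights and the collision penalty of 100 are hypothetical values, not figures from this application.

```python
import numpy as np

def smoothness_reward(waypoints, collided, w_jump=1.0, w_fluct=1.0,
                      collision_penalty=100.0):
    """Negative cost on jumps and fluctuation, plus a large collision penalty."""
    step = np.diff(waypoints, axis=0)         # first differences: jumps
    accel = np.diff(waypoints, n=2, axis=0)   # second differences: fluctuation
    reward = -w_jump * np.sum(step ** 2) - w_fluct * np.sum(accel ** 2)
    if collided:
        reward -= collision_penalty           # strong negative feedback
    return float(reward)

straight = np.linspace([0.0, 0.0], [1.0, 1.0], 5)
zigzag = straight.copy()
zigzag[1::2, 1] += 0.3  # perturb every other waypoint
```

Under this sketch a straight path scores higher than a zigzag path between the same endpoints, and any collision dominates the smoothness terms.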
[0086] It should be noted that PPO is a classic reinforcement learning algorithm, but traditional applications of PPO have focused on the design and optimization of the reward function, without considering the overall deployment of the algorithm. As an on-policy algorithm, PPO requires a significant amount of time to collect simulation data under the current policy. Optimizing PPO from scratch would consume a considerable amount of time for the network to learn basic skills.
[0087] In this application, the basic ability to generate trajectories has already been obtained through knowledge distillation. The PPO algorithm is only a fine-tuning of the pre-trained preset model network on this basis, rather than retraining. This will greatly reduce training time and improve training effect.
[0088] In addition, in the current cutting-edge field of action generation, Flow Matching or Diffusion is often used for multi-step generation. This application emphasizes that using the CVAE architecture can avoid multi-step iterations, reducing the length of an episode (i.e., a complete task cycle) by about 10 times, which is more conducive to the model's value network estimation and makes the overall solution more "reinforcement learning friendly". Because the total number of optimization steps required is shortened, better optimization results can be achieved per unit time. Flow Matching and Diffusion are two current model paradigms in the field of generative artificial intelligence.
[0089] In some cases, this application can also employ generative architectures such as Flow Matching and Diffusion to achieve knowledge distillation. Specifically, if a generative architecture with multi-step iterations is used, the subsequent knowledge reinforcement stage must operate on an augmented state, that is, the iteration step is added to the original state input.
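The augmented state can be illustrated with a small sketch. It assumes the iteration step is appended to the state vector as a single normalized scalar; the normalization and the function name `augment_state` are hypothetical choices, and other encodings of the step would serve the same purpose.

```python
import numpy as np

def augment_state(state, step, num_steps):
    """Append the normalized iteration index to the original state vector."""
    frac = step / max(num_steps - 1, 1)
    return np.concatenate([state, [frac]])

state = np.array([0.1, -0.2, 0.3])  # placeholder original state
augmented = [augment_state(state, k, num_steps=10) for k in range(10)]
```

Each of the ten generation steps then sees the same original state plus a distinct step indicator.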
[0090] Furthermore, this application can also use other reinforcement learning algorithms for knowledge reinforcement. For example, A2C, A3C, and SAC can be used. Among them, A2C, A3C, and SAC are all Actor-Critic algorithms in reinforcement learning.
[0091] Therefore, this application adopts a high-degree-of-freedom path generation strategy learning process through knowledge distillation combined with knowledge reinforcement, and incorporates past time trajectory inputs into the knowledge distillation process to provide a strategic basis for subsequent reinforcement learning strategy improvement.
[0092] Based on this, after completing the path planning model and obtaining the path planning task to be processed, the path planning task to be processed can be input into the path planning model to obtain at least one waypoint output by the path planning model.
[0093] After obtaining each discrete waypoint, this application can perform time-optimal planning on each discrete waypoint to ultimately form the planned path of the robot.
[0094] In one embodiment, when performing time-optimal planning, in addition to each discrete waypoint, the maximum speed and maximum acceleration of each joint of the robot arm can also be input as constraints.
[0095] Furthermore, trajectory preprocessing can be performed on the trajectory waypoints (i.e., each discrete waypoint) to obtain a continuously differentiable trajectory expression, and then time optimization can be performed to ensure that the robot's robotic arm joints do not exceed the speed and acceleration limits during the trajectory movement, thereby ensuring safety.
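How joint velocity and acceleration limits bound the timing of a path can be illustrated with a simplified sketch. It assumes each segment between consecutive waypoints is traversed rest to rest with a trapezoidal or triangular velocity profile, which is conservative and is not the time-optimal planner of this application. The function names and numeric limits are hypothetical.

```python
import numpy as np

def min_segment_time(dist, v_max, a_max):
    """Minimum rest-to-rest time per joint under velocity and acceleration limits."""
    dist = np.abs(dist)
    tri = 2.0 * np.sqrt(dist / a_max)     # v_max never reached: triangular profile
    trap = dist / v_max + v_max / a_max   # cruising at v_max: trapezoidal profile
    return np.where(dist < v_max ** 2 / a_max, tri, trap)

def segment_durations(waypoints, v_max, a_max):
    """Each segment lasts as long as its slowest joint needs."""
    deltas = np.diff(waypoints, axis=0)
    return np.max(min_segment_time(deltas, v_max, a_max), axis=1)

waypoints = np.array([[0.0, 0.0], [1.0, 0.25], [3.0, 0.25]])  # two joints
v_max = np.array([1.0, 1.0])
a_max = np.array([1.0, 1.0])
durations = segment_durations(waypoints, v_max, a_max)  # array([2., 3.])
```

Taking the maximum over joints keeps every joint within its limits on each segment, since the slower joints can simply move below their limits.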
[0096] It should be noted that if the input trajectory waypoints themselves have significant fluctuations, the planning time will be greatly extended, highlighting the necessity of knowledge reinforcement in this application. That is, this application uses reinforcement learning to make the trajectory waypoints output by the model more accurate and stable, avoiding significant fluctuations in the trajectory waypoints themselves, thereby effectively reducing the path planning time. Ultimately, the planning time can be limited to the sub-second level, which can meet the needs of industrial applications.
[0097] Figure 3 is a schematic diagram of the robot's path planning architecture and process in the path planning method provided in the embodiments of this application. As shown in Figure 3, in one embodiment, the robot of this application may include a knowledge generation module, a knowledge distillation module, a knowledge reinforcement module, and a trajectory post-processing module.
[0098] The knowledge generation module generates a large number of initial and target configurations for the super-humanoid robot within a reasonable task space, combining prior knowledge to form tasks. These tasks are then processed by a path planner to obtain feasible paths, generating an expert path dataset. This expert path dataset stores a large number of trajectory points successfully planned by the super-humanoid robot. This expert data is used by the knowledge distillation module to obtain path generation strategies. In the knowledge distillation module, a student network distills knowledge from the expert path dataset. Since the robot-generated data often exhibits multimodality, CVAE is used as the trajectory generation architecture for the student network to fit the data in the expert path dataset. After training, the resulting network can be applied to trajectory generation.
[0099] To further improve the success rate and smoothness of trajectory generation, the knowledge reinforcement module trains this path generation strategy with PPO. By applying domain randomization to the joint input parameters and penalizing discontinuities in the generated trajectory, the robustness and smoothness of trajectory generation can be improved.
[0100] Finally, the discrete trajectory (i.e., discrete waypoints) is parameterized and collision detection is performed again through the trajectory post-processing module to ensure that the trajectory meets the joint safety constraints.
[0101] In practical applications, the process is divided into two phases: training and deployment / inference. The knowledge generation module generates expert data for training, the knowledge distillation module uses the dataset generated by the knowledge generation module to train the student network and obtain a path generation strategy, and the knowledge reinforcement module optimizes the obtained path generation strategy through parallel simulation to obtain an optimized path generation strategy. At this point, the training phase ends.
[0102] During actual inference, the current state is obtained at each time step, and the discrete trajectory points are obtained from the optimized path generation strategy. These discrete points are then passed to the trajectory post-processing module for smoothing and time optimization, and the result is the final output path.
[0103] The path planning apparatus provided in this application is described below. The path planning apparatus described below and the path planning method described above can be referred to in correspondence.
[0104] Furthermore, this application also provides a path planning device.
[0105] The path planning device includes: an acquisition module, used to acquire the robot's path planning task to be processed, where the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; a planning module, used to input the path planning task to be processed into a path planning model and obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: the robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information; and a generation module, used to generate the planned path of the robot based on each waypoint.
[0106] The path planning device of this application constructs a preset model based on CVAE and a sample path planning task based on the robot model, task environment information, robot sample initial state information, and robot sample target state information. It then performs path planning on the sample path planning task to obtain sample paths. Furthermore, it performs knowledge distillation on the preset model using the sample path dataset constructed from the sample paths to obtain a path planning model. Thus, after obtaining the path planning task to be processed, which includes the robot's real target state information and real initial state information, it inputs it into the path planning model to obtain waypoints output by the path planning model. Based on these waypoints, the planned path of the robot can be generated. Since the path planning model is obtained by pre-distilling knowledge on the preset model based on the sample path dataset, the path planning model does not need to generate a search tree when planning paths based on the path planning task to be processed, regardless of the number of robotic arms or degrees of freedom of the robot. This reduces the optimization time of the high-dimensional motion space to the time of a single network feedforward, effectively shortening the time for generating waypoints. This allows for rapid generation of the robot's planned path based on each waypoint, thereby effectively reducing the robot's path planning time and meeting the needs of industrial applications.
[0107] Figure 4 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 4, the electronic device includes a processor 410, a communication interface 420, a memory 430, and a communication bus 440. The processor 410, communication interface 420, and memory 430 communicate with each other via the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute the following method: acquiring a path planning task to be processed for the robot, where the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; inputting the path planning task to be processed into the path planning model to obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information; and generating the robot's planned path based on each waypoint.
[0108] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to related technologies, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0109] In another aspect, embodiments of this application also provide a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the methods provided in the above embodiments, for example: acquiring a path planning task to be processed for a robot, where the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; inputting the path planning task to be processed into the path planning model to obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information; and generating the robot's planned path based on each waypoint.
[0110] In another aspect, embodiments of this application also provide a computer program product comprising a computer program which, when executed by a processor, performs the methods provided in the above embodiments, for example: acquiring a path planning task to be processed for a robot, where the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; inputting the path planning task to be processed into the path planning model to obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information; and generating the robot's planned path based on each waypoint.
[0111] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0112] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the parts that contribute to the related technology, can be embodied in the form of software products. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0113] Finally, it should be noted that the above embodiments are only used to illustrate this application and are not intended to limit this application. Although this application has been described in detail with reference to the embodiments, those skilled in the art should understand that various combinations, modifications, or equivalent substitutions of the technical solutions of this application do not depart from the spirit and scope of the technical solutions of this application and should be covered within the scope of the claims of this application.
Claims
1. A path planning method, characterized in that it comprises: obtaining a path planning task to be processed for a robot, wherein the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; inputting the path planning task to be processed into a path planning model to obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model using a sample path dataset; the preset model is constructed based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, the robot's sample initial state information, and the robot's sample target state information; and generating the robot's planned path based on each waypoint.
2. The path planning method according to claim 1, characterized in that the path planning model is obtained as follows: obtaining the sample path dataset; removing the sample paths that fail collision detection from the sample path dataset to obtain a target path dataset; and performing knowledge distillation on the preset model based on the target path dataset to obtain the path planning model.
3. The path planning method according to claim 1, characterized in that the sample path is generated as follows: obtaining the sample path planning task; performing path planning based on the sample path planning task to obtain at least one sample waypoint; and performing trajectory optimization on each of the sample waypoints to obtain the sample path.
4. The path planning method according to any one of claims 1 to 3, characterized in that the path planning model is obtained by optimizing the decoder of the CVAE in the pre-trained preset model through reinforcement learning.
5. The path planning method according to claim 4, characterized in that the variance required for the probability distribution of the action policy in the reinforcement learning is obtained from the decoder of the CVAE.
6. The path planning method according to claim 4, characterized in that the decoder of the CVAE includes an input interface for the preceding trajectory.
7. A path planning device, characterized in that it comprises: an acquisition module, used to acquire a path planning task to be processed for a robot, wherein the path planning task to be processed includes at least one of the robot's real target state information and real initial state information; a planning module, used to input the path planning task to be processed into a path planning model and obtain at least one waypoint output by the path planning model, wherein the path planning model is obtained by knowledge distillation of a preset model through a sample path dataset; the preset model is built based on a conditional variational autoencoder (CVAE); any sample path in the sample path dataset is obtained by path planning based on the robot's sample path planning task; the sample path planning task includes at least one of the following: robot model, task environment information, sample initial state information of the robot, and sample target state information of the robot; and a generation module, used to generate the planned path of the robot based on each waypoint.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the path planning method according to any one of claims 1 to 6.
9. A storage medium, said storage medium being a non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, it implements the path planning method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the path planning method according to any one of claims 1 to 6.