An unmanned aerial vehicle formation path planning method, device and medium

By combining the Q-learning algorithm and B-spline curve fitting method with a six-degree-of-freedom full-state nonlinear mathematical model of UAVs and multi-agent consensus theory, the path planning problem of UAV formation in unknown environments is solved, improving the safety and efficiency of formation flight.

CN118859985B · Active · Publication Date: 2026-06-19 · NANCHANG HANGKONG UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANCHANG HANGKONG UNIVERSITY
Filing Date
2024-07-01
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to plan the path of drone formations in unknown environments, especially in complex urban low-altitude environments, making it difficult to guarantee flight safety and work efficiency. Furthermore, the capabilities of a single drone are limited, making it unsuitable for comprehensive missions.

Method used

The Q-learning algorithm combined with the B-spline curve fitting method is used for virtual lead aircraft path planning. The path planning and flight cooperative control of UAV formation are realized by combining the six-degree-of-freedom full-state nonlinear mathematical model of UAVs and the multi-agent consensus theory.

Benefits of technology

It enables UAV formation path planning in unknown environments, improving flight safety and work efficiency, and expanding the application scenarios of single-aircraft path planning to formation path planning.



Abstract

This invention discloses a method, device, and medium for unmanned aerial vehicle (UAV) formation path planning and cooperative control, relating to the fields of path planning and flight control. The method includes: acquiring three-dimensional terrain data of the UAV formation's working area; the UAV formation includes a virtual lead aircraft and wingmen; discretizing the environmental space and actions based on the three-dimensional terrain data; using a Q-learning algorithm to plan the path for the virtual lead aircraft; and then combining a six-degree-of-freedom full-state nonlinear mathematical model of the UAV and multi-agent consensus theory to perform flight control on the wingmen within the formation. This invention solves the problem of UAV formation flight path planning and cooperative control in complex and unknown environments. Furthermore, by combining the artificial potential field method and multi-agent consensus control theory, obstacle avoidance and collision avoidance are achieved in UAV formation flight.

Description

Technical Field

[0001] This invention relates to the field of path planning technology, and in particular to a method, apparatus, and medium for unmanned aerial vehicle (UAV) formation path planning.

Background Technology

[0002] Despite their high flexibility and maneuverability, drones face numerous limitations and safety issues when operating in complex urban low-altitude environments. These issues compromise flight safety and operational efficiency, making it difficult to meet the potentially explosive future growth in low-altitude flight demand. Finding an environmentally adaptable path planning and control system for urban low-altitude drones is therefore a critical issue for future urban low-altitude transportation.

[0003] For some complex tasks, the capabilities of a single drone are limited and insufficient. Therefore, with the rapid development of new-generation information technology, artificial intelligence, and swarm intelligence, there will be many scenarios in future urban low-altitude transportation where drone swarms will operate. Unlike path planning for a single drone, path planning for drone swarms also requires consideration of collision avoidance within the swarm.

[0004] Compared with common path planning algorithms such as the A* algorithm, ant colony optimization, and particle swarm optimization, reinforcement learning methods do not require building complex models and can therefore be applied to path planning tasks in unknown environments. Q-learning, one of the most common reinforcement learning methods, has gained favor among researchers. Once a path has been planned with Q-learning, flight control of the wingmen within a formation can be achieved by combining a six-degree-of-freedom full-state nonlinear mathematical model of UAVs with multi-agent consensus theory, thereby realizing path planning for UAV formations.

Summary of the Invention

[0005] The purpose of this invention is to provide a method, apparatus, and medium for UAV formation path planning that solves the problem of UAV formation path planning in unknown environments. Furthermore, by incorporating a consensus control algorithm, the application scenario is extended from single-UAV path planning to UAV formation path planning.

[0006] To achieve the above objectives, the present invention provides the following solution:

[0007] In a first aspect, the present invention provides a method for unmanned aerial vehicle (UAV) formation path planning, the method comprising:

[0008] Acquiring 3D terrain data of the working area of the UAV formation; the UAV formation includes a virtual lead aircraft and wingmen; the 3D terrain data includes the starting point of each UAV in the formation and the desired formation of the UAV formation;

[0009] Based on the 3D terrain data, establishing a state set S for the virtual lead aircraft, discretizing its actions to establish an action set A, and establishing a value function table Q(S,A) based on the state set and the action set;

[0010] Determining the strategy by which the virtual lead aircraft selects actions from the action set when interacting with the environment space;

[0011] Updating the value function table Q(S,A) using the Q-learning algorithm to obtain the path points of the virtual lead aircraft;

[0012] Based on the path points of the virtual lead aircraft, generating a virtual lead aircraft path using the B-spline curve fitting method;

[0013] Based on the virtual lead aircraft path and the desired formation, performing cooperative formation flight control of the wingmen by combining the six-degree-of-freedom full-state mathematical model of the UAVs with multi-agent consensus theory.

[0014] Preferably, the three-dimensional environment with which the UAV formation interacts is discretized into unit grid cells using the grid method; to discretize the UAV's actions, a cube (regular hexahedron) with an edge length of 2 units is constructed with the UAV at its center, and the action space is discretized into 26 directions from which the UAV can choose (the normal directions of the 6 faces of the cube, the directions of its 8 vertices, and the directions of the midpoints of its 12 edges).
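The 26-direction action set can be enumerated directly. The following is a minimal sketch, assuming each action is an integer offset (dx, dy, dz) on the unit grid; the names `action_set` and `classify` are illustrative and do not come from the patent.

```python
from itertools import product

def action_set():
    """All offsets in {-1, 0, 1}^3 except the zero move: 26 directions."""
    return [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def classify(direction):
    """Name the cube feature a direction points at, by its non-zero components."""
    nonzero = sum(1 for c in direction if c != 0)
    return {1: "face", 2: "edge", 3: "vertex"}[nonzero]

ACTIONS = action_set()
counts = {}
for d in ACTIONS:
    counts[classify(d)] = counts.get(classify(d), 0) + 1
print(len(ACTIONS), counts)
```

One non-zero component points at a face center, two at an edge midpoint, and three at a vertex, which reproduces the 6 + 12 + 8 split described above.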

[0015] Preferably, the value function table Q(s,a), established based on the agent's state set S and action set A, is updated according to the immediate reward r obtained by the agent after taking action a (a∈A) in its current state s (s∈S) and transitioning to the next state s' (s'∈S).

[0016] The update function is described as follows:

[0017] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \zeta \max_{a \in A} Q(s',a) - Q(s,a)\right]$

[0018] where α is the learning rate, used to weigh the impact of the value of the next state on the value of the current state; ζ is the learning discount factor, ζ∈(0,1), and the closer it is to 1, the more weight this iterative formula gives to the value obtainable from subsequent states; $\max_{a \in A} Q(s',a)$ indicates that, at the next moment, the action a that maximizes Q(s',a) is selected from the action set A.
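A single application of this update can be sketched as follows. This is an illustrative sketch only: the dictionary-based table, the function name `q_update`, and the numeric values are assumptions, not part of the patent.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, zeta=0.9):
    """One tabular Q-learning step: move Q(s,a) toward r + zeta * max_a' Q(s',a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + zeta * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical two-state example: the next state already holds a value of 10.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 10.0
new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, zeta=0.9)
print(new_value)
```

With α = 0.5 the entry moves halfway from its old value (0) toward the target 1 + 0.9 × 10 = 10.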

[0019] Preferably, the action selection strategy balances exploration and exploitation during the learning process of the Q-learning algorithm, enabling the virtual lead aircraft to explore the environment more comprehensively when searching for a path. This strategy can be described as follows:

[0020] $\pi(s) = \begin{cases} \text{a random action } a \in A, & \text{with probability } \varepsilon \\ \arg\max_{a \in A} Q(s,a), & \text{with probability } 1-\varepsilon \end{cases}$

[0021] Here, π(s) represents the action selection strategy in state s. First, a small probability ε∈(0,1) is initialized. For a given state s, a random action is selected with probability ε, and the action a that obtains the maximum Q value in the current state is selected with probability (1-ε).
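The ε-greedy rule described above might be written as in the following sketch. The function name, the dictionary-based table, and the injectable random generator `rng` are assumptions made for illustration.

```python
import random

def epsilon_greedy(Q, s, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore
    return max(actions, key=lambda a: Q.get((s, a), 0.0))  # exploit

# Hypothetical table in which "up" is the best action in state "s0".
Q = {("s0", "up"): 2.0, ("s0", "down"): -1.0}
actions = ["up", "down"]
greedy_choice = epsilon_greedy(Q, "s0", actions, epsilon=0.0)
random_choice = epsilon_greedy(Q, "s0", actions, epsilon=1.0,
                               rng=random.Random(0))
print(greedy_choice, random_choice)
```

Passing a seeded `random.Random` instance keeps exploration reproducible during testing.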

[0022] Preferably, the immediate reward r can be described by the following formula:

[0023] $r = r_{action} + r_{environment}$

[0024] where the possible values of $r_{action}$ and $r_{environment}$ are as follows:

[0025]

[0026] where d is the step length of each movement of the UAV; in equation (8), $\Lambda_{obs}$ denotes the region occupied by obstacles in the environment, $\Lambda_{env}$ denotes the entire environment region, and $(x_{goal}, y_{goal})$ denotes the coordinates of the target point. The state reward obtained by the UAV is -50 when it moves onto a grid cell containing an obstacle, 100 when it moves to the target point, and 0 at all other grid cells.
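The environment part of the reward, with the three values stated above, can be sketched as a small lookup. The function and argument names are hypothetical, and the action part of the reward is deliberately left out because its expression is not given in the text.

```python
def environment_reward(cell, obstacles, goal):
    """State reward: -50 on an obstacle cell, 100 at the target, 0 elsewhere."""
    if cell in obstacles:
        return -50
    if cell == goal:
        return 100
    return 0

# Hypothetical grid cells, given as (x, y, z) indices.
obstacles = {(1, 1, 0), (2, 0, 1)}
goal = (3, 3, 1)
rewards = [environment_reward(c, obstacles, goal)
           for c in [(1, 1, 0), (3, 3, 1), (0, 0, 0)]]
print(rewards)
```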

[0027] Preferably, the path points of the virtual lead aircraft are derived from the value function table Q(S,A) as follows: starting from the state of the virtual lead aircraft that corresponds to the starting point, select the action a corresponding to the maximum value in Q(s,A) for the current state s, execute this action to update the state, and repeat until the target point is reached, recording each state of the virtual lead aircraft along the way. The recorded states, exported in order, are the path points of the virtual lead aircraft.
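This greedy read-out of the table can be sketched as below. The sketch assumes states are grid coordinates and actions are coordinate offsets; the `max_steps` cap is an added safeguard against a table that never leads to the target and is not part of the described method.

```python
def extract_path(Q, start, goal, actions, max_steps=1000):
    """Follow the highest-valued action from start until goal is reached."""
    path, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            return path
        a = max(actions, key=lambda act: Q.get((s, act), 0.0))
        s = tuple(p + d for p, d in zip(s, a))  # apply the offset
        path.append(s)
    raise RuntimeError("goal not reached within max_steps")

# Hypothetical hand-filled table that steers +x first, then +y.
actions = [(1, 0, 0), (0, 1, 0)]
Q = {((0, 0, 0), (1, 0, 0)): 1.0, ((1, 0, 0), (0, 1, 0)): 1.0}
waypoints = extract_path(Q, (0, 0, 0), (1, 1, 0), actions)
print(waypoints)
```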

[0028] Preferably, B-spline curves are used to fit the obtained path points, where the mathematical expression of a B-spline curve is:

[0029]

[0030] where P0(t) is the fitted path of the virtual lead aircraft, the discrete path points are those obtained by the Q-learning algorithm, m is the number of discrete path points, and l is the degree of the B-spline curve. The B-spline basis functions are:

[0031]
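For reference, a B-spline curve and its basis functions are commonly written in the following textbook form. The symbols $P_i$ for the path points, $B_{i,l}$ for the basis functions, the knots $t_i$, and the index range are assumed notation and may differ from the patent's own equations.

```latex
P_0(t) = \sum_{i=0}^{m-1} P_i \, B_{i,l}(t)

B_{i,0}(t) =
\begin{cases}
1, & t_i \le t < t_{i+1} \\
0, & \text{otherwise}
\end{cases}

B_{i,l}(t) = \frac{t - t_i}{t_{i+l} - t_i} \, B_{i,l-1}(t)
           + \frac{t_{i+l+1} - t}{t_{i+l+1} - t_{i+1}} \, B_{i+1,l-1}(t)
```

By convention, any term whose denominator is zero is taken to be zero.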

[0032] Preferably, the kinematic equations of the UAVs within the formation can be described by the following formula:

[0033]

[0034] where $P_i = [x_i, y_i, z_i]^T$ and the corresponding velocity vector represent the position and velocity of the i-th wingman, respectively. Based on the kinematic equations of the wingman and multi-agent consensus theory, the following virtual control law can be designed for the position layer of the wingman's motion:

[0035]

[0036] Furthermore, substituting the position-layer virtual control law into the kinematic equations of each wingman gives:

[0037]

[0038] where the gains are design constants; the desired trajectory of the i-th wingman is defined as $P_i^d = P_0 - \Delta_{i0}$, with $\Delta_{i0}$ the desired formation parameter; the position and velocity tracking errors are defined for i = 1, 2, ..., N; and $F_i = [F_i^x, F_i^y, 0]^T$ is the repulsive force from obstacles experienced by the i-th wingman during flight. Based on artificial potential field theory, $F_i^x$ and $F_i^y$ are designed as follows:

[0039]

[0040] where $\kappa_{oi} > 0$ is the designed repulsion coefficient, and the remaining symbols denote the distance between the i-th wingman and the k-th obstacle, the components of that distance in the x and y directions, and the maximum distance at which the UAV can sense obstacles.
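To illustrate how a desired trajectory of the form $P_i^d = P_0 - \Delta_{i0}$ drives a wingman into its formation slot, the sketch below uses a generic PD-type tracking law on a double-integrator model. This is an assumption for illustration, not the patent's control law: the gains `k_p` and `k_v`, the static leader, the absence of neighbor coupling, and the zero repulsive force are all simplifications.

```python
def desired_position(p0, delta_i0):
    """P_i^d = P0 - Delta_i0: leader position minus the formation offset."""
    return tuple(a - b for a, b in zip(p0, delta_i0))

def virtual_control(p_i, v_i, p_d, v_d, f_i, k_p=2.0, k_v=3.0):
    """Assumed PD-type law: u = -k_p*(P - P^d) - k_v*(V - V^d) + F."""
    return tuple(-k_p * (p - pd) - k_v * (v - vd) + f
                 for p, v, pd, vd, f in zip(p_i, v_i, p_d, v_d, f_i))

def simulate(p0, delta_i0, p_start, steps=2000, dt=0.01):
    """Integrate a point-mass wingman toward its slot behind a static leader."""
    p, v = list(p_start), [0.0, 0.0, 0.0]
    p_d = desired_position(p0, delta_i0)
    zero = (0.0, 0.0, 0.0)
    for _ in range(steps):
        u = virtual_control(p, v, p_d, zero, zero)
        v = [vi + ui * dt for vi, ui in zip(v, u)]
        p = [pi + vi * dt for pi, vi in zip(p, v)]
    return tuple(p)

final = simulate(p0=(5.0, 5.0, 3.0), delta_i0=(1.0, 1.0, 0.0),
                 p_start=(0.0, 0.0, 0.0))
print(final)
```

With these gains the error dynamics have poles at -1 and -2, so after 20 simulated seconds the wingman has settled at its slot (4, 4, 3) to within numerical tolerance.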

[0041] Preferably, the dynamic equations of the wingman in the formation can be described as follows:

[0042]

[0043] where $\Theta_i = [\phi_i, \theta_i, \psi_i]^T$, and $\phi_i$, $\theta_i$ and $\psi_i$ represent the roll angle, pitch angle, and yaw angle of the i-th wingman, respectively. The trajectory-attitude coupled six-degree-of-freedom full-state distributed cooperative control protocol of the UAV formation is as follows:

[0044]

[0045] where $T_i$ is the total lift of the i-th UAV; $\kappa_{\Omega i} > 0$ and $\kappa_{\Theta i} > 0$ are design parameters; m is the mass of the UAV; g is the acceleration due to gravity; and the attitude and angular-rate tracking errors are $E_{\Theta i} = \Theta_i - \Theta_i^d$ and $E_{\Omega i} = \Omega_i - \Omega_i^d$. The desired value is obtained according to claim 7. From the gravitational acceleration g and the desired heading angle, the desired pitch and roll angles can be obtained as follows:

[0046]

[0047] The following disturbance observer is designed to estimate the external disturbance:

[0048]

[0049] where the observer gains are parameters to be designed, the observer states are the estimates of $D_i(t)$ and $\Omega_i$, and $G_i = \mathrm{diag}\{1/J_x, 1/J_y, 1/J_z\}$,

[0050] where $\phi_i$, $\theta_i$ and $\psi_i$ represent the roll angle, pitch angle, and yaw angle of the i-th wingman, respectively, and $J_x$, $J_y$, $J_z$ represent the moments of inertia about the three coordinate axes. Substituting these equations into the dynamic equations of the wingmen in the formation yields the attitude angles and attitude angular rates of each wingman, thereby enabling attitude control of each wingman.

[0051] In a second aspect, the present invention provides a computer device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the unmanned aerial vehicle (UAV) formation path planning method described in the first aspect.

[0052] In a third aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the unmanned aerial vehicle (UAV) formation path planning method described in the first aspect.

[0053] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0054] This invention provides a method, apparatus, medium, and product for unmanned aerial vehicle (UAV) formation path planning. It utilizes a Q-learning algorithm and a B-spline curve fitting method for path planning of a virtual lead aircraft. Furthermore, by combining a six-degree-of-freedom full-state nonlinear mathematical model of UAVs and multi-agent consensus theory, the application scenario of the path planning method, previously focused on single UAVs, is extended to path planning for UAV formations.

Description of the Drawings

[0055] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0056] Figure 1 is a flowchart of the UAV formation path planning method provided in Embodiment 1 of the present invention.

[0057] Figure 2 shows the 26 selectable movement directions of the virtual lead aircraft provided in Embodiment 1 of the present invention.

[0058] Figure 3 shows the virtual lead aircraft path obtained by the Q-learning algorithm, as provided in Embodiment 1 of the present invention.

[0059] Figure 4 shows the smooth and differentiable virtual lead aircraft path after B-spline curve fitting, as provided in Embodiment 1 of the present invention.

[0060] Figure 5 is the UAV formation path planning diagram provided in Embodiment 1 of the present invention, obtained by combining a six-degree-of-freedom full-state nonlinear mathematical model of a UAV and multi-agent consensus theory.

[0061] Figure 6 is an internal structural diagram of the computer device provided in Embodiment 5 of the present invention.

Detailed Description of the Embodiments

[0062] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0063] Most existing research on drone formation path planning methods only focuses on planar path planning, but in reality, drone formations operate in a three-dimensional space. Simply simulating a three-dimensional space with a two-dimensional planar environment cannot meet the actual operational needs of modern drones.

[0064] The purpose of this invention is to provide a method, apparatus, medium, and product for unmanned aerial vehicle (UAV) formation path planning, which uses the Q-learning algorithm and B-spline curve fitting method to plan the path of a virtual lead UAV. Furthermore, by combining a six-degree-of-freedom full-state nonlinear mathematical model of UAVs and multi-agent consensus theory, the application scenario of the path planning method, originally designed for single UAVs, is extended to path planning of UAV formations.

[0065] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0066] Embodiment 1

[0067] As shown in Figure 1, this embodiment provides a method for unmanned aerial vehicle (UAV) formation path planning, the method including:

[0068] ① Acquire 3D terrain data of the working area of the UAV formation; the UAV formation includes a virtual lead aircraft and wingmen; the 3D terrain data includes the starting point of each UAV in the formation and the desired formation of the UAV formation;

[0069] ② Establish the state set S and action set A of the virtual lead aircraft, and create a value function table Q(S,A) based on the state set and the action set. Based on the aforementioned 3D terrain data, rasterize the environment space and discretize the actions of the virtual lead aircraft. The 3D environment with which the UAV formation interacts is rasterized into unit grid cells. To discretize the UAV's actions, a cube (regular hexahedron) with an edge length of 2 units is constructed with the UAV at its center, and the action space is discretized into 26 directions from which the UAV can choose (the normal directions of the 6 faces of the cube, the directions of its 8 vertices, and the directions of the midpoints of its 12 edges).

[0070] ③ Determine the strategy by which the virtual lead aircraft selects actions from the action set when interacting with the environment space. This action selection strategy is as follows:

[0071] $\pi(s) = \begin{cases} \text{a random action } a \in A, & \text{with probability } \varepsilon \\ \arg\max_{a \in A} Q(s,a), & \text{with probability } 1-\varepsilon \end{cases}$

[0072] Here, π(s) represents the action selection strategy in state s. First, a small probability ε∈(0,1) is initialized. For a given state s, a random action is selected from the action set A with probability ε, and the action a that obtains the maximum Q value in the current state is selected with probability (1-ε).

[0073] ④ Update the value function table Q(S,A) using the Q-learning algorithm to obtain the path points of the virtual lead aircraft. The table is updated according to the immediate reward r obtained by transitioning to the next state s' after taking action a (a∈A) from the current state s (s∈S). The update function is described as follows:

[0074] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \zeta \max_{a \in A} Q(s',a) - Q(s,a)\right]$

[0075] where α is the learning rate, used to weigh the impact of the value of the next state on the value of the current state; ζ is the learning discount factor, ζ∈(0,1), and the closer it is to 1, the more weight this iterative formula gives to the value obtainable from subsequent states; $\max_{a \in A} Q(s',a)$ indicates that, at the next moment, the action a that maximizes Q(s',a) is selected from the action set A. The immediate reward r can be described by the following formula:

[0076] $r = r_{action} + r_{environment}$

[0077] where the possible values of $r_{action}$ and $r_{environment}$ are as follows:

[0078]

[0079] where d is the step length of each movement of the UAV; in equation (8), $\Lambda_{obs}$ denotes the region occupied by obstacles in the environment, $\Lambda_{env}$ denotes the entire environment region, and $(x_{goal}, y_{goal})$ denotes the coordinates of the target point. The state reward obtained by the UAV is -50 when it moves onto a grid cell containing an obstacle, 100 when it moves to the target point, and 0 at all other grid cells. Thus, the complete virtual lead aircraft path planning method based on the Q-learning algorithm can be summarized by the following pseudocode.

[0080]

[0081]
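As a rough illustration of steps ② to ④, the following Python sketch trains a tabular Q-learning planner on a tiny 3D grid and then reads out the greedy path. It is a sketch under stated assumptions rather than the patent's own pseudocode: the grid size, obstacle cells, hyperparameters, the rule that a move leaving the grid keeps the UAV in place, and the action reward (taken here as minus the step length d) are all hypothetical choices.

```python
import math
import random
from itertools import product

SIZE = (4, 4, 2)                      # hypothetical grid extent in x, y, z
START, GOAL = (0, 0, 0), (3, 3, 1)
OBSTACLES = {(1, 1, 0), (2, 2, 1)}
ACTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def step(state, action):
    """Apply one move; return (next_state, reward, episode_finished)."""
    nxt = tuple(s + a for s, a in zip(state, action))
    if not all(0 <= c < n for c, n in zip(nxt, SIZE)):
        nxt = state                   # assumption: leaving the grid means staying put
    r_action = -math.sqrt(sum(a * a for a in action))  # assumption: cost = step length
    if nxt in OBSTACLES:
        return nxt, r_action - 50, True
    if nxt == GOAL:
        return nxt, r_action + 100, True
    return nxt, r_action, False

def train(episodes=3000, alpha=0.5, zeta=0.9, epsilon=0.2, max_steps=60, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = step(s, a)
            best_next = max(Q.get((s2, b), 0.0) for b in ACTIONS)
            target = r if done else r + zeta * best_next
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
    return Q

def greedy_path(Q, max_steps=60):
    """Follow the highest-valued action from START until GOAL or the step cap."""
    path, s = [START], START
    while s != GOAL and len(path) <= max_steps:
        a = max(ACTIONS, key=lambda act: Q.get((s, act), 0.0))
        s = step(s, a)[0]
        path.append(s)
    return path

Q = train()
path = greedy_path(Q)
print(path)
```

Episodes end when the UAV reaches the target or an obstacle cell, so the bootstrap term is dropped for those transitions. The recorded `path` plays the role of the discrete path points that are later smoothed by B-spline fitting.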

[0082] ⑤ Based on the obtained path points of the virtual lead aircraft, a smooth and differentiable virtual lead aircraft path is generated using the B-spline curve fitting method. The mathematical expression of the B-spline curve is:

[0083]

[0084] where the discrete path points are those obtained by the Q-learning algorithm, m is the number of discrete path points, and l is the degree of the B-spline curve. The B-spline basis functions can be obtained using the de Boor-Cox recursive formula, which is as follows:

[0085]

[0086] Based on the above equations, substituting the virtual lead aircraft path points obtained in ④ yields the fitted path P0(t) of the virtual lead aircraft.
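The fitting step can be sketched in pure Python with the de Boor-Cox recursion. This sketch treats the path points as control points and uses a clamped uniform knot vector on [0, 1]; both are assumptions, since the text does not state the knot vector or parameterization, and the function names are illustrative.

```python
def basis(i, l, t, knots):
    """de Boor-Cox recursion for the i-th B-spline basis function of degree l."""
    if l == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + l] - knots[i]
    right_den = knots[i + l + 1] - knots[i + 1]
    left = ((t - knots[i]) / left_den * basis(i, l - 1, t, knots)
            if left_den > 0 else 0.0)
    right = ((knots[i + l + 1] - t) / right_den * basis(i + 1, l - 1, t, knots)
             if right_den > 0 else 0.0)
    return left + right

def clamped_knots(m, l):
    """Clamped uniform knots for m control points and degree l (assumed choice)."""
    inner = [j / (m - l) for j in range(1, m - l)]
    return [0.0] * (l + 1) + inner + [1.0] * (l + 1)

def bspline_point(waypoints, l, t):
    """Evaluate the curve at parameter t in [0, 1]."""
    m = len(waypoints)
    knots = clamped_knots(m, l)
    t = min(t, 1.0 - 1e-12)           # stay inside the half-open last knot span
    weights = [basis(i, l, t, knots) for i in range(m)]
    return tuple(sum(w * p[k] for w, p in zip(weights, waypoints))
                 for k in range(3))

# Hypothetical path points, as might be produced by the Q-learning step.
pts = [(0, 0, 0), (1, 1, 0), (2, 1, 1), (3, 2, 1), (4, 4, 2)]
start_pt = bspline_point(pts, 3, 0.0)
end_pt = bspline_point(pts, 3, 1.0)
mid_pt = bspline_point(pts, 3, 0.5)
print(start_pt, mid_pt, end_pt)
```

A clamped knot vector makes the curve start at the first path point and end at the last one, which suits a path that must begin at the starting point and finish at the target; interior points are approximated rather than interpolated.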

[0087] ⑥ Based on the fitted path P0(t) of the virtual lead aircraft and the desired formation of the UAV formation in ①, cooperative formation flight control of the wingmen is performed by combining the six-degree-of-freedom full-state mathematical model of the UAVs with multi-agent consensus theory. Based on the kinematic equations of the wingman and multi-agent consensus theory, the following virtual control law can be designed for the position layer of the wingman's motion:

[0088]

[0089] where the gains are design constants, and the desired trajectory of the i-th wingman is defined as $P_i^d = P_0 - \Delta_{i0}$, with $\Delta_{i0}$ the desired formation parameter.

[0090] The position and velocity tracking errors are defined for i = 1, 2, ..., N, and $F_i = [F_i^x, F_i^y, 0]^T$ is the repulsive force from obstacles experienced by the i-th wingman during flight. Based on artificial potential field theory, $F_i^x$ and $F_i^y$ are designed as follows:

[0091]

[0092] where $\kappa_{oi} > 0$ is the designed repulsion coefficient, and the remaining symbols denote the distance between the i-th wingman and the k-th obstacle, the components of that distance in the x and y directions, and the maximum distance at which the UAV can sense obstacles.
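A repulsive force with these ingredients can be sketched using the classic artificial potential field form. This is an assumed stand-in, since the patent's exact expression may differ; the function name, `kappa`, `d_max`, and the 2D obstacle coordinates are hypothetical.

```python
import math

def repulsive_force(p_i, obstacles, kappa=1.0, d_max=2.0):
    """Sum of classic APF repulsions in x and y; the z component stays zero."""
    fx = fy = 0.0
    for ox, oy in obstacles:
        dx, dy = p_i[0] - ox, p_i[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d <= d_max:          # only obstacles within sensing range act
            mag = kappa * (1.0 / d - 1.0 / d_max) / d ** 2
            fx += mag * dx / d        # push directly away from the obstacle
            fy += mag * dy / d
    return (fx, fy, 0.0)

near = repulsive_force((0.0, 0.0, 5.0), [(1.0, 0.0)])
far = repulsive_force((0.0, 0.0, 5.0), [(10.0, 0.0)])
print(near, far)
```

The force grows as the wingman approaches an obstacle and vanishes at the sensing limit `d_max`, so it only perturbs the formation tracking term when an obstacle is actually nearby.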

[0093] The UAV formation trajectory-attitude coupling six-degree-of-freedom full-state distributed cooperative control protocol is as follows:

[0094]

[0095] where $\kappa_{\Omega i} > 0$ and $\kappa_{\Theta i} > 0$ are design parameters; m is the mass of the UAV; g is the acceleration due to gravity; the attitude and angular-rate tracking errors are $E_{\Theta i} = \Theta_i - \Theta_i^d$ and $E_{\Omega i} = \Omega_i - \Omega_i^d$; and the desired value is given by:

[0096]

[0097] Meanwhile, the following disturbance observer is designed to estimate the external disturbance:

[0098]

[0099] where $G_i = \mathrm{diag}\{1/J_x, 1/J_y, 1/J_z\}$; $\phi_i$, $\theta_i$ and $\psi_i$ represent the roll angle, pitch angle, and yaw angle of the i-th wingman, respectively; and $J_x$, $J_y$, $J_z$ represent the moments of inertia about the three coordinate axes.

[0100] This invention utilizes a Q-learning algorithm combined with B-spline curve fitting to perform path planning for a virtual lead drone within a UAV formation. Then, by incorporating a six-degree-of-freedom full-state nonlinear mathematical model of the UAV and multi-agent consensus theory, flight control is applied to the wingmen within the formation, thus completing the UAV formation's path planning. Furthermore, simulation verification, including UAV formation path planning simulations, demonstrates the practicality and effectiveness of this invention.

[0101] Embodiment 2

[0102] This embodiment provides a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the UAV formation path planning method described in Embodiment 1.

[0103] Embodiment 3

[0104] This embodiment provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the UAV formation path planning method described in Embodiment 1.

[0105] Embodiment 4

[0106] An embodiment provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the UAV formation path planning method described in Embodiment 1.

[0107] Embodiment 5

[0108] This embodiment provides a computer device, which may be a database, whose internal structure may be as shown in Figure 6. The computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores pending transactions. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements the steps of the UAV formation path planning method described in Embodiment 1.

[0109] It should be noted that the object information (including but not limited to object device information, object personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this invention are all information and data authorized by the object or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0110] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided by this invention can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided by this invention may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided by this invention may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0111] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0112] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for unmanned aerial vehicle (UAV) formation path planning, characterized in that the method includes: acquiring 3D terrain data of the working area of the UAV formation, wherein the UAV formation includes a virtual lead aircraft and wingmen, and the 3D terrain data includes the starting point of each UAV in the formation and the desired formation of the UAV formation; based on the 3D terrain data, establishing a state set S for the virtual lead aircraft, discretizing the actions of the virtual lead aircraft to establish an action set A, and establishing a value function table Q(S,A) based on the state set and the action set; determining the strategy by which the virtual lead aircraft selects actions from the action set when interacting with the environment space; updating the value function table Q(S,A) using the Q-learning algorithm to obtain the path points of the virtual lead aircraft; generating a virtual lead aircraft path from the path points using the B-spline curve fitting method; and, based on the virtual lead aircraft path and the desired formation, performing coordinated formation flight control of the wingmen by combining the six-degree-of-freedom full-state mathematical model of the UAV with multi-agent consensus theory; wherein the coordinated formation flight control includes: designing a position-layer virtual control law for each wingman based on multi-agent consensus theory, the law including a formation tracking error term and an artificial potential field repulsion term.

2. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the 3D environment with which the UAV formation interacts is discretized into unit grid cells using the grid method; to discretize the UAV's actions, a cube (regular hexahedron) with an edge length of 2 units is constructed with the UAV at its center, and the action space is discretized into 26 directions from which the UAV can choose, including the normal directions of the 6 faces of the cube, the directions of its 8 vertices, and the directions of the midpoints of its 12 edges.

3. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the value function table Q(s,a), established based on the agent's state set S and action set A, is updated according to the immediate reward r obtained after the agent takes the selected action a (a∈A) in its current state s (s∈S) and transitions to the next state s' (s'∈S); the update function is described as follows: $Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \zeta \max_{a \in A} Q(s',a) - Q(s,a)\right]$, where α is the learning rate, used to weigh the impact of the value of the next state on the value of the current state; ζ is the learning discount factor, ζ∈(0,1), and the closer it is to 1, the more weight this iterative formula gives to the value obtainable from subsequent states; and $\max_{a \in A} Q(s',a)$ indicates that, at the next moment, the action a that maximizes Q(s',a) is selected from the action set A.

4. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the action selection strategy balances exploration and exploitation in the Q-learning algorithm, enabling the virtual lead aircraft to explore the environment more comprehensively when searching for a path. The strategy is specifically described as follows:

$$\pi(s) = \begin{cases} \text{a random action from } A, & \text{with probability } \varepsilon \\ \arg\max_{a \in A} Q(s, a), & \text{with probability } 1 - \varepsilon \end{cases}$$

where $\pi(s)$ denotes the action selection strategy in state $s$; a small probability $\varepsilon$ is first initialized, and for a given state $s$, a random action is chosen with probability $\varepsilon$, while the action with the maximum $Q$ value in the current state is chosen with probability $1 - \varepsilon$.
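The strategy in claim 4 corresponds to what is commonly called epsilon-greedy selection. A minimal sketch, assuming a dictionary-based table keyed by `(state, action)`; the function name `epsilon_greedy` and the toy values are hypothetical.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Q maps (state, action) to a value; missing entries are treated as 0.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)          # exploration
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploitation


# Toy table (hypothetical values) with two actions in one state.
Q = {("s", "left"): 1.0, ("s", "right"): 2.0}
greedy_choice = epsilon_greedy(Q, "s", ["left", "right"], epsilon=0.0)
random_choice = epsilon_greedy(Q, "s", ["left", "right"], epsilon=1.0)
print(greedy_choice)  # "right", because epsilon = 0 disables exploration
```

Setting `epsilon` to 0 gives purely greedy behavior and setting it to 1 gives purely random behavior; a small value in between gives the exploration and exploitation balance the claim refers to.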

5. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 2, characterized in that the instant reward is described by the following formula: [formula not reproduced], whose terms take the following values: [formula not reproduced]. The quantities appearing in these formulas are: the step length of each movement of the UAV; the area of obstacles in the environment; the entire environmental area; the coordinates of the target point; and the state reward value obtained when the UAV moves onto an environment grid cell containing an obstacle. The state reward value obtained when moving to the target point is 100, and the state reward value at all other grid points is 0.

6. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the path points of the virtual lead aircraft are exported from the profit function table $Q$ as follows: starting from the state of the virtual lead aircraft corresponding to the starting point, select the action with the maximum $Q$ value in the current state and perform it to update the state, repeating until the target point is reached and recording every state the virtual lead aircraft passes through. The recorded states, exported in order, are the path points of the virtual lead aircraft.
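The export procedure of claim 6 amounts to a greedy rollout over the learned table. The sketch below assumes states are grid coordinates, actions are coordinate offsets, and the table is a dictionary keyed by `(state, action)`. The `max_steps` guard is an addition for safety when the table has not converged; it is not part of the claim, and all names and values are illustrative.

```python
def extract_path(Q, start, goal, actions, max_steps=1000):
    """Follow the highest-valued action from start until goal is reached.

    Returns the list of visited states, which serve as the path points.
    Missing (state, action) entries are treated as minus infinity.
    """
    path = [start]
    state = start
    while state != goal:
        if len(path) > max_steps:
            raise RuntimeError("goal not reached; Q-table may not have converged")
        best = max(actions, key=lambda a: Q.get((state, a), float("-inf")))
        state = tuple(s + d for s, d in zip(state, best))
        path.append(state)
    return path


# Toy table (hypothetical): moving in +x is always the best action.
actions = [(1, 0, 0), (-1, 0, 0)]
Q = {((x, 0, 0), (1, 0, 0)): 1.0 for x in range(3)}
path = extract_path(Q, start=(0, 0, 0), goal=(3, 0, 0), actions=actions)
print(path)  # [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
```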

7. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the obtained path points are fitted with a B-spline curve, whose mathematical expression is:

$$P(u) = \sum_{i=1}^{n} P_i \, B_{i,k}(u)$$

where $P(u)$ is the fitted virtual lead aircraft path; $P_i$ are the discrete path points obtained by the Q-learning algorithm; $n$ is the number of discrete path points; $k$ is the degree of the B-spline curve; and $B_{i,k}(u)$ are the B-spline basis functions: [formula not reproduced].

8. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the kinematic equations of the UAVs within the formation are described by the following formula: [formula not reproduced], in which the defined variables represent the position and velocity of the $i$-th wingman UAV. Based on the kinematic equations of the wingman UAVs and multi-agent consensus theory, the following virtual control law is designed for the position layer of the wingman's motion: [formula not reproduced]. Substituting the position-layer virtual control law into the kinematic equations of each wingman gives: [formula not reproduced]. In these formulas, two of the parameters are design constants; the desired trajectory of the $i$-th wingman UAV is defined using the desired formation parameters; the position and velocity tracking errors are defined respectively; and the repulsive force from obstacles experienced by the $i$-th wingman UAV during flight is analyzed using artificial potential field theory. The repulsive force is specifically designed as follows: [formula not reproduced], where the quantities involved are: the designed repulsion coefficient; the distance between the $i$-th wingman and an obstacle; the components along the coordinate axes; and the maximum distance at which the UAV can sense obstacles.
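For the curve-fitting step of claim 7, the following is a minimal sketch in which the Q-learning path points are used as B-spline control points. It assumes a clamped uniform knot vector and the Cox-de Boor recursion for the basis functions; the patent's own basis-function formula is not available in this text, so both choices are assumptions, and all function names are illustrative.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] (an assumed choice)."""
    inner = n_ctrl - degree - 1
    return ([0.0] * (degree + 1)
            + [j / (inner + 1) for j in range(1, inner + 1)]
            + [1.0] * (degree + 1))


def basis(i, k, u, knots):
    """Cox-de Boor recursion for the degree-k basis function number i."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, k - 1, u, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - u) / d2 * basis(i + 1, k - 1, u, knots)
    return left + right


def bspline_point(ctrl, degree, u, knots):
    """Evaluate the curve sum_i ctrl[i] * basis_i(u) at parameter u."""
    if u >= knots[-1]:
        return tuple(float(c) for c in ctrl[-1])  # clamped curve ends here
    weights = [basis(i, degree, u, knots) for i in range(len(ctrl))]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl))
                 for d in range(len(ctrl[0])))


def smooth_path(waypoints, degree=3, samples=50):
    """Sample a B-spline through-space path; needs len(waypoints) > degree."""
    knots = clamped_knots(len(waypoints), degree)
    return [bspline_point(waypoints, degree, j / (samples - 1), knots)
            for j in range(samples)]


# Hypothetical path points lying in the plane z = 0.
waypoints = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 1, 0), (4, 2, 0)]
smooth = smooth_path(waypoints, degree=3, samples=20)
```

With a clamped knot vector the curve starts at the first path point and ends at the last one, while interior path points are approximated rather than interpolated. That smooths the sharp turns of a grid path at the cost of not passing exactly through every cell center.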

9. The method for unmanned aerial vehicle (UAV) formation path planning according to claim 1, characterized in that the dynamic equations of the wingmen within the formation are described as follows: [formula not reproduced], in which three of the variables represent the roll angle, pitch angle, and yaw angle of the $i$-th wingman. The trajectory-attitude coupled six-degree-of-freedom full-state distributed cooperative control protocol for the UAV formation is as follows: [formula not reproduced], where the quantities involved are the total lift of the $i$-th UAV, design parameters, the mass of the UAV, and gravitational acceleration, together with further design parameters and a desired value obtained according to claim 7. From the gravitational acceleration and the desired heading angle, the desired pitch and roll angles are obtained as follows: [formula not reproduced]. The following disturbance observer is designed to estimate the external disturbance: [formula not reproduced], where two of the quantities are parameters to be designed and two are estimated values; the remaining quantities are the roll, pitch, and yaw angles of the $i$-th wingman UAV and the moments of inertia about the three coordinate axes. Substituting these equations into the dynamic equations of the wingmen in the formation yields the attitude angles and attitude angular rates of each wingman, thereby enabling attitude control of each wingman.

10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the UAV formation path planning method according to any one of claims 1-9.

11. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the unmanned aerial vehicle (UAV) formation path planning method according to any one of claims 1-9.

Citation Information

Patent Citations

  • Unmanned aerial vehicle formation path planning algorithm based on three-dimensional global artificial potential function

    CN107219857A

  • Method for navigation following type multi-agent formation path planning and storage medium

    CN113534819A