A multi-unmanned aerial vehicle cooperative jamming method based on deep reinforcement learning

By applying deep reinforcement learning in three-dimensional space, the method designs a three-dimensional UAV motion model and a multi-objective reward function. This addresses the UAV jamming problem in complex three-dimensional environments, which traditional methods handle poorly, and achieves efficient, safe, and robust multi-UAV cooperative jamming.

CN122247553A · Pending · Publication Date: 2026-06-19 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date
2026-01-27
Publication Date
2026-06-19


Abstract

This invention discloses a multi-UAV cooperative jamming method based on reinforcement learning, belonging to the field of UAV communication countermeasures and intelligent control technology. This method constructs a three-dimensional UAV jamming environment and utilizes multiple jamming UAVs to collaboratively jam a target UAV. The method includes: establishing a jamming environment model, constructing a state space, action space, and reward function; training a cooperative control strategy for the jamming UAVs based on a deep reinforcement learning algorithm; during mission execution, each jamming UAV selects jamming actions in real time based on the observed target state and its own state, achieving dynamic cooperative jamming. Through this method, the jamming UAVs can autonomously learn and optimize their jamming strategies in a complex three-dimensional environment, improving the continuous jamming effect on the target UAV.

Description

Technical Field

[0001] This invention relates to the field of UAV communication countermeasures and intelligent control technology, and in particular to a multi-UAV cooperative jamming method based on deep reinforcement learning.

Background Technology

[0002] In recent years, with the continuous advancement of drone technology, drones have been widely used in environmental protection, surveying and mapping, power facility inspection, digital management of agriculture and forestry, and personal image acquisition, significantly improving the efficiency and accuracy of related tasks. However, the high maneuverability of unauthorized drones poses a significant challenge to traditional air defense systems. Therefore, the development of drone interception and jamming technology is of great importance.

[0003] The main methods of countering drones include hijacking and deceiving their navigation systems, radio frequency jamming, and physically capturing or destroying unauthorized drones using other drones. Among these methods, using drones to emit radio frequency jamming signals to intercept unauthorized drones has advantages such as low cost, strong multi-target jamming capability, and high flexibility. In practice, developing effective jamming strategies in three-dimensional scenarios involving multiple drones combating unauthorized drones has become a key research challenge. Essentially, this is a high-dimensional, dynamic, and nonlinear control optimization problem, and traditional rule-based or model-based methods often cannot handle such complex problems. Reinforcement learning, as an important branch of artificial intelligence, enables agents to learn optimal policies through interaction with the environment and exhibits strong autonomous decision-making capabilities. When combined with deep learning, deep reinforcement learning has shown great potential in solving high-dimensional optimization problems in complex environments and has been widely applied.

[0004] Existing research on cooperative drone jamming primarily considers two-dimensional environments or scenarios where drone altitude remains constant. However, in reality, drone countermeasures typically occur in three-dimensional space. In complex three-dimensional environments, environmental information needs to be analyzed to infer drone trajectories and understand their motion patterns. Therefore, tracking and jamming unauthorized drones in three-dimensional environments presents a significant challenge.

Summary of the Invention

[0005] Purpose of the invention: The purpose of this invention is to provide a multi-UAV cooperative jamming method based on deep reinforcement learning. By establishing a three-dimensional motion model of the UAV and designing a corresponding reward function, the jamming UAVs can autonomously learn effective jamming strategies in three-dimensional space. Through learning, they obtain optimal or suboptimal cooperative action strategies, thereby achieving effective jamming of the communication link of the target UAV.

[0006] Technical Solution: A multi-UAV cooperative jamming method based on deep reinforcement learning, wherein the cooperative jamming scenario includes N jamming UAVs and M randomly distributed target UAVs, the target UAVs moving dynamically in three-dimensional space; a centralized agent is responsible for controlling the joint actions of all jamming UAVs to cooperatively jam the target UAVs; the steps include the following:

[0007] S1: Initialize the target UAV's position and environmental parameters;
S2: Model the UAV jamming task as a finite Markov decision process, and design the state representation using a local observation mechanism based on relative position; the global state is input into the agent policy network, and the jamming UAV completes the selection and execution of jamming actions;
S3: Based on the cooperative jamming algorithm, use a multi-objective reward function to calculate rewards and update the UAV states in the environment;
S4: Store observations, actions, rewards, and new states in the experience replay buffer;
S5: Use a centralized training platform to sample data from the experience replay buffer and update the agent's policy network;
S6: Repeat steps S3-S5 until convergence, and obtain the optimal multi-UAV cooperative jamming decision model.

[0008] Furthermore, the target UAV rotates horizontally along a circular path at a fixed speed while slowly ascending at each time step, forming a spiral trajectory, with its altitude limited to the range $[z_{\min}, z_{\max}]$, where $z_{\min}$ is the lower altitude limit and $z_{\max}$ is the upper altitude limit. The state of each target UAV is represented by a 6-dimensional vector, denoted as $(x, y, z, v, \psi, \Delta\psi)$, where (x, y, z) represents the current position coordinates, v the horizontal circular flight speed, ψ the current heading angle, and Δψ the heading angle increment at each step. During environment initialization, the j-th target UAV is randomly assigned a spiral trajectory center and takes off along a circular trajectory with radius r centered at that point. The initial position and heading angle of the j-th target UAV are expressed as follows: [missing information], where $\theta_j$, a random angle ranging from 0 to 2π, is the initial polar angle, which determines the initial position of the target UAV on the circumference; $p_j^0$ is the initial position of the j-th target UAV; and $\psi_j^0$ is the initial heading angle of the j-th target UAV. At time step t, the j-th target UAV updates its position and heading angle using the following formula: [missing information], where $\Delta t$ is the time step and $\Delta\psi$ is the change in heading angle. The change in altitude of the target UAV within each time step is denoted as $\Delta z$ = [missing information], where T is the number of time steps needed to complete one round of updates.

[0009] Furthermore, the finite Markov decision process is defined as M=(S,A,P,r,γ), where the environment adopts partially observable states. Here, S represents the system's state space, used to characterize the overall state information of the UAV jamming scenario; A represents the jamming decision action space, used to describe the jamming control behaviors that the jamming UAV can take; P represents the state transition probability function, used to describe the probability of the system state evolving to the next state after a certain jamming action is performed in the current state; r represents the reward function, used to quantify the immediate effect of the jamming action on the target UAV's communication performance; and γ is a discount factor, used to balance immediate jamming gains against long-term jamming effects. In a multi-agent UAV jamming mission, at each time t, the state consists of the relative positions of the i-th jamming UAV with respect to all target UAVs and the other jamming UAVs. The three-dimensional position of the i-th jamming UAV at time t is $p_i^{\mathrm{J}}(t) = (x_i^{\mathrm{J}}(t), y_i^{\mathrm{J}}(t), z_i^{\mathrm{J}}(t))$; similarly, the three-dimensional position of the j-th target UAV is $p_j^{\mathrm{T}}(t) = (x_j^{\mathrm{T}}(t), y_j^{\mathrm{T}}(t), z_j^{\mathrm{T}}(t))$. Therefore, the global state vector $s_t$ is defined as follows: [missing information], where $\Delta p_{ij}^{\mathrm{JT}}$ is the relative position vector between the i-th jamming UAV and target UAV j, and $\Delta p_{ik}^{\mathrm{JJ}}$ is the relative position of jamming UAV i to another jamming UAV k.

[0010] Furthermore, in step S2, after the i-th jamming UAV executes action $a_i^t$ at time t, its position is updated as follows: [missing information], where $p_i^{\mathrm{J}}(t)$ is the three-dimensional coordinate position of the i-th jamming UAV at time t, and $p_i^{\mathrm{J}}(t+1)$ is its three-dimensional coordinate position at time t+1. For N jamming UAVs, the joint action is $a^t = (a_1^t, a_2^t, \ldots, a_N^t)$. This joint action represents the directions of movement chosen by the N jamming UAVs at the current time step.

[0011] Furthermore, in step S3, the multi-objective reward function includes a jamming effect reward, a tracking reward, and a collision avoidance penalty. The expression for the multi-objective reward function R is as follows: [missing information]. The jamming effect reward $R_{\mathrm{jam}}$ measures the intensity of the jamming that friendly UAVs impose on enemy communications; the jamming power is estimated based on the free-space path loss model, as shown in the following expressions: [missing information], where $d_{ij}$ is the distance between the i-th jamming UAV and the j-th target UAV, f is the carrier frequency, c is the speed of light, and $P_i^t$ is the transmit power of jamming UAV i at time step t. The tracking reward $R_{\mathrm{track}}$ encourages jamming UAVs to get as close as possible to the target UAV to improve jamming efficiency; its expression is as follows: [missing information], where $\alpha$ is the scaling factor. The collision avoidance penalty $R_{\mathrm{col}}$ prevents jamming UAVs from colliding or getting too close: a fixed penalty is applied if the distance between any two jamming UAVs is less than a set minimum safe distance d. Its expression is as follows: [missing information], where $d_{ih}$ is the distance between the i-th jamming UAV and the h-th jamming UAV. In the cooperative jamming algorithm, the reward at each time step is defined as the differential reward, denoted as $r_t$, with $r_t = \lambda\,(R_t - R_{t-1})$, where $R_t$ is the total reward at the current time step, $R_{t-1}$ is the total reward at the previous time step, and $\lambda$ is a weighting coefficient.

[0012] Compared with the prior art, the significant advantages of this invention are as follows:
1. This invention designs a three-dimensional motion model and reward function for UAVs, enabling the intelligent agent to effectively track and jam targets while improving jamming power coverage and effectiveness through the coordinated deployment and autonomous learning of multiple jamming UAVs in three-dimensional space.
2. This invention proposes a cooperative jamming algorithm based on centralized reinforcement learning, which is used to learn the joint jamming strategy of one or more friendly UAVs working together in three-dimensional space.
3. By utilizing the policy network trained by this invention, the jamming UAV can make rapid decisions locally, making it suitable for practical deployment; at the same time, the framework can be extended to cooperative scenarios with more UAVs.
4. The safety constraints introduced in the reward function of this invention ensure a safe distance between jamming UAVs and enhance the robustness of the strategy to environmental uncertainties; the invention achieves efficient, intelligent, and robust jamming of the communication link of the target UAV in a complex three-dimensional airspace environment, and has broad application prospects and practical value.

Brief Description of the Drawings

[0013] Figure 1 is a flowchart of the cooperative jamming method of the present invention;
Figure 2 is a schematic diagram of the UAV cooperative jamming model of the present invention, where Jamming denotes the jamming that a jamming UAV applies to a target UAV, Interference denotes the mutual interference between jamming UAVs, J-UAV denotes a jamming UAV, and T-UAV denotes a target UAV;
Figure 3 shows visualizations of the jamming process under different experimental configurations, where (a) is a 1v1 scenario, (b) is a 2v2 scenario, and (c) is a 3v3 scenario;
Figure 4 is a comparison chart of the jamming rewards of the present invention and existing algorithms, that is, the instantaneous rewards obtained by each algorithm at each step, where RL-JAM is the multi-UAV cooperative jamming method of the present invention, Q-learning is the tabular Q-learning algorithm, PSO is the particle swarm optimization algorithm, and Random is the random strategy;
Figure 5 is a comparison chart of the cumulative jamming reward of the present invention and existing algorithms, where the cumulative jamming reward is defined as the sum of the jamming rewards up to the current time step.

Detailed Description

[0014] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.

[0015] The operation process of the UAV cooperative jamming system is described in detail below, using a typical embodiment of the present invention, to provide a clearer understanding of the technical principles and effects of the invention. This embodiment takes as its example multiple jamming UAVs cooperatively jamming a target UAV in an air-to-air scenario. The system workflow includes multiple stages, such as initialization, information acquisition, decision execution, jamming effect evaluation, and cooperative control adjustment, and operates dynamically in three-dimensional space.

[0016] The UAV cooperative jamming system of the present invention mainly consists of jamming UAVs and target UAVs.

[0017] Each jamming drone includes an observation module, a decision-making module, and a communication module. The functions of each module are as follows: The observation module is used to sense environmental information, including the position and movement trends of the target UAV, as well as the positions of other friendly aircraft. This module can be implemented in conjunction with GPS, inertial navigation systems, wireless ranging modules, or visual sensors.

[0018] The decision-making module, used for autonomous decision-making, embeds a reinforcement learning policy network that can quickly generate action decisions based on the current observed input after training. This module is supported by an embedded GPU or a high-performance processor to meet real-time requirements.

[0019] The communication module is used for collaborative communication, sharing some information with other jamming drones, and interacting with the training and control platform to support centralized training and strategy updates.

[0020] The target UAV performs communication or mission transmission, and the signals it transmits are the jamming targets. The target UAV can adopt different motion models, such as straight flight, spiral ascent, and evasive maneuvers, for training and testing. Figure 2 illustrates the UAV cooperative jamming scenario. A reinforcement-learning-based cooperative jamming scenario is considered, involving N jamming UAVs and M randomly distributed target UAVs in a 3D environment. The target UAVs move dynamically in 3D space. A centralized agent controls the joint actions of all jamming UAVs to cooperatively jam the target UAVs in the environment. This cooperative jamming problem can be described as a Markov decision process, where the objective is to learn a policy that produces joint actions that maximize reward even when the targets are in different positions. This embodiment includes multiple jamming UAVs, target UAVs, a training and control platform, and a communication module. Each jamming UAV is equipped with an observation and computing unit, which can acquire relative position information in real time and execute actions according to the learned jamming policy. The training and control platform is used for centralized training and policy updates.

[0021] Figure 1 shows a flowchart of the multi-UAV cooperative jamming method of the present invention. The specific implementation steps are as follows. Step 1: Initialize the target UAV's position and environmental parameters. In this embodiment, the motion model of the target UAV in the environment simulates uniform circular ascending motion. Specifically, the target UAV rotates horizontally along a circular path at a fixed speed while slowly rising at each time step, forming a spiral trajectory, with its altitude limited to the range $[z_{\min}, z_{\max}]$, where $z_{\min}$ is the lower altitude limit and $z_{\max}$ is the upper altitude limit. The state of each target UAV is represented by a 6-dimensional vector, denoted as $(x, y, z, v, \psi, \Delta\psi)$, where (x, y, z) represents the current position coordinates, v the horizontal circular flight speed, ψ the current heading angle, i.e., the angle with the x-axis, and Δψ the heading angle increment per step, i.e., the rotation step size along the circular trajectory. During environment initialization, the j-th target UAV is randomly assigned a spiral trajectory center and takes off along a circular trajectory with radius r centered at that point. The initial position and heading angle of the j-th target UAV are expressed as follows: [missing information] (1) where $\theta_j$, a random angle ranging from 0 to 2π, is the initial polar angle, which determines the initial position of the target UAV on the circumference; $p_j^0$ is the initial position of the j-th target UAV; and $\psi_j^0$ is the initial heading angle of the j-th target UAV.

[0022] At time step t, the j-th target UAV updates its position and heading angle according to formula (2): [missing information] (2) where $\Delta t$ is the time step and $\Delta\psi$ is the change in heading angle. The change in altitude of the target UAV within each time step can be denoted as $\Delta z$ = [missing information], where T is the number of time steps needed to complete one round of updates.
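The spiral motion described in Step 1 can be sketched as follows. Because formulas (1) and (2) are not reproduced in this text, several choices here are assumptions rather than the patent's own expressions: the initial altitude is taken as the lower limit, the initial heading is taken tangent to the circle, the position is advanced with a simple Euler step, and the altitude is clamped to its range. The names `init_target` and `step_target` are hypothetical.

```python
import math
import random


def init_target(center_xy, radius, z_min, rng):
    """Place a target on its circle at a random polar angle.

    Assumptions: initial altitude is z_min, heading is tangent to the circle.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)  # initial polar angle
    x = center_xy[0] + radius * math.cos(theta)
    y = center_xy[1] + radius * math.sin(theta)
    psi = theta + math.pi / 2.0  # assumed tangent heading
    return [x, y, z_min, psi]


def step_target(state, v, d_psi, dt, dz, z_min, z_max):
    """Advance one time step: turn by d_psi, move at speed v, climb by dz."""
    x, y, z, psi = state
    psi = (psi + d_psi) % (2.0 * math.pi)
    x += v * math.cos(psi) * dt
    y += v * math.sin(psi) * dt
    z = min(max(z + dz, z_min), z_max)  # keep altitude inside its limits
    return [x, y, z, psi]


if __name__ == "__main__":
    rng = random.Random(0)
    state = init_target((0.0, 0.0), radius=50.0, z_min=10.0, rng=rng)
    for _ in range(5):
        state = step_target(state, v=5.0, d_psi=0.1, dt=1.0, dz=0.5,
                            z_min=10.0, z_max=60.0)
        print([round(c, 2) for c in state])
```

In this sketch the heading increment is chosen as `v * dt / radius` (here 0.1 rad) so that the discrete path approximates a circle of the requested radius.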

[0023] The multi-UAV cooperative jamming method of the present invention does not depend on the target motion model of formula (2). It is compatible with general 3D nonlinear trajectories and is applicable to real multi-UAV combat scenarios.

[0024] The action space of the jamming drone consists of 11 discrete actions, each of which is a direction vector in the 3D coordinate system corresponding to a specific direction of motion, as detailed in Table 1.

[0025] Table 1. Direction vectors of each discrete action and its direction of motion.

[0026] Table 1 covers 10 typical motion directions in three-dimensional space (front, back, left, right, up, down, upper left, lower left, upper right, lower right) as well as the stationary hovering state.

[0027] Step 2: Model the UAV jamming task as a finite Markov decision process, and design the state representation using a local observation mechanism based on relative position; input the global state into the agent policy network, and the jamming UAV selects and executes the jamming action. In this invention, the agent policy network adopts a Dueling Deep Q-Network (Dueling DQN). This network consists of a shared feature extraction layer, a state value evaluation branch, and an action advantage evaluation branch, and is used to model the value relationship between the jamming state and the joint jamming action.
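How the two branches of a dueling network are typically recombined can be sketched as follows. The text does not state the aggregation rule, so this sketch assumes the commonly used mean-subtracted form, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The function names are hypothetical, and the branch outputs are passed in as plain numbers rather than produced by a trained network.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Assumed aggregation: Q = V + A - mean(A), the usual dueling form.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    q = dueling_q_values(2.0, [1.0, 3.0, 2.0])
    print(q, greedy_action(q))
```

Subtracting the mean advantage keeps the value and advantage branches identifiable: the average of the resulting Q-values equals the state value.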

[0028] For the i-th jamming UAV, after it executes action $a_i^t$ at time t, its position is updated as follows: [missing information] (3) [missing information] (4) where $p_i^{\mathrm{J}}(t)$ is the three-dimensional coordinate position of the i-th jamming UAV at time t, and $p_i^{\mathrm{J}}(t+1)$ is its three-dimensional coordinate position at time t+1.

[0029] For N jamming UAVs, the joint action is: $a^t = (a_1^t, a_2^t, \ldots, a_N^t)$ (5) This joint action represents the directions of movement chosen by the N jamming UAVs at the current time step. The execution of the joint action determines the transition of the overall system state and affects the calculation of the global reward function.
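With 11 discrete actions per jamming UAV, a centralized network that scores every joint action has 11^N outputs. One way to map a flat output index to per-UAV actions is a base-11 encoding, sketched below. The encoding order and the function names are assumptions made for illustration; the patent does not specify how joint actions are indexed.

```python
N_ACTIONS = 11  # per-UAV discrete actions, as stated in the description


def decode_joint_action(index, n_uavs, n_actions=N_ACTIONS):
    """Split a flat joint-action index into one action id per UAV."""
    if not 0 <= index < n_actions ** n_uavs:
        raise ValueError("joint action index out of range")
    actions = []
    for _ in range(n_uavs):
        index, action = divmod(index, n_actions)
        actions.append(action)
    return tuple(actions)


def encode_joint_action(actions, n_actions=N_ACTIONS):
    """Inverse of decode_joint_action."""
    index = 0
    for action in reversed(actions):
        index = index * n_actions + action
    return index


if __name__ == "__main__":
    print(decode_joint_action(13, n_uavs=2))  # per-UAV action ids
    print(N_ACTIONS ** 3)                     # joint actions for three UAVs
```

The exponential growth of the joint action space (121 for two UAVs, 1331 for three) is the main cost of the centralized formulation.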

[0030] In wireless communication, signals attenuate due to path loss during propagation. For signal propagation between UAVs in open space, the free-space path loss (FSPL) model is typically used to estimate the degree of signal attenuation. When a jamming UAV transmits jamming signals to a target UAV, the jamming effect is closely related to the distance: the closer the distance, the stronger the jamming. At time step t, the transmit power of jamming UAV i is $P_i^t$. The jamming power $P_{ij}^t$ received by target UAV j from jamming UAV i follows the free-space path loss model: $P_{ij}^t = P_i^t \left( \frac{c}{4\pi f d_{ij}} \right)^2$ (6) where $d_{ij}$ is the distance between the i-th jamming UAV and the j-th target UAV, f is the carrier frequency, and c is the speed of light.
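The free-space path loss relation can be checked numerically with a short sketch. It uses the standard FSPL form, received power = transmit power × (c / (4π f d))², with unit antenna gains assumed. Summing the contributions of several jamming UAVs at one target, as `total_jamming_power` does, is an assumption made for illustration, and both function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def received_jamming_power(p_tx_w, distance_m, freq_hz):
    """Received power in watts under free-space path loss, unit antenna gains."""
    if distance_m <= 0.0:
        raise ValueError("distance must be positive")
    return p_tx_w * (SPEED_OF_LIGHT / (4.0 * math.pi * freq_hz * distance_m)) ** 2


def total_jamming_power(jammer_positions, target_position, p_tx_w, freq_hz):
    """Assumed aggregation: sum the power received from every jamming UAV."""
    return sum(
        received_jamming_power(p_tx_w, math.dist(p, target_position), freq_hz)
        for p in jammer_positions
    )


if __name__ == "__main__":
    p_rx = received_jamming_power(1.0, 100.0, 2.4e9)
    print(f"path loss at 100 m, 2.4 GHz: {10.0 * math.log10(1.0 / p_rx):.2f} dB")
```

Because the received power falls with the square of the distance, halving the distance to the target quadruples the jamming power, which is why the reward design encourages the jamming UAVs to close in.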

[0031] Then, the jamming power $P_j^t$ received by the target UAV is calculated as: [missing information] (7) The UAV jamming task in this embodiment can be modeled as a finite Markov decision process, defined as a quintuple M=(S,A,P,r,γ), where the environment adopts partially observable states. Here, S represents the system's state space, used to characterize the overall state information of the UAV jamming scenario; A represents the jamming decision action space, used to describe the jamming control behaviors that the jamming UAV can take; P represents the state transition probability function, used to describe the probability of the system state evolving to the next state after a certain jamming action is performed in the current state; r represents the reward function, used to quantify the immediate effect of the jamming action on the target UAV's communication performance; and γ is a discount factor, used to balance the immediate jamming gain against the long-term jamming effect. In multi-agent UAV jamming tasks, the construction of the jamming UAV's observed state has a significant impact on the training performance and generalization ability of the agents. This invention adopts a local observation mechanism based on relative position to design the state representation, aiming to enhance the model's scalability and environmental adaptability. At each time t, the state consists of the relative positions of the i-th jamming UAV with respect to all target UAVs and the other jamming UAVs. Suppose there are N jamming UAVs and M target UAVs in the environment.
The three-dimensional position of the i-th jamming UAV at time t is denoted as: $p_i^{\mathrm{J}}(t) = (x_i^{\mathrm{J}}(t), y_i^{\mathrm{J}}(t), z_i^{\mathrm{J}}(t))$ (8) Similarly, the three-dimensional position of the j-th target UAV at time t is: $p_j^{\mathrm{T}}(t) = (x_j^{\mathrm{T}}(t), y_j^{\mathrm{T}}(t), z_j^{\mathrm{T}}(t))$ (9) Therefore, the global state vector $s_t$ is defined as follows: [missing information] (10) [missing information] (11) [missing information] (12) where $\Delta p_{ij}^{\mathrm{JT}}$ is the relative position vector between the i-th jamming UAV and target UAV j, and $\Delta p_{ik}^{\mathrm{JJ}}$ is the relative position of jamming UAV i to another jamming UAV k.
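A global state built from relative positions can be sketched as follows. Since formulas (10) to (12) are not reproduced in this text, the sign convention (other UAV minus own position) and the concatenation order are assumptions, and `build_global_state` is a hypothetical name.

```python
def relative(own, other):
    """Relative position of `other` as seen from `own` (assumed sign convention)."""
    return tuple(o - s for s, o in zip(own, other))


def build_global_state(jammers, targets):
    """Concatenate, for each jamming UAV, its offsets to every target UAV
    and to every other jamming UAV."""
    state = []
    for i, p_i in enumerate(jammers):
        for q_j in targets:
            state.extend(relative(p_i, q_j))
        for k, p_k in enumerate(jammers):
            if k != i:
                state.extend(relative(p_i, p_k))
    return state


if __name__ == "__main__":
    jammers = [(0, 0, 0), (1, 0, 0)]
    targets = [(0, 2, 0), (0, 0, 3)]
    s = build_global_state(jammers, targets)
    print(len(s), s[:3])
```

With N jamming UAVs and M target UAVs this vector has 3·N·(M + N - 1) entries, and it is unchanged when every UAV is shifted by the same offset, which is one reason a relative-position representation can generalize across absolute locations.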

[0032] Step 3: Based on the cooperative interference algorithm, a multi-objective reward function is used to calculate the reward, thereby updating the UAV status in the environment; In a drone jamming system, the design of the reward function directly determines the optimization direction of the learning strategy. To enable jamming drones to learn effective cooperative behavior, maximally disrupt the communication links of the target drone, and simultaneously prevent collisions or clustering between jamming drones, this invention designs a weighted multi-objective reward function. This reward function mainly consists of three parts: a jamming effect reward, a guidance reward for approaching the enemy (i.e., a tracking reward), and a collision avoidance penalty between jamming drones.

[0033] The overall reward function is as follows: [missing information] (13) The jamming effect reward $R_{\mathrm{jam}}$ measures the intensity of the jamming that friendly UAVs impose on enemy communications.

[0034] The jamming power is estimated based on the free-space path loss model, and the jamming effect reward $R_{\mathrm{jam}}$ is defined as follows: [missing information] (14) The tracking reward $R_{\mathrm{track}}$ encourages jamming UAVs to get as close as possible to the target UAV to improve jamming efficiency, and is set as follows: [missing information] (15) where $\alpha$ is a scaling factor used to adjust the impact of this term on the total reward.

[0035] The collision avoidance penalty $R_{\mathrm{col}}$ prevents jamming UAVs from colliding or getting too close. It checks whether the distance between any two jamming UAVs is less than a set minimum safe distance d; if this condition is met, a fixed penalty is applied. It is defined as follows: [missing information] (16) where $d_{ih}$ is the distance between the i-th jamming UAV and the h-th jamming UAV.
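The safe-distance check behind the collision avoidance penalty can be sketched as follows. Formula (16) is not reproduced in this text, so the penalty magnitude and the choice to apply it once per violating pair (rather than once per time step) are assumptions; `collision_penalty` and its parameters are hypothetical names.

```python
import math
from itertools import combinations


def collision_penalty(jammer_positions, d_safe, penalty=1.0):
    """Return a non-positive reward term: -penalty for each pair of
    jamming UAVs closer than d_safe (assumed per-pair accounting)."""
    violations = sum(
        1
        for p, q in combinations(jammer_positions, 2)
        if math.dist(p, q) < d_safe
    )
    return -penalty * violations


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(collision_penalty(positions, d_safe=2.0))
```

Only pairs of jamming UAVs are checked here; distances to target UAVs do not enter this term, since closing in on targets is what the tracking reward encourages.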

[0036] In the cooperative jamming algorithm, the reward at each time step is defined as the differential reward, denoted as $r_t$. It is calculated as the difference between the total reward $R_t$ at the current time step and the total reward $R_{t-1}$ at the previous time step, multiplied by a weighting factor $\lambda$: $r_t = \lambda\,(R_t - R_{t-1})$ (17) In the proposed three-dimensional multi-UAV jamming environment, a centralized DQN network is used to train multiple cooperative jamming UAVs to maximize the jamming efficiency against enemy communication UAVs. Under this DQN network architecture, the global state vector is input into a centralized Q-network. This network outputs the Q-values corresponding to all possible joint actions based on the global state, from which the optimal action is selected for execution. Within this DQN network framework, the actions of all jamming UAVs at each time t are jointly determined by a centralized policy network, i.e., decision modeling is performed using the following joint action space: $a^t = (a_1^t, a_2^t, \ldots, a_N^t)$

[0037] where $a_i^t$ is the action performed by the i-th jamming UAV at time t, with i = 1, 2, ..., N.
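The differential reward of paragraph [0036], the weighted difference between consecutive total rewards, can be sketched as follows. How the first time step is handled is not stated in the text, so the `initial_total` baseline is an assumption, and `differential_rewards` is a hypothetical name.

```python
def differential_rewards(total_rewards, weight, initial_total=0.0):
    """Turn a sequence of total rewards R_t into differential rewards
    weight * (R_t - R_{t-1}); the first step is compared to initial_total."""
    rewards = []
    previous = initial_total
    for total in total_rewards:
        rewards.append(weight * (total - previous))
        previous = total
    return rewards


if __name__ == "__main__":
    print(differential_rewards([1.0, 3.0, 2.0], weight=0.5))
```

Because consecutive differences telescope, the undiscounted sum of differential rewards over an episode equals the weight times the net change in total reward, so the agent is rewarded for improving the jamming situation rather than for merely holding it.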

[0038] Step 4: Store the observations, actions, rewards, and new states in the experience replay buffer.

[0039] Step 5: Use a centralized training platform to sample data from the experience pool and update the agent policy network.

[0040] Step 6: Repeat steps 3-5 above until convergence, and obtain the optimal multi-UAV cooperative interference decision model.
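Steps 4 to 6 follow the usual experience-replay pattern of DQN training, which can be sketched as below. The buffer capacity, the uniform sampling, and the one-step target r + γ·max Q(s', a') are the standard DQN choices and are assumed here; the text does not give these details, and `ReplayBuffer` and `td_target` are hypothetical names.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state")


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self._buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a uniform random batch without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


def td_target(reward, next_q_values, gamma):
    """One-step DQN target (assumed form): r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000, seed=0)
    for t in range(10):
        buffer.push(state=t, action=t % 11, reward=0.1 * t, next_state=t + 1)
    batch = buffer.sample(4)
    print(len(buffer), [tr.state for tr in batch])
```

Sampling past transitions at random breaks the temporal correlation between consecutive steps, which is the usual motivation for a replay buffer in this kind of training loop.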

[0041] A computer-readable storage medium stores a computer program thereon, which, when executed by a processor, implements the steps of the multi-UAV cooperative jamming method of the present invention. The processor contains a kernel that retrieves corresponding program units from memory.

[0042] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is executed, it implements the steps of the multi-UAV cooperative jamming method of the present invention.

[0043] The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory, and the memory includes at least one memory chip.

[0044] A computer program product includes a computer program / instructions that, when executed by a processor, implement the steps of the multi-UAV cooperative jamming method of the present invention.

[0045] In a typical embodiment of the present invention, when the UAV cooperative jamming system performs a mission in three-dimensional airspace, its operation process includes continuous steps such as environment initialization, state perception, cooperative decision-making, maneuver execution, jamming assessment, and adaptive strategy adjustment. Figures 3(a), (b), and (c) are visualizations of the jamming process under different experimental configurations. Taking the air-to-air signal jamming of a target UAV by multiple jamming UAVs as an example, at the start of the mission the system first initializes the airspace model, the initial positions of the UAVs, the dynamic parameters, and the communication channel model. It sets the speed, heading, position, and other states of each UAV, loads the reinforcement learning policy network and its parameters, and then enters the operation phase. At each discrete time step, each jamming UAV obtains in real time, through sensors and airborne communication modules, the three-dimensional relative positions and velocity vectors between itself and the target UAV and the other jamming UAVs, together with environmental information such as the distribution of potential obstacles or no-fly zones. The system normalizes this information and constructs a local observation vector, which is then input to the policy network. Under the centralized training and distributed execution architecture, the policy network independently generates the action command for the next moment, which may be forward, backward, left, right, up, down, a combination of these directions, or hovering. Each UAV executes the action according to its dynamic constraints under the control of the flight control module.
Meanwhile, the system calculates the received power of the jamming signal at the target based on the real-time spatial locations of the UAVs, thereby evaluating the current jamming effect and providing the reward signal for reinforcement learning based on factors such as jamming effect, distance constraints, and team coordination stability. In this embodiment, the jamming reward and cumulative jamming reward of the present invention are compared with those of existing algorithms, as shown in Figure 4 and Figure 5. The results indicate that the method of the present invention can significantly improve jamming power coverage and effectiveness.

[0046] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should be considered within the scope of protection of the present invention.

Claims

1. A multi-UAV cooperative jamming method based on deep reinforcement learning, wherein the cooperative jamming scenario includes N jamming UAVs and M randomly distributed target UAVs, the target UAVs moving dynamically in three-dimensional space; a centralized agent is responsible for controlling the joint actions of all jamming UAVs to cooperatively jam the target UAVs; characterized in that the steps include the following:
S1: Initialize the target UAV's position and environmental parameters;
S2: Model the UAV jamming task as a finite Markov decision process, and design the state representation using a local observation mechanism based on relative position; the global state is input into the agent policy network, and the jamming UAV completes the selection and execution of jamming actions;
S3: Based on the cooperative jamming algorithm, use a multi-objective reward function to calculate rewards and update the UAV states in the environment;
S4: Store observations, actions, rewards, and new states in the experience replay buffer;
S5: Use a centralized training platform to sample data from the experience replay buffer and update the agent's policy network;
S6: Repeat steps S3-S5 until convergence, and obtain the optimal multi-UAV cooperative jamming decision model.

2. The multi-UAV cooperative jamming method based on deep reinforcement learning according to claim 1, characterized in that each target UAV circles horizontally at a fixed speed while slowly ascending at each time step, forming a spiral trajectory, with its altitude limited to the range between a lower altitude limit and an upper altitude limit [missing information]; the state of each target UAV is represented by the 6-dimensional vector (x, y, z, v, ψ, Δψ), where (x,y,z) represents the current position coordinates, v represents the horizontal circular flight speed, ψ represents the current heading angle, and Δψ represents the heading angle increment at each step; during environment initialization, the j-th target UAV is randomly assigned a spiral trajectory center and takes off along a circular trajectory of radius r around that center; the initial position and initial heading angle of the j-th target UAV are expressed as follows: [missing information], where the initial polar angle is a random angle ranging from 0 to 2π that determines the initial position of the target UAV on the circumference; at time step t, the j-th target UAV updates its position and heading angle using the following formula: [missing information], whose parameters are the time step and the change in heading angle Δψ; the change in altitude of the target UAV within each time step is given by [missing information], where T is the time step for completing one round of updates.
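
One way to realize the spiral target motion of claim 2 is sketched below. The update equations are an assumption, because the original formulas are not reproduced in the text: the sketch uses a simple kinematic step, a tangential initial heading, and a heading increment of v·dt/r. The function names and the parameters `dt`, `dz`, `z_min`, and `z_max` are hypothetical.

```python
import math


def initial_target_state(center, radius, theta, z0, v, dt):
    """Place a target on a circle at polar angle theta (assumed layout)."""
    cx, cy = center
    x = cx + radius * math.cos(theta)
    y = cy + radius * math.sin(theta)
    # Assumption: counter-clockwise flight, so the heading is tangent to the circle.
    psi = (theta + math.pi / 2.0) % (2.0 * math.pi)
    # Assumption: heading increment that keeps a circle of this radius at speed v.
    dpsi = v * dt / radius
    return (x, y, z0, v, psi, dpsi)


def step_target(state, dt, dz, z_min, z_max):
    """Advance the 6-dimensional state (x, y, z, v, psi, dpsi) by one time step."""
    x, y, z, v, psi, dpsi = state
    x += v * math.cos(psi) * dt
    y += v * math.sin(psi) * dt
    z = min(max(z + dz, z_min), z_max)  # keep the altitude inside its limits
    psi = (psi + dpsi) % (2.0 * math.pi)
    return (x, y, z, v, psi, dpsi)
```

Calling `step_target` repeatedly traces an ascending spiral until the altitude reaches `z_max`, after which the target keeps circling at that height.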

3. The multi-UAV cooperative jamming method based on deep reinforcement learning according to claim 1, characterized in that the finite Markov decision process is defined as M=(S,A,P,r,γ), where the environment uses partially observable states; S represents the system's state space, characterizing the overall state information of the UAV jamming scenario; A represents the jamming decision action space, describing the jamming control behaviors that a UAV can take; P represents the state transition probability function, describing the probability that the system evolves to the next state after a given jamming action is performed in the current state; r represents the reward function, quantifying the immediate effect of a jamming action on the target UAV's communication performance; and γ is the discount factor, used to balance immediate jamming gains against long-term jamming effects; in the multi-agent UAV jamming mission, at each time t, the state of the i-th jamming UAV consists of its relative positions with respect to all target UAVs and all other jamming UAVs; the three-dimensional position of the i-th jamming UAV at time t is [missing information], and similarly the three-dimensional position of the j-th target UAV is [missing information]; the global state vector is therefore defined as follows: [missing information], where the state comprises the relative position vector between the i-th jamming UAV and target UAV j, and the relative position of jamming UAV i with respect to each other jamming UAV k.
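
The relative-position state of claim 3 can be sketched as follows. The ordering of the entries and the sign convention (other position minus own position) are assumptions, since the original definitions are not reproduced in the text. The function names are hypothetical.

```python
def relative(own, other):
    """Relative position of `other` as seen from `own` (assumed sign convention)."""
    return tuple(o - s for s, o in zip(own, other))


def local_observation(i, jammer_positions, target_positions):
    """Relative positions of jammer i to every target and every other jammer."""
    own = jammer_positions[i]
    obs = []
    for target in target_positions:
        obs.extend(relative(own, target))
    for k, other in enumerate(jammer_positions):
        if k != i:
            obs.extend(relative(own, other))
    return obs


def global_state(jammer_positions, target_positions):
    """Concatenate the local observations of all jammers into one vector."""
    state = []
    for i in range(len(jammer_positions)):
        state.extend(local_observation(i, jammer_positions, target_positions))
    return state
```

With N jamming UAVs and M target UAVs, this layout yields a global state of length 3·N·(M + N − 1).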

4. The multi-UAV cooperative jamming method based on deep reinforcement learning according to claim 1, characterized in that, in step S2, the i-th jamming UAV executes an action at time t, after which its position is updated as follows: [missing information], where the formula relates the three-dimensional coordinate positions of the i-th jamming UAV at two consecutive time steps; for the N jamming UAVs, the joint action is as follows: [missing information]; this joint action represents the directions of movement chosen by the N jamming UAVs at the current time step.
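
The position update under a joint action in claim 4 can be illustrated as below. The discrete action set (hover plus six axis-aligned directions) and the `step_size` parameter are assumptions made for this sketch. The claim states only that each action is a chosen direction of movement.

```python
# Hypothetical discrete action set: index -> unit direction in (x, y, z).
DIRECTIONS = {
    0: (0, 0, 0),   # hover
    1: (1, 0, 0),
    2: (-1, 0, 0),
    3: (0, 1, 0),
    4: (0, -1, 0),
    5: (0, 0, 1),
    6: (0, 0, -1),
}


def apply_joint_action(positions, joint_action, step_size):
    """Move each jamming UAV one step along its chosen direction."""
    if len(positions) != len(joint_action):
        raise ValueError("one action is required per jamming UAV")
    new_positions = []
    for position, action in zip(positions, joint_action):
        direction = DIRECTIONS[action]
        new_positions.append(
            tuple(p + step_size * d for p, d in zip(position, direction))
        )
    return new_positions
```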

5. The multi-UAV cooperative jamming method based on deep reinforcement learning according to claim 1, characterized in that, in step S3, the multi-objective reward function includes a jamming effect reward, a tracking reward, and a collision avoidance penalty; the expression for the multi-objective reward function R is as follows: [missing information]; the jamming effect reward measures the intensity of the jamming applied by friendly UAVs to enemy communications, the jamming power being estimated with the free-space path loss model as shown in the following expression: [missing information], whose parameters are the distance between the i-th jamming UAV and the j-th target UAV, the carrier frequency f, the speed of light c, and the transmission power of jamming UAV i at time step t; the tracking reward encourages the jamming UAVs to get as close as possible to the target UAVs to improve jamming efficiency, and its expression is as follows: [missing information], which includes a scaling factor; the collision avoidance penalty prevents jamming UAVs from colliding or getting too close: a fixed penalty is applied if the distance between any two jamming UAVs is less than a set minimum safe distance d, and its expression is as follows: [missing information], which uses the distance between the i-th jamming UAV and the h-th jamming UAV; in the cooperative jamming algorithm, the reward at each time step is defined as the differential reward, given by the following formula: [missing information], whose terms are the total reward for the current time step, the total reward for the previous time step, and a weighting coefficient.
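
The collision avoidance penalty of claim 5 can be sketched as follows. The sketch assumes that the fixed penalty is applied once per time step as soon as any pair of jamming UAVs is closer than the minimum safe distance. The claim does not say whether the penalty accumulates per pair, and the function name and the `penalty` value are hypothetical.

```python
import math
from itertools import combinations


def collision_penalty(jammer_positions, min_safe_distance, penalty):
    """Return a fixed negative reward if any two jamming UAVs are too close."""
    for first, second in combinations(jammer_positions, 2):
        if math.dist(first, second) < min_safe_distance:
            # Assumption: one fixed penalty per time step, not one per pair.
            return -abs(penalty)
    return 0.0
```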

6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method according to any one of claims 1 to 5.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 5.

8. A computer program product comprising a computer program/instructions, characterized in that, when the computer program/instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 5.