Fault-Tolerant Flight Control Method for Multirotor UAVs Based on End-to-End Direct Thrust Control

By constructing a differentiable simulation environment and a deep learning framework through an end-to-end direct thrust control method, the problem of smooth transition and stable control of multi-rotor UAVs under single motor failure was solved. Fault-tolerant control and efficient training without detection delay were achieved, and the extreme survivability under single motor failure was improved.

CN122018327BActive Publication Date: 2026-06-30NANJING UNIV OF INFORMATION SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANJING UNIV OF INFORMATION SCI & TECH
Filing Date
2026-04-10
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing fault-tolerant control technologies for multi-rotor UAVs suffer from limitations in control architecture, low training efficiency, and insufficient extreme survivability under single motor failure, especially in achieving smooth transition and stable control under single motor failure.

Method used

An end-to-end direct thrust control method is adopted. By constructing a differentiable simulation environment and a deep learning framework, and utilizing tensor computation and gradient backpropagation, an end-to-end control network is built to achieve adaptive control without the need for an independent fault detection and diagnosis module. By combining adaptive loss function and multi-order course learning, the control network parameters are optimized to achieve a smooth transition from normal operation to single motor failure.

Benefits of technology

It achieves fault-tolerant control with no detection delay, improves training efficiency, has extreme survivability under single motor failure, the control commands conform to physical laws, avoids control quantity jitter, and solves the problem of drone crashes under single motor failure.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122018327B_ABST
    Figure CN122018327B_ABST
Patent Text Reader

Abstract

This invention discloses a fault-tolerant flight control method for a multi-rotor UAV based on end-to-end direct thrust control. The method includes: constructing a purely analytical differentiable simulation environment for a quadrotor UAV oriented towards implicit feature observation and gradient backpropagation, embedding a continuously differentiable fault injection and actuator mapping mechanism, and using a fourth-order Runge-Kutta method to discretize and iterate continuous operators; constructing an end-to-end control network based on spatiotemporal feature structure decoupling and latent variable direct drive, and outputting four-motor action commands through dual-stream feature extraction; training the end-to-end control network in the differentiable simulation environment, and optimizing the control network parameters based on a differentiable physics engine and time-varying backpropagation; and combining an adaptive composite loss function and a multi-order course learning mechanism to enable the control network to converge to an effective fault-tolerant control strategy. This invention can achieve adaptive control without an independent fault detection and diagnosis module under single-motor fault conditions, establishing a smooth transition mechanism from normal operation to single-motor failure.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of aircraft flight control technology, and in particular to a fault-tolerant flight control method for multi-rotor unmanned aerial vehicles based on end-to-end direct thrust control. Background Technology

[0002] With the widespread application of multi-rotor drones in logistics, inspection, and security, their flight safety is receiving increasing attention. Because quadcopter drones are underactuated, strongly coupled, and nonlinear systems lacking physical redundancy, they are highly susceptible to crashes if actuator malfunctions occur (such as motor failure or blade damage).

[0003] Current fault-tolerant control technologies mainly suffer from the following problems:

[0004] 1) Limitations of Control Architecture: Existing fault-tolerant control is mainly divided into passive fault tolerance (PFTC) and active fault tolerance (AFTC). Passive fault tolerance typically uses a single robust controller to handle all situations, making it difficult to balance high performance during normal flight and stability under fault conditions, often at the expense of maneuverability during normal flight. Active fault tolerance relies on explicit fault detection and diagnosis (FDD) modules. FDD modules are not only complex to design, but also have significant detection delays or risks of false positives (false negatives). In the critical few milliseconds after a fault occurs, the delay of FDD often leads to attitude divergence, thus losing the opportunity for recovery.

[0005] 2) Training Efficiency and Physical Consistency Issues: In recent years, end-to-end control based on reinforcement learning (RL) has shown great potential. However, mainstream "model-free" algorithms (such as PPO) employ a random sampling trial-and-error mechanism, requiring hundreds of millions of interactions to converge, resulting in extremely low training efficiency. Furthermore, due to the lack of physical constraints, the trained policies often suffer from large control variable jitter and fail to meet the requirements of dynamic smoothness.

[0006] 3) Insufficient extreme survivability under single-motor failure: Complete failure of a single motor is one of the most challenging failure modes for quadcopter UAVs. In this condition, the system loses a thrust source and becomes severely underactuated. Traditional methods based on geometric control or PID control fail completely because they cannot solve the pseudo-inverse of the control allocation matrix. Existing methods often maintain total lift by allowing the UAV to enter a high-speed spin state, but how to smoothly trigger this strategy switch and how to stabilize altitude control in the spin state remain challenges. Current technology lacks an end-to-end learning mechanism that can automatically sacrifice yaw tracking accuracy to maintain altitude in the event of complete single-motor failure.

[0007] Therefore, there is an urgent need for an end-to-end fault-tolerant control method that can perform high-speed training, adapt strategies, and maintain high survivability under extreme fault conditions, while also being able to handle single-motor faults. Summary of the Invention

[0008] The problem to be solved by this invention is to provide a fault-tolerant flight control method for multi-rotor UAVs based on end-to-end direct thrust control, which can realize adaptive control without independent fault detection and diagnosis modules under single motor failure, solve the problem of attitude loss of control under complete single motor failure, improve the training efficiency and physical consistency of end-to-end control strategy, and establish a smooth transition mechanism from normal to single motor failure.

[0009] This invention adopts the following technical solution: a fault-tolerant flight control method for multi-rotor unmanned aerial vehicles based on end-to-end direct thrust control, comprising the following steps:

[0010] Step 1: Using the tensor computation framework, we build a purely analytical differentiable simulation environment for quadrotor UAVs that is oriented towards implicit feature observation and gradient backpropagation from the bottom layer. It embeds a continuously differentiable fault injection and actuator mapping mechanism, and uses the fourth-order Runge-Kutta method to discretize and iterate the continuous operators, transforming them into differentiable discrete state transition operators under the deep learning framework.

[0011] Step 2: Construct an end-to-end control network based on spatiotemporal feature structure decoupling and latent variable direct drive. With the observed state tensor as input, extract dual-stream features through kinematic intention perception branch and physical time series diagnosis perception branch. Generate latent states based on gated loop units and output four motor action commands.

[0012] Step 3: Train the end-to-end control network constructed in Step 2 in the differentiable simulation environment constructed in Step 1. Based on the differentiable physics engine and backpropagation over time, optimize the control network parameters. Combine the adaptive composite loss function and multi-order course learning mechanism to perform direct gradient analytical transmission from physical constraints to network weights, so that the control network converges to an effective fault-tolerant control strategy and realizes fault-tolerant flight control of multi-rotor UAVs.

[0013] Specifically, in step 1, a purely analytical differentiable simulation environment for implicit feature observation and gradient backpropagation is constructed: a fully differentiable purely analytical dynamic environment for a quadrotor UAV is established, embedding a continuously differentiable fault injection and actuator mapping mechanism. This includes:

[0014] Define the network observation state tensor This includes: positional deviation in the body coordinate system. linear velocity of the machine body Angular velocity of the body Tiling column vectors of the attitude rotation matrix Previous control command Actual motor speed observation value ;in, The orthogonal rotation matrix represents the attitude relationship between the current body coordinate system and the inertial coordinate system;

[0015] Define the network output four-dimensional action tensor , representing the target PWM duty cycle of the four motors of the multi-rotor drone.

[0016] Furthermore, the action tensor output by the network... By inverse normalization mapping to the physical target rotational speed vector :

[0017] ;

[0018] in, This is the Hadamard product (element-by-element multiplication). , This refers to the calibrated minimum / maximum steady-state speed of the motor.

[0019] Define the fault attenuation matrix , For the first The health coefficients of each motor, i=1,2,3,4; this matrix is ​​randomly initialized during training and is not used as network input, but only used inside the physics engine to affect the next dynamic state;

[0020] Fault injection uses a speed-level model, and the actual motor speed is... :

[0021] ;

[0022] Then calculate the actual thrust. :

[0023] ;

[0024] in, This is the thrust coefficient.

[0025] Furthermore, a continuous analytical dynamic operator is established. The operator sequentially performs inverse normalization, fault injection, thrust calculation, force and torque allocation, and the system state differential equations, outputting the system state derivative. :

[0026] ;

[0027] in, This represents the current system state. Indicates the actual location at the current moment;

[0028] Using the fourth-order Runge-Kutta method (RK4) to Discretization into differentiable state transition operators :

[0029] ;

[0030] in, The system state at the next moment; and the differentiable state transition operator The automatic differentiation computation graph, mounted on the deep learning framework, ensures that subsequent BPTT gradients can propagate backward along the time axis.

[0031] Specifically, in step 2, an end-to-end control network based on spatiotemporal feature structure decoupling and latent variable direct drive is constructed: a single neural network is designed. , in the state of observation As input, directly output four-motor commands. The network structure includes:

[0032] Step 2.1, Dual-stream feature extraction:

[0033] Receive positional deviation by sensing branches through kinematic intent. linear velocity of the machine body Tiling column vectors of the attitude rotation matrix Motion features are extracted through a two-layer fully connected network. ;

[0034] By using the physical timing diagnostic sensing branch, control commands from the previous moment are received. Current motor speed observation value Angular velocity of the body First, calculate the rotational speed residual. :

[0035] ;

[0036] in, The motor steady-state gain matrix is ​​pre-calibrated. Indicates the first Command-speed linear gain for each motor, i=1,2,3,4; This represents the operation of converting a vector into a diagonal matrix. The minimum steady-state speed vector;

[0037] Then, the calculated speed residuals The data is concatenated with the original input and then processed through a single fully connected network to extract diagnostic features. .

[0038] Step 2.2, Hidden State Generation:

[0039] Diagnostic features Input gated recurrent unit (GRU), through time steps Execute the state update equation to update the hidden state. This layer does not directly decode the fault attenuation matrix, but instead internalizes its impact on UAV dynamics. Distribution evolution in high-dimensional space:

[0040] ;

[0041] in, For time steps The hidden state, The function represents a gated loop unit used to internalize the influence of the fault decay matrix on UAV dynamics into a hidden state. Distribution evolution in higher-dimensional space.

[0042] Step 2.3, Strategy Execution:

[0043] Motion characteristics With hidden state Input to the fusion layer and concatenate them into joint features. The input is fed into a three-layer fully connected network; dynamic redistribution is performed from the input space to the control output space. The last layer is activated by a Sigmoid function to output a normalized instruction, which is the action tensor. .

[0044] Specifically, in step 3, the control network parameters are optimized based on the differentiable physics engine and backpropagation over time (BPTT). In the differentiable simulation environment of step 1, the network is trained in the following way:

[0045] Step 3.1, Differentiable closed-loop sampling:

[0046] For each training round, a fault matrix is ​​sampled based on the fault types and probabilities set at the current learning stage of the course. The corresponding fault modes are expanded forward along the time step, preserving the complete computational graph and forming a trajectory set. ;in, This represents the total number of time steps.

[0047] Step 3.2: Construct the adaptive loss function:

[0048] Total loss Defined as:

[0049] ;

[0050] in, The time discount rate, These are the weighting coefficients for position, velocity, and control smoothing terms, respectively. As a penalty for location tracking, For speed tracking penalties, For the physical-induced attitude compromise loss, For instruction smoothing and action penalty;

[0051] Step 3.3, Multi-level Course Learning:

[0052] A three-stage progressive training method was adopted. The first stage was fault-free, and the health coefficient of all motors was [not specified]. The second stage introduces soft faults, randomly selecting one motor according to a preset probability. A soft fault occurred, and the motor... Health coefficient In the first stage, the probability of a hard fault is 0; in the second stage, a hard fault is introduced, and a faulty motor is randomly selected according to a preset probability. Set the motor with the highest probability. Health coefficient Set the motor with the second probability. Health coefficient , It follows a uniform distribution within the interval (0.3, 0.8).

[0053] The multi-stage learning process gradually increases the difficulty of faults, enabling the control network to converge stably to an effective fault-tolerant strategy.

[0054] Step 3.4, Gradient Calculation and Parameter Update:

[0055] The gradient of the loss with respect to the network parameters is calculated using the automatic differentiation mechanism of the deep learning framework. The AdamW optimizer is used, supplemented by gradient clipping, to update the parameters and prevent gradient explosion.

[0056] Compared with the prior art, the present invention, employing the above technical solution, has the following technical effects:

[0057] 1. Fault Response Without Independent Fault Detection and Diagnosis Modules: This invention eliminates the need for independent fault detection and diagnosis modules. It uses the residuals from historical commands and real-time speed observations to drive the GRU to generate implicit efficiency states, enabling the network to complete fault perception and policy adjustment within one forward inference cycle (typically within 2.5ms). Traditional active fault-tolerant methods rely on independent fault detection and diagnosis modules, whose detection and decision-making links typically require a delay of tens of milliseconds. This method, through a diagnostic-free end-to-end architecture, avoids this latency problem and achieves fault-tolerant control without detection delay.

[0058] 2. Extreme survivability under single-motor failure: Through the adaptive yaw weight mechanism in the loss function, the control network automatically reduces the penalty for yaw error when altitude drops, prioritizing the use of the thrust of the remaining three motors to maintain altitude. In the case of complete single-motor failure, the method of this invention enables the UAV to enter a spin state, solving the problem of easy crashes after single-motor failure in traditional methods.

[0059] 3. Significantly improved training efficiency: By utilizing differentiable physical simulation and the BPTT algorithm, the physical Jacobian matrix can be directly used to guide parameter updates during network training, eliminating the need for millions of random sampling steps required for model-free reinforcement learning. This solves the problems of difficult training and slow convergence in traditional end-to-end control.

[0060] 4. Control smoothness and physical consistency: Since the gradient directly originates from the physical dynamics equations, the control commands output by the network naturally conform to the laws of rigid body motion, and the command smoothing term... It further suppresses drastic changes in the control quantity and avoids the high-frequency chattering problem common in model-free methods.

[0061] 5. Progressive Learning and Stable Convergence: The three-stage learning strategy of this invention effectively solves the problem of training instability under extreme fault conditions. The first stage establishes basic dynamic perception, the second stage learns thrust compensation through soft faults, and the third stage challenges the limits of hard faults, ensuring that the network can still stably converge to an effective fault-tolerant strategy under extreme conditions of complete failure of a single motor. Attached Figure Description

[0062] Figure 1 This is a flowchart illustrating the overall process of the fault-tolerant flight control method for multi-rotor unmanned aerial vehicles (UAVs) of the present invention.

[0063] Figure 2 This is a structural diagram of the micronable simulation environment of the present invention;

[0064] Figure 3 This is a diagram of the end-to-end control network structure of the present invention;

[0065] Figure 4 This is a flowchart illustrating the three-stage learning strategy of this invention. Detailed Implementation

[0066] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of the application will be further described in detail below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments involved in this invention. All non-innovative embodiments based on these embodiments by other researchers in the art are within the protection scope of this invention. Furthermore, the step numbers in the embodiments of this invention are only set for ease of explanation and do not limit the order of the steps. The execution order of each step in the embodiments can be adaptively adjusted according to the understanding of those skilled in the art.

[0067] In one embodiment of the present invention, a fault-tolerant flight control method for multi-rotor UAVs based on an implicit specific state single direct drive end-to-end network is proposed. This method completely abandons the traditional hierarchical serial architecture of "position control-attitude control-control allocation (hybrid controller)" and eliminates the need to build multiple independent expert controllers for explicit switching. Instead, it utilizes a single deep neural control network with time-series awareness, taking target deviation, UAV attitude, action history, and actual observed motor speed as inputs. The network's hidden layer compares the features of the "historical setpoint" and the "current physical response" and then directly outputs the optimal low-level motor control command, which includes thrust compensation and non-standard fault-tolerant reconfiguration rules, within a single computation graph.

[0068] Specifically, the overall process of the fault-tolerant flight control method for multi-rotor UAVs in this embodiment is as follows: Figure 1 As shown, it includes the following steps:

[0069] Step 1: Construct a purely analytical differentiable simulation environment for implicit feature observation and gradient backpropagation.

[0070] To enable a single neural network to "self-aware" of faults without external state switching instructions and to support the backpropagation-time (BPTT) algorithm in transmitting physical gradients across the entire computation graph, this embodiment abandons the non-differentiable black-box simulator commonly used in reinforcement learning. Instead, it utilizes tensor computing frameworks (such as PyTorch or JAX) to build a fully differentiable purely analytical dynamic environment for a quadcopter UAV from the bottom layer, and embeds a continuously differentiable fault injection and actuator mapping mechanism.

[0071] This embodiment can micro-simulate the environment structure as follows: Figure 2 As shown, the input action tensor and current system status Injection failure By constructing a differentiable discrete state transition operator Calculate the state at the next time step. The specific implementation includes the following sub-steps:

[0072] Step 1.1: Tensor quantization definition of observation state space and action space.

[0073] This step is designed as a physical prerequisite for a single network to have the ability to "identify implicit systems". At discrete time step t, the network observation state tensor and the network output action tensor are defined.

[0074] Network observation state tensor To eliminate yaw angle singularities and improve feature consistency, all planar physical quantities are projected onto the UAV body frame, specifically including:

[0075] (1) Positional deviation (3D): Represents the target location of the UAV. Current actual location The projection of the difference vector in the body coordinate system;

[0076] (2) Linear velocity of the body With body angular velocity (Each in 3 dimensions): Represents the translational velocity and rotational angular velocity along the three axes in the body coordinate system, respectively;

[0077] (3) Tiling column vectors of the attitude rotation matrix (9-dimensional): The 3×3 orthogonal rotation matrix, which characterizes the attitude relationship between the current body frame and the inertial frame, is flattened into a 9-dimensional column vector input.

[0078] (4) Control commands from the previous moment (4-dimensional): Networks in The duty cycle control command issued at any time;

[0079] (5) Actual motor speed observation value (4D): The actual rotational speeds of the four motors fed back by the airborne sensors are used for subsequent physical timing residual diagnosis of the network.

[0080] Network output action tensor (4-dimensional): Represents the four reference motor thrust control signals directly output from the network end-to-end, i.e., the target PWM duty cycle. This is used for subsequent calculations to obtain the target rotational speed. .

[0081] Step 1.2, Core Physics Engine: Perform fault parameterization injection and construct a continuous analytical dynamics operator with actuator constraints. .

[0082] To perform end-to-end fault simulation during the training phase, this embodiment utilizes the Newton-Euler rigid body dynamics equations to construct a continuous set of equations encompassing actuator constraints and fault decay. The method is as follows:

[0083] 1.2.1 Control variable inverse normalization and aerodynamic mapping:

[0084] Establish a linear / affine mapping relationship between the network output space and the actual physical rotational speed, and convert the dimensionless control quantity output by the network into a linear / affine mapping relationship. Inverse normalization to the physical target rotational speed vector:

[0085] ;

[0086] in, For Hadamard products (element-by-element multiplication). The physical target rotational speed vector. , The maximum and minimum steady-state calibration speeds are set for the actual motor structure:

[0087] ;

[0088] ;

[0089] ;

[0090] in, For the first The target speed of each motor , No. The maximum and minimum steady-state rated speeds of each motor .

[0091] 1.2.2 Fault parameterization injection:

[0092] definition The fault attenuation matrix is ​​a 4×4 diagonal matrix. .in, Represents the current moment. The health efficiency coefficient of each motor.

[0093] For example, when The motor was intact. Simulated motor aging or blade damage can cause the thrust to be halved under the same feedforward command. 0 indicates that the arm motor is completely damaged and has stopped rotating.

[0094] It is particularly important to note that in this embodiment, the fault attenuation matrix... and its elements It should never be used as direct data input to control the network; it can only be used to influence the next dynamic state, forcing the network to "implicitly infer" it.

[0095] Based on aerodynamic characteristics, the thrust generated by the four rotors on the fuselage shaft system... Not perfectly following control commands Instead, it is regulated by the fault matrix, the mathematical expression of which is:

[0096] ;

[0097] in, It is the total thrust coefficient (constant) of the rotor system, determined by the aerodynamic characteristics of the propeller.

[0098] 1.2.3 System state equations:

[0099] Obtain the real thrust through differentiable injection Then, the simulation system proceeds to the next time-instance state according to the Newton-Euler equations. Because the black-box residual model is eliminated, this step is entirely composed of white-box analysis, ensuring the absolute physical consistency of the gradient.

[0100] First, perform the control and distribution calculations for force and torque:

[0101] Total lift is , is used to represent the total thrust acting in the positive direction of the Z-axis of the body coordinate system, and the calculation formula is as follows:

[0102] ;

[0103] The three-axis control torque of the machine body is: The calculation formula is as follows:

[0104] ;

[0105] in, They are respectively The control torque of the shaft, These are the thrust generated by the airframe shaft system from the four rotors. A 3×4 control allocation matrix is ​​configured for the rotor geometry.

[0106] Then, construct the system state differential equations, including:

[0107] (1) Linear velocity state derivative (translation equation):

[0108] ;

[0109] in, This is the linear acceleration vector (3D) of the UAV in the body coordinate system. Let be the total mass of the drone (a scalar constant). This is the transpose of the attitude rotation matrix (equivalent to a rotational transformation from the inertial frame to the machine frame). The vector of gravitational acceleration in the inertial frame is usually . ; The vector of the body's angular velocity. This is the linear velocity vector in the body coordinate system. This is the Coriolis auxiliary centripetal acceleration term caused by the rotating coordinate system of the machine.

[0110] (2) Angular velocity state derivative (rotation equation):

[0111] ;

[0112] in, This is the angular acceleration vector (3D) of the UAV in the body coordinate system. The 3×3 diagonal matrix of rotational inertia of the UAV is obtained through geometric approximation or actual measurement. It is the inverse of the moment of inertia matrix; This is the gyro torque crosslinking term, used to represent the triaxial nonlinear coupling effect under the high-speed spin of the UAV.

[0113] (3) Attitude update derivative (kinematic equations):

[0114] ;

[0115] in, This is the time rate of change matrix of the attitude rotation matrix; This is the attitude rotation matrix (3×3) at the current moment, used to describe the attitude of the body coordinate system relative to the inertial coordinate system; For the three-dimensional angular velocity vector The resulting antisymmetric matrix.

[0116] (4) Position update derivative (position kinematic equation):

[0117] ;

[0118] in, It is the time rate of change of the UAV's position vector in the inertial frame (i.e., the three-axis world velocity in the inertial frame).

[0119] Calculate based on the Newton-Euler differential equation ,in, This is the time derivative of the system state.

[0120] The inverse normalization of the aforementioned control quantities, fault injection, thrust calculation, force and torque distribution, along with the state differential equations consisting of translational, rotational, kinematic, and positional kinematic equations, together constitute the continuous analytical dynamics operator. :

[0121] ;

[0122] in, For the system state, the operator This allows every extremely small tensor adjustment in the network output layer to elicit a clear and differentiable rigid body rotation and translation response.

[0123] Step 1.3: Perform time-domain discretization and computation graph mounting to construct a differentiable discrete state transition operator. .

[0124] Because neural networks employ discrete clock-cycle inference, directly using continuous differential equations cannot connect to the computational graph composed of tensors. To support the subsequent acquisition of backpropagation gradients across time during network training, this embodiment uses continuous operators. It is transformed into a differentiable computational loop under the deep learning framework.

[0125] The set control period constant Below, the fourth-order Runge-Kutta (RK4) method is used to analyze the above continuous system. Perform discretization iterative processing.

[0126] The RK4 integration steps are as follows:

[0127] ;

[0128] ;

[0129] ;

[0130] ;

[0131] Thus, the state at the next moment is obtained. Accurate predictions:

[0132] ;

[0133] in, , , , These are four slope estimates of the fourth-order Runge-Kutta method within the integration interval, used to perform a weighted average of the system state to update the state at the next time step.

[0134] The complete processing pipeline, which consists of "actuator limiting constraints → fault attenuation → inertial coupling deduction", is encapsulated and defined as a differentiable discrete state transition operator. Its mathematical paradigm is rigorously expressed as:

[0135] ;

[0136] Unlike traditional external simulation software used in robotics or flight control (such as Gazebo and MATLAB Simulink, which block the backpropagation of mathematical gradients when outputting the next state), this invention... All physical computations are fully integrated as underlying tensor operations into the automatic differentiation computation graph of the deep learning framework. This allows the final loss function calculated in subsequent steps to pass unimpeded along the time axis through the dynamics engine, providing gradient guidance carrying the real physical Jacobian matrix to the end-to-end network.

[0137] Step 2: Construct an end-to-end control network based on spatiotemporal feature structure decoupling and latent variable direct drive.

[0138] This embodiment abandons the traditional serial "position-attitude-control allocation" architecture and the multi-expert network switching architecture that relies on explicit fault diagnosis and isolation (FDD). Its core lies in designing an end-to-end single neural network with spatiotemporal feature decoupling, continuous hidden state representation, and intrinsic feature redistribution capabilities. This network can autonomously and smoothly switch its internal control manifold based on historical commands and current physical feedback without relying on an external explicit fault detection and isolation (FDD) module.

[0139] The end-to-end control network structure in this embodiment is as follows: Figure 3 As shown, the input observation state vector Through the kinematic intent perception branch and the physical temporal diagnosis perception branch, the action tensor is output. The specific implementation details are as follows:

[0140] Step 2.1: Design of network feature coding topology and dual-stream decoupling structure.

[0141] In order for the network to have the structural ability to clearly separate the desired motion requirements from the system's own physical health, the 26-dimensional input state vector obtained in step one must be processed. Perform forced structured splitting and feed it into a parallel two-stream computation graph structure:

[0142] The first branch is the kinematic intention perception branch:

[0143] (1) Characteristic flow direction: Directional reception of the current position deviation (3D), linear velocity (3D) Pose rotation matrix flattening amount (9-dimensional), totaling 15-dimensional tensors.

[0144] (2) Implementation: This branch is constructed as a two-layer fully connected multilayer perceptron (MLP_kin). The specific node flow is as follows: input layer (15-dimensional) → hidden layer (64-dimensional, using the Mish non-linear activation function to ensure smooth gradient transmission) → output layer (64-dimensional, Mish activation).

[0145] (3) Module output: Extract kinematic feature vectors that are detached from the underlying hardware information and focus on expressing the intention of spatial maneuvering. .

[0146] The second branch is the Physical Timing Diagnostic Branch:

[0147] (1) Feature flow direction: In order to capture the coupling relationship between the command and the physical response without introducing artificial empirical parameters, the control command generated by the network in the previous moment is directly extracted. (4D) Current actual observed motor speed (4-dimensional) and body angular velocity (3D) forms an 11-dimensional basic observation channel.

[0148] (2) Residual quantization calculation process: In order to eliminate the blind mapping of the network, physical priors are forcibly introduced.

[0149] Define the rotational speed residual vector The calculation formula is:

[0150] ;

[0151] in, The motor steady-state gain matrix is ​​pre-calibrated. Indicates the first Command for each motor - linear gain of speed.

[0152] With the motors in good working order, multiple open-loop commands are sent to each motor, and steady-state speeds are measured. The speeds of each motor are then obtained through least-squares linear fitting. This matrix remains fixed during training to ensure that the residuals always reflect the true physical deviations.

[0153] (3) Hierarchical pre-assembly and purification: The calculated 4D dynamic residuals are then pre-assembled and purified. By concatenating the tensors along the feature dimensions of the aforementioned 11-dimensional basic observation channels, a 15-dimensional diagnostic input feature is generated. .

[0154] Subsequently, a fully connected layer, MLP_diag [Linear(15, 32) + Mish activation], is used to extract the physical temporal feature vector. Here, Linear(15, 32) represents a fully connected linear transformation with an input dimension of 15 and an output dimension of 32, and MLP_diag represents the sub-network formed by this fully connected layer and the Mish activation function. This serves as the input for subsequent time-series networks.

[0155] Step 2.2: Mechanism for continuous representation of high-dimensional hidden states based on gated recurrent units (GRUs).

[0156] Traditional logic relying on explicit selectors requires manually defined hard decision thresholds (such as "speed decreases by 30%). This embodiment abandons manual labeling and utilizes the specific topological structure of recurrent neural network (RNN) units to perform implicit temporal dynamic modeling in high-dimensional space for external physical evolution that is difficult to analyze.

[0157] (1) Input structure definition: The 32-dimensional feature vector output from the diagnostic branch in step 2.1 is used as the input structure definition. As input, it is fed into the computational core of the Long Short-Term Memory Unit, namely the Gated Cyclic Unit (GRU).

[0158] (2) GRU recursive formula: The state update equation is executed at time step t: The hidden layer is defined as having 64 channels, therefore the real-time generated hidden state features are as follows: .

[0159] (3) Implicit Representation Network Mechanism: This layer does not directly decode the fault attenuation matrix, but internalizes its impact on UAV dynamics. Distribution evolution in higher-dimensional space.

[0160] When the expected motor speed closely matches the actual speed (in a healthy state), the high redundancy of the input causes the update gates within the GRU to spontaneously maintain the latent variables evolving near the nominal manifold; however, in the event of a physical abrupt change such as motor stalling, the aforementioned pre-defined... A significant bias will occur, driving the GRU update gate to rapidly absorb the current singular input, forcing The vectors are smoothly and quickly transferred to a high-dimensional subspace representing the specific damage pattern. This structure ensures that the network can dynamically respond to continuous decay of varying complexity without relying on any external discrete labels.

[0161] Step 2.3: Policy execution network segmented linear mapping and adaptive topology routing.

[0162] The control strategy "manifold switching" referred to in this embodiment is not an external program jump logic. In essence, it is strictly driven by the piecewise linear (PWL) mapping characteristics of a deep network with a ReLU activation function, completing the dynamic redistribution from the input space to the control output space.

[0163] (1) Fusion Layer: Combines the signals from the motion splitter. With the inclusion of dynamic time-series latent variables By splicing them together, a joint latent variable that determines the global actions of the system is formed. .

[0164] (2) Policy Execution Master Network (Policy MLP): The deep feedforward network, fed into which control assignment is computed, has the following complete topology:

[0165] The first layer is a fully connected layer. The result is processed as [Linear(128, 128) + RULE], and the calculation is performed. Output dimension 128 → Non-linear activation layer ;

[0166] The second layer is a fully connected layer. The result is processed as [Linear(128, 64) + RULE], and the calculation is performed. Output dimension 64 → Non-linear activation layer ;

[0167] The third layer is the output layer. The result is processed as [Linear(64, 4) + Sigmoid], and the calculation is performed. Output dimension 4 → Compressed activation layer ;

[0168] in, , , These represent the weight matrices for the first, second, and third layers, respectively. , These represent the outputs of the first and second fully connected layers after the ReLU activation function, respectively. , , These represent the bias vectors for the first, second, and third layers, respectively. , , These represent the linear outputs of the first, second, and third layers, respectively. For activation function, This is the normalization function.

[0169] (3) Implicit feedforward reconstruction based on activation region transfer:

[0170] According to the characteristics of ReLU networks, when a high-dimensional input space is mapped through this network, it will be divided into numerous local linear activation regions by a multidimensional hyperplane.

[0171] As time progresses, when latent variables... When drift occurs due to changes in underlying physical characteristics (such as motor failure), the joint feature vector A large-scale positional shift occurs in the system's input tensor space. This spatial shift causes the activation states of a large number of neurons to be flipped after passing through the fully connected layer mapping (Wx+b) (i.e., the input falls from a positive value into the ≤0 interval, or vice versa).

[0172] This nonlinear global region shift leads to the equivalent mapping operator exhibited by Policy MLP as a whole. A fundamental change has occurred. In macroscopic applications, this manifests as the reduction or elimination of the original yaw maintenance compensation pathway, while the differential spin anti-roll mapping is activated. Through continuous switching of local affine functions, the network achieves a natural transition from one control strategy (distribution) to another, preserving the inherently strong nonlinear expression of neural networks.

[0173] (4) Physical macroscopic manifestation - comprehensive collaborative degradation: This switching of node weights is physically macroscopically manifested as the network automatically reducing the yaw axis of the UAV. and location By reducing the tracking weight, the yaw control precision is effectively sacrificed. The differential speed of the remaining three motors is used to generate spin to maintain altitude, thereby breaking through the degree of freedom constraint under single motor failure and achieving stable flight in a reduced-dimensional control space.

[0174] Due to a single network The structural design encompasses state decoupling, implicit residual quantization, manifold routing, and ultimate hybrid control, thus achieving extreme driving efficiency.

[0175] (1) Computation graph transient disk placement: After the above policy network completes the calculation, the output layer The generated Sigmoid-limited data This refers to the target PWM duty cycle of the four motors represented by the physical layer.

[0176] (2) Effect of anti-lag technology: It eliminates the huge inter-level communication loss and complicated sequence under the traditional cascade architecture system: fault diagnosis requires cumulative window diagnosis → notifying external decision-makers to replace the controller array → inverse kinematics control allocation matrix to find pseudo-inverse → resetting the complex long link of PID integral term and saturation dead zone.

[0177] This embodiment of fault detection starts with sensor tensors, completing the entire process of timing identification (GRU refresh), internal region reconstruction (based on PWL attributes), and providing the direct-drive control law, all contained within a single forward inference. Since the entire forward inference only involves matrix multiplication and activation function calculation, its computational latency is much lower than the diagnosis-switching-reconstruction process in traditional cascaded architectures, achieving fault-tolerant control without the need for independent fault detection and diagnosis modules.

[0178] Step 3: Optimization method for control network parameters based on differentiable physics engine and backpropagation over time (BPTT).

[0179] Because this embodiment constructs an end-to-end single-drive control network (GRU units including temporal evaluation and piecewise linear policy mapping) have extremely complex deep mapping boundaries. Traditional model-free reinforcement learning (ML) based on zero-order moment estimation of policy gradients is prone to getting trapped in local optima or causing training collapse. Therefore, a parameter optimization scheme based entirely on white-box differentiable dynamics and backpropagation over time (BPTT) is adopted, combined with an adaptive composite loss function and a multi-order course learning mechanism, to achieve direct analytical gradient propagation from physical constraints to network weights. The specific implementation steps are as follows:

[0180] Step 3.1: Differentiable closed-loop tensor sampling mechanism covering the entire physical manifold.

[0181] To generate sufficiently rich and continuously differentiable training trajectories, this embodiment constructs a closed-loop sampling pipeline based on batch tensor operations. Let the preset evaluation time field of view length be... Step (e.g.) (corresponding to 1 second of simulated physical time), the sampling process for a single training round is as follows:

[0182] (1) State initialization: Extract kinematic state tensors based on the initial distribution. Initialize the hidden variables of the internal GRU Based on the fault types and probabilities set at the current learning stage of the course, a fault matrix is ​​sampled. The corresponding fault mode is defined in this matrix. It is used only within the physics engine and is not used as network input or involved in loss calculation, ensuring that the network cannot directly obtain fault information.

[0183] At each time step t, the observed state Based on the current system state Previous control command and current rotational speed observation It is constructed.

[0184] (2) Construction of the forward unfolded graph: In The internal sequence recursion includes:

[0185] State decoupling extraction: According to the definition in step 2.1, from Extracting motion features Pretreatment diagnostic features ;

[0186] Independent evolution of latent variables: According to the definition in step 2.2, the GRU updates the temporal latent state independently. ;

[0187] Direct-drive inference based on strategy: According to the definitions in steps 2.3 and 2.4, the motion features are... With hidden state splicing as joint features Finally, control commands are output through the policy network. .

[0188] Physical differential derivation: control commands The fully differentiable hybrid power mechanical operator established in step one is fed into the input. This generates the state for the next time step: .

[0189] (3) Collection of temporal cascaded computation graphs: In the complete derivation process from t=0 to t=T, the computation graph is no longer discarded as in reinforcement learning. Instead, all tensor operation operators (including matrix multiplication in neural networks, gating and closing in GRU, and inversion of inertial matrices and partial derivatives under gravity generated in the physics engine) are retained in the unidirectional dynamic computation graph cache of the deep learning framework (such as PyTorch / JAX), forming a set of joint trajectories with complete topological links. .

[0190] Step 3.2: Design of a fully state-driven adaptive loss function.

[0191] The reference trajectory is generated by an external task planner, providing the desired position at each time step t. Expected speed and expected rotation matrix During the training phase, reference trajectories can be randomly generated (such as fixed-point tracking, circular trajectories, etc.) to cover various working conditions; during the deployment phase, the reference trajectory is calculated in real time by user commands.

[0192] To guide the network in learning control manifold switching (such as abandoning yaw to maintain altitude under extreme faults), while avoiding the loss function being affected by the unknown fault matrix. To generate explicit dependencies (preventing the breakdown of generalization in end-to-end learning), this embodiment further designs a composite loss function that depends only on observable physical errors. :

[0193] ;

[0194] The time discount rate constant is set to... To encourage long-term survival, These are the weighting coefficients for position, velocity, and control smoothing terms, which can be adjusted according to task requirements. Each sub-term has specific tensor mapping constraints.

[0195] Specifically, location tracking penalties and speed tracking penalty Used to force drones to execute three-dimensional spatial intentions:

[0196] ;

[0197] ;

[0198] in, This is the current actual location. For the target location, This represents the current actual linear velocity of the machine body. The linear velocity of the target body.

[0199] Instruction smoothing and action penalty Used to punish severe high-frequency oscillations of the motor's duty cycle and long-term residence at the saturation boundary:

[0200] ;

[0201] in, The preset L2 regularization coefficients are used. The control command at the current moment. The control command from the previous moment; preferably, .

[0202] Physics-Induced Attitude Compromise Loss Includes pitch / roll error terms and adaptive yaw weighting terms:

[0203] First, define the pose error vector in the form of a rotation matrix, from the desired rotation matrix. and the current rotation matrix Calculate axis angle error :

[0204] ;

[0205] in, This indicates that the antisymmetric matrix is ​​mapped to a three-dimensional vector.

[0206] Then, Decomposed into pitch / roll components and yaw component .

[0207] The attitude compromise loss based on physics is:

[0208] ;

[0209] in, Use a fixed weight (can be 1.0); The penalty weight for yaw redirection is dynamically adjusted based on the current altitude error:

[0210] ;

[0211] In the formula, For high tracking error, The nominal yaw weight (e.g., 1.0). The relaxation sensitivity coefficient (e.g.) This is used to enable the control network to automatically reduce yaw penalties during altitude drops, prioritizing the maintenance of altitude stability.

[0212] This design allows the network to withstand extreme failures during training and a sharp drop in available thrust, once the altitude begins to fall irreversibly ( Increase the penalty weight for yaw redirection. It will decay exponentially. At this point, the loss surface will automatically establish the guaranteed altitude and guaranteed roll / pitch (ensuring that the lift direction is not reversed) as the global minimum point. By utilizing the natural convergence of gradient descent, it forces the ReLU weights in the neural network to evolve into a nonlinear manifold mapping that "sacrifices yaw spin for thrust survival".

[0213] Step 3.3: Course learning strategy based on single motor fault evolution.

[0214] When faced with highly underactuated conditions caused by partial or complete damage to a single motor, the dynamic nonlinearity of a multirotor system is drastically different (for example, in the case of complete single-motor failure, the UAV must sacrifice its yaw angle to enter high-speed spin to maintain physical equilibrium). If the extreme boundary condition of a single motor completely shutting down is directly injected into the simulator at the beginning of training, the dynamic state equation will immediately fall into a singularity or divergence region (such as entering an irreversible tumble crash), causing the physical gradients on the backpropagation-time (BPTT) computation chain to explode, and the network will be completely unable to extract effective fault-tolerant features.

[0215] To address this technical bias, this embodiment proposes a three-stage progressive learning strategy that dynamically adjusts the fault attenuation matrix in environmental parameters. The probability distribution and lower bound of the values ​​guide the control network to gradually learn the implicit fault-tolerant mechanism of thrust redistribution and spin-based soft landing in a smooth physical transition manifold.

[0216] Step 3.3.1, Definition of Single Motor Fault Sampling Distribution and Health Coefficient:

[0217] This implementation models the fault sampling process as a strict conditional sampling model. Before the start of each training batch, the system evaluates the health coefficients of the four motors according to the following rules. Assign a value:

[0218] (1) Fault introduction determination: based on the probability set in the current training phase. This will determine whether to introduce the faulty motor in this round.

[0219] (2) Selection of the faulty motor: If a fault is introduced, one motor is randomly selected from the four motors as the target faulty motor, and its index is set as follows. For the remaining three healthy motors that were not selected, their health efficiency coefficients are always rigidly locked at the ideal state: .

[0220] (3) Dual-state health sampling of faulty motors: for the selected target faulty motors A health coefficient is generated based on the currently set fault intensity (soft / hard). :

[0221] Soft fault intensity (local damage degradation): Simulates partial degradation scenarios such as blade breakage, motor winding aging, or ESC overheating. The values ​​are independently sampled from a preset uniform distribution. Where U(0.3,0.8) represents a uniform distribution over the interval [0.3,0.8]. This range can be adjusted according to the actual fault characteristics; in this embodiment, this range is selected to balance fault salience and training stability.

[0222] Hard Failure Intensity (Single Unit Complete Failure): Simulates extreme crash scenarios such as motor power failure, structural fracture, or complete blade detachment. The failure coefficient is forcibly reset to zero, i.e., assigned a value. .

[0223] Step 3.3.2, Three-stage evolution schedule planning for single-motor fault-tolerant control:

[0224] In this embodiment, the expert network training is conducted in rounds over time. Divided into three progressive constraint phases, this systematically narrows the cognitive gap between the "ideal state" and the "catastrophic state," with a total of [number] training rounds. Three-stage threshold , , Set proportionally , , .

[0225] Specific evolution strategies include Figure 4 As shown, it includes:

[0226] Phase 1: Ideal envelope establishment and control basis anchoring phase, during training rounds At this time, the system is in a fault-free ideal training environment, and no fault motor selection is triggered. (i.e., the health coefficient of all motors in the entire system) ).

[0227] This stage is used to guide the network to learn basic aerodynamic mapping and dynamic constraints, thereby achieving basic high-precision hovering, trajectory tracking and standard attitude control.

[0228] Phase Two: Single Rotor Soft Degradation and Thrust Compensation Adaptation Phase, when the training rounds enter... During this phase, soft faults begin to be introduced into the system. The system operates with a fixed probability ( Randomly select one motor Soft fault occurred The probability of a hard failure is 0.

[0229] This stage introduces a continuously differentiable single-point thrust attenuation effect, forcing the network to learn adaptive adjustment and utilize three additional non-faulty motors to compensate for the thrust shortfall within a physical framework where the system will not directly overturn and collapse. The underlying control network is trained to implicitly predict and residual-tolerantly allocate asymmetric torque loss.

[0230] Phase 3: Single-machine extreme power failure reconstruction phase, when the training rounds reach... At this point, the system begins to introduce hard faults. The system then proceeds according to a fixed probability ( Extract faulty motors, including those with hard faults ( The probability is 0.7, soft fault ( The probability is 0.3.

[0231] This stage relates to the adaptive yaw weights in the loss function. The synergistic effect forces the network to evolve a fault-tolerant strategy of "sacrificing yaw spin in exchange for thrust to maintain altitude", effectively addressing the crash protection problem in the event of complete failure of a single motor.

[0232] Step 3.4: BPTT gradient analytical propagation and parameter update.

[0233] Unlike the black-box estimation in reinforcement learning, this embodiment uses analytical gradients to directly update the parameter θ, expressed as: The gradient calculation formula is:

[0234] ;

[0235] in, Represents the gradient. The calculation is performed by accumulating data backwards along time, specifically by the automatic differentiation mechanism of the deep learning framework. The Jacobian matrix of the system state transition is given by the differentiable dynamic operator. supply; and It is obtained by analytical differentiation of the loss function; By network The hidden state is obtained by automatic differentiation during forward propagation. Although not explicitly stated in the formula, its gradient is implicitly implied through the chain rule. middle.

[0236] Parameter updates use the AdamW optimizer with an initial learning rate of [missing information]. And a cosine annealing strategy is used to decay to To prevent gradient explosion, gradients are pruned before each update:

[0237] ;

[0238] Then perform the parameter update:

[0239] ;

[0240] in, This is a gradient clipping function used to restrict each element of the gradient vector to a range. Inside, to prevent gradient explosion, The preset pruning threshold is η, and the current learning rate is η. This is the AdamW optimizer. The above process ensures training stability, enabling the network to converge to an effective fault-tolerant control strategy.

[0241] In summary, this invention (1) implicitly senses the health status of the motor through a single neural network, completes fault response and strategy adjustment within a single control cycle, and does not require an independent fault diagnosis stage. This achieves fault-tolerant control without an independent fault detection and diagnosis module under single motor failure, solving the detection delay problem caused by the reliance on explicit fault detection and diagnosis (FDD) modules in traditional active fault-tolerant control. (2) After a single motor completely fails, the adaptive yaw weight mechanism in the loss function enables the network to automatically relax the yaw tracking accuracy when the altitude drops, prioritize maintaining altitude stability, and learn the extreme survival strategy of "maintaining altitude" in the spin state, solving the attitude loss problem under a single motor complete failure. (3) By utilizing a differentiable physical simulation environment and the backpropagation over time (BPTT) algorithm, the network can directly use physical gradients to optimize parameters, significantly improving the training convergence speed and ensuring the smoothness and physical realizability of control commands, solving the problems of low training efficiency and large control jitter in model-free reinforcement learning methods. (4) A progressive learning strategy was designed. By warming up with soft faults and progressively introducing hard faults, the network was guided to gradually learn the thrust redistribution and spin survival strategies in a smooth physical transition manifold. By establishing a smooth transition mechanism from normal to single-machine failure, the problem of gradient explosion and network non-convergence caused by directly injecting hard faults in the early stage of single-motor fault training was solved.

[0242] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A multi-copter unmanned aerial vehicle fault-tolerant flight control method based on end-to-end direct thrust control, characterized in that, Includes the following steps: Step 1: Using the tensor computation framework, we build a purely analytical differentiable simulation environment for quadrotor UAVs that is oriented towards implicit feature observation and gradient backpropagation from the bottom layer. It embeds a continuously differentiable fault injection and actuator mapping mechanism, and uses the fourth-order Runge-Kutta method to discretize and iterate the continuous operators, transforming them into differentiable discrete state transition operators under the deep learning framework. Step 2: Construct an end-to-end control network based on spatiotemporal feature structure decoupling and latent variable direct drive. With the observed state tensor as input, extract dual-stream features through kinematic intention perception branch and physical time series diagnosis perception branch. Generate latent states based on gated loop units and output four motor action commands. Step 3: Train the end-to-end control network constructed in Step 2 in the differentiable simulation environment constructed in Step 1. Based on the differentiable physics engine and backpropagation over time, optimize the control network parameters. Combine the adaptive composite loss function and multi-order course learning mechanism to perform direct gradient analytical transmission from physical constraints to network weights, so that the control network converges to an effective fault-tolerant control strategy and realizes fault-tolerant flight control of multi-rotor UAVs. The differentiable discrete state transition operator described in step 1 is constructed as follows: Establishing a continuous analytical dynamics operator , the operator sequentially performs de-normalization, fault injection, thrust calculation, force and torque distribution, and system state differential equations inside the operator, and outputs system state derivatives : ; in, This represents the current system state. Indicates the current actual location. Indicates the linear velocity of the machine body. This represents the orthogonal rotation matrix that indicates the attitude relationship between the current body coordinate system and the inertial coordinate system. Indicates the angular velocity of the aircraft; For a four-dimensional action tensor, This is the fault attenuation matrix; Using the fourth-order Runge-Kutta method Discretization into differentiable state transition operators : ; in, The system state at the next moment; the differentiable state transition operator The automatic differentiation computation graph mounted on the deep learning framework enables subsequent BPTT gradients to propagate backward along the time axis. In the end-to-end control network described in step 2, the two-stream feature extraction includes: Receive positional deviation by sensing branches through kinematic intent. linear velocity of the machine body Tiling column vectors of the attitude rotation matrix Motion features are extracted through a two-layer fully connected network. ; By using the physical timing diagnostic sensing branch, control commands from the previous moment are received. Current motor speed observation value Angular velocity of the body Calculate the rotational speed residual : ; in, The motor steady-state gain matrix is ​​pre-calibrated. Indicates the first Command-speed linear gain for each motor, i=1,2,3,4; This represents the operation of converting a vector into a diagonal matrix. The minimum steady-state speed vector; The calculated speed residual The data is concatenated with the original input and then processed through a single fully connected network to extract diagnostic features. ; Step 3 describes training the end-to-end control network constructed in step 2 within the differentiable simulation environment built in step 1. The method includes: Step 3.1 Differentiable closed-loop sampling: For each training round, sample the fault matrix according to the fault type and probability set in the current course learning stage. The corresponding fault modes are expanded forward along the time step, preserving the complete computational graph and forming a trajectory set. ;in, The total number of time steps. It is in a hidden state; Step 3.2: Construct the adaptive loss function, total loss Defined as: ; in, The time discount rate, These are the weighting coefficients for position, velocity, and control smoothing terms, respectively. As a penalty for location tracking, For speed tracking penalties, For the physical-induced attitude compromise loss, For instruction smoothing and action penalty; Step 3.3, Multi-stage course learning: A three-stage progressive training is adopted. The first stage is fault-free, the second stage introduces soft faults, and the third stage introduces hard faults. Through multi-stage course learning, the difficulty of faults is gradually increased, so that the control network can stably converge to an effective fault-tolerant strategy. Step 3.4, Gradient Calculation and Parameter Update: Calculate the gradient of the loss with respect to the network parameters using the automatic differentiation mechanism of the deep learning framework. Use the AdamW optimizer and gradient clipping to update the parameters and prevent gradient explosion.

2. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 1, characterized in that, The differentiable simulation environment described in step 1 includes: Define the network observation state tensor This includes: positional deviation in the body coordinate system. linear velocity of the machine body Angular velocity of the body Tiling column vectors of the attitude rotation matrix Previous control command Actual motor speed observation value ; Define the network output four-dimensional action tensor , representing the target PWM duty cycle of the four motors of the multi-rotor drone.

3. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 2, characterized in that, The continuously differentiable fault injection and actuator mapping mechanism described in step 1 includes: Establish a linear mapping relationship between the network output space and the actual physical rotation speed, and convert the network output motion tensor into a linear mapping relationship. Inverse normalization to the physical target rotational speed vector ; Define the fault attenuation matrix , For the first The health coefficients of each motor are i=1,2,3,4; the fault attenuation matrix is ​​only used within the physical engine and affects the next dynamic state. Fault injection uses a speed-level model, and the actual motor speed is... : ; Calculate the actual thrust : ; in, This is the thrust coefficient.

4. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 1, characterized in that, Step 2, which involves generating hidden states based on gated loop units, includes: Diagnostic features Input gated loop unit, through time steps Execute the state update equation to update the hidden state. : ; in, For time steps The hidden state, The function represents a gated loop unit used to internalize the influence of the fault decay matrix on UAV dynamics into a hidden state. Distribution evolution in higher-dimensional space.

5. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 4, characterized in that, Step 2, which outputs the four-motor action command, includes the following processing: Motion characteristics With hidden state Input to the fusion layer and concatenate them into joint features. It is sent to a three-layer fully connected network; A three-layer fully connected network is used to dynamically redistribute the input space to the control output space, outputting an action tensor. .

6. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 5, characterized in that, The three-layer fully connected network is as follows: The first layer is a fully connected layer, input... With a dimension of 128, calculate The output dimension is 128, and after passing through a non-linear activation layer, the following is obtained: ; The second layer is a fully connected layer, used for computation. The output dimension is 64, and after passing through a non-linear activation layer, the following is obtained: ; The third layer is the output layer, which performs calculations. The output dimension is 4, and the normalized action tensor is obtained after passing through a compressed activation layer. : ; in, , , These represent the weight matrices for the first, second, and third layers, respectively. , These represent the outputs of the first and second fully connected layers after the ReLU activation function, respectively. , , These represent the bias vectors for the first, second, and third layers, respectively. , , These represent the linear outputs of the first, second, and third layers, respectively. For activation function, This is the normalization function.

7. The fault-tolerant flight control method for multi-rotor unmanned aerial vehicles according to claim 6, characterized in that, Location tracking penalty and speed tracking penalty Used to force drones to execute three-dimensional spatial intentions: ; ; in, For the target location, The linear velocity of the target body; The instruction smoothing and action penalty Used to punish severe high-frequency oscillations of the motor's duty cycle and long-term residence at the saturation boundary: ; in, The default L2 regularization coefficient; The physical-induced attitude compromise loss It includes pitch / roll error terms and adaptive yaw weight terms: ; in, , These are the pitch / roll components and the yaw component, respectively. For fixed weights; As a yaw weight, it is dynamically adjusted according to the altitude error, so that the control network automatically reduces the yaw penalty when the altitude drops, and prioritizes maintaining altitude stability.