Heterogeneous aircraft fire rescue task allocation method based on deep reinforcement learning

By constructing a multi-constraint task allocation model using deep reinforcement learning, the problem of dynamic task allocation for heterogeneous aircraft in forest fires was solved, achieving efficient resource utilization and task optimization, and improving fire fighting efficiency.

CN122198211A · Pending · Publication Date: 2026-06-12 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date: 2026-02-06
Publication Date: 2026-06-12

Application Information

Patent Timeline: Application 06 Feb 2026; Publication 12 Jun 2026 (CN122198211A)

IPC: G06Q10/04; G06Q10/0631; G06Q50/26; G06F17/10; G06N3/092

AI Tagging

Application Domain

Forecasting; Biological models

AI Technical Summary

⚠Technical Problem

Existing technologies are insufficient to effectively address the task allocation of heterogeneous aircraft during forest fires, especially in dynamic fire environments, and cannot achieve efficient resource utilization and task optimization.

⚗Method used

A deep reinforcement learning-based approach is used to construct a multi-constraint task allocation optimization model. By combining an aircraft-task association model and a forest fire spread model, a state space, action space, and reward function are designed to achieve adaptive task scheduling under dynamic fire conditions. The optimal allocation scheme is obtained through the proximal policy optimization (PPO) algorithm.

🎯Benefits of technology

It has improved the efficiency of forest fire fighting, reduced the time required to complete tasks, ensured the efficient completion of rescue missions, and provided scientific decision-making support for emergency command departments.

✦ Generated by Eureka AI based on patent content.

Figure CN122198211A_ABST

Abstract

The application discloses a heterogeneous aircraft fire rescue task allocation method based on deep reinforcement learning. A multi-constraint task allocation optimization model is constructed that comprehensively considers aircraft performance, task demand, and fire dynamics, ensuring that the task allocation process satisfies the dynamic constraints of the actual rescue environment. The state space, action space, and reward function of the model are designed, and a task allocation policy model for the heterogeneous aircraft is constructed, realizing an adaptive expression of task scheduling behavior under dynamic fire conditions. The policy model is combined with proximal policy optimization (PPO) to achieve efficient solution and policy optimization over a high-dimensional state space and time-varying environment, so that a stable and generalizable optimal allocation scheme can be obtained under dynamic fire conditions. The application can effectively reduce task completion time, enable rapid and scientific deployment of aerial firefighting resources in complex and uncertain rescue scenes, and improve rescue efficiency and overall task completion benefit.

Description

Technical Field

[0001] This invention belongs to the field of aviation emergency rescue, and specifically relates to a method for allocating forest fire rescue tasks among heterogeneous aircraft based on deep reinforcement learning.

Background Technology

[0002] In recent years, forest fires have occurred frequently around the world, with an average of more than 10,000 forest fires occurring in China each year. These fires not only cause extensive damage to forest resources and severe ecological destruction, but also result in numerous casualties and economic losses. Given the suddenness and dynamic evolution of forest fires, ground rescue personnel find it difficult to accurately assess the spread of fires in real time. Therefore, aerial resources such as drones and helicopters are crucial for reconnaissance, monitoring, and coordinated firefighting efforts.

[0003] In forest fire rescue scenarios, the task allocation problem for aviation resources faces three core challenges: First, there are significant differences in performance among heterogeneous aircraft, such as payload capacity, flight speed, endurance, and communication capabilities, which necessitates that task allocation strategies fully consider the resource characteristics of different aircraft. Second, the fire environment is highly dynamic, with fire development influenced by a combination of factors such as wind speed, temperature, fuel type, and terrain conditions, leading to continuous changes in task priorities and spatial distribution, requiring task allocation methods to have strong adaptability. Finally, the forest fire rescue task allocation problem involves multiple objectives and constraints, requiring a balance between multiple optimization indicators such as improving task effectiveness, shortening completion time, and enhancing resource utilization efficiency.

[0004] While operations research and reinforcement learning methods have been successfully applied in areas such as drone collaboration, supply chain optimization, and emergency vehicle dispatching, their effective application in aerial rescue for forest fires remains relatively limited. In particular, forest fires are characterized by dynamic evolution and rapid spread, leading to complex spatiotemporal uncertainties, which existing models have not yet adequately addressed. Furthermore, current research typically focuses on typical regions and scenarios, and most solutions assume that problem elements are deterministic, so the impact of fire dynamics on the allocation of firefighting resources and tasks has received little study. At the same time, existing research ignores the realities of complex aircraft fleet composition, significant differences in operational efficiency, and weak coordination, and lacks efficient, flexible, and scalable task planning methods for heterogeneous aircraft. These research gaps highlight the importance of conducting specialized modeling and optimization research on aircraft task allocation for forest fire rescue.

Summary of the Invention

[0005] Purpose of the invention: To address the problem of heterogeneous aircraft task allocation in forest fires with complex fire environments and dynamically changing tasks, this invention provides a method for heterogeneous aircraft forest fire rescue task allocation based on deep reinforcement learning. This method can effectively reduce task completion time, achieve efficient integration of dynamic characteristics of fire spread and aircraft performance attributes, and improve the efficiency of forest fire fighting.

[0006] Technical solution: The forest fire rescue task allocation method based on deep reinforcement learning for heterogeneous aircraft described in this invention includes the following steps:

[0007] (1) Construct a multi-constraint task allocation optimization model that comprehensively considers aircraft performance, mission requirements and fire dynamics to characterize complex forest fire rescue scenarios and ensure that the task allocation process conforms to the dynamic constraints under the actual rescue environment; The modeling method of the multi-constraint task allocation optimization model is as follows: establish a task assignment relationship based on the aircraft-task association model so that different types of aircraft can achieve reasonable division of labor according to their own attributes; characterize the priority of target tasks based on the forest fire spread model; set the constraints and objective functions of the multi-constraint task allocation optimization model respectively to achieve the optimal overall response efficiency of the fire site;

[0008] (2) Design the state space, action space and reward function of the multi-constraint task allocation optimization model respectively, construct the task allocation strategy model of heterogeneous aircraft, and realize the adaptive expression of task scheduling behavior under dynamic fire situation;

[0009] (3) Combine the task allocation policy model of the heterogeneous aircraft with proximal policy optimization (PPO) to achieve efficient solution and policy optimization over a high-dimensional state space and time-varying environment, so as to obtain a stable and generalizable optimal allocation scheme under dynamic fire conditions.

[0010] Furthermore, the aircraft-task association model defines a heterogeneous aircraft set and a target task set, and establishes an attribute mapping relationship between them;

[0011] The heterogeneous aircraft set describes the heterogeneous resources involved in the rescue, including the performance parameters and availability constraints of the different aircraft types:

[0012]

[0013] where the set contains a given number of aircraft and each aircraft is described by four elements: its index, its location, its capability attributes, and its flight speed. These elements collectively determine the accessibility and operational efficiency of the aircraft in different fire scene environments, reflecting resource constraints in actual rescue scenarios;

[0014] The set of capability attributes of an aircraft is represented as:

[0015]

[0016] where each element is the value of one type of capability attribute of the aircraft, with values ranging from 1 to 2;

[0017] The task set describes the rescue tasks to be completed, including their spatial location, demand intensity, and time requirements:

[0018]

[0019] where the set contains a given number of tasks and each task is described by five elements: its index, its location, its resource demand attributes (one value per attribute type), its rescue priority coefficient, and its overall importance level;

[0020] The attribute mapping relationship is the correspondence established between aircraft capability attributes and mission requirement attributes.
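The aircraft and task tuples described above can be sketched as plain data records. The following is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`index`, `position`, `capabilities`, `speed`, `demands`, `priority`, `importance`) are hypothetical labels for the four aircraft elements and five task elements, locations are assumed to be planar coordinates, and the attribute mapping is assumed to mean that capability and demand vectors share the same attribute ordering.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Aircraft:
    """One heterogeneous aircraft: index, location, capabilities, speed."""
    index: int
    position: Tuple[float, float]    # planar coordinates (assumed 2-D)
    capabilities: Tuple[float, ...]  # one value per capability type
    speed: float                     # distance units per time unit


@dataclass(frozen=True)
class Task:
    """One rescue task: index, location, demands, priority, importance."""
    index: int
    position: Tuple[float, float]
    demands: Tuple[float, ...]       # same attribute order as capabilities
    priority: float                  # rescue priority coefficient
    importance: float                # overall importance level


# Hypothetical example values, for illustration only.
helicopter = Aircraft(index=1, position=(0.0, 0.0),
                      capabilities=(2.0, 1.0), speed=60.0)
fire_point = Task(index=1, position=(30.0, 40.0),
                  demands=(1.0, 1.0), priority=0.7, importance=1.0)
```

Frozen dataclasses are used here so that records cannot be mutated by accident while a scheduler passes them around; a mutable variant would be equally valid if positions need in-place updates.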

[0021] Furthermore, the forest fire spread model calculates the fire spread rate based on the classical fire spread model, expressed as:

[0022]

[0023]

[0024] where the quantities involved are: the initial spread rate; the wind speed; the correction factor for combustible (fuel) type; the wind correction factor; the terrain slope correction factor; the terrain slope; the highest temperature of the day; the average wind speed at noon that day; and the temperature, wind force, and humidity parameters;

[0025] The fire area covered by a task at a given time is then expressed as:

[0026]

[0027] The fire area is used to represent task priority, and the rescue priority coefficient is expressed as:

[0028] .
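Because the formulas are not reproduced in this text, the following is only a hedged sketch of how a spread rate, fire area, and priority coefficient of this kind could be computed. Every functional form is an assumption: the initial rate is taken to be linear in the daily maximum temperature and noon wind (`R0 = a*T + b*W + c`), the corrections are taken to be multiplicative, the fire is taken to be a circle whose radius grows at the spread rate, and the priority coefficient is taken to be each task's share of the total fire area. The coefficient values in the example are hypothetical.

```python
import math
from typing import Dict


def initial_spread_rate(max_temp: float, noon_wind: float,
                        a: float, b: float, c: float) -> float:
    """Assumed linear form: R0 = a*T + b*W + c."""
    return a * max_temp + b * noon_wind + c


def spread_rate(r0: float, k_fuel: float, k_wind: float, k_slope: float) -> float:
    """Assumed multiplicative correction: R = R0 * Ks * Kw * Kphi."""
    return r0 * k_fuel * k_wind * k_slope


def fire_area(rate: float, elapsed: float, initial_radius: float = 0.0) -> float:
    """Area of a circular fire whose radius grows at `rate` (assumption)."""
    radius = initial_radius + rate * elapsed
    return math.pi * radius ** 2


def priority_coefficients(areas: Dict[int, float]) -> Dict[int, float]:
    """Each task's share of the total fire area (assumed normalization)."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total fire area must be positive")
    return {task: area / total for task, area in areas.items()}


# Hypothetical inputs, for illustration only.
r0 = initial_spread_rate(max_temp=30.0, noon_wind=3.0, a=0.03, b=0.05, c=0.1)
rate = spread_rate(r0, k_fuel=1.0, k_wind=1.2, k_slope=1.1)
areas = {1: fire_area(1.0, elapsed=1.0), 2: fire_area(1.0, elapsed=2.0)}
priorities = priority_coefficients(areas)
```

With the two example fires, the second has twice the radius and therefore four times the area, so it receives four times the priority under the assumed normalization.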

[0029] Furthermore, the specific constraints of the multi-constraint task allocation optimization model are as follows:

[0030] Task allocation constraints:

[0031] For any aircraft and task between which an allocation relationship exists, there are two possible cases. In the first, the aircraft is assigned the task after completing a preceding task, in which case the corresponding transfer variable is set. In the second, the aircraft is assigned the task directly on its first use, in which case no transfer variable exists. A virtual task is therefore introduced as the starting point of all tasks: at the initial allocation, all aircraft are located at the virtual task. The aircraft-task assignment variable and the aircraft's inter-task transfer variables then satisfy the constraint:

[0032]

[0033] Furthermore, since an aircraft can only perform one mission at a time, the following applies:

[0034] ;

[0035] Task timing constraints:

[0036] When an aircraft is assigned a task after completing a preceding task, the time at which it starts processing the new task must be no earlier than the time at which it arrives in the task's region, that is:

[0037]

[0038] where the formula involves the processing time of the preceding task and the spatial distance between the two tasks, the latter expressed as:

[0039]

[0040] where the two arguments are the geographical locations of the two tasks;

[0041] Let an upper limit be set on the total task completion time. The allocation and sequencing constraints of aircraft among tasks are then expressed as:

[0042]

[0043] When an order relationship exists between two tasks, the above formula covers two cases: in one, it describes the waiting behavior of an aircraft during the allocation process; in the other, it can be further expressed as:

[0044] ;
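The timing constraint above can be illustrated with a short sketch. It assumes planar task locations, Euclidean distance, and constant flight speed, none of which is confirmed by the formulas (which are not reproduced in this text); the function and parameter names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two task locations (planar assumption)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def earliest_start(prev_start: float, prev_processing: float,
                   prev_pos: Point, next_pos: Point, speed: float) -> float:
    """Earliest time the aircraft can begin the next task.

    The aircraft must finish the preceding task and then fly to the next
    task's region at constant speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    travel_time = distance(prev_pos, next_pos) / speed
    return prev_start + prev_processing + travel_time


def start_is_feasible(start: float, prev_start: float, prev_processing: float,
                      prev_pos: Point, next_pos: Point, speed: float) -> bool:
    """Check that a proposed start time is not earlier than the arrival time."""
    return start >= earliest_start(prev_start, prev_processing,
                                   prev_pos, next_pos, speed)


# Hypothetical numbers: 10 time units of processing, then 50 distance
# units of flight at speed 50.
t_min = earliest_start(0.0, 10.0, (0.0, 0.0), (30.0, 40.0), 50.0)
```

A start time later than `t_min` is still feasible, which corresponds to the waiting behavior mentioned in the text.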

[0045] Aircraft capability constraints:

[0046] In order for an aircraft to perform a task, every capability attribute value it possesses must be greater than or equal to the required attribute value of the target task.

[0047] ;
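The capability constraint stated above is an element-wise comparison, which can be sketched as follows. The function name `can_perform` is hypothetical, and the sketch assumes that capability and demand vectors list attributes in the same order.

```python
from typing import Sequence


def can_perform(capabilities: Sequence[float], demands: Sequence[float]) -> bool:
    """Return True if every capability value meets or exceeds the matching demand."""
    if len(capabilities) != len(demands):
        raise ValueError("capability and demand vectors must have the same length")
    return all(c >= d for c, d in zip(capabilities, demands))


# Hypothetical attribute vectors, for illustration only.
feasible = can_perform((2.0, 1.0), (1.0, 1.0))
infeasible = can_perform((1.0, 1.0), (1.0, 2.0))
```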

[0048] Firefighting efficiency constraints:

[0049] To effectively extinguish the fire, the following constraints apply:

[0050]

[0051] where the formula involves the spatial distance from the aircraft to the target task and the initial time at which aircraft tasks are allocated;

[0052] Task completion time constraints:

[0053] Given the time required to complete all tasks, the processing time constraint for any task is expressed as:

[0054] .

[0055] Furthermore, the objective function of the multi-constraint task allocation optimization model is as follows:

[0056] The goal of heterogeneous aircraft mission allocation for forest fires is to shorten the time to complete the overall firefighting mission and improve the effectiveness of completing all missions. This effectiveness requires allocating appropriate aircraft or aircraft clusters to the correct areas to perform appropriate tasks, thereby maximizing the overall utility of all missions.

[0057]

[0058] Specifically, the aim is to improve the utilization rate of aviation firefighting resources, shorten task completion time, and increase overall task effectiveness while meeting task requirements and reducing resource consumption. A resource consumption matrix is defined for this purpose, whose elements are specified conditionally;

[0059] When an aircraft flies at a constant speed, the energy it consumes can be measured by its flight time. Therefore, the forest fire aerial rescue task allocation problem with multiple optimization objectives is modeled as a unified objective function, expressed as:

[0060]

[0061] The first term describes the utility gained from task completion, reflecting the overall effectiveness of the rescue operation; the second term reflects the level of resource consumption during task execution, reflecting the requirement to reduce costs and improve efficiency; and the third term indicates the degree of matching between resources and tasks, ensuring that heterogeneous resources are rationally and efficiently allocated to appropriate task points.

[0062] Furthermore, the state space of the multi-constraint task allocation optimization model described in step (2) is as follows:

[0063] The state space is used to characterize the system state in forest fire rescue missions, including three types of features: first, the state of the aircraft cluster, including location, remaining endurance and payload capacity; second, mission requirement features, including the spatial location of mission points, requirement intensity and priority coefficient; and third, environmental descriptors, including fire spread rate, wind speed and terrain. These features together constitute a high-dimensional state space, whose high dimensionality stems from the large number of heterogeneous aircraft, the complex distribution of mission points and the combined superposition of dynamic environmental factors.

[0064] The state space defines the states of dynamic task allocation, including the current task queue, resource status, task priority, and environmental changes. To enhance the flexibility and scalability of the state space, state information is normalized. Each component of the normalized state can be obtained through the Min-Max normalization shown below:

[0065] .
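Min-Max normalization rescales each feature to the interval [0, 1] by subtracting the minimum and dividing by the range. A minimal sketch follows; the function name is hypothetical, and mapping a constant feature to zeros is an assumed convention to avoid division by zero.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical remaining-endurance readings for three aircraft.
normalized = min_max_normalize([10.0, 20.0, 30.0])
```

In a streaming setting the minimum and maximum would normally be fixed from known physical bounds rather than recomputed per batch, so that the same raw value always maps to the same normalized value.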

[0066] Furthermore, the action space described in step (2) is as follows:

[0067] The action space is used to characterize the executable task allocation schemes in each decision step. Specifically, each action corresponds to an allocation decision, that is, assigning an aircraft to a specific task point or keeping it on standby when necessary. For heterogeneous aircraft clusters, actions not only reflect the mapping relationship between aircraft and tasks, but also implicitly contain the matching of resource attributes and task requirements. Thus, the action space essentially reflects the dynamic allocation decision process between "aircraft and task".

[0068] The action space consists of a vector of fixed length whose elements represent the allocation of aircraft to target task points. The set of aircraft assigned to each task is used to define the allocation policy.
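One way to read the action vector is as one entry per aircraft holding the index of the task it is assigned to. The sketch below groups aircraft by task under that reading. The encoding is an assumption: 1-based aircraft and task indices, with `0` standing for "keep on standby"; the names `STANDBY` and `aircraft_by_task` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

STANDBY = 0  # assumed encoding for "keep the aircraft on standby"


def aircraft_by_task(action: Sequence[int]) -> Dict[int, List[int]]:
    """Group 1-based aircraft indices by the task each is assigned to."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for aircraft_index, task_index in enumerate(action, start=1):
        if task_index != STANDBY:
            groups[task_index].append(aircraft_index)
    return dict(groups)


# Four aircraft: 1 and 4 go to task 2, aircraft 3 goes to task 1,
# aircraft 2 stays on standby.
assignment = aircraft_by_task([2, 0, 1, 2])
```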

[0069] Furthermore, the reward function described in step (2) is as follows:

[0070] The reward function design needs to focus on the indicators of forest fire rescue tasks in order to optimize the total utility of the task. The reward function is designed as follows:

[0071]

[0072] where the terms are: the reward for completing the task within the specified time; the current simulation time; the simulation start time; the simulation duration; and the reward weight for task completion time;

[0073] Positive rewards are given for the successful completion of the target task, and weighted by a task priority coefficient to reflect the importance and urgency of different tasks, as detailed below:

[0074]

[0075] where the terms are: the reward for completing the target task; the maximum number of target task points; the target task at a given moment; and the reward weight for completing target tasks;

[0076] Combining the two types of rewards above, the immediate reward is defined as the linear sum of the time penalty and the task completion reward, expressed as:

[0077] .
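The text describes the immediate reward as a linear sum of a time penalty and a priority-weighted completion reward, but the formulas themselves are not reproduced here. The sketch below is therefore an assumed instance of that structure: the time penalty is taken to be the negative elapsed fraction of the simulation duration, and the completion reward is taken to be the sum of the priority coefficients of the tasks completed in the current step. All names and weights are hypothetical.

```python
from typing import Iterable


def time_penalty(now: float, start: float, duration: float, weight: float) -> float:
    """Negative reward that grows with elapsed simulation time (assumed form)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return -weight * (now - start) / duration


def completion_reward(completed_priorities: Iterable[float], weight: float) -> float:
    """Positive reward weighted by the priority of each completed task."""
    return weight * sum(completed_priorities)


def immediate_reward(now: float, start: float, duration: float,
                     completed_priorities: Iterable[float],
                     w_time: float, w_task: float) -> float:
    """Linear sum of the time penalty and the task completion reward."""
    return (time_penalty(now, start, duration, w_time)
            + completion_reward(completed_priorities, w_task))


# Hypothetical step: halfway through the run, two tasks just completed.
reward = immediate_reward(now=50.0, start=0.0, duration=100.0,
                          completed_priorities=[0.5, 0.25],
                          w_time=1.0, w_task=2.0)
```

The two weights control the trade-off: a larger `w_time` pushes the policy toward finishing quickly, while a larger `w_task` pushes it toward high-priority tasks first.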

[0078] Furthermore, the allocation strategy is implemented as follows:

[0079] The policy network has an encoder-temporal aggregation-attention mechanism-decoder structure. The input features, namely aircraft, tasks, and environment, are encoded into a unified representation, processed by the temporal module GRU, and refined by the attention mechanism that emphasizes urgent tasks and critical aircraft, and decoded into probabilistic task allocation decisions. The encoder maps heterogeneous aircraft, task points, and environmental information into compact representation vectors, thereby achieving feature extraction and dimensionality reduction while retaining the most critical information for decision-making. The GRU is used to capture the temporal dependencies of fire evolution and task dynamics, thereby modeling the impact of historical information on current task allocation. The attention mechanism is used to highlight the most critical elements for decision-making, and in the context of multiple tasks and multiple aircraft, it can dynamically focus on more urgent or higher-value allocation objects. The decoder transfers the output of the attention mechanism and the features generated by temporal aggregation to the action space and outputs the task allocation scheme in probabilistic form.
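The final stage of this pipeline, turning attention scores into probabilistic allocation decisions, can be illustrated in isolation. The sketch below is not the patent's network: it omits the encoder and GRU stages entirely, treats the aircraft and task embeddings as given inputs, and assumes scaled dot-product attention followed by a softmax. The function names and example vectors are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_probabilities(aircraft_vec: Sequence[float],
                             task_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Probability of assigning one aircraft to each task.

    Scores are scaled dot products between the aircraft embedding (query)
    and each task embedding (key), an assumed attention form.
    """
    scale = math.sqrt(len(aircraft_vec))
    scores = [sum(a * t for a, t in zip(aircraft_vec, task)) / scale
              for task in task_vecs]
    return softmax(scores)


# Hypothetical 2-D embeddings: the aircraft aligns with the first task.
probs = assignment_probabilities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Sampling from `probs` during training and taking the most probable task at deployment is a common convention for stochastic policies, though the text does not say which is used here.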

[0080] Furthermore, the implementation process of step (3) is as follows:

[0081] In forest fire rescue scenarios, fires exhibit non-stationarity and dynamic evolution. The clipped update mechanism of the proximal policy optimization (PPO) algorithm can adapt to the changing characteristics of task priorities and environmental constraints over time, achieving efficient task allocation while ensuring learning stability. The adopted policy loss function is expressed as:

[0082]

[0083]

[0084] where the clipping operation truncates the probability ratio to a bounded interval, ensuring that the new and old policies remain similar;

[0085] The advantage function is represented as:

[0086]

[0087]

[0088] The value network loss function is specifically expressed as:

[0089]

[0090] where the terms are the state value estimated by the value function network and the target value used to update the value function during training;

[0091] Initialize the policy network and the value network, and set the clipping factor, the discount factor, and the number of participating agents. At the start of each training round, the environment state is initialized, including the distribution of fire sources, the locations of the task points, and the positions of the aircraft, establishing a task-resource correspondence state set. Within each time step, each agent selects an action based on the current policy network, that is, makes a task allocation decision, executes the corresponding task allocation action, receives the reward and the next state, and stores the resulting transition. After completing one round of environment interaction, the advantage function is calculated from the collected samples to evaluate the performance of the policy. The value network loss is then calculated to minimize the deviation between predicted values and actual returns. Based on the clipped policy update rule, backpropagation and parameter updates are performed on the policy network and value network respectively, ensuring that each policy improvement stays within the clipping range and thereby maintaining training stability. Training iterations are repeated to continuously optimize the task allocation scheme output by the policy network until convergence. The final optimal policy network is output, which yields the dynamic task allocation result for heterogeneous aircraft in the forest fire rescue scenario.
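The three quantities used in this training loop can be sketched without any deep learning library. The clipped surrogate loss and the squared-error value loss below follow the standard PPO formulation. The advantage estimator is an assumption: the text's advantage formula is not reproduced here, so generalized advantage estimation (GAE) is swapped in as a common choice, with `lam` as a hypothetical extra parameter.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float, lam: float) -> List[float]:
    """Generalized advantage estimation (assumed estimator).

    `values` must hold one more entry than `rewards`; the last entry is
    the bootstrap value of the state after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_policy_loss(ratios: Sequence[float], advantages: Sequence[float],
                        epsilon: float) -> float:
    """Negative PPO clipped surrogate, averaged over samples."""
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


def value_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted and target state values."""
    return sum((v - g) ** 2 for v, g in zip(predicted, targets)) / len(predicted)


# Hypothetical single-sample check: a ratio of 1.5 with positive
# advantage is clipped to 1.2 when epsilon is 0.2.
loss = clipped_policy_loss([1.5], [1.0], epsilon=0.2)
```

The clip only limits how much credit a sample can contribute once the new policy has already moved far from the old one, which is the stability property the text attributes to the clipped update.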

[0092] Beneficial effects: Compared with other technologies, the beneficial effects of this invention are as follows. This invention can solve the problem of task allocation for heterogeneous aircraft in forest fires with complex fire environments and dynamically changing tasks, ensuring the efficient completion of rescue tasks and effectively reducing task completion time, thereby improving the efficiency of forest fire fighting. This invention can also provide operable decision support for emergency command departments, enabling them to quickly and scientifically deploy aerial firefighting resources in complex and uncertain rescue scenarios.

Attached Figure Description

[0093] Figure 1 is a flowchart of the present invention;

[0094] Figure 2 is a schematic diagram of the policy network architecture for generating allocation decisions proposed in this invention;

[0095] Figure 3 is a schematic diagram of the distribution of aviation fire stations and task points in the verification scenario provided by the present invention;

[0096] Figure 4 is a schematic diagram of simulation results comparing the present invention with other methods on different indicators;

[0097] Figure 5 is a schematic diagram of simulation results comparing the present invention with other methods under different numbers of tasks.

Detailed Implementation

[0098] The following are specific embodiments of the present invention, which are described in conjunction with the accompanying drawings to further illustrate the technical solutions of the present invention. However, the present invention is not limited to these embodiments.

[0099] As shown in Figure 1, this invention proposes a method for allocating forest fire rescue tasks among heterogeneous aircraft based on deep reinforcement learning. The problem addressed is dynamic task allocation for heterogeneous aircraft in forest fire rescue: a cluster of heterogeneous aircraft, including drones, helicopters, and fixed-wing aircraft, departs from an aerial fire station to conduct fire detection and firefighting operations on a dynamic fire site, achieving collaborative task allocation and efficient rescue among multiple aircraft types.

[0100] A two-dimensional planar area is defined as the forest fire zone, and the environmental factors considered include wind speed, terrain slope, relative humidity, and vegetation type. It is assumed that the number of fire points requiring aircraft assistance and their specific locations are known, and the fire point outlines are simplified to circles, with the center of each circle representing the task location. A given number of aircraft are deployed, including drones, helicopters, and fixed-wing aircraft; the characteristics of each aircraft type are known and are described by a capability matrix.

[0101] A multi-constraint task allocation modeling method comprehensively considers aircraft performance, mission requirements, and fire dynamics. It establishes task assignment relationships based on an aircraft-task association model, enabling different types of aircraft to achieve reasonable division of labor according to their own attributes. The priority of target tasks is characterized based on a forest fire spread model. Constraints and objective functions are set for the task allocation model to optimize the overall response efficiency at the fire scene. By designing state spaces, action spaces, and reward functions separately and modeling the task allocation strategy for heterogeneous aircraft, an adaptive expression of task scheduling behavior under dynamic fire conditions is achieved. A dynamic task allocation algorithm for forest fire rescue using heterogeneous aircraft is executed to obtain a stable and generalizable optimal allocation scheme.

[0102] The heterogeneous aircraft set describes the heterogeneous resources involved in the rescue, including the performance parameters and availability constraints of the different aircraft types:

[0103]

[0104] where the set contains a given number of aircraft, each described by four elements that collectively determine the aircraft's accessibility and operational efficiency in different fire environments, reflecting resource constraints in actual rescue scenarios:

[0105]

[0106] where the four elements are the aircraft's index, location, capability attributes, and flight speed.

[0107] The set of capability attributes of an aircraft is represented as:

[0108]

[0109] where each element is the value of one type of capability attribute of the aircraft, with values ranging from 1 to 2.

[0110] The task set describes the rescue tasks to be completed, including their spatial location, demand intensity, and time requirements.

[0111]

[0112] where the set contains a given number of tasks, each described by five elements. These elements reflect the core decision-making factors in emergency rescue and are widely used in aviation fire dispatch practice and research, ensuring the practical applicability of the model:

[0113]

[0114] where the five elements are the task's index, location, resource demand attributes (one value per attribute type), rescue priority coefficient, and overall importance level.

[0115] To facilitate the study of the degree to which target mission requirements are met, the correspondence between aircraft capability attributes and mission requirement attributes is established, as shown in Table 1:

[0116] Table 1 Correspondence between Aircraft Capabilities and Mission Requirement Attributes

[0117]

[0118] The fire spread rate is calculated based on the classical fire spread model, expressed as:

[0119]

[0120]

[0121] where the quantities involved are: the initial spread rate; the wind speed; the correction factor for combustible (fuel) type; the wind correction factor; the terrain slope correction factor; the terrain slope; the highest temperature of the day; the average wind speed at noon that day; and the temperature, wind force, and humidity parameters.

[0122] The fire area covered by a task at a given time is then expressed as:

[0123]

[0124] Therefore, the fire area can be used to represent task priority, and the rescue priority coefficient is expressed as:

[0125] .

[0126] In multi-constraint task allocation modeling, for any aircraft and task between which an allocation relationship exists, there are two possible cases. In the first, the aircraft is assigned the task after completing a preceding task, in which case the corresponding transfer variable is set. In the second, the aircraft is assigned the task directly on its first use, in which case no transfer variable exists. A virtual task is therefore introduced as the starting point of all tasks: at the initial allocation, all aircraft are located at the virtual task. The aircraft-task assignment variable and the aircraft's inter-task transfer variables then satisfy the constraint:

[0127]

[0128] Furthermore, since an aircraft can only perform one mission at a time, the following applies:

[0129]

[0130] When an aircraft is assigned a task after completing a preceding task, the time at which it starts processing the new task must be no earlier than the time at which it arrives in the task's region, that is:

[0131]

[0132] where the formula involves the processing time of the preceding task and the spatial distance between the two tasks, the latter expressed as:

[0133]

[0134] where the two arguments are the geographical locations of the two tasks.

[0135] Let an upper limit be set on the total task completion time. The allocation and sequencing constraints of aircraft among tasks are then expressed as:

[0136]

[0137] When an order relationship exists between two tasks, the above formula covers two cases: in one, it describes the waiting behavior of an aircraft during the allocation process; in the other, it can be further expressed as:

[0138]

[0139] In order for an aircraft to perform a task, all of its capability attributes must be greater than or equal to the required attribute value of the target task.

[0140]

[0141] To effectively extinguish the fire, the following constraints apply:

[0142]

[0143] where the formula involves the spatial distance from the aircraft to the target task and the initial time at which aircraft tasks are allocated.

[0144] Given the time required to complete all tasks, the processing time constraint for any task is expressed as:

[0145] .

[0146] The objective function of the task allocation model is defined as follows:

[0147] The goal of heterogeneous aircraft mission allocation for forest fires is to shorten the time to complete the overall firefighting mission and improve the effectiveness of completing all missions. This effectiveness requires allocating appropriate aircraft or aircraft clusters to the correct areas to perform appropriate tasks, thereby maximizing the overall utility of all missions.

[0148] .

[0149] Specifically, this means improving the utilization rate of aviation firefighting resources, shortening the time to complete the mission, and increasing the overall effectiveness of the mission while meeting mission requirements, and simultaneously reducing resource consumption. The resource consumption matrix is defined as follows: , and when When, among which elements .

[0150] When an aircraft flies at a constant speed, the energy it consumes can be measured by its flight time. The multi-objective task allocation problem for aerial forest fire rescue is therefore modeled as a single unified objective function, expressed as:

[0151] .

[0152] The state space defines the states of dynamic task allocation, including the current task queue, resource status, task priority, and environmental changes (including their temporal and spatial distribution). To enhance the flexibility and scalability of the state space, state information is normalized. Each component of the state can be obtained through the Min-Max normalization method shown below:

[0153] .
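Min-Max normalization rescales each state component to the range [0, 1] by subtracting the minimum and dividing by the range. The sketch below applies it to one feature across its observed values; the handling of a constant feature (returning zeros) is an assumption added here to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers to [0, 1] using
    (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information,
        # so it is mapped to 0.0 rather than dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: hypothetical wind speeds (m/s) observed at three task points.
wind_speeds = [10.0, 20.0, 30.0]
normalized = min_max_normalize(wind_speeds)
```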

[0154] The action space consists of a vector of fixed length, whose elements represent the allocation of aircraft to target task points. The set of aircraft assigned to a given task is used to define the allocation strategy.

[0155] In this embodiment, the reward function is designed as follows:

[0156]

[0157] where the terms are, respectively, the reward for completing the task within the specified time, the current simulation time, the simulation start time, the simulation duration, and the reward weight for task completion time.

[0158] Positive rewards are given for the successful completion of the target task, and weighted by a task priority coefficient to reflect the importance and urgency of different tasks, as detailed below:

[0159]

[0160] where the terms are, respectively, the reward for completing the target task, the maximum number of target task points, the target task at a given moment, and the reward weight for completing the target task.

[0161] Combining the two types of rewards above, the immediate reward is defined as the linear sum of the time penalty and the task completion reward, expressed as:

[0162] .
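The structure of the immediate reward, a time penalty plus a priority-weighted completion reward, can be sketched as follows. The exact functional forms are not recoverable from the text, so the forms below are assumptions: the time penalty is taken to grow linearly with the elapsed fraction of the simulation duration, and the completion reward is taken to be the weighted sum of the priority coefficients of the tasks completed at the current step. All names are hypothetical.

```python
def time_penalty(t_now, t_start, duration, w_time):
    """Assumed form: a penalty proportional to the elapsed fraction of
    the simulation duration."""
    return -w_time * (t_now - t_start) / duration


def completion_reward(completed_priorities, w_task):
    """Assumed form: a positive reward for each task completed at this
    step, weighted by that task's priority coefficient."""
    return w_task * sum(completed_priorities)


def immediate_reward(t_now, t_start, duration, completed_priorities,
                     w_time=1.0, w_task=2.0):
    """Linear sum of the time penalty and the task completion reward."""
    return (time_penalty(t_now, t_start, duration, w_time)
            + completion_reward(completed_priorities, w_task))
```

With these assumed forms, finishing high-priority tasks early yields a larger reward than finishing the same tasks late.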

[0163] The policy network has an encoder-temporal aggregation-attention mechanism-decoder structure. Input features (aircraft, mission, environment) are encoded into a unified representation, processed by a temporal module (GRU), and refined through an attention mechanism that emphasizes urgent missions and critical aircraft. The final representation is decoded into probabilistic mission assignment decisions. As shown in Figure 2, the functions of each module and the overall workflow of the policy network are as follows:

[0164] Encoder: Maps heterogeneous aircraft, mission points and environmental information into compact representation vectors, thereby achieving feature extraction and dimensionality reduction, while retaining the information most critical to decision-making.

[0165] GRU: Used to capture the temporal dependencies of fire evolution and mission dynamics, thereby modeling the impact of historical information on current mission allocation.

[0166] Attention mechanism: Highlights the elements most critical to decision-making; with multiple tasks and multiple aircraft, it dynamically focuses on the more urgent or higher-value allocation objects.

[0167] Decoder: Maps the output of the attention mechanism and the features produced by temporal aggregation to the action space, and outputs the task allocation scheme in probabilistic form.
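The last two stages of the module list above can be illustrated in plain Python. This is a sketch only: it assumes scaled dot-product attention of one aircraft query over task embeddings and a softmax decoder, and the masking of infeasible tasks is an added assumption that the specification does not state. The encoder and GRU stages are omitted, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, keys, values):
    """Scaled dot-product attention: weight each task embedding by its
    similarity to the aircraft query, then average the values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values))
              for d in range(dim)]
    return pooled, weights


def decode_assignment(logits, feasible):
    """Turn per-task logits into assignment probabilities, giving zero
    probability to tasks marked infeasible (assumed masking step)."""
    if not any(feasible):
        raise ValueError("at least one task must be feasible")
    masked = [l if ok else float("-inf") for l, ok in zip(logits, feasible)]
    return softmax(masked)
```

Sampling a task index from the returned probabilities gives a stochastic allocation decision of the kind a policy-gradient method requires.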

[0168] In forest fire rescue scenarios, fires are markedly non-stationary and evolve dynamically. The clipped update mechanism of the proximal policy optimization (PPO) algorithm can adapt to task priorities and environmental constraints that change over time, achieving efficient task allocation while keeping learning stable. The adopted policy loss function is expressed as follows:

[0169]

[0170]

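[0171] where the clip operation truncates the probability ratio between the new and old policies to an interval around 1 set by the clipping factor, ensuring that the new policy remains similar to the old one.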
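As a minimal sketch, the standard PPO clipped surrogate loss can be written as below. It assumes the specification uses the usual form, in which the probability ratio is clipped to [1 - eps, 1 + eps] and the smaller of the clipped and unclipped terms is taken; `eps` stands for the clipping factor, and its default of 0.2 is a common choice, not a value from the specification.

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    ratios:     new-policy probability / old-policy probability per sample
    advantages: advantage estimate per sample
    eps:        clipping factor
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Taking the minimum removes any incentive to move the ratio
        # outside the clipping interval.
        terms.append(min(r * a, clipped * a))
    # Negate because the surrogate objective is maximized.
    return -sum(terms) / len(terms)
```

With a positive advantage, a ratio of 1.5 contributes as if it were 1.2, so a single update cannot push the policy arbitrarily far from the old one.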

[0172] The advantage function is expressed as:

[0173]

[0174]
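One common way to compute the advantage in PPO is generalized advantage estimation (GAE), built from temporal-difference residuals. Whether the specification uses exactly this estimator cannot be confirmed from the text, so the sketch below is an assumption; `lam` in particular is a hypothetical smoothing parameter that the specification does not mention.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), one more entry than rewards
    gamma:   discount factor
    lam:     GAE smoothing parameter (assumed, not from the specification)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals backwards through the episode.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam` to 0 reduces this to the one-step temporal-difference residual.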

[0175] The value network loss function is specifically expressed as:

[0176]

[0177] where the terms are the state value estimated by the value function network and the target value used to update the value function during training.
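A value loss of this kind is commonly a mean squared error between the predicted state values and the target values. The sketch below assumes that form and assumes the targets are discounted returns; both are assumptions, and the function names are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Discounted return from each time step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def value_loss(predicted, targets):
    """Mean squared error between predicted state values and targets."""
    if len(predicted) != len(targets) or not predicted:
        raise ValueError("inputs must be non-empty and equal length")
    return sum((p - g) ** 2 for p, g in zip(predicted, targets)) / len(predicted)
```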

[0178] Based on the above model, a dynamic task allocation algorithm for forest fire rescue using heterogeneous aircraft is constructed in combination with proximal policy optimization (PPO). The specific steps are as follows:

[0179] S1. Initialize the policy network and the value network, and set the clipping factor, the discount factor, and the number of participating agents;

[0180] S2. At the beginning of each training round, initialize the environmental state (including the fire source distribution, the location of each task point, and the aircraft positions) and establish a state set of task-resource correspondences;

[0181] S3. At each time step, each agent selects an action (i.e., a task allocation decision) based on the current policy network, executes the corresponding task allocation, receives a reward, and observes the next state, while storing the resulting sample;

[0182] S4. After completing one round of environmental interaction, compute the advantage function from the collected samples to evaluate the performance of the policy; then compute the value network loss function so as to minimize the discrepancy between the predicted values and the actual returns;

[0183] S5. Following the clipped policy update rule, perform backpropagation and parameter updates on the policy network and the value network respectively, ensuring that the policy improvement is bounded, which helps maintain training stability;

[0184] S6. Repeatedly perform multiple rounds of training iterations to continuously optimize the task allocation scheme output by the policy network until the convergence condition is met.

[0185] S7. Output the final optimal policy network to obtain the dynamic task allocation results of heterogeneous aircraft in the forest fire rescue scenario.
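The data flow of steps S2 and S3 can be sketched with a toy stand-in environment. Everything here is hypothetical: `ToyFireEnv` is not the fire model of the specification, the greedy policy stands in for the policy network, and the network updates of S4 and S5 are omitted.

```python
class ToyFireEnv:
    """Hypothetical stand-in: one aircraft serves one task per step and
    reduces that task's unmet demand by up to 0.5."""

    def __init__(self, n_tasks=2, horizon=10):
        self.n_tasks = n_tasks
        self.horizon = horizon

    def reset(self):
        # S2: initialize the environmental state at the start of a round.
        self.t = 0
        self.remaining = [1.0] * self.n_tasks
        return tuple(self.remaining)

    def step(self, action):
        served = min(0.5, self.remaining[action])
        self.remaining[action] -= served
        self.t += 1
        done = self.t >= self.horizon or sum(self.remaining) == 0
        return tuple(self.remaining), served, done


def greedy_policy(state):
    """Placeholder policy: serve the task with the most unmet demand."""
    return max(range(len(state)), key=lambda i: state[i])


def collect_rollout(env, policy):
    """S3: interact with the environment and store
    (state, action, reward, next_state) samples."""
    state = env.reset()
    buffer = []
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return buffer
```

In a full implementation the stored samples would then feed the advantage and loss computations of S4 and the clipped updates of S5, repeated until the convergence condition of S6 is met.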

[0186] Figure 3 shows the scenario setting of an embodiment of the present invention. The region is a representative forest fire-prone area with complex terrain and diverse vegetation types, and so reflects the uncertainties and challenges of real fire rescue. The aerial fire station is taken as point 0. The distribution of the points is assumed to be known, as are the distances between points and the environmental conditions within each region. The aircraft swarm consists of 2 drones, 6 helicopters, and 1 fixed-wing aircraft, with specific performance parameters shown in Table 2. All aircraft in the swarm are initially located at point 0.

[0187] Table 2 Specific Performance Parameters of Aircraft

[0188]

[0189] Figure 4 shows a simulation result of an embodiment of the present invention: a comparison of the average task utility (AVGR), average number of convergence cycles required (AVGI), and average task completion time (AVGT) obtained with the PSO algorithm, the GWO algorithm, the DQN algorithm, and the method proposed in this invention, when the number of tasks is 7 and the number of available aircraft is 8. The average task utility is 102 (PSO), 88 (GWO), 93 (DQN), and 115 (proposed method), and the average task completion time is 77934 s (PSO), 71947 s (GWO), 62998 s (DQN), and 62320 s (proposed method), so the proposed method achieves the highest task utility and the shortest task completion time of the four. This supports the adaptability and superiority of the present invention in complex dynamic environments, enabling heterogeneous aircraft clusters to complete forest fire rescue missions efficiently and collaboratively and to control the spread of fire in the shortest possible time.
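The relative improvements implied by these seven-task figures can be checked with simple arithmetic. The sketch below uses only the values quoted in [0189]; the helper names are hypothetical.

```python
def relative_gain(ours, baseline):
    """Percentage increase of `ours` over `baseline` (higher is better)."""
    return (ours - baseline) / baseline * 100.0


def relative_reduction(ours, baseline):
    """Percentage decrease of `ours` below `baseline` (lower is better)."""
    return (baseline - ours) / baseline * 100.0


# Values quoted in [0189] for 7 tasks.
utility = {"PSO": 102, "GWO": 88, "DQN": 93}
completion_time = {"PSO": 77934, "GWO": 71947, "DQN": 62998}
OURS_UTILITY, OURS_TIME = 115, 62320

utility_gain = {name: round(relative_gain(OURS_UTILITY, u), 2)
                for name, u in utility.items()}
time_cut = {name: round(relative_reduction(OURS_TIME, t), 2)
            for name, t in completion_time.items()}
```

The utility gains work out to 12.75% (PSO), 30.68% (GWO), and 23.66% (DQN), and the time reductions to 13.38% (GWO) and 1.08% (DQN), matching the seven-task percentages reported in [0190]. The PSO time reduction computes to about 20.03% from these rounded averages, slightly below the 20.07% reported there.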

[0190] Figure 5 shows a simulation result of an embodiment of the present invention: a comparison of the average task utility (AVGR), average number of convergence cycles required (AVGI), and average task completion time (AVGT) obtained with the PSO algorithm, the GWO algorithm, the DQN algorithm, and the method proposed in this invention when 1, 2, 3, 4, 5, 6, and 7 target tasks are set, respectively. As the number of target tasks increases from 1 to 7, the advantage of the proposed method over the other algorithms in AVGT, AVGR, and AVGI shows an increasing trend. The average task utility is improved by 2.91%, 4.95%, 5.40%, 6.61%, 6.80%, 11.17%, and 12.75% compared with the PSO algorithm; by 10.59%, 13.11%, 15.91%, 19.58%, 27.75%, 29.59%, and 30.68% compared with the GWO algorithm; and by 4.94%, 6.04%, 6.32%, 14.81%, 17.62%, 17.74%, and 23.66% compared with the DQN algorithm. The average task completion time is reduced by 1.90%, 2.65%, 3.33%, 7.21%, 9.07%, 17.96%, and 20.07% compared with the PSO algorithm; by 4.40%, 5.76%, 7.25%, 8.08%, 10.97%, 11.53%, and 13.38% compared with the GWO algorithm; and by 0.46%, 0.78%, 0.90%, 0.90%, 0.99%, 1.05%, and 1.08% compared with the DQN algorithm. In other words, the advantages of the proposed method become increasingly apparent as the task size increases.

[0191] The preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to the specific details in the above embodiments. Within the scope of the technical concept of the present invention, various equivalent transformations can be made to the technical solutions of the present invention, and these equivalent transformations all fall within the protection scope of the present invention.

Claims

1. A method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning, characterized in that it includes the following steps: (1) constructing a multi-constraint task allocation optimization model that comprehensively considers aircraft performance, mission requirements, and fire dynamics, so as to characterize complex forest fire rescue scenarios and ensure that the task allocation process conforms to the dynamic constraints of the actual rescue environment; the multi-constraint task allocation optimization model is built as follows: task assignment relationships are established based on an aircraft-task association model, so that different types of aircraft achieve a reasonable division of labor according to their own attributes; the priority of target tasks is characterized based on a forest fire spread model; and the constraints and the objective function of the multi-constraint task allocation optimization model are set separately, so as to optimize the overall response efficiency at the fire scene; (2) designing the state space, action space, and reward function of the multi-constraint task allocation optimization model respectively, and constructing a task allocation strategy model for heterogeneous aircraft, so as to realize an adaptive expression of task scheduling behavior under dynamic fire conditions; (3) solving the task allocation strategy model for heterogeneous aircraft in combination with proximal policy optimization, so as to achieve efficient solution and policy optimization for high-dimensional state spaces and time-varying environments, and thereby obtain a stable and generalizable optimal allocation scheme under dynamic fire conditions.

2. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the aircraft-task association model comprises the definition of a heterogeneous aircraft set and a target task set, and the establishment of an attribute mapping relationship between them; the heterogeneous aircraft set describes the heterogeneous resources involved in the rescue, including the performance parameters and availability constraints of different types of aircraft; each aircraft in the set is described by 4 elements, namely its number, its location information, the capability attributes it possesses, and its flight speed, which together determine the accessibility and operational efficiency of the aircraft in different fire scene environments and reflect the resource constraints of actual rescue scenarios; the capability attributes of an aircraft are represented as a set of capability attribute types, with values ranging from 1 to 2; the task set describes the rescue tasks to be completed, including their spatial location, the intensity of their requirements, and their time requirements; each task in the set is described by 5 elements, namely its number, its location information, its demand attributes for resources together with the value of each attribute, its rescue priority coefficient, and its overall level of importance; the attribute mapping relationship is the correspondence established between aircraft capability attributes and task requirement attributes.

3. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the forest fire spread model calculates the fire spread rate based on a classic fire spread model, expressed in terms of the initial spread rate, the wind speed, the correction factor for the type of combustible material, the wind correction factor, the terrain slope correction factor, the terrain slope, the highest temperature of the day, the average wind speed at noon that day, the temperature parameter, the wind parameter, and the humidity parameter; the fire area contained in a task at a given time is then determined, and the fire area is used to represent the priority of the task through the rescue priority coefficient.

4. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the specific constraints of the multi-constraint task allocation optimization model are as follows: task allocation constraints: for any aircraft and task between which an allocation relationship exists, there are two possible scenarios: in the first, the aircraft is assigned the task after completing another task, in which case a transfer variable exists; in the second, the aircraft is being used for the first time and is assigned the task directly, in which case no transfer variable exists; a virtual task is therefore assumed to exist as the starting point of all tasks, and at the initial allocation all aircraft are placed at the virtual task; with this convention, the aircraft-task assignment variable and the transfer variable of aircraft between tasks are subject to a constraint; furthermore, since an aircraft can only perform one task at a time, a corresponding constraint applies; task timing constraints: when an aircraft is assigned a new task after completing a preceding task, the time at which it starts processing the new task must be no earlier than the time at which it arrives in the new task's region, where the formula involves the processing time of the preceding task and the spatial distance between the geographical locations of the two tasks; given an upper limit on the total mission completion time, the allocation and sequencing constraints of aircraft among tasks are expressed accordingly; when the two tasks have an order relationship, in one case the formula describes the waiting behavior of an aircraft during the allocation process, and in the other case the formula can be further expressed in a second form; aircraft capability constraints: for an aircraft to perform a task, every capability attribute value it possesses must be greater than or equal to the required attribute value of the target task; firefighting efficiency constraints: to effectively extinguish the fire, a constraint applies that involves the spatial distance from the aircraft to the target task and the initial time at which aircraft tasks are allocated; task completion time constraints: given the time required to complete all tasks, a processing time constraint is imposed on every task.

5. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the objective function of the multi-constraint task allocation optimization model is as follows: the goal of heterogeneous aircraft task allocation for forest fires is to shorten the time to complete the overall firefighting mission and to improve the effectiveness of completing all tasks; this requires allocating appropriate aircraft or aircraft clusters to the correct areas to perform appropriate tasks, thereby maximizing the overall utility of all tasks; specifically, this means improving the utilization rate of aerial firefighting resources, shortening mission completion time, and increasing overall mission effectiveness while meeting mission requirements and reducing resource consumption; a resource consumption matrix is defined for this purpose, whose elements are specified conditionally; when an aircraft flies at a constant speed, the energy it consumes can be measured by its flight time; the multi-objective task allocation problem for aerial forest fire rescue is therefore modeled as a single unified objective function with three terms: the first term describes the utility gained from completing the tasks, reflecting the overall effectiveness of the rescue operation; the second term reflects the level of resource consumption during task execution, reflecting the requirement to reduce cost and improve efficiency; and the third term indicates the degree of matching between resources and tasks, ensuring that heterogeneous resources are allocated rationally and efficiently to appropriate task points.

6. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the state space of the multi-constraint task allocation optimization model described in step (2) is as follows: the state space characterizes the system state in forest fire rescue missions and includes three types of features: first, the status of the aircraft cluster, including location, remaining flight time, and payload capacity; second, the task requirement features, including the spatial location of each task point, the intensity of its requirements, and its priority coefficient; third, the environmental descriptors, including the fire spread rate, wind speed, and terrain; together these features constitute a high-dimensional state space, whose high dimensionality stems from the large number of heterogeneous aircraft, the complex distribution of task points, and the combined effect of dynamic environmental factors; the state space defines the states of dynamic task allocation, including the current task queue, resource status, task priority, and environmental changes; to enhance the flexibility and scalability of the state space, state information is normalized, each component of the state being obtained through Min-Max normalization.

7. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the action space described in step (2) is as follows: the action space characterizes the executable task allocation schemes at each decision step; specifically, each action corresponds to an allocation decision, that is, assigning an aircraft to a specific task point or keeping it on standby when necessary; for heterogeneous aircraft clusters, an action not only reflects the mapping between aircraft and tasks but also implicitly contains the matching of resource attributes to task requirements; the action space thus essentially reflects the dynamic allocation decision process between aircraft and tasks; the action space consists of a vector of fixed length, whose elements represent the allocation of aircraft to target task points, and the set of aircraft assigned to a given task is used to define the allocation strategy.

8. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that the reward function described in step (2) is as follows: the reward function design focuses on the indicators of forest fire rescue tasks in order to optimize the total utility of the tasks; a time-related reward is defined in terms of the reward for completing the task within the specified time, the current simulation time, the simulation start time, the simulation duration, and the reward weight for task completion time; a positive reward is given for the successful completion of a target task and is weighted by a task priority coefficient to reflect the importance and urgency of different tasks, being defined in terms of the reward for completing the target task, the maximum number of target task points, the target task at a given moment, and the reward weight for completing the target task; combining the two types of rewards above, the immediate reward is defined as the linear sum of the time penalty and the task completion reward.

9. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 8, characterized in that the allocation strategy is implemented as follows: the policy network has an encoder-temporal aggregation-attention mechanism-decoder structure; the input features, namely aircraft, tasks, and environment, are encoded into a unified representation, processed by the temporal module GRU, refined by an attention mechanism that emphasizes urgent tasks and critical aircraft, and decoded into probabilistic task allocation decisions; the encoder maps heterogeneous aircraft, task points, and environmental information into compact representation vectors, thereby achieving feature extraction and dimensionality reduction while retaining the information most critical to decision-making; the GRU captures the temporal dependencies of fire evolution and task dynamics, thereby modeling the impact of historical information on the current task allocation; the attention mechanism highlights the elements most critical to decision-making and, with multiple tasks and multiple aircraft, dynamically focuses on the more urgent or higher-value allocation objects; the decoder maps the output of the attention mechanism and the features produced by temporal aggregation to the action space and outputs the task allocation scheme in probabilistic form.

10. The method for allocating forest fire relief tasks for heterogeneous aircraft based on deep reinforcement learning according to claim 1, characterized in that step (3) is implemented as follows: in forest fire rescue scenarios, fires are non-stationary and evolve dynamically; the clipped update mechanism of the proximal policy optimization algorithm can adapt to task priorities and environmental constraints that change over time, achieving efficient task allocation while keeping learning stable; the adopted policy loss function uses a clip operation that truncates the probability ratio between the new and old policies to an interval set by the clipping factor, ensuring that the new policy remains similar to the old one; an advantage function is used in the policy loss; the value network loss function is defined in terms of the state value estimated by the value function network and the target value used to update the value function during training; the policy network and the value network are initialized, and the clipping factor, the discount factor, and the number of participating agents are set; at the start of each training round, the environmental state is initialized, including the distribution of fire sources, the location of each task point, and the positions of the aircraft, and a state set of task-resource correspondences is established; at each time step, each agent selects an action, that is, a task allocation decision, based on the current policy network, executes the corresponding task allocation, receives a reward, and observes the next state, while storing the resulting sample; after completing one round of environmental interaction, the advantage function is computed from the collected samples to evaluate the performance of the policy; the value network loss function is then computed so as to minimize the deviation between predicted values and actual returns; following the clipped policy update rule, backpropagation and parameter updates are performed on the policy network and the value network respectively, ensuring that the policy improvement is bounded, which maintains training stability; multiple training iterations are repeated to continuously optimize the task allocation scheme output by the policy network until convergence is achieved; the final optimal policy network is output, yielding the dynamic task allocation result of heterogeneous aircraft in the forest fire rescue scenario.