A cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation
By constructing a unified utility function and using Q-learning-guided iterative local search, the adaptive decision-making problem of manned-unmanned swarm cooperative systems in dynamic environments is solved, achieving efficient and low-risk joint scheduling and path planning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTHWESTERN POLYTECHNICAL UNIV
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies struggle to achieve efficient, low-risk, and high-return joint scheduling and path planning for manned-unmanned swarm collaborative systems in dynamic and uncertain environments, especially lacking adaptive capabilities and cost-sensitive decision-making under strong constraints.
By constructing a unified utility function and learning the value function of intervention actions through iterative local search guided by Q-learning, a cost-sensitive collaborative planning method is formed. This method unifies decision-making intervention strategies and combinatorial planning search, adaptively adjusts intervention granularity and roles, and responds to changes in the situation.
Under dynamic conditions, the collaborative system achieves cost sensitivity and adaptive decision-making, generates high-quality feasible solutions, maintains task quality and risk control capabilities, and adapts to different collaborative situation changes.
Figure CN122288231A_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of task planning technology, and specifically relates to a cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation.
Background Technology
[0002] Manned-unmanned swarm collaborative systems are gradually becoming an important technological form for executing missions in complex airspace. Compared with a single platform, manned aircraft have stronger range, payload, anti-jamming capabilities, and maneuverability in complex situations, while unmanned swarms have advantages in high coverage, high parallelism, and distributed execution. Their collaboration can form complementary capabilities in a wide range of scenarios, including disaster relief, inspection and monitoring, target search, and emergency delivery. However, collaborative operation in real-world environments is not simply a matter of platform aggregation, but depends on the ability to continuously generate executable, low-risk, and high-return joint scheduling and path solutions under dynamic, uncertain, and multi-constrained conditions. Especially in strongly constrained collaborative tasks, task quality is often determined by information confidence, time limit fulfillment, and risk boundaries, which makes the planning problem inherently characterized by the coupling of information, risk, and resources.
[0003] Currently, manned-unmanned swarm collaborative systems face three core challenges in decision-making and planning:
[0004] First, mission environments are typically dynamic and uncertain. Risk fields, communication conditions, and mission information phases evolve over time, making it difficult to maintain feasibility and high quality under changing conditions by relying solely on fixed rules or static planning. There are clear information phases evolving during mission execution; some critical tasks require UAV swarms to conduct reconnaissance and confirm the situation with sufficient confidence before proceeding to the next execution phase. Under high-threat or strong interference conditions, UAV swarms often cannot independently obtain stable, high-confidence observations and feedback, necessitating supplementation from manned aircraft's onboard sensing, mobile reconnaissance, or communication relay capabilities. The collaborative system must not only plan once but also possess the ability to update decision boundaries as the situation changes, ensuring that the information and execution chains remain consistent under changing risks and communication conditions, and restoring feasibility and quality limits through manned aircraft intervention when necessary.
[0005] Second, collaborative resources are not free. Manned aircraft intervention consumes fuel and flight time, incurs exposure risk, occupies the mission window (an opportunity cost), and places additional demands on communication bandwidth and collaborative organization. The intensity and granularity of intervention must therefore be commensurate with mission benefits and safety objectives. Different granularities of intervention alter the structural boundaries of the collaborative system: task-point-level intervention can directly undertake critical execution or high-confidence confirmation; cluster-level intervention can provide continuous support and stabilize communication links for a group of tasks; and segment-level intervention can alter the reachability and safe corridors of a UAV swarm in risky environments. Excessive intervention easily raises overall cost and introduces new exposure risks, while insufficient intervention may degrade mission quality or even cause infeasibility because of insufficient information or uncontrollable risk. "Whether to intervene, how much to intervene, and at what granularity to intervene" should not be predetermined rules. Instead, they should become a learnable and weighable decision object for the collaborative system as the situation changes, and should be explicitly incorporated into the unified objective of mission benefits, costs, and risks.
[0006] Third, joint scheduling and path planning are typical combinatorial optimization problems, with the solution space growing exponentially with the scale of the task and the platform. The collaborative system needs to simultaneously determine task allocation, execution order, track connections, time coordination, and how to satisfy confirmation-execution dependencies. It must also maintain overall feasibility under constraints such as risk budget, resource budget, and communication maintainability. End-to-end learning methods struggle to consistently generate executable solutions under limited samples and strong constraints. While relying entirely on traditional heuristic search can provide some feasibility control, it often lacks adaptability in dynamic collaborative situations. Especially when the risk field, communication conditions, and task information stages change, the search process's decision of "whether to prioritize cross-platform allocation, repair the confirmation-execution chain, or reconstruct the risk exposure structure" often relies on human experience, limiting efficiency and generalization ability. Pure local search, while offering advantages in feasibility control, lacks the ability to adaptively select the direction of structured transformation under different collaborative situations.
[0007] In summary, existing technologies have not yet produced a planning method that unifies collaborative intervention strategies and combinatorial planning search under the same utility objective while simultaneously providing cost sensitivity, adaptability to dynamic situations, and feasibility control under strong constraints.
Summary of the Invention
[0008] To address the aforementioned problems in the existing technology, this application provides a cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation. The technical problem to be solved by this application is addressed through the following technical solution:
[0009] A cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation includes:
[0010] S100, construct a unified utility function. The unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes. The shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
[0011] S200, Construct a joint state space, which is formed by merging the structural feature mappings of the cooperative situation state and the current cooperative solution;
[0012] S300, using the joint state space as input, learn the intervention action value function, and use the difference of the unified utility function as feedback to update the intervention strategy. The intervention strategy determines the granularity, role, and budget boundary of manned aircraft intervention, and induces a feasible solution set that satisfies the joint constraint set.
[0013] S400, within the feasible solution set induced by the intervention strategy, the structured operator family is value-learned through iterative local search guided by Q-learning, with the joint state space as input, and the difference of the unified utility function is used as operator feedback to generate a cooperative scheduling solution under the current situation. The cooperative scheduling solution includes at least a cross-platform allocation sequence, confirmation-execution dependency relationship and risk exposure path structure.
[0014] S500, in response to changes in the cooperative situation state, take a decrease in the unified utility function or a loss of feasibility as the trigger condition, sequentially execute the intervention strategy update and the reconstruction of the cooperative scheduling solution, and output a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
[0015] A cost-sensitive planning and learning system for manned aircraft-UAV swarm collaborative task allocation, used to carry out the aforementioned cost-sensitive planning and learning method, wherein the system includes:
[0016] The utility function construction module is used to construct a unified utility function. The unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes. The shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
[0017] The state representation module is used to construct a joint state space, which is formed by merging the structural feature mappings of the cooperative situation state and the current cooperative solution.
[0018] The intervention strategy learning module is used to learn the intervention action value function with the joint state space as input, and to update the intervention strategy with the difference of the unified utility function as feedback. The intervention strategy determines the granularity, role, and budget boundary of manned aircraft intervention, and induces a set of feasible solutions that satisfy the joint constraint set.
[0019] The structured search module is used to perform value learning on the family of structured operators through iterative local search guided by Q-learning within the set of feasible solutions induced by the intervention strategy, with the joint state space as input, and to generate a cooperative scheduling solution under the current situation with the difference of the unified utility function as operator feedback. The cooperative scheduling solution includes at least a cross-platform allocation sequence, confirmation-execution dependency relationship and risk exposure path structure.
[0020] The rolling update and output module is used to respond to changes in the cooperative situation state: triggered by a decrease in the unified utility function or a loss of feasibility, it executes the intervention strategy update and the reconstruction of the cooperative scheduling solution in sequence, and outputs a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
[0021] Beneficial effects:
[0022] 1. This application addresses the core challenges arising from the overlapping of "dynamic situation, cost constraints, and combinatorial complexity" in manned-unmanned swarm collaborative missions. It proposes a cost-sensitive collaborative planning learning method, organically unifying collaborative intervention strategy learning and combinatorial planning search into a single decision-making process under unified mathematical modeling and a unified utility scale. Unlike traditional approaches that treat manned intervention as external rules or ex-post fixes, this application formalizes intervention strategies as key variables of the induced feasible domain and the achievable boundary of mission quality. This makes "whether to intervene, at what granularity of intervention, what role to assume, and how to allocate the budget" learnable and comparable collaborative decision-making objects. Through this modeling approach, the compensatory role of manned capabilities is naturally incorporated into information sequence constraints, risk achievable boundaries, and communication maintainability, enabling collaborative planning to be balanced and updated with a unified objective under dynamic situations.
[0023] 2. At the methodological level, this application constructs a unified utility function that incorporates task completion quality, collaboration costs, and risk penalties into a single optimization objective, and uses the utility difference as the learning signal so that intervention strategy learning and solution structure rewriting share a consistent evaluation metric. In the outer layer, the intervention action value function takes the cooperative situation state as input and learns the marginal value of intervention granularity and role under different situations, thereby achieving cost-sensitive adjustment of manned aircraft intervention intensity and adaptive response to risk and communication conditions. In the inner layer, the operator action value function takes the joint state as input and guides the iterative local search within the feasible region induced by the intervention strategy. Within that region, the cross-platform allocation structure, the confirmation-execution dependency chain, and the risk exposure structure are iteratively rewritten, so that combinatorial search keeps feasibility control under strong constraints while gaining the ability to adjust the search direction as the collaborative situation changes. By internalizing feasibility maintenance as "feasible-region induction and operator closure," the method maintains feasibility and recovers quickly at run time when the situation changes suddenly, so that online rolling updates can stably output executable solutions even when collaborative constraints tighten.
[0024] 3. At the application process level, this application further specifies the runtime objects and information flow of the method in online collaborative tasks. It identifies the core operating state of the system and unifies rolling update triggering, intervention strategy update, solution structure update, constraint maintenance, and output interpretation into a coherent collaborative planning chain. The output of the method includes not only the joint planning scheme but also decision criteria based on the value functions, so that "why manned aircraft intervention is needed, why a given intervention granularity and role are adopted, and why a given type of structural rewriting is performed" can be interpreted as the result of maximizing marginal value under a unified utility scale. This interpretability is not an add-on; it follows directly from the modeling structure of value learning and utility differences, which makes the method more usable and controllable in collaborative tasks that involve manned aircraft participation, risk sensitivity, and limited resources.
[0025] The present application is described in further detail below with reference to the accompanying drawings and embodiments.
Attached Figure Description
[0026] Figure 1 is a flowchart of the cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation provided in this application.
[0027] Figure 2 is a schematic diagram of the overall process of the cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation provided in this application.
Detailed Implementation
[0028] The present application will be described in further detail below with reference to specific embodiments, but the implementation of the present application is not limited thereto.
[0029] Manned-unmanned swarm collaborative systems are gradually becoming an important technological form for executing missions in complex airspace. Compared with a single platform, manned aircraft have stronger range, payload, anti-jamming capabilities, and maneuverability in complex situations, while unmanned swarms have advantages in high coverage, high parallelism, and distributed execution. Their collaboration can form complementary capabilities in a wide range of scenarios, including disaster relief, inspection and monitoring, target search, and emergency delivery. However, collaborative operation in real-world environments is not simply a matter of "platform stacking," but depends on the ability to continuously generate executable, low-risk, and high-return joint scheduling and path solutions under dynamic, uncertain, and multi-constrained conditions. Especially in highly constrained collaborative tasks, task quality is often determined by information confidence, time limit fulfillment, and risk boundaries, which makes the planning problem inherently characterized by the coupling of "information-risk-resources."
[0030] First, collaborative mission environments are typically characterized by significant uncertainty and dynamism. On the one hand, risk areas, threat intensity, weather, and communication link quality can evolve over time, causing plans based on static assumptions to quickly become invalid. On the other hand, there are distinct information phases during mission execution; some critical tasks require reconnaissance and confirmation by the UAV swarm to generate sufficiently confident situational information before proceeding to the next execution phase. Under high-threat or high-interference conditions, UAV swarms often cannot independently obtain stable, high-confidence observations and feedback, necessitating supplementation from manned aircraft's onboard sensing, mobile reconnaissance, or communication relay capabilities. Therefore, collaborative systems must not only "plan once" but also possess the ability to update decision boundaries as the situation changes, ensuring that the information and execution chains remain consistent under varying risks and communication conditions, and restoring feasibility and quality limits through manned aircraft intervention when necessary.
[0031] Secondly, the use of collaborative resources involves clear costs and risks, and collaborative decisions must be cost-sensitive. The intervention of manned aircraft is not without its costs; it brings fuel and flight time consumption, exposure risks, mission window occupancy, and additional demands on communication bandwidth and collaborative organization. More importantly, intervention at different granularities alters the structural boundaries of the collaborative system: task-point level intervention can directly undertake critical execution or high-confidence confirmation; cluster-level intervention can provide continuous support and stabilize communication links for a group of tasks; and segment-level intervention can alter the reachability and safe passage of UAV swarms in risky environments. Excessive intervention can easily lead to increased overall costs and introduce new exposure risks, while insufficient intervention may result in decreased mission quality or even infeasibility due to insufficient information or uncontrollable risks. Therefore, "whether to intervene, how much to intervene, and at what granularity to intervene" should not be predetermined rules, but rather should become a learnable and weighable decision-making object for the collaborative system as the situation changes, and should be explicitly incorporated into the unified goal of mission benefits, costs, and risks.
[0032] Furthermore, joint scheduling and path planning are essentially combinatorial optimization problems. Cooperative systems need to simultaneously determine task allocation, execution order, track connections, time coordination, and the way confirmation-execution dependencies are satisfied. They must also maintain overall feasibility under constraints such as risk budget, resource budget, and communication maintainability. The solution space grows exponentially with the number of tasks and platform scale, making it difficult for end-to-end learning methods to stably generate executable solutions under limited samples and strong constraints. While relying entirely on traditional heuristic search can provide some feasibility control, it often lacks adaptability in dynamic collaborative situations. Especially when the risk field, communication conditions, and task information stages change, the search process's decision of "whether to prioritize cross-platform allocation, repair the confirmation-execution chain, or reconstruct the risk exposure structure" often relies on human experience, resulting in limited efficiency and generalization ability. Therefore, a more feasible direction is to construct a planning system that embeds a learning mechanism into combinatorial search. This allows the search to retain the advantages of feasibility control while adaptively selecting the direction of structured transformation based on the collaborative situation and making learnable judgments about the marginal value of human-machine intervention.
[0033] Based on the above understanding, this application proposes a cost-sensitive collaborative planning and learning framework for manned-unmanned swarm collaboration, unifying collaborative intervention strategies and combinatorial planning search under the same utility objective. The framework uses a unified utility function to characterize the trade-off between task benefits, collaborative costs, and risk penalties, and formalizes collaborative intervention as the induction and shaping of the feasible solution set. Intervention therefore affects not only the cost terms but also the structure of the feasible region and the upper bound of task quality, by changing information availability, risk reachability boundaries, and communication stability. Given an intervention strategy, this application employs Q-learning-guided Iterated Local Search (ILS) to solve the combinatorial planning problem. A family of structured operators oriented towards collaborative semantics jointly adjusts cross-platform allocation, confirmation-execution dependencies, and risk exposure structures. The utility difference is used as the feedback signal for learning operator selection preferences, enabling the search to adaptively select perturbation and local improvement directions as risk field strength, communication conditions, and resource reserves change. Meanwhile, this application learns the value of intervention actions in the cooperative situation state, enabling the system to adaptively adjust the granularity and role of manned aircraft intervention at different task information stages and under different risk and communication conditions, thereby forming a consistent decision logic between task quality improvement and cost and risk constraints.
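The Q-learning-guided ILS loop described above can be illustrated with a minimal sketch. It is not the application's implementation: the toy problem (four tasks assigned to a manned aircraft "M" or the UAV swarm "U"), the quality table, the manned budget of two tasks, the two operators `flip` and `swap`, and the use of the manned-task count as the state are all assumptions made for illustration. What it does take from the text is the structure: an operator is chosen by its learned value, applied as a perturbation, followed by local improvement, and the utility difference is fed back as the reward.

```python
import random

# Hypothetical toy data: (quality if UAV executes, quality if manned executes) per task.
QUALITY = [(0.9, 1.0), (0.2, 0.9), (0.5, 0.6), (0.1, 1.0)]
MANNED_COST, MANNED_BUDGET = 0.3, 2   # assumed cost per manned task and budget boundary

def utility(sol):
    quality = sum(QUALITY[i][1] if p == "M" else QUALITY[i][0] for i, p in enumerate(sol))
    return quality - MANNED_COST * sol.count("M")

def feasible(sol):
    return sol.count("M") <= MANNED_BUDGET

def _toggle(sol, i):
    return sol[:i] + ("U" if sol[i] == "M" else "M",) + sol[i + 1:]

def flip(sol, rng):
    """Reassign one task across platforms; keep the old solution if infeasible."""
    new = _toggle(sol, rng.randrange(len(sol)))
    return new if feasible(new) else sol

def swap(sol, rng):
    """Exchange the platforms of one manned task and one UAV task."""
    manned = [i for i, p in enumerate(sol) if p == "M"]
    uav = [i for i, p in enumerate(sol) if p == "U"]
    if not manned or not uav:
        return sol
    new = list(sol)
    new[rng.choice(manned)], new[rng.choice(uav)] = "U", "M"
    return tuple(new)

def local_search(sol):
    """First-improvement search over single reassignments inside the feasible set."""
    improved = True
    while improved:
        improved = False
        for i in range(len(sol)):
            new = _toggle(sol, i)
            if feasible(new) and utility(new) > utility(sol):
                sol, improved = new, True
    return sol

def q_guided_ils(initial, operators, iters=50, alpha=0.2, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    names = sorted(operators)
    q = {}                                        # (state, operator name) -> learned value
    current = best = local_search(initial)
    for _ in range(iters):
        state = current.count("M")                # coarse structural feature of the solution
        if rng.random() < epsilon:                # explore
            op = rng.choice(names)
        else:                                     # exploit the learned operator values
            op = max(names, key=lambda n: q.get((state, n), 0.0))
        candidate = local_search(operators[op](current, rng))
        reward = utility(candidate) - utility(current)   # utility difference as feedback
        if reward >= 0:                           # accept non-worsening candidates
            current = candidate
        nxt = current.count("M")
        target = reward + gamma * max(q.get((nxt, n), 0.0) for n in names)
        old = q.get((state, op), 0.0)
        q[(state, op)] = old + alpha * (target - old)
        if utility(current) > utility(best):
            best = current
    return best, q

best, q_table = q_guided_ils(("U", "U", "U", "U"), {"flip": flip, "swap": swap})
```

Every operator returns a feasible solution, which is one simple way to realize the "operator closure" idea: the search never has to repair an infeasible candidate.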
[0034] The core objective of this application is to provide a scalable methodology that, without pre-setting fixed collaboration rules, enables collaborative systems to adaptively adjust the intensity and granularity of human-machine intervention based on environmental dynamics, task information stages, and resource constraints. Simultaneously, it utilizes value-guided combinatorial search to stably generate high-quality feasible solutions. This methodology provides a unified modeling and solution paradigm for joint scheduling and path planning under broad collaborative tasks, and also provides a feasible technical foundation for further research on cost-constrained collaborative intervention decision-making, structured combinatorial search, and dynamic situational adaptation.
[0035] With reference to Figure 1 and Figure 2, this application provides a cost-sensitive planning and learning method for manned aircraft-UAV swarm collaborative task allocation, including:
[0036] S100, construct a unified utility function. The unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes. The shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
[0037] S200, Construct a joint state space, which is formed by merging the structural feature mappings of the cooperative situation state and the current cooperative solution;
[0038] S300, using the joint state space as input, learn the intervention action value function, and use the difference of the unified utility function as feedback to update the intervention strategy. The intervention strategy determines the granularity, role, and budget boundary of manned aircraft intervention, and induces a feasible solution set that satisfies the joint constraint set.
[0039] S400, within the feasible solution set induced by the intervention strategy, the structured operator family is value-learned through iterative local search guided by Q-learning, with the joint state space as input, and the difference of the unified utility function is used as operator feedback to generate a cooperative scheduling solution under the current situation. The cooperative scheduling solution includes at least a cross-platform allocation sequence, confirmation-execution dependency relationship and risk exposure path structure.
[0040] S500, in response to changes in the cooperative situation state, take a decrease in the unified utility function or a loss of feasibility as the trigger condition, sequentially execute the intervention strategy update and the reconstruction of the cooperative scheduling solution, and output a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
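The trigger logic of S500 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the function names `needs_replan` and `rolling_update`, the `drop_tolerance` parameter, and the four callables passed in (`evaluate`, `is_feasible`, `update_strategy`, `resolve`) are hypothetical placeholders for the components of S100 to S400.

```python
def needs_replan(prev_utility, new_utility, feasible, drop_tolerance=0.0):
    """Trigger when the plan is infeasible or the utility dropped by more than the tolerance."""
    return (not feasible) or (prev_utility - new_utility > drop_tolerance)

def rolling_update(situation, strategy, solution, prev_utility,
                   evaluate, is_feasible, update_strategy, resolve,
                   drop_tolerance=0.0):
    """One rolling step: re-evaluate under the new situation and replan only if triggered.

    evaluate(situation, strategy, solution)    -> unified utility (assumed callable)
    is_feasible(situation, strategy, solution) -> bool (assumed callable)
    update_strategy(situation, strategy)       -> new intervention strategy
    resolve(situation, strategy, solution)     -> new cooperative scheduling solution
    """
    feasible = is_feasible(situation, strategy, solution)
    new_utility = evaluate(situation, strategy, solution) if feasible else float("-inf")
    if needs_replan(prev_utility, new_utility, feasible, drop_tolerance):
        strategy = update_strategy(situation, strategy)      # first: intervention strategy
        solution = resolve(situation, strategy, solution)    # then: scheduling solution
        new_utility = evaluate(situation, strategy, solution)
    return strategy, solution, new_utility
```

The order inside the `if` branch mirrors the text: the intervention strategy is updated first because it induces the feasible solution set in which the scheduling solution is then rebuilt.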
[0041] In one specific embodiment of this application, S100 includes:
[0042] S110, Define a task completion quality item that is explicitly dependent on the intervention strategy. The dependency includes at least the quantitative contribution of the intervention to the improvement of information confidence, the improvement of time limit satisfaction, and the feasibility of the dependency.
[0043] S120, define a collaborative intervention cost item, which includes at least manned aircraft flight time consumption, payload occupancy cost and communication relay resource occupancy cost;
[0044] S130, define a risk penalty item, which includes at least the cumulative integral of manned aircraft exposure risk, unmanned aircraft swarm loss risk and cooperative link exposure risk;
[0045] S140, the weighted sum of the task completion quality item, the collaborative intervention cost item, and the risk penalty item constitutes the unified utility function;
[0046] S150, the shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
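Steps S110 to S140 amount to a weighted sum of three terms, which the sketch below makes concrete. The class name `PlanEvaluation`, the field names, the component keys, and the coefficient names `cost_coeff` and `risk_coeff` are assumptions for illustration; the source only states which sub-items each term contains at minimum.

```python
from dataclasses import dataclass

@dataclass
class PlanEvaluation:
    """Hypothetical container for the three terms of S110 to S130."""
    task_quality: dict        # task id -> completion quality (S110)
    task_weight: dict         # task id -> importance weight
    intervention_cost: dict   # cost component -> value (S120)
    risk: dict                # risk component -> accumulated exposure (S130)

def unified_utility(ev, cost_coeff=1.0, risk_coeff=1.0):
    """S140: weighted quality minus weighted intervention cost and risk penalty."""
    quality = sum(ev.task_weight[t] * q for t, q in ev.task_quality.items())
    cost = sum(ev.intervention_cost.values())
    risk = sum(ev.risk.values())
    return quality - cost_coeff * cost - risk_coeff * risk

# Illustrative numbers only.
example = PlanEvaluation(
    task_quality={"a": 0.5, "b": 1.0},
    task_weight={"a": 2.0, "b": 1.0},
    intervention_cost={"flight_time": 0.25, "payload": 0.25, "relay": 0.0},
    risk={"manned_exposure": 0.125, "uav_loss": 0.125, "link_exposure": 0.0},
)
value = unified_utility(example, cost_coeff=1.0, risk_coeff=2.0)  # 2.0 - 0.5 - 0.5 = 1.0
```

Because both the intervention strategy learning and the operator learning use differences of this single value as feedback, a change that raises quality but costs more than it gains shows up as a negative signal in both layers.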
[0047] In one specific embodiment of this application, S150 includes:
[0048] S151, establish an explicit dependency relationship between the task completion quality item and the intervention strategy, so that the information confidence threshold, time limit satisfaction condition and dependency feasibility in the task completion quality item are all expressed as monotonically non-decreasing functions of intervention granularity, role and budget intensity.
[0049] S152, establish an explicit dependency relationship between the drone swarm exposure risk sub-item in the risk penalty item and the intervention strategy, so that the drone swarm exposure risk sub-item is represented as a monotonically non-increasing function of intervention granularity, role and budget intensity;
[0050] S153, Establish the explicit dependency of the boundary parameters of the feasible solution set on the intervention strategy;
[0051] S154, through S151 to S153, the unified utility function mathematically includes both the cooperative scheduling solution variables and the intervention strategy variables, and the shaping effect of the intervention strategy on task quality, risk exposure, and the feasible region boundary has a continuous numerical expression that is differentiable or can be evaluated by finite differences.
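The monotonicity requirements of S151 to S153 can be illustrated with simple closed-form dependencies. The sketch below is an assumption-laden example rather than the application's model: it collapses granularity, role, and budget into a single scalar `budget` in [0, 1], and the saturating-exponential and linear forms, as well as all parameter values, are chosen only because they are monotone and differentiable.

```python
import math

def info_confidence(budget, base=0.4, gain=3.0):
    """S151 (assumed form): confidence is non-decreasing in intervention intensity."""
    return base + (1.0 - base) * (1.0 - math.exp(-gain * budget))

def uav_exposure_risk(budget, base=1.0, decay=2.0):
    """S152 (assumed form): UAV swarm exposure risk is non-increasing in intensity."""
    return base * math.exp(-decay * budget)

def risk_threshold(budget, base=0.3, relax=0.5):
    """S153 (assumed form): a feasible-set boundary parameter that depends on intensity."""
    return base + relax * budget

def finite_difference(f, budget, step=1e-3):
    """Central difference: the numerical marginal effect of intervention intensity."""
    return (f(budget + step) - f(budget - step)) / (2 * step)

grid = [i / 10 for i in range(11)]
confidence_is_monotone = all(
    info_confidence(a) <= info_confidence(b) for a, b in zip(grid, grid[1:])
)
risk_is_monotone = all(
    uav_exposure_risk(a) >= uav_exposure_risk(b) for a, b in zip(grid, grid[1:])
)
marginal_confidence = finite_difference(info_confidence, 0.5)   # positive
marginal_risk = finite_difference(uav_exposure_risk, 0.5)       # negative
```

With dependencies of this kind, the marginal value of more intervention can be read off numerically, which is what S154 needs in order to compare intervention options on the unified utility scale.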
[0052] In manned-unmanned swarm collaborative mission planning, the most challenging aspect is often not "how the unmanned swarm covers the mission points" or "how the manned aircraft patrols and executes," but rather the inherent structural complementarity between the two types of platforms: unmanned swarms excel at distributed reconnaissance, information updates, and low-cost coverage, but may be infeasible or of insufficient quality in high-risk, highly jammed, or missions requiring extremely high confidence levels; manned aircraft possess stronger endurance, payload, and anti-jamming capabilities, enabling them to undertake critical mission execution, mandatory confirmation, communication relay, or risk suppression, but their intervention implies significantly higher costs and exposure risks. Therefore, the key to collaboration lies not in "distributing the mission to the two types of platforms," but in identifying when manned aircraft intervention can bring sufficient quality improvement or feasibility restoration, while simultaneously constraining the costs and risks of intervention, ensuring that intervention is neither an afterthought nor an unrestrained enhancement of platform capabilities.
[0053] This application treats "collaborative intervention" as a decision object that alters the structure of the combinatorial optimization problem. Intuitively, when manned aircraft intervene in a specific area or set of critical tasks, path segments previously prohibited for the UAV swarm, task points previously excluded by risk thresholds, and information chains previously unable to maintain coordination due to communication fading may all become feasible with manned aircraft support. Conversely, if manned aircraft do not intervene, the system must rely primarily on the UAV swarm to solve the problem under stricter risk and communication constraints, and the set of feasible solutions shrinks. Therefore, the intervention strategy is not "one extra decision," but directly shapes the set of feasible solutions and the value structure. To incorporate this impact into a unified mathematical object, first define the task set $\mathcal{T}$, the manned aircraft set $\mathcal{M}$, and the UAV set $\mathcal{U}$. A cooperative scheduling solution is written as
[0054] $x = (x^{M}, x^{U})$,
[0055] where $x^{M}$ and $x^{U}$ are the sets of mission sequences for the manned aircraft and the UAVs, respectively. To avoid reducing cooperative semantics to "shortest path length," this application defines mission completion as a quality concept rather than a binary one. For task $i \in \mathcal{T}$, the importance weight is $w_i$, and the completion quality of the task under solution $x$ and intervention strategy $\pi$ is denoted $q_i(x, \pi)$. Here $q_i$ can simultaneously encode multiple collaborative requirements, such as whether completion is within the time limit, whether the information confidence level reaches a threshold, whether the "reconnaissance and confirmation before execution" dependency is met, whether completion is within the risk limit, and whether completion occurs while communication can be maintained. Because manned aircraft intervention can alter the information chain and risk accessibility, $q_i$ must be allowed to depend on $\pi$; otherwise, the contribution of intervention to task quality would be hidden by the mathematical model.
[0056] Intervention itself brings explicit costs and risks. Let the cost of intervention be denoted $C(x, \pi)$; it can include manned aircraft flight time and fuel consumption, the opportunity cost of entering high-risk airspace, the continuous relay cost required to maintain the UAV swarm link, and the resources occupied by intervention. Let the collaborative risk be denoted $R(x, \pi)$; it can include manned aircraft exposure, UAV swarm losses, communication link exposure, and coordination conflict. Combining benefit, cost, and risk, the unified utility function is written as
[0057] $J(x, \pi) = \sum_{i \in \mathcal{T}} w_i \, q_i(x, \pi) - \lambda_C \, C(x, \pi) - \lambda_R \, R(x, \pi)$,
[0058] where $w_i$ is the importance weight of task $i$, $q_i(x, \pi)$ is its completion quality, and $\lambda_C$ and $\lambda_R$ are coefficients representing the trade-off between cost and risk. This form means that $x$ and $\pi$ jointly determine the system's return, and that, for a given cooperative situation, $\pi$ is itself treated as an optimization variable.
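As an illustration of the unified utility described above (weighted task quality minus weighted cost and risk penalties), the following minimal Python sketch evaluates it for a toy case. The names `TaskOutcome`, `unified_utility`, `lam_cost`, and `lam_risk`, and all numeric values, are assumptions made for illustration and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    weight: float   # importance weight of the task (hypothetical field name)
    quality: float  # completion quality under the current solution and strategy


def unified_utility(outcomes, cost, risk, lam_cost=1.0, lam_risk=1.0):
    """Weighted quality benefit minus weighted intervention cost and risk."""
    benefit = sum(o.weight * o.quality for o in outcomes)
    return benefit - lam_cost * cost - lam_risk * risk


# Toy numbers (hypothetical): two tasks, one half-complete, one fully complete.
outcomes = [TaskOutcome(weight=2.0, quality=0.5), TaskOutcome(weight=1.0, quality=1.0)]
j = unified_utility(outcomes, cost=0.5, risk=0.25, lam_cost=1.0, lam_risk=2.0)
print(j)  # 2.0 - 0.5 - 0.5 = 1.0
```

Keeping quality, cost, and risk as separate inputs makes it easy to see which term drives a change in utility when the intervention strategy is varied.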
[0059] Under the aforementioned utility objective, collaborative planning can naturally be written as a nested optimization problem. The outer variable is the intervention strategy $\pi$, which specifies the timing, granularity, and budget of manned aircraft intervention and thereby induces a feasible solution set $\mathcal{X}(\pi)$ satisfying collaborative constraints such as resources, risk, communication, airspace conflict, and information priority. The inner variable is the cooperative scheduling solution $x$, which is searched for within the feasible region $\mathcal{X}(\pi)$ of the given $\pi$ so as to maximize the unified utility $J(x, \pi)$. To give the "value of the intervention strategy" a learnable and comparable definition, the optimal achievable utility of an intervention strategy is first defined as
[0060] $V(\pi) = \max_{x \in \mathcal{X}(\pi)} J(x, \pi)$,
[0061] Collaborative planning is then equivalent to selecting, from the set $\Pi$ of all allowed intervention strategies, the strategy that maximizes the optimal achievable utility, i.e.
[0062] $\pi^{*} = \arg\max_{\pi \in \Pi} V(\pi)$,
[0063] The significance of this formulation lies in strictly defining the role of the intervention strategy as "shaping the feasible region and its optimal achievable value." When $\pi$ changes, the shape and boundary of $\mathcal{X}(\pi)$ change accordingly: manned aircraft intervention may make some high-risk areas accessible, enable certain confirmation-dependent tasks to be executed within their time limits, or ensure that communication links can be maintained, thereby raising the attainable upper bound of the task quality term; conversely, excessive intervention leads to higher costs and exposure risks, thus suppressing utility through the cost and risk terms. Therefore, the merit of $\pi$ no longer depends on post-hoc subjective evaluation; $V(\pi)$ provides a rigorous numerical standard for comparison. The subsequent intervention strategy value learning and structured search operator value learning in this application both use the utility difference as the feedback signal, which is precisely a computable approximation of the change in $V(\pi)$: the outer layer selects a more appropriate intervention granularity and role by estimating how much different intervention actions improve the optimal achievable utility, while the inner layer, for the given $\pi$, drives local structural improvements within the feasible region $\mathcal{X}(\pi)$ using the same utility difference, thereby ensuring that intervention decisions and combinatorial search remain consistent on the same utility scale.
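The nested structure described above (an inner maximization over the feasible solutions induced by a strategy, and an outer choice of strategy) can be sketched by brute-force enumeration on a toy instance. This is only a sketch under stated assumptions: the function names, the strategy labels `"none"` and `"relay"`, and the utility table are hypothetical, and a realistic feasible set would be searched rather than enumerated.

```python
def best_achievable_utility(strategy, feasible_solutions, utility):
    """Return (value, solution): the best utility reachable under `strategy`.

    An empty feasible set yields (-inf, None).
    """
    candidates = feasible_solutions(strategy)
    if not candidates:
        return float("-inf"), None
    best = max(candidates, key=lambda x: utility(x, strategy))
    return utility(best, strategy), best


def select_strategy(strategies, feasible_solutions, utility):
    """Outer layer: pick the strategy with the highest best-achievable utility."""
    return max(
        strategies,
        key=lambda pi: best_achievable_utility(pi, feasible_solutions, utility)[0],
    )


# Toy data (hypothetical): the relay strategy unlocks a "joint" solution,
# but makes the UAV-only solution less attractive because relay cost is paid.
FEASIBLE = {"none": ["uav_only"], "relay": ["uav_only", "joint"]}
UTILITY = {("uav_only", "none"): 3.0, ("uav_only", "relay"): 2.0, ("joint", "relay"): 4.5}


def feasible(pi):
    return FEASIBLE[pi]


def utility(x, pi):
    return UTILITY[(x, pi)]


v_none, _ = best_achievable_utility("none", feasible, utility)
v_relay, x_relay = best_achievable_utility("relay", feasible, utility)
best_strategy = select_strategy(["none", "relay"], feasible, utility)
print(v_none, v_relay, x_relay, best_strategy)  # 3.0 4.5 joint relay
```

The toy numbers show the intended effect: intervention is worthwhile here only because it enlarges the feasible set, not because it improves the solution that was already available.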
[0064] In one specific embodiment of this application, S200 includes:
[0065] S210, construct a cooperative situation state; the cooperative situation state includes at least the task information stage, platform locations and timestamps, resource reserves, communication capability, and risk field;
[0066] S220, construct a structural feature mapping of the current cooperative scheduling solution to supplement the information missing from the cooperative situation state; the structural feature mapping includes at least the cross-platform dependency strength, the intervention usage intensity, and the risk exposure ratio;
[0067] S230, merge the cooperative situation state and the structural feature mapping to form a joint state space, which serves as a unified state representation for intervention strategy learning and operator value learning.
[0068] To ensure consistency between the selection of intervention strategies and the selection of operators for combinatorial search, a state description capable of expressing both the "cooperative situation" and the "current solution structure" must be constructed. Here, the "cooperative situation" is not a simple environmental description as in traditional path problems, but a comprehensive entity encompassing information stages, communication capabilities, risk fields, and the resource status of the manned aircraft and the unmanned swarm. The "solution structure," on the other hand, reflects the coupling between the current task allocation and the path sequences, such as whether the sequential dependency between unmanned reconnaissance confirmation and manned aircraft execution is satisfied, whether concentrating critical tasks on a single platform leads to excessive risk exposure, and whether task clustering results in excessive cross-platform collaboration costs. This application defines the cooperative situation state as follows:
[0069] $s = (\eta, p, b, g, \rho)$
[0070] where $\eta$ is the task information stage vector, used to express one of the core semantics of manned-unmanned swarm collaboration: information sequence constraints. To represent the process "unconfirmed, confirmed, executed" intuitively, one can take $\eta_i \in \{0, 1, 2\}$, corresponding respectively to task $i$ having information not yet obtained with sufficient confidence, having been confirmed by the UAV swarm but not yet executed, and having been executed. This representation makes "information as a resource" explicit in the state, thereby allowing intervention strategies to use the capabilities of manned aircraft to increase confidence or accelerate information acquisition, and to reflect the resulting benefit in the task quality term. The variable $p$ represents the set of platform locations and timestamps, used to characterize constraints related to coordinated rendezvous, arrival sequence, and airspace conflicts. The variable $b$ represents the remaining resources, which directly determine the intervention cost and the feasibility boundary: when manned aircraft resources are scarce, the marginal cost of intervention increases; when UAV resources are nearing their limit, intervention can become a necessary means of maintaining overall feasibility. The variable $g$ represents communication and coordination capability, which affects not only the quality of information transmission but also the ability of the UAV swarm to execute formation-related tasks stably. Manned aircraft intervention often enhances $g$ by acting as a relay or command node, thereby changing the task quality term. The variable $\rho$ represents the risk field and threat distribution, which determines the safe accessibility of the UAV swarm and the exposure cost of manned aircraft intervention, and forces intervention strategies to weigh "capability enhancement" against "risk exposure."
[0071] The situation state $s$ alone is insufficient to guide operator selection in Iterated Local Search (ILS), because the benefit of a local transformation depends heavily on the structure of the current solution. This application employs ILS as the combinatorial optimization framework, which continuously generates structural improvements in the discrete solution space through an iterative "perturbation, then local improvement" process, thereby maintaining feasibility and enhancing utility under complex cooperative constraints. Since each ILS step essentially chooses among several structured transformations, and the effect of a transformation is determined not only by the cooperative situation but also by the coupling structure of the current solution, a structural feature mapping $\psi$ is required to supplement the information missing from the situation state $s$, in the form
[0072] $\psi : x \mapsto \psi(x) \in \mathbb{R}^{d}$,
[0073] where $\mathbb{R}$ denotes the field of real numbers and $d$ is the feature dimension. The mapping $\psi$ can include the cross-platform dependency strength (reflecting the degree of coupling between UAV reconnaissance and confirmation and manned aircraft execution), the allocation ratio of critical tasks between the two types of platforms (reflecting intervention intensity and cost trends), the proportion of high-risk flight segments (reflecting the risk exposure structure), and the distribution of communication-sensitive tasks along the path. These features are introduced because the difficulty of collaborative planning is often not "how to find a shorter path," but "how to make the solution structure more reasonable under collaborative semantics": manned aircraft should not replace UAVs in completing all tasks, but should be used for those parts where their capabilities are truly needed as compensation; UAV swarms should likewise not be deployed to areas of the risk field where confidence and communication stability cannot be guaranteed, otherwise the decline in mission quality will amplify the overall loss. These structural facts cannot be inferred from geometric location or resource reserves alone; they must be expressed in a learnable way through $\psi(x)$.
[0074] By merging the situation state $s$ and the structural features $\psi(x)$ of the current solution $x$, this application defines a joint state
[0075] $\tilde{s} = (s, \psi(x))$,
[0076] This joint state serves both the selection of intervention strategies and the evaluation of operator families. For intervention strategies, $s$ provides evidence about the environment and capabilities indicating "where manned aircraft intervention might bring improvement," and $\psi(x)$ provides structural evidence as to whether the current solution shows symptoms requiring intervention. For operator selection, $s$ provides the situational context regarding "which local changes will trigger risks or communication costs," and $\psi(x)$ provides the semantic context of "how local transformations change the coupled structure." In this way, intervention decisions and operator selection no longer depend on two separate, mutually inconsistent sets of information, but are compared and learned on the same joint state.
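To make the joint state more concrete, the sketch below computes three of the structural features named in S220 from a toy task assignment and combines them with a situation descriptor into a hashable key, as a tabular learner would need. The feature definitions are crude proxies chosen for illustration, and the function names, the `"M"`/`"U"` platform labels, and the binning scheme are assumptions rather than the application's definitions.

```python
def structural_features(assignment, depends_on, high_risk):
    """Crude structural features of a solution (assumes at least one task).

    assignment: task -> "M" (manned aircraft) or "U" (UAV swarm)
    depends_on: execution task -> confirmation task it depends on
    high_risk:  set of tasks located in high-risk zones
    """
    n = len(assignment)
    cross = sum(1 for ex, conf in depends_on.items() if assignment[ex] != assignment[conf])
    dependency = cross / len(depends_on) if depends_on else 0.0  # cross-platform dependency
    intervention = sum(1 for p in assignment.values() if p == "M") / n  # manned usage share
    exposure = sum(1 for t, p in assignment.items() if p == "U" and t in high_risk) / n
    return (dependency, intervention, exposure)


def joint_state(situation, features, bins=4):
    """Discretize features in [0, 1] and pair them with the situation descriptor."""
    binned = tuple(min(int(f * bins), bins - 1) for f in features)
    return (situation, binned)


assignment = {"t1": "U", "t2": "M", "t3": "U", "t4": "U"}
features = structural_features(
    assignment, depends_on={"t2": "t1", "t4": "t3"}, high_risk={"t3"}
)
situation = ((1, 0, 0, 0), "comm_ok", "risk_medium")  # hypothetical situation descriptor
state = joint_state(situation, features)
print(features)  # (0.5, 0.25, 0.25)
print(state[1])  # (2, 1, 1)
```

Discretizing the features is only one option; a function approximator could consume the raw feature vector directly.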
[0077] In a cooperative situation, any intervention strategy adjustment or operator transformation changes the pair $(x, \pi)$ of solution and intervention strategy to $(x', \pi')$, and thus changes the value of the unified utility function $J$. To ensure that learning and search share a consistent evaluation scale, this application uses the objective difference as the feedback signal, denoted $\Delta J$:
[0078] $\Delta J = J(x', \pi') - J(x, \pi)$,
[0079] This signal inherently encompasses changes in task quality, intervention cost, and risk exposure, and thus simultaneously guides the formation of preferences for intervention strategies and for operator selection. Its significance is as follows: if manned aircraft intervention raises the confidence level and timeliness of critical tasks, and the gain in the quality term exceeds the cost and risk penalties, then $\Delta J$ is positive; if intervention only brings exposure and cost without significantly improving task quality, then $\Delta J$ is negative. The same principle applies to operator transformations: if a certain type of structured operator can improve task quality or reduce risk exposure under a given intervention strategy and cooperative situation, its value will be stably reflected in $\Delta J$. Based on the above definitions, the next section further treats the intervention strategy $\pi$ as a class of "feasible region transformation" decisions and links it to the selection of a family of structured operators. This allows intervention decisions not only to change the cost terms, but also to influence the form of the optimal solution of the inner search by changing the feasible region and the information structure.
[0080] In one specific embodiment of this application, S300 includes:
[0081] S310, define intervention actions, which describe the granularity, role, and budget intensity of manned aircraft intervention. The granularity includes task point-level intervention, task cluster-level intervention, and flight segment-level intervention. The roles include critical execution, high-confidence confirmation, and communication relay support. The budget intensity is used to constrain the intervention cost and resource consumption limits.
[0082] S320, In the cooperative situation, an intervention action is performed to obtain an updated intervention strategy. The updated intervention strategy induces a feasible solution set that satisfies the joint constraint set by changing information availability, risk reachability boundary and communication cooperation capability, and changes the reachability upper bound of the task completion quality term in the unified utility function.
[0083] S330: The difference of the unified utility function is used as the immediate feedback of the intervention action, wherein the immediate feedback is the difference of the optimal unified utility function that can be achieved in their respective induced feasible regions before and after the intervention action is performed.
[0084] S340, construct an intervention action value function with the joint state space as input and the intervention action value as output, and use the temporal difference method to update the intervention action value function with the immediate feedback, so that the system can adaptively select the intervention granularity, role, and budget intensity that maximize the intervention action value function under different cooperative situations and solution structures, and output the intervention strategy and the set of feasible solutions induced by it.
[0085] The set of collaborative constraints includes at least information sequence constraints, risk budget constraints, intervention budget constraints, communication sustainability constraints, and resource constraints. The information sequence constraint requires that the completion time of a confirmation task be earlier than the execution time of the task that depends on it, with the confirmation time shortening as the intervention strategy is strengthened. The risk budget constraint requires that the risk exposure of each platform within the task cycle not exceed a preset threshold, with the risk exposure of the UAV swarm decreasing as the intervention strategy is strengthened. The intervention budget constraint requires that the manned aircraft intervention cost not exceed the upper limit of the allowed intervention cost within the task cycle. The communication sustainability constraint requires that, for all tasks in the cooperative scheduling solution that depend on the collaborative link, the signal-to-noise ratio, bandwidth, and coverage meet the minimum return quality and continuous availability requirements of the task throughout the entire task execution cycle, with communication sustainability improving as the communication relay role in the intervention strategy is strengthened. The resource constraint requires that the consumption of fuel, payload, and flight time during task execution not exceed the current resource reserve, with the resource consumption rate decreasing as the critical execution role in the intervention strategy is strengthened.
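Steps S310 to S340 describe intervention actions composed of a granularity, a role, and a budget intensity, whose values are learned by temporal difference updates. A minimal tabular sketch follows, under these assumptions: budget intensity is discretized into three hypothetical levels, the state is any hashable key, and the feedback (the difference of best achievable utilities in S330) is computed elsewhere and passed in. The class and parameter names are illustrative only.

```python
import random
from collections import defaultdict
from itertools import product

GRANULARITIES = ("task_point", "task_cluster", "flight_segment")
ROLES = ("critical_execution", "high_confidence_confirmation", "communication_relay")
BUDGETS = ("low", "medium", "high")  # hypothetical discretization of budget intensity
ACTIONS = list(product(GRANULARITIES, ROLES, BUDGETS))


class InterventionValue:
    """Tabular intervention action values with an epsilon-greedy policy."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select(self, state):
        """Explore with probability epsilon, otherwise take the best-valued action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, feedback, next_state):
        """One-step temporal difference update driven by the utility difference."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = feedback + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


learner = InterventionValue(epsilon=0.0)  # greedy, so the example is deterministic
relay = ("task_cluster", "communication_relay", "medium")
learner.update("s0", relay, feedback=2.0, next_state="s1")
chosen = learner.select("s0")
print(learner.q[("s0", relay)], chosen == relay)  # 1.0 True
```

A table over 27 actions is workable for a sketch; a larger or continuous action description would call for a different value representation.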
[0086] In manned-unmanned swarm collaboration, "intervention" is not a single action, but a set of decisions with clear semantics: whether manned aircraft should enter the collaboration, when to enter, at what granularity to act on the task and path, and how to set the intervention budget and risk ceiling. Different granularities of intervention will have different levels of structural impact on the collaborative system. Task-point level intervention means that manned aircraft directly undertake the execution or confirmation of certain key tasks, preventing the unmanned swarm from being exposed in high-risk areas; task cluster level intervention means that manned aircraft provide continuous support for a group of spatially or temporally coupled tasks, such as providing communication relays for the unmanned swarm to maintain information links, or providing escort and suppression in high-threat areas, making the task cluster as a whole feasible; segment level intervention is more about changing the risk structure of a path segment, such as opening a safe passage in a risk field through the intervention of manned aircraft, making a previously inaccessible task area accessible. Since these interventions directly change which tasks are feasible in the current situation, which information dependencies can be met, and which risk thresholds can be suppressed or avoided, the most natural mathematical expression of the intervention strategy is not a modification of a local variable, but rather a shaping of the feasible solution set.
[0087] To characterize this shaping effect, this application views the intervention strategy $\pi$ as a mapping from the cooperative situation state to a "permissible set." The "permissible set" here has two meanings. The first is permissibility at the task and path level: under the current risk field and communication conditions, which tasks can be undertaken by manned aircraft, which tasks must first be confirmed by the UAV swarm, and which areas can be entered only after manned aircraft intervention. The second is permissibility at the search level: which structured transformation operators may be used during combinatorial optimization, and which transformations are prohibited because they would violate the intervention budget or risk constraints. Combining these two meanings, the intervention strategy can be abstracted as a decision variable $\pi$ defined on the cooperative situation state $s$ and used to induce the feasible region. In other words, given the cooperative situation $s$, $\pi$ determines the set $\mathcal{X}(\pi)$ of solutions satisfying the collaborative constraints; this set is determined not only by geometric reachability, but also by information priority, communication maintenance, risk ceiling, and intervention budget.
[0088] To make the meaning of the induced feasible region $\mathcal{X}(\pi)$ more concrete, the collaborative constraints must be written in a verifiable form. First, consider the information sequence constraint. When the UAV swarm undertakes reconnaissance and confirmation, the information stage $\eta_i$ of task $i$ must first move from 0 to 1 before manned or unmanned aircraft are allowed to perform the high-confidence tasks that depend on it. Let $t_i^{\mathrm{conf}}(x)$ denote the time at which task $i$ reaches the confirmed state under solution $x$, and let $t_i^{\mathrm{exec}}(x)$ denote the time at which task $i$ is executed. For tasks with confirmation dependencies, the following must hold:
[0089] $t_i^{\mathrm{conf}}(x) \le t_i^{\mathrm{exec}}(x)$,
[0090] In manned-unmanned swarm collaboration, the above relation is not only a timing condition but also one of the sources of intervention benefit. Manned aircraft intervention can shorten the confirmation time through high-confidence sensing and stronger maneuverability, or reduce information feedback delay through communication relay, thereby enabling more tasks to meet their confirmation dependencies within the time limit. To reflect this, the confirmation time can be written as a function of the intervention strategy $\pi$, for example
[0091] $t_i^{\mathrm{conf}}(x, \pi) = t_i^{\mathrm{conf}}(x) - \delta_i(\pi)$,
[0092] where $t_i^{\mathrm{conf}}(x)$ is the confirmation time without intervention and $\delta_i(\pi) \ge 0$ is the reduction in confirmation time resulting from manned aircraft intervention, the magnitude of which is determined by the granularity of intervention, the role of the manned aircraft, and communication conditions.
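The information sequence constraint can be checked mechanically: a task's confirmation time, reduced by whatever the intervention contributes, must not exceed its execution time. The sketch below is a minimal illustration; the function names and the sample times are hypothetical, and flooring the confirmation time at zero is an added assumption.

```python
def confirmation_time(base_time, reduction=0.0):
    """Confirmation time after intervention; floored at zero (an assumption)."""
    return max(0.0, base_time - reduction)


def sequence_violations(base_confirm, execute, reduction=None):
    """Return the tasks whose confirmation would finish after their execution time.

    base_confirm: task -> confirmation time without intervention
    execute:      task -> scheduled execution time
    reduction:    task -> confirmation time saved by manned aircraft intervention
    """
    reduction = reduction or {}
    return [
        task
        for task, t_exec in execute.items()
        if confirmation_time(base_confirm[task], reduction.get(task, 0.0)) > t_exec
    ]


base_confirm = {"a": 10.0, "b": 30.0}
execute = {"a": 15.0, "b": 25.0}
without = sequence_violations(base_confirm, execute)
with_relay = sequence_violations(base_confirm, execute, reduction={"b": 8.0})
print(without, with_relay)  # ['b'] []
```

In this toy case, task `b` is infeasible without intervention and becomes feasible once the relay shortens its confirmation time, which is the sense in which intervention enlarges the feasible region.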
[0093] Secondly, consider risk constraints. In collaborative tasks, risk is not a single-point attribute, but accumulates along the path and over time. Let $\rho(p, t)$ denote the risk intensity at position $p$ and time $t$; a path can be regarded as a continuous curve or as a set of discrete segments. For platform $k$ with path $\Gamma_k$, let $p_k(t)$ denote its position at time $t$, and let $[t_k^{0}, t_k^{1}]$ be the platform's execution time interval within the current task cycle. Then the risk exposure of platform $k$ can be written as the cumulative exposure over time:
[0094] $R_k(x) = \int_{t_k^{0}}^{t_k^{1}} \rho\big(p_k(t), t\big)\, dt$,
[0095] When the path is represented by a set of discrete segments, the above formula can be equivalently understood as the sum, over segments, of each segment's risk intensity multiplied by its occupied time; its essence remains the integral of the risk intensity along the mission sequence. Furthermore, the total risk $R(x, \pi)$ aggregates the exposure and collaboration risks across platforms. Manned aircraft intervention has two opposing effects: on the one hand, manned aircraft entering high-risk areas increases exposure; on the other hand, their capabilities can reduce the exposure of the UAV swarm in high-risk areas or lower the probability of collaboration failure. To represent this two-way effect mathematically, the total risk can be expressed as the sum of two parts:
[0096] $R(x, \pi) = R^{U}(x, \pi) + R^{M}(x, \pi)$,
[0097] where $R^{U}(x, \pi)$ describes the exposure and loss risk of the UAV swarm under intervention strategy $\pi$, and $R^{M}(x, \pi)$ describes the exposure risk incurred by the manned aircraft intervention. Because $\pi$ can alter the maneuverability and reachability of the UAV swarm, and can influence how long the swarm is exposed in the risk field by changing its collaborative capabilities and mission timing, the value of $R^{U}$ changes structurally with the level of intervention. For example, under escort or suppression support from manned aircraft, the equivalent exposure time of the UAV swarm in certain high-threat areas is significantly shortened, or the swarm can complete its crossing during periods of lower risk, thereby reducing overall exposure and potentially improving mission quality.
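For discrete flight segments, the cumulative exposure reduces to a sum of risk intensity multiplied by time spent in each segment, and the total risk splits into a UAV part and a manned part. The following sketch illustrates the two opposing effects of intervention with hypothetical numbers; `segment_exposure`, `total_risk`, and the segment values are assumptions for illustration.

```python
def segment_exposure(segments):
    """Discrete form of the exposure integral: sum of intensity * duration."""
    return sum(intensity * duration for intensity, duration in segments)


def total_risk(uav_paths, manned_paths):
    """Return (total, uav_part, manned_part) over all platform paths."""
    uav = sum(segment_exposure(path) for path in uav_paths)
    manned = sum(segment_exposure(path) for path in manned_paths)
    return uav + manned, uav, manned


# Hypothetical case: escort shortens the UAV's time in the high-risk segment
# from 4.0 to 1.0 time units, at the price of exposing the manned aircraft.
baseline = total_risk(uav_paths=[[(0.5, 4.0), (0.25, 2.0)]], manned_paths=[])
escorted = total_risk(
    uav_paths=[[(0.5, 1.0), (0.25, 2.0)]], manned_paths=[[(0.5, 1.0)]]
)
print(baseline)  # (2.5, 2.5, 0)
print(escorted)  # (1.5, 1.0, 0.5)
```

Here the manned exposure rises from 0 to 0.5 while the UAV exposure falls from 2.5 to 1.0, so the total decreases; with different numbers the trade could go the other way.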
[0098] Next, consider the intervention budget and resource constraints. Manned aircraft intervention in a collaborative system must be subject to a budget constraint; otherwise, the optimal solution will degenerate into "letting the manned aircraft do everything," and collaboration loses its meaning. Let the intervention cost be $C(x, \pi)$; it can be naturally decomposed into flight time cost, mission payload cost, and relay support cost. The budget constraint
[0099] $C(x, \pi) \le C_{\max}$,
[0100] limits the scale of intervention, where $C_{\max}$ is the upper limit of the manned aircraft intervention budget within the mission cycle, given by mission command constraints or platform resource boundaries. The budget constraint and the risk constraints jointly determine how far the feasible region $\mathcal{X}(\pi)$ can be expanded by intervention, making "intervention granularity" a true structural variable: task-point level intervention may be sparser in cost but contributes in a more concentrated way to the quality of critical tasks; cluster-level intervention has more continuous costs but can change the structure of collaborative communication and risk; segment-level intervention is closer to a holistic rewriting of reachability. Through these constraints, collaborative intervention is naturally transformed into a feasible region transformation, thereby supporting the value comparison in outer-layer strategy learning.
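The budget and risk ceilings can be combined into a single admissibility check. The sketch below assumes an additive cost decomposition into the three components named in the text and a common per-platform risk cap; the class `InterventionCost`, the function `within_budgets`, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterventionCost:
    flight_time: float  # flight time cost
    payload: float      # mission payload cost
    relay: float        # relay support cost

    def total(self):
        """Additive decomposition (an assumption made for this sketch)."""
        return self.flight_time + self.payload + self.relay


def within_budgets(cost, cost_cap, exposures, risk_cap):
    """True if total intervention cost and every platform's exposure are within caps."""
    return cost.total() <= cost_cap and all(e <= risk_cap for e in exposures.values())


cost = InterventionCost(flight_time=3.0, payload=1.0, relay=2.0)
exposures = {"uav_swarm": 1.5, "manned_1": 0.5}
ok = within_budgets(cost, cost_cap=8.0, exposures=exposures, risk_cap=2.0)
too_costly = within_budgets(cost, cost_cap=5.0, exposures=exposures, risk_cap=2.0)
print(cost.total(), ok, too_costly)  # 6.0 True False
```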
[0101] Under the above formalization, the value of an intervention strategy does not come from "flipping a switch," but from how it changes the feasible region $\mathcal{X}(\pi)$ in a given situation $s$, which in turn alters the attainable upper bounds of the quality and risk terms in the objective function. To ensure a uniform scale for subsequent value learning, this application still uses the objective difference as the fundamental quantity for evaluating the intervention strategy. When the intervention strategy changes from $\pi$ to $\pi'$, its value enhancement can be understood as achieving a higher value of the unified utility $J$ in the new feasible region, that is
[0102] $\Delta V = V(\pi') - V(\pi) = \max_{x \in \mathcal{X}(\pi')} J(x, \pi') - \max_{x \in \mathcal{X}(\pi)} J(x, \pi)$,
[0103] This expression emphasizes that the essence of intervention is a rewriting of the "optimal reachable value," rather than a patching of a fixed solution. This definition provides clear mathematical semantics for unifying interventional learning and combinatorial search learning on the same incremental scale.
[0104] In one specific embodiment of this application, S400 includes:
[0105] S410, construct a family of structured operators, which includes cross-platform task migration operators, confirmation-execution chain rearrangement operators, task cluster reconstruction operators, and risk-sensitive flight segment detour operators;
[0106] In one specific embodiment of this application, S410 includes:
[0107] S411, based on the type of coupling relationship in the manned-unmanned swarm collaborative scheduling solution, determine the collaborative structure dimension that the structured operator family needs to cover. The collaborative structure dimension includes at least the cross-platform task allocation structure, confirmation-execution information dependency structure, task cluster space and communication coupling structure, and path and risk field exposure structure.
[0108] S412, For the cross-platform task allocation structure, construct a subset of cross-platform allocation adjustment operators, where each operator in the subset of cross-platform allocation adjustment operators takes changing the affiliation relationship of tasks between manned and unmanned aircraft swarms as its core semantics.
[0109] S413, For the confirmation-execution information dependency structure, construct a subset of information chain rearrangement operators, where each operator in the subset of information chain rearrangement operators has the core semantics of adjusting the temporal dependency relationship between the confirmation task and the execution task;
[0110] S414, For the task cluster space and communication coupling structure, construct a subset of task cluster reconstruction operators, wherein each operator in the task cluster reconstruction operator subset takes the division method of merging, splitting or recombining task clusters as its core semantics.
[0111] S415, for the path and risk field exposure structure, construct a subset of risk-sensitive flight segment adjustment operators, each operator in the subset of risk-sensitive flight segment adjustment operators has the core semantics of bypassing high-risk areas, inserting relay support paths, or smoothing flight segment risks;
[0112] S416, the cross-platform allocation adjustment operator subset, information chain rearrangement operator subset, task cluster reconstruction operator subset, and risk-sensitive flight segment adjustment operator subset are jointly encapsulated to form the structured operator family, and each operator in the structured operator family has a learnable semantic label, wherein the semantic label is the change in structural features.
[0113] S420, Under a given intervention strategy, a set of permissible operators is defined with the set of feasible solutions induced by the intervention strategy as the boundary, such that any operator in the set of permissible operators acting on a feasible solution maintains the transformation closure.
[0114] S430, In the joint state space, a structured operator is selected from the set of allowed operators according to the operator value function, and the structured operator is applied to the current cooperative scheduling solution to generate a candidate cooperative scheduling solution;
[0115] S440, calculate the utility improvement of the candidate cooperative scheduling solution relative to the current cooperative scheduling solution using the difference of the unified utility function, and use the utility improvement as the immediate reward of the structured operator;
[0116] S450, update the operator value function from the immediate reward using the temporal difference method;
[0117] S460, repeat S430 to S450, iteratively optimize the cooperative scheduling solution through the perturbation and local improvement mechanism of iterative local search until the termination condition is met, and output the cooperative scheduling solution that maximizes the unified utility function under the current situation.
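Steps S430 to S460 can be read as the loop sketched below: choose an operator by its learned value, apply it, reward it with the utility difference, update its value, and perturb when the search stalls. This is a toy sketch under stated assumptions, not the application's implementation: the solution is a tuple of platform labels, the two operators, the infeasibility penalty of -1.0, the `patience` rule, and all numeric values are hypothetical.

```python
import random
from collections import defaultdict


def q_guided_ils(x0, operators, utility, is_feasible, state_of,
                 iters=300, patience=10, alpha=0.3, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    names = list(operators)
    current = best = x0
    stall = 0
    for _ in range(iters):
        if stall >= patience:
            # Perturbation: accept one random feasible move regardless of utility.
            shaken = operators[rng.choice(names)](current, rng)
            if is_feasible(shaken):
                current = shaken
                if utility(current) > utility(best):
                    best = current
            stall = 0
            continue
        s = state_of(current)
        # Operator selection (epsilon-greedy on the operator value table).
        if rng.random() < epsilon:
            name = rng.choice(names)
        else:
            name = max(names, key=lambda n: q[(s, n)])
        candidate = operators[name](current, rng)
        if is_feasible(candidate):
            reward = utility(candidate) - utility(current)  # utility difference
            nxt = candidate if reward > 0 else current      # accept only improvements
        else:
            reward, nxt = -1.0, current                     # hypothetical penalty
        s2 = state_of(nxt)
        target = reward + gamma * max(q[(s2, n)] for n in names)
        q[(s, name)] += alpha * (target - q[(s, name)])     # temporal difference update
        stall = 0 if nxt != current else stall + 1
        current = nxt
        if utility(current) > utility(best):
            best = current
    return best, q


# Toy problem (hypothetical): four tasks, each flown by the UAV swarm "U" or manned "M".
QUALITY = {"U": (0.9, 0.4, 0.8, 0.3), "M": (0.95, 0.9, 0.85, 0.9)}
WEIGHT = (1.0, 2.0, 1.0, 3.0)


def utility(x):
    benefit = sum(w * QUALITY[p][i] for i, (p, w) in enumerate(zip(x, WEIGHT)))
    return benefit - 0.5 * x.count("M")  # each manned task costs 0.5


def is_feasible(x):
    return x.count("M") <= 2  # intervention budget: at most two manned tasks


def migrate(x, rng):
    """Cross-platform task migration: flip one task's platform."""
    i = rng.randrange(len(x))
    y = list(x)
    y[i] = "M" if x[i] == "U" else "U"
    return tuple(y)


def swap(x, rng):
    """Exchange the platforms of two tasks."""
    i, j = rng.sample(range(len(x)), 2)
    y = list(x)
    y[i], y[j] = y[j], y[i]
    return tuple(y)


x0 = ("U", "U", "U", "U")
best, q = q_guided_ils(x0, {"migrate": migrate, "swap": swap}, utility, is_feasible,
                       state_of=lambda x: x.count("M"))
print(best, utility(best))
```

The state here is just the number of manned tasks, a deliberately coarse stand-in for the joint state; the returned solution is always feasible and never worse than the starting point, but the sketch makes no claim of reaching the global optimum.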
[0118] Given the intervention strategy $\pi$, the inner problem becomes searching within the induced feasible region $\mathcal{X}(\pi)$ for the cooperative scheduling solution $x$ that maximizes the unified utility $J(x, \pi)$. This problem exhibits typical combinatorial explosion: $x$ simultaneously encompasses task allocation, sequence ordering, path geometry, and the fulfillment of information dependencies. Moreover, cooperative semantics leads to highly coupled constraints, so traditional local search methods that rely solely on simple "swap two nodes" or "insert a node" moves become inefficient and prone to oscillating between infeasible solutions. Therefore, this application employs Iterated Local Search as its basic framework: a perturbation mechanism escapes local optima, a local improvement mechanism achieves monotonic improvement, and Q-learning learns, under the guidance of the cooperative situation, "which structured transformations are more likely to produce effective improvements." The key to this design is not to treat Q-learning as a black-box controller, but to use it to evaluate the value of operator families, so that combinatorial search adapts to different cooperative situations and intervention strategies.
[0119] To mathematically represent the learning object of "operator selection," first define the family of structured operators $\mathcal{O}(\pi)$. It depends on the intervention strategy $\pi$ because intervention changes the feasible region and therefore also changes which structural transformations may be performed. For example, without manned aircraft intervention, certain insertion operators that traverse high-risk areas must be prohibited; these operators become feasible only after manned aircraft intervene at the segment level to open a safe passage. Similarly, if the intervention budget is close to its limit, any task migration operator that further increases the workload of manned aircraft should be excluded. Therefore, the dependency of the operator set on the strategy can be written as follows, with $\mathcal{O}$ the full operator family and $\mathcal{X}(\pi)$ the feasible region induced by $\pi$:
[0120] $\mathcal{O}(\pi) = \{\, o \in \mathcal{O} : o(x) \in \mathcal{X}(\pi) \ \text{for all } x \in \mathcal{X}(\pi) \,\}$,
[0121] i.e., every operator $o$ must maintain feasibility under the given intervention strategy. This makes explicit the most easily overlooked part of cooperative semantics: operators are not arbitrary combinatorial transformations, but must respect the "information priority, risk budget, intervention budget" constraints; otherwise the search will oscillate meaninglessly between infeasible solutions.
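The dependence of the permitted operator set on the intervention strategy can be illustrated by filtering a fixed operator family against a feasibility test. In this sketch closure is only checked on a finite sample of solutions, which approximates the "for all feasible solutions" condition; the solution encoding, strategy encoding, and operator names are hypothetical.

```python
def allowed_operators(operators, strategy, samples, is_feasible):
    """Keep operators that map every sampled feasible solution to a feasible one."""
    feasible_samples = [x for x in samples if is_feasible(x, strategy)]
    return {
        name: op
        for name, op in operators.items()
        if all(is_feasible(op(x), strategy) for x in feasible_samples)
    }


# Hypothetical encodings:
#   solution = (manned_load, crosses_risk_zone)
#   strategy = (manned_budget, safe_corridor_open)
def is_feasible(x, strategy):
    load, crosses = x
    budget, corridor_open = strategy
    return load <= budget and (corridor_open or not crosses)


OPERATORS = {
    "migrate_to_manned": lambda x: (x[0] + 1, x[1]),
    "insert_through_risk_zone": lambda x: (x[0], True),
    "reorder_chain": lambda x: x,
}

samples = [(1, False), (2, False)]
no_corridor = allowed_operators(OPERATORS, (2, False), samples, is_feasible)
with_corridor = allowed_operators(OPERATORS, (3, True), samples, is_feasible)
print(sorted(no_corridor))    # ['reorder_chain']
print(sorted(with_corridor))  # all three operators
```

With a tight budget and no safe corridor only the reordering operator survives; a segment-level intervention that opens the corridor and raises the budget readmits the other two.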
[0122] In manned-unmanned swarm collaboration, the most semantically meaningful information does not come from the low-level details of operators, but from how operators alter the collaborative structure. This application designs a family of operators that act directly on cross-platform coupled structures. An operator $o \in \mathcal{O}(\pi)$ acting on a solution $x$ yields a new solution $x' = o(x)$. Its function can be to reconstruct task clusters, adjust cross-platform allocation, repair information dependency chains, or rebalance the risk exposure structure. An operator can thus be viewed as a mapping $\mathcal{X}(\pi) \to \mathcal{X}(\pi)$ on the feasible region, and its semantics are described through its effect on the structural features $\psi$. If $\psi$ includes statistics such as cross-platform dependency strength, risk exposure ratio, and intervention intensity, then the semantics of $o$ can be read from the change $\psi(o(x)) - \psi(x)$.
[0123] On this basis, the basic process of Iterated Local Search can be described in terms of monotonic improvement of the objective function. Let the current solution be $x$. An operator $o$ is selected from the set of feasible operators $\mathcal{O}(\pi)$ to generate a candidate solution $x' = o(x)$. If the candidate solution remains feasible under the constraints, its improvement is given by the objective difference
[0124] $\Delta J = J(x', \pi) - J(x, \pi)$,
[0125] This difference simultaneously incorporates changes in task quality, intervention cost, and risk, thus enabling a unified evaluation of collaborative semantics: if the operator migrates certain high-risk tasks from the UAV swarm to manned aircraft and thereby increases cost, the increase is reflected through the cost term; if the migration significantly improves the quality of key tasks or significantly reduces risk exposure, this is reflected in the quality term and the risk term.
[0126] To let the search learn over the long run "which operators are more effective in which cooperative situations", this application establishes a Q function $Q_O(z, o)$ on the joint state $z = (s, \phi(x))$, where $s$ is the cooperative situation state and $\phi(x)$ is the structural feature of the current solution $x$. This function represents the long-term value of selecting operator $o$ under the current cooperative situation and solution structure. The perturbation and local improvement steps of iterative local search (ILS) can be viewed as a sequential decision process within the feasible solution space, where the next state is determined by the current solution and the operator, so the standard temporal-difference update applies. Let the new solution after applying the operator be $x' = o(x)$. The corresponding new joint state is $z' = (s', \phi(x'))$, where $s'$ is the new cooperative situation after execution and information updates, which may change through task confirmation, resource consumption, and risk exposure. The Q-learning update is therefore written as
[0127] $Q_O(z, o) \leftarrow Q_O(z, o) + \alpha \left[ r + \gamma \max_{o' \in \mathcal{O}(\pi)} Q_O(z', o') - Q_O(z, o) \right]$,
[0128] where $\alpha$ is the learning rate and $\gamma$ is the discount factor. The key to this update in terms of cooperative semantics is that the set of available actions is $\mathcal{O}(\pi)$ rather than a fixed action set; a change in the intervention strategy $\pi$ therefore changes the alternatives considered in the value update, so that the effect of "intervention changing the search method" is reflected naturally in the learning equation. Meanwhile, the reward $r$ is given by the objective difference $U(x'; s, \pi) - U(x; s, \pi)$, which ensures that the learning criterion for operator value always remains consistent with the collaborative objective.
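The temporal-difference update with a strategy-dependent action set can be sketched in tabular form as follows. This is an illustrative sketch, not the claimed implementation: the table layout, the function name `td_update`, and the default values of the learning rate and discount factor are assumptions. The only point it demonstrates is that the maximisation runs over the operators admissible in the next state, not over a fixed list.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def td_update(
    q: QTable,
    state: Hashable,
    op: Hashable,
    reward: float,
    next_state: Hashable,
    next_ops: Iterable[Hashable],
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> float:
    """One Q-learning step; `reward` is the objective difference of the move.

    `next_ops` is the operator set admissible under the current intervention
    strategy, so changing the strategy changes the backup target.
    """
    best_next = max((q[(next_state, o)] for o in next_ops), default=0.0)
    target = reward + gamma * best_next
    q[(state, op)] += alpha * (target - q[(state, op)])
    return q[(state, op)]


q: QTable = defaultdict(float)
q[("s1", "swap")] = 2.0
q[("s1", "migrate")] = 5.0
```

If only `swap` is admissible in the next state, the backup target for a move with reward 1.0 is 1.0 + 0.9 × 2.0 = 2.8; if an intervention also makes `migrate` admissible, the target rises to 1.0 + 0.9 × 5.0 = 5.5. The same transition is therefore valued differently depending on which operators the intervention strategy allows.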
[0129] The Q-learning update does not learn "which operator is good" in the abstract. Rather, within the evolving cooperative situation, it learns "which kind of local transformation of the cooperative structure is more likely to improve overall value under the joint constraints of the information phase, the risk field, and the communication conditions." Local search still undertakes feasibility repair and local improvement of candidate solutions; it is responsible for quickly pushing a solution back into a high-quality region after a structural transformation. The perturbation mechanism introduces the necessary structural changes when the search stalls, in order to overcome the barriers created by the cooperative constraints. Because the cooperative situation state includes the risk field, the communication capability, and the task information phase, value learning tends to select operators that reduce risk exposure or strengthen the information chain in high-threat or high-interference situations, whereas under stable communication and low risk it may prefer operators that improve task coverage efficiency or reduce intervention cost.
[0130] In manned-unmanned swarm collaboration, the intervention strategy itself must be a learnable object. This is because intervention not only changes cost and risk factors, but more importantly, it alters information availability and the feasible domain structure, thereby changing the form of the optimal solution. The learning objective of the intervention strategy is not to maximize the participation of manned aircraft, but rather to select the most appropriate intervention granularity and role when the collaborative situation changes, enabling the system to achieve an optimal trade-off between task quality, intervention costs, and risk exposure.
[0131] Therefore, this application describes the learning of the intervention strategy as a state-action-feedback process, in which the state is a summary of the cooperative situation. Since the intervention strategy depends mainly on environmental and resource conditions rather than directly on the fine-grained structure of the current solution, the cooperative situation state $s$ is used as the state variable for intervention learning. The intervention action is denoted $a$; it describes the granularity and role of manned aircraft intervention. Granularity determines whether intervention occurs at a mission point, a mission cluster, or a flight segment; role determines whether the manned aircraft undertakes critical execution, high-confidence confirmation, communication relay, or risk suppression. Intervention actions also implicitly involve budget control, i.e., the upper bound on the intervention intensity allowed under the current resource and risk margins. Since these actions change the feasible region and the quality term of the objective function, the key to intervention strategy learning is to define a feedback signal that reflects the marginal value of intervention.
[0132] Within the framework of this application, the effect of an intervention action $a$ is not scored by the "action itself" but by the difference in optimal attainable value that it induces. Suppose that, in cooperative situation $s$, the intervention action $a$ is executed and generates a new intervention strategy $\pi' = \pi \oplus a$, where the symbol $\oplus$ indicates that the action is applied to the strategy to obtain the updated intervention rules and budget configuration. An inner search is then performed in the new feasible region $\mathcal{X}(\pi')$ to obtain the corresponding high-value solution $x^{*}_{\pi'}$. The high-value solution obtained without intervention, within the original feasible region $\mathcal{X}(\pi)$, is denoted $x^{*}_{\pi}$. The feedback of the intervention action is then naturally written as
[0133] $r_I(s, a) = U(x^{*}_{\pi'}; s, \pi') - U(x^{*}_{\pi}; s, \pi)$,
[0134] which expresses the overall value gain that this intervention decision can bring in the current situation. Because the utility $U$ encompasses mission quality, intervention cost, and risk penalty, the feedback rewards interventions that significantly improve the confidence and timeliness of critical missions, and penalizes interventions that bring only cost and exposure with limited benefit. In high-threat situations in particular, manned aircraft intervention can significantly reduce the risk of UAV swarm losses and make some missions feasible; the improvement in quality and the decrease in risk penalty then jointly drive the feedback positive. In a low-threat situation with good communication, however, large-scale intervention is often counterproductive and the feedback naturally turns negative, which suppresses meaningless intervention.
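The sign behaviour of the intervention feedback can be illustrated with a small numerical sketch. The additive utility form (quality minus weighted cost minus weighted risk), the weights, the candidate lists standing in for the result of the inner search, and all numbers below are assumptions made for illustration; the text above only states that quality, intervention cost, and risk penalty are placed on one scale.

```python
from typing import Iterable, Tuple

Candidate = Tuple[float, float, float]  # (task quality, intervention cost, risk penalty)


def utility(quality: float, cost: float, risk: float,
            w_cost: float = 1.0, w_risk: float = 1.0) -> float:
    """Assumed additive utility: quality minus weighted cost and risk."""
    return quality - w_cost * cost - w_risk * risk


def best_value(candidates: Iterable[Candidate]) -> float:
    """Stand-in for the inner search: best utility among feasible candidates."""
    return max(utility(*c) for c in candidates)


def intervention_feedback(with_action: Iterable[Candidate],
                          without_action: Iterable[Candidate]) -> float:
    """Feedback = best attainable utility with the action minus the baseline."""
    return best_value(with_action) - best_value(without_action)


# High threat: the swarm alone is exposed (risk 4); intervention costs 2 but
# raises quality and cuts risk.
high_threat = intervention_feedback([(8.0, 2.0, 1.0)], [(5.0, 0.0, 4.0)])

# Low threat, good communication: intervention adds cost for little gain.
low_threat = intervention_feedback([(8.5, 2.0, 1.0)], [(8.0, 0.0, 1.0)])
```

Under these assumed numbers the high-threat feedback is 5.0 − 1.0 = 4.0 and the low-threat feedback is 5.5 − 7.0 = −1.5, matching the qualitative behaviour described above: intervention is rewarded where it removes risk and enables quality, and suppressed where it only adds cost.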
[0135] Based on the above feedback, the intervention action value function $Q_I(s, a)$ is defined. It denotes the long-term value of selecting intervention action $a$ in cooperative situation $s$. Because intervention affects the evolution of the subsequent situation (for example, it accelerates task confirmation, advances the execution of critical tasks, speeds up resource consumption, and alters the risk exposure structure), the state transition $s \to s'$ also exhibits the Markov property. The value function can therefore be updated in the same temporal-difference form as the inner layer:
[0136] $Q_I(s, a) \leftarrow Q_I(s, a) + \beta \left[ r_I + \gamma \max_{a'} Q_I(s', a') - Q_I(s, a) \right]$,
[0137] where $r_I$ is the feedback of the intervention action (the difference in attainable utility with and without the action), $\beta$ is the learning rate of intervention value learning, and $\gamma$ is the discount factor. The semantics of the update are as follows: under the current risk field, communication conditions, and task information phase, if a certain intervention granularity and role selection generates higher overall value in the subsequent combinatorial search, its value estimate is adjusted upwards and it is selected more frequently in similar situations.
[0138] Intervention strategy learning and the inner search are not two independent learning processes. The feedback of an intervention action depends on the high-value solutions achievable within the updated feasible domain, while the effectiveness of the inner search is shaped by the feasible domain and the risk and communication structure that the intervention produces. By defining the intervention feedback as "the difference in optimal achievable value before and after intervention," intervention learning naturally uses the result of the combinatorial search as its evaluation criterion, while the combinatorial search seeks better solutions under the structural conditions provided by the intervention. In this way, intervention learning does not degenerate into parameter tuning for a fixed heuristic; it learns, under collaborative semantics, "when the manned aircraft should undertake key execution, how the two types of platforms can complement each other in information and risk, and how the intervention granularity should match the task cluster structure."
[0139] The intervention action space does not need to be written as an enumerated list; it can be characterized by several continuous or discrete parameters, such as an intervention granularity parameter, an intervention role parameter, and a budget intensity parameter. Importantly, these parameters always carry a clear collaborative meaning: granularity determines whether the intervention acts at a task point, a task cluster, or a flight segment; role determines whether the manned aircraft undertakes execution, confirmation, or relay support; and budget intensity determines the sustainability and the risk ceiling of the intervention. By learning the intervention action value function, the system can adaptively select intervention parameters under different collaborative situations, so that the capabilities of the manned aircraft are used where they are most needed and complement the distributed capabilities of the UAV swarm.
[0140] In one specific embodiment of this application, S500 includes:
[0141] S510: when the utility of the current collaborative scheduling solution decreases beyond a preset threshold due to the advancement of the task information phase, a change in risk field intensity, a decline in communication capability, a decrease in resource reserves, or a change in platform location, or when the current collaborative scheduling solution loses its feasibility, a rolling update is triggered;
[0142] S520: under the current collaborative situation, the intervention strategy is updated based on the intervention action value function, so that the updated intervention strategy matches the current risk field, communication capability, and resource reserves;
[0143] S530: within the new feasible solution set induced by the updated intervention strategy, the cooperative scheduling solution is structurally reconstructed by iterative local search guided by the operator value function;
[0144] S540: a joint planning scheme consisting of the updated intervention strategy and the updated cooperative scheduling solution is output.
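The control flow of steps S510 to S540 can be sketched as a single function. This is a schematic under stated assumptions: the callables `utility`, `is_feasible`, `select_strategy`, and `restructure` are hypothetical placeholders for the utility function, the feasibility check, the value-based intervention update, and the operator-value-guided iterative local search; the comparison against a stored reference utility is one possible reading of "decreases beyond a preset threshold".

```python
from typing import Any, Callable, Tuple


def rolling_update(
    state: Any,
    strategy: Any,
    solution: Any,
    reference_utility: float,   # utility of the solution when it was last planned
    threshold: float,           # preset threshold on utility degradation
    utility: Callable[[Any, Any, Any], float],
    is_feasible: Callable[[Any, Any, Any], bool],
    select_strategy: Callable[[Any, Any], Any],
    restructure: Callable[[Any, Any, Any], Any],
) -> Tuple[Any, Any, bool]:
    """Return (strategy, solution, triggered) after one monitoring step."""
    # S510: trigger on loss of feasibility or on utility degradation.
    current = utility(solution, state, strategy)
    infeasible = not is_feasible(solution, state, strategy)
    degraded = (reference_utility - current) > threshold
    if not (infeasible or degraded):
        return strategy, solution, False  # keep executing the current plan

    # S520: update the intervention strategy first.
    new_strategy = select_strategy(state, strategy)
    # S530: restructure the solution inside the newly induced feasible set.
    new_solution = restructure(state, new_strategy, solution)
    # S540: output the joint scheme.
    return new_strategy, new_solution, True
```

Updating the strategy before the solution follows the ordering of S520 and S530: the restructuring step receives the new strategy, so it never searches in a feasible set that the intervention update is about to replace.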
[0145] The core objects maintained by the method of this application during task execution are the cooperative situation state $s$, the collaborative intervention strategy $\pi$, and the coordinated scheduling solution $x$. The state $s$ captures the environmental and resource context needed for manned-unmanned swarm collaborative decision-making: the task information phase describes the progress of each task from unconfirmed to confirmed and then to executed; the platform locations and timestamps provide the geometric basis for rendezvous and temporal ordering; the resource reserves describe the sustained execution capability of both types of platforms and the marginal cost of intervention; the communication capability characterizes the maintainability of the collaborative link and the quality of information feedback; and the risk field limits the reachability of the UAV swarm and the exposure cost of manned intervention. The intervention strategy $\pi$ defines the granularity and role of manned aircraft intervention and induces the feasible domain $\mathcal{X}(\pi)$ under budget and risk constraints. The collaborative solution $x$ is the joint scheduling and path scheme given within the feasible domain; it simultaneously encodes task allocation, execution order, path structure, and the timing with which confirmation-execution dependencies are satisfied. The object of online collaborative planning is therefore not a single "path" but the triple $(s, \pi, x)$, in which $\pi$ and $x$ must both be updated as $s$ evolves.
[0146] In manned-unmanned swarm collaborative missions, the evolution of the state $s$ has two sources. The first is change in the external environment, such as local strengthening of the risk field or fading of communication conditions, which causes previously feasible flight segments or cooperative behaviors to violate the constraints. The second is execution progress within the system itself: the UAV swarm completes reconnaissance and confirmation and advances the task information phase; platform navigation and mission execution consume resources and lower the reserves; or a change in platform position alters communication visibility. The common result of both types of change is that the feasible region $\mathcal{X}(\pi)$ induced by the current intervention strategy may contract or expand, the utility $U$ of the current solution $x$ may rise or fall, and the current solution may even become infeasible as constraints tighten. The "planning request" is therefore not an external instruction but an endogenous demand triggered by state evolution: when the combination of $\pi$ and $x$ no longer represents a reasonable collaborative structure under the current situation, an update process is needed to re-evaluate the intervention strategy and reshape the structure of the collaborative solution.
[0147] To give such update requirements a criterion that is unified, computable, and consistent with the collaborative semantics, this application uses the unified utility function $U$ as the evaluation scale. The utility function places task completion quality, intervention cost, and risk penalty on the same scale, so that the impact of state changes on collaboration quality is reflected directly. When the information phase advances and more critical tasks meet their execution conditions, structural adjustment has room to improve task benefit; when the risk field strengthens or communication declines, increasing exposure risk or reducing task quality, the utility of the current solution decreases and the marginal value of the intervention strategy may change. Because the quality term of the utility function is allowed to depend on the intervention strategy $\pi$, a state change affects not only whether the current solution is good, but also whether intervention is worthwhile in the current situation and at what granularity it is more cost-effective. This is precisely the core of manned-unmanned swarm collaboration: intervention is not a fixed rule, but a decision whose marginal value changes with the situation.
[0148] In this sense, the operation of online collaborative planning can be understood as a consistent chain of information flow: a state update produces a new state $s$; the new state changes the value assessment of intervention decisions and the structure of the feasible domain; the update of the intervention strategy $\pi$ alters the feasible domain $\mathcal{X}(\pi)$ and the achievable level of task quality; the cooperative solution $x$ is then structurally rewritten within the updated feasible domain to improve utility; and the utility difference is used as feedback to update the intervention value function and the operator value function.
[0149] The key issue in online rolling updates is deciding when to start an update iteration. This depends on whether the change in the collaborative situation $s$ is large enough to affect the boundary of the feasible region $\mathcal{X}(\pi)$ or to change the utility significantly. When the state change is small and the current solution remains stable under the utility metric, no update is triggered and the current solution $x$ continues to be executed; otherwise, an update should be triggered. Because $s$ simultaneously covers the information phase, platform geometry, resource reserves, communication capability, and risk field, this triggering logic in essence judges, under collaborative semantics, whether "the current intervention strategy and the current solution structure still match the situation."
[0150] When the task information phase advances, triggering an update has a direct collaborative meaning. Confirmation completed by the UAV swarm moves some tasks from a non-executable state to an executable state. This alters the adjustable ordering space in the confirmation-execution dependency chain and changes the likelihood that critical tasks are completed with high quality within their time limits. Keeping the original execution sequence at this point often causes unnecessary loss of benefit, because the original solution was built on an earlier confirmation state and its structure may not make adequate use of the newly available information. Therefore, once the advance of the information phase changes the execution conditions of critical tasks, an update must be triggered to reconfigure cross-platform allocation and execution order, so that the information advantage of the UAV swarm and the execution capability of the manned aircraft are better coupled in the new information phase.
[0151] When the risk field changes, triggering an update is usually related directly to feasibility or to risk cost. Increased risk may cause the risk exposure of certain flight segments to exceed the budget, rendering the original plan infeasible; a change in the risk structure may also significantly worsen the risk penalty of the original plan, so that it is no longer reasonable on the utility scale even if it remains feasible. Such triggers are particularly critical for manned-unmanned swarm collaboration, because a change in risk often means a "shrinkage of the swarm's reachability boundary," while manned intervention can restore part of that boundary at a certain cost, or take over high-risk execution to reduce swarm losses. Therefore, when a change in the risk field alters the reachability or the risk exposure structure of the UAV swarm, an update should be triggered so that intervention value assessment and solution structure rewriting can jointly respond to the new risk constraints.
[0152] When the communication capability changes, the triggered update reflects a change in the maintainability of the collaborative link. Communication fading lowers the quality of information return and affects the stability of collaborative execution, thereby reducing task completion quality, and can even render certain task structures that rely on the cooperative link infeasible. In such cases, the significance of the update lies not only in changing the path but also in resetting the role of the manned aircraft in the intervention strategy, so that it can provide relay support or modify the cooperative structure to reduce sensitivity to communication conditions when necessary. Because the communication capability is explicitly included in the state variable and enters the utility scale through the quality and risk terms, an update triggered by a change in communication conditions is not an external rule but a collaborative response required naturally by the utility structure.
[0153] When the resource reserves or the platform locations change, updates are triggered by changes in cost boundaries and temporal reachability. Decreasing resource reserves raise the marginal cost of further intervention and compress the space for sustainable intervention, which requires the intervention strategy to re-evaluate intervention granularity and task priority. Changes in location and the passage of time alter rendezvous timing and reachability boundaries, and may render previously reasonable task cluster partitioning and allocation structures inefficient or even infeasible. For the manned aircraft, location and resource status particularly affect its suitability for performing certain critical tasks; for the UAV swarm, location changes alter coverage efficiency and confirmation timing. These changes appear in the search as room for improvement or as losses through the utility difference; therefore, when changes in resource reserves or platform locations lead to a decrease in utility or a tightening of the feasible domain boundary, an update should be triggered to rematch the collaborative structure.
[0154] Once an update is triggered, the value of intervention actions is first reassessed in the cooperative situation state $s$ using the intervention action value function, which yields an updated intervention strategy $\pi'$ that matches the current risk and communication conditions and resource reserves. Subsequently, within the updated feasible domain $\mathcal{X}(\pi')$ and based on the joint state $z = (s, \phi(x))$, iterative local search guided by the operator value function reconstructs and improves the solution structure and generates a new collaborative solution $x'$.
[0155] After the rolling update is triggered, the method of this application first updates the collaborative intervention strategy. The intervention strategy $\pi$ induces the feasible region $\mathcal{X}(\pi)$ and, by altering information availability, risk reachability boundaries, and communication and collaboration capabilities, directly affects the achievable level of task quality. If the combinatorial search were run directly under an intervention strategy that has not yet been matched to the new situation, it would consume computation in an unsuitable feasible region, and the resulting structural rewriting could become meaningless once the strategy is later updated, or could even lose feasibility because the feasible domain boundary has changed. Therefore, the intervention update must precede the solution structure update, so that the subsequent search always proceeds within capability boundaries and budget constraints consistent with the current cooperative situation.
[0156] The input to the intervention decision update is the cooperative situation state $s$. Under the combined effect of its components, the core of intervention decision-making is not simply to answer "whether the manned aircraft should intervene," but to choose the granularity and role of intervention under cost and risk constraints, so that intervention produces the greatest utility improvement where capability compensation is most needed.
[0157] To give this choice a uniform value scale, this application uses the intervention action value function $Q_I(s, a)$ defined above. The intervention action $a$ denotes an adjustment of the intervention made in response to the current situation $s$. Semantically, the action corresponds to an update of the strategy, i.e., $\pi' = \pi \oplus a$, which manifests as a change in intervention granularity, a change in intervention role, or a redistribution of intervention budget intensity. A change in granularity corresponds to a change in the object of intervention: when critical tasks require high-confidence confirmation or high-risk execution, intervention tends to operate at the task-point level; when the entire UAV swarm is affected by communication fading, intervention tends to provide continuous support at the task-cluster level; when the risk field causes a significant contraction in the reachability of the UAV swarm, intervention may take the form of segment-level support that changes the accessibility of certain areas. A change in role corresponds to the way the manned aircraft compensates for missing capability: it can undertake critical execution to reduce the risk of UAV swarm losses in high-threat areas, undertake high-confidence confirmation to advance the information phase, or undertake communication relay to restore the maintainability of the collaborative link. An adjustment of budget intensity is reflected in the control of intervention scale and frequency, which determines the scope and duration over which manned aircraft capabilities are used and sets the upper bounds on intervention cost and risk exposure.
[0158] The value of an intervention is determined by the overall utility increase it brings. Specifically, in the current situation $s$, the intervention action $a$ is executed to obtain $\pi' = \pi \oplus a$, and a new collaborative solution $x^{*}_{\pi'}$ is obtained in the new feasible domain through the inner-layer combinatorial search. Compared with the baseline solution $x^{*}_{\pi}$ obtained without changing the intervention strategy, the feedback of the intervention is given by the utility difference:
[0159] $r_I(s, a) = U(x^{*}_{\pi'}; s, \pi') - U(x^{*}_{\pi}; s, \pi)$,
[0160] This difference simultaneously reflects the improvement in task quality, the increase in intervention cost, and the change in risk exposure brought about by intervention. It can therefore express naturally, under collaborative semantics, whether the marginal benefit of intervention is sufficient to cover its cost. When the task information phase indicates that a critical task has not yet been confirmed and its deadline is tight, an intervention that significantly improves the speed or confidence of confirmation is reflected as positive feedback through a rise in the quality term. When the risk field indicates that the risk to the UAV swarm in critical areas is too high, an intervention that shifts critical execution to the manned aircraft, or provides support that reduces swarm exposure, is reflected as positive feedback through a decrease in the risk term. When the resource state indicates that manned aircraft resources are scarce or the budget is tight, the feedback may become negative even if intervention improves quality, because the cost penalty is too large; excessive intervention is thereby inhibited. The outcome of the intervention update is therefore cost-sensitive: it does not tend to increase the intensity of collaboration unconditionally, but selects the intervention granularity and role that generate the maximum marginal value on the utility scale.
[0161] During the update of $\pi$, the intervention value function is updated in temporal-difference form:
[0162] $Q_I(s, a) \leftarrow Q_I(s, a) + \beta \left[ r_I + \gamma \max_{a'} Q_I(s', a') - Q_I(s, a) \right]$,
[0163] where $r_I$ is the utility-difference feedback of the intervention, $\beta$ is the learning rate, $\gamma$ is the discount factor, and $s'$ is the new collaborative situation after the intervention adjustment and subsequent execution. This update allows the system to accumulate, step by step, empirical patterns of intervention decisions under different collaborative situations. For manned-unmanned swarm collaboration, the transferable part of these patterns stems precisely from the semantic structure of the state variables: when the communication capability declines, the value of intervention actions involving relay support rises; when the risk field strengthens and critical tasks still need to be completed, the value of interventions involving critical execution rises; when the task information phase stagnates but critical tasks depend on confirmation, the value of interventions involving high-confidence confirmation rises; and when resources approach their limit, the overall value of intervention actions is suppressed, which leads to a more conservative intervention strategy. The updated intervention strategy $\pi'$ is therefore not a hand-written rule but the best response to the current cooperative situation in the sense of utility difference, and it provides a feasible domain boundary and capability compensation structure consistent with the situation for the subsequent solution structure update.
[0164] After the intervention strategy has been updated to $\pi'$, the method of this application enters the solution structure update stage, in which a new collaborative planning solution $x'$ is generated within the new feasible domain $\mathcal{X}(\pi')$. Because $\mathcal{X}(\pi')$ already unifies information priority, risk budget, communication maintainability, resource constraints, and intervention budget into a single feasibility boundary, the structure update no longer needs to "patch" feasibility through external rules; it improves utility continuously within the feasible domain through structural transformations. The key to this stage is not to solve from scratch but to rewrite the current solution structure iteratively where it is deficient, so that the collaborative structure adapts to the new intervention capability boundary and the new collaborative situation.
[0165] In addition to the cooperative situation state $s$, the input to the solution structure update includes the structural feature mapping $\phi(x)$ of the current solution $x$, which together form the joint state $z = (s, \phi(x))$. The significance of introducing $\phi(x)$ is that, in manned-unmanned swarm collaboration, the effectiveness of a local transformation depends strongly on the coupling structure of the current solution. The situation state $s$ alone cannot distinguish these structural differences, so the joint state $z$ is a necessary condition for value-guided search to respond correctly to the collaborative structure.
[0166] The solution structure is updated by applying a structured operator $o \in \mathcal{O}(\pi')$ to the solution $x$ to generate a candidate solution $x' = o(x)$. The semantics of an operator must correspond directly to an adjustment of the collaborative coupling relationships: it may alter the cross-platform allocation structure, rearrange the confirmation-execution chain, or reorganize paths and task clusters in risk-sensitive segments. Because the operator set $\mathcal{O}(\pi')$ is closed with respect to the feasible region $\mathcal{X}(\pi')$, no permitted operator can move a candidate solution out of the feasible region. The solution structure update can therefore focus on utility improvement without oscillating among infeasibility penalties. The quality of a candidate solution is evaluated by the utility difference
[0167] $\Delta U = U(x'; s, \pi') - U(x; s, \pi')$,
[0168] This difference has a clear semantic interpretation for manned-unmanned swarm collaboration. If an operator migrates high-risk critical executions from the unmanned swarm to manned aircraft and significantly reduces the exposure risk of the unmanned swarm, while the improvement in critical task quality exceeds the increase in intervention costs, then the difference is positive. If an operator rearranges and confirms tasks, allowing critical dependent tasks to enter the execution window earlier, thereby improving timeliness and confidence, then the difference is positive. If an operator leads to an increase in intervention costs but limited improvement in task quality, or causes an increase in the exposure risk of manned aircraft to exceed the benefits brought by the reduction in the risk of the unmanned swarm, then the difference is negative.
[0169] To enable the search process to select the direction of structural transformation adaptively under different cooperative situations and different solution structures, this application uses the operator value function $Q_O(z, o)$ to guide operator selection. For any joint state $z = (s, \phi(x))$, it denotes the expected cumulative utility obtainable by selecting operator $o$ in that state and continuing with subsequent structural rewriting. Since the state transition is determined jointly by the structural change of the candidate solution and by execution progress, the new joint state corresponding to the candidate solution $x'$ is $z' = (s', \phi(x'))$, where $s'$ reflects the new situation after changes in the task information phase, resource consumption, risk exposure, and communication conditions. The operator value is again updated in temporal-difference form:
[0170] $Q_O(z, o) \leftarrow Q_O(z, o) + \alpha \left[ \Delta U + \gamma \max_{o' \in \mathcal{O}(\pi')} Q_O(z', o') - Q_O(z, o) \right]$,
[0171] where $\Delta U$ is the utility difference of the candidate solution, $\alpha$ is the learning rate, $\gamma$ is the discount factor, and the alternative operators $o'$ are still drawn from the operator set that maintains feasibility closure under the current intervention strategy. This update enables the system to learn progressively "which type of structural rewriting is more likely to improve utility" under different cooperative situations and different solution structures. Because these patterns come from accumulated learning on utility differences rather than from manual definition, the search process maintains consistent cost sensitivity under dynamic cooperative situations.
[0172] During the iterative process, the search does not rely solely on local improvements at a single scale. The combinatorial space of collaborative planning contains structural barriers formed by information sequence constraints and risk budgets, which are often difficult to overcome through minor modifications alone. Therefore, iterative local search alternates between structural perturbation and local improvement, enabling the solution to leave its current structural neighborhood and enter a new collaborative structural region. The perturbation phase introduces significant adjustments to cross-platform allocation, task cluster partitioning, or path structure, thereby obtaining a new starting point for exploration; the local improvement phase rapidly performs small-scale structural repairs and utility enhancements around this starting point, allowing the solution to quickly return to a high-quality feasible region. Since operator selection is guided by the operator value function, perturbation and improvement are not blindly random, but adaptively select the structural rewriting direction that is more likely to bring positive utility differences as the situation and solution structure change. The final solution obtained, together with the updated intervention strategy, constitutes the rolling-updated planning output; the two jointly maximize the utility function under the current collaborative situation and provide a new baseline for the next round of state evolution and value update.
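The alternation between perturbation and local improvement, with value-guided operator selection, can be sketched on a toy allocation problem. This is a minimal sketch under stated assumptions, not the implementation of this application: the `GAIN` table, the operator names, the coarse state feature (number of tasks assigned to the manned aircraft), the epsilon-greedy rule, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical per-task net gain when flown by the unmanned swarm ("U")
# or by the manned aircraft ("M"), already net of cost and risk.
GAIN = {"U": [3, 1, 4, 0, 2, 1], "M": [2, 3, 1, 4, 2, 5]}


def utility(sol):
    return sum(GAIN[p][i] for i, p in enumerate(sol))


def flip(sol, idxs):
    return tuple(("M" if p == "U" else "U") if i in idxs else p
                 for i, p in enumerate(sol))


def swap_two(sol, rng):
    i, j = rng.sample(range(len(sol)), 2)
    s = list(sol)
    s[i], s[j] = s[j], s[i]
    return tuple(s)


# Small moves for local improvement, large moves for perturbation.
LOCAL_OPS = {"flip_one": lambda s, rng: flip(s, {rng.randrange(len(s))}),
             "swap_two": swap_two}
PERTURB_OPS = {"flip_three": lambda s, rng: flip(s, set(rng.sample(range(len(s)), 3))),
               "reverse": lambda s, rng: tuple(reversed(s))}


def q_guided_ils(init, rounds=50, local_steps=10,
                 eps=0.2, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)

    def apply_learned(sol, ops):
        state, names = sol.count("M"), sorted(ops)
        if rng.random() < eps:                      # explore
            name = rng.choice(names)
        else:                                       # exploit learned values
            name = max(names, key=lambda n: q[(state, n)])
        cand = ops[name](sol, rng)
        reward = utility(cand) - utility(sol)       # utility difference
        nxt = cand.count("M")
        best_next = max(q[(nxt, n)] for n in names)
        q[(state, name)] += alpha * (reward + gamma * best_next - q[(state, name)])
        return cand, reward

    best = init
    for _round in range(rounds):
        current, _ = apply_learned(best, PERTURB_OPS)    # leave the neighborhood
        for _step in range(local_steps):
            cand, reward = apply_learned(current, LOCAL_OPS)
            if reward > 0:                               # keep only improvements
                current = cand
        if utility(current) > utility(best):             # acceptance criterion
            best = current
    return best, q


init = ("U",) * 6
best, q_table = q_guided_ils(init)
```

The incumbent `best` is replaced only by a strictly better solution, so the returned solution is never worse than the starting one. Feasibility checks are omitted here because every assignment in the toy problem is treated as feasible.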
[0173] The difficulty of manned-unmanned swarm collaborative planning stems not only from the scale of the solution space but also from the strong coupling of collaborative constraints. The mission information phase determines whether the confirmation-execution dependency is satisfied; the risk field determines the platform's reachability boundary and exposure cost; communication capabilities determine whether collaborative behavior can be maintained; and resource reserves and intervention budget jointly determine the sustainability of manned-unmanned intervention. Because these constraints continuously change during online execution, if the planning process lacks a stable feasibility maintenance mechanism, two unacceptable scenarios can easily occur: first, frequent occurrences of solutions violating collaborative constraints during the search process lead to utility difference being dominated by numerous infeasibility penalties, resulting in unstable value learning; second, when external circumstances change abruptly, the currently executed solution suddenly becomes infeasible. Without a rapid regression mechanism, the collaborative system will stagnate in an infeasible state, potentially even causing critical mission failure. The method in this application has internalized feasibility maintenance as the principle of "intervention strategy to induce feasible domain and structured operators to maintain closure". This section explains at the application process level how this principle ensures the feasibility stability of online operation, and explains how the method can bring the solution back into the feasible domain through intervention update and structure rewriting when the situation changes suddenly and the feasible domain boundary tightens.
[0174] Given a cooperative situation and an intervention strategy, the feasible region is defined as the set of collaborative solutions that simultaneously satisfy the information sequence, risk budget, communication sustainability, resource, and intervention budget constraints. Unlike traditional path planning, which only emphasizes geometric reachability, feasibility here is first reflected in the correctness of the information chain: for tasks with confirmation dependencies, the confirmation action must occur before the execution action, and the confirmation quality and timeliness must be able to support subsequent execution. Secondly, feasibility is reflected in the boundaries of risk and budget: UAV swarms must not be planned to enter areas exceeding the risk budget, manned aircraft intervention must not exceed the upper limit allowed by the intervention budget and its own resource margin, and task structures relying on high-intensity collaborative links must not be planned when communication capabilities are insufficient. Because these conditions are uniformly written into the definition of the feasible region, as long as the search always moves within the feasible region, the solution remains executable in terms of collaborative semantics.
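The five constraint groups that define the feasible region can be sketched as a single predicate. This is a minimal sketch assuming scalar summaries of each constraint; the class names `Situation` and `Plan`, their fields, and the numeric values are hypothetical and only illustrate the structure of the check.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    risk_budget: float
    intervention_budget: float
    link_capacity: float
    resource_reserve: float


@dataclass
class Plan:
    confirm_done: dict    # task -> confirmation completion time
    execute_start: dict   # task -> execution start time
    uav_risk: float
    intervention_cost: float
    link_demand: float
    resource_use: float


def violated_constraints(plan, sit):
    """Return the names of all violated constraint groups (empty if feasible)."""
    violated = []
    # Information sequence: confirmation must finish before execution starts.
    if any(plan.confirm_done[t] >= plan.execute_start[t]
           for t in plan.confirm_done if t in plan.execute_start):
        violated.append("information_sequence")
    if plan.uav_risk > sit.risk_budget:
        violated.append("risk_budget")
    if plan.link_demand > sit.link_capacity:
        violated.append("communication")
    if plan.resource_use > sit.resource_reserve:
        violated.append("resource")
    if plan.intervention_cost > sit.intervention_budget:
        violated.append("intervention_budget")
    return violated


def is_feasible(plan, sit):
    return not violated_constraints(plan, sit)


plan = Plan({"T1": 5.0}, {"T1": 8.0}, uav_risk=0.3,
            intervention_cost=2.0, link_demand=1.0, resource_use=40.0)
calm = Situation(risk_budget=0.5, intervention_budget=3.0,
                 link_capacity=2.0, resource_reserve=100.0)
tense = Situation(risk_budget=0.2, intervention_budget=3.0,
                  link_capacity=2.0, resource_reserve=100.0)
```

Here the same plan is feasible under `calm` and violates only the risk budget under `tense`, which is the kind of boundary contraction that a change in the risk field would produce.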
[0175] To keep the search stably within the feasible region, this application designs the set of structured operators as a set of closed transformations over the feasible region. Specifically, when the intervention strategy is fixed, the set of allowed operators satisfies the following condition: for any feasible solution and any allowed operator, the solution obtained by applying the operator still belongs to the feasible region. Closure does not mean that operators cannot perform significant restructuring, but rather that, regardless of how operators rearrange cross-platform allocation and execution order, they must respect the semantic structure of collaborative constraints: if an operator changes the order of confirmation tasks, the execution order of dependent tasks must be adjusted synchronously to maintain sequential dependencies; if an operator migrates tasks from UAV swarms to manned aircraft, intervention costs and exposure risks must be within budget, and the temporal reachability of manned aircraft must be satisfied; if an operator adjusts the task cluster structure to change communication support relationships, the new structure must be compatible with current communication capabilities. This closure allows the search process to focus on utility enhancement rather than repeatedly repairing infeasible solutions, thus ensuring that the utility difference feedback has a stable semantic meaning: it reflects the value comparison between feasible collaborative structures, rather than a penalty-driven oscillation between feasible and infeasible solutions.
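One simple way to obtain closure in practice is to admit only those operators whose result remains feasible. The sketch below assumes this filtering approach and checks only the confirmation-before-execution dependency; the sequence encoding, the `swap` operator factory, and the operator names are hypothetical.

```python
def is_feasible(seq):
    """Every task's 'confirm' step must come before its 'execute' step."""
    pos = {step: i for i, step in enumerate(seq)}
    tasks = {task for task, _kind in seq}
    return all(pos[(t, "confirm")] < pos[(t, "execute")]
               for t in tasks
               if (t, "confirm") in pos and (t, "execute") in pos)


def swap(i):
    """Build an operator that swaps the steps at positions i and i + 1."""
    def op(seq):
        s = list(seq)
        s[i], s[i + 1] = s[i + 1], s[i]
        return tuple(s)
    return op


def allowed_operators(seq, ops):
    """Keep only operators that map this feasible solution to a feasible one."""
    return {name: op for name, op in ops.items() if is_feasible(op(seq))}


seq = (("A", "confirm"), ("A", "execute"), ("B", "confirm"), ("B", "execute"))
ops = {f"swap_{i}": swap(i) for i in range(len(seq) - 1)}
allowed = allowed_operators(seq, ops)
```

In this example only `swap_1` survives, because the other two swaps would place an execution step before its own confirmation step.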
[0176] However, one situation cannot be avoided in online execution: a sudden change in the external situation can render the current solution infeasible, and this cannot be prevented in advance even if the search process remains closed. The most typical sudden changes are a sudden intensification of the risk field, a sudden drop in communication capability, or a rapid depletion of resource reserves. When the risk field intensifies, certain segments of the original path may no longer be permissible under the risk budget, leading to infeasible drone swarm paths or excessively high coordination risks; when communication capabilities decline, mission structures that originally relied on stable data transmission or coordinated formations may lose sustainability, significantly reducing mission quality and becoming unacceptable on the utility scale; when resource reserves decrease or intervention budgets tighten, the originally planned allocation for manned aircraft intervention may no longer be sustainable, thus depriving subsequent critical missions of capability support. A common characteristic of these sudden changes is that they all cause the boundary of the feasible region to shrink, so that the current solution may no longer belong to the new feasible region.
[0177] Faced with this contraction, the regression mechanism of this application does not rely on additional heuristic rules, but strictly uses the rolling update chain. First, under the new situation, the value of intervention actions is reassessed and the intervention strategy is updated, so that the new intervention granularity, roles, and budget allocation maximize overall utility under the new constraints. The role of the intervention strategy update is to alter the degree of contraction of the feasible domain boundary through capability compensation: when communication fading renders collaborative links unsustainable, manned aircraft providing relay support may make some collaborative behaviors feasible again; when increased risk prevents drone swarms from entering critical areas, manned aircraft providing critical execution or support may make critical tasks executable again; when resource constraints make intervention unsustainable, the intervention strategy update can also make a limited intervention budget more effective on the utility scale by reducing low-marginal-value interventions and concentrating resources on critical tasks. After the intervention update is completed and the new feasible domain is determined, a new collaborative solution is generated within that domain through value-guided iterative local search: the cross-platform allocation, the confirmation-execution chain, and the risk exposure structure are restructured to bring the solution back into the feasible domain and restore its utility level as far as possible.
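The order of the rolling update chain (re-select the intervention, derive the induced feasible domain, then re-search inside it) can be sketched as follows. This is a minimal sketch under stated assumptions: the value table, the relay capacity bonus of 2.0, the candidate plans, and the exhaustive `search` function are hypothetical, and `search` merely stands in for the value-guided iterative local search.

```python
def rolling_update(situation, actions, q_intervention, induce_feasible, search):
    """Return (intervention, plan) after a change in the situation."""
    # 1) Re-evaluate intervention actions under the new situation.
    label = situation["label"]
    action = max(actions, key=lambda a: q_intervention.get((label, a), 0.0))
    # 2) The chosen intervention induces a new feasibility predicate.
    feasible = induce_feasible(situation, action)
    # 3) Re-search inside the induced feasible domain.
    return action, search(feasible)


# Hypothetical toy setting: the communication link has faded.
situation = {"label": "comm_fade", "link_capacity": 1.0}
actions = ["none", "relay"]
q_intervention = {("comm_fade", "none"): 0.2, ("comm_fade", "relay"): 1.5}


def induce_feasible(sit, action):
    # Assumption: a manned relay adds 2.0 units of usable link capacity.
    capacity = sit["link_capacity"] + (2.0 if action == "relay" else 0.0)
    return lambda plan: plan["link_demand"] <= capacity


candidates = [
    {"name": "tight_formation", "link_demand": 4.0, "utility": 9.0},
    {"name": "loose_formation", "link_demand": 2.5, "utility": 7.0},
    {"name": "independent", "link_demand": 0.5, "utility": 3.0},
]


def search(feasible):
    # Stand-in for value-guided iterative local search over feasible plans.
    pool = [p for p in candidates if feasible(p)]
    return max(pool, key=lambda p: p["utility"]) if pool else None


action, plan = rolling_update(situation, actions, q_intervention,
                              induce_feasible, search)
```

With the relay selected, the loose formation becomes feasible again; without it, only the independent plan would fit the faded link, which illustrates how capability compensation changes the contracted boundary.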
[0178] In this regression process, utility difference still provides a consistent evaluation metric. When the current solution changes from feasible to infeasible, the utility function reflects its unacceptability through risk penalty terms or quality decay. New intervention strategies and structural rewriting aim to improve utility, pushing the solution back to a region where it is executable and of higher value under the new situation. More importantly, the regression process also generates learning signals for intervention value and operator value, enabling the system to select effective intervention roles and structural rewriting directions more quickly when facing similar sudden changes in the future. Thus, feasibility maintenance and regression under sudden situational change are not two parallel mechanisms, but different manifestations of the same rolling update chain under a normal situation and under a sudden change: under a normal situation, the feasible domain boundary is stable, and the search continuously improves under the protection of closure; under a sudden change, the feasible domain boundary shrinks, and intervention updates and structural reconstruction jointly pull the solution back into the new feasible domain, restoring the highest possible value under the utility metric.
[0179] After the online rolling update is completed, the output of the method in this application is not a single path or a single allocation table, but a joint planning output composed of the intervention strategy and the cooperative solution. The intervention strategy specifies the granularity, roles, and budget allocation of manned aircraft intervention in the current collaborative situation; the cooperative solution gives the joint scheduling and path scheme within the feasible domain induced by the intervention strategy. The significance of outputting both is that, in manned-unmanned swarm collaboration, the executability and quality upper bound of the scheme are determined not only by the path structure but also by the capability compensation provided by the intervention strategy. If only the cooperative solution is output without the intervention strategy, the solution lacks a clear premise: one cannot determine whether it relies on manned relay, whether it relies on manned aircraft for critical execution or confirmation, or whether its cost and risk budget is valid. If only the intervention strategy is output without the cooperative solution, the value of the intervention strategy in the combinatorial space cannot be tested.
[0180] Intervention interpretation is based on the intervention action value function. Under the current collaborative situation, the intervention update selects the intervention action with the largest value, from which the new intervention strategy is derived. Therefore, the intervention strategy can be explained as the answer to the question "given the current information stage, communication capabilities, risk environment, and resource budget, what level of intervention and role can bring the greatest expected utility improvement?" When the system chooses a manned aircraft to act as a communication relay, the explanation is that this intervention has high value under the current situation; when the system chooses to have a manned aircraft perform the critical execution, the explanation is that, under the current situation, this intervention can significantly reduce the risk exposure of drone swarms and improve the quality of critical missions; when the system chooses to reduce the intensity of intervention, the explanation is that, under the current situation and budget boundary, the marginal cost of continuing intervention exceeds the marginal benefit.
[0181] The structural transformation interpretation is based on the operator value function, whose joint-state input simultaneously reflects the current collaborative situation and the current solution structure. The inner search selects structured operators with higher operator values to rewrite the solution structure, so its interpretation can be expressed as "under the current risk and communication conditions and the current cross-platform coupling structure, which type of local structural rewriting is most likely to bring about utility improvement". When the system chooses to perform cross-platform migration rewriting, the interpretation is that the rewriting can reduce the overall risk penalty or improve the quality of critical tasks under the current risk field and budget boundary; when the system chooses to perform confirmation-execution chain rearrangement, the interpretation is that the rewriting can improve time limit satisfaction and confidence under the current information stage; when the system chooses to perform task cluster structure reconstruction, the interpretation is that the rewriting can improve collaborative maintainability and reduce quality degradation under the current communication capabilities.
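A value-based explanation of this kind can be reduced to reporting the chosen option, its value, and its margin over the runner-up. The sketch below assumes a plain dictionary of learned values for one state; the function name `explain_choice` and the example values are hypothetical, and the same helper applies to either intervention actions or structured operators.

```python
def explain_choice(values):
    """Summarize why the top-valued option was chosen in the current state.

    `values` maps option names to learned values and needs at least two entries.
    """
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_value), (second, second_value) = ranked[0], ranked[1]
    return {
        "chosen": best,
        "value": best_value,
        "runner_up": second,
        "margin": best_value - second_value,
    }


# Hypothetical intervention values under a communication-fading situation.
report = explain_choice({"relay": 1.5, "critical_execution": 0.9, "reduce": 0.2})
```

A small margin signals that the decision is close and that a modest change in the situation could reverse it, which is useful context for a pilot or operator reviewing the recommendation.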
[0182] This value function-based interpretability has direct implications for manned-unmanned swarm collaboration. Manned aircraft intervention often involves pilot workload and risk exposure, requiring an understandable justification for intervention decisions; similarly, adjustments to the mission structure of unmanned swarms can affect the collaboration links and execution rhythm, necessitating an understandable justification for structural rewriting.
[0183] In summary, the application process discussed in this chapter ultimately yields a joint planning output together with the explanatory basis provided by the two types of value functions. The intervention strategy output explicitly provides the granularity, roles, and budget boundaries of manned aircraft collaboration; the collaborative solution output explicitly provides the cross-platform allocation and path structure within these boundaries; and the value function output provides an explanatory basis, consistent with utility, for intervention and structural rewriting. This output format enables the method not only to continuously generate executable solutions in dynamic collaborative situations, but also to explain the reasons for its decisions in a way consistent with collaborative semantics, providing usable, controllable, and interpretable planning support for the execution of manned-unmanned swarm collaborative tasks.
[0184] Secondly, this application provides a cost-sensitive planning learning system for manned-unmanned swarm collaborative task allocation, used to execute the aforementioned cost-sensitive planning learning method for manned-unmanned swarm collaborative task allocation. The system includes:
[0185] The utility function construction module is used to construct a unified utility function. The unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes. The shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
[0186] The state representation module is used to construct a joint state space, which is formed by merging the cooperative situation state with the structural feature mapping of the current cooperative solution.
[0187] The intervention strategy learning module is used to learn the intervention action value function with the joint state space as input, and to update the intervention strategy with the difference of the unified utility function as feedback. The intervention strategy is used to determine the granularity, role, and budget boundary of manned aircraft intervention, and to induce a set of feasible solutions that satisfies the joint constraint set.
[0188] The structured search module is used to perform value learning on the family of structured operators through iterative local search guided by Q-learning within the set of feasible solutions induced by the intervention strategy, with the joint state space as input and the difference of the unified utility function as operator feedback, and to generate a cooperative scheduling solution under the current situation. The cooperative scheduling solution includes at least a cross-platform allocation sequence, a confirmation-execution dependency relationship, and a risk exposure path structure.
[0189] The rolling update and output module is used to respond to changes in the cooperative situation state: taking a decrease in the unified utility function or a loss of feasibility as the trigger condition, it executes the intervention strategy update and the cooperative scheduling solution reconstruction in sequence, and outputs a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
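The weighted combination handled by the utility function construction module, and the utility difference used as feedback by the two learning modules, can be sketched as follows. The sign convention (quality counted positively, cost and risk negatively) and the weight values are assumptions made for illustration; this text does not specify them.

```python
def unified_utility(quality, intervention_cost, risk_penalty,
                    w_quality=1.0, w_cost=0.5, w_risk=0.8):
    """Weighted combination of quality, intervention cost, and risk penalty.

    Assumption: cost and risk enter with a negative sign; weights are hypothetical.
    """
    return (w_quality * quality
            - w_cost * intervention_cost
            - w_risk * risk_penalty)


def utility_difference(before, after):
    """Feedback signal: utility of the new scheme minus utility of the old one.

    `before` and `after` are (quality, intervention_cost, risk_penalty) tuples.
    """
    return unified_utility(*after) - unified_utility(*before)


# Hypothetical example: migrating a risky task to the manned aircraft raises
# intervention cost from 4 to 6 but cuts the risk penalty from 5 to 2.
u_before = unified_utility(10.0, 4.0, 5.0)
delta = utility_difference((10.0, 4.0, 5.0), (10.5, 6.0, 2.0))
```

In this example the difference is positive, so the migration would be reinforced; if the risk reduction were smaller than the weighted cost increase, the same computation would yield negative feedback.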
[0190] This application addresses the core challenges arising from the interplay of dynamic situations, cost constraints, and combinatorial complexity in manned-unmanned swarm collaborative missions. It proposes a cost-sensitive collaborative planning learning framework that unifies collaborative intervention strategy learning and combinatorial planning search into a single decision-making process under unified mathematical modeling and a unified utility scale. Unlike traditional approaches that treat manned intervention as external rules or ex-post fixes, this application formalizes intervention strategies as key variables that induce the feasible domain and determine the achievable boundary of mission quality. This makes "whether to intervene, at what granularity, in what role, and with what budget allocation" learnable and comparable collaborative decision-making objects. Through this modeling approach, the compensatory role of manned capabilities is naturally incorporated into information sequence constraints, risk reachability boundaries, and communication sustainability, enabling collaborative planning to be balanced and updated against a unified objective under dynamic situations.
[0191] At the methodological level, this application constructs a unified utility function that incorporates task completion quality, collaboration costs, and risk penalties into a single optimization objective, and uses utility difference as the learning signal to ensure that intervention strategy learning and solution structure rewriting share a consistent evaluation metric. At the outer layer, the intervention action value function takes the cooperative situation state as input and learns the marginal value of intervention granularity and role under different situations, thereby achieving cost-sensitive adjustment of manned aircraft intervention intensity and adaptive response to risk and communication conditions. At the inner layer, the operator value function takes the joint state as input and guides the iterative local search within the feasible region induced by the intervention strategy, where the cross-platform allocation structure, the confirmation-execution dependency chain, and the risk exposure structure are iteratively rewritten, enabling combinatorial search to maintain feasibility control under strong constraints while gaining the ability to adjust the search direction according to changes in the collaborative situation. By internalizing feasibility maintenance as "feasible domain induction and operator closure," the method in this application achieves feasibility maintenance and rapid regression under sudden changes in the situation at the runtime level, enabling online rolling updates to stably output executable solutions even when collaborative constraints tighten.
[0192] At the application process level, this application further specifies the running objects and information flow structure of the method in online collaborative tasks, clarifying the core operating state of the system and unifying rolling update triggering, intervention strategy update, solution structure update, constraint maintenance, and output interpretation into a coherent collaborative planning chain. The output of this application's method includes not only the joint planning output but also decision-making criteria based on value functions, so that "why manned aircraft intervention is needed, why this intervention granularity and role should be adopted, and why this type of structural rewriting should be performed" can be interpreted as the result of maximizing marginal value under a unified utility scale. This interpretability is not an add-on, but follows naturally from the modeling structure of value learning and utility difference, thus making the method more usable and controllable in collaborative tasks that involve manned aircraft participation, risk sensitivity, and limited resources.
[0193] In summary, the framework proposed in this application provides a unified modeling and solution paradigm for manned-unmanned swarm collaborative planning: under dynamic situations and strong constraints, intervention strategies are learned as core variables for inducing feasible domains and value structures, and high-quality feasible solutions are generated through value-guided structured combinatorial search, thereby achieving an adaptive trade-off between task quality, collaborative costs, and risk constraints. This paradigm is not only applicable to different types of collaborative tasks and platform clusters of different sizes, but also provides a scalable foundation for further research on collaborative intervention mechanisms under cost constraints, structured search learning under dynamic situations, and the theory and methods of risk-sensitive collaborative planning.
[0194] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of this application and should not be construed as limiting the specific implementation of this application to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of this application, and all such modifications or substitutions should be considered within the scope of protection of this application.
Claims
1. A cost-sensitive planning learning method for manned-unmanned swarm collaborative task allocation, characterized in that it includes:
S100, construct a unified utility function. The unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes. The shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
S200, construct a joint state space, which is formed by merging the cooperative situation state with the structural feature mapping of the current cooperative solution;
S300, using the joint state space as input, learn the intervention action value function, and use the difference of the unified utility function as feedback to update the intervention strategy. The intervention strategy is used to determine the granularity, role, and budget boundary of manned aircraft intervention, and to induce a feasible solution set that satisfies the joint constraint set.
S400, within the feasible solution set induced by the intervention strategy, perform value learning on the structured operator family through iterative local search guided by Q-learning, with the joint state space as input and the difference of the unified utility function as operator feedback, to generate a cooperative scheduling solution under the current situation. The cooperative scheduling solution includes at least a cross-platform allocation sequence, a confirmation-execution dependency relationship, and a risk exposure path structure.
S500, in response to changes in the cooperative situation state, take a decrease in the unified utility function or a loss of feasibility as the trigger condition, sequentially execute the intervention strategy update and the cooperative scheduling solution reconstruction, and output a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
2. The method according to claim 1, characterized in that S100 includes:
S110, define a task completion quality item that is explicitly dependent on the intervention strategy. The dependency includes at least the quantitative contribution of the intervention to the improvement of information confidence, the improvement of time limit satisfaction, and dependency feasibility.
S120, define a collaborative intervention cost item, which includes at least manned aircraft flight time consumption, payload occupancy cost, and communication relay resource occupancy cost;
S130, define a risk penalty item, which includes at least the cumulative integral of manned aircraft exposure risk, unmanned aircraft swarm loss risk, and cooperative link exposure risk;
S140, the weighted sum of the task completion quality item, the collaborative intervention cost item, and the risk penalty item constitutes the unified utility function;
S150, the shaping effect of the collaborative intervention strategy on the achievable upper bound of task completion quality, on the risk exposure structure, and on the boundary of the feasible solution set is written into the unified utility function.
3. The method according to claim 2, characterized in that S150 includes:
S151, establish an explicit dependency relationship between the task completion quality item and the intervention strategy, so that the information confidence threshold, time limit satisfaction condition, and dependency feasibility in the task completion quality item are all expressed as monotonically non-decreasing functions of intervention granularity, role, and budget intensity.
S152, establish an explicit dependency relationship between the drone swarm exposure risk sub-item in the risk penalty item and the intervention strategy, so that the drone swarm exposure risk sub-item is expressed as a monotonically non-increasing function of intervention granularity, role, and budget intensity;
S153, establish the explicit dependency of the boundary parameters of the feasible solution set on the intervention strategy;
S154, through S151 to S153, the unified utility function mathematically includes both cooperative scheduling solution variables and intervention strategy variables, and the shaping effect of the intervention strategy on task quality, risk exposure, and the feasible domain boundary has a continuous, differentiable, or finite-difference numerical expression.
4. The method according to claim 1, characterized in that S200 includes:
S210, construct a cooperative situation state; the cooperative situation state includes at least the task information phase, platform location and timestamp, resource reserves, communication capabilities, and risk field;
S220, construct a structural feature mapping of the current cooperative solution to supplement information missing from the cooperative situation state. The structural feature mapping includes at least the cross-platform dependency strength, the intervention usage intensity, and the proportion of risk exposure.
S230, merge the cooperative situation state and the structural feature mapping to form a joint state space, which serves as a unified state representation for intervention strategy learning and operator value learning.
5. The method according to claim 1, characterized in that S300 includes:
S310, define intervention actions, which describe the granularity, role, and budget intensity of manned aircraft intervention. The granularity includes task point-level intervention, task cluster-level intervention, and flight segment-level intervention. The roles include critical execution, high-confidence confirmation, and communication relay support. The budget intensity is used to constrain the intervention cost and resource consumption limits.
S320, in the cooperative situation, perform an intervention action to obtain an updated intervention strategy. The updated intervention strategy induces a feasible solution set that satisfies the joint constraint set by changing information availability, the risk reachability boundary, and communication cooperation capability, and changes the achievable upper bound of the task completion quality item in the unified utility function.
S330, use the difference of the unified utility function as the immediate feedback of the intervention action, wherein the immediate feedback is the difference between the optimal unified utility values achievable in the respective induced feasible regions before and after the intervention action is performed.
S340, construct an intervention action value function with the joint state space as input and the intervention action value as output, and use the temporal-difference method to update the intervention action value function with the immediate feedback, so that the system can adaptively select the intervention granularity, role, and budget intensity that maximize the intervention action value function under different cooperative situations and solution structures, and output the intervention strategy and the set of feasible solutions induced by it.
6. The method according to claim 5, characterized in that the joint constraint set includes at least information sequence constraints, risk budget constraints, intervention budget constraints, communication sustainability constraints, and resource constraints.
The information sequence constraint is expressed as the confirmation task completion time being earlier than the dependent task execution time, with the confirmation time shortening as the intervention strategy is strengthened.
The risk budget constraint is expressed as the risk exposure integral of each platform within the task cycle not exceeding a preset threshold, with the risk exposure of the UAV swarm decreasing as the intervention strategy is strengthened.
The intervention budget constraint is expressed as the manned aircraft intervention cost not exceeding the upper limit of the allowed intervention cost within the task cycle.
The communication sustainability constraint is expressed as the signal-to-noise ratio, bandwidth, and coverage of all tasks in the cooperative scheduling solution that depend on the collaborative link meeting the minimum return quality and continuous availability requirements of the task throughout the entire task execution cycle, with communication sustainability improving as the communication relay role in the intervention strategy is strengthened.
The resource constraint is expressed as the consumption of fuel, payload, and flight time resources during task execution not exceeding the current resource reserve upper limit, with the resource consumption rate decreasing as the critical execution role in the intervention strategy is strengthened.
7. The method according to claim 1, characterized in that S400 comprises:
S410: constructing a family of structured operators, which includes cross-platform task migration operators, confirmation-execution chain rearrangement operators, task cluster reconstruction operators, and risk-sensitive flight segment detour operators;
S420: under a given intervention strategy, defining a permissible operator set bounded by the feasible solution set induced by the intervention strategy, such that applying any operator in the permissible operator set to a feasible solution preserves closure under transformation, that is, the result remains in the feasible solution set;
S430: in the joint state space, selecting a structured operator from the permissible operator set according to the operator value function, and applying the structured operator to the current cooperative scheduling solution to generate a candidate cooperative scheduling solution;
S440: calculating the utility improvement of the candidate cooperative scheduling solution relative to the current cooperative scheduling solution as the difference of the unified utility function, and using the utility improvement as the immediate reward of the structured operator;
S450: updating the operator value function with the immediate reward using the temporal difference method;
S460: repeating S430 to S450, iteratively optimizing the cooperative scheduling solution through the perturbation and local improvement mechanism of iterative local search until the termination condition is met, and outputting the cooperative scheduling solution that maximizes the unified utility function under the current situation.
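The loop in S430 to S460 can be sketched on a toy problem. Everything specific below is an assumption for illustration: the solution is a permutation, the stand-in utility counts inversions, the two operators are generic moves rather than the claimed structured operators, and the two-valued state, epsilon-greedy selection, and `patience`-based perturbation are simple choices the claim does not prescribe. Every permutation is feasible here, so the closure requirement of S420 holds trivially.

```python
import random


def utility(sol):
    """Toy stand-in for the unified utility: fewer inversions is better."""
    n = len(sol)
    return -sum(1 for i in range(n) for j in range(i + 1, n) if sol[i] > sol[j])


def swap_adjacent(sol, rng):
    i = rng.randrange(len(sol) - 1)
    new = sol[:]
    new[i], new[i + 1] = new[i + 1], new[i]
    return new


def reverse_segment(sol, rng):
    i, j = sorted(rng.sample(range(len(sol)), 2))
    return sol[:i] + sol[i:j + 1][::-1] + sol[j + 1:]


OPERATORS = {"swap_adjacent": swap_adjacent, "reverse_segment": reverse_segment}


def q_guided_ils(start, iters=300, alpha=0.2, gamma=0.9, eps=0.2,
                 patience=20, seed=0):
    rng = random.Random(seed)
    q = {}                                   # (state, operator name) -> value
    current, best = start[:], start[:]
    state, stall = "improving", 0
    for _ in range(iters):
        # S430: epsilon-greedy operator selection from the value table.
        if rng.random() < eps:
            name = rng.choice(list(OPERATORS))
        else:
            name = max(OPERATORS, key=lambda n: q.get((state, n), 0.0))
        candidate = OPERATORS[name](current, rng)
        # S440: immediate reward is the utility difference.
        reward = utility(candidate) - utility(current)
        if reward > 0:                       # local improvement: accept
            current, stall = candidate, 0
        else:
            stall += 1
        next_state = "improving" if stall == 0 else "stalled"
        # S450: temporal difference update of the operator value function.
        best_next = max(q.get((next_state, n), 0.0) for n in OPERATORS)
        old = q.get((state, name), 0.0)
        q[(state, name)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
        if stall >= patience:                # perturbation step of the ILS
            current = reverse_segment(current, rng)
            state, stall = "stalled", 0
        if utility(current) > utility(best):
            best = current[:]
    return best, q


start = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
best, q = q_guided_ils(start)
```

Because `best` is replaced only by strictly better solutions, the returned solution is never worse than the starting one, whatever the random seed.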
8. The method according to claim 7, characterized in that S410 comprises:
S411: based on the types of coupling relationship in the manned-unmanned swarm collaborative scheduling solution, determining the collaborative structure dimensions that the structured operator family needs to cover, wherein the collaborative structure dimensions include at least the cross-platform task allocation structure, the confirmation-execution information dependency structure, the task cluster spatial and communication coupling structure, and the path and risk field exposure structure;
S412: for the cross-platform task allocation structure, constructing a subset of cross-platform allocation adjustment operators, wherein each operator in the subset takes changing the assignment of tasks between the manned aircraft and the unmanned aircraft swarm as its core semantics;
S413: for the confirmation-execution information dependency structure, constructing a subset of information chain rearrangement operators, wherein each operator in the subset takes adjusting the temporal dependency between the confirmation task and the execution task as its core semantics;
S414: for the task cluster spatial and communication coupling structure, constructing a subset of task cluster reconstruction operators, wherein each operator in the subset takes merging, splitting, or recombining the partition of task clusters as its core semantics;
S415: for the path and risk field exposure structure, constructing a subset of risk-sensitive flight segment adjustment operators, wherein each operator in the subset takes bypassing high-risk areas, inserting relay support paths, or smoothing flight segment risk as its core semantics;
S416: jointly encapsulating the cross-platform allocation adjustment operator subset, the information chain rearrangement operator subset, the task cluster reconstruction operator subset, and the risk-sensitive flight segment adjustment operator subset to form the structured operator family, wherein each operator in the structured operator family has a learnable semantic label, the semantic label being the change in structural features.
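One possible reading of S416 is that each operator carries the feature change it produces as its label. The sketch below illustrates that reading only: the `Solution` layout, the two structural features, the `Operator` record, and both example operators are hypothetical, and only two of the four operator subsets are represented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Solution = Dict[str, List[str]]  # hypothetical: platform name -> ordered task list


def features(sol: Solution) -> Dict[str, float]:
    """Hypothetical structural features of a scheduling solution."""
    total = sum(len(tasks) for tasks in sol.values())
    manned = len(sol.get("manned", []))
    return {
        "manned_share": manned / total if total else 0.0,
        "max_load": float(max((len(t) for t in sol.values()), default=0)),
    }


@dataclass(frozen=True)
class Operator:
    name: str
    dimension: str                          # structure dimension it targets
    apply: Callable[[Solution], Solution]

    def semantic_label(self, sol: Solution) -> Dict[str, float]:
        """Change in structural features caused by applying the operator."""
        before, after = features(sol), features(self.apply(sol))
        return {k: after[k] - before[k] for k in before}


def migrate_to_manned(sol: Solution) -> Solution:
    """Move the last task of the busiest UAV to the manned aircraft."""
    new = {p: list(t) for p, t in sol.items()}
    uavs = [p for p in new if p != "manned" and new[p]]
    if uavs:
        src = max(uavs, key=lambda p: len(new[p]))
        new.setdefault("manned", []).append(new[src].pop())
    return new


def reverse_chains(sol: Solution) -> Solution:
    """Reverse the task order on every platform (allocation unchanged)."""
    return {p: list(reversed(t)) for p, t in sol.items()}


FAMILY = [
    Operator("migrate_to_manned", "cross_platform_allocation", migrate_to_manned),
    Operator("reverse_chains", "information_dependency", reverse_chains),
]

sol = {"manned": ["t1"], "uav_1": ["t2", "t3", "t4"], "uav_2": ["t5"]}
labels = {op.name: op.semantic_label(sol) for op in FAMILY}
```

In this example the migration operator raises the manned share and lowers the maximum load, while the reordering operator leaves both features unchanged, so the two labels are distinguishable to a learner.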
9. The method according to claim 1, characterized in that S500 comprises:
S510: triggering a rolling update when the utility of the current collaborative scheduling solution decreases by more than a preset threshold due to the advancement of task information stages, changes in risk field intensity, a decline in communication capability, a decrease in resource reserves, or changes in platform location, or when the current collaborative scheduling solution loses its feasibility;
S520: under the current collaborative situation, updating the intervention strategy based on the intervention action value function, so that the updated intervention strategy matches the current risk field, communication capability, and resource reserves;
S530: within the new feasible solution set induced by the updated intervention strategy, performing structural reconstruction of the cooperative scheduling solution by iterative local search guided by the operator value function;
S540: outputting a joint planning scheme consisting of the updated intervention strategy and the updated cooperative scheduling solution.
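The trigger and ordering in S510 to S540 can be sketched as follows. This is an illustrative skeleton under stated assumptions: the function names, the callable parameters standing in for the learned components, and the default `drop_threshold` are hypothetical, and the lambdas in the example are stubs rather than real planners.

```python
def needs_rolling_update(baseline_utility, current_utility, feasible, drop_threshold):
    """S510: trigger on loss of feasibility or a utility drop beyond the threshold."""
    return (not feasible) or (baseline_utility - current_utility > drop_threshold)


def rolling_update(situation, solution, baseline_utility, utility_fn, is_feasible,
                   select_intervention, local_search, drop_threshold=1.0):
    """Return (strategy, new_solution) if an update was triggered, else None."""
    current = utility_fn(situation, solution)
    feasible = is_feasible(situation, solution)
    if not needs_rolling_update(baseline_utility, current, feasible, drop_threshold):
        return None
    strategy = select_intervention(situation)                    # S520
    new_solution = local_search(situation, strategy, solution)   # S530
    return strategy, new_solution                                # S540


# Stubbed example: link quality has dropped, so utility falls from 10.0 to 5.0.
plan = rolling_update(
    situation={"link_quality": 0.3},
    solution=["t1", "t2"],
    baseline_utility=10.0,
    utility_fn=lambda s, sol: 10.0 * s["link_quality"] + len(sol),
    is_feasible=lambda s, sol: True,
    select_intervention=lambda s: "communication_relay",
    local_search=lambda s, strategy, sol: sorted(sol, reverse=True),
)
```

Passing the components in as callables keeps the fixed ordering of the claim (strategy first, then search inside the feasible set that strategy induces) visible without committing to any particular learner.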
10. A cost-sensitive planning learning system for manned-unmanned swarm collaborative task allocation, characterized in that the system is configured to carry out the cost-sensitive planning learning method for manned-unmanned swarm collaborative task allocation according to any one of claims 1 to 9, and comprises:
a utility function construction module, used to construct a unified utility function, wherein the unified utility function uses a weighted combination of task completion quality, collaborative intervention cost, and risk penalty as a comprehensive evaluation metric for collaborative planning schemes, and the attainable upper bound of task completion quality and the shaping effect of the collaborative intervention strategy on the boundary of the feasible solution set, among other factors, are incorporated into the unified utility function;
a state representation module, used to construct a joint state space formed by merging the cooperative situation state with the structural feature mapping of the current cooperative solution;
an intervention strategy learning module, used to learn the intervention action value function with the joint state space as input and to update the intervention strategy with the difference of the unified utility function as feedback, wherein the intervention strategy determines the granularity, role, and budget boundary of manned aircraft intervention and induces a feasible solution set that satisfies the joint constraint set;
a structured search module, used to perform value learning on the family of structured operators through Q-learning-guided iterative local search within the feasible solution set induced by the intervention strategy, with the joint state space as input and the difference of the unified utility function as operator feedback, and to generate a cooperative scheduling solution under the current situation, wherein the cooperative scheduling solution includes at least a cross-platform allocation sequence, confirmation-execution dependency relationships, and a risk exposure path structure;
a rolling update and output module, used to respond to changes in the cooperative situation state by executing, in sequence, the intervention strategy update and the reconstruction of the cooperative scheduling solution when triggered by a decrease in the unified utility function or a loss of feasibility, and to output a joint planning scheme composed of the intervention strategy and the cooperative scheduling solution.
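As a hedged illustration of the weighted combination handled by the utility function construction module, the sketch below caps task quality at an intervention-dependent upper bound and subtracts weighted cost and risk terms. The subtraction signs, the weight values, the `quality_cap` parameter, and all numbers are assumptions; the claim states only that the three terms are combined with weights and that the attainable quality bound enters the function.

```python
def unified_utility(quality, quality_cap, intervention_cost, risk_penalty,
                    w_quality=1.0, w_cost=0.5, w_risk=2.0):
    """Weighted utility; quality is limited by the attainable upper bound."""
    attained = min(quality, quality_cap)
    return w_quality * attained - w_cost * intervention_cost - w_risk * risk_penalty


# Without intervention: the quality bound is low and risk is high.
u_before = unified_utility(quality=8.0, quality_cap=6.0,
                           intervention_cost=0.0, risk_penalty=1.0)
# With intervention: the bound rises and risk falls, at some intervention cost.
u_after = unified_utility(quality=8.0, quality_cap=9.0,
                          intervention_cost=2.0, risk_penalty=0.5)
feedback = u_after - u_before  # difference used as learning feedback
```

With these made-up numbers the intervention pays for itself: the utility rises from 4.0 to 6.0, so the feedback passed to the learning modules is 2.0.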