Sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method

By constructing a unified decision vector with sparsity constraints, and combining a tiered sprint-steady-guarantee strategy with inertial adjustment, the method addresses learning-plan bloat and instability in multi-agent systems and yields sparse, stable learning plans in personalized teaching systems.

CN122199230A · Pending · Publication Date: 2026-06-12 · SICHUAN QIMINGDAREN TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: SICHUAN QIMINGDAREN TECH CO LTD
Filing Date: 2026-04-22
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing personalized learning systems lack a unified decision-making mechanism under a multi-agent architecture, resulting in bloated learning plans, inconsistent task priorities, and unstable learning paths, making it difficult to generate executable and stable learning plans within the time budget.

Method used

A unified decision vector is constructed and subjected to sparsity, time budget and conflict constraints, together with a tiered sprint-steady-guarantee strategy. An inertial adjustment mechanism is adopted to achieve collaborative decision-making among multiple agents and to keep the learning plan sparse and stable.

Benefits of technology

This prevents learning plans from bloating, keeps them concise and stable within the time budget, and improves students' adherence to the learning path and its continuity.



Abstract

The application discloses a sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method. First, a candidate learning action set is constructed, each action is labeled with a sprint, steady or guarantee tier attribute, and the time consumption of each action is estimated. Then, evaluation information, action mapping relationships and agent weights are established, and a learning action selection vector is constructed subject to a sparsity constraint, a time budget constraint, a conflict constraint and an inertia adjustment coefficient. The vector is updated iteratively: an aggregated residual direction and an adaptive step size are calculated, an inertia term is introduced to inherit the historical structure, and sparse projection controls the number of actions until convergence. Constraint violations are then checked and corrected. Finally, the selection result is split and output according to the sprint-steady-guarantee tier labels.

Description

Technical Field

[0001] This invention relates to the field of online education technology, and in particular to a tiered (sprint, steady, guarantee) multi-agent sparse collaborative decision-making method.

Background Technology

[0002] In personalized learning scenarios during the high school intensive preparation phase, students' effective learning time is typically constrained by multiple factors, including course schedules, homework loads, and exam-taking pace. The teaching system needs to select a limited number of well-structured and executable combinations of learning actions from multiple sources of resources, such as question banks, courses, error notebooks, assessments, and practice exercises, within a given time budget (e.g., available learning time counted daily or weekly), thereby forming a practical learning plan.

[0003] Learning activities typically include, but are not limited to: knowledge explanation, basic training, topic-based practice questions, phase tests, review of incorrect answers, summarization, and memorization. To accommodate the different review goals and risk tolerance of students at different levels, a tiered "sprint, steady, guarantee" strategy is commonly adopted in actual teaching. The guarantee strategy provides a safety net: it prioritizes coverage of frequently tested points where marks are easily lost, emphasizing low risk and high certainty. The steady strategy targets stable score improvement: it provides continuous training and error correction for key scoring areas, emphasizing a balance between pace and return. The sprint strategy targets breakthroughs: it arranges challenging tasks around high-discrimination question types and comprehensive ability improvement, but carries a higher risk of score fluctuation.

[0004] With the widespread adoption of online teaching platforms, these systems typically contain multiple decision-making and recommendation modules, such as subject diagnosis, weakness identification, task recommendation, schedule planning, risk control, and learning motivation maintenance. These modules often provide suggestions based on different optimization goals: some emphasize maximizing short-term scores, some emphasize comprehensive coverage of fundamental knowledge, and some emphasize the stability and sustainable execution of the learning process. Because each module has a different focus, when multiple modules simultaneously generate plans or recommendations for the same student, issues such as conflicting recommendations, inconsistent task priorities, repeated assignment of the same knowledge points, or frequent plan adjustments can easily arise, ultimately affecting the feasibility of the learning plan and the stability of the learning process.

[0005] Existing personalized teaching and learning path generation systems generally suffer from the following shortcomings in practical applications. First, the superposition of results from multiple agents lacks controllability: in a multi-agent architecture, each functional module typically outputs recommendation results or candidate task sets independently, and the system often forms the final learning plan by rule merging, simple sorting, or direct superposition. This approach introduces no explicit quantity or resource constraints at the algorithm level. When multiple modules simultaneously give positive suggestions, the plan easily becomes bloated and the number of tasks exceeds what the student can execute, so the mechanism itself cannot ensure that the learning plan is "few but focused."

[0006] Second, there is a lack of a unified budget competition and global decision-making mechanism: In existing solutions, learning tasks are often scored independently by each module first, and then merged through Top-N selection or weighted ranking. This process is essentially a superposition of local optima, and it does not construct a unified decision variable to describe the competitive relationship between tasks. It cannot express the constraint logic that "under a limited time budget, choosing a certain task will inevitably crowd out the execution space of other tasks" at the global level, making it difficult for the system to effectively handle the problems of task substitution, mutual exclusion, and priority conflict.

[0007] Third, there is a lack of explicit control over the stability of the plan: In scenarios where learning plans are generated on a daily or weekly basis, some systems frequently adjust the task combination based on the latest diagnostic results or changes in scores. Because the algorithm does not introduce continuity constraints or stability measures for historical plans, the learning path is prone to frequent changes in topics, question types, or training focuses within a short period, reducing students' stickiness to the plan and affecting the integrity of the review loop.

[0008] Fourth, the "sprint, steady, guarantee" tiering remains at the level of presentation or rules: although the "sprint, steady, guarantee" tiering strategy has been widely used in teaching management and product presentation, in existing technologies it mostly exists in the form of labels, rules, or static grouping, without forming computable tier control parameters or constraints at the algorithm level. The tiering results can hardly participate directly in the decision-making process, and quantifiable technical indicators for evaluating the impact of the tiering strategy on plan stability, risk control, and execution effectiveness are lacking.

[0009] Therefore, there is an urgent need to propose a multi-agent sparse collaborative decision-making method that can coordinate time resources, resolve conflicts among multiple objectives, and ensure the stable execution of plans.

Summary of the Invention

[0010] To address the aforementioned problems, the present invention aims to provide a sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method. The technical solution adopted by the present invention is as follows: a sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method includes the following steps:
Step S1: Construct a set of candidate learning actions for the current decision-making cycle, label every candidate learning action in the set with one of the three tier attributes "sprint", "steady", and "guarantee", and estimate the expected time consumption of every candidate learning action.
Step S2: Using the candidate learning action set as the evaluation object, obtain the evaluation information, action mapping relationships and agent weights.
Step S3: Based on the evaluation information, action mapping relationships and agent weights, construct the learning action selection vector as a unified decision variable, and apply a sparsity constraint, a time budget constraint, a conflict constraint and an inertia adjustment coefficient based on the sprint-steady-guarantee tier attributes.
Step S4: Driven by the evaluation information, iteratively update the decision variable; in each iteration, calculate the aggregated residual direction and adaptive step size, introduce an inertia term to inherit the historical decision structure, and use sparse projection to control the number of learning actions, iterating until convergence.
Step S5: Check whether the iteration result meets the time budget constraint and the conflict constraint. If not, remove learning actions according to their contribution rate per unit time and their conflict contribution until all constraints are met.
Step S6: Split the learning action selection result that satisfies all constraints into a guarantee-tier subset, a steady-tier subset and a sprint-tier subset according to the sprint-steady-guarantee tier labels, and output them. Then return to step S1 and enter the next decision cycle.

[0011] Compared with the prior art, the present invention has the following beneficial effects:

[0012] This invention does not employ the method of directly merging or simply sorting the outputs of multiple agents. Instead, it constructs a unified decision vector to map the evaluation information of multiple agents to the same decision space for overall solution. Each agent acts only as a provider of evaluation and constraint information, and its output is collaboratively decided by a unified decision-maker. This algorithmic mechanism avoids the learning plan bloat problem caused by the simple superposition of results from multiple agents.

[0013] This invention directly incorporates students' available learning time as a hard budget constraint into a unified decision-making model, enabling different learning actions to compete globally within the same time resource pool. This modeling approach accurately expresses the relationship that "selecting a certain learning action inevitably encroaches on the execution space of other actions," avoiding the unreasonable resource allocation problem caused by merging local scores in existing technologies.

[0014] This invention introduces an action conflict matrix and applies overall conflict constraints, suppressing learning actions with obvious mutual exclusion or interference relationships during the decision-making stage, thus reducing the occurrence of conflicting tasks scheduled within the same period. Compared to methods that rely solely on rule elimination or manual adjustment, this method offers stronger consistency and scalability.

[0015] This invention transforms the sprint-steady-guarantee tiering strategy from display labels or static rules into a parameter-linked mechanism that participates in the decision-making process. By adjusting the agent weights, inertia coefficients, and constraint strength at the tier level, different tier strategies exhibit differentiated stability and risk preferences at the algorithm level, making the sprint-steady-guarantee tiering strategy computable, constrainable, and verifiable.

[0016] This invention introduces an inertial adjustment term based on historical decision results during the iterative update process of the unified decision-maker, enabling the learning plan for the current cycle to be progressively adjusted while inheriting the structure of the previous cycle. This mechanism effectively suppresses the problem of frequent oscillations in the learning path during rolling updates, improving students' stickiness and continuity in executing the learning plan.

[0017] This invention has the advantages of simple logic and high accuracy and reliability, and has high practical and promotional value in the field of online education technology.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope of protection. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a logic flowchart of the present invention.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of this application clearer, the present invention will be further described below with reference to the accompanying drawings and embodiments. The embodiments of the present invention include, but are not limited to, the following embodiments. All other embodiments obtained by those skilled in the art based on the embodiments in this application without inventive effort are within the scope of protection of this application.

[0021] As shown in Figure 1, this embodiment provides a sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method, which includes the following steps: The first step is to construct a set of candidate learning actions for the current decision-making cycle. Each candidate learning action in this set is labeled with one of three tier attributes: "sprint," "steady," or "guarantee." The expected time consumption of each candidate learning action is also estimated. At the beginning of each decision-making cycle, based on the teaching resource database, question bank, course database, error records, and the results of the most recent one or more assessments, the learning actions available to the current student are collected to form the set of candidate learning actions.

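[0022] Let the set of candidate learning actions be $\mathcal{A} = \{a_i \mid i \in \mathcal{I}\}$ with $\mathcal{I} = \{1, 2, \dots, N\}$, where $N$ is the total number of candidate learning actions; $\mathcal{I}$ is the set of indices of all learning actions that can be scheduled by the system within the current period; and $a_i$ is the i-th candidate learning action.

[0023] Tier attribute annotation is performed on every candidate learning action in the candidate learning action set, which gives $\mathcal{I} = \mathcal{I}_{\mathrm{gua}} \cup \mathcal{I}_{\mathrm{ste}} \cup \mathcal{I}_{\mathrm{spr}}$, satisfying: $\mathcal{I}_{\mathrm{gua}} \cap \mathcal{I}_{\mathrm{ste}} = \emptyset$; $\mathcal{I}_{\mathrm{ste}} \cap \mathcal{I}_{\mathrm{spr}} = \emptyset$; $\mathcal{I}_{\mathrm{gua}} \cap \mathcal{I}_{\mathrm{spr}} = \emptyset$; where $\mathcal{I}_{\mathrm{gua}}$ is the guarantee-tier subset; $\mathcal{I}_{\mathrm{ste}}$ is the steady-tier subset; and $\mathcal{I}_{\mathrm{spr}}$ is the sprint-tier subset.

[0024] The estimated time consumption of the candidate learning actions is expressed as: $\boldsymbol{\tau} = [\tau_1, \tau_2, \dots, \tau_N]^{T}$; where $\boldsymbol{\tau}$ is the duration vector of the learning actions; $\tau_i$ is the estimated time required to execute the i-th learning action; and $T$ denotes the matrix transpose operation.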
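The first step can be pictured as a small data model. The following is a minimal sketch, assuming hypothetical names (`Tier`, `LearningAction`, `tier_index_sets`, `duration_vector`) and made-up example actions; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The three tier attributes; each action carries exactly one."""
    GUARANTEE = "guarantee"
    STEADY = "steady"
    SPRINT = "sprint"


@dataclass(frozen=True)
class LearningAction:
    name: str       # hypothetical label for illustration
    tier: Tier      # tier attribute assigned in step one
    minutes: float  # estimated time consumption


def tier_index_sets(actions):
    """Partition action indices by tier; every index lands in exactly one set."""
    sets = {tier: [] for tier in Tier}
    for i, action in enumerate(actions):
        sets[action.tier].append(i)
    return sets


def duration_vector(actions):
    """Estimated durations, index-aligned with the candidate list."""
    return [a.minutes for a in actions]


actions = [
    LearningAction("formula recall drill", Tier.GUARANTEE, 20.0),
    LearningAction("error-notebook review", Tier.STEADY, 30.0),
    LearningAction("comprehensive challenge set", Tier.SPRINT, 45.0),
    LearningAction("basic exercise set", Tier.GUARANTEE, 25.0),
]
sets = tier_index_sets(actions)
tau = duration_vector(actions)
```

Because each action stores a single `Tier` value, the three index sets are disjoint and cover every candidate by construction.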

[0025] The second step uses the candidate learning action set as the evaluation object to obtain evaluation information, action mapping relationships, and agent weights. Specifically: construct a set of agents, denoted $\mathcal{G} = \{g_1, g_2, \dots, g_M\}$, where $M$ is the total number of agents and $g_m$ is the m-th agent. The agent types can be configured according to the needs of the system implementation, without limiting their specific number or combination. For example, agents may include, but are not limited to, the following types: subject agents, which evaluate the necessity and benefits of learning actions from the perspective of a single subject; diagnostic agents, which identify weaknesses and error patterns based on assessment results; planning agents, which characterize the prerequisite relationships and learning pace between knowledge points; risk agents, which assess the risk that the learning plan causes overload, fluctuation, or falling behind; and motivation agents, which assess the impact of learning actions on persistence and the risk of frustration.

[0026] Let the student state vector be $\mathbf{s}_t \in \mathbb{R}^{d}$, where $d$ is the dimension of the student state vector and $\mathbb{R}^{d}$ is the space of student state vectors. The state vector can be composed of multi-dimensional features such as knowledge mastery, historical accuracy, error distribution, remaining time, and exam objectives.

[0027] Using the student state vector as the input to each agent, the target or observation vector, the mapping matrix, and the agent weight coefficient are obtained. The target or observation vector is: $\mathbf{y}_m \in \mathbb{R}^{p_m}$, $m \in \{1, 2, \dots, M\}$; where $\mathbf{y}_m$ is the target or observation vector that the m-th agent is concerned with; $\mathbb{R}^{p_m}$ is the space of target or observation vectors; and $p_m$ is the dimension of the target or observation vector of the m-th agent. The target or observation vector can represent the desired knowledge point structure, risk control upper limit, training structure ratio, etc., without being limited to a specific semantic form.

[0028] In addition, the mapping matrix is $\boldsymbol{\Phi}_m \in \mathbb{R}^{p_m \times N}$; where $\boldsymbol{\Phi}_m$ is the mapping matrix between the m-th agent and the candidate learning actions; $\mathbb{R}^{p_m \times N}$ is the space of real matrices with $p_m$ rows and $N$ columns; $p_m$ is the dimension of the evaluation vector output by the m-th agent; and $N$ is the number of candidate learning actions.

[0029] Furthermore, the agent weight coefficients are expressed as: $w_m = \mathrm{Norm}(\tilde{w}_m)$, $\tilde{w}_m = \psi_m(\mathbf{s}_t)$, and satisfy: $w_m \ge 0$, $\sum_{m=1}^{M} w_m = 1$; where $w_m$ is the weight coefficient of the m-th agent; $\mathrm{Norm}(\cdot)$ is the nonnegative normalization operator; $\tilde{w}_m$ is the original weight score of the m-th agent; and $\psi_m(\cdot)$ is the weight scoring function of the m-th agent.
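One way to realize a nonnegative normalization of raw agent scores is to clamp at zero and rescale. This is a minimal sketch under that assumption; the function name `normalize_weights` and the uniform fallback for an all-zero input are illustrative choices, not taken from the patent.

```python
def normalize_weights(raw_scores):
    """Clamp raw scores at zero, then rescale so the weights sum to 1."""
    clipped = [max(0.0, s) for s in raw_scores]
    total = sum(clipped)
    if total == 0.0:
        # Assumption: fall back to uniform weights when no agent scores above zero.
        return [1.0 / len(raw_scores)] * len(raw_scores)
    return [c / total for c in clipped]


# Four hypothetical agents; the second one produced a negative raw score.
weights = normalize_weights([2.0, -1.0, 1.0, 1.0])
```

Other normalizations (for example a softmax) would also yield nonnegative weights that sum to one; the clamp-and-rescale form is used here only because it is the simplest to inspect.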

[0030] The third step constructs a learning action selection vector as a unified decision variable based on the evaluation information, action mapping relationships, and agent weights. This vector is subject to a sparsity constraint, a time budget constraint, a conflict constraint, and an inertia adjustment coefficient based on the sprint-steady-guarantee tier attributes. Specifically: let the learning action selection vector in the t-th decision period be $\mathbf{x}_t = [x_{t,1}, x_{t,2}, \dots, x_{t,N}]^{T}$. If the i-th learning action is selected in the current decision-making cycle, then $x_{t,i} \neq 0$; if the i-th learning action is not selected in the current decision-making cycle, then $x_{t,i} = 0$.

[0031] To keep the learning plan "few but focused," this embodiment applies a sparsity constraint, expressed as: $\|\mathbf{x}_t\|_0 \le K$; where $\|\cdot\|_0$ is the number of non-zero components of a vector and $K$ is the preset upper limit of sparsity.

[0032] Because the total time a student has available for learning within the current period is limited, a time budget constraint is imposed: $\boldsymbol{\tau}^{T}\mathbf{x}_t \le B_t$; where $\boldsymbol{\tau}$ is the duration vector of the learning actions and $B_t$ is the total time budget within the current decision-making cycle.

[0033] To avoid scheduling learning actions with obvious mutual exclusion or interference relationships within the same cycle, a conflict constraint is imposed: $\mathbf{x}_t^{T}\mathbf{C}\,\mathbf{x}_t \le \varepsilon_c$; where $\mathbf{C} \in \mathbb{R}^{N \times N}$ is the conflict matrix; $\varepsilon_c$ is the conflict tolerance threshold; the conflict matrix has elements $c_{ij}$; and $c_{ij}$ is the intensity of conflict when the i-th and j-th learning actions coexist within the same decision-making cycle.
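The three constraints on the selection vector can be checked directly. The following is a minimal sketch, assuming hypothetical function names and illustrative numbers; the conflict level is computed as the quadratic form of the selection vector with the conflict matrix, so a symmetric matrix counts each conflicting pair twice.

```python
def count_nonzero(x):
    """Number of selected actions (non-zero components)."""
    return sum(1 for v in x if v != 0)


def time_used(tau, x):
    """Total estimated time of the selected actions."""
    return sum(t * v for t, v in zip(tau, x))


def conflict_level(C, x):
    """Quadratic form x^T C x; a symmetric C counts each pair twice."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))


def check_constraints(x, tau, C, K, B, eps):
    """Report which of the three constraints the selection vector satisfies."""
    return {
        "sparsity": count_nonzero(x) <= K,
        "time_budget": time_used(tau, x) <= B,
        "conflict": conflict_level(C, x) <= eps,
    }


tau = [20.0, 30.0, 45.0, 25.0]          # estimated minutes per action
C = [                                   # actions 0 and 2 interfere
    [0.0, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x = [1, 1, 1, 0]                        # candidate plan: actions 0, 1, 2
report = check_constraints(x, tau, C, K=3, B=90.0, eps=0.5)
```

In this example the plan respects the sparsity limit but uses 95 minutes against a 90-minute budget and exceeds the conflict threshold, so it would need repair.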

[0034] An inertia adjustment coefficient based on the sprint-steady-guarantee tier attributes is applied: [expression missing in the source text]; where $\beta_t$ denotes the inertia adjustment coefficient, and the remaining terms of the expression are the guarantee-tier adjustment coefficient, the steady-tier adjustment coefficient, the risk assessment function, the stability evaluation function, and the amplitude-limiting operator.

[0035] The fourth step iteratively updates the decision variable, driven by the evaluation information. In each iteration, the aggregated residual direction and adaptive step size are calculated, an inertia term is introduced to inherit the historical decision structure, and sparse projection is used to control the number of learning actions. The iteration continues until convergence. Specifically: (41) Let the intermediate learning action selection vector obtained in the nth iteration within the current decision period t be $\mathbf{x}_t^{(n)}$, which is used as the iteration variable.

[0036] (42) For any iteration, the aggregated residual direction is calculated: [expression missing in the source text]; where $\mathbf{r}_t^{(n)}$ is the aggregated residual vector for the nth iteration within the current decision period t.

[0037] (43) The adaptive step size is obtained from two expressions [missing in the source text] that define the support set and the step size; in them, $\Gamma_t^{(n)}$ is the support set for the nth iteration; $\operatorname{supp}(\cdot)$ returns the set of indices of the non-zero components; $H_K(\cdot)$ is the hard-threshold sparse operator; $P_{\Gamma}(\cdot)$ is the support set projection operator acting on the vector space, which preserves the components whose indices belong to $\Gamma$ and sets the remaining components to zero; $\delta$ is the numerical stability constant; and $\|\cdot\|_2$ is the Euclidean norm.

[0038] (44) Construct a candidate update vector with an inertia term: [expression missing in the source text]; where $\mathbf{z}_t^{(n+1)}$ is the candidate update vector constructed in the (n+1)th iteration within the current decision period t, which temporarily stores the intermediate update result after the inertia term is introduced and before sparse projection is performed; $\mathbf{x}_t^{(n-1)}$ is the intermediate learning action selection vector obtained in the (n-1)th iteration within the current decision period t; $\mu_t^{(n)}$ is the adaptive step size for the nth iteration within the current decision period t, which controls the magnitude of the correction that the aggregated residual direction applies to the candidate update vector; and $\beta_t$ is the inertia adjustment coefficient, which controls the degree to which the current iteration result inherits from the previous iteration result.

[0039] (45) The number of learning actions is controlled by sparse projection: $\mathbf{x}_t^{(n+1)} = H_K(\mathbf{z}_t^{(n+1)})$; where $\mathbf{x}_t^{(n+1)}$ is the intermediate learning action selection vector obtained in the (n+1)th iteration within the current decision period t; $\mathbf{z}_t^{(n+1)}$ is the candidate update vector with the inertia term; and $H_K(\cdot)$ is the hard-threshold sparse operator.

[0040] Preset iteration termination condition: $\|\mathbf{x}_t^{(n+1)} - \mathbf{x}_t^{(n)}\|_2 \le \epsilon_{\mathrm{stop}}$ or $n + 1 \ge N_{\max}$; where $\epsilon_{\mathrm{stop}}$ is the iteration termination threshold, used to determine whether the difference between the results of two adjacent iterations is small enough, and $N_{\max}$ is the maximum number of iterations.

[0041] The iteration result is used as the learning action selection vector for the current decision cycle: $\mathbf{x}_t = \mathbf{x}_t^{(n^*)}$; where $n^*$ is the iteration index at which the iteration terminates.
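The iteration of the fourth step resembles iterative hard thresholding with a momentum term. The sketch below is one possible reading, not the patent's formulas, which are not legible in this text. It assumes that the aggregated residual direction is the weighted sum over agents of the mapping matrix transposed times (target minus mapped selection), that the step size is a normalized step restricted to the current support set, and that the inertia term is the coefficient times the difference of the last two iterates. All function names and numbers are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mat_t_vec(A, v):
    """Multiply the transpose of A by vector v."""
    return [sum(A[r][c] * v[r] for r in range(len(A))) for c in range(len(A[0]))]


def hard_threshold(v, K):
    """Keep the K largest-magnitude components and zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:K])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def aggregated_residual(x, agents):
    """Assumed form: sum over agents of w * Phi^T (y - Phi x)."""
    r = [0.0] * len(x)
    for w, Phi, y in agents:
        err = [yi - pi for yi, pi in zip(y, matvec(Phi, x))]
        r = [ri + w * gi for ri, gi in zip(r, mat_t_vec(Phi, err))]
    return r


def adaptive_step(r, support, agents, delta=1e-12):
    """Assumed normalized step using only residual components on the support."""
    pr = [ri if i in support else 0.0 for i, ri in enumerate(r)]
    num = sum(v * v for v in pr)
    den = sum(w * sum(u * u for u in matvec(Phi, pr)) for w, Phi, _ in agents)
    return num / (den + delta)


def solve_cycle(agents, x_start, K, beta, tol=1e-6, max_iter=200):
    """Residual step + inertia + hard threshold, until the change is small."""
    x_prev = list(x_start)
    x = hard_threshold(x_start, K)
    for _ in range(max_iter):
        r = aggregated_residual(x, agents)
        support = {i for i, v in enumerate(x) if v != 0.0}
        if not support:  # cold start: take the support from the residual
            support = {i for i, v in enumerate(hard_threshold(r, K)) if v != 0.0}
        mu = adaptive_step(r, support, agents)
        z = [xi + mu * ri + beta * (xi - xp) for xi, ri, xp in zip(x, r, x_prev)]
        x_next = hard_threshold(z, K)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_next, x)))
        x_prev, x = x, x_next
        if change <= tol:
            break
    return x


# Two hypothetical agents with identity mappings and different targets.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
agents = [
    (0.5, identity, [1.0, 0.0, 0.6, 0.2]),
    (0.5, identity, [1.0, 0.0, 1.0, 0.0]),
]
plan = solve_cycle(agents, x_start=[0.0, 0.0, 0.0, 0.0], K=2, beta=0.3)
```

With a sparsity limit of two, the result keeps only the two actions on which the agents' weighted targets are largest; passing the previous cycle's plan as `x_start` is how this sketch would carry history into a new cycle.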

[0042] The fifth step is to check whether the results of the iteration meet the time budget constraint and conflict constraint. If they do not meet the constraints, the corresponding learning actions are removed according to the contribution rate per unit time and the contribution degree of the conflict until all constraints are met.

[0043] (51) Time budget feasibility repair: when $\boldsymbol{\tau}^{T}\mathbf{x}_t \le B_t$ holds, the iteration result satisfies the time budget constraint; otherwise, time budget repair is performed. Let the contribution rate vector per unit time be $\boldsymbol{\rho}_t$, whose i-th component is given by: [expression missing in the source text].

[0044] Here $\rho_{t,i}$ is the contribution rate per unit time of the i-th learning action within the current decision period t. The selected learning actions are sorted by contribution rate per unit time from smallest to largest, giving a sorted set. Going through the sorted set from front to back, set $x_{t,i} = 0$ and recompute the time consumption after each update, until $\boldsymbol{\tau}^{T}\mathbf{x}_t \le B_t$ is satisfied.
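The time budget repair is a greedy removal. The sketch below assumes that the contribution rate per unit time is a per-action gain score divided by the action's duration; the gain scores, the function name `repair_time_budget`, and the numbers are hypothetical, since the source text does not show the rate formula.

```python
def repair_time_budget(x, tau, gain, budget):
    """Drop selected actions with the lowest gain per minute until the budget holds."""
    x = list(x)  # work on a copy so the caller's plan is untouched
    selected = [i for i, v in enumerate(x) if v != 0]
    # Ascending order of the assumed contribution rate per unit time.
    order = sorted(selected, key=lambda i: gain[i] / tau[i])
    for i in order:
        if sum(t * v for t, v in zip(tau, x)) <= budget:
            break
        x[i] = 0
    return x


tau = [20.0, 30.0, 45.0, 25.0]    # estimated minutes per action
gain = [0.8, 0.6, 0.45, 0.75]     # hypothetical benefit scores
x = [1, 1, 1, 1]                  # all four selected: 120 minutes
repaired = repair_time_budget(x, tau, gain, budget=80.0)
```

Here the 45-minute action has the lowest rate and is removed first, which already brings the plan down to 75 minutes, so the loop stops.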

[0045] (52) Conflict constraint feasibility repair: when $\mathbf{x}_t^{T}\mathbf{C}\,\mathbf{x}_t \le \varepsilon_c$ holds, the conflict constraint is satisfied; otherwise, conflict repair is performed. Let the conflict contribution matrix be: $\mathbf{Q}_t = \operatorname{diag}(\mathbf{x}_t)\,\mathbf{C}\,\operatorname{diag}(\mathbf{x}_t)$; where $\mathbf{Q}_t$ is the conflict contribution matrix within the current decision-making period t, $\mathbf{C}$ is the conflict matrix, and $\operatorname{diag}(\mathbf{x}_t)$ is the diagonal matrix formed from the vector $\mathbf{x}_t$.

[0046] The elements of the conflict contribution matrix are: $q_{t,ij} = x_{t,i}\,c_{ij}\,x_{t,j}$; where $q_{t,ij}$ is the contribution to the overall conflict level when the i-th and j-th learning actions coexist within the t-th decision period; $x_{t,i}$ is the i-th component of the learning action selection vector for the t-th decision period, representing the selection intensity of the i-th learning action in the current decision cycle; and $x_{t,j}$ is the j-th component of that vector, representing the selection intensity of the j-th learning action in the current decision cycle.

[0047] Select the action pair that contributes the most to the conflict: $(i^*, j^*) = \arg\max_{i \ne j} q_{t,ij}$. Let the comprehensive contribution judgment function be: [expression missing in the source text]; where $\lambda_i$ is the tier weight parameter corresponding to the sprint-steady-guarantee tier to which the i-th learning action belongs, and $\pi_{t,i}$ is the overall retention priority of the i-th learning action within the current decision-making cycle.

[0048] If $\pi_{t,i^*} \ge \pi_{t,j^*}$, set $x_{t,j^*} = 0$; otherwise, set $x_{t,i^*} = 0$; where $i^*$ is the index of the first learning action in the action pair that contributes the most to the conflict within the current decision-making cycle, and $j^*$ is the index of the second learning action in that pair.
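The conflict repair can be sketched as a loop that repeatedly finds the most conflicting pair and removes one of its two actions. The sketch assumes that the action with the lower retention priority is the one removed and that the priority values are supplied from outside; the function names and numbers are hypothetical.

```python
def conflict_level(C, x):
    """Quadratic form x^T C x over the current selection."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))


def repair_conflicts(x, C, priority, eps):
    """Remove the lower-priority action of the worst pair until the threshold holds."""
    x = list(x)
    n = len(x)
    while conflict_level(C, x) > eps:
        best, pair = 0.0, None
        for i in range(n):
            for j in range(i + 1, n):
                q = x[i] * C[i][j] * x[j]  # pairwise conflict contribution
                if q > best:
                    best, pair = q, (i, j)
        if pair is None:
            break  # no positive pairwise contribution left to remove
        i, j = pair
        # Assumption: the action with the lower retention priority is dropped.
        x[i if priority[i] < priority[j] else j] = 0
    return x


C = [
    [0.0, 0.6, 0.1],
    [0.6, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]
priority = [0.9, 0.4, 0.7]   # hypothetical retention priorities
repaired = repair_conflicts([1, 1, 1], C, priority, eps=0.3)
```

Each pass zeroes one component, so the loop ends after at most as many passes as there are actions.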

[0049] The sixth step is to split the learning action selection result that satisfies all constraints into a guarantee-tier subset, a steady-tier subset, and a sprint-tier subset according to the sprint-steady-guarantee tier labels, and to output them. The method then returns to the first step and enters the next decision cycle.

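[0050] The learning action selection result that satisfies all constraints, i.e. $\mathbf{x}_t$, is obtained, and the set of learning actions for the current decision-making cycle is formed: $\Omega_t = \{i \in \mathcal{I} \mid x_{t,i} \ne 0\}$; where $\Omega_t$ is the set of learning action indices selected in the t-th decision cycle and $\mathcal{I}$ is the index set of the candidate learning actions.

[0051] The set is split into three subsets according to the "sprint," "steady," and "guarantee" tier labels: $\Omega_t^{\mathrm{gua}} = \Omega_t \cap \mathcal{I}_{\mathrm{gua}}$; $\Omega_t^{\mathrm{ste}} = \Omega_t \cap \mathcal{I}_{\mathrm{ste}}$; $\Omega_t^{\mathrm{spr}} = \Omega_t \cap \mathcal{I}_{\mathrm{spr}}$; where $\Omega_t^{\mathrm{gua}}$ is the set of guarantee-tier learning actions in the t-th decision cycle; $\Omega_t^{\mathrm{ste}}$ is the set of steady-tier learning actions in the t-th decision cycle; $\Omega_t^{\mathrm{spr}}$ is the set of sprint-tier learning actions in the t-th decision cycle; and $\mathcal{I}_{\mathrm{gua}}$, $\mathcal{I}_{\mathrm{ste}}$, $\mathcal{I}_{\mathrm{spr}}$ are the guarantee-tier, steady-tier and sprint-tier index subsets.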
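The tier split amounts to grouping the selected indices by their tier label. A minimal sketch, assuming string labels and a hypothetical function name `split_by_tier`:

```python
def split_by_tier(x, tiers):
    """Group the indices of selected actions by their tier label."""
    out = {"guarantee": [], "steady": [], "sprint": []}
    for i, v in enumerate(x):
        if v != 0:
            out[tiers[i]].append(i)
    return out


tiers = ["guarantee", "steady", "sprint", "guarantee"]  # label per candidate
x = [1, 0, 1, 1]                                         # repaired selection
subsets = split_by_tier(x, tiers)
```

An empty list for a tier simply means no action of that tier survived selection and repair in this cycle.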

[0052] The rolling update of the student state vector is expressed as: $\mathbf{s}_{t+1} = U(\mathbf{s}_t, \mathbf{u}_t)$; where $\mathbf{s}_{t+1}$ is the student state vector in the (t+1)th decision cycle; $\mathbf{s}_t$ is the student state vector in the t-th decision cycle; $\mathbf{u}_t$ is the feedback vector in the t-th decision cycle; and $U(\cdot)$ is the state update function, used to integrate the historical state with the latest feedback information.
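The text leaves the state update function open. As one illustrative possibility only, the sketch below blends the previous state with the latest feedback using exponential smoothing; the blending factor `alpha` and the function name `update_state` are assumptions, and the feedback vector is assumed to have the same dimension as the state.

```python
def update_state(state, feedback, alpha=0.5):
    """Blend the previous state with new feedback (assumed exponential smoothing)."""
    if len(state) != len(feedback):
        raise ValueError("state and feedback must have the same dimension")
    return [(1.0 - alpha) * s + alpha * f for s, f in zip(state, feedback)]


# Hypothetical two-feature state, e.g. mastery of two knowledge points.
next_state = update_state([0.5, 0.2], [1.0, 0.0], alpha=0.5)
```

A smaller `alpha` keeps the state closer to its history, which mirrors the stability goal of the method, while a larger value reacts faster to the latest feedback.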

[0053] In this embodiment, the following steps are employed: unified modeling → sparse collaborative decision-making → feasibility repair → tiered sprint-steady-guarantee output → execution feedback collection → state and parameter updates. After completing the parameter update, the system enters the next decision-making cycle and repeats steps one through six, thereby achieving a continuous, stable, and executable personalized learning path generation process for teaching scenarios in the intensive preparation phase.

[0054] The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any changes made based on the design principles of the present invention, or any non-creative modifications made thereon, shall fall within the scope of protection of the present invention.

Claims

1. A sprint-steady-guarantee tiered multi-agent sparse collaborative decision-making method, characterized in that it includes the following steps:
Step S1: Construct a set of candidate learning actions for the current decision-making cycle, label every candidate learning action in the set with one of the three tier attributes "sprint", "steady", and "guarantee", and estimate the expected time consumption of every candidate learning action;
Step S2: Using the candidate learning action set as the evaluation object, obtain the evaluation information, action mapping relationships and agent weights;
Step S3: Based on the evaluation information, action mapping relationships and agent weights, construct the learning action selection vector as a unified decision variable, and apply a sparsity constraint, a time budget constraint, a conflict constraint and an inertia adjustment coefficient based on the sprint-steady-guarantee tier attributes;
Step S4: Driven by the evaluation information, iteratively update the decision variable; in each iteration, calculate the aggregated residual direction and adaptive step size, introduce an inertia term to inherit the historical decision structure, and use sparse projection to control the number of learning actions, iterating until convergence;
Step S5: Check whether the iteration result meets the time budget constraint and the conflict constraint; if not, remove learning actions according to their contribution rate per unit time and their conflict contribution until all constraints are met;
Step S6: Split the learning action selection result that satisfies all constraints into a guarantee-tier subset, a steady-tier subset and a sprint-tier subset according to the sprint-steady-guarantee tier labels, and output them; then return to step S1 and enter the next decision cycle.

2. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 1, characterized in that constructing the set of candidate learning actions for the current decision cycle, labeling each candidate learning action in the set with one of the three hierarchical attributes of impulse, stability, and protection, and estimating the expected time consumption of each candidate learning action comprises the following steps:
Based on the teaching resource database, question bank, course database, error records, and assessment results, let the candidate learning action set be $\mathcal{A} = \{a_1, a_2, \dots, a_N\}$, where $N$ is the total number of candidate learning actions and $a_i$ is the $i$-th candidate learning action, $i \in \{1, 2, \dots, N\}$;
Hierarchical attribute labeling is performed on each candidate learning action in the candidate learning action set to obtain $\mathcal{A} = \mathcal{A}^{P} \cup \mathcal{A}^{S} \cup \mathcal{A}^{I}$, satisfying $\mathcal{A}^{P} \cap \mathcal{A}^{S} = \varnothing$; $\mathcal{A}^{P} \cap \mathcal{A}^{I} = \varnothing$; $\mathcal{A}^{S} \cap \mathcal{A}^{I} = \varnothing$; where $\mathcal{A}^{P}$ is the protection-layer subset, $\mathcal{A}^{S}$ is the stability-layer subset, and $\mathcal{A}^{I}$ is the impulse-layer subset;
The estimated time consumption of the candidate learning actions is expressed as $d = [d_1, d_2, \dots, d_N]^{\mathsf{T}}$, where $d$ is the duration vector of the learning actions, $d_i$ is the estimated time required to execute the $i$-th learning action, and $\mathsf{T}$ denotes the matrix transpose operation.
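
The labeling and duration estimation of claim 2 can be illustrated with a minimal Python sketch. The class names, the action names, and the minute values are hypothetical and serve only as an illustration; indices are 0-based in the code, whereas the claim counts actions from 1.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    PROTECTION = "protection"
    STABILITY = "stability"
    IMPULSE = "impulse"


@dataclass(frozen=True)
class LearningAction:
    name: str        # hypothetical identifier of the action
    layer: Layer     # exactly one layer label per action
    minutes: float   # estimated time consumption


def partition_by_layer(actions):
    """Return {layer: [indices]}; every index lands in exactly one subset."""
    subsets = {layer: [] for layer in Layer}
    for i, action in enumerate(actions):
        subsets[action.layer].append(i)
    return subsets


def duration_vector(actions):
    """Collect the estimated durations in action order."""
    return [a.minutes for a in actions]


# Illustrative data only.
actions = [
    LearningAction("review_fractions", Layer.PROTECTION, 15.0),
    LearningAction("mixed_practice", Layer.STABILITY, 20.0),
    LearningAction("challenge_problem", Layer.IMPULSE, 30.0),
    LearningAction("redo_errors", Layer.PROTECTION, 10.0),
]
subsets = partition_by_layer(actions)
d = duration_vector(actions)
```

Because each action carries a single `layer` field, the three subsets are pairwise disjoint and their union is the whole candidate set by construction.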

3. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 2, characterized in that taking the candidate learning action set as the evaluation object and obtaining the evaluation information, the action mapping relationship, and the agent weights comprises the following steps:
Construct a set of agents, denoted as $\mathcal{G} = \{g_1, g_2, \dots, g_M\}$, where $M$ is the total number of agents and $g_m$ is the $m$-th agent, $m \in \{1, 2, \dots, M\}$;
Let the student state vector be $s \in \mathbb{R}^{p}$, where $p$ is the dimension of the student state vector and $\mathbb{R}^{p}$ is the space of student state vectors;
The student state vector is used as the input of each agent, and the target or observation vector, the mapping matrix, and the agent weight coefficient are obtained;
The target or observation vector is $y_m \in \mathbb{R}^{q_m}$, where $y_m$ is the target or observation vector that the $m$-th agent is concerned with, $\mathbb{R}^{q_m}$ is the space of target or observation vectors, $q_m$ is the dimension of the target or observation vector of the $m$-th agent, and $m \in \{1, 2, \dots, M\}$;
The mapping matrix is $\Phi_m \in \mathbb{R}^{q_m \times N}$, where $\Phi_m$ is the mapping matrix between the $m$-th agent and the candidate learning actions, $\mathbb{R}^{q_m \times N}$ is the space of real matrices with $q_m$ rows and $N$ columns, $q_m$ is the dimension of the evaluation vector output by the $m$-th agent, and $N$ is the number of candidate learning actions;
The agent weight coefficient is expressed as $w_m = \mathcal{N}(r_m)$ with $r_m = f_m(s)$, and satisfies $w_m \ge 0$ and $\sum_{m=1}^{M} w_m = 1$, where $w_m$ is the weight coefficient of the $m$-th agent, $\mathcal{N}(\cdot)$ is the nonnegative normalization operator, $r_m$ is the raw weight score of the $m$-th agent, and $f_m(\cdot)$ is the weight scoring function of the $m$-th agent.
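
As an illustration of the nonnegative normalization of agent weights in claim 3, the following Python sketch clips raw scores at zero and rescales them to sum to one. The concrete form of the operator (clip, then divide by the sum) and the uniform fallback are assumptions; the claim only requires nonnegative, normalized weights.

```python
def normalize_weights(raw_scores, eps=1e-12):
    """Map raw agent scores to nonnegative weights that sum to 1.

    Assumed operator: negative scores are clipped to zero, then the
    remaining scores are divided by their sum. raw_scores must be non-empty.
    """
    clipped = [max(0.0, r) for r in raw_scores]
    total = sum(clipped)
    if total < eps:
        # Assumed fallback: no agent has a positive score, so weight
        # all agents equally.
        return [1.0 / len(raw_scores)] * len(raw_scores)
    return [c / total for c in clipped]


weights = normalize_weights([2.0, -1.0, 6.0])
```

With the illustrative scores above, the second agent receives weight zero and the remaining weight is shared in proportion 1:3 between the first and third agents.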

4. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 3, characterized in that constructing the learning action selection vector as a unified decision variable based on the evaluation information, the action mapping relationship, and the agent weights, and applying the sparsity constraint, the time budget constraint, the conflict constraint, and the inertia adjustment coefficient based on the impulse, stability, and protection hierarchical attributes comprises the following steps:
Let the learning action selection vector in the $t$-th decision cycle be $x_t = [x_{t,1}, x_{t,2}, \dots, x_{t,N}]^{\mathsf{T}}$; if the $i$-th learning action is selected in the current decision cycle, the component $x_{t,i}$ is nonzero; if the $i$-th learning action is not selected in the current decision cycle, $x_{t,i} = 0$;
The sparsity constraint is applied as $\|x_t\|_0 \le K$, where $\|\cdot\|_0$ denotes the number of nonzero components of a vector and $K$ is the preset upper limit of sparsity;
The time budget constraint is applied as $d^{\mathsf{T}} x_t \le B$, where $d$ is the duration vector of the learning actions and $B$ is the total time budget within the current decision cycle;
The conflict constraint is applied as $x_t^{\mathsf{T}} C x_t \le \tau$, where $C \in \mathbb{R}^{N \times N}$ is the conflict matrix, $\tau$ is the conflict tolerance threshold, and the element $c_{ij}$ of the conflict matrix is the intensity of the conflict when the $i$-th learning action and the $j$-th learning action coexist within the same decision cycle;
The inertia adjustment coefficient based on the impulse, stability, and protection hierarchical attributes is applied; the inertia adjustment coefficient $\beta$ is given by an expression involving the protection-layer adjustment coefficient, the stability-layer adjustment coefficient, a risk evaluation function, a stability evaluation function, and an amplitude-limiting operator.
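
The three constraints of claim 4 can be checked as in the following Python sketch. It assumes a 0/1 selection vector and a symmetric conflict matrix with zero diagonal; the function names and all numeric values are hypothetical.

```python
def count_nonzero(x, tol=1e-12):
    """Number of nonzero components (the l0 'norm')."""
    return sum(1 for v in x if abs(v) > tol)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quad_form(x, C):
    """Compute x^T C x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))


def check_constraints(x, d, C, K, budget, tau):
    """Report which of the three constraints the selection x satisfies."""
    return {
        "sparsity": count_nonzero(x) <= K,
        "time": dot(d, x) <= budget,
        "conflict": quad_form(x, C) <= tau,
    }


# Illustrative data only: actions 0, 1, and 3 are selected.
x = [1, 1, 0, 1]
d = [15.0, 20.0, 30.0, 10.0]
C = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.0],
]
report = check_constraints(x, d, C, K=3, budget=40.0, tau=1.5)
```

In this example three actions are selected (within K = 3), the total duration is 45 minutes (over the 40-minute budget), and the conflict value of 1.4 stays below the threshold of 1.5, so only the time budget constraint is violated.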

5. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 4, characterized in that iteratively updating the decision variable driven by the evaluation information and computing, in each iteration, the aggregated residual direction and the adaptive step size comprises the following steps:
Let the intermediate learning action selection vector obtained in the $n$-th iteration within the current decision cycle $t$ be $x_t^{(n)}$, which is used as the iteration variable;
For each iteration, the aggregated residual direction is computed, where $g_t^{(n)}$ denotes the aggregated residual vector of the $n$-th iteration within the current decision cycle $t$;
The adaptive step size is obtained from expressions involving the following quantities: $S_t^{(n)}$, the support set of the $n$-th iteration; $\operatorname{supp}(\cdot)$, the function returning the set of indices of the nonzero components of a vector; $H_K(\cdot)$, the hard-threshold sparse operator; $P_S(\cdot)$, the support-set projection operator acting on the vector space; $\varepsilon$, the numerical stability constant; and $\|\cdot\|_2$, the Euclidean norm.
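
The following Python sketch shows one possible way to compute an aggregated residual direction and an adaptive step size from the quantities named in claim 5. Both formulas are assumptions borrowed from normalized iterative hard thresholding, not the expressions of the claim: the direction is taken as the weighted sum of $\Phi_m^{\mathsf{T}}(y_m - \Phi_m x)$ over the agents, and the step size as the squared norm of the support-projected direction divided by its weighted image norm plus a stability constant. All function names and data are hypothetical.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_t_vec(A, r):
    """Product of the transpose of A with the vector r."""
    return [sum(A[k][j] * r[k] for k in range(len(A))) for j in range(len(A[0]))]


def aggregated_residual(x, agents):
    """Assumed form: sum over agents of w_m * Phi_m^T (y_m - Phi_m x).

    agents is a list of (weight, Phi, y) tuples.
    """
    g = [0.0] * len(x)
    for w, Phi, y in agents:
        residual = [yi - pi for yi, pi in zip(y, mat_vec(Phi, x))]
        back = mat_t_vec(Phi, residual)
        g = [gi + w * bi for gi, bi in zip(g, back)]
    return g


def support_of_top_k(v, K):
    """Indices of the (at most K) largest-magnitude nonzero entries of v."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    return {i for i in order[:K] if v[i] != 0}


def adaptive_step(g, support, agents, eps=1e-8):
    """Assumed form: ||P_S g||^2 / (sum_m w_m ||Phi_m P_S g||^2 + eps)."""
    g_s = [gi if i in support else 0.0 for i, gi in enumerate(g)]
    num = sum(v * v for v in g_s)
    den = sum(w * sum(v * v for v in mat_vec(Phi, g_s)) for w, Phi, _ in agents)
    return num / (den + eps)


# Illustrative data only: two agents, two candidate actions.
agents = [
    (0.5, [[1.0, 0.0]], [2.0]),
    (0.5, [[0.0, 1.0]], [4.0]),
]
x = [0.0, 0.0]
g = aggregated_residual(x, agents)
# Assumed fallback: with an all-zero iterate, take the support from g.
support = support_of_top_k(x, K=1) or support_of_top_k(g, K=1)
mu = adaptive_step(g, support, agents)
```

The constant `eps` plays the role of the numerical stability constant and prevents a division by zero when the projected direction lies in the null space of every mapping matrix.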

6. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 5, characterized in that introducing the inertia term to inherit the historical decision structure and using sparse projection to control the number of learning actions comprises the following steps:
Construct a candidate update vector with an inertia term, given by an expression involving the following quantities: $z_t^{(n+1)}$, the candidate update vector constructed in the $(n+1)$-th iteration within the current decision cycle $t$; $x_t^{(n-1)}$, the intermediate learning action selection vector obtained in the $(n-1)$-th iteration within the current decision cycle $t$; $\mu_t^{(n)}$, the adaptive step size of the $n$-th iteration within the current decision cycle $t$; and $\beta$, the inertia adjustment coefficient;
The number of learning actions is controlled by sparse projection, expressed as $x_t^{(n+1)} = H_K\big(z_t^{(n+1)}\big)$, where $x_t^{(n+1)}$ is the intermediate learning action selection vector obtained in the $(n+1)$-th iteration within the current decision cycle $t$ and $H_K(\cdot)$ is the hard-threshold sparse operator;
A preset iteration termination condition is applied, defined by the iteration termination threshold and the maximum number of iterations;
The iteration result is used as the learning action selection vector of the current decision cycle, $x_t = x_t^{(n^*)}$, where $n^*$ is the iteration step index at which the iteration terminates.
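
One iteration of claim 6 can be sketched in Python as follows. The hard-threshold operator, which keeps the K largest-magnitude entries, follows the usual definition; the candidate update is assumed here to take a heavy-ball form, current iterate plus step size times direction plus inertia coefficient times the difference of the last two iterates, and the stopping rule is assumed to compare the Euclidean change between iterates with a threshold. Neither assumed form is confirmed by the claim text.

```python
import math


def hard_threshold(z, K):
    """Keep the K largest-magnitude entries of z and zero the rest."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)
    keep = set(order[:K])
    return [zi if i in keep else 0.0 for i, zi in enumerate(z)]


def inertial_step(x_curr, x_prev, g, mu, beta, K):
    """Assumed heavy-ball update followed by sparse projection."""
    z = [
        xc + mu * gi + beta * (xc - xp)
        for xc, xp, gi in zip(x_curr, x_prev, g)
    ]
    return hard_threshold(z, K)


def should_stop(x_new, x_old, n, eps_stop=1e-6, n_max=100):
    """Assumed rule: small change between iterates, or iteration cap reached."""
    change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x_old)))
    return change <= eps_stop or n + 1 >= n_max


# Illustrative data only.
x_next = inertial_step(
    x_curr=[1.0, 0.0, 0.5],
    x_prev=[0.5, 0.0, 0.5],
    g=[0.0, 0.2, -0.5],
    mu=1.0,
    beta=0.5,
    K=1,
)
```

In the example the inertia term pushes the first component from 1.0 to 1.25 because it grew in the previous iteration, and the projection with K = 1 keeps only that component.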

7. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 6, characterized in that checking whether the iteration result satisfies the time budget constraint and, if not, removing the corresponding learning actions according to the contribution rate per unit time until the constraint is satisfied comprises the following steps:
When $d^{\mathsf{T}} x_t \le B$ holds, where $d$ is the duration vector of the learning actions and $B$ is the total time budget within the current decision cycle, the iteration result satisfies the time budget constraint; otherwise, time budget repair is performed;
Let the contribution rate vector per unit time be $\rho_t$, whose $i$-th component $\rho_{t,i}$ is the contribution rate per unit time of the $i$-th learning action within the current decision cycle $t$;
The learning actions are sorted by their contribution rate per unit time $\rho_{t,i}$ in ascending order to obtain a sorted set;
Traversing the sorted set from front to back, the corresponding learning action is removed in turn and the total time consumption is recomputed, until $d^{\mathsf{T}} x_t \le B$ is satisfied.
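
The time budget repair of claim 7 can be sketched in Python as below. The contribution rate per unit time is assumed to be a per-action contribution score divided by the estimated duration; the `contribution` values are hypothetical inputs, and all durations are assumed to be positive.

```python
def repair_time_budget(x, d, contribution, budget):
    """Drop selected actions, lowest contribution per minute first,
    until the total duration fits the budget. Returns a new vector."""
    x = list(x)
    selected = [i for i, v in enumerate(x) if v != 0]
    # Ascending order of the assumed rate: contribution / duration.
    order = sorted(selected, key=lambda i: contribution[i] / d[i])
    for i in order:
        if sum(di * xi for di, xi in zip(d, x)) <= budget:
            break
        x[i] = 0
    return x


# Illustrative data only.
x = [1, 1, 0, 1]
d = [15.0, 20.0, 30.0, 10.0]
contribution = [3.0, 2.0, 9.0, 4.0]
repaired = repair_time_budget(x, d, contribution, budget=30.0)
```

Here the selected actions have rates 0.2, 0.1, and 0.4 per minute, so the second action (index 1) is removed first; the remaining total of 25 minutes fits the 30-minute budget and the loop stops.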

8. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 6, characterized in that checking whether the iteration result satisfies the conflict constraint and, if not, removing the corresponding learning actions according to their conflict contribution until the constraint is satisfied comprises the following steps:
When $x_t^{\mathsf{T}} C x_t \le \tau$ holds, where $C$ is the conflict matrix and $\tau$ is the conflict tolerance threshold, the conflict constraint is satisfied; otherwise, conflict repair is performed;
Let the conflict contribution matrix be $Q_t = \operatorname{diag}(x_t)\, C \,\operatorname{diag}(x_t)$, where $Q_t$ is the conflict contribution matrix within the current decision cycle $t$ and $\operatorname{diag}(\cdot)$ denotes the diagonal matrix formed from a vector;
The elements of the conflict contribution matrix are $q_{t,ij} = x_{t,i}\, c_{ij}\, x_{t,j}$, where $q_{t,ij}$ is the contribution of the $i$-th learning action and the $j$-th learning action to the overall conflict level when they coexist within the $t$-th decision cycle, $x_{t,i}$ is the $i$-th component of the learning action selection vector of the $t$-th decision cycle, and $x_{t,j}$ is the $j$-th component of the learning action selection vector of the $t$-th decision cycle;
Select the action pair that contributes the most to the conflict: $(i^*, j^*) = \arg\max_{(i,j)} q_{t,ij}$;
Define a comprehensive contribution judgment function based on the layer weight parameter corresponding to the impulse, stability, or protection layer to which the $i$-th learning action belongs and on the overall retention priority of the $i$-th learning action within the current decision cycle;
The judgment function values of the two actions of the pair are compared, and depending on the result either the action with index $i^*$ or the action with index $j^*$ is removed, where $i^*$ is the index of the first learning action and $j^*$ is the index of the second learning action in the action pair that contributes the most to the conflict within the current decision cycle.
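
The conflict repair of claim 8 can be sketched in Python as follows. The pairwise contribution $x_i c_{ij} x_j$ follows the definition in the claim; the removal rule used here, dropping whichever action of the worst pair has the lower retention score, is an assumption, as is the `retention` vector itself, which stands in for the comprehensive contribution judgment function.

```python
def repair_conflicts(x, C, tau, retention):
    """Remove actions from the most conflicting pair until x^T C x <= tau.

    retention[i] is an assumed per-action score (for example a layer
    weight multiplied by a priority); the lower-scored action is removed.
    """
    x = list(x)
    n = len(x)
    while True:
        total = sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))
        if total <= tau:
            return x
        # Contribution of every unordered pair of distinct actions.
        pairs = [
            (x[i] * C[i][j] * x[j], i, j)
            for i in range(n)
            for j in range(i + 1, n)
        ]
        if not pairs or max(pairs)[0] <= 0:
            return x  # no pairwise conflict left that removal could reduce
        _, i_star, j_star = max(pairs)
        if retention[i_star] >= retention[j_star]:
            x[j_star] = 0
        else:
            x[i_star] = 0


# Illustrative data only.
x = [1, 1, 0, 1]
C = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.0],
]
retention = [3.0, 1.0, 2.0, 2.0]
repaired = repair_conflicts(x, C, tau=0.5, retention=retention)
```

Each pass zeroes one nonzero component, so the loop ends after at most as many passes as there are selected actions. In the example the pair (0, 1) contributes the most, action 1 has the lower retention score and is removed, and the remaining conflict value is zero.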

9. The hierarchical multi-agent sparse collaborative decision-making method for impulse, stability, and protection according to claim 8, characterized in that splitting the learning action selection result that satisfies all constraints into the protection-layer subset, the stability-layer subset, and the impulse-layer subset according to the impulse, stability, and protection layer labels and outputting the subsets comprises the following steps:
Obtain the learning action selection result $x_t$ that satisfies all constraints, and form the set of learning actions of the current decision cycle, $\mathcal{S}_t = \{\, i \mid x_{t,i} \neq 0 \,\}$, where $\mathcal{S}_t$ is the set of indices of the learning actions selected in the $t$-th decision cycle;
The set is split into three subsets according to the impulse, stability, and protection layer labels: $\mathcal{S}_t^{P} = \{\, i \in \mathcal{S}_t \mid a_i \in \mathcal{A}^{P} \,\}$; $\mathcal{S}_t^{S} = \{\, i \in \mathcal{S}_t \mid a_i \in \mathcal{A}^{S} \,\}$; $\mathcal{S}_t^{I} = \{\, i \in \mathcal{S}_t \mid a_i \in \mathcal{A}^{I} \,\}$; where $\mathcal{S}_t^{P}$ is the set of protection-layer learning actions in the $t$-th decision cycle, $\mathcal{S}_t^{S}$ is the set of stability-layer learning actions in the $t$-th decision cycle, and $\mathcal{S}_t^{I}$ is the set of impulse-layer learning actions in the $t$-th decision cycle;
The rolling update of the student state vector is expressed as $s_{t+1} = \Psi(s_t, e_t)$, where $s_{t+1}$ is the student state vector in the $(t+1)$-th decision cycle, $s_t$ is the student state vector in the $t$-th decision cycle, $e_t$ is the feedback vector in the $t$-th decision cycle, and $\Psi(\cdot)$ is the state update function.
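
The layered output of claim 9 can be sketched in Python as follows. The string labels and the example data are hypothetical; indices are 0-based in the code. The state update function is left out because the claim does not fix its form.

```python
def split_by_layer(x, layers):
    """Group the indices of selected actions by their layer label."""
    result = {"protection": [], "stability": [], "impulse": []}
    for i, (xi, layer) in enumerate(zip(x, layers)):
        if xi != 0:
            result[layer].append(i)
    return result


# Illustrative data only: actions 0, 2, and 3 are selected.
x = [1, 0, 1, 1]
layers = ["protection", "stability", "impulse", "protection"]
plan = split_by_layer(x, layers)
```

A layer with no selected action yields an empty list, so the output always contains all three subsets even when the repaired plan leaves one layer unused.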