A Multi-Agent Cooperative Task Allocation Method and System Based on Large Models
By acquiring task description text and agent capability data, a semantic blocking risk index is constructed. The observation window length is dynamically adjusted using a time-series prediction model, which solves the problem of insufficient risk assessment of task link blocking propagation in traditional methods and improves the resource utilization of multi-agent systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO ASIDUN ENG TECH TRANSFER CO LTD
- Filing Date
- 2026-06-04
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional multi-agent task allocation methods struggle to effectively assess and mitigate the risk of blocking propagation along task chains in scenarios with complex task dependencies and semantic relationships, resulting in low system resource utilization.
By acquiring task description text, task dependency graph, agent capability description and historical execution data, the semantic blocking strength of the task and the exclusivity index of the agent are determined. A semantic blocking risk index of the combination of the task to be assigned and the candidate agent is constructed, and the observation window length is dynamically adjusted using a time series prediction model to perform task assignment.
It achieves a scheduling shift from local optima to global collaborative optima, effectively reducing the blocking probability in multi-agent systems and improving system resource utilization.
Figure CN122309095A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and distributed computing technology, specifically to a method and system for multi-agent collaborative task allocation based on a large model.
Background Technology
[0002] With the rapid development of artificial intelligence technology, multi-agent collaborative systems driven by large language models are widely used in complex scenarios such as distributed development, cloud-edge collaborative computing, and automated process management. In multi-agent collaborative systems, agents need to collaborate to complete a series of tasks with complex dependencies. These tasks typically have explicit pre- and post-constraints, and the overall process exhibits characteristics such as strong dependencies, long links, semantic coupling, and dynamic blocking propagation.
[0003] Traditional multi-agent task allocation methods primarily employ static rule matching, simple load balancing, or greedy allocation strategies based on the optimal utility of a single agent. These methods typically rely on explicit metrics such as the agent's real-time load, computing power, and estimated task execution time for decision-making. However, in scenarios with complex task dependencies and semantic relationships, traditional methods struggle to effectively assess and mitigate the risk of blocking propagation along the task chain, leading to low system resource utilization.
Summary of the Invention
[0004] To address the technical problem of low system resource utilization caused by the difficulty of effectively assessing and avoiding the risk of blocking propagation in task chains under scenarios with complex task dependencies and semantic relationships, the present invention aims to provide a multi-agent collaborative task allocation method and system based on a large model. The specific technical solution adopted is as follows: In a first aspect, the present invention provides a multi-agent collaborative task allocation method based on a large model. The method includes: acquiring task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agents output by the large model; determining the semantic blocking intensity of each task based on the task description text; determining the exclusivity index of each agent based on the agent capability description text and the agent's historical execution data; determining the semantic transmission capability of each task as a prerequisite task based on the task dependency graph, the semantic embedding vector of the task, and the semantic blocking intensity of the task; constructing a semantic blocking risk index for the combination of the task to be allocated and the candidate agents based on the semantic blocking intensity and semantic transmission capability of all prerequisite tasks of the task to be allocated, the exclusivity index of the candidate agents, and the real-time load data of the candidate agents; adjusting the observation window length of the time-series prediction model based on the global semantic blocking risk index, and using the adjusted time-series prediction model to predict the future load and blocking trend of each agent; and allocating tasks based on the prediction results and the semantic blocking risk index; the global semantic blocking risk index is the maximum value among the semantic blocking risk indices of all candidate agents at the current time.
[0005] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: segmenting and filtering stop words in the description text of each task, then calculating word frequencies to construct a word frequency vector for each task; clustering all word frequency vectors according to a preset clustering algorithm to determine multiple semantic clusters, and calculating the intra-cluster sum of squares for each semantic cluster; dividing the description text of each task into continuous blocking semantic units according to preset rules for the continuous occurrence of blocking words; determining the uniformity of blocking influence of blocking words in continuous blocking semantic units based on the frequency of occurrence of preset blocking words in continuous blocking semantic units and the co-occurrence relationship between different preset blocking words; and determining the semantic blocking intensity of each task through weighted fusion calculation based on the intra-cluster sum of squares of each semantic cluster, the uniformity of blocking influence of each continuous blocking semantic unit, and the number of preset blocking words contained in each continuous blocking semantic unit.
[0006] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: counting the frequency of occurrence of preset exclusive words in the ability description text of each agent, and calculating the exclusiveness tendency score of each agent based on the total number of words in the description text; determining the parallel efficiency ratio based on the ratio of the first average response time when the agent performs a single task to the second average response time of each task when performing multiple tasks; performing sliding window statistics on the parallel efficiency ratios corresponding to a preset number of historical parallel execution records to obtain the median and interquartile range; performing extreme value truncation processing on the current parallel efficiency ratio based on the median and interquartile range to determine the corrected parallel efficiency factor; and normalizing the exclusiveness tendency score and the corrected parallel efficiency factor respectively, and then calculating and determining the exclusiveness index of each agent according to a preset fusion rule.
[0007] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: obtaining all downstream descendant tasks of the target task from the task dependency graph; downstream descendant tasks refer to tasks that can be reached from the target task along the dependency direction; the target task is any task in each task; obtaining the semantic embedding vector of the target task, and calculating the similarity between the semantic embedding vector of the target task and each downstream descendant task; applying depth decay to the similarity based on the shortest path length from the target task to each downstream descendant task to obtain the decayed semantic dependency strength; the decay coefficient of the depth decay is determined based on the correlation between the semantic similarity of directly dependent tasks and the execution time interval in historical data; determining the strength distribution characteristics based on the decayed semantic dependency strength of all downstream descendant tasks, and constructing basic transitivity based on the strength distribution characteristics; weighting and correcting the basic transitivity based on the semantic blocking strength of all downstream descendant tasks and the global maximum semantic blocking strength to determine the semantic transitivity when the target task is a preceding task; and determining the semantic transitivity when each task is a preceding task.
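The first steps of this implementation (collecting downstream descendants, computing similarity, and applying depth decay along the shortest path) can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is used for the unspecified "similarity", the decay is taken to be geometric (`decay ** depth`), and the coefficient `0.5` is a placeholder rather than a value derived from historical data. All function names are hypothetical.

```python
from collections import deque
import math


def descendant_depths(graph, target):
    """Breadth-first search: shortest path length from target to each descendant."""
    depths = {}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            if child != target and child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


def cosine_similarity(a, b):
    """Cosine similarity (assumed choice; the text only says 'similarity')."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decayed_dependency_strengths(graph, embeddings, target, decay=0.5):
    """Similarity to each descendant, attenuated geometrically by depth (assumed form)."""
    depths = descendant_depths(graph, target)
    return {
        task: cosine_similarity(embeddings[target], embeddings[task]) * decay ** depth
        for task, depth in depths.items()
    }
```

Here `graph` is an adjacency list mapping each task to the tasks that directly depend on it, and `embeddings` maps each task to its semantic embedding vector.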
[0008] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: acquiring all the preceding tasks of the task to be assigned; determining the blocking propagation contribution value of each preceding task based on the semantic transmission capability and semantic blocking intensity of each preceding task; determining the maximum value among the blocking propagation contribution values as the cumulative semantic pressure of the preceding chain; determining the load factor based on the sum of the remaining processing time in the real-time load data of the candidate agents; and determining the semantic blocking risk index based on the cumulative semantic pressure of the preceding chain, the exclusivity index of the candidate agents, and the load factor.
[0009] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: obtaining the semantic blocking risk index of all tasks awaiting assignment and their respective candidate agents at the current moment; determining the maximum value among the semantic blocking risk indices as the global semantic blocking risk index; establishing a dynamic mapping relationship between the global semantic blocking risk index and the observation window length of the time-series prediction model; the dynamic mapping relationship causes the observation window length to decrease when the value of the global semantic blocking risk index increases, and the observation window length to increase when the value of the global semantic blocking risk index decreases; and adjusting the observation window length of the time-series prediction model according to the dynamic mapping relationship.
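One possible shape of such a dynamic mapping is sketched below. The text only requires that the window shrink as the global index rises and grow as it falls; the linear form, the assumption that the index lies in [0, 1], and the bounds `min_len=8` and `max_len=64` are illustrative choices, not values from the invention.

```python
def global_risk_index(risk_by_pair):
    """Maximum semantic blocking risk index over all (task, agent) pairs."""
    return max(risk_by_pair.values())


def observation_window_length(global_risk, min_len=8, max_len=64):
    """Map a risk index to a window length: higher risk gives a shorter window.

    Assumes the index is scaled to [0, 1]; values outside are clamped.
    The linear interpolation and the bounds are placeholders.
    """
    risk = min(max(global_risk, 0.0), 1.0)
    return round(max_len - risk * (max_len - min_len))
```

A shorter window under high risk lets the predictor react to recent load changes, while a longer window under low risk smooths out short-term noise.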
[0010] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: adjusting the residual correction stage of the time-series prediction model by weighting it according to the semantic blocking risk index of each task to be assigned and the candidate agent combination, and constructing an improved time-series prediction model that integrates semantic features; predicting the historical load time-series data of each agent according to the improved time-series prediction model, and determining the load change trend, blocking probability, and task execution duration of each agent in the future time period.
[0011] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: assigning tasks that meet preset conditions to candidate agents whose exclusivity index is less than a third preset threshold, whose load factor is less than a fourth preset threshold, and whose blocking probability is less than a fifth preset threshold; the preset conditions are that the semantic blocking intensity is greater than a first preset threshold and the cumulative semantic pressure of the preceding chain is greater than a second preset threshold; assigning tasks that do not meet the preset conditions to candidate agents according to the load balancing principle; and adjusting the execution priority and allocation sequence of each task queue according to the predicted load and blocking trends.
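The threshold-based selection described here can be sketched as a simple filter. The `Candidate` class, the threshold dictionary keys, and the reading of "load balancing principle" as "pick the least-loaded agent" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    exclusivity: float           # exclusivity index of the agent
    load_factor: float           # derived from remaining processing time
    blocking_probability: float  # predicted by the time-series model


def eligible_agents(task_intensity, chain_pressure, candidates, th):
    """Return the candidate agents a task may be assigned to.

    th holds the five preset thresholds under hypothetical keys:
    'intensity', 'pressure', 'exclusivity', 'load', 'blocking'.
    """
    high_risk = task_intensity > th["intensity"] and chain_pressure > th["pressure"]
    if not high_risk:
        # Assumed load-balancing rule: choose the least-loaded agent.
        return sorted(candidates, key=lambda c: c.load_factor)[:1]
    return [
        c for c in candidates
        if c.exclusivity < th["exclusivity"]
        and c.load_factor < th["load"]
        and c.blocking_probability < th["blocking"]
    ]
```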
[0012] In conjunction with the first aspect mentioned above, in one possible implementation, the method specifically includes: extracting task description text, task dependency graph, agent capability description text, and semantic embedding vector of the task from the large model parsing output; extracting historical task execution sequence, completion status, blocking events, and concurrent execution performance indicators of the agent from the system operation log; and obtaining the task queue length and remaining processing time of each agent from the real-time monitoring module.
[0013] In a second aspect, the present invention provides a multi-agent cooperative task allocation system based on a large model. The system includes: a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, it implements the steps of any of the methods described above.
[0014] The present invention has the following beneficial effects: This invention obtains task description text, task dependency graph, agent capability description, historical execution data, and real-time load to sequentially determine the semantic blocking strength of the task, the exclusivity index of the agent, and the semantic transmission capability of the task as a prerequisite. It then constructs a semantic blocking risk index for combinations of tasks to be assigned and candidate agents, and dynamically adjusts the observation window length of the time-series prediction model based on the global risk index. Task allocation is then performed using the prediction results. The overall process organically integrates the semantic risk of the task, the exclusivity of the agent, and the real-time load, achieving a scheduling transformation from local optimum to global collaborative optimum, effectively reducing the blocking probability in multi-agent systems. This solves the technical problem of existing methods struggling to effectively assess and avoid the risk of blocking propagation along the task chain in scenarios with complex task dependencies and semantic relationships, leading to low system resource utilization.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart illustrating a multi-agent cooperative task allocation method based on a large model, provided in one embodiment of the present invention.
Detailed Implementation
[0017] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of the multi-agent cooperative task allocation method and system based on a large model proposed in this invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0019] The specific scheme of the multi-agent cooperative task allocation method and system based on a large model provided by the present invention will be described in detail below with reference to the accompanying drawings.
[0020] Please refer to Figure 1, which illustrates a flowchart of a multi-agent cooperative task allocation method based on a large model provided by an embodiment of the present invention. The method includes: S101. Obtain the task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agent from the output of the large model.
[0021] Optionally, task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agent can be collected from the large model parsing output terminal and the system operation log and real-time monitoring module, respectively.
[0022] In some embodiments of the present invention, obtaining the task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agent from the output of the large model includes: extracting the task description text, task dependency graph, agent capability description text, and semantic embedding vector of the task from the parsed output of the large model; extracting the historical task execution sequence, completion status, blocking events, and concurrent execution performance indicators of the agent from the system operation log; and obtaining the task queue length and remaining processing time of each agent from the real-time monitoring module.
[0023] Specifically, it receives the parsed output generated by the large model for the current collaborative task scenario, and extracts the description text of each task, the dependency graph between tasks, the capability description text of each agent, and the semantic embedding vector corresponding to each task from the output; it reads the running log files generated by each agent and the task scheduling module, and extracts the time sequence information, status information, and concurrent performance indicators of the agents related to the execution of historical tasks; and it collects the current task queue length of each agent and the remaining processing time of each task in the queue in real time through the monitoring agent deployed on each agent node or through the status query interface of the central scheduler.
[0024] For example, after a large model completes the parsing of user needs or project plans, its output contains multiple fields: the name and detailed description of each task, the directed dependencies between tasks (e.g., task B can only begin after task A is completed), the functional description or capability declaration of each agent, and a fixed-dimensional numerical vector generated after each task passes through the large model's embedding layer. By parsing these fields, we obtain a set of task description texts, a task dependency graph stored in the form of an adjacency list or edge list, a set of agent capability description texts, and a semantic embedding vector corresponding to each task.
[0025] For completed tasks, the start time, end time, completion status (success or failure), and whether any blocking events occurred during task execution (e.g., waiting due to resource unavailability or unmet dependencies) are retrieved from the logs. For agents, records of queue length changes across all historical task executions are filtered out. The response time series of an agent executing a task alone is calculated within a time window with a queue length of 1. Simultaneously, the average response time of each task when executing multiple tasks concurrently is calculated within a time window with a queue length greater than 1. This data is stored in a time-series database indexed by timestamps, forming a historical performance profile for each agent.
[0026] Each time a task allocation decision is triggered, a status query request is sent to all candidate agents. Each agent returns the number of tasks in its current task queue that have not yet started or completed (i.e., queue length), and the estimated processing time required for each task in the queue (i.e., remaining processing time). The collected queue lengths and remaining processing times are aggregated to form a real-time load snapshot for each agent.
[0027] S102. Determine the semantic blocking strength of each task based on the task description text.
[0028] Optionally, natural language processing techniques can be used to perform semantic analysis on the description text of each task. By means of clustering, blocking word feature extraction and weighted fusion, the inherent risk of the task causing blocking at the semantic level can be quantified.
[0029] For example, firstly, word segmentation and stop word filtering are performed on the description text of each task, and the corresponding word frequency vector is constructed after counting word frequencies. Then, a pre-defined clustering algorithm is used to cluster the word frequency vectors of all tasks, grouping tasks with similar word usage patterns into the same semantic cluster, and calculating the intra-cluster sum of squares for each cluster, which reflects the consistency of word frequency distribution within the cluster. Within the description text of each task, several consecutive blocking semantic units are divided according to predefined rules for the continuous occurrence of blocking words (such as "wait," "dependency," "block," "synchronization," etc.). For each consecutive unit, the frequency of occurrence of each blocking word and the co-occurrence relationship between different blocking words are analyzed to determine the uniformity of blocking influence of each blocking word in the unit. Finally, the semantic blocking strength of the task is calculated by weighted fusion, combining the intra-cluster sum of squares of the cluster to which each task belongs, the uniformity of blocking influence of each consecutive unit of the task, and the number of blocking words contained in each unit.
[0030] Further, in some embodiments of the present invention, determining the semantic blocking intensity of each task according to the task description text includes: performing word segmentation and stop word filtering on the description text of each task, and then counting the word frequencies to construct a word frequency vector for each task; clustering all the word frequency vectors according to a preset clustering algorithm to determine multiple semantic clusters, and calculating the within-cluster sum of squares for each semantic cluster; in the description text of each task, dividing out continuous blocking semantic units according to the continuous occurrence rule of preset blocking words; determining the uniformity of blocking influence of the blocking words in the continuous blocking semantic unit according to the occurrence frequency of the preset blocking words in the continuous blocking semantic unit and the co-occurrence relationship between different preset blocking words; and determining the semantic blocking intensity of each task through weighted fusion calculation based on the within-cluster sum of squares of each semantic cluster, the uniformity of blocking influence of each continuous blocking semantic unit, and the number of preset blocking words included in each continuous blocking semantic unit.
[0031] Specifically, read the description text of each task. First, use a word segmentation tool to segment the text into an independent sequence of words, and then filter out the stop words according to a preset stop word list (such as common words without actual semantic contribution like "of", "already", "is", etc.). For each remaining word after filtering, count its occurrence times in the current task description text to obtain the original word frequency value. Obtain the total number of words in each task description text after word segmentation and stop word filtering. If the total number of words is greater than zero, divide the original word frequency value of each word by the total number of words to obtain the normalized word frequency; if the total number of words is zero (i.e., the task description text is empty or only contains stop words), directly set the word frequency vector of this task to a zero vector. Arrange the normalized word frequency values or zero vectors in the order of the preset global vocabulary list to form the word frequency vector corresponding to this task. Each dimension in this vector corresponds to a specific word, and the dimension value is the normalized frequency (or zero) of this word in the current task description.
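The construction of the word frequency vector can be illustrated with a short sketch. Whitespace splitting stands in for a real word segmentation tool, and the stop word list and global vocabulary below are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical stop word list and global vocabulary, for illustration only.
STOP_WORDS = {"the", "of", "is", "a", "for", "already"}
VOCABULARY = ["wait", "upstream", "build", "lock", "deploy"]


def word_frequency_vector(text, vocabulary=VOCABULARY, stop_words=STOP_WORDS):
    """Return the normalized word frequency vector of one task description."""
    # Whitespace splitting is a stand-in for a proper segmentation tool.
    tokens = [t for t in text.lower().split() if t not in stop_words]
    total = len(tokens)
    if total == 0:
        # Empty text, or text containing only stop words: zero vector.
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    # Each dimension follows the order of the global vocabulary list.
    return [counts[word] / total for word in vocabulary]
```

For the description "wait for the upstream build wait", four words remain after filtering, so "wait" receives 0.5 and "upstream" and "build" receive 0.25 each.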
[0032] Take the word frequency vectors of all tasks as input and perform clustering using a preset clustering algorithm (such as the K-means clustering algorithm). After clustering, each word frequency vector is assigned to a unique semantic cluster, and tasks in the same cluster have similar patterns in word frequency distribution. For each cluster, calculate its within-cluster sum of squares, that is, the sum of the squared distances between all word frequency vectors in the cluster and the centroid vector of the cluster. The smaller the within-cluster sum of squares, the more consistent the word frequency distribution within the cluster and the more concentrated the semantic pattern. A larger within-cluster sum of squares, when the number of samples within the cluster is also relatively large, indicates that the semantic pattern represented by the cluster has a high degree of typicality.
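The within-cluster sum of squares itself is straightforward to compute once cluster labels are available. The sketch below assumes the labels were already produced by some clustering step (for example K-means) and only evaluates the sum of squared distances to each centroid; the function name is hypothetical.

```python
def within_cluster_sum_of_squares(vectors, labels):
    """Map each cluster label to its within-cluster sum of squares."""
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)

    wcss = {}
    for label, members in clusters.items():
        dim = len(members[0])
        # Centroid: component-wise mean of the cluster's vectors.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # Sum of squared Euclidean distances to the centroid.
        wcss[label] = sum(
            (m[d] - centroid[d]) ** 2 for m in members for d in range(dim)
        )
    return wcss
```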
[0033] A pre-defined set of blocking vocabularies is established, containing words such as wait, depend, block, synchronize, lock, mutual exclusion, and exclusive. For each task's description text, the vocabulary is scanned sequentially to determine if each word belongs to the pre-defined blocking vocabulary set. When the number of non-blocking words between two blocking words does not exceed a pre-defined threshold (e.g., two), these blocking words and the non-blocking words between them are grouped into a single continuous blocking semantic unit. For isolated blocking words, the blocking word, along with one non-blocking word on each side, is defined as an independent continuous unit. Through these rules, the description text of each task is divided into several groups of continuous blocking semantic units, each unit representing a continuous segment of blocking-related semantics in the task description.
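The division rules above can be sketched as follows. The blocking vocabulary is a small hypothetical subset, the input is assumed to be already segmented into tokens, and `max_gap=2` mirrors the example threshold in the text.

```python
# Hypothetical subset of the preset blocking vocabulary.
BLOCKING_WORDS = {"wait", "depend", "block", "synchronize", "lock", "exclusive"}


def split_blocking_units(tokens, blocking_words=BLOCKING_WORDS, max_gap=2):
    """Divide a token sequence into continuous blocking semantic units."""
    positions = [i for i, t in enumerate(tokens) if t in blocking_words]
    if not positions:
        return []

    # Group blocking-word positions separated by at most max_gap other words.
    runs = [[positions[0]]]
    for pos in positions[1:]:
        if pos - runs[-1][-1] - 1 <= max_gap:
            runs[-1].append(pos)
        else:
            runs.append([pos])

    units = []
    for run in runs:
        if len(run) == 1:
            # Isolated blocking word: keep one neighbouring word on each side.
            start = max(run[0] - 1, 0)
            end = min(run[0] + 1, len(tokens) - 1)
        else:
            start, end = run[0], run[-1]
        units.append(tokens[start:end + 1])
    return units
```

For the tokens of "must wait for the lock before deploy then synchronize output", "wait" and "lock" are two words apart and form one unit, while "synchronize" is isolated and is taken together with its two neighbours.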
[0034] For each consecutive blocking semantic unit, the frequency of occurrence of each preset blocking word is counted, and the normalized word frequency is calculated for each position within the unit. Further analysis is conducted on the co-occurrence of different blocking words at the same position: if multiple different blocking words appear simultaneously at a position, the mutual interference between these words weakens the independent contribution of each individual word. Based on a preset decay rule (e.g., exponential decay), co-occurrence interference correction is applied to the word frequency of each blocking word at each position, and then the fluctuation degree of the corrected word frequency across all positions within the unit is calculated. The smaller the fluctuation degree, the more uniform the distribution of the word's occurrence throughout the unit, the stronger its dominance, and the smaller its blocking influence uniformity value; conversely, the larger the fluctuation degree, the greater the impact.
[0035] For example, the uniformity of blocking influence satisfies the following formula:
$$U_{i,k,g}=\sqrt{\frac{1}{N_g}\sum_{p=1}^{N_g}\left[\left(f_{i,k,g}(p)-\bar{f}_{i,k,g}\right)\cdot\frac{1}{|W|-1}\sum_{j\neq i}e^{-\lambda\,d_{j}(p)}\right]^{2}}$$
where $U_{i,k,g}$ is the uniformity of blocking influence of the $i$-th blocking word in the $g$-th continuous blocking semantic unit of the $k$-th semantic cluster; $w_i$ is the $i$-th blocking word in the preset blocking vocabulary set $W$; the $k$-th semantic cluster is obtained by K-means clustering; the $g$-th unit is the $g$-th group of continuous blocking semantic units in the description text of a given task; $N_g$ is the total number of positions in the continuous blocking semantic unit (i.e., the number of words contained within the unit); $p$ is the index of each position (word number) within the $g$-th unit; $f_{i,k,g}(p)$ is the normalized word frequency of the $i$-th blocking word at position $p$ of the $g$-th unit of the $k$-th cluster, representing the relative frequency of the blocking word appearing at that position; $\bar{f}_{i,k,g}$ is the average word frequency over all $N_g$ positions within the $g$-th unit; $|W|$ is the total number of words in the preset blocking vocabulary set; $j$ is the index of the other candidate blocking words besides $w_i$; $d_{j}(p)$ indicates whether the $j$-th blocking word appears at position $p$ (1 indicates it appears, 0 indicates it does not appear); $\lambda$ is a hyperparameter with a value of 3, used to adjust the intensity of exponential decay; and $e^{(\cdot)}$ is the exponential function with base $e$, used to attenuate co-occurring interference terms.
[0036] The first factor calculates the degree to which the word frequency at this position deviates from the unit mean. A positive value indicates that the blocking word appears more frequently than the average, while a negative value indicates that it appears less frequently than the average. The second factor averages and decays the co-occurrence of other blocking words at this position. If no other blocking words appear (all d = 0), this term is 1; if many other blocking words appear, the contribution of each appearing word is $e^{-\lambda}$. In this scheme the decay hyperparameter is $\lambda = 3$, so $e^{-3} \approx 0.05$ and the averaged sum is much less than 1, which weakens the interference of the co-occurrence of different blocking words at this position on the dominance calculation of the current blocking word. The two factors are multiplied and squared, the squares are summed over all positions and divided by the number of positions in the unit, and the square root of the result is taken. The overall result is the root mean square of the unique contribution of the blocking word at each position within the unit, representing the degree to which the word dominates the blocking semantics. The square root compresses the overall scale, making the result smoother. The larger the value, the stronger the dominance of the blocking word (and the less interference from other words).
[0037] It should be understood that setting the decay hyperparameter to 3 ensures that other blocking words appearing at the same time and in the same position as the current word do not cause significant interference, while allowing certain collaborative semantic information to be preserved in special cases (such as when multiple blocking words appear closely and consecutively). This value has been verified by a large number of experiments and allows the uniformity of blocking influence to indicate the dominance of words well while maintaining numerical stability. It may be adjusted later according to observed results, and this invention does not limit it.
[0038] It should be noted that the denominator $|W| - 1$ in the above formula, where $|W|$ is the total number of words in the preset blocking vocabulary set, requires $|W| \geq 2$. This means that the preset blocking vocabulary set must contain at least two different words; otherwise, the fraction is undefined. In this embodiment of the invention, a set containing multiple blocking words (as in the example above) is used to ensure the mathematical validity of the formula.
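The computation described in the preceding paragraphs (deviation from the unit mean, multiplied by the averaged exponential decay of co-occurring blocking words, then squared, averaged over positions, and square-rooted) can be sketched as follows. The function and parameter names are hypothetical, and the inputs are assumed to be precomputed per position.

```python
import math


def blocking_influence_uniformity(freqs, others_present, decay=3.0):
    """Root mean square of damped, mean-centered word frequencies.

    freqs[p]: normalized frequency of the current blocking word at position p.
    others_present[p]: 0/1 flags, one per other blocking word in the preset set.
    decay: the exponential decay hyperparameter (3 in the text).
    """
    n = len(freqs)
    if n == 0:
        raise ValueError("a unit must contain at least one position")
    mean = sum(freqs) / n
    total = 0.0
    for f, flags in zip(freqs, others_present):
        if not flags:
            # Matches the requirement of at least two words in the preset set.
            raise ValueError("the preset set needs at least two blocking words")
        # Equals 1 when no other blocking word co-occurs at this position.
        damping = sum(math.exp(-decay * d) for d in flags) / len(flags)
        total += ((f - mean) * damping) ** 2
    return math.sqrt(total / n)
```

With frequencies `[1.0, 0.0]` and no co-occurring words the result is 0.5; marking co-occurring words at the first position damps that position's term and lowers the result.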
[0039] Finally, the semantic cluster to which each task belongs is determined, and the intra-cluster sum of squares for that cluster is obtained. For each consecutive blocking semantic unit of a task, the uniformity of blocking impact corresponding to that unit and the number of preset blocking words contained within that unit are obtained. First, Softmax normalization is applied to the vector formed by the number of blocking words of all consecutive units of the task within the current cluster to obtain the weight of each unit, so that units with more blocking words receive higher weights. Then, the uniformity of blocking impact of each unit is subtracted from 1, multiplied by its corresponding weight, and then averaged across all units of the task within that cluster to obtain the average unit contribution of the task within that cluster. Finally, this average unit contribution is multiplied by (1 minus the intra-cluster sum of squares) to obtain the semantic blocking intensity of the task. If the task has no consecutive blocking semantic units within its cluster, the semantic blocking intensity of the task is directly set to zero. This intensity value comprehensively reflects the degree of clustering, consistency, and density of blocking semantics in the task description text, and the higher the value, the higher the inherent blocking risk of the task.
[0040] For example, the semantic blocking strength satisfies the following formula:

$$S_v = \left(1 - W_c\right) \cdot \frac{1}{N_{c,v}} \sum_{k=1}^{N_{c,v}} w_{c,v,k}\left(1 - U_{c,v,k}\right), \qquad w_{c,v,k} = \mathrm{Softmax}\left(n_{c,v,k}\right) = \frac{e^{n_{c,v,k}}}{\sum_{j=1}^{N_{c,v}} e^{n_{c,v,j}}}$$

where $v$ is the task currently being computed; $c$ is the semantic cluster index uniquely assigned to task $v$ by hard clustering of the word frequency vectors of the task description texts with a predefined clustering algorithm (such as K-means); $S_v$ is the semantic blocking strength of task $v$; $W_c$ is the normalized within-cluster sum of squares of the $c$-th cluster, with range [0,1]. The smaller the value, the more concentrated the word frequency distribution and the more consistent the semantic patterns of the task description texts within that cluster; the larger the value, the more dispersed the texts within the cluster. $N_{c,v}$ is the total number of consecutive blocking semantic units obtained by dividing the description text of task $v$ in the $c$-th cluster according to the preset rule of consecutive occurrence of blocking words. If $N_{c,v} = 0$ (no consecutive units matching the conditions are detected in the task description), then $S_v = 0$ is set directly. $k$ is the index of a consecutive blocking semantic unit; $U_{c,v,k}$ is the uniformity of blocking impact of the $k$-th consecutive unit of task $v$ in the $c$-th cluster; $w_{c,v,k}$ is the weight of the $k$-th unit of task $v$ in the $c$-th cluster, obtained by applying Softmax to the vector formed by the numbers of blocking words $n_{c,v,k}$ in all consecutive units within that cluster. Softmax is the normalized exponential function, which maps the input vector to a probability distribution and gives higher weights to larger values: $e^{n_{c,v,k}}$ is the exponential of the $k$-th input element, and $\sum_{j} e^{n_{c,v,j}}$ is the sum of the exponentials of all input elements (including the $k$-th), which normalizes the numerator so that all outputs sum to 1.
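The computation described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names and the representation of each consecutive unit as a `(uniformity, blocking_word_count)` pair are assumptions made for the example.

```python
import math


def softmax(values):
    """Numerically stable Softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_blocking_strength(within_cluster_ss, units):
    """Semantic blocking strength of one task.

    within_cluster_ss: normalized within-cluster sum of squares, in [0, 1].
    units: list of (uniformity, blocking_word_count) pairs, one per
           consecutive blocking semantic unit (hypothetical representation).
    """
    if not units:
        # No consecutive blocking semantic units: strength is zero.
        return 0.0
    weights = softmax([count for _, count in units])
    # Weighted (1 - uniformity), then averaged over all units.
    average = sum(w * (1.0 - u) for w, (u, _) in zip(weights, units)) / len(units)
    return (1.0 - within_cluster_ss) * average
```

For a task with a single unit of uniformity 0.2 in a cluster whose normalized within-cluster sum of squares is 0.5, the sketch gives (1 - 0.5) × (1 - 0.2) = 0.4.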
[0041] S103. Determine the exclusivity index of each agent based on the agent's capability description text and the agent's historical execution data.
[0042] Optionally, semantic information related to parallel processing characteristics is first extracted from the agent's capability description text. For example, by analyzing whether words implying serial execution, resource exclusivity, or non-parallel operation frequently appear in the description, a score reflecting the agent's subjective exclusivity tendency is obtained. Simultaneously, the response time differences of the agent under different concurrent loads are statistically analyzed from historical execution data: the first average response time is recorded when the agent processes a task alone, and the second average response time is recorded when the agent processes multiple tasks simultaneously. By comparing the performance changes under these two states, an efficiency factor reflecting the agent's objective parallel capability is obtained. The semantic tendency score and efficiency factor are then fused and corrected to obtain the exclusivity index for each agent. This index is used to characterize the agent's sensitivity to blocking risks in parallel task processing scenarios.
[0043] Furthermore, in some embodiments of the present invention, the exclusivity index of each agent is determined based on the agent's capability description text and the agent's historical execution data, including: counting the frequency of occurrence of preset exclusivity words in the capability description text of each agent, and calculating the exclusivity tendency score of each agent in combination with the total number of words in the description text; determining the parallel efficiency ratio based on the ratio of the first average response time when the agent executes a single task to the second average response time of each task when executing multiple tasks; performing sliding window statistics on the parallel efficiency ratios corresponding to a preset number of historical parallel execution records to obtain the median and interquartile range; performing extreme value truncation processing on the current parallel efficiency ratio based on the median and interquartile range to determine the corrected parallel efficiency factor; and normalizing the exclusivity tendency score and the corrected parallel efficiency factor respectively, and then calculating and determining the exclusivity index of each agent according to a preset fusion rule.
[0044] Specifically, the capability description text for each agent is obtained (e.g., the self-description text during initial registration and the incremental update description of its capabilities by the large model after each task). After word segmentation and stop word filtering of this text, the filtered word sequence is traversed, and the total number of occurrences of words belonging to a preset exclusive word set is counted. For example, the preset exclusive word set may include words reflecting the characteristics of exclusive or serial processing, such as exclusive, one at a time, serial, locked resources, non-parallel, single-threaded, mutual exclusion, and atomic operation. The total number of words in the description text after word segmentation and stop word filtering is obtained. If the total number of words is greater than zero, the total number of occurrences of exclusive words is divided by the total number of words to obtain the exclusiveness tendency score; if the total number of words is zero (i.e., the description text is empty or contains only stop words), the exclusiveness tendency score of the agent is set to zero. This score reflects the agent's subjective exclusiveness tendency at the semantic description level; the higher the score, the denser the exclusive terms in the description.
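The score described above could be computed roughly as follows. This is a simplified sketch: the tokenizer, the stop word list, and the restriction to single-token exclusive words are assumptions for illustration (multi-word phrases such as "one at a time" would need phrase matching, which is omitted here).

```python
import re

# Hypothetical word sets for illustration only.
EXCLUSIVE_TERMS = {"exclusive", "serial", "non-parallel", "single-threaded", "atomic"}
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}


def exclusivity_tendency_score(text, exclusive_terms=EXCLUSIVE_TERMS, stop_words=STOP_WORDS):
    """Share of exclusive words among the words left after stop word filtering."""
    # Simple tokenizer: lowercase words, hyphens kept inside a word.
    tokens = [t for t in re.findall(r"[a-z][a-z\-]*", text.lower()) if t not in stop_words]
    if not tokens:
        # Empty description, or a description with stop words only.
        return 0.0
    hits = sum(1 for t in tokens if t in exclusive_terms)
    return hits / len(tokens)
```

For the description "The agent is serial and single-threaded", three words remain after filtering and two of them are exclusive words, so the score is 2/3.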
[0045] The system selects all time windows in the historical execution data during which the agent's task queue length is always 1. For each such window, it records the response time of the tasks executed within it, and it calculates the arithmetic mean of the response times over all such windows to obtain the first average response time. If no time window with a queue length of 1 exists in the historical data, the first average response time is set to a preset default baseline value (e.g., the average single-task response time of all agents in the system, or a very small positive number greater than 0, such as 0.001). Similarly, the system selects all time windows during which the agent's task queue length is greater than 1. For each such window, it records the response time of each task within it, and it calculates the arithmetic mean of the response times of all tasks in all such windows to obtain the second average response time. If no time window with a queue length greater than 1 exists in the historical data, the parallel efficiency ratio is directly set to 1, indicating that no performance correction is performed when concurrent data is lacking. Otherwise, the system first checks whether the first average response time is greater than zero. If it is, the second average response time is divided by the first average response time to obtain the parallel efficiency ratio. If the first average response time is equal to zero (e.g., in the extreme case where all single-task response times are zero), the parallel efficiency ratio is set to a preset upper limit (e.g., 10) to indicate that the agent completes almost instantaneously when executed alone, with extremely low risk of performance degradation. This ratio characterizes the degree of performance degradation of the agent in concurrent processing compared to its idle state.
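The branching logic for the parallel efficiency ratio can be sketched as below. The function name and the flat lists of response times are assumptions; the per-window bookkeeping of the description is collapsed into two lists for brevity.

```python
from statistics import mean


def parallel_efficiency_ratio(single_times, concurrent_times,
                              default_baseline=0.001, upper_limit=10.0):
    """Ratio of the second (concurrent) to the first (single-task) average response time.

    single_times: response times observed while the queue length was 1.
    concurrent_times: response times observed while the queue length was > 1.
    """
    if not concurrent_times:
        # No concurrent data: no performance correction.
        return 1.0
    first_avg = mean(single_times) if single_times else default_baseline
    second_avg = mean(concurrent_times)
    if first_avg > 0:
        return second_avg / first_avg
    # All single-task response times are zero: use the preset upper limit.
    return upper_limit
```

With single-task times of 2 s and concurrent times averaging 4 s, the ratio is 2, meaning tasks take about twice as long under concurrent load.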
[0046] The system maintains a record of the parallel efficiency ratios for each agent in the most recent (e.g., 100) concurrent execution scenarios. Whenever a new concurrent execution record is generated and its corresponding parallel efficiency ratio is calculated, the system adds this ratio to a sliding window sequence and removes the oldest record, ensuring that the window always maintains a preset number (e.g., 100) of efficiency ratios. For all parallel efficiency ratio values within the current sliding window, they are sorted in ascending order. The value at the middle of the sorted sequence is taken as the median. The first quartile (value at the 25th percentile) and the third quartile (value at the 75th percentile) are calculated, and the difference between them is the interquartile range. The median reflects the typical efficiency ratio level of the current agent in recent concurrent scenarios, while the interquartile range reflects the degree of fluctuation and dispersion of this efficiency ratio.
[0047] Obtain the currently calculated parallel efficiency ratio, along with the median and interquartile range obtained through sliding window statistics. Set the lower truncation limit to the median minus twice the interquartile range, and the upper truncation limit to the median plus twice the interquartile range. If the current parallel efficiency ratio is less than the lower truncation limit, correct it to the lower truncation limit value; if the current parallel efficiency ratio is greater than the upper truncation limit, correct it to the upper truncation limit value; otherwise, keep the original value unchanged. Through the above extreme value truncation processing, the interference of unreasonable efficiency ratios caused by occasional abnormal concurrency events (such as system jitter or temporary resource contention) on subsequent calculations is effectively eliminated. The result after truncation processing serves as the corrected parallel efficiency factor, which is more stable and can reflect the true performance characteristics of the agent under normal concurrent load.
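A minimal sketch of the sliding window statistics and the extreme value truncation follows. The class name is hypothetical, and two points the description leaves open are assumptions here: the quartile interpolation method (the standard library's "inclusive" method is used) and the behavior with fewer than two records (the ratio is returned unchanged).

```python
from collections import deque
from statistics import median, quantiles


class EfficiencyWindow:
    """Keeps the most recent parallel efficiency ratios of one agent."""

    def __init__(self, size=100):
        # deque(maxlen=...) drops the oldest record automatically.
        self.values = deque(maxlen=size)

    def add(self, ratio):
        self.values.append(ratio)

    def corrected(self, ratio):
        """Clamp ratio to [median - 2*IQR, median + 2*IQR]."""
        if len(self.values) < 2:
            # Assumption: too few records for quartiles, keep the value.
            return ratio
        med = median(self.values)
        q1, _, q3 = quantiles(self.values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = med - 2 * iqr, med + 2 * iqr
        return min(max(ratio, lower), upper)
```

For a window holding 1, 2, 3, 4, 5, the median is 3 and the interquartile range is 2, so the truncation limits are -1 and 7, and a current ratio of 10 is corrected to 7.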
[0048] The exclusivity tendency scores of all agents are obtained. Based on global statistics, their maximum and minimum values are determined. A maximum-minimum normalization method is used to map each agent's exclusivity tendency score to the interval between 0 and 1, resulting in a normalized tendency score. Specifically, the global minimum is subtracted from the original score, and the result is divided by the difference between the global maximum and the global minimum. If the global maximum equals the global minimum, the normalized tendency score of that agent is directly set to 0.5 or a preset default value. Similarly, the corrected parallel efficiency factors of all agents are collected. Based on global statistics, their maximum and minimum values are determined, and a maximum-minimum normalization method is used to map them to the interval between 0 and 1, resulting in a normalized efficiency factor. After obtaining the normalized tendency scores and normalized efficiency factors, the exclusivity index of each agent is calculated according to a preset fusion rule. A preferred fusion rule is as follows: add 1 to the normalized efficiency factor and divide by 2 to map it from the interval [0, 1] to the interval [0.5, 1], then multiply the result by the normalized tendency score to obtain the exclusivity index. The index ranges from 0 to 1. The larger the value, the stronger the exclusivity of the agent.
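The normalization and the preferred fusion rule can be illustrated with the following sketch. The function names are hypothetical, and using the same default (0.5) for the efficiency factor when all values are equal is an assumption; the description states that default only for the tendency score.

```python
def min_max_normalize(values, default=0.5):
    """Map values to [0, 1]; fall back to a default when all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [default] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def exclusivity_indices(tendency_scores, efficiency_factors):
    """Fuse normalized tendency scores and efficiency factors, one result per agent."""
    tendency = min_max_normalize(tendency_scores)
    efficiency = min_max_normalize(efficiency_factors)
    # (e + 1) / 2 maps the efficiency factor from [0, 1] to [0.5, 1].
    return [t * (e + 1.0) / 2.0 for t, e in zip(tendency, efficiency)]
```

For three agents with tendency scores 0.0, 0.1, 0.2 and corrected efficiency factors 1, 2, 3, the normalized values are 0, 0.5, 1 in both cases, and the exclusivity indices are 0, 0.375, and 1.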
[0049] S104. Based on the task dependency graph, the semantic embedding vector of the task, and the semantic blocking strength of the task, determine the semantic transmission capability of each task when it is used as a preceding task.
[0050] Optionally, for the target task to be evaluated, first obtain all downstream descendant tasks of the target task from the task dependency graph (i.e., tasks that can be directly or indirectly reached from the target task along the dependency direction). If there are no downstream descendant tasks, the semantic transmission capability of the task is set to zero. For each downstream descendant task, obtain the semantic embedding vector of the target task and the semantic embedding vector of the downstream task, and calculate the similarity between them (e.g., cosine similarity). Considering the impact of dependency chain length on the propagation effect, depth decay is applied to the above similarity based on the shortest path length from the target task to each downstream descendant task: the shorter the dependency path, the smaller the decay; the longer the path, the larger the decay. The decay coefficient used in the depth decay is not a fixed constant, but is dynamically determined based on the correlation between the semantic similarity of task pairs with direct dependencies in historical data and their execution time interval. If directly dependent task pairs with high similarity usually have a shorter time interval (i.e., fast blocking propagation), a smaller decay coefficient is used, making the decay slower; conversely, a larger coefficient is used, making the decay faster.
[0051] Further, in some embodiments of the present invention, the semantic transitivity of each task as a preceding task is determined based on the task dependency graph, the semantic embedding vector of the task, and the semantic blocking strength of the task. This includes: obtaining all downstream descendant tasks of the target task from the task dependency graph; downstream descendant tasks refer to tasks that can be reached from the target task along the dependency direction; the target task is any task in each task; obtaining the semantic embedding vector of the target task and calculating the similarity between the semantic embedding vector of the target task and the semantic embedding vector of each downstream descendant task; applying depth decay to the similarity based on the shortest path length from the target task to each downstream descendant task to obtain the decayed semantic dependency strength; the decay coefficient of the depth decay is determined based on the correlation between the semantic similarity of directly dependent tasks and the execution time interval in historical data; determining the intensity distribution characteristics based on the decayed semantic dependency strength of all downstream descendant tasks and constructing the basic transitivity based on the intensity distribution characteristics; weighting and correcting the basic transitivity based on the semantic blocking strength of all downstream descendant tasks and the global maximum semantic blocking strength to determine the semantic transitivity of the target task as a preceding task; and determining the semantic transitivity of each task as a preceding task.
[0052] The target task is any task within each task; the decay coefficient of depth decay is determined based on the correlation between the semantic similarity of directly dependent tasks and the execution time interval in historical data.
[0053] Specifically, the task dependency graph generated by the large model is read. This graph records the pre- and post-constraint relationships between tasks in the form of directed edges. For the target task to be evaluated, a breadth-first or depth-first traversal is performed along the directions of the directed edges, collecting all reachable task nodes. These task nodes constitute the set of downstream descendant tasks of the target task, that is, the set of subsequent tasks that may be directly or indirectly affected by the target task after its execution. To handle cyclic dependencies (e.g., caused by a parsing error of the large model), a topological sorting check can be performed in advance and the back edges removed, so that only forward dependencies are retained. If the target task has no downstream descendant tasks (i.e., no other tasks can be reached from this task), the semantic transmission capability of the task is directly set to zero, and subsequent calculation steps are skipped.
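A breadth-first collection of downstream descendant tasks might look like the sketch below. The adjacency-list representation (`dict` from task to direct successors) and the function name are assumptions. Instead of removing back edges in advance, this sketch simply never revisits a node, which also keeps the traversal finite on a cyclic graph; breadth-first order additionally yields the shortest path length to each descendant.

```python
from collections import deque


def downstream_descendants(graph, target):
    """Return {descendant: shortest path length in edges} for the target task.

    graph: dict mapping each task to a list of its direct successor tasks.
    """
    depth = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, d = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                # First visit in BFS order gives the shortest path length.
                seen.add(successor)
                depth[successor] = d + 1
                queue.append((successor, d + 1))
    return depth
```

An empty result means the target task has no downstream descendant tasks, in which case its semantic transmission capability is set to zero.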
[0054] Then, the semantic embedding vector corresponding to each task is obtained from the embedding layer output of the large model. This vector is a sequence of floating-point numbers of a preset dimension, and has been normalized by the L2 norm (magnitude of 1). For the target task and one of its downstream descendant tasks, the dot product of the two semantic embedding vectors is calculated. Since both have been normalized, the dot product result is the cosine similarity. To ensure that only positive semantic associations are considered in subsequent calculations, negative similarities are truncated to zero, that is, only non-negative similarity values are retained. The above dot product and truncation operations are repeated for the target task and each downstream descendant task to obtain the semantic similarity between each downstream descendant task and the target task. This similarity reflects the degree of association between the two tasks in terms of semantic content. The higher the similarity, the more consistent the upstream and downstream tasks are semantically, and the easier it is for upstream blockages to propagate to downstream tasks.
[0055] In the task dependency graph, the number of edges of the shortest path (i.e., the shortest path length) from the target task to each downstream descendant task is calculated. For direct descendant tasks, the path length is 1; for indirect descendant tasks, the path length is greater than 1. A decay coefficient is determined based on statistical information from all task pairs with direct dependencies in historical data. Specifically, the semantic similarity between the upstream and downstream tasks in each direct dependency pair, as well as the time interval between their actual executions, are collected, and the Pearson correlation coefficient between semantic similarity and time interval is calculated. A strong negative correlation means that task pairs with high semantic similarity have shorter intervals, i.e., blocking propagates faster between them, and a smaller decay coefficient is used; a strong positive correlation means that propagation is slower despite high similarity, and a larger decay coefficient is used; if there are insufficient samples, a default value is used. After obtaining the decay coefficient, the semantic similarity of each downstream descendant task is multiplied by an exponential decay factor to obtain the decayed semantic dependency strength. The shorter the path length, the closer the decay factor is to 1; the longer the path length, the closer the decay factor is to 0.
[0056] For example, the decayed semantic dependency strength satisfies the following formulas:

$$D(u,d) = \max\left(0,\ \mathbf{e}_u \cdot \mathbf{e}_d\right)$$

$$D'(u,d) = D(u,d) \cdot e^{-\lambda\,(l(u,d) - 1)}$$

where $u$ denotes the upstream task and $d$ denotes a downstream descendant task; $D(u,d)$ is the semantic dependency strength and $D'(u,d)$ is the semantic dependency strength after decay; $\mathbf{e}_u$ and $\mathbf{e}_d$ are the normalized semantic embedding vectors (L2 norm of 1), so their dot product is the cosine similarity; $\max(0, \cdot)$ truncates negative similarity to 0 to avoid the propagation of negative correlation; $l(u,d)$ is the shortest path length (number of edges) from $u$ to $d$; $\lambda$ is the decay coefficient, a dependency decay rate derived statistically from historical data. The statistical method is as follows: for every pair of tasks with a direct dependency (i.e., a shortest path length of 1) in all historical projects, collect the semantic similarity $D$ of the pair and the time difference $\Delta t$ between them (the interval between the upstream completion and the downstream start). Over all direct dependency pairs, compute the Pearson correlation coefficient $\rho$ between $D$ and $\Delta t$. If the number of direct dependency pairs is less than 2, or the variance of either quantity is 0, the correlation coefficient cannot be calculated, and an empirical default value is used. If $\rho$ is negative, direct dependencies with high semantic similarity have shorter intervals (i.e., blocking propagates faster), in which case the decay rate should be smaller; if $\rho$ is positive, the decay rate should be larger. Specifically, $\lambda$ increases with $\rho$ and takes values from 0 to 2: the larger $\rho$ is (the stronger the positive correlation), the larger $\lambda$ is and the faster the decay; the smaller $\rho$ is (the stronger the negative correlation), the smaller $\lambda$ is and the slower the decay. $\lambda$ is recalculated after each project cycle to adapt to changes in the development environment. $e^{-\lambda (l(u,d) - 1)}$ is the exponential decay function; it equals 1 when $l(u,d) = 1$ and decreases as the depth increases.
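The following sketch illustrates the truncated cosine similarity, the correlation-based decay coefficient, and the exponential depth decay. Several details are assumptions made only for the example: the mapping `1 + rho` from the correlation coefficient to the decay coefficient (chosen because it increases with the correlation and spans 0 to 2), the default coefficient of 1.0, the exponent form `-(coefficient) * (path_length - 1)`, and all function names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None when it cannot be computed."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def decay_coefficient(similarities, intervals, default=1.0):
    """Decay coefficient from direct-dependency history (mapping is an ASSUMPTION)."""
    rho = pearson(similarities, intervals)
    if rho is None:
        return default  # hypothetical empirical default
    return 1.0 + rho  # ASSUMPTION: increasing in rho, range [0, 2]


def decayed_strength(e_u, e_d, path_length, coefficient):
    """Truncated cosine similarity of L2-normalized vectors with depth decay."""
    similarity = max(0.0, sum(a * b for a, b in zip(e_u, e_d)))
    return similarity * math.exp(-coefficient * (path_length - 1))
```

For a direct descendant (path length 1) the decay factor is 1, so the decayed strength equals the truncated similarity; for identical unit vectors at path length 3 with a coefficient of 0.5, the result is e^{-1}, roughly 0.37.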
[0057] Determine the decayed semantic dependency strength values of all downstream descendant tasks of the target task, forming a numerical sequence. Calculate the arithmetic mean of this sequence, reflecting the overall level of downstream dependency strength. If the decayed semantic dependency strength of all downstream tasks is zero, directly set the semantic transitivity of the target task to zero and skip subsequent calculation steps. Otherwise, calculate the coefficient of variation (i.e., standard deviation divided by the mean) of the sequence, reflecting the dispersion of the strength values: the smaller the coefficient of variation, the closer the dependency strengths of each downstream task are, and the blockage tends to propagate uniformly; the larger the coefficient of variation, the more significant the differences in dependency strengths, and the blockage may only propagate along a few strongly correlated paths. Compress the mean based on the coefficient of variation: when the coefficient of variation is very small, the compression is small, and the basic transitivity is close to the mean; when the coefficient of variation is very large, the compression is large, and the basic transitivity is significantly reduced. The basic transitivity is obtained by multiplying the mean by the standardized value of the compression factor.
[0058] The semantic blocking strength of each downstream descendant task of the target task is obtained, and the arithmetic mean of these strengths is calculated. At the same time, the system identifies the maximum semantic blocking strength among all tasks, which is taken as the global maximum semantic blocking strength. The average downstream blocking strength is divided by the global maximum semantic blocking strength (with a very small positive number added to avoid division by zero) to obtain a normalized ratio. This ratio reflects the overall blocking sensitivity level of the downstream tasks: a higher ratio indicates that the downstream tasks themselves generally have a higher blocking risk, and that upstream blocking is more likely to trigger new blocking. This normalized ratio is multiplied by the aforementioned basic transitivity to obtain the semantic transmission capability of the target task when it serves as a preceding task. If the set of downstream descendant tasks is empty, the semantic transmission capability is directly set to zero.
[0059] For example, the semantic transmission capability satisfies the following formula:

$$T_u = \bar{D}_u \cdot \kappa\left(CV_u\right) \cdot \frac{\bar{S}_u}{S_{\max} + \varepsilon}$$

where $u$ is the target task currently being evaluated, in its role as a preceding task; $\mathcal{D}_u$ is the set of all downstream descendant tasks of the target task $u$ (tasks that can be reached from $u$ along the dependency direction); $\bar{D}_u$ is the arithmetic mean of the decayed semantic dependency strength over all downstream descendant tasks of $u$. This value reflects the average degree to which downstream tasks are semantically affected by upstream blocking. The larger the value, the stronger the overall semantic dependency of the downstream tasks on the upstream task. $CV_u$ is the coefficient of variation, i.e., the ratio of the standard deviation to the mean of the decayed semantic dependency strength sequence; when it is calculated, a very small positive number such as 0.001 is added to the denominator to prevent division by zero. The coefficient of variation measures the dispersion of downstream dependency strength. The smaller the value, the closer the dependency strengths of the downstream tasks are, and the blockage tends to spread evenly. The larger the value, the more significant the differences in dependency strength, and the blockage may spread only along a few strongly correlated paths. $\kappa(CV_u)$ is the compression factor determined by the coefficient of variation. $\bar{S}_u$ is the average semantic blocking strength of all downstream tasks, reflecting the inherent blocking risk level of the downstream tasks themselves. A higher value indicates that the downstream tasks are generally more prone to blocking. $S_{\max}$ is the maximum semantic blocking strength of all tasks in the entire system (the global maximum blocking strength), used to normalize the downstream average blocking strength. $\varepsilon$ is a very small positive number (e.g., 0.001) that prevents the denominator from being zero.
[0060] The compression factor compresses the mean according to the coefficient of variation of the downstream dependency strength: the smaller the coefficient of variation (the more uniform the strength distribution), the closer the compression factor is to 1, leaving the mean essentially unchanged; the larger the coefficient of variation (the more dispersed the strength distribution), the smaller the compression factor, and the mean is significantly reduced. The downstream blocking sensitivity weighting factor is calculated by dividing the average blocking strength of the downstream tasks by the global maximum blocking strength, resulting in a normalized ratio between 0 and 1. A larger ratio indicates that downstream tasks generally have a higher risk of blocking and that upstream blocking is more likely to trigger new blocking; therefore, the upstream transmission capability should be increased. Conversely, a smaller ratio indicates that downstream tasks are less prone to blocking, and even strong semantic transmission may not lead to actual blocking; therefore, the upstream transmission capability should be reduced. The semantic transmission capability thus characterizes the potential, when the target task serves as a preceding task, for its own blocking risk (reflected by its semantic blocking strength) to propagate downstream and trigger cascading blocking, taking into account the semantic dependence of the downstream tasks, the distribution consistency of the dependency strength, and the blocking sensitivity of the downstream tasks themselves.
[0061] It should be noted that the above formula is valid only when the set of downstream descendant tasks of the target task is non-empty; when the set is empty, the semantic transmission capability is directly set to zero, and the formula is not evaluated.
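The steps for the semantic transmission capability can be sketched as follows. The exact form of the compression factor is not recoverable from the text, so `1 / (1 + cv)` is used purely as an assumption that matches the stated behavior (close to 1 for a small coefficient of variation, smaller for a large one). The use of the population standard deviation and the function name are also assumptions.

```python
from statistics import mean, pstdev


def semantic_transmission_capability(decayed_strengths, downstream_blocking,
                                     global_max_blocking, eps=0.001):
    """Transmission capability of a target task acting as a preceding task.

    decayed_strengths: decayed dependency strength per downstream descendant.
    downstream_blocking: semantic blocking strength per downstream descendant.
    """
    if not decayed_strengths:
        return 0.0  # no downstream descendant tasks
    avg_strength = mean(decayed_strengths)
    if avg_strength == 0:
        return 0.0  # all decayed strengths are zero
    cv = pstdev(decayed_strengths) / (avg_strength + eps)
    compression = 1.0 / (1.0 + cv)  # ASSUMPTION: illustrative compression factor
    basic_transitivity = avg_strength * compression
    sensitivity = mean(downstream_blocking) / (global_max_blocking + eps)
    return basic_transitivity * sensitivity
```

With two descendants of equal decayed strength 0.5, the coefficient of variation is 0 and no compression is applied; if their blocking strengths average half of the global maximum, the capability is about 0.5 × 0.5 = 0.25 (slightly less once the small constant is added to the denominator).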
[0062] Finally, traverse each task node in the task dependency graph, take each task as the target task in turn, repeat the above steps, and determine the semantic transfer capability of each task when it is used as a prerequisite task.
[0063] S105. Based on the semantic blocking strength and semantic transmission capability of all the preceding tasks of the task to be assigned, the exclusivity index of the candidate agent, and the real-time load data of the candidate agent, construct the semantic blocking risk index of the combination of the task to be assigned and the candidate agent.
[0064] Optionally, for each currently awaiting assignment task, the risk level formed by its combination with each candidate agent is evaluated separately. This evaluation integrates the blocking threat from the upstream dependency chain, the agent's sensitivity to concurrent tasks, and the agent's current resource occupancy status.
[0065] For example, for a given task to be assigned, first obtain all its predecessor tasks in the task dependency graph, and determine the degree of blocking impact of each predecessor task on the current task based on the semantic blocking strength and semantic transitivity of each predecessor task. Taking into account the impact of all predecessor tasks, the most significant threat is taken as the upstream cumulative pressure borne by the current task. This upstream cumulative pressure reflects the level of blocking risk accumulated by the task before execution.
[0066] Furthermore, in some embodiments of the present invention, a semantic blocking risk index for the combination of the task to be assigned and the candidate agent is constructed based on the semantic blocking strength and semantic transmission capability of all the preceding tasks of the task to be assigned, the exclusivity index of the candidate agent, and the real-time load data of the candidate agent. This includes: obtaining all the preceding tasks of the task to be assigned; determining the blocking propagation contribution value of each preceding task based on the semantic transmission capability and semantic blocking strength of each preceding task; determining the maximum value among the blocking propagation contribution values as the cumulative semantic pressure of the preceding chain; determining the load factor based on the sum of the remaining processing time in the real-time load data of the candidate agents; and determining the semantic blocking risk index based on the cumulative semantic pressure of the preceding chain, the exclusivity index of the candidate agent, and the load factor.
[0067] Specifically, all predecessor tasks of the task to be assigned (i.e., the task nodes directly pointing to it) are read from the task dependency graph. If the task to be assigned has no predecessor tasks, the set of blocking propagation contribution values is empty, and the cumulative semantic pressure of the preceding chain is subsequently set directly to zero. For each existing predecessor task, the semantic transmission capability (characterizing the strength with which it propagates blocking downstream) and the semantic blocking strength (characterizing the degree of risk that it causes blocking itself) of the predecessor task are obtained, and the two are multiplied to obtain the blocking propagation contribution value of the predecessor task to the task to be assigned. The physical meaning of this product is: even if a predecessor task has a high blocking strength, its blocking contribution to the current task is limited if its semantic transmission capability is weak; conversely, if its transmission capability is strong but its blocking strength is low, its contribution is also limited. The larger the product, the more serious the blocking threat that the predecessor task poses to the current task.
[0068] Determine the blocking propagation contribution value of all predecessor tasks of the task to be assigned. If there are no predecessor tasks (i.e., the contribution value set is empty), directly set the cumulative semantic pressure of the predecessor chain to zero. If there is at least one predecessor task, find the maximum value among all contribution values and use this maximum value as the cumulative semantic pressure of the predecessor chain. The maximum value can accurately reflect the constraint of the weakest link in the dependency chain on the current task, ensuring that the most dangerous propagation path is given priority during assignment.
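A compact sketch of the cumulative semantic pressure of the preceding chain is given below; the function name and the pair representation of each predecessor are assumptions for the example.

```python
def preceding_chain_pressure(predecessors):
    """Maximum blocking propagation contribution among direct predecessors.

    predecessors: iterable of (transmission_capability, blocking_strength)
                  pairs, one per direct predecessor task.
    """
    contributions = [capability * strength for capability, strength in predecessors]
    # An empty contribution set yields zero pressure.
    return max(contributions, default=0.0)
```

For predecessors (0.5, 0.4) and (0.9, 0.1), the contributions are 0.2 and 0.09, so the cumulative pressure is 0.2: the predecessor with the strongest transmission capability is not necessarily the one that dominates.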
[0069] Then, the sum of the remaining processing time of all tasks in the current task queue of the candidate agent is obtained from the real-time monitoring module. Simultaneously, based on the agent's historical execution data, the median number of tasks completed per unit time when executing tasks alone in an idle state is calculated as the agent's standard processing capacity. The sum of remaining processing time is divided by a preset time window constant (e.g., 1 minute) to obtain a load ratio. If the ratio is greater than 1, it is truncated to 1; if the ratio is less than 0, it is set to 0. This load ratio reflects the proportion of the agent's currently occupied processing time relative to the time window, and its value ranges from 0 to 1. If the preset time window constant is zero or cannot be obtained, the agent's load factor is set to a preset default value (e.g., 0.5). Through the above processing, the load factor's value range is limited to between 0 and 1, where 0 indicates the agent is completely idle, and 1 indicates the agent is fully loaded or overloaded.
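The load factor computation can be sketched as follows, assuming times in seconds and a hypothetical function name. Treating a negative window constant like a missing one is an assumption beyond the text, which only mentions a zero or unavailable constant.

```python
def load_factor(remaining_times, window_seconds=60.0, default=0.5):
    """Occupied processing time relative to a preset time window, clamped to [0, 1].

    remaining_times: remaining processing time (seconds) of each queued task.
    """
    if window_seconds is None or window_seconds <= 0:
        # Window constant missing or unusable: fall back to the default.
        return default
    ratio = sum(remaining_times) / window_seconds
    return min(max(ratio, 0.0), 1.0)
```

An agent with 30 s of queued work against a 1-minute window has a load factor of 0.5, while 100 s of queued work is truncated to 1 (fully loaded or overloaded).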
[0070] Finally, the cumulative semantic pressure of the preceding chain of the task to be assigned, the exclusivity index of the candidate agent, and the load factor of the candidate agent are obtained. The three are multiplied together to obtain the semantic blocking risk index of the combination of the task to be assigned and the candidate agent.
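As an illustration of how the three quantities combine, the sketch below computes the risk index for one task against several candidate agents and orders the candidates by ascending risk. The ranking helper and the agent dictionary layout are assumptions added for the example; the description only defines the product itself.

```python
def semantic_blocking_risk(pressure, exclusivity_index, load):
    """Risk index of one (task, candidate agent) combination."""
    return pressure * exclusivity_index * load


def rank_candidates(pressure, agents):
    """Sort candidate agents by ascending risk for a single task.

    agents: dict mapping agent name to (exclusivity_index, load_factor).
    """
    risks = {
        name: semantic_blocking_risk(pressure, exclusivity, load)
        for name, (exclusivity, load) in agents.items()
    }
    return sorted(risks.items(), key=lambda item: item[1])
```

Because the index is a product, it is zero whenever the task has no upstream pressure, the agent is not exclusive at all, or the agent is idle.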
[0071] S106. Adjust the observation window length of the time series prediction model according to the global semantic blocking risk index, and use the adjusted time series prediction model to predict the future load and blocking trend of each agent. Assign tasks according to the prediction results and the semantic blocking risk index.
[0072] Here, the global semantic blocking risk index is the maximum value among the semantic blocking risk indices of all combinations of tasks to be assigned and candidate agents at the current moment.
[0073] Optionally, a global semantic blocking risk index is first determined based on the semantic blocking risk index of all combinations of tasks to be assigned and candidate agents at the current moment. The global index reflects the highest blocking risk level currently faced by the system. Based on the magnitude of the global index, the length of the historical data observation window used by the time-series prediction model is dynamically adjusted: when the global risk is high, the observation window is shortened to improve the model's response speed to changes in load and blocking trends; when the global risk is low, the observation window is extended to enhance the stability of the prediction and reduce the interference of random noise. Next, the time-series prediction model is updated or retrained using historical load time-series data within the adjusted window, and this model is used to predict the load change trend, the probability of blocking, and the estimated time required to execute a new task for each agent in future time periods. During the task allocation phase, each task to be assigned and its candidate agents are comprehensively evaluated, taking into account both the semantic blocking risk index and the predicted load and blocking trends. For tasks with high semantic blocking intensity and significant accumulated semantic pressure on the preceding chain, priority is given to agents with low exclusivity index, low current load factor, and predicted stable future load with low blocking probability. For ordinary tasks, allocation is carried out according to the load balancing principle to avoid excessive resource concentration. At the same time, the execution priority of each task queue and the task allocation sequence can be dynamically adjusted based on the prediction results to avoid potential blocking nodes in advance.
[0074] Furthermore, in some embodiments of the present invention, adjusting the observation window length of the time-series prediction model according to the global semantic blocking risk index includes: obtaining the semantic blocking risk index of all tasks waiting to be assigned and their respective candidate agents at the current time; determining the maximum value among the semantic blocking risk indices as the global semantic blocking risk index; establishing a dynamic mapping relationship between the global semantic blocking risk index and the observation window length of the time-series prediction model; the dynamic mapping relationship causes the observation window length to decrease when the value of the global semantic blocking risk index increases, and the observation window length to increase when the value of the global semantic blocking risk index decreases; and adjusting the observation window length of the time-series prediction model according to the dynamic mapping relationship.
[0075] Specifically, upon each allocation decision trigger, all currently unassigned and ready tasks are traversed. For each task, all candidate agents (determined by the large model based on capability matching results) are then traversed. The risk index for each task and each candidate agent combination is calculated sequentially, and all calculation results are aggregated to form a risk index set. The element with the largest value in the risk index set is searched. If the set is not empty, this maximum value is used as the global semantic blocking risk index at the current moment; if the set is empty (e.g., there are no tasks to be assigned or no available candidate agents), the global semantic blocking risk index is set to zero.
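The traversal described above can be sketched as follows. The function name, the dictionary layout, and the `risk_of` callback are hypothetical; the callback stands in for whatever routine computes the risk index of one task and agent combination.

```python
from typing import Callable, Dict, List


def global_risk_index(candidates_by_task: Dict[str, List[str]],
                      risk_of: Callable[[str, str], float]) -> float:
    """Maximum risk index over all (task, candidate agent) combinations."""
    risks = [
        risk_of(task, agent)
        for task, agents in candidates_by_task.items()
        for agent in agents
    ]
    # An empty set (no ready tasks or no candidates) yields zero.
    return max(risks, default=0.0)
```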
[0076] A predefined mapping function or table maps the global semantic blocking risk index, which lies in the range [0, 1], to the observation window length of the time series prediction model (expressed, e.g., in seconds or as a number of historical data points). This mapping relationship is monotonically decreasing: when the global risk index approaches 0 (the system is very safe), the observation window length is set to a larger preset value (e.g., 3600 seconds or 100 historical points) to enhance the smoothness and stability of the prediction; when the global risk index approaches 1 (the system has an extremely high blocking risk), the observation window length is set to a smaller preset value (e.g., 300 seconds or 10 historical points) to improve the model's sensitivity to sudden changes in load and blocking.
[0077] For example, the mapping relationship can be implemented using a linear function (such as window length = a - b × global risk index, where a and b are positive constants), a piecewise function (such as setting multiple risk levels, each corresponding to a different window length), or a lookup table. The mapping relationship is configured based on operation and maintenance experience, and this invention does not limit it.
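As an illustration of such a mapping, the sketch below shows a linear variant and a piecewise variant. The endpoint values of 3600 and 300 seconds are the examples given in the description; the risk level boundaries (0.3 and 0.7), the intermediate window of 1800 seconds, and all function names are hypothetical choices made only for this example.

```python
import bisect


def window_length_linear(risk: float,
                         max_window: float = 3600.0,
                         min_window: float = 300.0) -> float:
    """Linear, monotonically decreasing map from risk in [0, 1] to seconds."""
    r = min(1.0, max(0.0, risk))  # guard against out-of-range inputs
    # Equivalent to a - b * r with a = max_window, b = max_window - min_window.
    return max_window - (max_window - min_window) * r


# Hypothetical risk levels: [0, 0.3) low, [0.3, 0.7) medium, [0.7, 1] high.
LEVEL_BOUNDS = [0.3, 0.7]
LEVEL_WINDOWS = [3600.0, 1800.0, 300.0]


def window_length_piecewise(risk: float) -> float:
    """Piecewise-constant map: one window length per risk level."""
    return LEVEL_WINDOWS[bisect.bisect_right(LEVEL_BOUNDS, risk)]
```

Both variants shorten the window as the risk grows; the piecewise form avoids frequent small reconfigurations of the model, at the cost of coarser adaptation.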
[0078] After calculating the global semantic blocking risk index at the current moment, the index value is substituted into the mapping relationship to calculate the corresponding target observation window length. If the target length differs from the window length currently being used by the time series prediction model, the system updates the model configuration so that it uses only historical load time series data within the most recent target length in the next prediction. The update operation includes: truncating or expanding the data buffer maintained internally by the model, and recalculating or warm-starting the model parameters. If the target length is the same as the current length, the existing configuration remains unchanged.
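The buffer update can be sketched with a bounded queue. The class `LoadHistory` and its methods are hypothetical names, and the window is counted in historical data points rather than seconds for simplicity. The return value of `set_window` is meant as a signal to the caller that the model should be recalculated or warm-started.

```python
from collections import deque


class LoadHistory:
    """Keeps only the most recent `window_points` load samples of one agent."""

    def __init__(self, window_points: int) -> None:
        self.window_points = window_points
        self.buffer = deque(maxlen=window_points)

    def append(self, sample: float) -> None:
        self.buffer.append(sample)

    def set_window(self, target_points: int) -> bool:
        """Resize the buffer; return True if the configuration changed."""
        if target_points == self.window_points:
            return False  # same length: keep the existing configuration
        # Shrinking keeps the newest samples; growing only raises the cap,
        # since samples that were already discarded cannot be recovered here.
        self.buffer = deque(self.buffer, maxlen=target_points)
        self.window_points = target_points
        return True
```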
[0079] Furthermore, in some embodiments of the present invention, the adjusted time-series prediction model is used to predict the future load and blocking trend of each agent, including: weighting the residual correction stage of the time-series prediction model according to the semantic blocking risk index of each task to be assigned and candidate agent combination, and constructing an improved time-series prediction model that integrates semantic features; predicting the historical load time-series data of each agent according to the improved time-series prediction model, and determining the load change trend, blocking probability and task execution duration of each agent in the future time period.
[0080] Specifically, based on the standard time series prediction model, a semantic blocking risk index is introduced as a dynamic weighting factor to adjust the residual correction stage of the model. For each agent, its historical load time series data (e.g., task queue length, task arrival rate, processing completion rate, etc.) is maintained. Before each prediction, the semantic blocking risk index of all combinations of tasks to be assigned and candidate agents related to that agent is obtained, and a comprehensive weight is calculated (e.g., taking the maximum value or a weighted average). This weight is used to adjust the coefficient of the residual correction term in the time series prediction model: when the comprehensive weight is large, the weight of residual correction is increased, making the model pay more attention to the prediction error of the most recent period, thereby quickly capturing load mutations caused by blocking risk; when the comprehensive weight is small, the weight of residual correction is decreased, making the model maintain its dependence on long-term trends and suppressing the interference of random noise.
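The comprehensive weight mentioned above can be computed, for example, as a maximum or a weighted average. The sketch below shows both options; the function name, the `mode` argument, and the choice to return zero for an agent with no related combinations are assumptions made for illustration.

```python
from typing import Optional, Sequence


def comprehensive_weight(risks: Sequence[float],
                         mode: str = "max",
                         weights: Optional[Sequence[float]] = None) -> float:
    """Aggregate the risk indices related to one agent into a single weight."""
    if not risks:
        return 0.0  # assumption: no related combinations means no extra weight
    if mode == "max":
        return max(risks)
    if mode == "mean":
        if weights is None:
            weights = [1.0] * len(risks)
        total = sum(weights)
        if total == 0:
            return 0.0
        return sum(r * w for r, w in zip(risks, weights)) / total
    raise ValueError(f"unknown mode: {mode}")
```

Taking the maximum makes the model react to the single most dangerous combination, while the weighted average gives a smoother signal.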
[0081] An improved temporal prediction model is used to fit and predict the historical workload time series data of each agent. During the prediction phase, the model takes load data from several past time points (e.g., queue length sequences, task processing rate sequences) as input and outputs multi-dimensional prediction results for a preset future time period. The load change trend characterizes the evolution direction of the agent's expected workload in the future period, which can be represented as the slope of the prediction sequence (positive values indicate increasing load, negative values indicate decreasing load, and near zero indicates stability). The blocking probability represents the likelihood that the agent will experience task blocking (e.g., task waiting time exceeding a preset threshold, queue overflow, or dependency timeout) in the future period under the current semantic risk environment and predicted load conditions. This probability can be obtained through the confidence interval of the model output or a calibration function trained based on historical blocking events (e.g., using the Sigmoid function to map the predicted load change trend). The task execution time represents the expected completion time if a new task is assigned to the agent. This predicted value integrates the agent's standard processing capacity, current load level, predicted load change trend, and semantic blocking risk index.
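Two of the outputs described above, the load change trend and the blocking probability, can be illustrated as follows. This is one possible reading rather than the method of the invention: the trend is taken as the least-squares slope of the predicted sequence, and the probability is a Sigmoid of that slope. The parameters `k` and `bias` are hypothetical calibration constants that would be fitted on historical blocking events.

```python
import math
from typing import Sequence


def trend_slope(predicted: Sequence[float]) -> float:
    """Least-squares slope of the predicted load over equally spaced steps."""
    n = len(predicted)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(predicted) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(predicted))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def blocking_probability(slope: float, k: float = 1.0, bias: float = 0.0) -> float:
    """Map the load trend to a probability in (0, 1) with a Sigmoid."""
    z = max(-60.0, min(60.0, k * slope + bias))  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))
```

A positive slope (rising load) yields a probability above 0.5 when `bias` is zero, and a negative slope yields a probability below 0.5.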
[0082] For example, taking the ARIMA time series forecasting model: the model's prediction residual is weighted by the comprehensive weight to give the weighted residual correction term, and the updated predicted value equals the original predicted value plus this weighted correction term. When the comprehensive weight is large, the residual correction is enhanced and the model pays more attention to recent errors; when the comprehensive weight is small, the residual correction is weakened and the model relies more on long-term trends.
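The weighted residual correction can be illustrated with a small sketch. The exact formula is not recoverable from the text, so the multiplicative form used here (original prediction plus weight times the most recent residual) is an assumption, as are the function names. The base forecast may come from any time series model, such as ARIMA.

```python
def residual(actual: float, predicted: float) -> float:
    """Prediction error of the most recent step."""
    return actual - predicted


def corrected_forecast(base_forecast: float,
                       last_residual: float,
                       weight: float) -> float:
    """Assumed form: shift the base forecast by the weighted recent error."""
    return base_forecast + weight * last_residual


# The model under-predicted the last step by 2 load units.
err = residual(actual=14.0, predicted=12.0)
high_risk = corrected_forecast(10.0, err, weight=0.8)  # strong correction
low_risk = corrected_forecast(10.0, err, weight=0.1)   # weak correction
```

With a large weight the forecast moves most of the way toward the recently observed error; with a small weight it stays close to the long-term prediction.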
[0083] Furthermore, in some embodiments of the present invention, task allocation based on prediction results and semantic blocking risk index includes: allocating tasks that meet preset conditions to candidate agents whose exclusivity index is less than a third preset threshold, whose load factor is less than a fourth preset threshold, and whose blocking probability is less than a fifth preset threshold; the preset conditions are that the semantic blocking intensity is greater than a first preset threshold and the cumulative semantic pressure of the preceding chain is greater than a second preset threshold; allocating tasks that do not meet the preset conditions to candidate agents according to the load balancing principle; and adjusting the execution priority and allocation sequence of each task queue according to the predicted load and blocking trends.
[0084] Specifically, a first, second, third, fourth, and fifth preset threshold are pre-set or dynamically configured. These thresholds can be empirical values, obtained through offline experimental calibration, or adaptively adjusted. For each task to be assigned, the system first determines whether it meets preset conditions: whether the semantic blocking intensity of the task is greater than the first preset threshold, and whether the cumulative semantic pressure of its preceding chains is greater than the second preset threshold. If both conditions are met, the task is considered a high-risk task, requiring a special allocation strategy. For high-risk tasks, the system selects agents from its candidate agent set that simultaneously meet the following three conditions: an exclusivity index less than the third preset threshold, a load factor less than the fourth preset threshold, and a blocking probability less than the fifth preset threshold. If at least one candidate agent meets the conditions, one (e.g., the one with the smallest load factor or the smallest exclusivity index) is selected for allocation. If no candidate agent meets the conditions, the task is temporarily suspended and awaits the next round of allocation.
[0085] For example, the first preset threshold (semantic blocking strength threshold) can be set to 0.6. This threshold is used to determine whether the semantic description of a task belongs to a high-risk blocking task. When the semantic blocking strength of a task is greater than 0.6, the task is considered to have a high inherent blocking risk at the semantic level and should be preferentially assigned to agents with low exclusivity and low load.
[0086] The second preset threshold (cumulative semantic pressure threshold of the preceding chain) can be set to 0.5. This threshold is used to determine whether a task is under high upstream dependency pressure. When the cumulative semantic pressure of the preceding chain (i.e., the maximum value among the blocking propagation contributions of all preceding tasks) is greater than 0.5, it indicates that there is a significant blocking propagation threat in the upstream dependency chain of the task, and the task is likely to become a propagation node of cascading blocking.
[0087] The third preset threshold (exclusivity index threshold) can be set to 0.3. This threshold is used to filter agents suitable for undertaking high-risk tasks. Agents with an exclusivity index less than 0.3 are considered low-exclusivity agents: they handle multiple tasks in parallel well and are less likely to amplify blocking even when assigned high-risk tasks.
[0088] The fourth preset threshold (load factor threshold) can be set to 0.4. This threshold is used to filter agents with sufficient current resources. A load factor less than 0.4 indicates that the total remaining processing time of the agent's current task queue is less than 40% of the product of its standard processing capacity and the time window constant, indicating that it is in a relatively idle state and is suitable for undertaking high-risk tasks to avoid additional delays caused by queuing.
[0089] The fifth preset threshold (blocking probability threshold) can be set to 0.3. This threshold is used to filter agents that are predicted to be in a relatively safe state. A blocking probability (e.g., the blocking probability output by the improved time series prediction model) of less than 0.3 indicates that the agent is unlikely to be blocked during the prediction period, so assigning high-risk tasks to it will not easily trigger new blocking events.
[0090] It should be noted that the above threshold values are merely an exemplary configuration. In practical applications, the system can adaptively adjust these values based on specific business scenarios, requirements for blocking sensitivity, and statistical analysis of historical operational data. For example, in highly sensitive scenarios, the thresholds can be appropriately lowered to screen tasks and agents more rigorously; in resource-constrained or high-task-volume scenarios, the thresholds can be appropriately raised to avoid frequent triggering of the special allocation strategy, which could lead to allocation delays. Those skilled in the art can flexibly configure the above thresholds according to actual needs, and this invention does not limit this configuration.
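The high-risk allocation rule and the example thresholds can be sketched together as follows. All class, field, and function names are hypothetical. The default values are the exemplary thresholds listed above, and choosing the eligible agent with the smallest load factor is one of the selection options mentioned in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    blocking_strength: float = 0.6     # first preset threshold
    chain_pressure: float = 0.5        # second preset threshold
    exclusivity: float = 0.3           # third preset threshold
    load_factor: float = 0.4           # fourth preset threshold
    blocking_probability: float = 0.3  # fifth preset threshold


@dataclass
class Agent:
    name: str
    exclusivity: float
    load_factor: float
    blocking_probability: float


def is_high_risk(strength: float, pressure: float,
                 th: Thresholds = Thresholds()) -> bool:
    """A task is high-risk only when both preset conditions hold."""
    return strength > th.blocking_strength and pressure > th.chain_pressure


def pick_for_high_risk(candidates: List[Agent],
                       th: Thresholds = Thresholds()) -> Optional[Agent]:
    """Return the least-loaded eligible agent, or None to suspend the task."""
    eligible = [
        a for a in candidates
        if a.exclusivity < th.exclusivity
        and a.load_factor < th.load_factor
        and a.blocking_probability < th.blocking_probability
    ]
    if not eligible:
        return None  # the task waits for the next allocation round
    return min(eligible, key=lambda a: a.load_factor)
```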
[0091] For ordinary tasks that do not meet the preset conditions, a load balancing strategy is used for allocation. Specifically, the current load factor of each candidate agent is calculated, and the task is assigned to the agent with the smallest load factor. If multiple agents have the same or similar load factors, their exclusivity index or predicted future load trend is further compared, and the agent with the smallest overall load is selected. The load balancing principle aims to avoid concentrating ordinary tasks on a few agents, which would lead to new load imbalances. At the same time, since the blocking risk of ordinary tasks is not high, there is no need to deliberately avoid agents with strong exclusivity.
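A minimal sketch of the load balancing rule for ordinary tasks is given below. The names are hypothetical, and so is the `tolerance` parameter, which defines when two load factors count as "similar". Ties are broken first by exclusivity index and then by predicted load trend, which is one way to interpret the comparison described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    load_factor: float
    exclusivity: float
    predicted_slope: float  # predicted future load trend


def pick_by_load_balance(candidates: List[Candidate],
                         tolerance: float = 0.05) -> Optional[Candidate]:
    """Pick the least-loaded agent, breaking near-ties by secondary criteria."""
    if not candidates:
        return None
    lowest = min(c.load_factor for c in candidates)
    # Agents whose load factor is within `tolerance` of the minimum.
    near = [c for c in candidates if c.load_factor - lowest <= tolerance]
    return min(near, key=lambda c: (c.exclusivity, c.predicted_slope))
```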
[0092] Finally, based on the load change trend, blocking probability, and task execution duration of each agent in the future time period, the execution priority of each agent's task queue and the allocation sequence of subsequent tasks are adjusted. Specifically, if an agent is predicted to have a sharp increase in future load and a high blocking probability, the priority of assigning subsequent tasks to that agent can be reduced, even if its current load factor has not exceeded the threshold; at the same time, the execution priority of existing tasks in the agent's task queue can be increased (e.g., by adjusting the operating system thread priority or resource allocation weight) to accelerate the processing of the current backlog. If an agent is predicted to have a stable future load and a very low blocking probability, the priority of assigning new tasks to that agent can be appropriately increased to make full use of its idle resources.
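The priority adjustment can be sketched as a small decision rule. The description gives no numeric criteria for "sharp increase", "high", "stable", or "very low", so the four cut-off parameters below are hypothetical, as are the names and the use of unit priority steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityAdjustment:
    new_task_delta: int     # change in priority for assigning new tasks
    queued_task_delta: int  # change in priority for tasks already queued


def adjust_priorities(load_slope: float,
                      blocking_probability: float,
                      rising_slope: float = 0.5,
                      high_probability: float = 0.7,
                      stable_slope: float = 0.05,
                      low_probability: float = 0.1) -> PriorityAdjustment:
    """Translate the predicted trend of one agent into priority changes."""
    if load_slope >= rising_slope and blocking_probability >= high_probability:
        # Steer new tasks away and speed up the existing backlog.
        return PriorityAdjustment(new_task_delta=-1, queued_task_delta=1)
    if abs(load_slope) <= stable_slope and blocking_probability <= low_probability:
        # Stable and safe: make the agent more attractive for new tasks.
        return PriorityAdjustment(new_task_delta=1, queued_task_delta=0)
    return PriorityAdjustment(new_task_delta=0, queued_task_delta=0)
```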
[0093] In another aspect, the present invention also provides a multi-agent cooperative task allocation system based on a large model, the system comprising: a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the steps of any of the methods described above.
[0094] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0095] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.
Claims
1. A multi-agent cooperative task allocation method based on a large model, characterized in that, The method includes: Obtain the task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agent from the output of the large model; Based on the task description text, determine the semantic blocking strength of each task; Based on the agent's capability description text and the agent's historical execution data, determine the exclusivity index of each agent; Based on the task dependency graph, the semantic embedding vector of the task, and the semantic blocking strength of the task, determine the semantic transmission capability of each task when it is used as a preceding task. Based on the semantic blocking strength and semantic transmission capability of all the preceding tasks of the task to be assigned, the exclusivity index of the candidate agent, and the real-time load data of the candidate agent, a semantic blocking risk index of the combination of the task to be assigned and the candidate agent is constructed. The observation window length of the time-series prediction model is adjusted according to the global semantic blocking risk index, and the future load and blocking trend of each agent is predicted using the adjusted time-series prediction model. Tasks are assigned based on the prediction results and the semantic blocking risk index. The global semantic blocking risk index is the maximum value among the semantic blocking risk indices of all candidate agents at the current time.
2. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The step of determining the semantic blocking strength of each task based on the task description text includes: After segmenting and filtering stop words in the description text of each task, the word frequency is counted to construct the word frequency vector for each task. Based on a preset clustering algorithm, all word frequency vectors are clustered to determine multiple semantic clusters, and the sum of squares within each semantic cluster is calculated. In the description text of each task, continuous blocking semantic units are divided according to the pre-defined rules for the continuous occurrence of blocking words. Based on the frequency of occurrence of preset blocking words in continuous blocking semantic units and the co-occurrence relationship between different preset blocking words, the uniformity of blocking influence of blocking words in continuous blocking semantic units is determined. The semantic blocking strength of each task is determined by weighted fusion calculation based on the intra-cluster sum of squares of each semantic cluster, the uniformity of blocking influence of each consecutive blocking semantic unit, and the number of preset blocking words contained in each consecutive blocking semantic unit.
3. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The step of determining the exclusivity index of each agent based on the agent's capability description text and the agent's historical execution data includes: The frequency of pre-defined exclusive words in the ability description text of each agent is counted, and the exclusiveness tendency score of each agent is calculated by combining the total number of words in the description text. The parallel efficiency ratio is determined by the ratio of the first average response time when the agent performs a single task to the second average response time of each task when performing multiple tasks. A sliding window statistical analysis is performed on the parallel efficiency ratios corresponding to a preset number of historical parallel execution records to obtain the median and interquartile range. Based on the median and interquartile range, the current parallel efficiency ratio is truncated to determine the corrected parallel efficiency factor. After normalizing the exclusivity tendency score and the corrected parallel efficiency factor, the exclusivity index of each agent is calculated and determined according to the preset fusion rule.
4. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The step of determining the semantic transmission capability of each task when it is used as a preceding task based on the task dependency graph, the semantic embedding vector of the task, and the semantic blocking strength of the task includes: Obtain all downstream descendant tasks of the target task from the task dependency graph; a downstream descendant task refers to any task that can be reached from the target task along the dependency direction; the target task is any one of the tasks. Obtain the semantic embedding vector of the target task, and calculate the similarity between the semantic embedding vector of the target task and the semantic embedding vector of each downstream descendant task; Based on the shortest path length from the target task to each downstream descendant task, a depth decay is applied to the similarity to obtain the weakened semantic dependency strength; the decay coefficient of the depth decay is determined based on the correlation between the semantic similarity of directly dependent tasks and the execution time interval in historical data. Based on the weakened semantic dependency strength of all downstream descendant tasks, the intensity distribution features are determined, and the basic transmission capability is constructed based on the intensity distribution features. Based on the semantic blocking strength of all downstream descendant tasks and the global maximum semantic blocking strength, the basic transmission capability is weighted and corrected to determine the semantic transmission capability when the target task is used as a preceding task. The semantic transmission capability of each task when it is used as a preceding task is determined in this manner.
5. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The semantic blocking risk index of the combination of the task to be assigned and the candidate agent is constructed based on the semantic blocking strength and semantic transmission capability of all the preceding tasks of the task to be assigned, the exclusivity index of the candidate agent, and the real-time load data of the candidate agent. This includes: Obtain all the preceding tasks of the task to be assigned, and determine the blocking propagation contribution value of each preceding task based on its semantic transmission capability and semantic blocking strength. The maximum value among the blocking propagation contribution values is determined as the cumulative semantic pressure of the preceding chain; The load factor is determined based on the total remaining processing time in the real-time load data of the candidate agent; The semantic blocking risk index is determined based on the cumulative semantic pressure of the preceding chain, the exclusivity index of the candidate agent, and the load factor.
6. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The observation window length of the time series prediction model is adjusted based on the global semantic blocking risk index, including: Obtain the semantic blocking risk index of all tasks waiting to be assigned at the current moment and their respective candidate agent combinations; The maximum value among the semantic blocking risk indices is determined as the global semantic blocking risk index; A dynamic mapping relationship is established between the global semantic blocking risk index and the observation window length of the time series prediction model; the dynamic mapping relationship makes the observation window length decrease when the value of the global semantic blocking risk index increases, and the observation window length increase when the value of the global semantic blocking risk index decreases. The observation window length of the time series prediction model is adjusted according to the dynamic mapping relationship.
7. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The adjusted time-series prediction model is used to predict the future load and blocking trend of each agent, including: Based on the semantic blocking risk index of each task to be assigned and candidate agent combination, the residual correction stage of the time series prediction model is weighted and adjusted to construct an improved time series prediction model that integrates semantic features. Based on the improved time-series prediction model, prediction is performed on the historical load time-series data of each agent to determine the load change trend, blocking probability, and task execution duration of each agent in the future time period.
8. The multi-agent cooperative task allocation method based on a large model according to claim 7, characterized in that, Tasks are assigned based on prediction results and a semantic blocking risk index, including: Tasks that meet preset conditions are assigned to candidate agents whose exclusivity index is less than the third preset threshold, whose load factor is less than the fourth preset threshold, and whose blocking probability is less than the fifth preset threshold; the preset conditions are that the semantic blocking intensity is greater than the first preset threshold and the cumulative semantic pressure of the preceding chain is greater than the second preset threshold. Tasks that do not meet the preset conditions are allocated to candidate agents according to the load balancing principle. Based on the predicted load and blocking trends, adjust the execution priority and allocation sequence of each task queue.
9. The multi-agent cooperative task allocation method based on a large model according to claim 1, characterized in that, The acquisition of the task description text, task dependency graph, agent capability description text, semantic embedding vector of the task, and historical execution data and real-time load data of the agent from the large model output includes: Extract the task description text, the task dependency graph, the agent capability description text, and the semantic embedding vector of the task from the large model parsing output; Extract historical task execution sequence, completion status, blocking events, and concurrent execution performance metrics of intelligent agents from the system operation logs; Obtain the task queue length and remaining processing time for each agent from the real-time monitoring module.
10. A multi-agent cooperative task allocation system based on a large model, the system comprising: A memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method as claimed in any one of claims 1-9.