Skill tree and dynamic prompt word based agent task processing method and system
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI ZHIHE NETWORK TECH CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
Figure CN122242563A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and natural language processing, and in particular to an intelligent agent task processing method and system based on skill trees and dynamic prompts.
Background Technology
[0002] When handling complex vertical domain tasks (such as legal assistance, financial analysis, and medical diagnosis), intelligent agent systems based on large language models typically employ a "ReAct" paradigm or a direct reasoning workflow. In existing system architectures, the agent's reasoning engine usually injects all available skills (including basic API tools, database query interfaces, and complex orchestration workflows) into the system prompts all at once in the form of a flat list.
[0003] As system capabilities expand and vertical domain depth increases, the flat architecture exhibits the following drawbacks. When the number of skills exceeds 10 or 20, the prompts containing all skill descriptions, input parameter validation rules, and usage scenario descriptions become drastically longer. This not only significantly increases the memory footprint of the KV cache and token consumption but also leads to "attention dilution" when large language models process extremely long texts: the model tends to overemphasize tools at the beginning or end of the prompt and forget the rules for tools in the middle, reducing the accuracy of inference and decision-making. At the same time, the large language model must make a globally optimal choice among dozens of tool options with complex parameter rules in a single autoregressive inference pass. This explosive computational demand exceeds the planning capabilities of most current models and easily leads to tool selection errors or parameter hallucinations. At the code implementation level, adding any underlying tool requires extensive modifications to the descriptions and rules in the core inference prompt, and may even require changes to the hard-coded backend logic that parses tool calls. Skill definitions, scheduling logic, and execution logic become entangled, and system maintenance costs increase exponentially. Furthermore, for scenarios requiring the collaboration of multiple tools, existing systems rely excessively on the large language model to plan steps and remember intermediate variables across multi-turn dialogue interactions. Because standardized intermediate state management and pipelined data transfer guarantees are lacking, intermediate steps often get stuck in infinite loops, or core identifiers extracted in earlier steps are lost when the final report is generated.
Summary of the Invention
[0004] The main objective of this invention is to provide an intelligent agent task processing method and system based on skill trees and dynamic prompts, aiming to solve the technical problems mentioned in the background art.
[0005] This invention proposes an agent task processing method based on skill trees and dynamic prompts, including: Receive a task instruction containing a user request, and initialize the agent context state in memory for the current session. The context state includes an independent skill tree navigation state machine. The navigation state machine records at least the current node identifier and the traversal path stack. Initially, the current node identifier points to the root node. Based on the current node identifier, obtain the set of child nodes directly belonging to the current node in the preset hierarchical skill tree. The hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold. The dynamic prompt word construction engine is triggered, and the basic system settings, the description information of the child node set, and the global format output rules are concatenated and assembled in memory into the inference prompt words for the current round. The inference prompt words are then sent to the large language model for single inference, and the selection result returned by the large language model is parsed. Based on the selection result, perform the following steps: If the selection result indicates a non-leaf node used as a classification container, then the current node identifier in the navigation state machine is updated to the selection result, the original node is pushed onto the traversal path stack, and the process is returned to obtain the set of child nodes. If the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, then the navigation loop is terminated, the corresponding computer execution logic is executed, and the result is output.
[0006] The present invention is further configured such that the step of obtaining the set of child nodes directly belonging to the current node in the preset hierarchical skill tree based on the current node identifier further includes: Only the prompt word fragment template corresponding to the current node level is loaded in memory, and the attributes and execution rules of other branch nodes in the unselected skill tree are completely hidden from the context construction text of the current round, so as to physically limit the length of the input context window of the large language model.
[0007] The present invention is further configured such that, if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, the step of terminating the navigation loop, executing the corresponding computer execution logic, and outputting the result includes: reading the list of directed steps and their dependencies predefined in the background configuration file for the skill group; executing the underlying code-level skills sequentially according to the preset topological order; before scheduling and executing each skill, using the expression parser to extract specific data fragments from the output memory buffer of the previous level or of historical steps according to the predefined input mapping rules, and automatically assigning them as the input parameters of the current skill, a process in which the large language model does not need to participate in the intermediate data transfer; and, when the system detects an anomaly or timeout in the current execution step, parsing the preset error rollback field and triggering an alternative lightweight skill or execution path to continue execution.
[0008] The present invention is further configured such that the navigation state machine is also equipped with a backtracking error correction mechanism, which includes the following steps: If the selected result is found to be invalid or an execution trigger dead zone error occurs, the system will perform a pop operation on the traversal path stack to restore the current node identifier to the previous valid level. When re-constructing the inference prompt, negative constraint text is dynamically injected to avoid previous incorrect selections.
[0009] The present invention is further configured such that, after the step of parsing the selection result returned by the large language model, an intent penetration mechanism is also included, which includes the following steps: If the parsed selection result is a string containing multi-level path separators, the string is identified as a complete leaf node identifier. The system's backend routing verification module uses hash matching to verify whether the complete leaf node identifier exists in the hierarchical skill tree topology. If verification confirms that the identifier exists, the navigation state machine jumps directly across levels to the leaf node corresponding to the complete leaf node identifier.
[0010] The present invention is further configured such that the hierarchical skill tree is constructed as a directed acyclic graph that allows the same underlying execution skill or skill group to be attached to multiple different category nodes, and the number of child nodes under each non-leaf node is constrained to a preset threshold of 3 to 5.
[0011] This invention also provides an intelligent agent task processing system based on skill trees and dynamic prompts, including: The state initialization module is used to receive task instructions containing user requests and initialize the agent context state in memory for the current session. The context state includes an independent skill tree navigation state machine. The navigation state machine records at least the current node identifier and the traversal path stack. Initially, the current node identifier points to the root node. The child node acquisition module is used to acquire a set of child nodes that directly belong to the current node in a preset hierarchical skill tree based on the current node identifier. The hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold. The navigation control module is used to trigger the dynamic prompt word construction engine, which concatenates and assembles the basic system settings, the description information of the child node set, and the global format output rules in memory into the reasoning prompt words for the current round, and sends the reasoning prompt words to the large language model for single reasoning, and parses the selection result returned by the large language model. The execution module is used to perform the following steps based on the selection result: If the selection result indicates a non-leaf node used as a classification container, then the current node identifier in the navigation state machine is updated to the selection result, the original node is pushed onto the traversal path stack, and the process is returned to obtain the set of child nodes. 
If the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, then the navigation loop is terminated, the corresponding computer execution logic is executed, and the result is output.
[0012] The navigation control module also includes: The intent penetration unit is used to identify a string containing multi-level path separators as a complete leaf node identifier if the parsed selection result is such a string; and to verify the existence of the complete leaf node identifier in the hierarchical skill tree topology using hash matching through the routing verification module in the system background; if the verification shows that it exists, the navigation state machine is directly made to jump across levels to the leaf node corresponding to the complete leaf node identifier.
[0013] The present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of an intelligent agent task processing method based on a skill tree and dynamic prompts.
[0014] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of an agent task processing method based on a skill tree and dynamic prompts.
[0015] The beneficial effects of this invention are as follows: By constructing a hierarchical skill tree in memory and maintaining an independent skill tree navigation state machine, this invention decomposes a single large-scale global decision into multiple small-scale local choices. The large language model chooses from a limited number of child nodes (e.g., 3 to 5) at the current level each time, which reduces its decision dimension and context length at the architectural level and avoids the attention dilution and parameter hallucination caused by too many options. Furthermore, this invention uses a dynamic prompt word construction engine to load only the prompt word fragment template corresponding to the current node level in memory and completely hides the attributes and execution rules of other branch nodes in the unselected skill tree. This compresses the original 3000 to 5000 token prompt to the 500 to 800 token range, physically limiting the input context window length of the large language model and reducing its computational overhead and time to first token.
[0016] To address the state loss problem in multi-step complex tasks, this invention introduces a skill group execution engine. When navigating to a pre-arranged skill group node, the system takes over control, sequentially scheduling the execution of underlying code-level skills according to a preset topological order. An expression parser automatically extracts data fragments from the output memory cache of the previous level or historical steps as input parameters for the current skill, based on predefined input mapping rules. During this process, the large language model is completely uninvolved in intermediate data handling, eliminating identifier loss due to model forgetting or format errors. Simultaneously, this invention configures a backtracking error correction mechanism for the navigation state machine, automatically popping and backtracking when the navigation path fails, injecting negative constraints, and giving the system self-correcting capabilities to prevent the task from entering an infinite loop. Through an intent penetration mechanism, the large language model or pre-classifier can directly output complete leaf node paths and skip intermediate level model calls, accelerating response in simple task scenarios. Furthermore, this invention constructs the skill tree as a directed acyclic graph structure, allowing the same underlying execution skill or skill group to be attached to multiple different classification nodes, thereby avoiding redundancy and logical coupling in skill definitions and significantly improving the system's scalability and maintainability. 
In summary, this invention systematically solves the underlying defects of existing flat architectures, such as context bloat, excessive decision-making burden, poor scalability, and loss of multi-step task states, through the synergistic effect of multiple technical means, including hierarchical skill trees, dynamic prompt word construction engines, skill group execution engines, backtracking and error correction mechanisms, intent penetration mechanisms, and directed acyclic graph structures. This improves the reasoning accuracy, execution stability, scalability, and response efficiency of intelligent agent systems.
Attached Figure Description
[0017] Figure 1 is a schematic diagram of a method flow according to an embodiment of this application.
[0018] Figure 2 is a schematic diagram of the system structure according to an embodiment of this application.
[0019] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0020] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
[0021] As shown in Figure 1, this application provides an agent task processing method based on skill trees and dynamic prompts, including:
S1, receive a task instruction containing a user request, and initialize the agent context state for the current session in memory. The context state includes an independent skill tree navigation state machine. The navigation state machine records at least the current node identifier and the traversal path stack. Initially, the current node identifier points to the root node.
S2, based on the current node identifier, obtain the set of child nodes directly belonging to the current node in the preset hierarchical skill tree. The hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold.
S3, trigger the dynamic prompt word construction engine, concatenate the basic system settings, the description information of the child node set, and the global format output rules in memory to form the inference prompt for the current round, send the inference prompt to the large language model for a single inference, and parse the selection result returned by the large language model.
S4, based on the selection result, perform the following steps: if the selection result indicates a non-leaf node used as a classification container, update the current node identifier in the navigation state machine to the selection result, push the original node onto the traversal path stack, and return to step S2; if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, terminate the navigation loop, execute the corresponding computer execution logic, and output the result.
[0022] As described in steps S1-S4 above, this invention constructs a hierarchical skill tree and maintains an independent skill tree navigation state machine in memory. This decomposes a single large-scale global decision into multiple small-scale local choices, so that the large language model chooses from a limited number of child nodes at the current level each time. This reduces the decision dimensionality and context length of the large language model at the architectural level. By dynamically concatenating concise reasoning prompts that contain only information about the current level's child nodes, the computational overhead and time to first token of the large language model are significantly reduced. By recording the path stack in the state machine and looping back at non-leaf nodes, hierarchical progressive navigation is achieved. This effectively solves the technical problems of context bloat, excessive decision burden, poor scalability, and loss of multi-step task states caused by the flat architecture in existing technologies.
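The loop in steps S1-S4 can be illustrated with a minimal sketch. The tree contents, the `NavigationState` class, and the `select` callback (which stands in for one large language model call) are hypothetical names introduced here for illustration only; they are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy tree: node id -> list of child ids (an empty list marks a leaf).
SKILL_TREE = {
    "root": ["professional_analysis", "general_qa"],
    "professional_analysis": ["legal_research", "case_analysis"],
    "general_qa": [],
    "legal_research": [],
    "case_analysis": [],
}


@dataclass
class NavigationState:
    current_node: str = "root"  # S1: navigation starts at the root node
    path_stack: list[str] = field(default_factory=list)


def navigate(select: Callable[[str, list[str]], str],
             max_rounds: int = 10) -> tuple[str, NavigationState]:
    """Descend one level per round; `select` stands in for a single LLM inference."""
    state = NavigationState()
    for _ in range(max_rounds):
        children = SKILL_TREE[state.current_node]       # S2: direct children only
        choice = select(state.current_node, children)   # S3: one inference per round
        if choice not in children:
            raise ValueError(f"invalid selection: {choice!r}")
        if SKILL_TREE[choice]:                          # S4: non-leaf, push and loop
            state.path_stack.append(state.current_node)
            state.current_node = choice
        else:                                           # S4: leaf, stop navigating
            return choice, state
    raise RuntimeError("navigation did not reach a leaf")
```

The `max_rounds` cap is an added safeguard assumed for the sketch; the application itself relies on the backtracking mechanism described later to avoid endless navigation.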
[0023] In one embodiment of the present invention, step S2 further includes: loading only the prompt word fragment template corresponding to the current node level in memory, and completely masking the attributes and execution rules of other branch nodes in the unselected skill tree from the context construction text of the current round, so as to physically limit the length of the input context window of the large language model.
[0024] As described in step S2 above, this embodiment further defines the slicing mechanism of the dynamic prompt word construction engine. Specifically, during the prompt word assembly process in step S3, the system executes the following sub-steps: First, the system loads the basic system role settings and global security boundary instructions from persistent storage. This part is pre-trimmed and controlled to within 200 tokens, serving as the common base for each round of reasoning.
[0025] Then, the system reads the current node identifier from the current navigation state machine. If the identifier is the root node, an additional lightweight intent recognition rule text is loaded, which contains 3 to 5 few-shot examples to guide the large language model to make the correct classification decision at the top level; if the current node is a deep classification node, the intent recognition text is skipped, and only the prompt word fragment template corresponding to the node level is loaded.
[0026] Next, the system traverses the skill tree data structure in memory, which is stored as an inverted tree-like directed acyclic graph. Using a breadth-first search algorithm, the system extracts only the information of first-level child nodes that have a direct parent-child relationship with the current node. For each extracted child node, the system reads its name and a very brief description strictly limited to 30 characters. For all other branches outside the current node—including its sibling branches, uncle branches, and deeper, unrelated child nodes—the system completely ignores all their attributes, descriptions, input validation rules, or execution conditions. In other words, when constructing the memory view of the inference prompt for the current round, information from these irrelevant branches is physically and completely masked and does not exist in the context sent to the large language model.
[0027] Finally, the system assembles the extracted child node information into a hierarchical menu string with numerical identifiers, for example: 1. Legal Analysis - Retrieve relevant laws, regulations, and judicial interpretations; 2. Case Analysis - Find similar case judgments; 3. Comprehensive Legal Research - Integrate regulations and cases to generate analytical opinions. Following this menu string, the system appends a strict JSON output format constraint, requiring the large language model's return results to include two fields: "thought" (reasoning process) and "selected_node" (selected node ID).
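The menu assembly and JSON output constraint described above might look like the following sketch. The base text, the wording of the format rule, and the function names `build_prompt` and `parse_selection` are assumptions made for illustration; only the numbered menu layout, the 30-character description limit, and the two field names "thought" and "selected_node" come from the description.

```python
import json

BASE_SETTINGS = "You are a legal assistant agent."  # hypothetical base system text
MAX_DESC_CHARS = 30  # per-child description limit stated in the description

FORMAT_RULES = (
    "Reply with a single JSON object containing exactly two fields: "
    '"thought" (your reasoning) and "selected_node" (the chosen node ID).'
)


def build_prompt(children: list[dict]) -> str:
    """Assemble base settings, a numbered menu of direct children, and format rules."""
    menu_lines = [
        f"{i}. {child['name']} - {child['description'][:MAX_DESC_CHARS]}"
        for i, child in enumerate(children, start=1)
    ]
    return "\n".join([BASE_SETTINGS, "Options:", *menu_lines, FORMAT_RULES])


def parse_selection(raw: str) -> str:
    """Extract the selected node ID from the model's JSON reply."""
    reply = json.loads(raw)
    if not isinstance(reply, dict) or not {"thought", "selected_node"} <= reply.keys():
        raise ValueError("reply must contain 'thought' and 'selected_node'")
    return reply["selected_node"]
```

How the menu numbers or names map back to node IDs is not specified in the description, so the sketch leaves that mapping out.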
[0028] After processing by the aforementioned slicing and assembly algorithm, the total length of the inference prompt sent to the large language model in each round is stably controlled between 500 and 800 tokens, roughly one-sixth to one-tenth of that of the existing flat architecture (typically 3000 to 5000 tokens). This design, which physically limits the length of the input context window, significantly reduces the GPU's computational overhead for matrix operations on long sequences and significantly shortens the time to first token. It also fundamentally eliminates the "attention dilution" phenomenon: because the large language model cannot see information from other irrelevant branches, its attention mechanism can only focus on the limited number of options at the current level, which greatly improves the accuracy of node selection.
[0029] In one embodiment of the present invention, if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, the step of terminating the navigation loop, executing the corresponding computer execution logic, and outputting the result includes:
S41, read the list of directed steps and their dependencies predefined in the background configuration file of the skill group;
S42, execute the underlying code-level skills sequentially according to a preset topological order;
S43, before scheduling the execution of each skill, use an expression parser to extract specific data fragments from the output memory buffer of the previous level or of historical steps according to predefined input mapping rules, and automatically assign them as the input parameters of the current skill. In this process, the large language model is not required to participate in the intermediate data transfer;
S44, when the system detects an abnormality or a timeout in the current execution step, parse the preset error rollback field and trigger an alternative lightweight skill or execution path to continue execution.
[0030] As described in steps S41-S44 above, this embodiment details the deterministic execution process after the skill group execution engine takes over. This process separates the execution of multi-step composite tasks from the uncontrollable reasoning of the large language model, transforming it into an automated pipeline fully controlled by the system.
[0031] In step S41, the system reads the corresponding execution definition from the background configuration file based on the skill group identifier bound to the leaf node. This definition contains at least the following three parts: first, a list of directed steps arranged in execution order, with each step specifying the underlying skill identifier to be invoked; second, input mapping rules, defining which historical steps' outputs the input parameters for the current step should be extracted from; and third, an error rollback strategy, specifying the backup processing path when a step fails.
[0032] In step S42, the system executes each step sequentially according to the topological order of the directed step list. Before executing each step, the system first checks whether the preconditions on which the step depends have been met. For example, step A declares that it requires document parsing results as input, and these results may have already been cached in the agent's context memory pool in a previous session (or even in the previous round of dialogue with the user). If the required data is detected to already exist, the system triggers short-circuit execution logic, directly skipping the time-consuming precondition skill calls, reading the existing data from the memory cache, and continuing with subsequent steps. This design avoids repeatedly executing the same high-cost operations, significantly improving execution efficiency in multi-turn dialogues or continuous task scenarios.
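The short-circuit logic in step S42 can be sketched as a cache check in front of each step. The function name `run_step` and the use of a plain dictionary as the context memory pool are assumptions for illustration.

```python
from typing import Callable


def run_step(step_id: str, skill: Callable[[], dict], cache: dict[str, dict]) -> dict:
    """Run a step unless its output already exists in the context cache."""
    if step_id in cache:       # short-circuit: reuse the earlier result
        return cache[step_id]
    cache[step_id] = skill()   # otherwise execute the skill and remember its output
    return cache[step_id]
```

In this sketch a second request for the same `step_id` returns the stored output without invoking the skill again, which is the behavior the paragraph describes for repeated high-cost operations such as document parsing.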
[0033] In step S43, the system utilizes a built-in expression parser (e.g., a parser supporting JSONPath syntax) to automatically transfer data between skills. Specifically, each skill defines its own input parameter mapping rule in a configuration file. This rule is an expression, such as "$.step1_output.data.doc_ids". When this skill is executed, the expression parser uses the output JSON object generated in the previous step as context, precisely extracts the required data fragments (e.g., an array of document IDs) based on the expression, and then automatically encapsulates them as the input parameters for that skill. The entire parameter extraction and assignment process is completed entirely in the system backend; the large language model does not participate in any intermediate data handling, conversion, or memorization. Because the output of each step is structured into a JSON object with clear field names and paths, and the mapping rule uses standardized path expressions, the system can guarantee that in any complex multi-step task, key identifiers (such as document IDs, case numbers, legal citations, etc.) can be passed to subsequent steps without errors, completely solving the parameter loss problem caused by the large language model forgetting or formatting errors in existing solutions.
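As a rough illustration of step S43, the sketch below resolves expressions of the form "$.step1_output.data.doc_ids" against a buffer of earlier step outputs and passes the result to the next skill. It handles only plain dotted keys, a small subset of JSONPath, and the step and skill names (`resolve`, `run_group`, `search`, `summarize`) are hypothetical.

```python
from typing import Any, Callable


def resolve(expr: str, outputs: dict[str, Any]) -> Any:
    """Resolve a '$.a.b.c' expression against the step-output buffer."""
    if not expr.startswith("$."):
        raise ValueError(f"unsupported expression: {expr!r}")
    value: Any = outputs
    for key in expr[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"{expr!r} has no segment {key!r}")
        value = value[key]
    return value


def run_group(steps: list[dict], skills: dict[str, Callable[..., dict]]) -> dict[str, Any]:
    """Run steps in order, wiring inputs from earlier outputs without any LLM call."""
    outputs: dict[str, Any] = {}
    for step in steps:
        kwargs = {
            name: resolve(expr, outputs)
            for name, expr in step.get("inputs", {}).items()
        }
        outputs[f"{step['id']}_output"] = skills[step["skill"]](**kwargs)
    return outputs


# Hypothetical two-step skill group: search, then summarize the found documents.
STEPS = [
    {"id": "step1", "skill": "search"},
    {"id": "step2", "skill": "summarize",
     "inputs": {"doc_ids": "$.step1_output.data.doc_ids"}},
]
SKILLS = {
    "search": lambda: {"data": {"doc_ids": ["d1", "d2"]}},
    "summarize": lambda doc_ids: {"summary": f"{len(doc_ids)} documents"},
}
```

Calling `run_group(STEPS, SKILLS)` passes the document IDs from the first step to the second purely through the mapping rule, so the identifiers never pass through model output.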
[0034] In step S44, the system equips the skill group execution with a robust fault tolerance mechanism. When executing a certain step, if the called underlying API returns a timeout error, an internal server error, or there is no response for an extended period, the system will not directly throw the exception to the upper layer, interrupting the entire task. Instead, the system will read the "fallback" field in the skill group configuration. For example, the fallback field of a "Comprehensive Legal Research" skill group can be configured as "General Answer Engine". When the in-depth research report generation step fails, the system automatically downgrades to calling a lightweight general question-and-answer skill and extracts key evidence fragments obtained from the output cache of previously successfully executed steps as additional context for the downgraded response. In this way, even if some external services are unavailable, the entire agent system can still provide a meaningful fallback response, ensuring the system's high availability.
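The downgrade path in step S44 can be sketched as follows. The "fallback" field and the two skill group names come from the example above; the function name, the choice of exception types, and the way evidence is passed are assumptions made for illustration.

```python
from typing import Any, Callable

# Skill group configuration using the "fallback" field from the example above.
SKILL_GROUP = {
    "name": "Comprehensive Legal Research",
    "fallback": "General Answer Engine",
}


def run_step_with_fallback(step: str,
                           group: dict[str, str],
                           skills: dict[str, Callable[[list], Any]],
                           evidence: list) -> Any:
    """Run a step; on timeout or connection failure, call the configured fallback."""
    try:
        return skills[step](evidence)
    except (TimeoutError, ConnectionError):
        fallback_name = group.get("fallback")
        if fallback_name is None:
            raise  # no fallback configured: surface the original error
        # Reuse evidence gathered by earlier successful steps as extra context.
        return skills[fallback_name](evidence)
```

The sketch treats only `TimeoutError` and `ConnectionError` as recoverable; which failures should trigger the downgrade in a real deployment is a design decision the description leaves open.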
[0035] In one embodiment of the present invention, the navigation state machine is further configured with a backtracking error correction mechanism, which includes the following steps: S51, if the selection result is found to be invalid or an execution trigger dead zone error is detected, the system restores the current node identifier to the previous valid level by performing a pop operation on the traversal path stack; S52, when re-executing step S3 to construct the reasoning prompt, negative constraint text for avoiding previous incorrect selection is dynamically injected.
[0036] As described in steps S51-S52 above, this embodiment discloses the self-correction capability of the navigation state machine. In the prior art, once a large language model makes an incorrect choice at a certain step (for example, selecting a non-existent child node, or selecting a skill that exists but cannot be executed later), it often leads to the entire task getting stuck in a dead end, and the model has difficulty realizing the error on its own and actively backtracking.
[0037] In this invention, when step S4 is completed and the next loop begins, the system verifies the selection result returned by the large language model. The verification rules include: whether the node ID exists in the set of child nodes of the current node; if it exists, whether the node is a non-leaf node or a leaf node; and whether the current system resources meet the prerequisites required to execute the node (e.g., some skills require the user to be logged in or have completed prior authorization). If any verification fails, the system determines that the selection result is invalid or results in a dead zone error.
[0038] At this point, the navigation state machine performs automatic backtracking: it pops the current node from the current traversal path stack, restoring the current node's identifier to the parent node. Simultaneously, the system records the ID of the invalid node selection. When the system re-executes step S3 to construct inference prompts for a new round, it dynamically adds a negative constraint text to the end of the prompt, such as: "Note: The node 'XX' selected in the previous round is unavailable. Please do not select this node again and choose from the remaining options." This negative constraint is injected into the prompt in natural language, guiding the large language model to avoid previously identified erroneous paths during re-inference.
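A minimal sketch of the pop-and-constrain behavior follows. It assumes the state machine had already moved to the node that later proved unusable, so popping the stack restores its parent; the class and function names and the `rejected` list are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NavigationState:
    current_node: str = "root"
    path_stack: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)  # IDs of invalid selections


def backtrack(state: NavigationState, bad_node: str) -> None:
    """Pop back to the previous valid level and record the invalid selection."""
    if state.path_stack:
        state.current_node = state.path_stack.pop()
    state.rejected.append(bad_node)


def negative_constraints(state: NavigationState) -> str:
    """Build the text appended to the end of the next round's prompt."""
    return "\n".join(
        f"Note: The node '{node}' selected in the previous round is unavailable. "
        "Please do not select this node again and choose from the remaining options."
        for node in state.rejected
    )
```

When no selection has been rejected, `negative_constraints` returns an empty string, so the same prompt builder can be used for first attempts and retries.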
[0039] This backtracking and error-correction mechanism gives the system a human-like "try, back up, and choose again" behavior, greatly improving the robustness of navigation for complex tasks. Even if the large language model occasionally makes a reasoning error, the system can automatically recover and guide the model to the correct path without causing the entire session to crash or fall into an infinite loop.
[0040] In one embodiment of the present invention, after the step of parsing the selection result returned by the large language model, an intent-penetration mechanism is further included, which includes the following steps:
S61, if the parsed selection result is a string containing multi-level path separators, identify the string as a complete leaf node identifier;
S62, through the routing verification module in the system backend, use hash matching to verify whether the complete leaf node identifier exists in the hierarchical skill tree topology;
S63, if verification confirms that the identifier exists, transition the navigation state machine directly across levels to the leaf node corresponding to the complete leaf node identifier, skipping the loop-back for non-leaf nodes in step S4 and the subsequent intermediate-level prompt word assembly and model inference calls, and directly enter step S4 to execute the computer execution logic corresponding to the leaf node.
[0041] As described in steps S61-S63 above, this embodiment discloses an acceleration mechanism, namely intent penetration. In the default layer-by-layer exploration mode, the system needs multiple rounds of interaction with the LLM to navigate step by step from the root node to the target leaf node. However, for some high-frequency, simple, and clearly defined user requests, this layer-by-layer navigation is redundant.
[0042] In this embodiment, the system allows the large language model or a pre-classifier to output, in one step, a complete leaf node path string containing multi-level path separators (such as periods "."), for example "professional_analysis.legal_research". When step S3 parses the selection result returned by the LLM, the system first checks whether the string contains multi-level path separators. If it does, the string is treated as a complete node identifier and handed to the routing verification module in the system backend for processing.
[0043] The routing verification module internally maintains a hash mapping table from complete path strings to leaf node pointers, which is pre-built from the skill tree topology at system startup. During verification, the system performs a hash lookup with O(1) average time complexity, using the path string as the key. If a corresponding leaf node is found, the path is considered valid.
[0044] Once verification succeeds, the navigation state machine jumps directly to the leaf node without passing through any intermediate category nodes. Specifically, the system does not update the current node identifier to any intermediate category node, nor does it execute steps S2 and S3 for the intermediate levels (i.e., it does not re-acquire the child node set, rebuild the prompt words, or call the LLM again). Instead, it treats the leaf node as the reached target and enters step S4 to execute the computer execution logic corresponding to the leaf node.
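Steps S61 to S63 can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-dictionary tree layout, the `"skill"` field, and the names `build_route_table` and `resolve_selection` are hypothetical, and a plain Python `dict` stands in for the hash mapping table from complete path strings to leaf nodes.

```python
PATH_SEPARATOR = "."

# Hypothetical skill tree: category nodes have "children"; leaf nodes name a skill.
SKILL_TREE = {
    "professional_analysis": {
        "children": {
            "legal_research": {"skill": "run_legal_research"},
            "financial_review": {"skill": "run_financial_review"},
        }
    },
    "general_chat": {"skill": "run_chat"},
}


def build_route_table(tree: dict, prefix: str = "") -> dict[str, dict]:
    """Walk the tree once (e.g. at startup) and map each full path to its leaf node."""
    table: dict[str, dict] = {}
    for name, node in tree.items():
        path = f"{prefix}{PATH_SEPARATOR}{name}" if prefix else name
        children = node.get("children")
        if children:
            table.update(build_route_table(children, path))
        else:
            table[path] = node
    return table


def resolve_selection(selection: str, table: dict[str, dict]):
    """Return the leaf for a verified full path, or None to keep navigating layer by layer."""
    if PATH_SEPARATOR not in selection:   # S61: not a multi-level path
        return None
    return table.get(selection)           # S62: hash lookup; None if the path is unknown


ROUTE_TABLE = build_route_table(SKILL_TREE)
```

When `resolve_selection` returns a leaf, the caller can execute it immediately (S63); when it returns `None`, the selection is handled by the ordinary layer-by-layer path.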
[0045] This intent-penetration mechanism is particularly effective in two scenarios. First, when the user's question is very clear, the LLM can directly provide the complete path in the first round of inference. Second, when a lightweight classification model (such as BERT) is deployed up front at the system gateway layer, that model does not need to perform complex multi-round inference; it directly maps the user's text to the complete path of a particular skill group. Through intent penetration, the system can skip multiple intermediate rounds of prompt word assembly and LLM network calls, compressing the response time from seconds to milliseconds and significantly improving the execution efficiency of simple tasks.
[0046] In one embodiment of the present invention, the hierarchical skill tree is constructed as a directed acyclic graph that allows the same underlying execution skill or skill group to be attached to multiple different category nodes, and the number of child nodes under each non-leaf node is constrained to a preset threshold of 3 to 5.
[0047] This embodiment further defines the data structure of the skill tree. In a traditional tree structure, each node has only one parent node, so each lower-level skill can be assigned to only one upper-level category when skills are classified, which leads to redundant skill definitions and makes reuse difficult.
[0048] In this invention, the skill tree is constructed as a directed acyclic graph (DAG). This means that a bottom-level skill or skill group (leaf node) can be referenced by multiple different parent nodes (category nodes). For example, a general "document reading" skill can be attached to both the "legal document processing" and "financial statement analysis" categories without needing to copy two definitions. This design significantly reduces the redundancy of skill definitions, allowing the system to add new vertical domains without repeatedly writing basic tool logic; simply establishing new attachment relationships within the skill tree is sufficient.
[0049] Meanwhile, to keep the selection burden on the large language model sufficiently light in each round of navigation, this embodiment sets a preset threshold for the number of child nodes under each non-leaf node, preferably 3 to 5. This threshold is based on the "7±2" rule in cognitive psychology, combined with the attention distribution characteristics of large language models in multiple-choice scenarios. Experiments show that when the number of options presented at one time exceeds 5, the model's selection error rate begins to rise significantly, whereas when it is less than 3, classifications may become overly fine-grained and navigation levels excessively deep, increasing the total number of rounds. Therefore, 3 to 5 is an optimal balance point. System designers should follow this threshold constraint when writing skill tree configuration files. If the number of child nodes under a category node exceeds 5, the category should be further subdivided by adding intermediate levels.
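The directed acyclic graph structure and the child-count threshold can be illustrated with a short Python sketch. The adjacency-map layout, the node names other than the "document reading" example, and the helpers `parents_of`, `check_child_counts`, and `is_acyclic` are assumptions made for illustration; the specification does not prescribe a configuration format or a validation routine.

```python
MIN_CHILDREN, MAX_CHILDREN = 3, 5  # preferred threshold range from this embodiment

# Hypothetical adjacency map: category node -> ids of its child nodes.
# "document_reading" is attached to two categories without being defined twice.
SKILL_GRAPH = {
    "root": ["legal_document_processing", "financial_statement_analysis", "general_tools"],
    "legal_document_processing": ["document_reading", "clause_extraction", "case_search"],
    "financial_statement_analysis": ["document_reading", "ratio_calculation", "trend_summary"],
    "general_tools": ["web_search", "calculator", "translation"],
}


def parents_of(graph: dict[str, list[str]], node_id: str) -> list[str]:
    """List every category node that references the given node."""
    return sorted(parent for parent, kids in graph.items() if node_id in kids)


def check_child_counts(graph: dict[str, list[str]],
                       lo: int = MIN_CHILDREN, hi: int = MAX_CHILDREN) -> dict[str, int]:
    """Return the category nodes whose child count falls outside [lo, hi]."""
    return {p: len(kids) for p, kids in graph.items() if not lo <= len(kids) <= hi}


def is_acyclic(graph: dict[str, list[str]]) -> bool:
    """Depth-first search that fails as soon as a node is reached again on its own path."""
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return True
        if node in visiting:
            return False  # back edge: the graph contains a cycle
        visiting.add(node)
        ok = all(visit(child) for child in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in graph)
```

A configuration loader could run `check_child_counts` and `is_acyclic` once at startup and reject or flag a skill tree file that violates either constraint.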
[0050] As shown in Figure 2, the present invention also provides an intelligent agent task processing system based on skill trees and dynamic prompts, including:
a state initialization module, used to receive task instructions containing user requests and initialize the agent context state in memory for the current session, wherein the context state includes an independent skill tree navigation state machine, the navigation state machine records at least the current node identifier and the traversal path stack, and initially the current node identifier points to the root node;
a child node acquisition module, used to acquire, based on the current node identifier, the set of child nodes that directly belong to the current node in a preset hierarchical skill tree, wherein the hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold;
a navigation control module, used to trigger the dynamic prompt word construction engine, which concatenates and assembles, in memory, the basic system settings, the description information of the child node set, and the global format output rules into the inference prompt words for the current round, sends the inference prompt words to the large language model for a single inference, and parses the selection result returned by the large language model;
an execution module, used to perform the following steps based on the selection result:
if the selection result indicates a non-leaf node used as a classification container, the current node identifier in the navigation state machine is updated to the selection result, the original node is pushed onto the traversal path stack, and the process returns to obtaining the set of child nodes;
if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, the navigation loop is terminated, the corresponding computer execution logic is executed, and the result is output.
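How the four modules cooperate can be sketched as a single navigation loop in Python. Everything named here is hypothetical: the `TREE` layout, the stub `run` callables, the `call_model` parameter (any function that takes a prompt and returns the model's reply), and the `max_rounds` guard. Including the user request in the prompt is also an assumption. For brevity the sketch raises an error on an invalid selection, whereas a fuller version would apply the backtracking mechanism of paragraph [0038].

```python
from typing import Callable

# Hypothetical tree: category nodes list "children"; leaf nodes carry a "run" callable.
TREE = {
    "root": {"description": "entry point", "children": ["legal", "finance", "general"]},
    "legal": {"description": "legal assistance",
              "children": ["legal_research", "contract_review", "case_summary"]},
    "finance": {"description": "financial analysis", "run": lambda req: f"finance: {req}"},
    "general": {"description": "general chat", "run": lambda req: f"chat: {req}"},
    "legal_research": {"description": "search statutes", "run": lambda req: f"research: {req}"},
    "contract_review": {"description": "review a contract", "run": lambda req: f"review: {req}"},
    "case_summary": {"description": "summarize a case", "run": lambda req: f"summary: {req}"},
}

BASE_SETTINGS = "You are a task router. Choose exactly one option."
FORMAT_RULES = "Reply with the option id only."


def build_round_prompt(request: str, child_ids: list[str]) -> str:
    """Concatenate base settings, the current level's child descriptions, and format rules."""
    options = "\n".join(f"- {cid}: {TREE[cid]['description']}" for cid in child_ids)
    return f"{BASE_SETTINGS}\nUser request: {request}\nOptions:\n{options}\n{FORMAT_RULES}"


def process_task(request: str, call_model: Callable[[str], str], max_rounds: int = 10) -> str:
    """Navigate from the root to a leaf, one model call per level, then run the leaf."""
    current, path_stack = "root", []               # state initialization
    for _ in range(max_rounds):
        child_ids = TREE[current]["children"]      # child node acquisition
        selection = call_model(build_round_prompt(request, child_ids)).strip()
        if selection not in child_ids:
            raise ValueError(f"invalid selection: {selection!r}")
        node = TREE[selection]
        if "children" in node:                     # non-leaf: push the original node, loop again
            path_stack.append(current)
            current = selection
            continue
        return node["run"](request)                # leaf: end navigation and execute
    raise RuntimeError("navigation did not reach a leaf node")
```

Because each round's prompt lists only the children of the current node, the descriptions of unselected branches never enter the model's context in this sketch.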
[0051] The navigation control module also includes an intent penetration unit, which is used to: identify the parsed selection result as a complete leaf node identifier if it is a string containing multi-level path separators; verify, through the routing verification module in the system backend and using hash matching, whether the complete leaf node identifier exists in the hierarchical skill tree topology; and, if the identifier is verified to exist, make the navigation state machine jump directly across levels to the leaf node corresponding to the complete leaf node identifier.
[0052] The present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the intelligent agent task processing method based on skill trees and dynamic prompts described above.
[0053] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the intelligent agent task processing method based on skill trees and dynamic prompts described above.
[0054] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.
[0055] The above description is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. An agent task processing method based on skill trees and dynamic prompts, characterized in that it includes:
receiving a task instruction containing a user request, and initializing the agent context state in memory for the current session, wherein the context state includes an independent skill tree navigation state machine, the navigation state machine records at least the current node identifier and the traversal path stack, and initially the current node identifier points to the root node;
based on the current node identifier, obtaining the set of child nodes directly belonging to the current node in the preset hierarchical skill tree, wherein the hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold;
triggering the dynamic prompt word construction engine to concatenate and assemble, in memory, the basic system settings, the description information of the child node set, and the global format output rules into the inference prompt words for the current round, sending the inference prompt words to the large language model for a single inference, and parsing the selection result returned by the large language model;
based on the selection result, performing the following steps:
if the selection result indicates a non-leaf node used as a classification container, updating the current node identifier in the navigation state machine to the selection result, pushing the original node onto the traversal path stack, and returning to the step of obtaining the set of child nodes;
if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, terminating the navigation loop, executing the corresponding computer execution logic, and outputting the result.
2. The agent task processing method based on skill tree and dynamic prompts according to claim 1, characterized in that the step of obtaining the set of child nodes directly belonging to the current node in the preset hierarchical skill tree based on the current node identifier further includes: loading in memory only the prompt word fragment template corresponding to the current node level, and completely hiding the attributes and execution rules of the other, unselected branch nodes of the skill tree from the context construction text of the current round, so as to physically limit the length of the input context window of the large language model.
3. The agent task processing method based on skill tree and dynamic prompts according to claim 1, characterized in that the step of, if the selection result indicates a leaf node and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, terminating the navigation loop, executing the corresponding computer execution logic, and outputting the result includes:
reading the list of directed steps and their dependencies predefined for the skill group in the backend configuration file;
executing the underlying code-level skills sequentially according to the preset topological order;
before scheduling and executing each skill, extracting, by the expression parser, specific data fragments from the output memory buffer of the previous level or of historical steps according to the predefined input mapping rules, and automatically assigning them as the input parameters of the current skill, wherein the large language model does not need to participate in the intermediate data transfer;
when the system detects an anomaly or timeout in the current execution step, parsing the preset error rollback field and triggering an alternative lightweight skill or execution path to continue execution.
4. The agent task processing method based on skill tree and dynamic prompts according to claim 1, characterized in that the navigation state machine is further provided with a backtracking error-correction mechanism, which includes the following steps: if the selection result is found to be invalid or an execution trigger dead zone error occurs, performing a pop operation on the traversal path stack to restore the current node identifier to the previous valid level; and, when reconstructing the inference prompt words, dynamically injecting negative constraint text to avoid the previous incorrect selections.
5. The agent task processing method based on skill tree and dynamic prompts according to claim 1, characterized in that, after the step of parsing the selection result returned by the large language model, the method further includes an intent-penetration mechanism comprising the following steps: if the parsed selection result is a string containing multi-level path separators, identifying the string as a complete leaf node identifier; verifying, by the routing verification module in the system backend and using hash matching, whether the complete leaf node identifier exists in the hierarchical skill tree topology; and, if the identifier is verified to exist, making the navigation state machine jump directly across levels to the leaf node corresponding to the complete leaf node identifier.
6. The agent task processing method based on skill tree and dynamic prompts according to claim 1, characterized in that the hierarchical skill tree is constructed as a directed acyclic graph that allows the same underlying execution skill or skill group to be attached to multiple different category nodes, and the number of child nodes under each non-leaf node is constrained to a preset threshold of 3 to 5.
7. An intelligent agent task processing system based on skill trees and dynamic prompts, characterized in that it includes:
a state initialization module, used to receive task instructions containing user requests and initialize the agent context state in memory for the current session, wherein the context state includes an independent skill tree navigation state machine, the navigation state machine records at least the current node identifier and the traversal path stack, and initially the current node identifier points to the root node;
a child node acquisition module, used to acquire, based on the current node identifier, the set of child nodes that directly belong to the current node in a preset hierarchical skill tree, wherein the hierarchical skill tree divides the system capability units into a multi-layer nested directed acyclic graph structure, and the number of child nodes in a single layer is limited by a threshold;
a navigation control module, used to trigger the dynamic prompt word construction engine, which concatenates and assembles, in memory, the basic system settings, the description information of the child node set, and the global format output rules into the inference prompt words for the current round, sends the inference prompt words to the large language model for a single inference, and parses the selection result returned by the large language model;
an execution module, used to perform the following steps based on the selection result:
if the selection result indicates a non-leaf node used as a classification container, the current node identifier in the navigation state machine is updated to the selection result, the original node is pushed onto the traversal path stack, and the process returns to obtaining the set of child nodes;
if the selection result indicates a leaf node, and the leaf node corresponds to a single underlying execution skill or a pre-arranged multi-step skill group, the navigation loop is terminated, the corresponding computer execution logic is executed, and the result is output.
8. The intelligent agent task processing system based on skill tree and dynamic prompts according to claim 7, characterized in that the navigation control module also includes an intent penetration unit, which is used to: identify the parsed selection result as a complete leaf node identifier if it is a string containing multi-level path separators; verify, through the routing verification module in the system backend and using hash matching, whether the complete leaf node identifier exists in the hierarchical skill tree topology; and, if the identifier is verified to exist, make the navigation state machine jump directly across levels to the leaf node corresponding to the complete leaf node identifier.
9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.