Tool invocation method, apparatus, device, and storage medium

By recognizing user intent and generating matching prompts, the problem of inaccurate tool calls from large language models is solved, enabling accurate calls to target tools.

CN119883439BActive Publication Date: 2026-06-12BEIJING 58 INFORMATION TTECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING 58 INFORMATION TTECH CO LTD
Filing Date
2024-12-26
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Large language models have inaccuracies when calling tools, including not calling tools or calling the wrong tools.

Method used

By identifying the user's target intent, obtaining the calling protocol of the target tool, and generating target prompt words that match the calling protocol, the large language model is guided to obtain tool calling parameters from contextual information in order to accurately call the target tool.

🎯Benefits of technology

This improves the accuracy of large language models in calling tools, enabling them to more accurately obtain tool call parameters from the context information input by the user, thus achieving accurate invocation of the target tool.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN119883439B_ABST
    Figure CN119883439B_ABST
Patent Text Reader

Abstract

Embodiments of the present application provide a tool calling method, device and equipment and a storage medium. The method comprises: identifying a target intention of a user according to context information input by the user; in response to determining that the target intention needs to be implemented by calling a target tool through a large language model, acquiring a calling protocol corresponding to the target tool; generating a target prompt word matched with the calling protocol according to a parameter type of a tool calling parameter to be acquired included in the calling protocol; inputting the target prompt word into the large language model, so that the large language model acquires the tool calling parameter from the context information according to the target prompt word, and calls the target tool through the tool calling parameter. Through the scheme, the tool calling parameter for calling the target tool can be more accurately acquired from the context information input by the user, and accurate calling of the target tool is realized.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a tool invocation method, apparatus, device, and storage medium. Background Technology

[0002] Large Language Model (LLM) refers to a large-scale parametric language model learned using a deep learning framework based on a large-scale data corpus. It can be used to handle various natural language tasks such as text classification, question answering, and dialogue.

[0003] To fully leverage the capabilities of large language models, they can be treated as intelligent entities capable of autonomously performing tasks, often referred to as agents. When acting as an agent, a large language model can invoke various external tools through programming interfaces, such as Application Programming Interfaces (APIs), to perform corresponding tasks, such as web page searches, weather inquiries, and executing specific code.

[0004] However, in practical applications, large language models have problems with inaccurate tool invocation when calling tools. For example, a tool may not be called when it is needed, or the wrong tool may be called. Summary of the Invention

[0005] This invention provides a tool invocation method, apparatus, device, and storage medium to improve the accuracy of tool invocation for large language models.

[0006] In a first aspect, embodiments of the present invention provide a tool invocation method, the method comprising:

[0007] Identify the user's target intent based on the contextual information input by the user;

[0008] In response to the requirement that determining the target intent requires the large language model to invoke the target tool, the invocation protocol corresponding to the target tool is obtained;

[0009] Based on the parameter type corresponding to the tool call parameters to be obtained contained in the call protocol, generate target prompt words that match the call protocol;

[0010] The target prompt word is input into the large language model, so that the large language model can obtain the tool invocation parameters from the context information based on the target prompt word, and invoke the target tool through the tool invocation parameters.

[0011] Secondly, embodiments of the present invention provide a tool invocation device, the device comprising:

[0012] The first processing module is used to identify the user's target intent based on the context information input by the user;

[0013] The second processing module is used to, in response to the determination that the target intent requires the large language model to call the target tool, obtain the calling protocol corresponding to the target tool; and generate target prompt words that match the calling protocol based on the parameter types corresponding to the tool calling parameters to be obtained contained in the calling protocol.

[0014] The invocation module is used to input the target prompt word into the large language model, so that the large language model can obtain the tool invocation parameters from the context information based on the target prompt word, and invoke the target tool through the tool invocation parameters.

[0015] Thirdly, embodiments of the present invention provide an electronic device, including: a memory, a processor, and a communication interface; wherein, the memory stores executable code, and when the executable code is executed by the processor, the processor can at least implement the tool invocation method as described in the first aspect.

[0016] Fourthly, embodiments of the present invention provide a non-transitory machine-readable storage medium storing executable code, which, when executed by a processor of an electronic device, enables the processor to at least implement the tool invocation method as described in the first aspect.

[0017] Fifthly, embodiments of the present invention provide a computer program product, comprising: a computer program that, when executed by a processor of an electronic device, enables the processor to at least implement the tool invocation method as described in the first aspect.

[0018] In the solution provided by the embodiments of the present invention, if the target intent expressed by the user through the input context information needs to be realized by the large language model through calling a tool, then the calling protocol of the target tool corresponding to the target intent is obtained, and according to the parameter type corresponding to the tool calling parameters to be obtained contained in the calling protocol, a target prompt word matching the calling protocol is generated, that is, a target prompt word matching the parameter type corresponding to the tool calling parameters is generated. Thus, based on the target prompt word, the large language model can more accurately obtain the tool calling parameters for calling the target tool from the context information input by the user, and realize the accurate calling of the target tool. Attached Figure Description

[0019] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0020] Figure 1 A flowchart of a tool invocation method provided in an embodiment of the present invention;

[0021] Figure 2 A flowchart illustrating a target prompt word generation method provided in an embodiment of the present invention;

[0022] Figure 3 A schematic diagram of a first prompt word provided in an embodiment of the present invention;

[0023] Figure 4 A schematic diagram of a second prompt word provided in an embodiment of the present invention;

[0024] Figure 5 This is a schematic diagram of the structure of a tool calling device provided in an embodiment of the present invention;

[0025] Figure 6 To and Figure 5 The illustrated embodiment provides a schematic diagram of the electronic device corresponding to the tool calling device. Detailed Implementation

[0026] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0027] In some processes described in the specification, claims, and accompanying drawings of the embodiments of this invention, multiple operations appearing in a specific order are included. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or may be executed in parallel. The operation numbers, such as 101, 102, etc., are merely used to distinguish different operations and do not represent any execution order. Furthermore, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions such as "first" and "second" in this document are used to distinguish different messages, devices, modules, etc., and do not represent a sequential order, nor do they limit "first" and "second" to different types.

[0028] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in the embodiments of the present invention are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to choose to authorize or refuse.

[0029] Large Language Model (LLM) refers to a large-scale parametric language model learned using a deep learning framework based on a large-scale data corpus. It can be used to handle various natural language tasks such as text classification, question answering, and dialogue.

[0030] In practical applications, large language models can be implemented using large models (Large Model, or LM for short) that are pre-trained with massive amounts of data. For example, they can be GPT-3 (Generative Pre-Trained Transformer-3, the third generation of generative pre-trained models), GPT-4 (Generative Pre-Trained Transformer-4, the fourth generation of generative pre-trained models), BERT (Bidirectional Encoder Representation from Transformers, a bidirectional encoder model based on Transformers), etc. This embodiment of the invention does not limit the specific implementation of such models.

[0031] To fully leverage the capabilities of large language models, they can be treated as intelligent entities capable of autonomously performing tasks, commonly known as agents. The specific process of constructing an intelligent entity based on a large language model can be found in the field of Artificial Intelligence Agent (AI Agent) technology, which will not be elaborated upon in this embodiment.

[0032] When the large language model acts as a proxy, it can invoke various tools outside the model through programming interfaces, such as Application Programming Interfaces (APIs), to perform corresponding tasks, such as web page searches, weather inquiries, and the execution of specific code. In this embodiment of the invention, the type of tool invoked by the large language model is not limited; the tool can be, for example, an application like a weather query, a web service, a search engine, and so on.

[0033] It is worth emphasizing that, in this embodiment, the tool, after receiving input data conforming to the tool invocation protocol sent by the large language model, can feed back the corresponding tool invocation result to the large language model. For example, consider an application scenario where a user inquires about the weather in a specific location at a specific time. Since the large language model does not have the ability to generate the weather for that specific location at that time, it needs to invoke a weather query tool outside the model to retrieve the weather query result for the user. Assuming the large language model accurately understands the user's weather query request and sends input data instructing the weather query tool to query the weather in a specific location at a specific time, and that this input data conforms to the weather query tool's tool invocation protocol, the weather query tool, after completing the query, will feed back the weather query result for that specific location at that specific time to the large language model, which can then provide the weather query result back to the user.

[0034] The above example describes the process of a large language model correctly calling a tool. However, in practical applications, in scenarios where a large language model needs to call tools, there are often problems with inaccurate tool calls. For example, a tool may not be called when it is needed, or the wrong tool may be called.

[0035] In practice, there are many reasons why the large language model cannot accurately call tools. These include: when the large language model needs to obtain information from the context information input by the user to call tools, the large language model does not perform the corresponding information acquisition operation, or the information obtained is inaccurate, which will lead to the inability to call tools or the calling of the wrong tools.

[0036] To address at least one of the aforementioned technical problems, embodiments of the present invention provide a tool invocation method. In this method, if the target intent expressed by a user through input context information requires a large language model to invoke a tool, the invocation protocol of the target tool corresponding to the target intent is obtained. Based on the parameter types corresponding to the tool invocation parameters to be obtained contained in the invocation protocol, a target prompt word matching the invocation protocol is generated, that is, a target prompt word matching the parameter types corresponding to the tool invocation parameters is generated. Thus, based on this target prompt word, the large language model can more accurately obtain the tool invocation parameters for invoking the target tool from the user's input context information, achieving accurate invocation of the target tool.

[0037] The tool invocation method provided in this embodiment of the invention can be executed by an electronic device, which can be a terminal device such as a PC, laptop, or smartphone, or a server. The server can be a physical server containing an independent host, a virtual server, a cloud server, or a server cluster.

[0038] The following detailed description of some embodiments of the present invention is provided in conjunction with the accompanying drawings. Where there is no conflict between the embodiments, the following embodiments and features can be combined with each other. Furthermore, the timing of the steps in the following method embodiments is merely an example and not a strict limitation.

[0039] Figure 1 A flowchart of a tool invocation method provided for an embodiment of the present invention, such as... Figure 1 As shown, it may include the following steps:

[0040] 101. Identify the user's target intent based on the contextual information input by the user.

[0041] 102. In response to the need to determine the target intent, the large language model calls the target tool to obtain the calling protocol corresponding to the target tool.

[0042] 103. Generate target prompt words that match the calling protocol based on the parameter types corresponding to the tool calling parameters to be obtained contained in the calling protocol.

[0043] 104. Input the target prompt word into the large language model so that the large language model can obtain the tool call parameters from the context information based on the target prompt word, and call the target tool through the tool call parameters.

[0044] In this embodiment of the invention, the contextual information input by the user refers to at least one piece of conversation information generated during the user's conversation with the large language model. Optionally, the large language model can identify the user's target intent reflected in the contextual information input by the user, that is, the purpose of the user's conversation with the large language model, such as querying the weather in xx time and xx region as exemplified above.

[0045] In practical applications, users' target intentions can be diverse. Some target intentions can be directly achieved by the large language model through its own natural language processing capabilities, such as replacing word A with word B in a certain text. Other target intentions require the large language model to call tools, such as querying the weather in a certain place at a certain time, which requires the large language model to call a weather query tool.

[0046] In an optional embodiment, in order to ensure that the large language model accurately calls the tool when it is required, a mapping relationship between the application scenario and whether the target intent needs to be realized by the large language model through calling the tool can be established in advance. For example, in application scenario 1, application scenario 2 and application scenario 3, the target intent needs to be realized by the large language model through calling the tool; in application scenario 4 and application scenario 5, the target intent does not need to be realized by the large language model through calling the tool, etc.

[0047] In the specific implementation process, based on the first scenario information corresponding to the target intent, it can be determined whether the target intent needs to be implemented by the large language model through calling tools. For example: when the first scenario information matches any of the application scenarios 1, 2, or 3 mentioned above, it is determined that the target intent needs to be implemented by the large language model through calling tools; when the first scenario information matches any of the application scenarios 4 or 5 mentioned above, it is determined that the target intent does not need to be implemented by the large language model through calling tools.

[0048] Specifically, when the target intent requires the large language model to call a tool, the subsequent processing related to the tool call is executed, such as obtaining the information required to call the tool. This effectively avoids the situation where the large language model fails to perform the operation related to the tool call when it needs to call a tool.

[0049] In another optional embodiment, in order to further improve the accuracy of the large language model calling the tool, a mapping relationship between application scenarios and tools can be established in advance. For example, in a more granular sub-application scenario 1-1 of application scenario 1, the large language model calls tool 1; in a more granular sub-application scenario 1-2 of application scenario 1, the large language model calls tool 2; in a more granular sub-application scenario 1-3 of application scenario 1, the large language model calls tool 3, etc.

[0050] In the specific implementation process, in response to the determination that the large language model needs to call a tool to achieve the target intent, further, based on the second scenario information corresponding to the target intent, the target tool that the large language model needs to call is determined. For example: when the second scenario information matches sub-application scenario 1-1, the large language model needs to call tool 1 to achieve the target intent; when the second scenario information matches sub-application scenario 1-2, the large language model needs to call tool 2 to achieve the target intent, and so on.

[0051] Optionally, the first scene information and the second scene information mentioned above can be the same or different scene information.

[0052] The above explains the process of determining whether a large language model needs to call a tool based on the user's target intent, and which tool to call.

[0053] The following details the tool invocation process when the target intent requires a large language model to achieve it by calling the target tool.

[0054] In practice, in response to the need for the large language model to call the target tool to determine the target intent, the calling protocol corresponding to the target tool is first obtained. This calling protocol describes the tool calling parameters required to call the target tool, as well as the format information of the tool calling parameters (e.g., JSON format).

[0055] Next, the tool call parameters to be obtained are contained in the call protocol of the target tool from the context information input by the user. For example, the query time and query location are contained in the weather query tool.

[0056] As an optional method for obtaining tool call parameters, the tool call parameters to be obtained can be obtained from the context information by using a large language model based on prompt word engineering. In other words, prompt words guide the large language model to obtain the tool call parameters to be obtained from the context information.

[0057] In practical applications, large language models exhibit varying capabilities in extracting tool invocation parameters based on different prompt words. For instance, guided by a prompt word with feature 'a', a large language model can accurately extract tool invocation parameters corresponding to parameter type A from the user's input context; guided by a prompt word with feature 'b', it can accurately extract tool invocation parameters corresponding to parameter type B, and so on. Therefore, generating target prompt words that match the invocation protocol of the target tool is beneficial for accurately retrieving the tool invocation parameters contained in the target tool's invocation protocol from the context information.

[0058] In the specific process, in order to improve the accuracy of the large language model in obtaining the tool call parameters for calling the target tool from the context information of the user input, optionally, target prompt words that match the calling protocol can be generated according to the parameter type of the tool call parameters to be obtained contained in the calling protocol. That is, target prompt words that match the parameter type of the tool call parameters to be obtained contained in the calling protocol can be generated.

[0059] In practical applications, tool call parameters can optionally be categorized into two types: a first parameter type and a second parameter type. Tool call parameters of the first parameter type contain fields with required attributes, while those of the second parameter type do not contain fields with required attributes. It should be noted that whether the fields in each tool call parameter within the calling protocol have required attributes is predefined by the tool provider.

[0060] For fields with required attributes, users need to inform the large language model of the relevant information during interaction. For example, in a recharge scenario, the recharge amount parameter in the recharge tool's call protocol is a required field. If the amount field in the recharge amount parameter is required, the user needs to inform the large language model of the specific recharge amount, such as 30 yuan or 50 yuan, during interaction.

[0061] When a field does not have a required attribute, it means that the field can be filled with default content, and the user is not required to inform the language model of the relevant content during the interaction with the language model. For example, in a weather query scenario, assuming that the query time in the calling protocol of the weather query tool does not contain a field with a required attribute, if the user does not inform the language model of the relevant content during the interaction with the language model, the default content can be filled in, such as: the last 7 days, the last 15 days, etc.

[0062] It is understandable that the parameter types of the tool invocation parameters to be obtained in the invocation protocol corresponding to the target tool may only contain the first parameter type or the second parameter type, or they may contain both the first parameter type and the second parameter type. The following will combine... Figure 2 The specific process of generating target prompt words that match the calling protocol based on the parameter type corresponding to the tool calling parameters to be obtained contained in the calling protocol in this embodiment of the invention is described.

[0063] Figure 2 A flowchart of a target prompt word generation method provided in an embodiment of the present invention is shown below. Figure 2 As shown, it can include the following steps:

[0064] 201. Determine the first number of tool call parameters of the first type belonging to the first parameter type and the second number of tool call parameters of the second type belonging to the second parameter type among the tool call parameters to be obtained in the call protocol of the target tool.

[0065] 202. Based on the first quantity and the second quantity, generate target prompt words that match the calling protocol.

[0066] In this embodiment, the tool invocation parameters to be obtained contained in the target tool's invocation protocol are divided into two categories according to parameter type. For ease of description, tool invocation parameters belonging to the first parameter type are referred to as first-category tool invocation parameters; and tool invocation parameters belonging to the second parameter type are referred to as second-category tool invocation parameters.

[0067] In practice, target prompt words matching the calling protocol can be generated based on the first quantity of parameters corresponding to the first type of tool calls and the second quantity of parameters corresponding to the second type of tool calls. For example, when the first quantity is large, prompt words that can guide the large language model to accurately extract the parameters of the first type of tool calls can be generated; when the second quantity is large, prompt words that can guide the large language model to accurately extract the parameters of the second type of tool calls can be generated, and so on.

[0068] It is important to emphasize that the fact that a certain prompt word can better guide the large language model to extract the call parameters of a certain type of tool does not mean that the prompt word cannot guide the large language model to extract the call parameters of other types of tools. Rather, under the guidance of the prompt word, the large language model's ability to extract "the call parameters of a certain type of tool" is better than its ability to extract "the call parameters of other types of tools".

[0069] In an optional embodiment, generating target prompt words that match the calling protocol based on a first quantity and a second quantity includes:

[0070] In response to a ratio between the first and second quantities exceeding a set threshold, a first prompt word matching the invocation protocol is generated. This first prompt word contains the target function and its corresponding descriptive information. The descriptive information guides the large language model to obtain tool invocation parameters from the context information through the target function.

[0071] Optionally, the threshold can be customized, such as being set to 1.

[0072] In practical applications, the objective function provides a large language model with an additional capability to obtain tool call parameters. Based on the description information corresponding to the objective function, the large language model can understand the function's functionality and thus retrieve the tool call parameters from the context information when needed. After obtaining the tool call parameters, the target tool is invoked through function call or tool usage.

[0073] Understandably, the clearer and more explicit the contextual information regarding tool call parameters, the higher the accuracy of obtaining these parameters through the objective function. Since the first type of tool call parameters all contain fields with required attributes, the large language model guides users to input content related to these parameters during interaction. Therefore, the objective function often performs better in extracting these parameters from the contextual information. Consequently, when the ratio between the first and second quantities exceeds a set threshold, a first prompt word containing the objective function can be generated, achieving accurate extraction of the larger number of first-type tool call parameters.

[0074] For ease of understanding, Figure 3 This is a schematic diagram of a first prompt word provided in an embodiment of the present invention, such as... Figure 3 As shown, the first prompt word contains an instruction section, context information, and a function section. The instruction section includes system instructions to explicitly tell the large language model the task to be performed or the question to be answered; the context information is the conversation information from the multi-turn dialogue between the user and the large language model; the function section contains the target function that the large language model can use to extract tool call parameters, as well as the function's description information, such as the function name and function type. Based on the description information corresponding to the function in the prompt word, the large language model can understand the function's purpose and use it correctly.

[0075] In another optional embodiment, generating target prompt words matching the calling protocol based on the first quantity and the second quantity further includes:

[0076] In response to the ratio between the first quantity and the second quantity being less than or equal to a set threshold, a second prompt word matching the invocation protocol is generated; wherein, the second prompt word contains a parameter extraction instruction, which is used to guide the large language model to obtain tool invocation parameters from context information.

[0077] In practical applications, during the supervised fine-tuning (SFT) stage of a large language model, the model is usually trained using labeled data with given instructions. As a result, the large language model has good instruction following ability, that is, it can execute the given instructions well.

[0078] Based on the instruction-following capabilities of the large language model, this embodiment provides another prompt word that matches the calling protocol, namely the second prompt word. The second prompt word describes the task requiring the acquisition of tool call parameters in the form of instructions, thereby guiding the large language model to accurately obtain the tool call parameters from the context information.

[0079] For ease of understanding, Figure 4 This is a schematic diagram of a second prompt word provided in an embodiment of the present invention, such as... Figure 4 As shown, the second prompt contains an instruction section and contextual information. The instruction section includes system instructions to explicitly tell the large language model the task to be performed or the question to be answered. The system instructions include parameter extraction instructions to instruct the large language model to obtain tool call parameters from the contextual information. The contextual information refers to the conversation information between the user and the large language model during multiple rounds of dialogue.

[0080] It should be noted that, Figure 3 The first prompt word shown and Figure 4 The second prompt word shown is only an example. In actual applications, the prompt word may contain other information, such as large language model information, model parameters, etc.

[0081] In practical applications, the reasons why large language models may not be able to accurately call tools may also include: the tool call parameters obtained by the large language model from the context information are not output in a format that conforms to the calling protocol of the target tool. For example, the calling protocol describes that the tool call parameters should be in JSON format, but the tool call parameters output by the large language model are in natural language.

[0082] To improve the accuracy of the large language model invoking tools, optionally, the output format of the tool invoking parameters can be defined in the target function contained in the first prompt word; or, a parameter output instruction can be configured in the second prompt word, wherein the parameter output instruction is used to guide the large language model to output the obtained tool invoking parameters in the target format, which matches the format information contained in the invoking protocol.

[0083] The above embodiment provides a method for generating target prompt words that match the calling protocol based on the relationship between the first quantity and the second quantity. In practical applications, the generation method of target prompt words is not limited to this. For example, the extraction difficulty corresponding to the first type of tool calling parameters and the second type of tool calling parameters can also be combined to generate target prompt words that match the calling protocol. For example, if the extraction difficulty of the second type of tool calling parameters is greater than that of the first type of tool calling parameters, then when the second quantity corresponding to the second type of tool calling parameters is not zero, prompt words that can guide the large language model to accurately extract the second type of tool calling parameters are generated, such as the second prompt word mentioned above.

[0084] Finally, the target prompt word (i.e., the first prompt word or the second prompt word) is input into the large language model so that the large language model can obtain the tool call parameters from the context information based on the target prompt word and call the target tool through the tool call parameters.

[0085] In summary, in the tool invocation method provided by this embodiment of the invention, if the target intent expressed by the user through the input context information needs to be realized by the large language model through tool invocation, then the invocation protocol of the target tool corresponding to the target intent is obtained, and according to the parameter type corresponding to the tool invocation parameter to be obtained contained in the invocation protocol, a target prompt word matching the invocation protocol is generated, that is, a target prompt word matching the parameter type corresponding to the tool invocation parameter is generated. Thus, based on the target prompt word, the large language model can more accurately obtain the tool invocation parameter for invoking the target tool from the context information input by the user, and realize accurate invocation of the target tool.

[0086] The tool invocation apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will understand that these apparatuses can all be configured using commercially available hardware components through the steps taught in this solution.

[0087] Figure 5 This is a schematic diagram of the structure of a tool calling device provided in an embodiment of the present invention, as shown below. Figure 5 As shown, the device includes: a first processing module 11, a second processing module 12, and a calling module 13.

[0088] The first processing module 11 is used to identify the user's target intent based on the context information input by the user.

[0089] The second processing module 12 is used to, in response to the determination that the target intent requires the large language model to call the target tool, obtain the calling protocol corresponding to the target tool; and generate a target prompt word that matches the calling protocol according to the parameter type corresponding to the tool calling parameters to be obtained contained in the calling protocol.

[0090] The calling module 13 is used to input the target prompt word into the large language model, so that the large language model can obtain the tool calling parameters from the context information based on the target prompt word, and call the target tool through the tool calling parameters.

[0091] In an optional embodiment, the parameter type includes: a first parameter type and / or a second parameter type; wherein, the tool call parameters belonging to the first parameter type include fields with required attributes, and the tool call parameters belonging to the second parameter type do not include fields with required attributes.

[0092] In an optional embodiment, the second processing module 12 is specifically used to determine the first number of tool call parameters of the first type belonging to the first parameter type and the second number of tool call parameters of the second type belonging to the second parameter type among the tool call parameters to be obtained in the call protocol; and to generate a target prompt word that matches the call protocol based on the first number and the second number.

[0093] In an optional embodiment, the second processing module 12 is further configured to generate a first prompt word matching the calling protocol in response to the ratio between the first quantity and the second quantity being greater than a set threshold; wherein the first prompt word contains a target function and descriptive information corresponding to the target function, the descriptive information being used to guide the large language model to obtain the tool calling parameters from the context information through the target function.

[0094] In an optional embodiment, the second processing module 12 is further configured to generate a second prompt word matching the calling protocol in response to the ratio between the first quantity and the second quantity being less than or equal to a set threshold; wherein the second prompt word contains a parameter extraction instruction, the parameter extraction instruction being used to guide the large language model to obtain the tool calling parameters from the context information.

[0095] In an optional embodiment, the second prompt word further includes a parameter output instruction, which is used to instruct the large language model to output the obtained tool call parameters in a target format that matches the format information contained in the call protocol.

[0096] In an optional embodiment, the first processing module 11 is further configured to determine whether the target intent needs to be implemented by the large language model through calling a tool based on the first scene information corresponding to the target intent; in response to determining that the target intent needs to be implemented by the large language model through calling a tool, the first processing module 11 is configured to determine the target tool that the large language model needs to call based on the second scene information corresponding to the target intent.

[0097] Figure 5 The device shown can perform the steps described in the foregoing embodiments. For detailed execution process and technical effects, please refer to the description in the foregoing embodiments, which will not be repeated here.

[0098] In one possible design, the above Figure 5 The structure of the tool calling device shown can be implemented as an electronic device, such as... Figure 6 As shown, the electronic device may include: a memory 21, a processor 22, and a communication interface 23. The memory 21 stores executable code, which, when executed by the processor 22, enables the processor 22 to at least implement the tool invocation methods provided in the foregoing embodiments.

[0099] In addition, embodiments of the present invention provide a non-transitory machine-readable storage medium storing executable code, which, when executed by a processor of an electronic device, enables the processor to at least implement the tool invocation method provided in the foregoing embodiments.

[0100] This invention provides a computer program product, including: a computer program that, when executed by a processor of an electronic device, causes the processor to execute the tool invocation method provided in the foregoing embodiments.

[0101] The device embodiments described above are merely illustrative, and the units described as separate components may or may not be physically separate. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0102] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a computer product. The present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0103] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for invoking a tool, characterized in that, include: Based on the contextual information input by the user, the user's target intent is identified, wherein the contextual information is at least one piece of conversation information generated by the user's conversation with the large language model; In response to the requirement that determining the target intent requires the large language model to invoke the target tool, the invocation protocol corresponding to the target tool is obtained; The method determines a first number of tool call parameters of a first type belonging to a first parameter type and a second number of tool call parameters of a second parameter type from the tool call parameters to be obtained in the calling protocol; wherein the tool call parameters of the first parameter type include fields with required attributes, and the tool call parameters of the second parameter type do not include fields with required attributes; if the ratio between the first number and the second number is greater than a set threshold, a first prompt word containing a target function and descriptive information corresponding to the target function is generated; the descriptive information is used to guide the large language model to obtain the tool call parameters from the context information through the target function; If the ratio between the first quantity and the second quantity is less than or equal to a set threshold, a second prompt word containing parameter extraction instructions is generated; the parameter extraction instructions are used to describe the task of obtaining tool call parameters in the form of instructions based on the instruction compliance capability formed by the supervised fine-tuning of the SFT stage of the large language model, so as to guide the large language model to obtain the tool call parameters from the context information. The target prompt word is input into the large language model so that the large language model can obtain the tool call parameters from the context information based on the target prompt word, and call the target tool through the tool call parameters; the target prompt word is either the first prompt word or the second prompt word.

2. The method according to claim 1, characterized in that, The second prompt also includes a parameter output instruction, which is used to guide the large language model to output the obtained tool call parameters in a target format that matches the format information contained in the call protocol.

3. The method according to any one of claims 1 to 2, characterized in that, The method further includes: Based on the first scene information corresponding to the target intent, determine whether the target intent needs to be implemented by calling a tool through a large language model; In response to determining that the large language model needs to invoke a tool to achieve the target intent, the target tool that the large language model needs to invoke is determined based on the second scenario information corresponding to the target intent.

4. A tool calling device, characterized in that, include: The first processing module is used to identify the user's target intent based on the context information input by the user; The context information is at least one piece of conversation information generated by the user's conversation with the large language model; The second processing module is used to obtain the calling protocol corresponding to the target tool in response to the determination that the target intent requires the large language model to call the target tool. The method determines a first quantity of tool call parameters of a first type belonging to a first parameter type and a second quantity of tool call parameters of a second type belonging to a second parameter type, among the tool call parameters to be acquired in the calling protocol. The tool call parameters of the first parameter type include fields with required attributes, while the tool call parameters of the second parameter type do not include fields with required attributes. If the ratio between the first quantity and the second quantity is greater than a set threshold, a first prompt word containing a target function and its corresponding descriptive information is generated. The descriptive information guides the large language model to acquire the tool call parameters from the context information through the target function. If the ratio between the first quantity and the second quantity is less than or equal to a set threshold, a second prompt word containing a parameter extraction instruction is generated. The parameter extraction instruction, based on the instruction compliance capability formed during the supervised fine-tuning SFT stage training of the large language model, describes the task of acquiring tool call parameters in instruction form to guide the large language model to acquire the tool call parameters from the context information. The invocation module is used to input the target prompt word into the large language model, so that the large language model can obtain the tool invocation parameters from the context information based on the target prompt word, and invoke the target tool through the tool invocation parameters; the target prompt word is the first prompt word or the second prompt word.

5. An electronic device, characterized in that, include: The device includes a memory, a processor, and a communication interface; wherein the memory stores executable code, and when the executable code is executed by the processor, the processor performs the tool invocation method as described in any one of claims 1 to 3.

6. A non-transitory machine-readable storage medium, characterized in that, The non-transitory machine-readable storage medium stores executable code that, when executed by a processor of an electronic device, causes the processor to perform the tool invocation method as described in any one of claims 1 to 3.