A reply content confirmation method, device, storage medium and program product

By constructing content rewriting models and functional interface selection models, the accuracy problem of referential resolution in complex task-oriented dialogue systems by large language models was solved, achieving accurate mapping between user input and structured information, and improving the accuracy and efficiency of response content.

CN122240818A · Pending · Publication Date: 2026-06-19 · ZHEJIANG DAHUA TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZHEJIANG DAHUA TECH CO LTD
Filing Date: 2026-05-15
Publication Date: 2026-06-19


IPC: G06F16/334; G06F16/3329; G06F16/31; G06F40/30; G06F18/22; G06N3/0455; G06N3/096; G06N5/04

AI Tagging

Application Domain

Semantic analysis; Biological models


AI Technical Summary

Technical Problem

In complex task-oriented dialogue systems, a user's reference may no longer point to a text entity in the dialogue history but to internal structured information. Large language models struggle to map such references to that structured information, resulting in inaccurate responses.

Method used

By constructing a content rewriting model, a function interface selection model, a parameter extraction model, and a pronoun recognition model, and using prompt words to instruct these models to perform semantic supplementation, function interface selection, and parameter extraction tasks, the mapping between user input and backend structured information is realized.

Benefits of technology

It improves the accuracy and efficiency of the dialogue system's responses in complex task-oriented dialogues, ensuring that user needs are accurately identified and met.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of artificial intelligence technology, and in particular to a method, device, storage medium, and program product for confirming response content. The method includes: determining at least one pronoun contained in the current input content, and the user's corresponding need; selecting a target functional interface from a plurality of pre-built functional interfaces whose name description information matches the user's need; determining a target pronoun from the at least one pronoun that matches the input parameter description information of the target functional interface; using the target pronoun as an input parameter of the target functional interface, and calling the target functional interface to obtain a result matching the functional description information of the target functional interface; and using the result as the response content corresponding to the current input content. Through the above method, pronouns can be mapped to internal structured information, thereby providing the user with the desired response content.


Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, device, storage medium, and program product for confirming response content.

Background Technology

[0002] Significant breakthroughs have been achieved in natural language processing techniques based on large language models. When dealing with open-ended, generative, multi-turn text dialogue tasks, large language models demonstrate superior contextual understanding capabilities, generating coherent responses closely related to the dialogue history. Currently, common methods for processing user questions involve using large language models to perform multiple tasks based on historical dialogues and the current user question, including referential resolution, user intent recognition, and response generation.

[0003] However, when large language models are applied to more complex task-oriented dialogue systems, for referential resolution, the user's referent is no longer limited to the text entities that appear in the history of the dialogue. In more cases, it may point to the internal structured information corresponding to a specific card in the previous round of response, such as the device serial number, rack location code, port bandwidth configuration, etc. stored in the background in the data center equipment management scenario.

[0004] If the pronouns in a user's question refer to content that is not presented directly in the context as natural language, but is stored in the background as standard structured information, the large language model will have difficulty outputting the response content that the user needs.

Summary of the Invention

[0005] This application provides a method, device, storage medium, and program product for confirming response content, which can map user-inputted text information to internal structured information stored in the background.

[0006] In a first aspect, embodiments of this application provide a method for confirming response content, the method comprising:
determining at least one pronoun contained in the current input content, and the user requirement corresponding to the current input content;
selecting, from a plurality of pre-built functional interfaces, the target functional interface whose name description information matches the user requirement;
determining, from the at least one pronoun, the target pronoun that matches the input parameter description information of the target functional interface;
using the target pronoun as the input parameter of the target functional interface, and calling the target functional interface to obtain a result that matches the function description information of the target functional interface; and
using the result as the response content corresponding to the current input content.

[0007] In the above solution, by calling the functional interface, the dialogue system can map the user's pronouns to internal structured information, rather than simply dissolving the pronouns at the "plain text" level, thereby providing the user with the response content they want.
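The five steps of the first aspect can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the implementation described in the application: the `FunctionInterface` dataclass, the `confirm_reply` function, the substring matching, and the in-memory `RACKS` table are hypothetical stand-ins for the models and backend data sources.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FunctionInterface:
    name_description: str        # what the interface is for
    param_description: str       # which kind of input it expects
    call: Callable[[str], str]   # backend lookup (hypothetical)


# Hypothetical structured backend data: device serial -> rack location code.
RACKS: Dict[str, str] = {"SN-001": "R12-U07"}

INTERFACES: List[FunctionInterface] = [
    FunctionInterface(
        name_description="query rack location",
        param_description="device serial",
        call=lambda serial: RACKS.get(serial, "unknown"),
    ),
]


def confirm_reply(user_need: str, referents: Dict[str, str]) -> Optional[str]:
    """referents maps a semantic label (e.g. 'device serial') to its value."""
    # Step 2: select the interface whose name description matches the need.
    # Substring matching is an assumption; the text uses a selection model.
    target = next((f for f in INTERFACES if f.name_description in user_need), None)
    if target is None:
        return None
    # Step 3: pick the referent matching the input parameter description.
    value = referents.get(target.param_description)
    if value is None:
        return None
    # Steps 4 and 5: call the interface and use the result as the reply.
    return target.call(value)


reply = confirm_reply(
    "query rack location of that device", {"device serial": "SN-001"}
)
```

Here the referent "that device" resolves to a backend serial number rather than to a phrase in the dialogue text, which is the mapping the solution is aiming for.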

[0008] In one possible implementation, the current input content is obtained in the following way: The historical dialogues of a preset number of rounds and the latest input content are input into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content. The historical dialogues include historical input content and historical response content. The semantically supplemented input content is used as the current input content.

[0009] In the above scheme, the user input is first semantically supplemented so that the subsequent user demand identification process can obtain more accurate results.

[0010] In one possible implementation, the step of inputting a preset number of rounds of historical dialogue and the latest input content into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content includes: The semantic recognition layer of the content rewriting model determines the semantic category of each keyword contained in the historical dialogue of the preset number of rounds. The decision layer of the content rewriting model determines target keywords from the plurality of keywords that have a different semantic category from the at least one pronoun. The semantic supplementation layer of the content rewriting model performs semantic supplementation on the latest input content based on the target keywords.

[0011] In the above scheme, the content rewriting subtask in the dialogue understanding task is executed by the content rewriting model, which can ensure the accuracy and efficiency of the content rewriting subtask. Moreover, the content rewriting subtask is executed separately by the content rewriting model, which can more accurately control and optimize the content rewriting subtask.

[0012] In one possible implementation, the step of inputting a preset number of rounds of historical dialogue and the latest input content into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content includes: Construct content rewriting prompts using the preset number of rounds of historical dialogue and the latest input content; The content rewriting prompts are input into the content rewriting model so that the content rewriting model can semantically supplement the latest input content according to the content rewriting prompts. The content rewriting prompts also include the tasks that the content rewriting model needs to complete, the semantic supplementation rules used to instruct the content rewriting model to perform semantic supplementation, and the formats of the input and output information of the content rewriting model.

[0013] In the above scheme, using prompt words to instruct the content rewriting model to perform content rewriting subtasks can make the results more accurate.

[0014] In one possible implementation, selecting the target functional interface whose name description information matches the user's needs from a pre-built plurality of functional interfaces includes: Construct function interface selection prompts using the user requirements and the name description information of each function interface; The function interface selection prompts are input into a pre-built function interface selection model so that the function interface selection model determines the target function interface according to the function interface selection prompts. The function interface selection prompt also includes the task that the function interface selection model needs to complete, the matching rules for the function interface selection model to match the function interface, and the output requirements for the output information format of the function interface selection model. The matching rules are to match the user requirements and the name description information of each function interface, and use the matching results to determine the target function interface. When multiple target function interfaces are matched, the matched function interfaces are output in a preset order. When no target function interface is matched, the information indicating no matching result is output.
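A function interface selection prompt of this shape might be assembled as follows. This is a sketch only: the section labels, the `NO_MATCH` marker, the comma-separated output format, and the interface identifiers are assumptions made for illustration, and no model is actually called.

```python
from typing import Dict, List

NO_MATCH = "NO_MATCH"  # hypothetical marker for "no matching result"


def build_selection_prompt(user_need: str, names: Dict[str, str]) -> str:
    """Combine the task, matching rules, output format, and candidate list."""
    listing = "\n".join(f"- {iface}: {desc}" for iface, desc in names.items())
    return (
        "#Task\nSelect the function interface(s) that satisfy the user requirement.\n"
        "#Matching rules\nCompare the requirement with each name description. "
        "List multiple matches in the order given below. "
        f"If nothing matches, output {NO_MATCH}.\n"
        "#Output format\nA comma-separated list of interface identifiers.\n"
        f"#Interfaces\n{listing}\n"
        f"#User requirement\n{user_need}\n"
    )


def parse_selection(model_output: str, names: Dict[str, str]) -> List[str]:
    """Keep only known identifiers and return them in the preset (dict) order."""
    if model_output.strip() == NO_MATCH:
        return []
    picked = {part.strip() for part in model_output.split(",")}
    return [iface for iface in names if iface in picked]


names = {"get_rack": "query rack location", "get_bandwidth": "query port bandwidth"}
prompt = build_selection_prompt("where is that device racked?", names)
```

Re-ordering the parsed identifiers against the candidate list is one way to enforce the "preset order" rule even when the model lists matches in a different order.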

[0015] The above scheme ensures the accuracy and efficiency of the function interface selection subtask in the dialogue understanding task by having the function interface selection model execute it. Moreover, executing the function interface selection subtask independently through the function interface selection model allows for more precise control and optimization of the subtask. Furthermore, using prompts to instruct the function interface selection model to execute the subtask makes the results more accurate.

[0016] In one possible implementation, the method further includes: Construct a reference index prompt using the reference index description information of each reference noun and the target functional interface; The reference index prompt is input into a pre-built reference noun recognition model so that the reference noun recognition model determines the input parameters of the target function interface according to the reference index prompt; The reference index prompt also includes the tasks that the reference noun recognition model needs to complete, the recognition rules for instructing the reference noun recognition model to recognize each reference noun, and the output requirements for instructing the reference noun recognition model to output information in a specific format.

[0017] In the above scheme, the reference resolution subtask in the dialogue understanding task is executed by the reference noun recognition model, which can ensure the accuracy and efficiency of the reference resolution subtask. Moreover, the reference resolution subtask is executed separately by the reference noun recognition model, which can more accurately control and optimize the reference resolution subtask. In addition, the reference noun recognition model is instructed to execute the reference resolution subtask by using prompts, which can make the results more accurate.

[0018] In one possible implementation, determining the target pronoun from the at least one pronoun that matches the input parameter description information of the target functional interface includes: Construct parameter extraction prompts using the description information of each pronoun and the input parameters of the target function interface; The parameter extraction prompts are input into a pre-built parameter extraction model so that the parameter extraction model selects the target pronoun according to the parameter extraction prompts. The parameter extraction prompts also include the tasks that the parameter extraction model needs to complete, the selection rules for the parameter extraction model to select pronouns, and the output requirements for the output information format of the parameter extraction model. The selection rules are to compare the parameter names contained in the input parameter description information with the semantics of each pronoun, and determine the target pronoun according to the comparison results.
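The selection rule (compare the parameter name with the semantics of each pronoun, then pick the best match) can be illustrated with a toy comparison. Word overlap is a crude, hypothetical stand-in for the semantic comparison that the parameter extraction model performs; the function name and the descriptions are assumptions.

```python
from typing import Dict, Optional


def select_target_referent(param_name: str, referents: Dict[str, str]) -> Optional[str]:
    """Return the referent whose description best matches the parameter name.

    referents maps each referent's text to a short semantic description.
    Returns None when no description shares a word with the parameter name.
    """
    wanted = set(param_name.lower().split())
    best, best_score = None, 0
    for text, description in referents.items():
        score = len(wanted & set(description.lower().split()))
        if score > best_score:
            best, best_score = text, score
    return best


referents = {"location A": "place name", "cameras": "device type"}
target = select_target_referent("device type", referents)
```

In this example `target` is `"cameras"`, because its description is the only one that overlaps with the parameter name.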

[0019] The above scheme, by executing the parameter extraction subtask in the dialogue understanding task through the parameter extraction model, can ensure the accuracy and efficiency of the parameter extraction subtask. Moreover, by executing the parameter extraction subtask independently through the parameter extraction model, the parameter extraction subtask can be controlled and optimized more precisely. In addition, by instructing the parameter extraction model to execute the parameter extraction subtask through prompt words, the results obtained can be more accurate.

[0020] In one possible implementation, the step of invoking the target function interface to obtain a result that matches the function description information of the target function interface includes: Query the target data related to the input parameters from the target data source connected to the target function interface; The target data is processed according to the functional description information of the target function interface to obtain the result.

[0021] In one possible implementation, the step of invoking the target function interface to obtain a result that matches the function description information of the target function interface includes: Query the target field from the target data source connected to the target function interface that matches the parameter category to which the input parameter belongs; Target data that meets the filtering conditions of the input parameters is filtered from the data corresponding to the field, and the filtering conditions are obtained from the parameter description information of the target function interface; Determine the data processing logic contained in the functional description information of the target functional interface, wherein the processing logic includes at least one of data verification, data format conversion, and data calculation. The target data is processed according to the processing logic to obtain the result.
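The query, filter, and process sequence described above might look like the following. This is a sketch under assumptions: the `DATA_SOURCE` layout, the `invoke` signature, and the record fields are hypothetical, and the processing logic shown (a sanity check, a count, and string formatting) is only one instance of verification, calculation, and format conversion.

```python
from typing import Any, Dict, List

# Hypothetical data source: one list of records per field (parameter category).
DATA_SOURCE: Dict[str, List[Dict[str, Any]]] = {
    "camera": [
        {"site": "A", "distance_m": 120},
        {"site": "A", "distance_m": 450},
        {"site": "B", "distance_m": 80},
    ],
}


def invoke(category: str, site: str, max_distance_m: int) -> str:
    # 1. Query the target field that matches the parameter category.
    records = DATA_SOURCE.get(category, [])
    # 2. Filter the data with the conditions attached to the input parameters.
    hits = [
        r for r in records
        if r["site"] == site and r["distance_m"] <= max_distance_m
    ]
    # 3a. Data verification: reject physically impossible distances.
    for r in hits:
        if r["distance_m"] < 0:
            raise ValueError(f"invalid distance in record: {r}")
    # 3b. Data calculation and format conversion into a natural-language result.
    count = len(hits)
    return f"{count} {category}(s) within {max_distance_m} m of site {site}"


result = invoke("camera", "A", 300)
```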

[0022] In the above solution, by calling the target function interface, the required data can be retrieved and processed accordingly, thereby providing the user with the data they want; moreover, the query can also be performed by inputting parameter categories and filtering conditions, which can improve the accuracy of the query.

[0023] In one possible implementation, the method further includes: if negative feedback on the response content is received, obtaining the output content of each model from the pre-built dialogue management module, where the output content of each model corresponds to the latest round of input content; and displaying the output content of each model through the interface, so that the user can re-invoke the content rewriting model, and/or the function interface selection model, and/or the parameter extraction model to obtain updated output content.

[0024] In the above scheme, different models are used to perform different tasks. When the response content does not match the user's needs, the output content of each model can be retrieved from the dialogue management module so that the user can trace the error and re-execute the corresponding model.
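One way to picture this traceability is a dialogue manager that records each model's output for the latest round and can re-execute a chosen stage. The sketch below is a hypothetical design, not the module described in the application: the class name, the stage names, and the choice to re-run the downstream stages as well are all assumptions, and the lambdas stand in for real models.

```python
from typing import Callable, Dict


class DialogueManager:
    """Keeps each model's output for the latest round (hypothetical design)."""

    def __init__(self, stages: Dict[str, Callable[[str], str]]):
        self.stages = stages               # stage name -> model, in pipeline order
        self.outputs: Dict[str, str] = {}  # stage name -> latest output

    def run(self, text: str) -> str:
        for name, model in self.stages.items():
            text = model(text)
            self.outputs[name] = text
        return text

    def rerun_from(self, stage: str, new_input: str) -> str:
        """Re-execute `stage` and every later stage with a corrected input."""
        names = list(self.stages)
        text = new_input
        for name in names[names.index(stage):]:
            text = self.stages[name](text)
            self.outputs[name] = text
        return text


# Stand-in "models" for illustration only.
stages = {
    "rewrite": lambda t: t + " near B",
    "select": lambda t: "get_cameras(" + t + ")",
}
dm = DialogueManager(stages)
first = dm.run("cameras")
fixed = dm.rerun_from("select", "cameras near C")
```

After negative feedback, `dm.outputs` is what would be shown to the user so that the faulty stage can be identified before it is re-invoked.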

[0025] In a second aspect, embodiments of this application provide a response content confirmation device, the device comprising: a determination module, used to determine at least one pronoun contained in the current input content, and the user requirement corresponding to the current input content; a function interface matching module, used to select, from a plurality of pre-built function interfaces, a target function interface whose name description information matches the user requirement; an input parameter determination module, used to determine, from the at least one pronoun, a target pronoun that matches the input parameter description information of the target function interface; a calling module, used to take the target pronoun as the input parameter of the target function interface, and call the target function interface to obtain a result that matches the function description information of the target function interface; and a response module, used to use the result as the response content corresponding to the current input content.

[0026] In a third aspect, embodiments of this application provide an electronic device, the electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the first aspect described above.

[0027] In a fourth aspect, embodiments of this application provide a computer storage medium storing a computer program for causing a computer to perform the method described in the first aspect above.

[0028] In a fifth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method described in the first aspect above.

Attached Figure Description

[0029] Figure 1 is a schematic diagram illustrating an application scenario of a response content confirmation method provided in an embodiment of this application; Figure 2 is a schematic flowchart of a response content confirmation method provided in an embodiment of this application; Figure 3 shows another response content confirmation method provided in an embodiment of this application; Figure 4 is a schematic diagram of a response content confirmation device provided in an embodiment of this application; Figure 5 is a schematic diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0030] The principles and spirit of this application will now be described with reference to several exemplary embodiments. It should be understood that these embodiments are provided merely to enable those skilled in the art to better understand and implement this application, and are not intended to limit the scope of this application in any way. Rather, these embodiments are provided to make this disclosure more thorough and complete, and to fully convey the scope of this disclosure to those skilled in the art.

[0031] Those skilled in the art will recognize that embodiments of this application can be implemented as a system, apparatus, method, or computer program product. Therefore, this disclosure can be specifically implemented in the following forms: entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

[0032] In this article, it is important to understand that any number of elements in the accompanying figures is for illustrative purposes and not for limitation, and any naming is for distinction only and has no limiting meaning.

[0033] The following describes some of the concepts involved in the embodiments of this application.

[0034] Large Language Model: a model with the Transformer as its core architecture, pre-trained on general text comprising trillions of tokens. Through prompts, it can adapt to diverse natural language processing tasks such as translation, writing, and reasoning, without being retrained for each individual task.

[0035] Functional interfaces, also known as tools or functional modules, are tools with specific functions that are actively invoked to achieve a goal. They are key means for large language models to perceive and influence the external world, or to expand their capability boundaries. Essentially, they are bridge components that allow large language models to break through the limitations of "plain text generation," calling external systems / services to obtain real-time information or internal structured information, performing specific operations, and completing complex tasks. Functional interfaces correspond to descriptive information, which includes: name description information, input parameter description information, and function description information. This application's embodiments do not specifically limit the content of the descriptive information.

[0036] Reference resolution is a core task in natural language processing. Its goal is to identify the correspondence between pronouns, referential nouns and the entities or concepts they refer to in a text, solve the problem of "ambiguity of reference" in language expression, and enable the model to accurately understand the logic of the text.

[0037] Dialogue systems, also known as conversational systems or chatbots, are artificial intelligence systems capable of engaging in multi-turn, coherent, goal-oriented, or casual interactions with humans using natural language. They integrate multiple technologies such as natural language processing, large language models, speech recognition / synthesis, knowledge graphs, and semantic dereference. Their core objective is to simulate human conversational logic, understand user intent, and provide accurate or natural responses, which can be displayed to the user through a user interface.

[0038] Dialogue systems include casual conversation systems, task-oriented dialogue systems, question-and-answer dialogue systems, and hybrid dialogue systems. This application applies primarily to task-oriented dialogue systems, but can be extended to other types of dialogue systems; no specific limitations are made here. The dialogue system provided in this application is built from models that perform different tasks, such as content rewriting models, function interface selection models, and parameter extraction models. The number of training parameters varies between models.

[0039] Model Context Protocol (MCP) is a unified, secure, and scalable bidirectional communication layer that enables large language models to communicate with external tools, data, and services. It can be understood as the large language model or dialogue system communicating with functional interfaces through the MCP.

[0040] The following description, in conjunction with the accompanying drawings, illustrates a method for confirming the content of a response in an embodiment of this application.

[0041] Figure 1 shows an application scenario for a response content confirmation method provided in an embodiment of this application. The application scenario includes a dialogue system 101, at least one functional interface (Figure 1 shows functional interfaces 102_1, 102_2, ..., 102_N), and at least one data source (Figure 1 shows data source 1, data source 2, ..., data source n).

[0042] The dialogue system 101 described above deploys a large language model, which can interact with the user through the display interface 103. The dialogue system 101 can access different data sources through different functional interfaces. These functional interfaces are key means for the large language model to perceive and influence the external world, or to expand its capability boundaries. The large language model can communicate with these functional interfaces through network communication protocols. The functional interfaces establish connections with data sources. For example, as shown in Figure 1, functional interface 102_1 establishes a connection with data source 1, functional interfaces 102_2 and 102_N both establish connections with data source 2, and functional interface 102_N also establishes a connection with data source n. A single functional interface can connect to a single data source or to multiple data sources. The data sources can belong to systems external to the dialogue system or be local to the dialogue system itself; the embodiments of this application do not limit the specific data sources.

[0043] The aforementioned dialogue system 101 can be a casual conversation dialogue system, a task-oriented dialogue system, a question-and-answer dialogue system, or a hybrid dialogue system. In the task-oriented dialogue system scenario, for the resolution of pronouns, the user's pronouns are no longer limited to the text entities that appear in the past dialogue. In more cases, they may point to the internal structured information corresponding to a specific card in the previous round of replies. However, the large language model of the dialogue system cannot match the pronouns with the internal structured information, and thus the dialogue system cannot provide the reply content that the user needs.

[0044] To address the aforementioned issues, this application provides a method for confirming response content, applied to a dialogue system. As shown in Figure 2, the method includes:

S201: Determine at least one pronoun contained in the current input content, and the user requirement corresponding to the current input content.

[0045] For multi-turn text dialogue tasks, users will enter questions on the display interface provided by the dialogue system, such as searching for cameras near location A. The questions entered by users may include at least one pronoun. For example, the input "search for cameras near location A" contains the pronouns "location A" and "cameras".

[0046] In multi-turn dialogue scenarios, user input may be brief. For example, if a user queries location B, the dialogue system may not be able to accurately identify the user's needs if the original input is not semantically supplemented. Based on this, this application proposes a method for semantic supplementation.

[0047] In one possible implementation, semantic supplementation of user input is performed through the following method: The historical dialogues of a preset number of rounds and the latest input content are input into a pre-built content rewriting model to obtain semantically supplemented input content, which is then used as the current input content.

[0048] Specifically: The following operations are performed through the content rewrite model: The semantic recognition layer of the content rewriting model determines the semantic category of each keyword contained in the historical dialogue of the preset number of rounds. The decision layer of the content rewriting model determines target keywords from the plurality of keywords that have a different semantic category from the at least one pronoun. The semantic supplementation layer of the content rewriting model performs semantic supplementation on the latest input content based on the target keywords.

[0049] The aforementioned historical dialogues include historical input content and historical response content. The historical dialogues cover a preset number of rounds, for example a maximum of 5 rounds by default. It should be noted that the dialogue system can cache the keywords obtained by parsing historical dialogues, so that they can be retrieved directly if needed in the next round.
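A bounded history with a keyword cache can be sketched as follows. The 5-round default comes from the text; the `History` class, its method names, and the use of a caller-supplied `extract` function are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

MAX_ROUNDS = 5  # default preset number of rounds mentioned in the text

Round = Tuple[str, str]  # (historical input content, historical response content)


class History:
    def __init__(self, max_rounds: int = MAX_ROUNDS):
        self.rounds: Deque[Round] = deque(maxlen=max_rounds)
        self.keyword_cache: Dict[Round, List[str]] = {}

    def add(self, user_input: str, reply: str) -> None:
        if len(self.rounds) == self.rounds.maxlen:
            # Drop cached keywords for the round that is about to be evicted.
            self.keyword_cache.pop(self.rounds[0], None)
        self.rounds.append((user_input, reply))

    def keywords(self, extract: Callable[[str], List[str]]) -> List[str]:
        """Return keywords for all retained rounds, parsing each round once."""
        out: List[str] = []
        for rnd in self.rounds:
            if rnd not in self.keyword_cache:
                self.keyword_cache[rnd] = extract(" ".join(rnd))
            out.extend(self.keyword_cache[rnd])
        return out


h = History()
for i in range(7):
    h.add(f"q{i}", f"a{i}")
kws = h.keywords(str.split)
```

After seven rounds only the last five remain, so the oldest retained round is `("q2", "a2")`.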

[0050] The aforementioned semantic categories can be parts of speech, such as nouns, verbs, and adjectives, or preset categories defined as needed, such as categories representing numerical ranges or device types. This application embodiment does not impose specific limitations. The semantic categories of keywords in historical dialogues can be identified by the semantic recognition layer of the content rewriting model. This semantic recognition layer can be trained using sample keywords and the semantic categories to which the sample keywords belong.

[0051] For example, the historical input in the dialogue is "Query cameras near location A," and the historical response is "10 cameras within a 300-meter radius of location A have been found." The keywords in the historical dialogue are "query," "location A," "nearby," "cameras," "300 meters," and "10." The latest input is "What about location B?" Among the keywords, "location A" has the same semantic category as "location B" in the latest input, so it is not a target keyword. The remaining keywords, "query," "nearby," "cameras," "300 meters," and "10," belong to different semantic categories and are used to semantically supplement the latest input.
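The decision step in this example can be written out directly. In the sketch below the semantic categories are hand-labelled and the category names are assumptions; in the text they are produced by the trained semantic recognition layer.

```python
from typing import Dict, List

# Keywords from the worked example, with hypothetical category labels.
HISTORY_KEYWORDS: Dict[str, str] = {
    "query": "action",
    "location A": "place",
    "nearby": "relation",
    "cameras": "device type",
    "300 meters": "range",
    "10": "count",
}


def pick_target_keywords(
    history: Dict[str, str], latest_categories: List[str]
) -> List[str]:
    """Keep keywords whose category differs from every referent category
    found in the latest input."""
    return [kw for kw, cat in history.items() if cat not in latest_categories]


# "location B" in the latest input is a place, so "location A" is excluded.
targets = pick_target_keywords(HISTORY_KEYWORDS, ["place"])
```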

[0052] It should be noted that, in order not to change the semantics of the latest input content, this application embodiment does not rewrite the original input content, but rather supplements the semantics. That is, while retaining the original content, it adds verbs, function words, adjectives, etc., to make the semantics of the input content clearer, thereby ensuring the accurate identification of user needs.

[0053] The above steps of selecting target keywords for semantic supplementation from multiple keywords are performed through the decision layer of the content rewriting model; the above steps of semantic supplementation of the latest input content are performed through the semantic supplementation layer of the content rewriting model.

[0054] In one possible implementation, to ensure the accuracy of the content rewriting model in semantically supplementing the input content, content rewriting prompts are further constructed using historical dialogues and the latest input content. These prompts instruct the content rewriting model to semantically supplement the latest input content.

[0055] Specifically, the above-mentioned content rewriting prompts can be implemented as follows: #Task Based on the user's current input and historical dialogue content, the system identifies contextual information, semantically reorganizes the current question, and generates a complete and clear new query.

[0056] #Recombination rules (also known as semantic supplementation rules)

[0057] 1. Identify key parameters (such as distance range, device type, etc.) contained in the system responses in historical dialogues. 2. When the current input contains omissions or references, automatically complete the valid parameters from the previous dialogue; 3. Preserve the user's original semantics and only perform necessary context completion; 4. The output format is a complete natural language question.

[0058] #Example

[0059] Example 1: Historical Dialogue: [ { "user": "Query the cameras in Square A.", "assistant": "Eight security cameras within a 300-meter radius of Square A have been located." } ] Current input: What about the other side of Road B? Reconstructed output: View the monitoring points within 300 meters of Road B.

[0060] Example 2: Historical Dialogue: [ { "user": "Display pedestrian flow statistics for Road A over the past hour.", "assistant": "A heat map of pedestrian flow along Road A over the past hour has been generated." } ] Current input: What about switching to Road C? Reconstructed output: Display pedestrian flow statistics for Road C over the past hour.

[0061] Example 3: Historical Dialogue: [] Current input: Query the cameras in lane A.

[0062] Reconstructed output: Query the cameras in lane A.

[0063] #Current Task

[0064] Historical Dialogue: {} Latest input content: {}

In the above prompt, the "{}" after "Historical Dialogue" under "#Current Task" is a placeholder to be replaced with the obtained historical dialogue. The "{}" after "Latest input content" is also a placeholder, to be replaced with the latest input content. The content rewriting prompt with its placeholders replaced is then input into the content rewriting model, so that the model performs semantic supplementation according to the instructions in the prompt.

[0065] For example, suppose the obtained historical dialogue is "Query cameras near location A" and the latest input is "Query cameras near location B." Replacing the "{}" after "Historical Dialogue" with "Query cameras near location A" and the "{}" after "Latest input content" with "Query cameras near location B" yields the content rewriting prompt for the latest input.
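The placeholder replacement can be sketched as follows. The template is abbreviated to the task statement and the "#Current Task" part, and the helper name `build_rewrite_prompt` is hypothetical; serializing the historical dialogue with `json.dumps` is an assumption, chosen to match the bracketed form used in the prompt examples.

```python
import json
from typing import Dict, List

# Abbreviated template: the rules and examples of the full prompt are omitted here.
REWRITE_TEMPLATE = (
    "#Task\n"
    "Based on the user's current input and historical dialogue content, identify "
    "contextual information, semantically reorganize the current question, and "
    "generate a complete and clear new query.\n"
    "#Current Task\n"
    "Historical Dialogue: {}\n"
    "Latest input content: {}\n"
)


def build_rewrite_prompt(history: List[Dict[str, str]], latest_input: str) -> str:
    """Fill the two "{}" placeholders with the historical dialogue and the latest input."""
    history_text = json.dumps(history, ensure_ascii=False)
    return REWRITE_TEMPLATE.format(history_text, latest_input)


prompt = build_rewrite_prompt(
    [
        {
            "user": "Query cameras near location A",
            "assistant": "10 cameras within a 300-meter radius of location A have been found.",
        }
    ],
    "Query cameras near location B",
)
print(prompt)
```

Note that the full prompt contains literal JSON braces in its examples; with `str.format` those braces would have to be doubled, or the two placeholders replaced by another mechanism such as named markers.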

[0066] During training, a training dataset for the content rewriting model described above can be constructed from multi-turn dialogue data collected in real-world scenarios. The data is labeled so that the original user input and the corresponding semantically supplemented content form input-output pairs. For example: Input: [Historical Dialogue] User: Query cameras near location A. Response: 10 cameras within a 300-meter radius of location A have been found. [Current Input] Query cameras near location B. Output: Query cameras within a 300-meter radius of location B. The content rewriting model can be fine-tuned using low-rank adaptation or prompt-based fine-tuning methods. The optimization objective is to minimize the cross-entropy loss between the output sequence and the labeled sequence. Parameters such as maximum generation length and repetition penalty can also be set to control output quality. This application embodiment does not specifically limit the measures taken during model training.

[0067] After obtaining the current input, the input is parsed to obtain at least one pronoun and the corresponding user requirement. This pronoun and user requirement can also be extracted and identified using a large language model.

[0068] Specifically, the intent recognition model segments the current input content into words, selecting words with noun and pronoun parts of speech as referents. Simultaneously, the current input content is converted into a semantic vector, which is then compared with multiple pre-defined demand vectors to obtain the similarity score for each demand. The demand with the highest similarity score is selected as the demand corresponding to the current input content. These pre-defined user demands are intents trained by the intent recognition model. The referent extraction task and the user demand recognition task can be performed using a single model or two separate models; no specific limitation is made here.
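The requirement matching in [0068] can be illustrated with the sketch below. The text only states that a similarity score is computed against each pre-defined demand vector; cosine similarity is assumed here as one common choice, and the three-dimensional vectors and demand names are invented purely for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize_demand(
    input_vector: List[float], demand_vectors: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the pre-defined demand with the highest similarity score and that score."""
    scores = {
        name: cosine_similarity(input_vector, vector)
        for name, vector in demand_vectors.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy vectors; a real system would obtain them from an embedding model.
demands = {
    "query_camera": [0.9, 0.1, 0.0],
    "play_video": [0.1, 0.9, 0.2],
}
best_demand, best_score = recognize_demand([0.8, 0.2, 0.1], demands)
print(best_demand, round(best_score, 3))
```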

[0069] S202: Select the target functional interface from a plurality of pre-built functional interfaces whose name description information matches the user's requirements.

[0070] A functional interface, also known as a tool, can be used to map user-input plain text information to internal structured information. As a program module with functions such as querying and calculation, a functional interface possesses descriptive information. For example, the descriptive information of a functional interface may include name description information, input parameter description information, function description information, reference index description information, result description information, etc. This application embodiment does not specifically limit the content included in the descriptive information of the functional interface.

[0071] The description information of the above functional interfaces can be presented in a lightweight data interchange format (JSON), as follows: [ { "type": "function", "function": { "name": "Tool1", "description": "Description of the first tool", "parameters": { "type": "object", "properties": {}, "required": [] }, "outputIndexMappings": [ /* Relationship between tool output and index parameters */ { "indexIdentifier": "Unique identifier for the index (required)", "fieldJsonPath": "Data source field (required), specifies the field from which values are retrieved during index mapping", "valueType": "Data type (optional), e.g., string" } ], "inputParamBindingIndex": [ /* Relationship between tool input parameters and indexes */ { "paramKey": "Tool parameter name", "indexIdentifier": "Index identifier (required), indicating that the parameter is mapped based on this index" } ] } },

The "name" and "description" fields above correspond to the name description information; the "parameters" field corresponds to the input parameter description information; the "inputParamBindingIndex" field corresponds to the reference index description information; and the "outputIndexMappings" field corresponds to the result description information. Some functional interfaces may not include reference index description information or result description information. This application embodiment does not make specific limitations on this.

[0072] When selecting a suitable functional interface for the current input content, it is first necessary to clarify the user requirements, which are obtained through the embodiment in S201 above, and will not be repeated here.

[0073] In one possible implementation, the function interface corresponding to the current input content (i.e., the target function interface) is selected as follows: function interface selection prompts are constructed using the user requirements and the name description information of each function interface; the function interface selection prompts are input into a pre-built function interface selection model, so that the model determines the target function interface according to the prompts. The function interface selection prompt also includes the task that the function interface selection model needs to complete, the matching rules by which the model matches function interfaces, and the output requirements for the format of the model's output. The matching rules are to match the user requirements against the name description information of each function interface and to use the matching results to determine the target function interface. When multiple target function interfaces are matched, the matched function interfaces are output in a preset order. When no target function interface is matched, information indicating that there is no matching result is output.

[0074] To ensure the accuracy of function interface selection, function interface selection prompts are constructed using user requirements and function interface descriptions. These prompts then guide the function interface selection model to select the target function interface.

[0075] Specifically, the prompts for selecting the above-mentioned functional interfaces can be implemented as follows: #Task You are a tool selection expert, and you need to identify the most suitable tool from the provided toolset based on the user's input question.

[0076] #Toolset Description

[0077] {}

[0078] #Matching Rules

[0079] 1. Analyze the user requirements to determine their core tasks and areas of need.

[0080] 2. Check the `name` and `description` of each tool to determine whether its purpose best matches the user's needs.

[0081] 3. If multiple tools are potentially relevant, select the most suitable one; if multiple tools are indeed needed, sort the output by relevance.

[0082] 4. If the problem exceeds the scope of all tool functions, you must reply "No matching tool" and do not expand the functions yourself.

[0083] #Output Requirements

[0084] Only return the `name` of the matching tool.

[0085] A brief explanation of the matching reason is attached.

[0086] The output uses JSON format and includes the following fields: `"matched_tool_name"`: the name of the tool; `"reason"`: the matching reason. Output example: { "matched_tool_name": "toolName", "reason": "The tool's functionality is completely consistent with the X task in the user's requirements." } #Start matching User requirements: {}

In the above prompt, the "{}" below "#Toolset Description" is a placeholder to be replaced with the name description information of each functional interface. The "{}" after "User requirements" is also a placeholder, to be replaced with the user requirements parsed in S201. The functional interface selection prompt with its placeholders replaced is then input into the functional interface selection model, so that the model selects the target functional interface according to the instructions in the prompt.
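A minimal sketch of how the toolset placeholder might be filled and how the model output might be checked is given below. The tool descriptions are abbreviated, the prompt is reduced to its two placeholder sections, and the helper names `name_descriptions`, `build_selection_prompt`, and `parse_selection` are hypothetical. Rejecting a tool name that is not in the toolset is an added safeguard, assumed here in the spirit of rule 4.

```python
import json
from typing import Any, Dict, List, Optional

NO_MATCH = "No matching tool"

# Two abbreviated tool descriptions in the JSON layout of [0071].
TOOLS: List[Dict[str, Any]] = [
    {"type": "function", "function": {
        "name": "queryCamera",
        "description": "Query camera information by camera name and location name",
        "parameters": {"type": "object",
                       "properties": {"locationRange": {"type": "number"}},
                       "required": []}}},
    {"type": "function", "function": {
        "name": "playVideo",
        "description": "Play the video of the cameras with the given camera codes",
        "parameters": {"type": "object", "properties": {}, "required": []}}},
]


def name_descriptions(tools: List[Dict[str, Any]]) -> str:
    """Keep only the name description information (name and description) of each tool."""
    slim = [
        {"name": t["function"]["name"], "description": t["function"]["description"]}
        for t in tools
    ]
    return json.dumps(slim, ensure_ascii=False, indent=2)


def build_selection_prompt(tools: List[Dict[str, Any]], user_requirement: str) -> str:
    """Fill the toolset and user-requirement placeholders (rules and output format omitted)."""
    return (
        f"#Toolset Description\n{name_descriptions(tools)}\n"
        f"#Start matching\nUser requirements: {user_requirement}\n"
    )


def parse_selection(raw_output: str, tools: List[Dict[str, Any]]) -> Optional[str]:
    """Return the matched tool name, or None if nothing usable was matched."""
    if NO_MATCH in raw_output:
        return None
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    name = data.get("matched_tool_name")
    known = {t["function"]["name"] for t in tools}
    return name if name in known else None


selection_prompt = build_selection_prompt(TOOLS, "Query cameras near location A")
print(selection_prompt)
```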

[0087] In another possible implementation, the semantically supplemented input content and the name description information of each functional interface can be directly used to construct the functional interface selection prompts, as follows: #Toolset Description {} # Matching rules 1. Carefully read the user's questions and analyze their core tasks and areas of need.

[0088] 2. Check the `name` and `description` of each tool to determine whether its purpose best matches the user's needs.

[0089] 3. If multiple tools are potentially relevant, select the most suitable one; if multiple tools are indeed needed, sort the output by relevance.

[0090] 4. If the problem exceeds the scope of all tool functions, you must reply "No matching tool" and do not expand the functions yourself.

[0091] # Output Requirements

[0092] Only return the `name` of the matching tool.

[0093] A brief explanation of the matching reason is attached.

[0094] The output uses JSON format and includes the following fields: `"matched_tool_name"`: the name of the tool; `"reason"`: the matching reason. Output example: { "matched_tool_name": "toolName", "reason": "The tool's functionality is completely consistent with the X task in the user's requirements." } #Start matching User issue: {}

In other words, in this implementation the function interface selection model itself extracts the user's needs from the current input content and matches the target function interface according to those needs. In the embodiments of this application, the function interface selection model can be used for both user needs identification and function interface selection, or for function interface selection only; no specific limitation is made here.

[0095] In one possible implementation, if the function interface selection model determines that there are multiple function interfaces, the description information of the multiple function interfaces is displayed on the display interface so that the user can select the required function interface.

[0096] S203: Determine a target pronoun from the at least one pronoun that matches the input parameter description information of the target function interface.

[0097] After selecting the target function interface according to user needs, it is necessary to further extract input parameters from the current input content as the input of the target function interface.

[0098] In one possible implementation, the input parameters are determined as follows: parameter extraction prompts are constructed using each pronoun and the input parameter description information of the target function interface; the parameter extraction prompts are input into a pre-built parameter extraction model, so that the model selects the target pronoun according to the prompts. The parameter extraction prompts also include the task that the parameter extraction model needs to complete, the selection rules by which the model selects pronouns, and the output requirements for the format of the model's output. The selection rules are to compare the parameter names contained in the input parameter description information with the semantics of each pronoun, and to determine the target pronoun according to the comparison results.

[0099] To ensure the accuracy of the extracted input parameters, parameter extraction prompts are constructed using the current input content and the input parameter description information of the target function interface. These prompts are then used to instruct the parameter extraction model to extract parameters from the input content, which can be semantically supplemented.

[0100] Specifically, the above parameter extraction prompts can be implemented as in the following example: #Tool Description {} #User Input {} #Output Requirements 1. Output only the JSON object containing the parameters, without any additional text descriptions.

[0101] 2. The structure of the JSON should strictly conform to the 'parameters' definition in the tool description, including attribute names and type requirements.

[0102] 3. If the user does not provide a value for a parameter, the parameter will be set to null or an empty string.

[0103] 4. Do not make any additional inferences about the parameters; only extract the information explicitly provided by the user.

[0104] 5. Ensure the order of JSON keys matches the order defined in the tool description.

[0105] 6. If no parameters are specified, only the tool name will be output, and the parameters will be empty.

[0106] #Example

[0107] User input: "View surveillance locations within 300 meters of location C"

[0108] Output: { "toolName": "queryCamera", "parameters": { "cameraName": null, "locationName": "locationC", "locationRange": 300 } }

The "{}" below "#Tool Description" above is a placeholder to be replaced with the input parameter description information, or the complete description information, of the target function interface. Similarly, the "{}" below "#User Input" is to be replaced with the semantically supplemented input content. The parameter extraction prompt with its placeholders replaced is then input into the parameter extraction model, so that the model extracts the input parameters from the user input content and outputs them according to the output requirements.
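Output requirements 2, 3, and 5 lend themselves to a post-check on the model output. The sketch below is one possible way to do this, assuming the model returns the JSON shape shown in the example; the helper name `normalize_extraction` and the decision to raise an error on unknown parameters are illustrative choices, not something the text prescribes.

```python
import json
from typing import Any, Dict

# "parameters" definition of queryCamera, reduced to names and types.
QUERY_CAMERA_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "cameraName": {"type": "string"},
        "locationName": {"type": "string"},
        "locationRange": {"type": "number"},
    },
    "required": [],
}


def normalize_extraction(
    raw_output: str, tool_name: str, schema: Dict[str, Any]
) -> Dict[str, Any]:
    """Check the model output against the tool's parameter definition.

    Missing parameters become None, and keys follow the order of the definition.
    """
    data = json.loads(raw_output)
    if data.get("toolName") != tool_name:
        raise ValueError(f"unexpected tool name: {data.get('toolName')!r}")
    extracted = data.get("parameters") or {}
    unknown = set(extracted) - set(schema["properties"])
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    # dict preserves insertion order, so this follows the order of the definition.
    ordered = {key: extracted.get(key) for key in schema["properties"]}
    missing = [k for k in schema.get("required", []) if ordered.get(k) in (None, "")]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return {"toolName": tool_name, "parameters": ordered}


raw = '{"toolName": "queryCamera", "parameters": {"locationRange": 300, "locationName": "locationC"}}'
call = normalize_extraction(raw, "queryCamera", QUERY_CAMERA_PARAMETERS)
print(json.dumps(call))
```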

[0109] For example, the pronouns include: location A, camera, nearby. The parameter name in the input parameter description information is "location name". Location A in the pronouns represents the location. Therefore, location A is used as the input parameter of the target function interface.

[0110] In another possible implementation, parameter extraction prompts can be constructed directly using the current input content and the input parameter description information of the target function interface, so that the parameter extraction model matches the input content and the input parameter description information, and extracts the required input parameters from the input content. This application does not specifically limit this aspect.

[0111] In one possible implementation, if the input parameters of the target functional interface are not obtained through the parameter extraction model, it may be because there is no target pronoun among the pronouns that matches the parameter name contained in the input parameter description information. In this case, the content referred to by the pronoun can be determined based on the pronoun index description information of the target functional interface, and the determined content can be used as the parameter of the target functional interface.

[0112] For example, if the user's current input is: "Play the video from the first camera", no input parameters for the target function interface can be extracted from this input. This is because to play the video from the camera, it is necessary to find the camera's ID from the corresponding data source, and then find the video corresponding to that ID. Therefore, "the first camera" refers to the camera's ID, which is not reflected in the current input. In this case, it is necessary to further determine the camera ID referred to by "the first camera" based on the reference index description information of the target function interface, and use the camera ID as the input parameter.

[0113] Specifically, a reference index prompt is constructed using each reference noun and the reference index description information of the target functional interface; the reference index prompt is input into a pre-built reference noun recognition model, so that the model determines the input parameters of the target function interface according to the prompt. The reference index prompt also includes the task that the reference noun recognition model needs to complete, the recognition rules that instruct the model how to recognize each reference noun, and the output requirements that instruct the model to output information in a specific format.

[0114] To ensure the accuracy of parameter extraction, a referential index prompt is constructed using the referential index description information of each pronoun and the target function interface. Alternatively, the current input content can be used to construct the referential index prompt; no specific limitation is made here. The referential index prompt can be implemented as follows: #Task Based on the given "reference index parameter definition", extract the corresponding index from the user's input question and return JSON that conforms to the parameter format.

[0115] # Extraction Rules

[0116] 1. Locate the words in the user input that refer to serial numbers, for example: "The first": [1]; "The second one": [2]; "The first three": [1,2,3]; "The first two": [1,2]; "The tenth": [10]. 2. If "the first N" appears, generate an array of indices from 1 to N.

[0117] 3. If multiple independent references appear (such as "the first and the third"), extract them into an array [1,3].

[0118] 4. If the last three appear, extract them as [-1, -2, -3].

[0119] 5. If "all of the above" or "all" appear, extract them as an empty array [].

[0120] 6. Numbers can come from Chinese numerals (first, second, third), Arabic numerals (1st, 2nd), scope descriptions (first five), or a combination of these.

[0121] 7. If no match is found, return null.

[0122] #Output Format

[0123] Output objects that strictly conform to the JSON structure, with keys specified in the parameter definition and values being an array of extracted ordinal numbers. Do not output any unnecessary explanations or text.

[0124] #Reference Index Parameter Definition

[0125] {}

[0126] #User input {}

[0127] Play the video from the first camera.

[0128] The "{}" corresponding to the above-mentioned # reference index parameter definition are placeholders and need to be replaced with the reference index description information of the target function interface. The "{}" corresponding to the above-mentioned # user input are also placeholders and need to be replaced with each reference noun or the current input content. The reference noun recognition model will resolve the reference of unclear reference nouns based on the prompt words and extract the input parameters of the target function interface from each reference noun.
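To make the extraction rules concrete, the sketch below implements them with plain string rules for English input. It is a rule-based stand-in for illustration only: in the application the extraction is performed by the reference noun recognition model driven by the prompt, and the word tables and the helper name `extract_indices` are hypothetical.

```python
import re
from typing import List, Optional

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}
CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
             "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def _to_count(token: str) -> Optional[int]:
    """Convert "3" or "three" to 3; return None for any other word."""
    if token.isdigit():
        return int(token)
    return CARDINALS.get(token)


def extract_indices(text: str) -> Optional[List[int]]:
    lowered = text.lower()
    # Rule 5: "all" / "all of the above" maps to an empty array.
    if re.search(r"\ball\b", lowered):
        return []
    # Rule 2: "first N" maps to [1, ..., N].
    match = re.search(r"\bfirst (\w+)\b", lowered)
    if match:
        n = _to_count(match.group(1))
        if n:
            return list(range(1, n + 1))
    # Rule 4: "last N" maps to [-1, ..., -N].
    match = re.search(r"\blast (\w+)\b", lowered)
    if match:
        n = _to_count(match.group(1))
        if n:
            return [-i for i in range(1, n + 1)]
    # Rules 1, 3 and 6: independent ordinals, written as words or as "10th".
    found = [ORDINALS[w] for w in re.findall(r"[a-z]+", lowered) if w in ORDINALS]
    found += [int(d) for d in re.findall(r"\b(\d+)(?:st|nd|rd|th)\b", lowered)]
    # Rule 7: no match returns None (null).
    return found or None


print(extract_indices("Play the video from the first camera"))
```

A model-based extractor has to cover many more phrasings than this table-driven version, which is one reason the application delegates the task to a prompted model.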

[0129] S204: Use the target pronoun as the input parameter of the target function interface, call the target function interface, obtain a result that matches the function description information of the target function interface, and use the result as the response content corresponding to the current input content.

[0130] In one possible implementation, the dialogue system can pass the input parameters (i.e., the obtained target pronoun) and the call request to the target function interface through a network communication protocol, and then call the target function interface (also known as the execution tool).

[0131] In one possible implementation, invoking the target function interface to obtain a result that matches the function description information of the target function interface includes: querying target data related to the input parameters from the target data source connected to the target function interface; and processing the target data according to the function description information of the target function interface to obtain the result.

[0132] Specifically, a target field that matches the parameter category to which the input parameter belongs is queried from the target data source; target data that meets the filtering conditions of the input parameters is filtered from the data corresponding to that field, where the filtering conditions are obtained from the parameter description information of the target function interface; the data processing logic contained in the function description information of the target function interface is determined, where the processing logic includes at least one of data verification, data format conversion, and data calculation; and the target data is processed according to the processing logic to obtain the result.
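The query, filter, and process steps can be sketched as follows. The in-memory table, its field names (including `distanceMeters`), and the function `query_camera` are all invented for illustration; an actual functional interface would query whatever data source it is connected to.

```python
from typing import Any, Dict, List

# Hypothetical data source; the records and field names are invented.
CAMERA_TABLE: List[Dict[str, Any]] = [
    {"cameraCode": "cam-001", "locationName": "locationA", "distanceMeters": 120.0},
    {"cameraCode": "cam-002", "locationName": "locationA", "distanceMeters": 450.0},
    {"cameraCode": "cam-003", "locationName": "locationB", "distanceMeters": 80.0},
]


def query_camera(params: Dict[str, Any], table: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Data verification: a location is needed to pick the matching field values.
    location = params.get("locationName")
    if not location:
        raise ValueError("locationName is required for this sketch")
    radius = params.get("locationRange")

    # Query the field that matches the parameter category (location).
    rows = [row for row in table if row["locationName"] == location]

    # Filter by the range condition when one is given.
    if radius is not None:
        rows = [row for row in rows if row["distanceMeters"] <= float(radius)]

    # Data format conversion and calculation: keep the camera codes, add a count.
    return {"result": [{"cameraCode": row["cameraCode"]} for row in rows],
            "count": len(rows)}


output = query_camera({"locationName": "locationA", "locationRange": 300}, CAMERA_TABLE)
print(output)
```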

[0133] The parameter categories of the aforementioned input parameters can be preset categories. For example, the target pronouns "Subway Station A" and "City A" belong to the location category; the target pronouns "Camera" and "Mobile Phone" belong to the device category; and the target pronouns "Nearby" and "Surrounding Area" belong to the range category. Alternatively, the parameter category can be the semantic category of the input parameter itself. For example, the target pronoun "Subway Station A" belongs to the "subway station" category, and "the second camera" belongs to the "camera" category. This application embodiment does not specifically limit the division of parameter categories. Parameter categories are used to identify specific datasets or data tables.

[0134] The above filtering conditions can be obtained from the parameter description information of the target function interface; that is, the filtering conditions can be found in the parameter description information. For example, if the user inputs a query for cameras on Road B, and there may be many cameras on Road B, and the parameter description information specifies a query range of 100 meters, then the data needs to be filtered according to this query range.

[0135] The above functional description information may include data processing logic, such as data validation, data format conversion, data calculation, data aggregation, etc., which are not specifically limited here. It may also not include data processing logic and only include query function; or it may only include processing logic and not query function (in which case it may only process the input parameters). In this case, the embodiments of this application do not make specific limitations.

[0136] When the target function interface outputs results, it can output them according to the format specified in the result description information of the target function interface. After the dialog system receives the result after the target function interface is executed, it can further refine it before displaying it to the user. For example, if the user inputs to query cameras near location A, in addition to outputting the camera number, location, and other information according to the result description information of the target function interface, the dialog system can also generate a reply text, such as: "Information on 10 cameras within 300 meters of location A has been found."

[0137] The dialogue system executes steps S201 to S204 as described above on the user's input, and then uses the execution result as the response content, which is displayed to the user through the display interface. The user can receive the response through the display interface and start the next round of dialogue.

[0138] However, there may be situations where users are not satisfied with the response. For example, if a user inputs "query cameras near location A", but the dialogue system outputs "base stations near location A", the user can provide feedback to the dialogue system such as "response content is incorrect". In this case, the dialogue system can display the output of each model in S201 to S203 above to the user so that the user can confirm which task has an error and re-execute the erroneous task.

[0139] It should be noted that the output content of each model in S201~S204 above, the results obtained by calling the function interface, the response content of the dialogue system and the historical dialogue are all stored in the dialogue management module of the dialogue system, so that users can trace the process of the dialogue system processing tasks and generating responses.
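One way to keep such a trace is sketched below. The class names `TurnTrace` and `DialogueManager`, the step labels, and the stored values are hypothetical; the application only states that the outputs of each step, the interface results, the responses, and the historical dialogue are stored in the dialogue management module.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TurnTrace:
    """Everything recorded for one round of dialogue."""
    user_input: str
    step_outputs: Dict[str, Any] = field(default_factory=dict)
    response: str = ""

    def record(self, step: str, output: Any) -> None:
        self.step_outputs[step] = output


@dataclass
class DialogueManager:
    turns: List[TurnTrace] = field(default_factory=list)

    def start_turn(self, user_input: str) -> TurnTrace:
        trace = TurnTrace(user_input)
        self.turns.append(trace)
        return trace

    def history(self, max_rounds: int) -> List[Dict[str, str]]:
        """Return the most recent rounds in the user/assistant form used by the prompts."""
        if max_rounds <= 0:
            return []
        return [{"user": t.user_input, "assistant": t.response}
                for t in self.turns[-max_rounds:]]


manager = DialogueManager()
turn = manager.start_turn("Query cameras near location A")
turn.record("S201", {"requirement": "query cameras", "pronouns": ["location A"]})
turn.record("S202", {"matched_tool_name": "queryCamera"})
turn.response = "Information on 10 cameras within 300 meters of location A has been found."
print(manager.history(1))
```

Keeping per-step outputs alongside the response is what allows the system to show the user which step produced an error, as described in [0138].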

[0140] This application's embodiments divide natural language processing tasks into multiple sub-tasks, such as semantic supplementation, intent recognition, function interface selection, parameter extraction, and index referencing mapping. Each sub-task is executed through its corresponding model and outputs the execution results. Compared with the existing technology's approach of "only performing multiple tasks such as referencing resolution, user intent recognition, and response content generation by calling a large language model once before outputting the final result to the user," this approach can precisely control and optimize the execution process of each task.

[0141] Furthermore, using different models to perform different tasks avoids the need to call high-cost, parameter-intensive models for all tasks. For example, tasks such as semantic completion and intent recognition are relatively simple, so models with fewer parameters can be trained for them. Models can also be fine-tuned for individual tasks to improve performance. Conversely, tasks such as functional interface selection and parameter extraction are relatively complex, so models with more parameters can be trained to ensure accuracy. The decomposition of the "dialogue understanding task" in this application's embodiments achieves the same effect as using a large language model throughout, while reducing inference hardware cost and time consumption. Moreover, prompt optimization can be carried out for the prompts of a specific task alone, improving the execution accuracy of each task.

[0142] This application embodiment also enables the dialogue system to map the user's pronouns to internal structured information by calling functional interfaces, instead of simply performing referential resolution at the "plain text" level, thereby providing the user with the desired response content.

[0143] The following describes in detail, with reference to Figure 3, a method for confirming response content provided by this application.

[0144] S301: Determine the description information for multiple functional interfaces. The description information here is complete; the following two functional interfaces are used as examples.

[0145] Description of Function Interface 1: [ { "type": "function", "function": { "name": "queryCamera", "description": "Query camera information based on camera name and location name", "parameters": { "type": "object", "properties": { "cameraName": { "type": "string", "description": "The name of the camera to be queried" }, "locationName": { "type": "string", "description": "The name of the location where the camera is located" }, "locationRange": { "type": "number", "description": "The geographic radius of the query, in meters, for example: 300 means 300 meters." } }, "required": [] }, "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "fieldJsonPath": "result[ ].cameraCode", "valueType": "string" } ] } },

The above "name": "queryCamera" and "description": "Query camera information based on camera name and location name" are the name description information of function interface 1; the content corresponding to "parameters": {} includes the input parameter description information and the function description information of function interface 1. Specifically, the fields cameraName, locationName, and locationRange represent the input parameter description information of function interface 1, and the content contained under these fields (such as "description": "The name of the camera to be queried", "description": "The name of the location where the camera is located", and "description": "The geographic radius of the query, in meters, for example: 300 means 300 meters.") is the function description information of function interface 1.

[0146] The description information for Function Interface 2 is as follows: { "type": "function", "function": { "name": "playVideo", "description": "Play the video content of the corresponding camera according to the provided camera code", "parameters": { "type": "object", "properties": { "cameraCodes": { "type": "array", "description": "A list of camera codes used to identify the video source cameras", "items": { "type": "string" } }, "cameraCodeIndexs": { "type": "array", "description": "Camera information object reference index", "items": { "type": "number" } } }, "required": [ "cameraCodes" ] }, "inputParamBindingIndex": [ { "paramKey": "cameraCodes", "indexIdentifier": "cameraCodeIndexs" } ] } } ]

The above "name": "playVideo" and "description": "Play the video content of the corresponding camera according to the provided camera code" are the name description information of function interface 2; the content corresponding to "parameters": {} includes the input parameter description information and the function description information of function interface 2. Specifically, the fields cameraCodes and cameraCodeIndexs, together with their type and items entries, represent the input parameter description information of function interface 2, and the content contained under these fields (such as "description": "A list of camera codes used to identify the video source cameras") is the function description information of function interface 2. The content corresponding to "inputParamBindingIndex": [ ] is the reference index description information of function interface 2, which helps extract the input parameters of the function interface.
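How the two descriptions might work together is sketched below: the "outputIndexMappings" entry of queryCamera says where camera codes can be read from its output, and the "inputParamBindingIndex" entry of playVideo says that "cameraCodes" is filled from the "cameraCodeIndexs" index. The exact resolution mechanics are not spelled out in the text, so the path handling (only the simple `list[ ].field` form), the 1-based and negative index convention, and the helper names are assumptions.

```python
from typing import Any, Dict, List


def collect_index_values(tool_output: Dict[str, Any], field_json_path: str) -> List[Any]:
    """Read values for a path of the simple form 'result[ ].cameraCode' (assumed format)."""
    list_key, _, rest = field_json_path.partition("[")
    field_name = rest.split(".", 1)[1]
    return [item[field_name] for item in tool_output[list_key]]


def resolve_indices(values: List[Any], indices: List[int]) -> List[Any]:
    """Map 1-based indices (negative ones count from the end) to values; [] means all."""
    if not indices:
        return list(values)
    resolved = []
    for index in indices:
        if index == 0:
            raise ValueError("indices are 1-based; 0 is not valid")
        resolved.append(values[index - 1] if index > 0 else values[index])
    return resolved


# Hypothetical output of an earlier queryCamera call.
previous_output = {"result": [{"cameraCode": "cam-001"}, {"cameraCode": "cam-002"}]}

codes = collect_index_values(previous_output, "result[ ].cameraCode")
# "Play the video from the first camera" yields the index array [1].
play_video_params = {"cameraCodes": resolve_indices(codes, [1])}
print(play_video_params)
```

In this reading, "the first camera" never has to appear as a camera code in the user's text; the code is looked up from the stored result of the previous call.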

[0147] S302: Based on the latest input content and the historical dialogue content of a preset number of rounds, construct content rewriting prompts and call the content rewriting model to semantically supplement the latest input content based on the content rewriting prompts.

[0148] Examples of multi-turn dialogues are as follows: First round of input: Query the cameras near location A.

[0149] Second round of input: Search for locations near B.

[0150] Third round of input: Play the video from the first camera.

[0151] Taking the first round of input as an example, the latest input is: query cameras near location A; historical dialogues are [], where [] is an empty array, indicating that the historical dialogues are empty.

[0152] Based on the latest input and historical dialogue, the placeholders in the prompt template are replaced to obtain the content rewrite prompt as follows: #Task Based on the user's current input and historical dialogue content, the system identifies contextual information, semantically reorganizes the current question, and generates a complete and clear new query.

[0153] #Reorganization Rules

[0154] 1. Identify key parameters (such as distance range, device type, etc.) contained in the system responses in historical dialogues. 2. When the current input contains omissions or references, automatically complete the valid parameters from the previous dialogue; 3. Preserve the user's original semantics and only perform necessary context completion; 4. The output format is a complete natural language question.

[0155] #Example

[0156] Example 1: Historical Dialogue: [ { "user": "Query the cameras in Plaza A.", "assistant": "Eight security cameras within a 300-meter radius of Plaza A have been located." } ] Current input: What about the other side of Road B? Reconstructed output: View the monitoring points within 300 meters of Road B.

[0157] Example 2: Historical Dialogue: [ { "user": "Display pedestrian flow statistics for Road A over the past hour.", "assistant": "A heat map of pedestrian flow along Road A over the past hour has been generated." } ] Current input: What about switching to Road C? Reconstructed output: Display pedestrian flow statistics for Road C over the past hour.

[0158] Example 3: Historical Dialogue: [] Current input: Query the cameras in lane A.

[0159] Reconstructed output: Query the cameras in lane A.

[0160] #Current Task

[0161] Historical Dialogue: [] (placeholder location) Latest input content: Query the cameras near location A. (placeholder location) In the above content rewriting prompt, the parts below "#Current Task" are placeholders, that is, the parts to be replaced. The placeholders are replaced with the historical dialogue and the latest input content. The resulting content rewriting prompt is then input into the content rewriting model, which outputs the semantically supplemented input: "Query the cameras near location A." Since the historical dialogue is empty, the semantically supplemented result is consistent with the original input content.
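The placeholder replacement described for the content rewriting prompt could be sketched in Python as follows. This is only an illustration under stated assumptions: the template is abbreviated, and the placeholder tokens `<<HISTORY>>` and `<<LATEST_INPUT>>` as well as the function name `build_rewrite_prompt` are hypothetical, since the text does not specify how placeholders are marked.

```python
import json

# Abbreviated template; the full task text, rules, and examples appear above.
# The <<...>> tokens are assumed placeholder markers, not taken from the source.
REWRITE_TEMPLATE = (
    "#Task\n"
    "(task description, reorganization rules, and examples omitted here)\n"
    "#Current Task\n"
    "Historical Dialogue: <<HISTORY>>\n"
    "Latest input content: <<LATEST_INPUT>>\n"
)


def build_rewrite_prompt(history: list, latest_input: str) -> str:
    """Fill the two placeholders with the history (as JSON) and the latest input."""
    history_text = json.dumps(history, ensure_ascii=False)
    return (
        REWRITE_TEMPLATE
        .replace("<<HISTORY>>", history_text)
        .replace("<<LATEST_INPUT>>", latest_input)
    )


# First round: the history is empty, so the prompt carries an empty array.
prompt = build_rewrite_prompt([], "Query the cameras near location A.")
```

Plain string replacement is used instead of `str.format` because the full template contains JSON examples whose braces would otherwise be read as format fields.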

[0162] S303: Based on the supplemented input content and the name and description information of each functional interface, construct functional interface selection prompts and call the functional interface selection model to select the target functional interface based on the functional interface selection prompts.

[0163] Using the supplemented input: Query the cameras near location A, and the names and descriptions of each functional interface, replace the placeholders in the prompt template to obtain the functional interface selection prompts as follows: #Task You are a tool selection expert, and you need to identify the most suitable tool from the provided toolset based on the user's input question.

[0164] # Toolset Description (Placeholders are located below) [

[0166] {

[0167] "type": "function", "function": { "name": "queryCamera", "description": "Search camera information by camera name and location name" } }, (this is the name and description information for function interface 1) { "type": "function", "function": { "name": "playVideo", "description": "Plays video content from the corresponding camera based on the provided camera encoding." } } (this is the name and description information for function interface 2) ] #Matching Rules 1. Carefully read the user's questions and analyze their core tasks and areas of need.

[0168] 2. Check the `name` and `description` of each tool to determine whether its purpose best matches the user's needs.

[0169] 3. If multiple tools are potentially relevant, select the most suitable one; if multiple tools are indeed needed, sort the output by relevance.

[0170] 4. If the problem exceeds the scope of all tool functions, you must reply "No matching tool" and do not expand the functions yourself.

[0171] # Output Requirements

[0172] Only return the `name` of the matching tool.

[0173] A brief explanation of the matching reason is attached.

[0174] The output uses JSON format and includes the following fields: "matched_tool_name": The name of the tool. "reason": Matching reason Output example: { "matched_tool_name":"toolName", "reason": "The tool's functionality is completely consistent with the X task in the user's requirements." } #Start matching User question: Find the cameras near location A. (Placeholder here) Replace the placeholders with the completed input content and the name and description information of each functional interface. Then, input the obtained functional interface selection prompts into the functional interface selection model, and output the following results according to the output requirements in the functional interface selection prompts: { "matched_tool_name": "queryCamera", "reason": "This tool's function is to query camera information based on camera name and location name, which is completely consistent with the user's need to find cameras near a specific location." } As can be seen from the above, the name of the matched functional interface is queryCamera.
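A caller of the function interface selection model has to handle both the JSON result and the plain "No matching tool" reply required by matching rule 4. The following Python sketch shows one possible way to do this; the function name `parse_selection` and the `KNOWN_TOOLS` set are assumptions made for illustration.

```python
import json
from typing import Optional

# Names of the registered function interfaces in this example.
KNOWN_TOOLS = {"queryCamera", "playVideo"}


def parse_selection(raw: str) -> Optional[str]:
    """Return the matched tool name, or None when no known tool was matched."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # e.g. the plain-text reply "No matching tool"
    if not isinstance(data, dict):
        return None
    name = data.get("matched_tool_name")
    # Reject names that are not registered, so the model cannot invent tools.
    if isinstance(name, str) and name in KNOWN_TOOLS:
        return name
    return None
```

Checking the name against the registered set is a design choice of this sketch; it guards against a model returning a tool that was never described in the prompt.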

[0175] S304: Construct parameter extraction prompts using the input parameter description information or complete description information of the target function interface and the semantically supplemented input content, and call the parameter extraction model to extract input parameters from the semantically supplemented input content based on the parameter extraction prompts.

[0176] Using the complete description information of the queryCamera functional interface, and the semantically supplemented input content, placeholders in the prompt words are replaced to obtain the parameter extraction prompt words as follows: #Task You are a parameter extraction assistant whose task is to identify parameters in a natural language question input by a user, based on a complete description of the given tool, and output a JSON object as the parameter result.

[0177] #Tool Description (Placeholders are located below)

[0178] {

[0179] "type": "function", "function": { "name": "queryCamera", "description": "Search camera information by camera name and location name", "parameters": { "type": "object", "properties": { "cameraName":{ "type":"string", "description": "The name of the camera to be queried" }, "locationName": { "type":"string", "description": "Location name of the camera" }, "locationRange": { "type":"number", "description": "The geographic radius of the query, in meters. For example, 300 means 300 meters." } }, "required": []} } } #User Input Find the cameras near location A. (Placeholder) #Output Requirements 1. Output only the JSON object containing the parameters, without any additional text descriptions.

[0180] 2. The structure of the JSON should strictly conform to the 'parameters' definition in the tool description, including attribute names and type requirements.

[0181] 3. If the user does not provide a value for a parameter, the parameter will be set to null or an empty string.

[0182] 4. Do not make any additional inferences about the parameters; only extract the information explicitly provided by the user.

[0183] 5. Ensure the order of JSON keys matches the order defined in the tool description.

[0184] 6. If no parameters are specified, only the tool name will be output, and the parameters will be empty.

[0185] #Example

[0186] User input: "View surveillance locations within 300 meters of location C"

[0187] Output: { "toolName": "queryCamera", "parameters": { "cameraName": null, "locationName": "locationC", "locationRange": 300 } } Replace the placeholders with the complete description of the queryCamera function interface and the supplemented input content. Then, input the obtained parameter extraction prompts into the parameter extraction model and output the following results according to the output requirements of the parameter extraction prompts: { "toolName": "queryCamera", "parameters": { "cameraName": null, "locationName": "Location A", "locationRange": 300 } } S305: Call the target function interface.

[0188] Use the result output in S304 as the input parameter of the function interface, call queryCamera, and obtain the following results, which are output in the format of the result description information in the target function interface: { "toolExcuteResult": [ { "cameraCode": "12800288265668811", "cameraName": "East Gate of A", "gpsX": "120.160194", "gpsY": "30.18822", "status": 1 }, { "cameraCode": "12800288265668813", "cameraName": "West Gate of A", "gpsX": "120.165445", "gpsY": "30.187191", "status": 1 }, { "cameraCode": "12909520803137728", "cameraName": "Lobby of Building A1", "gpsX": "120.165131", "gpsY": "30.188221", "status": 0 }, { "cameraCode": "12800288265734337", "cameraName": "South Street Corner", "gpsX": "120.162518", "gpsY": "30.188298", "status": 1 }, { "cameraCode": "13303502522747080", "cameraName": "North Parking Lot", "gpsX": "120.162173", "gpsY": "30.193861", "status": 0 }, { "cameraCode": "11315618917451969", "cameraName": "East Parking Lot / Loading / Unloading Area", "gpsX": "120.165531", "gpsY": "30.187265", "status": 0 }, { "cameraCode": "13383930556713153", "cameraName": "Inner pedestrian street / shops", "gpsX": "120.159332", "gpsY": "30.193634", "status": 0 }, { "cameraCode": "12800288265734336", "cameraName": "Underground parking lot entrance", "gpsX": "120.166908", "gpsY": "30.188751", "status": 1 }, { "cameraCode": "13383930556713154", "cameraName": "Basement B2 Entrance", "gpsX": "120.165359", "gpsY": "30.191864", "status": 0 }, { "cameraCode": "12268814835714245", "cameraName": "West end of the third-floor dining area", "gpsX": "120.166004", "gpsY": "30.18964", "status": 0 } ] } The above results can be displayed to the user as a response. Furthermore, to facilitate responses to subsequent user input, the above results are transformed according to the "outputIndexMappings" description information in the queryCamera function interface and stored in the dialogue management module. The transformed results are as follows: { "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] } S306: Display the reply content through the display interface.

[0189] The results from S305 above are displayed to the user as a reply. The displayed content may be only the result before conversion, only the result after conversion, or both at the same time. In addition, a text reply may be generated, such as "10 cameras within a 300-meter radius of location A have been found". The embodiments of this application do not specifically limit the form of the reply.
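The conversion of the call result in S305 into the "outputIndexMappings" form can be sketched roughly as follows in Python. The layout of the "outputIndexMappings" description information is not reproduced in this passage, so the `resultKey` field used below is an assumption introduced purely for illustration, as is the helper name `to_index_mappings`.

```python
def to_index_mappings(tool_result: dict, mapping_spec: list) -> dict:
    """Collect, per index identifier, one field from every result row.

    mapping_spec is an assumed format: each entry names an index identifier
    and the result field ("resultKey", hypothetical) whose values are indexed.
    """
    rows = tool_result["toolExcuteResult"]  # key spelled as in the source
    return {
        "outputIndexMappings": [
            {
                "indexIdentifier": spec["indexIdentifier"],
                # Order is preserved, so "the first camera" maps to result[0].
                "result": [row[spec["resultKey"]] for row in rows],
            }
            for spec in mapping_spec
        ]
    }


# Two-row excerpt of the queryCamera result shown above.
example_result = {
    "toolExcuteResult": [
        {"cameraCode": "12800288265668811", "cameraName": "East Gate of A"},
        {"cameraCode": "12800288265668813", "cameraName": "West Gate of A"},
    ]
}
example_spec = [{"indexIdentifier": "cameraCodeIndexs", "resultKey": "cameraCode"}]
converted = to_index_mappings(example_result, example_spec)
```

Keeping the original row order matters because later ordinal references depend on list positions.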

[0190] In one possible implementation, the output of the content rewriting model in S302 and the converted result in S305 are stored in the dialogue management module. The content stored in the dialogue management module is as follows: [

[0192] {

[0193] "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] } ] For the second round of input, "Search for locations near B.", the same steps S302~S306 are executed. The specific process is as follows: Step 1: The latest input is "Search for locations near B."; the historical dialogue is the content of the first round of dialogue stored in the dialogue management module (that is, the content in S306 above), as follows: [ { "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] } ] Based on the latest input and historical dialogue, the placeholders in the prompt template are replaced to obtain the content rewriting prompt as follows: #Task Based on the user's current input and historical dialogue content, the system identifies contextual information, semantically reorganizes the current question, and generates a complete and clear new query.

[0194] #Reorganization Rules

[0195] 1. Identify key parameters (such as distance range, device type, etc.) contained in the system responses in historical dialogues. 2. When the current input contains omissions or references, automatically complete the valid parameters from the previous dialogue; 3. Preserve the user's original semantics and only perform necessary context completion; 4. The output format is a complete natural language question.

[0196] #Example

[0197] Example 1: Historical Dialogue: [ { "user": "Query the cameras in Plaza A.", "assistant": "Eight security cameras within a 300-meter radius of Plaza A have been located." } ] Current input: What about the other side of Road B? Reconstructed output: View the monitoring points within 300 meters of Road B.

[0198] Example 2: Historical Dialogue: [ { "user": "Display pedestrian flow statistics for Road A over the past hour.", "assistant": "A heat map of pedestrian flow along Road A over the past hour has been generated." } ] Current input: What about switching to Road C? Reconstructed output: Display pedestrian flow statistics for Road C over the past hour.

[0199] Example 3: Historical Dialogue: [] Current input: Query the cameras in lane A.

[0200] Reconstructed output: Query the cameras in lane A.

[0201] #Current task (The following are placeholder locations)

[0202] Historical Dialogue: [ { "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found." } ] Latest input content: Search for locations near B. (placeholder location) In the above content rewriting prompt, the parts below "#Current task" are the placeholders, that is, the parts to be replaced. The placeholders are replaced with the historical dialogue and the latest input content, and the resulting content rewriting prompt is input into the content rewriting model. The output after semantic supplementation is: Query the cameras within 300 meters of location B.
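In the prompt above, each history entry carries only the User, Rewrite, and Reply fields, and S302 uses the historical dialogue of a preset number of rounds. A minimal Python sketch of preparing the history in this way is given below; the function name `history_for_prompt` and the default of five rounds are assumptions, since the text does not state the preset number.

```python
# Fields that appear in the history shown inside the rewriting prompt.
PROMPT_KEYS = ("User", "Rewrite", "Reply")


def history_for_prompt(stored_rounds: list, max_rounds: int = 5) -> list:
    """Keep the most recent rounds and drop fields not shown in the prompt.

    max_rounds stands in for the "preset number of rounds"; 5 is an assumed value.
    """
    recent = stored_rounds[-max_rounds:] if max_rounds > 0 else []
    return [
        {key: entry[key] for key in PROMPT_KEYS if key in entry}
        for entry in recent
    ]


stored = [
    {
        "User": "Looking up CCTV cameras near location A.",
        "Rewrite": "Query the cameras near location A.",
        "Reply": "Information on 10 cameras within a 300-meter radius of "
                 "location A has been found.",
        "outputIndexMappings": [
            {"indexIdentifier": "cameraCodeIndexs", "result": ["12800288265668811"]}
        ],
    }
]
prompt_history = history_for_prompt(stored)
```

Leaving the index mappings out of the prompt keeps long code lists away from the rewriting model, while the full record stays available in the dialogue management module.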

[0203] Step 2: Using the supplemented input: Query the cameras within 300 meters of location B, and the names and descriptions of each function interface. Replace the placeholders in the prompt template to obtain the function interface selection prompts as follows: #Task You are a tool selection expert, and you need to identify the most suitable tool from the provided toolset based on the user's input question.

[0204] # Toolset Description (Placeholders are located below) [

[0206] {

[0207] "type": "function", "function": { "name": "queryCamera", "description": "Search camera information by camera name and location name" } }, { "type": "function", "function": { "name": "playVideo", "description": "Plays video content from the corresponding camera based on the provided camera encoding." } } ] # Matching rules 1. Carefully read the user's questions and analyze their core tasks and areas of need.

[0208] 2. Check the `name` and `description` of each tool to determine whether its purpose best matches the user's needs.

[0209] 3. If multiple tools are potentially relevant, select the most suitable one; if multiple tools are indeed needed, sort the output by relevance.

[0210] 4. If the problem exceeds the scope of all tool functions, you must reply "No matching tool" and do not expand the functions yourself.

[0211] # Output Requirements

[0212] Only return the `name` of the matching tool.

[0213] A brief explanation of the matching reason is attached.

[0214] The output uses JSON format and includes the following fields: `"matched_tool_name"`: The name of the tool "reason": Matching reason Output example: { "matched_tool_name": "toolName", "reason": "The tool's functionality is completely consistent with the X task in the user's requirements." } #Start matching User question: Find cameras within 300 meters of location B. (Placeholder) Replace the placeholders with the completed input content and the name and description information of each functional interface. Then, input the obtained functional interface selection prompts into the functional interface selection model, and output the following results according to the output requirements in the functional interface selection prompts: { "matched_tool_name": "queryCamera", "reason": "This tool's function is to query camera information based on camera name and location name, which is completely consistent with the user's need to find cameras near a specific location." } As can be seen from the above, the name of the matched functional interface is queryCamera.

[0215] Step 3: Using the complete description information of the queryCamera function interface and the supplemented input content, replace the placeholders in the prompt template to obtain the parameter extraction prompt as follows: #Task You are a parameter extraction assistant whose task is to identify parameters in a natural language question input by a user, based on a complete description of the given tool, and output a JSON object as the parameter result.

[0216] #Tool Description (Placeholders are located below)

[0217] {

[0218] "type": "function", "function": { "name": "queryCamera", "description": "Search camera information by camera name and location name", "parameters": { "type": "object", "properties": { "cameraName":{ "type":"string", "description": "The name of the camera to be queried" }, "locationName": { "type":"string", "description": "Location name of the camera" }, "locationRange": { "type":"number", "description": "The geographic radius of the query, in meters. For example, 300 means 300 meters." } }, "required": [] } } } #User Input Find the cameras within 300 meters of location B (placeholder location). #Output Requirements 1. Output only the JSON object containing the parameters, without any additional text descriptions.

[0219] 2. The structure of the JSON should strictly conform to the 'parameters' definition in the tool description, including attribute names and type requirements.

[0220] 3. If the user does not provide a value for a parameter, the parameter will be set to null or an empty string.

[0221] 4. Do not make any additional inferences about the parameters; only extract the information explicitly provided by the user.

[0222] 5. Ensure the order of JSON keys matches the order defined in the tool description.

[0223] 6. If no parameters are specified, only the tool name will be output, and the parameters will be empty.

[0224] #Example

[0225] User input: "View surveillance locations within 300 meters of location C"

[0226] Output: { "toolName": "queryCamera", "parameters": { "cameraName": null, "locationName": "locationC", "locationRange": 300 } } Replace the placeholders with the complete description of the queryCamera function interface and the supplemented input content. Then, input the obtained parameter extraction prompts into the parameter extraction model and output the following results according to the output requirements of the parameter extraction prompts: { "parameters": { "cameraName": null, "locationName": "locationB", "locationRange": 300 } } Step 4: Using the output from Step 3 as the input parameter for the queryCamera function interface, call queryCamera to obtain the following result: { "toolExcuteResult": [ { "cameraCode": "14800288265668811", "cameraName": "Entrance Channel A", "gpsX": "120.160194", "gpsY": "30.18822", "status": 1 }, { "cameraCode": "14800288265668813", "cameraName": "Escalator B", "gpsX": "120.165445", "gpsY": "30.187191", "status": 1 }, { "cameraCode": "14909520803137728", "cameraName": "Ticket vending machine area at Exit C", "gpsX": "120.165131", "gpsY": "30.188221", "status": 0 }, { "cameraCode": "14800288265734337", "cameraName": "D Exit Gate", "gpsX": "120.162518", "gpsY": "30.188298", "status": 1 }, { "cameraCode": "14303502522747080", "cameraName": "Service Desk in the Central Hall", "gpsX": "120.162173", "gpsY": "30.193861", "status": 0 }, { "cameraCode": "14315618917451969", "cameraName": "South Bus Stop Position 1", "gpsX": "120.165531", "gpsY": "30.187265", "status": 0 }, { "cameraCode": "14383930556713153", "cameraName": "North Shared Bike Area", "gpsX": "120.159332", "gpsY": "30.193634", "status": 0 }, { "cameraCode": "14800288265734336", "cameraName": "East Taxi Waiting Point", "gpsX": "120.166908", "gpsY": "30.188751", "status": 1 }, { "cameraCode": "14383930556713154", "cameraName": "West Side Zebra Crossing No. 1", "gpsX": "120.165359", "gpsY": "30.191864", "status": 0 }, { "cameraCode": "14268814835714245", "cameraName": "Northeast Office Building Entrance", "gpsX": "120.166004", "gpsY": "30.18964", "status": 0 } ] } The above results can be displayed to the user as a response. Furthermore, to facilitate responses to subsequent user input, the above results are transformed according to the "outputIndexMappings" description information in the queryCamera function interface and stored in the dialogue management module. The transformed results are as follows: { "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "14800288265668811", "14800288265668813", "14909520803137728", "14800288265734337", "14303502522747080", "14315618917451969", "14383930556713153", "14800288265734336", "14383930556713154", "14268814835714245" ] } ] } Step 5: Display the results from Step 4 above as the reply to the user. The displayed content may be only the result before conversion, only the result after conversion, or both at the same time. In addition, a text reply may be generated, such as "10 cameras within a 300-meter radius of location B have been found". The embodiments of this application do not specifically limit the form of the reply.

[0227] In one possible implementation, the output of the content rewriting model in Step 1 and the converted result in Step 4 are stored in the dialogue management module. The content stored in the dialogue management module is as follows: [

[0229] {

[0230] "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] }, { "User": "Search for locations near B.", "Rewrite": "Query the cameras within 300 meters of location B.", "Reply": "Information on 10 cameras within a 300-meter radius of location B has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "14800288265668811", "14800288265668813", "14909520803137728", "14800288265734337", "14303502522747080", "14315618917451969", "14383930556713153", "14800288265734336", "14383930556713154", "14268814835714245" ] } ] } ] For the third round of input, "Play the video from the first camera.", the same steps S302~S306 are executed as described above. The specific process is as follows: Step 1: The latest input is "Play the video from the first camera."; the historical dialogue is the content stored in the dialogue management module for the first and second rounds of dialogue, as follows: [ { "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] }, { "User": "Search for locations near B.", "Rewrite": "Query the cameras within 300 meters of location B.", "Reply": "Information on 10 cameras within a 300-meter radius of location B has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "14800288265668811", "14800288265668813", "14909520803137728", "14800288265734337", "14303502522747080", "14315618917451969", "14383930556713153", "14800288265734336", "14383930556713154", "14268814835714245" ] } ] } ] Based on the latest input and historical dialogue, the placeholders in the prompt template are replaced to obtain the content rewriting prompt as follows: #Task Based on the user's current input and historical dialogue content, the system identifies contextual information, semantically reorganizes the current question, and generates a complete and clear new query.

[0231] #Reorganization Rules

[0232] 1. Identify key parameters (such as distance range, device type, etc.) contained in the system responses in historical dialogues. 2. When the current input contains omissions or references, automatically complete the valid parameters from the previous dialogue; 3. Preserve the user's original semantics and only perform necessary context completion; 4. The output format is a complete natural language question.

[0233] #Example

[0234] Example 1: Historical Dialogue: [ { "user": "Query the cameras in Plaza A.", "assistant": "Eight security cameras within a 300-meter radius of Plaza A have been located." } ] Current input: What about the other side of Road B? Reconstructed output: View the monitoring points within 300 meters of Road B.

[0235] Example 2: Historical Dialogue: [ { "user": "Display pedestrian flow statistics for Road A over the past hour.", "assistant": "A heat map of pedestrian flow along Road A over the past hour has been generated." } ] Current input: What about switching to Road C? Reconstructed output: Display pedestrian flow statistics for Road C over the past hour.

[0236] Example 3: Historical Dialogue: [] Current input: Query the cameras in lane A.

[0237] Reconstructed output: Query the cameras in lane A.

[0238] #Current task (The following are placeholder locations)

[0239] Historical Dialogue: [ { "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 300-meter radius of location A has been found." }, { "User": "Search for locations near B.", "Rewrite": "Query the cameras within 300 meters of location B.", "Reply": "Information on 10 cameras within a 300-meter radius of location B has been found." } ] Latest input content: Play the video from the first camera. (placeholder location) In the above content rewriting prompt, the parts below "#Current task" are the placeholders, that is, the parts to be replaced. The placeholders are replaced with the historical dialogue and the latest input content, and the resulting content rewriting prompt is input into the content rewriting model. The output after semantic supplementation is: Play the video from the first camera.

[0240] Step 2: Using the supplemented input: Play the video from the first camera, and the name and description information of each function interface, replace the placeholders in the prompt template to obtain the function interface selection prompts as follows: #Task You are a tool selection expert, and you need to identify the most suitable tool from the provided toolset based on the user's input question.

[0241] # Toolset Description (Placeholders are located below) [

[0243] {

[0244] "type": "function", "function": { "name": "queryCamera", "description": "Search camera information by camera name and location name" } }, { "type": "function", "function": { "name": "playVideo", "description": "Plays video content from the corresponding camera based on the provided camera encoding." } } ] #Matching Rules 1. Carefully read the user's questions and analyze their core tasks and areas of need.

[0245] 2. Check the `name` and `description` of each tool to determine whether its purpose best matches the user's needs.

[0246] 3. If multiple tools are potentially relevant, select the most suitable one; if multiple tools are indeed needed, sort the output by relevance.

[0247] 4. If the problem exceeds the scope of all tool functions, you must reply "No matching tool" and do not expand the functions yourself.

[0248] #Output Requirements

[0249] Only return the `name` of the matching tool.

[0250] A brief explanation of the matching reason is attached.

[0251] The output uses JSON format and includes the following fields: "matched_tool_name": The name of the tool. "reason": The reason for the match. Output example: { "matched_tool_name": "toolName", "reason": "The tool's functionality is completely consistent with the X task in the user's requirements." } #Start matching User question: Play the video from the first camera. (Placeholder here) Replace the placeholders with the completed input content and the name and description information of each functional interface. Then, input the obtained functional interface selection prompts into the functional interface selection model, and output the following results according to the output requirements in the functional interface selection prompts: { "matched_tool_name": "playVideo", "reason": "The tool's function is to play video content based on the camera's encoding, which is completely consistent with the user's task of playing camera video." } As can be seen from the above, the name of the matched functional interface is playVideo.

[0252] Step 3: Using the complete description information of the playVideo function interface, and the supplemented input content, replace the placeholders in the prompt words to obtain the parameter extraction prompt words as follows: #Task You are a parameter extraction assistant whose task is to identify parameters in a natural language question input by a user, based on a complete description of the given tool, and output a JSON object as the parameter result.

[0253] #Tool Description (Placeholders are located below)

[0254] {

[0255] "type": "function", "function": { "name": "playVideo", "description": "Plays video content from the corresponding camera based on the provided camera encoding", "parameters": { "type": "object", "properties": { }, "required": [ ] } } } #User Input Play the video from the first camera. (Placeholder) #Output Requirements 1. Output only the JSON object containing the parameters, without any additional text descriptions.

[0256] 2. The structure of the JSON should strictly conform to the 'parameters' definition in the tool description, including attribute names and type requirements.

[0257] 3. If the user does not provide a value for a parameter, the parameter will be set to null or an empty string.

[0258] 4. Do not make any additional inferences about the parameters; only extract the information explicitly provided by the user.

[0259] 5. Ensure the order of JSON keys matches the order defined in the tool description.

[0260] 6. If no parameters are specified, only the tool name will be output, and the parameters will be empty.

[0261] #Example

[0262] User input: "View surveillance locations within 300 meters of location C"

[0263] Output: { "toolName": "queryCamera", "parameters": { "cameraName": null, "locationName": "locationC", "locationRange": 300 } } Replace the placeholders with the complete description of the playVideo function interface and the supplemented input content. Then, input the obtained parameter extraction prompts into the parameter extraction model and output the following results according to the output requirements of the parameter extraction prompts: { "toolName": "playVideo", "parameters": {} (This is empty) } The parameter extraction model did not extract the input parameters from the current input content through the parameter extraction prompt words in step 3. At this point, step 4 needs to be executed to determine the referential index mapping.
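The decision to proceed to step 4 can be illustrated with a short Python sketch: for each binding in "inputParamBindingIndex", check whether the bound input parameter is still missing after parameter extraction. This is a sketch under assumptions; the function name `params_needing_reference` and the treatment of null, empty string, and empty list as "missing" are illustrative choices rather than details stated in the text.

```python
def params_needing_reference(tool_function: dict, extracted: dict) -> list:
    """List bound parameters that the parameter extraction step left empty."""
    missing = []
    for binding in tool_function.get("inputParamBindingIndex", []):
        value = extracted.get(binding["paramKey"])
        # Assumed convention: None, "" and [] all count as "not extracted".
        if value in (None, "", []):
            missing.append(binding["paramKey"])
    return missing


# Relevant excerpt of the playVideo description and the step 3 output.
play_video_function = {
    "name": "playVideo",
    "inputParamBindingIndex": [
        {"paramKey": "cameraCodes", "indexIdentifier": "cameraCodeIndexs"}
    ],
}
step3_output = {"toolName": "playVideo", "parameters": {}}
todo = params_needing_reference(play_video_function, step3_output["parameters"])
```

A non-empty `todo` list corresponds to the situation described above, in which the referential index mapping has to be determined before the interface can be called.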

[0264] Step 4: Determine the reference index description information of the function interface playVideo, that is, the content of "inputParamBindingIndex":

"inputParamBindingIndex": [ { "paramKey": "cameraCodes", "indexIdentifier": "cameraCodeIndexs" } ]

In the complete description of the playVideo function interface, the content corresponding to "cameraCodeIndexs" is:

{ "cameraCodeIndexs": { "type": "array", "description": "Camera Information Object Reference Index", "items": { "type": "number" } } }

Replacing the placeholders in the prompt words with the supplemented input content and the above reference index description information yields the following reference index prompt words:

#Task

Based on the given "reference index parameter definition", extract the corresponding index from the user's input question and return JSON that conforms to the parameter format.

[0265] # Extraction Rules

[0266] 1. Locate the words in the user input that refer to serial numbers, for example:

"The first": [1]
"The second one": [2]
"The first three": [1,2,3]
"The first two": [1,2]
"The tenth": [10]

2. If "the first N" appears, generate an array of indices from 1 to N.

[0267] 3. If multiple independent references appear (such as "the first and the third"), extract them into an array [1,3].

[0268] 4. If "the last three" appears, extract it as [-1, -2, -3].

[0269] 5. If "all of the above" or "all" appears, extract it as an empty array [].

[0270] 6. Numbers can come from Chinese numerals (first, second, third), Arabic numerals (1st, 2nd), range descriptions (the first five), or a combination of these.

[0271] 7. If no match is found, return null.
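For illustration only, extraction rules 1 to 5 and 7 above can be approximated by a small deterministic function. In this application the extraction is performed by the reference noun recognition model driven by the prompt, so the sketch below is a stand-in for checking what the rules are expected to return, not the described method. It handles English ordinals up to "tenth" only; the names extract_indices, ORDINALS and CARDINALS are hypothetical, and the Chinese numerals of rule 6 are not covered.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}
CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
             "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def _count(word):
    """Turn "3" or "three" into 3; return None for anything else."""
    return int(word) if word.isdigit() else CARDINALS.get(word)


def extract_indices(text):
    """Return the list of referenced ordinals, [] for "all", or None."""
    t = text.lower()
    # Rule 5: "all" / "all of the above" is encoded as an empty array.
    if re.search(r"\ball\b", t):
        return []
    # Rule 4 ("the last N") and rule 2 ("the first N").
    for keyword, sign in (("last", -1), ("first", 1)):
        m = re.search(rf"\b{keyword} (\w+)", t)
        n = _count(m.group(1)) if m else None
        if n:
            return [sign * i for i in range(1, n + 1)]
    # Rules 1 and 3: independent ordinals such as "first" or "3rd".
    found = []
    pattern = r"\b(?:(\d+)(?:st|nd|rd|th)|([a-z]+))\b"
    for digits, word in re.findall(pattern, t):
        if digits:
            found.append(int(digits))
        elif word in ORDINALS:
            found.append(ORDINALS[word])
    # Rule 7: no match gives null (None in Python).
    return found or None


print({"cameraCodeIndexs": extract_indices("Play the video from the first camera.")})
```

A fixed rule set like this cannot cope with free-form phrasing, which is presumably why the application delegates the task to a model and only states the rules in the prompt.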

[0272] #Output Format

[0273] Output objects that strictly conform to the JSON structure, with keys specified in the parameter definition and values being an array of extracted ordinal numbers. Do not output any unnecessary explanations or text.

[0274] #Reference Index Parameter Definition (the placeholder is located below)

[0275] {

[0276] "cameraCodeIndexs": {

[0277] "type": "array", "description": "Camera Information Object Reference Index", "items": { "type": "number" } } }

#User Input (the placeholder is located below)

Play the video from the first camera.

After the placeholders have been replaced with the reference index description information of the playVideo function interface and the input content, the resulting reference index prompt words are input into the reference noun recognition model. According to the output format of the reference index prompt words, the model outputs:

{"cameraCodeIndexs":[1]}

Step 5: Based on the model output {"cameraCodeIndexs":[1]} from step 4, query the dialogue management module for the output result of the most recent round, which is as follows:

{ "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "14800288265668811", "14800288265668813", "14909520803137728", "14800288265734337", "14303502522747080", "14315618917451969", "14383930556713153", "14800288265734336", "14383930556713154", "14383930556713154" ] } ] }

Since the value of cameraCodeIndexs is [1], which refers to the first element of the result set, and the first element of the result set is "14800288265668811", cameraCodeIndexs is mapped to ["14800288265668811"]. The mapped value is then added to the output of step 3, giving:

{ "toolName": "playVideo", "parameters": { "cameraCodeIndexs": [ "14800288265668811" ] } }

Step 6: Use the output of step 5 as the input parameters of the function interface playVideo, call playVideo to play the video of the camera with code 14800288265668811, and generate the text response: "The video from the first camera has been successfully played."
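The index mapping of step 5 can be sketched as follows. The text only walks through the case [1]; how negative indices (rule 4) and the empty array for "all" (rule 5) are resolved against the result set is an assumption made here, as are the names resolve_indices, latest_round and call. The result list is shortened to the first three camera codes quoted above.

```python
def resolve_indices(indices, results):
    """Map 1-based indices (negative ones count from the end) to stored values."""
    if indices is None:
        return None              # nothing was referenced (rule 7)
    if indices == []:
        return list(results)     # assumption: [] means "all" (rule 5)
    resolved = []
    for i in indices:
        pos = i - 1 if i > 0 else i  # 1 -> position 0, -1 -> last element
        if i == 0 or not -len(results) <= pos < len(results):
            raise IndexError(f"reference index {i} is out of range")
        resolved.append(results[pos])
    return resolved


# Output of the most recent round, shortened to three camera codes.
latest_round = {
    "outputIndexMappings": [
        {
            "indexIdentifier": "cameraCodeIndexs",
            "result": ["14800288265668811", "14800288265668813",
                       "14909520803137728"],
        }
    ]
}
model_output = {"cameraCodeIndexs": [1]}            # from step 4
call = {"toolName": "playVideo", "parameters": {}}  # from step 3

# Add every mapped reference to the parameters of the pending call.
for mapping in latest_round["outputIndexMappings"]:
    key = mapping["indexIdentifier"]
    if key in model_output:
        call["parameters"][key] = resolve_indices(model_output[key],
                                                  mapping["result"])

print(call)
# {'toolName': 'playVideo', 'parameters': {'cameraCodeIndexs': ['14800288265668811']}}
```

Raising an error for an out-of-range index is one possible design choice; a dialogue system might instead ask the user to clarify which item is meant.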

[0278] Next, the input content supplemented in step 1 and the text response obtained in step 6 are stored in the dialogue management module, as follows:

[
{ "User": "Looking up CCTV cameras near location A.", "Rewrite": "Query the cameras near location A.", "Reply": "Information on 10 cameras within a 500-meter radius of location A has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "12800288265668811", "12800288265668813", "12909520803137728", "12800288265734337", "13303502522747080", "11315618917451969", "13383930556713153", "12800288265734336", "13383930556713154", "12268814835714245" ] } ] },
{ "User": "Search for locations near B.", "Rewrite": "Query cameras within 300 meters of location B", "Reply": "Information on 10 cameras within a 300-meter radius of location B has been found.", "outputIndexMappings": [ { "indexIdentifier": "cameraCodeIndexs", "result": [ "14800288265668811", "14800288265668813", "14909520803137728", "14800288265734337", "14303502522747080", "14315618917451969", "14383930556713153", "14800288265734336", "14383930556713154", "14268814835714245" ] } ] },
{ "User": "Play the video from the first camera", "Rewrite": "Play the video from the first camera", "Reply": "The video from the first camera has been successfully played." }
]

Based on the same inventive concept, embodiments of this application also provide a response content confirmation device. As shown in Figure 4, the device includes:

a determining module 401, used to determine at least one pronoun contained in the current input content, and the user requirement corresponding to the current input content;

a function interface matching module 402, used to select, from a plurality of pre-built function interfaces, a target function interface whose name description information matches the user requirement;

an input parameter determination module 403, used to determine, from the at least one pronoun, a target pronoun that matches the input parameter description information of the target function interface;

a module 404, used to take the target pronoun as the input parameter of the target function interface and call the target function interface to obtain a result that matches the function description information of the target function interface; and

a response module 405, used to use the result as the response content corresponding to the current input content.

[0279] Based on the same inventive concept, this application also provides an electronic device, the device comprising: At least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform a response content confirmation method provided in the embodiments of this application.

[0280] An electronic device 50 according to this embodiment of the present application is described below with reference to Figure 5. The electronic device 50 shown in Figure 5 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of this application.

[0281] As shown in Figure 5, the electronic device 50 takes the form of a general-purpose electronic device. The components of the electronic device 50 may include, but are not limited to: at least one processor 51, at least one memory 52, and a bus 53 connecting different system components (including the memory 52 and the processor 51).

[0282] The processor 51 is used to read and execute instructions from the memory 52, so that the at least one processor can execute a response content confirmation method provided in the above embodiments.

[0283] Bus 53 represents one or more of several bus structures, including a memory bus or memory controller, peripheral bus, processor, or a local bus using any of the various bus structures.

[0284] The memory 52 may include a readable medium in the form of volatile memory, such as random access memory (RAM) 521 and / or cache memory 522, and may further include read-only memory (ROM) 523.

[0285] The memory 52 may also include a program / utility 525 having a set (at least one) of program modules 524, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.

[0286] Electronic device 50 can also communicate with one or more external devices 54 (e.g., keyboard, pointing device, etc.), and with one or more devices that enable a user to interact with electronic device 50, and / or with any device that enables electronic device 50 to communicate with one or more other electronic devices (e.g., router, modem, etc.). This communication can be performed via input / output (I / O) interface 55. Furthermore, electronic device 50 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 56. As shown, network adapter 56 communicates with other modules used in electronic device 50 via bus 53. It should be understood that, although not shown in the figure, other hardware and / or software modules can be used in conjunction with electronic device 50, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0287] In some possible implementations, various aspects of the response content confirmation method provided in this application can also be implemented in the form of a program product, which includes program code. When the program product is run on a computer device, the program code is used to cause the computer device to perform the steps of the response content confirmation method according to the various exemplary embodiments of this application described above.

[0288] In addition, this application also provides a computer-readable storage medium storing a computer program for causing a computer to perform the method described in any of the above embodiments.

[0289] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0290] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0291] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.

[0292] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the scope and intent of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application is also intended to include such modifications and variations.

Claims

1. A method for confirming response content, characterized in that the method includes: determining at least one pronoun contained in the current input content, and the user requirement corresponding to the current input content; selecting, from a plurality of pre-built function interfaces, a target function interface whose name description information matches the user requirement; determining, from the at least one pronoun, a target pronoun that matches the input parameter description information of the target function interface; using the target pronoun as the input parameter of the target function interface, and calling the target function interface to obtain a result that matches the function description information of the target function interface; and using the result as the response content corresponding to the current input content.

2. The method according to claim 1, characterized in that the current input content is obtained as follows: inputting a preset number of rounds of historical dialogue and the latest input content into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content, wherein the historical dialogue includes historical input content and historical response content; and using the semantically supplemented input content as the current input content.

3. The method according to claim 2, characterized in that the step of inputting a preset number of rounds of historical dialogue and the latest input content into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content includes: determining, by the semantic recognition layer of the content rewriting model, the semantic category of each keyword contained in the preset number of rounds of historical dialogue; determining, by the decision layer of the content rewriting model, target keywords from the plurality of keywords that have a different semantic category from the at least one pronoun; and performing, by the semantic supplementation layer of the content rewriting model, semantic supplementation on the latest input content based on the target keywords.

4. The method according to claim 2, characterized in that the step of inputting a preset number of rounds of historical dialogue and the latest input content into a pre-built content rewriting model to obtain input content after semantic supplementation of the latest input content includes: constructing content rewriting prompts using the preset number of rounds of historical dialogue and the latest input content; and inputting the content rewriting prompts into the content rewriting model so that the content rewriting model semantically supplements the latest input content according to the content rewriting prompts; wherein the content rewriting prompts also include the task that the content rewriting model needs to complete, the semantic supplementation rules used to instruct the content rewriting model to perform semantic supplementation, and the formats of the input and output information of the content rewriting model.

5. The method according to claim 1, characterized in that the step of selecting, from a plurality of pre-built function interfaces, a target function interface whose name description information matches the user requirement includes: constructing function interface selection prompts using the user requirement and the name description information of each function interface; and inputting the function interface selection prompts into a pre-built function interface selection model so that the function interface selection model determines the target function interface according to the function interface selection prompts; wherein the function interface selection prompts also include the task that the function interface selection model needs to complete, the matching rules by which the function interface selection model matches function interfaces, and the output requirements for the output information format of the function interface selection model; the matching rules are to match the user requirement against the name description information of each function interface and to use the matching results to determine the target function interface; when multiple target function interfaces are matched, the matched function interfaces are output in a preset order; and when no target function interface is matched, information indicating that there is no matching result is output.

6. The method according to claim 5, characterized in that the method further includes: constructing reference index prompts using the reference index description information of each reference noun and the target function interface; and inputting the reference index prompts into a pre-built reference noun recognition model so that the reference noun recognition model determines the input parameters of the target function interface according to the reference index prompts; wherein the reference index prompts also include the task that the reference noun recognition model needs to complete, the recognition rules for instructing the reference noun recognition model to recognize each reference noun, and the output requirements for instructing the reference noun recognition model to output information in a specific format.

7. The method according to claim 1, characterized in that the step of determining, from the at least one pronoun, the target pronoun that matches the input parameter description information of the target function interface includes: constructing parameter extraction prompts using the description information of each pronoun and the input parameters of the target function interface; and inputting the parameter extraction prompts into a pre-built parameter extraction model so that the parameter extraction model selects the target pronoun according to the parameter extraction prompts; wherein the parameter extraction prompts also include the task that the parameter extraction model needs to complete, the selection rules by which the parameter extraction model selects pronouns, and the output requirements for the output information format of the parameter extraction model; and the selection rules are to compare the parameter names contained in the input parameter description information with the semantics of each pronoun, and to determine the target pronoun according to the comparison results.

8. The method according to claim 1, characterized in that the step of calling the target function interface to obtain a result that matches the function description information of the target function interface includes: querying, from the target data source connected to the target function interface, the target field that matches the parameter category to which the input parameter belongs; filtering, from the data corresponding to the target field, target data that meets the filtering conditions of the input parameter, wherein the filtering conditions are obtained from the parameter description information of the target function interface; determining the data processing logic contained in the function description information of the target function interface, wherein the processing logic includes at least one of data verification, data format conversion, and data calculation; and processing the target data according to the processing logic to obtain the result.

9. An electronic device, characterized in that the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method according to any one of claims 1-8.

10. A computer storage medium, characterized in that the computer storage medium stores a computer program that enables a computer to perform the method according to any one of claims 1-8.

11. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method according to any one of claims 1-8.