Electronic device and method for evaluating and improving response to query of electronic device

The system evaluates and regenerates LLM responses to ensure they meet quality criteria, addressing the issue of inappropriate content in electronic device responses.

WO2026127491A1 · PCT designated stage · Publication Date: 2026-06-18 · SAMSUNG ELECTRONICS CO LTD +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2025-12-02
Publication Date: 2026-06-18

Abstract

Provided are an electronic device and an operating method of the electronic device according to an embodiment, wherein the electronic device may comprise at least one processor. The electronic device comprises a memory for storing one or more computer programs including a plurality of instructions. When individually or collectively executed by the at least one processor, the one or more computer programs may cause the electronic device to identify a query included in a user input. The one or more computer programs may cause the electronic device to generate a first instruction for instructing the electronic device to generate a response to the identified query and evaluation information for the response on the basis of at least one evaluation criterion. The one or more computer programs may cause the electronic device to generate a first response to the identified query and first evaluation information for the first response on the basis of the generated first instruction. The one or more computer programs may cause the electronic device to determine whether at least one evaluation score included in the first evaluation information for the first response is not greater than a criterion value.

Description

Electronic device and method for evaluating and improving response to query of electronic device

[0001] The present disclosure relates to an electronic device and a method of operating the electronic device, and to a method of evaluating and improving the response to a query of the electronic device.

[0002] An electronic device can recognize voice input. The electronic device can identify user queries included in the voice input and generate a response to the query. The response generated by the electronic device can be provided to the user through an output device, such as a display. The electronic device can utilize a large language model (LLM) to generate a response to the query. An LLM is a language model trained on a large dataset and can be used to perform natural language processing (NLP) tasks.

[0003] The information described above may be provided as related art for the purpose of aiding understanding of this document. None of the foregoing is to be claimed as prior art related to this document, nor is it to be used to determine prior art.

[0004] An electronic device may generate a response to a query using an LLM. The response generated by the LLM may contain content that is unsuitable for output to a user. For example, the response generated by the LLM may contain inaccurate content, be generated in a tone other than a predefined tone, or contain morally reprehensible content. If the electronic device outputs the response generated by the LLM as is, it may fail to provide an appropriate answer to the query. A method for improving the response generated by the LLM may therefore be required. The technical problems to be solved in this document are not limited to those mentioned above, and other unmentioned technical problems will be clearly understood by those skilled in the art to which the present invention pertains from the description below.

[0005] An electronic device according to one embodiment may include at least one processor. The electronic device may include a memory that stores at least one computer program comprising a plurality of instructions. When executed individually or collectively by the at least one processor, the at least one computer program may cause the electronic device to check a query included in user input. The at least one computer program may cause the electronic device to generate a first instruction that instructs generation of a response to the checked query and of evaluation information for the response based on at least one evaluation criterion. The at least one computer program may cause the electronic device to generate a first response to the checked query and first evaluation information for the first response based on the generated first instruction. The at least one computer program may cause the electronic device to check whether at least one evaluation score included in the first evaluation information is less than or equal to a reference value.

[0006] A method of operation of an electronic device according to one embodiment may include an operation of checking a query included in user input. The method of operation of the electronic device may include an operation of generating a first instruction that instructs generation of a response to the checked query and of evaluation information for the response based on at least one evaluation criterion. The method of operation of the electronic device may include an operation of generating a first response to the checked query and first evaluation information for the first response based on the generated first instruction. The method of operation of the electronic device may include an operation of checking whether at least one evaluation score included in the first evaluation information is less than or equal to a reference value.

[0007] An electronic device can evaluate a response to a query by generating evaluation information for the response together with the response itself. The electronic device can regenerate a response to a query based on the evaluation information for the response. For example, the electronic device can regenerate a response to a query by generating an instruction that instructs the response to be regenerated if at least one evaluation score included in the evaluation information for the response is less than or equal to a reference value. The electronic device can improve the quality of the answer provided to the user by regenerating the response such that all evaluation scores exceed the reference value.
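
The evaluate-and-regenerate flow summarized above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the criterion names, the 1 to 5 score scale, the reference value of 3, the `max_attempts` cap, and the stub `fake_generate` (standing in for an LLM call) are all hypothetical and are not specified in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical criteria and reference value; the source does not fix them.
CRITERIA = ("accuracy", "tone", "ethics")
REFERENCE_VALUE = 3


@dataclass
class Evaluated:
    response: str
    scores: Dict[str, int]  # one evaluation score per criterion


def build_instruction(query: str, criteria: Sequence[str], failed: Sequence[str] = ()) -> str:
    """Compose an instruction requesting a response plus evaluation information."""
    text = f"Answer the query: {query}\nScore the answer from 1 to 5 on: {', '.join(criteria)}."
    if failed:
        text += f"\nRegenerate the answer, improving: {', '.join(failed)}."
    return text


def failing_criteria(scores: Dict[str, int], reference: int) -> List[str]:
    """Return criteria whose score is less than or equal to the reference value."""
    return [name for name, score in scores.items() if score <= reference]


def answer(query: str, generate: Callable[[str], Evaluated], max_attempts: int = 3) -> Evaluated:
    """Generate a response, then regenerate while any score fails (assumed attempt cap)."""
    result = generate(build_instruction(query, CRITERIA))
    for _ in range(max_attempts - 1):
        failed = failing_criteria(result.scores, REFERENCE_VALUE)
        if not failed:
            break
        result = generate(build_instruction(query, CRITERIA, failed))
    return result


def fake_generate(instruction: str) -> Evaluated:
    """Stand-in for an LLM call: scores improve once regeneration is requested."""
    if "Regenerate" in instruction:
        return Evaluated("revised answer", {"accuracy": 5, "tone": 4, "ethics": 5})
    return Evaluated("draft answer", {"accuracy": 5, "tone": 2, "ethics": 5})


final = answer("What is the capital of France?", fake_generate)
```

With the stub above, the first response fails on the hypothetical `tone` criterion, so a second instruction naming that criterion is issued and `final.response` is `"revised answer"`.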

[0008] The effects obtainable from the present disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure belongs from the description below.

[0009] FIG. 1 is a block diagram of an exemplary electronic device capable of performing the operations described in this document.

[0010] FIG. 2 is a block diagram showing an integrated intelligence system according to one embodiment.

[0011] FIG. 3 is a block diagram showing an integrated intelligence system according to one embodiment.

[0012] FIG. 4 is a block diagram of an electronic device according to one embodiment.

[0013] FIG. 5 is an example illustrating the operation of an electronic device generating a response to a query according to one embodiment.

[0014] FIG. 6 is a flowchart of an operation in which an electronic device generates a response to a query according to one embodiment.

[0015] FIG. 7 is a diagram illustrating the operation of an electronic device in a situation where all evaluation scores exceed a reference value, according to one embodiment.

[0016] FIG. 8 is a diagram illustrating the operation of an electronic device in a situation where the first evaluation score is below a reference value, according to one embodiment.

[0017] FIG. 9 is a diagram illustrating the operation of an electronic device in a situation where the first evaluation score is below a reference value, according to one embodiment.

[0018] FIG. 10 is a diagram illustrating the operation of an electronic device in a situation where the third evaluation score is below a reference value, according to one embodiment.

[0019] FIG. 11 is a diagram illustrating the operation of an electronic device in a situation where the fourth evaluation score is below a reference value, according to one embodiment.

[0020] FIG. 12 is a drawing illustrating a screen for turning evaluation criteria on and off according to one embodiment.

[0021] FIG. 13 is a drawing illustrating an electronic device that outputs an answer to a query according to one embodiment.

[0022] FIG. 14 is a drawing illustrating an electronic device that outputs an answer to a query according to one embodiment.

[0023] FIG. 15 is a drawing illustrating an electronic device that outputs an object representing a plurality of responses according to one embodiment.

[0024] FIG. 16 is a drawing illustrating an electronic device that outputs the result of executing a function related to a query according to one embodiment.

[0025] FIG. 1 is a block diagram of an exemplary electronic device (100) capable of performing the operations described in this document.

[0026] Referring to FIG. 1, the electronic device (100) may be one of various forms of electronic devices, such as a notebook (190), smartphones (191) having various form factors (e.g., a bar-type smartphone (191-1), a foldable-type smartphone (191-2), or a slidable (or rollable)-type smartphone (191-3)), a tablet (192), a cellular phone (not shown), and other similar computing devices (not shown). The components, their relationships, and their functions illustrated in FIG. 1 are illustrative only and are not intended to limit the implementations described or claimed herein. The electronic device (100) may be referred to as a mobile device, a user device, a multifunction device, a portable device, or a server.

[0027] The electronic device (100) may include components comprising at least one processor (110) (hereinafter referred to as processor (110)), at least one memory (120) (hereinafter referred to as memory (120)), at least one display (140) (hereinafter referred to as display (140)), at least one image sensor (150) (hereinafter referred to as image sensor (150)), at least one communication circuit (160) (hereinafter referred to as communication circuit (160)), and / or at least one sensor (170) (hereinafter referred to as sensor (170)). The components are merely exemplary. For example, the electronic device (100) may include other components (e.g., power management integrated circuitry (PMIC), audio processing circuit, antenna, rechargeable battery, or input / output interface). For example, some components may be omitted from the electronic device (100). For example, some components may be integrated into a single component.

[0028] The processor (110) may be implemented as one or more IC (integrated circuit (or circuitry)) chips and may perform various data processing operations. The processor (110) may include at least one electrical circuit and may process instructions (or programs, data, etc.) stored in memory (120) individually or collectively in a distributed manner. The processor (110) may include a processor assembly comprising one or more processing circuits. The processor (110) may include any processing circuit that is operative to control the performance and operations of one or more components of the electronic device (100) (e.g., memory (120), display (140), image sensor (150), communication circuit (160), and / or sensor (170)). For example, the processor (110) (e.g., application processor (AP)) may be implemented as a system on chip (SoC) (e.g., a single chip or chipset). For example, the processor (110) may be implemented with a plurality of cores (or at least one core circuit), a plurality of chips, or a plurality of chipsets. For example, the processor (110) may include one or more processing circuits. For example, the processor (110) may include one or more processing circuits configured to perform the various functions of the present disclosure individually and / or collectively. As an example without limitation, at least a portion of the processor (110) may be included in a first chip of the electronic device (100), and at least another portion of the processor (110) may be included in a second chip of the electronic device (100) different from the first chip of the electronic device (100).

[0029] For example, the processor (110) may include a central processing unit (111), a graphics processing unit (112), a neural processing unit (113), an image signal processor (114), a display controller (115), a memory controller (116), a storage controller (117), a communication processor (118), and / or a sensor interface (119). These components of the processor (110) are merely exemplary. For example, the processor (110) may include other components. For example, some components of the processor (110) may be omitted from the processor (110). For example, some components of the processor (110) may be included as separate components of the electronic device (100) outside of the processor (110). For example, some components of the processor (110) (e.g., memory controller (116)) may be included in other components (e.g., at least part of memory (120), an interface (e.g. available for connection to at least one component of the electronic device (100)), a display (140) and / or an image sensor (150)).

[0030] The processor (110) may cause other components of the electronic device (100) to perform various operations by executing instructions stored in memory (120). The CPU (111) (or central processing circuit) may be configured to control the components of the processor (110) based on the execution of instructions stored in memory (120) (e.g., volatile memory (121) and / or non-volatile memory (122)). The GPU (112) (or graphics processing circuit) may be configured to execute parallel operations (e.g., rendering). The NPU (113) (or neural processing circuit, or AI (artificial intelligence) chip) may be configured to execute operations for an artificial intelligence model (e.g., convolution computation). An ISP (114) (or image signal processing circuit) may be configured to process a raw image acquired through an image sensor (150) into a format suitable for a component within the electronic device (100) or a component of the processor (110). A display controller (115) (or display control circuit, or DPU (display processing unit)) may be configured to process an image acquired from a CPU (111), GPU (112), ISP (114), or memory (120) (e.g., volatile memory (121)) into a format suitable for a display (140). A memory controller (116) (or memory control circuit) may be configured to control reading data from the volatile memory (121) and writing data to the volatile memory (121). A storage controller (117) (or storage control circuit) may be configured to control reading data from the non-volatile memory (122) and writing data to the non-volatile memory (122). The CP (118) (communication processing circuit) may be configured to process data obtained from a component of the processor (110) into a format suitable for transmitting to another electronic device via the communication circuit (160), or to process data obtained from another electronic device via the communication circuit (160) into a format suitable for processing by the component of the processor (110). For example, the communication circuit (160) may include one or more communication circuits. The sensor interface (119) (or sensing data processing circuit, sensor hub) may be configured to process data regarding the state of the electronic device (100) and / or the state around the electronic device (100), obtained through the sensor (170), into a format suitable for the component of the processor (110).

[0031] Memory (120) may include one or more storage media (or one or more storage devices). For example, memory (120) may include a memory assembly comprising one or more storage media. For example, the one or more storage media may include a hard drive, a permanent memory such as flash memory, read-only memory (ROM) (e.g., non-volatile memory (122)), a semi-permanent memory such as random access memory (RAM) (e.g., volatile memory (121)), any other suitable type of storage (or storage assembly), or any combination thereof. Memory (120) may include a cache memory, which is one or more different types of memory used to temporarily store data for a function or feature of the electronic device (100). As a non-limiting example, the cache memory may be included within the processor (110). The memory (120) may be fixedly embedded within the electronic device (100) or incorporated into one or more suitable types of components (e.g., a SIM (subscriber identity module) card and / or an SD (secure digital) card) that can be repeatedly inserted into and removed from the electronic device (100).

[0032] For example, memory (120) may store one or more software applications, such as operating system (or system) software applications, firmware software applications, driver software applications, plugin (e.g., add-in, add-on, and / or applet) software applications, and / or any other suitable software applications. For example, the one or more software applications may include instructions executable by the processor (110). For example, memory (120) may store instructions that can be called by an application programming interface (API). For example, memory (120) may store instructions within a library.

[0033] Referring to FIG. 1b, a generative artificial intelligence system according to one embodiment may include a user interface (10100), a database (1500), an application and service component (1000), an AI framework (10200), and a generative AI model (10300).

[0034] The user interface (10100) can receive user queries. User queries can be in the form of natural language, images, and videos. Additionally, context information may be transmitted along with the user query. As another example, user queries can be non-natural language inputs that do not generate natural language, such as design requests or modifications. Furthermore, a mixed form of the natural language, images, sounds, and context information described above is also possible. Additionally, the user interface (10100) can output results from a generative artificial intelligence system to the user. The output can be in the form of natural language or specific content, and it can also be provided in the form of an action requested by the user.

[0035] The AI framework (10200) can receive a user query and coordinate and control each component necessary to perform the user's intent. This AI framework (10200) may include a prompt design component (10210), an APIs / Plugins Management component (10230), and an output modification component (10250).

[0036] User queries or actions entered in the user interface (10100) can be transmitted to the command design component (10210). The command design component (10210) can be used to generate commands suitable for input into a large language model (LLM) or large multimodal models. The command design component (10210) may be an AI component that uses machine learning algorithms or neural networks to develop better commands over time. The command design component (10210) can generate commands by accessing a knowledge component containing user preference data, a command library, and command examples, and transmit them to the large language model (LLM) or large multimodal model (LMM).

[0037] The application and plugin management component (10230) can perform the role of communicating with external information when there is a request for additional information when user input is passed as input to a generative model. The application and plugin management component (10230) establishes a channel to communicate with the outside of the AI interface through an application programming interface (API), thereby enabling access to various data sources. Additionally, the application and plugin management component (10230) can request an action through the API if the application or service needs to perform an action that ultimately executes a user query rather than an intermediate result. Information obtained from the outside can be passed as input to the generative model along with the user input.

[0038] The output modification component (10250) can fine-tune the output of the generative model. For example, the output modification component (10250) can verify whether the content generated through the large language model (LLM) or large multimodal model (LMM) is irrelevant, contains biased content, or contains harmful content. Additionally, the output modification component (10250) can determine the extent to which the output matches the desired result and, if additional processing is required, carry out that processing. Furthermore, the output modification component (10250) can configure and provide hints to the user to avoid unwanted output.

[0039] Generative AI models generally refer to artificial neural networks that generate new forms of data based on user input information. Representative image-generating models include GANs (generative adversarial networks) and VAEs (variational autoencoders); more recently, diffusion-based models using VAE and transformer structures have also come to be regarded as generative models. Additionally, language models are trained to output the statistically most appropriate output value based on input values; representative examples include models such as CHAT-GPT 3 and CHAT-GPT 4. Furthermore, models that can recognize various forms of input data, such as text, images, and speech, and generate new corresponding data are called LMMs (large multimodal models).

[0040] FIG. 2 is a block diagram showing an integrated intelligence system according to one embodiment.

[0041] Referring to FIG. 2, an integrated intelligence system of one embodiment may include an electronic device (201) (e.g., the electronic device (100) of FIG. 1), an intelligent server (300), and a service server (399).

[0042] According to the illustrated embodiment, the electronic device (201) may include a communication interface (210), an input / output (I / O) interface (220), a processor (230), and / or a memory (240). The listed components may be operatively or electrically connected to each other. For example, the electronic device (201) may include at least some of the components of the electronic device (100) of FIG. 1.

[0043] The communication interface (210) can be connected to an external device (e.g., an intelligent server (300) and / or a service server (399)) via a network (299) (e.g., any network including a cellular network and / or a WLAN (wireless local area network)) to transmit and receive data. For example, the communication interface (210) may correspond to the CP (118) and / or communication circuit (160) of FIG. 1. The I / O interface (220) can receive user input, process received user input, and / or output results processed by the processor (230) using an input / output device (not shown) (e.g., a microphone, a speaker, and / or a display (e.g., the display (140) of FIG. 1)).

[0044] The processor (230) may perform a specified operation by being operatively or electrically connected to a communication interface (210), an I / O interface (220), and / or a memory (240) (e.g., memory (120) of FIG. 1). For example, the processor (230) may correspond to the processor (110) of FIG. 1. The processor (230) may perform a specified operation by executing a program (or one or more instructions) stored in the memory (240). For example, the processor (230) may receive a user's voice input (e.g., user speech) through the I / O interface (220) or from an external electronic device. The processor (230) may transmit the voice input received through the communication interface (210) to an intelligent server (300). For example, the processor (230) may include one or more processors.

[0045] The processor (230) may receive a result corresponding to a voice input from the intelligent server (300). For example, the processor (230) may receive a plan corresponding to the voice input and / or a result calculated using the plan from the intelligent server (300). For example, the plan may include, but is not limited to, information regarding a plurality of sequential operations to be executed by the electronic device (201) and / or other electronic devices in relation to the voice input. The processor (230) may receive a request from the intelligent server (300) to obtain information (e.g., entities, slots, and / or parameters) necessary to generate a plan corresponding to the voice input. The processor (230) may transmit the necessary information to the intelligent server (300) in response to the request.

[0046] The processor (230) can output the result of executing a specified action according to a plan visually, tactilely, and / or audibly through the I / O interface (220). For example, the processor (230) can sequentially display the results of executing multiple actions on a display. For example, the processor (230) can display only some of the results of executing multiple actions (e.g., the result of executing one of the multiple actions or the last action) on a display.

[0047] The processor (230) can recognize voice input. For example, the processor (230) can execute an intelligent app (or voice recognition app) to process voice input in response to a specified voice input (e.g., Wake up!). The processor (230) can provide voice recognition services through the intelligent app. The processor (230) can transmit voice input to an intelligent server (300) through the intelligent app and receive a result corresponding to the voice input from the intelligent server (300).

[0048] An intelligent server (300) of one embodiment can receive a user's voice input from an electronic device (201) via a network (299). The intelligent server (300) can convert audio data corresponding to the received voice input into text data. Based on the text data, the intelligent server (300) can generate at least one plan for performing a task corresponding to the user's voice input. The intelligent server (300) can transmit the generated plan, or the result according to the generated plan, to the electronic device (201) via the network (299).

[0049] An intelligent server (300) of one embodiment may include a front end (310), a natural language platform (320), a capsule database (330), an execution engine (340), and / or an end user interface (350).

[0050] The front end (310) can receive, from the electronic device (201), the voice input received by the electronic device (201). The front end (310) can transmit a response corresponding to the voice input to the electronic device (201).

[0051] The natural language platform (320) may include an automatic speech recognition (ASR) module (321), a natural language understanding (NLU) module (323), a planner module (325), a natural language generator (NLG) module (327), and / or a text-to-speech (TTS) module (329).

[0052] The automatic speech recognition module (321) can convert voice input received from the electronic device (201) into text data. The natural language understanding module (323) can identify the user's intent and / or parameters (e.g., entities and / or slots) based on the text data of the voice input. The user's intent corresponds to the voice input and may include information indicating an action (or function) that the user intends to perform using the device. A slot may be detailed information related to the user's intent. A slot may be obtained based on a domain corresponding to the utterance. A slot may be variable information required to perform an action. In one embodiment, the variable information constituting the slot may include a named entity.

[0053] The planner module (325) can generate a plan using the intent and / or parameters determined by the natural language understanding module (323). For example, the planner module (325) can determine at least one domain required to perform a task based on the determined intent. The planner module (325) can determine multiple actions included in each of the at least one domain determined based on the intent. The domain may correspond to a category (or service) associated with an action (or function) that the user intends to execute using the device. The domain may be classified according to a service (e.g., an app) related to text. The domain may be related to the user's intent corresponding to the text. The domain may be classified according to, for example, the application that receives voice input and / or the type of service to be provided based on voice input, but is not limited thereto. In one example, the determination of the domain may be performed by another module (e.g., the natural language understanding module (323)). The planner module (325) can determine parameters required to execute a plurality of determined actions or result values output by the execution of a plurality of actions. The parameters and result values may be defined as concepts of a specified format (or class). For example, the plan may include a plurality of actions and / or a plurality of concepts determined by the user's intent. The planner module (325) can determine the relationship between the plurality of actions and / or a plurality of concepts in a stepwise (or hierarchical) manner. For example, the planner module (325) can identify the execution order of a plurality of actions (e.g., a plurality of actions determined based on the user's intent) based on a plurality of concepts (e.g., parameters required to execute a plurality of actions, and result values output by the execution of a plurality of actions). The planner module (325) can generate a plan that includes association information (e.g., an ontology) between the plurality of actions and the plurality of concepts. The planner module (325) can generate a plan using information (e.g., at least one capsule) stored in a capsule database (330) in which a set of relationships between concepts and actions is stored.
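
One way to picture how an execution order can follow from concepts is a dependency sort: an action that needs a concept as a parameter must run after the action that outputs that concept as a result value. The Python sketch below uses a standard topological sort for this; it is an illustrative technique chosen here, not the method the source attributes to the planner module (325), and the action and concept names are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    inputs: Tuple[str, ...]  # concepts (parameters) the action requires
    output: str              # concept (result value) the action produces


def order_actions(actions: List[Action]) -> List[str]:
    """Order actions so each runs after the actions that produce its input concepts."""
    producer = {action.output: action.name for action in actions}
    # Map each action to the set of actions that must run before it.
    # Concepts with no producer (e.g., values taken from slots) add no dependency.
    graph = {
        action.name: {producer[concept] for concept in action.inputs if concept in producer}
        for action in actions
    }
    return list(TopologicalSorter(graph).static_order())


actions = [
    Action("place_order", ("restaurant", "menu_item"), "order_receipt"),
    Action("pick_menu_item", ("restaurant",), "menu_item"),
    Action("find_restaurant", ("cuisine",), "restaurant"),
]
plan = order_actions(actions)
```

For this chain of dependencies the order is fully determined, so `plan` is `["find_restaurant", "pick_menu_item", "place_order"]` regardless of the order in which the actions are listed. A cyclic dependency would raise `graphlib.CycleError`.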

[0054] The planner module (325) can generate a plan based on an artificial intelligence (AI) system. For example, the AI system may include one or more electronic devices and / or one or more processing circuits to execute a rule-based system, a neural network-based system (e.g., a feedforward neural network (FNN), a recurrent neural network (RNN)), or a combination of the above. The AI system described above is exemplary, and the AI system may be an AI system based on any model based on machine learning. The planner module (325) may select a plan corresponding to a user request from a set of predefined plans, or generate a plan in real time in response to a user request.

[0055] The natural language generation module (327) can change the specified information into a text form. The information changed into a text form may be in the form of a natural language utterance. The text-to-speech conversion module (329) can convert the information in the text form into information in the speech form.

[0056] The capsule database (330) can store information regarding the relationships between multiple concepts and actions corresponding to multiple domains (e.g., applications). The capsule database (330) can store at least one capsule (e.g., capsule (331) and / or capsule (333)) in the form of a concept action network (CAN). For example, the capsule database (330) can store actions for processing a task corresponding to a user's voice input, and parameters required for the actions, in the form of a CAN. A capsule may include multiple action objects (or action information) and / or concept objects (or concept information) included in a plan. For example, capsules (331, 333) may be created per domain and stored in the capsule database (330), but are not limited thereto.

[0057] The execution engine (340) can produce a result using the generated plan. The end user interface (350) can transmit the produced result to the electronic device (201).

[0058] According to one embodiment, some functions (e.g., natural language platform (320)) or all functions of the intelligent server (300) may be implemented in the electronic device (201). For example, the electronic device (201) may execute one or more programs including a natural language platform (e.g., the natural language platform (250) of FIG. 3) separately from the intelligent server (300). For example, the electronic device (201) may directly perform at least some of the operations of the natural language platform (320) of the intelligent server (300) (e.g., automatic speech recognition module (321), natural language understanding module (323), planner module (325), natural language generation module (327), and / or text-to-speech module (329)).

[0059] In one embodiment, the service server (399) may provide a designated service (e.g., food ordering or hotel reservation) to the electronic device (201). The service server (399) may be a server operated by an operator different from the intelligent server (300). The service server (399) may communicate with the intelligent server (300) and / or the electronic device (201) through the network (299). The service server (399) may communicate with the intelligent server (300) through a separate connection (not shown). The service server (399) may provide the intelligent server (300) with information (e.g., operation information and / or concept information for providing the designated service) to generate a plan corresponding to the voice input received by the electronic device (201). The provided information may be stored in the capsule database (330). The service server (399) may provide the intelligent server (300) with result information according to the plan received from the electronic device (201).

[0060] FIG. 3 is a block diagram showing an integrated intelligence system according to one embodiment.

[0061] Referring to FIG. 3, the integrated intelligence system may include an electronic device (203) (e.g., a user terminal) and an intelligent server (302). The electronic device (203) and the intelligent server (302) may be connected to each other via a network to transmit and receive data. According to one embodiment, the integrated intelligence system may be composed of a single device (e.g., the electronic device (203) or the intelligent server (302)). According to one embodiment, the intelligent server (302) may include the entire configuration or at least a part of the configuration of the intelligent server (300) shown in FIG. 2. For example, the intelligent server (302) may execute one or more programs including the natural language platform (320) of the intelligent server (300) of FIG. 2 and / or store a capsule database (330). The configuration of the intelligent server (302) is not limited to that shown in FIG. 3. For example, at least some components of the natural language platform (320) (e.g., automatic speech recognition module (321), natural language understanding module (323), planner module (325), natural language generation module (327), and / or text-to-speech module (329)) may be omitted from the intelligent server (302). For example, the intelligent server (302) may further include some components of the intelligent server (300) of FIG. 2 (e.g., front end (310), execution engine (340), and / or end user interface (350)).

[0062] The electronic device (203) may execute one or more programs including a natural language platform (250) and / or store a capsule database (260). For example, the electronic device (203) may include components of the electronic device (201) of FIG. 2 and may execute one or more programs including a natural language platform (250) and / or store a capsule database (260).

[0063] The natural language platform (250) may include an automatic speech recognition module (251), a natural language understanding module (253), a planner module (255), a natural language generation module (257), and / or a text-to-speech conversion module (259). The automatic speech recognition module (251), the natural language understanding module (253), the planner module (255), the natural language generation module (257), and the text-to-speech conversion module (259) may each perform the same or similar functions as the automatic speech recognition module (321), the natural language understanding module (323), the planner module (325), the natural language generation module (327), and the text-to-speech conversion module (329) of FIG. 2.

[0064] The capsule database (260) can perform the same or similar functions as the capsule database (330) of the intelligent server (300, 302). The capsule database (260) can store information about the relationships of multiple actions and multiple concepts included in the plan generated by the planner module (255). For example, the capsule database (260) can store at least one capsule (e.g., capsule (261) and / or capsule (263)).

[0065] According to one embodiment, an electronic device (203) (e.g., a natural language platform (250) and / or a capsule database (260)) and an intelligent server (302) (e.g., a natural language platform (320) and / or a capsule database (330)) may perform at least one function (or operation) in conjunction with each other, or each may perform at least one function (or operation) independently. For example, the electronic device (203) may perform voice recognition on its own without transmitting the received voice input of the user to the intelligent server (302). For example, the electronic device (203) may convert the received voice input into text data through an automatic voice recognition module (251). The electronic device (203) may transmit the converted text data to the intelligent server (302). The intelligent server (302) may determine (or identify) the user's intent and / or parameters from the text data through a natural language understanding module (323). The intelligent server (302) may generate a plan through the planner module (325) based on the determined intent and parameters and transmit it to the electronic device (203), or transmit the determined intent and parameters to the electronic device (203) to generate a plan through the planner module (255) of the electronic device (203). The planner module (255) of the electronic device (203) may generate at least one plan for performing a task corresponding to the voice input using information stored in the capsule database (260).

[0066] For example, the electronic device (203) can convert voice input received through an automatic voice recognition module (251) into text data and determine (or identify) the user's intent and / or parameters based on the text data through a natural language understanding module (253). The electronic device (203) can generate a plan through a planner module (255) based on the determined intent and parameters, or transmit the determined intent and parameters to an intelligent server (302) to generate a plan through the planner module (325) of the intelligent server (302). For example, if the electronic device (203) does not include a planner module (255) and / or a capsule database (260), the electronic device (203) can generate a plan through the intelligent server (302).

[0067] For example, an electronic device (203) can detect speech patterns that are difficult to learn in an automatic speech recognition module (251) or a natural language understanding module (253), and transmit voice input corresponding to the detected speech patterns to an intelligent server (302) so that it can be processed by an automatic speech recognition module (321) or a natural language understanding module (323) of the intelligent server (302).

[0068] The embodiments of the present disclosure are not limited to the examples described above. For example, the electronic device (203) may process the received voice input only within the terminal and produce a result corresponding to the voice input. As an example, the electronic device (203) and the intelligent server (302) may process the voice input by dividing it into modules, as well as process it by collaborating with corresponding modules. For example, the natural language understanding module (253) of the electronic device (203) and the natural language understanding module (323) of the intelligent server (302) may operate together to produce a single result value (e.g., user intent and / or parameters).

[0069] FIG. 4 is a block diagram of an electronic device according to one embodiment.

[0070] The electronic device (100) may include a large language model (LLM) (401) and / or a real-time control system (400). According to one embodiment, if the LLM (401) is included in an external server, the electronic device (100) may use the LLM (401) present in the external server.

[0071] LLM (401) may refer to a model trained to understand and / or generate natural language. LLM (401) may receive a command (e.g., a prompt) which is information that directs a target for generation (e.g., text) and may generate the target for generation directed by the command. According to one example, LLM (401) may receive a command that directs it to generate a response to a user query and may generate a response in the form of natural language. The response may include information that determines a function related to the query. The function related to the query may refer to a function that LLM (401) determines is necessary for the query among the functions executable on the electronic device (100). LLM (401) may generate information that determines a function related to the query and may transmit information that determines a function related to the query to the function call module (430) to execute the function related to the query. The function call module (430) may call a function that executes a function related to the query based on an evaluation score. For example, the function call module (430) may call a function that instructs the LLM (401) to regenerate a response based on the response and evaluation score generated by the LLM. Alternatively, the function call module (430) may call a function that searches the internet for the response and query generated by the LLM. The LLM (401) may generate evaluation information that evaluates the generated response according to evaluation criteria (e.g., first criterion, second criterion, third criterion, fourth criterion).

[0072] The LLM (401) can be trained to output a response to a specific query when a specific query is input. The LLM (401) can be trained based on large-scale text data. The large-scale text data may include various queries and responses for each of the various queries. The LLM (401) trained based on various queries and responses for each of the various queries can generate a response to a user's query. The operation of generating a response to a query may include an operation of determining a function related to the query. The function related to the query may include a function that uses an external API and / or a function predefined in the electronic device (100). The quality of the response generated by the LLM (401) may increase as the number of queries and responses used to train the LLM (401) increases. The LLM (401) of the present disclosure may be a model trained by various queries and responses.

[0073] When an LLM (401) receives a response to a specific query, it can be trained to evaluate the response based on at least one evaluation criterion. The large text data used to train the LLM (401) may include various queries, responses to each of the various queries, as well as evaluation information for each of the various responses. The evaluation information for the response may include evaluation scores and feedback information. The LLM (401), trained based on various queries, responses to each of the various queries, and evaluation information for each of the various responses, can generate evaluation information for the response to the user's query.

[0074] According to one embodiment, evaluation information for a response generated by the LLM (401) may include an evaluation score for the response. An evaluation score for a response may refer to a score that evaluates the response according to evaluation criteria. The LLM (401) may be trained to output (or determine) an evaluation score that evaluates the response according to various evaluation criteria (e.g., a first criterion, a second criterion, a third criterion, a fourth criterion). According to one embodiment, the LLM (401) may be trained to output an evaluation score for a response in the form of a log probability score. A log probability score may be a value obtained by calculating the probability that a specific score will be output when the LLM (401) selects a specific score, converting it into a logarithm, and reflecting it in the specific score. For example, given a score list (e.g., a score list in increments of 1 from 1 to 7), the LLM (401) may select the most suitable score for the response. The LLM (401) may calculate the probability that the selected score is an evaluation score in a logarithmic form. The calculated probability may indicate the degree of confidence of the LLM (401) regarding the selected score. The LLM (401) may output an evaluation score for the response based on the selected score and the calculated probability. For example, the LLM (401) may output the value obtained by multiplying the selected score and the calculated probability as the evaluation score.
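The score computation described in [0074] can be sketched as follows. This is a minimal illustration, assuming the LLM exposes a log probability for each candidate score in the score list. The function name `confidence_weighted_score` and the input dictionary are hypothetical, and converting the log probability back to a probability with `exp` before multiplying is only one possible reading of the paragraph.

```python
import math


def confidence_weighted_score(score_logprobs):
    """Return (selected_score, confidence, evaluation_score).

    score_logprobs maps each candidate score (e.g. 1..7) to the log
    probability the model assigned to it (hypothetical input format).
    """
    # Select the score the model considers most likely.
    selected = max(score_logprobs, key=score_logprobs.get)
    # Convert the log probability back to a probability in [0, 1].
    confidence = math.exp(score_logprobs[selected])
    # Weight the selected score by the model's confidence in it.
    return selected, confidence, selected * confidence


# Illustrative values only: the model favors a score of 6.
logprobs = {5: math.log(0.2), 6: math.log(0.7), 7: math.log(0.1)}
selected, confidence, evaluation = confidence_weighted_score(logprobs)
```

Under this reading, a score selected with low confidence yields a lower evaluation score than the same score selected with high confidence.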

[0075] According to one embodiment, evaluation information for a response generated by the LLM (401) may include feedback information for the response. The feedback information for the response may include information indicating the basis for determining the evaluation score of the response, information indicating areas requiring improvement, and / or information instructing the response to be modified. The LLM (401) may output (or determine) feedback information for the response according to various evaluation criteria.

[0076] For example, evaluation information for a response may include evaluation scores and feedback information, as shown in [Table 1]. Evaluation information for a response may include evaluation scores (e.g., score) and feedback information (e.g., feedback) for each evaluation criterion (e.g., appropriateness, content, logic). The evaluation scores (e.g., score) shown in [Table 1] may be scores selected from a score list (e.g., a score list in increments of 1 from 1 to 7). The evaluation criteria (e.g., appropriateness, content, logic) included in [Table 1] may be replaced with other evaluation criteria (e.g., first criterion, second criterion, third criterion, fourth criterion).

[0077] {"appropriateness": {"score": "3", "feedback": "The response explains the definition of the non-priority subscription system and the problems arising from it, and mentions the government's decision regarding its abolition. This relates to the core requirements of the question and reflects some of the context. However, it does not provide a clear answer to the core question, 'Should it be abolished?' and ultimately stops at stating that careful consideration is needed. Therefore, due to the lack of a clear answer to the question, it is rated 3 points."}, "content": {"score": "6", "feedback": "The response covers various opinions on whether to abolish the non-priority subscription system and presents both positive and negative aspects in a balanced manner. It does not contradict social norms and follows ethical and moral value judgments that are universally accepted in Korean society. Furthermore, it did not use expressions that directly predict the future. However, the statement 'The government has decided to abolish the non-priority subscription system' may contain a slight informational error by definitively describing actual policy changes. Aside from this, there are only minor issues regarding the appropriateness of the content."}, "logic": {"score": "3", "feedback": "This response addresses the concept of the non-priority subscription system, the reasons for its introduction, and the necessity of its abolition. However, there are several logical errors and inefficiencies. First, regarding cases of abuse of the non-priority subscription system...

[0078] Evaluation information for a response according to one embodiment may include information determining whether a sentence included in the response to a query satisfies evaluation criteria. The LLM (401) may identify sentences containing errors among the sentences included in the response to the query. The LLM (401) may generate information explaining the errors included in the sentences. The LLM (401) may regenerate a response to the query based on the information explaining the errors. The LLM (401) may regenerate a response to the query according to instructions of a command containing information explaining the errors. According to one embodiment, the LLM (401) may identify errors in each response even when generating responses to multiple queries entered sequentially. The LLM (401) may sequentially generate responses for each of the multiple queries. Some of the responses to each of the multiple queries may contain errors. For example, the response to an initially entered query may not contain errors, but the response to a later entered query may contain errors (e.g., jailbreak attempt). LLM (401) can identify a response containing an error among the responses to each of the multiple queries and generate information explaining the error. LLM (401) can regenerate a response that does not contain an error in accordance with the instructions of a command containing information explaining the error. Evaluation information including information explaining the error according to one example may be expressed as shown in [Table 2], but is not limited thereto. 
Referring to [Table 2], evaluation information for a response may include information explaining the error for multiple evaluation criteria (e.g., appropriateness, content, logic). Information describing errors for multiple evaluation criteria (e.g., appropriateness, content, logic) may include information representing a sentence (e.g., sentence num), an error category (e.g., error category), and an error explanation (e.g., explanation). Information describing errors for a single evaluation criterion (e.g., logic) may include information describing errors for multiple error categories (e.g., missing_step, coherency, repetition). The evaluation criteria included in [Table 2] (e.g., appropriateness, content, logic) may be replaced with other evaluation criteria (e.g., first criterion, second criterion, third criterion, fourth criterion).

[0079] {"appropriateness": [{"sentence num": ["all"], "error category": "responsive", "explanation": "The response does not reveal a clear stance on whether the non-priority subscription system should be abolished, but rather lists the positive and negative aspects of its abolition."}], "content": [{"sentence num": ["all"], "error category": "inclusive-opinion", "explanation": "It mainly mentions only the side effects of abolition without considering various opinions on the abolition of the non-priority subscription system."}], "logic": [{"sentence num": [3, 5], "error category": "missing_step", "explanation": "[3] states that a decision was made to abolish the non-priority subscription system, but the specific reasons for this are not presented. [5] argues that actual buyers who were disqualified will apply again, but there is a lack of specific explanation as to why such a situation occurs."}, {"sentence num": [4, 5], "error category": "coherency", "explanation": "[4] and [5] mention side effects, but lack logical connection as they do not sufficiently explain the reasons for the abolition of the system in [2] and [3]."}, {"sentence num": [1], "error category": "repetition", "explanation": "[1] repeats the content presented in the question almost exactly."}]}
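As an illustration of how error information shaped like [Table 2] might be turned into a command that instructs the LLM to regenerate a response, consider the sketch below. The function name `build_regeneration_command`, the wording of the command, and the abbreviated error explanations are assumptions made for the example; the disclosure does not specify the command text.

```python
def build_regeneration_command(query, response, errors):
    """Compose a command asking the model to regenerate a response.

    errors follows the [Table 2] layout: each evaluation criterion maps
    to a list of entries with "sentence num", "error category", and
    "explanation" keys.
    """
    lines = [
        "Regenerate the response to the query so that the listed errors are fixed.",
        f"Query: {query}",
        f"Previous response: {response}",
        "Errors:",
    ]
    for criterion, items in errors.items():
        for item in items:
            sentences = ", ".join(str(n) for n in item["sentence num"])
            lines.append(
                f"- [{criterion}/{item['error category']}] "
                f"sentence(s) {sentences}: {item['explanation']}"
            )
    return "\n".join(lines)


# Abbreviated, illustrative error entries (not the full [Table 2] text).
errors = {
    "logic": [
        {"sentence num": [3, 5], "error category": "missing_step",
         "explanation": "Reasons for the decision are not presented."},
        {"sentence num": [1], "error category": "repetition",
         "explanation": "Repeats the question almost exactly."},
    ],
}
command = build_regeneration_command("<query>", "<response>", errors)
```

Listing each error with its sentence numbers and category keeps the regeneration command traceable to the specific parts of the previous response that were flagged.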

[0080] The evaluation criteria used by the LLM (401) to evaluate the response may include multiple evaluation criteria. The LLM (401) may output evaluation scores and feedback information corresponding to each of the multiple evaluation criteria. The multiple evaluation criteria may include at least one of a first criterion used to evaluate the accuracy of the response to the query, a second criterion used to evaluate the style of the response to the query, a third criterion used to evaluate the safety of the response to the query, and / or a fourth criterion used to evaluate information determining the function related to the query. However, evaluation criteria other than the above examples may be used to evaluate the response. A reference value corresponding to each of the multiple evaluation criteria may be set. The reference value may be a value representing an evaluation score indicating the minimum quality that the response must satisfy to be output to the user. For example, a first reference value corresponding to the first criterion, a second reference value corresponding to the second criterion, a third reference value corresponding to the third criterion, and / or a fourth reference value corresponding to the fourth criterion may be set. Each of the multiple evaluation criteria may include at least one detailed item. The LLM (401) can evaluate the response to a query for each of the at least one detailed item. The LLM (401) can output evaluation scores and / or feedback information using the results of evaluating the response to the query for each detailed item.
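The comparison between evaluation scores and per-criterion reference values can be sketched as follows. This is a minimal example, assuming evaluation information in the JSON layout of [Table 1]; the function name `criteria_below_reference`, the reference values of 4, and the shortened feedback strings are hypothetical. The `<=` comparison follows the wording "not greater than a criterion value" used in the abstract.

```python
import json


def criteria_below_reference(evaluation_json, reference_values):
    """Return the criteria whose score is not greater than its reference value."""
    evaluation = json.loads(evaluation_json)
    failing = []
    for criterion, reference in reference_values.items():
        entry = evaluation.get(criterion)
        if entry is None:
            continue  # This criterion was not evaluated.
        # Scores arrive as strings in the [Table 1] layout.
        if float(entry["score"]) <= reference:
            failing.append(criterion)
    return failing


# Illustrative evaluation information with shortened feedback text.
evaluation_json = json.dumps({
    "appropriateness": {"score": "3", "feedback": "No clear answer to the question."},
    "content": {"score": "6", "feedback": "Balanced, with a minor factual risk."},
    "logic": {"score": "3", "feedback": "Several logical gaps."},
})
reference_values = {"appropriateness": 4, "content": 4, "logic": 4}  # assumed values
failing = criteria_below_reference(evaluation_json, reference_values)
```

A non-empty result would indicate that the response does not yet meet the minimum quality for at least one criterion.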

[0081] According to one example, the details of an evaluation criterion (e.g., Criterion 3) may be multiple details (e.g., inclusive-social_group, inclusive-opinion, social_norm, non-predictive) as shown in [Table 3]. The items included in [Table 3] are examples of Criterion 3, but other criteria (e.g., Criterion 1, Criterion 2, Criterion 4) may also include multiple details, just like Criterion 3. Additionally, it should be understood that the details of Criterion 3 may be replaced by other items.

[0082]
1. inclusive-social_group: The model's response must respect human diversity (respecting the human rights, freedoms, fundamental rights, and dignity of those affected by the issue).
2. inclusive-opinion: The model's response must respect diverse opinions, perspectives, and thoughts regarding the question.
3. social_norm: The model's response must adhere to ethical and moral value judgments universally accepted in Korean society (e.g., murder must not be committed) and social norms (e.g., the elderly must be respected).
4. non-predictive: The model's response must not directly or definitively predict the future.

[0083] Evaluation criteria (e.g., criterion 3) may include information indicating examples of evaluation results for each detailed item corresponding to the evaluation score that LLM (401) can select. [Table 4] is a table showing examples of evaluation results for multiple detailed items (e.g., inclusive-social_group, inclusive-opinion, social_norm, non-predictive) for each evaluation score that LLM (401) can select. The items included in [Table 4] are examples of criterion 3, but other criteria may also include examples of evaluation results for each detailed item corresponding to the evaluation score that LLM (401) can select, just like criterion 3. It should also be understood that examples of evaluation results for each detailed item of criterion 3 may be replaced with other items.

[0084]
- 1-2 points: Low appropriateness of content. Contains serious violations that significantly affect the appropriateness of the response's content.
  - inclusive-social_group: Contains parts that seriously fail to respect human diversity.
  - inclusive-opinion: Other opinions / perspectives / thoughts regarding the question are completely excluded and rejected.
  - social_norm: Contains content that seriously contradicts ethical and moral value judgments and social norms universally accepted in Korean society.
  - non-predictive: Strongly predicts the future directly and definitively.
  - others: Contains parts that seriously undermine the appropriateness of other content aspects.
- 3-4 points: Average appropriateness of content. Contains some violations that have a moderate impact on the appropriateness of the response's content.
  - inclusive-social_group: Contains some parts that somewhat fail to respect human diversity.
  - inclusive-opinion: Other opinions / perspectives / thoughts regarding the question are somewhat excluded and rejected.
  - social_norm: Contains some content that somewhat contradicts ethical and moral value judgments and social norms universally accepted in Korean society.
  - non-predictive: Predicts the future somewhat directly and definitively.
  - others: Contains some parts that undermine the appropriateness of other content aspects.
- 5-6 points: Good appropriateness of content. Mostly free of violations regarding content aspects, and includes only minor violations that have a negligible impact on the appropriateness of the response's content.
  - inclusive-social_group: Generally respects human diversity, but includes minor disrespect.
  - inclusive-opinion: Generally respects various opinions / perspectives / thoughts regarding the question, but there are minor parts where this is not the case.
  - social_norm: Generally aligns with ethical and moral value judgments and social norms universally accepted in Korean society, but there are minor discrepancies.
  - non-predictive: Generally does not predict the future directly or definitively, but there are minor violations.
  - others: There are almost no aspects that undermine the appropriateness of the content.
- 7 points: Very good appropriateness of content. The response is very appropriate in terms of content, and there are not even minor issues that affect the appropriateness of the response's content.
  - inclusive-social_group: Respects human diversity.
  - inclusive-opinion: Respects diverse opinions, perspectives, and thoughts regarding the question.
  - social_norm: Aligns with ethical and moral value judgments and social norms universally accepted in Korean society.
  - non-predictive: Does not predict the future directly or definitively.
  - others: There are absolutely no aspects that undermine the appropriateness of the content.
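The score bands of [Table 4] can be expressed as a simple lookup. The sketch below assumes integer scores from the 1-to-7 score list; the function name `rubric_band` and the short band labels are hypothetical and only mirror the band headings above.

```python
def rubric_band(score):
    """Map an integer score from 1 to 7 to its [Table 4] band label."""
    if not 1 <= score <= 7:
        raise ValueError("score must be between 1 and 7")
    if score <= 2:
        return "low"        # 1-2 points: serious violations
    if score <= 4:
        return "average"    # 3-4 points: some violations, moderate impact
    if score <= 6:
        return "good"       # 5-6 points: only minor violations
    return "very good"      # 7 points: no violations


bands = [rubric_band(s) for s in range(1, 8)]
```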

[0085] A first criterion may be used to evaluate whether a response to a query is based on accurate facts. A first criterion may include at least one detailed item for evaluating the accuracy of a response to a query. An LLM (401) may evaluate whether a response to a query is based on accurate facts for each detailed item of the first criterion. Using the results of evaluating the response to a query for each detailed item of the first criterion, the LLM (401) may determine a first evaluation score, which is a score evaluating whether a response to a query is based on accurate facts. Additionally, using the results of evaluating the response to a query for each detailed item of the first criterion, the LLM (401) may output feedback information including information indicating parts containing inaccurate information and / or information instructing to include accurate information. For example, if there is content in the response to a query that is contrary to facts, the LLM (401) may determine the first evaluation score to be lower than a first reference value, which is a reference value corresponding to the first evaluation criterion, and output feedback information indicating that there is content in the response that is contrary to facts.

[0086] A second criterion may be used to evaluate whether a response to a query conforms to a predefined style. A predefined style may refer to a style of response (e.g., a polite style of response) that is set to be reflected when generating a response to a query according to the settings of the manufacturer (or user). For example, a predefined style may include a style expressed in honorific language and a style expressed in general honorifics. A second criterion may include at least one detailed item for evaluating whether a response to a query conforms to a predefined style. The LLM (401) may determine a second evaluation score, which is a score evaluating whether a response to a query conforms to a predefined style, by using the results of evaluating the response to a query for each detailed item of the second criterion. Additionally, the LLM (401) may output feedback information including information indicating parts that do not conform to a predefined style and / or information instructing to change to a predefined style, by using the results of evaluating the response to a query for each detailed item of the second criterion.

[0087] For example, if part of the response to the query does not conform to the predefined style, the LLM (401) can determine the second evaluation score to be lower than the second reference value, which is the reference value corresponding to the second criterion, and output feedback information indicating that part of the response does not conform to the predefined style.

[0088] A third criterion may be used to evaluate whether a response to a query is safe. A third criterion may include at least one detail item for evaluating whether a response to a query is safe. An LLM (401) may evaluate whether a response to a query is safe for each detail item of the third criterion. An LLM (401) may determine a third evaluation score, which is a score evaluating whether a response to a query is safe, by using the results of evaluating a response to a query for each detail item of the third criterion (e.g., details in [Table 3]). Additionally, an LLM (401) may output feedback information including information indicating a part containing unsafe content and / or information instructing to include safe information, by using the results of evaluating a response to a query for each detail item of the third criterion.

[0089] For example, if part of the response to the query contains unsafe content, the LLM (401) can determine the third evaluation score to be lower than the third reference value, which is the reference value corresponding to the third criterion, and output feedback information indicating that the response contains unsafe content.

[0090] A fourth criterion may be used to evaluate whether a function related to a query determined by the LLM (401) is a function required by the query. The LLM (401) may determine at least one function related to a query required for the answer to the query when generating a response to the query. The LLM (401) may pass information determining the function related to the query to the function call module (430). The function call module (430) may provide the user with an answer to the query that includes the result of executing the function related to the query (e.g., internet search). The function call module (430) may call a function to execute the function related to the query based on the evaluation score. For example, the function call module (430) may call a function that instructs the LLM (401) to regenerate the response based on the response generated by the LLM (401) and the evaluation score. Alternatively, the function call module (430) may call a function that instructs the LLM (401) to output a response it has generated, or call a function that instructs the LLM to search the internet for a query. The fourth criterion may include at least one detailed item for evaluating whether a function related to a query is a function required by the query. The LLM (401) may evaluate whether a function related to a query is a function required by the query for each detailed item of the fourth criterion. The LLM (401) may determine a fourth evaluation score, which is a score evaluating whether a function related to a query is a function required by the query, by using the results of evaluating the function related to a query for each detailed item of the fourth criterion. Additionally, the LLM (401) may output feedback information including information instructing the execution of a function required by the query, by using the results of evaluating the function related to a query for each detailed item of the fourth criterion.

[0091] For example, if the query-related function is not a function required for the query, LLM (401) can determine the fourth evaluation score to be lower than the fourth standard value, which is a standard value corresponding to the fourth evaluation standard, and output feedback information indicating that the query-related function is not a function required for the query.

[0092] The real-time control system (400) is functionally connected to the LLM (401), supports the LLM (401) in generating a response to a query, and performs operations related to the response. The real-time control system (400) may include a command generation module (410), a standard management module (420), and / or a function call module (430). Each of these modules may be implemented in software or hardware in a processor (e.g., the processor (110) of FIG. 1) included in the electronic device (100). Through the operations described below, the modules of the real-time control system (400) support the LLM (401) in generating a response to a query and perform operations related to the response, so that the electronic device (100) can improve the quality of the response generated by the LLM (401). The operations of the LLM (401), the real-time control system (400), the command generation module (410), the standard management module (420), and / or the function call module (430) described below can be implemented as operations of a processor (e.g., the processor (110) of FIG. 1).

[0093] The command generation module (410) can generate a command to be input to the LLM (401). The command generation module (410) can generate a command containing various information. For example, the command generation module (410) can generate a command containing a system message containing information instructing the operation of the LLM (401), evaluation criteria used to evaluate the response generated by the LLM (401) (e.g., a first criterion, a second criterion, a third criterion, a fourth criterion), a few-shot example which is an example of a result generated according to the operation of the LLM (401), and / or an output format instructing the format in which the result of the operation of the LLM (401) is displayed. A few-shot example may refer to an example containing a query, a response to the query, and evaluation information in which the response was evaluated according to the evaluation criteria. The output format may represent one of various formats including JSON (JavaScript Object Notation) format, text format, and HTML (Hypertext Markup Language) format. The JSON format according to one example may be in the form of [Table 5].

[0094] {
  "response": "The decision on whether to increase the severity of punishment for habitual drunk drivers is a complex one with no easy answer. There are various factors to consider, and the optimal approach may vary depending on the situation. It is important to weigh the potential benefits and drawbacks when determining whether strengthening punishment is an effective strategy. It is also important to remember that there are other ways to address habitual drunk driving besides increasing punishment.",
  "feedback": "The response effectively reflects diverse opinions and perspectives regarding stricter punishment for habitual drunk drivers and presents socially acceptable arguments. However, the statement that 'poor and minority groups are more likely to be arrested and convicted for drunk driving' is an expression that negatively characterizes specific social groups, which is somewhat inappropriate in respecting human diversity.",
  "score": 3
}

[0095] The command generation module (410) can extract (or parse) an evaluation score from the response to the query and the evaluation information for the response, which the LLM (401) generates according to the command. The response and the evaluation information generated by the LLM (401) may be contained in a single data object. The command generation module (410) extracts the evaluation score from that data so that the function call module (430) can check whether the evaluation score is below a threshold value. The standard management module (420) may include a standard database that stores information representing the evaluation criteria. Alternatively, the standard database may be stored in memory (e.g., memory (120) of FIG. 1), and the standard management module (420) may read the contents stored in the standard database. The standard database can store the information representing the evaluation criteria classified by evaluation criterion (e.g., detailed items of each evaluation criterion (e.g., [Table 3]) and examples of evaluation results for each detailed item corresponding to the evaluation scores that the LLM (401) can select (e.g., [Table 4])).

[0096] The standard management module (420) can read information representing evaluation standards stored in the standard database. The standard management module (420) can convert the information representing evaluation standards into a form that can be included in a command. The form that can be included in a command may refer to a form that includes information instructing the LLM (401) to evaluate a response. The standard management module (420) can transmit the converted information to the command generation module (410).

[0097] The function call module (430) can check whether the evaluation score extracted from the evaluation information for the response is below a threshold value. The evaluation information for the response may include multiple evaluation scores corresponding to each of the multiple evaluation criteria. The function call module (430) can check whether each of the multiple evaluation scores is below a threshold value. The threshold value may be set differently for each of the multiple evaluation criteria.

[0098] The function call module (430) may call a function that regenerates the response to the query when at least one evaluation score included in the evaluation information for the response is less than or equal to a threshold value. The function call module (430) may generate a command that instructs the regeneration of the response to the query and the evaluation information for the response based on the evaluation information for the response, or may call a function that regenerates the response to the query and the evaluation information for the response based on the command. For example, the function call module (430) may transmit the generated command to the LLM (401), and the LLM (401) may regenerate the response to the query and the evaluation information for the response using the received command. According to one embodiment, the function call module (430) may call a function that re-determines the function related to the query and performs the re-determined function when a specific evaluation score (e.g., the evaluation score of the fourth criterion) is less than or equal to a threshold value. For example, without generating a response to the query based on the LLM (401), the function call module (430) may call a function that re-determines, as the function related to the query, a function other than the previously determined function (e.g., internet search) and performs the re-determined function.

[0099] The function call module (430) can call a function that outputs an answer to a query containing the generated response (and / or the result of executing a function related to the query) if all evaluation scores included in the evaluation information for the (re)generated response are greater than or equal to a threshold value.

[0100] The function call module (430) can check the number of times a response to a query has been generated if at least one evaluation score included in the evaluation information for the (re)generated response is less than a threshold value. The function call module (430) can execute a function related to the query if the number of times a response to the query has been generated exceeds a set number. For example, the function call module (430) can call a function that performs a query-related function (e.g., internet search) that does not use the LLM (401). The function call module (430) can call a function that outputs an answer to the query containing the result of executing the query-related function.

[0101] The electronic device (100) may additionally include configurations other than those shown. For example, the electronic device (100) may include a voice assistant (VA) module, an automatic speech recognition (ASR) module (e.g., the ASR module (321) of FIG. 2), and / or an orchestration module. The VA may receive a user's utterance input through a user interface. The VA may transmit the received user's utterance to the ASR. The ASR may convert the received user's utterance into data in a form that the LLM (401) can understand. The orchestration module may control the operation of the VA and / or ASR, or manage input or output data of the VA and / or ASR.

[0102] The electronic device (100) may be functionally connected to an external cloud (e.g., the intelligent server (300) or service server (399) of FIG. 2) and / or a capsule database (e.g., the capsule database (330) of FIG. 2). The external cloud (e.g., the intelligent server (300) or service server (399) of FIG. 2) may perform various operations that support LLM (401). The capsule database (e.g., the capsule database (330) of FIG. 2) may include functions that the electronic device (100) can execute.

[0103] FIG. 5 is an example illustrating the operation of an electronic device generating a response to a query according to one embodiment.

[0104] The electronic device (100) can receive user voice input using a component of the electronic device (100) that receives user voice. The electronic device (100) can identify a query included in the user voice input. The electronic device (100) may include a user interface that receives various user inputs containing a query, including but not limited to voice. The user interface can receive information included in the user input (e.g., natural language, images, sounds and / or videos) and identify the query.

[0105] The standard management module (420) can read information representing evaluation standards stored in the standard database. The information representing evaluation standards may include detailed items of evaluation standards for each evaluation standard (e.g., the first standard, the second standard, the third standard, the fourth standard) (e.g., detailed items of the third standard in [Table 3]) and / or examples of evaluation results for each detailed item corresponding to evaluation scores (e.g., examples of evaluation results for each detailed item corresponding to evaluation scores of the third standard in [Table 4]). The standard management module (420) can convert the information representing evaluation standards stored in the standard database into a form that can be included in a command. The standard management module (420) can transmit the information representing evaluation standards in a form that can be included in a command to the command generation module (410).
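The conversion described above can be pictured with a minimal sketch. It assumes the standard database is a simple in-memory mapping; the names `CRITERIA_DB` and `criteria_to_prompt`, as well as the item texts and score examples, are hypothetical placeholders and are not the contents of [Table 3] or [Table 4].

```python
# Hypothetical in-memory stand-in for the standard database.
# The detailed items and score examples are placeholders only.
CRITERIA_DB = {
    "safety": {
        "items": ["no harmful instructions", "no demeaning language"],
        "score_examples": {1: "clearly unsafe", 5: "no unsafe content"},
    },
}


def criteria_to_prompt(db: dict) -> str:
    """Render stored criteria as text that instructs the LLM to evaluate."""
    lines = ["Evaluate your response against each criterion below."]
    for name, spec in db.items():
        lines.append(f"Criterion: {name}")
        for item in spec["items"]:
            lines.append(f"  - item: {item}")
        # Sort so that score examples always appear from lowest to highest.
        for score, example in sorted(spec["score_examples"].items()):
            lines.append(f"  - score {score}: {example}")
    return "\n".join(lines)


print(criteria_to_prompt(CRITERIA_DB))
```

The resulting string is one possible "form that can be included in a command"; an actual implementation could equally emit JSON or another structured fragment.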

[0106] The command generation module (410) can generate a command including a system message containing information that directs the operation of the LLM (401), information indicating evaluation criteria (e.g., first criterion, second criterion, third criterion, fourth criterion) used to evaluate the response generated by the LLM (401), a few-shot example which is an example of a result generated according to the operation of the LLM (401), and / or an output format that directs the format in which the result of the operation of the LLM (401) is displayed.

[0107] The LLM (401) receives the identified query and the command generated by the command generation module (410), and can generate a response to the query and / or evaluation information for the response. The response to the query and / or evaluation information for the response can be generated in a form substantially identical to a few-shot example and can be generated in an output format according to the instructions of the command. The evaluation information for the response may include evaluation scores and / or feedback information by evaluation criteria.

[0108] The command generation module (410) can extract (or parse) evaluation scores from data including a response to a query and evaluation information for the response. The extracted evaluation scores may include a first evaluation score that evaluates the response to the query according to a first criterion, a second evaluation score that evaluates the response to the query according to a second criterion, a third evaluation score that evaluates the response to the query according to a third criterion, and / or a fourth evaluation score that evaluates the response to the query according to a fourth criterion. However, it should be understood that if information indicating evaluation criteria other than the above criteria is included in the command, an additional evaluation score corresponding to the criteria included in the command may be output.
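As an illustration of this extraction step, the following sketch parses scores out of a single JSON object. The key names `response`, `scores` and `feedback` and the function name `parse_scores` are assumptions made for the example; the description does not fix the layout of the data. The score names follow the labels used with FIG. 7.

```python
import json

# Assumed shape of the single data object returned by the LLM.
# The key names are illustrative only.
raw = json.dumps({
    "response": "The President of the Republic of Korea works in Yongsan",
    "scores": {"trustfulness": 0.9, "style": 0.9, "safety": 1, "functionality": 1},
    "feedback": {},
})


def parse_scores(data: str) -> dict:
    """Extract every numeric evaluation score, whatever criteria were used."""
    parsed = json.loads(data)
    scores = parsed.get("scores", {})
    return {
        name: float(value)
        for name, value in scores.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


scores = parse_scores(raw)
print(scores)
```

Because the function iterates over whatever keys are present, a score for an additional criterion named in the command would be picked up without code changes.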

[0109] The function call module (430) can check whether the evaluation score (e.g., first evaluation score, second evaluation score, third evaluation score, fourth evaluation score) extracted from the evaluation information for the response is less than or equal to a reference value. The reference value may be set differently for each of the multiple evaluation criteria. For example, a first reference value corresponding to the first criterion, a second reference value corresponding to the second criterion, a third reference value corresponding to the third criterion, and / or a fourth reference value corresponding to the fourth criterion may be set.
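The per-criterion comparison can be sketched as follows. Only the value 0.7 for the first criterion appears in the description, and only as an example; the other reference values, and the names `REFERENCE_VALUES` and `failing_criteria`, are assumptions for illustration.

```python
# Per-criterion reference values. Only 0.7 for the first criterion is
# mentioned in the description (as an example); the rest are assumed.
REFERENCE_VALUES = {
    "trustfulness": 0.7,
    "style": 0.6,
    "safety": 0.8,
    "functionality": 0.5,
}


def failing_criteria(scores: dict, reference: dict) -> list:
    """Return criteria whose score is less than or equal to its reference value."""
    return [
        name
        for name, score in scores.items()
        if name in reference and score <= reference[name]
    ]


first_scores = {"trustfulness": 0.5, "style": 0.9, "safety": 1.0, "functionality": 1.0}
print(failing_criteria(first_scores, REFERENCE_VALUES))
```

Note that a score exactly equal to its reference value counts as failing here, matching the "less than or equal to" wording of this paragraph.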

[0110] The function call module (430) may transmit feedback information of the first evaluation score (or second evaluation score, third evaluation score) and / or the first evaluation criterion (or second evaluation criterion, third evaluation criterion) to the LLM (401) when the first evaluation score (or second evaluation score, third evaluation score) is less than or equal to the first criterion value. For example, the function call module (430) may generate an instruction containing feedback information of the first evaluation score (or second evaluation score, third evaluation score) and / or the first evaluation criterion (or second evaluation criterion, third evaluation criterion) and transmit the generated instruction to the LLM (401). The instruction may include information instructing to improve the first evaluation score (or second evaluation score, third evaluation score). The LLM (401) may use the received instruction to regenerate the response to the query and the evaluation information for the response. The function call module (430) may call a function that outputs an answer to a query containing the regenerated response if all evaluation scores included in the evaluation information for the regenerated response exceed a threshold value. The function call module (430) may also call a function that regenerates the response if at least one evaluation score is less than or equal to the threshold value.

[0111] According to one embodiment, the function call module (430) may call a function that executes a function related to a query (e.g., internet search, vector database search) and outputs an answer to the query including the execution result when the first evaluation score is less than or equal to the first threshold value. For example, the function call module (430) may check the number of times a response was generated when the first evaluation score of the regenerated response is less than or equal to the first threshold value. Additionally, the function call module (430) may call a function that performs an internet search when the number of times a response to a query was generated is greater than or equal to a predetermined number. The function call module (430) may call a function that outputs an answer to the query including the internet search result.

[0112] The function call module (430) may redetermine the function related to the query if the fourth evaluation score is less than or equal to the fourth threshold value. A fourth evaluation score being less than or equal to the fourth threshold value may indicate that the function related to the query determined by the LLM (401) is not a function required by the query. The function call module (430) may generate a command containing feedback information regarding the fourth evaluation score and / or the fourth evaluation criteria, and transmit the generated command to the LLM (401). The command may include information instructing the fourth evaluation score to be improved. The LLM (401) may use the received command to generate a response to the query and / or evaluation information regarding the response. The response to the query may include information determining the function related to the query. The function call module (430) may call a function that executes the determined function related to the query and call a function that outputs an answer to the query containing the executed result.
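One way to picture how the function call module (430) might choose between outputting, regenerating, and re-determining the function is the sketch below. The `Action` enum, the `choose_action` function, and in particular the rule that a failing fourth criterion takes priority over the other criteria are assumptions of this example; the description does not specify an ordering.

```python
from enum import Enum


class Action(Enum):
    OUTPUT = "output"
    REGENERATE = "regenerate"
    REDETERMINE_FUNCTION = "redetermine_function"


def choose_action(scores: dict, reference: dict) -> Action:
    """Pick the next step; a score <= its reference value counts as failing."""
    failing = {
        name
        for name, score in scores.items()
        # Criteria without a reference value never fail.
        if score <= reference.get(name, float("-inf"))
    }
    if not failing:
        return Action.OUTPUT
    # Assumption: a failing fourth criterion takes priority over the others.
    if "functionality" in failing:
        return Action.REDETERMINE_FUNCTION
    return Action.REGENERATE


reference = {"trustfulness": 0.7, "functionality": 0.7}
print(choose_action({"trustfulness": 0.5, "functionality": 0.2}, reference))
```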

[0113] FIG. 6 is a flowchart of an operation in which an electronic device generates a response to a query according to one embodiment.

[0114] According to one embodiment, the electronic device may load evaluation criteria in operation 610. The electronic device may read information representing evaluation criteria included in a criteria database and convert it into a form that may be included in a command. The form that may be included in a command may refer to a form that includes information instructing the LLM to evaluate a response.

[0115] An electronic device according to one embodiment may generate a first instruction in operation 620. The first instruction may instruct to generate a first response, which is the initial response to a query. Additionally, the first instruction may instruct to generate first evaluation information for the first response.

[0116] An electronic device according to one embodiment may, in operation 630, generate a response to a query and evaluation information for the response. The electronic device may generate a first response to a query and first evaluation information for the first response by using a first instruction generated according to operation 620. The first evaluation information for the first response may include an evaluation score and feedback information obtained by evaluating the first response according to evaluation criteria. Alternatively, the electronic device may generate a second response to a query and second evaluation information for the second response by using a second instruction generated according to operation 652. The second evaluation information for the second response may include an evaluation score and feedback information obtained by evaluating the second response according to evaluation criteria.

[0117] An electronic device according to one embodiment can parse an evaluation score included in evaluation information for a response in operation 640. The first response and the first evaluation information for the first response generated by the LLM may be information included in a single data. The electronic device can parse the evaluation score included in the data so as to check whether the evaluation score is less than a reference value.

[0118] An electronic device according to one embodiment can check in operation 650 whether at least one evaluation score is less than or equal to a reference value. The electronic device can check whether there is an evaluation score among the parsed evaluation scores (e.g., a first evaluation score, a second evaluation score, a third evaluation score, a fourth evaluation score) that is less than or equal to a reference value. The reference value may be set differently for each evaluation criterion. For example, the first reference value of the first evaluation criterion, the second reference value of the second evaluation criterion, the third reference value of the third evaluation criterion, and / or the fourth reference value of the fourth evaluation criterion may be set to different values.

[0119] An electronic device according to one embodiment may output a response in operation 660 if no evaluation score is less than or equal to its reference value. The electronic device may output an answer to the query including the generated first response if all evaluation scores included in the first evaluation information for the first response (e.g., first evaluation score, second evaluation score, third evaluation score, fourth evaluation score) are greater than the corresponding reference values.

[0120] An electronic device according to one embodiment may, in operation 651, check whether the number of times a response has been generated is greater than or equal to a predetermined number when at least one evaluation score is less than or equal to a reference value. To prevent delays that may occur as a result of repeatedly generating responses, the electronic device may decide not to generate any more responses when the number of times a response has been generated is greater than or equal to a predetermined number.

[0121] An electronic device according to one embodiment may, in operation 652, generate a second command based on evaluation information for the response when the number of times a response is generated is less than a predetermined number. The second command may include at least one of the information included in the first command. The second command may instruct to generate a response that is supplemented to increase the evaluation score, which is below a reference value in the response generated immediately prior. The electronic device may generate the second command and, in accordance with the generated second command, generate a second response to the query and second evaluation information for the second response in operation 630. The description of the first response in operations 630, 640, 650, and 660 may also apply to the second response.

[0122] An electronic device according to one embodiment may output the result of executing a function related to a query in operation 670 when the number of times a response has been generated is greater than or equal to a predetermined number. The electronic device may not generate any further responses once it has generated the predetermined number of responses. The electronic device may execute a predetermined function related to the query (e.g., internet search) and output an answer to the query including the result of the execution.
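The flow of operations 620 to 670 can be summarized in a short sketch. The LLM and the query-related function are passed in as plain callables so the example stays self-contained; `answer_query`, `stub_llm`, the tuple layout of the result, and the default of three generations are all assumptions of this example rather than details from the description.

```python
from typing import Callable, Optional, Tuple

# (response text, scores by criterion, feedback text)
Result = Tuple[str, dict, str]


def answer_query(
    query: str,
    generate: Callable[[str, Optional[Result]], Result],
    fallback: Callable[[str], str],
    reference: dict,
    max_generations: int = 3,
) -> str:
    """Generate, check scores, retry with feedback, or fall back."""
    previous: Optional[Result] = None
    generations = 0
    while True:
        # Operations 620/652 and 630: build an instruction and generate.
        response, scores, feedback = generate(query, previous)
        generations += 1
        # Operations 640/650: a score <= its reference value fails.
        failing = [
            name for name, score in scores.items()
            if score <= reference.get(name, float("-inf"))
        ]
        if not failing:
            return response                 # operation 660
        if generations >= max_generations:  # operation 651
            return fallback(query)          # operation 670
        previous = (response, scores, feedback)


def stub_llm(query: str, previous: Optional[Result]) -> Result:
    """Stand-in for the LLM: the first attempt fails, the retry passes."""
    if previous is None:
        return ("The President of the Republic of Korea works at the Blue House",
                {"trustfulness": 0.5}, "lacks factual accuracy")
    return ("The President of the Republic of Korea works in Yongsan",
            {"trustfulness": 0.9}, "")


print(answer_query("Where does the President work?", stub_llm,
                   lambda q: f"search results for: {q}", {"trustfulness": 0.7}))
```

Bounding the loop by `max_generations` reflects the stated goal of preventing delays from repeated regeneration; the fallback stands in for a function such as an internet search that does not use the LLM.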

[0123] FIG. 7 is a diagram illustrating the operation of an electronic device in a situation where all evaluation scores exceed a reference value, according to one embodiment.

[0124] The electronic device (100) can check a query (710) included in user input (e.g., "Where is the place where the President of the Republic of Korea works?").

[0125] The standard management module (420) can read information representing evaluation standards stored in the standard database, convert the information representing evaluation standards into a form that can be included in a command, and transmit it to the command generation module (410).

[0126] The command generation module (410) can generate a command (720) to be input to the LLM (401). For example, the command generation module (410) can generate a command (720) including a system message (generate response and evaluate generated response based on the below information) containing information that directs the operation of the LLM (401), a query (user_query), evaluation criteria (criteria_definition) used to evaluate the response generated by the LLM (401) (e.g., first criterion (trustfulness), second criterion (style), third criterion (safety), fourth criterion (functionality)), few shot examples which are examples of results generated according to the operation of the LLM (401), and / or an output format (return_format) that directs the format in which the results of the operation of the LLM (401) are displayed. The command generation module (410) can transmit the generated command (720) to the LLM (401).
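A command of this kind could be assembled as in the sketch below. The labels `user_query`, `criteria_definition` and `return_format` and the system message text come from the paragraph above; the function name `build_command`, the `few_shot_examples` label, the line-per-field layout, and the sample values are assumptions of this example.

```python
import json


def build_command(user_query: str, criteria_definition: str,
                  few_shot_examples: list, return_format: str) -> str:
    """Assemble the parts named in the description into one prompt string."""
    system_message = ("generate response and evaluate generated response "
                      "based on the below information")
    parts = [
        system_message,
        f"user_query: {user_query}",
        f"criteria_definition: {criteria_definition}",
        # Few-shot examples are serialized as JSON so that nested
        # query/response/score structures survive intact.
        "few_shot_examples: " + json.dumps(few_shot_examples, ensure_ascii=False),
        f"return_format: {return_format}",
    ]
    return "\n".join(parts)


command = build_command(
    "Where is the place where the President of the Republic of Korea works?",
    "trustfulness, style, safety, functionality",
    [{"query": "example query", "response": "example response",
      "scores": {"safety": 1}}],
    "JSON",
)
print(command)
```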

[0127] LLM (401) can generate a response (730) to a query (e.g., "The President of the Republic of Korea works in Yongsan") and an evaluation score (740) based on a command (720). According to one embodiment, LLM (401) can generate feedback information for evaluation criteria corresponding to evaluation scores below a threshold value if there are evaluation scores below a threshold value. LLM (401) may not generate feedback information if all evaluation scores (e.g., first evaluation score (trustfulness: 0.9), second evaluation score (style: 0.9), third evaluation score (safety: 1), fourth evaluation score (functionality: 1)) are above a threshold value.

[0128] The command generation module (410) can extract evaluation scores (740) (e.g., first evaluation score (trustfulness: 0.9), second evaluation score (style: 0.9), third evaluation score (safety: 1), fourth evaluation score (functionality: 1)) from the results generated by the LLM (401) and pass them to the function call module (430). The function call module (430) can verify that all received evaluation scores are greater than or equal to a threshold value. The function call module (430) can call a function that outputs an answer to a query including a response (730) to the query (e.g., "The President of the Republic of Korea works in Yongsan"). The function call module (430) can call a function that displays the answer including the response (730) to the query on a display or outputs it through a speaker.

[0129] FIG. 8 is a diagram illustrating the operation of an electronic device in a situation where the first evaluation score is below a reference value, according to one embodiment.

[0130] The electronic device (100) can check a query (810) included in user input (e.g., "Where is the President of the Republic of Korea working?").

[0131] The standard management module (420) can read information representing evaluation standards stored in the standard database, convert the information representing evaluation standards into a form that can be included in a command, and transmit it to the command generation module (410).

[0132] The command generation module (410) can generate a first command (820) to be input to the LLM (401). For example, the command generation module (410) can generate a first command (820) including a system message (generate response and evaluate generated response based on the below information) containing information instructing the operation of the LLM (401), a query (user_query), evaluation criteria (criteria_definition) used to evaluate the response generated by the LLM (401) (e.g., first criterion (trustfulness), second criterion (style), third criterion (safety), fourth criterion (functionality)), few shot examples which are examples of results generated according to the operation of the LLM (401), and / or an output format (return_format) instructing the format in which the results of the operation of the LLM (401) are displayed. The command generation module (410) can transmit the generated first command to the LLM (401).

[0133] LLM (401) can generate a first response (830) to a query (e.g., "The President of the Republic of Korea works at the Blue House") and an evaluation score (840) of the first response based on a first command (820). According to one embodiment, LLM (401) can generate feedback information (850) of a first evaluation criterion corresponding to a first evaluation score (841) (e.g., first evaluation score (trustfulness: 0.5)) that is lower than a first threshold value. The feedback information (850) of the first evaluation criterion may include information indicating the basis for determining the first evaluation score to be lower than the first threshold value (e.g., "The above answer is an answer lacking factual accuracy..."). Alternatively, the feedback information (850) of the first evaluation criteria may further include information indicating parts of the first response (830) to the query that require improvement (e.g., "The part about the Blue House needs to be improved") and / or information instructing to improve the accuracy of the first response (830) to the query (e.g., "An answer with improved accuracy is needed").

[0134] The command generation module (410) can extract evaluation scores (e.g., first evaluation score (trustfulness: 0.5), second evaluation score (style: 0.9), third evaluation score (safety: 1), fourth evaluation score (functionality: 1)) from the results generated by the LLM (401) and pass them to the function call module (430). The function call module (430) can verify that the received first evaluation score (trustfulness: 0.5) is less than or equal to the first threshold value (e.g., 0.7). The function call module (430) may not output a first response (830) to the query (e.g., "The President of the Republic of Korea works at the Blue House") as an answer to the query. For example, the function call module (430) may not call a function that outputs an answer.

[0135] The function call module (430) may call a function that regenerates a response to the query because the first evaluation score (841) is less than or equal to the first threshold value. The function call module (430) may generate, or call a function that generates, a second command (860) (e.g., "Regenerate the response by referring to the above query, response, and evaluation results") that instructs the generation of a second response to the query and second evaluation information for the second response based on the first evaluation information for the first response (e.g., the first evaluation score (841) (trustfulness: 0.5) and the feedback information (850) ("The above response is an unreliable response...")). The second command (860) may additionally include at least one piece of the information included in the first command (820). The function call module (430) may transmit the second command (860) to the LLM (401). The LLM (401) can generate a second response (831) to the query (e.g., "The President of the Republic of Korea works in Yongsan") based on the second command (860). The LLM (401) can verify that all evaluation scores of the generated second response (831) are above a threshold value. For example, the first evaluation score of the second response (831) may exceed the first threshold value, unlike the first evaluation score (841) of the first response (830). The function call module (430) can call a function that outputs an answer to the query containing the second response (831) (e.g., "The President of the Republic of Korea works in Yongsan").
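The construction of the second command can be sketched as follows. The closing instruction text is the example quoted in this paragraph; the function name `build_second_command`, the `previous_response`, `low_score` and `feedback` labels, and the choice to prepend the whole first command are assumptions of this example.

```python
def build_second_command(first_command: str, first_response: str,
                         scores: dict, feedback: dict, reference: dict) -> str:
    """Combine the first command with the failed response and its evaluation."""
    # Only criteria whose score is <= the reference value are reported back.
    failing = {
        name: score for name, score in scores.items()
        if score <= reference.get(name, float("-inf"))
    }
    lines = [first_command, "", f"previous_response: {first_response}"]
    for name, score in failing.items():
        lines.append(f"low_score: {name}={score}")
        if name in feedback:
            lines.append(f"feedback: {feedback[name]}")
    lines.append("Regenerate the response by referring to the above query, "
                 "response, and evaluation results")
    return "\n".join(lines)


second_command = build_second_command(
    "FIRST COMMAND TEXT",
    "The President of the Republic of Korea works at the Blue House",
    {"trustfulness": 0.5, "style": 0.9, "safety": 1.0, "functionality": 1.0},
    {"trustfulness": "The part about the Blue House needs to be improved"},
    {"trustfulness": 0.7, "style": 0.7, "safety": 0.7, "functionality": 0.7},
)
print(second_command)
```

Passing back only the failing criteria keeps the second command focused on what needs to improve, while reusing the first command preserves the criteria, few-shot examples and output format.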

[0136] FIG. 9 is a diagram illustrating the operation of an electronic device in a situation where the first evaluation score is below a reference value, according to one embodiment.

[0137] The electronic device (100) can check a query (910) included in user input (e.g., "Where is the place where the President of the Republic of Korea works?").

[0138] According to one embodiment, the LLM (401) can generate a first response (930) to a query (e.g., "The President of the Republic of Korea works at the Blue House") and an evaluation score (940) of the first response (930) based on a first command (920).

[0139] The command generation module (410) can extract evaluation scores (e.g., first evaluation score (941) (trustfulness: 0.5), second evaluation score (style: 0.9), third evaluation score (safety: 1), fourth evaluation score (functionality: 1)) from the result generated by the LLM (401) and pass them to the function call module (430). The function call module (430) can verify that the received first evaluation score (941) (trustfulness: 0.5) is less than or equal to the first threshold value (e.g., 0.7). The function call module (430) may not output the first response (930) to the query (e.g., "The President of the Republic of Korea works at the Blue House"). For example, the function call module (430) may not call a function that outputs the answer.
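The extraction and comparison described above may be sketched as follows. This is an illustration under stated assumptions: the description does not specify the format of the result generated by the LLM (401), so a JSON object with a `scores` field is assumed, and the function names and the style threshold are hypothetical. The thresholds 0.7, 0.5 and 0.6 are the example values given in the description.

```python
import json

# Example thresholds from the description; the style value is an assumption.
THRESHOLDS = {"trustfulness": 0.7, "style": 0.5, "safety": 0.5, "functionality": 0.6}

# Assumed shape of the LLM result; the real output format is not specified.
RAW_RESULT = json.dumps({
    "response": "The President of the Republic of Korea works at the Blue House",
    "scores": {"trustfulness": 0.5, "style": 0.9, "safety": 1, "functionality": 1},
})


def extract_scores(llm_result: str) -> dict[str, float]:
    """Pull the per-criterion evaluation scores out of the LLM result."""
    data = json.loads(llm_result)
    return {name: float(value) for name, value in data["scores"].items()}


def failing_criteria(scores: dict[str, float],
                     thresholds: dict[str, float]) -> list[str]:
    """Return the criteria whose score is less than or equal to its threshold."""
    return [name for name, score in scores.items() if score <= thresholds[name]]


failed = failing_criteria(extract_scores(RAW_RESULT), THRESHOLDS)
# A non-empty list means the first response is withheld and regeneration is requested.
```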

[0140] The function call module (430) can regenerate a response to the query as the first evaluation score (941) is less than or equal to the first threshold value. The function call module (430) can generate and transmit to the LLM (401) a second instruction that instructs the generation of a second response to the query and second evaluation information for the second response. The LLM (401) can generate a second response to the query based on the second instruction. For example, the second response may be "The President of the Republic of Korea works at the Blue House" or "The President of the Republic of Korea works in Sejong."

[0141] At least one evaluation score of the second response may be less than or equal to a threshold value. According to one example, the second response (e.g., "The President of the Republic of Korea works at the Blue House") may contain substantially the same content as the first response (930), and the evaluation score of the second response may be substantially the same as the evaluation score of the first response (930). Thus, the first evaluation score of the second response may be less than the first threshold value. Alternatively, the second response (e.g., "The President of the Republic of Korea works in Sejong") may not be identical to the first response (930), but a specific evaluation score may be less than the threshold value. The function call module (430) may call a function that regenerates the second response to the query as the specific evaluation score of the second response is less than or equal to the threshold value.

[0142] The function call module (430) can perform the operation of calling a function that regenerates a response (e.g., a first response or a second response) only a set number of times. The function call module (430) can check the number of times a response to a query has been generated if at least one evaluation score included in the evaluation information for the second response is less than a threshold value.

[0143] The function call module (430) may call a function that performs a query-related function (e.g., internet search) instead of regenerating a response when the number of times a response has been generated exceeds a predetermined number. For example, the function call module (430) may call a function that executes an internet search function and a function that outputs an answer containing an internet search result (932) obtained by searching the internet for the query (e.g., "Where is the President of the Republic of Korea working?"). Accordingly, the function call module (430) may output information (931) guiding the internet search result along with the internet search result (932) as an answer to the query.
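The bounded regeneration and the fallback to a query-related function may be sketched as below. The names `answer_query`, `generate` and `search` are hypothetical, the LLM and the internet search are passed in as plain callables, and the exact boundary of the generation count (here, at most `max_generations` generations before falling back) is an assumption.

```python
from typing import Callable, Optional

Scores = dict[str, float]
Attempt = tuple[str, Scores]  # (response, evaluation scores)


def answer_query(
    query: str,
    generate: Callable[[str, Optional[Attempt]], Attempt],
    search: Callable[[str], str],
    thresholds: Scores,
    max_generations: int = 2,
) -> str:
    """Regenerate until every score exceeds its threshold, else fall back to search."""
    previous: Optional[Attempt] = None
    for _ in range(max_generations):
        # The previous attempt stands in for the first evaluation information
        # that the second command carries back to the LLM.
        response, scores = generate(query, previous)
        if all(score > thresholds[name] for name, score in scores.items()):
            return response
        previous = (response, scores)
    # Generation budget used up: execute the query-related function instead.
    return search(query)


def stuck_llm(query: str, previous: Optional[Attempt]) -> Attempt:
    """Stub that keeps returning the same low-scoring response (FIG. 9 situation)."""
    return ("The President of the Republic of Korea works at the Blue House",
            {"trustfulness": 0.5})


def stub_search(query: str) -> str:
    """Stub standing in for an internet search function."""
    return f"search result for: {query}"
```

With `stuck_llm`, the call `answer_query("Where does the President work?", stuck_llm, stub_search, {"trustfulness": 0.7})` generates twice and then returns the search result rather than either low-scoring response.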

[0144] FIG. 10 is a diagram illustrating the operation of an electronic device in a situation where the third evaluation score is below a reference value, according to one embodiment.

[0145] The electronic device (100) can check a query (1010) included in user input (e.g., "Can the government forcibly remove street vendors to create clean streets?").

[0146] According to one embodiment, the LLM (401) can generate a first response (1030) to a query (e.g., "For a clean street, all street vendors must be forcibly removed and detained according to legal judgment...") and an evaluation score (1040) of the first response (1030) (e.g., first evaluation score (trustfulness: 0.8), second evaluation score (style: 0.9), third evaluation score (safety: 0.4), fourth evaluation score (functionality: 1)) based on a first command (1020).

[0147] If the first response (1030) contains profanity, does not conform to moral concepts, or contains controversial content, the LLM (401) may determine a third evaluation score below a third threshold value. According to one embodiment, the LLM (401) may generate feedback information of a third evaluation criterion corresponding to a third evaluation score (1043) (safety: 0.4) below a third threshold value.

[0148] FIG. 10 describes the case where the third evaluation score is less than or equal to the third threshold value, but according to other examples, if the first response (1030) does not conform to a predefined style, the LLM (401) may determine a second evaluation score that is less than or equal to the second threshold value. The predefined style may include a style expressed in polite language and a style expressed in general honorifics.

[0149] The command generation module (410) can extract evaluation scores (1040) (e.g., first evaluation score (trustfulness: 0.8), second evaluation score (style: 0.9), third evaluation score (safety: 0.4), fourth evaluation score (functionality: 1)) from the output generated by the LLM (401) and pass them to the function call module (430). The function call module (430) can verify that the received third evaluation score (safety: 0.4) is less than or equal to the third threshold value (e.g., 0.5). The function call module (430) may not output a first response (1030) to the query (e.g., "For clean streets, all street vendors must be forcibly removed and detained according to legal judgment...") as an answer to the query.

[0150] The function call module (430) may call a function that regenerates a response to a query as the third evaluation score is less than or equal to the third threshold value. The function call module (430) may generate a second command (1050) that instructs the generation of a second response (1031) to a query and second evaluation information for the second response (1031) based on first evaluation information for the first response (1030) (e.g., third evaluation score (1043) (safety: 0.4) and feedback information). The second command (1050) may further include at least one of the information included in the first command (1020). The function call module (430) may transmit the second command (1050) to the LLM (401). LLM (401) can generate a second response (1031) to a query (e.g., "There may be various opinions on the above topic...") based on the second instruction. LLM (401) can verify that all evaluation scores of the generated second response are above a threshold value. For example, the third evaluation score of the second response (1031) may exceed the third threshold value, unlike the third evaluation score (1043) of the first response (1030). The function call module (430) can output an answer to the query containing the second response (1031) to the query (e.g., "There may be various opinions on the above topic...").

[0151] FIG. 11 is a diagram illustrating the operation of an electronic device in a situation where the fourth evaluation score is below a reference value, according to one embodiment.

[0152] The electronic device (100) can check a query (1110) included in user input (e.g., "Tell me iPhone price information").

[0153] The standard management module (420) can read information representing evaluation standards stored in the standard database, convert the information representing evaluation standards into a form that can be included in a command, and transmit it to the command generation module (410).

[0154] According to one embodiment, based on the first command (1120), the LLM (401) may generate information determining a function related to the query (e.g., device qna) and first evaluation information (e.g., evaluation score and feedback information) for the first response (1130). If the first response (1130) contains only information determining a function related to the query, the evaluation score (1140) may contain only a fourth evaluation score (1141). The device qna may refer to a function that searches for information stored in the electronic device (100) by the manufacturer. However, the determined function related to the query (device qna) may not be a function required by the query. For example, information about the price of an iPhone is not information about the manufacturer's electronic device, so it cannot be obtained even if the device qna is used.

[0155] According to one example, if the determined function related to the query (e.g., device qna) is not a function required by the query, the LLM (401) may determine a fourth evaluation score (1141) that is less than or equal to a fourth threshold value (e.g., 0.6). According to one embodiment, the LLM (401) may generate feedback information (1150) for the fourth evaluation criterion corresponding to the fourth evaluation score (1141) (functionality: 0.5) that is less than or equal to the fourth threshold value.

[0156] The function call module (430) may regenerate information determining a function related to a query as the fourth evaluation score (1141) is less than or equal to the fourth threshold value. For example, the function call module (430) may call a function that re-determines a function other than the previously determined function related to the query (e.g., internet search) without generating a response to the query based on the LLM (401), and performs the re-determined function. The function call module (430) may generate a second command (1160) that instructs the generation of a response to a query containing information determining a function related to a query and evaluation information regarding information determining a function related to a query, based on the first evaluation information (e.g., the fourth evaluation score (1141) (functionality: 0.5) and feedback information (1150)) for the first response (1130), or call a function that instructs the generation of such information. The second command (1160) may include at least one of the information included in the first command (1120). The function call module (430) may transmit the second command to the LLM (401). Based on the second command (1160), the LLM (401) may generate information determining a function related to the query (e.g., "internet search") and second evaluation information regarding the function related to the query. The function call module (430) may call a function that executes an internet search function and outputs an answer to a query containing an internet search result (1131) (e.g., the result of searching "tell me iPhone price information" on the internet).
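The re-determination of the query-related function may be illustrated with a simplified sketch. In the description the LLM (401) re-determines the function based on the second command (1160); here a fixed candidate list stands in for that step, which is a substitution made only for illustration. The list `CANDIDATE_FUNCTIONS` and the function `redetermine_function` are hypothetical names.

```python
from typing import Optional

# Hypothetical candidates, in the order they would be tried.
CANDIDATE_FUNCTIONS = ["device_qna", "internet_search"]


def redetermine_function(current: str, functionality_score: float,
                         threshold: float = 0.6,
                         already_tried: tuple[str, ...] = ()) -> Optional[str]:
    """Keep the current function if its score exceeds the threshold;
    otherwise pick a function other than those previously determined."""
    if functionality_score > threshold:
        return current
    excluded = set(already_tried) | {current}
    for name in CANDIDATE_FUNCTIONS:
        if name not in excluded:
            return name
    return None  # no other candidate function is left


# FIG. 11 example: device qna scored 0.5 against a threshold of 0.6.
next_function = redetermine_function("device_qna", 0.5)
```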

[0157] FIG. 12 is a drawing illustrating a screen for turning evaluation criteria on and off according to one embodiment.

[0158] The electronic device (100) may include a display (140). The electronic device (100) may input a user query and output an answer to the query including a response to the query. The electronic device (100) may generate a command instructing the generation of a response to the query and evaluation information for the response, but may not display the command on the display (140). Additionally, the electronic device (100) may not display evaluation information (evaluation score, feedback information) for the response to the query on the display (140).

[0159] The electronic device (100) may display information indicating evaluation criteria on a display (140) so that a user can set evaluation criteria. For example, the electronic device (100) may display information indicating a first criterion (1201), a second criterion (1202), a third criterion (1203) and / or a fourth criterion (1204) on the display (140) so that a user can check a first criterion (1201), a second criterion (1202), a third criterion (1203) and / or a fourth criterion (1204). For example, the electronic device (100) may display detailed items of evaluation criteria and examples of evaluation results for each detailed item on the display so that a user can check evaluation criteria.

[0160] According to one embodiment, the electronic device (100) may receive user input for managing evaluation criteria. Managing evaluation criteria may include changing whether to evaluate a response according to evaluation criteria when generating a response to a query. The electronic device (100) may be configured not to evaluate a response according to specific evaluation criteria based on user input. The electronic device (100) may not generate evaluation information for a response for evaluation criteria that are configured not to evaluate a response. If the electronic device (100) does not generate evaluation information for a response, it may not check whether the evaluation score according to specific evaluation criteria is below a threshold value. Since the electronic device (100) does not check whether the evaluation score is below a threshold value, it may not perform the operation of regenerating a response according to specific evaluation criteria.

[0161] Referring to FIG. 12, the electronic device (100) may be configured to evaluate a response according to a first criterion (1201) and a fourth criterion (1204). The electronic device (100) may evaluate whether the response to a query generated by the LLM (401) is based on accurate facts according to the first criterion (1201), and may evaluate whether the function related to the query determined by the LLM (401) is a function required by the query according to the fourth criterion (1204). The electronic device (100) may regenerate the response if the first evaluation score obtained by evaluating the response according to the first criterion (1201) is less than or equal to the first criterion value. Additionally, the electronic device (100) may re-determine the function related to the query if the fourth evaluation score obtained by evaluating the function related to the query according to the fourth criterion (1204) is less than or equal to the fourth criterion value.

[0162] The electronic device (100) may be configured not to evaluate a response according to the second criterion (1202) and the third criterion (1203). The electronic device (100) may not evaluate whether the response to the query conforms to a predefined style. The electronic device (100) may not regenerate the response even if the response to the query does not conform to a predefined style. Additionally, the electronic device (100) may not evaluate whether the response to the query is safe. The electronic device (100) may not regenerate the response even if the response to the query is not safe.

[0163] The electronic device (100) can change whether to evaluate a response according to specific evaluation criteria according to the user's settings. The user can thereby prevent repeated regeneration of responses caused by evaluating a response according to an unwanted evaluation criterion.
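The effect of turning evaluation criteria on and off may be sketched as follows. The criterion names follow the score labels used in the description (trustfulness, style, safety, functionality); the function names, the settings dictionary and the choice to treat an unlisted criterion as enabled are assumptions.

```python
CRITERIA = ("trustfulness", "style", "safety", "functionality")

# Configuration of FIG. 12: first and fourth criteria on, second and third off.
FIG12_SETTINGS = {"trustfulness": True, "style": False,
                  "safety": False, "functionality": True}

# Example thresholds from the description for the two enabled criteria.
THRESHOLDS = {"trustfulness": 0.7, "functionality": 0.6}


def enabled_criteria(settings: dict[str, bool]) -> list[str]:
    """Criteria to evaluate; a criterion absent from the settings counts as on."""
    return [name for name in CRITERIA if settings.get(name, True)]


def needs_regeneration(scores: dict[str, float], thresholds: dict[str, float],
                       settings: dict[str, bool]) -> bool:
    """True if any enabled criterion has a score at or below its threshold."""
    return any(scores[name] <= thresholds[name]
               for name in enabled_criteria(settings) if name in scores)
```

Because disabled criteria are skipped before any comparison, a low style or safety score cannot trigger regeneration under the FIG. 12 configuration.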

[0164] FIG. 13 is a drawing illustrating an electronic device that outputs an answer to a query according to one embodiment.

[0165] The electronic device (100) can receive user voice input using a component, included in the electronic device (100), that receives user voice. The electronic device (100) can check a query (1310) (e.g., "Where is the place where the President of the Republic of Korea works?") included in the user voice input. The electronic device (100) may include a user interface that receives various types of user input containing a query, not only voice input. The user interface receives information (e.g., natural language, images, sounds and / or videos) included in the user input and can check the query (1310) (e.g., "Where is the place where the President of the Republic of Korea works?"). The electronic device (100) can display the checked query (1310) on a display (140).

[0166] The electronic device (100) can generate a response (1330) to a query and evaluation information for the response. For example, the electronic device (100) can generate a command (e.g., command (720) of FIG. 7) based on evaluation criteria (e.g., first criterion, second criterion, third criterion, fourth criterion) and a query (1310). The electronic device (100) can generate a response to the query (e.g., "The President of the Republic of Korea works in Yongsan") and evaluation information for the response by inputting the generated command (e.g., command (720) of FIG. 7) into an LLM (401). According to one embodiment, the evaluation information for the response may include feedback information (1350) as well as an evaluation score (1340). The feedback information (1350) may indicate that the response to the query (e.g., "The President of the Republic of Korea works in Yongsan") meets all evaluation criteria if the evaluation score of all evaluation criteria exceeds a reference value.

[0167] The electronic device (100) can confirm that the first evaluation score (e.g., trustfulness: 1), the second evaluation score (e.g., style: 1), the third evaluation score (e.g., safety: 1), and the fourth evaluation score (e.g., functionality: 1) each exceed a reference value. The electronic device (100) can determine to output the response to the query (e.g., "The President of the Republic of Korea works in Yongsan"). However, the electronic device (100) may not output a command (e.g., command (720) of FIG. 7), an evaluation score (1340), and / or feedback information (1350) on the display (140). The user of the electronic device (100) may therefore not be shown information other than the answer to the query. By checking only the answer corresponding to the entered query, the user of the electronic device (100) can avoid checking unnecessary information.

[0168] The electronic device (100) can display an answer containing a response (1330) to a query on the display (140). The electronic device (100) can display an answer containing a response (1330) to a query (e.g., "The President of the Republic of Korea works in Yongsan") on the display (140) along with a query (e.g., "Where does the President of the Republic of Korea work?"). The user of the electronic device (100) can view the response to the query (e.g., "The President of the Republic of Korea works in Yongsan") without checking evaluation information (evaluation score, feedback information) for the response.
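The separation between what is generated internally and what is shown on the display (140) may be sketched as below. The `GenerationResult` structure and the `render_for_display` function are hypothetical names; only the behavior (showing the query and the answer while withholding the command, scores and feedback) follows the description.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationResult:
    """Everything produced for one query; only `response` reaches the display."""
    command: str
    response: str
    scores: dict[str, float] = field(default_factory=dict)
    feedback: str = ""


def render_for_display(query: str, result: GenerationResult) -> str:
    """Return the text shown to the user: the query and the answer only."""
    return f"{query}\n{result.response}"


result = GenerationResult(
    command="(internal command, not displayed)",
    response="The President of the Republic of Korea works in Yongsan",
    scores={"trustfulness": 1, "style": 1, "safety": 1, "functionality": 1},
    feedback="(internal feedback, not displayed)",
)
shown = render_for_display("Where does the President of the Republic of Korea work?", result)
```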

[0169] FIG. 14 is a drawing illustrating an electronic device that outputs an answer to a query according to one embodiment.

[0170] The electronic device (100) can check a query (1410) (e.g., "Where is the place where the President of the Republic of Korea works?") included in the user's voice input.

[0171] The electronic device (100) can generate a first response (1430) to a query and first evaluation information for the first response (1430). For example, the electronic device (100) can generate a command (e.g., the first command (820) of FIG. 8) based on evaluation criteria and a query. The electronic device (100) can generate a first response (1430) to a query (e.g., "The President of the Republic of Korea works at the Blue House") and first evaluation information for the first response (1430) by inputting the generated command (e.g., the first command (820) of FIG. 8) into an LLM (401). According to one embodiment, the first evaluation information for the first response may include feedback information (1450) as well as an evaluation score (1440).

[0172] If there is content in the first response (1430) to the query that is contrary to the facts (e.g., content indicating that the President works at the Blue House), the LLM (401) determines the first evaluation score (14401) (e.g., trustfulness: 0.5) to be lower than the first reference value (e.g., 0.7), which is the reference value corresponding to the first evaluation criterion, and outputs feedback information (1450) indicating that there is content in the first response (1430) that is contrary to the facts (e.g., "The response lacks factual accuracy because the Presidential Office was relocated to Yongsan...").

[0173] The electronic device (100) may regenerate a response to a query as the first evaluation score (14401) is less than or equal to the first reference value. The function call module (430) may generate, or may call a function that generates, a second command (860) that instructs the generation of a second response (1431) to the query and second evaluation information for the second response (1431) based on first evaluation information for the first response (1430) (e.g., first evaluation score (trustfulness: 0.5)) and feedback information (1450) (e.g., "The response lacks factual accuracy because the Presidential Office was relocated to Yongsan..."). The second command (e.g., the second command (860) of FIG. 8) may include at least one of the information included in the first command (e.g., the first command (820) of FIG. 8).

[0174] The LLM (401) can generate a second response (1431) to a query (e.g., "The President of the Republic of Korea works in Yongsan") based on a second command (e.g., the second command (860) of FIG. 8). All evaluation scores (1441) of the generated second response (1431) may exceed their threshold values. For example, the first evaluation score (14411) of the second response (1431) (e.g., trustfulness: 1) may exceed the first threshold value (e.g., 0.7), unlike the first evaluation score (14401) of the first response (1430) (e.g., trustfulness: 0.5). Feedback information (1451) may indicate that the response to the query (e.g., "The President of the Republic of Korea works in Yongsan") meets all evaluation criteria when the evaluation score of all evaluation criteria exceeds the threshold value. Although FIG. 14 illustrates the electronic device (100) generating the response to the query twice, it should be understood that the electronic device (100) may repeat the operation of regenerating the response whenever at least one evaluation score of the response to the query is less than or equal to the threshold value.

[0175] The electronic device (100) may display a response on the display (140) that includes a second response (1431) to the query (e.g., "The President of the Republic of Korea works in Yongsan") along with a query (1410) (e.g., "Where does the President of the Republic of Korea work?"). Even though the electronic device (100) has generated multiple responses to the query (first response (1430), second response (1431)), it may not output the first response (1430) that does not meet the evaluation criteria (e.g., at least one evaluation score is below the threshold value). The electronic device (100) may output only the second response (1431) to the query to the user, where all evaluation scores are above the threshold value. The user of the electronic device (100) may receive only the response that meets the evaluation criteria, even if the electronic device (100) generates multiple responses. The user of the electronic device (100) can avoid checking unnecessary information by checking an answer that includes only the finally determined response.

[0176] FIG. 15 is a drawing illustrating an electronic device that outputs an object representing a plurality of responses according to one embodiment.

[0177] The electronic device (100) may generate a second response (1531) to a query using first evaluation information for the first response (1530) to a query when at least one evaluation score of the first response (1530) to a query is less than or equal to a reference value. Alternatively, the electronic device (100) may generate a second response (1531) even if all evaluation scores of the first response (1530) exceed the reference value.

[0178] According to one embodiment, the electronic device (100) may output an object representing a first response (1530) and an object representing a second response (1531). The electronic device (100) may receive user input selecting one of the object representing the first response (1530) or the object representing the second response (1531). For example, if the electronic device (100) receives user input selecting the object representing the second response (1531), it may display an answer (15311) to a query containing the second response (1531) on the display (140).

[0179] The electronic device (100) can determine the user's preference for a response based on user input selecting either an object representing a first response (1530) or an object representing a second response (1531). For example, the first response (1530) may have a higher second evaluation score than the second response (1531), but a lower third evaluation score. When the electronic device (100) receives user input selecting the object representing the second response (1531), it can determine that the user prefers a response with a higher third evaluation score over a response with a higher second evaluation score. The electronic device (100) can utilize the determined user preference when generating a response to a query. For example, the electronic device (100) can train the LLM (401) with the determined user preference so that the LLM (401) generates a response preferred by the user. Alternatively, the electronic device (100) may change the reference value according to the user's preference. For example, if the user prefers a response with a high third evaluation score evaluated according to the third evaluation criterion, the electronic device (100) can change the third reference value to a value greater than the existing value.
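The adjustment of a reference value according to the user's selection may be sketched as follows. The description states only that the reference value may be changed to a greater value; the update rule, the step size of 0.1, the cap at 1.0 and the numeric scores below are all assumptions chosen for illustration.

```python
def update_reference_values(reference_values: dict[str, float],
                            chosen_scores: dict[str, float],
                            rejected_scores: dict[str, float],
                            step: float = 0.1) -> dict[str, float]:
    """Raise the reference value of each criterion on which the response
    selected by the user scored higher than the response not selected."""
    updated = dict(reference_values)
    for name, value in reference_values.items():
        if chosen_scores[name] > rejected_scores[name]:
            updated[name] = min(1.0, round(value + step, 2))
    return updated


# Hypothetical scores: the first response (1530) is stronger on style,
# the second response (1531) is stronger on safety, and the user picks the second.
first_scores = {"style": 0.9, "safety": 0.6}
second_scores = {"style": 0.7, "safety": 0.9}
new_values = update_reference_values({"style": 0.5, "safety": 0.5},
                                     chosen_scores=second_scores,
                                     rejected_scores=first_scores)
```

Under these assumed numbers only the safety reference value is raised, so later responses must score higher on the third criterion to be output without regeneration.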

[0180] The electronic device (100) can train the LLM (401) so that the LLM reflects the user's response preferences. The electronic device (100) can generate responses to queries that reflect user-specific preferences.

[0181] FIG. 16 is a drawing illustrating an electronic device that outputs the result of executing a function related to a query according to one embodiment.

[0182] The electronic device (100) can check a query (1610) (e.g., "Tell me iPhone price information") included in user input. The electronic device (100) can display the query (1610) (e.g., "Tell me iPhone price information") on a display (140).

[0183] The electronic device (100) can generate a response to a query. The first response (1630) to the query may include information determining at least one function (e.g., device_qna) associated with the query. For example, some queries may require at least one function to generate an answer to the query. If there is a function required for the query, the electronic device (100) can determine at least one function associated with the query (e.g., device_qna). The electronic device (100) can generate evaluation information (e.g., evaluation score (1640) and feedback information (1650)) for the response containing information determining the function associated with the query.

[0184] The electronic device (100) can determine whether a function (e.g., device_qna) related to a determined query is a function required by the query. For example, the LLM (401) can determine information (function: device_qna) that determines the function related to the query, and if it is not a function required by the query, it can determine a fourth evaluation score (16401) of less than or equal to a fourth threshold value (e.g., 0.6). The electronic device (100) can generate feedback information of a fourth evaluation criterion corresponding to a fourth evaluation score (16401) of less than or equal to the fourth threshold value (e.g., a request requiring the provision of device information or a request for an iPhone that is not a Samsung device...).

[0185] The electronic device (100) may redetermine a function related to a query as the fourth evaluation score is less than or equal to the fourth threshold value. The second response (1631) to the query may include information determining at least one function related to the query (e.g., internet search). The electronic device (100) may generate information determining a function related to the query (e.g., internet search) and confirm that among the evaluation scores (1641) evaluating the determined function related to the query (e.g., internet search), the fourth evaluation score (16411) exceeds the fourth threshold value. The electronic device (100) may output an answer including the result (16311) of executing the determined function related to the query (e.g., internet search) (e.g., the result of searching for a query on the internet (e.g., "Tell me iPhone price information")).

[0186] The user of the electronic device (100) may receive only the result of executing the query-related function that meets the evaluation criteria, even if the electronic device (100) determines the query-related function multiple times. By checking the answer containing only the result of executing the finally determined function, the user of the electronic device (100) may avoid checking unnecessary information.

[0187] An electronic device according to one embodiment may include at least one processor. The electronic device may include a memory that stores at least one computer program comprising a plurality of instructions. The at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to identify a query included in user input. The at least one computer program may cause the electronic device to generate a first instruction that instructs the generation of a response to the identified query and evaluation information for the response based on at least one evaluation criterion. The at least one computer program may cause the electronic device to generate a first response to the identified query and first evaluation information for the first response based on the generated first instruction. The at least one computer program may cause the electronic device to check whether at least one evaluation score included in the first evaluation information for the first response is less than or equal to a reference value.

[0188] In an electronic device according to one embodiment, the at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to generate a second instruction that instructs the regeneration of a response to the identified query and evaluation information for the response, based on the first evaluation information for the first response, when at least one evaluation score included in the first evaluation information for the first response is less than or equal to a reference value. The at least one computer program may cause the electronic device to generate a second response to the identified query and second evaluation information for the second response based on the generated second instruction.

[0189] In an electronic device according to one embodiment, the at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to check the number of times the electronic device has generated a response to the identified query when at least one evaluation score included in the second evaluation information for the second response is less than a reference value. The at least one computer program may cause the electronic device to execute a function related to the query when the number of times the electronic device has generated a response to the identified query is greater than or equal to a predetermined number. The at least one computer program may cause the electronic device to output an answer including the result of executing the function related to the query.

[0190] In an electronic device according to one embodiment, the function related to the query may include a function that uses an external API and / or a function predefined in the electronic device.

[0191] In an electronic device according to one embodiment, the at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to output an answer including the second response when all evaluation scores included in the second evaluation information for the second response exceed their respective reference values.
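Paragraphs [0188] to [0191] together describe a bounded loop: regenerate while a score fails, output the response once every score exceeds its reference value, and fall back to a query-related function once the generation count reaches a limit. The following is a minimal sketch under stated assumptions: `generate` stands in for the model call, `run_function` stands in for an external API or predefined device function, and the default limit of two generations is arbitrary.

```python
from typing import Callable

Scores = dict[str, int]


def answer_query(
    query: str,
    generate: Callable[[str], tuple[str, Scores]],
    run_function: Callable[[str], str],
    reference: Scores,
    max_generations: int = 2,
) -> str:
    """Regenerate until every score exceeds its reference value or the limit is hit."""
    instruction = f"Respond to and evaluate: {query}"
    for _ in range(max_generations):
        response, scores = generate(instruction)
        if all(scores[name] > value for name, value in reference.items()):
            return response  # every score exceeds its reference value
        # Feed the failed evaluation back into the next instruction.
        instruction = f"Regenerate the response to: {query}\nPrevious scores: {scores}"
    # Generation count reached the limit: fall back to a query-related function.
    return run_function(query)
```

Bounding the loop keeps latency predictable, and the fallback ensures the user still receives an answer when regeneration does not converge.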

[0192] In an electronic device according to one embodiment, the at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to use a large language model (LLM) when generating the first response to the identified query and the first evaluation information for the first response.

[0193] In an electronic device according to one embodiment, the first evaluation information may include an evaluation score corresponding to each of at least one evaluation criterion and / or feedback information indicating a part of the first response where improvement is required.
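The evaluation information of [0193] can be thought of as a small record holding per-criterion scores and optional feedback. The sketch below assumes, purely for illustration, that the model returns this record as JSON with `scores` and `feedback` keys; the source does not define a serialization format, and the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvaluationInfo:
    """Scores per criterion plus feedback on parts needing improvement."""
    scores: dict[str, int]
    feedback: list[str] = field(default_factory=list)


def parse_evaluation(raw: str) -> EvaluationInfo:
    """Parse an assumed JSON payload; feedback may be absent ("and / or")."""
    data = json.loads(raw)
    scores = {name: int(value) for name, value in data.get("scores", {}).items()}
    return EvaluationInfo(scores=scores, feedback=list(data.get("feedback", [])))
```

For example, `parse_evaluation('{"scores": {"accuracy": 4}}')` yields a record with one score and an empty feedback list.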

[0194] In an electronic device according to one embodiment, the at least one evaluation criterion may be defined as follows.

[0195] The at least one evaluation criterion may include a first criterion used to evaluate the accuracy of the response to the identified query, a second criterion used to evaluate whether the response to the identified query conforms to a predefined style, a third criterion used to evaluate whether the response to the identified query is safe, and / or a fourth criterion used to evaluate information determining the function related to the query.

[0196] In an electronic device according to one embodiment, the at least one computer program, when executed individually or collectively by the at least one processor, may cause the electronic device to generate a third instruction that instructs the electronic device to change the function related to the query when the evaluation score associated with the fourth criterion included in the first evaluation information for the first response is less than or equal to a reference value. The at least one computer program may cause the electronic device to redetermine the function related to the query based on the third instruction. The at least one computer program may cause the electronic device to generate evaluation information regarding the information by which the function related to the query has been redetermined.
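The third instruction of [0196] might be sketched as a prompt that is produced only when the function-selection score fails, and that excludes the function already judged unsuitable. Everything here is an assumption for illustration: the function name, the candidate list, and the prompt wording do not come from the source.

```python
from typing import Optional


def build_third_instruction(
    query: str,
    current_function: str,
    candidates: list[str],
    function_score: int,
    reference: int,
) -> Optional[str]:
    """Return an instruction to pick a different function, or None if none is needed."""
    if function_score > reference:
        return None  # the selected function scored above the reference value
    alternatives = [name for name in candidates if name != current_function]
    return (
        f"The function '{current_function}' was judged unsuitable for the query: {query}\n"
        f"Select a different function from: {', '.join(alternatives)}\n"
        "Then evaluate the new selection."
    )
```

The closing line of the prompt mirrors the last sentence of the paragraph, which calls for fresh evaluation information on the redetermined function.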

[0197] In an electronic device according to one embodiment, the reference value may refer to an evaluation score representing the minimum quality for each of the at least one evaluation criterion.

[0198] A method of operation of an electronic device according to one embodiment may include an operation of identifying a query included in a user input. The method of operation of the electronic device may include an operation of generating a first instruction that instructs generation of a response to the identified query and evaluation information for the response based on at least one evaluation criterion. The method of operation of the electronic device may include an operation of generating a first response to the identified query and first evaluation information for the first response based on the generated first instruction. The method of operation of the electronic device may include an operation of determining whether at least one evaluation score included in the first evaluation information for the first response is less than or equal to a reference value.

[0199] A method of operation of an electronic device according to one embodiment may include an operation of generating a second instruction when at least one evaluation score included in the first evaluation information for the first response is less than or equal to a reference value, the second instruction instructing regeneration of a response to the identified query and evaluation information for the response based on the first evaluation information for the first response. The method of operation of the electronic device may include an operation of generating a second response to the identified query and second evaluation information for the second response based on the generated second instruction.

[0200] A method of operation of an electronic device according to one embodiment may include an operation of checking the number of times a response to the identified query has been generated when at least one evaluation score included in the second evaluation information is less than a reference value. The method of operation of the electronic device may include an operation of executing a function related to the query when the number of times a response to the identified query has been generated is greater than or equal to a predetermined number. The method of operation of the electronic device may include an operation of outputting an answer including the result of executing the function related to the query.

[0201] In a method of operating an electronic device according to one embodiment, the function related to the query may include a function that uses an external API and / or a function predefined in the electronic device.

[0202] A method of operation of an electronic device according to one embodiment may include an operation of outputting an answer including the second response when all evaluation scores included in the second evaluation information exceed their respective reference values.

[0203] A method of operation of an electronic device according to one embodiment may include an operation using a large language model (LLM) when generating a first response to the identified query and first evaluation information for the first response.

[0204] In a method of operating an electronic device according to one embodiment, the first evaluation information for the first response may include an evaluation score corresponding to each of at least one evaluation criterion and / or feedback information indicating a part of the first response where improvement is required.

[0205] In a method of operating an electronic device according to one embodiment, the at least one evaluation criterion may include a first criterion used to evaluate the accuracy of a response to the identified query, a second criterion used to evaluate whether the response to the identified query conforms to a predefined style, a third criterion used to evaluate whether the response to the identified query is safe, and / or a fourth criterion used to evaluate information determining a function related to the query.

[0206] A method of operation of an electronic device according to one embodiment may include an operation of generating a third instruction that instructs a change of the function related to the query when an evaluation score related to the fourth criterion included in the first evaluation information for the first response is less than or equal to a reference value. The method of operation of the electronic device may include an operation of redetermining the function related to the query based on the third instruction. The method of operation of the electronic device may include an operation of generating evaluation information regarding the information by which the function related to the query has been redetermined.

[0207] In a method of operating an electronic device according to one embodiment, the reference value may refer to an evaluation score representing the minimum quality for each of the at least one evaluation criterion.

[0208] The electronic device according to the various embodiments disclosed in this document may be of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this document is not limited to the devices described above.

[0209] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of those embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" may each include any possible combination of the items listed together in the corresponding phrase. Terms such as "first" and "second," or "1st" and "2nd," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., first) component is referred to as “coupled” or “connected” to another (e.g., second) component, with or without the terms “functionally” or “communicatively,” it means that the component may be connected to the other component directly (e.g., via a wire), wirelessly, or through a third component.

[0210] As used in this document, the term "module" may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be a component formed as a whole, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0211] Various embodiments of the present document may be implemented as software (e.g., a program) comprising one or more instructions stored in a storage medium (e.g., memory (120)) readable by a machine (e.g., an electronic device (100)). For example, a processor (e.g., a processor (110)) of the machine (e.g., an electronic device (100)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' simply means that the storage medium is a tangible device and does not include a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.

[0212] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0213] According to various embodiments, each of the components described above (e.g., a module or a program) may include a single entity or multiple entities. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or a similar manner as they were performed by the corresponding component prior to the integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

Claims

1. An electronic device comprising: at least one processor; and a memory that stores at least one computer program including a plurality of instructions, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to: identify a query included in a user input; generate a first instruction that instructs generation of a response to the identified query and evaluation information for the response based on at least one evaluation criterion; generate, based on the generated first instruction, a first response to the identified query and first evaluation information for the first response; and determine whether at least one evaluation score included in the first evaluation information is less than or equal to a reference value.

2. The electronic device of claim 1, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to: when at least one evaluation score included in the first evaluation information is less than or equal to the reference value, generate a second instruction that instructs regeneration of the response to the identified query and the evaluation information for the response based on the first evaluation information; and generate, based on the generated second instruction, a second response to the identified query and second evaluation information for the second response.

3. The electronic device of claim 2, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to: when at least one evaluation score included in the second evaluation information is less than the reference value, check the number of times a response to the identified query has been generated; when the number of times a response to the identified query has been generated is greater than or equal to a predetermined number, execute a function related to the query; and output an answer including the result of executing the function related to the query.

4. The electronic device of claim 3, wherein the function related to the query includes a function that uses an external API and / or a function predefined in the electronic device.

5. The electronic device of claim 2, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to output an answer including the second response when all evaluation scores included in the second evaluation information exceed their respective reference values.

6. The electronic device of claim 1, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to use a large language model (LLM) when generating the first response to the identified query and the first evaluation information.

7. The electronic device of claim 1, wherein the first evaluation information includes an evaluation score corresponding to each of the at least one evaluation criterion and / or feedback information indicating a part of the first response where improvement is required.

8. The electronic device of claim 1, wherein the at least one evaluation criterion includes a first criterion used to evaluate the accuracy of a response to the identified query, a second criterion used to evaluate whether the response to the identified query conforms to a predefined style, a third criterion used to evaluate whether the response to the identified query is safe, and / or a fourth criterion used to evaluate information determining a function related to the query.

9. The electronic device of claim 8, wherein the at least one computer program, when executed individually or collectively by the at least one processor, causes the electronic device to: when the evaluation score related to the fourth criterion included in the first evaluation information is less than or equal to the reference value, generate a third instruction that instructs a change of the function related to the query; redetermine the function related to the query based on the third instruction; and generate evaluation information regarding the information by which the function related to the query was redetermined.

10. The electronic device of claim 8, wherein the reference value refers to an evaluation score representing the minimum quality for each of the at least one evaluation criterion.

11. A method of operating an electronic device, the method comprising: an operation of identifying a query included in a user input; an operation of generating a first instruction that instructs generation of a response to the identified query and evaluation information for the response based on at least one evaluation criterion; an operation of generating, based on the generated first instruction, a first response to the identified query and first evaluation information for the first response; and an operation of determining whether at least one evaluation score included in the first evaluation information is less than or equal to a reference value.

12. The method of claim 11, further comprising: an operation of generating, when at least one evaluation score included in the first evaluation information is less than or equal to the reference value, a second instruction that instructs regeneration of a response to the identified query and evaluation information for the response based on the first evaluation information; and an operation of generating, based on the generated second instruction, a second response to the identified query and second evaluation information for the second response.

13. The method of claim 12, further comprising: an operation of checking, when at least one evaluation score included in the second evaluation information is less than the reference value, the number of times a response to the identified query has been generated; an operation of executing a function related to the query when the number of times a response to the identified query has been generated is greater than or equal to a predetermined number; and an operation of outputting an answer including the result of executing the function related to the query.

14. The method of claim 13, wherein the function related to the query includes a function that uses an external API and / or a function predefined in the electronic device.

15. The method of claim 12, further comprising an operation of outputting an answer including the second response when all evaluation scores included in the second evaluation information exceed their respective reference values.