Server for analyzing user query and assisting in counseling service of counselor by using LLM, and operation method therefor

A server using dual LLMs enhances counseling services by analyzing user queries and optimizing agent responses through learning from feedback, addressing inefficiencies in conventional systems.

WO2026141870A1 · PCT designated stage · Publication Date: 2026-07-02 · YANOLJA NEXT CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
YANOLJA NEXT CO LTD
Filing Date
2025-09-29
Publication Date
2026-07-02


Abstract

According to various embodiments, a server, which analyzes a user query and assists in a counseling service of a counselor by using a large language model (LLM), may comprise a communication module and a processor, wherein the processor is configured to: identify input text data pertaining to a user query; identify user intent information and guide information corresponding to the input text data by inputting the input text data to a first LLM; and identify answer text data for the input text data by inputting, to a second LLM, the user intent information, the guide information, and reaction information to the guide information. The first LLM is trained on the basis of a plurality of pieces of input text data, a plurality of pieces of user intent information, and a plurality of pieces of reaction information, and the second LLM is trained on the basis of one or more pieces of input text data, one or more pieces of user intent information, one or more pieces of guide information, one or more pieces of reaction information, and one or more pieces of answer text data. Other embodiments are also possible.

Description

A server for analyzing user queries using an LLM and assisting a counselor's counseling service, and an operation method therefor

[0001] Various embodiments of the present disclosure relate to a server that analyzes user queries using an LLM and assists a counselor's counseling service, and to an operation method therefor.

[0002] Recently, artificial intelligence systems capable of achieving human-level intelligence are being utilized in various fields. Unlike conventional rule-based smart systems, artificial intelligence systems are systems in which machines learn, make judgments, and become smarter on their own. As artificial intelligence systems improve in recognition accuracy and gain a more accurate understanding of user preferences with continued use, existing rule-based smart systems are gradually being replaced by deep learning-based artificial intelligence systems.

[0003] Artificial intelligence technology consists of machine learning (e.g., deep learning) and elemental technologies utilizing machine learning.

[0004] Machine learning is an algorithmic technology that classifies and learns the features of input data on its own, and the elemental technology is a technology that mimics the functions of the human brain, such as cognition and judgment, by utilizing machine learning algorithms such as deep learning, and consists of technology fields such as linguistic understanding, visual understanding, reasoning / prediction, knowledge representation, and motion control.

[0005] Meanwhile, Large Language Models (LLMs) are a type of artificial intelligence trained on large text datasets to generate human-like responses to natural language inputs; they are language models composed of artificial neural networks possessing numerous parameters (typically billions of weights or more). These LLMs can be trained on substantial amounts of text using self-supervised or semi-supervised learning.

[0006] Various embodiments of the present disclosure can provide a method for counselors performing counseling duties in various fields to quickly provide a solution to a user's inquiry without unnecessary emotional exchange with the user.

[0007] Various embodiments of the present disclosure can provide a method for enhancing and optimizing the performance of an LLM by learning the agent's response process to a user query through the LLM and reflecting user feedback on the agent's response process into the LLM.

[0008] According to various embodiments, a server for analyzing a user query using an LLM and assisting a counselor's counseling service includes a communication module and a processor, wherein the processor is configured to identify input text data regarding a user query, input the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data, and input the user intent information, the guide information, and reaction information regarding the guide information into a second LLM to identify answer text data regarding the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and the second LLM can be trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

[0009] According to various embodiments, an operation method of a server for analyzing a user query using an LLM and assisting a counselor's counseling service includes: an operation of identifying input text data regarding a user query; an operation of inputting the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data; and an operation of inputting the user intent information, the guide information, and reaction information regarding the guide information into a second LLM to identify answer text data regarding the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and the second LLM can be trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

[0010] The present disclosure can improve convenience for both agents and users by using an LLM to analyze user queries and to generate guide information and answer text data suited to responding to those queries.

[0011] FIG. 1 illustrates a block diagram of a user device and a server according to various embodiments of the present disclosure.

[0012] FIG. 2 is a diagram illustrating a method of communication between a user and a counselor using a counseling assistance service provided by the server of the present disclosure according to various embodiments.

[0013] FIG. 3 is a flowchart illustrating the operation of a server generating answer text data for a user query using an LLM according to various embodiments.

[0014] FIG. 4a is a diagram illustrating a first embodiment in which a server generates answer text data for a user query using a first LLM and a second LLM according to various embodiments.

[0015] FIG. 4b is a diagram illustrating a second embodiment in which a server generates answer text data for a user query using a first LLM and a second LLM according to various embodiments.

[0016] FIG. 5 shows a screen configuration diagram used when a counselor device performs counseling using a user device and a counseling assistance service according to various embodiments.

[0017] FIG. 6 is an example diagram illustrating the operation method of a first LLM trained to output guide information according to various embodiments.

[0018] Hereinafter, various embodiments of this document are described with reference to the accompanying drawings. The embodiments and the terms used therein are not intended to limit the technology described in this document to specific embodiments and should be understood to include various modifications, equivalents, and/or substitutions of said embodiments. In relation to the description of the drawings, similar reference numerals may be used for similar components. A singular expression may include a plural expression unless the context clearly indicates otherwise. In this document, expressions such as "A or B" or "at least one of A and/or B" may include all possible combinations of the items listed together. Expressions such as "1st," "2nd," "first," or "second" may modify the corresponding components regardless of order or importance; they are used only to distinguish one component from another and do not limit said components. When a certain (e.g., first) component is described as "(functionally or communicatively) coupled" or "connected" to another (e.g., second) component, said certain component may be directly connected to said other component or connected through yet another component (e.g., a third component).

[0019] In this document, "configured to" may be used interchangeably, depending on the context and in terms of hardware or software, with, for example, "suitable for," "having the capacity to," "adapted to," "made to," "capable of," or "designed to." In some cases, the expression "device configured to" may mean that the device is "capable of" operating together with other devices or components. For example, the phrase "processor configured to perform A, B, and C" may mean a dedicated processor for performing the corresponding operations (e.g., an embedded processor), or a general-purpose processor capable of performing the corresponding operations by executing one or more software programs stored in a memory device (e.g., a CPU or application processor).

[0020] A user device or electronic device according to various embodiments of the present document may include, for example, at least one of a smartphone, a tablet PC, a desktop PC, a laptop PC, a netbook computer, a workstation, and a server.

[0021] Referring to FIG. 1, a user device (100) and a server (101) in various embodiments are described. The user device (100) may include a communication module (110), a processor (120), a memory (130), and a display (140). In some embodiments, the user device (100) may omit at least one of the components or additionally include other components.

[0022] The communication module (110) can establish communication between, for example, a user device (100) and an external device (e.g., a first external electronic device (102), a second external electronic device (104), or a server (101)). For example, the communication module (110) can communicate with an external device (e.g., a second external electronic device (104) or a server (101)) by connecting to a network (180) via wireless communication or wired communication.

[0023] Wireless communication may include cellular communication using at least one of, for example, LTE, LTE-A (LTE Advanced), CDMA (code division multiple access), WCDMA (wideband CDMA), UMTS (universal mobile telecommunications system), WiBro (Wireless Broadband), or GSM (Global System for Mobile Communications). According to one embodiment, wireless communication may include at least one of, for example, WiFi (wireless fidelity), Bluetooth, Bluetooth Low Energy (BLE), Zigbee, NFC (near field communication), Magnetic Secure Transmission, Radio Frequency (RF), or Body Area Network (BAN). According to one embodiment, wireless communication may include GNSS. GNSS may be, for example, GPS (Global Positioning System), Glonass (Global Navigation Satellite System), Beidou Navigation Satellite System (hereinafter "Beidou"), or Galileo, the European global satellite-based navigation system. Hereinafter, in this document, "GPS" may be used interchangeably with "GNSS". Wired communication may include at least one of, for example, USB (universal serial bus), HDMI (high definition multimedia interface), RS-232 (recommended standard 232), power line communication, or POTS (plain old telephone service). The network (180) may include at least one telecommunications network, for example, a computer network (e.g., LAN or WAN), the Internet, or a telephone network.

[0024] The processor (120) may include one or more of a central processing unit, an application processor, or a communication processor (CP). The processor (120) may, for example, perform operations or data processing regarding the control and / or communication of at least one other component of the user device (100).

[0025] The memory (130) may include volatile and / or non-volatile memory. The memory (130) may store, for example, commands or data related to at least one other component of the user device (100).

[0026] The display (140) may include, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a micro-electromechanical system (MEMS) display, or an electronic paper display. The display (140) may display various content (e.g., text, images, videos, icons, and/or symbols) to the user. The display (140) may include a touch screen and may receive touch, gesture, proximity, or hovering input using, for example, an electronic pen or a part of the user's body.

[0027] Each of the first and second external electronic devices (102, 104) may be the same type of device as the user device (100) or a different type. According to various embodiments, all or part of the operations performed on the user device (100) may be performed on one or more other electronic devices (e.g., electronic devices (102, 104), or a server (101)). According to one embodiment, when the user device (100) needs to perform a function or service automatically or upon request, the user device (100) may request at least some of the associated functions from another device (e.g., electronic devices (102, 104), or a server (101)) instead of, or in addition to, performing the function or service itself. The other electronic device (e.g., electronic devices (102, 104), or a server (101)) may perform the requested function or additional functions and transmit the result to the user device (100). The user device (100) may provide the requested function or service by using the received result as is or after additional processing. For this purpose, for example, cloud computing, distributed computing, or client-server computing technologies may be used.

[0028] The server (101) may include a communication module (111), a processor (121), and a memory (131). In some embodiments, the server (101) may omit at least one of the components or additionally include other components. The communication module (111), the processor (121), and the memory (131) may each perform the same functions as the communication module (110), the processor (120), and the memory (130) within the user device (100).

[0029]

[0030] FIG. 2 is a diagram illustrating a method of communication between a user and a counselor using a counseling assistance service provided by a server (101) of the present disclosure according to various embodiments.

[0031] According to various embodiments, a server (101) (e.g., a counseling assistance service provider) may operate an application that allows a user and a counselor to communicate, communicate with a user device (e.g., the electronic device (100, 102, 104) of FIG. 1) (e.g., a PC, a laptop, a smartphone, etc.) via a network (162, 164), process requests received from the user device (100, 104) via a messenger application or a web page, and transmit requested information to the user device (100, 104). According to one embodiment, the server (101) and the electronic device (104) may include components of the same type as the components of the electronic device (100) of FIG. 1.

[0032] A user device according to the present disclosure (e.g., user device (100) of FIG. 1) may request consultation with an agent through a specific application, and an agent device (104) may accept a communication connection with the user device (100).

[0033] According to one embodiment, after a communication connection is established between an agent device (104) and a user device (100), the user device (100) can acquire user voice data from the user and transmit it to a server (101). The server (101) can convert the user voice data received from the user device (100) into text data using an STT (Speech-to-Text) module. If the converted text data were transmitted directly to the agent device (104), any expressions in it that may offend the agent would reach the agent unchanged; the user query therefore needs to be processed with such expressions removed. The server (101) according to the present disclosure can use a first LLM to identify, from the input text data regarding the user query, data from which emotional expressions have been removed (e.g., at least one of user intent information, context information, or guide information), and can transmit the data to the agent device (104). According to one embodiment, the first LLM can be trained to rewrite the input text data into text data that excludes emotional expressions.

[0034] According to one embodiment, the agent device (104) can perform a follow-up response to a user query using the data determined by the first LLM, and can transmit reaction information regarding the follow-up response to the server (101).

[0035] According to one embodiment, the server (101) can generate answer text data to be transmitted to the user device (100) by inputting at least one of input text data for a user query, user intent information, context information, guide information, or reaction information into the second LLM.

[0036] According to one embodiment, the server (101) can transmit the generated answer text data to the user device (100) or convert the answer text data into voice data using Text to Speech (TTS) technology and then transmit the voice data to the user device (100).
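The flow of FIG. 2 described above (voice data, STT conversion, first LLM, agent reaction, second LLM) can be outlined in a minimal Python sketch. The sketch is illustrative only and is not part of the disclosure: the class AssistPipeline, its field names, and the stub callables are hypothetical stand-ins for the STT module, the first LLM, the agent's follow-up response, and the second LLM, none of whose interfaces are specified in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AssistPipeline:
    # All four callables are hypothetical stand-ins; the document does not
    # define concrete interfaces for these components.
    speech_to_text: Callable[[bytes], str]
    first_llm: Callable[[str], Dict[str, str]]
    agent_react: Callable[[Dict[str, str]], str]
    second_llm: Callable[[Dict[str, str], str], str]

    def handle_voice_query(self, voice: bytes) -> str:
        text = self.speech_to_text(voice)        # voice data -> input text data
        first_info = self.first_llm(text)        # intent / guide, emotion removed
        reaction = self.agent_react(first_info)  # agent's follow-up response
        return self.second_llm(first_info, reaction)  # answer text data


# Stub implementations so the sketch runs end to end.
pipeline = AssistPipeline(
    speech_to_text=lambda voice: voice.decode("utf-8"),
    first_llm=lambda text: {
        "intent": "reservation cancellation",
        "guide": "request reservation number",
    },
    agent_react=lambda info: "asked for reservation number",
    second_llm=lambda info, reaction: f"Regarding your {info['intent']}: {reaction}.",
)

answer = pipeline.handle_voice_query(b"Won't you cancel it quickly?")
print(answer)
```

In this arrangement the agent only ever sees the output of the first stage, which is the point of the design described in paragraph [0033]: the raw utterance does not reach the agent device.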

[0037]

[0038] FIG. 3 is a flowchart illustrating the operation of a server (e.g., server (101) of FIG. 1) generating answer text data for a user query using an LLM according to various embodiments.

[0039] FIG. 4a is a diagram illustrating a first embodiment in which a server (101) generates answer text data for a user query using a first LLM and a second LLM according to various embodiments.

[0040] FIG. 4b is a diagram illustrating a second embodiment in which a server (101) generates answer text data for a user query using a first LLM and a second LLM according to various embodiments.

[0041] FIG. 5 shows a screen configuration diagram used when an agent device (e.g., agent device (104) of FIG. 1) performs counseling using a counseling assistance service with a user device (e.g., user device (100) of FIG. 1) according to various embodiments.

[0042] In operation 301, according to various embodiments, the server (101) (e.g., the processor (121) of FIG. 1) can identify input text data regarding a user query.

[0043] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can receive voice data (e.g., a user utterance) regarding a user query from a user device (100) through a communication module (e.g., communication module (111) of FIG. 1) and can convert the received voice data into input text data. For example, referring to FIG. 4a, the user device (100) can acquire first voice data of the user (e.g., "Are you guys going to do this? Won't you cancel it quickly?") through a microphone module provided internally and transmit it to the server (101), and the server (101) can convert the user's voice data into first input text data (410) in text form. For another example, referring to FIG. 4a, the user device (100) can acquire the user's second voice data (e.g., "Don't annoy me, reservation number 12345, so cancel the reservation quickly") through the microphone module and transmit it to the server (101), and the server (101) can convert the user's voice data into second input text data (420) in text form. For another example, referring to FIG. 4b, the user device (100) can acquire the user's voice data (e.g., "I made a reservation with reservation number 56789, but you made me wait too late without contacting me. What kind of service is this? Please cancel it immediately. If you keep doing this, I won't come back!") through the microphone module and transmit it to the server (101), and the server (101) can convert the user's voice data into input text data (430) in text form. According to one embodiment, the server (101) can convert voice data received from the user device (100) into text data using an STT (Speech-to-Text) module.

[0044] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) may receive input text data regarding a user query from a user device (100) through a communication module (111). For example, the server (101) may receive a text message entered by a user from the user device (100) as input text data.

[0045]

[0046] In operation 303, according to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can input input text data regarding a user query into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data.

[0047] According to various embodiments, the server (101) (e.g., the processor (121) of FIG. 1) may input input text data regarding a user query into a first LLM (Large Language Model). According to one embodiment, the first LLM may be stored in the memory of the server (101) (e.g., the memory (131) of FIG. 1) or may be stored in a separate external server rather than inside the server (101) and linked to the server (101).

[0048] According to various embodiments, when a server (101) (e.g., processor (121) of FIG. 1) inputs input text data regarding a user query into a first LLM, it can identify structured first information for an agent that is output from the first LLM. According to one embodiment, the first information may consist of at least one of user intent information, numeric information corresponding to the user intent information, context information corresponding to the user intent information, structured request information, or guide information.

[0049] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) inputs input text data regarding a user query into a first LLM and can identify user intent information corresponding to the input text data determined by the first LLM.

[0050] According to one embodiment, user intent information may represent the intent of a user query, and said user intent information may have a structured format. According to one embodiment, user intent information may be classified into one of a plurality of categories. For example, referring to FIG. 4a, a server (101) inputs first input text data (410) into a first LLM and can identify a first user intent information (411) (e.g., reservation cancellation) corresponding to the first input text data (410) among a plurality of predetermined categories. As another example, referring to FIG. 4b, the server (101) can identify "RESERVATION_CANCELLATION" as user intent information (431) (e.g., intent_category) corresponding to the input text data (430) which is output by the first LLM. The plurality of categories of user intent information may be set by an administrator of the server (101) or by the first LLM during the learning process of the first LLM.
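As an illustration of classifying user intent information into one of a plurality of predetermined categories, the following Python sketch maps a raw label onto a fixed category set. It is a sketch under stated assumptions: only RESERVATION_CANCELLATION appears in this document; the other category names, the normalization rule, and the fallback to OTHER are hypothetical.

```python
from enum import Enum


class IntentCategory(Enum):
    # Only RESERVATION_CANCELLATION is taken from the document (FIG. 4b).
    # The remaining members are hypothetical examples of administrator-set
    # categories.
    RESERVATION_CANCELLATION = "RESERVATION_CANCELLATION"
    RESERVATION_CHANGE = "RESERVATION_CHANGE"
    OTHER = "OTHER"


def parse_intent(raw_label: str) -> IntentCategory:
    """Map a raw label emitted by the first LLM onto a predetermined category."""
    normalized = raw_label.strip().upper().replace(" ", "_")
    try:
        return IntentCategory(normalized)
    except ValueError:
        # Assumption: labels outside the predetermined set fall back to OTHER.
        return IntentCategory.OTHER


print(parse_intent("reservation cancellation"))
print(parse_intent("refund"))
```

Restricting the output to a closed set keeps the intent field structured even when the model emits a label in a slightly different surface form.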

[0051] According to one embodiment, the server (101) can identify user intent information from input text data regarding a user query based on a natural language understanding (NLU) module instead of using the first LLM. For example, the natural language understanding module can identify user intent information by performing syntactic analysis or semantic analysis. According to one embodiment, the natural language understanding module can identify the meaning of a word extracted from the input text data using linguistic features (e.g., grammatical elements) of a morpheme or phrase, and determine user intent information by matching the identified meaning of the word to the intent. The syntactic analysis can divide the input text data into grammatical units (e.g., words, phrases, morphemes, etc.) and identify what grammatical elements the divided units have. The semantic analysis can be performed using semantic matching, rule matching, formula matching, etc. According to one embodiment, the natural language understanding module can determine user intent information by using a natural language recognition database storing linguistic features for identifying the intent of input text data. According to another embodiment, the natural language understanding module can determine user intent information by using a personal language model (PLM) stored in the natural language recognition database.
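The rule-matching variant of the natural language understanding module can be illustrated with a short Python sketch. This is a simplified assumption, not the module of the disclosure: the keyword patterns, the INTENT_RULES table, and the function match_intent are hypothetical, and the sketch omits the syntactic analysis and the natural language recognition database described above.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule table: each intent is matched by one or more regular
# expressions over the lower-cased input text data.
INTENT_RULES: Dict[str, List[str]] = {
    "RESERVATION_CANCELLATION": [r"\bcancel\w*\b"],
    "RESERVATION_CHANGE": [r"\bchange\b", r"\breschedul\w*\b"],
}


def match_intent(text: str) -> Optional[str]:
    """Return the first intent whose rules match the text, or None."""
    lowered = text.lower()
    for intent, patterns in INTENT_RULES.items():
        if any(re.search(pattern, lowered) for pattern in patterns):
            return intent
    return None


print(match_intent("Are you guys going to do this? Won't you cancel it quickly?"))
```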

[0052] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can input input text data regarding a user query into a first LLM and identify at least one of numeric information, context information, or request information corresponding to the user intent information along with user intent information from the input text data.

[0053] According to one embodiment, context information is information necessary to resolve the user intent and may represent information that must be specified in order to act on the user intent information. For example, referring to FIG. 4a, a server (101) inputs second input text data (420) into a first LLM and can identify second user intent information (421) (e.g., reservation cancellation), selected from among a plurality of predetermined categories, and second context information (422) (e.g., reservation number "12345") corresponding to the second input text data (420).

[0054] According to one embodiment, numeric information may represent information that is recognized as a number within the input text data and is associated with the user intent information. According to one embodiment, numeric information may be configured as information separate from context information or as a type of context information, and may be divided into at least one category. For example, referring to FIG. 4b, the server (101) may identify "56789", which corresponds to a reservation number, as numeric information (432) (e.g., booking_number) corresponding to input text data (430) output by the first LLM. The format of the numeric information is not limited to the examples described above, and the numeric information may be implemented in various formats.
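For illustration, the identification of a reservation number as numeric information can be approximated with a regular expression. In the disclosure this value is output by the first LLM; the Python sketch below is a deterministic stand-in, and the function name extract_booking_number and the assumed phrasing "reservation number" followed by digits are hypothetical.

```python
import re
from typing import Optional


def extract_booking_number(text: str) -> Optional[str]:
    """Return the digits following 'reservation number', or None if absent."""
    # Assumption: the number is spoken right after the phrase
    # "reservation number", as in the FIG. 4a and FIG. 4b examples.
    match = re.search(r"reservation number\s+(\d+)", text, flags=re.IGNORECASE)
    return match.group(1) if match else None


print(extract_booking_number(
    "I made a reservation with reservation number 56789, but you made me wait."
))
```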

[0055] According to one embodiment, context information may include at least one of issue type information or customer status information. For example, referring to FIG. 4b, a server (101) inputs input text data (430) into a first LLM to identify context information (433) including "service_delay" as issue type information (e.g., issue_type) and "waiting" as customer status information (e.g., customer_status), along with user intent information (431). The format of the context information is not limited to the examples described above, and the context information may be implemented in various formats.

[0056] According to one embodiment, in addition to being identified by the first LLM, context information may be mapped to the input text data according to a predetermined rule (e.g., an internal response guideline for each situation), and such rules may be stored in advance in the memory (131) of the server (101).

[0057] According to one embodiment, the request information is information that combines at least one of user intent information, numeric information, or context information, and may represent information in which input text data is summarized by the first LLM, and may be information to be displayed to an agent of the agent device (104). For example, referring to FIG. 4b, the server (101) may input input text data (430) into the first LLM to identify "Reservation number 56789 cancellation request (waiting time complaint)" as request information (434) (e.g., structured_request) together with user intent information (431). The format of the request information is not limited to the examples described above, and the request information may be implemented in various formats.

[0058] According to one embodiment, the aforementioned numeric information or request information may be implemented as part of the context information.

[0059] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can input input text data regarding a user query into a first LLM to identify guide information corresponding to the input text data.

[0060] According to one embodiment, the guide information may indicate guide information for an agent to perform a follow-up response in response to user intent information, or guide information for requesting context information necessary for performing the follow-up response. Specifically, the guide information may be composed of a series of information related to data input (e.g., screen recording information over time, mouse click information, text input information, audio input information, etc.). For example, referring to FIG. 4a, the server (101) inputs a first input text data (410) into a first LLM, identifies a first user intent information (411) (e.g., reservation cancellation) corresponding to the first input text data (410) among a plurality of predetermined categories, and if the context information corresponding to the intent information (411) is not confirmed, it may identify guide information (412) for requesting context information that needs to be confirmed (e.g., guide information for requesting a reservation number). For another example, referring to FIG. 4a, the server (101) inputs the second input text data (420) into the first LLM, identifies the second user intent information (421) (e.g., reservation cancellation) and context information (422) (e.g., reservation number "12345") corresponding to the second input text data (420) among a plurality of predetermined categories, and can identify guide information (423) (e.g., guide information for reservation cancellation) for performing a subsequent response.
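The two FIG. 4a cases above differ only in whether the context information needed for the intent has been confirmed. The branching can be sketched in Python as follows. In the disclosure the guide information is output by the first LLM; this sketch replaces that with a fixed lookup purely for illustration, and the REQUIRED_CONTEXT table, the key reservation_number, and the returned strings are hypothetical.

```python
from typing import Dict, List

# Hypothetical table of the context each intent needs before the agent can
# perform the follow-up response.
REQUIRED_CONTEXT: Dict[str, List[str]] = {
    "reservation cancellation": ["reservation_number"],
}


def select_guide(intent: str, context: Dict[str, str]) -> str:
    """Pick a guide: request missing context first, otherwise act on the intent."""
    missing = [key for key in REQUIRED_CONTEXT.get(intent, []) if key not in context]
    if missing:
        return f"request {missing[0]}"
    return f"perform {intent}"


# First input text data (410): no reservation number was given.
print(select_guide("reservation cancellation", {}))
# Second input text data (420): reservation number "12345" was given.
print(select_guide("reservation cancellation", {"reservation_number": "12345"}))
```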

[0061] According to one embodiment, guide information may represent a series of action information in sequence so that an agent can perform a subsequent response in response to user intent information. For example, referring to FIG. 4b, when a server (101) inputs input text data (430) into a first LLM, it may identify a series of action information for "checking waiting status," "checking reservation status," "processing cancellation," and "reviewing waiting time compensation" as guide information (435) (e.g., recommended_actions) corresponding to the input text data (430). In this case, each guide information may have a link (URL) format or be configured in various formats to realize the corresponding action. The format of the guide information is not limited to the examples described above, and the guide information may be implemented in various formats.
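The structured first information of FIG. 4b can be represented as a typed record. The Python sketch below uses the field names mentioned in this document (intent_category, booking_number, issue_type, customer_status, structured_request, recommended_actions); the JSON layout, the nesting of issue_type and customer_status under a "context" key, and the names FirstInfo and parse_first_info are assumptions made for illustration, since the document states that the format may vary.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FirstInfo:
    intent_category: str
    booking_number: Optional[str] = None
    issue_type: Optional[str] = None
    customer_status: Optional[str] = None
    structured_request: Optional[str] = None
    recommended_actions: List[str] = field(default_factory=list)


def parse_first_info(raw_json: str) -> FirstInfo:
    """Parse a hypothetical JSON output of the first LLM into a FirstInfo."""
    data = json.loads(raw_json)
    context = data.get("context", {})  # assumed nesting, not from the source
    return FirstInfo(
        intent_category=data["intent_category"],
        booking_number=data.get("booking_number"),
        issue_type=context.get("issue_type"),
        customer_status=context.get("customer_status"),
        structured_request=data.get("structured_request"),
        recommended_actions=list(data.get("recommended_actions", [])),
    )


raw = json.dumps({
    "intent_category": "RESERVATION_CANCELLATION",
    "booking_number": "56789",
    "context": {"issue_type": "service_delay", "customer_status": "waiting"},
    "structured_request": "Reservation number 56789 cancellation request (waiting time complaint)",
    "recommended_actions": [
        "checking waiting status",
        "checking reservation status",
        "processing cancellation",
        "reviewing waiting time compensation",
    ],
})
info = parse_first_info(raw)
print(info.intent_category, info.booking_number, len(info.recommended_actions))
```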

[0062] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can train the first LLM using at least one of a plurality of input text data, a plurality of user intent information, a plurality of context information, or a plurality of reaction information.

[0063] According to one embodiment, the reaction information may represent reaction information regarding a subsequent response performed by an agent in response to input text data, or reaction information for requesting context information necessary for performing said subsequent response. Specifically, the reaction information may consist of at least one of a series of action information performed by an agent (e.g., screen recording information over time, mouse click information, audio input information, etc.) and response text data written after said series of action information. According to one embodiment, the reaction information may be structured according to a predetermined format. For example, referring to FIG. 5, the server (101) can structure and classify by type messages (e.g., SMS, E-Mail, etc.) transmitted and received between the user device (100) and the agent device (104) through a conversation window area within the display (501) of the agent device (e.g., electronic device (104) of FIG. 1), commands entered by the agent through a reaction input area (530) within the display (501) (e.g., a series of action information for reservation, a series of action information for cancellation of reservation, etc.), voice data transmitted and received between the agent device (104) and the user device (100) (e.g., VoIP phone, etc.), and information viewed by the agent through an information search area (540) within the display (501) (e.g., internal instructions, external server access, user's reservation history, user's purchase pattern).
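Structuring reaction information by type, as described for the screen areas of FIG. 5, can be sketched in Python as a grouping step. The sketch is hypothetical throughout: the AgentEvent record, the source labels, and the type names are illustrative choices, and the document does not define a concrete schema for reaction information.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgentEvent:
    source: str   # which area or channel of the agent device produced the event
    payload: str  # what the agent sent, entered, said, or viewed


# Hypothetical mapping from the areas described for FIG. 5 to event types.
SOURCE_TO_TYPE: Dict[str, str] = {
    "conversation_window": "message",
    "reaction_input": "command",
    "voice_call": "voice",
    "information_search": "lookup",
}


def structure_reaction(events: List[AgentEvent]) -> Dict[str, List[str]]:
    """Group agent events by type while preserving their original order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        event_type = SOURCE_TO_TYPE.get(event.source, "unknown")
        grouped[event_type].append(event.payload)
    return dict(grouped)


reaction = structure_reaction([
    AgentEvent("information_search", "user's reservation history"),
    AgentEvent("reaction_input", "cancel reservation 12345"),
    AgentEvent("conversation_window", "Your reservation has been cancelled."),
])
print(reaction)
```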

[0064] According to one embodiment, the structure and/or format of the reaction information may be the same as or different from the structure and/or format of the guide information; for example, whether the two match may depend on whether the guide information includes answer text data to be recommended to the agent. Additionally, the first LLM can learn the correlation between (1) a plurality of input text data and (2) at least one of a plurality of user intent information, a plurality of context information, or a plurality of reaction information. According to one embodiment, as a learning process of the first LLM, the server (101) can obtain a result value (output data) using the first LLM to which arbitrary weights are assigned, compare the obtained result value with the labeled data or unlabeled data of the learning data, and perform backpropagation according to the error to optimize the weights. Specifically, the learning of the first LLM means a process of training the first LLM based on the learning data and the labeled data or unlabeled data so that the first LLM can determine the output data for the input data. In other words, the first LLM forms rules regarding the above data and makes judgments accordingly.
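The loop described above (start from arbitrary weights, compute an output, compare it with the label, and adjust the weights according to the error) can be sketched with a deliberately tiny stand-in. The example below uses a single linear unit rather than an LLM, for which backpropagation reduces to a direct gradient update; the function name `train`, the learning rate, the epoch count, and the toy data are all assumptions for illustration.

```python
import random
from typing import List, Tuple


def train(samples: List[Tuple[float, float]], lr: float = 0.05,
          epochs: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Fit y = w * x + b by per-sample gradient descent on squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # arbitrary initial weight
    b = rng.uniform(-1.0, 1.0)  # arbitrary initial bias
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x + b    # result value (output data)
            err = pred - y      # comparison with the labeled data
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * x
            b -= lr * err
    return w, b


# Toy labeled data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
```

A real LLM would apply the same principle across many layers and parameters, typically through an automatic differentiation framework rather than hand-written updates.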

[0065] According to one embodiment, the first LLM can be trained to output at least one of user intent information, context information, or guide information when input text data is input. For example, the first LLM can be trained to output user intent information when input text data is input. As another example, the first LLM can be trained to output user intent information and context information associated with said user intent information when input text data is input. As yet another example, the first LLM can be trained to output guide information corresponding to said input text data when input text data is input. A specific operation for outputting said guide information will be described in detail later through FIG. 6.

[0066] According to one embodiment, a server (101) may input input text data into a single LLM to identify at least one of user intent information, context information, or guide information, or input text data into a first LLM composed of at least two sub-models to collect data output from each sub-model to identify at least one of user intent information, context information, or guide information. For example, the first LLM may be implemented as a combination of at least one of a sub-classification model for classifying user intent information from input text data, a sub-extraction model for extracting context information, or a sub-creation model for generating guide information. The implementation form of the first LLM is not limited to the examples described above and may be implemented as a combination of various sub-modular artificial intelligence models to individually optimize performance for identifying each piece of information.
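One possible way to picture the sub-model arrangement is a thin wrapper that calls a classification, an extraction, and a generation component and collects their outputs. This is a sketch under stated assumptions: `ComposedFirstLLM`, `FirstLLMOutput`, and the three `toy_*` functions are hypothetical names, and the keyword-based stand-ins below are not trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FirstLLMOutput:
    intent: str               # user intent information
    context: Dict[str, str]   # context information
    guide: List[str]          # guide information


class ComposedFirstLLM:
    """Hypothetical wrapper that aggregates the outputs of three sub-models."""

    def __init__(
        self,
        classify: Callable[[str], str],
        extract: Callable[[str], Dict[str, str]],
        generate: Callable[[str, str, Dict[str, str]], List[str]],
    ) -> None:
        self.classify = classify
        self.extract = extract
        self.generate = generate

    def run(self, text: str) -> FirstLLMOutput:
        intent = self.classify(text)
        context = self.extract(text)
        guide = self.generate(text, intent, context)
        return FirstLLMOutput(intent, context, guide)


# Toy stand-ins for the sub-models; real sub-models would be trained networks.
def toy_classify(text: str) -> str:
    return "cancel_reservation" if "cancel" in text.lower() else "other"


def toy_extract(text: str) -> Dict[str, str]:
    digits = "".join(ch for ch in text if ch.isdigit())
    return {"reservation_number": digits} if digits else {}


def toy_generate(text: str, intent: str, context: Dict[str, str]) -> List[str]:
    if intent == "cancel_reservation":
        return ["checking reservation status", "processing cancellation"]
    return ["reviewing the query manually"]


first_llm = ComposedFirstLLM(toy_classify, toy_extract, toy_generate)
result = first_llm.run("Please cancel reservation 56789")
```

Passing the sub-models in as callables keeps each one replaceable, which matches the stated goal of optimizing each piece of information individually.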

[0067] According to one embodiment, the LLMs described in the present disclosure can generate output data using data related to previous conversation history until the conversation session ends.

[0068]

[0069] In operation 305, according to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) can identify response text data for input text data by inputting user intent information, guide information, and reaction information to said guide information into a second LLM.

[0070] According to one embodiment, referring to FIG. 4a, the server (101) may receive from the agent device (104) first reaction information (413) (e.g., a series of action information for requesting a reservation number) performed by the agent in response to the first input text data (410). In another example, referring to FIG. 4a, the server (101) may receive from the agent device (104) second reaction information (424) (e.g., a series of action information for canceling a reservation) performed by the agent in response to the second input text data (420).

[0071] According to one embodiment, the format of the reaction information may be implemented as a series of information related to data verification or input (e.g., screen display information over time, mouse click information, text input information, audio input information, etc.). According to one embodiment, the reaction information may include log information regarding the operation of the agent device (104) performed by the agent, and the log information may include at least one of action information, time information, or result information. For example, referring to FIG. 4b, the server (101) can receive, from the agent device (104), a series of reaction information (436) performed by an agent in response to input text data (430), comprising: (i) first reaction information including first action information (e.g., "reservation_check"), first time information (e.g., "2024-12-09T15:20:00"), and first result information (e.g., "Reservation number 56789 confirmed"); (ii) second reaction information including second action information (e.g., "cancellation_process"), second time information (e.g., "2024-12-09T15:20:30"), and second result information (e.g., "Cancellation processing completed"); and (iii) third reaction information including third action information (e.g., "compensation_applied"), third time information (e.g., "2024-12-09T15:21:00"), and third result information (e.g., "issuance of a 10% discount coupon for the next visit").
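The log information in the FIG. 4b example might be parsed into typed entries as in the following sketch. The values are those quoted above; the dictionary keys `action`, `time`, and `result`, the class `ReactionEntry`, and the helper `parse_reaction_log` are assumed names, since the disclosure does not fix a wire format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class ReactionEntry:
    action: str     # action information
    time: datetime  # time information
    result: str     # result information


def parse_reaction_log(raw: List[Dict[str, str]]) -> List[ReactionEntry]:
    """Parse raw log records and return them in chronological order."""
    entries = [
        ReactionEntry(r["action"], datetime.fromisoformat(r["time"]), r["result"])
        for r in raw
    ]
    return sorted(entries, key=lambda e: e.time)


raw_log = [
    {"action": "reservation_check", "time": "2024-12-09T15:20:00",
     "result": "Reservation number 56789 confirmed"},
    {"action": "cancellation_process", "time": "2024-12-09T15:20:30",
     "result": "Cancellation processing completed"},
    {"action": "compensation_applied", "time": "2024-12-09T15:21:00",
     "result": "issuance of a 10% discount coupon for the next visit"},
]

reaction_info = parse_reaction_log(raw_log)
# Time between the first and last agent action, in seconds.
elapsed_seconds = (reaction_info[-1].time - reaction_info[0].time).total_seconds()
```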

[0072] According to one embodiment, with reference to FIG. 5, an agent can input reaction information in a reaction input area (530) by referring to guide information displayed through a guide information display area (520) within a display (501) of an agent device (104). According to one embodiment, the agent device (104) can display guide information (435) as text through the guide information display area (520) or as other forms of information based on said text (e.g., recorded screen display, mouse click / text input guide, voice message output, etc.).

[0073] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) may input user intent information and guide information corresponding to input text data and reaction information to said guide information into a second LLM (Large Language Model). According to one embodiment, the second LLM may be stored in the memory (131) of the server (101) or may be stored in a separate external server rather than inside the server (101) and linked to the server (101).

[0074] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) inputs at least one of input text data, user intent information, numeric information, context information, request information, guide information, or reaction information to said guide information into a second LLM and can identify response text data for input text data determined by the second LLM. For example, referring to FIG. 4a, the server (101) can input a first input text data (410), a first user intent information (411), a first guide information (412), and a first reaction information (413) into the second LLM and can identify a first response text data (415) for the first input text data (410) output from the second LLM. For another example, referring to FIG. 4a, the server (101) can input second input text data (420), second user intent information (421), context information (422), second guide information (423), and second reaction information (424) into the second LLM, and can identify second response text data (425) for the second input text data (420) output from the second LLM. For another example, referring to FIG. 4b, the server (101) can input at least one of input text data (430), user intent information (431), numeric information (432), context information (433), request information (434), guide information (435), or reaction information (436) into the second LLM, and can identify response text data (437) for the input text data (430) output from the second LLM.
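Because the second LLM may receive "at least one of" several kinds of information, one way to assemble its input is to serialize only the fields that are present. The sketch below is an assumption about how that could look: the function `build_second_llm_input`, its parameter names, the JSON encoding, and the sample values are illustrative and are not specified in the disclosure.

```python
import json
from typing import Any, Dict, List, Optional


def build_second_llm_input(
    input_text: Optional[str] = None,
    user_intent: Optional[str] = None,
    numeric_info: Optional[Dict[str, Any]] = None,
    context_info: Optional[Dict[str, Any]] = None,
    request_info: Optional[str] = None,
    guide_info: Optional[List[str]] = None,
    reaction_info: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Serialize whichever fields are available into one JSON payload."""
    fields = {
        "input_text": input_text,
        "user_intent": user_intent,
        "numeric_info": numeric_info,
        "context_info": context_info,
        "request_info": request_info,
        "guide_info": guide_info,
        "reaction_info": reaction_info,
    }
    present = {k: v for k, v in fields.items() if v is not None}
    if not present:
        raise ValueError("at least one field is required")
    return json.dumps(present, ensure_ascii=False, sort_keys=True)


# Placeholder values for illustration only.
payload = build_second_llm_input(
    input_text="I want to cancel my reservation",
    user_intent="cancel_reservation",
    guide_info=["checking reservation status", "processing cancellation"],
)
```

The resulting string could then be embedded in whatever prompt or request format the second LLM expects.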

[0075] According to various embodiments, a server (101) (e.g., processor (121) of FIG. 1) may train a second LLM using at least one of a plurality of input text data, a plurality of user intent information, a plurality of context information, a plurality of guide information, or a plurality of reaction information. The plurality of reaction information used for training the second LLM may include a series of behavior information of a plurality of agents and response text data of a plurality of agents.

[0076] According to one embodiment, the second LLM can learn the correlation between (1) at least one of a plurality of input text data, a plurality of user intent information, a plurality of context information, a plurality of guide information, or a series of behavioral information of a plurality of agents, and (2) response text data of a plurality of agents. According to one embodiment, the server (101) can perform the learning process of the second LLM by obtaining a result value (output data) using the second LLM to which arbitrary weights are assigned, comparing the obtained result value with the labeled data or unlabeled data of the learning data, and performing backpropagation according to the error to optimize the weights. Specifically, the learning of the second LLM refers to a process of training the second LLM based on the learning data and the labeled data or unlabeled data so that the second LLM can determine the output data for the input data. That is, the second LLM forms rules regarding said data and makes judgments accordingly.

[0077] According to one embodiment, the second LLM can be trained to output response text data for input text data when at least one of input text data, user intent information, context information, guide information, or a series of behavioral information of an agent is input.

[0078]

[0079] FIG. 6 is an example diagram illustrating the operation method of a first LLM trained to output guide information according to various embodiments.

[0080] According to various embodiments, a server (e.g., server (101) of FIG. 1) can train a first LLM to output guide information corresponding to the input text data when input text data is input. According to one embodiment, the first LLM can be trained to output guide information corresponding to the input text data when at least one of user intent information corresponding to the input text data or context information associated with said user intent information is input together with the input text data.

[0081] According to various embodiments, the first LLM can learn the correlation between (1) multiple input text data and (2) multiple agent reaction information. According to one embodiment, the server (101) can perform the learning process of the first LLM by obtaining a result value (output data) using the first LLM to which arbitrary weights are assigned, comparing the obtained result value with the labeled data or unlabeled data of the learning data, and performing backpropagation according to the error to optimize the weights. Specifically, the learning of the first LLM refers to a process of training the first LLM based on the learning data and the labeled data or unlabeled data so that the first LLM can determine the output data for the input data. That is, the first LLM forms a rule and makes a judgment regarding the data.

[0082] According to various embodiments, the first LLM may output guide information corresponding to input text data and then collect feedback from a user device (e.g., user device (100) of FIG. 1) regarding reaction information performed by an agent, thereby reflecting the data quality of the reaction information. In this process, the server (101) may use various algorithms (e.g., loss function, gradient descent, normalization technique, etc.) to optimize the first LLM. Specifically, the data quality of the reaction information may be determined according to the weight of user feedback received from the user device (100) and reflected as training data for the first LLM.

[0083] A server (101) according to one embodiment may classify feedback received from a user device (100) by type and assign weights differentially according to predefined criteria based on the reliability and importance of each type. According to one embodiment, the server (101) may assign a relatively high absolute weight to explicit feedback in which the user directly expresses their intention. For example, the server (101) may assign weights differentially based on a specific satisfaction level selected by the user among a plurality of preset options, binary responses such as whether a problem has been solved, or the results of sentiment analysis on text directly entered by the user.

[0084] According to another embodiment, the server (101) may assign a weight of a relatively lower absolute value to implicit feedback, which is indirect information that can be inferred from the user's behavioral patterns, compared to explicit feedback. For example, the server (101) may assign weights based on results calculated by analyzing the user's behavior, such as the conversation termination pattern after the agent's response, whether the same or similar questions are repeated, or whether the task guided by the agent is actually performed.

[0085] As described above, the server (101) can determine the final data quality of the reaction information by combining weights calculated from various types of feedback and continuously optimize system performance by reflecting this in the learning process of the first LLM.
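The combination of explicit and implicit feedback into a single data quality value could be sketched as a weighted average. The specific weights (1.0 and 0.4), the score range, and the names `Feedback`, `TYPE_WEIGHTS`, and `data_quality` are assumptions for this example; the disclosure states only that explicit feedback receives a higher weight than implicit feedback.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feedback:
    kind: str     # "explicit" or "implicit"
    score: float  # signal in [-1.0, 1.0]; positive means satisfied


# Assumed weights: explicit feedback counts more than implicit feedback.
TYPE_WEIGHTS = {"explicit": 1.0, "implicit": 0.4}


def data_quality(feedback: List[Feedback]) -> float:
    """Weighted average of feedback scores; 0.0 when there is no feedback."""
    total_weight = sum(TYPE_WEIGHTS[f.kind] for f in feedback)
    if total_weight == 0:
        return 0.0
    weighted = sum(TYPE_WEIGHTS[f.kind] * f.score for f in feedback)
    return weighted / total_weight


signals = [
    Feedback("explicit", 1.0),   # e.g. user marked the problem as solved
    Feedback("implicit", -0.5),  # e.g. user repeated a similar question
]
quality = data_quality(signals)
```

A value computed this way could be used to weight or filter reaction information before it is added to the training data for the first LLM.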

[0086]

[0087] According to various embodiments, a server for analyzing user queries using an LLM and assisting a counselor's consultation service includes a communication module and a processor, wherein the processor is configured to identify input text data regarding a user query, input the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data, and input the user intent information, the guide information, and reaction information regarding the guide information into a second LLM to identify answer text data regarding the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and the second LLM is trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

[0088] According to various embodiments, the processor may be configured to input the input text data into the first LLM to identify context information along with the user intent information from the input text data.

[0089] According to various embodiments, the processor may be configured to input the input text data into the first LLM, identify the user intent information corresponding to the input text data among a plurality of predetermined categories, and if the context information corresponding to the user intent information is not identified, identify guide information for requesting the context information.
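The fallback described here (request the context information when it has not been identified) can be sketched as a lookup of required context per intent category. The category names, the `REQUIRED_CONTEXT` table, and the function `guide_for` are hypothetical; the reservation number is used because the description mentions an agent requesting it.

```python
from typing import Dict, List

# Assumed mapping from intent category to the context it needs.
REQUIRED_CONTEXT: Dict[str, List[str]] = {
    "cancel_reservation": ["reservation_number"],
    "general_inquiry": [],
}


def guide_for(intent: str, context: Dict[str, str]) -> List[str]:
    """Return guide steps, asking for missing context before anything else."""
    missing = [k for k in REQUIRED_CONTEXT.get(intent, []) if k not in context]
    if missing:
        # Context not identified: guide the agent to request it from the user.
        return [f"request {name} from the user" for name in missing]
    return [f"proceed with {intent}"]


without_context = guide_for("cancel_reservation", {})
with_context = guide_for("cancel_reservation", {"reservation_number": "56789"})
```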

[0090] According to various embodiments, the reaction information may consist of a series of action information performed by an agent device in response to the user query and answer text data entered by the agent device after the series of action information.

[0091] According to various embodiments, the processor may be configured to input the input text data, the user intent information, the context information, the guide information, and the reaction information into the second LLM to identify the response text data for the input text data.

[0092] According to various embodiments, a method of operation of a server for analyzing a user query using an LLM and assisting a counselor's consultation service includes: an operation of identifying input text data regarding a user query; an operation of inputting the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data; and an operation of inputting the user intent information, the guide information, and reaction information regarding the guide information into a second LLM to identify answer text data regarding the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and the second LLM is trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

[0093] According to various embodiments, the operation of identifying the user intent information and the guide information may include the operation of inputting the input text data into the first LLM and identifying context information together with the user intent information from the input text data.

[0094]

[0095] As used in this document, the terms “module” or “part” include a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A “module” or “part” may be a component formed integrally or a minimum unit or part thereof that performs one or more functions. A “module” or “part” may be implemented mechanically or electronically and may include, for example, an application-specific integrated circuit (ASIC) chip, field-programmable gate arrays (FPGAs), or a programmable logic device, known or to be developed, that performs certain operations, and may be executed by a processor (120). At least part of the device (e.g., modules or functions thereof) or method (e.g., operations) according to various embodiments may be implemented as instructions stored in a computer-readable storage medium (e.g., memory (130)) in the form of a program module. When the instructions are executed by a processor (e.g., processor (120)), the processor may perform a function corresponding to the instructions. Computer-readable recording media may include a hard disk, a floppy disk, a magnetic medium (e.g., magnetic tape), an optical recording medium (e.g., CD-ROM, DVD), a magneto-optical medium (e.g., floptical disk), built-in memory, etc. Instructions may include code generated by a compiler or code that can be executed by an interpreter. A module or program module according to various embodiments may include at least one of the aforementioned components, some of which may be omitted, or may additionally include other components. Operations performed by a module, program module, or other components according to various embodiments may be executed sequentially, in parallel, iteratively, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

[0096] Furthermore, the embodiments disclosed in this document are presented for the purpose of explaining and understanding the disclosed technical content and are not intended to limit the scope of this disclosure. Accordingly, the scope of this disclosure should be interpreted to include all modifications or various other embodiments based on the technical concept of this disclosure.

Claims

1. A server for analyzing user queries using an LLM and supporting agent consultation services, the server comprising: a communication module; and a processor, wherein the processor is configured to: identify input text data regarding a user query; input the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data; and input the user intent information, the guide information, and reaction information to the guide information into a second LLM to identify response text data for the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and wherein the second LLM is trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

2. The server of claim 1, wherein the processor is configured to input the input text data into the first LLM to identify context information along with the user intent information from the input text data.

3. The server of claim 2, wherein the processor is configured to input the input text data into the first LLM, identify the user intent information corresponding to the input text data among a plurality of predetermined categories, and, if the context information corresponding to the user intent information is not identified, identify guide information for requesting the context information.

4. The server of claim 3, wherein the reaction information consists of a series of action information performed by an agent device in response to the user query and answer text data input by the agent device after the series of action information.

5. The server of claim 2, wherein the processor is configured to identify the answer text data for the input text data by inputting the input text data, the user intent information, the context information, the guide information, and the reaction information into the second LLM.

6. A method of operation of a server for analyzing user queries using an LLM and assisting agent consultation services, the method comprising: an operation of identifying input text data regarding a user query; an operation of inputting the input text data into a first LLM (Large Language Model) to identify user intent information and guide information corresponding to the input text data; and an operation of inputting the user intent information, the guide information, and reaction information to the guide information into a second LLM to identify response text data for the input text data, wherein the first LLM is trained based on a plurality of input text data, a plurality of user intent information, and a plurality of reaction information, and wherein the second LLM is trained based on one or more input text data, one or more user intent information, one or more guide information, one or more reaction information, and one or more answer text data.

7. The method of claim 6, wherein the operation of identifying the user intent information and the guide information includes an operation of inputting the input text data into the first LLM and identifying context information together with the user intent information from the input text data.