Electronic device and method for modifying and outputting received message
The electronic device analyzes message associations to generate and output audio signals, addressing the challenge of understanding reply messages in restricted contexts, thereby improving user experience.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
- Filing Date: 2025-12-10
- Publication Date: 2026-06-25
Description
Electronic device and method for modifying and outputting a received message
[0001] This document relates to an electronic device, and, for example, to a method in which the electronic device modifies and outputs a received message.
[0002] With the commercialization of voice assistant technology that provides various services based on user voice input, voice assistant functions are being provided on electronic devices such as smartphones and tablet PCs. Voice assistants on electronic devices can automatically recognize various input data such as text, images, and videos using AI (artificial intelligence) technology, and can provide intelligent services that provide information related to the input data or related services in response to user requests.
[0003] An electronic device can check and output messages received from the other party using an onboard voice assistant. For example, the voice assistant can check messages received from the other party via a messaging application that provides message transmission and reception with other users (e.g., a native messaging application or a 3rd-party messaging application), and can output them on the display screen and / or as audio through a speaker or an external audio device (e.g., earbuds).
[0004] When a user sends a specific message on a messaging application, the other party may send a reply message in response. During a conversation through a messaging application, there may be cases where it is difficult for the user to understand the exact content based solely on the reply message. For example, in situations where the screen is locked, the user is wearing an audio device and not looking at the screen, or the output of the voice assistant screen is restricted due to the execution of a specific application (e.g., navigation), it may be difficult for the user to determine which message the reply is responding to based only on the content of the reply message itself.
[0005] An electronic device according to various embodiments of this disclosure (or specification, invention) may include a speaker, a communication circuit, a memory, and at least one processor.
[0006] According to one embodiment, the memory may store instructions that, when executed by the at least one processor, cause the electronic device to: receive a first message from an external device through the communication circuit; check whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determine at least one associated message by analyzing the association between the first message and at least some reference messages among the messages transmitted to and received from the external device; generate a second message based on the first message and the determined at least one associated message; and convert the generated second message into an audio signal and output it through the speaker or an external audio device wirelessly connected via the communication circuit.
[0007] A method performed by an electronic device according to various embodiments of the present document may include: receiving a first message from an external device; checking whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association with at least some of the reference messages among the messages transmitted and received with the external device with respect to the first message; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting it.
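The sequence of operations in this method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Message` fields, the `"me"` sender marker, the question-mark heuristic for finding an associated message, and the wording of the generated second message are all assumptions made for illustration. The text-to-speech step is represented by a caller-supplied `speak` function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str       # "me" marks messages sent by the device user (assumption)
    text: str
    timestamp: float


def is_reply(first: Message, history: List[Message]) -> bool:
    """Hypothetical check: an incoming message counts as a reply when the
    user sent at least one earlier message in the same conversation."""
    return first.sender != "me" and any(
        m.sender == "me" and m.timestamp < first.timestamp for m in history
    )


def find_associated(first: Message, history: List[Message]) -> Optional[Message]:
    """Toy association rule: prefer the latest earlier question from the user,
    otherwise the latest earlier message from the user."""
    sent = [m for m in history if m.sender == "me" and m.timestamp < first.timestamp]
    sent.sort(key=lambda m: m.timestamp)
    questions = [m for m in sent if m.text.rstrip().endswith("?")]
    candidates = questions or sent
    return candidates[-1] if candidates else None


def build_second_message(first: Message, associated: Message) -> str:
    """Combine the reply with the message it most likely answers."""
    return f'{first.sender} replied "{first.text}" to your message "{associated.text}"'


def handle_incoming(first: Message, history: List[Message],
                    speak: Callable[[str], None]) -> str:
    """Run the steps in order and return the text that was handed to `speak`."""
    text = first.text
    if is_reply(first, history):
        associated = find_associated(first, history)
        if associated is not None:
            text = build_second_message(first, associated)
    speak(text)  # stand-in for audio conversion and output
    return text
```

In this sketch a bare "Yes" is read out together with the earlier question it answers, which is the situation the disclosure targets when the user cannot look at the screen.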
[0008] A computer-readable non-transitory recording medium according to various embodiments of the present document may store instructions for performing operations including: receiving a first message from an external device; checking whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association between the first message and at least some reference messages among the messages transmitted to and received from the external device; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting it.
[0009] According to various embodiments of the present document, it is possible to provide an electronic device that, in a hands-free state, converts the content of a reply message received through a message application into a form the user can easily understand and provides it, as well as a method for modifying and outputting a received message.
[0010] FIG. 1 is a block diagram of an electronic device in a network environment according to various embodiments.
[0011] FIG. 2 is a block diagram showing an integrated intelligence system according to one embodiment.
[0012] FIG. 3 is a diagram showing the form in which relationship information between a concept and an action is stored in a database according to one embodiment.
[0013] FIG. 4 is a block diagram of an electronic device according to various embodiments.
[0014] FIG. 5 is a block diagram of a voice assistant client of an electronic device according to one embodiment.
[0015] FIG. 6 is a block diagram of a voice assistant server according to one embodiment.
[0016] FIG. 7 is a flowchart of a method in which an electronic device according to one embodiment modifies and outputs a received message.
[0017] FIG. 8 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0018] FIG. 9 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0019] FIG. 10 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0020] FIG. 11 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0021] FIG. 12 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0022] FIG. 13 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0023] Hereinafter, embodiments of this document are described in detail with reference to the drawings so that those skilled in the art can easily implement them. However, this document may be implemented in various different forms and is not limited to the embodiments described herein. In relation to the description of the drawings, identical or similar reference numerals may be used for identical or similar components. Additionally, in the drawings and related descriptions, descriptions of well-known functions and configurations may be omitted for clarity and brevity.
[0024] FIG. 1 is a block diagram of an electronic device (101) in a network environment (100) according to various embodiments.
[0025] Referring to FIG. 1, in a network environment (100), an electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through a server (108). According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).
[0026] The processor (120) can control at least one other component (e.g., hardware or software component) of the electronic device (101) connected to the processor (120) by executing software (e.g., program (140)), for example, and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., sensor module (176) or communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting data in non-volatile memory (134). According to one embodiment, the processor (120) may include a main processor (121) (e.g., central processing unit or application processor) or an auxiliary processor (123) that can operate independently or together with it (e.g., graphics processing unit, neural processing unit (NPU), image signal processor, sensor hub processor, or communication processor). For example, if the electronic device (101) includes a main processor (121) and an auxiliary processor (123), the auxiliary processor (123) may be configured to use lower power than the main processor (121) or to be specialized for a designated function. The auxiliary processor (123) may be implemented separately from the main processor (121) or as part thereof.
[0027] The auxiliary processor (123) may control at least some of the functions or states associated with at least one component of the electronic device (101) (e.g., display module (160), sensor module (176), or communication module (190)) on behalf of the main processor (121) while the main processor (121) is in an inactive (e.g., sleep) state, or together with the main processor (121) while the main processor (121) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (123) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (180) or communication module (190)). According to one embodiment, the auxiliary processor (123) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (101) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (108)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. The artificial intelligence model may additionally or alternatively include a software structure in addition to the hardware structure.
[0028] The memory (130) can store various data used by at least one component of the electronic device (101) (e.g., processor (120) or sensor module (176)). The data may include, for example, input data or output data for software (e.g., program (140)) and related commands. The memory (130) may include volatile memory (132) or non-volatile memory (134).
[0029] The program (140) may be stored as software in memory (130) and may include, for example, an operating system (142), middleware (144), or an application (146).
[0030] The input module (150) can receive commands or data to be used for a component of the electronic device (101) (e.g., processor (120)) from outside the electronic device (101) (e.g., user). The input module (150) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).
[0031] The sound output module (155) can output a sound signal to the outside of the electronic device (101). The sound output module (155) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.
[0032] The display module (160) can visually provide information to the outside (e.g., a user) of the electronic device (101). The display module (160) may include, for example, a display, a holographic device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (160) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.
[0033] The audio module (170) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (170) can acquire sound through the input module (150) or output sound through the sound output module (155) or an external electronic device (e.g., electronic device (102)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (101).
[0034] The sensor module (176) can detect the operating state of the electronic device (101) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (176) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.
[0035] The interface (177) may support one or more specified protocols that can be used for the electronic device (101) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (102)). According to one embodiment, the interface (177) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
[0036] The connection terminal (178) may include a connector through which the electronic device (101) can be physically connected to an external electronic device (e.g., electronic device (102)). According to one embodiment, the connection terminal (178) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
[0037] The haptic module (179) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module (179) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.
[0038] The camera module (180) can capture still images and video. According to one embodiment, the camera module (180) may include one or more lenses, image sensors, image signal processors, or flashes.
[0039] The power management module (188) can manage the power supplied to the electronic device (101). According to one embodiment, the power management module (188) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).
[0040] The battery (189) can supply power to at least one component of the electronic device (101). According to one embodiment, the battery (189) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.
[0041] The communication module (190) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between an electronic device (101) and an external electronic device (e.g., electronic device (102), electronic device (104), or server (108)), and the performance of communication through the established communication channel. The communication module (190) may include one or more communication processors that operate independently of the processor (120) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (190) may include a wireless communication module (192) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (194) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (104) through a first network (198) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (199) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (192) can identify or authenticate the electronic device (101) within a communication network such as the first network (198) or the second network (199) using subscriber information (e.g., International Mobile Subscriber Identifier (IMSI)) stored in the subscriber identification module (196).
[0042] The wireless communication module (192) can support a 5G network following a 4G network and next-generation communication technology, for example, new radio (NR) access technology. The NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (192) can support a high-frequency band (e.g., mmWave band) to achieve a high data transmission rate, for example. The wireless communication module (192) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large-scale antenna. The wireless communication module (192) can support various requirements specified in the electronic device (101), an external electronic device (e.g., electronic device (104)), or a network system (e.g., second network (199)). According to one embodiment, the wireless communication module (192) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for realizing URLLC.
[0043] The antenna module (197) can transmit a signal or power to the outside (e.g., an external electronic device) or receive a signal or power from the outside. According to one embodiment, the antenna module (197) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (197) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network (198) or the second network (199), may be selected from the plurality of antennas, for example, by the communication module (190). A signal or power may be transmitted or received between the communication module (190) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (197).
[0044] According to various embodiments, the antenna module (197) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.
[0045] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.
[0046] According to one embodiment, commands or data may be transmitted or received between the electronic device (101) and an external electronic device (104) through a server (108) connected to a second network (199). Each of the external electronic devices (102 or 104) may be the same type of device as the electronic device (101) or a different type. According to one embodiment, all or part of the operations performed on the electronic device (101) may be performed on one or more of the external electronic devices (102, 104, or 108). For example, if the electronic device (101) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (101) may request one or more external electronic devices to perform at least part of the function or service instead of, or in addition to, performing the function or service itself. The one or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (101). The electronic device (101) may provide the result, as is or after additional processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (101) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In one embodiment, the external electronic device (104) may include an Internet of Things (IoT) device. The server (108) may be an intelligent server using machine learning and / or neural networks. According to one embodiment, the external electronic device (104) or the server (108) may be included within the second network (199). The electronic device (101) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
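The delegation pattern described in this paragraph can be sketched roughly as follows. The function and parameter names are hypothetical, and the fallback to local execution when the external device is unreachable is an added assumption that the text does not state.

```python
from typing import Callable, Optional


def run_service(request: str,
                local: Callable[[str], str],
                remote: Optional[Callable[[str], str]] = None,
                post_process: Optional[Callable[[str], str]] = None) -> str:
    """Perform a function locally or ask an external device or server to do it.

    `remote` stands in for a request to an external electronic device; the
    result is returned as is, or after additional processing by `post_process`.
    """
    if remote is not None:
        try:
            result = remote(request)
        except ConnectionError:
            # Assumption (not in the source): fall back to on-device execution.
            result = local(request)
    else:
        result = local(request)
    return post_process(result) if post_process is not None else result
```

For example, `run_service("hello", local=str.lower, remote=str.upper)` returns the remotely produced result unchanged, while passing `post_process` models the case where the device further processes the result before responding.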
[0047] FIG. 2 is a block diagram showing an integrated intelligence system according to various embodiments.
[0048] Referring to FIG. 2, according to one embodiment, the integrated intelligence system may include an electronic device (210) (e.g., the electronic device (101) of FIG. 1), an intelligent server (230) (e.g., the server (108) of FIG. 1), and a service server (250) (e.g., the server (108) of FIG. 1).
[0049] According to one embodiment, the electronic device (210) may be a terminal device (or electronic device) capable of connecting to the Internet, and may be, for example, a mobile phone, a smartphone, a PDA (personal digital assistant), a laptop computer, a TV, a white goods appliance, a wearable device, a head-mounted display (HMD), or a smart speaker.
[0050] According to the illustrated embodiment, the electronic device (210) may include a communication interface (213) (e.g., interface (177) of FIG. 1), a microphone (212) (e.g., input module (150) of FIG. 1), a speaker (216) (e.g., sound output module (155) of FIG. 1), a display module (211) (e.g., display module (160) of FIG. 1), a memory (215) (e.g., memory (130) of FIG. 1), or a processor (214) (e.g., processor (120) of FIG. 1). The listed components may be operatively or electrically connected to each other. The electronic device (210) may include at least some of the configurations and / or functions of the electronic device (101) of FIG. 1.
[0051] According to one embodiment, the communication interface (213) may be configured to be connected to an external device to transmit and receive data. According to one embodiment, the microphone (212) may receive sound (e.g., user speech) and convert it into an electrical signal. According to one embodiment, the speaker (216) may output the electrical signal as sound (e.g., voice).
[0052] According to one embodiment, the display module (211) may be configured to display an image or video. According to one embodiment, the display module (211) may also display a graphic user interface (GUI) of an app (or application program) being executed. The display module (211) of one embodiment may receive touch input through a touch sensor. For example, the display module (211) may receive text input through a touch sensor in an image keyboard area displayed within the display module (211).
[0053] According to one embodiment, memory (215) can store a client module (218), an SDK (software development kit) (217), and a plurality of apps (219a, 219b). The client module (218) and the SDK (217) can form a framework (or solution program) for performing general-purpose functions. Additionally, the client module (218) or the SDK (217) can form a framework for processing user input (e.g., voice input, text input, touch input).
[0054] According to one embodiment, the plurality of apps (219a, 219b) stored in memory (215) may be programs for performing a designated function. According to one embodiment, the plurality of apps may include a first app (219a) and a second app (219b). According to one embodiment, each of the plurality of apps (219a, 219b) may include a plurality of operations for performing a designated function. For example, the apps (219a, 219b) may include an alarm app, a message app, and / or a schedule app. According to one embodiment, the plurality of apps (219a, 219b) may be executed by a processor (214) to sequentially execute at least some of the plurality of operations.
[0055] According to one embodiment, the processor (214) can control the overall operation of the electronic device (210). For example, the processor (214) can be electrically connected to a communication interface (213), a microphone (212), a speaker (216), and a display module (211) to perform a specified operation.
[0056] According to one embodiment, the processor (214) may also perform a specified function by executing a program stored in the memory (215). For example, the processor (214) may execute at least one of the client module (218) or the SDK (217) to perform the following operations for processing user input. The processor (214) may, for example, control the operation of a plurality of apps (219a, 219b) through the SDK (217). The following operations described as the operation of the client module (218) or the SDK (217) may be operations performed by the execution of the processor (214).
[0057] According to one embodiment, the client module (218) can receive user input. For example, the client module (218) can receive a voice signal corresponding to a user utterance detected through a microphone (212). Alternatively, the client module (218) can receive touch input detected through a display module (211). Alternatively, the client module (218) can receive text input detected through a keyboard or a virtual keyboard. In addition, various forms of user input detected through an input module included in the electronic device (210) or an input module connected to the electronic device (210) can be received. The client module (218) can transmit the received user input to an intelligent server (230). Along with the received user input, the client module (218) can transmit status information of the electronic device (210) to the intelligent server (230). The status information may be, for example, execution status information of an app.
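As a rough illustration of the client module forwarding user input together with device status information, the following sketch builds a single request payload. The JSON layout and every field name (`kind`, `payload`, `foreground_app`, `screen_locked`) are assumptions for illustration; the document does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserInput:
    kind: str      # e.g., "voice", "touch", or "text" (hypothetical labels)
    payload: str   # recognized speech, typed text, or a touch target


@dataclass
class DeviceStatus:
    foreground_app: str   # execution status information of an app
    screen_locked: bool


def build_request(user_input: UserInput, status: DeviceStatus) -> str:
    """Bundle the user input and status information into one request body."""
    return json.dumps(
        {"input": asdict(user_input), "status": asdict(status)},
        sort_keys=True,
    )
```

Sending the status alongside the input lets the server take context, such as a running navigation app, into account when producing a result.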
[0058] According to one embodiment, the client module (218) can receive a result corresponding to the received user input. For example, the client module (218) can receive a result corresponding to the received user input if the intelligent server (230) can produce a result corresponding to the received user input. The client module (218) can display the received result on the display module (211). Additionally, the client module (218) can output the received result as audio through the speaker (216).
[0059] According to one embodiment, the client module (218) may receive a plan corresponding to the received user input. The client module (218) may display the results of executing a plurality of actions of the app according to the plan on the display module (211). The client module (218) may, for example, sequentially display the results of executing a plurality of actions on the display module (211) and output audio through the speaker (216). The electronic device (210) may, for another example, display only some of the results of executing a plurality of actions (e.g., the result of the last action) on the display module (211) and output audio through the speaker (216).
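The two display options in this paragraph (show every action result in sequence, or only some of them, such as the last) can be sketched as below. Representing a plan as a list of zero-argument callables is an assumption made only for this example.

```python
from typing import Callable, List


def execute_plan(actions: List[Callable[[], str]], last_only: bool = False) -> List[str]:
    """Run the plan's actions in order and return the results to present.

    With last_only=False every result is returned in execution order;
    with last_only=True only the result of the final action is returned.
    """
    results = [action() for action in actions]  # sequential execution
    return results[-1:] if last_only else results
```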
[0060] According to one embodiment, the client module (218) may receive a request from the intelligent server (230) to obtain information necessary to produce a result corresponding to a voice input. According to one embodiment, the client module (218) may transmit the necessary information to the intelligent server (230) in response to the request.
[0061] According to one embodiment, the client module (218) can transmit result information of executing a plurality of operations according to a plan to the intelligent server (230). The intelligent server (230) can use the result information to confirm that the received user input has been processed correctly.
[0062] According to one embodiment, the client module (218) may include a voice recognition module. According to one embodiment, the client module (218) may recognize voice input that performs a limited function through the voice recognition module. For example, the client module (218) may execute an intelligent app for processing voice input to perform an organic action through a specified input (e.g., Wake Up!).
[0063] According to one embodiment, an intelligent server (230) can receive information related to user voice input from an electronic device (210) via a communication network. According to one embodiment, the intelligent server (230) can convert data related to the received voice input into text data. According to one embodiment, the intelligent server (230) can generate a plan for performing a task corresponding to the user voice input based on the text data.
[0064] According to one embodiment, a plan may be generated by an artificial intelligence (AI) system. The AI system may be a rule-based system or a neural network-based system (e.g., a feedforward neural network (FNN), a recurrent neural network (RNN)). Alternatively, it may be a combination of the foregoing or a different AI system. According to one embodiment, the plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, the AI system may select at least one plan from a plurality of predefined plans.
[0065] According to one embodiment, the intelligent server (230) may transmit the result according to the generated plan to the electronic device (210) or transmit the generated plan to the electronic device (210). According to one embodiment, the electronic device (210) may display the result according to the plan on the display module (211). According to one embodiment, the electronic device (210) may display the result of executing the operation according to the plan on the display module (211).
[0066] According to one embodiment, the intelligent server (230) may include a front end (231), a natural language platform (232), a capsule database (238), an execution engine (233), an end user interface (234), a management platform (235), a big data platform (236), or an analytic platform (237).
[0067] According to one embodiment, the front end (231) can receive user input received from the electronic device (210). The front end (231) can transmit a response corresponding to the user input.
[0068] According to one embodiment, the natural language platform (232) may include an automatic speech recognition module (ASR module) (232a), a natural language understanding module (NLU module) (232b), a planner module (232c), a natural language generator module (NLG module) (232d), or a text to speech module (TTS module) (232e).
[0069] According to one embodiment, an automatic speech recognition module (232a) can convert voice input received from an electronic device (210) into text data. According to one embodiment, a natural language understanding module (232b) can identify the user's intent using the text data of the voice input. For example, the natural language understanding module (232b) can identify the user's intent by performing a syntactic analysis or a semantic analysis on the user input in the form of text data. According to one embodiment, the natural language understanding module (232b) can identify the meaning of a word extracted from the voice input using linguistic features (e.g., grammatical elements) of a morpheme or phrase, and determine the user's intent by matching the identified meaning of the word to the intent. The natural language understanding module (232b) can acquire intent information corresponding to the user's utterance. The intent information may be information indicating the user's intent determined by interpreting the text data. The intent information may include information indicating an action or function that the user intends to execute using the device.
[0070] According to one embodiment, the planner module (232c) can generate a plan using the intent and parameters determined by the natural language understanding module (232b). According to one embodiment, the planner module (232c) can determine a plurality of domains necessary to perform a task based on the determined intent. The planner module (232c) can determine a plurality of actions included in each of the plurality of domains determined based on the intent. According to one embodiment, the planner module (232c) can determine parameters necessary to execute the determined plurality of actions or result values output by the execution of the plurality of actions. The parameters and the result values may be defined as concepts of a specified format (or class). Accordingly, the plan may include a plurality of actions and a plurality of concepts determined by the user's intent. The planner module (232c) can determine the relationship between the plurality of actions and the plurality of concepts in a stepwise (or hierarchical) manner. For example, the planner module (232c) can determine the execution order of multiple actions determined based on the user's intentions based on multiple concepts. In other words, the planner module (232c) can determine the execution order of multiple actions based on parameters required for the execution of multiple actions and results output by the execution of multiple actions. Accordingly, the planner module (232c) can generate a plan that includes association information (e.g., ontology) between multiple actions and multiple concepts. The planner module (232c) can generate the plan using information stored in a capsule database in which a set of relationships between concepts and actions is stored.
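By way of illustration only, the ordering of actions according to the concepts they consume and produce can be sketched as a topological sort. This is a minimal sketch under stated assumptions, not the planner module's actual implementation: the `Action` class, the `order_actions` helper, and every action and concept name below are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Action:
    """Hypothetical action: consumes input concepts and produces one output concept."""
    name: str
    inputs: tuple[str, ...]
    output: str


def order_actions(actions: list[Action]) -> list[str]:
    """Order actions so that each runs after the actions producing its input concepts."""
    # Map each concept to the action that produces it.
    producer = {a.output: a.name for a in actions}
    # For every action, collect the actions it depends on (its predecessors).
    graph = {
        a.name: {producer[c] for c in a.inputs if c in producer}
        for a in actions
    }
    return list(TopologicalSorter(graph).static_order())


# Illustrative actions; the names are assumptions, not taken from the source.
actions = [
    Action("send_message", ("recipient", "text"), "send_result"),
    Action("find_contact", ("contact_name",), "recipient"),
    Action("compose_text", ("utterance",), "text"),
]
plan_order = order_actions(actions)
```

In this sketch `send_message` is always placed last, because both of its input concepts are produced by the other two actions. Concepts that no action produces (such as `contact_name`) are treated as coming directly from the user input.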
[0071] According to one embodiment, the natural language generator module (232d) can change specified information into a text form. The information changed into a text form may be in the form of a natural language utterance. According to one embodiment, the text to speech module (232e) can change information in a text form into information in a speech form.
[0072] According to one embodiment, some or all functions of the natural language platform (232) may also be implemented in an electronic device (210).
[0073] The above capsule database can store information regarding the relationships between multiple concepts and actions corresponding to multiple domains. A capsule according to one embodiment may include multiple action objects (or action information) and concept objects (or concept information) included in a plan. According to one embodiment, the capsule database can store multiple capsules in the form of a concept action network (CAN). According to one embodiment, multiple capsules may be stored in a function registry included in the capsule database.
[0074] The above capsule database may include a strategy registry that stores strategy information necessary for determining a plan corresponding to user input. The strategy information may include reference information for determining one plan when there are multiple plans corresponding to user input. According to one embodiment, the capsule database may include a follow-up registry that stores information on a follow-up action for suggesting a follow-up action to the user in a specified situation. The follow-up action may include, for example, a follow-up utterance. According to one embodiment, the capsule database may include a layout registry that stores layout information of information output through an electronic device (210). According to one embodiment, the capsule database may include a vocabulary registry that stores vocabulary information included in the capsule information. According to one embodiment, the capsule database may include a dialogue registry that stores information on a conversation (or interaction) with the user. The capsule database may update stored objects through a developer tool. The above developer tool may include, for example, a function editor for updating action objects or concept objects. The above developer tool may include a vocabulary editor for updating vocabulary. The above developer tool may include a strategy editor for creating and registering strategies for determining plans. The above developer tool may include a dialog editor for creating conversations with a user. The above developer tool may include a follow-up editor capable of activating a follow-up goal and editing a follow-up utterance that provides a hint. The follow-up goal may be determined based on a currently set goal, user preferences, or environmental conditions. In one embodiment, the capsule database may also be implemented within an electronic device (210).
[0075] According to one embodiment, the execution engine (233) can produce a result using the generated plan. The end user interface (234) can transmit the produced result to the electronic device (210). Accordingly, the electronic device (210) can receive the result and provide the received result to the user. According to one embodiment, the management platform (235) can manage information used in the intelligent server (230). According to one embodiment, the big data platform (236) can collect user data. According to one embodiment, the analytic platform (237) can manage the quality of service (QoS) of the intelligent server (230). For example, the analytic platform (237) can manage the components and processing speed (or efficiency) of the intelligent server (230).
[0076] According to one embodiment, the service server (250) may provide a service designated to the electronic device (210) (e.g., food ordering or hotel reservation). According to one embodiment, the service server (250) may be a server operated by a third party. According to one embodiment, the service server (250) may provide information to the intelligent server (230) for generating a plan corresponding to a received voice input. The provided information may be stored in a capsule database. Additionally, the service server (250) may provide result information according to the plan to the intelligent server (230). The service server (250) may include a plurality of service providers (e.g., CP service A (251), CP service B (252), CP service C (253)), and each service provider (251, 252, 253) may provide functions for a domain associated with each capsule stored in the capsule database (238) of the intelligent server (230).
[0077] In the integrated intelligent system described above, the electronic device (210) can provide various intelligent services to the user in response to user input. The user input may include, for example, input via a physical button, touch input, or voice input.
[0078] According to one embodiment, the electronic device (210) may provide a voice recognition service through an intelligent app (or voice recognition app) stored internally. In this case, for example, the electronic device (210) may recognize a user utterance or voice input received through the microphone (212) and provide a service to the user corresponding to the recognized voice input.
[0079] According to one embodiment, the electronic device (210) may perform a specified action based on a received voice input, either alone or in conjunction with the intelligent server (230) and / or service server (250). For example, the electronic device (210) may execute an app corresponding to the received voice input and perform a specified action through the executed app.
[0080] According to one embodiment, when an electronic device (210) provides services together with an intelligent server (230) and / or a service server (250), the electronic device (210) can detect user speech using the microphone (212) and generate a signal (or voice data) corresponding to the detected user speech. The electronic device (210) can transmit the voice data to the intelligent server (230) via a network (240) using a communication interface (213).
[0081] An intelligent server (230) according to one embodiment may generate, in response to a voice input received from an electronic device (210), a plan for performing a task corresponding to the voice input, or a result of performing an operation according to the plan. The plan may include, for example, a plurality of operations for performing a task corresponding to a user's voice input, and a plurality of concepts related to the plurality of operations. The concepts may define parameters input to the execution of the plurality of operations or result values output by the execution of the plurality of operations. The plan may include association information between the plurality of operations and the plurality of concepts.
[0082] According to one embodiment, the electronic device (210) can receive the response using a communication interface (213). The electronic device (210) can output a voice signal generated inside the electronic device (210) to the outside using the speaker (216), or output an image generated inside the electronic device (210) to the outside using a display module (211).
[0083] In FIG. 2, an example is described in which voice recognition of user input received from an electronic device (210), natural language understanding and generation, and output of results using a plan are performed on an intelligent server (230), but various embodiments of this document are not limited thereto. For example, at least some components of the intelligent server (230) (e.g., natural language platform (232), execution engine (233), capsule database (238)) may be embedded in the electronic device (210) (or the electronic device (101) of FIG. 1), and the operation may be performed by the electronic device (210).
[0084] FIG. 3 is a diagram showing the form in which relationship information between concepts and operations is stored in a database according to various embodiments.
[0085] According to one embodiment, a capsule database (e.g., capsule database (238) of FIG. 2) of an intelligent server (e.g., intelligent server (230) of FIG. 2) may store capsules in the form of a CAN (concept action network) (300). The capsule database may store actions for processing tasks corresponding to user voice input, and parameters required for said actions, in the form of a CAN (concept action network).
[0086] According to one embodiment, the capsule database may store a plurality of capsules (capsule (A) (310), capsule (B) (320)) corresponding to each of a plurality of domains (e.g., applications). According to one embodiment, one capsule (e.g., capsule (A) (310)) may correspond to one domain (e.g., location (geo), application). Additionally, one capsule may correspond to at least one service provider (e.g., CP 1 (331) or CP 2 (332)) for performing functions for the domain associated with the capsule. According to one embodiment, one capsule may include at least one operation (350) and at least one concept (360) for performing a designated function.
[0087] According to one embodiment, a natural language platform (e.g., the natural language platform (232) of FIG. 2) can generate a plan for performing a task corresponding to a received voice input using capsules stored in a capsule database. For example, a planner module of the natural language platform (e.g., the planner module (232c) of FIG. 2) can generate a plan using capsules stored in a capsule database. For example, a plan can be generated using the actions (311, 313) and concepts (312, 314) of capsule A (310) and the actions (321) and concepts (322) of capsule B (320).
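By way of illustration only, the relationship between capsules, domains, actions, and concepts can be sketched as a small in-memory structure. This is a hedged sketch, not the capsule database's actual storage format: the `Capsule` and `CapsuleDatabase` classes and the `collect` method are hypothetical, and the identifiers simply reuse the reference numerals of FIG. 3 as labels.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """One capsule per domain, holding its action and concept objects."""
    domain: str
    actions: set[str] = field(default_factory=set)
    concepts: set[str] = field(default_factory=set)


class CapsuleDatabase:
    """Hypothetical in-memory stand-in for a capsule database."""

    def __init__(self) -> None:
        self._capsules: dict[str, Capsule] = {}

    def register(self, capsule: Capsule) -> None:
        """Store a capsule under its domain name."""
        self._capsules[capsule.domain] = capsule

    def collect(self, domains: list[str]) -> tuple[set[str], set[str]]:
        """Gather the actions and concepts a plan may draw on for the given domains."""
        actions: set[str] = set()
        concepts: set[str] = set()
        for domain in domains:
            capsule = self._capsules[domain]
            actions |= capsule.actions
            concepts |= capsule.concepts
        return actions, concepts


db = CapsuleDatabase()
db.register(Capsule("A", {"action_311", "action_313"}, {"concept_312", "concept_314"}))
db.register(Capsule("B", {"action_321"}, {"concept_322"}))
plan_actions, plan_concepts = db.collect(["A", "B"])
```

A plan spanning both domains would then be assembled from the union of the two capsules' actions and concepts. The edges between actions and concepts that make the structure a network are omitted here for brevity.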
[0088] FIG. 4 is a block diagram of an electronic device according to various embodiments.
[0089] Referring to FIG. 4, the electronic device (400) may include a display (430), a communication circuit (440), a speaker (460), a processor (410), and a memory (420). Various embodiments of this document may be implemented even if some of the illustrated configurations are omitted or substituted. In addition to the illustrated configurations, the electronic device (400) may further include at least some of the configurations and / or functions of the electronic device (101) of FIG. 1. At least some of each of the illustrated (or unillustrated) components of the electronic device (400) (e.g., communication circuit (440), processor (410), memory (420)) may be placed within the housing of the electronic device (400), and at least some of the other components (e.g., display (430), speaker (460)) may be exposed to the outside of the housing. At least some of each of the components of the electronic device (400) may be operatively, functionally, and / or electrically connected to one another.
[0090] According to one embodiment, the display (430) can display image information provided by the processor (410). The display (430) may be implemented as any one of a liquid crystal display (LCD), a light-emitting diode (LED) display, or an organic light-emitting diode (OLED) display, but is not limited thereto. The display (430) may be configured as a touch screen display that detects touch and / or proximity touch (or hovering) input using a part of the user's body (e.g., finger) or an input device (e.g., stylus pen). The display (430) may include at least some of the configuration and / or functions of the display module (160) of FIG. 1.
[0091] According to one embodiment, the display (430) may be a flexible display in which at least a portion is flexible and / or bendable. According to one embodiment, the electronic device (400) may be implemented in various form factors, such as a foldable device or a rollable device, in which the size of the display area can be expanded or reduced by utilizing the characteristics of the flexible display.
[0092] According to one embodiment, the communication circuit (440) may include various configurations to support wireless communication with an external device. For example, the electronic device (400) may perform cellular wireless communication (e.g., 4G LTE (long term evolution), 5G NR (new radio)) and / or short-range wireless communication (e.g., Wi-Fi, Bluetooth) through the communication circuit (440), and there is no fixed type of wireless communication supported by the electronic device (400). The communication circuit (440) may include at least some of the configurations and / or functions of the communication module of FIG. 1.
[0093] According to one embodiment, the speaker (460) can output an audio signal. For example, the speaker (460) can convert an electrical signal provided by the processor (410) into audio and output it. The speaker (460) can output audio through a hole formed in one area of the housing.
[0094] According to one embodiment, the microphone (450) can pick up external sound, convert it into a digital signal, and transmit it to the processor (410). For example, the user's voice input through the microphone (450) is transmitted to a voice assistant client, and the voice assistant client can provide various services based on the user's voice input.
[0095] According to one embodiment, the electronic device (400) may collect an audio signal through a microphone of an external audio device (e.g., earbuds) worn by a user, and / or output an audio signal through an audio output of the external audio device.
[0096] According to one embodiment, the memory (420) may include volatile memory and non-volatile memory, and may store various data temporarily or permanently. The memory (420) may include at least some of the configuration and / or functions of the memory (130) of FIG. 1 and may store the program (140) of FIG. 1. The memory (420) may store various instructions that can be executed by the processor (410). Such instructions may include control commands such as arithmetic and logical operations, data movement, and input / output that can be recognized by the processor (410).
[0097] According to one embodiment, the processor (410) may be configured to perform operations or data processing regarding the control and / or communication of each component of the electronic device (400) and may be composed of one or more processors. The processor (410) may include at least some of the configuration and / or functions of the processor (120) of FIG. 1. Although there are no limitations on the operations and data processing functions that the processor (410) can implement on the electronic device (400), this document will describe in detail various embodiments for identifying a reply message among the received messages when a user's message confirmation request is received in hands-free mode, and converting the reply message so that the user can easily understand it and providing it to the user. The operations of the processor (410) described below may be performed by loading instructions stored in memory (420).
[0098] In this document, the description that the processor (410) can perform a certain operation (or function, task, or operation) may be interpreted substantially as meaning that an instruction (or command, computer program) causing the electronic device (400) (or processor (410)) to perform said operation is stored in memory (420) (e.g., non-volatile memory, storage). Additionally, the description that the processor (410) can perform a certain operation may be interpreted substantially as meaning that at least one processor, without a fixed number, can perform said operation individually or collectively.
[0099] According to one embodiment, the processor (410) executes a message application and can send and receive messages with an external device through the message application. For example, the processor (410) may store in memory (420), and execute, a native message application and / or a 3rd party message application that supports sending and receiving messages over a network with another user's electronic device (400).
[0100] According to one embodiment, the processor (410) may execute a voice assistant client. The voice assistant client executes a service based on the user's voice input on an electronic device (400) and can perform various tasks through interaction with a voice assistant server connected via a network. For example, when a message is received through a message application, the voice assistant client may check the received message based on the user's voice input, convert the content of the message so that the user can easily understand it, and output it. The detailed configuration and / or operation of the voice assistant client and the voice assistant server will be described in more detail through FIGS. 5 and 6.
[0101] According to one embodiment, the processor (410) can detect a user's speech while the voice assistant client is running. For example, when the user's voice is input through the microphone (450) while the voice assistant client is running, the processor (410) can convert the user's voice signal into text information, understand the content to confirm the user's command, and perform an action according to the user's command.
[0102] According to one embodiment, the processor (410) can determine whether the user's utterance is an utterance for confirming a received message. According to one embodiment, in hands-free mode, when a message is received from an external device of the other party through a message application (e.g., a native message application or a 3rd party message application), the processor (410) may provide a notification of receipt to the user through a notification sound and / or vibration via the speaker (460) or an external audio device. The processor (410) analyzes the user's voice input to determine whether it is an utterance to confirm a message, a request to confirm messages of a particular message application, or a request to confirm messages received from a particular conversation partner, in a particular conversation session, or at a particular time, and, based on the determination, may select any one of the received messages (e.g., the first message) that the user requested to confirm.
[0103] According to one embodiment, the processor (410) can determine whether the received first message is a reply message to a transmission message transmitted by the electronic device (400) to an external device. Here, the reply message may be a message selected as a reply by the other party, who is a user of the external device, to a message transmitted by the user of the electronic device (400), or a message interpreted as the other party's response to a query by the user of the electronic device (400) from the conversation content.
[0104] According to one embodiment, the processor (410) can determine whether the received message is a reply message based on the message type information of the message attribute information included in the received first message. The message attribute information (or metadata, header information) may include information defining the attributes of the message, such as the unique number, type, and address of the message. The message type information may be designated as normal, reply, forward, group, or broadcast, and if the message type is designated as reply, the processor (410) can confirm that the message is a reply message.
[0105] According to another embodiment, the processor (410) may determine whether the first message is a reply message based on context analysis of at least some of the messages included in the same conversation session as the first message received. For example, there may be cases where the other party responds to a query from an electronic device user as a general message without specifying the message type as a reply. In this case, the processor (410) may analyze the contents of the transmitted message and the received message, and if the contents of the received message are analyzed as a reply to a query included in a specific transmitted message, the received message may be determined as a reply message.
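By way of illustration only, the two checks described above (the message type in the attribute information, then context analysis as a fallback) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Message` record, its field names, and the `context_check` callback are hypothetical, and the context analysis itself is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    """Hypothetical message record; the field names are illustrative."""
    message_id: str
    sender: str
    text: str
    message_type: str = "normal"  # normal, reply, forward, group, or broadcast


def is_reply(
    message: Message,
    context_check: Optional[Callable[[Message], bool]] = None,
) -> bool:
    """Return True if the message is marked as a reply or is judged one from context."""
    if message.message_type == "reply":
        return True
    # Fall back to context analysis when the sender did not mark the message as a reply.
    return bool(context_check and context_check(message))
```

Checking the type field first avoids running the more expensive context analysis on messages that the sender has already marked as replies.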
[0106] According to one embodiment, the processor (410) may determine at least some of the messages transmitted and received with the counterpart of the first message as reference messages. Here, the reference messages are candidate messages that may be considered when generating a second message by modifying the first message. The processor (410) may extract all transmitted and received messages of the same conversation session as the received first message as reference messages, or determine messages analyzed as being associated with the received message, a fixed number of recently transmitted and received messages, and / or messages transmitted and received during a recently fixed period of time as reference messages.
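By way of illustration only, selecting reference messages by recency can be sketched as follows. This sketch applies both a time window and a maximum count together, which is only one of the combinations the description allows; the function name, the tuple layout, and the default values of `max_count` and `window` are assumptions.

```python
from datetime import datetime, timedelta


def select_reference_messages(
    history: list[tuple[datetime, str, str]],
    received_at: datetime,
    max_count: int = 10,
    window: timedelta = timedelta(hours=1),
) -> list[tuple[datetime, str, str]]:
    """Keep at most max_count of the most recent messages inside the time window.

    history is a list of (timestamp, sender, text) tuples in chronological order.
    """
    recent = [m for m in history if received_at - window <= m[0] <= received_at]
    return recent[-max_count:]
```

Narrowing the candidates in this way keeps the later association analysis limited to messages that are plausibly related to the received reply.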
[0107] According to one embodiment, the processor (410) can determine at least one associated message by analyzing the association between each of the reference messages and the first message. For example, the processor (410) can determine at least one reference message as an associated message corresponding to the first message by comparing, with a reference value, the semantic distance between the first message, which is the reply message sent by the other party, and each of the reference messages in the same conversation session as the first message. In this case, the processor (410) may use various semantic distance measurement methods between messages. For example, the processor (410) can convert the reference message and the reply message into respective vectors using BERT (bidirectional encoder representations from transformers), measure the cosine similarity between the two vectors, and if the measured cosine similarity is greater than or equal to a reference value, determine the reference message as an associated message.
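By way of illustration only, the cosine similarity comparison against a reference value can be sketched as follows. To keep the sketch self-contained, the `embed` function is a toy bag-of-words counter standing in for a sentence encoder such as BERT; it is not BERT, and the function names and the threshold of 0.3 are assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder such as BERT: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_associated(reply: str, references: list[str], threshold: float = 0.3) -> list[str]:
    """Return the reference messages whose similarity to the reply meets the threshold."""
    reply_vec = embed(reply)
    return [r for r in references if cosine_similarity(embed(r), reply_vec) >= threshold]


references = ["shall we meet at the station at seven", "did you feed the cat"]
associated = find_associated("seven at the station works", references)
```

With a real encoder, only `embed` would change (returning dense vectors), while the thresholding logic would stay the same.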
[0108] According to one embodiment, the processor (410) can determine the semantic distance between a reference message and a reply message by using a large language model (LLM), any of various statistical analysis methods, or a deep learning-based approach.
[0109] According to one embodiment, the processor (410) may generate a second message based on a first message and at least one associated message. Here, the second message may be a final message obtained by modifying the content of the first message, which is a reply message, in consideration of the content of the associated messages, so that the user can easily understand the content that the other party intends to convey in accordance with the context of the entire conversation.
[0110] According to one embodiment, the processor (410) may generate a second message using at least a portion of the content of the first message and at least a portion of the content of each associated message. The processor (410) may generate the second message (or final message) by reflecting the tone of conversation between the user and the other party when generating the second message, and including information that can identify the other party and suggestions for additional input after message verification.
[0111] According to one embodiment, the processor (410) can generate a second message from a first message using an AI model. For example, the processor (410) can generate and transmit to the AI model a first message (or reply message) received from the other party, associated messages of the conversation session, and a prompt requesting the generation of a new message to notify the user. The AI model can generate a second message in response to the prompt and transmit it to the processor (410).
[0112] According to one embodiment, to generate a second message from the first message and the associated message, the processor (410) may generate at least one first text information by combining at least a portion of the contents of the first message and the associated message, generate second text information by combining the generated at least one first text information, and generate the second message by converting the generated second text information using a natural language generator (NLG).
[0113] According to one embodiment, the processor (410) can generate first text information by combining a first message, which is a received reply message, and at least one associated message identified as being associated with the first message.
[0114] According to one embodiment, the processor (410) generates a prompt including a request for a message combination, a received first message and an associated message, and transmits it to an AI model, and can receive at least one first text information from the AI model as a reply to the prompt.
[0115] According to one embodiment, when multiple pieces of first text information are generated, the processor (410) can combine the pieces of first text information to generate second text information. Accordingly, the electronic device (400) can process multiple reply messages transmitted from the other party as a single message and deliver multiple pieces of information to the user as a single message.
[0116] According to one embodiment, the processor (410) may generate a prompt including at least one first text information and a request for combining the first text information and transmit it to an AI model, and receive second text information from the AI model as a reply to the prompt. According to one embodiment, if the processor (410) determines that the generated pieces of first text information are not related to each other, it may not combine them.
[0117] According to one embodiment, the processor (410) may generate a second message (or final message) from the second text information using a natural language generator. Accordingly, the generated second message may take the form of a message intended to convey the content of the reply message to the user.
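By way of illustration only, the three stages described above (first text information per reply, combined second text information, and conversion by a natural language generator) can be sketched as a small pipeline. Each stage is injected as a callable because the description leaves the AI model and the generator unspecified; all names and the stand-in lambdas are hypothetical, and the check for unrelated pieces of first text information is omitted.

```python
from typing import Callable


def generate_final_message(
    replies_with_context: list[tuple[str, list[str]]],
    combine_pair: Callable[[str, list[str]], str],
    merge_texts: Callable[[list[str]], str],
    nlg: Callable[[str], str],
) -> str:
    """Run the three stages: first text per reply, merged second text, NLG output."""
    if not replies_with_context:
        raise ValueError("at least one reply message is required")
    # Stage 1: one piece of first text information per (reply, associated messages) pair.
    first_texts = [combine_pair(reply, assoc) for reply, assoc in replies_with_context]
    # Stage 2: merge into second text information only when there is more than one piece.
    second_text = merge_texts(first_texts) if len(first_texts) > 1 else first_texts[0]
    # Stage 3: convert the second text information into the final message.
    return nlg(second_text)


# Stand-in callables; a real system would call an AI model or an NLG module here.
final = generate_final_message(
    [("Seven works", ["Shall we meet at seven?"]), ("Bring the tickets", [])],
    combine_pair=lambda reply, assoc: f"{reply} (re: {'; '.join(assoc)})" if assoc else reply,
    merge_texts=lambda texts: " / ".join(texts),
    nlg=lambda text: f"Alex replied: {text}",
)
```

Passing the stages as callables keeps the sketch neutral about whether each stage runs on the device or on a server.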
[0118] According to one embodiment, the processor (410) may output the generated final message as an audio signal through a speaker (460), or transmit the audio signal to an audio device worn by a user via short-range wireless communication through a communication circuit (440) so that the audio device outputs it. Additionally, the processor (410) may display the generated final message on the screen of a voice assistant displayed on a display (430).
[0119] Various embodiments of converting a received first message into a second message and outputting it will be described in more detail through FIGS. 8 to 13.
[0120] Instructions for performing the operation of the electronic device (400) (or processor (410)) described above may be stored in a computer-readable recording medium. The recording medium may be tangible and non-transitory. The recording medium may store one or more computer programs containing the instructions.
[0121] FIG. 5 is a block diagram of a voice assistant client of an electronic device according to one embodiment.
[0122] FIG. 6 is a block diagram of a voice assistant server according to one embodiment.
[0123] Each of the blocks illustrated in FIGS. 5 and 6 may be a software module running on an electronic device or a voice assistant server, and two or more blocks may be composed of one software module.
[0124] According to one embodiment, an electronic device (e.g., the electronic device (400) of FIG. 4) providing a voice assistant client (510) and a voice assistant server (600) can transmit and receive various data related to voice assistant services through a network.
[0125] Referring to FIG. 5, the electronic device may include a voice assistant client (510) and a message application (560).
[0126] According to one embodiment, a voice assistant client (510) may be a software element designed to execute a service based on a user's voice input on an electronic device and to perform various tasks through interaction with a voice assistant server (600). The voice assistant client (510) may be executed by a processor of the electronic device (e.g., the processor (410) of FIG. 4).
[0127] According to one embodiment, the voice assistant client (510) may include a voice input module (520), a hands-free mode detection module (530), and a voice UI generator (540).
[0128] According to one embodiment, the voice input module (520) can acquire a user's voice signal input through a microphone of an electronic device. The electronic device can convert the voice signal acquired from the voice input module (520) into text information, or transmit the voice signal to a voice assistant server (600) to receive text information converted by the voice assistant server (600) (e.g., ASR module (612)).
[0129] According to one embodiment, a hands-free mode detection module (530) can determine whether a user is currently using the electronic device in hands-free mode. Here, hands-free mode may refer to an operating mode in which the user cannot view the screen and can operate the electronic device only through voice input. For example, the electronic device may be recognized as hands-free mode when the user is driving, an external audio device (e.g., earbuds) is connected, and the display is locked (or disabled).
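The hands-free check described above can be sketched as follows. This is a minimal illustration only: the `DeviceState` fields, the function name, and the `require_all` flag are assumptions that do not appear in this document, and the text does not state whether every condition must hold at once, so the flag leaves that choice open.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of the signals named in the example above."""
    user_is_driving: bool
    audio_device_connected: bool
    display_locked: bool


def is_hands_free(state: DeviceState, require_all: bool = True) -> bool:
    """Return True when the device is treated as being in hands-free mode.

    require_all=True demands every signal; require_all=False accepts any one.
    Which policy a real device uses is an assumption, not stated in the text.
    """
    signals = (
        state.user_is_driving,
        state.audio_device_connected,
        state.display_locked,
    )
    return all(signals) if require_all else any(signals)
```

For example, `is_hands_free(DeviceState(True, True, True))` evaluates to `True`, while the same call with `audio_device_connected=False` evaluates to `False` under the default policy.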
[0130] According to one embodiment, a voice UI generator (540) can provide a user interface that enables a user to interact with an electronic device based on voice. For example, the voice UI generator (540) can determine a response or action to be performed by the system in response to the user's voice input, and perform various actions to provide the response to the user as an audio signal.
[0131] According to one embodiment, the electronic device may provide at least one message application (560) that supports sending and receiving messages over a network with another user's electronic device. For example, the electronic device may store in memory and execute a native message application (570) and / or a third-party message application (580).
[0132] Referring to FIG. 6, the voice assistant server (600) may include a speech analysis module (610), a dialogue manager (620), and a message modification module.
[0133] According to one embodiment, the speech analysis module (610) may include various software configurations for analyzing a user's voice signal received from an electronic device and understanding the content thereof. Referring to FIG. 6, the speech analysis module (610) may include an automatic speech recognition (ASR) module (612) and a natural language understanding (NLU) module (614).
[0134] According to one embodiment, the ASR module (612) can convert a voice signal received from an electronic device into text information. For example, the ASR module (612) can generate and output text information through processes such as preprocessing of the voice signal, conversion into phonemes, text extraction using a language model, and / or decoding.
[0135] According to one embodiment, the NLU module (614) can analyze text information converted from a speech signal by the ASR module (612) and perform various operations to understand the content. For example, the NLU module (614) can perform operations such as preprocessing of text information, intent recognition, context extraction, context processing, semantic analysis, and / or response generation and output.
[0136] According to one embodiment, a dialogue manager (620) can perform various operations to manage the conversation flow by processing user input in a message conversation system and generating an appropriate response. Referring to FIG. 6, the dialogue manager (620) may include a natural language generator (622), a text-to-speech (TTS) generator (624), a UI generator (626), and an action executor (628).
[0137] According to one embodiment, the natural language generator (622) can convert a semantic representation within the system into human-understandable natural language text.
[0138] According to one embodiment, the TTS generator (624) can convert text information into a voice signal. For example, the TTS generator (624) can convert text information generated by the natural language generator (622) into a voice signal to provide it to an electronic device.
[0139] According to one embodiment, the UI generator (626) can configure a graphical user interface to be displayed through the voice assistant client (510) of the electronic device. For example, the UI generator (626) can generate a graphical user interface that includes text information generated by the natural language generator (622).
[0140] According to one embodiment, the action executor (628) can execute a requested action based on the user's voice.
[0141] According to one embodiment, the message modification module can check, through a message application (560) (e.g., the native message application (570) or the third-party message application (580)), reply messages among the messages transmitted from an external device of the counterpart, and perform actions to modify the reply messages according to the overall context of the conversation. Referring to FIG. 6, the message modification module may include a reference message candidate identification module (630), a reply message identification module (635), a message relationship identification module (640), an emotion detection module (645), a message composition module (650), and a message aggregation module (655).
[0142] According to one embodiment, the reference message candidate identification module (630) can check messages transmitted by a user and messages received from a counterparty. For example, when a voice assistant client (510) is executed and there is a request from a user to check a received message, the reference message candidate identification module (630) can check the message information transmitted and received in the conversation session of the message application (560) to which the received message belongs, and determine at least some of those messages as reference messages. According to one embodiment, the reference message candidate identification module (630) can extract all transmitted and received messages in the conversation session as reference messages, or determine messages analyzed as being associated with the received message, a fixed number of recently transmitted and received messages, and / or messages transmitted and received during a fixed recent period of time as reference messages.
[0143] According to one embodiment, the reply message identification module (635) can determine whether the received message requested for confirmation by the user is a reply message to a message transmitted by the user through an electronic device. For example, the reply message identification module (635) can determine the received message as a reply message if the message type of the received message is specified as "reply", or if, as a result of analyzing the context of the message, it is interpreted as a reply to the user's query.
[0144] According to one embodiment, the message relationship identification module (640) can identify associated messages among the reference messages that are associated with the received reply message. For example, the message relationship identification module (640) can identify a message among the reference messages identified by the reference message candidate identification module (630) that is associated with the reply message identified by the reply message identification module (635).
[0145] According to one embodiment, the message relationship identification module (640) can convert a non-text form of reply message, such as an image, emoji, and / or video file, into text information using metadata or a large language model (LLM). The message relationship identification module (640) can analyze the text message among the messages and the content of the text information converted from the non-text form of reply message to determine at least one message associated with the received reply message.
[0146] According to one embodiment, the message relationship identification module (640) may use at least one semantic distance measurement method to determine the association between a received reply message and each message. For example, the message relationship identification module (640) may use BERT (bidirectional encoder representations from transformers) and / or an LLM (large language model). The message relationship identification module (640) may convert the received message and each reference message into respective vectors using BERT and measure the association using the cosine similarity of the two vectors.
[0147] According to one embodiment, the emotion detection module (645) can infer the emotion of the user or the other party contained in the message from the transmitted and received message. For example, if the reply message is in the form of an image (or video), such as an emoticon, a GIF file, or an image, the emotion expressed in the message can be analyzed.
[0148] According to one embodiment, the message composition module (650) can generate new text (e.g., first text information) by combining a message determined to be associated with the reply message with the reply message. For example, the message composition module (650) can generate new text by combining the reply message with at least some of the reference messages that the message relationship identification module (640) has determined to be associated with the reply message. The message composition module (650) can generate at least one text that matches the context and tone by analyzing the relationship, context, tone, etc., between the reply message and each reference message.
[0149] According to one embodiment, the message aggregation module (655) can combine at least one text (e.g., first text information) generated by the message composition module (650) to generate a new text (e.g., second text information). If the message composition module (650) generates two or more pieces of text information, the message aggregation module (655) can combine them to generate a new text when it determines that they are related to each other.
[0150] According to one embodiment, the natural language generator (622) can convert the modified reply message generated by the message modification module, or the text generated by the message aggregation module (655) (e.g., second text information), into text information in the form of a sentence used in an actual conversation.
[0151] Although FIGS. 5 and 6 illustrate that the message modification module is implemented on a voice assistant server (600), at least some of the configuration and / or operation of the message modification module may be implemented on an electronic device.
[0152] FIG. 7 is a flowchart of a method in which an electronic device according to one embodiment modifies and outputs a received message.
[0153] The illustrated method may be performed by the electronic device (400) of FIG. 4, or a part of the illustrated method may be performed individually and / or collectively by the electronic device of FIG. 4 (or the voice assistant client (510) of FIG. 5) and the voice assistant server (600) of FIG. 6.
[0154] According to one embodiment, in operation 710, the electronic device may receive the user's speech. For example, the electronic device may receive the user's speech through a microphone while the voice assistant client is running. The voice assistant client may convert the user's voice signal into text information through the speech analysis module (610), understand the content, and perform an action according to the user's command.
[0155] According to one embodiment, in operation 720, the electronic device can determine whether the input utterance is an utterance for message confirmation. According to one embodiment, when the electronic device, in hands-free mode, receives a message from a counterpart's external device through a message application (e.g., the native message application (570) or the third-party message application (580)), it may notify the user of the receipt with a notification sound and / or vibration through a speaker or an external audio device. The voice assistant client analyzes the user's voice input to determine whether it is an utterance to confirm a message, for example a request to confirm messages of a particular message application, or a request to confirm messages received from a particular conversation partner, in a particular conversation session, or at a particular time. Based on the determination, it can select one of the received messages (e.g., the first message) that the user requested to confirm.
[0156] According to one embodiment, in operation 730, the electronic device can check transmitted and received message information. For example, from the messages transmitted by the electronic device and the messages received from an external device that belong to the same conversation session as the received message (e.g., the first message) whose confirmation is requested by user input to the voice assistant client, the electronic device may select at least some as reference messages.
[0157] According to one embodiment, a conversation session including a received message may contain information on messages transmitted and received over a long time, and an electronic device may determine, among these, candidate messages that may be contextually related to the received message as reference messages. For example, the electronic device may extract all transmitted and received messages of the conversation session as reference messages, or determine messages analyzed as being related to the received message, a fixed number of recently transmitted and received messages, and / or messages transmitted and received during a fixed recent period of time as reference messages.
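The candidate selection of operation 730 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Message` type, the function name, and the parameters `max_count` and `max_age_seconds` are hypothetical and do not appear in this document, and age is measured against the newest message in the session, which is only one possible reading of "a fixed recent period of time".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical minimal message record for one conversation session."""
    message_id: str
    timestamp: float  # seconds on any increasing clock
    text: str


def select_reference_messages(
    session: List[Message],
    received_id: str,
    max_count: Optional[int] = None,
    max_age_seconds: Optional[float] = None,
) -> List[Message]:
    """Pick reference messages from the session, excluding the received one.

    With no limits, every other message in the session is returned.
    max_age_seconds keeps messages close in time to the newest message;
    max_count keeps only the most recent ones.
    """
    others = sorted(
        (m for m in session if m.message_id != received_id),
        key=lambda m: m.timestamp,
    )
    if max_age_seconds is not None and others:
        newest = max(m.timestamp for m in session)
        others = [m for m in others if newest - m.timestamp <= max_age_seconds]
    if max_count is not None:
        others = others[-max_count:] if max_count > 0 else []
    return others
```

Selecting messages "analyzed as being related to the received message" is not covered here, because it depends on the semantic analysis of operation 750.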
[0158] According to another embodiment, the electronic device may perform operations 740 to 780 in response to the reception of a message even when the user does not make a separate input (e.g., voice input) to directly check the message content. In this case, operations 710 and / or operations 720 may be omitted.
[0159] According to one embodiment, in operation 740, the electronic device can determine whether the received message is a reply message. Here, the reply message may be a message selected as a reply by the other party, who is a user of an external device, to a message transmitted by the user of the electronic device, or a message interpreted as the other party's response to the user of the electronic device's query from the conversation content. In response to the message transmitted by the electronic device, the other party may create and transmit various types of messages, such as text, images, emojis, GIFs, or videos, as reply messages to the electronic device.
[0160] According to one embodiment, an electronic device can determine whether a received message is a reply message based on message type information of message attribute information included in the received message. Here, the message attribute information (or metadata, header information) may include information defining the attributes of the message, such as the unique number, type, and address of the message. Table 1 shows an example of information included in the message attribute information.
[0161]
Key Information
- Message ID: A unique number for the message being transmitted
- Message Type: Specifies the type of message being transmitted (normal, reply, forward, group, broadcast)
- Source Address: Identifies the sender of the message
- Destination Address: Identifies the recipient of the message
- Timestamp: Indicates the date and time the message was created
- Sequence Number: Tracks the order of messages within a conversation
- Data Length: Length of the payload data
- Attached File Type: Images, videos, sound, emojis, document files, etc. within the payload
- Attached File Length: Length of the attachment for each item
- Reference ID: Identifies which message the reply refers to
Other Information
- Acknowledgment Flag: Indicates whether the recipient successfully received the message
- Error Code: Describes errors that occurred during transmission
- Checksum: Checks for and corrects errors in the message
- Encryption Key: Used for encryption to ensure the secure transmission of sensitive data
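A subset of the attribute information listed above, together with the type-based reply check of operation 740, could be modelled as in the following sketch. The class name, field names, and the optional `context_check` callback are assumptions made for illustration; the callback merely stands in for the context analysis that the document describes without specifying an implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Message types named in the attribute table above.
MESSAGE_TYPES = ("normal", "reply", "forward", "group", "broadcast")


@dataclass
class MessageAttributes:
    """Hypothetical container for a few of the listed attribute fields."""
    message_id: str
    message_type: str
    source_address: str
    destination_address: str
    reference_id: Optional[str] = None  # message that a reply refers to


def is_reply_message(
    attrs: MessageAttributes,
    context_check: Optional[Callable[[MessageAttributes], bool]] = None,
) -> bool:
    """Treat a message as a reply if its type says so.

    Otherwise fall back to an optional, caller-supplied context analysis.
    """
    if attrs.message_type == "reply":
        return True
    if context_check is not None:
        return context_check(attrs)
    return False
```

A message whose type is "normal" is reported as a reply only when the supplied `context_check` returns `True`.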
[0162] Referring to Table 1 above, message attribute information may include information related to the message type, for example, the message type may be designated as normal, reply, forward, group, or broadcast. If the message type of the received message is designated as reply, the electronic device may determine that the received message is a reply message. According to another embodiment, the electronic device may determine whether the received message is a reply message based on context analysis of at least some of the messages included in the same conversation session as the received message. For example, there may be cases where the other party responds to a query from the electronic device user as a normal message without designating the message type as reply. In this case, the electronic device may analyze the content of the transmitted message and the received message, and if the content of the received message is analyzed as a reply to the query included in a specific transmitted message, the electronic device may determine that the received message is a reply message.
According to one embodiment, in operation 750, the electronic device may determine the message among the transmitted and received messages that is associated with the reply message. The electronic device can determine at least one message associated with the reply message identified in operation 740 among the reference messages identified in operation 730.
[0163] According to one embodiment, an electronic device may determine at least one reference message as an associated message by comparing the semantic distance between each reference message and a reply message transmitted by the other party with a reference value. In this case, the electronic device may use various semantic distance measurement methods between messages. For example, the electronic device may use BERT (bidirectional encoder representations from transformers) to convert the reference message and the reply message into respective vectors, measure the cosine similarity between the two vectors, and if the measured cosine similarity is greater than or equal to a reference value, determine the reference message as an associated message.
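The cosine-similarity comparison described above can be sketched as follows. The sketch assumes that each message has already been converted into a vector by an encoder such as BERT; the encoder itself is not shown, the function names are hypothetical, and the two-dimensional vectors in the usage note are toy values chosen only to exercise the threshold logic.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_associated_messages(
    reply_vector: Sequence[float],
    reference_vectors: Dict[str, Sequence[float]],
    reference_value: float,
) -> List[str]:
    """Return IDs of reference messages whose similarity to the reply
    is greater than or equal to the reference value."""
    return [
        message_id
        for message_id, vector in reference_vectors.items()
        if cosine_similarity(reply_vector, vector) >= reference_value
    ]
```

With toy vectors `{"MessageID_01": [1.0, 0.0], "MessageID_02": [0.0, 1.0]}`, a reply vector `[0.9, 0.1]`, and a reference value of 0.8, only `MessageID_01` is returned.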
[0164] According to one embodiment, an electronic device can determine the semantic distance between a reference message and a reply message by using a large language model (LLM), any of various statistical analysis methods, or a deep learning-based approach.
[0165] According to one embodiment, on a specific conversation session (or conversation window) of a message application, a user of an electronic device sends three messages to another person: “When is our soccer match?” (MessageID_01), “We should have a drink?” (MessageID_02), and “When should we do it?” (MessageID_03), and receives from the other person: “Thursday evening at 7 PM” (MessageID_04) as a reply to MessageID_01 and “How about Saturday evening at 6 PM?” (MessageID_05) as a reply to MessageID_03. In this case, the electronic device can determine through semantic distance analysis that the transmitted message MessageID_01 and the reply message MessageID_04 are related to each other, and determine that the transmitted messages MessageID_02 and MessageID_03 and the reply message MessageID_05 are related to each other.
[0166] As another example, on a specific conversation session of a messaging application, a user of an electronic device sends messages to another person, such as “When is our family gathering?” (MessageID_06) and “How is your cold?” (MessageID_07), and receives from the other person a reply to MessageID_06, “Thursday evening at 7 o’clock” (MessageID_08), a general message, “Oh no.” (MessageID_09), “Oh, it’s Friday evening at 6 o’clock” (MessageID_10), and a reply to MessageID_07, “I’m all better.” (MessageID_11). In this case, the electronic device can determine MessageID_06, MessageID_09, and MessageID_10 as messages associated with MessageID_08 based on semantic distance analysis between MessageID_08, which is identified as a reply message by its message type, and the reference messages. Through semantic analysis of each message, the electronic device can determine that MessageID_08 and MessageID_09 are not necessary for modifying the reply message, and, by comparing semantic distance with a reference value, can determine that MessageID_06 and MessageID_10 are related and that MessageID_07 and MessageID_11 are related.
[0167] According to one embodiment, an electronic device can determine the emotion of the other party from associated messages among reference messages that are associated with a reply message. For example, if the associated message is in the form of an image, emoticon, emoji, GIF, or video, the electronic device can analyze the emotion of the other party to be expressed in the data through an AI model (or LLM). Table 2 shows an example of a prompt sent to an AI model to check the emotion of the other party.
[0168] Message [A] has arrived for the user, and message [A] is related to messages [B]. Please generate a keyword representing the emotion expressed in [A].
[A]
__Description of Image or Video received from the user of the second device__
[B]
__Content of Reference Message #1 related to the received message__
__Content of Reference Message #2 related to the received message__
__Content of Reference Message #3 related to the received message__
[0169] According to one embodiment, for information of a type other than text, such as an image, emoticon, emoji, GIF, or video included in an associated message, the electronic device may include text describing the content of that information (e.g., a description of an image or video received from the user of the second device) in a prompt. For example, the electronic device may obtain the description by analyzing the image or video using an image analysis tool (e.g., image classification, scene analysis, object recognition). According to one embodiment, the description of the image or video may include additional information about the file (e.g., file name, metadata).
According to one embodiment, in a specific conversation session of a message application, the other party may reply to "Was yesterday's test difficult?" (MessageID_12) sent by the user of the electronic device with a crying emoji (MessageID_13). When such a message consisting of multimodal data rather than text (e.g., an emoji, image, emoticon, or video) is received, the electronic device may convert it into text information based on attribute information (or metadata) of the received message. Alternatively, the electronic device may use an AI model to convert the received message into text describing its image or video content. For example, the electronic device may identify the title of the emoji, "sobbing," from the attribute information of MessageID_13 and use it to convert MessageID_13 into a keyword expressing an emotion, such as sadness or tears.
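The metadata-based conversion in the emoji example can be sketched as a simple lookup. Only the "sobbing" entry comes from the text; the table name, the function name, and the second entry are hypothetical, and a real device might instead obtain the keyword from an AI model as described above.

```python
from typing import Optional

# Maps an emoji title found in message attribute information to an emotion
# keyword. Only "sobbing" is taken from the text; "grinning" is a
# hypothetical entry added for illustration.
EMOTION_BY_EMOJI_TITLE = {
    "sobbing": "sadness",
    "grinning": "joy",
}


def emotion_keyword(emoji_title: str) -> Optional[str]:
    """Return the emotion keyword for an emoji title, or None if unknown."""
    return EMOTION_BY_EMOJI_TITLE.get(emoji_title.strip().lower())
```

An unknown title yields `None`, which a caller could treat as the cue to fall back to the AI-model path.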
According to one embodiment, in operation 760, the electronic device may generate new text information (e.g., first text information) by combining an associated message and a reply message. The electronic device can generate at least one piece of first text information by combining at least a portion of the content of a reply message and the content of at least one associated message identified as being associated with the reply message.
[0170] According to one embodiment, an electronic device may generate a prompt including a received first message and an associated message and a request for a message combination and transmit it to an AI model, and receive at least one first text information from the AI model as a reply to the prompt. Table 3 shows an example of the prompt.
[0171] Message [A] has arrived for the user, and message [A] is related to messages [B]. Generate a summary message of 20 characters or less that includes the meanings of [A] and [B], matching the tone A uses for B.
[A]
__Content of message received from the user of the second device__
[B]
__Content of reference message #1 related to the received message__
__Content of reference message #2 related to the received message__
__Content of reference message #3 related to the received message__
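Filling the placeholders of the prompt above can be sketched as follows. The function name and the `max_chars` parameter are assumptions for illustration, and the call that sends the resulting string to an AI model is deliberately left out because the document does not specify that interface.

```python
from typing import List


def build_combination_prompt(
    received: str, references: List[str], max_chars: int = 20
) -> str:
    """Assemble the message-combination prompt from the reply message [A]
    and its associated reference messages [B], one entry per line."""
    lines = [
        "Message [A] has arrived for the user, and message [A] is related "
        "to messages [B]. "
        f"Generate a summary message of {max_chars} characters or less that "
        "includes the meanings of [A] and [B], matching the tone A uses for B.",
        "[A]",
        received,
        "[B]",
    ]
    lines.extend(references)
    return "\n".join(lines)
```

For instance, `build_combination_prompt("Thursday evening at 7 PM", ["When is our soccer match?"])` produces the instruction line followed by the `[A]` and `[B]` sections filled with those two messages.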
[0172] According to one embodiment, an electronic device can send three messages to a recipient in a message application: "When is our soccer match?" (MessageID_01), "We should have a drink?" (MessageID_02), and "When should we do it?" (MessageID_03). The recipient can send "Thursday evening at 7 PM" (MessageID_04) as a reply to MessageID_01, and "How about Saturday evening at 6 PM?" (MessageID_05) as a reply to MessageID_03. The electronic device can determine through semantic distance analysis that MessageID_01 and MessageID_04 are related to each other, and that MessageID_02, MessageID_03, and MessageID_05 are related to each other. The electronic device may generate first text information such as "The soccer match is at 7 PM on Thursday" by combining the contents of MessageID_01 and MessageID_04, and generate first text information such as "Shall we have a drink at 6 PM on Saturday?" by combining the contents of MessageID_02, MessageID_03, and MessageID_05.
According to one embodiment, when the electronic device generates first text information by combining messages, it may exclude at least one message that is determined not to be associated with the reply message. For example, the electronic device can generate the first text information “Our family gathering is at 6 PM on Friday” by combining “When is our family gathering?” (MessageID_06) and “Oh, it’s Friday evening at 6 o’clock” (MessageID_10), which are determined to be relevant among the transmitted and received messages of a conversation session of a message application, while excluding “Thursday evening at 7 o’clock” (MessageID_08) and “Oh no.” (MessageID_09), which are determined to be irrelevant. Additionally, the electronic device can generate the first text information “I’m all better from my cold” by combining “How is your cold?” (MessageID_07) and “I’m all better.” (MessageID_11).
According to one embodiment, in operation 770, the electronic device can combine the generated text information to generate new text information (e.g., second text information). For example, if the first text information generated in operation 760 is multiple, the electronic device can combine multiple first text information to generate second text information. Accordingly, the electronic device can process multiple reply messages transmitted from the other party as a single message and deliver multiple pieces of information to the user as a single message.
[0173] According to one embodiment, an electronic device may generate a prompt including at least one first text information and a request for a combination of the first text information and transmit it to an AI model, and receive a second text information from the AI model as a reply to the prompt. Table 4 shows an example of the prompt.
[0174] You are a virtual assistant that notifies the user when a message is received. The messages to be notified to the user are listed in [A]. Generate a notification message that matches the tone of the received message by naturally connecting the messages to be notified to the user.
[A]
__Content of Generated Message #1 related to Received Message #1__
__Content of Generated Message #2 related to Received Message #2__
__Content of Generated Message #3 related to Received Message #3__
[0175] According to one embodiment, an electronic device may use a rule-based sentence generation method when generating second text information from first text information. According to one embodiment, the generated first text information may be “The soccer match is at 7 PM on Thursday” and “Shall we have a drink at 6 PM on Saturday?”, and the electronic device may combine the two pieces of first text information to generate second text information, “The soccer match is on Thursday evening. Shall we have a drink at 6 PM on Saturday?”. In this case, the electronic device may refer to the tone and context of the received message in the conversation session to determine it to be a casual tone between friends, and combine the first text information in a form corresponding to that tone. According to one embodiment, if it is confirmed that the generated pieces of first text information are unrelated to each other, the electronic device may not combine them. For example, suppose the first text information is “Our family gathering is at 6 PM on Friday” and “I’m all better from my cold.” In this case, the electronic device may determine through semantic distance analysis that the two pieces of first text information have low correlation and may not combine them.
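The decision in operation 770 to combine or not combine can be sketched with a simple rule. This is an illustrative assumption rather than the rule-based method of the document: the function names are hypothetical, the `are_related` predicate stands in for the semantic distance analysis, and the joining rule only concatenates sentences without adjusting tone.

```python
from typing import Callable, List


def _as_sentence(text: str) -> str:
    """Ensure the text ends with sentence punctuation."""
    text = text.strip()
    return text if text.endswith((".", "?", "!")) else text + "."


def combine_first_texts(
    first_texts: List[str],
    are_related: Callable[[str, str], bool],
) -> List[str]:
    """Return one combined text if every neighbouring pair is related;
    otherwise return the texts unchanged so they are delivered separately."""
    if len(first_texts) < 2:
        return list(first_texts)  # nothing to combine; operation 770 is skipped
    pairs = zip(first_texts, first_texts[1:])
    if all(are_related(a, b) for a, b in pairs):
        return [" ".join(_as_sentence(t) for t in first_texts)]
    return list(first_texts)
```

Given the two soccer and drink texts and a predicate that reports them as related, the result is a single string joining both sentences; given the family gathering and cold texts with a predicate that reports them as unrelated, both texts come back unchanged.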
[0176] According to one embodiment, if there is only one first text information generated in operation 760, operation 770 may be omitted.
[0177] According to one embodiment, in operation 780, the electronic device can generate a message to be delivered to the user based on the second text information generated in operation 770. The electronic device can use a natural language generator to generate a message in a form to deliver the content of a reply message to the user.
[0178] According to one embodiment, if the generated second text information is “The soccer match is on Thursday evening. Shall we have a drink at 6 PM on Saturday?”, the electronic device can generate and output a final message to the user, such as “Received a message from A saying ‘The soccer match is on Thursday evening. Shall we have a drink at 6 PM on Saturday?’”.
[0179] As another example, if the generated second text information consists of two items, “Our family gathering is at 6 PM on Friday” and “I’m all better from my cold,” the electronic device may generate a new message such as “Received messages from A saying ‘Our family gathering is at 6 PM on Friday’ and ‘I’m all better from my cold’” to convey the content of the received message to the user.
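The two final-message examples above follow a common pattern that can be sketched with a template. This is only a stand-in for the natural language generator, whose internals the document does not describe; the function name is hypothetical and straight quotation marks replace the typographic ones used in the examples.

```python
from typing import List


def build_final_message(sender: str, texts: List[str]) -> str:
    """Wrap one or more text items in a notification sentence for the user."""
    if not texts:
        raise ValueError("at least one text item is required")
    quoted = [f"'{t}'" for t in texts]
    if len(quoted) == 1:
        return f"Received a message from {sender} saying {quoted[0]}"
    return f"Received messages from {sender} saying " + " and ".join(quoted)
```

Calling it with sender "A" and the two uncombined items from the family gathering example yields a sentence of the form "Received messages from A saying '...' and '...'".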
[0180] According to one embodiment, an electronic device may use a single prompt to perform the generation of new text information (e.g., first text information) by combining an associated message and a reply message in operation 760, the generation of new text information (e.g., second text information) by combining the generated text information in operation 770, and the generation of a message to be delivered to a user based on the generated second text information in operation 780. For example, the electronic device may generate a prompt that includes a first message (or reply message) received from a counterparty, associated messages of the conversation session, and a request to generate a new message to be delivered to the user, and deliver the prompt to an AI model.
[0181] Table 5 shows an example of the above prompt.
[0182] You are a virtual assistant that notifies the user when a message is received. Message [A] has arrived for the user, and Message [A] is related to Messages [B]. Think of a response message for each of the multiple received messages, and create a notification message that naturally connects these response messages to inform the user, matching the tone.
[A]
__Content of Message #1 received from the user of the second device__
__Content of Message #2 received from the user of the second device__
[B]
__Content of Reference Message #1 related to the received message__
__Content of Reference Message #2 related to the received message__
__Content of Reference Message #3 related to the received message__
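As an illustration of how a prompt of the form shown in Table 5 might be assembled, the sketch below concatenates the instruction text, the received messages under [A], and the reference messages under [B]. The function name `build_single_prompt` and the newline-separated layout are assumptions. The document does not describe the exact serialization or the interface of the AI model, so no model call is shown.

```python
from typing import List

# Instruction text taken from the example prompt in Table 5.
INSTRUCTION = (
    "You are a virtual assistant that notifies the user when a message is received. "
    "Message [A] has arrived for the user, and Message [A] is related to Messages [B]. "
    "Think of a response message for each of the multiple received messages, and create "
    "a notification message that naturally connects these response messages to inform "
    "the user, matching the tone."
)


def build_single_prompt(received: List[str], references: List[str]) -> str:
    """Assemble the instruction, received messages [A], and reference messages [B]."""
    lines = [INSTRUCTION, "[A]"]
    lines.extend(received)      # reply messages received from the counterparty
    lines.append("[B]")
    lines.extend(references)    # associated messages from the conversation session
    return "\n".join(lines)
```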
[0183] The AI model can generate and output a final message corresponding to the message generated in operation 780 based on the received prompt. According to one embodiment, the electronic device may output the generated final message as an audio signal through a speaker, or transmit the audio signal to an audio device worn by the user through a communication circuit. Additionally, the electronic device may display the generated final message on the screen of a voice assistant displayed on a display. Hereinafter, with reference to FIGS. 8 to 13, audio information or a screen output through a voice assistant when a reply message is received from a counterparty in a message application, according to various embodiments of the present document, will be described. FIGS. 8 to 13 are described using Bixby™ as the voice assistant, but the voice assistant is not limited thereto.
[0184] According to one embodiment, the electronic device (400) can activate the voice assistant when the user speaks a wake-up word (trigger word) of the voice assistant. For example, the wake-up word of the voice assistant may include at least part of the name of the voice assistant (e.g., Hi Bixby).
[0185] FIG. 8 illustrates a voice assistant and message application screen provided in an electronic device (400) according to one embodiment.
[0186] FIG. 8 illustrates an embodiment in which, when a reply message with the message type designated as reply is received, the association with previous messages is checked and a new message is generated.
[0187] According to one embodiment, the electronic device (400) executes a voice assistant, and when a user's voice utterance (810) is input through a microphone, it can determine whether the utterance is for confirming a message received through a message application. For example, when the user utters "Hi Bixby, read me recent messages," the electronic device (400) can determine that the utterance is a request to confirm a message received from the other party through a message application (e.g., a native message application or a 3rd party message application). According to another embodiment, the electronic device (400) may, in response to the reception of a message and even in the absence of a user's voice utterance (810), perform the operations described below of checking whether the message is a reply message and outputting a final message.
[0188] According to one embodiment, the electronic device (400) can check reply messages among the messages received from the other party. Referring to FIG. 8, in a conversation session (800) with friend A, who is the other party in the message application, the user sends message 1 "When is our soccer match?" (852), message 2 "We should have a drink too" (854), and message 3 "When should we do it?" (856), and the other party can send message 4 "Thursday evening at 7" (862), which is a reply message to message 1 (852), and message 5 "How about Saturday evening at 6" (864), which is a reply message to message 3 (856). The electronic device (400) can check that the message type of the attribute information of message 4 (862) and message 5 (864) is reply, and can confirm that message 4 (862) and message 5 (864) are reply messages.
[0189] According to one embodiment, the electronic device (400) may select messages 1 through 5 (852, 854, 856, 862, 864) among the messages included in the conversation session (800) as reference messages, and determine at least one associated message for message 4 (862) and at least one associated message for message 5 (864), which are the reply messages among the reference messages. For example, based on semantic distance measurements between each of the reference messages and the respective reply message, the electronic device (400) may determine message 1 (852) as the associated message for message 4 (862) and determine message 2 (854) and message 3 (856) as the associated messages for message 5 (864). According to one embodiment, the electronic device (400) can determine the semantic distance between each reply message and the reference messages using an AI model, a deep learning-based approach, or various statistical analysis methods, without being limited thereto, in order to determine at least one associated message for the reply message.
[0190] According to one embodiment, the electronic device (400) can determine the transmission message to be the subject of the reply as an associated message for a reply message transmitted from the other party.
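The selection of associated messages by semantic distance can be sketched as follows, assuming that each message has already been mapped to an embedding vector by some model (not shown and not specified in the document). Cosine distance is used here as one common choice. The function names and the threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def find_associated(
    reply_vec: Sequence[float],
    reference_vecs: Dict[str, Sequence[float]],
    max_distance: float = 0.3,  # hypothetical threshold
) -> List[str]:
    """Return ids of reference messages whose distance to the reply is small enough."""
    return [
        msg_id
        for msg_id, vec in reference_vecs.items()
        if cosine_distance(reply_vec, vec) <= max_distance
    ]
```

In the FIG. 8 example, an embedding model that places message 4 (862) near message 1 (852), and message 5 (864) near messages 2 and 3 (854, 856), would give the associations described above.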
[0191] According to one embodiment, the electronic device (400) may generate first text information by combining at least a portion of the content of a reply message and at least one associated message associated with the reply message. For example, the electronic device (400) may generate a prompt including message 4 (862), message 1 (852) associated therewith, and a request for combining the messages, transmit the prompt to an AI model, and receive first text information from the AI model. Additionally, the electronic device (400) may generate a prompt including message 5 (864), message 2 (854) and message 3 (856) associated therewith, and a request for combining the messages, transmit the prompt to the AI model, and receive first text information from the AI model. In the embodiment of FIG. 8, "The soccer match is on Thursday evening" corresponding to message 4 (862) and "Shall we have a drink at 6 PM on Saturday?" corresponding to message 5 (864) may be generated as first text information.
[0192] According to one embodiment, the electronic device (400) can generate second text information by combining generated first text information. For example, the electronic device (400) can generate second text information by combining text information that combines at least a portion of message 4 (862) and message 1 (852) and text information that combines at least a portion of message 5 (864), message 2 (854), and message 3 (856). The electronic device (400) can generate a prompt including at least one first text information and a request for combining the first text information and transmit it to an AI model, and receive second text information from the AI model as a response to the prompt.
[0193] According to one embodiment, the electronic device (400) can generate a message to be delivered to the user based on the generated second text information. For example, the electronic device (400) can use a natural language generator to generate a final message in a form that delivers the content of the reply message to the user. When generating the final message, the electronic device (400) can reflect the tone of the conversation between the user and the other party, and can include information that identifies the other party (e.g., Friend A) and a suggestion for additional input after the message is checked (e.g., Should I reply?).
[0194] Referring to FIG. 8, the electronic device (400) can generate a final message, “Friend A sent a message saying, ‘The soccer match is on Thursday evening. Shall we have a drink at 6 PM on Saturday?’ Shall we reply?” (820).
[0195] According to one embodiment, the electronic device (400) may use a single prompt to request that the AI model generate a final message from the transmitted and received messages. For example, the electronic device (400) may generate a prompt that includes messages 1 through 5 within the conversation session (800) and a request to generate a new message to be notified to the user, transmit the prompt to the AI model, and receive the final message from the AI model.
[0196] According to one embodiment, the electronic device (400) may output the generated final message as an audio signal through a speaker, or transmit the audio signal to an audio device worn by the user through a communication circuit. Additionally, the electronic device (400) may display the generated final message (820) on the screen of a voice assistant displayed on a display.
[0197] FIG. 9 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0198] FIG. 9 illustrates an embodiment in which, when a reply message and a general message are received together from the other party, the association with previous messages is confirmed and a new message is generated.
[0199] Referring to FIG. 9, in a conversation session (900) with friend A, who is the counterpart in the message application, the user sends message 1 "When is our soccer match?" (952), message 2 "We should have a drink too" (954), and message 3 "When should we?" (956), and the counterpart can send message 4 "Thursday evening at 7 PM" (962), which is a reply message to message 1 (952), and message 5 "How about Saturday evening at 6 PM?" (964), which is a general message. The message type in the attribute information of message 4 (962) may be set to reply, and that of message 5 (964) may be set to normal.
[0200] According to one embodiment, the electronic device (400) can confirm that the message type of the attribute information of message 4 (962) is reply, and can confirm that message 4 (962) is a reply message.
[0201] According to one embodiment, the electronic device (400) can determine whether a received message is a reply message based on context analysis of at least some of the messages included in the same conversation session (900) as the received message. For example, the electronic device (400) can determine that message 5 (964) is a reply message to message 3 (956) by analyzing the contents of message 5 (964), which is a general message, and the messages within the conversation session (900).
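The two ways of identifying a reply message described above (the message type in the attribute information, and context analysis of the conversation session) can be sketched as a check with a fallback. The `Message` fields and the `context_check` callback are hypothetical. The context analysis itself is left to the caller because the document does not specify how it is performed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    msg_id: int
    sender: str
    text: str
    message_type: str = "normal"  # "reply" or "normal", per the attribute information


def is_reply(
    msg: Message,
    session: List[Message],
    context_check: Optional[Callable[[Message, List[Message]], bool]] = None,
) -> bool:
    """Use the message type first; otherwise fall back to context analysis."""
    if msg.message_type == "reply":
        return True
    if context_check is not None:
        return context_check(msg, session)  # e.g., an AI-model-based analysis
    return False
```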
[0202] According to one embodiment, the electronic device (400) determines at least one associated message associated with a reply message among messages within a conversation session (900), generates at least one first text information based on at least a portion of the content of the reply message and the associated message, generates second text information by combining the first text information, and generates a final message through natural language processing of the second text information. At least some of the operations may be performed by sending a prompt to an AI model and receiving a response from the AI model.
[0203] Referring to FIG. 9, the electronic device (400) can generate and output the final message (920) "Friend A sent a message saying 'The soccer match is on Thursday evening. Shall we have a drink at 6 PM on Saturday?' Shall we reply?"
[0204] FIG. 10 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0205] FIG. 10 illustrates an embodiment for generating a final message to be provided to a user by excluding contextually unnecessary messages from among the messages of a conversation session.
[0206] Referring to FIG. 10, in a conversation session (1000) with cousin A, who is the counterpart of the message application, the user sends message 1 "When is our family gathering?" (1052) and message 2 "How is your cold?" (1054), and the counterpart can send message 3 "Thursday evening at 7 o'clock" (1062), which is a reply message to message 1 (1052); message 4 "Oh no" (1064), which is a general message; message 5 "Oh, it's Friday evening at 6 o'clock" (1066); and message 6 "My cold is all better" (1068), which is a reply message to message 2 (1054).
[0207] According to one embodiment, the electronic device (400) can confirm that the message type of the attribute information of message 3 (1062) and message 6 (1068) is reply, and can confirm that message 3 (1062) and message 6 (1068) are reply messages.
[0208] According to one embodiment, the electronic device (400) may exclude at least some of the messages when generating the final message based on the content of the message received after the reply message. For example, even if the electronic device (400) confirms that message 3 (1062) is the reply message, it may determine that message 5 (1066) is the actual reply message to message 1 (1052) containing the user's query based on the content of message 4 (1064) "Oh no" and the content of message 5 (1066) "Oh, it's Friday evening at 6 o'clock." The electronic device (400) may determine that a message other than the received reply message is the actual reply message if the reply message contains content unrelated to the user's query, if a message containing a different answer to the same query is received after the reply message, and / or if a message containing an expression indicating a correction of the answer, such as "Oh no," is received.
[0209] According to one embodiment, the electronic device (400) can determine the final message to be provided to the user based on the contents of message 5 (1066) and associated message 1 (1052), and message 6 (1068) and associated message 2 (1054), excluding message 3 (1062) and message 4 (1064).
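One of the conditions described above, an explicit correction expression such as "Oh no" following an answer, can be sketched with a simple rule. This covers only that one condition. The marker list and the function name `drop_corrected` are hypothetical, and a real implementation would likely rely on the context analysis mentioned in the document rather than exact string matching.

```python
from typing import List

# Hypothetical correction expressions; the document only gives "Oh no" as an example.
CORRECTION_MARKERS = ("oh no", "oops")


def drop_corrected(received: List[str]) -> List[str]:
    """Remove each correction marker and the answer immediately before it."""
    kept: List[str] = []
    for text in received:
        normalized = text.strip().lower().rstrip(".!")
        if normalized in CORRECTION_MARKERS:
            if kept:
                kept.pop()   # discard the answer being corrected
            continue         # discard the marker itself
        kept.append(text)
    return kept
```

Applied to the four received messages of FIG. 10, this keeps message 5 (1066) and message 6 (1068) and drops message 3 (1062) and message 4 (1064).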
[0210] According to one embodiment, an electronic device (400) may determine at least one associated message associated with a reply message among messages within a conversation session (1000), generate at least one first text information based on at least a portion of the content of the reply message and the associated message, generate second text information by combining the first text information, and generate a final message through natural language processing of the second text information. At least some of the operations may be performed by sending a prompt to an AI model and receiving a response from the AI model.
[0211] Referring to FIG. 10, the electronic device (400) can generate and output the final message (1020) "Cousin A sent a message saying, 'Our family gathering is on Friday evening at 6 o'clock. I'm all better now.' Would you like to reply?"
[0212] FIG. 11 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0213] FIG. 11 illustrates an embodiment in which the type of received message is a general message rather than a reply, but is determined to be a reply to a message sent by the user based on the context.
[0214] Referring to FIG. 11, in a conversation session (1100) with a younger sibling A, who is the counterpart of the message application, the user sends message 1 "Are you fully recovered from your cold?" (1152), message 2 "There is an orchestra performance this Saturday" (1154), and message 3 "Can you come with me?" (1156), and the counterpart can send message 4 "Not yet" (1162), message 5 "When is the performance time?" (1164), message 6 "I like orchestra performances" (1166), and message 7 "I think I'll be fully recovered from my cold before Saturday" (1168). Here, the received messages 4 to 7 (1162, 1164, 1166, 1168) may be normal messages with the message type of the attribute information specified as normal.
[0215] According to one embodiment, the electronic device (400) may determine whether a received message is a reply message based on context analysis of at least some of the messages included in the same conversation session (1100) as the received message. For example, the electronic device (400) may determine that among the messages sent by the user, message 1 (1152) and message 3 (1156) contain the user's query, and may determine a reply message to the user's query by analyzing the contents of received messages 4 to 7 (1162, 1164, 1166, 1168). The electronic device (400) may determine message 4 (1162) as a reply message to message 1 (1152) and message 6 (1166) as a reply message to message 3 (1156) by analyzing the context of the entire conversation.
[0216] According to one embodiment, the electronic device (400) may determine at least one associated message among the messages in the conversation session (1100) that is associated with a reply message. For example, the electronic device (400) may determine message 1 (1152) and message 7 (1168) that are associated with a cold, which is the subject of message 4 (1162) that is determined as a reply message, and message 2 (1154), message 3 (1156), and message 5 (1164) that are associated with an orchestra performance, which is the subject of message 6 (1166), as associated messages.
[0217] According to one embodiment, at least one first text information may be generated based on at least a portion of the content of a reply message and an associated message, a second text information may be generated by combining the first text information, and a final message may be generated through natural language processing of the second text information. At least some of the above operations may be performed by sending a prompt to an AI model and receiving a response from the AI model.
[0218] Referring to FIG. 11, the electronic device (400) can generate and output a final message (1120) that says, "I received a message from my younger sibling A saying, 'I think my cold will get better before Saturday, so I can go on Saturday. When is the performance time?' Would you like to reply?"
[0219] FIG. 12 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0220] FIG. 12 illustrates an example of generating a final message to be provided to the user when a multimodal message is received through a 3rd party message application.
[0221] Referring to FIG. 12, in a conversation session (1200) with friend B, who is the counterpart in the 3rd party message application, the user sends message 1 "Was the test yesterday difficult?" (1252), message 2 "Let's go eat something delicious after the test" (1254), and message 3 "I know a good restaurant" (1256), and the counterpart can send message 4 (1262) containing emoticons as a reply to message 1.
[0222] According to one embodiment, the electronic device (400) recognizes the user's speech "Hi Bixby, read me recent messages" (1210), and can determine that this is a request to confirm a message received from the other party through the 3rd party message application.
[0223] According to one embodiment, the electronic device (400) can infer the emotions of the user or the other party contained in the message from the transmitted and received message. For example, if the reply message is in the form of an image (or video), such as an emoticon, a GIF file, or an image, the electronic device (400) can convert it into text information based on the attribute information (or metadata) of the received message. Alternatively, the electronic device (400) can convert the received message into text that can describe the image or video information of the message using an AI model. Referring to FIG. 12, the electronic device (400) can identify the title of the emoji "sobbing" from the attribute information of message 4 (1262) and convert message 4 (1262) into a keyword expressing an emotion such as sadness or tears through the identified information.
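The conversion of an emoji into an emotion keyword from the message's attribute information can be sketched as a lookup. The attribute key `"title"` and the mapping table are hypothetical. Only the "sobbing" to sadness example comes from the document.

```python
from typing import Dict, Optional

# Hypothetical mapping from emoji titles to emotion keywords.
EMOJI_TITLE_TO_EMOTION: Dict[str, str] = {
    "sobbing": "sadness",
    "laughing": "joy",
}


def emotion_from_attributes(attributes: Dict[str, str]) -> Optional[str]:
    """Return an emotion keyword for the emoji title, or None if it is unknown."""
    title = attributes.get("title", "").strip().lower()
    return EMOJI_TITLE_TO_EMOTION.get(title)
```

When the lookup returns None, the alternative described above, having an AI model describe the image or video, would apply.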
[0224] According to one embodiment, the electronic device (400) determines message 4 (1262) and message 1 (1252) as associated messages among messages in a conversation session (1200), and can generate a final message based on the contents of message 4 (1262) and message 1 (1252). Referring to FIG. 12, the electronic device (400) can generate and output the final message (1220) "Friend B sent a text yesterday saying 'I'm sad because the test was too difficult.' Should I reply?"
[0225] FIG. 13 illustrates a voice assistant and message screen provided in an electronic device according to one embodiment.
[0226] FIG. 13 illustrates an embodiment of generating a final message to be provided to a user by modifying reply messages received from different counterparts in a group conversation session.
[0227] Referring to FIG. 13, a user of an electronic device (400), friend C, and friend D are participating in a conversation session (1300) of a messaging application. The user sends message 1 "Completely ruined the test" (1352) and message 2 "Let's play tennis together" (1354), friend C sends reply message 3 "I ruined it too" (1362) to message 1 (1352), and friend D sends reply message 4 "Available on Saturday morning" (1372) to message 2 (1354).
[0228] According to one embodiment, the electronic device (400) may include, when generating the final message, information that can identify the counterpart who sent each message. For example, the electronic device (400) may generate text information by combining message 1 (1352) and message 3 (1362) and including information on friend C, the sender of message 3 (1362), and may generate text information by combining message 2 (1354) and message 4 (1372) and including information on friend D, the sender of message 4 (1372). The electronic device (400) may combine the two pieces of text information generated in this way into new text information and generate the final message through natural language processing.
[0229] Referring to FIG. 13, the electronic device (400) can generate and output the final message (1320) “Friend C sent a message saying that I also failed the test. Friend D sent a message saying that tennis is available Saturday morning. Shall I reply?”
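For the group conversation case, attaching each sender to its own combined text can be sketched with a template. The function name `render_group_notification` and the fixed sentence pattern are assumptions. The document performs this step through natural language processing, which the template only approximates.

```python
from typing import List, Tuple


def render_group_notification(
    items: List[Tuple[str, str]],          # (sender, combined text) pairs
    follow_up: str = "Shall I reply?",
) -> str:
    """Produce one sentence per sender, followed by a suggestion for further input."""
    sentences = [f"{sender} sent a message saying that {text}." for sender, text in items]
    return " ".join(sentences + [follow_up])
```

Usage with the FIG. 13 example:

```python
print(render_group_notification([
    ("Friend C", "I also failed the test"),
    ("Friend D", "tennis is available Saturday morning"),
]))
```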
[0230] An electronic device according to various embodiments of the present document may include a speaker, a communication circuit, a memory, and at least one processor.
[0231] According to one embodiment, the memory may store instructions that, when executed by the at least one processor, cause the electronic device to: receive a first message from an external device through the communication circuit; check whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determine at least one associated message by analyzing the association between the first message and at least some reference messages among the messages transmitted to and received from the external device; generate a second message based on the first message and the determined at least one associated message; and convert the generated second message into an audio signal and output it through the speaker or through an external audio device wirelessly connected via the communication circuit.
[0232] According to one embodiment, the memory may store instructions that cause the electronic device to generate the second message using an AI model.
[0233] According to one embodiment, the memory may store instructions for the electronic device to check whether the received first message is a reply message when the first message is received in hands-free mode, and to generate and output the second message.
[0234] According to one embodiment, the memory may store instructions that allow the electronic device to determine whether the first message is a reply message based on message type information of message attribute information included in the first message.
[0235] According to one embodiment, the memory may store instructions that allow the electronic device to determine whether the first message is a reply message based on context analysis of at least some of the messages included in the same conversation session as the first message.
[0236] According to one embodiment, the memory may store instructions that cause the electronic device to compare the semantic distances between the first message and the reference messages and to determine at least one of the reference messages as the at least one associated message.
[0237] According to one embodiment, the memory may store instructions that cause the electronic device to generate at least one first text information by combining at least a portion of the contents of the first message and the associated message.
[0238] According to one embodiment, the memory may store instructions that cause the electronic device to generate a prompt including the first message, the associated message, and a request for combining the messages, transmit the prompt to an AI model, and receive at least one first text information from the AI model as a response to the prompt.
[0239] According to one embodiment, the memory may store instructions that cause the electronic device to combine a plurality of pieces of first text information to generate second text information when there are a plurality of pieces of first text information.
[0240] According to one embodiment, the memory may store instructions that cause the electronic device to generate a prompt including the at least one first text information and a request for combining the first text information, transmit the prompt to an AI model, and receive the second text information from the AI model as a response to the prompt.
[0241] According to one embodiment, the memory may store instructions that cause the electronic device to convert the generated second text information using a natural language generator (NLG) to generate the second message.
[0242] According to one embodiment, the memory may store instructions that cause the electronic device to generate emotion information from text or image information of the first message and to generate the second message based on the emotion information.
[0243] According to one embodiment, the memory may store instructions for the electronic device to receive the first message through a message application, and when a user voice input through a microphone includes a message confirmation request to a voice assistant client, to check whether the received first message is a reply message and to generate and output the second message.
[0244] According to one embodiment, the memory may store instructions that cause the electronic device to display the second message on the voice assistant client screen displayed on the display.
[0245] A method performed by an electronic device according to various embodiments of the present document may include: receiving a first message from an external device; checking whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association with at least some of the reference messages among the messages transmitted and received with the external device with respect to the first message; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting it.
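The sequence of operations in the method can be sketched as a single function whose steps are supplied as callables. All names here are hypothetical, and each step (reply check, association analysis, message generation, audio output) is left abstract because the document allows several implementations for each. Returning None for a non-reply message is a choice made for this sketch, not a behavior stated in the document.

```python
from typing import Callable, List, Optional


def handle_first_message(
    first_message: str,
    reference_messages: List[str],
    is_reply: Callable[[str], bool],
    find_associated: Callable[[str, List[str]], List[str]],
    generate_second: Callable[[str, List[str]], str],
    output_audio: Callable[[str], None],
) -> Optional[str]:
    """Run the reply check, association analysis, generation, and audio output in order."""
    if not is_reply(first_message):
        return None  # sketch-only choice: non-reply messages are not modified here
    associated = find_associated(first_message, reference_messages)
    second_message = generate_second(first_message, associated)
    output_audio(second_message)  # e.g., text-to-speech to a speaker or earbuds
    return second_message
```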
[0246] According to one embodiment, the operation of generating the second message may include the operation of generating the second message using an AI model.
[0247] According to one embodiment, the operation of determining whether the received first message is a reply message may include an operation of determining whether the first message is a reply message based on message type information of message attribute information included in the first message, or an operation of determining whether the first message is a reply message based on context analysis of at least some of the messages included in the same conversation session as the first message.
[0248] According to one embodiment, the operation of determining the at least one associated message may include comparing the semantic distances between the first message and the reference messages and determining at least one of the reference messages as the at least one associated message.
[0249] According to one embodiment, the operation of generating the second message may include: generating at least one first text information by combining at least a portion of the contents of the first message and the associated message; generating second text information by combining a plurality of pieces of first text information when there are a plurality of pieces of first text information; and generating the second message by converting the generated second text information using a natural language generator (NLG).
[0250] A computer-readable non-transitory recording medium according to various embodiments of the present document may store instructions for performing operations including: receiving a first message from an external device; checking whether the received first message is a reply message to a transmission message transmitted by the electronic device to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association between the first message and at least some reference messages among the messages transmitted to and received from the external device; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting it.
[0251] The electronic device according to the various embodiments disclosed in this document may be of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this document is not limited to the devices described above.
[0252] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of said items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" may each include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "1st" and "2nd," or "first" and "second," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., first) component is referred to as “coupled” or “connected” to another (e.g., second) component, with or without the terms “functionally” or “communicatively,” it means that the component may be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
[0253] The term “module” as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).
[0254] Various embodiments of the present document may be implemented as software (e.g., program (140)) comprising one or more instructions stored in a storage medium (e.g., internal memory (136) or external memory (138)) readable by a machine (e.g., electronic device (101)). For example, a processor (e.g., processor (120)) of the machine (e.g., electronic device (101)) may invoke at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one invoked instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' simply means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.
[0255] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
[0256] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities, and some of the multiple entities may be separately placed in other components. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as they were performed by the corresponding component among the multiple components prior to integration. According to various embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
Claims
1. An electronic device comprising: a speaker; a communication circuit; a memory; and at least one processor, wherein the memory stores instructions that are executable by the at least one processor and that, when executed, cause the electronic device to: receive a first message from an external device through the communication circuit; check whether the received first message is a reply message to a transmission message that the electronic device transmitted to the external device according to user input; if the first message is a reply message, determine at least one associated message by analyzing the association with the first message of at least some reference messages among the messages transmitted to and received from the external device; generate a second message based on the first message and the determined at least one associated message; and convert the generated second message into an audio signal and output the audio signal through the speaker or through an external audio device wirelessly connected via the communication circuit.
2. The electronic device of claim 1, wherein the memory stores instructions that cause the electronic device to generate the second message using an AI model.
3. The electronic device of claim 1 or 2, wherein the memory stores instructions that cause the electronic device, when the first message is received in a hands-free mode, to check whether the received first message is a reply message and to generate and output the second message.
4. The electronic device of any one of claims 1 to 3, wherein the memory stores instructions that cause the electronic device to determine whether the first message is a reply message based on message type information in message attribute information included in the first message.
5. The electronic device of any one of claims 1 to 3, wherein the memory stores instructions that cause the electronic device to determine whether the first message is a reply message based on context analysis of at least some of the messages included in the same conversation session as the first message.
6. The electronic device of any one of claims 1 to 5, wherein the memory stores instructions that cause the electronic device to determine, from among the reference messages, at least one reference message as the at least one associated message by comparing semantic distances to the first message.
7. The electronic device of any one of claims 1 to 6, wherein the memory stores instructions that cause the electronic device to: generate at least one piece of first text information by combining at least part of the contents of the first message and the associated message; generate a prompt that includes the first message, the associated message, and a request to combine the messages, and transmit the prompt to the AI model; and receive the at least one piece of first text information from the AI model as a response to the prompt.
8. The electronic device of claim 7, wherein the memory stores instructions that cause the electronic device, when there are multiple pieces of first text information, to generate second text information by combining the multiple pieces of first text information.
9. The electronic device of claim 8, wherein the memory stores instructions that cause the electronic device to generate a prompt that includes the at least one piece of first text information and a request to combine the first text information, transmit the prompt to an AI model, and receive the second text information from the AI model as a response to the prompt.
10. The electronic device of claim 8 or 9, wherein the memory stores instructions that cause the electronic device to generate the second message by converting the generated second text information using a natural language generator (NLG).
11. The electronic device of any one of claims 1 to 10, wherein the memory stores instructions that cause the electronic device to generate emotion information from text or image information of the first message and to generate the second message based on the emotion information.
12. The electronic device of any one of claims 1 to 11, wherein the memory stores instructions that cause the electronic device to: receive the first message through a message application; and, when a user voice input received through a microphone includes a request to a voice assistant client to check a message, check whether the received first message is a reply message and generate and output the second message.
13. The electronic device of any one of claims 1 to 12, wherein the memory stores instructions that cause the electronic device to display the second message on the voice assistant client screen displayed on the display.
14. A method performed by an electronic device, the method comprising: receiving a first message from an external device; determining whether the received first message is a reply message to a transmission message that the electronic device transmitted to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association with the first message of at least some reference messages among the messages transmitted to and received from the external device; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting the audio signal.
15. A non-transitory computer-readable recording medium storing instructions that, when executed, perform operations comprising: receiving a first message from an external device; determining whether the received first message is a reply message to a transmission message that the electronic device transmitted to the external device according to user input; if the first message is a reply message, determining at least one associated message by analyzing the association with the first message of at least some reference messages among the messages transmitted to and received from the external device; generating a second message based on the first message and the determined at least one associated message; and converting the generated second message into an audio signal and outputting the audio signal.