Wearable device, method, and non-transitory computer-readable storage medium for recognizing user intent from user input

The wearable device addresses the challenge of accurately recognizing user intent by using eye tracking and voice input to generate precise AI responses, minimizing hallucinations through Retrieval Augmented Generation.

WO2026134588A1 · PCT designated stage · Publication Date: 2026-06-25 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-10-17
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately recognize user intent from a combination of voice input and gaze information, leading to potential hallucinations in artificial intelligence responses.

Method used

A wearable device equipped with a camera for eye tracking and a microphone for voice input, along with a processor that generates prompts based on gaze and voice data to provide accurate responses, utilizing Retrieval Augmented Generation (RAG) to enhance user intent recognition.

Benefits of technology

Enhances the accuracy of user intent recognition by integrating eye tracking and voice input, reducing hallucinations and improving the reliability of AI responses.

Abstract

This wearable device may comprise at least one processor. The at least one processor is set to: while an assistant application is being executed, acquire a speech input through at least one microphone, and acquire gaze information of a user through at least one camera; on the basis of the speech input and the gaze information, obtain criterion information for identifying a dataset to be used to generate a prompt; identify at least one dataset from among a plurality of datasets stored in a memory, on the basis of the criterion information; generate the prompt on the basis of the at least one dataset; and display, through a display, a response to the speech input, obtained on the basis of the prompt.

Description

Wearable device, method, and non-transitory computer-readable storage medium for recognizing user intent from user input

[0001] The following descriptions relate to a wearable device, a method, and a non-transitory computer-readable storage medium for recognizing user intent from user input.

[0002] Artificial intelligence models (e.g., large language models (LLMs)) can generate responses based on prompts. To reduce hallucination, artificial intelligence models can generate responses by further utilizing information stored in external storage. This method of utilizing information stored in external storage can be referred to as Retrieval Augmented Generation (RAG).
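
As a rough illustration of the retrieval step described above, the following Python sketch ranks stored texts against a query and prepends the best match to the prompt. It is a minimal sketch under stated assumptions, not the method of this disclosure: the `Document` type, the word-overlap scoring, and the prompt layout are all hypothetical stand-ins for whatever retriever and model a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    stripped = (word.strip(".,?!").lower() for word in text.split())
    return {word for word in stripped if word}


def retrieve(query: str, store: list[Document], top_k: int = 1) -> list[Document]:
    """Rank stored documents by word overlap with the query.

    Word overlap is an assumption made for brevity; it stands in for a real retriever.
    """
    query_words = tokenize(query)
    ranked = sorted(
        store,
        key=lambda doc: len(query_words & tokenize(doc.text)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, retrieved: list[Document]) -> str:
    """Place the retrieved text ahead of the question so the model can ground its answer."""
    context = "\n".join(f"- {doc.text}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical external storage with two entries.
store = [
    Document("a", "The museum opens at nine in the morning."),
    Document("b", "The cafe serves espresso and tea."),
]
query = "When does the museum open?"
prompt = build_prompt(query, retrieve(query, store))
```

The resulting prompt would then be passed to a model; that call is omitted here because the text does not specify one.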

[0003] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0004] A wearable device is provided. The wearable device may include at least one camera configured to perform eye tracking. The wearable device may include at least one microphone configured to acquire voice input. The wearable device may include a display positioned in front of the user's eyes when the wearable device is worn by the user. The wearable device may include a memory that stores instructions and includes one or more storage media. The wearable device may include at least one processor including a processing circuit. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to acquire the voice input through the at least one microphone and acquire the user's gaze information through the at least one camera while an assistant application is running. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to obtain, based on the voice input and the gaze information, criterion information for identifying a dataset to be used to generate a prompt. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify at least one dataset among a plurality of datasets stored in the memory based on the criterion information. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to generate the prompt based on the at least one dataset. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to display, through the display, a response to the voice input obtained based on the prompt.

[0005] A method is provided by a wearable device comprising at least one camera configured to perform eye tracking, at least one microphone configured to acquire voice input, memory, and a display. The method may include the operation of acquiring the voice input through the at least one microphone and acquiring user gaze information through the at least one camera while an assistant application is running. The method may include the operation of acquiring criterion information for identifying a dataset to be used to generate a prompt based on the voice input and the gaze information. The method may include the operation of identifying at least one dataset among a plurality of datasets stored in the memory based on the criterion information. The method may include the operation of generating the prompt based on the at least one dataset. The method may include the operation of displaying a response to the voice input, acquired based on the prompt, through the display.
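
The sequence of operations above (acquire voice and gaze, obtain criterion information, identify a dataset, generate a prompt) can be sketched in Python as follows. This is only an illustrative sketch: the two-dimensional `TOY_EMBEDDING` table, the `Dataset` type, the averaging in `obtain_criterion`, and the nearest-distance selection are assumptions introduced here, loosely motivated by the brief description of FIG. 9b, and are not taken from the claims.

```python
import math
from dataclasses import dataclass

# Hypothetical 2-D "embedding" table (an assumption); a real device would
# likely use a learned encoder rather than a fixed lookup.
TOY_EMBEDDING = {
    "painting": (1.0, 0.0),
    "artist": (0.9, 0.2),
    "menu": (0.0, 1.0),
    "price": (0.1, 0.9),
}


@dataclass
class Dataset:
    name: str
    position: tuple[float, float]  # hypothetical location in the same 2-D space
    content: str


def obtain_criterion(voice_text: str, gaze_label: str) -> tuple[float, float]:
    """Combine voice words and the gazed-at label into one point (their average)."""
    words = [w.strip(".,?!").lower() for w in voice_text.split()] + [gaze_label.lower()]
    points = [TOY_EMBEDDING[w] for w in words if w in TOY_EMBEDDING]
    if not points:
        return (0.0, 0.0)
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


def identify_dataset(criterion: tuple[float, float], datasets: list[Dataset]) -> Dataset:
    """Pick the stored dataset closest to the criterion point (Euclidean distance)."""
    return min(datasets, key=lambda d: math.dist(criterion, d.position))


def generate_prompt(voice_text: str, gaze_label: str, dataset: Dataset) -> str:
    """Assemble a prompt from the selected dataset, the gaze target, and the utterance."""
    return (
        f"Reference data ({dataset.name}): {dataset.content}\n"
        f"The user is looking at: {gaze_label}\n"
        f"The user said: {voice_text}"
    )


datasets = [
    Dataset("art_catalog", (1.0, 0.1), "Catalog entries for works on display."),
    Dataset("restaurant_menus", (0.0, 1.0), "Menu items and prices."),
]
voice, gaze = "Who is the artist?", "painting"
criterion = obtain_criterion(voice, gaze)
chosen = identify_dataset(criterion, datasets)
prompt = generate_prompt(voice, gaze, chosen)
```

Obtaining the response from a model and rendering it on the display are left out, since they depend on components the text does not specify at this point.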

[0006] A non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium may store one or more programs. The one or more programs may include instructions that, when executed individually or collectively by at least one processor of a wearable device, cause the wearable device to acquire voice input through at least one microphone and acquire gaze information of the user through at least one camera while an assistant application is running. The one or more programs may include instructions that, when executed individually or collectively by the at least one processor of the wearable device, cause the wearable device to acquire, based on the voice input and the gaze information, criterion information for identifying a dataset to be used to generate a prompt. The one or more programs may include instructions that, when executed individually or collectively by the at least one processor of the wearable device, cause the wearable device to identify at least one dataset among a plurality of datasets stored in a memory based on the criterion information. The one or more programs may include instructions that, when executed individually or collectively by the at least one processor of the wearable device, cause the wearable device to generate the prompt based on the at least one dataset. The one or more programs may include instructions that, when executed individually or collectively by the at least one processor of the wearable device, cause the wearable device to display, through a display, a response to the voice input obtained based on the prompt.

[0007] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.

[0008] FIG. 1 is a block diagram of an electronic device in a network environment.

[0009] FIG. 2a illustrates an example of a perspective view of a wearable device. FIG. 2b illustrates an example of one or more hardware components arranged within the wearable device.

[0010] FIGS. 3a and 3b illustrate an example of the appearance of a wearable device.

[0011] FIG. 4 illustrates the components of a wearable device.

[0012] FIGS. 5a and 5b illustrate a system for transmitting application data to an artificial intelligence core.

[0013] FIG. 6 is a flowchart illustrating the operations of a wearable device for acquiring reference information and intention information.

[0014] FIG. 7 illustrates a situation for explaining the operations of a wearable device for acquiring reference information and intention information.

[0015] FIG. 8 is a flowchart showing the operations of a wearable device for generating a prompt.

[0016] FIG. 9a illustrates a storage in which datasets are stored.

[0017] FIG. 9b illustrates a coordinate system for determining distances between reference information and datasets.

[0018] FIG. 10 is a flowchart illustrating the operations of a wearable device for displaying a response to voice input.

[0019] FIG. 11 illustrates a screen displaying a response to voice input.

[0020] FIG. 12 is a flowchart illustrating the operations of a wearable device for displaying a response to voice input.

[0021] FIGS. 13a and 13b illustrate screens displaying a response to voice input.

[0022] The terms used in this disclosure are used merely to describe specific embodiments and are not intended to limit the scope of other embodiments. A singular expression may include a plural expression unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as generally understood by a person of ordinary skill in the art to which this disclosure pertains. Terms used in this disclosure that are defined in a general dictionary may be interpreted as having the same or similar meaning as they have in the context of the related art, and are not to be interpreted in an ideal or overly formal sense unless explicitly defined in this disclosure. In some cases, even terms defined in this disclosure are not to be interpreted to exclude the embodiments of this disclosure.

[0023] In the various embodiments of the present disclosure described below, a hardware-based approach is described as an example. However, since the various embodiments of the present disclosure include techniques using both hardware and software, the various embodiments of the present disclosure do not exclude a software-based approach.

[0024] Additionally, in this disclosure, expressions of "greater than" or "less than" may be used to determine whether a specific condition is satisfied or fulfilled; however, this is merely an example and does not exclude descriptions of "greater than or equal to" or "less than or equal to." Conditions described as "greater than or equal to" may be replaced with "greater than," conditions described as "less than or equal to" may be replaced with "less than," and conditions described as "greater than or equal to and less than" may be replaced with "greater than and less than or equal to." Furthermore, "A" to "B" below refers to at least one of the elements from A (including A) to B (including B). Below, "C" and / or "D" refers to including at least one of "C" or "D," i.e., {"C", "D", "C" and "D"}.
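
The two notational conventions at the end of the paragraph above can be made concrete with a small Python sketch. The helper names `a_to_b` and `and_or` are hypothetical and exist only for illustration; integer elements are assumed for the range.

```python
def a_to_b(a: int, b: int) -> list[int]:
    """Elements covered by "A to B": both endpoints are included."""
    return list(range(a, b + 1))


def and_or(c: str, d: str) -> list[set[str]]:
    """Combinations covered by "C and / or D": C alone, D alone, or both."""
    return [{c}, {d}, {c, d}]


elements = a_to_b(3, 5)          # 3, 4, and 5 are all candidates
options = and_or("C", "D")       # three possible combinations
```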

[0025] FIG. 1 is a block diagram of an electronic device in a network environment.

[0026] Referring to FIG. 1, in a network environment (100), an electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through a server (108). According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).

[0027] The processor (120) can, for example, execute software (e.g., a program (140)) to control at least one other component (e.g., a hardware or software component) of the electronic device (101) connected to the processor (120), and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., a sensor module (176) or a communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting data in non-volatile memory (134). According to one embodiment, the processor (120) may include a main processor (121) (e.g., a central processing unit or an application processor) or an auxiliary processor (123) (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor (121). For example, if the electronic device (101) includes a main processor (121) and an auxiliary processor (123), the auxiliary processor (123) may be configured to use lower power than the main processor (121) or to be specialized for a designated function. The auxiliary processor (123) may be implemented separately from the main processor (121) or as part thereof.

[0028] The auxiliary processor (123) may control at least some of the functions or states associated with at least one component of the electronic device (101) (e.g., display module (160), sensor module (176), or communication module (190)) on behalf of the main processor (121) while the main processor (121) is in an inactive (e.g., sleep) state, or together with the main processor (121) while the main processor (121) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (123) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (180) or communication module (190)). According to one embodiment, the auxiliary processor (123) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (101) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (108)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. The artificial intelligence model may, additionally or alternatively, include a software structure other than the hardware structure.

[0029] The memory (130) can store various data used by at least one component of the electronic device (101) (e.g., processor (120) or sensor module (176)). The data may include, for example, software (e.g., program (140)) and input or output data for related commands. The memory (130) may include volatile memory (132) or non-volatile memory (134).

[0030] The program (140) may be stored as software in memory (130) and may include, for example, an operating system (142), middleware (144), or an application (146).

[0031] The input module (150) can receive commands or data to be used for a component of the electronic device (101) (e.g., processor (120)) from outside the electronic device (101) (e.g., user). The input module (150) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0032] The sound output module (155) can output a sound signal to the outside of the electronic device (101). The sound output module (155) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0033] The display module (160) can visually provide information to the outside (e.g., a user) of the electronic device (101). The display module (160) may include, for example, a display, a holographic device, or a projector and a control circuit for controlling said device. According to one embodiment, the display module (160) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by said touch.

[0034] The audio module (170) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (170) can acquire sound through the input module (150) or output sound through the sound output module (155) or an external electronic device (e.g., electronic device (102)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (101).

[0035] The sensor module (176) can detect the operating state of the electronic device (101) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (176) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0036] The interface (177) may support one or more specified protocols that can be used for the electronic device (101) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (102)). According to one embodiment, the interface (177) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0037] The connection terminal (178) may include a connector through which the electronic device (101) can be physically connected to an external electronic device (e.g., electronic device (102)). According to one embodiment, the connection terminal (178) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0038] The haptic module (179) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that can be perceived by the user through tactile or kinesthetic senses. According to one embodiment, the haptic module (179) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0039] The camera module (180) can capture still images and video. According to one embodiment, the camera module (180) may include one or more lenses, image sensors, image signal processors, or flashes.

[0040] The power management module (188) can manage power supplied to the electronic device (101). According to one embodiment, the power management module (188) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0041] The battery (189) can supply power to at least one component of the electronic device (101). According to one embodiment, the battery (189) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0042] The communication module (190) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (101) and an external electronic device (e.g., electronic device (102), electronic device (104), or server (108)), and the performance of communication through the established communication channel. The communication module (190) may include one or more communication processors that operate independently of the processor (120) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (190) may include a wireless communication module (192) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (194) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (104) through a first network (198) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (199) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (192) can identify or authenticate the electronic device (101) within a communication network such as the first network (198) or the second network (199) using subscriber information (e.g., International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module (196).

[0043] The wireless communication module (192) can support 5G networks following 4G networks and next-generation communication technologies, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (192) can support a high-frequency band (e.g., mmWave band) to achieve a high data transmission rate, for example. The wireless communication module (192) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large-scale antenna. The wireless communication module (192) can support various requirements specified in the electronic device (101), external electronic device (e.g., electronic device (104)), or network system (e.g., second network (199)). According to one embodiment, the wireless communication module (192) may support a peak data rate (e.g., 20 Gbps or more) for eMBB realization, loss coverage (e.g., 164 dB or less) for mMTC realization, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for URLLC realization.

[0044] An antenna module (197) can transmit a signal or power to the outside (e.g., an external electronic device) or receive a signal or power from the outside. According to one embodiment, the antenna module (197) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (197) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as a first network (198) or a second network (199), may be selected from the plurality of antennas, for example, by a communication module (190). A signal or power may be transmitted or received between the communication module (190) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (197).

[0045] According to various embodiments, the antenna module (197) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0046] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0047] According to one embodiment, commands or data may be transmitted or received between the electronic device (101) and an external electronic device (104) through a server (108) connected to a second network (199). Each of the external electronic devices (102 or 104) may be the same type of device as the electronic device (101) or a different type. According to one embodiment, all or part of the operations performed on the electronic device (101) may be performed on one or more of the external electronic devices (102, 104, or 108). For example, if the electronic device (101) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (101) may, instead of or in addition to performing the function or service itself, request one or more external electronic devices to perform at least part of the function or service. The one or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (101). The electronic device (101) may provide the result, as is or after additional processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (101) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (104) may include an Internet of Things (IoT) device. The server (108) may be an intelligent server using machine learning and / or neural networks. According to one embodiment, the external electronic device (104) or the server (108) may be included within the second network (199). The electronic device (101) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
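
The request-and-return pattern described in this paragraph can be sketched roughly as follows. This is a minimal sketch under assumptions: `run_function`, its `local`, `remote`, and `post_process` parameters, and the string-based task are all hypothetical names chosen for illustration, and no actual network transport is modeled.

```python
from typing import Callable, Optional


def run_function(
    task: str,
    local: Callable[[str], str],
    remote: Optional[Callable[[str], str]] = None,
    post_process: Optional[Callable[[str], str]] = None,
) -> str:
    """Run a task locally, or hand it to an external device when one is available.

    `remote` stands in for a request to an external electronic device or server;
    `post_process` stands in for the optional additional processing of the result.
    """
    result = remote(task) if remote is not None else local(task)
    return post_process(result) if post_process is not None else result


# Hypothetical handlers used only to show the three paths.
on_device = lambda task: f"local:{task}"
on_server = lambda task: f"remote:{task}"

as_is = run_function("translate", local=on_device, remote=on_server)
processed = run_function("translate", local=on_device, remote=on_server, post_process=str.upper)
fallback = run_function("translate", local=on_device)
```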

[0048] In embodiments of the present disclosure, an electronic device (e.g., the electronic device (101) of FIG. 1) may be a wearable device (101). The wearable device (101) may include a head-mounted display (HMD) that is wearable on a user's head. The wearable device (101) may be referred to as a head-mounted device (HMD), a headgear electronic device, a glasses-type electronic device, a video see-through (VST) or visible see-through (VST) device, an extended reality (XR) device, a virtual reality (VR) device, and / or an augmented reality (AR) device. Although the external appearance of the wearable device (101) having the form of glasses is illustrated, embodiments of the present disclosure are not limited thereto. An example of a hardware configuration included within the wearable device (101) is described with reference to FIG. 4. An example of the structure of a wearable device (101) that can be worn on a user's head is described with reference to FIG. 2a, FIG. 2b, FIG. 3a, and / or FIG. 3b.

[0049] FIG. 2a illustrates an example of a perspective view of a wearable device. FIG. 2b illustrates an example of one or more hardware components disposed within the wearable device. According to one embodiment, the wearable device (101) may have the form of glasses that are wearable on a part of a user's body (e.g., head). The wearable device (101) of FIG. 2a and FIG. 2b may be an example of the electronic device (101) of FIG. 1. The wearable device (101) may include a head-mounted display (HMD). For example, the housing of the wearable device (101) may include a flexible material such as rubber and / or silicone that has a shape that adheres to a part of the user's head (e.g., a part of the face covering both eyes). For example, the housing of the wearable device (101) may include one or more straps that can be twined around the user's head, and / or one or more temples that can be attached to the ears of the head.

[0050] Referring to FIG. 2a, a wearable device (101) according to one embodiment may include at least one display (250) and a frame (200) supporting at least one display (250).

[0051] According to one embodiment, a wearable device (101) may be worn on a part of a user's body. The wearable device (101) may provide augmented reality (AR), virtual reality (VR), or mixed reality (MR) that combines augmented reality and virtual reality to a user wearing the wearable device (101). For example, the wearable device (101) may display a virtual reality image provided by at least one optical device (282, 284) of FIG. 2b on at least one display (250) in response to a specified gesture of the user obtained through the motion recognition camera (260-2, 260-3) of FIG. 2b.

[0052] According to one embodiment, at least one display (250) can provide visual information to a user. For example, at least one display (250) may include a transparent or translucent lens. At least one display (250) may include a first display (250-1) and / or a second display (250-2) spaced apart from the first display (250-1). For example, the first display (250-1) and the second display (250-2) may be positioned at locations corresponding to the user's left eye and right eye, respectively.

[0053] Referring to FIG. 2b, at least one display (250) may provide visual information transmitted from external light to a user through a lens included in at least one display (250) and other visual information distinct from said visual information. The lens may be formed based on at least one of a Fresnel lens, a pancake lens, or a multi-channel lens. For example, at least one display (250) may include a first surface (231) and a second surface (232) opposite to the first surface (231). A display area may be formed on the second surface (232) of at least one display (250). When a user wears the wearable device (101), external light may be transmitted to the user by being incident on the first surface (231) and transmitted through the second surface (232). As another example, at least one display (250) can display an augmented reality image combined with a virtual reality image provided by at least one optical device (282, 284) on a real image transmitted through external light in a display area formed on the second surface (232).

[0054] In one embodiment, at least one display (250) may include at least one waveguide (233, 234) that diffracts light emitted from at least one optical device (282, 284) and transmits it to a user. At least one waveguide (233, 234) may be formed based on at least one of glass, plastic, or polymer. A nano pattern may be formed on the exterior or at least a portion of the interior of at least one waveguide (233, 234). The nano pattern may be formed based on a polygonal and / or curved grating structure. Light incident on one end of at least one waveguide (233, 234) may be propagated to the other end of at least one waveguide (233, 234) by the nano pattern. At least one waveguide (233, 234) may include at least one diffractive element (e.g., DOE (diffractive optical element), HOE (holographic optical element)) and at least one reflective element (e.g., a reflective mirror). For example, at least one waveguide (233, 234) may be placed within a wearable device (101) to guide a screen displayed by at least one display (250) to the user's eye. For example, the screen may be transmitted to the user's eye based on total internal reflection (TIR) occurring within at least one waveguide (233, 234).
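As an illustration of the total internal reflection condition mentioned for the waveguide, the following minimal sketch computes the critical angle from Snell's law. The function names and the refractive index of 1.5 used in the usage note are assumptions chosen for illustration and are not taken from the disclosure.

```python
import math


def critical_angle_deg(n_waveguide: float, n_outside: float = 1.0) -> float:
    """Critical angle (degrees, measured from the surface normal) for TIR.

    Follows from Snell's law: sin(theta_c) = n_outside / n_waveguide.
    TIR is only possible when the waveguide is the optically denser medium.
    """
    if n_waveguide <= n_outside:
        raise ValueError("TIR requires n_waveguide > n_outside")
    return math.degrees(math.asin(n_outside / n_waveguide))


def is_totally_reflected(incidence_deg: float,
                         n_waveguide: float,
                         n_outside: float = 1.0) -> bool:
    """True if a ray hitting the waveguide wall at incidence_deg stays inside."""
    return incidence_deg > critical_angle_deg(n_waveguide, n_outside)
```

For a hypothetical waveguide material with refractive index 1.5 surrounded by air, `critical_angle_deg(1.5)` is roughly 41.8 degrees, so a ray striking the wall at 50 degrees would remain guided while a ray at 30 degrees would partly escape.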

[0055] A wearable device (101) can analyze an object included in a real-world image collected through a camera (260-4), combine a virtual object corresponding to an object among the analyzed objects that is the target of augmented reality provision, and display it on at least one display (250). The virtual object may include at least one of text and an image regarding various information related to the object included in the real-world image. The wearable device (101) can analyze the object based on a multi-camera such as a stereo camera. For the object analysis, the wearable device (101) can perform spatial recognition (e.g., SLAM (simultaneous localization and mapping)) using a multi-camera and / or time-of-flight (ToF). A user wearing the wearable device (101) can view the image displayed on at least one display (250).

[0056] According to one embodiment, the frame (200) may be formed as a physical structure that allows the wearable device (101) to be worn on the user's body. According to one embodiment, the frame (200) may be configured so that when the user wears the wearable device (101), the first display (250-1) and the second display (250-2) can be positioned corresponding to the user's left and right eyes. The frame (200) may support at least one display (250). For example, the frame (200) may support the first display (250-1) and the second display (250-2) so that they are positioned corresponding to the user's left and right eyes.

[0057] Referring to FIG. 2a, the frame (200) may include an area (220) in which at least a portion of the frame contacts a part of the user's body when the user wears the wearable device (101). For example, the area (220) of the frame (200) in contact with a part of the user's body may include an area in contact with a part of the user's nose, a part of the user's ear, and a part of the side of the user's face that the wearable device (101) contacts. According to one embodiment, the frame (200) may include a nose pad (210) that contacts a part of the user's body. When the wearable device (101) is worn by the user, the nose pad (210) may contact a part of the user's nose. The frame (200) may include a first temple (204) and a second temple (205) that contact a different part of the user's body distinct from the part of the user's body.

[0058] For example, the frame (200) may include a first rim (201) covering at least a portion of a first display (250-1), a second rim (202) covering at least a portion of a second display (250-2), a bridge (203) positioned between the first rim (201) and the second rim (202), a first pad (211) positioned along a portion of the edge of the first rim (201) from one end of the bridge (203), a second pad (212) positioned along a portion of the edge of the second rim (202) from the other end of the bridge (203), a first temple (204) extending from the first rim (201) and fixed to a portion of the wearer's ear, and a second temple (205) extending from the second rim (202) and fixed to a portion of the ear opposite to the first. The first pad (211) and the second pad (212) may come into contact with a part of the user's nose, and the first temple (204) and the second temple (205) may come into contact with a part of the user's face and a part of the ear. The temples (204, 205) may be rotatably connected to the rim through the hinge units (206, 207) of FIG. 2B. The first temple (204) may be rotatably connected to the first rim (201) through a first hinge unit (206) positioned between the first rim (201) and the first temple (204). The second temple (205) may be rotatably connected to the second rim (202) through a second hinge unit (207) positioned between the second rim (202) and the second temple (205). According to one embodiment, a wearable device (101) can identify an external object touching the frame (200) (e.g., a user's fingertip) and / or a gesture performed by said external object by using a touch sensor, a grip sensor, and / or a proximity sensor formed on at least a portion of the surface of the frame (200).

[0059] According to one embodiment, the wearable device (101) may include hardware that performs various functions (e.g., hardware to be described later based on the block diagram of FIG. 4). For example, the hardware may include a battery module (270), an antenna module (275), at least one optical device (282, 284), speakers (e.g., speakers (255-1, 255-2)), a microphone (e.g., microphones (265-1, 265-2, 265-3)), a light-emitting module (not shown), and / or a printed circuit board (PCB) (290). The various hardware may be placed within the frame (200).

[0060] According to one embodiment, a microphone (e.g., microphones (265-1, 265-2, 265-3)) of a wearable device (101) is positioned on at least a portion of a frame (200) to acquire a sound signal. A first microphone (265-1) positioned on a bridge (203), a second microphone (265-2) positioned on a second rim (202), and a third microphone (265-3) positioned on a first rim (201) are shown in FIG. 2b, but the number and position of the microphones (265) are not limited to the embodiment of FIG. 2b. If there are two or more microphones (265) included in the wearable device (101), the wearable device (101) can identify the direction of the sound signal by using a plurality of microphones positioned on different portions of the frame (200).
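One common way to identify the direction of a sound signal with two microphones is to estimate the time difference of arrival and convert it to an angle. The sketch below illustrates that general technique under stated assumptions: a far-field source, a speed of sound of 343 m/s, and hypothetical function names, sample rate, and microphone spacing. The disclosure does not specify which method the wearable device (101) uses.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Lag (in samples) by which sig_b trails sig_a.

    Brute-force cross-correlation over integer lags in [-max_lag, max_lag];
    the lag with the highest correlation score is returned.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Angle of arrival relative to the broadside of the microphone pair.

    Far-field model: sin(angle) = c * delay / spacing. The ratio is clamped
    to [-1, 1] so measurement noise cannot push asin out of its domain.
    """
    delay_s = delay_samples / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A positive angle here means the sound reached the first microphone earlier than the second. With three microphones, as in FIG. 2b, the same pairwise estimate could be repeated for each pair to resolve the direction in more than one plane.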

[0061] According to one embodiment, at least one optical device (282, 284) may project a virtual object onto at least one display (250) to provide various image information to a user. For example, at least one optical device (282, 284) may be a projector. At least one optical device (282, 284) may be disposed adjacent to at least one display (250) or included within at least one display (250) as part of at least one display (250). According to one embodiment, a wearable device (101) may include a first optical device (282) corresponding to a first display (250-1) and a second optical device (284) corresponding to a second display (250-2). For example, at least one optical device (282, 284) may include a first optical device (282) positioned at the edge of a first display (250-1) and a second optical device (284) positioned at the edge of a second display (250-2). The first optical device (282) may transmit light to a first waveguide (233) positioned on the first display (250-1), and the second optical device (284) may transmit light to a second waveguide (234) positioned on the second display (250-2).

[0062] In one embodiment, the camera (260) may include a shooting camera (260-4), an eye tracking camera (ET CAM) (260-1), and / or a motion recognition camera (260-2, 260-3). The shooting camera (260-4), the eye tracking camera (260-1), and the motion recognition camera (260-2, 260-3) may be positioned at different locations on the frame (200) and may perform different functions. The eye tracking camera (260-1) may output data indicating the position of the eyes or the gaze of a user wearing the wearable device (101). For example, the wearable device (101) may detect the gaze from an image containing the user's pupils obtained through the eye tracking camera (260-1). The wearable device (101) can identify an object on which the user is focused (e.g., a real object and / or a virtual object) by using the user's gaze obtained through the eye tracking camera (260-1). The wearable device (101), having identified the focused object, can perform a function (e.g., gaze interaction) for interaction between the user and the focused object. The wearable device (101) can represent a portion corresponding to the eyes of an avatar representing the user in a virtual space by using the user's gaze obtained through the eye tracking camera (260-1). The wearable device (101) can render an image (or screen) displayed on at least one display (250) based on the position of the user's eyes. For example, the visual quality (e.g., resolution, brightness, saturation, grayscale, PPI (pixels per inch)) of a first area related to the gaze within the image and the visual quality of a second area distinguished from the first area may differ from each other. In the present disclosure, the term “resolution” is used to refer to the density of pixels of an image and / or display. The density and / or resolution of the pixels may be measured based on units of PPI and / or dpi (dots per inch) or may be parameterized. Using foveated rendering, the wearable device (101) may acquire an image in which the visual quality of the first area and the visual quality of the second area match the user's gaze. For example, if the wearable device (101) supports an iris recognition function, user authentication may be performed based on iris information acquired using the eye tracking camera (260-1). An example in which the eye tracking camera (260-1) is positioned toward the user's right eye is illustrated in FIG. 2b, but the embodiment is not limited thereto, and the eye tracking camera (260-1) may be positioned solely toward the user's left eye or toward both eyes.
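The gaze-dependent difference in visual quality between a first area and a second area can be sketched as follows. This is a simplified illustration only: the circular first area, the radius in pixels, and the 0.5 resolution scale for the second area are hypothetical parameters, and an actual foveated renderer would typically use more than two quality levels.

```python
import math


def area_for_pixel(pixel_xy, gaze_xy, first_area_radius_px):
    """Classify a pixel as belonging to the gaze-related 'first' area
    or to the remaining 'second' area, using distance to the gaze point."""
    distance = math.dist(pixel_xy, gaze_xy)
    return "first" if distance <= first_area_radius_px else "second"


def resolution_scale(pixel_xy, gaze_xy, first_area_radius_px,
                     second_area_scale=0.5):
    """Rendering resolution scale for a pixel.

    Pixels in the first area are rendered at full resolution (1.0);
    pixels in the second area use the reduced, assumed scale.
    """
    if area_for_pixel(pixel_xy, gaze_xy, first_area_radius_px) == "first":
        return 1.0
    return second_area_scale
```

When the eye tracking camera reports a new gaze point, only `gaze_xy` changes, so the high-quality area follows the user's eyes without altering the rest of the rendering logic.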

[0063] In one embodiment, the camera (260-4) can capture a real image or background to be matched with a virtual image in order to implement augmented reality or mixed reality content. The camera (260-4) can be used to acquire high-resolution images based on HR (high resolution) or PV (photo video). The camera (260-4) can capture an image of a specific object located at the position viewed by the user and provide the image to at least one display (250). The at least one display (250) can display a single image in which information regarding a real image or background including the image of the specific object acquired using the camera (260-4) and a virtual image provided through at least one optical device (282, 284) are superimposed. The wearable device (101) can compensate for depth information (e.g., the distance between the wearable device (101) and an external object acquired through a depth sensor) using the image acquired through the camera (260-4). The wearable device (101) can perform object recognition through an image acquired using a shooting camera (260-4). The wearable device (101) can perform a function of focusing on an object (or subject) in an image (e.g., auto focus) and / or an optical image stabilization (OIS) function (e.g., anti-shake function) using the shooting camera (260-4). The wearable device (101) can perform a pass-through function to superimpose an image acquired through the shooting camera (260-4) onto at least a portion of a screen representing a virtual space while displaying the screen representing a virtual space on at least one display (250). In one embodiment, the shooting camera (260-4) may be placed on a bridge (203) positioned between a first rim (201) and a second rim (202).

[0064] The eye tracking camera (260-1) can achieve more realistic augmented reality by tracking the gaze of a user wearing the wearable device (101), thereby matching the user's gaze with visual information provided to at least one display (250). For example, when the user looks straight ahead, the wearable device (101) can naturally display environmental information related to the user's front on at least one display (250) at the location where the user is situated. The eye tracking camera (260-1) may be configured to capture an image of the user's pupil to determine the user's gaze. For example, the eye tracking camera (260-1) may receive a gaze detection light reflected from the user's pupil and track the user's gaze based on the position and movement of the received gaze detection light. In one embodiment, the eye tracking camera (260-1) may be positioned at locations corresponding to the user's left and right eyes. For example, the eye-tracking camera (260-1) may be positioned within the first rim (201) and / or the second rim (202) to face the direction in which the user wearing the wearable device (101) is located.

[0065] A motion recognition camera (260-2, 260-3) can provide a specific event to a screen provided on at least one display (250) by recognizing the movement of the user's entire body or part thereof, such as the user's torso, hands, or face. A motion recognition camera (260-2, 260-3) can recognize the user's gesture, acquire a signal corresponding to the gesture, and provide a display corresponding to the signal to at least one display (250). A processor can identify the signal corresponding to the gesture and, based on the identification, perform a designated function. A motion recognition camera (260-2, 260-3) can be used to perform spatial recognition functions using SLAM and / or depth maps for a 6-degrees-of-freedom pose (6 dof pose). A processor can use the motion recognition camera (260-2, 260-3) to perform gesture recognition functions and / or object tracking functions. In one embodiment, a motion recognition camera (260-2, 260-3) may be placed on the first rim (201) and / or the second rim (202).

[0066] The camera (260) included in the wearable device (101) is not limited to the eye-tracking camera (260-1) and motion recognition camera (260-2, 260-3) described above. For example, the wearable device (101) can identify external objects included within the field of view (FoV) by using a camera positioned toward the user's field of view (FoV). The identification of external objects by the wearable device (101) can be performed based on a sensor for identifying the distance between the wearable device (101) and the external object, such as a depth sensor and / or a time of flight (ToF) sensor. The camera (260) positioned toward the FoV can support an autofocus (AF) function and / or an optical image stabilization (OIS) function. For example, the wearable device (101) may include a camera (260) (e.g., a face tracking camera) positioned toward the face to acquire an image including the face of a user wearing the wearable device (101).

[0067] Although not illustrated, according to one embodiment, the wearable device (101) may further include a light source (e.g., LED) that emits light toward a subject (e.g., user's eye, face, and / or an object outside the FoV) being photographed using a camera (260). The light source may include an LED of infrared wavelength. The light source may be placed in at least one of the frame (200) and hinge units (206, 207).

[0068] According to one embodiment, the battery module (270) can supply power to the electronic components of the wearable device (101). In one embodiment, the battery module (270) may be placed within the first temple (204) and / or the second temple (205). For example, the battery module (270) may be a plurality of battery modules (270). The plurality of battery modules (270) may each be placed in the first temple (204) and the second temple (205). In one embodiment, the battery module (270) may be placed at the end of the first temple (204) and / or the second temple (205).

[0069] The antenna module (275) can transmit a signal or power to the outside of the wearable device (101) or receive a signal or power from the outside. In one embodiment, the antenna module (275) may be placed within the first temple (204) and / or the second temple (205). For example, the antenna module (275) may be placed close to one side of the first temple (204) and / or the second temple (205).

[0070] The speaker (255) can output an acoustic signal to the outside of the wearable device (101). The acoustic output module may be referred to as the speaker. In one embodiment, the speaker (255) may be placed within a first temple (204) and / or a second temple (205) to be positioned adjacent to the ear of a user wearing the wearable device (101). For example, the speaker (255) may include a second speaker (255-2) positioned adjacent to the user's left ear by being placed within the first temple (204), and a first speaker (255-1) positioned adjacent to the user's right ear by being placed within the second temple (205).

[0071] A light-emitting module (not shown) may include at least one light-emitting element. The light-emitting module may emit light of a color corresponding to a specific state or emit light with an action corresponding to a specific state in order to visually provide information regarding a specific state of the wearable device (101) to the user. For example, if the wearable device (101) requires charging, it may emit red light at a constant frequency. In one embodiment, the light-emitting module may be placed on the first rim (201) and / or the second rim (202).

[0072] Referring to FIG. 2b, a wearable device (101) according to one embodiment may include a printed circuit board (PCB) (290). The PCB (290) may be included in at least one of a first temple (204) or a second temple (205). The PCB (290) may include an interposer disposed between at least two sub-PCBs. One or more hardware components included in the wearable device (101) (e.g., hardware components illustrated by different blocks in FIG. 4) may be disposed on the PCB (290). The wearable device (101) may include a flexible PCB (FPCB) for interconnecting the hardware components.

[0073] According to one embodiment, a wearable device (101) may include at least one of a gyroscope sensor, a gravity sensor, and / or an acceleration sensor for detecting the posture of the wearable device (101) and / or the posture of a body part (e.g., head) of a user wearing the wearable device (101). Each of the gravity sensor and the acceleration sensor may measure gravitational acceleration and / or acceleration based on designated three-dimensional axes (e.g., x-axis, y-axis, and z-axis) that are perpendicular to each other. The gyroscope sensor may measure the angular velocity of each of the designated three-dimensional axes (e.g., x-axis, y-axis, and z-axis). At least one of the gravity sensor, the acceleration sensor, and the gyroscope sensor may be referred to as an inertial measurement unit (IMU). According to one embodiment, the wearable device (101) can identify a user's motion and / or gesture performed to execute or stop a specific function of the wearable device (101) based on an IMU.
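How posture might be derived from such sensor readings can be sketched as below. The axis convention (z pointing up through the device when level), the function names, and the simple one-step integration are assumptions for illustration; the disclosure does not describe a specific posture algorithm.

```python
import math


def tilt_from_gravity(ax, ay, az):
    """Pitch and roll (degrees) from a gravity / acceleration reading.

    Assumes the device is not otherwise accelerating, so the measured
    vector is gravity expressed on the device's x, y, and z axes.
    """
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    return pitch, roll


def integrate_yaw(yaw_deg, gyro_z_deg_per_s, dt_s):
    """Advance the yaw angle by one gyroscope sample.

    Angular velocity around the z-axis multiplied by the sample interval
    gives the rotation accumulated during that interval.
    """
    return yaw_deg + gyro_z_deg_per_s * dt_s
```

A head gesture such as a nod could then be recognized, for example, by watching the pitch value cross a threshold and return within a short time window, although the thresholds would need to be tuned per device.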

[0074] FIGS. 3A and 3B illustrate an example of the appearance of a wearable device. The wearable device (101) of FIGS. 3A and 3B may be an example of the electronic device (101) of FIG. 1, or the wearable device (101) of FIGS. 2A and 2B. According to one embodiment, an example of the appearance of a first surface (310) of the housing of the wearable device (101) may be illustrated in FIG. 3A, and an example of the appearance of a second surface (320) opposite to the first surface (310) may be illustrated in FIG. 3B.

[0075] Referring to FIG. 3a, according to one embodiment, a first surface (310) of a wearable device (101) may have a shape that is attachable to a part of a user's body (e.g., the user's face). Although not illustrated, the wearable device (101) may further include a strap for fixing to a part of a user's body and / or one or more temples (e.g., a first temple (204) and / or a second temple (205) of FIG. 2a and FIG. 2b). A first display (250-1) for outputting an image to the left eye among the user's two eyes, and a second display (250-2) for outputting an image to the right eye among the two eyes may be disposed on the first surface (310). The wearable device (101) may further include rubber or silicone packing formed on the first surface (310) to prevent interference by light different from light emitted from the first display (250-1) and the second display (250-2) (e.g., ambient light).

[0076] According to one embodiment, a wearable device (101) may include cameras (260-1), adjacent to each of the first display (250-1) and the second display (250-2), for photographing and / or tracking both eyes of a user. The cameras (260-1) may correspond to the eye-tracking camera (260-1) of FIG. 2B. According to one embodiment, a wearable device (101) may include cameras (260-5, 260-6) for photographing and / or recognizing a user's face. The cameras (260-5, 260-6) may be referred to as face tracking (FT) cameras. The wearable device (101) may control an avatar representing the user in a virtual space based on the motion of the user's face identified using the cameras (260-5, 260-6). For example, the wearable device (101) can change the texture and / or shape of a part of an avatar (e.g., a part of an avatar representing a human face) by using information that is obtained by the cameras (260-5, 260-6) (e.g., FT cameras) and that represents the facial expression of a user wearing the wearable device (101).

[0077] Referring to FIG. 3b, on a second surface (320) opposite to the first surface (310) of FIG. 3a, a camera (e.g., cameras (260-7, 260-8, 260-9, 260-10, 260-11, 260-12)) and / or a sensor (e.g., a depth sensor (330)) may be placed to acquire information related to the external environment of the wearable device (101). For example, cameras (260-7, 260-8, 260-9, 260-10) may be placed on the second surface (320) to recognize external objects. The cameras (260-7, 260-8, 260-9, 260-10) may correspond to the motion recognition cameras (260-2, 260-3) of FIG. 2b.

[0078] For example, using cameras (260-11, 260-12), the wearable device (101) can acquire images and / or videos to be transmitted to each of the user's two eyes. Camera (260-11) may be placed on the second surface (320) of the wearable device (101) to acquire an image to be displayed through a second display (250-2) corresponding to the right eye among the two eyes. Camera (260-12) may be placed on the second surface (320) of the wearable device (101) to acquire an image to be displayed through a first display (250-1) corresponding to the left eye among the two eyes. Cameras (260-11, 260-12) may correspond to the shooting camera (260-4) of FIG. 2B.

[0079] According to one embodiment, a wearable device (101) may include a depth sensor (330) disposed on a second surface (320) to identify the distance between the wearable device (101) and an external object. Using the depth sensor (330), the wearable device (101) may obtain spatial information (e.g., a depth map) for at least a portion of the FoV of a user wearing the wearable device (101). Although not illustrated, a microphone may be disposed on the second surface (320) of the wearable device (101) to obtain sound output from an external object. The number of microphones may be one or more, depending on the embodiment.

[0080] Hereinafter, with reference to FIG. 4, the hardware or software configuration of the wearable device (101) will be described.

[0081] FIG. 4 illustrates the components of a wearable device.

[0082] In FIG. 4, the wearable device (101) may include a processor (401), memory (402), communication circuit (403), display (404), microphone (405), sensor (406), camera (407), and artificial intelligence module (408). For example, the processor (401), memory (402), communication circuit (403), display (404), microphone (405), sensor (406), camera (407), and artificial intelligence module (408) may be electrically and / or operably coupled with each other by a communication bus. The hardware components being operably coupled may mean that a direct or indirect connection between the hardware components is established via wired or wireless means so that a second hardware component (e.g., memory (402), communication circuit (403), display (404), microphone (405), sensor (406), camera (407), and / or artificial intelligence module (408)) is controlled by a first hardware component (e.g., processor (401)) among the hardware components. The artificial intelligence module (408) illustrated in FIG. 4 is illustrated as a hardware component, but the present disclosure is not limited thereto. For example, the artificial intelligence module (408) may correspond to a software component. The hardware components illustrated in FIG. 4 are illustrated based on different blocks, but the present disclosure is not limited thereto. For example, at least a portion of the hardware components illustrated in FIG. 4 (e.g., processor (401), memory (402), communication circuit (403), display (404), microphone (405), sensor (406), camera (407), and / or artificial intelligence module (408)) may be included in a single integrated circuit such as a system on chip (SoC) or a system in package (SiP). The type and number of hardware components included in the wearable device (101) are not limited to those shown in FIG. 4. For example, the wearable device (101) may include only some of the hardware components shown in FIG. 4.

[0083] In one embodiment, the wearable device (101) may include a processor (401). The processor (401) may include a hardware component for processing data based on one or more instructions. The hardware component for processing data may include, for example, an arithmetic and logic unit (ALU), a floating point unit (FPU), and a field programmable gate array (FPGA). As an example, the hardware component for processing data may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microcontroller unit (MCU), and / or a neural processing unit (NPU). The number of processors (401) may be one or more. For example, the processor (401) may have the structure of a multi-core processor, such as a dual core, a quad core, or a hexa core. The description of the processor (120) of FIG. 1 may apply substantially equally to the processor (401) of FIG. 4.

[0084] In one embodiment, the processor (401) may include various processing circuits and / or a number of processors. For example, the term “processor” as used herein, including in the claims, may include various processing circuits including at least one processor, and one or more of the at least one processor may be configured to perform the various functions described below in a distributed manner, individually and / or collectively. As used below, where “processor,” “at least one processor,” and “one or more processors” are described as being configured to perform various functions, these terms encompass, for example, but not limited to, situations where one processor performs some of the cited functions and other processor(s) perform other parts of the cited functions, and also situations where one processor can perform all of the cited functions. Additionally, the at least one processor may include a combination of processors that perform the enumerated / disclosed various functions, for example, in a distributed manner. The at least one processor may execute program instructions to achieve or perform the various functions.

[0085] In one embodiment, the wearable device (101) may include a memory (402). The memory (402) may include a hardware component for storing data and / or instructions that are input to or output from the processor (401). For example, the memory (402) may include volatile memory such as random-access memory (RAM) and / or non-volatile memory such as read-only memory (ROM). The volatile memory may include, for example, at least one of dynamic RAM (DRAM), static RAM (SRAM), cache RAM, and pseudo SRAM (PSRAM). The non-volatile memory may include, for example, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, a hard disk, a compact disk, and an embedded multimedia card (eMMC).

[0086] In one embodiment, one or more instructions (or commands) representing calculations and / or operations to be performed by the processor (401) of the wearable device (101) may be stored within the memory (402) of the wearable device (101). A set of one or more instructions may be referred to as a program, firmware, operating system, process, routine, sub-routine, and / or application. Hereinafter, the statement that an application is installed within the wearable device (101) may mean that one or more instructions provided in the form of an application are stored within the memory (402), and that the one or more applications are stored in a format executable by the processor (401) of the wearable device (101). The specific details regarding the memory (402) of FIG. 4 may be substantially the same as the details regarding the memory (130) of FIG. 1.

[0087] In one embodiment, the wearable device (101) may include a communication circuit (403). The communication circuit (403) may include a circuit for supporting the transmission and / or reception of electrical signals between the wearable device (101) and an external device (e.g., a server (410)). The communication circuit (403) may include at least one of a modem, an antenna, and an O / E (optic / electronic) converter. The communication circuit (403) may support the transmission and / or reception of electrical signals based on various types of communication means such as Ethernet, Bluetooth, BLE (Bluetooth Low Energy), ZigBee, LTE (Long Term Evolution), and 5G NR (New Radio). The specific details regarding the communication circuit (403) of FIG. 4 may be substantially the same as those regarding the communication module (109) and / or antenna module (197) of FIG. 1.

[0088] In one embodiment, the wearable device (101) may include a display (404). The display (404) may include a display panel, a touch sensor, and / or a processing circuit. In one embodiment, the display panel may be used to display visual information (e.g., affordance, image, screen, object, UI (user interface), GUI (graphic user interface), and / or visual object). Specific details regarding the display (404) of FIG. 4 may be substantially the same as those regarding the display (250) of FIG. 2a through FIG. 3b.

[0089] In one embodiment, the wearable device (101) may include a microphone (405). The microphone (405) may be configured to acquire sound (e.g., a voice input or an utterance) from outside the wearable device (101). The specific details regarding the microphone (405) of FIG. 4 may be substantially the same as the details regarding the microphone (265) of FIG. 2a to FIG. 3b.

[0090] In one embodiment, the wearable device (101) may include a sensor (406). For example, the sensor (406) may generate electrical information that can be processed by a processor (401) and / or memory (402) from non-electronic information related to the wearable device (101). For example, the sensor (406) may include a global positioning system (GPS) sensor for detecting the geographic location of the wearable device (101). For example, the specific details regarding the sensor (406) of FIG. 4 may be substantially the same as the details regarding the sensor module (176) of FIG. 1.

[0091] In one embodiment, the wearable device (101) may include a camera (407). For example, the camera (407) may include a camera for outputting data indicating the position or gaze of the eye(s) of a user wearing the wearable device (101) (e.g., a gaze tracking camera (260-1)), a motion recognition camera (e.g., a motion recognition camera (260-2, 260-3)), and a camera for capturing an object around the wearable device (101) (e.g., a shooting camera (260-4)). For example, the wearable device (101) may identify an object focused by the user's gaze through the camera (407). The specific details regarding the camera (407) of FIG. 4 may be substantially the same as the details regarding the camera (260) of FIG. 2a through FIG. 3b.

[0092] In one embodiment, the wearable device (101) may include an artificial intelligence module (408). The artificial intelligence module (408) may be a unit (function code, separate device, circuit, or set of instructions) for performing functions. For example, the artificial intelligence module (408) may be a unit (function code, separate device, circuit, or set of instructions) for automatic speech recognition (ASR), machine translation (MT), large language model (LLM), large vision model (LVM), and / or large multimodal model (LMM). Hereinafter, the artificial intelligence module (408) may be referred to as an artificial intelligence model or other terms having an equivalent technical / functional meaning.

[0093] FIGS. 5a and 5b illustrate a system for transmitting application data to an artificial intelligence core.

[0094] Referring to FIG. 5a, the system may include application data (501), an AI SDK (artificial intelligence software development kit) (502), and an AI core (503). For example, the application data (501) may include a plurality of data sets stored in the memory (402) of the wearable device (101). Each of the plurality of data sets may include personal information of the user. The personal information may include information obtained while the wearable device (101) is used by the user (e.g., search history, card usage history, conversation history, images, videos, and / or application data). In a non-limiting example, the personal information may include analysis information on information obtained while the user is using the wearable device (101) (e.g., user preference analysis information). For example, the AI SDK (502) may have a function for collecting application data (501) and transmitting it to the AI core (503). The AI core (503) may correspond to the artificial intelligence module (408) of FIG. 4. For example, the AI core (503) may include automatic speech recognition (ASR), machine translation (MT), large language model (LLM), and / or large vision model (LVM). However, this is merely an example and the present disclosure is not limited thereto. For example, the AI core (503) may further include other artificial intelligence models (e.g., large multimodal model (LMM)).

[0095] For example, the AI core (503) may provide the AI SDK (502), via an API (application programming interface), with an instruction indicating a data set required by the AI core (503) from among the application data (501). The AI SDK (502) may acquire (or collect) the data set required by the AI core (503) from among the application data (501) according to the instruction. The AI SDK (502) may provide the acquired (or collected) data set to the AI core (503). The AI core (503) may generate a response using an artificial intelligence model based on the data set acquired from the AI SDK (502).
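The request-and-collect exchange of paragraph [0095] can be illustrated with a minimal sketch. The class names `AiSdk` and `AiCore`, the method names, and the representation of the instruction as a plain list of data set names are all assumptions made for illustration; the disclosure does not specify the API.

```python
class AiSdk:
    """Hypothetical stand-in for the AI SDK (502): collects requested data sets."""

    def __init__(self, application_data):
        self._application_data = application_data

    def collect(self, requested_names):
        # Return only the data sets the AI core asked for; unknown names are skipped.
        return {
            name: self._application_data[name]
            for name in requested_names
            if name in self._application_data
        }


class AiCore:
    """Hypothetical stand-in for the AI core (503)."""

    def __init__(self, sdk):
        self._sdk = sdk

    def build_context(self, requested_names):
        # The instruction of paragraph [0095] is modeled as a list of names (assumption).
        return self._sdk.collect(requested_names)


application_data = {
    "search_history": ["smartphone camera review"],
    "card_usage": ["coffee shop"],
    "conversation_history": ["lunch at noon"],
}
core = AiCore(AiSdk(application_data))
context = core.build_context(["search_history", "card_usage"])
```

In this sketch only the two requested data sets reach the AI core, which mirrors the limitation discussed next: data sets that were not requested cannot contribute to the response.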

[0096] As described above, the AI core (503) can obtain a data set required by the AI core (503) from the application data (501) through the AI SDK (502). However, since only the data set called by the AI core (503) is used, the personalization of the response generated by the AI core (503) may be limited. For the personalization of the response generated by the AI core (503), all of the data sets included in the application data (501) may be obtained. However, if all of the data sets are used, the size of the prompt (or the number of tokens) may increase. If the size of the prompt (or the number of tokens) increases, the amount of computation (or computational cost, computational load) for generating the response may increase.

[0097] In FIG. 5b, a system for solving the aforementioned problems is described. Referring to FIG. 5b, the system may further include a monitoring module (510), an intent estimation module (520), and a prompt generation module (530). The term 'module' as used in FIG. 5b refers to a unit implemented in hardware or software to perform a predefined function. For example, the term 'module' may be referred to as logic, logic block, component, circuit, or other terms having an equivalent technical or functional meaning.

[0098] For example, the system may include a monitoring module (510). The monitoring module (510) may acquire real-time information in response to the execution of an assistant application. For example, the real-time information may include voice input (or utterance), gaze information, location information of the wearable device (101) identified based on a global positioning system (GPS) sensor, image information of the area around the wearable device (101) acquired through a camera (407), information on the user's status (e.g., driving status, travel status, and / or work status) acquired through a sensor (406), and / or conversation information with another user. However, this is merely an example and the present disclosure is not limited thereto. The real-time information may further include other information that the wearable device (101) can collect (or acquire). For example, the monitoring module (510) may perform preprocessing (e.g., STT (speech to text)) on real-time information and then provide the information to the intent estimation module (520). In one example, the monitoring module (510) may be referred to as an intent assistant information sensor or another term having an equivalent technical meaning.

[0099] For example, the system may include an intent estimation module (520). The intent estimation module (520) may obtain criterion information and intent information based on real-time information. For example, an artificial intelligence model (e.g., LLM, LMM) may be used to obtain criterion information and intent information based on real-time information. For example, criterion information may be used to identify a data set to be used to generate a prompt among a plurality of data sets stored in memory (402). Criterion information may consist of a set of words or a sentence. In one example, each word may represent a category (or attribute) of an object identified by gaze information. In one example, a sentence may represent a category (or attribute) of an object identified by gaze information. In one example, criterion information may be referred to as reference information, intended data selection information, or other terms having an equivalent technical / functional meaning. For example, intent information may be used to define a method for generating a prompt. The intent estimation module (520) may provide the criterion information and the intent information to the prompt generation module (530).

[0100] For example, the system may include a prompt generation module (530). The prompt generation module (530) may be a unit that performs the function of changing voice input into a prompt. The prompt generation module (530) may identify at least one data set among the application data (501) based on the criterion information.

[0101] For example, application data (501) may include multiple data sets. For example, a data set included in the application data (501) may be stored along with information for identifying the data set. The information for identifying the data set may be referred to as semantic information or other terms having an equivalent technical / functional meaning. In one example, the information for identifying the data set may be a word representing the data set. In one example, the information for identifying the data set may be a vector representing the data set. The vector may be an n-dimensional vector for representing semantic similarity between words (or sentences). In one example, the vector may be referred to as a semantic vector or other terms having an equivalent technical / functional meaning.
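As a minimal sketch of the storage scheme in paragraph [0101], each data set can be kept together with its identifying information, either a word or a semantic vector. The `DataSet` class, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataSet:
    """One data set plus the information used to identify it (semantic information)."""

    name: str
    records: List[str]
    label: Optional[str] = None                 # a word representing the data set
    vector: Optional[Tuple[float, ...]] = None  # an n-dimensional semantic vector


# Illustrative values only; real vectors would come from an embedding model.
store = [
    DataSet("search_history", ["smartphone camera review"], label="search", vector=(0.9, 0.1, 0.0)),
    DataSet("card_usage", ["coffee shop"], label="payment", vector=(0.0, 0.2, 0.9)),
]


def find_by_label(data_sets, word):
    # Word-based identification: exact match on the stored label.
    return [data_set for data_set in data_sets if data_set.label == word]


found = find_by_label(store, "payment")
```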

[0102] For example, multiple user intentions can be identified based on real-time information. For example, the prompt generation module (530) can obtain multiple pieces of reference information and multiple pieces of intent information from the intent estimation module (520). For example, the prompt generation module (530) can obtain first reference information associated with a first user intention and second reference information associated with a second user intention from the intent estimation module (520). The prompt generation module (530) can determine first distances (or semantic distances) between the first reference information and multiple data sets and second distances between the second reference information and the multiple data sets. The prompt generation module (530) can identify whether the difference between the average distance of the first distances and the average distance of the second distances is within a threshold distance. For example, the difference being within the threshold distance may indicate a high semantic similarity between the first reference information and the second reference information. The prompt generation module (530) can obtain one piece of reference information based on the first reference information and the second reference information, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is within the threshold distance. The obtained reference information may include the first reference information and the second reference information. In another example, the difference being outside the threshold distance may indicate that there is low semantic similarity between the first reference information and the second reference information. The prompt generation module (530) can identify the reference information having a lower average distance among the first reference information and the second reference information, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is outside the threshold distance. In yet another example, the prompt generation module (530) can identify first data sets having words included in the first reference information among a plurality of data sets. The prompt generation module (530) can obtain a first score based on the usage frequency of the first data sets (e.g., recent usage frequency and / or usage frequency of all users). The prompt generation module (530) can identify second data sets among the plurality of data sets that have words included in the second reference information. The prompt generation module (530) can obtain a second score based on the usage frequency of the second data sets. For example, the prompt generation module (530) can identify reference information having a higher score among the first reference information and the second reference information.
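The distance-based rule of this paragraph can be written compactly. The symbols below are introduced only for illustration and do not appear in the disclosure: $c_1$ and $c_2$ denote the first and second reference information, $d_1, \dots, d_N$ the data sets, $\delta$ the semantic distance, and $\tau$ the threshold distance. Treating "within" as less than or equal to $\tau$ is an assumption.

```latex
\bar{\delta}_k = \frac{1}{N} \sum_{j=1}^{N} \delta(c_k, d_j), \qquad k \in \{1, 2\}

c =
\begin{cases}
  c_1 \cup c_2, & \text{if } \lvert \bar{\delta}_1 - \bar{\delta}_2 \rvert \le \tau \\
  c_1, & \text{if } \lvert \bar{\delta}_1 - \bar{\delta}_2 \rvert > \tau \text{ and } \bar{\delta}_1 < \bar{\delta}_2 \\
  c_2, & \text{if } \lvert \bar{\delta}_1 - \bar{\delta}_2 \rvert > \tau \text{ and } \bar{\delta}_2 < \bar{\delta}_1
\end{cases}
```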

[0103] For example, the prompt generation module (530) can determine distances (or semantic distances) between reference information and multiple data sets. For example, the distance may indicate the degree of relevance between reference information and data sets. In one example, the shorter the distance between reference information and data sets, the higher the relevance between reference information and data sets. In one example, the longer the distance between reference information and data sets, the lower the relevance between reference information and data sets.
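The disclosure does not name a particular distance metric. As one possible sketch, cosine distance between semantic vectors can serve as the semantic distance, with a shorter distance indicating higher relevance. The function names and the two-dimensional sample vectors are assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_relevance(criterion_vector, data_set_vectors):
    # Shorter distance first, i.e. most relevant data set first.
    return sorted(
        data_set_vectors,
        key=lambda name: cosine_distance(criterion_vector, data_set_vectors[name]),
    )


data_set_vectors = {
    "card_usage": (0.1, 0.9),
    "search_history": (0.9, 0.2),
    "photos": (0.6, 0.6),
}
ranking = rank_by_relevance((1.0, 0.0), data_set_vectors)
```

With these sample vectors the data set pointing most nearly in the direction of the criterion vector is ranked first and the one closest to orthogonal is ranked last.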

[0104] For example, reference information may be composed of a set of words. For example, the prompt generation module (530) may identify at least one data set having words included in the reference information among the application data (501). In one example, the reference information may include a first word, a second word, and a third word. The prompt generation module (530) may identify a first data set having the first word, a second data set having the second word, and a third data set having the third word among a plurality of data sets.
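The word-matching variant above can be sketched as follows, assuming each data set is stored with a set of keywords. The function name, the keyword sets, and the data set names are hypothetical.

```python
def match_data_sets(criterion_words, data_sets):
    """Map each word of the reference information to the data sets that contain it."""
    matches = {}
    for word in criterion_words:
        matches[word] = [
            name for name, keywords in data_sets.items() if word in keywords
        ]
    return matches


# Hypothetical data sets, each described by a set of keywords.
data_sets = {
    "search_history": {"smartphone", "camera"},
    "benchmark_notes": {"performance"},
    "card_usage": {"payment"},
}
matches = match_data_sets(["smartphone", "performance", "camera"], data_sets)
```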

[0105] For example, reference information may consist of a set of words. A word may represent a category (or attribute) of an object identified by gaze information. For example, the prompt generation module (530) may identify vectors representing words included in the reference information. The prompt generation module (530) may determine multiple distances between a vector representing the word and vectors representing multiple data sets. For example, the prompt generation module (530) may identify a data set having a minimum distance from the vector representing the word among multiple data sets. The prompt generation module (530) may identify data sets corresponding to words included in the reference information according to the above method. In another example, the prompt generation module (530) may identify data set(s) within a predefined distance from the vector representing the word among multiple data sets. The prompt generation module (530) may identify data sets corresponding to words included in the reference information according to the above method.
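Both selection rules of this paragraph, the minimum-distance rule and the predefined-distance rule, can be sketched as below. Euclidean distance (`math.dist` from the standard library) is used as the default metric purely as an assumption; any other semantic distance could be passed through the `distance` parameter. Function names and sample vectors are hypothetical.

```python
import math


def nearest_data_set(word_vector, data_set_vectors, distance=math.dist):
    """Return the name of the data set with the minimum distance to the word."""
    return min(
        data_set_vectors,
        key=lambda name: distance(word_vector, data_set_vectors[name]),
    )


def data_sets_within(word_vector, data_set_vectors, max_distance, distance=math.dist):
    """Return the names of all data sets within the predefined distance."""
    return [
        name
        for name, vector in data_set_vectors.items()
        if distance(word_vector, vector) <= max_distance
    ]


# Illustrative two-dimensional vectors; real semantic vectors are n-dimensional.
data_set_vectors = {
    "search_history": (0.9, 0.1),
    "photos": (0.1, 0.9),
    "card_usage": (0.5, 0.5),
}
word_vectors = {"performance": (1.0, 0.0), "camera": (0.0, 1.0)}

# Repeat the minimum-distance rule for every word in the reference information.
nearest = {
    word: nearest_data_set(vector, data_set_vectors)
    for word, vector in word_vectors.items()
}
nearby = data_sets_within(word_vectors["performance"], data_set_vectors, max_distance=0.75)
```

The minimum-distance rule always yields exactly one data set per word, whereas the predefined-distance rule may yield several or none, which affects how large the resulting prompt can become.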

[0106] For example, reference information may consist of a sentence. The sentence may represent the category (or attribute) of an object identified by the gaze information. The prompt generation module (530) may determine multiple distances between a vector representing the sentence and vectors representing multiple data sets. For example, the prompt generation module (530) may identify a data set having a minimum distance from the vector representing the sentence among multiple data sets. In another example, the prompt generation module (530) may identify data set(s) within a predefined distance from the vector representing the sentence among multiple data sets.
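For the sentence case, a vector representing the whole sentence is compared with the data set vectors. The sketch below uses a toy bag-of-words embedding over a fixed vocabulary as a stand-in for a real sentence embedding model, which the disclosure does not specify. The vocabulary, the `embed` function, the data set descriptions, and the use of Euclidean distance are all assumptions.

```python
import math
from collections import Counter

# Toy vocabulary; a real system would use a learned embedding model instead.
VOCABULARY = ("smartphone", "camera", "performance", "design", "payment")


def embed(text):
    """Count vocabulary words in the text to form a crude sentence vector."""
    counts = Counter(text.lower().split())
    return tuple(float(counts[word]) for word in VOCABULARY)


def select_within(sentence, descriptions, max_distance):
    """Return data sets whose vector lies within max_distance of the sentence vector."""
    query = embed(sentence)
    return [
        name
        for name, text in descriptions.items()
        if math.dist(query, embed(text)) <= max_distance
    ]


descriptions = {
    "review_history": "smartphone camera performance reviews",
    "wishlist": "smartphone design",
    "card_usage": "payment records",
}
sentence = "camera performance of a smartphone"
selected = select_within(sentence, descriptions, max_distance=1.0)
closest = min(
    descriptions,
    key=lambda name: math.dist(embed(sentence), embed(descriptions[name])),
)
```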

[0107] For example, the prompt generation module (530) can generate a prompt based on at least one data set identified by the reference information and / or real-time information. For example, the prompt generation module (530) can generate a prompt by changing voice input based on at least one data set and / or real-time information. For example, the prompt generation module (530) can provide the prompt to the AI core (503) via the AI SDK (502) to generate a response using an on-device artificial intelligence model. In another example, the prompt generation module (530) can provide the prompt to the communication circuit (403) to generate a response using the artificial intelligence model of the server (410).
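One way to picture the prompt generation step is as text assembly: the transcribed voice input is combined with the gaze object, the estimated intent, any real-time information, and the selected data sets. The template, the function name `build_prompt`, and all field labels below are hypothetical; the disclosure does not define a prompt format.

```python
def build_prompt(utterance, gaze_object, intent, data_sets, real_time=None):
    """Assemble a prompt string from the voice input and the selected context."""
    lines = [
        f"User request: {utterance}",
        f"Object in focus: {gaze_object}",
        f"Estimated intent: {intent}",
    ]
    # Optional real-time information such as location or user status.
    for key, value in (real_time or {}).items():
        lines.append(f"{key}: {value}")
    lines.append("Relevant user data:")
    for name, records in data_sets.items():
        for record in records:
            lines.append(f"- [{name}] {record}")
    return "\n".join(lines)


prompt = build_prompt(
    utterance="let me know more about this",
    gaze_object="smartphone",
    intent="acquire performance information",
    data_sets={"search_history": ["smartphone camera review"]},
    real_time={"location": "electronics store"},
)
```

Because only the selected data sets are included, the prompt stays smaller than it would be if all application data were appended.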

[0108] FIG. 6 is a flowchart illustrating operations of a wearable device for acquiring reference information and intention information. The operations of FIG. 6 may be performed by the electronic device (101) of FIG. 1 or the wearable device (101) of FIG. 2a through FIG. 4. For example, at least some of the operations may be controlled by a processor (401) of the wearable device (101). In the following, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed. For example, at least two operations may be performed in parallel.

[0109] Referring to FIG. 6, in operation 601, a wearable device (101) according to one embodiment can acquire voice input and gaze information of a user wearing the wearable device (101).

[0110] In one embodiment, the wearable device (101) may execute an assistant application. For example, the assistant application may be executed based on a user's voice call and / or a press on a physical button. The assistant application may be an artificial intelligence-based application that enables interaction between the wearable device (101) and the user based on voice input (or utterance). The assistant application may be referred to as a voice assistant application, a voice assistant, or other terms having an equivalent technical or functional meaning. In response to the execution of the assistant application, the wearable device (101) may display an affordance (or object, visual object) indicating the execution of the assistant application through the display (404). However, the present disclosure is not limited thereto. For example, an assistant application can run when affordances are not displayed on the screen (e.g., in the background).

[0111] In one embodiment, the wearable device (101) may acquire a user's voice input (or speech) through a microphone (405) while an assistant application is running. The wearable device (101) may generate text corresponding to the voice input by performing speech to text (STT) on the voice input. In one example, automatic speech recognition (ASR) may be used for STT. However, this is merely an example and the present disclosure is not limited thereto. For example, other artificial intelligence models and / or algorithms may be used to convert the voice input (or speech) into text.

[0112] In one embodiment, the wearable device (101) can acquire user gaze information through the camera (407) while the assistant application is running. For example, the wearable device (101) can identify the user's gaze by performing gaze tracking (or eye tracking) on the eye(s) of the user wearing the wearable device (101). The wearable device (101) can identify an object (e.g., an actual object) focused by the user using the user's gaze. The wearable device (101) can acquire gaze information based on the object focused by the user. For example, the gaze information may represent an object focused by the user's gaze.

[0113] In one embodiment, the wearable device (101) may further acquire real-time information while the assistant application is running. For example, the real-time information may include location information of the wearable device (101) identified based on a global positioning system (GPS), screen information displayed through a display (404), image information of the area around the wearable device (101) acquired through a camera (407), information on the user's status (e.g., driving status, travel status, and / or work status) acquired through a sensor (406), and / or conversation information with another user acquired through a microphone (405). However, this is merely an example and the present disclosure is not limited thereto. For example, the real-time information may further include other information that can be acquired through the microphone (405), sensor (406), and / or camera (407) in addition to the information described above.

[0114] In operation 602, a wearable device (101) according to one embodiment may acquire criterion information and intent information. For example, the wearable device (101) may acquire criterion information and intent information based on voice input, gaze information, and / or real-time information. For example, criterion information may be used to identify a data set to be used to generate a prompt from a plurality of data sets of a user stored in memory (402). For example, intent information may be used to define a method for generating a prompt.

[0115] In one embodiment, the wearable device (101) can obtain reference information based on voice input and gaze information. For example, the wearable device (101) can obtain reference information based on voice input and gaze information by using an artificial intelligence model (e.g., LLM (large language model), LMM (large multimodal model)). The artificial intelligence model can use text converted from voice input and an object indicated by the gaze information as input. The artificial intelligence model can output reference information based on the input. For example, the reference information may consist of a set of words or a sentence. In one example, each word may represent a category (or attribute) of an object identified by the gaze information. In one example, a sentence may represent a category (or attribute) of an object identified by the gaze information. In one example, if the object identified by gaze information is a smartphone, the corresponding category (or attribute) may be performance, design, camera, bezel, or UI (user interface). However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0116] In one embodiment, the wearable device (101) may acquire intent information representing a user intent based on voice input and gaze information. The intent information may be intended to define a method for generating a prompt based on voice input. For example, the wearable device (101) may acquire intent information based on voice input and gaze information using an artificial intelligence model (e.g., LLM, LMM). The artificial intelligence model may use text converted from voice input and an object indicated by the gaze information as input. The artificial intelligence model may output intent information based on the input. In one example, the intent information may represent an intent to interpret a conversation, an intent to translate text, an intent to acquire performance information about an object, an intent to acquire design information about an object, an intent to acquire price information about an object, and / or an intent to acquire usage information about an object. However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto. For example, intent information may indicate other intents in addition to the examples described above.

[0117] In one embodiment, a plurality of user intentions may be identified based on voice input and gaze information. For example, the wearable device (101) may obtain first reference information associated with a first user intention and second reference information associated with a second user intention based on voice input and gaze information. The wearable device (101) may determine first distances (or semantic distances) between the first reference information and a plurality of data sets and second distances between the second reference information and the plurality of data sets. The wearable device (101) may identify whether the difference between the average distance of the first distances and the average distance of the second distances is within a threshold distance. For example, the difference being within the threshold distance may indicate a high semantic similarity between the first reference information and the second reference information. The wearable device (101) may obtain one piece of reference information based on the first reference information and the second reference information, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is within the threshold distance. The obtained reference information may include the first reference information and the second reference information. In another example, the difference being outside the threshold distance may indicate that there is low semantic similarity between the first reference information and the second reference information. The wearable device (101) may identify the reference information having a lower average distance among the first reference information and the second reference information, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is outside the threshold distance.

[0118] In one embodiment, a plurality of user intentions may be identified based on voice input and gaze information. For example, a wearable device (101) may obtain a first reference information associated with a first user intention and a second reference information associated with a second user intention based on voice input and gaze information. The wearable device (101) may identify first data sets having words included in the first reference information among a plurality of data sets. The wearable device (101) may obtain a first score based on the usage frequency (e.g., recent usage frequency and / or usage frequency of all users) for the first data sets. The wearable device (101) may identify second data sets having words included in the second reference information among a plurality of data sets. The wearable device (101) may obtain a second score based on the usage frequency for the second data sets. For example, a higher score may indicate a higher usage frequency for the data sets. For example, the wearable device (101) can identify reference information having a higher score among the first reference information and the second reference information. The wearable device (101) can obtain reference information with a high usage frequency by the above method.
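The usage-frequency rule above can be sketched as follows. How the score is computed from the usage frequency is not specified in the disclosure; this sketch simply sums a per-data-set use count over the data sets that contain at least one word of the reference information, and it prefers the first reference information on a tie. Both choices, along with all names and sample counts, are assumptions.

```python
def usage_score(criterion_words, data_sets):
    """Sum the use counts of data sets sharing at least one word with the criterion."""
    words = set(criterion_words)
    return sum(
        data_set["use_count"]
        for data_set in data_sets
        if words & data_set["keywords"]
    )


def pick_by_usage(first_words, second_words, data_sets):
    first_score = usage_score(first_words, data_sets)
    second_score = usage_score(second_words, data_sets)
    # Tie-breaking in favor of the first reference information is an assumption.
    return first_words if first_score >= second_score else second_words


# Hypothetical data sets with keywords and a recent use count.
data_sets = [
    {"name": "benchmark_notes", "keywords": {"performance", "camera"}, "use_count": 12},
    {"name": "wishlist", "keywords": {"design", "bezel"}, "use_count": 3},
    {"name": "card_usage", "keywords": {"payment"}, "use_count": 40},
]
first = ["performance", "camera"]
second = ["design", "bezel", "UI"]
chosen = pick_by_usage(first, second, data_sets)
```

The heavily used `card_usage` data set does not influence either score because it shares no word with either reference information.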

[0119] FIG. 7 illustrates a situation for explaining the operations of a wearable device for acquiring reference information and intention information.

[0120] Referring to FIG. 7, the wearable device (101) can acquire a user's voice input (or utterance) while an assistant application is running. In the example illustrated in FIG. 7, the voice input may be "let me know more about this." However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto. For example, the wearable device (101) can acquire the user's gaze information through a camera (407) in response to the execution of the assistant application. For example, the wearable device (101) can acquire the user's gaze (702) by performing gaze tracking (or eye tracking) on the eye(s) of the user wearing the wearable device (101). The wearable device (101) can identify an object (701) focused by the user by using the user's gaze (702). The wearable device (101) can acquire gaze information based on the object (701) focused by the user. In the example illustrated in FIG. 7, the gaze information may represent an object (701) (e.g., a smartphone).

[0121] For example, the wearable device (101) can acquire criterion information and intent information based on voice input and gaze information. For example, an artificial intelligence model (e.g., LLM (large language model), LMM (large multimodal model)) may be used to acquire criterion information and intent information. The artificial intelligence model can output criterion information and intent information by using text converted from voice input and an object indicated by gaze information as input.

[0122] For example, reference information may consist of a set of words. In the example illustrated in FIG. 7, the wearable device (101) may acquire intention information indicating a user intention to acquire performance information of an object (701). When the user intention to acquire performance information of an object (701) is identified by an artificial intelligence model, the reference information may be [smartphone, performance, camera]. However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0123] For example, reference information may be composed of a sentence. In the example illustrated in FIG. 7, the wearable device (101) may acquire intention information indicating a user intention to acquire performance information of an object (701). When the user intention to acquire performance information of an object (701) is identified by an artificial intelligence model, the reference information may be [camera performance of a smartphone]. However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0124] For example, a first user intention and a second user intention may be identified by an artificial intelligence model. For example, the first user intention may be for obtaining performance information of an object (701). The first user intention may be related to first reference information. For example, the second user intention may be for obtaining design information of an object (701). The second user intention may be related to second reference information. The wearable device (101) may determine first distances (or semantic distances) between the first reference information and a plurality of data sets and second distances between the second reference information and the plurality of data sets. The wearable device (101) may identify whether the difference between the average distance of the first distances and the average distance of the second distances is within a threshold distance. For example, the wearable device (101) may acquire one piece of intention information based on the first user intention and the second user intention, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is within the threshold distance. In one example, the intention information may be an 'intention to acquire information about the performance and design of a smartphone.' The reference information may be [smartphone, performance, camera, design, bezel, UI (user interface)]. In another example, the wearable device (101) may identify the reference information having a lower average distance among the first reference information and the second reference information, upon identifying that the difference between the average distance of the first distances and the average distance of the second distances is outside the threshold distance. In one example, the reference information having a lower average distance may be the first reference information. The intention information may be an 'intention to acquire information about the performance of a smartphone.' The reference information may be [smartphone, performance, camera].
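The FIG. 7 walk-through above can be sketched in code using its own word lists. The distance values and thresholds are invented for illustration, the function name `reconcile` is hypothetical, and treating "within" as less than or equal to the threshold is an assumption.

```python
from statistics import mean


def reconcile(first_words, second_words, first_distances, second_distances, threshold):
    """Merge or choose between two pieces of reference information."""
    first_avg = mean(first_distances)
    second_avg = mean(second_distances)
    if abs(first_avg - second_avg) <= threshold:
        # High semantic similarity: merge, keeping order and dropping duplicates.
        return list(dict.fromkeys(first_words + second_words))
    # Low similarity: keep the reference information with the lower average distance.
    return list(first_words) if first_avg < second_avg else list(second_words)


first = ["smartphone", "performance", "camera"]
second = ["smartphone", "design", "bezel", "UI"]
first_distances = [0.2, 0.4, 0.6]     # illustrative distances to three data sets
second_distances = [0.3, 0.5, 0.55]   # illustrative distances to the same data sets

merged = reconcile(first, second, first_distances, second_distances, threshold=0.1)
single = reconcile(first, second, first_distances, second_distances, threshold=0.01)
```

With the looser threshold the two word lists are merged into the six-word list given in the paragraph; with the tighter threshold only the first reference information, which has the lower average distance, is kept.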

[0125] FIG. 8 is a flowchart illustrating the operations of a wearable device for generating a prompt. The operations of FIG. 8 may be performed by the electronic device (101) of FIG. 1 or the wearable device (101) of FIG. 2a through FIG. 4. For example, at least some of the operations may be controlled by the processor (401) of the wearable device (101). In the following, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed. For example, at least two operations may be performed in parallel. For example, the operations of the wearable device (101) illustrated in FIG. 8 may be performed subsequently to the operations of the wearable device (101) illustrated in FIG. 6.

[0126] Referring to FIG. 8, in operation 801, a wearable device (101) according to one embodiment can identify at least one data set among a plurality of data sets based on criterion information.

[0127] In one embodiment, each of the plurality of data sets may include personal information of the user. For example, the personal information may include information obtained while the wearable device (101) is used by the user (e.g., search history, card usage history, conversation history, images, videos and / or application data). In a non-limiting example, the personal information may include analysis information on information obtained while the user is using the wearable device (101) (e.g., user preference analysis information).

[0128] In one embodiment, a plurality of data sets may be stored in the memory (402) of the wearable device (101). For example, the memory (402) may include internal storage and / or external storage. Each of the plurality of data sets may be stored in the internal storage and / or external storage. For example, each of the plurality of data sets may be stored along with information for identifying the data set. In one example, the information for identifying the data set may be referred to as semantic information or other terms having an equivalent technical meaning. In one example, the information for identifying the data set may be a word representing the data set. In one example, the information for identifying the data set may be a vector representing the data set. The vector may be an n-dimensional vector for representing semantic similarity between words (or sentences). In one example, a vector may be referred to as a semantic vector or another term having an equivalent technical meaning.

[0129] In one embodiment, the wearable device (101) can determine distances (or semantic distances) between reference information and a plurality of data sets. For example, the distance may represent the degree of relevance between the reference information and the data sets. In one example, the shorter the distance between the reference information and the data sets, the higher the relevance between the reference information and the data sets. In one example, the longer the distance between the reference information and the data sets, the lower the relevance between the reference information and the data sets.

[0130] For example, reference information may consist of a set of words. The words may represent the category (or attribute) of an object identified by gaze information. In one example, if the object is a smartphone, the category (or attribute) may be performance, design, camera, bezel, and / or user interface. For example, the wearable device (101) may identify vectors representing the words included in the reference information. The wearable device (101) may determine multiple distances between a vector representing a word and vectors representing multiple data sets. For example, the wearable device (101) may identify, among the multiple data sets, a data set having a minimum distance from the vector representing the word. The wearable device (101) may identify data sets corresponding to the words included in the reference information according to the above method. In another example, the wearable device (101) can identify, among the plurality of data sets, data set(s) that are within a predefined distance from the vector representing the word. The wearable device (101) can identify data sets corresponding to the words included in the reference information according to the above method. In a non-limiting example, the wearable device (101) can identify, among the plurality of data sets, at least one data set having words included in the reference information. In one example, the reference information may include a first word, a second word, and a third word. The wearable device (101) can identify a first data set having the first word, a second data set having the second word, and a third data set having the third word from the plurality of data sets.

[0131] For example, reference information may consist of a sentence. The sentence may represent a category (or attribute) of an object identified by gaze information. The wearable device (101) may determine multiple distances between a vector representing a sentence and vectors representing multiple data sets. For example, the wearable device (101) may identify a data set having a minimum distance from the vector representing a sentence among multiple data sets. In another example, the wearable device (101) may identify data set(s) within a predefined distance from the vector representing a sentence among multiple data sets.
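
As a non-limiting illustration, the distance-based selection described in paragraphs [0129] to [0131] may be sketched as follows. The disclosure does not fix a particular distance metric, so cosine distance is an assumption here, and the function names, data set names, and three-dimensional vectors are hypothetical values chosen only for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; a smaller value means higher relevance.
    (Cosine distance is an assumed metric, not one named in the disclosure.)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_data_set(reference_vec, data_sets):
    """Identify the data set having the minimum distance from the reference vector."""
    return min(data_sets, key=lambda name: cosine_distance(reference_vec, data_sets[name]))

def data_sets_within(reference_vec, data_sets, max_distance):
    """Identify every data set within a predefined distance of the reference vector."""
    return [name for name, vec in data_sets.items()
            if cosine_distance(reference_vec, vec) <= max_distance]

# Hypothetical semantic vectors stored alongside each data set.
data_sets = {
    "camera_search_history": [0.9, 0.1, 0.0],
    "music_playlist": [0.0, 0.2, 0.9],
    "phone_reviews": [0.7, 0.6, 0.1],
}
# Hypothetical vector for a word or sentence of the reference information.
reference_vec = [1.0, 0.2, 0.0]

closest = nearest_data_set(reference_vec, data_sets)
nearby = data_sets_within(reference_vec, data_sets, max_distance=0.2)
```

The same two functions apply whether the reference vector represents a single word or a whole sentence; for a set of words, they would simply be called once per word.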

[0132] In operation 802, a wearable device (101) according to one embodiment may generate a prompt based on at least one data set and intent information. For example, to generate the prompt, at least one data set identified by the reference information among the plurality of data sets stored in memory (402) may be used. The at least one data set may be associated with a user's preference. For example, to generate the prompt, intent information identified based on voice input and / or gaze information may be used. The intent information may represent a user intent. In a non-limiting example, real-time information may additionally be used to generate the prompt.
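
As a non-limiting illustration, operation 802 may be sketched as a function that combines the intent information, the identified data set(s), and optional real-time information into a single prompt. The template layout, the function name `build_prompt`, and the sample strings are assumptions made for this sketch; the disclosure does not prescribe a prompt format.

```python
def build_prompt(intent, data_set_snippets, real_time_info=None):
    """Compose a prompt from intent information, selected data sets,
    and (optionally) real-time information. The layout is hypothetical."""
    lines = [f"User intent: {intent}"]
    if data_set_snippets:
        lines.append("Relevant user data:")
        lines.extend(f"- {snippet}" for snippet in data_set_snippets)
    if real_time_info:
        lines.append("Current context:")
        lines.extend(f"- {key}: {value}" for key, value in real_time_info.items())
    return "\n".join(lines)

# Hypothetical inputs loosely following the smartphone example.
prompt = build_prompt(
    intent="obtain performance information about smartphone B",
    data_set_snippets=["search history: camera pixels of smartphone B"],
    real_time_info={"location": "Street A"},
)
```

Because only the identified data set(s) are passed in, the resulting prompt stays short regardless of how many data sets are stored in memory.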

[0133] FIG. 9a illustrates storage in which data sets are stored.

[0134] Referring to FIG. 9a, the memory (402) of the wearable device (101) may include an internal storage (910) and an external storage (920). For example, the internal storage (910) and the external storage (920) may be Android-based. For example, the internal storage (910) may include a public space (911) where system-related information (e.g., operating system (OS) information) is stored, a space for a first application (912), a space for a second application (913), a space for a third application (914), and a space for a fourth application (915). For example, the external storage (920) may include a first public space (921) where information related to photos is stored, a second public space (922) where information related to videos is stored, a third public space (923) where information related to music is stored, a fourth public space (924) where download information is stored, a space for a first application (925), a space for a second application (926), a space for a third application (927), and a space for a fourth application (928).

[0135] For example, each of the multiple data sets may include personal information of the user. The personal information may include information obtained while the wearable device (101) is used by the user (e.g., search history, card usage history, conversation history, images, videos, music, and / or application data). In a non-limiting example, the personal information may include analysis information on the information obtained while the user is using the wearable device (101) (e.g., user preference analysis information).

[0136] For example, a plurality of data sets may be stored in the memory (402) of the wearable device (101). For example, the memory (402) may include an internal storage (910) and / or an external storage (920). Each of the plurality of data sets may be stored in at least some of the spaces of the internal storage and / or the external storage. For example, each of the plurality of data sets may be stored along with information for identifying the data set. In one example, the information for identifying the data set may be referred to as semantic information or other terms having an equivalent technical meaning. In one example, the information for identifying the data set may be a word representing the data set. In one example, the information for identifying the data set may be a vector representing the data set. The vector may be an n-dimensional vector for representing semantic similarity between words (or sentences). In one example, a vector may be referred to as a semantic vector or another term having an equivalent technical meaning.

[0137] FIG. 9b illustrates a coordinate system for determining distances between reference information and data sets.

[0138] Referring to FIG. 9b, the first vector (931) may represent a vector of reference information. The reference information may be used to identify a data set to be used to generate a prompt. For example, a plurality of data sets stored in memory (402) may include a first data set and a second data set. The second vector (932) may represent a vector of the first data set. The third vector (933) may represent a vector of the second data set. Although FIG. 9b describes a case where two data sets are stored in memory (402), this is merely an example and the present disclosure is not limited thereto. The number of data sets may exceed two.

[0139] For example, a wearable device (101) can determine a first distance between a vector (931) of reference information and a vector (932) of a first data set. For example, the first distance may represent the degree of relevance between the reference information and the first data set. In one example, the shorter the first distance, the higher the relevance between the reference information and the first data set. In one example, the longer the first distance, the lower the relevance between the reference information and the first data set.

[0140] For example, a wearable device (101) can determine a second distance between a vector (931) of reference information and a vector (933) of a second data set. For example, the second distance may indicate the degree of correlation between the reference information and the second data set. In one example, the shorter the second distance, the higher the correlation between the reference information and the second data set. In one example, the longer the second distance, the lower the correlation between the reference information and the second data set.

[0141] For example, the wearable device (101) may identify the first data set as the data set to be used to generate a prompt, based on identifying that the first distance associated with the first data set is shorter than the second distance associated with the second data set. In another example, the wearable device (101) may identify the second data set as the data set to be used to generate a prompt, based on identifying that the first distance associated with the first data set is longer than the second distance associated with the second data set. In a non-limiting example, the wearable device (101) may identify data sets within a predefined distance from the vector (931) of the reference information. For example, the wearable device (101) may identify that both the first data set and the second data set are within the predefined distance from the vector (931) of the reference information. In this case, both the first data set and the second data set may be used to generate a prompt.
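
As a non-limiting illustration, the comparison of FIG. 9b can be worked through with concrete numbers. The two-dimensional coordinates, the use of Euclidean distance, and the predefined distance of 6.0 are all assumptions chosen so that the arithmetic is easy to follow; none of these values appear in the disclosure.

```python
import math

# Hypothetical coordinates standing in for vectors (931), (932), and (933).
reference = (1.0, 1.0)        # vector (931): reference information
first_data_set = (2.0, 1.0)   # vector (932): first data set
second_data_set = (4.0, 5.0)  # vector (933): second data set

# Euclidean distance is an assumed metric for this sketch.
first_distance = math.dist(reference, first_data_set)    # sqrt(1^2 + 0^2) = 1.0
second_distance = math.dist(reference, second_data_set)  # sqrt(3^2 + 4^2) = 5.0

# Minimum-distance rule: the data set with the shorter distance is selected.
selected = "first" if first_distance < second_distance else "second"

# Predefined-distance rule: every data set inside the radius is selected.
predefined_distance = 6.0
within = [name
          for name, distance in (("first", first_distance), ("second", second_distance))
          if distance <= predefined_distance]
```

With these values the minimum-distance rule selects only the first data set, while the predefined-distance rule selects both, which mirrors the two alternatives described for FIG. 9b.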

[0142] FIG. 10 is a flowchart illustrating the operations of a wearable device for displaying a response to voice input. The operations of FIG. 10 may be performed by the electronic device (101) of FIG. 1 or the wearable device (101) of FIG. 2a through FIG. 4. For example, at least some of the operations may be controlled by the processor (401) of the wearable device (101). In the following, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed. For example, at least two operations may be performed in parallel. For example, the operations of the wearable device (101) illustrated in FIG. 10 may be performed subsequently to the operations of the wearable device (101) illustrated in FIG. 8.

[0143] Referring to FIG. 10, in operation 1001, a wearable device (101) according to one embodiment can obtain a response to voice input based on a prompt.

[0144] In one embodiment, the wearable device (101) may obtain (or generate) a response according to a prompt using an on-device artificial intelligence model. As described with reference to FIGS. 5a through 9b, the prompt may be generated using only a subset of data identified based on reference information from among all data sets stored in memory (402). The size (or number of tokens) of the prompt generated based on the subset of data may be relatively small. Since the size (or number of tokens) of the prompt is small, the amount of computation (or computational cost, computational load) required to generate a response based on the prompt may be relatively small. Since the amount of computation required to generate a response is relatively small, the on-device artificial intelligence model of the wearable device (101) may be used to generate a response based on the prompt. For example, an on-device artificial intelligence model may output a response to voice input based on a prompt. In one example, the on-device artificial intelligence model may be an LLM or an LMM. However, this is merely an example and the present disclosure is not limited thereto. For example, other artificial intelligence models (e.g., machine translation (MT), large vision model (LVM)) may be used to generate a response according to a prompt.

[0145] In one embodiment, the wearable device (101) may transmit a request message containing a prompt to the server (410). As described in FIGS. 5a through 9b, the prompt may be generated using only a subset of data identified based on reference information from among all data sets stored in memory (402). The size of the prompt (or the number of tokens) generated based on the subset of data may be relatively small. Since the size of the prompt (or the number of tokens) is small, network congestion between the wearable device (101) and the server (410) may be reduced. Additionally, by minimizing the user's personal information included in the prompt, security issues arising from the leakage of personal information may be prevented. For example, the response to voice input may be generated based on the artificial intelligence model of the server (410) (e.g., LLM, LMM, MT, and / or LVM). The wearable device (101) can obtain (or receive) a response message from the server (410) that includes a response according to the prompt in response to a request message.

[0146] In operation 1002, a wearable device (101) according to one embodiment may display a response to a voice input through a display (404). For example, the wearable device (101) may output a response to a voice input through a speaker (255). Text-to-speech (TTS) may be used to output a response to a voice input through the speaker (255).

[0147] FIG. 11 illustrates a screen displaying a response to voice input. Some operations of the wearable device (101) described in FIG. 11 may be performed subsequently to the operations of the wearable device (101) described in FIG. 7.

[0148] Referring to FIG. 11, the wearable device (101) can acquire a user's voice input (or utterance) while an assistant application is running. In the example illustrated in FIG. 11, the voice input may be "let me know more about this." In response to the execution of the assistant application, the wearable device (101) can acquire gaze information indicating an object (701) (e.g., a smartphone). Based on the voice input and gaze information, the wearable device (101) can acquire reference information for identifying a data set to be used to generate a prompt and intent information for defining a method to generate the prompt. In the example illustrated in FIG. 11, the reference information may be [performance of the smartphone], and the intent information may indicate a user's intent to acquire performance information. The wearable device (101) can identify a data set from a plurality of data sets stored in memory (402) based on the reference information. In one example, the wearable device (101) can identify, among the plurality of data sets, a data set having a minimum semantic distance from the reference information. In the example illustrated in FIG. 11, the data set identified based on the reference information may be the user's search history regarding camera performance. The wearable device (101) can generate a prompt based on the data set and the intent information. For example, the prompt may be as shown in [Table 1] below.

[0149] [Table 1] Please provide information regarding the performance of smartphone B released by company A, specifically concerning its camera pixels.

[0150] For example, the wearable device (101) can obtain a response according to a prompt by using an on-device artificial intelligence model (e.g., a large language model (LLM), a large multimodal model (LMM)). In another example, the wearable device (101) can send a request message containing a prompt to the server (410). The wearable device (101) can receive (or obtain) a response message containing a response according to a prompt from the server (410) in response to the request message.

[0151] For example, the wearable device (101) may display a response to voice input through a display (404). In the example illustrated in FIG. 11, the response displayed through the display (404) may be, "The smartphone is equipped with a 200MP main rear camera, a 10MP telephoto camera with 3x zoom, and a 50MP telephoto camera with 5x zoom." However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0152] FIG. 12 is a flowchart illustrating the operations of a wearable device for displaying a response to voice input. The operations of FIG. 12 may be performed by the electronic device (101) of FIG. 1 or the wearable device (101) of FIG. 2a through FIG. 4. For example, at least some of the operations may be controlled by a processor (401) of the wearable device (101). In the following, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed. For example, at least two operations may be performed in parallel.

[0153] Referring to FIG. 12, in operation 1201, a wearable device (101) according to one embodiment can acquire voice input and gaze information of a user wearing the wearable device (101).

[0154] In one embodiment, the wearable device (101) may execute an assistant application. For example, the assistant application may be executed based on a user's voice call and / or a press on a physical button. The assistant application may be an artificial intelligence-based application that enables interaction between the wearable device (101) and the user based on voice input (or utterance). The assistant application may be referred to as a voice assistant application, a voice assistant, or other terms having an equivalent technical or functional meaning. In response to the execution of the assistant application, the wearable device (101) may display an affordance (or object, visual object) indicating the execution of the assistant application through the display (404). However, the present disclosure is not limited thereto. For example, an assistant application can run when affordances are not displayed on the screen (e.g., in the background).

[0155] In one embodiment, the wearable device (101) may acquire a user's voice input (or speech) through a microphone (405) while an assistant application is running. The wearable device (101) may generate text corresponding to the voice input by performing speech to text (STT) on the voice input. In one example, automatic speech recognition (ASR) may be used for STT. However, this is merely an example and the present disclosure is not limited thereto. For example, other artificial intelligence models and / or algorithms may be used to convert the voice input (or speech) into text.

[0156] In one embodiment, the wearable device (101) can acquire user gaze information through the camera (407) while the assistant application is running. For example, the wearable device (101) can identify the user's gaze by performing gaze tracking (or eye tracking) on the eye(s) of the user wearing the wearable device (101). The wearable device (101) can use the user's gaze to identify an object (e.g., a real object) on which the user is focused. The wearable device (101) can acquire gaze information based on that object. For example, the gaze information may represent the object on which the user's gaze is focused.

[0157] In one embodiment, the wearable device (101) may further acquire real-time information while the assistant application is running. For example, the real-time information may include location information of the wearable device (101) identified based on a global positioning system (GPS), screen information displayed through a display (404), image information of the area around the wearable device (101) acquired through a camera (407), information on the user's status (e.g., driving status, travel status, and / or work status) acquired through a sensor (406), and / or conversation information with another user acquired through a microphone (405). However, this is merely an example and the present disclosure is not limited thereto. For example, the real-time information may further include other information that can be acquired through the microphone (405), sensor (406), and / or camera (407) in addition to the information described above.

[0158] In operation 1202, a wearable device (101) according to one embodiment can obtain reference information based on voice input and gaze information. For example, the reference information can be used to identify, from a plurality of data sets of a user stored in memory (402), a data set to be used to generate a prompt.

[0159] In one embodiment, the wearable device (101) can obtain the reference information by using an artificial intelligence model (e.g., a large language model (LLM) or a large multimodal model (LMM)). The artificial intelligence model can use, as input, the text converted from the voice input and the object indicated by the gaze information. The artificial intelligence model can output reference information based on the input. For example, the reference information may consist of a set of words or a sentence. In one example, each word may represent a category (or attribute) of an object identified by the gaze information. In one example, a sentence may represent a category (or attribute) of an object identified by the gaze information. In one example, if the object identified by gaze information is a smartphone, the corresponding category (or attribute) may be performance, design, camera, bezel, or user interface (UI). However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0160] In one embodiment, the wearable device (101) may further acquire intent information representing a user intent based on voice input and gaze information. The intent information may be intended to define a method for generating a prompt based on voice input. In one example, the intent information may represent an intent to interpret a conversation, an intent to translate text, an intent to obtain performance information about a device, an intent to obtain price information about a device, and / or an intent to obtain use information of an object. However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto. For example, the intent information may further represent other intents in addition to the examples described above.

[0161] In one embodiment, a plurality of user intentions may be identified based on voice input and gaze information. For example, the wearable device (101) may identify first reference information associated with a first user intention and second reference information associated with a second user intention based on the voice input and gaze information. The wearable device (101) may determine first distances (or semantic distances) between the first reference information and the plurality of data sets and second distances between the second reference information and the plurality of data sets. The wearable device (101) may identify whether the difference between the average of the first distances and the average of the second distances is within a threshold distance. For example, upon identifying that the difference is within the threshold distance, the wearable device (101) may obtain a single piece of reference information based on the first reference information and the second reference information. This reference information may include the first reference information and the second reference information. In another example, upon identifying that the difference is outside the threshold distance, the wearable device (101) can identify, from among the first reference information and the second reference information, the reference information having the lower average distance.
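
As a non-limiting illustration, the average-distance comparison of paragraph [0161] may be sketched as follows. The function name `select_reference_info`, the sample distances, and the threshold value are hypothetical; treating "within a threshold distance" as "less than or equal to" is also an assumption.

```python
from statistics import mean

def select_reference_info(first_ref, first_distances,
                          second_ref, second_distances, threshold):
    """Return both pieces of reference information when their average
    distances are close; otherwise return the one with the lower average."""
    avg_first = mean(first_distances)
    avg_second = mean(second_distances)
    if abs(avg_first - avg_second) <= threshold:
        # Averages are comparable: keep both intents.
        return [first_ref, second_ref]
    # Averages differ clearly: keep the more relevant (lower-distance) one.
    return [first_ref] if avg_first < avg_second else [second_ref]

# Hypothetical distances from each reference information to the stored data sets.
combined = select_reference_info("performance", [0.2, 0.4],
                                 "price", [0.3, 0.5], threshold=0.15)
single = select_reference_info("performance", [0.2, 0.4],
                               "price", [0.8, 0.9], threshold=0.15)
```

In the first call the averages differ by roughly 0.1, so both pieces of reference information are kept; in the second call the gap is larger than the threshold, so only the one with the lower average distance is kept.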

[0162] In one embodiment, a plurality of user intentions may be identified based on voice input and gaze information. For example, a wearable device (101) may identify a first reference information associated with a first user intention and a second reference information associated with a second user intention based on voice input and gaze information. The wearable device (101) may identify first data sets having words included in the first reference information among a plurality of data sets. The wearable device (101) may obtain a first score based on the usage frequency (e.g., recent usage frequency and / or usage frequency of all users) for the first data sets. The wearable device (101) may identify second data sets having words included in the second reference information among a plurality of data sets. The wearable device (101) may obtain a second score based on the usage frequency for the second data sets. For example, a wearable device (101) can identify reference information having a higher score among the first reference information and the second reference information.
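
As a non-limiting illustration, the score-based selection of paragraph [0162] may be sketched as follows. The disclosure only states that the score is based on usage frequency, so summing per-data-set usage counts, breaking ties in favor of the first reference information, and all names and numbers below are assumptions.

```python
def usage_score(data_set_names, usage_counts):
    """Score a group of data sets by summing their usage counts
    (summation is an assumed scoring rule)."""
    return sum(usage_counts.get(name, 0) for name in data_set_names)

def pick_by_usage(first_ref, first_sets, second_ref, second_sets, usage_counts):
    """Return the reference information whose matching data sets score higher.
    Ties go to the first reference information (an assumption)."""
    first_score = usage_score(first_sets, usage_counts)
    second_score = usage_score(second_sets, usage_counts)
    return first_ref if first_score >= second_score else second_ref

# Hypothetical usage counts per stored data set.
usage_counts = {"camera_search_history": 12, "phone_reviews": 3, "price_alerts": 5}

chosen = pick_by_usage(
    "performance", ["camera_search_history", "phone_reviews"],
    "price", ["price_alerts"],
    usage_counts,
)
```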

[0163] In operation 1203, a wearable device (101) according to one embodiment can identify at least one data set among a plurality of data sets stored in memory (402) based on reference information.

[0164] In one embodiment, each of the plurality of data sets may include personal information of the user. For example, the personal information may include information obtained while the wearable device (101) is used by the user (e.g., search history, card usage history, conversation history, images, and / or application data). In a non-limiting example, the personal information may include analysis information regarding the information obtained while the user is using the wearable device (101) (e.g., user preference analysis information).

[0165] In one embodiment, a plurality of data sets may be stored in the memory (402) of the wearable device (101). For example, the memory (402) may include internal storage and / or external storage. Each of the plurality of data sets may be stored in the internal storage and / or external storage. For example, each of the plurality of data sets may be stored along with information for identifying the data set. In one example, the information for identifying the data set may be referred to as semantic information or other terms having an equivalent technical meaning. In one example, the information for identifying the data set may be a word representing the data set. In one example, the information for identifying the data set may be a vector representing the data set. The vector may be an n-dimensional vector for representing semantic similarity between words (or sentences). In one example, a vector may be referred to as a semantic vector or another term having an equivalent technical meaning.

[0166] In one embodiment, the wearable device (101) can determine distances (or semantic distances) between reference information and a plurality of data sets. For example, the distance may represent the degree of relevance between the reference information and the data sets. In one example, the shorter the distance between the reference information and the data sets, the higher the relevance between the reference information and the data sets. In one example, the longer the distance between the reference information and the data sets, the lower the relevance between the reference information and the data sets.

[0167] For example, reference information may consist of a set of words. A word may represent a category (or attribute) of an object identified by gaze information. For example, a wearable device (101) may identify vectors representing words included in reference information. The wearable device (101) may determine multiple distances between a vector representing a word and vectors representing multiple data sets. For example, the wearable device (101) may identify a data set having a minimum distance from a vector representing a word among multiple data sets. The wearable device (101) may identify data sets corresponding to words included in reference information according to the above method. In another example, the wearable device (101) may identify data set(s) within a predefined distance from a vector representing a word among multiple data sets. The wearable device (101) may identify data sets corresponding to words included in reference information according to the above method. In a non-limiting example, the wearable device (101) can identify at least one data set having words included in reference information among a plurality of data sets. In one example, the reference information may include a first word, a second word, and a third word. The wearable device (101) can identify a first data set having the first word, a second data set having the second word, and a third data set having the third word from a plurality of data sets.
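
As a non-limiting illustration, the word-matching variant at the end of paragraph [0167] may be sketched as follows, where each data set is stored together with identifying words rather than a vector. The function name, the data set names, and the identifying words are hypothetical.

```python
def match_data_sets_by_word(reference_words, data_sets):
    """Map each word of the reference information to the data sets
    whose identifying words contain that word."""
    matches = {}
    for word in reference_words:
        matches[word] = [name for name, labels in data_sets.items() if word in labels]
    return matches

# Hypothetical data sets, each stored with its identifying words.
data_sets = {
    "search_history": {"camera", "performance"},
    "purchase_history": {"price"},
    "app_usage": {"design", "user interface"},
}

# First, second, and third words of the reference information.
matches = match_data_sets_by_word(["camera", "design", "price"], data_sets)
```

Unlike the distance-based variants, this approach needs no vector computation, at the cost of missing data sets that are related in meaning but labeled with different words.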

[0168] For example, reference information may consist of a sentence. The sentence may represent a category (or attribute) of an object identified by gaze information. The wearable device (101) may determine multiple distances between a vector representing a sentence and vectors representing multiple data sets. For example, the wearable device (101) may identify a data set having a minimum distance from the vector representing a sentence among multiple data sets. In another example, the wearable device (101) may identify data set(s) within a predefined distance from the vector representing a sentence among multiple data sets.

[0169] In operation 1204, a wearable device (101) according to one embodiment may generate a prompt based on at least one data set identified by reference information. For example, the wearable device (101) may further utilize intent information and / or real-time information obtained based on voice input and gaze information to generate a prompt. For example, at least one data set identified by reference information may be associated with the user's preference. For example, intent information may be used to define a method for generating a prompt.

[0170] In operation 1205, a wearable device (101) according to one embodiment may display a response to voice input through a display (404) based on a prompt.

[0171] In one embodiment, the wearable device (101) can generate a response according to a prompt using an artificial intelligence model. As described above, the prompt may be generated using only a subset of data identified based on reference information from among all data sets stored in memory (402). The size (or number of tokens) of the prompt generated based on the subset of data may be relatively small. Since the size (or number of tokens) of the prompt is small, the amount of computation (or computational cost, computational load) required to generate a response based on the prompt may be relatively small. Since the amount of computation required to generate a response is relatively small, an on-device artificial intelligence model of the wearable device (101) may be used to generate a response based on the prompt. For example, an on-device artificial intelligence model may output a response to voice input based on a prompt. In one example, the on-device artificial intelligence model may be an LLM or an LMM. However, this is merely an example and the present disclosure is not limited thereto. For example, other artificial intelligence models (e.g., machine translation (MT), large vision model (LVM)) may be used to generate a response according to a prompt.

[0172] In one embodiment, the wearable device (101) may transmit a request message containing a prompt to the server (410). As described above, the prompt may be generated using only a subset of data identified based on reference information from among all data sets stored in memory (402). The size of the prompt (or the number of tokens) generated based on the subset of data may be relatively small. Since the size of the prompt (or the number of tokens) is small, network congestion between the wearable device (101) and the server (410) may be reduced. Additionally, by minimizing the user's personal information included in the prompt, security issues arising from the leakage of personal information may be prevented. For example, the response to voice input may be generated based on the artificial intelligence model of the server (410) (e.g., LLM, LMM, MT, and / or LVM). The wearable device (101) can receive, in response to the request message, a response message from the server (410) that includes a response according to the prompt.

[0173] In one embodiment, the wearable device (101) may display a response to a voice input through a display (404). The wearable device (101) may output a response to a voice input through a speaker (255). Text-to-speech (TTS) may be performed to output a response to a voice input through the speaker (255).
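
As a non-limiting illustration, operations 1202 through 1205 of FIG. 12 may be chained as in the sketch below. Every function here is a hypothetical stand-in: in particular, the stubs replace the artificial intelligence models and the speech, gaze, and display hardware that the disclosure describes, so the sketch shows only the order in which the pieces connect.

```python
def respond_to_utterance(speech_text, gazed_object, data_sets,
                         derive_reference_and_intent, select_data_sets,
                         build_prompt, generate_response):
    """Chain operations 1202-1205; the inputs correspond to operation 1201."""
    reference, intent = derive_reference_and_intent(speech_text, gazed_object)  # 1202
    selected = select_data_sets(reference, data_sets)                           # 1203
    prompt = build_prompt(intent, selected)                                     # 1204
    return generate_response(prompt)                                            # 1205

# Hypothetical stubs standing in for the models described in the text.
def stub_derive(speech_text, gazed_object):
    return f"performance of the {gazed_object}", "obtain performance information"

def stub_select(reference, data_sets):
    # Keep data sets whose identifying word appears in the reference information.
    return [text for label, text in data_sets.items() if label in reference]

def stub_prompt(intent, selected):
    return f"{intent}: " + "; ".join(selected)

def stub_generate(prompt):
    return f"[response to] {prompt}"

response = respond_to_utterance(
    "let me know more about this", "smartphone",
    {"performance": "searched camera pixels", "music": "playlist"},
    stub_derive, stub_select, stub_prompt, stub_generate,
)
```

Passing each stage in as a callable keeps the sketch neutral about whether the response is generated on the device or by the server (410).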

[0174] FIGS. 13a and 13b illustrate screens displaying responses to voice input. The wearable device (101) of FIG. 13a may be different from the wearable device (101) of FIG. 13b. For example, the wearable device (101) of FIG. 13a may be owned by a user (1301), and the wearable device (101) of FIG. 13b may be owned by a user (1302). FIGS. 13a and 13b describe examples in which different responses are output to the same voice input depending on the user's personal information stored in memory (402).

[0175] Referring to FIG. 13a, in a situation (1310), the wearable device (101) can obtain a voice input (1311) (or utterance) from a user (1301) while an assistant application is running. In the example illustrated in FIG. 13a, the voice input (1311) may be "can you recommend some places to visit." The wearable device (101) can obtain real-time information in response to the execution of the assistant application. For example, the real-time information may include image information around the wearable device (101) obtained through a camera (407), location information of the wearable device (101) identified by a sensor (406) (e.g., a GPS (global positioning system) sensor), and gaze information of the user (1301) obtained through the camera (407). Based on the voice input and the real-time information, the wearable device (101) may obtain criterion information for identifying a data set to be used to generate a prompt and intent information for defining a method to generate the prompt. In the example illustrated in FIG. 13a, the criterion information is [a recommendation for a place], and the intent information may represent a user intent to obtain recommendation information for a place. Based on the criterion information, the wearable device (101) may identify a data set from a plurality of data sets stored in memory (402). In one example, the wearable device (101) can identify, among the plurality of data sets, a data set having a minimum semantic distance from the criterion information. In the example illustrated in FIG. 13a, the data set identified based on the criterion information may be a search history for Game B. The wearable device (101) can generate a prompt based on the data set, the intent information, and/or the real-time information. For example, the user's voice input (e.g., recommend places worth visiting) can be converted into a prompt as shown in [Table 2] below.

[0176] [Table 2] Prompt: "Please recommend a place near Street A that is currently open and related to Game B."

[0177] In situation (1320), for example, the wearable device (101) can obtain a response according to a prompt using an on-device artificial intelligence model (e.g., a large language model (LLM), a large multimodal model (LMM)). In another example, the wearable device (101) can send a request message containing a prompt to the server (410). The wearable device (101) can receive (or obtain) a response message containing a response according to a prompt from the server (410) in response to the request message.

[0178] For example, the wearable device (101) may display a response (1321) to voice input through a display (404). In the example illustrated in FIG. 13a, the response (1321) displayed through the display (404) may be, 'You are currently on Street A. Near Street A, there is a Center C where a tournament for Game B is being held. Center C is open from 10:00 to 18:00 and offers an opportunity to compete with others.' However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.

[0179] Referring to FIG. 13b, in a situation (1330), the wearable device (101) can acquire a voice input (1331) (or utterance) from the user (1302) while the assistant application is running. In the example illustrated in FIG. 13b, the voice input (1331) may be "Recommend places worth visiting." The wearable device (101) can acquire real-time information in response to the execution of the assistant application. For example, the real-time information may include image information around the wearable device (101) acquired through the camera (407), location information of the user (1302) identified by the sensor (406) (e.g., a GPS sensor), and gaze information of the user (1302) acquired through the camera (407). Based on the voice input and the real-time information, the wearable device (101) can obtain criterion information for identifying a data set to be used to generate a prompt and intent information for defining a method to generate the prompt. In the example illustrated in FIG. 13b, the criterion information is [a recommendation for a place], and the intent information may represent a user intent to obtain recommendation information for a place. The wearable device (101) can identify a data set from a plurality of data sets stored in memory (402) based on the criterion information. In one example, the wearable device (101) can identify, among the plurality of data sets, a data set having a minimum semantic distance from the criterion information. In the example illustrated in FIG. 13b, the data set identified based on the criterion information may be log information for a social network service (SNS) application and/or log information for a photo application. The wearable device (101) can generate a prompt based on the data set, the intent information, and/or the real-time information. For example, the user's voice input (e.g., recommend places worth visiting) can be converted into a prompt as shown in [Table 3] below.

[0180] [Table 3] Prompt: "Please recommend some good places around Street A to take photos of people."

[0181] In situation (1340), for example, the wearable device (101) can obtain a response according to a prompt using an on-device artificial intelligence model (e.g., a large language model (LLM), a large multimodal model (LMM)). In another example, the wearable device (101) can send a request message containing a prompt to the server (410). The wearable device (101) can receive (or obtain) a response message containing a response according to a prompt from the server (410) in response to the request message.

[0182] For example, the wearable device (101) may display a response (1341) to the voice input through a display (404). In the example illustrated in FIG. 13b, the response (1341) displayed through the display (404) may be, "You are currently on Street A. Around Street A, there is a restaurant D that is frequently posted about on social media. Restaurant D is famous for its delicious menu item E." However, this is merely an example for illustrative purposes and the present disclosure is not limited thereto.
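The two scenarios of FIGS. 13a and 13b can be sketched in code as follows. This is a minimal illustration only, not the disclosed implementation: the `RealTimeInfo` container, the `build_prompt` function, and the line-oriented prompt layout are all hypothetical names and choices introduced here. The sketch shows how one and the same utterance may yield different prompts once a user-specific data set is attached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RealTimeInfo:
    """Hypothetical container for the real-time information."""
    location: str                       # e.g. derived from the GPS sensor
    gazed_object: Optional[str] = None  # label of the object under the user's gaze


def build_prompt(voice_input: str, intent: str, dataset: str, info: RealTimeInfo) -> str:
    """Combine the utterance, intent, identified data set, and context into one prompt."""
    parts = [
        f"User request: {voice_input}",
        f"User intent: {intent}",
        f"Relevant personal context: {dataset}",
        f"Current location: {info.location}",
    ]
    if info.gazed_object is not None:
        parts.append(f"Object in view: {info.gazed_object}")
    return "\n".join(parts)


utterance = "Recommend places worth visiting."
intent = "obtain recommendation information for a place"
here = RealTimeInfo(location="Street A")

# Same utterance and location, different identified data sets per user.
prompt_1301 = build_prompt(utterance, intent, "search history for Game B", here)
prompt_1302 = build_prompt(utterance, intent, "SNS and photo application logs", here)
```

Because only the identified data set differs between the two calls, any difference in the downstream response can be traced to the personal data selected for each user.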

[0183] The technical problems to be solved in this disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art to which this disclosure pertains.

[0184] A wearable device as described above may include at least one camera configured to perform eye tracking. The wearable device may include at least one microphone configured to acquire a voice input. The wearable device may include a display positioned in front of the user's eyes when the wearable device is worn by the user. The wearable device may include a memory that stores instructions and includes one or more storage media. The wearable device may include at least one processor including processing circuitry. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to acquire the voice input through the at least one microphone and acquire the user's gaze information through the at least one camera while an assistant application is running. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to obtain, based on the voice input and the gaze information, criterion information for identifying a data set to be used to generate a prompt. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify, based on the criterion information, at least one data set among a plurality of data sets stored in the memory. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to generate the prompt based on the at least one data set. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to display, through the display, a response to the voice input obtained based on the prompt.

[0185] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify, based on the gaze information, an object on which the user's gaze is focused. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to obtain the criterion information including at least one attribute that is identified, based on the voice input, from among the attributes of the object.

[0186] For example, when the above instructions are executed individually or collectively by the at least one processor, the wearable device may be caused to identify the at least one data set representing the at least one attribute among the plurality of data sets.
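As a rough sketch of the gaze-and-attribute step described above, the following code picks the object under the gaze point and keeps only the attributes that the utterance mentions. Everything here is an assumption made for illustration: the disclosure does not specify bounding boxes, a `DetectedObject` structure, or keyword matching, and a real system would likely use a learned model rather than substring tests.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DetectedObject:
    """Hypothetical result of object detection on the camera image."""
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    attributes: Dict[str, str] = field(default_factory=dict)


def object_at_gaze(gaze: Tuple[float, float],
                   objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Return the first object whose bounding box contains the gaze point."""
    x, y = gaze
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        if x0 <= x <= x1 and y0 <= y <= y1:
            return obj
    return None


def attributes_mentioned(voice_input: str, obj: DetectedObject) -> Dict[str, str]:
    """Keep only attributes whose name appears in the utterance (naive matching)."""
    text = voice_input.lower()
    return {name: value for name, value in obj.attributes.items() if name in text}


objects = [
    DetectedObject("sneaker", (100, 200, 300, 400),
                   {"brand": "Brand X", "price": "unknown", "color": "white"}),
    DetectedObject("poster", (350, 50, 600, 300), {"title": "Game B tournament"}),
]
target = object_at_gaze((180, 320), objects)
criterion = attributes_mentioned("What brand is this?", target)
```

In this example the gaze point falls inside the sneaker's box, and the utterance selects the `brand` attribute, which would then serve as the criterion information for choosing a data set.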

[0187] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to determine distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify the at least one data set by identifying, among the plurality of data sets, a data set having a minimum distance from the vector representing the criterion information.
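The minimum-distance selection can be sketched as follows. The disclosure does not name a distance metric or an embedding method, so cosine distance, the three-dimensional toy vectors, and the data set names are assumptions chosen purely for illustration.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_dataset(criterion_vec: List[float],
                    dataset_vecs: Dict[str, List[float]]) -> str:
    """Return the name of the data set closest to the criterion vector."""
    return min(dataset_vecs,
               key=lambda name: cosine_distance(criterion_vec, dataset_vecs[name]))


# Hypothetical embeddings of the data sets stored in memory.
datasets = {
    "game_search_history": [0.9, 0.1, 0.0],
    "photo_app_log": [0.1, 0.9, 0.2],
    "music_playlist": [0.0, 0.2, 0.9],
}
criterion = [0.8, 0.2, 0.1]
best = nearest_dataset(criterion, datasets)
```

Here `best` is `"game_search_history"`, since that vector points in nearly the same direction as the criterion vector.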

[0188] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to determine distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify, among the plurality of data sets, the at least one data set that is within a predefined distance from the vector representing the criterion information.
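The predefined-distance variant may return several data sets rather than one. A minimal sketch follows, assuming Euclidean distance and two-dimensional toy vectors; the metric, the vectors, and the value of `max_distance` are not specified in the disclosure and are chosen here only for illustration.

```python
import math
from typing import Dict, List


def datasets_within(criterion_vec: List[float],
                    dataset_vecs: Dict[str, List[float]],
                    max_distance: float) -> List[str]:
    """Return every data set whose vector lies within max_distance of the criterion."""
    return [name for name, vec in dataset_vecs.items()
            if math.dist(criterion_vec, vec) <= max_distance]


# Hypothetical embeddings of the data sets stored in memory.
datasets = {
    "sns_log": [0.2, 0.8],
    "photo_app_log": [0.3, 0.7],
    "game_search_history": [0.9, 0.1],
}
criterion = [0.25, 0.75]
selected = datasets_within(criterion, datasets, max_distance=0.2)
```

With these values both the SNS log and the photo application log fall inside the radius, which mirrors the FIG. 13b example in which more than one log contributes to the prompt.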

[0189] For example, the criterion information may include first criterion information associated with a first user intent and second criterion information associated with a second user intent. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to determine first distances between a first vector representing the first criterion information and a plurality of vectors representing the plurality of data sets. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to determine second distances between a second vector representing the second criterion information and a plurality of vectors representing the plurality of data sets.

[0190] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to determine whether the difference between the average of the first distances and the average of the second distances is within a threshold distance. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify, in accordance with a determination that the difference is within the threshold distance, the at least one data set among the plurality of data sets using criterion information generated based on the first criterion information and the second criterion information. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to identify, in accordance with a determination that the difference is outside the threshold distance, the at least one data set among the plurality of data sets using whichever of the first criterion information and the second criterion information has the lower average distance.
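The two-intent decision can be sketched as below. The disclosure does not state how the combined criterion information is generated or which metric is used, so the element-wise average of the two vectors, Euclidean distance, and the function names are assumptions made for this illustration.

```python
import math
from statistics import mean
from typing import List


def mean_distance(vec: List[float], dataset_vecs: List[List[float]]) -> float:
    """Average distance from one criterion vector to all data set vectors."""
    return mean([math.dist(vec, d) for d in dataset_vecs])


def choose_criterion(first_vec: List[float], second_vec: List[float],
                     dataset_vecs: List[List[float]], threshold: float) -> List[float]:
    """Pick or combine criterion vectors based on their average distances."""
    d1 = mean_distance(first_vec, dataset_vecs)
    d2 = mean_distance(second_vec, dataset_vecs)
    if abs(d1 - d2) <= threshold:
        # Both intents are about equally plausible: combine them.
        # Averaging is an assumption; the source only says "generated based on" both.
        return [(a + b) / 2 for a, b in zip(first_vec, second_vec)]
    # One intent clearly fits the stored data better: use it alone.
    return first_vec if d1 < d2 else second_vec


dataset_vecs = [[1.0, 0.0], [0.8, 0.2]]

# Clear winner: the first intent is much closer to the stored data.
picked = choose_criterion([0.9, 0.1], [0.0, 1.0], dataset_vecs, threshold=0.3)

# Ambiguous case: both intents are similarly close, so they are merged.
merged = choose_criterion([0.9, 0.1], [0.8, 0.0], dataset_vecs, threshold=0.3)
```

In the first call the average distances differ by more than the threshold, so the closer criterion vector is used unchanged; in the second they differ by less, so the two are combined.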

[0191] For example, when the instructions are executed individually or collectively by the at least one processor, the wearable device may be caused to acquire intent information indicating a user intent based on the voice input and the gaze information. When the instructions are executed individually or collectively by the at least one processor, the wearable device may be caused to generate the prompt based on the at least one data set and the intent information.

[0192] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to transmit the prompt generated based on the at least one data set to a server device. The instructions, when executed individually or collectively by the at least one processor, may cause the wearable device to obtain, from the server device, the response to the voice input generated based on the prompt.

[0193] For example, when the instructions are executed individually or collectively by the at least one processor, the wearable device may be caused to obtain the response according to the prompt generated based on the at least one data set using an on-device artificial intelligence model.

[0194] A method performed by a wearable device as described above, the wearable device comprising at least one camera configured to perform eye tracking, at least one microphone configured to acquire a voice input, memory, and a display, may include an operation of acquiring the voice input through the at least one microphone and acquiring the user's gaze information through the at least one camera while an assistant application is running. The method may include an operation of obtaining, based on the voice input and the gaze information, criterion information for identifying a data set to be used to generate a prompt. The method may include an operation of identifying, based on the criterion information, at least one data set among a plurality of data sets stored in the memory. The method may include an operation of generating the prompt based on the at least one data set. The method may include an operation of displaying, through the display, a response to the voice input obtained based on the prompt.

[0195] For example, the above method may include an operation of identifying, based on the gaze information, an object on which the user's gaze is focused. The above method may include an operation of obtaining the criterion information including at least one attribute that is identified, based on the voice input, from among the attributes of the object.

[0196] For example, the above method may include an operation of identifying, among the plurality of data sets, the at least one data set representing the at least one attribute.

[0197] For example, the above method may include an operation of determining distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets. The above method may include an operation of identifying the at least one data set by identifying, among the plurality of data sets, a data set having a minimum distance from the vector representing the criterion information.

[0198] For example, the above method may include an operation of determining distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets. The above method may include an operation of identifying, among the plurality of data sets, the at least one data set that is within a predefined distance from the vector representing the criterion information.

[0199] For example, the criterion information may include first criterion information associated with a first user intent and second criterion information associated with a second user intent. The method may include an operation of determining first distances between a first vector representing the first criterion information and a plurality of vectors representing the plurality of data sets. The method may include an operation of determining second distances between a second vector representing the second criterion information and a plurality of vectors representing the plurality of data sets.

[0200] For example, the above method may include an operation of determining whether the difference between the average of the first distances and the average of the second distances is within a threshold distance. The above method may include an operation of identifying, in accordance with a determination that the difference is within the threshold distance, the at least one data set among the plurality of data sets using criterion information generated based on the first criterion information and the second criterion information. The above method may include an operation of identifying, in accordance with a determination that the difference is outside the threshold distance, the at least one data set among the plurality of data sets using whichever of the first criterion information and the second criterion information has the lower average distance.

[0201] For example, the above method may include an operation of obtaining intent information indicating a user intent based on the voice input and the gaze information. The above method may include an operation of generating the prompt based on the at least one data set and the intent information.

[0202] For example, the above method may include the operation of transmitting the prompt generated based on the at least one data set to a server device. The above method may include the operation of obtaining the response to the voice input, generated based on the prompt, from the server device.

[0203] For example, the above method may include the operation of obtaining the response according to the prompt generated based on the at least one data set using an on-device artificial intelligence model.

[0204] The effects obtainable from the present disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure belongs.

[0205] For one or more embodiments, at least one of the components described in one or more of the preceding drawings may be configured to perform one or more operations, techniques, processes, and/or methods as described in the present disclosure. For example, a processor (e.g., a baseband processor) described in the present disclosure in relation to one or more of the preceding drawings may be configured to operate according to one or more examples described in the present disclosure. As another example, circuits associated with user equipment (UE), a base station, a network element, etc., as described above in relation to one or more of the preceding drawings may be configured to operate according to one or more examples described herein.

[0206] Any of the embodiments described above may be combined with any other embodiment (or combination of embodiments) unless otherwise explicitly stated. The foregoing description of one or more embodiments is for illustrative and explanatory purposes only, and is not intended to be exhaustive or to limit the scope of the embodiments to the precise form disclosed. Modifications and variations are possible in light of the foregoing teachings or may be obtained from the practice of various embodiments.

[0207] The electronic devices according to the various embodiments disclosed in this document may be of various forms. The electronic devices may include, for example, portable communication devices (e.g., smartphones), computer devices, portable multimedia devices, portable medical devices, cameras, electronic devices, or consumer electronics. The electronic devices according to the embodiments of this document are not limited to the devices described above.

[0208] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of said items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" may each include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "1st," "2nd," "first," or "second" may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., first) component is referred to as "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively," it means that the component may be connected to the other component directly (e.g., via a wire), wirelessly, or through a third component.

[0209] The term “module” as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0210] Various embodiments of the present document may be implemented as software (e.g., program (140)) comprising one or more instructions stored in a storage medium (e.g., internal memory (136) or external memory (138)) readable by a machine (e.g., electronic device (101)). For example, a processor (e.g., processor (120)) of the machine (e.g., electronic device (101)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, "non-transitory" simply means that the storage medium is a tangible device and does not include a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.

[0211] According to one embodiment, the method according to the various embodiments disclosed herein may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0212] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities, and some of the multiple entities may be separately disposed in other components. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as those performed by the corresponding component among the multiple components prior to integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

Claims

1. A wearable device comprising: at least one camera configured to perform eye tracking; at least one microphone configured to acquire a voice input; a display positioned in front of a user's eyes when the wearable device is worn by the user; memory storing instructions and including one or more storage media; and at least one processor including processing circuitry, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: while an assistant application is running, acquire the voice input through the at least one microphone and acquire gaze information of the user through the at least one camera; obtain, based on the voice input and the gaze information, criterion information for identifying a data set to be used to generate a prompt; identify, based on the criterion information, at least one data set among a plurality of data sets stored in the memory; generate the prompt based on the at least one data set; and display, through the display, a response to the voice input obtained based on the prompt.

2. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: identify, based on the gaze information, an object on which the user's gaze is focused; and obtain the criterion information representing at least one attribute that is identified, based on the voice input, from among attributes of the object.

3. The wearable device of claim 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to identify, among the plurality of data sets, the at least one data set representing the at least one attribute.

4. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: determine distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets; and identify the at least one data set by identifying, among the plurality of data sets, a data set having a minimum distance from the vector representing the criterion information.

5. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: determine distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets; and identify, among the plurality of data sets, the at least one data set that is within a predefined distance from the vector representing the criterion information.

6. The wearable device of claim 1, wherein the criterion information includes first criterion information associated with a first user intent and second criterion information associated with a second user intent, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: determine first distances between a first vector representing the first criterion information and a plurality of vectors representing the plurality of data sets; and determine second distances between a second vector representing the second criterion information and the plurality of vectors representing the plurality of data sets.

7. The wearable device of claim 6, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: determine whether a difference between an average of the first distances and an average of the second distances is within a threshold distance; based on determining that the difference is within the threshold distance, identify the at least one data set among the plurality of data sets using the criterion information including the first criterion information and the second criterion information; and based on determining that the difference is outside the threshold distance, identify the at least one data set among the plurality of data sets using whichever of the first criterion information and the second criterion information has the lower average distance.

8. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: obtain, based on the voice input and the gaze information, intent information indicating a user intent; and generate the prompt based on the at least one data set and the intent information.

9. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to: transmit the prompt generated based on the at least one data set to a server device; and obtain, from the server device, the response to the voice input generated based on the prompt.

10. The wearable device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the wearable device to obtain, using an on-device artificial intelligence model, the response according to the prompt generated based on the at least one data set.

11. A method performed by a wearable device including at least one camera configured to perform eye tracking, at least one microphone configured to acquire a voice input, memory, and a display, the method comprising: while an assistant application is running, acquiring the voice input through the at least one microphone and acquiring gaze information of a user through the at least one camera; obtaining, based on the voice input and the gaze information, criterion information for identifying a data set to be used to generate a prompt; identifying, based on the criterion information, at least one data set among a plurality of data sets stored in the memory; generating the prompt based on the at least one data set; and displaying, through the display, a response to the voice input obtained based on the prompt.

12. The method of claim 11, wherein the obtaining of the criterion information comprises: identifying, based on the gaze information, an object on which the user's gaze is focused; and obtaining the criterion information representing at least one attribute that is identified, based on the voice input, from among attributes of the object.

13. The method of claim 12, wherein the identifying of the at least one data set comprises identifying, among the plurality of data sets, the at least one data set representing the at least one attribute.

14. The method of claim 11, wherein the identifying of the at least one data set comprises: determining distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets; and identifying the at least one data set by identifying, among the plurality of data sets, a data set having a minimum distance from the vector representing the criterion information.

15. The method of claim 11, wherein the identifying of the at least one data set comprises: determining distances between a vector representing the criterion information and a plurality of vectors representing the plurality of data sets; and identifying, among the plurality of data sets, the at least one data set that is within a predetermined distance from the vector representing the criterion information.