Electronic device for controlling target on basis of voice command, and operating method thereof

The electronic device addresses the challenge of identifying controllable objects in complex interfaces by using a microphone and image sensor to create virtual objects for voice-controlled devices, enhancing user convenience and control capabilities.

WO2026121789A1 · PCT designated stage · Publication Date: 2026-06-11 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2025-12-02
Publication Date: 2026-06-11

Application Information

Patent Timeline

02 Dec 2025: Application
11 Jun 2026: Publication (WO2026121789A1)

IPC: G06F3/16; G10L15/22; G06F3/00; G06N20/00; G06F3/0484; H04L67/125; G06F3/048

AI Tagging

Application Domain

Sound input/output; Machine learning

Abstract

An electronic device is disclosed. An electronic device according to an embodiment of the present disclosure may comprise: a microphone; an image sensor; a memory for storing target information including information about a device or an application which can be controlled through a voice command; and at least one processor electrically connected to the microphone, the image sensor, and the memory, wherein the at least one processor is configured to: acquire a first user input through the microphone; acquire a surrounding space image of the electronic device through the image sensor in response to acquiring the first user input; identify one or more targets on the basis of the surrounding space image and the target information; generate one or more first virtual objects for the one or more targets; and display the generated first virtual objects.

Description

Electronic device for controlling a target based on voice commands and method of operation thereof

[0001] The embodiments disclosed in this document relate to an electronic device for controlling a target based on voice commands and a method of operating the same.

[0002] With the recent advancement of device control technology using voice commands, users can now execute or control various applications on their devices via voice. Furthermore, it has become possible to control not only the user's own device (e.g., smartphone) but also external electronic devices connected to it (e.g., TVs and air conditioners connected via Home IoT) using voice. Additionally, voice commands can be effectively utilized in XR (extended reality) devices. For instance, voice commands can be used in conjunction with gesture-based control actions to enhance work efficiency. Moreover, users can quickly perform intended actions without navigating complex menus. A natural and immersive user experience can be provided without disrupting visual workflows; through these characteristics, interfaces in various digital environments can be improved, and user convenience can be enhanced.

[0003] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0004] Existing voice assistant systems may have limitations in clearly providing users with information about objects controllable via voice commands. As a result, users may need to identify and learn on their own which objects can be controlled through voice commands (e.g., user devices, external electronic devices, applications, etc.). In the case of devices that display large amounts of information on a screen through complex interfaces, such as XR (extended reality) devices, users may experience greater difficulty recognizing on-screen objects controllable via voice commands compared to conventional smartphones or tablets.

[0005] Based on the discussion described above, the present disclosure provides an electronic device and method for enabling control of a target through voice commands.

[0006] An electronic device according to one embodiment of the present disclosure may be provided. The electronic device may include a microphone, an image sensor, at least one processor comprising a processing circuit, and a memory comprising at least one storage medium for storing instructions. The memory may store target information including information regarding a device or application that can be controlled via voice commands. The instructions may cause the electronic device to perform at least one action when executed individually or collectively by the at least one processor. The at least one action may include an action of obtaining a first user input through the microphone. The at least one action may include an action of obtaining an image of the surrounding space of the electronic device through the image sensor in response to obtaining the first user input. The at least one action may include an action of identifying one or more targets based on the surrounding space image and the target information. The at least one action may include an action of creating one or more first virtual objects for the one or more targets. The at least one action may include an action of displaying the created one or more first virtual objects.

[0007] A method of operation of an electronic device according to one embodiment of the present disclosure may be provided. The method of operation of the electronic device may include at least one operation. The at least one operation may include an operation of acquiring a first user input through the microphone. The at least one operation may include an operation of acquiring an image of the surrounding space of the electronic device through the image sensor in response to acquiring the first user input. The at least one operation may include an operation of identifying one or more targets based on the surrounding space image and the target information. The at least one operation may include an operation of creating one or more first virtual objects for the one or more targets. The at least one operation may include an operation of displaying the created first virtual object.
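The sequence of operations in [0007] can be pictured with a minimal Python sketch. Every name in it (`Target`, `VirtualObject`, `handle_voice_input`, and the `capture_image` and `detect_labels` callables) is hypothetical and does not come from the disclosure. The sketch also assumes that identification reduces to comparing detector labels with stored target information, which the disclosure does not state in this form.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Target:
    """A device or application controllable by voice (hypothetical model)."""
    target_id: str
    label: str  # label an object detector is assumed to emit for this target


@dataclass(frozen=True)
class VirtualObject:
    """Visual marker to be shown to the user for one identified target."""
    target_id: str
    caption: str


def handle_voice_input(
    first_user_input: str,
    capture_image: Callable[[], object],
    detect_labels: Callable[[object], Iterable[str]],
    target_info: List[Target],
) -> List[VirtualObject]:
    """Run the operations in order and return the virtual objects to display."""
    if not first_user_input:
        return []  # no first user input was acquired, so nothing is captured
    image = capture_image()  # acquired in response to the first user input
    seen = set(detect_labels(image))  # labels found in the surrounding space
    targets = [t for t in target_info if t.label in seen]
    # One first virtual object per identified target; rendering is left to the caller.
    return [VirtualObject(t.target_id, f"Voice control: {t.label}") for t in targets]
```

A caller would pass the real camera and detector as the two callables and hand the returned list to whatever component draws the overlay.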

[0008] According to one embodiment, a storage medium may be provided for storing at least one instruction readable by a computer. The at least one instruction may cause the electronic device to perform at least one operation when executed by at least a part of at least one processor of the electronic device. The at least one operation may include an operation of acquiring a first user input through the microphone. The at least one operation may include an operation of acquiring an image of the surrounding space of the electronic device through the image sensor in response to acquiring the first user input. The at least one operation may include an operation of identifying one or more targets based on the surrounding space image and the target information. The at least one operation may include an operation of creating one or more first virtual objects for the one or more targets. The at least one operation may include an operation of displaying the created first virtual object.

[0009] The embodiments of the present disclosure provide the effect of enabling the user to easily control devices and applications through voice commands by visually providing the user with targets controllable through voice commands.

[0010] In addition, embodiments of the present disclosure provide the effect of enabling control of a device that is out of the user's field of vision through voice commands.

[0011] The effects obtainable in the present disclosure are not limited to those mentioned in the various embodiments, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure pertains from the description below.

[0012] FIG. 1 is a block diagram of an electronic device in a network environment according to various embodiments.

[0013] FIG. 2 illustrates a block configuration of an electronic device according to one embodiment.

[0014] FIG. 3 illustrates the operation flow of an electronic device according to one embodiment.

[0015] FIG. 4 illustrates the operation flow of an electronic device according to one embodiment.

[0016] FIG. 5 illustrates the operation flow of an electronic device according to one embodiment.

[0017] FIG. 6 illustrates the operation flow of an electronic device according to one embodiment.

[0018] FIG. 7 illustrates the identification and display flow of a target controllable according to a voice command of an electronic device according to one embodiment.

[0019] FIG. 8 illustrates the identification and display flow of a target controllable according to a voice command of an electronic device according to one embodiment.

[0020] FIG. 9 illustrates an example of a virtual object controllable according to a voice command of an electronic device according to one embodiment.

[0021] FIG. 10 illustrates an example of a screen displayed through an electronic device according to one embodiment.

[0022] FIG. 11 illustrates an example of a screen displayed through an electronic device according to one embodiment.

[0023] FIG. 12 illustrates an example of a virtual object display of an electronic device according to one embodiment.

[0024] FIG. 13 illustrates an example of a virtual object display of an electronic device according to one embodiment.

[0025] FIG. 14 illustrates an example of a virtual object display of an electronic device according to one embodiment.

[0026] FIG. 15 illustrates an example of a virtual object display of an electronic device according to one embodiment.

[0027] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.

[0028] FIG. 1 is a block diagram of an electronic device (101) in a network environment (100) according to various embodiments. Referring to FIG. 1, in the network environment (100), the electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or may communicate with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through a server (108). According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).

[0029] The processor (120) can control at least one other component (e.g., hardware or software component) of the electronic device (101) connected to the processor (120) by executing software (e.g., program (140)), for example, and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., sensor module (176) or communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting data in non-volatile memory (134). According to one embodiment, the processor (120) may include a main processor (121) (e.g., central processing unit or application processor) or an auxiliary processor (123) that can operate independently or together with it (e.g., graphics processing unit, neural processing unit (NPU), image signal processor, sensor hub processor, or communication processor). For example, if the electronic device (101) includes a main processor (121) and an auxiliary processor (123), the auxiliary processor (123) may be configured to use lower power than the main processor (121) or to be specialized for a designated function. The auxiliary processor (123) may be implemented separately from the main processor (121) or as part thereof.

[0030] The auxiliary processor (123) may control at least some of the functions or states associated with at least one component of the electronic device (101) (e.g., display module (160), sensor module (176), or communication module (190)) on behalf of the main processor (121) while the main processor (121) is in an inactive (e.g., sleep) state, or together with the main processor (121) while the main processor (121) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (123) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (180) or communication module (190)). According to one embodiment, the auxiliary processor (123) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (101) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (108)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. In addition to the hardware structure, the artificial intelligence model may additionally or alternatively include a software structure.

[0031] The memory (130) can store various data used by at least one component of the electronic device (101) (e.g., processor (120) or sensor module (176)). The data may include, for example, input data or output data for software (e.g., program (140)) and related commands. The memory (130) may include volatile memory (132) or non-volatile memory (134).

[0032] The program (140) may be stored as software in memory (130) and may include, for example, an operating system (142), middleware (144), or an application (146).

[0033] The input module (150) can receive commands or data to be used for a component of the electronic device (101) (e.g., processor (120)) from outside the electronic device (101) (e.g., user). The input module (150) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0034] The sound output module (155) can output a sound signal to the outside of the electronic device (101). The sound output module (155) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0035] The display module (160) can visually provide information to the outside (e.g., a user) of the electronic device (101). The display module (160) may include, for example, a display, a holographic device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (160) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.

[0036] The audio module (170) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (170) can acquire sound through the input module (150) or output sound through the sound output module (155) or an external electronic device (e.g., electronic device (102)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (101).

[0037] The sensor module (176) can detect the operating state of the electronic device (101) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (176) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0038] The interface (177) may support one or more specified protocols that can be used for the electronic device (101) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (102)). According to one embodiment, the interface (177) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0039] The connection terminal (178) may include a connector through which the electronic device (101) can be physically connected to an external electronic device (e.g., electronic device (102)). According to one embodiment, the connection terminal (178) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0040] The haptic module (179) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module (179) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0041] The camera module (180) can capture still images and video. According to one embodiment, the camera module (180) may include one or more lenses, image sensors, image signal processors, or flashes.

[0042] The power management module (188) can manage the power supplied to the electronic device (101). According to one embodiment, the power management module (188) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0043] The battery (189) can supply power to at least one component of the electronic device (101). According to one embodiment, the battery (189) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0044] The communication module (190) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (101) and an external electronic device (e.g., electronic device (102), electronic device (104), or server (108)), and the performance of communication through the established communication channel. The communication module (190) may include one or more communication processors that operate independently of the processor (120) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (190) may include a wireless communication module (192) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (194) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (104) through a first network (198) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (199) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (192) can identify or authenticate the electronic device (101) within a communication network such as the first network (198) or the second network (199) using subscriber information (e.g., International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module (196).

[0045] The wireless communication module (192) can support 5G networks following 4G networks and next-generation communication technologies, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (192) can support a high-frequency band (e.g., mmWave band) to achieve a high data transmission rate, for example. The wireless communication module (192) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large-scale antenna. The wireless communication module (192) can support various requirements specified in the electronic device (101), an external electronic device (e.g., electronic device (104)), or a network system (e.g., second network (199)). According to one embodiment, the wireless communication module (192) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for realizing URLLC.

[0046] The antenna module (197) can transmit a signal or power to, or receive a signal or power from, the outside (e.g., an external electronic device). According to one embodiment, the antenna module (197) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (197) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network (198) or the second network (199), may be selected from the plurality of antennas, for example, by the communication module (190). A signal or power may be transmitted or received between the communication module (190) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (197). According to various embodiments, the antenna module (197) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0047] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0048] According to one embodiment, commands or data may be transmitted or received between the electronic device (101) and an external electronic device (104) through a server (108) connected to a second network (199). Each of the external electronic devices (102 or 104) may be the same or different type of device as the electronic device (101). According to one embodiment, all or part of the operations performed on the electronic device (101) may be performed on one or more of the external electronic devices (102, 104, or 108). For example, if the electronic device (101) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (101) may request one or more external electronic devices to perform at least part of the function or service instead of, or in addition to, performing the function or service itself. One or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (101). The electronic device (101) may provide the result as is or additionally processed as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (101) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (104) may include an Internet of Things (IoT) device. The server (108) may be an intelligent server using machine learning and/or neural networks. According to one embodiment, the external electronic device (104) or the server (108) may be included within the second network (199). The electronic device (101) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.

[0049] FIG. 2 illustrates a block configuration of an electronic device according to one embodiment. The electronic device (200) of FIG. 2 may include a device corresponding to the electronic device (101) of FIG. 1. Referring to FIG. 2, a functional configuration of an electronic device for identifying a target controllable according to a voice command, displaying a virtual object regarding the identified target, and controlling the target based on a voice command may be described.

[0050] According to one embodiment, the electronic device (200) may include a target management module (210), a virtual interface module (220), and a voice assistance module (230).

[0051] In one embodiment, the target management module (210) can identify and store targets that can be controlled via voice commands.

[0052] In one embodiment, the target may include a device or application that can be controlled via voice commands.

[0053] In one embodiment, the virtual interface module (220) can create a virtual object regarding a target that can be controlled via voice command, display the created virtual object, and perform a specific action based on the created virtual object.

[0054] In one embodiment, the voice assistance module (230) can identify a voice command. Additionally, it can transmit information regarding the identified voice command to the target management module (210) and the virtual interface module (220) so that an action corresponding to the voice command is performed.

[0055] According to one embodiment, the target management module (210) may include a target recognition module (212), a target information management module (214), and a target database (DB) (216).

[0056] In one embodiment, the target management module (210) can perform a series of operations to recognize and extract devices and applications controllable via voice commands based on the target recognition module (212), the target information management module (214), and the target DB (216).

[0057] In one embodiment, the target information management module (214) can store devices and applications controllable via voice commands in the target DB (216).

[0058] In one embodiment, the target recognition module (212) can perform object detection on an image obtained through the electronic device (200). If a device corresponding to an object detected through the target recognition module (212) matches one of the devices stored as targets in the target DB (216), the device corresponding to the detected object can be determined as a target.

[0059] In one embodiment, the target recognition module (212) performs object detection on an image obtained through the electronic device (200), but if no detected object exists, it can identify (extract) at least one target based on the user's location information (e.g., the room where the user is located, the distance from another room), and the distance information between the user and another device.
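The matching rule in [0058] and the fallback in [0059] can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `StoredTarget`, `identify_targets`, and the 5-metre threshold are hypothetical, detected objects are assumed to arrive as name strings, and the fallback is assumed to accept a device that is either in the user's room or within the distance threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredTarget:
    """One entry of the target DB (hypothetical fields)."""
    name: str
    room: str
    distance_m: float  # assumed distance between the user and the device


def identify_targets(
    detected: List[str],
    target_db: List[StoredTarget],
    user_room: str,
    max_distance_m: float = 5.0,  # illustrative threshold, not from the source
) -> List[StoredTarget]:
    """Return the targets chosen for the current image."""
    if detected:
        # [0058]: a detected object is a target only if it matches a stored device.
        by_name = {t.name: t for t in target_db}
        return [by_name[name] for name in detected if name in by_name]
    # [0059]: nothing was detected, so fall back to location and distance.
    return [
        t for t in target_db
        if t.room == user_room or t.distance_m <= max_distance_m
    ]
```

Note that the fallback runs only when the detector returns nothing at all, following the wording of [0059]; an image containing only unregistered objects yields an empty result in this sketch.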

[0060] In one embodiment, the target may include a device and an application that can be controlled via voice commands. For example, the target may include a real device (physical object) that can be controlled from a second device via voice commands or an application programming interface (API), an app icon that is arbitrarily generated on the screen the user is viewing and can be controlled via voice commands or an API, and an app-related UI (virtual object).

[0061] In one embodiment, the target management module (210) can recognize and extract a target present in an image (screen) viewed by a user through an electronic device by means of a target recognition module (212), based on information regarding devices and applications controllable via voice commands stored in the target DB (216). For convenience of explanation, devices and applications controllable via voice commands may be referred to as “targets” and information regarding devices and applications controllable via voice commands may be referred to as “target information” in the following description.

[0062] In one embodiment, the target management module (210) can extract targets according to the target extraction option.

[0063] In one embodiment, the target management module (210) can identify an IoT device included in the screen that the user is viewing through the electronic device, an IoT device that is not included in the screen that the user is viewing through the electronic device but exists in the space where the user is located (e.g., room) or in the direction the user is looking (e.g., adjacent room), and a virtual object (e.g., application) that can be executed on the screen that the user is viewing. In the following description, the IoT device may include an electronic device connected to the same network as the electronic device. For example, the IoT device may include a plurality of home appliances (e.g., TV, monitor, washing machine, air conditioner, speaker, etc.) connected to the same home network as the electronic device.

[0064] In one embodiment, the target information management module (214) can store and manage target information in the target DB (216). Target information may include user information of an electronic device (e.g., user ID, information regarding devices and applications available to the user (authorization information), information regarding the user's voice, information regarding the user's usage history of devices and applications, information regarding the user's control history of devices and applications via voice commands, etc.), device information of a device controllable via voice commands (e.g., identification information, images (e.g., images stored on an IoT server, images or videos taken and uploaded by the user, or rendered images), device information (e.g., model name, serial number), information regarding supported voice commands, location information of the device (e.g., coordinates, the room where it is located), information on the network to which the device is connected, information regarding other devices to which the device is connected, etc.), information regarding a virtual object (e.g., application ID, name, package name, control level, information regarding controllable individuals or groups, information regarding supported voice commands, information regarding the API provided for control, information on the state of the virtual object (e.g., execution state information), image information of the virtual object (e.g., icon image, UI object image, etc.)), and images of the space where the user is located (e.g., a house or room) (e.g., a floor plan, captured image, or video).
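For illustration only, the categories of target information listed above could be held in records such as the following. This is a minimal sketch: the class names, field names, and the `is_authorized` helper are hypothetical and are not defined in the present disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceInfo:
    """Hypothetical record for a device controllable via voice commands."""
    device_id: str
    model_name: str
    room: str                       # location information (room where it is located)
    supported_commands: List[str] = field(default_factory=list)
    network: Optional[str] = None   # network to which the device is connected


@dataclass
class VirtualObjectInfo:
    """Hypothetical record for an application-type target."""
    app_id: str
    package_name: str
    supported_commands: List[str] = field(default_factory=list)
    execution_state: str = "stopped"


@dataclass
class TargetInfo:
    """Hypothetical container for the categories of target information."""
    user_id: str
    authorized_ids: List[str] = field(default_factory=list)
    devices: List[DeviceInfo] = field(default_factory=list)
    virtual_objects: List[VirtualObjectInfo] = field(default_factory=list)

    def is_authorized(self, target_id: str) -> bool:
        # Authorization information: is this target available to the user?
        return target_id in self.authorized_ids
```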

[0065] In one embodiment, when the target is an application, information regarding a virtual object may be stored in a memory area that stores information about the application. In this case, when a specific event (e.g., control mode or voice command mode) occurs, each application determines whether it can be controlled by voice commands and may change the UI of the virtual object or add specific functions to the object according to a predefined action.

[0066] In one embodiment, even when the control mode is not set, if a change in the user's location information is identified (e.g., if the user is identified to be moving in a room), the electronic device can identify and display information regarding devices and applications that can be controlled via voice commands.

[0067] In one embodiment, the target recognition module (212) can identify at least one target based on target information stored in the target DB (216), information regarding the screen that the user is viewing through an electronic device (e.g., an image captured through an image sensor or camera), and the user's location information.

[0068] According to one embodiment, the virtual interface module (220) may include a virtual interface creation module (222), a position optimization module (224), an effect application module (226), and an interaction management module (228).

[0069] In one embodiment, the virtual interface creation module (222) can update the UI regarding the target (create UI objects) based on the target information transmitted from the target management module (210) so that information regarding the target can be intuitively understood.

[0070] In one embodiment, the virtual object may include an object that allows the user to intuitively identify through the UI that the target is an object controllable via voice commands. For example, the virtual object may include 2D icons, 3D icons, buttons, and widgets. For example, the virtual object may have the form of an icon in which an image of the device is included in the application icon.

[0071] In one embodiment, the virtual interface creation module (222) can generate virtual objects of different sizes by reflecting the perspective between the actual device and the user. In other words, the virtual interface creation module (222) can determine the size of the virtual object based on the distance between the electronic device and the target.
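As one possible illustration of sizing a virtual object by distance, the sketch below scales the size inversely with distance and clamps the result. The disclosure does not specify a formula; the inverse-distance rule, the function name, and all default values are assumptions.

```python
def virtual_object_size(distance_m: float,
                        base_size_px: float = 96.0,
                        reference_distance_m: float = 1.0,
                        min_px: float = 24.0,
                        max_px: float = 192.0) -> float:
    """Return a display size in pixels for a target at the given distance.

    Assumed rule: size shrinks in proportion to 1/distance (simple
    perspective) and is clamped so the object stays legible.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    size = base_size_px * reference_distance_m / distance_m
    return max(min_px, min(max_px, size))
```

Under these assumptions, a target twice as far away as the reference distance receives a virtual object half the base size, until the lower clamp is reached.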

[0072] In one embodiment, a virtual object can be created based on features such as the shape and color information of the target.

[0073] In one embodiment, when user input (e.g., a user gazing at a virtual object) is identified regarding a virtual object, subsequently identified voice commands can be determined to be for that virtual object. For example, when a user looks at a created virtual object and gives a voice command, a target corresponding to that object is predefined as the target of the command, and the command can be interpreted to control the target. For example, if a user performs a gesture of clicking a virtual object, an ON / OFF action of the corresponding target can be performed. For example, when user input is identified regarding a virtual object, examples of commands to control a target corresponding to the virtual object, examples of commands frequently used by actual users regarding the target, examples of commands frequently used by multiple users, examples of commands likely to be used at the current time depending on the user's state, or a combination of the aforementioned examples may be displayed together.

[0074] In one embodiment, the interaction management module (228) can perform a series of operations to enable the user to control the device and application through the user's voice commands via the generated UI object.

[0075] In one embodiment, the effect application module (226) can apply an effect (e.g., highlighting) to an existing device or application to indicate that the target is controllable.

[0076] In one embodiment, the effect application module (226) may display an icon or widget form generated through the virtual interface creation module (222) to indicate that the device or application is controllable.

[0077] In one embodiment, the position optimization module (224) can determine the display position of a virtual object. For example, the display position may include pixels corresponding to the device within the image frame and pixels within a predetermined range.

[0078] In one embodiment, the interaction management module (228) can perform a predefined action according to the user input when user input (voice command) for a created virtual object is identified.

[0079] In one embodiment, the effect application module (226) can apply various dynamic UI effects to a target. For example, the effect application module (226) can obtain information regarding the shape of the device by performing object detection or edge detection on a device recognized as controllable. Based on the information regarding the shape of the device, the effect application module (226) can apply a highlighting effect to the edges of the device. At this time, if it is recognized that a user is looking at a specific device, the effect displayed on the specific device (e.g., highlighting) can be changed (e.g., by changing its color or intensity). Through the application of such effects, the user can be given visual confirmation that the user's gaze has been correctly recognized.

[0080] In one embodiment, when user input (e.g., gazing at a virtual object on the target) is identified with respect to the target, a specific effect may be applied to the target or the virtual object on the target. Subsequently, when the user makes a voice command, the electronic device (200) may pre-define the target as the target of the command and interpret the user's voice command.

[0081] In one embodiment, when user input (e.g., gazing at a virtual object on the target) is identified for a target, the virtual object displayed in relation to other devices excluding the target may not be displayed, or UI effects applied to the device may be disabled.

[0082] In one embodiment, the position optimization module (224) can adjust the position of the generated virtual object to place it at an optimized position on the user screen.

[0083] In one embodiment, the electronic device can identify at least one target by performing object detection on an image of the surrounding space of the electronic device acquired through the electronic device. The electronic device can determine the location where a virtual object is to be displayed on the target by utilizing coordinate information for the detected target. Within a predetermined range of the coordinate information for the detected target, the electronic device can update the location where the virtual object is to be displayed by considering the user's line of sight and the locations of other virtual objects and elements displayed on the screen. For example, if the target is located at the bottom right of the image acquired through the electronic device, the location where the virtual object is to be displayed may be determined to be at the top or left of the target, rather than to the right or bottom of the target.

[0084] In one embodiment, the electronic device can determine the location of a virtual object so as not to overlap with the location of another object (e.g., a real object existing within the space where the actual user is located) if the location where the virtual object is displayed for a target overlaps with the location of another object included in an image acquired through the electronic device.
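The placement behavior described in the two preceding paragraphs can be sketched as follows. This is only one way to realize it: the candidate ordering, the `margin` parameter, and the rectangle representation are assumptions, and the user's line of sight is not modeled here.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), y grows downward


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_virtual_object(target: Rect, size: Tuple[float, float],
                         screen: Tuple[float, float],
                         occupied: List[Rect],
                         margin: float = 8.0) -> Optional[Rect]:
    """Pick a position next to the target that is on screen and unoccupied."""
    tx, ty, tw, th = target
    w, h = size
    sw, sh = screen
    above = (tx + (tw - w) / 2, ty - margin - h, w, h)
    below = (tx + (tw - w) / 2, ty + th + margin, w, h)
    left = (tx - margin - w, ty + (th - h) / 2, w, h)
    right = (tx + tw + margin, ty + (th - h) / 2, w, h)
    # Prefer the sides facing the screen center: a target at the bottom
    # right is tried above first, then to the left.
    in_bottom = ty + th / 2 > sh / 2
    in_right = tx + tw / 2 > sw / 2
    vertical = [above, below] if in_bottom else [below, above]
    horizontal = [left, right] if in_right else [right, left]
    candidates = [vertical[0], horizontal[0], vertical[1], horizontal[1]]
    for cand in candidates:
        cx, cy, cw, ch = cand
        on_screen = cx >= 0 and cy >= 0 and cx + cw <= sw and cy + ch <= sh
        if on_screen and not any(overlaps(cand, o) for o in occupied):
            return cand
    return None  # no free position within the assumed range
```

Returning `None` when every candidate is blocked leaves the fallback (e.g., hiding the object or widening the search range) to the caller, since the disclosure does not state one.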

[0085] In one embodiment, the interaction management module (228) may identify an event corresponding to the additional user input when identifying additional user input regarding a virtual object or UI effect for a target, and perform an action corresponding to the identified event. User input may be identified by interactions such as user gaze information, gestures, voice commands, and physical button selections.

[0086] In one embodiment, the electronic device can perform an action corresponding to a voice command for a target identified by user input.

[0087] In one embodiment, the electronic device can perform an On / Off function of a target for a target identified by user input.

[0088] In one embodiment, the electronic device may display for a target identified by user input examples of commands supported by the target, examples of commands frequently used by actual users, examples of commands frequently used by multiple users, examples of commands likely to be used at the current time based on the user state, or a combination of the aforementioned examples.

[0089] According to one embodiment, the voice assistance module (230) may include a voice control detection module (232), an ASR (234), an NLU (236), and an execution module (238).

[0090] In one embodiment, the voice assistance module (230) can understand user commands and analyze intent to provide appropriate responses and actions.

[0091] In one embodiment, the voice assistance module (230) can control the app and the device according to voice commands.

[0092] In one embodiment, the voice assistance module (230) can search for information desired by the user according to voice commands.

[0093] In one embodiment, the voice control detection module (232) may perform the function of recognizing that a user has started a command. For example, the voice control detection module (232) may determine that the user has started a command if a specified call command is identified. For example, the voice control detection module (232) may determine that the user has started a voice command if a specified gesture is identified. For example, the voice control detection module (232) may determine that the user has started a command if the user's voice is identified in an environment without a listener. For example, the voice control detection module (232) may determine that the user has started a command if it identifies that a specific physical button is selected.
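The four start conditions listed above can be expressed as a single check. The sketch below is illustrative only: the `InputEvent` type, the wake phrase, the gesture name, and the button name are hypothetical placeholders, and how each signal is actually sensed is outside the scope of the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputEvent:
    transcript: Optional[str] = None   # recognized speech, if any
    gesture: Optional[str] = None      # recognized gesture label, if any
    button: Optional[str] = None       # pressed physical button, if any
    listener_present: bool = True      # whether a listener appears to be present


WAKE_PHRASE = "voice control mode"     # hypothetical call command
START_GESTURE = "pinch"                # hypothetical designated gesture
START_BUTTON = "side_key"              # hypothetical physical button


def command_started(event: InputEvent) -> bool:
    """Return True if any of the described start conditions is met."""
    if event.transcript and WAKE_PHRASE in event.transcript.lower():
        return True                    # specified call command identified
    if event.gesture == START_GESTURE:
        return True                    # specified gesture identified
    if event.transcript and not event.listener_present:
        return True                    # user speaks with no listener present
    return event.button == START_BUTTON
```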

[0094] In one embodiment, the ASR (234) can convert the user's voice into text.

[0095] In one embodiment, the NLU (236) can analyze the intent of a command based on text. Because the target of the command is already determined when a voice command is transmitted, the NLU (236) can analyze the intent of the command with the domain defined in advance. The NLU (236) can determine the intent of the command by including, in a prompt for command analysis, information indicating the target to which the command is directed.
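As a rough illustration of passing the target to a prompt for command analysis, the sketch below assembles a text prompt that names the target and its supported commands. The prompt wording, the function name, and the command labels are assumptions; the disclosure does not define a prompt format or a particular language model.

```python
from typing import List


def build_intent_prompt(command_text: str, target_name: str,
                        supported_commands: List[str]) -> str:
    """Build a prompt in which the domain (target) is fixed in advance."""
    options = ", ".join(supported_commands)
    return (
        f"Target: {target_name}\n"
        f"Supported commands: {options}\n"
        f"User said: \"{command_text}\"\n"
        "Which supported command does the user intend?"
    )
```

Because the target is stated up front, an ambiguous utterance such as "make it louder" only needs to be matched against the commands of that one target.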

[0096] In one embodiment, the execution module (238) can execute an actual operation based on the analyzed results. For example, the execution module (238) can perform an actual operation using the command intent classification result received from the NLU (236).

[0097] FIG. 3 illustrates the operation flow of an electronic device according to one embodiment. The electronic device of FIG. 3 may include a device corresponding to the electronic device (100) of FIG. 1 and the electronic device (200) of FIG. 2. In the following description, the electronic device is described on the premise that it is an XR device capable of providing augmented reality services to a user, but this is merely an example and can be applied to various types of user devices such as smartphones and wearable devices.

[0098] According to one embodiment, in operation 310, the electronic device can identify a first user input through a microphone.

[0099] In one embodiment, the first user input may include a user input for executing a mode to control the device via a voice command. For example, the first user input may include a designated call command, namely a voice command including “Voice Control Mode” and “Activate”. For example, the first user input may include a designated gesture. For example, the first user input may include the voice of a user identified when there is no noise in the space where the electronic device is located (e.g., when the ambient noise level is below a predetermined value). For example, the first user input may include an input through a physical input unit identified via the electronic device.

[0100] In one embodiment, the microphone may include a microphone included in an electronic device. For example, the electronic device may identify a voice containing an intention to control the device through a user's voice command via the microphone of the electronic device.

[0101] In one embodiment, the microphone may include a microphone included in another device adjacent to the electronic device. In this case, the other device may identify the first user input and transmit information related thereto to the electronic device. For example, if a TV identifies, through a microphone included in the TV, a voice of the user of the electronic device for executing a mode for controlling a device via voice commands, the TV may notify the electronic device, other devices, and an IoT server that the first user input has been identified. The electronic device may receive this notification from the TV and, accordingly, identify the first user input.

[0102] In one embodiment, the electronic device may determine whether the first user input is from a user who has the authority to control the device or application via voice commands. For example, the electronic device may identify a voice command that activates a voice control mode and determine whether the voice command was spoken by an authorized user. The electronic device may identify the user who generated the first user input based on the first user input and voice information previously stored in the electronic device or on a server, and determine whether the identified user is an authorized user. All of the following operations are described on the premise that the identified user is an authorized user, but this is merely an example, and if the electronic device identifies that the first user input was generated by an unauthorized user, the following operations may not be performed.
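The disclosure does not state how the voice of the first user input is compared with the stored voice information. Purely as an assumed example, the sketch below compares a voice embedding against enrolled embeddings using cosine similarity and then checks authorization. The embedding representation, the 0.8 threshold, and all names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching enrolled user, or None if below threshold."""
    best_user, best_score = None, threshold
    for user, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def may_enter_control_mode(embedding: Sequence[float],
                           enrolled: Dict[str, Sequence[float]],
                           authorized: Set[str]) -> bool:
    """Subsequent operations proceed only for an identified, authorized user."""
    user = identify_speaker(embedding, enrolled)
    return user is not None and user in authorized
```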

[0103] According to one embodiment, in operation 320, the electronic device can acquire an image of the surrounding space of the electronic device through an image sensor.

[0104] In one embodiment, the image sensor may include all or part of the configuration of the camera module (180) of FIG. 1. The image or video acquired through the image sensor may include an image or video in the direction in which the image sensor of the electronic device is facing.

[0105] In one embodiment, an image or video acquired through an image sensor can be displayed or projected onto a display, screen, etc. of an electronic device, and a user can view the image or video displayed on the display, screen, etc. For example, the electronic device can display an image or video acquired in real time through the image sensor toward the user's eyes or on a screen that the eyes are looking at. The user can identify the image or video acquired through the image sensor of the electronic device.

[0106] In one embodiment, the electronic device may display images or video acquired through an image sensor together with virtual objects (e.g., icons, rendered images, etc.) generated or pre-stored by the electronic device and UI effects. The electronic device may provide augmented reality-related content to the user by providing virtual objects and UI effects generated by the electronic device, received from an external device, or stored in the electronic device, along with images and video regarding the actual surrounding space where the user is located. For example, the electronic device may display on the screen icons for various types of home appliances (e.g., TVs, speakers, refrigerators, etc.) placed in the space where the user is located, along with UI effects that highlight the icons.

[0107] In one embodiment, the surrounding space image may include an image of the direction the user is looking. The electronic device may be worn by the user, and the direction the user is looking may coincide with the direction the image sensor is facing. The image or video acquired through the image sensor may include an image or video of the direction the user is looking.

[0108] According to one embodiment, in operation 330, the electronic device may identify one or more targets based on an image of the surrounding space of the electronic device and target information. The target information may include information corresponding to the target information described in FIG. 2. A target may refer to a device and an application controllable via voice command. The target information may be stored in the memory of the electronic device or in an external device.

[0109] In one embodiment, the electronic device can analyze an image of the surrounding space to determine whether at least one physical device exists within the image of the surrounding space. The electronic device can take the image of the surrounding space as input and determine whether a physical device is included within the image based on a machine learning model trained to detect objects included within the image (which may be stored in the electronic device or in an IoT server).

[0110] In one embodiment, the electronic device may determine that a physical device exists in an image of the surrounding space. For example, the electronic device may analyze an acquired image to detect an object having the shape of a TV within the image and, accordingly, determine that a TV exists in the image of the surrounding space. For example, the electronic device may perform object detection using a machine learning model on an acquired image to detect an object having the shape of a chair within the image and, accordingly, determine that a chair exists in the image of the surrounding space.

[0111] In one embodiment, if the electronic device determines that a physical device exists, it may determine, based on target information, whether the physical device is a device that can be controlled by voice commands. If the physical device is a device that can be controlled by voice commands, the electronic device may determine the physical device as a target. If the physical device is a device that cannot be controlled by voice commands, the electronic device may determine that the physical device is not a target. For example, if the physical device is a TV that can be controlled by voice commands, the electronic device may determine the physical device as a target. For example, if the physical device is a chair that cannot be controlled by voice commands, the electronic device may determine that the physical device is not a target.
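The target determination described above amounts to filtering detected objects against the target information. A minimal sketch follows, assuming that object detection has already produced a list of labels and that the target information can be reduced to a label-to-flag mapping. Both representations are simplifications introduced for illustration.

```python
from typing import Dict, List


def select_targets(detected_labels: List[str],
                   voice_controllable: Dict[str, bool]) -> List[str]:
    """Keep only detected objects that the target information marks as
    controllable by voice commands; unknown objects are not targets."""
    return [label for label in detected_labels
            if voice_controllable.get(label, False)]
```

For example, with detections `["tv", "chair"]` and target information that lists only the TV as voice-controllable, the TV is determined as a target and the chair is not.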

[0112] In one embodiment, the electronic device may determine that no IoT device exists in the surrounding space image. For example, the electronic device may analyze the surrounding space image and determine that there is no object having the shape of a specific device within the image.

[0113] In one embodiment, the electronic device can identify one or more targets based on the location information of the electronic device, i.e., the location information of the user. Regardless of the surrounding space image acquired, if the user's current location is identified, the electronic device can determine as targets devices and applications controllable via voice commands located at that location (e.g., a living room). Alternatively, the electronic device can identify the user's target space (e.g., a room located in the direction of view) based on the identified user's current location and the surrounding space image acquired. The electronic device can determine as targets devices and applications controllable via voice commands located at the user's target space.

[0114] In one embodiment, the electronic device may determine as a target at least one application that the electronic device can currently control via voice commands based on the current state of the electronic device. For example, an application that uses an API corresponding to the API for voice command control, such as a phone, text message, or internet browsing installed on the electronic device, may be determined as a target. However, among the applications installed on the electronic device, an application that does not use an API corresponding to the API for voice command control (e.g., an application installed from a separate server that is not installed by default on the electronic device or capable of interacting with the electronic device) may be determined not to be a target.

[0115] In one embodiment, the electronic device can identify application information of a screen viewed by a user. Based on the application information and target information of the screen viewed by the user, the electronic device can identify an application that can be controlled via voice command. Subsequently, additional information such as the application's ID information and MAC address can be extracted.

[0116] According to one embodiment, in operation 340, the electronic device may create one or more first virtual objects for one or more targets. Operation 340 may include all of the operation contents of the virtual interface module (220) of FIG. 2.

[0117] In one embodiment, the first virtual object is a virtual object created for a target and may include an icon, a rendered image, a UI effect displayed in a part of the target, etc.

[0118] In one embodiment, the electronic device may generate first virtual objects for each of one or more targets. For example, if one or more targets are all IoT devices, an icon may be generated for each IoT device.

[0119] According to one embodiment, in operation 350, the electronic device may display one or more first virtual objects. Operation 350 may include all of the operation contents of the virtual interface module (220) of FIG. 2.

[0120] In one embodiment, the electronic device may display a first virtual object within a predetermined range at a location where a target corresponding to the generated first virtual object exists on the screen. For example, if the electronic device identifies a TV included in an image as a target, it may display an icon for the TV within a predetermined distance from the area occupied by the TV on the image.

[0121] An example of displaying the first virtual object according to operation 350 is described in detail with reference to the drawings described below.

[0122] The operations described in FIG. 3 relate to a flow of operations for identifying devices and applications that can be controlled via voice commands among devices existing within the space where the user is located and applications that the user can execute, according to the user's voice command, and for visually displaying them. After the operations according to FIG. 3 are performed, the user can execute devices and applications via voice commands, and the operations for this can be described in the drawings described later.

[0123] FIG. 4 illustrates the operation flow of an electronic device according to one embodiment. The electronic device of FIG. 4 may include a device corresponding to the electronic device of FIG. 1 to FIG. 3. In describing the operation details of FIG. 4, details that overlap with those described in FIG. 1 to FIG. 3 may be omitted. FIG. 4 may partially overlap with the operation details of the electronic device according to FIG. 3 and may include details regarding an operation flow in which the device and application are controlled according to user input after the display of a virtual object described in FIG. 3.

[0124] According to one embodiment, in operation 410, the electronic device can obtain a second user input for any one of one or more targets.

[0125] In one embodiment, the second user input may include a user input for selecting a first target or a first virtual object created for the first target.

[0126] In one embodiment, any one of the one or more targets may include a target identified as being controlled by a user. For convenience of explanation, among the one or more targets, the target where the second user input is obtained may be referred to as the first target, and the target where it is not obtained may be referred to as the second target.

[0127] In one embodiment, the second user input may include a user input gazing at a first target or a first virtual object for the first target. For example, the electronic device may track the movement of the user's eyeball through a camera module (e.g., a front camera facing the user's eyes) and identify the location where the user is gazing. Accordingly, the electronic device may determine the target corresponding to the location where the user is gazing as the first target.
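Once a gaze location on the screen has been estimated, mapping it to a target can be as simple as a bounding-box hit test. The sketch below assumes that the gaze point and the target boxes are expressed in the same screen coordinates. The eye-tracking step itself is not shown, and the tie-break for overlapping boxes is an assumption.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in screen pixels


def target_at_gaze(gaze: Tuple[float, float],
                   boxes: Dict[str, Box]) -> Optional[str]:
    """Return the name of the target whose box contains the gaze point."""
    gx, gy = gaze
    hits = [
        (w * h, name)
        for name, (x, y, w, h) in boxes.items()
        if x <= gx <= x + w and y <= gy <= y + h
    ]
    # If boxes overlap, prefer the smallest one (an assumption, not stated
    # in the disclosure).
    return min(hits)[1] if hits else None
```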

[0128] In one embodiment, the second user input may include a gesture for selecting the first target or the first virtual object for the first target. For example, if the electronic device identifies a hand gesture in which the user grabs the point where the TV is located, the electronic device may determine that a second user input for the TV has been acquired.

[0129] In one embodiment, the electronic device may determine that the first target is a target that the user controls via voice commands. Accordingly, operations corresponding to subsequently acquired voice commands may be performed with respect to the first target.

[0130] According to one embodiment, in operation 420, the electronic device can create a second virtual object for one target.

[0131] In one embodiment, the second virtual object may include an object for visually indicating that the selected first target has been selected by the user.

[0132] In one embodiment, the second virtual object may include an object in which the form (e.g., color, shape) of the first virtual object corresponding to the selected first target is partially modified or supplemented. For example, referring to FIG. 7, the second virtual object (732) may include an object in which the color of the first virtual object (722) for the target (712) is changed. For example, referring to FIG. 8, the second virtual object (832) may include an object in which a highlight effect is applied to the entire area of the target, whereas the first virtual object (822) for the target (712) applies the highlight effect only to the border portion.

[0133] According to one embodiment, in operation 430, the electronic device can display a second virtual object. The electronic device can display a second virtual object created in operation 420.

[0134] In one embodiment, the electronic device can display a second virtual object in place of a first virtual object. In other words, it can display a second virtual object without displaying the first virtual object.

[0135] In one embodiment, the electronic device may display a second virtual object at a location corresponding to the location where the first virtual object is displayed.

[0136] In one embodiment, the electronic device may display the second virtual object at a location different from the location where the first virtual object is displayed. The location where the second virtual object is displayed may include, for example, a location corresponding to the location where the second user input is identified.

[0137] According to one embodiment, in operation 440, the electronic device may acquire a third user input. The third user input may include a voice command from a user to control a target.

[0138] In one embodiment, a third user input can be obtained through a microphone of an electronic device.

[0139] In one embodiment, the third user input may include an input signal that is acquired through an acoustic input unit (e.g., a microphone) of an external electronic device located near the electronic device, rather than a microphone of the electronic device, and transmitted to the electronic device.

[0140] In one embodiment, the third user input may include a voice command requesting the execution of a target. For example, the third user input may include a voice command requesting the execution of a specific device or application, such as “Turn on the TV” or “Run the application.”

[0141] In one embodiment, the third user input may include a voice command requesting a specific task to be performed through a target. For example, the third user input may include a voice command requesting a specific task to be performed through a specific device or application, such as “turn on the education channel,” “call a friend,” “raise the indoor temperature,” “adjust the sound,” or “tell me the weather for tomorrow.”

[0142] According to one embodiment, in operation 450, the electronic device may perform an operation corresponding to a third user input with respect to a target. This may include all operations that perform an operation according to the third user input.

[0143] In one embodiment, when the third user input includes information about one target (the first target), the electronic device can perform an operation corresponding to the third user input with respect to the first target.

[0144] In one embodiment, even if the third user input does not include information identifying a target, the electronic device may perform an operation corresponding to the third user input with respect to the first target selected by the second user input.

[0145] In one embodiment, if the third user input includes a control command regarding a target other than the first target, the electronic device may perform an operation corresponding to the third user input with respect to the other target.

[0146] In one embodiment, if the third user input includes information executable for a plurality of targets, the electronic device may perform an operation corresponding to the third user input for one of the identified targets.
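The four cases above (the command names the first target, names no target, names a different target, or applies to several targets) can be summarized in one resolution routine. The sketch below is an assumption-laden illustration: targets are matched by simple substring search on the recognized text, and when several targets are mentioned the selected one is preferred, otherwise the first match is used. The disclosure does not specify which of several targets is chosen.

```python
from typing import List, Optional


def resolve_target(command_text: str, selected_target: Optional[str],
                   known_targets: List[str]) -> Optional[str]:
    """Decide which target a voice command should be applied to."""
    text = command_text.lower()
    mentioned = [t for t in known_targets if t.lower() in text]
    if not mentioned:
        return selected_target        # no target named: use the first target
    if selected_target in mentioned:
        return selected_target        # the first target is named explicitly
    return mentioned[0]               # another target is named: use it
```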

[0147] FIG. 5 illustrates the operation flow of an electronic device according to one embodiment. The electronic device of FIG. 5 may include a device corresponding to the electronic device of FIG. 1 to FIG. 4. In the description of FIG. 5, descriptions that overlap with those described in FIG. 1 to FIG. 4 may be omitted.

[0148] According to one embodiment, in operation 510, the electronic device may acquire an image of the surrounding space of the electronic device. Operation 510 may include all operation contents corresponding to operation 320 of FIG. 3.

[0149] According to one embodiment, in operation 520, the electronic device may determine whether at least one object is identified in the image. The image may include an image of the surrounding space of the electronic device obtained in operation 510.

[0150] In one embodiment, at least one object may include an object regarding a device controllable via voice command and an object regarding a device not controllable via voice command.

[0151] In one embodiment, the electronic device can determine whether at least one object is identified with respect to an acquired surrounding space image by using a machine learning model trained to detect objects included in an input image.

[0152] In one embodiment, the electronic device may determine whether at least one object is identified in an image based on home rendering image information included in the target information of the electronic device. The target information may include information regarding external electronic devices used by the user of the electronic device (e.g., product images, rendered images, actual captured images, or video). By utilizing such information when performing object detection, the accuracy and efficiency of object detection can be increased.

[0153] In one embodiment, the electronic device can perform object detection for the device types stored in the target information (e.g., TV, speaker, refrigerator) on images captured frame by frame from the real-time video of the scene viewed by the user. Subsequently, the electronic device can determine whether there is a match by comparing each detected object with the device images and rendered images stored in the target information.

[0154] In one embodiment, the electronic device can determine whether a device or application corresponding to at least one object exists in the target information based on both the at least one object identified in the image and the location of the user. In other words, instead of performing operation 540 separately after confirming whether the object exists in the target information, the electronic device can determine in a single step whether a device or application corresponding to the object is included in the target information, based on the object identified in operation 520 and the identified location of the user.

[0155] In one embodiment, if the electronic device determines in operation 520 that at least one object has been identified in an image, it may perform operation 530.

[0156] According to one embodiment, in operation 530, the electronic device can determine whether a device or application corresponding to the object exists in the target information.

[0157] In one embodiment, the electronic device can determine whether an object detected through object detection is a device or application that can be controlled via voice commands. For example, the electronic device can determine the identified object as a target if the identified object matches any one of the devices that can be controlled via voice commands in the target information. For example, the electronic device can determine that the identified object is not a target if the identified object is not included in the devices that can be controlled via voice commands in the target information.

[0158] In one embodiment, the electronic device may determine that a device or application corresponding to an object exists in the target information if the device or application corresponding to the object matches any one of the devices and applications controllable through voice commands included in the target information. The electronic device may determine that a device or application corresponding to an object does not exist in the target information if the device or application corresponding to the object does not match any of the devices and applications controllable through voice commands included in the target information.

[0159] In one embodiment, the electronic device may determine an object as a target if it determines that a device or application corresponding to the object exists in the target information. In one embodiment, the electronic device may determine that an object is not a target if it determines that a device or application corresponding to the object does not exist in the target information.

[0160] In one embodiment, if the electronic device determines in operation 530 that a device corresponding to an object exists in the target information, the electronic device may perform operation 540.

[0161] According to one embodiment, in operation 540, the electronic device may determine whether the location of the object identified through the surrounding space image matches the location of the device or application corresponding to the object. Operation 540 may include an operation in which the electronic device determines whether the room information of the matched candidate device or application corresponds to the room that the user is currently viewing through the screen.

[0162] In one embodiment, if the electronic device determines in operation 540 that the location of the device corresponding to the object and the location of the object identified through the surrounding space image match, in operation 550, the device or application corresponding to the object may be determined as a target and additional information regarding the target may be extracted.

[0163] In one embodiment, additional information regarding the target may include information for controlling the target via voice commands. For example, additional information regarding the target may include the target's ID, coordinate information (e.g., 2D bounding box coordinates), 3D coordinates, and depth information.
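Operations 520 to 550 can be summarized in a short sketch. It assumes that an object detector has already produced labeled bounding boxes, and that the target information is available as simple records; the `Detection` and `TargetRecord` types, their field names, and the string comparison used for matching are hypothetical simplifications of the image-based matching described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of the object detector (operation 520)."""
    label: str                        # detected device type, e.g. "tv"
    bbox: Tuple[int, int, int, int]   # 2D box: (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class TargetRecord:
    """Hypothetical entry of the stored target information."""
    target_id: str
    device_type: str
    room: str
    voice_controllable: bool


def identify_targets(
    detections: Sequence[Detection],
    target_info: Sequence[TargetRecord],
    viewed_room: str,
) -> List[Dict[str, object]]:
    """Return the targets together with their additional information."""
    targets: List[Dict[str, object]] = []
    for det in detections:
        for rec in target_info:
            # Operation 530: is there a voice-controllable entry for this object?
            if not rec.voice_controllable or rec.device_type != det.label:
                continue
            # Operation 540: does the registered room match the viewed room?
            if rec.room != viewed_room:
                continue
            # Operation 550: determine the target and extract additional info.
            targets.append({"id": rec.target_id, "bbox": det.bbox})
            break
    return targets
```

In this sketch a detected object with no entry in the target information, or with an entry registered to a different room, is simply skipped and yields no target.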

[0164] In one embodiment, the electronic device may perform operation 560 if it determines in operation 520 that at least one object is not identified in the image. For example, when a user looks at a wall facing the bedroom or a closed door of the bedroom, the acquired image may simply contain an image of the wall or the closed door. In this case, no object may be identified in the image. Even if no object is identified in the image, it is necessary to visually display a device or application that can be controlled via voice command in the bedroom. As another example, when a user looks at a wall facing one side of the living room (a wall where devices such as a TV are not placed), the acquired image may simply be an image of the wall. In this case, no object may be identified in the image. Even if no object is identified in the image, the electronic device may visually display to the user a device or application that can be controlled via voice command in the living room. Operations 560, 570, and 580 may relate to an operation flow for visually providing a device and application that can be controlled via voice command to the user in a situation where no object can be identified in the acquired image as in the above situation.

[0165] According to one embodiment, in operation 560, the electronic device can determine a target area based on the user's location information and home IoT information.

[0166] In one embodiment, the user's location information may correspond to the location information of an electronic device. The user's location information may include information regarding the room where the user is currently located. For example, if the user is in the living room, the user's location information may include information regarding devices and applications placed in the living room. Home IoT information may include information regarding IoT devices placed in the user's home. The target area may include the room where the user's eyes are directed or the room where the user is currently located.

[0167] In one embodiment, the electronic device may determine either the room where the user is currently located or the room located in the direction the user is looking as the target area. For example, if a bedroom door is identified in an image acquired through the electronic device, the electronic device may determine the bedroom as the target area. For example, if a wall separating the bedroom from the current area is identified in an image acquired through the electronic device, the electronic device may determine the bedroom as the target area. For example, if no other room is identified in an image acquired through the electronic device (e.g., if the space between the first bedroom and the second bedroom is identified), the electronic device may determine the room where the user is currently located as the target area.

[0168] According to one embodiment, in operation 570, the electronic device can identify a device or application included in the target area. A device included in the target area can be determined based on target information.

[0169] According to one embodiment, in operation 580, the electronic device may determine a device or application included in the target area as a target and may extract additional information about the target. Operation 580 may include an operation corresponding to operation 550.
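The fallback path of operations 560 to 580 can be sketched as follows. The sketch assumes that the image analysis yields a list of recognized room boundaries (such as a door or a wall) and that a mapping from each boundary to the room behind it is available; the function names, the `boundary_to_room` mapping, and the dictionary layout of the home IoT information are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def determine_target_area(
    current_room: str,
    seen_boundaries: Sequence[str],
    boundary_to_room: Mapping[str, str],
) -> str:
    """Operation 560: choose the viewed room if one is recognized, else the current room."""
    for boundary in seen_boundaries:
        room = boundary_to_room.get(boundary)
        if room is not None and room != current_room:
            return room
    return current_room


def targets_in_area(area: str, home_iot: Sequence[Dict[str, object]]) -> List[str]:
    """Operations 570 and 580: list the voice-controllable devices placed in the area."""
    return [
        str(device["id"])
        for device in home_iot
        if device["room"] == area and device["voice_controllable"]
    ]
```

For instance, when the user stands in the living room and a bedroom door is recognized, the bedroom becomes the target area and only the voice-controllable devices registered to the bedroom are returned.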

[0170] FIG. 6 illustrates the operation flow of an electronic device according to one embodiment. The electronic device of FIG. 6 may include a device corresponding to the electronic device of FIG. 1 to FIG. 5. In the description of FIG. 6, descriptions that overlap with those described in FIG. 1 to FIG. 5 may be omitted. FIG. 6 relates to the operation flow of the electronic device when the target to be visually provided to the user is not an actual physical device (e.g., TV, air conditioner, etc.) but rather an application that can be controlled via voice commands.

[0171] According to one embodiment, in operation 610, the electronic device can identify an application regarding an image of the surrounding space of the electronic device.

[0172] In one embodiment, an application regarding the surrounding space of an electronic device may include information regarding an application executable in the space where the electronic device is currently located. For example, it may include a telephone application, an SMS application, a search application, etc.

[0173] According to one embodiment, in operation 620, the electronic device can determine whether the identified application is included in the target information.

[0174] In one embodiment, target information may include information such as identification information of an application, name, package name, control level, controllable individual or group, support command, API provided for control, and state information of the object (execution state information).

[0175] In one embodiment, if it is determined that the identified application is included in the target information, the electronic device may perform operation 630.

[0176] In one embodiment, target information may be stored in memory for the application.

[0177] In one embodiment, when a specific event (control mode or voice command mode) occurs (e.g., identification of a first user input), the electronic device can determine whether the application is an object that can be controlled by voice commands and change the UI of the application or add a specific function to the object according to a predefined action.

[0178] In one embodiment, even when the electronic device does not identify the first user input, it can automatically perform a series of operations (e.g., the operation flow of FIG. 3) that provide information about devices and applications that can be controlled via voice commands when there is a change in the user's location information, such as when the user moves to a different room.

[0179] According to one embodiment, in operation 630, the electronic device can determine the application as a target and extract additional information.
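Operations 610 to 630 amount to a lookup of each identified application in the stored target information. The following is a minimal sketch under that reading; the `AppTargetInfo` type carries only a subset of the fields listed in paragraph [0174], and its field names, the package-name key, and the shape of the returned additional information are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class AppTargetInfo:
    """Hypothetical subset of the application fields listed in [0174]."""
    app_id: str
    name: str
    package_name: str
    supported_commands: Tuple[str, ...]
    running: bool   # execution state information


def select_app_targets(
    identified_packages: Sequence[str],
    target_info: Mapping[str, AppTargetInfo],
) -> List[Dict[str, object]]:
    """Keep only the applications registered in the target information."""
    targets: List[Dict[str, object]] = []
    for package in identified_packages:      # result of operation 610
        info = target_info.get(package)      # operation 620
        if info is None:
            continue                         # not voice controllable
        targets.append({                     # operation 630
            "id": info.app_id,
            "name": info.name,
            "commands": list(info.supported_commands),
            "running": info.running,
        })
    return targets
```

Applications that are visible but absent from the target information are dropped, which corresponds to not creating a virtual object for them.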

[0180] FIG. 7 illustrates the identification and display flow of a target controllable according to a voice command of an electronic device according to one embodiment.

[0181] Referring to FIG. 7, in the first step (710), the electronic device can acquire an image of the surrounding space. The image of the surrounding space may include a plurality of physical devices (712, 714, 716).

[0182] In one embodiment, the electronic device can identify a first user input in the first step (710). For example, the electronic device can identify a user’s voice saying “Voice control enabled.”

[0183] In one embodiment, the electronic device may create a virtual object for at least one identified target in the second step (720). Since the first device (712) (e.g., TV), the second device (714) (e.g., speaker), and the third device (716) (e.g., smart light) are all devices that can be controlled by voice commands, a first virtual object (722, 724, 726) may be created and displayed for each of the devices.

[0184] In one embodiment, with reference to FIG. 7, the first virtual objects (722, 724, 726) may have the form of an icon including the shape of a physical device. The first virtual objects (722, 724, 726) are created based on the shape of the actual device so that the user can intuitively identify which specific device each virtual object or icon represents.

[0185] In one embodiment, the first virtual objects (722, 724, 726) may each be displayed adjacent to their corresponding targets. The first virtual object (722) may be displayed at the bottom of the first device (712), which is the corresponding target. The first virtual object (724) may be displayed at the top of the second device (714), which is the corresponding target. The first virtual object (726) may be displayed at the top of the third device (716), which is the corresponding target.

[0186] The display location of the first virtual object (722, 724, 726) shown in FIG. 7 is merely an example, and it may be displayed at various locations considering the location of the device, margin space within the image, pixel distribution, etc.
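One simple way to account for the device location and the margin space within the image is sketched below. It is an illustration only: the function name, the pixel gap, and the preference order (below the device first, then above, then clamped to the bottom edge) are assumptions, and the sketch does not consider pixel distribution.

```python
from typing import Tuple


def place_icon(
    bbox: Tuple[int, int, int, int],
    image_size: Tuple[int, int],
    icon_size: Tuple[int, int],
    gap: int = 8,
) -> Tuple[int, int]:
    """Return the top-left corner for an icon placed next to a device.

    bbox is (x_min, y_min, x_max, y_max) of the device in image pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    img_w, img_h = image_size
    icon_w, icon_h = icon_size

    # Center the icon horizontally on the device, clamped to the image.
    center_x = (x_min + x_max) // 2
    x = min(max(center_x - icon_w // 2, 0), max(img_w - icon_w, 0))

    # Prefer the margin below the device if the icon fits there.
    if y_max + gap + icon_h <= img_h:
        return (x, y_max + gap)
    # Otherwise use the margin above the device if the icon fits there.
    if y_min - gap - icon_h >= 0:
        return (x, y_min - gap - icon_h)
    # No free margin: clamp to the bottom edge (the icon overlaps the device).
    return (x, max(img_h - icon_h, 0))
```

With a 640 by 480 image and a 40 by 40 icon, a device at (100, 100, 300, 200) gets its icon below it at (180, 208), while a device near the bottom edge at (100, 400, 300, 470) gets its icon above it at (180, 352).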

[0187] In one embodiment, in the third step (730), if the electronic device identifies an input from a user gazing at one of the targets (second user input), it may create and display a second virtual object for the gazed-at target. For example, referring to FIG. 7, when a user gazes at the first device (712) or the first virtual object (722), the second user input is identified for the first device (712), so the electronic device may create a corresponding second virtual object (732) and display it at the bottom of the first device (712). The second virtual object (732) may have a form in which a part of the first virtual object (722) is modified, and referring to FIG. 7, the color of the second virtual object (732) may be different from the color of the first virtual object (722). In other words, when a user looks at the icon below the TV, the color of the TV icon displayed on the screen may change.

[0188] In one embodiment, when a voice command (third user input) for controlling a device or application via voice is identified while the second virtual object is displayed, the electronic device may perform the corresponding action on the target (device) corresponding to the second virtual object. For example, if the user commands “Play Top 100 music and turn up the volume” in the third step (730), the action of playing music and raising the volume may be performed on the first device (712) (e.g., TV) rather than on the second device (714) (e.g., speaker).
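The gaze-driven switch from a first virtual object to a second virtual object can be sketched as a hit test of the gaze point against the screen regions of each target. The sketch assumes that a gaze tracker supplies a 2D gaze point in image coordinates and that each target has one or more rectangular regions (for example, the device itself and its icon); the function names and the "first"/"second" style labels are hypothetical.

```python
from typing import Dict, Mapping, Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def gazed_target(
    gaze_point: Tuple[int, int],
    regions: Mapping[str, Sequence[BBox]],
) -> Optional[str]:
    """Return the id of the target whose device or icon region contains the gaze point."""
    gx, gy = gaze_point
    for target_id, boxes in regions.items():
        for x_min, y_min, x_max, y_max in boxes:
            if x_min <= gx <= x_max and y_min <= gy <= y_max:
                return target_id
    return None


def object_styles(
    gaze_point: Tuple[int, int],
    regions: Mapping[str, Sequence[BBox]],
) -> Dict[str, str]:
    """Mark the gazed-at target as "second" (highlighted) and all others as "first"."""
    hit = gazed_target(gaze_point, regions)
    return {tid: ("second" if tid == hit else "first") for tid in regions}
```

A subsequent voice command would then be routed to the target marked "second", which matches the example in which the music is played on the TV rather than on the speaker.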

[0189] FIG. 8 illustrates the identification and display flow of a target controllable according to a voice command of an electronic device according to one embodiment.

[0190] In one embodiment, the form of the first virtual object may not be an icon containing the form of a device, such as the first virtual object (722, 724, 726) of FIG. 7, but may have the form of a UI effect displayed on the device. For example, referring to FIG. 8, the first virtual object (822) for the TV (712) may be a highlight effect displayed along the edge of the TV, the first virtual object (824) for the speaker (714) may be a highlight effect displayed along the edge of the speaker, and the first virtual object displayed for the lamp (716) may be a highlight effect displayed along the edge of the lamp. In the first step (810), when the user is sitting on a living room sofa and wearing an electronic device (e.g., an XR device), if the first user input is identified on the electronic device, the second step (820) may be initiated. In the second step (820), the edge portions of the devices (712, 714, 716) that can be controlled via voice command may be highlighted. This allows the user to intuitively identify which devices can be controlled by voice.

[0191] In one embodiment, in the second step (820), when a second user input is identified, for example, when the user looks at the TV (712), the shape of the first virtual object (822) changes, and the color of the first virtual object (822) formed by the border of the TV (712) that was displayed on the screen may change.

[0192] In one embodiment, when the electronic device identifies a user's voice command in the third step (830), the electronic device may perform an action corresponding to the voice command through a device corresponding to the second virtual object (822). For example, when the electronic device identifies a voice command "Turn on channel 11," an action of turning on channel 11 of the TV (712) may be performed.

[0193] FIG. 9 illustrates an example of a virtual object controllable according to a voice command of an electronic device according to one embodiment. The virtual object of FIG. 9 may refer to a first virtual object related to an application.

[0194] Referring to FIG. 9, in one embodiment, in the first step (910), a plurality of applications (912) that are executable by the electronic device may be displayed. The example illustrated in FIG. 9 may include the operation details for displaying an application that is controllable by voice command and is visible within the user's field of vision.

[0195] In one embodiment, when a first user input is identified through an electronic device in the first step (910), a second step (920) may be initiated. In the second step (920), the electronic device may display one or more targets (922) including an application that can be controlled via voice command among a plurality of executable applications (912). At this time, a first virtual object may be created for each of the one or more targets (922). The first virtual object illustrated in FIG. 9 may include a highlight effect formed on the border of the icons of one or more targets. Alternatively, for example, the background color of the app icon may be changed and displayed. At this time, the first virtual object may not be displayed for an application (924) that is not a target. Through this display method, the user can intuitively identify which app can be controlled by voice.

[0196] In one embodiment, if a second user input gazing at a specific icon or application is identified in the second step (920), a third step (930) may be initiated. In the third step (930), a second virtual object (922) may be created and displayed for the application in which the second user input is identified. For example, if the user looks at an internet app, the background color of the internet app icon may be changed (922).

[0197] In one embodiment, when a user's voice command to control the application is identified in the third step (930), for example, when a voice command such as “search for information A” is identified, an action of searching for information A in the internet app may be performed. For example, if the user gazes at a phone icon and commands “Mom,” the NLU determines within the phone domain that the command “Mom” means to call a contact saved as Mom, and can execute a call function to that contact.

[0198] FIG. 10 illustrates an example of a screen displayed through an electronic device according to one embodiment. FIG. 10 may include a description of a case in which a target (physical device) that is partially visible within the user's field of view is displayed and controllable by voice command.

[0199] In one embodiment, the image illustrated in FIG. 10 may be an image of the surrounding space of an electronic device obtained while the user is not located in the room where the target is, but is looking at the room where the target is. For example, in the case of a TV, the entire target is not visible, but since the target can be controlled by voice commands, the operation of determining the target and creating a first virtual object can be performed. The first virtual object may include an icon (1012, 1016) or a highlight effect (1022, 1026) corresponding to the device.

[0200] FIG. 11 illustrates an example of a screen displayed through an electronic device according to one embodiment. FIG. 12 illustrates an example of a virtual object display of an electronic device according to one embodiment. FIG. 11 and FIG. 12 may include operational details for displaying an object (physical device) that is controllable by voice command and is not entirely visible within the user's field of view. FIG. 11 and FIG. 12 may be described on the premise that the devices are arranged in an arrangement corresponding to FIG. 7 and FIG. 8 described above.

[0201] FIG. 11 may include an image obtained by an electronic device when the user is not located in the room where the devices are. Although the speaker is not fully visible to the user's eyes, based on information that the user is looking at a room with a TV and a light (target area) and information that the speaker is also present in that room (home IoT information), the electronic device can create and display a first virtual object (1102) for the speaker.

[0202] FIG. 12 illustrates a state in which the user looks toward a room containing devices while located outside that room. In this case, the user is looking toward the wall, so no objects are detected in the surrounding space image acquired by the electronic device. However, the electronic device can infer information about the room located across the wall and can create and display first virtual objects (1202, 1204, 1206) of the controllable devices located in that room, such as a speaker, TV, and light.

[0203] FIG. 13 illustrates an example of a virtual object display of an electronic device according to one embodiment. FIG. 13 may include all the contents described in FIG. 7 and may explain an example in which the display position of the first virtual object (722, 724, 726) changes.

[0204] In one embodiment, the electronic device may display the first virtual object (13204, 1306, 1308) in a predetermined area (e.g., a portion of the upper right side of the screen) without displaying it in a location adjacent to the devices corresponding to the first virtual object (the first device (712, 714, 716)).

[0205] At this time, if the user wants to control the light, a second user input looking at the first virtual object (1302) of the light is identified by the electronic device, and an update of the first virtual object may occur. At this time, if a third user input occurs (e.g., lower the brightness), the corresponding action (e.g., the brightness of the light is adjusted) may be performed.

[0206] In one embodiment, even when the electronic device identifies a second user input to a first device (712, 714, 716) (e.g., a TV, which is a real physical object) rather than a first virtual object, a UI update occurs to the light icon and a series of actions to control the light via voice command may be performed.

[0207] In one embodiment, the electronic device can provide a first virtual object of a target controllable by voice commands in a room based solely on information about the room where the user is located, regardless of the acquired surrounding space image (user's field of view).

[0208] In one embodiment, the electronic device can calculate distance information between the electronic device and a target (e.g., a physical device) through a network or a sensor. The electronic device can create and display a first virtual object based on the calculated distance information. For example, a first virtual object may be created for voice-controllable devices located close to the electronic device.
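The distance-based selection can be illustrated with a short filter. The sketch assumes that 3D positions of the electronic device and of each target are already known in a common coordinate system, in meters; how those positions are obtained through a network or a sensor is outside the sketch, and the 5-meter threshold and the function name are purely illustrative.

```python
import math
from typing import List, Mapping, Tuple

Position = Tuple[float, float, float]  # (x, y, z) in meters, assumed shared frame


def nearby_targets(
    device_position: Position,
    target_positions: Mapping[str, Position],
    max_distance_m: float = 5.0,   # illustrative threshold, not from the disclosure
) -> List[str]:
    """Return ids of targets within max_distance_m of the electronic device."""
    return [
        target_id
        for target_id, position in target_positions.items()
        if math.dist(device_position, position) <= max_distance_m
    ]
```

First virtual objects would then be created only for the returned targets, so that distant voice-controllable devices are not shown.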

[0209] FIG. 14 illustrates an example of a virtual object display of an electronic device according to one embodiment. FIG. 15 illustrates an example of a virtual object display of an electronic device according to one embodiment.

[0210] Referring to FIGS. 14 and 15, information about a command that controls a target corresponding to the first virtual object can also be provided.

[0211] For example, when a first virtual object is selected by a method (1401, 1501) such as a touch or gesture, a UI object (1403, 1503) representing the actions supported by the first virtual object may be displayed. The actions supported by the first virtual object may include information regarding commands that the user is likely to use at present. When the electronic device identifies a third user input (e.g., a touch or gesture selecting a specific command, or a recognized voice input that reads a provided command verbatim), it may perform an action corresponding to the command information.

[0212] An electronic device according to one embodiment of the present disclosure comprises a microphone, an image sensor, a memory storing target information including information regarding a device or application that can be controlled via voice command, and at least one processor electrically connected to the microphone, the image sensor, and the memory. The at least one processor may be configured to acquire a first user input through the microphone, acquire an image of the surrounding space of the electronic device through the image sensor in response to acquiring the first user input, identify one or more targets based on the surrounding space image and the target information, create one or more first virtual objects for the one or more targets, and display the created first virtual objects.

[0213] In one embodiment, the at least one processor may be configured to acquire a second user input for one of the one or more targets, and in response to acquiring the second user input, to create a second virtual object for the one target and display the second virtual object.

[0214] In one embodiment, the second user input may include input from a user gazing at the one target.

[0215] In one embodiment, the at least one processor may be configured to acquire a third user input through the microphone and perform an operation corresponding to the third user input through the one target.

[0216] In one embodiment, the at least one processor creates a third virtual object for the one target, and the third virtual object may include information for controlling the one target.

[0217] In one embodiment, the at least one processor may be configured to take the surrounding space image as input, identify at least one object included in the surrounding space image based on a machine learning model trained to detect objects included in the image, and determine whether the at least one object is included in the target information.

[0218] In one embodiment, the at least one processor may be configured to determine whether the location of a device or application corresponding to the at least one object and the location of the at least one object identified through the surrounding space image match in response to a determination that the at least one object is included in the target information, and to determine the at least one object as a target in response to a determination that the location of a device or application corresponding to the at least one object and the location of the at least one object identified through the surrounding space image match.

[0219] In one embodiment, the at least one processor may be configured to identify either information of a room the electronic device looks at or information of a room where the electronic device is located, in response to a determination that the at least one object is not included in the target information.

[0220] In one embodiment, the at least one processor may be configured to determine, as the one or more targets, a device or application included in the information of the room the electronic device looks at or the information of the room where the electronic device is located.

[0221] In one embodiment, the first virtual object may be displayed within a predetermined distance from the target.

[0222] A method of operating an electronic device according to one embodiment of the present disclosure may include: acquiring a first user input; acquiring an image of the surrounding space of the electronic device in response to acquiring the first user input; identifying one or more targets based on the image of the surrounding space and the target information; generating one or more first virtual objects for the one or more targets; and displaying the generated first virtual objects.

[0223] A method of operation of an electronic device according to one embodiment may include an operation of obtaining a second user input for one of the one or more targets, an operation of creating a second virtual object for the one target in response to obtaining the second user input, and an operation of displaying the second virtual object.

[0224] In a method of operating an electronic device according to one embodiment, the second user input may include an input of a user gazing at the one target.

[0225] A method of operation of an electronic device according to one embodiment may include an operation of acquiring a third user input and an operation of performing an operation corresponding to the third user input through the one target.

[0226] A method of operation of an electronic device according to one embodiment includes the operation of creating a third virtual object for the one target, and the third virtual object may include information for controlling the one target.

[0227] A method of operation of an electronic device according to one embodiment may include taking the surrounding space image as input, identifying at least one object included in the surrounding space image based on a machine learning model trained to detect objects included in the image, and determining whether the at least one object is included in the target information.

[0228] A method of operation of an electronic device according to one embodiment may include, in response to a determination that the at least one object is included in target information, an operation of determining whether the location of a device or application corresponding to the at least one object and the location of the at least one object identified through the surrounding space image match, and in response to a determination that the location of a device or application corresponding to the at least one object and the location of the at least one object identified through the surrounding space image match, an operation of determining the at least one object as a target.

[0229] A method of operation of an electronic device according to one embodiment may include an operation of identifying either information of a room viewed by the electronic device or information of a room where the electronic device is located, in response to a determination that at least one object is not included in the target information.

[0230] A method of operating an electronic device according to one embodiment may include an operation of determining, as the one or more targets, a device or application included in information of a room the electronic device looks at or information of a room where the electronic device is located.

[0231] In one embodiment, the first virtual object may be displayed within a predetermined distance from the target.

[0232] The electronic device according to the various embodiments disclosed in this document may be of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this document is not limited to the devices described above.

[0233] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of said items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B or C," "at least one of A, B and C," and "at least one of A, B, or C" may each include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "1st" and "2nd," or "first" and "second," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., 1st) component is referred to as "coupled" or "connected" to another (e.g., 2nd) component, with or without the terms "functionally" or "communicatively," it means that the component may be connected to the other component directly (e.g., via a wire), wirelessly, or through a third component.

[0234] The term “module” as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0235] Various embodiments of the present document may be implemented as software (e.g., program (140)) comprising one or more instructions stored in a storage medium (e.g., internal memory (136) or external memory (138)) readable by a machine (e.g., electronic device (101)). For example, a processor (e.g., processor (120)) of the machine (e.g., electronic device (101)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, "non-transitory" simply means that the storage medium is a tangible device and does not include a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.

[0236] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0237] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities, and some of the multiple entities may be separated and placed in other components. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as they were performed by the corresponding component among the multiple components prior to integration. According to various embodiments, operations performed by the module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

Claims

1. An electronic device comprising: a microphone; an image sensor; at least one processor (120; 1970) including processing circuitry; and a memory (130; 1980) comprising at least one storage medium storing instructions, wherein the memory stores target information including information about a device or application that can be controlled through a voice command, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: acquire a first user input through the microphone; in response to acquiring the first user input, acquire a surrounding space image of the electronic device through the image sensor; identify one or more targets based on the surrounding space image and the target information; generate one or more first virtual objects for the one or more targets; and display the generated first virtual objects.

2. The device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: acquire a second user input for one target among the one or more targets; in response to acquiring the second user input, generate a second virtual object for the one target; and display the second virtual object.

3. The device of claim 1, wherein the second user input comprises an input of a user gazing at the one target.

4. The device of claim 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: acquire a third user input through the microphone; and perform, through the one target, an action corresponding to the third user input.

5. The device of claim 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to generate a third virtual object for the one target, and wherein the third virtual object comprises information for controlling the one target.

6. The device of claim 3, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify at least one object included in the surrounding space image by using a machine learning model that takes the surrounding space image as input and is trained to detect objects included in an image; and determine whether the at least one object is included in the target information.

7. The device of claim 5, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: in response to determining that the at least one object is included in the target information, determine whether the location of the device or application corresponding to the at least one object matches the location of the at least one object identified through the surrounding space image; and in response to determining that the two locations match, determine the at least one object as a target.

8. The device of claim 5, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to, in response to determining that the at least one object is not included in the target information, identify either information of a room viewed by the electronic device or information of a room where the electronic device is located.

9. The device of claim 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to determine, as the one or more targets, a device or application included in information of a room viewed by the electronic device or information of a room where the electronic device is located.

10. The device of claim 1, wherein the first virtual object is displayed within a predetermined distance from the target.

11. A method of operating an electronic device, the method comprising: an operation of acquiring a first user input; an operation of acquiring a surrounding space image of the electronic device in response to acquiring the first user input; an operation of identifying one or more targets based on the surrounding space image and target information stored in a memory of the electronic device, wherein the target information includes information about a device or application that can be controlled through a voice command; an operation of generating one or more first virtual objects for the one or more targets; and an operation of displaying the generated first virtual objects.

12. The method of claim 11, further comprising: an operation of acquiring a second user input for one target among the one or more targets; an operation of generating a second virtual object for the one target in response to acquiring the second user input; and an operation of displaying the second virtual object.

13. The method of claim 11, wherein the second user input comprises an input of a user gazing at the one target.

14. The method of claim 12, further comprising: an operation of acquiring a third user input; and an operation of performing, through the one target, an action corresponding to the third user input.

15. The method of claim 14, further comprising an operation of generating a third virtual object for the one target, wherein the third virtual object comprises information for controlling the one target.