Electronic device and control method thereof

The electronic device enhances call quality and voice recognition by using a camera and driving mechanism to adjust its position and audio settings based on user movement, addressing the challenges of varying distances and directions in home robots.

WO2026127543A1 · PCT designated stage · Publication Date: 2026-06-18 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-12-08
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Home robots experience decreased call quality and voice recognition rates when users move around indoors due to varying distances and directions relative to the robot, leading to suboptimal voice input and output.

Method used

The electronic device includes a camera, microphones, speakers, and a driving mechanism to adjust its position and audio settings based on user movement, ensuring effective voice reception and output by moving to the user's frontal direction and adjusting microphone sensitivity and speaker volume accordingly.

Benefits of technology

Maintains consistent call quality and voice recognition by dynamically adjusting to user movement, ensuring clear voice input and output regardless of distance and direction changes.

Generated by Eureka AI based on patent content.


Abstract

An electronic device is disclosed. The device comprises: a camera; at least one microphone; at least one speaker; a communication unit; a driving device for moving the electronic device; a memory in which at least one instruction is stored; and at least one processor which operates according to execution of the at least one instruction. The at least one processor: when a user makes a phone call through the communication unit or executes a voice recognition mode while walking, controls the driving device to move to an area corresponding to the front direction of the user on the basis of a captured image from the camera; and adjusts the volume of at least one of the microphone and the speaker according to the distance and direction relative to the user, which change while moving by the driving device.

Description

Electronic device and control method thereof

[0001] The present disclosure relates to an electronic device and a method for controlling the same.

[0002] Driven by advancements in electronic technology, various types of electronic devices are being developed and disseminated. Among these, home robots are being used in recent years to move around indoor spaces, assist users in their daily lives, and perform appropriate activities. These home robots can support a convenient and efficient lifestyle for users.

[0003] For example, a home robot can relay a user's phone call near the user, receive voice input from the user, and provide a corresponding response or perform an action.

[0004] In this situation, when a user makes a phone call using a home robot while moving within an indoor space, call quality or the robot's voice recognition rate may decrease depending on the distance between the user and the home robot. Therefore, there is a need for technology that enables the robot to move to a position where it can more effectively receive the voice of a walking user.

[0005] According to at least one embodiment of the present disclosure, an electronic device comprises a camera, at least one microphone, at least one speaker, a communication unit, a driving device for moving the electronic device, a memory storing at least one instruction, and at least one processor that operates according to the execution of the at least one instruction. When an event corresponding to user input is identified, the at least one processor controls the driving device to move to an area corresponding to the front direction of the user based on an image acquired through the camera, controls the receiving sensitivity of the microphone to change based on the distance and direction from the user resulting from the movement to the area corresponding to the front direction, and controls the volume corresponding to audio output through the speaker to change.

[0006] When the identified event is a phone call made while walking and the user then continues the phone call without moving, the processor can control the driving device so that the electronic device moves to an area within a preset distance in the user's frontal direction and then directs the at least one microphone toward the user.

[0007] When the user starts moving again after continuing the phone call without moving, the processor can control the driving device to move to a position within the diffusion range of the user's voice that does not hinder the user's movement, and to drive in a direction corresponding to the user's movement.

[0008] The at least one microphone includes a plurality of microphones arranged in a plurality of different directions. When a noise source is present within a preset area, the processor can increase the reception sensitivity of the microphone directed toward the user among the plurality of microphones and decrease the reception sensitivity of the microphone directed toward the noise source.
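The per-microphone adjustment described above can be sketched as follows. This is a minimal illustration only: the function name `mic_gains`, the bearing convention (degrees in the device frame), and the gain factors 1.5 and 0.5 are assumptions and do not come from the disclosure.

```python
def mic_gains(mic_bearings_deg, user_bearing_deg, noise_bearing_deg=None,
              boost=1.5, cut=0.5, base=1.0):
    """Return one gain factor per microphone (hypothetical scheme).

    The microphone pointing closest to the user is boosted; if a noise
    source is present, the microphone pointing closest to it is attenuated.
    """
    if not mic_bearings_deg:
        raise ValueError("at least one microphone bearing is required")

    def angular_diff(a, b):
        # Smallest absolute difference between two bearings, in degrees.
        return abs((a - b + 180.0) % 360.0 - 180.0)

    indices = range(len(mic_bearings_deg))
    gains = [base] * len(mic_bearings_deg)

    toward_user = min(
        indices, key=lambda i: angular_diff(mic_bearings_deg[i], user_bearing_deg))
    gains[toward_user] = boost

    if noise_bearing_deg is not None:
        toward_noise = min(
            indices, key=lambda i: angular_diff(mic_bearings_deg[i], noise_bearing_deg))
        # Never attenuate the microphone that is already serving the user.
        if toward_noise != toward_user:
            gains[toward_noise] = cut
    return gains
```

With four microphones facing 0, 90, 180 and 270 degrees, a user at 10 degrees and a noise source at 170 degrees, the sketch boosts the first microphone and attenuates the third.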

[0009] When voice inputs from a plurality of users are received while the event is identified, the processor can control the driving device to move to a position where the diffusion ranges of the users' voices overlap, based on an image acquired through the camera.
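One way to picture the overlap position is to test candidate points against each user's voice cone. The sketch below is an assumption-laden illustration: the 2D coordinates, the 50-degree half-angle, the 3 m range, and the helper names `in_cone` and `find_overlap_position` are all hypothetical, and the disclosure does not specify how the overlap is computed.

```python
import math

def in_cone(user_xy, heading_deg, point_xy, half_angle_deg=50.0, max_range_m=3.0):
    """True if point_xy lies inside the assumed voice cone of one user."""
    dx = point_xy[0] - user_xy[0]
    dy = point_xy[1] - user_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range_m:
        return False
    bearing_deg = math.degrees(math.atan2(dy, dx))
    # Signed offset from the user's heading, wrapped to [-180, 180).
    offset = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= half_angle_deg

def find_overlap_position(users, candidates):
    """users: list of (xy, heading_deg). Return the first candidate point
    that lies inside every user's cone, or None if there is none."""
    for point in candidates:
        if all(in_cone(xy, heading, point) for xy, heading in users):
            return point
    return None
```

For two users standing 2 m apart and facing each other, the midpoint between them satisfies both cones; for users standing back to back, no candidate between them qualifies.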

[0010] The processor can identify a main user based on at least one of the number of voice inputs and the voice input time of each of the plurality of users, and can control the driving device to move to an area within a preset distance in the frontal direction of the main user.
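A simple way to combine the two criteria is a weighted score per user. The following sketch assumes such a score; the function name `identify_main_user`, the weights, and the data layout are illustrative assumptions rather than the scheme used in the disclosure.

```python
def identify_main_user(voice_stats, count_weight=1.0, time_weight=1.0):
    """Pick the main user from per-user voice statistics.

    voice_stats maps a user id to (number of voice inputs, total seconds
    of voice input). The weighted sum is a hypothetical scoring rule.
    """
    if not voice_stats:
        raise ValueError("no users to choose from")

    def score(item):
        count, seconds = item[1]
        return count_weight * count + time_weight * seconds

    return max(voice_stats.items(), key=score)[0]
```

Setting one of the weights to zero reduces the rule to a single criterion, which matches the "at least one of" wording above.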

[0011] When at least one of the plurality of users moves outside a preset area, the processor can control the driving device to reposition the electronic device based on the position of the user who is within the preset area.

[0012] When the identified event is a phone call made while walking and, during the call, the user moves outside the area in which a communication connection is possible between the communication unit and the device corresponding to the phone call, the processor can control the driving device to move to the boundary position between the diffusion range of the user's voice and the communication connection area.

[0013] The at least one speaker includes at least one left speaker positioned on the left side of the electronic device and at least one right speaker positioned on the right side. While the electronic device is moving in a direction corresponding to the front direction of the user, the processor can output a left audio signal through the at least one left speaker and a right audio signal through the at least one right speaker. When the electronic device is at a position corresponding to the front direction of the user, the processor can output the left audio signal through the at least one right speaker and the right audio signal through the at least one left speaker.
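The channel routing described above amounts to swapping the two channels once the device is positioned in front of the user, presumably because the device's left side then corresponds to the user's right. A minimal sketch, in which the function name `route_stereo` and the boolean flag are assumptions:

```python
def route_stereo(left_signal, right_signal, in_front_of_user):
    """Return (signal for the left speaker, signal for the right speaker).

    While the device travels in the user's direction the channels are
    passed through unchanged; once it sits in front of the user they
    are swapped.
    """
    if in_front_of_user:
        return right_signal, left_signal
    return left_signal, right_signal
```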

[0014] If movement to an area corresponding to the front direction of the user is not possible, the processor can control the driving device to move to an area adjacent to the user.

[0015] According to at least one embodiment of the present disclosure, a control method for an electronic device comprises the steps of: moving to an area corresponding to the frontal direction of the user based on an image acquired through a camera when an event corresponding to user input is identified; and controlling the reception sensitivity of at least one microphone included in the electronic device to be changed based on the distance and direction from the user resulting from the movement to the area corresponding to the frontal direction, and controlling the volume of audio output through at least one speaker included in the electronic device to be changed.

[0016] When the identified event is a phone call made while walking and the user then continues the phone call without moving, the moving step may include moving to an area within a preset distance in the user's frontal direction and then directing the at least one microphone toward the user.

[0017] The above control method may further include the step of, when the user maintains the phone call without moving and then starts moving again, moving to a position that does not hinder the user's movement within the diffusion range of the user's voice and driving in a direction corresponding to the user's movement.

[0018] The at least one microphone includes a plurality of microphones arranged in a plurality of different directions, and the controlling step may, when a noise source is present within a preset area, increase the reception sensitivity of the microphone directed toward the user among the plurality of microphones and decrease the reception sensitivity of the microphone directed toward the noise source.

[0019] The above control method may further include the step of, when a plurality of user voice inputs are received while the event is being identified, moving to a position where the diffusion ranges of each of the plurality of users' voices overlap each other based on an image acquired through the camera.

[0020] The above control method may further include the step of moving to an area within a preset distance corresponding to the frontal direction of the user identified as the main user, based on at least one of the number of voice inputs and the voice input time of each of the plurality of users.

[0021] The above control method may further include the step of moving the position of the electronic device based on the position of the user within the preset area when at least one of the plurality of users moves outside a preset area.

[0022] The above control method may further include the step of, when the identified event is a phone call made while walking and the user, during the call, moves outside the area where communication is possible between the communication unit and the device corresponding to the phone call, moving to a boundary position between the diffusion range of the user's voice and the area where communication is possible.

[0023] The above at least one speaker includes at least one left speaker positioned to the left of the electronic device and at least one right speaker positioned to the right, and the control method may further include the step of outputting a left audio signal through the at least one left speaker and outputting a right audio signal through the at least one right speaker when the electronic device is moving in a direction corresponding to the front direction of the user; and the step of outputting the left audio signal through the at least one right speaker and outputting the right audio signal through the at least one left speaker when the position of the electronic device is a position corresponding to the front direction of the user.

[0024] The above control method may further include the step of moving to an area adjacent to the user if movement to an area corresponding to the front direction of the user is not possible.

[0025] FIG. 1 is a schematic diagram illustrating the operation of an electronic device according to at least one embodiment of the present disclosure.

[0026] FIG. 2 is a block diagram briefly illustrating the configuration of an electronic device according to at least one embodiment of the present disclosure.

[0027] FIG. 3 is a block diagram illustrating the detailed configuration of an electronic device according to at least one embodiment of the present disclosure.

[0028] FIG. 4 is a drawing showing the case where an electronic device according to at least one embodiment of the present disclosure moves in the direction of the user's front.

[0029] FIG. 5 is a drawing showing a case where the volume of a plurality of microphones of an electronic device is adjusted according to at least one embodiment of the present disclosure.

[0030] FIGS. 6 to 8 are drawings illustrating the operation of an electronic device in a situation where the voices of a plurality of users are input according to at least one embodiment of the present disclosure.

[0031] FIG. 9 is a drawing showing a case where an electronic device is located within an area where a communication connection is possible according to at least one embodiment of the present disclosure.

[0032] FIG. 10 is a drawing showing the operation of controlling the output of two speakers of an electronic device according to at least one embodiment of the present disclosure.

[0033] FIG. 11 is a drawing showing the operation of an electronic device according to at least one embodiment of the present disclosure when it cannot move to an area corresponding to the front direction of the user.

[0034] FIG. 12 is a flowchart illustrating a method for controlling an electronic device according to at least one embodiment of the present disclosure.

[0035] FIG. 13 is a flowchart illustrating in detail a method for controlling an electronic device according to whether a user is walking, according to at least one embodiment of the present disclosure.

[0036] FIG. 14 is a flowchart illustrating the operation of an electronic device when the voices of a plurality of users are input according to at least one embodiment of the present disclosure.

[0037] The terms used in the embodiments of this disclosure have been selected to be as widely used and general as possible, taking into account their functions within this disclosure; however, these terms may vary depending on the intent of those skilled in the art, case law, the emergence of new technologies, etc. Additionally, in specific cases, terms have been arbitrarily selected by the applicant, and in such cases, their meanings will be described in detail in the relevant explanatory section of this disclosure. Therefore, terms used in this disclosure should be defined not merely by their names, but based on their meanings and the overall content of this disclosure.

[0038] In this specification, expressions such as “have,” “may have,” “include,” or “may include” indicate the presence of such features (e.g., numerical values, functions, operations, or components such as parts) and do not exclude the presence of additional features.

[0039] The expression "at least one of A or / and B" should be understood as representing either "A" or "B" or "A and B".

[0040] Expressions such as "first" or "second" used in this specification may modify various components regardless of order and / or importance, and are used only to distinguish one component from another and do not limit said components.

[0041] Where it is stated that a component (e.g., a first component) is "(operatively or communicatively) coupled with / to" or "connected to" another component (e.g., a second component), it should be understood that the component may be directly connected to the other component or connected through yet another component (e.g., a third component).

[0042] The singular expression includes the plural expression unless the context clearly indicates otherwise. In this application, terms such as "comprising" or "consisting of" are intended to specify the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0043] In the present disclosure, a "module" or "part" performs at least one function or operation and may be implemented in hardware or software, or a combination of hardware and software. Additionally, a plurality of "modules" or a plurality of "parts" may be integrated into at least one module and implemented by at least one processor (not shown), except for a "module" or "part" that needs to be implemented in specific hardware.

[0044] In this specification, the term "user" may refer to a person using the electronic device (100) or a device (e.g., an artificial intelligence electronic device) using the electronic device (100).

[0045] An embodiment of the present disclosure will be described in more detail below with reference to the attached drawings.

[0046] FIG. 1 is a schematic diagram showing the operation of an electronic device (100) according to at least one embodiment of the present disclosure.

[0047] According to one embodiment, the electronic device (100) may move to an area corresponding to the user's front direction when an event resulting from user input is identified. The event resulting from user input may include cases where the user makes a phone call while walking or activates a voice recognition mode.

[0048] According to various embodiments of the present disclosure, the electronic device (100) may be a device of various types capable of driving. For example, the electronic device (100) may be a mobile electronic device (100) including a projector. In addition, the electronic device (100) may be implemented as a mobile robot, a robot vacuum cleaner, a wireless speaker, an AGV (Automated Guided Vehicle), an AMR (Autonomous Mobile Robot), a social robot, etc. Although various embodiments of the present disclosure have been illustrated and described based on the case where they are implemented as a device capable of driving, some embodiments may be implemented as an electronic device that is not capable of autonomous driving but is easy to move around. For example, the electronic device (100) may be implemented as a projector, a tablet, a mobile phone, a laptop, etc.

[0049] The following description is based on the case in which, among electronic devices capable of driving, the electronic device (100) is implemented with a projection unit.

[0050] This electronic device (100) can project images onto various projection surfaces, such as walls, furniture surfaces, dedicated screens, floors, and ceilings, while moving to any position within the space where the device is placed (e.g., within a home).

[0051] When the electronic device (100) is equipped with a microphone and a speaker, the electronic device (100) can receive voice input from a user or output various audio signals.

[0052] If an electronic device (100) is equipped with an antenna for receiving phone calls, various RF signal processing chips and circuits for processing signals received through the antenna, and an application for receiving phone calls is installed, it can perform a direct phone call with an external device (e.g., a mobile phone used by the other party). As an example, the electronic device (100) can receive voice calls using a cellular network through a unique USIM (Universal Subscriber Identity Module).

[0053] Alternatively, if the electronic device (100) is not equipped with such hardware and software but instead has a communication chip capable of communicating with a surrounding terminal device (e.g., telephone, mobile phone, etc.), it may perform a telephone call relayed by the terminal device when a telephone call session between the terminal device and an external device is established. According to one example, the electronic device (100) may receive a call directly via a network connection or receive a call received from the terminal device via a Bluetooth connection. The electronic device (100) may communicate with the terminal device using the HFP (Hands-Free Profile) method, output an audio signal received from the terminal device through a speaker, and transmit an audio signal input through a microphone to the terminal device.

[0054] According to another embodiment, the electronic device (100) may perform a voice recognition mode even when not performing such a phone call. The voice recognition mode is an operation mode that is controlled by a voice spoken by a user or performs an interaction corresponding to that voice.

[0055] According to one example, the electronic device (100) may receive a user's voice containing a trigger voice. The trigger voice may be a voice containing a specific word to activate the electronic device (100) to run a voice recognition mode. For example, when the electronic device (100) receives a trigger voice saying "Hey, Bixby" from the user, it may run a voice recognition mode and provide a response to receive the next voice from the user.

[0056] The voice recognition mode can be broadly classified into a voice control mode that identifies a control code corresponding to a user's voice and performs an action corresponding to that control code (e.g., turn-on or turn-off action, volume up action, channel change action, etc.), and a voice conversation mode that understands the user's voice and provides a corresponding answer or information.

[0057] In the case of voice control mode, the electronic device (100) may store information about control codes corresponding to each voice in advance. In the case of voice conversation mode, the electronic device (100) may perform natural language processing on the received user's voice. For example, the electronic device (100) may perform natural language processing on the received user's voice using a Large Language Model (LLM).
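The pre-stored mapping used in voice control mode can be pictured as a simple lookup table. In the sketch below, the phrases, the control code names, and the function `lookup_control_code` are all hypothetical placeholders; the disclosure does not list the actual codes.

```python
# Hypothetical table of recognized phrases and their control codes.
CONTROL_CODES = {
    "turn on": "POWER_ON",
    "turn off": "POWER_OFF",
    "volume up": "VOLUME_UP",
    "change channel": "CHANNEL_CHANGE",
}

def lookup_control_code(recognized_text):
    """Return the stored control code for a recognized phrase, or None.

    A None result could be used to fall back to voice conversation mode,
    where natural language processing handles the utterance instead.
    """
    return CONTROL_CODES.get(recognized_text.strip().lower())
```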

[0058] According to one example, the electronic device (100) can analyze the context and meaning of a sentence from a user's voice through natural language processing and provide a voice response corresponding to the user's voice or perform an action corresponding to the user's voice. According to one example, when the electronic device (100) receives a voice from a user saying "How is the weather today?", it can analyze the meaning of the voice through natural language processing and provide "information about today's weather" corresponding to the user's voice. According to one example, when the electronic device (100) receives a voice from a user saying "Play music," it can launch a music application corresponding to the user's voice.

[0059] The electronic device (100) may be implemented to perform a voice recognition mode independently, but is not necessarily limited thereto and may perform a voice recognition mode in conjunction with at least one external server. For example, the voice recognition mode described above may be performed by communicating, through a communication unit, with an external server that receives an audio signal spoken by a user from the electronic device (100) and performs natural language processing to convert it into text, or with an external server that understands the meaning of the converted text and provides an answer or information corresponding to that meaning.

[0060] In various situations as described above, the electronic device (100) can receive the user's voice through a microphone or output various audio signals through a speaker. However, if the user is not fixed in one position within the space and moves in any direction, the distance between the electronic device (100) and the user may increase, or the direction of the user's face (hereinafter referred to as the frontal direction) may not be facing the electronic device (100), and thus the voice spoken by the user may not be accurately received.

[0061] To address this, an electronic device (100) according to various embodiments of the present disclosure can move to an area corresponding to the front direction of the user when an event corresponding to user input is identified.

[0062] The user's frontal direction may be the direction the user's face is facing, but it is not necessarily limited to this. For instance, since a user may look in various directions while moving, the direction of movement or the direction the front of the torso is facing may be identified as the frontal direction when the user is moving. However, as users generally direct their gaze in the direction of movement in most cases, the following explanation will be based on the case where the frontal direction is the direction of the user's face.

[0063] The area corresponding to the user's frontal direction may be the part of the area including the user's frontal direction in which the user's voice is easily received. This area may be determined based on the general range over which the user's voice spreads (hereinafter referred to as the diffusion range of the user's voice). For example, assuming that the voice spreads at an angle of 40 to 60 degrees to each of the left and right of the front of the user's face, an area spanning a total angle of 80 to 120 degrees, including the user's frontal direction, may be the area corresponding to the frontal direction.
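As a rough illustration of the diffusion range, the following sketch checks whether the device lies inside the user's voice cone. The function name, the 2D coordinate convention (heading measured counter-clockwise from the x axis), the 50-degree default half-angle picked from the 40 to 60 degree example, and the 3 m maximum range are assumptions for illustration only.

```python
import math

def in_voice_diffusion_range(user_xy, user_heading_deg, device_xy,
                             half_angle_deg=50.0, max_range_m=3.0):
    """Return True if the device is inside the assumed voice cone."""
    dx = device_xy[0] - user_xy[0]
    dy = device_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0 or distance > max_range_m:
        return False
    bearing_deg = math.degrees(math.atan2(dy, dx))
    # Signed angular offset from the frontal direction, wrapped to [-180, 180).
    offset = (bearing_deg - user_heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= half_angle_deg
```

A device directly ahead of the user passes the test, while a device beside or behind the user, or one farther away than the assumed range, does not.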

[0064] Referring to FIG. 1, when a user resumes walking while speaking, the electronic device (100) moves in a direction corresponding to the user's movement within the diffusion range of the user's voice. Since the user is moving, if the electronic device (100) is positioned in the direction in front of the user, it may interfere with the user's movement. Therefore, as shown in FIG. 1, the electronic device (100) moves to a forward area to the left or right of the user's front direction and then moves in the same direction as the user's movement. In this case, the microphone equipped in the electronic device (100) comes within the diffusion range of the user's voice, so the user's voice can be input well, and the audio signal output from the speaker can also be heard well by the user.
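The forward-left or forward-right position described above can be sketched as a point ahead of the user, rotated to one side of the walking path. The function name `escort_position`, the 1 m distance, and the 30-degree side angle are assumptions; the side angle should stay below the half-angle of the voice diffusion range so that the device remains within it.

```python
import math

def escort_position(user_xy, heading_deg, distance_m=1.0,
                    side_angle_deg=30.0, side="left"):
    """Return a target point ahead of the user and off the walking path.

    heading_deg is measured counter-clockwise from the x axis, so the
    user's left corresponds to a positive rotation.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    sign = 1.0 if side == "left" else -1.0
    angle = math.radians(heading_deg + sign * side_angle_deg)
    return (user_xy[0] + distance_m * math.cos(angle),
            user_xy[1] + distance_m * math.sin(angle))
```

Recomputing this target as the user's position and heading change lets the device travel in the same direction as the user without standing directly in the user's path.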

[0065] Since both the electronic device (100) and the user are moving, the distance or direction from the user may change frequently. For example, if the user receives a phone call or activates voice recognition mode while facing away from the electronic device (100), the initial position of the electronic device (100) lies outside the voice diffusion range described above. In this case, the electronic device (100) moves quickly to the area in front of the user and then moves together with the user at the user's speed of movement; thus, the distance and direction relative to the user continue to change until it reaches that position. As a result, the magnitude of the audio signal input to the electronic device (100) and the magnitude of the audio signal output from the electronic device (100) change.

[0066] Accordingly, according to one embodiment of the present disclosure, the electronic device (100) can move to an area corresponding to the frontal direction of the walking user and, as the distance and direction from the user change during that movement, change the reception sensitivity of at least one microphone and the volume of the audio output through at least one speaker. Specifically, so that the user and the call partner hear the call at nearly the same volume despite the movement, the volume can be increased at a long distance, decreased as the distance becomes shorter, and held constant while the device drives together with the user at a certain distance.

[0067] The electronic device (100) can reduce the reception sensitivity of at least one microphone when the distance from the user decreases, and can increase the reception sensitivity when the distance from the user increases.

[0068] The electronic device (100) can decrease the volume corresponding to audio output through at least one speaker when the distance from the user decreases, and can increase the volume when the distance from the user increases.
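The distance-dependent adjustment of microphone sensitivity and speaker volume can be sketched with a single gain offset. The sketch below assumes free-field inverse-distance attenuation (roughly 6 dB per doubling of distance), which is an idealization not stated in the disclosure; the function name, the 1 m reference distance, and the clamp limits are likewise hypothetical.

```python
import math

def distance_gain_db(distance_m, reference_m=1.0, min_db=-6.0, max_db=12.0):
    """Gain offset in dB that compensates the assumed 1/r level loss.

    The result is zero at the reference distance, positive when the user
    is farther away, negative when closer, and clamped to [min_db, max_db].
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / reference_m)
    return max(min_db, min(max_db, gain))
```

The same offset could be applied to both the microphone gain and the speaker volume, so that both rise as the distance grows and fall as it shrinks, and both stay constant while the device keeps a fixed distance.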

[0069] The distance to the user may be the distance between the main body of the electronic device (100) and the user, but if a microphone or speaker is attached to the outer surface of the main body of the electronic device (100), the distance between the location of the microphone or speaker and the user may be identified as the distance to the user.

[0070] The direction of the user may be the relationship between the front direction of the user and the front direction of the main body of the electronic device (100). For example, if the user is positioned in front of the main body of the electronic device (100) but faces the same direction as the front of the main body (that is, away from the device), the electronic device (100) can set the volume of the microphone and speaker higher than when the user faces the front of the main body. The direction relative to the user may also be the angle formed by the electronic device (100) with respect to the user's front direction. If this angle increases, the electronic device (100) can increase the volume of the microphone.
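The angle-dependent adjustment can be sketched as an extra gain that grows with the device's angular offset from the user's front direction. The linear mapping, the 6 dB ceiling, and the function name `angle_gain_db` are assumptions chosen for illustration; the disclosure only states that the volume increases with the angle.

```python
def angle_gain_db(offset_deg, max_extra_db=6.0):
    """Extra microphone gain in dB for a given angular offset.

    offset_deg is the angle of the device relative to the user's front
    direction. It is folded into [0, 180] and mapped linearly, so a
    device directly in front gets no extra gain and a device directly
    behind the user gets max_extra_db.
    """
    folded = abs((offset_deg + 180.0) % 360.0 - 180.0)
    return max_extra_db * folded / 180.0
```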

[0071] FIG. 1 illustrates the case where the electronic device (100) operates within a home, but the electronic device (100) can be used in various environments such as a factory, an office, a government office, or the interior of another building.

[0072] Hereinafter, with reference to the drawings, various embodiments will be described in which an electronic device (100) identifies the location of a user, moves according to the location of the user, and performs the function of adjusting the reception sensitivity of the microphone and the volume of the speaker.

[0073] FIG. 2 is a block diagram briefly illustrating the configuration of an electronic device (100) according to at least one embodiment of the present disclosure.

[0074] According to FIG. 2, the electronic device (100) includes a communication unit (105), a memory (110), one or more processors (115), a camera (120), at least one microphone (125), at least one speaker (130), and a driving device (135). However, it is not limited thereto, and the electronic device (100) may be implemented with some components excluded or with other components included.

[0075] The communication unit (105) may include wired or wireless input / output interfaces (or input / output terminals) according to various standards. The communication unit (105) may be configured to communicate with various types of external devices according to various types of communication methods.

[0076] The communication unit (105) may include at least one of a WiFi module, a Bluetooth module, a wireless communication module, an NFC module, and a UWB (Ultra Wide Band) module. Here, each communication module may be implemented in the form of at least one hardware chip. Specifically, the WiFi module may perform communication in the WiFi manner. The electronic device (100) may perform P2P communication with other devices according to the WiFi Direct communication standard. WiFi Direct technology is a technology that is installed in portable devices and mobile terminals such as TVs, laptops, printers, and cameras, and provides a foundation for using content and services between devices through direct communication between terminals without the need for separate equipment such as access points or routers. WiFi Direct is also referred to as WiFi P2P.

[0077] The Bluetooth module can perform communication using the Bluetooth method. When using the Bluetooth module, various connection information such as SSID is first transmitted and received, and after establishing a communication connection using this, various information can be transmitted and received. In addition, the wireless communication module can perform communication according to various communication standards such as IEEE, Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), and 5G (5th Generation). Furthermore, the NFC module can perform communication using the NFC (Near Field Communication) method, which uses the 13.56 MHz band among various RF-ID frequency bands such as 135 kHz, 13.56 MHz, 433 MHz, 860~960 MHz, and 2.45 GHz. In addition, the UWB module can accurately measure the Time of Arrival (ToA), which is the time it takes for a pulse to reach a target, and the Angle of Arrival (AoA), which is the angle of arrival of the pulse at the transmitting device, through communication between UWB antennas; accordingly, this enables precise distance and location recognition within an error range of tens of centimeters indoors.
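To illustrate how ToA and AoA measurements translate into a distance and a position, the sketch below multiplies a one-way time of flight by the speed of light and combines the result with the measured angle. It assumes an idealized one-way measurement between synchronized devices and a 2D device frame; the function names are hypothetical and no particular UWB API is implied.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def distance_from_toa(toa_seconds):
    """Distance in meters covered by a pulse with one-way flight time toa_seconds."""
    if toa_seconds < 0:
        raise ValueError("time of arrival must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * toa_seconds

def position_from_range_and_angle(distance_m, aoa_deg):
    """Convert a range and an angle of arrival into (x, y) in the device frame."""
    angle = math.radians(aoa_deg)
    return (distance_m * math.cos(angle), distance_m * math.sin(angle))
```

A flight time of 10 nanoseconds corresponds to roughly 3 m, which shows why timing resolution on the order of a nanosecond or better is needed to reach an error range of tens of centimeters.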

[0078] In one or more embodiments of the present disclosure, the communication unit (105) can receive a user's phone call. Specifically, the communication unit (105) can receive a phone call directly through a wireless communication module, or receive a phone call received from a terminal device through a Bluetooth connection.

[0079] At least one instruction regarding an electronic device (100) may be stored in the memory (110). Additionally, an operating system (O / S) for operating the electronic device (100) may be stored in the memory (110). Furthermore, various software programs or applications for operating the electronic device (100) may be stored in the memory (110) according to various embodiments of the present disclosure. Additionally, the memory (110) may be implemented as a volatile memory such as S-RAM (Static Random Access Memory) or D-RAM (Dynamic Random Access Memory), a non-volatile memory such as Flash Memory, ROM (Read Only Memory), EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory), a hard disk drive (HDD), or a solid-state drive (SSD).

[0080] Specifically, various software modules for operating an electronic device (100) according to various embodiments of the present disclosure may be stored in the memory (110), and the processor (115) may control the operation of the electronic device (100) by executing the various software modules stored in the memory (110). That is, the memory (110) is accessed by the processor (115), and reading / writing / modifying / deleting / updating of data by the processor (115) may be performed.

[0081] The memory (110) may be provided separately from the processor (115) or may be an internal memory built into the processor (115). The term memory (110) may also encompass a memory card (not shown) (e.g., micro SD card, memory stick) or an external hard drive mounted on the electronic device (100).

[0082] In one or more embodiments of the present disclosure, the memory (110) may store a Large Language Model (LLM) for natural language processing of the user's voice. Alternatively, a trigger voice for activating the electronic device (100) or a captured image of the user may be stored.

[0083] In addition, various information necessary within the scope of achieving the purpose of the present disclosure may be stored in the memory (110), and the information stored in the memory (110) may be updated as it is received from an external device or input by a user.

[0084] The processor (115) controls the overall operation of the electronic device (100). Specifically, the processor (115) is connected to the configuration of the electronic device (100) including a display and a memory (110), and can control the overall operation of the electronic device (100) by executing at least one instruction stored in the memory (110) as described above.

[0085] The processor (115) can be implemented in various ways. For example, the processor (115) may include or be defined by one or more of a central processing unit (CPU) that processes digital signals, a Micro Controller Unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), or an ARM processor. Additionally, the processor (115) may be implemented as a System on Chip (SoC) or Large Scale Integration (LSI) with built-in processing algorithms, or as a Field Programmable Gate Array (FPGA). The processor (115) can perform various functions by executing computer executable instructions stored in memory (110).

[0086] The processor (115) can perform a method according to one or more embodiments of the present disclosure based on the execution of at least one instruction stored in memory (110).

[0087] In cases where a method according to one or more embodiments of the present disclosure includes a plurality of operations, the plurality of operations may be performed by one processor (115) or by a plurality of processors (115). For example, when a first operation, a second operation, and a third operation are performed by a method according to one or more embodiments, the first operation, the second operation, and the third operation may all be performed by a first processor, or the first operation and the second operation may be performed by a first processor (e.g., a general-purpose processor) and the third operation may be performed by a second processor (e.g., an artificial intelligence dedicated processor).

[0088] One or more processors (115) may be implemented as a single-core processor including one core, or as one or more multicore processors including multiple cores (e.g., homogeneous multicore or heterogeneous multicore). When one or more processors (115) are implemented as multicore processors, each of the multiple cores included in the multicore processor may include internal processor memory such as cache memory or on-chip memory, and a common cache shared by multiple cores may be included in the multicore processor. Additionally, each of the multiple cores included in the multicore processor (or some of the multiple cores) may independently read and execute program instructions for implementing a method according to one or more embodiments of the present disclosure, or all (or some) of the multiple cores may be linked together to read and execute program instructions for implementing a method according to one or more embodiments of the present disclosure.

[0089] When a method according to one or more embodiments of the present disclosure includes a plurality of operations, the plurality of operations may be performed by one of the plurality of cores included in a multi-core processor, or may be performed by a plurality of cores. For example, when a first operation, a second operation, and a third operation are performed by a method according to one or more embodiments, the first operation, the second operation, and the third operation may all be performed by a first core included in a multi-core processor, or the first operation and the second operation may be performed by a first core included in a multi-core processor and the third operation may be performed by a second core included in a multi-core processor.

[0090] In the embodiments of the present disclosure, the processor (115) may refer to a system-on-chip (SoC) in which one or more processors and other electronic components are integrated, a single-core processor, a multi-core processor, or a core included in a single-core processor or a multi-core processor, wherein the core may be implemented as a CPU, GPU, APU, MIC, DSP, NPU, hardware accelerator, or machine learning accelerator, but the embodiments of the present disclosure are not limited thereto. For convenience of explanation, one or more processors will be referred to as the processor (115) below.

[0091] The camera (120) may be a device for identifying the front direction of a user located in front of the electronic device (100). The camera (120) may be implemented as various types of cameras such as a depth camera, a stereo camera, an AI camera, an infrared camera, a motion camera, etc.

[0092] According to one example, the camera (120) may be positioned to capture the front of the electronic device (100). For example, the camera (120) may be positioned in the central area of the top bezel of the electronic device (100). The camera (120) may be positioned in a direction and angle to capture the front of the electronic device (100). Although only one camera (120) is shown in FIG. 2, according to an embodiment, the electronic device (100) may include a plurality of cameras (120) distributed in various different directions of the main body.

[0093] The processor (115) can control the camera (120) to photograph the user and store the captured image in memory (110). This shooting operation may be performed during the aforementioned phone call or when the voice recognition mode is executed, but is not limited thereto, and may be performed frequently or periodically while the electronic device (100) is operating.

[0094] According to one embodiment, when an event corresponding to user input is identified, the processor (115) can control the driving device (135) to move to an area corresponding to the user's front direction based on an image acquired through the camera (120).

[0095] As described above, a phone call may be made by directly connecting a call session with an external device from the electronic device (100), or by extending a call session connected between an external device and a terminal device so that the electronic device (100) can make the phone call on behalf of the external device.

[0096] For example, when a terminal device (e.g., a mobile phone) connected to the communication unit (105) via Bluetooth receives a phone call from an external device, the terminal device can transmit, to the electronic device (100) through the communication unit (105), a signal notifying that the phone call has been received. The processor (115) can output a phone call notification sound using a speaker (130) or output a phone call notification message using a display or other output element provided in the electronic device (100). The user can respond to the phone call by using a button (not shown) or a remote control button provided in the electronic device (100).

[0097] When the user accepts the phone call, the processor (115) transmits an answer signal to the terminal device through the communication unit (105), and the terminal device establishes a phone call session with an external device. Subsequently, the processor (115) outputs the audio signal received from the terminal device through the speaker (130), receives the voice signal spoken by the user through the microphone (125), and transmits it to the terminal device through the communication unit (105). In this way, the phone call received from the terminal device can be handled on behalf of the user.

[0098] Furthermore, since the voice recognition mode has been explained in detail in the aforementioned section, a redundant explanation is omitted.

[0099] Specifically, the processor (115) divides all pixels within the captured image into multiple pixel groups and extracts the pixel representative values of the pixels included in each pixel group. The processor (115) identifies the location of pixel groups having pixel representative values that are within a similar range, and if multiple similar pixel groups are located consecutively, identifies that the similar pixel groups form an edge of a single object. The processor (115) can estimate what kind of object it is based on the shape of the edge, the pixel values of the pixels belonging within the edge, etc., and can identify the distance to the object based on the size of the edge. For example, if the processor (115) identifies an object corresponding to the shape of the user's lips, it estimates that the surrounding object containing that object is the user's face and identifies whether the face is directed toward the front of the electronic device (100). If the processor (115) identifies the user's body but does not identify the face area, it can identify that the user's front direction is facing away from the electronic device (100). When multiple captured images are taken continuously, the processor (115) may identify whether the user is moving farther away or closer based on the change in the size of the face area within the captured images. As described above, the processor (115) can identify the distance and direction to the user based on the captured images. This object recognition method is merely an example and is not necessarily limited thereto, and the processor (115) may input the captured images into an artificial intelligence model trained for user identification and identify the distance or face direction of the user based on the result.
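As a non-limiting illustration, the size-based distance estimate and the approaching/receding determination described above may be sketched as follows. The sketch assumes a pinhole-camera model, an average face width of 0.15 m, and a 5% change tolerance; these values and all function names are hypothetical and do not appear in the present disclosure.

```python
from typing import Optional, Sequence

# Assumed average adult face width in metres (illustrative value only).
ASSUMED_FACE_WIDTH_M = 0.15


def estimate_distance_m(face_width_px: float, focal_length_px: float,
                        real_width_m: float = ASSUMED_FACE_WIDTH_M) -> float:
    """Pinhole-camera estimate: distance = focal_length * real_width / pixel_width."""
    if face_width_px <= 0:
        raise ValueError("face_width_px must be positive")
    return focal_length_px * real_width_m / face_width_px


def distance_trend(face_widths_px: Sequence[float], tolerance: float = 0.05) -> str:
    """Compare the face width in the first and last frames of a sequence.

    A growing face area means the user is getting closer; a shrinking one
    means the user is moving away.
    """
    if len(face_widths_px) < 2:
        return "steady"
    first, last = face_widths_px[0], face_widths_px[-1]
    change = (last - first) / first
    if change > tolerance:
        return "approaching"
    if change < -tolerance:
        return "receding"
    return "steady"


def facing_device(body_detected: bool, face_detected: bool) -> Optional[bool]:
    """Return None when no user is visible, otherwise whether a face is visible.

    A visible body without a visible face is treated as the user facing away.
    """
    if not body_detected:
        return None
    return face_detected
```

For instance, with an assumed focal length of 600 pixels, a face that is 100 pixels wide would be estimated at roughly 0.9 m from the camera.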

[0100] According to one embodiment, the processor (115) can adjust the reception sensitivity of the microphone (125) and the volume corresponding to the audio output through the speaker (130) according to the distance and direction from the user that changes while moving by the driving device (135). The microphone (125) is configured to receive user voice or other sounds and convert them into audio data. The processor (115) can transmit the user voice signal received through the microphone (125) to a terminal device through the communication unit (105). Alternatively, the processor (115) can output information corresponding to the user's intention based on the user's voice signal received through the microphone (125).

[0101] The microphone (125) includes a diaphragm that vibrates by a signal input from the outside, an electrical circuit that outputs an electrical signal corresponding to the vibration of the diaphragm, and an amplification circuit for adjusting the magnitude of the electrical signal. The processor (115) can adjust the gain of the amplification circuit to adjust the reception sensitivity of the audio signal input through the microphone (125). The microphone (125) can be implemented in various forms such as a resistive microphone, a condenser microphone, a fiber optic microphone, a piezoelectric microphone, an electret microphone, a dynamic microphone, and a MEMS microphone depending on its operating principle, so specific illustration is omitted.
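As a non-limiting illustration, one way of choosing the gain of the amplification circuit from the distance to the user may be sketched as follows. The sketch assumes free-field propagation, in which the sound level falls by about 6 dB for each doubling of distance, a reference distance of 1 m, and clamping limits of -12 dB and +24 dB. The function name and all numeric values are hypothetical.

```python
import math


def mic_gain_db(distance_m: float, reference_m: float = 1.0,
                min_db: float = -12.0, max_db: float = 24.0) -> float:
    """Gain that offsets the inverse-distance loss relative to reference_m.

    Under the free-field assumption the level drops by 20*log10(d / d_ref) dB,
    so the same amount is added back and then clamped to the amplifier limits.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / reference_m)
    return max(min_db, min(max_db, gain))
```

Under these assumptions a user at 2 m would receive about 6 dB more gain than a user at the 1 m reference distance, and a user closer than the reference distance would receive a negative gain.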

[0102] The number and placement locations of the microphones (125) can be determined in various ways. For example, the electronic device (100) may include a single microphone (125) placed on the front of the main body of the electronic device (100). Alternatively, the electronic device (100) may include multiple microphones (125) placed in multiple different directions relative to the main body of the electronic device (100).

[0103] The speaker (130) can convert and amplify a digital audio signal processed by the processor (115) into an analog audio signal and output it. For example, the speaker (130) may include at least one speaker (130) unit capable of outputting at least one channel, a D / A converter, an audio amplifier, etc. For example, the speaker (130) may output information corresponding to the caller and the caller's intention regarding a received call. The processor (115) can adjust the volume of the speaker (130) by adjusting the amplification gain of the audio amplifier.

[0104] The electronic device (100) may include at least one speaker (130). The electronic device (100) may include a plurality of speakers (130) positioned on the left and right sides of the electronic device (100).

[0105] The driving device (135) is configured to move the main body of the electronic device (100). The driving device (135) may include a plurality of wheels, a driving motor for rotating each of the plurality of wheels, a gear, a shaft, etc. The plurality of wheels are provided on the lower side or side of the main body of the electronic device (100) to support the main body of the electronic device (100) from the bottom surface. When the driving motor operates and the driving force is transmitted to the plurality of wheels so that each wheel rotates, the electronic device (100) can be moved by the frictional force between the bottom surface and the wheels. In addition, the driving device (135) may change the rotational speed of at least one of the plurality of wheels or adjust the alignment direction of the wheels differently when changing direction. Depending on the type of electronic device (100) and the characteristics of the space where the electronic device (100) is located (e.g., roughness of the bottom surface, frictional force, etc.), an endless track or the like may be used instead of wheels.
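As a non-limiting illustration, changing direction by varying the rotational speed of the wheels may be sketched with standard differential-drive kinematics. The sketch assumes two driven wheels separated by a track width; the function name and parameters are hypothetical.

```python
from typing import Tuple


def wheel_speeds(linear_mps: float, angular_radps: float,
                 track_width_m: float) -> Tuple[float, float]:
    """Return (left, right) wheel ground speeds in m/s.

    A positive angular velocity turns the body counter-clockwise, which
    requires the right wheel to move faster than the left wheel.
    """
    if track_width_m <= 0:
        raise ValueError("track_width_m must be positive")
    half = track_width_m / 2.0
    left = linear_mps - angular_radps * half
    right = linear_mps + angular_radps * half
    return left, right
```

With equal wheel speeds the body moves straight; with opposite speeds it rotates in place, which is one way the main body could be turned so that a given microphone faces the user.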

[0106] As described above, the electronic device (100) can be implemented in various types. FIG. 3 is a block diagram for explaining the detailed configuration when implemented as an electronic device (100) equipped with a projection function.

[0107] As illustrated in FIG. 3, the electronic device (100) may include a communication unit (105), memory (110), one or more processors (115), a camera (120), at least one microphone (125), at least one speaker (130), and a driving device (135), and may further include a projection unit (140), an input / output interface (145), and a sensor (150). However, the configurations illustrated in FIG. 2 and FIG. 3 are merely exemplary, and it is understood that in carrying out the present disclosure, new configurations may be added or some configurations may be omitted in addition to the configurations illustrated in FIG. 2 and FIG. 3. The basic operation description and specific examples of the communication unit (105), memory (110), one or more processors (115), camera (120), at least one microphone (125), at least one speaker (130), and driving device (135) among the configurations of FIG. 3 have been described with reference to FIG. 2, so a redundant description is omitted.

[0108] The projection unit (140) can perform various functions under the control of the processor (115). Specifically, the projection unit (140) can project various images. The projection unit (140) may include a display module for displaying an image, a light source for emitting light containing the image displayed on the display module, and at least one lens for transmitting the emitted light in a projection direction.

[0109] The projection unit (140) may perform various operations related to image projection. For example, the projection unit (140) may adjust the focus of the image or perform a keystone correction function depending on the distance from the projection surface (e.g., projection distance). The keystone correction function refers to a function that corrects a distorted image. For example, the projection unit (140) may perform horizontal keystone correction if image distortion occurs in the left-right direction, and perform vertical keystone correction if image distortion occurs in the up-down direction. In addition, the projection unit (140) may perform quick corner keystone correction to correct the unbalanced corners of the area.

[0110] The sensor (150) is configured to acquire sensing information that allows the electronic device (100) to move through a space and identify a receiver. The sensor (150) may include at least one of a LiDAR sensor, a depth camera (120), an Inertial Measurement Unit (IMU) sensor, a Time of Flight (ToF) sensor, a light sensor, an infrared sensor, an ultrasonic sensor, a gyroscope, an accelerometer, and a proximity sensor.

[0111] A LiDAR sensor can project light rays (e.g., laser, near-infrared light, visible light, ultraviolet light, etc.) in a 360-degree direction around it and detect light reflected by various surrounding objects (e.g., walls, furniture, home appliances, etc.) to output sensing information for obtaining information about the distance to surrounding objects. A depth camera is a sensor that projects a laser or infrared light onto an external object and receives the returning light rays with a stereo camera to measure the distance to the external object in three dimensions and sense depth data. An IMU sensor is a sensor for detecting the movement of an electronic device (100) and may include at least one of a geomagnetic sensor, an accelerometer, and a gyroscope. A ToF sensor can measure the distance to an external object by using the time (flight time) when the reflected signal is received after outputting a signal such as a laser.
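As a non-limiting illustration, the flight-time distance measurement of a ToF sensor may be sketched as follows, assuming an optical signal travelling at the speed of light and a measured round-trip time. The function name is hypothetical.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the object: the signal covers the path twice, so divide by 2."""
    if round_trip_s < 0:
        raise ValueError("round_trip_s must be non-negative")
    return SPEED_OF_LIGHT_MPS * round_trip_s / 2.0
```

A round-trip time of 10 ns corresponds to a distance of about 1.5 m.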

[0112] In the above description, the processor (115) is described as identifying the distance and direction to the user based on an image acquired through the camera (120); however, if a sensor (150) is additionally included, the processor (115) can measure the distance and direction to the user based on the sensing values of various sensors. For example, the distance to the user can be measured through a LiDAR sensor or a ToF sensor.

[0113] The input / output interface (145) may be any one of the following interfaces: HDMI (High Definition Multimedia Interface), MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (Display Port), Thunderbolt, VGA (Video Graphics Array) port, RGB port, D-SUB (D-subminiature), and DVI (Digital Visual Interface).

[0114] When a content source, such as a multimedia player or a game player, is connected through the input / output interface (145) of the electronic device (100), the processor (115) may control the projection unit (140) to project a content image provided by the content source.

[0115] The input / output interface (145) can be connected to a communication circuit. The input / output interface (145) can transmit information received from an external device to the communication circuit, or transmit information received through the communication circuit to an external device.

[0116] FIG. 4 is a drawing showing the case where an electronic device (100) according to at least one embodiment of the present disclosure moves to an area corresponding to the front direction of the user.

[0117] When the identified event corresponds to a phone call made while walking and the user then continues the phone call without moving, the processor (115) can control the driving device (135) so that the electronic device (100) moves to an area within a preset distance corresponding to the user's frontal direction and then directs the at least one microphone (125) toward the user.

[0118] If the user maintains a phone call without moving, the processor (115) can control the driving device (135) to move to a pre-set distance area corresponding to the user's front direction. Specifically, the processor (115) can control the driving device (135) to move to a position within a pre-set distance relative to the user's front direction. For example, the processor (115) can control the driving device (135) to move to a position 1m away from the user.

[0119] The processor (115) can control the driving device (135) to be positioned closer or further away than a preset distance depending on the volume and frequency of the user's voice. For example, the processor (115) can control the driving device (135) to be positioned closer than a preset distance if the volume of the user's voice is low.
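As a non-limiting illustration, selecting a position closer or farther than the preset distance according to the volume of the user's voice may be sketched as follows. The 50 dB and 70 dB thresholds, the 0.3 m step, and the choice to move farther away for a loud voice are assumptions made for illustration only, and the frequency of the voice is not modelled. The function name is hypothetical.

```python
def target_distance_m(voice_level_db: float, preset_m: float = 1.0,
                      quiet_db: float = 50.0, loud_db: float = 70.0,
                      step_m: float = 0.3) -> float:
    """Pick a standoff distance from the measured voice level.

    Quiet voice -> move closer than the preset distance.
    Loud voice  -> move farther than the preset distance (assumed behaviour).
    Otherwise   -> keep the preset distance.
    """
    if voice_level_db < quiet_db:
        return preset_m - step_m
    if voice_level_db > loud_db:
        return preset_m + step_m
    return preset_m
```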

[0120] When a user starts speaking while stationary, the processor (115) can control the driving device (135) to move to a preset distance area corresponding to the user's front direction. For example, when a user makes a phone call while stationary, the processor (115) can control the driving device (135) to move to the user's front direction and receive the call through the communication unit (105).

[0121] According to another embodiment, when a user activates a voice recognition mode while stationary, the processor (115) can control the driving device (135) to move to a pre-set distance area corresponding to the user's front direction. Specifically, when the processor (115) receives a voice signal containing a trigger voice from the user, it can control the driving device (135) to move to a pre-set distance area corresponding to the user's front direction. The processor (115) can output an answer corresponding to the user's voice or perform an action corresponding to the user's voice at the moved location.

[0122] Referring to FIG. 4, it is illustrated that when a user speaks a voice asking about today's weather, the electronic device (100) moves to a pre-set distance area corresponding to the user's front direction. The electronic device (100) can output an answer corresponding to the user's voice while moving to the position in the user's front direction. When voice is subsequently input from the user, the processor (115) can continue the conversation by outputting an answer to the user's voice at the position in the user's front direction.

[0123] The processor (115) can control the driving device (135) so that at least one microphone (125) is directed toward the user after the electronic device (100) moves to an area within a preset distance corresponding to the user's front direction. For example, if the electronic device (100) includes a main microphone among one or more microphones (125), the processor (115) can control the driving device (135) so that the main microphone is directed toward the user. Here, the main microphone may be a microphone with superior audio input performance compared to other microphones, but is not limited thereto, and may be a microphone placed at a specific location on the main body of the electronic device (100). For example, as described in the examples above, when the electronic device (100) is driving along with the user in a front area offset by about 15 degrees from the user's front direction, a microphone placed in a side direction relative to the main body of the electronic device (100) may be advantageous for receiving the user's voice. Therefore, such a microphone can be used as the main microphone.

[0124] Alternatively, the processor (115) can control the driving device (135) so that one or more microphones (125), specifically the microphone (125) positioned on the front of the electronic device (100), is directed toward the user. By controlling the driving device (135) so that at least one microphone (125) is directed toward the user, the processor (115) can better receive the user's voice input.

[0125] The processor (115) can control the driving device (135) to move to a position that does not hinder the user's movement within the diffusion range (10) of the user's voice and to drive in a direction corresponding to the user's movement when the user starts moving again after maintaining a phone call without moving.

[0126] The diffusion range (10) of the user's voice is the range in which the user's voice spreads out, and the processor (115) can efficiently receive voice input within the diffusion range (10) through the microphone (125). Referring to FIG. 4, the diffusion range (10) is illustrated while the user is speaking. The area within a preset distance corresponding to the user's frontal direction may be a location within the diffusion range (10) in which the user's voice spreads out.

[0127] The processor (115) can acquire a captured image of the user through the camera (120) and identify the direction corresponding to the user's movement based on the acquired captured image. The processor (115) can identify the direction corresponding to the user's movement through the direction of the user's gaze, lips, and face included in the captured image, as well as the direction of the legs.

[0128] Alternatively, the processor (115) can acquire a series of captured images of the user through the camera (120) and predict the direction corresponding to the user's movement through the direction in which the user's gaze and leg movements change through the series of captured images.

[0129] The processor (115) can control the driving device (135) to move to a position within the diffusion range (10) of the user's voice that does not hinder the user's movement when the user starts moving again after maintaining a phone call without moving. When the user starts moving, if the electronic device (100) remains in the frontal direction of the user, it may hinder the user's walking. Therefore, the processor (115) can control the driving device (135) to move to a position within the diffusion range (10) of the user's voice that does not hinder the user's walking, while moving a certain distance away from the user.

[0130] Additionally, the processor (115) can control the driving device (135) to drive in a direction corresponding to the user's movement. The processor (115) can control the driving device (135) to drive in a direction corresponding to the user's movement so that it can remain at a constant distance from the walking user.
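As a non-limiting illustration, a target position that stays within the diffusion range (10) while remaining off the user's walking path may be sketched as follows. The sketch assumes that the diffusion range is a sector centred on the user's front direction with a half-angle of 45 degrees, that the user's position and heading are expressed in a two-dimensional floor coordinate system, and that the electronic device (100) is offset by 15 degrees to one side. All names and values are hypothetical.

```python
import math
from typing import Tuple


def follow_position(user_xy: Tuple[float, float], heading_rad: float,
                    distance_m: float = 1.0, offset_deg: float = 15.0,
                    half_angle_deg: float = 45.0) -> Tuple[float, float]:
    """Return a floor position ahead of the user, rotated off the walking line.

    offset_deg moves the target sideways so the device does not block the
    user; it must stay inside the assumed diffusion sector (half_angle_deg).
    """
    if abs(offset_deg) > half_angle_deg:
        raise ValueError("offset would leave the assumed diffusion range")
    angle = heading_rad + math.radians(offset_deg)
    x = user_xy[0] + distance_m * math.cos(angle)
    y = user_xy[1] + distance_m * math.sin(angle)
    return x, y
```

Recomputing this target each time the user's position and heading are updated would keep the electronic device (100) at a constant distance from the walking user.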

[0131] FIG. 5 is a drawing showing a case in which the volume of a plurality of microphones (125) of an electronic device (100) according to at least one embodiment of the present disclosure is adjusted.

[0132] Referring to FIG. 5, at least one microphone (125) may include a plurality of microphones (125-1, 125-2, 125-3, 125-4) arranged in different directions relative to the main body of the electronic device (100). For example, two microphones (125-1, 125-2) may be arranged in the front direction on both sides relative to the main body of the electronic device (100), and two microphones (125-3, 125-4) may be arranged in the rear direction on both sides. Alternatively, two microphones may be arranged on the front of the electronic device (100) and two microphones on the rear. The arrangement of the plurality of microphones (125) of the electronic device (100) shown in FIG. 5 is merely an example and is not limited to FIG. 5.

[0133] While the electronic device (100) is moving, a noise source (20) may be within a preset area. Specifically, the noise source (20) may be within a certain range relative to the location of the electronic device (100) and the user. For example, the noise source (20) may be in the opposite direction from the user relative to the electronic device (100), or in the same direction as the user. The noise source (20) may be a device or object that generates an audio signal unrelated to the operation of the electronic device (100). For example, if a barking dog or an operating robot vacuum cleaner is nearby, the dog or the robot vacuum cleaner may be the noise source (20).

[0134] While the electronic device (100) is driving, the microphones (125) directed toward the user and the microphones (125) directed toward the noise source (20) may change. Referring to FIG. 5, at the initial position where driving begins, the microphones (125-1, 125-3) positioned on the left of the main body of the electronic device (100) are directed toward the user, and the microphones (125-2, 125-4) positioned on the right of the main body of the electronic device (100) are directed toward the noise source (20).

[0135] At the final position after the electronic device (100) has driven, the microphones (125-1, 125-2) positioned at the front relative to the main body of the electronic device (100) are facing the user, and the microphones (125-3, 125-4) positioned at the rear relative to the main body of the electronic device (100) are facing the noise source (20).

[0136] The processor (115) can increase the reception sensitivity of the microphones (125) directed toward the user among the plurality of microphones and decrease the reception sensitivity of the microphones (125) directed toward the noise source (20). Referring to FIG. 5, at the initial position where the electronic device (100) starts driving, the processor (115) can increase the reception sensitivity of the microphones (125-1, 125-3) positioned on the left relative to the main body of the electronic device (100) and decrease the reception sensitivity of the microphones (125-2, 125-4) positioned on the right relative to the main body of the electronic device (100), which are directed toward the noise source (20).

[0137] Additionally, at the final position after the electronic device (100) has driven, the processor (115) can increase the reception sensitivity of the microphones (125-1, 125-2) positioned at the front relative to the main body of the electronic device (100), and decrease the reception sensitivity of the microphones (125-3, 125-4) positioned at the rear relative to the main body of the electronic device (100), which are facing the noise source (20).
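As a non-limiting illustration, the per-microphone sensitivity adjustment of FIG. 5 may be sketched as follows. The sketch assumes that each microphone is described by a bearing in degrees in the body frame of the electronic device (100) (0 at the front, positive toward the left), that the bearings of the user and the noise source (20) are already known, and that the adjustment is a fixed boost or cut of 6 dB. The bearings assigned to the microphones (125-1 to 125-4), the function names, and the gain values are hypothetical.

```python
from typing import Dict


def angular_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def mic_gains_db(mic_bearings_deg: Dict[str, float], user_bearing_deg: float,
                 noise_bearing_deg: float, boost_db: float = 6.0,
                 cut_db: float = -6.0) -> Dict[str, float]:
    """Boost microphones that point nearer to the user than to the noise source."""
    gains = {}
    for name, bearing in mic_bearings_deg.items():
        to_user = angular_diff_deg(bearing, user_bearing_deg)
        to_noise = angular_diff_deg(bearing, noise_bearing_deg)
        gains[name] = boost_db if to_user < to_noise else cut_db
    return gains


# Assumed layout: front-left, front-right, rear-left, rear-right.
MICS = {"125-1": 45.0, "125-2": -45.0, "125-3": 135.0, "125-4": -135.0}

# Initial position: user on the left, noise source on the right.
initial = mic_gains_db(MICS, user_bearing_deg=90.0, noise_bearing_deg=-90.0)
# Final position: user in front, noise source behind.
final = mic_gains_db(MICS, user_bearing_deg=0.0, noise_bearing_deg=180.0)
```

Under these assumptions the left microphones (125-1, 125-3) are boosted at the initial position and the front microphones (125-1, 125-2) are boosted at the final position, matching the description above.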

[0138] Meanwhile, the processor (115) can adjust the reception sensitivity of a plurality of microphones (125) based on the distance between the electronic device (100) and the user while the electronic device (100) is driving. Referring to FIG. 5, the distance from the user to the microphone (125-2) positioned on the right front side relative to the main body of the electronic device (100) has decreased while driving. Therefore, the processor (115) can reduce the reception sensitivity of the microphone (125-2) positioned on the right front side relative to the main body of the electronic device (100) while the electronic device (100) is driving. The reception sensitivity of the remaining microphones (125) can also be adjusted based on the distance from the user.

[0139] FIG. 6 is a diagram showing the operation of an electronic device (100) in a situation where the voices of a plurality of users are input according to at least one embodiment of the present disclosure.

[0140] When voice input from multiple users is received while an event is being identified, the processor (115) can control the driving device (135) to move to a position where the diffusion range (10) of each user's voice overlaps with each other, based on an image obtained through the camera (120).

[0141] According to one embodiment, the voices of multiple users may be input while an event is being identified. For example, a phone call may be received by multiple users who are together in one space, and the multiple users may converse with the other party of the phone call through the electronic device (100). As another example, while a user is making a phone call through the electronic device (100), another user may join the phone call.

[0142] The processor (115) identifies that various voices have been input based on the frequency, resonance characteristics, timbre, etc. of the voice, and thereby identifies that multiple users have participated in the conversation. For example, the processor (115) can identify multiple users by extracting the frequency spectrum of the voice.

[0143] Subsequently, the processor (115) can identify the source of the voice signal. Specifically, the processor (115) can identify the source of the sound through the difference in the time it takes for the voice signal to reach a plurality of microphones (125). The processor (115) can identify each user corresponding to each voice.
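
As a hedged illustration of locating a sound source from arrival-time differences, the sketch below estimates a bearing from a single two-microphone pair under a far-field assumption. The function name, the sign convention, and the speed-of-sound constant are assumptions; the disclosure does not state which localization method is used.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 degrees C

def bearing_from_tdoa(delay_s, mic_spacing_m):
    """Estimate the arrival angle in degrees for a two-microphone pair.

    delay_s is (arrival time at microphone B) minus (arrival time at microphone A).
    0 degrees is broadside to the pair; positive angles lean toward microphone A.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp to tolerate measurement noise
    return math.degrees(math.asin(ratio))
```

A device with four microphones could combine the estimates of several pairs to resolve the front/back ambiguity that a single pair leaves open.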

[0144] When voice is input from multiple users, the processor (115) can acquire captured images of multiple users through the camera (120). The processor (115) can identify the location of multiple users through the identified sound source and acquire captured images of multiple users through the camera (120). The processor (115) can identify the frontal direction of multiple users based on the captured images. Specifically, the processor (115) can identify the head direction or gaze direction of multiple users based on the captured images. Subsequently, the processor (115) can identify the diffusion range (10) of the user voice based on the frontal direction of multiple users. Referring to FIG. 6, the processor (115) can identify the diffusion range (10-1) of the voice of user (A) and the diffusion range (10-2) of the voice of user (B).

[0145] The processor (115) can control the driving device (135) to move to a position where the diffusion range (10) of each of the multiple users overlaps with each other. Referring to FIG. 6, the processor (115) controls the movement to a position where the diffusion range (10-1) of the voice of user (A) and the diffusion range (10-2) of the voice of user (B) overlap, and can receive the voices of the multiple users through the microphone (125).
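
One way to picture the overlap position is to model each diffusion range (10) as a circular sector in front of the user and search a floor grid for points inside every sector. The sector model, the 45 degree half-angle, the 3 m radius, and the grid search are all assumptions for illustration only.

```python
import math

def in_diffusion_range(point, user_pos, facing_deg,
                       half_angle_deg=45.0, radius_m=3.0):
    """True if point lies inside the assumed sector in front of the user."""
    dx, dy = point[0] - user_pos[0], point[1] - user_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > radius_m:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(diff) <= half_angle_deg

def overlap_target(users, step_m=0.25, extent_m=5.0):
    """users is a list of (position, facing_deg) pairs.

    Returns the centroid of the grid points that lie inside every
    user's sector, or None when the sectors do not overlap.
    The grid is centered on the origin of the floor coordinate frame.
    """
    n = int(extent_m / step_m)
    hits = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            p = (i * step_m, j * step_m)
            if all(in_diffusion_range(p, pos, facing) for pos, facing in users):
                hits.append(p)
    if not hits:
        return None
    return (sum(p[0] for p in hits) / len(hits),
            sum(p[1] for p in hits) / len(hits))
```

For two users standing side by side and turned toward a common point, the returned position lies between them and in front of both, which is the situation drawn for users (A) and (B) in FIG. 6.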

[0146] The processor (115) can control the driving device (135) to position the microphone (125) in a direction that can receive the voices of multiple users at a location where the diffusion ranges (10) of each of the multiple users overlap each other. Specifically, the processor (115) can control the driving device (135) so that the microphone (125) is directed toward one of the multiple users. For example, the processor (115) can control the driving device (135) so that the microphone (125) is directed toward the user who was previously on a phone call. Alternatively, the processor (115) can control the driving device (135) so that the microphone (125) is directed toward the middle direction of the multiple users.

[0147] If the electronic device (100) includes one or more microphones (125), the processor (115) can control the driving device (135) so that the microphone (125) positioned at the front faces the main user.

[0148] In at least one embodiment of the present disclosure, the processor (115) can control the driving device (135) so that the microphone (125) is directed toward the primary user among a plurality of users. Further details regarding this will be described later.

[0149] FIG. 7 is a diagram showing the operation of an electronic device (100) in a situation where there is a primary user among a plurality of users according to at least one embodiment of the present disclosure.

[0150] The processor (115) can control the driving device (135) to move to an area within a preset distance corresponding to the frontal direction of the user identified as the main user, based on at least one of the number of voice inputs and the voice input time of each of the multiple users.

[0151] The primary user may be the user who speaks more proactively during a call among multiple users. For example, the primary user may be a user who inputs voice more frequently or for a longer duration than other users. The number of voice inputs may refer to the number of times a user speaks within a certain period. The voice input duration may refer to the time during a conversation when the user speaks without interruption.

[0152] The processor (115) can identify a primary user based on at least one of the number of voice inputs and the voice input time of each of a plurality of users. Specifically, the processor (115) can identify each user corresponding to each voice and can measure the number of times and the time when each user's voice is input. The processor (115) can identify a primary user among a plurality of users by comparing the number of voice inputs and the voice input time of each user. Referring to FIG. 7, the processor (115) can identify user (A) as the primary user.
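
The comparison described in paragraph [0152] could be sketched as follows. The `SpeakerStats` record, the 1.5x margin, and the rule that the leader must dominate talk time without trailing in utterance count are hypothetical choices; the disclosure only says that the number of inputs and the input time are compared.

```python
from dataclasses import dataclass

@dataclass
class SpeakerStats:
    utterances: int = 0       # number of voice inputs in the observation window
    talk_time_s: float = 0.0  # accumulated speaking time in seconds

def identify_primary_user(stats, margin=1.5):
    """Return the id of a clearly dominant speaker, or None if nobody dominates."""
    ranked = sorted(stats.items(),
                    key=lambda kv: (kv[1].talk_time_s, kv[1].utterances),
                    reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top_id, top), (_, second) = ranked[0], ranked[1]
    dominates_time = top.talk_time_s >= margin * second.talk_time_s
    if dominates_time and top.utterances >= second.utterances:
        return top_id
    return None
```

Returning None when no speaker dominates leaves room for the fallback of moving to the overlap of the diffusion ranges (10).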

[0153] The processor (115) can control the driving device (135) to move to an area within a preset distance corresponding to the frontal direction of the identified main user. The processor (115) can better receive the main user's voice by positioning itself closer to the main user, who has a higher volume of speech.

[0154] Referring to FIG. 7, the processor (115) can control the driving device (135) to move to a pre-set distance area corresponding to the frontal direction of the main user (A). At this time, the pre-set distance area corresponding to the frontal direction may be a location close to the main user among the areas where the diffusion range (10) of a plurality of user voices overlaps.

[0155] Meanwhile, the processor (115) can control the driving device (135) so that the microphone (125) is directed toward the main user. For example, if the electronic device (100) includes one or more microphones (125), the processor (115) can control the driving device (135) so that the microphone (125) positioned at the front is directed toward the main user.

[0156] FIG. 8 is a diagram showing the operation of an electronic device (100) in a situation where some of the users among a plurality of users are leaving, according to at least one embodiment of the present disclosure.

[0157] The processor (115) can control the driving device (135) to move the position of the electronic device (100) based on the position of the user within the preset area when at least one of the multiple users moves outside the preset area.

[0158] While multiple users are participating in a phone call, at least one user may move outside a pre-set area. The pre-set area may be an area within a pre-set distance from the electronic device (100). For example, the pre-set area may be an area within a distance where the electronic device (100) can receive the user's voice input. As one case, at least one user may stop participating in the phone call and leave that space. FIG. 8 illustrates user (B) leaving.

[0159] The processor (115) can identify that a user has left when at least one user remains within the preset area and another user is moving away from the electronic device (100).

[0160] The processor (115) can identify whether at least one user has moved outside a preset area based on the captured image. The processor (115) can determine whether at least one user has moved outside a preset area by identifying an object within the captured image as described above.

[0161] Alternatively, the processor (115) can identify whether at least one user has moved outside a preset area based on the user's voice. Specifically, the processor (115) can identify whether at least one user has moved outside a preset area if the voice of at least one user is not input for a preset time.
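
The two alternatives in paragraphs [0160] and [0161] might be combined as in the sketch below: the image-based distance is used when the user is visible, and the silence timeout is used otherwise. The track dictionary layout, the 4 m area radius, and the 30 s timeout are assumptions, not values from the disclosure.

```python
def left_by_image(distance_m, max_distance_m=4.0):
    """Image-based check: the user is farther away than the preset area allows."""
    return distance_m > max_distance_m

def left_by_voice(last_voice_s, now_s, silence_timeout_s=30.0):
    """Voice-based check: no voice input for longer than the preset time."""
    return now_s - last_voice_s > silence_timeout_s

def remaining_users(tracks, now_s):
    """tracks maps user id -> {"distance_m": float or None, "last_voice_s": float}.

    distance_m is None when the user is not visible in the captured image.
    """
    remaining = []
    for uid, track in tracks.items():
        if track["distance_m"] is not None:
            gone = left_by_image(track["distance_m"])
        else:
            gone = left_by_voice(track["last_voice_s"], now_s)
        if not gone:
            remaining.append(uid)
    return remaining
```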

[0162] When the processor (115) identifies that at least one user has moved outside a preset area, it can control the driving device (135) to move the position of the electronic device (100) based on the location of the user within the preset area.

[0163] The processor (115) can set, as the movement position, a location where the diffusion ranges (10) of the voices of the users remaining within the preset area overlap.

[0164] When one user remains within a preset area from the electronic device (100), the processor (115) can set an area corresponding to the front direction of one user as a movement position. Referring to FIG. 8, the processor (115) can set an area corresponding to the front direction of user (A) as a movement position based on user (A) remaining within a preset distance from the electronic device (100).

[0165] The processor (115) can control the driving device (135) to move the electronic device (100) to the position that has been reset based on the user remaining within the preset area from the electronic device (100).

[0166] FIG. 9 is a drawing showing a case where an electronic device (100) is located within an area where a communication connection is possible according to at least one embodiment of the present disclosure.

[0167] The processor (115) can receive a call received from the terminal device (30) through the communication unit (105). For example, the processor (115) can receive a call received from the terminal device (30) via Bluetooth communication.

[0168] When the identified event is an event corresponding to a phone call while walking, and the user moves out of the communication connection area (40) between the device (30) (hereinafter referred to as the 'terminal device (30)') and the communication unit (105) during the call, the processor (115) can control the driving device (135) to move to a boundary position between the diffusion range (10) of the user's voice and the communication connection area (40).

[0169] The communication connection area (40) may be a distance at which the terminal device (30) and the communication unit (105) can communicate. For example, in the case of Bluetooth communication, it may be the minimum distance at which the terminal device (30) and the communication unit (105) can communicate via Bluetooth. FIG. 9 illustrates a communication connection area (40) at which the terminal device (30) and the communication unit (105) can communicate.

[0170] A user may move out of the area (40) where communication is possible between the terminal device (30) and the communication unit (105) while making a phone call. For example, a user making a phone call while walking may move away from the terminal device (30), which remains in a fixed position, and thereby leave the area (40) where communication is possible. Alternatively, the user may be within the area (40) where communication is possible, but the electronic device (100) may move out of the area (40) where communication is possible while moving in a direction corresponding to the user's movement. FIG. 9 illustrates a user moving out of the area (40) where communication is possible.

[0171] In this case, the processor (115) can control the driving device (135) to move to a boundary location between the diffusion range (10) of the user voice and the area (40) where communication is possible. When the user moves out of the area (40) where communication is possible, the processor (115) can identify a boundary location between the diffusion range (10) and the area (40) where communication is possible and control the driving device (135) to move to that location and wait.

[0172] When a user is within a communication connection area (40), but the electronic device (100) moves in a direction corresponding to the user's movement and moves out of the communication connection area (40), the processor (115) can identify a boundary location between the diffusion range (10) and the communication connection area (40) and control the driving device (135) to move to that location and wait.
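
A minimal geometric sketch of the waiting position follows, assuming the communication connection area (40) is a circle of known radius around the terminal device (30). The circular model, the safety margin, and all names are assumptions; whether the resulting point also lies inside the diffusion range (10) would need a separate check.

```python
import math

def boundary_wait_point(terminal_pos, user_pos, comm_radius_m,
                        safety_margin_m=0.2):
    """Return the point just inside the assumed circular area that is closest to the user.

    Returns None while the user is still inside the area, since no
    boundary waiting is needed in that case.
    """
    dx = user_pos[0] - terminal_pos[0]
    dy = user_pos[1] - terminal_pos[1]
    dist = math.hypot(dx, dy)
    if dist <= comm_radius_m:
        return None
    r = comm_radius_m - safety_margin_m  # stay slightly inside the boundary
    return (terminal_pos[0] + r * dx / dist,
            terminal_pos[1] + r * dy / dist)
```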

[0173] Meanwhile, the processor (115) may output a signal sound through the speaker (130) while waiting at the boundary position between the diffusion range (10) and the area (40) where communication is possible. The signal sound may be an alert sound indicating that the user has moved out of the area (40) where communication is possible. Alternatively, the signal sound may be a phrase indicating that the user has moved out of the area (40) where communication is possible. By outputting the signal sound, the processor (115) may induce the user not to move out of the area (40) where communication is possible any further. Referring to FIG. 9, the electronic device (100) may output a signal sound to induce the user to turn around and walk into the area (40) where communication is possible.

[0174] FIG. 10 is a drawing showing the operation of controlling the output of two speakers (130) of an electronic device (100) according to at least one embodiment of the present disclosure.

[0175] At least one speaker (130) may include at least one left speaker (130-1) positioned on the left side of the electronic device (100) and at least one right speaker (130-2) positioned on the right side. The processor (115) may output audio signals differently to the right speaker (130-2) and the left speaker (130-1) of the electronic device (100), respectively. Through this, the user can experience the three-dimensional quality of the audio.

[0176] The electronic device (100) can travel in a direction corresponding to the user's front direction. The direction corresponding to the user's front direction may be the same direction as the moving user's front direction. For example, if the electronic device (100) travels in a direction corresponding to the user's movement following the moving user, it can travel in a direction corresponding to the user's front direction.

[0177] In this case, the processor (115) can output a left audio signal through at least one left speaker (130-1) and a right audio signal through at least one right speaker (130-2) when the electronic device (100) is moving in a direction corresponding to the user's front direction.

[0178] On the other hand, the electronic device (100) may be positioned in a direction corresponding to the user's front direction. For example, when the user is stationary, the electronic device (100) may be positioned in a direction facing the user within a certain range of the user's front direction.

[0179] In this case, the processor (115) can output a left audio signal through at least one right speaker (130-2) and a right audio signal through at least one left speaker (130-1) if the position of the electronic device (100) corresponds to the front direction of the user.

[0180] Additionally, even if the angle formed by the front direction of the user and the front direction of the electronic device (100) is greater than or equal to a preset angle, the processor (115) can output a left audio signal through at least one right speaker (130-2) and output a right audio signal through at least one left speaker (130-1).

[0181] Through this, the user can feel the three-dimensionality of the audio by listening to an audio signal that matches the user's direction, regardless of the direction between the user and the electronic device (100).
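
The channel routing of paragraphs [0177] to [0180] can be sketched as a comparison of headings. The 90 degree threshold stands in for the unspecified preset angle, and the dictionary keys are hypothetical labels.

```python
def route_stereo(device_heading_deg, user_heading_deg, swap_threshold_deg=90.0):
    """Decide which audio channel each speaker plays.

    When the device heads the same way as the user, left stays left.
    When the headings differ by the threshold or more (for example,
    the device faces the user), the channels are swapped so the user
    still hears the left channel on the user's own left side.
    """
    diff = abs((device_heading_deg - user_heading_deg + 180.0) % 360.0 - 180.0)
    if diff >= swap_threshold_deg:
        return {"left_speaker": "right_channel", "right_speaker": "left_channel"}
    return {"left_speaker": "left_channel", "right_speaker": "right_channel"}
```

The same test could plausibly drive the left and right microphone assignment during recording, since the geometry is identical.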

[0182] Meanwhile, the processor (115) can adjust the volume of at least one left speaker (130-1) and at least one right speaker (130-2) respectively according to the distance and direction from the user as the electronic device (100) moves. For example, if at least one right speaker (130-2) moves closer to the user as the electronic device (100) moves, the processor (115) can reduce the volume of at least one right speaker (130-2). By adjusting the volume of at least one right speaker (130-2) and at least one left speaker (130-1) respectively, the electronic device (100) can provide a more accurate sound experience.

[0183] Meanwhile, the processor (115) can control the input of voice signals separated into left and right through microphones (125) placed on both sides when recording the user's voice. Through this, spatial depth and directionality of the recorded audio signal can be realized.

[0184] At this time, according to one embodiment of the present disclosure, the processor (115) can control the microphones (125) positioned on both sides to receive voice signals differently depending on the direction of the user's head.

[0185] At least one microphone (125) may include at least one left microphone positioned to the left of the main body of the electronic device (100) and at least one right microphone positioned to the right of the main body. The processor (115) can control the right microphone and the left microphone to receive different inputs, respectively. Through this, the electronic device (100) can record sound coming from the left and sound coming from the right differently.

[0186] The processor (115) can control the electronic device (100) to receive a left voice signal through at least one left microphone and a right voice signal through at least one right microphone when the electronic device (100) is moving in a direction corresponding to the user's front direction.

[0187] On the other hand, the processor (115) can control the electronic device (100) to receive a right voice signal through at least one left microphone and a left voice signal through at least one right microphone when the position of the electronic device (100) corresponds to the front direction of the user.

[0188] In addition, even if the angle formed by the front direction of the user and the front direction of the electronic device (100) is greater than a preset angle, the processor (115) can control to receive a right voice signal through at least one left microphone and receive a left voice signal through at least one right microphone.

[0189] FIG. 11 is a drawing showing the operation of an electronic device (100) according to at least one embodiment of the present disclosure when it cannot move to an area corresponding to the front direction of the user.

[0190] If the electronic device (100) is unable to move to an area corresponding to the user's front direction, the processor (115) can control the driving device (135) to move to an adjacent area of the user.

[0191] When a user is exercising, the user's frontal direction is often skewed upward or downward compared to normal situations, causing the user's voice to be dispersed. Therefore, when a user is exercising, the electronic device (100) must be positioned closer than a preset distance relative to the user's frontal direction in order to receive the user's voice better. Referring to FIG. 11, as the user exercises, the user's frontal direction is skewed upward, causing the user's voice to be dispersed.

[0192] Alternatively, the electronic device (100) may be unable to move because there is an obstacle within the area corresponding to the user's front direction. For example, the electronic device (100) may be blocked by an obstacle such as furniture in front of the user and unable to move in the user's front direction.

[0193] At this time, the processor (115) can control the driving device (135) to move to an adjacent area of the user.

[0194] The user's adjacent area may be a location closer to the user than a pre-set distance area corresponding to the user's frontal direction. In cases where the user's frontal direction is facing upward or downward, such as when the user is exercising, the processor (115) may control the driving device (135) to move to a location closer to the user than a pre-set distance relative to the user's frontal direction. Referring to FIG. 11, because the user's frontal direction is skewed upward, the electronic device (100) moves to the user's adjacent area so as to be closer to the user.

[0195] If the driving device (135) cannot be positioned in the frontal direction of the user due to an obstacle, the processor (115) can control the driving device (135) to be positioned in an area adjacent to the user. Specifically, the processor (115) can control the driving device (135) to move to a position close to the user from the side of the user.
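
The fallback described above can be sketched as an ordered list of candidate positions that is tested against an obstacle check. The candidate order, the 1.5 m and 0.8 m distances, and the `is_free` callback are assumptions; a real device would obtain reachability from its own map or sensors.

```python
import math

def candidate_positions(user_pos, facing_deg, preset_m=1.5, adjacent_m=0.8):
    """Yield candidate positions in order of preference.

    Angles are measured counterclockwise, so facing_deg + 90 is the user's left.
    """
    def at(angle_deg, dist_m):
        a = math.radians(angle_deg)
        return (user_pos[0] + dist_m * math.cos(a),
                user_pos[1] + dist_m * math.sin(a))

    yield at(facing_deg, preset_m)            # preferred: in front, preset distance
    yield at(facing_deg, adjacent_m)          # closer, still in front
    yield at(facing_deg + 90.0, adjacent_m)   # adjacent area on the user's left
    yield at(facing_deg - 90.0, adjacent_m)   # adjacent area on the user's right

def pick_position(user_pos, facing_deg, is_free):
    """Return the first candidate accepted by is_free, or None if all are blocked."""
    for p in candidate_positions(user_pos, facing_deg):
        if is_free(p):
            return p
    return None
```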

[0196] A method for controlling an electronic device (100) according to the present disclosure will be explained below through FIGS. 12 to 14.

[0197] FIG. 12 is a flowchart illustrating a control method of an electronic device (100) according to at least one embodiment of the present disclosure.

[0198] According to FIG. 12, when an event based on user input is identified, the electronic device (100) can move to an area corresponding to the user's front direction based on an image acquired through a camera (S1210).

[0199] The electronic device (100) can receive a phone call and output the voice of the received phone call through a speaker (130). It can also receive the user's voice through a microphone (125) and transmit it either directly to a caller's terminal device via a network connection or to a caller's terminal device via a Bluetooth connection.

[0200] The electronic device (100) can receive the user's voice and execute a voice recognition mode. Specifically, the electronic device (100) can receive the user's voice and provide a response corresponding to the user's voice or perform a corresponding action.

[0201] The user's front direction may be the direction in which the user's head is facing. Alternatively, it may be the direction of the user's gaze. The area corresponding to the user's front direction may be an area within the user's front direction area that is designed to receive the user's voice well. For example, the area corresponding to the front direction may be the diffusion range (10) of the user's voice.

[0202] As it moves to the area corresponding to the front direction, the electronic device (100) can change the reception sensitivity of at least one microphone (125) included in the electronic device (100), and can change the volume of the audio output through at least one speaker (130) included in the electronic device (100), based on the distance and direction from the user (S1220).

[0203] The distance from the user may be the distance between the electronic device (100) and the user. The electronic device (100) may decrease the reception sensitivity of at least one microphone (125) when the distance from the user decreases, and may increase the reception sensitivity when the distance from the user increases.

[0204] The electronic device (100) can decrease the volume corresponding to audio output through at least one speaker (130) when the distance from the user decreases, and can increase the volume when the distance from the user increases.

[0205] The direction relative to the user may be an angle formed by the electronic device (100) relative to the user's front direction. If the angle formed by the electronic device (100) relative to the user's front direction increases, the electronic device (100) can increase the volume of the microphone (125).
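
The adjustments in paragraphs [0203] to [0205] can be summarized in a small sketch. The linear distance scaling, the angle boost, and the clamping limits are assumptions chosen only to show the direction of each change.

```python
def adjust_audio(distance_m, angle_deg, ref_distance_m=1.0,
                 base_mic=1.0, base_volume=0.5):
    """Return (mic_sensitivity, speaker_volume) for the current geometry.

    Both values fall as the user gets closer and rise as the user gets
    farther away. The microphone value also rises as the device moves
    off the user's front direction (larger absolute angle).
    """
    def clamp(value, low, high):
        return max(low, min(high, value))

    scale = distance_m / ref_distance_m
    angle_boost = 1.0 + abs(angle_deg) / 180.0
    mic = clamp(base_mic * scale * angle_boost, 0.1, 4.0)
    volume = clamp(base_volume * scale, 0.0, 1.0)
    return mic, volume
```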

[0206] FIG. 13 is a flowchart illustrating in detail a method for controlling an electronic device (100) according to whether a user is walking, according to at least one embodiment of the present disclosure.

[0207] The electronic device (100) can identify an event based on user input (S1310). The electronic device (100) can identify an event corresponding to a phone call. Alternatively, the electronic device (100) can identify an event corresponding to a voice recognition mode.

[0208] While the user is walking, a phone call may be received and a phone call may be initiated, and the user may activate the voice recognition mode of the electronic device (100). The user may activate the voice recognition mode of the electronic device (100) by speaking a voice that includes a trigger voice. The electronic device (100) may recognize the user's voice and provide a response corresponding to the user's voice or perform a corresponding action.

[0209] The electronic device (100) can identify whether the user is moving based on a captured image of the user (S1320).

[0210] The electronic device (100) can acquire a continuous series of images of the user through a camera (120) and identify whether the user is moving through changes in the user's movement in the images. It can also identify the direction corresponding to the user's movement through the user's gaze and the direction in which the legs are extended.
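
As a rough sketch of movement detection from a series of images, the code below assumes that an upstream vision step has already converted each frame into an estimated floor position of the user. That conversion, the 0.3 m/s threshold, and the function names are assumptions and are not described in the disclosure.

```python
import math

def is_moving(positions, frame_interval_s, speed_threshold_m_s=0.3):
    """Decide whether the user is moving from consecutive (x, y) positions.

    Net displacement is used instead of path length so that small
    frame-to-frame jitter does not register as movement.
    """
    if len(positions) < 2:
        return False
    net = math.dist(positions[0], positions[-1])
    elapsed = frame_interval_s * (len(positions) - 1)
    return net / elapsed > speed_threshold_m_s

def movement_heading_deg(positions):
    """Heading of the net displacement, in degrees counterclockwise from +x."""
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```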

[0211] If the user maintains the phone call without moving, the electronic device (100) can move to a pre-set distance area corresponding to the user's front direction so that at least one microphone (125) is directed toward the user (S1330).

[0212] The electronic device (100) can move to an area within a preset distance corresponding to the user's front direction. For example, when the user makes a phone call while stationary, the electronic device (100) can move to the user's front direction and receive the call. Another example is when the user activates a voice recognition mode while stationary, the electronic device (100) can move to an area within a preset distance corresponding to the user's front direction and provide an answer corresponding to the user's voice or perform a corresponding action.

[0213] The electronic device (100) can move to an area within a preset distance corresponding to the front direction of the user and then direct at least one microphone (125) toward the user. For example, if the electronic device (100) includes a main microphone (125) among one or more microphones (125), the electronic device (100) can direct the main microphone (125) toward the user. Alternatively, the electronic device (100) can direct the microphone (125) placed on the front of the electronic device (100) among one or more microphones (125) toward the user. By directing at least one microphone (125) toward the direction of the user, the electronic device (100) can better receive the user's voice.

[0214] When the user resumes movement, the electronic device (100) can move to a position within the diffusion range (10) of the user's voice that does not interfere with the user's movement, and can drive in a direction corresponding to the user's movement (S1340).

[0215] The diffusion range (10) of the user's voice is the range in which the user's voice spreads, and the electronic device (100) can efficiently receive voice within the diffusion range (10) through the microphone (125).

[0216] The electronic device (100) can identify the direction corresponding to the user's movement based on the captured images. The electronic device (100) can identify the direction corresponding to the user's movement through the direction in which the user's gaze and legs are extended included in the captured images. Alternatively, the electronic device (100) can acquire a series of captured images of the user through a camera (120) and predict the direction corresponding to the user's movement through the direction in which the user's gaze and leg movements change through the series of captured images.

[0217] The electronic device (100) can move to a position that is a certain distance away from the user so as not to interfere with the user's walking, and can avoid the direction corresponding to the user's movement within the diffusion range (10) of the user's voice.

[0218] The electronic device (100) can move in a direction corresponding to the movement of the user. The electronic device (100) can move in a direction corresponding to the movement of the user so as to remain at a constant distance from the walking user.
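
Keeping a constant distance from a walking user could be approximated with a proportional speed command, sketched below under the assumption that the device travels ahead of the user in the same direction. The gain, target distance, and speed limit are hypothetical values.

```python
def follow_speed(gap_m, user_speed_m_s, target_gap_m=1.5,
                 gain=0.8, max_speed_m_s=1.5):
    """Return a forward speed command in m/s for a device leading the user.

    If the gap is larger than the target, the device slows down so the
    user catches up; if the gap is smaller, the device speeds up.
    """
    error = gap_m - target_gap_m
    command = user_speed_m_s - gain * error
    return max(0.0, min(max_speed_m_s, command))
```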

[0219] At least one of the receiving sensitivity of the microphone (125) and the volume of the speaker (130) can be adjusted according to the distance and direction from the user that changes while the electronic device (100) is moving (S1350).

[0220] FIG. 14 is a flowchart showing the operation of an electronic device (100) when the voices of a plurality of users are input according to at least one embodiment of the present disclosure.

[0221] Voice input from multiple users may be received while the electronic device (100) is performing a phone call (S1410). For example, a phone call may be received by multiple users who are together in one space, allowing multiple users to converse with the other party of the phone call through the electronic device (100). As another example, while a user is making a phone call through the electronic device (100), another user may join the phone call.

[0222] The electronic device (100) can identify the primary user among multiple users (S1420).

[0223] The primary user may be the user who speaks more proactively during a call among multiple users. For example, the primary user may be a user who inputs voice more frequently or for a longer duration than other users. The number of voice inputs may refer to the number of times a user speaks within a certain period. The voice input duration may refer to the time during a conversation when the user speaks without interruption.

[0224] The electronic device (100) can identify a primary user based on at least one of the number of voice inputs and the voice input time of each of a plurality of users. Specifically, the electronic device (100) can identify each user corresponding to each voice, and can measure the number of times and the length of time each user's voice is input. The electronic device (100) can identify a primary user among a plurality of users by comparing the number of voice inputs and the voice input time of each user.

[0225] If the main user is not identified, the electronic device (100) can move to a position where the diffusion range (10) of each of the multiple users' voices overlaps with each other (S1430).

[0226] The electronic device (100) can identify the frontal direction of multiple users based on captured images of multiple users. The electronic device (100) can identify the diffusion range (10) of the user's voice based on the frontal direction of multiple users. The electronic device (100) can move to a position where the diffusion range (10) of each of the multiple users' voices overlaps with one another.

[0227] Additionally, the electronic device (100) may position the microphone (125) in a direction that can receive the voices of multiple users at a location where the diffusion ranges (10) of each of the multiple users overlap each other. Specifically, the electronic device (100) may direct the microphone (125) toward one of the multiple users. For example, the electronic device (100) may direct the microphone (125) toward the user who was already on a phone call. Alternatively, the electronic device (100) may direct the microphone (125) toward the middle of the multiple users.

[0228] When the main user is identified, the electronic device (100) can move to an area within a preset distance corresponding to the main user's frontal direction (S1440). By positioning the electronic device (100) closer to the main user who has a higher volume of speech, it can better receive the main user's voice.

[0229] The area within a preset distance corresponding to the frontal direction of the main user may be a location close to the main user among the areas where the diffusion range (10) of multiple user voices overlaps.

[0230] Meanwhile, the electronic device (100) may direct the microphone (125) toward the main user. For example, if the electronic device (100) includes one or more microphones (125), the electronic device (100) may direct the microphone (125) positioned at the front toward the main user.

[0231] At least one of the multiple users may move outside a preset area (S1450). While the multiple users are participating in a phone call, at least one user may move away from the electronic device (100) by a preset distance. For example, at least one user may stop participating in the phone call and leave that space.

[0232] The electronic device (100) can identify that a user has left when at least one user remains within a preset area of the electronic device (100) while another user moves away from the electronic device (100).

[0233] The electronic device (100) can change its position based on the location of the user remaining within the preset area from the electronic device (100) (S1460).

[0234] The electronic device (100) can set, as the position to move to, a position that lies within the diffusion range (10) of the voice of each user remaining within a preset distance from the electronic device (100).

[0235] If one user remains within a preset area from the electronic device (100), the electronic device (100) can move to an area corresponding to the front direction of the user.

[0236] Afterward, the electronic device (100) can point the microphone (125) toward the remaining user from the moved position.
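The handling of a departing user in steps S1450 and S1460 could be sketched as below. This is a rough illustration only: the preset area is assumed to be a circle around the device, the front area is assumed to be a point at a fixed distance along the user's frontal direction, and the "hold" behaviour when nobody remains is an assumption not described in the disclosure. The names `TrackedUser`, `Plan`, and `plan_after_departure` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class TrackedUser:
    name: str
    x: float
    y: float
    heading: float  # frontal direction in radians


@dataclass
class Plan:
    mode: str                               # "hold", "front_of_user" or "overlap_of_ranges"
    target: Optional[Tuple[float, float]]   # None when a separate overlap search is needed
    remaining: List[str]                    # names of users still inside the preset area


def plan_after_departure(device_xy: Tuple[float, float],
                         users: Sequence[TrackedUser],
                         preset_radius: float = 4.0,
                         front_distance: float = 1.0) -> Plan:
    """Decide where to move once some users have left the assumed circular preset area."""
    stay = [u for u in users
            if math.hypot(u.x - device_xy[0], u.y - device_xy[1]) <= preset_radius]
    names = [u.name for u in stay]
    if not stay:
        # Assumption: with nobody left in the area, stay in place.
        return Plan("hold", device_xy, names)
    if len(stay) == 1:
        u = stay[0]
        # One user remains: move to a point in front of that user.
        target = (u.x + front_distance * math.cos(u.heading),
                  u.y + front_distance * math.sin(u.heading))
        return Plan("front_of_user", target, names)
    # Several users remain: a search for overlapping diffusion ranges is needed.
    return Plan("overlap_of_ranges", None, names)
```

After moving to the planned target, the device would point the microphone (125) toward the remaining user, as described in [0236].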

[0237] Meanwhile, the control method of the electronic device (100) described in FIGS. 12 to 14 may be performed by an electronic device (100) having the configuration of FIG. 2, but is not necessarily limited thereto, and may be performed by an electronic device (100) having various other configurations.

[0238] Although the operation of the electronic device has been described in various ways based on the case of a telephone call or voice recognition mode, the various embodiments of the present disclosure may also be applied to cases where various services are provided using at least one of a microphone or a speaker. For example, when performing various operations such as conducting an online video conference, listening to music output from the electronic device (100), or a user performing voice recording, the position of the electronic device, movement speed, microphone volume, speaker volume, etc., can be appropriately adjusted according to the distance and direction from the user as described above.

[0239] The various embodiments described above may be implemented individually, but are not necessarily limited thereto, and may be implemented together in combination with at least one other embodiment, either partially or wholly.

[0240] The methods according to the various embodiments of the present disclosure described above can be implemented merely through a software upgrade or a hardware upgrade of an existing electronic device (100).

[0241] Meanwhile, according to a specific example of the present disclosure, the control method according to the various embodiments described above may be implemented as software comprising instructions stored on a non-transitory machine-readable storage medium that can be read by various machines (e.g., computers), such as an electronic device (100).

[0242] Specifically, a program for performing the control method may be provided stored on a non-transitory computer-readable recording medium, the control method comprising the steps of: moving to an area corresponding to the user's frontal direction based on an image acquired through a camera when an event corresponding to user input is identified; controlling the reception sensitivity of at least one microphone included in an electronic device to change based on the distance and direction from the user resulting from the movement to the area corresponding to the frontal direction; and controlling the volume corresponding to audio output through at least one speaker included in the electronic device to change.
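The sensitivity and volume adjustment in the steps above could be sketched as follows. The disclosure does not give a formula, so this example assumes a free-field inverse-distance model (about 6 dB per doubling of distance relative to a 1 m reference), adds a linear off-axis compensation, and clamps the result. The function names, the reference distance, the off-axis slope, and the clamp limit are all assumptions.

```python
import math
from typing import Dict


def distance_gain_db(distance_m: float, reference_m: float = 1.0) -> float:
    """Gain that offsets free-field level loss relative to the reference distance."""
    # The lower bound on distance avoids taking the logarithm of zero.
    return 20.0 * math.log10(max(distance_m, 0.1) / reference_m)


def audio_settings(distance_m: float, off_axis_deg: float,
                   base_mic_db: float = 0.0, base_speaker_db: float = 0.0,
                   off_axis_db_per_90: float = 6.0,
                   max_boost_db: float = 12.0) -> Dict[str, float]:
    """Return microphone and speaker gains for the current distance and direction.

    off_axis_deg is the angle between the user's frontal direction and the
    line from the user to the device (0 means directly in front).
    """
    off_axis = min(abs(off_axis_deg), 180.0)
    boost = distance_gain_db(distance_m) + off_axis_db_per_90 * off_axis / 90.0
    # Clamp so that the gains stay within an assumed safe range.
    boost = max(-max_boost_db, min(max_boost_db, boost))
    return {
        "mic_gain_db": base_mic_db + boost,
        "speaker_gain_db": base_speaker_db + boost,
    }
```

Under these assumptions, a user 2 m away and directly in front of the device would receive roughly 6 dB of additional microphone gain and speaker volume compared with the 1 m reference, and the boost is capped at 12 dB for distant or strongly off-axis positions.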

[0243] When stored software or instructions are executed by the processor (115)(130), the processor (115)(130) may perform operations according to the various embodiments described above, either directly or by using other components. Instructions may include code generated or executed by a compiler or an interpreter. Here, 'non-transitory' means only that the storage medium is tangible and does not contain a signal; it does not distinguish whether data is stored semi-permanently or temporarily in the storage medium.

[0244] Additionally, according to one or more embodiments of the present disclosure, the method according to the various embodiments described above may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of the non-transitory readable recording medium described above, or online through an online store. In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created in a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0245] Additionally, each component (e.g., module or program) according to the various embodiments described above may be composed of a single entity or multiple entities, and some of the aforementioned sub-components may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., module or program) may be integrated into a single entity to perform the functions performed by each of the respective components prior to integration in the same or similar manner. The operations performed by the module, program, or other components according to the various embodiments may be executed sequentially, in parallel, iteratively, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

[0246] Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. It is understood that various modifications can be made by those skilled in the art without departing from the essence of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present disclosure.

Claims

1. An electronic device comprising: a camera; at least one microphone; at least one speaker; a communication unit; a driving device for moving the electronic device; a memory in which at least one instruction is stored; and at least one processor that operates according to execution of the at least one instruction, wherein the at least one processor is configured to: when an event based on user input is identified, control the driving device to move to an area corresponding to the user's frontal direction based on an image acquired through the camera; and control the reception sensitivity of the at least one microphone to be changed based on the distance and direction from the user resulting from the movement to the area corresponding to the frontal direction, and control the volume corresponding to the audio output through the at least one speaker to be changed.

2. The electronic device of claim 1, wherein the processor is configured to, after the identified event is identified as an event corresponding to a phone call while walking, if the user maintains the phone call without moving, control the driving device to move to an area within a preset distance corresponding to the user's frontal direction, and then direct the at least one microphone toward the user.

3. The electronic device of claim 2, wherein the processor is configured to, when the user starts moving again after maintaining the phone call without moving, control the driving device to move to a position within the diffusion range of the user's voice that does not hinder the user's movement, and to drive in a direction corresponding to the user's movement.

4. The electronic device of claim 1, wherein the at least one microphone includes a plurality of microphones arranged in different directions, and wherein the processor is configured to, if there is a noise source within a preset area, control the reception sensitivity of the microphone directed toward the user among the plurality of microphones to increase, and control the reception sensitivity of the microphone directed toward the noise source to decrease.

5. The electronic device of claim 1, wherein the processor is configured to, when voice inputs of a plurality of users are received while the event is being identified, control the driving device to move, based on an image acquired through the camera, to a position where the diffusion ranges of the plurality of users' voices overlap one another.

6. The electronic device of claim 5, wherein the processor is configured to control the driving device to move to an area within a preset distance corresponding to the frontal direction of a user identified as the main user based on at least one of the number of voice inputs and the voice input time of each of the plurality of users.

7. The electronic device of claim 5, wherein the processor is configured to, when at least one of the plurality of users moves outside a preset area, control the driving device to move the position of the electronic device based on the position of the user within the preset area.

8. The electronic device of claim 1, wherein the processor is configured to, after the identified event is identified as an event corresponding to a phone call while walking, when the user moves, during the call, outside an area in which communication is possible between a device corresponding to the phone call and the communication unit, control the driving device to move to a boundary position between the diffusion range of the user's voice and the area in which communication is possible.

9. The electronic device of claim 1, wherein the at least one speaker includes at least one left speaker positioned on the left side of the electronic device and at least one right speaker positioned on the right side of the electronic device, and wherein the processor is configured to: if the electronic device is moving in a direction corresponding to the frontal direction of the user, output a left audio signal through the at least one left speaker and output a right audio signal through the at least one right speaker; and if the position of the electronic device corresponds to the frontal direction of the user, output the left audio signal through the at least one right speaker and output the right audio signal through the at least one left speaker.

10. The electronic device of claim 1, wherein the processor is configured to control the driving device to move to an area adjacent to the user if movement to the area corresponding to the frontal direction of the user is not possible.

11. A method for controlling an electronic device, the method comprising: when an event based on user input is identified, moving to an area corresponding to the frontal direction of the user based on an image acquired through a camera; and controlling the reception sensitivity of at least one microphone included in the electronic device to change based on the distance and direction from the user resulting from the movement to the area corresponding to the frontal direction, and controlling the volume of audio output through at least one speaker included in the electronic device to change.

12. The control method of claim 11, wherein the moving comprises, after the identified event is identified as an event corresponding to a phone call while walking, if the user maintains the phone call without moving, moving to an area within a preset distance corresponding to the user's frontal direction and then directing the at least one microphone toward the user.

13. The control method of claim 12, further comprising, when the user starts moving again after maintaining the phone call without moving, moving to a position within the diffusion range of the user's voice that does not hinder the user's movement, and driving in a direction corresponding to the user's movement.

14. The control method of claim 11, wherein the at least one microphone includes a plurality of microphones arranged in different directions, and wherein the controlling comprises, if there is a noise source within a preset area, controlling the reception sensitivity of the microphone directed toward the user among the plurality of microphones to increase, and controlling the reception sensitivity of the microphone directed toward the noise source to decrease.

15. The control method of claim 11, further comprising, when voice inputs of a plurality of users are received while the event is being identified, moving, based on an image acquired through the camera, to a position where the diffusion ranges of the plurality of users' voices overlap one another.