Electronic device, and voice recognition method of electronic device
By having a first processor tune call audio into separate listening-oriented and recognition-oriented signals and pass them to a second processor over independent channels for speech recognition, the device addresses poor voice recognition during calls and improves recognition accuracy.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- SAMSUNG ELECTRONICS CO LTD
- Filing Date
- 2025-12-17
- Publication Date
- 2026-06-25
Figure KR2025022034_25062026_PF_FP_ABST
Abstract
Description
[0001] The embodiments disclosed in this document relate to a technology for performing speech recognition on an audio signal received by an electronic device from an external device and an audio signal transmitted by the electronic device to the external device.
[0002] As speech recognition technology advances, various electronic devices equipped with microphones provide speech recognition capabilities. For example, when an electronic device receives user speech through a microphone, it can generate text corresponding to the user speech through speech recognition. The electronic device can then provide functions or services corresponding to the user speech. For instance, the electronic device can store audio signals transmitted and received via a call recording function and perform speech recognition on the audio signals recorded during the call.
[0003] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.
[0004] An electronic device according to an embodiment disclosed in this document may include a microphone, a speaker, a communication circuit, a first processor, a second processor, and a memory. The first processor may be configured to: acquire, while in communication with an external device through the communication circuit, a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through the microphone; generate a second transmitting signal by tuning the first transmitting signal based on a first setting value; generate a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; and transmit the second transmitting signal and at least one of the second receiving signal or the third receiving signal to the second processor, each through an independent channel. The second processor may be configured to perform voice recognition on the second transmitting signal and on at least one of the second receiving signal or the third receiving signal.
[0005] A voice recognition method for an electronic device according to an embodiment disclosed in this document may include: acquiring, by a first processor of the electronic device during a call with an external device, a first receiving signal received through a communication circuit of the electronic device and a first transmitting signal corresponding to a user utterance input through a microphone of the electronic device; generating, by the first processor, a second transmitting signal by tuning the first transmitting signal based on a first setting value; generating, by the first processor, a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; transmitting, by the first processor, the second transmitting signal and at least one of the second receiving signal or the third receiving signal to a second processor of the electronic device, each through an independent channel; and performing, by the second processor, voice recognition on the second transmitting signal and on at least one of the second receiving signal or the third receiving signal.
[0006] A storage medium according to an embodiment disclosed in this document may store instructions and/or a program that, when executed by a first processor and/or a second processor of an electronic device, cause the first processor to: acquire, while in communication with an external device through a communication circuit of the electronic device, a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through a microphone of the electronic device; tune the first transmitting signal based on a first setting value to generate a second transmitting signal; tune the first receiving signal based on a second setting value and a third setting value, respectively, to generate a second receiving signal and a third receiving signal; and transmit the second transmitting signal and at least one of the second receiving signal or the third receiving signal to the second processor, each through an independent channel; and cause the second processor to perform voice recognition on the second transmitting signal and on at least one of the second receiving signal or the third receiving signal.
[0007] An electronic device according to an embodiment disclosed in this document may include a microphone, a speaker, a communication circuit, a first processor, a second processor, and a memory. The first processor may include at least one call recording module configured to transmit signals related to a call to the second processor while in a call with an external device through the communication circuit. The at least one call recording module may be configured to: acquire, during the call, a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through the microphone; generate a second transmitting signal by tuning the first transmitting signal based on a first setting value; generate a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; and transmit at least one of the first transmitting signal, the second transmitting signal, the first receiving signal, or the second or third receiving signal to the second processor through independent channels. The second processor may include a recording module configured to generate and store call recording data based on the first receiving signal and the first transmitting signal received from the first processor, and at least one voice recognition module configured to perform voice recognition on the second transmitting signal and on the second receiving signal or the third receiving signal received from the first processor.
[0008] FIG. 1 is a block diagram of an electronic device according to one embodiment.
[0009] FIG. 2 is a diagram showing the configuration of an electronic device according to one embodiment.
[0010] FIG. 3a is a diagram showing the configuration of an electronic device according to a comparative example.
[0011] FIG. 3b is a diagram showing the configuration of an electronic device according to one embodiment.
[0012] FIG. 3c is a diagram showing the configuration of an electronic device according to one embodiment.
[0013] FIG. 4a is a diagram showing the configuration of an electronic device according to a comparative example.
[0014] FIG. 4b is a diagram showing the configuration of an electronic device according to one embodiment.
[0015] FIG. 4c is a diagram showing the configuration of an electronic device according to one embodiment.
[0016] FIG. 4d is a diagram showing the configuration of an electronic device according to one embodiment.
[0017] FIG. 4e is a diagram showing the configuration of an electronic device according to one embodiment.
[0018] FIG. 5 is a flowchart of a voice recognition method of an electronic device according to one embodiment.
[0019] FIG. 6a is a flowchart of a voice recognition method of an electronic device according to one embodiment.
[0020] FIG. 6b is a flowchart of a voice recognition method of an electronic device according to one embodiment.
[0021] FIGS. 7a and 7b are flowcharts of voice recognition operations of an electronic device according to one embodiment.
[0022] FIG. 8 is a drawing showing a user interface provided by an electronic device according to one embodiment.
[0023] FIG. 9 shows an electronic device in a network environment according to various embodiments.
[0024] FIG. 10 is a block diagram of an electronic device for supporting at least one network communication according to one embodiment.
[0025] FIG. 11 is a block diagram illustrating an automatic voice recognition module according to one embodiment.
[0026] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.
[0027] FIG. 1 is a block diagram of an electronic device according to one embodiment.
[0028] According to one embodiment, an electronic device (100) (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) may include a microphone (110), a speaker (120), a communication circuit (130) (e.g., a communication circuit (210; 301b; 303c; 960)), a memory (140) (e.g., a memory (920)), a first processor (150) (e.g., a first processor (220; 303b; 303c), a processor (910), or a communication processor (1060)), and a second processor (160) (e.g., a second processor (240; 305b; 305c) or a processor (910; 1020)).
[0029] According to one embodiment, the microphone (110) may include at least one microphone (110). For example, the microphone (110) may receive external sound and convert it into an audio signal. For example, the microphone (110) may receive user speech and generate an audio signal corresponding to the user speech. The microphone (110) may receive user speech during a call and convert it into a transmission signal corresponding to the user speech.
[0030] According to one embodiment, the speaker (120) may include at least one speaker (120). The speaker (120) may output a sound corresponding to an audio signal. For example, the speaker (120) may output a sound corresponding to a transmission signal to be transmitted to an external device during a call, a translated transmission signal, a reception signal received from an external device, and / or a translated reception signal.
[0031] According to one embodiment, the communication circuit (130) can transmit information and/or data to, and receive it from, an external device. For example, the communication circuit (130) may include at least one modem. During a call, the communication circuit (130) can transmit a transmission signal corresponding to the user's utterance to the external device and receive, from the external device, a reception signal corresponding to an utterance of the other party (i.e., the call partner).
[0032] According to one embodiment, the memory (140) may store instructions that control the operation of the electronic device (100) when executed individually or collectively by at least one processor (e.g., a first processor (150) and/or a second processor (160)). For example, the instructions may be stored in one memory (140) or in multiple memories (140). The memory (140) may store information and/or data related to the operations of the electronic device (100) at least temporarily. The memory (140) may include memory independent of the at least one processor and/or internal memory mounted on the at least one processor.
[0033] For example, the memory (140) may include a shared storage space accessible to a plurality of processors (e.g., a first processor (150) and a second processor (160)). The memory (140) (e.g., the shared storage space) may at least temporarily store a receiving signal received from an external device and / or a transmitting signal corresponding to a user speech input through the microphone (110). The memory (140) may at least temporarily store signals tuned by the first processor (150). The memory (140) may at least temporarily store content obtained as a result of performing voice recognition by the second processor (160). The memory (140) may at least temporarily store translated content and / or a signal corresponding to the translated content. The memory (140) may store call data recorded through a call recording function (e.g., a call recording file in a specified format).
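As an illustration only, a shared storage space that holds audio "at least temporarily" could behave like a bounded buffer that one processor writes and another reads. The sketch below models it with a lock-protected `collections.deque`; Python threads stand in for real inter-processor shared memory, and the class name, signal keys, and capacities are hypothetical.

```python
import threading
from collections import deque


class SharedAudioBuffer:
    """Hypothetical bounded buffer standing in for a shared storage space.

    When the buffer is full, the oldest samples are discarded, so audio is
    kept only temporarily.
    """

    def __init__(self, capacity: int) -> None:
        self._samples = deque(maxlen=capacity)  # deque drops oldest items when full
        self._lock = threading.Lock()           # serializes writer and reader access

    def write(self, samples: list[float]) -> None:
        with self._lock:
            self._samples.extend(samples)

    def read(self, count: int) -> list[float]:
        """Remove and return up to `count` of the oldest samples."""
        with self._lock:
            n = min(count, len(self._samples))
            return [self._samples.popleft() for _ in range(n)]

    def __len__(self) -> int:
        with self._lock:
            return len(self._samples)


# One buffer per signal (hypothetical keys), e.g. written by the first
# processor and read by the second processor.
shared_storage = {
    "first_receiving": SharedAudioBuffer(capacity=8),
    "first_transmitting": SharedAudioBuffer(capacity=8),
}

shared_storage["first_receiving"].write([0.1 * i for i in range(10)])
oldest = shared_storage["first_receiving"].read(3)  # the first two samples were dropped
```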
[0034] According to one embodiment, the first processor (150) can control the operations of the electronic device (100) by executing instructions stored in the memory (140), individually or collectively. For example, the first processor (150) may include at least one first processor (150), and operations described in this disclosure as being performed by the 'first processor (150)' may be understood as being performed individually or collectively by the at least one first processor (150). For example, the at least one first processor (150) may control each of the operations of the electronic device (100) described below, individually or collectively. In this disclosure, the first processor (150) is described as an audio digital signal processor (ADSP), but is not limited thereto. For example, the ADSP may be included in a communication processor (CP) (e.g., CP (1060) of FIG. 10) or integrated with the CP. According to various embodiments, the at least one first processor (150) may include circuits such as a central processing unit (CPU), micro processor unit (MPU), application processor (AP), communication processor (CP), system on chip (SoC), and/or integrated circuit (IC).
[0035] According to one embodiment, the first processor (150) can acquire a first receiving signal received through the communication circuit (130) of the electronic device (100) and a first transmitting signal corresponding to a user utterance input through the microphone (110) of the electronic device while in a call with an external device.
[0036] The first processor (150) can generate a second transmission signal by tuning a first transmission signal based on a first setting value. For example, the first processor (150) can generate the second transmission signal by removing noise from the first transmission signal and/or performing echo canceling. For example, the first setting value may be determined, experimentally or through a trained artificial intelligence model (e.g., machine learning), as a value that improves speech recognition performance (recognition rate). For example, the second transmission signal may be a signal tuned with a focus on improving speech recognition performance rather than the hearing performance (e.g., intelligibility) of the people on the call (e.g., the user or call partner). For example, tuning based on the first setting value may be part of the preprocessing of the first transmission signal. According to various embodiments, in the present disclosure, the first transmission signal may include a first transmission signal preprocessed based on a specified tuning value. For example, the preprocessed first transmission signal may be a first transmission signal tuned based on a setting value for improving hearing performance rather than a setting value for improving speech recognition performance.
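Noise removal driven by a setting value can be sketched, under simplifying assumptions, as a frame-based noise gate: frames whose RMS energy falls below a threshold are attenuated. Here the "first setting value" is modeled as a hypothetical `NoiseGateSetting` holding a frame size, a threshold, and an attenuation factor; a real ADSP pipeline would use far more elaborate noise suppression and echo cancellation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseGateSetting:
    """Hypothetical stand-in for a 'setting value' controlling noise removal."""
    frame_size: int       # samples per analysis frame
    threshold_rms: float  # frames quieter than this are treated as noise
    attenuation: float    # gain applied to noise frames (0.0 = mute)


def rms(frame: list[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def noise_gate(signal: list[float], setting: NoiseGateSetting) -> list[float]:
    """Return a tuned copy of `signal`; the input list is left unchanged."""
    out: list[float] = []
    for start in range(0, len(signal), setting.frame_size):
        frame = signal[start:start + setting.frame_size]
        gain = 1.0 if rms(frame) >= setting.threshold_rms else setting.attenuation
        out.extend(x * gain for x in frame)
    return out


# An aggressive gate that favors recognition over natural-sounding audio
# (illustrative values, not taken from the disclosure).
asr_setting = NoiseGateSetting(frame_size=4, threshold_rms=0.05, attenuation=0.0)
first_tx = [0.01, -0.02, 0.01, 0.0, 0.5, -0.4, 0.3, -0.2]
second_tx = noise_gate(first_tx, asr_setting)  # quiet first frame is muted
```

Keeping the original list intact mirrors the text above, where the first transmission signal remains available alongside the tuned second transmission signal.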
[0037] The first processor (150) can generate a second received signal by tuning the first received signal based on a second setting value. The first processor (150) can generate a third received signal by tuning the first received signal based on a third setting value. For example, the first processor (150) can generate the second received signal and / or the third received signal by performing energy (volume) adjustment, filter application (e.g., tone adjustment, frequency characteristic adjustment), dynamic range adjustment, noise removal, and / or echo canceling of the first received signal. For example, the second setting value and the third setting value may be different. The second setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve the user's hearing performance. The third setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve speech recognition performance (recognition rate). For example, the second received signal may be a signal tuned with a focus on improving the hearing performance (e.g., increasing clarity) of the person on the call (e.g., user or call partner), and the third received signal may be a signal tuned with a focus on improving speech recognition performance. For example, an operation to tune based on the second setting value and / or the third setting value may be included in the post-processing operation of the first received signal.
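To make the idea of one input and two differently tuned outputs concrete, the sketch below applies a gain (energy adjustment), a one-pole low-pass filter, and a simple peak limiter (a crude dynamic range adjustment) to the first received signal twice, once per setting. The `ReceiveTuning` fields and all values are hypothetical; the disclosure does not specify the filters or their parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiveTuning:
    """Hypothetical parameter set standing in for a 'setting value'."""
    gain: float       # energy (volume) adjustment
    smoothing: float  # one-pole low-pass coefficient in [0, 1); 0 disables it
    limit: float      # peak limit, a crude dynamic range adjustment


def tune(signal: list[float], s: ReceiveTuning) -> list[float]:
    out: list[float] = []
    prev = 0.0  # filter state (pre-limiter output of the previous sample)
    for x in signal:
        y = x * s.gain
        # One-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]
        y = (1.0 - s.smoothing) * y + s.smoothing * prev
        prev = y
        out.append(max(-s.limit, min(s.limit, y)))
    return out


# Illustrative settings: louder and smoother for listening (second setting),
# nearly untouched for recognition (third setting).
second_setting = ReceiveTuning(gain=3.0, smoothing=0.5, limit=0.8)
third_setting = ReceiveTuning(gain=1.0, smoothing=0.0, limit=1.0)

first_rx = [0.5, 0.5, -0.5]
second_rx = tune(first_rx, second_setting)  # listening-oriented output
third_rx = tune(first_rx, third_setting)    # recognition-oriented output
```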
[0038] The first processor (150) can transmit at least one of the first receiving signal, the first transmitting signal, the second transmitting signal, the second receiving signal, and the third receiving signal to the second processor (160) through independent channels. For example, the first processor (150) can map channels to each of the first receiving signal, the first transmitting signal, the second transmitting signal, the second receiving signal, and the third receiving signal based on channel setting information received from the second processor (160). The first processor (150) can map the first channel to the first receiving signal, map the second channel to the first transmitting signal, map the third channel to the second transmitting signal, map the fourth channel to the second receiving signal, and map the fifth channel to the third receiving signal. The first processor (150) can transmit a first receiving signal to the second processor (160) through a first channel, transmit a first transmitting signal to the second processor (160) through a second channel, transmit a second transmitting signal to the second processor (160) through a third channel, transmit a second receiving signal to the second processor (160) through a fourth channel, and transmit a third receiving signal to the second processor (160) through a fifth channel.
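The channel mapping can be pictured as an interleaved multichannel stream: the second processor supplies a mapping from signal name to channel index, and the first processor packs one sample per channel into each frame so that the receiver can de-interleave every signal independently. The dictionary format of the channel setting information and the interleaving scheme are assumptions; the disclosure does not say how the channels are physically carried.

```python
# Hypothetical channel setting information sent by the second processor:
# signal name -> channel index (0-based here; the text numbers them 1 to 5).
CHANNEL_SETTING = {
    "first_receiving": 0,
    "first_transmitting": 1,
    "second_transmitting": 2,
    "second_receiving": 3,
    "third_receiving": 4,
}


def interleave(signals: dict[str, list[float]],
               mapping: dict[str, int]) -> list[list[float]]:
    """Pack equal-length signals into frames holding one sample per channel.

    Channels with no signal in `signals` stay silent (0.0).
    """
    lengths = {len(s) for s in signals.values()}
    if len(lengths) != 1:
        raise ValueError("all signals must be non-empty and equal in length")
    n_channels = max(mapping.values()) + 1
    n_samples = lengths.pop()
    frames = [[0.0] * n_channels for _ in range(n_samples)]
    for name, samples in signals.items():
        ch = mapping[name]
        for i, x in enumerate(samples):
            frames[i][ch] = x
    return frames


def deinterleave(frames: list[list[float]],
                 mapping: dict[str, int]) -> dict[str, list[float]]:
    """Recover each signal from its own channel on the receiving side."""
    return {name: [frame[ch] for frame in frames] for name, ch in mapping.items()}


signals = {name: [float(i), float(i) + 0.5] for i, name in enumerate(CHANNEL_SETTING)}
frames = interleave(signals, CHANNEL_SETTING)
recovered = deinterleave(frames, CHANNEL_SETTING)
```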
[0039] The first processor (150) can transmit at least one of the first transmission signal or a third transmission signal (translated transmission signal) to the external device through the communication circuit (130). For example, when the 'My Voice Blocking' setting is enabled, the first processor (150) can transmit the third transmission signal to the external device through the communication circuit (130). When the 'My Voice Blocking' setting is disabled, the first processor (150) can transmit both the first transmission signal and the third transmission signal to the external device through the communication circuit (130), for example as a signal in which the first transmission signal and the third transmission signal are mixed.
[0040] The first processor (150) can output a sound corresponding to at least one of the second received signal or a fourth received signal (translated received signal) through the speaker (120). For example, when the 'opponent voice blocking' setting is enabled, the first processor (150) can output a sound corresponding to the fourth received signal through the speaker (120). When the 'opponent voice blocking' setting is disabled, the first processor (150) can output a sound corresponding to both the second received signal and the fourth received signal through the speaker (120), for example a sound corresponding to a signal in which the second received signal and the fourth received signal are mixed.
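The two blocking settings reduce to the same selection rule: when blocking is enabled only the translated signal is passed on, and otherwise the original and translated signals are mixed. A minimal sketch follows; the function names are hypothetical, and sample-wise summing with clipping to [-1.0, 1.0] is an assumed mixing rule, since the disclosure only says the signals are "mixed".

```python
def mix(a: list[float], b: list[float]) -> list[float]:
    """Sample-wise sum with clipping; the shorter signal is zero-padded."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]


def select_output(original: list[float], translated: list[float],
                  blocking_enabled: bool) -> list[float]:
    """Apply 'My Voice Blocking' (transmit path) or 'opponent voice blocking'
    (speaker path): pass only the translated signal, or mix both."""
    if blocking_enabled:
        return list(translated)
    return mix(original, translated)


# Transmit path: first transmission signal + third (translated) transmission signal.
to_modem = select_output([0.2, 0.9], [0.1, 0.3, 0.4], blocking_enabled=False)
# Speaker path: second received signal + fourth (translated) received signal.
to_speaker = select_output([0.2, 0.9], [0.1, 0.3, 0.4], blocking_enabled=True)
```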
[0041] According to one embodiment, the second processor (160) can generate and store call recording data (e.g., a call recording file in a specified format) based on a first receiving signal and a first transmitting signal received from the first processor (150).
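One plausible "specified format" is a two-channel PCM WAV file with the received signal on one channel and the transmitted signal on the other. The sketch below writes such a file with Python's standard `wave` module; the channel assignment, 16-bit depth, 16 kHz sample rate, and the `write_call_recording` helper are illustrative assumptions, not details from the disclosure.

```python
import io
import struct
import wave


def to_int16(x: float) -> int:
    """Convert a float sample in [-1.0, 1.0] to a clipped 16-bit integer."""
    return max(-32768, min(32767, int(round(x * 32767))))


def write_call_recording(dest, first_rx: list[float], first_tx: list[float],
                         sample_rate: int = 16000) -> None:
    """Write a stereo WAV: left = received (call partner), right = transmitted (user)."""
    n = max(len(first_rx), len(first_tx))
    rx = first_rx + [0.0] * (n - len(first_rx))  # pad the shorter signal
    tx = first_tx + [0.0] * (n - len(first_tx))
    frames = b"".join(struct.pack("<hh", to_int16(r), to_int16(t))
                      for r, t in zip(rx, tx))
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


buffer = io.BytesIO()  # a file path such as "call.wav" would also work
write_call_recording(buffer, [0.5, -0.5], [0.25, 0.0])
```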
[0042] According to one embodiment, the second processor (160) can perform voice recognition on at least one of a second received signal or a third received signal received from the first processor (150). The second processor (160) can obtain a second content (e.g., a second text) corresponding to the second received signal and / or the third received signal through voice recognition. The second processor (160) can perform voice recognition on a second transmitted signal received from the first processor (150). The second processor (160) can obtain a first content (e.g., a first text) corresponding to the second transmitted signal through voice recognition.
[0043] The second processor (160) can transmit a third transmission signal and a fourth reception signal, each corresponding to translated content, to the first processor (150). For example, the second processor (160) can generate the third transmission signal corresponding to the translated first content by converting the translated first content into speech through text-to-speech (TTS), and transmit the third transmission signal to the first processor (150). Likewise, the second processor (160) can generate the fourth reception signal corresponding to the translated second content by converting the translated second content into speech through TTS, and transmit the fourth reception signal to the first processor (150).
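The flow from tuned signals to translated speech can be summarized as a three-stage pipeline. In this sketch the recognizer, translator, and TTS engine are hypothetical callables injected by the caller (toy stand-ins are used so that it runs); only the routing, meaning which signal feeds which stage and which outputs return to the first processor, follows the description above.

```python
from dataclasses import dataclass
from typing import Callable

Audio = list[float]


@dataclass
class SecondProcessorPipeline:
    recognize: Callable[[Audio], str]   # hypothetical ASR engine
    translate: Callable[[str], str]     # hypothetical machine translation
    synthesize: Callable[[str], Audio]  # hypothetical TTS engine

    def process(self, second_tx: Audio, rx_for_asr: Audio) -> dict:
        """rx_for_asr is the second or third received signal."""
        first_content = self.recognize(second_tx)    # user's speech
        second_content = self.recognize(rx_for_asr)  # call partner's speech
        return {
            "first_content": first_content,
            "second_content": second_content,
            # Both are sent back to the first processor (modem / speaker paths).
            "third_tx": self.synthesize(self.translate(first_content)),
            "fourth_rx": self.synthesize(self.translate(second_content)),
        }


# Toy stand-ins so the sketch runs; real engines would replace them.
pipeline = SecondProcessorPipeline(
    recognize=lambda audio: "hello" if max(audio, default=0.0) > 0.1 else "",
    translate=lambda text: {"hello": "bonjour"}.get(text, text),
    synthesize=lambda text: [0.1] * len(text),
)
result = pipeline.process(second_tx=[0.0, 0.5], rx_for_asr=[0.0, 0.05])
```

Injecting the engines keeps the routing testable independently of any particular ASR, translation, or TTS implementation.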
[0044] According to one embodiment, the electronic device (100) can improve the voice recognition rate and voice recognition performance by performing voice recognition on signals (e.g., a second transmission signal, a second reception signal, and / or a third reception signal) that are tuned for voice recognition performance enhancement rather than on the original voice (e.g., a first transmission signal and a first reception signal).
[0045] According to various embodiments, the configuration of the electronic device (100) is not limited to that shown in FIG. 1, and at least some configurations may be omitted, or at least one configuration (e.g., at least one of the components of FIG. 2, 3b, 3c, 4b to 4e, or 9 to 11) may be added.
[0046]
[0047] FIG. 2 is a diagram showing the configuration of an electronic device (200) according to one embodiment. The following description assumes that an audio signal is processed during a call while the 'in-call translation' function of the electronic device (200) is activated, but the embodiments of the present disclosure are not limited thereto. In the present disclosure, a call may include a voice call and/or a video call based on various communication protocols.
[0048] According to one embodiment, an electronic device (200) (e.g., electronic device (100; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) may include a communication circuit (210) (e.g., communication circuit (130; 301b; 303c; 960)), a first processor (220) (e.g., first processor (150; 303b; 303c), processor (910), or communication processor (1060)), a kernel (230), and a second processor (240) (e.g., second processor (160; 305b; 305c) or processor (910; 1020)).
[0049] According to one embodiment, the communication circuit (210) may include at least one modem. The first modem (211) may transmit a signal received from the first processor (220) (e.g., at least one of a first transmission signal or a third transmission signal) to an external device. The second modem (212) may transmit a signal received from the external device (e.g., a first reception signal) to the first processor (220).
[0050] According to one embodiment, the first processor (220) may include a preprocessing module (221) (e.g., preprocessing module (331b; 331c) or transmission signal processing module (790)), a transmission mute module (222) (e.g., transmission mute module (332b; 332c)), a first MFC (media format converter) (223), a transmission original sound mute module (224) (e.g., transmission original sound mute module (334b; 334c)), an encoder (225) (e.g., encoder (337b; 337c; 792)), a decoder (226) (e.g., decoder (338b; 338c; 798)), a reception mute module (227) (e.g., reception mute module (339b; 339c)), a postprocessing module (228) (e.g., postprocessing module (340b), receiving signal processing module (794), or receiving signal processing module (340c)), a second MFC (229), a receiving original sound mute module (230) (e.g., receiving original sound mute module (342b; 342c)), and a call recording module (2221) (e.g., call recording module (310b; 3101c; 3103c; 410b; 410c; 410d; 410e; 786)).
[0051] The preprocessing module (221) can receive a first transmission signal corresponding to a user utterance input through the microphone (201) (MIC). The preprocessing module (221) can preprocess the first transmission signal to generate a second transmission signal. For example, the preprocessing module (221) can generate the second transmission signal by removing noise from the first transmission signal and/or performing echo canceling. For example, it can remove noise components included in the first transmission signal and remove the sound (echo components) output through the speaker (203) (SPK) from the user utterance (or first transmission signal) input through the microphone (201) (MIC). For example, the preprocessing module (221) can generate the second transmission signal by tuning the first transmission signal based on a first setting value. The first setting value may be determined, experimentally or through a trained artificial intelligence model (e.g., machine learning), as a value that improves speech recognition performance (recognition rate). For example, the preprocessing module (221) may include at least one filter, and the first setting value may include a parameter value for adjusting the performance of the at least one filter. For example, the second transmission signal may be a signal tuned with a focus on improving speech recognition performance rather than the hearing performance (e.g., intelligibility) of the people on the call (e.g., the user or call partner). The preprocessing module (221) may transmit the first transmission signal and the second transmission signal to the transmission mute module (222). According to various embodiments, in the present disclosure, the first transmission signal may include a first transmission signal preprocessed based on a specified tuning value. For example, the preprocessed first transmission signal may be a first transmission signal tuned based on a setting value for improving hearing performance rather than a setting value for improving speech recognition performance.
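Echo canceling of the kind described, which removes the speaker output picked up by the microphone, is commonly done with an adaptive filter. The sketch below uses the normalized least-mean-squares (NLMS) algorithm, which the disclosure does not name; the filter length `taps` and step size `mu` stand in for the kind of parameters a "first setting value" might hold and are assumptions.

```python
import random


def nlms_echo_cancel(mic: list[float], far_end: list[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-6) -> list[float]:
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic:     microphone samples (user speech plus echo of the far-end signal)
    far_end: samples sent to the speaker (reference for the echo path)
    Returns the error signal, i.e. the mic signal with the echo estimate removed.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    out: list[float] = []
    for n, d in enumerate(mic):
        x = far_end[n] if n < len(far_end) else 0.0
        history = [x] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))  # echo estimate
        e = d - y                                       # echo-cancelled sample
        norm = eps + sum(xi * xi for xi in history)     # input power for normalization
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, history)]
        out.append(e)
    return out


# Simulated far-end audio and an echo path with a 2-sample delay and gain 0.6;
# the mic picks up only the echo (no near-end speech) in this test signal.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0, 0.0] + [0.6 * s for s in far[:-2]]
cleaned = nlms_echo_cancel(echo, far)
```

Once the filter converges, the residual in `cleaned` approaches zero because the simulated echo path fits within the filter length.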
[0052] When the 'transmission blocking setting' is enabled, the transmission mute module (222) may prevent the first transmission signal and the second transmission signal from being transmitted to the call recording module (2221) and the transmission original sound mute module (224) (and/or the first MFC (223)). The transmission blocking setting may be a call-related setting that prevents signals corresponding to the user's speech during a call (e.g., the first transmission signal, the second transmission signal, and the third transmission signal (translated transmission signal)) from being transmitted to an external device (e.g., the call partner's device). For example, when the transmission blocking setting is enabled, the first transmission signal and the second transmission signal may not be transmitted to the communication circuit (210), and the electronic device (200) may not transmit the first transmission signal, the second transmission signal, or the translated transmission signal (e.g., the third transmission signal) to the external device. For example, if the transmission mute module (222) blocks the first transmission signal and the second transmission signal, these signals are not input to the call recording module (2221), so voice recognition and translation of the first transmission signal and the second transmission signal may not be performed. If the transmission blocking setting is disabled, the first transmission signal and the second transmission signal may be transmitted to the call recording module (2221) and the transmission original sound mute module (224). For example, the first transmission signal and the second transmission signal may be split and input to both the call recording module (2221) and the transmission original sound mute module (224).
[0053] The first MFC (223) can convert the format of an audio signal. For example, the first MFC (223) can adjust the bit depth, sampling rate, and/or channel output (e.g., channel selection and change) of the audio signal. The first MFC (223) can also operate as a channel selector: when multiple audio signals (multiple channels) are input, the first MFC (223) can select and output at least one of them (at least one channel). For example, when a first transmission signal and a second transmission signal are input, the first MFC (223) can select and output the first transmission signal and transmit it to the transmission original sound mute module (224).
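The three conversions attributed to the MFC (bit depth, sampling rate, and channel selection) can each be sketched in a few lines. The function names are hypothetical, and linear interpolation is used for resampling only because it is short; a production converter would low-pass filter the signal to avoid aliasing.

```python
def int16_to_float(samples: list[int]) -> list[float]:
    """Bit-depth conversion: 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def select_channel(frames: list[list[float]], channel: int) -> list[float]:
    """Channel selector: keep one channel of an interleaved multichannel stream."""
    return [frame[channel] for frame in frames]


def resample_linear(signal: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Sampling-rate conversion by linear interpolation (no anti-aliasing filter)."""
    if not signal:
        return []
    last = len(signal) - 1
    n_out = max(1, round(len(signal) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the input
        j = int(pos)
        frac = pos - j
        cur = signal[min(j, last)]
        nxt = signal[min(j + 1, last)]  # hold the last sample at the end
        out.append(cur * (1.0 - frac) + nxt * frac)
    return out


# Two-channel input: channel 0 = first transmission signal, channel 1 = second.
frames = [[0.0, 0.2], [0.5, 0.2], [1.0, 0.2]]
first_tx = select_channel(frames, 0)
upsampled = resample_linear(first_tx, src_rate=8000, dst_rate=16000)
```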
[0054] When the setting for blocking the transmission original sound (referred to as 'block my voice' in this disclosure) is activated, the transmission original sound mute module (224) can block the first transmission signal from being transmitted to the encoder (225) and/or the communication circuit (210). For example, the transmission original sound mute module (224) may be placed after the point where the first transmission signal is input to the call recording module (2221). Because it blocks the first transmission signal only after the first transmission signal and the second transmission signal have been input to the call recording module (2221), the transmission original sound mute module (224) may not affect the voice recognition and translation of the transmission signal (e.g., the second transmission signal) performed by the second processor (240). Thus, when the setting for blocking the transmission original sound is activated, the transmission original sound mute module (224) can ensure that the first transmission signal is not transmitted to the external device through the communication circuit (210) and that only the third transmission signal (translated transmission signal) is transmitted.
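The reason for placing the transmission mute module before the call recording tap and the original sound mute module after it can be shown with a small routing function: enabling 'block my voice' removes the original signal from the encoder path but still lets the call recording module (and hence recognition and translation) receive it, whereas the transmission blocking setting cuts off both. The function, field, and flag names are hypothetical simplifications of the module chain in FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class TxRouteResult:
    to_call_recording: list[str] = field(default_factory=list)  # feeds ASR/translation
    to_encoder: list[str] = field(default_factory=list)         # feeds the modem


def route_transmit(transmission_blocking: bool, block_my_voice: bool,
                   translated_available: bool) -> TxRouteResult:
    result = TxRouteResult()
    # Transmission mute module (222): sits BEFORE the recording tap.
    if transmission_blocking:
        return result  # nothing reaches recording, recognition, or the modem
    # Recording tap: both the original and the recognition-tuned signals.
    result.to_call_recording += ["first_tx", "second_tx"]
    # Transmission original sound mute module (224): sits AFTER the tap.
    if not block_my_voice:
        result.to_encoder.append("first_tx")
    if translated_available:
        result.to_encoder.append("third_tx")  # mixed with first_tx if both present
    return result


route = route_transmit(transmission_blocking=False, block_my_voice=True,
                       translated_available=True)
```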
[0055] According to one embodiment, the first processor (220) may receive a third transmission signal (translated transmission signal) from the second processor (240) via kernel audio (231). For example, the third transmission signal may be input to the transmission original sound mute module (224) and/or the encoder (225). For example, when the setting for blocking the transmission original sound is enabled, the transmission original sound mute module (224) blocks the first transmission signal from being input to the encoder (225), and the third transmission signal may be input to the encoder (225). When the setting for blocking the transmission original sound is disabled, the first transmission signal and the third transmission signal may be mixed by a mixer module (not shown) placed between the transmission original sound mute module (224) and the encoder (225) and then transmitted to the encoder (225). For example, the third transmission signal may also be input to the speaker (203), which can output a sound corresponding to the third transmission signal (e.g., the translated user speech).
[0056] The encoder (225) can encode an audio signal (e.g., a first transmission signal and / or a third transmission signal) based on an audio codec and transmit it to a communication circuit (210).
[0057] The decoder (226) can decode an encoded audio signal (e.g., a first received signal) received through the communication circuit (210) and transmit it to the receiving mute module (227).
[0058] The reception mute module (227) can block the first reception signal from being transmitted to the post-processing module (228) and / or the speaker (203) when the 'reception blocking setting' is enabled. The reception blocking setting may be a call-related setting that prevents sounds corresponding to the other party's speech during a call (e.g., sounds corresponding to the first reception signal, the second reception signal, the third reception signal, and the fourth reception signal (translated reception signal)) from being output through the speaker (203). The reception mute module (227) can transmit the first reception signal to the post-processing module (228) when the reception blocking setting is disabled.
[0059] The post-processing module (228) can post-process the first received signal to generate a second received signal and / or a third received signal. For example, the post-processing module (228) can generate the second received signal and / or the third received signal by adjusting the energy (volume) of the first received signal, applying filters (e.g., tone adjustment, frequency characteristic adjustment), adjusting the dynamic range, removing noise, and / or echo canceling. For example, the post-processing module (228) can generate the second received signal by tuning the first received signal based on a second setting value, and generate the third received signal by tuning the first received signal based on a third setting value. For example, the second setting value and the third setting value may be different. The second setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve the user's hearing performance. The third setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve speech recognition performance (recognition rate). For example, the post-processing module (228) may include at least one filter. The second setting value and the third setting value may include parameter values for adjusting at least one filter. For example, the second receiving signal may be a signal tuned to focus on improving the hearing performance (e.g., increasing clarity) of the person on the call (e.g., user or call partner), and the third receiving signal may be a signal tuned to focus on improving voice recognition performance. The post-processing module (228) may transmit the second receiving signal and the third receiving signal to the call recording module (2221) and the second MFC (229).
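The dual tuning performed by the post-processing module (228) can be illustrated with a minimal sketch. The disclosure does not specify the filters or the setting values, so the `TuningSetting` fields (gain, a one-pole low-pass coefficient, a peak limit) and the two example values below are hypothetical placeholders. The point is only that one input signal is tuned twice, with two different parameter sets, into two output signals.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TuningSetting:
    """Hypothetical parameter set standing in for a 'setting value'."""
    gain: float       # linear volume (energy) adjustment
    smoothing: float  # one-pole low-pass coefficient in [0, 1); 0 bypasses the filter
    limit: float      # peak limit, a crude stand-in for dynamic range control


def tune(signal: Sequence[float], setting: TuningSetting) -> List[float]:
    out: List[float] = []
    state = 0.0
    for x in signal:
        y = x * setting.gain
        # One-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]
        y = (1.0 - setting.smoothing) * y + setting.smoothing * state
        state = y  # filter state is kept before clipping
        out.append(max(-setting.limit, min(setting.limit, y)))
    return out


# Hypothetical values: "second setting value" (hearing) and "third setting value" (ASR).
HEARING_SETTING = TuningSetting(gain=1.5, smoothing=0.3, limit=0.9)
ASR_SETTING = TuningSetting(gain=1.2, smoothing=0.0, limit=1.0)


def post_process(first_rx: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (second received signal, third received signal) from one input."""
    return tune(first_rx, HEARING_SETTING), tune(first_rx, ASR_SETTING)
```

In a real device both parameter sets would be replaced by the experimentally determined or model-derived values the paragraph mentions.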
[0060] The second MFC (229) can convert the format of the audio signal. For example, the second MFC (229) can adjust the bit depth, sampling rate, and / or channel output (e.g., channel selection and change) of the audio signal. The second MFC (229) can operate as a channel selector. For example, when multiple audio signals (multiple channels) are input, the second MFC (229) can select and output at least one audio signal (select at least one channel). For example, when a second received signal and a third received signal are input, the second MFC (229) can select and output the second received signal. The second MFC (229) can transmit the second received signal to the received original sound mute module (230).
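A minimal sketch of the bit-depth conversion and channel selection attributed to the second MFC (229), assuming float samples in [-1.0, 1.0] and 16-bit signed PCM output. The channel names and function names are hypothetical, and sampling-rate conversion is omitted.

```python
from typing import Dict, List, Sequence

PCM16_MAX = 32767


def float_to_pcm16(samples: Sequence[float]) -> List[int]:
    """Bit-depth conversion: clip to [-1.0, 1.0], then scale to 16-bit signed PCM."""
    return [int(round(max(-1.0, min(1.0, x)) * PCM16_MAX)) for x in samples]


def mfc_select(inputs: Dict[str, Sequence[float]], wanted: Sequence[str]) -> Dict[str, List[int]]:
    """Channel selection: keep only the requested channels, each converted to PCM16."""
    return {name: float_to_pcm16(inputs[name]) for name in wanted if name in inputs}
```

For the case in the paragraph, `mfc_select({"second_rx": ..., "third_rx": ...}, ["second_rx"])` passes only the second received signal on toward the speaker path.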
[0061] The receiving original sound mute module (230) can block the second receiving signal from being transmitted to the speaker (203) when the setting for blocking the received original sound (referred to as 'opponent voice blocking' in this disclosure) is activated. For example, the receiving original sound mute module (230) may be placed after the path where the second receiving signal and the third receiving signal are input to the call recording module (2221). Because it blocks the second receiving signal only after the second receiving signal and the third receiving signal have been input to the call recording module (2221), the receiving original sound mute module (230) does not affect the voice recognition and translation of the receiving signal (e.g., the second receiving signal and / or the third receiving signal) performed by the second processor (240). When the setting for blocking the received original sound is activated, the receiving original sound mute module (230) can ensure that the second receiving signal is not transmitted to the speaker (203) and that only the fourth receiving signal (translated receiving signal) is transmitted to the speaker (203).
[0062] According to one embodiment, the first processor (220) may receive a fourth received signal (translated received signal) from the second processor (240) via kernel audio (231). For example, the fourth received signal may be input to a speaker (203). For example, if a setting for blocking the received original sound is enabled, the second received signal may be blocked from being input to the speaker (203) by the received original sound mute module (230), and the fourth received signal may be input to the speaker (203). For example, if a setting for blocking the received original sound is disabled, the second received signal and the fourth received signal may be mixed by a mixer module (not shown) placed between the received original sound mute module (230) and the speaker (203) and transmitted to the speaker (203). For example, the fourth received signal may be input to an encoder (225). For example, a fourth received signal (e.g., corresponding to the translated speech of the call partner) can be transmitted to an external device (e.g., call partner device) through an encoder (225) and a communication circuit (210).
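The mute-or-mix behavior described in paragraphs [0055] and [0062] can be sketched as one routing function that serves both directions (original = first transmission signal or second received signal; translated = third transmission signal or fourth received signal). Modeling the mixer module as a sample-wise sum clipped to [-1.0, 1.0] is an assumption; the disclosure does not describe how the mixer combines the signals.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence


def route_output(original: Optional[Sequence[float]],
                 translated: Optional[Sequence[float]],
                 block_original: bool) -> List[float]:
    """Return what reaches the sink: the encoder (Tx path) or the speaker (Rx path).

    block_original models the 'block my voice' / 'opponent voice blocking' setting.
    """
    if block_original or original is None:
        # Original muted: only the translated signal (if any) passes through.
        return list(translated or [])
    if translated is None:
        return list(original)
    # Hypothetical mixer: sample-wise sum, clipped to the valid range [-1.0, 1.0].
    return [max(-1.0, min(1.0, a + b))
            for a, b in zip_longest(original, translated, fillvalue=0.0)]
```

Keeping the mute decision separate from the recording tap is what lets recognition continue even while the original voice is blocked.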
[0063] The call recording module (2221) can map each of the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and the third reception signal to independent channels. For example, the call recording module (2221) can receive channel setting information from the second processor (240) (e.g., an application (e.g., a conversation module (2435))) via an audio HAL (2411). For example, the channel setting information may include information on the channels that map each of the transmission signals and the reception signals. Based on the channel setting information, the call recording module (2221) can map the first reception signal to the first channel, map the first transmission signal to the second channel, map the second transmission signal to the third channel, map the second reception signal to the fourth channel, and map the third reception signal to the fifth channel. The channels to which each of the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and the third reception signal is mapped are not limited to those described above and may be changed.
[0064] The call recording module (2221) can transmit a first received signal to the kernel audio (231) through a first channel, a first transmitted signal through a second channel, a second transmitted signal through a third channel, a second received signal through a fourth channel, and a third received signal through a fifth channel. For example, the call recording module (2221) can transmit the first transmitted signal and the first received signal to the second processor (240) through independent channels, and can also transmit the second transmitted signal, the second received signal, and / or the third received signal to the second processor (240) through independent channels. This enables the second processor (240) to perform voice recognition on audio signals (e.g., the second transmitted signal, the second received signal, and / or the third received signal) that are more suitable for voice recognition than the first transmitted signal and the first received signal.
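The channel mapping of paragraphs [0063] and [0064] can be sketched as packing the five signals into one interleaved PCM buffer, one signal per channel, and unpacking them on the receiving side. The interleaved layout, the signal names, and the helper functions are assumptions for illustration; the disclosure only states that each signal travels on an independent channel. Because each signal occupies its own channel, the receiver recovers it by index, with no need to separate it from a mixed signal.

```python
from typing import Dict, List, Sequence

# Default mapping described above; the disclosure notes the mapping may be changed.
DEFAULT_CHANNEL_SETTING: Dict[str, int] = {
    "first_rx": 1,
    "first_tx": 2,
    "second_tx": 3,
    "second_rx": 4,
    "third_rx": 5,
}


def _check(channel_setting: Dict[str, int]) -> int:
    """Channels are assumed to be numbered contiguously from 1; returns the count."""
    numbers = sorted(channel_setting.values())
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("channel numbers must be 1..N without gaps")
    return len(numbers)


def interleave(signals: Dict[str, Sequence[int]], channel_setting: Dict[str, int]) -> List[int]:
    """Pack the signals into one interleaved buffer, one signal per channel."""
    _check(channel_setting)
    lengths = {len(signals[name]) for name in channel_setting}
    if len(lengths) != 1:
        raise ValueError("all signals must have the same number of frames")
    frames = lengths.pop()
    order = sorted(channel_setting, key=channel_setting.get)  # names in channel order
    buf: List[int] = []
    for n in range(frames):
        buf.extend(signals[name][n] for name in order)
    return buf


def deinterleave(buf: Sequence[int], channel_setting: Dict[str, int]) -> Dict[str, List[int]]:
    """Recover each signal from its own channel; no source separation is needed."""
    num_ch = _check(channel_setting)
    return {name: list(buf[ch - 1::num_ch]) for name, ch in channel_setting.items()}
```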
[0065] According to one embodiment, the kernel (230) may include kernel audio (231) (e.g., kernel audio (420b; 420c; 420d; 420e)). The kernel audio (231) may transmit audio data (e.g., audio signals (e.g., first to third transmission signals and first to fourth reception signals)) between the first processor (220) and the second processor (240).
[0066] According to one embodiment, the second processor (240) may include an audio framework layer (241), a multimedia framework layer (242), and an application layer (243). For example, each component of the audio framework layer (241), the multimedia framework layer (242), and the application layer (243) may be implemented as a software module, in which case the operations of each component may be understood to be performed by the second processor (240).
[0067] The audio framework layer (241) may include an audio HAL (hardware abstraction layer) (2411) (e.g., audio HAL (430b; 430c; 430d; 430e)) and an audio flinger (2413) (e.g., audio flinger (440b; 440c; 440d; 440e)).
[0068] The audio HAL (2411) can transmit each of the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and the third reception signal, which are transmitted through the kernel audio (231), to the audio flinger (2413) through a separate channel. For example, the audio HAL (2411) can transmit the first reception signal to the audio flinger (2413) through a first channel, transmit the first transmission signal to the audio flinger (2413) through a second channel, transmit the second transmission signal to the audio flinger (2413) through a third channel, transmit the second reception signal to the audio flinger (2413) through a fourth channel, and transmit the third reception signal to the audio flinger (2413) through a fifth channel.
[0069] The audio flinger (2413) can independently transmit the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and / or the third reception signal to the first audio source module (2421) and the second audio source module (2422), respectively (e.g., using a time-division method). For example, the audio flinger (2413) can transmit the first transmission signal and / or the second transmission signal to the first audio source module (2421) and transmit the first reception signal, the second reception signal, and / or the third reception signal to the second audio source module (2422). For example, if a setting for voice recognition performance enhancement is enabled, the audio flinger (2413) can transmit the second transmission signal to the first audio source module (2421) and transmit the second reception signal and / or the third reception signal to the second audio source module (2422). For example, if a setting for improving voice recognition performance is enabled, the audio flinger (2413) may not transmit the first transmission signal to the first audio source module (2421) and may not transmit the first reception signal to the second audio source module (2422). According to one embodiment, the audio flinger (2413) may generate call recording data (e.g., a call recording file in a specified format) based on the first transmission signal and the first reception signal. For example, the audio flinger (2413) may mix and store the first transmission signal and the first reception signal. The call recording data may be data that stores the content of the conversation between the user and the call partner during a call.
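A minimal sketch of the audio flinger (2413) routing and call-recording mix. Forwarding the original signals when the voice recognition enhancement setting is disabled, and mixing the call recording track by averaging, are assumptions; the disclosure states only that the first signals are not forwarded when the setting is enabled and that the first transmission and reception signals may be mixed and stored. The signal names are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence, Tuple

Signals = Dict[str, Sequence[float]]


def route_to_audio_sources(signals: Signals, asr_enhancement: bool) -> Tuple[Signals, Signals]:
    """Split signals between the first (Tx) and second (Rx) audio source modules."""
    if asr_enhancement:
        # Only the tuned signals are forwarded for recognition.
        tx_names = ("second_tx",)
        rx_names = ("second_rx", "third_rx")
    else:
        # Assumption: with the setting disabled, the original signals are forwarded too.
        tx_names = ("first_tx", "second_tx")
        rx_names = ("first_rx", "second_rx", "third_rx")
    tx = {name: signals[name] for name in tx_names if name in signals}
    rx = {name: signals[name] for name in rx_names if name in signals}
    return tx, rx


def call_recording_mix(first_tx: Sequence[float], first_rx: Sequence[float]) -> List[float]:
    """Mix original Tx and Rx into one recording track (averaging avoids clipping)."""
    return [(a + b) / 2 for a, b in zip_longest(first_tx, first_rx, fillvalue=0.0)]
```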
[0070] The multimedia framework layer (242) may include a first audio source module (2421) (e.g., audio source module (780)), a second audio source module (2422) (e.g., audio source module (780)), a first TTS (text to speech) module (2423) (e.g., TTS module (772)), and a second TTS module (2424) (e.g., TTS module (772)).
[0071] The first audio source module (2421) can transmit the second transmission signal received from the audio flinger (2413) to the first ASR module (2431).
[0072] The second audio source module (2422) can transmit the second received signal and / or third received signal received from the audio flinger (2413) to the second ASR module (2432).
[0073] The first TTS module (2423) can convert the first content, translated into a specified language (e.g., the language used by the user (call partner) of the specified external device), into an audio signal. For example, the first TTS module (2423) can generate a third transmission signal corresponding to the translated first content.
[0074] The second TTS module (2424) can convert the second content translated into a specified language (e.g., the language used by the user of the specified electronic device (200)) into an audio signal. For example, the second TTS module (2424) can generate a fourth reception signal corresponding to the translated second content.
[0075] The application layer (243) may include a first speech recognition module (e.g., a first automatic speech recognition (ASR) module (2431)) (e.g., a first ASR module (351b; 351c; 451b; 451c; 451d; 451e) or an ASR module (776; 1100)), a second speech recognition module (e.g., a second ASR module (2432)) (e.g., a second ASR module (352b; 352c; 452b; 452c; 452d; 452e) or an ASR module (776; 1100)), a first translation module (2433) (e.g., a first translation module (353b; 353c) or a translation module (774)), a second translation module (2434) (e.g., a second translation module (354b; 354c) or a translation module (774)), and a conversation module (2435) (e.g., a user interface (770)).
[0076] The first ASR module (2431) can perform speech recognition on the second transmission signal. The first ASR module (2431) can obtain a first content (e.g., a first text) corresponding to the second transmission signal through speech recognition. The first ASR module (2431) can transmit the first content to the first translation module (2433).
[0077] The second ASR module (2432) can perform speech recognition on the second received signal and / or the third received signal. The second ASR module (2432) can obtain second content (e.g., second text) corresponding to the second received signal and / or the third received signal through speech recognition. The second ASR module (2432) can transmit the second content to the second translation module (2434).
[0078] For example, instead of the first transmission signal and the first reception signal, the first ASR module (2431) receives the second transmission signal, tuned based on the first setting value to improve the voice recognition rate, and the second ASR module (2432) receives the second reception signal and / or the third reception signal, tuned based on the second setting value and the third setting value, respectively. As a result, voice recognition performance can be improved compared with the case where voice recognition is performed on the first transmission signal and the first reception signal.
[0079] The first translation module (2433) can translate the first content (e.g., text obtained as a result of speech recognition for the second transmission signal) into another language (e.g., the language used by the user of the external device (the call partner)). The first translation module (2433) can transmit the translated first content to the conversation module (2435).
[0080] The second translation module (2434) can translate the second content (e.g., text obtained as a result of speech recognition for the second received signal and / or the third received signal) into another language (e.g., the language used by the user of the electronic device (200)). The second translation module (2434) can transmit the translated second content to the conversation module (2435).
[0081] The conversation module (2435) can transmit the translated first content to the first TTS module (2423). The conversation module (2435) can transmit the translated second content to the second TTS module (2424). The conversation module (2435) can provide the first content, the second content, the translated first content, and / or the translated second content to the user through a user interface (e.g., a display).
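The per-direction flow described in paragraphs [0076] to [0081] (ASR module, then translation module, then TTS module, with the conversation module collecting the results) can be sketched as a small pipeline object. The callable interfaces and the stub recognizer, translator, and synthesizer below are hypothetical stand-ins, not real speech engines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces for the ASR, translation, and TTS modules.
Recognizer = Callable[[List[float]], str]
Translator = Callable[[str, str], str]
Synthesizer = Callable[[str], List[float]]


@dataclass
class DirectionPipeline:
    """One call direction: ASR module -> translation module -> TTS module."""
    asr: Recognizer
    translate: Translator
    tts: Synthesizer
    target_language: str

    def run(self, tuned_signal: List[float]) -> Dict[str, object]:
        content = self.asr(tuned_signal)                     # e.g. first content (text)
        translated = self.translate(content, self.target_language)
        audio = self.tts(translated)                         # e.g. third Tx / fourth Rx signal
        # A conversation module could display both texts and route the audio.
        return {"content": content, "translated": translated, "audio": audio}


def stub_asr(signal: List[float]) -> str:
    return "hello" if signal else ""


def stub_translate(text: str, language: str) -> str:
    return {"fr": {"hello": "bonjour"}}.get(language, {}).get(text, text)


def stub_tts(text: str) -> List[float]:
    return [0.0] * len(text)  # placeholder: one silent sample per character


tx_pipeline = DirectionPipeline(stub_asr, stub_translate, stub_tts, target_language="fr")
result = tx_pipeline.run([0.1, 0.2])
```

Two instances (one per direction, each with its own target language) mirror the first and second ASR, translation, and TTS modules.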
[0082] According to various embodiments, the configuration of the electronic device (200) is not limited to that shown in FIG. 2, and at least some configurations may be omitted, or at least one configuration (e.g., at least one of the components of FIG. 1, 3b, 3c, 4b to 4e, or 9 to 11) may be added. According to one embodiment, at least some of the configurations of the electronic device (200) (e.g., a first translation module (2433) and a second translation module (2434), a first ASR module (2431) and a second ASR module (2432), a first audio source module (2421) and a second audio source module (2422), and / or a first TTS module (2423) and a second TTS module (2424)) may be implemented as a single integrated module.
[0083]
[0084] FIG. 3a is a diagram showing the configuration of an electronic device according to a comparative example, and FIGS. 3b and 3c illustrate the configuration of an electronic device according to one embodiment. In the following, descriptions that overlap with FIG. 2 are omitted or given briefly. For example, FIGS. 3a to 3c illustrate the configuration of FIG. 2 in simplified form and may omit some components of the electronic device (e.g., the kernel (230) and the frameworks (e.g., the audio framework and the multimedia framework)).
[0085] Referring to FIG. 3a, the electronic device (300a) may include a microphone (307a), a speaker (309a), a communication circuit (301a), a first processor (303a), and a second processor (305a).
[0086] The microphone (307a) can acquire an audio signal (e.g., a first transmission signal) corresponding to user speech. The speaker (309a) can output a sound corresponding to the audio signal (e.g., a first reception signal and / or a translated first reception signal (Rx')).
[0087] The communication circuit (301a) may include a transmitting module (311a) and a receiving module (312a). For example, the transmitting module (311a) and the receiving module (312a) may include a modem.
[0088] The first processor (303a) (e.g., ADSP) may include an audio playback module (330a), a call recording module (310a), a call forwarding module (320a), a preprocessing module (331a), a transmission mute module (332a), a transmission original sound mute module (333a), a first mixer module (335a), a second mixer module (340a), an MFC (334a), an encoder (336a), a decoder (337a), a reception mute module (338a), and a postprocessing module (339a).
[0089] The preprocessing module (331a) can receive a first transmission signal corresponding to a user utterance received through the microphone (307a). The preprocessing module (331a) can preprocess the first transmission signal. For example, the preprocessing module (331a) can remove noise from the first transmission signal and / or perform echo canceling. For example, the preprocessing module (331a) can remove noise components included in the first transmission signal and remove sound (echo components) that is output through the speaker (309a) and picked up again by the microphone (307a). For example, the preprocessing module (331a) can remove signal components corresponding to sound output through the speaker (309a) from the first transmission signal based on a signal output from the postprocessing module (339a) (e.g., a second received signal). The preprocessing module (331a) can transmit the preprocessed first transmission signal to the transmission mute module (332a) through a single channel.
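Echo canceling that uses the post-processing output as a reference signal can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter. The disclosure does not name an echo-cancellation algorithm; NLMS is swapped in here as a common textbook technique, and the filter length, step size, and simulated echo path are assumed values.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], reference: Sequence[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated speaker echo from the microphone signal.

    reference: the signal sent to the speaker (e.g. the post-processed received signal).
    Returns e[n] = mic[n] - w . x[n], the echo-cancelled output.
    """
    w = [0.0] * taps         # adaptive estimate of the speaker-to-microphone echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    out: List[float] = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate  # near-end speech plus residual echo
        step = mu * e / (eps + sum(h * h for h in history))
        w = [wi + step * hi for wi, hi in zip(w, history)]
        out.append(e)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(3000)]  # stand-in for the speaker signal
    path = [0.6, -0.3, 0.1, 0.05]                         # assumed echo path
    echo = [sum(p * ref[n - k] for k, p in enumerate(path) if n >= k) for n in range(len(ref))]
    residual = nlms_echo_cancel(echo, ref)
    print(sum(e * e for e in residual[-500:]) / sum(e * e for e in echo[-500:]))
```

With no near-end speech in this simulation, the residual energy ratio printed at the end should become very small once the filter has converged.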
[0090] The transmission mute module (332a) can block the first transmission signal from being transmitted to the call recording module (310a) and the first mixer module (335a) when the transmission blocking setting is enabled. For example, when the transmission blocking setting is enabled, the first transmission signal corresponding to the user's speech and the translated first transmission signal (e.g., Tx') may not be transmitted to an external device. When the transmission blocking setting is disabled, the first transmission signal may be transmitted to the call recording module (310a) and the transmission original sound mute module (333a). For example, the first transmission signal may be split and input to the call recording module (310a) and the transmission original sound mute module (333a).
[0091] The transmission original sound mute module (333a) can block the first transmission signal from being transmitted to the first mixer module (335a), encoder (336a), and / or communication circuit (301a) when the setting for blocking the transmission original sound (referred to as 'block my voice') is activated. For example, the transmission original sound mute module (333a) may be placed after the path where the first transmission signal is input to the call recording module (310a). For example, if the transmission mute module (332a) blocks the first transmission signal, voice recognition and translation for the first transmission signal may not be performed because the first transmission signal is not input to the call recording module (310a). The transmission original sound mute module (333a) can block the first transmission signal after the first transmission signal is input into the call recording module (310a), so that the first transmission signal is not transmitted to an external device through the communication circuit (301a), and only the translated first transmission signal (Tx') is transmitted.
[0092] The MFC (334a) can convert the format of an audio signal (e.g., a first transmission signal and / or a first reception signal). For example, the MFC (334a) can perform bit depth, sampling rate, and / or channel output adjustment of the input audio signal (e.g., changing the number of output channels and / or mapped channels). The MFC (334a) can transmit the translated first transmission signal (Tx') received from the call forwarding module (320a) to the first mixer module (335a).
[0093] The first mixer module (335a) can mix the input signals and transmit them to the encoder (336a). For example, when the first transmission signal and the translated first transmission signal (Tx') are input, the first mixer module (335a) can mix the first transmission signal and the translated first transmission signal (Tx') and transmit them to the encoder (336a).
[0094] The encoder (336a) can encode the input signal and transmit it to the communication circuit (301a). For example, the encoder (336a) can convert the input signal into a specified format and / or compress it. For example, the encoder (336a) can encode the digital signal converted from the analog signal (e.g., the first transmission signal or the translated first transmission signal) using a speech codec supported by the network.
[0095] The decoder (337a) can decode the first received signal received from the communication circuit (301a). For example, the decoder (337a) can convert the first received signal into a specified format or restore the compressed first received signal to its original signal. For example, the decoder (337a) can convert a speech frame encoded with a speech codec into pulse code modulation (PCM) (e.g., the first received signal). The decoder (337a) can transmit the decoded first received signal to the receiving mute module (338a). For example, the first received signal output from the decoder (337a) can be split and input to the receiving mute module (338a) and the call recording module (310a).
[0096] The reception mute module (338a) can block the first reception signal from being transmitted to the post-processing module (339a) (and / or speaker (309a)) when the reception blocking setting is enabled. The reception mute module (338a) can transmit the first reception signal to the post-processing module (339a) when the reception blocking setting is disabled.
[0097] The post-processing module (339a) can post-process the first received signal to generate a second received signal. For example, the post-processing module (339a) can generate the second received signal by adjusting the energy (volume) of the first received signal, applying filters (e.g., tone adjustment, frequency characteristic adjustment), adjusting the dynamic range, removing noise, and / or echo canceling. The post-processing module (339a) can transmit the second received signal to the second mixer module (340a).
[0098] The second mixer module (340a) can mix the input signals and transmit them to the speaker (309a). For example, when the second received signal and the translated first received signal (Rx') are input, the second mixer module (340a) can mix the second received signal and the translated first received signal (Rx') and transmit the result to the speaker (309a).
[0099] The audio playback module (330a) can operate as a path for transmitting sound (e.g., a translated first received signal (Rx')). For example, the audio playback module (330a) can transmit the translated first received signal (Rx') from the second processor (305a) to the first processor (303a) (e.g., a second mixer module (340a)).
[0100] The call forwarding module (320a) can operate as a path for transmitting an audio signal (e.g., a translated first transmission signal (Tx')). For example, the call forwarding module (320a) can transmit the translated first transmission signal (Tx') from the second processor (305a) to the first processor (303a) (e.g., MFC (334a)).
[0101] The call recording module (310a) may operate as a path through which an audio signal (e.g., a first transmission signal and / or a first reception signal) is transmitted from the first processor (303a) to the second processor (305a). The call recording module (310a) may map the first transmission signal and the first reception signal to independent channels. For example, the call recording module (310a) may map the first reception signal and the first transmission signal to different channels based on channel setting information received from the second processor (305a). In this disclosure, the first reception signal is described as being mapped to the first channel and the first transmission signal as being mapped to the second channel, but the mapping is not limited thereto, and the channel to which each signal is mapped may be changed. The call recording module (310a) can transmit the first received signal to the second processor (305a) (e.g., a second ASR module (352a)) through the first channel and transmit the first transmitted signal to the second processor (305a) (e.g., a first ASR module (351a)) through the second channel. Because the call recording module (310a) transmits the first transmitted signal and the first received signal to the second processor (305a) through independent, different channels, the second processor (305a) can perform voice recognition on each of them without having to separate them again from a mixed signal.
For example, since there is no need to separate or extract the first transmitted signal from a mixed signal, no additional processing is required for voice recognition, and voice recognition performance can be improved because the first transmitted signal and the first received signal are accurately distinguished and processed separately.
[0102] According to one embodiment, the second processor (305a) may include a first ASR module (351a), a second ASR module (352a), a first translation module (353a), and a second translation module (354a). According to one embodiment, the first ASR module (351a) and the second ASR module (352a) may be implemented as a single module, and the first translation module (353a) and the second translation module (354a) may be implemented as a single module.
[0103] The first ASR module (351a) can receive a first transmission signal through a second channel and generate first content (e.g., first text) through voice recognition of the first transmission signal. The first ASR module (351a) can transmit the first content to the first translation module (353a) when the translation function during a call is activated. The second ASR module (352a) can receive a first reception signal through a first channel and generate second content (e.g., second text) through voice recognition of the first reception signal. The second ASR module (352a) can transmit the second content to the second translation module (354a) when the translation function during a call is activated. For example, the first ASR module (351a) and the second ASR module (352a) can each receive the first transmission signal or the first reception signal, respectively, through a channel separated from the first processor (303a) (e.g., call recording module (310a)) to perform voice recognition.
[0104] The first translation module (353a) can translate the first content into another language (e.g., the language used by the call partner). For example, the first translation module (353a) can transmit an audio signal (e.g., the translated first transmission signal (Tx')) corresponding to the translated first content to the first processor (303a) through the call forwarding module (320a).
[0105] The second translation module (354a) can translate the second content into another language (e.g., the user's language). For example, the second translation module (354a) can transmit an audio signal (e.g., the translated first reception signal (Rx')) corresponding to the translated second content to the first processor (303a) through the audio playback module (330a).
[0106]
[0107] Referring to FIG. 3b, an electronic device (300b) (e.g., electronic device (100; 200; 300c; 400b; 400c; 400d; 400e; 900; 1001)) may include a microphone (307b) (e.g., microphone (110; 201)), a speaker (309b) (e.g., speaker (120; 203)), a communication circuit (301b) (e.g., communication circuit (130; 210; 960)), a first processor (303b) (e.g., first processor (150; 220), processor (910), or communication processor (1060)), and a second processor (305b) (e.g., second processor (160; 240) or processor (910; 1020)). Compared to FIG. 3a, the electronic device (300b) can transmit audio signals (e.g., a second transmission signal, a second reception signal, and / or a third reception signal) tuned from the first processor (303b) to the second processor (305b) to improve voice recognition performance.
[0108] The microphone (307b) can acquire an audio signal (e.g., a first transmission signal) corresponding to user speech. The speaker (309b) can output sound corresponding to the audio signal (e.g., a second reception signal and / or a fourth reception signal (Rx')).
[0109] The communication circuit (301b) may include a transmitting module (311b) and a receiving module (312b). For example, the transmitting module (311b) and the receiving module (312b) may include a modem.
[0110] The first processor (303b) (e.g., ADSP) may include an audio playback module (330b) (e.g., audio playback module (330c)), a call recording module (310b) (e.g., call recording module (2221; 410b; 410c; 410d; 410e; 786)), a call forwarding module (320b) (e.g., call forwarding module (788)), a preprocessing module (331b) (e.g., preprocessing module (221) or transmission signal processing module (790)), a transmission mute module (332b) (e.g., transmission mute module (222)), a first MFC (333b), a transmission original sound mute module (334b) (e.g., transmission original sound mute module (224)), a second MFC (335b), a first mixer module (336b), an encoder (337b) (e.g., encoder (225; 792)), a decoder (338b) (e.g., decoder (226; 798)), a receiving mute module (339b) (e.g., receiving mute module (227)), a post-processing module (340b) (e.g., post-processing module (228) or receiving signal processing module (794)), a third MFC (341b), a receiving original sound mute module (342b) (e.g., receiving original sound mute module (230)), and a second mixer module (343b).
[0111] The preprocessing module (331b) can receive a first transmission signal corresponding to a user utterance received through the microphone (307b). The preprocessing module (331b) can preprocess the first transmission signal to generate a second transmission signal. For example, the preprocessing module (331b) can generate a second transmission signal by removing noise from the first transmission signal and / or performing echo canceling. For example, the preprocessing module (331b) can generate a second transmission signal by removing noise components included in the first transmission signal and removing sound output through the speaker (309b) that is picked up again by the microphone (307b) (echo components). For example, the preprocessing module (331b) can remove signal components corresponding to sound output through the speaker (309b) from the first transmission signal based on a signal output from the post-processing module (340b) (e.g., the second received signal). For example, the preprocessing module (331b) can generate a second transmission signal by tuning a first transmission signal based on a first setting value. For example, the first setting value may be determined experimentally as a value that improves speech recognition performance (recognition rate), or may be determined through a trained artificial intelligence model (e.g., machine learning). For example, the second transmission signal may be a signal tuned with a focus on improving speech recognition performance rather than improving the hearing performance (e.g., intelligibility) of the person on the call (e.g., user or call partner). The preprocessing module (331b) may transmit each of the first transmission signal and the second transmission signal to the transmission mute module (332b). According to various embodiments, in the present disclosure, the first transmission signal may include a first transmission signal preprocessed based on a specified tuning value.
For example, the preprocessed first transmission signal may be a first transmission signal tuned based on a setting value to improve hearing performance rather than a setting value to improve speech recognition performance.
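The dual tuning described above, in which one microphone signal yields a hearing-oriented first transmission signal and a recognition-oriented second transmission signal, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the device's implementation: signals are lists of floats in [-1.0, 1.0], `TuningSetting` is a hypothetical stand-in for the first setting value (and for the hearing-oriented tuning value), noise removal is reduced to a simple noise gate, and echo canceling is reduced to subtracting a fixed-gain copy of the speaker reference, whereas a real echo canceller would estimate the echo path adaptively.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TuningSetting:
    """Hypothetical tuning parameters; real setting values are device-specific."""
    gain: float            # linear gain applied after cleanup
    gate_threshold: float  # samples below this magnitude are treated as noise


def cancel_echo(mic: Sequence[float], reference: Sequence[float], echo_gain: float) -> List[float]:
    # Crude echo cancellation: subtract a scaled copy of what the speaker played
    # (e.g., the second received signal). Assumes a fixed, known echo path gain.
    return [m - echo_gain * r for m, r in zip(mic, reference)]


def tune(signal: Sequence[float], setting: TuningSetting) -> List[float]:
    # Noise gate followed by gain, with hard clipping to [-1.0, 1.0].
    out = []
    for s in signal:
        s = 0.0 if abs(s) < setting.gate_threshold else s * setting.gain
        out.append(max(-1.0, min(1.0, s)))
    return out


def preprocess(first_tx: Sequence[float], speaker_ref: Sequence[float],
               hearing_setting: TuningSetting, asr_setting: TuningSetting,
               echo_gain: float = 0.5) -> Tuple[List[float], List[float]]:
    """Return (hearing-tuned first Tx, ASR-tuned second Tx) from one microphone signal."""
    cleaned = cancel_echo(first_tx, speaker_ref, echo_gain)
    return tune(cleaned, hearing_setting), tune(cleaned, asr_setting)
```

Both outputs come from the same cleaned signal and differ only in their setting values. In this example the ASR-oriented setting is deliberately more aggressive (higher gain, stronger gate), but that choice is illustrative only.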
[0112] The transmission mute module (332b) can block the first transmission signal and the second transmission signal from being transmitted to the call recording module (310b) and the first MFC (333b) when the transmission blocking setting is enabled. For example, when the transmission blocking setting is enabled, the first transmission signal corresponding to user speech and the translated transmission signal (e.g., the third transmission signal (Tx')) may not be transmitted to an external device. The transmission mute module (332b) can transmit the first transmission signal and the second transmission signal to the call recording module (310b) and the first MFC (333b) when the transmission blocking setting is disabled.
[0113] When multiple audio signals (multiple channels) are input to the first MFC (333b), the first MFC (333b) can select and output at least one audio signal (select at least one channel). For example, when a first transmission signal and a second transmission signal are input to the first MFC (333b), it can select and output the first transmission signal. The first MFC (333b) can transmit the first transmission signal to the transmission original sound mute module (334b).
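The channel selection performed by the first MFC can be pictured as picking entries out of a list of per-channel sample buffers. The sketch below is a hypothetical Python illustration: the function name `mfc_select` and the list-of-channels layout are assumptions, and the format conversion that an MFC may also perform is omitted.

```python
from typing import List, Sequence


def mfc_select(channels: Sequence[Sequence[float]], selected: Sequence[int]) -> List[List[float]]:
    """Keep only the requested input channels, in the requested order.

    channels[i] holds the samples of input channel i (hypothetical layout).
    """
    for idx in selected:
        if not 0 <= idx < len(channels):
            raise IndexError(f"channel {idx} is not present in the input")
    return [list(channels[idx]) for idx in selected]


first_tx = [0.1, 0.2, 0.3]
second_tx = [0.4, 0.5, 0.6]
# The first MFC forwards only the first transmission signal toward the call path.
to_mute_module = mfc_select([first_tx, second_tx], selected=[0])
```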
[0114] The transmission original sound mute module (334b) can block the first transmission signal from being transmitted to the first mixer module (336b), encoder (337b), and / or communication circuit (301b) when the setting for blocking the transmission original sound (referred to as 'block my voice') is activated.
[0115] The second MFC (335b) can convert the format of an audio signal (e.g., a third transmission signal). The second MFC (335b) can transmit the third transmission signal (Tx') (i.e., the translated transmission signal) received from the call forwarding module (320b) to the first mixer module (336b).
[0116] The first mixer module (336b) can mix the input signals and transmit them to the encoder (337b). For example, when the first transmission signal and the third transmission signal (Tx') are input, the first mixer module (336b) can mix the first transmission signal and the third transmission signal (Tx') and transmit them to the encoder (337b). For example, when only the third transmission signal is input, the first mixer module (336b) can transmit the third transmission signal to the encoder (337b).
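The behavior of the first mixer module, including the case where the first transmission signal is muted by the 'block my voice' setting so that only Tx' reaches the encoder, can be sketched as a sample-wise sum. This Python illustration rests on assumptions: signals are float lists in [-1.0, 1.0], a muted input is represented as `None`, a shorter input is treated as silence once it ends, and overflow is handled by hard clipping, whereas a production mixer might instead attenuate or limit.

```python
from typing import List, Optional, Sequence


def mix(*inputs: Optional[Sequence[float]]) -> List[float]:
    """Sum the active inputs sample by sample and clip to [-1.0, 1.0].

    A muted path (e.g., the first transmission signal while 'block my voice'
    is active) is passed as None and contributes nothing.
    """
    active = [x for x in inputs if x is not None]
    if not active:
        return []
    length = max(len(x) for x in active)
    out = []
    for i in range(length):
        # Inputs that have already ended are treated as silence.
        s = sum(x[i] for x in active if i < len(x))
        out.append(max(-1.0, min(1.0, s)))
    return out
```

For example, `mix(None, tx_prime)` models the 'block my voice' case, in which only the translated signal is forwarded to the encoder.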
[0117] The encoder (337b) can encode the input signal and transmit it to the communication circuit (301b). For example, the encoder (337b) can encode the first transmission signal and / or the third transmission signal using a speech codec supported by the network.
[0118] The decoder (338b) can decode the first received signal received from the communication circuit (301b). For example, the decoder (338b) can convert the first received signal into a specified format or restore the compressed first received signal to its original signal. The decoder (338b) can transmit the decoded first received signal to the reception mute module (339b) and the call recording module (310b).
[0119] The reception mute module (339b) can block the first reception signal from being transmitted to the post-processing module (340b) when the reception blocking setting is enabled. The reception mute module (339b) can transmit the first reception signal to the post-processing module (340b) when the reception blocking setting is disabled.
[0120] The post-processing module (340b) can post-process the first received signal to generate a second received signal and / or a third received signal. For example, the post-processing module (340b) can generate the second received signal and / or the third received signal by performing energy (volume) adjustment, filter application (e.g., tone adjustment, frequency characteristic adjustment), dynamic range adjustment, noise removal, and / or echo canceling of the first received signal. For example, the post-processing module (340b) can generate the second received signal by tuning the first received signal based on a second setting value, and generate the third received signal by tuning the first received signal based on a third setting value. For example, the second received signal may be a signal tuned with a focus on improving the hearing performance (e.g., increasing clarity) of the person on the call (e.g., user or call partner), and the third received signal may be a signal tuned with a focus on improving voice recognition performance. For example, the second received signal may be output through the speaker (309b) and correspond to the sound heard by the user. The post-processing module (340b) may transmit the second received signal and / or the third received signal to the call recording module (310b) and the third MFC (341b). According to one embodiment, the post-processing module (340b) may determine the number of output channels based at least partially on the number of speakers (309b). For example, if there are multiple speakers (309b), the post-processing module (340b) may generate a second received signal by tuning the first received signal based on a second setting value, and output the second received signal by mapping it to a channel corresponding to each of the multiple speakers (309b).
For example, if the speaker (309b) of the electronic device (300b) includes a left speaker and a right speaker, the post-processing module (340b) can output a second receiving signal corresponding to the left speaker, a second receiving signal corresponding to the right speaker, and a third receiving signal. For example, in FIG. 3b, the '3CH' output of the post-processing module (340b) indicates a case where the electronic device (300b) includes two speakers (309b) and outputs one second receiving signal for each of the two speakers plus one third receiving signal, but the embodiments of the present disclosure are not limited thereto.
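The relationship between the speaker count and the number of post-processing output channels (for example, '3CH' for two speakers) can be sketched as below. In this hypothetical Python illustration, the second and third setting values are reduced to single gains (`hearing_gain`, `asr_gain`) and the channel labels are invented for readability; the actual tuning (filters, dynamic range control, noise removal, echo canceling) is not modeled.

```python
from typing import List, Sequence, Tuple


def _tune(signal: Sequence[float], gain: float) -> List[float]:
    # Stand-in for a setting value: a linear gain with hard clipping.
    return [max(-1.0, min(1.0, s * gain)) for s in signal]


def postprocess_outputs(first_rx: Sequence[float], num_speakers: int,
                        hearing_gain: float, asr_gain: float) -> List[Tuple[str, List[float]]]:
    """Return labeled output channels: one second Rx per speaker, then one third Rx."""
    if num_speakers < 1:
        raise ValueError("at least one speaker is required")
    second_rx = _tune(first_rx, hearing_gain)  # hearing-oriented (second setting value)
    third_rx = _tune(first_rx, asr_gain)       # recognition-oriented (third setting value)
    outputs = [(f"second_rx_speaker{i}", list(second_rx)) for i in range(num_speakers)]
    outputs.append(("third_rx", third_rx))
    return outputs
```

With `num_speakers=2` the function returns three channels, matching the '3CH' case; with a single speaker it returns two.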
[0121] When multiple audio signals (multiple channels) are input to the third MFC (341b), the third MFC (341b) can select and output at least one audio signal (select at least one channel). For example, when the second reception signal and the fourth reception signal are input to the third MFC (341b), it can select and output the second reception signal. The third MFC (341b) can transmit the second reception signal to the reception original sound mute module (342b). For example, when the third MFC (341b) receives multiple second reception signals (e.g., multiple second reception signals corresponding to each of the multiple speakers (309b)) from the post-processing module (340b), it can transmit the multiple second reception signals to the reception original sound mute module (342b).
[0122] The receiving original sound mute module (342b) can block the second receiving signal from being transmitted to the second mixer module (343b) and / or speaker (309b) when the setting for blocking the received original sound (referred to as 'block the other party's voice' in this disclosure) is activated. The receiving original sound mute module (342b) can transmit the second receiving signal to the second mixer module (343b) when the setting for blocking the received original sound is deactivated. According to one embodiment, the receiving original sound mute module (342b) can transmit the second receiving signal received from the third MFC (341b) to the second mixer module (343b). For example, the number of channels of the second receiving signal transmitted by the receiving original sound mute module (342b) to the second mixer module (343b) can be determined based at least partially on the number of speakers (309b) of the electronic device (300b). For example, if the electronic device (300b) has only one speaker (309b), the receiving original sound mute module (342b) can transmit the second receiving signal to the second mixer module (343b) through one channel. If the electronic device (300b) has multiple speakers (309b), the receiving original sound mute module (342b) can transmit a second receiving signal corresponding to each of the multiple speakers (309b) to the second mixer module (343b) through multiple channels. In FIG. 3b, the '1CH or 2CH' output of the receiving original sound mute module (342b) indicates that second receiving signals matching the number of speakers (309b) are output, assuming one or two speakers (309b), but the embodiments of the present disclosure are not limited thereto.
[0123] The second mixer module (343b) can mix the input signals and transmit them to the speaker (309b). For example, when the second received signal and the fourth received signal (Rx') (translated received signal) are input, the second mixer module (343b) can mix the second received signal and the fourth received signal (Rx') and transmit them to the speaker (309b). When only the fourth received signal (Rx') is input, the second mixer module (343b) can transmit the fourth received signal to the speaker (309b).
[0124] The audio playback module (330b) can operate as a path for transmitting sound (e.g., a fourth received signal (Rx')). For example, the audio playback module (330b) can transmit the fourth received signal (Rx') from the second processor (305b) to the first processor (303b) (e.g., a second mixer module (343b)).
[0125] The call forwarding module (320b) can operate as a path for transmitting an audio signal (e.g., a third transmission signal (Tx')). For example, the call forwarding module (320b) can transmit the third transmission signal (Tx') from the second processor (305b) to the first processor (303b) (e.g., the second MFC (335b)).
[0126] The call recording module (310b) can operate as a path through which audio signals (e.g., a first transmission signal, a second transmission signal, a first reception signal, a second reception signal, and a third reception signal) are transmitted from the first processor (303b) to the second processor (305b). The call recording module (310b) can map each of the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and the third reception signal to independent channels. For example, the call recording module (310b) can map each of the first transmission signal, the second transmission signal, the first reception signal, the second reception signal, and the third reception signal to different channels based on channel setting information received from the second processor (305b). For example, the call recording module (310b) can map a first received signal to a first channel, map a first transmitted signal to a second channel, map a second transmitted signal to a third channel, map a second received signal to a fourth channel, and map a third received signal to a fifth channel. The call recording module (310b) can transmit the second transmitted signal to a second processor (305b) (e.g., a first ASR module (351b)) through the third channel, and transmit the second received signal and the third received signal to a second processor (305b) (e.g., a second ASR module (352b)) through the fourth and fifth channels.
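The five-channel mapping of the call recording module (310b) and the resulting routing to the two ASR modules can be written as a small lookup table. For illustration only, the sketch assumes that the channel setting information received from the second processor (305b) is a static table. The `Channel` enum and the module labels `first_asr` and `second_asr` are hypothetical names; the channel numbers follow the example above.

```python
from enum import IntEnum
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


class Channel(IntEnum):
    """Channel numbers from the FIG. 3b example."""
    FIRST_RX = 1
    FIRST_TX = 2
    SECOND_TX = 3
    SECOND_RX = 4
    THIRD_RX = 5


# Hypothetical routing table: which ASR module consumes each channel.
# The first received/transmitted signals are not sent to ASR in this sketch.
ASR_ROUTE: Dict[Channel, Optional[str]] = {
    Channel.FIRST_RX: None,
    Channel.FIRST_TX: None,
    Channel.SECOND_TX: "first_asr",
    Channel.SECOND_RX: "second_asr",
    Channel.THIRD_RX: "second_asr",
}


def route_to_asr(frames: Mapping[int, Sequence[float]]) -> Dict[str, List[Tuple[Channel, Sequence[float]]]]:
    """Group per-channel samples by the ASR module that should receive them."""
    routed: Dict[str, List[Tuple[Channel, Sequence[float]]]] = {}
    for ch, samples in frames.items():
        channel = Channel(ch)  # accepts plain ints or Channel members
        target = ASR_ROUTE[channel]
        if target is not None:
            routed.setdefault(target, []).append((channel, samples))
    return routed
```

Because each signal travels on its own channel, the routing reduces to a table lookup; no separation of mixed speech is needed before recognition.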
[0127] According to one embodiment, the second processor (305b) may include a first ASR module (351b) (e.g., first ASR module (2431; 451b; 451c; 451d; 451e) or ASR module (776; 1100)), a second ASR module (352b) (e.g., second ASR module (2432; 452b; 452c; 452d; 452e) or ASR module (776; 1100)), a first translation module (353b) (e.g., first translation module (2433) or translation module (774)), and a second translation module (354b) (e.g., second translation module (2434) or translation module (774)). According to one embodiment, the first ASR module (351b) and the second ASR module (352b) can be implemented as one module, and the first translation module (353b) and the second translation module (354b) can be implemented as one module.
[0128] The first ASR module (351b) can receive a second transmission signal through the third channel and generate first content (e.g., first text) through voice recognition of the second transmission signal. The first ASR module (351b) can transmit the first content to the first translation module (353b) when the translation function is activated during a call. The second ASR module (352b) can receive a second reception signal through the fourth channel and / or receive a third reception signal through the fifth channel. The second ASR module (352b) can generate second content (e.g., second text) through voice recognition of the second reception signal and / or the third reception signal. The second ASR module (352b) can transmit the second content to the second translation module (354b) when the translation function is activated during a call.
[0129] The first translation module (353b) can translate the first content into another language (e.g., the language used by the call partner). For example, the first translation module (353b) can transmit an audio signal (e.g., a third transmission signal (Tx')) corresponding to the translated first content to the first processor (303b) through the call forwarding module (320b).
[0130] The second translation module (354b) can translate the second content into another language (e.g., the user's language). For example, the second translation module (354b) can transmit an audio signal (e.g., a fourth reception signal (Rx')) corresponding to the translated second content to the first processor (303b) through the audio playback module (330b).
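The recognition and translation loop of the second processor (305b) can be sketched as below, with the ASR, translation, and speech synthesis engines injected as plain callables. Everything here is a hypothetical illustration. The class and field names are invented, and the text-to-speech step is an assumption about how "an audio signal corresponding to the translated content" is produced, since the description does not name a synthesis component.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Samples = List[float]


@dataclass
class TranslationPipeline:
    recognize: Callable[[Samples], str]   # hypothetical ASR engine
    translate: Callable[[str, str], str]  # hypothetical (text, target language) -> text
    synthesize: Callable[[str], Samples]  # hypothetical text-to-speech engine
    user_lang: str
    partner_lang: str
    translation_enabled: bool = True

    def _run(self, signal: Samples, target_lang: str) -> Tuple[str, Optional[Samples]]:
        content = self.recognize(signal)
        if not self.translation_enabled:
            return content, None  # recognition only, no translated audio
        return content, self.synthesize(self.translate(content, target_lang))

    def process_uplink(self, second_tx: Samples) -> Tuple[str, Optional[Samples]]:
        """Second Tx -> first content -> Tx' (sent back via the call forwarding path)."""
        return self._run(second_tx, self.partner_lang)

    def process_downlink(self, rx: Samples) -> Tuple[str, Optional[Samples]]:
        """Second and/or third Rx -> second content -> Rx' (sent back via the audio playback path)."""
        return self._run(rx, self.user_lang)
```

Injecting the engines as callables keeps the routing logic testable without any real ASR or translation service. A single instance also reflects the note that the two ASR modules and the two translation modules may be implemented as one module each.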
[0131]
[0132] Referring to FIG. 3c, an electronic device (300c) (e.g., electronic device (100; 200; 300b; 400b; 400c; 400d; 400e; 900; 1001)) may include a microphone (307c) (e.g., microphone (110; 201)), a speaker (309c) (e.g., speaker (120; 203)), a communication circuit (301c) (e.g., communication circuit (130; 210; 960)), a first processor (303c) (e.g., first processor (150; 220), processor (910), or communication processor (1060)), and a second processor (305c) (e.g., second processor (160; 240) or processor (910; 1020)). Compared to FIG. 3b, the electronic device (300c) may include a plurality of call recording modules (a first call recording module (3101c) and a second call recording module (3103c)) (e.g., call recording modules (2221; 410b; 410c; 410d; 410e; 786)).
[0133] The microphone (307c) can acquire an audio signal (e.g., a first transmission signal) corresponding to user speech. The speaker (309c) can output sound corresponding to the audio signal (e.g., a second reception signal and / or a fourth reception signal (Rx')).
[0134] The communication circuit (301c) may include a transmitting module (311c) and a receiving module (312c). For example, the transmitting module (311c) and the receiving module (312c) may include a modem.
[0135] The first processor (303c) (e.g., ADSP) may include an audio playback module (330c), a first call recording module (3101c) (e.g., call recording module (2221; 410b; 410d; 410e; 786)), a second call recording module (3103c) (e.g., call recording module (2221; 410b; 410d; 410e; 786)), a call forwarding module (320c), a preprocessing module (331c) (e.g., preprocessing module (221) or transmission signal processing module (790)), a transmission mute module (332c) (e.g., transmission mute module (222)), a first MFC (333c), a transmission original sound mute module (334c) (e.g., transmission original sound mute module (224)), a second MFC (335c), a first mixer module (336c), an encoder (337c) (e.g., encoder (225; 792)), a decoder (338c) (e.g., decoder (226; 798)), a receiving mute module (339c) (e.g., receiving mute module (227)), a post-processing module (340c) (e.g., post-processing module (228) or receiving signal processing module (794)), a third MFC (341c), a receiving original sound mute module (342c) (e.g., receiving original sound mute module (230)), and a second mixer module (343c).
[0136] The preprocessing module (331c) can receive a first transmission signal corresponding to a user utterance received through the microphone (307c). The preprocessing module (331c) can preprocess the first transmission signal to generate a second transmission signal.
[0137] The transmission mute module (332c) can block the first transmission signal and the second transmission signal from being transmitted to the first call recording module (3101c), the second call recording module (3103c), and the first MFC (333c) when the transmission blocking setting is enabled. The transmission mute module (332c) can transmit the first transmission signal and the second transmission signal to the first call recording module (3101c), the second call recording module (3103c), and the first MFC (333c) when the transmission blocking setting is disabled.
[0138] When multiple audio signals (multiple channels) are input to the first MFC (333c), the first MFC (333c) can select and output at least one audio signal (select at least one channel). For example, when a first transmission signal and a second transmission signal are input to the first MFC (333c), it can select and output the first transmission signal. The first MFC (333c) can transmit the first transmission signal to the transmission original sound mute module (334c).
[0139] The transmission original sound mute module (334c) can block the first transmission signal from being transmitted to the first mixer module (336c), encoder (337c), and / or communication circuit (301c) when the setting for blocking the transmission original sound (referred to as 'block my voice') is activated.
[0140] The second MFC (335c) can convert the format of an audio signal (e.g., a third transmission signal). The second MFC (335c) can transmit the third transmission signal (Tx') (i.e., the translated transmission signal) received from the call forwarding module (320c) to the first mixer module (336c).
[0141] The first mixer module (336c) can mix the input signals and transmit them to the encoder (337c). For example, when the first transmission signal and the third transmission signal (Tx') are input, the first mixer module (336c) can mix the first transmission signal and the third transmission signal (Tx') and transmit them to the encoder (337c). For example, when only the third transmission signal is input, the first mixer module (336c) can transmit the third transmission signal to the encoder (337c).
[0142] The encoder (337c) can encode the input signal and transmit it to the communication circuit (301c). For example, the encoder (337c) can encode the first transmission signal and / or the third transmission signal using a speech codec supported by the network.
[0143] The decoder (338c) can decode the first received signal received from the communication circuit (301c). The decoder (338c) can transmit the decoded first received signal to the reception mute module (339c) and the first call recording module (3101c).
[0144] The reception mute module (339c) can block the first reception signal from being transmitted to the post-processing module (340c) when the reception blocking setting is enabled. The reception mute module (339c) can transmit the first reception signal to the post-processing module (340c) when the reception blocking setting is disabled.
[0145] The post-processing module (340c) can post-process the first received signal to generate a second received signal and / or a third received signal. For example, the post-processing module (340c) can generate a second received signal by tuning the first received signal based on a second setting value, and generate a third received signal by tuning the first received signal based on a third setting value. The post-processing module (340c) can transmit the second received signal and / or the third received signal to the second call recording module (3103c) and the third MFC (341c).
[0146] When multiple audio signals (multiple channels) are input, the third MFC (341c) can select and output at least one audio signal (select at least one channel). For example, when the second received signal and the fourth received signal are input, the third MFC (341c) can select and output the second received signal. The third MFC (341c) can transmit the second received signal to the receiving original sound mute module (342c).
[0147] The receiving original sound mute module (342c) can block the second receiving signal from being transmitted to the second mixer module (343c) and / or the speaker (309c) when the setting for blocking the received original sound (referred to as 'block the other party's voice' in this disclosure) is activated. The receiving original sound mute module (342c) can transmit the second receiving signal to the second mixer module (343c) when the setting for blocking the received original sound is deactivated.
[0148] The second mixer module (343c) can mix the input signals and transmit them to the speaker (309c). For example, when the second received signal and the fourth received signal (Rx') (translated received signal) are input, the second mixer module (343c) can mix the second received signal and the fourth received signal (Rx') and transmit them to the speaker (309c). When only the fourth received signal (Rx') is input, the second mixer module (343c) can transmit the fourth received signal to the speaker (309c).
[0149] The audio playback module (330c) can operate as a path for transmitting sound (e.g., a fourth received signal (Rx')). For example, the audio playback module (330c) can transmit the fourth received signal (Rx') from the second processor (305c) to the first processor (303c) (e.g., a second mixer module (343c)).
[0150] The call forwarding module (320c) can operate as a path for transmitting an audio signal (e.g., a third transmission signal (Tx')). For example, the call forwarding module (320c) can transmit the third transmission signal (Tx') from the second processor (305c) to the first processor (303c) (e.g., the second MFC (335c)).
[0151] The first call recording module (3101c) can operate as a path through which audio signals (e.g., a first transmission signal and a first reception signal) are transmitted from the first processor (303c) to the second processor (305c). The first call recording module (3101c) can map the first transmission signal and the first reception signal to independent channels, respectively. For example, the first call recording module (3101c) can map the first transmission signal and the first reception signal to different channels based on channel setting information received from the second processor (305c). For example, the first call recording module (3101c) can map the first reception signal to the first-1 channel and the first transmission signal to the first-2 channel. The first call recording module (3101c) can transmit a first received signal to the second processor (305c) through the first-1 channel and transmit a first transmitted signal to the second processor (305c) through the first-2 channel.
[0152] The second call recording module (3103c) can operate as a path through which audio signals (e.g., a second transmission signal, a second reception signal, and a third reception signal) are transmitted from the first processor (303c) to the second processor (305c). The second call recording module (3103c) can map each of the second transmission signal, the second reception signal, and the third reception signal to independent channels. For example, the second call recording module (3103c) can map each of the second transmission signal, the second reception signal, and the third reception signal to different channels based on channel setting information received from the second processor (305c). For example, the second call recording module (3103c) can map the second transmission signal to the second-1 channel, the second reception signal to the second-2 channel, and the third reception signal to the second-3 channel. The second call recording module (3103c) can transmit the second transmission signal to the second processor (305c) (e.g., a first ASR module (351c)) through the second-1 channel, transmit the second reception signal to the second processor (305c) (e.g., a second ASR module (352c)) through the second-2 channel, and transmit the third reception signal to the second processor (305c) (e.g., the second ASR module (352c)) through the second-3 channel.
[0153] In FIG. 3c, it is described that the first call recording module (3101c) transmits the first receiving signal and the first transmitting signal to the second processor (305c), and the second call recording module (3103c) transmits the second transmitting signal, the second receiving signal, and the third receiving signal to the second processor (305c); however, the embodiments of the present disclosure are not limited thereto, and according to various embodiments, the audio signals received by the first call recording module (3101c) and the second call recording module (3103c), the audio signals transmitted to the second processor (305c), and / or the channels to which the audio signals are mapped may be changed.
[0154] According to one embodiment, the second processor (305c) may include a first ASR module (351c), a second ASR module (352c), a first translation module (353c), and a second translation module (354c). According to one embodiment, the first ASR module (351c) and the second ASR module (352c) may be implemented as a single module, and the first translation module (353c) and the second translation module (354c) may be implemented as a single module.
[0155] The first ASR module (351c) can receive a second transmission signal through the second-1 channel and generate first content (e.g., first text) through voice recognition of the second transmission signal. The first ASR module (351c) can transmit the first content to the first translation module (353c) when the translation function is activated during a call. The second ASR module (352c) can receive a second reception signal through the second-2 channel and / or receive a third reception signal through the second-3 channel. The second ASR module (352c) can generate second content (e.g., second text) through voice recognition of the second reception signal and / or the third reception signal. The second ASR module (352c) can transmit the second content to the second translation module (354c) when the translation function is activated during a call.
[0156] The first translation module (353c) can translate the first content into another language (e.g., the language used by the call partner). For example, the first translation module (353c) can transmit an audio signal (e.g., a third transmission signal (Tx')) corresponding to the translated first content to the first processor (303c) through the call forwarding module (320c).
[0157] The second translation module (354c) can translate the second content into another language (e.g., the user's language). For example, the second translation module (354c) can transmit an audio signal (e.g., a fourth reception signal (Rx')) corresponding to the translated second content to the first processor (303c) through the audio playback module (330c).
[0158] According to one embodiment, the second processor (305c) may further include a call recording module (not shown). For example, the call recording module may receive a first receiving signal through a first-1 channel and receive a first transmitting signal through a first-2 channel. The call recording module may generate and store call recording data based on the first receiving signal and the first transmitting signal.
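Generating call recording data from the first-1 and first-2 channels could, for example, mean interleaving the two signals into a stereo file. The sketch below uses Python's standard `wave` module and assumes 16-bit PCM, the first receiving signal on the left channel, the first transmitting signal on the right channel, and a 16 kHz sample rate. None of these choices is specified in the description, and `make_call_recording` is a hypothetical name.

```python
import io
import struct
import wave
from typing import Sequence


def _to_int16(x: float) -> int:
    # Clip to [-1.0, 1.0] and scale to the signed 16-bit range.
    x = max(-1.0, min(1.0, x))
    return round(x * 32767)


def make_call_recording(first_rx: Sequence[float], first_tx: Sequence[float],
                        sample_rate: int = 16000) -> bytes:
    """Return a stereo 16-bit WAV: left = first Rx (first-1 channel), right = first Tx (first-2 channel)."""
    n = min(len(first_rx), len(first_tx))  # keep only samples present on both channels
    frames = bytearray()
    for i in range(n):
        # '=hh' packs one int16 per channel in native byte order (one interleaved
        # stereo frame); the wave module byte-swaps to little-endian on big-endian hosts.
        frames += struct.pack("=hh", _to_int16(first_rx[i]), _to_int16(first_tx[i]))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()
```

Keeping the two parties on separate stereo channels preserves the same independence that the channel mapping provides, so the recording can later be split back into each party's speech.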
[0159] According to various embodiments, the configuration of the electronic device is not limited to that shown in FIG. 3b or 3c, and at least some configurations may be omitted, or at least one configuration (e.g., at least one of the components of FIG. 1, 2, 4b to 4e, or 9 to 11) may be added. According to one embodiment, at least some of the configurations of the electronic device may be implemented by integrating them into a single module.
[0160]
[0161] FIG. 4a is a diagram showing the configuration of an electronic device according to a comparative example, and FIGS. 4b to 4e are diagrams showing the configuration of an electronic device according to one embodiment. Hereinafter, descriptions that overlap with FIGS. 2 and 3a to 3c are omitted or briefly described.
[0162] Referring to FIG. 4a, the electronic device (400a) may include a first processor, kernel audio (420a), and a second processor.
[0163] The first processor may include a call recording module (410a). The call recording module (410a) may include at least one MFC (e.g., a first MFC (411a) and a second MFC (412a)) and a multiplexing / demultiplexing module (415a).
[0164] The first MFC (411a) can receive a received signal through a communication circuit during a call via a single channel (mono channel). For example, the received signal may be a signal output from a decoder (not shown). The first MFC (411a) can convert the received signal into a specified format (e.g., convert bit depth, sampling rate, and channel output to specified values) and transmit it to a multiplexing / demultiplexing module (415a).
[0165] The second MFC (412a) can receive a transmission signal obtained through a microphone during a call via a single channel (mono channel). For example, the transmission signal may be a signal input to an encoder for transmission to an external device. The second MFC (412a) can convert the transmission signal into a specified format (e.g., convert bit depth, sampling rate, and channel output to specified values) and transmit it to a multiplexing / demultiplexing module (415a).
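The format conversion performed by the first and second MFCs (bit depth, sampling rate, and channel count) can be illustrated with a deliberately simple converter. This hypothetical sketch only handles integer decimation ratios, and it uses block averaging as a crude anti-aliasing step where a real resampler would use a proper low-pass filter. It also quantizes to 16-bit integers. The function and parameter names are assumptions.

```python
from typing import List, Sequence, Tuple


def convert_format(frames: Sequence[Sequence[float]], in_rate: int, out_rate: int,
                   out_channels: int = 1) -> List[Tuple[int, ...]]:
    """Convert float frames (one value per input channel) to 16-bit frames at a lower rate."""
    if out_rate <= 0 or in_rate % out_rate != 0:
        raise ValueError("this sketch only supports integer decimation ratios")
    factor = in_rate // out_rate
    # Channel conversion: downmix every input frame to mono by averaging.
    mono = [sum(frame) / len(frame) for frame in frames]
    # Sample-rate conversion: average each block of `factor` samples
    # (trailing samples that do not fill a whole block are dropped).
    decimated = [sum(mono[i:i + factor]) / factor
                 for i in range(0, len(mono) - factor + 1, factor)]
    out = []
    for s in decimated:
        # Bit-depth conversion: clip and quantize to signed 16-bit.
        q = round(max(-1.0, min(1.0, s)) * 32767)
        out.append((q,) * out_channels)  # replicate mono into the requested channel count
    return out
```

For example, converting six stereo frames at 48 kHz to 16 kHz mono yields two 16-bit mono frames.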
[0166] The multiplexing / demultiplexing module (415a) can map the received signal and the transmitted signal to different multiple channels (e.g., stereo channels). For example, the multiplexing / demultiplexing module (415a) can map channels to the received signal and the transmitted signal, respectively, based on channel setting information received from the second processor. For example, the multiplexing / demultiplexing module (415a) can map the received signal to the first channel and the transmitted signal to the second channel. The multiplexing / demultiplexing module (415a) can transmit the received signal to the kernel audio (420a) through the first channel and transmit the transmitted signal to the kernel audio (420a) through the second channel.
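Mapping the received and transmitted signals onto the first and second channels of a stereo stream, and splitting them again on the second processor side, can be sketched with interleaved frames. The interleaved layout (one sample per channel per frame, first channel first) is an assumption chosen because it is a common PCM convention, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def multiplex(rx: Sequence[T], tx: Sequence[T]) -> List[T]:
    """Interleave rx (first channel) and tx (second channel) into stereo frames."""
    if len(rx) != len(tx):
        raise ValueError("both channels must have the same number of samples")
    return [sample for frame in zip(rx, tx) for sample in frame]


def demultiplex(interleaved: Sequence[T], num_channels: int = 2) -> List[List[T]]:
    """Split an interleaved stream back into one list per channel."""
    if len(interleaved) % num_channels != 0:
        raise ValueError("stream length is not a whole number of frames")
    return [list(interleaved[c::num_channels]) for c in range(num_channels)]
```

Because demultiplexing is an exact inverse, the second processor recovers each party's signal without any speech separation step.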
[0167] The kernel audio (420a) can transmit a received signal to a second processor (e.g., audio HAL (430a)) through a first channel and transmit a transmitted signal to a second processor (e.g., audio HAL (430a)) through a second channel.
[0168] The second processor may include an audio HAL (430a), an audio flinger (440a), and at least one ASR module (e.g., a first ASR module (451a) and a second ASR module (452a)). The audio HAL (430a) may transmit a received signal to the audio flinger (440a) through a first channel and transmit a transmitted signal to the audio flinger (440a) through a second channel. The audio flinger (440a) may transmit a received signal to the second ASR module (452a) through a first channel and transmit a transmitted signal to the first ASR module (451a) through a second channel.
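On the second processor side, each channel is handed to a different ASR module. A minimal sketch, assuming frame-interleaved stereo samples and a hypothetical routing table `ROUTES` that mirrors FIG. 4a (first channel to the second ASR module (452a), second channel to the first ASR module (451a)); the helper names are illustrative only.

```python
def demultiplex(interleaved, num_channels):
    """Split frame-interleaved samples back into one list per channel."""
    if len(interleaved) % num_channels:
        raise ValueError("sample count is not a whole number of frames")
    return [interleaved[ch::num_channels] for ch in range(num_channels)]

# Hypothetical routing mirroring FIG. 4a: index 0 (first channel) carries the
# received signal, index 1 (second channel) carries the transmitted signal.
ROUTES = {0: "second_asr", 1: "first_asr"}

def route_to_asr(interleaved, routes=ROUTES):
    """Deliver each channel to the ASR module it is routed to."""
    per_channel = demultiplex(interleaved, len(routes))
    inbox = {}
    for ch, samples in enumerate(per_channel):
        inbox.setdefault(routes[ch], []).append(samples)
    return inbox
```

Because each ASR module only ever receives its own channel, no speaker separation step is needed before recognition.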
[0169] The first ASR module (451a) can perform speech recognition on the transmitted signal to obtain a first content (e.g., a first text) corresponding to the transmitted signal.
[0170] The second ASR module (452a) can perform speech recognition on the received signal to obtain second content (e.g., second text) corresponding to the received signal.
[0171] For example, the first ASR module (451a) and the second ASR module (452a) can each independently receive only the transmitted signal or only the received signal, respectively. Because the transmitted signal and the received signal are transmitted and processed independently, the first ASR module (451a) and the second ASR module (452a) do not need to perform a separate process for distinguishing, extracting, or separating voice signals, and voice recognition can be performed while clearly distinguishing the transmitted signal from the received signal, thereby improving voice recognition performance.
[0172]
[0173] Referring to FIG. 4b, an electronic device (400b) (e.g., electronic device (100; 200; 300b; 300c; 400c; 400d; 400e; 900; 1001)) may include a first processor (e.g., first processor (150; 220; 303b; 303c), processor (910), or communication processor (1060)), kernel audio (420b) (e.g., kernel audio (231)), and a second processor (e.g., second processor (160; 240) or processor (910; 1020)). The first processor may include a call recording module (410b) (e.g., call recording module (2221; 310b; 3101c; 3103c; 786)). The call recording module (410b) may include at least one MFC (e.g., a first MFC (411b), a second MFC (412b), and a third MFC (413b)) and a multiplexing / demultiplexing module (415b).
[0174] The call recording module (410b) can receive a first reception signal through a single channel (mono channel) at the first data port (417b). The call recording module (410b) can receive, through a plurality of channels (stereo channels) at the second data port (418b), a first transmission signal obtained through a microphone during a call and a second transmission signal obtained by tuning the first transmission signal based on a first setting value. The call recording module (410b) can receive, through a plurality of channels (stereo channels) at the third data port (419b), a second reception signal obtained by tuning the first reception signal based on a second setting value and a third reception signal obtained by tuning the first reception signal based on a third setting value.
[0175] The first MFC (411b) can receive a first received signal through a communication circuit during a call via a single channel (mono channel). For example, the first received signal may be a signal output from a decoder (not shown). The first MFC (411b) can convert the first received signal into a specified format (e.g., convert bit depth, sampling rate, and channel output to specified values). The first MFC (411b) can transmit the first received signal to a multiplexing / demultiplexing module (415b).
[0176] The second MFC (412b) can receive, via multiple channels (stereo channels), a first transmission signal obtained through a microphone during a call and a second transmission signal obtained by tuning the first transmission signal based on a first setting value. For example, the first transmission signal may be a signal input to an encoder for transmission to an external device. The second transmission signal may be a signal generated by tuning the first transmission signal based on the first setting value to improve voice recognition performance. The second MFC (412b) can convert the first transmission signal and the second transmission signal into a specified format. The second MFC (412b) can transmit the first transmission signal and the second transmission signal to the multiplexing / demultiplexing module (415b).
[0177] The third MFC (413b) can receive a second receiving signal, which is a first receiving signal tuned based on a second setting value, and a third receiving signal, which is a first receiving signal tuned based on a third setting value, through multiple channels (stereo channels). For example, the second receiving signal may be a signal generated by tuning the first receiving signal based on the second setting value to improve hearing performance, and the third receiving signal may be a signal generated by tuning the first receiving signal based on the third setting value to improve voice recognition performance. The third MFC (413b) can convert the second receiving signal and the third receiving signal into a specified format. The third MFC (413b) can transmit the second receiving signal and the third receiving signal to a multiplexing / demultiplexing module (415b).
[0178] The multiplexing / demultiplexing module (415b) can map the received signal and the transmitted signal to different multiple channels. For example, the multiplexing / demultiplexing module (415b) can map channels to each of the first transmitted signal, the second transmitted signal, the first received signal, the second received signal, and the third received signal based on channel setting information received from the second processor. For example, the multiplexing / demultiplexing module (415b) can map the first received signal to the first channel, the first transmitted signal to the second channel, the second transmitted signal to the third channel, the second received signal to the fourth channel, and the third received signal to the fifth channel. The multiplexing / demultiplexing module (415b) can transmit each of the first received signal, the first transmitted signal, the second transmitted signal, the second received signal, and the third received signal to the kernel audio (420b) through each of the mapped channels.
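The port layout of paragraph [0174] and the five-channel mapping of paragraph [0178] can be summarized as data. The sketch below is illustrative only: the `Sig` labels, the `DATA_PORTS` and `CHANNEL_SETTING` tables, and the `map_channels` helper are hypothetical names, and the channel numbers simply follow the example mapping given above.

```python
from enum import Enum

class Sig(Enum):
    RX1 = "first received (untuned)"
    TX1 = "first transmitted (untuned)"
    TX2 = "second transmitted (tuned for ASR)"
    RX2 = "second received (tuned for hearing)"
    RX3 = "third received (tuned for ASR)"

# Data ports of the call recording module (410b) and the signals carried on
# each port's channels, in channel order.
DATA_PORTS = {
    "417b": [Sig.RX1],            # mono
    "418b": [Sig.TX1, Sig.TX2],   # stereo
    "419b": [Sig.RX2, Sig.RX3],   # stereo
}

# Hypothetical channel setting information received from the second processor.
CHANNEL_SETTING = {Sig.RX1: 1, Sig.TX1: 2, Sig.TX2: 3, Sig.RX2: 4, Sig.RX3: 5}

def map_channels(frames_by_port, ports=DATA_PORTS, setting=CHANNEL_SETTING):
    """Place every signal arriving on a data port onto its output channel.

    frames_by_port: dict of port -> list of per-channel sample lists
    Returns dict of output channel number -> samples.
    """
    out = {}
    for port, signals in ports.items():
        channel_samples = frames_by_port[port]
        if len(channel_samples) != len(signals):
            raise ValueError(f"port {port} expects {len(signals)} channel(s)")
        for sig, samples in zip(signals, channel_samples):
            ch = setting[sig]
            if ch in out:
                raise ValueError(f"channel {ch} assigned twice")
            out[ch] = samples
    return out
```

Keeping the mapping in a table supplied by the second processor means the routing can change without altering the port layout.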
[0179] The kernel audio (420b) can transmit a first received signal to a second processor (e.g., audio HAL (430b)) through a first channel, transmit a first transmitted signal to a second processor (e.g., audio HAL (430b)) through a second channel, transmit a second transmitted signal to a second processor (e.g., audio HAL (430b)) through a third channel, transmit a second received signal to a second processor (e.g., audio HAL (430b)) through a fourth channel, and transmit a third received signal to a second processor (e.g., audio HAL (430b)) through a fifth channel.
[0180] The second processor may include an audio HAL (430b) (e.g., audio HAL (2411; 784)), an audio flinger (440b) (e.g., audio flinger (2413; 782)), and at least one ASR module (e.g., a first ASR module (451b) (e.g., a first ASR module (2431; 351b; 351c) or an ASR module (776; 1100)) and a second ASR module (452b) (e.g., a second ASR module (2432; 352b; 352c) or an ASR module (776; 1100))). The audio HAL (430b) can transmit a first received signal to the audio flinger (440b) through a first channel, transmit a first transmitted signal to the audio flinger (440b) through a second channel, transmit a second transmitted signal to the audio flinger (440b) through a third channel, transmit a second received signal to the audio flinger (440b) through a fourth channel, and transmit a third received signal to the audio flinger (440b) through a fifth channel. The audio flinger (440b) can transmit the first received signal to the second ASR module (452b) through the first channel, transmit the first transmitted signal to the first ASR module (451b) through the second channel, transmit the second transmitted signal to the first ASR module (451b) through the third channel, transmit the second received signal to the second ASR module (452b) through the fourth channel, and transmit the third received signal to the second ASR module (452b) through the fifth channel. According to one embodiment, the audio flinger (440b) may not transmit the first transmitted signal and the first received signal to the first ASR module (451b) and the second ASR module (452b). For example, the audio flinger (440b) can transmit signals (e.g., a second transmission signal, a second reception signal, and / or a third reception signal) tuned to improve voice recognition performance to the first ASR module (451b) or the second ASR module (452b). According to one embodiment, the audio flinger (440b) may transmit at least one of a second received signal or a third received signal to the second ASR module (452b).
For example, the audio flinger (440b) may transmit the third received signal to the second ASR module (452b) if the second received signal is a signal tuned for improved hearing performance and the third received signal is a signal tuned for improved speech recognition performance.
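The routing choices described for the audio flinger (440b) can be expressed as a small rule set. This is a sketch under stated assumptions: the `Stream` record, its labels (`"rx"`/`"tx"`, `"none"`/`"hearing"`/`"asr"`), and the `drop_untuned` flag are hypothetical, and they only model the embodiments in which untuned signals are withheld and the third received signal is preferred.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stream:
    name: str        # hypothetical label, e.g. "rx1" or "tx2"
    direction: str   # "rx" (from the call counterpart) or "tx" (from the user)
    tuned_for: str   # "none", "hearing", or "asr"

def route(streams, drop_untuned=True):
    """Return {ASR module: [stream names]} in the style of FIG. 4b.

    Transmitted streams go to the first ASR module and received streams to
    the second. When drop_untuned is set, original (untuned) signals are not
    forwarded, and a received stream tuned for ASR replaces any received
    stream tuned only for hearing.
    """
    routes = {"first_asr": [], "second_asr": []}
    for s in streams:
        if drop_untuned and s.tuned_for == "none":
            continue
        target = "first_asr" if s.direction == "tx" else "second_asr"
        routes[target].append(s.name)
    rx_for_asr = [s.name for s in streams
                  if s.direction == "rx" and s.tuned_for == "asr"]
    if drop_untuned and rx_for_asr:
        routes["second_asr"] = rx_for_asr
    return routes
```

With all five signals present and `drop_untuned` left on, only the second transmitted signal and the third received signal reach the ASR modules.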
[0181] The first ASR module (451b) can perform speech recognition on the second transmission signal to obtain first content (e.g., first text) corresponding to the transmission signal. For example, if the first ASR module (451b) receives both the first transmission signal and the second transmission signal, it can perform speech recognition on the second transmission signal, which has been tuned to improve speech recognition performance.
[0182] The second ASR module (452b) can perform speech recognition on a received signal to obtain second content (e.g., second text) corresponding to the received signal. For example, when the second ASR module (452b) receives the first received signal together with at least one of the second received signal or the third received signal, it can perform speech recognition on the tuned signal (the second received signal and / or the third received signal). For example, when the second ASR module (452b) receives both the second received signal and the third received signal, it can perform speech recognition on the third received signal, which is tuned to improve speech recognition performance.
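The selection performed inside the ASR modules, recognizing a tuned signal when an untuned signal arrives alongside it, can be sketched as a preference order. The ranking `PREFERENCE` and the function `pick_input` are hypothetical; the paragraphs above state only that a tuned signal is used, and that the signal tuned for speech recognition is used when both tuned received signals are available.

```python
# Hypothetical preference order: a signal tuned for speech recognition first,
# then one tuned for hearing, then the original (untuned) signal.
PREFERENCE = {"asr": 0, "hearing": 1, "none": 2}

def pick_input(candidates):
    """Pick the signal an ASR module should recognize.

    candidates: dict of signal name -> tuning label ("asr", "hearing", "none")
    Returns the name of the most preferred signal.
    """
    if not candidates:
        raise ValueError("no signal received")
    return min(candidates, key=lambda name: PREFERENCE[candidates[name]])
```

The same rule serves both modules: the first ASR module picks the second transmitted signal over the first, and the second ASR module picks the third received signal over the others.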
[0183] For example, the first ASR module (451b) and the second ASR module (452b) can each perform voice recognition on signals tuned for voice recognition performance improvement (e.g., the second transmission signal or the third reception signal), thereby increasing the voice recognition rate and improving voice recognition performance compared to when voice recognition is performed on untuned signals (e.g., the first reception signal and the first transmission signal).
[0184]
[0185] Referring to FIG. 4c, an electronic device (400c) (e.g., electronic device (100; 200; 300b; 300c; 400b; 400d; 400e; 900; 1001)) may include a first processor (e.g., first processor (150; 220; 303b; 303c), processor (910), or communication processor (1060)), kernel audio (420c) (e.g., kernel audio (231)), and a second processor (e.g., second processor (160; 240) or processor (910; 1020)). The first processor may include a call recording module (410c) (e.g., call recording module (2221; 310b; 3101c; 3103c; 786)). The call recording module (410c) may include at least one MFC (e.g., a first MFC (411c), a second MFC (412c), a third MFC (413c), and a fourth MFC (414c)) and a multiplexing / demultiplexing module (415c). The second processor may include an audio HAL (430c) (e.g., audio HAL (2411; 784)), an audio flinger (440c) (e.g., audio flinger (2413; 782)), and at least one ASR module (e.g., a first ASR module (451c) (e.g., a first ASR module (2431; 351b; 351c) or an ASR module (776; 1100)) and a second ASR module (452c) (e.g., a second ASR module (2432; 352b; 352c) or an ASR module (776; 1100))). In the following description, any descriptions that overlap with FIGS. 4a and 4b are omitted or briefly described.
[0186] Compared to FIG. 4b, the call recording module (410c) of FIG. 4c may include more data ports (416c, 417c, 418c, 419c). For example, while the call recording module (410b) of FIG. 4b receives voice signals through three data ports (417b, 418b, 419b), the call recording module (410c) of FIG. 4c may receive voice signals through four data ports. The call recording module (410c) may receive a first reception signal through a single channel (mono channel) at the first data port (416c). The call recording module (410c) may receive a first transmission signal through a single channel (mono channel) at the second data port (417c). The call recording module (410c) can receive a second transmission signal through a single channel (mono channel) at the third data port (418c). The call recording module (410c) can receive a second reception signal and / or a third reception signal through multiple channels (stereo channels) at the fourth data port (419c).
[0187] The first MFC (411c) can transmit a first received signal received through the first data port (416c) to the multiplexing / demultiplexing module (415c). The second MFC (412c) can transmit a first transmitted signal received through the second data port (417c) to the multiplexing / demultiplexing module (415c). The third MFC (413c) can transmit a second transmitted signal received through the third data port (418c) to the multiplexing / demultiplexing module (415c). The fourth MFC (414c) can transmit a second received signal and / or a third received signal received through the fourth data port (419c) to the multiplexing / demultiplexing module (415c).
[0188] According to various embodiments, the number of data ports of the call recording module (410c) and the signals and channels received through each of the data ports are not limited to those shown in FIG. 4c and may be changed.
[0189] The multiplexing / demultiplexing module (415c), kernel audio (420c), audio HAL (430c), audio flinger (440c), first ASR module (451c), and second ASR module (452c) can perform the same operation as the multiplexing / demultiplexing module (415b), kernel audio (420b), audio HAL (430b), audio flinger (440b), first ASR module (451b), and second ASR module (452b) of FIG. 4b.
[0190]
[0191] Referring to FIG. 4d, an electronic device (400d) (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400e; 900; 1001)) may include a first processor (e.g., first processor (150; 220; 303b; 303c), processor (910), or communication processor (1060)), kernel audio (420d) (e.g., kernel audio (231)), and a second processor (e.g., second processor (160; 240) or processor (910; 1020)). In the following description, any descriptions that overlap with FIGS. 4a through 4c are omitted or briefly described.
[0192] The first processor may include a first call recording module (4101d) (e.g., call recording module (2221; 310b; 3101c; 786)) and a second call recording module (4103d) (e.g., call recording module (2221; 310b; 3103c; 786)).
[0193] The first call recording module (4101d) may include at least one MFC (e.g., the first MFC (4111d) and the second MFC (4112d)) and a first multiplexing / demultiplexing module (4120d).
[0194] The first MFC (4111d) can receive a first received signal through a communication circuit via a single channel (mono channel). The first MFC (4111d) can transmit the first received signal to the first multiplexing / demultiplexing module (4120d).
[0195] The second MFC (4112d) can receive a first transmission signal obtained through a microphone via a plurality of channels (stereo channels) and / or a second transmission signal tuned to a first setting value. The second MFC (4112d) can select at least one of the received signals and transmit it to the first multiplexing / demultiplexing module (4120d). For example, the second MFC (4112d) can transmit the first transmission signal to the first multiplexing / demultiplexing module (4120d).
[0196] The first multiplexing / demultiplexing module (4120d) can map channels to the first receiving signal and the first transmitting signal, respectively, based on channel setting information received from the second processor. For example, the first multiplexing / demultiplexing module (4120d) can map the first receiving signal to the first-1 channel and the first transmitting signal to the first-2 channel. The first multiplexing / demultiplexing module (4120d) can transmit the first receiving signal to the kernel audio (420d) through the first-1 channel. The first multiplexing / demultiplexing module (4120d) can transmit the first transmitting signal to the kernel audio (420d) through the first-2 channel.
[0197] The second call recording module (4103d) may include at least one MFC (e.g., the third MFC (4113d) and the fourth MFC (4114d)) and a second multiplexing / demultiplexing module (4140d).
[0198] The third MFC (4113d) can receive a first transmission signal obtained through a microphone via a plurality of channels (stereo channels) and / or a second transmission signal tuned to a first setting value. The third MFC (4113d) can select at least one of the received signals and transmit it to the second multiplexing / demultiplexing module (4140d). For example, the third MFC (4113d) can transmit the second transmission signal to the second multiplexing / demultiplexing module (4140d).
[0199] The fourth MFC (4114d) can receive a second receiving signal tuned based on a second setting value and / or a third receiving signal tuned based on a third setting value through multiple channels (stereo channels). The fourth MFC (4114d) can transmit the second receiving signal and / or the third receiving signal to the second multiplexing / demultiplexing module (4140d).
[0200] The second multiplexing / demultiplexing module (4140d) can map channels to the second transmission signal, the second reception signal, and the third reception signal, respectively, based on channel setting information received from the second processor. For example, the second multiplexing / demultiplexing module (4140d) can map the second transmission signal to the second-1 channel, map the second reception signal to the second-2 channel, and map the third reception signal to the second-3 channel. The second multiplexing / demultiplexing module (4140d) can transmit the second transmission signal to the kernel audio (420d) through the second-1 channel. The second multiplexing / demultiplexing module (4140d) can transmit the second reception signal to the kernel audio (420d) through the second-2 channel. The second multiplexing / demultiplexing module (4140d) can transmit the third reception signal to the kernel audio (420d) through the second-3 channel.
[0201] The kernel audio (420d) can transmit a first received signal to a second processor (e.g., audio HAL (430d)) through a first-1 channel, transmit a first transmitted signal to a second processor (e.g., audio HAL (430d)) through a first-2 channel, transmit a second transmitted signal to a second processor (e.g., audio HAL (430d)) through a second-1 channel, transmit a second received signal to a second processor (e.g., audio HAL (430d)) through a second-2 channel, and transmit a third received signal to a second processor (e.g., audio HAL (430d)) through a second-3 channel.
[0202] The second processor may include an audio HAL (430d) (e.g., audio HAL (2411; 784)), an audio flinger (440d) (e.g., audio flinger (2413; 782)), and at least one ASR module (e.g., a first ASR module (451d) (e.g., a first ASR module (2431; 351b; 351c) or an ASR module (776; 1100)) and a second ASR module (452d) (e.g., a second ASR module (2432; 352b; 352c) or an ASR module (776; 1100))). The audio HAL (430d) can transmit a first received signal to the audio flinger (440d) through the first-1 channel, transmit a first transmitted signal to the audio flinger (440d) through the first-2 channel, transmit a second transmitted signal to the audio flinger (440d) through the second-1 channel, transmit a second received signal to the audio flinger (440d) through the second-2 channel, and transmit a third received signal to the audio flinger (440d) through the second-3 channel. The audio flinger (440d) can transmit a first received signal to a second ASR module (452d) through a first-1 channel, transmit a first transmitted signal to a first ASR module (451d) through a first-2 channel, transmit a second transmitted signal to a first ASR module (451d) through a second-1 channel, transmit a second received signal to a second ASR module (452d) through a second-2 channel, and transmit a third received signal to a second ASR module (452d) through a second-3 channel. According to one embodiment, the audio flinger (440d) may not transmit the first transmitted signal and the first received signal to the first ASR module (451d) and the second ASR module (452d). 
For example, the audio flinger (440d) can transmit signals (e.g., a second transmission signal, a second reception signal, and / or a third reception signal) tuned to improve voice recognition performance to the first ASR module (451d) or the second ASR module (452d). According to one embodiment, the audio flinger (440d) can transmit at least one of a second received signal or a third received signal to the second ASR module (452d). For example, the audio flinger (440d) can transmit the third received signal to the second ASR module (452d) if the second received signal is a signal tuned for improved listening performance and the third received signal is a signal tuned for improved speech recognition performance.
[0203] According to one embodiment, signals on a plurality of first channels (e.g., first-1 channel and first-2 channel) and on a plurality of second channels (e.g., second-1 channel, second-2 channel, and second-3 channel) may be transmitted through different data ports in each of the kernel audio (420d), audio HAL (430d), and audio flinger (440d), but this is not limited thereto.
[0204] FIG. 4d illustrates the operation of an electronic device in which a first call recording module (4101d) and a second call recording module (4103d) are used for different purposes (e.g., different signal transmission paths). For example, the first call recording module (4101d) may operate as a path for transmitting untuned signals (e.g., a first receiving signal and a first transmitting signal) from a first processor to a second processor, for example, to generate call recording data during a call. The second call recording module (4103d) may operate as a path for transmitting tuned signals (e.g., a second transmitting signal, a second receiving signal, and a third receiving signal) from the first processor to the second processor, for example, for voice recognition and call translation during a call. According to various embodiments, the types and number of signals input to and output from the first call recording module (4101d) and the second call recording module (4103d) are not limited to those shown in FIG. 4d.
[0205] The first ASR module (451d) and the second ASR module (452d) can perform the same operations as the first ASR modules (451b, 451c) and the second ASR modules (452b, 452c) of FIGS. 4b and 4c.
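The split of FIG. 4d into an untuned path and a tuned path can be modeled as two channel tables, one per call recording module. In this sketch the signal labels (`rx1`, `tx1`, `tx2`, `rx2`, `rx3`), the `PATHS` table, and both helper functions are hypothetical; only the channel labels and their assignments follow paragraphs [0196] and [0200].

```python
# Hypothetical model of FIG. 4d: each call recording module has its own
# multiplexing / demultiplexing module and its own set of channel labels.
PATHS = {
    "4101d": {"1-1": "rx1", "1-2": "tx1"},                # untuned: call recording
    "4103d": {"2-1": "tx2", "2-2": "rx2", "2-3": "rx3"},  # tuned: ASR / translation
}

def send_over_paths(signals, paths=PATHS):
    """Deliver each signal to the kernel audio layer on its labelled channel.

    signals: dict of signal name -> samples
    Returns dict of channel label -> samples, across both modules.
    """
    delivered = {}
    for channels in paths.values():
        for label, name in channels.items():
            if label in delivered:
                raise ValueError(f"channel {label} used by more than one module")
            delivered[label] = signals[name]
    return delivered

def channels_for(purpose, paths=PATHS):
    """Return the channel labels of the path used for a given purpose."""
    module = {"recording": "4101d", "recognition": "4103d"}[purpose]
    return sorted(paths[module])
```

Keeping the two paths separate lets call recording consume the original sound while voice recognition and call translation consume only the tuned signals.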
[0206]
[0207] Referring to FIG. 4e, an electronic device (400e) (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 900; 1001)) may include a first processor (e.g., first processor (150; 220; 303b; 303c), processor (910), or communication processor (1060)), kernel audio (420e) (e.g., kernel audio (231)), and a second processor (e.g., second processor (160; 240) or processor (910; 1020)). The first processor may include a call recording module (410e) (e.g., call recording module (2221; 310b; 3101c; 3103c; 786)). The call recording module (410e) may include at least one MFC (e.g., a first MFC (411e), a second MFC (412e), and a third MFC (413e)) and a multiplexing / demultiplexing module (415e). The second processor may include an audio HAL (430e) (e.g., audio HAL (2411; 784)), an audio flinger (440e) (e.g., audio flinger (2413; 782)), a recording module (445e), and at least one ASR module (e.g., a first ASR module (451e) (e.g., a first ASR module (2431; 351b; 351c) or an ASR module (776; 1100)) and a second ASR module (452e) (e.g., a second ASR module (2432; 352b; 352c) or an ASR module (776; 1100))). In the following description, any descriptions that overlap with FIGS. 4a and 4b are omitted or briefly described.
[0208] The call recording module (410e) (e.g., first MFC (411e), second MFC (412e), third MFC (413e), and multiplexing / demultiplexing module (415e)), kernel audio (420e), and audio HAL (430e) can perform the same operation as the call recording module (410b) of FIG. 4b (e.g., first MFC (411b), second MFC (412b), third MFC (413b), and multiplexing / demultiplexing module (415b)), kernel audio (420b), and audio HAL (430b).
[0209] The audio flinger (440e) can transmit a first received signal to a recording module (445e) through a first channel, transmit a first transmitted signal to a recording module (445e) through a second channel, transmit a second transmitted signal to a recording module (445e) through a third channel, transmit a second received signal to a recording module (445e) through a fourth channel, and transmit a third received signal to a recording module (445e) through a fifth channel.
[0210] The recording module (445e) can generate call recording data (e.g., a call recording file in a specified format) based on a first received signal and a first transmitted signal. For example, the call recording data can be generated based on the original sound of the untuned call targets (user and call counterpart) (e.g., the first received signal and the first transmitted signal). The recording module (445e) can transmit a second transmitted signal to the first ASR module (451e). The recording module (445e) can transmit a second received signal and / or a third received signal to the second ASR module (452e).
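The work of the recording module (445e) can be sketched with Python's standard `wave` module. The WAV container, the 16-bit stereo layout (call counterpart on the left channel, user on the right), the 16 kHz default rate, and the signal labels are assumptions made for illustration; the paragraph above says only that call recording data in a specified format is generated from the first received signal and the first transmitted signal.

```python
import io
import struct
import wave

def write_call_recording(rx1, tx1, sample_rate=16000):
    """Write untuned received/transmitted signals as a 16-bit stereo WAV.

    Left channel: call counterpart (rx1); right channel: user (tx1).
    This layout and the WAV container are assumptions. Returns WAV bytes.
    """
    if len(rx1) != len(tx1):
        raise ValueError("signals must have equal length")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)
        frames = b"".join(struct.pack("<hh", r, t) for r, t in zip(rx1, tx1))
        wav.writeframes(frames)
    return buf.getvalue()

def recording_module(signals):
    """Record the original signals and forward the tuned ones (FIG. 4e)."""
    recording = write_call_recording(signals["rx1"], signals["tx1"])
    to_first_asr = [signals["tx2"]]
    to_second_asr = [signals[k] for k in ("rx2", "rx3") if k in signals]
    return recording, to_first_asr, to_second_asr
```

Recording the untuned signals preserves the original sound of both parties, while the ASR modules receive only the tuned signals.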
[0211] The first ASR module (451e) and the second ASR module (452e) can perform the same operation as the first ASR module (451b) and the second ASR module (452b) of FIG. 4b.
[0212] In FIG. 4e, it is described that the audio flinger (440e) transmits the first receiving signal, the first transmitting signal, the second transmitting signal, the second receiving signal, and the third receiving signal to the recording module (445e), and the recording module (445e) transmits the second transmitting signal, the second receiving signal, and the third receiving signal to the first ASR module (451e) or the second ASR module (452e), but this is not limited thereto. According to one embodiment, the audio flinger (440e) may transmit the first receiving signal and the first transmitting signal to the recording module (445e), transmit the second transmitting signal to the first ASR module (451e), and transmit the second receiving signal and / or the third receiving signal to the second ASR module (452e). In this case, the recording module (445e) may not transmit signals separately to the first ASR module (451e) and the second ASR module (452e).
[0213] According to various embodiments, the configuration of the electronic device (400e) is not limited to that shown in FIGS. 4b through 4e: at least some components may be omitted, at least one component (e.g., at least one of the components of FIGS. 1, 2, 3a through 3c, or 9 through 11) may be added, and at least some of the configurations of FIGS. 4b through 4e may be combined. According to one embodiment, at least some of the components of the electronic device (400e) may be integrated into a single module.
[0214]
[0215] FIG. 5 is a flowchart of a voice recognition method of an electronic device according to one embodiment.
[0216] In operation 510, an electronic device (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) can acquire, by a first processor (e.g., first processor (150; 220; 303b; 303c), processor (910), or communication processor (1060)) a first receiving signal received through a communication circuit of the electronic device (e.g., communication circuit (130; 210; 301b; 303c; 960)) and a first transmitting signal corresponding to a user utterance input through a microphone of the electronic device while in communication with an external device.
[0217] In operation 520, the electronic device may generate a second transmission signal by tuning a first transmission signal based on a first set value by a first processor. For example, the electronic device may generate the second transmission signal by removing noise from the first transmission signal and / or performing echo canceling on it by the first processor. For example, the first set value may be determined, experimentally or through a trained artificial intelligence model (e.g., machine learning), as a value that improves speech recognition performance (recognition rate). For example, the second transmission signal may be a signal tuned with a focus on improving speech recognition performance rather than on improving the hearing performance (e.g., intelligibility) for the persons on the call (e.g., the user or the call partner). For example, the operation of tuning based on the first set value may be included in the preprocessing operation of the first transmission signal.
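As one concrete, hypothetical example of noise removal, the sketch below applies a frame-based noise gate, with the gate threshold standing in for the first set value. A noise gate is a swapped-in, deliberately simple technique: the document does not specify which noise removal or echo canceling algorithm the first processor uses, and the threshold and frame length here are illustrative.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(signal, threshold_rms, frame_len=160):
    """Silence frames whose RMS level is below threshold_rms.

    threshold_rms plays the role of a hypothetical first set value;
    frame_len = 160 corresponds to 10 ms at 16 kHz.
    """
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if frame_rms(frame) < threshold_rms:
            out.extend([0] * len(frame))   # treat quiet frames as noise
        else:
            out.extend(frame)              # keep frames that likely hold speech
    return out
```

Gating low-level frames removes background hiss between utterances, which tends to matter more to a recognizer than to a human listener.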
[0218] In operation 530, the electronic device may generate a second received signal by tuning a first received signal based on a second set value by a first processor. The electronic device may generate a third received signal by tuning a first received signal based on a third set value by a first processor. For example, the electronic device may generate the second received signal and / or the third received signal by performing energy (volume) adjustment, filter application (e.g., tone adjustment, frequency characteristic adjustment), dynamic range adjustment, noise removal, and / or echo canceling of the first received signal by a first processor. For example, the second set value and the third set value may be different. The second set value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve the user's hearing performance. The third set value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve speech recognition performance (recognition rate). For example, the second received signal may be a signal tuned with a focus on improving the hearing performance (e.g., increasing clarity) of the person on the call (e.g., user or call partner), and the third received signal may be a signal tuned with a focus on improving speech recognition performance. For example, an operation of tuning based on the second setting value and / or the third setting value may be included in the post-processing operation of the first received signal.
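To show how one first received signal can yield two differently tuned signals, the sketch below applies volume gain and dynamic range compression with two different parameter sets. The `TuningSetting` fields and the numeric values of `SECOND_SETTING` and `THIRD_SETTING` are hypothetical placeholders, not disclosed setting values; filtering, noise removal, and echo canceling are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TuningSetting:
    """Hypothetical tuning parameters; real setting values are not disclosed."""
    gain: float      # linear volume gain
    threshold: int   # compressor threshold (absolute sample value)
    ratio: float     # compression ratio above the threshold (>= 1)

def compress(sample, threshold, ratio):
    """Static compressor: attenuate the part of |sample| above threshold."""
    mag = abs(sample)
    if mag <= threshold:
        return sample
    compressed = threshold + (mag - threshold) / ratio
    return compressed if sample >= 0 else -compressed

def tune(signal, setting, limit=32767):
    """Apply gain, then dynamic range compression, then clip to 16 bits."""
    out = []
    for s in signal:
        v = compress(s * setting.gain, setting.threshold, setting.ratio)
        out.append(max(-limit, min(limit, round(v))))
    return out

# Two different settings applied to the same first received signal.
SECOND_SETTING = TuningSetting(gain=2.0, threshold=8000, ratio=4.0)   # hearing
THIRD_SETTING = TuningSetting(gain=1.0, threshold=20000, ratio=2.0)   # ASR
```

With these placeholder values, the hearing-oriented setting boosts and compresses `[1000, -6000, 12000]` to `[2000, -9000, 12000]`, while the recognition-oriented setting leaves the same input unchanged.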
[0219] In operation 540, the electronic device may transmit at least one of the second receiving signal or the third receiving signal and the second transmitting signal, respectively, to the second processor (e.g., the second processor (160; 240; 305b; 305c) or the processor (910; 1020)) through independent channels by the first processor. For example, the electronic device may map channels to the first receiving signal, the first transmitting signal, the second transmitting signal, the second receiving signal, and the third receiving signal, respectively, based on channel setting information received from the second processor by the first processor. The first processor may map the first channel to the first receiving signal, map the second channel to the first transmitting signal, map the third channel to the second transmitting signal, map the fourth channel to the second receiving signal, and map the fifth channel to the third receiving signal. The electronic device can transmit a second transmission signal to a second processor through a third channel by a first processor, transmit a second reception signal to a second processor through a fourth channel, and transmit a third reception signal to a second processor through a fifth channel.
[0220] In operation 550, the second processor of the electronic device may perform speech recognition on at least one of the second received signal or the third received signal received from the first processor, and may obtain second content (e.g., a second text) corresponding to the second received signal and / or the third received signal. The second processor may also perform speech recognition on the second transmitted signal received from the first processor and obtain first content (e.g., a first text) corresponding to the second transmitted signal.
[0221] According to one embodiment, the electronic device can improve the voice recognition rate and voice recognition performance by performing voice recognition on signals (e.g., a second transmission signal, a second reception signal, and / or a third reception signal) that are tuned for voice recognition performance enhancement, rather than on the original voice of the call (e.g., a first transmission signal and a first reception signal).
[0222] According to one embodiment, the operations of FIG. 5 may be performed simultaneously or in a different order, at least one operation may be omitted, or at least one operation (e.g., at least one of the operations of FIG. 6a, 6b, 7a, or 7b) may be added. Each of the operations of FIG. 5 may be performed in conjunction with at least some of the operations of FIG. 6a, 6b, 7a, or 7b.
[0223]
[0224] FIG. 6a is a flowchart of a voice recognition method of an electronic device according to one embodiment.
[0225] In operation 610, an electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001) may transmit a first received signal and a first transmitted signal, respectively, to a second processor (e.g., a second processor (160; 240; 305b; 305c) or a processor (910; 1020)) through independent channels by a first processor (e.g., a first processor (150; 220; 303b; 303c), a processor (910), or a communication processor (1060)). For example, the first processor may transmit the first received signal to the second processor through a first channel and the first transmitted signal through a second channel. According to one embodiment, operation 610 may be performed together with operation 540 of FIG. 5.
[0226] In operation 620, the electronic device can generate and store call recording data (e.g., a call recording file in a specified format) based on a first receiving signal and a first transmitting signal received from a first processor by a second processor.
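A call recording built from channels 1 and 2 could, for instance, be a two-channel WAV file. The sketch below assumes 16-bit PCM samples and a 16 kHz rate; neither the format nor the rate is specified in the document, and `make_call_recording` is a hypothetical helper that uses only Python's standard `wave` module.

```python
import io
import struct
import wave
from typing import List


def make_call_recording(rx: List[int], tx: List[int], sample_rate: int = 16000) -> bytes:
    """Interleave channel 1 (first received) and channel 2 (first transmitted)
    into a 16-bit stereo WAV file held in memory."""
    if len(rx) != len(tx):
        raise ValueError("rx and tx must have the same number of samples")
    # One frame = one left (received) sample followed by one right (transmitted) sample.
    frames = b"".join(struct.pack("<hh", r, t) for r, t in zip(rx, tx))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()
```

Keeping the partner on one channel and the user on the other lets a later step (for example, attaching recognized text to the recording) address each speaker separately.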
[0227] According to one embodiment, the operations of FIG. 6a may be performed simultaneously or in a different order, at least one operation may be omitted, or at least one operation (e.g., at least one of the operations of FIG. 5, 6b, 7a, or 7b) may be added. Each of the operations of FIG. 6a may be performed in conjunction with at least some of the operations of FIG. 5, 6b, 7a, or 7b.
[0228]
[0229] FIG. 6b is a flowchart of a voice recognition method of an electronic device according to one embodiment. For example, the operations of FIG. 6b may be performed after the operations of FIG. 5 have been performed.
[0230] In operation 630, the second processor (e.g., second processor (160; 240; 305b; 305c) or processor (910; 1020)) of an electronic device (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) can translate the content obtained by speech recognition of the second transmitted signal and of at least one of the second received signal or the third received signal. For example, the second processor can translate the content obtained by speech recognition of the second transmitted signal (e.g., the first content) into a specified language (e.g., the language used by a specified call partner). For example, the second processor can translate the content obtained by speech recognition of at least one of the second received signal or the third received signal (e.g., the second content) into a specified language (e.g., the language used by a specified user).
[0231] In operation 640, the second processor of the electronic device may transmit a third transmission signal and a fourth reception signal, each corresponding to translated content, to the first processor (e.g., the first processor (150; 220; 303b; 303c), the processor (910), or the communication processor (1060)). For example, the second processor may generate the third transmission signal by converting the translated first content into speech using text-to-speech (TTS) and transmit it to the first processor. For example, the second processor may generate the fourth reception signal by converting the translated second content into speech using TTS and transmit it to the first processor.
[0232] In operation 650, the electronic device may transmit at least one of a first transmission signal or a third transmission signal to an external electronic device through a communication circuit by the first processor. For example, when the 'My Voice Blocking' setting is enabled, the electronic device may transmit the third transmission signal to an external electronic device through a communication circuit by the first processor. For example, when the 'My Voice Blocking' setting is disabled, the electronic device may transmit the first transmission signal and the third transmission signal to an external electronic device through a communication circuit by the first processor. For example, the electronic device may transmit a signal in which the first transmission signal and the third transmission signal are mixed through a communication circuit by the first processor.
[0233] In operation 660, the first processor of the electronic device may output, through a speaker, a sound corresponding to at least one of the second received signal or the fourth received signal. For example, when the 'opponent voice blocking' setting is enabled, the first processor may output a sound corresponding to the fourth received signal through the speaker. For example, when the 'opponent voice blocking' setting is disabled, the first processor may output a sound corresponding to the second received signal and the fourth received signal through the speaker. For example, the first processor may output through the speaker a sound corresponding to a signal in which the second received signal and the fourth received signal are mixed.
[0234] According to one embodiment, the operations of FIG. 6b may be performed simultaneously or in a different order, at least one operation may be omitted, or at least one operation (e.g., at least one of the operations of FIG. 5, 6a, 7a, or 7b) may be added. Each of the operations of FIG. 6b may be performed in conjunction with at least some of the operations of FIG. 5, 6a, 7a, or 7b.
[0235]
[0236] FIGS. 7a and 7b are flowcharts of voice recognition operations of an electronic device according to one embodiment. According to one embodiment, an electronic device (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) may include a user interface (UI) (770) (e.g., a call application or conversation module (2435)), a text-to-speech (TTS) module (772) (e.g., a TTS module (2423; 2424)), a translation module (774) (e.g., a translation module (2433; 2434; 353b; 354b, 353c, 354c)), an ASR module (776) (e.g., an ASR module (2431; 2432; 351b; 351c; 352b; 352c; 451b; 451c; 451d; 451e; 452b; 452c; 452d; 452e; 776; 1100)), an audio recorder module (778), an audio source module (780) (e.g., audio source module (2421; 2422)), an audio flinger (782) (e.g., audio flinger (2413; 440b; 440c; 440d; 440e)), an audio HAL (hardware abstraction layer) (784) (e.g., audio HAL (2411; 430b; 430c; 430d; 430e)), a call recording module (786) (e.g., an incall recording module; e.g., call recording module (2221; 310b; 3101c; 3103c; 410b; 410c; 410d; 410e)), a call delivery module (788) (e.g., an incall music delivery module; e.g., call delivery module (320a; 320b); also referred to below as the call forwarding module), a transmission signal processing module (790) (e.g., preprocessing module (221; 331b; 331c)), an encoder (792) (e.g., encoder (225; 337b; 337c)), a reception signal processing module (794) (e.g., postprocessing module (228; 340b; 340c)), a mixer module (796), and a decoder (798) (e.g., decoder (226; 338b; 338c)). For example, the user interface (770), TTS module (772), translation module (774), ASR module (776), audio recorder module (778), audio source module (780), audio flinger (782), and audio HAL (784) may be included in a second processor (e.g., an AP) and / or may operate under the control of the second processor.
For example, the call recording module (786), call forwarding module (788), transmission signal processing module (790), encoder (792), reception signal processing module (794), mixer module (796), and decoder (798) may be included in a first processor (e.g., an ADSP) and / or may operate under the control of the first processor. FIGS. 7a and 7b illustrate consecutive parts of a single operation flow, divided at the audio HAL (784).
[0237] In operation 701, the user interface (770) can transmit translation call events to the audio HAL (784). For example, the user interface (770) can transmit a call event (which may be referred to as a 'translation call' in this disclosure) to the audio HAL (784) while the translation function is enabled during a call. For example, a user can make or receive a translation call through the user interface (770). The user interface (770) can transmit a translation call event to the audio HAL (784) when a translation call is initiated based on user input. According to one embodiment, the user interface (770) can transmit channel setting information to the audio HAL (784) along with the translation call event. For example, the channel setting information may include information on the channel to which the audio signal (e.g., a first receiving signal, a first transmitting signal, a second transmitting signal, a second receiving signal, and / or a third receiving signal) is mapped.
[0238] In operation 703, the transmission signal processing module (790) can receive a first transmission signal corresponding to a user utterance from the microphone.
[0239] In operation 705, the audio HAL (784) may transmit settings related to a translation call to the transmission signal processing module (790). For example, settings related to a translation call may include the user's language (translation language), the user's voice (translation voice), whether the user's voice blocking setting is enabled, whether the other party's voice blocking setting is enabled, the other party's language (translation language), the other party's voice (translation voice), and / or whether the voice recognition quality enhancement setting is enabled, but are not limited to those listed above.
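The translation-call settings listed here map naturally onto a small record. The sketch below is a hypothetical container: the field names, the language tags, and the voice identifiers are illustrative assumptions, and the document does not define how the audio HAL (784) serializes these settings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TranslationCallSettings:
    """Hypothetical container for the translation-call settings listed in [0239]."""
    user_language: str = "ko-KR"          # user's (translation) language
    user_voice: str = "voice_1"           # voice for the user's translated speech
    block_my_voice: bool = False          # user's voice blocking
    partner_language: str = "en-US"       # other party's (translation) language
    partner_voice: str = "voice_2"        # voice for the partner's translated speech
    block_partner_voice: bool = False     # other party's voice blocking
    improve_recognition_quality: bool = True


settings = TranslationCallSettings(block_my_voice=True)
# A flat dict like this is one way the audio HAL (784) might hand the
# settings to the transmission signal processing module (790).
payload = asdict(settings)
```

Making the record immutable (`frozen=True`) reflects that the settings are fixed for the duration of a translation call unless the user changes them through the user interface.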
[0240] In operation 707, the audio HAL (784) can transmit channel setting information to the call recording module (786). For example, the audio HAL (784) can transmit to the call recording module (786) information on the first channel to which the first received signal is mapped, the second channel to which the first transmitted signal is mapped, the third channel to which the second transmitted signal is mapped, the fourth channel to which the second received signal is mapped, and the fifth channel to which the third received signal is mapped.
[0241] In operation 709, the transmission signal processing module (790) can generate a second transmission signal by tuning the first transmission signal based on a first set value. For example, the transmission signal processing module (790) can generate the second transmission signal by removing noise from the first transmission signal and / or performing echo canceling. For example, the second transmission signal may be a signal tuned with a focus on improving speech recognition performance.
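Echo canceling is one of the tuning steps named here. As one common way to do it (not necessarily the method the device uses), a normalized least-mean-squares (NLMS) adaptive filter estimates the echo path from the far-end (received) signal and subtracts the estimated echo from the microphone signal. The filter length, step size, and simulated echo path below are hypothetical.

```python
import random
from typing import List


def nlms_echo_cancel(mic: List[float], far_end: List[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of far_end from mic (NLMS).
    Returns the error signal, i.e. the echo-cancelled near-end signal."""
    w = [0.0] * taps          # adaptive estimate of the echo path
    history = [0.0] * taps    # most recent far-end samples, newest first
    out = []
    for d, x in zip(mic, far_end):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))    # estimated echo
        e = d - y                                          # residual after cancellation
        norm = eps + sum(h * h for h in history)           # input power normalization
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out


# Simulated echo path (hypothetical); here the microphone picks up only echo.
random.seed(0)
echo_path = [0.6, -0.3, 0.1, 0.05]
far = [random.gauss(0.0, 1.0) for _ in range(4000)]
mic = [sum(c * far[n - k] for k, c in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]
residual = nlms_echo_cancel(mic, far)
```

In this echo-only simulation the residual energy falls toward zero once the filter converges; with real speech on the near end, the residual would retain the user's voice while the echo is suppressed.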
[0242] In operation 711, the transmission signal processing module (790) can determine whether the transmission blocking setting is enabled. If the transmission blocking setting is enabled, the transmission signal processing module (790) can block the first transmission signal and the second transmission signal from being transmitted to the call recording module (786).
[0243] In operation 713, the transmission signal processing module (790) can transmit the first transmission signal and the second transmission signal to the call recording module (786).
[0244] In operation 715, the decoder (798) can receive a first received signal received from an external device (a call partner device) through a communication circuit.
[0245] In operation 717, the decoder (798) can decode the first received signal and transmit it to the call recording module (786).
[0246] In operation 718, the decoder (798) can transmit the first received signal to the received signal processing module (794).
[0247] In operation 719, the receiving signal processing module (794) can generate a second receiving signal by tuning the first receiving signal based on a second setting value. The receiving signal processing module (794) can generate a third receiving signal by tuning the first receiving signal based on a third setting value. For example, the receiving signal processing module (794) can generate the second receiving signal and / or the third receiving signal by performing energy (volume) adjustment, filter application (e.g., tone adjustment, frequency characteristic adjustment), dynamic range adjustment, noise removal, and / or echo canceling of the first receiving signal. For example, the second setting value and the third setting value may be different. The second setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve the user's hearing performance. The third setting value may be determined experimentally or through a trained artificial intelligence model (e.g., machine learning) as a value to improve speech recognition performance (recognition rate). For example, the second received signal may be a signal tuned with a focus on improving the hearing performance (e.g., increasing clarity) of the person on the call (e.g., user or call partner), and the third received signal may be a signal tuned with a focus on improving speech recognition performance.
[0248] In operation 721, the receiving signal processing module (794) can transmit the second receiving signal and the third receiving signal to the call recording module (786).
[0249] In operation 723, the call recording module (786) can transmit a first receiving signal, a first transmitting signal, a second transmitting signal, a second receiving signal, and a third receiving signal to the audio HAL (784) through a plurality of mapped channels. For example, the call recording module (786) can transmit the first receiving signal to the audio HAL (784) through the first channel, transmit the first transmitting signal to the audio HAL (784) through the second channel, transmit the second transmitting signal to the audio HAL (784) through the third channel, transmit the second receiving signal to the audio HAL (784) through the fourth channel, and transmit the third receiving signal to the audio HAL (784) through the fifth channel.
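Carrying five signals over a plurality of mapped channels is commonly done with a frame-interleaved multichannel buffer. The sketch below assumes that layout, which the document does not specify; `interleave` is a hypothetical helper and the sample values are toy data.

```python
from typing import List, Sequence


def interleave(channels: Sequence[Sequence[int]]) -> List[int]:
    """Pack equal-length per-channel sample lists, given in channel order
    (channel 1 first), into one frame-interleaved buffer."""
    if len({len(c) for c in channels}) != 1:
        raise ValueError("all channels must have the same number of samples")
    return [sample for frame in zip(*channels) for sample in frame]


# Channels 1..5: first receiving, first transmitting, second transmitting,
# second receiving, third receiving (two samples each, toy values).
buffer = interleave([[1, 2], [10, 20], [100, 200], [1000, 2000], [5, 6]])
```

Each frame holds one sample from every channel, so all five signals stay time-aligned as they pass from the call recording module (786) through the audio HAL (784).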
[0250] In operation 725, the audio HAL (784) can transmit a first received signal, a first transmitted signal, a second transmitted signal, a second received signal, and a third received signal to the audio flinger (782) through a plurality of mapped channels. For example, the audio HAL (784) can transmit the first received signal to the audio flinger (782) through a first channel, transmit the first transmitted signal to the audio flinger (782) through a second channel, transmit the second transmitted signal to the audio flinger (782) through a third channel, transmit the second received signal to the audio flinger (782) through a fourth channel, and transmit the third received signal to the audio flinger (782) through a fifth channel. According to one embodiment, the audio flinger (782) can generate and store call recording data based on the first received signal and the first transmitted signal.
[0251] In operation 727, the audio flinger (782) can transmit the second transmission signal to the audio source module (780) through the third channel. For example, of the first transmission signal and the second transmission signal, the audio flinger (782) can transmit to the audio source module (780) the second transmission signal, which is tuned to improve voice recognition performance.
[0252] In operation 729, the audio flinger (782) can transmit the third received signal to the audio source module (780) through the fifth channel. For example, of the first received signal, the second received signal, and the third received signal, the audio flinger (782) can transmit to the audio source module (780) the third received signal, which is tuned to improve speech recognition performance. FIGS. 7a and 7b are described assuming that the audio flinger (782) transmits only the third received signal to the audio source module (780); however, this is not limiting, and according to various embodiments, the audio flinger (782) can transmit the second received signal (through the fourth channel) and / or the third received signal (through the fifth channel) to the audio source module (780).
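Assuming the five channels arrive as a frame-interleaved buffer (an assumption; the document does not describe the buffer layout), selecting channels for recognition is a strided slice. `extract_channels` and the toy buffer are hypothetical.

```python
from typing import Dict, List, Sequence


def extract_channels(buffer: Sequence[int], num_channels: int,
                     wanted: Sequence[int]) -> Dict[int, List[int]]:
    """Pull the requested 1-based channels out of a frame-interleaved buffer."""
    if len(buffer) % num_channels:
        raise ValueError("buffer does not hold a whole number of frames")
    return {ch: list(buffer[ch - 1::num_channels]) for ch in wanted}


interleaved = [1, 10, 100, 1000, 5, 2, 20, 200, 2000, 6]  # 5 channels x 2 frames
# Channel 3 (second transmitted) and channel 5 (third received) feed recognition;
# adding 4 would also pass the second received signal, per the variant in [0252].
asr_inputs = extract_channels(interleaved, 5, wanted=[3, 5])
```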
[0253] In operation 731, the audio source module (780) can transmit a second transmission signal to the audio recorder module (778) through a third channel. According to one embodiment, the audio recorder module (778) may be implemented by being included in the ASR module (776) as a configuration for storing the audio signal.
[0254] In operation 733, the audio source module (780) can transmit a third received signal (and / or a second received signal) to the audio recorder module (778) through the fifth channel (and / or the fourth channel).
[0255] In operation 735, the audio recorder module (778) can transmit a second transmission signal to the ASR module (776) through a third channel.
[0256] In operation 737, the audio recorder module (778) can transmit a third received signal (and / or a second received signal) to the ASR module (776) through the fifth channel (and / or the fourth channel).
[0257] In operation 739, the ASR module (776) can perform speech recognition on the second transmission signal to obtain the first content (e.g., the first text) corresponding to the second transmission signal. The ASR module (776) can transmit the first content to the translation module (774).
[0258] In operation 741, the ASR module (776) can perform speech recognition on the third received signal (or the second received signal) to obtain second content (e.g., second text) corresponding to the third received signal (or the second received signal). The ASR module (776) can transmit the second content to the translation module (774).
[0259] In operation 743, the translation module (774) can translate the first content into another language (e.g., the language used by the call partner). The translation module (774) can deliver the translated first content to the user interface (770).
[0260] In operation 745, the translation module (774) can translate the second content into another language (e.g., the user's language). The translation module (774) can deliver the translated second content to the user interface (770).
[0261] In operation 747, the user interface (770) may provide the first content and / or the translated first content. The user interface (770) may transmit the translated first content to the TTS module (772).
[0262] In operation 749, the user interface (770) may provide second content and / or translated second content. The user interface (770) may transmit the translated second content to the TTS module (772).
[0263] In operation 751, the TTS module (772) can generate a third transmission signal by converting the translated first content into TTS. The TTS module (772) can transmit the third transmission signal to the call forwarding module (788).
[0264] In operation 753, the TTS module (772) can generate a fourth received signal by converting the translated second content into TTS. The TTS module (772) can transmit the fourth received signal to the call forwarding module (788).
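Operations 739 to 753 chain three stages per direction: recognition, translation, and speech synthesis. The sketch below shows only that data flow; the ASR, translation, and TTS engines are replaced by hypothetical toy functions (bytes are simply decoded to text and back), so it illustrates the wiring, not recognition or translation quality. The language tags are also illustrative.

```python
from typing import Callable, Tuple

Asr = Callable[[bytes], str]
Translate = Callable[[str, str], str]
Tts = Callable[[str], bytes]


def translate_turn(tuned_signal: bytes, target_language: str, asr: Asr,
                   translate: Translate, tts: Tts) -> Tuple[str, str, bytes]:
    """ASR -> translation -> TTS for one tuned signal.
    Returns (content, translated content, synthesized signal)."""
    content = asr(tuned_signal)
    translated = translate(content, target_language)
    return content, translated, tts(translated)


# Toy stand-ins for the ASR (776), translation (774) and TTS (772) modules.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


# Second transmitted signal -> third transmitted signal for the call partner.
first_content, translated_first, third_tx = translate_turn(
    b"hello", "en", toy_asr, toy_translate, toy_tts)
# Third received signal -> fourth received signal for the user's speaker.
second_content, translated_second, fourth_rx = translate_turn(
    b"bonjour", "ko", toy_asr, toy_translate, toy_tts)
```

Passing the engines in as parameters keeps the per-direction flow identical, which mirrors how operations 739/743/751 and 741/745/753 differ only in their input signal and target language.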
[0265] In operation 755, the call forwarding module (788) can forward the third transmission signal to the transmission signal processing module (790).
[0266] In operation 757, of the first transmission signal and the second transmission signal, the transmission signal processing module (790) can block the second transmission signal, which is used for voice recognition, from being transmitted to the encoder (792).
[0267] In operation 759, the transmission signal processing module (790) can determine whether the 'My Voice Blocking' setting is enabled. If the 'My Voice Blocking' setting is enabled, the transmission signal processing module (790) can block the first transmission signal from being transmitted to the encoder (792).
[0268] In operation 761, the transmission signal processing module (790) may mix the first transmission signal and the third transmission signal based on the settings of the translation call (e.g., whether the My Voice Blocking setting is enabled). For example, the transmission signal processing module (790) may not mix the first transmission signal and the third transmission signal if the My Voice Blocking setting is enabled. The transmission signal processing module (790) may mix the first transmission signal and the third transmission signal if the My Voice Blocking setting is disabled.
[0269] In operation 763, when the 'My Voice Blocking' setting is enabled, the transmission signal processing module (790) can transmit the third transmission signal to the encoder (792). When the 'My Voice Blocking' setting is disabled, the transmission signal processing module (790) can transmit a signal in which the first transmission signal and the third transmission signal are mixed to the encoder (792).
[0270] In operation 765, the encoder (792) can encode the received signal (e.g., a third transmission signal or a mixed signal of the first transmission signal and the third transmission signal) and transmit it to the communication circuit. For example, the communication circuit can transmit the signal received from the encoder (792) (e.g., a third transmission signal or a mixed signal of the first transmission signal and the third transmission signal) to an external device (a call partner device).
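The encoder-side decision in operations 757 to 765 can be sketched as below, assuming float PCM samples in [-1.0, 1.0] and mixing by simple summation with clipping. The document does not say how mixing is performed, and `build_uplink` is a hypothetical name.

```python
from typing import List, Sequence


def build_uplink(first_tx: Sequence[float], third_tx: Sequence[float],
                 block_my_voice: bool) -> List[float]:
    """Return what the transmission signal processing module hands the encoder.

    The second transmission signal is never an input: it exists only for
    recognition. With 'My Voice Blocking' enabled only the translated TTS
    signal is sent; otherwise the original and TTS signals are summed
    sample by sample and clipped to [-1.0, 1.0].
    """
    if block_my_voice:
        return list(third_tx)
    n = max(len(first_tx), len(third_tx))
    mixed = []
    for i in range(n):
        a = first_tx[i] if i < len(first_tx) else 0.0   # pad the shorter signal
        b = third_tx[i] if i < len(third_tx) else 0.0
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed
```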
[0271] In operation 767, of the second receiving signal and the third receiving signal, the receiving signal processing module (794) can block the third receiving signal, which is generated for voice recognition, from being transmitted to the mixer module (796).
[0272] In operation 769, the receiving signal processing module (794) can transmit the second receiving signal to the mixer module (796).
[0273] In operation 771, the call forwarding module (788) (or audio playback module (not shown)) can forward the fourth received signal to the mixer module (796).
[0274] In operation 773, the mixer module (796) may mix the second received signal and the fourth received signal based on the settings of the translation call (e.g., whether the other party voice blocking setting is enabled). For example, the mixer module (796) may not mix the second received signal and the fourth received signal if the other party voice blocking setting is enabled. The mixer module (796) may mix the second received signal and the fourth received signal if the other party voice blocking setting is disabled.
[0275] In operation 775, the mixer module (796) can transmit a fourth received signal to the speaker when the other party voice blocking setting is enabled. When the other party voice blocking setting is disabled, the mixer module (796) can transmit a signal that mixes the second received signal and the fourth received signal to the speaker. The speaker can output a sound corresponding to the signal received from the mixer module (796).
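The mixer behavior of operations 767 to 775 can be sketched in the same way. This sketch adds one design choice the document does not mention: when the partner's original voice is kept, it is attenuated ("ducked") by a hypothetical `duck_gain` so the translated speech stays intelligible. The sample format and the function name are likewise assumptions.

```python
from typing import List, Sequence


def mix_downlink(second_rx: Sequence[float], fourth_rx: Sequence[float],
                 block_partner_voice: bool, duck_gain: float = 0.25) -> List[float]:
    """Return what the mixer module sends to the speaker.

    The third received signal is never an input: it exists only for
    recognition. With partner-voice blocking enabled only the translated
    signal plays; otherwise the partner's own voice is scaled by duck_gain,
    added under the translation, and clipped to [-1.0, 1.0].
    """
    if block_partner_voice:
        return list(fourth_rx)
    n = max(len(second_rx), len(fourth_rx))
    out = []
    for i in range(n):
        a = second_rx[i] if i < len(second_rx) else 0.0   # pad the shorter signal
        b = fourth_rx[i] if i < len(fourth_rx) else 0.0
        out.append(max(-1.0, min(1.0, duck_gain * a + b)))
    return out
```

Setting `duck_gain=1.0` reduces this to the plain summation that the description's "mix" most directly suggests.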
[0276] According to one embodiment, the mixer module (796) may be included in the receiving signal processing module (794), and in this case, the operations performed by the mixer module (796) may be performed by the receiving signal processing module (794).
[0277] According to one embodiment, the operations of FIGS. 7a and 7b may be performed simultaneously or in a different order, at least one operation may be omitted, or at least one operation (e.g., at least one of the operations of FIGS. 5, 6a, or 6b) may be added. Each of the operations of FIGS. 7a and 7b may be performed in conjunction with at least some of the operations of FIGS. 5, 6a, or 6b.
[0278]
[0279] FIG. 8 is a drawing showing a user interface provided by an electronic device according to one embodiment.
[0280] According to one embodiment, an electronic device (e.g., electronic device (100; 200; 300b; 300c; 400b; 400c; 400d; 400e; 900; 1001)) may provide a user interface (800) for specifying settings related to a call. For example, FIG. 8 illustrates a user interface screen for specifying settings related to real-time interpretation (translation) during a call.
[0281] For example, the user interface (800) may include multiple items related to a call. The first item (810) may be an item for specifying the language used by the user during the call. The second item (820) may be an item for specifying the voice of the user during the call (e.g., a voice type for outputting translated user speech). The third item (830) may be an item for enabling or disabling the 'block my voice' setting. For example, if the 'block my voice' setting is enabled, the electronic device may refrain from transmitting the user's original voice (e.g., a first transmission signal) to an external device (the call partner's device) during the call and may instead transmit the translated voice (e.g., a third transmission signal) to the external device. The fourth item (840) may be an item for specifying the language used by the other party. The fifth item (850) may be an item for specifying the voice of the other party during the call (e.g., a voice type for outputting translated partner speech). The sixth item (860) may be an item for enabling or disabling the 'blocking the other party's voice' setting. For example, when the 'blocking the other party's voice' setting is enabled, the electronic device may refrain from outputting through the speaker a sound corresponding to the original voice of the user of the external device (the other party's device) received during a call (e.g., the first received signal) or to the voice tuned from that original voice (e.g., the second received signal), and may instead output through the speaker a sound corresponding to the translated voice (e.g., the fourth received signal). The seventh item (870) may be an item for enabling or disabling the 'improving voice recognition quality' setting.
For example, when the 'improving voice recognition quality' setting is enabled, the electronic device may perform voice recognition based on a transmission signal (e.g., the second transmitted signal) tuned based on a specified setting value (e.g., the first setting value) and / or a reception signal (e.g., the second received signal or the third received signal) tuned based on a specified setting value (e.g., the second setting value or the third setting value). For example, by transmitting to the second processor signals tuned to improve voice recognition performance, the first processor can improve the performance of the voice recognition that the second processor performs. For example, the electronic device may operate as described in FIGS. 2, 3b, 3c, 4b to 4e, 7a, and 7b. For example, when the 'improving voice recognition quality' setting is disabled, the electronic device may perform voice recognition based on a transmission signal (e.g., a first transmission signal) corresponding to a user utterance acquired through a microphone and / or a reception signal (e.g., a first reception signal) received from an external device through a communication circuit. For example, the electronic device may operate as described in FIGS. 3a and 4a.
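The effect of the 'improving voice recognition quality' item (870) on what the recognizer receives reduces to a choice between tuned and original signals. The sketch below assumes the signals are already available by name; the dictionary keys and `select_asr_inputs` are hypothetical.

```python
from typing import Dict, Tuple


def select_asr_inputs(signals: Dict[str, bytes],
                      improve_recognition_quality: bool) -> Tuple[bytes, bytes]:
    """Return (uplink audio, downlink audio) to hand to speech recognition."""
    if improve_recognition_quality:
        # Tuned copies (FIGS. 2, 3b, 3c, 4b to 4e, 7a, 7b); the second received
        # signal could be used in place of the third.
        return signals["second_transmitted"], signals["third_received"]
    # Untuned originals (FIGS. 3a and 4a).
    return signals["first_transmitted"], signals["first_received"]


signals = {"first_transmitted": b"tx1", "second_transmitted": b"tx2",
           "first_received": b"rx1", "third_received": b"rx3"}
```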
[0282]
[0283] An electronic device according to an embodiment disclosed in this document may include a microphone, a speaker, a communication circuit, a first processor, a second processor, and a memory. The first processor may be configured to acquire a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through the microphone while in communication with an external device through the communication circuit. The first processor may be configured to generate a second transmitting signal by tuning the first transmitting signal based on a first setting value. The first processor may be configured to generate a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively. The first processor may be configured to transmit the second transmitting signal and at least one of the second receiving signal or the third receiving signal to the second processor, each through an independent channel. The second processor may be configured to perform voice recognition on each of the second transmitting signal and the at least one of the second receiving signal or the third receiving signal.
[0284] According to one embodiment, the first processor may be configured to transmit the first received signal and the first transmitted signal, respectively, to the second processor through independent channels.
[0285] According to one embodiment, the second processor may be configured to generate and store call recording data based on the first received signal and the first transmitted signal.
[0286] According to one embodiment, the second processor may be configured to store the content obtained by performing voice recognition on the second transmitted signal and on at least one of the second received signal or the third received signal, in association with the call recording data.
[0287] According to one embodiment, when a translation function is activated during a call, the second processor may be configured to translate the first content obtained through voice recognition of the second transmission signal into a language related to the user of the external device. The second processor may be configured to transmit a third transmission signal corresponding to the translated first content to the first processor.
[0288] According to one embodiment, the first processor may be configured to transmit at least one of the first transmission signal or the third transmission signal to the external device through the communication circuit.
[0289] According to one embodiment, when a setting for blocking the transmission of user voice is activated, the first processor may be configured to prevent the first transmission signal from being transmitted to the communication circuit and to transmit the third transmission signal to the external device through the communication circuit.
[0290] According to one embodiment, when a translation function is activated during a call, the second processor may be configured to translate a second content obtained through voice recognition of at least one of the second received signal or the third received signal into a language related to the user of the electronic device. The second processor may be configured to transmit a fourth received signal corresponding to the translated second content to the first processor.
[0291] According to one embodiment, the first processor may be configured to output a sound corresponding to at least one of the second received signal or the fourth received signal through the speaker.
[0292] According to one embodiment, when a setting for blocking output of the voice of the user of the external device is activated, the first processor may be configured to withhold the second received signal from the speaker and to output a sound corresponding to the fourth received signal through the speaker.
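The blocking settings in [0289] and [0292] reduce to a routing decision the first processor makes for each direction of the call. A minimal sketch, assuming hypothetical names (`CallSettings`, `route_uplink`, `route_downlink`) and treating each signal as an opaque byte buffer; the disclosure does not define these interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallSettings:
    """Hypothetical per-call settings; field names are illustrative only."""
    block_user_voice: bool = False     # [0289]: do not send the user's own voice
    block_far_end_voice: bool = False  # [0292]: do not play the far-end user's voice


def route_uplink(first_tx: bytes, third_tx: Optional[bytes], settings: CallSettings) -> list[bytes]:
    """Signals the first processor hands to the communication circuit.

    first_tx: the first transmitted signal (the user's own voice).
    third_tx: the third transmitted signal (translated first content), or None.
    """
    out: list[bytes] = []
    if not settings.block_user_voice:
        out.append(first_tx)
    if third_tx is not None:
        out.append(third_tx)
    return out


def route_downlink(second_rx: bytes, fourth_rx: Optional[bytes], settings: CallSettings) -> list[bytes]:
    """Signals whose sound the first processor outputs through the speaker.

    second_rx: the second received signal (tuned far-end voice).
    fourth_rx: the fourth received signal (translated second content), or None.
    """
    out: list[bytes] = []
    if not settings.block_far_end_voice:
        out.append(second_rx)
    if fourth_rx is not None:
        out.append(fourth_rx)
    return out


# Usage: the user hides their own voice and lets only the translation through.
settings = CallSettings(block_user_voice=True)
print(route_uplink(b"own-voice", b"translated", settings))
print(route_downlink(b"far-voice", b"translated", settings))
```

When a blocking setting is active and no translated signal is available, the corresponding function returns an empty list, so nothing is sent or played in that direction; the source does not say how an actual device handles that case.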
[0293] According to one embodiment, when the second processor recognizes that a call event has occurred, it may be configured to transmit, to the first processor, channel setting information for mapping each of a plurality of transmitted signals and received signals to its own independent channel.
[0294] According to one embodiment, the first processor may be configured to map each of the first received signal, the second received signal, the third received signal, the first transmitted signal, and the second transmitted signal to a respective independent channel based on the channel setting information.
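Mapping each signal to its own channel from channel setting information ([0293] and [0294]) can be illustrated with the common interleaved multichannel PCM layout, in which sample n of channel c is stored at index n × N + c for N channels. This is a sketch under assumptions: the signal keys, the channel indices in `CHANNEL_SETTING`, and the use of interleaved integer PCM are illustrative only; the source does not describe the transport format between the two processors.

```python
# Hypothetical channel setting information, as the second processor might send it
# when it recognizes a call event. Keys and indices are illustrative assumptions.
CHANNEL_SETTING: dict[str, int] = {
    "rx1": 0,  # first received signal (untuned, for call recording)
    "tx1": 1,  # first transmitted signal (untuned, for call recording)
    "rx2": 2,  # second received signal (tuned with the second setting value)
    "rx3": 3,  # third received signal (tuned with the third setting value)
    "tx2": 4,  # second transmitted signal (tuned with the first setting value)
}


def interleave(frames: dict[str, list[int]], setting: dict[str, int]) -> list[int]:
    """Pack one frame of every mapped signal into a single interleaved buffer.

    Sample n of channel c is written to index n * num_channels + c, so no two
    signals share a slot and the receiver can separate them again.
    """
    num_channels = len(setting)
    if sorted(setting.values()) != list(range(num_channels)):
        raise ValueError("each signal must map to its own channel 0..N-1")
    lengths = {len(frames[name]) for name in setting}
    if len(lengths) != 1:
        raise ValueError("all frames must contain the same number of samples")
    (n_samples,) = lengths
    buf = [0] * (n_samples * num_channels)
    for name, ch in setting.items():
        for n, sample in enumerate(frames[name]):
            buf[n * num_channels + ch] = sample
    return buf


def deinterleave(buf: list[int], setting: dict[str, int]) -> dict[str, list[int]]:
    """Recover each signal from the interleaved buffer using the same setting."""
    num_channels = len(setting)
    return {name: buf[ch::num_channels] for name, ch in setting.items()}


frames = {"rx1": [1, 2], "tx1": [3, 4], "rx2": [5, 6], "rx3": [7, 8], "tx2": [9, 10]}
packed = interleave(frames, CHANNEL_SETTING)
```

Because both processors use the same setting, the second processor can route each deinterleaved signal to the recorder or the recognizer without any signal being mixed with another.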
[0295] According to one embodiment, the tuning of the first transmitted signal and the tuning of the first received signal may include at least one of echo cancellation or noise suppression.
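[0295] states only that tuning may include echo cancellation or noise suppression, and the embodiments tune one received signal with two different setting values. The sketch below uses a simple frame-based noise gate as a stand-in for a real noise suppressor to show how one input yields two differently tuned outputs. The `SuppressionSetting` fields, the numeric values, the 160-sample frame (10 ms at an assumed 16 kHz sample rate), and the assumption that the third setting is the more aggressive one intended for recognition are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionSetting:
    """Hypothetical tuning parameters; the source does not disclose real values."""
    threshold: float   # frames with RMS below this are treated as noise
    floor_gain: float  # gain applied to noise frames (0.0 mutes them)


def noise_gate(samples: list[float], setting: SuppressionSetting, frame: int = 160) -> list[float]:
    """Very simple frame-based noise suppression (a noise gate).

    Frames whose RMS falls below setting.threshold are scaled by
    setting.floor_gain; louder frames pass through unchanged.
    """
    out: list[float] = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        gain = 1.0 if rms >= setting.threshold else setting.floor_gain
        out.extend(x * gain for x in chunk)
    return out


# One first received signal, two tunings (values are made up for illustration):
SECOND_SETTING = SuppressionSetting(threshold=0.01, floor_gain=0.5)  # gentle, e.g. for listening
THIRD_SETTING = SuppressionSetting(threshold=0.05, floor_gain=0.0)   # aggressive, e.g. for recognition

first_rx = [0.001] * 160 + [0.2] * 160  # one quiet "noise" frame, then one loud "speech" frame
second_rx = noise_gate(first_rx, SECOND_SETTING)
third_rx = noise_gate(first_rx, THIRD_SETTING)
```

A production suppressor would typically work in the frequency domain and estimate the noise floor adaptively; the gate is used here only because it makes the effect of the two setting values easy to see.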
[0296] According to one embodiment of the present disclosure, a voice recognition method of an electronic device including a first processor and a second processor may include an operation in which, during a call with an external device, the first processor acquires a first received signal received through a communication circuit of the electronic device and a first transmitted signal corresponding to a user utterance input through a microphone of the electronic device.
[0297] The method may include an operation in which the first processor generates a second transmitted signal by tuning the first transmitted signal based on a first setting value.
[0298] The method may include an operation in which the first processor generates a second received signal and a third received signal by tuning the first received signal based on a second setting value and a third setting value, respectively.
[0299] The method may include an operation in which the first processor transmits the second transmitted signal and at least one of the second received signal or the third received signal to the second processor, each through an independent channel.
[0300] The method may include an operation in which the second processor performs speech recognition on each of the second transmitted signal and at least one of the second received signal or the third received signal.
[0301] According to one embodiment, the method may include an operation in which the first processor transmits the first received signal and the first transmitted signal to the second processor, each through an independent channel.
[0302] According to one embodiment, the method may include an operation in which the second processor generates and stores call recording data based on the first received signal and the first transmitted signal.
[0303] According to one embodiment, the method may include an operation in which the second processor stores the content obtained by performing voice recognition on the second transmitted signal and on at least one of the second received signal or the third received signal in association with the call recording data.
[0304] According to one embodiment, the method may include an operation in which the second processor translates first content, obtained through voice recognition of the second transmitted signal, into a language associated with the user of the external device.
[0305] According to one embodiment, the method may include an operation in which the second processor transmits a third transmitted signal corresponding to the translated first content to the first processor.
[0306] According to one embodiment, the method may include an operation in which the first processor transmits at least one of the first transmitted signal or the third transmitted signal to the external device through the communication circuit.
[0307] According to one embodiment, the operation of transmitting to the external device may include, when a setting for blocking transmission of the user's voice is activated, an operation in which the first processor withholds the first transmitted signal from the communication circuit and transmits the third transmitted signal to the external device through the communication circuit.
[0308] According to one embodiment, the method may include an operation in which the second processor translates second content, obtained through voice recognition of at least one of the second received signal or the third received signal, into a language associated with the user of the electronic device.
[0309] According to one embodiment, the method may include an operation in which the second processor transmits a fourth received signal corresponding to the translated second content to the first processor.
[0310] According to one embodiment, the method may include an operation in which the first processor outputs a sound corresponding to at least one of the second received signal or the fourth received signal through the speaker.
[0311] According to one embodiment, the operation of outputting the sound may include, when a setting for blocking output of the voice of the user of the external device is activated, an operation in which the first processor withholds the second received signal from the speaker and outputs a sound corresponding to the fourth received signal through the speaker.
[0312] According to one embodiment, the method may include an operation in which, when the second processor recognizes that a call event has occurred, the second processor transmits, to the first processor, channel setting information for mapping each of a plurality of transmitted signals and received signals to its own independent channel.
[0313] According to one embodiment, the method may include an operation in which the first processor maps each of the first received signal, the second received signal, the third received signal, the first transmitted signal, and the second transmitted signal to a respective independent channel based on the channel setting information.
[0314] According to one embodiment, the tuning of the first transmitted signal and the tuning of the first received signal may include at least one of echo cancellation or noise suppression.
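For the echo-cancellation part of the tuning in [0314], a standard textbook technique is the normalized least-mean-squares (NLMS) adaptive filter, which estimates the echo path from the far-end signal played through the speaker and subtracts the estimated echo from the microphone signal. The sketch below is a generic NLMS canceller, not the tuning disclosed by the source; the function name, tap count, step size `mu`, and the simulated echo path (3-sample delay, gain 0.6) are assumptions for illustration.

```python
import random


def nlms_echo_cancel(mic: list[float], far_end: list[float],
                     taps: int = 32, mu: float = 0.5, eps: float = 1e-6) -> list[float]:
    """Generic NLMS acoustic echo canceller (illustrative, not the disclosed tuning).

    mic:     microphone samples = near-end speech + echo of far_end
    far_end: samples played through the speaker (the echo reference)
    Returns e[n] = mic[n] - w . x[n], the microphone signal minus the estimated echo.
    """
    w = [0.0] * taps  # adaptive estimate of the echo path's impulse response
    x = [0.0] * taps  # most recent far-end samples, x[0] is the newest
    out: list[float] = []
    for d, ref in zip(mic, far_end):
        x = [ref] + x[:-1]                                  # shift in the newest reference sample
        y = sum(wi * xi for wi, xi in zip(w, x))            # estimated echo
        e = d - y                                           # echo-cancelled output
        step = mu * e / (eps + sum(xi * xi for xi in x))    # normalize by reference power
        w = [wi + step * xi for wi, xi in zip(w, x)]        # NLMS weight update
        out.append(e)
    return out


# Simulated call with no near-end speech: the microphone hears only the far end's echo.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [0.0] * 3 + [0.6 * s for s in far[:-3]]  # assumed echo path: 3-sample delay, gain 0.6
cleaned = nlms_echo_cancel(echo, far)
```

In this noise-free simulation the residual after convergence is close to zero; with real near-end speech, practical cancellers usually add double-talk detection so that adaptation pauses while the user is talking.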
[0315] An electronic device according to one embodiment of the present disclosure may include a microphone, a speaker, a communication circuit, a first processor, a second processor, and a memory. The first processor may include at least one call recording module configured to transmit call-related signals to the second processor during a call with an external device through the communication circuit. The at least one call recording module may be configured to acquire, during the call, a first received signal received through the communication circuit and a first transmitted signal corresponding to a user utterance input through the microphone. The at least one call recording module may be configured to generate a second transmitted signal by tuning the first transmitted signal based on a first setting value. The at least one call recording module may be configured to generate a second received signal and a third received signal by tuning the first received signal based on a second setting value and a third setting value, respectively. The at least one call recording module may be configured to transmit the first transmitted signal, the second transmitted signal, the first received signal, and at least one of the second received signal or the third received signal to the second processor, each through an independent channel. The second processor may include a recording module configured to generate and store call recording data based on the first received signal and the first transmitted signal received from the first processor. The second processor may include at least one voice recognition module configured to perform voice recognition on each of the second transmitted signal and at least one of the second received signal or the third received signal received from the first processor.
[0316] According to one embodiment, the second processor may include at least one translation module configured to translate at least part of the content obtained by performing speech recognition on the second transmitted signal and on at least one of the second received signal or the third received signal, and to transmit the translation to the first processor.
[0317] According to various embodiments of the present disclosure, when recording and / or translating received and / or transmitted signals related to a call, tuned signals suited to voice recognition are generated and delivered over a plurality of channels for voice recognition, which can improve voice recognition performance and efficiency as well as the user's call quality.
[0319] FIG. 9 is a block diagram of an electronic device (901) in a network environment (900) according to various embodiments. Referring to FIG. 9, in the network environment (900), the electronic device (901) may communicate with an electronic device (902) through a first network (998) (e.g., a short-range wireless communication network) or may communicate with at least one of an electronic device (904) or a server (908) through a second network (999) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (901) may communicate with the electronic device (904) through a server (908). According to one embodiment, the electronic device (901) may include a processor (920), memory (930), input module (950), sound output module (955), display module (960), audio module (970), sensor module (976), interface (977), connection terminal (978), haptic module (979), camera module (980), power management module (988), battery (989), communication module (990), subscriber identification module (996), or antenna module (997). In some embodiments, at least one of these components (e.g., connection terminal (978)) may be omitted from the electronic device (901), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (976), camera module (980), or antenna module (997)) may be integrated into a single component (e.g., display module (960)).
[0320] The processor (920) may, for example, execute software (e.g., a program (940)) to control at least one other component (e.g., a hardware or software component) of the electronic device (901) connected to the processor (920), and may perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (920) can store commands or data received from other components (e.g., a sensor module (976) or a communication module (990)) in volatile memory (932), process the commands or data stored in volatile memory (932), and store the resulting data in non-volatile memory (934). According to one embodiment, the processor (920) may include a main processor (921) (e.g., a central processing unit or an application processor) or an auxiliary processor (923) (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of, or together with, the main processor. For example, if the electronic device (901) includes a main processor (921) and an auxiliary processor (923), the auxiliary processor (923) may be configured to consume less power than the main processor (921) or to be specialized for a designated function. The auxiliary processor (923) may be implemented separately from the main processor (921) or as part thereof.
[0321] The auxiliary processor (923) may control at least some of the functions or states associated with at least one component of the electronic device (901) (e.g., display module (960), sensor module (976), or communication module (990)) on behalf of the main processor (921) while the main processor (921) is in an inactive (e.g., sleep) state, or together with the main processor (921) while the main processor (921) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (923) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (980) or communication module (990)). According to one embodiment, the auxiliary processor (923) (e.g., neural network processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (901) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (908)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. In addition to the hardware structure, the artificial intelligence model may additionally or alternatively include a software structure.
[0322] The memory (930) can store various data used by at least one component of the electronic device (901) (e.g., processor (920) or sensor module (976)). The data may include, for example, software (e.g., program (940)) and input or output data for related commands. The memory (930) may include volatile memory (932) or non-volatile memory (934).
[0323] The program (940) may be stored as software in memory (930) and may include, for example, an operating system (942), middleware (944), or an application (946).
[0324] The input module (950) can receive commands or data to be used for a component of the electronic device (901) (e.g., processor (920)) from outside the electronic device (901) (e.g., user). The input module (950) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).
[0325] The sound output module (955) can output a sound signal to the outside of the electronic device (901). The sound output module (955) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.
[0326] The display module (960) can visually provide information to the outside (e.g., a user) of the electronic device (901). The display module (960) may include, for example, a display, a holographic device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (960) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.
[0327] The audio module (970) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (970) can acquire sound through the input module (950) or output sound through the sound output module (955) or an external electronic device (e.g., electronic device (902)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (901).
[0328] The sensor module (976) can detect the operating state of the electronic device (901) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (976) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.
[0329] The interface (977) may support one or more specified protocols that can be used for the electronic device (901) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (902)). According to one embodiment, the interface (977) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
[0330] The connection terminal (978) may include a connector through which the electronic device (901) can be physically connected to an external electronic device (e.g., electronic device (902)). According to one embodiment, the connection terminal (978) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
[0331] The haptic module (979) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that can be perceived by the user through tactile or kinesthetic senses. According to one embodiment, the haptic module (979) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.
[0332] The camera module (980) can capture still images and video. According to one embodiment, the camera module (980) may include one or more lenses, image sensors, image signal processors, or flashes.
[0333] The power management module (988) can manage power supplied to the electronic device (901). According to one embodiment, the power management module (988) may be implemented, for example, as at least part of a power management integrated circuit (PMIC).
[0334] The battery (989) can supply power to at least one component of the electronic device (901). According to one embodiment, the battery (989) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.
[0335] The communication module (990) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (901) and an external electronic device (e.g., electronic device (902), electronic device (904), or server (908)), and the performance of communication through the established communication channel. The communication module (990) may include one or more communication processors that operate independently of the processor (920) (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (990) may include a wireless communication module (992) (e.g., a cellular communication module, a short-range wireless communication module, or a GNSS (global navigation satellite system) communication module) or a wired communication module (994) (e.g., a LAN (local area network) communication module or a power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (904) through a first network (998) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (999) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (992) can identify or authenticate the electronic device (901) within a communication network such as the first network (998) or the second network (999) using subscriber information (e.g., International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module (996).
[0336] The wireless communication module (992) can support 5G networks and next-generation communication technologies following 4G networks, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (992) can support a high-frequency band (e.g., mmWave band), for example, to achieve a high data transmission rate. The wireless communication module (992) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large-scale antenna. The wireless communication module (992) can support various requirements specified in the electronic device (901), an external electronic device (e.g., electronic device (904)), or a network system (e.g., second network (999)). According to one embodiment, the wireless communication module (992) may support a peak data rate (e.g., 20 Gbps or more) for eMBB realization, loss coverage (e.g., 164 dB or less) for mMTC realization, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for URLLC realization.
[0337] An antenna module (997) can transmit a signal or power to or from an external source (e.g., an external electronic device). According to one embodiment, the antenna module (997) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (997) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as a first network (998) or a second network (999), may be selected from the plurality of antennas, for example, by a communication module (990). A signal or power may be transmitted or received between the communication module (990) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (997).
[0338] According to various embodiments, the antenna module (997) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.
[0339] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.
[0340] According to one embodiment, commands or data may be transmitted or received between the electronic device (901) and the external electronic device (904) through the server (908) connected to the second network (999). Each of the external electronic devices (902 or 904) may be the same type of device as the electronic device (901) or a different type. According to one embodiment, all or part of the operations performed on the electronic device (901) may be performed on one or more of the external electronic devices (902, 904, or 908). For example, if the electronic device (901) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (901) may, instead of or in addition to executing the function or service itself, request one or more external electronic devices to perform at least part of the function or service. One or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (901). The electronic device (901) may provide the result, as is or after additional processing, as at least part of a response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (901) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (904) may include an Internet of Things (IoT) device. The server (908) may be an intelligent server using machine learning and / or neural networks.
According to one embodiment, the external electronic device (904) or the server (908) may be included in the second network (999). The electronic device (901) can be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
[0342] FIG. 10 is a block diagram (1000) of an electronic device (1001) for supporting at least one network communication according to one embodiment. The electronic device (1001) may include, for example, a portable device (e.g., a smartphone, a tablet, a portable multimedia device, a portable medical device, a camera, or a wearable device), a computer device, or a home appliance. The electronic device (1001) according to the embodiments of this document is not limited to the aforementioned devices.
[0343] Referring to FIG. 10, the electronic device (1001) may include a processor (1020), a communication processor (1060), a first RFIC (radio frequency integrated circuit) (1022), a second RFIC (1024), a third RFIC (1026), an IFIC (intermediate frequency integrated circuit) (1028), a first RFFE (radio frequency front end) (1032), a second RFFE (1034), a third RFFE (1036), a phase converter (1038), a plurality of antennas (1048), a first antenna (1042), a second antenna (1044), and / or an antenna module (1046).
[0344] The antenna module (1046) may include a third RFIC (1026) and / or a plurality of antennas (1048). The third RFIC (1026) may include a third RFFE (1036) including a phase converter (1038), but is not limited thereto. The third RFIC (1026), the third RFFE (1036), the phase converter (1038), and / or the plurality of antennas (1048) may be implemented to be included in or mounted on a separate module or substrate.
[0345] The processor (1020) may be implemented, for example, as an application processor and may execute an application stored in the electronic device (1001). The processor (1020) and the communication processor (1060) may transmit and / or receive data through an interface. The processor (1020) may provide at least some of the data generated by the application to the communication processor (1060). The communication processor (1060) may provide at least some of the data received from a network (e.g., a first cellular network (1092) and / or a second cellular network (1094)) to the processor (1020), and the processor (1020) may use the received data for the execution of the application.
[0346] Depending on the implementation, the communication processor (1060) may be implemented as a single chip (or single package) together with the processor (1020); this is exemplary, and it may also be implemented as independent hardware. If the communication processor (1060) is implemented as a chip (or package), the chip (or package) may include a storage device (e.g., memory) storing protocol information for communication with legacy networks (e.g., LTE (long term evolution) protocol information), protocol information for communication with 5G networks (e.g., NR (new radio) protocol information), and / or protocol information for communication with networks beyond 5G (e.g., 6G networks). For example, the communication processor (1060) may use multiple protocol stacks to perform MR-DC (multiple radio access technology-dual connectivity) or dual SIM (subscriber identification module) services. The communication processor (1060) may support the establishment of a communication channel in a band to be used for wireless communication with the first cellular network (1092), and legacy network communication through the established communication channel. According to various embodiments, the first cellular network (1092) may be a legacy network including a second-generation (2G), 3G, 4G, and / or LTE network. The communication processor (1060) may support the establishment of a communication channel corresponding to a designated band among the bands to be used for wireless communication with the second cellular network (1094) (e.g., FR (frequency range) 1 (e.g., 410 MHz (megahertz) to 7.125 GHz (gigahertz), but not limited thereto) (or sub-6) and / or FR2 (e.g., 24.25 GHz to 71 GHz, but not limited thereto) (or above-6)), and 5G network communication through the established communication channel. The second cellular network (1094) may be, for example, a 5G network, but is not limited thereto.
For example, the second cellular network (1094) may be a communication network beyond 5G (e.g., a 6G network). The communication processor (1060) may operate based on information (or instructions) associated with an LTE protocol stack (e.g., E-UTRA (evolved UMTS (universal mobile telecommunication system) terrestrial radio access network), EPC (evolved packet core), and / or EPS (evolved packet system), but not limited thereto) stored in built-in memory (or accessible memory). The communication processor (1060) may operate based on information (or instructions) associated with a 5G protocol stack (e.g., NR, 5GC (5th generation core), and / or 5GS (5th generation system), but not limited thereto) stored in built-in memory (or accessible memory). The communication processor (1060) may generate a baseband signal based on data from, for example, the processor (1020). The communication processor (1060) may provide, to the processor (1020), data processed from a baseband signal received from the first RFIC (1022), the second RFIC (1024), and / or the IFIC (1028). The electronic device (1001) may also perform communication based on dual connectivity. For example, the electronic device (1001) may communicate over an LTE network and a 5G network using heterogeneous radio access technologies (RATs), for example, EN-DC (E-UTRA NR dual connectivity). As another example, the electronic device (1001) (e.g., the communication processor (1060)) may perform dual connectivity communication based on FR1 and FR2 of NR, which is a single RAT. There is no limitation on the type of dual connectivity, and depending on the type of dual connectivity, at least some of the components described above may be omitted from the electronic device (1001).
[0347] During transmission, the first RFIC (1022) can convert a baseband signal generated by the communication processor (1060) into a radio frequency (RF) signal of a first frequency band (e.g., about 700 MHz to about 3 GHz). During reception, the first RFFE (1032) can receive an RF signal of the first frequency band through a first antenna (1042), preprocess the received RF signal, and provide it to the first RFIC (1022). The first RFIC (1022) can convert the preprocessed RF signal into a baseband signal so that it can be processed by the communication processor (1060).
[0348] During transmission, the second RFIC (1024) can convert a baseband signal generated by the communication processor (1060) into an RF signal of a second frequency band (e.g., FR1 (e.g., a frequency band of about 7.125 GHz or less)). During reception, the second RFFE (1034) can receive the RF signal of the second frequency band through an antenna (e.g., the second antenna (1044)), preprocess the received RF signal, and provide it to the second RFIC (1024). The second RFIC (1024) can convert the preprocessed RF signal of the second frequency band into a baseband signal so that it can be processed by the communication processor (1060). The first RFIC (1022) and the second RFIC (1024) may be implemented as two or more independent chips (or multiple packages) or as a single chip (or a single package). The first RFIC (1022), the second RFIC (1024), the first RFFE (1032), and/or the second RFFE (1034) may include a plurality of power amplifiers (PAs), low noise amplifiers (LNAs), filters, multiplexers, and/or switches to support a plurality of RF paths (e.g., RF transmission paths and/or RF reception paths).
[0349] During transmission, the third RFIC (1026) can convert a baseband signal generated by the communication processor (1060) into an RF signal of a third frequency band (e.g., FR2 (e.g., a frequency band of about 24.25 GHz or higher)). During reception, the third RFFE (1036) can receive the RF signal of the third frequency band through an antenna (e.g., a plurality of antennas (1048)) and preprocess the received RF signal. The third RFIC (1026) and/or the IFIC (1028) can convert the preprocessed RF signal of the third frequency band into a baseband signal so that it can be processed by the communication processor (1060). According to one embodiment, the third RFFE (1036) may be formed as part of the third RFIC (1026). In various embodiments, the third RFIC (1026) may transmit signals to and/or receive signals from the communication processor (1060). For example, when communication based on dual connectivity of FR1 and FR2 is performed, the second RFIC (1024) and the second RFFE (1034) may be used for processing signals in the FR1 band, and the third RFIC (1026) and the IFIC (1028) may be used for processing signals in the FR2 band, but the disclosure is not limited thereto.
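The division of labor among the RF chains in [0347] to [0349] can be summarized as a lookup from carrier frequency to receive chain. This is a hypothetical sketch: the document does not say how a path is selected, and the `legacy` flag, which assigns the first frequency band to legacy communication, is an assumption made here because the first band (about 700 MHz to about 3 GHz) overlaps FR1.

```python
CP = "communication processor 1060"


def select_rx_path(freq_ghz: float, legacy: bool) -> tuple[str, ...]:
    """Return the receive chain, antenna first, for a carrier frequency in GHz.

    Hypothetical rule: legacy carriers use the first chain; NR carriers use the
    second chain in FR1 and the IF-based third chain in FR2.
    """
    if legacy:
        if 0.7 <= freq_ghz <= 3.0:  # first frequency band, [0347]
            return ("first antenna 1042", "first RFFE 1032", "first RFIC 1022", CP)
    elif freq_ghz <= 7.125:  # second frequency band (FR1), [0348]
        return ("second antenna 1044", "second RFFE 1034", "second RFIC 1024", CP)
    elif freq_ghz >= 24.25:  # third frequency band (FR2), via the IFIC, [0349]
        return ("antennas 1048", "third RFFE 1036", "third RFIC 1026", "IFIC 1028", CP)
    raise ValueError(f"no receive path for {freq_ghz} GHz (legacy={legacy})")
```

Carriers between the FR1 and FR2 example limits have no chain in this sketch and raise an error.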
[0350] According to one embodiment, the electronic device (1001) may include the IFIC (1028) separately from, or as at least part of, the third RFIC (1026). For example, the IFIC (1028) may convert a baseband signal generated by the communication processor (1060) into an RF signal in an intermediate frequency band (e.g., about 9 GHz to about 11 GHz) (hereinafter referred to as an IF signal) and then transmit the IF signal to the third RFIC (1026). The third RFIC (1026) may convert the IF signal into an RF signal in the third frequency band. Upon reception, the third RFIC (1026) may receive the RF signal in the third frequency band through an antenna (e.g., a plurality of antennas (1048)), convert the received RF signal into an IF signal, and provide it to the IFIC (1028). The IFIC (1028) can convert the IF signal into a baseband signal so that the communication processor (1060) can process it. In various embodiments, the IFIC (1028) may transmit signals to and/or receive signals from the communication processor (1060). The RFICs (1022, 1024, 1026) and/or the IFIC (1028) may include at least one mixer (e.g., a mixer that performs frequency up-conversion and/or a mixer that performs frequency down-conversion).
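The two-step conversion through the IF band comes down to mixer frequency arithmetic. As a hedged illustration, assuming an ideal mixer whose wanted output on transmission is the sum frequency (the document does not state the mixer arrangement or the local-oscillator (LO) frequency), the LO frequency and the unwanted mixing product for one FR2 carrier can be computed as follows; the function names are hypothetical.

```python
IF_BAND_GHZ = (9.0, 11.0)     # example IF band from [0350]
FR2_BAND_GHZ = (24.25, 71.0)  # example FR2 band from [0346]


def lo_for_upconversion(f_rf_ghz: float, f_if_ghz: float) -> float:
    """LO frequency such that the sum product f_IF + f_LO lands on f_RF."""
    if not IF_BAND_GHZ[0] <= f_if_ghz <= IF_BAND_GHZ[1]:
        raise ValueError("IF outside the example 9 to 11 GHz band")
    if not FR2_BAND_GHZ[0] <= f_rf_ghz <= FR2_BAND_GHZ[1]:
        raise ValueError("RF outside the example FR2 band")
    return f_rf_ghz - f_if_ghz


def difference_product(f_lo_ghz: float, f_if_ghz: float) -> float:
    """Unwanted difference product |f_LO - f_IF| that filtering must suppress."""
    return abs(f_lo_ghz - f_if_ghz)


def if_after_downconversion(f_rf_ghz: float, f_lo_ghz: float) -> float:
    """On reception, mixing f_RF with the same LO recovers the IF as |f_RF - f_LO|."""
    return abs(f_rf_ghz - f_lo_ghz)
```

For example, a 28 GHz carrier with a 10 GHz IF would need an 18 GHz LO, leaving an 8 GHz difference product to be filtered; mixing the received 28 GHz signal with the same LO returns the 10 GHz IF.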
[0351] The plurality of antennas (1048) may be formed as an antenna array including a plurality of antenna elements that can be used for beamforming. In this case, the third RFIC (1026) may include, for example, a plurality of phase shifters (1038) corresponding to the plurality of antenna elements as part of the third RFFE (1036). During transmission, each of the plurality of phase shifters (1038) may shift the phase of the RF signal of the third frequency band to be transmitted to the outside of the electronic device (1001) (e.g., to a base station of a 5G network) through the antenna element corresponding to that phase shifter. During reception, the plurality of phase shifters (1038) may shift the phases of the RF signals of the third frequency band received from the outside through the corresponding antenna elements to the same or substantially the same phase. According to an embodiment, the electronic device (1001) may include a plurality of antenna modules (1046). The electronic device (1001) may select and use some of the plurality of antenna modules (1046), or may use all of them.
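For an antenna array, the phase each shifter applies can be computed from the array geometry. The sketch below assumes a uniform linear array with a hypothetical element spacing, carrier frequency, and steering angle, none of which the document specifies. It computes the progressive phase shift -2πnd·sin(θ)/λ for element n and checks that signals arriving from the steering direction add in phase, as described for reception.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def steering_phases_deg(n_elements: int, freq_hz: float, spacing_m: float,
                        steer_deg: float) -> list[float]:
    """Phase (degrees, wrapped to [0, 360)) applied by each element's phase shifter."""
    k = 2 * math.pi * freq_hz / SPEED_OF_LIGHT  # wavenumber 2*pi/lambda
    step = -k * spacing_m * math.sin(math.radians(steer_deg))
    return [math.degrees(n * step) % 360.0 for n in range(n_elements)]


def array_response(phases_deg: list[float], freq_hz: float, spacing_m: float,
                   arrival_deg: float) -> float:
    """Magnitude of the summed element signals for a plane wave from arrival_deg."""
    k = 2 * math.pi * freq_hz / SPEED_OF_LIGHT
    total = 0j
    for n, phase in enumerate(phases_deg):
        # Extra propagation phase at element n, then the shifter's compensation.
        path = k * n * spacing_m * math.sin(math.radians(arrival_deg))
        total += cmath.exp(1j * (path + math.radians(phase)))
    return abs(total)
```

With four elements at half-wavelength spacing steered to 30 degrees, the shifters apply 0, 270, 180, and 90 degrees; a wave from 30 degrees then sums to a magnitude of 4, while one from -30 degrees cancels.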
[0352]
[0353] FIG. 11 is a block diagram illustrating an ASR module (1100) according to one embodiment.
[0354] According to one embodiment, the ASR module (1100) may include a front end (1110), an end-point detector (EPD) (1120), a wake-up module (1130), and/or an automatic speech recognition model (1140). The ASR module (1100) may generate text data from a voice input. For example, the ASR module (1100) may acquire a voice input through an I/O interface and/or from an external device, process the acquired voice input, and output the generated text data. For example, the ASR module (1100) may be a software module implemented by a processor executing instructions. Hereinafter, operations performed by the ASR module (1100) and/or its components may be referred to as operations of the processor of the device implementing the ASR module (1100).
[0355] The front end (1110) can perform preprocessing operations on the voice input. For example, the front end (1110) can remove echo from the voice input using an echo cancellation module (e.g., an acoustic echo canceller (AEC) and/or a residual echo suppressor (RES)). For example, the front end (1110) can remove noise from the voice input using a noise removal module (e.g., a noise suppressor (NS)).
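Echo cancellation in a front end of this kind is commonly built on an adaptive filter. As one hedged example (the document does not specify the AEC algorithm), a normalized least-mean-squares (NLMS) canceller estimates the echo path from the far-end reference and subtracts the predicted echo from the microphone signal. The function name, `taps`, and `mu` are illustrative assumptions, and residual echo suppression and noise suppression are not shown.

```python
import numpy as np


def nlms_echo_cancel(far_end: np.ndarray, mic: np.ndarray, taps: int = 32,
                     mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Return mic minus an adaptively estimated echo of far_end (the residual)."""
    w = np.zeros(taps)       # current estimate of the echo path
    x_buf = np.zeros(taps)   # most recent far-end samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_estimate = w @ x_buf
        e = mic[n] - echo_estimate
        # NLMS update: step size normalized by the reference signal power.
        w += mu * e * x_buf / (eps + x_buf @ x_buf)
        residual[n] = e
    return residual
```

In a quick synthetic check where the microphone picks up only a filtered copy of a white-noise far-end signal, the residual energy falls far below the echo energy once the filter has converged.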
[0356] The EPD (1120) can detect the end point of the voice input. For example, based on the detection result, the EPD (1120) can identify the end point of the user utterance corresponding to the voice input.
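The end-point decision can be sketched with a simple frame-energy rule. This is a minimal illustration, not the EPD (1120) itself. It assumes per-frame energies are already available and that an utterance ends once a hypothetical `hangover` number of consecutive frames stays below a fixed energy threshold; trained end-point detectors typically use richer features.

```python
from typing import Optional


def detect_end_point(frame_energies: list[float], threshold: float,
                     hangover: int) -> Optional[int]:
    """Return the index of the last speech frame once `hangover` consecutive
    silent frames follow it, or None if the utterance has not ended yet."""
    speech_seen = False
    last_speech: Optional[int] = None
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            speech_seen = True
            last_speech = i
            silent_run = 0
        elif speech_seen:  # leading silence before speech is ignored
            silent_run += 1
            if silent_run >= hangover:
                return last_speech
    return None
```

A short pause shorter than `hangover` frames does not end the utterance, which is the usual reason for a hangover rather than a single-frame decision.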
[0357] The wake-up module (1130) can selectively transmit the voice input to the automatic speech recognition model (1140) based on whether the voice input contains a specified wake word (or wake-up signal). For example, the wake-up module (1130) can identify, through keyword spotting, whether the specified wake word is present in the voice input preprocessed by the front end (1110).
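The gating behavior of the wake-up module (1130) can be illustrated as follows. The sketch assumes a keyword spotter (not shown) that emits a per-frame wake-word score between 0 and 1. It smooths those scores with a short moving average and opens the path to the ASR model once the average reaches a threshold. The class name, `threshold`, and `window` are hypothetical, and whether the wake-word frames themselves are forwarded is a design choice the document leaves open (here they are not).

```python
from collections import deque
from typing import Optional


class WakeWordGate:
    """Passes frames on to ASR only after a smoothed wake-word score crosses a threshold."""

    def __init__(self, threshold: float = 0.8, window: int = 3) -> None:
        self.threshold = threshold
        self.recent: deque = deque(maxlen=window)  # most recent wake-word scores
        self.awake = False

    def push(self, frame: object, wake_score: float) -> Optional[object]:
        """Return the frame if the gate is open, otherwise None."""
        if self.awake:
            return frame
        self.recent.append(wake_score)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= self.threshold:
            self.awake = True  # open the gate for the frames that follow
        return None
```

Smoothing over several frames keeps a single spurious high score from waking the recognizer.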
[0358] The automatic speech recognition model (1140) may include any model for obtaining text data from a speech input. For example, the automatic speech recognition model (1140) may extract text data from the speech input and/or convert the speech input into text data based on a speech-to-text (STT) algorithm. For example, the STT algorithm may include a hidden Markov model, a Gaussian mixture model, a deep neural network model, an n-gram language model, other statistical models, and/or a combination thereof. The automatic speech recognition model (1140) may include one or more models, for example, models for recognizing different languages.
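Of the statistical models listed, the n-gram language model is the simplest to show. The following minimal sketch assumes the acoustic side has already produced candidate transcripts and uses an add-one-smoothed bigram model only to rank them. The function names and the toy training sentences are hypothetical, and a practical recognizer would combine such scores with acoustic scores.

```python
import math
from collections import Counter


def train_bigram(sentences: list[str]):
    """Count contexts and bigrams (with sentence markers) for an add-one bigram model."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts, len(vocab)


def log_prob(sentence: str, model) -> float:
    """Add-one (Laplace) smoothed log P(sentence) under the bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


def rescore(candidates: list[str], model) -> str:
    """Pick the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, model))
```

Usage: with a model trained on "call mom now", "call dad now", and "record the call", `rescore(["mom call now", "call mom now"], model)` prefers "call mom now".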
[0359] The structure of the ASR module (1100) illustrated in FIG. 11 is exemplary, and the embodiments of the present disclosure are not limited thereto. For example, the ASR module (1100) may be implemented as an E2E (end-to-end) model (e.g., CTC (connectionist temporal classification), RNN-T (recurrent neural network transducer), LAS (listen, attend, and spell), or hybrid CTC/LAS).
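For the CTC variant mentioned above, the simplest decoding rule is greedy (best-path) decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank symbol. The sketch below assumes per-frame label posteriors are given as lists and that index 0 is the blank; the function name is hypothetical, and beam search decoding, which is common in practice, is not shown.

```python
def ctc_greedy_decode(frame_posteriors: list[list[float]], labels: list[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding: per-frame argmax, collapse repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    out = []
    prev = None
    for idx in best:
        # A label is emitted only when it differs from the previous frame's label;
        # a blank between two identical labels therefore yields two outputs.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)
```

The blank is what lets CTC output doubled letters: frames labeled "a a _ a" decode to "aa", whereas "a a a" decodes to a single "a".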
[0360] The ASR module (1100) may also include other components. For example, the ASR module (1100) may further include a feature extraction module (not shown) and an encoder (not shown). The ASR module (1100) may extract features from a voice input (e.g., using the feature extraction module) to obtain a feature vector, and encode the feature vector using the encoder.
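A feature extraction step of the kind described above often frames the waveform and computes a log power spectrum per frame. The following is one possible sketch, assuming 16 kHz audio, 25 ms Hamming-windowed frames with a 10 ms hop, and a 512-point FFT; these are common choices rather than values given in this document, the function name is hypothetical, and the encoder is not sketched.

```python
import numpy as np


def log_spectrum_features(signal: np.ndarray, sample_rate: int = 16000,
                          win_ms: int = 25, hop_ms: int = 10,
                          n_fft: int = 512) -> np.ndarray:
    """Return a (frames x frequency bins) matrix of log power spectra."""
    win = int(sample_rate * win_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)   # 160 samples at 16 kHz
    if len(signal) < win:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0) on silent frames
```

One second of 16 kHz audio yields 98 frames of 257 frequency bins each; a mel filterbank is often applied on top of such spectra before the encoder.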
[0361] The electronic device according to the various embodiments disclosed in this document may be of various forms. The electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a consumer electronics device. The electronic device according to the embodiments of this document is not limited to the devices described above.
[0362] The various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of those embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the items unless the relevant context clearly indicates otherwise. In this document, phrases such as "A or B," "at least one of A and B," "at least one of A or B," "A, B, or C," "at least one of A, B, and C," and "at least one of A, B, or C" may each include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. Terms such as "first" and "second," or "1st" and "2nd," may be used simply to distinguish a component from other components and do not limit the components in any other aspect (e.g., importance or order). Where a (e.g., first) component is referred to as "coupled" or "connected" to another (e.g., second) component, with or without the term "functionally" or "communicatively," it means that the component may be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
[0363] The term “module” as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).
[0364] Various embodiments of this document may be implemented as software (e.g., the program (940)) including one or more instructions stored in a storage medium (e.g., the internal memory (936) or the external memory (938)) readable by a machine (e.g., the electronic device (901)). For example, a processor (e.g., the processor (920)) of the machine (e.g., the electronic device (901)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to operate to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" only means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium.
[0365] According to one embodiment, the method according to the various embodiments disclosed herein may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
[0366] According to various embodiments, each of the components described above (e.g., a module or a program) may include a single entity or multiple entities, and some of the multiple entities may be separately disposed in other components. According to various embodiments, one or more of the above-described components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or a similar manner as they were performed by the corresponding component among the multiple components before the integration. According to various embodiments, operations performed by a module, a program, or another component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.
Claims
1. An electronic device comprising: a microphone; a speaker; a communication circuit; a first processor; a second processor; and a memory, wherein the first processor is configured to: while communicating with an external device through the communication circuit, obtain a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through the microphone; generate a second transmitting signal by tuning the first transmitting signal based on a first setting value; generate a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; and transmit each of at least one of the second receiving signal or the third receiving signal, and the second transmitting signal, to the second processor through independent channels, and wherein the second processor is configured to perform voice recognition on each of at least one of the second receiving signal or the third receiving signal, and the second transmitting signal.
2. The electronic device of claim 1, wherein the first processor is configured to transmit each of the first receiving signal and the first transmitting signal to the second processor through independent channels, and wherein the second processor is configured to generate and store call recording data based on the first receiving signal and the first transmitting signal.
3. The electronic device of claim 2, wherein the second processor is configured to store content obtained as a result of performing voice recognition on each of the second receiving signal or the third receiving signal, and the second transmitting signal, in association with the call recording data.
4. The electronic device of claim 1, wherein, when a translation function is enabled during a call, the second processor is configured to: translate first content obtained through voice recognition of the second transmitting signal into a language associated with a user of the external device; and transmit a third transmitting signal corresponding to the translated first content to the first processor, and wherein the first processor is configured to transmit at least one of the first transmitting signal or the third transmitting signal to the external device through the communication circuit.
5. The electronic device of claim 4, wherein, when a setting for blocking transmission of the user's voice is activated, the first processor is configured to prevent the first transmitting signal from being transmitted to the communication circuit and to transmit the third transmitting signal to the external device through the communication circuit.
6. The electronic device of claim 1, wherein, when a translation function is enabled during a call, the second processor is configured to: translate second content obtained through voice recognition of at least one of the second receiving signal or the third receiving signal into a language associated with a user of the electronic device; and transmit a fourth receiving signal corresponding to the translated second content to the first processor, and wherein the first processor is configured to output, through the speaker, a sound corresponding to at least one of the second receiving signal or the fourth receiving signal.
7. The electronic device of claim 6, wherein, when a setting for blocking output of the voice of the user of the external device is activated, the first processor is configured to prevent the second receiving signal from being transmitted to the speaker and to output a sound corresponding to the fourth receiving signal through the speaker.
8. The electronic device of claim 1, wherein the second processor is configured to, upon recognizing the occurrence of a call event, transmit to the first processor channel setting information for mapping each of a plurality of transmitting signals and receiving signals to a respective independent channel, and wherein the first processor is configured to map each of the first receiving signal, the second receiving signal, the third receiving signal, the first transmitting signal, and the second transmitting signal to a respective independent channel based on the channel setting information.
9. The electronic device of claim 1, wherein the tuning of the first transmitting signal and the tuning of the first receiving signal include at least one of echo cancellation or noise suppression.
10. A voice recognition method of an electronic device including a first processor and a second processor, the method comprising: acquiring, by the first processor, while in communication with an external device, a first receiving signal received through a communication circuit of the electronic device and a first transmitting signal corresponding to a user utterance input through a microphone of the electronic device; generating, by the first processor, a second transmitting signal by tuning the first transmitting signal based on a first setting value; generating, by the first processor, a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; transmitting, by the first processor, each of the second receiving signal or the third receiving signal, and the second transmitting signal, to the second processor through independent channels; and performing, by the second processor, voice recognition on each of at least one of the second receiving signal or the third receiving signal, and the second transmitting signal.
11. The method of claim 10, further comprising: transmitting, by the first processor, each of the first receiving signal and the first transmitting signal to the second processor through independent channels; and generating and storing, by the second processor, call recording data based on the first receiving signal and the first transmitting signal.
12. The method of claim 10, further comprising: translating, by the second processor, first content obtained through voice recognition of the second transmitting signal into a language associated with a user of the external device; transmitting, by the second processor, a third transmitting signal corresponding to the translated first content to the first processor; and transmitting, by the first processor, at least one of the first transmitting signal or the third transmitting signal to the external device through the communication circuit.
13. The method of claim 10, further comprising: translating, by the second processor, second content obtained through voice recognition of at least one of the second receiving signal or the third receiving signal into a language associated with a user of the electronic device; transmitting, by the second processor, a fourth receiving signal corresponding to the translated second content to the first processor; and outputting, by the first processor, through a speaker of the electronic device, a sound corresponding to at least one of the second receiving signal or the fourth receiving signal.
14. The method of claim 10, further comprising: transmitting, by the second processor, upon recognizing the occurrence of a call event, channel setting information to the first processor for mapping each of a plurality of transmitting signals and receiving signals to a respective independent channel; and mapping, by the first processor, each of the first receiving signal, the second receiving signal, the third receiving signal, the first transmitting signal, and the second transmitting signal to a respective independent channel based on the channel setting information.
15. An electronic device comprising: a microphone; a speaker; a communication circuit; a first processor; a second processor; and a memory, wherein the first processor includes at least one call recording module configured to transmit signals related to a call to the second processor while in the call with an external device through the communication circuit, wherein the at least one call recording module is configured to: during the call, obtain a first receiving signal received through the communication circuit and a first transmitting signal corresponding to a user utterance input through the microphone; generate a second transmitting signal by tuning the first transmitting signal based on a first setting value; generate a second receiving signal and a third receiving signal by tuning the first receiving signal based on a second setting value and a third setting value, respectively; and transmit each of the first transmitting signal, the second transmitting signal, the first receiving signal, and at least one of the second receiving signal or the third receiving signal to the second processor through independent channels, and wherein the second processor includes: a recording module configured to generate and store call recording data based on the first receiving signal and the first transmitting signal received from the first processor; and at least one voice recognition module configured to perform voice recognition on each of the second receiving signal or the third receiving signal, and the second transmitting signal, received from the first processor.
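The signal flow recited in claims 1, 2, and 8 can be illustrated with a short sketch. The following Python is not the claimed implementation: the signal names (rx1, tx2, and so on), the channel numbers in `CHANNEL_SETTINGS`, and the callables standing in for tuning with the first to third setting values and for voice recognition are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable

Signal = list[float]
Tuner = Callable[[Signal], Signal]

# Hypothetical channel setting information that the second processor could send
# to the first processor when a call event occurs (claims 8 and 14).
CHANNEL_SETTINGS = {"rx1": 0, "rx2": 1, "rx3": 2, "tx1": 3, "tx2": 4}


@dataclass
class FirstProcessor:
    tune_tx: Tuner    # stands in for tuning with the first setting value
    tune_rx_a: Tuner  # stands in for tuning with the second setting value
    tune_rx_b: Tuner  # stands in for tuning with the third setting value
    channel_map: dict[str, int] = field(default_factory=dict)

    def apply_channel_settings(self, settings: dict[str, int]) -> None:
        # Every signal must get its own channel so that no two streams are mixed.
        if len(set(settings.values())) != len(settings):
            raise ValueError("channel numbers must be distinct")
        self.channel_map = dict(settings)

    def process(self, rx1: Signal, tx1: Signal) -> dict[int, Signal]:
        """Tune the call signals and return them keyed by their mapped channel."""
        signals = {
            "rx1": rx1,                  # untuned downlink, used for call recording
            "tx1": tx1,                  # untuned uplink, used for call recording
            "tx2": self.tune_tx(tx1),
            "rx2": self.tune_rx_a(rx1),
            "rx3": self.tune_rx_b(rx1),
        }
        return {self.channel_map[name]: sig for name, sig in signals.items()
                if name in self.channel_map}


def second_processor(channels: dict[int, Signal], settings: dict[str, int],
                     recognize: Callable[[Signal], str]) -> dict[str, object]:
    """Record the untuned pair and run recognition on each tuned stream."""
    by_name = {name: channels[ch] for name, ch in settings.items() if ch in channels}
    return {
        "recording": (by_name["rx1"], by_name["tx1"]),
        "transcripts": {name: recognize(by_name[name])
                        for name in ("tx2", "rx2", "rx3") if name in by_name},
    }
```

Keeping the untuned pair for recording separate from the tuned streams for recognition mirrors claim 2, where the recording is built from the first receiving and first transmitting signals.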