Echo processing methods, devices, storage media and chips
By calculating the correlation and leakage coefficient of speaker and microphone signals and using gain suppression error signals, the acoustic echo problem in smart devices is solved, improving the performance of audio and video calls and voice recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING XIAOMI MOBILE SOFTWARE CO LTD
- Filing Date
- 2023-01-31
- Publication Date
- 2026-06-30
AI Technical Summary
In smart devices, the way microphones and speakers work causes serious acoustic echo problems, affecting audio and video calls and voice recognition functions.
By determining the correlation between the speaker input signal and the microphone received signal, the leakage coefficient and gain are calculated, and the gain is used to suppress the error signal and reduce echo interference.
It effectively reduces the intensity of the echo signal collected by the microphone, avoids voice misrecognition and the phenomenon of remote devices playing the remote user's own voice, and improves the quality of audio and video calls and voice recognition.
Figure CN116110420B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of echo cancellation technology, and in particular to an echo processing method, apparatus, storage medium and chip.
Background Technology
[0002] In people's daily lives, there are smart devices such as mobile phones, smart TVs, and smart speakers. These smart devices are equipped with microphones for collecting sound and speakers for playing audio.
[0003] However, the way microphones and speakers work can cause serious acoustic problems for these smart devices. In acoustic echo scenarios, the sound signal played by the smart device's speaker is picked up again by the microphone on the smart device after being affected by the room's impulse response, thus generating an echo signal.
Summary of the Invention
[0004] To overcome the problems existing in related technologies, this disclosure provides an echo processing method, apparatus, storage medium, and chip.
[0005] According to a first aspect of the present disclosure, an echo processing method is provided, comprising:
[0006] If the first correlation is greater than the preset correlation, the leakage coefficient corresponding to the far-field signal is determined; wherein, the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal collected by the audio acquisition device, and the far-field signal is the signal output by the audio playback device;
[0007] The gain is obtained based on the leakage coefficient;
[0008] The error signal is suppressed by the gain to obtain the actual signal acquired by the audio acquisition device. The error signal is the residual echo signal after echo cancellation of the microphone received signal.
[0009] Optionally, the first correlation is determined by the following steps:
[0010] Based on the first frequency domain signal corresponding to the loudspeaker input signal, the first power spectrum of the first frequency domain signal is obtained;
[0011] The second power spectrum of the second frequency domain signal is obtained based on the second frequency domain signal corresponding to the microphone received signal;
[0012] Based on the first frequency domain signal, the second frequency domain signal, and the first smoothing factor, the cross spectrum between the first frequency domain signal and the second frequency domain signal is obtained;
[0013] The first correlation degree is obtained based on the first power spectrum, the second power spectrum, and the cross spectrum.
[0014] Optionally, obtaining the first correlation degree based on the first power spectrum, the second power spectrum, and the cross spectrum includes:
[0015] The second correlation degree is obtained based on the first power spectrum, the second power spectrum, and the cross spectrum;
[0016] If the second correlation is greater than the first preset value, the first preset value is used as the first correlation.
[0017] If the second correlation is less than the first preset value, the second correlation is used as the first correlation.
[0018] Optionally, determining the leakage coefficient corresponding to the far-field signal includes:
[0019] Based on the third frequency domain signal corresponding to the error signal, the third power spectrum corresponding to the third frequency domain signal is obtained;
[0020] The leakage coefficient is obtained based on the second power spectrum, the third power spectrum, and the second smoothing factor.
[0021] Optionally, the second smoothing factor is determined by the following steps:
[0022] Based on the second power spectrum and the third power spectrum, a third smoothing factor is obtained;
[0023] If the third smoothing factor is less than the second preset value, the second smoothing factor is obtained based on the third smoothing factor and the leakage estimation smoothing factor;
[0024] If the third smoothing factor is greater than the second preset value, the second smoothing factor is obtained based on the second preset value and the leakage estimation smoothing factor.
[0025] Optionally, obtaining the gain based on the leakage coefficient includes:
[0026] If the leakage coefficient is less than a third preset value, the gain is obtained based on a fourth preset value and the leakage coefficient.
[0027] If the leakage coefficient is greater than the third preset value, the gain is obtained based on the fourth preset value and the third preset value.
[0028] Optionally, the error signal is determined through the following steps:
[0029] Estimating the room impulse response using a filter;
[0030] The room impulse response is applied to the speaker input signal to generate an estimated signal;
[0031] The estimated signal is used to perform echo cancellation on the microphone received signal to obtain the error signal.
[0032] According to a second aspect of the present disclosure, an echo processing apparatus is provided, comprising:
[0033] The leakage coefficient determination module is configured to determine the leakage coefficient corresponding to the far-field signal if the first correlation is greater than the preset correlation; wherein, the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal acquired by the audio acquisition device, and the far-field signal is the signal output by the audio playback device.
[0034] A gain determination module is configured to obtain the gain based on the leakage coefficient;
[0035] The suppression module is configured to suppress the error signal using the gain to obtain the signal actually acquired by the audio acquisition device, wherein the error signal is the residual echo signal after echo cancellation of the microphone received signal.
[0036] According to a third aspect of the present disclosure, a computer-readable storage medium is provided that stores computer program instructions thereon, which, when executed by a processor, implement the steps of the echo processing method provided in the first aspect of the present disclosure.
[0037] According to a fourth aspect of the present disclosure, a chip is provided, including a processor and an interface; the processor is configured to read instructions to execute the steps of the echo processing method provided in the first aspect of the present disclosure.
[0038] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects:
[0039] If the first correlation between the speaker input signal and the microphone received signal is greater than a preset correlation, it is confirmed that there is no near-field signal in the microphone received signal, meaning the user using the near-end device has not emitted any speech. To further reduce interference from the error signal obtained after echo cancellation, gain suppression can be applied to further reduce the echo intensity of the error signal, resulting in a smaller error signal acquired by the microphone of the near-end device, thus further reducing echo interference. In this way, the near-end device will not misrecognize speech based on the error signal with lower echo intensity, nor will the far-end device play the far-end user's own voice.
[0040] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Brief Description of the Drawings
[0041] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0042] Figure 1 is a flowchart illustrating an echo processing method according to an exemplary embodiment.
[0043] Figure 2 is a signal flow diagram illustrating an echo processing method according to an exemplary embodiment.
[0044] Figure 3 is a block diagram illustrating an echo processing apparatus according to an exemplary embodiment.
[0045] Figure 4 is a block diagram illustrating an echo processing apparatus according to an exemplary embodiment.
[0046] Figure 5 is a block diagram illustrating an echo processing apparatus according to an exemplary embodiment.
Detailed Description
[0047] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0048] It should be noted that all actions involving the acquisition of signals, information, or data in this application are carried out in compliance with the relevant data protection laws and policies of the country where the application is located, and with the authorization granted by the owner of the relevant device.
[0049] The way microphones and speakers work causes serious acoustic problems for smart devices. Depending on the scenario in which the echo signal is generated, it can be divided into line echo and acoustic echo. The main reason for the generation of acoustic echo is that the sound signal played by the speaker of the smart device is received by the microphone of the smart device again after passing through the room impulse response. This causes the near-end speech signal (near-talk signal) to be severely interfered with by the echo signal, affecting normal audio and video calls and the voice recognition function of the smart device.
[0050] For example, in the scenario of applying to smart TVs, if the smart TV is playing audio or video and the user wants to use voice control to change the program, the smart TV's microphone will simultaneously pick up the user's voice and the sound output by the smart TV, thus affecting the smart TV's voice recognition function.
[0051] In a smartphone scenario, if phone A and phone B are in an audio call, and phone B's microphone picks up the sound from phone B's speaker and transmits it to phone A, then the user of phone A will hear their own earlier speech played back, which interferes with the normal audio call.
[0052] Based on this, related technologies propose using a filter to estimate the room impulse response, thereby generating an estimated signal. This estimated signal is used to cancel the interference signal after being affected by the room impulse response, thus reducing the interference signal collected by the microphone and reducing the impact of the interference signal on the near-field signal. However, after canceling the interference signal with the estimated signal, there is still some interference signal. This part of the interference signal will still interfere with the near-field signal. This part of the interference signal is referred to as the error signal below.
[0053] To further prevent the interference signal that remains after cancellation from interfering with the near-field signal, this disclosure proposes an echo processing method. This method mainly involves post-processing after echo cancellation and can be applied to smart devices such as smartphones, tablets, computers, smart TVs, and smart speakers. As shown in Figure 1, the method includes the following steps:
[0054] In step S11, if the first correlation is greater than the preset correlation, the leakage coefficient corresponding to the far-field signal is determined; wherein, the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal acquired by the audio acquisition device, and the far-field signal is the signal output by the audio playback device.
[0055] The audio playback device can be a speaker, and the audio acquisition device can be a microphone.
[0056] As shown in Figure 2, x(n) is the speaker input signal; w(n) is the actual room impulse response; y(n) is the far-field signal output by the speaker after passing through the room impulse response; v(n) includes the near-field signal and the noise signal; and d(n) is the microphone received signal, a combination of the far-field signal, the near-field signal, and the noise signal that is input to the microphone. ŵ(n) is the room impulse response estimated by the filter; ŷ(n) is the estimated signal corresponding to y(n), generated by the filter based on the estimated room impulse response; and e(n) is the error signal obtained after echo cancellation is performed on d(n) using ŷ(n). The error signal after echo cancellation still contains echo interference.
[0057] The speaker input signal x(n) is the electrical signal output by the chip of the smart device itself, such as the audio of a TV program output by the chip of a smart TV. The speaker input signal can also be audio transmitted by the far-end device to the user of the near-end device, such as the far-end user's voice transmitted to the user of the near-end device during a call between the far-end device and the near-end device.
[0058] The actual room impulse response w(n) is a parameter that measures the delay and energy attenuation of the original audio due to sound attenuation and reflected noise when sound propagates in a closed or semi-open space.
[0059] The far-field signal y(n) is the sound of the far-end device output by the speaker of the near-end device, or the sound obtained by converting the electrical signal output by the chip of the near-end device itself. Examples are the voice of the far-end user output by the speaker of the near-end mobile phone, and the audio obtained by converting the electrical signal output by the chip of a smart TV.
[0060] The near-field signal is the sound emitted by the user of the near-end device, for example when speaking to a smart TV or into the near-end mobile phone. The speaker of the near-end device outputs the far-field signal, and the microphone of the near-end device picks up the near-field signal and sends it to the far-end device. After the speaker of the near-end device outputs the far-field signal, the far-field signal is affected by the room impulse response and picked up by the microphone of the near-end device. As a result, the microphone picks up not only the near-field signal but also the far-field signal, which then interferes with the near-field signal. This interference is called echo interference.
[0061] The microphone received signal d(n) is the signal received by the microphone, which includes far-field signal, near-field signal, and noise around the near-end device.
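The signal model described above can be sketched in a few lines. This is only an illustration, assuming the far-field signal is a finite convolution of the speaker input with the room impulse response; the function names `convolve` and `microphone_signal` and all sample values are hypothetical and do not come from the disclosure.

```python
def convolve(x, w):
    """Return y(n) = sum_k w[k] * x[n - k], truncated to len(x) samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            if n - k >= 0:
                acc += wk * x[n - k]
        y.append(acc)
    return y


def microphone_signal(x, w, v):
    """Return d(n) = y(n) + v(n), where y(n) is x(n) shaped by the room response w."""
    y = convolve(x, w)
    return [yn + vn for yn, vn in zip(y, v)]


# Hypothetical values: a short speaker signal, a two-tap room response,
# and no near-field speech or noise.
x = [1.0, 0.0, -1.0, 0.5]
w = [0.5, 0.25]
v = [0.0, 0.0, 0.0, 0.0]
d = microphone_signal(x, w, v)
```

With v(n) set to zero, d(n) contains only the far-field signal, which is the situation the correlation test in step S11 is meant to detect.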
[0062] The error signal e(n) is the signal obtained after echo cancellation is performed on d(n) using the estimated signal ŷ(n); for example, the error signal is obtained by subtracting ŷ(n) from d(n). The estimated signal is generated by applying the room impulse response estimated by the filter to the speaker input signal, and the error signal is what remains after echo cancellation is performed on the microphone received signal using this estimated signal.
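The cancellation step can be sketched as follows, assuming the filter's estimate of the room impulse response is already available as a list of taps. How the filter adapts is not covered by this sketch, and the names `estimate_echo`, `cancel_echo`, `w_hat` and `y_hat` are hypothetical.

```python
def estimate_echo(x, w_hat):
    """Apply the estimated room impulse response w_hat to the speaker input x."""
    return [
        sum(w_hat[k] * x[n - k] for k in range(len(w_hat)) if n - k >= 0)
        for n in range(len(x))
    ]


def cancel_echo(d, x, w_hat):
    """Return the error signal e(n) = d(n) - y_hat(n)."""
    y_hat = estimate_echo(x, w_hat)
    return [dn - yn for dn, yn in zip(d, y_hat)]


# Hypothetical example: the true response is [0.5, 0.25] and the estimate misses
# the second tap, so residual echo remains in the error signal.
x = [1.0, 0.0, -1.0, 0.5]
d = [0.5, 0.25, -0.5, 0.0]  # microphone signal without near-field speech
e = cancel_echo(d, x, [0.5, 0.0])
```

A perfect estimate would drive e(n) to zero here; the nonzero samples are the residual echo that the later gain suppression targets.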
[0063] The leakage coefficient reflects the leakage level of the far-field signal, that is, the level at which the far-field signal leaks to the microphone. The larger the leakage coefficient, the greater the leakage level of the far-field signal, and the more interference echoes remain in the error signal collected by the microphone.
[0064] If the first correlation between the speaker input signal and the microphone received signal is greater than the preset correlation, the speaker input signal x(n) is close to the microphone received signal d(n): the speaker input signal x(n) accounts for a large proportion of d(n), while the near-field signal and noise v(n) account for a small proportion. In this case, it can be determined that the user of the near-end device did not speak, and that the audio came from the chip of the near-end device or from the far-end device.
[0065] For example, if the speaker input signal accounts for a relatively large proportion of the microphone received signal, the user of the near-end mobile phone is not speaking while the user of the far-end mobile phone is; or the user of the smart TV is not issuing voice commands while the smart TV is outputting audio.
[0066] At this point, the leakage coefficient of the far-field signal can be determined, that is, the leakage coefficient of the far-field signal leaking to the microphone, thereby determining the degree of interference of the far-field signal to the near-field signal.
[0067] In step S12, the gain is obtained based on the leakage coefficient.
[0068] Once the leakage coefficient is determined, the gain can be obtained based on the leakage coefficient. This gain is used to suppress the error signal.
[0069] Wherein, if the leakage coefficient is less than a third preset value, the gain is obtained based on a fourth preset value and the leakage coefficient; if the leakage coefficient is greater than the third preset value, the gain is obtained based on the fourth preset value and the third preset value.
[0070] For example, the third preset value can be 1, and the fourth preset value can also be 1, as shown in the following formula:
[0071] gain = 1 - min(DE_ratio, 1) (1)
[0072] In formula (1), gain is the gain; DE_ratio is the leakage coefficient; min(DE_ratio, 1) means taking the minimum value between DE_ratio and 1.
[0073] As can be seen from formula (1), when the leakage coefficient is less than the third preset value 1, the leakage coefficient is subtracted from the fourth preset value 1 to obtain the gain; when the leakage coefficient is greater than the third preset value 1, the third preset value 1 is subtracted from the fourth preset value 1 to obtain a gain of 0.
[0074] Formula (1) limits the gain to no more than 1, so that multiplying the error signal by a gain less than 1 reduces the echo interference in the error signal.
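Formula (1) can be written as a small function. This is a sketch with the third and fourth preset values both defaulting to 1, as in the example; the function and parameter names are hypothetical.

```python
def suppression_gain(de_ratio, third_preset=1.0, fourth_preset=1.0):
    """Formula (1): gain = fourth_preset - min(de_ratio, third_preset)."""
    return fourth_preset - min(de_ratio, third_preset)


low_leak = suppression_gain(0.25)   # little residual echo, gain stays near 1
high_leak = suppression_gain(3.0)   # leakage above the cap, gain is 0
```

A small leakage coefficient leaves the error signal almost untouched, while a leakage coefficient at or above the cap mutes it entirely.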
[0075] In step S13, the gain is used to suppress the error signal to obtain the signal actually acquired by the audio acquisition device. The error signal is the residual echo signal after echo cancellation of the microphone received signal.
[0076] The following formula can be used to suppress the error signal:
[0077] E_p = E·gain (2)
[0078] In formula (2), E is the error signal, and E_p is the signal actually acquired by the audio acquisition device after the error signal is suppressed.
[0079] For example, if the error signal value is 2 and the gain value is 0.1, then the actual signal value acquired by the audio acquisition device is 0.2. After the error signal changes from 2 to 0.2, the echo intensity of the interference signal in the error signal will decrease, thereby further reducing echo interference.
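The worked example above can be reproduced with a short sketch of formula (2). It assumes the error signal is held as a list of frequency bins and that one gain applies to all of them; the name `suppress` and the bin values are hypothetical.

```python
def suppress(error_spectrum, gain):
    """Formula (2): scale every bin of the error signal E by the gain."""
    return [bin_value * gain for bin_value in error_spectrum]


# Hypothetical error spectrum; the first bin matches the example in the text.
E = [2.0, 1.0 + 1.0j, -0.5j]
E_p = suppress(E, 0.1)
```

The first bin drops from 2 to about 0.2, and every other bin is attenuated by the same factor.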
[0080] The gain decreases as the echo intensity of the error signal increases. The larger the error signal, the smaller the gain, so that an error signal with a large echo intensity is suppressed more strongly.
[0081] As shown in Figure 2, after the gain-suppressed error signal E_p is obtained, it can be fed back to the filter to update the room impulse response estimated by the filter, so that the updated room impulse response is closer to the real room impulse response.
[0082] Using the above technical solution, when the first correlation between the speaker input signal and the microphone received signal is greater than a preset correlation, it is confirmed that there is no near-field signal in the microphone received signal, meaning that the user using the near-end device has not emitted any speech. In this case, to further reduce the interference of the error signal obtained after eliminating echo interference, gain suppression can be used to further reduce the echo intensity of the error signal, resulting in a smaller error signal collected by the microphone of the near-end device. Thus, the near-end device, based on the error signal with a smaller echo intensity, will not experience speech misrecognition, nor will the far-end device play the far-end user's own voice.
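Steps S11 to S13 can be combined into one gated post-processing sketch. Two points are assumptions rather than statements from the disclosure: the threshold value 0.5 is a placeholder for the preset correlation, and the error signal is passed through unchanged when the correlation test fails, because the text does not describe that branch. The name `post_process` is hypothetical.

```python
def post_process(error_spectrum, first_correlation, de_ratio, preset_correlation=0.5):
    """Suppress the residual echo only when the correlation test passes.

    Assumption: when the correlation is not above the threshold, the error
    signal is returned unchanged; the text does not specify that branch.
    """
    if first_correlation > preset_correlation:
        gain = 1.0 - min(de_ratio, 1.0)            # formula (1)
        return [e * gain for e in error_spectrum]  # formula (2)
    return list(error_spectrum)


echo_only = post_process([2.0], first_correlation=0.9, de_ratio=0.75)
near_talk = post_process([2.0], first_correlation=0.2, de_ratio=0.75)
```

Gating on the correlation keeps the suppression away from frames in which the near-end user may be speaking.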
[0083] For example, when the near-end device is a smart TV, if the solution in the related technology is used, the smart TV's microphone will pick up the audio of the program being played by the smart TV itself, and thus mistakenly trigger the voice recognition function based on the audio. After adopting the solution proposed in this disclosure, the volume of the audio of the program being played by the smart TV picked up by the smart TV's microphone is relatively low, making it impossible for the smart TV to recognize the voice, and naturally, the voice recognition function will not be mistakenly triggered.
[0084] For example, when the near-end device is a mobile phone, if the solution in the related technology is adopted, the microphone of the near-end phone will capture the far-end user's voice played by the near-end phone's speaker and transmit it back to the far-end phone. In this way, the far-end user will hear their own voice on the far-end phone. After adopting the solution proposed in this disclosure, the far-end user's voice captured by the microphone of the near-end phone is greatly reduced, and even if it is transmitted back to the far-end phone, the far-end user will not hear their own voice.
[0085] In one possible implementation, the first correlation between the speaker input signal and the microphone received signal is determined by the following steps:
[0086] In step S21, the first power spectrum of the first frequency domain signal is obtained based on the first frequency domain signal corresponding to the loudspeaker input signal.
[0087] The first frequency domain signal corresponding to the loudspeaker input signal is denoted as X(k), and for ease of description, it will be referred to as X below. The formula for determining the first power spectrum is:
[0088] phi_XX = α·phi_XX + (1-α)·|X|^2 (3)
[0089] In formula (3), phi_XX is the first power spectrum; α is the first smoothing factor, which is a constant between 0 and 1; X is the first frequency domain signal corresponding to the loudspeaker input signal.
[0090] As can be seen from formula (3), the first power spectrum can be determined based on the first smoothing factor and the first frequency domain signal.
[0091] In step S22, the second power spectrum of the second frequency domain signal is obtained based on the second frequency domain signal corresponding to the microphone received signal.
[0092] The second frequency domain signal corresponding to the microphone received signal is denoted as D(k), and for ease of description, it will be referred to as D below. The formula for determining the second power spectrum is:
[0093] phi_DD = α·phi_DD + (1-α)·|D|^2 (4)
[0094] In formula (4), phi_DD is the second power spectrum; α is the first smoothing factor, which is a constant between 0 and 1; and D is the second frequency domain signal corresponding to the microphone received signal.
[0095] In step S23, the cross spectrum between the first frequency domain signal and the second frequency domain signal is obtained based on the first frequency domain signal, the second frequency domain signal, and the first smoothing factor.
[0096] The cross-spectrum between the first frequency domain signal and the second frequency domain signal is determined by the following formula:
[0097] phi_XD = α·phi_XD + (1-α)·X^*·D (5)
[0098] In formula (5), phi_XD is the cross spectrum; α is the first smoothing factor; X is the first frequency domain signal; D is the second frequency domain signal; and X^* denotes the complex conjugate of the first frequency domain signal.
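Formulas (3) to (5) are recursive averages and can be sketched for a single frequency bin as follows. The value of the smoothing factor and the zero initial state are assumptions chosen for illustration, and the name `update_spectra` is hypothetical.

```python
def update_spectra(state, X, D, alpha=0.9):
    """One recursive-smoothing step of formulas (3)-(5) for a single frequency bin.

    state holds the previous phi_XX and phi_DD (real) and phi_XD (complex).
    """
    phi_XX, phi_DD, phi_XD = state
    phi_XX = alpha * phi_XX + (1 - alpha) * abs(X) ** 2        # formula (3)
    phi_DD = alpha * phi_DD + (1 - alpha) * abs(D) ** 2        # formula (4)
    phi_XD = alpha * phi_XD + (1 - alpha) * X.conjugate() * D  # formula (5)
    return phi_XX, phi_DD, phi_XD


# Hypothetical single-bin values, starting from an all-zero state.
state = (0.0, 0.0, 0j)
state = update_spectra(state, X=2 + 0j, D=1j, alpha=0.5)
```

A smoothing factor closer to 1 gives more weight to the history and makes the spectra react more slowly to each new frame.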
[0099] In step S24, the first correlation degree is obtained based on the first power spectrum, the second power spectrum, and the cross spectrum.
[0100] Specifically, a second correlation degree can be obtained based on the first power spectrum, the second power spectrum, and the cross spectrum; if the second correlation degree is greater than a first preset value, the first preset value is used as the first correlation degree; if the second correlation degree is less than the first preset value, the second correlation degree is used as the first correlation degree.
[0101] For example, the first preset value can be 0.99 or 1, and the first correlation is determined as follows:
[0102] msc_XD1 = min(msc_XD2, 0.99) (6)
[0103] In formula (6), msc_XD1 on the left side is the first correlation; msc_XD2 on the right side is the second correlation; min(msc_XD2, 0.99) is the smaller of the second correlation and 0.99; and 0.99 is the first preset value.
[0104] As can be seen from formula (6), when the second correlation msc_XD2 is greater than the first preset value of 0.99, 0.99 is used as the first correlation; when the second correlation msc_XD2 is less than the first preset value of 0.99, the second correlation msc_XD2 is used as the first correlation.
[0105] The formula for calculating the second correlation msc_XD2 is as follows:
[0106] msc_XD2 = |phi_XD|^2 / (phi_XX·phi_DD) (7)
[0107] In formula (7), msc_XD2 is the second correlation degree; phi_XX is the first power spectrum; phi_DD is the second power spectrum; and phi_XD is the cross spectrum.
[0108] It can be seen that the second correlation degree can be obtained by dividing the square of the absolute value of the cross spectrum by the product of the first power spectrum and the second power spectrum. If the second correlation degree is less than the first preset value, the second correlation degree is used as the first correlation degree. If the second correlation degree is greater than the first preset value, the first preset value is used as the first correlation degree.
[0109] In the above scheme, the first power spectrum and the second power spectrum are obtained through formula (3) and formula (4) respectively, and the cross spectrum is obtained through formula (5); then the cross spectrum, the first power spectrum and the second power spectrum are substituted into formula (7) to obtain the second correlation degree, and the second correlation degree is substituted into formula (6) to obtain the first correlation degree.
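The correlation computation can be sketched as follows, using the description of formula (7) given in the text: the squared magnitude of the cross spectrum divided by the product of the two power spectra. The small constant `eps` is an assumption added here to avoid division by zero, and the function names are hypothetical.

```python
def second_correlation(phi_XX, phi_DD, phi_XD, eps=1e-12):
    """Formula (7) as described in the text: |phi_XD|^2 / (phi_XX * phi_DD)."""
    return abs(phi_XD) ** 2 / (phi_XX * phi_DD + eps)


def first_correlation(phi_XX, phi_DD, phi_XD, first_preset=0.99):
    """Formula (6): clamp the second correlation to the first preset value."""
    return min(second_correlation(phi_XX, phi_DD, phi_XD), first_preset)


# Hypothetical single-bin spectra.
fully_coherent = first_correlation(2.0, 0.5, 1j)      # ratio close to 1, clamped
weakly_coherent = first_correlation(4.0, 1.0, 1 + 0j)
```

The clamp keeps the first correlation strictly below 1 even when the microphone signal is a pure copy of the speaker signal.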
[0110] In one possible implementation, the leakage coefficient of the far-field signal is determined by the following steps:
[0111] In step S31, the third power spectrum corresponding to the third frequency domain signal is obtained based on the third frequency domain signal corresponding to the error signal.
[0112] The formula for determining the third power spectrum is as follows:
[0113] phi_EE = α·phi_EE + (1-α)·|E|^2 (8)
[0114] In formula (8), phi_EE is the third power spectrum; α is the first smoothing factor, which is a constant between 0 and 1; and E is the third frequency domain signal corresponding to the error signal.
[0115] In step S32, the leakage coefficient is obtained based on the second power spectrum, the third power spectrum, and the second smoothing factor.
[0116] The formula for determining the leakage coefficient is as follows:
[0117]
[0118] In formula (9), DE_ratio is the leakage coefficient; beta2 is the second smoothing factor; phi_EE is the third power spectrum; and phi_DD is the second power spectrum.
[0119] As can be seen from formula (9), based on the known second smoothing factor, second power spectrum and third power spectrum, the leakage coefficient of the far-field signal can be obtained by substituting these three parameters into formula (9).
[0120] The second smoothing factor is determined through the following sub-steps:
[0121] In sub-step A1, a third smoothing factor is obtained based on the second power spectrum and the third power spectrum.
[0122] The formula for determining the third smoothing factor is as follows:
[0123] beta3 = 0.005·phi_EE / phi_DD (10)
[0124] In formula (10), beta3 is the third smoothing factor; phi_DD is the second power spectrum; and phi_EE is the third power spectrum.
[0125] As can be seen from formula (10), the third smoothing factor can be obtained by dividing the third power spectrum by the second power spectrum and then multiplying by the constant 0.005.
[0126] In sub-step A2, if the third smoothing factor is less than the second preset value, the second smoothing factor is obtained based on the third smoothing factor and the leakage estimated smoothing factor; if the third smoothing factor is greater than the second preset value, the second smoothing factor is obtained based on the second preset value and the leakage estimated smoothing factor.
[0127] The formula for determining the second smoothing factor is as follows:
[0128] beta2 = min(beta3, 0.99)·alpha_XD (11)
[0129] In formula (11), beta3 is the third smoothing factor; beta2 is the second smoothing factor; 0.99 is the second preset value; and alpha_XD is the leakage estimation smoothing factor, alpha_XD = |msc_XD1|^2, where msc_XD1 is the first correlation calculated by formula (6).
[0130] As can be seen from formula (11), when the third smoothing factor is less than the second preset value, the leakage estimation smoothing factor can be multiplied by the third smoothing factor to obtain the second smoothing factor; when the third smoothing factor is greater than the second preset value, the leakage estimation smoothing factor can be multiplied by the second preset value to obtain the second smoothing factor.
[0131] As can be seen, after the third smoothing factor is obtained through formula (10), it is substituted into formula (11) to obtain the second smoothing factor; the second smoothing factor is then substituted into formula (9) to obtain the leakage coefficient of the far-field signal.
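Sub-steps A1 and A2 can be sketched as follows, based on the textual description of formula (10) and on formula (11). The constant `eps` is an assumption added to avoid division by zero, the function names are hypothetical, and the leakage coefficient update of formula (9) is not sketched here.

```python
def third_smoothing_factor(phi_DD, phi_EE, eps=1e-12):
    """Formula (10) as described in the text: 0.005 * phi_EE / phi_DD."""
    return 0.005 * phi_EE / (phi_DD + eps)


def second_smoothing_factor(phi_DD, phi_EE, msc_XD1, second_preset=0.99):
    """Formula (11): beta2 = min(beta3, second_preset) * alpha_XD."""
    alpha_XD = abs(msc_XD1) ** 2  # leakage estimation smoothing factor
    beta3 = third_smoothing_factor(phi_DD, phi_EE)
    return min(beta3, second_preset) * alpha_XD


# Hypothetical single-bin values.
typical = second_smoothing_factor(phi_DD=1.0, phi_EE=0.5, msc_XD1=0.5)
capped = second_smoothing_factor(phi_DD=1e-6, phi_EE=1000.0, msc_XD1=0.5)
```

Because alpha_XD is the squared first correlation, the second smoothing factor shrinks when the speaker and microphone signals are only weakly correlated.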
[0132] Figure 3 is a block diagram illustrating an echo processing apparatus according to an exemplary embodiment. Referring to Figure 3, the echo processing apparatus 300 includes a leakage coefficient determination module 310, a gain determination module 320, and a suppression module 330.
[0133] The leakage coefficient determination module 310 is configured to determine the leakage coefficient corresponding to the far-field signal if the first correlation is greater than the preset correlation; wherein, the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal acquired by the audio acquisition device, and the far-field signal is the signal output by the audio playback device.
[0134] The gain determination module 320 is configured to obtain the gain based on the leakage coefficient;
[0135] The suppression module 330 is configured to suppress the error signal using the gain to obtain the actual output signal of the audio playback device, wherein the error signal is the residual echo signal after echo cancellation of the microphone received signal.
[0136] Optionally, the echo processing device 300 includes:
[0137] The first power spectrum determination module is configured to obtain the first power spectrum of the first frequency domain signal based on the first frequency domain signal corresponding to the loudspeaker input signal.
[0138] The second power spectrum determination module is configured to obtain the second power spectrum of the second frequency domain signal based on the second frequency domain signal corresponding to the microphone received signal.
[0139] The cross-spectrum determination module is configured to obtain the cross-spectrum between the first frequency domain signal and the second frequency domain signal based on the first frequency domain signal, the second frequency domain signal, and the first smoothing factor;
[0140] The first correlation determination module is configured to obtain the first correlation based on the first power spectrum, the second power spectrum, and the cross spectrum.
[0141] Optionally, the first correlation determination module includes:
[0142] The second correlation determination submodule is configured to obtain the second correlation based on the first power spectrum, the second power spectrum, and the cross spectrum;
[0143] The first determining submodule is configured to use the first preset value as the first correlation when the second correlation is greater than the first preset value;
[0144] The second determining submodule is configured to use the second correlation as the first correlation when the second correlation is less than the first preset value.
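The two determining submodules above amount to capping the second correlation at the first preset value. The following Python sketch shows that step for one frequency bin. The function names are hypothetical, the value 0.98 is a placeholder for the first preset value, and the second correlation is computed here as a standard magnitude-squared coherence, which is an assumption because the disclosure's own correlation formula is not reproduced in this passage.

```python
def second_correlation(phi_xx: float, phi_dd: float, phi_xd: complex,
                       eps: float = 1e-12) -> float:
    # Assumption: standard magnitude-squared coherence,
    # |phi_XD|^2 / (phi_XX * phi_DD); eps avoids division by zero.
    return abs(phi_xd) ** 2 / (phi_xx * phi_dd + eps)


def first_correlation(second_corr: float, first_preset: float) -> float:
    # Second correlation above the preset -> use the preset;
    # second correlation below the preset -> use the second correlation.
    return min(second_corr, first_preset)


# Illustrative values only; 0.98 is a hypothetical first preset value.
corr2 = second_correlation(phi_xx=4.0, phi_dd=1.0, phi_xd=1 + 1j)
corr1 = first_correlation(corr2, first_preset=0.98)
print(corr2, corr1)
```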
[0145] Optionally, the leakage coefficient determination module 310 includes:
[0146] The third power spectrum determination submodule is configured to obtain the third power spectrum corresponding to the third frequency domain signal based on the third frequency domain signal corresponding to the error signal;
[0147] The leakage coefficient determination submodule is configured to obtain the leakage coefficient based on the second power spectrum, the third power spectrum, and the second smoothing factor.
[0148] Optionally, the echo processing device 300 includes:
[0149] The third smoothing factor determination module is configured to obtain a third smoothing factor based on the second power spectrum and the third power spectrum;
[0150] The second smoothing factor determination module is configured to obtain the second smoothing factor based on the third smoothing factor and the leakage estimation smoothing factor when the third smoothing factor is less than the second preset value.
[0151] The second smoothing factor determination module is further configured to obtain the second smoothing factor based on the second preset value and the leakage estimation smoothing factor when the third smoothing factor is greater than the second preset value.
[0152] Optionally, the gain determination module 320 includes:
[0153] The first gain determination submodule is configured to obtain the gain based on a fourth preset value and the leakage coefficient when the leakage coefficient is less than a third preset value;
[0154] The second gain determination submodule is configured to obtain the gain based on the fourth preset value and the third preset value when the leakage coefficient is greater than the third preset value.
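The branch structure of the two gain determination submodules can be sketched as follows. This passage does not state how the fourth preset value is combined with the (limited) leakage coefficient, so the sketch takes that rule as a caller-supplied function; `combine`, the function name, and all numeric values are hypothetical and serve only to show the control flow.

```python
from typing import Callable


def gain_from_leakage(leakage: float, third_preset: float, fourth_preset: float,
                      combine: Callable[[float, float], float]) -> float:
    # Below the third preset value the leakage coefficient itself is used;
    # above it, the third preset value is used instead.
    limited = min(leakage, third_preset)
    # The combining rule is not given in this passage, so it is injected.
    return combine(fourth_preset, limited)


# Hypothetical combining rule (a plain product), for illustration only.
def product(fourth: float, leak: float) -> float:
    return fourth * leak


low = gain_from_leakage(0.2, third_preset=0.5, fourth_preset=2.0, combine=product)
high = gain_from_leakage(0.9, third_preset=0.5, fourth_preset=2.0, combine=product)
print(low, high)
```

Passing the rule in as a parameter keeps the sketch honest about what the text specifies: only the limiting of the leakage coefficient by the third preset value.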
[0155] Optionally, the echo processing device 300 includes:
[0156] The impulse response estimation module is configured to estimate the room impulse response through a filter;
[0157] An estimation signal determination module is configured to apply the room impulse response to the speaker input signal to generate an estimation signal;
[0158] An error signal determination module is configured to use the estimated signal to perform echo cancellation on the microphone received signal to obtain the error signal.
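The three modules above describe a conventional adaptive echo canceller. The following Python sketch shows one way they could fit together, assuming a time-domain normalized LMS (NLMS) filter as the estimator and subtraction as the cancellation step. The disclosure only says "a filter", so NLMS, the function name, the step size, and the toy room response are all assumptions made for illustration.

```python
import random


def nlms_echo_cancel(speaker, mic, taps=4, mu=0.5, eps=1e-8):
    """Return (estimated impulse response, error signal)."""
    w = [0.0] * taps        # running estimate of the room impulse response
    history = [0.0] * taps  # most recent speaker samples, newest first
    errors = []
    for x, d in zip(speaker, mic):
        history = [x] + history[:-1]
        # Apply the estimated impulse response to the speaker input signal
        # to generate the estimated (echo) signal.
        y_hat = sum(wi * xi for wi, xi in zip(w, history))
        # Echo cancellation: subtract the estimate from the microphone
        # signal; the remainder is the error signal.
        e = d - y_hat
        # NLMS update, normalized by the energy of the recent input.
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        errors.append(e)
    return w, errors


# Toy scenario: the microphone hears only the echo of the speaker signal
# through a hypothetical three-tap room response.
random.seed(0)
true_rir = [0.6, 0.3, 0.1]
speaker = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * speaker[n - k] for k, h in enumerate(true_rir) if n - k >= 0)
       for n in range(len(speaker))]

w, errors = nlms_echo_cancel(speaker, mic)
print([round(v, 3) for v in w], abs(errors[-1]))
```

In this noise-free toy case the error signal decays toward zero as the filter converges. In practice a residual echo remains in the error signal, which is what the gain-based suppression described above is applied to.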
[0159] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0160] This disclosure also provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the steps of the echo processing method provided in this disclosure.
[0161] Figure 4 is a block diagram illustrating an apparatus 800 for echo processing according to an exemplary embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.
[0162] Referring to Figure 4, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input / output interface 812, a sensor component 814, and a communication component 816.
[0163] Processing component 802 typically controls the overall operation of device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 802 may include one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
[0164] Memory 804 is configured to store various types of data to support the operation of device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, etc. Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0165] Power supply component 806 provides power to various components of device 800. Power supply component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power to device 800.
[0166] Multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 808 includes a front-facing camera and / or a rear-facing camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
[0167] Audio component 810 is configured to output and / or input audio signals. For example, audio component 810 includes a microphone (MIC) configured to receive external audio signals when device 800 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
[0168] Input / output interface 812 provides an interface between processing component 802 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, start buttons, and lock buttons.
[0169] Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of device 800. For example, sensor assembly 814 may detect the on / off state of device 800, the relative positioning of components such as the display and keypad of device 800, changes in the position of device 800 or a component of device 800, the presence or absence of user contact with device 800, the orientation or acceleration / deceleration of device 800, and temperature changes of device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 814 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.
[0170] Communication component 816 is configured to facilitate wired or wireless communication between device 800 and other devices. Device 800 can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
[0171] In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the echo processing method described above.
[0172] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 804 including instructions, which can be executed by a processor 820 of the device 800 to complete the echo processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.
[0173] The aforementioned device can be a standalone electronic device or a part of a standalone electronic device. For example, in one embodiment, the device can be an integrated circuit (IC) or a chip, wherein the integrated circuit can be a single IC or a collection of multiple ICs. The chip can include, but is not limited to, the following types: GPU (Graphics Processing Unit), CPU (Central Processing Unit), FPGA (Field Programmable Gate Array), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), and SoC (System on Chip). The aforementioned integrated circuit or chip can be used to execute executable instructions (or code) to implement the aforementioned echo processing method. The executable instructions can be stored in the integrated circuit or chip or obtained from other devices or equipment. For example, the integrated circuit or chip includes a processor, memory, and an interface for communicating with other devices. The executable instruction can be stored in the memory, and when the executable instruction is executed by the processor, it implements the echo processing method described above; or, the integrated circuit or chip can receive the executable instruction through the interface and transmit it to the processor for execution to implement the echo processing method described above.
[0174] In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable device, the computer program having a code portion for performing the echo processing method described above when executed by the programmable device.
[0175] Figure 5 is a block diagram illustrating an apparatus 1900 for echo processing according to an exemplary embodiment. For example, apparatus 1900 may be provided as a server. Referring to Figure 5, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by the processing component 1922. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 1922 is configured to execute instructions to perform the echo processing method described above.
[0176] Device 1900 may also include a power supply component 1926 configured to perform power management of device 1900, a wired or wireless network interface 1950 configured to connect device 1900 to a network, and an input / output interface 1958. Device 1900 can operate on an operating system stored in memory 1932.
[0177] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of this disclosure. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.
[0178] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. An echo processing method, characterized in that it comprises: if the first correlation is greater than the preset correlation, determining the leakage coefficient corresponding to the far-field signal, wherein the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal collected by the audio acquisition device, and the far-field signal is the signal output by the audio playback device; obtaining the gain based on the leakage coefficient; and suppressing the error signal using the gain to obtain the actual signal acquired by the audio acquisition device, wherein the error signal is the residual echo signal after echo cancellation of the microphone received signal.
2. The method according to claim 1, characterized in that the first correlation is determined through the following steps: obtaining the first power spectrum of the first frequency domain signal based on the first frequency domain signal corresponding to the speaker input signal; obtaining the second power spectrum of the second frequency domain signal based on the second frequency domain signal corresponding to the microphone received signal; obtaining the cross spectrum between the first frequency domain signal and the second frequency domain signal based on the first frequency domain signal, the second frequency domain signal, and the first smoothing factor; and obtaining the first correlation based on the first power spectrum, the second power spectrum, and the cross spectrum.
3. The method according to claim 2, characterized in that the step of obtaining the first correlation based on the first power spectrum, the second power spectrum, and the cross spectrum includes: obtaining the second correlation based on the first power spectrum, the second power spectrum, and the cross spectrum; if the second correlation is greater than the first preset value, using the first preset value as the first correlation; and if the second correlation is less than the first preset value, using the second correlation as the first correlation.
4. The method according to claim 2, characterized in that determining the leakage coefficient corresponding to the far-field signal includes: obtaining, based on the third frequency domain signal corresponding to the error signal, the third power spectrum corresponding to the third frequency domain signal; and obtaining the leakage coefficient based on the second power spectrum, the third power spectrum, and the second smoothing factor.
5. The method according to claim 4, characterized in that the second smoothing factor is determined through the following steps: obtaining a third smoothing factor based on the second power spectrum and the third power spectrum; if the third smoothing factor is less than the second preset value, obtaining the second smoothing factor based on the third smoothing factor and the leakage estimation smoothing factor; and if the third smoothing factor is greater than the second preset value, obtaining the second smoothing factor based on the second preset value and the leakage estimation smoothing factor.
6. The method according to claim 1, characterized in that the step of obtaining the gain based on the leakage coefficient includes: if the leakage coefficient is less than a third preset value, obtaining the gain based on a fourth preset value and the leakage coefficient; and if the leakage coefficient is greater than the third preset value, obtaining the gain based on the fourth preset value and the third preset value.
7. The method according to claim 1, characterized in that the error signal is determined through the following steps: estimating the room impulse response using a filter; applying the room impulse response to the speaker input signal to generate an estimated signal; and using the estimated signal to perform echo cancellation on the microphone received signal to obtain the error signal.
8. An echo processing device, characterized in that it comprises: a leakage coefficient determination module configured to determine the leakage coefficient corresponding to the far-field signal if the first correlation is greater than the preset correlation, wherein the first correlation is the correlation between the speaker input signal and the microphone received signal, the speaker input signal is the signal input to the audio playback device, the microphone received signal is the signal acquired by the audio acquisition device, and the far-field signal is the signal output by the audio playback device; a gain determination module configured to obtain the gain based on the leakage coefficient; and a suppression module configured to suppress the error signal using the gain to obtain the actual output signal of the audio playback device, wherein the error signal is the residual echo signal after echo cancellation of the microphone received signal.
9. A computer-readable storage medium having computer program instructions stored thereon, characterized in that, when executed by a processor, the program instructions implement the steps of the method described in any one of claims 1 to 8.
10. A chip, characterized in that it includes a processor and an interface; the processor is used to read instructions to execute the method of any one of claims 1 to 8.