An echo cancellation method, electronic device and chip system

By adapting different echo cancellation algorithms to the parameters of voice interaction services in electronic devices, the problem of microphones picking up echoes is solved, the echo cancellation capability and human-computer interaction efficiency are improved, power consumption is reduced, and the user experience is enhanced.

CN119229888B · Active · Publication Date: 2026-06-30 · HONOR DEVICE CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HONOR DEVICE CO LTD
Filing Date: 2023-06-30
Publication Date: 2026-06-30


IPC: G10L21/0216; G10L21/0208

AI Tagging

Technology Topics

Testing Methods Speech sound

AI Technical Summary

Technical Problem

During voice interaction, the microphone picks up the echoes caused by the audio played by the electronic device, which affects the clarity of the user's voice commands and the efficiency of human-computer interaction.

Method used

Based on the parameters of different voice interaction services, corresponding echo cancellation algorithms are used to eliminate the echoes collected by the microphone. This includes adapting different echo cancellation algorithms to improve echo cancellation capabilities when electronic devices start different voice interaction services.

Benefits of technology

It improves echo cancellation capabilities and human-computer interaction efficiency, reduces the power consumption of electronic devices, and prompts users with the current status by displaying a microphone recording icon.


Abstract

This application provides an echo cancellation method, electronic device, and chip system, relating to the field of signal processing technology, for improving echo cancellation capability and human-computer interaction efficiency; the method includes: in response to the electronic device initiating a first voice interaction service, the electronic device acquires first service parameters and acquires a first echo cancellation algorithm corresponding to the first service parameters; then, the electronic device uses the first echo cancellation algorithm to cancel the echo collected by the electronic device through the microphone.


Description

Technical Field

[0001] This application relates to the field of signal processing technology, and in particular to an echo cancellation method, electronic device, and chip system.

Background Technology

[0002] With the rapid development of voice recognition technology, more and more electronic devices support voice interaction. By using the microphone of the electronic device to collect the voice commands issued by the user, human-computer interaction can be carried out with the electronic device, thereby freeing up hands and improving the efficiency of human-computer interaction.

[0003] In some scenarios, microphones capture not only the user's voice commands but also echoes generated by audio played from electronic devices. Therefore, echo cancellation is necessary. Echo cancellation removes the echoes created by the audio played from the electronic device within the environment, thus preserving only the user's voice commands.

Summary of the Invention

[0004] This application provides an echo cancellation method, electronic device, and chip system to improve echo cancellation capabilities and human-computer interaction efficiency.

[0005] The embodiments of this application adopt the following technical solutions:

[0006] In a first aspect, an echo cancellation method is provided, which is applied to an electronic device that supports multiple voice interaction services, including a first voice interaction service; the electronic device includes a microphone and a sound-emitting device; the method includes: in response to the electronic device starting the first voice interaction service, the electronic device obtains parameter information when performing audio functions in the scenario where the electronic device starts the first voice interaction service, that is, the electronic device obtains the first service parameters.

[0007] Then, the electronic device obtains the first echo cancellation algorithm corresponding to the first service parameter based on the first service parameter, and uses the first echo cancellation algorithm to eliminate the echo collected by the electronic device through the microphone.

[0008] It can be understood that the echo is the audio signal that is emitted by the sound-emitting device and then captured by the microphone.

[0009] In summary, by adopting the solution of this application, in the scenario where an electronic device initiates a first voice interaction service, the electronic device uses a first echo cancellation algorithm corresponding to the first service parameters to eliminate the echo collected by the microphone. That is, the electronic device adapts the corresponding first echo cancellation algorithm to the first service parameters, thereby improving the echo cancellation capability and improving the efficiency of human-computer interaction.

[0010] In one possible implementation of the first aspect, the multiple voice interaction services also include a second voice interaction service; the method further includes: in the scenario where the electronic device switches from the first voice interaction service to the second voice interaction service, in response to the electronic device launching the second voice interaction service, the electronic device obtains parameter information when performing audio functions in the scenario where the electronic device launches the second voice interaction service, that is, the electronic device obtains the second service parameters.

[0011] Then, the electronic device obtains the second echo cancellation algorithm corresponding to the second service parameters based on the second service parameters, and uses the second echo cancellation algorithm to cancel the echo collected by the electronic device through the microphone.

[0012] It should be noted that the first and second echo cancellation algorithms have different capabilities.

[0013] In this implementation, when an electronic device initiates a second voice interaction service, the electronic device uses a second echo cancellation algorithm corresponding to the second service parameters to eliminate the echo collected by the microphone. Furthermore, since the first and second echo cancellation algorithms have different capabilities, this application aims to obtain different service parameters for different voice interaction services and adapt echo cancellation algorithms with different capabilities based on these parameters, thereby further improving echo cancellation capabilities and human-computer interaction efficiency.

[0014] Optionally, the electronic device can store multiple echo cancellation algorithms, each corresponding to a different voice interaction service. Based on this, the electronic device can select a first echo cancellation algorithm corresponding to the first service parameter from the multiple echo cancellation algorithms. Correspondingly, the electronic device can select a second echo cancellation algorithm corresponding to the second service parameter from the multiple echo cancellation algorithms.
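The selection described in [0014] can be pictured as a lookup over stored algorithms. The following is a minimal sketch, not the implementation of this application: the service identifiers `visible_speakable` and `command_based`, the `ALGORITHMS` table, and `select_algorithm` are hypothetical names introduced for illustration. The stage lists follow the NLMS/NN and NLMS/NLP pairings given in [0032].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoCancellationAlgorithm:
    name: str
    stages: tuple  # processing stages applied in order


# Hypothetical table: one stored algorithm per voice interaction service.
ALGORITHMS = {
    "visible_speakable": EchoCancellationAlgorithm("first", ("NLMS", "NN")),
    "command_based": EchoCancellationAlgorithm("second", ("NLMS", "NLP")),
}


def select_algorithm(service_id: str) -> EchoCancellationAlgorithm:
    """Return the stored algorithm that corresponds to the started service."""
    try:
        return ALGORITHMS[service_id]
    except KeyError:
        raise ValueError(f"no echo cancellation algorithm for {service_id!r}")


# Switching services simply repeats the lookup with the new identifier.
first = select_algorithm("visible_speakable")
second = select_algorithm("command_based")
```

Keeping the table separate from the lookup means a new service only needs a new table entry, under the assumption that each service maps to exactly one stored algorithm.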

[0015] In one possible implementation of the first aspect, the first service parameter includes a service identifier corresponding to the first voice interaction service and a volume type corresponding to the current volume of the electronic device; the electronic device obtains the first service parameter as follows: in response to the electronic device starting the first voice interaction service, the electronic device determines the service identifier corresponding to the first voice interaction service; then, the electronic device obtains the current volume and determines, based on the service identifier, the volume type corresponding to that volume.

[0016] In this way, electronic devices can obtain the corresponding first echo cancellation algorithm based on the service identifier, volume and volume type corresponding to the first voice interaction service, and eliminate the echo collected by the microphone, thereby improving the echo cancellation capability and improving the efficiency of human-computer interaction.
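As an illustration of how the first service parameter might be assembled, the sketch below derives the volume type from the service identifier and then reads the current volume for that type. The text does not enumerate the volume types, so the `media` and `alarm` values, the `VOLUME_TYPE_BY_SERVICE` mapping, and all function names are assumptions made for this example only.

```python
from dataclasses import dataclass

# Assumption: each service identifier maps to exactly one volume type.
VOLUME_TYPE_BY_SERVICE = {
    "visible_speakable": "media",
    "command_based": "alarm",
}


@dataclass(frozen=True)
class ServiceParameters:
    service_id: str
    volume_type: str
    volume: int


def get_service_parameters(service_id: str, volumes: dict) -> ServiceParameters:
    """Build the parameters for a newly started service.

    `volumes` maps each volume type to the device's current level for it.
    """
    volume_type = VOLUME_TYPE_BY_SERVICE[service_id]
    return ServiceParameters(service_id, volume_type, volumes[volume_type])


params = get_service_parameters("visible_speakable", {"media": 7, "alarm": 3})
```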

[0017] In one possible implementation of the first aspect, the first service parameter further includes the current volume of the electronic device; the electronic device uses a first echo cancellation algorithm to eliminate the echo collected by the microphone of the electronic device, including: if the current volume of the electronic device is greater than or equal to a preset volume, the electronic device uses the first echo cancellation algorithm to eliminate the echo collected by the microphone of the electronic device.

[0018] Thus, when the current volume of the electronic device is greater than or equal to the preset volume, the electronic device can perform echo cancellation, thereby improving the reliability of echo cancellation.

[0019] Optionally, if the current volume of the electronic device is lower than the preset volume, the electronic device does not perform echo cancellation. This reduces the power consumption of the electronic device.
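The volume gate in [0017] to [0019] can be sketched as a single comparison placed in front of the canceller. This is an illustrative sketch under stated assumptions: `process_frame` and `subtract_reference` are hypothetical names, and `subtract_reference` is only a stand-in for a real echo cancellation algorithm.

```python
from typing import Callable, List

Frame = List[float]


def subtract_reference(mic: Frame, reference: Frame) -> Frame:
    """Placeholder canceller; a real one would use an adaptive filter."""
    return [m - r for m, r in zip(mic, reference)]


def process_frame(
    mic: Frame,
    reference: Frame,
    current_volume: int,
    preset_volume: int,
    cancel_echo: Callable[[Frame, Frame], Frame] = subtract_reference,
) -> Frame:
    """Cancel echo only when the volume reaches the preset threshold."""
    if current_volume >= preset_volume:
        return cancel_echo(mic, reference)
    # Below the threshold the frame passes through untouched, which skips
    # the cancellation work entirely.
    return list(mic)
```

For example, `process_frame([1.0, 2.0], [0.5, 0.5], 5, 3)` runs the canceller, while the same call with a current volume of 2 returns the microphone frame unchanged.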

[0020] In one possible implementation of the first aspect, the method further includes: if the volume changes, the electronic device updates the first service parameter, and based on the updated first service parameter, determines whether to eliminate the echo collected by the electronic device through the microphone.

[0021] In this way, when the volume of the electronic device changes, the electronic device can determine whether to cancel the echo collected by the microphone based on the updated first service parameters, which can further improve the reliability of echo cancellation.

[0022] Optionally, the electronic device can monitor in real time whether the current volume of the electronic device changes, and update the first service parameter in a timely manner if the volume changes.
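A minimal sketch of the update described in [0020] to [0022]: a callback records the new volume in the service parameters and re-evaluates whether echo cancellation should run. The class name, the callback name, and the idea that a volume listener invokes the callback are assumptions for illustration; the text does not specify the monitoring mechanism.

```python
class EchoCancellationController:
    """Holds the volume part of the service parameters and the on/off decision."""

    def __init__(self, service_id: str, volume: int, preset_volume: int):
        self.service_id = service_id
        self.volume = volume
        self.preset_volume = preset_volume
        self.enabled = volume >= preset_volume

    def on_volume_changed(self, new_volume: int) -> bool:
        """Called by a (hypothetical) volume listener; returns the new decision."""
        if new_volume != self.volume:
            self.volume = new_volume  # update the service parameter
            self.enabled = new_volume >= self.preset_volume
        return self.enabled


controller = EchoCancellationController("visible_speakable", volume=5, preset_volume=3)
```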

[0023] In one possible implementation of the first aspect, the electronic device includes a voice application package (APK) in which the multiple voice interaction services are integrated; the voice APK is a voice interaction service that allows users to control the electronic device to perform the operation corresponding to a voice command by directly speaking that voice command; the voice command is different from a wake-up command, where a wake-up command is used to wake up the application in the electronic device that performs voice interaction with the user.

[0024] In one possible implementation of the first aspect, the electronic device further includes an audio processor (ADSP); the electronic device acquires the first service parameters by sending the first service parameters to the ADSP via a voice APK.

[0025] The electronic device obtains a first echo cancellation algorithm corresponding to the first service parameters based on the first service parameters, including: the electronic device matches the first echo cancellation algorithm corresponding to the first service parameters through ADSP based on the first service parameters.
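The hand-off in [0024] and [0025] can be modeled as the voice APK posting the service parameters to the ADSP, which then matches an algorithm. The sketch below simulates that exchange in-process with a queue. The `VoiceApk` and `Adsp` classes, the dictionary message format, and the string algorithm labels are all hypothetical; the actual channel between the APK and the ADSP is not described in this section.

```python
import queue


class Adsp:
    """Stand-in for the audio processor: receives parameters, matches an algorithm."""

    def __init__(self, algorithm_table: dict):
        self._table = algorithm_table
        self._inbox = queue.Queue()
        self.active_algorithm = None

    def receive(self, params: dict) -> None:
        self._inbox.put(params)

    def process_pending(self) -> None:
        """Match an algorithm for every parameter message received so far."""
        while not self._inbox.empty():
            params = self._inbox.get()
            self.active_algorithm = self._table[params["service_id"]]


class VoiceApk:
    """Stand-in for the voice APK: sends service parameters on service start."""

    def __init__(self, adsp: Adsp):
        self._adsp = adsp

    def start_service(self, service_id: str, volume: int) -> dict:
        params = {"service_id": service_id, "volume": volume}
        self._adsp.receive(params)
        return params


adsp = Adsp({"visible_speakable": "NLMS+NN", "command_based": "NLMS+NLP"})
apk = VoiceApk(adsp)
apk.start_service("visible_speakable", volume=7)
adsp.process_pending()
```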

[0026] In one possible implementation of the first aspect, the method further includes: the electronic device employing a first echo cancellation algorithm to eliminate ambient noise collected by the electronic device through a microphone.

[0027] In this way, the environmental noise collected by the microphone can be further eliminated through the first echo cancellation algorithm, thereby further improving the echo cancellation capability and improving the efficiency of human-computer interaction.

[0028] In one possible implementation of the first aspect, the method further includes: in response to the electronic device initiating any one of multiple voice interaction services, the electronic device displays a microphone recording icon; wherein the microphone recording icon is used to indicate that the electronic device is calling the microphone to collect the user's voice commands.

[0029] By displaying a microphone icon, the user is notified that the microphone on their electronic device is currently recording, and can directly speak voice commands, thus improving the user experience.

[0030] In one possible implementation of the first aspect, the first voice interaction service is used to implement various operation functions in a third-party application via voice, and the second voice interaction service is used to implement various operation functions in a system application via voice.

[0031] Optionally, third-party applications can be short video applications, long video applications, etc. System applications can be alarm clock applications, phone applications, etc.

[0032] In one possible implementation of the first aspect, in the scenario where the electronic device initiates the first voice interaction service, the first echo cancellation algorithm includes: the normalized least mean square (NLMS) adaptive filtering algorithm and a neural network (NN) algorithm; in the scenario where the electronic device initiates the second voice interaction service, the second echo cancellation algorithm includes at least: the NLMS algorithm and a nonlinear processing (NLP) algorithm.
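For readers unfamiliar with NLMS, the following is a textbook-style sketch of the adaptive filtering stage only. It is not the implementation of this application, and the NN and NLP stages are not shown. The filter length, step size `mu`, regularization `eps`, and the synthetic three-tap echo path are assumptions chosen for the demonstration.

```python
import random


def nlms_echo_cancel(mic, reference, taps=4, mu=0.5, eps=1e-8):
    """Return the residual (microphone sample minus estimated echo) per sample."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalizing by the reference energy keeps the step size stable
        # regardless of playback level.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Synthetic check: the echo is the reference filtered by a short echo path.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]
mic = [
    sum(echo_path[k] * reference[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(reference))
]
residual = nlms_echo_cancel(mic, reference)
```

In this noise-free setup the residual shrinks toward zero as the filter weights converge on the echo path. When the microphone also contains a near-end voice command, the residual is what remains for recognition.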

[0033] In one possible implementation of the first aspect, the electronic device uses a first echo cancellation algorithm to eliminate the echo collected by the electronic device through the microphone, including: in response to the electronic device initiating a first voice interaction service, the electronic device collects first sound information (including a first voice command and a first echo) through the microphone; the electronic device uses the first echo cancellation algorithm to eliminate the first echo included in the first sound information, and retains the first voice command included in the first sound information; based on this, the method further includes: the electronic device performing an operation corresponding to the first voice command.

[0034] In a second aspect, an electronic device is provided, which has the function of implementing the echo cancellation algorithm described in any one of the first aspects above. This function can be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described function.

[0035] In a third aspect, an electronic device is provided, comprising: a microphone, a sound-emitting device, a memory, and one or more processors; the memory is used to store computer-executable instructions, and when the electronic device is running, the processor executes the computer-executable instructions stored in the memory to cause the electronic device to perform the echo cancellation method described in any one of the first aspects.

[0036] In a fourth aspect, an electronic device is provided, comprising: a processor; the processor is coupled to a memory and, after reading instructions from the memory, executes the echo cancellation method described in any one of the first aspects above according to the instructions.

[0037] In a fifth aspect, a chip system is provided for use in an electronic device, the chip system including at least one processor and an interface, the interface being used to receive instructions and transmit them to the at least one processor; the at least one processor executes the instructions to cause the electronic device to perform the echo cancellation method described in any one of the first aspects above.

[0038] Optionally, the processor can be an audio processor (ADSP) and / or an application processor (AP).

[0039] In a sixth aspect, a computer-readable storage medium is provided, which stores instructions that, when executed on a computer, enable the computer to perform the echo cancellation method described in any one of the first aspects.

[0040] In a seventh aspect, a computer program product containing instructions is provided, which, when run on a computer, enables the computer to perform the echo cancellation method described in any one of the first aspects.

[0041] For the technical effects of any design in the second through seventh aspects, refer to the technical effects of the corresponding designs in the first aspect; they are not repeated here.

Attached Figure Description

[0042] Figure 1A is a schematic diagram of a scenario example of a method for controlling an electronic device via voice, provided in an embodiment of this application;

[0043] Figure 1B is a schematic diagram of a scenario example of a method for controlling an electronic device via voice, provided in an embodiment of this application;

[0044] Figure 2 is a schematic diagram of an interface for a voice interaction service scenario provided in an embodiment of this application;

[0045] Figure 3 is a second schematic diagram of a voice interaction service scenario provided in an embodiment of this application;

[0046] Figure 4 is a third schematic diagram of a voice interaction service scenario provided in an embodiment of this application;

[0047] Figure 5 is a fourth schematic diagram of a voice interaction service scenario provided in an embodiment of this application;

[0048] Figure 6 is a fifth schematic diagram of a voice interaction service scenario provided in an embodiment of this application;

[0049] Figure 7 is a first schematic flowchart of an echo cancellation method provided in an embodiment of this application;

[0050] Figure 8 is a second schematic flowchart of an echo cancellation method provided in an embodiment of this application;

[0051] Figure 9 is a third schematic flowchart of an echo cancellation method provided in an embodiment of this application;

[0052] Figure 10 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this application;

[0053] Figure 11 is a schematic diagram of the software framework of an electronic device provided in an embodiment of this application;

[0054] Figure 12 is a schematic diagram of a chip system provided in an embodiment of this application.

Detailed Implementation

[0055] In the description of the embodiments of this application, unless otherwise stated, " / " means "or". For example, A / B can mean A or B. The "and / or" in this document is merely a description of the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone.

[0056] The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of embodiments of this application, unless otherwise stated, "multiple" means two or more.

[0057] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0058] Voice control function:

[0059] Many terminal devices support voice control. These devices use microphones to capture user voice recordings, analyze and recognize the voice, and execute the corresponding commands, allowing users to control the device via voice.

[0060] Generally, to conserve power and prevent accidental triggering, the voice control function needs to be enabled before use. For example, a user inputs a preset word (called a wake-up word) to the terminal device to wake it up; once woken up, the terminal device can execute the user's voice commands, that is, the voice control function is enabled. As another example, users can enable or disable the voice control function by turning a preset switch on the terminal device's user interface on or off.

[0061] Voice control functions may have different names on different terminal devices, such as "voice control," "intelligent voice," "voice assistant," "see and speak," "voice command," "command at will," and "intelligent AI." The specific implementation of voice control functions with different names may also differ.

[0062] The following examples illustrate several different implementations of voice control functionality.

[0063] Voice assistant:

[0064] Before using a voice assistant to control an electronic device, the user needs to input a wake word to wake the device. Typically, before the device is woken up, its microphone operates in a power-saving mode (e.g., searching for signals at low power) to pick up ambient sounds. The microphone's voice pickup is only used for wake word detection at the kernel level; the corresponding recording channel for the voice assistant is not activated in the device's system or drivers.

[0065] Users input a wake-up word into the electronic device via voice. Upon receiving the wake-up word, the electronic device is awakened, and the corresponding recording channel for the voice assistant is activated in the system and drivers. After the electronic device is awakened, the voice (audio stream) captured by the microphone is sent to the voice assistant application for processing through the corresponding recording channel. In this way, the electronic device can execute the user's voice commands, enabling the user to control the electronic device via voice; it can also enable functions such as dialogue with the user.

[0066] For example, Figure 1A shows a scenario where a user controls a phone using a voice assistant. As shown in Figure 1A, the phone displays its home screen, and the user speaks the wake-up word "Hello YOYO". In response to receiving the wake-up word "Hello YOYO", the phone is woken up. For example, after being woken up, the phone plays the voice prompt "I'm here" to notify the user that the phone has been woken up.

[0067] Once the phone is woken up, the user can control it via voice. For example, as shown in Figure 1A, the user inputs the voice command "Open video" to the phone. The phone parses and recognizes the user's voice and executes the command corresponding to "Open video." For example, in response to receiving the user's voice command "Open video," the phone launches the video application.

[0068] In some implementations, after the electronic device is woken up, the user can issue a command to the device by inputting voice, and the device will execute the command. Once the electronic device has executed a command, or if it does not receive a command from the user via voice within a certain period after being woken up (e.g., within 8 seconds), the device will no longer respond to user voice commands. For example, the electronic device may close the recording channel corresponding to the voice assistant. The user needs to input a wake-up word again to wake up the device before they can issue commands via voice again. In other words, after being woken up, the electronic device enters a "short-reception" state, and can respond to user voice commands for a short period (e.g., within 8 seconds).
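The "short-reception" behavior in [0068] amounts to a small state machine: a wake-up word opens a time window, one command is accepted inside it, and then the device stops responding until it is woken again. The sketch below is illustrative only. The class and method names are hypothetical, utterances are compared as plain strings rather than recognized audio, and time is passed in explicitly so the behavior is deterministic.

```python
class VoiceAssistantSession:
    """Tracks the short-reception window that follows a wake-up word."""

    def __init__(self, wake_word="Hello YOYO", window_seconds=8.0):
        self.wake_word = wake_word
        self.window_seconds = window_seconds
        self._awake_since = None  # None means the recording channel is closed

    def on_utterance(self, text: str, now: float) -> str:
        """Handle an utterance heard at time `now` (in seconds)."""
        if self._awake_since is not None and now - self._awake_since > self.window_seconds:
            self._awake_since = None  # window expired: close the channel
        if self._awake_since is None:
            if text == self.wake_word:
                self._awake_since = now
                return "awake"
            return "ignored"
        self._awake_since = None  # one command per wake-up
        return f"execute: {text}"


session = VoiceAssistantSession()
```

A command spoken before the wake-up word, after the 8-second window, or after a command has already been executed is ignored until the wake-up word is spoken again.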

[0069] In some implementations, when the electronic device is connected to the network, it supports entering a continuous dialogue scenario after being woken up, allowing the user to engage in a sustained conversation with the device. The electronic device will continue to pick up audio after each playback, without requiring repeated wake-ups, until the user exits the continuous dialogue using a command such as "exit."

[0070] Visible and Speakable:

[0071] The Visible and Speakable function is controlled by a preset switch. Users enable the function by turning the preset switch on and disable it by turning the switch off. In one implementation, the electronic device's microphone operates in a power-saving mode (e.g., searching for signals at low power) to pick up ambient sound. When the preset switch is turned on, the recording channel corresponding to Visible and Speakable is activated in the electronic device's system and driver. The voice (audio stream) picked up by the microphone can then be sent through that recording channel to the Visible and Speakable application for processing, enabling users to control the electronic device via voice.

[0072] As can be seen, once Visible and Speakable is enabled, the electronic device enters a "long-reception" state, in which users can issue commands to the device via voice at any time without inputting a wake-up word.

[0073] For example, referring to Figure 1B, the phone displays its home screen, and the user inputs the voice command "Open video." The phone parses and recognizes the user's voice and executes the command corresponding to "Open video." For example, in response to receiving the user's voice command "Open video," the phone launches the video application.

[0074] It should be noted that Visible and Speakable can be implemented locally by the electronic device, without a network connection.

[0075] In scenarios where users' hands are occupied, voice is the primary and fastest human-computer interaction method. Electronic devices respond to spoken wake-up commands (such as the wake word "Hello, YOYO") to enable interaction between the user and the device. Typically, electronic devices have pre-installed applications that allow voice interaction (such as voice assistants). Users can use voice assistants to interact with electronic devices to achieve functions that previously required multiple manual operations.

[0076] Scenarios where users' hands are occupied mainly include: a user cooking in the kitchen while watching videos on an electronic device; a user washing up while watching videos or listening to music on an electronic device; a user eating on the sofa in the living room while watching videos on an electronic device; or a user resting in the bedroom while an alarm rings on an electronic device in the living room.

[0077] In the scenarios described above, users can use the voice assistant to control video playback (e.g., pause, play, next / previous, volume up / down) and to turn off an alarm or have it remind them later.

[0078] Currently, when using a voice assistant, users typically need to say a wake word to activate it. After activating the assistant, users can continue to speak voice commands to control the electronic device to perform the corresponding operation. For example, in response to the user saying the wake word "Hello, YOYO," the electronic device activates the voice assistant. Then, in response to the user saying "Play next," the electronic device can play the next video.

[0079] In some embodiments, the electronic device automatically shuts down the voice assistant after waking it up once and executing the corresponding voice command. When a user needs to use the voice assistant for voice interaction for an extended period, the electronic device needs to frequently respond to the user's spoken wake word to activate the voice assistant. However, frequent activation of the voice assistant inevitably leads to higher power consumption, and it also cannot meet the diverse needs of users.

[0080] In some embodiments of this application, the electronic device includes a voice application package (APK). The voice APK is a voice interaction service that allows users, without first speaking a wake-up word, to control the electronic device to perform the operation corresponding to a voice command simply by speaking that command.

[0081] For example, the voice APK integrates multiple voice interaction services, such as the Visible and Speakable service, the Command-Based service, and the artificial intelligence (AI) subtitle service. The Visible and Speakable service is used to implement various operational functions in third-party applications (such as long video applications and short video applications) via voice; the Command-Based service is used to implement various operational functions in system applications (such as alarm clock applications and phone applications) via voice; and the AI subtitle service is used to recognize the user's voice and automatically generate synchronized subtitles.

[0082] The user interfaces of the voice interaction services integrated into the voice APK are described below with reference to the accompanying drawings. The following embodiments take the Visible and Speakable service, the Command-Based service, and the AI subtitle service as examples.

[0083] Several voice interaction services have corresponding functional entry points on electronic devices, such as the control center and settings application. Taking a mobile phone as an example, with the settings application as the functional entry point, for instance, in response to a user opening the settings application, the phone displays something like... Figure 2 The interface shown in Figure (1) includes a "Voice Control" setting. In response to the user's operation on the "Voice Control" setting, the phone displays the following... Figure 2The interface shown in (2) includes services such as visible and speakable functions, on-demand command functions, and AI subtitle functions. Of course, this interface may also include other scenario services, such as on-demand translation services and on-demand reading services, which will not be listed here.

[0084] For example, taking the activation of "visible and talkable services" as an example, such as Figure 2 As shown in (2), in response to the user's click on the "Visible and Speakable" option, the phone displays as follows: Figure 2 The prompt page shown in (3) includes prompt information to inform the user how to use the visible and talkable functions. Figure 2 The interface shown in (3) also includes a "voice control" switch. When the "voice control" switch is on, the phone enables the "visible and speakable" functionality; when the "voice control" switch is off, the phone exits the "visible and speakable" functionality. In one example, the phone enables the "visible and speakable" function in response to the user's click on the "voice control" switch.

[0085] In some embodiments, after the mobile phone enables the visible and speakable function, the phone displays a notification icon to indicate that the function is enabled. For example, as shown in Figure 2 (4), the status bar of the mobile phone displays a microphone icon.

[0086] It should be noted that the activation process for the other services (such as the on-the-fly command service and the AI subtitle service) is similar to the activation process for the visible and speakable service described above and is not repeated here.

[0087] In addition, when any of the aforementioned voice interaction services is enabled on the phone, the status bar on the phone's display shows the microphone icon shown in Figure 2 (4). The icon indicates that the phone is in recording mode, which helps prevent the user's privacy from being leaked.

[0088] Once the visible and speakable function is enabled, the phone enters continuous recording mode, and its microphone continuously captures ambient sound. Optionally, in one implementation, after any function on the phone activates the recording channel, the phone notifies the user that it is in recording mode, which helps prevent leakage of user privacy. For example, after the visible and speakable function is enabled, the phone activates its recording channel and enters continuous recording mode; as shown in Figure 2 (4), a prompt icon is displayed on the phone screen to indicate that the microphone is recording sound.

[0089] Taking the activation of the visible and speakable service on a mobile phone as an example, after the service is activated, the phone enters continuous audio reception mode. The user can speak voice commands to the phone, which parses and recognizes them. If a recognized voice command matches any preset command, the phone executes that command.
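The matching step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in this application: the names `PRESET_COMMANDS`, `normalize`, and `match_command` are hypothetical, the recognized text is assumed to arrive as a plain string, and the example commands are illustrative rather than the contents of Table 1.

```python
from typing import Optional

# Hypothetical preset commands; the actual set is defined per service (Table 1).
PRESET_COMMANDS = {"next", "swipe left", "remind later"}


def normalize(text: str) -> str:
    """Lower-case the text and collapse whitespace before matching."""
    return " ".join(text.lower().split())


def match_command(recognized_text: str) -> Optional[str]:
    """Return the matching preset command, or None if nothing matches."""
    candidate = normalize(recognized_text)
    return candidate if candidate in PRESET_COMMANDS else None
```

A recognized utterance such as "  Next " is matched to the preset command "next", while an utterance that is not a preset command yields `None` and is ignored.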

[0090] For example, the preset instructions corresponding to the visible and speakable service include system instructions, short video application instructions, and long video application instructions; the specific instructions are shown in Table 1 below.

[0091] Table 1

[0092]

[0093] It should be noted that Table 1 is merely an example of some preset instructions corresponding to the visible and speakable service in the embodiments of this application. Of course, the visible and speakable service can also support other preset instructions, which will not be elaborated here.

[0094] For example, when the mobile phone has activated the visible and speakable service and the user is browsing short videos, as shown in Figure 3 (1), the phone plays short video A and displays a prompt message in a bubble (e.g., "You can control the video with your voice"). Then, in response to the user's voice command (e.g., "Next"), as shown in Figure 3 (2), the phone switches from playing short video A to playing short video B.

[0095] For example, when the mobile phone has activated the visible and speakable service, as shown in Figure 4 (1), the phone displays desktop A and a prompt message in a bubble (e.g., "Please try speaking"). Then, in response to the user's voice command (e.g., "Swipe left"), as shown in Figure 4 (2), the phone displays desktop B.

[0096] Taking the on-the-fly command service on a mobile phone as an example, the service supports system application commands, such as alarm clock and phone call commands.

[0097] For example, when the mobile phone has activated the on-the-fly command service and the phone alarm rings, as shown in Figure 5 (1), the mobile phone displays an alarm clock notification box as a floating window. This notification box includes the current time (e.g., 8:00 AM), a close control, and a "Remind Later" control. In some embodiments, the mobile phone also displays a prompt message (e.g., "Try saying 'Remind Later'") in a bubble. Then, in response to the user's voice command (e.g., "Remind Later"), as shown in Figure 5 (2), the phone closes the alarm clock notification box and displays the phone's desktop.

[0098] Taking the AI subtitle service on a mobile phone as an example, the AI subtitle service recognizes the user's speech and automatically generates synchronized subtitles. For example, when the mobile phone has started the AI subtitle service and the user is making a video call, as shown in Figure 6, the phone automatically generates synchronized subtitles based on the user's voice during the video call and displays them on the video call interface.

[0099] For example, still as shown in Figure 6, the phone automatically generates synchronized subtitles based on the voice of one user (Bob), such as "Jack, let's go out this weekend", and displays them on the video call interface; correspondingly, the phone also automatically generates synchronized subtitles based on the voice of the other user (Jack), such as "Sure", and displays them on the video call interface.

[0100] It should be noted that the visible and speakable service, the on-the-fly command service, and the AI subtitle service include, but are not limited to, the scenarios listed in the above embodiments, and may also include other scenarios. The embodiments of this application do not limit these scenarios.

[0101] In summary, in this embodiment of the application, by integrating multiple voice interaction services into the same voice APK, users can speak voice commands directly without needing to say a wake word. The voice APK then controls the electronic device to perform the operation corresponding to the voice command. This reduces power consumption and improves the human-computer interaction experience in scenarios where the user's hands are occupied and voice interaction needs to be performed for a long time.

[0102] Understandably, in the above scenario, when an electronic device initiates any voice interaction service, it uses the device's microphone to capture the user's voice commands. Simultaneously, the microphone also captures the echoes generated by the audio output from the electronic device's sound-producing components (such as speakers).

[0103] For example, in the visible and speakable service scenario, when a user is browsing short videos, the electronic device captures the user's voice commands through its microphone, and also captures the echo caused by the video sound played through its speaker. In the on-the-fly command service scenario, when the electronic device's alarm clock rings, the device captures the user's voice commands through its microphone, and also captures the echo caused by the alarm sound played through its speaker. In the AI subtitle service scenario, when the electronic device is playing audio, it captures the user's voice commands through its microphone, and also captures the echo caused by the audio played through its speaker.

[0104] To improve the human-computer interaction experience, echo cancellation must be performed on the sound captured by the electronic device's microphone, which includes both the user's voice and the sound played by the speaker, such as media audio (long video sound or short video sound), alarm clock sound, and incoming call sound.

[0105] This application provides an echo cancellation method that can employ different echo cancellation algorithms for different voice interaction services, thereby improving echo cancellation capabilities and human-computer interaction efficiency. Specifically, in the scenario where an electronic device initiates a first voice interaction service, a first echo cancellation algorithm is used to eliminate the echo caused by the audio played by the speaker and captured by the microphone; in the scenario where an electronic device initiates a second voice interaction service, a second echo cancellation algorithm is used to eliminate the echo caused by the audio played by the speaker and captured by the microphone.

[0106] The first voice interaction service and the second voice interaction service can each be any one of the visible and speakable service, the on-the-fly command service, and the AI subtitle service in the above embodiments, and the first voice interaction service and the second voice interaction service are different; the first echo cancellation algorithm and the second echo cancellation algorithm have different echo cancellation capabilities.

[0107] The specific process of the echo cancellation method provided in the embodiments of this application will be described below with reference to the accompanying drawings.

[0108] For example, as shown in Figure 7, the electronic device includes a voice APK, an audio digital signal processor (ADSP), and a microphone. The voice APK includes a first voice interaction service and a second voice interaction service; the ADSP includes a first echo cancellation algorithm and a second echo cancellation algorithm; the microphone is used to collect user voice commands and echoes caused by audio played from the speaker (hereinafter referred to as echoes).

[0109] Still as shown in Figure 7, in response to a user's operation on the first voice interaction service, the voice APK initiates the first voice interaction service and informs the ADSP that the first voice interaction service has been initiated. Accordingly, after the voice APK initiates the first voice interaction service, the microphone captures the first voice command and the first echo, and transmits the captured first voice command and the first echo to the ADSP. Then, the ADSP uses the first echo cancellation algorithm corresponding to the first voice interaction service to eliminate the first echo, retain the first voice command, and report the first voice command to the voice APK. Upon receiving the first voice command, the voice APK distributes the first voice command to the corresponding application (APPS), so that the application performs the corresponding operation based on the first voice command.

[0110] In some embodiments, after the voice APK initiates the first voice interaction service, it sends a first instruction to the ADSP to indicate that the electronic device has currently initiated the first voice interaction service. In other embodiments, after the voice APK initiates the first voice interaction service, it sends a first service parameter corresponding to the first voice interaction service to the ADSP to indicate that the electronic device has currently initiated the first voice interaction service. The first service parameter indicates the parameter information used by the electronic device when performing audio functions in the scenario where the electronic device initiates the first voice interaction service.

[0111] Of course, the voice APK can also inform ADSP that the first voice interaction service has been launched in other ways, which will not be listed here.

[0112] It should be noted that the voice APK can send the first voice command to the corresponding application through the application framework layer of the electronic device. These applications can include third-party applications, system applications, and desktop applications. The specific implementation process can be found in the description of the following embodiments, and will not be detailed here.

[0113] Accordingly, in response to the user's operation on the second voice interaction service, the voice APK initiates the second voice interaction service and informs the ADSP that the second voice interaction service has been activated. After the voice APK initiates the second voice interaction service, the microphone captures the second voice command and the second echo, and transmits the captured second voice command and second echo to the ADSP. Then, the ADSP uses the second echo cancellation algorithm corresponding to the second voice interaction service to eliminate the second echo, retain the second voice command, and report the second voice command to the voice APK. Upon receiving the second voice command, the voice APK distributes the second voice command to the corresponding application, enabling the application to perform the corresponding operation based on the second voice command.

[0114] For example, after the voice APK initiates the second voice interaction service, it sends second service parameters corresponding to the second voice interaction service to the ADSP to indicate that the electronic device has currently initiated the second voice interaction service. These second service parameters indicate the parameter information used by the electronic device when performing audio functions in the scenario where the second voice interaction service is initiated.

[0115] It should be noted that, for an example of an electronic device initiating a second voice interaction service for echo cancellation, please refer to the above embodiments, which will not be repeated here.

[0116] In summary, in the embodiments of this application, after the electronic device starts the first voice interaction service, the electronic device can use the first echo cancellation algorithm corresponding to the first voice interaction service to cancel the echo; after the electronic device starts the second voice interaction service, it can use the second echo cancellation algorithm corresponding to the second voice interaction service to cancel the echo. Since the first echo cancellation algorithm and the second echo cancellation algorithm have different capabilities, using different echo cancellation algorithms for different voice interaction services can improve the echo cancellation capability and the efficiency of human-computer interaction.

[0117] The following takes as an example the case in which, after starting a voice interaction service, the voice APK sends the service parameters corresponding to that service (such as the first or second service parameters mentioned above) to the ADSP, to illustrate the specific process of the echo cancellation method provided in the embodiments of this application. The service parameters include one or more of the following: the service identifier corresponding to the voice interaction service, the volume, the volume type, and the device type.

[0118] For example, with reference to Figure 7 and as shown in Figure 8, the voice APK includes a service processing module, a service presentation module, a data acquisition module, and an echo cancellation management module. The ADSP includes an audio conversion module and an echo cancellation algorithm module. The electronic device also includes a system APK, which includes a volume module.

[0119] As shown in Figure 8, the service processing module is connected to the service presentation module and the data acquisition module, respectively; the data acquisition module is connected to the echo cancellation management module and the volume module, respectively; the echo cancellation algorithm module is connected to the echo cancellation management module and the audio conversion module, respectively; and the audio conversion module is connected to the microphone.

[0120] The service processing module is used to obtain the service identifier corresponding to the voice interaction service. For example, in response to a user's initiation of a voice interaction service, the service processing module identifies the voice interaction service by an acoustic echo cancellation (AEC) mode value. For instance, when the electronic device initiates the visible and speakable service, the service processing module assigns it an AEC_mode of 1; when the electronic device initiates the on-the-fly command service, the service processing module assigns it an AEC_mode of 2; and when the electronic device initiates the AI subtitle service, the service processing module assigns it an AEC_mode of 3.

[0121] It should be noted that the service identifiers shown in the above embodiments are merely illustrative examples of this application and do not constitute a limitation of this application. Of course, service identifiers can also be represented in other ways (such as A, B, C), which will not be listed here.

[0122] The service presentation module is used to display a microphone icon. For example, in response to a user initiating a voice interaction service, the service processing module notifies the service presentation module to display a microphone icon in the status bar of the electronic device.

[0123] The data acquisition module obtains the service identifier from the service processing module and the current volume of the electronic device from the volume module. The data acquisition module determines the volume type based on the service identifier. For example, when the service identifier is 1, the volume type is media volume; when the service identifier is 2, the volume type is alarm volume; and when the service identifier is 3, the volume type is ring volume. The data acquisition module is also used to obtain the device type of the electronic device (e.g., iPad, phone).

[0124] Then, the data acquisition module sends the service parameters (such as service identifier, volume, volume type, and device type) to the echo cancellation management module. The echo cancellation management module manages the service parameters corresponding to the voice interaction service and sends the service parameters to the echo cancellation algorithm module.
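The collection of service parameters described above can be sketched as follows. This is a minimal illustration, not the implementation in this application: the names `AecMode`, `VOLUME_TYPE_BY_MODE`, `ServiceParams`, and `collect_service_params` are hypothetical, and the volume is assumed to be an integer step value because the source does not specify its scale. The identifier values 1, 2, 3 and the volume types follow the examples given in the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class AecMode(IntEnum):
    """Service identifiers, following the AEC_mode examples in the text."""
    VISIBLE_AND_SPEAKABLE = 1
    COMMAND = 2
    AI_SUBTITLE = 3


# Volume type derived from the service identifier, as in the text's example.
VOLUME_TYPE_BY_MODE = {
    AecMode.VISIBLE_AND_SPEAKABLE: "media",
    AecMode.COMMAND: "alarm",
    AecMode.AI_SUBTITLE: "ring",
}


@dataclass(frozen=True)
class ServiceParams:
    """Parameters handed to the echo cancellation management module."""
    service_id: AecMode
    volume: int          # assumed integer volume step (hypothetical scale)
    volume_type: str
    device_type: str


def collect_service_params(service_id: int, volume: int, device_type: str) -> ServiceParams:
    """Build the service parameters from the identifier, volume, and device type."""
    mode = AecMode(service_id)  # raises ValueError for an unknown identifier
    return ServiceParams(mode, volume, VOLUME_TYPE_BY_MODE[mode], device_type)
```

For example, `collect_service_params(2, 7, "phone")` yields parameters whose volume type is "alarm".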

[0125] The audio conversion module is used to acquire the user's voice commands and echoes from the microphone, and convert the acquired voice commands and echoes into electrical signals (such as converting them into voice signals and echo signals). Then, the audio conversion module sends the voice signals and echo signals to the echo cancellation algorithm module.

[0126] The echo cancellation algorithm module matches the echo cancellation algorithm corresponding to the service parameters, eliminates the echo signal, and preserves the voice signal.

[0127] In summary, by adopting the solution of this application embodiment, after the voice interaction service is started, the electronic device can obtain the service parameters corresponding to the voice interaction service, and then match the echo cancellation algorithm corresponding to the service parameters to eliminate the echo signal and retain the voice signal, which can improve the echo cancellation capability and the human-computer interaction efficiency.

[0128] In some embodiments, after obtaining the service parameters, the echo cancellation algorithm module determines whether echo cancellation is required based on the volume included in the service parameters. For example, if the volume is greater than or equal to a preset volume threshold, the echo cancellation algorithm module determines that echo cancellation is required; if the volume is less than the preset volume threshold, the echo cancellation algorithm module determines that echo cancellation is not required.

[0129] Thus, when the volume is greater than or equal to the preset volume threshold, the echo cancellation algorithm module performs echo cancellation. When the volume is less than the preset volume threshold, the audio played by the electronic device's speaker is too quiet to cause a significant echo, so the echo cancellation algorithm module does not perform echo cancellation, thereby reducing power consumption.
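The volume check can be expressed in a few lines. This is a sketch under assumptions: the function name `needs_echo_cancellation` and the threshold value are hypothetical, since the source does not give a concrete threshold or volume scale.

```python
# Hypothetical threshold; the source does not specify a value or a volume scale.
PRESET_VOLUME_THRESHOLD = 3


def needs_echo_cancellation(volume: int, threshold: int = PRESET_VOLUME_THRESHOLD) -> bool:
    """Echo cancellation is required only when volume >= threshold."""
    return volume >= threshold
```

Note that the comparison is inclusive: a volume exactly equal to the threshold still triggers echo cancellation, matching the "greater than or equal to" wording above.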

[0130] In some embodiments, the data acquisition module registers a callback with the volume module to retrieve the volume status of the electronic device. For example, after the volume module detects a change in the volume of the electronic device, it sends the changed volume back to the data acquisition module. Based on the changed volume, the data acquisition module determines whether echo cancellation is needed. If so, the data acquisition module re-acquires the service parameters and sends the re-acquired service parameters to the echo cancellation management module.

[0131] In this way, the echo cancellation management module can match the echo cancellation algorithm corresponding to the service parameters reissued by the data acquisition module to eliminate the echo, which helps to improve the reliability of echo cancellation.
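The callback arrangement described above can be sketched as follows. All class and method names here (`VolumeModule`, `register_callback`, `set_volume`, `DataAcquisitionModule`) are hypothetical stand-ins for illustration, the threshold is an assumed value, and the re-acquired service parameters are reduced to a dictionary holding only the volume.

```python
from typing import Callable, Dict, List


class VolumeModule:
    """Stand-in for the system volume module; notifies listeners on change."""

    def __init__(self, volume: int) -> None:
        self._volume = volume
        self._listeners: List[Callable[[int], None]] = []

    def register_callback(self, listener: Callable[[int], None]) -> None:
        self._listeners.append(listener)

    def set_volume(self, volume: int) -> None:
        if volume == self._volume:
            return  # no change, so no callback
        self._volume = volume
        for listener in self._listeners:
            listener(volume)


class DataAcquisitionModule:
    """Re-sends service parameters when a volume change requires cancellation."""

    def __init__(self, volume_module: VolumeModule,
                 send_params: Callable[[Dict[str, int]], None],
                 threshold: int = 3) -> None:
        self._send_params = send_params
        self._threshold = threshold  # assumed value, not from the source
        volume_module.register_callback(self._on_volume_changed)

    def _on_volume_changed(self, volume: int) -> None:
        # Only re-acquire and forward parameters if cancellation is needed.
        if volume >= self._threshold:
            self._send_params({"volume": volume})
```

In this sketch, `send_params` plays the role of the echo cancellation management module's input; a volume change below the threshold produces no update.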

[0132] For example, with reference to Figure 8 and as shown in Figure 9, the voice APK also includes a data processing module; the ADSP also includes a playback module and shared memory. The electronic device also includes a speaker. Referring to Figure 9, the data processing module is connected to the shared memory; the playback module is connected to the audio conversion module and the speaker, respectively.

[0133] The playback module outputs audio electrical signals to the speaker, which then converts these signals into audio and plays them back. It's understandable that the audio played by the speaker, once captured by the microphone, will cause an echo. Therefore, while the microphone is capturing the user's voice commands, it will also capture the echoes generated by the audio played by the speaker.

[0134] The shared memory is used to store the speech signal from which the echo cancellation algorithm module has removed the echo, and to provide that speech signal to the data processing module. The data processing module converts the speech signal into the corresponding voice command and sends the voice command to the corresponding application.

[0135] In some embodiments, as shown in Figure 9, the echo cancellation algorithm module includes multiple echo cancellation algorithms, each corresponding to a specific voice interaction service. For example, echo cancellation algorithm 1 corresponds to the visible and speakable service, and echo cancellation algorithm 2 corresponds to the on-the-fly command service.

[0136] For example, the echo cancellation algorithm module has a pre-defined correspondence between the echo cancellation algorithm and service parameters. This correspondence can be stored in the echo cancellation algorithm module in the form of a table or an array, as shown in Table 2. Table 2 illustrates this using service parameters including service identifier, volume type, and device type as an example.

[0137] Table 2

[0138]
Service parameters (service identifier, volume type, device type) | Echo cancellation algorithm
(1, media, phone) | Echo cancellation algorithm 1
(2, alarm, phone) | Echo cancellation algorithm 2
…… | ……

[0139] It should be noted that Table 2 is merely an example of the correspondence between echo cancellation algorithms and service parameters in this application embodiment. Of course, the echo cancellation algorithm module may also include the correspondence between other service parameters and echo cancellation algorithms, which will not be listed here.
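The correspondence in Table 2 can be sketched as a lookup table keyed by the service parameters. This is an illustration only: the names `ALGORITHM_TABLE` and `select_algorithm` and the string labels for the algorithms are hypothetical, and only the two rows shown in Table 2 are included.

```python
from typing import Dict, Optional, Tuple

# Keys are (service identifier, volume type, device type), as in Table 2.
ALGORITHM_TABLE: Dict[Tuple[int, str, str], str] = {
    (1, "media", "phone"): "echo_cancellation_algorithm_1",
    (2, "alarm", "phone"): "echo_cancellation_algorithm_2",
}


def select_algorithm(service_id: int, volume_type: str, device_type: str) -> Optional[str]:
    """Return the algorithm matching the service parameters, or None if absent."""
    return ALGORITHM_TABLE.get((service_id, volume_type, device_type))
```

Returning `None` for an unlisted combination is a choice made for this sketch; the source does not say how unmatched service parameters are handled.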

[0140] For example, as shown in Figure 9, in response to a user initiating a voice interaction service, the electronic device uses its microphone to capture the user's voice commands. When the electronic device needs to play audio, the playback module outputs an audio electrical signal to the speaker, which then converts the signal into audio and plays it. Thus, the microphone captures both the user's voice commands and the echo generated by the audio played by the speaker.

[0141] The microphone then transmits the voice command and echo to the audio conversion module, which converts the voice command into a voice signal and the echo into an echo signal. Correspondingly, the playback module also transmits the audio electrical signal as a reference signal to the audio conversion module. Further, the audio conversion module sends the voice signal, echo signal, and reference signal to the echo cancellation algorithm module.

[0142] Correspondingly, in response to a user's initiation of a voice interaction service, the service processing module obtains the service identifier corresponding to the voice interaction service and sends the service identifier to the data acquisition module. Then, the data acquisition module obtains the current volume of the electronic device and distinguishes the corresponding volume type based on the service identifier; the data acquisition module is also used to obtain the device type of the electronic device. Further, the data acquisition module sends the service identifier, volume, volume type, and device type as service parameters to the echo cancellation management module. The echo cancellation management module then sends the service parameters to the echo cancellation algorithm module.

[0143] In some embodiments, as shown in Table 2 above, if the service parameters include 1, media, and phone, the echo cancellation algorithm module selects echo cancellation algorithm 1 to cancel the echo signal.

[0144] For example, echo cancellation algorithm 1 includes: front-end gain control, normalized least mean square (NLMS) adaptive filtering, a neural network (NN), dereverberation, and automatic gain control (AGC).

[0145] Among them, front-end gain control is used to increase the signal gain, so as to reduce the difficulty of subsequent signal processing.

[0146] NLMS is used for echo cancellation. For example, NLMS uses an adaptive filtering algorithm to adjust the filter weight vector, estimating an approximate echo path to approximate the actual echo path, thus obtaining the estimated echo signal. Echo cancellation is achieved by subtracting the estimated echo signal from the mixed signal (speech signal + echo signal) acquired by a microphone. For instance, the reference signal data is divided into multiple blocks of 10ms each, and multiple NLMS filters are used to learn the echo of each block and then superimposed to obtain the filter weight vector.

[0147] Even after echo cancellation using NLMS, a residual echo signal (i.e., a residual signal) still remains. For example, the residual signal satisfies the following expression: $e(n) = d(n) - \mathbf{w}^{T}(n)\,\mathbf{x}(n)$, where e(n) represents the residual signal, d(n) represents the mixed speech and echo signal output by the audio conversion module, w(n) represents the filter weight vector, x(n) represents the reference signal, and T denotes the transpose.
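The NLMS step and its residual can be illustrated with a small pure-Python sketch. It uses the textbook NLMS convention, in which w(n) is the filter weight vector and x(n) is the vector of recent reference samples, and it is a simplification: a single sample-by-sample filter rather than the multi-block arrangement with 10 ms blocks described above. The function name `nlms_cancel`, the step size `mu`, the tap count, and the synthetic echo path are all assumptions made for the demonstration.

```python
import random
from typing import List, Sequence


def nlms_cancel(mic: Sequence[float], ref: Sequence[float], taps: int = 8,
                mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Return the residual e(n) = d(n) - w(n)^T x(n) for every sample."""
    w = [0.0] * taps   # filter weight vector w(n)
    x = [0.0] * taps   # recent reference samples x(n), newest first
    residual = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]                               # shift in the new reference sample
        y = sum(wi * xi for wi, xi in zip(w, x))       # estimated echo
        e = d - y                                      # residual signal
        norm = sum(xi * xi for xi in x) + eps          # input power for normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


# Synthetic demo: the microphone hears only an echo of the reference signal.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.2]  # hypothetical room response
mic = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(ref))]
res = nlms_cancel(mic, ref)
```

In this noise-free demo the residual energy near the end of the signal is far smaller than the microphone energy, because the filter has had time to approximate the echo path. With real speech and a nonlinear loudspeaker, some residual remains, which is why a further suppression stage follows.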

[0148] The neural network (NN) is used to suppress the residual signal from the NLMS output to enhance the echo cancellation effect. For example, the residual signal from the NLMS output, along with the speech signal and echo signal from the audio conversion module, are used as inputs to the NN. The neural network model is trained to learn the nonlinear characteristics of the echo signal, thereby suppressing the nonlinear echo signal.

[0149] For example, the aforementioned neural network model can be a convolutional neural network (CNN), a deep neural network (DNN), a gated recurrent unit (GRU), or a combination of CNN and GRU. Of course, other neural network models are also possible, which will not be elaborated here. Any model that can achieve the effect of suppressing echo signals should fall within the protection scope of this application's embodiments.

[0150] Dereverberation is used to suppress reverberation arriving from directions other than the direction of incidence of the voice command on the microphone. Since reverberation is caused by multiple reflections of the sound source, it can be assumed that the reverberation of the current frame is generated by convolving the speech of several previous frames with the room impulse response. Therefore, an adaptive filtering algorithm can be used for single-channel dereverberation.

[0151] Automatic gain control (AGC) is used to control the volume gain of a speech signal. For example, an AGC algorithm can be used to adjust the level of the speech signal to a pre-set target level.
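A very simple form of level adjustment can be sketched as follows. This is an assumption-laden illustration rather than the AGC used in this application: it scales one frame so that its RMS level approaches a target, with a cap on the gain, whereas a practical AGC would normally smooth the gain across frames. The names `apply_agc`, `target_rms`, and `max_gain` are hypothetical.

```python
import math
from typing import List, Sequence


def apply_agc(frame: Sequence[float], target_rms: float = 0.1,
              max_gain: float = 10.0, eps: float = 1e-12) -> List[float]:
    """Scale a frame toward the target RMS level, limiting the applied gain."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
    gain = min(max_gain, target_rms / (rms + eps))  # eps avoids division by zero
    return [s * gain for s in frame]
```

The gain cap keeps near-silent frames from being amplified without bound, which would otherwise raise background noise to the target level.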

[0152] In other embodiments, as shown in Table 2 above, if the service parameters include 2, alarm, and phone, the echo cancellation algorithm module selects echo cancellation algorithm 2 to cancel the echo signal. For example, echo cancellation algorithm 2 includes: front-end gain control, NLMS, non-linear processing (NLP), dereverberation, and AGC.

[0153] Understandably, unlike Echo Cancellation Algorithm 1, Echo Cancellation Algorithm 2 uses NLP to suppress the residual signal output by NLMS after performing echo cancellation, in order to enhance the effect of echo cancellation.
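The relationship between the two algorithms can be made explicit by listing their stages. This sketch only records the stage names given in the text, in the order in which the text lists them; the identifiers `PIPELINES` and `differing_stages` are hypothetical, and treating the listed order as the processing order is an assumption.

```python
from typing import Dict, List, Tuple

COMMON_HEAD = ["front_end_gain_control", "nlms"]
COMMON_TAIL = ["dereverberation", "agc"]

# Stage lists as enumerated in the text; only the residual suppressor differs.
PIPELINES: Dict[str, List[str]] = {
    "echo_cancellation_algorithm_1": COMMON_HEAD + ["neural_network"] + COMMON_TAIL,
    "echo_cancellation_algorithm_2": COMMON_HEAD + ["nonlinear_processing"] + COMMON_TAIL,
}


def differing_stages(a: str, b: str) -> List[Tuple[str, str]]:
    """Return the pairs of stages at which two pipelines differ."""
    return [(sa, sb) for sa, sb in zip(PIPELINES[a], PIPELINES[b]) if sa != sb]
```

Comparing the two entries shows a single differing stage: the neural network in algorithm 1 versus non-linear processing in algorithm 2.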

[0154] For example, NLP is used to construct a gain expression to estimate the residual signal based on the similarity (cohxd) between the speech signal and the echo signal output by the audio conversion module, and the similarity (cohxe) between the residual signal and the reference signal.

[0155] For example, the residual signal satisfies the following expression: hnl = alpha * cohxd + (1 - alpha) * cohxe, where hnl represents the residual signal, alpha represents the smoothing weighting coefficient, cohxd represents the similarity between the speech signal and the echo signal, and cohxe represents the similarity between the residual signal and the reference signal.

[0156] The gain satisfies the following expression: gain = max(min_gain, 1.0 - hnl); gain represents the gain, and hnl represents the residual signal.
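The two expressions above can be combined into one small function. This is a direct transcription of the formulas, with assumed default values: `alpha = 0.5` and `min_gain = 0.1` are hypothetical, as the source gives no numeric values, and the function name `nlp_gain` is introduced only for illustration.

```python
def nlp_gain(cohxd: float, cohxe: float, alpha: float = 0.5,
             min_gain: float = 0.1) -> float:
    """Compute gain = max(min_gain, 1.0 - hnl) from the two similarity measures."""
    hnl = alpha * cohxd + (1.0 - alpha) * cohxe  # weighted combination of similarities
    return max(min_gain, 1.0 - hnl)
```

When both similarity values are high, hnl approaches 1 and the gain falls to the `min_gain` floor, so the residual is strongly attenuated; when both are low, the gain stays close to 1 and the signal passes nearly unchanged.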

[0157] It should be noted that the examples of front-end gain control, NLMS, dereverberation, and AGC included in Echo Cancellation Algorithm 2 can be found in the above embodiments, and will not be repeated here.

[0158] Still as shown in Figure 9, after the echo cancellation algorithm module eliminates the echo signal, it retains the speech signal and stores it in shared memory. Then, the data processing module reads the speech signal from the shared memory, converts it into voice commands, and sends them to the corresponding application so that the application can perform the appropriate operation based on the voice commands.

[0159] For example, the process by which the data processing module converts speech signals into voice commands comprises speech recognition, intent understanding, and dialogue management. Speech recognition is used to convert speech signals into text information; intent understanding is used to convert the text information into semantics that the electronic device can understand, and to understand the intent of the text information based on the semantics; dialogue management is used to generate the final voice command to be executed by the electronic device based on the understood intent of the text information.

[0160] For example, speech recognition can be achieved through the automatic speech recognition (ASR) function in electronic devices; intent understanding can be achieved through the natural language understanding (NLU) function in electronic devices; and dialogue management can be achieved through the dialogue management (DM) function in electronic devices.

[0161] In summary, when different echo cancellation algorithms are used in the echo cancellation module, the final echo cancellation capabilities vary because different algorithms use different methods to suppress echoes. Therefore, using different echo cancellation algorithms for different voice interaction services can improve echo cancellation capabilities and human-computer interaction efficiency.

[0162] In some embodiments of this application, the echo cancellation algorithm can also eliminate ambient noise collected by the microphone. The specific implementation process for eliminating ambient noise can be found in related technologies, and will not be elaborated upon here.

[0163] It should be noted that the contents described in the various embodiments of this application can explain and illustrate the technical solutions in other embodiments of this application. The technical features described in the various embodiments can also be applied in other embodiments and combined with the technical features in other embodiments to form new solutions. This application only provides an exemplary list of several embodiments for illustration and does not mean that this application is limited thereto.

[0164] This application provides an electronic device that may include a memory and one or more processors. The memory stores computer program code, which includes computer instructions. When the computer instructions are executed by the processor, the electronic device performs the various functions or steps described above. For the structure of this electronic device, refer to the structure of the electronic device 100 described below.

[0165] Figure 10 is a structural schematic of an electronic device 100 provided in an embodiment of this application. As shown in Figure 10, the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a positioning module 181, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.

[0166] It is understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100. In other embodiments, the electronic device 100 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0167] Processor 110 may include one or more processing units, such as: application processor (AP), modem, graphics processing unit (GPU), image signal processor (ISP), controller, memory, video codec, digital signal processor (DSP), ADSP, baseband processor, and / or neural network processing unit (NPU), etc. Different processing units may be independent devices or integrated into one or more processors.

[0168] In this embodiment, the AP determines the service parameters corresponding to the voice interaction service and sends the corresponding service parameters to the ADSP. After receiving the service parameters sent by the AP, the ADSP performs echo cancellation based on the matching echo cancellation algorithm.
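The hand-off from the AP to the ADSP can be pictured as a parameter message followed by a table lookup. The sketch below is an assumption-laden illustration: the class names, the service identifiers, the `"media"` volume type, and the lookup table are all hypothetical; only the stage names (NLMS, NN, NLP) come from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceParams:
    service_id: str    # identifies the voice interaction service
    volume_type: str   # volume type matching the current volume


# Hypothetical lookup table held on the ADSP side. The stage names follow the
# algorithms named in the claims; the keys are invented for this sketch.
ALGORITHMS = {
    ("third_party_voice_control", "media"): ("NLMS", "NN"),
    ("system_voice_control", "media"): ("NLMS", "NLP"),
}


class Adsp:
    """Stand-in for the ADSP: matches the received parameters to algorithm stages."""

    def __init__(self):
        self.active_stages = ()

    def receive(self, params: ServiceParams) -> tuple:
        # Assumption: an unknown combination leaves echo cancellation disabled.
        self.active_stages = ALGORITHMS.get(
            (params.service_id, params.volume_type), ())
        return self.active_stages


class ApplicationProcessor:
    """Stand-in for the AP: builds the parameters and forwards them to the ADSP."""

    def __init__(self, adsp: Adsp):
        self.adsp = adsp

    def start_service(self, service_id: str, volume_type: str) -> tuple:
        return self.adsp.receive(ServiceParams(service_id, volume_type))
```

Keeping the table on the ADSP side means the AP only needs to send a small parameter record when the service changes, rather than selecting or loading the algorithm itself.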

[0169] The controller can be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction opcode and timing signals to complete the control of fetching and executing instructions.

[0170] The processor 110 may also include a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory can store instructions or data that the processor 110 has just used or that are used repeatedly. If the processor 110 needs to use the instruction or data again, it can retrieve it directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

[0171] In some embodiments, the processor 110 may include one or more interfaces. It is understood that the interface connection relationships between the modules illustrated in this embodiment are merely illustrative and do not constitute a structural limitation of the electronic device 100. In other embodiments, the electronic device 100 may also employ different interface connection methods or combinations of multiple interface connection methods as described in the above embodiments.

[0172] The charging management module 140 receives charging input from a charger, which can be a wireless charger or a wired charger. While charging the battery 142, the charging management module 140 can also supply power to the electronic device via the power management module 141.

[0173] The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and / or the charging management module 140 to power the processor 110, internal memory 121, external memory, display 194, camera 193, and wireless communication module 160, etc.

[0174] The wireless communication function of electronic device 100 can be realized through antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor and baseband processor, etc.

[0175] Electronic device 100 implements display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering. Processor 110 may include one or more GPUs, which execute program instructions to generate or modify display information.

[0176] Display screen 194 is used to display images, videos, etc. Display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.

[0177] Electronic device 100 can implement audio functions such as music playback and recording through audio module 170 (such as the audio conversion module mentioned above), speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, and application processor.

[0178] Audio module 170 is used to convert digital audio information into analog audio signal output, and also to convert analog audio input into digital audio signal. Audio module 170 can also be used for audio signal encoding and decoding. Speaker 170A, also called a "loudspeaker," is used to convert audio electrical signals into sound signals. Receiver 170B, also called an "earpiece," is used to convert audio electrical signals into sound signals. Microphone 170C, also called a "mic" or "sound transducer," is used to convert sound signals into electrical signals. Headphone jack 170D is used to connect wired headphones.

[0179] In this embodiment of the application, when the electronic device starts the voice interaction service, the microphone 170C is used to collect the user's voice commands; at the same time as the microphone 170C collects the user's voice commands, it also collects the echo caused by the speaker 170A playing audio.

[0180] The external storage interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The external memory card communicates with the processor 110 through the external storage interface 120 to perform data storage functions. For example, audio and video files can be stored on the external memory card.

[0181] Internal memory 121 can be used to store computer executable program code, which includes instructions. Processor 110 executes various functional applications and data processing of the electronic device by running the instructions stored in internal memory 121. For example, in this embodiment, processor 110 can execute instructions stored in internal memory 121, which may include a program storage area and a data storage area.

[0182] The program storage area can store the operating system, at least one application program required for a function (such as sound playback, image playback, etc.). The data storage area can store data created during the use of the electronic device (such as audio data, phonebook, etc.). Furthermore, the internal memory 121 can include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.

[0183] For example, processor 110 can execute computer-executable program code stored in internal memory 121, the executable program code including instructions. By executing these instructions, processor 110 performs various functional applications and data processing of the electronic device. For example, in this embodiment, the AP in processor 110 executes corresponding steps by running instructions stored in internal memory and sends messages to the network side via modem.

[0184] Buttons 190 include a power button, volume buttons, etc. Motor 191 can be used for call vibration alerts or for touch vibration feedback. Indicator 192 can be an indicator light, used to indicate charging status, battery level changes, or to indicate messages, missed calls, notifications, etc.

[0185] The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact and separation with the electronic device. The electronic device can support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 195 simultaneously. The multiple cards can be of the same or different types. The SIM card interface 195 is also compatible with different types of SIM cards. The SIM card interface 195 is also compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM card, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

[0186] In addition to the aforementioned components, the electronic device 100 also runs an operating system, such as iOS or Android. Applications can be installed and run on this operating system.

[0187] The software system of electronic device 100 can adopt a layered architecture, event-driven architecture, microkernel architecture, microservice architecture, or cloud architecture, etc. This application embodiment uses the layered architecture Android system as an example to exemplify the software structure of electronic device 100.

[0188] Figure 11 is a software structure block diagram of an electronic device according to an embodiment of this application.

[0189] A layered architecture divides software into several layers, each with a clear role and function. Layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, the hardware abstraction layer (HAL), and the kernel layer.

[0190] The application layer can include a series of application packages (APKs). For example, the application layer may include a voice APK. The voice APK includes a UI layer and a logic layer; the UI layer includes a business presentation module. The business presentation module is used to display a microphone icon when the voice APK on the electronic device initiates a voice interaction service.

[0191] The logic layer includes a service processing module, a data acquisition module, an echo cancellation management module, and a data processing module. The service processing module determines the service identifier corresponding to the voice interaction service and sends this identifier to the data acquisition module in the logic layer. The data acquisition module acquires the current volume of the electronic device and distinguishes the corresponding volume type based on the service identifier. The data acquisition module also acquires the device type of the electronic device.

[0192] The echo cancellation management module determines service parameters based on the service identifier, volume, volume type, and device type, and then sends these parameters to the ADSP via the capability connection interface. The capability connection interface is the connection interface between the software layer and the hardware of the electronic device.
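A minimal sketch of how the service parameters might be assembled, and how the volume check and volume update recited in the claims could sit on top of them, is given below. The mapping from service identifier to volume type, the preset volume of 3, and all function names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical mapping: which volume type is relevant for which service.
VOLUME_TYPE_BY_SERVICE = {
    "media_control": "media",
    "call_control": "call",
}

PRESET_VOLUME = 3  # hypothetical threshold below which echo is assumed negligible


@dataclass(frozen=True)
class ServiceParams:
    service_id: str
    volume: int
    volume_type: str
    device_type: str


def build_service_params(service_id: str, volumes: dict, device_type: str) -> ServiceParams:
    """Pick the volume type from the service identifier, then read that volume."""
    volume_type = VOLUME_TYPE_BY_SERVICE[service_id]
    return ServiceParams(service_id, volumes[volume_type], volume_type, device_type)


def needs_echo_cancellation(params: ServiceParams, preset_volume: int = PRESET_VOLUME) -> bool:
    """Echo cancellation runs only when the volume reaches the preset volume."""
    return params.volume >= preset_volume


def on_volume_changed(params: ServiceParams, new_volume: int) -> ServiceParams:
    """Return updated parameters so the cancellation decision can be re-evaluated."""
    return replace(params, volume=new_volume)
```

For example, building parameters for `"media_control"` with `{"media": 8, "call": 2}` yields a media volume of 8, which passes the check; lowering the volume to 1 through `on_volume_changed` produces parameters that no longer require cancellation.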

[0193] The data processing module converts the echo-free speech signal into voice commands and calls a software development kit (SDK) to send the voice commands to the application framework layer (such as a multimodal interaction framework). For example, as shown in Figure 11, the process by which the data processing module converts speech signals into voice commands includes: speech recognition, intent understanding, and dialogue management.

[0194] The application framework layer provides application programming interfaces (APIs) and a programming framework for applications in the application layer. The application framework layer includes a set of predefined functions.

[0195] As shown in Figure 11, the application framework layer can include a multimodal interaction framework and a page-aware framework. The page-aware framework determines the currently running application on the electronic device based on the currently displayed page. The page-aware framework sends the application identifier (such as application package name, application name, etc.) of the currently running application to the multimodal interaction framework, which then calls the SDK to send voice commands to the corresponding application.

[0196] The Android runtime consists of core libraries and a virtual machine. The Android runtime is responsible for scheduling and managing the Android system.

[0197] The core library consists of two parts: the functions that the Java language needs to call, and the Android core library.

[0198] The application layer and application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

[0199] System libraries can include multiple functional modules. For example: surface manager, media libraries, 3D graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), etc.

[0200] HAL can include multiple library modules, each of which implements a set of interfaces for a specific type of hardware component. For example, HAL includes display HAL, camera HAL, audio HAL, and sensor HAL. The kernel layer is the layer between hardware and software, and includes display drivers, camera drivers, audio drivers, sensor drivers, etc.

[0201] In some embodiments of this application, as shown in Figure 11, the electronic device also includes an ADSP, which comprises an audio conversion module, an echo cancellation algorithm module, and shared memory. The audio conversion module acquires voice commands and echoes captured by the microphone, converting the voice commands into voice signals and the echoes into echo signals. The echo cancellation algorithm module matches the corresponding echo cancellation algorithm based on service parameters, eliminating the echo signals and preserving the voice signals. Shared memory is used to store the voice signals.

[0202] For example, the ADSP stores the echo-free voice signal in shared memory, and the data processing module in the voice APK reads the voice signal from the shared memory and converts it into voice commands.

[0203] Then, the data processing module calls the SDK to send the voice command to the multimodal interaction framework. The multimodal interaction framework obtains the application name (such as the package name) of the application currently running on the electronic device and sends the voice command to the corresponding application so that the application can perform the corresponding operation based on the voice command.
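The flow from shared memory to the target application can be sketched as follows. This is a simplified model under stated assumptions: a `deque` stands in for the shared memory, the class and function names are hypothetical, and `to_command` is a placeholder for the speech-to-command conversion performed by the data processing module.

```python
from collections import deque


class App:
    """Stand-in for an installed application that can act on voice commands."""

    def __init__(self, package_name):
        self.package_name = package_name
        self.received = []

    def handle(self, command):
        self.received.append(command)


class MultimodalInteractionFramework:
    """Routes a voice command to the application currently in the foreground."""

    def __init__(self):
        self._apps = {}
        self.foreground_package = None  # would be reported by the page-aware framework

    def register(self, app):
        self._apps[app.package_name] = app

    def dispatch(self, command):
        app = self._apps.get(self.foreground_package)
        if app is None:
            return False  # no known foreground application to receive the command
        app.handle(command)
        return True


def data_processing_step(shared_memory, framework, to_command):
    """Read one echo-free signal, convert it to a command, and dispatch it."""
    if not shared_memory:
        return False
    signal = shared_memory.popleft()
    return framework.dispatch(to_command(signal))
```

A typical use would register an `App`, set `foreground_package` to its package name, append a signal to the `deque`, and call `data_processing_step` once per available signal.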

[0204] This application also provides a chip system. As shown in Figure 12, the chip system 1100 includes at least one processor 1101 and at least one interface circuit 1102. The processor 1101 can be the processor 110 shown in Figure 10 in the above embodiments. The interface circuit 1102 can be, for example, an interface circuit between the processor 110 and external memory, or an interface circuit between the processor 110 and the internal memory 121.

[0205] The processor 1101 and interface circuit 1102 described above can be interconnected via lines. For example, interface circuit 1102 can be used to receive signals from other devices (e.g., the memory of electronic device 100). As another example, interface circuit 1102 can be used to send signals to other devices (e.g., processor 1101). Exemplarily, interface circuit 1102 can read instructions stored in memory and send those instructions to processor 1101. When the instructions are executed by processor 1101, the electronic device can perform the various steps performed by the electronic device in the above embodiments. Of course, the chip system may also include other discrete devices, and this application embodiment does not specifically limit this.

[0206] This application also provides a computer-readable storage medium including computer instructions that, when executed on an electronic device, cause the electronic device to perform various functions or steps performed by the electronic device in the above method embodiments.

[0207] This application also provides a computer program product that, when run on a computer, causes the computer to perform various functions or steps performed by the electronic device in the above method embodiments.

[0208] Through the above description of the embodiments, those skilled in the art can clearly understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.

[0209] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of modules or units is only a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

[0210] The units described as separate components may or may not be physically separate. A component shown as a unit can be one or more physical units; that is, it can be located in one place or distributed in multiple different locations. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0211] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0212] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application in essence, or the part that contributes over the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0213] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. An echo cancellation method, characterized in that the method is applied in an electronic device, wherein the electronic device is equipped with a voice APK that supports wake-up-free operation, and the voice APK integrates multiple voice interaction services; the multiple voice interaction services include a first voice interaction service; the electronic device includes a microphone and a sound-emitting device; the method includes: in response to the electronic device initiating the first voice interaction service, the electronic device obtains first service parameters, wherein the first service parameters include the service identifier corresponding to the first voice interaction service and the volume type corresponding to the current volume of the electronic device; the electronic device obtains, based on the first service parameters, a first echo cancellation algorithm corresponding to the first service parameters; the electronic device uses the first echo cancellation algorithm to eliminate the echo collected by the microphone; the echo is caused by the microphone collecting the audio emitted by the sound-emitting device.

2. The method according to claim 1, characterized in that, The plurality of voice interaction services also includes a second voice interaction service; the method further includes: In the scenario where the electronic device switches from the first voice interaction service to the second voice interaction service, in response to the electronic device activating the second voice interaction service, the electronic device acquires second service parameters; the second service parameters are used to indicate parameter information when the electronic device performs audio functions in the scenario where the electronic device activates the second voice interaction service. The electronic device obtains a second echo cancellation algorithm corresponding to the second service parameters based on the second service parameters; The electronic device uses the second echo cancellation algorithm to eliminate the echo collected by the microphone. The first echo cancellation algorithm and the second echo cancellation algorithm have different abilities to eliminate the echo.

3. The method according to claim 1 or 2, characterized in that, The electronic device acquires the first service parameters, including: In response to the electronic device initiating the first voice interaction service, the electronic device determines the service identifier corresponding to the first voice interaction service; The electronic device obtains the current volume of the electronic device and distinguishes the volume type corresponding to the volume based on the service identifier.

4. The method according to claim 3, characterized in that, The first service parameter also includes the current volume of the electronic device; the electronic device uses a first echo cancellation algorithm to cancel the echo collected by the microphone, including: If the current volume of the electronic device is greater than or equal to a preset volume, the electronic device uses a first echo cancellation algorithm to cancel the echo collected by the microphone.

5. The method according to claim 4, characterized in that, The method further includes: If the volume changes, the electronic device updates the first service parameter and, based on the updated first service parameter, determines whether to eliminate the echo collected by the electronic device through the microphone.

6. The method according to claim 1, characterized in that, The electronic device further includes an audio processor (ADSP); the electronic device acquires first service parameters, including: The electronic device sends the first service parameters to the ADSP via the voice APK; The electronic device obtains a first echo cancellation algorithm corresponding to the first service parameters based on the first service parameters, including: The electronic device uses the ADSP to match the first echo cancellation algorithm corresponding to the first service parameters.

7. The method according to any one of claims 1, 2, 4-6, characterized in that, The method further includes: The electronic device uses the first echo cancellation algorithm to eliminate the environmental noise collected by the microphone.

8. The method according to any one of claims 1, 2, 4-6, characterized in that, The method further includes: In response to the electronic device activating any one of the plurality of voice interaction services, the electronic device displays a microphone icon; The microphone icon is used to indicate that the electronic device is using the microphone to collect the user's voice commands.

9. The method according to any one of claims 2, 4-6, characterized in that, The first voice interaction service is used to implement various operation functions in third-party applications via voice, and the second voice interaction service is used to implement various operation functions in system applications via voice.

10. The method according to claim 9, characterized in that, In the scenario where the electronic device initiates the first voice interaction service, the first echo cancellation algorithm includes at least: a normalized least mean square (NLMS) adaptive filtering algorithm and a neural network (NN) algorithm; In the scenario where the electronic device initiates the second voice interaction service, the second echo cancellation algorithm includes at least the NLMS algorithm and a nonlinear processing (NLP) algorithm.

11. The method according to any one of claims 1, 2, 4-6, and 10, characterized in that, The electronic device employs a first echo cancellation algorithm to eliminate the echo collected by the microphone, including: In response to the electronic device initiating the first voice interaction service, the electronic device acquires first sound information through the microphone; wherein, the first sound information includes a first voice command and a first echo; The electronic device uses the first echo cancellation algorithm to eliminate the first echo included in the first sound information, while retaining the first voice command included in the first sound information; The method further includes: The electronic device performs the operation corresponding to the first voice command.

12. An electronic device, characterized in that, The electronic device includes: a microphone, a sound-emitting device, a memory, and one or more processors; The memory stores computer program code, which includes computer instructions; when the computer instructions are executed by the processor, the electronic device performs the method as described in any one of claims 1-11.

13. A chip system, characterized in that, The chip system, applied to an electronic device, includes: at least one processor and an interface for receiving instructions and transmitting them to the at least one processor; the at least one processor executes the instructions to cause the electronic device to perform the method as described in any one of claims 1-11.

14. A computer-readable storage medium, characterized in that, Includes computer instructions that, when executed on an electronic device, cause the electronic device to perform the method as described in any one of claims 1-11.