A multi-feedback AGC identification method and system

By combining a multi-feedback AGC recognition system with a CNN convolutional neural network, adaptive gain adjustment of the audio AGC circuit is achieved, solving the problem of residual echo affecting calls and improving the quality of audio processing.

CN122245331A · Pending · Publication Date: 2026-06-19 · HANGZHOU XUJIAN SCI & TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU XUJIAN SCI & TECH CO LTD
Filing Date
2024-12-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing audio AGC circuits suffer from residual echo affecting call quality when processing microphone sound, leading to inappropriate gain adjustment and echo amplification that can worsen call performance.

Method used

A multi-feedback AGC recognition system is adopted, which uses a microphone module to collect near-end sound and a speaker module to play far-end sound. An echo cancellation module is used to eliminate echoes, and the multi-feedback AGC module performs adaptive gain adjustment. Combined with a CNN convolutional neural network to detect the presence of human voices, the gain is dynamically adjusted to reduce residual echoes.

Benefits of technology

It reduces residual echo caused by far-end background noise, addresses the problem of inappropriate AGC gain adjustment, and improves call quality.

✦ Generated by Eureka AI based on patent content.

Figure CN122245331A_ABST

Abstract

This invention relates to a multi-feedback AGC (Automatic Gain Control) identification method and system. The multi-feedback AGC identification system includes: a microphone module, an echo cancellation module, a multi-feedback AGC module, a network audio transmission module, a network receiving module, a remote processing module, and a speaker module. The microphone module collects the near-end call sound together with the far-end sound played by the speaker module (i.e., the echo). The echo cancellation module subtracts the far-end sound frequency slices from the near-end sound frequency slices according to the slice difference to eliminate the echo. The multi-feedback AGC module then adjusts the gain of the near-end sound. The remote processing module sends the far-end PCM data in which a human voice has been detected to the speaker module and the echo cancellation module respectively, thereby reducing the echo caused by far-end background noise. This solves the problem in the existing technology where residual echo causes inappropriate gain adjustment by the AGC and is itself amplified by the AGC, affecting call quality.

Description

Technical Field

[0001] This invention relates to the field of audio processing technology, specifically a multi-feedback AGC recognition method and system.

Background Technology

[0002] An audio AGC circuit is a circuit that automatically adjusts the amplifier gain. It detects the input signal to obtain a control voltage reflecting the signal strength, and then uses this control voltage to adjust the amplifier gain, thereby maintaining the output signal at a stable level. It is widely used in audio amplifiers, speech recognition, audio processing, and other fields to process transmitted analog signals, such as audio signals.
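The feedback loop described in [0002] can be illustrated in software. The following is a minimal sketch, not the circuit from the patent: it assumes float PCM frames and treats the frame RMS level as the "control" value. The function names and the parameters `target_rms`, `step`, and `max_gain` are hypothetical choices made for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_agc(frames, target_rms=0.1, step=0.5, max_gain=10.0):
    """Hypothetical energy-based AGC: measure the level, nudge the gain
    toward the value that would bring the frame to target_rms."""
    gain = 1.0
    out = []
    for frame in frames:
        level = frame_rms(frame)
        if level > 1e-9:  # skip digital silence to avoid division by zero
            desired = min(target_rms / level, max_gain)
            # Move part of the way toward the desired gain (smoothing)
            gain += step * (desired - gain)
        out.append([s * gain for s in frame])
    return out, gain
```

Because this loop looks only at signal energy, any quiet frame is boosted toward the target level regardless of what it contains.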

[0003] Existing audio AGC (Automatic Gain Control) is based solely on the energy of the audio signal, which can meet the needs of most scenarios. However, in some special devices, the microphone sound still retains residual echoes after echo cancellation. These residual echoes can cause inappropriate gain adjustment by AGC, and the residual echoes can also be amplified by AGC, affecting the call quality.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, the present invention aims to provide a multi-feedback AGC identification method and system to solve the problem in the prior art where residual echo causes inappropriate gain adjustment in AGC, and the residual echo is also amplified by AGC, affecting call quality.

[0005] To achieve the above objectives, this invention proposes a multi-feedback AGC recognition system, comprising:

[0006] Microphone module: The microphone module is used to collect near-end call sound and far-end sound (i.e., echo) played by the speaker module, and sends the collected near-end PCM data to the echo cancellation module;

[0007] Echo Cancellation Module: The echo cancellation module receives PCM data of the near-end sound from the microphone module and PCM data of the far-end sound from the far-end processing module. It divides both PCM streams into segments and performs an FFT on each to obtain FFT frequency segments. The least mean square algorithm is used to match the two sets of frequency segments and find the segment difference between them. The time difference between the near-end sound PCM data and the far-end sound PCM data is obtained by multiplying the segment difference by the segment duration, and the time difference is sent to the multi-feedback AGC module. The echo cancellation module subtracts the far-end sound frequency segment from the near-end sound frequency segment according to the segment difference to eliminate the echo, and performs an inverse FFT on the echo-cancelled near-end sound frequency segment to convert it into near-end sound PCM data, which is then sent to the multi-feedback AGC module.

[0008] Multi-feedback AGC module: The multi-feedback AGC module is used to receive the near-end PCM sound from the echo cancellation module, the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound, and the duration of the human voice in the far-end sound from the far-end processing module. It also performs near-end sound gain adjustment and then sends the processed near-end sound PCM data to the network audio transmission module.

[0009] Network audio transmission module: The network audio transmission module is used to receive PCM data of near-end sound from the multi-feedback AGC module, encode and compress it, and send it to the remote end via the network;

[0010] Network receiving module: The network receiving module receives compressed data of remote sound through the network, decodes it and converts it into PCM data of remote sound, and then sends it to the remote processing module.

[0011] Remote processing module: The remote processing module is used to receive remote PCM data from the network receiving module and send the detected remote PCM data of human voice to the speaker module and echo cancellation module respectively to reduce the echo caused by remote background noise.

[0012] The remote processing module sends the duration of the human voice in the remote sound to the multi-feedback AGC module;

[0013] Speaker module: The speaker module receives PCM data of the remote sound from the remote processing module and plays the remote sound.
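The echo cancellation module's processing (segmenting, transforming, matching the segment offset, converting it to a time difference, and subtracting) can be sketched as follows. This is an illustrative simplification under stated assumptions: a naive DFT stands in for an FFT library, the matching step is a plain minimum mean-squared-error search over candidate offsets rather than an adaptive LMS filter, and the echo is assumed to be an unscaled, delayed copy of the far-end signal. All function names and the `max_offset` parameter are hypothetical.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform (stand-in for an FFT call)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning real-valued samples."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split(pcm, frame_len):
    """Cut PCM into consecutive, non-overlapping segments."""
    return [pcm[i:i + frame_len] for i in range(0, len(pcm) - frame_len + 1, frame_len)]


def best_offset(near_specs, far_specs, max_offset):
    """Segment offset that minimises the mean squared spectral error.
    Requires max_offset < len(near_specs)."""
    def mse(d):
        pairs = list(zip(near_specs[d:], far_specs))
        err = sum(abs(a - b) ** 2 for ns, fs in pairs for a, b in zip(ns, fs))
        return err / len(pairs)
    return min(range(max_offset + 1), key=mse)


def cancel_echo(near_pcm, far_pcm, frame_len, sample_rate, max_offset):
    near_specs = [dft(f) for f in split(near_pcm, frame_len)]
    far_specs = [dft(f) for f in split(far_pcm, frame_len)]
    d = best_offset(near_specs, far_specs, max_offset)
    # Time difference = segment difference x segment duration
    time_diff = d * frame_len / sample_rate
    cleaned = []
    for i, ns in enumerate(near_specs):
        j = i - d  # far-end segment that is heard in near-end segment i
        if 0 <= j < len(far_specs):
            ns = [a - b for a, b in zip(ns, far_specs[j])]
        cleaned.extend(idft(ns))
    return cleaned, time_diff
```

In practice the echo is filtered by the room and the loudspeaker, so direct subtraction leaves a residue; that residue is the residual echo that the multi-feedback AGC module is meant to handle.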

[0014] Preferably, the multi-feedback AGC module performs FFT transformation on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks the continuous frequency slices to form a two-dimensional matrix, representing the frequency and time dimension of the sound, which is used as the input of the CNN convolutional neural network. The output of the CNN convolutional neural network is a continuous probability value of whether it is a human voice. When the probability value is greater than a set threshold, the multi-feedback AGC module determines that the near-end sound contains a human voice.
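To make the detection path in [0014] concrete, the sketch below stacks per-segment magnitude spectra into a time-by-frequency matrix and passes it through a deliberately tiny network: one 3x3 convolution, ReLU, global average pooling, and a sigmoid. The patent does not specify the network architecture or weights, so the layer layout, the `kernel`, `weight`, and `bias` parameters, and all function names are assumptions; a real detector would use a trained model.

```python
import cmath
import math


def magnitude_slice(frame):
    """Magnitude spectrum of one segment (naive DFT, non-negative bins only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(pcm, frame_len):
    """Stack consecutive frequency slices: rows are time, columns are frequency."""
    frames = [pcm[i:i + frame_len] for i in range(0, len(pcm) - frame_len + 1, frame_len)]
    return [magnitude_slice(f) for f in frames]


def conv2d_valid(matrix, kernel):
    """Single-channel 2-D convolution without padding."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(matrix) - kh + 1, len(matrix[0]) - kw + 1
    return [[sum(kernel[a][b] * matrix[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def voice_probability(matrix, kernel, weight, bias):
    """Conv -> ReLU -> global average pool -> sigmoid (untrained, illustrative)."""
    feature_map = conv2d_valid(matrix, kernel)
    activations = [max(0.0, v) for row in feature_map for v in row]
    pooled = sum(activations) / len(activations)
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))


def contains_voice(pcm, frame_len, kernel, weight, bias, threshold=0.5):
    """Compare the continuous probability against a set threshold."""
    matrix = spectrogram(pcm, frame_len)
    return voice_probability(matrix, kernel, weight, bias) > threshold
```

The same structure would apply to the far-end detector; only the input PCM stream differs.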

[0015] When the multi-feedback AGC module detects a human voice in the near-end sound, it also determines the time during which the far-end sound produces an echo in the near-end sound and adjusts the dynamic gain accordingly. When the volume of the detected near-end human voice is low, the gain coefficient is increased; when the volume of the detected near-end human voice is high, the gain coefficient is decreased.

[0016] The multi-feedback AGC module multiplies each sample of the PCM data of the near-end sound by the gain coefficient to achieve adaptive gain adjustment of the near-end sound. During the time when the far-end sound produces an echo in the near-end sound, the multi-feedback AGC module multiplies the gain-adjusted near-end sound by an attenuation coefficient to reduce the impact of the residual echo of the far-end sound on the listening experience.
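The gain logic in [0015] and [0016] can be sketched as two small functions: one that raises or lowers the gain value according to the detected voice level, and one that multiplies every sample by that value and additionally by an attenuation coefficient inside the echo window. The thresholds, step size, gain limits, and the attenuation value of 0.3 are hypothetical; the patent does not give numeric values.

```python
def update_gain(gain, voice_rms, low=0.05, high=0.3, step=1.1,
                min_gain=0.1, max_gain=10.0):
    """Raise the gain for a quiet voice, lower it for a loud one."""
    if voice_rms < low:
        gain *= step
    elif voice_rms > high:
        gain /= step
    return max(min_gain, min(max_gain, gain))


def apply_gain(pcm, gain, sample_rate, echo_start_s, echo_end_s, attenuation=0.3):
    """Multiply each sample by the gain; inside the echo window,
    also multiply by the attenuation coefficient."""
    out = []
    for n, sample in enumerate(pcm):
        t = n / sample_rate  # time of this sample in seconds
        g = gain * attenuation if echo_start_s <= t < echo_end_s else gain
        out.append(sample * g)
    return out
```

For example, `apply_gain([1.0] * 4, 2.0, 4, 0.25, 0.75)` leaves the first and last samples at the full gain and attenuates the two samples that fall inside the window.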

[0017] Preferably, the remote processing module performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is a continuous probability value indicating whether the sound is a human voice. When the probability value is greater than a set threshold, the remote processing module determines that a human voice has been detected.

[0018] To achieve the above objectives, this invention proposes a multi-feedback AGC recognition method, comprising the following steps:

[0019] 1) The microphone module collects near-end call sound and far-end sound played by the speaker module, and sends the collected near-end PCM data to the echo cancellation module;

[0020] 2) The echo cancellation module removes echoes from the PCM data;

[0021] 2.1) The echo cancellation module receives PCM data of the near-end sound from the microphone module and PCM data of the far-end sound from the far-end processing module.

[0022] 2.2) The echo cancellation module performs FFT transformation on the two PCM segments to obtain FFT frequency segments;

[0023] 2.3) Use the least mean square algorithm to match the frequency slice difference between the two frequency slices;

[0024] 2.4) The time difference between the PCM data of the near-end sound and the PCM data of the far-end sound is obtained by multiplying the segment difference by the segment duration;

[0025] 2.5) The echo cancellation module sends the time difference to the multi-feedback AGC module;

[0026] 2.6) The echo cancellation module subtracts the far-end sound frequency segment from the near-end sound frequency segment according to the segment difference to achieve echo cancellation;

[0027] 2.7) The echo cancellation module performs an inverse FFT transformation on the near-end sound frequency slices to convert them into near-end sound PCM data, which is then sent to the multi-feedback AGC module.

[0028] 3) The multi-feedback AGC module adjusts the near-end sound gain;

[0029] 3.1) The multi-feedback AGC module receives the near-end PCM of the echo cancellation module, the time difference between the near-end PCM data and the far-end PCM data, and the duration of human voice presence in the far-end sound from the far-end processing module.

[0030] 3.2) Multi-feedback AGC module detects human voice;

[0031] 3.3) When the multi-feedback AGC module detects the presence of human voices in the near-end sound, it simultaneously determines the time when the far-end sound produces an echo in the near-end sound and adjusts the dynamic gain accordingly.

[0032] 3.4) The multi-feedback AGC module sends the processed near-end sound PCM data to the network audio transmission module;

[0033] 4) The network audio transmission module receives the PCM data of the near-end sound from the multi-feedback AGC module, encodes and compresses it, and then sends it to the remote end via the network;

[0034] 5) The network receiving module receives compressed data of the remote sound through the network, then decodes it and converts it into PCM data of the remote sound before sending it to the remote processing module.

[0035] 6) The remote processing module processes remote PCM data and detects human voices;

[0036] 6.1) The remote processing module receives remote PCM data from the network receiving module;

[0037] 6.2) Remote processing module detects human voice;

[0038] 6.3) The remote processing module sends the detected remote PCM data of human voice to the speaker module and the echo cancellation module respectively to reduce the echo caused by remote background noise;

[0039] 6.4) The remote processing module sends the duration of the human voice in the remote sound to the multi-feedback AGC module;

[0040] 7) The speaker module receives the PCM data of the remote sound from the remote processing module and plays the remote sound.
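Steps 6.1) to 6.4) can be sketched as a gate on the far-end stream: only frames in which a voice is detected are forwarded to the speaker and the echo canceller, and the start and duration of each voiced run are reported to the AGC module. The sketch assumes that a per-frame voice probability is already available and that non-voice frames are replaced by silence; the patent only says that the detected voice data is forwarded, so the silence substitution, the function name, and the interval format are assumptions.

```python
def process_far_end(frames, voice_probs, frame_duration_s, threshold=0.5):
    """Gate far-end frames on voice detection and collect voiced intervals.

    Returns (forwarded_frames, intervals), where each interval is
    (start_time_s, duration_s) of a run of consecutive voiced frames.
    """
    count = min(len(frames), len(voice_probs))
    forwarded = []
    intervals = []
    run_start = None  # index of the first frame in the current voiced run
    for i in range(count):
        if voice_probs[i] > threshold:
            forwarded.append(frames[i])
            if run_start is None:
                run_start = i
        else:
            # Assumption: background-noise frames are replaced by silence
            forwarded.append([0.0] * len(frames[i]))
            if run_start is not None:
                intervals.append((run_start * frame_duration_s,
                                  (i - run_start) * frame_duration_s))
                run_start = None
    if run_start is not None:  # stream ended while voice was still present
        intervals.append((run_start * frame_duration_s,
                          (count - run_start) * frame_duration_s))
    return forwarded, intervals
```

Because background-noise frames never reach the speaker, they cannot be picked up by the microphone as echo, which is the stated purpose of step 6.3).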

[0041] Preferably, in step 3.2):

[0042] The multi-feedback AGC module performs FFT transformation on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks the continuous frequency slices to form a two-dimensional matrix, representing the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is a continuous probability value indicating whether the sound is a human voice. When the probability value is greater than a set threshold, the multi-feedback AGC module determines that a human voice is detected in the near-end sound.

[0043] Preferably, in step 3.3):

[0044] The time during which the far-end sound produces an echo in the near-end sound is obtained by the multi-feedback AGC module by adding the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound to the time during which a human voice is present in the far-end sound.
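One plausible reading of [0044] is that the echo window in the near-end signal is the far-end voice interval shifted later by the near-end/far-end time difference. The short sketch below works through that arithmetic; the function name and the (start, duration) representation are assumptions, and the paragraph could support other readings.

```python
def echo_window(far_voice_start_s, far_voice_duration_s, time_diff_s):
    """Shift the far-end voice interval by the measured time difference.

    Returns (start_s, end_s) of the period in which the far-end voice
    is expected to appear as echo in the near-end sound.
    """
    start = far_voice_start_s + time_diff_s
    return start, start + far_voice_duration_s


# Worked example: far-end voice from 1.0 s lasting 0.5 s, time difference 0.25 s
window = echo_window(1.0, 0.5, 0.25)
print(window)  # (1.25, 1.75)
```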

[0045] Preferably, in step 3.3):

[0046] The gain coefficient is increased when the volume of the detected near-end human voice is low, and decreased when the volume of the detected near-end human voice is high.

[0047] Preferably, the multi-feedback AGC module multiplies each sample of the PCM data of the near-end sound by the gain coefficient to achieve adaptive gain adjustment of the near-end sound.

[0048] Preferably, the multi-feedback AGC module multiplies the near-end sound by an attenuation coefficient during the time it takes for the far-end sound to echo the near-end sound, thereby reducing the impact of the residual echo of the far-end sound on the listening experience.

[0049] Preferably, in step 6.2):

[0050] The remote processing module performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is a continuous probability value indicating whether the sound is a human voice. When the probability value is greater than a set threshold, the remote processing module determines that a human voice has been detected.

[0051] Compared with the prior art, the present invention has the following advantages:

[0052] This invention uses a microphone module to collect the near-end call sound together with the far-end sound played by the speaker module (i.e., the echo). The collected near-end PCM data is sent to an echo cancellation module. The echo cancellation module subtracts the far-end sound frequency segments from the near-end sound frequency segments according to the segment difference to eliminate the echo. The echo-cancelled near-end sound frequency segments are then converted into near-end sound PCM data using an inverse FFT and sent to a multi-feedback AGC module, which adjusts the near-end sound gain. A far-end processing module sends the far-end PCM data in which a human voice has been detected to both the speaker module and the echo cancellation module to reduce echoes caused by far-end background noise. The far-end processing module also sends the duration of the far-end human voice to the multi-feedback AGC module. The speaker module receives the far-end sound PCM data from the far-end processing module and plays the far-end sound, so residual echoes caused by far-end background noise are reduced. This solves the problem in existing technologies where residual echoes cause inappropriate gain adjustment by the AGC and are themselves amplified by the AGC, affecting call quality.

Attached Figure Description

[0053] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0054] Figure 1 is a schematic diagram of the overall logic of the present invention;

[0055] In the diagram: Microphone module 1, Echo cancellation module 2, Multi-feedback AGC module 3, Network audio transmission module 4, Network reception module 5, Remote processing module 6, Speaker module 7.

Detailed Implementation

[0056] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.

[0057] Example 1: As shown in Figure 1, this invention proposes a multi-feedback AGC recognition system, comprising:

[0058] Microphone Module 1: Microphone Module 1 is used to collect near-end call sound and far-end sound played by speaker module 7, and sends the collected near-end PCM data to echo cancellation module 2;

[0059] Echo Cancellation Module 2: Echo Cancellation Module 2 receives PCM data of the near-end sound from Microphone Module 1 and PCM data of the far-end sound from Far-end Processing Module 6. It divides both PCM streams into segments and performs an FFT on each to obtain FFT frequency segments. It uses the least mean square algorithm to match the two sets of frequency segments and find the segment difference between them. It obtains the time difference between the near-end sound PCM data and the far-end sound PCM data by multiplying the segment difference by the segment duration, and sends the time difference to Multi-Feedback AGC Module 3. Echo Cancellation Module 2 subtracts the far-end sound frequency segment from the near-end sound frequency segment according to the segment difference to eliminate the echo, performs an inverse FFT on the echo-cancelled near-end sound frequency segment to convert it into near-end sound PCM data, and sends it to Multi-Feedback AGC Module 3.

[0060] Multi-feedback AGC module 3: Multi-feedback AGC module 3 is used to receive the near-end sound PCM data from echo cancellation module 2, the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound, and the duration of the human voice in the far-end sound from far-end processing module 6. It performs near-end sound gain adjustment and then sends the processed near-end sound PCM data to network audio transmission module 4.

[0061] Network audio transmission module 4: Network audio transmission module 4 is used to receive PCM data of near-end sound from multi-feedback AGC module 3, encode and compress it, and send it to the remote end via the network;

[0062] Network receiving module 5: Network receiving module 5 receives compressed data of remote sound through the network, decodes it and converts it into PCM data of remote sound, and sends it to remote processing module 6;

[0063] Remote processing module 6: The remote processing module 6 is used to receive remote PCM data from the network receiving module 5, and send the detected remote PCM data of human voice to the speaker module 7 and the echo cancellation module 2 respectively to reduce the echo generated by remote background noise.

[0064] The remote processing module 6 sends the duration of the human voice in the remote sound to the multi-feedback AGC module 3;

[0065] Speaker module 7: Speaker module 7 receives PCM data of the remote sound from remote processing module 6 and plays the remote sound.

[0066] The multi-feedback AGC module 3 performs FFT transformation on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is a continuous probability value indicating whether the sound is a human voice. When the probability value is greater than the set threshold, the multi-feedback AGC module 3 determines that the near-end sound contains a human voice.

[0067] When the multi-feedback AGC module 3 detects a human voice in the near-end sound, it also determines the time during which the far-end sound produces an echo in the near-end sound and adjusts the dynamic gain accordingly. When the volume of the detected near-end human voice is low, the gain coefficient is increased; when the volume of the detected near-end human voice is high, the gain coefficient is decreased.

[0068] The multi-feedback AGC module 3 multiplies each sample of the PCM data of the near-end sound by the gain coefficient to achieve adaptive gain adjustment of the near-end sound. During the time when the far-end sound produces an echo in the near-end sound, the multi-feedback AGC module 3 multiplies the gain-adjusted near-end sound by an attenuation coefficient to reduce the impact of the residual echo of the far-end sound on the listening experience.

[0069] The remote processing module 6 performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module 6 stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is the probability value indicating whether the sound is a human voice. When the probability value is greater than the set threshold, the remote processing module 6 determines that a human voice has been detected.

[0070] This embodiment uses a microphone module to collect the near-end call sound together with the far-end sound played by the speaker module (i.e., the echo). The collected near-end PCM data is sent to an echo cancellation module. The echo cancellation module subtracts the far-end sound frequency slices from the near-end sound frequency slices according to the slice difference to eliminate the echo. The echo-cancelled near-end sound frequency slices are then converted into near-end sound PCM data using an inverse FFT and sent to a multi-feedback AGC module, which adjusts the near-end sound gain. A far-end processing module sends the far-end PCM data in which a human voice has been detected to both the speaker module and the echo cancellation module to reduce echoes caused by far-end background noise. The far-end processing module also sends the duration of the far-end human voice to the multi-feedback AGC module. The speaker module receives the far-end sound PCM data from the far-end processing module and plays the far-end sound, so residual echoes caused by far-end background noise are reduced. This solves the problem in existing technologies where residual echoes cause inappropriate gain adjustment by the AGC and are themselves amplified by the AGC, affecting call quality.

[0071] Example 2: As shown in Figure 1, this invention proposes a multi-feedback AGC recognition method, which includes the following steps:

[0072] 1) Microphone module 1 collects near-end call sound and far-end sound played by speaker module 7, and sends the collected near-end PCM data to echo cancellation module 2;

[0073] 2) Echo cancellation module 2 eliminates echoes from the PCM data.

[0074] 2.1) The echo cancellation module 2 receives the PCM data of the near-end sound from the microphone module 1 and the PCM data of the far-end sound from the far-end processing module 6.

[0075] 2.2) Echo cancellation module 2 performs FFT transformation on the two PCM segments to obtain FFT frequency segments.

[0076] 2.3) Apply the least mean square algorithm to the two frequency slices to match the slice difference.

[0077] 2.4) The time difference between the PCM data of the near-end sound and the PCM data of the far-end sound is obtained by multiplying the segment difference by the segment duration;

[0078] 2.5) The echo cancellation module 2 sends the time difference to the multi-feedback AGC module 3;

[0079] 2.6) Echo cancellation module 2 subtracts the far-end sound frequency segment from the near-end sound frequency segment according to the segment difference to achieve echo cancellation;

[0080] 2.7) The echo cancellation module 2 performs an inverse FFT transformation on the near-end sound frequency segments to convert them into near-end sound PCM data and sends them to the multi-feedback AGC module 3;

[0081] 3) Multi-feedback AGC module 3 adjusts and processes near-end sound gain.

[0082] 3.1) The multi-feedback AGC module 3 receives the near-end sound PCM data from the echo cancellation module 2, the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound, and the duration of the human voice in the far-end sound from the far-end processing module 6.

[0083] 3.2) Multi-feedback AGC module 3 detects human voice;

[0084] 3.3) When the multi-feedback AGC module 3 detects the presence of human voices in the near-end sound, it simultaneously determines the time when the far-end sound produces an echo in the near-end sound and adjusts the dynamic gain accordingly.

[0085] 3.4) Multi-feedback AGC module 3 sends the processed near-end sound PCM data to the network audio transmission module 4;

[0086] 4) The network audio transmission module 4 receives the PCM data of the near-end sound from the multi-feedback AGC module 3, encodes and compresses it, and then sends it to the remote end via the network;

[0087] 5) The network receiving module 5 receives the compressed data of the remote sound through the network, then decodes it and converts it into PCM data of the remote sound, which is then sent to the remote processing module 6.

[0088] 6) Remote processing module 6 processes remote PCM data and detects human voice.

[0089] 6.1) The remote processing module 6 receives remote PCM data from the network receiving module 5;

[0090] 6.2) Remote processing module 6 detects human voice;

[0091] 6.3) The remote processing module 6 sends the detected remote PCM data of human voice to the speaker module 7 and the echo cancellation module 2 respectively to reduce the echo generated by remote background noise;

[0092] 6.4) The remote processing module 6 sends the duration of the human voice in the remote sound to the multi-feedback AGC module 3;

[0093] 7) The speaker module 7 receives the PCM data of the remote sound from the remote processing module 6 and plays the remote sound.

[0094] This embodiment uses a microphone module to collect the near-end call sound together with the far-end sound played by the speaker module (i.e., the echo). The collected near-end PCM data is sent to an echo cancellation module. The echo cancellation module subtracts the far-end sound frequency slices from the near-end sound frequency slices according to the slice difference to eliminate the echo. The echo-cancelled near-end sound frequency slices are then converted into near-end sound PCM data using an inverse FFT and sent to a multi-feedback AGC module, which adjusts the near-end sound gain. A far-end processing module sends the far-end PCM data in which a human voice has been detected to both the speaker module and the echo cancellation module to reduce echoes caused by far-end background noise. The far-end processing module also sends the duration of the far-end human voice to the multi-feedback AGC module. The speaker module receives the far-end sound PCM data from the far-end processing module and plays the far-end sound, so residual echoes caused by far-end background noise are reduced. This solves the problem in existing technologies where residual echoes cause inappropriate gain adjustment by the AGC and are themselves amplified by the AGC, affecting call quality.

[0095] Example 3: This example is based on the technical solution and working principle of Example 2 and differs from it in that:

[0096] In step 3.2), the specific method for the multi-feedback AGC module to detect human voice is as follows:

[0097] The multi-feedback AGC module 3 performs FFT transformation on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is a continuous probability value indicating whether the sound is a human voice. When the probability value is greater than a set threshold, the multi-feedback AGC module 3 determines that the near-end sound contains a human voice.

[0098] Example 4: This example is based on the technical solution and working principle of Example 2 or Example 3 and differs from them in that:

[0099] In step 3.3):

[0100] The time during which the far-end sound produces an echo in the near-end sound is obtained by adding the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound to the time during which a human voice is present in the far-end sound.

[0101] Example 5: This example is based on the technical solutions and working principles of Example 2, Example 3, or Example 4 and differs from them in that:

[0102] In step 3.3):

[0103] The gain coefficient is increased when the volume of the detected near-end human voice is low, and decreased when the volume of the detected near-end human voice is high.

[0104] The multi-feedback AGC module 3 multiplies each sample of the PCM data of the near-end sound by the gain coefficient to achieve adaptive gain adjustment of the near-end sound.

[0105] The multi-feedback AGC module 3 multiplies the near-end sound by an attenuation coefficient during the time it takes for the far-end sound to echo the near-end sound, thereby reducing the impact of the residual echo of the far-end sound on the listening experience.

[0106] In step 6.2):

[0107] The remote processing module 6 performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module 6 stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimensions of the sound. This matrix is used as the input of the convolutional neural network (CNN). The output of the CNN is the probability value indicating whether the sound is a human voice. When the probability value is greater than the set threshold, the remote processing module 6 determines that a human voice has been detected.

[0108] The above description is merely a specific embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent changes or modifications made to the structure, features and principles described in the claims of the present invention should be included within the scope of the claims of the present invention.

[0109] The above-described embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A multi-feedback AGC-based identification system, characterized in that it includes:
Microphone module (1): the microphone module (1) is used to collect the near-end call sound and the far-end sound played by the speaker module (7), and to send the collected near-end PCM data to the echo cancellation module (2);
Echo cancellation module (2): the echo cancellation module (2) is used to receive the PCM data of the near-end sound from the microphone module (1) and the PCM data of the far-end sound from the remote processing module (6), and to perform an FFT on the two received PCM data segments to obtain FFT frequency slices; the least mean square algorithm is applied to the two sets of frequency slices to match the slice difference between them; the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound is obtained by multiplying the slice difference by the slice duration, and the time difference is sent to the multi-feedback AGC module (3); the echo cancellation module (2) subtracts the far-end sound frequency slice from the near-end sound frequency slice according to the slice difference to achieve echo cancellation, performs an inverse FFT on the echo-cancelled near-end sound frequency slices to convert them into near-end sound PCM data, and sends the data to the multi-feedback AGC module (3);
Multi-feedback AGC module (3): the multi-feedback AGC module (3) is used to receive, from the echo cancellation module (2), the PCM data of the near-end sound and the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound, and to receive, from the remote processing module (6), the time of human voice presence in the far-end sound; it performs gain adjustment on the near-end sound and then sends the processed near-end sound PCM data to the network audio transmission module (4);
Network audio transmission module (4): the network audio transmission module (4) is used to receive the PCM data of the near-end sound from the multi-feedback AGC module (3), encode and compress it, and send it to the remote end through the network;
Network receiving module (5): the network receiving module (5) receives compressed data of the remote sound through the network, decodes it into PCM data of the remote sound, and sends it to the remote processing module (6);
Remote processing module (6): the remote processing module (6) is used to receive remote PCM data from the network receiving module (5) and to send the remote PCM data in which a human voice is detected to the speaker module (7) and the echo cancellation module (2) respectively, so as to reduce the echo generated by remote background noise; the remote processing module (6) sends the time of human voice presence in the remote sound to the multi-feedback AGC module (3);
Speaker module (7): the speaker module (7) receives the PCM data of the remote sound from the remote processing module (6) and plays the remote sound.

2. The multi-feedback AGC identification system of claim 1, wherein the multi-feedback AGC module (3) performs an FFT on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks consecutive frequency slices into a two-dimensional matrix whose dimensions represent the frequency and time of the sound, which is used as the input of the CNN (convolutional neural network). The output of the CNN is the continuous probability value indicating whether the sound is a human voice. When the probability value is greater than the set threshold, the multi-feedback AGC module (3) determines that a human voice is detected in the near-end sound. When detecting the presence of a human voice in the near-end sound, the multi-feedback AGC module (3) simultaneously determines the time when the far-end sound produces an echo in the near-end sound, and adjusts the dynamic gain accordingly. When the volume of the detected human voice in the near-end sound is low, the gain coefficient is increased; when the volume of the detected human voice in the near-end sound is high, the gain coefficient is decreased. Each sample of the PCM data of the near-end sound from the multi-feedback AGC module (3) is multiplied by the gain coefficient to achieve adaptive gain adjustment of the near-end sound. The multi-feedback AGC module (3) multiplies the near-end sound by an attenuation coefficient during the time when the far-end sound produces an echo in the near-end sound, thereby reducing the impact of the residual echo of the far-end sound on the listening experience.

3. The multi-feedback AGC identification system according to claim 1 or 2, characterized in that, The remote processing module (6) performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module (6) stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimension of the sound. This matrix is used as the input of the CNN convolutional neural network. The output of the CNN convolutional neural network is the probability value of whether it is a human voice. When the probability value is greater than the set threshold, the remote processing module (6) determines that a human voice has been detected.

4. A multi-feedback AGC-based identification method, characterized in that it includes the following steps:
1) The microphone module (1) collects the near-end call sound and the far-end sound played by the speaker module (7), and sends the collected near-end PCM data to the echo cancellation module (2);
2) The echo cancellation module (2) eliminates echoes from the PCM data;
2.1) The echo cancellation module (2) receives the PCM data of the near-end sound from the microphone module (1) and the PCM data of the far-end sound from the remote processing module (6);
2.2) The echo cancellation module (2) performs an FFT on the two PCM data segments to obtain FFT frequency slices;
2.3) The least mean square algorithm is used to match the slice difference between the two sets of frequency slices;
2.4) The time difference between the PCM data of the near-end sound and the PCM data of the far-end sound is obtained by multiplying the slice difference by the slice duration;
2.5) The echo cancellation module (2) sends the time difference to the multi-feedback AGC module (3);
2.6) The echo cancellation module (2) subtracts the far-end sound frequency slice from the near-end sound frequency slice according to the slice difference to achieve echo cancellation;
2.7) The echo cancellation module (2) performs an inverse FFT on the near-end sound frequency slices to convert them into near-end sound PCM data and sends the data to the multi-feedback AGC module (3);
3) The multi-feedback AGC module (3) performs near-end sound gain adjustment processing;
3.1) The multi-feedback AGC module (3) receives the PCM data of the near-end sound and the time difference between the near-end sound PCM data and the far-end sound PCM data from the echo cancellation module (2), and the time of human voice presence in the far-end sound from the remote processing module (6);
3.2) The multi-feedback AGC module (3) detects human voice;
3.3) When detecting the presence of human voice in the near-end sound, the multi-feedback AGC module (3) simultaneously determines the time when the far-end sound produces an echo in the near-end sound and adjusts the dynamic gain;
3.4) The multi-feedback AGC module (3) sends the processed near-end sound PCM data to the network audio transmission module (4);
4) The network audio transmission module (4) receives the PCM data of the near-end sound from the multi-feedback AGC module (3), encodes and compresses it, and then sends it to the remote end through the network;
5) The network receiving module (5) receives the compressed data of the remote sound through the network, decodes it into PCM data of the remote sound, and sends it to the remote processing module (6);
6) The remote processing module (6) processes the remote PCM data and detects human voice;
6.1) The remote processing module (6) receives remote PCM data from the network receiving module (5);
6.2) The remote processing module (6) detects human voice;
6.3) The remote processing module (6) sends the remote PCM data in which a human voice is detected to the speaker module (7) and the echo cancellation module (2) respectively to reduce the echo generated by the remote background noise;
6.4) The remote processing module (6) sends the time of human voice presence in the remote sound to the multi-feedback AGC module (3);
7) The speaker module (7) receives the PCM data of the remote sound from the remote processing module (6) and plays the remote sound.

5. The method according to claim 4, wherein, In step 3.2): The multi-feedback AGC module (3) performs FFT transformation on the PCM data of the near-end sound to obtain FFT frequency slices, and stacks the continuous frequency slices to form a two-dimensional matrix, representing the frequency and time dimension of the sound, which is used as the input of the CNN convolutional neural network. The output of the CNN convolutional neural network is the continuous probability value of whether it is a human voice. When the probability value is greater than the set threshold, the multi-feedback AGC module (3) determines that the human voice of the near-end sound is detected.

6. The method according to claim 4 or 5, characterized in that, In step 3.3): The multi-feedback AGC module (3) adds the time difference between the PCM data of the near-end sound and the PCM data of the far-end sound to the time of human voice presence in the far-end sound to obtain the time when the far-end sound produces an echo in the near-end sound.

7. The method according to claim 4 or 5, characterized in that, In step 3.3): The gain coefficient is increased when the volume of the detected near-end human voice is low, and decreased when the volume of the detected near-end human voice is high.

8. The method according to claim 7, wherein, Each sample of the PCM data of the near-end sound from the multi-feedback AGC module (3) is multiplied by the gain coefficient to achieve adaptive gain adjustment of the near-end sound.

9. The multi-feedback AGC recognition method according to claim 8, characterized in that, The multi-feedback AGC module (3) multiplies the near-end sound by an attenuation coefficient during the time when the far-end sound produces an echo in the near-end sound, thereby reducing the impact of the residual echo of the far-end sound on the listening experience.

10. The multi-feedback AGC recognition method according to claim 4, characterized in that, In step 6.2): The remote processing module (6) performs FFT on the remote PCM data to obtain frequency slices of the remote data. The remote processing module (6) stacks the continuous frequency slices to form a two-dimensional matrix, which represents the frequency and time dimension of the sound. This matrix is used as the input of the CNN convolutional neural network. The output of the CNN convolutional neural network is the probability value of whether it is a human voice. When the probability value is greater than the set threshold, the remote processing module (6) determines that a human voice has been detected.