Audio enhancement method and system

An audio enhancement technology for multi-channel audio, applied in speech analysis, instruments, etc. It addresses the high computational complexity and ambiguous sorting of existing algorithms: it reduces the amount of calculation, overcomes the sorting ambiguity, and improves the speech enhancement effect.

Active Publication Date: 2021-10-12
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the defects of the above method (CGMM-based mask estimation followed by MVDR) lie mainly in two aspects. First, after the CGMM model parameters are randomly initialized, the EM algorithm usually has to iteratively update the parameters more than 20 times before the CGMM model achieves good results, so the computational complexity of the algorithm is very high.
Second, since the algorithm is performed in the frequency domain, the calculations in the individual frequency bands are independent of each other.
[0006] In the prior-art method, to support later use of the audio, such as recognition, the originally collected audio has to be iterated over multiple times, so the computational complexity of the algorithm is very high.
The category corresponding to each masking value is uncertain, which leads to the ambiguous sorting problem.

Examples

Embodiment Construction

[0044] In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

[0045] In order to solve the two defects of the existing method, the present invention uses a direction-of-arrival estimation method to process the original multi-channel audio and obtain the spatial spectrum information of the original audio. The DOA (direction of arrival) corresponding to the peak va...
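As an illustration of the spatial-spectrum step in [0045], the following is a minimal sketch, not the patent's implementation. It assumes a far-field source, a uniform linear microphone array, a single frequency, and a steered-response-power (delay-and-sum) spectrum; the function names `steering_vector`, `spatial_spectrum`, and `pick_peaks`, the sound speed, and the array geometry are all hypothetical choices made for the example.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(angle_deg, freq_hz, mic_positions_m):
    """Far-field steering vector for a linear array (assumed geometry)."""
    delays = mic_positions_m * np.cos(np.deg2rad(angle_deg)) / SOUND_SPEED
    return np.exp(-2j * np.pi * freq_hz * delays)


def spatial_spectrum(cov, freq_hz, mic_positions_m, angles_deg):
    """Steered response power d^H R d / M^2 for each candidate angle."""
    n_mics = len(mic_positions_m)
    power = np.empty(len(angles_deg))
    for i, angle in enumerate(angles_deg):
        d = steering_vector(angle, freq_hz, mic_positions_m)
        power[i] = np.real(np.conj(d) @ cov @ d) / n_mics**2
    return power


def pick_peaks(spectrum, angles_deg, threshold):
    """Return the angles of local maxima whose value exceeds the threshold."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        is_local_max = spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
        if is_local_max and spectrum[i] > threshold:
            peaks.append(float(angles_deg[i]))
    return peaks
```

Here `cov` stands for the observed spatial covariance of the multi-channel signal at one frequency. The angles returned by `pick_peaks` play the role of the estimated direction values; how the threshold is chosen is left open, as the source does not specify it.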

Abstract

The invention discloses an audio enhancement method. The spatial spectrum of the original multi-channel audio is obtained by a direction of arrival (DOA) estimation algorithm. Multiple peaks greater than a set threshold are acquired from the spatial spectrum, and multiple estimated direction values for these peaks are acquired according to the DOA estimation method. A spatial covariance matrix for the multiple estimated direction values is obtained from the estimated direction values and the steering vector of the microphone array. The CGMM (complex Gaussian mixture model) is initialized and established according to the spatial covariance matrix, and the parameters of the CGMM are iteratively updated by the clustering method. Enhanced audio is obtained by enhancing the original multi-channel audio with the MVDR (minimum variance distortionless response) beamforming algorithm. This method reduces the number of EM iterations needed to update the parameters of the CGMM and greatly reduces the amount of calculation. At the same time, the category of the time-frequency masking values obtained in each frequency band is definite, so the masking values of the same category in each frequency band can be merged together, which overcomes the ambiguous sorting problem.
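The covariance initialization and the MVDR step named in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's method: the rank-one form d d^H with a small diagonal loading term is one common way to build a spatial covariance from a steering vector, and the loading value, the function names `init_spatial_covariance` and `mvdr_weights`, and the use of a noise covariance in the MVDR formula are assumptions introduced here.

```python
import numpy as np


def init_spatial_covariance(steer, diag_load=1e-3):
    """Rank-one covariance d d^H plus diagonal loading (loading is an assumption)."""
    n_mics = steer.shape[0]
    return np.outer(steer, np.conj(steer)) + diag_load * np.eye(n_mics)


def mvdr_weights(noise_cov, steer):
    """Standard MVDR weights w = R_n^{-1} d / (d^H R_n^{-1} d)."""
    rn_inv_d = np.linalg.solve(noise_cov, steer)
    return rn_inv_d / (np.conj(steer) @ rn_inv_d)
```

Applying the beamformer to an observation vector `y` at one time-frequency point is then `np.conj(w) @ y`. The weights satisfy the distortionless constraint w^H d = 1 for the target steering vector `d`.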

Description

Technical field

[0001] The invention belongs to the technical field of speech recognition, and in particular relates to an audio enhancement method and system.

Background technique

[0002] At present, the masking value of each time-frequency point is mostly obtained through a CGMM (complex Gaussian mixture model), and then MVDR (minimum variance distortionless response) beamforming is used for speech enhancement.

[0003] However, the defects of the above method lie mainly in two aspects. First, after the parameters of the CGMM model are randomly initialized, the EM algorithm usually has to iteratively update the parameters more than 20 times before the CGMM model achieves good results, so the computational complexity of the algorithm is very high. Second, since the algorithm is performed in the frequency domain, the calculations between the fr...
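To make the mask-estimation step of [0002] concrete, here is a minimal sketch of the posterior (E-step) computation of a CGMM at a single frequency band. It assumes the usual zero-mean complex Gaussian model with a time-varying scale, estimated per frame as y^H R^{-1} y / M; the function name `cgmm_masks`, the array shapes, and the uniform default mixture weights are hypothetical and not taken from the source.

```python
import numpy as np


def cgmm_masks(obs, covs, weights=None):
    """Posterior masks for one frequency band.

    obs:  (T, M) complex STFT vectors (T frames, M microphones).
    covs: (K, M, M) spatial covariance matrix of each class.
    Returns a (T, K) array whose rows sum to 1.
    """
    n_frames, n_mics = obs.shape
    n_classes = covs.shape[0]
    if weights is None:
        weights = np.full(n_classes, 1.0 / n_classes)  # assumed uniform priors
    log_post = np.empty((n_frames, n_classes))
    for k in range(n_classes):
        inv_cov = np.linalg.inv(covs[k])
        # Quadratic form y^H R^{-1} y for every frame.
        quad = np.real(np.einsum("tm,mn,tn->t", obs.conj(), inv_cov, obs))
        phi = np.maximum(quad / n_mics, 1e-12)  # per-frame scale estimate
        _, logdet = np.linalg.slogdet(covs[k])
        # Log-likelihood up to constants shared by all classes.
        log_post[:, k] = np.log(weights[k]) - n_mics * np.log(phi) - logdet
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

When every band starts from a random `covs`, class index `k` has no fixed meaning across bands, which is the sorting ambiguity described above. If instead `covs[k]` is built from the same estimated direction in every band, class `k` refers to the same direction everywhere.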

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L21/0216
CPC: G10L21/0216, G10L2021/02166
Inventors: 任维怡, 周强
Owner: AISPEECH CO LTD