
Audio enhancement method and system

A multi-channel audio enhancement technique, applicable to voice analysis, instruments, and related fields, that addresses the ordering (permutation) ambiguity of masking values and the high computational complexity of existing algorithms.

Active Publication Date: 2019-11-01
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the above method has two main defects. First, after the CGMM parameters are randomly initialized, the EM algorithm usually has to update them iteratively more than 20 times before the model achieves good results, so the computational complexity of the algorithm is very high.
Second, because the algorithm operates in the frequency domain, the calculations for the individual frequency bands are independent of each other.
[0006] In prior-art implementations, the originally collected audio has to be iterated over many times to support later uses of the audio, such as recognition, so the computational complexity of the algorithm is very high.
The category corresponding to each masking value is uncertain, which results in an ordering (permutation) ambiguity problem.



Embodiment Construction

[0044] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

[0045] To address the two defects of the existing method, the present invention applies a direction-of-arrival (DOA) estimation method to the original multi-channel audio to obtain its spatial spectrum. The DOA corresponding to the peak va...
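The spatial-spectrum and peak-picking step can be illustrated with a minimal sketch. The excerpt does not name the DOA algorithm, so a narrowband delay-and-sum (steered response power) spectrum is used here as a stand-in. The uniform linear array geometry, the speed of sound, and the function names `steering_vector`, `spatial_spectrum`, and `pick_peaks` are all assumptions made for illustration and do not come from the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(angle_deg, freq_hz, mic_positions_m):
    """Far-field steering vector for a linear array (assumed geometry)."""
    delays = mic_positions_m * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def spatial_spectrum(cov, freq_hz, mic_positions_m, angles_deg):
    """Delay-and-sum power d^H R d / M^2 for every candidate angle."""
    m = len(mic_positions_m)
    power = np.empty(len(angles_deg))
    for i, angle in enumerate(angles_deg):
        d = steering_vector(angle, freq_hz, mic_positions_m)
        power[i] = np.real(d.conj() @ cov @ d) / m**2
    return power


def pick_peaks(power, angles_deg, threshold):
    """Angles of interior local maxima whose power exceeds the threshold."""
    peaks = []
    for i in range(1, len(power) - 1):
        is_local_max = power[i] > power[i - 1] and power[i] >= power[i + 1]
        if is_local_max and power[i] > threshold:
            peaks.append(float(angles_deg[i]))
    return peaks
```

In this sketch, `cov` is the spatial covariance of the observed multi-channel signal at one frequency, and each angle returned by `pick_peaks` plays the role of one estimated direction value.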


Abstract

The invention discloses an audio enhancement method. A spatial spectrum of the original multi-channel audio is obtained through a direction-of-arrival (DOA) estimation algorithm. A plurality of peaks greater than a set threshold are obtained from the spatial spectrum, and the estimated direction values corresponding to those peaks are obtained according to the DOA estimation method. A spatial covariance matrix for the estimated direction values is obtained from those direction values and the steering vector of the microphone array. A CGMM (complex Gaussian mixture model) is established and initialized according to the spatial covariance matrix, and the parameters of the CGMM are updated iteratively by a clustering method. The original multi-channel audio is then enhanced by an MVDR (minimum variance distortionless response) beamforming algorithm to obtain the enhanced audio. The method reduces the number of times the EM algorithm has to iteratively update the CGMM parameters, which greatly reduces the amount of computation. In addition, the category of the time-frequency masking values obtained for each frequency band is determined, so masking values of the same category can be combined across frequency bands, which solves the ordering (permutation) ambiguity problem.
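The initialization idea in the abstract can be sketched as follows, under stated assumptions. One spatial covariance matrix is built per estimated direction from its steering vector, an extra identity-covariance class stands in for noise, and a single E-step computes time-frequency masks. The rank-one-plus-diagonal-loading form, the noise class, the per-frame variance estimate, and the names `init_covariances` and `cgmm_masks` are illustrative choices, not details confirmed by the source.

```python
import numpy as np


def init_covariances(steering_vectors, diag_load=1e-3):
    """Build one covariance d d^H + eps*I per direction, plus a noise class.

    steering_vectors: (K, M) complex array, one row per estimated direction.
    Returns an array of shape (K + 1, M, M); the last class is the assumed
    identity-covariance noise class.
    """
    m = steering_vectors.shape[1]
    covs = [np.outer(d, d.conj()) + diag_load * np.eye(m)
            for d in steering_vectors]
    covs.append(np.eye(m, dtype=complex))
    return np.stack(covs)


def cgmm_masks(obs, covs):
    """One E-step at a single frequency bin.

    obs: (T, M) complex STFT frames; covs: (K, M, M) class covariances.
    Returns masks of shape (K, T) that sum to 1 over the classes.
    """
    t, m = obs.shape
    log_lik = np.empty((len(covs), t))
    for k, cov in enumerate(covs):
        inv = np.linalg.inv(cov)
        # Quadratic form y^H R^{-1} y for every frame.
        quad = np.real(np.einsum("tm,mn,tn->t", obs.conj(), inv, obs))
        # Per-frame variance estimate, clipped away from zero.
        phi = np.maximum(quad / m, 1e-12)
        _, logdet = np.linalg.slogdet(cov)
        # Log of a zero-mean complex Gaussian with covariance phi * R,
        # up to an additive constant shared by all classes.
        log_lik[k] = -m * np.log(phi) - logdet - quad / phi
    log_lik -= log_lik.max(axis=0, keepdims=True)  # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=0, keepdims=True)
```

Because class `k` is tied to the same estimated direction in every frequency bin, the row index of the mask has a fixed meaning across bins in this sketch, which is the property the abstract relies on to combine masks of the same category.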

Description

Technical field

[0001] The invention belongs to the technical field of speech recognition, and in particular relates to an audio enhancement method and system.

Background

[0002] At present, the masking values of time-frequency points are mostly obtained through a CGMM (complex Gaussian mixture model), and MVDR (minimum variance distortionless response) beamforming is then used for speech enhancement.

[0003] However, the above method has two main defects. First, after the CGMM parameters are randomly initialized, the EM algorithm usually has to update them iteratively more than 20 times before the model achieves good results, so the computational complexity of the algorithm is very high. The second defect is that since the algorithm is performed in the frequency domain, the calculations between the fr...
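The mask-then-MVDR pipeline described in the background can be sketched for a single frequency bin as below. This is a generic textbook formulation, w = R_n^{-1} d / (d^H R_n^{-1} d), not the patent's specific procedure. The relative diagonal loading and the names `masked_covariance`, `mvdr_weights`, and `apply_beamformer` are assumptions introduced for illustration.

```python
import numpy as np


def masked_covariance(obs, mask):
    """Mask-weighted spatial covariance.

    obs: (T, M) complex frames; mask: (T,) weights in [0, 1].
    Returns sum_t mask_t * y_t y_t^H / sum_t mask_t, shape (M, M).
    """
    weighted = obs * mask[:, None]
    return weighted.T @ obs.conj() / max(mask.sum(), 1e-12)


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) with light loading."""
    m = len(steering)
    # Loading is scaled by the average noise power (an assumed choice).
    loaded = noise_cov + diag_load * np.trace(noise_cov).real / m * np.eye(m)
    num = np.linalg.solve(loaded, steering)
    return num / (steering.conj() @ num)


def apply_beamformer(weights, obs):
    """Beamformer output w^H y for every frame; obs has shape (T, M)."""
    return obs @ weights.conj()
```

In a mask-based system, `noise_cov` would typically come from `masked_covariance` evaluated with the noise-class mask, while `steering` would be the steering vector of the target direction.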


Application Information

IPC(8): G10L21/0216
CPC: G10L21/0216, G10L2021/02166
Inventor: 任维怡, 周强
Owner: AISPEECH CO LTD