Method and device for extracting MFCC coefficients of a speech signal, and Mel filtering method

A speech signal and coefficient technology, applied in speech analysis, speech recognition, instruments, etc.

Active Publication Date: 2009-11-11

VIMICRO ELECTRONICS CORP

Problems solved by technology

[0006] The technical problem addressed by this invention is to provide a method and device for extracting MFCC coefficients of a speech signal that solve the problems of the HTK MFCC coefficient extraction method.

Examples

Embodiment 1

[0056] Figure 1 is a flowchart of the method for extracting MFCC coefficients of a speech signal described in Embodiment 1.

[0057] S101: when performing Mel filtering, increase the number of subbands of the Mel filter bank, perform Mel filtering over the frequency range, and obtain a Mel filtering output for each subband;

[0058] That is, the original dimension of the Mel filter bank (the number of subbands) is extended, and the signal is then filtered over the full frequency band. Because of the mapping between Mel frequency and linear frequency, this increases the number of subbands in the low-frequency range of the signal band (the linear frequency band), which ensures sufficient frequency resolution for low-frequency signals. At the same time, however, the number of subbands in the high-frequency range also increases. Since high-frequency signals are susceptible t...
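The effect described in [0058] can be illustrated with a minimal sketch. It assumes the commonly used HTK-style mapping mel = 2595 * log10(1 + f / 700), which the text above does not state, and subbands whose center frequencies are equally spaced on the Mel scale. The function names, the subband counts 24 and 40, and the 8 kHz split point (taken from Embodiment 2) are illustrative choices, not values prescribed by the patent.

```python
import math

def hz_to_mel(f_hz):
    # Assumed HTK-style Mel mapping; not quoted from the patent text.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def center_frequencies(num_subbands, f_max_hz):
    # Center frequencies equally spaced on the Mel scale between 0 and f_max_hz.
    step = hz_to_mel(f_max_hz) / (num_subbands + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(num_subbands)]

def count_low_high(num_subbands, f_max_hz=16000.0, split_hz=8000.0):
    # Count subbands whose center lies below / at or above the split frequency.
    centers = center_frequencies(num_subbands, f_max_hz)
    low = sum(1 for c in centers if c < split_hz)
    return low, num_subbands - low

if __name__ == "__main__":
    for n in (24, 40):  # hypothetical original and extended subband counts
        low, high = count_low_high(n)
        print(f"{n} subbands: {low} low-frequency, {high} high-frequency")
```

Under these assumptions, extending the filter bank raises the subband count in both ranges, which is the side effect on the high-frequency range that the paragraph goes on to discuss.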

Embodiment 2

[0070] The present invention is mainly applied to wideband signal processing over a frequency range of 0-16 kHz, because a 16 kHz wideband signal essentially provides the feature information required for speech recognition. A 16 kHz wideband signal is used as the example in the detailed description below, with 0-8 kHz as the low-frequency range and 8-16 kHz as the high-frequency range. The invention is of course not limited to the 0-16 kHz range.

[0071] Figure 2 is a flowchart of the method for extracting MFCC coefficients of a speech signal described in Embodiment 2.

[0072] S201: speech enhancement processing;

[0073] In this embodiment, speech enhancement is applied to the signal across the whole 16 kHz range at once. The purpose of speech enhancement is to recover speech that is as clean as possible from the noisy speech signal. Many enhancement algorithms are in common use, such as spectral subt...

Abstract

The invention provides a method and device for extracting MFCC coefficients of a speech signal, aimed at solving the problems of the HTK MFCC coefficient extraction method. The method comprises pre-emphasis, windowing, fast Fourier transform, power spectrum estimation, Mel filtering, nonlinear transformation and discrete cosine transform. During Mel filtering, the number of subbands of the Mel filter bank is increased and Mel filtering is performed over the frequency range to obtain a Mel filtering output for each subband; the subbands in the high-frequency range are then aggregated to obtain a Mel filtering output for each aggregated subband; nonlinear transformation and discrete cosine transform are then applied to the Mel filtering outputs of the low-frequency range and of the aggregated high-frequency range; finally, the MFCC coefficients are extracted. The invention ensures sufficient frequency resolution for low-frequency signals while aggregating the subbands in the high-frequency range, which improves robustness to interference at high frequencies, thereby optimizing the extracted MFCC coefficients and improving speech recognition accuracy.
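The step of merging high-frequency subbands and then applying the logarithm and DCT might look roughly like the sketch below. The abstract does not say how subbands are merged, so summing adjacent groups of outputs is purely an assumption made here for illustration. The function names, `group_size`, the energy floor, and the DCT-II scaling of sqrt(2/N) (one common convention) are likewise hypothetical.

```python
import math

def aggregate_high_band(mel_outputs, num_low, group_size=2):
    # Keep the first num_low (low-frequency) outputs unchanged and merge the
    # remaining (high-frequency) outputs in adjacent groups by summation.
    # Summation is an assumed merge rule, not taken from the patent.
    low = list(mel_outputs[:num_low])
    high = mel_outputs[num_low:]
    merged = [sum(high[i:i + group_size]) for i in range(0, len(high), group_size)]
    return low + merged

def log_and_dct(band_energies, num_coeffs, floor=1e-10):
    # Nonlinear transform (natural log, with a small floor to avoid log(0))
    # followed by a DCT-II over the log energies.
    n = len(band_energies)
    logs = [math.log(max(e, floor)) for e in band_energies]
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
        for i in range(num_coeffs)
    ]

if __name__ == "__main__":
    outputs = [4.0, 3.0, 2.5, 2.0, 0.6, 0.5, 0.4, 0.3]  # made-up Mel outputs
    merged = aggregate_high_band(outputs, num_low=4, group_size=2)
    print(merged)
    print(log_and_dct(merged, num_coeffs=3))
```

With this merge rule, each high-frequency coefficient fed to the DCT covers a wider band than before merging, while the low-frequency subbands keep the finer spacing gained by enlarging the filter bank.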

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a method and device for extracting MFCC coefficients of a speech signal, and to a Mel filtering method.

Background

[0002] In speech recognition, Mel-scale Frequency Cepstral Coefficients (MFCC) are among the most commonly used feature parameters. MFCC models the auditory characteristics of the human ear, reflects how humans perceive speech, and captures speaker-specific characteristics from the speaker's speech signal; it has achieved high recognition rates in practical speech recognition applications. The standard MFCC coefficient extraction process consists of pre-emphasis, windowing, FFT (Fast Fourier Transform), power spectrum estimation, Mel filtering, a nonlinear transform (taking the logarithm) and DCT (Discrete Cosine Transform, disc...
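The first four stages of the standard process listed above (pre-emphasis, windowing, Fourier transform, power spectrum estimation) can be sketched as follows. This is a minimal illustration, not the HTK implementation. The pre-emphasis coefficient 0.97 and the Hamming window are common defaults assumed here, and a naive DFT stands in for the FFT to keep the example dependency-free. All function names are hypothetical.

```python
import cmath
import math

def pre_emphasis(samples, coeff=0.97):
    # y[n] = x[n] - coeff * x[n-1]; the first sample is passed through.
    # Assumes a non-empty input; coeff=0.97 is an assumed common default.
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]

def hamming_window(frame):
    # Multiply the frame by a Hamming window (assumes at least 2 samples).
    n = len(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]

def power_spectrum(frame):
    # Naive O(N^2) DFT used in place of an FFT; returns |X[k]|^2
    # for bins k = 0 .. N/2.
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

if __name__ == "__main__":
    # Made-up test frame: a sine wave that falls exactly on DFT bin 4.
    frame = [math.sin(2.0 * math.pi * 4 * t / 32) for t in range(32)]
    spec = power_spectrum(hamming_window(pre_emphasis(frame)))
    print(max(range(len(spec)), key=spec.__getitem__))
```

The resulting power spectrum is what the Mel filter bank would then be applied to; the Mel filtering, logarithm and DCT stages are left out of this sketch.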

Application Information

IPC(8): G10L15/02

Inventors: 张晨, 冯宇红

Owner: VIMICRO ELECTRONICS CORP
