Method for improving speech recognition accuracy based on speech front-end noise elimination

A speech recognition front-end noise elimination technology, applied in speech analysis, instruments, etc., that addresses problems such as low recognition accuracy caused by speech endpoint detection errors.

Status: Inactive; Publication Date: 2014-09-24
HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
Cites: 3; Cited by: 8

AI Technical Summary

Problems solved by technology

[0005] To solve the problems in the prior art, the present invention proposes a method for improving the accuracy of large-scale isolated-word speech recognition by eliminating noise at the speech front end. It addresses the low recognition accuracy caused by noise-induced speech endpoint detection errors during MFCC extraction.

Examples

Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0031] The working principle of the present invention is as follows. The input noisy speech signal can be modeled as two communication channels carrying pure speech and pure noise respectively. CASA (computational auditory scene analysis) simulates the function of the human ear and determines the sound source from the time difference (ITD) and the intensity difference (ILD) between the signals arriving on the two channels, so that attention is focused on the pure speech signal. CASA uses the ITD and ILD to estimate mask information for each time-frequency unit (T-F unit) in the time-frequency domain. The T-F mask indicates which T-F regions are noise and which are speech. Finally, the T-F regions containing speech information are used for speech synthesis to restore the "pure" speech.
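The mask estimation described in [0031] can be illustrated with a minimal sketch. It is not the patent's implementation: it uses only the intensity cue (ILD), omits the ITD cue and the resynthesis step, and assumes a simple two-channel mixture in which the speech-like source is louder on the first channel. The function names (`stft`, `ild_binary_mask`), the frame length, the hop size, and the 0 dB threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def ild_binary_mask(ch_a, ch_b, threshold_db=0.0):
    """Mark each T-F unit 1 if channel A is louder than channel B, else 0.

    Assumption: the target (speech) source is stronger on channel A, so a
    positive level difference is taken to mean "speech-dominated".
    """
    spec_a, spec_b = stft(ch_a), stft(ch_b)
    eps = 1e-12  # avoids division by zero and log of zero in silent units
    ild_db = 20.0 * np.log10((np.abs(spec_a) + eps) / (np.abs(spec_b) + eps))
    return (ild_db > threshold_db).astype(float)

# Toy mixture: a 440 Hz tone stands in for speech, a 2000 Hz tone for noise.
fs = 8000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 440 * t)
noise_like = np.sin(2 * np.pi * 2000 * t)
ch_a = 1.0 * speech_like + 0.3 * noise_like
ch_b = 0.3 * speech_like + 1.0 * noise_like

mask = ild_binary_mask(ch_a, ch_b)
# With 256-sample frames at 8 kHz, bin 14 is nearest 440 Hz and bin 64 is 2000 Hz.
print("speech bin kept:", mask[:, 14].mean(), "noise bin kept:", mask[:, 64].mean())
```

In a fuller pipeline the mask would be multiplied with the mixture spectrogram and inverse-transformed to resynthesize the speech. Units in which neither source has appreciable energy receive an arbitrary label in this sketch.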

[0032] As shown in Figure 1, the method for improving speech recogniti...

Abstract

The invention provides a method for improving speech recognition accuracy based on speech front-end noise elimination for large-scale isolated-word speech recognition. The method addresses the low recognition accuracy caused by speech endpoint detection errors during MFCC extraction in the presence of noise. CASA is applied at the front end of speech recognition; compared with traditional denoising methods such as noise reduction and speech enhancement, it can effectively separate noise from noisy speech by simulating the human auditory nervous system. When the method was used to recognize 10240 noisy speech utterances, recognition accuracy improved from 83 percent to 95.5 percent compared with recognition without front-end noise processing.
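To put the reported figures in perspective, the short calculation below derives the absolute gain, the relative reduction in error rate, and the approximate number of additional utterances recognized. It uses only the numbers stated in the abstract; because those accuracies appear to be rounded, the utterance count is an estimate. The function name `summarize_gain` is hypothetical.

```python
def summarize_gain(n_utterances, acc_before, acc_after):
    """Derive simple comparison figures from two accuracy values."""
    abs_gain = acc_after - acc_before        # gain in accuracy (fraction)
    err_before = 1.0 - acc_before            # error rate without front-end processing
    err_after = 1.0 - acc_after              # error rate with front-end processing
    return {
        "absolute_gain_points": 100.0 * abs_gain,
        "relative_error_reduction": (err_before - err_after) / err_before,
        "extra_correct_utterances": round(abs_gain * n_utterances),
    }

# Figures from the abstract: 10240 utterances, 83 percent versus 95.5 percent.
result = summarize_gain(10240, 0.83, 0.955)
for key, value in result.items():
    print(key, value)
```

On these figures the gain is 12.5 percentage points, the error rate falls from 17 percent to 4.5 percent (a relative reduction of roughly 74 percent), and about 1280 more utterances are recognized correctly.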

Description

Technical field

[0001] The invention relates to the field of isolated-word speech recognition, in particular to a method for improving the accuracy of large-scale isolated-word speech recognition.

Background

[0002] The most widely studied and applied feature parameter in speech recognition is the Mel-frequency cepstral coefficient (MFCC). The low-frequency MFCC parameters have high spectral resolution and are suitable for speech recognition. At present, Mel-scale cepstral parameters have largely replaced the previously common cepstral parameters derived from linear predictive coding, because they take into account the characteristics of human speech production and hearing and show better recognition performance and robustness.

[0003] However, the recognition rate achieved with MFCC parameters degrades in the presence of strong background noise. Since noise exists everywhere in nature, the speech of any human ...
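The statement in [0002] that low-frequency MFCC parameters have high spectral resolution follows from the Mel scale, which spaces filters densely at low frequencies and sparsely at high ones. The sketch below illustrates this using the widely used conversion mel = 2595 log10(1 + f/700). The patent text shown here does not state which Mel formula, filter count, or frequency range it uses, so those choices (20 filters, 0 to 4000 Hz) and the function names are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595*log10(1 + f/700) variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_center_frequencies(n_filters=20, f_min=0.0, f_max=4000.0):
    """Center frequencies (Hz) of filters spaced uniformly on the Mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels)[1:-1]  # drop the two band-edge points

centers = mel_center_frequencies()
spacing = np.diff(centers)
print("first filter spacing (Hz):", spacing[0])
print("last filter spacing (Hz):", spacing[-1])
```

The spacing between neighboring filters grows with frequency, so the low-frequency end of the spectrum is sampled more finely than the high-frequency end.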

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0308, G10L25/84
Inventor: 刘明, 王明江
Owner: HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL