Speech recognition method based on model pre-training and bidirectional LSTM

A speech recognition and pre-training technology, applied in speech recognition, speech analysis, instruments, etc., that addresses the poor noise robustness of neural networks

Active Publication Date: 2018-10-19
BEIJING INSTITUTE OF TECHNOLOGY
Cites: 7; cited by: 24

AI Technical Summary

Problems solved by technology

[0013] The purpose of the present invention is to solve the problem of the poor noise robustness of neural networks under high-noise conditions and to propose a speech recognition method based on model pre-training and bidirectional LSTM.

Examples

Embodiment 1

[0078] This embodiment describes the speech recognition method based on pre-training and bidirectional LSTM according to the present invention.

[0079] Step A: Input the speech signal to be processed;

[0080] Specifically, in this embodiment, MATLAB is used to superimpose a noise signal onto clean speech at signal-to-noise ratios of 9:1 and 7:3; each input speech file to be processed is in '.wav' format;
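The text gives only the mixing ratios, not how they are applied. The sketch below assumes that "9:1" and "7:3" are speech-to-noise *power* ratios and shows how noise could be scaled to hit them; it uses NumPy instead of MATLAB, and the function name `mix_at_power_ratio` and the synthetic signals are hypothetical stand-ins, not the inventors' script.

```python
import numpy as np

def mix_at_power_ratio(clean, noise, speech_parts, noise_parts):
    """Add `noise` to `clean` so that P_speech : P_noise = speech_parts : noise_parts.

    Assumption: a ratio such as 9:1 is read as a speech-to-noise *power* ratio.
    Returns the noisy mixture and the scaled noise that was added.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean speech")
    noise = noise[: len(clean)]                      # trim noise to the speech length
    p_speech = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech * noise_parts / speech_parts
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return clean + scaled_noise, scaled_noise

# Synthetic stand-ins for a clean utterance and a noise recording (1 s at 16 kHz).
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
for s, n in [(9, 1), (7, 3)]:
    noisy, scaled = mix_at_power_ratio(clean, noise, s, n)
    ratio = np.mean(clean ** 2) / np.mean(scaled ** 2)
    print(f"{s}:{n} -> power ratio {ratio:.3f}, SNR {10 * np.log10(ratio):.2f} dB")
```

Under this power-ratio reading, 9:1 corresponds to about 9.54 dB SNR and 7:3 to about 3.68 dB; if the ratios were meant as amplitude ratios instead, the dB values would double.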

[0081] Step B: speech signal preprocessing;

[0082] In this embodiment, the speech signal input in step A is passed through a high-pass filter with a coefficient of 0.96;
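A high-pass filter described by a single coefficient of 0.96 matches the common first-order pre-emphasis filter H(z) = 1 − 0.96 z⁻¹, i.e. y[n] = x[n] − 0.96·x[n−1]. The sketch below assumes that form (the source does not state the transfer function), and `pre_emphasis` is a hypothetical helper name.

```python
import numpy as np

def pre_emphasis(x, alpha=0.96):
    """First-order high-pass (pre-emphasis) filter: y[n] = x[n] - alpha * x[n-1].

    Assumption: the 0.96 coefficient in the text is alpha in H(z) = 1 - alpha * z^-1.
    The first sample has no predecessor and is passed through unchanged.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[0], x[1:] - alpha * x[:-1])

# A constant (DC) input is strongly attenuated: every sample after the first becomes 0.04.
print(pre_emphasis(np.ones(4)))
```

The filter boosts high frequencies relative to low ones, which flattens the spectral tilt of voiced speech before framing and feature extraction.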

[0083] The high-pass-filtered speech signal is divided into frames with a frame length of 25 ms and a frame shift of 12.5 ms, converting the speech signal input in step A into a frame-level short-term speech signal T(n);
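The framing step can be sketched as follows. The sampling rate is not given in the text, so 16 kHz is assumed here (25 ms = 400 samples, 12.5 ms shift = 200 samples); trailing samples that do not fill a whole frame are dropped, which is also an assumption. `frame_signal` is a hypothetical name.

```python
import numpy as np

def frame_signal(x, sample_rate=16000, frame_ms=25.0, shift_ms=12.5):
    """Split a 1-D signal into overlapping frames T(n), one row per frame.

    Assumptions: 16 kHz sampling (not stated in the text); trailing samples that
    do not fill a whole frame are dropped rather than zero-padded.
    """
    x = np.asarray(x, dtype=np.float64)
    frame_len = int(round(sample_rate * frame_ms / 1000))   # 400 samples at 16 kHz
    hop = int(round(sample_rate * shift_ms / 1000))         # 200 samples at 16 kHz
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

frames = frame_signal(np.zeros(16000))  # 1 s of audio
print(frames.shape)                     # (79, 400)
```

With a shift of half the frame length, each sample (away from the edges) appears in two consecutive frames.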

[0084] Each frame of the short-term speech signal is multiplied by a Hamming window with coefficient 0.46 to obtain the frame si...
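The coefficient 0.46 is consistent with the standard Hamming window w(n) = 0.54 − 0.46·cos(2πn/(N−1)), n = 0, …, N−1. The sketch below assumes that reading, writes the window in its generalized form with `a = 0.46`, and applies it to every frame; `hamming` and `apply_window` are hypothetical helper names.

```python
import numpy as np

def hamming(frame_len, a=0.46):
    """Generalized Hamming window w(n) = (1 - a) - a*cos(2*pi*n/(N-1)), n = 0..N-1.

    With a = 0.46 this is the standard Hamming window 0.54 - 0.46*cos(...).
    """
    n = np.arange(frame_len)
    return (1 - a) - a * np.cos(2 * np.pi * n / (frame_len - 1))

def apply_window(frames, a=0.46):
    """Multiply every frame (row) by the window; frames has shape (n_frames, frame_len)."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames * hamming(frames.shape[1], a)

w = hamming(400)
print(np.allclose(w, np.hamming(400)))  # True: same window as NumPy's np.hamming
# Both endpoints of the window taper to approximately 0.54 - 0.46 = 0.08.
```

Tapering the frame edges reduces spectral leakage when the subsequent FFT treats each frame as periodic.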

Abstract

The invention discloses a speech recognition method based on model pre-training and bidirectional LSTM, belonging to the field of deep learning and speech recognition. The method comprises the steps of: 1) inputting a speech signal to be processed; 2) preprocessing the speech signal; 3) extracting Mel-frequency cepstral coefficients (MFCCs) and their dynamic differences to obtain speech features; 4) constructing a bidirectional LSTM structure; 5) optimizing the bidirectional LSTM with the maxout function to obtain a maxout-biLSTM; 6) performing model pre-training; 7) training on the noisy speech signal with the pre-trained maxout-biLSTM to obtain the result. The method improves on the original activation function of the bidirectional LSTM by adopting the maxout activation function, and uses model pre-training to improve the robustness of the acoustic model in noisy environments; it can be used to build and train speech recognition models for high-noise environments.
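Steps 3 and 5 rest on two standard building blocks: dynamic (delta) coefficients appended to the MFCCs, and the maxout unit, which outputs the maximum over k linear pieces. The sketch below shows only these two pieces. The delta window N = 2, 13 MFCCs per frame, the 39-dimensional feature layout, k = 3 pieces, and the layer sizes are all assumptions; MFCC extraction is replaced by random stand-ins, and how maxout is placed inside the bidirectional LSTM is not described in the visible text, so this is not the inventors' maxout-biLSTM. `deltas` and `maxout` are hypothetical names.

```python
import numpy as np

def deltas(feats, N=2):
    """HTK-style dynamic (delta) coefficients over time; feats has shape (T, D).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frames repeated at the edges. N = 2 is an assumption.
    """
    feats = np.asarray(feats, dtype=np.float64)
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats)
    for n in range(1, N + 1):
        # padded[N + n + t] is c_{t+n}; padded[N - n + t] is c_{t-n}.
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom

def maxout(x, W, b):
    """Maxout unit: h_i = max_j (x @ W[:, i, j] + b[i, j]).

    x: (batch, d_in); W: (d_in, d_out, k); b: (d_out, k); k linear pieces per unit.
    """
    z = np.einsum("bi,iok->bok", x, W) + b   # (batch, d_out, k)
    return z.max(axis=-1)                     # (batch, d_out)

# Step 3 sketch: 13 MFCCs per frame (random stand-ins), plus deltas and delta-deltas.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))
d1 = deltas(mfcc)
d2 = deltas(d1)
features = np.hstack([mfcc, d1, d2])          # (100, 39)

# Step 5 sketch: a maxout layer with k = 3 pieces applied to those features.
W = rng.standard_normal((39, 64, 3)) * 0.1
b = np.zeros((64, 3))
h = maxout(features, W, b)
print(features.shape, h.shape)                # (100, 39) (100, 64)
```

Because maxout takes a maximum over learned linear functions, it can approximate ReLU, absolute value, and other convex piecewise-linear activations rather than fixing one shape in advance.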

Description

Technical field

[0001] The present invention relates to a speech recognition method based on model pre-training and bidirectional LSTM, in particular to a speech recognition method based on pre-training, the maxout activation function and a bidirectional LSTM model, which can significantly improve the noise robustness of neural networks in high-noise environments. It belongs to the field of deep learning and speech recognition.

Background art

[0002] With the continuous development and wide application of computer software and hardware technology, speech recognition technology has developed rapidly, and speech recognition research has attracted increasing attention. In recent years, deep learning has been applied successfully to speech recognition and has achieved good results. However, the performance of speech recognition systems tends to drop sharply in the high-noise environments of real life. The ess...

Application Information

Patent type & authority: Applications (China)
IPC(8): G10L15/20, G10L15/16, G10L15/06, G10L25/24, G10L25/18, G10L25/45, G10L25/30
CPC: G10L15/063, G10L15/16, G10L15/20, G10L25/18, G10L25/24, G10L25/30, G10L25/45
Inventors: 金福生 (Jin Fusheng), 王茹楠 (Wang Runan), 张俊逸 (Zhang Junyi), 韩翔宇 (Han Xiangyu)
Owner: BEIJING INSTITUTE OF TECHNOLOGY