
Speech recognition method based on neural network stacked autoencoder multi-feature fusion

A stacked autoencoder and multi-feature fusion technology, applied in neural learning methods, biological neural network models, speech recognition, etc., with the effects of improving recognition rate and computational efficiency, speeding up training, and reducing feature dimensionality.

Active Publication Date: 2018-01-19
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

However, this method cannot guarantee that the individual features are uncorrelated; the feature vector obtained by direct splicing contains redundancy, which degrades the classification performance of the trained model.
Low model-training efficiency is a further difficulty restricting the field of speech recognition technology.



Examples


Embodiment Construction

[0038] Taking four types of excavation equipment (hand-held electric picks, excavators, cutters, and hydraulic impact hammers) as examples, and using the two feature extraction methods Linear Prediction Cepstrum Coefficients (LPCC) and Mel Frequency Cepstrum Coefficients (MFCC), the present invention is described further. The following description is for demonstration and explanation only, and does not limit the present invention in any form.

[0039] Model training:

[0040] Step 1. Perform framing and windowing on the sound data collected from the four types of excavation equipment during operation: the frame length is N, successive frames overlap by a fixed frame shift, and a Hamming window is applied to each frame to obtain the sound database;
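The framing-and-windowing of Step 1 can be sketched as below. The concrete numbers (16 kHz sample rate, 25 ms frame length, 10 ms shift) are illustrative assumptions; the patent only specifies a frame length N, an overlap, and a Hamming window:

```python
import numpy as np

def frame_signal(x, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    # Index matrix: row i selects samples [i*shift, i*shift + frame_len)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return x[idx] * np.hamming(frame_len)  # window every frame

# Example: 1 s of a 440 Hz tone at 16 kHz, 25 ms frames, 10 ms shift
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frames = frame_signal(x, frame_len=400, frame_shift=160)
```

Each row of `frames` is then fed to the per-frame feature extractors of Steps 2 and 3.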

[0041] Step 2. Use the LPCC feature extraction algorithm to extract features from each frame of the sound source data, where the order of the LPCC (i.e., the number of LPCC features) is denoted R_LPCC.
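A minimal sketch of LPCC extraction for one windowed frame, via the standard autocorrelation / Levinson-Durbin route. The patent names the algorithm but gives no implementation, so this is a textbook version; `order` plays the role of R_LPCC:

```python
import numpy as np

def lpcc(frame, order):
    """LPCC of one windowed frame: autocorrelation -> Levinson-Durbin
    LPC coefficients -> cepstral recursion."""
    # Autocorrelation for lags 0..order
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    # Levinson-Durbin recursion for LPC coefficients a[1..order]
    a = np.zeros(order + 1)
    e = r[0]
    for i in range(1, order + 1):
        k = (r[i] - a[1:i] @ r[1:i][::-1]) / e  # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i-1:0:-1]
        a, e = a_new, e * (1 - k ** 2)
    # Cepstral recursion from the LPC coefficients
    c = np.zeros(order + 1)
    for n in range(1, order + 1):
        c[n] = a[n] + sum((j / n) * c[j] * a[n - j] for j in range(1, n))
    return c[1:]

rng = np.random.default_rng(0)
coeffs = lpcc(rng.standard_normal(400) * np.hamming(400), order=12)
```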

[0042] Step 3. Use the MFC...
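Step 3 is truncated in the source, but from the abstract it evidently applies MFCC extraction to the same frames. A minimal textbook-style sketch; the FFT length, filterbank size, and cepstral count below are assumed defaults, not values from the patent:

```python
import numpy as np

def mfcc(frame, sr=16000, n_fft=512, n_filt=26, n_ceps=13):
    """Minimal MFCC of one windowed frame: power spectrum -> triangular
    mel filterbank -> log -> DCT-II."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    # Mel-spaced triangular filterbank edges
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filt + 2))
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    logmel = np.log(fbank @ power + 1e-10)
    # DCT-II of the log filterbank energies, keeping the first n_ceps
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filt)))
    return dct @ logmel

feat = mfcc(np.random.default_rng(0).standard_normal(400) * np.hamming(400))
```

Per the abstract, the LPCC and MFCC vectors of each frame are then spliced into one initial feature vector before fusion.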


Abstract

The invention relates to a speech recognition method based on neural network stacked autoencoder multi-feature fusion. First, the original sound data is framed and windowed, and the typical time-domain linear prediction cepstrum coefficient (LPCC) features and frequency-domain Mel frequency cepstrum coefficient (MFCC) features are extracted from the framed, windowed data. The extracted features are then spliced to construct the initial feature representation vector of the acoustic signal and to build the training feature library. A multi-layer neural network stacked autoencoder is then used for feature fusion and learning; the multi-layer autoencoder is trained with the extreme learning machine (ELM) algorithm. Finally, the fused features are used to train a classifier model, again with the extreme learning machine algorithm, and the constructed model is used to classify and identify the test samples. Because the method adopts multi-feature fusion based on an extreme-learning-machine multi-layer stacked autoencoder, it achieves higher recognition accuracy than traditional single-feature extraction methods.
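The core fusion step in the abstract, a stacked autoencoder trained with the extreme learning machine algorithm, can be sketched as below. This follows the general ELM-autoencoder idea (random hidden mapping, closed-form output weights), not necessarily the patent's exact formulation; the layer sizes and the synthetic input matrix are assumptions:

```python
import numpy as np

def elm_ae_layer(X, n_hidden, rng):
    """One ELM-autoencoder layer: random hidden mapping H, output weights
    beta solved in closed form so that H @ beta ~= X, then beta.T reused
    as a deterministic encoder producing n_hidden-dim features."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                       # random feature mapping
    beta = np.linalg.lstsq(H, X, rcond=None)[0]  # closed-form output weights
    return np.tanh(X @ beta.T)                   # fused representation

def stacked_elm_ae(X, layer_sizes, seed=0):
    """Stack several ELM-AE layers, each fusing the previous layer's output."""
    rng = np.random.default_rng(seed)
    for n_hidden in layer_sizes:
        X = elm_ae_layer(X, n_hidden, rng)
    return X

# Fuse a spliced (LPCC + MFCC) feature matrix: 100 frames x 25 raw dims
X = np.random.default_rng(1).standard_normal((100, 25))
Z = stacked_elm_ae(X, layer_sizes=[64, 16])  # 16-dim fused features
```

The appeal of the ELM route, as the abstract claims, is training speed: each layer's weights are solved in one least-squares step instead of by iterative backpropagation.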

Description

Technical Field

[0001] The invention relates to the technical field of speech recognition, and in particular to a speech recognition method based on neural network stacked autoencoder multi-feature fusion.

Background Technique

[0002] Sound recognition is one of the goals of artificial intelligence, and the ability to accurately identify and classify sound signals plays a key role in its development. In the existing technology, the traditional pipeline of feature extraction followed by a classifier has been widely used. However, the feature extraction method must be chosen according to the type of sound: since sound signals come from different sources, the suitable features also differ, and selecting an appropriate extraction method for each kind of sound requires corresponding professional knowledge. In addition, the acoustic conditions in some environments are complicated, and traditional feature extraction methods cannot achieve the d...


Application Information

IPC(8): G10L15/02; G10L15/06; G10L15/14; G10L15/20; G10L25/24; G10L25/30; G06N3/08
Inventor: Cao Jiuwen, Cheng Fei, Wang Jianzhong
Owner HANGZHOU DIANZI UNIV