Semi-supervised speech feature variable factor decomposition method

A semi-supervised speech feature technology, applied in the fields of instruments, character and pattern recognition, computer parts, etc. It addresses the problem that factors mixed together in speech features are hard to tell apart and lower the recognition rate; separating them avoids mutual interference and improves recognition accuracy.

Active Publication Date: 2014-09-03
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

In existing speech-based emotion, gender and age recognition tasks, the features extracted by traditional feature extraction methods often mix factors such as emotion, gender, age, speech content and language. These factors are difficult to distinguish from one another, resulting in poor recognition.
[0003] In the paper Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks by Dong Yu et al., a deep neural network is used to learn a deep feature, but this feature may mix many factors, such as emotion, gender and age. If this feature is used for speech emotion recognition, the recognition rate may be affected by the other factors in the feature.
At present, there is no feature extraction method that can separately extract the different types of features in speech signals.



Examples


Embodiment Construction

[0015] Figure 1 shows the general idea of the method of the present invention. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into the unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. Convolution and pooling operations then form a local invariant feature y. y is used as the input of the semi-supervised convolutional neural network, and y is decomposed into four types of features by minimizing four different loss function terms.
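The convolution, pooling and stacking steps that produce y can be sketched roughly as follows. This is a minimal pure-Python illustration, not the patent's implementation: the function names, the toy spectrogram, the hand-written kernels and the pooling size are all assumptions made for the example, and the kernels here stand in for the ones the SAE would learn. As in most convolutional networks, the "convolution" is computed as a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation of a spectrogram with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def local_invariant_feature(spectrogram, kernels, pool_size=2):
    """Convolve with every kernel, max-pool each feature map, and stack the results into y."""
    y = []
    for kernel in kernels:
        pooled = max_pool(conv2d_valid(spectrogram, kernel), pool_size)
        y.extend(value for row in pooled for value in row)
    return y


# Toy 6x6 "spectrogram" and two kernels of different sizes (all values are made up).
spectrogram = [[float((r * 6 + c) % 7) for c in range(6)] for r in range(6)]
kernels = [
    [[1.0, 0.0], [0.0, -1.0]],                             # hypothetical 2x2 kernel
    [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]],  # hypothetical 3x3 kernel
]
y = local_invariant_feature(spectrogram, kernels)
print(len(y))  # each kernel contributes a pooled 2x2 map, so y has 8 entries
```

Because each kernel size yields a feature map of a different shape, flattening the pooled maps before stacking is one simple way to combine them into a single vector; the source does not say how the stacking is actually done.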

[0016] The preprocessed speech signal is divided into spectrogram blocks of different sizes l_i × h_i, where i indexes the spectrogram blocks. The blocks of different sizes are input into the unsupervised feature learning network SAE, which is pre-trained to obtain convolution kernels of different sizes. The convolution kernels of different sizes are then used to convolve the entire spectrogram...
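As a rough illustration of how blocks of one size might be prepared for the SAE and how a learned weight vector could be turned back into a kernel, consider the sketch below. It assumes random cropping and a row-major flattening, neither of which is stated in the source; `sample_blocks` and `weights_to_kernel` are hypothetical helper names, and the SAE training itself is omitted.

```python
import random


def sample_blocks(spectrogram, height, width, count, rng):
    """Randomly crop `count` blocks of height x width and flatten each to a vector."""
    max_r = len(spectrogram) - height
    max_c = len(spectrogram[0]) - width
    blocks = []
    for _ in range(count):
        r = rng.randint(0, max_r)  # randint includes both endpoints
        c = rng.randint(0, max_c)
        blocks.append([spectrogram[r + i][c + j]
                       for i in range(height) for j in range(width)])
    return blocks


def weights_to_kernel(weight_row, height, width):
    """Reshape one hidden unit's weight vector into a height x width kernel."""
    assert len(weight_row) == height * width
    return [weight_row[i * width:(i + 1) * width] for i in range(height)]


# Toy 6x6 spectrogram; block sizes (2x3 and 3x3) are arbitrary examples.
rng = random.Random(0)
spec = [[float(r + c) for c in range(6)] for r in range(6)]
blocks_2x3 = sample_blocks(spec, 2, 3, count=5, rng=rng)
blocks_3x3 = sample_blocks(spec, 3, 3, count=5, rng=rng)

# After pre-training, each hidden unit's input weights (a flat vector with the
# same length as a block) would be reshaped into a kernel of the block's size.
kernel = weights_to_kernel([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3)
print(kernel)
```

Training one autoencoder per block size is what would give kernels of different sizes; each set is then applied to the whole spectrogram separately.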


Abstract

The invention discloses a semi-supervised speech feature variable factor decomposition method. Speech features are divided into four types: emotion-related features, gender-related features, age-related features, and features related to noise, language and other factors. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into an unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. Each convolution kernel is then convolved with the whole spectrogram to obtain a set of feature maps, max pooling is applied to the feature maps, and the pooled features are finally stacked together to form a local invariant feature y. y serves as the input of a semi-supervised convolutional neural network, which decomposes y into the four types of features by minimizing four different loss function terms. This solves the problem of low recognition accuracy caused by emotion, gender, age and other speech features being mixed together. The method can be used for different recognition tasks based on speech signals and can also be extended to decompose more factors.
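The abstract does not say what the four loss terms are. Purely to illustrate the idea of splitting a learned representation into four blocks and summing four loss terms, the sketch below assumes one reconstruction term (usable on unlabeled speech) plus three supervised cross-entropy terms for emotion, gender and age that are skipped when a label is missing. These terms, the block sizes and all names are assumptions, not the patent's formulation.

```python
import math


def split_features(h, sizes):
    """Split a hidden vector h into consecutive blocks with the given sizes."""
    assert sum(sizes) == len(h)
    blocks, start = [], 0
    for size in sizes:
        blocks.append(h[start:start + size])
        start += size
    return blocks


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_loss(y, y_reconstructed, logits, labels, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of four assumed loss terms; a missing label contributes zero."""
    terms = [squared_error(y, y_reconstructed)]
    for name in ("emotion", "gender", "age"):
        label = labels.get(name)
        terms.append(0.0 if label is None
                     else softmax_cross_entropy(logits[name], label))
    return sum(w * t for w, t in zip(weights, terms))


# Hypothetical 7-dimensional representation split into emotion, gender, age and "other" blocks.
emotion, gender, age, other = split_features([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [2, 2, 2, 1])

# A sample with only an emotion label: the gender and age terms are skipped.
loss = total_loss(
    y=[1.0, 2.0],
    y_reconstructed=[1.0, 2.5],
    logits={"emotion": [0.0, 0.0], "gender": [0.0, 0.0], "age": [0.0, 0.0]},
    labels={"emotion": 0, "gender": None, "age": None},
)
print(loss)
```

Letting unlabeled samples contribute only through the unsupervised term is what makes such a setup semi-supervised: labeled and unlabeled speech can be mixed in the same training run.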

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a method for decomposing speech features.

Background

[0002] As computers penetrate into every corner of life, various types of computing platforms need easier input media, and voice is one of the best choices for users. Generally speaking, speech carries various kinds of information, such as the identity of the speaker, the content of the speech, and the speaker's emotion, gender and age. In recent years, the continuous improvement of applications has promoted the development of speech-based recognition of human emotion, gender, age, speech content, etc. For example, traditional call centers usually connect customers to agents at random for telephone consultation and cannot provide personalized services based on the user's emotion, gender and age, which raises the question of whether it is possible to judge the customer's emotion through the vo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/62
CPC: G06F18/00
Inventor: 毛启容, 黄正伟, 薛文韬, 于永斌, 詹永照, 苟建平, 邢玉萍
Owner: JIANGSU UNIV