A Semi-Supervised Method for Decomposing Variable Factors of Speech Features

A semi-supervised speech feature technology, applied in the fields of instruments, character and pattern recognition, computer components, etc. It addresses the problems that mixed factors in speech features are difficult to distinguish and lower the recognition rate, with the effect of avoiding mutual interference between factors and improving recognition accuracy.

Active Publication Date: 2017-02-15
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

However, in existing speech-based emotion, gender, and age recognition tasks, the features extracted by traditional feature extraction methods often mix factors such as emotion, gender, age, speech content, and language. These factors are difficult to distinguish from one another, which results in poor recognition performance.
[0003] In the paper "Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks" by Dong Yu et al., a deep neural network is used to learn a deep feature, but this feature may mix many factors, such as emotion, gender, and age. If this feature is used for speech emotion recognition, the recognition rate may be affected by the other factors in the feature.
At present, there is no feature extraction method that can extract the different types of features in speech signals.


Examples

Embodiment Construction

[0015] Figure 1 shows the general idea of the method of the present invention. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into the unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. Convolution and pooling operations then produce a local invariant feature y. This y is used as the input of the semi-supervised convolutional neural network, and y is decomposed into four types of features by minimizing four different loss function terms.
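The decomposition step can be pictured with a small Python sketch. The excerpt does not state the form of the four loss terms, so everything here is an assumption made for illustration: the names `split_features` and `total_loss`, the block sizes, the weights, and the idea that a missing label simply drops its term from the sum (one common way to handle partially labeled data). It is not the patent's actual objective.

```python
FACTORS = ("emotion", "gender", "age", "other")

def split_features(hidden, sizes):
    """Partition a hidden vector into four named blocks, one per factor.

    `sizes` maps each factor name to its block length (hypothetical values).
    """
    blocks, start = {}, 0
    for name in FACTORS:
        blocks[name] = hidden[start:start + sizes[name]]
        start += sizes[name]
    return blocks

def total_loss(terms, weights):
    """Weighted sum of per-factor loss terms.

    A term of None stands for a sample without that label; it is skipped,
    which is an assumed way of mixing labeled and unlabeled samples.
    """
    return sum(weights[name] * value
               for name, value in terms.items()
               if value is not None)

# Toy hidden representation derived from y (values are placeholders).
hidden = [float(i) for i in range(10)]
sizes = {"emotion": 4, "gender": 2, "age": 2, "other": 2}
blocks = split_features(hidden, sizes)

# Placeholder loss values for one sample whose gender label is missing.
terms = {"emotion": 0.5, "gender": None, "age": 0.25, "other": 1.0}
weights = {"emotion": 1.0, "gender": 1.0, "age": 1.0, "other": 0.5}
loss = total_loss(terms, weights)
```

In a real system each term would be computed by a branch of the network and minimized jointly; the sketch only shows how four terms could be combined into one objective.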

[0016] The preprocessed speech signal is divided into spectral blocks of different sizes l_i × h_i, where i denotes the index of the spectral block. The spectral blocks of different sizes are input into the unsupervised feature learning network SAE, which is pre-trained to obtain convolution kernels of different sizes. The convolution kernels of different sizes are then used to convolve the entire spectrogram...
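As a rough illustration of the block division described above, the following Python sketch tiles a toy spectrogram into non-overlapping l × h blocks for two block sizes. The function name `split_into_blocks`, the choice of l as rows and h as columns, the non-overlapping tiling, and the sizes themselves are assumptions; the excerpt does not give actual values or say how blocks are sampled.

```python
def split_into_blocks(spectrogram, l, h):
    """Tile a 2-D spectrogram (list of rows) into non-overlapping l x h blocks.

    Rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    if l > rows or h > cols:
        raise ValueError("block is larger than the spectrogram")
    blocks = []
    for r in range(0, rows - l + 1, l):
        for c in range(0, cols - h + 1, h):
            blocks.append([row[c:c + h] for row in spectrogram[r:r + l]])
    return blocks

# Toy 6 x 8 "spectrogram"; entry (r, c) holds r * 8 + c.
spec = [[float(r * 8 + c) for c in range(8)] for r in range(6)]

# Hypothetical block sizes (l_i, h_i).
block_sizes = [(2, 2), (3, 4)]
blocks_by_size = {size: split_into_blocks(spec, *size) for size in block_sizes}
```

Each set of same-sized blocks would then serve as training data for one SAE, whose learned weights become the convolution kernels of that size.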

Abstract

The invention discloses a semi-supervised method for decomposing the variable factors of speech features, which divides speech features into four categories: emotion-related features, gender-related features, age-related features, and features related to other factors, including noise and language. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into the unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. The entire spectrogram is then convolved with the kernels of different sizes to obtain several feature maps, max pooling is applied to the feature maps, and the pooled features are stacked to form a local invariant feature y. This y is used as the input of the semi-supervised convolutional neural network, and y is decomposed into four types of features by minimizing four different loss function terms. The invention addresses the low recognition accuracy caused by speech features that mix emotion, gender, and age, can be used for different recognition tasks based on speech signals, and can also be extended to decompose more factors.
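The convolution, max pooling, and stacking steps can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the kernels are hand-written stand-ins for the SAE-learned ones, and the function names, the pooling size of 2, and the use of valid-mode cross-correlation (the usual CNN "convolution") are all assumptions.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation of an image with a kernel."""
    kr, kc = len(kernel), len(kernel[0])
    out_r, out_c = len(image) - kr + 1, len(image[0]) - kc + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kr) for j in range(kc))
         for c in range(out_c)]
        for r in range(out_r)
    ]

def max_pool(feature_map, size):
    """Non-overlapping size x size max pooling; ragged edges are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in range(0, cols - size + 1, size)]
        for r in range(0, rows - size + 1, size)
    ]

def local_invariant_feature(spectrogram, kernels, pool_size=2):
    """Convolve with each kernel, max-pool, then stack everything into y."""
    y = []
    for kernel in kernels:
        pooled = max_pool(conv2d_valid(spectrogram, kernel), pool_size)
        y.extend(value for row in pooled for value in row)
    return y

# Toy 6 x 8 "spectrogram"; entry (r, c) holds r * 8 + c.
spec = [[float(r * 8 + c) for c in range(8)] for r in range(6)]

# Hypothetical kernels of two sizes (stand-ins for pre-trained SAE weights).
kernels = [
    [[1.0, 0.0], [0.0, -1.0]],            # 2 x 2 diagonal difference
    [[1.0] * 3 for _ in range(3)],        # 3 x 3 sum filter
]
y = local_invariant_feature(spec, kernels)
```

Because kernels of different sizes yield feature maps of different shapes, flattening each pooled map before concatenation is one simple way to stack them into a single vector.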

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a method for decomposing speech features.

Background

[0002] As computers penetrate every corner of life, computing platforms of all kinds need easier input media, and voice is one of the best choices for users. Generally speaking, speech carries various kinds of information, such as the speaker's identity, the content of the speech, and the speaker's emotion, gender, and age. In recent years, the continuous improvement of certain applications has promoted the development of speech-based recognition of human emotion, gender, age, speech content, and so on. For example, traditional call centers usually connect customers to agents at random for telephone consultation and cannot provide personalized service based on the user's emotion, gender, and age, which raises the question of whether it is possible to judge the customer's emotion through the vo...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/62
CPC: G06F18/00
Inventor: 毛启容, 黄正伟, 薛文韬, 于永斌, 詹永照, 苟建平, 邢玉萍
Owner: JIANGSU UNIV