Semi-supervised speech feature variable factor decomposition method

A semi-supervised speech feature technology, applied in fields such as instruments, character and pattern recognition, and computer components. It addresses recognition rates degraded by factors that are mixed together, hard to distinguish and easily confused, so that the factors no longer interfere with one another and recognition accuracy improves.

Active Publication Date: 2014-09-03

JIANGSU UNIV

Cites: 4 | Cited by: 30


AI Technical Summary

Problems solved by technology

However, in existing emotion, gender and age recognition tasks based on speech signals, the features extracted by traditional feature extraction methods are often mixed with factors such as emotion, gender, age, speech content and language, which are difficult to distinguish from one another, resulting in poor recognition performance.

[0003] In the paper "Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks" by Dong Yu et al., a deep neural network is used to learn deep features, but these features may mix many factors, such as emotion, gender and age. If such a feature is used for speech emotion recognition, the recognition rate may be affected by the other factors contained in the feature.

At present, there is no feature extraction method that can separate the different types of features contained in speech signals.

Method used



Embodiment Construction

[0015] Figure 1 shows the general idea of the method of the present invention. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into the unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. Convolution and pooling operations then form a local invariant feature y. y is used as the input of a semi-supervised convolutional neural network and is decomposed into four types of features by minimizing four different loss function terms.
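The convolution-and-pooling step that turns a spectrogram into y can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the patented implementation: the kernels are random stand-ins for SAE-pretrained ones, each feature map passes through a sigmoid as an autoencoder encoder would apply, and the pooling grid (2 x 2 regions per map) is a hypothetical choice that keeps y the same length regardless of spectrogram size or kernel size. The names `conv_feature_maps`, `grid_max_pool` and `local_invariant_feature` are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_feature_maps(spec, kernels, bias):
    """Valid 2-D correlation of the whole spectrogram with every kernel of one size,
    followed by a sigmoid (assumed activation, as in an SAE encoder)."""
    k, l, h = kernels.shape
    windows = sliding_window_view(spec, (l, h))          # (T-l+1, F-h+1, l, h)
    maps = np.einsum("tfij,kij->ktf", windows, kernels)  # (k, T-l+1, F-h+1)
    return 1.0 / (1.0 + np.exp(-(maps + bias[:, None, None])))


def grid_max_pool(maps, grid=(2, 2)):
    """Max over a fixed grid of regions, so maps of any size give a fixed-length output."""
    pooled = []
    for m in maps:
        for rows in np.array_split(m, grid[0], axis=0):
            for block in np.array_split(rows, grid[1], axis=1):
                pooled.append(block.max())
    return np.array(pooled)


def local_invariant_feature(spec, kernel_banks, grid=(2, 2)):
    """Stack the pooled responses of every kernel size into one vector y."""
    parts = [grid_max_pool(conv_feature_maps(spec, W, b), grid) for W, b in kernel_banks]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
spec = rng.random((64, 40))  # hypothetical spectrogram: 64 frames x 40 frequency bins
banks = [(rng.standard_normal((8, 4, 4)), np.zeros(8)),  # 8 hypothetical 4x4 kernels
         (rng.standard_normal((6, 8, 6)), np.zeros(6))]  # 6 hypothetical 8x6 kernels
y = local_invariant_feature(spec, banks)
print(y.shape)  # (8 + 6) kernels x 4 pooling regions
```

Because every map is reduced to the same number of pooled values, utterances of different lengths yield vectors y of identical dimension, which is convenient as input to a downstream network.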

[0016] The preprocessed speech signal is divided into spectral blocks of different sizes $l_i \times h_i$, where $i$ is the index of the spectral block. The spectral blocks of different sizes are input into the unsupervised feature learning network SAE and pre-trained to obtain convolution kernels of different sizes, and the convolution kernels of different sizes are then used to convolve the entire spectrogram...
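The pre-training step can be sketched as below. The excerpt does not expand "SAE" or give its architecture, so this sketch assumes a one-hidden-layer sparse autoencoder (sigmoid encoder, linear decoder, KL-divergence sparsity penalty) trained by full-batch gradient descent on randomly sampled $l_i \times h_i$ blocks; all hyperparameters (`rho`, `beta`, `lam`, `lr`, `epochs`) and the helper names are hypothetical. Each hidden unit's input weights, reshaped to $l_i \times h_i$, serve as one convolution kernel.

```python
import numpy as np


def sample_blocks(spec, l, h, n, rng):
    """Draw n random l x h blocks from a spectrogram and flatten each into a row."""
    T, F = spec.shape
    ts = rng.integers(0, T - l + 1, size=n)
    fs = rng.integers(0, F - h + 1, size=n)
    return np.stack([spec[t:t + l, f:f + h].ravel() for t, f in zip(ts, fs)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(X, k, rho=0.05, beta=3.0, lam=1e-4, lr=0.1, epochs=400, seed=0):
    """Assumed SAE: sigmoid encoder, linear decoder, KL sparsity penalty.
    Returns encoder weights (d, k), encoder bias (k,) and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, k)), np.zeros(k)
    W2, b2 = rng.normal(0, 0.1, (k, d)), np.zeros(d)
    history = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                      # hidden activations (n, k)
        X_hat = H @ W2 + b2                           # reconstruction (n, d)
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        loss = (0.5 / n) * np.sum((X_hat - X) ** 2) \
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + beta * kl
        history.append(loss)
        # Backpropagation of the three terms above
        d_xhat = (X_hat - X) / n
        dW2 = H.T @ d_xhat + lam * W2
        db2 = d_xhat.sum(axis=0)
        dH = d_xhat @ W2.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ = dH * H * (1 - H)
        dW1 = X.T @ dZ + lam * W1
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, history


rng = np.random.default_rng(1)
spec = rng.random((64, 40))      # hypothetical preprocessed spectrogram
l, h, k = 4, 4, 8                # one hypothetical block size and kernel count
X = sample_blocks(spec, l, h, 500, rng)
W1, b1, hist = train_sparse_autoencoder(X, k)
kernels = W1.T.reshape(k, l, h)  # hidden unit j -> one l x h convolution kernel
print(kernels.shape)
```

Repeating this for each block size $l_i \times h_i$ yields one kernel bank per size. In practice the blocks would usually be normalized or whitened first; that step is omitted here.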


Abstract

The invention discloses a semi-supervised speech feature variable factor decomposition method. Speech features are divided into four types: emotion-related features, gender-related features, age-related features, and features related to noise, language and other factors. First, the speech is preprocessed to obtain a spectrogram. Spectrogram blocks of different sizes are input into the unsupervised feature learning network SAE, and convolution kernels of different sizes are obtained through pre-training. The kernels of each size are then convolved with the whole spectrogram to obtain a number of feature maps, max pooling is applied to the feature maps, and the pooled features are finally stacked together to form a local invariant feature y. y serves as the input of a semi-supervised convolutional neural network and is decomposed into the four types of features by minimizing four different loss function terms. This solves the problem of low recognition accuracy caused by emotion, gender, age and other factors being mixed in speech features. The method can serve different recognition tasks based on speech signals and can also be used to decompose more factors.
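The abstract does not spell out the four loss terms, so the objective below is a hypothetical reading offered only for illustration, not the patented formulation. It assumes y is split into four parts by linear projections $P_c$ (one per factor), and combines: (1) a reconstruction term over all samples, (2) an orthogonality term that keeps the factor subspaces apart, (3) a softmax cross-entropy term on emotion, gender and age computed only for labelled samples (the semi-supervised part), and (4) L2 weight decay. Factor names, label set sizes, weights and function names are all assumptions.

```python
import numpy as np

FACTORS = ("emotion", "gender", "age", "other")  # hypothetical factor names


def softmax_xent(logits, labels):
    """Mean cross-entropy of integer labels under a softmax over the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()


def decomposition_loss(Y, P, C, labels, mask, weights=(1.0, 1.0, 1.0, 1e-4)):
    """Hypothetical four-term objective.
    Y: (n, D) features; P: factor -> (D, m) projection; C: labelled factor -> (m, classes);
    labels: factor -> (n,) ints; mask: (n,) bool, True where the sample is labelled."""
    Z = {f: Y @ P[f] for f in FACTORS}
    # 1) unsupervised reconstruction of y from the four parts (every sample)
    Y_hat = sum(Z[f] @ P[f].T for f in FACTORS)
    rec = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
    # 2) orthogonality: different factors should occupy non-overlapping subspaces
    orth = sum(np.sum((P[a].T @ P[b]) ** 2)
               for i, a in enumerate(FACTORS) for b in FACTORS[i + 1:])
    # 3) supervised classification, only on labelled samples
    cls = sum(softmax_xent(Z[f][mask] @ C[f], labels[f][mask]) for f in C) if mask.any() else 0.0
    # 4) L2 weight decay on all parameters
    l2 = sum(np.sum(M ** 2) for M in list(P.values()) + list(C.values()))
    terms = np.array([rec, orth, cls, l2])
    return float(np.dot(weights, terms)), terms


rng = np.random.default_rng(0)
n, D, m = 20, 56, 4
Y = rng.random((n, D))                                  # stand-in local invariant features
Q, _ = np.linalg.qr(rng.standard_normal((D, 4 * m)))    # orthonormal columns
P = {f: Q[:, i * m:(i + 1) * m] for i, f in enumerate(FACTORS)}
classes = {"emotion": 6, "gender": 2, "age": 3}         # hypothetical label sets
C = {f: rng.standard_normal((m, k)) for f, k in classes.items()}
labels = {f: rng.integers(0, k, n) for f, k in classes.items()}
mask = np.arange(n) < 8                                 # only 8 of 20 samples labelled
total, terms = decomposition_loss(Y, P, C, labels, mask)
print(terms)  # orthogonality term is ~0 because Q has orthonormal columns
```

Unlabelled samples still shape the projections through the reconstruction and orthogonality terms, which is what makes the sketch semi-supervised; training would minimize `total` with respect to `P` and `C`, for example by automatic differentiation.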

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a method for decomposing speech features.

Background art

[0002] As computers penetrate every corner of life, computing platforms of all kinds need easier input media, and voice is one of the best choices for users. Generally speaking, speech carries many kinds of information, such as the speaker's identity, the speech content, and the speaker's emotion, gender and age. In recent years, the continuous improvement of some applications has promoted the development of speech-signal-based recognition of human emotion, gender, age, speech content, etc. For example, traditional call centers usually connect customers to a randomly chosen operator for telephone consultation and cannot provide personalized services based on the customer's emotion, gender and age, which raises the question of whether the customer's emotion can be judged through the vo...

Claims


Application Information


Patent type & authority: Application (China)

IPC (8): G06K9/00; G06K9/62

CPC: G10L15/16

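Inventors: Mao Qirong (毛启容), Huang Zhengwei (黄正伟), Xue Wentao (薛文韬), Yu Yongbin (于永斌), Zhan Yongzhao (詹永照), Gou Jianping (苟建平), Xing Yuping (邢玉萍)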

Owner: JIANGSU UNIV
