A multidimensional voice message identification system based on a progressive neural network and a method thereof

The invention relates to neural network and speech information technology, applied in speech recognition, speech analysis, instruments, and related fields. It addresses the problem that the correlation among single-task speech information is not fully exploited, and aims to prevent the forgetting effect, improve accuracy, and improve recognition efficiency.

Active Publication Date: 2018-12-07
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, this work does not take full advantage of t

Embodiment Construction

[0029] The multidimensional speech information recognition method based on ProgNets proposed by the present invention is described in detail below with reference to the accompanying drawings and an embodiment:

[0030] The corpus used in this embodiment is KSU-Emotions, which was recorded in two stages; this embodiment uses the second stage. In it, 14 speakers (7 male and 7 female) simulate five emotions (neutral, sad, happy, surprised and angry). Each emotion has 336 sentences, for a total of 1680 sentences, and the second-stage corpus is about 2 hours and 21 minutes long.
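As a quick consistency check on the figures quoted in [0030], the arithmetic can be reproduced in a few lines. This is only an illustrative sketch: the variable names are hypothetical, and the per-speaker figure assumes the sentences are spread evenly across speakers, which the text does not state.

```python
# Figures quoted in paragraph [0030].
n_speakers = 14
n_emotions = 5
sentences_per_emotion = 336
total_duration_s = (2 * 60 + 21) * 60  # "about 2 hours and 21 minutes"

total_sentences = n_emotions * sentences_per_emotion

# Assumption (not stated in the source): an even split across speakers.
sentences_per_speaker = total_sentences / n_speakers

# Rough average length of one sentence, in seconds.
mean_sentence_s = total_duration_s / total_sentences

print(total_sentences)            # 1680
print(sentences_per_speaker)      # 120.0
print(round(mean_sentence_s, 2))  # 5.04
```

The total matches the 1680 sentences stated in the paragraph, and the average sentence comes out at roughly five seconds.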

[0031] To better evaluate the recognition of multi-dimensional speaker information, this embodiment extracts features with an i-vector-based adaptive method: on the basis of Mel-frequency cepstral coefficient (MFCC) features, combined with a Gaussian mixture model (GMM), and the Universal Background Model (UBM) is traine...
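The paragraph above is cut off, so the exact extraction pipeline of the embodiment is not known. As general background only, i-vector extractors are commonly built on zeroth- and first-order statistics of the feature frames collected against a GMM-UBM. The sketch below shows that standard statistics step in plain Python; the toy UBM parameters, the two-dimensional frames standing in for MFCC vectors, and all function names are assumptions for illustration, not the patent's implementation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given frame x."""
    log_p = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(log_p)  # subtract the peak for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sufficient_stats(frames, weights, means, variances):
    """Zeroth-order (N) and first-order (F) statistics per component."""
    n_comp, dim = len(weights), len(means[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for c, g in enumerate(gamma):
            N[c] += g
            for d in range(dim):
                F[c][d] += g * x[d]
    return N, F

# Hypothetical toy UBM with two components in two dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical frames standing in for MFCC vectors of one utterance.
frames = [[0.1, -0.2], [0.0, 0.1], [3.9, 4.2]]

N, F = sufficient_stats(frames, weights, means, variances)
print(N)
```

Two of the three toy frames lie near the first component, so its zeroth-order count is close to 2 and the second component's is close to 1. An actual extractor would then project such statistics through a trained total-variability matrix, which is outside the scope of this sketch.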

Abstract

The invention provides a multidimensional voice message identification system based on a progressive neural network, and a method thereof. A progressive neural network is introduced on top of a baseline system. The baseline system takes the i-vector feature vector as input and comprises three SNN identification models, for gender identification, emotional information identification and identity information identification respectively. On the basis of gender identification, the progressive neural network combines the SNN identification model for gender-related emotional information identification with the SNN identification model for gender-related identity information identification, so that information migrates between the two, to construct an identification system.
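To make the idea of information migrating between columns concrete, here is a minimal sketch of a generic two-column progressive network forward pass, in which a second column receives a lateral connection from the hidden layer of a previously trained, frozen column. It is not the patent's architecture: the layer sizes, the random weights, the four-dimensional input standing in for an i-vector, and the names `Column`, `gender_col` and `emotion_col` are all hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class Column:
    """One column with a single hidden layer.

    If n_lateral > 0, the output layer also receives the hidden
    activations of an earlier column through the lateral weights U2.
    """
    def __init__(self, n_in, n_hidden, n_out, rng, n_lateral=0):
        self.W1 = rand_matrix(n_hidden, n_in, rng)
        self.W2 = rand_matrix(n_out, n_hidden, rng)
        self.U2 = rand_matrix(n_out, n_lateral, rng) if n_lateral else None

    def forward(self, x, lateral_hidden=None):
        hidden = relu(matvec(self.W1, x))
        logits = matvec(self.W2, hidden)
        if self.U2 is not None and lateral_hidden is not None:
            # Lateral connection: reuse features learned by the earlier column.
            logits = add(logits, matvec(self.U2, lateral_hidden))
        return hidden, logits

rng = random.Random(0)

# Source column (e.g. gender, 2 classes); treated as already trained and frozen.
gender_col = Column(n_in=4, n_hidden=3, n_out=2, rng=rng)

# Target column (e.g. emotion, 5 classes) with a lateral input from the source.
emotion_col = Column(n_in=4, n_hidden=3, n_out=5, rng=rng, n_lateral=3)

x = [0.2, -0.1, 0.4, 0.3]  # hypothetical stand-in for an i-vector

gender_hidden, gender_logits = gender_col.forward(x)
_, emotion_logits = emotion_col.forward(x, lateral_hidden=gender_hidden)
print(len(gender_logits), len(emotion_logits))  # 2 5
```

During training of the target column only its own weights and the lateral weights would be updated, which is how progressive networks typically avoid overwriting what the earlier column has learned.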

Description

Technical field

[0001] The invention belongs to the technical field of multi-dimensional speech information recognition, and in particular relates to a progressive neural network-based multi-dimensional speech information recognition system and a method for recognizing several kinds of speech information, specifically gender, emotion and speaker identity.

Background art

[0002] The speech signal is the main means of information transmission and communication between human beings. In everyday situations, a speaker's voice conveys not only semantic information but also information such as the speaker's emotional state, identity, geographical location, and gender. The speech signal we collect is therefore a mixture of many kinds of information. However, current speech recognition research focuses mainly on identifying a single kind of information, which is not conducive to understanding the true meaning of speech. Simultaneous recognition of mu...

Application Information

IPC(8): G10L15/22, G10L15/16, G10L15/06, G10L25/63
CPC: G10L15/063, G10L15/16, G10L15/22, G10L25/63
Inventor: 陈海霞, 杨震
Owner: NANJING UNIV OF POSTS & TELECOMM