Emotional speech processing

An emotional speech processing technology, applied in speech analysis, speech recognition, instruments, and related fields, that addresses the poor performance of statistical recognition models on emotional speech and the ambiguity and indistinguishability of emotion classes.

Active Publication Date: 2016-05-11
SONY COMPUTER ENTERTAINMENT INC
Cites: 4 | Cited by: 12

AI Technical Summary

Problems solved by technology

However, processing emotional speech is very challenging.
For example, the characteristics of emotional speech differ significantly from those of spoken/conversational speech, so statistical speech recognition models trained on spoken speech do not perform well when confronted with emotional speech.
In addition, emotion recognition is difficult because different speakers express their emotions in different ways, which makes the emotion classes ambiguous and hard to distinguish.

Embodiment

[0017] According to aspects of the present disclosure, the emotion clustering method may be based on Probabilistic Linear Discriminant Analysis (PLDA). For example, each emotional utterance can be modeled as a Gaussian Mixture Model (GMM) mean supervector. Figure 1 shows an example of generating a GMM supervector (GMMSV). Initially, one or more speech signals 101 are received. Each speech signal 101 may be any segment of human speech. By way of example and not limitation, the signal 101 may comprise single syllables, words, sentences, or any combination of these. Likewise by way of example and not limitation, the speech signal 101 may be captured with a local microphone or received over a network, and may be recorded, digitized, and/or stored in computer memory or another non-transitory storage medium. The speech signal 101 can then be used for PLDA model training and/or for emotion clustering or emotion classification. In some embodiments, the speech signal used for PLDA model trai...
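Paragraph [0017] models each utterance as a GMM mean supervector, but the excerpt is cut off before it says how the means are obtained. A common recipe, assumed here rather than taken from the patent, is relevance-MAP adaptation of the means of a universal background model (UBM) to one utterance's frames, followed by concatenating the adapted means into a single fixed-length vector. The function names, the diagonal-covariance UBM, and the relevance factor of 16 are illustrative assumptions; this is a minimal sketch, not the patent's implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one utterance and stack them into a supervector.

    frames:    per-frame feature vectors (e.g. MFCCs) for one utterance
    weights:   UBM mixture weights, one per component
    means:     UBM component means, each a list of length D
    variances: UBM diagonal covariances, each a list of length D
    relevance: MAP relevance factor (16 is a common choice; an assumption here)
    """
    K, D = len(weights), len(means[0])
    n = [0.0] * K                       # zeroth-order statistics (soft counts)
    f = [[0.0] * D for _ in range(K)]   # first-order statistics
    for x in frames:
        # Component posteriors, computed with log-sum-exp for numerical stability.
        logp = [math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])
                for k in range(K)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for k in range(K):
            g = post[k] / total
            n[k] += g
            for d in range(D):
                f[k][d] += g * x[d]
    supervector = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # more data -> more weight on the data mean
        for d in range(D):
            data_mean = f[k][d] / n[k] if n[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[k][d])
    return supervector
```

With a two-component, one-dimensional UBM (means 0 and 10, unit variances), frames [1], [1], [9] pull each mean only partway toward its data because of the relevance factor; with `relevance=0.0` the result is simply the per-component data means (about 1 and 9). The output length is always K × D, which is what lets utterances of different durations be compared by a downstream model such as PLDA.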

Abstract

The invention relates to emotional speech processing. A method for emotion or speaking-style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion- or speaking-style-dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking-style-dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style. It is emphasized that this abstract is provided to comply with the rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
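The abstract's pipeline (extract features from every frame, train a model on the training data, transform test data with that model, then classify) can be sketched end to end. This is a minimal sketch under heavy assumptions: the two per-frame features (log energy and zero-crossing rate), the 400/160-sample framing, averaging frames into one vector, and the z-score-plus-nearest-centroid "model" are illustrative stand-ins, not the PLDA model the patent describes. All function names are hypothetical.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame):
    """Hypothetical per-frame features (frame needs >= 2 samples)."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    # The 1e-10 floor keeps log() finite on an all-zero (silent) frame.
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


def utterance_vector(samples, frame_len=400, hop=160):
    """Extract features from every frame, then average them into one vector."""
    feats = [frame_features(fr) for fr in frame_signal(samples, frame_len, hop)]
    if not feats:
        raise ValueError("signal is shorter than one frame")
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def train_model(vectors, labels):
    """Stand-in model (not PLDA): z-score statistics plus one centroid per class."""
    dim = len(vectors[0])
    mu = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    sd = [math.sqrt(sum((v[d] - mu[d]) ** 2 for v in vectors) / len(vectors)) or 1.0
          for d in range(dim)]

    def transform(v):
        # Map a raw vector into the standardized space learned from training data.
        return [(v[d] - mu[d]) / sd[d] for d in range(dim)]

    centroids = {}
    for lab in set(labels):
        members = [transform(v) for v, l in zip(vectors, labels) if l == lab]
        centroids[lab] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return transform, centroids


def classify(vector, transform, centroids):
    """Transform a test vector, then return the label of the nearest centroid."""
    z = transform(vector)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))
```

For example, after training on two loud alternating-sign signals labeled "angry" and two quiet sinusoids labeled "neutral" (hypothetical labels and toy signals), a new loud alternating signal is assigned to "angry". Per the embodiment text, a real system would instead represent each utterance as a GMM supervector and use a trained PLDA model as the transform.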

Description

[0001] Related applications

[0002] This application claims priority to commonly assigned US Provisional Patent Application No. 62/030,013, filed July 28, 2014, the entire disclosure of which is incorporated herein by reference. This application also claims priority to commonly assigned US Patent Application Serial No. 14/743,673, filed June 18, 2015, the entire disclosure of which is incorporated herein by reference.

Technical field

[0003] The present disclosure relates to speech processing, and more particularly to emotional speech processing.

Background

[0004] Emotional speech processing is important for many applications, including user interfaces, games, and more. However, processing emotional speech is very challenging. For example, the characteristics of emotional speech differ significantly from those of spoken/conversational speech, so statistical speech recognition models trained on spoken speech do not perform well when confronted with emotional spee...

Application Information

Patent type & authority: Application (China)
IPC(8): G10L15/06, G10L15/18, G10L25/63, G10L25/27
CPC: G10L15/07, G10L17/26, G10L25/63, G10L15/063
Inventors: O. Kalinli-Akbacak (O.卡林利-阿卡巴卡克), Ruxin Chen (陈如新)
Owner SONY COMPUTER ENTERTAINMENT INC