emotional speech processing

A speech processing and emotion technology applied in speech analysis, speech recognition, instruments, etc. It addresses the poor performance of statistical speech recognition models on emotional speech and the ambiguous, hard-to-distinguish emotion classes.

Active Publication Date: 2021-06-01
SONY COMPUTER ENTERTAINMENT INC
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, processing emotional speech is very challenging.
For example, emotional speech characteristics differ significantly from those of spoken/conversational speech, so statistical speech recognition models trained on spoken speech do not perform well when they encounter emotional speech.
In addition, emotion recognition is difficult because different speakers express their emotions in different ways, which makes the emotion classes ambiguous and hard to distinguish.

Embodiment

[0017] According to aspects of the present disclosure, the emotion clustering method may be based on Probabilistic Linear Discriminant Analysis (PLDA). For example, each emotional utterance can be modeled as a Gaussian Mixture Model (GMM) mean supervector. Figure 1 shows an example of generating a GMM supervector (GMM SV). Initially, one or more speech signals 101 are received. Each speech signal 101 may be any segment of human speech. By way of example and not limitation, the signal 101 may comprise single syllables, words, sentences, or any combination of these. By way of example and not limitation, the speech signal 101 may be captured with a local microphone or received over a network, recorded, digitized, and/or stored in computer memory or another non-transitory storage medium. The speech signal 101 can then be used for PLDA model training and/or for emotion clustering or emotion classification. In some embodiments, the speech signal used for PLDA model tra...
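The excerpt does not say how the GMM mean supervector is computed. A common recipe, assumed here and not confirmed by the source, is relevance-MAP adaptation of a pre-trained diagonal-covariance universal background model (UBM): accumulate per-component posteriors over the utterance's frames, shift each UBM mean toward the frames it explains, and stack the adapted means. The function names and the relevance factor of 16 are illustrative assumptions; the sketch is pure Python so it runs without extra packages.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_supervector(frames: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    means: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[float]],
                    relevance: float = 16.0) -> List[float]:
    """MAP-adapt UBM means to one utterance and stack them (hypothetical helper).

    frames: per-frame acoustic feature vectors of one utterance.
    weights, means, variances: an assumed, already trained diagonal UBM.
    relevance: relevance factor r of relevance-MAP adaptation (assumed value).
    """
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp                        # zeroth-order stats n_c
    first = [[0.0] * dim for _ in range(n_comp)]   # first-order stats F_c
    for x in frames:
        # Component posteriors, normalized in the log domain for stability.
        logp = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        probs = [math.exp(lp - top) for lp in logp]
        norm = sum(probs)
        for c in range(n_comp):
            g = probs[c] / norm
            counts[c] += g
            for d in range(dim):
                first[c][d] += g * x[d]
    supervector: List[float] = []
    for c in range(n_comp):
        alpha = counts[c] / (counts[c] + relevance)  # data-dependent weight
        for d in range(dim):
            # Mean of the frames assigned to component c; falls back to the
            # UBM mean when the component received no probability mass.
            data_mean = first[c][d] / counts[c] if counts[c] > 0 else means[c][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[c][d])
    return supervector
```

With C components and D-dimensional frame features, the supervector has C × D entries. Components that see few frames stay close to the UBM, which keeps supervectors from short utterances stable.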

Abstract

Described herein are methods for emotion or speaking style recognition and/or clustering, comprising receiving one or more speech samples, generating a set of training data, and generating a model from the set of training data, wherein the model identifies emotion- or speech-pattern-related information in the set of training data. The method may further include receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from each frame of the one or more test speech samples, using the model to transform the set of test data to better represent emotion- and/or speaking-style-related information, and using the transformed data for clustering and/or classification to find speech with similar emotion or speaking style.
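The train-then-cluster flow in the abstract, together with the PLDA basis mentioned in [0017], can be sketched with a deliberately simplified stand-in. Full PLDA is usually trained with EM and a low-rank class subspace; this sketch instead assumes a diagonal two-covariance model fitted by simple moments on labeled training vectors (for example, GMM supervectors), scores a pair of vectors with the same-class versus different-class log-likelihood ratio, and groups test vectors with a greedy leader clustering. It scores pairs rather than explicitly transforming the test data. The model simplification, the clustering rule, and the names `fit_diag_two_covariance`, `plda_llr`, and `cluster` are all assumptions, not the patent's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def fit_diag_two_covariance(vectors: Sequence[Vec], labels: Sequence[str]
                            ) -> Tuple[List[float], List[float], List[float]]:
    """Moment estimates for a simplified PLDA (diagonal two-covariance model).

    Assumed model per dimension d: x_d = mean_d + y_d + e_d, where the class
    (emotion) term y_d ~ N(0, between_d) is shared by all utterances of a
    class and the residual e_d ~ N(0, within_d) is drawn per utterance.
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    groups: Dict[str, List[Vec]] = {}
    for v, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(v)
    class_means = {lab: [sum(v[d] for v in vs) / len(vs) for d in range(dim)]
                   for lab, vs in groups.items()}
    within = [sum((v[d] - class_means[lab][d]) ** 2
                  for v, lab in zip(vectors, labels)) / n for d in range(dim)]
    between = [sum((cm[d] - mean[d]) ** 2 for cm in class_means.values())
               / len(class_means) for d in range(dim)]
    floor = 1e-6  # variance floor so scoring never divides by zero
    return mean, [b + floor for b in between], [w + floor for w in within]


def plda_llr(x1: Vec, x2: Vec, mean: Vec, between: Vec, within: Vec) -> float:
    """Log-likelihood ratio: x1 and x2 share a class vs. come from different classes."""
    score = 0.0
    for a, c, m, b, w in zip(x1, x2, mean, between, within):
        u1, u2 = a - m, c - m
        tot = b + w
        # Same class: joint covariance [[tot, b], [b, tot]].
        det_same = tot * tot - b * b
        quad_same = (tot * (u1 * u1 + u2 * u2) - 2.0 * b * u1 * u2) / det_same
        # Different classes: joint covariance diag(tot, tot).
        quad_diff = (u1 * u1 + u2 * u2) / tot
        # log N_same - log N_diff; the shared -log(2*pi) terms cancel.
        score += (-0.5 * math.log(det_same) - 0.5 * quad_same
                  + math.log(tot) + 0.5 * quad_diff)
    return score


def cluster(vectors: Sequence[Vec], mean: Vec, between: Vec, within: Vec,
            threshold: float = 0.0) -> List[List[int]]:
    """Greedy leader clustering: join the cluster with the highest mean LLR
    above `threshold`, otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for i, v in enumerate(vectors):
        best, best_score = -1, threshold
        for k, members in enumerate(clusters):
            s = sum(plda_llr(v, vectors[j], mean, between, within)
                    for j in members) / len(members)
            if s > best_score:
                best, best_score = k, s
        if best < 0:
            clusters.append([i])
        else:
            clusters[best].append(i)
    return clusters
```

A threshold of 0 joins a cluster only when the same-class hypothesis is more likely than the different-class one; raising it produces more, purer clusters. The leader rule depends on input order, so an agglomerative scheme over the same LLR scores is a reasonable alternative.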

Description

[0001] Related applications

[0002] This application claims priority to commonly assigned US Provisional Patent Application No. 62/030,013, filed July 28, 2014, the entire disclosure of which is incorporated herein by reference. This application also claims priority to commonly assigned US Patent Application Serial No. 14/743,673, filed June 18, 2015, the entire disclosure of which is incorporated herein by reference.

Technical field

[0003] The present disclosure relates to speech processing, and more particularly to emotional speech processing.

Background

[0004] Emotional speech processing is important for many applications, including user interfaces, games, and more. However, processing emotional speech is very challenging. For example, emotional speech characteristics differ significantly from those of spoken/conversational speech, so statistical speech recognition models trained on spoken speech do not perform well when they encounter emotional spee...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/06, G10L15/18, G10L15/08, G10L25/63, G10L25/24, G10L25/27
CPC: G10L15/07, G10L17/26, G10L25/63, G10L15/063
Inventors: O. Kalinli-Akbacak, Ruxin Chen
Owner: SONY COMPUTER ENTERTAINMENT INC