emotional speech processing

An emotional speech processing technique, applicable to speech analysis, speech recognition, and related fields, that addresses two problems: statistical sound recognition models perform poorly on emotional speech, and emotion classes are ambiguous and hard to distinguish across speakers.

Active Publication Date: 2021-06-01

SONY COMPUTER ENTERTAINMENT INC


Problems solved by technology

Processing emotional speech is very challenging.

For example, emotional speech characteristics differ significantly from spoken/conversational speech, so statistical sound recognition models trained on spoken speech do not perform well on emotional speech.

In addition, emotion recognition is difficult because different speakers express their emotions in different ways, which makes the emotion classes ambiguous and hard to distinguish.



Embodiment

[0017] According to aspects of the present disclosure, the emotion clustering method may be based on Probabilistic Linear Discriminant Analysis (PLDA). For example, each emotional utterance can be modeled as a Gaussian Mixture Model (GMM) mean supervector. Figure 1 shows an example of generating a GMM supervector (GMM SV). Initially, one or more speech signals 101 are received. Each speech signal 101 may be any segment of human speech. By way of example and not limitation, the signal 101 may comprise single syllables, words, sentences, or any combination of these. Likewise, the speech signal 101 may be captured with a local microphone or received over a network, and recorded, digitized, and/or stored in computer memory or another non-transitory storage medium. Afterwards, the speech signal 101 can be used for PLDA model training and/or for emotion clustering or emotion classification. In some embodiments, the speech signal used for PLDA model tra...
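As a rough illustration of what a GMM mean supervector looks like in practice, the sketch below adapts the means of a small background GMM to one utterance and stacks them into a single vector. The excerpt above does not say how the supervector is computed, so everything specific here is an assumption: the relevance-MAP mean adaptation (a common recipe, not necessarily the one in the patent), the diagonal covariances, the function name `gmm_mean_supervector`, the `relevance` parameter, and the toy two-component background model.

```python
import numpy as np

def gmm_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the means of a diagonal-covariance GMM to one utterance
    and stack them into a single vector of length K * D.

    frames:    (T, D) acoustic feature vectors for the utterance
    weights:   (K,)   mixture weights of the background GMM
    means:     (K, D) component means of the background GMM
    variances: (K, D) diagonal covariances of the background GMM
    """
    frames = np.asarray(frames, dtype=float)
    # Log-density of every frame under every component, shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    # Normalise per frame to get component posteriors (max-shift for stability).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)        # soft frame count per component, (K,)
    first = post.T @ frames       # first-order statistics, (K, D)
    data_mean = first / np.maximum(n_k, 1e-10)[:, None]
    # Assumed relevance-MAP update: blend the data mean with the prior mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * data_mean + (1.0 - alpha) * means
    return adapted.reshape(-1)

# Hypothetical toy background model with K=2 components in D=2 dimensions.
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [4.0, 4.0]])
ubm_vars = np.ones((2, 2))

rng = np.random.default_rng(0)
utterance = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))
sv = gmm_mean_supervector(utterance, ubm_weights, ubm_means, ubm_vars)
print(sv.shape)  # (4,)
```

Each utterance, whatever its length, maps to a vector of the same size, which is what makes supervectors convenient inputs for a downstream model such as PLDA.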


Abstract

Described herein are methods for emotion or utterance recognition and/or clustering, comprising receiving one or more speech samples, generating a set of training data, and generating a model from the set of training data, wherein the model identifies emotion or speech pattern related information in the set of training data. The method may further include receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from each frame of the one or more test speech samples, using the model to transform the set of test data to better represent emotion and/or speaking style related information, and using the transformed data for clustering and/or classification to find speech with similar emotion or speaking style.
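The per-frame feature extraction step mentioned in the abstract can be sketched as follows. The abstract does not name the acoustic features, so the two used here (log energy and zero-crossing rate) are simple stand-ins chosen for illustration. The 25 ms frame length and 10 ms hop at a 16 kHz sampling rate, as well as the function names `frame_signal` and `frame_features`, are likewise assumptions.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def frame_features(signal, frame_len=400, hop=160):
    """Return one feature row per frame: [log energy, zero-crossing rate]."""
    frames = frame_signal(signal, frame_len, hop)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    signs = np.signbit(frames)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.column_stack([log_energy, zcr])

# Hypothetical test sample: one second of a 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 220.0 * t)
feats = frame_features(tone)
print(feats.shape)  # (98, 2)
```

The resulting matrix of per-frame features is the kind of test data that a trained model could then transform before clustering or classification.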

Description

[0001] Related applications

[0002] This application claims priority to commonly assigned US Provisional Patent Application No. 62/030,013, filed July 28, 2014, the entire disclosure of which is incorporated herein by reference. This application also claims priority to commonly assigned US Patent Application Serial No. 14/743,673, filed June 18, 2015, the entire disclosure of which is incorporated herein by reference.

Technical field

[0003] The present disclosure relates to speech processing, and more particularly to emotional speech processing.

Background

[0004] Emotional speech processing is important for many applications including user interfaces, games and more. However, processing emotional speech is very challenging. For example, emotional speech characteristics are significantly different from spoken/conversational speech, and thus statistical sound recognition models trained with spoken speech do not perform well when encountered with emotional spee...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L15/06, G10L15/18, G10L15/08, G10L25/63, G10L25/24, G10L25/27

CPC: G10L15/07, G10L17/26, G10L25/63, G10L15/063

Inventors: O. Kalinli-Akbacak, Ruxin Chen

Owner: SONY COMPUTER ENTERTAINMENT INC
