Speaker identification method based on joint optimization of total variation space and classifier

What is Al technical title?
Al technical title is built by PatSnap Al team. It summarizes the technical point description of the patent document.
A technology of joint optimization and identification, applied in the field of speaker identification, can solve the problems of affecting system identification performance, high error rate of speaker identification, unfavorable identification tasks, etc.

Active Publication Date: 2021-03-23

HARBIN INST OF TECH

View PDF7 Cites 0 Cited by

Summary
Abstract
Description
Claims
Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

However, the current total change space estimation methods do not consider the needs of the task, which is not conducive to the identification task, and then affects the recognition performance of the system, resulting in a high error rate in speaker identification.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Image

Smart Image Click on the blue labels to locate them in the text.

Viewing Examples

Smart Image

Examples

Experimental program

Comparison scheme

Effect test

specific Embodiment approach 1

[0026] Specific implementation mode one: as figure 1 As shown, the speaker identification method based on the joint optimization of the total variation space and the classifier described in this embodiment comprises the following steps:

[0027] Step 1. Input the Mel cepstral coefficients of each segment of speech in the training set into the general background model, and use the maximum a posteriori probability method on the general background model to adapt to obtain the Gaussian mixture model corresponding to each segment of speech. Using the Gaussian mixture model Obtain the mean supervector corresponding to each segment of speech in the training set;

[0028] Then the mean supervector corresponding to each segment of speech in the training set forms the mean supervector set;

[0029] Step 2, calculate the covariance matrix Φ of the mean value m of the mean value supervector corresponding to all segments of speech in the training set and the corresponding mean value super...

specific Embodiment approach 2

[0044] Specific embodiment two: the difference between this embodiment and specific embodiment one is: the Gaussian mixture model is used to obtain the mean supervector corresponding to each section of speech in the training set, and its specific process is:

[0045] Suppose the training set contains a total of S 0 speaker's speech, and the total number of speech segments containing the sth speaker is H s , s=1,2,...,S 0 ;

[0046] According to the mean value μ of all Gaussian components corresponding to the h-th segment of speech of the s-th speaker c , c=1,2,...,C, obtain the mean supervector M corresponding to the h segment of the s speaker's speech s,h ,M s,h The expression is:

[0047]

[0048] Among them: C represents the number of mean values of the Gaussian components corresponding to the h-th segment of the s-th speaker’s speech, μ 1 Represents the mean value of the first Gaussian component corresponding to the h-th segment of speech of the s-th speaker;

...

specific Embodiment approach 3

[0050] Specific implementation mode three: the difference between this implementation mode and specific implementation mode two is: the specific process of said step two is:

[0051]

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

PUM

Login to View More

Abstract

The invention discloses a speaker identity recognition method based on combined optimization of a total change space and a classifier, and belongs to the technical field of speaker identity recognition. The speaker identity recognition method solves the problem that a current total change space estimation method is high in equal error rate for speaker identity recognition. The speaker identity recognition method comprises the steps that firstly, the representation of a training set mean value hyper vector in an initial total change space is obtained; then the length of the representation is normalized and input to the classifier PLDA; then, under the supervision of the classifier PLDA, the parameters of the classifier and parameters of the total change space are updated, the steps are repeated until the set maximum number of iterations is reached, and the final classifier parameters and the total change space parameters are obtained; and during the test, the mean hyper vector of a testspeech and the mean hyper vector of a target speaker are used for calculating the representation of the mean hyper vectors in the total change space, then the length of the representation is normalized, and the joint probability density of the mean hyper vectors on the classifier is calculated as the basis for final classification. The speaker identity recognition method can be applied to the technical field of speaker identification.

Description

technical field [0001] The invention belongs to the technical field of speaker identification, and in particular relates to a speaker identification method based on joint optimization of a total variation space and a classifier. Background technique [0002] Speech is an important information carrier for human beings to communicate emotion and cognition, and it is the most basic and natural way of communication in life and work. With the development of information technology, it is possible to identify the identity of the speaker by analyzing the personal characteristics in the speech signal. Speaker identification technology also has a broad space for development because of its good accuracy, economy and scalability. Among many speaker identification technologies, the speaker identification method based on the identity-vector (i-vector) framework is the most widely used due to its excellent performance and high efficiency. [0003] The core technology of the I-vector fram...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

Application Information

Patent Timeline

Login to View More

Patent Type & Authority Patents(China)

IPC IPC(8): G10L17/12G10L17/04G10L25/24

CPCG10L17/04G10L17/12G10L25/24

Inventor 韩纪庆陈晨郑贵滨郑铁然

Owner HARBIN INST OF TECH

Features

R&D
Intellectual Property
Life Sciences
Materials
Tech Scout

Why Patsnap Eureka

Unparalleled Data Quality
Higher Quality Content
60% Fewer Hallucinations

Social media

Patsnap Eureka Blog

Learn More

Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.

Speaker identification method based on joint optimization of total variation space and classifier

AI Technical Summary This helps you quickly interpret patents by identifying the three key elements: Problems solved by technologyMethod usedBenefits of technology

Problems solved by technology

Method used

Image

Examples

specific Embodiment approach 1

specific Embodiment approach 2

specific Embodiment approach 3

PUM

Abstract

Description

Claims

Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology