Text-dependent speaker recognition method based on joint deep learning

A text-dependent speaker recognition technique that addresses problems such as poor robustness and the inability to accurately represent a speaker's individual characteristics, with the stated effects of improving accuracy, widening gaps, and narrowing differences.

Active Publication Date: 2015-06-24

AISPEECH CO LTD



AI Technical Summary

Problems solved by technology

[0007] The present invention addresses the shortcomings of traditional speaker recognition methods: extracted features that cannot accurately represent a speaker's individual characteristics, loss of the dynamic features of the speech signal, poor robustness, and poor recognition performance. It proposes a text-dependent speaker recognition method based on joint deep learning. In the feature extraction stage, joint deep learning is used to extract a j-vector (joint vector, i.e., a joint feature vector); in the recognition and verification stage, linear discriminant analysis (LDA) is used as the classifier.



Examples


Embodiment 1

[0038] In this embodiment, both the text information and the speaker identity are taken into account when training the deep neural network. For simplicity of implementation, the speaker loss function and the text loss function are added directly to obtain a new loss function. Because the gradient is linear, the gradient with respect to each coefficient can be computed independently for each loss, and the coefficients of each non-output layer are then updated with the gradient of the new loss function (the sum of the two loss functions). When the performance of neither network can be improved further, the learning rate begins to decrease.
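The summed loss and its gradient can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the patent's implementation: the single shared tanh layer, the two softmax output layers, the layer sizes, the random data, and every name (`joint_loss_and_grad`, `W_h`, `W_spk`, `W_txt`) are hypothetical. It shows that the shared-layer gradient of the summed loss is the sum of the two per-task gradients.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def joint_loss_and_grad(x, spk, txt, W_h, W_spk, W_txt):
    """Return the summed loss and each task's gradient w.r.t. the shared layer W_h."""
    n = len(x)
    h = np.tanh(x @ W_h)                   # shared (non-output) layer, assumed tanh
    p_spk = softmax(h @ W_spk)             # speaker output layer
    p_txt = softmax(h @ W_txt)             # text output layer
    loss = cross_entropy(p_spk, spk) + cross_entropy(p_txt, txt)

    # Gradient of each mean cross-entropy loss w.r.t. its own logits.
    d_spk = p_spk.copy()
    d_spk[np.arange(n), spk] -= 1.0
    d_spk /= n
    d_txt = p_txt.copy()
    d_txt[np.arange(n), txt] -= 1.0
    d_txt /= n

    # Back-propagate each task separately through the shared layer.
    g_spk = x.T @ ((d_spk @ W_spk.T) * (1.0 - h ** 2))
    g_txt = x.T @ ((d_txt @ W_txt.T) * (1.0 - h ** 2))
    return loss, g_spk, g_txt

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                # 8 hypothetical input frames, 5 dims each
spk = rng.integers(0, 3, size=8)           # hypothetical speaker labels (3 speakers)
txt = rng.integers(0, 4, size=8)           # hypothetical text labels (4 phrases)
W_h = rng.normal(scale=0.1, size=(5, 6))
W_spk = rng.normal(scale=0.1, size=(6, 3))
W_txt = rng.normal(scale=0.1, size=(6, 4))

loss, g_spk, g_txt = joint_loss_and_grad(x, spk, txt, W_h, W_spk, W_txt)
W_h_updated = W_h - 0.1 * (g_spk + g_txt)  # one step on the summed gradient
```

The step size of 0.1 is arbitrary; in the scheme described above it would be reduced once neither task improves.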

[0039] The joint learning of this embodiment avoids overfitting to either task and makes the network more effective. Once network training (the development phase) is complete, j-vector features can be extracted from the last layer of the network, as shown in Figure 1. This feature can be used in various enrollment (registration) and evaluation models.
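A minimal sketch of the extraction step follows. It rests on assumptions the text does not confirm: that the j-vector is read from the last hidden layer rather than from the task output layers, that frame-level activations are averaged into one vector per utterance, and that the layers use tanh. The name `extract_jvector` and all sizes are hypothetical.

```python
import numpy as np

def extract_jvector(frames, hidden_layers):
    """Forward frames through the hidden layers and average the last activations.

    frames: (num_frames, input_dim) array of network inputs.
    hidden_layers: list of (W, b) pairs for the trained non-output layers.
    """
    h = frames
    for W, b in hidden_layers:
        h = np.tanh(h @ W + b)             # assumed activation function
    return h.mean(axis=0)                  # assumed frame averaging

rng = np.random.default_rng(1)
layers = [
    (rng.normal(scale=0.1, size=(20, 16)), np.zeros(16)),
    (rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)),
]
utterance = rng.normal(size=(50, 20))      # 50 hypothetical input frames
jvec = extract_jvector(utterance, layers)  # one 8-dimensional vector
```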

[0040] Cosine...


Abstract

The invention discloses a text-dependent speaker recognition method based on joint deep learning and belongs to the field of intelligent speech. In the method, FBANK coefficients are first extracted from the audio under test; after frame expansion, the FBANK coefficients are fed into a neural network to obtain the j-vector of the audio under test. An LDA model is then trained and a prediction threshold is obtained. Finally, the j-vector of the speaker's enrollment audio and the j-vector of the speaker's test audio are normalized and input into the LDA model with the prediction threshold to obtain the prediction result. The method can greatly improve the accuracy of text-dependent speaker recognition.
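Two of the steps in the abstract, frame expansion and normalization, can be sketched as follows. The abstract does not specify either one, so the symmetric context window, the edge padding, and the use of unit-length normalization are assumptions, and the names `splice_frames` and `length_normalize` are hypothetical. The neural network and the LDA scoring are not shown.

```python
import numpy as np

def splice_frames(fbank, context=5):
    """Concatenate each frame with `context` neighbours on both sides.

    fbank: (num_frames, num_bins) array. Edge frames are repeated as padding.
    Returns an array of shape (num_frames, (2 * context + 1) * num_bins).
    """
    padded = np.pad(fbank, ((context, context), (0, 0)), mode="edge")
    n = len(fbank)
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def length_normalize(vec):
    """Scale a vector to unit Euclidean length (one possible normalization)."""
    return vec / np.linalg.norm(vec)

fbank = np.arange(12, dtype=float).reshape(4, 3)   # 4 toy frames, 3 bins each
spliced = splice_frames(fbank, context=1)          # shape (4, 9)
unit = length_normalize(np.array([3.0, 4.0]))      # array([0.6, 0.8])
```

With `context=1`, the second row of `spliced` holds frames 0, 1, and 2 side by side, and the first row repeats frame 0 in place of the missing left neighbour.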

Description

Technical field

[0001] The present invention relates to a technology in the field of intelligent speech, in particular to a text-dependent speaker recognition method based on joint deep learning.

Background

[0002] Speaker recognition refers to accepting or rejecting a speaker's claimed identity given voice input. Speaker recognition technology has been widely used in many fields, such as identity verification, Internet security, human-computer interaction, banking and securities systems, and military and criminal investigation. Speaker recognition technology is divided into text-dependent and text-independent speaker recognition. The former requires the training corpus of the model to match the test corpus, while the latter does not. Text-dependent speaker recognition consists mainly of three modules: feature extraction, model training, and classification and recognition. Studies have shown that the main problem ...


Application Information


IPC(8): G10L17/02, G10L17/18

Inventor: 陈楠昕, 葛凌廷, 顾昊, 常烜恺, 钱彦旻, 俞凯

Owner: AISPEECH CO LTD
