
Text-dependent speaker recognition method based on joint deep learning

A text-dependent speaker recognition technique. It addresses problems such as poor robustness and features that cannot accurately represent a speaker's individual characteristics, with the reported effects of improving accuracy, widening gaps and narrowing differences.

Active Publication Date: 2015-06-24
AISPEECH CO LTD
Cites: 6 | Cited by: 58

AI Technical Summary

Problems solved by technology

[0007] The present invention addresses the shortcomings of existing conventional speaker recognition methods: extracted features that cannot accurately represent a speaker's individual characteristics, loss of the dynamic features of the speech signal, poor robustness and poor recognition performance. It proposes a text-dependent speaker recognition method based on joint deep learning that, in the feature extraction stage, uses joint deep learning to extract a j-vector (joint feature vector) and, in the recognition and verification stage, uses linear discriminant analysis (LDA) as the classifier.
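The verification stage pairs j-vectors with LDA. The patent text here does not spell out the scoring rule, so the following is only a minimal sketch under stated assumptions: LDA is fitted on development j-vectors labeled by speaker, used as a projection, and an enrollment/test pair is scored by cosine similarity in the projected space and compared with a threshold. The function names `fit_lda`, `cosine_score` and `verify` and the `reg` ridge term are hypothetical, not taken from the source.

```python
import numpy as np


def fit_lda(X, labels, n_components, reg=1e-6):
    """Fit an LDA projection on development j-vectors (hypothetical helper).

    X: (n_samples, dim) j-vectors; labels: speaker ID per row.
    Returns (global_mean, projection) with projection of shape (dim, n_components).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))  # within-speaker scatter
    Sb = np.zeros((dim, dim))  # between-speaker scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized problem Sb v = lambda Sw v: whiten Sw first,
    # then diagonalize the whitened Sb (reg keeps Sw invertible).
    evals, evecs = np.linalg.eigh(Sw + reg * np.eye(dim))
    W = evecs / np.sqrt(evals)
    evals_b, evecs_b = np.linalg.eigh(W.T @ Sb @ W)
    top = np.argsort(evals_b)[::-1][:n_components]
    return mu, W @ evecs_b[:, top]


def cosine_score(enroll, test, mu, P):
    """Cosine similarity of two j-vectors after centering and LDA projection."""
    e = (np.asarray(enroll, dtype=float) - mu) @ P
    t = (np.asarray(test, dtype=float) - mu) @ P
    return float(e @ t / (np.linalg.norm(e) * np.linalg.norm(t)))


def verify(enroll, test, mu, P, threshold):
    """Accept the claimed identity if the score reaches the threshold (assumed rule)."""
    return cosine_score(enroll, test, mu, P) >= threshold
```

Because the between-speaker scatter has rank at most (number of speakers - 1), `n_components` should not exceed that value. Using LDA as a projection followed by cosine scoring is one common back end; a generative LDA classifier would be an equally plausible reading of the text.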



Examples


Embodiment 1

[0038] In this embodiment, both the text information and the speaker identity are taken into account when training the deep neural network. For simplicity of implementation, the speaker loss function and the text loss function are added directly to obtain a new loss function. Because the gradient is linear, the gradient of each loss term can be computed independently, and the parameters of every non-output layer are then updated with the gradient of the new loss function (the sum of the two losses). When the performance of the two tasks can no longer be improved, the learning rate starts to decrease.
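The summed-loss training can be illustrated with a deliberately small network. This is a minimal NumPy sketch, assuming one shared ReLU hidden layer and two softmax heads (speaker and text); the real network depth, activation, decay factor and parameter names (`W1`, `Ws`, `Wt`, `update_lr`, `factor=0.5`) are hypothetical. It shows the point made in [0038]: the shared layer receives the sum of the gradients sent back by the two heads. The learning-rate rule reads "cannot be improved" as "neither task improved on held-out data", which is an assumption.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y]))


def joint_loss_and_grads(params, x, y_spk, y_txt):
    """Joint loss = speaker cross-entropy + text cross-entropy, with gradients."""
    n = len(x)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])    # shared hidden layer
    p_spk = softmax(h @ params["Ws"] + params["bs"])        # speaker head
    p_txt = softmax(h @ params["Wt"] + params["bt"])        # text head
    loss = cross_entropy(p_spk, y_spk) + cross_entropy(p_txt, y_txt)
    # dLoss/dlogits for each softmax + cross-entropy head
    d_s = p_spk.copy()
    d_s[np.arange(n), y_spk] -= 1.0
    d_s /= n
    d_t = p_txt.copy()
    d_t[np.arange(n), y_txt] -= 1.0
    d_t /= n
    grads = {"Ws": h.T @ d_s, "bs": d_s.sum(axis=0),
             "Wt": h.T @ d_t, "bt": d_t.sum(axis=0)}
    # Shared (non-output) layer: the gradient of a sum of losses is the
    # sum of the gradients each head propagates back.
    d_h = (d_s @ params["Ws"].T + d_t @ params["Wt"].T) * (h > 0)
    grads["W1"] = x.T @ d_h
    grads["b1"] = d_h.sum(axis=0)
    return loss, grads


def sgd_step(params, grads, lr):
    for k in params:
        params[k] -= lr * grads[k]


def update_lr(lr, best, current, factor=0.5):
    """Decay lr when neither task improved.

    best/current: (speaker_accuracy, text_accuracy) on held-out data.
    """
    improved = current[0] > best[0] or current[1] > best[1]
    return lr if improved else lr * factor
```

Summing the losses with equal weight is the simplest choice the text describes; a weighted sum would only scale each head's contribution to `d_h`.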

[0039] The joint learning in this embodiment prevents either task from overfitting and makes the network more effective. Once network training (the development phase) is complete, j-vector features can be extracted from the last layer of the network, as shown in Figure 1. These features can be used with various enrollment and evaluation models.
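Extracting the j-vector after training can be sketched as a forward pass through the trained shared layers. This minimal sketch assumes that "last layer" means the last shared hidden layer with both output heads discarded, that the layers use ReLU, and that an utterance-level j-vector is the average of the frame-level activations followed by L2 normalization; none of these details are confirmed by the excerpt, and `extract_jvector` is a hypothetical name.

```python
import numpy as np


def extract_jvector(frames, hidden_layers):
    """Utterance-level j-vector from frame-level activations (assumed recipe).

    frames: (n_frames, feat_dim) spliced FBANK input.
    hidden_layers: list of (W, b) for the trained shared layers; the speaker
    and text output heads are not used here.
    """
    h = np.asarray(frames, dtype=float)
    for W, b in hidden_layers:
        h = np.maximum(0.0, h @ W + b)  # ReLU assumed
    jvec = h.mean(axis=0)               # average over frames (assumption)
    norm = np.linalg.norm(jvec)
    return jvec / norm if norm > 0 else jvec  # length normalization (assumption)
```

Averaging keeps the j-vector length fixed regardless of utterance duration, which is what lets the same enrollment and scoring back end handle recordings of different lengths.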

[0040] Cosine...



Abstract

The invention discloses a text-dependent speaker recognition method based on joint deep learning, belonging to the field of intelligent speech. The method comprises the following steps: first, FBANK coefficients are extracted from the audio under test; after frame extension, the FBANK coefficients are fed into a neural network to compute the j-vector of the audio under test. An LDA model is retrained and a prediction threshold is obtained. Finally, the j-vector of the speaker's enrollment audio and the j-vector of the speaker's test audio are normalized and input to the LDA model with the prediction threshold, which yields the prediction result. The advantage of the method is that it can greatly improve the accuracy of text-dependent speaker recognition.
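Two steps in the abstract are left unspecified: "frame extension" and how the prediction threshold is obtained. The sketch below makes both concrete under loudly stated assumptions. Frame extension is read as the usual context splicing (each frame concatenated with its neighbours, edges replicated), with a hypothetical `context=5`; the threshold is chosen at the equal error rate (EER) point on development target and non-target trial scores, a common choice that the patent does not confirm. Both function names are hypothetical.

```python
import numpy as np


def splice_frames(fbank, context=5):
    """Frame extension as context splicing (assumed interpretation).

    fbank: (n_frames, n_bins). Returns (n_frames, n_bins * (2*context + 1)),
    where row t holds frames t-context .. t+context, edges replicated.
    """
    fbank = np.asarray(fbank, dtype=float)
    n = fbank.shape[0]
    padded = np.pad(fbank, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])


def eer_threshold(target_scores, nontarget_scores):
    """Pick the threshold where false rejection and false acceptance are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    candidates = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_t, best_gap = candidates[0], np.inf
    for t in candidates:
        frr = np.mean(target_scores < t)       # genuine trials rejected
        far = np.mean(nontarget_scores >= t)   # impostor trials accepted
        if abs(frr - far) < best_gap:
            best_t, best_gap = t, abs(frr - far)
    return float(best_t)
```

A deployment could instead tune the threshold for a target false-acceptance rate; the EER point is simply a neutral default for a sketch.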

Description

Technical field

[0001] The present invention relates to the field of intelligent speech, and in particular to a text-dependent speaker recognition method based on joint deep learning.

Background

[0002] Speaker recognition refers to accepting or rejecting a claimed speaker identity given a speech sample. Speaker recognition technology has been widely used in many fields, such as identity verification, Internet security, human-computer interaction, banking and securities systems, and military and criminal investigation. Speaker recognition is divided into text-dependent and text-independent speaker recognition: the former requires the corpus used to train the model to match the test corpus, while the latter does not. Text-dependent speaker recognition consists of three main modules: feature extraction, model training, and classification and recognition. Studies have shown that the main problem ...


Application Information

IPC(8): G10L17/02; G10L17/18
Inventors: 陈楠昕 (Chen Nanxin), 葛凌廷 (Ge Lingting), 顾昊 (Gu Hao), 常烜恺 (Chang Xuankai), 钱彦旻 (Qian Yanmin), 俞凯 (Yu Kai)
Owner AISPEECH CO LTD