
A Speaker Recognition Method Based on Mutual Information Estimation

This speaker recognition technology based on mutual information estimation, applied in the field of speaker recognition, addresses the inability to judge whether a speaker's feature representation is unique, and achieves the effects of reducing the equal error rate (EER) and optimizing network training.

Active Publication Date: 2022-07-05
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

When a deep neural network is used directly for unsupervised learning to extract speaker features, it is impossible to judge whether the speaker feature representation extracted by the network is unique and highly expressive.



Examples


Embodiment

[0034] The technical solution adopted by the present invention is a method for speaker identification based on mutual information estimation, which comprises the following steps:

[0035] Step 1. Preprocess all the voices in the dataset and extract spectrogram features;

[0036] Step 2. In the training phase, the spectrogram extracted from the speech is first used as the input of the VGG-M network; random triplet sampling is then performed on the training data to obtain positive and negative sample pairs; finally, mutual information estimation is performed on the positive and negative sample pairs, and the objective function based on mutual information estimation is used to train the network and update the network parameters;

[0037] Step 3. Use the trained VGG-M network to extract the embedded feature vectors that represent speaker identity for the test voice and the target speaker voice;

[0038] Step 4. Calculate the cosine distance between th...
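The training objective in Step 2 can be illustrated with a minimal Python sketch. The excerpt does not say which mutual information estimator or critic function is used, so everything specific here is an assumption: the sketch uses a Jensen-Shannon-style estimate (the mean of -softplus(-T) over positive pairs minus the mean of softplus(T) over negative pairs), a plain dot product as the critic T, and hypothetical names (`sample_triplet`, `critic`, `jsd_mi_estimate`, `mi_loss`). For brevity it samples triplets from toy embedding vectors rather than passing spectrograms through a VGG-M network.

```python
import math
import random

def sample_triplet(utts_by_speaker, rng):
    """Pick (anchor, positive, negative); anchor and positive share a speaker."""
    # Only speakers with at least two utterances can supply an anchor/positive pair.
    eligible = [s for s, utts in utts_by_speaker.items() if len(utts) >= 2]
    spk = rng.choice(eligible)
    anchor, positive = rng.sample(utts_by_speaker[spk], 2)
    other = rng.choice([s for s in utts_by_speaker if s != spk])
    negative = rng.choice(utts_by_speaker[other])
    return anchor, positive, negative

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def critic(u, v):
    # Assumed critic: a dot product. The source does not specify this function.
    return sum(a * b for a, b in zip(u, v))

def jsd_mi_estimate(pos_pairs, neg_pairs):
    """Assumed Jensen-Shannon-style estimate: E_pos[-softplus(-T)] - E_neg[softplus(T)]."""
    pos_term = sum(-softplus(-critic(a, p)) for a, p in pos_pairs) / len(pos_pairs)
    neg_term = sum(softplus(critic(a, n)) for a, n in neg_pairs) / len(neg_pairs)
    return pos_term - neg_term

def mi_loss(pos_pairs, neg_pairs):
    # Training would minimize the negated estimate, i.e. maximize the estimate.
    return -jsd_mi_estimate(pos_pairs, neg_pairs)

# Toy two-dimensional "embeddings" standing in for network outputs.
embeddings = {
    "spk_a": [(1.0, 0.0), (0.9, 0.1)],
    "spk_b": [(-1.0, 0.0), (-0.9, -0.1)],
}
rng = random.Random(0)
triplets = [sample_triplet(embeddings, rng) for _ in range(8)]
pos_pairs = [(a, p) for a, p, _ in triplets]  # same-speaker pairs
neg_pairs = [(a, n) for a, _, n in triplets]  # different-speaker pairs
loss = mi_loss(pos_pairs, neg_pairs)
```

In an actual training loop the embeddings would come from the network and the loss would be backpropagated to update its parameters; the sketch only shows how positive and negative pairs enter the estimate.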



Abstract

The invention discloses a speaker identification method based on mutual information estimation, which addresses the poor distinguishability of speaker identity features and the high error rate of the identification system. During training, the spectrogram is first extracted from the speech and used as the input of the VGG-M network; random triplet sampling is then performed on the training data to obtain positive and negative samples for mutual information estimation, and the network is trained with the objective function based on that estimate. During recognition, the trained VGG-M network is used to extract the embedded features corresponding to the test voice and the target speaker voice; the cosine distance between the two embedded features is then calculated and used as the speaker matching score; finally, the score is compared with a set threshold to determine whether the test speech comes from the target speaker. The method makes effective use of the mutual information between the speaker features corresponding to the positive and negative samples, so as to optimize network training and reduce the error rate of the system. The present invention can be applied to the field of speaker recognition.
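The recognition stage described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's implementation: the score is taken to be cosine similarity (so a higher score means a closer match), the function names `cosine_score` and `is_target_speaker` are hypothetical, and the threshold value and toy embedding vectors are made up for the example.

```python
import math

def cosine_score(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)

def is_target_speaker(test_emb, target_emb, threshold):
    # Accept the test utterance when its score reaches the set threshold.
    return cosine_score(test_emb, target_emb) >= threshold

# Toy embeddings; in the method these would come from the trained VGG-M network.
target = [0.2, 0.9, 0.4]
same = [0.25, 0.85, 0.45]   # close to the target embedding
other = [0.9, -0.1, 0.3]    # points in a different direction
THRESHOLD = 0.7             # hypothetical value, normally tuned on held-out trials

accept_same = is_target_speaker(same, target, THRESHOLD)
accept_other = is_target_speaker(other, target, THRESHOLD)
```

In practice the threshold is usually chosen on a development set, for example at the operating point where false acceptances and false rejections are equally frequent.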

Description

Technical field

[0001] The invention belongs to the technical field of speaker identification, in particular to a speaker identification method based on mutual information estimation.

Background technique

[0002] In recent years, biometric identification technology has gradually become a convenient and quick way to verify identity information. Voice is the most commonly used and direct way of communication, and the unique physiological characteristics of each person obtained from voice are called "voiceprint". Due to the individual differences in each person's vocal organs and pronunciation habits, each person's voiceprint is different and unique. Therefore, the unique biometric features of the speaker can be extracted from the speaker's speech signal as the uniquely authenticated identity information.

[0003] With the rapid development of deep learning in image processing, speech recognition and other fields, methods based on deep learning are gradually being applied in...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L17/02, G10L17/04, G10L25/30, G10L25/45, G10L25/51, G06N3/08, G06N3/04
CPC: G10L17/02, G10L17/04, G10L25/45, G10L25/30, G10L25/51, G06N3/08, G06N3/04
Inventor: 陈晨, 肜娅峰, 陈德运
Owner: HARBIN UNIV OF SCI & TECH