
Speaker recognition method based on mutual information estimation

A speaker recognition technique based on mutual information estimation, applied in the field of speaker recognition. It addresses problems such as the inability to judge whether extracted speaker features are unique, and achieves the effects of reducing the equal error rate (EER) and optimizing network training.

Active Publication Date: 2021-05-28
HARBIN UNIV OF SCI & TECH
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, when a deep neural network is used directly for unsupervised learning to extract speaker features, there is no way to judge whether the speaker feature representation extracted by the network is unique and highly expressive.



Examples


Embodiment

[0034] The technical solution adopted by the present invention is a speaker recognition method based on mutual information estimation, comprising the following steps:

[0035] Step 1. Preprocess all speech in the data set and extract spectrogram features.

[0036] Step 2. In the training phase, first extract the spectrogram of each utterance and use it as the input of the VGG-M network; then randomly sample triplets from the training data to obtain positive and negative sample pairs; finally, perform mutual information estimation on the positive and negative sample pairs, and use the objective function based on mutual information estimation to train the network and update its parameters.

[0037] Step 3. Use the trained VGG-M network to extract the embedded feature vectors representing speaker identity for the test speech and for the target speaker's speech.

[0038] Step 4. Calculate the cosine distance between the embedded f...
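Steps 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the frame length, hop size, and FFT size are assumed values, the function names are hypothetical, and the excerpt does not say which mutual information estimator is used. The loss below uses a Jensen-Shannon-style estimator with a plain dot product between embeddings standing in for the critic, which is only one common choice. The VGG-M network itself is omitted; the loss takes already-computed embeddings.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Log-magnitude spectrogram. Frame sizes are assumptions, not from the source."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(mag + 1e-8).T  # shape: (n_fft // 2 + 1, n_frames)

def sample_triplets(labels, n_triplets, rng):
    """Randomly draw (anchor, positive, negative) utterance indices.

    The positive shares the anchor's speaker label; the negative does not.
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    per_item = counts[inverse]
    # Valid anchors have at least one other same-speaker and one other-speaker item.
    anchors = idx[(per_item > 1) & (per_item < len(labels))]
    if len(anchors) == 0:
        raise ValueError("need >= 2 speakers and >= 2 utterances for some speaker")
    triplets = []
    for _ in range(n_triplets):
        a = rng.choice(anchors)
        same = idx[(labels == labels[a]) & (idx != a)]
        diff = idx[labels != labels[a]]
        triplets.append((a, rng.choice(same), rng.choice(diff)))
    return np.array(triplets)

def jsd_mi_loss(anchor, positive, negative):
    """Negative Jensen-Shannon-style MI estimate over a batch of embeddings.

    Assumption: the critic score is the dot product of two embeddings.
    """
    pos_scores = np.sum(anchor * positive, axis=1)
    neg_scores = np.sum(anchor * negative, axis=1)
    softplus = lambda x: np.logaddexp(0.0, x)
    mi_estimate = np.mean(-softplus(-pos_scores)) - np.mean(softplus(neg_scores))
    return -mi_estimate  # minimizing the loss maximizes the estimate
```

In a real training loop the embeddings would come from the network applied to the spectrograms of each triplet, and the loss would be minimized with an automatic-differentiation framework rather than evaluated in NumPy.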


Abstract

The invention discloses a speaker recognition method based on mutual information estimation, which addresses the problems of poorly discriminative speaker identity features and a high recognition error rate. During training, a spectrogram is extracted from each utterance and used as the input of a VGG-M network; triplets are then randomly sampled from the training data to obtain positive and negative sample pairs for mutual information estimation, and the network is trained with an objective function based on mutual information estimation. During recognition, the trained VGG-M network extracts embedded features for the test speech and for the target speaker's speech; the cosine distance between the two embedded features is calculated and taken as the speaker matching score; the score is then compared with a preset threshold to decide whether the test speech comes from the target speaker. The method makes effective use of the mutual information between the speaker features of positive and negative samples, which optimizes network training and reduces the system error rate. The method can be applied in the field of speaker recognition.
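The recognition stage described in the abstract can be sketched as follows. This is an illustrative sketch only: the function names and the default threshold of 0.5 are hypothetical, and the score is computed as cosine similarity (higher means more alike), which is an assumption about what the source calls the cosine distance score. In practice the threshold would be tuned on held-out trials.

```python
import numpy as np

def cosine_score(test_emb, target_emb):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    test_emb = np.asarray(test_emb, dtype=float)
    target_emb = np.asarray(target_emb, dtype=float)
    denom = np.linalg.norm(test_emb) * np.linalg.norm(target_emb)
    if denom == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return float(np.dot(test_emb, target_emb) / denom)

def is_target_speaker(test_emb, target_emb, threshold=0.5):
    """Accept the trial when the score reaches the preset threshold.

    The default threshold is a placeholder, not a value from the source.
    """
    return cosine_score(test_emb, target_emb) >= threshold
```

Because the score depends only on the angle between the two embeddings, rescaling either vector does not change the decision.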

Description

Technical field

[0001] The invention belongs to the technical field of speaker recognition, and in particular relates to a speaker recognition method based on mutual information estimation.

Background technique

[0002] In recent years, biometric information technology has gradually become a convenient and quick way to verify identity information. Speech is the most commonly used and most direct way of communication for people. The unique physiological characteristics of each person obtained from speech are called "voiceprints". Due to the individual differences in the vocal organs and pronunciation habits of each person, each person's voiceprint is different and unique. Therefore, the speaker's unique biological characteristics can be extracted from the speaker's voice signal as uniquely identifiable identity information.

[0003] With the rapid development of deep learning in image processing, speech recognition and other fields, methods based on deep learning are gradua...


Application Information

IPC(8): G10L17/02, G10L17/04, G10L25/30, G10L25/45, G10L25/51, G06N3/08, G06N3/04
CPC: G10L17/02, G10L17/04, G10L25/45, G10L25/30, G10L25/51, G06N3/08, G06N3/04
Inventor: 陈晨, 肜娅峰, 陈德运
Owner HARBIN UNIV OF SCI & TECH