Method for reducing the misrecognition rate of a text-independent speaker recognition system

A text-independent speaker recognition technology, applied in speech analysis, instruments, etc., that addresses the rising misrecognition rate of such systems and achieves the effect of reducing it.

Active Publication Date: 2011-11-09
哈尔滨工业大学高新技术开发总公司

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the problem that, in open-set tests, the misrecognition rate of existing text-independent speaker recognition systems rises as the number of out-of-set users grows, and provides a method for reducing the misrecognition rate of a text-independent speaker recognition system.

Examples

Specific Embodiment 1

[0014] Specific embodiment one: the method of this embodiment for reducing the misrecognition rate of a text-independent speaker recognition system is carried out according to the following steps:

[0015] Step 1. Use the closed-set training data of the baseline speaker recognition system to obtain, for each known speaker, the Gaussian mixture model of that speaker's feature vectors and the threshold at which that speaker is correctly identified;

[0016] Step 2. Divide the closed-set speakers into a male group and a female group, sort the correct-identification thresholds within each gender group by magnitude, and divide the sorted thresholds into segments, each segment forming one subgroup;

[0017] Step 3. Replace the speakers contained in each subgroup obtained in step 2 with a single model that follows a Gaussian distribution, thereby obtaining the central distribution of each male subgroup and of each female subgroup;

[0018] Step...
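The grouping in step 2 can be illustrated with a short sketch. It is only a sketch under stated assumptions: the text does not say how the sorted thresholds are segmented, so equal-count segments are assumed here, and the function name `segment_by_threshold`, the `(speaker_id, gender, threshold)` tuple layout, and the `n_segments` parameter are all hypothetical.

```python
from collections import defaultdict


def segment_by_threshold(speakers, n_segments):
    """Split closed-set speakers by gender, then by sorted threshold.

    speakers: iterable of (speaker_id, gender, threshold) tuples.
    n_segments: assumed number of equal-count segments per gender
                (the source does not specify the segmentation rule).
    Returns {gender: [[speaker_id, ...], ...]}, one inner list per subgroup.
    """
    by_gender = defaultdict(list)
    for speaker_id, gender, threshold in speakers:
        by_gender[gender].append((threshold, speaker_id))

    subgroups = {}
    for gender, items in by_gender.items():
        items.sort()  # ascending by threshold
        size = -(-len(items) // n_segments)  # ceiling division
        subgroups[gender] = [
            [speaker_id for _, speaker_id in items[start:start + size]]
            for start in range(0, len(items), size)
        ]
    return subgroups


example = [
    ("a", "m", 0.3), ("b", "m", 0.1), ("d", "m", 0.2),
    ("c", "f", 0.5), ("e", "f", 0.4),
]
groups = segment_by_threshold(example, n_segments=2)
```

Segmenting by fixed threshold intervals instead of equal counts would be an equally plausible reading; only the slicing step would change.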

Specific Embodiment 2

[0023] Specific embodiment two: this embodiment differs from specific embodiment one in that the Gaussian mixture model in step 3 is calculated according to the following steps:

[0024] a. There are $R$ speakers in the group, and the Gaussian distribution of the $i$-th speaker in the group is $N(\mu_i, \Sigma_i)$, where $\mu_i$ is the mean vector and $\Sigma_i$ is the diagonal covariance matrix of the $i$-th speaker's Gaussian distribution, $i = 1, 2, \ldots, R$; $\mu_i(k)$ denotes the $k$-th component of $\mu_i$, $\sigma_i^2(k)$ denotes the $k$-th diagonal element of $\Sigma_i$, and $w_i$ is the weight of the Gaussian distribution;

[0025] b. Compute the sum $w_c$ of the weights of all Gaussian distributions in the group, $w_c = \sum_{i=1}^{R} w_i$;

[0026] c. Compute the $k$-th component of the mean vector $\mu_c$ of the central distribution of the group's Gaussian mixture model:

[0027] d. Press C...
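The formulas for the central distribution are not reproduced in this text, so the following is only a sketch of one common way to collapse several weighted diagonal Gaussians into a single one: moment matching. It is an assumption that the patent uses this form. The function name `central_distribution` and its argument layout are hypothetical; `weights`, `means`, and `variances` correspond to $w_i$, $\mu_i$, and the diagonals $\sigma_i^2$ from step a.

```python
def central_distribution(weights, means, variances):
    """Collapse R weighted diagonal Gaussians into one by moment matching.

    ASSUMPTION: moment matching stands in for the formulas elided in the
    source; it is not confirmed to be the patent's own computation.

    weights:   list of R floats (w_i)
    means:     list of R lists, each of length D (mu_i)
    variances: list of R lists, each of length D (diagonal of Sigma_i)
    Returns (w_c, mu_c, var_c).
    """
    w_c = sum(weights)  # step b: sum of the weights
    dims = len(means[0])
    mu_c, var_c = [], []
    for k in range(dims):
        # Weighted average of the k-th mean components.
        m = sum(w * mu[k] for w, mu in zip(weights, means)) / w_c
        # Weighted average of the second moments E[x^2] = var + mean^2.
        second = sum(
            w * (var[k] + mu[k] ** 2)
            for w, mu, var in zip(weights, means, variances)
        ) / w_c
        mu_c.append(m)
        var_c.append(second - m * m)
    return w_c, mu_c, var_c


w_c, mu_c, var_c = central_distribution(
    [1.0, 1.0], [[0.0], [2.0]], [[1.0], [1.0]]
)
```

With two equally weighted unit-variance Gaussians centred at 0 and 2, the merged mean is 1 and the merged variance is 2, since the spread between the two centres adds to the within-component variance.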

Specific Embodiment 3

[0030] Specific embodiment three: this embodiment differs from specific embodiment one or two in that the threshold of the group in step 4 is calculated as follows:

[0031] There are $L$ Gaussian models in the group, and the thresholds at which the individual Gaussian models are correctly identified are $\lambda_1, \lambda_2, \ldots, \lambda_L$; the threshold $\lambda$ of the group's Gaussian mixture model is then:

[0032] $\lambda = \dfrac{\lambda_1 + \lambda_2 + \cdots + \lambda_L}{L}$

[0033] or $\lambda$ is:

[0034] $\lambda = \dfrac{1}{\dfrac{1}{\lambda_1} + \cdots}$
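The arithmetic-mean form of the group threshold in [0032] can be checked with a few lines of code. This is a minimal sketch: the function name `group_threshold` and the sample values are hypothetical, and the alternative formula in [0034] is left out because it is truncated in this text.

```python
def group_threshold(thresholds):
    """Arithmetic mean of the per-model thresholds lambda_1..lambda_L."""
    if not thresholds:
        raise ValueError("a group must contain at least one threshold")
    return sum(thresholds) / len(thresholds)


# Hypothetical thresholds for a group of L = 3 Gaussian models.
lam = group_threshold([0.2, 0.4, 0.6])
```

For these three sample values the group threshold is approximately 0.4.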

Abstract

The invention discloses a method for reducing the misrecognition rate of a text-independent speaker recognition system, and relates to methods for reducing the misrecognition rate of speaker recognition systems. The method addresses the problem that the misrecognition rate of conventional text-independent speaker recognition systems increases in open-set tests. The method comprises the following steps: using the recognition thresholds of the known closed-set speakers obtained by a baseline speaker recognition system, the closed-set speakers are divided into a male group and a female group; each of these is divided into several small groups by segmenting the thresholds, and the central distribution of each small group is found. A coarse screening module is added at the front end of the baseline speaker recognition system; it judges the gender of the test speech and then compares the speech under test with the central distributions of the small groups of the same gender to obtain a probability threshold for that speech. Recognition is then performed using the speech frames associated with that probability threshold. Compared with the conventional system, the method improves recognition accuracy by 2 to 3 percent, and it can be used in text-independent speaker recognition systems.
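The coarse screening stage described in the abstract can be sketched as follows. Several points are assumptions rather than statements of the patented method: the gender decision is taken as an input instead of being computed, each small group's central distribution is modelled as a single diagonal Gaussian, and the "comparison" is read as choosing the same-gender group with the highest average log-likelihood and adopting its threshold. The names `avg_log_likelihood`, `coarse_screen`, and the `centers` layout are hypothetical.

```python
import math


def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def coarse_screen(frames, gender, centers):
    """Pick a threshold for the test speech from same-gender group centres.

    frames:  list of feature vectors for the speech under test
    gender:  gender label, assumed to come from a separate classifier
    centers: {gender: [(mean, var, threshold), ...]}, one tuple per group
    ASSUMPTION: the best-matching group's threshold is returned.
    """
    best = max(
        centers[gender],
        key=lambda c: avg_log_likelihood(frames, c[0], c[1]),
    )
    return best[2]


# Hypothetical one-dimensional example with two male groups.
centers = {"m": [([0.0], [1.0], 0.3), ([5.0], [1.0], 0.7)]}
threshold = coarse_screen([[4.8], [5.1]], "m", centers)
```

In this toy example the test frames lie close to the second centre, so the sketch selects that group's threshold of 0.7.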

Description

Technical field

[0001] The invention relates to a method for reducing the misrecognition rate of a speaker recognition system.

Background art

[0002] Speaker recognition is the process of automatically identifying a speaker from the individual-specific information contained in the speaker's speech waveform. Depending on the requirements placed on the speech content, speaker recognition can be divided into three types: text-independent, text-dependent, and text-prompted. Text-independent means that the user is not required to use a specific language or content when enrolling in the system, and the speech used for verification need not have the same content as the speech used at enrollment; text-dependent means that the verification utterance must match the content of the utterance provided at enrollment; in text-prompted recognition, the user speaks the content specified by the system. Because of its security and flexibility, text-independent speaker recognition has attracted more attent...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G10L17/00
Inventor: 韩纪庆, 王秋雯
Owner: 哈尔滨工业大学高新技术开发总公司