Self-adapting method of DNN acoustic model based on personal identity characteristics

An acoustic model and identity feature technology, applied in the field of communication. It addresses the mismatch between the voices of the speakers in the training data and the voice of the target speaker, which reduces the accuracy of DNN frame classification, and the inability of existing methods to make full use of speaker information. It improves the recognition performance of the system, overcomes the drop in accuracy, and provides good adaptive performance.

Inactive Publication Date: 2019-04-16
XIDIAN UNIV
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

[0003] However, both of the above systems still suffer from a mismatch between the voices of the speakers in the training data and the voice of the target speaker: they assume that the training data and the test data follow the same distribution.
This method overcomes the drop in frame classification accuracy and improves recognition performance to a certain extent, but the gains remain limited, because the improved i-vector is simply spliced with the original input features at the input layer, and a small amount of speech cannot be fully utilized…
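The input-layer splicing criticized above can be pictured with a minimal Python/NumPy sketch. The function name `splice_ivector` and the 100-dimensional i-vector are illustrative assumptions (the source gives neither); the 39-dimensional frames follow step 1a). The same utterance-level i-vector is copied onto every frame, so speaker information enters the network only once, at its input layer.

```python
import numpy as np


def splice_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append one utterance-level i-vector to every acoustic frame.

    frames:  (num_frames, feat_dim) acoustic features, e.g. 39-dim MFCCs
    ivector: (ivec_dim,) speaker i-vector (its dimension is an assumption)
    returns: (num_frames, feat_dim + ivec_dim) input matrix for the DNN
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D i-vector")
    tiled = np.tile(ivector, (frames.shape[0], 1))  # one copy of the i-vector per frame
    return np.hstack([frames, tiled])


rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 39))  # toy stand-in for real MFCC frames
ivec = rng.standard_normal(100)          # hypothetical 100-dim i-vector
dnn_input = splice_ivector(frames, ivec)
print(dnn_input.shape)  # (100, 139)
```

Because every frame of an utterance carries an identical copy, the speaker vector acts only as a fixed extra input, which is the limitation the paragraph above points out.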

Examples

Example Embodiment

[0033] Example 1

[0034] In recent years, speaker adaptation technology has received increasing attention. Adaptation methods are divided into model-domain adaptation and feature-domain adaptation. Adaptation techniques for hidden Markov model-Gaussian mixture model (HMM-GMM) systems are already very mature, but they are difficult to apply directly to hidden Markov model-deep neural network (HMM-DNN) systems. Many research institutions have studied deep neural network adaptation extensively, and among these methods, i-vector-based speaker adaptation is very popular. However, current adaptation methods do not make full use of the small amount of adaptation data; their network structures are complex, their computational complexity is high, and their stability is insufficient. The present invention studies i-vector adaptation for deep neural network (DNN) acoustic models and proposes an adaptive method of DNN acoustic model based on personal id...

Example Embodiment

[0047] Example 2

[0048] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Example 1. The extraction of the personal identity i-vector feature in step 1 of the present invention includes the following steps:

[0049] 1a) Using the 39-dimensional low-dimensional MFCC features (static coefficients together with their first-order and second-order derivatives) extracted from the speech data of the test set of an open-source corpus, train a DNN model for speaker-independent feature extraction;

[0050] 1b) Apply singular value decomposition (SVD) to the weight matrix of the last hidden layer of the speaker-independent feature-extraction DNN trained in step 1a), and replace the original weight matrix with the resulting decomposition;

[0051] 1c) Train the DNN model with the back-propagation (BP) algorithm and gradient descent, then use the trained DNN model to extract low-dimensional speaker-independent features;

[0052] 1d)...
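The 39-dimensional feature of step 1a) is conventionally 13 static MFCCs plus 13 first-order and 13 second-order (delta and delta-delta) coefficients. The sketch below assumes the 13 static coefficients have already been computed by some front end and shows only how the dynamic coefficients are appended. The HTK-style regression formula and the window N = 2 are common choices assumed here, not details given in the source.

```python
import numpy as np


def deltas(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression-based delta coefficients along the time axis.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated to cover the edges.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]   # c_{t+n}
        behind = padded[window - n: window - n + num_frames]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def mfcc_39(static: np.ndarray) -> np.ndarray:
    """Stack 13 static MFCCs with their first- and second-order deltas (13 + 13 + 13 = 39)."""
    first = deltas(static)
    second = deltas(first)
    return np.hstack([static, first, second])


static = np.random.default_rng(0).standard_normal((200, 13))  # toy static MFCCs
features = mfcc_39(static)
print(features.shape)  # (200, 39)
```

Computing the second-order coefficients as deltas of the deltas is one common convention; other toolkits use a separate acceleration formula.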
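Step 1b) can be illustrated with a truncated SVD that splits one weight matrix W into two thinner factors, A and B. The layer size 512 and the kept rank 40 are hypothetical (the source gives neither), and `svd_factorize` is an illustrative name, not an API from the source.

```python
import numpy as np


def svd_factorize(W: np.ndarray, rank: int):
    """Split W (out_dim x in_dim) into A (out_dim x rank) and B (rank x in_dim).

    Keeping only the `rank` largest singular values gives the best rank-`rank`
    approximation of W in the Frobenius norm (Eckart-Young), so A @ B ~ W.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale each kept left singular vector by its singular value
    B = Vt[:rank, :]
    return A, B


rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))  # hypothetical last-hidden-layer weights
A, B = svd_factorize(W, rank=40)
print(A.shape, B.shape)         # (512, 40) (40, 512)
print(W.size, A.size + B.size)  # 262144 40960
```

In the forward pass the layer originally computes W x; after the replacement it computes A (B x), so B x acts as a 40-dimensional linear bottleneck, and in this toy configuration the parameter count drops from 262144 to 40960.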
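Step 1c) (retraining the factorised network with back propagation and gradient descent, then reading out low-dimensional features) can be sketched as a toy NumPy network. Everything concrete here is an assumption: one sigmoid hidden layer on each side of the linear bottleneck, cross-entropy training on random frames with made-up class labels, full-batch updates, and all layer sizes and the learning rate. The source does not specify the topology, training targets, or hyper-parameters. The factors A and B are initialised randomly for brevity; in the described method they would come from the SVD of step 1b).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


class BottleneckDNN:
    """input -> sigmoid hidden -> linear bottleneck (B) -> sigmoid hidden (A) -> softmax."""

    def __init__(self, in_dim, hidden, bottleneck, classes, seed=0):
        gen = np.random.default_rng(seed)

        def init(fan_in, fan_out):
            return gen.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))

        self.W1, self.b1 = init(in_dim, hidden), np.zeros(hidden)
        self.B = init(hidden, bottleneck)
        self.A, self.b2 = init(bottleneck, hidden), np.zeros(hidden)
        self.W3, self.b3 = init(hidden, classes), np.zeros(classes)

    def forward(self, X):
        h1 = sigmoid(X @ self.W1 + self.b1)
        z = h1 @ self.B                       # low-dimensional bottleneck output
        h2 = sigmoid(z @ self.A + self.b2)
        p = softmax(h2 @ self.W3 + self.b3)
        return h1, z, h2, p

    def train_step(self, X, y, lr=0.1):
        """One full-batch back-propagation + gradient-descent step; returns the loss."""
        n = X.shape[0]
        h1, z, h2, p = self.forward(X)
        loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
        # Backward pass: every gradient is computed before any weight changes.
        d_out = p.copy()
        d_out[np.arange(n), y] -= 1.0
        d_out /= n                                   # dLoss/dlogits (softmax + cross-entropy)
        d_h2 = (d_out @ self.W3.T) * h2 * (1.0 - h2)
        d_z = d_h2 @ self.A.T                        # the bottleneck is linear
        d_h1 = (d_z @ self.B.T) * h1 * (1.0 - h1)
        # Gradient-descent updates.
        self.W3 -= lr * h2.T @ d_out
        self.b3 -= lr * d_out.sum(axis=0)
        self.A -= lr * z.T @ d_h2
        self.b2 -= lr * d_h2.sum(axis=0)
        self.B -= lr * h1.T @ d_z
        self.W1 -= lr * X.T @ d_h1
        self.b1 -= lr * d_h1.sum(axis=0)
        return loss

    def extract(self, X):
        """Low-dimensional features = bottleneck activations."""
        return self.forward(X)[1]


rng = np.random.default_rng(1)
X = rng.standard_normal((300, 39))                  # toy 39-dim "MFCC" frames
y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0)   # toy labels, 4 classes
net = BottleneckDNN(in_dim=39, hidden=64, bottleneck=8, classes=4)
losses = [net.train_step(X, y) for _ in range(300)]
feats = net.extract(X)
print(feats.shape)  # (300, 8)
```

The layers after the bottleneck exist only to give the training a target; once trained, the 8-dimensional bottleneck output is what would be kept as the low-dimensional feature.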

Example Embodiment

[0054] Example 3

[0055] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Examples 1-2. The speaker identity vector (i-vector) extracted in steps 1c) and 1d) is expressed as:

[0056] M=m+Tx+e

[0057] where M is the GMM mean supervector of a specific speaker, m is the UBM mean supervector, T is the total variability matrix (spanning the total variability space), x is the extracted i-vector representing personal identity, and e is the residual noise term.
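The source does not spell out how x is computed from this model. In the standard i-vector formulation (assumed here), x has a standard-normal prior and its point estimate is the posterior mean x = (I + sum_c N_c T_c^T Sigma_c^-1 T_c)^-1 sum_c T_c^T Sigma_c^-1 F_c, where T_c is the block of T belonging to UBM component c, Sigma_c is that component's diagonal covariance, and N_c and F_c are the zeroth-order and centred first-order Baum-Welch statistics of the utterance. A NumPy sketch; the toy sizes and the component-major layout of the supervector are assumptions.

```python
import numpy as np


def extract_ivector(T, Sigma, N, F):
    """Posterior mean of x in M = m + T x + e given Baum-Welch statistics.

    T:     (C*D, R) total variability matrix, rows grouped component by component
    Sigma: (C, D)   diagonal UBM covariances
    N:     (C,)     zeroth-order statistics
    F:     (C, D)   first-order statistics centred on the UBM means
    """
    C, D = Sigma.shape
    R = T.shape[1]
    Tc = T.reshape(C, D, R)                              # per-component blocks T_c
    TtSinv = Tc.transpose(0, 2, 1) / Sigma[:, None, :]   # T_c^T Sigma_c^-1, shape (C, R, D)
    precision = np.eye(R) + np.einsum("c,crd,cdk->rk", N, TtSinv, Tc)
    linear = np.einsum("crd,cd->r", TtSinv, F)
    return np.linalg.solve(precision, linear)            # posterior mean of x


rng = np.random.default_rng(0)
C, D, R = 8, 5, 3                                # toy sizes (assumptions)
T = rng.standard_normal((C * D, R))
Sigma = rng.uniform(0.5, 1.5, (C, D))
x_true = rng.standard_normal(R)
N = np.full(C, 1e6)                              # very many frames per component
F = N[:, None] * (T.reshape(C, D, R) @ x_true)   # noiseless statistics: M - m = T x
x_hat = extract_ivector(T, Sigma, N, F)
print(np.allclose(x_hat, x_true, atol=1e-4))     # True
```

With abundant noiseless statistics the estimate approaches the x that generated M; with no data (all N_c = 0) it falls back to the prior mean, the zero vector, which is why i-vectors from very short utterances are shrunk toward zero.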

[0058] In this example, the universal background model (UBM) is readily trained on the training data of the corpus, and the total variability matrix T is estimated with the expectation-maximization (EM) algorithm. The personal identity i-vector extracted in this way discriminates well between speakers, represents the differences between speakers, and has the advantages of low dimensionality, fewer ...
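The UBM enters i-vector extraction through per-utterance Baum-Welch statistics: soft frame counts N_c and centred first-order sums F_c for each Gaussian component c. In the standard formulation, these are the quantities from which both the EM estimate of T and the i-vector x in M = m + Tx + e are computed. The sketch below assumes a diagonal-covariance GMM as the UBM (already trained, e.g. by EM) with toy parameters. The source does not describe this step, so `baum_welch_stats` and its array layout are illustrative, and the EM update of T itself is not shown.

```python
import numpy as np


def baum_welch_stats(frames, weights, means, variances):
    """Zeroth- and centred first-order statistics of frames under a diagonal-covariance UBM.

    frames: (num_frames, D); weights: (C,); means, variances: (C, D)
    returns N: (C,) and F: (C, D) with F_c = sum_t gamma_t(c) * (o_t - m_c)
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                        # (T, C), unnormalised
    log_post -= log_post.max(axis=1, keepdims=True)               # avoid underflow
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # posteriors gamma_t(c)
    N = gamma.sum(axis=0)
    F = gamma.T @ frames - N[:, None] * means
    return N, F


rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[0.0] * 3, [10.0] * 3])   # two well-separated toy components
variances = np.ones((2, 3))
frames = rng.normal(0.0, 1.0, (50, 3))      # frames drawn near component 0
N, F = baum_welch_stats(frames, weights, means, variances)
print(N)  # close to [50, 0]: almost every frame is assigned to component 0
```

Working in the log domain and subtracting the per-frame maximum keeps the posteriors well defined even when a frame is far from every component.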

Abstract

The invention discloses a self-adapting method of a DNN acoustic model based on personal identity characteristics. The method addresses the problems of easy over-fitting, poor personal identity representation capability, and low robustness in adaptive training. The method comprises the following steps: extracting personal identity characteristics, using MFCC features as the input of a speaker-independent DNN model; establishing a GMM-HMM speech recognition system; building a DNN-HMM baseline system whose DNN acoustic model has a plurality of hidden layers; and carrying out adaptive training of the DNN acoustic model on the personal identity characteristics layer by layer, to obtain a DNN acoustic model adapted to a specific speaker. In personal identity characteristic extraction, the weight matrix of the last hidden layer of the DNN model is decomposed with SVD and replaced by the decomposition. The method makes full use of a small amount of speaker data to adjust the model parameters, so that the recognition accuracy for the specific speaker is improved, with low complexity and an obvious improvement in recognition performance. The method can be used in intelligent systems related to speech recognition or communications, medical treatment, vehicle-mounted applications, and the like.

Description

Technical Field

[0001] The invention belongs to the field of communication technology and mainly relates to personal identity (i-vector) speaker feature extraction, in particular to an adaptive method of a DNN acoustic model based on the personal identity (i-vector) feature, used for speaker-independent speech recognition.

Background

[0002] In recent years, the deep neural network (DNN) has achieved great success in speech recognition. In acoustic modeling, the hidden Markov model-deep neural network (HMM-DNN) system has better acoustic discrimination than the traditional hidden Markov model-Gaussian mixture model (HMM-GMM) system and greatly improves speech recognition performance; the DNN has become the mainstream acoustic model.

[0003] However, both of the above systems still suffer from a mismatch between the voices of the speakers in the training data and the voice of the target speaker; that is, it is assumed that the tr...

Application Information

IPC(8): G10L15/16; G10L15/14; G10L15/02; G10L25/24
CPC: G10L15/02; G10L15/144; G10L15/16; G10L25/24
Inventors: 李颖 (Li Ying), 闫贝贝 (Yan Beibei), 郭旭东 (Guo Xudong)
Owner: XIDIAN UNIV