Self-adapting method of DNN acoustic model based on personal identity characteristics

An acoustic model and identity-feature technology in the field of communication. It addresses the mismatch between the speakers' voices in the training data and the target speaker's voice, the resulting drop in DNN frame classification accuracy, and the inability to make full use of speaker information. It improves system recognition performance, overcomes the drop in accuracy, and gives good adaptive performance.

Inactive Publication Date: 2019-04-16
XIDIAN UNIV
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

[0003] However, both systems still suffer from a mismatch between the speakers' voices in the training data and the target speaker's voice: they assume that the training data and the test data follow the same distribution.
This method overcomes the decline in frame classification accuracy and improves recognition performance to a certain extent, but recognition is still not good, because the improved i-vector is simply spliced (concatenated) with the original input features at the input layer. As a result, the personal information in a small amount of speech cannot be fully utilized, which limits system recognition performance.
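For reference, the input-layer splicing criticized above can be sketched as follows. This is a minimal illustration assuming NumPy, a per-frame feature matrix, and one i-vector per speaker; the 100-dimensional i-vector size and the function name `splice_ivector` are hypothetical and not taken from the patent.

```python
import numpy as np


def splice_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Concatenate one speaker-level i-vector onto every acoustic frame.

    frames:  (num_frames, feat_dim), e.g. 39-dim MFCC + differences
    ivector: (ivec_dim,), one vector per speaker (hypothetical size)
    returns: (num_frames, feat_dim + ivec_dim) DNN input matrix
    """
    # Repeat the same i-vector for every frame, so it is constant over time.
    tiled = np.broadcast_to(ivector, (frames.shape[0], ivector.shape[0]))
    return np.concatenate([frames, tiled], axis=1)


frames = np.zeros((200, 39))   # 200 frames of 39-dim features
ivec = np.ones(100)            # hypothetical 100-dim i-vector
x = splice_ivector(frames, ivec)
print(x.shape)  # (200, 139)
```

Because the appended dimensions are identical for all frames of a speaker, their contribution to the first hidden layer is a fixed, speaker-dependent bias; this is one way to read the statement that the speaker information is not fully exploited.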
[0008] Some research institutions use bottleneck features, which have stronger representation ability, instead of MFCC features to obtain the personal identity i-vector. However, introducing the bottleneck layer into the network structure reduces the frame classification accuracy of the DNN.
[0009] In summary, current adaptation methods have achieved great success, but they do not make full use of a small amount of adaptation data, and their network structures are complex, with high computational complexity and insufficient stability.

Examples

Embodiment 1

[0034] In recent years, speaker adaptation technology has received increasing attention. Adaptation methods are divided into model-domain adaptation and feature-domain adaptation. Adaptation in hidden Markov model-Gaussian mixture model (HMM-GMM) systems is widely used and mature, but it is difficult to apply directly to hidden Markov model-deep neural network (HMM-DNN) systems. Many research institutions have studied the adaptation of deep neural networks, and among these methods, i-vector-based speaker adaptation is very popular. However, current adaptation methods do not make full use of a small amount of adaptation data, and their network structures are complex, computationally expensive, and insufficiently stable. The present invention studies i-vector adaptation for the deep neural network (DNN) and proposes an adaptive method based on the DNN acoustic model of the personal...

Embodiment 2

[0048] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Embodiment 1; the extraction of the personal identity i-vector feature in step 1 of the present invention includes the following steps:

[0049] 1a) Using the 39-dimensional MFCC features (including their first-order and second-order difference features) extracted from the speech data of the test set in the open-source corpus, train a DNN model for speaker-independent (non-specific speaker) feature extraction;

[0050] 1b) Apply singular value decomposition (SVD) to the weight matrix of the last hidden layer of the DNN model trained in step 1a), and replace the original weight matrix with the decomposed factors.

[0051] 1c) Train the DNN model with the backpropagation (BP) algorithm and gradient descent, and then use the trained DNN model to extract low-dimensional speaker-independent features. ...
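Steps 1a) and 1b) can be sketched as below. This is a hedged illustration, not the patent's code: it assumes 13 static MFCC coefficients per frame (so that static plus first-order plus second-order features give 39 dimensions), the common regression formula for difference features with a window of N = 2, and a weight matrix `W` of shape (outputs, inputs) for the layer being decomposed. The rank `k`, the layer sizes, and all function names are hypothetical.

```python
import numpy as np


def deltas(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based difference features over a +/-N frame window."""
    T = feats.shape[0]
    # Repeat edge frames so every frame has N neighbours on each side.
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def mfcc_39(static: np.ndarray) -> np.ndarray:
    """Stack static, first-order and second-order features: 13 -> 39 dims."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.concatenate([static, d1, d2], axis=1)


def svd_factorize(W: np.ndarray, k: int):
    """Replace W (m x n) by A @ B, A (m x k), B (k x n), keeping the top-k singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # scale the columns of U by the singular values
    B = Vt[:k, :]
    return A, B


rng = np.random.default_rng(0)
feats = mfcc_39(rng.standard_normal((100, 13)))
print(feats.shape)  # (100, 39)

W = rng.standard_normal((256, 512))   # hypothetical layer size
A, B = svd_factorize(W, k=40)         # hypothetical bottleneck rank
h = rng.standard_normal(512)
bottleneck = B @ h                    # 40-dim low-dimensional output
print(bottleneck.shape)  # (40,)
```

Replacing `W` with `B` followed by `A` inserts a narrow linear layer of width `k`. Reading the outputs of that layer is one common way to obtain low-dimensional features, which is consistent with, but not confirmed by, step 1c).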

Embodiment 3

[0055] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Embodiments 1-2. The speaker identity vector (i-vector) extracted in steps 1c) and 1d) is expressed as:

[0056] M=m+Tx+e

[0057] where M denotes the speaker-specific GMM mean supervector, m denotes the UBM mean supervector, T denotes the total variability matrix (spanning the total variability space), x denotes the extracted i-vector representing personal identity, and e denotes the residual noise term.
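Given a trained UBM and matrix T, the i-vector x of one speaker is usually computed as the posterior mean of x under the model M = m + Tx + e, using the Baum-Welch statistics of that speaker's frames. The sketch below implements that standard estimator, assuming a diagonal-covariance UBM and a standard-normal prior on x; the patent excerpt does not state these details, and the function, argument names, and toy sizes are hypothetical.

```python
import numpy as np


def extract_ivector(N, F, m, T, Sigma):
    """Posterior mean of x in M = m + T x + e for one speaker.

    N:     (C,)      zeroth-order statistics, sum_t gamma_c(t)
    F:     (C, D)    first-order statistics, sum_t gamma_c(t) * o_t
    m:     (C, D)    UBM component means (the supervector m, reshaped)
    T:     (C*D, R)  total variability matrix, rows ordered component by component
    Sigma: (C, D)    diagonal UBM covariances
    """
    C, D = m.shape
    R = T.shape[1]
    F_centered = (F - N[:, None] * m).reshape(C * D)  # center on the UBM means
    inv_sigma = (1.0 / Sigma).reshape(C * D)
    weights = np.repeat(N, D) * inv_sigma               # N_c / sigma for each row of T
    precision = np.eye(R) + T.T @ (weights[:, None] * T)
    rhs = T.T @ (inv_sigma * F_centered)
    return np.linalg.solve(precision, rhs)


rng = np.random.default_rng(1)
C, D, R = 4, 3, 2                        # toy sizes
m = rng.standard_normal((C, D))
T = rng.standard_normal((C * D, R))
Sigma = np.ones((C, D))
x_true = np.array([0.5, -1.0])
M = m + (T @ x_true).reshape(C, D)       # speaker supervector with e = 0
N = np.full(C, 1e6)                      # many frames per component
F = N[:, None] * M                       # as if every frame sat on M
x_hat = extract_ivector(N, F, m, T, Sigma)  # close to x_true
```

With many frames the prior has little influence and `x_hat` approaches the generating x; with few frames the estimate is shrunk toward the prior mean of zero.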

[0058] In this example, the universal background model (UBM) is readily obtained from the training data in the corpus, and the total variability matrix T is estimated with the expectation-maximization (EM) algorithm. The personal identity i-vector feature extracted in this way discriminates well between speakers, since it represents the differences between them, and it has the advantages of low dimensionality and few parameters during adaptive training...
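The EM training of T mentioned above can be sketched as follows. This is the textbook form of total-variability training (E-step: posterior mean and covariance of x for each utterance; M-step: a per-component least-squares update of T), written under the assumptions that the UBM covariances are held fixed and that the first-order statistics are already centered on the UBM means. It is not claimed to be the exact procedure of the patent, and all names and toy sizes are hypothetical.

```python
import numpy as np


def posterior(N, Fc, T, inv_sigma, D):
    """E-step for one utterance: mean and covariance of x given its statistics.

    N: (C,) zeroth-order stats; Fc: (C*D,) centered first-order stats;
    T: (C*D, R); inv_sigma: (C*D,) inverse diagonal UBM covariances.
    """
    R = T.shape[1]
    weights = np.repeat(N, D) * inv_sigma          # N_c / sigma for each row of T
    precision = np.eye(R) + T.T @ (weights[:, None] * T)
    cov = np.linalg.inv(precision)
    mean = cov @ (T.T @ (inv_sigma * Fc))
    return mean, cov


def em_step(stats, T, inv_sigma, C, D):
    """One EM iteration for T; stats is a list of (N, Fc) pairs, one per utterance."""
    R = T.shape[1]
    A = np.zeros((C, R, R))        # per-component sum of N_c * E[x x^T]
    acc = np.zeros((C * D, R))     # sum of Fc * E[x]^T
    for N, Fc in stats:
        mean, cov = posterior(N, Fc, T, inv_sigma, D)
        Exx = cov + np.outer(mean, mean)
        A += N[:, None, None] * Exx
        acc += np.outer(Fc, mean)
    T_new = np.empty_like(T)
    for c in range(C):
        rows = slice(c * D, (c + 1) * D)
        # M-step: T_c = acc_c A_c^{-1}; A_c is symmetric, so solve A_c T_c^T = acc_c^T.
        T_new[rows] = np.linalg.solve(A[c], acc[rows].T).T
    return T_new


rng = np.random.default_rng(2)
C, D, R = 4, 3, 2                                  # toy sizes
T_true = rng.standard_normal((C * D, R))
inv_sigma = np.ones(C * D)
N = np.full(C, 100.0)
# Synthetic centered stats: Fc = N_c * (T_true x) for 50 random speakers.
stats = [(N, np.repeat(N, D) * (T_true @ rng.standard_normal(R))) for _ in range(50)]
T_init = rng.standard_normal((C * D, R))           # random initialization
T = T_init.copy()
for _ in range(20):
    T = em_step(stats, T, inv_sigma, C, D)
```

Keeping the covariances fixed simplifies the update to one small linear solve per UBM component; full implementations may also re-estimate the covariances.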

Abstract

The invention discloses an adaptive method for a DNN acoustic model based on personal identity features. The method addresses the problems of easy over-fitting, poor personal identity representation, and low robustness in adaptive training. It comprises the following steps: extracting personal identity features, using MFCC features as the input to a speaker-independent DNN model; building a GMM-HMM speech recognition system; building a DNN-HMM baseline system whose DNN acoustic model has multiple hidden layers; and performing layer-by-layer adaptive training of the DNN acoustic model with the personal identity features, yielding a DNN acoustic model adapted to a specific speaker. In personal identity feature extraction, the weight matrix of the last hidden layer of the DNN model is decomposed with SVD and replaced by its decomposition. The method makes full use of a small amount of speaker data to adjust the model parameters, improving recognition accuracy for the specific speaker with low complexity and a clear improvement in recognition performance. The method is applicable to intelligent systems related to speech recognition, communications, medical treatment, in-vehicle systems, and the like.

Description

Technical field

[0001] The invention belongs to the field of communication technology and mainly relates to personal identity (i-vector) speaker feature extraction, in particular to an adaptive method for a DNN acoustic model based on the personal identity (i-vector) feature, used for speaker-independent speech recognition.

Background

[0002] In recent years, the deep neural network (DNN) has achieved great success in speech recognition. In acoustic modeling, the hidden Markov model-deep neural network (HMM-DNN) system has better acoustic discrimination than the traditional hidden Markov model-Gaussian mixture model (HMM-GMM) system and greatly improves speech recognition performance; the DNN has become the mainstream acoustic model.

[0003] However, there is still a problem in the above two systems that the speaker's voice of the training data does not match the target speaker's voice, that is, it is assumed that the tr...

Application Information

IPC(8): G10L15/16; G10L15/14; G10L15/02; G10L25/24
CPC: G10L15/02; G10L15/144; G10L15/16; G10L25/24
Inventors: 李颖 (Li Ying), 闫贝贝 (Yan Beibei), 郭旭东 (Guo Xudong)
Owner: XIDIAN UNIV