Self-adapting method of DNN acoustic model based on personal identity characteristics

Acoustic-model and identity-feature technology, applied in the field of communication. It addresses the mismatch between the speakers' voices in the training data and the target speaker's voice, the reduced frame classification accuracy of the DNN, and the failure to make full use of speaker information. The stated benefits are improved system recognition performance, no drop in frame classification accuracy, and good adaptive performance.

Inactive Publication Date: 2019-04-16

XIDIAN UNIV

4 Cites, 7 Cited by


AI Technical Summary

Problems solved by technology

[0003] However, both of the above systems still suffer from a mismatch between the speakers' voices in the training data and the target speaker's voice; that is, they assume that the training data and the test data obey the same distribution.

This method overcomes the decline in frame classification accuracy and improves recognition performance to some extent, but recognition is still not good: the improved i-vector is simply spliced with the original input features at the input layer, so the personal information in a small amount of speech cannot be fully utilized, which limits system recognition performance.
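The input-layer splicing criticized above can be pictured with a minimal sketch. The function name `splice_ivector`, the frame count, and the 100-dimensional i-vector are assumptions made for illustration; the source does not give these details.

```python
import numpy as np

def splice_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append the same speaker-level i-vector to every acoustic frame."""
    # frames: (num_frames, feat_dim); ivector: (ivec_dim,)
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 39))  # 200 frames of 39-dim MFCC (illustrative)
ivector = rng.standard_normal(100)       # i-vector dimension is an assumption
dnn_input = splice_ivector(frames, ivector)
print(dnn_input.shape)
```

Every frame receives an identical copy of the speaker vector, so the speaker information enters the network only once, at the input layer.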

[0008] Some research institutions use bottleneck features, which have stronger representation ability, instead of MFCC features to obtain the personal identity i-vector. However, introducing the bottleneck layer into the network structure reduces the frame classification accuracy of the DNN.

[0009] To sum up, current adaptive methods have achieved great success, but they do not make full use of a small amount of adaptation data, and they suffer from complex network structures, high computational complexity, and insufficient stability.

Examples

Embodiment 1

[0034] In recent years, speaker adaptation technology has received more and more attention. Adaptive methods are divided into model-domain adaptation and feature-domain adaptation. Adaptive technology is widely used and mature in hidden Markov model-Gaussian mixture model (HMM-GMM) systems, but it is difficult to apply directly to hidden Markov model-deep neural network (HMM-DNN) systems. Many research institutions have studied the adaptation of deep neural networks extensively, and among these methods speaker adaptation based on the i-vector is very popular. However, current adaptive methods do not make full use of a small amount of adaptation data, their network structures are complex, their computational complexity is high, and their stability is insufficient. The present invention studies the i-vector adaptive method based on the deep neural network (DNN), and proposes an adaptive method based on the DNN acoustic model of the personal...

Embodiment 2

[0048] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Embodiment 1. The extraction of the personal identity i-vector feature described in step 1 of the present invention includes the following steps:

[0049] 1a) Extract 39-dimensional MFCC features, including their first-order and second-order features, from the speech data of the test set in the open-source corpus, and use them to train a DNN model for speaker-independent feature extraction;

[0050] 1b) Apply singular value decomposition (SVD) to the weight matrix of the last hidden layer of the DNN model trained in step 1a), and use the decomposition to replace the original weight matrix;

[0051] 1c) Train the DNN model with the backpropagation (BP) algorithm and gradient descent, and then use the trained DNN model to extract low-dimensional speaker-independent features. ...
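Step 1b) can be illustrated with a standard truncated SVD, sketched below under stated assumptions. The layer sizes, the retained rank of 64, and the function name `svd_factorize` are hypothetical; the excerpt does not state how many singular values the patent keeps.

```python
import numpy as np

def svd_factorize(W: np.ndarray, rank: int):
    """Factor W (out_dim x in_dim) into A (out_dim x rank) @ B (rank x in_dim)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept left singular vectors
    B = Vt[:rank, :]            # kept right singular vectors
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))  # hypothetical last-hidden-layer weights
A, B = svd_factorize(W, rank=64)

h = rng.standard_normal(1024)  # hypothetical activation entering the layer
low_dim = B @ h                # 64-dim low-dimensional feature
approx = A @ low_dim           # rank-64 approximation of W @ h
```

Replacing one 512 x 1024 matrix with the two factors reduces the parameter count from 524288 to 98304 in this example, and the intermediate product `B @ h` is the kind of low-dimensional feature that step 1c) extracts after retraining.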

Embodiment 3

[0055] The adaptive method of the DNN acoustic model based on the personal identity (i-vector) feature is the same as in Embodiments 1-2. The speaker identity vector (i-vector) extracted in steps (1c) and (1d) is expressed as:

[0056] M = m + Tx + e

[0057] where M denotes the speaker-specific GMM mean supervector, m denotes the UBM mean supervector, T denotes the total variability matrix, x denotes the extracted i-vector feature representing personal identity, and e denotes the residual noise term.

[0058] In this example, the universal background model (UBM) is readily obtained from the training data in the corpus, and the total variability matrix T is obtained through the expectation-maximization (EM) algorithm. The personal identity i-vector feature extracted by this method discriminates well between speakers, represents the difference information between speakers, and has the advantages of low dimensionality, few parameters during adaptive training...
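To make the relation M = m + Tx + e concrete, here is a simplified sketch that recovers x from a known M, m, and T. It assumes a standard normal prior on x and a diagonal Gaussian residual e, and it works directly on the supervector; practical i-vector extractors operate on per-component Baum-Welch statistics instead, so this is an illustration of the linear model rather than the patent's procedure. All dimensions and the name `estimate_ivector` are hypothetical.

```python
import numpy as np

def estimate_ivector(M, m, T, sigma2):
    """Posterior mean of x in M = m + T x + e,
    assuming x ~ N(0, I) and e ~ N(0, diag(sigma2))."""
    Tw = T / sigma2[:, None]                    # Sigma^{-1} T
    precision = np.eye(T.shape[1]) + T.T @ Tw   # I + T^T Sigma^{-1} T
    return np.linalg.solve(precision, Tw.T @ (M - m))

rng = np.random.default_rng(0)
D, R = 2000, 50                     # supervector and i-vector dims (illustrative)
T = rng.standard_normal((D, R))     # stand-in for a trained total variability matrix
m = rng.standard_normal(D)          # stand-in for the UBM mean supervector
sigma2 = np.full(D, 0.01)           # assumed residual variances
x_true = rng.standard_normal(R)
M = m + T @ x_true + rng.normal(scale=0.1, size=D)

x_hat = estimate_ivector(M, m, T, sigma2)
```

The estimate has only R components regardless of the supervector size D, which is the low dimensionality the paragraph above refers to.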

Abstract

The invention discloses a self-adapting method of a DNN acoustic model based on personal identity characteristics. The method addresses the problems of easy over-fitting, poor personal identity representation capability, and low robustness in self-adapting training. The method comprises the following steps: extracting personal identity characteristics, with MFCC characteristics as the input of a DNN model of a non-specific speaker; establishing a GMM-HMM speech recognition system; building a DNN-HMM baseline system whose DNN acoustic model has a plurality of hidden layers; and carrying out self-adapting training of the DNN acoustic model layer by layer on the personal identity characteristics, to obtain a DNN acoustic model with self-adapting capability for a specific speaker. In personal identity characteristic extraction, the weight matrix of the last hidden layer of the DNN model is decomposed by SVD and replaced by the decomposition. The method makes full use of a small amount of speaker data to adjust the model parameters, so that the recognition accuracy for the specific speaker is improved. Its complexity is low, and the recognition performance is clearly improved. The method is used in intelligent systems related to speech recognition in communications, medical treatment, vehicle-mounted applications, and the like.

Description

Technical field

[0001] The invention belongs to the field of communication technology and mainly relates to speaker feature extraction for the personal identity i-vector, in particular to an adaptive method of a DNN acoustic model based on the personal identity (i-vector) feature, which is used for speaker-independent speech recognition.

Background technique

[0002] In recent years, the deep neural network (DNN) has achieved great success in speech recognition. In acoustic modeling, the hidden Markov model-deep neural network (HMM-DNN) system has better acoustic discrimination than the traditional hidden Markov model-Gaussian mixture model (HMM-GMM) system and greatly improves speech recognition performance. The DNN has become the mainstream acoustic model.

[0003] However, both of the above systems still have the problem that the speakers' voices in the training data do not match the target speaker's voice, that is, it is assumed that the tr...

Application Information

IPC(8): G10L15/16, G10L15/14, G10L15/02, G10L25/24

CPC: G10L15/02, G10L15/144, G10L15/16, G10L25/24

Inventors: 李颖, 闫贝贝, 郭旭东

Owner: XIDIAN UNIV
