Method for training acoustic model based on CTC (Connectionist Temporal Classification)

An acoustic model training technology, applied in speech analysis, speech recognition, instruments, etc. It addresses the problems that a CTC acoustic model performs worse than a CE model, that model training is unstable, and that performance degrades on small and medium-sized datasets. Its effects include improved phoneme independence and recognition performance, a reduced number of search paths, and easier parallel computation.

Active Publication Date: 2018-07-10
INST OF ACOUSTICS CHINESE ACAD OF SCI +1


Problems solved by technology

[0013] During training of the CTC model, all paths that can be mapped to the correct text sequence are included in the forward-backward computation, including some extremely asymmetric paths, i.e., paths in which the positions where phonemes appear are severely delayed or advanced relative to the actual timing; these paths make model training unstable.
In addition, the traditional CTC model architecture uses an RNN for training. The RNN's long-term modeling capability can greatly improve the performance of CTC models, but because of certain characteristics of RNNs they are difficult to train in parallel, so training is very slow and training efficiency is low.
[0014] Although the training procedure of the CTC model is simpler, its recognition accuracy is not competitive with the CE model: it is somewhat lower than that of the traditional cross-entropy (Cross-entropy, CE) method, and on small and medium-sized datasets the degradation is more severe, so the performance of a CTC acoustic model is usually not as good as that of a CE model.
In addition, the training of the CTC model is extremely unstable and prone to divergence.



Examples


Embodiment Construction

[0029] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0030] As shown in Figure 2, the present invention provides a method for training an acoustic model based on CTC. The method first replaces the single "blank" symbol shared by all phonemes in the original CTC model with multiple independent "blank" symbols, one per phoneme. The phoneme labeling sequence of the training data is then aligned to time points with an initial GMM model to obtain the approximate location of each phoneme, and a search path graph for the CTC forward-backward computation is constructed for the phoneme labeling sequence after the "blank" symbols are added. A configurable parameter, the time tolerance, then controls how much earlier or later a phoneme may appear in the search path; it defines the time range within which each element may occur and is usually set to 50-300 milliseconds. In this embodi...
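The two ideas in this paragraph, one independent "blank" per phoneme and a time-tolerance window around the GMM alignment, can be illustrated with a short sketch. This is not the patented implementation; the symbol names, frame bookkeeping, and helper functions below are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def insert_phoneme_specific_blanks(phonemes: List[str]) -> List[str]:
    """Follow each phoneme with its own blank symbol instead of one shared blank."""
    labels: List[str] = []
    for p in phonemes:
        labels.append(p)
        labels.append(f"<blk_{p}>")  # an independent blank per phoneme
    return labels

def allowed_windows(alignment: Dict[int, Tuple[int, int]],
                    tolerance_frames: int) -> Dict[int, Tuple[int, int]]:
    """Expand each phoneme's aligned (start, end) frame span by the time tolerance,
    giving the window inside which that phoneme may appear on a CTC search path."""
    return {i: (max(0, s - tolerance_frames), e + tolerance_frames)
            for i, (s, e) in alignment.items()}

def keep_state(phoneme_idx: int, frame: int,
               windows: Dict[int, Tuple[int, int]]) -> bool:
    """Prune forward-backward states whose phoneme lies outside its allowed window."""
    lo, hi = windows[phoneme_idx]
    return lo <= frame <= hi

# Example: with 10 ms frames, a 100 ms tolerance corresponds to 10 frames.
alignment = {0: (0, 12), 1: (13, 30), 2: (31, 45)}  # phoneme index -> (start, end) frames
windows = allowed_windows(alignment, tolerance_frames=10)
print(insert_phoneme_specific_blanks(["s", "i", "x"]))
print(keep_state(1, 2, windows))  # False: phoneme 1 cannot appear this early
```

States rejected by such a check would simply be excluded from the forward-backward recursion, which is how the extremely asymmetric paths described in paragraph [0013] are removed from training.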



Abstract

The invention provides a method for training an acoustic model based on CTC (Connectionist Temporal Classification). The method comprises the following steps:
1. Train an initial GMM (Gaussian Mixture Model) and use it to perform forced time-point alignment of the text annotation of the training data, obtaining the time region corresponding to each phoneme.
2. Insert a blank symbol associated with each phoneme behind that phoneme, so that each phoneme has its own unique blank symbol.
3. Construct the search path graph for the CTC forward-backward computation for the phoneme annotation sequence with the added blank symbols, using a finite state machine.
4. Restrict the time range in which each phoneme may appear according to the time alignment result, prune the search path graph, and cut off the paths whose phoneme positions exceed the time restrictions, obtaining the final search path graph required for computing the network error in CTC.
5. Perform acoustic model training by combining a time-delay neural network (TDNN) structure with the CTC method to obtain the final TDNN-CTC acoustic model.
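As a rough illustration of step 5 only, the sketch below pairs a TDNN-style stack of dilated 1-D convolutions with the standard CTC loss in PyTorch. The layer sizes, context widths, label inventory, and the use of torch.nn.CTCLoss are assumptions for illustration, not the configuration claimed by the patent (which additionally relies on the pruned search graph from steps 1-4).

```python
import torch
import torch.nn as nn

class TDNNAcousticModel(nn.Module):
    def __init__(self, feat_dim: int = 40, hidden: int = 512, num_labels: int = 100):
        super().__init__()
        # Each Conv1d layer splices a wider temporal context via its dilation,
        # which keeps the network feed-forward and easy to parallelize.
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3, padding=3), nn.ReLU(),
        )
        self.output = nn.Conv1d(hidden, num_labels, kernel_size=1)

    def forward(self, feats):                          # feats: (batch, feat_dim, frames)
        logits = self.output(self.tdnn(feats))         # (batch, num_labels, frames)
        return logits.permute(2, 0, 1).log_softmax(-1) # (frames, batch, labels) for CTCLoss

model = TDNNAcousticModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
feats = torch.randn(4, 40, 200)                        # 4 utterances, 200 frames each
targets = torch.randint(1, 100, (4, 20))               # phoneme label sequences
loss = ctc(model(feats), targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 20))
loss.backward()
```

The feed-forward TDNN structure is what allows the parallel training speedup mentioned above, in contrast to the sequential dependencies of an RNN.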

Description

Technical field
[0001] The invention relates to the technical field of speech recognition, and in particular to a method for training an acoustic model based on CTC.
Background technique
[0002] In recent years, the introduction of the Deep Neural Network (DNN) for acoustic modeling in speech recognition systems has achieved great success. Because of the excellent classification ability of the DNN, it can replace the Gaussian Mixture Model (GMM) in the traditional Hidden Markov Model architecture for generating posterior probabilities. However, this new HMM/DNN model architecture is very complicated to train. Therefore, researchers began to explore end-to-end learning methods, that is, taking a sequence of speech features as input and directly obtaining its text sequence. In this context, the method of combining Connectionist Temporal Classification (CTC) with a Recurrent Neural Network (RNN) has attracted more and more attention from researchers.
[0003] There are two main differences between CTC an...
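For context, the many-to-one label mapping at the heart of standard CTC (which the background above contrasts with frame-level CE training) can be sketched in a few lines: repeated symbols on a frame-level path are merged and the shared blank is removed, so many different paths map to the same label sequence. The symbols below are illustrative assumptions, not taken from the patent.

```python
BLANK = "<blk>"

def ctc_collapse(path):
    """Map a frame-level CTC path to its label sequence: merge repeats, drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out

# Two different frame-level paths that standard CTC treats as the same sequence:
print(ctc_collapse(["<blk>", "s", "s", "<blk>", "i", "x", "x"]))     # ['s', 'i', 'x']
print(ctc_collapse(["s", "<blk>", "<blk>", "i", "i", "<blk>", "x"])) # ['s', 'i', 'x']
```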


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/14, G10L15/16, G10L15/02
CPC: G10L15/02, G10L15/144, G10L15/16, G10L2015/025
Inventors: 张鹏远, 王智超, 潘接林, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI