Speech recognition method based on CLDNN+CTC acoustic model

Acoustic model and speech recognition technology, applied in speech recognition, speech analysis, instruments, etc. It addresses overall performance degradation, the inability to extract more discriminative features, and limited model fitting ability, and achieves a high recognition rate, noise robustness, and ease of training.

Pending Publication Date: 2020-04-14

Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)

Cites: 11 | Cited by: 4

AI Technical Summary

Problems solved by technology

This noise-addition strategy is not universal: noise differs from scenario to scenario, so augmenting data by adding noise does not generalize;

[0019] The speech recognition method disclosed in Invention Patent 3 uses a multi-channel convolutional neural network as the acoustic model. The same speech data is fed separately into three identical convolutional channels, which cannot extract more discriminative features; at the same time, this makes the network structure more complex, requires a large amount of training data, and is prone to overfitting;

[0020] The speech recognition technology disclosed in Invention Patent 4 is based on a simple DCNN model and outputs speech sequences end-to-end. Because it uses a CNN-based structure, it has limited capacity to process data with strong temporal characteristics such as speech; moreover, the entire model has only 9 layers, so its fitting ability is limited for large-vocabulary Chinese speech recognition;

This modeling approach involves three interdependent models, so a weakness in any one of them constrains the others and causes a sharp drop in overall performance.
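The source does not name the three models. Assuming they are the acoustic model, pronunciation lexicon, and language model of a conventional HMM-based recognizer, the standard decoding rule shows why they are interdependent. Here X is the acoustic feature sequence, W a candidate word sequence, and Q a phone sequence (symbols introduced for illustration); the approximation assumes X depends on W only through Q.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid X) = \arg\max_{W} p(X \mid W)\,P(W) \\
        &\approx \arg\max_{W} \sum_{Q}
          \underbrace{p(X \mid Q)}_{\text{acoustic model}}\;
          \underbrace{P(Q \mid W)}_{\text{pronunciation lexicon}}\;
          \underbrace{P(W)}_{\text{language model}}
\end{aligned}
```

Because the three factors are multiplied inside a single search, an error in any one of them shifts the argmax for the whole system.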

The model combines syllables and acoustic features to determine whether the speech matches the text, which does not substantially improve recognition accuracy.

Detailed Description of the Embodiments

[0042] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0043] As shown in Figure 1, the present invention provides a speech recognition method based on a CLDNN+CTC acoustic model. The method comprises:

[0044] Step 1, obtaining a real-time speech signal, performing feature extraction on the speech signal, and obtaining a frame-by-frame acoustic feature sequence;

[0045] Step 2, using the acoustic feature sequence as the input of the CLDNN+CTC acoustic model, and outputting the phoneme sequence;...
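A minimal sketch of the data flow around Steps 1 and 2, under stated assumptions. The source does not specify the feature type, so log-mel filterbank features are assumed (16 kHz audio, 25 ms frames, 10 ms hop, 40 mel bands; all assumed values). For Step 2, only best-path (greedy) CTC decoding of the acoustic model's per-frame outputs is shown. The CLDNN itself (conventionally convolutional layers, then LSTM layers, then fully connected layers) is not implemented: `posteriors` is a hypothetical stand-in for its per-frame softmax output, and the phoneme inventory is illustrative.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def log_mel_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
                     n_fft=512, n_mels=40):
    """Step 1 (sketch): return a (num_frames, n_mels) log-mel sequence.

    All parameter defaults are assumptions, not values from the patent.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis filter boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    num_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    # Zero-pad signals shorter than one frame; trailing samples that do not
    # fill a whole frame are dropped.
    pad = max(0, (num_frames - 1) * hop + frame_len - len(signal))
    signal = np.pad(signal, (0, pad))
    # Index matrix: row t holds the sample indices of frame t.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Small floor avoids log(0) on silent frames.
    return np.log(power @ fbank.T + 1e-10)


def ctc_greedy_decode(posteriors, blank=0):
    """Step 2 output (sketch): best-path CTC decoding of (num_frames, num_labels) scores."""
    best_path = np.argmax(posteriors, axis=1)
    decoded, previous = [], -1
    for label in best_path:
        # Merge consecutive repeats, then drop the blank label.
        if label != previous and label != blank:
            decoded.append(int(label))
        previous = label
    return decoded


phonemes = ["<blank>", "n", "i", "h", "ao"]  # hypothetical phoneme inventory
features = log_mel_features(np.random.default_rng(0).standard_normal(16000))
print(features.shape)  # (98, 40)
# Stand-in for the CLDNN's per-frame softmax output (not a real model).
path = [1, 1, 0, 2, 2, 3, 0, 4]
posteriors = np.eye(len(phonemes))[path] * 0.9 + 0.02
print([phonemes[i] for i in ctc_greedy_decode(posteriors)])  # ['n', 'i', 'h', 'ao']
```

The blank label is what lets CTC emit the same phoneme twice in a row: a path such as 1, 0, 1 decodes to two copies of label 1, whereas 1, 1 collapses to one.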

Abstract

The invention discloses a speech recognition method based on a CLDNN+CTC acoustic model. The method comprises the following steps: 1, obtaining a real-time speech signal, performing feature extraction on it, and obtaining a frame-by-frame acoustic feature sequence; 2, using the acoustic feature sequence as the input of the CLDNN+CTC acoustic model and outputting a phoneme sequence; and 3, establishing a decoding model that converts the phoneme sequence into a character sequence, feeding the phoneme sequence into the decoding model, and outputting the character sequence. The method performs speech recognition in two end-to-end (seq2seq) stages: a 'speech → phoneme sequence' model and a 'phoneme sequence → character sequence' model. Unlike existing end-to-end 'speech → character sequence' models, the two models do not require ultra-large-scale training corpora, their strengths complement each other, and the language model can compensate for the acoustic model's weaknesses in noisy environments.
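Step 3 relies on a learned seq2seq decoding model whose details are not given here. As a much simpler stand-in, the sketch below converts phonemes to characters by greedy longest-match lookup in a toy pronunciation lexicon; it only illustrates the stage's interface (phoneme sequence in, character sequence out). The phoneme units (pinyin initials and tonal finals), the `TOY_LEXICON` entries, and `phonemes_to_characters` are all hypothetical.

```python
# Hypothetical toy lexicon: phoneme tuples (pinyin initials / tonal finals) -> characters.
TOY_LEXICON = {
    ("n", "i3", "h", "ao3"): "你好",
    ("n", "i3"): "你",
    ("h", "ao3"): "好",
    ("sh", "i4", "j", "ie4"): "世界",
}


def phonemes_to_characters(phonemes, lexicon=TOY_LEXICON):
    """Greedy longest-match conversion; unmatched phonemes become '?'.

    A stand-in for the patent's learned seq2seq decoding model, not a
    reimplementation of it.
    """
    max_len = max(len(key) for key in lexicon)
    output, i = [], 0
    while i < len(phonemes):
        # Try the longest lexicon entry that could start at position i first.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in lexicon:
                output.append(lexicon[chunk])
                i += length
                break
        else:
            output.append("?")  # no lexicon entry starts here; skip one phoneme
            i += 1
    return "".join(output)


print(phonemes_to_characters(["n", "i3", "h", "ao3", "sh", "i4", "j", "ie4"]))  # 你好世界
```

A learned seq2seq decoder can use context to resolve homophones and absorb acoustic-model errors, which is the complementarity the abstract describes; a fixed lookup like this one cannot.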

Description

Technical field

[0001] The invention relates to the field of speech recognition, and in particular to a speech recognition method based on a CLDNN+CTC acoustic model.

Background

[0002] Speech is the most common and effective means of human interaction and has long been an important part of research on human-computer communication and interaction. Human-computer voice interaction technology, which comprises speech synthesis, speech recognition, and natural language understanding, is recognized worldwide as a difficult and challenging technical field. Speech recognition technology can also be applied in industries such as industrial production, electronic communication, automotive electronics, medical care, and services and education, and will take the information technology revolution to a new level.

[0003] Speech recognition is also known as automatic speech recognition (ASR). Automatic spe...

Application Information

Patent type & authority: Application (China)

IPC (8): G10L15/02; G10L25/78

CPC: G10L15/02; G10L25/78; G10L2015/025

Inventors: Liu Huifen (柳慧芬), Yuan Xi (袁熹)

Owner: Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)
