
Non-autoregressive speech recognition training decoding method based on parameter sharing and system thereof

A speech-recognition and model-training technology, applied in speech recognition, speech analysis, instruments, etc., which addresses the difficulty of training non-autoregressive models and achieves improved decoding accuracy, faster training, and low latency.

Pending Publication Date: 2021-10-08
中科极限元(杭州)智能科技股份有限公司

AI Technical Summary

Problems solved by technology

However, the non-autoregressive model faces problems such as difficulty in training.


Image

  • Non-autoregressive speech recognition training decoding method based on parameter sharing and system thereof

Examples


Example Embodiment

[0051] DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS. It should be understood that the specific embodiments described herein are intended to illustrate and explain the present invention, and are not intended to limit it.

[0052] A non-autoregressive model with two-pass decoding and parameter sharing, and its training method. The model is based on a self-attention Transformer network and includes an acoustic encoder based on the self-attention mechanism and a decoder based on the self-attention mechanism, as shown in Figure 1. The method includes the following steps:

[0053] Step 1: obtain speech training data and the corresponding text-annotation training data, and extract a series of features from the speech training data to constitute a speech feature sequence;

[0054] The goal of speech recognition is to convert a continuous speech signal into a text sequence. During recognition, the waveform signal is windowed into frames, and the coefficients of specific frequency components are extracted by the discrete Fourier transform...
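The feature extraction described in [0054] can be sketched in miniature: frame the waveform, apply a window, and take DFT magnitudes. This is a generic stand-in, not the patent's actual front end; the function name and the tiny frame sizes are illustrative choices.

```python
import math

def stft_frame_features(signal, frame_len=8, hop=4, n_bins=4):
    """Split a waveform into overlapping frames, apply a Hann window,
    and keep DFT magnitudes of the first n_bins frequency components.
    Illustrative stand-in for the patent's acoustic feature extraction."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window reduces spectral leakage at the frame edges
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        bins = []
        for k in range(n_bins):
            re = sum(x * math.cos(-2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(windowed))
            im = sum(x * math.sin(-2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(windowed))
            bins.append(math.hypot(re, im))
        frames.append(bins)
    return frames

# Toy usage: a 16-sample sinusoid (2 cycles per 8-sample frame)
wave = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
feats = stft_frame_features(wave)  # 3 frames of 4 magnitudes each
```

A real system would use mel filterbank or MFCC features over 25 ms frames; the structure of the resulting speech feature sequence (one vector per frame) is the same.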



Abstract

The invention discloses a non-autoregressive speech recognition training and decoding method based on parameter sharing, and a system thereof. The training method comprises the steps: extracting features from speech training data to form an acoustic feature sequence; performing acoustic coding on the acoustic feature sequence and outputting an acoustic coding state sequence; performing non-autoregressive decoding on the acoustic coding state sequence and a blank-filled sequence, and calculating a non-autoregressive cross-entropy loss in combination with the text-annotation training data; performing autoregressive decoding on the acoustic coding state sequence and the text-annotation training data, and calculating an autoregressive cross-entropy loss in combination with the text-annotation training data; weighting the non-autoregressive and autoregressive cross-entropy losses to obtain a joint loss, calculating the gradient, and performing back-propagation; and repeating until training is completed. The decoding method comprises performing speech recognition with the trained model. The system comprises an acoustic feature sequence extraction module, an acoustic encoder, a non-autoregressive decoder, an autoregressive decoder, and a joint loss calculation module.
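The joint loss in the abstract is a weighted combination of the two cross-entropy terms. A minimal sketch, assuming a weighting factor `alpha` (the abstract does not state its value) and toy per-position probability distributions:

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target symbol at each position."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def joint_loss(nar_probs, ar_probs, targets, alpha=0.5):
    """Weighted sum of the non-autoregressive and autoregressive
    cross-entropy losses. `alpha` is a hypothetical weighting factor."""
    return (alpha * cross_entropy(nar_probs, targets)
            + (1 - alpha) * cross_entropy(ar_probs, targets))

# Toy per-position distributions over a 2-symbol vocabulary
nar = [[0.7, 0.3], [0.4, 0.6]]   # non-autoregressive pass (blank-filled input)
ar  = [[0.9, 0.1], [0.2, 0.8]]   # autoregressive pass (text-annotation input)
loss = joint_loss(nar, ar, targets=[0, 1])
```

In the actual system both passes would run through the parameter-shared decoder before the two losses are combined and back-propagated.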

Description

Technical field

[0001] The invention relates to the technical field of electronic signal processing, and in particular to a non-autoregressive speech recognition training and decoding method and system based on parameter sharing.

Background technique

[0002] As the entrance to human-computer interaction, speech recognition is an important research direction in the field of artificial intelligence. End-to-end speech recognition discards the pronunciation dictionary, language model, and decoding network that hybrid speech recognition models rely on, and realizes the direct conversion of audio feature sequences to text sequences. The classic encoder-decoder model uses word-by-word autoregressive decoding: its encoder encodes the input speech into a high-level feature representation, and the decoder starts from the initial symbol and predicts the corresponding text step by step based on the encoder's output sequence, until the end marker is predicted. The timing dependence of ...
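The word-by-word autoregressive decoding loop described in the background can be sketched as follows; `step_fn` stands in for the trained decoder network, and the symbol names are illustrative.

```python
def greedy_autoregressive_decode(step_fn, sos, eos, max_len=20):
    """Start from the initial symbol and repeatedly predict the next token
    from the tokens emitted so far, stopping at the end marker.
    step_fn(prefix) -> next token (stands in for the decoder network)."""
    prefix = [sos]
    for _ in range(max_len):
        nxt = step_fn(prefix)
        if nxt == eos:
            break
        prefix.append(nxt)
    return prefix[1:]  # drop the initial symbol

# Toy "decoder": emits the letters of "cat", then the end marker
def toy_step(prefix):
    script = ["c", "a", "t", "<eos>"]
    return script[len(prefix) - 1]

out = greedy_autoregressive_decode(toy_step, "<sos>", "<eos>")
# out == ["c", "a", "t"]
```

Because each prediction waits on the previous one, decoding time grows with output length; this serial dependence is exactly what the non-autoregressive approach of the invention avoids.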

Claims


Application Information

IPC(8): G10L15/06; G10L15/02; G10L19/18; G10L19/26; G10L25/24; G10L25/27; G10L25/30
CPC: G10L15/063; G10L15/02; G10L19/18; G10L19/26; G10L25/24; G10L25/27; G10L25/30; G10L2015/0633
Inventor 温正棋, 田正坤
Owner 中科极限元(杭州)智能科技股份有限公司