Streaming speech recognition system and method based on non-autoregression model

A speech recognition technology based on a non-autoregressive model, applicable to speech recognition, speech analysis, instruments, and related fields. It addresses the low decoding efficiency and poor real-time performance of speech recognition, improving streaming inference speed while avoiding the loss of language modeling capability.

Pending Publication Date: 2022-03-18

董立波

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0003] The purpose of the present invention is to provide a non-autoregressive model-based streaming speech recognition system and method that solve the existing technical problems of inefficient speech recognition decoding and poor real-time speech recognition.

Method used

Examples

Embodiment 1

[0060] A training method for a non-autoregressive model-based streaming speech recognition system. The system includes an acoustic feature sequence extraction module, a streaming acoustic encoder, a CTC linear mapping layer, and a non-autoregressive decoder, as shown in Figure 1. The training process includes the following steps:

[0061] Step 1. Obtain speech training data and corresponding text annotation training data, and extract a series of features of the speech training data to form a speech feature sequence;

[0062] The goal of speech recognition is to convert continuous speech signals into text sequences. During recognition, the time-domain waveform is windowed and split into frames, and a discrete Fourier transform is applied to each frame to extract the coefficients of specific frequency components, which form a feature vector. A series of feature vectors constitutes a speech feature sequence, and the speech features are Mel frequency cepstral coefficients (MFCC) or Mel fil...
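The framing, windowing, and Fourier transform steps described in [0062] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 64-sample frame length, the 32-sample hop, and the choice of a Hamming window are all assumptions made for the example, and the sketch stops at the per-frame power spectrum without applying the Mel filtering that MFCC features additionally require.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop_len):
    """Split a waveform into overlapping frames (a trailing partial frame is dropped)."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


def hamming(frame_len):
    """Hamming window coefficients for one frame (assumed window type)."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def power_spectrum(frame):
    """Naive O(N^2) discrete Fourier transform; returns power for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def extract_features(samples, frame_len=64, hop_len=32):
    """Turn a waveform into a sequence of per-frame spectral feature vectors."""
    window = hamming(frame_len)
    features = []
    for frame in frame_signal(samples, frame_len, hop_len):
        windowed = [x * w for x, w in zip(frame, window)]
        features.append(power_spectrum(windowed))
    return features


# Toy tone with exactly 8 cycles per 64-sample frame, so energy peaks in bin 8.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
feature_sequence = extract_features(tone)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in feature_sequence]
```

Each entry of `feature_sequence` is one feature vector, and the list as a whole plays the role of the speech feature sequence. A practical system would use a fast Fourier transform rather than the quadratic loop shown here.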

Abstract

The invention discloses a streaming speech recognition system and method based on a non-autoregressive model. The method comprises the following steps: S11, extracting an acoustic feature sequence; S12, generating an acoustic coding state sequence; S13, generating an acoustic coding state sequence; S14, calculating the CTC output probability distribution and the connectionist temporal classification (CTC) loss; S15, performing alignment using the Viterbi algorithm; S16, inputting segment by segment and calculating the joint cross-entropy loss; S17, calculating gradients from the joint loss, which combines the CTC loss and the joint cross-entropy loss, and performing backpropagation; S18, repeating steps S12 to S17 until training is completed. The system comprises an acoustic feature sequence extraction module, a streaming acoustic encoder, a CTC linear transformation layer, and a non-autoregressive decoder, connected in sequence. According to the invention, non-autoregressive decoding is performed on the input audio segment by segment, which improves streaming inference speed and avoids the loss of language modeling capability.
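Step S15 aligns the CTC output with the reference text using the Viterbi algorithm. The abstract does not say how the alignment is computed or consumed, so the following is only a generic sketch of Viterbi forced alignment over a CTC output distribution. The function name `ctc_viterbi_align`, the blank index 0, and the toy probabilities are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def ctc_viterbi_align(log_probs, target, blank=0):
    """Return the most likely frame-level CTC path that collapses to `target`.

    log_probs: list of T rows, each a list of per-symbol log-probabilities.
    target: list of non-blank symbol ids.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank.
    ext = [blank]
    for sym in target:
        ext += [sym, blank]
    num_frames, num_states = len(log_probs), len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed predecessors: stay, advance by one state, or skip the
            # blank between two different labels.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue  # state s is not reachable at frame t
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the trailing blank or in the last label.
    finals = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    state = max(finals, key=lambda p: score[-1][p])
    if score[-1][state] == NEG_INF:
        raise ValueError("target is too long to align with the given frames")

    # Trace the best path back from the final frame.
    path = [state]
    for t in range(num_frames - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return [ext[s] for s in path]


# Toy CTC output over 5 frames and a 3-symbol vocabulary (0 is the blank).
probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
log_probs = [[math.log(p) for p in row] for row in probs]
alignment = ctc_viterbi_align(log_probs, target=[1, 2])
```

The returned list assigns one symbol (or blank) to each frame, which gives the frame positions occupied by each label. Positions of this kind are what a segment-by-segment training step such as S16 could draw on, although the visible text does not specify that link.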

Description

Technical field

[0001] The invention belongs to the technical field of electronic signal processing, and in particular relates to a non-autoregressive model-based streaming speech recognition system and method.

Background technique

[0002] As the entry point of human-computer interaction, speech recognition has important application value in helping machines obtain external information and in improving the human-computer interaction experience. Streaming speech recognition methods are usually implemented with autoregressive models. Common models include the RNN-Transducer model and the attention-based encoder-decoder model. The decoder starts from the start symbol and, based on the output of the encoder, predicts the corresponding text sequence step by step or frame by frame until the end token is predicted. This autoregressive decoding relies on the tokens generated at previous time steps. This timing depen...
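The dependency described in [0002] can be made concrete with a toy contrast. In the sketch below, `autoregressive_decode` must finish each step before starting the next, because every prediction takes the previously emitted tokens as input. `ctc_greedy_decode` scores every frame independently and only then collapses the result. The greedy CTC readout is used here purely as a familiar example of decoding without that dependency; it is not the patent's non-autoregressive decoder, and the function names, the lookup-table "model", and the scores are all hypothetical.

```python
def autoregressive_decode(next_token, start="<s>", end="</s>", max_len=20):
    """Sequential decoding: step i cannot run before step i-1 has finished."""
    tokens = [start]
    while len(tokens) < max_len:
        token = next_token(tokens)  # depends on all previously emitted tokens
        if token == end:
            break
        tokens.append(token)
    return tokens[1:]


def ctc_greedy_decode(frame_scores, blank=0):
    """Frame-independent readout: each frame's best symbol is chosen on its own."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    output, previous = [], blank
    for symbol in best:
        # Collapse repeated symbols and drop blanks.
        if symbol != blank and symbol != previous:
            output.append(symbol)
        previous = symbol
    return output


# Hypothetical stand-in for a decoder network: the next token is looked up
# from the last emitted token only.
toy_transitions = {"<s>": "hello", "hello": "world", "world": "</s>"}
ar_output = autoregressive_decode(lambda prefix: toy_transitions[prefix[-1]])

# Toy per-frame scores over a 3-symbol vocabulary (0 is the blank).
frame_scores = [
    [0.1, 0.9, 0.0],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
nar_output = ctc_greedy_decode(frame_scores)
```

The per-frame argmax in `ctc_greedy_decode` has no dependency between frames, so it could be computed for all frames at once, whereas the loop in `autoregressive_decode` is inherently serial.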

Claims

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/16, G10L15/02, G10L15/26, G10L19/16, G10L25/24

CPC: G10L15/16, G10L15/02, G10L15/26, G10L19/16, G10L25/24

Inventor: 董立波

Owner: 董立波
