Streaming speech recognition system and method based on non-autoregression model

A technology combining speech recognition with a non-autoregressive model, applied in speech recognition, speech analysis, instruments, etc. It addresses low speech recognition decoding efficiency and poor real-time performance, and it improves streaming inference speed while avoiding loss of language modeling capability.

Pending Publication Date: 2022-03-18
董立波

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a streaming speech recognition system and method based on a non-autoregressive model, in order to solve the existing technical problems of low speech recognition decoding efficiency and poor real-time performance.



Examples


Embodiment 1

[0060] A training method for a streaming speech recognition system based on a non-autoregressive model. The system includes an acoustic feature sequence extraction module, a streaming acoustic encoder, a CTC linear mapping layer, and a non-autoregressive decoder, as shown in Figure 1. The training process includes the following steps:

[0061] Step 1. Obtain speech training data and corresponding text annotation training data, and extract a series of features of the speech training data to form a speech feature sequence;

[0062] The goal of speech recognition is to convert a continuous speech signal into a text sequence. During recognition, the time-domain waveform is split into frames and windowed, and a discrete Fourier transform is then applied to each frame to extract the coefficients of specific frequency components, which form a feature vector. A series of feature vectors constitutes a speech feature sequence. The speech features are Mel frequency cepstral coefficients (MFCC) or Mel fil...
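The framing, windowing, and Fourier transform described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the patent's extraction module: the function names, the 16 kHz sample rate, the 25 ms frame and 10 ms hop, the Hamming window, the 512-point FFT, and the 40 Mel filters are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D waveform into overlapping frames (no padding)."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the Mel scale."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def log_mel_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
                     n_fft=512, n_mels=40):
    """Return a (num_frames, n_mels) log Mel filter bank feature sequence."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    # Frame the waveform, then apply a Hamming window to each frame.
    frames = frame_signal(signal, frame_len, hop_len) * np.hamming(frame_len)
    # Power spectrum of each frame via the real-input FFT (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)     # small floor avoids log(0)
```

Calling `log_mel_features` on one second of 16 kHz audio yields one 40-dimensional vector per 10 ms hop. MFCCs would additionally apply a discrete cosine transform to these log energies, which is omitted here.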


Abstract

The invention discloses a streaming speech recognition system and method based on a non-autoregressive model. The method comprises the following steps: S11, extracting an acoustic feature sequence; S12, generating an acoustic coding state sequence; S13, generating an acoustic coding state sequence; S14, calculating the CTC output probability distribution and the connectionist temporal classification (CTC) loss; S15, performing alignment using the Viterbi algorithm; S16, inputting segment by segment and calculating the joint cross-entropy loss; S17, calculating a gradient from the joint loss, which combines the CTC loss and the cross-entropy loss, and performing back propagation; S18, repeating steps S12 to S17 until training is completed. The system comprises an acoustic feature sequence extraction module, a streaming acoustic encoder, a CTC linear transformation layer, and a non-autoregressive decoder, connected in sequence. Because non-autoregressive decoding is carried out on the input audio segment by segment, streaming inference speed is improved and loss of language modeling capability is avoided.
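The alignment in step S15 can be illustrated with a generic Viterbi forced alignment over CTC outputs. The sketch below is a textbook formulation, not the patent's procedure: the function name `ctc_viterbi_align`, the blank index 0, and the use of NumPy are assumptions, and the abstract does not specify how the resulting alignment is turned into segments.

```python
import numpy as np

def ctc_viterbi_align(log_probs, target, blank=0):
    """Return the most likely frame-level CTC path that collapses to `target`.

    log_probs: (T, V) array of per-frame log-probabilities.
    target:    list of non-blank token ids.
    """
    T = log_probs.shape[0]
    # Extended label sequence: blank, y1, blank, y2, ..., blank.
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    S = len(ext)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    # A path may start in the leading blank or in the first token.
    score[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        score[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1, s], s)]            # stay in the same state
            if s >= 1:
                cands.append((score[t - 1, s - 1], s - 1))  # advance by one
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1, s - 2], s - 2))
            best, prev = max(cands)
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = prev
    # A path must end in the final blank or in the final token.
    s = S - 1
    if S > 1 and score[T - 1, S - 2] > score[T - 1, S - 1]:
        s = S - 2
    path = []
    for t in range(T - 1, -1, -1):                    # backtrace
        path.append(ext[s])
        s = back[t, s]
    return path[::-1]
```

The returned list has one label per frame. Runs of the same non-blank label mark the frames assigned to each token, which is the kind of boundary information a segment-wise training scheme could use.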

Description

Technical field

[0001] The invention belongs to the technical field of electronic signal processing, and in particular relates to a streaming speech recognition system and method based on a non-autoregressive model.

Background

[0002] As the entry point of human-computer interaction, speech recognition has important application value in helping machines obtain external information and in improving the human-computer interaction experience. Streaming speech recognition methods are usually implemented with autoregressive models. Common models include the RNN-Transducer model and the attention-based encoder-decoder model. The decoder starts from the start symbol and, based on the output of the encoder, predicts the corresponding text sequence step by step or frame by frame until the end token is predicted. This kind of autoregressive decoding relies on the tokens generated at previous steps. This timing depen...
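The contrast between the two decoding styles can be sketched as follows. This is a toy illustration under stated assumptions, not either model from the background section: `step_fn` is a hypothetical callable standing in for a trained autoregressive decoder, and the parallel decoder uses a CTC-style greedy collapse as a stand-in for a non-autoregressive decoder.

```python
import numpy as np

def autoregressive_decode(step_fn, sos, eos, max_len):
    """Predict one token at a time; each step needs all earlier tokens."""
    tokens = [sos]
    while len(tokens) <= max_len:
        nxt = step_fn(tokens)      # hypothetical decoder step, inherently sequential
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens[1:]              # drop the start symbol

def non_autoregressive_decode(logits, blank=0):
    """Score all frames in one pass, then collapse repeats and drop blanks."""
    best = logits.argmax(axis=1).tolist()   # no dependence on earlier outputs
    out, prev = [], None
    for tok in best:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out
```

In the first function the loop cannot be parallelized because `step_fn` consumes the tokens produced so far. In the second, the per-frame predictions are computed together and only a cheap post-processing pass remains.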


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L15/16, G10L15/02, G10L15/26, G10L19/16, G10L25/24
CPC: G10L15/16, G10L15/02, G10L15/26, G10L19/16, G10L25/24
Inventor: 董立波
Owner: 董立波