Encoder based on a local generative attention mechanism, and end-to-end speech recognition system adopting the same
An attention and encoder technology, applied to speech recognition, speech analysis, and instruments. It addresses problems such as increased training time, wasted memory usage, and an increased attention-weight error rate, and achieves a good recognition rate while reducing the number of multiplications and the overall computational complexity.
Embodiment
[0070] (1) Model structure:
[0071] The SA-Transformer baseline model is an improved Transformer speech recognition model consisting of an encoder and a decoder. The encoder comprises a convolutional front end followed by 12 identical encoder sub-blocks; each sub-block contains a self-attention (SA) layer, a convolutional layer, and a feed-forward fully connected layer. The convolutional front end stacks two 3×3 convolutional layers, each with stride 2 in both the time and frequency dimensions, to downsample the input features. The decoder consists of a word embedding layer and 6 identical decoder sub-blocks. In addition to the feed-forward fully connected layer, each decoder sub-block contains two multi-head attention layers: a self-attention layer over the embedded representation of the label sequence and a cross-attention layer over the encoder output, as sketched below.
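A minimal PyTorch sketch of the baseline structure described in [0071] follows. Only the block counts and the convolutional front end (two 3×3 convolutions, stride 2 in time and frequency) are fixed by the text; the model dimension, head count, kernel size, feed-forward width, and normalization placement are illustrative assumptions, and the decoder is approximated with PyTorch's stock TransformerDecoder.

```python
import torch
import torch.nn as nn


class ConvFrontEnd(nn.Module):
    """Two stacked 3x3 convolutions, stride 2 in both time and frequency,
    downsampling the input features by 4x along each axis ([0071])."""
    def __init__(self, channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):               # x: (batch, time, freq)
        x = self.conv(x.unsqueeze(1))   # -> (batch, channels, time/4, freq/4)
        b, c, t, f = x.shape
        return x.permute(0, 2, 1, 3).reshape(b, t, c * f)


class EncoderSubBlock(nn.Module):
    """One encoder sub-block: SA layer + convolutional layer + feed-forward
    fully connected layer, each with a residual connection (assumed)."""
    def __init__(self, dim=256, heads=4, kernel=15, ffn=1024):
        super().__init__()
        self.sa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn), nn.ReLU(), nn.Linear(ffn, dim))
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(3))

    def forward(self, x):
        h = self.norms[0](x)
        x = x + self.sa(h, h, h, need_weights=False)[0]
        x = x + self.conv(self.norms[1](x).transpose(1, 2)).transpose(1, 2)
        return x + self.ffn(self.norms[2](x))


class SATransformerBaseline(nn.Module):
    """Conv front end + 12 encoder sub-blocks; word embedding + 6 decoder
    sub-blocks (masked self-attention over the label embeddings and
    cross-attention over the encoder output, plus a feed-forward layer)."""
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.front = ConvFrontEnd()
        self.proj = nn.LazyLinear(dim)   # flatten conv output -> model dim
        self.encoder = nn.ModuleList(EncoderSubBlock(dim) for _ in range(12))
        self.embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True),
            num_layers=6)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, feats, labels):    # feats: (b, T, F); labels: (b, U)
        x = self.proj(self.front(feats))
        for block in self.encoder:
            x = block(x)
        mask = nn.Transformer.generate_square_subsequent_mask(
            labels.size(1)).to(labels.device)
        y = self.decoder(self.embed(labels), x, tgt_mask=mask)
        return self.out(y)               # (b, U, vocab_size) logits
```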
[0072] The LDSA-Transformer uses the same decoder as the baseline model; the only change is that the self-attention mechanism in the SA-Transformer encoder is replaced with LDSA. The...
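Paragraph [0072] is truncated, so the exact LDSA formulation is not given here. The sketch below assumes a local dense-synthesizer-style attention: each frame synthesizes softmax weights over a local window of 2w+1 neighbours with a small feed-forward net rather than computing query-key dot products, which would be consistent with the reduced multiplication count claimed in the abstract. The window size and hidden width are assumptions.

```python
import torch
import torch.nn as nn


class LDSA(nn.Module):
    """Hypothetical local dense synthesizer attention: per-frame attention
    weights over a window of 2*w + 1 neighbouring frames are produced by a
    two-layer feed-forward net instead of query-key dot products."""
    def __init__(self, dim=256, w=7, hidden=256):
        super().__init__()
        self.w = w
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * w + 1))
        self.value = nn.Linear(dim, dim)

    def forward(self, x):                                 # x: (batch, time, dim)
        att = torch.softmax(self.score(x), dim=-1)        # (b, t, 2w+1)
        v = self.value(x)
        v = nn.functional.pad(v, (0, 0, self.w, self.w))  # pad the time axis
        v = v.unfold(1, 2 * self.w + 1, 1)                # (b, t, dim, 2w+1)
        return torch.einsum('btk,btdk->btd', att, v)      # weighted local sum
```

Under this assumption, building the LDSA-Transformer encoder amounts to swapping `self.sa` in the `EncoderSubBlock` sketch above for this module.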