Block-based self-attention real-time end-to-end speech translation method

A speech translation and attention technology, applied in speech analysis, speech recognition, instruments, and related fields. It addresses problems such as the reliance on monotonic alignment between input and output and the resulting poor performance, and achieves strong portability and convenient deployment.

Pending Publication Date: 2022-03-04
沈阳雅译网络技术有限公司

AI Technical Summary

Problems solved by technology

Speech recognition carries a relatively strong constraint: the input and output are monotonically aligned. Speech translation has no such constraint.
Directly transferring block-based real-time speech recognition methods to real-time speech translation is therefore less effective.

Examples

Embodiment Construction

[0037] The present invention is described in further detail below with reference to the accompanying drawings.

[0038] In real-time speech translation, traditional attention computation is difficult to apply because of its limitations. By using block attention, the present invention gives the real-time speech translation model access to a certain amount of context during decoding and enables real-time translation, thereby improving translation speed and reducing latency.
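The idea of block attention can be illustrated with an attention mask. The sketch below is only an assumption about how such a mask might look: the excerpt does not specify the exact rule, so the function name `block_attention_mask` and the `left_blocks` parameter are hypothetical. Here each frame may attend to its own block and a fixed number of preceding blocks, and never to later blocks.

```python
def block_attention_mask(num_frames, block_size, left_blocks=1):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption (not from the patent text): a frame sees its own block
    plus `left_blocks` preceding blocks, and never a later block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mask = []
    for i in range(num_frames):
        block_i = i // block_size  # index of the block containing frame i
        row = []
        for j in range(num_frames):
            block_j = j // block_size
            row.append(block_i - left_blocks <= block_j <= block_i)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask as a grid: "x" = attention allowed, "." = masked out.
    for row in block_attention_mask(num_frames=6, block_size=2, left_blocks=1):
        print("".join("x" if allowed else "." for allowed in row))
```

Because no frame depends on a later block, a block can be encoded as soon as it has been received, which is what makes this kind of restriction usable in a streaming setting.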

[0039] The present invention provides a block-based self-attention real-time end-to-end speech translation method, comprising the following steps:

[0040] 1) Preprocess the recorded audio training data: map the ID of each utterance and its storage path to the corresponding target-language text, and construct two mapping files;

[0041] 2) Extract the acoustic features of the audio file, extracting the two acoustic features of...
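Step 1) can be sketched as follows. The excerpt does not give the layout of the two mapping files, so this example assumes one "ID path" file and one "ID text" file with one utterance per line; the function name `write_mapping_files` and the sample IDs and paths are hypothetical.

```python
import io


def write_mapping_files(samples, path_file, text_file):
    """Write 'ID path' and 'ID text' lines for each (utt_id, path, text).

    Assumed layout: one utterance per line, ID first, separated by a space.
    """
    seen = set()
    for utt_id, audio_path, target_text in samples:
        if utt_id in seen:
            raise ValueError(f"duplicate utterance ID: {utt_id}")
        seen.add(utt_id)
        path_file.write(f"{utt_id} {audio_path}\n")
        # Collapse whitespace so each transcript stays on a single line.
        text_file.write(f"{utt_id} {' '.join(target_text.split())}\n")


if __name__ == "__main__":
    # Hypothetical sample data; in-memory files stand in for files on disk.
    samples = [
        ("utt001", "audio/utt001.wav", "guten Morgen"),
        ("utt002", "audio/utt002.wav", "vielen Dank"),
    ]
    paths, texts = io.StringIO(), io.StringIO()
    write_mapping_files(samples, paths, texts)
    print(paths.getvalue(), end="")
    print(texts.getvalue(), end="")
```

Keying both files by the same utterance ID lets later stages join audio and target text without relying on line order.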

Abstract

The invention discloses a block-based self-attention method for real-time end-to-end speech translation, comprising the following steps: preprocessing the recorded audio training data, mapping the ID and storage path of each utterance to the corresponding target-language text, and constructing two mapping files; extracting two acoustic features of the audio, namely Mel filter bank features and Mel-frequency cepstral coefficients; constructing a target-language dictionary from the training data, which is used to generate the target-language text sequence during decoding; cleaning the training data and converting it into the file format required by the end-to-end speech translation model; initializing the end-to-end speech translation model and training it on the formatted data files; and, in the inference stage, setting the block size and using the trained model to encode the source speech dynamically, so that the target sentence is generated in real time. The method gives the model the capability of real-time speech translation and improves decoding speed without reducing model performance.
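The inference stage described above can be pictured as a loop over fixed-size blocks of input frames. The following is a minimal sketch under stated assumptions: `encode_block` and `decode_step` are hypothetical callables standing in for the trained model, and the rule for when tokens are emitted is not given in the excerpt, so the toy stand-ins below simply emit one token per block.

```python
def iter_blocks(frames, block_size):
    """Yield consecutive blocks of at most `block_size` frames."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(frames), block_size):
        yield frames[start:start + block_size]


def translate_streaming(frames, block_size, encode_block, decode_step):
    """Encode the input block by block and decode after each block.

    `encode_block(block)` returns the encoder states for one block;
    `decode_step(encoded, output)` returns the new target tokens given all
    encoder states so far and the tokens already emitted. Both callables
    are placeholders for a trained model.
    """
    encoded, output = [], []
    for block in iter_blocks(frames, block_size):
        encoded.extend(encode_block(block))
        output.extend(decode_step(encoded, output))
    return output


if __name__ == "__main__":
    # Toy stand-ins: identity encoder, one placeholder token per block.
    tokens = translate_streaming(
        frames=[0.1, 0.2, 0.3, 0.4, 0.5],
        block_size=2,
        encode_block=lambda block: list(block),
        decode_step=lambda enc, out: [f"tok{len(out)}"],
    )
    print(tokens)
```

In this sketch a smaller block size means output can start sooner, at the cost of less context per encoding step.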

Description

Technical field

[0001] The invention relates to an end-to-end real-time speech translation method, in particular to a block-based self-attention real-time end-to-end speech translation method.

Background technique

[0002] Speech translation broadly refers to the process of translating speech in one language into speech or text in the target language. Speech translation usually refers to translating speech into the corresponding target-language text, while speech-to-speech translation specifically refers to translating speech into the corresponding target-language speech. Speech translation has a very wide range of application scenarios, such as subtitle generation and simultaneous interpretation at conferences, and plays an important role in cross-language communication.

[0003] In the past, speech translation was usually carried out in a cascading manner, that is, a speech recognition system was used to recogn...

Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/26, G10L25/24
CPC: G10L15/02, G10L15/063, G10L15/26, G10L25/24
Inventor 徐萍, 宁义明
Owner 沈阳雅译网络技术有限公司