
Training method and system for obtaining a better speech translation model through generative adversarial training

A speech translation training method, applied in the field of speech translation, that addresses the problem that the encoding layer of an ST model cannot be trained effectively because of insufficient data and the lack of internal supervision signals.

Active Publication Date: 2021-10-15
PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU +1

AI Technical Summary

Problems solved by technology

[0005] To address the problem that the encoding layer of an ST model cannot be trained effectively because of insufficient data and the lack of internal supervision signals, the present invention provides a training method and system for obtaining a better speech translation model through generative adversarial training.



Examples


Embodiment 1

[0050] As shown in Figure 1, this embodiment of the present invention provides a training method for obtaining a better speech translation model through generative adversarial training, including the following steps:

[0051] Step 1: Collect training data and use the transcription-translation data pairs in the training data to train the MT model;

[0052] Step 2: Use the shrink mechanism to compress the input length of the ST model so that the encoding-layer outputs for speech and text are approximately the same length. This includes: first, use the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information; then, use the peak phenomenon of CTC to remove redundant information from the ST model's encoding-layer states;
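The shrink step above can be illustrated with a minimal sketch. It assumes one common reading of the CTC peak phenomenon: frames whose most likely CTC label is blank are dropped, and consecutive frames sharing a non-blank label are averaged into a single state. The function name `shrink`, the blank index `0`, and the averaging rule are assumptions for illustration; the source does not specify them.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not given in the source)


def shrink(states, ctc_labels, blank=BLANK):
    """Drop blank frames and average each run of frames sharing a label.

    states:     list of per-frame encoder vectors (lists of floats)
    ctc_labels: per-frame argmax of the CTC output, same length as states
    """
    if len(states) != len(ctc_labels):
        raise ValueError("states and ctc_labels must have the same length")
    shrunk = []
    frame = 0
    for label, run in groupby(ctc_labels):
        run_len = len(list(run))
        segment = states[frame:frame + run_len]
        frame += run_len
        if label == blank:
            continue  # blank frames carry no label information here
        dim = len(segment[0])
        # Average the run so each predicted label keeps one state.
        shrunk.append([sum(v[d] for v in segment) / run_len for d in range(dim)])
    return shrunk


# Six speech frames collapse to two states, one per predicted label.
states = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0], [4.0, 4.0], [5.0, 5.0], [6.0, 8.0]]
labels = [7, 7, 0, 0, 3, 3]
print(shrink(states, labels))  # [[2.0, 1.0], [5.5, 6.5]]
```

After shrinking, the number of speech states matches the number of predicted labels, which is what brings the speech sequence length close to the text sequence length.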

[0053] Step 3: Use an adversary to bring the output distribution of the ST model's encoding layer close to the output distribution of the MT model's encoding layer through the "maximum an...
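The adversarial objective in Step 3 can be sketched with the standard GAN-style min-max losses. This is an assumption: the source names an adversary and a min-max method but does not give the loss form here. In the sketch, `d_on_mt` and `d_on_st` are hypothetical names for the adversary's probabilities that a state came from the MT encoding layer, evaluated on MT states and on ST states respectively.

```python
import math


def discriminator_loss(d_on_mt, d_on_st, eps=1e-12):
    """Loss the adversary minimizes: -[mean log D(h_mt) + mean log(1 - D(h_st))]."""
    real = sum(math.log(p + eps) for p in d_on_mt) / len(d_on_mt)
    fake = sum(math.log(1.0 - p + eps) for p in d_on_st) / len(d_on_st)
    return -(real + fake)


def encoder_adversarial_loss(d_on_st, eps=1e-12):
    """Loss the ST encoding layer minimizes: mean log(1 - D(h_st))."""
    return sum(math.log(1.0 - p + eps) for p in d_on_st) / len(d_on_st)


# An adversary that separates the two encoders well has a low loss,
# while the ST encoder's loss falls as its states look more MT-like.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(encoder_adversarial_loss([0.1, 0.2]))
print(encoder_adversarial_loss([0.8, 0.9]))
```

The two losses pull in opposite directions: the adversary tries to tell ST states from MT states, and the ST encoding layer is updated to make that distinction harder.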

Embodiment 2

[0059] Building on the above embodiment and with reference to the training framework shown in Figure 2, this embodiment describes in more detail the training method for obtaining a better speech translation model through generative adversarial training.

[0060] The training framework of the present invention is a general, network-independent structure: convolutional networks, recurrent neural networks, and Transformer structures are all acceptable. In this embodiment, the Transformer structure is used as the main structure, as shown in Figure 2.

[0061] The training framework of the present invention mainly includes five parts: (1) The acoustic encoding layer, which encodes the acoustic features into encoding-layer states corresponding to the source text. (2) The CTC module, which predicts the transcription of the speech and helps the acoustic encoding layer capture acoustic information. (3) The cont...

Embodiment 3

[0099] Correspondingly, an embodiment of the present invention also provides a training system for obtaining a better speech translation model through generative adversarial training, including:

[0100] A data collection module, used to collect training data;

[0101] A model training module, used to: train the MT model using the transcription-translation data in the training data; compress the input length of the ST model using the shrink mechanism so that the encoding-layer outputs for speech and text are approximately the same length, which includes first using the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then using the peak phenomenon of CTC to remove redundant information from the ST model's encoding-layer states; use the adversary to bring the output distribution of the ST model's encoding layer close to the output distribution of the MT model's encoding layer, which helps the ST model capture more semantic ...


Abstract

The invention provides a training method and system for obtaining a better speech translation model through generative adversarial training. The method comprises the following steps: collecting training data and training an MT model using the transcription-translation data in the training data; compressing the input length of the ST model using a shrink mechanism so that the encoding-layer outputs for speech and text are approximately the same length, which comprises first using the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then using the peak phenomenon of CTC to remove redundant information from the ST model's encoding-layer states; using an adversary, through a min-max method, to fit the output distribution of the ST model's encoding layer to the output distribution of the MT model's encoding layer, helping the ST model capture more semantic information; and using the CTC loss as an additional loss, combined with the loss of the end-to-end ST model, to jointly train the whole speech translation model. The invention can improve the recognition performance of the speech translation model and improve the efficiency and quality of speech translation.
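The joint training named in the abstract can be written as a weighted sum. The weight $\lambda$ and the symbol names are assumptions for illustration; the abstract states only that the CTC loss is added to the loss of the end-to-end ST model, and does not say how the adversarial term enters the total.

```latex
\mathcal{L}_{\text{joint}} = \mathcal{L}_{\text{ST}} + \lambda \, \mathcal{L}_{\text{CTC}}, \qquad \lambda > 0
```

Here $\mathcal{L}_{\text{ST}}$ stands for the end-to-end translation loss and $\mathcal{L}_{\text{CTC}}$ for the CTC transcription loss computed on the encoding-layer outputs.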

Description

Technical field

[0001] The invention relates to the technical field of speech translation, and in particular to a training method and system for obtaining a better speech translation model through generative adversarial training, and to a speech translation method and device.

Background

[0002] Speech translation takes speech in one language as input and outputs text in another language. The traditional speech translation system uses a cascaded approach: the speech is first transcribed by an ASR (Automatic Speech Recognition) system, and the transcription is then fed into an MT (Machine Translation) system for translation. Because the ASR and MT systems can each be trained separately on larger amounts of data, this approach yields an ST (Speech Translation) system with higher translation quality, so the cascaded ST system has been widely used for many years.

[0003] The end-to-end system skips the intermediate transcription st...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06F40/30, G10L15/26, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/58, G06F40/30, G10L15/26, G06N3/08, G06N3/047, G06N3/045, G06F18/2415, G06F18/214
Inventor: 屈丹, 张昊, 杨绪魁, 闫红刚, 张文林, 郝朝龙, 魏雪娟, 李真
Owner PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU