Method and system for training better speech translation models via generative adversarial training
A speech translation technology and training method, applied in the field of speech translation, which addresses the lack of an internal supervision signal in the training data and the resulting inability to train the coding layer of the ST model effectively.
Examples
Embodiment 1
[0050] As shown in Figure 1, an embodiment of the present invention provides a training method for obtaining a better speech translation model under generative adversarial training, comprising the following steps:
[0051] Step 1: Collect training data, and use the transcription-translation data pairs in the training data to train the MT model;
[0052] Step 2: Use a shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text coding layers are approximately the same. This includes first using the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then exploiting the peak phenomenon of CTC predictions to remove redundant information from the coding-layer states of the ST model;
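The shrink step in Step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes frame-level CTC argmax predictions (with 0 as the blank label) and compresses the encoder states by dropping blank frames and averaging each run of frames that predict the same label, so the shrunk sequence length approaches the transcription length. The function name and blank index are assumptions for illustration.

```python
BLANK = 0  # assumed blank label index

def shrink_encoder_states(states, ctc_argmax):
    """states: list of feature vectors, one per frame;
    ctc_argmax: per-frame CTC label predictions of the same length."""
    assert len(states) == len(ctc_argmax)
    shrunk = []
    group = []          # states in the current run of one repeated label
    prev_label = None
    for state, label in zip(states, ctc_argmax):
        if label != prev_label and group:
            # a new label starts: collapse the previous run into its mean
            shrunk.append([sum(xs) / len(xs) for xs in zip(*group)])
            group = []
        if label != BLANK:
            group.append(state)  # blank frames are redundant and dropped
        prev_label = label
    if group:
        shrunk.append([sum(xs) / len(xs) for xs in zip(*group)])
    return shrunk
```

For example, five frames predicting the labels `[1, 1, 0, 2, 2]` shrink to two states: the mean of the two label-1 frames and the mean of the two label-2 frames, with the blank frame discarded.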
[0053] Step 3: Use an adversary to bring the output distribution of the ST model's coding layer close to that of the MT model's coding layer through the "maxim...
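The adversarial objective in Step 3 can be illustrated with scalar losses. In this hedged sketch (not taken from the patent text), a discriminator D tries to distinguish MT encoder states ("real") from ST encoder states ("fake"), while the ST encoder is updated to fool D; the non-saturating generator loss used here is a common substitute for the saturating min-max form and is an assumption, as are the function names.

```python
import math

def discriminator_loss(d_real, d_fake):
    """d_real: D's probability on an MT encoder state;
    d_fake: D's probability on an ST encoder state (both in (0, 1)).
    D maximizes log D(mt) + log(1 - D(st)); we minimize the negation."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The ST encoder minimizes -log D(st), pushing its coding-layer
    states toward the MT encoder's output distribution."""
    return -math.log(d_fake)
```

Alternating these two updates is the standard adversarial training loop: the loss is minimal for the generator exactly when the discriminator can no longer separate the two state distributions.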
Embodiment 2
[0059] On the basis of the above embodiment, and in combination with the training framework shown in Figure 2, the training method for obtaining a better speech translation model under generative adversarial training provided by the embodiment of the present invention is further elaborated.
[0060] The training framework of the present invention is a general, network-independent structure: a convolutional network, a recurrent neural network, or a Transformer can be used. In the embodiment of the present invention, the Transformer is used as the main structure, as shown in Figure 2.
[0061] The training framework of the present invention mainly includes five parts: (1) an acoustic encoding layer, which encodes the acoustic features into coding-layer states corresponding to the source text; (2) a CTC module, which predicts the transcription of the speech and helps the acoustic encoding layer capture acoustic information; (3) a shrink mechanism, ...
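The components listed above can be wired together as in the following skeleton. All class, method, and parameter names are hypothetical stand-ins for illustration; each component is injected as a callable so any backbone (convolutional, recurrent, or Transformer) can fill the roles, matching the network-independent framing of paragraph [0060].

```python
class SpeechTranslationModel:
    """Hypothetical composition of the framework's components."""

    def __init__(self, acoustic_encoder, ctc_head, shrink, mt_decoder):
        self.acoustic_encoder = acoustic_encoder  # (1) acoustic encoding layer
        self.ctc_head = ctc_head                  # (2) CTC module
        self.shrink = shrink                      # (3) shrink mechanism
        self.mt_decoder = mt_decoder              # decoder shared with the MT model

    def forward(self, acoustic_features):
        states = self.acoustic_encoder(acoustic_features)  # encode audio frames
        ctc_labels = self.ctc_head(states)                 # frame-level labels
        shrunk = self.shrink(states, ctc_labels)           # drop redundant states
        return self.mt_decoder(shrunk)                     # decode the translation
```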
Embodiment 3
[0099] Correspondingly, an embodiment of the present invention also provides a training system for obtaining a better speech translation model under generative adversarial training, including:
[0100] a data collection module, for collecting training data;
[0101] a model training module, which uses the transcription-translation data in the training data to train the MT model; uses the shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text coding layers are approximately the same, including first adopting the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then using the peak phenomenon of CTC to remove redundant information from the ST model's coding-layer states; and uses the adversary to bring the output distribution of the ST model's coding layer close to that of the MT model's coding layer, which helps the ST model capture more semantic ...