Method and system for training better speech translation models via generative adversarial training
A speech translation technology and training method, applied in the field of speech translation, which addresses the lack of an internal supervision signal in the training data and the resulting inability to train the coding layer of the ST model effectively.
Examples
Embodiment 1
[0050] As shown in Figure 1, an embodiment of the present invention provides a training method for obtaining a better speech translation model under generative adversarial training, comprising the following steps:
[0051] Step 1: Collect training data, and use the transcription-translation data pairs in the training data to train the MT model;
[0052] Step 2: Use a shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text coding layers are approximately the same. This includes first using the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then exploiting the peak phenomenon of CTC predictions to remove redundant information from the coding-layer states of the ST model;
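The shrink step in Step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes frame-level CTC argmax predictions (with 0 as the blank label) and compresses the encoder states by dropping blank frames and averaging each run of frames that predict the same label, so the shrunk sequence length approaches the transcription length. The function name and blank index are assumptions for illustration.

```python
BLANK = 0  # assumed blank label index

def shrink_encoder_states(states, ctc_argmax):
    """states: list of feature vectors, one per frame;
    ctc_argmax: per-frame CTC label predictions of the same length."""
    assert len(states) == len(ctc_argmax)
    shrunk = []
    group = []          # states in the current run of one repeated label
    prev_label = None
    for state, label in zip(states, ctc_argmax):
        if label != prev_label and group:
            # a new label starts: collapse the previous run into its mean
            shrunk.append([sum(xs) / len(xs) for xs in zip(*group)])
            group = []
        if label != BLANK:
            group.append(state)  # blank frames are redundant and dropped
        prev_label = label
    if group:
        shrunk.append([sum(xs) / len(xs) for xs in zip(*group)])
    return shrunk
```

For example, five frames predicting the labels `[1, 1, 0, 2, 2]` shrink to two states: the mean of the two label-1 frames and the mean of the two label-2 frames, with the blank frame discarded.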
[0053] Step 3: Use an adversary to bring the output distribution of the ST model's coding layer close to that of the MT model's coding layer through the "maxim...
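The adversarial objective in Step 3 can be illustrated with scalar losses. In this hedged sketch (not taken from the patent text), a discriminator D tries to distinguish MT encoder states ("real") from ST encoder states ("fake"), while the ST encoder is updated to fool D; the non-saturating generator loss used here is a common substitute for the saturating min-max form and is an assumption, as are the function names.

```python
import math

def discriminator_loss(d_real, d_fake):
    """d_real: D's probability on an MT encoder state;
    d_fake: D's probability on an ST encoder state (both in (0, 1)).
    D maximizes log D(mt) + log(1 - D(st)); we minimize the negation."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The ST encoder minimizes -log D(st), pushing its coding-layer
    states toward the MT encoder's output distribution."""
    return -math.log(d_fake)
```

Alternating these two updates is the standard adversarial training loop: the loss is minimal for the generator exactly when the discriminator can no longer separate the two state distributions.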
Embodiment 2
[0059] On the basis of the above embodiment, and in combination with the training framework shown in Figure 2, the training method for obtaining a better speech translation model under generative adversarial training provided by the embodiment of the present invention is further elaborated.
[0060] The training framework of the present invention is a general, network-independent structure: a convolutional network, a recurrent neural network, or a Transformer can be used. In the embodiment of the present invention, the Transformer is used as the main structure, as shown in Figure 2.
[0061] The training framework of the present invention mainly includes five parts: (1) an acoustic encoding layer, which encodes the acoustic features into coding-layer states corresponding to the source text; (2) a CTC module, which predicts the transcription of the speech and helps the acoustic encoding layer capture acoustic information; (3) a shrink mechanism, ...
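The components listed above can be wired together as in the following skeleton. All class, method, and parameter names are hypothetical stand-ins for illustration; each component is injected as a callable so any backbone (convolutional, recurrent, or Transformer) can fill the roles, matching the network-independent framing of paragraph [0060].

```python
class SpeechTranslationModel:
    """Hypothetical composition of the framework's components."""

    def __init__(self, acoustic_encoder, ctc_head, shrink, mt_decoder):
        self.acoustic_encoder = acoustic_encoder  # (1) acoustic encoding layer
        self.ctc_head = ctc_head                  # (2) CTC module
        self.shrink = shrink                      # (3) shrink mechanism
        self.mt_decoder = mt_decoder              # decoder shared with the MT model

    def forward(self, acoustic_features):
        states = self.acoustic_encoder(acoustic_features)  # encode audio frames
        ctc_labels = self.ctc_head(states)                 # frame-level labels
        shrunk = self.shrink(states, ctc_labels)           # drop redundant states
        return self.mt_decoder(shrunk)                     # decode the translation
```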
Embodiment 3
[0099] Correspondingly, an embodiment of the present invention also provides a training system for obtaining a better speech translation model under generative adversarial training, including:
[0100] a data collection module, for collecting training data;
[0101] a model training module, which uses the transcription-translation data in the training data to train the MT model; uses the shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text coding layers are approximately the same, including first adopting the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then using the peak phenomenon of CTC to remove redundant information from the ST model's coding-layer states; and uses the adversary to bring the output distribution of the ST model's coding layer close to that of the MT model's coding layer, which helps the ST model capture more semantic ...