Training method and system for obtaining a better speech translation model via generative adversarial training
A speech translation training method, applied in the field of speech translation, that addresses the lack of an internal supervision signal in the available data and the resulting inability to train the encoding layer of the ST (speech translation) model effectively.
Examples
Embodiment 1
[0050] As shown in Figure 1, an embodiment of the present invention provides a training method for obtaining a better speech translation model via generative adversarial training, comprising the following steps:
[0051] Step 1: Collect training data and use the transcription-translation data pairs in the training data to train the MT (machine translation) model;
[0052] Step 2: Use the shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text encoding layers are approximately the same. Specifically: first use the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information; then exploit the peak (spike) phenomenon of CTC to remove redundant information from the ST model's encoding-layer states;
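The patent gives no code for Step 2; the following is a minimal NumPy sketch of one common reading of the CTC-peak-based shrink mechanism: frames whose CTC argmax is the blank symbol are dropped, and consecutive frames sharing a non-blank label are averaged into a single encoder state. The function name, the blank index, and the averaging rule are assumptions, not the patent's own.

```python
import numpy as np

BLANK = 0  # assumed CTC blank index


def shrink_encoder_states(states, ctc_argmax):
    """Collapse frame-level encoder states using CTC peaks.

    states:     (T, d) array of encoder states, one per speech frame.
    ctc_argmax: length-T sequence of greedy CTC label predictions.
    Returns a shorter (T', d) array whose length roughly matches the
    transcription length, as the shrink mechanism intends.
    """
    segments = []      # averaged state per collapsed segment
    current = []       # frames of the segment being built
    prev = BLANK
    for h, y in zip(states, ctc_argmax):
        if y == BLANK:
            # Blank frame: close any open segment, keep nothing.
            if current:
                segments.append(np.mean(current, axis=0))
                current = []
            prev = BLANK
            continue
        if y != prev and current:
            # Label changed: close the previous segment.
            segments.append(np.mean(current, axis=0))
            current = []
        current.append(h)
        prev = y
    if current:
        segments.append(np.mean(current, axis=0))
    return np.stack(segments)
```

For example, six frames with greedy CTC labels `[0, 1, 1, 0, 2, 2]` shrink to two states: the average of frames 1-2 and the average of frames 4-5.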
[0053] Step 3: Use an adversarial discriminator to bring the output distribution of the ST model's encoding layer close to the output distribution of the MT model's encoding layer through the "maximum an...
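The text truncates the min-max objective of Step 3; below is a minimal NumPy sketch of a standard GAN-style formulation consistent with the description: a discriminator learns to tell ST encoder states from MT encoder states, while the ST encoder is trained to fool it, pulling the two output distributions together. The linear discriminator and all names are illustrative assumptions, not the patent's formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adversarial_losses(st_states, mt_states, w, b, eps=1e-8):
    """GAN-style min-max losses for encoder-distribution matching.

    A linear discriminator D(h) = sigmoid(h @ w + b) is trained to
    output 1 for MT encoder states and 0 for ST encoder states; the
    ST encoder minimizes the opposite objective, so at equilibrium
    the two encoders' output distributions become indistinguishable.
    """
    d_mt = sigmoid(mt_states @ w + b)  # D's score on MT states (target 1)
    d_st = sigmoid(st_states @ w + b)  # D's score on ST states (target 0)
    # Discriminator ("maximize") loss:
    d_loss = -np.mean(np.log(d_mt + eps)) - np.mean(np.log(1.0 - d_st + eps))
    # ST-encoder ("minimize") loss: make ST states look like MT states.
    g_loss = -np.mean(np.log(d_st + eps))
    return d_loss, g_loss
```

With an untrained discriminator (zero weights) both scores are 0.5, giving a discriminator loss of 2·log 2 and an encoder loss of log 2, the usual GAN starting point.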
Embodiment 2
[0059] Building on the above embodiment and with reference to the training framework structure shown in Figure 2, this embodiment describes in more detail the training method for obtaining a better speech translation model via generative adversarial training provided by an embodiment of the present invention.
[0060] The training framework of the present invention is a general, network-independent structure: a convolutional network, a recurrent neural network, or a Transformer are all acceptable backbones. In the embodiment of the present invention, the Transformer is used as the main structure, as shown in Figure 2.
[0061] The training framework of the present invention mainly comprises five parts: (1) the acoustic encoding layer, which encodes the acoustic features into encoding-layer states corresponding to the source text; (2) the CTC module, which predicts the transcription of the speech and helps the acoustic encoding layer capture acoustic information; (3) the cont...
Embodiment 3
[0099] Correspondingly, an embodiment of the present invention also provides a training system for obtaining a better speech translation model via generative adversarial training, comprising:
[0100] a data collection module for collecting training data;
[0101] a model training module, used to: train the MT model with the transcription-translation data pairs in the training data; use the shrink mechanism to compress the input length of the ST model so that the output lengths of the speech and text encoding layers are approximately the same, specifically by first using the CTC loss to help the ST model predict the transcription of the speech and capture its acoustic information, and then using the CTC peak phenomenon to remove redundant information from the ST model's encoding-layer states; and use the adversarial discriminator to bring the output distribution of the ST model's encoding layer close to that of the MT model's encoding layer, helping the ST model capture more semantic ...