Voice recognition and voice synthesis model training method based on dual learning

A training technique for speech recognition and speech synthesis models, applied in the fields of speech recognition and speech synthesis. It addresses the high cost, the time and labor required, and the difficulty of ensuring data quality, thereby saving cost and easing the problem of limited data.
CN108133705A · Inactive · Publication Date: 2018-06-08 · RUN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
RUN TECH CO LTD
Publication Date
2018-06-08
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention provides a method for training voice recognition and voice synthesis models based on dual learning. The method comprises the following steps: first, voice recognition acts as the "main task" and voice synthesis as the "dual task"; voice data A is converted into text B' using the voice recognition model to be trained; a pre-trained text language model is used to calculate the confidence that the text B' was written by a human rather than by a machine; the text B' is converted back into voice data A' using the voice synthesis model to be trained; the "reconstruction similarity" between the voice data A' and the original voice data A is calculated using a pre-trained voice language model; finally, the "reward" is calculated, and the parameters of the voice recognition model and the voice synthesis model to be trained are updated using the REINFORCE algorithm from reinforcement learning. This saves much of the cost incurred by data collection.
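The reward and update steps named in the abstract can be illustrated with a toy sketch. It is not the patent's implementation: the abstract does not say how the two signals are combined, so the weighted sum with `alpha` is an assumption, as are the function names, the learning rate, and the idea of treating the recognizer as a categorical policy over a few candidate transcripts. Only the recognizer side is modeled; the synthesis model's update is omitted for brevity.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(lm_confidence, reconstruction_similarity, alpha=0.5):
    """Final reward as a weighted sum of the two signals.

    ASSUMPTION: the abstract only says a final reward is calculated;
    the linear weighting with `alpha` is illustrative.
    """
    return alpha * lm_confidence + (1.0 - alpha) * reconstruction_similarity


def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update for a categorical policy.

    For a softmax policy, d log pi(action) / d logit_k
    equals 1[k == action] - pi(k).
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if k == action else 0.0) - p)
        for k, (logit, p) in enumerate(zip(logits, probs))
    ]


def train_toy(lm_scores, similarity_scores, steps=500, seed=0):
    """Sample a candidate transcript, score it, and update the policy.

    `lm_scores[i]` and `similarity_scores[i]` are hypothetical stand-ins
    for the pre-trained text language model's confidence and the
    reconstruction similarity of candidate i.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(lm_scores)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = combined_reward(lm_scores[action], similarity_scores[action])
        logits = reinforce_step(logits, action, reward)
    return softmax(logits)
```

Calling `train_toy([0.2, 0.9, 0.5], [0.3, 0.8, 0.4])` returns the policy's probabilities over the three hypothetical candidates after training. In a real system the logits would be the parameters of neural recognition and synthesis models, and a baseline is commonly subtracted from the reward to reduce variance.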

Description

Technical field

[0001] The present invention relates to the technical field of speech recognition and speech synthesis. In particular, it is a method for training speech recognition and speech synthesis models built with deep learning in an unsupervised manner, exploiting the nature of dual learning and using a large amount of unlabeled data together with reinforcement learning. The method can be applied in the fields of speech recognition and speech synthesis.

Background art

[0002] Speech is the most basic and most effective way for people to communicate in daily life. With the maturity of artificial intelligence technology, people also hope to communicate and transmit information with computers through direct dialogue, so speech recognition and speech synthesis have also become a major topic in the field of natural language processing. The demand for various forms such as speech-to-text and text-to-speech synthesis is expan...

Claims
