Voice recognition and voice synthesis model training method based on dual learning

A technique for training speech recognition and speech synthesis models, applied in the fields of speech recognition and speech synthesis. It addresses the high cost, time and labor of collecting paired training data and the difficulty of ensuring its quality, thereby saving cost and mitigating the problem of having only a small amount of labeled data.

Status: Inactive; publication date: 2018-06-08
RUN TECH CO LTD
Cites: 0; Cited by: 17

AI Technical Summary

Problems solved by technology

[0003] Traditional methods for training speech recognition and speech synthesis models require a large amount of speech data paired one-to-one with text data, but collecting enough such paired data is not only time-consuming and laborious, but also d...

Examples

Example Embodiment

[0019] The present invention is further described below with reference to the accompanying drawings and specific embodiments.

[0020] The general idea of the present invention is as follows: first, pre-train the speech recognition model and the speech synthesis model using a small amount of labeled data; then, through dual learning, further train both models in an unsupervised manner using a large amount of unlabeled data and reinforcement learning.
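The two-phase schedule in [0020] can be outlined as a short driver loop. This is a minimal sketch under stated assumptions: `train_two_phase`, `pretrain_step` and `dual_step` are hypothetical names standing in for the real supervised and dual-learning updates, and running a text-first round (synthesis as the primary task) alongside the speech-first round follows general dual-learning practice rather than anything stated in this excerpt.

```python
from typing import Any, Callable, Iterable, Tuple


def train_two_phase(
    paired: Iterable[Tuple[Any, Any]],
    speech_only: Iterable[Any],
    text_only: Iterable[Any],
    pretrain_step: Callable[[Any, Any], None],
    dual_step: Callable[[Any, str], None],
    pretrain_epochs: int = 1,
    dual_epochs: int = 1,
) -> None:
    """Hypothetical outline: supervised pre-training, then unsupervised dual learning."""
    # Materialize the inputs so they can be iterated once per epoch.
    paired, speech_only, text_only = list(paired), list(speech_only), list(text_only)

    # Phase 1: pre-train both models on the small labeled set of (speech, text) pairs.
    for _ in range(pretrain_epochs):
        for speech, text in paired:
            pretrain_step(speech, text)

    # Phase 2: dual learning on unlabeled data; no paired labels are used here.
    for _ in range(dual_epochs):
        for speech in speech_only:
            dual_step(speech, "asr_primary")  # speech -> text -> speech
        for text in text_only:
            dual_step(text, "tts_primary")  # text -> speech -> text (assumed symmetric round)
```

The callbacks keep the schedule independent of any particular model or framework; a real implementation would pass in its own update functions.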

[0021] First, define the inputs of the algorithm:
  • a speech data set $D_A$ and a text data set $D_B$, used to train the speech recognition and speech synthesis models;
  • the speech recognition model to be trained, $\Theta_{AB}$;
  • the speech synthesis model to be trained, $\Theta_{BA}$;
  • a pre-trained speech language model $LM_A$, used to calculate the confidence that speech data was generated by humans rather than by machines;
  • a pre-trained text language model LM, used to calculate the confidence that the text data is written by...
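The inputs listed in [0021] can be grouped into a single container that checks each component offers the operations a dual-learning round needs. This is a sketch under assumptions: the interfaces `SequenceModel` (with `sample` and `log_prob`) and `LanguageModel` (with `score`), and the field names, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class SequenceModel(Protocol):
    """Hypothetical interface for Θ_AB (speech -> text) and Θ_BA (text -> speech)."""

    def sample(self, source: Any) -> Any: ...

    def log_prob(self, target: Any, source: Any) -> float: ...


@runtime_checkable
class LanguageModel(Protocol):
    """Hypothetical interface for LM_A and the text LM: confidence that x is human-made."""

    def score(self, x: Any) -> float: ...


@dataclass
class DualLearningInputs:
    speech_data: Sequence[Any]  # D_A
    text_data: Sequence[Any]  # D_B
    asr: SequenceModel  # Θ_AB, to be trained
    tts: SequenceModel  # Θ_BA, to be trained
    speech_lm: LanguageModel  # LM_A, pre-trained and kept fixed
    text_lm: LanguageModel  # text LM, pre-trained and kept fixed

    def __post_init__(self) -> None:
        # runtime_checkable only verifies that the methods exist, not their signatures.
        for name in ("asr", "tts"):
            if not isinstance(getattr(self, name), SequenceModel):
                raise TypeError(f"{name} must provide sample() and log_prob()")
        for name in ("speech_lm", "text_lm"):
            if not isinstance(getattr(self, name), LanguageModel):
                raise TypeError(f"{name} must provide score()")
```

Structural typing keeps the container agnostic to how the models are actually built.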

Abstract

The invention provides a voice recognition and voice synthesis model training method based on dual learning. The method comprises the following steps: first, voice recognition serves as the "main task" and voice synthesis as the "dual task"; the voice data A are converted into a text B' using the voice recognition model to be trained; a pre-trained text language model is used to calculate the confidence that the text converted from the voice data A was written by humans rather than by machines; the text B' is converted back into voice data A' using the voice synthesis model to be trained; a pre-trained voice language model is used to calculate the "reconstruction similarity" between the voice data A' and the original voice data A; finally, the overall "reward" is calculated, and the parameters of both the voice recognition model and the voice synthesis model to be trained are updated using the REINFORCE algorithm from reinforcement learning. The method saves much of the cost incurred by data collection.
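One speech-first round from the abstract can be illustrated with a toy example. The abstract does not give the reward formula, so this sketch assumes the combination used in the original dual-learning work for machine translation, $r = \alpha r_1 + (1-\alpha) r_2$, where $r_1$ is the text language model's log-confidence in $B'$ and $r_2 = \log P(A \mid B'; \Theta_{BA})$ is swapped in for the "reconstruction similarity". With $K$ sampled transcriptions, the recognition model follows the REINFORCE estimate $\frac{1}{K}\sum_k r_k \nabla_{\Theta_{AB}} \log P(B'_k \mid A; \Theta_{AB})$ and the synthesis model follows $\frac{1}{K}\sum_k (1-\alpha) \nabla_{\Theta_{BA}} \log P(A \mid B'_k; \Theta_{BA})$. Speech and text are reduced to small integer symbols and both models to tabular softmax policies; `dual_episode`, `theta_ab`, `theta_ba`, `text_lm`, `alpha`, `k` and `lr` are hypothetical names, not the patent's implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(logits, index):
    """Gradient of log softmax(logits)[index] w.r.t. each logit: 1{j == index} - p_j."""
    probs = softmax(logits)
    return [(1.0 if j == index else 0.0) - p for j, p in enumerate(probs)]


def sample(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def dual_episode(a, theta_ab, theta_ba, text_lm, alpha=0.5, k=4, lr=0.1, rng=None):
    """One toy speech-first dual-learning round for speech symbol `a`.

    theta_ab[a]: ASR logits over text symbols (stand-in for Θ_AB).
    theta_ba[b]: TTS logits over speech symbols (stand-in for Θ_BA).
    text_lm[b]: fixed probability that text b is human-written (stand-in for the text LM).
    Returns the mean reward over the k sampled transcriptions.
    """
    rng = rng or random.Random(0)
    grad_ab = [0.0] * len(theta_ab[a])
    grad_ba = {}
    rewards = []
    for _ in range(k):
        # Main task: speech A -> text B', sampled from the ASR policy.
        b = sample(softmax(theta_ab[a]), rng)
        # r1: log-confidence from the text LM that B' is human-written.
        r1 = math.log(text_lm[b])
        # r2: log-probability that TTS reconstructs A from B' (assumed similarity measure).
        r2 = math.log(softmax(theta_ba[b])[a])
        r = alpha * r1 + (1 - alpha) * r2
        rewards.append(r)
        # REINFORCE for ASR: reward-weighted gradient of log P(B' | A).
        for j, g in enumerate(grad_log_softmax(theta_ab[a], b)):
            grad_ab[j] += r * g / k
        # TTS: (1 - alpha) times the gradient of log P(A | B').
        row = grad_ba.setdefault(b, [0.0] * len(theta_ba[b]))
        for j, g in enumerate(grad_log_softmax(theta_ba[b], a)):
            row[j] += (1 - alpha) * g / k
    # Gradient ascent on the expected reward for both models.
    for j, g in enumerate(grad_ab):
        theta_ab[a][j] += lr * g
    for b, row in grad_ba.items():
        for j, g in enumerate(row):
            theta_ba[b][j] += lr * g
    return sum(rewards) / k
```

For example, starting from uniform tables, `dual_episode(0, [[0.0] * 3 for _ in range(3)], [[0.0] * 3 for _ in range(3)], [0.6, 0.3, 0.1])` returns a negative mean reward (both terms are log-probabilities) and shifts each sampled TTS row toward reconstructing symbol 0. Subtracting a baseline from each $r_k$ would reduce the variance of the REINFORCE estimate but is omitted here for brevity.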

Description

Technical Field
[0001] The present invention relates to the technical field of speech recognition and speech synthesis, and in particular to a method that exploits the dual nature of the two tasks to train deep-learning speech recognition and speech synthesis models in an unsupervised manner, using a large amount of unlabeled data and reinforcement learning. The method can be applied in the fields of speech recognition and speech synthesis.
Background Art
[0002] Speech is the most basic and most effective way for people to communicate in daily life. As artificial intelligence technology matures, people also hope to communicate and exchange information with computers through direct dialogue, so speech recognition and speech synthesis have become a major topic in the field of natural language processing. The demand for various forms such as speech-to-text and text-to-speech synthesis is expan...

Application Information

IPC (8): G10L15/06; G10L13/08; G10L25/27
CPC: G10L15/063; G10L13/08; G10L25/27
Inventors: 杨华兴 (Yang Huaxing), 刘云浩 (Liu Yunhao)
Owner: RUN TECH CO LTD