
Voice recognition and voice synthesis model training method based on dual learning

A training technology for speech recognition and speech synthesis models, applied in the fields of speech recognition and speech synthesis. It addresses the high cost, time, and effort of collecting paired training data and the difficulty of ensuring data quality, thereby saving cost and easing the problem of having only a small amount of data.

Inactive Publication Date: 2018-06-08
RUN TECH CO LTD
0 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0003] Traditional training methods for speech recognition and speech synthesis models require a large amount of speech data and text data in one-to-one correspondence. Collecting enough of this paired data is time-consuming and laborious, the quality of the collected data is difficult to guarantee, and the collection itself incurs considerable cost.
An insufficient amount of high-quality data has become a major obstacle to improving the accuracy and conversion efficiency of speech recognition and speech synthesis models.



Examples


Embodiment Construction

[0019] The present invention will be further described below in conjunction with specific drawings and embodiments.

[0020] The general idea of the present invention is: first, pre-train the speech recognition model and the speech synthesis model on a small amount of labeled data; then further train both models in an unsupervised manner on unlabeled data, using dual learning.

[0021] First, define the inputs of the algorithm: a speech dataset D_A and a text dataset D_B for training the speech recognition and speech synthesis models; the speech recognition model Θ_AB to be trained; the speech synthesis model Θ_BA to be trained; a pre-trained speech language model LM_A, used to calculate the confidence that speech data was produced by a human rather than generated by a machine; a pre-trained text language model LM_B, used to calculate the confidence that text data was written by a human rather than generated by a machine; and, for updating parameters, the hyperparameter α used to...
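As an illustration only, the inputs listed above could be gathered in a small typed container. This is a sketch, not the patent's implementation: the names `DualLearningInputs`, `Speech`, and `Text` are hypothetical, models and language models are represented as plain callables, and the hyperparameter α is left out because its description is truncated in this excerpt.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical representations: a speech sample as a sequence of floats
# (samples or features) and a text sample as a string.
Speech = Sequence[float]
Text = str


@dataclass
class DualLearningInputs:
    """Illustrative container for the algorithm inputs (all names assumed)."""

    speech_data: Sequence[Speech]          # D_A: speech dataset
    text_data: Sequence[Text]              # D_B: text dataset
    recognizer: Callable[[Speech], Text]   # Theta_AB: speech -> text, to be trained
    synthesizer: Callable[[Text], Speech]  # Theta_BA: text -> speech, to be trained
    speech_lm: Callable[[Speech], float]   # LM_A: confidence that speech is human-produced
    text_lm: Callable[[Text], float]       # LM_B: confidence that text is human-written


# Toy stand-ins so the container can be constructed and inspected.
inputs = DualLearningInputs(
    speech_data=[[0.1, 0.2]],
    text_data=["hello"],
    recognizer=lambda speech: "hello",
    synthesizer=lambda text: [0.1, 0.2],
    speech_lm=lambda speech: 0.9,
    text_lm=lambda text: 0.8,
)
```

Note that the two datasets are deliberately separate fields: nothing pairs a speech sample with a text sample.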


Abstract

The invention provides a voice recognition and voice synthesis model training method based on dual learning. The method comprises the following steps: first, voice recognition is taken as the "main task" and voice synthesis as the "dual task"; the voice data A is converted into a text B' by the voice recognition model to be trained; a pre-trained text language model calculates the confidence that the text B' was written by a human rather than generated by a machine; the text B' is converted back into voice data A' by the voice synthesis model to be trained; the "reconstruction similarity" between the voice data A' and the original voice data A is calculated using a pre-trained voice language model; finally, the "reward" is calculated, and the parameters of the voice recognition model and the voice synthesis model to be trained are updated using the REINFORCE algorithm from reinforcement learning. The method saves much of the cost overhead caused by data collection.
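The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method: the linear mix of the two reward terms and its weight `mix` are assumptions (this excerpt does not give the combination rule), `similarity` is a hypothetical callable standing in for the language-model-based reconstruction score, and `reinforce_update` shows REINFORCE for a toy softmax policy over discrete actions rather than for real speech models.

```python
import math
from typing import Callable, List, Sequence, Tuple


def combined_reward(lm_confidence: float, reconstruction: float, mix: float = 0.5) -> float:
    """Assumed form of the final reward: a linear mix of the two terms."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * lm_confidence + (1.0 - mix) * reconstruction


def dual_step(
    speech: Sequence[float],
    recognize: Callable[[Sequence[float]], str],
    synthesize: Callable[[str], Sequence[float]],
    text_lm: Callable[[str], float],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    mix: float = 0.5,
) -> Tuple[str, float]:
    """One pass of the main task (A -> B') followed by the dual task (B' -> A')."""
    text = recognize(speech)                    # A -> B'
    confidence = text_lm(text)                  # how human-like is B'?
    speech_back = synthesize(text)              # B' -> A'
    recon = similarity(speech, speech_back)     # how close is A' to A?
    return text, combined_reward(confidence, recon, mix)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: Sequence[float], action: int, reward: float, lr: float = 0.1) -> List[float]:
    """REINFORCE for a softmax policy: logits += lr * reward * grad log p(action)."""
    probs = softmax(logits)
    # For a softmax policy, d log p(action) / d logit_i = 1[i == action] - p_i.
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


# Toy usage: "recognition" upper-cases a string, "synthesis" lower-cases it again.
text, reward = dual_step(
    "hi",
    recognize=lambda s: s.upper(),
    synthesize=lambda t: t.lower(),
    text_lm=lambda t: 1.0,
    similarity=lambda a, b: 1.0 if a == b else 0.0,
)
new_logits = reinforce_update([0.0, 0.0], action=0, reward=reward, lr=1.0)
```

A positive reward raises the probability of the sampled action and lowers the others, which is the sense in which the reward steers both models without paired labels.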

Description

Technical field

[0001] The present invention relates to the technical field of speech recognition and speech synthesis. In particular, it is a method for training speech recognition and speech synthesis models built with deep learning technology: the method exploits the nature of dual learning to train the models in an unsupervised manner, using a large amount of unlabeled data and reinforcement learning technology. The method can be applied to the fields of speech recognition and speech synthesis.

Background

[0002] Speech is the most basic and most effective way for people to communicate in daily life. As artificial intelligence technology matures, people also hope to communicate and exchange information with computers through direct dialogue, so speech recognition and speech synthesis have become a major topic in the field of natural language processing. The demand for various forms such as speech-to-text and text-to-speech synthesis is expan...


Application Information

IPC(8): G10L15/06, G10L13/08, G10L25/27
CPC: G10L15/063, G10L13/08, G10L25/27
Inventor: 杨华兴, 刘云浩
Owner: RUN TECH CO LTD