Unified Chinese-English mixed text generation and speech recognition end-to-end framework

A mixed-text generation and speech recognition technology, applied in speech recognition, speech analysis, instruments, and related fields, that addresses problems such as the mismatch between synthetic and real training data.

Active Publication Date: 2021-08-20
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Synthetic data can supply training data for a speech recognition model, but it does not match real data. How to use synthetic data to improve the performance of the recognition system remains a challenging problem.



Examples


Embodiment 1

[0086] As shown in Figure 1, the end-to-end framework for unified Chinese-English mixed text generation and speech recognition provided by the embodiment of the present application includes:

[0087] A Chinese-English mixed phoneme sequence generation module, a speech feature extraction module, an acoustic feature sequence convolutional downsampling module, an acoustic encoder, a phoneme embedding module, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator constitute a generative adversarial network: the phoneme encoder serves as the generator, the discriminator serves as the discriminator, and the acoustic encoder provides the real-data input. This generative adversarial network pushes the distribution of the phoneme encoded representation output by the phoneme encoder toward the acoustic encoded representation output by the acoustic c...
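The adversarial arrangement in [0087] can be sketched as an ordinary GAN objective in which acoustic encodings are labeled "real" and phoneme encodings are labeled "generated". This is only a minimal illustration under stated assumptions: the text above does not give the loss form, so the linear discriminator, the binary cross-entropy losses, and all names (`discriminator_score`, `discriminator_loss`, `generator_loss`) are hypothetical, and the encodings are toy 2-dimensional vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_score(vec, weights, bias):
    """Hypothetical linear discriminator: probability that `vec` is an acoustic encoding."""
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)

def discriminator_loss(acoustic_batch, phoneme_batch, weights, bias):
    """Binary cross-entropy: acoustic encodings are 'real' (1), phoneme encodings 'generated' (0)."""
    eps = 1e-12  # guards against log(0)
    real = [-math.log(discriminator_score(a, weights, bias) + eps) for a in acoustic_batch]
    fake = [-math.log(1.0 - discriminator_score(p, weights, bias) + eps) for p in phoneme_batch]
    return (sum(real) + sum(fake)) / (len(real) + len(fake))

def generator_loss(phoneme_batch, weights, bias):
    """Assumed non-saturating generator loss: low when phoneme encodings are scored as acoustic."""
    eps = 1e-12
    losses = [-math.log(discriminator_score(p, weights, bias) + eps) for p in phoneme_batch]
    return sum(losses) / len(losses)

# Toy data (assumed): phoneme encodings far from and near to the acoustic ones.
acoustic = [[1.0, 0.5], [0.8, 0.7]]
phoneme_far = [[-1.0, -0.5], [-0.8, -0.7]]
phoneme_near = [[0.9, 0.6], [0.7, 0.6]]
weights, bias = [1.0, 1.0], 0.0

loss_far = generator_loss(phoneme_far, weights, bias)
loss_near = generator_loss(phoneme_near, weights, bias)
```

In this toy setting the generator loss is lower for `phoneme_near` than for `phoneme_far`, which is the pressure that moves the phoneme encoder's outputs toward the acoustic encoder's distribution. In a real system both networks would be trained alternately with gradient updates, which this sketch omits.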



Abstract

The invention provides a unified end-to-end framework for Chinese-English mixed text generation and speech recognition. The framework comprises an acoustic encoder, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator form a generative adversarial network: the phoneme encoder serves as the generator, the discriminator serves as the discriminator, and the acoustic encoder provides the real-data input. The generative adversarial network pushes the distribution of the phoneme encoded representations output by the phoneme encoder toward the acoustic encoded representations output by the acoustic encoder. The decoder fuses the acoustic encoded representations and the phoneme encoded representations to obtain a decoding representation, then feeds the decoding representation into a softmax function to obtain the output target with the maximum probability.
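The final step of the abstract (fuse the two representations, apply softmax, take the most probable target) can be sketched as follows. This is a hedged illustration, not the patented decoder: the abstract does not say how fusion is performed, so the element-wise mean in `fuse` is a placeholder assumption, and `predict_token`, `output_weights`, and the three-entry `vocab` are hypothetical names and toy values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(acoustic_vec, phoneme_vec):
    """Placeholder fusion (assumption): element-wise mean of the two encodings."""
    return [(a + p) / 2.0 for a, p in zip(acoustic_vec, phoneme_vec)]

def predict_token(decoding_vec, output_weights, vocab):
    """Project the decoding representation to vocabulary logits and pick the argmax."""
    logits = [sum(w * x for w, x in zip(row, decoding_vec)) for row in output_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs

# Toy mixed Chinese-English vocabulary and projection matrix (both assumed).
vocab = ["你", "好", "hello"]
output_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

decoding_vec = fuse([0.2, 0.9], [0.4, 0.7])
token, probs = predict_token(decoding_vec, output_weights, vocab)
```

With these toy values the highest logit belongs to "hello", so that entry is returned as the output target. A real decoder would produce one such distribution per output step over a much larger mixed-language vocabulary.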

Description

Technical field

[0001] This application relates to the field of speech recognition, and in particular to an end-to-end framework that unifies Chinese-English mixed text generation and speech recognition.

Background

[0002] The Chinese-English mixing phenomenon (code-switching) refers to the use of both Chinese and English expressions while speaking. It mainly takes two forms: inter-sentential switching and intra-sentential switching. Of these, intra-sentential switching poses the greater challenge to speech recognition technology. The main problems are accents caused by speakers' non-standard pronunciation; a larger and more complex set of modeling units; coarticulation across languages; difficulties in data collection; and difficulties in data labeling. With the development of deep learning technology, monolingual speech recognition technology has been greatly improved. Especially for the end-to-end speech recognition model, its...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06, G10L15/02, G10L15/183, G10L15/26
CPC: G10L15/02, G10L15/063, G10L15/183, G10L15/26, G10L2015/025
Inventor: 陶建华, 张帅, 易江燕
Owner INST OF AUTOMATION CHINESE ACAD OF SCI