Unified Chinese-English mixed text generation and speech recognition end-to-end framework

A mixed text generation and speech recognition technology, applied in speech recognition, speech analysis, instruments, and related fields, that can solve problems such as data mismatch.

Active Publication Date: 2021-08-20

INST OF AUTOMATION CHINESE ACAD OF SCI

Cites: 11. Cited by: 10.

AI Technical Summary

Problems solved by technology

Although training data for the speech recognition model can be obtained through synthesis, the synthetic data does not match the real data. How to use the synthetic data to improve the performance of the recognition system remains a challenging problem.

Examples

Embodiment 1

[0086] As shown in Figure 1, the end-to-end framework for unified Chinese-English mixed text generation and speech recognition provided by the embodiment of the present application includes:

[0087] A Chinese-English mixed phoneme sequence generation module, a speech feature extraction module, an acoustic feature sequence convolutional downsampling module, an acoustic encoder, a phoneme embedding module, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator constitute a generative adversarial network: the phoneme encoder serves as the generator of the generative adversarial network, the discriminator serves as its discriminator, and the acoustic encoder serves as its real-data input. This generative adversarial network is used to push the distribution of the phoneme encoded representation output by the phoneme encoder close to the acoustic encoded representation output by the acoustic c...
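The adversarial relationship described in [0087] can be illustrated with a minimal sketch. It assumes the standard binary cross-entropy GAN losses and a toy linear discriminator; the excerpt does not state the patent's actual loss function or network shapes, so `discriminator`, `adversarial_losses`, and all parameters here are hypothetical names chosen for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(vec, weights, bias):
    """Toy linear discriminator (an assumption, not the patent's network).

    Returns the estimated probability that `vec` is an acoustic encoding.
    """
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)


def adversarial_losses(acoustic_enc, phoneme_enc, weights, bias, eps=1e-12):
    """Standard GAN cross-entropy losses (assumed, not taken from the source).

    acoustic_enc: list of vectors from the acoustic encoder ("real" data).
    phoneme_enc:  list of vectors from the phoneme encoder (generator output).
    """
    p_real = [discriminator(v, weights, bias) for v in acoustic_enc]
    p_fake = [discriminator(v, weights, bias) for v in phoneme_enc]
    # Discriminator: score acoustic encodings high, phoneme encodings low.
    d_loss = -(
        sum(math.log(p + eps) for p in p_real) / len(p_real)
        + sum(math.log(1.0 - p + eps) for p in p_fake) / len(p_fake)
    )
    # Generator (phoneme encoder): make phoneme encodings look acoustic.
    g_loss = -sum(math.log(p + eps) for p in p_fake) / len(p_fake)
    return d_loss, g_loss


# Usage with made-up two-dimensional encodings.
d_loss, g_loss = adversarial_losses(
    acoustic_enc=[[2.0, 0.0], [1.5, 0.5]],
    phoneme_enc=[[-1.0, 0.0], [0.5, 0.2]],
    weights=[1.0, 0.0],
    bias=0.0,
)
```

In this sketch the generator loss falls as the phoneme encodings move toward the region the discriminator treats as acoustic, which is the "push the distributions together" effect the paragraph describes.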

Abstract

The invention provides a universal, unified end-to-end framework for Chinese-English mixed text generation and speech recognition. The framework comprises an acoustic encoder, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator form a generative adversarial network, in which the phoneme encoder serves as the generator, the discriminator serves as the discriminator, and the acoustic encoder serves as the real-data input. The generative adversarial network is used to push the distribution of the phoneme encoded representations output by the phoneme encoder close to the acoustic encoded representations output by the acoustic encoder. The decoder fuses the acoustic encoded representations and the phoneme encoded representations to obtain a decoded representation, and feeds the decoded representation into a softmax function to obtain the output target with the maximum probability.
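The final decoding step in the abstract (fuse, apply softmax, pick the most probable target) can be sketched as follows. The abstract does not say how the two representations are fused, so the element-wise mean in `fuse` is a placeholder assumption, and `decode_step`, `proj`, and the toy vocabulary are hypothetical illustration names rather than anything defined in the patent.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(acoustic_vec, phoneme_vec):
    # ASSUMPTION: element-wise mean stands in for the unspecified fusion.
    return [(a + p) / 2.0 for a, p in zip(acoustic_vec, phoneme_vec)]


def decode_step(acoustic_vec, phoneme_vec, proj, vocab):
    """Fuse both encodings, project to vocabulary logits, take the argmax.

    proj: one weight row per vocabulary entry (hypothetical output layer).
    """
    fused = fuse(acoustic_vec, phoneme_vec)
    logits = [sum(w * x for w, x in zip(row, fused)) for row in proj]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs


# Usage with a made-up three-entry mixed vocabulary.
vocab = ["zh_ni", "zh_hao", "en_hello"]
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token, probs = decode_step([1.0, 0.0], [0.0, 3.0], proj, vocab)
```

A real decoder would attend over whole sequences rather than single vectors; the sketch only shows the fuse, softmax, and argmax ordering named in the abstract.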

Description

Technical field

[0001] This application relates to the field of speech recognition, and in particular to an end-to-end framework that unifies Chinese-English mixed text generation and speech recognition.

Background

[0002] Chinese-English mixing (code-switching) refers to the use of both Chinese and English expressions while speaking. It comes in two main types: inter-sentence switching and intra-sentence switching. Intra-sentence switching in particular poses great challenges to speech recognition technology. The main problems are accents caused by speakers' non-standard pronunciation; more numerous and more complex modeling units; coarticulation across languages; difficulty of data collection; and difficulty of data labeling. With the development of deep learning technology, monolingual speech recognition technology has improved greatly. Especially for the end-to-end speech recognition model, its...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/06, G10L15/02, G10L15/183, G10L15/26

CPC: G10L15/02, G10L15/063, G10L15/183, G10L15/26, G10L2015/025

Inventor: 陶建华, 张帅, 易江燕

Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
