Speech recognition model training method, speech recognition method and related devices

A speech recognition model training technology in the computer field that addresses problems such as the small amount of available data, poorly trained models and poor generalization.

Active Publication Date: 2020-12-25

BEIJING CENTURY TAL EDUCATION TECH CO LTD

Cites: 5; Cited by: 7


AI Technical Summary

Problems solved by technology

[0003] However, in some scenarios, speech in at least two languages occurs at the same time. To recognize audio that mixes multiple languages, a speech recognition algorithm can be used to construct a pronunciation dictionary. For example, for mixed Chinese-English audio, English pronunciations are first mapped to Chinese pronunciations, and the pronunciation vocabulary is then built from Chinese pronunciation phonemes. This method handles some mixed Chinese-English recognition cases, and its performance depends mainly on the size of the vocabulary that maps English words to Chinese pronunciations. However, building that mapping requires manual labeling, and many English words either have pronunciations similar to those of Chinese words or cannot be mapped to Chinese pronunciations at all, so the method generalizes poorly and struggles to produce good recognition results. Alternatively, a method based on a deep neural network model can be used, but training such a model requires a large amount of labeled data; because little data mixing multiple languages is available, the model cannot be trained well and recognition performance is also poor.
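To make the generalization problem of the dictionary approach concrete, here is a minimal Python sketch, assuming a tiny hand-labeled dictionary. The names `ENGLISH_TO_CHINESE_PHONEMES`, `CHINESE_PHONEMES`, `build_pronunciation_lexicon` and `coverage`, and every phoneme entry, are hypothetical illustrations and are not taken from the patent. The sketch only shows that any English word missing from the manually labeled mapping cannot be placed in the Chinese phoneme vocabulary.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-labeled entries (illustrative only): English words mapped
# to approximate pinyin-style Chinese phoneme sequences.
ENGLISH_TO_CHINESE_PHONEMES: Dict[str, List[str]] = {
    "hello": ["h", "a", "l", "ou"],
    "bye": ["b", "ai"],
}

# Hypothetical native Chinese lexicon entries (pinyin-style phonemes).
CHINESE_PHONEMES: Dict[str, List[str]] = {
    "你好": ["n", "i", "h", "ao"],
    "我们": ["w", "o", "m", "en"],
}


def build_pronunciation_lexicon(
    tokens: List[str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map each token of a mixed utterance to Chinese phonemes.

    Returns the lexicon entries that could be built and the tokens that
    could not be mapped because no manual English-to-Chinese entry exists.
    """
    lexicon: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for tok in tokens:
        if tok in CHINESE_PHONEMES:
            lexicon[tok] = CHINESE_PHONEMES[tok]
        elif tok.lower() in ENGLISH_TO_CHINESE_PHONEMES:
            lexicon[tok] = ENGLISH_TO_CHINESE_PHONEMES[tok.lower()]
        else:
            unmapped.append(tok)
    return lexicon, unmapped


def coverage(tokens: List[str]) -> float:
    """Fraction of tokens that the dictionary can map to Chinese phonemes."""
    if not tokens:
        return 1.0
    _, unmapped = build_pronunciation_lexicon(tokens)
    return 1.0 - len(unmapped) / len(tokens)
```

With the tokens `["你好", "hello", "我们", "algorithm"]`, only `algorithm` is left unmapped, so coverage is 0.75; raising coverage means labeling more words by hand, which is the cost described in [0003].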

Embodiment Construction

[0036] In the prior art, the accuracy of recognizing speech as text is relatively low.

[0037] To improve the accuracy of recognizing speech as text, an embodiment of the present invention provides a speech recognition model training method, including:

[0038] Determine the training current mixed-language audio and the training previous mixed-language audio from the training mixed-language audio set; obtain the training initial acoustic features of the training current mixed-language audio; use the first language module of the speech recognition model to be trained to obtain the training first time-series position acoustic features of the training initial acoustic features, and use the second language module of the speech recognition model to be trained to obtain the training second time-series position acoustic features of the training initial acoustic features, wherein the training current mixed-language audio and the training previous mixed-language audio include ...
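The step in [0038] runs two language modules over the same training initial acoustic features. The sketch below shows one way this could look, assuming each module is a per-frame linear projection followed by `tanh`; the patent excerpt does not specify the module architecture, and `LanguageModule` and `encode_both_languages` are hypothetical names. Under that assumption both branches emit one feature vector per input frame, so their outputs can be aligned position by position for a later fusion step.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


class LanguageModule:
    """Hypothetical stand-in for one language branch.

    Assumption: a per-frame linear projection followed by tanh. The real
    module architecture is not given in the source text.
    """

    def __init__(self, weights: Matrix, bias: List[float]) -> None:
        self.weights = weights  # shape: out_dim x in_dim
        self.bias = bias        # shape: out_dim

    def __call__(self, frames: Matrix) -> Matrix:
        # Emit one output vector per input frame so that time-series
        # positions are preserved.
        out: Matrix = []
        for frame in frames:
            if len(frame) != len(self.weights[0]):
                raise ValueError("frame dimension does not match module input")
            out.append([
                math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
                for row, b in zip(self.weights, self.bias)
            ])
        return out


def encode_both_languages(
    initial_features: Matrix, first: LanguageModule, second: LanguageModule
) -> Tuple[Matrix, Matrix]:
    """Run both language modules over the same initial acoustic features."""
    first_feats = first(initial_features)
    second_feats = second(initial_features)
    if not (len(first_feats) == len(second_feats) == len(initial_features)):
        raise RuntimeError("branches must cover the same time-series positions")
    return first_feats, second_feats
```

For example, three 2-dimensional frames passed through two 1-output modules yield two sequences of three 1-dimensional features each, one per frame.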


Abstract

The embodiments of the invention provide a speech recognition model training method, a speech recognition method and related devices. The training method comprises: determining a training current mixed-language audio and obtaining its training initial acoustic features; using a first language module to obtain training first time-series position acoustic features; using a second language module to obtain training second time-series position acoustic features; fusing and text-coding the training first and second time-series position acoustic features to obtain a training current fused text feature; obtaining a first training current predicted text feature from the training current fused text feature and a previous reference text feature; obtaining a first loss from the first training current predicted text feature and a current reference text feature; then obtaining the model loss; and adjusting the parameters of the speech recognition model according to the model loss until the trained speech recognition model is obtained. The speech recognition model training method, speech recognition method and related devices provided by the embodiments of the invention can improve speech recognition accuracy.
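The remaining training steps in the abstract (fusion, text coding, prediction from the fused text feature and the previous reference text feature, the first loss, and parameter adjustment) can be sketched in Python as below. Several choices are assumptions, not the patent's method: fusion is frame-wise concatenation, "text coding" is mean pooling, the prediction head is a single softmax layer, the first loss is cross-entropy, the model loss is taken to equal the first loss (the abstract does not say what else it includes), and only the prediction layer is updated by plain gradient descent. `fuse`, `text_encode`, `predict` and `train` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fuse(first: Matrix, second: Matrix) -> Matrix:
    """Assumed fusion: concatenate the two branches frame by frame."""
    if len(first) != len(second):
        raise ValueError("branches must share the same time-series positions")
    return [a + b for a, b in zip(first, second)]


def text_encode(fused: Matrix) -> Vector:
    """Assumed text coding: mean-pool the fused frames into one vector."""
    dim = len(fused[0])
    return [sum(frame[i] for frame in fused) / len(fused) for i in range(dim)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Matrix, fused_text: Vector, prev_text: Vector) -> Vector:
    """Token distribution from the fused text feature plus the previous
    reference text feature (joined by concatenation, an assumption)."""
    x = fused_text + prev_text
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(weights: Matrix, fused_text: Vector, prev_text: Vector, target: int,
          lr: float = 0.5, tol: float = 0.05, max_steps: int = 500) -> float:
    """Adjust the prediction layer in place until the model loss < tol.

    Returns the last evaluated model loss.
    """
    x = fused_text + prev_text
    loss = float("inf")
    for _ in range(max_steps):
        probs = predict(weights, fused_text, prev_text)
        first_loss = -math.log(probs[target])  # cross-entropy vs current reference
        loss = first_loss                      # assumed: model loss = first loss
        if loss < tol:
            break
        # Softmax + cross-entropy gradient w.r.t. logit k is p_k - 1[k == target].
        for k, row in enumerate(weights):
            g = probs[k] - (1.0 if k == target else 0.0)
            for j in range(len(row)):
                row[j] -= lr * g * x[j]
    return loss
```

In this toy setup a single training example drives the loss below the tolerance well within the `max_steps` budget; a full model would also backpropagate through both language modules rather than updating only the prediction layer.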

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computers, and in particular to a speech recognition model training method, a speech recognition method and related devices.

Background

[0002] With the development of computer technology and deep learning technology, speech recognition has become an important research direction and has been widely applied.


Application Information


IPC(8): G10L15/00; G10L15/02; G10L15/06; G10L19/04

CPC: G10L15/005; G10L15/02; G10L15/063; G10L19/04

Inventors: Li Chengfei (李成飞), Wang Sang (王桑), Yang Song (杨嵩)

Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD
