
Speech recognition model training method, speech recognition method and related devices

A speech recognition model training technology, applied in the computer field, that addresses problems such as the small amount of available data, poorly trained models, and poor generalization.

Active Publication Date: 2020-12-25
BEIJING CENTURY TAL EDUCATION TECH CO LTD
5 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, in some scenarios, at least two languages are present at the same time. To recognize speech audio that contains multiple languages, one approach is to build a pronunciation dictionary for the speech recognition algorithm. For example, for mixed Chinese-English audio, English pronunciations are first mapped to Chinese pronunciations, and the pronunciation vocabulary is built from Chinese phonemes. This approach handles some mixed Chinese-English cases, and its performance depends mainly on the size of the vocabulary that maps English words to Chinese pronunciations. However, building that mapping requires manual labeling, and many English words either sound similar to Chinese words or cannot be mapped to Chinese pronunciations at all, so the method generalizes poorly and rarely yields good recognition results. Alternatively, a deep neural network model can be used, but training it requires a large amount of labeled data; because little audio mixing multiple languages is available, the model cannot be trained well, and recognition performance is also poor.
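To make the limitation of the dictionary-mapping approach concrete, here is a toy sketch that is not taken from the patent: English words are stored as hand-written Chinese (pinyin-style) phoneme sequences and merged into a Chinese lexicon. All lexicon entries and function names are hypothetical. Words outside the manual mapping cannot be looked up at all, and an English entry whose phonemes match a Chinese entry becomes ambiguous.

```python
# Hypothetical toy lexicons; real systems use far larger, manually curated mappings.
ZH_LEXICON = {
    "你好": ["n", "i", "h", "ao"],
    "拜": ["b", "ai"],
}

# Hypothetical hand-made mapping of English words onto Chinese phonemes.
EN_TO_ZH_PHONEMES = {
    "ok": ["ou", "k", "ei"],
    "bye": ["b", "ai"],
}


def build_mixed_lexicon(zh_lexicon, en_to_zh):
    """Merge the Chinese lexicon with English words expressed in Chinese phonemes."""
    merged = dict(zh_lexicon)
    merged.update(en_to_zh)
    return merged


def lookup(tokens, lexicon):
    """Return phoneme sequences for known tokens plus the tokens that have no mapping."""
    phonemes, unmapped = [], []
    for tok in tokens:
        key = tok.lower()  # English entries are stored in lower case
        if key in lexicon:
            phonemes.append(lexicon[key])
        else:
            unmapped.append(tok)  # no manual mapping: cannot be recognized
    return phonemes, unmapped


def find_collisions(zh_lexicon, en_to_zh):
    """English words whose mapped phonemes exactly match a Chinese entry (ambiguous)."""
    zh_by_phones = {tuple(p): word for word, p in zh_lexicon.items()}
    return {en: zh_by_phones[tuple(p)] for en, p in en_to_zh.items()
            if tuple(p) in zh_by_phones}
```

With these toy entries, looking up `["你好", "OK", "Transformer"]` leaves `"Transformer"` unmapped, and `"bye"` collides with `"拜"`, mirroring the two failure modes described above.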




Embodiment Construction

[0036] In the prior art, the accuracy of recognizing speech and converting it into text is relatively low.

[0037] To improve speech recognition accuracy, an embodiment of the present invention provides a speech recognition model training method, including:

[0038] Determining a training current mixed-language audio and a training previous mixed-language audio from a training mixed-language audio set; obtaining the training initial acoustic features of the current mixed-language audio; using the first language module of the speech recognition model to be trained to obtain training first-time-series-position acoustic features from the training initial acoustic features; and using the second language module of the speech recognition model to be trained to obtain training second-time-series-position acoustic features from the training initial acoustic features, wherein the training current mixed-language audio and the training previous mixed-language audio include ...
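A minimal sketch of this feature-extraction step, assuming each language module is a per-frame linear projection followed by ReLU; the text available here does not specify the module architecture. `LanguageModule`, the feature dimensions, the seeds, and the toy input values are hypothetical. The point is only that both modules consume the same training initial acoustic features and each returns one feature vector per time-series position.

```python
import random
from typing import List

Frame = List[float]


class LanguageModule:
    """Hypothetical language-specific module: per-frame linear map + ReLU.

    The source does not describe the module internals; this placeholder
    keeps one output vector per time-series position (frame).
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int):
        rng = random.Random(seed)  # seeded so the sketch is reproducible
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, frames: List[Frame]) -> List[Frame]:
        outputs = []
        for x in frames:  # one initial acoustic feature vector per frame
            y = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(self.weights, self.bias)]
            outputs.append(y)
        return outputs


# Toy training initial acoustic features: 3 frames x 4 dims (made-up values).
initial_features = [[0.1 * (t + d) for d in range(4)] for t in range(3)]
first_module = LanguageModule(4, 8, seed=0)   # e.g. first language
second_module = LanguageModule(4, 8, seed=1)  # e.g. second language
first_position_feats = first_module(initial_features)
second_position_feats = second_module(initial_features)
```

Keeping the two modules separate but aligned frame by frame is what makes a later per-position fusion straightforward.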


Abstract

The embodiments of the invention provide a speech recognition model training method, a speech recognition method, and related devices. The training method comprises: determining a training current mixed-language audio and obtaining its training initial acoustic features; using a first language module to obtain training first-time-sequence-position acoustic features; using a second language module to obtain training second-time-sequence-position acoustic features; fusing and text-encoding the training first-time-sequence-position acoustic features and the training second-time-sequence-position acoustic features to obtain a training current fused text feature; obtaining a first training current predicted text feature from the training current fused text feature and a previous reference text feature; obtaining a first loss from the first training current predicted text feature and a current reference text feature; then obtaining the model loss; and adjusting the parameters of the speech recognition model according to the model loss until the trained speech recognition model is obtained. The speech recognition model training method, speech recognition method, and related devices provided by the embodiments of the invention can improve speech recognition accuracy.
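The training steps in the abstract can be sketched end to end. This is a minimal illustration under several assumptions that the abstract does not state: fusion is per-position concatenation, "text coding" is mean pooling, the predictor is a single linear softmax layer over a toy vocabulary fed with the fused text feature joined to the previous reference text feature, the first loss is cross-entropy against the current reference token index, the model loss equals the first loss alone, and "until trained" means until the loss falls below a tolerance. For brevity only the prediction layer is updated; a real model would also backpropagate into the two language modules. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def fuse(first: List[Vec], second: List[Vec]) -> List[Vec]:
    """Assumed fusion: concatenate the two modules' features at each time position."""
    return [a + b for a, b in zip(first, second)]


def text_encode(fused: List[Vec]) -> Vec:
    """Assumed 'text coding': mean-pool the fused frames into one fused text feature."""
    n = len(fused)
    return [sum(column) / n for column in zip(*fused)]


def softmax(z: Vec) -> Vec:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class Predictor:
    """Hypothetical single linear layer + softmax over a toy vocabulary."""

    def __init__(self, in_dim: int, vocab_size: int):
        self.W = [[0.0] * in_dim for _ in range(vocab_size)]
        self.b = [0.0] * vocab_size

    def forward(self, h: Vec) -> Vec:
        logits = [sum(w * x for w, x in zip(row, h)) + bias
                  for row, bias in zip(self.W, self.b)]
        return softmax(logits)  # predicted current text distribution

    def step(self, h: Vec, probs: Vec, target: int, lr: float) -> None:
        # For softmax + cross-entropy, dLoss/dlogit_k = probs[k] - one_hot(target)[k].
        for k, row in enumerate(self.W):
            g = probs[k] - (1.0 if k == target else 0.0)
            for j in range(len(row)):
                row[j] -= lr * g * h[j]
            self.b[k] -= lr * g


def train(first: List[Vec], second: List[Vec], prev_ref: Vec, target: int,
          vocab_size: int, lr: float = 0.5, tol: float = 0.05,
          max_steps: int = 500) -> Tuple[Predictor, float]:
    fused_text = text_encode(fuse(first, second))  # current fused text feature
    h = fused_text + prev_ref  # join with the previous reference text feature
    model = Predictor(len(h), vocab_size)
    model_loss = float("inf")
    for _ in range(max_steps):
        probs = model.forward(h)
        first_loss = -math.log(probs[target])  # vs. the current reference token
        model_loss = first_loss  # assumption: no other loss terms
        if model_loss < tol:  # treat "trained" as loss below a tolerance
            break
        model.step(h, probs, target, lr)
    return model, model_loss
```

For example, calling `train` on two toy frames per module with `target=2` and `vocab_size=3` drives the loss below the tolerance, after which the predictor assigns most probability to token 2. Conditioning on the previous reference text feature during training is a teacher-forcing style choice; at inference time the previously predicted text would take its place.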

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computers, and in particular to a speech recognition model training method, a speech recognition method, and related devices.

Background

[0002] With the development of computer technology and deep learning, speech recognition has become an important research direction and is widely used.


Application Information

IPC(8): G10L15/00, G10L15/02, G10L15/06, G10L19/04
CPC: G10L15/005, G10L15/02, G10L15/063, G10L19/04
Inventors: 李成飞 (Li Chengfei), 王桑 (Wang Sang), 杨嵩 (Yang Song)
Owner BEIJING CENTURY TAL EDUCATION TECH CO LTD