End-to-end model training method and device, computer equipment and storage medium

A model training technology, applied in speech analysis, speech recognition, instruments, etc., that can solve problems such as heavy dependence on training data, low robustness, and low flexibility.

Pending Publication Date: 2022-08-09
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] The purpose of the embodiments of the present application is to propose an end-to-end model training method, apparatus, computer equipment and storage medium applied to speech recognition, so as to solve the problem that the traditional CTC-Attention-based end-to-end model relies heavily on training data, resulting in low robustness and flexibility.


Examples

Embodiment 1

[0044] With continued reference to Figure 2, which shows an implementation flow chart of the end-to-end model training method applied to speech recognition provided by Embodiment 1 of the present application. For convenience of description, only the parts related to the present application are shown.

[0045] The above-mentioned end-to-end model training method applied to speech recognition includes step S201, step S202, step S203, step S204 and step S205.

[0046] In step S201, model training data is acquired, wherein the model training data includes an audio training set, and the audio training set includes training audio data and audio annotation text.
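As a minimal sketch of the data described in step S201, the audio training set can be modeled as a list of items that each pair training audio with its annotation text. The class and field names below (`AudioSample`, `ModelTrainingData`, `audio_path`, `annotation_text`) and the file names are illustrative assumptions, not names taken from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AudioSample:
    """One item of the audio training set (hypothetical layout)."""
    audio_path: str       # location of the training audio data
    annotation_text: str  # audio annotation text for that audio


@dataclass
class ModelTrainingData:
    """Model training data; the source states only that it includes an audio training set."""
    audio_training_set: List[AudioSample] = field(default_factory=list)

    def annotation_texts(self) -> List[str]:
        """Return the annotation text of every sample, in order."""
        return [sample.annotation_text for sample in self.audio_training_set]


# Example with two made-up utterances.
data = ModelTrainingData([
    AudioSample("utt_0001.wav", "hello world"),
    AudioSample("utt_0002.wav", "speech recognition"),
])
```

Keeping audio and annotation text in one record makes it harder for the two to drift out of alignment when the set is shuffled or filtered.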

[0047] In this embodiment of the present application, the end-to-end model applied to speech recognition may use the standard CTC-Attention model, as shown in Figure 3.

[0048] In the embodiment of the present application, the training audio data refers to the audio used for training the acoustic model in the end-to-en...

Embodiment 2

[0112] With further reference to Figure 9, as an implementation of the method shown in Figure 2, the present application provides an embodiment of an end-to-end model training apparatus applied to speech recognition. The apparatus embodiment corresponds to the method embodiment shown in Figure 2, and the apparatus can be applied to various electronic devices.

[0113] As shown in Figure 9, the end-to-end model training apparatus 200 applied to speech recognition in this embodiment includes a training data acquisition module 210, an audio recognition module 220, a text fusion module 230, a language translation module 240 and a joint training module 250, wherein:

[0114] A training data acquisition module 210, configured to acquire model training data, wherein the model training data includes an audio training set, and the audio training set includes training audio data and audio annotation text;

[0115] The audio recognition module 2...

Abstract

The embodiments of the invention belong to the technical field of speech recognition in artificial intelligence, and relate to an end-to-end model training method and apparatus applied to speech recognition, computer equipment and a storage medium. The output of the acoustic model is used as extended text for the audio training data, and the extended text and the audio annotation text are fed together into the language model to train the speech recognition model. This overcomes the limitation that the annotation text in a traditional speech training set is too narrow in content: the language model of the speech recognition model learns richer and more comprehensive information, which improves recognition accuracy. At the same time, the coupling between acoustic information and language information in the end-to-end model is reduced to a certain extent, which improves the recognition accuracy of the whole model across different scenarios. In particular, robustness when recognizing speech from different domains is improved, the sharp drop in accuracy when the application scenario changes is avoided, and the flexibility of the model in actual use and deployment is also improved.
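The data flow in the abstract can be sketched as follows: the acoustic model's output for each training audio item is kept as extended text and handed to the language model together with the annotation text. This is a sketch under stated assumptions, not the application's implementation. The function names, the toy recognizer and the choice to represent "together" as a simple pair are all hypothetical, since the excerpt does not say how the two texts are fused.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the acoustic model: maps an audio id to a text hypothesis.
AcousticModel = Callable[[str], str]


def build_language_model_inputs(
    samples: List[Tuple[str, str]],
    acoustic_model: AcousticModel,
) -> List[Tuple[str, str]]:
    """Pair each acoustic-model output (extended text) with its annotation text.

    `samples` holds (audio_id, annotation_text) pairs. The result holds
    (extended_text, annotation_text) pairs intended as language-model input.
    """
    inputs = []
    for audio_id, annotation_text in samples:
        extended_text = acoustic_model(audio_id)  # acoustic output reused as extra text
        inputs.append((extended_text, annotation_text))
    return inputs


def toy_acoustic_model(audio_id: str) -> str:
    """Toy recognizer for illustration: returns a fixed, imperfect hypothesis."""
    hypotheses = {"utt_0001": "hello word", "utt_0002": "speech wreck ignition"}
    return hypotheses[audio_id]


lm_inputs = build_language_model_inputs(
    [("utt_0001", "hello world"), ("utt_0002", "speech recognition")],
    toy_acoustic_model,
)
```

The toy hypotheses are deliberately imperfect to illustrate why the extended text carries information the annotation text alone does not: it exposes the language model to the kind of output the acoustic side actually produces.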

Description

Technical field

[0001] The present application relates to the technical field of speech processing in artificial intelligence, and in particular, to an end-to-end model training method, apparatus, computer equipment and storage medium applied to speech recognition.

Background technique

[0002] With the rapid development of the field of artificial intelligence, speech recognition has become an increasingly important application technology in the field of artificial intelligence. Speech recognition technology has developed from the earlier HMM-GMM model and acoustic model to the currently commonly used end-to-end model.

[0003] At present, the latest end-to-end model is an end-to-end model based on CTC-Attention. The end-to-end model based on CTC-Attention adopts multi-task learning to jointly optimize speech and text, which combines the advantages of both and achieves a great improvement over the traditional model.

[0004] However, the app...
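The excerpt does not state the objective used for the multi-task learning mentioned in [0003]. One common way to train a CTC branch and an attention branch jointly is a weighted sum of the two losses, and the sketch below assumes that form. The function name, the parameter `ctc_weight` and its default of 0.3 are illustrative assumptions, not values from the application.

```python
def joint_ctc_attention_loss(
    ctc_loss: float,
    attention_loss: float,
    ctc_weight: float = 0.3,  # assumed default, not taken from the source
) -> float:
    """Combine two task losses as ctc_weight * CTC + (1 - ctc_weight) * attention."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Equal weighting of a CTC loss of 2.0 and an attention loss of 1.0.
loss = joint_ctc_attention_loss(2.0, 1.0, ctc_weight=0.5)
```

With `ctc_weight` set to 0 or 1 the expression reduces to a single-task objective, so the weight controls how much each branch influences the shared encoder.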

Application Information

IPC(8): G10L15/06, G10L15/26
CPC: G10L15/063, G10L15/26
Inventor: 赵梦原, 王健宗, 张之勇
Owner: PING AN TECH (SHENZHEN) CO LTD