Optimization method and system for single-channel speech recognition model

A single-channel speech recognition model optimization technique, applied in speech recognition, speech analysis, and neural learning methods. It addresses the problems of complex models, cumbersome training, and poor performance, and achieves a simpler model with better performance.

Active Publication Date: 2021-06-22
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The invention aims to solve at least the following problems of the prior art: traditional models are relatively complex, the training process is cumbersome, the training effect is poor, and the resulting performance is poor.



Examples


Embodiment approach

[0062] In one implementation, determining the joint error from the knowledge distillation loss and the direct loss includes:

[0063] Computing a weighted sum of the knowledge distillation loss and the direct loss according to a preset training mode to determine the joint error.

[0064] To meet different recognition requirements, different training modes can be set for different usage environments during training. Each mode uses a different weighting ratio, so that speech recognition models meeting different needs are trained.

[0065] As this embodiment shows, setting different training modes lets the joint error of the knowledge distillation loss and the direct loss be determined with different weight ratios during training, to suit the requirements of different recognition environments. This improves the recognition performance of the speech recognition model.
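
The weighted sum in [0063] can be illustrated with a minimal Python sketch. The mode names, the weight values, and the choice of complementary weights (w and 1 - w) are assumptions made for illustration only; the text above states only that the two losses are weighted and summed according to a preset training mode.

```python
# Hypothetical training modes mapped to the weight placed on the
# knowledge distillation loss. Names and values are illustrative
# assumptions, not taken from the patent text.
TRAINING_MODES = {
    "distillation_heavy": 0.8,
    "balanced": 0.5,
    "label_heavy": 0.2,
}


def joint_error(kd_loss: float, direct_loss: float, mode: str = "balanced") -> float:
    """Weighted sum of the knowledge distillation loss and the direct loss.

    Assumes complementary weights: w for the distillation loss and
    (1 - w) for the direct loss.
    """
    w = TRAINING_MODES[mode]
    return w * kd_loss + (1.0 - w) * direct_loss


if __name__ == "__main__":
    for mode in TRAINING_MODES:
        print(mode, joint_error(kd_loss=2.0, direct_loss=4.0, mode=mode))
```

Under this sketch, switching the training mode changes only the weighting ratio, which matches the idea in [0064] of training models for different needs without changing the model structure.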

[0066] For further specific implementation...



Abstract

The embodiment of the present invention provides an optimization method for a single-channel speech recognition model. The method includes: receiving single-speaker speech, each utterance with a ground-truth label vector, and the corresponding multi-speaker mixed speech; inputting the speech features extracted from each single-speaker utterance to a target teacher model to obtain a target soft label vector for each single-speaker utterance; inputting the multi-speaker mixed speech to the end-to-end student model and determining the output arrangement; determining the knowledge distillation loss and the direct loss from the output label vector of each speaker in the mixed speech under the determined output arrangement; and determining a joint error from the knowledge distillation loss and the direct loss, according to which the end-to-end student model is optimized. The embodiment of the present invention also provides an optimization system for a single-channel speech recognition model. With this approach, good parameters are easier to learn and the model is relatively simple, and the better parameters give the trained student model better performance.
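
The loss computation described in the abstract can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text above: each speaker's output is reduced to a single probability vector rather than a sequence, both losses are cross-entropies, and the output arrangement is chosen as the permutation that minimizes the cross-entropy against the ground-truth labels. All function names are hypothetical.

```python
import itertools
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def best_arrangement(student_outputs, hard_labels):
    """Pick the assignment of student outputs to speakers.

    Assumption: the arrangement is the permutation with the lowest total
    cross-entropy against the ground-truth labels.
    perm[s] is the index of the student output assigned to speaker s.
    """
    n = len(hard_labels)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(
            cross_entropy(hard_labels[s], student_outputs[perm[s]])
            for s in range(n)
        ),
    )


def distillation_and_direct_loss(student_outputs, soft_labels, hard_labels):
    """Return (arrangement, knowledge distillation loss, direct loss)."""
    perm = best_arrangement(student_outputs, hard_labels)
    n = len(hard_labels)
    # Distillation loss: student outputs vs. the teacher's soft labels.
    kd = sum(cross_entropy(soft_labels[s], student_outputs[perm[s]]) for s in range(n))
    # Direct loss: student outputs vs. the ground-truth labels.
    direct = sum(cross_entropy(hard_labels[s], student_outputs[perm[s]]) for s in range(n))
    return perm, kd, direct


if __name__ == "__main__":
    # Two speakers, three label classes; all values are made up.
    student = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]  # student outputs on the mixture
    soft = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]     # teacher soft labels per speaker
    hard = [[1, 0, 0], [0, 1, 0]]                 # ground-truth labels per speaker
    print(distillation_and_direct_loss(student, soft, hard))
```

In this toy input the student emits the two speakers in swapped order, so the selected arrangement assigns output 1 to speaker 0 and output 0 to speaker 1 before either loss is computed.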

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to an optimization method and system for a single-channel speech recognition model.

Background technique

[0002] With the development of intelligent voice technology, more and more devices offer speech recognition. Depending on their usage scenarios, some devices are equipped with only a single microphone and others with multiple microphones; these are the so-called single-channel and multi-channel devices. Because it has only one microphone, a single-channel device recognizes speech poorly in conversations such as banquets, where multiple people speak at the same time and their voices are mixed together. For this purpose, the knowledge distillation method of single-channel multi-speaker speech recognition based on a bidirectional long short-term memory recurrent neural network (BLSTM-RNN), or an end-to-end sing...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/06, G10L15/16, G06N3/08, G06N3/02
CPC: G06N3/02, G06N3/08, G10L15/063, G10L15/16
Inventors: 钱彦旻, 张王优, 常煊恺
Owner: AISPEECH CO LTD