
Decoding method and device supporting domain customized language model

A language-model and decoding-method technology in the field of computer technology. It addresses problems such as weakened speech-recognition decoding performance, poor robustness in new domains, and poor support for rapid customization, and achieves improved performance and scalability, better user experience, and a wider scope of application.

Pending Publication Date: 2021-06-22
Applicant: 苏州协同创新智能制造科技有限公司

AI Technical Summary

Problems solved by technology

However, the main problem is that convenient domain customization and domain expansion are difficult: decoding is limited to the speech domains already covered when the neural-network language model was trained.
For speech domains not included in training, decoding performance is weakened; that is, speech recognition is not robust in new or customized domains.
For example, when a language model trained on everyday-life corpora is used in a professional engineering scenario, the recognition rate is poor.
For an existing decoding method to improve language-model performance in a new domain, speech data in that domain must be collected and the end-to-end neural-network language model retrained from scratch, so recognition content that requires rapid customization is poorly supported, and current decoding methods cannot meet practical speech recognition requirements.
It follows that the existing technology has limitations that hinder the further optimization and promotion of speech recognition technology.



Examples


Embodiment 1

[0046] In Embodiment 1 of the present invention, an example of the end-to-end speech recognition encoder and decoder is shown in Figure 1; it uses an RNNT architecture based on convolution and Transformer. In this embodiment, the decoding method supporting domain-customized language models is based on beam search decoding and, as shown in Figure 2, includes: Step S1: generate the first decoding network as the first-pass decoding search network, obtaining a first score set; generate the second decoding network from the language model of the customized domain, serving as the language-model score query network, obtaining a second score set. Step S2: update the beam while re-scoring with the customized-domain language model, iteratively performing the following operations: take the sub-matrix of the encoder expansion matrix corresponding to the current speech frame together with the decoder output matrix, and input the concatenated matrix into the o...
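
The following Python sketch illustrates one such beam-update iteration. It is only an illustration: the helper names (joint_net, custom_lm, Hyp), the top-k expansion, and the weighted combination of the two scores via lm_weight are assumptions, not details taken from the patent text.

```python
# Hypothetical sketch: one beam-update step that combines the first-pass
# (search-network) score with the customized-domain language-model score.
from dataclasses import dataclass
import numpy as np

@dataclass
class Hyp:
    tokens: list         # decoded token ids so far
    first: float = 0.0   # accumulated first score (first decoding network)
    second: float = 0.0  # accumulated second score (customized-domain LM network)

def beam_step(enc_expand, dec_out, t, joint_net, custom_lm, beams, lm_weight=0.3, topk=4):
    """enc_expand: [B, num_frames, enc_dim]; dec_out: [B, dec_dim]; beams: list of Hyp."""
    enc_t = enc_expand[:, t, :]                           # sub-matrix for the current speech frame
    joint_in = np.concatenate([enc_t, dec_out], axis=-1)  # concatenated (spliced) matrix
    log_probs = joint_net(joint_in)                       # [B, vocab]: per-token first scores

    candidates = []
    for b, hyp in enumerate(beams):
        for tok in np.argsort(log_probs[b])[-topk:]:      # expand the top-k tokens of this beam
            first = hyp.first + float(log_probs[b, tok])
            second = custom_lm(hyp.tokens + [int(tok)])   # second score queried from the custom-domain LM
            candidates.append(Hyp(hyp.tokens + [int(tok)], first, second))
    # re-score with the customized-domain LM and keep the best B hypotheses
    candidates.sort(key=lambda h: h.first + lm_weight * h.second, reverse=True)
    return candidates[:len(beams)]
```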

Embodiment 2

[0049] Embodiment 2 differs from Embodiment 1 in that, before step S2, the speech signal is initialized to obtain the encoder expansion matrix and the decoder output matrix. As shown in Figure 4, in this embodiment the features of the speech signal from the input device are first extracted by the aforementioned extraction module; the speech sequence has num_frames frames and each speech frame has dimension dim_feat, yielding a speech feature matrix with these as its elements. This speech matrix is input into the encoder of the end-to-end speech recognition system to obtain a transformed matrix of shape [num_frames, enc_dim], which is expanded into an encoder expansion matrix of shape [B, num_frames, enc_dim]. A decoded set of B words is initialized, where B represents the size of the entire beam; each character is the sentence-start symbol, each represented by a vector of corresponding size, which is input to the de...
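
The following Python sketch illustrates this initialization. The encoder and decoder callables, the sentence-start id sos_id, and the use of NumPy are assumptions for illustration; only the matrix shapes ([num_frames, enc_dim] and [B, num_frames, enc_dim]) come from the text above.

```python
import numpy as np

def initialize(features, encoder, decoder, B, sos_id=0):
    """features: [num_frames, dim_feat] speech feature matrix extracted from the input signal."""
    enc_out = encoder(features)                              # transformed matrix: [num_frames, enc_dim]
    enc_expand = np.repeat(enc_out[None, :, :], B, axis=0)   # encoder expansion matrix: [B, num_frames, enc_dim]

    beams = [[sos_id] for _ in range(B)]                     # B hypotheses, each starting with the sentence-start symbol
    dec_out = np.stack([decoder(h) for h in beams])          # decoder output matrix, e.g. [B, dec_dim]
    return enc_expand, dec_out, beams
```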

Embodiment 3

[0064] Embodiment 3 builds on Embodiment 2. In the beam expansion process, as shown in Figure 5, it is first judged whether the corresponding idx_i is the null character, and the beam is expanded according to the result. If idx_i corresponds to the null character, then t_idx_i = t_idx_i + 1, i.e., the corresponding speech frame advances to the next position; the Cartesian product pair set corresponding to the current beam is used directly as the updated Cartesian product pair set, and the second score of the current beam is used directly as the updated second score; the second process is then carried out directly. If idx_i does not correspond to the null character, the first process is performed first.
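
The following Python sketch illustrates the null-character branch. The beam fields (t_idx, cartesian_pairs, second_score), the blank_id value, and the first_process callable are placeholders for illustration; the first process itself is only partially described in paragraph [0065] below.

```python
def expand_beam(beam, idx_i, first_process, blank_id=0):
    """beam: dict with 't_idx', 'cartesian_pairs', 'second_score'; blank_id is an assumed id."""
    if idx_i == blank_id:
        # Null character: advance the corresponding speech frame to the next
        # position and reuse the current Cartesian product pair set and second
        # score unchanged, then go directly to the second process.
        beam["t_idx"] += 1
        return beam["cartesian_pairs"], beam["second_score"]
    # Non-null character: perform the first process before the second process.
    return first_process(beam, idx_i)
```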

[0065] The first process is described in detail below. Its first step is to find all Cartesian product pairs {LState_1, LState_2, ..., LState_m} from the state Cartesian_{b_i} corresponding to the current beam b_i, and a collection of corresponding...



Abstract

The invention provides a decoding method supporting a domain-customized language model. Based on beam search decoding, it comprises the following steps: generating a first decoding network as the first-pass decoding search network; generating a second decoding network from the language model of the customized domain; expanding the beam while re-scoring with the customized-domain language model; and, when enough decoding hypotheses exist in the final decoding hypothesis set, outputting all decoding hypotheses. By combining the decoding score of the customized-domain language model with the beam expansion process, the method allows an end-to-end language model to be quickly applied to a new domain, improves the performance of the speech recognition system, and optimizes robustness and recognition efficiency. The decoding device provided by the invention implements the decoding method provided by the invention and therefore has the corresponding advantages; it requires only limited resource configuration and is conducive to the further popularization and application of speech recognition systems.

Description

Technical field

[0001] The invention belongs to the field of computer technology and in particular relates to a decoding method supporting domain-customized language models in end-to-end speech recognition technology, and to a corresponding decoding device.

Background technique

[0002] The information disclosed in this Background section is only for enhancement of understanding of the background of the disclosure and may therefore contain information that does not form prior art already known to a person of ordinary skill in the art. The content of the background section reflects only technology known to the public and does not necessarily represent the prior art in the field.

[0003] Existing end-to-end speech recognition systems mainly include three types of systems, based on Connectionist Temporal Classification (CTC), sequence-to-sequence models (Seq2Seq), or the RNN Transducer (RNNT, recurrent neural network transducer). The RNNT-based end-to-end sp...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/16; G10L15/30
CPC: G10L15/16; G10L15/30; Y02D10/00
Inventor: 谢东平
Owner: 苏州协同创新智能制造科技有限公司