Recurrent neural network language model training method and device, equipment and medium

A language model and neural network technology, applied in the field of artificial intelligence, which solves problems such as large model size hindering application.

Active Publication Date: 2018-12-07
MOBVOI INFORMATION TECH CO LTD


Problems solved by technology

[0004] However, in order to pursue better language expression ability, large RNNLM models are often required, and it is precisely this large model size that hinders their application.



Examples


Embodiment 1

[0026] Figure 1 is a flow chart of the recurrent neural network language model training method provided in Embodiment 1 of the present invention. This embodiment is applicable to training a recurrent neural network language model used for language text recognition. The method can be performed by a recurrent neural network language model training device, and specifically includes the following steps:

[0027] S110. Input the language text in the corpus into the trained high-rank recurrent neural network language model (RNNLM) and the lightweight RNNLM to be trained, respectively.

[0028] In this embodiment, the corpus includes the Penn Treebank (PTB) corpus and/or the Wall Street Journal (WSJ) corpus. The PTB corpus contains 24 parts in total, its vocabulary size is limited to 10,000, and a dedicated label indicates out-of-vocabulary words. Part or all of the PTB corpus is selected as the training set, and the language text in the training set is input into the trained high-rank RNNLM and the lightweight RNNLM to be trained.
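
For concreteness, a minimal PyTorch sketch of step S110 follows. The LSTM architecture, layer sizes, and variable names are illustrative assumptions; the patent does not specify the exact model structure.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """A small LSTM-based RNNLM; the architecture is assumed, not from the patent."""
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)                  # logits: (batch, seq_len, vocab)

VOCAB_SIZE = 10000   # PTB vocabulary limit; OOV words share one dedicated label

# Teacher: the trained high-rank RNNLM; student: the lightweight RNNLM.
# The sizes below are illustrative, not taken from the patent.
teacher = RNNLM(VOCAB_SIZE, embed_dim=650, hidden_dim=1500)
student = RNNLM(VOCAB_SIZE, embed_dim=200, hidden_dim=200)
teacher.eval()                                   # teacher parameters stay fixed

batch = torch.randint(0, VOCAB_SIZE, (32, 35))   # stand-in batch of token ids
with torch.no_grad():
    teacher_logits = teacher(batch)              # soft targets for distillation
student_logits = student(batch)
```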

Embodiment 2

[0042] During model training, it was found that the training process of the student model still has two defects. First, in a language model, each training data label vector represents a degenerate data distribution: it gives the likelihood of the corresponding language text on a single category. Compared with the probability distribution produced by the teacher model over all training data, that is, the probability that the corresponding language text falls on each of the labels, this degenerate distribution is noisier and more localized. Second, unlike previous experimental results of knowledge distillation in acoustic modeling and image recognition, the experiments on language text recognition in this embodiment found that when the cross-entropy loss and the KL divergence have fixed weights, minimizing their weighted sum yields a student model inferior to the one obtained by minimizing the KL divergence alone.
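
As a concrete reference, the weighted objective discussed above can be sketched in PyTorch as follows. The function name `distillation_loss` and the weighting parameter `alpha` are illustrative, and the KL direction follows the wording of the claims (student output relative to teacher output):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Weighted sum of cross-entropy loss and Kullback-Leibler divergence."""
    vocab = student_logits.size(-1)
    # Cross-entropy of the student output against the one-hot training labels
    # (the "degenerate" data distribution discussed above).
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1))
    # KL divergence of the student output relative to the teacher output,
    # written out explicitly to keep that direction unambiguous.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()
    # The embodiment reports that a *fixed* alpha underperforms plain KL
    # minimization, so in practice alpha would be scheduled, not held constant.
    return alpha * ce + (1.0 - alpha) * kl
```

Here `labels` would be the next-token ids of the language text, i.e. the degenerate one-hot targets the first defect refers to.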

Embodiment 3

[0098] Figure 3 is a schematic structural diagram of the recurrent neural network language model training device provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes an input module 31 and a minimization module 32.

[0099] The input module 31 is used to input the language text in the corpus into the trained high-rank recurrent neural network language model RNNLM and the lightweight RNNLM to be trained, respectively;

[0100] The minimization module 32 is used to iterate the parameters of the lightweight RNNLM, minimizing the weighted sum of the cross-entropy loss and the Kullback-Leibler divergence, to complete the training of the lightweight RNNLM;

[0101] Among them, the cross-entropy loss is the cross-entropy loss of the output vector of the lightweight RNNLM relative to the training data label vector of the language text, and the Kullback-Leibler divergence is the Kullback-Leibler divergence of the output vector of the lightweight RNNLM relative to the output vector of the high-rank RNNLM.
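
A structural sketch of the two modules follows, reusing the `RNNLM` models and the `distillation_loss` helper from the earlier sketches. The class and method names are assumptions for illustration, not the patent's literal interface:

```python
import torch

class InputModule:
    """Module 31: feeds corpus text into the high-rank and lightweight RNNLMs."""
    def __init__(self, teacher, student):
        self.teacher, self.student = teacher, student

    def __call__(self, batch):
        with torch.no_grad():                     # the teacher is already trained
            teacher_logits = self.teacher(batch)
        return self.student(batch), teacher_logits

class MinimizationModule:
    """Module 32: iterates the lightweight RNNLM's parameters to minimize
    the weighted sum of cross-entropy loss and KL divergence."""
    def __init__(self, student, alpha=0.5, lr=0.1):
        self.optimizer = torch.optim.SGD(student.parameters(), lr=lr)
        self.alpha = alpha

    def step(self, student_logits, teacher_logits, labels):
        loss = distillation_loss(student_logits, teacher_logits, labels, self.alpha)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```

In use, the device would call the input module on each batch and pass its two outputs, together with the next-token labels, to the minimization module's `step`.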



Abstract

The embodiment of the invention discloses a recurrent neural network language model (RNNLM) training method and device, equipment and a medium. The method includes: inputting language texts in the corpus into a trained high-rank RNNLM and a to-be-trained lightweight RNNLM respectively; and iterating the parameters of the lightweight RNNLM while minimizing a weighted sum of a cross-entropy loss and a Kullback-Leibler divergence to complete training of the lightweight RNNLM. The cross-entropy loss is that of the output vector of the lightweight RNNLM relative to a training data label vector; the Kullback-Leibler divergence is that of the output vector of the lightweight RNNLM relative to the output vector of the high-rank RNNLM. The RNNLM scale is thereby reduced effectively.

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of artificial intelligence, and in particular to a recurrent neural network language model training method, device, equipment and medium.

Background Technique

[0002] The Recurrent Neural Network (RNN) has large storage capacity and strong computing power, which gives it great advantages over traditional language modeling methods, and it is now widely used in language modeling.

[0003] The Recurrent Neural Network Language Model (RNNLM) is a model proposed by Mikolov in 2010; by using an RNN to train the language model, better expressiveness can be obtained. An RNNLM represents each word in a continuous, low-dimensional space and can represent historical information of various lengths through a recurrent vector.

[0004] However, in order to pursue better language expressiveness, large RNNLM models are often required, and it is precisely this large model size that hinders their application.


Application Information

IPC (8): G06N3/04
CPC: G06N3/045
Inventors: 施阳阳 (Shi Yangyang), 黄美玉 (Huang Meiyu), 雷欣 (Lei Xin)
Owner: MOBVOI INFORMATION TECH CO LTD