Word vector generation method and related equipment

A word vector generation technology, applied in the field of natural language processing, that addresses the problem of existing methods not considering the importance of different Chinese characters within a word, and achieves the effect of improving the expressive ability of word vectors.
CN111199153A · Active · Publication Date: 2020-05-26 · BEIJING GRIDSUM TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING GRIDSUM TECH CO LTD
Publication Date: 2020-05-26

Abstract

The embodiment of the invention provides a word vector generation method and related equipment. Through an attention mechanism, the method takes into account the importance of each character within a word when converting character vectors into a word vector: the weighted average of the character vectors is combined with the original word vector to obtain the final word vector, which can effectively improve the expressive capacity of word vectors. The method comprises the steps of: obtaining a target word, the target word being a word for which a word vector is to be generated; determining, through a training model, an initial word vector of the target word, a character vector of each character in the target word, and a global variable; determining the weight of each character in the target word from the global variable and the character vector of each character; and determining a target word vector of the target word according to the character vector of each character, the weight of each character in the target word, and the initial word vector of the target word.
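
The steps in the abstract can be illustrated with a small sketch. The abstract does not state how the weights are computed from the global variable or how the weighted average is combined with the initial word vector, so the choices below are assumptions made only for illustration: the score of each character is the dot product of the global variable with its character vector, the scores are normalized with a softmax, and the combination is an element-wise sum. The function name `target_word_vector` and all example values are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def target_word_vector(initial_word_vec, char_vecs, global_var):
    """Combine an initial word vector with attention-weighted character vectors.

    Assumptions (not taken from the patent text):
      * score = dot(global_var, char_vec), normalized with softmax
      * combination = element-wise sum of the two vectors
    """
    if not char_vecs:
        raise ValueError("the target word must contain at least one character")

    # Weight of each character, derived from the global variable.
    weights = softmax([dot(global_var, c) for c in char_vecs])

    # Weighted average of the character vectors (the weights sum to 1).
    dim = len(initial_word_vec)
    weighted = [
        sum(w * c[i] for w, c in zip(weights, char_vecs)) for i in range(dim)
    ]

    # Combine with the initial word vector (assumed: addition).
    target = [iw + wc for iw, wc in zip(initial_word_vec, weighted)]
    return weights, target


# Hypothetical two-character word with 3-dimensional vectors.
weights, vector = target_word_vector(
    initial_word_vec=[0.2, -0.1, 0.4],
    char_vecs=[[0.5, 0.1, 0.0], [0.0, 0.3, 0.2]],
    global_var=[1.0, 0.0, 0.0],
)
print(weights, vector)
```

In this sketch the global variable plays the role of a learned attention query: characters whose vectors align with it receive larger weights. In the method described, all three inputs come from the training model rather than being set by hand.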

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method for generating word vectors and related equipment.

Background art

[0002] Text is a carrier of information and plays an important role in the development of society. For computers to handle natural language problems, discrete text must first be converted into a numerical form. The simplest way is to use One-hot Representation to convert each word into a vector of dimension |V|, where |V| is the size of the vocabulary: the position corresponding to the word's index is 1, and all other positions are 0. In 2003, Yoshua Bengio et al. first applied neural networks to language models and proposed using Distributed Representation instead of the traditional One-hot Representation to represent word vectors, making word vectors not only computable but also meaningful. In 2013, Mikolov et al. proposed the Continuous Bag of Wo...
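
The One-hot Representation described in paragraph [0002] can be shown with a minimal sketch. The vocabulary and the helper name `one_hot` are hypothetical and chosen only for illustration.

```python
def one_hot(word, vocabulary):
    """Return a |V|-dimensional vector with a 1 at the word's index."""
    index = {w: i for i, w in enumerate(vocabulary)}
    if word not in index:
        raise KeyError(f"{word!r} is not in the vocabulary")
    vec = [0] * len(vocabulary)
    vec[index[word]] = 1
    return vec


# Hypothetical three-word vocabulary, so |V| = 3.
vocabulary = ["natural", "language", "processing"]
print(one_hot("language", vocabulary))  # [0, 1, 0]
```

Any two distinct one-hot vectors are orthogonal, so this encoding carries no information about similarity between words, which is the limitation that distributed representations address.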
