A word vector training method and system

A word vector training method and system, applied in the field of word vector training. It addresses the low accuracy of existing word vector training and achieves the effect of improving training accuracy.
CN105930318B · Inactive

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Estimated Expiration
Not applicable · inactive patent


Abstract

The invention belongs to the technical field of computers and provides a word vector training method and system. The method includes: performing word vector training on each training target word in a training sample document while acquiring the window words of each training target word within a context window in the training sample document; predicting the occurrence probability of each window word using a Skip-gram model; updating the word vector corresponding to each window word in a word vector library and the intermediate vector corresponding to each non-leaf node on the code path of each training target word in a Huffman tree; updating the whole document vectors of the training sample document through a preset formula; calculating the increasing local input vectors of a CBOW model, and then calculating the mixed stitching vectors of the CBOW model; setting the mixed stitching vectors as the input of the projection layer of the CBOW model; predicting the occurrence probability of each training target word; and finally updating the word vector of each training target word and the intermediate vector corresponding to each non-leaf node in the Huffman tree. The method and the system improve the accuracy of the word vector of each training target word.
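The Huffman-tree steps in the abstract resemble the hierarchical softmax used in the publicly known word2vec models. The following is a minimal sketch under that assumption, not the patent's own procedure: the function names, the learning rate, and the convention that code bit 0 maps to sigmoid(x · theta) are illustrative choices, and the patent's preset formula for document vectors and its mixed stitching vectors are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def path_probability(input_vec, node_vecs, code):
    """Probability of a word as a product of binary decisions along its
    Huffman code path. node_vecs holds the intermediate vector of each
    non-leaf node on the path; code holds the matching 0/1 bits.
    Assumed convention: bit 0 -> sigmoid(x . theta), bit 1 -> 1 - sigmoid."""
    p = 1.0
    for theta, bit in zip(node_vecs, code):
        s = sigmoid(dot(input_vec, theta))
        p *= s if bit == 0 else 1.0 - s
    return p


def hs_update(input_vec, node_vecs, code, lr=0.025):
    """One gradient-ascent step on log path_probability.
    Updates the node vectors in place and returns the updated input vector."""
    grad_input = [0.0] * len(input_vec)
    for theta, bit in zip(node_vecs, code):
        s = sigmoid(dot(input_vec, theta))
        g = lr * (1 - bit - s)  # scaled gradient of the log-likelihood
        for i in range(len(input_vec)):
            grad_input[i] += g * theta[i]  # accumulate using the old theta
            theta[i] += g * input_vec[i]   # then update the node vector
    return [x + e for x, e in zip(input_vec, grad_input)]


# Hypothetical toy data: a 3-dimensional input vector and a two-node path.
x = [0.1, -0.2, 0.3]
nodes = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
code = [0, 1]

p_before = path_probability(x, nodes, code)
x_new = hs_update(x, nodes, code)
p_after = path_probability(x_new, nodes, code)
```

In this sketch the same routine would serve both stages the abstract names: for Skip-gram the input vector would be the training target word's vector and the path would belong to a window word, while for CBOW the input would be the projection-layer vector and the path would belong to the training target word.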

Description

technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a word vector training method and system.

Background technique

[0002] In recent years, word vectors have become a very popular tool in the field of natural language processing. Traditional text processing methods generally use words as basic features and represent each word as a binary-coded word vector. Word vectors using this representation are prone to feature sparsity, and because any two words are treated as independent of each other, the semantic and lexical associations implied between words cannot be correctly captured. Distributed word vectors came into being to solve this problem. They represent each word as a dense, low-dimensional real-valued vector, and each dimension represents a characteristic attribute of the word. Simple cosine calculations between word vectors can be used to mine out various differences between wo...
