Word vector training method and system

A word vector training method and system, applied in the field of word vector training, that addresses the low accuracy of word vector training and improves training accuracy.

Status: Inactive. Publication Date: 2016-09-07
SHENZHEN UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a word vector training method and system, aiming to solve the problem that th



Examples


Embodiment 1

[0029] Figure 1 shows the implementation process of the word vector training method provided by the first embodiment of the present invention. For convenience of explanation, only the parts related to this embodiment are shown. The details are as follows:

[0030] In step S101, a dictionary containing the training target words is constructed in advance, and the word vector of each training target word and the intermediate vectors corresponding to all non-leaf nodes of the Huffman tree are initialized. The word vectors of the training target words form a word vector library.
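As a rough illustration of step S101, the sketch below builds a Huffman tree over word frequencies and initializes one vector per word and one intermediate vector per non-leaf node. The function names `build_huffman_codes` and `init_vectors`, the node numbering, and the initialization ranges are assumptions for illustration. The ranges follow the convention of the public word2vec reference code, not anything stated in this text.

```python
import heapq
import random


def build_huffman_codes(freqs):
    """Return {word: (code, path)} for a Huffman tree built from word counts.

    code is the list of 0/1 branch choices from the root down to the word;
    path is the list of non-leaf node ids at which those choices are made.
    Leaves get ids 0..V-1 (sorted word order); non-leaf nodes get V..2V-2.
    """
    words = sorted(freqs)
    heap = [(freqs[w], i) for i, w in enumerate(words)]
    heapq.heapify(heap)
    parent, bit = {}, {}
    next_id = len(words)
    while len(heap) > 1:
        # Merge the two least frequent nodes under a new non-leaf node.
        f1, n1 = heapq.heappop(heap)
        f2, n2 = heapq.heappop(heap)
        parent[n1], bit[n1] = next_id, 0
        parent[n2], bit[n2] = next_id, 1
        heapq.heappush(heap, (f1 + f2, next_id))
        next_id += 1
    codes = {}
    for i, w in enumerate(words):
        code, path, node = [], [], i
        while node in parent:  # walk leaf -> root, then reverse
            code.append(bit[node])
            node = parent[node]
            path.append(node)
        codes[w] = (code[::-1], path[::-1])
    return codes


def init_vectors(freqs, dim, seed=0):
    """Initialize the word vector library and the non-leaf (intermediate) vectors.

    Assumption: small uniform random word vectors and all-zero intermediate
    vectors, as in the word2vec reference code.
    """
    rng = random.Random(seed)
    vocab_size = len(freqs)
    word_vecs = {w: [(rng.random() - 0.5) / dim for _ in range(dim)] for w in freqs}
    inner_vecs = {vocab_size + k: [0.0] * dim for k in range(max(vocab_size - 1, 0))}
    return word_vecs, inner_vecs
```

With this numbering, frequent words end up close to the root, so their code paths touch fewer intermediate vectors during training.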

[0031] In the embodiment of the present invention, a dictionary containing the training target words can be constructed in advance. Specifically, text related to a certain type or subject can be segmented into words, after which stop words and very high-frequency and low-frequency words are removed to construct the corresponding dictionary. Preferably, the ICTCLAS2015 word segmentation system of the Chine...
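The dictionary construction described here can be sketched as follows. The sketch assumes the text has already been segmented into tokens (for example by a segmenter such as the one named above). The function name `build_dictionary` and the thresholds `min_count` and `max_count` are hypothetical, since the text does not state how high-frequency and low-frequency words are defined.

```python
from collections import Counter


def build_dictionary(token_docs, stop_words, min_count=2, max_count=1000):
    """Count tokens across segmented documents and keep mid-frequency words.

    token_docs: iterable of token lists (segmentation is assumed done upstream).
    stop_words: set of tokens to drop before counting.
    min_count / max_count: hypothetical cut-offs for low- and high-frequency words.
    """
    counts = Counter(
        tok for doc in token_docs for tok in doc if tok not in stop_words
    )
    return {w: c for w, c in counts.items() if min_count <= c <= max_count}
```

The returned counts can also serve as the frequencies from which the Huffman tree is built.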

Embodiment 2

[0048] Figure 3 shows the structure of the word vector training system provided by Embodiment 2 of the present invention. For convenience of description, only the parts related to this embodiment are shown, including:

[0049] The vector initialization unit 31 is configured to construct in advance a dictionary containing the training target words, and to initialize the word vector of each training target word and the intermediate vectors corresponding to all non-leaf nodes of the Huffman tree, the word vectors of the training target words forming a word vector library; and

[0050] The word vector training unit 32 is configured to scan a preset training sample document, and perform a preset word vector training step on each scanned training target word to obtain a word vector of each training target word.
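A minimal sketch of the scanning loop performed by such a training unit is given below. It walks a tokenized training document and, for every token found in the dictionary, yields the token together with the words in its context window (the window is mentioned in the abstract). The names `window_words` and `scan`, the symmetric window, and the choice to drop out-of-dictionary window words are all assumptions, and the actual training step is left to the caller.

```python
def window_words(tokens, index, window):
    """Return the tokens within `window` positions of tokens[index], excluding it."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != index]


def scan(tokens, dictionary, window):
    """Yield (target_word, window_words) for each dictionary word in the document.

    Assumption: window words outside the dictionary are skipped.
    """
    for i, tok in enumerate(tokens):
        if tok in dictionary:
            context = [w for w in window_words(tokens, i, window) if w in dictionary]
            yield tok, context
```

A training loop would then call its per-word update for every pair produced by `scan`.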

[0051] Preferably, as shown in Figure 4, the word vector training unit 32 of the word vector training system provided by the embodiment of the present invention in...


Abstract

The invention belongs to the technical field of computers and provides a word vector training method and system. The method performs word vector training on each training target word in a training sample document. For each training target word, it acquires the window words within the word's context window in the training sample document; predicts the occurrence probability of each window word using a Skip-gram model; updates the word vector of each window word in the word vector library and the intermediate vector of each non-leaf node on the code path of the training target word in the Huffman tree; updates the whole-document vector of the training sample document through a preset formula; calculates the incremental local input vector of a CBOW model and then the mixed concatenated vector of the CBOW model; sets the mixed concatenated vector as the input of the projection layer of the CBOW model; predicts the occurrence probability of the training target word; and finally updates the word vector of the training target word and the intermediate vector of each non-leaf node in the Huffman tree. The method and system improve the accuracy of the word vector of each training target word.
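The probability prediction and vector updates along a Huffman code path can be illustrated with standard hierarchical softmax, sketched below. This is the generic word2vec-style formulation, assumed here because the abstract mentions Huffman code paths and intermediate vectors. It is not the patent's full procedure: the document-vector update and the mixed vector depend on a preset formula that is not given in this text and are left out. The names `hs_probability` and `hs_update`, the learning rate, and the convention that bit 0 corresponds to the sigmoid branch are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hs_probability(h, code, path, inner_vecs):
    """Probability of reaching a leaf: product of branch probabilities on its path.

    h: input vector; code: 0/1 branch choices; path: non-leaf node ids.
    """
    p = 1.0
    for bit, node in zip(code, path):
        s = sigmoid(dot(h, inner_vecs[node]))
        p *= s if bit == 0 else 1.0 - s
    return p


def hs_update(h, code, path, inner_vecs, lr=0.025):
    """One gradient-ascent step on log-probability; updates h and inner_vecs in place."""
    grad_h = [0.0] * len(h)
    for bit, node in zip(code, path):
        theta = inner_vecs[node]
        g = lr * (1 - bit - sigmoid(dot(h, theta)))
        for k in range(len(h)):
            grad_h[k] += g * theta[k]  # uses theta[k] before it is changed
            theta[k] += g * h[k]
    for k in range(len(h)):
        h[k] += grad_h[k]
```

Because each non-leaf node splits its probability between two children, the leaf probabilities sum to one without normalizing over the whole vocabulary. This is the usual reason for pairing these models with a Huffman tree.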

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a word vector training method and system.

Background technique

[0002] In recent years, word vectors have become a very popular tool in the field of natural language processing. Traditional text processing methods generally use words as basic features and represent each word as a binary-coded word vector. Word vectors in this representation are prone to feature sparsity, and any two words are independent of each other, so the semantic and lexical associations implied between words cannot be correctly captured. To solve this problem, distributed word vectors came into being. A distributed word vector represents a word as a dense, low-dimensional, real-valued vector, each dimension of which represents a characteristic attribute of the word. Simple cosine calculations between word vectors can be used to mine out various differences between wo...
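The contrast drawn in the background can be made concrete with a small cosine similarity sketch. It reads "binary-coded" as a one-hot style encoding, which is an assumption, and the dense vectors in the usage lines are made-up numbers for illustration only, not trained values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# One-hot style vectors of two different words never overlap.
one_hot_cat = [1, 0, 0]
one_hot_dog = [0, 1, 0]

# Hypothetical dense vectors (illustrative values only).
dense_cat = [0.8, 0.3]
dense_dog = [0.7, 0.4]
dense_car = [-0.2, 0.9]
```

Under the one-hot style encoding every pair of distinct words has similarity zero. With the dense vectors, `cosine(dense_cat, dense_dog)` is larger than `cosine(dense_cat, dense_car)`, which is the kind of graded relation the background paragraph describes.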


Application Information

IPC(8): G06F17/27
CPC: G06F40/216, G06F40/30
Inventors: 傅向华, 李晶
Owner: SHENZHEN UNIV