An improved word2vec method for training with correlative factors of part-of-speech and word order

The method improves word2vec by jointly training on part-of-speech (POS) association factors and word-order factors. It addresses three problems of the original model: word order information is not retained, the model is insensitive to word order, and part-of-speech association information cannot be used.

Active Publication Date: 2020-10-23
BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY

AI Technical Summary

Problems solved by technology

[0004] The Word2vec and GloVe models have two shortcomings: first, they are not sensitive to word order information; second, they cannot use part-of-speech association information.
This modeling preserves part-of-speech information very well, but does not preserve word order information.
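The word-order insensitivity noted above can be illustrated with a minimal sketch. It assumes toy random embeddings and two hypothetical helper functions: `cbow_context` averages the context vectors as a bag of words, while `cwindow_context` concatenates them in position order. None of these names come from the patent text.

```python
import random

random.seed(0)
DIM = 4
vocab = ["dog", "man"]

# Hypothetical toy embeddings; a trained model would learn these values.
emb = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

def cbow_context(words):
    """Bag-of-words context: average the vectors, discarding order."""
    return [sum(emb[w][i] for w in words) / len(words) for i in range(DIM)]

def cwindow_context(words):
    """Order-aware context: concatenate the vectors by position."""
    out = []
    for w in words:
        out.extend(emb[w])
    return out

# Context of "bites" in "dog bites man" versus "man bites dog".
a = ["dog", "man"]
b = ["man", "dog"]

# The averaged representations coincide (up to floating-point rounding).
same_avg = all(abs(x - y) < 1e-12
               for x, y in zip(cbow_context(a), cbow_context(b)))
# The concatenated representations differ because positions are kept.
same_cat = cwindow_context(a) == cwindow_context(b)
```

Averaging cannot tell the two sentences apart, whereas a position-ordered representation can, which is the gap the word-order factor is meant to close.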


Examples


Embodiment Construction

[0085] The present invention is further described below through embodiments, with reference to the accompanying drawings; the embodiments do not limit the scope of the invention in any way.

[0086] The present invention provides an improved word2vec method that trains on correlation factors of part-of-speech and word order, and proposes a Structured word2vec on POS model. The method uses part-of-speech tagging information and word order as influencing factors to jointly optimize the model, and uses part-of-speech association information to model the inherent syntactic relationship between words in the context window. The context word sequence is weighted by the part-of-speech association weights, the vector inner product is then computed according to word position order, and the stochastic gradient descent (SGD) algorithm jointly learns the association weights and the word embeddings. Figure 1 shows the process flow of...
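The steps in paragraph [0086] can be sketched as follows. This is an illustrative reading only, since the excerpt does not give the exact formulas: the logistic loss, the scalar weight per (target POS, context POS) pair, and every name in the code (`pos_weight`, `emb_in`, `emb_out`, `score`, `sgd_step`) are assumptions made for the example.

```python
import math
import random

random.seed(0)
DIM = 8
WINDOW = 2  # number of context positions (one left, one right)

words = ["the", "dog", "barks"]
pos_of = {"the": "DET", "dog": "NOUN", "barks": "VERB"}  # toy POS tags
tags = sorted(set(pos_of.values()))

def rand_vec():
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]

emb_in = {w: rand_vec() for w in words}
# Assumed: one output vector per context position, so order matters.
emb_out = {w: [rand_vec() for _ in range(WINDOW)] for w in words}
# Assumed: one scalar association weight per (target POS, context POS).
pos_weight = {(t, c): 1.0 for t in tags for c in tags}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(target, context):
    """POS-weighted inner products, taken position by position."""
    total = 0.0
    for j, c in enumerate(context):
        a = pos_weight[(pos_of[target], pos_of[c])]
        total += a * dot(emb_in[c], emb_out[target][j])
    return total

def sgd_step(target, context, label, lr=0.1):
    """One SGD step on a logistic loss, updating vectors and POS weights."""
    g = sigmoid(score(target, context)) - label  # d(loss)/d(score)
    for j, c in enumerate(context):
        key = (pos_of[target], pos_of[c])
        a = pos_weight[key]
        v_in, v_out = emb_in[c], emb_out[target][j]
        inner = dot(v_in, v_out)  # computed before the vectors change
        for i in range(DIM):
            vi, vo = v_in[i], v_out[i]
            v_in[i] -= lr * g * a * vo
            v_out[i] -= lr * g * a * vi
        pos_weight[key] -= lr * g * inner

context = ["the", "barks"]  # left and right neighbours of "dog"
before = score("dog", context)
for _ in range(50):
    sgd_step("dog", context, label=1.0)
after = score("dog", context)
reversed_score = score("dog", list(reversed(context)))
```

Because each context position has its own output vector, swapping the context words changes the score, and the same gradient step adjusts the embeddings and the POS association weights together.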

Abstract

The present invention discloses an improved word2vec method that trains on correlation factors of part-of-speech and word order, and proposes a Structured word2vec on POS model comprising a CWindow-POS (CWP) model and a Structured Skip-gram-POS (SSGP) model. Both models jointly optimize part-of-speech tagging information and word order as influencing factors, and model the inherent syntactic relationship between words in the context window using part-of-speech association information. The context word sequence is weighted by the part-of-speech association weights, the vector inner product is then computed according to word position order, and the stochastic gradient descent (SGD) algorithm jointly learns the association weights and the word embeddings. The invention embeds words in an order-aware way according to their positions and jointly optimizes the word vectors and the part-of-speech association weight matrix; it performs well on word analogy tasks, word similarity tasks, and qualitative analysis.
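The abstract names the SSGP model without giving its equations. The sketch below follows the general structured skip-gram idea of keeping one output table per relative position, and scales each logit by a POS association weight. That scaling, the toy vocabulary, and all names (`OFFSETS`, `pos_weight`, `context_probs`) are assumptions for illustration, not the patented formulation.

```python
import math
import random

random.seed(1)
DIM = 6
vocab = ["the", "dog", "barks"]
pos_of = {"the": "DET", "dog": "NOUN", "barks": "VERB"}  # toy POS tags
tags = sorted(set(pos_of.values()))
OFFSETS = [-1, 1]  # relative positions predicted from the center word

def rand_vec():
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]

emb_in = {w: rand_vec() for w in vocab}
# One output table per relative position keeps word order distinguishable.
emb_out = {off: {w: rand_vec() for w in vocab} for off in OFFSETS}
# Assumed scalar association weight per (center POS, context POS).
pos_weight = {(a, b): 1.0 for a in tags for b in tags}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def context_probs(center, offset):
    """Softmax over the vocabulary for the word at `offset` from `center`."""
    logits = {}
    for w in vocab:
        a = pos_weight[(pos_of[center], pos_of[w])]
        logits[w] = a * dot(emb_in[center], emb_out[offset][w])
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

left = context_probs("dog", -1)
right = context_probs("dog", 1)
```

With separate tables per offset, the predicted distribution for the word before "dog" differs from the one for the word after it, which a plain skip-gram model with a single output table cannot express.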

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and relates to a word2vec method, in particular to an improved word2vec method that combines part-of-speech and word-order correlation factors in training. The method embeds words in an order-aware way according to their positions and uses part-of-speech association information to model the inherent syntactic relationship between words in the context window, realizing joint optimization of the word vectors and the part-of-speech association weight matrix.

Background technique

[0002] Part of speech is a basic element of natural language processing. Word order carries semantic and grammatical information. Both are key information in natural language, and how to combine the two effectively in a word embedding model is the focus of current research. Semantic vector space models of language represent each word with a real-valued vector, and word vectors can be used as features in many applications, such as document classific...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/30, G06K9/62
CPC: G06F16/353, G06F40/30, G06F18/22
Inventor: 于重重, 曹帅, 潘博, 张青川
Owner: BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY