Word2vec improvement method of related factor training combining parts of speech and word orders

A word2vec improvement that jointly trains part-of-speech and word-order correlation factors. It addresses three problems: part-of-speech association information cannot be used, word order information is not preserved, and the model is insensitive to word order.

AI Technical Summary

Problems solved by technology

[0004] The Word2vec and GloVe models have two shortcomings: first, they are not sensitive to word order information; second, they cannot use part-of-speech association information.
This modeling preserves part-of-speech information very well, but does not preserve word order information.

Examples

Embodiment Construction

[0086] The present invention is further described below through embodiments, with reference to the accompanying drawings; the embodiments do not limit the scope of the invention in any way.

[0087] The present invention provides an improved word2vec method that jointly trains part-of-speech and word-order correlation factors, and proposes the Structured word2vec on POS model. The method uses part-of-speech tagging information and word order as influencing factors to jointly optimize the model, and uses part-of-speech association information to model the inherent syntactic relationships between words in the context window. The context word sequence is weighted by the part-of-speech association weights, the vector inner product is then computed according to word position order, and the stochastic gradient descent (SGD) algorithm is used to jointly learn the association weights and the word embeddings. Figure 1 shows the process flow of the method.
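The joint update described in [0087] can be illustrated with a minimal sketch. It is one possible reading of the paragraph, not the patent's actual formulation. The assumptions here are: a scalar association weight per (center tag, context tag) pair, one output vector table per relative position in the window, and a logistic loss on a single labeled pair. The class name `PosWeightedSkipGram` and all parameter names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class PosWeightedSkipGram:
    """Toy model: position-specific output vectors plus a learnable scalar
    weight for every (center tag, context tag) pair (an assumption)."""

    def __init__(self, vocab_size, num_tags, dim, window, seed=0):
        rng = random.Random(seed)

        def table(rows):
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                    for _ in range(rows)]

        self.window = window
        self.emb = table(vocab_size)  # input word vectors
        # One output table per relative offset: -window..-1 and 1..window.
        self.out = [table(vocab_size) for _ in range(2 * window)]
        # Part-of-speech association weights, initialised to 1.0.
        self.pos_w = [[1.0] * num_tags for _ in range(num_tags)]

    def _slot(self, offset):
        if offset == 0 or abs(offset) > self.window:
            raise ValueError("offset must be non-zero and within the window")
        return offset + self.window if offset < 0 else offset + self.window - 1

    def score(self, center, context, offset, tag_c, tag_o):
        v = self.out[self._slot(offset)][context]
        return self.pos_w[tag_c][tag_o] * dot(self.emb[center], v)

    def sgd_step(self, center, context, offset, tag_c, tag_o, label, lr=0.1):
        """One SGD step on a (center, context) pair; label is 1 or 0.
        Returns the logistic loss measured before the update."""
        u = self.emb[center]
        v = self.out[self._slot(offset)][context]
        w = self.pos_w[tag_c][tag_o]
        d = dot(u, v)
        p = sigmoid(w * d)
        loss = -math.log(p if label == 1 else 1.0 - p)
        g = p - label  # derivative of the loss with respect to the score
        for i in range(len(u)):
            u_i, v_i = u[i], v[i]
            u[i] -= lr * g * w * v_i  # update the word embedding
            v[i] -= lr * g * w * u_i  # update the position-specific vector
        self.pos_w[tag_c][tag_o] -= lr * g * d  # update the tag-pair weight
        return loss
```

Calling `sgd_step` repeatedly on the same positive pair lowers its loss while changing three things at once: the center word's embedding, the context word's vector for that particular offset, and the weight of that tag pair. Output tables for other offsets are untouched, which is what makes the embedding sensitive to word order in this sketch.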

...

Abstract

The invention discloses an improved word2vec method that jointly trains part-of-speech and word-order correlation factors. The method provides the Structured word2vec on POS model, which includes a CWindow-POS (CWP) model and a Structured Skip-gram-POS (SSGP) model. Both models use part-of-speech tagging information and word order as influencing factors for joint optimization, and use part-of-speech association information to model the inherent syntactic relationships between words in the context window. The context word sequence is weighted by the part-of-speech association weights, the vector inner product is then computed according to word position order, and a stochastic gradient descent (SGD) algorithm is used to jointly learn the association weights and the word embeddings. The method embeds words directionally according to their position order, jointly optimizes the word vectors and the part-of-speech association weight matrices, and has high efficiency in word analogy tasks, word similarity tasks, and qualitative analysis.
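As a rough illustration of the CWP side of the abstract, the sketch below weights each context vector by a part-of-speech weight and concatenates the results in positional order before taking an inner product with a target vector. It assumes scalar weights and a CWindow-style concatenation; the function names `cwp_hidden` and `cwp_score` are hypothetical, and the patent's actual weighting may differ.

```python
def cwp_hidden(context_vecs, tag_weights):
    """Scale each context vector by its part-of-speech weight and
    concatenate the results in positional order (left to right)."""
    if len(context_vecs) != len(tag_weights):
        raise ValueError("need exactly one weight per context position")
    hidden = []
    for vec, w in zip(context_vecs, tag_weights):
        hidden.extend(w * x for x in vec)
    return hidden


def cwp_score(hidden, target_out_vec):
    """Inner product between the concatenated context and a target word's
    output vector, which must have length (positions * dim)."""
    if len(hidden) != len(target_out_vec):
        raise ValueError("target vector length must match the hidden layer")
    return sum(h * o for h, o in zip(hidden, target_out_vec))


# Two context positions with 2-dimensional vectors.
left, right = [1.0, 2.0], [3.0, 4.0]
h = cwp_hidden([left, right], [0.5, 2.0])
h_swapped = cwp_hidden([right, left], [2.0, 0.5])
```

Because the vectors are concatenated rather than averaged, swapping the two context words produces a different hidden layer (`h` and `h_swapped` differ), so the score depends on where each word sits in the window.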

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and relates to word2vec methods, in particular to an improved word2vec method that jointly trains part-of-speech and word-order correlation factors. The method embeds words directionally according to their position order, uses part-of-speech association information to model the inherent syntactic relationships between words in the context window, and jointly optimizes the word vectors and the part-of-speech association weight matrix.

Background

[0002] Part of speech is a basic element of natural language processing. Word order conveys semantic and grammatical information. Both are key information in natural language, and how to combine the two effectively in a word embedding model is a focus of current research. Semantic vector space models of language represent each word with a real-valued vector, and word vectors can be used as features in many applications, such as document classific...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06K9/62
CPC: G06F16/353, G06F40/30, G06F18/22
Inventor: 于重重, 曹帅, 潘博, 张青川