
Two-stage semantic word vector generation method

A two-stage word-vector technology in the field of semantic word vector generation. It addresses problems such as large storage requirements, quality degradation, and data sparsity, and achieves high-quality results.

Active Publication Date: 2020-04-17
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

The traditional 0-1 (one-hot) representation has two problems: on the one hand, it causes data sparsity, so the word vectors generated this way occupy a large amount of space; on the other hand, it can only distinguish different words and contributes nothing to representing word sense.
Li Guojia used K-Means clustering to build a two-stage model for the word-sense recognition stage. The disadvantage of this method is similar to that of Neelakantan's method: the number of cluster centers for the K-Means algorithm must be set in advance, which amounts to fixing the number of generated word senses beforehand, so the scalability is poor.
[0007] Summarizing the existing methods: the disadvantage of the 0-1 representation is that it causes the curse of dimensionality and lacks semantic information. The disadvantages of word-level embedding are: 1) the vector produced for a polysemous word is biased toward the senses that appear more often in the corpus, while its rarer senses are weakened; 2) among the words computed as highly similar to a polysemous word, the senses are unrelated to each other; 3) the triangle inequality of the original word vector space is violated, which reduces its quality. In sense-level embedding, the fused model can compress the word-vector generation process, but its effectiveness rests on the clustering algorithm it uses, and most current clustering algorithms perform worse than supervised classification algorithms.
The two-stage model ignores the similarity between the word-sense recognition process and the word-vector generation process; the two processes are completed in series, which is inefficient, but it strongly guarantees the quality of the generated word vectors.




Embodiment Construction

[0026] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention.

[0027] The two-stage semantic word vector generation method proposed by the present invention is divided into three stages and consists of five steps. The first stage is text matrixing; the second stage includes two steps of feature extractor construction and semantic recognition; the third stage includes two steps of neural language model construction and semantic word vector generation.

[0028] Step 1: Text Matrixization

[0029] Select the clauses s_i containing the polysemous word w from the obtained text, forming a set D_w = {s_1, s_2, s_3, ...} (that is, the set of clauses containing the ambiguous word); each clause s_i is paired with the sense category c of the polysemous word w in that ...
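A minimal sketch of this clause-set construction: gather the clauses containing a polysemous word w into D_w. The tokenization and the choice of clause delimiters are illustrative assumptions, not specified by the patent:

```python
# Sketch of Step 1 (text matrixization), first part: build the clause set
# D_w = {s_1, s_2, ...} of clauses containing the polysemous word w.
# Clause delimiters here are an assumption for illustration.
import re

def build_clause_set(text, w):
    """Split text into clauses and keep those containing the polysemous word w."""
    clauses = re.split(r"[,.;!?]", text)
    return [c.strip() for c in clauses if w in c and c.strip()]

D_w = build_clause_set(
    "The bank was closed. We sat by the bank of the river, watching boats.",
    "bank",
)
# D_w now holds every clause that contains "bank"
```

In the patent's method each such clause would then be encoded as a matrix (e.g. of token embeddings) before feature extraction.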



Abstract

The invention provides a two-stage semantic word vector generation method comprising five steps: text matrixization, feature extractor construction, semantic recognition, neural language model construction, and semantic word vector generation. The method generates a separate word vector for each sense of a polysemous word using multiple neural networks, overcoming the defect of traditional word-level embedding models in which a polysemous word corresponds to only one word vector, while keeping the required corpus size within an acceptable range. It further combines a convolutional neural network (CNN) with a support vector machine (SVM): on one hand it exploits the feature extraction capability of the CNN, and on the other the generalization and robustness of the SVM, so that word-sense recognition is more accurate and the generated semantic word vectors are of higher quality.
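The CNN-then-SVM combination can be illustrated with a toy, dependency-free sketch: a fixed 1-D convolution with max-pooling extracts one feature from a clause's per-token scores, and a stand-in linear threshold plays the SVM's role. Everything here (the kernel, the threshold, the scores) is a simplified assumption; a real system would use a trained CNN and a trained SVM:

```python
# Toy illustration of the CNN-then-SVM idea: convolution + max-pooling
# extracts a feature, then a stand-in linear decision assigns a sense label.
# All values are illustrative assumptions, not the patent's parameters.

def conv1d_max(seq, kernel):
    """Valid 1-D convolution followed by max-pooling: one scalar feature."""
    k = len(kernel)
    scores = [sum(seq[i + j] * kernel[j] for j in range(k))
              for i in range(len(seq) - k + 1)]
    return max(scores)

def classify(feature, threshold=0.0):
    """Stand-in for the SVM decision: sign of (feature - threshold)."""
    return 1 if feature > threshold else 0

seq = [0.2, -0.1, 0.9, 0.4]            # toy per-token scores for one clause
feature = conv1d_max(seq, [1.0, 1.0])  # pooled bigram feature
sense = classify(feature)              # toy sense label for the clause
```

The division of labor mirrors the abstract: the convolutional part produces discriminative features, and the SVM-like decision boundary handles the final sense classification.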

Description

Technical field

[0001] The invention belongs to the field of neural networks, and in particular relates to a two-stage semantic word vector generation method.

Background technique

[0002] Word representation is one of the key issues in natural language processing. Whether the word representation method is appropriate directly affects the modeling of tasks such as syntactic analysis, semantic representation, and text understanding, and also affects the accuracy and robustness of application systems such as information retrieval and question answering.

[0003] At present, the representation strategies for Chinese words can be summarized into three types: the traditional 0-1 representation, distributed representation based on latent semantic information, and distributed representation based on a neural network language model. The traditional 0-1 representation has two problems: on the one hand, the 0-1 representation causes data sparsity, which makes the word vectors ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08, G06F40/30
CPC: G06N3/08, G06N3/048, G06N3/045, G06F18/2411, G06F18/214
Inventor: 桂盛霖, 刘一飞
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA