Method for establishing an improved word vector model based on semantic embedding

A technique for building an improved word vector model, applied in the field of word vector model construction, which addresses problems such as the difficulty of distinguishing the meanings of polysemous words.

Active Publication Date: 2019-12-03
NANJING UNIV

AI Technical Summary

Problems solved by technology

When people read, their judgment of a polysemous word's meaning draws on their own knowledge and reasoning ability; for machines, however, it is difficult to distinguish which meaning a polysemous word carries in a specific context.



Examples


Embodiment Construction

[0070] The present invention is further illustrated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that these examples serve only to illustrate the present invention and are not intended to limit its scope; after reading the present invention, those skilled in the art will understand that all modifications of equivalent forms fall within the scope defined by the appended claims of the present application.

[0071] A method for establishing an improved word vector model based on semantic embedding, as shown in Figures 1-4. The method mainly comprises three stages, namely: a context vector training stage, a semantic induction stage, and a semantic representation stage. Concretely, it includes the following steps:

[0072] 1) Context vector training stage (steps 1-3 in Figure 1), as shown in Figures 2 and 3:

[0073] 1)-a Process a large-scale corpus, extract text conten...
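The steps above introduce the context vector training stage; as a reading aid, here is a minimal sketch of how a context vector for a target word could be taken from a bidirectional LSTM, in line with the abstract's steps 1) and 2). The PyTorch usage, dimensions, and target position are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch: context vector for a target word from a bidirectional LSTM.
# Dimensions and the target position are toy values, not taken from the patent.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256

embedding = nn.Embedding(vocab_size, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

# A sentence encoded as word indices; position 3 stands in for the target word t.
sentence = torch.randint(0, vocab_size, (1, 8))
target_pos = 3

outputs, _ = bilstm(embedding(sentence))   # shape: (1, 8, 2 * hidden_dim)
# The concatenated forward/backward hidden states at the target position serve as
# the context vector of the target word in this sentence.
context_vector = outputs[0, target_pos]    # shape: (2 * hidden_dim,)
print(context_vector.shape)
```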



Abstract

The invention discloses a method for establishing an improved word vector model based on semantic embedding. The method comprises the following steps: 1) a bidirectional long short-term memory network training stage; 2) a context vector calculation stage: inputting a sentence and a target word t into the bidirectional long short-term memory network trained in step 1) to obtain a context vector; 3) a context vector semantic clustering stage: a) calculating the cosine similarity between the current context vector and each semantic cluster center of the word t; b) calculating the probability P that the current context vector belongs to each cluster using a Bayesian non-parametric statistical model; c) maximizing the value of P and selecting the corresponding cluster; d) offsetting the center of the cluster to which the current context vector belongs. The method uses a neural network and a Bayesian non-parametric statistical method to solve the problem that current word vector models cannot represent words with multiple meanings.
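As a reading aid, the following is a minimal sketch of clustering steps 3a)-3d) under stated assumptions: a Chinese-restaurant-process-style prior stands in for the patent's Bayesian non-parametric statistical model, and all names, dimensions, and constants (the concentration alpha, the update rate) are hypothetical rather than taken from the patent.

```python
# Hypothetical sketch of steps 3a)-3d): assign a context vector of word t to a
# semantic cluster and offset the chosen cluster center. The CRP-style prior is an
# assumed stand-in for the Bayesian non-parametric model; constants are toy values.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def assign_and_update(context_vec, centers, counts, alpha=1.0, lr=0.1):
    n = sum(counts)
    # a) similarity to each existing cluster center; b) unnormalized probability P
    scores = [counts[k] / (n + alpha) * cosine(context_vec, centers[k])
              for k in range(len(centers))]
    scores.append(alpha / (n + alpha))      # probability of opening a new sense cluster
    # c) maximize P and take the corresponding cluster
    k_star = int(np.argmax(scores))
    if k_star == len(centers):              # new sense of word t
        centers.append(context_vec.copy())
        counts.append(1)
    else:                                   # d) offset the chosen cluster center
        centers[k_star] += lr * (context_vec - centers[k_star])
        counts[k_star] += 1
    return k_star

# Usage: random vectors stand in for context vectors of one polysemous word.
centers, counts = [], []
for _ in range(5):
    assign_and_update(np.random.randn(512), centers, counts)
print(len(centers), counts)
```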

Description

technical field

[0001] The invention relates to an improved method for establishing a word vector model. The model established by the method solves the problem that currently used word vector models cannot represent the multiple meanings of polysemous words.

Background technique

[0002] At present, the Internet has become an important way for people to obtain and distribute information. The information on network platforms often contains huge value, and how to extract valuable content from massive text information is one of the key issues in computer science. The development of deep learning technology has improved the ability of computers to process data and has also promoted the development of natural language processing. When using deep learning for natural language processing, an indispensable step is the vectorized representation of words. In terms of word representation, the most widely used is the word2vec word vector model, which has two structures: Skip-Gram and CBOW. The...
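The background notes that word2vec assigns a single static vector to each surface form; a small example makes this concrete. The use of the gensim library (4.x API) and the toy corpus is an illustrative assumption, since the patent does not name a toolkit.

```python
# Minimal word2vec illustration of the Skip-Gram / CBOW structures mentioned in the
# background; gensim and the toy corpus are assumptions, not the patent's setup.
from gensim.models import Word2Vec

corpus = [["the", "bank", "approved", "the", "loan"],
          ["they", "sat", "on", "the", "river", "bank"]]

# sg=1 selects Skip-Gram; sg=0 selects CBOW.
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

# word2vec assigns one embedding to "bank" regardless of its sense in context,
# which is the polysemy limitation the invention addresses.
print(model.wv["bank"].shape)
```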


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/35, G06F17/27, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/374, G06F16/355, G06N3/049, G06N3/084, G06F18/24155
Inventors: 张雷, 唐思雨, 潘元元, 路千惠, 谢俊元
Owner: NANJING UNIV