
A word vector embedding method and device

A word vector embedding method and device in the field of natural language processing. It addresses unreasonable word vector embedding that arises when the corpus is small and the target word occurs rarely, and aims to improve the accuracy of the embedding and the efficiency of the embedding model.

Active Publication Date: 2019-06-28
POTEVIO INFORMATION TECH
Cites: 4; Cited by: 5

AI Technical Summary

Problems solved by technology

[0007] However, for small companies or niche fields, the corpus is small, and the target word appears rarely in the corpus or not at all. Training Skip-Gram from randomly initialized word vectors then produces unreasonable word vector embeddings, which degrades the performance of the Skip-Gram model.



Examples


Detailed Description of the Embodiments

[0025] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0026] Figure 1 is a schematic flow chart of the word vector embedding method provided by an embodiment of the present invention. As shown in Figure 1, the method includes:

[0027] Step S11: obtaining the reference word vector of each word, in the sentence where the target word is located, that matches an entry in the pre-trained word vector database;

...


Abstract

An embodiment of the invention provides a word vector embedding method and device. The method comprises: obtaining a reference word vector for each word, in the sentence where a target word is located, that matches a pre-trained word vector library; determining an initial word vector of the target word according to the reference word vectors; and training on a target corpus corresponding to the target word according to the initial word vector and a vector embedding model (W2V) to determine the embedded word vector of the target word. Because the target word is given prior knowledge at the initialization stage, a reasonable target word vector can be trained by the vector embedding model even if the corpus is small or the target word does not appear in the pre-training corpus. The embedded vector of the target word is therefore closer to the word's real semantics, which addresses the problem of unreasonable word vector embedding on small corpora, improves the accuracy of word vector embedding, and in turn improves the efficiency of the vector embedding model.
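The initialization step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract only says the initial vector is determined "according to the reference word vector", so averaging the reference vectors is an assumed combination rule. The names `pretrained`, `reference_vectors`, and `initial_vector`, the toy 3-dimensional vectors, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical toy stand-in for a pre-trained word vector library.
pretrained: Dict[str, Vector] = {
    "network": [0.9, 0.1, 0.0],
    "switch": [0.7, 0.3, 0.0],
    "router": [0.8, 0.2, 0.0],
}


def reference_vectors(sentence: List[str], target: str,
                      library: Dict[str, Vector]) -> List[Vector]:
    """Collect vectors for the sentence's words that are found in the library.

    Skipping the target word itself is an assumption of this sketch.
    """
    return [library[w] for w in sentence if w != target and w in library]


def initial_vector(refs: List[Vector]) -> Optional[Vector]:
    """Average the reference vectors (assumed rule); None if no word matched."""
    if not refs:
        return None
    dim = len(refs[0])
    return [sum(v[i] for v in refs) / len(refs) for i in range(dim)]


# "olt" plays the role of a rare target word missing from the library.
sentence = ["the", "olt", "connects", "the", "switch", "and", "router"]
refs = reference_vectors(sentence, "olt", pretrained)
init = initial_vector(refs)
```

In a full pipeline, `init` would presumably replace the random starting value of the target word's vector before the W2V model is trained on the target corpus. When no word in the sentence matches the library, the sketch returns `None` and the caller would have to fall back to some other initialization.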

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of natural language processing, and in particular to a word vector embedding method and device.

Background

[0002] Vector embedding (Word2Vec, W2V) is a natural language processing (NLP) method. W2V vectorizes all the words in a text so that the relationships between words can be measured quantitatively and mined. The current general-purpose vector embedding tools mainly include the Continuous Bag-of-Words (CBOW) model and the Skip-Gram model. The training input of the CBOW model is the word vectors of the words in the target word's context, and its output is the word vector of the target word. The Skip-Gram model is the opposite of CBOW. It assumes that similar words have similar contexts and predicts the context from the current word: the input is the word vector of the target word, and the output is the contex...
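To make the contrast between CBOW and Skip-Gram concrete, the following sketch builds training examples in both styles from one toy sentence. It only shows how inputs and outputs are paired, not the training itself. The window size, the function names, and the example tokens are illustrative assumptions and do not come from the patent.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Return up to `window` words on each side of position i."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens: List[str], window: int = 1) -> List[Tuple[List[str], str]]:
    """CBOW: the input is the context words, the output is the target word."""
    return [(context_of(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Skip-Gram: the input is the target word, the output is one context word."""
    return [(tokens[i], c)
            for i in range(len(tokens))
            for c in context_of(tokens, i, window)]


tokens = ["word", "vector", "embedding"]
cbow = cbow_examples(tokens)
skip = skipgram_examples(tokens)
```

With a window of 1, CBOW yields one example per position (context list to target), while Skip-Gram yields one example per (target, context word) pair, so a target word that appears only a few times contributes very few Skip-Gram updates.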


Application Information

IPC(8): G06F17/27
CPC: Y02D10/00
Inventor: 张鹏
Owner: POTEVIO INFORMATION TECH