
Method and device for training word vector embedding model

A word vector technology that applies machine learning to text processing. It addresses the problem that existing word embedding algorithms are simplistic and struggle to meet multiple needs, and it improves the accuracy of the resulting word vectors.

Active Publication Date: 2020-08-14
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] However, current word embedding algorithms are relatively simplistic and struggle to meet varied needs. For example, word embeddings may need to be generated quickly for a large number of words while still being highly accurate.



Examples

Detailed Description of Embodiments

[0026] Multiple embodiments disclosed in this specification will be described below in conjunction with the accompanying drawings.

[0027] The embodiments of this specification disclose a method for training a word vector embedding model. The inventive concept behind the method is introduced first, as follows:

[0028] A word vector algorithm maps a word to a fixed-dimensional vector so that the values of the vector represent the semantic information of the word. At present, there are two common frameworks for training word vectors: Skip-gram and CBOW (Continuous Bag-of-Words). Word vectors determined with the Skip-gram framework are more accurate, but training is many times slower. In scenarios with a very large amount of data, the CBOW framework is therefore preferred, but the accuracy of the word vectors it produces is limited.
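As an illustration of the difference between the two frameworks in [0028], the sketch below builds training examples from one sentence in each style. It is a minimal sketch, not taken from the patent: the function names, the window size of 2, and the toy sentence are all assumptions. It shows one commonly cited reason for the speed gap, namely that Skip-gram produces one example per (center, context) pair while CBOW produces one example per center word.

```python
def context_window(words, center_idx, window=2):
    """Return the words within `window` positions of the center word."""
    left = words[max(0, center_idx - window):center_idx]
    right = words[center_idx + 1:center_idx + 1 + window]
    return left + right


def skipgram_examples(words, window=2):
    """Skip-gram style: one (center, context_word) pair per context word."""
    return [
        (words[i], ctx)
        for i in range(len(words))
        for ctx in context_window(words, i, window)
    ]


def cbow_examples(words, window=2):
    """CBOW style: one (context_words, center) example per center word."""
    return [(context_window(words, i, window), words[i]) for i in range(len(words))]


# Hypothetical toy sentence, used only for illustration.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
sg = skipgram_examples(sentence)
cb = cbow_examples(sentence)
print(len(sg), len(cb))  # prints "18 6"
```

With this toy sentence, Skip-gram yields three times as many training examples as CBOW, and the ratio grows with the window size.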

[0029] Bas...


Abstract

The embodiments of this specification provide a method for training a word vector embedding model. The method comprises multiple iterative updates, each of which includes: first, determining a center word and multiple corresponding context words from the word sequence corresponding to a training sentence; then, determining the center word vector corresponding to the center word according to a first word vector matrix, and determining the multiple context word vectors corresponding to the multiple context words according to a second word vector matrix; then, determining multiple corresponding attention weights based on the similarities among the multiple context word vectors; then, using the attention weights to compute a weighted sum of the context word vectors, which yields a context representation vector for the center word; then, calculating a first similarity between the center word vector and the context representation vector; and finally, updating the first word vector matrix and the second word vector matrix with the aim of at least increasing the first similarity.
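The iteration described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the two word vector matrices are modeled as dictionaries `W1` and `W2`, the attention score of a context word is assumed to be its mean dot product with the other context vectors followed by a softmax, the first similarity is assumed to be a dot product, and the update is a plain gradient-ascent step on the log-sigmoid of that similarity with the attention weights treated as constants. The abstract does not specify any of these choices, and all names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(ctx_vecs):
    """Assumed scoring: mean dot product with the other context vectors, then softmax."""
    n = len(ctx_vecs)
    scores = []
    for i in range(n):
        others = [dot(ctx_vecs[i], ctx_vecs[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return softmax(scores)


def train_step(W1, W2, center, context, lr=0.1):
    """One iterative update; returns the first similarity before the update."""
    c = list(W1[center])                 # center word vector (first matrix)
    ctx = [W2[w] for w in context]       # context word vectors (second matrix)
    a = attention_weights(ctx)           # attention weights from context similarities
    dim = len(c)
    # Context representation vector: attention-weighted sum of context vectors.
    h = [sum(a[k] * ctx[k][d] for k in range(len(ctx))) for d in range(dim)]
    sim = dot(c, h)                      # first similarity (assumed dot product)
    # Assumed objective: log(sigmoid(sim)); attention weights held constant.
    g = lr * (1.0 - 1.0 / (1.0 + math.exp(-sim)))
    W1[center] = [c[d] + g * h[d] for d in range(dim)]
    for k, w in enumerate(context):
        W2[w] = [W2[w][d] + g * a[k] * c[d] for d in range(dim)]
    return sim


# Hypothetical toy setup, for illustration only.
random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 4
W1 = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
W2 = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
sims = [train_step(W1, W2, "sat", ["the", "cat", "on", "the"]) for _ in range(5)]
print(sims)
```

The sketch omits anything beyond increasing the first similarity (for example, contrasting against other words), since the abstract only states that the update aims "at least" at increasing it.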

Description

Technical field

[0001] The embodiments of this specification relate to the application of machine learning technology to text processing, and in particular to a method and device for training a word vector embedding model.

Background

[0002] Word vector technology addresses the difficulty computers have in understanding the semantics of human language by mapping words to real-valued vectors. For example, humans can easily judge that two near-synonymous words for "cat" are semantically very close, but it is difficult for a computer to describe the semantic similarity between them. A word vector algorithm can generate a word vector for each of the two words, and the semantic similarity between the words can then be determined by calculating the similarity between their vectors. The accuracy of the word vector algorithm therefore determines the semantic understanding ability of the computer.

[0003] Howe...
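The similarity computation mentioned in [0002] can be illustrated with a short sketch. The patent text does not say which similarity measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors and the words "kitty" and "car" are made-up values chosen only to show the idea.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical word vectors, not produced by any real model.
vectors = {
    "cat": [0.9, 0.1, 0.3],
    "kitty": [0.8, 0.2, 0.35],
    "car": [-0.2, 0.9, 0.1],
}

near = cosine_similarity(vectors["cat"], vectors["kitty"])
far = cosine_similarity(vectors["cat"], vectors["car"])
print(near > far)  # prints "True"
```

In this toy example the two near-synonyms score close to 1, while the unrelated word scores below 0.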


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/35, G06F40/289, G06F40/30
CPC: G06F16/3344, G06F16/35
Inventors: 曹绍升, 陈超超, 吴郑伟
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD