Word vector generating method and device based on incremental learning

An incremental-learning technique for word vectors, applied in the computer field, that addresses two problems: the GloVe algorithm does not consider incremental learning, and the process of generating word vectors takes a long time.

Active Publication Date: 2017-06-13
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

[0004] However, the GloVe algorithm does not consider incremental learning. When the corpus grows incrementally, the global co-occurrence matrix changes. GloVe can then only merge the original corpus with the newly added part and retrain on the merged corpus from the initial state, which makes generating word vectors take a long time.
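The effect of a corpus increment on the co-occurrence statistics can be illustrated with a small sketch. It assumes plain symmetric window counts per sentence (GloVe implementations typically also weight pairs by distance, which is omitted here); the function name `cooccurrence` and the toy sentences are hypothetical. Under that assumption the counts are additive, so the matrix of the merged corpus is the sum of the two separate matrices and does not require re-scanning the original corpus.

```python
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count symmetric word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look back at most `window` tokens inside the same sentence.
            for j in range(max(0, i - window), i):
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts

original = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy original corpus
added = [["the", "cat", "ran"]]                            # toy newly added corpus

x_old = cooccurrence(original)
x_new = cooccurrence(added)
x_total = x_old + x_new  # Counter addition sums the counts pair by pair
```

The global matrix changes with every increment, but only by the counts of the new text; the costly part that remains is the retraining itself.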



Examples

Embodiment Construction

[0021] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0022] Figure 1 is a flow chart of Embodiment 1 of the word vector generation method based on incremental learning according to the present invention. As shown in Figure 1, the method of this embodiment may include:

[0023] Step 101. Obtain the word co-occurrence matrix of the original corpus, the word co-occurrence matrix of the newly added corp...


Abstract

The embodiments of the invention provide a word vector generating method and device based on incremental learning. The method includes the following steps. A word co-occurrence matrix of an original corpus, a word co-occurrence matrix of a newly added corpus, and a training result parameter of the original corpus are acquired, where the training result parameter comprises a gradient value and a first matrix decomposition result. The training result parameter of the original corpus is used as the initial training parameter for the newly added corpus. Starting from this initial training parameter, a total objective function is iteratively optimized by a gradient descent algorithm using the word co-occurrence matrix of the original corpus and the word co-occurrence matrix of the newly added corpus, yielding a second matrix decomposition result, which is a solution that minimizes the total objective function. Multiple word vectors are then obtained from the second matrix decomposition result. This effectively shortens the time consumed in generating word vectors.
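The warm-start idea in the abstract can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that the text above does not confirm: the "gradient value" is taken to be AdaGrad squared-gradient accumulators, the "total objective function" is taken to be the standard GloVe weighted least-squares loss evaluated on the sum of the two co-occurrence matrices, and the final vectors are formed by adding the two factor matrices. All names (`init_state`, `train`, `glove_loss`) and the random toy matrices are hypothetical.

```python
import numpy as np

X_MAX, ALPHA = 100.0, 0.75  # weighting constants commonly used with GloVe (assumed)

def _weights_and_targets(X):
    mask = X > 0
    f = np.where(X < X_MAX, (X / X_MAX) ** ALPHA, 1.0) * mask  # zero weight where X == 0
    log_x = np.log(np.where(mask, X, 1.0))                     # placeholder log(1) = 0 there
    return f, log_x

def glove_loss(X, s):
    """Weighted squared error between W C^T + biases and log X."""
    f, log_x = _weights_and_targets(X)
    diff = s["W"] @ s["C"].T + s["bw"][:, None] + s["bc"][None, :] - log_x
    return float(np.sum(f * diff ** 2))

def init_state(vocab_size, dim, rng):
    """Small random factors and biases plus AdaGrad accumulators (prefix 'g_')."""
    shapes = {"W": (vocab_size, dim), "C": (vocab_size, dim),
              "bw": (vocab_size,), "bc": (vocab_size,)}
    s = {k: (rng.random(shape) - 0.5) / dim for k, shape in shapes.items()}
    s.update({"g_" + k: np.ones(shape) for k, shape in shapes.items()})
    return s

def train(X, s, epochs, lr=0.05):
    """Full-batch AdaGrad on the loss above; updates and returns the state `s`."""
    f, log_x = _weights_and_targets(X)
    for _ in range(epochs):
        r = f * (s["W"] @ s["C"].T + s["bw"][:, None] + s["bc"][None, :] - log_x)
        grads = {"W": 2 * r @ s["C"], "C": 2 * r.T @ s["W"],
                 "bw": 2 * r.sum(axis=1), "bc": 2 * r.sum(axis=0)}
        for k, g in grads.items():
            s["g_" + k] += g ** 2                    # accumulate squared gradients
            s[k] -= lr * g / np.sqrt(s["g_" + k])    # per-coordinate scaled step
    return s

rng = np.random.default_rng(0)
V, DIM = 6, 3
x_old = rng.integers(1, 50, size=(V, V)).astype(float)  # toy original counts
x_new = rng.integers(0, 5, size=(V, V)).astype(float)   # toy increment counts
x_total = x_old + x_new

state = train(x_old, init_state(V, DIM, rng), epochs=200)  # training on the original corpus
loss_cold = glove_loss(x_total, init_state(V, DIM, rng))   # what a restart would begin from
loss_warm = glove_loss(x_total, state)                     # what the warm start begins from
state = train(x_total, state, epochs=50)                   # continue from the saved state
loss_final = glove_loss(x_total, state)
vectors = state["W"] + state["C"]                          # one vector per vocabulary entry
```

In this sketch the saved state carries both the factor matrices and the accumulators, so the continued run starts close to a good solution and keeps its per-coordinate step sizes instead of resetting them.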

Description

Technical field

[0001] Embodiments of the present invention relate to computer technology, and in particular to a method and device for generating word vectors based on incremental learning.

Background art

[0002] A word vector represents a word as a vector, turning natural language symbols into a mathematical form that a computer can process.

[0003] The GloVe algorithm is a new word vector generation method that combines the global and local statistical information of words to produce a language model and a vectorized representation of words. It combines the advantages of traditional statistics-based word vector models and prediction-based word vector models: the training process is simpler and more efficient, and the generated word vectors better reflect the linear relationships between words.

[0004] However, the GloVe algorithm does not consider the case of incremental...
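For reference, the objective minimized by GloVe as originally published is usually written as below. The excerpt here does not reproduce any formula, so this is the standard form and not necessarily the "total objective function" of the patent. The symbols are the conventional ones: $X_{ij}$ is the co-occurrence count of words $i$ and $j$, $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are biases, and $V$ is the vocabulary size.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

Because $J$ depends on the corpus only through the counts $X_{ij}$, any change to the corpus changes the objective, which is why standard GloVe retrains when new text arrives.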


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/216, G06F40/284
Inventor: 张日崇, 包梦蛟, 刘垚鹏, 彭浩, 李建欣
Owner: BEIHANG UNIV