Incremental learning method for a word vector model

An incremental learning method for word vector models. It addresses the large time and space cost of relearning all data when new samples arrive. It avoids repeated learning of historical data, reduces computational complexity, and meets the efficiency requirements of an online system.

Active Publication Date: 2017-05-31
BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY +1

AI Technical Summary

Problems solved by technology

If all data must be relearned whenever new samples arrive, training consumes a great deal of time and space.




Embodiment Construction

[0050] The present invention is further described below through embodiments, with reference to the accompanying drawings; the embodiments do not limit the scope of the invention in any way.

[0051] The present invention provides an online incremental learning algorithm based on word vectors (Incremental Learning Word2vec, ILW), which learns incrementally from new text in an online system without relearning all data.

[0052] Figure 1 is a flow chart of the ILW method provided by the present invention as used in an online system. The ILW algorithm can be divided into three parts; the specific process is as follows:

[0053] 1. Initialization and update of new words (steps 1-2)

[0054] To train on new text data, the word vector model needs to be updated. The model holds vector information for a large number of historical words. In the negative sampling (NS) algorithm of the CBOW model, when new data is added to the training samples, it...
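The initialization of new words can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `extend_vocabulary`, the data layout (a dict for the vocabulary, a list of rows for the vectors), and the small uniform random initialization are all assumptions made for illustration. The point it shows is that historical vectors are left untouched while unseen words receive fresh rows.

```python
import random


def extend_vocabulary(vocab, vectors, counts, new_tokens, dim, rng=None):
    """Add unseen words from new text to an existing model, in place.

    vocab:   dict mapping word -> row index (hypothetical layout)
    vectors: list of word vectors, one list of floats per word
    counts:  dict mapping word -> frequency, kept for later negative sampling
    Returns the list of words that were newly added.
    """
    rng = rng or random.Random(0)
    added = []
    for tok in new_tokens:
        # Frequencies are updated for historical and new words alike.
        counts[tok] = counts.get(tok, 0) + 1
        if tok not in vocab:
            vocab[tok] = len(vectors)
            # Assumed initialization: small uniform values around zero.
            vectors.append([(rng.random() - 0.5) / dim for _ in range(dim)])
            added.append(tok)
    return added


# Usage: two historical words, one new word appearing twice in the new text.
vocab = {"word": 0, "vector": 1}
vectors = [[0.1, 0.2], [0.3, 0.4]]
counts = {"word": 5, "vector": 3}
added = extend_vocabulary(
    vocab, vectors, counts, ["word", "incremental", "incremental"], dim=2
)
```

Only the new word gets a new row; the rows for "word" and "vector" keep their trained values, which is what lets the update skip the historical data.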


Abstract

The invention discloses an incremental learning method for a word vector model. The hyperparameters used by the method are the vector dimension, the number of negative samples, and the text window length. New words that appear in the added text are initialized, and negative examples are sampled from the historical vocabulary; the word vector model is then updated dynamically, which optimizes the model and completes the incremental learning. The scheme avoids relearning historical data and greatly reduces computational complexity. It also maintains high learning efficiency, meeting the efficiency requirements of an online system.
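As a rough illustration of the update described in the abstract, the following Python sketch performs one CBOW step with negative sampling on a single (context, target) pair from new text. It is a sketch under stated assumptions, not the patented ILW procedure: the function names, the dict-based storage, and the `count ** 0.75` sampling weights (the usual word2vec convention) are assumptions not taken from the source. The parameter `k` stands for the number of negative samples, and the length of each vector is the vector dimension.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_negatives(counts, target, k, rng):
    """Draw k negatives from the vocabulary, weighted by count ** 0.75 (assumed)."""
    words = [w for w in counts if w != target]
    weights = [counts[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)


def cbow_ns_step(context, target, w_in, w_out, counts, k, lr, rng):
    """One CBOW negative-sampling update; only the words involved are changed."""
    dim = len(w_in[target])
    # Hidden layer: average of the context words' input vectors.
    h = [sum(w_in[c][i] for c in context) / len(context) for i in range(dim)]
    grad_h = [0.0] * dim
    negatives = sample_negatives(counts, target, k, rng)
    samples = [(target, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in samples:
        out = w_out[word]
        score = sigmoid(sum(h[i] * out[i] for i in range(dim)))
        g = lr * (label - score)
        for i in range(dim):
            grad_h[i] += g * out[i]  # accumulate the error for the context words
            out[i] += g * h[i]       # update the output vector of this sample
    for c in context:
        for i in range(dim):
            w_in[c][i] += grad_h[i]


# Usage on a toy model with four words and two-dimensional vectors.
counts = {"a": 3, "b": 2, "c": 1, "d": 4}
w_in = {"a": [0.1, 0.2], "b": [0.3, -0.1], "c": [0.0, 0.0], "d": [0.5, 0.5]}
w_out = {w: [0.0, 0.0] for w in counts}
cbow_ns_step(["a", "b"], "c", w_in, w_out, counts, k=2, lr=0.1, rng=random.Random(0))
```

Because each step touches only the context words, the target, and the sampled negatives, the cost of absorbing new text depends on the size of that text rather than on the size of the historical corpus.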

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing and relates to an online training (learning) method for word vectors, in particular to an incremental learning method for a word vector model.

Background

[0002] The text representation model is an important basis for text mining, natural language processing, and information retrieval. A good text representation model must not only contain the semantic information of the corresponding language unit (word or document) but also allow the semantic similarity between texts to be measured directly through the representation. Word vectors have good semantic properties and are a common way to express word features. The value of each dimension of a word vector represents a certain semantic or grammatical feature, which many researchers use for semantic or grammatical analysis.

[0003] Literature (Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of W...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06N99/00
CPC: G06F40/253, G06F40/30, G06N20/00
Inventor: 潘博, 于重重, 赵霞, 秦勇
Owner: BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY