Topic embedding and document representation method based on network links and document content

A topic embedding and document representation technique based on network links, applicable to character and pattern recognition, text database clustering/classification, and unstructured text data retrieval. It addresses poor document representation caused by ambiguous semantics, and it offers efficient parameter updates and good scalability.

Active Publication Date: 2019-02-01
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

However, currently proposed methods for obtaining document representations through topic embedding use only the content of a document (that is, its words). Document content contains many semantically ambiguous terms (such as synonyms), and this ambiguity can easily lead to poor topic embeddings and document representations.



Examples


Embodiment Construction

[0030] The present invention will be further described below by specific examples. The embodiments of the present invention are intended to enable those skilled in the art to better understand the present invention, and do not limit the present invention in any way.

[0031] To reduce the influence of ambiguous semantics in traditional topic embedding methods and obtain higher-quality topic embeddings and document representations, the present invention uses a probabilistic graphical model to build a model that is highly interpretable and effectively fuses the content and links of a document network. To make the method fast and scalable, the variational expectation maximization algorithm adopted in the present invention is optimized. With the trained model, users can directly obtain high-quality topic embeddings and more accurate document representations. The invention has broad applicatio...



Abstract

The invention discloses a topic embedding and document representation method based on network links and document content. By introducing the topological link information of the document network, it effectively addresses the problem that semantically ambiguous terms (such as synonyms) degrade the topics in topic embedding models. The method combines the document content and the links in the data using a probabilistic graphical model and optimizes the model parameters by variational inference, which largely removes the negative influence of ambiguous semantics and yields more accurate topic embeddings and document representations. By combining the link relations and content information in the document network, the method effectively mitigates the poor topic embedding caused by ambiguous semantics (such as polysemy) in existing topic embedding models. The probabilistic graphical model makes the method more interpretable, and the variational expectation maximization algorithm makes parameter updates efficient and convergence fast, so the method can be applied to large-scale networks.
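The intuition behind adding link information can be illustrated with a small sketch. It is not the patented model: the probabilistic graphical model and its variational inference are not detailed in this summary, so the code swaps in a much simpler stand-in, namely blending each document's word distribution with the average of its linked neighbours. All names (`smooth_with_links`, `alpha`, the toy documents) are hypothetical.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw word counts into a word distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse word vectors."""
    dot = sum(p * b.get(w, 0.0) for w, p in a.items())
    na = math.sqrt(sum(p * p for p in a.values()))
    nb = math.sqrt(sum(p * p for p in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def smooth_with_links(doc_vecs, links, alpha=0.7):
    """Blend each document with the mean of its linked neighbours.

    Assumption: simple neighbour averaging, used only to illustrate why
    links help; it is not the model described in the patent.
    """
    neighbours = {d: set() for d in doc_vecs}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    smoothed = {}
    for d, vec in doc_vecs.items():
        nbrs = neighbours[d]
        if not nbrs:
            smoothed[d] = dict(vec)  # unlinked documents stay unchanged
            continue
        mixed = {w: alpha * p for w, p in vec.items()}
        share = (1.0 - alpha) / len(nbrs)
        for n in nbrs:
            for w, p in doc_vecs[n].items():
                mixed[w] = mixed.get(w, 0.0) + share * p
        smoothed[d] = mixed
    return smoothed


# Two linked documents that describe the same thing with synonyms,
# plus one unrelated, unlinked document.
docs = {
    "d1": normalize(Counter(["car", "engine"])),
    "d2": normalize(Counter(["automobile", "motor"])),
    "d3": normalize(Counter(["recipe", "flour"])),
}
links = [("d1", "d2")]

before = cosine(docs["d1"], docs["d2"])
smoothed = smooth_with_links(docs, links)
after = cosine(smoothed["d1"], smoothed["d2"])
print(before, after)
```

With content alone, the two synonym documents share no words and their similarity is zero; once the link is taken into account their representations overlap, while the unlinked document is left untouched.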

Description

Technical field

[0001] The invention belongs to the fields of machine learning, complex networks, and natural language processing. It proposes a new topic model that improves topic embedding (that is, the representation of topics in a low-dimensional space) in order to improve the low-dimensional representation of documents in document networks; specifically, it is a topic embedding and document representation method based on network links and document content.

Background art

[0002] In the field of natural language processing, topic models have many applications, such as mapping documents to a low-dimensional topic space, that is, using the topic distribution of a document to represent the document. Traditional topic models ignore the correlation between topics. To represent this correlation, topic embedding methods have been proposed for topic models in recent years, mainly to capture the relationships between topics. At present, most methods of obtaining document representation through...
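Representing a document by its topic distribution can be sketched as follows. This is a minimal illustration that assumes the topic-word distributions are already known and estimates one document's topic proportions with plain expectation maximization; it is not the variational expectation maximization procedure of the invention. The function name, the two toy topics, and the smoothing constant are hypothetical.

```python
def infer_topic_proportions(doc_words, topic_word, n_iters=50, eps=1e-12):
    """Estimate a document's topic proportions for fixed topics via EM.

    topic_word: list of dicts mapping word -> P(word | topic).
    eps: tiny probability for words a topic does not list (assumption).
    """
    k = len(topic_word)
    theta = [1.0 / k] * k
    if not doc_words:
        return theta  # no evidence: keep the uniform distribution
    for _ in range(n_iters):
        counts = [0.0] * k
        for w in doc_words:
            # E-step: responsibility of each topic for this word.
            resp = [theta[t] * topic_word[t].get(w, eps) for t in range(k)]
            total = sum(resp)
            for t in range(k):
                counts[t] += resp[t] / total
        # M-step: renormalise expected counts into proportions.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta


# Two hypothetical topics over disjoint vocabularies.
topics = [
    {"network": 0.5, "link": 0.5},
    {"word": 0.5, "text": 0.5},
]
doc = ["network", "link", "link", "text"]
theta = infer_topic_proportions(doc, topics)
print(theta)
```

The resulting vector has one entry per topic and sums to one, so a document of any length is mapped to a point in a space whose dimension equals the number of topics.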


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/35, G06K9/62
CPC: G06F40/247, G06F40/30, G06F18/24155, Y02D10/00
Inventors: 金弟, 黄剑涛
Owner: TIANJIN UNIV