Self-encoding document representation method using random walk

A random walk and self-encoding technology, applied in special data processing applications, natural language data processing, instruments, etc., which can alleviate the serious high-dimensional sparsity problem

Inactive Publication Date: 2018-08-21
BEIJING INSTITUTE OF TECHNOLOGY


Problems solved by technology

The graphical model text representation method avoids the assumption in VSM that each dimension is independent, but increasing the number of nodes in the graphical model causes a serious high-dimensional sparsity problem; controlling the balance between the number of nodes and the representation effect is therefore very important for graphical models.



Embodiment Construction

[0030] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention is described in further detail below with reference to examples.

[0031] The specific process is:

[0032] Step 1, perform sparse topic encoding on the text set.

[0033] Step 1.1: Given the Boolean vector X^(i) of a text, the posterior probability p(t_i|X) can be generated by an encoding network composed of a nonlinear sigmoid function, in the form of formula (1).

[0034] p(t_i|X) ← f_θ(X) = σ(WX + b)  (1)
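As an illustrative sketch of the encoding step in formula (1) (not the patent's implementation; the dimensions, weights, and input vector below are hypothetical), the topic activation probabilities can be computed in Python with NumPy:

```python
import numpy as np

# Hypothetical dimensions: vocabulary size V, number of topics K.
V, K = 6, 3
rng = np.random.default_rng(0)

# Encoder parameters of formula (1): weight matrix W (K x V) and bias b.
W = rng.normal(scale=0.1, size=(K, V))
b = np.zeros(K)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """p(t_i | X) = sigma(W x + b): one activation probability per topic."""
    return sigmoid(W @ x + b)

# Boolean word-occurrence vector X^(i) of one text.
x = np.array([1, 0, 1, 1, 0, 0], dtype=float)
y = encode(x)
print(y.shape)  # one value per topic, each strictly between 0 and 1
```

Each component of `y` is the posterior probability of one topic given the text's word-occurrence vector.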

[0035] Step 1.2: Given the text topic code Y^(i), the posterior probability p(w_j|Y) of word w_j occurring in the word distribution Z^(i) can be generated by a decoding network composed of a nonlinear sigmoid function, in the form of formula (2).

[0036] p(w_j|Y) ← g_θ′(Y) = σ(W^T Y + c)  (2)
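A matching sketch of the decoding step in formula (2), which reuses the transposed encoder weights W^T (tied weights); again, all concrete values are hypothetical:

```python
import numpy as np

V, K = 6, 3
rng = np.random.default_rng(1)

# Tied weights: the decoder uses W^T, where W is the encoder matrix.
W = rng.normal(scale=0.1, size=(K, V))
c = np.zeros(V)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(y):
    """p(w_j | Y) = sigma(W^T y + c): one occurrence probability per word."""
    return sigmoid(W.T @ y + c)

# A hypothetical topic code Y^(i) (K values in [0, 1]).
y = np.array([0.9, 0.1, 0.5])
x_hat = decode(y)
print(x_hat.shape)  # one probability per vocabulary word
```

Tying the decoder to W^T halves the number of parameters and is a common choice in sparse autoencoders.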

[0037] Step 1.3: Use the Bernoulli cross entropy shown in formula (3) to measure the difference between the real wo...
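Formula (3) is truncated in the source; assuming the standard Bernoulli cross entropy between the true Boolean word vector X and the decoded probabilities, a minimal sketch of the reconstruction loss is:

```python
import numpy as np

def bernoulli_cross_entropy(x, x_hat, eps=1e-12):
    """Sum over words of -[x log x_hat + (1 - x) log(1 - x_hat)].

    x     : true Boolean word-occurrence vector (0/1 entries)
    x_hat : decoded word probabilities p(w_j | Y) from formula (2)
    eps   : clipping constant to keep the logarithms finite
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

x = np.array([1.0, 0.0, 1.0])
perfect = bernoulli_cross_entropy(x, np.array([1.0, 0.0, 1.0]))
poor = bernoulli_cross_entropy(x, np.array([0.5, 0.5, 0.5]))
print(perfect < poor)  # a perfect reconstruction incurs (near) zero loss
```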



Abstract

The invention relates to a self-encoding document representation method using random walk, belonging to the fields of natural language processing and machine learning. The goal is to solve the text topic modeling problem. A self-encoding network is adopted: for a given text set, a sparse self-encoding network is first used to construct a sparse topic encoding of each text; a text neighbor graph is then constructed from a text similarity measure, a random walk structure is generated by applying a low-rank constraint to the neighbor graph, and the weighting coefficients of the local neighbor texts are calculated from the conditional access probabilities of the random walk structure; finally, the weighted sparse topic encodings of the local neighbor texts embed the intrinsic geometric structure characterizing the text manifold, which is fused into the training of the self-encoding network as a regularization constraint term, and a parameterized topic encoding network is established to perform topic modeling on texts outside the sample. The method has high accuracy and operational efficiency and can model the topics of out-of-sample texts. It is suited to text topic modeling tasks that demand high precision, promotes the development of text representation, and has good application and promotion value.
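The neighbor-weighting step described above can be illustrated with a toy sketch (the similarity matrix, topic codes, and dimensions are hypothetical, and the low-rank constraint is omitted): row-normalizing a text similarity graph yields one-step transition probabilities of a random walk, which then weight the topic codes of each text's local neighbors.

```python
import numpy as np

# Hypothetical pairwise similarities among 4 texts (zero diagonal:
# a text is not its own neighbor).
S = np.array([
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
])

# Row-normalize: P[i, j] is the probability that a random walk at
# text i steps to text j, i.e. the weighting coefficient of neighbor j.
P = S / S.sum(axis=1, keepdims=True)

# Hypothetical sparse topic codes for the 4 texts (K = 2 topics).
Y = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.7],
              [0.1, 0.9]])

# Locally weighted embedding of text 0: the walk-probability-weighted
# average of its neighbors' topic codes.
neighbor_code = P[0] @ Y
print(neighbor_code.shape)
```

In the method, a term penalizing the distance between each text's own code and this weighted neighbor code would serve as the regularization constraint during autoencoder training.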

Description

technical field

[0001] The invention relates to a locally weighted embedding regularized self-encoding text topic modeling method, which belongs to the fields of natural language processing and machine learning.

Background technique

[0002] Text topic modeling discovers the explanatory factors hidden behind a text set by constructing the probability generation relationship between topics (hidden variables) and words (observed variables), and constructs low-dimensional topic encodings of texts based on the probability dependencies between variables, so as to efficiently store and represent the semantic information of the text. Text topic modeling has good explanatory properties and a solid theoretical foundation. It has received wide attention and has been applied to many important natural language tasks, such as sentiment analysis, clustering, document retrieval, and statistical machine translation.

[0003] An excellent text representation needs to meet three characteristics: 1. Lo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30, G06F17/22
CPC: G06F16/3344, G06F40/126
Inventor: 罗森林, 赵一飞, 潘丽敏, 魏超
Owner: BEIJING INSTITUTE OF TECHNOLOGY