Self-encoding document representation method using random walk

A random walk and self-encoding technology, applied in special data processing applications, natural language data processing, instruments, etc., which can alleviate the serious high-dimensional sparsity problem

Inactive Publication Date: 2018-08-21
BEIJING INSTITUTE OF TECHNOLOGY


Problems solved by technology

The graphical model text representation method avoids the assumption in VSM that each dimension is independent, but increasing the number of nodes in the graphical model causes a serious high-dimensional sparsity problem; controlling the balance between the number of nodes and the representation effect is therefore very important for graphical models.



Embodiment Construction

[0030] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention is described in further detail below with reference to examples.

[0031] The specific process is:

[0032] Step 1, perform sparse topic encoding on the text set.

[0033] Step 1.1: Given the Boolean vector X^(i) of a text, the posterior probability p(t_i|X) can be generated by an encoding network composed of a nonlinear sigmoid function, in the form of formula (1).

[0034] p(t_i|X) ← f_θ(X) = σ(WX + b)  (1)
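As an illustrative sketch of the encoding step in formula (1) (not the patent's implementation; the dimensions, weights, and input vector below are hypothetical), the topic activation probabilities can be computed in Python with NumPy:

```python
import numpy as np

# Hypothetical dimensions: vocabulary size V, number of topics K.
V, K = 6, 3
rng = np.random.default_rng(0)

# Encoder parameters of formula (1): weight matrix W (K x V) and bias b.
W = rng.normal(scale=0.1, size=(K, V))
b = np.zeros(K)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """p(t_i | X) = sigma(W x + b): one activation probability per topic."""
    return sigmoid(W @ x + b)

# Boolean word-occurrence vector X^(i) of one text.
x = np.array([1, 0, 1, 1, 0, 0], dtype=float)
y = encode(x)
print(y.shape)  # one value per topic, each strictly between 0 and 1
```

Each component of `y` is the posterior probability of one topic given the text's word-occurrence vector.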

[0035] Step 1.2: Given the text topic code Y^(i), the posterior probability p(w_j|Y) of word w_j occurring in the word distribution Z^(i) can be generated by a decoding network composed of a nonlinear sigmoid function, in the form of formula (2).

[0036] p(w_j|Y) ← g_θ′(Y) = σ(W^T Y + c)  (2)
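A matching sketch of the decoding step in formula (2), which reuses the transposed encoder weights W^T (tied weights); again, all concrete values are hypothetical:

```python
import numpy as np

V, K = 6, 3
rng = np.random.default_rng(1)

# Tied weights: the decoder uses W^T, where W is the encoder matrix.
W = rng.normal(scale=0.1, size=(K, V))
c = np.zeros(V)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(y):
    """p(w_j | Y) = sigma(W^T y + c): one occurrence probability per word."""
    return sigmoid(W.T @ y + c)

# A hypothetical topic code Y^(i) (K values in [0, 1]).
y = np.array([0.9, 0.1, 0.5])
x_hat = decode(y)
print(x_hat.shape)  # one probability per vocabulary word
```

Tying the decoder to W^T halves the number of parameters and is a common choice in sparse autoencoders.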

[0037] Step 1.3: Use the Bernoulli cross entropy shown in formula (3) to measure the difference between the real wo...
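Formula (3) is truncated in the source; assuming the standard Bernoulli cross entropy between the true Boolean word vector X and the decoded probabilities, a minimal sketch of the reconstruction loss is:

```python
import numpy as np

def bernoulli_cross_entropy(x, x_hat, eps=1e-12):
    """Sum over words of -[x log x_hat + (1 - x) log(1 - x_hat)].

    x     : true Boolean word-occurrence vector (0/1 entries)
    x_hat : decoded word probabilities p(w_j | Y) from formula (2)
    eps   : clipping constant to keep the logarithms finite
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

x = np.array([1.0, 0.0, 1.0])
perfect = bernoulli_cross_entropy(x, np.array([1.0, 0.0, 1.0]))
poor = bernoulli_cross_entropy(x, np.array([0.5, 0.5, 0.5]))
print(perfect < poor)  # a perfect reconstruction incurs (near) zero loss
```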



Abstract

The invention relates to a self-encoding document representation method using random walk, belonging to the fields of natural language processing and machine learning. The goal is to solve the text topic modeling problem. A self-encoding network is adopted: for a given text set, a sparse self-encoding network is first used to construct a sparse topic encoding of each text; a text neighbor graph is then constructed from a text similarity measure, a random walk structure is generated by applying a low-rank constraint to the neighbor graph, and the weighting coefficients of the local neighbor texts are calculated from the conditional access probabilities of the random walk structure; finally, the weighted sparse topic encodings of the local neighbor texts embed the intrinsic geometric structure characterizing the text manifold, which is fused into the training of the self-encoding network as a regularization constraint term, and a parameterized topic encoding network is established to perform topic modeling on texts outside the sample. The method has high accuracy and operational efficiency and can model the topics of out-of-sample texts. It is suited to text topic modeling tasks that demand high precision, promotes the development of text representation, and has good application and promotion value.
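The neighbor-weighting step described above can be illustrated with a toy sketch (the similarity matrix, topic codes, and dimensions are hypothetical, and the low-rank constraint is omitted): row-normalizing a text similarity graph yields one-step transition probabilities of a random walk, which then weight the topic codes of each text's local neighbors.

```python
import numpy as np

# Hypothetical pairwise similarities among 4 texts (zero diagonal:
# a text is not its own neighbor).
S = np.array([
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
])

# Row-normalize: P[i, j] is the probability that a random walk at
# text i steps to text j, i.e. the weighting coefficient of neighbor j.
P = S / S.sum(axis=1, keepdims=True)

# Hypothetical sparse topic codes for the 4 texts (K = 2 topics).
Y = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.7],
              [0.1, 0.9]])

# Locally weighted embedding of text 0: the walk-probability-weighted
# average of its neighbors' topic codes.
neighbor_code = P[0] @ Y
print(neighbor_code.shape)
```

In the method, a term penalizing the distance between each text's own code and this weighted neighbor code would serve as the regularization constraint during autoencoder training.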

Description

technical field

[0001] The invention relates to a locally weighted embedding regularized self-encoding text topic modeling method, which belongs to the fields of natural language processing and machine learning.

Background technique

[0002] Text topic modeling discovers the explanatory factors hidden behind a text set by constructing the probability generation relationship between topics (hidden variables) and words (observed variables), and constructs low-dimensional topic encodings of texts based on the probability dependencies between variables, so as to efficiently store and represent the semantic information of the text. Text topic modeling has good explanatory properties and a solid theoretical foundation. It has received wide attention and has been applied to many important natural language tasks, such as sentiment analysis, clustering, document retrieval, and statistical machine translation.

[0003] An excellent text representation needs to meet three characteristics: 1. Lo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30, G06F17/22
CPC: G06F16/3344, G06F40/126
Inventor: 罗森林, 赵一飞, 潘丽敏, 魏超
Owner: BEIJING INSTITUTE OF TECHNOLOGY