A text document representation method and device based on deep learning topic information enhancement

A topic-information-enhanced text document representation technology, applied in file management systems, database retrieval, and related fields. It addresses document representation vectors that lack semantic information, the neglect of either local context or global topic information, and the insufficient capture of the corpus's global topic information, and it has the effect of reducing topic redundancy.

Active Publication Date: 2019-01-18
SHANXI UNIV

AI Technical Summary

Problems solved by technology

However, an ordinary LSTM may not be sufficient to obtain the global topic information of the corpus.
[0004] The shortcomings of the above methods show the difficulties faced by the document representation learning task. When a model relies on the global topic information of the corpus, the context information within a document is often lost (for example, without context it is impossible to determine whether the word "apple" refers to a fruit or a technology company). When a model focuses on this local information, the global topic information (the correlation between documents) is ignored. In addition, there is no restriction mechanism between topics, which can easily make them similar to one another and reduce model performance (for example, the model may separate out redundant topic groups such as "economy", "entertainment", "chariots", and "warships").
All these defects leave the document representation vector short of some semantic information, which limits how well these vectors perform in other applications.
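One way to picture a restriction mechanism between topics is a penalty that grows as topic vectors point in similar directions. The sketch below is only an illustration under assumptions: the function names `cosine` and `redundancy_penalty`, the use of squared cosine similarity, and the toy vectors are all hypothetical and are not taken from the patent text.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def redundancy_penalty(topics):
    """Mean squared cosine similarity over all distinct pairs of topic vectors.

    Assumed form of the penalty (hypothetical): 0.0 when all topics are
    mutually orthogonal, 1.0 when all topics are parallel.
    """
    k = len(topics)
    if k < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += cosine(topics[i], topics[j]) ** 2
            pairs += 1
    return total / pairs

# Toy topic vectors: the second set contains two nearly parallel topics.
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

print(redundancy_penalty(distinct))
print(redundancy_penalty(redundant))
```

Adding a term like this to a training objective would push topic vectors apart; whether the patent uses cosine similarity or another measure is not stated in this excerpt.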


Examples

Embodiment Construction

[0033] In this embodiment, the experiments for the text document representation method based on deep learning topic information enhancement were run on a cluster computer of the School of Computer and Information Technology, Shanxi University. The cluster uses Gigabit Ethernet and an InfiniBand 2.5G network. Each node is configured with an eight-core CPU and 128 GB of memory. The CPU is an Intel Xeon E3-1230 v5 with a 3.4 GHz main frequency, and each node is equipped with two NVIDIA GTX 1080 high-performance graphics cards, which can perform large-scale matrix operations and deep learning model training.

[0034] As can be seen from Figures 1-7, the present invention is divided into several sub-models that process different kinds of semantic information; these are connected layer by layer and finally fused. The learning process mainly includes the following steps:

[0035] S1. For a document D = {w_1, w_2, ..., w_n} consisting of n words in a corpus containing K topics, perform...
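The text of step S1 is cut off here, so its actual operations are not known from this excerpt. Purely as an illustration of what preprocessing a document D into a word sequence can look like, the sketch below lowercases text, splits it into tokens, and maps each word to an integer id. The names `tokenize`, `build_vocab`, and `encode`, the `<unk>` entry, and the English toy corpus are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(corpus, min_count=1):
    """Map each word seen at least min_count times to an integer id.

    Id 0 is reserved for unknown words (a hypothetical convention).
    """
    counts = {}
    for text in corpus:
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1
    vocab = {"<unk>": 0}
    for word in sorted(counts):
        if counts[word] >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Turn a document into the id sequence for w_1, ..., w_n."""
    return [vocab.get(word, 0) for word in tokenize(text)]

corpus = ["Apple released a new phone.", "I ate an apple."]
vocab = build_vocab(corpus)
ids = encode("Apple ate a pear", vocab)
print(ids)  # [3, 4, 1, 0]; "pear" is not in the vocabulary
```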


Abstract

The invention discloses a text document representation method and device based on deep learning topic information enhancement. The method comprises the following steps: S1, a data preprocessing operation is carried out on the corpus documents, which are in text form. S2, a text sequence layer is designed, and the context information of each word in word order is embedded into the representation vector of each word in the document. S3, the sequence elements are transitioned to higher-level topic information through the attention layer. S4, in the topic layer, a representation of the current document D in all topic directions is generated. S5, the similarity between all the topic information is limited. S6, the topic representation vectors are fused into the semantic representation vector Rep of the document D at the representation layer. S7, the parameters of Rep are updated through a classifier and an objective function. The method can efficiently embed the contextual semantic information and the latent topic information of a text sequence into a document representation vector, and the representation vectors enhanced by the topic information can significantly improve the performance of text mining models that use Rep.
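Steps S3, S4, and S6 can be pictured with a small attention computation. The sketch below is a minimal illustration under assumptions, not the patented model: the per-word hidden states are given as toy values rather than produced by a trained sequence layer, each topic is represented by a hypothetical query vector, attention scores are plain dot products, and fusion is assumed to be concatenation. The names `topic_representations` and `fuse` are invented for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def topic_representations(hidden, topic_queries):
    """For each topic query, attend over the word states and return the weighted sum.

    hidden: n word vectors of dimension d (stand-ins for sequence-layer outputs).
    topic_queries: K vectors of dimension d, one per topic (assumed parameters).
    """
    dim = len(hidden[0])
    reps = []
    for query in topic_queries:
        weights = softmax([dot(h, query) for h in hidden])
        rep = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
        reps.append(rep)
    return reps

def fuse(topic_reps):
    """Fuse K topic vectors into one document vector by concatenation (an assumption)."""
    return [x for rep in topic_reps for x in rep]

# Toy document with n = 3 words, d = 2, and K = 2 topics.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_queries = [[5.0, 0.0], [0.0, 5.0]]

reps = topic_representations(hidden, topic_queries)
rep = fuse(reps)
print(len(rep))  # 4, i.e. K * d, independent of the document length n
```

Concatenation keeps the document vector at a fixed length of K * d regardless of how many words the document has, which matches the goal of a fixed-length representation; a weighted sum or a learned projection would be equally plausible fusion choices.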

Description

Technical field

[0001] The present invention relates to the field of computer text representation learning, in particular to a text document representation method based on deep learning topic information enhancement and a text document representation device based on deep learning topic information enhancement.

Background

[0002] A document-level, holistic grasp of text is an important requirement for many text processing tasks. Currently, this problem is generally addressed by text representation learning. The document-level text representation learning task is mainly concerned with constructing a method that converts text documents, based on their intrinsic semantic information, into representation vectors that computers can operate on directly. Specifically, a document in text form is represented as a fixed-length real-valued vector that encodes its semantics. Today, document representation learning has become a fundamental and wide...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/93
Inventor: 张文跃, 王素格, 李德玉
Owner: SHANXI UNIV