Low-rank decomposition based fine topic mining method

A topic mining technology based on low-rank decomposition, applied in special data processing applications, instruments, electrical digital data processing, etc., which addresses problems such as the inability to describe the focal points of a text

Inactive Publication Date: 2015-04-08
INST OF ELECTRONICS CHINESE ACAD OF SCI
2 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a fine topic mining method based on low-rank decomposition, which aims to solve the problem...



Examples


Embodiment Construction

[0039] To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

[0040] Figure 1 shows the implementation flow of the fine topic mining method based on low-rank decomposition provided by the embodiment of the present invention.

[0041] The fine topic mining method based on low-rank decomposition includes:

[0042] Step S101, performing word segmentation and stop-word removal on the original corpus text;

[0043] Step S102, generating a topic matrix from the word frequency matrix obtained by preprocessing;

[0044] Step S103, decomposing the topic matrix, so that the original corpus text is separated into topic background and keywords.
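
As a rough illustration of step S101 and of the word frequency matrix that step S102 takes as input, the following minimal Python sketch may help. It is not the patented implementation: the regex tokenizer stands in for a real word segmenter (a Chinese corpus would need a dedicated one), and the stop-word list and the function names `preprocess` and `word_frequency_matrix` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text):
    """Step S101 (sketch): split text into words and drop stop words."""
    # A regex tokenizer is an assumption standing in for a real word segmenter.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequency_matrix(docs):
    """Build a documents-by-vocabulary count matrix (the input to step S102)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # missing words count as 0
        rows.append([counts[w] for w in vocab])
    return vocab, rows


docs = ["The price of oil is rising", "Oil and gas markets in the news"]
vocab, matrix = word_frequency_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is one document and each column one vocabulary word; a topic model such as LDA would then be fitted on this matrix to obtain the topic matrix of step S102.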

[0045] In the embodiment o...


Abstract

The invention discloses a fine topic mining method based on low-rank decomposition. The method comprises the following steps: performing word segmentation and stop-word removal on an original corpus text; generating a topic matrix from the word frequency matrix obtained through preprocessing; and decomposing the topic matrix so that the original corpus text is separated into topic background and keywords. The method proposes a fine-grained model for expressing text content without introducing a new latent variable. The model uses LDA (Latent Dirichlet Allocation) as its basis to extract the topic distribution of a text collection and, because a text topic is made up of different aspects, introduces an improved form of principal component analysis, robust principal component analysis, to decompose each topic into a low-rank part and a sparse part. The low-rank part represents the common words under the topic, and the sparse part captures the fine-grained descriptions of the topic from different angles. A text can thus be expressed at a fine-grained level, which effectively solves the problem that conventional topic models can only mine the topic background of a text and cannot describe its points of emphasis in detail.
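
For reference, robust principal component analysis is commonly posed as the convex program below (often called principal component pursuit). The source does not state which formulation or solver the method uses, so the symbols here are assumptions: M is the topic matrix, L its low-rank part, S its sparse part, and λ > 0 a trade-off weight.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad M = L + S
```

Here \|L\|_{*} is the nuclear norm (the sum of the singular values of L), which favors low rank, and \|S\|_{1} is the entrywise \ell_1 norm, which favors sparsity. Under this reading, L would carry the common words of a topic and S the few entries that distinguish one angle of the topic from another.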

Description

Technical field

[0001] The invention belongs to the technical field of text processing and mining, and in particular relates to a fine topic mining method based on low-rank decomposition.

Background

[0002] Mining hidden topics in text collections is an important research area in text mining. Topic models, represented by Latent Dirichlet Allocation (LDA), have been widely used in recent years. These models transform the high-dimensional, sparse word frequency matrix representation into a low-dimensional semantic space representation, that is, a topic space representation, and thereby reduce dimensionality. This is widely used in text modeling, text classification and information extraction.

[0003] A real-world corpus can be divided by content into topics such as economics, politics, entertainment and health. However, in practical applications, each topic needs to be further divided i...


Application Information

IPC(8): G06F17/30
CPC: G06F16/374, G06F16/3335
Inventor: 孙显, 许光銮, 付琨, 胡岩峰, 郑歆慰, 田璟, 刁文辉
Owner INST OF ELECTRONICS CHINESE ACAD OF SCI