Subtopic mining method

A subtopic mining technology, applied in the field of topic mining

Active Publication Date: 2017-06-13
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

None of the above-mentioned prior art overcomes this problem effectively.


Embodiment Construction

[0059] According to one embodiment of the present invention, an LDA subtopic mining method that suppresses background noise is provided. Compared with the original LDA algorithm, the subtopic mining method of this embodiment adds one special topic, namely the background corpus topic. In this embodiment, a word is considered to come either from the background corpus or from a differentiated topic model. The distribution of words in the background corpus does not change during the iterations of the topic model and can be calculated in advance from statistics of the overall corpus, whereas the probability distribution of words in the differentiated topic model must be calculated during the subsequent update iterations.
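As an illustration of calculating the background word distribution in advance, the following minimal sketch estimates it from raw word counts over a whole corpus. The function name `background_distribution`, the add-one smoothing, and the toy corpus are assumptions made for the example; the source does not specify how the statistics are computed.

```python
from collections import Counter


def background_distribution(corpus, smoothing=1.0):
    """Estimate a fixed word distribution from overall corpus counts.

    corpus: list of documents, each a list of word strings.
    smoothing: additive constant (an assumption, not from the source).
    """
    counts = Counter(word for doc in corpus for word in doc)
    vocab = sorted(counts)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


# Toy corpus: "the" is frequent everywhere, so it gets a high background weight.
corpus = [["the", "match", "the", "goal"], ["the", "vote", "poll"]]
background = background_distribution(corpus)
```

Because this distribution is computed once and never updated, words that are common across the whole corpus can be absorbed by the background topic instead of crowding the subtopics.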

[0060] Figure 1 shows a flow chart of the LDA subtopic mining method for suppressing background noise in this embodiment, which includes the following steps:

[0061] Step 1: ...

Abstract

The invention provides a subtopic mining method. The method comprises the following steps: (1) a topic value is initialized for each term of each document in a corpus; (2) based on the current topic values of all the terms of all the documents, the probability that each term in each document comes from each subtopic and the probability that it comes from a background module are calculated, and a topic value is then reassigned to each term in each document by a Gibbs sampling algorithm based on the calculated probabilities, wherein the probability that a term comes from the background module is calculated from the term distribution vector of the background module, which is obtained by statistics in advance and remains constant throughout the iteration process; and (3) if the iteration stop conditions are met, LDA subtopics are obtained from the current topic value information, and otherwise the method returns to step (2). The method can remarkably improve the topic mining effect for a feature article set.
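The three steps can be sketched as a collapsed Gibbs sampler with one extra, fixed background topic. This is a minimal sketch under stated assumptions, not the patented formula: the sampling weights follow the standard collapsed-Gibbs form for LDA, the stop condition is a fixed iteration count, and all names (`gibbs_with_background`, `bg_probs`, `alpha`, `beta`) are hypothetical.

```python
import random


def gibbs_with_background(docs, vocab_size, bg_probs, num_topics,
                          alpha=0.1, beta=0.01, iterations=50, seed=0):
    """docs: lists of word ids; bg_probs: fixed background word probabilities.

    Topic id num_topics denotes the background; ids below it are subtopics.
    """
    rng = random.Random(seed)
    bg = num_topics
    # Step 1: initialise a topic value for every term of every document.
    z = [[rng.randrange(num_topics + 1) for _ in doc] for doc in docs]
    doc_topic = [[0] * (num_topics + 1) for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            doc_topic[d][t] += 1
            if t != bg:
                topic_word[t][w] += 1
                topic_total[t] += 1
    # Step 2, repeated: resample the topic value of each term.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                doc_topic[d][t] -= 1
                if t != bg:
                    topic_word[t][w] -= 1
                    topic_total[t] -= 1
                # Subtopic weights (assumed standard collapsed-Gibbs form).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Background weight uses the fixed, precomputed distribution.
                weights.append((doc_topic[d][bg] + alpha) * bg_probs[w])
                t = rng.choices(range(num_topics + 1), weights=weights)[0]
                z[d][i] = t
                doc_topic[d][t] += 1
                if t != bg:
                    topic_word[t][w] += 1
                    topic_total[t] += 1
    # Step 3: the stop condition here is simply the iteration count.
    return z, topic_word


# Toy data: word id 0 is common everywhere and dominates the background.
docs = [[0, 1, 2, 0], [0, 3, 4, 3], [0, 1, 1, 2]]
bg_probs = [0.6, 0.1, 0.1, 0.1, 0.1]
z, topic_word = gibbs_with_background(docs, 5, bg_probs, num_topics=2)
```

Note that no count table is kept for the background topic's words: only `bg_probs` is consulted, which is what keeps the background term distribution constant across iterations.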

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a topic mining method.

Background

[0002] Topic mining and analysis has long been an important research direction in natural language processing, with wide application in public opinion analysis and other fields. The explosion of network information driven by the rapid development of online social networks leaves ordinary users at a loss when faced with the huge volume of rapidly generated information. As a result, there is a general trend toward classifying and refining information on online social networks. Under this trend, information is distributed in a more fine-grained and compact way, for example through tagging mechanisms such as HashTag on Weibo and the mechanism in WeChat public accounts for collecting special-topic articles from similar public accounts. The application dem...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3329, G06F40/205
Inventor: 李静远, 丘志杰, 刘悦, 程学旗, 王凤
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI