
Microblog topic detection method based on a knowledge-integrated topic model

A topic model and topic discovery technology, applied in the fields of instrumentation, unstructured text data retrieval, and calculation. It addresses inconsistent topics, inaccurate clustering results, and high-dimensional, sparse feature vectors, with the aim of mitigating data sparsity, coping with very large data volumes, and improving clustering accuracy.

Status: Inactive
Publication Date: 2017-04-19
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

When only traditional clustering methods are used for microblog topic discovery, the feature vectors are high-dimensional and sparse, which makes the clustering results inaccurate.
In recent years, many studies have combined LDA-based topic models with clustering algorithms. However, the standard LDA topic model often produces many internally inconsistent topics.


Examples


Embodiment

[0031] Taking Weibo posts from January to February 2016 as an example, first obtain the Weibo data and preprocess the text: parse the HTML, extract the Weibo text, and remove stop words.
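The preprocessing step in [0031] could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the names `strip_html` and `preprocess` are hypothetical, and the default whitespace tokenizer is a stand-in, since real Weibo text would need a Chinese word segmenter passed in as `tokenize`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw_html):
    """Return the visible text of an HTML fragment with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def preprocess(raw_html, stop_words, tokenize=str.split):
    """Parse HTML, tokenize the text, and drop stop words.

    `tokenize` is an assumption: whitespace splitting only works for
    space-delimited text, so Chinese posts need a word segmenter here.
    """
    text = strip_html(raw_html)
    return [token for token in tokenize(text) if token not in stop_words]
```

For example, `preprocess("<p>the quick <b>brown</b> fox</p>", {"the"})` keeps only the content words of the fragment.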

[0032] Then set the parameters of the model, as shown in the following table:

[0033]

Parameter   Meaning
a           Retweet weighting factor
b           Comment weighting factor
k           Number of topics
K           Number of clusters
mu 1        Knowledge rule threshold 1
mu 2        Knowledge rule threshold 2
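The parameters listed above could be grouped into a single configuration object, as in the sketch below. The class name `ModelParams` and the function `post_score` are hypothetical. The source does not state how the retweet and comment weighting factors are combined, so the linear score shown here is only an assumed form, and the values in the usage note are illustrative rather than taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    a: float    # retweet weighting factor
    b: float    # comment weighting factor
    k: int      # number of topics
    K: int      # number of clusters
    mu1: float  # knowledge rule threshold 1
    mu2: float  # knowledge rule threshold 2

    def __post_init__(self):
        # Topic and cluster counts only make sense as positive integers.
        if self.k < 1 or self.K < 1:
            raise ValueError("k and K must be positive")


def post_score(params, retweets, comments):
    """Assumed linear score; the source does not give the actual formula."""
    return params.a * retweets + params.b * comments
```

With illustrative values such as `ModelParams(a=0.5, b=0.25, k=20, K=5, mu1=0.1, mu2=0.2)`, a post with 4 retweets and 8 comments would receive a score of 4.0 under this assumed formula.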

[0034] Run the knowledge-integrated topic model to obtain its output, then perform topic discovery with a two-layer hybrid clustering that combines K-center clustering and hierarchical clustering. The resulting hot topics are as follows:
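One possible reading of the two-layer hybrid clustering in [0034] is sketched below. The source does not describe how the two layers are combined, so everything here is an assumption: the first layer runs a plain alternating K-medoids (one common form of K-center clustering) on the document vectors, and the second layer merges the resulting clusters by single-linkage agglomeration over their medoids until the closest pair is farther apart than a distance threshold. Euclidean distance, the deterministic initialization, and the names `k_medoids`, `merge_medoids`, and `hybrid_cluster` are all illustrative choices.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Layer 1: alternating K-medoids, seeded with the first k points.

    Assumes the points are distinct, so every medoid owns at least itself.
    Returns a dict mapping each medoid index to its member indices.
    """
    medoids = list(range(k))
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance to its members.
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(points[c], points[j])
                                           for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


def merge_medoids(points, clusters, threshold):
    """Layer 2: single-linkage agglomeration over the layer-1 medoids."""
    groups = [([m], list(members)) for m, members in clusters.items()]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(math.dist(points[a], points[b])
                        for a in groups[i][0] for b in groups[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # the closest remaining groups are too far apart to merge
        medoids_j, members_j = groups.pop(j)  # j > i, so index i stays valid
        groups[i][0].extend(medoids_j)
        groups[i][1].extend(members_j)
    return [sorted(members) for _, members in groups]


def hybrid_cluster(points, k, threshold):
    """Run both layers and return the final clusters as lists of indices."""
    return merge_medoids(points, k_medoids(points, k), threshold)
```

In the patent's setting the input vectors would presumably be the per-document topic features produced by the topic model, and `k` would correspond to the cluster count K from the parameter table. The merge threshold is not a parameter named in the source.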

[0035]

[0036]


Abstract

The invention discloses a microblog topic detection method based on a knowledge-integrated topic model and relates to the field of natural language processing. The method comprises the following steps: first, acquiring microblog data and performing scoring, Chinese word segmentation, and stop-word filtering on the microblog text; second, modeling the microblog text with a knowledge-integrated topic model; and finally, detecting topics with a hybrid clustering that combines K-center clustering and hierarchical clustering. The knowledge-integrated topic modeling method mitigates the data sparsity of short microblog text collections, allows topic feature vectors to be defined accurately, and improves clustering accuracy.

Description

Technical field

[0001] The invention relates to a microblog topic discovery method, in particular to a microblog topic discovery method based on a knowledge-integrated topic model.

Background

[0002] With the rapid development of the mobile Internet, more and more people use Weibo to express their opinions and to keep abreast of news from around the world. Because the number of microblog texts is growing rapidly and the information is complex, it is difficult for users to browse all microblog information. Obtaining hot topics therefore has important research significance: on the one hand, it helps users quickly understand the hot spots of concern in various fields of society; on the other hand, it provides guidance for public opinion monitoring.

[0003] In recent years, the methods of microblog opinion discovery are mainly divided into two types: text clustering algorithms and topic mod...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/35
Inventor: 夏睿, 尹通
Owner: NANJING UNIV OF SCI & TECH