Multi-source network public opinion theme mining method based on improved hierarchical clustering

A technology based on hierarchical clustering of multi-source network data, applied in text database clustering/classification, special data processing applications, unstructured text data retrieval, and related areas. It addresses the lack of hierarchical topic mining and helps guide the trend of network public opinion.

Active Publication Date: 2019-09-10
BEIJING UNIV OF POSTS & TELECOMM
Cites: 6 · Cited by: 5

AI Technical Summary

Problems solved by technology

Few studies address public opinion topics, or characteristic public opinion topics, across multi-source network platforms. Most research focuses on the effects and applications of topic mining, and hierarchical topic mining remains largely unstudied.

Detailed Description of Embodiments

[0040] As shown in Figure 1, the present invention provides a multi-source network public opinion topic mining method based on improved hierarchical clustering, which comprises the following steps:

[0041] Step 1: acquire word vectors. The traditional vector space model builds vectors from words and their frequencies, ignoring contextual relationships and semantic information. Because most content on network platforms is short text with a limited vocabulary, heavy noise, and highly colloquial language, vectorizing it with the vector space model produces high-dimensional vectors that still fail to represent topic features effectively. By contrast, a neural network language model applies neural networks to text vector representation: it is trained on contextual relationships and semantic information and represents each word as a low-dimensional real-valued vector, which avoids the curse of dimensionality;
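The contrast drawn in Step 1 can be illustrated with a toy example. This is a minimal sketch, not the patent's implementation: the two posts and the 3-dimensional `toy_embeddings` are hypothetical stand-ins for vectors a trained neural language model would produce, the helper names (`vsm_vector`, `cosine`, `embed_sentence`) are illustrative, and averaging word vectors into a sentence vector is an assumed, commonly used choice, since the source does not say how sentences are vectorized.

```python
import math
from collections import Counter


def vsm_vector(tokens, vocab):
    """Traditional vector space model: one term-frequency entry per vocabulary word."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embed_sentence(tokens, embeddings):
    """Assumed sentence vector: mean of the dense vectors of known tokens (None if none are known)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Two hypothetical short posts about the same event that share no words.
post_a = ["price", "hike", "angers", "commuters"]
post_b = ["fare", "increase", "upsets", "passengers"]
vocab = sorted(set(post_a) | set(post_b))  # VSM dimension = vocabulary size

# Hypothetical 3-d vectors standing in for embeddings a neural language model would learn.
toy_embeddings = {
    "price": [0.9, 0.1, 0.0], "fare": [0.8, 0.2, 0.0],
    "hike": [0.7, 0.0, 0.3], "increase": [0.6, 0.1, 0.3],
    "angers": [0.0, 0.9, 0.1], "upsets": [0.1, 0.8, 0.1],
    "commuters": [0.2, 0.1, 0.9], "passengers": [0.3, 0.1, 0.8],
}

vsm_sim = cosine(vsm_vector(post_a, vocab), vsm_vector(post_b, vocab))
dense_sim = cosine(embed_sentence(post_a, toy_embeddings),
                   embed_sentence(post_b, toy_embeddings))
print("VSM:", len(vocab), "dims, similarity", vsm_sim)
print("Dense:", len(toy_embeddings["price"]), "dims, similarity", round(dense_sim, 3))
```

Two posts about the same event with no overlapping words score 0 under the vector space model, whose dimension grows with the vocabulary. The dense sentence vectors keep a fixed, small dimension and, given embeddings in which related words lie close together (as the toy values are chosen to do), remain similar.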

[...

Abstract

The invention discloses a multi-source network public opinion topic mining method based on improved hierarchical clustering, in the field of topic mining. The method comprises the following steps: 1, obtaining word vectors; 2, preprocessing all the data; 3, vectorizing the sentences of the total sample data preprocessed in step 2; 4, performing semi-supervised hierarchical topic mining on the sentence vectors; and 5, outputting a dendrogram. The method exploits the hierarchical information inherent in hierarchical clustering and, on that basis, optimizes the use of prior knowledge, the vectorization of model inputs, and the screening of high-quality topics. As a result, it can be applied effectively to topic mining of short texts from multi-source network platforms, which cover a wide range of topics, contain heavy noise, and lack grammatical norms.
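The abstract does not detail the improved clustering algorithm. The sketch below illustrates steps 4 and 5 under stated assumptions: plain average-linkage agglomerative clustering with cosine distance over sentence vectors, with "prior knowledge" modelled as hypothetical `must_link` pairs of sentences merged first. The function names and sample vectors are illustrative, this is not the patent's improved method, and the high-quality topic screening step is omitted.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def agglomerative(vectors, must_link=()):
    """Average-linkage agglomerative clustering (illustrative sketch).

    Returns a list of merges (cluster_a, cluster_b, distance, size). Original
    samples are clusters 0..n-1 and each merge creates cluster n, n+1, ...,
    mirroring the row layout of a SciPy linkage matrix. Pairs in `must_link`
    (hypothetical prior knowledge) are merged first at distance 0.
    """
    n = len(vectors)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member sample indices
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    merges = []
    next_id = n

    def find(sample):
        # Id of the cluster that currently contains `sample`.
        return next(cid for cid, members in clusters.items() if sample in members)

    def merge(a, b, d):
        nonlocal next_id
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1

    # Semi-supervised step (assumption): honour must-link constraints first.
    for i, j in must_link:
        a, b = find(i), find(j)
        if a != b:
            merge(a, b, 0.0)

    # Unsupervised step: repeatedly merge the two closest clusters.
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                ma, mb = clusters[ids[x]], clusters[ids[y]]
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(dist[p][q] for p in ma for q in mb) / (len(ma) * len(mb))
                if best is None or d < best[2]:
                    best = (ids[x], ids[y], d)
        merge(*best)
    return merges


# Hypothetical sentence vectors: two posts per topic plus one outlier.
sentence_vectors = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.2], [0.1, 0.8, 0.3],
    [0.2, 0.1, 0.9],
]
for a, b, d, size in agglomerative(sentence_vectors):
    print(f"merge {a} + {b} at distance {d:.3f} -> size {size}")
```

Each merge row follows the `[idx1, idx2, distance, count]` layout of a SciPy linkage matrix, so the list could be converted to an array and drawn with `scipy.cluster.hierarchy.dendrogram` to produce the step 5 output. The naive pairwise search scales poorly with the number of sentences and is meant only for small illustrations.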

Description

Technical field

[0001] The invention relates to the field of topic mining, in particular to a multi-source network public opinion topic mining method based on improved hierarchical clustering.

Background

[0002] Internet public opinion refers to the differing views on social issues that circulate on the Internet; it is one form of public opinion. In recent years, its impact on political order and social stability has grown steadily, and several major online public opinion events have shown the significant role the Internet plays in social supervision. As the Internet has developed, network platforms have quickly become the main source of Internet public opinion owing to their large user base, openness, and rapid information dissemination. A public opinion topic is a highly abstract summary of the texts posted by users. If you understand the topic of public opinion, you ca...

Application Information

Patent Type & Authority: Application (China)
IPC (8th edition): G06F16/35
CPC: G06F16/353; G06F16/358
Inventors: 吴旭, 颉夏青, 蔡跃, 许晋, 方滨兴, 陆月明
Owner BEIJING UNIV OF POSTS & TELECOMM