Network hot event detection method based on text classification and clustering analysis

A technique combining cluster analysis and text classification, applicable to text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses the limited efficiency and accuracy of clustering-only methods, improving both while reducing time and space complexity and facilitating text dimensionality reduction.

Active Publication Date: 2014-12-24
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a method for discovering network hotspot events based on text classification and cluster analysis, which addresses the limited efficiency and accuracy of traditional hotspot event discovery methods that rely on clustering alone.



Embodiment Construction

[0029] The invention will be described in further detail below in conjunction with the accompanying drawings.

[0030] As shown in Figure 1, the present invention proposes a network hotspot discovery method based on text classification and clustering, comprising the following steps:

[0031] Step 1: Use the KNN classification method to classify the test text;

[0032] Step 1-1: Construct a training corpus (DTrain) and a test corpus (DTest), then use the training corpus to extract feature words and perform feature selection. The training set uses an existing Chinese corpus published on the Internet. The test samples can be obtained from BBS forums and the news pages of portal websites: a webpage acquisition module searches for and downloads the required webpages, and a webpage cleaning module cleans the downloaded documents. Advertisements and other interference information are processed, and the main conte...
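The KNN classification named in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes texts are already represented as sparse term-weight vectors (Python dicts), uses cosine similarity and majority voting, and the names `cosine`, `knn_classify`, `d_train`, `d_test`, the value `k=3`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(test_vec, train, k=3):
    """Return the majority label among the k training vectors most similar
    to test_vec. `train` is a list of (vector, label) pairs."""
    ranked = sorted(train, key=lambda item: cosine(test_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for DTrain and one DTest document (assumed data).
d_train = [
    ({"match": 2.0, "goal": 1.0}, "sports"),
    ({"team": 1.0, "goal": 2.0}, "sports"),
    ({"stock": 2.0, "market": 1.0}, "finance"),
    ({"market": 2.0, "bank": 1.0}, "finance"),
]
d_test = {"goal": 1.0, "team": 1.0}

label = knn_classify(d_test, d_train, k=3)
print(label)
```

In this toy run the two most similar training vectors both carry the "sports" label, so the vote assigns that class to the test document.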


Abstract

The invention discloses a network hot event detection method based on text classification and clustering analysis. The method addresses the limited efficiency and accuracy of existing network hot event detection methods based on clustering analysis. Using a training corpus, feature words are selected for each class of documents through feature extraction and feature selection. Each training text and test text is represented as a vector in the feature space using the vector space model, the weight of each dimension of the vector is determined using the TF-IDF (term frequency-inverse document frequency) method, and each test text is then classified. The classified test texts of each class are clustered separately to obtain the hot cluster of each class; further analysis yields the feature words representing each hot event, and the part of speech and other aspects of each feature word are then analyzed. A description of each hot event is generated using relevant language knowledge and the necessary linguistic organization. The method can effectively improve the detection efficiency and accuracy of hot events.
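The weighting and per-class clustering described in the abstract can be sketched as below. Several assumptions apply: TF-IDF is taken in one common form (relative term frequency times log(N/df)), and since this excerpt does not name the clustering algorithm, a simple single-pass threshold clustering is swapped in purely for illustration. The function names, the threshold of 0.2, and the toy documents (standing in for test texts already assigned to one class) are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    with weight = (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.2):
    """Assumed clustering step: put each document into the cluster holding its
    most similar member if that similarity reaches the threshold, otherwise
    start a new cluster. Clusters are lists of document indices."""
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(cosine(vec, vectors[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Toy test texts, assumed to have been classified into the same class.
docs = [
    ["earthquake", "rescue", "relief", "city"],
    ["earthquake", "rescue", "relief", "team"],
    ["concert", "ticket", "sale", "price"],
]
vectors = tf_idf(docs)
clusters = single_pass(vectors)
hot_cluster = max(clusters, key=len)  # the largest cluster is treated as "hot"
print(clusters, hot_cluster)
```

In the full method this clustering would run once per class on the texts assigned to that class; treating the largest cluster as the hot cluster is likewise an assumption of this sketch.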

Description

Technical field

[0001] The invention relates to the technical field of text mining, in particular to a method for discovering network hotspot events based on text classification and cluster analysis.

Background

[0002] The development of the Internet provides an ideal channel for the public to express their inner emotions and attitudes. People can express their views and opinions based on the news. Hot events refer to events that arouse people's great concern in a certain period of time and in a certain area, that is, public events that attract a certain amount of public attention. Emergencies are a very important part of network hotspot events. Emergencies refer to events that form suddenly, cause huge property losses, a large number of casualties, and have a serious impact on people's daily life. Government departments need to closely monitor sudden public opinion information on the Internet at any time, hoping to grasp and track the latest social hotspots in ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 成卫青, 范恒亮, 卢艳红
Owner: NANJING UNIV OF POSTS & TELECOMM