Rapid detection method for hot issues of timing sequence massive network news

A hot-event detection technology, applied in fields such as instrumentation, computing, and electrical digital data processing, that aims to improve detection accuracy and system efficiency.

Status: Inactive
Publication date: 2012-11-14
Owner: PEKING UNIV
Cites: 3; cited by: 22

AI Technical Summary

Problems solved by technology

However, the problem with these two types of methods is that the number k of hotspo


Examples


Embodiment Construction

[0027] The present invention is further described below by way of example.

[0028] Suppose online news reports are collected over three consecutive days. On the first day, 100 articles are about earthquakes, 60 about the college entrance examination, 30 about national defense, 10 about foreign affairs, and 5 about the economy. On the second day, 70 articles are about health care, 50 about earthquakes, 20 about national defense, and 30 have unclear topics. On the third day, 80 articles are about health care, 50 about tourism, and 10 have unclear topics. Neither the total number of hot news events nor the event type of each article is known in advance.
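Each of the three days in this example can be treated as one block of the news stream. The following is a minimal sketch of that grouping, assuming a one-day time interval and articles supplied as `(publish_date, text)` pairs; the function name `split_into_blocks` and the input layout are hypothetical and do not come from the patent text.

```python
from collections import defaultdict
from datetime import date


def split_into_blocks(articles):
    """Group (publish_date, text) pairs into per-day blocks, in date order.

    Assumption: one block per calendar day. The source only says the
    sequence is divided "according to time intervals".
    """
    blocks = defaultdict(list)
    for publish_date, text in articles:
        blocks[publish_date].append(text)
    # Return the blocks as a list ordered by date, oldest first.
    return [blocks[d] for d in sorted(blocks)]


if __name__ == "__main__":
    sample = [
        (date(2012, 1, 1), "earthquake report"),
        (date(2012, 1, 2), "health care report"),
        (date(2012, 1, 1), "college entrance examination report"),
        (date(2012, 1, 3), "tourism report"),
    ]
    for index, block in enumerate(split_into_blocks(sample), start=1):
        print(f"block {index}: {len(block)} texts")
```

Within a block, texts keep their arrival order, so each block is itself a short time series.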

[0029] First, some notation is introduced:

[0030] (1) $m$ is the size of a block, that is, the number of texts it contains. As a time series, the block is written $x_{1:m} = (x_1, x_2, \ldots, x_m)$, where $x_i$ denotes the $i$-th text, $i = 1, \ldots, m$.

[0031] (2) The clustering clusters corresponding t...


Abstract

The invention provides a rapid method for detecting hot issues in massive time-series network news, comprising the following steps: dividing the network news text sequence into a sequence of blocks according to time intervals; clustering the news texts of the first block with a Dirichlet process to form a set of clusters; attenuating and filtering the clustering result of the preceding block and using it as the prior distribution for the subsequent block; clustering the subsequent blocks with the Dirichlet process; ranking the clusters by popularity according to the number of reports each contains; taking the T highest-ranked clusters as the hot issues; and selecting the M features with the highest tf-idf values in each cluster as the keywords of that hot spot and displaying the hot spots. The method can greatly improve the efficiency of clustering network news. In addition, memory usage does not grow linearly with the amount of data, which makes the method suitable for large-scale text data analysis.
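The deterministic steps in the abstract (attenuating and filtering the previous block's clusters, ranking by report count, and picking tf-idf keywords) can be sketched as below. This is only an illustration under stated assumptions: the multiplicative `decay` factor, the `min_weight` threshold, and the choice to treat each cluster as one document for tf-idf are all hypothetical, since the available text does not give the attenuation formula, the filter rule, or the exact tf-idf definition. The Dirichlet process clustering step itself is not shown; cluster contents are taken as given.

```python
import math
from collections import Counter


def attenuate_and_filter(cluster_sizes, decay=0.5, min_weight=10.0):
    """Decay each cluster's size, then drop clusters below min_weight.

    Both parameters are hypothetical; the source does not state them.
    """
    decayed = {c: n * decay for c, n in cluster_sizes.items()}
    return {c: w for c, w in decayed.items() if w >= min_weight}


def top_t_clusters(cluster_sizes, t):
    """Rank clusters by report count and keep the T largest."""
    ranked = sorted(cluster_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:t]]


def top_m_keywords(cluster_tokens, m):
    """Pick the M terms with the highest tf-idf score in each cluster.

    Assumption: each cluster counts as one document, so tf is the term's
    count in the cluster and idf is log(clusters / clusters with the term).
    """
    n_clusters = len(cluster_tokens)
    doc_freq = Counter()
    for tokens in cluster_tokens.values():
        doc_freq.update(set(tokens))
    keywords = {}
    for cluster, tokens in cluster_tokens.items():
        tf = Counter(tokens)
        scores = {w: tf[w] * math.log(n_clusters / doc_freq[w]) for w in tf}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords[cluster] = [w for w, _ in ranked[:m]]
    return keywords


if __name__ == "__main__":
    # First-day counts from the worked example, with topic labels
    # treated as known purely for illustration.
    day1 = {"earthquake": 100, "exam": 60, "defense": 30,
            "foreign": 10, "economy": 5}
    print(attenuate_and_filter(day1))
    print(top_t_clusters(day1, 2))
```

Because only the decayed, filtered cluster summaries are carried from one block to the next, the state kept between blocks stays small, which is in line with the abstract's remark that memory usage does not grow linearly with the data volume.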

Description

Technical field

[0001] The invention provides a method for discovering hot events in online news. Specifically, it relates to quickly discovering hot events in massive news texts reported as a time series on the Internet and ranking those events by popularity. It belongs to the fields of natural language processing and data mining.

Background

[0002] With the rapid development of network technology and the resulting information explosion, people can obtain the latest and most complete news at any time, but the time cost for readers to find key information has also increased. Automatically extracting useful information from massive online news has therefore become an urgent task. Detecting hot events in online news meets people's need to obtain important information from massive time-series network news and improves reading efficiency. It can also help relevant government departments to monitor ne...


Application Information

IPC(8): G06F17/30
Inventors: 王厚峰, 彭楠赟
Owner: PEKING UNIV