A method for online news topic detection

A topic detection technology for online news, applied in network data retrieval, other database retrieval, unstructured text data retrieval, and related fields. It addresses problems such as topic drift and order-dependent differences in topic centroids that degrade clustering, and it improves the quality and accuracy of topic detection.

Active Publication Date: 2017-10-10
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, such processing introduces a problem: in the dynamic clustering stage, no other text is available as a reference during feature extraction, so the text processing is too simple, and the centroid of each topic can differ greatly depending on the order in which the texts are read. This difference degrades the clustering result.
In addition, when aggregating texts into topics, the Single-Pass algorithm assigns each text according to a single threshold specified in advance, which can easily lead to topic drift.
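
The two weaknesses described above can be seen in a minimal sketch of single-pass clustering. This is an illustration only, not the patent's implementation: the bag-of-words vectors, the cosine similarity, the running-mean centroid, and the names `cosine` and `single_pass` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold):
    """Assign each document, in arrival order, to its most similar topic,
    or open a new topic when no similarity reaches the single threshold."""
    topics = []  # each topic: {"centroid": dict, "members": [doc index]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())  # assumed bag-of-words features
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Every accepted document shifts the centroid (running mean),
            # so the result depends on the order in which documents arrive.
            n = len(best["members"])
            terms = set(best["centroid"]) | set(vec)
            best["centroid"] = {
                t: (best["centroid"].get(t, 0.0) * (n - 1) + vec.get(t, 0)) / n
                for t in terms
            }
        else:
            topics.append({"centroid": dict(vec), "members": [i]})
    return topics


docs = ["gutter oil scandal", "gutter oil restaurant", "milk powder recall"]
topics = single_pass(docs, threshold=0.5)
print([t["members"] for t in topics])
```

Because every accepted document moves the centroid, and the same threshold decides both membership and centroid updates, a run of loosely related documents can gradually pull a topic away from its original subject.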



Examples

Embodiment 1

[0054] (1) Case analysis corpus

[0055] News reports on food safety were collected from major online news media between November 2012 and January 2013. The corpus consists of 1034 news articles about food safety, covering 11 topics: gutter oil, golden rice, Jiugui wine, instant chicken, lean meat powder, Mead Johnson milk powder, poisonous bean sprouts, bright milk, gelatin shark's fin, cancer caused by cooking oil, and tap water safety. The number and time distribution of news articles in each topic are shown in Table 1 below.

[0056] Table 1 Food safety topic corpus

[0057]

[0058]

[0059] (2) Evaluation method

[0060] A supervised measure is used to evaluate system performance, that is, to measure how well the cluster labels correspond to the topic labels. Here, the cluster label is the label the system assigns to a news article based on the cluster analysis, and t...
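
As an illustration of comparing cluster labels with topic labels, the sketch below computes purity. The excerpt is cut off before the measure itself is named, so purity is only a stand-in chosen for this example, and the function name `purity` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, topic_labels):
    """Fraction of articles whose topic label matches the majority topic
    of the cluster they were assigned to (1.0 means perfect agreement)."""
    if len(cluster_labels) != len(topic_labels):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        return 0.0
    # Count how often each topic label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, topic in zip(cluster_labels, topic_labels):
        by_cluster[cluster][topic] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(cluster_labels)


# Hypothetical labels: cluster 1 mixes two topics, so purity is below 1.
print(purity([0, 0, 1, 1], ["gutter oil", "gutter oil", "golden rice", "gutter oil"]))
```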


Abstract

The invention discloses an online news topic detection method and belongs to the field of computer science and technology. It proposes a more efficient topic detection method for web texts on the Internet that require topic detection. A cluster buffer is established in which texts that have arrived, either a certain number of them or those within a certain period, are initially clustered with the X-means algorithm. A dual-threshold idea (a topic aggregation threshold and a topic centroid update threshold) is introduced, which effectively controls topic drift and improves the clustering result. The method outperforms the classic Single-Pass algorithm on all evaluation metrics and identifies topics that require detection more accurately.
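
The dual-threshold idea can be sketched as follows. The abstract names the two thresholds but does not spell out how they interact, so this example assumes that a text joins a topic when its similarity reaches the aggregation threshold, and moves the centroid only when it also reaches the stricter update threshold. The names `assign`, `join_threshold`, and `update_threshold` are hypothetical, and the buffered X-means initialization is not covered here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign(vec, topics, join_threshold, update_threshold):
    """Assign vec to a topic and return that topic's index.

    Assumed rule: join when similarity >= join_threshold; update the
    centroid only when similarity >= update_threshold."""
    best_idx, best_sim = -1, 0.0
    for idx, topic in enumerate(topics):
        sim = cosine(vec, topic["centroid"])
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx < 0 or best_sim < join_threshold:
        # No topic is close enough: open a new one seeded by this text.
        topics.append({"centroid": dict(vec), "members": [vec], "core": 1})
        return len(topics) - 1
    topic = topics[best_idx]
    topic["members"].append(vec)
    if best_sim >= update_threshold:
        # Only closely matching texts are averaged into the centroid.
        n = topic["core"]
        terms = set(topic["centroid"]) | set(vec)
        topic["centroid"] = {
            t: (topic["centroid"].get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
            for t in terms
        }
        topic["core"] = n + 1
    return best_idx


topics = []
assign({"gutter": 1, "oil": 1}, topics, 0.3, 0.8)  # opens topic 0
assign({"oil": 1, "price": 1}, topics, 0.3, 0.8)   # joins, centroid unchanged
print(topics[0]["centroid"], len(topics[0]["members"]))
```

Under this assumption, loosely related texts are still grouped with a topic but cannot pull its centroid away, which is one way a second threshold could limit topic drift.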

Description

Technical field

[0001] The invention relates to the field of computer science and technology, and more specifically to a method for online topic detection in network news.

Background technique

[0002] Topic Detection (TD) is one of the five basic research tasks in Topic Detection and Tracking (TDT); its main job is to detect and organize topics that are not known to the system in advance. The TDT project was funded by the US Defense Advanced Research Projects Agency (DARPA) and carried out jointly by the University of Massachusetts, Carnegie Mellon University and Dragon Systems. The project aims to automatically analyze continuous streams of news media, detect the topics in them, and track the detected topics. Research on topic detection has developed against the background of the TDT project. For the topic detection task, the Single-Pass algorithm is widely used. Single-Pass is an...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35, G06F16/95
Inventor: 常会友, 路永和, 韦婷婷, 胡勇军
Owner: SUN YAT SEN UNIV