News event detecting method based on metadata analysis

An event detection and metadata technology, applied in electrical digital data processing, special data processing applications, instruments, etc., that addresses the problems of low event detection accuracy and weak cohesion of the detected events.

Status: Inactive; Publication Date: 2008-05-07
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0005] The accuracy of existing event detection technology is not high: events are divided too coarsely, and the news items grouped into an event are not cohesive in time or content. Existing approaches to modeling news information and calculating similarity also make little use of metadata such as time and location, even though the news reports describing a news event are often strongly related to the time and location of that event.


Embodiment Construction

[0056] See Figure 1. Given a set of news documents, the steps are as follows:

[0057] (1) Preprocessing

[0058] For each news document, an XML parser (a commonly used one is DOM4j) extracts the news content, title, writing time, author and category information. The Chinese word segmentation package ICTCLAS segments the content and title of the news, and word frequencies are calculated at the same time; when calculating word frequency, each occurrence of a word in the title is weighted 5 times. The feature words in the title are then merged with the feature words of the news content, and keyword extraction is used to limit the feature words of each document to fewer than 50.
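
The weighting and truncation described in [0058] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes the title and body have already been segmented into tokens (ICTCLAS is not called here), the function name `weighted_term_frequencies` is hypothetical, and keeping the highest-weighted terms is only a stand-in for the unspecified keyword extraction technique.

```python
from collections import Counter

TITLE_WEIGHT = 5    # from the text: title occurrences are weighted 5 times
MAX_FEATURES = 49   # from the text: fewer than 50 feature words per document


def weighted_term_frequencies(title_tokens, body_tokens, max_features=MAX_FEATURES):
    """Merge title and body term counts, weighting title occurrences.

    Assumption: keeping the top terms by weighted frequency stands in for
    the keyword extraction step, which the source does not specify.
    """
    counts = Counter(body_tokens)
    for token in title_tokens:
        counts[token] += TITLE_WEIGHT
    # most_common sorts by count, descending; ties keep insertion order
    return dict(counts.most_common(max_features))


if __name__ == "__main__":
    title = ["earthquake", "relief"]
    body = ["earthquake", "relief", "teams", "teams", "teams"]
    print(weighted_term_frequencies(title, body))
```

With this weighting, a word that appears once in the title and once in the body outranks a word that appears three times in the body only.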

[0059] (2) Calculate the vector model

[0060] The news documents are sorted in ascending chronological order and marked with the corresponding serial numbers, and the IDF of the feature words is calculated according to the IDF calculation formula modifie...


Abstract

The invention relates to a news event detection method based on metadata analysis, belonging to the technical field of data mining. The invention is characterized in that a multi-dimensional vector model is used to represent news documents, the time characteristics of news are taken into account when calculating the feature weights, and an improved IDF (inverse document frequency) calculation method is used for news feature words. When calculating the similarity between news items, the time, category and specific content of the news are considered together. The news documents are preprocessed by extracting keywords, which greatly reduces the dimensionality of the vectors. The news reports are then clustered with a hierarchical clustering method, and the resulting cluster tree is partitioned so that the news reports are grouped and matched with the relevant news events. Compared with prior event detection methods, the invention achieves a higher F value, which is used as the standard for assessing clustering quality.
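
As a rough illustration of the pipeline the abstract outlines, the sketch below combines content, time and category into one similarity score and then clusters agglomeratively. Everything specific in it is an assumption made for illustration: the source does not give the similarity formula, so the multiplicative combination, the `half_life_days` and `category_penalty` parameters, the average linkage, and the merge `threshold` are all hypothetical. Stopping merges at a similarity threshold is used here as a simple stand-in for partitioning the cluster tree.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def news_similarity(x, y, half_life_days=3.0, category_penalty=0.5):
    """Hypothetical combination of content, time and category.

    Assumption: content similarity is scaled by an exponential time decay
    and by a penalty when categories differ. Not the patent's formula.
    """
    content = cosine(x["vector"], y["vector"])
    time_factor = 0.5 ** (abs(x["day"] - y["day"]) / half_life_days)
    category_factor = 1.0 if x["category"] == y["category"] else category_penalty
    return content * time_factor * category_factor


def cluster(docs, threshold=0.3):
    """Average-linkage agglomerative clustering, stopped at a threshold."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        sims = [news_similarity(docs[i], docs[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    docs = [
        {"vector": {"quake": 2, "relief": 1}, "day": 0, "category": "world"},
        {"vector": {"quake": 1, "relief": 2}, "day": 1, "category": "world"},
        {"vector": {"match": 3, "goal": 1}, "day": 0, "category": "sports"},
    ]
    print(cluster(docs))
```

Each resulting cluster is a list of document indices that would be treated as one news event under these assumed parameters.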

Description

Technical field

[0001] The news event detection method based on metadata analysis belongs to the field of data mining.

Background technique

[0002] News reports are often regarded as the most important source of information for people. News information is characterized by large volume, rapid growth, strong timeliness and high relevance. People are increasingly eager to obtain, quickly and accurately, higher-level news information that interests them from the mass of news. At present, major portal websites and major search engine companies, such as Google and Baidu, provide online news reading services. These websites also support browsing by basic news category (such as domestic, foreign, political, sports, etc.), and users can browse news from the current day or from the past through these services. However, due to the frequent updates of news reports and the huge amount of data, users often feel that there is too much informatio...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30; G06F17/27
Inventor: 李涓子, 常诚, 张阔, 李军, 张鹏, 唐杰, 许斌
Owner: TSINGHUA UNIV