On-line supervised theme-modeling and evolution-analyzing method

A supervised topic modeling technique, classified under special data processing applications, instrumentation, electrical digital data processing, etc.

Inactive Publication Date: 2012-09-12
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] Although many researchers at home and abroad have made efforts to improve the topic model and produced many effective topic mining algorithms, so far there is no model that can simultaneously consider the temporal characteristics and category attributes of documents.



Examples


Embodiment

[0085] The experimental data is the New York Times dataset, covering text published from January 1, 2011 to April 30, 2011. To keep classes with very few texts from distorting topic modeling, those sparse classes are excluded. Only text from 8 New York Times categories (arts, business, health, and so on) is analyzed, and each article is labeled with exactly one category. The processed data set contains a total of 8295 articles and 32723 distinct words. The number of documents and the number of words in each category are shown in the following table, where word counts are given in thousands:

[0086]

category                 arts   business   health   real estate   science   technology   us     the world
number of documents      1366   1681       313      215           297       229          1928   2326
word count (thousands)   1932   2028       ...


Abstract

The invention discloses an online supervised topic modeling and evolution analysis method. The method comprises the following steps: (1) news texts are downloaded from news media websites and divided according to a chosen time granularity; (2) word segmentation is performed on the news texts in each time period, and the vocabulary is selected and updated according to word frequencies; (3) text features are extracted to form a relation matrix between words and texts, which serves as the input of the online supervised topic model; (4) the online supervised topic model is established, and the online supervised topic modeling method is used to detect the topics of the data set in each time period, yielding the distribution matrix of words over topics and the distribution matrix of topics over texts; and (5) the Jensen-Shannon divergence is used to perform evolution analysis on the topics obtained in step (4) and to calculate the attributes of each topic, so as to obtain the evolution process of each topic. The method makes full use of the time and category information carried by the data itself, improves the accuracy of topic mining, and effectively analyzes the evolution of topics by combining category information.
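
Steps (2), (3) and (5) of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `min_count` threshold, the use of raw counts as text features, and the nearest-topic matching rule are all assumptions made for illustration, and the input is assumed to be already segmented into word lists. The supervised topic model of step (4) is not shown.

```python
import math
from collections import Counter


def build_vocabulary(token_lists, min_count=2):
    """Step (2), sketched: keep words whose total frequency reaches min_count.

    The threshold value is an assumption; the source only says that the
    vocabulary is selected according to word frequencies.
    """
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return sorted(w for w, c in counts.items() if c >= min_count)


def word_document_matrix(token_lists, vocabulary):
    """Step (3), sketched: rows are words, columns are texts, cells are raw counts."""
    index = {w: i for i, w in enumerate(vocabulary)}
    matrix = [[0] * len(token_lists) for _ in vocabulary]
    for d, tokens in enumerate(token_lists):
        for tok in tokens:
            if tok in index:
                matrix[index[tok]][d] += 1
    return matrix


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two distributions over the same words."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is the mixture, so b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_previous_topic(topic, previous_topics):
    """Hypothetical helper for step (5): nearest topic of the previous time period.

    Returns (index, divergence). Matching by minimum divergence is an assumption.
    """
    scored = [(js_divergence(topic, prev), i) for i, prev in enumerate(previous_topics)]
    divergence, index = min(scored)
    return index, divergence


# Usage with three tiny, already segmented texts from one time period.
docs = [["stock", "market", "stock"], ["market", "bank"], ["film", "stock"]]
vocab = build_vocabulary(docs)              # ['market', 'stock']
matrix = word_document_matrix(docs, vocab)  # [[1, 1, 0], [2, 0, 1]]
```

With base-2 logarithms the divergence lies between 0 (identical word distributions) and 1 (no shared words), so a small value between a topic in one time period and a topic in the previous one suggests that the later topic continues the earlier one.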

Description

Technical field

[0001] The invention relates to the field of topic mining of texts, in particular to an online supervised topic modeling and evolution analysis method.

Background

[0002] With the rapid development of the Internet and the rapid growth of network resources, it has become particularly important to present huge data sets in a reasonable structure so that users can quickly grasp current and historical information about various events. Traditional searching, indexing, and browsing can no longer meet users' needs. A more principled approach is to abstract related events into topics at the semantic level and to represent the entire data set in terms of those topics. Research on algorithms for mining topics from data content and analyzing their evolution is therefore of urgent practical significance and remains challenging.

[0003] Most of the current topic modeling and analysis methods for discrete data are implem...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 邵健, 张寅, 任鸿凯, 吴飞
Owner: ZHEJIANG UNIV