Semantic mining method based on an N-gram incremental topic model

A topic-model and semantic-mining technology, applied in special data processing applications, instruments, electrical digital data processing, and related fields, that addresses the long training time of topic models.

Inactive Publication Date: 2011-11-16
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

To date, no topic model applies well to scientific and technological literature data, and topic models take a long time to train, which makes them unsuitable for tasks with strict real-time requirements such as information retrieval.




Embodiment Construction

[0041] Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0042] Studies have shown that an N-gram (a sequence of N grammatical elements) has stronger semantic expressiveness than a single word (a unigram, that is, one grammatical element), and that a word space composed of N-grams can effectively improve text mining tasks such as text clustering, text classification, and text retrieval. However, introducing N-grams directly into a topic model greatly increases the complexity of model training, which makes the topic model difficult to apply directly to text mining tasks with high resource requirements. Research also shows that the prior probability of each random variable in a topic model can control the topic distribution of the finally trained model, so the prior knowledge of historical data can be recorded by using th...
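The expansion of the word space from unigrams to N-grams can be illustrated with a minimal sketch. The visible text does not say how N-grams are extracted or filtered, so the helper names `extract_ngrams` and `build_vocabulary`, the `max_n` and `min_count` parameters, and the toy documents are all assumptions made for illustration only.

```python
from collections import Counter


def extract_ngrams(tokens, max_n=2):
    """Return all contiguous n-grams of length 1..max_n as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(documents, max_n=2, min_count=1):
    """Build a sorted vocabulary of n-grams that occur at least min_count times."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, max_n))
    return sorted(g for g, c in counts.items() if c >= min_count)


# Hypothetical toy corpus: two already-tokenized documents.
docs = [["topic", "model", "training"], ["topic", "model", "inference"]]
unigram_vocab = build_vocabulary(docs, max_n=1)
bigram_vocab = build_vocabulary(docs, max_n=2)
print(len(unigram_vocab), len(bigram_vocab))
```

Even on this tiny corpus the vocabulary grows once bigrams are admitted, which is the source of the extra training cost the paragraph describes. A frequency threshold such as `min_count` is one common way to limit that growth; whether the patent uses one is not stated in the visible text.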



Abstract

The invention discloses a semantic mining method based on an N-gram incremental topic model. The method comprises the steps of: (1) extending the Author-Conference topic model by expanding its word space from unigrams to N-grams; (2) calculating the prior probability parameters of the current model as a linear weighting of the posterior probabilities of a previously trained model with respect to the current input data; (3) calculating the posterior probability of the current model on the current data using Gibbs sampling; and (4) repeating steps (2) and (3) to train the model incrementally on newly input data streams. Introducing N-grams into the topic model improves its ability to model scientific and technical literature by exploiting the semantic features that N-grams carry. Recording the topic distribution of historical data through asymmetric prior probabilities allows the model to be trained incrementally, which improves the efficiency of the method.
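Steps (2) to (4) can be sketched as follows. This is a simplified illustration, not the patented procedure: it swaps the Author-Conference model for a plain LDA-style model, assumes a fixed vocabulary across batches, and uses hypothetical names and parameters (`gibbs_lda`, `prior_from_history`, `strength`, `floor`, the two-batch window) that do not come from the source.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha, beta, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a plain LDA-style model.

    docs: list of documents, each a list of term ids (terms may be N-grams).
    beta: num_topics x vocab_size matrix of (possibly asymmetric) priors.
    Returns the posterior topic-term distributions phi[k][w].
    """
    rng = random.Random(seed)
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-term counts
    n_k = [0] * num_topics                                # tokens per topic
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    beta_sum = [sum(row) for row in beta]
    z = []
    for d, doc in enumerate(docs):  # random initial topic assignments
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_kw[k][w] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
        z.append(z_d)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the token's current assignment
                n_kw[k][w] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta[t][w])
                    / (n_k[t] + beta_sum[t])
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k  # record the resampled assignment
                n_kw[k][w] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    return [
        [(n_kw[k][w] + beta[k][w]) / (n_k[k] + beta_sum[k])
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]


def prior_from_history(history, weights, strength, floor=0.01):
    """Linearly weight earlier posteriors into an asymmetric prior (assumed form)."""
    num_topics, vocab_size = len(history[0]), len(history[0][0])
    return [
        [floor + strength * sum(wt * phi[k][w] for wt, phi in zip(weights, history))
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]


# Hypothetical data stream: two batches over a shared 4-term vocabulary.
K, V = 2, 4
batches = [
    [[0, 1, 0, 1], [2, 3, 2, 3]],
    [[0, 0, 1], [3, 2, 2]],
]
beta = [[0.1] * V for _ in range(K)]  # symmetric prior for the first batch
history = []
for batch in batches:
    phi = gibbs_lda(batch, K, V, alpha=0.5, beta=beta)   # step (3)
    history.append(phi)
    recent = history[-2:]                                # assumed window size
    weights = [1.0 / len(recent)] * len(recent)          # assumed equal weights
    beta = prior_from_history(recent, weights, strength=10.0)  # step (2)
```

Each new batch is sampled from scratch, but its prior carries the topic-term structure learned so far, so earlier batches never need to be revisited. The `strength` value controls how strongly history outweighs the new data; the source does not give the weighting scheme, so the equal weights here are only a placeholder.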

Description

Technical Field

[0001] The invention relates to a method for incrementally building an N-gram-based topic model for text input streams, in the field of topic mining of scientific and technological text data.

Background Technique

[0002] Automatic analysis and extraction of semantic information from scientific and technological literature is a problem widely studied by scholars. To this end, text mining methods have been developed to help improve the semantic analysis of texts. The topic model, as a powerful tool for mining the intrinsic topic information of text, is also often applied in the field of text mining. But so far, no topic model applies well to scientific and technological literature data, and topic models take a long time to train, which makes them unsuitable for tasks with strict real-time requirements such as information retrieval.

[0003] The semantic mining method based on the N-gram incremental topic model is proposed to solve t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 王晗, 徐毅, 郎波, 李未
Owner: BEIHANG UNIV