Online internet topic mining method based on improved LDA model

An Internet and model technology, applied in the Internet field, that addresses the reduced rationality, timeliness, computational efficiency, and accuracy of existing topic mining models.

Active Publication Date: 2015-12-09
SOUTHEAST UNIV
1 Cites · 11 Cited by

AI Technical Summary

Problems solved by technology

In addition, the LDA model uses the same hyperparameter for each word when generating the probability distributions of all topics. This approach is not reasonable.
Therefore, topic mining and detection models such as PLSA and LDA are generally suited to offline topic mining environments where the corpus is relatively static. For the real-time, streaming online mining that Internet topics require, their rationality, timeliness, computing efficiency, and accuracy are greatly reduced.
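
To make the "same hyperparameter for each word" point concrete, the following minimal sketch contrasts a symmetric Dirichlet prior with a per-word (asymmetric) one when drawing a topic's word distribution. The vocabulary, the prior values, and the helper name `sample_dirichlet` are illustrative assumptions and are not taken from the patent; it shows only the textbook idea, not the On-LDA assignment rule.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical toy vocabulary (assumption, for illustration only).
vocab = ["election", "vote", "goal", "match"]
rng = random.Random(0)

# Symmetric prior: one shared hyperparameter value for every word,
# as in the traditional LDA setup criticized above.
symmetric_beta = [0.5] * len(vocab)

# Asymmetric prior: hypothetical per-word values, so some words are
# favored a priori. These numbers are invented for the example.
asymmetric_beta = [2.0, 2.0, 0.5, 0.5]

topic_sym = sample_dirichlet(symmetric_beta, rng)
topic_asym = sample_dirichlet(asymmetric_beta, rng)

for word, p_sym, p_asym in zip(vocab, topic_sym, topic_asym):
    print(f"{word:10s} symmetric={p_sym:.3f} asymmetric={p_asym:.3f}")
```

With a symmetric prior no word is preferred before the data are seen; a per-word prior lets prior knowledge about the content shift the expected distribution.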



Examples


Embodiment Construction

[0034] The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and are not intended to limit its scope. After reading the present invention, modifications of its various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0035] (1) With the On-LDA model as the basis, the topics contained in a large number of web page resources on the Internet are mined online. The On-LDA model is an improved LDA model that supports dynamic, online topic mining. Its probabilistic graphical model is shown in Figure 1. Its meaning is as follows: the process of mining n web pages (documents) to generate k topics can essentially be regarded as the process of generating the word set of a web page (document), that is, first use the current h...
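
The view of topic mining as a word-set generation process can be sketched with the textbook LDA generative process. This is a sketch of standard LDA only: the On-LDA details are truncated in the text above and are not reproduced here. The function names, the scalar priors `alpha` and `beta`, and the toy vocabulary are assumptions made for illustration.

```python
import random


def dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, k_topics, vocab, doc_len, alpha, beta, rng):
    """Textbook LDA generative process (not the patent's On-LDA variant)."""
    # One word distribution per topic, drawn from a symmetric Dirichlet(beta).
    topic_word = [dirichlet([beta] * len(vocab), rng) for _ in range(k_topics)]
    docs = []
    for _ in range(n_docs):
        # One topic mixture per document, drawn from a symmetric Dirichlet(alpha).
        doc_topic = dirichlet([alpha] * k_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(k_topics), weights=doc_topic)[0]
            w = rng.choices(vocab, weights=topic_word[z])[0]
            words.append(w)
        docs.append(words)
    return docs


# Usage with hypothetical values: n = 3 documents, k = 2 topics.
docs = generate_corpus(
    n_docs=3, k_topics=2, vocab=["a", "b", "c"], doc_len=5,
    alpha=0.5, beta=0.5, rng=random.Random(1),
)
print(docs)
```

Topic mining runs this process in reverse: given only the observed words, it infers the per-document topic mixtures and per-topic word distributions.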



Abstract

The invention discloses an online Internet topic mining method based on an improved LDA model. The method is a continuous, streaming topic mining process carried out in segments: n web pages are processed at a time, the pages are usually acquired from the Internet by web crawlers online and in real time, and mining their contents yields k topics. After the current n web pages are processed, the newly acquired n web pages are processed in the same way. The process mainly includes initialization of the On-LDA model hyper-parameters, dynamic updating of the On-LDA model hyper-parameters, and Internet topic mining based on the On-LDA model. The method radically changes how the hyper-parameters α and β of a traditional LDA model are assigned and used during topic mining: the category information of the web page contents is fully used to assign initial values to the model hyper-parameters α and β, so the initial values depend entirely on the web page contents to be mined, and the computation is simplified while remaining reasonable.
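
The segmented, streaming control flow described in the abstract can be sketched as a batch loop. The sketch shows only the control flow: `init_hyper`, `update_hyper`, and `mine_topics` are hypothetical placeholder callables, since the actual On-LDA computations are not given in this text, and the ordering of updating and mining within each batch is an assumption.

```python
from itertools import islice


def batches(stream, n):
    """Yield successive lists of up to n items from an iterable stream."""
    it = iter(stream)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def mine_stream(pages, n, k, init_hyper, mine_topics, update_hyper):
    """Process a page stream n pages at a time, producing k topics per batch.

    The three callables are placeholders (assumptions) for hyper-parameter
    initialization, topic mining, and dynamic hyper-parameter updating.
    """
    hyper = None
    results = []
    for batch in batches(pages, n):
        if hyper is None:
            # First segment: initialize hyper-parameters from the page contents.
            hyper = init_hyper(batch)
        else:
            # Later segments: update from the previous state and previous result.
            hyper = update_hyper(hyper, batch, results[-1])
        results.append(mine_topics(batch, k, hyper))
    return results


# Usage with trivial stand-ins that only record what they were given.
results = mine_stream(
    pages=range(7), n=3, k=2,
    init_hyper=lambda batch: 0,
    mine_topics=lambda batch, k, hyper: (len(batch), k, hyper),
    update_hyper=lambda hyper, batch, prev: hyper + 1,
)
print(results)  # [(3, 2, 0), (3, 2, 1), (1, 2, 2)]
```

Passing the three steps in as callables keeps the loop independent of any particular model, which suits a sketch where the model internals are unknown.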

Description

Technical field

[0001] The invention belongs to the technical field of the Internet and specifically relates to an online mining method for Internet topics based on an improved LDA model, in which topics are detected and mined online.

Background technique

[0002] With its rapid development and widespread adoption, the Internet has gradually become an important medium through which people quickly obtain, publish, and transmit information. In recent years the mobile Internet in particular has developed greatly. It combines the advantages of mobile communication and the Internet, making it more convenient for people to obtain information. A large number of information resources from many sources and with different standpoints constantly emerge on the Internet, and the hot and sensitive topics they reflect often spread at extremely high speed with the help of the Internet, with a major impact on society. Therefore, how to detect and m...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F16/36, G06F16/951, G06F40/30, G06F16/00
Inventors: 杨鹏, 卢云骋, 董永强
Owner: SOUTHEAST UNIV