Topic mining method specific to forum text

A topic mining technology for forum text, applied in the field of probabilistic topic models, that addresses the inability of existing methods to mine forum topics effectively and improves topic mining capability.

Active Publication Date: 2018-10-26
ZHEJIANG UNIV OF TECH
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] To overcome the inability of existing text mining methods to effectively mine topics in forum texts, the present invention proposes BBS-LDA, a new LDA-based topic model tailored to the structural characteristics of forums, to mine topics in forum texts more effectively.

Embodiment Construction

[0041] The present invention will be further described below in conjunction with the accompanying drawings.

[0042] Referring to Figure 1, a topic mining method for forum texts includes the following steps:

[0043] Step 1: Crawl the forum data, use text processing methods to find replies that are highly likely to be meaningless, and mark them;

[0044] Step 1 mainly obtains the data and provides partial supervision information to the model, which helps it model the forum text better. There are two main types of spam replies on forums. The first type consists of replies that are too short to carry any meaning and are posted only to bump a thread or to pad it with filler ("water" posts). The second type is promotional content.
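
The two spam categories described above could be flagged with simple heuristics. The following is a minimal sketch, not the patent's actual procedure: the length threshold `MIN_CHARS`, the keyword list `PROMO_KEYWORDS`, the URL check, and the function names are all hypothetical choices made for illustration.

```python
import re

# Hypothetical settings; the source does not specify thresholds or keywords.
MIN_CHARS = 5
PROMO_KEYWORDS = ("discount", "buy now", "promo code")
URL_PATTERN = re.compile(r"https?://\S+")


def is_probably_meaningless(reply: str) -> bool:
    """Return True if a reply looks too short or promotional."""
    text = reply.strip().lower()
    # Type 1: replies too short to carry meaning (e.g. thread bumps).
    if len(text) < MIN_CHARS:
        return True
    # Type 2: promotional replies, approximated here by links or keywords.
    if URL_PATTERN.search(text):
        return True
    return any(keyword in text for keyword in PROMO_KEYWORDS)


def mark_replies(replies):
    """Pair each reply with its flag so the flag can serve as supervision."""
    return [(reply, is_probably_meaningless(reply)) for reply in replies]
```

The flags produced this way are only a rough prior; a real forum would likely need thresholds and keyword lists tuned to its own language and content.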

[0045] Step 1 comprises the following steps:

[0046] Step 11: Crawl the forum text with a crawler. The crawled content includes the content of the reply, the user who replied, and the id of the post corresponding to the rep...
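
One possible way to hold the crawled fields is a small record type. This sketch covers only the three fields named before the text is cut off (reply content, replying user, post id); the class name `Reply`, the field names, the `meaningless` flag, and the helper `group_by_post` are illustrative assumptions, not identifiers from the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reply:
    """One crawled reply; field names are illustrative, not from the source."""
    content: str   # text of the reply
    user: str      # user who posted the reply
    post_id: str   # id of the post (thread) the reply belongs to
    meaningless: bool = False  # flag set by the marking step (assumed field)


def group_by_post(replies):
    """Group replies by the post they belong to, preserving crawl order."""
    threads = defaultdict(list)
    for reply in replies:
        threads[reply.post_id].append(reply)
    return dict(threads)
```

Grouping by post id keeps the thread structure of the forum available to later steps.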

Abstract

The invention relates to a topic mining method specific to forum text, comprising the following steps: 1, crawling the data of a forum, using text processing to find replies that are likely to be meaningless, and marking them; 2, splitting the forum text into sentences, performing word segmentation, deleting useless words according to part of speech, and removing stop words; and 3, estimating parameters on the resulting text with Gibbs sampling according to the BBS-LDA topic model, and finally obtaining the words most likely to belong to each topic. The method proposes BBS-LDA, a new LDA-based topic model tailored to the characteristics of forums, which mines topics in forum text more effectively and thereby improves forum text topic mining capability.
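
The details of BBS-LDA are not given in this excerpt, so the parameter estimation in step 3 can only be illustrated with the model it builds on. The sketch below is a collapsed Gibbs sampler for standard LDA, not BBS-LDA. It assumes the documents have already been segmented and stop-word filtered as in step 2, and the function names, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for standard LDA (not BBS-LDA).

    docs: list of token lists, already segmented and stop-word filtered.
    Returns (vocab, topic_word), where topic_word[k][w] counts how often
    word index w is currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            k = rng.randrange(num_topics)
            w = word_id[word]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
            z_doc.append(k)
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
                assignments[d][i] = k
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]
```

Calling `top_words(*lda_gibbs(docs, num_topics=2))` on a list of tokenized documents yields, for each topic, the words most often assigned to it, which corresponds to the final output described in step 3. A forum-specific model would presumably add structure (for example the marked replies from step 1) on top of this baseline.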

Description

Technical Field

[0001] The invention belongs to the field of text mining, and in particular relates to a probabilistic topic model.

Background

[0002] Nowadays the Internet is developing rapidly and has become the main channel through which netizens receive and disseminate information. Through the Internet, every netizen can learn about the latest events across the country within a very short time, express their own views on these events in real time, and share what they have learned with others. This text can be put to many meaningful uses: the government can identify the livelihood issues people care about most through Weibo or forums and make reasonable improvements; investors can learn which stocks are the hottest and most favored by netizens and adjust their own investment strategies; consumers can form an objective understanding of a product by checking other users' comments on it, and judge whether t...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/289
Inventor: 田贤忠, 姚明超, 顾思义
Owner: ZHEJIANG UNIV OF TECH