Method and device for classifying topics in online communities

A network community topic classification technology in the field of data processing. It addresses data imbalance and the resulting inaccurate data classification, with the aim of remedying low classification accuracy.

Active Publication Date: 2020-06-30
BEIJING UNIV OF POSTS & TELECOMM
Cites: 6; cited by: 0

AI Technical Summary

Problems solved by technology

[0004] However, when large numbers of users concentrate their discussion on hot topics, the data in an online community easily becomes imbalanced. None of the existing classification methods handles this imbalance well, so some data is classified inaccurately.
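To make the imbalance problem concrete, here is a minimal sketch with invented label counts (they are not data from the patent). It assumes a hypothetical corpus in which one hot topic dominates, and shows that a classifier which always predicts the majority class can score high overall accuracy while never recognizing a minority class.

```python
from collections import Counter

# Hypothetical label distribution: one hot topic dominates the corpus.
labels = ["entertainment"] * 950 + ["finance"] * 30 + ["health"] * 20


def majority_baseline(train_labels):
    """Return a classifier that always predicts the most frequent label."""
    top_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _text: top_label


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)


predict = majority_baseline(labels)
predictions = [predict(None) for _ in labels]

overall_accuracy = accuracy(labels, predictions)
minority_recall = recall(labels, predictions, "finance")
print(f"accuracy={overall_accuracy:.2f}, finance recall={minority_recall:.2f}")
```

Overall accuracy alone hides the failure on the minority classes, which is why a per-class measure such as recall is reported alongside it here.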


Examples

Embodiment 1

[0056] According to an embodiment of the present invention, a network community topic classification method is provided. As shown in Figure 1, the method includes:

[0057] Step 101: Collect topic corpora from the online community, determine the corresponding category labels, and preprocess the collected corpora to form a sample set;

[0058] According to an embodiment of the present invention, collecting the topic corpus of the online community and determining the corresponding category label includes: crawling the topic content in each section of the online community with a web crawler and using the captured topic content as the topic corpus; establishing a correspondence between each section's serial number and a category in the classification system; and determining the category label of each topic corpus according to that correspondence. The topic content includes: topic title, topic text, topic release time, top...
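The labeling step in [0058] can be sketched as a lookup from section serial number to category. This is only an illustration under stated assumptions: the `Topic` record, the `SECTION_TO_CATEGORY` table, the section numbers, and the sample topics are all hypothetical, and the crawler itself is not shown. Only the title, text, and release-time fields come from the paragraph above.

```python
from dataclasses import dataclass

# Hypothetical mapping from forum section serial number to a category
# in the classification system; the real mapping is not given in the text.
SECTION_TO_CATEGORY = {101: "sports", 102: "finance", 103: "entertainment"}


@dataclass
class Topic:
    """One crawled topic; fields follow the topic content listed in [0058]."""
    section_id: int
    title: str
    text: str
    released_at: str


def label_topics(topics, section_to_category):
    """Pair each topic with its category label; skip unmapped sections."""
    labeled = []
    for topic in topics:
        category = section_to_category.get(topic.section_id)
        if category is not None:
            labeled.append((topic, category))
    return labeled


# Invented sample input standing in for crawler output.
crawled = [
    Topic(101, "Cup final tonight", "Match discussion thread", "2019-05-01"),
    Topic(102, "Index funds", "Long-term saving question", "2019-05-02"),
    Topic(999, "Misc", "Section not in the classification system", "2019-05-03"),
]
labeled = label_topics(crawled, SECTION_TO_CATEGORY)
for topic, category in labeled:
    print(topic.title, "->", category)
```

Dropping topics from unmapped sections is a design choice made for this sketch; the source does not say how such topics are handled.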

Embodiment 2

[0107] According to an embodiment of the present invention, a network community topic classification device is provided. As shown in Figure 2, the device includes:

[0108] The collection module 201 is used to collect the topic corpus of the network community and determine the corresponding category labels;

[0109] The preprocessing module 202 is used to preprocess the topic corpus collected by the collection module 201 and use it as a sample set;

[0110] The construction module 203 is used to construct a cost-sensitive matrix of misclassification costs for the sample set obtained by the preprocessing module 202, according to the category labels determined by the collection module 201 and the naive Bayes algorithm;

[0111] The training module 204 is used to train on the sample set obtained by the preprocessing module 202, based on the cost-sensitive matrix constructed by the construction module 203, to obtain a classifier;

[0112] The classification module 205 is configured to use the classifier ob...
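The role of a misclassification cost matrix can be illustrated with a minimal sketch. It does not reproduce the patent's construction, which this excerpt does not spell out: the inverse-frequency cost rule below is an assumption chosen for illustration, the class names and counts are invented, and the posterior probabilities stand in for the output of any probabilistic classifier such as naive Bayes.

```python
from collections import Counter


def build_cost_matrix(labels):
    """Build cost[true][pred] from class counts.

    Assumed rule (not from the source): correct predictions cost 0, and
    misclassifying a sample of class `true` costs n_largest / n_true, so
    errors on rare classes are penalized more heavily.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    classes = sorted(counts)
    return {
        true: {
            pred: 0.0 if pred == true else largest / counts[true]
            for pred in classes
        }
        for true in classes
    }


def min_cost_class(posteriors, cost):
    """Pick the prediction with the lowest expected misclassification cost."""
    def expected_cost(pred):
        return sum(p * cost[true][pred] for true, p in posteriors.items())
    return min(posteriors, key=expected_cost)


# Invented imbalanced label set: 90 "hot" samples, 10 "rare" samples.
labels = ["hot"] * 90 + ["rare"] * 10
cost = build_cost_matrix(labels)

# Hypothetical posterior probabilities for one text.
posteriors = {"hot": 0.8, "rare": 0.2}
plain_decision = max(posteriors, key=posteriors.get)
decision = min_cost_class(posteriors, cost)
print("argmax:", plain_decision, "| cost-sensitive:", decision)
```

With these numbers, predicting "hot" has expected cost 0.2 × 9 = 1.8 and predicting "rare" has expected cost 0.8 × 1 = 0.8, so the cost-sensitive rule chooses the rare class even though the plain argmax chooses the common one.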

Embodiment 3

[0145] According to an embodiment of the present invention, there is also provided a network community topic classification device, including one or more processors and a storage device storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the method for classifying topics in the online community described above.

Abstract

The invention discloses a network community topic classification method and device, belonging to the technical field of data processing. The method comprises the following steps: collecting the topic corpus of the network community and determining the corresponding category labels; preprocessing the collected topic corpus to form a sample set; constructing a cost-sensitive matrix for misclassification of the sample set according to the category labels and the naive Bayes algorithm; training on the sample set based on the cost-sensitive matrix to obtain a classifier; and using the classifier to classify network community text. By constructing the cost-sensitive matrix and introducing cost sensitivity into the random forest during classifier training, the invention preserves classifier performance while effectively addressing the low classification accuracy caused by data imbalance, which provides a sound basis for analyzing and supervising network community topics.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method and device for classifying topics in a network community.

Background

[0002] In the early 21st century, the Internet developed rapidly, and China ranked among the top countries in both user scale and information resources. Today the Internet has gradually penetrated people's daily life, work, leisure and entertainment, which has greatly promoted the development of informatization. Besides receiving and obtaining data from the Internet, people have begun to create and share information. The online community provides a platform for netizens to communicate with each other and share information. The online community refers to online communication spaces including forums, post bars, bulletin boards, online chats, interactive dating and wireless value-added services. Due to their strong openness and wide user groups, online communities have become an...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/951, G06F16/332, G06F16/35, G06F40/284, G06Q50/00
CPC: G06F40/284, G06Q50/01
Inventor: 吴旭, 党习歌, 颉夏青
Owner: BEIJING UNIV OF POSTS & TELECOMM