Junk comment identification method based on collaborative training

A spam comment identification technology based on collaborative training (Co-Training), applied in the field of spam comment identification. It addresses the problem of the large number of spam comments and achieves the effects of reducing the labeling workload, learning models efficiently, and improving accuracy.

Status: Inactive. Publication Date: 2017-06-13
GUANGXI NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is that there are a large number of spam commen...

Embodiment Construction

[0025] The present invention is described in further detail below, taking spam comments in microblogs as an example:

[0026] The overall framework of the spam comment identification method based on collaborative training is shown in Figure 1.

[0027] Because microblog posts and their comments are limited to 140 characters, the text is short; at the same time, the volume of comment data is huge and new internet slang emerges constantly. This invention designs a microblog spam comment identification method that uses the Co-Training collaborative training algorithm with two classifiers, AdaBoost and SVM. The two classifiers are first trained on labeled training data (10% of the data); a large amount of unlabeled data (70%) is then used as an additional set for collaborative training of the classifiers; finally, the remaining labeled data (20%) is used as the test set. This improves classification accuracy while saving a great deal of sample labeling work.
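
The 10% / 70% / 20% split and the collaborative training loop can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented procedure: the truncated text does not give the selection rule, the number of rounds, or how the two classifiers are combined, so `co_train`, `rounds`, `k`, and `predict_spam` are hypothetical. To stay dependency-free, the sketch uses a tiny naive Bayes model (`TokenNB`) as a stand-in where the invention uses AdaBoost and SVM.

```python
import math
import random
from collections import Counter


class TokenNB:
    """Tiny naive Bayes stand-in for a real classifier (NOT AdaBoost or SVM)."""

    def __init__(self, tokenize):
        self.tokenize = tokenize

    def fit(self, texts, labels):
        self.prior = Counter(labels)
        self.counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.counts[label].update(self.tokenize(text))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        return self

    def predict_proba(self, text):
        """Return the estimated probability that `text` is spam (label 1)."""
        n0 = sum(self.counts[0].values())
        n1 = sum(self.counts[1].values())
        v = self.vocab_size + 1  # Laplace smoothing, one slot for unseen tokens
        log_odds = math.log((self.prior[1] + 1) / (self.prior[0] + 1))
        for tok in self.tokenize(text):
            p1 = (self.counts[1][tok] + 1) / (n1 + v)
            p0 = (self.counts[0][tok] + 1) / (n0 + v)
            log_odds += math.log(p1 / p0)
        log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))


def split_10_70_20(items, seed=0):
    """Shuffle, then split into 10% labeled train, 70% unlabeled pool, 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 10 // 100
    n_pool = len(items) * 70 // 100
    return (items[:n_train],
            items[n_train:n_train + n_pool],
            items[n_train + n_pool:])


def co_train(clf_a, clf_b, labeled, pool, rounds=5, k=2):
    """Assumed co-training loop: each classifier pseudo-labels its k most
    confident pool items and hands them to the OTHER classifier's training set.
    `labeled` is a non-empty list of (text, label) pairs with label 0 or 1."""
    train_a, train_b, pool = list(labeled), list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        clf_a.fit(*zip(*train_a))
        clf_b.fit(*zip(*train_b))
        for teacher, student_train in ((clf_a, train_b), (clf_b, train_a)):
            # Confidence = distance of the spam probability from 0.5.
            ranked = sorted(pool,
                            key=lambda t: abs(teacher.predict_proba(t) - 0.5),
                            reverse=True)
            for text in ranked[:k]:
                label = int(teacher.predict_proba(text) >= 0.5)
                student_train.append((text, label))
                pool.remove(text)
    clf_a.fit(*zip(*train_a))
    clf_b.fit(*zip(*train_b))
    return clf_a, clf_b


def predict_spam(clf_a, clf_b, text):
    """Assumed combination rule: average both probabilities, threshold at 0.5."""
    return (clf_a.predict_proba(text) + clf_b.predict_proba(text)) / 2 >= 0.5
```

In use, the two stand-ins would be given different tokenizers (for example whole words versus word prefixes) so that they make different errors; with a machine learning library available, the same `fit` / `predict_proba` interface could wrap actual AdaBoost and SVM models.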

[0028] (1) Experimental data acquisit...

Abstract

The invention discloses a junk comment identification method based on collaborative training. Junk comments are divided into explicit junk comments and implicit junk comments. Explicit junk comments are screened out with a rule-based method. For implicit junk comments, an automatic identification method is adopted: two classifiers, AdaBoost and SVM, are trained to identify them, and Co-Training is then used to further judge whether a comment is a junk comment. The method thereby improves classification accuracy while preserving the classification efficiency of junk comment classification.
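
The two-stage structure described in the abstract (rules first, a learned classifier second) might look like the Python sketch below. The abstract does not list the actual rules, so the three patterns in `EXPLICIT_RULES` are purely hypothetical examples, and `implicit_classifier` is a placeholder callable standing in for the co-trained AdaBoost and SVM models.

```python
import re

# Hypothetical rules for "explicit" spam; the source does not specify them.
EXPLICIT_RULES = [
    re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE),  # embedded links
    re.compile(r"\d{7,}"),       # long digit runs, e.g. phone-like numbers
    re.compile(r"(.)\1{5,}"),    # one character repeated six or more times
]


def is_explicit_spam(comment):
    """Return True if any hand-written rule matches the comment."""
    return any(rule.search(comment) for rule in EXPLICIT_RULES)


def classify(comment, implicit_classifier):
    """Stage 1: rule-based screening. Stage 2: learned classifier.

    `implicit_classifier` is any callable returning True for spam; it is a
    placeholder for the trained models described in the abstract.
    """
    if is_explicit_spam(comment):
        return "explicit spam"
    return "implicit spam" if implicit_classifier(comment) else "normal"
```

Running the cheap rule check first means the learned classifiers only see comments that the rules could not decide, which is one plausible reading of how the method keeps classification efficient.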

Description

Technical field

[0001] The invention relates to the technical field of computer machine learning, in particular to a method for identifying spam comments based on collaborative training.

Background technique

[0002] Machine learning (ML) is a multi-field interdisciplinary subject that studies how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Data mining is one of the theoretical foundations for machine learning. Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world data. Comment-oriented data mining has long attracted the attention of researchers.

[0003] A social network is a social relationship network service built on a network platf...

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/289
Inventor: 李志欣, 兰丹媚, 张灿龙
Owner: GUANGXI NORMAL UNIV