Text multi-vector representation mutual learning-based spam comment filter method

A spam comment filtering method based on mutual learning of multiple text vector representations. It addresses the unsatisfactory performance of semi-supervised learning methods, improving classification performance and saving labor costs.

Active Publication Date: 2018-07-20
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although machine learning-based methods have lower labor costs, achieving good results also requires a l



Embodiment

[0039] Below, the present invention will be further described in detail by taking the spam comments on the Amazon website as an example.

[0040] Figure 1 shows the overall flowchart of the method. Comments on e-commerce products are usually short, but they are numerous, and spam such as irrelevant advertisements appears often. The present invention therefore provides a method for filtering spam comments on e-commerce products. It represents each text with multiple vector representations and builds multiple classifiers that reinforce one another: the classifiers are first trained on labeled training data, and unlabeled data is then used as an additional set from which the classifiers learn from each other. The final classifiers can be used as spam comment filters. This method improves classification accuracy and greatly reduces manual sample labeling. This example adopts ...

Abstract

The invention discloses a spam comment filtering method based on mutual learning of multiple text vector representations. The method comprises the following steps: first, preprocessing the comment texts of a training set and vectorizing each text with multiple different text vector representation methods; training multiple different classifiers, one per vector representation, using the same classifier training method; carrying out mutual learning among the classifiers following the idea of co-training, continuously labeling data from an additional set and moving the labeled data into the training set; and finally obtaining a plurality of classifiers with enhanced abilities. Because the method adopts co-training, only part of the data needs to be labeled manually and the remaining data can be labeled automatically, which removes a large amount of manual labeling work, strengthens the classifiers, and thereby improves spam comment filtering precision.
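
The steps in the abstract can be sketched as a small co-training loop. This is a minimal illustration, not the patented implementation: the two vector representations (word counts and character bigrams), the nearest-centroid classifier, the confidence margin, the sample comments, and every name and parameter below are assumptions chosen to keep the sketch dependency-free.

```python
import math
from collections import Counter


def word_view(text):
    """Assumed view 1: bag-of-words counts."""
    return Counter(text.lower().split())


def char_view(text):
    """Assumed view 2: character-bigram counts."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Nearest-centroid classifier over one view (a stand-in for the
    shared classifier training method, which the source does not name here)."""

    def __init__(self, view):
        self.view = view
        self.centroids = {}

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(self.view(text))
        return self

    def predict_with_margin(self, text):
        """Return (label, margin); margin is best score minus runner-up."""
        vec = self.view(text)
        scores = sorted(
            ((cosine(vec, c), label) for label, c in self.centroids.items()),
            reverse=True,
        )
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        return scores[0][1], scores[0][0] - runner_up


def co_train(labeled, unlabeled, views, rounds=5, per_round=1, min_margin=0.05):
    """Train one classifier per view, then let them label the additional set."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    classifiers = [CentroidClassifier(v).fit(texts, labels) for v in views]
    for _ in range(rounds):
        if not pool:
            break
        picked = {}  # text -> label from the first classifier that selected it
        for clf in classifiers:
            ranked = sorted(
                ((clf.predict_with_margin(t), t) for t in pool),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, margin), text in ranked[:per_round]:
                if margin >= min_margin and text not in picked:
                    picked[text] = label
        if not picked:
            break
        # Move the newly labeled comments from the additional set into the
        # shared training set, so each classifier learns from the others.
        for text, label in picked.items():
            texts.append(text)
            labels.append(label)
            pool.remove(text)
        classifiers = [CentroidClassifier(v).fit(texts, labels) for v in views]
    return classifiers, list(zip(texts, labels))


# Made-up sample comments, for illustration only.
LABELED = [
    ("buy cheap watches now", "spam"),
    ("click here for free money", "spam"),
    ("great book fast delivery", "ham"),
    ("the battery lasts long", "ham"),
]
UNLABELED = [
    "free cheap watches click here",
    "book arrived fast great read",
    "battery life is great",
]

classifiers, training = co_train(LABELED, UNLABELED, [word_view, char_view])
print(len(training), "training examples after co-training")
```

Each returned classifier can then score a new comment with `predict_with_margin`. Retraining every classifier on one shared, growing training set is what lets a confident prediction from one representation teach the others; a real system would likely swap in stronger vector representations and a stronger classifier.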

Description

Technical field

[0001] The invention relates to spam comment filtering technology, and in particular to a spam comment filtering method based on mutual learning of text multi-vector representations.

Background

[0002] With the rapid development of e-commerce, the number of online user reviews in major e-commerce and related fields has increased dramatically, and these reviews are important reference information for shoppers. In short, positive reviews strengthen consumers' purchase intention, while negative reviews greatly increase their doubts. As a result, many organizations and individuals on the Internet cheat through comments, creating spam comments to confuse the public and mislead users. Although most websites provide a "helpful" or "useful" voting mechanism for comments, very few votes are actually recorded. Therefore, it is particularly important to filter spam comments, purify the netw...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/35, G06F16/9535, G06F18/2411, G06F18/214
Inventor: 何克晶, 刘琰翔
Owner: SOUTH CHINA UNIV OF TECH