Method and system for filtering UGC (User Generated Content) spam based on user comments

A content filtering technology based on user comments, applied in the field of communication, that strikes at spam posts together with their comments, achieves high recognition accuracy, and improves the platform's ecological health and user experience

Active Publication Date: 2016-10-26
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Existing approaches are all based on the content itself or on the user dimension, so they cannot effectively combat the new forms of spam described in this document

Examples

Embodiment 1

[0045] The invention provides a method for filtering UGC junk content based on user comments, comprising the following steps:

[0046] S101. Collect comment data offline, perform feature extraction, train by machine learning methods, and establish a classification model;

[0047] Specifically: collect positive and negative samples of user comment data offline, where positive samples are spam comments, and negative samples are normal comments, and comment data includes but is not limited to text, pictures, videos, etc. Extract the corresponding features in the positive and negative samples, and train through the naive Bayesian algorithm;

[0048] Construct a classifier from the positive and negative samples, and learn the structure and the CPT (conditional probability table) of the positive and negative samples. For example, the features of spam comments in the positive samples include but are not limited to "skin" c1, "most beautiful woman" c2, "beauty" c3, "passion" c4, "agent" c5, "coupon" c6, "prize sale...
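
The training step described in [0046] to [0048] can be illustrated with a minimal sketch. It assumes comments are already tokenized into keywords and uses a multinomial naive Bayes model with Laplace smoothing; the patent text does not state the exact variant, so this is an assumption. The spam keywords are taken from the example above, while the normal-comment tokens, the function names `train_naive_bayes` and `classify`, and the `TRAINING` data are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(samples):
    """Count classes and tokens from (tokens, label) pairs.

    label is "spam" (positive sample) or "normal" (negative sample).
    """
    class_counts = Counter()
    token_counts = {"spam": Counter(), "normal": Counter()}
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in ("spam", "normal"):
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set: spam keywords follow the patent's example,
# normal-comment tokens are invented for illustration only.
TRAINING = [
    (["skin", "beauty", "coupon"], "spam"),
    (["agent", "coupon", "passion"], "spam"),
    (["nice", "photo", "thanks"], "normal"),
    (["great", "post", "thanks"], "normal"),
]
MODEL = train_naive_bayes(TRAINING)
```

Calling `classify(["coupon", "agent"], MODEL)` scores the comment against both classes and returns the more likely label. A production system would need far more samples and features for pictures and videos, which this sketch does not cover.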

Embodiment 2

[0083] This embodiment differs from Embodiment 1 as follows:

[0084] In this embodiment, the classification model is established by training on the samples with a neural network algorithm.

[0085] For example, only post data from the last 72 hours is cached. A is the posting user, and B, C, D, and E are all commenting users; B is a newly registered user, and the others are existing users. From B's registration information, login IP, and so on, it can be determined that B does not belong to the same cluster as A. C is the same real-world user as A; specifically, C is A's alternate account. The login locations of D and E are basically the same as A's, or are basically the same as C's, and D and E interact frequently with C but infrequently with A. All the content published by these users falls within 48 hours;

[0086] The text of A’s post is a piece...
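
The cluster relationships in the example of [0085] can be sketched as follows. The patent excerpt does not say which clustering algorithm is used, so this sketch substitutes a simple union-find over pairwise links as an assumption. The `LINKS` list encodes one reading of the example (C is A's alternate account; D and E share C's login location and interact with C frequently), and all names in the code are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure used here to group linked accounts."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen users as their own cluster root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def build_clusters(users, links):
    """Merge users that are connected by any link into one cluster."""
    uf = UnionFind()
    for user in users:
        uf.find(user)
    for a, b in links:
        uf.union(a, b)
    return uf

# Assumed links derived from the example: A-C are the same real-world user,
# D and E are tied to C by login location and frequent interaction.
# B (newly registered, different registration info and IP) has no links.
LINKS = [("A", "C"), ("D", "C"), ("E", "C")]
CLUSTERS = build_clusters("ABCDE", LINKS)

def same_cluster(u, v):
    return CLUSTERS.find(u) == CLUSTERS.find(v)
```

Under these assumed links, A, C, D, and E fall into one cluster even though D and E rarely interact with A directly, while B stays outside it.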

Embodiment 3

[0100] Correspondingly, as shown in Figure 5, the present invention also provides a UGC spam content filtering system based on user comments. The system includes a model building module, a clustering module, a judgment module, an acquisition module, a relationship building module and a storage module.

[0101] The model building module is used to collect comment data offline, perform feature extraction, train by machine learning methods, and establish a classification model;

[0102] Specifically: the model building module is used to collect positive and negative samples of user comment data offline, where the positive samples are spam comments and the negative samples are normal comments, and the comment data includes but is not limited to text, pictures, videos and other forms. The corresponding features in the positive and negative samples are extracted and trained through the naive Bayesian algorithm;

Abstract

This invention discloses a method for filtering UGC spam based on user comments. The method comprises the following steps: treating a user who publishes text data as a first user; treating a user who publishes comment data as a second user; judging whether the first user and the second user are in the same cluster; if so, analyzing the comment data and judging whether it is spam; and if so, deleting the user generated content, which contains the text data and all comment data. Accordingly, this invention further discloses a system for filtering UGC spam based on user comments. With the method and the system, the text and the comments can be struck together; spam that appears normal can be identified and struck forcefully; spam can be controlled effectively; and the ecological health and user experience of the platform can be improved.
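
The decision flow in the abstract can be sketched as a short filter. This is a minimal illustration, not the patented implementation: the `Post` type, the function names, and the demo predicates are hypothetical, and the cluster check and comment classifier are passed in as plain callables standing in for the clustering module and the trained classification model.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    author: str                                   # first user (publishes the text)
    text: str
    comments: list = field(default_factory=list)  # (second user, comment text) pairs

def should_delete(post, same_cluster, is_spam_comment):
    """True if any comment by a same-cluster user is judged to be spam."""
    for commenter, comment in post.comments:
        if same_cluster(post.author, commenter) and is_spam_comment(comment):
            return True
    return False

def filter_posts(posts, same_cluster, is_spam_comment):
    """Drop whole posts (text plus all comments) that trip the check."""
    return [p for p in posts
            if not should_delete(p, same_cluster, is_spam_comment)]

# Hypothetical stand-ins for the clustering and classification steps.
DEMO_CLUSTER = {"A", "C"}

def demo_same_cluster(u, v):
    return u in DEMO_CLUSTER and v in DEMO_CLUSTER

def demo_is_spam(comment):
    return "coupon" in comment.lower()

POSTS = [
    Post("A", "A harmless-looking post",
         [("B", "Nice!"), ("C", "Get your coupon here")]),
    Post("X", "Another post",
         [("Y", "Coupon codes rarely work for me")]),
]
KEPT = filter_posts(POSTS, demo_same_cluster, demo_is_spam)
```

In this demo the first post is removed because the spam-like comment comes from an account in the author's own cluster, while the second post survives: its comment mentions the same keyword, but the commenter is not clustered with the author.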

Description

Technical field

[0001] The invention relates to the field of communication technology, in particular to a method and system for filtering UGC spam content based on user comments.

Background technique

[0002] UGC (User Generated Content) is an idea emerging in current international mainstream media: under the norms and guidance of editors, the content generation process is handed over to users, so that users gain the right to speak. As every UGC content platform develops and grows, it is bound to be accompanied by the breeding and growth of spam content. This spam mainly includes pornography, advertisements, fraud and so on. If it cannot be effectively controlled, the proliferation of spam content will seriously affect the platform's ecological health and user experience, and may even endanger the platform's survival.

[0003] Based on the text content, through machine learning methods, the text is classified and the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9535
Inventor: 梁传明漆仁尹鹏达刘雪飘
Owner: TENCENT TECH (SHENZHEN) CO LTD