Method and system for filtering UGC (User Generated Content) spam based on user comments
A content filtering and user commenting technology, applied in the field of communication, that addresses problems such as strikes, achieves high recognition accuracy, and improves ecosystem health and user experience
Examples
Embodiment 1
[0045] The invention provides a method for filtering UGC junk content based on user comments, comprising the following steps:
[0046] S101. Collect comment data offline, perform feature extraction, train with a machine learning method, and establish a classification model;
[0047] Specifically: collect positive and negative samples of user comment data offline, where positive samples are spam comments and negative samples are normal comments. Comment data includes but is not limited to text, pictures, videos, etc. Extract the corresponding features from the positive and negative samples, and train with the naive Bayes algorithm;
[0048] Construct a classifier from the positive and negative samples, and learn the structure and conditional probability table (CPT) of the positive and negative samples. For example, the features of spam comments in the positive samples include but are not limited to "skin" c1, "most beautiful woman" c2, "beauty" c3, "passion" c4, "agent" c5, "coupon" c6, "prize sale...
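The training step in S101 can be illustrated with a minimal sketch of a multinomial naive Bayes text classifier. This is an assumption-laden illustration, not the patent's implementation: the function names `train_naive_bayes` and `classify`, the "spam"/"normal" labels, the add-one (Laplace) smoothing, and the toy samples are all hypothetical, and only text features are handled.

```python
import math
from collections import Counter


def train_naive_bayes(samples):
    """Count documents and tokens per class.

    samples: list of (tokens, label) pairs, label is "spam" or "normal".
    (Hypothetical input format, chosen for illustration only.)
    """
    doc_counts = Counter()
    word_counts = {"spam": Counter(), "normal": Counter()}
    vocab = set()
    for tokens, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the higher log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior: share of training comments carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore tokens never seen during training
            # Add-one smoothing keeps unseen (token, label) pairs non-zero.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data; the spam tokens reuse keywords named in the text.
samples = [
    (["skin", "beauty", "coupon"], "spam"),
    (["agent", "coupon", "passion"], "spam"),
    (["nice", "photo", "thanks"], "normal"),
    (["great", "post", "thanks"], "normal"),
]
model = train_naive_bayes(samples)
```

With this toy model, `classify(["coupon", "agent"], model)` scores higher for "spam" and `classify(["thanks", "photo"], model)` scores higher for "normal". A production system would need a real tokenizer and far more training data.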
Embodiment 2
[0083] This embodiment differs from Embodiment 1 as follows:
[0084] In the embodiment of the present invention, when training samples, a neural network algorithm is used for training to establish a classification model.
[0085] For example, only post data from the last 72 hours is cached. A is the posting user; B, C, D, and E are all commenting users, of whom B is a newly registered user and the others are existing users. Based on B's registration information, login IP, and so on, it can be determined that B does not belong to the same cluster as A. C and A are the same person in the real world; specifically, C is A's alternate (sockpuppet) account. The login locations of D and E are basically the same as that of A, or the login locations of D and E are basically the same as that of C; D and E interact frequently with C but infrequently with A. All content published by these users falls within 48 hours;
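The cluster relationships in this example can be sketched as a grouping of users who share a login location. The text does not state which clustering algorithm is used, so the union-find approach below, the function names `cluster_users` and `same_cluster`, and the location keys such as "loc1" are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_users(logins):
    """Group users that share at least one login key (IP or coarse location).

    logins: dict mapping user id -> set of login keys (hypothetical format).
    Returns a list of sets, one set of user ids per cluster.
    """
    parent = {user: user for user in logins}

    def find(user):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # login key -> first user observed with that key
    for user, keys in logins.items():
        for key in keys:
            if key in first_seen:
                union(first_seen[key], user)
            else:
                first_seen[key] = user

    clusters = defaultdict(set)
    for user in logins:
        clusters[find(user)].add(user)
    return list(clusters.values())


def same_cluster(clusters, a, b):
    """True if users a and b fall in the same cluster."""
    return any(a in group and b in group for group in clusters)


# Mirrors the example: C shares a location with A; D matches A; E matches C;
# B logs in from an unrelated location.
logins = {
    "A": {"loc1"},
    "B": {"loc9"},
    "C": {"loc1", "loc2"},
    "D": {"loc1"},
    "E": {"loc2"},
}
clusters = cluster_users(logins)
```

Here A, C, D, and E end up in one cluster and B in its own. Exact key matching stands in for the "basically the same" location comparison described in the text; a real system would likely use a tolerance on location plus the registration and interaction signals mentioned above.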
[0086] The text of A’s post is a piece...
Embodiment 3
[0100] Correspondingly, as shown in Figure 5, the present invention also provides a system for filtering UGC junk content based on user comments. The system includes a model building module, a clustering module, a judgment module, an acquisition module, a relationship building module, and a storage module.
[0101] The model building module is used to collect comment data offline, perform feature extraction, train with a machine learning method, and establish a classification model;
[0102] Specifically: the model building module is used to collect positive and negative samples of user comment data offline, where the positive samples are spam comments and the negative samples are normal comments. Comment data includes but is not limited to text, pictures, videos, and other forms. The module extracts the corresponding features from the positive and negative samples and trains with the naive Bayes algorithm;