Deceptive junk comment detection method oriented to user generated contents

A detection method oriented to user-generated content, applied in the information field. It addresses the difficulty of discriminating deceptive spam opinions, repetition, and low detection accuracy, and it improves adaptability while ensuring accuracy.

Active Publication Date: 2014-06-11
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
Cites: 3; Cited by: 10

AI Technical Summary

Problems solved by technology

[0006] In general, untrustworthy opinions are hard to identify: their characteristics are not obvious, and there is no clear, general, and operable criterion for judging untrustworthy spam opinions, so they are difficult to distinguish. Detecting spam opinions in user-generated content, especially untrustworthy opinions, is therefore considerably more difficult than traditional spam page and spam detection.
The problem of accurately detecting deceptive spam opinions in user-generated content has not yet been effectively solved.

Embodiment Construction

[0037] The specific embodiment of the present invention is shown in Figure 1. Each step is described in detail below.

[0038] (1) Generating the opinion set

[0039] For a specific user-generated content source (such as a particular forum), run Internet crawler software to crawl the opinion information it contains, then preprocess that information (metadata extraction such as the author of the webpage, text extraction, word segmentation, part-of-speech tagging, named entity extraction, sentence statistics, paragraph statistics, punctuation statistics, etc.) to form a user opinion information set.
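A minimal sketch of the preprocessing in step (1), assuming the crawler has already produced `(author, text)` pairs. Regex tokenization stands in for real word segmentation (with this regex a run of Chinese characters becomes a single token, so Chinese text would need a dedicated segmenter), and part-of-speech tagging and named entity extraction are omitted. `Opinion`, `preprocess` and `build_opinion_set` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Opinion:
    """One crawled opinion plus the statistics gathered during preprocessing."""
    author: str
    text: str
    tokens: list = field(default_factory=list)
    sentence_count: int = 0
    paragraph_count: int = 0
    punctuation_count: int = 0


# Assumed sentence terminators: ASCII plus common full-width Chinese forms.
SENTENCE_END = re.compile(r"[.!?。！？]+")
# Any character that is neither a word character nor whitespace counts as punctuation.
PUNCTUATION = re.compile(r"[^\w\s]")


def preprocess(author: str, raw_text: str) -> Opinion:
    """Extract tokens and sentence/paragraph/punctuation statistics from one opinion."""
    text = raw_text.strip()
    paragraphs = [p for p in text.split("\n") if p.strip()]
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    # Word-character runs stand in for real word segmentation.
    tokens = re.findall(r"\w+", text)
    return Opinion(
        author=author,
        text=text,
        tokens=tokens,
        sentence_count=len(sentences),
        paragraph_count=len(paragraphs),
        punctuation_count=len(PUNCTUATION.findall(text)),
    )


def build_opinion_set(pages):
    """pages: iterable of (author, text) pairs already produced by a crawler."""
    return [preprocess(author, text) for author, text in pages]
```

For example, `preprocess("alice", "Great phone! Best ever.\nBuy it now!!")` yields two paragraphs, three sentences, seven tokens and four punctuation marks.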

[0040] (2) Labeling deceptive spam comments

[0041] The purpose of deceptive spam opinions is to unrealistically raise or lower the image of specific objects such as websites, web pages, products, and people. Their typical manifestation is therefore to give specific objects, events, people, etc. Super hig...

Abstract

The invention discloses a deceptive junk comment detection method oriented to user-generated content. The method comprises the steps of: 1) establishing a user comment information set from crawled webpages and clustering it to obtain a plurality of information regions; 2) calculating a label vector for each information region and sampling to obtain its sample set; 3) labeling the samples in each sample set to obtain a deceptive junk comment sample set and an unlabeled comment information sample set for each information region; 4) for each sample, searching for the P samples most similar to it in the sample set of each information region and calculating the sample's final feature vector; 5) selecting a machine learning method and building a deceptive junk comment detection model for each information region based on the final feature vector of each sample; and 6) using the deceptive junk comment detection models to detect deceptive junk comments on newly crawled user-generated content webpages. The method improves the efficiency of detecting deceptive junk comments.
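The six steps can be sketched end to end as below. This is a minimal, hypothetical illustration, not the patented implementation. The abstract names neither the clustering algorithm nor the machine learning method, so cosine k-means and a nearest-centroid classifier stand in for them. Steps 2 and 3 (label vectors, sampling, manual labeling) are skipped by assuming labels are already available. The final feature vector (the sample followed by, for each region, the mean of its P most similar samples there) is only one reading of step 4. Routing a new opinion to the region with the most similar center in step 6 is likewise an assumption, and all function names are illustrative.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(x, centers):
    """Index of the center most cosine-similar to x."""
    return max(range(len(centers)), key=lambda c: cosine(x, centers[c]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Step 1 stand-in: cluster opinion vectors into k information regions."""
    centers = random.Random(seed).sample(vectors, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        # Keep the previous center if a region ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def final_feature(x, region_sets, p):
    """Step 4 (assumed form): x, then per region the mean of its P samples most similar to x."""
    feature = list(x)
    for samples in region_sets:
        others = [s for s in samples if s is not x]  # never count x as its own neighbor
        top = sorted(others, key=lambda s: cosine(x, s), reverse=True)[:p]
        feature += mean(top) if top else [0.0] * len(x)
    return feature


class NearestCentroid:
    """Step 5 stand-in for the unspecified machine learning method."""

    def fit(self, features, labels):
        by_label = {}
        for f, y in zip(features, labels):
            by_label.setdefault(y, []).append(f)
        self.centroids = {y: mean(fs) for y, fs in by_label.items()}
        return self

    def predict(self, f):
        return max(self.centroids, key=lambda y: cosine(f, self.centroids[y]))


def build_detector(vectors, labels, k, p):
    """Steps 1, 4 and 5: regions, final feature vectors, one model per region."""
    centers = kmeans(vectors, k)
    regions = {}
    for v, y in zip(vectors, labels):
        regions.setdefault(nearest(v, centers), []).append((v, y))
    region_sets = [[v for v, _ in regions[r]] for r in sorted(regions)]
    models = {}
    for r, items in regions.items():
        feats = [final_feature(v, region_sets, p) for v, _ in items]
        models[r] = NearestCentroid().fit(feats, [y for _, y in items])
    return centers, region_sets, models


def detect(x, centers, region_sets, models, p):
    """Step 6: route a new opinion to its most similar trained region, then classify it."""
    r = max(models, key=lambda c: cosine(x, centers[c]))
    return models[r].predict(final_feature(x, region_sets, p))
```

For example, training on `[[1, 0], [1, 0.1], [0, 1], [0.1, 1]]` labeled normal, normal, spam, spam with `k=1, p=1` and then calling `detect([0, 2], centers, region_sets, models, p=1)` returns `"spam"`. Keeping one model per region mirrors step 5, so each region's classifier only has to separate spam from normal opinions within one topic cluster.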

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a deceptive spam opinion detection method oriented to user-generated content. It is mainly used in Internet information monitoring, information early warning, sentiment analysis, information filtering, vertical search, and similar fields.

Background art

[0002] With the rise of Web 2.0 applications, Internet users can use BBS forums, blogs, microblogs, social networking sites, and other venues to express their concerns and opinions about Internet information and topics of interest by browsing, forwarding, commenting, and posting blog articles and comments. They can also communicate with relevant people. In this mode, the views, opinions, and other content generated by users are called User-Generated Content. It has been recognized that mining this user-generated content can yield valuable knowledge that can be applied in many ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/95
Inventors: Yang Fenglei (杨风雷), Li Jianhui (黎建辉)
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI