Deceptive junk comment detection method oriented to user generated contents

A detection method oriented to user-generated content, applied in the information technology field. It addresses the problems of difficult discrimination, repetition, and low detection accuracy for deceptive spam opinions, improving adaptability and ensuring accuracy.

Active Publication Date: 2014-06-11

COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

3 Cites, 10 Cited by

Problems solved by technology

[0006] In general, untrustworthy opinions are hard to identify: their characteristics are not obvious, and there is no clear, general, and operable criterion for judging them. As a result, detecting spam opinions in user-generated content, especially untrustworthy opinions, is relatively more difficult than traditional spam page and spam detection.

The problem of how to accurately detect deceptive spam opinions in user-generated content has not been effectively solved.

Approaches that treat duplicate opinions as deceptive spam have two important problems: (1) duplicate opinions are not necessarily deceptive spam opinions; (2) not all deceptive spam opinions are duplicates.
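The gap between "duplicate" and "deceptive" can be seen in a minimal sketch of a duplicate-based detector. Everything here is an illustrative assumption and is not taken from the patent: the word-shingle size, the Jaccard threshold of 0.8, and the sample opinions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(opinions, threshold=0.8):
    """Return index pairs whose shingle sets overlap at or above threshold."""
    sets = [shingles(o) for o in opinions]
    return [
        (i, j)
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Hypothetical opinions: two identical posts and one unique promotional post.
opinions = [
    "great phone battery lasts all day",
    "great phone battery lasts all day",
    "best product ever buy it now you will not regret it",
]
pairs = near_duplicates(opinions)
print(pairs)
```

Only the two identical posts are paired, although they could simply be an accidental double submission. The third post is unique, so a duplicate-based detector never flags it, whatever its intent.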

In addition, the above research work does not consider whether the samples used to build the classifier are representative, whether the features extracted from the samples are comprehensive and accurate, or whether the resulting model is adaptable.

These shortcomings may partly explain why such methods detect deceptive spam opinions with relatively low accuracy.

Embodiment Construction

[0037] A specific embodiment of the present invention is shown in Figure 1. Each step is described in detail below.

[0038] (1) Opinion set generation

[0039] For a specific user-generated content source (such as a certain forum), start the Internet crawler software to crawl the opinion information in it, and preprocess that information (including metadata extraction such as the author of the web page information, text extraction, word segmentation, part-of-speech tagging, named entity extraction, sentence statistics, paragraph statistics, punctuation statistics, etc.) to form a user opinion information set.
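A minimal sketch of the statistics part of this preprocessing follows. It rests on several assumptions: the function name `opinion_statistics`, the record layout, and the regular expressions are hypothetical, and the tokenization suits space-delimited text only. Word segmentation for Chinese, part-of-speech tagging, and named entity extraction would need dedicated tools and are left out.

```python
import re
import string


def opinion_statistics(author, text):
    """Build one record of the opinion set from already-extracted page text."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences end at runs of '.', '!' or '?' (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Simple word tokens; not a substitute for real word segmentation.
    tokens = re.findall(r"\w+", text)
    # Counts ASCII punctuation only.
    punctuation = sum(1 for ch in text if ch in string.punctuation)
    return {
        "author": author,
        "tokens": tokens,
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
        "punctuation_count": punctuation,
    }


# Hypothetical crawled opinion.
record = opinion_statistics(
    "user42", "Great phone! Battery lasts all day.\n\nWould buy again."
)
print(record)
```

Each crawled opinion yields one such record, and the list of records plays the role of the user opinion information set.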

[0040] (2) Marking of deceptive spam comments

[0041] The purpose of deceptive spam opinions is to unrealistically raise or lower the image of specific objects such as websites, web pages, products, or people. Concretely, this shows up as giving specific objects, events, people, etc. super hig...

Abstract

The invention discloses a deceptive junk comment detection method oriented to user generated contents. The method comprises the steps of 1) establishing a user comment information set for crawled webpages, and clustering to obtain a plurality of information regions, 2) calculating a label vector of each information region and sampling to obtain a sample set thereof, 3) labeling samples in each sample set to obtain a deceptive junk comment sample set and an unlabeled comment information sample set of each information region, 4) for each sample, searching for P samples most similar to the sample in the sample set of each information region and calculating the final feature vector of the sample, 5) selecting a machine learning method and establishing a deceptive junk comment detection model for each information region based on the final feature vector of each sample, and 6) performing deceptive junk comment detection on a newly crawled user generated content webpage by virtue of the deceptive junk comment detection model. The method improves the efficiency of detecting the deceptive junk comments.
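Step 4 of the abstract can be sketched as follows. The abstract does not say which similarity measure is used or how the final feature vector is computed, so this sketch assumes cosine similarity and assumes the final vector is the sample's own features concatenated with the mean of its P nearest neighbours. The function names and the sample data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_p_similar(index, vectors, p):
    """Indices of the p vectors most similar to vectors[index], excluding itself."""
    others = [j for j in range(len(vectors)) if j != index]
    others.sort(key=lambda j: cosine(vectors[index], vectors[j]), reverse=True)
    return others[:p]


def final_feature_vector(index, vectors, p):
    """ASSUMPTION: own features followed by the mean of the P neighbours."""
    neighbours = top_p_similar(index, vectors, p)
    dims = len(vectors[index])
    if not neighbours:
        return list(vectors[index]) + [0.0] * dims
    mean = [
        sum(vectors[j][d] for j in neighbours) / len(neighbours)
        for d in range(dims)
    ]
    return list(vectors[index]) + mean


# Hypothetical feature vectors for the samples of one information region.
region_samples = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.8, 0.2],
]
neighbours = top_p_similar(0, region_samples, p=2)
final_vector = final_feature_vector(0, region_samples, p=2)
print(neighbours, final_vector)
```

Under these assumptions, the final vectors of one region's samples would then be passed to whichever machine learning method is selected in step 5, with one model trained per information region.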

Description

Technical field

[0001] The invention belongs to the field of information technology and in particular relates to a method for detecting deceptive spam opinions in user-generated content. It is mainly used in Internet information monitoring, information early warning, sentiment analysis, information filtering, vertical search, and similar fields.

Background

[0002] With the rise of Web 2.0 applications, Internet users can use BBS forums, blogs, microblogs, social networking sites, and similar venues to express their concerns and opinions on topics of interest by browsing, forwarding, commenting, and posting blog articles and comments, and can also communicate with relevant people. The views, opinions, and other content that users generate in this mode are called user-generated content. It has been recognized that mining this user-generated content can yield valuable knowledge that can be applied in many ...

Application Information

IPC(8): G06F17/30

CPC: G06F16/95

Inventor: 杨风雷, 黎建辉

Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
