Similarity calculation-based junk comment detection method

A technology combining similarity calculation and spam comment detection, applied in computing, unstructured text data retrieval, natural language data processing, and related fields. It addresses problems such as the uneven quality of user comment text, increased information-mining costs, and the confusion, or even misdirection, that low-quality comments cause.

Active Publication Date: 2017-05-24
CHINA JILIANG UNIV
Cites: 5; Cited by: 36

AI Technical Summary

Problems solved by technology

[0006] The uneven quality of user comments confuses, and can even mislead, consumers who browse reviews and companies that research reviews to gather product-related information. It also increases the cost of information mining and reduces the accuracy of automatic mining tools.

Examples

Embodiment Construction

[0046] The present invention is further described below with reference to the accompanying drawings.

[0047] The invention takes user comments on online platforms such as forums and e-commerce sites as its research object. It aims to detect spam among these comments, improve the quality of comment texts, and reduce the cost of automatic mining tools.

[0048] The spam review detection method based on similarity calculation comprises five steps, as shown in Figure 1: data acquisition, false review detection, duplicate review detection, product feature dictionary construction, and irrelevant review detection. The five steps are described in detail below.
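Of the five steps, false review detection is the one the source reduces to an explicit rule: the abstract defines T as the gap between commenting time and purchase time and flags a comment when T is less than the cargo transport time. The Python sketch below applies that rule; the `Comment` record and its fields are hypothetical, and the transport time is assumed to be a caller-supplied constant, since the source does not say how it is obtained for a given product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    # Hypothetical record layout; the source does not specify a schema.
    comment_id: int
    text: str
    purchase_time: datetime
    comment_time: datetime


def is_false_review(comment: Comment, transport_time: timedelta) -> bool:
    """Flag a review posted before the goods could plausibly have arrived.

    T is the gap between commenting and purchase; if T is shorter than the
    cargo transport time, the reviewer cannot yet have received the product.
    """
    t = comment.comment_time - comment.purchase_time
    return t < transport_time


def detect_false_reviews(comments: list[Comment], transport_time: timedelta) -> list[int]:
    """Return the ids of comments flagged by the time-difference rule."""
    return [c.comment_id for c in comments if is_false_review(c, transport_time)]


if __name__ == "__main__":
    bought = datetime(2017, 5, 1, 10, 0)
    comments = [
        Comment(1, "Great fit, arrived fast.", bought, bought + timedelta(days=3)),
        Comment(2, "Excellent quality!", bought, bought + timedelta(hours=2)),
    ]
    # Assumed fixed two-day transport time for this product (illustrative only).
    print(detect_false_reviews(comments, timedelta(days=2)))  # [2]
```

A comment posted exactly when the transport time elapses is not flagged, because the rule uses a strict "less than" comparison.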

[0049] 1. Data acquisition: use a web crawler to crawl forum and e-commerce webpages related to the specified product, extract the comment data from those webpages, and save it to the database.

[0050] The data acquisition process is shown in Figure 2. First, call the Baidu search interface to search fo...
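The extract-and-store part of step 1 can be sketched with the Python standard library. This assumes the pages have already been downloaded (neither the crawler nor the Baidu search call is modeled) and, hypothetically, that each comment sits in a `<div class="comment">` element; real forum and e-commerce sites each need their own extraction rules. `CommentExtractor`, `save_comments`, and the table layout are illustrative names, not the patent's.

```python
import sqlite3
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text of every <div class="comment"> element.

    The class name "comment" is a hypothetical placeholder for the markup
    of a real page.
    """

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []
        self._depth = 0  # div nesting depth inside a comment (0 = outside)
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a comment
        elif ("class", "comment") in attrs:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # closed the outer comment div
                self.comments.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def save_comments(conn: sqlite3.Connection, product: str, comments: list[str]) -> None:
    """Store extracted comments in a simple (hypothetical) comments table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id INTEGER PRIMARY KEY, product TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO comments (product, body) VALUES (?, ?)",
        [(product, body) for body in comments],
    )
    conn.commit()


if __name__ == "__main__":
    page = """
    <html><body>
      <div class="comment">Battery lasts two days.</div>
      <div class="ad">Buy now!</div>
      <div class="comment">Screen is <b>very</b> bright.</div>
    </body></html>
    """
    parser = CommentExtractor()
    parser.feed(page)
    parser.close()
    conn = sqlite3.connect(":memory:")
    save_comments(conn, "phone-x", parser.comments)
    print(conn.execute("SELECT body FROM comments ORDER BY id").fetchall())
    # [('Battery lasts two days.',), ('Screen is very bright.',)]
```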

Abstract

The invention provides a similarity calculation-based junk comment detection method comprising the following steps: (1) data acquisition: crawling forum, e-commerce, and similar webpages related to a specified product with a web crawler, extracting the comment data from those pages, and storing it in a database; (2) false comment detection: calculating the time difference T between the commenting time and the purchase time, and determining that a comment is false if T is less than the cargo transport time; (3) repeated comment detection: performing word segmentation on each comment and then calculating the similarity between comments, and when the similarity between two comments is higher than a threshold, determining that the two are repeated comments; (4) data processing: processing the comments through syntactic analysis, sentiment annotation, feature-word extraction, and the like, and constructing a product feature dictionary from the product specification; and (5) unrelated comment detection: determining whether the target of each comment is the target product or one of its features, and detecting the comments unrelated to the target product. The invention thus provides a new junk comment detection method.
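Steps 3 and 5 of the abstract can be sketched as follows. The abstract names neither a similarity measure nor a threshold value, so this sketch swaps in cosine similarity over term-frequency vectors with a hypothetical threshold of 0.8, and uses whitespace splitting as a stand-in for word segmentation (Chinese comments would need a real segmenter). The unrelated-comment check is simplified to asking whether a comment mentions any term from the product feature dictionary; the patent's actual target-matching calculation may differ.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Stand-in for word segmentation: lowercase and split on whitespace.
    # Chinese comments would need a proper segmenter here.
    return text.lower().split()


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty comment is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_repeated(comments: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Step 3: index pairs whose similarity is higher than the threshold.

    The 0.8 default is a hypothetical value, not taken from the patent.
    """
    vectors = [Counter(tokenize(c)) for c in comments]
    return [
        (i, j)
        for i, j in combinations(range(len(comments)), 2)
        if cosine_similarity(vectors[i], vectors[j]) > threshold
    ]


def find_unrelated(comments: list[str], feature_dict: set[str]) -> list[int]:
    """Step 5 (simplified): comments that mention no product or feature term."""
    return [
        i
        for i, c in enumerate(comments)
        if not feature_dict.intersection(tokenize(c))
    ]


if __name__ == "__main__":
    comments = [
        "battery life is great and the screen is bright",
        "the screen is bright and battery life is great",
        "fast delivery friendly courier",
        "battery drains fast",
    ]
    # Hypothetical feature dictionary; step 4 would build it from the product specification.
    features = {"battery", "screen", "camera"}
    print(find_repeated(comments))  # [(0, 1)]
    print(find_unrelated(comments, features))  # [2]
```

Comparing every pair of comments grows quadratically with the number of comments, so a large collection would likely need a candidate-filtering scheme such as MinHash; the source does not discuss this.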

Description

[0001] Technical field

[0002] The invention belongs to the field of natural language processing applied to user comments, and in particular relates to a spam comment detection method based on similarity calculation.

[0003] Background art

[0004] With the advent of the mobile Internet era and the increasingly mature construction of the Internet of Things, online shopping has gradually become a new and important mode of consumption, and more and more user-generated content appears in online applications. After shopping, most consumers post their shopping experience, usage experience, and opinions about products on the e-commerce platform. On the one hand, according to a 2011 survey report by Cone Corporation of the United States, 64% of users consult existing user reviews before purchasing, so the reviews users publish influence the purchasing behavior of potential consumers and carry commercial value; on the other hand, user...

Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/30; G06F17/27
CPC: G06F16/3344; G06F16/335; G06F16/36; G06F16/9535; G06F40/242; G06F40/211; G06F40/205; Y02D10/00
Inventors: 徐新胜, 袁俊, 林静, 文超
Owner: CHINA JILIANG UNIV