Method for filtering comment spam based on bidirectional iteration and automatically constructed and updated corpus

A spam comment filtering and automatic corpus construction technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses problems such as dependence on limited word-similarity calculation results, and achieves accurate and efficient judgment while reducing labor costs.

Inactive Publication Date: 2015-11-18
ZHEJIANG SCI-TECH UNIV
AI Technical Summary

Problems solved by technology

However, this prior method also relies on the accuracy of the spam comments filtered out by the rules, and is limited by HowNet's word similarity calculation results.

Examples

Embodiment Construction

[0064] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0065] As shown in Figure 1, the spam comment filtering method of this embodiment, based on two-way iteration and an automatically constructed and updated corpus, includes the following steps:

[0066] (1) Obtain product review texts to build a corpus S, and initially partition the corpus to form a normal review text set Z_cand and a spam review text set L_seed.

[0067] The method for filtering spam comments in this embodiment has no special requirements on the number of product categories and the number of comment texts in the corpus S. For the convenience of implementation, this embodiment first captures 2500 mobile phone product review texts from e-commerce websites to form a corpus S, that is, the number of product categories is 1, and the number of review texts is 2500.

[0068] In this embodiment, regular expressions are used to identi...
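The rule-based initial partition of step (1) can be sketched roughly as follows. This is only a minimal illustration: the source does not list its actual rules here, so the two patterns in `SPAM_PATTERNS`, the function name `initial_partition`, and the sample reviews are all hypothetical.

```python
import re

# Hypothetical rule patterns; the embodiment's real regular expressions are not shown.
SPAM_PATTERNS = [
    re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE),  # embedded links
    re.compile(r"\b\d{7,}\b"),                            # long digit runs, e.g. contact numbers
]

def initial_partition(corpus):
    """Split corpus S into (Z_cand, L_seed) by matching the rule patterns."""
    z_cand, l_seed = [], []
    for text in corpus:
        if any(p.search(text) for p in SPAM_PATTERNS):
            l_seed.append(text)   # matched a rule: seed spam text
        else:
            z_cand.append(text)   # no rule matched: candidate normal text
    return z_cand, l_seed

# Made-up sample reviews standing in for corpus S.
S = [
    "Battery lasts two days, very satisfied.",
    "Cheap phones at www.example.com, call 13800000000",
    "Screen is sharp but the camera is average.",
]
Z_cand, L_seed = initial_partition(S)
```

Texts that match no rule land in Z_cand only as candidates; they may still be spam, which is why the later Bayesian iterations re-judge both sets.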

Abstract

The invention discloses a method for filtering comment spam based on bidirectional iteration and an automatically constructed and updated corpus. The method comprises the following steps of obtaining a comment text of a product to construct the corpus, and carrying out initial partition on the corpus to form a comment spam text set and a normal comment text set; utilizing a Bayesian filter to carry out comment spam judgment on the comment texts in the normal comment text set and the comment spam text set, and updating the comment spam text set and the normal comment text set; and utilizing the Bayesian filter to iteratively carry out comment spam judgment until results, which are obtained through adjacent iterations, no longer change, and judging the comment text in the comment spam text set, which is obtained at the last iteration, as the comment spam text. According to the method for filtering the comment spam, the Bayesian filter is utilized to iteratively carry out comment spam judgment, so that the new comment spam text can be automatically recognized, and the comment spam text set and the normal comment text set can be automatically updated, and thus a more accurate judging result is obtained.
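The iterative judgment described in the abstract can be sketched as below. This is a simplified illustration under stated assumptions, not the patented implementation: the whitespace tokenizer (Chinese reviews would need word segmentation), the multinomial naive Bayes model with add-one smoothing, the `max_iter` cap, and all function names are assumptions introduced for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokens; real review text would need word segmentation.
    return text.lower().split()

def train(normal, spam):
    """Count words per class and compute smoothed class priors."""
    counts = {"normal": Counter(), "spam": Counter()}
    for t in normal:
        counts["normal"].update(tokenize(t))
    for t in spam:
        counts["spam"].update(tokenize(t))
    vocab = set(counts["normal"]) | set(counts["spam"])
    total = len(normal) + len(spam)
    priors = {"normal": (len(normal) + 1) / (total + 2),
              "spam": (len(spam) + 1) / (total + 2)}
    return counts, vocab, priors

def is_spam(text, model):
    """Compare log-posteriors of the two classes with add-one smoothing."""
    counts, vocab, priors = model
    scores = {}
    for label in ("normal", "spam"):
        n_words = sum(counts[label].values())
        score = math.log(priors[label])
        for w in tokenize(text):
            score += math.log((counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return scores["spam"] > scores["normal"]

def iterate_filter(z_cand, l_seed, max_iter=50):
    """Re-judge both sets each round until the spam set stops changing."""
    normal, spam = list(z_cand), list(l_seed)
    for _ in range(max_iter):  # max_iter is a safety cap, an assumption
        model = train(normal, spam)
        new_normal, new_spam = [], []
        for t in normal + spam:  # texts may move in either direction
            (new_spam if is_spam(t, model) else new_normal).append(t)
        if set(new_spam) == set(spam):  # adjacent iterations agree
            break
        normal, spam = new_normal, new_spam
    return normal, spam

# Made-up initial partition for illustration.
Z = ["great phone battery lasts long", "screen is sharp and bright",
     "visit our shop for cheap deals", "camera is average but fine"]
L = ["cheap deals visit our shop now", "click link for cheap phones"]
normal, spam = iterate_filter(Z, L)
```

Because every text in both sets is re-judged in each round, a text can move from the normal set to the spam set or back, which is one plausible reading of the "bidirectional" update.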

Description

Technical field

[0001] The invention relates to the technical field of filtering spam comments from user comment texts on e-commerce websites, and in particular to a spam comment filtering method based on two-way iteration and an automatically constructed and updated corpus.

Background

[0002] With the popularization of the Internet and the rapid development of e-commerce applications, user comment data on the network is increasing exponentially. This massive body of user comment data contains a lot of valuable information, which can bring huge commercial value.

[0003] However, at the same time, phenomena such as online fraud and fake reputation building have also emerged. Review data often contains a large number of spam comments, such as merchant advertisements, reputation-boosting comments, and malicious reviews, making it impossible for users to obtain true evaluations of products and sellers. This also seriously hinders information mining, and may even lead to wrong mining results. Therefore...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 张宇, 刘妙
Owner: ZHEJIANG SCI-TECH UNIV