A method for automatically generating a data set of fake Chinese product reviews

A technology for automatically generating a data set of fake product reviews, applied in electronic digital data processing, natural language data processing, special data processing applications, etc. It addresses the heavy labor cost of manual labeling and offers low computational complexity and easy programmatic implementation.

Active Publication Date: 2019-01-18
ZHEJIANG GONGSHANG UNIVERSITY
4 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for automatically generating a data set of fake Chinese product reviews. The method overcomes the heavy manpower cost of the existing manual labeling approach and provides a data set for training the models used by methods that identify fake Chinese product reviews.

Embodiment Construction

[0017] Some of the proposals in the embodiments of the present invention are based on the inventors' finding that the low degree of automation in the prior art has the following causes:

[0018] First, both genuine and fake reviews are written and published by real people. To make fake reviews convincing, their authors typically plan the content in advance and imitate genuine reviews as closely as possible. Content analysis alone therefore reveals no obvious difference between fake and genuine reviews, which makes it very difficult for machines to detect and identify them. In addition, most product reviews on e-commerce websites are short and carry relatively little feature information, which further increases the difficulty of automatic identification.

[0019] Secondly, on the basis of analyzing the review content, although the characteristic information of the revie...

Abstract

The invention discloses a method for automatically generating a data set of fake Chinese product reviews, comprising the following steps: A) reading the review data collected in advance into memory; B) segmenting each review with a word segmentation tool to obtain a word-sequence representation of the review text; C) comparing the text similarity between pairs of product reviews within a given range to obtain a fake product review set Rf1; D) executing an associated query on the reviews in Rf1 to obtain a fake product review result set R1; E) extracting the names of the reviewers corresponding to the product reviews within a given range; F) analyzing the reviewer names, finding series of reviewer names that conform to a certain regular pattern, and executing an associated query on these fake reviewers to obtain a fake product review result set R2; G) finally, merging R1 and R2 to obtain the final fake review data set for the products. The invention fully automates the detection and identification of fake reviews in product review data and generates the fake review data set without manual intervention or labeling.
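Steps A) to G) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions because the abstract does not specify them: the "given range" is taken to mean reviews of the same product, text similarity is measured with Jaccard overlap of token sets, the "associated query" is taken to mean collecting every review written by a flagged reviewer, and the name rule is a shared non-digit prefix followed by digits. The `segment` function is a hypothetical stand-in for a real Chinese word segmentation tool, and all function names, field names, and thresholds are illustrative.

```python
import re
from collections import defaultdict
from itertools import combinations


def segment(text):
    """Step B stand-in (assumption): one token per non-space character.
    A real pipeline would call a Chinese word segmentation tool here."""
    return [ch for ch in text if not ch.isspace()]


def jaccard(tokens_a, tokens_b):
    """Jaccard overlap of two token lists (assumed similarity measure)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similar_review_ids(reviews, threshold=0.8):
    """Step C: compare review pairs for the same product; return the ids in Rf1."""
    by_product = defaultdict(list)
    for review in reviews:
        by_product[review["product"]].append(review)
    rf1 = set()
    for group in by_product.values():
        for x, y in combinations(group, 2):
            if jaccard(segment(x["text"]), segment(y["text"])) >= threshold:
                rf1.update((x["id"], y["id"]))
    return rf1


def reviews_by_reviewers(reviews, reviewers):
    """Associated query (assumed meaning): ids of all reviews by these reviewers."""
    return {r["id"] for r in reviews if r["reviewer"] in reviewers}


def patterned_reviewers(reviews, min_group=3):
    """Steps E and F: reviewer names of the form <prefix><digits> whose prefix
    is shared by at least min_group distinct names (hypothetical rule)."""
    groups = defaultdict(set)
    for review in reviews:
        match = re.fullmatch(r"(\D+)(\d+)", review["reviewer"])
        if match:
            groups[match.group(1)].add(review["reviewer"])
    return {name for names in groups.values()
            if len(names) >= min_group for name in names}


def build_fake_review_set(reviews, threshold=0.8, min_group=3):
    """Steps C to G: compute R1 and R2 and merge them."""
    rf1 = similar_review_ids(reviews, threshold)
    suspects = {r["reviewer"] for r in reviews if r["id"] in rf1}
    r1 = reviews_by_reviewers(reviews, suspects)                               # step D
    r2 = reviews_by_reviewers(reviews, patterned_reviewers(reviews, min_group))  # step F
    return r1 | r2                                                             # step G


# Step A: made-up sample data standing in for reviews collected in advance.
reviews = [
    {"id": 1, "product": "p1", "reviewer": "alice", "text": "质量很好，物流很快"},
    {"id": 2, "product": "p1", "reviewer": "bob", "text": "质量很好，物流很快！"},
    {"id": 3, "product": "p2", "reviewer": "alice", "text": "包装一般"},
    {"id": 4, "product": "p2", "reviewer": "tb001", "text": "不错"},
    {"id": 5, "product": "p3", "reviewer": "tb002", "text": "还可以"},
    {"id": 6, "product": "p3", "reviewer": "tb003", "text": "满意"},
    {"id": 7, "product": "p1", "reviewer": "carol", "text": "颜色偏暗，尺码偏小"},
]

print(sorted(build_fake_review_set(reviews)))  # [1, 2, 3, 4, 5, 6]
```

In this toy data, reviews 1 and 2 are near duplicates for the same product, so the associated query also pulls in review 3 by the same reviewer, while reviews 4 to 6 are flagged through the patterned names. The thresholds `0.8` and `3` are arbitrary illustrative values, not values taken from the patent.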

Description

Technical field

[0001] The invention relates to a method for automatically generating a data set of fake Chinese product reviews, which can automatically generate such a data set from product reviews on domestic e-commerce websites.

Technical background

[0002] At present, major e-commerce websites have generated a large amount of review data. Given the uneven quality of these reviews, consulting and analyzing them is a huge challenge for consumers and far exceeds the information-processing ability of an ordinary consumer. The mass of product reviews contains not only genuine and valuable consumer experiences and opinions but also a considerable number of fake reviews. Fake product reviews are driven by large commercial interests and are a typical form of unfair commercial competition. The proliferation of fake product reviews will inevitably seriously affect the healthy development of the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/22, G06F16/335
CPC: G06F40/194, G06F40/279
Inventor: 毛郁欣, 申屠莹莹, 朱平
Owner: ZHEJIANG GONGSHANG UNIVERSITY