Commodity property characteristic word clustering method

A technique for clustering commodity attribute feature words, applicable to text database clustering/classification, special data processing applications, instruments, and similar fields. It addresses the lack of effective methods for clustering product attribute feature words: clustering reduces the number of feature words, which lowers feature dimensionality and feature sparsity and makes a recommendation system faster and more accurate.

Active Publication Date: 2016-01-13
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

However, few people have proposed effective methods for the clustering of product attribute feature words.



Embodiment Construction

[0011] The present invention will be described in further detail below in combination with specific embodiments and with reference to the accompanying drawings.

[0012] Figure 1 shows a flow chart of the method for clustering commodity attribute feature words in this specific embodiment.

[0013] This embodiment of the present invention proposes a method, based on word vector representations, for clustering product attribute feature words in product review texts. First, determine the target product to be analyzed and prepare the data: obtain the review texts of the target product from the relevant e-commerce websites and preprocess them, mainly by word segmentation, part-of-speech tagging, word frequency statistics, stop-word filtering, and low-frequency word filtering. Then select a number of review texts containing product attribute feature words from the obtained product review texts, and manually mark the product att...
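
The preprocessing stage named above can be sketched roughly as follows. This is a minimal illustration and not the patent's implementation: whitespace splitting stands in for a real word segmenter, part-of-speech tagging is omitted, and the stop-word list, the frequency threshold `MIN_FREQ`, and the function name `preprocess` are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical stop-word list and frequency threshold; the source names neither.
STOP_WORDS = {"the", "is", "a", "and", "very"}
MIN_FREQ = 2


def preprocess(reviews, stop_words=STOP_WORDS, min_freq=MIN_FREQ):
    """Tokenize reviews, count word frequencies, drop stop words and rare words."""
    # Whitespace splitting is a stand-in for a real word segmenter.
    tokenized = [review.lower().split() for review in reviews]
    # Word frequency statistics over the whole review collection.
    freq = Counter(tok for doc in tokenized for tok in doc)
    # Stop-word filtering and low-frequency word filtering.
    kept = [
        [tok for tok in doc if tok not in stop_words and freq[tok] >= min_freq]
        for doc in tokenized
    ]
    return kept, freq


# Toy review texts, invented for illustration only.
reviews = [
    "the battery is great",
    "battery life is very short",
    "the screen is great and bright",
    "screen resolution is great",
]
filtered, freq = preprocess(reviews)
```

With these toy reviews, words that occur only once (such as "life" or "bright") are dropped along with the stop words, leaving the recurring candidates "battery", "screen", and "great".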


Abstract

The present invention relates to a commodity property characteristic word clustering method. The method comprises the following steps: A1: obtaining comment texts of a target commodity from related e-commerce websites, and performing data preprocessing; A2: selecting comment texts containing commodity property characteristic words, manually annotating the commodity property characteristic words, and using the annotated words as training samples for obtaining a part-of-speech template; A3: training the part-of-speech template on the data manually annotated in A2; A4: training a language model on the data obtained in A1, thereby obtaining word vector representations; and A5: using the word vectors obtained in A4 to cluster the commodity property characteristic words obtained in A3, thereby obtaining the final property characteristic word set of the target commodity. The method can be applied to a commodity recommendation system based on commodity comment texts. Clustering reduces the number of commodity property characteristic words, which reduces feature dimensionality and feature sparsity and makes the resulting recommendation system faster and more accurate.
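
Step A5 can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used, so the greedy cosine-similarity grouping below is a swapped-in stand-in, not the patented procedure. The word vectors are hand-written toy values rather than the output of a trained language model, and the names `cosine`, `cluster_words`, and the `threshold` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.9):
    """Greedy grouping: a word joins the first cluster whose seed word is similar enough."""
    clusters = []  # each cluster is a list of words; its first word is the seed
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([word])
    return clusters


# Toy 3-dimensional vectors, hand-written for illustration only.
word_vectors = {
    "price": [0.9, 0.1, 0.0],
    "cost": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.1],
    "display": [0.2, 0.8, 0.0],
    "battery": [0.0, 0.1, 0.9],
}
clusters = cluster_words(word_vectors)
```

Here five candidate feature words collapse into three groups ("price" with "cost", "screen" with "display", and "battery" alone), which is the kind of reduction in feature count that the abstract credits with lowering dimensionality and sparsity.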

Description

Technical field

[0001] The invention relates to the fields of computer data processing and data mining, and in particular to a method for clustering commodity attribute feature words.

Background art

[0002] The mining of product review data belongs to the field of computer text processing and mining. It has direct applications in analyzing the characteristics of target products, analyzing market demand for them, obtaining users' personalized preferences, and recommending products to users. User comments on products contain rich information, and many researchers now focus on using review texts to improve the performance of product recommendation systems. The two most important pieces of information in a user comment are the product attributes that the user cares about, and the user's evaluation of how the target product performs on those attributes. Therefore, the acquisition and processing of product att...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/205, G06F40/30
Inventor: 杨余久, 袁威强
Owner SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV