Commodity classifying method and system based on mutual information

A classification method based on mutual information, applied in the field of data mining. It addresses problems such as commodities that are difficult to classify and misclassification, and achieves the effect of avoiding interdependence among features.

Status: Inactive. Publication date: 2014-05-07
BEIJING QIHOO TECH CO LTD +1
Cites: 3. Cited by: 19.

AI Technical Summary

Problems solved by technology

[0003] However, in the prior art, products with very similar descriptors are often misclassified because their features are interdependent, or because some features are determined by other features. For example, "Lenovo laptop" and "notebook (computer pattern)" would be considered to belong to the same category.
[0004] In addition, the data of many commodity categories overlap, for example…



Examples


Embodiment 1

[0090] Embodiment 1: Assume that an e-commerce website has 10 commodity titles. These 10 titles are extracted from the website's server database to construct a training set. Specifically, word segmentation is applied to the 10 extracted titles; feature words that do not describe product characteristics are filtered out and those that describe product information are retained. Word frequencies are then counted over the filtered feature words, and the words whose frequency exceeds a preset value are selected to build the feature lexicon, shown in the table below.

[0091]
Commodity category | Feature words
cell phone | Sony, mobile phone, WCDMA, GSM
cell phone | iphone, mobile phone, black
cell phone | nokia, 1020, yellow
cell phone | Samsung, 9300, white
notebook | sony, ultrabook, black
not...
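The training-set construction in [0090] (segment titles, drop non-descriptive words, count word frequencies, keep words above a preset frequency) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the English titles, the toy whitespace segmenter with a small phrase list (`PHRASES`), the stop list `NON_DESCRIPTIVE`, and the names `segment` and `build_feature_lexicon` are all hypothetical. A real deployment on Chinese titles would use a dedicated word segmenter instead.

```python
import re
from collections import Counter

# Hypothetical multi-word feature words that the toy segmenter keeps together.
PHRASES = {("mobile", "phone")}
# Hypothetical list of words assumed not to describe product characteristics.
NON_DESCRIPTIVE = {"hot", "sale", "new", "free", "shipping"}


def segment(title):
    """Toy segmenter (assumption): lowercase, split on whitespace/commas, merge known 2-word phrases."""
    tokens = [t for t in re.split(r"[\s,]+", title.lower()) if t]
    words, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in PHRASES:
            words.append(" ".join(tokens[i:i + 2]))
            i += 2
        else:
            words.append(tokens[i])
            i += 1
    return words


def build_feature_lexicon(titles, preset):
    """Count descriptive words over all titles; keep those whose frequency is > preset."""
    counts = Counter()
    for title in titles:
        counts.update(w for w in segment(title) if w not in NON_DESCRIPTIVE)
    return {w for w, n in counts.items() if n > preset}


# Hypothetical titles loosely modeled on the feature words in the table above.
titles = [
    "Sony mobile phone WCDMA GSM hot sale",
    "iPhone mobile phone black",
    "Nokia 1020 yellow",
    "Samsung 9300 white",
    "Sony ultrabook black",
]
print(sorted(build_feature_lexicon(titles, preset=1)))  # → ['black', 'mobile phone', 'sony']
```

The threshold uses a strict "greater than" comparison to match "word frequency higher than the preset value"; the phrase list only exists so that a multi-word feature word such as "mobile phone" survives the toy segmentation.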

Embodiment 2

[0121] Embodiment 2: Embodiment 2 uses the same assumptions as Embodiment 1 and differs in how the training set is constructed. Specifically:

[0122] Again, assume that an e-commerce website has a total of 10 product titles. These 10 titles are extracted from the website's server database to construct a training set. Specifically, word segmentation is applied to the 10 extracted titles; feature words that do not describe product characteristics are filtered out and those that describe product information are retained. Word frequencies are then counted over the filtered feature words, and the words whose frequency exceeds a preset value are selected to build the feature lexicon, shown in the table below.

[0123]
Commodity category | Feature words
cell phone | Sony, mobile phone, WCDMA, GSM
cell phone ...



Abstract

The invention provides a commodity classification method based on mutual information. The method comprises the following steps: extracting relevant data from a website server database to construct a training set, where the relevant data comprise all commodity titles and their corresponding commodity classes on a given e-commerce website; segmenting the name of a new commodity to obtain all of its feature words; and, for each commodity class, calculating the sum of the relevancy values of all the commodity's feature words in that class and taking this sum as the commodity's score for that class. The commodity class with the highest score is taken as the class of the commodity. The method avoids mutual dependence among feature words during commodity classification, eliminates the overlap of data between classes, and reduces the amount of computation.
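The scoring rule in the abstract (sum a commodity's relevancy values over its feature words for each class, then pick the highest-scoring class) can be sketched as below. This excerpt does not state the exact relevancy formula, so the sketch assumes positive pointwise mutual information, PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))), estimated from title counts in the training set. The functions `train` and `classify` are hypothetical names, and the toy samples reuse the five complete rows of Embodiment 1's table.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (feature_words, commodity_class) pairs from the training set.

    Returns ({class: {word: relevancy}}, classes), where relevancy is an assumed
    positive pointwise mutual information estimated from title counts.
    """
    n = len(samples)
    class_count = Counter(c for _, c in samples)
    word_count, joint = Counter(), Counter()
    for words, c in samples:
        for w in set(words):  # count each word at most once per title
            word_count[w] += 1
            joint[(w, c)] += 1
    relevancy = defaultdict(dict)
    for (w, c), n_wc in joint.items():
        # PMI = log(P(w,c) / (P(w) P(c))) = log(n_wc * n / (n_w * n_c))
        pmi = math.log(n_wc * n / (word_count[w] * class_count[c]))
        relevancy[c][w] = max(0.0, pmi)  # clamp negative associations to 0
    return relevancy, list(class_count)


def classify(words, relevancy, classes):
    """Score each class as the sum of the words' relevancy values; return the best class."""
    scores = {c: sum(relevancy[c].get(w, 0.0) for w in words) for c in classes}
    return max(scores, key=scores.get), scores


# Toy training set built from the feature words in Embodiment 1's table.
samples = [
    (["sony", "mobile phone", "wcdma", "gsm"], "cell phone"),
    (["iphone", "mobile phone", "black"], "cell phone"),
    (["nokia", "1020", "yellow"], "cell phone"),
    (["samsung", "9300", "white"], "cell phone"),
    (["sony", "ultrabook", "black"], "notebook"),
]
relevancy, classes = train(samples)
print(classify(["mobile phone", "gsm"], relevancy, classes)[0])  # → cell phone
print(classify(["sony", "ultrabook"], relevancy, classes)[0])    # → notebook
```

Clamping to zero means a word that appears in a class less often than chance cannot pull that class's score down; this is a choice made for the sketch, not something the excerpt specifies. With so few training titles, a class backed by a single title (here "notebook") gets large PMI values, so real use would need far more data.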

Description

Technical field
[0001] The present invention relates to the field of data mining, in particular to a commodity classification method and system based on mutual information.

Background art
[0002] With the rapid development of electronic information technology, data mining has penetrated many fields, especially e-commerce, where an efficient automatic commodity classification method is essential for managing massive amounts of commodity information.
[0003] However, in the prior art, products with very similar descriptors are often misclassified because their features are interdependent, or because some features are determined by other features. For example, "Lenovo laptop" and "notebook (computer pattern)" would be considered to belong to the same commodity category.
[0004] In addition, the data of many commodity categories overlap; for example, clothing is divid...


Application Information

IPC(8): G06F17/30; G06Q30/00
CPC: G06F16/285; G06F16/35
Inventor: 金学禹 (Jin Xueyu)
Owner: BEIJING QIHOO TECH CO LTD