Text-subject-model-based data processing method for commodity classification

A topic model and data processing technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as inaccurate results and heavy manpower requirements, achieving accurate product classification and reducing the effect of subjective factors.

Active Publication Date: 2013-02-13
BAIDU COM TIMES TECH (BEIJING) CO LTD
2 Cites, 46 Cited by

AI Technical Summary

Problems solved by technology

Manual classification by website editors has three problems: 1) a large number of products leads to excessive labor cost; 2) a product may have multiple category attributes and can be assigned to multiple categories, so manual classification is affected by the editor's subjective understanding of the product's attributes; 3) when classifying a product, the editor cannot accurately give the credibility of the classification.
[0004] 1. The prior-art method analyzes only the title text of the product, not all text related to the product, such as the brief product description and purchasers' comments;
[0005] 2. The text segmentation...


Embodiment Construction

[0021] The text-topic-model-based data processing method for commodity classification of the present invention includes:

[0022] Step 10': first, manually divide the commodities into broad categories that differ clearly from one another;

[0023] Step 10: import business-related Chinese and English vocabulary into the general word library of the word segmentation system, and import whitelisted business-related English words covering brand names and common commodity terms; at the same time, further expand the stop word library of the word segmentation system;

[0024] Step 10' and Step 10 may be performed in either order.

[0025] Step 20: using the word segmentation system configured in the previous step, segment the description text of each product, so that each product has an order-independent bag of words;

[0026] Step 30: after counting the word segmentation results, select the keywords that contribute more to the product description according to the TF-I...
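Steps 10 to 30 can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the patented implementation: the source does not name the word segmentation system, so a simple forward maximum-matching segmenter stands in for it; the lexicon, whitelist and stop word entries are hypothetical; English tokens are kept only when whitelisted; and the truncated "TF-I..." is assumed here to mean TF-IDF scoring.

```python
import math
import re
from collections import Counter

# Hypothetical entries; the source does not list the real word libraries.
GENERAL_LEXICON = {"手机", "手机壳", "保护套", "充电器", "蓝牙", "耳机"}
BRAND_WHITELIST = {"iphone", "nokia"}   # whitelisted English brand/commodity words
STOP_WORDS = {"的", "和", "for", "the"}

ASCII_RUN = re.compile(r"[A-Za-z0-9]+")
MAX_WORD_LEN = max(len(w) for w in GENERAL_LEXICON)


def segment(text):
    """Forward maximum matching (a stand-in segmenter); ASCII runs stay whole."""
    tokens, i = [], 0
    while i < len(text):
        m = ASCII_RUN.match(text, i)
        if m:
            tokens.append(m.group().lower())
            i = m.end()
            continue
        if text[i].isspace():
            i += 1
            continue
        # Try the longest lexicon word first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in GENERAL_LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


def bag_of_words(text):
    """Step 20: order-independent word counts for one product description."""
    bag = Counter()
    for tok in segment(text):
        if tok in STOP_WORDS or not tok.isalnum():
            continue  # drop stop words and punctuation
        if ASCII_RUN.fullmatch(tok) and tok not in BRAND_WHITELIST:
            continue  # assumption: only whitelisted English words are kept
        bag[tok] += 1
    return bag


def top_keywords(bags, k=3):
    """Step 30 (assumed TF-IDF): rank each product's words by tf * idf."""
    n = len(bags)
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # document frequency of each word
    ranked_per_product = []
    for bag in bags:
        total = sum(bag.values())
        scores = {w: (c / total) * math.log(n / df[w]) for w, c in bag.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        ranked_per_product.append(ranked[:k])
    return ranked_per_product


bags = [bag_of_words("iPhone手机壳"), bag_of_words("iPhone充电器")]
keywords = top_keywords(bags, k=1)
```

In this toy run the word shared by both products receives a zero IDF weight, so the top keyword of each product is the word that distinguishes it from the other.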

Abstract

The invention provides a text-topic-model-based data processing method for commodity classification. The method comprises the following steps: importing business-related Chinese and English vocabulary into a general word library of a word segmentation system, and importing whitelisted business-related English words covering brand names and common commodity terms; further expanding a stop word library of the word segmentation system; segmenting the description text of each commodity, so that each commodity has an order-independent bag of words; counting the word segmentation results to find high-frequency uncommon vocabulary, and thus constructing a preferred word library; and specifying the overall number of categories, setting the related parameters, running fast Gibbs sampling to obtain latent semantic associations, comparing the latent semantic associations with the preferred word library, the general word library and the stop word library respectively, calculating over the comparison results to obtain the most probable category of the commodity, and labeling the category using the bags of words. Because latent semantics are taken into account, the influence of the editorial staff's subjective factors is reduced, so that commodity classification is accurate.
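The topic-model stage of the abstract can be illustrated with a plain collapsed Gibbs sampler for latent Dirichlet allocation. This is a sketch under stated assumptions rather than the patent's method: the "fast" sampling variant and the comparison against the three word libraries are not specified in the visible text and are omitted, and the function names, hyperparameters `alpha` and `beta`, iteration count and toy documents are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # count of word v in topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):                     # random initial topics
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, z = word_id[w], assignments[d][n]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab


def classify(theta):
    """Most probable topic per document, with its probability as a confidence."""
    return [(max(range(len(row)), key=row.__getitem__), max(row)) for row in theta]


# Hypothetical tokenized product descriptions (bags of words expanded to lists).
docs = [
    ["手机", "手机壳", "保护套", "手机"],
    ["耳机", "蓝牙", "耳机"],
    ["手机", "充电器", "保护套"],
]
theta, topic_word, vocab = lda_gibbs(docs, num_topics=2, iters=50)
labels = classify(theta)
```

The per-document topic probability returned by `classify` is one possible way to attach a credibility score to each classification, which manual classification by editors cannot provide.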

Description

【Technical field】

[0001] The invention relates to data processing technology for electronic commerce, and in particular to a data processing method for commodity classification based on a text topic model.

【Background art】

[0002] In the Internet e-commerce market, existing commodity classification systems all rely on manual classification by website editors. This has three problems: 1) a large number of products leads to excessive labor cost; 2) a product may have multiple category attributes and can be assigned to multiple categories, so manual classification is affected by the editor's subjective understanding of the product's attributes; 3) when classifying a product, the editor cannot accurately give the credibility of the classification.

[0003] Chinese invention patent publication No. 102193936A, published on 2011-9-21, discloses a data classification method and device. The method is as follows: obtain the relevant data of each product that needs to be ...

Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 刘德建, 陈宏展, 欧宁, 吴拥民, 陈澄宇
Owner: BAIDU COM TIMES TECH (BEIJING) CO LTD