Data classification method and device

A data classification technique, applied in the field of data processing, that addresses the heavy computation, wasted system resources, and low execution efficiency of existing product-related data classification processes, so as to reduce management complexity, lower the computing load, and improve execution efficiency.

Active Publication Date: 2011-09-21
ALIBABA GRP HLDG LTD
Cites: 6 · Cited by: 37
AI Technical Summary

Problems solved by technology

Classifying the product-related data of an e-commerce website with a hierarchical clustering algorithm requires so much computation that a single machine cannot complete it, and a server cluster must perform the calculation.
This wastes system resources and consumes a great deal of computing time, so the product-related data cannot be classified in a timely and effective manner, which reduces the execution efficiency of the classification process.
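
To give a rough sense of scale, the sketch below counts the item pairs that a clustering pass would have to compare if it computed a similarity for every pair of items. That cost model is an assumption for illustration (naive agglomerative implementations behave this way); the source does not state one. The function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Assumed catalogue sizes, up to the "tens of millions" scale the text mentions.
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>12,} items -> {pairwise_comparisons(n):,} pairs")
```

Under this assumption the pair count grows quadratically with the number of items, which is consistent with the text's point that a single machine cannot finish the job.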

Embodiment Construction

[0022] To improve the execution efficiency of the product-related data classification process and reduce the operating load of the system, the embodiment of this application classifies product-related data as follows. The relevant data of each product to be classified is obtained, and the product title is extracted from it. Each product title is segmented into words, and a weight is determined for each word, where the weight indicates the historical occurrence frequency of that word. For each product, the words whose weights meet a preset condition are selected to form a word segmentation sequence. The word segmentation sequences selected for the products are then compared, and the relevant data of products with the same word segmentation sequence is merged.
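
The steps in [0022] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `segment` stands in for a real word segmenter by splitting on whitespace, the weight is approximated by how many of the supplied titles contain the word, the "preset condition" is taken to be a minimum weight threshold, and the selected words are sorted so that word order in the title does not matter. All function names, the `min_weight` parameter, and the sample products are hypothetical.

```python
from collections import Counter, defaultdict


def segment(title):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return title.lower().split()


def word_weights(titles):
    # Assumption: weight = number of titles containing the word, used here
    # as a stand-in for the word's historical occurrence frequency.
    counts = Counter()
    for title in titles:
        counts.update(set(segment(title)))
    return counts


def key_sequence(title, weights, min_weight):
    # Assumption: the preset condition is "weight >= min_weight".
    selected = {w for w in segment(title) if weights[w] >= min_weight}
    # Sort for a canonical order so equal word sets give equal sequences.
    return tuple(sorted(selected, key=lambda w: (-weights[w], w)))


def merge_by_sequence(products, min_weight=2):
    weights = word_weights(p["title"] for p in products)
    groups = defaultdict(list)
    for p in products:
        # Products with the same sequence land in the same bucket.
        groups[key_sequence(p["title"], weights, min_weight)].append(p["id"])
    return dict(groups)


products = [
    {"id": 1, "title": "Nokia N95 phone black"},
    {"id": 2, "title": "phone Nokia N95 new"},
    {"id": 3, "title": "Apple iPod nano"},
    {"id": 4, "title": "iPod nano Apple 8GB"},
]
merged = merge_by_sequence(products)
print(merged)
```

Because the sequences are used as dictionary keys, each product is handled once and no product-to-product similarity is computed; with the sample data, products 1 and 2 share one sequence and products 3 and 4 share another.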

[0023] Among them, when merging products with the same word segmentation sequence, it includes directly merging the relevant d...

Abstract

The invention relates to the field of data processing and discloses a commodity classification method and device for improving the execution efficiency of the commodity classification process. The method comprises the following steps: acquiring the relevant data of the commodities to be classified and extracting the commodity titles from the data; segmenting each commodity title into words and determining the weight of each word, wherein the weight represents the historical occurrence frequency of the word; for each commodity, selecting the words whose weights meet a preset condition to form a word segmentation sequence; and comparing the word segmentation sequences selected for the commodities and merging the relevant data of commodities that have the same word segmentation sequence. With this method and device, the quantity of commodity-related data that must be processed is greatly reduced, commodities can be classified quickly and accurately in a short time, the execution efficiency of the commodity classification process is effectively improved, the management complexity of the commodity-related data is lowered, and the operating load of the system is reduced.

Description

Technical field

[0001] The present application relates to the field of data processing, in particular to a data classification method and device.

Background technique

[0002] In an e-commerce website, product data is usually stored in the form of text, data tables, and the like. An e-commerce website needs to manage tens of millions of product data records. Therefore, how to classify the product data according to the information it describes and manage similar product data in a unified manner, so as to reduce the management complexity and the operating load of the system, is the first issue to consider when operating an e-commerce website.

[0003] At present, e-commerce websites usually use clustering algorithms to classify product data; that is, according to a series of preset rules and conditions, the product data is divided into multiple categories through similarity analysis. In the pr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06Q30/00
CPC: G06F17/3071, G06F16/355
Inventor 钟灵刘华雷
Owner ALIBABA GRP HLDG LTD