Short text clustering method for book titles in the book market

A short text clustering method, applied in the computer field, that addresses the problem of clustering book titles in the book market.

Active Publication Date: 2016-04-20
BEIHANG UNIV +1
Cites: 2; Cited by: 9

AI Technical Summary

Problems solved by technology

[0005] An embodiment of the present invention provides a short text clustering method for book titles in the book market, which...




Detailed Description of the Embodiments

[0027] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0028] With the rapid development of Internet e-commerce websites, automatic product classification has become a basic requirement for these sites. For product classification, the prior art generally adopts a K-means clustering algorithm. However, with the traditional k-means clustering al...



Abstract

The invention provides a short text clustering method for book titles in the book market. The method comprises: performing word vectorization on the text data and on preset clustering keywords; calculating the distance from each text data word vector to each clustering keyword vector; determining the cluster type of each text item according to these distances, and dividing the text data into the corresponding cluster sets; calculating the term frequency-inverse document frequency (TF-IDF) values of the feature words in all text data word vectors in each cluster set, and taking the feature words whose TF-IDF values meet a set condition as the updated clustering keywords of that cluster set; and determining the cluster type of the text data again according to the updated clustering keywords. With this method, the cluster type of the text data can be determined more accurately, and the clustering result better meets the practical requirements of users.
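The abstract lists the steps but does not specify the embedding model, the distance measure, or the "set condition" on TF-IDF values. The Python sketch below makes those choices explicit as assumptions: each title is a list of already-segmented words, a title's vector is the mean of its word vectors, distance is cosine distance, each cluster's titles are pooled into one document for TF-IDF, and the "set condition" is taken to be "the highest-TF-IDF word not already chosen by another cluster". The toy 2-D vectors in `EMB` and the function names `assign`, `update_keywords`, and `cluster_titles` are hypothetical, not taken from the patent.

```python
import math
from collections import Counter


def mean_vector(words, emb):
    """Average the vectors of the words that have an embedding (assumed text vector)."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine_distance(a, b):
    """1 - cosine similarity; assumed distance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def assign(titles, keywords, emb):
    """Put each title into the cluster of its nearest keyword vector."""
    clusters = {k: [] for k in keywords}  # keywords are assumed to be in emb
    for words in titles:
        v = mean_vector(words, emb)
        if v is None:
            continue  # no known words: leave the title unassigned
        nearest = min(keywords, key=lambda k: cosine_distance(v, emb[k]))
        clusters[nearest].append(words)
    return clusters


def update_keywords(clusters, emb):
    """Pick each cluster's highest-TF-IDF word (assumed 'set condition')."""
    docs = {k: Counter(w for t in ts for w in t) for k, ts in clusters.items()}
    n_docs = len(docs)
    df = Counter(w for counts in docs.values() for w in counts)  # clusters containing w
    new_keywords, used = [], set()
    for old, counts in docs.items():
        total = sum(counts.values())
        # Sort by descending TF-IDF, then alphabetically for deterministic ties.
        ranked = sorted(
            (w for w in counts if w in emb),
            key=lambda w: (-(counts[w] / total) * math.log(n_docs / df[w]), w),
        )
        pick = next((w for w in ranked if w not in used), old)  # keep old if none
        used.add(pick)
        new_keywords.append(pick)
    return new_keywords


def cluster_titles(titles, keywords, emb, max_iter=10):
    """Alternate assignment and keyword update until the keywords stop changing."""
    for _ in range(max_iter):
        clusters = assign(titles, keywords, emb)
        new_keywords = update_keywords(clusters, emb)
        if new_keywords == keywords:
            return clusters
        keywords = new_keywords
    return assign(titles, keywords, emb)  # final assignment with updated keywords


# Hypothetical toy embeddings and pre-segmented titles.
EMB = {
    "python": [1.0, 0.1], "programming": [0.9, 0.2], "code": [0.95, 0.05],
    "novel": [0.1, 1.0], "mystery": [0.2, 0.9], "love": [0.05, 0.95],
}
TITLES = [["python", "programming"], ["python", "code"],
          ["mystery", "novel"], ["love", "novel"]]

for kw, members in cluster_titles(TITLES, ["code", "novel"], EMB).items():
    print(kw, members)
```

On this toy data the seed keyword `code` is replaced by `python`, the highest-TF-IDF word in its cluster, and the loop stops once the keywords no longer change. A real system would substitute trained word vectors and whatever condition the full patent specifies.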

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a short text clustering method for book titles in the book market.

Background art

[0002] As the Internet has deeply transformed traditional industries, e-commerce websites have developed rapidly and online shopping has become a trend. E-commerce websites carry a huge number of products. Because product information on the Internet is complex, its classification is cumbersome, and it is updated quickly, manually labeling products consumes a great deal of manpower. Automatic product classification has therefore become a basic requirement of e-commerce, and data mining methods are commonly used for it, both in China and abroad.

[0003] In the prior art, a K-means clustering algorithm is usually used to classify products. The K-means algorithm is an unsupervised clustering algorithm. It is based on a certain dista...
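The background names K-means as the prior-art baseline, and the text is cut off before its distance measure is stated. For comparison with the keyword-driven method, here is a minimal sketch of standard K-means (Lloyd's algorithm) in Python, assuming Euclidean distance, initial centroids sampled from the data, and numeric vectors as input. The `kmeans` function and its parameters are illustrative, not taken from the patent.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Standard Lloyd's K-means with Euclidean distance (assumed)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed: random data points as initial centroids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


cents, groups = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(cents, groups)
```

K-means needs the number of clusters `k` up front and starts from arbitrary centroids, whereas the method in the abstract starts from preset clustering keywords; this sketch only illustrates the baseline.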


Application Information

IPC (8): G06F17/30; G06K9/62; G06F17/27
CPC: G06F16/35; G06F40/279; G06F18/23213
Inventors: 李欢 (Li Huan), 孙阳 (Sun Yang), 刘海星 (Liu Haixing), 张立 (Zhang Li), 尤树林 (You Shulin)
Owner BEIHANG UNIV