Clustering method

A clustering method in the field of text database clustering/classification and special data processing applications. It addresses the strong influence of the initial cluster centers on K-MEANS and its insufficient execution efficiency, and it is easy to understand, easy to implement, and gives a good clustering effect.

Inactive Publication Date: 2014-12-10
NANJING UNIV OF INFORMATION SCI & TECH
0 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

K-MEANS is one of the most classic clustering algorithms. Its simplicity, speed and ease of implementation make it the most commonly used algorithm in text data mining. However, K-MEANS is strongly affected by the choice of initial cluster centers, and its execution efficiency cannot meet requirements, among other shortcomings.



Embodiment Construction

[0027] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0028] Besides being simple, easy to understand and easy to implement, the traditional K-MEANS algorithm performs well on sparse matrices. However, the number of clusters K must be specified manually before the algorithm runs. For an unknown data set, the real class distribution cannot be predicted before clustering, so choosing K depends on the user's practical experience and involves considerable uncertainty. Second, since the initial...
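The sensitivity of K-MEANS to its initial cluster centers, noted in the problem statement, can be illustrated with a small sketch. The code below is a plain Lloyd-style K-MEANS written for illustration only; it is not the patented method. The function names `sq_dist` and `kmeans` and the four sample points are assumptions introduced here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's center where it is.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Four corners of a wide rectangle (hypothetical sample data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = kmeans(points, [(0, 0.5), (4, 0.5)])  # centers start on the left and right edges
bad = kmeans(points, [(2, 0), (2, 1)])       # centers start on the bottom and top edges

print(good[2], bad[2])  # prints: 1.0 16.0
```

Both runs converge immediately, but the second one is stuck in a partition that pairs each point with the far corner, so its sum of squared errors is sixteen times larger. The data and K are identical; only the initial centers differ.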


Abstract

The invention discloses a clustering method. First, a density-based pre-classification technique is used to obtain high-density core classes and to determine a class hierarchy tree that represents the structure of the dataset. Then, K-MEANS clustering is carried out using highly representative subclass centers from the class hierarchy tree as initial centers, which yields fine clusters. Finally, the fine clusters are merged according to their class attributes in the class hierarchy tree to achieve a precise and stable clustering result. To address the sensitivity of K-MEANS to the initial cluster centers, the invention provides a stable algorithm based on fine clusters, which can partition convex classes in a dataset and can also optimally partition classes of irregular shape.
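The three stages named in the abstract can be sketched roughly as follows. This is a heavily simplified stand-in, not the patented procedure: the class hierarchy tree is reduced to a flat list of density-connected core classes, and "representative subclass centers" are approximated by core points spaced at least `seed_gap` apart. The function names and the parameters `eps`, `min_pts` and `seed_gap` are all assumptions made for illustration.

```python
from math import dist


def core_classes(points, eps, min_pts):
    """Stage 1 (assumed form): label density-connected core points; -1 marks non-core points."""
    n = len(points)
    neighbors = [[j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    current = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # flood-fill one connected group of core points
            u = stack.pop()
            for v in neighbors[u]:
                if is_core[v] and labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def pick_seeds(points, core_label, seed_gap):
    """Greedily pick core points at least seed_gap apart; remember each seed's core class."""
    seeds, seed_class = [], []
    for p, c in zip(points, core_label):
        if c != -1 and all(dist(p, s) >= seed_gap for s in seeds):
            seeds.append(p)
            seed_class.append(c)
    return seeds, seed_class


def kmeans_labels(points, centers, max_iter=100):
    """Stage 2: plain K-MEANS from the given initial centers; returns fine-cluster labels."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def cluster(points, eps, min_pts, seed_gap):
    """Pre-classify by density, run K-MEANS from the seeds, then merge fine clusters."""
    core_label = core_classes(points, eps, min_pts)
    seeds, seed_class = pick_seeds(points, core_label, seed_gap)
    fine = kmeans_labels(points, seeds)
    # Stage 3: fine clusters whose seeds share a core class are merged.
    return fine, [seed_class[f] for f in fine]


# Hypothetical data: one elongated chain of seven points and one compact group of three.
points = [(x, 0) for x in range(7)] + [(20, 0), (21, 0), (22, 0)]
fine, merged = cluster(points, eps=1.5, min_pts=1, seed_gap=2.5)
print(fine)    # prints: [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
print(merged)  # prints: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

In this sketch K is not supplied by the user: the number of fine clusters follows from the seeds, and the number of final clusters follows from the core classes. The elongated chain is first split into three fine clusters and then reassembled into a single class by the merge step.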

Description

Technical field

[0001] The invention relates to a clustering method, in particular to a novel K-MEANS clustering method, and belongs to the technical field of data mining.

Background technique

[0002] With the development of the Internet, data has been shared and accumulated in large quantities, and the problem of abundant data but insufficient knowledge has become more and more prominent. Ever-growing data that goes unused becomes a data graveyard, whereas the latent information it contains can create a great deal of value if it is fully mined. The task of data mining is to discover knowledge from massive data. Data mining mainly targets structured data; in practice, however, a large amount of data is stored in databases in the form of text, which makes text data mining an important branch of data mining.

[0003] Clustering technology is a key means of data mining, and its task is to group texts with similar subject content into one category, while separate texts...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 侯荣涛王琴周彬路郁
Owner: NANJING UNIV OF INFORMATION SCI & TECH