
Mixed attribute data set clustering method for automatically determining clustering center

A clustering method with automatic cluster-center determination, applied in database clustering/classification, database retrieval, instruments, etc., that addresses the low accuracy and stability of the algorithm.

Status: Inactive; Publication Date: 2020-06-30
BEIJING UNIV OF TECH
0 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0003] The K-Prototypes clustering algorithm requires the initial cluster centers and the number of clusters to be specified manually, which lowers the accuracy and stability of the algorithm. To solve this problem, the present invention proposes a clustering method for mixed attribute data sets that automatically determines the cluster centers. Based on the density of the data objects and the distances between them, the method automatically identifies the number of clusters and selects the initial cluster centers, mitigates the local-optimum problem caused by initial point selection, and ensures the accuracy of the clustering results.
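The idea of choosing centers from density and distance can be illustrated with a minimal Python sketch. It is written in the style of density-peak selection and is not the patent's own formula, which is not reproduced in this text. The function name `preselect_centers`, the cutoff rule derived from a neighbor proportion (the embodiment sets p_d = 1.5%), the score `rho * delta`, and the `n_sigma` threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def preselect_centers(objects, dist, neighbor_ratio=0.015, n_sigma=1.0):
    """Return indices of candidate cluster centers (hypothetical rule).

    objects        -- list of data objects (needs at least two)
    dist           -- callable giving the dissimilarity of two objects
    neighbor_ratio -- assumed meaning of p_d: fraction of all pairs that
                      should count as neighbors when fixing the cutoff
    n_sigma        -- assumed threshold: keep objects whose score exceeds
                      mean + n_sigma * standard deviation
    """
    n = len(objects)
    d = [[dist(a, b) for b in objects] for a in objects]

    # Cutoff distance: the neighbor_ratio quantile of all pairwise distances.
    pair_d = sorted(d[i][j] for i in range(n) for j in range(i + 1, n))
    k = max(1, round(neighbor_ratio * len(pair_d)))
    cutoff = pair_d[k - 1]

    # Local density: number of other objects within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] <= cutoff)
           for i in range(n)]

    # delta: distance to the nearest object that ranks higher in density
    # (ties are broken by index so the ranking is a total order).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest object: use its largest distance
        else:
            delta[i] = min(d[i][j] for j in order[:rank])

    # Objects that are both dense and far from any denser object score high.
    score = [rho[i] * delta[i] for i in range(n)]
    threshold = mean(score) + n_sigma * pstdev(score)
    return [i for i in range(n) if score[i] > threshold]
```

Because the number of returned indices is not fixed in advance, the number of clusters follows from the data rather than from a user-supplied k. Any dissimilarity function can be passed as `dist`, including one that handles mixed numerical and categorical attributes.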



Examples


Embodiment Construction

[0052] The mixed attribute data set clustering method proposed by the present invention relies on hardware such as an information collection system, an algorithm server, and a user client, and is implemented by clustering algorithm control software.

[0053] The specific implementation of the method is described below with reference to the general flow of the invention shown in Figure 2. In this implementation, the basic parameter is set as follows: the neighbor proportion p_d = 1.5%. The implementation is divided into the following steps:

[0054] Step (1): Initialize, get data, and preprocess it

[0055] Step (1.1): Read the Credit Approval (credit card approval) data set and build the mixed attribute data set U = {x_i | 1 ≤ i ≤ N}, where the total number of data objects in the data set is N = 690;

[0056] Step (1.2): Each data object x_i ∈ U is described by 15 attributes; construct an attribute collection. Among them, the nu...
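Steps (1.1) and (1.2) can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the records are taken to be comma-separated text, `?` is assumed to mark a missing value, and an attribute is treated as numerical when every non-missing value parses as a number. The helper names `read_records` and `split_attributes` are hypothetical, and the sample records in the usage note are made up, not taken from the data set.

```python
import csv
import io

MISSING = "?"  # assumption: marker used for missing values


def read_records(text):
    """Parse comma-separated text into a list of records (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def split_attributes(rows):
    """Return (numeric_idx, categorical_idx) column index lists.

    A column counts as numerical when all of its non-missing values parse
    as floats; every other column is treated as categorical.
    """
    numeric_idx, categorical_idx = [], []
    for j in range(len(rows[0])):
        values = [row[j] for row in rows if row[j] != MISSING]
        if values and all(_is_number(v) for v in values):
            numeric_idx.append(j)
        else:
            categorical_idx.append(j)
    return numeric_idx, categorical_idx
```

For example, `split_attributes(read_records("b,30.83,0,u\na,58.67,4.46,u\na,?,0.5,y\n"))` places the second and third columns in the numerical group and the first and fourth in the categorical group. A categorical attribute coded with integers would be misclassified by this heuristic, so an explicit column list is safer when the schema is known.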


Abstract

The invention provides a mixed attribute data set clustering method that automatically determines the clustering centers. It targets the low accuracy and stability that result from manually specifying the initial clustering centers and the number of clusters in the K-Prototypes clustering algorithm. The method comprises four steps: initialization, pre-selection of clustering center points, determination of the clustering center points, and iterative clustering division. Based on the density distribution of the data objects, the invention automatically identifies the number of clusters and selects the initial clustering centers, which mitigates the local-optimum problem caused by initial point selection. In addition, by distinguishing the different influence weights of each attribute on the clustering result and improving the dissimilarity calculation formula, it improves clustering accuracy and achieves a better clustering effect.
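To make the role of attribute weights concrete, here is a small Python sketch of a weighted mixed-attribute dissimilarity. With all weights equal to 1 it reduces to the standard K-Prototypes measure (squared differences on numerical attributes plus `gamma` times the number of categorical mismatches). The per-attribute `weights` argument is an assumption used to illustrate the idea; the patent's improved formula is not given in this text, and the function name `mixed_dissimilarity` is hypothetical.

```python
def mixed_dissimilarity(x, center, numeric_idx, categorical_idx,
                        weights=None, gamma=1.0):
    """Dissimilarity between a data object and a cluster center.

    x, center       -- sequences holding numerical and categorical values
    numeric_idx     -- indices of the numerical attributes
    categorical_idx -- indices of the categorical attributes
    weights         -- assumed per-attribute weights (default: all 1.0)
    gamma           -- balance between the numerical and categorical parts
    """
    if weights is None:
        weights = [1.0] * len(x)

    # Numerical part: weighted squared Euclidean distance.
    numeric_part = sum(weights[j] * (x[j] - center[j]) ** 2
                       for j in numeric_idx)

    # Categorical part: weighted count of mismatching attribute values.
    categorical_part = sum(weights[j] for j in categorical_idx
                           if x[j] != center[j])

    return numeric_part + gamma * categorical_part
```

In an iterative clustering division, each object would be assigned to the center with the smallest value of this function. Raising the weight of an attribute makes a difference in that attribute count for more in the assignment.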

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to a clustering method for mixed data sets containing both numerical and categorical attributes.

Background art

[0002] As an important part of the data mining field, clustering can discover the underlying structure and organizational rules of unlabeled data. In practical applications, data include not only quantitative numerical attributes such as age and height, but also categorical attributes such as gender and occupation. Mixed data sets containing both numerical and categorical attributes are ubiquitous. At present, the K-Prototypes clustering method is the most commonly used approach to clustering mixed data. The method is simple, efficient, and widely used, but it has some shortcomings. The random selection of the initial clustering center in the K-Prototypes clustering algorithm may cause different conver...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F16/906
CPC: G06F16/906, G06F18/2321, G06F18/24137
Inventor: 孙志冉, 苏航, 梁毅, 韩永鹏
Owner BEIJING UNIV OF TECH