
Cluster-feature-weighted fuzzy compact scattering and clustering method

A feature-weighting clustering technology in the field of data processing. It addresses three problems: data with an unbalanced sample distribution are divided poorly, the actual hard division of samples is not considered, and hard-division boundary points are not considered. It achieves high clustering accuracy, shorter running time and good clustering performance.

Active Publication Date: 2014-12-03
南京迪塔维数据技术有限公司
Cites: 2; Cited by: 9

AI Technical Summary

Problems solved by technology

[0004] The existing WFCM algorithm does not consider the actual hard division of samples during clustering and handles data with an unbalanced sample distribution poorly, while the FCS algorithm does not consider hard-division boundary points and ignores the influence of the sample feature parameters on the individual clusters. In view of these problems, the present invention discloses a cluster-feature-weighted fuzzy compact scattering clustering method.
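The "hard division" at issue comes from how FCS assigns memberships. The CWFCS update rules are not given in this excerpt, so the Python below sketches only the baseline FCS membership step as it is commonly stated in the literature (an assumption about the exact variant meant). Each cluster i gets a crisp region where the adjusted distance D_i = ||x − a_i||² − η_i·||a_i − x̄||² is non-positive, and η_i is scaled by a parameter β in [0, 1]. The embodiments also set a β for CWFCS, but this excerpt does not define it. The function names and toy data are illustrative only.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fcs_etas(centers, grand_mean, beta):
    """eta_i = (beta / 4) * min_{k != i} ||a_i - a_k||^2 / max_k ||a_k - xbar||^2.
    With 0 <= beta <= 1 the crisp regions of different clusters cannot overlap."""
    spread = max(sq_dist(a, grand_mean) for a in centers)
    etas = []
    for i, a in enumerate(centers):
        nearest = min(sq_dist(a, b) for k, b in enumerate(centers) if k != i)
        etas.append(beta / 4 * nearest / spread)
    return etas


def fcs_memberships(x, centers, etas, grand_mean, m):
    """Memberships of one sample x in every cluster, FCS-style (sketch).

    D_i = ||x - a_i||^2 - eta_i * ||a_i - xbar||^2.
    If some D_i <= 0, x lies in cluster i's crisp (hard-division) region and
    gets membership 1 there, 0 elsewhere. Otherwise the fuzzy rule
    u_i = D_i^(-1/(m-1)) / sum_k D_k^(-1/(m-1)) is applied."""
    D = [sq_dist(x, a) - eta * sq_dist(a, grand_mean)
         for a, eta in zip(centers, etas)]
    crisp = [i for i, d in enumerate(D) if d <= 0]
    if crisp:
        # Defensive: with beta <= 1 at most one crisp region can contain x.
        winner = min(crisp, key=lambda i: D[i])
        return [1.0 if i == winner else 0.0 for i in range(len(D))]
    powered = [d ** (-1.0 / (m - 1)) for d in D]
    total = sum(powered)
    return [p / total for p in powered]


# 1-D toy example: two centers with the grand mean halfway between them.
centers = [[0.0], [10.0]]
xbar = [5.0]
etas = fcs_etas(centers, xbar, beta=0.1)  # both equal beta (0.1) for this geometry
print(fcs_memberships([0.5], centers, etas, xbar, m=2))  # inside cluster 0's crisp region
print(fcs_memberships([4.0], centers, etas, xbar, m=2))  # ordinary fuzzy memberships
```

A sample with D_i = 0 sits exactly on cluster i's hard-division boundary, where the fuzzy formula would divide by zero. This sketch simply counts such a sample as crisp. How CWFCS treats it is not stated in this excerpt.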

Examples

Embodiment 1

[0098] To better illustrate the performance of the present invention, we apply its method to a classification experiment on one of the real data sets in the UCI repository of machine learning databases, the Iris data set. The fuzzy index m is set to 1.5, 2, 2.5, 3 and 3.5 in turn, the iteration error precision is 10^-6, and the parameter β of the cluster-feature-weighted algorithm of the present invention (abbreviated as the CWFCS algorithm) is set to 0.005, 0.05, 0.5 and 1 in turn. To represent an unbalanced sample distribution, all samples of the first and second Iris classes are retained and 10 samples are randomly selected from the third class, giving 110 samples in 3 classes, of which the second and third classes overlap. The clustering results of the CWFCS algorithm are shown in Figures 2 to 6...
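The subset construction and parameter sweep described in [0098] can be sketched in Python as follows. It assumes the standard Iris layout of 150 samples with 50 per class, and uses class ids 0, 1 and 2 for the first to third classes. The helper `unbalanced_subset` and the fixed seed are hypothetical, and loading the real measurements is left out.

```python
import random


def unbalanced_subset(labels, keep_all, sample_from, n_sample, seed=0):
    """Sorted indices: every sample whose class is in keep_all, plus
    n_sample randomly chosen samples of class sample_from."""
    rng = random.Random(seed)
    kept = [i for i, y in enumerate(labels) if y in keep_all]
    pool = [i for i, y in enumerate(labels) if y == sample_from]
    return sorted(kept + rng.sample(pool, n_sample))


# Iris class layout: 50 samples each of classes 0, 1, 2 (features omitted).
iris_labels = [0] * 50 + [1] * 50 + [2] * 50
idx = unbalanced_subset(iris_labels, keep_all={0, 1}, sample_from=2, n_sample=10)

# Settings from the text: fuzzy index m, CWFCS parameter beta, stopping precision.
M_VALUES = (1.5, 2, 2.5, 3, 3.5)
BETA_VALUES = (0.005, 0.05, 0.5, 1)
TOL = 1e-6
grid = [(m, beta) for m in M_VALUES for beta in BETA_VALUES]

print(len(idx), len(grid))  # 110 20
```

Each of the 20 (m, β) pairs would then be run on the same 110-sample subset. Changing the seed gives a different draw from the third class.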

Embodiment 2

[0101] To verify the superiority of the present invention, we test FCS, WFCM and the CWFCS method of the present invention on the Iris data set.

[0102] In this experiment, the fuzzy index m is set to 1.5, 2, 2.5, 3 and 3.5, the iteration error precision is 10^-6, and the parameter β of the CWFCS algorithm is set to 0.005, 0.05, 0.5 and 1. The experiment is repeated 100 times, and both the best and the average results are recorded. The best performance of each algorithm is measured by its accuracy (Accuracy), number of iterations (Iter) and execution time (Time). Its overall performance is measured by the average accuracy (avg_Accuracy, the number of correctly divided samples divided by the total number of samples), the average number of iterations (avg_Iter) and the average execution time (avg_Time). The best and average results of the clustering results o...
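Because cluster ids returned by a clustering algorithm are arbitrary, "correctly divided" needs a mapping from clusters to classes. Below is a minimal Python sketch that assumes the best one-to-one mapping is used, which the text does not specify. It also assumes that the "best" run is the one with the highest accuracy. Both function names are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids, k):
    """Correctly divided samples / total samples, under the best one-to-one
    mapping from cluster ids to class labels (brute force over k! mappings)."""
    best = 0
    for perm in permutations(range(k)):
        # perm[c] is the class label assigned to cluster c under this mapping.
        hits = sum(1 for y, c in zip(true_labels, cluster_ids) if perm[c] == y)
        best = max(best, hits)
    return best / len(true_labels)


def summarize(runs):
    """runs: one (accuracy, iterations, seconds) tuple per repetition.
    Returns the best run (highest accuracy) and the averages
    (avg_Accuracy, avg_Iter, avg_Time)."""
    best = max(runs, key=lambda r: r[0])
    n = len(runs)
    averages = tuple(sum(r[i] for r in runs) / n for i in range(3))
    return best, averages


# Cluster ids are arbitrary, so [2, 2, 0, 0, 1] matches [0, 0, 1, 1, 2] perfectly.
print(clustering_accuracy([0, 0, 1, 1, 2], [2, 2, 0, 0, 1], k=3))  # 1.0
```

Brute force is fine for the 2 or 3 clusters used here. For many clusters, the usual replacement is an assignment solver such as the Hungarian algorithm.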

Embodiment 3

[0111] We then apply the three methods, FCS, WFCM and the CWFCS of the present invention, to the Breast Cancer data set, which has 30 attributes in total. To represent an unbalanced sample distribution, 10 samples are randomly selected from the first class, and the second class has 367 samples. The results are shown in Table 2. Table 3 shows that the CWFCS algorithm has the most stable performance: its number of iterations is slightly higher than that of the WFCM algorithm, its execution time is within 0.1 seconds, and its clustering accuracy is higher than that of the other two algorithms.

[0112] Algorithms

[0113] Table 3

Abstract

The invention discloses a cluster-feature-weighted fuzzy compact scattering clustering method. It targets two sets of problems. First, the existing WFCM algorithm does not take the actual hard division of samples into account and clusters data with an unbalanced sample distribution poorly. Second, the FCS (fuzzy compactness and separation) algorithm does not consider hard-division boundary points and neglects the influence of the sample feature parameters on the clustering of each class. By adjusting the sample membership degrees and feature weights, the method follows the actual hard division of samples. It fully accounts for the influence of the sample feature parameters on the clustering of each class, and makes samples as compact as possible within a class and as dispersed as possible between classes. It also solves the membership problem for samples located on a hard-division boundary, and divides noise data and abnormal data more effectively when samples are distributed unevenly. The method offers good clustering performance, fast convergence and high iteration efficiency. It is suited to industrial-control applications with unbalanced sample distributions and strict requirements on real-time performance and accuracy.
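The abstract does not give the CWFCS distance or weight-update formulas. Purely to illustrate what weighting features per cluster can mean, the sketch below uses a hypothetical cluster-specific weighted squared distance, sum over features l of w_il·(x_l − a_il)², where each cluster i has its own weight vector. This form is an assumption, not the patented formula, and the weights are fixed by hand rather than learned.

```python
def cluster_weighted_sq_dist(x, center, weights):
    """Hypothetical per-cluster feature-weighted squared distance:
    sum_l w_l * (x_l - c_l)^2, where `weights` belongs to a single cluster."""
    return sum(w * (xl - cl) ** 2 for xl, cl, w in zip(x, center, weights))


x = [1.0, 5.0]
center = [0.0, 0.0]
w_a = [0.9, 0.1]  # cluster whose shape is driven mainly by feature 0
w_b = [0.1, 0.9]  # cluster whose shape is driven mainly by feature 1
print(cluster_weighted_sq_dist(x, center, w_a))  # 0.9*1 + 0.1*25, about 3.4
print(cluster_weighted_sq_dist(x, center, w_b))  # 0.1*1 + 0.9*25, about 22.6
```

The same offset from a center counts as close under one cluster's weights and far under another's. An unweighted distance cannot express this, because it treats every feature the same for every cluster.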

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a cluster-feature-weighted fuzzy compact scattering clustering method.

Background

[0002] The natural and social sciences contain a large number of classification problems. Clustering is a statistical analysis method for studying classification problems (of samples or of indicators). It is also an important data-mining algorithm with a wide range of applications. The fuzzy C-means (FCM) clustering algorithm is a commonly used unsupervised pattern recognition method, and many improved versions of it have been proposed. These algorithms take into account the influence of each feature parameter of the samples on the cluster centers, reduce the impact of noise and abnormal data, and so on. However, these FCM-based clustering algorithms essentially consider only the intra-class compactness (intra-class scatter) of samples and ignore the inter-class...
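For reference, here is a minimal pure-Python sketch of the standard FCM iteration the background refers to. It alternates a center update and a membership update, and stops when no membership changes by more than a tolerance such as the 10^-6 precision used in the embodiments. This is plain FCM, not the patented CWFCS. The function name, the random initialization and the membership-change stopping rule are assumptions.

```python
import math
import random


def fcm(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Plain fuzzy C-means on a list of points (each a list of floats).

    Returns (centers, U, iterations), where U[i][j] is the membership of
    sample j in cluster i and every column of U sums to 1."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships, normalized column by column.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s

    for it in range(1, max_iter + 1):
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        centers = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * X[j][d] for j in range(n)) / sw
                            for d in range(dim)])

        # Membership update: u_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1));
        # a tiny floor on d avoids division by zero when a point hits a center.
        new_U = [[0.0] * n for _ in range(c)]
        for j in range(n):
            powered = [max(math.dist(X[j], v), 1e-12) ** (-2.0 / (m - 1))
                       for v in centers]
            total = sum(powered)
            for i in range(c):
                new_U[i][j] = powered[i] / total

        # Stop once no membership changes by more than tol.
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(c) for j in range(n))
        U = new_U
        if change < tol:
            break
    return centers, U, it


X = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
centers, U, iters = fcm(X, c=2)
print(sorted(round(v[0], 3) for v in centers), iters)
```

The objective behind these updates measures only how tightly each cluster gathers around its center. That is the intra-class-only focus the background contrasts with methods that also reward separation between classes.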

Application Information

Patent type & authority: Application (China)
IPC: IPC(8) G06F17/30
CPC: G06F18/23
Inventors: 周媛, 王丽娜, 何军
Owner: 南京迪塔维数据技术有限公司