
Partially-weighted incomplete data hybrid clustering method

A locally weighted technique for incomplete data, applied in fields such as genetic laws, computer components, and instruments, that addresses problems such as premature convergence, sensitivity to parameter values, and convergence to local optima.

Inactive Publication Date: 2018-03-06
LIAONING UNIVERSITY

AI Technical Summary

Problems solved by technology

[0004] A clustering algorithm optimized by local weighting alone behaves much like the FCM algorithm: it relies on a gradient-descent-style search, so it is sensitive to the initial parameter values and prone to converging to a local optimum.
A clustering algorithm optimized by the genetic algorithm alone gives a significantly better clustering result, but it still suffers from defects such as premature convergence.



Embodiment Construction

[0059] The locally weighted incomplete-data algorithm constructs a new data set from data samples whose neighborhood structure is similar to that of the incomplete samples, which takes the probability distribution of the data more fully into account. The algorithm first identifies the nearest-neighbor samples of each incomplete sample by computing the similarity between samples. Each missing attribute of a multidimensional incomplete sample is then described by the weighted values of the corresponding attribute in the structurally similar nearest-neighbor samples. Different samples that satisfy the nearest-neighbor rule interpolate the missing attributes from different angles. A Gaussian kernel function defines the similarity between samples, and the distance between the incomplete sample and each sample in its nearest neighborhood is calculated to obtain a more reasonable weighting coeffic...
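The neighbor-weighted interpolation described in [0059] can be illustrated with a minimal sketch. The text above is truncated before the exact weighting formula, so several details here are assumptions rather than the patented method: missing values are marked with `None`, distances are mean squared differences over the attributes both samples observe, and the function names and the parameters `k` and `sigma` are hypothetical.

```python
import math

def partial_sq_distance(a, b):
    """Mean squared difference over attributes observed in both samples (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes, so the samples cannot be compared
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def impute_locally_weighted(data, k=3, sigma=1.0):
    """Fill each missing attribute with a Gaussian-weighted average over nearest neighbors."""
    filled = [list(row) for row in data]  # copy, so the input is left untouched
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other samples that actually observe attribute j.
            candidates = [
                (partial_sq_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            ]
            neighbors = sorted(c for c in candidates if math.isfinite(c[0]))[:k]
            if not neighbors:
                continue  # nothing to estimate from; leave the gap
            # Gaussian kernel similarity: closer neighbors receive larger weights.
            weights = [math.exp(-d / (2.0 * sigma ** 2)) for d, _ in neighbors]
            total = sum(weights)
            if total == 0.0:
                # All weights underflowed; fall back to a plain average.
                weights, total = [1.0] * len(neighbors), float(len(neighbors))
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, neighbors)) / total
    return filled

samples = [[1.0, 2.0], [1.0, None], [5.0, 10.0]]
completed = impute_locally_weighted(samples, k=2, sigma=1.0)
```

In this toy input the second sample sits much closer to the first than to the third, so its estimated attribute stays close to 2.0. A smaller `sigma` concentrates the weight on the closest neighbor, while a larger one approaches a plain average.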



Abstract

The invention relates to a partially-weighted incomplete data hybrid clustering method comprising the following steps: (1) collecting data; (2) data processing: randomly removing some attributes from part of the data so that this part becomes incomplete data; (3) data estimation: estimating the missing data through the individuals of an improved genetic algorithm; (4) cluster analysis: performing fuzzy cluster analysis on the estimated data. The invention proposes a partially-weighted incomplete data hybrid clustering method (GLW-FCM) optimized by the improved genetic algorithm, which searches the whole problem space to find an optimal solution. The UCI standard test data sets used are Iris, Bupa, Wine, and Breast. Comparison experiments between the proposed method and five other algorithms were carried out in a Matlab environment; the improved algorithm uses its implicit parallelism to search the whole problem space for the optimal solution and obtains a better clustering result. The method effectively reduces the mean number of misclassifications, the standard deviation of the number of misclassifications, and the mean number of iterations to termination.
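Step (2) of the abstract, randomly removing attributes to produce incomplete data, can be sketched as follows. The abstract does not state a missing rate or any constraints, so the `missing_rate` parameter, the rule that every sample keeps at least one attribute, and the rule that every attribute stays observed in at least one sample are all assumptions made for this illustration; the function name is hypothetical.

```python
import random

def make_incomplete(data, missing_rate=0.1, seed=0):
    """Return a copy of `data` with a fraction of its cells replaced by None."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    n, d = len(data), len(data[0])
    target = int(round(missing_rate * n * d))  # number of cells to remove
    masked = [list(row) for row in data]
    cells = [(i, j) for i in range(n) for j in range(d)]
    rng.shuffle(cells)
    removed = 0
    for i, j in cells:
        if removed == target:
            break
        row_observed = sum(v is not None for v in masked[i])
        col_observed = sum(masked[r][j] is not None for r in range(n))
        # Assumed constraints: keep at least one value per sample and per attribute.
        if row_observed > 1 and col_observed > 1:
            masked[i][j] = None
            removed += 1
    return masked

complete = [[float(i + j) for j in range(4)] for i in range(10)]
incomplete = make_incomplete(complete, missing_rate=0.2, seed=1)
```

Fixing the seed makes it possible to compare several estimation and clustering methods on exactly the same pattern of missing values.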

Description

Technical field

[0001] The invention relates to a locally weighted hybrid clustering method for incomplete data, which belongs to the field of incomplete data clustering.

Background

[0002] The rapid development of information technology has produced massive, complex data in many fields, whose volume and scale far exceed what humans can process. To analyze these data efficiently and accurately, cluster analysis, in which computer programs perform the classification intelligently and accurately, has become a new trend.

[0003] Fuzzy C-means (FCM), as a basic unsupervised clustering method, is usually suitable for clustering data without missing attributes. However, in real life and industrial applications, factors such as data omission, input errors, equipment failures, plan changes, data acquisition failures, and random noise cause the data to be incomplete. At this time, the data w...
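For reference, the standard FCM iteration for complete data, as mentioned in [0003], can be sketched as follows. This is the textbook alternating update of cluster centers and memberships, not the GLW-FCM variant of the invention; the function name, the default parameters, and the stopping rule on the largest membership change are illustrative choices.

```python
import random

def fcm(data, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy C-means on complete data; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * data[i][j] for i in range(n)) / total for j in range(d)]
            )
        # Membership update from squared Euclidean distances to the centers.
        new_u = []
        for i in range(n):
            dist = [
                sum((data[i][j] - centers[k][j]) ** 2 for j in range(d))
                for k in range(c)
            ]
            if min(dist) == 0.0:
                # The sample coincides with one or more centers: split membership among them.
                hits = [k for k in range(c) if dist[k] == 0.0]
                row = [1.0 / len(hits) if k in hits else 0.0 for k in range(c)]
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in dist]
                s = sum(inv)
                row = [v / s for v in inv]
            new_u.append(row)
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u

points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, memberships = fcm(points, c=2)
```

The result depends on the random initial memberships, so different seeds can end in different local optima on harder data. Every distance is computed over all attributes, which is why this form cannot be applied directly once attributes are missing.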


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/12
CPC: G06N3/126, G06F18/23213
Inventor: 张利, 牛明航, 孙颖, 石振桔, 郭炜儒, 孙军, 王军, 赵中洲
Owner: LIAONING UNIVERSITY