Clustering method for attribute missing data set

The method belongs to the field of clustering data sets with missing attributes. It addresses the poor clustering accuracy of WDS and PDS, the large estimation deviation for missing attributes, and the low accuracy of the resulting clusters.

Status: Inactive
Publication Date: 2016-11-16
HAINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

Existing clustering algorithms solve the clustering problem for incomplete data sets to varying degrees, but their estimation deviation for missing attributes is relatively large and the accuracy of the resulting clusters is low, especially when the missing ratio is high. The clustering accuracy of WDS and PDS is particularly poor.

Examples

Embodiment 1

[0044] A clustering method for data sets with missing attributes clusters a missing data set S into c categories. The data set S contains x samples, the attribute dimension is y, and the number of missing attributes is n. The cluster centers are expressed as a matrix of size c*y. The clustering method includes the following steps:

[0045] S1. Perform ant colony coding on the missing attributes and the cluster centers: concatenate all missing attributes in the data set with the attribute values of each dimension of each cluster center to form an (n+c*y)-dimensional vector, and use this vector as the position vector of a single ant in the ant colony;
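The coding in S1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `missing_slots`, `encode`, and `decode`, the use of `None` to mark a missing attribute, and the row-major ordering of the missing entries are all assumptions made for illustration. Only the layout of the position vector (n missing-attribute values followed by the c*y center values) comes from the text above.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Incomplete = List[List[Optional[float]]]  # None marks a missing attribute (assumed convention)


def missing_slots(data: Incomplete) -> List[Tuple[int, int]]:
    """Return (row, column) of every missing attribute, in row-major order."""
    return [(i, j) for i, row in enumerate(data)
            for j, value in enumerate(row) if value is None]


def encode(estimates: List[float], centers: Matrix) -> List[float]:
    """Concatenate n missing-attribute estimates with the flattened c*y centers."""
    return list(estimates) + [v for center in centers for v in center]


def decode(position: List[float], data: Incomplete, c: int) -> Tuple[Matrix, Matrix]:
    """Split a position vector into a completed data set and a c*y center matrix."""
    slots = missing_slots(data)
    n, y = len(slots), len(data[0])
    if len(position) != n + c * y:
        raise ValueError("position must have n + c*y entries")
    filled = [list(row) for row in data]
    # The first n entries are the estimates for the missing attributes.
    for (i, j), value in zip(slots, position[:n]):
        filled[i][j] = value
    # The remaining c*y entries are the cluster centers, one row per center.
    flat = position[n:]
    centers = [flat[k * y:(k + 1) * y] for k in range(c)]
    return filled, centers


# Toy data set S: x = 3 samples, y = 2 attributes, n = 2 missing attributes.
S = [[1.0, None], [4.0, 5.0], [None, 8.0]]
position = encode([2.0, 7.0], [[1.0, 2.0], [5.5, 6.5]])  # c = 2 centers
filled, centers = decode(position, S, c=2)
```

Here `position` has 2 + 2*2 = 6 entries, so an optimizer that moves one ant changes the missing-attribute estimates and the cluster centers together.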

[0046] S2. Determine the value space of the missing attributes and of the cluster centers; the value space is the search range of the corresponding dimension of the position vector.

[0047] For the missing attribute, the nearest neighbor interval of the missing attribute is obtained by...

Abstract

The invention relates to a clustering method for data sets with missing attributes. Starting from the factors that limit the accuracy of clustering algorithms on missing data sets, the method uses a nearest neighbor method to determine a constrained estimation space for each missing attribute, applies real-number coding to the missing attributes and the cluster centers, and searches for the optimum with an ant colony algorithm. The missing attribute estimates and the cluster centers are both obtained during the optimization iterations, and fuzzy clustering is then completed with the membership function of the FCM algorithm. Together these steps form a hybrid optimization clustering algorithm for data sets with missing attributes. The method improves the accuracy of the missing attribute estimates and reduces the probability of misclassification in the clustering result.
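The membership step mentioned in the abstract can be sketched as follows. This excerpt does not reproduce the patent's own formula, so the code uses the standard FCM membership function, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), where d_i is the Euclidean distance from a sample to center i. The function name `fcm_memberships`, the default fuzzifier m = 2, and the handling of a sample that coincides with a center are assumptions for illustration.

```python
import math
from typing import List


def fcm_memberships(sample: List[float], centers: List[List[float]],
                    m: float = 2.0) -> List[float]:
    """Standard FCM membership of one complete sample in each cluster (m > 1)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(sample, center) for center in centers]
    # A sample lying exactly on one or more centers belongs fully to them.
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        hits = sum(on_center)
        return [1.0 / hits if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# One sample at distance 1 from the first center and 3 from the second.
u = fcm_memberships([0.0, 0.0], [[1.0, 0.0], [3.0, 0.0]])
```

With m = 2 the memberships are inversely proportional to the squared distances, so `u` is approximately `[0.9, 0.1]` and sums to 1.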

Description

Technical field

[0001] The invention relates to the technical field of information data processing, in particular to a clustering method for data sets with missing attributes.

Background

[0002] A data set has missing attributes when some of its samples lack one or more attribute values. This is often caused by information loss during sensor acquisition or signal transmission. When attributes are missing in multiple samples, the accuracy of the clustering results for the data set decreases.

[0003] To solve this problem, scholars have proposed many solutions based on existing clustering algorithms for complete data sets, among which the Fuzzy C-Means (FCM) algorithm is widely used. Building on this algorithm, scholars have proposed FCM-based clustering methods for data sets with missing attributes, especially the Whole Data Strategy (WDS), Partial Distance Strategy (PDS), Nearest Prototype Strategy (NPS) and Opti...

Application Information

IPC(8): G06K9/62
CPC: G06F18/23213
Inventors: 任佳, 张胜男
Owner: HAINAN UNIVERSITY