Unsupervised cluster characteristic selection method based on Laplace regularization

A feature selection technique based on the Laplacian matrix, applied in the field of data processing, that addresses problems such as data complexity, the curse of dimensionality limiting practicality, and reduced learning efficiency and effectiveness.

Inactive Publication Date: 2012-10-10
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0009] These traditional clustering methods have successfully solved the clustering problem for low-dimensional data. However, with the rapid development of information technology and improved data collection capabilities, the dimensionality of data in many fields has increased exponentially, and traditional clustering methods often fail on high-dimensional data because of its complexity. When clustering high-dimensional data sets, traditional methods mainly encounter two problems: (1) high-dimensional data sets contain a large number of irrelevant attributes, so the possibility that clusters exist across all dimensions is almost zero, which greatly increases ...; (2) the curse of dimensionality makes some clustering algorithms practically unusable, which seriously affects the efficiency and effectiveness of learning in many fields such as image recognition and information retrieval.
[0015] As a classic spectral method for feature selection, the Laplacian score has been widely used in various applications. It can effectively identify the main features of the data, but it cannot effectively extract the category features of the data. Q-alpha, as a feature selection method, works well for gene selection but is not suitable for other applications (such as image processing). Variance-based feature selection is one of the simplest feature selection methods, but it merely selects the features with the largest variance as the most informative ones, so it is easily disturbed by noisy data.
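The noise sensitivity of variance-based selection described in [0015] can be illustrated with a small sketch. The toy data, the function name `variance_select`, and the use of NumPy are assumptions made for illustration only and do not come from the patent.

```python
import numpy as np

def variance_select(X, m):
    """Return the indices of the m features with the largest variance,
    together with all per-feature variances (hypothetical helper)."""
    variances = np.asarray(X, dtype=np.float64).var(axis=0)
    # Sort by descending variance; a stable sort keeps column order on ties.
    order = np.argsort(-variances, kind="stable")
    return order[:m], variances

# Hypothetical toy data: 6 samples (rows), 2 features (columns).
# Feature 0 separates two groups of samples;
# feature 1 is zero everywhere except for a single outlier.
X_toy = np.array([
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 10.0],
])

selected, variances = variance_select(X_toy, m=1)
print("variances:", variances)
print("selected feature:", selected)
```

In this toy case the single outlier gives feature 1 the larger variance, so it is selected ahead of the feature that actually separates the two groups.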



Examples


Embodiment Construction

[0053] In order to describe the present invention more specifically, the clustering method of the present invention will be described in detail below in conjunction with the drawings and specific embodiments.

[0054] As shown in Figure 1, an unsupervised clustering feature selection method based on Laplacian regularization includes the following steps:

[0055] (1) Construct the sample feature matrix.

[0056] In this embodiment, the ORL face data set is taken as an example, and the statistical information of the data set is shown in Table 1.

[0057] Table 1

[0058]
Data set | Number of face images | Number of face categories | Number of image features
ORL      | 1400                  | 20                        | 1024

[0059] The ORL face data set contains 1400 face images of 20 people with different appearances (70 face images per person).
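Step (1) can be sketched as follows, under stated assumptions. The patent gives 1400 images and 1024 features per image; treating each image as a 32 × 32 grayscale array (32 × 32 = 1024), placing one sample per row, and the helper name `build_feature_matrix` are assumptions made for this sketch. Random pixel values stand in for the actual face images.

```python
import numpy as np

def build_feature_matrix(images):
    """Flatten a stack of images into a (num_samples, num_features) matrix.

    Assumes `images` has shape (num_samples, height, width); each image
    becomes one row, with pixels read in row-major order.
    """
    images = np.asarray(images, dtype=np.float64)
    num_samples = images.shape[0]
    return images.reshape(num_samples, -1)

# Placeholder data with the sizes from Table 1: 1400 images, 32 x 32 pixels
# (the 32 x 32 layout is an assumption; only the 1024 total is from the source).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(1400, 32, 32))

X = build_feature_matrix(images)
print(X.shape)
```

Each row of `X` is then one sample and each column one candidate feature, which is the layout the later sketches of graph construction assume.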

[0060] Select five ...


Abstract

The invention discloses an unsupervised clustering feature selection method based on Laplacian regularization. The method comprises the following steps: (1) constructing a sample feature matrix; (2) calculating a Laplacian matrix; and (3) extracting features from the sample feature matrix. The method selects features by directly measuring the variance of the predictions of subsequent learning, and can therefore directly improve those predictions. Because the influence of the selected features on the predicted values of the learning problem is taken into account during feature extraction, the efficiency of subsequent learning can be effectively improved. In addition, the method models the data with a Laplacian approach based on the manifold geometry of the data, which effectively reflects the distribution of the data in space and thereby identifies the dimensions that carry the most information.
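Step (2) can be sketched as follows. The text available here does not state how the graph behind the Laplacian matrix is built, so the k-nearest-neighbour graph with 0/1 weights, the unnormalized form L = D - W, the default `k=5`, and the function name `knn_graph_laplacian` are all assumptions chosen as a common construction, not the patent's definition.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetric kNN graph.

    X has one sample per row. Two samples are connected (weight 1) if
    either one is among the k nearest neighbours of the other.
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)          # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    W = np.maximum(W, W.T)                # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))            # degree matrix
    return D - W

# Small random stand-in for the sample feature matrix.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
L_demo = knn_graph_laplacian(X_demo, k=5)
print(L_demo.shape)
```

A Laplacian built this way is symmetric, has rows that sum to zero, and is positive semi-definite, which is what makes it usable as a smoothness regularizer over the data graph.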

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to an unsupervised clustering feature selection method based on Laplacian regularization.

Background technique

[0002] Clustering is a common multivariate statistical analysis method in machine learning and data mining. It examines a large number of samples and groups them reasonably according to their characteristics, with no model to refer to or follow; that is, it is performed without prior knowledge. At present, as an effective means of data analysis, clustering methods are widely used in various fields: in business, cluster analysis is used to discover different customer groups and to characterize them through their purchase patterns; in biology, cluster analysis is used to classify animals and plants and to classify genes, in order to understand the inherent structure of populations; in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 何晓飞, 姚冠红
Owner: ZHEJIANG UNIV