Classification method based on unsupervised feature selection

A feature selection and classification method, applied in the field of data processing, that addresses the problems of ignored associations between features, suboptimal feature subsets, and suboptimal classification results, thereby improving classification speed and classification accuracy.

Active Publication Date: 2014-07-23
ZHEJIANG UNIV
Cites: 2 | Cited by: 24

AI Technical Summary

Problems solved by technology

However, in feature selection the features of high-dimensional data are usually ranked by importance according to some evaluation criterion, which ignores possible associations between different features. As a result, the optimal feature subset cannot be generated, and the best classification result cannot be obtained.



Embodiment Construction

[0029] To address the impact of the "curse of dimensionality" on high-dimensional data mining, the present invention first obtains the similarity matrix of the high-dimensional data through spectral graph theory and ITML metric learning, and then uses the SM algorithm to map the original sample set to the feature vector (eigenvector) space. Sparse coefficient vectors and MCFS scores are then learned for feature selection. Finally, a support vector machine is used to build a classification model on the data after feature selection, and the driver's EEG data are classified to verify the effectiveness of the algorithm. Compared with other algorithms, the present invention preserves the correlation between high-dimensional data features when performing feature selection before building the classification model, which helps overcome the impact of the "curse of dimensionality" on high-dimensional data.
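The first two steps of paragraph [0029], building a similarity matrix and mapping samples to a spectral (eigenvector) space, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the excerpt does not define the "SM algorithm", so the code uses a standard Laplacian-eigenmaps-style generalized eigenproblem L y = λ D y; plain Euclidean distances stand in for the ITML-learned metric; and the function name `spectral_embedding` and its parameters are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def spectral_embedding(X, n_neighbors=5, n_components=2, sigma=1.0):
    """Embed samples using generalized eigenvectors of L y = lambda * D y.

    Assumption: Euclidean distance replaces the ITML-learned distance.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Heat-kernel weights, kept only for each sample's nearest neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(sq[i])[1:n_neighbors + 1]  # position 0 is the sample itself
        W[i, idx] = np.exp(-sq[i, idx] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)  # make the similarity matrix symmetric

    # Diagonal degree matrix and graph Laplacian.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Symmetric generalized eigenproblem; eigenvalues come back in ascending order.
    vals, vecs = eigh(L, D)

    # Skip the first eigenpair (eigenvalue ~0) and keep the next n_components.
    return vals[1:n_components + 1], vecs[:, 1:n_components + 1]


# Hypothetical usage on random data: 30 samples, 4 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
eigenvalues, embedding = spectral_embedding(X_demo, n_neighbors=5, n_components=3)
print(embedding.shape)
```

In this sketch each column of `embedding` plays the role of one coordinate in the eigenvector space; a learned Mahalanobis distance could be substituted where `sq` is computed.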

[0030] As shown in Figure 1 and Figure 2, the present invention is based on the classifi...


Abstract

The invention discloses a classification method based on unsupervised feature selection. In the method, high-dimensional data are represented as a similarity graph, distances between sample points are obtained through ITML, and a similarity matrix of the original high-dimensional data is built. The SM algorithm is then executed on the similarity matrix and its corresponding diagonal matrix to map the original sample set to the feature vector (eigenvector) space. Next, by learning sparse coefficient vectors and MCFS scores, weight coefficients are obtained for all attributes in the original sample set, and the attributes that best express the original sample information are selected. Finally, a support vector machine is used to build a classification model on the selected data to predict the fatigue state of a driver. The method selects features of the high-dimensional data while preserving the cluster structure of the data before the classification model is built, and avoids the negative effect of the curse of dimensionality on data classification.
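The later steps of the abstract (sparse coefficient vectors, MCFS scores, then a support vector machine) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: scikit-learn's `SpectralEmbedding` with an RBF affinity stands in for the ITML-based similarity matrix and the SM algorithm; sparse coefficients are fitted with least-angle regression (`Lars`) under a cardinality limit; each feature's score is its largest absolute coefficient across the embedding dimensions; and synthetic data replace the driver EEG data. The helper name `mcfs_scores` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lars
from sklearn.manifold import SpectralEmbedding
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mcfs_scores(X, n_eigenvectors=1, n_nonzero=5):
    """Return one MCFS-style score per feature (larger means more important)."""
    # Spectral embedding of the samples (assumption: RBF affinity, not ITML).
    Y = SpectralEmbedding(
        n_components=n_eigenvectors, affinity="rbf", random_state=0
    ).fit_transform(X)

    scores = np.zeros(X.shape[1])
    for k in range(Y.shape[1]):
        # Rescale the target so coefficients are comparable across dimensions.
        y = (Y[:, k] - Y[:, k].mean()) / Y[:, k].std()
        # Sparse regression of the embedding coordinate on the original features.
        coef = Lars(n_nonzero_coefs=n_nonzero).fit(X, y).coef_
        # Score of feature j: max over k of |coef[k, j]|.
        scores = np.maximum(scores, np.abs(coef))
    return scores


# Synthetic two-class data (hypothetical): only features 0-4 carry class signal.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20))
X[:, :5] += 4.0 * labels[:, None]
X = StandardScaler().fit_transform(X)

# Unsupervised selection: the labels are not used to score features.
scores = mcfs_scores(X)
selected = np.argsort(scores)[::-1][:5]

# Supervised step: train and evaluate an SVM on the selected features only.
X_train, X_test, y_train, y_test = train_test_split(
    X[:, selected], labels, test_size=0.3, random_state=0, stratify=labels
)
accuracy = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)
print("selected features:", sorted(selected.tolist()))
print("test accuracy:", accuracy)
```

The point of the design is that scoring happens before any classifier is trained, so the selected subset reflects the structure of the data rather than the class labels.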

Description

Technical field

[0001] The present invention relates to data processing fields such as signal processing, data mining, and cluster analysis, and in particular to a method that uses unsupervised feature selection based on information-theoretic metric learning (ITML) to reduce the dimension of high-dimensional data, and then uses a support vector machine to establish a classification model.

Background technique

[0002] With the continuous development of the Internet and the information industry, data in many fields such as economics, electronic information, medicine, and meteorology have entered a stage of explosive growth, and massive high-dimensional data are common. How to classify high-dimensional data so as to better discover potentially useful information is a research hotspot in the field of data mining.

[0003] Classification is the process of predicting data class labels by establishing a classifier that describes a pre-defined data class or...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
Inventors: 郑宝芬, 苏宏业, 罗林
Owner: ZHEJIANG UNIV