A feature selection method for clustering algorithm based on density clustering

A feature selection method for clustering algorithms, applied in the field of data analysis. It addresses the high computational cost, overfitting risk, and low accuracy of existing methods, and achieves more accurate clustering results, improved processing capacity, and reduced computational cost.

Pending Publication Date: 2019-03-29
贵州联科卫信科技有限公司
0 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Existing feature selection methods fall into four types. The first type is the Filter method, such as Relief and measures based on mutual information or the maximum information coefficient. These methods assign a weight to each feature. They are easy to use, but they are not suitable for continuous variables, their results are very sensitive to the discretization scheme, and their accuracy is usually low.

The second type is the Wrapper method, such as recursive feature elimination and LasVegasWrapper. These methods treat the selection of a feature subset as a search optimization problem: they generate different feature combinations, evaluate them, and compare them with one another. The obvious disadvantages of the Wrapper method are its excessive computational cost and its risk of overfitting.

The third type is the Embedded method, such as introducing regularization terms or using random forests. These methods aim to reduce the computation time that the Wrapper method needs to reclassify different subsets, by selecting the features that are important to model training while the model is being determined. Their disadvantage is that they perform weakly on high-dimensional data sets.

The fourth type combines the Filter and Wrapper methods, using a specific learning algorithm with a time complexity similar to that of the Filter method to achieve the best performance. Its disadvantage is that it is less effective on sparse data sets.

Examples

Embodiment 1

[0065] Example 1. A feature selection method for clustering algorithms based on density clustering, carried out in the following steps:

[0066] a. Suppose the data set D contains M instances and N features; then there is a feature set $F = \{f_1, f_2, \ldots, f_N\}$ consisting of the N features;

[0067] Normalize the data set D to obtain the data set D', and then use the Euclidean distance as the similarity measure between the features in D' to construct a similarity matrix between the features. Normalization can improve accuracy, and its effect is significant for algorithms that involve distance calculation;
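
Step a can be illustrated with a minimal sketch. The source does not say which normalization is used, so min-max scaling to [0, 1] is an assumption here, and the function names `min_max_normalize` and `feature_distance_matrix` and the toy data set are hypothetical. Note that the matrix holds distances between feature columns (not between instances), so a smaller entry means two features are more similar.

```python
import math

def min_max_normalize(data):
    """Scale each feature (column) to [0, 1]; constant columns become 0.0.

    Assumption: min-max scaling. The source only says "normalize".
    """
    columns = list(zip(*data))  # one tuple per feature
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back: rows are instances, columns are features.
    return [list(row) for row in zip(*scaled)]

def feature_distance_matrix(data):
    """N x N matrix of Euclidean distances between feature columns."""
    columns = list(zip(*data))
    n = len(columns)
    return [[math.dist(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]

# Toy data set D: M = 3 instances, N = 3 features (illustrative values only).
D = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 3.0],
]
D_prime = min_max_normalize(D)
S = feature_distance_matrix(D_prime)
```

In this toy example the first two features differ only in scale, so after normalization their distance is zero, which is the kind of redundancy the later steps are meant to detect.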

[0068] b. Use the DBSCAN algorithm to cluster the features of the similarity matrix, and divide the features into three categories: core features, boundary features, and atypical features; specifically, according to the principle of dividing points in the DBSCAN algorithm, the features are divided into three ...
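
The three-way division in step b can be sketched as follows, assuming the standard DBSCAN point roles: a core point has at least `min_pts` neighbours within radius `eps` (counting itself), a boundary point is not core but lies within `eps` of a core point, and the rest are noise, which is taken here to correspond to the "atypical" features. The function name `classify_features`, the values of `eps` and `min_pts`, and the toy distance matrix are all hypothetical; the source text available here does not give them.

```python
def classify_features(dist, eps, min_pts):
    """Label each feature 'core', 'boundary', or 'atypical'.

    `dist` is a symmetric N x N distance matrix between features.
    The neighbourhood of a feature includes the feature itself.
    """
    n = len(dist)
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            labels.append("boundary")  # reachable from a core feature
        else:
            labels.append("atypical")  # DBSCAN noise
    return labels

# Toy example: five features placed on a line, so distances are easy to check.
positions = [0.0, 0.1, 0.2, 0.45, 2.0]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = classify_features(dist, eps=0.3, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'boundary', 'atypical']
```

This sketch only assigns roles; a full DBSCAN run would also group the core and boundary features into clusters, for example by expanding each cluster from its core features.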

Abstract

The invention discloses a feature selection method for clustering algorithms based on density clustering. The method includes the following steps: a. Suppose the data set D contains M instances and N features; then there is a feature set $F = \{f_1, f_2, \ldots, f_N\}$ composed of the N features. Normalize the data set D to obtain the data set D', and then use the Euclidean distance as the similarity measure between the features in D' to construct a similarity matrix between the features. b. Use the DBSCAN algorithm to cluster the features of the similarity matrix, dividing the features into three categories: core features, boundary features, and atypical features. c. After feature clustering is complete, use the feature selection algorithm to select an n-dimensional feature subset F', where n <= N, and (shown in the description) guarantees the least redundancy between the features in F'. The method offers high accuracy, low computational cost, and a strong ability to process massive data and sparse data sets.

Description

Technical field

[0001] The invention relates to the technical field of data analysis, in particular to a feature selection method for clustering algorithms based on density clustering.

Background art

[0002] As one of the research focuses of machine learning, feature learning has made great progress along with the development of the machine learning field. On high-dimensional data, the efficiency and accuracy of clustering or classification are usually unsatisfactory, and performance drops sharply as the number of features increases. Applying feature selection techniques before training machine learning models can therefore be very effective in solving this problem. As an important data analysis technique, feature selection is used to reduce feature redundancy and to mine hidden information in high-dimensional data, and its accuracy is crucial for data analysis.

[0003] At present, many scholars have conducted research on feature selection me...

Application Information

IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 李晖施若冯刚
Owner: 贵州联科卫信科技有限公司