
High-dimensional data classification method based on two-stage mixed feature selection

A two-stage hybrid feature selection technique for high-dimensional data, applied in fields such as instruments, character and pattern recognition, and computer components. It addresses the tendency to fall into local optima, premature convergence, and overfitting on high-dimensional data, with the aim of improving classification performance, operating speed, and prediction accuracy.

Pending Publication Date: 2021-12-10
ZHEJIANG SCI-TECH UNIV

AI Technical Summary

Problems solved by technology

Although existing feature selection methods can achieve satisfactory results, problems remain when dealing with high-dimensional data: premature convergence, a tendency to fall into local optima, and overfitting.



Examples


Embodiment 1

[0087] Embodiment 1 is a high-dimensional data classification method based on two-stage mixed feature selection, as shown in Figures 1-5. First, the MIC method is used to measure the correlation between features and labels, and a suitable deletion threshold is learned with the Q-Learning algorithm to obtain the selected feature subset. An improved Particle Swarm Optimization (PSO) algorithm then searches for the optimal feature subset, which is used to predict the labels of the samples in the data set.
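The first (filter) stage described above can be illustrated with a minimal sketch. It is not the patented procedure: the relevance score below is a normalized mutual information on a single equal-width grid, used here as a simple stand-in for MIC (which searches over many grids), and the deletion threshold is fixed by hand rather than learned with Q-Learning. The function names, the `bins` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def binned_mutual_info(values, labels, bins=4):
    """Normalized mutual information between one feature and the labels,
    computed on one equal-width grid (a crude stand-in for MIC)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid zero width
    cells = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(cells, labels))
    cell_counts = Counter(cells)
    label_counts = Counter(labels)
    mi = 0.0
    for (c, lab), count in joint.items():
        # p(c, lab) * log( p(c, lab) / (p(c) * p(lab)) )
        mi += (count / n) * math.log(count * n / (cell_counts[c] * label_counts[lab]))
    norm = math.log(min(bins, len(label_counts)))
    return mi / norm if norm > 0 else 0.0

def filter_features(X, y, threshold):
    """Keep the indices of features whose relevance score exceeds the threshold."""
    n_features = len(X[0])
    scores = [binned_mutual_info([row[j] for row in X], y) for j in range(n_features)]
    kept = [j for j, s in enumerate(scores) if s > threshold]
    return kept, scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [1.0, 1.0]]
y = [0, 0, 1, 1]

# Hand-picked threshold; in the method it would be learned, not fixed.
kept, scores = filter_features(X, y, threshold=0.5)
print(kept, scores)
```

On this toy data only feature 0 survives the filter. The surviving indices would then be handed to the second-stage search, which is not sketched here.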

[0088] Step 1. Obtain the data set and process it;

[0089] Download the microarray data set from the Internet, organize the feature information of the data on the host computer, and mark the classification labels of all samples. Finally, remove the serial number of each sample and delete the samples with missing values to obtain the processed data set.
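As a rough illustration of this cleaning step, the sketch below drops a serial-number column and discards incomplete samples. It assumes each sample arrives as a dictionary of strings and that "missing" means an empty or absent value; the column names `sample_id`, `gene_a`, `gene_b`, and `label` are hypothetical and do not come from the patent.

```python
def preprocess(rows, id_key="sample_id", label_key="label"):
    """Drop the serial-number column and every sample with a missing value."""
    cleaned = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != id_key}
        if any(v is None or v == "" for v in record.values()):
            continue  # incomplete sample: discard it
        cleaned.append(record)
    # Split the remaining records into numeric features and class labels.
    features = [[float(v) for k, v in r.items() if k != label_key] for r in cleaned]
    labels = [r[label_key] for r in cleaned]
    return features, labels

# Hypothetical raw samples; the second one has a missing expression value.
rows = [
    {"sample_id": 1, "gene_a": "0.5", "gene_b": "1.2", "label": "tumor"},
    {"sample_id": 2, "gene_a": "", "gene_b": "0.7", "label": "normal"},
    {"sample_id": 3, "gene_a": "0.1", "gene_b": "0.3", "label": "normal"},
]

features, labels = preprocess(rows)
print(features, labels)
```

The second sample is removed because one of its values is empty, leaving two samples with two features each.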

[0090] In this embodiment, 15 medical-related microarray data sets are obtaine...


Abstract

The invention discloses a high-dimensional data classification method based on two-stage mixed feature selection. The method comprises the following steps: obtaining a processed data set; preprocessing the processed data set with the maximum information coefficient (MIC) method to obtain an MIC matrix; obtaining a selected feature subset; performing a fine search on the selected feature subset with an improved PSO algorithm to obtain optimal feature subsets; updating the features of the processed data set obtained in step S1 according to the optimal feature subsets; building training and test sets for ten-fold cross-validation from the updated data set and feeding them in turn to a KNN classifier with K = 1 to obtain the classification accuracy of each of the ten optimal feature subsets; and taking the average of these ten accuracies as the accuracy of the optimal feature subset.
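The evaluation step in the abstract (ten-fold cross-validation with a KNN classifier, K = 1, averaged over the folds) can be sketched as follows. This is a simplified illustration under stated assumptions: one fixed feature subset is evaluated on every fold, folds are assigned by sample index without shuffling or stratification, and Euclidean distance is used. The function names and the toy data are hypothetical.

```python
import math

def predict_1nn(train_X, train_y, x):
    """Label of the single nearest training sample (Euclidean distance)."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]

def cross_val_accuracy(X, y, selected, folds=10):
    """Mean 1-NN accuracy over the folds, using only the selected features."""
    reduced = [[row[j] for j in selected] for row in X]
    fold_accuracies = []
    for f in range(folds):
        # Assumption: sample i belongs to fold i % folds (no shuffling).
        test_idx = [i for i in range(len(reduced)) if i % folds == f]
        train_idx = [i for i in range(len(reduced)) if i % folds != f]
        if not test_idx or not train_idx:
            continue
        train_X = [reduced[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            predict_1nn(train_X, train_y, reduced[i]) == y[i] for i in test_idx
        )
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)

# Toy data (hypothetical): feature 0 follows the class, feature 1 is misleading.
X = [[float(i % 2) * 10 + i * 0.01, float(i)] for i in range(20)]
y = [i % 2 for i in range(20)]

acc_informative = cross_val_accuracy(X, y, selected=[0])
acc_misleading = cross_val_accuracy(X, y, selected=[1])
print(acc_informative, acc_misleading)
```

On this toy data the informative feature gives an average accuracy of 1.0 and the misleading one gives 0.0, which shows why the averaged accuracy is usable as a score for comparing candidate feature subsets.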

Description

Technical field

[0001] The present invention relates to technical fields such as reinforcement learning, feature selection, pattern recognition, and machine learning, and specifically relates to a high-dimensional data classification method based on two-stage mixed feature selection.

Background technique

[0002] With the rapid development of science and technology, more and more data are collected in machine learning tasks. These data contain a large number of irrelevant and redundant features, which reduce the prediction accuracy of the model and increase the computational complexity. Therefore, how to filter out the features most relevant to the task to be solved has become an urgent problem in machine learning and pattern recognition. As an effective tool for reducing the feature dimension, feature selection can eliminate useless features in the original data according to a given evaluation standard, save computing costs, and improve prediction accuracy. In ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/24147, G06F18/214
Inventor: 李欣倩, 沈琪浩, 任佳
Owner: ZHEJIANG SCI-TECH UNIV