Feature selection method and device for high-dimensional data and computer storage medium for high-dimensional data

A feature selection method for high-dimensional data, applied in the field of information processing, which addresses the problem that existing algorithms have high time complexity and therefore cannot be widely used

Pending Publication Date: 2019-03-22
SHENZHEN UNIV

AI Technical Summary

Problems solved by technology

Conventional global search algorithms can find the optimal feature subset, but as the feature dimension increases their time complexity becomes too high for them to be widely used.

Method used



Examples


Embodiment 1

[0031] Based on the above, see Figure 1, which shows a feature selection method for high-dimensional data provided by an embodiment of the present invention. The method may include:

[0032] S101: Initialize the particles in the population according to the cut points of the data set to be processed;

[0033] S102: Iteratively update each initialized particle according to a preset stop criterion and an update strategy to obtain updated output particles;

[0034] S103: Test the non-dominated solutions among the output particles on the test set, and determine the test-optimal solution particle; wherein the test set is a part of the data set to be processed, the test-optimal solution is used to perform feature selection on the data set to be processed, and the test-optimal solution is the particle with the highest classification accuracy and the fewest features on the test set.
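Step S103 can be illustrated with a minimal sketch. It assumes each output particle has already been decoded into a 0/1 feature mask and scored during the search; the names `Evaluated`, `non_dominated`, `pick_test_optimum` and the toy numbers are hypothetical and do not come from the patent text. The tie-break (accuracy first, then fewer features) is one plausible reading of "highest classification accuracy and fewest features".

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


class Evaluated(NamedTuple):
    mask: Mask
    accuracy: float   # accuracy measured during the search, higher is better
    n_features: int   # number of selected features, lower is better


def dominates(a: Evaluated, b: Evaluated) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and better


def non_dominated(pop: List[Evaluated]) -> List[Evaluated]:
    """Keep only the particles that no other particle dominates."""
    return [p for p in pop if not any(dominates(q, p) for q in pop)]


def pick_test_optimum(front: List[Evaluated],
                      test_accuracy: Callable[[Mask], float]) -> Evaluated:
    """Assumed rule: best test-set accuracy, ties broken by fewer features."""
    return max(front, key=lambda p: (test_accuracy(p.mask), -p.n_features))


# Toy population (made-up numbers, for illustration only).
population = [
    Evaluated((1, 1, 0, 0), 0.90, 2),
    Evaluated((1, 1, 1, 0), 0.90, 3),  # dominated by the particle above
    Evaluated((1, 0, 0, 0), 0.80, 1),
    Evaluated((1, 1, 1, 1), 0.95, 4),
]
# Hypothetical test-set accuracies for the masks on the front.
held_out: Dict[Mask, float] = {
    (1, 1, 0, 0): 0.92,
    (1, 0, 0, 0): 0.78,
    (1, 1, 1, 1): 0.92,
}

front = non_dominated(population)
best = pick_test_optimum(front, held_out.__getitem__)
print(best.mask)
```

In this toy run two masks tie on test accuracy, so the one with fewer features is returned.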

[0035] It should be noted that, in the solution shown in Figure 1, in ...

Embodiment 2

[0086] In order to verify the feature selection method shown in the first embodiment, this embodiment also compares the technical solution of the first embodiment with other feature selection algorithms to verify the robustness and reliability. First, the technical solution of Embodiment 1 is shown in pseudocode form as follows:

[0087]

[0088]

[0089] In addition, in order to compare the technical solution of Embodiment 1 with other feature selection algorithms, 10 high-dimensional gene datasets are used for testing; these 10 datasets can be downloaded from http://www.gems-system.org. The evaluation of the particles and the calculation of the classification accuracy are carried out by the KNN algorithm with the K value set to 1, and the details of the datasets are shown in Figure 5. In Figure 5, the first column represents the name of the dataset, the second column represents the total number of features, the third column represents the number o...
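The 1-nearest-neighbour evaluation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `one_nn_accuracy`, the 0/1 mask encoding, the Euclidean distance, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def one_nn_accuracy(train_X: List[Sequence[float]], train_y: List[str],
                    test_X: List[Sequence[float]], test_y: List[str],
                    mask: Sequence[int]) -> float:
    """Accuracy of a 1-NN classifier that only looks at features where mask is 1."""
    kept = [j for j, keep in enumerate(mask) if keep]
    if not kept or not test_X:
        return 0.0  # assumed convention: an empty subset scores zero
    correct = 0
    for x, label in zip(test_X, test_y):
        # Squared Euclidean distance gives the same nearest neighbour as Euclidean.
        nearest = min(
            range(len(train_X)),
            key=lambda i: sum((train_X[i][j] - x[j]) ** 2 for j in kept),
        )
        correct += train_y[nearest] == label
    return correct / len(test_X)


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [1.0, -5.0], [10.0, 5.1], [11.0, -5.1]]
train_y = ["a", "a", "b", "b"]
test_X = [[0.5, -5.0], [10.5, 5.0]]
test_y = ["a", "b"]

acc_informative = one_nn_accuracy(train_X, train_y, test_X, test_y, (1, 0))
acc_noise = one_nn_accuracy(train_X, train_y, test_X, test_y, (0, 1))
print(acc_informative, acc_noise)
```

On this toy data the mask that keeps only the informative feature classifies both test samples correctly, while the mask that keeps only the noisy feature gets one of the two wrong.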

Embodiment 3

[0100] Based on the same inventive concept as the previous embodiments, see Figure 7, which shows a feature selection device 70 for high-dimensional data provided by an embodiment of the present invention, including an initialization part 701, an update part 702, and a determination part 703; wherein,

[0101] The initialization part 701 is configured to initialize the particles in the population according to the cut points of the data set to be processed;

[0102] The updating part 702 is configured to iteratively update each initialized particle according to a preset stopping criterion and an update strategy to obtain updated output particles;
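The initialization and update parts can be sketched as two functions. The excerpt does not say how cut points are computed or which update strategy is used, so everything concrete here is an assumption: candidate cut points are taken as midpoints between adjacent distinct values, a position entry that rounds to 0 means "feature dropped", the update is the textbook particle swarm velocity rule used as a stand-in, and the stop criterion is a fixed iteration budget. All function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def candidate_cut_points(column: Sequence[float]) -> List[float]:
    """Assumed discretization: midpoints between adjacent distinct values."""
    values = sorted(set(column))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def initialization_part(cuts: List[List[float]], n_particles: int,
                        rng: random.Random) -> List[List[float]]:
    # Entry j lies in [0, len(cuts[j])]; rounding to k > 0 picks the k-th cut point.
    return [[rng.uniform(0, len(c)) for c in cuts] for _ in range(n_particles)]


def update_part(positions: List[List[float]], cuts: List[List[float]],
                fitness: Callable[[List[float]], float], max_iters: int,
                rng: random.Random, w: float = 0.7, c1: float = 1.5,
                c2: float = 1.5) -> List[List[float]]:
    """Textbook PSO update (stand-in); returns each particle's best position."""
    velocities = [[0.0] * len(cuts) for _ in positions]
    pbest = [list(p) for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    for _ in range(max_iters):  # assumed stop criterion: iteration budget
        g = max(range(len(pbest)), key=pbest_fit.__getitem__)
        gbest = list(pbest[g])
        for i, x in enumerate(positions):
            for j in range(len(x)):
                r1, r2 = rng.random(), rng.random()
                velocities[i][j] = (w * velocities[i][j]
                                    + c1 * r1 * (pbest[i][j] - x[j])
                                    + c2 * r2 * (gbest[j] - x[j]))
                # Clamp so the entry stays a valid cut-point index after rounding.
                x[j] = min(max(x[j] + velocities[i][j], 0.0), float(len(cuts[j])))
            f = fitness(x)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(x), f
    return pbest


def decode(position: Sequence[float]) -> List[int]:
    return [round(v) for v in position]


# Toy run: the made-up fitness rewards keeping feature 0 and using few features.
X = [[0.1, 3.0, 7.0], [0.4, 3.0, 8.0], [0.9, 5.0, 7.0]]
cuts = [candidate_cut_points([row[j] for row in X]) for j in range(len(X[0]))]


def fitness(pos: List[float]) -> float:
    chosen = decode(pos)
    return float(chosen[0] > 0) - 0.1 * sum(k > 0 for k in chosen)


rng = random.Random(0)
start = initialization_part(cuts, n_particles=5, rng=rng)
start_fit = [fitness(p) for p in start]
best_positions = update_part(start, cuts, fitness, max_iters=20, rng=rng)
print([decode(p) for p in best_positions])
```

A determination part would then take the returned particles, keep the non-dominated ones, and score them on the test set.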



Abstract

The embodiments of the invention disclose a feature selection method and device for high-dimensional data, and a computer storage medium. The method may include: initializing the particles in a population according to the cut points of a data set to be processed; iteratively updating each initialized particle according to a preset stopping criterion and an update strategy to obtain updated output particles; and testing the non-dominated solutions among the output particles on a test set to determine the test-optimal solution particle. The test set is a part of the data set to be processed, the test-optimal solution is used to perform feature selection on the data set to be processed, and the test-optimal solution is the particle with the highest classification accuracy and the fewest features on the test set.

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of information processing, and in particular to a feature selection method and device for high-dimensional data and a computer storage medium.

Background

[0002] Machine learning is now applied on a large scale in big data scenarios such as DNA microarray analysis, image classification, and text classification. Such data is usually high-dimensional and contains irrelevant and redundant features, so processing the raw data directly degrades the efficiency and performance of machine learning algorithms. For this reason, when machine learning algorithms are used to process high-dimensional big data, the data is usually preprocessed first, such as a series of preprocessing operations such...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/00, G06K9/62
CPC: G06N3/006, G06F18/24
Inventor: 周宇, 亢俊皓, 郭海男
Owner: SHENZHEN UNIV