Feature selection method for high-dimensional big data analysis and computer storage medium

A feature selection method in the field of data analysis technology, applied to computer components, calculation, instruments, etc. It addresses problems such as the inability of existing methods to remove redundant features and the curse of dimensionality in high-dimensional big data analysis, and achieves effective removal of redundant and irrelevant features.

Inactive Publication Date: 2019-07-05
NANJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

[0004] Purpose of the invention: The technical problem to be solved by the present invention is to provide a feature selection method and a computer storage medium for high-dimensional big data analysis that overcome the inability of current methods to remove redundant features and more effectively address the curse of dimensionality in high-dimensional big data analysis.


Examples


Embodiment Construction

[0046] The method of the invention is a feature selection method for high-dimensional big data analysis. First, the correlation between each feature and the category is calculated, the features are sorted by that correlation, and irrelevant features are removed. Next, the remaining features are clustered according to the correlation between them. Finally, a representative feature is selected from each feature cluster, which removes redundant features. In the field of software defect prediction, software defect datasets are generally large in volume and high in dimension. The method of the present invention takes the software defect prediction data set pc4 as an example: the features are sorted according to their correlation with the category, irrelevant features are removed, the remaining features are clustered, and the representative features of the feature clusters are selected to remove redundant features and construct the final feature subset. ...
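The three steps described in [0046] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not name the correlation measure, the thresholds, or the clustering algorithm, so absolute Pearson correlation, the two threshold parameters, the greedy clustering rule, and the function names `abs_pearson` and `select_features` are all hypothetical choices made for the example.

```python
import numpy as np


def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant.

    Assumption: the patent excerpt does not name its correlation
    measure, so Pearson correlation stands in for it here.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float(abs((a * b).sum()) / denom)


def select_features(X, y, relevance_threshold=0.1, redundancy_threshold=0.8):
    """Return the column indices of X kept by the three-step sketch.

    Both thresholds and the greedy clustering rule are illustrative
    assumptions, not values or steps taken from the patent text.
    """
    n_features = X.shape[1]

    # Step 1: score each feature against the class label, sort by score
    # (highest first), and drop features below the relevance threshold.
    relevance = np.array([abs_pearson(X[:, j], y) for j in range(n_features)])
    ranked = [j for j in np.argsort(-relevance)
              if relevance[j] >= relevance_threshold]

    # Step 2: cluster the remaining features. A feature joins the first
    # cluster whose representative it is strongly correlated with;
    # otherwise it starts a new cluster and becomes its representative.
    clusters = []  # element 0 of each cluster is its representative
    for j in ranked:
        for cluster in clusters:
            if abs_pearson(X[:, j], X[:, cluster[0]]) >= redundancy_threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])

    # Step 3: keep one representative per cluster. Because features were
    # visited in relevance order, it is the most relevant cluster member.
    return [int(cluster[0]) for cluster in clusters]


# Toy data (made up for illustration; this is not the pc4 data set).
y = np.tile([0.0, 1.0, 0.0, 1.0], 25)        # class label
noise_a = np.tile([1.0, 1.0, -1.0, -1.0], 25)  # uncorrelated with y
noise_b = np.tile([1.0, -1.0, -1.0, 1.0], 25)  # uncorrelated with y and noise_a
X = np.column_stack([
    y,                  # column 0: strongly relevant
    y + 0.2 * noise_a,  # column 1: relevant but redundant with column 0
    noise_a,            # column 2: irrelevant
    y + noise_b,        # column 3: moderately relevant, not redundant
])
selected = select_features(X, y)
print(selected)
```

On this toy data the sketch keeps columns 0 and 3: column 2 is dropped as irrelevant in step 1, and column 1 falls into the same cluster as column 0 and is dropped as redundant. Choosing the most relevant member as the cluster representative is one reasonable design choice; the excerpt does not say how the representative is chosen.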



Abstract

The invention discloses a feature selection method for high-dimensional big data analysis and a computer storage medium. The method comprises the following steps: first, calculating the correlation between each feature and the category in order to remove irrelevant features; then calculating the degree of correlation between the remaining features, clustering the features according to that correlation, and selecting a representative feature from each feature cluster to remove redundant features. The method overcomes the inability of existing methods to remove redundant features and more effectively addresses the curse of dimensionality in high-dimensional big data analysis.

Description

Technical field

[0001] The present invention relates to a feature selection method and a computer storage medium, in particular to a feature selection method and a computer storage medium for high-dimensional big data analysis.

Background technique

[0002] With the rapid development of computer technology, information technologies are being applied in all walks of life, and the amount of data generated is growing exponentially. Big data refers to collections of data that cannot be captured, managed and processed by conventional software tools within a reasonable period of time.

[0003] As is well known, what matters about big data is no longer merely that the data is large; the most important task is analyzing it. Only through analysis can intelligent, in-depth and valuable information be obtained. More and more applications therefore involve big data, and the attributes of this data, including quantity, speed, diversity, etc., show the growing complexity ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23, G06F18/211, G06F18/24143
Inventors: 徐小龙, 陈稳
Owner: NANJING UNIV OF POSTS & TELECOMM