A method and system for quickly identifying feature combinations in high-dimensional data

A technology for identifying feature combinations in high-dimensional data, applied in the field of network information. It addresses problems such as the lack of evaluation criteria for the optimal feature subset, computationally infeasible search, and NP-hardness.

Active Publication Date: 2017-01-04
ACAD OF MATHEMATICS & SYSTEMS SCIENCE - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0009] (1) In a high-dimensional feature space (especially when the feature dimension is much larger than the number of samples), feature selection lacks a good evaluation criterion for the optimal feature subset, both in theory and in practice.
[0010] (2) Exhaustively searching for the optimal feature combination in an ultra-high-dimensional space has been proven to be NP-hard. Because the time cost of the search grows exponentially with the dimension of the feature space, applying traditional feature selection methods in high-dimensional feature spaces is computationally infeasible.
[0011] (3) Current methods tend to select too many features when the data is high-dimensional, cannot remove highly correlated and redundant features, and cannot discover nonlinear combination effects between features.
[0012] (4) In addition, existing methods separate classification from feature selection and fail to optimize the two simultaneously.
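To make the exponential growth in point (2) concrete, the following minimal sketch counts how many candidate feature subsets an exhaustive search would have to evaluate. The function name `count_subsets` and its `max_size` parameter are illustrative assumptions and do not come from the patent.

```python
from math import comb
from typing import Optional


def count_subsets(n_features: int, max_size: Optional[int] = None) -> int:
    """Count the non-empty feature subsets with at most `max_size` features.

    Hypothetical helper for illustration only; with max_size=None every
    subset size is allowed, which gives 2**n_features - 1 candidates.
    """
    if max_size is None:
        max_size = n_features
    return sum(comb(n_features, k) for k in range(1, max_size + 1))


# Growth of the search space as the feature dimension increases.
for d in (10, 20, 40):
    print(d, count_subsets(d))
```

Even when the subset size is capped, the count grows quickly: with 1000 features there are already 500500 subsets of size one or two.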



Examples


Embodiment Construction

[0047] The present invention provides a system for rapidly identifying feature combinations in high-dimensional data, as shown in figure 1. The system consists of four modules:

[0048] The data preprocessing module analyzes the original data, preprocesses and groups it, and constructs the training and verification data sets.

[0049] The model building module constructs an optimization model for feature combination recognition.

[0050] The model calibration module calibrates the optimization model for feature combination recognition and determines the model parameters and prediction thresholds.

[0051] The recognition module inputs the features with predictive ability into the optimization model for feature combination recognition to obtain the optimal feature combination, that is, the combination with the fewest features that achieves the maximum classification accuracy and therefore the optimal separation of samples in the control group and the experimental group.
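As a rough illustration of two of the steps listed above, the sketch below groups samples into training and verification sets and then picks a prediction threshold on scored samples. The source does not describe how either step is implemented, so everything here is an assumption: the function names `stratified_split` and `calibrate_threshold`, the 30% verification fraction, and the rule of choosing the cut-off that maximizes accuracy.

```python
import random


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Split sample indices into training and verification sets.

    Assumed scheme: each group (e.g. control = 0, experimental = 1) is
    shuffled separately so that both groups appear in both sets.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for group in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == group]
        rng.shuffle(idx)
        n_valid = max(1, round(len(idx) * validation_fraction))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


def calibrate_threshold(scores, labels):
    """Return the cut-off with the highest accuracy (predict 1 if score >= cut-off).

    Candidate cut-offs are the smallest score and the midpoints between
    consecutive distinct scores; this is an illustrative choice only.
    """
    ordered = sorted(set(scores))
    candidates = [ordered[0]] + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def accuracy(t):
        hits = sum((s >= t) == (y == 1) for s, y in zip(scores, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)


labels = [0] * 10 + [1] * 10
train_idx, valid_idx = stratified_split(labels)
threshold = calibrate_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

Splitting per group keeps both the control and the experimental group represented in the verification set, which matters when the sample count is small relative to the feature dimension.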

[0052] Below we fo...


Abstract

The invention discloses a method and a system for quickly identifying feature combinations in high-dimensional data. The method and system minimize the classification error measured by leave-one-out cross-validation while also minimizing the number of selected features, so that feature combinations in high-dimensional data can be identified quickly through modeling. The method can be used for statistical analysis of high-dimensional data and has wide application prospects in fields such as data mining, machine learning, artificial intelligence, and biomedicine.
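The trade-off described in the abstract can be sketched as a penalized objective: leave-one-out error plus a small cost per selected feature. This is not the patent's optimization model, whose details are not given here. The nearest-centroid classifier, the penalty weight `lam`, the brute-force search over small subsets, and the toy data are all assumptions chosen to keep the example self-contained.

```python
from itertools import combinations


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, subset):
    """Leave-one-out error rate using only the feature columns in `subset`."""
    Xs = [[row[j] for j in subset] for row in X]
    mistakes = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, Xs[i]) != y[i]:
            mistakes += 1
    return mistakes / len(Xs)


def best_subset(X, y, lam=0.01, max_size=2):
    """Minimize loo_error + lam * (number of features) by brute force.

    Brute force is only feasible for a handful of features; it is used
    here purely to illustrate the objective, not as a practical method.
    """
    n_features = len(X[0])
    candidates = (
        s for k in range(1, max_size + 1)
        for s in combinations(range(n_features), k)
    )
    return min(candidates, key=lambda s: (loo_error(X, y, s) + lam * len(s), s))


# Toy data (invented): feature 0 separates the groups, features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 4.0], [0.1, 3.0, 2.0],
     [1.0, 4.0, 3.0], [1.2, 2.0, 1.5], [0.9, 5.5, 3.5]]
y = [0, 0, 0, 1, 1, 1]
print(best_subset(X, y))
```

The penalty term is what favors the smallest subset among those with equal leave-one-out error; a larger `lam` trades accuracy for fewer features.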

Description

Technical field

[0001] The invention belongs to the technical field of network information, and relates to a method and system for quickly identifying feature combinations in high-dimensional data.

Background technique

[0002] The advent of the era of big data calls for research on data modeling and analysis. For example, health diagnosis based on big-data biomarkers is an important research hotspot with broad application prospects. Scientists in many disciplines increasingly rely on computational methods and mathematical modeling as auxiliary research tools to help analyze massive scientific research data and to explore hidden laws in high-dimensional data spaces that are difficult for human intuition to grasp. This has given rise to a series of new interdisciplinary research directions, such as numerical computing, data mining, bioinformatics, computational finance, computational chemistry, and theoretical research on...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F19/24, G06K9/62
Inventor: 王勇
Owner: ACAD OF MATHEMATICS & SYSTEMS SCIENCE - CHINESE ACAD OF SCI