Binary classification oriented factor screening method based on boosted regression trees

A screening method based on regression tree technology, classified under complex mathematical operations. It addresses problems of existing methods, such as failure in some cases, difficulty in obtaining clustering conclusions, and strict requirements for multivariate normality and homogeneity of variance, and it achieves improved stability, strong operability, and a wide range of application.

Active Publication Date: 2018-01-19
ANHUI NORMAL UNIV

AI Technical Summary

Problems solved by technology

However, these methods have certain limitations. Principal component analysis requires that the cumulative contribution rate of the first few extracted principal components reach a high level, and the extracted principal components are hard to name clearly; in addition, when the factor loadings of a principal component have mixed positive and negative signs, the meaning of the comprehensive evaluation function is unclear. Cluster analysis places high demands on the multivariate normality and variance homogeneity of the variables, and clustering conclusions are difficult to obtain when the sample size is large. Factor analysis has specific requirements on the amount of data and its composition; it also uses the least squares method when calculating factor scores, which may fail in some cases. Discriminant analysis is not suitable when there is multicollinearity among the factors. Methods based on fuzzy mathematics involve a degree of subjectivity in determining the index weight vector.
The common disadvantage of existing methods is that they cannot provide quantitative factor screening methods suitable for various data types without losing the original factor information.

Examples

Embodiment Construction

[0020] In this embodiment of the present invention, the boosted regression trees algorithm is implemented with the widely used gbm package on the R software platform (https://www.r-project.org/). The method is explained in detail using, as an example, data on the grass felt layer (a diagnostic layer in Chinese Soil Taxonomy) in the Qilian Mountains (point data, serving as the target variable) together with environmental factor data (areal raster data, serving as the predictors).

[0021] Referring to Figure 1, the embodiment of the present invention is a binary classification-oriented factor screening method based on the boosted regression trees algorithm. The specific steps are as follows:

[0022] 1. Collect the target variable and the predictors for the binary classification of the grass felt layer, and establish the target variable-predictor data set.

[0023] The grass felt layer data (target variable) in this example comes from the National Natural Science Foundation of China key pr...

Abstract

The invention discloses a binary classification-oriented factor screening method based on boosted regression trees. The method comprises the following steps: (1) collecting data and establishing a target variable-predictor data set; (2) modeling with boosted regression trees on the basis of the target variable and all factors, and calculating and ranking factor importance; (3) carrying out correlation analysis on all factors, analyzing the Pearson correlation matrix, and screening the factors; (4) establishing a new boosted regression trees model on the basis of the target variable and the retained factors, calculating the predictive deviation, calculating and ranking factor importance, and removing the factor with the lowest importance, repeating until the number of retained factors is less than or equal to 2; and (5) comparing the predictive deviation of the boosted regression trees models from step (4), and taking the factors used by the model with the smallest predictive deviation as the optimal factor combination. The method establishes a quantitative factor selection system, gives reliable results, and is widely applicable.
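Steps (3) to (5) can be sketched roughly as follows. This is only an illustrative outline in Python, not the gbm-based implementation used in the embodiment. Several details are assumptions that the abstract does not state: the correlation threshold of 0.8, the rule that the less important factor of a highly correlated pair is dropped, and the choice to fit a model at every factor count down to 2. The names `correlation_screen`, `backward_eliminate`, and the `fit` callback (which stands in for fitting a boosted regression trees model and returning its predictive deviation and factor importances) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_screen(
    columns: Dict[str, Sequence[float]],
    importance: Dict[str, float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the source
) -> List[str]:
    """Step (3), under an assumed rule: visit factors from most to least
    important and keep one only if it is not highly correlated with any
    factor already kept."""
    kept: List[str] = []
    for name in sorted(columns, key=lambda k: importance[k], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical stand-in for "fit a boosted regression trees model":
# takes the factor names, returns (predictive deviation, importance per factor).
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(factors: List[str], fit: FitFn) -> Tuple[List[str], float]:
    """Steps (4) and (5): refit, record the predictive deviation, drop the
    least important factor, stop once at most 2 factors remain, and return
    the factor set whose model had the smallest deviation."""
    current = list(factors)
    best_set: List[str] = list(current)
    best_dev = math.inf
    while True:
        deviation, importance = fit(current)
        if deviation < best_dev:
            best_set, best_dev = list(current), deviation
        if len(current) <= 2:
            break
        weakest = min(current, key=lambda k: importance[k])
        current.remove(weakest)
    return best_set, best_dev
```

In practice the `fit` callback would wrap whatever boosted regression trees implementation is in use, for example a cross-validated gbm fit in R or a comparable gradient boosting library, and return its deviation and relative importances.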

Description

Technical field

[0001] The present invention relates to the technical field of factor screening, specifically a binary classification-oriented factor screening method based on the boosted regression trees algorithm, applicable to many fields such as agriculture, environment, ecology, hydrology, medical geography (such as epidemiology), disaster early warning and forecasting, and meteorology (such as weather forecasting).

Background technique

[0002] Factor screening is the first problem to be solved when studying binary classification target variables in many fields such as agriculture, environment, ecology, hydrology, medical geography (such as epidemiology), disaster early warning and forecasting, and meteorology (such as weather forecasting). Previous studies mostly used the correlation coefficient method and the stepwise regression analysis method. The correlation coefficient method conducts correlation analysis on all factors and eliminates factors with high correlation. However, the sele...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/18
Inventor: 支俊俊
Owner: ANHUI NORMAL UNIV