Total correlation feature selection method and device based on Shapley value and hypothesis testing

A fully relevant feature selection method and device based on Shapley values and hypothesis testing, applied in the technical field of feature selection. It addresses the problems that feature relevance cannot be evaluated effectively and that relevant features cannot be identified adaptively, and it improves the interpretability of the feature set and the reliability of predictions.

Pending Publication Date: 2022-03-01
WUHAN UNIV
1 Cites, 2 Cited by
AI Technical Summary

Problems solved by technology

[0003] The main purpose of the present invention is to provide a fully relevant feature selection method and device based on Shapley values and hypothesis testing, aimed at solving the problems that feature relevance cannot be effectively evaluated and that all relevant features cannot be adaptively identified.


Examples

Embodiment Construction

[0090] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0091] In the first aspect, the embodiment of the present invention provides a fully relevant feature selection method based on Shapley values and hypothesis testing.

[0092] Refer to Figure 1, which is a schematic flow chart of the fully relevant feature selection method based on Shapley values and hypothesis testing according to the embodiment of the present invention.

[0093] As shown in Figure 1, the fully relevant feature selection method based on Shapley values and hypothesis testing includes:

[0094] Step 1: Relevance assessment;

[0095] The input of step 1 is a data set consisting of N samples, where the feature vector of the nth sample is $x^{(n)} = (x_1, \dots, x_M)$. There are M candidate features in total, which together form the feature set.

[0096] Use the Shapley value to quantify ...
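The source text is cut off here, so the patent's exact formulation is not available. As a rough illustration of what a Shapley value measures, the sketch below computes exact Shapley values using the standard definition (a weighted average of a feature's marginal contribution over all coalitions of the other features). The function `shapley_values` and the toy value function `toy_value` are hypothetical names introduced for this example, not taken from the patent, and exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `players` is a list of feature names; `value` maps a frozenset of
    feature names to a number. Cost grows exponentially with len(players).
    """
    m = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(m):
            # Standard Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical toy value function (an assumption for illustration only):
# features "a" and "b" each add 1.0 to the score, "c" adds nothing.
def toy_value(coalition):
    return float(len(coalition & {"a", "b"}))


phi = shapley_values(["a", "b", "c"], toy_value)
for name in sorted(phi):
    print(name, round(phi[name], 6))
```

In this toy game the uninformative feature "c" receives zero attribution, and the attributions sum to the value of the full feature set, which is the efficiency property of Shapley values.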


Abstract

The invention discloses a fully relevant feature selection method and device based on Shapley values and hypothesis testing. The method applies to feature sets in supervised tasks. A feature selection model is designed to solve the full-relevance problem: the model first uses a Shapley attribution algorithm to compute the local importance of each feature, then uses random features to construct an adaptive threshold, and then uses the importance and the threshold to evaluate the relevance of each feature. For the selection strategy, a dual hypothesis test is designed: a local hypothesis test quickly eliminates irrelevant features, and a global hypothesis test then reduces the risk of deleting relevant features by mistake. Finally, all features relevant to the problem domain are obtained, which improves the interpretability of the feature set and enhances prediction reliability.
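The abstract does not spell out how the threshold or the tests are computed, so the following is only a simplified sketch of the general idea, not the patent's dual local/global procedure. It assumes per-sample importances are already available (for example, absolute Shapley attributions), takes the per-sample maximum over the random features as the adaptive threshold, and applies a one-sided binomial test with success probability 0.5 to the number of samples on which a feature beats that threshold. The names `binom_tail` and `classify_features`, the 0.5 null probability, the significance level `alpha`, and all numbers in the example data are assumptions made for illustration.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def classify_features(importance, random_importance, alpha=0.05):
    """Label each feature by comparing it with random-feature importances.

    importance: dict mapping feature name -> list of per-sample importances.
    random_importance: list of lists, one per random feature, same length.
    """
    n = len(random_importance[0])
    # Adaptive threshold: per sample, the largest importance that any
    # random (uninformative) feature reaches.
    threshold = [max(col[s] for col in random_importance) for s in range(n)]
    decisions = {}
    for name, scores in importance.items():
        hits = sum(1 for s in range(n) if scores[s] > threshold[s])
        if binom_tail(hits, n) < alpha:
            decisions[name] = "relevant"      # beats the threshold too often for chance
        elif binom_tail(n - hits, n) < alpha:
            decisions[name] = "irrelevant"    # loses to the threshold too often for chance
        else:
            decisions[name] = "undecided"
    return decisions


# Made-up importances for 10 samples (illustrative values only).
random_importance = [[0.1] * 10, [0.2, 0.1] * 5]
importance = {
    "x1": [0.9] * 10,       # always above the threshold
    "x2": [0.0] * 10,       # never above the threshold
    "x3": [0.5, 0.0] * 5,   # above the threshold on half of the samples
}
decisions = classify_features(importance, random_importance)
print(decisions)
```

Deriving the threshold from random features means the cut-off adapts to the data and the model rather than being a fixed constant, which is the property the abstract highlights.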

Description

Technical field

[0001] The present invention relates to the technical field of feature selection, in particular to a fully relevant feature selection method and device based on Shapley values and hypothesis testing.

Background technique

[0002] Feature selection is one of the central problems in feature engineering. Its task is to select, from the original feature set, a subset of features relevant to the problem domain, with the aim of improving the interpretability and prediction performance of the feature set. Solving this problem is crucial in scenarios centered on feature data. Traditional feature selection research mainly addresses the minimal-optimal problem, that is, selecting the smallest feature subset with the best classification performance. According to the criterion used to evaluate feature subsets, methods fall into two types: filter and wrapper. The filter method specifically refers to...


Application Information

IPC(8): G06K9/62
CPC: G06F18/211, G06F18/214
Inventor: 陈丹, 殷丁泽, 汤云波, 李小俚, 熊明福
Owner: WUHAN UNIV