Feature selection method and application based on feature identification degree and independence

A feature selection method and identification technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that can solve problems such as large time overhead and overfitting on small-sample data sets.

Inactive Publication Date: 2016-09-14

SHAANXI NORMAL UNIV

3 Cites, 10 Cited by

Problems solved by technology

[0005] The Wrapper method relies on the learning process: feature subsets are generated according to how well a classification model built on each candidate subset performs on the validation set. It generally selects a smaller feature subset with better performance than the Filter method, but the classification model must be trained many times, which takes a lot of time and is prone to overfitting on small-sample data sets.

Examples

Embodiment 1

[0066] In this embodiment, the feature selection method based on feature identification degree and independence is implemented by the following steps:

[0067] (1) Randomly generate the first class of data $D_1$ following a normal distribution, denoted $D_1 = \{X_1; X_2; \ldots; X_{10}\} \in \mathbb{R}^{10 \times 50}$, and the second class of data $D_2$ following a normal distribution, denoted $D_2 = \{X_{11}; X_{12}; \ldots; X_{20}\} \in \mathbb{R}^{10 \times 50}$. Data sets $D_1$ and $D_2$ each contain 10 samples, and each sample has 50 features. Merge $D_1$ and $D_2$ into a data set $D = \{X_1; X_2; \ldots; X_{20}\} \in \mathbb{R}^{20 \times 50}$, which contains 20 samples distributed over 2 classes, each sample having 50 features. Then use the bootstrap method to divide the data set into a training set and a test set.
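Step (1) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the source does not give the means or variances of the two normal distributions, so the `shift` parameter, the seeds, and the function names are assumptions. Using the samples not drawn by the bootstrap as the test set is also an assumed convention.

```python
import random

def make_dataset(n_per_class=10, n_features=50, shift=1.0, seed=0):
    """Build D by stacking two classes of normally distributed samples."""
    rng = random.Random(seed)
    # Class 0 is standard normal; class 1 has its mean moved by `shift` (assumed value).
    d1 = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_per_class)]
    d2 = [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(n_per_class)]
    X = d1 + d2
    y = [0] * n_per_class + [1] * n_per_class
    return X, y

def bootstrap_split(n_samples, seed=0):
    """Draw n indices with replacement for training; never-drawn indices form the test set."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train_idx)
    test_idx = [i for i in range(n_samples) if i not in drawn]
    return train_idx, test_idx

X, y = make_dataset()
train_idx, test_idx = bootstrap_split(len(X))
```

Because the training indices are drawn with replacement, some samples appear more than once in the training set and the test set size varies from one draw to the next.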

[0068] (2) Calculate the identification degree of each feature

[0069] (2.1) Use the Wilcoxon rank sum test method to calculate the weight $w_i$ of each feature in the data set $D$, specifical...
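The source text is truncated before it says how the test result becomes a weight. The sketch below computes a two-sided Wilcoxon rank-sum p-value per feature using the normal approximation (without tie correction) and, purely as an assumption, sets the weight to 1 minus the p-value. The function names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_p_value(a, b):
    """Two-sided rank-sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / std
    return math.erfc(abs(z) / math.sqrt(2.0))

def wilcoxon_weights(X, y):
    """Assumed weighting: w_i = 1 - p_i, so better-separated features score higher."""
    weights = []
    for i in range(len(X[0])):
        a = [row[i] for row, label in zip(X, y) if label == 0]
        b = [row[i] for row, label in zip(X, y) if label == 1]
        weights.append(1.0 - rank_sum_p_value(a, b))
    return weights

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [7.0, 2.0], [8.0, 6.0], [9.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
w = wilcoxon_weights(X, y)
```

With only a few samples per class the normal approximation is coarse; an exact test may be preferable for data as small as the 10-versus-10 example in this embodiment.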

Embodiment 2

[0088] In step (2) of this embodiment, the weight $w_i$ of each feature in the data set $D$ can also be calculated by the D-Score method. D-Score is a feature weight calculation method based on intra-class and inter-class distances. The specific calculation formula is as follows:

[0089] $D_i = \sum_{j'=1}^{c} ( \bar{x}_i^{(j')} - \ldots$
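The formula in [0089] is truncated in the source, so its exact form cannot be confirmed here. As a rough illustration only, the sketch below assumes a common F-score-style ratio: the sum over classes of the squared distance between the class mean and the overall mean, divided by the sum of the unbiased within-class variances. The patent's actual D-Score definition may differ, and `d_score` is a hypothetical name.

```python
def d_score(X, y):
    """Assumed form: between-class scatter divided by within-class scatter, per feature."""
    classes = sorted(set(y))
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        overall_mean = sum(col) / len(col)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            class_mean = sum(vals) / len(vals)
            between += (class_mean - overall_mean) ** 2
            # Unbiased within-class variance; needs at least 2 samples per class.
            within += sum((v - class_mean) ** 2 for v in vals) / (len(vals) - 1)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [7.0, 2.0], [8.0, 6.0], [9.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
scores = d_score(X, y)
```

Under this assumed form, a feature whose class means lie far apart relative to the spread inside each class receives a large score.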

Embodiment 3

[0093] In step (2) of this embodiment, the weight $w_i$ of each feature in the data set $D$ can also be calculated by a method based on mutual information. Mutual information is used to evaluate the correlation between two features or between a feature and the class labels. The calculation formula is as follows:

[0094] $I(f_i, Y) = H(Y) - H(Y \mid f_i)$

[0095] Here, $Y$ is the class label vector of the data set; $I(f_i, Y)$ is the mutual information between feature $f_i$ and the class label vector $Y$, that is, the weight of feature $f_i$; $H(Y)$ is the information entropy of $Y$; and $H(Y \mid f_i)$ is the conditional entropy of $Y$ given the value of feature $f_i$.

[0096] Continuous features need to be discretized in advance.

[0097] Other steps are the same as in Embodiment 1.
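The mutual information weight of this embodiment, I(f_i, Y) = H(Y) - H(Y | f_i), can be sketched as below. The source only says that continuous features must be discretized, so the equal-width binning, the `n_bins` parameter, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def discretize(values, n_bins=3):
    """Equal-width binning (an assumed discretization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(feature, y, n_bins=3):
    """I(f, Y) = H(Y) - H(Y | f) for a discretized feature."""
    bins = discretize(feature, n_bins)
    n = len(y)
    h_cond = 0.0
    for b in set(bins):
        subset = [label for fb, label in zip(bins, y) if fb == b]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(y) - h_cond

y = [0, 0, 0, 1, 1, 1]
mi_informative = mutual_information([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], y, n_bins=2)
mi_uninformative = mutual_information([5.0, 1.0, 4.0, 2.0, 6.0, 3.0], y, n_bins=2)
```

The number of bins affects the estimate: too few bins hide structure, while too many bins on a small sample inflate the mutual information.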

Abstract

The invention relates to a feature selection method based on feature identification degree and independence, and to its application. The method comprises the following steps: measuring the inter-class discrimination ability of each feature by its identification degree and the correlation between features by feature independence; calculating the importance of each feature from these two measures and sorting the features in descending order of importance; and selecting the top k features to form a feature subset with high class-discrimination performance. When applied to tumor gene expression profile data, the selected subsets of differentially expressed genes achieve good running time and classification performance. The method is simple to compute, has low time complexity and high selection efficiency, and provides a useful reference for the clinical diagnosis and assessment of tumors and other diseases.
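The abstract does not spell out how independence is computed or how it is combined with the identification degree. The sketch below is one hypothetical instantiation: independence of a feature is taken as 1 minus its largest absolute Pearson correlation with any higher-weighted feature, and importance is the product of weight and independence. Both choices, and the names `pearson` and `select_top_k`, are assumptions rather than the patent's definitions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_top_k(columns, weights, k):
    """Rank features by weight * independence (assumed) and return the top-k indices."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    importance = {}
    for pos, i in enumerate(order):
        stronger = order[:pos]  # features with a higher weight than feature i
        if stronger:
            indep = 1.0 - max(abs(pearson(columns[i], columns[j])) for j in stronger)
        else:
            indep = 1.0  # the top-weighted feature is treated as fully independent
        importance[i] = weights[i] * indep
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]

# Feature 1 duplicates feature 0 (scaled), feature 2 is uncorrelated with both.
columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, -1.0, -1.0, 1.0]]
weights = [0.9, 0.8, 0.5]
selected = select_top_k(columns, weights, k=2)
```

In this toy example the redundant feature 1 is ranked last despite its high weight, which is the behavior the abstract attributes to combining discrimination ability with independence.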

Description

Technical field

[0001] The present invention relates to a feature selection method based on feature identification degree and independence and its application to tumor gene expression profile data. Specifically, it belongs to the technical field of preprocessing for data mining and analysis of tumor gene expression profiles in bioinformatics, namely gene selection methods for gene expression profiles.

Background technique

[0002] The emergence of high-dimensional data containing many redundant and irrelevant features poses great challenges to machine learning and data mining algorithms. While keeping the classification ability of the data unchanged, feature selection picks from the original feature set those features that are highly related to the class, are as uncorrelated with one another as possible, and together contain most or all of the classification information of the original feature set, and uses them to form a feature subset. The classification model of the feature subset is more accurate and understandable,...

Application Information

IPC(8): G06F19/24, G06F19/20

CPC: G16B25/00, G16B40/00

Inventors: 谢娟英 (Xie Juanying), 王明钊 (Wang Mingzhao)

Owner: SHAANXI NORMAL UNIV
