
Metagenomic Feature Selection Method Based on Variable Importance Score and Neyman-Pearson Test

A feature selection method and metagenomic technology, applied in the field of metagenomic feature selection based on variable importance scores and the Neyman-Pearson test. It addresses the problem that random forest importance scores are easily affected by noise, and achieves good stability and classification performance, good robustness, and convenience for subsequent medical validation experiments.

Active Publication Date: 2021-02-02
XI AN JIAOTONG UNIV
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

However, metagenomic abundance data contains a large number of irrelevant features, and the feature importance score of random forest is easily affected by noise.

Method used

Symmetric uncertainty is used to pre-filter features by their correlation with the phenotype; bootstrap resampling with random forest variable importance scores selects the top k features in each iteration; and a Neyman-Pearson test on the resulting feature occurrence counts sets the selection threshold.


Examples


Embodiment Construction

[0025] The present invention is further described below with reference to the accompanying drawings and embodiments. This embodiment uses the liver cirrhosis (Cirrhosis of Liver, CIR) metagenomic data set. The cirrhosis data set was collected from the intestinal tract and comprises 232 samples: 118 cirrhosis cases and 114 controls, covering 532 operational taxonomic units (OTUs).

[0026] Referring to Figure 1, a metagenomic feature selection method based on variable importance scores and the Neyman-Pearson test includes the following steps:

[0027] Step A: For the cirrhosis operational taxonomic unit (OTU) data set, compute the correlation between each microbial feature and the sample phenotype using symmetric uncertainty, rank the features by this score, keep the top 200 features as a feature subset, and generate a sub-dataset of the original data set for subsequent analysis.
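Step A can be sketched as follows. Symmetric uncertainty is SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]. Because OTU abundances are continuous and SU needs discrete variables, this sketch assumes equal-width binning with a hypothetical `n_bins=5`; the patent text shown here does not state the discretization scheme. All function names are illustrative, not the authors' code.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy, in bits, of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def discretize(column: np.ndarray, n_bins: int = 5) -> np.ndarray:
    """Equal-width binning of one abundance column (assumed scheme)."""
    inner_edges = np.linspace(column.min(), column.max(), n_bins + 1)[1:-1]
    return np.digitize(column, inner_edges)


def symmetric_uncertainty(x: np.ndarray, y: np.ndarray) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0.0:
        return 0.0
    # Joint entropy H(X, Y) from the counts of distinct (x, y) pairs.
    _, joint_counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p_xy = joint_counts / joint_counts.sum()
    h_xy = float(-(p_xy * np.log2(p_xy)).sum())
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


def su_filter(X: np.ndarray, y: np.ndarray, top: int = 200, n_bins: int = 5):
    """Rank features by SU with the phenotype; return (top indices, all scores)."""
    _, y_codes = np.unique(y, return_inverse=True)  # phenotype labels -> 0..c-1
    scores = np.array([
        symmetric_uncertainty(discretize(X[:, j], n_bins), y_codes)
        for j in range(X.shape[1])
    ])
    order = np.argsort(-scores, kind="stable")[:top]
    return order, scores
```

With `X` as a samples-by-OTUs abundance matrix and `y` as the phenotype labels, `sub_idx, su_scores = su_filter(X, y, top=200)` followed by `X_sub = X[:, sub_idx]` yields the sub-dataset used in Step B.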

[0028] Step B: First, draw a bootstrap sample (sampling with replacement) from the sub-dataset, then calculate the variable import...
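Following the abstract's description of Step B (resample with replacement, keep the top k features by variable importance, iterate, then count how often each feature was kept), the loop can be sketched as below. The importance function is pluggable: `rf_importance` uses scikit-learn's impurity-based `RandomForestClassifier.feature_importances_`, which is an assumption, since the exact variable importance score and forest settings (here a hypothetical `n_estimators=200`) are not given in this excerpt. scikit-learn is imported lazily, so the counting loop itself needs only NumPy.

```python
from typing import Callable

import numpy as np

# An importance function maps (X, y) to one non-negative score per column.
ImportanceFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


def rf_importance(X: np.ndarray, y: np.ndarray, seed: int = 0) -> np.ndarray:
    """Impurity-based random forest importance (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier  # imported lazily

    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    return forest.fit(X, y).feature_importances_


def bootstrap_topk_counts(
    X: np.ndarray,
    y: np.ndarray,
    k: int,
    n_iter: int,
    importance_fn: ImportanceFn = rf_importance,
    seed: int = 0,
) -> np.ndarray:
    """Count how often each feature ranks in the top k across bootstrap rounds."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features, dtype=int)
    for _ in range(n_iter):
        rows = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
        scores = np.asarray(importance_fn(X[rows], y[rows]))
        top_k = np.argsort(-scores, kind="stable")[:k]
        counts[top_k] += 1
    return counts
```

The counts always sum to `n_iter * k`; features that are consistently important across resamples accumulate high counts, which damps the noise sensitivity of a single random forest ranking.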


Abstract

The invention discloses a metagenomic feature selection method based on variable importance scores and the Neyman-Pearson test. 1. For an input metagenomic operational taxonomic unit (OTU) data set, symmetric uncertainty is used to compute the correlation between each microbial feature and the sample phenotype; features are filtered by this correlation score to generate a sub-dataset. 2. The sub-dataset is resampled with replacement, and the variable importance score is used to select the top k features; these steps are iterated, and after the iterations finish, the number of times each feature was selected is counted. 3. The Neyman-Pearson test is used to calculate a threshold under the given parameters; features whose occurrence counts exceed the threshold form the candidate feature set, and the k features with the most occurrences form the target feature subset. The metagenomic features extracted by the invention significantly improve classification performance with higher stability, and the generated candidate feature set facilitates subsequent medical experiments on the metagenome.
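The abstract does not spell out the null model behind the Neyman-Pearson threshold in step 3, so the sketch below assumes one: an irrelevant feature lands in the top k of m features with probability p0 = k / m in each of T independent iterations, so its count C follows Binomial(T, p0). Against any alternative p1 > p0 the likelihood ratio increases with C, so the most powerful level-alpha test rejects for large counts; the threshold tau is the smallest count with P(C > tau) <= alpha. The names `np_threshold`, `np_select` and the default `alpha=0.05` are hypothetical.

```python
from math import comb


def binomial_sf(c: int, n: int, p: float) -> float:
    """P(C > c) for C ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(c + 1, n + 1))


def np_threshold(n_iter: int, k: int, n_features: int, alpha: float = 0.05) -> int:
    """Smallest count tau with P(C > tau | H0) <= alpha, assuming p0 = k / n_features."""
    p0 = k / n_features
    for tau in range(n_iter + 1):
        if binomial_sf(tau, n_iter, p0) <= alpha:
            return tau
    return n_iter  # unreachable for alpha >= 0: P(C > n_iter) = 0


def np_select(counts, n_iter: int, k: int, alpha: float = 0.05):
    """Return (threshold, candidate features, top-k target subset)."""
    counts = list(counts)
    tau = np_threshold(n_iter, k, len(counts), alpha)
    # Candidate set: features selected more often than chance allows.
    candidates = [i for i, c in enumerate(counts) if c > tau]
    # Target subset: the k most frequently selected candidates (ties by index).
    target = sorted(candidates, key=lambda i: (-counts[i], i))[:k]
    return tau, candidates, target
```

If fewer than k features clear the threshold, this sketch returns a target subset smaller than k; the patent text shown here does not say how that case is handled.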

Description

Technical field

[0001] The invention belongs to the field of metagenomic abundance data analysis, and in particular relates to a method for selecting metagenomic features based on variable importance scores and the Neyman-Pearson test.

Background

[0002] The core problem of metagenomic abundance data analysis is to identify, from a large number of microorganisms, the small number that potentially affect the phenotype; this problem arises widely in medicine, biology, environmental science, food science and other disciplines. A metagenomic data set contains a wide variety of microorganisms, and studying the effect of every microorganism on the phenotype directly takes a great deal of work. It is therefore necessary to remove noise from the original data set and retain the microorganisms that may affect the phenotype, that is, to perform feature selection on metagenomic data.

[0003] In order to effectively ide...


Application Information

Patent type & authority: Patents (China)
IPC (8th edition): G16B40/00, G16B25/00
CPC: G16B25/00, G16B40/00
Inventors: 宋永红 (Song Yonghong), 丁志文 (Ding Zhiwen), 张元林 (Zhang Yuanlin)
Owner: XI AN JIAOTONG UNIV