
Feature selection method for SNP (Single Nucleotide Polymorphism) data

A feature selection method for SNP data, applied in electrical digital data processing, special data processing applications, instruments, and related fields, that addresses problems such as low algorithm efficiency.

Inactive Publication Date: 2015-02-25
SHANGHAI UNIV

AI Technical Summary

Problems solved by technology

Filter-style feature selection ignores the relationships between features, while Wrapper-style feature selection is computationally inefficient.



Examples


Embodiment 1

[0061] Referring to Figure 1, this feature selection method for SNP data comprises the following specific steps:

[0062] (A) Perform data preprocessing;

[0063] (B) Use the redesigned Relief algorithm to eliminate irrelevant SNP features;

[0064] (C) Use the improved SVM-RFE algorithm to rank the SNP features by criticality;

[0065] (D) Screen the key SNPs using cross-validation.
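Step (B) can be illustrated with the basic Relief weighting scheme. The sketch below is the textbook algorithm, not the redesigned variant the patent refers to, whose details are not given in this excerpt. The names relief_weights and keep_relevant, the zero threshold, and the toy data are hypothetical. To keep the result deterministic, the sketch visits every sample once instead of sampling at random.

```python
def relief_weights(X, y):
    """Basic two-class Relief weights (textbook version, not the patent's variant).

    X: list of samples, each a list of genotype codes in {0, 1, 2}.
    y: list of class labels (+1 or -1); each class needs at least two samples.
    Returns one weight per feature; a larger weight suggests higher relevance.
    """
    n, d = len(X), len(X[0])
    span = 2.0  # genotype codes range over 0..2; used to scale differences
    w = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two genotype vectors
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # nearest neighbour from the same class (hit) and the other class (miss)
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            w[f] += (diff_miss - diff_hit) / (span * n)
    return w


def keep_relevant(weights, threshold=0.0):
    """Indices of features whose weight exceeds the (assumed) threshold."""
    return [f for f, wf in enumerate(weights) if wf > threshold]


# Toy data: SNP 0 separates the two groups, SNP 1 does not.
X_demo = [[0, 1], [0, 2], [2, 1], [2, 2]]
y_demo = [+1, +1, -1, -1]
print(relief_weights(X_demo, y_demo))                 # [1.0, -0.5]
print(keep_relevant(relief_weights(X_demo, y_demo)))  # [0]
```

On the toy data the separating SNP receives a positive weight and the uninformative one a negative weight, so only the first survives the filter.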

Embodiment 2

[0067] Referring to Figures 1 to 5, the present invention is a feature selection method for SNP data. Taking the SNP data of hypertensive patients as an example, the specific steps are as follows:

[0068] (1) Perform data preprocessing, as shown in Figure 2. The specific steps are as follows:

[0069] a) Class labeling: SNP data consists of two-class samples, that is, there are only two types of samples: the case (disease) group and the normal (healthy) group. The case group is given the class label {+1}, and the normal group the class label {-1};

[0070] b) Data coding: The SNP typing results detected by the gene chip take four forms: wild-type homozygous AA, mutant heterozygous AB, mutant homozygous BB, and the typing-failure mark NC. AA is coded as 0, AB as 1, and BB as 2; NC is removed during data cleaning and is not coded;

[0071] c) Data cleaning: NC is noise data in SNP data analysis. W...
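The labeling and coding rules in a) and b) can be sketched as follows. The function names and the group strings "case" and "normal" are hypothetical. Because the cleaning rule in c) is truncated in this excerpt, the sketch only marks NC calls as None and leaves the actual cleaning policy open.

```python
# Genotype coding from paragraph [0070]; NC is deliberately absent.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2}

# Class labels from paragraph [0069]; the group names are assumed strings.
GROUP_LABELS = {"case": +1, "normal": -1}


def encode_label(group):
    """Map a group name to its class label (+1 for case, -1 for normal)."""
    return GROUP_LABELS[group]


def encode_genotypes(calls):
    """Encode one sample's genotype calls.

    NC (typing failure) is not coded; it is marked as None so that a later
    cleaning step can handle it. Unknown calls raise KeyError.
    """
    encoded = []
    for call in calls:
        if call == "NC":
            encoded.append(None)
        else:
            encoded.append(GENOTYPE_CODES[call])
    return encoded


print(encode_label("case"), encode_genotypes(["AA", "AB", "BB", "NC"]))
```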


Abstract

The invention discloses a feature selection method for SNP (Single Nucleotide Polymorphism) data, comprising the following steps: first, preprocess the data; second, remove irrelevant SNP features using a newly designed Relief algorithm; third, rank the SNP features by criticality using an improved SVM-RFE algorithm; and finally, screen the ranked key SNPs using cross-validation. The method combines the advantages of Filter and Wrapper feature selection and uses a secondary division method in the machine learning process, which addresses the high-dimensional, small-sample problem in SNP feature selection and the problem of SNP pathogenic combination patterns, and improves analysis efficiency and accuracy.
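The ranking step can be illustrated with the standard SVM-RFE loop: train a linear SVM, drop the feature with the smallest squared weight, and repeat. This is a minimal sketch of the standard procedure, not the improved variant the abstract mentions, whose details are not given here. The trainer is a simple batch subgradient descent on the regularised hinge loss; its hyperparameters, the function names, and the toy data are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) by batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []  # filled from least important to most important
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # standard SVM-RFE criterion: remove the feature with the smallest w_j^2
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Toy data: SNP 0 is monomorphic, SNP 1 contrasts 0 vs 2, SNP 2 contrasts 0 vs 1.
X_demo = [[0, 0, 0], [0, 0, 0], [0, 2, 1], [0, 2, 1]]
y_demo = [+1, +1, -1, -1]
print(svm_rfe_ranking(X_demo, y_demo))
```

On the toy data the loop ranks SNP 1 first and the monomorphic SNP 0 last. A cross-validation step would then choose how many of the top-ranked SNPs to keep.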

Description

Technical field

[0001] The invention relates to feature selection for massive data with high-dimensional, small-sample characteristics, and in particular to a feature selection method designed for SNP data. It belongs to the field of computer application technology.

Background technique

[0002] Feature selection for high-dimensional, small-sample data is one of the research hotspots in the field of data mining. Such data typically has a huge volume, a high feature dimension, and a small number of samples. Commonly used data analysis methods are prone to sample bias, and their efficiency and accuracy on high-dimensional, small-sample data are low.

[0003] SNP is the abbreviation of single nucleotide polymorphism, which refers to the DNA sequence polymorphism caused by the variation of a single nucleotide at the genomic level. SNP is the most abundant ge...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F19/22
Inventor: 吴悦, 吴红霞, 雷州, 刘宗田, 张文宾
Owner: SHANGHAI UNIV