SNP selection method based on improved fuzzy clustering algorithm

A fuzzy clustering algorithm technology, applied to the field of SNP selection. It addresses the problems that existing methods do not differentiate between SNPs, do not further mine the associations among SNPs, and leave redundant SNPs that hinder subsequent research, and it achieves an excellent classification effect.

Active Publication Date: 2019-05-24
JIANGSU UNIV +1

AI Technical Summary

Problems solved by technology

However, among so many SNPs, not all play a decisive role in differences of biological phenotype. In other words, many SNPs are redundant. If these redundant SNPs are not screened out or eliminated, they lead to the curse of dimensionality and cause great trouble for subsequent research.
[0003] The problem of SNP selection can also be regarded, to some extent, as a subproblem of feature selection. However, existing selection methods either do not treat SNPs that have different effects on disease outcomes differently, or do not further mine the significant SNP-to-SNP associations within a local range.

Examples


Embodiment 1

[0054] An SNP selection method based on an improved fuzzy clustering algorithm. For SNP data, the method considers the impact of each single SNP on the classification result while also taking into account the interrelationships between SNPs in local regions, so that the dimensionality of the data is reduced and the information within the SNPs is fully mined. The method specifically includes the following steps:

[0055] Step 1. Obtain the SNP data set. Generally, the original data are expressed as genotypes, such as AT, GC, AA, ..., CG.

[0056] Step 2. Preprocess the SNP data to obtain the preprocessed data. Preprocessing mainly consists of handling missing values and recoding the data, as follows:

[0057] 1): First, for each SNP, count the missing genotype values. If the missing ratio is higher than the set threshold (here set to 20%), the corresponding SNP is deleted from the data set.

[0058] 2): For the de...
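
The missing-value filter in sub-step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `None` marker for a missing genotype, the dictionary layout, and the names `missing_rate` and `drop_high_missing` are assumptions made for the example. Only the 20% threshold and the "higher than the threshold is deleted" rule come from the text.

```python
MISSING = None  # hypothetical marker for a missing genotype call


def missing_rate(calls):
    """Fraction of samples whose genotype call is missing for one SNP."""
    return sum(1 for c in calls if c is MISSING) / len(calls)


def drop_high_missing(snps, threshold=0.20):
    """Keep SNPs whose missing rate is not higher than the threshold.

    snps maps a SNP identifier to its list of genotype calls,
    one call per sample.
    """
    return {name: calls for name, calls in snps.items()
            if missing_rate(calls) <= threshold}


# Toy data with made-up identifiers.
snps = {
    "snp_a": ["AT", "AA", None, "AT", "TT"],  # 1 of 5 missing: kept
    "snp_b": ["GC", None, None, "GG", "CC"],  # 2 of 5 missing: dropped
}
kept = drop_high_missing(snps)
```

A SNP whose missing rate equals the threshold exactly is kept here, because the text deletes only SNPs whose ratio is higher than the threshold.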

Embodiment 2

[0102] Experiments verify that the SNP subset constructed by this method has a better classification effect than those of other selection methods, and that the method can be applied to the selection of SNP data. Verification uses clinical data (a part of the data is selected and recorded as G1000). The experiment is implemented as shown in Figure 3 and specifically includes the following parts:

[0103] The data preprocessing unit 2 is used for preliminary screening of the data based on hypothesis testing. The threshold for the minor allele frequency (MAF) is set to 0.05; the results show that the MAF values of the dataset G1000 are all greater than 0, so no SNP needs to be deleted. The threshold for the p-value of the chi-square test is set to 0.03; the results show that 228 SNPs do not meet this condition and are deleted.
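
The two screening quantities named in [0103] can be sketched as below. Several things here are assumptions, since the excerpt does not state them: genotypes are taken to be recoded as 0/1/2 allele counts, the chi-square test is taken to be a Pearson test of genotype against case/control status on a 3x2 table, and a SNP is taken to pass when its MAF is at least 0.05 and its p-value is below 0.03. Only the two threshold values come from the text. The function names are hypothetical.

```python
import math

MAF_THRESHOLD = 0.05      # value given in the text
P_VALUE_THRESHOLD = 0.03  # value given in the text


def minor_allele_frequency(dosages):
    """MAF for one SNP from 0/1/2 allele counts (assumed coding)."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1.0 - freq)


def chi_square_p_3x2(table):
    """Pearson chi-square p-value for a 3x2 genotype-by-status table.

    table[g] = [count in cases, count in controls] for genotype g.
    Assumes every row total and column total is positive. A 3x2 table
    has 2 degrees of freedom, for which the chi-square survival
    function is exactly exp(-x / 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return math.exp(-stat / 2.0)


# Toy data for a single SNP.
dosages = [0, 1, 1, 2, 0, 0, 1, 0]
maf = minor_allele_frequency(dosages)

table = [[10, 30], [20, 20], [30, 10]]
p_value = chi_square_p_3x2(table)

# Assumed pass rule; the excerpt does not state the direction explicitly.
keep = maf >= MAF_THRESHOLD and p_value < P_VALUE_THRESHOLD
```

The closed form for the p-value avoids a statistics library, but it holds only for 2 degrees of freedom. A different table shape would need a general chi-square survival function.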

[0104] The clustering algorithm effectiveness evaluation verification unit 3 is used to evaluate the clustering method proposed by the present invention, specifica...

Abstract

The invention discloses an SNP (single nucleotide polymorphism) selection method based on an improved fuzzy clustering algorithm. The method includes the following steps: obtaining an SNP data set; preprocessing the obtained SNP data, including cleaning and re-encoding the data; performing preliminary screening on the preprocessed data based on a hypothesis test; calculating the importance degree of each SNP in the preliminarily screened data; clustering the SNPs by using the improved fuzzy clustering algorithm; and further screening each cluster obtained from the clustering according to the principle of symmetrical imbalance to construct SNP subsets. For SNP data, the method considers the influence of each single SNP on the classification result as well as the correlation between SNPs in local regions, and fully exploits the information within the SNPs while reducing the dimensionality of the data. The SNP subsets constructed by this method have a better classification effect than those of other selection methods, and the method can be applied to the selection of SNP data.
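
For orientation, the clustering step builds on fuzzy clustering, in which each item belongs to every cluster with a degree between 0 and 1. The sketch below is the standard fuzzy c-means algorithm, not the improved variant of the invention, whose modifications are not described in this excerpt. The feature vectors, the initial centers, the fuzzifier `m = 2.0`, and the function name are all assumptions for illustration.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means; returns (centers, memberships).

    points: list of equal-length float lists; centers: initial centers.
    memberships[i][k] is the degree to which point i belongs to
    cluster k, as computed in the final iteration.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive larger membership.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # Point coincides with a center: assign it fully there.
                hit = dists.index(0.0)
                row = [1.0 if k == hit else 0.0
                       for k in range(len(centers))]
            else:
                row = [1.0 / sum((dk / dj) ** exponent for dj in dists)
                       for dk in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        centers = new_centers
    return centers, memberships


# Toy one-dimensional data with two well-separated groups.
points = [[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]]
centers, memberships = fuzzy_c_means(points, [[1.0], [4.0]])
```

Each membership row sums to 1, so a SNP lying between two clusters would receive partial membership in both rather than a hard assignment.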

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to the construction of informative SNP subsets and to an SNP selection method based on an improved fuzzy clustering algorithm.

Background technique

[0002] A genetic disease is a disease caused by a change in genetic material. There are many types of such diseases and their incidence is high. So far, more than 3,000 genetic diseases have been discovered, and they have had a great impact on society. In recent years, with the great progress of DNA microarray technology, researchers can obtain tens of thousands of gene expression profiles, which allows diseases to be understood in depth at the genetic level and provides powerful tool support for the study of disease pathogenesis. With the advancement of genome-wide association studies (GWAS), research on diseases such as schizophrenia and rheumatoid arthritis has made good progress. GWAS is a method of examining all or most of the genes ...

Application Information

IPC(8): G16B40/00
Inventor 周从华, 张波, 张付全, 张婷, 蒋跃明
Owner JIANGSU UNIV