Method for screening disease phenotype related mutation sites and application thereof

A technology relating to mutation sites and phenotypes, applied in the field of bioinformatics. It addresses the problems of genotype combinations with no observed values, sparse data distribution, and the curse of dimensionality, and has the effects of avoiding the influence of allele frequency, achieving high efficiency, and reducing the total sample size required.

Pending Publication Date: 2021-04-30
BEIJING USCI MEDICAL DEVICES CO LTD

AI Technical Summary

Problems solved by technology

The logistic regression model has limitations when a small sample is used to estimate many parameters (such as single nucleotide polymorphisms, SNPs). With each additional SNP site, the required sample size increases exponentially. When genotype frequencies are taken into account, even with a large sample the data are still sparsely distributed in high-dimensional space, and it is very likely that some genotype combinations will have no observations at all, a problem known as the "curse of dimensionality".
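The sparsity argument can be illustrated with a short Python sketch. It assumes three genotypes per biallelic SNP and, purely for illustration, draws genotypes uniformly at random; real genotype frequencies are uneven, which would leave even more combinations unobserved. The function names and the sample size of 400 (matching the 200 + 200 design of Embodiment 1) are illustrative choices, not part of the patent text.

```python
import random

# Assumption: each biallelic SNP has three possible genotypes.
GENOTYPES = ("AA", "Aa", "aa")

def n_combinations(k):
    """Number of joint genotype combinations across k SNP sites."""
    return len(GENOTYPES) ** k

def observed_combinations(n_samples, k, seed=0):
    """Count the distinct joint genotypes seen in n_samples simulated samples.

    Genotypes are drawn uniformly at random (an illustrative assumption).
    """
    rng = random.Random(seed)
    seen = {
        tuple(rng.choice(GENOTYPES) for _ in range(k))
        for _ in range(n_samples)
    }
    return len(seen)

for k in (2, 5, 10):
    total = n_combinations(k)
    seen = observed_combinations(400, k)
    print(f"{k:>2} SNPs: {seen} of {total} combinations observed")
```

Because the number of combinations is 3^k, a fixed sample can cover at most as many combinations as it has samples, so the observed fraction collapses quickly as k grows.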

Examples

Embodiment 1

[0054] In this embodiment, the method for screening disease phenotype-related mutation sites provided by the present invention is used to mine type 2 diabetes-related SNP sites, as follows:

[0055] 1. Select 200 patients with type 2 diabetes and 200 normal people as controls for microarray sequencing, with a total of 743,722 loci.

[0056] 2. Association rule analysis: According to the genotype at each mutation site, the mutation data and sample phenotype data are converted into binary variables, and the association rule analysis parameters are set: minimum support min_sup = 20% and minimum confidence min_conf = 80%.

[0057] 3. Apply FP-Growth algorithm to generate frequent itemsets.

[0058] 4. After obtaining the frequent itemsets, find the association rules whose confidence is greater than min_conf; these are the strong association rules.

[0059] 5. Select the effective strong association rules from the strong association rules, that is, select all the rules whose ac...
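Steps 2 to 4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the SNP names (rs1, rs2), the eight toy samples, and all function names are hypothetical, and a simple level-wise enumeration stands in for FP-Growth (both are exact methods, so they yield the same frequent itemsets; FP-Growth is simply more efficient on large data). Support is the fraction of samples containing an itemset, and the confidence of a rule X -> phenotype is support(X plus phenotype) / support(X). The selection criterion of step 5 is truncated in the source and is not implemented here.

```python
from itertools import combinations

def encode(genotypes, phenotype):
    """Turn one sample into a set of binary items such as 'rs1=AG'."""
    items = {f"{snp}={gt}" for snp, gt in genotypes.items()}
    items.add(f"phenotype={phenotype}")
    return frozenset(items)

def frequent_itemsets(transactions, min_sup, max_len=3):
    """Level-wise search for itemsets with support >= min_sup.

    Stand-in for FP-Growth: same result, simpler and slower.
    """
    n = len(transactions)
    support = {}
    current = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while current and size <= max_len:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        kept = {c: k / n for c, k in counts.items() if k / n >= min_sup}
        support.update(kept)
        # Next candidates: unions of two kept itemsets that grow by one item.
        current = {a | b for a, b in combinations(kept, 2)
                   if len(a | b) == size + 1}
        size += 1
    return support

def strong_rules(support, target, min_conf):
    """Return rules X -> target whose confidence exceeds min_conf."""
    rules = []
    for itemset, sup in support.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            conf = sup / support[antecedent]
            if conf > min_conf:
                rules.append((tuple(sorted(antecedent)), sup, conf))
    return sorted(rules)

# Hypothetical toy data: 4 cases and 4 controls, two SNP sites.
samples = [
    ({"rs1": "AG", "rs2": "CC"}, "T2D"),
    ({"rs1": "AG", "rs2": "CT"}, "T2D"),
    ({"rs1": "AG", "rs2": "CC"}, "T2D"),
    ({"rs1": "AG", "rs2": "CT"}, "T2D"),
    ({"rs1": "AA", "rs2": "CC"}, "control"),
    ({"rs1": "AA", "rs2": "CT"}, "control"),
    ({"rs1": "AA", "rs2": "CC"}, "control"),
    ({"rs1": "AA", "rs2": "CT"}, "control"),
]
transactions = [encode(g, p) for g, p in samples]
support = frequent_itemsets(transactions, min_sup=0.20)
rules = strong_rules(support, "phenotype=T2D", min_conf=0.80)
for antecedent, sup, conf in rules:
    print(antecedent, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

In the toy data the genotype rs1=AG occurs only in cases, so every strong rule includes it, while rs2 on its own reaches a confidence of only 0.5 and is filtered out.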

Embodiment 2

[0071] In this example, 100 cases of hypertension, 126 cases of obesity, 410 cases of lung cancer, 360 cases of breast cancer, 134 cases of colorectal cancer and 200 normal samples were selected for GWAS analysis and association rule analysis. The sites with p-value < 10e-7 in the GWAS analysis results and the top 20 sites with p-value < 0.005 in the association rule analysis were selected, and the proportion of the detected sites that are recorded as phenotype-related in the GWAS Catalog database was compared between the two methods. The results are shown in Table 5:

[0072] Table 5

[0073]

[0074] It can be seen that, for each phenotype, a higher proportion of the SNP sites obtained by association rule analysis are recorded in the GWAS Catalog database as phenotype-related than of the sites obtained by GWAS analysis.
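The comparison metric described here, the share of detected sites that also appear among the catalogued phenotype-related sites, can be sketched in Python as below. The function name and the site identifiers are hypothetical placeholders; they are not the contents of Table 5, which is not reproduced in this text.

```python
def catalog_proportion(detected_sites, catalog_sites):
    """Fraction of detected sites that are also listed in the catalog."""
    detected = set(detected_sites)
    if not detected:
        return 0.0
    return len(detected & set(catalog_sites)) / len(detected)

# Hypothetical placeholder identifiers, for illustration only.
catalog = {"site_2", "site_4", "site_9"}
detected_by_method = ["site_1", "site_2", "site_3", "site_4"]
print(catalog_proportion(detected_by_method, catalog))
```

Computing this value once per method and per phenotype gives the pair of proportions that the embodiment compares.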

Abstract

The invention relates to the technical field of bioinformatics, in particular to a method for screening disease phenotype-related mutation sites and an application thereof. The method comprises the following steps: obtaining sequencing data for a plurality of disease samples and normal samples, and performing variant detection; performing association rule mining with the sample phenotypes and the mutation types of the detected mutation sites as the total item set, to obtain mutation sites strongly associated with the phenotype of the disease samples; and performing modeling analysis on the mutation sites obtained through association rule mining and screening, to obtain the mutation sites related to the disease phenotype. Alleles are converted into categorical variables for association rule mining, and modeling analysis is then carried out on the sites strongly associated with the disease phenotype, so that the total number of samples to be analyzed can be effectively reduced and the influence of allele frequency on the analysis result is avoided. Screening and analysis of disease phenotype-related sites can be completed with mutation genotype information alone.

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to a method for screening mutation sites related to disease phenotypes and its application.

Background art

[0002] A central goal of human genetics is to identify genetic risk factors for common complex diseases such as schizophrenia and type 2 diabetes and rare Mendelian disorders such as cystic fibrosis and sickle cell anemia. While understanding the complexities of human health and disease is one of the keys to current research, it is not the only focus of research in human genetics, and the field of pharmacology is an equally important one. The goal of pharmacogenomics is to identify DNA sequence variations associated with drug metabolism, efficacy, and side effects. For example, warfarin is a blood-thinning drug that helps prevent blood clots in patients. While using warfarin, it is necessary to strictly control the dosage of the drug formulated for each patient...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16H50/70, G16B20/20, G16B20/50
CPC: G16H50/70, G16B20/20, G16B20/50
Inventor: 张静波, 姬晓勇, 徐冰, 单光宇, 伍启熹, 王建伟, 刘倩, 唐宇
Owner: BEIJING USCI MEDICAL DEVICES CO LTD