Genotype-phenotype association analysis method in multi-omics data based on small sample

A technology in omics data and association analysis, applied in the field of bioinformatics. It addresses problems such as the difficulty of obtaining clinical data, the very large number of SNP features, and the inability of available data to meet the requirements of multi-omics data fusion methods, and achieves the effect of improved prediction accuracy.

Active Publication Date: 2022-04-26
NORTHWESTERN POLYTECHNICAL UNIV +1
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] Furthermore, using multi-omics methods to explore the relationship between genotype and phenotype requires that all omics data come from the same set of samples. Because the number of SNP features is huge, both types of multi-omics analysis methods need a large sample size to build their models; yet, because of patient-privacy protection and each institution's data-access requirements, clinical data are difficult to obtain.
Therefore, public clinical data cannot meet the requirements of multi-omics data fusion methods in either sample size or number of omics types.



Examples


specific example

[0057] 1. Data source and preprocessing

[0058] To verify the effectiveness of the method, the present invention uses two datasets from the GEO database (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/). GSE33356 is a lung adenocarcinoma study that pairs tumors with adjacent normal tissue harvested from the same patients: lung tumor and normal specimens from 84 non-smoking female patients with adenocarcinoma were analyzed with Affymetrix SNP 6.0 and Affymetrix U133 Plus 2.0 arrays. GSE114269 compares medullary breast cancer (MBC) with non-medullary basal-like breast cancer (non-MBC BLC) and has a sample size of 48. These two datasets were chosen mainly to show that the method of the present invention applies broadly to genotype-phenotype classification problems with small samples and many attributes.

[0059] The protein network data comes from ...


Abstract

A small-sample-based method for genotype-phenotype association analysis in multi-omics data is disclosed. It comprises the following steps: a weighted undirected gene association graph is generated from protein network data and gene expression values, and the graph is clustered with the SPICi clustering method to produce gene clusters; the gene clusters are screened with the group Lasso method; the SNP clusters corresponding to the retained gene clusters are obtained from eQTL data; each SNP cluster, its corresponding gene cluster and the phenotype are assembled into a three-layer network block, in which the SNP-gene relationship is modelled by sparse partial least squares regression and the gene-phenotype relationship by logistic regression; finally, the results of all blocks are averaged to obtain the final prediction. The invention solves the problem that, in a three-layer network with small samples, the number of features is so large that regression cannot be performed effectively; it improves prediction accuracy, yields results with clearer biological meaning, and takes tissue specificity into account.
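The per-block modelling in the abstract (sparse PLS from SNPs to genes, logistic regression from genes to phenotype, then averaging over blocks) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Assumed here: a one-component sparse PLS computed by soft-thresholded power iteration on the SNP-by-gene cross-covariance, with sparsity on the SNP side only and a threshold set relative to the largest loading; SNP-predicted gene values (rather than measured expression) feeding the logistic layer; a plain gradient-descent logistic fit; and all names and parameters (`ThreeLayerBlock`, `sparsity`, `ensemble_proba`). The upstream steps (graph construction, SPICi clustering, group Lasso screening, eQTL mapping) are assumed to have already produced the SNP and gene columns of each block; the data below are synthetic.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding, used to make the SNP loadings sparse."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


class ThreeLayerBlock:
    """Hypothetical class block: SNP cluster -> gene cluster -> phenotype."""

    def __init__(self, sparsity=0.5, lr=0.1, n_iter=500, l2=1e-3):
        self.sparsity = sparsity  # assumed: fraction of the max |loading| shrunk away
        self.lr, self.n_iter, self.l2 = lr, n_iter, l2

    def _fit_spls(self, X, G):
        # One-component sparse PLS: sparse SNP weights u, dense gene weights v.
        self.x_mean, self.g_mean = X.mean(axis=0), G.mean(axis=0)
        Xc, Gc = X - self.x_mean, G - self.g_mean
        M = Xc.T @ Gc                                    # SNP x gene cross-covariance
        v = np.linalg.svd(M, full_matrices=False)[2][0]  # start from top right singular vector
        for _ in range(100):
            a = M @ v
            u = soft_threshold(a, self.sparsity * np.max(np.abs(a)))
            u /= np.linalg.norm(u)
            v = M.T @ u
            v /= np.linalg.norm(v)
        self.u = u
        t = Xc @ u                                       # latent SNP score
        self.coef = (t @ Gc) / (t @ t)                   # regress every gene on the score

    def _predict_genes(self, X):
        t = (X - self.x_mean) @ self.u
        return self.g_mean + np.outer(t, self.coef)

    def fit(self, X, G, y):
        self._fit_spls(X, G)
        Z = self._predict_genes(X)                       # SNP-predicted gene values
        self.z_mean, self.z_std = Z.mean(axis=0), Z.std(axis=0) + 1e-8
        Z1 = np.column_stack([np.ones(len(y)), (Z - self.z_mean) / self.z_std])
        w = np.zeros(Z1.shape[1])
        for _ in range(self.n_iter):                     # L2-penalised logistic regression
            p = 1.0 / (1.0 + np.exp(-Z1 @ w))
            grad = Z1.T @ (p - y) / len(y)
            grad[1:] += self.l2 * w[1:]                  # intercept is not penalised
            w -= self.lr * grad
        self.w = w
        return self

    def predict_proba(self, X):
        Z = (self._predict_genes(X) - self.z_mean) / self.z_std
        return 1.0 / (1.0 + np.exp(-(self.w[0] + Z @ self.w[1:])))


def ensemble_proba(blocks, X_blocks):
    """Average the phenotype probabilities of all blocks (final prediction)."""
    return np.mean([b.predict_proba(X) for b, X in zip(blocks, X_blocks)], axis=0)


# Synthetic demo: two blocks, each gene cluster driven by one SNP.
rng = np.random.default_rng(0)
n = 60
snps = rng.binomial(2, 0.5, size=(n, 10)).astype(float)           # 0/1/2 genotype coding
genes_a = 2.0 * snps[:, [0]] + rng.normal(0, 0.5, size=(n, 3))    # cluster A <- SNP 0
genes_b = 2.0 * snps[:, [5]] + rng.normal(0, 0.5, size=(n, 4))    # cluster B <- SNP 5
y = (snps[:, 0] + snps[:, 5] >= 2).astype(float)                  # toy binary phenotype

block_a = ThreeLayerBlock().fit(snps[:, :5], genes_a, y)
block_b = ThreeLayerBlock().fit(snps[:, 5:], genes_b, y)
proba = ensemble_proba([block_a, block_b], [snps[:, :5], snps[:, 5:]])
pred = (proba >= 0.5).astype(int)
```

Feeding SNP-predicted gene values into the logistic layer keeps training and prediction consistent when a new sample has only genotypes; fitting the logistic layer on measured expression is an equally plausible reading of the abstract.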

Description

Technical field

[0001] The invention relates to the field of bioinformatics, in particular to a small-sample-based method for studying the association between genotype and phenotype in multi-omics data.

Background art

[0002] An important goal of current genetics is to establish a complete functional link between genotype and phenotype, the so-called genotype-phenotype map. Studying the relationship between genotype and phenotype makes the effects of genetic variation clearer. Genome-wide association studies (GWAS) of common genotypes and phenotypes are an effective way to reveal the link between an individual's genetic background and a specific disease or trait. Their principle is to identify variant sites across the whole genome and analyze the association between each variant site and the phenotype. Over the past decade or so, numerous genome-wide association studies have identified many genetic variants associated with complex diseases or other tra...
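In its simplest case-control form, the GWAS principle in [0002] (scan variant sites genome-wide and test each one for association with the phenotype) reduces to one statistical test per SNP. The sketch below uses an allelic chi-square test on a 2×2 table of allele counts. The 0/1/2 minor-allele coding, the choice of test and the function names (`allelic_chi2`, `gwas_scan`) are illustrative assumptions, not taken from the patent.

```python
import math


def allelic_chi2(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 df) for one biallelic SNP.

    Genotypes are assumed to be coded as minor-allele counts (0, 1 or 2).
    Returns (chi2 statistic, p-value).
    """
    table = []
    for group in (case_genotypes, control_genotypes):
        minor = sum(group)
        table.append([minor, 2 * len(group) - minor])   # [minor, major] allele counts
    total = table[0][0] + table[0][1] + table[1][0] + table[1][1]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            if expected > 0:                           # monomorphic SNP adds nothing
                chi2 += (row[j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def gwas_scan(genotypes, is_case):
    """Test every SNP (column) of a samples x SNPs genotype matrix."""
    results = []
    for j in range(len(genotypes[0])):
        cases = [row[j] for row, c in zip(genotypes, is_case) if c]
        controls = [row[j] for row, c in zip(genotypes, is_case) if not c]
        results.append((j, *allelic_chi2(cases, controls)))
    return results


chi2, p = allelic_chi2([2, 2, 1, 2], [0, 0, 1, 0])
# allele table [[7, 1], [1, 7]], every expected count is 4, so chi2 = 9.0 and p ≈ 0.0027

genotypes = [[2, 0], [2, 1], [1, 1], [2, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]
scan = gwas_scan(genotypes, is_case)  # SNP 0 is associated, SNP 1 is not
```

Because a genome-wide scan runs hundreds of thousands of such tests, a multiple-testing threshold is applied in practice; the conventional genome-wide significance level is 5 × 10^-8.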


Application Information

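Patent type & authority: Patents (China)
IPC (8): G16B20/20, G16B40/30, G16B40/20, G16B50/30
CPC: G16B20/20, G16B40/30, G16B40/20, G16B50/30
Inventors: Guo Xinpeng (郭新鹏), Song Yafei (宋亚飞), Liu Shuaichen (刘帅忱), Liu Shuhui (刘树慧), Wang Yifei (王艺菲), Shang Xuequn (尚学群)
Owner: NORTHWESTERN POLYTECHNICAL UNIV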