Multi-omics data association relationship discovery method based on sparse matching

A technology for discovering associations in omics data, applied in the field of bioinformatics. It addresses problems such as unused prior information, overly simple integrative research methods, and insufficient sample size.

Active Publication Date: 2018-09-07
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Existing models integrate multiple omics data sets to explore the potential relationships between them, which reduces the false positive rate caused by random error in any single omics data set. However, most of these models have shortcomings: the integration method is too simple, the data sources are not unified, the sample size is insufficient, and so on.
[0005] At the same time, most studies focus only on the omics data themselves. They rarely add other important prior information to the model, and some do not use existing, proven information at all, even though the reasonable use of prior information clearly improves model accuracy, robustness, and execution efficiency.



Examples


Embodiment 1

[0062] A multi-omics data association relationship discovery method based on sparse matching. As shown in Figure 1, the basic process includes: reading in data, data cleaning, data transformation, data reduction, calculating the feature similarity matrix, mining data correlations, and obtaining results.

[0063] The scheme of multi-omics data association discovery based on sparse matching is as follows:

[0064] 1. Preprocess the input data to improve its quality and make it better suited to this method.

[0065] Massive raw data contain many abnormal values and suffer from problems such as excessive dimensionality, which seriously reduce the execution efficiency of the model and may even bias its results. Data preprocessing is therefore particularly important. This method preprocesses the data with the following steps:

[0066] 1.1 Data cleaning:

[0067] For the missing values in the data, the cu...

Embodiment 2

[0120] A multi-omics data association discovery method based on sparse matching. Figure 2 shows a schematic diagram of the method, which includes:

[0121] Step 1: Acquire the gene expression data M1, the drug response data M2, the gene-gene association network W1, and the drug-drug association network W2.

[0122] Step 2: Preprocess the gene expression data M1 and the drug response data M2 separately.

[0123] Step 3: Use the gene expression data M1 and the drug response data M2 to calculate the gene-drug similarity measure matrix H.

[0124] Step 4: Use the gene-drug similarity measure matrix H, the gene-gene association network W1, and the drug-drug association network W2 as the model input to mine the association relationships between genes and drugs.
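
Step 3 can be illustrated with a minimal sketch. The text here does not say which similarity measure is used (the abstract only says a suitable measure is selected according to the data characteristics), so the Pearson correlation below is an assumption, as are the layout (rows of M1 are genes, rows of M2 are drugs, columns are the same samples) and the helper names pearson and similarity_matrix. The toy values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / norm


def similarity_matrix(m1, m2):
    """Build H, where H[i][j] is the correlation of gene i with drug j.

    Assumption: rows of m1 are genes, rows of m2 are drugs, and both
    share the same sample order along the columns.
    """
    return [[pearson(gene, drug) for drug in m2] for gene in m1]


# Toy data (made up): 2 genes and 2 drugs measured on the same 4 samples.
M1 = [[1.0, 2.0, 3.0, 4.0],
      [4.0, 3.0, 2.0, 1.0]]
M2 = [[2.0, 4.0, 6.0, 8.0],
      [1.0, 1.0, 2.0, 2.0]]
H = similarity_matrix(M1, M2)
```

H has one row per gene and one column per drug. In Step 4 it would be passed to the model together with W1 and W2; that model is not sketched here because its details are not given in this excerpt.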

[0125] Step 2 specifically includes:

[0126] Step 2.1: Clean the data. For the missing values in the data, the cubic spline interpolation method is used to interpolate them; for the outliers in the ...
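
The missing-value part of Step 2.1 can be sketched as follows. The text names cubic spline interpolation but this excerpt does not give the boundary conditions or the axis along which values are interpolated, so the sketch assumes a natural spline (zero second derivative at both ends), uses the position index within one data row as the x coordinate, and marks missing entries with None. The names natural_cubic_spline and fill_missing are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution; c[0] and c[n] stay 0.0 (natural boundary conditions).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x; points outside the range reuse
        # the first or last segment, which extrapolates.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t ** 2 + d[j] * t ** 3

    return evaluate


def fill_missing(values):
    """Replace None entries with spline estimates fitted to the observed entries."""
    known = [(float(i), v) for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        return list(values)  # too few observations to fit a spline
    spline = natural_cubic_spline([x for x, _ in known], [v for _, v in known])
    return [v if v is not None else spline(float(i)) for i, v in enumerate(values)]


# Toy row (made up) with one missing measurement.
row = [0.0, 1.0, None, 3.0, 4.0]
filled = fill_missing(row)
```

Missing values at the start or end of a row are extrapolated from the nearest spline segment, which can be unreliable; a real pipeline might handle those cases separately.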



Abstract

The invention discloses a multi-omics data association relationship discovery method based on sparse matching. The method includes: preprocessing the input data to improve data quality; selecting a suitable similarity measure according to the data characteristics and calculating a similarity matrix among the data features; and fusing prior information to mine potential association relationships among the data features on the basis of the similarity network among the features. The method makes full use of existing, proven prior information about omics data features, reduces the impact of noise on the results, reduces the uncertainty caused by data errors, and improves the accuracy and robustness of the results.

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to a method for discovering association relationships in multi-omics data based on sparse matching.

Background

[0002] Biological omics comprises genomics, transcriptomics, proteomics, and metabolomics. It aims to study human genes, ribonucleic acid, proteins, and metabolites and the interactions among them. By integrating and analyzing the internal correlations across all levels of the human body, it provides a more scientific and comprehensive way to explore the pathogenesis of human diseases.

[0003] With the development of science and technology, the emergence of high-throughput sequencing has greatly reduced the cost of sequencing, improved its performance, and made it possible to measure omics data at different levels for the same sample efficiently and comprehensively. The TCGA (The Cancer Genome Atlas) database...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/24
CPC: G16B40/00
Inventors: 蔡就伦, 蔡宏民
Owner: SOUTH CHINA UNIV OF TECH