Method for establishing prediction model of complex data

A complex-data predictive modeling technology, applied in the fields of electrical digital data processing, special data processing applications, hybridization, etc., that can solve problems such as distinctions

Inactive Publication Date: 2017-08-08
赵乐平

AI Technical Summary

Problems solved by technology

[0011] Although OOR is closely related to kernel machine methods, there are differences


Examples


Example 1

[0053] First Embodiment: Next, the method of the present invention will be described in detail by taking the process of constructing a prediction model of high-dimensional omics data from clinical translational research as an example.

[0054] 1. Method:

[0055] 1.1. Motivation

[0056] 1.1.1. Problem Statement: Take n objects (i = 1, 2, ..., n) in the database as samples. On the i-th object, the observed set of high-dimensional (here p-dimensional) sparse covariates is denoted X_i = (x_i1, x_i2, ..., x_ip). As is typical of HDOD, the number of covariates p is usually much larger than the sample size n. The outcome variable Y_i of the corresponding target is also observed on each i-th object; it can be binary, categorical, continuous, or truncated (that is, partially observed). The likelihood of all observed data can be written as

[0057]

[0058] Here, the summation in the above expression runs over the n objects ...

Example 2

[0166] Second Embodiment: Next, taking the construction of a disease prediction model of polymorphic and multi-allelic HLA genes as an example, the method of the present invention will be further described in detail.

[0167] 1. Method

[0168] 1.1. Motivation

[0169] The motivation is the analysis of covariate data generated by high-dimensional polymorphic genetic studies, specifically a case-control study of T1D and eight class II HLA genes (HLA*DRB1, *DRB3, *DRB4, *DRB5, *DQA1, *DQB1, *DPA1, *DPB1) (manuscript: Zhao et al 2015, to be submitted). Because of structural polymorphism, only one of the HLA*DRB3, *DRB4, and *DRB5 alleles appears on any single chromosome. HLA*DRB345 is therefore used below to represent the genotypes of all three genes. Each gene carries two alleles, and each allele represents a fully phased nucleotide sequence. When the j-th gene has m_j possible sequence variations, if a pair of alleles is in Hardy-Weinberg equilibrium (HWE,...


Abstract

The invention provides a method for establishing a prediction model of complex data, comprising the following steps: A. obtaining high-dimensional omics data (HDOD) and determining a group of data objects that are representative of the HDOD to serve as examples (exemplars); B. determining a similarity measure between each data object in the HDOD and each example, and building a similarity matrix of data objects against examples from these measures; C. using this similarity matrix, selecting the informative examples from among the examples by a penalized likelihood method; D. establishing a prediction model based on the selected examples. The method provides a natural quantification tool for discovering and verifying interactions among complex variables. Because prediction relies on similarity-based search, the model is suitable for large-scale databases.
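Steps A to D can be illustrated with a minimal sketch on synthetic data. This is not the patented implementation: the random choice of examples, the Gaussian similarity, the L1 penalty on a logistic likelihood, and every name and parameter value below (`similarity_matrix`, `fit_l1_logistic`, `gamma`, `lam`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for HDOD: n objects, p >> n covariates, binary outcome.
n, p, q = 60, 500, 12
y = np.repeat([0, 1], n // 2).astype(float)
X = rng.normal(size=(n, p))
X[y == 1, :20] += 3.0  # class 1 differs on the first 20 covariates

# Step A (assumption): a random subset of objects stands in for the
# representative examples; the source does not say how they are chosen.
exemplar_idx = rng.choice(n, size=q, replace=False)
exemplars = X[exemplar_idx]


def similarity_matrix(X, exemplars, gamma):
    """Step B: Gaussian similarity of every object (row) to every example (column)."""
    sq_dist = ((X[:, None, :] - exemplars[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def fit_l1_logistic(S, y, lam, n_iter=3000):
    """Step C: L1-penalized logistic likelihood, fitted by proximal gradient descent."""
    n_obj, n_ex = S.shape
    b0, beta = 0.0, np.zeros(n_ex)
    # Safe step size from a Lipschitz bound on the logistic-loss gradient.
    step = 4.0 * n_obj / (np.linalg.norm(S, 2) ** 2 + n_obj)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + S @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()                 # intercept is not penalized
        beta = beta - step * (S.T @ resid) / n_obj
        # Soft-thresholding sets uninformative coefficients exactly to zero.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return b0, beta


S = similarity_matrix(X, exemplars, gamma=1.0 / p)
b0, beta = fit_l1_logistic(S, y, lam=0.01)
selected = exemplar_idx[beta != 0]  # examples kept by the penalty


def predict_proba(X_new):
    """Step D: predict from similarity to the examples (zero weights drop out)."""
    S_new = similarity_matrix(X_new, exemplars, gamma=1.0 / p)
    return 1.0 / (1.0 + np.exp(-(b0 + S_new @ beta)))


accuracy = np.mean((predict_proba(X) > 0.5) == (y == 1))
print(f"selected {selected.size} of {q} examples; training accuracy {accuracy:.2f}")
```

In this sketch a new object is scored only through its similarity to the retained examples, which is the sense in which prediction reduces to a similarity search. Raising `lam` keeps fewer examples.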

Description

Technical field

[0001] The invention relates to a method for constructing a prediction model of complex data.

Background

[0002] The advent of next-generation sequencing technologies has enabled researchers to process large amounts of collected data (for example, enabling clinical researchers to process hundreds of biological samples collected from patients) and to analyze data such as genome-wide expression levels, methylation levels, or somatic mutations. Such data are called high-dimensional omics data (HDOD). Although the number of clinical samples available is usually limited, the bottleneck of clinical research has shifted from sample collection to data management and data analysis, because the number of observed variables per sample can reach thousands or millions. Using HDOD along with other clinical variables to build predictive models of specific clinical outcomes has been one of the many analytical goals of researchers in b...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/20, G06F19/24
CPC: G16B40/00, G16B25/00
Inventor: 赵乐平
Owner: 赵乐平