Systems and methods for outlier significance assessment

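A technique for assessing the significance of outliers, applied in the field of outlier identification. It addresses the problems that outlier analyses cannot be directly compared with one another, that this limits their use in meta-analysis, and that current meta-analysis methods can require a large amount of analysis.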

Pending Publication Date: 2019-07-09
ILLUMINA INC

AI Technical Summary

Problems solved by technology

Current outlier analyses are not directly comparable between different analyses or under different input parameters.
This limits the ability to integrate the results of the analyses in a meta-analysis.
In addition, current methods of meta-analysis may require extensive analysis and may not adjust results for analyses of different sizes.

Examples

Embodiment 1

[0117] Bootstrap resampling

[0118] To estimate statistical significance, permutation is standard practice when the sampling distribution is unknown [10]. However, permutation cannot be used here, because permuting sample labels has no effect on the COPA score: the score is obtained by normalizing a gene's values across samples, ranking them, and taking the specified percentile. Therefore, we employ a slightly different randomization strategy, which we refer to as bootstrap resampling.
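The point about permutation can be checked with a minimal sketch. It assumes the COPA score is the median/MAD-standardized value at a chosen percentile, as the abstract describes; the function name `copa_score`, the default 90th percentile, the linear interpolation rule, and the toy expression values are illustrative assumptions, not taken from the source.

```python
import random
import statistics


def copa_score(values, percentile=90.0):
    """Median/MAD-standardize one gene's values and return the given percentile."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be standardized")
    z = sorted((v - med) / mad for v in values)
    # Linear interpolation between the two closest ranks (an assumed convention).
    pos = (len(z) - 1) * percentile / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(z) - 1)
    return z[lo] + (z[hi] - z[lo]) * (pos - lo)


# Hypothetical expression values for one gene across ten samples.
expression = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 9.5, 10.2, 1.1]

# Permute which sample carries which value.
shuffled = expression[:]
random.Random(0).shuffle(shuffled)

# The score depends only on the set of values, so the permutation changes nothing.
assert copa_score(expression) == copa_score(shuffled)
```

Because the score is a function of the sorted values alone, any relabeling of samples yields the same statistic, so a permutation null would be degenerate.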

[0119] To perform a COPA randomization test, the gene expression values within a gene need to be randomized. However, the raw expression values are not necessarily independent and identically distributed. To address this, randomization is performed after normalization, which puts the values on the same scale. We can therefore assume that the normalized values are, overall, independent and identically distributed within a data type and platform (e.g., RNA-Seq on Illumina HiSeq). We rando...
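The paragraph above is cut off in this excerpt, so the exact resampling procedure is not shown. The following is only a sketch of one plausible reading: standardized values from all genes are pooled, pseudo-genes are drawn from the pool with replacement, and the observed score is compared with the resulting null scores. All function names, the pooling step, the 90th percentile, and the simulated data are assumptions.

```python
import random
import statistics


def standardize(values):
    """Median/MAD standardization of one gene's values."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [(v - med) / mad for v in values]


def percentile(sorted_values, q):
    """q-th percentile of pre-sorted data, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap_null(genes, n_rounds=1000, q=90.0, seed=0):
    """Null COPA scores from pseudo-genes drawn from the pooled standardized values."""
    rng = random.Random(seed)
    pool = [z for values in genes for z in standardize(values)]
    n_samples = len(genes[0])
    return [
        percentile(sorted(rng.choices(pool, k=n_samples)), q)
        for _ in range(n_rounds)
    ]


def empirical_p(observed, null_scores):
    """One-sided p-value with the usual +1 correction so it is never zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Simulated data: 50 genes x 20 samples, with an outlier subgroup planted in gene 0.
rng = random.Random(1)
genes = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
genes[0][:4] = [8.0, 9.0, 10.0, 11.0]

null_scores = bootstrap_null(genes)
observed = percentile(sorted(standardize(genes[0])), 90.0)
p_value = empirical_p(observed, null_scores)
```

With `n_rounds` randomizations the smallest p-value this estimator can report is `1 / (n_rounds + 1)`, which is the resolution limit that motivates the next embodiment.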

Embodiment 2

[0123] Bootstrap method + generalized Pareto distribution (function method)

[0124] It is well known that the resolution of resampled p-values is limited by the number of randomizations performed. Achieving higher resolution usually requires intensive computation, so the approach is usually unsuitable for large-scale applications. Knijnenburg et al. [11] proposed using extreme value theory to estimate small permutation p-values, i.e., approximating the tail of the distribution with the generalized Pareto distribution. This can be extended to resampling p-values.

[0125] Thus, bootstrap resampling with a fitted function can be applied to the TCGA BRCA dataset to approximate the tail of the null distribution of COPA scores produced by 100 randomizations of bootstrap resampling. This greatly reduces the number of randomizations required to generate p-values. Based on the distribution of resampled COPA scores, the right (or symmetric left) tail (corresponding to 0.1% of the resampled CO...
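The tail approximation can be sketched as follows. This is not the fitting procedure of Knijnenburg et al. [11] or of the source, which is truncated here: for brevity it swaps in a method-of-moments fit of the generalized Pareto distribution rather than maximum likelihood, and it omits any goodness-of-fit check. The function names, the 5% tail fraction, and the simulated exponential null are assumptions chosen so that the true tail is known.

```python
import math
import random
import statistics


def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape, scale) for a generalized Pareto tail."""
    mean = statistics.fmean(exceedances)
    var = statistics.variance(exceedances)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale


def gpd_survival(y, shape, scale):
    """P(Y > y) for a generalized Pareto variable with location 0."""
    if y <= 0:
        return 1.0
    if abs(shape) < 1e-9:
        return math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    if base <= 0:  # beyond the finite upper end point when shape < 0
        return 0.0
    return base ** (-1.0 / shape)


def tail_p_value(observed, null_scores, tail_fraction=0.05):
    """Empirical p-value in the bulk, GPD-extrapolated p-value in the right tail."""
    ordered = sorted(null_scores)
    n_tail = max(2, int(len(ordered) * tail_fraction))
    threshold = ordered[-n_tail - 1]
    if observed <= threshold:
        return sum(1 for s in ordered if s >= observed) / len(ordered)
    exceedances = [s - threshold for s in ordered[-n_tail:]]
    shape, scale = fit_gpd_moments(exceedances)
    # P(score >= observed) = P(score > threshold) * P(excess > observed - threshold)
    return (n_tail / len(ordered)) * gpd_survival(observed - threshold, shape, scale)


# Simulated null scores with a known tail: for Exp(1), P(X > 8) = exp(-8).
rng = random.Random(0)
null_scores = [rng.expovariate(1.0) for _ in range(5000)]
p = tail_p_value(8.0, null_scores)
```

The design choice is that only the largest few percent of null scores are modeled, so a modest number of randomizations can still yield p-values well below one over the number of null scores.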

Embodiment 3

[0129] ssCOPA, the binomial resampling method

[0130] While a p-value for a given observed COPA score can be obtained empirically by randomization testing, this becomes computationally intensive as the test group size increases. Furthermore, highly significant p-values require an infeasible number of bootstrap resampling tests to reach the resolution needed to estimate them. To address this issue, we introduce an exact solution that allows direct computation of p-values.
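The excerpt does not give the exact calculation, so the following is only a sketch of how a binomial argument can yield a direct p-value for a percentile statistic. It assumes the null values are independent and identically distributed and uses the order-statistic identity: the k-th smallest of n values is at least x exactly when at least n - k + 1 of the values are at least x. The function names, the nearest-rank percentile rule, and the way the single-value tail probability is supplied are assumptions.

```python
import math


def binomial_tail(n, k_min, p):
    """P(Binomial(n, p) >= k_min), summed exactly."""
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j)
        for j in range(k_min, n + 1)
    )


def exact_percentile_p(observed, n_samples, single_value_tail, q=90.0):
    """P(the q-th percentile of n i.i.d. null values >= observed).

    single_value_tail(x) must return P(Z >= x) for one standardized null value,
    for example the fraction of pooled standardized values that are >= x.
    """
    rank = math.ceil(q * n_samples / 100.0)  # nearest-rank order statistic
    needed = n_samples - rank + 1            # values that must reach `observed`
    return binomial_tail(n_samples, needed, single_value_tail(observed))


# Hypothetical case: 20 samples, and a single null value reaches the observed
# score with probability 0.1, so at least 3 of the 20 values must do so.
p_exact = exact_percentile_p(5.0, 20, lambda x: 0.1)
```

No resampling loop is involved, so the cost does not grow with the p-value resolution required.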

Abstract

Systems and methods are provided for identifying genes with outlier expression across multiple samples, including: at least one processor; and at least one non-transitory computer readable medium containing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations including: receiving gene expression data of a plurality of samples, the samples comprising gene expression values corresponding to genes; standardizing the gene expression data using the median and median absolute deviation of each gene; determining a value of a distribution statistic for the standardized gene expression observations based on a probability of outlier gene expression data; determining a null distribution of the distribution statistic using the standardized gene expression data; and outputting a significance value of the genes across the multiple samples, the significance value based on the value of the distribution statistic and the null distribution.

Description

[0001] Cross References to Related Applications

[0002] This application claims priority to U.S. Provisional Patent Application Serial No. 62/417,149, filed November 3, 2016, the entire contents of which are incorporated herein by reference.

Technical Field

[0003] The systems and methods of the present disclosure relate to outlier identification. More specifically, the systems and methods of the present disclosure relate to improved methods of determining the significance of factors of continuous-valued observations in an analysis involving samples comprising observation data corresponding to the factors.

Background

[0004] Outlier analysis can identify outliers in gene expression observation data. However, current outlier analyses are not directly comparable between different analyses or under different input parameters. This limits the ability to integrate the results of the analyses in a meta-analysis. In addition, current methods of meta-analysis may req...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B25/00, G16B25/10
CPC: G16B25/00, G16B25/10, G06F17/18, G16B40/00
Inventor: 山姆·吴, 洪·高, 亨德里库斯·贾斯珀·格尔曼
Owner: ILLUMINA INC