Systems and methods for outlier significance assessment
A technique for assessing the significance of outliers, applied in the field of outlier identification, that addresses the limitations of existing approaches, including the large number of analyses required and the fact that outlier analyses cannot be compared directly.
Examples
Embodiment 1
[0117] Bootstrap resampling
[0118] When the sampling distribution is unknown, permutation is the standard way to estimate statistical significance [10]. It cannot be used here, however, because permuting sample labels has no effect on the COPA score: the score is obtained by ranking a gene's values across samples and taking a specified percentile after normalization, so it does not depend on which sample carries which label. We therefore employ a slightly different randomization strategy, which we refer to as bootstrap resampling.
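The invariance described above can be checked with a minimal sketch. It assumes the usual COPA normalization (median centering and scaling by the median absolute deviation) and a default 90th percentile; neither detail is stated in this passage, and the function names `percentile` and `copa_score` are illustrative only.

```python
import random
import statistics


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def copa_score(expression, q=90):
    """COPA-style score (assumed form): median-center, scale by MAD,
    then take the q-th percentile of the normalized values."""
    med = statistics.median(expression)
    mad = statistics.median([abs(x - med) for x in expression])
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be scaled")
    normalized = [(x - med) / mad for x in expression]
    return percentile(normalized, q)


if __name__ == "__main__":
    # Hypothetical expression values for one gene; two samples are outliers.
    gene = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 6.5, 7.1, 1.05]

    # Permuting sample labels only reorders the values of this gene.
    shuffled = gene[:]
    random.Random(0).shuffle(shuffled)

    # The score depends on sorted values only, so both calls agree.
    print(copa_score(gene), copa_score(shuffled))
```

Because every step works on sorted values or order-independent summaries, reordering the samples cannot change the score, which is why a label permutation yields no null distribution.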
[0119] A COPA randomization test requires randomizing the expression values within a gene, but those values are not necessarily independent and identically distributed. To address this, the values are normalized before randomization, which puts them on the same scale. We can therefore assume that, for a given data type and platform (e.g., RNA-Seq, Illumina HiSeq), the normalized values are independent and identically distributed overall. We rando...
Embodiment 2
[0123] Bootstrap method + generalized Pareto distribution (function method)
[0124] It is well known that the resolution of resampled p-values is limited by the number of randomizations performed. Achieving higher resolution usually requires intensive computation, which makes it unsuitable for most large-scale applications. Knijnenburg et al. [11] proposed using extreme value theory to estimate small permutation p-values, that is, p-values close to the tail of the distribution, by modeling the tail with a generalized Pareto distribution. The same approach can be extended to resampling p-values.
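The resolution limit can be made concrete with a short sketch. It assumes the common right-tail empirical estimator with a +1 correction, which is a convention chosen here for illustration and is not taken from the source; the function names are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Right-tail empirical p-value with the +1 correction (assumed
    convention), so the estimate is never exactly zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def smallest_p_value(n_randomizations):
    """Smallest p-value this estimator can report for a given number
    of randomizations."""
    return 1 / (n_randomizations + 1)


if __name__ == "__main__":
    for b in (100, 10_000, 1_000_000):
        print(b, smallest_p_value(b))
```

Under this estimator, each additional decimal place of resolution costs roughly ten times as many randomizations, which is the computational burden that motivates modeling the tail instead.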
[0125] Bootstrap resampling combined with this tail function can therefore be applied to the TCGA BRCA dataset to approximate the tail of the null distribution of COPA scores produced by 100 randomizations of bootstrap resampling. This greatly reduces the number of randomizations required to generate p-values. Based on the distribution of resampled COPA scores, the right (or symmetric left) tail (corresponding to 0.1% of the resampled CO...
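The general idea of a generalized Pareto tail approximation can be sketched as follows. This is not the procedure of this embodiment, whose details are truncated above. The sketch makes several assumptions: the threshold is placed between the `n_exceed`-th and the next largest null score, the default of 250 exceedances is arbitrary, and the parameters are fitted by the method of moments (valid only for shape values below 0.5) rather than by maximum likelihood, simply to keep the example dependency-free. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape xi, scale sigma) of a GPD."""
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma, for y >= 0."""
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return base ** (-1.0 / xi)


def tail_p_value(observed, null_scores, n_exceed=250):
    """Right-tail p-value: empirical below the threshold, GPD above it."""
    ordered = sorted(null_scores, reverse=True)
    n = len(ordered)
    if n <= n_exceed:
        raise ValueError("need more null scores than exceedances")
    # Assumed threshold: midpoint between the n_exceed-th and next score.
    threshold = 0.5 * (ordered[n_exceed - 1] + ordered[n_exceed])
    if observed <= threshold:
        exceed = sum(1 for s in ordered if s >= observed)
        return (exceed + 1) / (n + 1)
    exceedances = [s - threshold for s in ordered[:n_exceed]]
    xi, sigma = fit_gpd_moments(exceedances)
    # P(score > observed) = P(score > threshold) * P(excess > observed - threshold)
    return (n_exceed / n) * gpd_survival(observed - threshold, xi, sigma)


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in null distribution; real use would pass resampled COPA scores.
    null = [rng.expovariate(1.0) for _ in range(5000)]
    print(tail_p_value(12.0, null))
```

The fitted tail lets the estimate fall below the 1/(n+1) floor of the purely empirical estimator without drawing more randomizations, at the cost of relying on the quality of the tail fit.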
Embodiment 3
[0129] ssCOPA, the binomial resampling method
[0130] Although a p-value for a given observed COPA score can be obtained empirically by randomization testing, doing so becomes computationally intensive as the test group size increases. Furthermore, highly significant p-values require an infeasible number of bootstrap resamplings to reach the resolution needed to estimate them. To address this issue, we introduce an exact solution that allows p-values to be computed directly.
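The formula for the exact solution is not given in this passage, so the following is not the ssCOPA derivation. It only illustrates, under an i.i.d. assumption, one standard way a percentile statistic can be tied to a binomial tail: the k-th largest of n draws exceeds a value x exactly when at least k of the draws exceed x. The function names and the example numbers are hypothetical.

```python
import math


def binomial_upper_tail(n, k, p):
    """P(Binomial(n, p) >= k), summed term by term."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
    )


def kth_largest_exceeds(n, k, tail_prob):
    """P(k-th largest of n i.i.d. draws > x), where tail_prob is the
    probability that a single draw exceeds x."""
    return binomial_upper_tail(n, k, tail_prob)


if __name__ == "__main__":
    # Hypothetical case: 100 samples, statistic is the 10th largest value,
    # and a single normalized value exceeds the observed score with
    # probability 1e-4.
    print(kth_largest_exceeds(100, 10, 1e-4))
```

A closed-form tail of this kind can return probabilities far smaller than any feasible number of resamplings could resolve, which is the practical advantage a direct computation has over randomization.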