
Single cell RNA-SEQ data processing

Data processing technology in the fields of electrical digital data processing, digital data processing parts, and laboratory analysis data. It addresses problems such as limited analysis results and distorted gene-gene correlation inference.

Pending Publication Date: 2022-04-29
REGENERON PHARM INC
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

Bulk gene expression data have been used to infer gene-gene correlations (Ballouz et al., Bioinformatics, 2015. 31(13): pp. 2123-2130), but analysis of such data was limited to measuring average gene expression across cell pools.
Data preprocessing methods for scRNA-seq (normalization and imputation) may distort gene-gene correlation inference and subsequent gene co-expression network construction, for example by introducing false-positive gene-gene correlations.

Examples


example 1

[0087] Example 1. Data preprocessing using representative normalization / imputation methods

[0088] Several representative normalization/imputation methods were benchmarked, focusing on their impact on the inference of gene-gene associations. The global scaling normalization method performs minimal data manipulation by normalizing the gene expression of each cell by that cell's total expression. This approach is usually followed by log transformation and z-score scaling; because these transformations do not change rank correlations, only total UMI normalization (termed NormUMI) was included in the comparison. Also included is a framework for normalizing and stabilizing the variance of scRNA-seq data using "regularized negative binomial regression" (termed NBR), which removes the effects of technical noise while preserving biological heterogeneity. Also included are three other methods representing different classes of imputation methods, e.g., (i) MAGIC - a data smo...
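The claim that log transformation and z-score scaling leave rank correlations unchanged can be checked directly. The following is a minimal pure-Python sketch, not the patent's implementation: it assumes a cells × genes list-of-lists matrix, and the function names and the scale factor of 10,000 are hypothetical (the source gives no scale factor). It applies total-UMI normalization (NormUMI), then log1p and per-gene z-scoring, and compares the Spearman correlation of a gene pair before and after that second step.

```python
import math
from statistics import mean, pstdev


def norm_umi(counts, scale=10_000.0):
    """Total-UMI normalization: divide each cell's counts by that cell's total.

    `counts` is a list of cells, each a list of per-gene UMI counts.
    `scale` is a hypothetical constant that only rescales the values.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([scale * c / total if total else 0.0 for c in cell])
    return normalized


def log_zscore(matrix):
    """log1p-transform, then z-score each gene (column) across cells."""
    logged = [[math.log1p(v) for v in cell] for cell in matrix]
    scaled_genes = []
    for gene in zip(*logged):
        mu, sd = mean(gene), pstdev(gene)
        scaled_genes.append([(v - mu) / sd if sd else 0.0 for v in gene])
    return [list(cell) for cell in zip(*scaled_genes)]


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den if den else 0.0


def column(matrix, j):
    """Extract gene j across all cells."""
    return [cell[j] for cell in matrix]


counts = [[5, 0, 3], [2, 4, 1], [8, 1, 0], [0, 6, 2]]  # toy counts, not real data
norm = norm_umi(counts)
z = log_zscore(norm)
print(spearman(column(norm, 0), column(norm, 1)), spearman(column(z, 0), column(z, 1)))
# prints -0.8 -0.8
```

Because log1p and z-scoring (with nonzero standard deviation) are strictly increasing within each gene, they preserve each gene's ordering of cells, so the Spearman ρ of any gene pair is unchanged. Dividing by cell-specific totals, by contrast, can reorder cells within a gene, so the normalization step itself can change correlations.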

example 2

[0090] Example 2. Computing gene-gene correlations in single cells

[0091] Real bone marrow scRNA-seq data from the Human Cell Atlas preview dataset was used as a benchmark dataset for the data preprocessing methods (Regev et al.). As shown in Figure 3 and Table 1, the complete dataset contains 378,000 bone marrow cells that can be divided into 21 cell clusters covering all major immune cell types. A random sample of 50,000 cells was drawn from the original dataset, and genes expressed in fewer than 0.2% of these cells (100 cells) were excluded from this subset. The final dataset contained 12,600 genes, yielding more than 79 million possible gene pairs.
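The sampling and filtering arithmetic in this paragraph can be reproduced in a few lines. This is a hedged sketch, assuming a cells × genes count matrix and treating "expressed" as a count greater than zero (an assumption); the function names and the seed are hypothetical. It confirms that 0.2% of 50,000 cells is 100 cells and that 12,600 genes give C(12,600, 2) = 79,373,700 unordered pairs, consistent with "more than 79 million".

```python
import math
import random


def sample_cells(cells, n, seed=0):
    """Randomly sample n cells without replacement (seed is illustrative)."""
    rng = random.Random(seed)
    return rng.sample(cells, n)


def filter_genes(counts, min_frac=0.002):
    """Return indices of genes detected (count > 0) in at least min_frac of cells.

    With 50,000 cells and min_frac = 0.002, the cutoff is 100 cells.
    """
    min_cells = math.ceil(min_frac * len(counts))
    kept = []
    for j in range(len(counts[0])):
        detected = sum(1 for cell in counts if cell[j] > 0)
        if detected >= min_cells:
            kept.append(j)
    return kept


def n_gene_pairs(n_genes):
    """Number of unordered gene pairs."""
    return math.comb(n_genes, 2)


print(math.ceil(0.002 * 50_000))  # 100
print(n_gene_pairs(12_600))  # 79373700
```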

[0092]

[0093] Figure 4 shows an overview of the benchmark framework. Five representative data preprocessing methods (NormUMI, NBR, DCA, MAGIC, and SAVER) were applied to a single-cell expression data matrix (e.g., the bone marrow single-cell expression data), as shown in Figure 4. Gene-gene correlations calculated directly from the resul...
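The core of the benchmark framework is computing gene-gene correlations from each preprocessed matrix. A minimal pure-Python sketch under stated assumptions: each method's output is a cells × genes matrix held in a dict keyed by method name, Spearman ρ is computed as the Pearson correlation of tie-averaged ranks, and zero-variance genes are assigned ρ = 0 (a hypothetical convention). All function names are illustrative. At the scale described here (about 79 million pairs), a vectorized library would be needed in practice.

```python
import math
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values (e.g. many zeros) share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def pairwise_spearman(matrix):
    """Map every gene pair (i, j), i < j, to its Spearman rho across cells."""
    gene_ranks = [average_ranks(col) for col in zip(*matrix)]  # rank each gene once
    return {
        (i, j): pearson(gene_ranks[i], gene_ranks[j])
        for i, j in combinations(range(len(gene_ranks)), 2)
    }


def benchmark(processed):
    """processed: {method_name: cells x genes matrix} -> per-method pair correlations."""
    return {method: pairwise_spearman(m) for method, m in processed.items()}
```

Ranking each gene once and then taking Pearson correlations of the ranks avoids re-ranking a gene for every pair it appears in.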

example 3

[0095] Example 3. Observing artifacts using data preprocessing methods

[0096] Five representative data preprocessing methods (NormUMI, NBR, DCA, MAGIC, and SAVER) were applied to bone marrow scRNA-seq data from the Human Cell Atlas project, and the distributions of overall gene-gene correlations in the five resulting data matrices were compared. Since most gene pairs have no association, the correlation distribution is expected to peak at 0. As shown in Figure 5A, NormUMI produces a correlation distribution that peaks at 0. However, the median Spearman correlations of the other four methods are much higher (NormUMI ρ = 0.023, NBR ρ = 0.839, MAGIC ρ = 0.789, DCA ρ = 0.770, SAVER ρ = 0.166), as shown in Figure 5A.
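The comparison above rests on two summaries of each method's correlation distribution: where it peaks and its median. The sketch below assumes correlations are supplied as a flat list of values in [-1, 1] and uses a hypothetical 40-bin histogram to locate the peak; the function name and bin count are illustrative, and the code does not reproduce the values reported in Figure 5A.

```python
from statistics import median


def summarize(correlations, n_bins=40):
    """Return (median, peak_center) for a list of correlation values in [-1, 1].

    peak_center is the midpoint of the most populated histogram bin
    (the first such bin if several tie).
    """
    width = 2.0 / n_bins
    counts = [0] * n_bins
    for r in correlations:
        idx = min(int((r + 1.0) / width), n_bins - 1)  # clamp r = 1.0 into the last bin
        counts[idx] += 1
    peak_bin = max(range(n_bins), key=counts.__getitem__)
    peak_center = -1.0 + (peak_bin + 0.5) * width
    return median(correlations), peak_center
```

For a method with no systematic artifact, both numbers should sit near 0; a median far from 0 across millions of mostly unrelated gene pairs is the signature of the artifacts described here.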

[0097] After each data preprocessing method was applied, the interactions between pairs of genes were examined to reveal whether a higher correlation reflects a higher chance of a functiona...


Abstract

A method of processing single-cell gene expression data to reveal gene-gene correlations by applying a noise regularization process that reduces gene-gene correlation artifacts. The computer-implemented method of the present invention comprises processing gene expression data by normalization or imputation, applying a noise regularization process to the normalized or imputed gene expression data, and applying a gene-gene correlation calculation process to obtain correlated gene pairs. The noise regularization process adds random noise according to the expression values of the genes in the cells of the expression matrix to obtain a noise-regularized expression matrix.
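The abstract describes adding random noise according to the expression values in the matrix before computing correlations, but it does not give the noise distribution. The sketch below is therefore hypothetical and is not the claimed procedure: for every entry it draws uniform noise from [0, noise_frac × that gene's maximum expression], with noise_frac = 0.1 and a fixed seed chosen purely for illustration. The returned matrix can then be passed to any gene-gene correlation routine.

```python
import random


def noise_regularize(matrix, noise_frac=0.1, seed=0):
    """Add random noise to a cells x genes expression matrix.

    HYPOTHETICAL noise rule: each entry of gene j receives noise drawn
    uniformly from [0, noise_frac * max expression of gene j].
    noise_frac and seed are illustrative assumptions, not source values.
    """
    rng = random.Random(seed)
    n_genes = len(matrix[0])
    gene_max = [max(cell[j] for cell in matrix) for j in range(n_genes)]
    return [
        [v + rng.uniform(0.0, noise_frac * gene_max[j]) for j, v in enumerate(cell)]
        for cell in matrix
    ]
```

Scaling the noise per gene keeps its amplitude proportional to each gene's own expression range, so highly and lowly expressed genes are perturbed comparably in relative terms; whether the patent scales noise this way is not stated in this section.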

Description

Technical field [0001] The present invention generally relates to methods and systems for processing gene expression data for gene-gene correlation analysis by applying a noise regularization process. Background [0002] Gene expression data obtained from microarrays and RNA sequencing of bulk cells have been successfully applied to infer gene-gene correlations for the construction of gene networks (Ballouz et al., Guidance for RNA-seq co-expression network construction and analysis: safety in numbers. Bioinformatics, 2015. 31(13): pp. 2123-2130), but the analysis of such expression data is limited to measuring average gene expression across cell pools. Single-cell RNA sequencing (scRNA-seq) allows gene expression to be analyzed at single-cell resolution, dissecting heterogeneity in apparently homogeneous cell populations to reveal hidden gene-gene associations masked by bulk expression profiles (Kolodziejczyk et al., Single-cell RNA-sequencing tec...


Application Information

IPC (8): G16B25/10; G16B5/00; G16B40/00
CPC: G16B25/10; G16B5/00; G16B40/00; G16H10/40; G06F7/588
Inventors: G. S. Atwal, W. K. Lim, Ruoyu Zhang
Owner REGENERON PHARM INC