Single-cell RNA-seq data clustering method based on deep noise reduction auto-encoder

An autoencoder and data clustering technology, applied in the field of single-cell RNA-seq data analysis, which can solve problems such as inability to cluster data

Pending Publication Date: 2022-01-04
XIAN THERMAL POWER RES INST CO LTD

AI Technical Summary

Problems solved by technology

At present, traditional clustering methods such as hierarchical clustering, spectral clustering, and density-based clustering methods with noise have been widely us

Method used



Examples


Embodiment Construction

[0025] The present invention is described in further detail below with reference to an embodiment.

[0026] As shown in Figure 1, the invention improves the clustering of single-cell RNA-seq data with a deep denoising autoencoder in four steps: (1) batch-effect adjustment and data standardization preprocessing; (2) data reconstruction and noise reduction; (3) data dimensionality reduction; and (4) Gaussian mixture clustering and data visualization.
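The Gaussian mixture clustering in the fourth step can be illustrated with a toy expectation-maximization loop. This is a minimal sketch, not the patented implementation: it works on one-dimensional values, whereas the method clusters the reduced multi-dimensional cell representation. The function names, the initial means, and the variance floor `min_var` are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper, toy scale).

    Returns (weights, means, variances, labels); labels[i] is the component
    with the highest responsibility for data[i].
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            # Fall back to uniform responsibilities if every density underflows.
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps the density finite
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Two well-separated groups of values standing in for two cell populations.
values = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances, labels = fit_gmm_1d(values, init_means=[0.0, 5.0])
print(labels)
```

In practice a library implementation with full covariance matrices would replace this loop; the sketch only shows the alternation between the E-step and the M-step.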

[0027] The invention provides a method for clustering single-cell RNA-seq data based on a deep denoising autoencoder. The method comprises the following steps:

[0028] Step 1: Batch-effect adjustment and data standardization preprocessing. The invention is validated on five public data sets downloaded from the ArrayExpress and GEO databases. The gene expression values in these five public data sets are taken from various tissue cells, includin...
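The excerpt does not say how the standardization is carried out. As a hedged illustration only, the sketch below applies one common choice for count data: scale each cell by its library size relative to the median library size, then take `log1p`. The function name `normalize_counts` and the choice of normalization are assumptions, and batch-effect adjustment is not covered here.

```python
import math
from statistics import median


def normalize_counts(counts):
    """Library-size normalize and log-transform a cells x genes count matrix.

    `counts` is a list of rows, one row of raw gene counts per cell.
    This is an assumed preprocessing recipe, not the one claimed in the patent.
    """
    totals = [sum(row) for row in counts]
    med = median(totals)
    if med <= 0:
        raise ValueError("median library size must be positive")
    normalized = []
    for row, total in zip(counts, totals):
        # Cells with no counts keep a size factor of 1 to avoid dividing by zero.
        size_factor = total / med if total > 0 else 1.0
        normalized.append([math.log1p(v / size_factor) for v in row])
    return normalized


# Three cells with the same expression profile at different sequencing depths.
raw = [[1, 3], [2, 6], [4, 12]]
for row in normalize_counts(raw):
    print([round(v, 4) for v in row])
```

After normalization the three cells have identical rows, which is the point of the size factor: differences in sequencing depth no longer look like differences in expression.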



Abstract

The invention discloses a single-cell RNA-seq data clustering method based on a deep denoising autoencoder. The method comprises the following steps: first, the batch effect of the single-cell RNA-seq data is adjusted and the data are standardized to reduce the adverse effects of technical noise; second, a deep denoising autoencoder based on the zero-inflated negative binomial (ZINB) distribution is used to extract feature information from the data; third, fast independent component analysis (FastICA) reduces the dimensionality of the data to improve the computational efficiency of the model; finally, the cells are clustered with a Gaussian mixture model fitted by expectation maximization, and the final clustering result is visualized with t-distributed stochastic neighbor embedding (t-SNE). The method reduces the interference that the high dimensionality and heavy noise of single-cell RNA-seq data cause in clustering, learns the gene expression information of the data in order to cluster cells, and supports gene network construction, cell type discovery, and the early detection and treatment of cancer.
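For readers unfamiliar with the zero-inflated negative binomial distribution named in the abstract, the sketch below evaluates its probability mass function and the negative log-likelihood of a set of counts. It uses the standard mean/dispersion parameterization; the parameter names `mu` (mean), `theta` (dispersion), and `pi` (zero-inflation probability) are assumptions, and the excerpt does not state how the autoencoder actually uses the distribution in its loss.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial (mean mu > 0, dispersion theta > 0)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_pmf(x, mu, theta, pi):
    """ZINB probability: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    point_mass = pi if x == 0 else 0.0
    return point_mass + (1.0 - pi) * nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of observed counts under a single ZINB."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


# Zero inflation raises the probability of observing a zero count.
print(zinb_pmf(0, mu=2.0, theta=1.0, pi=0.0))
print(zinb_pmf(0, mu=2.0, theta=1.0, pi=0.5))
print(zinb_nll([0, 0, 1, 3, 0], mu=2.0, theta=1.0, pi=0.3))
```

The extra point mass at zero is what makes the distribution a plausible model for dropout events, where an expressed gene is recorded as a zero count.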

Description

Technical field

[0001] The invention belongs to the technical field of single-cell RNA-seq data analysis in bioinformatics, and in particular relates to a single-cell RNA-seq (ribonucleic acid sequencing) data clustering method based on a deep denoising autoencoder.

Background

[0002] With the rapid development of sequencing technology, researchers have obtained a large amount of single-cell RNA-seq data. Unsupervised clustering plays an important role in the analysis of these data: clustering methods can both identify unknown cell types and reveal cell heterogeneity. By studying clustering methods for single-cell RNA-seq data, researchers can identify cell states more accurately, build the network structure between cells, and better understand processes such as the differentiation of cancer cells, laying the foundation for the early detection and treatment of cancer. At p...

Claims


Application Information

IPC(8): G16B40/00, G06K9/62
CPC: G16B40/00, G06F18/2134, G06F18/23213
Inventor: 王艺杰, 王文庆, 杨东, 胥冠军, 崔逸群, 毕玉冰, 刘超飞, 董夏昕, 刘迪, 肖力炀, 刘骁
Owner XIAN THERMAL POWER RES INST CO LTD