Single cell transcriptome missing value filling method based on deep hybrid network

The method combines a deep hybrid network with single-cell technology and is applied in the field of single-cell transcriptome missing value filling. It addresses three problems of existing methods: heavy use of computing resources, unreliable data interpretation, and the lack of general applicability across single-cell transcriptome data. The stated effects are improved reliability, reduced resource occupancy, and guaranteed versatility.

Active Publication Date: 2020-04-03
ZHONGSHAN OPHTHALMIC CENT SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0005] Existing single-cell transcriptome missing value filling methods have three technical defects: they cannot be applied universally to all single-cell transcriptome data, they consume huge computing resources, and the data interpretation after filling is unreliable. To overcome these defects, a single-cell transcriptome missing value filling method based on a deep hybrid network is provided.



Examples


Embodiment 1

[0055] As shown in Figure 1, a single-cell transcriptome missing value filling method based on a deep hybrid network includes the following steps:

[0056] S1: Preprocess the single-cell sequencing data to obtain the expression matrix;

[0057] S2: Standardize the expression matrix to obtain the initial expression matrix;

[0058] S3: Build a deep-learning-based hybrid model consisting of two parts: an autoencoder and a recurrent neural network;

[0059] S4: Input the initial expression matrix into the autoencoder for dimensionality reduction processing to obtain a dimensionality-reduced feature matrix and a reconstructed expression matrix;

[0060] S5: Input the dimensionality-reduced feature matrix into the recurrent neural network, predict the expression values of all genes, and obtain the corresponding predicted expression matrix;

[0061] S6: Using the predicted expression matrix obtained in step S5 as the input of the autoencoder, repeating step S4 and step S5 until the p...
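The feedback loop in steps S4 to S6, together with the weighted averaging of per-cycle predictions mentioned in the abstract, can be sketched roughly as follows. This is only an illustration under stated assumptions: `encode` and `predict` are hypothetical stand-ins for the autoencoder and the recurrent neural network, the fixed `n_cycles` replaces the stopping condition that is truncated in the text above, and the per-cycle weights are passed in because their calculation is not given in this excerpt.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def run_cycles(
    initial: Matrix,
    encode: Callable[[Matrix], Matrix],   # hypothetical stand-in for the autoencoder (S4)
    predict: Callable[[Matrix], Matrix],  # hypothetical stand-in for the recurrent network (S5)
    n_cycles: int,                        # assumption: fixed count instead of the truncated stop rule
) -> List[Matrix]:
    """Repeat S4 -> S5, feeding each prediction back in as the next input (S6)."""
    predictions: List[Matrix] = []
    current = initial
    for _ in range(n_cycles):
        features = encode(current)    # dimensionality-reduced feature matrix
        current = predict(features)   # predicted expression matrix
        predictions.append(current)
    return predictions


def weighted_average(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Element-wise weighted average of equally shaped matrices."""
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per matrix")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(matrices, weights):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += (weight / total) * matrix[i][j]
    return out
```

Keeping the two networks behind plain callables makes the control flow visible without committing to a particular deep learning framework; a real implementation would train both parts and derive the weights from the model itself.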

Embodiment 2

[0065] More specifically, as shown in Figure 2, step S1 includes the following steps:

[0066] S11: Use an existing library construction method to obtain the processed cells, then sequence them to obtain sequence data in a file format such as FASTQ;

[0067] S12: Use mapping software, such as TopHat2, to map the sequence data;

[0068] S13: Use data splitting software, such as UMI-tools, to divide the mapped sequence data by cell to obtain sequence splitting data;

[0069] S14: Use quantification software, such as featureCounts, to quantify the mapped and divided results to obtain a gene × cell expression matrix.
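To make the output of step S14 concrete, the following toy sketch counts per-read (gene, cell) assignments into a gene × cell matrix. It is not how featureCounts or UMI-tools work internally, and it does not handle UMI deduplication; the function name `build_expression_matrix` and the input format are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_expression_matrix(
    assignments: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Count (gene, cell) pairs into a matrix with genes as rows and cells as columns."""
    counts = Counter(assignments)
    genes = sorted({gene for gene, _ in counts})
    cells = sorted({cell for _, cell in counts})
    # A Counter returns 0 for pairs that were never observed, which is
    # exactly the kind of zero entry the filling method later targets.
    matrix = [[counts[(gene, cell)] for cell in cells] for gene in genes]
    return genes, cells, matrix


# Hypothetical input: one (gene, cell) tuple per mapped and assigned read.
reads = [("GeneA", "cell1"), ("GeneA", "cell1"), ("GeneB", "cell2")]
genes, cells, matrix = build_expression_matrix(reads)
```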

[0070] More specifically, step S2 is as follows:

[0071] The expression matrix is normalized according to the library size ls of each cell to eliminate the effect of library size. For the gene expression value vector C_c of cell c, the standardized formula is:

[0072]

[0073] Among them, sf represents the ...
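As a rough illustration of library-size normalization in general, the sketch below rescales each cell (column) so that its total count equals a common target. The patent's own formula and its definition of sf are not reproduced in this excerpt, so `target_size` is a hypothetical parameter and this should not be read as the patented formula.

```python
from typing import List


def normalize_by_library_size(
    matrix: List[List[float]],  # gene x cell matrix: rows are genes, columns are cells
    target_size: float,         # hypothetical common library size after scaling
) -> List[List[float]]:
    """Scale every cell so that its library size (column sum) equals target_size."""
    n_cells = len(matrix[0])
    library_sizes = [sum(row[c] for row in matrix) for c in range(n_cells)]
    return [
        [
            # Cells with no counts at all are left at zero to avoid division by zero.
            row[c] * target_size / library_sizes[c] if library_sizes[c] > 0 else 0.0
            for c in range(n_cells)
        ]
        for row in matrix
    ]
```

After scaling, every non-empty cell has the same total count, so differences between cells reflect relative expression rather than sequencing depth.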

Embodiment 3

[0085] More specifically, as shown in Figure 4, when the hybrid model is applied, the single-cell data is fed into the hybrid model through non-blocking, multi-process, block-wise random reading. The specific process is:

[0086] Enter the storage address of the single-cell data file; the file may store any type of matrix that supports random access and block-wise reading;

[0087] According to the storage address, read the dimension information of the single-cell transcriptome matrix stored in the file, including the number of cells and the number of genes, and read in the corresponding cell names and gene names;

[0088] Divide all cells, in order, into multiple data clusters and mark each data cluster with a serial number; all cluster serial numbers together form the serial number pool;

[0089] Create a copy of the serial number pool. Each time, randomly draw a certain number of cluster serial numbers from the copy without replacement and extract the corresponding data. If the copy data is extracted, a new...
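The serial number pool described above can be sketched as follows. This is a minimal single-process illustration: the non-blocking, multi-process reading is left out, and refreshing the working copy from the pool once it runs short is an assumption, since the corresponding sentence is truncated in the text. The names `split_into_clusters` and `cluster_batches` are hypothetical.

```python
import random
from typing import Iterator, List, Optional


def split_into_clusters(n_cells: int, cluster_size: int) -> List[range]:
    """Divide cell indices, in order, into consecutive clusters; list position is the serial number."""
    return [
        range(start, min(start + cluster_size, n_cells))
        for start in range(0, n_cells, cluster_size)
    ]


def cluster_batches(
    n_clusters: int,
    clusters_per_draw: int,
    n_draws: int,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield lists of cluster serial numbers drawn without replacement from a working copy."""
    if not 0 < clusters_per_draw <= n_clusters:
        raise ValueError("clusters_per_draw must be between 1 and n_clusters")
    rng = random.Random(seed)
    pool = list(range(n_clusters))  # the serial number pool
    working = pool.copy()           # the copy that draws are taken from
    for _ in range(n_draws):
        if len(working) < clusters_per_draw:
            working = pool.copy()   # assumption: start a fresh copy when too few remain
        picked = rng.sample(working, clusters_per_draw)
        for serial in picked:
            working.remove(serial)
        yield picked
```

Drawing whole clusters instead of single cells keeps each read contiguous on disk while still randomizing the order in which the model sees the data.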


Abstract

The invention provides a single-cell transcriptome missing value filling method based on a deep hybrid network. The method comprises the steps of: sequencing and preprocessing single cells to obtain an expression matrix, and standardizing it; constructing a hybrid model based on deep learning, and inputting the standardized expression matrix into the hybrid model for cyclic calculation to obtain a plurality of predicted expression matrices; calculating the weight of each cycle and performing a weighted average of the predicted expression matrices according to the corresponding weights. The result is the filling output of the hybrid model, which completes the filling of missing values. In the filling method provided by the invention, the deep neural network's ability to fit complex functions is used to adapt to the expression distribution of single cells, which ensures that the method applies universally to various single-cell transcriptome data. Moreover, the scalability of deep learning to data sets with very large cell numbers is retained, the single-cell transcriptome missing values are filled, and the reliability of single-cell data interpretation is remarkably improved.

Description

Technical field

[0001] The present invention relates to the technical field of single-cell transcriptome missing value filling, and more specifically, relates to a single-cell transcriptome missing value filling method based on a deep hybrid network.

Background technique

[0002] Single-cell transcriptome sequencing technology has developed into a major method for studying gene expression at the single-cell level, and has been widely used to study important biological issues such as new cell types, cell differentiation, developmental trajectories, and tumor development. The number of captured cells has grown from the first few to the current million levels. However, due to the extremely low RNA content of a single cell, the low efficiency of transcript capture, technical noise, and the high cost of sequencing a large number of cells, the low sequencing depth of a single cell is difficult to cover the transcripts it contains, resulting in a large number of gene expression values ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G16B40/00
CPC: G16B40/00; Y02A90/10
Inventor: 何尧, 谢志, 袁皓
Owner: ZHONGSHAN OPHTHALMIC CENT SUN YAT SEN UNIV