Single cell clustering method and device, electronic equipment and storage medium

A single-cell clustering method, applied in the field of bioinformatics, that aims to improve clustering performance and achieve good prediction performance

Active Publication Date: 2020-02-21
YULIN NORMAL UNIVERSITY +1

AI Technical Summary

Problems solved by technology

The use of unsupervised learning methods for cell clustering research does not require prior knowledge and can automatically estimate the...



Examples


Embodiment 1

[0036] This embodiment provides a single-cell clustering method, including the following steps:

[0037] Step 1. Based on the gene expression matrix (that is, the gene expression matrix of single-cell transcriptome sequencing data, in which columns correspond to single cells and rows to genes, and which can be downloaded from public databases), calculate the similarity between single-cell pairs and construct a global feature space matrix S_l. S_l has size N×N, where N is the number of single cells, and the element S_l(i,j) represents the similarity (local similarity) between single cell i and single cell j. Each row of S_l represents the similarity between one single cell and all other single cells, so S_l contains global similarity information and serves as the global feature space matrix;

[0038] Step 2. Based on the matrix S_l, use the weighted Gaussian kernel function to calculate the similarity between single-cell pairs, and construct a sparse global similarity matrix ...
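
Step 1 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `build_global_feature_space`, the pluggable `similarity` argument, and the toy expression matrix are assumptions, and Pearson correlation (via `np.corrcoef`) is used only as a stand-in similarity measure.

```python
import numpy as np


def build_global_feature_space(expr, similarity):
    """Build an N x N similarity matrix from an M x N expression matrix.

    expr       : array of shape (M genes, N cells); columns are single cells.
    similarity : hypothetical callable taking two column vectors and
                 returning a scalar similarity (assumed symmetric).
    """
    n_cells = expr.shape[1]
    s_l = np.empty((n_cells, n_cells))
    for i in range(n_cells):
        for j in range(i, n_cells):
            s = similarity(expr[:, i], expr[:, j])
            s_l[i, j] = s_l[j, i] = s  # fill both halves of the matrix
    return s_l


def pearson(u, v):
    # Stand-in similarity used for illustration only.
    return float(np.corrcoef(u, v)[0, 1])


# Toy data (assumed): 4 genes x 3 cells; cells 0 and 1 have similar profiles.
expr = np.array([[5.0, 4.0, 0.0],
                 [3.0, 3.5, 1.0],
                 [0.0, 0.5, 6.0],
                 [1.0, 1.0, 4.0]])
s_l = build_global_feature_space(expr, pearson)
```

Row `s_l[i, :]` then holds the similarity of cell i to every cell, which is the sense in which the matrix is described as a global feature space.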

Embodiment 2

[0044] The single-cell clustering method of this embodiment builds on Embodiment 1. In Step 1, based on the gene expression matrix, the Spearman correlation coefficient between the column vectors corresponding to two single cells is calculated as their similarity. The specific steps for calculating the Spearman correlation coefficient Rs(i,j) of the column vectors corresponding to single cell i and single cell j are:

[0045] Step1: Convert the elements S(m,i) and S(m,j) of the column vectors S(:,i) and S(:,j), which correspond to single cell i and single cell j in the gene expression matrix S, into their ranks (positions in descending order) within their respective column vectors, recorded as R[S(m,i)] and R[S(m,j)], where m=1,2,...,M and M is the number of genes. The gene expression matrix S is a matrix of M rows and N columns, and N is the number of single cells;

[0046] step2: According to the ...
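
The rank transform of Step1 and a Spearman coefficient built on it can be sketched as below. The text of step2 is truncated in this listing, so the sketch assumes the standard definition (Pearson correlation of the rank vectors) rather than the patent's exact formula. The function names and the average-rank treatment of ties are also assumptions.

```python
import numpy as np


def descending_ranks(x):
    """Rank the entries of x in descending order (rank 1 = largest value).

    Tied values receive the average of their ranks (an assumption; the
    source does not say how ties are handled).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()  # average rank within each tie group
    return ranks


def spearman(u, v):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    ru, rv = descending_ranks(u), descending_ranks(v)
    return float(np.corrcoef(ru, rv)[0, 1])
```

Applying `spearman` to every pair of columns S(:,i) and S(:,j) of the gene expression matrix would give the pairwise similarities used in Step 1.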

Embodiment 3

[0052] The single-cell clustering method of this embodiment builds on Embodiment 2, and Step 2 is specifically implemented by the following steps:

[0053] Step 2.1. Based on the global feature space matrix S_l, determine the K nearest neighbors (KNN) of each single cell according to the values of the elements in each row of the matrix. For a single cell i, take the i-th row S_l(i,:) of the matrix S_l; the single cells corresponding to the K largest elements of that row, excluding S_l(i,i), are its K nearest neighbors, and the set of these neighbors is recorded as KNN(i). Here S_l(i,i) denotes the i-th element of S_l(i,:), that is, the element in row i and column i of S_l. This operation filters out weakly correlated nodes on the basis of the global feature space matrix S_l;

[0054] Step 2.2. Use the weighted Gaussian kernel function to calculate the weighted Gaussian kernel similarity D(i,j) between each single cell pair as their similar...
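
The neighbor selection of Step 2.1 can be sketched as follows. The function name `k_nearest_neighbors` and the toy matrix are assumptions made for illustration. The weighted Gaussian kernel of Step 2.2 is not sketched, because its definition is truncated in this listing.

```python
import numpy as np


def k_nearest_neighbors(s_l, k):
    """Return KNN(i) for every cell i as a list of sets of column indices.

    For each row of the similarity matrix s_l, the K largest entries other
    than the diagonal element s_l[i, i] identify the K nearest neighbors.
    """
    n = s_l.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of cells")
    masked = np.array(s_l, dtype=float)   # copy so the input is not modified
    np.fill_diagonal(masked, -np.inf)     # exclude s_l[i, i] from the ranking
    order = np.argsort(-masked, axis=1, kind="stable")
    return [set(order[i, :k].tolist()) for i in range(n)]


# Toy similarity matrix (assumed values) for 4 cells.
s_l = np.array([[1.0, 0.9, 0.2, 0.4],
                [0.9, 1.0, 0.3, 0.1],
                [0.2, 0.3, 1.0, 0.8],
                [0.4, 0.1, 0.8, 1.0]])
knn = k_nearest_neighbors(s_l, k=2)
```

With K=2, cell 0 keeps cells 1 and 3 and drops cell 2, its most weakly correlated node.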


Abstract

The invention discloses a single cell clustering method and device, electronic equipment and a storage medium. After the local similarity between node pairs is calculated from distance information, a global feature space is constructed. Based on the global feature space, the global similarity between node pairs is calculated using a multi-kernel learning method. The nodes on all second-order paths between each node pair are then also taken into account, which adds information from more related nodes and yields a more effective global similarity calculation. Finally, the nodes are sorted by node degree to determine the initial order in which they join communities, which improves the Louvain community detection method, and clustering is carried out using the improved method. The method is simple and effective, and tests on public single-cell transcriptome sequencing data sets show that, compared with other methods, it has good prediction performance in clustering single-cell transcriptome sequencing data.

Description

Technical field

[0001] The invention relates to the field of bioinformatics, in particular to a single cell clustering method and device for identifying cell types and providing a basis for analyzing cell differentiation processes.

Background art

[0002] Traditional large-scale cell sequencing methods are difficult to apply to research fields that need to consider the characteristics of individual cells. Single-cell transcriptome sequencing data (single-cell RNA-seq data, scRNA-seq data) can be better applied to the study of cell differences and the identification of cell types during the study of cell differentiation. With the rapid development of scRNA-seq technology, scRNA-seq data can more accurately reflect the gene expression data of each cell, reducing the influence of different cells on controlling gene expression, cell behavior and cell type. However, scRNA-seq data is characterized by high dimensionality, small samples, and lack of prior knowledge.

[0003...


Application Information

IPC(8): G16B30/00
CPC: G16B30/00
Inventor: 朱晓姝, 彭小清, 李洪东, 王建新, 郭立渌, 李剑
Owner YULIN NORMAL UNIVERSITY