Spatial structure resolution and batch effect correction method based on SRT data
By constructing a spatial structure resolution network model, the problems of inaccurate embedding representation and batch effect correction of multi-layer spatially resolved transcriptome data were solved. High-precision spatial structure resolution and cross-layer consistency were achieved, and local and global information were dynamically integrated, thereby improving the spatial transcriptomics analysis capabilities of complex tissue systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- YUNNAN OPEN UNIV
- Filing Date
- 2025-07-18
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies suffer from problems such as inaccurate embedding representation, imbalance between local and global information, and difficulty in multi-layer integration in spatial domain identification, multi-layer integration, and batch effect correction of multi-layer spatially resolved transcriptome data, making it difficult to achieve high-precision spatial structure resolution and cross-layer consistency maintenance.
A spatial structure analysis and batch effect correction method based on SRT data is adopted. A spatial structure analysis network model is constructed, comprising a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module, and is trained using reconstruction loss, contrastive loss, and consistency loss. Taking the three-dimensional adjacency matrix, the masked gene expression matrix, and the complementary masked gene expression matrix as input, the model reconstructs latent spatial features and enforces their consistency.
It improves feature robustness, dynamically integrates local and global contextual information, achieves robust batch effect correction in multi-layer integration, preserves the biologically meaningful spatial architecture, outperforms existing methods on single-layer datasets, and enhances the ability of spatial transcriptomics analysis in complex tissue systems.
Figure CN122201448A_ABST
Description
Technical Field
[0001] This invention relates to the fields of bioinformatics and deep learning, and in particular to a method for spatial structure analysis and batch effect correction based on SRT data.
Background Technology
[0002] The tissue function of multicellular organisms depends on precise spatial coordination and regulation between cells. Traditional single-cell sequencing technologies can analyze cellular expression heterogeneity but cannot capture the "spatial code" controlling tissue function. Spatially resolved transcriptomics (SRT) combines gene expression profiles with spatial coordinates, providing a new paradigm for revealing the molecular blueprint of tissue structure. Current SRT technologies can be broadly divided into two categories: in situ capture sequencing platforms and in situ hybridization-based detection strategies. The former, represented by technologies such as 10x Visium, Slide-seq, Stereo-seq, and Spatial Transcriptomics (ST), combines spatial location information with high-throughput sequencing to profile thousands of genes within the tissue microenvironment. The latter, including methods such as MERFISH and seqFISH, reaches single-cell resolution and precisely maps gene expression profiles to spatial coordinates. These advances enable researchers to gain a deeper understanding of the spatial organization of biological tissues and the progression of disease.
[0003] The spatial complexity of biological tissues stems from the differentiation of heterogeneous regions, which form spatial domains with specific biological characteristics. Identifying these spatial domains is therefore a key challenge in understanding physiology and pathology from SRT data. Current spatial transcriptome clustering methods can be divided into two categories: non-spatial clustering methods and spatial clustering methods. Classical non-spatial methods (such as K-means and Louvain) partition spatial domains solely on the basis of the gene expression matrix, ignoring the biological relationships between spatially adjacent sample points. To overcome this limitation, researchers have introduced deep learning architectures, such as graph neural networks (GNNs) and autoencoders, to fuse spatial topology with gene expression features. For example, SpaGCN integrates spatial distance and histological features into an adjacency matrix, combines it with gene expression data, and uses graph convolutional networks (GCNs) to learn graph embeddings under an unsupervised clustering loss. SEDR uses a deep autoencoder network to learn gene representations while simultaneously using a variational graph autoencoder to embed spatial information. STAGATE uses an adaptive graph attention autoencoder to learn low-dimensional latent representations of SRT data and identify spatial domains. Although these methods combine gene expression and spatial information for spatial domain identification, they rely entirely on unsupervised learning, resulting in inaccurate embedding representations that are inconsistent with pathological annotations.
[0004] In recent years, generative self-supervised learning frameworks based on masking mechanisms have developed rapidly. By randomly masking parts of the input features, these frameworks force the model to predict the masked content based on contextual information, thereby guiding the model to learn better clustering representations. For example, STMGraph, based on a dynamic graph attention network, uses a mask-remasking mechanism to build a dual-decoding view, enabling the embedding to retain unmasked features while reconstructing masked features. SpaMask constructs a dual-masked graph autoencoder, where sample point and edge masks ensure that spatially adjacent points are similar at the feature level. Although these methods effectively avoid the shortcomings of traditional graph autoencoders that rely solely on adjacency matrix reconstruction, their supervision signals are limited to explicit reconstruction tasks in the original feature space, ignoring implicit supervision in the latent space, thus weakening the model's robustness.
[0005] Graph contrastive learning (GCL), as an emerging self-supervised learning framework, drives the model to learn discriminative low-dimensional embeddings by constructing positive and negative sample pairs, thereby overcoming the impact of data noise and high dimensionality on clustering. For example, GraphST introduces self-supervised graph contrastive learning to learn gene expression maps and their spatial coordinate information representations, enriching latent representations. ConGI applies contrastive learning to adapt gene expression to histopathological images, thereby resolving spatial domains. stDCL integrates spatial location and gene expression information, using spatially aware contrastive learning and cluster-level feature contrastive learning mechanisms to identify spatial domains in complex tissue structures. However, these GCL-based spatial domain recognition models typically only capture local or global information, making it difficult to balance the two, leading to a disconnect between spatial and expression information, blurred domain boundaries, and heavy reliance on the refinement process of clustering results.
[0006] Finally, multilayer integration faces significant methodological challenges. Inconsistencies in the coordinate systems of consecutive tissue slices create geometric integration barriers, and coordinate-dependent single-layer clustering methods (such as SpaGCN) cannot integrate multiple layers because they lack a cross-layer information transfer mechanism. Although some methods, such as GraphST and STitch3D, align spatial coordinates before integrating expression data to eliminate physical batch effects, rigid coordinate alignment alone is insufficient to mitigate these effects when slices exhibit significant physical deformation. Furthermore, methods that do not rely on multilayer spatial coordinate alignment, such as Splane and SEDR, struggle to eliminate technical biases between different sequencing platforms arising from differences in slice thickness and molecule capture efficiency.
[0007] Therefore, existing technologies still need to be improved and developed.
Summary of the Invention
[0008] To overcome the shortcomings of existing technologies, the present invention provides a method for spatial structure analysis and batch effect correction based on SRT data, which aims to solve the technical problems of spatial domain identification, multilayer integration, and batch effect correction for multilayer spatially resolved transcriptome data, and to achieve high-precision spatial structure analysis and cross-layer consistency maintenance.
[0009] The first aspect of this invention provides a method for spatial structure resolution and batch effect correction based on SRT data, comprising: acquiring gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data; performing principal component analysis based on the gene expression data of the multiple tissue slices and extracting the first 200 principal components as gene expression information; constructing a three-dimensional adjacency matrix based on the spatial coordinates of the sample points in each tissue slice; randomly masking rows of the gene expression information at a predetermined ratio to obtain a masked gene expression matrix, and then performing complementary masking on the gene expression information to obtain a complementary masked gene expression matrix; constructing a spatial structure resolution network model, wherein the model includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module; training the spatial structure resolution network model using the three-dimensional adjacency matrix, the masked gene expression matrix, and the complementary masked gene expression matrix as input data, and reconstruction loss, contrastive loss, and consistency loss as the overall loss function, to obtain a trained spatial structure resolution network model; and inputting the three-dimensional adjacency matrix A, the masked gene expression matrix, and the complementary masked gene expression matrix, obtained by preprocessing the gene expression data and spatial coordinates of the spatially resolved transcriptomics data to be analyzed, into the trained spatial structure resolution network model, which outputs a latent representation of spatial structure resolution features.
[0010] Optionally, in a first implementation of the first aspect of the present invention, performing principal component analysis based on the gene expression data of the multiple tissue slices and extracting the first 200 principal components as gene expression information includes the steps of: concatenating the gene expression data of the multiple tissue slices along the sample point dimension to obtain integrated gene expression data; filtering and normalizing the integrated gene expression data; and performing principal component analysis on the gene expression matrix of the top 2000 highly variable genes, extracting the first 200 principal components as gene expression information.
[0011] Optionally, in a second implementation of the first aspect of the present invention, constructing a three-dimensional adjacency matrix based on the spatial coordinates of the sample points in each tissue slice includes the steps of: spatially registering the sample points using the iterative closest point (ICP) algorithm, minimizing the three-dimensional Euclidean distance between sample points in adjacent slices, and establishing a three-dimensional coordinate system, wherein the tissue slice plane is defined as the XY plane and the Z-axis represents the distance between adjacent slices; and, if the three-dimensional Euclidean distance between two sample points is less than 1.1 times the nearest neighbor distance within the slice, establishing a topological connection, thereby forming the three-dimensional adjacency matrix.
[0012] Optionally, in a third implementation of the first aspect of the present invention, training the spatial structure resolution network model using the three-dimensional adjacency matrix, the masked gene expression matrix, and the complementary masked gene expression matrix as input data, and reconstruction loss, contrastive loss, and consistency loss as the total loss function, to obtain a trained spatial structure resolution network model, includes the steps of: inputting the masked gene expression matrix and the three-dimensional adjacency matrix into the first encoder to output a masked gene latent representation; inputting the masked gene latent representation, after re-masking, together with the three-dimensional adjacency matrix into the graph predictor to obtain a predicted gene latent representation; inputting the predicted gene latent representation into the decoder to obtain a reconstructed gene expression matrix; inputting the complementary masked gene expression matrix and the three-dimensional adjacency matrix into the second encoder to output a complementary masked gene latent representation; inputting the complementary masked gene latent representation and the predicted gene latent representation into the cross-mask latent consistency module to output a positive-negative sample pair similarity matrix; constructing a latent space consistency loss $L_{NCE}$ based on the positive-negative sample pair similarity matrix; constructing a reconstruction loss $L_{SCE}$ based on the reconstructed gene expression matrix; inputting the masked gene latent representation and the three-dimensional adjacency matrix into the spatial-semantic learning module, which outputs a hybrid neighborhood summary vector and contrastive learning sample pairs; constructing a contrastive loss $L_{BCE}$ based on the hybrid neighborhood summary vector and the contrastive learning sample pairs; and, based on the reconstruction loss $L_{SCE}$, the latent space consistency loss $L_{NCE}$, and the contrastive loss $L_{BCE}$, constructing the overall loss function $L = \lambda_1 L_{SCE} + \lambda_2 L_{NCE} + \lambda_3 L_{BCE}$ and training the spatial structure resolution network model to obtain a trained spatial structure resolution network model, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters that control the proportions of the reconstruction loss, the latent space consistency loss, and the contrastive loss in the total loss, respectively.
[0013] Optionally, in a fourth implementation of the first aspect of the present invention, the masked gene expression matrix and the three-dimensional adjacency matrix are input into the first encoder, and the output masked gene latent representation is re-masked and input together with the three-dimensional adjacency matrix into a graph predictor to obtain a predicted gene latent representation. This includes the following steps: the first encoder includes a feedforward neural network and several graph convolutional layers; the feedforward neural network maps the masked gene expression matrix to a semantic space; the graph convolutional layers perform convolution operations on the mapped masked gene expression matrix and the three-dimensional adjacency matrix to obtain the masked gene latent representation; the masked gene latent representation is re-masked, where the masked rows of the masked gene latent representation are the same as the masked rows in the gene expression information, to obtain a re-masked gene latent representation; the re-masked gene latent representation and the three-dimensional adjacency matrix are input into the graph predictor to output the predicted gene latent representation.
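The encode, re-mask, and predict sequence of this implementation can be sketched as a plain NumPy forward pass. This is only an illustrative sketch under stated assumptions: the symmetric normalization with self-loops, the ReLU activations, the single-layer graph predictor, and the names `normalize_adj`, `gcn_layer`, `encode_remask_predict`, and the `params` dictionary keys are all hypothetical choices that the text does not specify.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalise A with self-loops (assumed GCN-style normalisation)."""
    A_hat = np.asarray(A, dtype=float) + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

def encode_remask_predict(X_m, A, masked, params):
    """Return (masked gene latent representation, predicted gene latent representation)."""
    A_norm = normalize_adj(A)
    # Feedforward network maps the masked expression matrix to a semantic space.
    H = np.maximum(X_m @ params["W_ff"], 0.0)
    # Graph convolutional layers combine the mapped features with the adjacency.
    for W in params["W_gcn"]:
        H = gcn_layer(A_norm, H, W)
    Z_g = H
    # Re-mask the same rows that were masked in the gene expression information.
    Z_rm = Z_g.copy()
    Z_rm[masked] = params["latent_mask_token"]
    # Graph predictor (assumed here to be a single graph convolution).
    Z_p = gcn_layer(A_norm, Z_rm, params["W_pred"])
    return Z_g, Z_p
```

In a real model the weight matrices and the latent mask token would be trainable parameters; here they are ordinary arrays so that the data flow is easy to follow.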
[0014] Optionally, in a fifth implementation of the first aspect of the present invention, inputting the complementary masked gene latent representation and the predicted gene latent representation into the cross-mask latent consistency module to output a positive-negative sample pair similarity matrix, and constructing the latent space consistency loss $L_{NCE}$ based on the positive-negative sample pair similarity matrix, includes the steps of: extracting, from the complementary masked gene latent representation $Z_{cg}$ and the predicted gene latent representation $Z_p$ respectively, the latent vectors corresponding to the masked node set $V_m$ as positive sample pairs $(z_{p,i}, z_{cg,i})$; extracting from the complementary masked gene latent representation $Z_{cg}$ the latent vectors $z_{cg,j}$ of K non-masked nodes, and pairing them with $z_{p,i}$ to form negative sample pairs $(z_{p,i}, z_{cg,j})$; calculating the cosine similarity of each positive and negative sample pair to obtain an initial positive-negative sample pair similarity matrix; scaling this matrix by a temperature parameter $\tau$ to sharpen the similarity distribution and highlight the differences between positive and negative samples, obtaining a scaled positive-negative sample pair similarity matrix; applying softmax normalization to each row of the scaled matrix and calculating the log-likelihood of the positive sample pairs; and averaging the negated log-likelihoods to obtain the latent space consistency loss $L_{NCE}$.
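The latent space consistency loss described in this implementation has the form of an InfoNCE loss, and a minimal NumPy sketch may help make the steps concrete. It rests on assumptions that are not stated in the text: the same K negatives are shared by all masked nodes, negatives are drawn uniformly at random, and the function name `latent_consistency_loss` and its default values of K and tau are hypothetical.

```python
import numpy as np

def latent_consistency_loss(Z_p, Z_cg, masked_idx, unmasked_idx, K=4, tau=0.5, seed=0):
    """InfoNCE-style loss between predicted and complementary latent representations."""
    rng = np.random.default_rng(seed)

    def unit_rows(Z):
        # Row-normalise so that dot products equal cosine similarities.
        return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)

    Zp, Zc = unit_rows(np.asarray(Z_p, float)), unit_rows(np.asarray(Z_cg, float))
    masked_idx = np.asarray(masked_idx)
    unmasked_idx = np.asarray(unmasked_idx)
    # Assumption: K negatives sampled once and shared by every masked node.
    neg_idx = rng.choice(unmasked_idx, size=min(K, len(unmasked_idx)), replace=False)

    anchors = Zp[masked_idx]
    pos = np.sum(anchors * Zc[masked_idx], axis=1, keepdims=True)  # positive pairs
    neg = anchors @ Zc[neg_idx].T                                  # negative pairs
    # Similarity matrix with the positive in column 0, scaled by the temperature.
    logits = np.concatenate([pos, neg], axis=1) / tau
    # Row-wise log-softmax (shifted by the row maximum for numerical stability).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average of the negated log-likelihoods of the positive pairs.
    return float(-log_prob[:, 0].mean())
```

A smaller `tau` sharpens the softmax, so a positive pair that is only slightly more similar than the negatives is rewarded more strongly.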
[0015] Optionally, in a sixth implementation of the first aspect of the present invention, inputting the masked gene latent representation and the three-dimensional adjacency matrix into the spatial-semantic learning module, which outputs a hybrid neighborhood summary vector and contrastive learning sample pairs, and constructing the contrastive loss $L_{BCE}$ based on the summary vector and the contrastive learning sample pairs, includes the steps of: calculating the cosine similarity between sample points based on the masked gene latent representation and the three-dimensional adjacency matrix; selecting spatially adjacent and semantically co-clustered sample points to generate a hybrid neighborhood; aggregating the latent representations of the hybrid neighborhood, calculating their mean, and generating a summary vector by sigmoid activation; forming positive sample pairs from each sample point representation $z_{g,i}$ in the masked gene latent representation and its corresponding summary vector; perturbing $z_{g,i}$ with a corruption function and pairing the perturbed representation with the corresponding summary vector to form negative sample pairs; and calculating the similarity of the positive and negative sample pairs using a bilinear scoring function to construct the contrastive loss $L_{BCE}$.
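The spatial-semantic learning step can be illustrated with the following NumPy sketch. Several details are assumptions rather than statements from the text: semantic neighbors are approximated by the `k_sem` most cosine-similar sample points instead of an explicit clustering, the hybrid neighborhood is taken as the union of spatial and semantic neighbors plus the point itself, the corruption function is a random row shuffle, and the names `spatial_semantic_loss`, `k_sem`, and `W` (the bilinear weight matrix) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_semantic_loss(Z, A, W, k_sem=3, seed=0):
    """Return (binary cross-entropy contrastive loss, hybrid neighbourhood summaries)."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    # Cosine similarity between sample points in latent space.
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    sim = Zn @ Zn.T
    np.fill_diagonal(sim, -np.inf)
    # Assumed stand-in for co-clustering: the k_sem most similar points.
    top = np.argsort(-sim, axis=1)[:, :k_sem]
    sem = np.zeros((n, n), dtype=bool)
    sem[np.arange(n)[:, None], top] = True
    # Hybrid neighbourhood: spatial neighbours, semantic neighbours, and self.
    hybrid = (np.asarray(A) > 0) | sem | np.eye(n, dtype=bool)
    # Summary vector: sigmoid of the neighbourhood mean.
    S = sigmoid((hybrid / hybrid.sum(axis=1, keepdims=True)) @ Z)
    # Corruption (assumed): shuffle rows so each point is paired with a wrong summary.
    Z_neg = Z[rng.permutation(n)]
    # Bilinear scores z^T W s for positive and negative pairs.
    pos = sigmoid(np.sum((Z @ W) * S, axis=1))
    neg = sigmoid(np.sum((Z_neg @ W) * S, axis=1))
    eps = 1e-12
    loss = -0.5 * (np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean())
    return float(loss), S
```

Because the semantic neighbors are recomputed from the current latent representation, the hybrid neighborhood changes as training proceeds, which is one way to read the "dynamic" integration of local and global context.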
[0016] A second aspect of this invention provides a spatial structure analysis and batch effect correction device based on SRT data, comprising: a data acquisition module for acquiring gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data; an extraction module for performing principal component analysis based on the gene expression data of the multiple tissue slices and extracting the first 200 principal component gene expression matrices as gene expression information; an adjacency matrix construction module for constructing a three-dimensional adjacency matrix based on the spatial coordinates of sample points in each tissue slice; a masking module for randomly masking each row of the gene expression information according to a predetermined ratio to obtain a masked gene expression matrix, and then performing complementary masking on the gene expression information to obtain a complementary masked gene expression matrix; and a model construction module for constructing a spatial structure analysis network model. The spatial structure resolution network model includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module. A model training module is used to train the spatial structure resolution network model using the three-dimensional adjacency matrix, masked gene expression matrix, and complementary masked gene expression matrix as input data, and reconstruction loss, contrastive loss, and consistency loss as the overall loss function, to obtain a trained spatial structure resolution network model. 
A latent representation output module is used to input the three-dimensional adjacency matrix A, masked gene expression matrix, and complementary masked gene expression matrix obtained after preprocessing gene expression data and spatial coordinates from the spatially resolved transcriptomics data to be analyzed into the trained spatial structure resolution network model, and output the latent representation of spatial structure resolution features.
[0017] A third aspect of the present invention provides a spatial structure analysis and batch effect correction device based on SRT data, comprising: a memory and at least one processor, wherein the memory stores computer-readable instructions, and the memory and the at least one processor are interconnected via a circuit; the at least one processor invokes the computer-readable instructions in the memory to cause the spatial structure analysis and batch effect correction device based on SRT data to perform the various steps of the spatial structure analysis and batch effect correction method based on SRT data as described above.
[0018] A fourth aspect of the present invention provides a computer-readable storage medium storing computer-readable instructions that, when executed on a computer, cause the computer to perform the steps of the spatial structure analysis and batch effect correction method based on SRT data as described above.
[0019] Beneficial Effects: This invention provides a spatial structure analysis and batch effect correction method based on SRT data. It employs a cross-mask graph autoencoder to reconstruct gene expression features while preserving spatial relationships and mitigating identity mapping problems. The implicit constraints on latent representations are strengthened through a cross-mask latent consistency module, improving feature robustness. More importantly, effective multi-layer integration is achieved by dynamically integrating local and global contextual information through a spatial-semantic learning module. Extensive evaluations show that this invention outperforms eight commonly used methods on single-layer datasets, achieving robust batch effect correction in multi-layer integration while preserving biologically meaningful spatial architectures, highlighting the potential of this invention in advancing spatial transcriptomics analysis in complex tissue systems.
Attached Figure Description
[0020] Figure 1 is a flowchart of the spatial structure analysis and batch effect correction method based on SRT data provided in an embodiment of the present invention.
[0021] Figure 2 is a schematic diagram illustrating the principle of the spatial structure analysis and batch effect correction method based on SRT data of the present invention.
[0022] Figure 3 shows the results of SpaCross in spatial domain clustering and spatially variable gene (SVG) identification on the DLPFC dataset. A is a schematic diagram of the DLPFC dataset with manual cortical annotations (L1-L6: cortical layers; WM: white matter); B is a comparative evaluation of clustering performance on 12 tissue slices using ARI and ACC; C is a spatial domain visualization of slice 151675; D is a UMAP projection of the latent embeddings; E shows the SVGs identified by SpaCross in slice 151675, including layer-specific markers (such as PLP1 in WM); F is a cross-slice validation plot of conserved SVG spatial patterns (such as PLP1 and NEFL) in slice 151509; and G is a comparative evaluation of SVG spatial autocorrelation using Moran's I and Geary's C.
[0023] Figure 4 shows SpaCross robustly delineating tissue structures across experimental platforms. A shows the histopathological regions with H&E staining and manual annotations from the 10x Visium human breast cancer (BRCA) dataset; B shows a comparative evaluation of clustering performance using ARI and ACC on the BRCA dataset; C shows the spatial domain identification of tumor regions, highlighting the segmentation differences between SpaCross and the benchmark methods: the SpaCross results are highly consistent with the manual annotations, while the benchmark methods suffer from region fragmentation; D shows a comparative evaluation of SVG spatial autocorrelation using Moran's I and Geary's C; E shows the spatial domain visualization of the mouse primary visual cortex (MVC); F shows a UMAP visualization of the embeddings, capturing the linear topological structure of the cortical layers; G shows the dynamic evolution of the hybrid spatial-semantic graph during optimization; and H shows a quantitative benchmark on the mouse somatosensory cortex (MSC).
[0024] Figure 5 shows a comparative evaluation of multilayer integration methods on consecutive tissue sections. A shows clustering performance evaluated using ARI and ACC, spatial coherence measured by DIS, and integration efficiency evaluated by F1LISI, balancing batch effect correction and biomarker preservation; B shows the spatial domains identified by SpaCross, SPIRAL, STitch3D, and STAligner in the integrated sections; and C shows UMAP embedding plots of batch mixing (top), manual annotation (middle), and the spatial domains detected by each method (bottom).
[0025] Figure 6 is a schematic diagram of the spatial structure analysis and batch effect correction device based on SRT data provided in an embodiment of the present invention.
[0026] Figure 7 is a schematic diagram of the spatial structure analysis and batch effect correction device based on SRT data provided in an embodiment of the present invention.
Detailed Implementation
[0027] This invention provides a method for spatial structure analysis and batch effect correction based on SRT data. The terms "first," "second," "third," "fourth," etc. (if present) in the specification, claims, and accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in a sequence other than that illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0028] Existing technologies mainly employ non-spatial clustering methods (such as K-means and Louvain) and spatial clustering methods (such as SpaGCN, SEDR, and STAGATE) to analyze the spatial structure of multi-layer spatially resolved transcriptome data. Among them, spatial clustering methods integrate spatial topology and gene expression features by introducing architectures such as graph neural networks and autoencoders, but they still have the following significant limitations.
[0029] Unsupervised learning constraints: Traditional methods rely entirely on unsupervised learning, leading to inconsistencies between embedded representations and pathological annotations. For example, SpaGCN is based solely on adjacency matrix reconstruction, lacking implicit supervision of the latent space.
Imbalance between local and global information: Existing models (such as stDCL and GraphST) struggle to balance local spatial continuity with global semantic consistency, resulting in blurred domain boundaries. For instance, DeepST exhibits discontinuities in layer transition regions.
Multilayer integration challenges: Inconsistent coordinate systems, physical deformations, and platform differences among consecutive slices make batch effect correction difficult. For example, STitch3D relies on rigid coordinate alignment and cannot handle significant deformations; SEDR struggles to eliminate cross-platform technical biases.
Insufficient feature robustness: Existing self-supervised learning methods (such as STMGraph and SpaMask) limit their supervision signals to explicit reconstruction of the original feature space, ignoring latent space constraints, resulting in poor robustness to noise and missing data.
[0030] Based on this, the present invention provides a method for spatial structure analysis and batch effect correction based on SRT data. As shown in Figure 1, the method includes the following steps:
[0031] S10. Obtain gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data;
[0032] Specifically, the spatially resolved transcriptomics (SRT) data is acquired through high-throughput sequencing technology. It records the gene expression profile of each spot in a tissue section, and the location information (such as coordinates) of each spot corresponds one-to-one with its physical location in the histological image. For example, a tissue section is placed on a Visium chip, the chip surface of which is covered with capture regions containing oligonucleotide probes (each region is called a "spot"). These oligonucleotide probes capture the mRNA released from the section and assign a unique spatial barcode to each spot. The gene expression profile of each spot is obtained through sequencing, and the expression data is mapped to the corresponding spatial coordinates using the probe barcodes, thus obtaining the spatially resolved transcriptomics data, which includes both gene expression data and the spatial coordinates of the spot.
[0033] S20. Based on the gene expression data of the multiple tissue sections, perform principal component analysis and extract the first 200 principal components as gene expression information;
[0034] Specifically, only the genes shared across all tissue slices are retained. The gene expression data from the multiple tissue slices are then concatenated along the sample point dimension to obtain integrated gene expression data; this concatenation step is unnecessary if there is only one slice. Next, the Scanpy tool is used to filter out uninformative genes and to normalize the entire gene expression dataset. Finally, principal component analysis (PCA) is performed on the gene expression matrix of the top 2000 highly variable genes, and the first 200 principal components are extracted as the gene expression information used as model input.
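The preprocessing steps above can be sketched in plain NumPy. The embodiment uses Scanpy; this sketch deliberately substitutes simple stand-ins so that it stays self-contained, and those stand-ins are assumptions: library-size normalization followed by `log1p`, ranking genes by variance in place of a proper highly variable gene selection, and PCA computed through an SVD. The function name `preprocess_slices` is hypothetical.

```python
import numpy as np

def preprocess_slices(slices, n_hvg=2000, n_pcs=200):
    """slices: list of (spots x genes) count arrays that share the same gene columns."""
    # Concatenate the slices along the sample point (row) dimension.
    X = np.concatenate(slices, axis=0).astype(float)
    # Filter out uninformative genes (here: genes with zero counts everywhere).
    X = X[:, X.sum(axis=0) > 0]
    # Assumed normalisation: scale each spot to 10,000 counts, then log1p.
    lib = X.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0
    X = np.log1p(X / lib * 1e4)
    # Assumed stand-in for highly variable gene selection: highest variance genes.
    n_hvg = min(n_hvg, X.shape[1])
    top = np.argsort(X.var(axis=0))[::-1][:n_hvg]
    X = X[:, top]
    # PCA via SVD of the centred matrix; keep the leading components.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    n_pcs = min(n_pcs, S.shape[0])
    return U[:, :n_pcs] * S[:n_pcs]
```

With the defaults this mirrors the 2000-gene and 200-component settings in the text; both are capped automatically when the input is smaller.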
[0035] S30. Construct a three-dimensional adjacency matrix based on the spatial coordinates of each tissue slice sample point;
[0036] Specifically, to fully integrate spatial information, this embodiment first employs the Iterative Closest Point (ICP) algorithm to spatially register the sample points, minimizing the three-dimensional Euclidean distance between sample points in adjacent slices, and thereby constructs a three-dimensional coordinate system. The tissue slice plane is defined as the XY plane, and the Z-axis represents the distance between adjacent slices. Adjacency is determined by a dynamic threshold: if the three-dimensional Euclidean distance between two sample points is less than 1.1 times the nearest neighbor distance within a slice, a topological connection is established, forming a three-dimensional adjacency matrix A. If sample point j is a neighbor of sample point i, then $A_{ij} = A_{ji} = 1$. The constructed three-dimensional adjacency matrix is then used in every graph neural network step.
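The thresholding rule can be sketched as follows, assuming the coordinates have already been registered (the ICP step is not reproduced here). One further assumption is how "the nearest neighbor distance within a slice" is summarized: this sketch uses the median of the per-point in-slice nearest neighbor distances as a single threshold. The function name `build_3d_adjacency` is hypothetical.

```python
import numpy as np

def build_3d_adjacency(coords, slice_ids, factor=1.1):
    """coords: (n, 3) registered x, y, z coordinates; slice_ids: (n,) slice label per point."""
    coords = np.asarray(coords, dtype=float)
    slice_ids = np.asarray(slice_ids)
    # Dense pairwise 3D Euclidean distances (only suitable for small examples).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # no self-connections
    # Nearest neighbour distance of each point, restricted to its own slice.
    same_slice = slice_ids[:, None] == slice_ids[None, :]
    nn = np.where(same_slice, dist, np.inf).min(axis=1)
    # Assumption: one global threshold from the median in-slice distance.
    threshold = factor * np.median(nn[np.isfinite(nn)])
    # Connect any two points, within or across slices, closer than the threshold.
    return (dist < threshold).astype(int)
```

Because the distance matrix is symmetric, the resulting matrix satisfies A[i, j] == A[j, i] without any extra symmetrization step.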
[0037] S40. The gene expression information is randomly masked row by row according to a predetermined ratio to obtain a masked gene expression matrix. Then, the gene expression information is complementarily masked to obtain a complementary masked gene expression matrix.
[0038] Specifically, as shown in Figure 2, before training the model, the gene expression information is randomly masked row by row according to a predetermined ratio to obtain the masked gene expression matrix $X_m$. Then, the gene expression information is complementarily masked to obtain the complementary masked gene expression matrix $X_{cm}$.
[0039] Taking gene expression information containing five sample points (sample points 1 to 5) as an example, the gene expression corresponding to sample points 2 and 4 is first masked to obtain the masked gene expression matrix $X_m$. Then, the gene expression corresponding to sample points 1, 3, and 5 is complementarily masked to obtain the complementary masked gene expression matrix $X_{cm}$. These matrices are used to generate latent representations and to provide supervision signals for those representations. Specifically, a mask subset $V_m \subset V$ is randomly sampled from the set $V$ of all sample points at a mask rate $\rho$, and the complementary mask subset is denoted $\bar{V}_m$, such that $V_m \cup \bar{V}_m = V$ and $V_m \cap \bar{V}_m = \emptyset$.
[0040] Based on this point-masking mechanism, this embodiment constructs the masked gene expression matrix in order to avoid the identity-mapping problem. Specifically, for any sample point $v_i$, if $v_i \in V_m$, the corresponding feature vector is replaced with a learnable mask token, that is, $x_{m,i} = x_{[M]}$; otherwise, $x_{m,i} = x_i$.
[0041] Similarly, this embodiment constructs the complementary masked gene expression matrix to provide a continuous supervision signal in the latent space. It is defined as follows: if $v_i \in \bar{V}_m$, the corresponding feature vector is replaced with the mask token, i.e., $x_{cm,i} = x_{[M]}$; otherwise, $x_{cm,i} = x_i$.
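The complementary masking of step S40 can be illustrated with the minimal sketch below. In the model the mask token is learnable; here it is a fixed placeholder vector (zeros by default), and the function name `complementary_masks` and the `seed` argument are assumptions introduced only for this example.

```python
import numpy as np

def complementary_masks(x, mask_rate=0.5, mask_token=None, seed=0):
    """Return (X_m, X_cm, masked) for a row-wise random mask and its complement."""
    rng = np.random.default_rng(seed)
    n, d = x.shape

    # Randomly choose the masked rows (sample points) at the given mask rate.
    n_mask = int(round(mask_rate * n))
    masked = np.zeros(n, dtype=bool)
    masked[rng.permutation(n)[:n_mask]] = True

    # Placeholder for the learnable mask token x_[M] (fixed here, an assumption).
    token = np.zeros(d) if mask_token is None else np.asarray(mask_token, dtype=float)

    x_m = x.astype(float).copy()
    x_m[masked] = token        # masked view: rows in the mask subset are replaced
    x_cm = x.astype(float).copy()
    x_cm[~masked] = token      # complementary view: all remaining rows are replaced
    return x_m, x_cm, masked
```

Every row is visible in exactly one of the two views, which is what allows one view to supervise the masked positions of the other.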
[0042] S50. Construct a spatial structure parsing network model, which includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module.
[0043] S60. Using the three-dimensional adjacency matrix, the masked gene expression matrix, and the complementary masked gene expression matrix as input data, and using the reconstruction loss, contrastive loss, and consistency loss as the total loss function, the spatial structure parsing network model is trained to obtain the trained spatial structure parsing network model.
[0044] Specifically, as shown in Figure 2, the first encoder includes a feedforward neural network (FNN) and several graph convolutional network (GCN) layers. The feedforward neural network maps the masked gene expression matrix to a semantic space. The graph convolutional layers then perform convolution operations on the mapped masked gene expression matrix and the three-dimensional adjacency matrix to obtain the masked gene latent representation. Specifically, the first encoder takes the three-dimensional adjacency matrix $A$ and the masked gene expression matrix $X_m$ as input and outputs the masked gene latent representation $Z_g \in \mathbb{R}^{N \times d}$, where $N$ is the number of sample points and $d$ is the dimension of the latent space. For the $l$-th layer of the FNN, the input is $H_f^{(l-1)}$ and the output features $H_f^{(l)}$ are given by the following formula:
$$H_f^{(l)} = \mathrm{ELU}\left(\mathrm{BN}\left(H_f^{(l-1)} W_f^{(l)}\right)\right), \quad l \in \{1, \dots, L\},$$
where $W_f^{(l)}$ is the weight matrix of the $l$-th layer. Furthermore, $L = 2$, indicating that two feedforward layers are used, and the output of the last (second) layer is the output of the entire feedforward encoder. The input of the feedforward encoder is the masked feature matrix, $H_f^{(0)} = X_m$.
[0045] $H_f$ is the output of the final layer of the feedforward encoder, ELU is the exponential linear unit activation function, and BN denotes batch normalization.
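[0046] Then, through the message-passing mechanism of the graph convolutional layers, each masked node learns features from its unmasked neighboring sample points, mathematically represented as follows:
$$H_g^{(l)} = \sigma\left(\tilde{A}\, H_g^{(l-1)} W_g^{(l)}\right),$$
where $H_g^{(0)} = H_f$ is the output of the feedforward encoder, $W_g^{(l)}$ are the weights of the $l$-th GCN layer, $\sigma$ is a nonlinear activation function, and $\tilde{A}$ is the symmetric normalized adjacency matrix, defined as $\tilde{A} = D^{-1/2} A D^{-1/2}$ with $D$ the diagonal degree matrix of $A$.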
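A forward pass through an encoder of this kind (FNN layers with batch normalization and ELU, followed by GCN layers on the symmetric normalized adjacency matrix) may be sketched as below. The absence of bias terms, the use of ELU between GCN layers, the linear last GCN layer, and batch normalization without learnable scale and shift are assumptions of this sketch; all function names are hypothetical.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Exponential linear unit; the minimum avoids overflow in the unused branch.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def batch_norm(x, eps=1e-5):
    # Per-feature standardization over the batch (no learnable scale/shift here).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def normalize_adjacency(adj):
    # Symmetric normalization D^{-1/2} A D^{-1/2}; zero-degree rows stay zero.
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encoder_forward(adj, x_m, fnn_weights, gcn_weights):
    """Map (A, X_m) to a latent representation Z_g (illustrative only)."""
    h = x_m
    for w in fnn_weights:                      # feedforward part: ELU(BN(H W))
        h = elu(batch_norm(h @ w))
    a_norm = normalize_adjacency(adj)
    for k, w in enumerate(gcn_weights):        # graph part: A_norm H W
        h = a_norm @ h @ w
        if k < len(gcn_weights) - 1:           # activation between layers (assumption)
            h = elu(h)
    return h
```

With weight shapes (200, 64), (64, 32) for the FNN and (32, 64), (64, 16) for the GCN, the output has 16 columns, in line with the layer dimensions given in the specific embodiment.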
[0047] Next, the masked gene latent representation is re-masked to obtain the re-masked gene latent representation; the rows that are re-masked are the same rows that were masked in the gene expression information. The re-masked gene latent representation and the three-dimensional adjacency matrix are then input into the graph predictor, which outputs the predicted gene latent representation.
[0048] To improve self-supervised learning on the masked gene latent representation, this embodiment proposes a graph predictor for latent-space self-supervision. The graph predictor takes the re-masked gene latent representation $Z_m$ and the three-dimensional adjacency matrix $A$ as input and generates the predicted gene latent representation $Z_p$. The re-masked gene latent representation $Z_m$ is obtained by applying a re-masking technique to the mask node set $V_m$, that is, to the sample points in the latent space that correspond to masked inputs. Specifically, for node $v_i$, if $v_i \in V_m$, its latent representation is replaced with a learnable mask token, that is, $z_{m,i} = z_{[RM]}$; otherwise, $z_{m,i} = z_{g,i}$. The predicted gene latent representation $Z_p$ is calculated using the weight matrix $W_p$ as follows:
$$Z_p = \tilde{A}\, Z_m W_p,$$
where $\tilde{A}$ is the symmetric normalized adjacency matrix. $Z_p$ is then self-supervised by the complementary representation and used to reconstruct the original features.
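The re-masking and prediction step can be illustrated as follows. The sketch assumes a single linear graph-convolution layer for the predictor and a fixed vector standing in for the learnable re-mask token $z_{[RM]}$; the function name `remask_and_predict` and its argument names are hypothetical.

```python
import numpy as np

def remask_and_predict(z_g, masked, a_norm, w_p, remask_token):
    """Re-mask latent rows of masked points, then apply one graph layer (illustrative)."""
    z_m = np.asarray(z_g, dtype=float).copy()
    # Rows that were masked at the input are replaced by the re-mask token.
    z_m[masked] = remask_token
    # Single graph-convolution layer: neighbors fill in the re-masked positions.
    z_p = a_norm @ z_m @ w_p
    return z_m, z_p
```

Because the re-masked rows carry no information of their own, the prediction at those rows depends only on the neighboring rows propagated through the normalized adjacency matrix.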
[0049] After the predicted gene latent representation is obtained, it is input into the decoder to obtain the reconstructed gene expression matrix. Then, a reconstruction loss $L_{SCE}$ is constructed based on the reconstructed gene expression matrix.
[0050] Specifically, the decoder maps $Z_p$ back to the original data space to reconstruct the original features, yielding the reconstructed gene expression matrix $\hat{X}$. The calculation formula is as follows:
$$\hat{X} = \tilde{A}\, Z_p W_d,$$
where $\tilde{A}$ is the symmetric normalized adjacency matrix and $W_d$ is the weight matrix. One of the main objectives of the reconstruction loss $L_{SCE}$ in this embodiment is to reconstruct the masked features of the points in $V_m$, given a set of partially observed points and their adjacency relationships. The scaled cosine error (SCE) is used as the objective function, defined under a predetermined scaling factor $\gamma$ as follows:
$$L_{SCE} = \frac{1}{|V_m|} \sum_{v_i \in V_m} \left(1 - \mathrm{sim}(x_i, \hat{x}_i)\right)^{\gamma}.$$
Here, $\gamma$ is set to 2 to reduce the contribution of easy samples during training, and $|V_m|$ is the number of elements in the mask set. The cosine similarity $\mathrm{sim}(\cdot,\cdot)$ is calculated as follows:
$$\mathrm{sim}(x, y) = \frac{x^{T} y}{\|x\|\,\|y\|},$$
where $T$ denotes the transpose and $x$ and $y$ are vectors.
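A minimal sketch of the scaled cosine error, evaluated only on the masked sample points, is given below. The function name `sce_loss` and the small constant `eps` guarding against division by zero are assumptions added for the example.

```python
import numpy as np

def sce_loss(x, x_hat, masked, gamma=2.0, eps=1e-12):
    """Scaled cosine error averaged over the masked rows (illustrative only)."""
    a = np.asarray(x, dtype=float)[masked]
    b = np.asarray(x_hat, dtype=float)[masked]
    # Row-wise cosine similarity between original and reconstructed features.
    cos = (a * b).sum(axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    )
    # (1 - cos)^gamma down-weights rows that are already well reconstructed.
    return float(np.mean((1.0 - cos) ** gamma))
```

With $\gamma = 2$, a row reconstructed with cosine similarity 0.9 contributes 0.01 to the loss, whereas a row with similarity 0 contributes 1, so poorly reconstructed points dominate the gradient.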
[0051] In this embodiment, the second encoder has the same structure as the first encoder. The complementary masked gene expression matrix and the three-dimensional adjacency matrix are input into the second encoder, which outputs the complementary mask gene latent representation. Specifically, as shown in Figure 2, label-free self-supervised learning models, such as those based on autoencoder structures, risk overfitting the training data. To address this limitation, this embodiment designs a Cross-Mask Latent Consistency (CMLC) module to provide continuous supervision signals for each masked point in the latent space. The CMLC framework uses the second encoder to implement a complementary masking strategy: the encoder processes the complementary masked feature matrix $X_{cm}$ together with the three-dimensional adjacency matrix $A$ to generate the complementary mask gene latent representation $Z_{cg}$. The complementary mask gene latent representation $Z_{cg}$ then guides the self-supervised matching of the predicted gene latent representation $Z_p$ in the latent space, providing a continuous supervisory signal.
[0052] In this embodiment, after the complementary mask gene latent representation and the predicted gene latent representation are obtained, both are input into the cross-mask latent consistency module, which outputs a positive-negative sample pair similarity matrix. Based on this similarity matrix, a latent space consistency loss $L_{NCE}$ is constructed. Specifically, to enforce semantic consistency between the predicted gene latent representation $Z_p$ and the complementary mask gene latent representation $Z_{cg}$, this embodiment uses the InfoNCE (noise-contrastive estimation, NCE) loss as the learning objective of the CMLC module. This loss operates on the mask node set $V_m$ and aligns the dual-view latent space by contrasting node-specific agreement against perturbed negative samples.
[0053] Formally, for each mask node $v_i \in V_m$, this embodiment extracts the latent vectors corresponding to $v_i$ from the predicted gene latent representation $Z_p$ and the complementary mask gene latent representation $Z_{cg}$ as a positive sample pair $(z_{p,i}, z_{cg,i})$. Then, the latent vectors $z_{cg,j}$ of $K$ non-masked nodes are extracted from the complementary mask gene latent representation $Z_{cg}$ and paired with $z_{p,i}$ to form negative sample pairs $(z_{p,i}, z_{cg,j})$.
[0054] Cosine similarity is calculated for each positive and negative sample pair to obtain an initial positive-negative sample pair similarity matrix. The initial similarity matrix is then scaled by a temperature parameter $\tau$ to sharpen the similarity distribution and highlight the differences between positive and negative samples, resulting in a scaled similarity matrix. Each row of the scaled similarity matrix is then softmax-normalized, and the log-likelihood of the positive sample pair is calculated. The negatives of these log-likelihoods are averaged to obtain the latent space consistency loss $L_{NCE}$:
$$L_{NCE} = -\frac{1}{|V_m|} \sum_{v_i \in V_m} \log \frac{\exp\left(\mathrm{sim}(z_{p,i}, z_{cg,i})/\tau\right)}{\exp\left(\mathrm{sim}(z_{p,i}, z_{cg,i})/\tau\right) + \sum_{j=1}^{K} \exp\left(\mathrm{sim}(z_{p,i}, z_{cg,j})/\tau\right)}.$$
[0055] Here, τ = 0.5 is a temperature parameter used to sharpen the similarity distribution.
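The procedure of computing cosine similarities, scaling by the temperature, applying a row-wise softmax, and averaging the negative log-likelihood of the positive pair may be sketched as below. Sampling the $K$ negatives uniformly at random from the non-masked nodes, the function name `info_nce_loss`, and the `seed` argument are assumptions of this sketch; it also assumes that at least one non-masked node exists.

```python
import numpy as np

def info_nce_loss(z_p, z_cg, masked, k_neg=5, tau=0.5, seed=0):
    """InfoNCE over masked nodes with K sampled negatives (illustrative only)."""
    rng = np.random.default_rng(seed)

    def unit(v):
        return v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)

    zp, zc = unit(np.asarray(z_p, float)), unit(np.asarray(z_cg, float))
    neg_pool = np.flatnonzero(~masked)
    losses = []
    for i in np.flatnonzero(masked):
        # K negatives drawn from the non-masked nodes (uniform sampling is assumed).
        neg = rng.choice(neg_pool, size=min(k_neg, neg_pool.size), replace=False)
        # Row of the scaled similarity matrix: positive pair first, then negatives.
        logits = np.concatenate([[zp[i] @ zc[i]], zc[neg] @ zp[i]]) / tau
        logits = logits - logits.max()              # numerical stability
        log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
        losses.append(-log_prob_pos)
    return float(np.mean(losses))
```

A lower temperature sharpens the softmax, so a positive pair that is only slightly more similar than the negatives already receives most of the probability mass.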
[0056] In this embodiment, the masked gene latent representation and the three-dimensional adjacency matrix are further input into the spatial-semantic learning module, which outputs hybrid neighborhood summary vectors and contrastive learning sample pairs. Based on these, a contrastive loss $L_{BCE}$ is constructed. In this embodiment, the cosine similarity between sample points is calculated based on the masked gene latent representation and the three-dimensional adjacency matrix. Spatially adjacent and semantically co-clustered sample points are selected to generate a hybrid neighborhood. The latent representations of the hybrid neighborhood are aggregated by taking their mean, and a summary vector is generated by applying a sigmoid activation. The latent representation $z_{g,i}$ of each sample point and its corresponding summary vector form a positive sample pair; $z_{g,i}$ is perturbed by a corruption function, and the perturbed representation is paired with the corresponding summary vector to form a negative sample pair. The similarity of the positive and negative sample pairs is calculated using a bilinear scoring function, and the contrastive loss $L_{BCE}$ is constructed.
[0057] Specifically, for any target sample point $v_i$, to construct a hybrid spatial-semantic neighborhood, this embodiment first forms a candidate nearest-neighbor set $\mathcal{N}_{cand}(i)$. This set consists of a cross-layer candidate set $\mathcal{N}_{across}(i)$, drawn from the same slice as $v_i$, and an inter-layer candidate set $\mathcal{N}_{inter}(i)$, drawn from the other slices, that is,
[0058] $$\mathcal{N}_{cand}(i) = \mathcal{N}_{across}(i) \cup \mathcal{N}_{inter}(i).$$
[0059] Let $T(i) = t$ denote the slice to which sample point $v_i$ belongs, and let $z_i$ denote its latent representation. First, the cross-layer candidate set is constructed by computing the cosine similarity $\mathrm{sim}(z_i, z_j)$ between $v_i$ and every other sample point $v_j \in V$ within the same slice (i.e., satisfying $T(j) = t$). During the separate inference phase, the first encoder encodes the feature matrix $X$ with the three-dimensional adjacency matrix $A$ to obtain the latent graph embedding matrix $Z$. Then, the $K_{across}$ sample points with the highest similarity to $v_i$ are selected to form the cross-layer candidate set (for a single slice, the candidate set consists of the cross-layer candidate set only), formally defined as:
[0060] $$\mathcal{N}_{across}(i) = \operatorname{Top}_{K_{across}} \left\{ \mathrm{sim}(z_i, z_j) : j \neq i,\ T(j) = t \right\}.$$
[0061] For the inter-layer candidate set, this embodiment considers all sample points from slices other than slice $T(i)$. That is, for each sample point $v_j$ in another slice (i.e., satisfying $T(j) \neq t$), the cosine similarity between $v_i$ and $v_j$ is calculated, and the $K_{inter}$ sample points with the highest similarity are selected. Formally:
$$\mathcal{N}_{inter}(i) = \operatorname{Top}_{K_{inter}} \left\{ \mathrm{sim}(z_i, z_j) : T(j) \neq t \right\}.$$
[0062] To ensure that candidate sample points are not only similar in the latent semantic space but also spatially continuous, spatial constraints are imposed on the candidate nearest-neighbor set, denoted $\mathcal{N}_{cand}(i)$, and the spatially constrained neighborhood set $\mathcal{N}_{spa}(i)$ is defined as:
$$\mathcal{N}_{spa}(i) = \mathcal{N}_{cand}(i) \cap \mathcal{N}_{loc}(i),$$
where $\mathcal{N}_{loc}(i)$ denotes the local neighborhood set of sample point $v_i$.
[0063] To capture global semantic consistency, this embodiment applies the k-means clustering algorithm to the latent representation $Z$, dividing all points in the semantic space into several clusters, and the semantically similar neighborhood set $\mathcal{N}_{sem}(i)$ is defined as:
$$\mathcal{N}_{sem}(i) = \mathcal{N}_{cand}(i) \cap \mathcal{C}(i),$$
where $\mathcal{C}(i)$ denotes the set of points belonging to the same cluster as sample point $v_i$. When sample point $v_i$ is assigned to cluster $c$, $\mathcal{C}(i)$ is the set of all points assigned to cluster $c$.
[0064] Finally, this embodiment integrates the spatially constrained neighborhood set $\mathcal{N}_{spa}(i)$ and the semantically similar neighborhood set $\mathcal{N}_{sem}(i)$ to form the spatial-semantic hybrid neighborhood $\mathcal{N}_{hyb}(i) = \mathcal{N}_{spa}(i) \cup \mathcal{N}_{sem}(i)$. The hybrid neighborhood not only preserves local spatial continuity but also emphasizes global semantic consistency, thus providing richer and more refined neighborhood information for clustering tasks.
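One possible reading of the hybrid neighborhood construction is sketched below: candidates are the most similar points within the same slice and in other slices, and a candidate is kept if it is either a spatial neighbor in the adjacency matrix or a member of the same cluster. Taking the local neighborhood from the adjacency matrix, combining the two filtered sets by union, and supplying cluster labels precomputed by any k-means routine are assumptions of this sketch, as is the function name `hybrid_neighborhoods`.

```python
import numpy as np

def hybrid_neighborhoods(z, slice_ids, adj, cluster_labels, k_across=15, k_inter=15):
    """Return, for each point, the sorted indices of its hybrid neighbors (illustrative)."""
    z = np.asarray(z, dtype=float)
    slice_ids = np.asarray(slice_ids)
    n = z.shape[0]
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    sim = zn @ zn.T                                  # pairwise cosine similarity

    hybrid = []
    for i in range(n):
        same = np.flatnonzero((slice_ids == slice_ids[i]) & (np.arange(n) != i))
        other = np.flatnonzero(slice_ids != slice_ids[i])
        cand = set()
        # Top-K most similar candidates within the slice and across other slices.
        for pool, k in ((same, k_across), (other, k_inter)):
            if pool.size:
                top = pool[np.argsort(sim[i, pool])[::-1][:k]]
                cand.update(top.tolist())
        # Spatial constraint: candidate is adjacent to i in the 3D graph (assumption).
        spatial = {j for j in cand if adj[i][j]}
        # Semantic constraint: candidate shares the cluster of i.
        semantic = {j for j in cand if cluster_labels[j] == cluster_labels[i]}
        hybrid.append(sorted(spatial | semantic))
    return hybrid
```

For a single slice the inter-layer pool is empty, so the candidates come from the within-slice set only, which mirrors the single-slice remark in the description.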
[0065] This embodiment extracts an integrated node embedding matrix by means of the adaptive hybrid spatial-semantic nearest neighbors; this matrix captures detailed spatial and semantic features. Specifically, for each sample point $v_i$, the aggregated summary vector $s_i$ is computed by applying a neighborhood aggregation function to its hybrid neighborhood $\mathcal{N}_{hyb}(i)$:
$$s_i = \mathrm{sigmoid}\left(\frac{1}{|\mathcal{N}_{hyb}(i)|} \sum_{v_j \in \mathcal{N}_{hyb}(i)} z_{g,j}\right),$$
where $|\mathcal{N}_{hyb}(i)|$ is the number of neighbors in the hybrid set.
[0066] The summary vector $s_i$ is used as the anchor for aligning point embeddings. Specifically, a positive sample pair $(z_{g,i}, s_i)$ consists of the latent embedding $z_{g,i}$ of a sample point and its corresponding summary vector $s_i$. To generate negative samples, the original embedding is perturbed using a corruption function to obtain $\tilde{z}_{g,i}$; the pair $(\tilde{z}_{g,i}, s_i)$ is therefore used as a negative sample pair.
[0067] This embodiment strengthens the alignment in the embedding space by maximizing the mutual information between node embeddings and their corresponding summary vectors, while mitigating the collapse phenomenon. This alignment is enforced using a contrastive objective in the form of a binary cross-entropy (BCE) loss:
$$L_{BCE} = -\frac{1}{2N} \sum_{i=1}^{N} \left[ \log D(z_{g,i}, s_i) + \log\left(1 - D(\tilde{z}_{g,i}, s_i)\right) \right],$$
where $N$ is the number of sample points and the discriminator $D(\cdot,\cdot)$ is a bilinear scoring function. This objective not only ensures that each point embedding $z_{g,i}$ is highly informative with respect to its aggregated summary vector $s_i$, but also robustly distinguishes corrupted embeddings, thereby significantly improving the clustering performance of the model.
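The summary vectors and the binary cross-entropy contrastive objective can be sketched as below. The sketch assumes that the discriminator is a bilinear form followed by a sigmoid, that the corruption function is a random permutation of the embedding rows, and that a point with an empty hybrid neighborhood falls back to its own embedding; the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_contrastive_loss(z_g, hybrid, w_bilinear, seed=0, eps=1e-12):
    """BCE loss between embeddings and neighborhood summaries (illustrative only)."""
    rng = np.random.default_rng(seed)
    z_g = np.asarray(z_g, dtype=float)
    n = z_g.shape[0]

    # Summary vector: sigmoid of the mean embedding over the hybrid neighborhood.
    s = np.stack([
        sigmoid(z_g[nb].mean(axis=0)) if len(nb) else sigmoid(z_g[i])
        for i, nb in enumerate(hybrid)
    ])

    # Corruption: shuffle embedding rows so each summary meets a wrong embedding.
    z_corrupt = z_g[rng.permutation(n)]

    def score(z):
        # Bilinear discriminator D(z, s) = sigmoid(z^T W s), one score per point.
        return sigmoid(np.einsum("nd,de,ne->n", z, w_bilinear, s))

    pos = np.log(score(z_g) + eps)                  # real pairs should score high
    neg = np.log(1.0 - score(z_corrupt) + eps)      # corrupted pairs should score low
    return float(-(pos.mean() + neg.mean()) / 2.0)
```

With an all-zero bilinear matrix every score equals 0.5 and the loss equals $\log 2$, which is the value expected from an uninformative discriminator.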
[0068] Finally, based on the reconstruction loss $L_{SCE}$, the latent space consistency loss $L_{NCE}$, and the contrastive loss $L_{BCE}$, this embodiment constructs the overall loss function $L = \lambda_1 L_{SCE} + \lambda_2 L_{NCE} + \lambda_3 L_{BCE}$ and trains the spatial structure parsing network model to obtain the trained spatial structure parsing network model, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters controlling the proportions of the reconstruction loss, the latent space consistency loss, and the contrastive loss in the total loss, respectively. The three components play distinct roles: the reconstruction loss $L_{SCE}$ measures the reconstruction error of the masked features; the latent space consistency loss $L_{NCE}$ ensures the consistency of the latent representations; and the contrastive loss $L_{BCE}$ optimizes the similarity and dissimilarity between samples.
[0069] S70. The gene expression data and spatial coordinates in the spatially resolved transcriptomics data to be analyzed are preprocessed to obtain the three-dimensional adjacency matrix $A$, the masked gene expression matrix, and the complementary masked gene expression matrix, which are input into the trained spatial structure parsing network model, and the spatial structure resolution feature latent representation is output.
[0070] The spatial structure analytical feature latent representation obtained by the method of this invention retains both spatial continuity and clustering accuracy. The Mclust algorithm can be used to perform spatial clustering on the spatial structure analytical feature latent representation. This algorithm automatically determines the optimal number of clusters by maximizing the fit of a Gaussian mixture model, and each cluster represents a different spatial domain.
[0071] Specifically, the method of this invention aims to improve the accuracy and cross-layer consistency of spatial pattern recognition in spatial transcriptomics data. To support the integrated analysis of multi-layer spatial transcriptomics data, this invention first performs data preprocessing, integrating the multi-layer gene expression matrices along the sample-point dimension; the data is then filtered and normalized to remove low-quality genes, retaining only highly variable genes for subsequent modeling. Next, principal component analysis (PCA) is applied for dimensionality reduction to reduce computational complexity. Subsequently, a 3D spatial registration method is employed, using the Iterative Closest Point (ICP) algorithm to align the spatial coordinates of different layers and dynamically constructing a three-dimensional adjacency matrix to ensure the continuity of spatial relationships. Based on the aligned three-dimensional spatial coordinates, Euclidean distances are calculated to construct a k-nearest neighbor (k-NN) graph, thus forming the topological structure of the spatial graph. This ensures that the model captures the spatial continuity of adjacent points while preserving cross-layer semantic propagation.
[0072] To enhance the model's robustness to missing data and noise, this invention introduces a cross-mask self-supervised learning mechanism. Specifically, two complementary mask views are randomly generated on the input features, used for feature reconstruction and latent space consistency learning, respectively. The mask feature matrix simulates missing information during training, improving the model's imputation ability, while the complementary mask provides consistent supervision to the latent space, effectively mitigating overfitting in the autoencoder. The encoder integrates a feedforward neural network and a graph convolutional layer, utilizing the graph structure to propagate features between adjacent nodes, generating a more robust latent representation.
[0073] During latent representation learning, this invention further enhances the reliability of the representations through the cross-mask latent consistency (CMLC) mechanism. The model uses contrastive learning to align the latent embeddings generated from the complementary views, thereby strengthening the consistency between views and ensuring that feature representations remain stable even when dealing with incomplete data or differently augmented views.
[0074] To address the insufficient integration of local and global information in SRT data, this invention develops an adaptive hybrid spatial-semantic graph modeling method. Based on latent embeddings, the model selects local spatial neighbors and global semantic clustering neighbors, fusing them into a unified hybrid neighborhood. This strategy not only preserves spatial continuity but also introduces cross-regional semantic consistency, significantly enhancing the discriminative ability of downstream tasks. By aggregating hybrid neighborhood features and using contrastive learning to optimize node embeddings, the model effectively delineates the boundaries between different categories.
[0075] The method employed in this invention enables complex spatial domain identification within tissue sections while preserving spatial continuity and clustering accuracy. It supports the integration of consecutive sections, comparisons across developmental stages, and cross-platform datasets. These capabilities provide a unified framework for spatial omics research in development, disease, and cross-species studies.
[0076] In a specific embodiment of this invention, during the data preprocessing stage, the PCA algorithm is first used to extract the top 200 principal components from the 2000 highly variable genes as input features. For datasets with fewer than 2000 but more than 200 genes, PCA is applied directly to the available genes; if the number of genes is less than 200, all normalized gene expression values are used as input features. For datasets generated on the 10x Visium, Stereo-seq, and STARmap platforms, K = 12 neighbors are used to construct the spatial adjacency graph, while for datasets from other platforms the number of neighbors is kept between 6 and 8. In the first encoder and the second encoder, two FNN layers with dimensions of 64 and 32 are used, followed by two GCN layers with dimensions of 64 and 16, generating a 16-dimensional latent space. The graph predictor uses a GCN with an output dimension of 16. The decoder consists of a single GCN layer whose output dimension equals the input dimension of 200. The mask rate is set to $\rho = 0.5$, the scaling factor to $\gamma = 2$, and the temperature parameter to $\tau = 0.5$. For the computation of the hybrid neighbors, $K_{across} = K_{inter} = 15$, i.e., the 15 most similar neighbors are selected as candidate points, and the update step size for the spatial-semantic hybrid neighbors is fixed at 50. The loss weight factors $\lambda_1$, $\lambda_2$, and $\lambda_3$ are set to 0.5, 0.2, and 0.9, respectively. During training, the Adam optimizer is used to optimize the total loss function, with an initial learning rate of 0.001 and a weight decay of 0.0003, for a total of 300 training epochs.
[0077] The effectiveness of the spatial structure analysis and batch effect correction method based on SRT data of the present invention is verified through specific experiments:
[0078] To systematically evaluate the spatial domain recognition performance of the method of this invention (SpaCross), we first benchmarked it on the human dorsolateral prefrontal cortex (DLPFC) dataset generated on the 10x Visium platform. This dataset contains 12 tissue slices from three independent donors (four slices from each donor). Maynard et al. manually annotated each slice with six cortical layers (layers 1-6) and one white matter (WM) layer based on established molecular markers and morphological features (as shown in Figure 3A). For quantitative comparison, we evaluated SpaCross against eight state-of-the-art spatial clustering methods (stDCL, DiffusionST, GraphST, STAGATE, SEDR, DeepST, CCST, and SpaGCN) using the adjusted Rand index (ARI) and clustering accuracy (ACC); ARI has an upper bound of 1 and ACC lies in [0,1], with higher values indicating better performance (calculation details are in the Methods section). Empirical results show that SpaCross exhibits superior clustering ability across all 12 slices (as shown in Figure 3B): it achieved the highest average scores (ARI = 0.566, ACC = 0.681) and median scores (median ARI = 0.585, median ACC = 0.683). Notably, the second-best performing method, stDCL, scored lower (average ARI = 0.537, median ARI = 0.563), a gap of 0.029 in average ARI.
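For reference, the adjusted Rand index used in this comparison can be computed from the contingency table of the two labelings, as in the following sketch. This is the standard pair-counting definition written with NumPy; the function name `adjusted_rand_index` is chosen for this example.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the contingency table of two labelings."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)

    # Contingency table: cont[a, b] = number of points with true a and predicted b.
    cont = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(cont, (t, p), 1)

    def pairs(x):
        # Number of unordered pairs that can be formed from x items.
        return x * (x - 1) / 2.0

    sum_cells = pairs(cont).sum()
    sum_rows = pairs(cont.sum(axis=1)).sum()
    sum_cols = pairs(cont.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(cont.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:          # degenerate case, e.g. one cluster on both sides
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))
```

The index is invariant to how clusters are numbered, so a predicted labeling that matches the annotation up to a renaming of domains scores 1.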
[0079] To further elucidate the superiority of SpaCross in hierarchical spatial domain identification, we conducted an in-depth analysis of representative slice 151675. As shown in Figure 3C, SpaCross not only achieves the best clustering metrics (ARI = 0.683, ACC = 0.761) but also demonstrates excellent topological organization: all domains exhibit clearly defined boundaries with no point mixing. In contrast, DeepST shows discontinuous layer transitions (such as between domains 2 and 4), SEDR completely embeds domain 1 within domain 6, and STAGATE, while distinguishing the main layered structure, fails to adequately separate adjacent regions of layer 1 and domain 6. It is worth noting that methods that depend on post-processing (label refinement), such as stDCL, DiffusionST, and GraphST, partially improve region continuity (e.g., reducing the mixing of domains 3/4 with domain 2), but still retain local embedding anomalies (as shown in Figure 3C).
[0080] To explore the algorithm mechanism, we visualized the latent embeddings using UMAP (as shown in Figure 3D). SpaCross reveals a linear developmental trajectory from layer 1 to WM, accurately reflecting the biological hierarchical structure of the cortical architecture. In contrast, stDCL and GraphST exhibit interlayer overlap in the embedding space, indicating incomplete disentangling of spatial heterogeneity. This difference stems from stDCL and GraphST neglecting global clustering semantics during contrastive learning, resulting in disordered and fragmented clustering patterns, whereas SpaCross effectively integrates spatial proximity and global clustering information to preserve topological relationships. Quantitatively, we introduce a dispersion score (DIS) to assess the quality of the latent space, with lower values indicating better spatial continuity. Experimental results show that the DIS of SpaCross (0.0394) is lower than that of stDCL (0.1675) and GraphST (0.0831), confirming its enhanced topological fidelity. This conclusion is consistent across all other slices, highlighting the robustness of SpaCross in resolving complex spatial hierarchies.
[0081] To investigate the specificity of SpaCross in identifying spatially variable genes (SVGs), we used a workflow similar to SpaGCN to systematically screen for SVGs enriched in each spatial domain. In slice 151675, SpaCross detected 30 SVGs with distinct spatial heterogeneity: 23 genes in domain 7 (e.g., PLP1, a myelination marker), 1 gene in domain 4 (NEFL, neurofilament light chain), and 6 genes in domain 3 (e.g., ENC1, a cytoskeleton regulator) (as shown in Figure 3E). Cross-slice validation showed that these genes have conserved spatial expression patterns in the corresponding domains of slice 151509 (as shown in Figure 3F). We then compared the SVGs identified by SpaCross with those of five state-of-the-art spatial analysis methods (stDCL, GraphST, STAGATE, SEDR, SpaGCN) using Moran's I and Geary's C indices. As shown in Figure 3G, compared with the other methods, SpaCross exhibits a higher median Moran's I value and a lower median Geary's C value, indicating that the SVGs it detects have a stronger spatial autocorrelation pattern.
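The two spatial autocorrelation indices can be computed as in the sketch below, which follows their standard definitions for a single gene. Using a binary spatial adjacency matrix as the weight matrix is an assumption of this example, and the function names are illustrative.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I of values x under spatial weights w (higher = more clustered)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    d = x - x.mean()
    # Weighted cross-products of deviations, normalized by the total variance.
    return float(x.size / w.sum() * (d @ w @ d) / (d @ d))

def gearys_c(x, w):
    """Geary's C of values x under spatial weights w (lower = more clustered)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    d = x - x.mean()
    # Weighted squared differences between neighboring values.
    diff2 = (x[:, None] - x[None, :]) ** 2
    return float((x.size - 1) / (2.0 * w.sum()) * (w * diff2).sum() / (d @ d))
```

For a gene whose expression forms contiguous spatial blocks, Moran's I is positive and Geary's C falls below 1, which is the pattern reported for the SVGs above.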
[0082] To systematically verify the cross-platform robustness of the method of this invention (SpaCross), we first evaluated its performance on a complex human breast cancer (BRCA) dataset generated on the 10x Visium platform. This dataset contains 20 histopathological regions, meticulously annotated by Xu et al. based on H&E staining images, including ductal carcinoma in situ / lobular carcinoma in situ (DCIS / LCIS), healthy tissue, invasive ductal carcinoma (IDC), and tumor margin regions (as shown in Figure 4A). Experimental results show that SpaCross has a clear performance advantage: it achieves the highest spatial domain clustering accuracy (ARI = 0.65 and ACC = 0.72), about 0.07 higher in ARI than DiffusionST (ARI = 0.58) and DeepST (ARI = 0.58) (as shown in Figure 4B).
[0083] At the level of tissue topology resolution, SpaCross effectively addresses the segmentation bias in the IDC2 and IDC5 regions exhibited by existing algorithms (including stDCL, DiffusionST, GraphST, and SEDR). As shown in Figure 4C, whereas these methods incorrectly segment continuous IDC regions into discrete subclusters even though the pathological features are clearly block-like, the identification results of SpaCross are highly consistent with the manual annotation. Quantitative analysis of the inter-cluster similarity matrix shows that domains 8, 20, and 5 defined by SpaCross have strong spatial co-localization with IDC1, IDC2, and IDC4, respectively. Notably, the algorithm accurately depicts the dynamic tumor edge transition zone, mapping spatial domain 1 to Tumor_edge_3 and domain 3 to Tumor_edge_2 (as shown in Figure 4C), and demonstrates tumor-stromal boundary resolution comparable to recent contrastive learning frameworks. SVG analysis further confirms the advantage of SpaCross, which exhibits a higher median Moran's I (0.646) and a lower Geary's C (0.352) than existing models (as shown in Figure 4D). These metrics indicate enhanced detection of spatial autocorrelation, consistent with advanced representation learning methods that effectively capture gene expression gradients. Furthermore, we evaluated the spatial domain recognition performance of SpaCross and baseline methods on a more complex mouse brain dataset containing 52 spatial domains, generated on the 10x Visium platform. SpaCross achieved the highest accuracy among all methods, with an ARI of 0.5, and exhibited more spatially continuous domain segmentation.
[0084] Next, we systematically evaluated the ability of SpaCross to delineate anatomical structures on a mouse primary visual cortex (MVC) dataset generated by the STARmap platform. This dataset contains manually annotated anatomical regions, including the corpus callosum (CC), hippocampus (HPC), and six neocortical layers (L1-L6). Experimental results show that SpaCross significantly outperforms all benchmark methods (as shown in Figure 4E). Consistent with previous findings, stDCL failed to distinguish between the HPC and CC regions. GraphST and SpaGCN both showed interlayer cellular mixing, while STAGATE and SEDR failed to resolve the anatomical boundary between the L2/3 and L4 layers. CCST incorrectly segmented the CC region into multiple subdomains and misclassified the HPC and the L1 layer as a single anatomical domain. Notably, SpaCross achieved perfect consistency with the gold standard anatomical annotation, producing clearly defined domain boundaries without cellular mixing.
[0085] UMAP visualization of the latent embeddings (as shown in Figure 4F) shows that SpaCross captures a linear topology consistent with cortical layer development. In contrast, CCST exhibits erroneous overlap between the centroids of domains 1 and 5, while GraphST, STAGATE, and SEDR generate biologically uninterpretable embedding distributions. To elucidate the biological relevance of the SpaCross representation, we tracked the dynamic evolution of the hybrid spatial-semantic neighbor graph during iterative optimization (as shown in Figure 4G). The initial PCA-based gene expression similarity network showed intra-domain correlations only in CC, while anomalous cross-domain similarities were prevalent in other regions. Through iterative refinement, SpaCross progressively strengthens intra-domain associations while maintaining biologically meaningful inter-domain relationships, demonstrating an effective integration of spatial proximity and global co-expression patterns to establish a biologically coherent topological representation.
[0086] Finally, we performed quantitative benchmarking on a mouse somatosensory cortex (MSC) dataset generated on the osmFISH platform. In terms of anatomical accuracy as measured by ARI, the benchmark methods scored below 0.6, while SpaCross demonstrated superior performance with an ARI of 0.67 (as shown in Figure 4H). Notably, stDCL exhibits domain assignment errors in layer 5 by inappropriately mixing multiple anatomical regions, while GraphST incorrectly divides layer 4 into two discrete partitions. SpaCross not only accurately depicts layer boundaries but also maintains excellent biological fidelity in resolving spatial cell distribution patterns (as shown in Figure 4H).
[0087] To comprehensively evaluate the cross-layer integration performance of our proposed method (SpaCross), we applied it to an independent DLPFC donor dataset and benchmarked it against state-of-the-art multilayer integration methods (SPIRAL, STitch3D, Splane, and STAligner). Our multidimensional evaluation framework includes: 1) cluster consistency analysis using manual annotation as the gold standard (quantified via ARI and ACC); 2) spatial domain dispersion assessment (DIS), computed without any label-refinement post-processing; and 3) a novel composite index, the F1-harmonized Local Inverse Simpson Index (F1LISI), used to quantify the balance between batch effect removal and preservation of biological variation. The F1LISI index (range: 0-1) integrates the batch group LISI (LISI_batch) and the domain group LISI (LISI_domain) through an adjustable weighting coefficient α (described in detail in the methodology), with higher values indicating better performance in preserving biological variation while eliminating technical noise.
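As a non-limiting illustration of how a composite index of this kind can be computed, the following Python sketch combines a batch-mixing score and a domain-purity score through a weighted harmonic mean. The exact F1LISI formula is given in the methodology and is not reproduced here; the normalization of LISI to [0, 1], the use of medians, the harmonic-mean form, and the function names `normalize_lisi` and `f1_lisi` are all assumptions made for this example only.

```python
import numpy as np

def normalize_lisi(lisi, n_categories):
    """Rescale LISI values from [1, n_categories] to [0, 1] (assumed normalization)."""
    lisi = np.asarray(lisi, dtype=float)
    return (lisi - 1.0) / (n_categories - 1.0)

def f1_lisi(lisi_batch, lisi_domain, n_batches, n_domains, alpha=0.5):
    """Weighted harmonic mean of batch mixing and domain purity (hypothetical form).

    High LISI_batch means batches are well mixed; low LISI_domain means each
    neighborhood contains a single domain, so it is inverted before combining.
    With alpha = 0.5 this reduces to the usual F1-style harmonic mean.
    """
    batch_mixing = np.median(normalize_lisi(lisi_batch, n_batches))
    domain_purity = 1.0 - np.median(normalize_lisi(lisi_domain, n_domains))
    if batch_mixing <= 0 or domain_purity <= 0:
        return 0.0
    return float(1.0 / (alpha / batch_mixing + (1.0 - alpha) / domain_purity))
```

Under these assumptions the score equals 1 only when batches are fully mixed and every neighborhood is domain-pure, and it drops to 0 when either component vanishes.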
[0088] As shown in Figure 5A, SpaCross demonstrates superior performance across all evaluation metrics, achieving median ARI = 0.637, ACC = 0.702, DIS = 0.0413, and F1LISI = 0.915. This advantage stems from its innovative multi-layer integrated hybrid adjacency graph architecture, which enhances spatial domain continuity through cross-layer representation similarity modeling and improves cross-layer robustness by incorporating a latent spatial consistency module. SPIRAL exhibits competitive clustering accuracy (ARI = 0.637) and batch mixing (F1LISI = 0.915) through its dual embedding learning mechanism (independent optimization of biological and batch embeddings). However, its spatial domain results show discrete outliers (as shown in Figure 5B), which is reflected in the highest DIS value of 0.0908 and may be due to insufficient preservation of local spatial information during domain adaptation. STitch3D uses the ICP and PASTE algorithms for spatial coordinate alignment and integrates single-cell reference data, achieving suboptimal median scores (ARI = 0.538, ACC = 0.685); its rigid transformation assumption limits its adaptability to complex tissue deformation. STAligner exhibits spatial domain discontinuity in layer 4 (as shown in Figure 5B), indicating that its graph attention mechanism failed to capture continuous structural features in deep tissue layers. In the UMAP visualization (as shown in Figure 5C), the multi-layer integration of Splane is poor (F1LISI = 0.671), with slice 151673 forming isolated clusters and exhibiting significant intra-domain mixing, highlighting the limitations of its spatial constraint modeling. Although GraphST and SEDR achieve sufficient batch mixing, they lack clear spatial boundaries, indicating a trade-off in local feature preservation within contrastive learning frameworks. Notably, only SpaCross and SPIRAL successfully balance batch correction with the preservation of biological tissue topology.
[0089] The spatial structure analysis and batch effect correction method based on SRT data in the embodiments of the present invention has been described above. The spatial structure analysis and batch effect correction device based on SRT data in the embodiments of the present invention is described below. Referring to Figure 6, one embodiment of the spatial structure analysis and batch effect correction device based on SRT data in the present invention includes:
[0090] Data acquisition module 10 is used to acquire gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data;
[0091] Extraction module 20 is used to perform principal component analysis based on the gene expression data of the multiple tissue slices, and extract the matrix of the first 200 principal components as gene expression information;
[0092] The adjacency matrix construction module 30 is used to construct a three-dimensional adjacency matrix based on the spatial coordinates of each tissue slice sample point;
[0093] The masking module 40 is used to randomly mask each row of the gene expression information according to a predetermined ratio to obtain a masked gene expression matrix, and then perform complementary masking on the gene expression information to obtain a complementary masked gene expression matrix.
[0094] The model building module 50 is used to build a spatial structure parsing network model, which includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module.
[0095] The model training module 60 is used to train the spatial structure parsing network model with the three-dimensional adjacency matrix, masked gene expression matrix, and complementary masked gene expression matrix as input data, and reconstruction loss, contrastive loss, and consistency loss as the total loss function, so as to obtain the trained spatial structure parsing network model.
[0096] The latent representation output module 70 is used to input the three-dimensional adjacency matrix A, masked gene expression matrix, and complementary masked gene expression matrix, obtained by preprocessing the gene expression data and spatial coordinates in the spatially resolved transcriptomics data to be analyzed, into the trained spatial structure parsing network model, and output the latent representation of spatial structure resolution features.
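As a non-limiting illustration, the masking and complementary masking performed by the masking module 40 can be sketched in Python as follows. The sketch assumes that masking is applied to whole rows (sample points) and that masked rows are replaced by zeros; a learnable mask token could be used instead. The function name `complementary_masks` and the default ratio of 0.3 are hypothetical and are not taken from the embodiments.

```python
import numpy as np

def complementary_masks(x, mask_ratio=0.3, seed=0):
    """Return a masked matrix, its complementary masked matrix, and the row mask.

    x          : (n_spots, n_features) gene expression information (e.g. PCA scores)
    mask_ratio : fraction of rows hidden in the first view (assumed value)
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    n_masked = int(round(mask_ratio * n))
    masked_rows = rng.choice(n, size=n_masked, replace=False)

    is_masked = np.zeros(n, dtype=bool)
    is_masked[masked_rows] = True

    x_masked = x.copy()
    x_masked[is_masked] = 0.0    # first view hides the selected rows

    x_comp = x.copy()
    x_comp[~is_masked] = 0.0     # complementary view hides all the other rows
    return x_masked, x_comp, is_masked
```

Because the two views hide disjoint sets of rows that together cover every sample point, each row is visible in exactly one view, which is what allows the two encoders to be compared on the masked nodes.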
[0097] Figure 6 above describes the spatial structure analysis and batch effect correction device based on SRT data in this embodiment of the invention in detail from the perspective of modular functional entities. The device is described in detail below from the perspective of hardware processing.
[0098] Figure 7 is a schematic diagram of a spatial structure analysis and batch effect correction device based on SRT data provided in an embodiment of the present invention. This SRT data-based spatial structure analysis and batch effect correction device 100 can vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 11 (e.g., one or more processors), a memory 12, and one or more storage media 13 (e.g., one or more mass storage devices) for storing application programs 133 or data 132. The memory 12 and storage media 13 can be temporary or persistent storage. The program stored in the storage media 13 may include one or more modules (not shown in the diagram), each module including a series of instruction operations on the SRT data-based spatial structure analysis and batch effect correction device 100. Furthermore, the processor 11 may be configured to communicate with the storage media 13 and execute the series of instruction operations in the storage media 13 on the SRT data-based spatial structure analysis and batch effect correction device 100.
[0099] The spatial structure analysis and batch effect correction device 100 based on SRT data may also include one or more power supplies 14, one or more wired or wireless network interfaces 15, one or more input/output interfaces 16, and/or one or more operating systems 131, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will understand that the device structure shown in Figure 7 does not constitute a limitation on the SRT data-based spatial structure analysis and batch effect correction device 100, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0100] The present invention also provides a computer-readable storage medium, which can be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the steps of a spatial structure analysis and batch effect correction method based on SRT data.
[0101] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working process of the system, device, or unit described above may be understood by reference to the corresponding process in the foregoing method embodiments, and is not repeated here.
[0102] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0103] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for spatial structure analysis and batch effect correction based on SRT data, characterized in that it includes the following steps: obtaining gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data; performing principal component analysis based on the gene expression data of the multiple tissue slices, and extracting the matrix of the first 200 principal components as gene expression information; constructing a three-dimensional adjacency matrix based on the spatial coordinates of the sample points of each tissue slice; randomly masking each row of the gene expression information according to a predetermined ratio to obtain a masked gene expression matrix, and then performing complementary masking on the gene expression information to obtain a complementary masked gene expression matrix; constructing a spatial structure parsing network model, which includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module; training the spatial structure parsing network model, with the three-dimensional adjacency matrix, masked gene expression matrix, and complementary masked gene expression matrix as input data and with reconstruction loss, contrastive loss, and consistency loss as the total loss function, to obtain the trained spatial structure parsing network model; and inputting the three-dimensional adjacency matrix A, the masked gene expression matrix, and the complementary masked gene expression matrix, obtained by preprocessing the gene expression data and spatial coordinates in the spatially resolved transcriptomics data to be analyzed, into the trained spatial structure parsing network model, and outputting the latent representation of spatial structure resolution features.
2. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that performing principal component analysis based on the gene expression data of the multiple tissue slices and extracting the matrix of the first 200 principal components as gene expression information includes the following steps: concatenating the gene expression data from the multiple tissue slices along the sample point dimension to obtain integrated gene expression data; filtering and normalizing the integrated gene expression data; and performing principal component analysis on the gene expression matrix of the top 2000 highly variable genes, and extracting the matrix of the top 200 principal components as gene expression information.
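As a non-limiting illustration, the preprocessing recited above can be sketched in Python using NumPy alone. The library-size normalization with a log transform, the ranking of genes by plain variance as a stand-in for highly variable gene selection, and the function name `preprocess_slices` are assumptions made for this example; in practice a dedicated single-cell toolkit would typically be used.

```python
import numpy as np

def preprocess_slices(slices, n_top_genes=2000, n_pcs=200, target_sum=1e4):
    """Concatenate slices, normalize, keep variable genes, and return PCA scores.

    slices : list of (n_spots_k, n_genes) count matrices sharing one gene order
    Returns (pca_scores, batch_labels).
    """
    counts = np.concatenate(slices, axis=0).astype(float)
    batch = np.concatenate([np.full(len(mat), k) for k, mat in enumerate(slices)])

    # Library-size normalization followed by log1p (assumed normalization).
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0
    logx = np.log1p(counts / lib * target_sum)

    # Keep the most variable genes (simple variance ranking, an assumption).
    n_top = min(n_top_genes, logx.shape[1])
    hvg = np.argsort(logx.var(axis=0))[::-1][:n_top]
    x = logx[:, hvg]

    # PCA via SVD of the centered matrix; scores are U * S.
    x = x - x.mean(axis=0)
    u, sing, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pcs, len(sing))
    return u[:, :k] * sing[:k], batch
```

The defaults mirror the 2000 genes and 200 components recited in the claim and are clipped automatically when the input is smaller.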
3. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that constructing a three-dimensional adjacency matrix based on the spatial coordinates of the sample points of each tissue slice includes the following steps: using the iterative closest point (ICP) algorithm to spatially register the sample points, minimizing the three-dimensional Euclidean distance between sample points in adjacent slices, and establishing a three-dimensional coordinate system in which the tissue slice plane is defined as the XY plane and the Z-axis represents the distance between adjacent slices; and, if the three-dimensional Euclidean distance between two sample points is less than 1.1 times the nearest-neighbor distance within the slice, establishing a topological connection between them, thereby forming the three-dimensional adjacency matrix.
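As a non-limiting illustration, the thresholding step recited above can be sketched in Python as follows. The sketch assumes the slices have already been registered (the ICP step is not implemented), takes the median of the per-spot in-slice nearest-neighbor distances as the reference distance, and uses a dense pairwise distance matrix that is only practical for small examples. The function name `build_3d_adjacency` is hypothetical.

```python
import numpy as np

def build_3d_adjacency(xy_per_slice, slice_spacing, factor=1.1):
    """Connect spots whose 3D distance is below factor * in-slice nearest-neighbor distance.

    xy_per_slice  : list of (n_spots_k, 2) registered XY coordinates, one per slice
    slice_spacing : distance between adjacent slices along the Z axis
    """
    # Stack slices into one 3D coordinate array; Z encodes the slice position.
    coords = np.concatenate([
        np.column_stack([xy, np.full(len(xy), k * slice_spacing)])
        for k, xy in enumerate(xy_per_slice)
    ])

    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Reference distance: median nearest-neighbor distance within a slice (assumption).
    same_slice = coords[:, 2][:, None] == coords[:, 2][None, :]
    in_slice = np.where(same_slice, dist, np.inf)
    np.fill_diagonal(in_slice, np.inf)
    nn_dist = np.median(in_slice.min(axis=1))

    adj = (dist < factor * nn_dist).astype(int)
    np.fill_diagonal(adj, 0)   # no self-loops in this sketch
    return adj, coords
```

For example, on two stacked 3 x 3 unit grids with unit slice spacing, the central spot of a slice is linked to its four in-plane neighbors and to the spot directly above or below it, while diagonal neighbors (distance about 1.41) stay unconnected.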
4. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that training the spatial structure parsing network model, with the three-dimensional adjacency matrix, masked gene expression matrix, and complementary masked gene expression matrix as input data and with reconstruction loss, contrastive loss, and consistency loss as the total loss function, to obtain the trained spatial structure parsing network model includes the following steps: inputting the masked gene expression matrix and the three-dimensional adjacency matrix into the first encoder, re-masking the output masked gene latent representation and inputting it together with the three-dimensional adjacency matrix into the graph predictor to obtain the predicted gene latent representation, and inputting the predicted gene latent representation into the decoder to obtain the reconstructed gene expression matrix; inputting the complementary masked gene expression matrix and the three-dimensional adjacency matrix into the second encoder to output the complementary mask gene latent representation; inputting the complementary mask gene latent representation and the predicted gene latent representation into the cross-mask latent consistency module, which outputs a positive-negative sample pair similarity matrix, and constructing a latent space consistency loss $L_{NCE}$ based on the positive-negative sample pair similarity matrix; constructing a reconstruction loss $L_{SCE}$ based on the reconstructed gene expression matrix; inputting the masked gene latent representation and the three-dimensional adjacency matrix into the spatial-semantic learning module, which outputs a hybrid neighborhood summary vector and contrastive learning sample pairs, and constructing a contrastive loss $L_{BCE}$ based on the hybrid neighborhood summary vector and the contrastive learning sample pairs; and constructing the overall loss function $L = \lambda_1 L_{SCE} + \lambda_2 L_{NCE} + \lambda_3 L_{BCE}$ from the reconstruction loss $L_{SCE}$, the latent space consistency loss $L_{NCE}$, and the contrastive loss $L_{BCE}$, and training the spatial structure parsing network model to obtain the trained spatial structure parsing network model, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters that control the proportions of the reconstruction loss, the latent space consistency loss, and the contrastive loss in the total loss, respectively.
5. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that inputting the masked gene expression matrix and the three-dimensional adjacency matrix into the first encoder, re-masking the output masked gene latent representation, and inputting it together with the three-dimensional adjacency matrix into the graph predictor to obtain the predicted gene latent representation includes the following steps: the first encoder includes a feedforward neural network and several graph convolutional layers, wherein the feedforward neural network maps the masked gene expression matrix to the semantic space, and the graph convolutional layers perform convolution operations on the mapped masked gene expression matrix and the three-dimensional adjacency matrix to obtain the masked gene latent representation; re-masking the masked gene latent representation, wherein the rows of the masked gene latent representation that are masked are the same as the rows of the gene expression information that were masked, to obtain the re-masked gene latent representation; and inputting the re-masked gene latent representation and the three-dimensional adjacency matrix into the graph predictor, which outputs the predicted gene latent representation.
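As a non-limiting illustration, the forward pass recited above (feedforward mapping, graph convolution, re-masking of the same rows, and graph prediction) can be sketched in Python with NumPy. The single graph convolutional layer in each component, the symmetric normalization of the adjacency matrix with self-loops, the zeroing of re-masked rows, the randomly initialized weights, and the names `normalize_adjacency` and `encode_and_predict` are all assumptions made for this example; no training is performed.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as commonly done for graph convolutions."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode_and_predict(x_masked, adj, is_masked, hidden=16, latent=8, seed=0):
    """Forward pass: feedforward map -> graph convolution -> re-mask -> graph predictor."""
    rng = np.random.default_rng(seed)
    a_hat = normalize_adjacency(adj)

    # Randomly initialized weights stand in for trained parameters.
    w_ffn = rng.normal(scale=0.1, size=(x_masked.shape[1], hidden))
    w_gcn = rng.normal(scale=0.1, size=(hidden, latent))
    w_pred = rng.normal(scale=0.1, size=(latent, latent))

    h = np.maximum(x_masked @ w_ffn, 0.0)   # feedforward mapping with ReLU
    z_g = a_hat @ h @ w_gcn                 # masked gene latent representation

    z_remasked = z_g.copy()
    z_remasked[is_masked] = 0.0             # re-mask the same rows as the input mask

    z_p = a_hat @ z_remasked @ w_pred       # predicted gene latent representation
    return z_g, z_remasked, z_p
```

Re-masking forces the predictor to rebuild the latent vectors of masked sample points from their graph neighbors rather than from any information that leaked through the encoder.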
6. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that inputting the complementary mask gene latent representation and the predicted gene latent representation into the cross-mask latent consistency module, which outputs a positive-negative sample pair similarity matrix, and constructing the latent space consistency loss $L_{NCE}$ based on the positive-negative sample pair similarity matrix includes the following steps: extracting, from the complementary mask gene latent representation $Z_{cg}$ and the predicted gene latent representation $Z_p$ respectively, the latent vectors corresponding to the mask node set $V_m$ as positive sample pairs $(z_{p,i}, z_{cg,i})$; extracting, from the complementary mask gene latent representation $Z_{cg}$, the latent vectors $z_{cg,j}$ corresponding to K non-masked nodes, and pairing the latent vectors $z_{cg,j}$ of the K non-masked nodes with $z_{p,i}$ to form negative sample pairs $(z_{p,i}, z_{cg,j})$; calculating the cosine similarity of the positive and negative sample pairs respectively to obtain an initial positive-negative sample pair similarity matrix; scaling the initial similarity matrix by the temperature parameter $\tau$ to sharpen the similarity distribution and highlight the differences between positive and negative samples, thereby forming a scaled positive-negative sample pair similarity matrix; and applying softmax normalization to each row of the scaled similarity matrix, calculating the log-likelihood of the positive sample pairs, negating the log-likelihoods, and averaging them to obtain the latent space consistency loss $L_{NCE}$.
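As a non-limiting illustration, the loss recited above follows the general InfoNCE pattern and can be sketched in Python as follows. The sketch assumes that K negatives are drawn independently for each masked node, that the positive pair occupies the first column of the similarity matrix, and that the default values of K and the temperature are arbitrary; the function name `info_nce_loss` is hypothetical.

```python
import numpy as np

def info_nce_loss(z_p, z_cg, masked_idx, unmasked_idx, k=5, tau=0.5, seed=0):
    """Cross-mask consistency loss over masked nodes (InfoNCE-style sketch).

    z_p  : (n, d) predicted gene latent representation
    z_cg : (n, d) complementary mask gene latent representation
    """
    rng = np.random.default_rng(seed)

    def l2_normalize(z):
        return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

    anchors = l2_normalize(z_p[masked_idx])       # (M, d)
    positives = l2_normalize(z_cg[masked_idx])    # (M, d)
    pos_sim = (anchors * positives).sum(axis=1, keepdims=True)   # cosine, (M, 1)

    # K non-masked nodes sampled per anchor as negatives (sampling scheme assumed).
    k = min(k, len(unmasked_idx))
    neg_idx = np.stack([rng.choice(unmasked_idx, size=k, replace=False)
                        for _ in masked_idx])     # (M, k)
    negatives = l2_normalize(z_cg)[neg_idx]       # (M, k, d)
    neg_sim = np.einsum('md,mkd->mk', anchors, negatives)

    # Temperature scaling, then a numerically stable row-wise log-softmax.
    logits = np.concatenate([pos_sim, neg_sim], axis=1) / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the positive pair, averaged over masked nodes.
    return float(-log_prob[:, 0].mean())
```

When the positive pair is the most similar entry in its row, the loss falls below log(K + 1), the value obtained when all K + 1 similarities are equal; a smaller temperature sharpens the softmax and penalizes hard negatives more strongly.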
7. The spatial structure analysis and batch effect correction method based on SRT data according to claim 1, characterized in that inputting the masked gene latent representation and the three-dimensional adjacency matrix into the spatial-semantic learning module, which outputs a hybrid neighborhood summary vector and contrastive learning sample pairs, and constructing the contrastive loss $L_{BCE}$ based on the hybrid neighborhood summary vector and the contrastive learning sample pairs includes the following steps: calculating the cosine similarity between sample points based on the masked gene latent representation and the three-dimensional adjacency matrix; selecting spatially adjacent and semantically co-clustered sample points to generate a hybrid neighborhood; aggregating the hybrid neighborhood, calculating its mean, and generating the summary vector by Sigmoid activation; forming positive sample pairs from the sample point representation $z_{g,i}$ in the masked gene latent representation and its corresponding summary vector; perturbing $z_{g,i}$ with a corruption function and pairing the perturbed representation with the corresponding summary vector to form negative sample pairs; and calculating the similarity of the positive and negative sample pairs using a bilinear scoring function and constructing the contrastive loss $L_{BCE}$.
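As a non-limiting illustration, the steps recited above can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the claim: semantic neighbors are chosen as the top-k most cosine-similar sample points (a stand-in for the co-clustering criterion), the hybrid neighborhood is the union of spatial and semantic neighbors, the corruption function is a row permutation, and the bilinear weight matrix `w` is supplied by the caller. The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_neighborhood(z_g, adj, n_semantic=3):
    """Union of spatial neighbors and top-k cosine-similar spots (assumed criterion)."""
    zn = z_g / (np.linalg.norm(z_g, axis=1, keepdims=True) + 1e-12)
    sim = zn @ zn.T
    np.fill_diagonal(sim, -np.inf)                 # a spot is not its own neighbor
    top = np.argsort(sim, axis=1)[:, -n_semantic:]
    semantic = np.zeros(adj.shape, dtype=bool)
    semantic[np.arange(len(z_g))[:, None], top] = True
    return (adj > 0) | semantic

def contrastive_bce_loss(z_g, adj, w, n_semantic=3, seed=0):
    """Binary cross-entropy over bilinear scores of positive and corrupted pairs."""
    rng = np.random.default_rng(seed)
    hood = hybrid_neighborhood(z_g, adj, n_semantic).astype(float)

    # Summary vector: Sigmoid of the mean latent vector over the hybrid neighborhood.
    summary = sigmoid(hood @ z_g / hood.sum(axis=1, keepdims=True))

    # Corruption function: shuffle rows so each spot gets another spot's features.
    z_corrupt = z_g[rng.permutation(len(z_g))]

    # Bilinear scores z^T W s squashed to (0, 1).
    pos = sigmoid(np.einsum('nd,de,ne->n', z_g, w, summary))
    neg = sigmoid(np.einsum('nd,de,ne->n', z_corrupt, w, summary))

    eps = 1e-12
    return float(-0.5 * (np.log(pos + eps) + np.log(1.0 - neg + eps)).mean())
```

With an all-zero weight matrix every score equals 0.5 and the loss equals log 2, which serves as a convenient sanity check before training.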
8. A spatial structure analysis and batch effect correction device based on SRT data, characterized in that it includes: a data acquisition module, used to acquire gene expression data and spatial coordinates of sample points from multiple tissue slices in spatially resolved transcriptomics data; an extraction module, used to perform principal component analysis based on the gene expression data of the multiple tissue slices, and extract the matrix of the first 200 principal components as gene expression information; an adjacency matrix construction module, used to construct a three-dimensional adjacency matrix based on the spatial coordinates of the sample points of each tissue slice; a masking module, used to randomly mask each row of the gene expression information according to a predetermined ratio to obtain a masked gene expression matrix, and then perform complementary masking on the gene expression information to obtain a complementary masked gene expression matrix; a model building module, used to build a spatial structure parsing network model, which includes a first encoder, a second encoder, a graph predictor, a decoder, a cross-mask latent consistency module, and a spatial-semantic learning module; a model training module, used to train the spatial structure parsing network model with the three-dimensional adjacency matrix, masked gene expression matrix, and complementary masked gene expression matrix as input data, and reconstruction loss, contrastive loss, and consistency loss as the total loss function, to obtain the trained spatial structure parsing network model; and a latent representation output module, used to input the three-dimensional adjacency matrix A, masked gene expression matrix, and complementary masked gene expression matrix, obtained by preprocessing the gene expression data and spatial coordinates in the spatially resolved transcriptomics data to be analyzed, into the trained spatial structure parsing network model, and output the latent representation of spatial structure resolution features.
9. A spatial structure analysis and batch effect correction device based on SRT data, characterized in that it includes a memory and at least one processor, wherein the memory stores computer-readable instructions, and the at least one processor invokes the computer-readable instructions in the memory to perform the steps of the spatial structure analysis and batch effect correction method based on SRT data as described in any one of claims 1-7.
10. A computer-readable storage medium storing computer-readable instructions thereon, characterized in that, when the computer-readable instructions are executed by a processor, they implement the steps of the spatial structure analysis and batch effect correction method based on SRT data as described in any one of claims 1-7.