Explainable spatial domain identification method for spatial transcriptomics
By integrating data preprocessing and enhancement, latent space representation learning modules, and 3D spatial clustering modules, this approach addresses the lack of gene interpretability in existing multi-slice data integration and 3D reconstruction frameworks. It achieves accurate identification of the 3D spatial domain and interpretation of gene features, thereby improving the completeness and interpretability of spatial transcriptomics analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- YUNNAN UNIV
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing spatially resolved transcriptomics methods cannot effectively integrate multi-slice data or correct for batch effects across slices. Consequently, three-dimensional reconstruction frameworks lack gene interpretability and cannot identify spatially specific genes in complex tissue structures.
A spatial domain identification method is adopted, comprising data preprocessing, data augmentation, a latent space representation learning module, and a 3D spatial clustering module. Multi-slice data are integrated through a graph neural network to identify 3D spatial domain labels, and gene contributions are calculated through a classifier submodule to generate a list of spatially variable genes (SVGs).
The method improves the completeness and interpretability of multi-slice spatial transcriptome analysis, enables accurate identification of three-dimensional spatial domains and interpretation of gene features, and enhances the robustness and generalization ability of the model to spatial expression patterns.
Figure CN122201435A_ABST
Description
Technical Field
[0001] This invention relates to the field of spatial gene expression data analysis technology, and in particular to an interpretable spatial domain identification method in spatial transcriptomics.
Background Technology
[0002] Spatially resolved transcriptomics, by measuring gene expression while preserving spatial location, overcomes the critical limitation of single-cell RNA sequencing, which loses spatial information due to tissue dissociation. This spatial information is crucial for understanding cellular microenvironment function. Current spatially resolved transcriptomics technologies fall mainly into two categories: sequencing platforms that provide full transcriptome coverage, and imaging methods that achieve single-cell resolution for specific genes. The rapid development and widespread application of these technologies have generated massive amounts of complex data, necessitating a computational framework capable of integrating data from multiple platforms to analyze complex tissue structures.
[0003] One of the core challenges of spatially resolved transcriptomics analysis is identifying spatial domains, i.e., groups of sequencing sites with similar transcriptomic profiles and spatial proximity. Early methods integrated gene expression and spatial information through graph autoencoders to learn low-dimensional representations of individual slices. Subsequent methods introduced contrastive learning or multimodal information to enhance discriminative capabilities. However, these methods are all geared towards single slices and cannot correct for batch effects or achieve alignment across multiple slices, limiting the comparison of different samples and conditions, as well as the final 3D reconstruction. To address this, some methods focus on cross-slice integration and batch correction, aligning multiple 2D slices to a shared latent space. While this achieves coordination and comparison of multi-slice data, its output is still primarily a 2D embedding and does not explicitly reconstruct a physically continuous 3D tissue model.
[0004] Further approaches aim to achieve higher-fidelity 3D tissue structure modeling and characterize tissue morphology, but they have not yet solved the problem of how to define these newly discovered 3D domains at the molecular level, i.e., how to identify the domain-specific spatially variable genes active within them. Various tools have been developed for identifying spatially specific genes, but most are geared towards single slices, identify genes with global spatial patterns, and have not been integrated with multi-slice structures or 3D reconstruction frameworks. As a result, 3D reconstruction frameworks lack gene interpretability, and spatially specific gene discovery tools cannot be directly applied to complex 3D multi-slice structures. Therefore, a comprehensive method that can unify 3D structure reconstruction and gene feature interpretation is urgently needed.
Summary of the Invention
[0005] To overcome the shortcomings of existing technologies, the present invention provides an interpretable spatial domain identification method for spatial transcriptomics, with the aim of improving the completeness and interpretability of multi-slice spatial transcriptomics analysis.
[0006] The first aspect of this invention provides an interpretable spatial domain identification method for spatial transcriptomics, comprising: acquiring an initial multi-slice spatial transcriptome dataset; performing data preprocessing on the initial multi-slice spatial transcriptome dataset to obtain a preprocessed dataset; performing data augmentation on the preprocessed dataset to obtain an augmented dataset; acquiring a pre-trained spatial domain identification model, the spatial domain identification model including a latent space representation learning module and a three-dimensional spatial clustering module, the latent space representation learning module being connected to the three-dimensional spatial clustering module; performing neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain a target latent space embedding; performing clustering processing on the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels; introducing a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module; and performing gene contribution calculation processing on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain a list of SVGs (spatially variable genes).
[0007] A second aspect of the present invention provides an interpretable spatial domain identification device for spatial transcriptomics, comprising: a preprocessing module for acquiring an initial multi-slice spatial transcriptome dataset and performing data preprocessing on the initial multi-slice spatial transcriptome dataset to obtain a preprocessed dataset; an enhancement module for performing data enhancement processing on the preprocessed dataset to obtain an enhanced dataset; a model acquisition module for acquiring a pre-trained spatial domain identification model, the spatial domain identification model including a latent space representation learning module and a three-dimensional spatial clustering module, the latent space representation learning module being connected to the three-dimensional spatial clustering module; a neighborhood aggregation module for performing neighborhood aggregation processing on the enhanced dataset based on the latent space representation learning module to obtain a target latent space embedding; a clustering module for performing clustering processing on the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels; and a gene contribution calculation module for introducing a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module, performing gene contribution calculation processing on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain a list of SVGs.
[0008] A third aspect of the present invention provides an interpretable spatial domain identification device for spatial transcriptomics, the interpretable spatial domain identification device for spatial transcriptomics comprising: a memory and at least one processor, the memory storing instructions; at least one processor invokes the instructions in the memory to cause the interpretable spatial domain identification device for spatial transcriptomics to perform the steps of the interpretable spatial domain identification method for spatial transcriptomics as described in any of the preceding claims.
[0009] A fourth aspect of the present invention provides a computer-readable storage medium storing instructions that, when executed by a processor, implement the steps of the interpretable spatial domain identification method for spatial transcriptomics described in any of the preceding claims.
[0010] In the technical solution of this invention, an initial multi-slice spatial transcriptome dataset is obtained; the dataset is preprocessed to obtain a preprocessed dataset; the preprocessed dataset is augmented to obtain an augmented dataset; a pre-trained spatial domain identification model is obtained, which includes a latent space representation learning module connected to a three-dimensional spatial clustering module. Based on the latent space representation learning module, neighborhood aggregation processing is performed on the augmented dataset to obtain a target latent space embedding; based on the three-dimensional spatial clustering module, clustering processing is performed on the target latent space embedding to obtain three-dimensional spatial domain labels; a classifier submodule is introduced into the latent space representation learning module to obtain an updated latent space representation learning module; based on the updated module, gene contribution calculation processing is performed on the three-dimensional spatial domain labels to obtain an SVGs list. This improves the completeness and interpretability of multi-slice spatial transcriptome analysis.
Attached Figure Description
[0011] Figure 1 is a logical flowchart of interpretable spatial domain identification in spatial transcriptomics provided in an embodiment of the present invention; Figure 2 is an overall flowchart of interpretable spatial domain identification in spatial transcriptomics provided in an embodiment of the present invention; Figure 3 shows the results of the present invention on the human dorsolateral prefrontal cortex dataset provided in an embodiment of the present invention; Figure 4 shows the integration and spatially specific gene identification results on the cross-platform mouse olfactory bulb dataset provided in an embodiment of the present invention; Figure 5 shows the results of cross-species three-dimensional reconstruction and spatiotemporal transcriptome mapping analysis provided in an embodiment of the present invention; Figure 6 is a schematic diagram of the interpretable spatial domain identification device for spatial transcriptomics provided in an embodiment of the present invention.
Detailed Implementation
[0012] This invention provides an interpretable spatial domain identification method for spatial transcriptomics. In this invention, the terms "first," "second," "third," "fourth," etc. (if present) in the specification, claims, and accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0013] For ease of understanding, the specific process of the embodiments of the present invention is described below. Referring to Figure 1, one embodiment of the interpretable spatial domain identification method for spatial transcriptomics in this invention includes: 101. Obtain the initial multi-slice spatial transcriptome dataset, and perform data preprocessing on the initial multi-slice spatial transcriptome dataset to obtain the preprocessed dataset; In this embodiment, an initial multi-slice spatial transcriptome dataset is obtained. This dataset contains multiple slices and a gene expression matrix corresponding to each slice. Each gene expression matrix covers multiple sequencing points, their spatial coordinates, and multiple genes. During data preprocessing, the Euclidean distances between spatial coordinates are first calculated to construct a directed adjacency matrix for each slice; by default, each sequencing point is connected to its k nearest neighbors. Each directed adjacency matrix is then symmetrized to obtain an undirected sparse adjacency matrix. For multi-slice data, the undirected sparse adjacency matrices are concatenated block-diagonally to form a global adjacency matrix, which is then symmetrically normalized to obtain the normalized adjacency matrix that serves as the spatial graph input for the subsequent graph neural network. In parallel, preset gene filtering rules are obtained and applied to remove genes detected in fewer than 50 sequencing points or with a total count below 10, yielding the filtered genes for the sequencing points.
A spatial transcriptome analysis tool is used to calculate the variability parameters of each filtered gene, and based on these parameters the 2000 most highly variable genes are selected from the filtered genes. These highly variable genes are then standardized to obtain a high-dimensional expression matrix. Principal component analysis (PCA) is then used to reduce the dimensionality of this matrix, retaining the first 200 principal components to obtain a low-dimensional input feature matrix. Finally, the normalized adjacency matrix and the low-dimensional input feature matrix are combined to obtain the preprocessed dataset. By constructing spatial adjacency relationships suited to a graph neural network, this preprocessing workflow captures the spatial proximity between sequencing points and provides reliable spatial constraints for subsequent neighborhood aggregation. Multi-step gene processing removes sequencing depth differences and low-expression noise, cross-slice gene processing keeps gene sets consistent across slices, and highly variable gene selection focuses on genes with significant biological variation, improving analytical sensitivity. PCA reduces data dimensionality while preserving key expression features, which lowers the computational complexity of subsequent modeling, provides high-quality standardized input for the spatial domain identification model, and improves the model's adaptability to multi-slice data and the reliability of the analysis results.
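The graph construction in step 101 (per-slice k-nearest-neighbor adjacency, symmetrization, block-diagonal concatenation, symmetric normalization) can be sketched as follows. This is a minimal illustration using dense NumPy arrays rather than the sparse matrices the text mentions; the function names are hypothetical, and the addition of self-loops before normalization is an assumption borrowed from common graph convolutional network practice, not something the description states.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Directed kNN adjacency from Euclidean distances, then symmetrized."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbors per point
    directed = np.zeros((n, n))
    directed[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(directed, directed.T)     # undirected (symmetric) graph

def block_diagonal(mats):
    """Place per-slice adjacency matrices on the diagonal of one global matrix."""
    total = sum(m.shape[0] for m in mats)
    out = np.zeros((total, total))
    offset = 0
    for m in mats:
        n = m.shape[0]
        out[offset:offset + n, offset:offset + n] = m
        offset += n
    return out

def symmetric_normalize(adj, add_self_loops=True):
    """Compute D^{-1/2} A D^{-1/2}; self-loops are an assumption of this sketch."""
    a = adj + np.eye(adj.shape[0]) if add_self_loops else adj.copy()
    deg = a.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Two toy slices with random 2D coordinates (illustrative data only).
rng = np.random.default_rng(0)
slice_a, slice_b = rng.random((6, 2)), rng.random((5, 2))
a_hat = symmetric_normalize(
    block_diagonal([knn_adjacency(s, k=3) for s in (slice_a, slice_b)])
)
```

Because the global matrix is block-diagonal, no edges connect points from different slices at this stage; cross-slice relationships are handled later by the de-batching and registration steps.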
[0014] 102. Perform data augmentation on the preprocessed dataset to obtain an augmented dataset; In this embodiment, a complementary masking strategy is adopted when performing data augmentation on the preprocessed dataset. First, a mask subset is randomly sampled from the set of spatial location indices, and its complementary subset is then defined, so that the two subsets are mutually exclusive and together cover all spatial locations. The masking probability is typically set to 0.5, meaning that approximately half of the spatial locations are masked. Next, mask token replacement is applied to the low-dimensional input feature matrix based on the mask subset: if a spatial location belongs to the mask subset, its feature vector is replaced by a learnable mask token; otherwise, the original features are preserved. This yields the mask feature matrix. Similarly, mask token replacement is applied to the low-dimensional input feature matrix based on the complementary subset to obtain the complementary mask feature matrix. Finally, the mask feature matrix, the complementary mask feature matrix, and the normalized adjacency matrix are combined to form the augmented dataset. This complementary masking design ensures that each spatial location is masked in one view but not in the other, making each spatial location both a prediction target and a context provider. Spatial adjacency remains unchanged during masking, allowing graph convolution to aggregate information from unmasked neighbors and infer the masked content based on local spatial consistency. This data augmentation strategy maximizes data utilization, provides rich self-supervised signals through complementary views, enhances the model's robustness and generalization ability to spatial expression patterns, and preserves spatial adjacency, ensuring effective information aggregation by the graph neural network.
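A minimal sketch of the complementary masking in step 102 is given below. The function name and the fixed `mask_token` vector are illustrative assumptions; in the described model the mask token is a learnable parameter, which a framework such as PyTorch would hold, whereas here it is a constant so the example stays self-contained.

```python
import numpy as np

def complementary_mask_views(features, mask_token, mask_prob=0.5, seed=0):
    """Build two views in which each row is masked in exactly one view."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    masked = rng.random(n) < mask_prob          # mask subset (boolean per location)
    complement = ~masked                        # complementary subset
    view_a = np.where(masked[:, None], mask_token, features)
    view_b = np.where(complement[:, None], mask_token, features)
    return view_a, view_b, masked

# Toy feature matrix: 10 spatial locations, 2 features, all values nonzero.
x = np.arange(1, 21, dtype=float).reshape(10, 2)
token = np.zeros(2)                             # stand-in for the learnable token
view_a, view_b, masked = complementary_mask_views(x, token)
```

The adjacency matrix is deliberately not touched: both views share the same graph, so a masked location can still receive messages from its unmasked neighbors.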
[0015] 103. Obtain a pre-trained spatial domain recognition model, wherein the spatial domain recognition model includes a latent space representation learning module and a three-dimensional spatial clustering module, and the latent space representation learning module is connected to the three-dimensional spatial clustering module; In this embodiment, a pre-trained spatial domain recognition model is obtained. This model adopts the iSpa3D (interpretable spatial 3D framework) interpretable deep learning framework. The model consists of a latent space representation learning module and a 3D spatial clustering module, which form a hierarchical overall architecture. The latent space representation learning module is the core feature learning unit of the model, which integrates a graph encoding submodule, a graph decoder submodule, a clustering head submodule, a de-batching submodule, and an optimization submodule. The graph encoding submodule is responsible for receiving the augmented dataset and realizing the deep fusion and aggregation of gene expression features and spatial adjacency relationships. The graph decoder submodule completes the reconstruction representation of spatial structure based on the aggregated features. The clustering head submodule realizes the initial feature clustering assignment mapping. The de-batching submodule performs batch effect elimination and correction for multi-slice data. The optimization submodule completes the joint tuning of parameters of each submodule based on the multi-task loss signal. The 3D spatial clustering module is the downstream execution unit of the architecture, which is directly connected to the output of the latent space representation learning module and is used to perform 3D spatial structure construction and spatial domain clustering operations.
[0016] Model training begins with independent self-supervised pre-training of the latent space representation learning module. The training process combines multi-task losses, including reconstruction loss, clustering loss, triplet loss, and cross-entropy loss. The parameters of the graph encoding submodule, graph decoder submodule, clustering head submodule, and de-batching submodule are iteratively updated by the optimization submodule until the loss of each submodule converges. This ensures that the module learns a stable latent space embedding that integrates spatial information, eliminates batch differences, and possesses biological representation capability. After the latent space representation learning module completes pre-training and produces stable representations, training of the 3D spatial clustering module is initiated, using the target latent space embedding output by the latent space representation learning module as the core input. First, a 3D registration unit is trained. Biological anchor points are determined based on the target latent space embedding. The root mean square deviation (RMSD) between the spatial coordinates of the anchor points is calculated to construct a rigid transformation function, which is solved using singular value decomposition (SVD) to obtain the rotation matrix and translation vector. The registration parameters are iteratively optimized with minimization of the RMSD as the loss function, achieving precise alignment of all slices to a unified 3D coordinate system. Next, a Gaussian mixture model clustering unit is trained. Combining the aligned 3D spatial coordinates with the target latent space embedding, the clustering parameters are optimized with maximization of the silhouette coefficient as the objective, completing the initial spatial domain partitioning and the training of the clustering model.
After the 3D spatial clustering module training converges, it is structurally integrated with the pre-trained latent space representation learning module to complete the construction of the pre-trained model. The modular architecture design of the iSpa3D interpretable deep learning framework enables synergistic interaction between feature learning and 3D clustering tasks. The multi-task self-supervised pre-training of the latent space representation learning module ensures the quality of the target latent space embedding, effectively eliminating batch differences among multiple slices and fusing both spatial and gene expression information.
[0017] 104. Based on the latent space representation learning module, perform neighborhood aggregation processing on the augmented dataset to obtain the target latent space embedding; In this embodiment, when performing neighborhood aggregation on the augmented dataset, the feedforward neural network of the graph encoding submodule first applies two layers of nonlinear transformation to the mask feature matrix, using exponential linear units (ELU) and batch normalization to obtain projected features. The graph convolutional network then combines the projected features with the normalized adjacency matrix to perform neighborhood aggregation, extracting spatial dependency features with rectified linear units (ReLU) and batch normalization to generate the initial latent space embedding. The two views generated by the complementary masking strategy are encoded in this way to obtain the view embeddings. Subsequently, the graph decoder submodule reconstructs features from the initial latent space embedding based on the normalized adjacency matrix, reconstructing the PCA feature space through graph convolutional layers with an identity activation function. The scaled cosine error is calculated as the reconstruction loss, constraining the model to infer the masked features from spatial adjacency. The clustering head submodule maps the view embeddings through a multilayer perceptron to generate a cluster assignment matrix. The semantic distributions of the complementary views are aligned through a cluster contrastive loss, while a KL (Kullback-Leibler) divergence constraint on the average assignment probability is used to avoid cluster collapse. The de-batching submodule uses a nearest neighbor algorithm to mine anchors from the initial latent space embedding, constructs triplets, and calculates the triplet loss.
Simultaneously, it calls the discriminator to perform batch prediction on the initial latent space embedding and calculates the cross-entropy loss to eliminate batch effects. The optimization submodule updates the parameters of the graph encoding submodule using an alternating min-max strategy based on a weighted overall objective combining the reconstruction loss, clustering loss, triplet loss, and cross-entropy loss, until the model converges. The final output is a target latent space embedding that integrates spatial information and gene expression features while remaining batch-invariant. The graph convolutional network deeply fuses spatial adjacency relationships and gene expression features, while the complementary masking strategy and cluster contrastive learning enhance the model's robustness to spatial expression patterns. Nearest-neighbor triplet alignment and adversarial de-batching effectively eliminate batch differences between slices, giving the target latent space embedding spatial consistency, biological discriminative power, and batch invariance.
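Two of the loss terms named in step 104 can be illustrated numerically. The sketch below assumes the common form of the scaled cosine error, mean of (1 - cosine similarity) raised to a power `gamma`, evaluated on masked rows only, and a standard margin-based triplet loss on Euclidean distances. The exponent, the margin, and the restriction to masked rows are assumptions of this example; the description does not specify them.

```python
import numpy as np

def scaled_cosine_error(x, x_rec, masked, gamma=2.0, eps=1e-12):
    """Mean of (1 - cos(x_i, x_rec_i))**gamma over the masked rows."""
    x_m, r_m = x[masked], x_rec[masked]
    norms = np.linalg.norm(x_m, axis=1) * np.linalg.norm(r_m, axis=1)
    cos = (x_m * r_m).sum(axis=1) / (norms + eps)
    return float(np.mean((1.0 - cos) ** gamma))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull anchor-positive pairs together, push anchor-negative pairs apart."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy check: a perfect reconstruction has near-zero error, a sign flip has (1 - (-1))**2 = 4.
x = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
masked = np.array([True, False, True])
loss_perfect = scaled_cosine_error(x, x, masked)
loss_flipped = scaled_cosine_error(x, -x, masked)

# Toy triplet: the negative is already farther than the margin, so the loss is zero.
loss_easy = triplet_loss(np.zeros((1, 2)), np.zeros((1, 2)), np.array([[3.0, 4.0]]))
loss_hard = triplet_loss(np.zeros((1, 2)), np.array([[3.0, 4.0]]), np.zeros((1, 2)))
```

In the described framework these terms would be weighted and summed with the clustering and cross-entropy losses; the weights are not given in the text and are therefore not shown here.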
[0018] 105. Based on the three-dimensional spatial clustering module, cluster the target latent space embedding to obtain a three-dimensional spatial domain label; In this embodiment, when clustering the target latent space embedding based on the 3D spatial clustering module, biological anchor points are first determined by finding the nearest neighbors of the target latent space embedding in the latent space. This ensures that the alignment process is driven by transcriptomic similarity rather than just physical proximity. Then, the spatial coordinates of the anchor points corresponding to each biological anchor point are obtained from the initial multi-slice spatial transcriptomic dataset. Next, the root mean square deviation between the spatial coordinates of each anchor point is calculated, and a rigid transformation function to minimize this deviation is constructed. This function is then analytically solved using a singular value decomposition tool to obtain the optimal rotation matrix and translation vector. All slices are registered to a unified 3D coordinate system, resulting in the 3D spatial coordinates corresponding to each sequencing point. Finally, a clustering analysis tool is used to perform clustering operations by combining the aligned 3D spatial coordinates with the target latent space embedding, generating 3D spatial domain labels. This progressive registration scheme based on latent representation effectively avoids alignment biases that rely solely on physical proximity by using transcriptome similarity-driven anchor point determination, thus improving the biological rationality of multi-slice 3D reconstruction. Singular value decomposition ensures the optimality of rigid transformation, making slice registration more accurate. 
The clustering strategy combining 3D spatial coordinates and target latent space embedding fully integrates spatial location information and gene expression characteristics, significantly improving the accuracy and biological interpretability of 3D spatial domain labels.
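The SVD-based solution of the rigid transformation in step 105 corresponds to the classical Kabsch procedure, sketched below for matched anchor coordinates. The sketch assumes the anchor pairs have already been found (for example by nearest-neighbor search in the latent space) and aligns 2D in-slice coordinates; the function names are hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Return rotation R and translation t minimizing RMSD of (R @ src_i + t) to dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)          # cross-covariance of centered points
    u, _, vt = np.linalg.svd(h)
    det = np.linalg.det(vt.T @ u.T)
    corr = np.eye(src.shape[1])
    corr[-1, -1] = 1.0 if det >= 0 else -1.0     # avoid returning a reflection
    rot = vt.T @ corr @ u.T
    trans = dst_c - rot @ src_c
    return rot, trans

def rmsd(a, b):
    """Root mean square deviation between matched point sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Toy example: anchors on one slice are a rotated and shifted copy of the other.
rng = np.random.default_rng(0)
src = rng.random((8, 2))
theta = np.pi / 6
true_rot = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
dst = src @ true_rot.T + np.array([2.0, -1.0])

rot, trans = rigid_align(src, dst)
aligned = src @ rot.T + trans
```

Applying `rot` and `trans` to every sequencing point of the source slice, not only to the anchors, registers the whole slice to the reference coordinate frame.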
[0019] 106. Introduce a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module. Based on the updated latent space representation learning module, perform gene contribution calculation on the three-dimensional spatial domain labels to obtain an SVGs list.
[0020] In this embodiment, a classifier submodule is introduced into the latent space representation learning module to obtain an updated latent space representation learning module. First, a classifier submodule with a two-layer multilayer perceptron structure is constructed, and its input is connected to the output of the graph encoding submodule. It takes the target latent space embedding output by the graph encoding submodule as input and outputs, for each sequencing point, the predicted probability of each three-dimensional spatial domain label. Then, using the cross-entropy loss with the three-dimensional spatial domain labels as supervision signals, the parameters of the graph encoding submodule and the classifier submodule are jointly fine-tuned without freezing the pre-trained parameters of the graph encoding submodule. Iterative parameter optimization yields the updated latent space representation learning module. Based on this updated module, gene contribution calculation is performed on the 3D spatial domain labels to obtain the SVGs list. The core process is as follows. The 3D spatial domain labels are used as supervision to fine-tune the target latent space embedding, generating a fine-tuned latent space embedding that more accurately captures domain-specific features. The target domain is determined from the 3D spatial domain labels, and its set of spatially neighboring domains is defined. A spatial neighborhood baseline set is constructed using the feature mean of the sequencing points within this set as the reference. Three complementary gradient attribution algorithms (integrated gradients, masking, and GradientSHAP) are applied together with the spatial neighborhood baseline set to perform feature-level attribution on the fine-tuned latent space embedding, yielding a set of attribution scores. The PCA loading matrix is then used to back-project the feature-level attribution scores to the original gene space, generating gene-level attribution scores.
After L2 normalization of the gene-level attribution scores, the difference between the average attribution score of the target domain and that of the set of spatially neighboring domains is calculated to obtain differential attribution scores. The Borda count is used to fuse the rankings of the differential attribution scores from the three attribution algorithms into gene consensus scores. Finally, the preset screening rules are obtained, and the genes in the spatial neighborhood baseline set are screened by their gene consensus scores, either with a threshold of the mean plus 1.5 times the standard deviation after logarithmic transformation, or with a Top-K or elbow-rule strategy applied to the consensus scores in descending order, finally yielding the SVGs list. Joint fine-tuning achieves domain-specific optimization of the latent space embedding, aligning the feature representations more closely with the biological characteristics of the three-dimensional spatial domains and providing a high-quality foundation for gene contribution analysis. Baseline sets constructed from spatially neighboring domains overcome the limitations of global baselines, making attribution analysis more consistent with the local spatial microenvironment. Fusing three attribution algorithms with the Borda count improves the robustness and objectivity of the gene contribution quantification, avoiding biases caused by any single algorithm.
The dual-track screening rule balanced the biological specificity and screening flexibility of SVGs, achieving deep synergy between three-dimensional spatial domain identification and domain-specific gene mining, filling the technical gap in gene interpretability within the three-dimensional reconstruction framework, and effectively improving the completeness, accuracy, and biological interpretability of spatial transcriptomics analysis.
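The post-processing of attribution scores described above (back-projection through the PCA loadings, differential scoring against neighboring domains, and Borda count fusion) can be sketched as follows. All function names are hypothetical, the attribution scores themselves are taken as given inputs, and ties in the Borda ranking are broken by gene order, which is an arbitrary choice of this example.

```python
import numpy as np

def back_project(pc_scores, loadings):
    """Map PC-level attributions (spots x PCs) to gene level (spots x genes).

    `loadings` has shape (PCs x genes), i.e. one principal axis per row.
    """
    return pc_scores @ loadings

def differential_score(gene_scores, in_target, in_neighbors):
    """L2-normalize each spot's scores, then subtract the neighbor mean from the target mean."""
    norms = np.linalg.norm(gene_scores, axis=1, keepdims=True)
    normed = gene_scores / np.maximum(norms, 1e-12)
    return normed[in_target].mean(axis=0) - normed[in_neighbors].mean(axis=0)

def borda_fuse(score_lists):
    """Sum Borda points over methods: the top-ranked gene of n gets n - 1 points."""
    n_genes = len(score_lists[0])
    points = np.zeros(n_genes)
    for scores in score_lists:
        order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
        points[order] += np.arange(n_genes - 1, -1, -1)
    return points

# Back-projection: 1 spot, 2 PCs, 3 genes.
projected = back_project(np.array([[1.0, 2.0]]),
                         np.array([[1.0, 0.0, 1.0],
                                   [0.0, 1.0, 1.0]]))

# Differential score: spot 0 is in the target domain, spot 1 in a neighboring domain.
diff = differential_score(np.array([[3.0, 4.0], [0.0, 2.0]]),
                          np.array([True, False]), np.array([False, True]))

# Borda fusion of three methods' scores over four genes.
consensus = borda_fuse([[0.9, 0.1, 0.5, 0.3],
                        [0.8, 0.2, 0.7, 0.1],
                        [0.4, 0.3, 0.9, 0.1]])
top_genes = np.argsort(-consensus)[:2]           # a Top-K selection with K = 2
```

Rank-based fusion makes the three methods comparable even though their raw attribution scales differ, which is the usual motivation for a Borda count over averaging raw scores.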
[0021] To comprehensively verify the performance of the spatial domain identification method in spatial transcriptomics, this embodiment constructs a multi-dimensional evaluation index system covering clustering accuracy, spatial coherence, and batch integration quality. The effectiveness and generalization ability of the method are verified through high-resolution gene expression matrix datasets from multiple species, multiple technology platforms, and multiple biological scenarios.
[0022] In assessing clustering accuracy, the adjusted Rand index (ARI) is used to quantify the consistency between the predicted spatial domain labels and the true labels. This index measures the similarity between the two partitions after correcting for chance agreement, and is calculated as $\mathrm{ARI} = \dfrac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right] - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]/\binom{n}{2}}$, where $n_{ij}$ is the number of sequencing points of true class $i$ that are predicted to belong to cluster $j$, $a_i$ is the total number of sequencing points in true class $i$, $b_j$ is the total number of sequencing points in predicted cluster $j$, $n$ is the total number of sequencing points, and $\binom{n}{2}$ is the binomial coefficient, i.e. the number of ways of choosing two sequencing points from $n$ sequencing points. To further quantify the information-theoretic quality of the clustering, normalized mutual information (NMI), homogeneity (HOM), and completeness (COM) scores are calculated at the same time. The normalized mutual information is calculated as $\mathrm{NMI}(Y, C) = \dfrac{2\,[H(Y) - H(Y \mid C)]}{H(Y) + H(C)}$, where the entropy is given by $H(X) = -\sum_{x} p(x)\log p(x)$, $Y$ denotes the true labels, $C$ denotes the predicted cluster assignment, $H(Y \mid C)$ denotes the conditional entropy, $H(Y)$ denotes the entropy of the true labels, and $H(C)$ denotes the entropy of the predicted cluster assignment. Homogeneity and completeness are calculated as $\mathrm{HOM} = 1 - \dfrac{H(Y \mid C)}{H(Y)}$ and $\mathrm{COM} = 1 - \dfrac{H(C \mid Y)}{H(C)}$, respectively. Homogeneity measures whether each cluster contains sequencing points of a single true class, while completeness measures whether all sequencing points of a given true class are assigned to the same cluster. These three metrics are aggregated by a preset formula into a single comprehensive score, allowing a balanced assessment of clustering quality from several information-theoretic perspectives.
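As an illustration of the adjusted Rand index, the pair-counting definition can be evaluated directly from the contingency counts. The sketch below uses only the Python standard library; the function name is illustrative, and the degenerate-case handling (returning 1.0 when both partitions are trivial) is an assumption.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """ARI from contingency counts; requires at least two points."""
    n = len(true_labels)
    n_ij = Counter(zip(true_labels, pred_labels))  # points in class i and cluster j
    a_i = Counter(true_labels)                     # points per true class
    b_j = Counter(pred_labels)                     # points per predicted cluster
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a_i.values())
    sum_b = sum(comb(c, 2) for c in b_j.values())
    expected = sum_a * sum_b / comb(n, 2)          # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                      # both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because the index counts pairs, it is invariant to how the cluster labels are named: swapping label 0 and label 1 in the prediction leaves the score unchanged.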
[0023] In terms of spatial coherence assessment, two complementary metrics are used to quantify how consistently spatially neighboring sequencing points are assigned to domains. The first metric is CHAOS (spatial discontinuity index), which quantifies the average distance from each sequencing point to its nearest neighbor within the same cluster. It is calculated as $\mathrm{CHAOS} = \dfrac{1}{N}\sum_{k}\sum_{i:\,c_i = k}\ \min_{j \neq i,\ c_j = k}\lVert s_i - s_j\rVert_2$, where $N$ is the total number of sequencing points, $c_i$ is the cluster to which sequencing point $i$ belongs, and $s_i$ and $s_j$ are the spatial coordinates of sequencing points $i$ and $j$. The second metric is the alignment purity score, i.e. the percentage of abnormal spots (PAS), calculated as $\mathrm{PAS} = \dfrac{1}{N}\sum_{i=1}^{N}\mathbb{1}\left(\sum_{j \in \mathcal{N}_{10}(i)}\mathbb{1}(l_i \neq l_j) \geq 6\right)$, where $l_i$ is the spatial domain label of sequencing point $i$, $\mathcal{N}_{10}(i)$ is the set of the ten nearest neighbor sequencing points of sequencing point $i$, and $\mathbb{1}(\cdot)$ is the indicator function, which takes the value 1 when the condition in parentheses is true and 0 otherwise. This metric is defined as the percentage of sequencing points whose spatial domain label differs from the labels of at least six of their ten nearest neighbors. A lower value indicates higher spatial homogeneity in the local neighborhood. The CHAOS score and the PAS score are aggregated by a preset formula into a comprehensive measure of spatial discontinuity; the lower its value, the stronger the spatial coherence of the spatial domain assignment.
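The two spatial coherence metrics can be sketched with NumPy as below. This is a dense, brute-force illustration under stated assumptions: all pairwise distances are computed in memory, coordinates are used as given (no rescaling), and singleton clusters contribute nothing to CHAOS. The function names are illustrative.

```python
import numpy as np

def chaos_score(coords, labels):
    """Mean distance from each point to its nearest same-cluster neighbor."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        pts = coords[labels == k]
        if len(pts) < 2:
            continue  # a singleton cluster has no same-cluster neighbor
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)        # exclude self-distances
        total += d.min(axis=1).sum()
    return total / len(coords)

def pas_score(coords, labels, n_neighbors=10, min_diff=6):
    """Fraction of points whose label differs from at least `min_diff`
    of their `n_neighbors` nearest spatial neighbors."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :n_neighbors]
    differing = (labels[nn] != labels[:, None]).sum(axis=1)
    return float((differing >= min_diff).mean())
```

For datasets with many thousands of sequencing points, a spatial index such as a k-d tree would replace the dense distance matrix.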
[0024] In batch integration quality assessment, the local inverse Simpson index (LISI) is used to quantify local mixing in the latent space, thereby evaluating how effectively batch effects are removed while biological variation is preserved. The batch local inverse Simpson index and the domain local inverse Simpson index are combined into an F-score, calculated as $F_{\mathrm{LISI}} = \dfrac{2\,(1 - \mathrm{dLISI}_{\mathrm{norm}})\,\mathrm{bLISI}_{\mathrm{norm}}}{(1 - \mathrm{dLISI}_{\mathrm{norm}}) + \mathrm{bLISI}_{\mathrm{norm}}}$, where the standardized batch local inverse Simpson index is $\mathrm{bLISI}_{\mathrm{norm}} = \dfrac{\mathrm{bLISI} - 1}{B - 1}$, the standardized domain local inverse Simpson index is $\mathrm{dLISI}_{\mathrm{norm}} = \dfrac{\mathrm{dLISI} - 1}{D - 1}$, $B$ is the total number of batches, and $D$ is the total number of spatial domains. $\mathrm{bLISI}$ denotes the batch local inverse Simpson index; a higher value indicates better mixing of the different slices. $\mathrm{dLISI}$ denotes the domain local inverse Simpson index; a lower value indicates that the biological structure is more completely preserved. A higher F-score indicates that the method achieves better batch integration without losing biological signal.
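The combination step can be illustrated with a small helper. The sketch assumes the harmonic-mean form commonly used for LISI-based F-scores, with each index rescaled to the range 0 to 1 by its theoretical minimum (1) and maximum (the number of batches or domains). The function name and the zero-denominator convention are assumptions, and the LISI values themselves are taken as given.

```python
def lisi_f_score(batch_lisi, domain_lisi, n_batches, n_domains):
    """Harmonic-mean style F-score of batch mixing and domain separation."""
    b_norm = (batch_lisi - 1.0) / (n_batches - 1.0)   # 1 = perfectly mixed
    d_norm = (domain_lisi - 1.0) / (n_domains - 1.0)  # 0 = domains separated
    denom = (1.0 - d_norm) + b_norm
    if denom == 0.0:                                  # assumed convention
        return 0.0
    return 2.0 * (1.0 - d_norm) * b_norm / denom

# Example: three slices, seven domains, moderate mixing and separation.
example = lisi_f_score(batch_lisi=2.0, domain_lisi=4.0, n_batches=3, n_domains=7)
```

The score reaches 1 only when batches are fully mixed and domains remain fully separated, and drops to 0 when either property is entirely absent.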
[0025] To further verify the effectiveness and generalization ability of the method, as shown in part F of Figure 2, this embodiment applies the method to a diverse set of high-resolution gene expression matrix datasets covering multiple species, technology platforms, and complex biological backgrounds. The datasets include the human dorsolateral prefrontal cortex dataset from the 10x Genomics Visium platform, which provides the gold standard for cortical layer identification. As shown in Figure 3, the slices from the three donors were manually annotated with cortical layers and spatial domains; the integration and alignment of the embeddings are visualized using UMAP (Uniform Manifold Approximation and Projection); box plots compare the performance of various methods in clustering and batch integration; and the cross-slice integration results verify the consistency of spatial domain identification and show the identified spatial domains together with the raw and denoised expression of marker genes. The datasets further include: a mouse embryonic developmental atlas dataset and a salamander telencephalon maturation dataset generated by Stereo-seq (spatial enhanced resolution omics-sequencing), used to evaluate the method's ability to handle morphological changes; a large-scale anatomical atlas dataset of 35 coronal slices of the entire mouse brain and a sagittal brain slice dataset, used to verify the method's adaptability to different anatomical perspectives; a high-resolution targeted gene dataset from the mouse hypothalamus acquired with MERFISH (multiplexed error-robust fluorescence in situ hybridization), used to test the method's performance in high-resolution targeted gene analysis; and a mouse primary olfactory bulb dataset integrated from the Stereo-seq and Slide-seqV2 platforms, used to rigorously test the method's technical robustness.
As shown in Figure 4, with the laminar organization of the mouse olfactory bulb as a biological reference, the method effectively integrates data from different platforms and preserves rare populations, generating laminar structures consistent with the known biology and outperforming several baseline methods. Furthermore, the spatially variable genes identified by the method exhibit high spatial specificity and enhance the expression contrast of layer-specific marker genes. The datasets also include a high-resolution STARmap PLUS (Spatially-resolved Transcript Amplicon Readout Mapping PLUS) dataset from an Alzheimer's disease mouse model, used to validate the method's practicality in capturing neurodegenerative changes at different disease stages in a pathological context. As shown in Figure 5, spatial domain identification and 3D reconstruction of the hippocampus are achieved on a single-cell-resolution mouse brain dataset. Comparison with various baseline methods shows the advantages of the method in clustering accuracy and batch integration performance. Marker genes of the different spatial domains are also identified, and their 3D expression patterns and denoising effects are shown. On a cross-species salamander brain dataset, the performance of the method in spatial domain identification at the juvenile and adult stages is also verified. In addition, the expression patterns of classic marker genes are verified through 3D visualization, demonstrating the practicality and interpretability of the method in pathological contexts and cross-species scenarios.
These diverse datasets provide robust and multifaceted benchmarks for verifying the method's ability to perform 3D reconstruction and interpretable analysis in a wide range of real-world scenarios, demonstrating the technical feasibility and practical value of the method.
[0026] In this embodiment of the invention, the initial multi-slice spatial transcriptome dataset includes multiple slices and a gene expression matrix corresponding to each slice. The gene expression matrix includes multiple sequencing points, and spatial coordinates and multiple genes corresponding to each sequencing point. Performing data preprocessing on the initial multi-slice spatial transcriptome dataset to obtain a preprocessed dataset includes: calculating the Euclidean distance between the spatial coordinates to construct a directed adjacency matrix corresponding to each slice; performing symmetry processing on each directed adjacency matrix to obtain multiple undirected sparse adjacency matrices; performing block diagonal concatenation on the undirected sparse adjacency matrices to obtain a global adjacency matrix, and performing symmetric normalization on the global adjacency matrix to obtain a normalized adjacency matrix; obtaining a preset gene filtering rule, and filtering the genes based on the gene filtering rule to obtain multiple filtered genes corresponding to each sequencing point; calculating the variability parameter of each filtered gene using a spatial transcriptome analysis tool, screening multiple highly variable genes from the filtered genes based on the variability parameter, and normalizing the highly variable genes to obtain a high-dimensional expression matrix; reducing the dimensionality of the high-dimensional expression matrix using principal component analysis (PCA) to obtain a low-dimensional input feature matrix; and integrating the normalized adjacency matrix and the low-dimensional input feature matrix to obtain the preprocessed dataset.
[0027] In this embodiment, the initial multi-slice spatial transcriptome dataset contains multiple slices and a gene expression matrix corresponding to each slice. The gene expression matrix contains multiple sequencing points and their corresponding spatial coordinates and genes. During data preprocessing, the Euclidean distance between the spatial coordinates is first calculated. A spatial adjacency graph is constructed for each slice, connecting each spatial location to its $k$ nearest neighbors ($k$ is 6 by default). A directed adjacency matrix is defined: if position $j$ is one of the $k$ nearest neighbors of position $i$, the corresponding element is 1; otherwise, it is 0. Symmetry processing is performed on each directed adjacency matrix by adding it to its transpose to obtain an undirected sparse adjacency matrix, ensuring bidirectional adjacency relationships while maintaining the sparsity of the matrix. For the multi-slice data, a hierarchical strategy is adopted in which block diagonal concatenation is performed on the undirected sparse adjacency matrices: the adjacency matrix of each slice is placed on the main diagonal of the global adjacency matrix and all off-diagonal blocks are set to zero, forming the global adjacency matrix shown in part A of Figure 2. The global adjacency matrix is then subjected to symmetric normalization, calculated as $\hat{A} = \tilde{D}^{-1/2}\,(A + I)\,\tilde{D}^{-1/2}$, where $\hat{A}$ is the result of symmetric normalization of the global adjacency matrix $A$; $I$ is the identity matrix, used to introduce self-loops so that each sequencing point retains its own features during neighborhood aggregation; and $\tilde{D}$ is the degree matrix of $A + I$, a diagonal matrix of the same dimension. The inverse square root $\tilde{D}^{-1/2}$ of the degree matrix applies a weighted normalization to the adjacency relationships, balancing differences in the number of neighbors of different sequencing points and preventing sequencing points with many neighbors from dominating the aggregation process. This normalization makes the scale of the adjacency matrix more stable and meets the spatial graph input requirements of the subsequent graph neural network.
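The graph construction described in this paragraph can be sketched as follows. The example uses dense NumPy arrays for brevity, whereas the embodiment uses sparse matrices. Clipping the symmetrized matrix to binary entries is an assumption (adding the matrix to its transpose would otherwise give the value 2 for mutual neighbors), and the function names are illustrative.

```python
import numpy as np

def knn_adjacency(coords, k=6):
    """Undirected k-nearest-neighbor adjacency for one slice."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # no self-neighbors
    nn = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), nn.ravel()] = 1.0   # directed edges i -> j
    return np.minimum(a + a.T, 1.0)              # symmetrize, keep binary

def block_diagonal(mats):
    """Place per-slice matrices on the diagonal; off-diagonal blocks are zero."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    start = 0
    for m in mats:
        stop = start + m.shape[0]
        out[start:stop, start:stop] = m
        start = stop
    return out

def sym_normalize(a):
    """D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

Because the off-diagonal blocks are zero, no information flows between slices through the graph itself; cross-slice alignment is left to the batch-correction losses described later.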
[0028] Next, a preset gene filtering rule is obtained, which removes genes detected in fewer than 50 sequencing points or with a total count of less than 10. The genes are filtered according to this rule to obtain multiple filtered genes corresponding to each sequencing point. A spatial transcriptome analysis tool, such as Seurat v3, is used to calculate the variability parameter of each filtered gene, and based on this parameter the 2000 most highly variable genes are selected from the filtered genes. These highly variable genes are then z-score normalized to obtain a high-dimensional expression matrix, eliminating scale differences in gene expression levels and highlighting genes with significant biological variation. Principal component analysis (PCA) is used to reduce the dimensionality of the high-dimensional expression matrix, retaining the first 200 principal components and yielding the low-dimensional input feature matrix shown in part A of Figure 2, which reduces data dimensionality while preserving key expression features and thus lowers the computational cost of subsequent modeling. Finally, the normalized adjacency matrix and the low-dimensional input feature matrix are integrated to obtain the preprocessed dataset. The normalized adjacency matrix captures the spatial proximity relationships between sequencing points and provides reliable spatial constraints for the subsequent graph neural network. The multi-step gene processing removes low-expression noise genes and focuses the analysis on highly variable genes with significant biological variation, improving analytical sensitivity. Standardization and PCA dimensionality reduction eliminate scale differences in gene expression levels, reduce data dimensionality, and improve the efficiency and stability of model training.
[0029] In this embodiment of the invention, the step of performing data augmentation on the preprocessed dataset to obtain an augmented dataset includes: obtaining a mask subset and a complementary subset corresponding to the mask subset; performing mask label replacement processing on the low-dimensional input feature matrix based on the mask subset and the complementary subset respectively to obtain a mask feature matrix and a complementary mask feature matrix; and integrating the mask feature matrix, the complementary mask feature matrix and the normalized adjacency matrix to obtain the augmented dataset.
[0030] In this embodiment, a mask subset and its corresponding complementary subset are obtained. The mask subset $\mathcal{M}$ is randomly sampled from the set of spatial location indices, and its complementary subset $\bar{\mathcal{M}}$ is defined such that the two subsets are disjoint and together cover all spatial locations. The masking rate is typically set to 0.5, meaning that approximately half of the spatial locations are masked. Based on the mask subset and the complementary subset respectively, mask token replacement is performed on the low-dimensional input feature matrix $X$, constructing the mask feature matrix $\tilde{X}^{(1)} \in \mathbb{R}^{N \times d}$ and the complementary mask feature matrix $\tilde{X}^{(2)} \in \mathbb{R}^{N \times d}$ shown in part A of Figure 2, where $\mathbb{R}$ denotes the set of real numbers, $N$ denotes the number of sequencing points, and $d$ denotes the feature dimension after PCA dimensionality reduction. For any spatial location $i$, if $i$ belongs to the mask subset $\mathcal{M}$, its feature vector is replaced with a learnable mask token $m$, i.e. the $i$-th row of $\tilde{X}^{(1)}$ equals $m$; otherwise the $i$-th row of $\tilde{X}^{(1)}$ equals the $i$-th row of the low-dimensional input feature matrix $X$. The complementary mask feature matrix $\tilde{X}^{(2)}$ is constructed in the same way from the complementary subset $\bar{\mathcal{M}}$. By integrating the mask feature matrix, the complementary mask feature matrix, and the normalized adjacency matrix, the augmented dataset is obtained. The normalized adjacency matrix remains unchanged during masking, so that graph convolution can aggregate information from unmasked neighbors and infer the masked content from local spatial consistency. This complementary masking strategy maximizes data utilization: each spatial location is masked in one view but not in the other, so each spatial location serves both as a prediction target and as a context provider, giving the model rich self-supervised signals. Keeping the normalized adjacency matrix unchanged ensures that the graph neural network can use spatial adjacency information to aggregate features, improving the model's robustness and generalization ability with respect to spatial expression patterns. The augmented dataset provides more challenging and informative training data for subsequent latent space representation learning, enhancing the model's ability to represent multi-slice spatial transcriptome data and contributing to feature representations that are more spatially consistent and biologically discriminative.
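A minimal sketch of the complementary masking step is given below. In the embodiment the mask token is a learnable parameter; here it is a fixed vector purely for illustration, and the function names and the seed argument are assumptions.

```python
import numpy as np

def complementary_masks(n_spots, mask_rate=0.5, seed=0):
    """Return a boolean mask and its complement over spot indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_spots)
    n_masked = int(round(mask_rate * n_spots))
    mask = np.zeros(n_spots, dtype=bool)
    mask[perm[:n_masked]] = True
    return mask, ~mask          # disjoint, and together cover every spot

def apply_mask(x, mask, mask_token):
    """Replace the rows selected by `mask` with the mask token."""
    x_masked = x.copy()
    x_masked[mask] = mask_token  # token is broadcast over masked rows
    return x_masked

# Two views of the same feature matrix; the adjacency matrix is shared.
x = np.arange(40, dtype=float).reshape(10, 4)
token = np.full(4, -1.0)         # stand-in for the learnable mask token
m, m_comp = complementary_masks(len(x))
view_1 = apply_mask(x, m, token)
view_2 = apply_mask(x, m_comp, token)
```

Every row of `x` is hidden in exactly one of the two views, which is what lets each spot act as a prediction target in one view and as context in the other.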
[0031] In this embodiment of the invention, the latent space representation learning module includes a graph encoding submodule, a graph decoder submodule, a clustering head submodule, a de-batching submodule, and an optimization submodule. The graph encoding submodule is connected to the graph decoder submodule, the clustering head submodule, the de-batching submodule, and the optimization submodule, respectively. The graph encoding submodule includes a feedforward neural network and a graph convolutional network that are connected to each other. The de-batching submodule includes a discriminator. The step of performing neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain the latent space embedding includes: performing nonlinear transformation processing on the mask feature matrix based on the feedforward neural network to obtain projected features; performing neighborhood aggregation processing on the projected features and the normalized adjacency matrix based on the graph convolutional network to obtain an initial latent space embedding; and obtaining a preset view embedding based on the initial latent space embedding. The graph decoder submodule performs feature reconstruction processing on the initial latent space embedding based on the normalized adjacency matrix to obtain a reconstructed feature matrix, and calculates the reconstruction loss based on the reconstructed feature matrix. The clustering head submodule performs clustering assignment mapping processing on the preset view embedding to obtain a clustering assignment matrix, and calculates the clustering contrast loss based on the clustering assignment matrix to obtain the clustering loss. The de-batching submodule uses the mutual nearest neighbors (MNN) algorithm to mine anchors in the initial latent space embedding to construct triplets, and calculates the triplet loss based on the triplets. The discriminator is called to perform batch prediction processing on the initial latent space embedding to obtain the cross-entropy loss. The optimization submodule updates the parameters of the graph encoding submodule based on the reconstruction loss, the clustering loss, the triplet loss, and the cross-entropy loss, so that the updated graph encoding submodule outputs the target latent space embedding.
[0032] In this embodiment, when performing neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module, the feedforward neural network of the graph encoding submodule first performs nonlinear transformation processing on the mask feature matrix. With the mask feature matrix $\tilde{X}$ as the initial input $H^{(0)} = \tilde{X}$, the projected features are obtained through a two-layer feedforward neural network, calculated as $H^{(l)} = \mathrm{ELU}\left(\mathrm{BN}\left(H^{(l-1)} W^{(l)} + b^{(l)}\right)\right),\ l = 1, \dots, L$, with $P = H^{(L)}$, where $P$ denotes the projected features, $L$ denotes the number of layers of the feedforward neural network and has a value of 2, $\mathrm{ELU}(\cdot)$ is the exponential linear unit activation function, $\mathrm{BN}(\cdot)$ denotes the batch normalization operation, and $W^{(l)}$ and $b^{(l)}$ denote the trainable weight matrix and bias vector of the $l$-th layer.
[0033] Subsequently, the graph convolutional network (GCN) performs neighborhood aggregation based on the projected features $P$ and the normalized adjacency matrix $\hat{A}$. The first round of neighborhood aggregation is completed through the formula $H_g = \mathrm{ReLU}\left(\hat{A} P W_g^{(1)} + b_g^{(1)}\right)$, and the initial latent space embedding $Z \in \mathbb{R}^{N \times d_z}$ is then generated through the formula $Z = \hat{A} H_g W_g^{(2)} + b_g^{(2)}$, where $N$ denotes the number of sequencing points, $d_z$ denotes the dimension of the latent space, $\mathrm{ReLU}(\cdot)$ denotes the rectified linear unit activation function, $\hat{A}$ denotes the normalized adjacency matrix, and $W_g^{(1)}$, $b_g^{(1)}$, $W_g^{(2)}$, and $b_g^{(2)}$ are the trainable weight matrices and bias vectors of the two layers of the graph convolutional network. At the same time, the complementary mask view is processed in the same way, and the preset view embeddings are obtained from the initial latent space embedding. The preset views comprise two latent space embeddings, corresponding to the mask view and the complementary mask view.
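The forward pass of the graph encoding submodule (a two-layer feedforward network followed by two graph convolution layers) can be sketched in NumPy as below. This is an inference-only illustration with randomly initialized weights. The ordering of batch normalization and activation, the absence of an activation on the last layer, the layer widths, and all names are assumptions, not details confirmed by the source.

```python
import numpy as np

def elu(x):
    # Exponential linear unit; the clamp avoids overflow in exp for large x.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def batch_norm(x, eps=1e-5):
    # Per-feature standardization (no learnable scale or shift here).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def init_params(d_in, d_hidden, d_latent, seed=0):
    rng = np.random.default_rng(seed)
    def layer(i, o):
        return rng.normal(0.0, 1.0 / np.sqrt(i), (i, o)), np.zeros(o)
    return {"ffn": [layer(d_in, d_hidden), layer(d_hidden, d_hidden)],
            "gcn": [layer(d_hidden, d_hidden), layer(d_hidden, d_latent)]}

def encode(x_masked, a_hat, params):
    h = x_masked
    for w, b in params["ffn"]:                  # two feedforward layers
        h = elu(batch_norm(h @ w + b))
    (w1, b1), (w2, b2) = params["gcn"]
    h = np.maximum(a_hat @ h @ w1 + b1, 0.0)    # first aggregation + ReLU
    return a_hat @ h @ w2 + b2                  # latent embedding Z
```

Calling `encode` once with the mask feature matrix and once with the complementary mask feature matrix, with the same `a_hat` and `params`, yields the two view embeddings.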
[0034] Next, the graph decoder submodule performs feature reconstruction processing on the initial latent space embedding based on the normalized adjacency matrix, using the formula $\hat{X} = \mathrm{BN}\left(\hat{A} Z W_{\mathrm{dec}} + b_{\mathrm{dec}}\right)$ to obtain the reconstructed feature matrix $\hat{X}$ shown in part B of Figure 2, where $\mathrm{BN}(\cdot)$ denotes the batch normalization operation, $W_{\mathrm{dec}}$ and $b_{\mathrm{dec}}$ are the trainable weight matrix and bias vector of the graph decoder, $\hat{A}$ denotes the normalized adjacency matrix, and $Z$ denotes the initial latent space embedding. The reconstruction loss is calculated based on the reconstructed feature matrix using the scaled cosine error loss function. The loss is calculated only on the mask subset, which constrains the model to infer the masked features from the spatial neighborhood.
[0035] The clustering head submodule performs clustering assignment mapping on the preset view embeddings using the formula $Q^{(v)} = f\left(Z^{(v)}\right) = \mathrm{softmax}\left(\mathrm{GELU}\left(Z^{(v)} W_c^{(1)} + b_c^{(1)}\right) W_c^{(2)} + b_c^{(2)}\right)$ to obtain the clustering assignment matrix $Q^{(v)} \in \mathbb{R}^{N \times K}$, where $v$ denotes the preset view, $f(\cdot)$ denotes the mapping function of the clustering head, $Z^{(v)}$ denotes the latent space embedding matrix corresponding to view $v$, $\mathrm{GELU}(\cdot)$ denotes the Gaussian error linear unit activation function, $W_c^{(1)}$, $b_c^{(1)}$ and $W_c^{(2)}$, $b_c^{(2)}$ are the trainable weight matrices and bias vectors of the two layers of the clustering multilayer perceptron, $\mathbb{R}$ denotes the set of real numbers, $N$ denotes the number of sequencing points, $K$ denotes the number of cluster categories, and $q_{ik}^{(v)}$ denotes the probability that sequencing point $i$ is assigned to cluster $k$ in the preset view $v$. The clustering contrast loss is then calculated as $\mathcal{L}_{\mathrm{con}} = -\dfrac{1}{K}\sum_{k=1}^{K}\log\dfrac{\exp\left(s\left(c_k^{(v)}, c_k^{(v')}\right)/\tau\right)}{\sum_{k'=1}^{K}\exp\left(s\left(c_k^{(v)}, c_{k'}^{(v')}\right)/\tau\right)}$, where $v'$ is the view complementary to $v$, $c_k^{(v)}$ denotes the normalized prototype vector of cluster $k$ in view $v$, $c_k^{(v')}$ denotes the normalized prototype vector of cluster $k$ in view $v'$, $c_{k'}^{(v')}$ denotes the normalized prototype vector of cluster $k'$ in view $v'$, $k'$ is the index that iterates over all cluster categories, $s(\cdot,\cdot)$ denotes the cosine similarity function, and $\tau$ denotes the temperature parameter. At the same time, the KL divergence between the average assignment distribution vector and the uniform distribution is minimized to avoid cluster collapse, and the clustering loss is finally obtained.
[0036] As shown in parts C and D of Figure 2, the de-batching submodule uses the MNN algorithm to mine anchors in the initial latent space embedding. For an anchor spatial location $a$ in the source slice, its MNN-matched positive sample $p$ in the target slice is identified and a negative sample $n$ is sampled, constructing the triplet $(a, p, n)$. The triplet loss is then calculated using the formula $\mathcal{L}_{\mathrm{tri}} = \dfrac{1}{|\mathcal{T}|}\sum_{(a,p,n) \in \mathcal{T}}\left[\,s(z_a, z_n) - s(z_a, z_p) + m\,\right]_{+}$, where $\mathcal{L}_{\mathrm{tri}}$ denotes the triplet loss, $\mathcal{T}$ denotes the set of triplets, $m$ denotes the margin (default value 1), $s(\cdot,\cdot)$ denotes the cosine similarity function, $z_a$, $z_p$, and $z_n$ denote the initial latent space embeddings of the anchor, the positive sample, and the negative sample, respectively, and $[\cdot]_{+} = \max(\cdot, 0)$ takes the positive part. At the same time, the discriminator is invoked to perform batch prediction processing on the initial latent space embedding. The discriminator is a three-layer multilayer perceptron, and the cross-entropy loss is calculated using the formula $\mathcal{L}_{\mathrm{adv}} = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{b=1}^{B} y_{ib}\,\log\left[\mathrm{softmax}\left(f_{\mathrm{dis}}(z_i)\right)\right]_b$, where $\mathcal{L}_{\mathrm{adv}}$ denotes the cross-entropy loss of batch prediction, $B$ denotes the total number of slice batches, $y_{ib}$ denotes the one-hot encoded label of the $i$-th sequencing point for the $b$-th batch, $f_{\mathrm{dis}}(\cdot)$ denotes the feature mapping function of the discriminator, $\mathrm{softmax}(\cdot)$ denotes the normalization function, $z_i$ denotes the latent space embedding of the $i$-th sequencing point, $\mathrm{softmax}\left(f_{\mathrm{dis}}(z_i)\right)$ is the batch probability vector output by the discriminator, and $[\cdot]_b$ denotes the element of that vector giving the probability of the $b$-th batch.
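The anchor mining and the margin-based triplet loss can be sketched as follows. This is a brute-force NumPy illustration; the number of neighbors, the averaging over triplets, the sign convention with cosine similarity, and all names are assumptions. Negative sampling is left to the caller.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mutual_nearest_pairs(z_src, z_tgt, k=1):
    """Pairs (i, j) where j is among i's top-k targets and vice versa."""
    sim = cosine_matrix(z_src, z_tgt)
    src_nn = np.argsort(-sim, axis=1)[:, :k]      # top-k targets per source
    tgt_nn = np.argsort(-sim.T, axis=1)[:, :k]    # top-k sources per target
    return [(i, int(j)) for i in range(len(z_src)) for j in src_nn[i]
            if i in tgt_nn[j]]

def rowwise_cosine(a, b):
    num = np.sum(a * b, axis=1)
    return num / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def triplet_loss(z_a, z_p, z_n, margin=1.0):
    """Mean of max(s(a, n) - s(a, p) + margin, 0) over triplets."""
    gap = rowwise_cosine(z_a, z_n) - rowwise_cosine(z_a, z_p) + margin
    return float(np.maximum(gap, 0.0).mean())
```

The loss is zero once every anchor is closer to its cross-slice match than to its negative by at least the margin, so it pulls matched spots from different slices together without collapsing unrelated spots.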
[0037] Finally, the optimization submodule constructs the overall objective function as a weighted combination of the reconstruction loss, the clustering loss, the triplet loss, and the cross-entropy loss, where the weighting hyperparameters balance the individual loss terms and take preset default values. An alternating min-max strategy is used to iteratively update the parameters of the graph encoding submodule. In each training cycle, the discriminator parameters are updated first to minimize the cross-entropy loss, and the graph encoding submodule parameters are then updated to maximize this cross-entropy loss while simultaneously minimizing the reconstruction loss, the clustering loss, and the triplet loss. A warm-up period delays the activation of the clustering loss and the triplet loss to avoid premature cluster collapse. The anchor triplets are refreshed every 50 training cycles to follow the changing features of the latent space. Training continues until the overall objective function converges, so that the updated graph encoding submodule outputs a target latent space embedding that combines spatial and biological features.
[0038] In this embodiment, the neighborhood aggregation process uses a graph convolutional network as its core, combined with a feedforward neural network, to fuse gene expression features with spatial adjacency information. The feature reconstruction loss on the masked view constrains the model to learn from spatial dependence. The clustering contrast loss together with the KL divergence regularization aligns the clustering semantics of the two complementary views and guards against cluster collapse. The combined use of the MNN anchor triplet loss and the adversarial de-batching loss removes the batch effect of multi-slice data while preserving true biological variation. The weighted design and alternating optimization strategy of the overall objective function take into account the priority of each task loss, supporting stable and efficient convergence of model training. The resulting target latent space embedding is spatially consistent, biologically discriminative, and batch invariant, providing high-quality feature input for the subsequent 3D spatial clustering module and improving the accuracy of 3D spatial domain identification.
[0039] In this embodiment of the invention, the step of clustering the target latent space embedding based on the three-dimensional spatial clustering module to obtain a three-dimensional spatial domain label includes: determining multiple biological anchor points based on the target latent space embedding using the three-dimensional spatial clustering module, and obtaining the anchor point spatial coordinates corresponding to each biological anchor point based on the initial multi-slice spatial transcriptome dataset; calculating the root mean square deviation between the spatial coordinates of each anchor point to construct a rigid transformation function; solving the rigid transformation function using a singular value decomposition tool to obtain a rotation matrix and a translation vector; registering all slices to the same three-dimensional coordinate system based on the rotation matrix and the translation vector to obtain the three-dimensional spatial coordinates corresponding to each sequencing point; and clustering the multiple three-dimensional spatial coordinates and the target latent space embedding using a clustering analysis tool to obtain the three-dimensional spatial domain label.
[0040] In this embodiment, when the 3D spatial clustering module processes the target latent space embedding, the nearest neighbor relationships in the latent space are mined to determine the biological anchors. This ensures that anchor matching is driven by transcriptomic similarity rather than by physical proximity alone. The anchor spatial coordinates corresponding to each biological anchor are then extracted from the initial multi-slice spatial transcriptome dataset. Next, the root mean square deviation (RMSD) between the spatial coordinates of matched anchors in any two adjacent slices is calculated to construct a rigid transformation function of the form $T(x) = Rx + t$, where $T(\cdot)$ denotes the rigid transformation used to align consecutive two-dimensional slices, $R$ denotes the rotation matrix, which describes the rotation of the spatial coordinates, and $t$ denotes the translation vector, which describes the displacement of the spatial coordinates. The objective is to minimize the RMSD over all matched anchors, thereby finding the rigid transformation parameters that minimize the slice alignment error. This minimization problem is solved analytically using the singular value decomposition (SVD) of the cross-covariance matrix. By decomposing the cross-covariance matrix constructed from the anchor spatial coordinates, SVD yields the optimal rotation matrix and translation vector in closed form, avoiding the accumulated errors of iterative numerical methods and ensuring an accurate and stable solution. After the rotation matrix and translation vector are obtained, all slices are registered to the same three-dimensional coordinate system based on this optimal rigid transformation. The three-dimensional spatial coordinates of each sequencing point in the unified three-dimensional space are generated, mapping the two-dimensional spatial information of the multiple slices into three-dimensional space and completing the initial reconstruction of the three-dimensional tissue structure. Finally, a clustering analysis tool, such as the mclust package (v6.0.1) in R, is used to perform clustering with the three-dimensional spatial coordinates of the sequencing points and the target latent space embedding as joint inputs. The clustering performance is evaluated using multiple evaluation metrics, and the three-dimensional spatial domain labels are generated.
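The closed-form solution referred to above is commonly known as the Kabsch algorithm. A minimal NumPy sketch is shown below, assuming matched anchor coordinates are supplied as two arrays with one anchor pair per row; the function names are illustrative.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ src @ R.T + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)          # cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0] * (src.shape[1] - 1) + [d])
    r = vt.T @ correction @ u.T
    t = dst_c - r @ src_c
    return r, t

def rmsd(a, b):
    """Root mean square deviation between matched point sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Usage: align the anchors of one slice onto those of its neighbor.
rng = np.random.default_rng(0)
anchors_src = rng.random((20, 2))
theta = 0.5
r_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([3.0, -2.0])
anchors_dst = anchors_src @ r_true.T + t_true
r_est, t_est = fit_rigid_transform(anchors_src, anchors_dst)
aligned = anchors_src @ r_est.T + t_est
```

Applying the fitted `r_est` and `t_est` to every sequencing point of the source slice, and then appending a slice-specific z coordinate, gives the shared three-dimensional coordinate system used for clustering.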
[0041] In this embodiment, biological anchor points are determined through latent spatial nearest neighbor mining, driving the slice alignment process from the perspective of transcriptome similarity. This effectively avoids alignment bias caused by relying solely on physical proximity, significantly improving the biological rationality and accuracy of multi-slice 3D registration. The construction of rigid transformation functions and the analytical solution of singular value decomposition ensure the accuracy of solving rotation matrices and translation vectors, achieving precise alignment of all slices to a unified 3D coordinate system, laying a solid spatial foundation for the realistic reconstruction of the 3D tissue microenvironment. The strategy of clustering by combining 3D spatial coordinates and target latent space embedding fully integrates the spatial location information of sequencing points and gene expression feature information. Compared with clustering methods that rely on only single information, this significantly improves the accuracy and biological interpretability of 3D spatial domain partitioning. The professional clustering capabilities of the R software mclust package, combined with a performance evaluation system of multiple evaluation metrics, further ensures the reliability and comprehensiveness of the clustering results, avoiding the limitations of a single clustering algorithm or evaluation metric.
[0042] In this embodiment of the invention, introducing a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module includes: constructing the classifier submodule; connecting the input of the classifier submodule to the output of the graph coding submodule; using the cross-entropy loss function; and fine-tuning the parameters of the graph coding submodule and the classifier submodule based on the target latent space embedding output by the graph coding submodule to obtain the updated latent space representation learning module.
[0043] In this embodiment, Part E of Figure 2 illustrates the process of introducing a classifier submodule into the latent space representation learning module and updating that module. First, the classifier submodule is constructed as a multilayer perceptron consisting of two linear layers with a rectified linear unit (ReLU) activation function. The input of the classifier submodule is connected to the output of the graph coding submodule, so that the target latent space embedding $H$ output by the graph coding submodule serves as its input. The logits passed to the softmax function are first calculated with the formula

$$Z = \mathrm{ReLU}(H W_1 + b_1)\, W_2 + b_2, \qquad Z \in \mathbb{R}^{N \times C},$$

where $W_1$ is the learnable weight matrix of the first layer of the classifier submodule; $b_1$ is the learnable bias vector of the first layer; $\mathrm{ReLU}(\cdot)$ is the rectified linear unit activation function, which introduces a nonlinear transformation to enhance the model's expressive power; $W_2$ is the learnable weight matrix of the second layer; $b_2$ is the learnable bias vector of the second layer; $N$ is the total number of sequencing points; $C$ is the total number of spatial domains; and $Z$ is the logit matrix, whose element $Z_{i,c}$ in row $i$ and column $c$ is the logit of the $i$-th sequencing point for the $c$-th spatial domain. The softmax function is then applied to the logit matrix $Z$ for normalization to obtain the domain probability prediction $P = \mathrm{softmax}(Z)$, where the element $P_{i,c}$ in row $i$ and column $c$ is the predicted probability that the $i$-th sequencing point belongs to the $c$-th spatial domain.
Then, the cross-entropy loss function is used to calculate the cross-entropy loss between the predicted probabilities and the 3D spatial domain labels. Using the 3D spatial domain labels as supervision signals, the classifier submodule and the graph coding submodule are jointly trained without freezing the pre-training parameters of the graph coding submodule. The parameters of the graph coding submodule and the classifier submodule are fine-tuned through backpropagation, further optimizing the target latent space embedding and better capturing domain-specific gene expression patterns. This ultimately yields an updated latent space representation learning module. By introducing the classifier submodule and employing a joint fine-tuning strategy, the supervision signals from the 3D spatial domain are effectively integrated into the latent space representation learning process, making the target latent space embedding more domain-discriminative and biologically specific. Not freezing the pre-training parameters of the graph coding submodule preserves the spatial and gene expression features learned during pre-training while strengthening the extraction capability of domain-specific features through supervised fine-tuning. The cross-entropy loss, as an explicit supervision constraint, improves the model's classification accuracy in the spatial domain, providing a more accurate feature basis for subsequent gene contribution analysis, enhancing the model's interpretability and biological application value, and laying a solid foundation for the identification of spatial domain-specific marker genes.
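The forward pass of such a two-layer classifier and its cross-entropy loss can be sketched as below. This is an illustrative NumPy example under stated assumptions, not the patent's implementation: the dimensions, the random weights, the toy labels, and the function names are hypothetical, and the joint fine-tuning by backpropagation is omitted because it would require an automatic differentiation framework.

```python
import numpy as np

def classifier_logits(H, W1, b1, W2, b2):
    """Two linear layers with a ReLU in between; returns an (N, C) logit matrix."""
    hidden = np.maximum(H @ W1 + b1, 0.0)
    return hidden @ W2 + b2

def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(P, labels):
    """Mean negative log-probability assigned to the supervising domain label."""
    n = len(labels)
    return float(-np.log(P[np.arange(n), labels] + 1e-12).mean())

# Hypothetical sizes: 6 sequencing points, 4-dim embedding, 8 hidden units, 3 domains.
rng = np.random.default_rng(0)
N, d, hidden_dim, C = 6, 4, 8, 3
H = rng.normal(size=(N, d))                      # stands in for the latent embedding
W1 = rng.normal(scale=0.1, size=(d, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.1, size=(hidden_dim, C))
b2 = np.zeros(C)
labels = np.array([0, 1, 2, 0, 1, 2])            # stands in for 3D spatial domain labels

Z = classifier_logits(H, W1, b1, W2, b2)
P = softmax(Z)
loss = cross_entropy(P, labels)
```

In a full implementation the gradient of `loss` would flow into both the classifier weights and the graph coding submodule, since the source states that the pre-trained encoder parameters are not frozen.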
[0044] In this embodiment of the invention, the step of calculating gene contribution of the three-dimensional spatial domain label based on the updated latent space representation learning module to obtain an SVGs list includes: fine-tuning the target latent space embedding according to the three-dimensional spatial domain label based on the updated latent space representation learning module to obtain a fine-tuned latent space embedding; determining the spatial neighborhood set of the target domain of the three-dimensional spatial domain label, and constructing a spatial neighborhood baseline set based on the spatial neighborhood set; using a gradient attribution algorithm to perform feature-level attribution calculation on the fine-tuned latent space embedding based on the spatial neighborhood baseline set to obtain an attribution score set; obtaining a preset screening rule, and performing gene screening processing on the genes of the spatial neighborhood baseline set based on the screening rule and the attribution score set to obtain the SVGs list.
[0045] In this embodiment, based on the updated latent space representation learning module, the target latent space embedding is fine-tuned according to the 3D spatial domain labels to obtain the fine-tuned latent space embedding. After fine-tuning, the target domain $c$ is determined from the 3D spatial domain labels. In the 2D tissue coordinate space, the set of spatial neighboring domains of the target domain, $\mathcal{N}(c)$, is determined by a radius criterion: if a sufficient proportion of the sequencing points of a domain $u$ lie within a distance $R$ of the target domain $c$ ($R$ is set according to the characteristics of the dataset), then domain $u$ is included in $\mathcal{N}(c)$. Based on this set, the spatial neighborhood baseline set is constructed: for each domain $u \in \mathcal{N}(c)$, the mean $\bar{x}_u$ of the PCA-reduced input features of all sequencing points within $u$ is calculated, and all $\bar{x}_u$ are collected to form the spatial neighborhood baseline set $\mathcal{B}_c = \{\bar{x}_u : u \in \mathcal{N}(c)\}$.
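One way to read the radius criterion and baseline construction is sketched below. The source does not state how "a sufficient proportion" is quantified, so the `min_frac` parameter, the use of the distance to the nearest target-domain point, and the function name `neighbor_domain_baselines` are assumptions for illustration only.

```python
import numpy as np

def neighbor_domain_baselines(coords, labels, features, target, radius, min_frac=0.2):
    """Return {domain: mean PCA feature vector} for domains neighbouring `target`.

    A domain u counts as a neighbour when at least `min_frac` of its points lie
    within `radius` of some point of the target domain (assumed interpretation).
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    features = np.asarray(features, dtype=float)
    target_xy = coords[labels == target]
    baselines = {}
    for u in np.unique(labels):
        if u == target:
            continue
        mask = labels == u
        # Distance from every point of domain u to its nearest target-domain point.
        diff = coords[mask][:, None, :] - target_xy[None, :, :]
        nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        if (nearest <= radius).mean() >= min_frac:
            baselines[int(u)] = features[mask].mean(axis=0)
    return baselines

# Toy layout: domain 1 touches domain 0, domain 2 is far away.
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [10, 0], [11, 0]]
labels = [0, 0, 1, 1, 2, 2]
features = [[0, 0], [0, 0], [1, 2], [3, 4], [9, 9], [9, 9]]
baselines = neighbor_domain_baselines(coords, labels, features, target=0, radius=1.5)
```

Here only domain 1 qualifies as a neighbor of domain 0, and its baseline is the mean of its two feature vectors.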
[0046] Next, three complementary attribution algorithms, Integrated Gradients, Occlusion, and GradientSHAP, are employed together with the spatial neighborhood baseline set to perform feature-level attribution calculations on the fine-tuned latent space embedding. The Integrated Gradients algorithm calculates the attribution score with the formula

$$\mathrm{IG}_{i,j} = (x_{i,j} - \bar{b}_{c,j}) \int_0^1 \frac{\partial F_c\big(\bar{b}_c + \alpha (x_i - \bar{b}_c)\big)}{\partial x_j}\, d\alpha,$$

where $\mathrm{IG}_{i,j}$ is the Integrated Gradients attribution score of the $j$-th feature dimension of the $i$-th sequencing point; $x_{i,j}$ is the $j$-th PCA-reduced feature value of the $i$-th sequencing point; $\bar{b}_{c,j}$ is the mean of the $j$-th dimension of the baseline input features in the spatial neighborhood baseline set; $\alpha$ is the step size parameter; $\bar{b}_c + \alpha (x_i - \bar{b}_c)$ is the interpolation vector; $\bar{b}_c$ is the spatial neighborhood baseline mean vector corresponding to the target domain $c$, obtained by taking the mean of all baseline input features in the spatial neighborhood baseline set; $x_i$ is the complete PCA-reduced input feature vector of the $i$-th sequencing point; $F_c(\cdot)$ is the logit output by the classifier submodule for the target domain $c$ on the interpolation vector; and $\partial F_c / \partial x_j$ is the partial derivative of that logit with respect to the $j$-th feature dimension. The integral is approximated numerically.

The Occlusion algorithm calculates the attribution score with the formula

$$\mathrm{Occ}_{i,j} = F_c(x_i) - F_c\big(x_i^{(j \leftarrow \bar{b}_{c,j})}\big),$$

where $\mathrm{Occ}_{i,j}$ is the Occlusion attribution score of the $j$-th feature dimension of the $i$-th sequencing point, and $x_i^{(j \leftarrow \bar{b}_{c,j})}$ is the feature vector of the $i$-th sequencing point after its $j$-th feature dimension is replaced with $\bar{b}_{c,j}$.

The GradientSHAP algorithm calculates the attribution score with the formula

$$\mathrm{GS}_{i,j} = \mathbb{E}_{b \sim \mathcal{B}_c,\ \alpha \sim U(0,1)}\left[(x_{i,j} - b_j)\, \frac{\partial F_c\big(b + \alpha (x_i - b)\big)}{\partial x_j}\right],$$

where $\mathrm{GS}_{i,j}$ is the GradientSHAP attribution score of the $j$-th feature dimension of the $i$-th sequencing point, and $\mathbb{E}[\cdot]$ is the expectation operator, estimated by the sample mean over baseline input features $b$ drawn from the spatial neighborhood baseline set $\mathcal{B}_c$ and over the step size parameter $\alpha$.
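The three attribution rules can be illustrated on a tiny hand-built model. The sketch below is an assumption-laden toy, not the patent's code: the two-unit ReLU model, its weights, the input, and the baseline set are hypothetical, the gradient is written analytically instead of using autodiff, and the GradientSHAP variant is simplified (it omits the input noise that some library implementations add).

```python
import numpy as np

# Hypothetical model standing in for the target-domain logit F_c.
W1 = np.array([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]])
b1 = np.array([0.0, 0.1])
w2 = np.array([1.0, -0.5])

def logit(x):
    return float(w2 @ np.maximum(W1 @ x + b1, 0.0))

def grad_logit(x):
    """Analytic gradient of the logit with respect to the input features."""
    active = (W1 @ x + b1) > 0
    return W1.T @ (w2 * active)

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of the path integral."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_logit(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def occlusion(x, baseline):
    """Drop in the logit when one feature is replaced by its baseline value."""
    scores = np.empty_like(x)
    for j in range(len(x)):
        occluded = x.copy()
        occluded[j] = baseline[j]
        scores[j] = logit(x) - logit(occluded)
    return scores

def gradient_shap(x, baseline_set, n_samples=500, seed=0):
    """Sample mean over random baselines and random interpolation steps."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x)
    for _ in range(n_samples):
        b = baseline_set[rng.integers(len(baseline_set))]
        a = rng.uniform()
        total += (x - b) * grad_logit(b + a * (x - b))
    return total / n_samples

x = np.array([1.0, 2.0, 3.0])
baseline_set = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
baseline_mean = baseline_set.mean(axis=0)

ig = integrated_gradients(x, baseline_mean)
occ = occlusion(x, baseline_mean)
gs = gradient_shap(x, baseline_set)
```

A useful sanity check for Integrated Gradients is completeness: the scores should sum approximately to the difference between the logit at the input and the logit at the baseline.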
[0047] Since attribution calculations are performed in the PCA feature space, the feature-level attribution scores are back-projected to the original gene space through the PCA loading matrix $W_{\mathrm{PCA}} \in \mathbb{R}^{d \times G}$, where $d$ is the feature dimension after PCA dimensionality reduction and $G$ is the total number of genes. This yields the gene-level attribution vector $s_i^{(k)} = W_{\mathrm{PCA}}^{\top} a_i^{(k)} \in \mathbb{R}^{G}$ for each attribution algorithm $k$, where $a_i^{(k)}$ is the feature-level attribution vector of the $i$-th sequencing point and $k \in \{\mathrm{IG}, \mathrm{Occ}, \mathrm{GS}\}$. The gene-level attribution vector of each sequencing point is standardized by its L2 norm, and the average attribution score of gene $g$ within the target domain $c$ is calculated with the formula

$$\bar{A}_c^{(k)}(g) = \frac{1}{N_c} \sum_{i \in c} \frac{s_{i,g}^{(k)}}{\lVert s_i^{(k)} \rVert_2},$$

where $N_c$ is the number of sequencing points within the target domain $c$ and $\lVert s_i^{(k)} \rVert_2$ is the L2 norm of the gene-level attribution vector of the $i$-th sequencing point. The average attribution score of gene $g$ within the spatial neighboring domain set is calculated with the formula

$$\bar{A}_{\mathcal{N}(c)}^{(k)}(g) = \frac{1}{N_{\mathcal{N}(c)}} \sum_{i \in \mathcal{N}(c)} \frac{s_{i,g}^{(k)}}{\lVert s_i^{(k)} \rVert_2},$$

where $N_{\mathcal{N}(c)}$ is the total number of sequencing points in the spatial neighboring domain set. The differential attribution score is then calculated with the formula

$$\Delta^{(k)}(g) = \bar{A}_c^{(k)}(g) - \bar{A}_{\mathcal{N}(c)}^{(k)}(g),$$

which quantifies the difference in the contribution of gene $g$ between the target domain and its spatial neighboring domains. Finally, the Borda count method fuses the rankings of the differential attribution scores from the three algorithms into a consensus attribution score $S(g)$ for gene $g$. The score is computed from the ranks $r_k(g)$ of gene $g$ under each algorithm $k$, with a gene ranked 1st receiving the highest score. The consensus attribution scores of all genes together form the attribution score set.
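The back-projection, differential scoring, and Borda fusion steps can be sketched as follows. This is a minimal NumPy illustration under assumptions: the exact Borda point scheme is not recoverable from the text, so the example awards `G - rank + 1` points per algorithm (rank 1 gets the most points), and all function names and toy values are hypothetical.

```python
import numpy as np

def to_gene_space(feature_attr, loadings):
    """Project (N, d) feature attributions through (d, G) PCA loadings, then L2-normalise rows."""
    gene_attr = np.asarray(feature_attr, dtype=float) @ np.asarray(loadings, dtype=float)
    norms = np.linalg.norm(gene_attr, axis=1, keepdims=True)
    return gene_attr / np.where(norms == 0, 1.0, norms)

def differential_score(gene_attr, in_target, in_neighbors):
    """Mean attribution inside the target domain minus the mean in neighbouring domains."""
    return gene_attr[in_target].mean(axis=0) - gene_attr[in_neighbors].mean(axis=0)

def borda_consensus(diff_scores):
    """Fuse (n_algorithms, G) differential scores; assumed scheme: G - rank + 1 points."""
    diff_scores = np.asarray(diff_scores, dtype=float)
    n_genes = diff_scores.shape[1]
    consensus = np.zeros(n_genes)
    for scores in diff_scores:
        order = np.argsort(-scores, kind="stable")   # best gene first
        ranks = np.empty(n_genes, dtype=int)
        ranks[order] = np.arange(1, n_genes + 1)
        consensus += n_genes - ranks + 1
    return consensus

# Toy back-projection: 2 sequencing points, 2 PCA dimensions, 3 genes.
gene_attr = to_gene_space([[1.0, 0.0], [0.0, 2.0]],
                          [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
diff = differential_score(gene_attr, np.array([True, False]), np.array([False, True]))

# Toy differential scores from three algorithms for three genes.
consensus = borda_consensus([[0.9, 0.1, 0.5],
                             [0.8, 0.3, 0.2],
                             [0.7, 0.6, 0.1]])
```

In the toy fusion, the first gene is ranked 1st by all three algorithms and therefore receives the largest consensus score.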
[0048] Then, two preset screening rules are obtained. The first is adaptive threshold screening based on the score distribution: the consensus attribution scores are log-transformed to stabilize the variance, the mean $\mu$ and standard deviation $\sigma$ of the log-transformed scores are calculated, and genes whose log-transformed scores exceed $\mu + k\sigma$ are selected, where $k$ is a user-defined parameter that controls the strictness of the selection. A larger $k$ yields fewer but more discriminative spatially variable genes; in practice, $k$ is set to balance sensitivity and specificity. The second is ranking-based screening, which prioritizes genes by consensus attribution score: the number $K$ can be specified manually to select the top $K$ genes, or the elbow rule can be used to determine the cutoff point automatically. Based on these screening rules, screening is performed on the genes corresponding to the spatial neighborhood baseline set, and the SVGs list is finally obtained.
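Both screening rules are simple to express in code. The sketch below is illustrative only: the source says "logarithmic transformation" without naming a variant, so `log1p` is an assumption, the default `k=1.0` is a placeholder and not a value from the source, and the elbow rule is not shown.

```python
import numpy as np

def adaptive_threshold(scores, k=1.0):
    """Indices of genes whose log-transformed score exceeds mean + k * std."""
    log_scores = np.log1p(np.asarray(scores, dtype=float))  # assumed log variant
    cutoff = log_scores.mean() + k * log_scores.std()
    return np.flatnonzero(log_scores > cutoff)

def top_k(scores, k):
    """Indices of the k genes with the largest consensus scores, best first."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]

# Toy consensus scores: nine background genes and one strongly attributed gene.
scores = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
selected = adaptive_threshold(scores, k=1.0)
ranked = top_k(scores, 2)
```

Raising `k` tightens the threshold and returns fewer genes, which matches the trade-off described above.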
[0049] In this embodiment, supervised fine-tuning with the three-dimensional spatial domain labels adapts the latent space embedding to domain-specific features, laying a precise feature foundation for gene contribution analysis. A baseline set constructed from spatially neighboring domains replaces biologically meaningless global baselines, so that attribution analysis measures gene contributions relative to the local tissue microenvironment, which improves the biological rationality of the attribution results. Combining three complementary attribution algorithms with a Borda count fusion strategy avoids the limitations of a single algorithm and makes the quantified gene contributions more robust and objective. The dual-track screening rules ensure the biological specificity of the SVGs through adaptive thresholds while offering flexible options through ranking-based screening. The resulting SVGs list is robust, specific, and biologically interpretable, and identifies marker genes specific to each three-dimensional spatial domain. This supports functional analysis of three-dimensional tissue structures, the mining of domain-specific biological mechanisms, and research on the spatial expression patterns of related diseases.
[0050] The above describes the interpretable spatial domain identification method for spatial transcriptomics in embodiments of the present invention. The following describes the corresponding interpretable spatial domain identification device. One embodiment of the device includes:
Preprocessing module: used to obtain the initial multi-slice spatial transcriptome dataset and to preprocess it to obtain the preprocessed dataset;
Augmentation module: used to perform data augmentation on the preprocessed dataset to obtain an augmented dataset;
Model acquisition module: used to acquire a pre-trained spatial domain recognition model, which includes a latent space representation learning module and a three-dimensional spatial clustering module, the latent space representation learning module being connected to the three-dimensional spatial clustering module;
Neighborhood aggregation module: used to perform neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain the target latent space embedding;
Clustering module: used to cluster the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels;
Gene contribution calculation module: used to introduce a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module, and to perform gene contribution calculation on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain an SVGs list.
[0051] Based on the same ideas as the methods in the above embodiments, the apparatus provided in this application can implement the methods in the above embodiments.
[0052] The above describes in detail the interpretable spatial domain identification device for spatial transcriptomics in the embodiments of the present invention from the perspective of modular functional entities. The following describes in detail the interpretable spatial domain identification device for spatial transcriptomics in the embodiments of the present invention from the perspective of hardware processing.
[0053] Figure 6 is a schematic diagram of the structure of a spatial transcriptomics interpretable spatial domain identification device 500 provided in an embodiment of the present invention. The device 500 can vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage media 530 can be temporary or persistent storage. The program stored in the storage media 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 500. Furthermore, the processor 510 may be configured to communicate with the storage media 530 and to execute, on the device 500, the series of instruction operations in the storage media 530 to implement the steps of the spatial transcriptomics interpretable spatial domain identification method provided in the above method embodiments.
[0054] The spatial transcriptomics interpretable spatial domain identification device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will understand that the device structure illustrated in Figure 6 does not constitute a limitation on the spatial transcriptomics interpretable spatial domain identification device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
[0055] The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the steps of an interpretable spatial domain identification method for spatial transcriptomics.
Claims
1. A method for interpretable spatial domain identification in spatial transcriptomics, characterized in that it comprises:
obtaining an initial multi-slice spatial transcriptome dataset, and preprocessing the initial multi-slice spatial transcriptome dataset to obtain a preprocessed dataset;
performing data augmentation on the preprocessed dataset to obtain an augmented dataset;
obtaining a pre-trained spatial domain recognition model, the spatial domain recognition model including a latent space representation learning module and a three-dimensional spatial clustering module, the latent space representation learning module being connected to the three-dimensional spatial clustering module;
performing neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain a target latent space embedding;
clustering the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels;
introducing a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module, and performing gene contribution calculation on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain an SVGs list.
2. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 1, characterized in that the initial multi-slice spatial transcriptome dataset includes multiple slices and a gene expression matrix corresponding to each slice, the gene expression matrix including multiple sequencing points and, for each sequencing point, its spatial coordinates and multiple genes; and the preprocessing of the initial multi-slice spatial transcriptome dataset to obtain a preprocessed dataset comprises:
calculating the Euclidean distances between the spatial coordinates to construct a directed adjacency matrix corresponding to each slice;
symmetrizing each directed adjacency matrix to obtain multiple undirected sparse adjacency matrices;
concatenating the undirected sparse adjacency matrices block-diagonally to obtain a global adjacency matrix, and applying symmetric normalization to the global adjacency matrix to obtain a normalized adjacency matrix;
obtaining preset gene filtering rules, and filtering the genes based on the gene filtering rules to obtain multiple filtered genes corresponding to each sequencing point;
calculating a variability parameter for each filtered gene using a spatial transcriptomics analysis tool, screening multiple highly variable genes from the filtered genes based on the variability parameters, and standardizing the highly variable genes to obtain a high-dimensional expression matrix;
reducing the dimensionality of the high-dimensional expression matrix by principal component analysis (PCA) to obtain a low-dimensional input feature matrix;
integrating the normalized adjacency matrix and the low-dimensional input feature matrix to obtain the preprocessed dataset.
3. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 2, characterized in that the step of performing data augmentation on the preprocessed dataset to obtain an augmented dataset comprises:
obtaining a mask subset and a complementary subset corresponding to the mask subset;
performing mask label replacement on the low-dimensional input feature matrix based on the mask subset and the complementary subset to obtain a mask feature matrix and a complementary mask feature matrix;
integrating the mask feature matrix, the complementary mask feature matrix, and the normalized adjacency matrix to obtain the augmented dataset.
4. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 3, characterized in that the latent space representation learning module includes a graph coding submodule, a graph decoder submodule, a clustering head submodule, a de-batching submodule, and an optimization submodule; the graph coding submodule is connected to the graph decoder submodule, the clustering head submodule, the de-batching submodule, and the optimization submodule, respectively; the graph coding submodule includes a feedforward neural network and a graph convolutional network that are connected to each other; the de-batching submodule includes a discriminator; and the step of performing neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain the target latent space embedding comprises:
applying a nonlinear transformation to the mask feature matrix based on the feedforward neural network to obtain projected features;
performing neighborhood aggregation processing on the projected features and the normalized adjacency matrix based on the graph convolutional network to obtain an initial latent space embedding, and obtaining a preset view embedding based on the initial latent space embedding;
performing, by the graph decoder submodule, feature reconstruction on the initial latent space embedding based on the normalized adjacency matrix to obtain a reconstructed feature matrix, and calculating a reconstruction loss based on the reconstructed feature matrix;
performing, based on the clustering head submodule, clustering allocation mapping on the preset view embedding to obtain a clustering allocation matrix, and calculating a clustering contrast loss based on the clustering allocation matrix to obtain a clustering loss;
using, based on the de-batching submodule, the MNN algorithm to mine anchors in the initial latent space embedding to construct triplets, and calculating a triplet loss based on the triplets;
invoking the discriminator to perform batch prediction on the initial latent space embedding to obtain a cross-entropy loss;
updating, based on the optimization submodule, the parameters of the graph coding submodule according to the reconstruction loss, the clustering loss, the triplet loss, and the cross-entropy loss, so that the updated graph coding submodule outputs the target latent space embedding.
5. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 2, characterized in that the step of clustering the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels comprises:
determining, based on the three-dimensional spatial clustering module, multiple biological anchor points according to the target latent space embedding, and obtaining the anchor spatial coordinates corresponding to each biological anchor point from the initial multi-slice spatial transcriptome dataset;
calculating the root mean square deviation between the anchor spatial coordinates to construct a rigid transformation function;
solving the rigid transformation function by singular value decomposition to obtain a rotation matrix and a translation vector;
registering all slices to the same three-dimensional coordinate system based on the rotation matrix and the translation vector to obtain the three-dimensional spatial coordinates corresponding to each sequencing point;
clustering the three-dimensional spatial coordinates and the target latent space embedding with a clustering analysis tool to obtain the three-dimensional spatial domain labels.
6. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 4, characterized in that the step of introducing a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module comprises:
constructing the classifier submodule;
connecting the input of the classifier submodule to the output of the graph coding submodule;
using the cross-entropy loss function to fine-tune the parameters of the graph coding submodule and the classifier submodule based on the target latent space embedding output by the graph coding submodule, so as to obtain the updated latent space representation learning module.
7. The method for interpretable spatial domain identification in spatial transcriptomics according to claim 1, characterized in that the step of performing gene contribution calculation on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain an SVGs list comprises:
fine-tuning the target latent space embedding according to the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain a fine-tuned latent space embedding;
determining the spatial neighboring domain set of the target domain of the three-dimensional spatial domain labels, and constructing a spatial neighborhood baseline set based on the spatial neighboring domain set;
using a gradient attribution algorithm to perform feature-level attribution calculation on the fine-tuned latent space embedding based on the spatial neighborhood baseline set to obtain an attribution score set;
obtaining preset screening rules, and screening the genes of the spatial neighborhood baseline set based on the screening rules and the attribution score set to obtain the SVGs list.
8. An interpretable spatial domain identification device for spatial transcriptomics, characterized in that it comprises:
Preprocessing module: used to obtain the initial multi-slice spatial transcriptome dataset and to preprocess it to obtain the preprocessed dataset;
Augmentation module: used to perform data augmentation on the preprocessed dataset to obtain an augmented dataset;
Model acquisition module: used to acquire a pre-trained spatial domain recognition model, which includes a latent space representation learning module and a three-dimensional spatial clustering module, wherein the latent space representation learning module is connected to the three-dimensional spatial clustering module;
Neighborhood aggregation module: used to perform neighborhood aggregation processing on the augmented dataset based on the latent space representation learning module to obtain the target latent space embedding;
Clustering module: used to cluster the target latent space embedding based on the three-dimensional spatial clustering module to obtain three-dimensional spatial domain labels;
Gene contribution calculation module: used to introduce a classifier submodule into the latent space representation learning module to obtain an updated latent space representation learning module, and to perform gene contribution calculation on the three-dimensional spatial domain labels based on the updated latent space representation learning module to obtain an SVGs list.
9. An interpretable spatial domain identification device for spatial transcriptomics, characterized in that the device comprises a memory and at least one processor, wherein the memory stores instructions, and the at least one processor invokes the instructions in the memory to cause the device to perform the steps of the interpretable spatial domain identification method for spatial transcriptomics according to any one of claims 1-7.
10. A computer-readable storage medium storing instructions thereon, characterized in that, when the instructions are executed by a processor, they implement the steps of the interpretable spatial domain identification method for spatial transcriptomics according to any one of claims 1-7.