An integrated analysis method for multi-slice spatial transcriptome data
By constructing a unified representation learning framework and a bi-branch graph convolutional network, and combining contrastive learning with adversarial training, the method addresses error accumulation and batch effects in multi-slice spatial transcriptome data analysis, achieving high-precision, reliable cross-slice consistency analysis and improving biological interpretability and clinical application value.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- YUNNAN OPEN UNIV
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-19
AI Technical Summary
Existing stepwise analysis methods suffer from problems such as error accumulation, insufficient information utilization, difficulty in eliminating batch effects, and inconsistency in results across multiple tasks when processing multi-slice spatial transcriptome data.
An integrated analysis method for multi-slice spatial transcriptome data is adopted. A unified representation learning framework is constructed in which a spatial adjacency matrix and an expression adjacency matrix feed a bi-branch graph convolutional network for feature extraction. Cross-slice alignment is achieved through contrastive learning and adversarial training to eliminate batch effects, and the method outputs the cell type composition ratio, spatial domain category, and denoised gene expression matrix.
It significantly improves the accuracy and biological interpretability of multi-slice spatial transcriptome data analysis, enhances cross-slice consistency and clinical application value, and strengthens the model's ability to characterize and robustly represent complex tissue microenvironments.
Figure CN122245442A_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the interdisciplinary field of bioinformatics and computational biology, and in particular to an integrated analysis method for multi-slice spatial transcriptome data.
Background Technology
[0002] Spatial transcriptomics (ST) technology enables high-throughput measurement of in situ gene expression profiles while preserving tissue spatial location information. Compared to single-cell transcriptome sequencing (scRNA-seq), ST data contains crucial information on cell type composition and spatial organizational patterns, making it irreplaceable for analyzing tissue microenvironments, revealing developmental trajectories, and understanding spatial heterogeneity of diseases (such as the tumor microenvironment). With technological advancements, multi-slice ST datasets from multiple regions of the same tissue, different individuals, or different time points are becoming increasingly common, making integrated analysis a current research frontier and urgent need.
[0003] Analysis of multi-slice ST data typically involves several core computational tasks: (1) Cell type deconvolution: using scRNA-seq reference data to infer the proportion of cell types within each ST spatial spot; (2) Cross-slice alignment: eliminating technical variations caused by experimental batches, slice locations, or individual differences to make data from different slices biologically comparable; (3) Spatial domain identification: dividing tissues into regions with continuous or similar functions based on expression profiles and spatial locations; (4) Expression denoising and reconstruction: overcoming the inherent high sparsity and technical noise of ST data to restore more reliable gene expression signals.
[0004] The current mainstream analytical paradigm is a step-by-step process: deconvolution, batch correction, and clustering are first performed independently, and the results are then correlated. For example, deconvolution is performed first using tools such as SPOTlight and RCTD, batch integration is then performed using tools such as Harmony and Scanorama, and spatial clustering is finally performed using tools such as SpaGCN and BayesSpace. This step-by-step approach has significant limitations: (1) Error accumulation and propagation: errors in upstream steps (such as deconvolution) directly affect the input quality of downstream tasks (such as alignment and clustering), amplifying deviations in the results; (2) Inconsistent representation space: different steps often learn different data representations, making it difficult to establish consistent and mutually interpretable associations between the final results (such as cell composition and spatial domain); (3) Insufficient information utilization: most methods use only a single source of information, either spatial adjacency or gene expression similarity, and fail to coordinate local spatial continuity with global expression patterns; (4) Trade-off between batch effect removal and biological signal preservation: existing batch correction methods may obscure real, subtle biological differences while eliminating technical differences, and struggle in particular with multi-slice data that have complex spatial contexts.
[0005] Therefore, there is an urgent need for an integrated analysis method that can jointly optimize all of the above tasks within a unified representation learning framework. Such a method should use both spatial and expression information, explicitly handle data noise, and ensure the consistency, accuracy, and biological interpretability of cross-slice analysis results.
Summary of the Invention
[0006] To overcome the shortcomings of the prior art, the present invention aims to provide an integrated analysis method for multi-slice spatial transcriptome data, which solves the technical problems of error accumulation, insufficient information utilization, difficulty in eliminating batch effects, and inconsistent results of multiple tasks in the processing of multi-slice spatial transcriptome data by existing stepwise analysis methods.
[0007] The first aspect of this invention provides an integrated analysis method for multi-slice spatial transcriptome data, comprising the following steps: performing gene alignment and normalization preprocessing on the acquired multi-slice spatial transcriptome data and corresponding single-cell transcriptome reference data to construct a unified gene expression feature matrix and spatial coordinate set; for each spatial transcriptome slice, constructing a spatial adjacency matrix based on spatial coordinates and an expression adjacency matrix based on gene expression similarity; performing a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary mask matrix; constructing a bi-branch graph convolutional network including a spatial encoder and an expression encoder, and inputting the masked matrix into the spatial encoder and the expression encoder respectively, wherein the spatial encoder uses the spatial adjacency matrix to perform graph convolution operations to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution operations to extract expression pattern features; fusing the spatial structure features and the expression pattern features to generate a unified low-dimensional latent representation; constructing a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix, and, using the low-dimensional latent representation and the combined adjacency matrix as input, reconstructing the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions; constructing cross-slice triplet samples based on known cell type or spatial domain labels as prior knowledge, and using a contrastive learning loss function to bring the latent representations of similar samples closer together and push apart the latent representations of dissimilar samples, achieving supervised cross-slice alignment; attaching a discriminator to the low-dimensional latent representation and performing adversarial training through a gradient reversal layer, so that the encoder learns domain-invariant representations that are insensitive to slices or batches; training the model with a three-stage strategy, in which the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the encoder and decoder, and the third stage fixes the encoder and trains the downstream task prediction head; and, based on the trained model, decoding the low-dimensional latent representation to simultaneously output the cell type composition ratio, spatial domain category, and denoised gene expression matrix for each sequencing spot.
[0008] A second aspect of this invention provides an integrated analysis device for multi-slice spatial transcriptome data, comprising: a preprocessing module for performing gene alignment and normalization preprocessing on the acquired multi-slice spatial transcriptome data and corresponding single-cell transcriptome reference data to construct a unified gene expression feature matrix and a set of spatial coordinates; an adjacency graph construction module for constructing, for each spatial transcriptome slice, a spatial adjacency matrix based on spatial coordinates and an expression adjacency matrix based on gene expression similarity; a masking module for performing a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary mask matrix; a feature extraction module for constructing a bi-branch graph convolutional network containing a spatial encoder and an expression encoder and inputting the masked matrix into the spatial encoder and the expression encoder respectively, wherein the spatial encoder uses the spatial adjacency matrix to perform graph convolution operations to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution operations to extract expression pattern features; a feature fusion module for fusing the spatial structure features and the expression pattern features to generate a unified low-dimensional latent representation; a reconstruction module for constructing a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix and, using the low-dimensional latent representation and the combined adjacency matrix as input, reconstructing the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions; an alignment module for constructing cross-slice triplet samples based on known cell type or spatial domain labels as prior knowledge and using a contrastive learning loss function to bring the latent representations of similar samples closer together and push apart the latent representations of dissimilar samples, achieving supervised cross-slice alignment; an adversarial module for attaching a discriminator to the low-dimensional latent representation and performing adversarial training through a gradient reversal layer, so that the encoder learns domain-invariant representations that are insensitive to slices or batches; a training module for training the model with a three-stage strategy, in which the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the encoder and decoder, and the third stage fixes the encoder and trains the downstream task prediction head; and an output module for decoding the low-dimensional latent representation based on the trained model and simultaneously outputting the cell type composition ratio, spatial domain category, and denoised gene expression matrix for each sequencing spot.
[0009] A third aspect of the present invention provides an integrated analysis device for multi-slice spatial transcriptome data, comprising: a memory and at least one processor, wherein the memory stores computer-readable instructions, and the memory and the at least one processor are interconnected via a circuit; the at least one processor invokes the computer-readable instructions in the memory to cause the integrated analysis device for multi-slice spatial transcriptome data to perform the various steps of the integrated analysis method for multi-slice spatial transcriptome data as described above.
[0010] A fourth aspect of the present invention provides a computer-readable storage medium storing computer-readable instructions that, when executed on a computer, cause the computer to perform the steps of the integrated analysis method for multi-slice spatial transcriptome data as described above.
[0011] Beneficial Effects: This invention proposes an integrated analysis method for multi-slice spatial transcriptome data. By jointly optimizing cell type deconvolution, cross-slice alignment, spatial domain identification, and expression data denoising within a unified representation learning framework, it achieves the following significant effects compared to existing step-by-step analysis processes:
1. This invention overcomes the limitation of traditional methods that rely on either spatial adjacency or expression similarity alone. It constructs both spatial adjacency graphs and expression similarity graphs and designs a dual-branch graph convolutional network for collaborative encoding. The spatial encoder focuses on local tissue topological continuity, while the expression encoder captures global transcriptional similarity patterns. The latent representation generated by fusing the two not only preserves the spatial boundaries of the tissue structure but also identifies spatially non-adjacent but biologically homologous regions, significantly improving the model's ability to characterize complex tissue microenvironments and the robustness of its representation.
2. To address the inherent high sparsity and technical noise of spatial transcriptome data, this invention introduces a random masking strategy and a combined adjacency decoding mechanism, forcing the model to reconstruct masked expression values using neighborhood information and latent structures. This mechanism enables the model to learn to complete missing signals and suppress random noise during training. The denoised gene expression matrix it outputs is significantly superior to the original data in terms of spatial continuity and expression smoothness, providing a more reliable data foundation for downstream analysis.
3. This invention uses cell type or spatial domain labels to construct cross-slice triplet samples and uses a contrastive loss function to bring similar samples closer together and separate dissimilar samples, so that the model eliminates technical differences between slices while preserving true biological structural differences. This supervised alignment strategy overcomes the tendency of traditional batch correction methods to blur biological boundaries, enabling accurate matching of homologous regions across slices in the latent space and laying the foundation for joint analysis across samples and time points.
4. By introducing an adversarial discriminator and a gradient reversal layer at the latent representation layer, this invention explicitly suppresses batch and slice effects, forcing the encoder to learn domain-invariant representations. Combined with a three-stage decoupled training strategy (first training the discriminator, then jointly optimizing the encoder and decoder, and finally training the downstream task head), it effectively avoids the gradient conflicts and oscillations of multi-task joint training and significantly improves the convergence stability and generalization performance of the model on large-scale real data.
Furthermore, this invention simultaneously outputs cell type deconvolution ratios, spatial domain categories, and denoised expression matrices from a unified low-dimensional latent space. All results originate from the same representation, fundamentally avoiding the problem of contradictory deconvolution, clustering, and alignment results in traditional step-by-step processes. Experiments show that this method outperforms existing mainstream methods in prediction accuracy, spatial continuity, and cross-slice consistency in tasks such as colorectal cancer tumor region identification and dynamic analysis of cardiac development. It provides high-precision and high-consistency technical support for tumor microenvironment analysis, developmental biology research, and disease spatial heterogeneity analysis.
This invention systematically addresses the core challenges in multi-slice spatial transcriptome data analysis through an integrated modeling strategy, significantly enhancing the biological interpretability and clinical application value of the results while improving analytical accuracy.
Attached Figure Description
[0012] Figure 1 is a flowchart of the integrated analysis method for multi-slice spatial transcriptome data provided in an embodiment of the present invention.
[0013] Figure 2 is a schematic diagram illustrating the principle of the integrated analysis method for multi-slice spatial transcriptome data of the present invention.
[0014] Figure 3 is a comparison chart showing the results of the method of the present invention and the comparative methods in the task of identifying tumor progression-related regions across slices on multi-slice colorectal cancer spatial transcriptome data.
[0015] Figure 4 is a visualization of the spatial distribution and gene expression patterns of key cell types at different developmental time points on multi-slice data of human heart development, together with a comparison of the predictive consistency of different methods based on correlation indicators.
[0016] Figure 5 is a schematic diagram of the integrated analysis device for multi-slice spatial transcriptome data provided in an embodiment of the present invention.
[0017] Figure 6 is a schematic diagram of the integrated analysis device for multi-slice spatial transcriptome data provided in an embodiment of the present invention.
Detailed Implementation
[0018] This invention provides an integrated analysis method for multi-slice spatial transcriptome data. The terms "first," "second," "third," "fourth," etc. (if present) in the specification, claims, and accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in a sequence other than that illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0019] Please refer to Figure 1, which is a flowchart of the integrated analysis method for multi-slice spatial transcriptome data provided by the present invention. As shown in Figure 1, the method includes the following steps:
S10. Gene alignment and normalization preprocessing are performed on the acquired multi-slice spatial transcriptome data and the corresponding single-cell transcriptome reference data to construct a unified gene expression feature matrix and spatial coordinate set. Specifically, this embodiment targets multi-slice spatial transcriptomics (ST) data, where each slice consists of several sequencing spots, each containing both gene expression data and spatial coordinates. To achieve unified modeling across slices, this embodiment first performs unified preprocessing on the multi-slice ST data and the single-cell transcriptomics reference data (scRNA-seq) to construct a consistent input tensor. As an example, for the $s$-th slice, its gene expression matrix is defined as $X^{(s)} = [x_{i,j}^{(s)}] \in \mathbb{R}^{n_s \times g}$, and its spatial coordinate set is defined as $C^{(s)} = \{(u_i^{(s)}, v_i^{(s)})\}_{i=1}^{n_s}$, where $s$ is the slice number (i.e., which slice), $n_s$ is the number of spots in the $s$-th slice, $g$ is the number of genes after unification, $x_{i,j}^{(s)}$ is the expression value of the $j$-th gene at the $i$-th spot of the $s$-th slice, and $(u_i^{(s)}, v_i^{(s)})$ are the two-dimensional coordinates of that spot.
[0020] This invention also introduces single-cell reference data with cell type annotations (for deconvolution supervision or prior construction), and aligns and normalizes the ST and scRNA-seq data on a unified gene set, so that the differences learned by the subsequent model reflect real biological differences as far as possible rather than technical differences.
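The gene alignment and normalization of step S10 can be illustrated with a minimal NumPy sketch. The function name `align_and_normalize`, the `target_sum` parameter, and the choice of library-size scaling followed by a log1p transform are assumptions made for illustration; the text does not specify the normalization used.

```python
import numpy as np

def align_and_normalize(expr_list, gene_lists, ref_genes, target_sum=1e4):
    """Restrict every slice to the genes shared with the reference, then
    library-size normalize and log-transform (an assumed normalization)."""
    shared = set(ref_genes)
    for genes in gene_lists:
        shared &= set(genes)
    shared = sorted(shared)  # one fixed gene order for all slices
    aligned = []
    for X, genes in zip(expr_list, gene_lists):
        col = {g: j for j, g in enumerate(genes)}
        Xs = X[:, [col[g] for g in shared]].astype(float)
        totals = Xs.sum(axis=1, keepdims=True)
        totals[totals == 0] = 1.0  # avoid division by zero for empty spots
        aligned.append(np.log1p(Xs / totals * target_sum))
    return aligned, shared

# Toy usage: two slices with different gene panels and one reference gene list.
X1 = np.array([[1, 0, 3], [0, 2, 2]])
X2 = np.array([[4, 1], [0, 5]])
aligned, shared = align_and_normalize(
    [X1, X2], [["A", "B", "C"], ["C", "A"]], ["A", "C", "D"]
)
```

After this step every slice has the same columns in the same order, so the matrices can be stacked or fed to a shared encoder.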
[0021] S20. For each spatial transcriptome slice, construct a spatial adjacency matrix based on spatial coordinates, and construct an expression adjacency matrix based on gene expression similarity. Specifically, a core characteristic of spatial transcriptomics is that spatially adjacent spots often belong to a continuous tissue structure. Therefore, in this embodiment, a spatial adjacency matrix is constructed for each slice to characterize the local topological relationships between spots. First, the Euclidean distance between spots $i$ and $j$, with coordinates $(u_i^{(s)}, v_i^{(s)})$ and $(u_j^{(s)}, v_j^{(s)})$, is calculated: $d_{ij}^{(s)} = \sqrt{(u_i^{(s)} - u_j^{(s)})^2 + (v_i^{(s)} - v_j^{(s)})^2}$, where $d_{ij}^{(s)}$ is the spatial distance between spots $i$ and $j$ of the $s$-th slice. Then, for each spot, the $k_s$ spatially nearest neighbors are selected to construct the spatial adjacency matrix: $A_{spa}^{(s)}(i,j) = 1$ if $j \in \mathcal{N}_{k_s}^{spa}(i)$, and $0$ otherwise, where $k_s$ is the number of spatial neighbors and $\mathcal{N}_{k_s}^{spa}(i)$ is the set of the $k_s$ nearest neighbors of spot $i$ selected by distance; $A_{spa}^{(s)}(i,j) = 1$ indicates that spot $j$ is a spatial neighbor of spot $i$. In this way, the spatial adjacency matrix ensures that the model does not break the continuous boundaries of the tissue when learning representations, denoising, and partitioning, thus reducing fragmentation of the results.
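As a rough illustration of the spatial graph construction, the sketch below builds a k-nearest-neighbor adjacency matrix from spot coordinates with NumPy. The function name and the toy coordinates are hypothetical, and the matrix is left directed (row i marks the neighbors of spot i), which is one reading of the neighbor-set definition.

```python
import numpy as np

def spatial_knn_adjacency(coords, k):
    """Binary adjacency: A[i, j] = 1 if spot j is among the k spots
    closest to spot i in Euclidean distance (self excluded)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest spots
    A = np.zeros(dist.shape, dtype=int)
    A[np.repeat(np.arange(len(coords)), k), nn.ravel()] = 1
    return A

# Toy usage: three clustered spots and one distant spot.
coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
A_spa = spatial_knn_adjacency(coords, k=2)
```

Because nearest-neighbor relations are not mutual, the result is generally asymmetric; symmetrizing it (for example with an element-wise maximum of `A_spa` and its transpose) is a common option when an undirected graph is wanted.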
[0022] Furthermore, using only spatial adjacency might miss regions that are not spatially adjacent but have similar expression patterns (e.g., the same tumor subgroup appearing in different locations). Therefore, this embodiment also constructs an expression adjacency matrix to supplement the global expression structure. First, the cosine similarity between the expression vectors of spots $i$ and $j$ is calculated: $\mathrm{sim}(i,j) = \frac{h_i^\top h_j}{\lVert h_i \rVert \, \lVert h_j \rVert}$, where $h_i$ is the expression feature vector of spot $i$ (which can be a normalized or dimensionality-reduced representation) and $\lVert \cdot \rVert$ denotes the vector norm. Then, for each spot, the $k_e$ most similar spots are selected as neighbors to construct the expression adjacency matrix: $A_{exp}^{(s)}(i,j) = 1$ if $j \in \mathcal{N}_{k_e}^{exp}(i)$, and $0$ otherwise, where $k_e$ is the number of expression neighbors and $\mathcal{N}_{k_e}^{exp}(i)$ is the set of neighbors of spot $i$ selected by expression similarity. The expression adjacency matrix lets the model see similar regions at long distances, which can improve alignment across slices and regions.
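The expression graph can be sketched in the same way, replacing Euclidean distance with cosine similarity between expression feature vectors. The function name and the toy feature matrix are hypothetical; zero vectors are given unit norm here simply to avoid division by zero.

```python
import numpy as np

def expression_knn_adjacency(features, k):
    """Binary adjacency: A[i, j] = 1 if spot j is among the k spots whose
    feature vectors have the highest cosine similarity to spot i."""
    H = np.asarray(features, dtype=float)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero feature vectors
    Hn = H / norms
    sim = Hn @ Hn.T  # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    nn = np.argsort(-sim, axis=1)[:, :k]
    A = np.zeros(sim.shape, dtype=int)
    A[np.repeat(np.arange(H.shape[0]), k), nn.ravel()] = 1
    return A

# Toy usage: spots 0 and 1 share one expression pattern, spots 2 and 3 another.
H = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]])
A_exp = expression_knn_adjacency(H, k=1)
```

Note that cosine similarity ignores vector magnitude, so spots 0 and 1 are matched even though their total expression differs by a factor of two.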
[0023] S30. Perform a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary mask matrix. Specifically, ST data exhibit high sparsity and noise. This embodiment uses random mask reconstruction to teach the model to complete missing expression values, thereby achieving noise reduction. For the input feature matrix $X$ (which can refer to a single slice or a concatenated input), the masked matrix $\tilde{X}$ is obtained by random masking, and its complementary mask matrix $\bar{X}$ is constructed at the same time, satisfying: $X = \tilde{X} \oplus \bar{X}$
[0024] where $X$ is the original input feature matrix, $\tilde{X}$ is the matrix after entries have been randomly set to zero (or set to a mask value), $\bar{X}$ is the matrix complementary to $\tilde{X}$ (preserving the original values, or the corresponding position indicators, of the masked positions), and the operator $\oplus$ denotes element-wise addition. The model must rely on the neighborhood structure and the latent representation to recover the masked part, thereby learning a more stable and noise-resistant representation.
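A minimal sketch of the random masking step is given below, assuming zero is used as the mask value and that each entry is masked independently with a fixed probability. The names `random_mask` and `mask_rate` are hypothetical.

```python
import numpy as np

def random_mask(X, mask_rate, rng):
    """Split X into a masked matrix and its complement.
    M is 1 where an entry is hidden from the encoder, 0 elsewhere."""
    X = np.asarray(X, dtype=float)
    M = (rng.random(X.shape) < mask_rate).astype(float)
    X_masked = X * (1.0 - M)  # masked entries set to zero
    X_comp = X * M            # complement keeps only the masked values
    return X_masked, X_comp, M

# Toy usage on a 3 x 4 matrix with roughly 30% of entries masked.
rng = np.random.default_rng(0)
X = np.arange(1.0, 13.0).reshape(3, 4)
X_masked, X_comp, M = random_mask(X, 0.3, rng)
```

The two outputs have disjoint support, so their element-wise sum restores the original matrix exactly.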
[0025] S40. Construct a dual-branch graph convolutional network containing a spatial encoder and an expression encoder. Input the masked matrix into the spatial encoder and the expression encoder respectively. The spatial encoder uses the spatial adjacency matrix to perform graph convolution operations to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution operations to extract expression pattern features. Specifically, as shown in Figure 2, to simultaneously characterize the local tissue topology and global expression similarity patterns in spatial transcriptome data, this embodiment designs a dual-graph convolutional feature encoder comprising a spatial encoder and an expression encoder. The spatial encoder uses the spatial adjacency matrix to capture the local continuity and anatomical boundary information between adjacent spots; the expression encoder uses the expression adjacency matrix to connect spatially non-adjacent regions with similar expression patterns, thereby supplementing global structural information. Together, the two encoders ensure that the latent representation learned by the model preserves spatial continuity while still identifying homologous structures across regions and slices, providing a high-quality representation foundation for subsequent denoising reconstruction, cross-slice alignment, and multi-task prediction.
[0026] This invention employs a Graph Convolutional Network (GCN) to perform feature aggregation and encoding on two graphs: the spatial adjacency matrix and the expression adjacency matrix. For either graph, the propagation update from the $l$-th layer to the $(l+1)$-th layer can be represented as: $H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$, with $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ and $\tilde{A} = A + I$
[0027] where $H^{(l)}$ is the feature representation matrix of the nodes (spots) at the $l$-th layer, with $H^{(0)} = \tilde{X}$, the input feature matrix after random masking; $W^{(l)}$ is the learnable parameter matrix of the $l$-th layer, used for linear transformation of features; and $\sigma(\cdot)$ is a non-linear activation function (such as ReLU) used to enhance expressive power. The specific definitions on the spatial graph and the expression graph are as follows. For the spatial encoder, the formula is applied with $A = A_{spa}$ to obtain the spatial feature representation $Z_{spa}$. For the expression encoder, the formula is applied with $A = A_{exp}$ to obtain the expression feature representation $Z_{exp}$.
[0028] Here, $A$ is the adjacency matrix (which can be $A_{spa}$ or $A_{exp}$) and reflects the connections between nodes; $I$ is the identity matrix (self-connection term), used to preserve each node's own information and stabilize training; $\tilde{A}$ is the adjacency matrix after adding self-connections; $\tilde{D}$ is the degree matrix of $\tilde{A}$ (a diagonal matrix whose diagonal elements are the sums of the connection weights of each node); and $\hat{A}$ is the normalized adjacency matrix, used to mitigate the scale bias caused by nodes with different degrees and to ensure stable information propagation.
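The graph convolution update can be sketched in NumPy as follows. This is a forward pass only, with random weights standing in for learned parameters; the symmetric degree normalization is the standard GCN choice and is assumed here, and the function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-connections, then apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)  # at least 1 because of the self-connection
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step: aggregate neighbors, transform, apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy usage: a 3-node path graph, 4 input features, 2 hidden features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_hat = normalize_adjacency(A)
H0 = rng.normal(size=(3, 4))  # stands in for the masked input matrix
W0 = rng.normal(size=(4, 2))  # stands in for a learned weight matrix
H1 = gcn_layer(A_hat, H0, W0)
```

Running the same layer once with the spatial adjacency and once with the expression adjacency, each with its own weights, gives the two branch outputs described in the text.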
[0029] Through the above formulas, the spatial encoder, by stacking multiple layers, aggregates higher-order spatial neighborhood information layer by layer, making the latent representation more consistent with the local continuity and boundary features of the tissue structure. The expression encoder, in turn, aggregates information from expression-similar neighborhoods layer by layer, enabling the latent representation to identify functional regions that are far apart but share consistent transcriptional patterns. Since spatial transcriptome data generally suffer from high noise and sparsity, the dual-graph encoding allows the model to learn robust representations from two complementary neighborhood types when faced with local dropouts or aberrant expression. This provides solid representational support for subsequent mask reconstruction denoising, cross-slice contrastive learning alignment, and adversarial batch-effect removal.
[0030] S50. The spatial structure features and the expression pattern features are fused to generate a unified low-dimensional latent representation. Specifically, after obtaining spatial structure features and expression pattern features through the spatial encoder and the expression encoder respectively, this embodiment further designs a dual-branch feature fusion and combined adjacency decoding module to integrate spatial structure information and expression similarity information in a unified latent space and to complete the denoising reconstruction of gene expression in this latent space. The core idea of this module is to fuse the two types of complementary features in a linearly controllable manner and to reconstruct the masked positions using the dual-graph structure, thereby explicitly suppressing the influence of noise and missing values while ensuring spatial continuity and expression consistency. The dual-branch feature fusion step is as follows. Let the spatial structure features output by the spatial encoder be $Z_{spa}$ and the expression pattern features output by the expression encoder be $Z_{exp}$. This invention fuses the two encoding results into a unified low-dimensional latent representation $Z$: $Z = \alpha Z_{spa} + \beta Z_{exp}$, where $Z$ is the unified low-dimensional latent representation matrix, each row of which is the low-dimensional latent vector of one spot, and $\alpha$ and $\beta$ are the feature fusion weight coefficients, which can be preset as constants or learned automatically through backpropagation during training. They adjust the relative contributions of spatial structure information and expression similarity information in the latent representation.
[0031] S60. Construct a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix; using the low-dimensional latent representation and the combined adjacency matrix as input, reconstruct the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions. To use both spatial adjacency relationships and expression similarity relationships during the decoding phase, this embodiment further constructs a combined adjacency matrix: $A_{com} = \lambda_s A_{spa} + \lambda_e A_{exp}$, where $A_{spa}$ is the spatial adjacency matrix, $A_{exp}$ is the expression adjacency matrix, and $\lambda_s$ and $\lambda_e$ are the combination weights used to balance the influence of local spatial topology and global expression patterns during decoding. Through this combined adjacency matrix, the decoder can propagate information under the joint effect of spatial continuity constraints and expression similarity constraints.
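The fusion and combined-adjacency steps can be sketched together as below. The weighted sums follow the description; the decoder shown is only a stand-in (a single linear graph-propagation step with row normalization), since the decoder architecture is not specified in this passage. All function names, default weights, and toy shapes are hypothetical.

```python
import numpy as np

def fuse(Z_spa, Z_exp, alpha=0.5, beta=0.5):
    """Weighted sum of the two branch outputs."""
    return alpha * Z_spa + beta * Z_exp

def combined_adjacency(A_spa, A_exp, w_spa=0.5, w_exp=0.5):
    """Weighted sum of the spatial and expression adjacency matrices."""
    return w_spa * A_spa + w_exp * A_exp

def decode(Z, A_com, W_dec):
    """Stand-in decoder (assumption): one linear propagation step over the
    combined graph with self-connections and row normalization."""
    A_tilde = A_com + np.eye(A_com.shape[0])
    A_row = A_tilde / A_tilde.sum(axis=1, keepdims=True)
    return A_row @ Z @ W_dec

# Toy usage: 4 spots, 3 latent dimensions, 5 genes.
rng = np.random.default_rng(0)
Z_spa = rng.normal(size=(4, 3))
Z_exp = rng.normal(size=(4, 3))
A_spa = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]], dtype=float)
A_exp = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
Z = fuse(Z_spa, Z_exp)
A_com = combined_adjacency(A_spa, A_exp)
X_hat = decode(Z, A_com, rng.normal(size=(3, 5)))
```

An edge present in both graphs receives the full combined weight, while an edge present in only one graph receives a partial weight, so the decoder trusts neighbors that agree in both space and expression the most.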
[0032] As shown in Figure 2, the decoder takes the unified low-dimensional latent representation $Z$ and the combined adjacency matrix $A_{com}$ as input and outputs the reconstructed gene expression matrix $\hat{X}$. To prevent the model from learning only a simple identity mapping and to force it to actually use neighborhood information for completion, this embodiment calculates the reconstruction loss only at the randomly masked positions: $\mathcal{L}_{rec} = \lVert M \odot (\hat{X} - X) \rVert_F^2$, where $\hat{X}$ is the reconstructed gene expression matrix, i.e., the output after denoising and completion; $X$ is the original gene expression matrix; $\odot$ denotes element-wise multiplication; and $M$ is the mask indicator matrix, which takes the value $1$ at masked positions and $0$ at non-masked positions.
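A small worked example of a loss restricted to masked positions is sketched below. Squared error averaged over the number of masked entries is an assumption for illustration; the exact error measure and normalization are not given in this passage, and the function name is hypothetical.

```python
import numpy as np

def masked_reconstruction_loss(X_hat, X, M):
    """Mean squared error over masked entries only (M == 1)."""
    n_masked = M.sum()
    if n_masked == 0:
        return 0.0  # nothing was masked, so there is nothing to score
    return float((M * (X_hat - X) ** 2).sum() / n_masked)

# Toy usage: entries (0, 0) and (1, 1) are masked.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat = np.array([[1.5, 9.0], [3.0, 2.0]])
M = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = masked_reconstruction_loss(X_hat, X, M)
```

Here the large error at the unmasked entry (0, 1) does not contribute: only the two masked entries are scored, giving (0.25 + 4.0) / 2 = 2.125.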
[0033] Through the aforementioned dual-branch feature fusion and combined adjacency decoding mechanism, this invention integrates spatial structure constraints and expression similarity constraints within a unified latent space. By scoring the reconstruction only at masked positions, the model is forced to use neighborhood information and latent representations to complete missing or noisy expression values rather than simply copying the input data. The resulting reconstructed expression matrix $\hat{X}$ is more spatially continuous, smoother in expression, and more biologically consistent, providing a high-quality, low-noise input foundation for subsequent cross-slice alignment, cell type deconvolution, and spatial domain recognition.
[0034] S70. Using known cell type or spatial domain labels as prior knowledge, construct triplet samples across slices. A contrastive learning loss function pulls the latent representations of same-class samples closer together and pushes the latent representations of different-class samples apart, thus achieving supervised cross-slice alignment. In multi-slice spatial transcriptome data analysis, a core challenge is aligning biological structures across slices. Because different slices may originate from different samples, patients, or even time points, significant slice and batch effects exist between the data, so that similar cell types or tissue structures tend to be separated rather than clustered in the latent representation space. This separation hinders joint analysis and comparison across slices, making it difficult to identify shared spatial domains and cell type distributions.
[0035] Traditional batch correction methods (such as Harmony and Seurat's CCA) are typically based on statistical distribution matching. They often struggle to preserve subtle biological differences while eliminating technical variation, and they cannot use known biological prior knowledge (such as cell type labels) to guide the alignment process. To address this issue, this embodiment introduces a biological prior contrastive learning module. Its core idea is to use existing cell type or spatial domain labels as supervisory signals and to construct cross-slice contrastive sample pairs, bringing samples of the same type (regardless of which slice they come from) closer together in the latent representation space while pushing samples of different types apart. This eliminates slice effects while preserving the ability to resolve biological structures.
[0036] Specifically, the biological prior knowledge required for this invention can take either of the following two forms. Cell type labels: these are derived from the cell type annotations in the paired single-cell transcriptome reference data (scRNA-seq). During training, they can be obtained indirectly through the supervision signal of the deconvolution task, or by using some annotated spatial points (e.g., through pathologist annotations or reference mappings) as priors. Spatial domain labels: these are derived from manual pathological annotations of some slices (e.g., tumor regions, normal regions), or from region labels obtained through manual correction after preliminary clustering. Each sequencing point that has a reliable cell type or spatial domain label is used as a labeled sample for contrastive learning; a point without one does not participate (or is assigned a label through pseudo-labeling techniques). In practical applications, usually only a subset of labeled samples is needed to drive effective contrastive learning.
[0037] In each training iteration, a triplet is constructed for each anchor participating in contrastive learning. Anchor ($a$): randomly select a labeled sequencing point from the current batch, and denote its latent representation vector as $z_a$ and its label (cell type or spatial domain category) as $y_a$. Positive sample ($p$): randomly select, from the other slices, a spatial point with the same label as the anchor ($y_p = y_a$), whose latent representation vector is denoted as $z_p$. Positive samples must come from a different slice than the anchor. This forces the model to learn slice-independent representations, meaning that points of the same type should be close to each other regardless of the slice they come from.
[0038] Negative sample ($n$): randomly select, from any slice (the slice containing the anchor or any other slice), a spatial point with a different label from the anchor ($y_n \neq y_a$), whose latent representation vector is denoted as $z_n$. Negative samples can come from any slice, with the aim of increasing the distinguishability between different types.
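The sampling rules for anchors, positives, and negatives can be sketched as follows. This is a minimal illustration under stated assumptions: each labeled spot is represented as a hypothetical `(spot_id, slice_id, label)` tuple, and `sample_triplet` is an illustrative name, not part of the described method.

```python
import random
from typing import List, Optional, Tuple

# Each labelled spot is described by (spot_id, slice_id, label).
Spot = Tuple[int, int, str]


def sample_triplet(anchor: Spot, spots: List[Spot],
                   rng: random.Random) -> Optional[Tuple[Spot, Spot, Spot]]:
    """Return (anchor, positive, negative), or None if no valid triplet exists."""
    _, a_slice, a_label = anchor
    # Positive: same label as the anchor, and it must come from a different slice.
    positives = [s for s in spots if s[2] == a_label and s[1] != a_slice]
    # Negative: different label from the anchor, from any slice.
    negatives = [s for s in spots if s[2] != a_label]
    if not positives or not negatives:
        return None
    return anchor, rng.choice(positives), rng.choice(negatives)


spots = [
    (0, 0, "tumor"), (1, 0, "normal"),
    (2, 1, "tumor"), (3, 1, "normal"),
    (4, 2, "tumor"),
]
rng = random.Random(0)
triplet = sample_triplet(spots[0], spots, rng)
```

Returning `None` when a label occurs in only one slice keeps such anchors out of the contrastive term, which matches the statement that unlabeled or unusable points simply do not participate.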
[0039] This invention uses the triplet margin loss as the objective function for contrastive learning. Its mathematical form is: $\mathcal{L}_{con} = \max\left(0,\ \lVert z_a - z_p \rVert_2 - \lVert z_a - z_n \rVert_2 + m\right)$, where $z_a$, $z_p$, and $z_n$ are the latent vectors of the anchor, positive sample, and negative sample, respectively, and $m$ is the margin parameter. Through biological prior contrastive learning, homologous regions across slices are brought closer together and heterologous regions are pushed further apart, achieving cross-slice alignment without confusion.
[0040] The intuitive interpretation of this loss function is as follows: the distance between the anchor and the positive sample should be smaller than the distance between the anchor and the negative sample by at least the margin $m$. If this condition is met, the loss is 0; otherwise, the loss is positive, prompting the model to adjust its parameters to pull the anchor $z_a$ and the positive sample $z_p$ closer together or to push the anchor and the negative sample $z_n$ apart. This embodiment introduces biological prior labels, constructs cross-slice triplets, and performs contrastive learning to achieve supervised, semantically aware cross-slice alignment in a unified latent representation space. This method not only overcomes the limitations of traditional batch correction methods but also significantly improves the accuracy, consistency, and interpretability of the model in multi-slice joint analysis, which is one of the core technical points of this invention.
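The behavior of the triplet margin loss can be checked on a small worked example. The sketch below assumes Euclidean distance in the latent space and a margin of 1.0; both choices, and the function names, are assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(z_a: Sequence[float], z_p: Sequence[float],
                        z_n: Sequence[float], margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(z_a, z_p) - euclidean(z_a, z_n) + margin)


z_a, z_p = [0.0, 0.0], [3.0, 4.0]   # anchor-positive distance is 5
easy_negative = [6.0, 8.0]          # distance 10: 5 - 10 + 1 < 0, so the loss is 0
hard_negative = [0.0, 2.0]          # distance 2: 5 - 2 + 1 = 4
loss_easy = triplet_margin_loss(z_a, z_p, easy_negative)
loss_hard = triplet_margin_loss(z_a, z_p, hard_negative)
```

Only the hard negative produces a gradient signal: the easy negative is already further from the anchor than the positive by more than the margin, so it contributes nothing to the update.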
[0041] S80. A discriminator is connected to the low-dimensional latent representation, and adversarial training is performed through a gradient reversal layer, so that the encoder learns to generate a domain-invariant representation that is insensitive to slices or batches. Multi-slice spatial transcriptome data are often affected by factors such as experimental procedures, sequencing platforms, sample processing conditions, and slice differences during acquisition, resulting in significant batch and slice effects. If these effects are not eliminated, the latent representations learned by the model may separate according to the slice / batch of origin rather than the actual biological structure, leading to poor cross-slice alignment, spatial domains that are difficult to share, and incomparable deconvolution results. Therefore, this embodiment attaches an adversarial discriminator $D$ to the unified low-dimensional latent representation $Z$ and incorporates a gradient reversal layer (GRL) for adversarial training, enabling the encoder to learn a domain-invariant representation that is insensitive to batches / slices.
[0042] Specifically, the discriminator $D$ takes the low-dimensional latent representation $Z$ (or its per-spot vectors) as input and outputs, for each spot, the predicted probability distribution over the batch / slice categories. The training objective of the discriminator is to minimize the following cross-entropy loss: $\mathcal{L}_{D} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log p_{i,c}$, where $N$ is the total number of training samples (spots), $C$ is the number of batch / slice categories, $y_{i,c}$ is the true label (in one-hot format), and $p_{i,c}$ is the probability predicted by the discriminator that spot $i$ belongs to category $c$.
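The discriminator's cross-entropy loss can be sketched in a few lines. The sketch assumes the true slice of each spot is given as an integer index rather than a one-hot vector, which is equivalent because only the true-class term of the inner sum is non-zero; the function name and the clipping constant `eps` are illustrative assumptions.

```python
import math
from typing import List


def slice_cross_entropy(probs: List[List[float]], labels: List[int],
                        eps: float = 1e-12) -> float:
    """Mean cross-entropy between predicted slice probabilities and true slices."""
    n = len(labels)
    # With one-hot labels, only the log-probability of the true class remains.
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / n


probs = [[0.5, 0.5], [0.25, 0.75]]   # predicted probabilities for two spots
labels = [0, 1]                      # true slice index of each spot
loss = slice_cross_entropy(probs, labels)
chance_level = slice_cross_entropy([[0.5, 0.5]], [0])   # equals log(2)
```

A discriminator that can do no better than a uniform guess over C slices has a loss of log(C), which gives a simple reference value when monitoring how slice-invariant the latent representation has become.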
[0043] This invention inserts a gradient reversal layer (GRL) between the encoder and the discriminator. During forward propagation, the GRL acts as an identity mapping on its input, so the discriminator $D$ learns to distinguish between different slices / batches. During backpropagation, the GRL multiplies the gradients passed back from the discriminator to the encoder by a negative coefficient, causing the encoder to be optimized in the direction opposite to the discriminator. Intuitively, the discriminator is trained to distinguish slices / batches as clearly as possible, while the encoder is forced to learn representations that make slices / batches indistinguishable. Through this adversarial process, technical differences related to batches / slices are suppressed in the low-dimensional latent representation $Z$, while signals related to real biological structures are preserved. The result is a domain-invariant representation that is comparable and alignable across slices, providing a stable and reliable foundation for subsequent cross-slice alignment, spatial domain identification, and cell type deconvolution.
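The forward and backward behavior of a gradient reversal layer can be shown without any deep learning framework. The class below is a hand-written sketch: the name `GradientReversal` and the coefficient `lam` are assumptions, and in an automatic-differentiation framework the same behavior would normally be obtained by overriding the backward pass of a custom operation.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam

    def forward(self, z: List[float]) -> List[float]:
        # The discriminator sees the latent vector unchanged.
        return list(z)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The encoder receives the discriminator's gradient with its sign flipped.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
z = [0.2, -1.0, 3.0]
out = grl.forward(z)
grad_to_encoder = grl.backward([1.0, -2.0, 0.0])
```

Because the sign flip happens only on the path back to the encoder, a single backward pass moves the discriminator toward better slice classification and the encoder toward representations that make that classification harder.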
[0044] S90. A three-stage strategy is adopted to train the model. The first stage trains the discriminator, the second stage fixes the discriminator and trains the encoder and decoder together, and the third stage fixes the encoder and trains the prediction heads for downstream tasks. Specifically, to address the instability and the mutual interference among multiple loss terms that tend to occur during adversarial training, this embodiment optimizes the model progressively in three stages, decoupling the removal of batch / slice effects from downstream task prediction. This significantly improves the training stability and generalization performance of the model on multi-slice spatial transcriptome data. The three-stage training process is as follows. Phase 1: Discriminator training. In the first stage, the encoder and decoder parameters are fixed, and only the batch / slice discriminator is trained so that it fully learns the batch- or slice-related discriminative features in the latent representation. The optimization objective of this stage is to minimize the cross-entropy loss of the discriminator: $\mathcal{L}_{D} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log p_{i,c}$, where $N$ is the total number of training spots, $C$ is the number of batch or slice categories, $y_{i,c}$ is the true one-hot label, and $p_{i,c}$ is the predicted probability output by the discriminator. Through training in this stage, the discriminator gains the ability to distinguish between different batches or slices, providing reliable gradient signals for subsequent adversarial training.
[0045] Phase 2: Fixing the discriminator and jointly optimizing the encoder and decoder. In the second stage, the discriminator parameters are fixed, and adversarial learning is implemented on the encoder side through the gradient reversal layer (GRL). On this basis, the encoder and decoder are jointly optimized. The goal of this stage is to minimize a weighted objective function consisting of the reconstruction loss, the biological contrast loss, and the adversarial loss processed by gradient reversal: $\mathcal{L}_{stage2} = \lambda_{1}\mathcal{L}_{rec} + \lambda_{2}\mathcal{L}_{con} - \mathcal{L}_{D}$, where $\lambda_{1}$ is the weighting coefficient of the reconstruction loss $\mathcal{L}_{rec}$, used to adjust the importance of the denoising and completion objective; $\lambda_{2}$ is the coefficient that balances the biological contrast loss $\mathcal{L}_{con}$ against the adversarial loss; $\mathcal{L}_{rec}$ is the masked reconstruction loss, which constrains the model's ability to complete the masked expression values; $\mathcal{L}_{con}$ is the biological prior contrast loss, which promotes the alignment of homologous regions across slices in the latent space; and $\mathcal{L}_{D}$ is the batch / slice discrimination loss, where the "$-$" sign indicates the adversarial direction achieved on the encoder side through gradient reversal (intuitively, making slices harder to recognize). Through joint optimization in this stage, the latent representation simultaneously acquires, in the same space, denoising and completion capability (driven by $\mathcal{L}_{rec}$), cross-slice homology alignment capability (driven by $\mathcal{L}_{con}$), and domain invariance (driven by the adversarial term).
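The role of the minus sign in the second-stage objective can be seen in a small numeric sketch. It assumes one weight for the reconstruction term, one for the contrast term, and an unweighted adversarial term; the function name, the weight names `w_rec` and `w_con`, and the loss values are all hypothetical.

```python
def stage2_objective(l_rec: float, l_con: float, l_disc: float,
                     w_rec: float = 1.0, w_con: float = 1.0) -> float:
    """Encoder/decoder objective for stage 2.

    The discriminator loss enters with a minus sign, so minimising this
    objective pushes the (frozen) discriminator's loss upward.
    """
    return w_rec * l_rec + w_con * l_con - l_disc


# Same reconstruction and contrast losses, two different latent spaces:
separable = stage2_objective(l_rec=0.4, l_con=0.2, l_disc=0.1)   # slices easy to tell apart
confused = stage2_objective(l_rec=0.4, l_con=0.2, l_disc=0.69)   # discriminator near chance
```

With the other terms held equal, the latent space in which the discriminator is close to chance receives the lower objective value, which is the direction the encoder is driven toward.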
[0046] Phase 3: Multi-task prediction head training. In the third stage, the encoder parameters are fixed, and only the downstream multi-task prediction module is trained, including the cell type deconvolution branch and the spatial domain recognition branch. This stage minimizes the prediction loss function, enabling the model to obtain reliable task outputs on a stable latent representation space: $\mathcal{L}_{pred} = \mathcal{L}_{MSE} + \lambda_{JS}\mathcal{L}_{JS}$, where $\mathcal{L}_{MSE}$ is the mean squared error, used to measure the numerical difference between the predicted value and the target value; $\mathcal{L}_{JS}$ is the Jensen-Shannon divergence, used to measure the difference between the predicted cell proportion distribution and the reference distribution; and $\lambda_{JS}$ is the weight of the JS divergence loss, used to balance numerical precision and distribution consistency.
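A prediction loss that combines mean squared error with Jensen-Shannon divergence can be sketched for a single spot as follows. It assumes both terms are computed on the predicted and reference cell type proportion vectors and that the JS divergence uses the natural logarithm; the function names and the weight `w_js` are illustrative assumptions.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def prediction_loss(pred: Sequence[float], target: Sequence[float],
                    w_js: float = 0.5) -> float:
    """MSE plus weighted JS divergence for one spot's proportion vector."""
    return mse(pred, target) + w_js * js_divergence(pred, target)


reference = [0.6, 0.3, 0.1]
predicted = [0.5, 0.3, 0.2]
loss = prediction_loss(predicted, reference)
```

The squared-error term penalizes per-cell-type numerical deviations, while the divergence term compares the two vectors as whole distributions; the weight sets the trade-off between the two.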
[0047] Through the above three-stage training strategy, this invention decouples the removal of batch / slice effects from the multi-task prediction objective during training, avoids gradient conflicts and oscillations during adversarial training, significantly improves the convergence stability and final prediction performance of the model, and enables it to be applied stably to large-scale, multi-slice spatial transcriptome data analysis scenarios.
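The three-stage schedule amounts to deciding, for each stage, which modules are updated and which are frozen. The sketch below records that as data; the module names, stage names, and loss labels are hypothetical placeholders chosen for illustration, and the decoder is assumed to stay frozen in the third stage because only the prediction heads are trained there.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

MODULES: Tuple[str, ...] = ("encoder", "decoder", "discriminator", "heads")


class Stage(NamedTuple):
    name: str
    trainable: FrozenSet[str]   # modules whose parameters are updated
    losses: Tuple[str, ...]     # loss terms minimised in this stage


SCHEDULE: Tuple[Stage, ...] = (
    Stage("stage1", frozenset({"discriminator"}), ("discriminator_ce",)),
    Stage("stage2", frozenset({"encoder", "decoder"}),
          ("reconstruction", "contrastive", "adversarial")),
    Stage("stage3", frozenset({"heads"}), ("mse", "js_divergence")),
)


def trainable_flags(stage: Stage) -> Dict[str, bool]:
    """Map every module to True (updated) or False (frozen) for one stage."""
    return {module: module in stage.trainable for module in MODULES}


flags = {stage.name: trainable_flags(stage) for stage in SCHEDULE}
```

In a training loop, these flags would be applied before each stage so that the optimizer only touches the listed modules, which is what keeps the adversarial step and the downstream prediction step from interfering with each other.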
[0048] S100: Input the preprocessed multi-slice spatial transcriptome data to be analyzed into the trained model, and output a unified low-dimensional latent representation, the proportion of cell types at each sequencing point, the spatial domain category, and the denoised gene expression matrix.
[0049] Specifically, after the three-stage training described in steps S10 to S90, this invention obtains a fully optimized integrated analysis model. The core capabilities of this model are as follows: the encoder maps multi-slice spatial transcriptome data to a unified, cross-slice aligned, batch-effect-free low-dimensional latent representation; the decoder reconstructs the denoised gene expression matrix from this latent space; and the two prediction branches (the deconvolution branch and the spatial domain recognition branch) directly output the cell type composition and spatial domain category of each spatial point based on this latent representation.
[0050] The purpose of step S100 is to apply this trained model to new or existing multi-slice data, and obtain all the above analysis results through a single forward propagation, thereby providing biologists and clinical researchers with high-quality data products that can be directly used for downstream biological interpretation.
[0051] Before inference, the multi-slice spatial transcriptome data to be analyzed needs to be preprocessed in exactly the same way as during the training phase: Gene alignment: Ensure that the gene set of the data to be analyzed is completely consistent with the unified gene set (G genes) used during training; if the new data contains extra genes, discard them; if some genes are missing, set the expression value of the gene to 0 at all points; Normalization: Use the same normalization method as in the training phase (such as CP10K log normalization) to eliminate sequencing depth differences; Graph Construction: For each slice to be analyzed, construct the spatial adjacency matrix and the expression adjacency matrix independently based on its spatial coordinates and the normalized expression matrix. The construction method is exactly the same as in step S20. Note: The expression graph here is only constructed based on the data within the slice and does not involve cross-slice information, because the model has learned how to use graph structures for encoding during the training phase.
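The gene alignment and normalization steps can be sketched for a single spot. The sketch assumes that "CP10K log normalization" means scaling each spot to 10,000 total counts and then applying log(1 + x), and that alignment is applied before normalization in the order listed above; the function names and the toy gene list are illustrative assumptions.

```python
import math
from typing import Dict, List


def align_genes(spot_counts: Dict[str, float], gene_set: List[str]) -> List[float]:
    """Order counts by the training gene set; drop extra genes, zero-fill missing ones."""
    return [float(spot_counts.get(g, 0.0)) for g in gene_set]


def cp10k_log1p(counts: List[float]) -> List[float]:
    """Scale a spot to 10,000 total counts, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1e4) for c in counts]


training_genes = ["MYH6", "ACTB", "GAPDH"]           # unified gene set from training
new_spot = {"MYH6": 3, "ACTB": 1, "EXTRA1": 7}       # GAPDH missing, EXTRA1 unseen
aligned = align_genes(new_spot, training_genes)      # [3.0, 1.0, 0.0]
normalized = cp10k_log1p(aligned)
```

Keeping the gene order fixed to the training gene set matters because the encoder's input weights are tied to column positions; a reordered or differently sized input would silently feed the wrong genes into the wrong weights.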
[0052] During the inference phase, random masking is no longer performed. The masked autoencoder task is used only during the training phase to drive the model to learn denoising and completion capabilities. During inference, this invention aims to use the model's full expressive power to process all input data and directly output high-quality results. Therefore, the encoder's input is the complete, unmasked, normalized expression matrix X.
[0053] Input the prepared data into the trained model and perform a complete forward propagation. Encoder: the complete expression matrix X is passed through the spatial encoder and the expression encoder respectively, and graph convolution is performed using their respective adjacency matrices to output spatial structure features and expression pattern features; a unified low-dimensional latent representation Z is then obtained through feature fusion. Decoder: the unified low-dimensional latent representation Z is input into the decoder, and information is propagated using the combined adjacency matrix. The output is the reconstructed expression matrix, which is the denoised gene expression matrix.
[0054] Deconvolution branch: The unified low-dimensional latent representation Z is input into the trained deconvolution branch (usually a fully connected layer with an output dimension of the number of cell types). After Softmax normalization, the proportion vector of cell types for each spatial point is obtained.
[0055] Spatial domain recognition branch: The unified low-dimensional latent representation Z is input into the trained spatial domain recognition branch (usually a fully connected layer plus Softmax, with the output dimension being the number of spatial domain categories K), to obtain the probability of each spatial point belonging to each spatial domain, and the category corresponding to the highest probability is taken as the spatial domain label of that point.
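The two prediction branches can be sketched for one spatial point as a linear layer followed by Softmax, with an argmax added for the spatial domain label. All weights, biases, and dimensions below are made-up toy values, and the helper names are assumptions; the trained branches would supply their own parameters.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(z: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weights, bias)]


z = [1.0, -1.0]                                     # latent vector of one spot
W_deconv = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # 3 cell types (toy weights)
b_deconv = [0.0, 0.0, 0.0]
W_domain = [[2.0, 0.0], [0.0, 2.0]]                 # 2 spatial domains (toy weights)
b_domain = [0.0, 0.0]

proportions = softmax(linear(z, W_deconv, b_deconv))
domain_probs = softmax(linear(z, W_domain, b_domain))
domain_label = max(range(len(domain_probs)), key=domain_probs.__getitem__)
```

Both outputs are computed from the same latent vector, so the cell type proportions and the spatial domain label of a spot cannot drift apart the way they can when separate tools are run on separately processed data.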
[0056] In traditional step-by-step analysis workflows, steps such as deconvolution, batch correction, and clustering run independently, often resulting in conflicting results (such as cell type distribution and spatial domain labels) that are difficult to integrate and interpret. This invention, however, outputs all results simultaneously through a single model, all derived from the same latent representation Z, thus possessing inherent mathematical consistency and biological interpretability.
[0057] In summary, this invention addresses the common problems in practical analysis of multi-slice spatial transcriptome data, such as limited resolution, high data sparsity and noise, significant cross-slice batch effects, and the easy accumulation of errors in step-by-step modeling of existing methods. It proposes an integrated analysis method for multi-slice spatial transcriptome data, which jointly completes cell type deconvolution, cross-slice alignment, spatial domain identification, and gene expression denoising within a unified latent space. This method has significant innovation and practical application value.
[0058] First, at the modeling level, this invention introduces a dual-graph structure that combines a spatial adjacency graph and an expression similarity graph. A dual-graph convolutional encoder simultaneously characterizes local spatial topological relationships and global expression similarity patterns. This overcomes the limitation of existing methods that construct only a single spatial adjacency graph or a single expression similarity graph. The latent representation can therefore maintain the spatial continuity of the tissue while identifying regions that are spatially non-adjacent but biologically homologous, improving the ability to characterize complex tissue structures.
[0059] Secondly, this invention explicitly performs gene expression denoising and completion within the model through masked autoencoding and the combined adjacency decoding mechanism. It imposes reconstruction constraints only on the masked positions, forcing the model to make full use of neighborhood information and latent structure for inference. This effectively alleviates the dropout and technical noise problems that are common in spatial transcriptome data. The output denoised expression matrix is significantly better than that of traditional preprocessing or postprocessing methods in terms of spatial continuity and biological consistency.
[0060] Furthermore, this invention integrates prior biological information to design a cross-slice contrastive learning mechanism, which actively brings together spatial points from different slices that share the same cell type composition or spatial domain while maintaining the distinguishability between different biological structures, thus achieving automatic alignment of homologous regions across slices. Compared with alignment methods that rely solely on statistical distribution correction, this strategy better preserves the true differences in biological structures while eliminating slice effects.
[0061] In addition, this invention introduces an adversarial discriminator and a gradient reversal layer to explicitly suppress batch and slice effects at the latent representation level. This enables the encoder to learn a domain-invariant representation that is insensitive to slice origin but highly sensitive to biological structure, fundamentally improving the comparability of multi-slice spatial transcriptome data in the same latent space and providing a reliable foundation for joint analysis across samples and time points.
[0062] Regarding training strategies, this invention proposes a three-stage training process that decouples discriminator training, joint optimization of denoising and alignment, and downstream multi-task prediction. This effectively reduces instability during adversarial training and gradient conflicts between multiple loss terms, making the model convergence more stable on large-scale real multi-slice data and improving the repeatability and generalizability of engineering implementation.
[0063] Finally, this invention simultaneously outputs cell type deconvolution results, spatial domain partitioning results, and denoised gene expression matrices on a unified latent space, achieving inherent consistency among multi-task results and avoiding the contradictory results found in traditional step-by-step analysis workflows. Through multi-dimensional evaluation metrics, this invention demonstrates significant advantages in deconvolution accuracy, cross-slice alignment, spatial domain reconstruction quality, and spatial cell interaction analysis. It can provide more accurate, stable, and biologically interpretable technical support for various spatial omics applications such as tumor microenvironment analysis, brain stratification research, and dynamic analysis of developmental processes.
[0064] To further verify the effectiveness and stability of the method on real-world data, a systematic experimental evaluation was conducted in an implementation environment based on a deep learning framework, and the method was compared with several mainstream spatial transcriptome analysis algorithms. The experiments covered different datasets, including multi-slice and single-slice datasets, to test the generalization ability and robustness of the invention under cross-sample and cross-slice conditions. Information on the real-world datasets used is summarized in Table 1.
[0065] Table 1. Summary of information on multi-slice spatial transcriptome and paired single-cell transcriptome datasets.
[0066] This embodiment uses a variety of evaluation indicators and charts to compare and analyze the model performance of each group of data, and conducts quantitative and qualitative evaluations.
[0067] Figure 3 compares the results of the proposed method and the comparison methods in identifying tumor progression-related regions across slices of multi-slice colorectal cancer spatial transcriptome data, gives the corresponding performance indicators, and visualizes the prediction results for representative slices. Specifically, (A) is a schematic diagram of the manual pathological annotation of the multi-slice colorectal cancer spatial transcriptome data, showing seven spatial transcriptome slices (slices 1–7) from different patients. Pathologists labeled each spatial point as a normal-related region, a warning region, or a cancerous region based on tissue morphology and cellular composition, and these labels serve as the reference for subsequent method evaluation. (B) compares the identification results for normal-related regions, taking slice 6 as an example: the multi-slice spatial transcriptome data are processed with the proposed joint deconvolution and cross-slice alignment method to obtain, for each spatial point, the predicted probability of belonging to a normal-related region; this is compared with the manual annotation, and the area under the receiver operating characteristic curve (AUC) of each method is given. (C) compares the identification results for warning regions, taking slices 1, 5, and 7 as examples. Warning regions contain a small number of cancer cells. The method of the present invention achieves cross-slice latent representation alignment through dual-graph encoding, feature fusion, and biological prior contrastive learning, thereby accurately identifying such regions, and its prediction results are compared with those of existing methods. (D) compares the identification results for cancerous regions, taking slices 3 and 4 as examples. Cancerous regions contain a high proportion of cancer cells. The prediction results obtained by the method of the present invention are superior to those of the comparison methods in terms of spatial continuity and regional boundary consistency, and the corresponding quantitative AUC evaluation results are given.
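For reference, the AUC used in this comparison can be computed directly from per-spot predicted probabilities and binary annotations. The sketch below uses the pairwise definition (the probability that a positively annotated spot receives a higher score than a negatively annotated one, with ties counted as one half); the function name and the toy scores are illustrative and are not experimental values.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: predicted probability of "region of interest" for four spots.
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]          # 1 = annotated as the region, 0 = not
auc = roc_auc(scores, labels)  # 3 of the 4 positive/negative pairs are ordered correctly
```

This pairwise form is quadratic in the number of spots and is meant only to make the definition concrete; a sorting-based implementation would be preferable for full slices.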
[0068] Figure 4 visualizes the spatial distribution and gene expression patterns of key cell types at different developmental time points (including comparisons before and after denoising) on multi-slice data of human heart development, and compares the predictive consistency of different methods using correlation indices. Specifically, (A) shows the spatial distribution of atrial cardiomyocytes at different developmental time points (weeks 5, 6, and 9) in the multi-slice spatial transcriptome data. This result is obtained by jointly deconvolving and aligning multiple slices at each time point using the method of this invention, and demonstrates the spatial consistency of the same cell type across different developmental stages and slices. (B) is a schematic diagram of the expression distribution of the atrial cardiomyocyte marker gene MYH6 in the original spatial transcriptome data, showing the gene expression without denoising, whose spatial distribution exhibits strong noise and sparsity. (C) is a schematic diagram of the MYH6 expression distribution after denoising by the random mask reconstruction and dual-graph decoding mechanism of this invention. Compared with (B), the denoised gene expression is more spatially continuous and more consistent with the spatial distribution of atrial cardiomyocytes. (D) shows the correlation between the deconvolution results of different cell types and the expression of their corresponding marker genes. The consistency of the predictions of the method of the present invention and the comparison methods under multi-slice conditions is compared by calculating the Pearson correlation coefficient (PCC). (E) is a schematic diagram of the spatial domain division obtained by clustering spatial points based on the cell type composition ratios predicted by the method of the present invention, showing how the spatial structure of cardiac tissue changes across developmental stages.
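The Pearson correlation coefficient between a predicted cell type proportion and the expression of its marker gene can be computed as in the following sketch. The function name and the toy vectors are illustrative assumptions and do not reproduce any value from the experiments.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Toy example: predicted proportion of one cell type and expression of its
# marker gene across four spots (the marker is exactly 10 x the proportion).
proportion = [0.1, 0.4, 0.7, 0.9]
marker_expression = [1.0, 4.0, 7.0, 9.0]
pcc = pearson(proportion, marker_expression)
```

A coefficient close to 1 indicates that spots predicted to be rich in a cell type are also the spots where its marker gene is highly expressed, which is the consistency that panel (D) evaluates.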
[0069] Comprehensive experimental results show that, compared with existing mainstream methods, this invention exhibits superior performance in terms of cross-slice consistency, expression denoising effect, key region identification accuracy, and overall stability, and has good generalization ability and significant robustness.
[0070] The above describes the integrated analysis method for multi-slice spatial transcriptome data in the embodiments of the present invention. The following describes the integrated analysis device for multi-slice spatial transcriptome data in the embodiments of the present invention. Referring to Figure 5, one embodiment of the integrated analysis device for multi-slice spatial transcriptome data in this invention includes:
The preprocessing module 10, used to perform gene alignment and normalization preprocessing on the acquired multi-slice spatial transcriptome data and the corresponding single-cell transcriptome reference data, and to construct a unified gene expression feature matrix and spatial coordinate set;
The adjacency graph construction module 20, used to construct, for each spatial transcriptome slice, a spatial adjacency matrix based on spatial coordinates and an expression adjacency matrix based on gene expression similarity;
The masking module 30, used to perform a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary masking matrix;
The feature extraction module 40, used to construct a dual-branch graph convolutional network containing a spatial encoder and an expression encoder, wherein the masked matrix is input to the spatial encoder and the expression encoder respectively, the spatial encoder uses the spatial adjacency matrix to perform graph convolution to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution to extract expression pattern features;
The feature fusion module 50, used to fuse the spatial structure features and the expression pattern features to generate a unified low-dimensional latent representation;
The reconstruction module 60, used to construct a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix, and, taking the low-dimensional latent representation and the combined adjacency matrix as input, to reconstruct the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions;
The alignment module 70, used to construct triplet samples across slices based on known cell type or spatial domain labels as prior knowledge, and to use a contrastive learning loss function to pull the latent representations of same-class samples closer together and push the latent representations of different-class samples apart, thereby achieving supervised cross-slice alignment;
The adversarial module 80, used to connect a discriminator to the low-dimensional latent representation and perform adversarial training through a gradient reversal layer, so that the encoder learns to generate a domain-invariant representation that is insensitive to slices or batches;
The training module 90, used to train the model using a three-stage strategy, wherein the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the encoder and decoder, and the third stage fixes the encoder and trains the downstream task prediction heads;
The output module 100, used to input the preprocessed multi-slice spatial transcriptome data to be analyzed into the trained model and to output a unified low-dimensional latent representation, the proportion of cell types at each sequencing point, the spatial domain category, and the denoised gene expression matrix.
[0071] Figure 5 above describes the integrated analysis device for multi-slice spatial transcriptome data in this embodiment of the invention in detail from the perspective of modular functional entities. The following describes the integrated analysis device for multi-slice spatial transcriptome data in this embodiment of the invention in detail from the perspective of hardware processing.
[0072] Figure 6 is a schematic diagram of the structure of an integrated analysis device for multi-slice spatial transcriptome data provided in an embodiment of the present invention. The integrated analysis device 101 for multi-slice spatial transcriptome data can vary significantly depending on its configuration or performance. It may include one or more central processing units (CPUs) 11 (e.g., one or more processors), a memory 12, and one or more storage media 13 (e.g., one or more mass storage devices) for storing application programs 133 or data 132. The memory 12 and the storage media 13 can provide temporary or persistent storage. The program stored in the storage media 13 may include one or more modules (not shown in the figure), each module including a series of instruction operations for the integrated analysis device 101 for multi-slice spatial transcriptome data. Furthermore, the processor 11 may be configured to communicate with the storage media 13 and to execute, on the integrated analysis device 101 for multi-slice spatial transcriptome data, the series of instruction operations stored in the storage media 13.
[0073] The integrated analysis device 101 for multi-slice spatial transcriptome data may also include one or more power supplies 14, one or more wired or wireless network interfaces 15, one or more input/output interfaces 16, and/or one or more operating systems 131, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will understand that the device structure shown in Figure 6 does not constitute a limitation on the integrated analysis device 101 for multi-slice spatial transcriptome data, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0074] The present invention also provides a computer-readable storage medium, which can be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the steps of an integrated analysis method for multi-slice spatial transcriptome data.
[0075] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the system, device, or unit described above may be understood by reference to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0076] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0077] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An integrated analysis method for multi-slice spatial transcriptome data, characterized in that it includes the following steps:
performing gene alignment and normalization preprocessing on the acquired multi-slice spatial transcriptome data and the corresponding single-cell transcriptome reference data to construct a unified gene expression feature matrix and spatial coordinate set;
for each spatial transcriptome slice, constructing a spatial adjacency matrix based on spatial coordinates and an expression adjacency matrix based on gene expression similarity;
performing a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary mask matrix;
constructing a dual-branch graph convolutional network comprising a spatial encoder and an expression encoder, and inputting the masked matrix to the spatial encoder and the expression encoder respectively, wherein the spatial encoder uses the spatial adjacency matrix to perform graph convolution operations to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution operations to extract expression pattern features;
fusing the spatial structure features and the expression pattern features to generate a unified low-dimensional latent representation;
constructing a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix, and, taking the low-dimensional latent representation and the combined adjacency matrix as input, reconstructing the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions;
constructing triplet samples across slices based on known cell type or spatial domain labels as prior knowledge, and using a contrastive learning loss function to bring the latent representations of samples of the same type closer together and push the latent representations of samples of different types further apart, thereby achieving supervised cross-slice alignment;
connecting a discriminator to the low-dimensional latent representation and performing adversarial training through a gradient inversion layer, so that the encoder learns to generate a domain-invariant representation that is insensitive to slices or batches;
training the model with a three-stage strategy, in which the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the encoder and decoder, and the third stage fixes the encoder and trains the prediction heads for downstream tasks;
inputting the preprocessed multi-slice spatial transcriptome data to be analyzed into the trained model, and outputting a unified low-dimensional latent representation, the proportion of cell types at each sequencing point, the spatial domain category, and a denoised gene expression matrix.
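The random masking step of claim 1 can be sketched as follows. This is a minimal NumPy illustration under assumptions that the claim does not fix: the function name `random_mask`, the `mask_ratio` parameter, zero-filling of masked entries, and the convention that the indicator matrix holds 1 at masked positions are all hypothetical choices made for the example.

```python
import numpy as np


def random_mask(x: np.ndarray, mask_ratio: float, seed: int = 0):
    """Return the masked matrix and a binary indicator of the masked entries."""
    rng = np.random.default_rng(seed)
    # mask[i, g] == 1 marks an entry that is hidden from the encoders.
    mask = (rng.random(x.shape) < mask_ratio).astype(x.dtype)
    x_masked = x * (1.0 - mask)  # masked entries are set to zero (assumed fill value)
    return x_masked, mask


x = np.arange(1.0, 13.0).reshape(3, 4)  # toy matrix: 3 sequencing points x 4 genes
x_masked, mask = random_mask(x, mask_ratio=0.3)
```

Keeping the indicator matrix alongside the masked matrix is what later allows the reconstruction loss to be restricted to the hidden entries.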
2. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that constructing the spatial adjacency matrix based on spatial coordinates includes the following steps:
calculating the pairwise Euclidean distances between all sequencing points within a spatial transcriptome slice;
for each sequencing point, selecting the $k$ spatially nearest sequencing points as its neighbors and constructing a binary spatial adjacency matrix, in which element $(i, j)$ is 1 if sequencing point $j$ is a neighbor of sequencing point $i$, and 0 otherwise.
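The nearest-neighbor construction of claim 2 can be sketched in NumPy as follows. The function name `spatial_knn_adjacency`, the neighbor count `k`, and the exclusion of a point from its own neighbor list are assumptions made for this illustration; the claim does not state how self-neighbors or distance ties are handled.

```python
import numpy as np


def spatial_knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Binary adjacency: A[i, j] = 1 if point j is among the k nearest points of point i."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # assumed: a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape, dtype=int)
    rows = np.repeat(np.arange(coords.shape[0]), k)
    adj[rows, neighbors.ravel()] = 1
    return adj


# Toy slice: three points close together and one far away.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
A_spa = spatial_knn_adjacency(coords, k=2)
```

Note that the resulting matrix is generally not symmetric: the distant point selects two neighbors, but no other point selects it.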
3. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that constructing the expression adjacency matrix based on gene expression similarity includes the following steps:
calculating the pairwise cosine similarity between all sequencing points within a spatial transcriptome slice based on their gene expression vectors;
for each sequencing point, selecting the $k$ sequencing points with the highest expression similarity as its neighbors and constructing a binary expression adjacency matrix, in which element $(i, j)$ is 1 if sequencing point $j$ is a neighbor of sequencing point $i$, and 0 otherwise.
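The similarity-based construction of claim 3 can be sketched in the same way. The function name `expression_knn_adjacency`, the neighbor count `k`, the small-norm guard, and the exclusion of self-similarity are assumptions made for this illustration.

```python
import numpy as np


def expression_knn_adjacency(expr: np.ndarray, k: int) -> np.ndarray:
    """Binary adjacency from the k most cosine-similar points of each point."""
    norms = np.linalg.norm(expr, axis=1, keepdims=True)
    unit = expr / np.clip(norms, 1e-12, None)  # guard against all-zero expression rows
    sim = unit @ unit.T                        # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # assumed: exclude self-similarity
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=int)
    rows = np.repeat(np.arange(expr.shape[0]), k)
    adj[rows, neighbors.ravel()] = 1
    return adj


# Toy expression profiles: points 0 and 1 share one pattern, points 2 and 3 another.
expr = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])
A_exp = expression_knn_adjacency(expr, k=1)
```

Unlike the spatial graph, this graph can link points that are far apart on the tissue as long as their expression profiles are similar.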
4. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that both the spatial encoder and the expression encoder are implemented as multi-layer graph convolutional networks, where the feature propagation formula for each layer is:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$
where $H^{(l)}$ is the sequencing point feature representation matrix of the $l$-th layer, with $H^{(0)} = X_{\mathrm{mask}}$ being the masked matrix; $W^{(l)}$ is the learnable parameter matrix of the $l$-th layer; $\sigma$ is a non-linear activation function; $\tilde{A} = A_{\mathrm{spa}} + I$ for the spatial encoder and $\tilde{A} = A_{\mathrm{exp}} + I$ for the expression encoder; $I$ is the identity matrix; and $\tilde{D}$ is the degree matrix of $\tilde{A}$.
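One propagation step of the kind described in claim 4 (an adjacency matrix with added self-loops, normalized by its degree matrix, applied to the layer features and a learnable weight matrix, followed by a non-linearity) can be sketched in NumPy. The helper names, the choice of ReLU as the activation, the random weights, and the symmetric toy graph are assumptions for illustration only.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Compute D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(h: np.ndarray, adj_norm: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One propagation step: ReLU(A_norm @ H @ W). ReLU is an assumed activation."""
    return np.maximum(adj_norm @ h @ weight, 0.0)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy path graph
x_masked = rng.random((3, 4))        # toy masked expression matrix, used as layer-0 features
w0 = rng.standard_normal((4, 2))     # hypothetical layer weights
adj_norm = normalize_adjacency(adj)
h1 = gcn_layer(x_masked, adj_norm, w0)
```

In a dual-branch setup the same `gcn_layer` would be applied twice with different normalized adjacency matrices, one built from the spatial graph and one from the expression graph.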
5. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that constructing the combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix, and reconstructing the denoised gene expression matrix through a decoder with the low-dimensional latent representation and the combined adjacency matrix as input, includes the following steps:
constructing the combined adjacency matrix through the formula
$$A_{\mathrm{comb}} = \alpha\,A_{\mathrm{spa}} + \beta\,A_{\mathrm{exp}}$$
where $A_{\mathrm{spa}}$ is the spatial adjacency matrix and $A_{\mathrm{exp}}$ is the expression adjacency matrix; $\alpha$ and $\beta$ are the combination weights, used to balance the influence of local spatial topology and global expression patterns during the decoding process;
taking the unified low-dimensional latent representation $Z$ and the combined adjacency matrix $A_{\mathrm{comb}}$ as input to the decoder, whose output is the reconstructed gene expression matrix;
calculating the reconstruction loss at the randomly masked positions:
$$\mathcal{L}_{\mathrm{rec}} = \left\| M \odot \left(\hat{X} - X\right) \right\|_F^2$$
where $\hat{X}$ is the reconstructed gene expression matrix, that is, the output after denoising and completion; $X$ is the original gene expression matrix; and $M$ is the mask indicator matrix, which takes the value 1 at masked positions and 0 at non-masked positions.
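The two computations in claim 5, the weighted combination of the two graphs and a reconstruction loss restricted to masked positions, can be sketched as follows. The function names, the weight values, and the averaging over the number of masked entries are assumptions chosen for this illustration; the claim itself only states that the loss is calculated at the masked positions.

```python
import numpy as np


def combine_adjacency(a_spa: np.ndarray, a_exp: np.ndarray,
                      alpha: float, beta: float) -> np.ndarray:
    """Weighted sum of the spatial and expression adjacency matrices."""
    return alpha * a_spa + beta * a_exp


def masked_reconstruction_loss(x_hat: np.ndarray, x: np.ndarray,
                               mask: np.ndarray) -> float:
    """Squared error over masked entries only (mask == 1), averaged over those entries."""
    n_masked = mask.sum()
    if n_masked == 0:
        return 0.0
    return float((mask * (x_hat - x) ** 2).sum() / n_masked)


a_spa = np.array([[0, 1], [1, 0]], dtype=float)
a_exp = np.array([[0, 0], [1, 0]], dtype=float)
a_comb = combine_adjacency(a_spa, a_exp, alpha=0.7, beta=0.3)  # hypothetical weights

x = np.array([[1.0, 2.0], [3.0, 4.0]])       # original expression
x_hat = np.array([[1.5, 2.0], [0.0, 3.0]])   # toy decoder output
mask = np.array([[1.0, 0.0], [0.0, 1.0]])    # 1 marks a masked entry
loss = masked_reconstruction_loss(x_hat, x, mask)
```

In this toy case the large error at the unmasked entry in row 2, column 1 does not contribute; only the two masked entries do, giving (0.25 + 1.0) / 2 = 0.625.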
6. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that constructing triplet samples across slices based on known cell type or spatial domain labels as prior knowledge, and using a contrastive learning loss function to bring the latent representations of similar samples closer together and push the latent representations of dissimilar samples apart to achieve cross-slice alignment, includes the following steps:
for each anchor point $a$, selecting from other slices a positive sample $p$ of the same type or domain and a negative sample $n$ of a different type or domain to construct a cross-slice triplet sample;
defining the contrastive learning loss function for the triplet samples:
$$\mathcal{L}_{\mathrm{bio}} = \max\left(0,\ \left\|z_a - z_p\right\|_2^2 - \left\|z_a - z_n\right\|_2^2 + m\right)$$
where $z_a$, $z_p$, and $z_n$ are the latent vectors of the anchor point, the positive sample, and the negative sample, respectively, and $m$ is the margin parameter; through contrastive learning guided by biological priors, homologous regions across slices are brought closer together and heterologous regions are pushed further apart, achieving cross-slice alignment without confusion.
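A margin-based triplet loss of the kind described in claim 6 can be sketched in NumPy. The function name `triplet_loss`, the use of squared Euclidean distance, the averaging over triplets, and the margin value are assumptions for this illustration; the toy latent vectors are invented.

```python
import numpy as np


def triplet_loss(z_a: np.ndarray, z_p: np.ndarray, z_n: np.ndarray,
                 margin: float = 1.0) -> float:
    """Mean hinge loss over triplets using squared Euclidean distances."""
    d_pos = ((z_a - z_p) ** 2).sum(axis=1)  # anchor to same-label sample from another slice
    d_neg = ((z_a - z_n) ** 2).sum(axis=1)  # anchor to different-label sample
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# Two toy triplets: the first is already well separated, the second is not.
z_a = np.array([[0.0, 0.0], [1.0, 1.0]])
z_p = np.array([[0.0, 1.0], [1.0, 1.0]])
z_n = np.array([[3.0, 0.0], [1.0, 1.5]])
loss = triplet_loss(z_a, z_p, z_n, margin=1.0)
```

The first triplet contributes nothing because its negative is already more than the margin farther away than its positive; only the second triplet, whose negative sits too close to the anchor, produces a gradient signal.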
7. The integrated analysis method for multi-slice spatial transcriptome data according to claim 1, characterized in that training the model using the three-stage strategy, in which the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the dual-branch graph convolutional network and decoder, and the third stage fixes the encoder and trains the prediction heads for downstream tasks, includes the following steps:
in the first stage, fixing the encoder and decoder parameters and training the discriminator until convergence using only the loss function
$$\mathcal{L}_{D} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$
where $N$ is the total number of training sample points, $C$ is the number of batch or slice categories, $y_{i,c}$ is the true label, and $p_{i,c}$ is the probability predicted by the discriminator that point $i$ belongs to category $c$;
in the second stage, fixing the discriminator parameters, implementing adversarial learning on the encoder side through the gradient inversion layer and, on this basis, jointly optimizing the encoder and decoder by minimizing an objective function that is a weighted sum of the reconstruction loss, the biological contrast loss, and the adversarial loss processed by gradient inversion:
$$\mathcal{L}_{2} = \lambda_{\mathrm{rec}}\,\mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{bio}}\,\mathcal{L}_{\mathrm{bio}} + \mathcal{L}_{\mathrm{adv}}$$
where $\lambda_{\mathrm{rec}}$ is the weighting coefficient of the reconstruction loss $\mathcal{L}_{\mathrm{rec}}$, used to adjust the importance of the denoising and completion target, and $\lambda_{\mathrm{bio}}$ is the balance coefficient between the biological contrast loss $\mathcal{L}_{\mathrm{bio}}$ and the adversarial loss $\mathcal{L}_{\mathrm{adv}}$;
in the third stage, fixing the encoder parameters, and adding and training a cell type deconvolution branch and a spatial domain recognition branch by minimizing the prediction loss function, so that the model obtains reliable task outputs on a stable latent representation space:
$$\mathcal{L}_{\mathrm{pred}} = \mathrm{MSE}\left(\hat{Y}, Y\right) + \gamma\,\mathrm{JS}\left(\hat{Y}\,\middle\|\,Y\right)$$
where $\mathrm{MSE}$ is the mean squared error, used to measure the numerical difference between the predicted value $\hat{Y}$ and the target value $Y$; $\mathrm{JS}$ is the Jensen-Shannon divergence, used to measure the difference between the predicted cell proportion distribution and the reference distribution; and $\gamma$ is the JS divergence loss weight, used to balance numerical precision and distribution consistency.
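The first-stage and third-stage losses described in claim 7 can be sketched in NumPy: a cross-entropy over slice or batch categories for the discriminator, and a prediction loss that adds a weighted Jensen-Shannon divergence to a mean squared error. The function names, the weight `gamma`, the small constant `eps`, the natural logarithm, and the row-wise averaging are assumptions for this illustration, and all numbers are toy values.

```python
import numpy as np


def cross_entropy(y_onehot: np.ndarray, probs: np.ndarray, eps: float = 1e-12) -> float:
    """Mean categorical cross-entropy between one-hot slice labels and predictions."""
    return float(-(y_onehot * np.log(probs + eps)).sum(axis=1).mean())


def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Row-wise Jensen-Shannon divergence (natural log), averaged over rows."""
    m = 0.5 * (p + q)
    kl_pm = (p * np.log((p + eps) / (m + eps))).sum(axis=1)
    kl_qm = (q * np.log((q + eps) / (m + eps))).sum(axis=1)
    return float((0.5 * kl_pm + 0.5 * kl_qm).mean())


def prediction_loss(pred: np.ndarray, target: np.ndarray, gamma: float) -> float:
    """Mean squared error plus gamma times the Jensen-Shannon divergence."""
    mse = float(((pred - target) ** 2).mean())
    return mse + gamma * js_divergence(pred, target)


# Stage 1: a discriminator that cannot tell two slices apart predicts 0.5 / 0.5.
y = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = np.full((2, 2), 0.5)
d_loss = cross_entropy(y, probs)

# Stage 3: predicted vs. reference cell type proportions for two sequencing points.
pred = np.array([[0.7, 0.3], [0.2, 0.8]])
target = np.array([[0.6, 0.4], [0.1, 0.9]])
p_loss = prediction_loss(pred, target, gamma=0.5)
```

For two slices, a discriminator loss near ln 2 corresponds to chance-level predictions, which is the regime the adversarial stage tries to push the encoder toward.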
8. An integrated analysis device for multi-slice spatial transcriptome data, characterized in that it includes:
a preprocessing module, used to perform gene alignment and normalization preprocessing on the acquired multi-slice spatial transcriptome data and the corresponding single-cell transcriptome reference data, and to construct a unified gene expression feature matrix and spatial coordinate set;
an adjacency graph construction module, used to construct, for each spatial transcriptome slice, a spatial adjacency matrix based on spatial coordinates and an expression adjacency matrix based on gene expression similarity;
a masking module, used to perform a random masking operation on the unified gene expression feature matrix to generate a masked matrix and a complementary mask matrix;
a feature extraction module, used to construct a dual-branch graph convolutional network containing a spatial encoder and an expression encoder, and to input the masked matrix to the spatial encoder and the expression encoder respectively, wherein the spatial encoder uses the spatial adjacency matrix to perform graph convolution operations to extract spatial structure features, and the expression encoder uses the expression adjacency matrix to perform graph convolution operations to extract expression pattern features;
a feature fusion module, used to fuse the spatial structure features and the expression pattern features to generate a unified low-dimensional latent representation;
a reconstruction module, used to construct a combined adjacency matrix that integrates the spatial adjacency matrix and the expression adjacency matrix and, taking the low-dimensional latent representation and the combined adjacency matrix as input, to reconstruct the denoised gene expression matrix through a decoder, wherein the reconstruction loss function is calculated only for the masked positions;
an alignment module, used to construct triplet samples across slices based on known cell type or spatial domain labels as prior knowledge, and to use a contrastive learning loss function to bring the latent representations of similar samples closer together and push the latent representations of dissimilar samples apart, thereby achieving supervised cross-slice alignment;
an adversarial module, used to connect a discriminator to the low-dimensional latent representation and perform adversarial training through a gradient inversion layer, so that the encoder learns to generate a domain-invariant representation that is insensitive to slices or batches;
a training module, used to train the model using a three-stage strategy, in which the first stage trains the discriminator, the second stage fixes the discriminator and jointly trains the encoder and decoder, and the third stage fixes the encoder and trains the prediction heads for downstream tasks;
an output module, used to input the preprocessed multi-slice spatial transcriptome data to be analyzed into the trained model and output a unified low-dimensional latent representation, the proportion of cell types at each sequencing point, the spatial domain category, and the denoised gene expression matrix.
9. An integrated analysis device for multi-slice spatial transcriptome data, characterized in that it includes a memory and at least one processor, wherein the memory stores computer-readable instructions, and the at least one processor invokes the computer-readable instructions in the memory to perform the steps of the integrated analysis method for multi-slice spatial transcriptome data as described in any one of claims 1-7.
10. A computer-readable storage medium storing computer-readable instructions thereon, characterized in that, when the computer-readable instructions are executed by a processor, they implement the steps of the integrated analysis method for multi-slice spatial transcriptome data as described in any one of claims 1-7.