Human organ tissue slice AI three-dimensional reconstruction method

By generating tissue foreground masks and using feature matching neural networks for multi-reference candidate registration, combined with pixel correlation thresholding and semantic segmentation, the problems of unstable registration and insufficient reconstruction continuity in the 3D reconstruction of biological tissue slices are solved, achieving 3D reconstruction with higher accuracy and stability.

CN122244331A · Pending · Publication Date: 2026-06-19 · SOUTHERN MEDICAL UNIVERSITY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTHERN MEDICAL UNIVERSITY
Filing Date
2026-05-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for three-dimensional reconstruction of biological tissue sections suffer from problems such as unstable registration, accumulation of misregistration, and insufficient reconstruction continuity. Furthermore, the reconstruction methods have poor versatility and platform compatibility.

Method used

By generating tissue foreground masks and using feature matching neural networks for multi-reference candidate registration, combined with pixel correlation thresholding and semantic segmentation, robust organ tissue slice sequence registration and continuous structural reconstruction are achieved.

Benefits of technology

It improves the accuracy and stability of 3D reconstruction, expands the applicability of the method, reduces the cumulative spread of misregistration, and enhances the visualization of reconstruction results and the usability of subsequent analysis.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244331A_ABST

Abstract

This invention relates to the field of image processing technology and specifically discloses an AI-based 3D reconstruction method for human organ tissue slices. The method includes: acquiring a sequence of consecutive slice images and preprocessing them to generate tissue foreground masks; using previously registered slices as references, estimating homography matrices through a feature matching neural network and transforming the slice to be registered accordingly to obtain candidate registration results; calculating the pixel correlation between the foreground mask of each candidate and that of its reference, retaining candidates whose correlation is not less than a threshold, and taking the one with the highest correlation to form a registered slice sequence; semantically segmenting the registered sequence to obtain a target structure mask sequence, and stacking it into voxels according to the interlayer spacing to output the 3D reconstruction result. Multi-reference registration with selection based on foreground correlation suppresses the accumulation of misregistration and improves alignment stability; segmentation after registration followed by voxel stacking enables multi-structure 3D reconstruction and display of individual substructures. The method thus improves registration reliability and supports multi-label 3D display.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to an AI-based three-dimensional reconstruction method for human organ tissue slices.

Background Technology

[0002] Understanding tissue morphology and functional mechanisms relies both on precise characterization of normal morphology and on systematic analysis of pathological morphological changes. In normal tissues, the spatial arrangement of cells, hierarchical interfaces, and three-dimensional microstructures together form the functional basis of organs. For example, from the microcirculation of liver lobules to the filtration pathways of nephrons, and from the hierarchical connections of the cerebral cortex to the three-dimensional orientation of skeletal and cardiac muscle fibers, the relationship between morphology and function must be studied in three dimensions. In pathological conditions, however, these spatial relationships often undergo local or global restructuring, manifested as blurred cell boundaries, remodeling of matrix components, cell infiltration, necrosis, and angiogenesis, so that the originally ordered three-dimensional configuration becomes highly heterogeneous. Especially when deviations occur at tissue-cell boundaries, such as differences in immunomarker intensity or in cytoplasmic and nuclear staining, accurately characterizing boundary heterogeneity in three dimensions becomes crucial for revealing biological processes such as development, regeneration, inflammation, and tumorigenesis.

[0003] In recent years, advances in whole-slide imaging (WSI) and multicolor fluorescence or immunostaining techniques have enabled two-dimensional slides to provide unprecedented high-resolution local morphological information. However, observing two-dimensional slides alone, or simply projecting them along the Z-axis, often fails to restore the true three-dimensional spatial relationships and structural continuity. Biological tissue slides inevitably undergo physical and chemical variations during preparation, such as folding, tearing, uneven thickness, differences in staining background, and local stretching or compression. These cause significant deviations in position, scale, and shape between adjacent slides, which makes cross-layer registration and continuous reconstruction a major challenge. Traditional volume rendering based on voxelization and ray casting can present the overall volume, but it faces an accuracy-versus-cost dilemma in choosing the voxel size: voxels that are too large lose boundary detail, while voxels that are too small sharply increase computation and storage costs. Furthermore, most rendering outputs prioritize visual presentation over editable solid models, which makes it difficult to meet the needs of engineering simulation, structural disassembly, and downstream quantitative analysis. In addition, existing patents are limited to three-dimensional reconstruction of specific tissue structures in serial slides of a single organ or tissue, and lack the generality and platform-level applicability needed for tissue three-dimensional reconstruction in general.

Summary of the Invention

[0004] The purpose of this invention is to overcome the problems of unstable registration, accumulated misregistration, and insufficient reconstruction continuity caused by fluctuations in slice quality and differences between layers in existing 3D reconstruction of serial tissue slices, as well as the limited generality and platform applicability of existing reconstruction methods. This invention provides an AI-based 3D reconstruction method for human organ and tissue slices. It constructs tissue foreground masks from consecutive slice sequences and generates multi-reference candidate registration results with a feature matching neural network. Thresholding and optimal selection based on the pixel correlation between foreground masks yield robustly registered slice sequences. Semantic segmentation and voxel stacking according to the layer spacing are then combined to output the 3D reconstruction result. This achieves robust registration and continuous structural reconstruction for slice sequences of different organs and tissues, improving the accuracy, stability, and applicability of 3D reconstruction.

[0005] To achieve the above objectives, the present invention provides an AI three-dimensional reconstruction method for human organ tissue slices, the method comprising: acquiring and preprocessing a sequence of consecutive slice images of the same organ tissue to generate tissue foreground masks; for each slice to be registered, using the registered slices in a preset number of preceding layers as reference slices, obtaining correspondences through a feature matching neural network, estimating a homography matrix, and geometrically transforming the slice to be registered to generate candidate registration results; calculating a pixel correlation between the tissue foreground mask of each candidate registration result and the tissue foreground mask of the corresponding reference slice; retaining candidate registration results whose pixel correlation is not less than a preset threshold, selecting the retained candidate with the highest pixel correlation as the registration result of the slice to be registered, and forming a registered slice sequence from the registration results of all slices to be registered; and obtaining a target structure mask sequence by semantic segmentation of the registered slice sequence, and stacking the mask sequence into voxels according to the interlayer spacing to output the three-dimensional reconstruction result.

[0006] More preferably, the step of obtaining continuous slice sequence images of the same organ tissue includes: performing continuous slicing of the same organ tissue, staining and scanning each slice to obtain slice sequence images, establishing a sequence identifier for each slice sequence image, obtaining the slice thickness of each slice, and determining the interlayer spacing based on the slice thickness.

[0007] More preferably, the preprocessing includes at least one of downsampling, noise reduction, brightness and color normalization, background suppression, cropping to obtain a target image containing tissue regions, and format conversion.

[0008] More preferably, the generation of the tissue foreground mask includes: thresholding the preprocessed slice sequence image to obtain the tissue foreground region, and performing morphological processing on the tissue foreground region to remove isolated noise or fill holes, thereby obtaining the tissue foreground mask.
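The thresholding-plus-morphology step can be illustrated with a small numpy-only sketch. It assumes a bright-field grayscale image scaled to [0, 1] in which background is bright and tissue is darker than a fixed `threshold` (0.8 here is an arbitrary placeholder, not a value from the source); "remove isolated noise" is implemented as a 3x3 binary opening and "fill holes" as a flood fill of background from the image border. All function names are hypothetical.

```python
import numpy as np
from collections import deque


def _neighbourhood(mask):
    """Stack the 3x3 neighbourhood of every pixel as a (9, H, W) array; border padded with False."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])


def erode(mask):
    return _neighbourhood(mask).all(axis=0)


def dilate(mask):
    return _neighbourhood(mask).any(axis=0)


def fill_holes(mask):
    """Fill background regions not 4-connected to the image border."""
    h, w = mask.shape
    outside = np.zeros(mask.shape, dtype=bool)
    queue = deque()
    border = [(y, x) for y in range(h) for x in (0, w - 1)] + \
             [(y, x) for x in range(w) for y in (0, h - 1)]
    for y, x in border:
        if not mask[y, x] and not outside[y, x]:
            outside[y, x] = True
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside  # foreground plus enclosed holes


def tissue_foreground_mask(gray, threshold=0.8):
    """Threshold (tissue darker than background, an assumption), open, then fill holes."""
    fg = np.asarray(gray) < threshold
    fg = dilate(erode(fg))  # opening removes isolated foreground specks
    return fill_holes(fg)
```

Real pipelines would typically use library morphology routines and a data-driven threshold (for example Otsu's method); the loops here only make the two morphological goals named in the text explicit.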

[0009] More preferably, the registered slices in the preset number of layers preceding the slice to be registered are at least three consecutive registered slices immediately preceding it; the slice to be registered generates one candidate registration result with each of these registered slices as the reference slice, the pixel correlation between the tissue foreground mask of each candidate registration result and that of the corresponding reference slice is calculated, and the candidate registration result with the highest pixel correlation is selected as the registration result of the slice to be registered; when fewer than three registered slices precede the slice to be registered, all existing registered slices are used as reference slices to generate candidate registration results.

[0010] More preferably, the calculation of pixel correlation includes: converting the tissue foreground mask corresponding to the candidate registration result and the tissue foreground mask corresponding to the reference slice into binary matrices, wherein the binary matrices indicate whether each pixel position belongs to the tissue foreground or the background; counting the number of pixel positions at which both binary matrices are foreground as the intersection pixel count, and the number of pixel positions at which at least one of the two binary matrices is foreground as the union pixel count; and determining the ratio of the intersection pixel count to the union pixel count as the pixel correlation.
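The ratio described above is the intersection-over-union (Jaccard index) of the two binary masks. A minimal sketch follows; returning 0.0 when both masks are empty is an assumption, since the source does not cover that case.

```python
import numpy as np


def pixel_correlation(mask_a, mask_b):
    """Intersection-over-union of two tissue foreground masks (True = foreground)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: treated as no evidence of alignment (assumption)
    return float(np.logical_and(a, b).sum() / union)
```

Because the score only looks at foreground positions, large blank background areas cannot inflate it, which matches the stated purpose of restricting the evaluation to tissue.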

[0011] More preferably, the calculation of the pixel correlation includes: dividing the tissue foreground mask corresponding to the candidate registration result and the tissue foreground mask corresponding to the reference slice into at least two local regions, calculating the local pixel correlation of each local region, and aggregating all local pixel correlations by weighting to obtain a global pixel correlation, which is used as the pixel correlation between the two tissue foreground masks; the weights of the weighted aggregation are determined by the proportion of foreground pixels in the corresponding local region or by the proportion of the local region's area.
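A sketch of the local, weighted variant, assuming a regular rectangular grid of regions (the source only requires at least two regions). "Foreground" weighting uses each region's count of foreground pixels in either mask, i.e. its share of the total foreground after normalization; "area" weighting uses the region size. Skipping regions with no foreground in either mask is an assumption, since IoU is undefined there.

```python
import numpy as np


def local_weighted_correlation(mask_a, mask_b, grid=(2, 2), weighting="foreground"):
    """Weighted mean of per-region IoU over a grid of local regions."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    h, w = a.shape
    ys = np.linspace(0, h, grid[0] + 1).astype(int)  # region boundaries along y
    xs = np.linspace(0, w, grid[1] + 1).astype(int)  # region boundaries along x
    scores, weights = [], []
    for i in range(grid[0]):
        for j in range(grid[1]):
            ra = a[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            rb = b[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            union = np.logical_or(ra, rb).sum()
            if union == 0:
                continue  # no tissue in this region: IoU undefined, region skipped
            scores.append(np.logical_and(ra, rb).sum() / union)
            weights.append(union if weighting == "foreground" else ra.size)
    if not weights:
        return 0.0
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights / weights.sum(), scores))
```

Compared with a single global IoU, a region with a local defect (a tear or fold) lowers only its own term, so its influence on the final score is bounded by its weight.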

[0012] More preferably, the determination of the preset threshold includes: for the target organ tissue type or section staining method, selecting a preset number of benchmark section samples or candidate registration result samples to calculate the corresponding pixel correlation, and determining the preset threshold based on the statistical distribution characteristics of the pixel correlation; when the organ tissue type or section staining method changes, the preset threshold is re-determined based on the changed samples.
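The source says only that the threshold follows from "statistical distribution characteristics" of benchmark correlations. The sketch below assumes two common choices (a low percentile, or mean minus k standard deviations, clipped to [0, 1]) and models re-determination on a change of tissue type or stain as a cache keyed by that pair. All names and default values are hypothetical.

```python
import numpy as np


def threshold_from_samples(correlations, method="percentile", q=5.0, k=2.0):
    """Derive a pixel-correlation threshold from benchmark correlation values."""
    values = np.asarray(correlations, dtype=float)
    if values.size == 0:
        raise ValueError("need at least one benchmark correlation value")
    if method == "percentile":
        t = np.percentile(values, q)  # accept the top (100 - q)% of benchmark pairs
    elif method == "mean_std":
        t = values.mean() - k * values.std()
    else:
        raise ValueError(f"unknown method: {method}")
    return float(np.clip(t, 0.0, 1.0))


_threshold_cache = {}


def threshold_for(tissue_type, stain, benchmark_correlations):
    """Return the threshold for (tissue_type, stain); a new pair triggers re-estimation."""
    key = (tissue_type, stain)
    if key not in _threshold_cache:
        _threshold_cache[key] = threshold_from_samples(benchmark_correlations)
    return _threshold_cache[key]
```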

[0013] More preferably, the method further includes establishing a reference confidence for each reference slice, wherein the reference confidence is determined by the maximum pixel correlation obtained when generating the registration result of the reference slice itself, or by statistical characteristics of the pixel correlations obtained when the reference slice participates in registering subsequent slices; when the differences in pixel correlation among the retained candidate registration results are less than a preset difference, or when several candidates tie for the maximum, an evaluation value is calculated from the pixel correlation of each candidate registration result and the reference confidence of its reference slice, and the registration result of the slice to be registered is determined by the evaluation value.
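The source does not give a formula for the evaluation value. This sketch assumes a weighted sum `alpha * correlation + (1 - alpha) * reference_confidence`, applied only to candidates within `min_gap` of the best correlation; `Candidate`, `alpha`, and `min_gap` are hypothetical names and defaults.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ref_index: int         # index of the reference slice used for this candidate
    correlation: float     # pixel correlation (IoU) with that reference
    ref_confidence: float  # reference confidence of that reference slice


def choose_candidate(candidates, threshold, min_gap=0.01, alpha=0.5):
    """Threshold, then pick the best correlation; break near-ties with an evaluation value."""
    kept = [c for c in candidates if c.correlation >= threshold]
    if not kept:
        return None  # no qualified candidate: the caller falls back to compensation
    ranked = sorted(kept, key=lambda c: c.correlation, reverse=True)
    best = ranked[0].correlation
    close = [c for c in ranked if best - c.correlation < min_gap]
    if len(close) == 1:
        return close[0]  # clear winner by pixel correlation alone
    # Near-tie: combine correlation with the reliability of each candidate's reference.
    return max(close, key=lambda c: alpha * c.correlation + (1 - alpha) * c.ref_confidence)
```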

[0014] More preferably, the method further includes: when no candidate registration result has a pixel correlation not less than the preset threshold, performing interpolation based on the geometric transformation parameters of the registered slices adjacent to the slice to be registered to obtain interpolated geometric transformation parameters, geometrically transforming the slice to be registered with the interpolated parameters to generate a compensated registration result, and using the compensated registration result as the registration result of the slice to be registered in the registered slice sequence; when no qualified candidate registration result is obtained for a preset number of consecutive slices, stopping registration and outputting an abnormality indication message.
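One way to realize the compensation and abort rule, under loudly stated assumptions: the geometric parameters are 3x3 homographies, the two nearest registered neighbours are blended element-wise after normalizing `H[2, 2]` to 1 (a rough approximation that is reasonable only for near-affine transforms), and a one-sided neighbourhood simply reuses the nearest transform. `interpolate_homography` and `FailureMonitor` are hypothetical names.

```python
import numpy as np


def interpolate_homography(index, neighbours):
    """neighbours: list of (slice_index, 3x3 homography) for already-registered slices."""
    lower = [(k, H) for k, H in neighbours if k < index]
    upper = [(k, H) for k, H in neighbours if k > index]
    if not lower and not upper:
        raise ValueError("no registered neighbour to interpolate from")
    if not upper:  # only earlier slices available: reuse the nearest one (assumption)
        return np.array(max(lower, key=lambda t: t[0])[1], dtype=float)
    if not lower:
        return np.array(min(upper, key=lambda t: t[0])[1], dtype=float)
    k0, H0 = max(lower, key=lambda t: t[0])
    k1, H1 = min(upper, key=lambda t: t[0])
    H0 = np.asarray(H0, dtype=float)
    H1 = np.asarray(H1, dtype=float)
    H0, H1 = H0 / H0[2, 2], H1 / H1[2, 2]  # fix the projective scale before blending
    t = (index - k0) / (k1 - k0)
    return (1 - t) * H0 + t * H1


class FailureMonitor:
    """Counts consecutive slices without a qualified candidate; aborts at max_consecutive."""

    def __init__(self, max_consecutive=3):
        self.max_consecutive = max_consecutive
        self.count = 0

    def update(self, had_qualified_candidate):
        self.count = 0 if had_qualified_candidate else self.count + 1
        if self.count >= self.max_consecutive:
            raise RuntimeError(
                f"registration stopped: {self.count} consecutive slices without a qualified candidate")
```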

[0015] More preferably, the semantic segmentation performs pixel-level classification on the registered slice sequence through an image segmentation neural network to output a target structure mask sequence; the training data of the image segmentation neural network is constructed using data augmentation, which includes cropping image blocks from the slice images, applying random augmentation to the image blocks and pasting them onto a blank canvas of a preset size to generate augmented training images, and cropping the augmented training images into training image blocks used as training data; the voxelization stacking includes mapping the target structure mask sequence into a voxel grid based on the interlayer spacing and outputting the three-dimensional reconstruction result.
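The crop, augment, paste, and re-crop augmentation can be sketched as follows. Assumptions not stated in the source: the random augmentation is a random 90-degree rotation plus an optional horizontal flip, label masks are transformed and pasted identically (needed for segmentation training), and all sizes are placeholder values. `augment_patches` is a hypothetical name.

```python
import numpy as np


def augment_patches(image, label, rng, n_blocks=4, block=64, canvas=256, patch=128):
    """Paste randomly transformed blocks onto a blank canvas, then tile it into patches."""
    h, w = label.shape
    canvas_img = np.zeros((canvas, canvas) + image.shape[2:], dtype=image.dtype)
    canvas_lbl = np.zeros((canvas, canvas), dtype=label.dtype)
    for _ in range(n_blocks):
        y = rng.integers(0, h - block + 1)
        x = rng.integers(0, w - block + 1)
        img_b = image[y:y + block, x:x + block]
        lbl_b = label[y:y + block, x:x + block]
        k = rng.integers(0, 4)  # random rotation by k * 90 degrees
        img_b, lbl_b = np.rot90(img_b, k, axes=(0, 1)), np.rot90(lbl_b, k)
        if rng.random() < 0.5:  # random horizontal flip
            img_b, lbl_b = img_b[:, ::-1], lbl_b[:, ::-1]
        py = rng.integers(0, canvas - block + 1)
        px = rng.integers(0, canvas - block + 1)
        canvas_img[py:py + block, px:px + block] = img_b  # later blocks may overlap earlier ones
        canvas_lbl[py:py + block, px:px + block] = lbl_b
    patches = []  # non-overlapping training patches of the augmented canvas
    for y in range(0, canvas - patch + 1, patch):
        for x in range(0, canvas - patch + 1, patch):
            patches.append((canvas_img[y:y + patch, x:x + patch],
                            canvas_lbl[y:y + patch, x:x + patch]))
    return patches
```

Applying the same transform to image and label keeps every pasted pixel's class consistent, and the blank canvas exposes the network to tissue fragments against background in varied arrangements.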

[0016] Compared with the prior art, the method provided by the present invention has at least the following beneficial effects: 1. By employing a collaborative mechanism of multi-reference candidate registration, pixel correlation evaluation based on tissue foreground mask, threshold screening, and maximum correlation selection, the registration stability can be improved and the cumulative diffusion of misregistration in the sequence can be reduced when there are deformations, brightness differences, or local defects in continuous slices, thereby enhancing the geometric consistency and reliability of 3D reconstruction.

[0017] 2. By statistically determining the pixel correlation threshold using benchmark slice samples or candidate registration result samples, and resetting the threshold when organ tissue type or staining method changes, the adaptability of the method to different data types can be enhanced, and the probability of misjudging normal structural differences as registration errors or missing obvious misalignments can be reduced.

[0018] 3. Semantic segmentation and voxel stacking are performed on the registered slice sequence to output three-dimensional results, so that target structures enter the three-dimensional reconstruction process stably in the form of mask sequences, improving the visualization of reconstruction results and their usability for subsequent analysis; at the same time, optimizations such as weighted aggregation of local pixel correlations help keep the evaluation robust when local anomalies are present.

Attached Figure Description

[0019] Figure 1 is a flowchart of an AI-based three-dimensional reconstruction method for human organ tissue slices according to an embodiment of the present invention;
Figure 2 is a schematic diagram of three-dimensional reconstruction of biological tissue slice images in an embodiment of the present invention;
Figure 3 is a schematic diagram of the image registration process in an embodiment of the present invention;
Figure 4 shows image registration results for the hepatopancreatic ampulla and testis samples in an embodiment of the present invention;
Figure 5 shows an example of the processing result of the data augmentation method in an embodiment of the present invention;
Figure 6 is a schematic diagram of the SegNeXt image segmentation network and the structure of its multi-scale convolutional attention module;
Figure 7 shows segmentation results for the hepatopancreatic ampulla and testis samples in an embodiment of the present invention;
Figure 8 shows visualization results of the three-dimensional models of the hepatopancreatic ampulla and testis samples in an embodiment of the present invention.

Detailed Implementation

[0020] The present invention will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments described herein are only for explaining the present invention and are not intended to limit the scope of protection of the present invention. Those skilled in the art to which this invention pertains can make various modifications or substitutions to the technical solutions of the present invention without departing from the concept of the present invention, and all such modifications or substitutions should fall within the scope of protection of the present invention.

[0021] In this invention, the sample tissue consists of serial sections stained by conventional immunohistochemistry or hematoxylin-eosin staining. A feature matching neural network extracts feature points from two section images (the section to be registered and the reference section), the homography matrix between the images is calculated from the feature points, and the matrix is applied to the section to be registered, adjusting its position and shape so that the two images are spatially aligned. The roles then advance along the series: the newly registered image serves as a reference and the next unregistered image becomes the section to be registered, and the registration steps are repeated until the entire serial section series is aligned. In parallel, an image segmentation neural network is trained to semantically segment multiple fine target structures in the tissue sections. The trained network is applied to the registered serial section sequence, and three-dimensional reconstruction finally yields a multi-label three-dimensional model that supports both display of the whole tissue and display of individual structures. Specifically, as shown in Figure 1 and Figure 2, this embodiment provides an AI three-dimensional reconstruction method for human organ tissue sections, the method including: acquiring and preprocessing a consecutive slice image sequence of the same organ tissue to generate tissue foreground masks; for each slice to be registered, using the registered slices in a predetermined number of preceding layers as reference slices, obtaining correspondences through a feature matching neural network, estimating the homography matrix, and geometrically transforming the slice to be registered to generate candidate registration results; calculating the pixel correlation between the tissue foreground mask of each candidate registration result and that of the corresponding reference slice; retaining candidate registration results whose pixel correlation is not less than a predetermined threshold, selecting the one with the highest pixel correlation as the registration result of the slice to be registered, and forming a registered slice sequence from the registration results of all slices to be registered; and semantically segmenting the registered slice sequence to obtain a target structure mask sequence, and stacking the mask sequence into voxels according to the layer spacing to output the three-dimensional reconstruction result.

[0022] In the above embodiments, the consecutive slice image sequence refers to a set of slice images obtained by continuously slicing and scanning the same organ tissue sample; the tissue foreground mask distinguishes the tissue foreground region from the background region in a slice image; a candidate registration result is the registered image obtained by geometrically transforming the slice to be registered relative to a reference slice; the pixel correlation characterizes how consistent the candidate registration result and the reference slice are in the tissue foreground region; and the interlayer spacing can be determined from the slice thickness or taken from acquisition records.

[0023] The slice image sequence is obtained by scanning consecutive sections of a tissue sample from the same organ. Because slice preparation and scanning easily introduce brightness differences, background fluctuations, and local noise, the slice images are preprocessed before registration to make them more consistent in resolution, intensity distribution, and background conditions, providing more reliable input for subsequent feature extraction and mask calculation. After preprocessing, a tissue foreground mask is generated to identify tissue regions and non-tissue background regions in each slice image. The mask ensures that subsequent quality assessment focuses on the tissue itself, so that blank background regions do not distort the similarity calculation.

[0024] For each slice to be registered, a preset number of registered slices from its preceding layers are selected as reference slices. This multi-reference design is intended to avoid occasional misregistration caused by relying on a single reference. The same slice to be registered is matched with and transformed toward each reference slice to obtain multiple candidate registration results, and the best candidate is then selected based on pixel correlation, forming a stable loop of candidate generation, screening, and selection.

[0025] When generating candidate registration results, the slice to be registered and a reference slice are input into the feature matching neural network to obtain a set of feature correspondences between them. The homography matrix between the two is estimated from these correspondences and applied to the slice to be registered, which is geometrically transformed and resampled to produce a candidate registration result aligned with the reference slice. Repeating this process for each reference slice yields a set of candidate registration results.
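The homography and resampling steps can be sketched once correspondences are available. The source does not name the feature matching network, so the matched point pairs are simply taken as input here. The estimator is the plain direct linear transform (DLT) solved by SVD, with no outlier rejection; a practical pipeline would add a robust estimator such as RANSAC (for example OpenCV's `cv2.findHomography` with `cv2.RANSAC`). Resampling uses inverse mapping with nearest-neighbour lookup. Function names are hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """DLT from >= 4 correspondences; H maps src (slice to register) to dst (reference)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 4:
        raise ValueError("need at least 4 correspondences")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]


def warp_to_reference(moving, H, out_shape):
    """Resample `moving` into the reference frame: out(p) = moving(H^-1 p), nearest neighbour."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < moving.shape[1]) & (sy >= 0) & (sy < moving.shape[0])
    out = np.zeros((h * w,) + moving.shape[2:], dtype=moving.dtype)
    out[valid] = moving[sy[valid], sx[valid]]  # pixels mapping outside stay 0
    return out.reshape((h, w) + moving.shape[2:])
```

Inverse mapping guarantees every output pixel receives exactly one value, which avoids the holes that forward-pushing pixels through `H` would leave.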

[0026] Subsequently, pixel correlation is calculated from the tissue foreground mask of the candidate registration result and the tissue foreground mask of the reference slice to quantify how well the aligned tissue regions overlap. In a specific implementation, the two masks are treated as binary matrices; the number of pixel positions at which both masks are foreground is counted as the intersection pixel count, and the number of positions at which at least one mask is foreground is counted as the union pixel count; the pixel correlation is the ratio of the intersection pixel count to the union pixel count. The higher the pixel correlation, the more closely the tissue contour of the candidate registration result matches that of the reference slice.

[0027] In the candidate screening and selection stage, the candidate registration results are first thresholded, retaining those with pixel correlation not less than a preset threshold to exclude candidates with obvious misalignment or poor matching. Then, the candidate with the highest pixel correlation among those retained is selected as the final registration result for that slice. This result is added to the registered slice sequence, and the same process of generating candidates from multiple references, calculating correlation, thresholding, and selecting the candidate with the highest correlation is repeated for the next slice to be registered until the entire slice sequence is registered, forming a registered slice sequence. This iterative approach maintains relatively stable consistency constraints along the sequence and reduces the risk of misregistration propagating to subsequent layers.
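The sequential loop described above (up to three preceding references, IoU scoring, threshold, maximum) can be sketched as follows. `register_pair(moving, moving_mask, reference)` is a hypothetical hook standing in for "feature matching network, homography, warp" and must return the warped image and warped mask. Treating the first slice as the fixed coordinate frame is an assumption, and the failure branch here simply keeps the slice unchanged and records its index; the source instead describes an interpolation-based compensation for that case.

```python
import numpy as np


def iou(a, b):
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def register_sequence(slices, masks, register_pair, threshold=0.8, n_refs=3):
    """Sequential multi-reference registration with IoU thresholding and best-candidate selection."""
    registered, registered_masks = [slices[0]], [masks[0]]  # first slice anchors the frame
    failed = []
    for i in range(1, len(slices)):
        best = None
        for r in range(max(0, i - n_refs), i):  # up to n_refs preceding registered slices
            warped, warped_mask = register_pair(slices[i], masks[i], registered[r])
            score = iou(warped_mask, registered_masks[r])
            if score >= threshold and (best is None or score > best[0]):
                best = (score, warped, warped_mask)
        if best is None:
            # No qualified candidate: keep the slice unchanged in this sketch and record it.
            failed.append(i)
            registered.append(slices[i])
            registered_masks.append(masks[i])
        else:
            registered.append(best[1])
            registered_masks.append(best[2])
    return registered, registered_masks, failed
```

Because each slice can be matched against any of several predecessors, a single poorly registered slice does not automatically become the only reference for the next one, which is the mechanism the text credits with limiting error propagation.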

[0028] After sequence registration is completed, semantic segmentation is performed on the registered slice sequence to obtain the target structure mask sequence. Semantic segmentation takes the registered slice image as input and outputs the corresponding pixel-level classification result, thereby obtaining the mask representation of the target structure on each slice layer; when the target structure contains multiple categories, multi-category masks can be obtained to support multi-label representation of subsequent 3D results.

[0029] Finally, the target structure mask sequence is stacked in voxel form according to the layer spacing. Each mask layer is mapped to its corresponding spatial position and accumulated along the Z-axis according to the spatial interval between adjacent slices, yielding the reconstruction result as a three-dimensional voxel volume. For multi-category masks, the label information of different structures can be retained in the voxel space, so that the overall tissue structure can be displayed in 3D while individual structures can be displayed and analyzed by category.
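A sketch of label-preserving voxel stacking under stated assumptions: masks are integer label images (0 = background, 1..K = structures), each slice has a physical z position in micrometres, and each slice fills the voxel layers from its own z up to the next slice's z (piecewise-constant filling; the source does not specify how gaps between slices are filled). `stack_label_volume` is a hypothetical name.

```python
import numpy as np


def stack_label_volume(label_masks, z_positions_um, voxel_z_um):
    """Map per-slice label masks into a (Z, H, W) voxel grid using physical slice positions."""
    masks = [np.asarray(m) for m in label_masks]
    z = np.asarray(z_positions_um, dtype=float)
    n_layers = int(round((z[-1] - z[0]) / voxel_z_um)) + 1
    volume = np.zeros((n_layers,) + masks[0].shape, dtype=np.uint16)
    for k, m in enumerate(masks):
        start = int(round((z[k] - z[0]) / voxel_z_um))
        stop = int(round((z[k + 1] - z[0]) / voxel_z_um)) if k + 1 < len(masks) else n_layers
        volume[start:max(stop, start + 1)] = m  # slice k occupies layers up to the next slice
    return volume
```

Because labels survive stacking, a single structure can be extracted for display with a comparison such as `volume == 2`, while the full volume shows the whole tissue.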

[0030] In one embodiment of the present invention, the slice sequence images and their spatial scale parameters used for subsequent sequence processing and three-dimensional reconstruction are obtained in the following manner, so that the hierarchical relationship and spatial interval of the image data have a consistent and traceable correspondence. Specifically, obtaining continuous slice sequence images of the same organ tissue includes: performing continuous slicing of the same organ tissue, staining and scanning each slice to obtain slice sequence images, establishing a sequence identifier for each slice sequence image, obtaining the slice thickness of each slice, and determining the interslice spacing based on the slice thickness.

[0031] Based on the overall scheme described above, the key to continuous slicing lies in keeping slice orientation and order consistent, so that tissue structures remain naturally continuous between adjacent slices. Only then can the resulting slice set be regarded as a continuous observation of the same tissue at adjacent levels. In the staining and scanning stage, each slice is routinely stained and digitally scanned to obtain slice sequence images with stable resolution that are suitable for computational processing. For example, sample slices are stained using methods such as HE, immunohistochemistry, and immunofluorescence, and scanned with a digital pathology slide scanner to obtain the sequence of scanned images. This facilitates subsequent computational processing and alignment of results within a unified image space.

[0032] The sequence identifier fixes the order of the slices, preventing image files from becoming disordered during storage, transmission, or batch import. In specific implementations, the order can be encoded in file naming rules, metadata fields, or external index tables mapped one-to-one to the images, so that the same order is restored whenever the sequence is reloaded.

[0033] The slice thickness establishes the true spatial scale. Thickness can be obtained from slide preparation records, equipment settings, or related records, and is stored in association with the corresponding sequence identifier. For example, the slice thickness can be set within the range of 2 to 20 micrometers. When determining the interlayer spacing from the slice thickness, if the thickness is stable, it can be used directly as the spatial interval between adjacent layers; if the thickness fluctuates or layers are missing from the sequence, the interlayer spacing can be set from the recorded thickness information so that it reflects the actual span between layers. During subsequent 3D voxel stacking, each layer result is then mapped to its corresponding spatial position according to this interlayer spacing, ensuring both hierarchical continuity and the consistency and interpretability of the 3D results at true scale.
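A small sketch of deriving interlayer spacing from recorded thicknesses when sections are missing. It assumes `thickness_um[i]` is recorded for every cut section (including lost ones), `present[i]` flags whether section i was scanned, and each section's z position is the summed thickness of all sections cut before it. `interlayer_spacing` is a hypothetical name.

```python
import numpy as np


def interlayer_spacing(thickness_um, present):
    """Return z positions (um) of the scanned sections and the spacing between consecutive ones."""
    t = np.asarray(thickness_um, dtype=float)
    z_all = np.concatenate([[0.0], np.cumsum(t[:-1])])  # start z of every cut section
    z = z_all[np.asarray(present, dtype=bool)]
    return z, np.diff(z)
```

For example, thicknesses of 4, 4, 5, and 4 micrometres with the third section lost give scanned positions 0, 4, and 13 micrometres. The spacing across the gap is therefore 9 micrometres rather than a nominal 4, which keeps the stacked volume at true scale.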

[0034] Through the above data acquisition methods, two types of basic inputs are ultimately formed: one is a sequence of consecutive slice images with sequential identifiers, and the other is the interlayer spacing parameters corresponding to the sequence; together, they provide stable data and spatial references for subsequent sequence registration, structure extraction, and 3D reconstruction.

[0035] In one embodiment of the present invention, to improve the stability and consistency of continuous slice image sequences in subsequent registration, similarity evaluation, and structure extraction processes, the following preprocessing scheme is performed on the slice image sequences. Specifically, the preprocessing includes at least one of the following: downsampling, denoising, brightness and color normalization, background suppression, cropping to obtain a target image containing the tissue region, and format conversion. This preprocessing scheme makes different slices as consistent as possible in terms of resolution, noise level, intensity distribution, and background conditions, thereby reducing the interference of differences introduced during slide preparation and scanning on subsequent processing.

[0036] Downsampling is used to reduce the pixel size and computational burden of a single slice while ensuring that the main information of tissue morphology can be expressed, and to enable subsequent processing of sequential images at a uniform resolution. The downsampling factor can be set in combination with the scanning resolution and the scale of the target structure to balance detail preservation and processing efficiency. Denoising is used to suppress the influence of scanning noise, granular background, or local artifacts on feature extraction and mask generation. Denoising methods can use conventional filtering or smoothing methods based on image statistical features to reduce noise while preserving tissue edges and texture structure as much as possible.
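As an illustration of the two operations, the sketch below uses block-averaging for integer-factor downsampling and a median filter as the "conventional filtering" choice (the source does not name a specific filter). The median filter is written for single-channel images and replicates edge pixels. Function names are hypothetical.

```python
import numpy as np


def downsample(img, factor):
    """Block-average downsampling by an integer factor; trailing rows/cols that do not fill a block are dropped."""
    h2, w2 = img.shape[0] // factor, img.shape[1] // factor
    img = img[:h2 * factor, :w2 * factor]
    return img.reshape(h2, factor, w2, factor, *img.shape[2:]).mean(axis=(1, 3))


def median_denoise(img, radius=1):
    """Median filter over a (2r+1)x(2r+1) window with edge replication (grayscale only)."""
    k = 2 * radius + 1
    p = np.pad(img, radius, mode="edge")
    h, w = img.shape
    windows = np.stack([p[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)])
    return np.median(windows, axis=0)
```

A median filter removes isolated specks while leaving a straight intensity step unchanged, which fits the stated goal of reducing noise while preserving tissue edges.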

[0037] Brightness and color normalization are used to eliminate overall intensity drift and hue deviation between different sections caused by differences in staining intensity, color development, or changes in lighting conditions, making the overall contrast and color distribution of the sequence images more consistent. Normalization can be achieved by stretching the brightness range, linearly correcting the color channels, or establishing a baseline distribution using reference sections, in order to reduce the occurrence of the same tissue appearing in different colors on different sections.
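As one illustration of linear color-channel correction, the sketch below stretches each channel between two percentiles of its own intensity distribution. It assumes an H x W x C numeric array. The percentile defaults and the function name `normalize_channels` are illustrative choices, not values taken from this method. Reference-section-based normalization, which is also mentioned above, would instead fix `lo` and `hi` from a chosen reference section.

```python
import numpy as np

def normalize_channels(img, low_pct=1.0, high_pct=99.0):
    """Linearly stretch each color channel of an H x W x C image to [0, 1]."""
    img = img.astype(np.float64)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        lo, hi = np.percentile(img[..., c], [low_pct, high_pct])
        scale = hi - lo if hi > lo else 1.0  # avoid division by zero on flat channels
        out[..., c] = np.clip((img[..., c] - lo) / scale, 0.0, 1.0)
    return out
```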

[0038] Background suppression is used to reduce the impact of blank background, slide edges, imprints, or non-tissue regions on subsequent processing. In practice, the response of the background region can be suppressed or smoothed based on its color and brightness characteristics. This makes the tissue region stand out more clearly in the image and improves the stability of subsequent tissue foreground extraction. Cropping to obtain a target image containing the tissue region further reduces the proportion of irrelevant area and avoids unnecessary computation when the background is large. It also helps subsequent registration and similarity evaluation focus on the main tissue. The cropping region can be determined from the approximate outer boundary of the tissue, a coarse foreground localization result, or a preset boundary strategy.
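Cropping from a coarse foreground localization result can be sketched as a padded bounding box. The function `crop_to_tissue` and the default `margin` are hypothetical. The returned box is kept so that results computed on the cropped image can later be mapped back to the original coordinates.

```python
import numpy as np

def crop_to_tissue(img, coarse_mask, margin=10):
    """Crop img to the bounding box of coarse_mask, padded by `margin` pixels.

    Returns the cropped image and the box (r0, r1, c0, c1) in original coordinates.
    """
    rows = np.flatnonzero(coarse_mask.any(axis=1))
    cols = np.flatnonzero(coarse_mask.any(axis=0))
    if rows.size == 0:  # no tissue detected: keep the whole image
        return img, (0, img.shape[0], 0, img.shape[1])
    r0 = max(rows[0] - margin, 0)
    r1 = min(rows[-1] + 1 + margin, img.shape[0])
    c0 = max(cols[0] - margin, 0)
    c1 = min(cols[-1] + 1 + margin, img.shape[1])
    return img[r0:r1, c0:c1], (r0, r1, c0, c1)
```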

[0039] Format conversion is used to unify the storage and processing format of image sequences. For example, images from different sources are converted to the same bit depth, number of channels, or compression method, so that the data structure stays consistent during batch processing. By applying one or more of the above preprocessing methods, the slice image sequence is converted into an input format better suited to computational processing before it enters the subsequent core steps. This improves the robustness and consistency of those steps.

[0040] In a specific implementation, continuous tissue slices of the hepatopancreatic ampulla (ampulla of Vater) were used as test samples. First, for downsampling, the required image scaling factor is calculated from the physical resolution of the original image and that of the target image. Second, the image is rescaled using nearest-neighbor interpolation. At the same time, the continuous tissue images are format-converted, for example from SVS format to JPG or TIFF format. Finally, nerves, smooth muscle, arterioles, venules, blank areas, and part of the pancreatic glandular tissue are manually and carefully annotated in some of the continuous slice images. These annotations are used for subsequent AI model training, image recognition, and other tasks.
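The scaling-factor calculation and nearest-neighbor rescaling can be sketched in NumPy. This sketch assumes that physical resolution is expressed in micrometers per pixel; the source does not state the unit. For example, going from 0.25 µm/px to 1.0 µm/px gives a factor of 0.25. The function name `downsample_nearest` is hypothetical. Each output pixel center is mapped back to the source pixel that contains it.

```python
import numpy as np

def downsample_nearest(img, src_um_per_px, dst_um_per_px):
    """Nearest-neighbor rescale from a source to a target physical resolution."""
    scale = src_um_per_px / dst_um_per_px  # e.g. 0.25 / 1.0 -> 0.25
    h, w = img.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Map each output pixel center back into source coordinates and take the
    # source pixel that contains it (works for 2-D and H x W x C arrays).
    rows = np.minimum(((np.arange(new_h) + 0.5) / scale).astype(int), h - 1)
    cols = np.minimum(((np.arange(new_w) + 0.5) / scale).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]
```

Nearest-neighbor sampling copies existing pixel values instead of blending them. This is also why it is a safe choice when the same rescaling is applied to label images such as the manual annotations.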

[0041] In one embodiment of the present invention, to stably characterize the tissue region and background region during subsequent registration quality evaluation and structure extraction, the following tissue foreground mask generation scheme is adopted. Specifically, the generation of the tissue foreground mask includes: thresholding the preprocessed slice sequence image to obtain the tissue foreground region, and performing morphological processing on the tissue foreground region to remove isolated noise or fill holes, thereby obtaining the tissue foreground mask. By implementing the above process, each slice image corresponds to a foreground mask, thus forming a unified foreground representation at the sequence level that can be used for subsequent calculation and judgment.

[0042] Threshold segmentation is used to separate the main tissue from the background in a slice image. The threshold can be determined from the grayscale or brightness distribution of the image. For example, the segmentation threshold can be selected from the peak-valley relationship of the image histogram. When brightness is uneven across the image, an adaptive thresholding strategy can instead be applied to local regions, so that the tissue region is extracted consistently across different slices. Threshold segmentation yields an initial tissue foreground region. This region generally covers the main tissue but may still contain irregularities such as scattered noise, holes, or jagged boundaries.
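A histogram-based global threshold can be sketched with Otsu's method, used here as a swap-in for the peak-valley search described above. Both methods choose a split point of a roughly bimodal histogram; Otsu's method does so by maximizing the between-class variance. The sketch assumes an 8-bit grayscale bright-field image in which tissue is darker than the background. For fluorescence or inverted images the comparison would flip. Function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold for a uint8 grayscale image; class 0 is gray <= t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # class-0 probability for each split t
    mu = np.cumsum(prob * np.arange(256))    # class-0 cumulative first moment
    mu_t = mu[-1]                            # overall mean intensity
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / denom   # between-class variance
    sigma_b[denom == 0] = 0.0                # splits with an empty class are invalid
    return int(np.argmax(sigma_b))

def tissue_foreground(gray):
    # Assumption: bright-field slides, so tissue is darker than the background.
    return gray <= otsu_threshold(gray)
```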

[0043] Morphological processing is used to correct the shape and remove noise from the initial tissue foreground region, making the final mask closer to the real tissue contour and with better connectivity. Removing isolated noise points can be achieved by culling small connected regions; that is, scattered foreground regions with an area or number of pixels below a preset threshold are treated as noise and removed to avoid being mistakenly identified as part of the tissue structure during subsequent similarity evaluation. Filling holes is used to repair foreground defects within the tissue region caused by staining voids, scanning reflections, or local intensity anomalies. By filling internal void areas, the main tissue body presents a continuous foreground coverage, reducing unnecessary broken boundaries within the mask.
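The two morphological operations can be sketched with SciPy's `ndimage.label` and `ndimage.binary_fill_holes`. The area cutoff `min_area` is an illustrative value that would need tuning to the working resolution. The function name `clean_mask` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, min_area=500):
    """Remove small isolated foreground blobs, then fill internal holes."""
    labels, n = ndimage.label(mask)          # 4-connectivity by default in 2-D
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_area                 # components at least min_area pixels
    keep[0] = False                          # label 0 is the background
    cleaned = keep[labels]                   # look up each pixel's component
    return ndimage.binary_fill_holes(cleaned)
```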

[0044] At the sequence level, to maintain mask consistency, the same thresholding strategy and morphological processing rules can be applied to each slice, making the masks generated from different slices as uniform as possible in scale and boundary style. The final tissue foreground mask is represented in a binary manner to indicate whether each pixel position belongs to the tissue foreground. This is used to calculate the pixel correlation between the subsequent candidate registration results and the reference slice, and to provide basic spatial constraints for subsequent structure extraction and 3D reconstruction.

[0045] In one embodiment of the present invention, the following multi-reference candidate registration and selection scheme is adopted. Its purpose is to suppress accidental mismatches caused by relying on a single reference and to reduce the cumulative spread of errors along the sequence during sequence registration. Specifically, the preset number of preceding registered slices is at least three consecutive registered slices immediately preceding the slice to be registered. The slice to be registered generates one candidate registration result with each of these preceding slices as a reference slice. The pixel correlation between the tissue foreground mask of each candidate registration result and the tissue foreground mask of the corresponding reference slice is calculated. The candidate registration result with the highest pixel correlation is then selected as the registration result of the slice to be registered. When fewer than three registered slices precede the slice to be registered, the existing preceding registered slices are used as reference slices to generate candidate registration results. In this way, each slice to be registered obtains comparable candidates under multiple reference constraints, and selection uses a unified pixel correlation criterion, which improves the stability of sequence alignment.

[0046] The requirement of at least three consecutive registered slices defines the source and continuity of the reference set. The reference slices are taken from layers that precede the slice to be registered and have already been registered. This ensures strong adjacency and comparability in tissue structure between the reference slices and the slice to be registered. Because the layer span between these reference slices is small, tissue morphology usually changes smoothly, which helps produce more reliable candidate registration results. Using multiple reference slices can also partly offset the influence of local defects, staining fluctuations, or abnormal deformations in any single reference slice. In a specific implementation, after preprocessing, a deep learning neural network is used to register all consecutive slice sequences. Specifically, for each slice series, the middle n-th image ($\mathrm{image}_{n}$) is set as the reference image. The images immediately before and after it, the (n-1)-th image ($\mathrm{image}_{n-1}$) and the (n+1)-th image ($\mathrm{image}_{n+1}$), are set as moving images. These two images are input into the deep learning neural network, and each is registered and aligned to the n-th image. The remaining images are then input in turn and registered and aligned to their already-registered neighbors on the preceding and following sides.

[0047] In this embodiment, the registration process can employ the GIM method, a deep-learning-based self-training framework. The framework combines a base model with several complementary methods to filter outliers. It then generates pseudo-labels for widely separated frames through cross-frame propagation and uses these pseudo-labels to train the base model, which improves the performance of the deep learning image registration model. The GIM method currently provides several base models; among them, GIMroma uses the RoMa model as its base model. The RoMa matching process is as follows. For input images A and B, a frozen DINOv2 model first extracts coarse features, and a Transformer match decoder then predicts a warp field and a confidence map. In parallel, a VGG19 model extracts fine features from images A and B. These fine features, together with the predicted warp field and confidence map, are passed to a refiner to obtain a refined warp field and confidence map. Finally, high-resolution sparse keypoint matches are obtained by sampling matching points.

[0048] It should be noted that the above registration process, including the GIM self-training framework, the GIMroma method, and the RoMa matching and transformation estimation processes involved, consists of existing image registration techniques in this field. In this embodiment, these techniques can be used directly to obtain the correspondence between the slice to be registered and the reference slice and to estimate the geometric transformation. This embodiment does not limit the specific registration model or implementation used. Besides the methods described above, any existing registration method, or an equivalent alternative, may also be used if it can obtain feature correspondences between slices, estimate the geometric transformation, and output a registration result. The above content only illustrates optional implementation paths and does not constitute a limitation on this embodiment.

[0049] When generating candidate registration results, for the same slice to be registered, each reference slice is selected as the registration benchmark, and a registration operation is performed independently to obtain a candidate registration result aligned with that reference slice. Since the local morphology, defect location, and imaging differences of each reference slice may be different, the candidate registration results obtained for the same slice to be registered under different reference conditions often differ. These differences provide a basis for comparison in subsequent selection, so that the final result does not have to rely on the sole judgment of a single reference.

[0050] As shown in Figure 4, typical image registration examples are provided for two types of organ samples: the top four images are hepatopancreatic ampulla samples, and the bottom four images are testicular samples. The figure illustrates how the geometric transformation is estimated and alignment is completed based on feature correspondences. It thus provides a basis for subsequent registration quality evaluation and iterative sequence registration.

[0051] Specifically, the four images on the left of Figure 4 are registration point maps of the two types of organ samples. They show the keypoint correspondences between the reference slice and the slice to be registered, as obtained through the feature matching process. These keypoint correspondences can be output by a feature matching neural network. Matching point pairs with good spatial consistency are then selected to support the estimation of subsequent geometric transformation parameters, such as a homography. Since keypoints are usually distributed near tissue contours, textured areas, and structural boundaries, their coverage and number reflect the feasibility and stability of the registration. When keypoint coverage is sufficient and the correspondences are consistent, the geometric transformation estimate is more reliable. When local artifacts such as wrinkles, tears, or blemishes are present, spatial consistency screening of the keypoints reduces the interference of erroneous matches with the transformation estimate, thereby reducing the risk of misregistration.
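The spatial-consistency screening and homography estimation can be sketched with a direct linear transform (DLT) inside a basic RANSAC loop. RANSAC is an assumption here, since the text only requires selecting matches with good spatial consistency. The sketch takes matched keypoints as N x 2 arrays, for example those sampled by a matcher such as RoMa. It omits the usual coordinate normalization for brevity, and all function names are hypothetical. In practice, OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, 3.0)` and `cv2.warpPerspective` provide the same kind of estimate and the image warp.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: 3x3 H with dst ~ H @ [x, y, 1] from >= 4 point pairs (N x 2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)          # null vector = homography up to scale
    return h / h[2, 2]

def project(h, pts):
    """Apply homography h to N x 2 points."""
    p = np.c_[pts, np.ones(len(pts))] @ h.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, thresh=3.0, iters=500, seed=0):
    """Keep the largest spatially consistent subset of matches, then refit on it."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal sample
        h = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = err < thresh                         # reprojection error in pixels
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```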

[0052] The four images on the right of Figure 4 show the registration results of the two types of organ samples, that is, the results obtained after applying the estimated geometric transformation to the slices to be registered. After registration, the differences in overall position, scale, and shape between the slices to be registered and the reference slices are corrected. The outer tissue contours and the boundaries of the main structures tend to coincide spatially, indicating that the estimated geometric transformation effectively achieves cross-layer alignment. On the one hand, this alignment provides a consistent spatial coordinate basis for the subsequent pixel correlation evaluation based on tissue foreground masks, so that the overlap evaluation focuses on the main tissue region. On the other hand, it provides a stable starting point for iterative sequence registration and reduces the propagation of single-layer registration errors to subsequent layers. This improves the structural continuity and consistency of the registered slice sequence when it enters the segmentation and 3D stacking stages.

[0053] In this embodiment, pixel correlation is used to uniformly measure the alignment quality of different candidate registration results. Specifically, for each candidate registration result, its corresponding tissue foreground mask is extracted and compared with the tissue foreground mask of the reference slice used to generate the candidate to calculate the pixel correlation of that candidate. By comparing using the tissue foreground mask as a carrier, the evaluation focuses on the degree of overlap between the tissue outline and the main body region, thereby reducing the influence of background noise and non-tissue regions on the evaluation. A higher pixel correlation generally means that the candidate registration result is more consistent with the reference slice in the spatial position and shape of the tissue body.

[0054] In the selection stage, the candidate with the highest pixel correlation among all candidate registration results is selected as the registration result of the slice to be registered. This selection automatically favors the candidate with better tissue contour overlap, which improves the stability of alignment quality. The selected registration result is then added to the registered slice sequence so that it can serve as one of the reference slices in subsequent iterations, forming a progressive sequence registration process.

[0055] When the slice to be registered is at the beginning of a sequence or when there are fewer than three registered slices preceding it due to missing layers, the reference set is adaptively reduced based on the existing registered slices. Specifically, if one or two registered slices exist, only the currently available registered slices are used to generate the corresponding number of candidate registration results, and selection is performed. When the slice is at the very beginning of a sequence and there are no registered slices preceding it, the first slice of the sequence is determined as the initial reference slice and directly included in the registered slice set. This allows subsequent slices to use the initial reference slice as a reference to generate candidate registration results and perform selection. This approach maintains the same candidate generation and selection logic even when the number of references is insufficient or there are no initial references, preventing process interruptions at the beginning of the sequence or in areas with partial gaps, thus ensuring the continuity and feasibility of the sequence registration process.
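The progressive selection loop, including the reduced reference set at the start of the sequence, can be sketched as follows. `register` and `correlation` are hypothetical callbacks. They stand in for the feature matching, homography, and warping step and for the mask overlap measure. The threshold test follows the retention rule in the abstract: candidates not below the threshold are kept, and the one with the largest correlation is taken. A layer with no retained candidate is discarded, as described in [0060]. For simplicity, the sketch processes the sequence in one direction only.

```python
def register_sequence(slices, masks, register, correlation, threshold, n_refs=3):
    """Progressive multi-reference registration (illustrative sketch).

    slices, masks: images and tissue foreground masks in sequence order.
    register(moving, moving_mask, ref, ref_mask) -> (warped, warped_mask):
        hypothetical callback for matching + homography + warping.
    correlation(mask_a, mask_b) -> float: pixel correlation, e.g. IoU.
    """
    done = [(slices[0], masks[0])]   # first slice is the initial reference
    for img, mask in zip(slices[1:], masks[1:]):
        best, best_r = None, -1.0
        # Up to n_refs preceding registered slices; fewer at the sequence start.
        for ref_img, ref_mask in done[-n_refs:]:
            warped, warped_mask = register(img, mask, ref_img, ref_mask)
            r = correlation(warped_mask, ref_mask)
            if r >= threshold and r > best_r:   # retain, then keep the maximum
                best, best_r = (warped, warped_mask), r
        if best is None:
            continue   # no candidate reached the threshold: discard this layer
        done.append(best)   # becomes a reference for later slices
    return done
```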

[0056] In one embodiment of the present invention, the following pixel correlation calculation scheme is adopted to provide a computable, comparable, tissue-based quantitative evaluation of the alignment quality between candidate registration results and reference slices. Specifically, the tissue foreground mask of the candidate registration result and the tissue foreground mask of the reference slice are both converted into binary representations. The pixel correlation is then obtained from the overlap of these binary representations at each pixel position. With this scheme, the pixel correlation directly reflects the degree of foreground overlap of the two tissue foreground masks after spatial alignment. This provides a unified basis for retaining and selecting candidate results.

[0057] The tissue foreground mask can be generated as described above, used to identify whether each pixel location belongs to a tissue region or a background region. Before calculating pixel correlation, the two tissue foreground masks are converted into binary matrices, so that the binary matrices can represent whether the corresponding pixel location belongs to the tissue foreground using a unified rule; for example, locations belonging to the tissue foreground are recorded as foreground, and the remaining locations are recorded as background. Through this binary representation, subsequent statistical operations can be performed directly at the pixel level without relying on the brightness, color, or other intensity information of the original image, thereby reducing the impact of staining differences and imaging fluctuations on the evaluation results.

[0058] After the binary matrices are obtained, their foreground relationship at corresponding pixel positions is counted. Specifically, the number of pixel positions at which both binary matrices are foreground is counted as the intersection pixel count. This count characterizes how much tissue the two masks cover jointly. At the same time, the number of pixel positions at which at least one of the two binary matrices is foreground is counted as the union pixel count. This count characterizes the overall spatial extent of tissue covered by the two masks. The ratio of the intersection pixel count to the union pixel count is then taken as the pixel correlation, which makes pixel correlation values numerically comparable. When the tissue foregrounds of the two masks overlap closely, the intersection pixel count is close to the union pixel count and the ratio approaches 1. When the tissue foregrounds are clearly misaligned or overlap insufficiently, the intersection pixel count is small relative to the union pixel count and the ratio is low.
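The intersection-over-union definition above can be sketched in NumPy using the summed-matrix counting from [0060]. The sketch assumes both masks are already on the same pixel grid. Returning 0 for two empty masks is a convention chosen here, not one stated in the source, and the function name is hypothetical.

```python
import numpy as np

def pixel_correlation(mask_a, mask_b):
    """Intersection-over-union of two same-sized binary tissue masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must share the same size and pixel grid")
    s = a.astype(np.uint8) + b.astype(np.uint8)  # 2: both foreground, >= 1: at least one
    union = np.count_nonzero(s >= 1)
    if union == 0:
        return 0.0  # assumed convention for two empty masks
    return np.count_nonzero(s == 2) / union
```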

[0059] To ensure the stability of the evaluation, pixel correlation calculation can be performed in a unified coordinate system of the two masks. The tissue foreground mask corresponding to the candidate registration result should have the same size and pixel coordinate definition as the tissue foreground mask corresponding to the reference slice. When boundary padding or cropping occurs, the above counting statistics can be performed within the same effective area. Through the above pixel correlation definition, different candidate results can be compared laterally under multi-reference candidate registration conditions, thereby supporting the subsequent threshold screening and maximum correlation selection process, providing a clear and reproducible basis for the selection of registration results.

[0060] In a specific example, after GIMroma outputs the matched keypoints of the two images, the homography matrix $H$ is calculated from these keypoints and applied to the moving image. The resulting transformed image is the registration result for that layer. Registration quality is evaluated by the pixel correlation $R$ between the registration result and the reference image. If $R$ is not lower than the preset threshold, the transformed image is retained as the registration result for this layer; otherwise, the image at this layer is discarded. The pixel correlation $R$ is calculated as:

$$S(x,y) = B_{\mathrm{ref}}(x,y) + B_{\mathrm{reg}}(x,y)$$

$$R = \frac{\sum_{x,y} \mathbb{1}\left[S(x,y) = 2\right]}{\sum_{x,y} \mathbb{1}\left[S(x,y) \geq 1\right]}$$

Here $(x, y)$ denotes the pixel coordinates of the corresponding image. $B(x, y)$ is the binarization result, which indicates whether a pixel location belongs to the tissue foreground or the background. Binarization compares the pixel intensity with a preset threshold $T$: pixels that satisfy the threshold are assigned 1, and background pixels are assigned 0. $B_{\mathrm{ref}}$ denotes the binarization result of the reference image, and $B_{\mathrm{reg}}$ denotes that of the registered image. To count the overlap between the two binary matrices, their values at the same pixel position are added to obtain the superimposed value $S(x, y)$. $S(x, y) = 2$ indicates that both binary matrices are foreground at that position, and $S(x, y) \geq 1$ indicates that at least one of them is foreground. $\mathbb{1}[\cdot]$ is a counting (indicator) function that equals 1 when the condition in brackets holds and 0 otherwise. The numerator therefore counts the pixel positions that are foreground in both binary matrices, i.e., the intersection pixel count. The denominator counts the pixel positions that are foreground in at least one of them, i.e., the union pixel count. The pixel correlation $R$ is the ratio of the intersection pixel count to the union pixel count. It characterizes the degree of overlap between the registration result and the reference image in the tissue foreground region.

[0061] Subsequently, the registration result obtained for the previous layer is used as the new reference image, and the unregistered image of the next layer is used as the new moving image. Feature matching, homography estimation, and geometric transformation are then repeated to obtain the registration result of the next layer. For example, as shown in Figure 3, the registration result is first calculated from the reference image $I_n$ and the moving image $I_{n+1}$. The reference image is then updated to this registration result $I'_{n+1}$ and the moving image to $I_{n+2}$, so that the registration result of the next layer can be calculated. Thereafter, for any image to be registered $I_k$, the registered images of up to three preceding layers, $I'_{k-1}$, $I'_{k-2}$, and $I'_{k-3}$, are each used as a reference image to generate a corresponding candidate registration result independently. For each candidate, the pixel correlation between its tissue foreground mask and the tissue foreground mask of the corresponding reference image is calculated. Candidate quality is evaluated by this pixel correlation, and the candidate with the highest pixel correlation is selected as the registration result for that layer. If fewer than three registered layers precede the image to be registered, only the currently available preceding registered layers are used as reference images to generate candidates, and the best one is selected. The reference and moving images are updated layer by layer in this manner, and the registration steps are repeated until the entire slice sequence has been registered.

[0062] In one embodiment of the present invention, the pixel correlation calculation process further includes the following steps. The tissue foreground mask of the candidate registration result and the tissue foreground mask of the reference slice are divided into at least two local regions. The local pixel correlation of each local region is calculated. All local pixel correlations are then weighted and aggregated to obtain the global pixel correlation, which is used as the pixel correlation between the two tissue foreground masks. The weights of the aggregation are determined based on the proportion of foreground pixels in the corresponding local region or the proportion of the area of the local region. This embodiment decomposes the overall overlap evaluation into multiple local calculations and forms a global evaluation by weighting. This improves the robustness of the evaluation when local defects, wrinkles, tears, or local deformations exist.

[0063] In this embodiment, the space of the mask is divided into at least two local regions to segment the spatial extent of the mask into multiple relatively independent statistical units. The local regions can be determined according to preset rules, such as dividing the mask coverage area into multiple grid blocks according to image space, dividing it into horizontal or vertical bands, or proportionally partitioning it according to the tissue circumference, so that each local region corresponds to a clearly defined range of pixel coordinates. During the partitioning, it should be ensured that the tissue foreground mask corresponding to the candidate registration result and the tissue foreground mask corresponding to the reference slice use the same partitioning method and region boundaries, so that local regions with the same name have consistent spatial positions and coverage areas in the two masks, thereby supporting the comparable calculation of subsequent local pixel correlation.

[0064] When calculating local pixel correlation, the foreground overlap of the two masks within each local region is counted to obtain the local pixel correlation of that region. Local pixel correlation can be calculated in the same way as the pixel correlation described above. Within the local region, it is the ratio of the number of pixels that are foreground in both masks to the number of pixels that are foreground in at least one mask. Calculating separately in multiple local regions lets the evaluation reflect differences in alignment quality at different spatial locations. When a local region has mask anomalies caused by folding, tearing, or missing tissue, its local pixel correlation is markedly lower, while unaffected regions keep a high local pixel correlation. This prevents local anomalies from excessively lowering or distorting a single whole-image statistic.

[0065] In the weighted aggregation stage, the local pixel correlations of the local regions are combined as a weighted sum to obtain the global pixel correlation, which is then used as the pixel correlation between the two tissue foreground masks. The weights are determined based on the proportion of foreground pixels or the proportion of the area of the corresponding local region. As a result, local regions with a higher proportion of tissue foreground or a larger area contribute more to the global evaluation, so the global pixel correlation better reflects the alignment quality of the main tissue region. Conversely, local regions with a very small proportion of tissue foreground or a small area have only a limited impact on the global evaluation. This reduces the interference of edge noise or small isolated regions with the overall evaluation.
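A grid-based version of the local aggregation can be sketched as follows. The regular grid, the skipping of blocks with no tissue in either mask, and the function name are assumptions for illustration. For foreground weighting, this sketch uses the mean foreground pixel count of the two masks in each block.

```python
import numpy as np

def global_pixel_correlation(cand_mask, ref_mask, grid=(4, 4), weight="foreground"):
    """Grid-partitioned IoU aggregated with foreground- or area-based weights."""
    a = np.asarray(cand_mask, dtype=bool)
    b = np.asarray(ref_mask, dtype=bool)
    row_edges = np.linspace(0, a.shape[0], grid[0] + 1).astype(int)
    col_edges = np.linspace(0, a.shape[1], grid[1] + 1).astype(int)
    scores, weights = [], []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            ra, rb = a[r0:r1, c0:c1], b[r0:r1, c0:c1]  # same block in both masks
            union = np.count_nonzero(ra | rb)
            if union == 0:
                continue  # no tissue in either mask: block carries no information
            scores.append(np.count_nonzero(ra & rb) / union)  # local pixel correlation
            if weight == "foreground":
                # Mean foreground pixel count of the two masks in this block.
                weights.append((np.count_nonzero(ra) + np.count_nonzero(rb)) / 2)
            else:
                weights.append(ra.size)  # block area
    if not scores:
        return 0.0
    w = np.asarray(weights, dtype=np.float64)
    return float(np.dot(w / w.sum(), scores))
```

One design note on this choice: weighting each block by its union pixel count would not change the result. The weighted sum would then reduce algebraically to the plain global intersection-over-union, because the sum of (U_i / ΣU)(I_i / U_i) equals ΣI / ΣU. With that weighting, the partition would have no effect, so the sketch weights by the masks' foreground counts instead.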

[0066] The global pixel correlation obtained through the above local calculation and weighted aggregation method can be directly used for the quality evaluation and selection process of candidate registration results. This ensures that the global evaluation can maintain good robustness and interpretability even in the presence of local anomalies, local deformations, or local defects, thereby improving the stability of candidate screening and final result selection in the sequence registration process.

[0067] In one embodiment of the present invention, determining the preset threshold includes: selecting a preset number of benchmark slide samples or candidate registration result samples for the target organ tissue type or slide staining method, calculating the corresponding pixel correlation, and determining the preset threshold based on the statistical distribution characteristics of the pixel correlation; when the organ tissue type or slide staining method changes, the preset threshold is re-determined based on the changed samples. In this embodiment, the threshold can be adaptively set according to the data type, avoiding overly strict or lenient misjudgments under different tissue morphologies or staining conditions when using a fixed threshold, thereby improving the reliability and consistency of the candidate registration result retention determination.

[0068] The target organ tissue type or section staining method is used to define the applicable range of the threshold. Different organs and tissues differ in tissue morphology, structural boundary complexity, and foreground mask morphology; different staining methods may also differ in brightness distribution, background noise, and tissue color development characteristics, resulting in the same pixel correlation value representing registration quality not being entirely consistent in different scenarios. Therefore, when determining the preset threshold, samples are first selected for the current target organ tissue type or section staining method to ensure that the threshold can reflect the typical registration correlation level in that scenario.

[0069] The reference slice samples can be understood as a representative set of samples used for threshold calibration. They can originate from slice image pairs of the same organ / tissue type and staining method, or their corresponding tissue foreground mask pairs. These samples are representative in terms of acquisition quality and structural integrity, and are used to establish a reference distribution of pixel correlation in this scene. The candidate registration result samples can be understood as sample pairs formed by the candidate registration results generated during the actual registration process and their corresponding reference slices, used to reflect the pixel correlation distribution of the candidate results under real-world operating conditions. Both types of samples can be used individually or in combination to balance representativeness and distribution characteristics under real-world operating conditions.

[0070] During the sample calculation phase, the pixel correlation of each selected sample group is calculated, resulting in a set of pixel correlation values. This set is used to characterize the overall distribution pattern of the foreground overlap between candidate registration results and reference slices under the current organ tissue type or staining method. Subsequently, a preset threshold is determined based on the statistical distribution characteristics of pixel correlation. These statistical distribution characteristics may include the central tendency, dispersion, and low-value tail range of pixel correlation, used to distinguish between normally acceptable registration correlation ranges and obviously abnormal registration correlation ranges. The preset threshold thus determined can serve as the basis for judging the retention of candidate registration results, making candidates below the threshold more likely to correspond to large misalignments, insufficient overlap, or abnormal influences, and thus being eliminated or not adopted.
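One possible way to turn the calibration distribution into a threshold is a robust lower fence built from central tendency and dispersion, as described above: the median minus k times the scaled median absolute deviation (MAD). The source does not fix the statistic. This rule is an assumption for illustration, as are k = 3 and the constant 1.4826, which makes the MAD comparable to a standard deviation for Gaussian data. The function name is hypothetical.

```python
import numpy as np

def calibrate_threshold(correlations, k=3.0):
    """Robust lower fence on calibration pixel correlations, clipped to [0, 1]."""
    r = np.asarray(correlations, dtype=np.float64)
    med = np.median(r)                            # central tendency
    mad = 1.4826 * np.median(np.abs(r - med))     # dispersion, robust to the low tail
    return float(np.clip(med - k * mad, 0.0, 1.0))
```

Under this sketch, a separate threshold would be calibrated and stored for each combination of organ tissue type and staining method. It would be recomputed from new samples whenever either of them changes.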

[0071] When the organ tissue type or the section staining method changes, the distribution of pixel correlation may also change. To keep the threshold consistent and effective, samples are reselected for the new condition, and the above calculation and statistical process is repeated to re-determine the preset threshold for the new scenario. Re-determining the threshold from the sample distribution adapts the method to different organs, tissues, and staining conditions. It also reduces the risk of misjudgment caused by reusing a threshold across scenarios, which improves the stability and feasibility of the overall registration process.

[0072] In one embodiment of the present invention, the three-dimensional reconstruction method further includes establishing a reference confidence level for each reference slice. The reference confidence level is determined by one of two quantities. The first is the maximum pixel correlation obtained when the reference slice's own registration result was generated. The second is the statistical characteristics of the pixel correlation obtained when the reference slice participates in the registration of subsequent slices to be registered. When the difference in pixel correlation among the retained candidate registration results is less than a preset difference, or when several candidates tie for the maximum, an evaluation value is calculated. This value combines the pixel correlation of each candidate registration result with the reference confidence level of its corresponding reference slice, and the registration result of the slice to be registered is determined by the evaluation value. Introducing the reliability of the reference slice itself when pixel correlation cannot separate the candidates makes the selection more stable and reduces the risk of an arbitrary choice among candidates of similar quality.

[0073] Building on the above implementation, the reference confidence characterizes how credible a reference slice is as a registration benchmark, so that it can take part in the selection stage as auxiliary information. The reference confidence can be obtained from two sources.

The first source is the maximum pixel correlation obtained when the reference slice's own registration result was generated. When the reference slice was itself registered and incorporated into the sequence, the pixel correlation of its finally adopted candidate registration result can serve as an intrinsic quality indicator of that slice. A high maximum pixel correlation indicates that the reference slice overlapped well in the tissue foreground with its own reference when it was added to the registered sequence, and such a slice is usually more stable as a subsequent reference.

The second source is the statistical characteristics of the pixel correlations produced when the reference slice is later used as a reference. The reference slice participates in several subsequent rounds of candidate generation and evaluation, and the pixel correlations of the resulting candidates are analyzed statistically, for example their central tendency or the frequency of low values. These statistics reflect how stable the alignment is when the slice is used repeatedly as a reference. When they show a high and stable correlation level, the reference confidence can be considered high.
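The two sources of reference confidence could be combined roughly as in the sketch below. It is a hypothetical sketch: the fallback order, the low-value cut-off `low_value`, the minimum history size `min_uses`, and the "mean discounted by the fraction of low values" statistic are all illustrative assumptions, not the method's prescribed formula.

```python
from statistics import mean


def reference_confidence(self_max_corr=None, later_corrs=(), low_value=0.5, min_uses=3):
    """Hypothetical reference confidence for one reference slice.

    Source 2 (used once enough history exists): the mean pixel correlation the
    slice produced while serving as a reference, discounted by the fraction of
    low values. Source 1 (fallback): the maximum pixel correlation from the
    slice's own registration.
    """
    later = list(later_corrs)
    if len(later) >= min_uses:
        low_fraction = sum(c < low_value for c in later) / len(later)
        return mean(later) * (1.0 - low_fraction)
    if self_max_corr is None:
        raise ValueError("no information available to estimate reference confidence")
    return float(self_max_corr)
```

A freshly registered slice thus starts with the confidence of its own registration. Once it has served as a reference often enough, its confidence starts to follow its track record.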

[0074] The preset difference defines the margin for judging whether differences in pixel correlation are large enough to support a clear selection. When the pixel correlations of the retained candidate registration results differ substantially, directly selecting the candidate with the highest pixel correlation gives a clear and stable result. When the difference is less than the preset difference, or several candidates tie for the highest value, relying on pixel correlation alone lets minor fluctuations decide the selection, and the result becomes unstable or effectively random. In this case, the reference confidence is additionally introduced for a joint evaluation.

[0075] During the joint evaluation phase, evaluation values are constructed for the candidate registration results being compared. Each evaluation value is jointly determined by the pixel correlation of the candidate registration result and the reference confidence of its corresponding reference slice, so that candidate quality and reference reliability constrain the selection together. Pixel correlation reflects the degree of tissue-foreground overlap between the candidate result and the reference slice. Reference confidence reflects the stability of the reference slice itself as a benchmark. Combining the two means that, among candidates with similar pixel correlation, the one registered against the more reliable reference slice is preferred, which improves the sequential consistency of the final registration results.
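The selection rule of [0072] to [0075] can be sketched as follows. This is a minimal sketch under stated assumptions. Candidates are given as hypothetical `(candidate_id, pixel_corr, ref_confidence)` tuples, and "close" is taken to mean within `preset_diff` of the best retained correlation. The evaluation value is a weighted sum with an illustrative weight `alpha`; the text does not fix the form of the evaluation value.

```python
def select_registration(candidates, threshold, preset_diff=0.01, alpha=0.5):
    """Pick a registration result from multi-reference candidates.

    candidates: list of (candidate_id, pixel_corr, ref_confidence) tuples.
    Returns the chosen candidate_id, or None when no candidate reaches the
    threshold (the compensation path then takes over).
    """
    kept = sorted((c for c in candidates if c[1] >= threshold),
                  key=lambda c: c[1], reverse=True)
    if not kept:
        return None
    best_corr = kept[0][1]
    # candidates tied with, or within preset_diff of, the best pixel correlation
    close = [c for c in kept if c[1] == best_corr or best_corr - c[1] < preset_diff]
    if len(close) == 1:
        return kept[0][0]  # clear winner: highest pixel correlation
    # hypothetical evaluation value: weighted sum of correlation and reference confidence
    return max(close, key=lambda c: alpha * c[1] + (1 - alpha) * c[2])[0]
```

For example, with candidates `("r1", 0.90, 0.6)` and `("r2", 0.895, 0.9)` and a threshold of 0.75, the two correlations differ by less than 0.01. The tie-break therefore picks `"r2"`, whose reference slice is more trusted.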

[0076] Therefore, when pixel correlations are close or tied, the registration result of the slice to be registered is determined by the evaluation value. Once this registration result is incorporated into the registered slice sequence, it provides data for subsequently updating the reference confidence statistics, which forms a sustained stability constraint as the sequence progresses.

[0077] In one embodiment of the present invention, the following steps are adopted. They keep the sequence registration process running when all candidate registration results fail the threshold condition, and they provide a controlled termination and alarm mechanism when anomalies persist.

When no candidate registration result has a pixel correlation not less than the preset threshold, interpolation is performed on the geometric transformation parameters of the registered slices adjacent to the slice to be registered, yielding interpolated geometric transformation parameters. The slice to be registered is geometrically transformed according to these interpolated parameters to generate a compensated registration result, which is used as its registration result in the registered slice sequence.

When a preset number of consecutive slices to be registered have no qualified candidate registration result, registration is stopped and an anomaly indication message is output.

In this way, a usable registration result can still be generated in the extreme case of no qualified candidates, which avoids interrupting the process. The boundary condition on consecutive anomalies prevents an error state from propagating over a long run.

[0078] In the above embodiments, the absence of candidate registration results with pixel correlation not less than a preset threshold can be understood as follows: for the current slice to be registered, all candidate registration results generated under multi-reference conditions fail the threshold screening, making it difficult to directly determine a reliable registration result from the candidate set. This situation may be caused by factors such as local tearing, folding, missing, severe deformation, abnormal staining background, or scanning artifacts in the slice, making it impossible for the tissue foreground mask overlap to reach the preset threshold. To avoid the direct loss of this layer causing interruption of the registration slice sequence, a compensatory registration method is used to generate alternative results.

[0079] The geometric transformation parameters characterize the spatial transformation required to align the slice to be registered to the reference slice. These parameters can be consistent with the transformation estimation results in the aforementioned registration process, for example, corresponding to a set of parameters for homography, affine transformation, or other forms of two-dimensional geometric transformation. Registered slices adjacent to the slice to be registered can be understood as slice layers that have already been registered in the layers before and after the slice to be registered. Their corresponding geometric transformation parameters can be obtained and saved by these adjacent layers during their respective registration processes for subsequent use.

[0080] During the interpolation estimation stage, the interpolated geometric transformation parameters of the slice to be registered are estimated from the geometric transformation parameters of adjacent registered slices, so that the estimate varies continuously in space with the adjacent layers. Interpolation can be performed according to layer position. For example, linear interpolation can be applied between the transformation parameters of the adjacent layers before and after the slice to be registered, or extrapolation can be used when adjacent registered layers exist on only one side. Either way, the interpolated parameters reflect the locally smooth variation of the sequence registration. After interpolation, the interpolated geometric transformation parameters are applied to the slice to be registered, and geometric transformation and resampling are performed to obtain the compensated registration result. This compensated result is incorporated into the registered slice sequence as the registration result of the slice to be registered, so that the sequence still has continuous layer-by-layer output at that layer.
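A minimal sketch of the interpolation step, assuming the geometric transformation parameters are 3×3 homographies stored per layer index. The elements are interpolated linearly after normalizing each matrix so that its bottom-right entry is 1. This element-wise scheme is a simplification chosen for illustration and is only reasonable for the small, smoothly varying transforms expected between neighbouring slices. The function name and the dictionary layout are hypothetical.

```python
import numpy as np


def _normalize(h):
    """Scale a homography so that its bottom-right entry equals 1."""
    h = np.asarray(h, dtype=float)
    return h / h[2, 2]


def interpolate_homography(index, known):
    """Estimate the transform of layer `index` from registered neighbours.

    known: dict mapping layer index -> 3x3 homography of an already registered slice.
    Interpolates linearly between the nearest layers before and after `index`;
    with neighbours on one side only, extrapolates linearly from the two nearest
    (or copies the single nearest one).
    """
    below = sorted(i for i in known if i < index)
    above = sorted(i for i in known if i > index)
    if below and above:
        lo, hi = below[-1], above[0]
        t = (index - lo) / (hi - lo)
        return (1 - t) * _normalize(known[lo]) + t * _normalize(known[hi])
    side = below[-2:] if below else above[:2]
    if not side:
        raise ValueError("no registered neighbours to interpolate from")
    if len(side) == 1:
        return _normalize(known[side[0]])
    a, b = side
    t = (index - a) / (b - a)  # t > 1 or t < 0 means linear extrapolation along the layer index
    return (1 - t) * _normalize(known[a]) + t * _normalize(known[b])
```

The resulting matrix could then be applied with OpenCV's `cv2.warpPerspective(image, H, (width, height))` to produce the compensated registration result.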

[0081] Compensation registration under persistent anomalies could accumulate too many unreliable results. To avoid this, a termination condition is set: registration stops when a preset number of consecutive slices have no qualified candidate registration result. This preset number can be chosen according to the stability and fault-tolerance requirements of the actual application. When several consecutive layers have no qualified candidates, this usually indicates a systematic problem with data quality or sequence state, and continuing interpolation compensation may accumulate bias and degrade subsequent reconstruction. Therefore, when the consecutive-count condition is triggered, registration of subsequent layers is stopped and an anomaly indication message is output. The message prompts a check of slice data quality, threshold settings, or input sequence integrity. This compensation and termination mechanism preserves process continuity while setting clear bounds on anomaly handling, which makes the sequence registration process more robust and practical.
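The control flow of [0077] to [0081] can be sketched as a loop with a consecutive-failure counter. In this hypothetical sketch, each slice is represented only by the pixel correlations of its candidates. The actual registration and interpolation steps are stubbed out, and the anchor (first) slice is assumed to be handled separately. The name `run_sequence` and the message text are illustrative.

```python
def run_sequence(candidate_corrs, threshold, max_consecutive=3):
    """Walk the slice sequence with compensation and a consecutive-failure stop.

    candidate_corrs: one list of candidate pixel correlations per slice to be registered.
    Returns (results, alarm): results holds ("registered", best_corr) or
    ("compensated", None) per processed slice; alarm is None or an anomaly message.
    """
    results, streak = [], 0
    for layer, corrs in enumerate(candidate_corrs):
        qualified = [c for c in corrs if c >= threshold]
        if qualified:
            streak = 0  # any qualified candidate resets the failure count
            results.append(("registered", max(qualified)))
            continue
        streak += 1
        if streak >= max_consecutive:
            alarm = (f"anomaly: {streak} consecutive slices up to layer {layer} "
                     "have no qualified candidate registration result")
            return results, alarm
        results.append(("compensated", None))  # interpolated transform would be applied here
    return results, None
```

An isolated failing slice is therefore bridged by compensation. A run of `max_consecutive` failing slices stops the loop before the last of them is processed.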

[0082] In another embodiment of the present invention, the semantic segmentation performs pixel-level classification of the registered slice sequence using an image segmentation neural network and outputs a target structure mask sequence. The training data of the image segmentation neural network are constructed by data augmentation. This includes cropping image blocks from the slice images, applying random augmentation to the image blocks, pasting them onto a blank canvas of a preset size to generate augmented training images, and cropping the augmented training images into training image blocks used as training data. The voxel stacking includes mapping the target structure mask sequence to a voxel grid according to the interlayer spacing and outputting the three-dimensional reconstruction result. In this embodiment, after sequence registration is completed, the target structures on the two-dimensional slices are stably extracted as a mask sequence and mapped into three-dimensional space at the real interlayer scale. The result is a three-dimensional reconstruction that can be used for both overall display and display of individual sub-structures.

[0083] In this example, the input for semantic segmentation is a sequence of registered slice images. A neural network is used to perform pixel-level classification on the input image, assigning each pixel location a corresponding structural category, thus obtaining a target structure mask sequence that corresponds one-to-one with each slice. The target structure can be one or more fine structures in an organ or tissue. When multiple structural categories exist, the mask can be output in a multi-category label format or a multi-channel format, allowing different structures to be distinguishably represented within the same mask layer. Because segmentation is performed on the registered sequence, the mask sequence has better spatial consistency between layers, which is beneficial for maintaining structural continuity during subsequent 3D stacking.

[0084] To improve the adaptability of the image segmentation neural network to differences in samples, morphology, and imaging, the training data are constructed by data augmentation. Obtaining image patches by cropping slice images means extracting, at a preset size, local regions containing tissue structures from the training slice images to form the basic units of the training samples. The cropping locations can cover typical areas such as structure boundaries, structure interiors, and background neighborhoods, so that the network learns features that discriminate target structures from non-structure regions. Random augmentation applies random perturbations to the image patches to broaden the distribution of training samples and improve generalization. Augmentation can include, but is not limited to, changes in brightness and contrast, color perturbation, rotation, flipping, scaling, or noise. These perturbations allow the network to maintain stable segmentation performance under different staining intensities, different scanning conditions, or local artifacts.

[0085] The augmented image blocks are pasted onto a blank canvas of a preset size to generate augmented training images. In other words, the image blocks are combined according to preset rules on a canvas of uniform size to form new training images. This increases the variation in spatial layout, structural proportions, and background distribution of the training samples, which improves the network's adaptability to complex scenes. The blank canvas can be an image plane with a preset resolution. Image blocks can be pasted at random positions or arranged according to a preset strategy. The overlap between image blocks or the boundary transition can be controlled as needed so that the generated images still contain learnable structural features. The augmented training images are then cropped into training image blocks used as training data. This makes the input size during training match the network's expected input size while further increasing the number of augmented samples.
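The paste-and-crop augmentation of [0084] and [0085] might look like the NumPy sketch below. It assumes each image patch comes with a label patch that must receive exactly the same geometric transform, and it uses only 90-degree rotations and horizontal flips as the random augmentation. Random paste positions let later patches overwrite earlier ones, and the canvas is tiled without overlap. All names and these choices are illustrative assumptions.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply the same random rotation/flip to an image patch and its label patch."""
    k = int(rng.integers(4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask


def build_training_tiles(pairs, canvas_size, tile, rng):
    """Paste augmented (image, label) patches on a blank canvas, then cut network-sized tiles.

    pairs: list of (image_patch of shape (h, w, C), label_patch of shape (h, w));
    every patch must fit inside the canvas.
    """
    channels = pairs[0][0].shape[2]
    canvas = np.zeros((canvas_size, canvas_size, channels), dtype=pairs[0][0].dtype)
    labels = np.zeros((canvas_size, canvas_size), dtype=pairs[0][1].dtype)  # 0 = background
    for image, mask in pairs:
        image, mask = augment_pair(image, mask, rng)
        h, w = mask.shape
        y = int(rng.integers(0, canvas_size - h + 1))
        x = int(rng.integers(0, canvas_size - w + 1))
        canvas[y:y + h, x:x + w] = image  # later patches overwrite earlier ones
        labels[y:y + h, x:x + w] = mask
    tiles = []
    for y in range(0, canvas_size - tile + 1, tile):
        for x in range(0, canvas_size - tile + 1, tile):
            tiles.append((canvas[y:y + tile, x:x + tile], labels[y:y + tile, x:x + tile]))
    return tiles
```

Intensity perturbations such as brightness or color jitter would be applied to the image patch only, never to the label patch.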

[0086] After training, the trained image segmentation neural network is applied to the registered slice sequence and outputs a target structure mask for each slice layer, forming the target structure mask sequence. This mask sequence then enters the voxel stacking stage. Mapping the target structure mask sequence to a voxel grid according to the interlayer spacing means placing each mask layer at its corresponding height in 3D space according to the spacing between adjacent slices, and forming a voxel grid representation at a preset voxel resolution. For multi-class masks, the label information of different structures can be retained in the voxel grid, yielding multi-label 3D data. The final output is the 3D reconstruction result, which can be used for 3D display of the entire tissue or for display and analysis of individual structures selected by their structural labels.

[0087] The semantic segmentation and 3D reconstruction steps of this invention are further explained below with a specific implementation. In this example, labeled images are used to train the image segmentation model. The labeled images are pre-registration images, while the images on which segmentation is then performed are the post-registration images.

[0088] Before training, data augmentation is performed on the labeled data to increase the amount of training data and balance the proportions between categories. The principle is to crop small image patches containing the annotated categories from the original data, augment them randomly (for example by simple flipping or rotation), and then paste them onto a generated large blank canvas. Figure 5 gives a typical example of this processing. It illustrates how training data are constructed by augmenting slice image samples when training the image segmentation neural network, which improves the model's stability in recognizing fine structures across different organ samples and staining backgrounds. In Figure 5, the upper image corresponds to the hepatopancreatic ampulla and the lower image to the testis. The left side shows the sample image obtained after data augmentation, and the right side shows multiple training input image patches obtained by further cropping the augmented sample. The example shows that the tissue morphology and structural texture remain recognizable while the augmented sample exhibits richer variation in local texture, background interference, orientation, and structural proportion. A larger and more widely distributed set of training inputs can therefore be generated from a limited set of original slice samples.

[0089] Based on the processing shown in Figure 5, the data augmentation consists of two key steps. First, local regions containing tissue structures are cropped from the slice image to obtain image patches, and random augmentation is applied to the patches. This includes orientation transforms (rotation, flipping), intensity transforms (brightness and color perturbation), and moderate noise and blur, which expand the representation of the same structure under different imaging conditions. Second, the randomly augmented patches are pasted onto a blank canvas of a preset size to form augmented training images, and training image patches of the same size as the network input are cropped from these images. The multiple small images on the right of Figure 5 illustrate the result of the second step. The same augmented training image can be cropped into multiple input samples that differ in spatial location, structural combination, and background proportion, which keeps the training samples from being too homogeneous.

[0090] The data augmentation method shown in Figure 5 has two benefits. First, it significantly increases the amount of training data and sample diversity, reduces the model's dependence on a small number of original samples, and alleviates overfitting. Second, it improves the model's adaptability to small target structures, weak-boundary structures, and changes in background noise. The model can then output target structure mask sequences more stably when performing pixel-level classification on the registered slice sequence, which provides more reliable segmentation input for the subsequent voxel stacking and 3D reconstruction based on the interlayer spacing.

[0091] This example uses the SegNeXt segmentation network to train on and predict the augmented image patches, which improves recognition accuracy and efficiency. SegNeXt is a convolutional neural network architecture designed for semantic segmentation that combines high performance with low complexity. The overall model is an encoder-decoder architecture, shown in Figure 6(a). The encoder is the multi-scale convolutional attention network (MSCAN) shown in Figure 6(b), which contains four stages of MSCAN modules. The decoder fuses the features of the last three stages by channel concatenation, models the fused features by matrix factorization and other methods, and finally produces the image segmentation through a classifier.

[0092] Traditional Transformer-based models (such as SETR and SegFormer) use self-attention, which has high computational complexity and tends to lose detail. The MSCAN encoder instead replaces Transformer self-attention with MSCA, an efficient convolution-based attention mechanism. The multi-scale convolutional attention (MSCA) mechanism shown in Figure 6(c) works in four steps. It first uses a 5×5 depthwise convolution to aggregate local information. It then uses convolution kernels of different sizes to extract multi-scale features and a 1×1 convolution to generate attention weights. Finally, the attention weights are multiplied element-wise with the input features, so that multi-scale convolutional attention enhances the key regions.
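The MSCA data flow can be traced in a small NumPy sketch with random, untrained weights. It follows the published SegNeXt design, in which the multi-scale branches are depthwise strip convolutions (1×k followed by k×1) with k = 7, 11, 21, plus an identity branch. Biases, normalization, the surrounding MSCAN block, and any training are omitted. This is an illustration of the mechanism, not the implementation used in this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def depthwise_conv(x, kernels):
    """Depthwise 2-D convolution (cross-correlation), stride 1, zero 'same' padding.

    x: (C, H, W); kernels: (C, kh, kw) with odd kh and kw.
    """
    _, kh, kw = kernels.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    windows = sliding_window_view(xp, (kh, kw), axis=(1, 2))  # (C, H, W, kh, kw)
    return np.einsum("chwij,cij->chw", windows, kernels)


def msca(x, rng, branch_sizes=(7, 11, 21)):
    """Multi-scale convolutional attention on features x of shape (C, H, W)."""
    c = x.shape[0]

    def dw_weights(kh, kw):
        return rng.standard_normal((c, kh, kw)) * 0.1

    attn = depthwise_conv(x, dw_weights(5, 5))  # 5x5 depthwise: local aggregation
    total = attn.copy()  # identity branch
    for k in branch_sizes:  # multi-scale strip-convolution branches: 1xk then kx1
        total += depthwise_conv(depthwise_conv(attn, dw_weights(1, k)), dw_weights(k, 1))
    mix = rng.standard_normal((c, c)) * 0.1  # 1x1 convolution = per-pixel channel mixing
    weights = np.einsum("oc,chw->ohw", mix, total)  # attention weights
    return weights * x  # element-wise re-weighting of the input features
```

The strip pairs approximate large square kernels at a fraction of the cost, which is the complexity argument made above for preferring MSCA over self-attention.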

[0093] The training loss function uses the commonly used cross-entropy loss function to quantify the difference between the probability distribution predicted by the model and the true label distribution. Model accuracy is validated using the mIoU and mAcc metrics.

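[0094] The mIoU metric measures the degree of overlap between the predicted and ground-truth regions. The mAcc metric is the pixel-level accuracy of each category averaged over all categories, and reflects the model's classification ability for each category. The corresponding formulas are:

$$\mathrm{mIoU}=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$

$$\mathrm{mAcc}=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}}$$

where $p_{ij}$ is the number of pixels whose true category is i and that are predicted as category j. Thus $p_{ii}$ is the number of pixels of category i that were correctly predicted, $\sum_{j}p_{ij}$ is the number of pixels whose true category is i, and $\sum_{j}p_{ji}$ is the total number of pixels predicted as category i. Here k+1 is the number of categories (category indices start from 0).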
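The two formulas can be evaluated from a confusion matrix whose entry (i, j) counts pixels of true category i predicted as category j, that is, the $p_{ij}$ above. In the NumPy sketch below, categories that are absent from both the ground truth and the prediction produce 0/0 and are skipped by `nanmean`. This is an assumed convention, and evaluation toolkits differ on how they treat absent classes.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """conf[i, j] = number of pixels with true class i predicted as class j."""
    gt = np.asarray(gt, dtype=np.int64).ravel()
    pred = np.asarray(pred, dtype=np.int64).ravel()
    idx = num_classes * gt + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def miou_macc(conf):
    """mIoU and mAcc from a (k+1) x (k+1) confusion matrix."""
    tp = np.diag(conf).astype(float)  # p_ii
    gt_total = conf.sum(axis=1)  # sum_j p_ij: pixels of true class i
    pred_total = conf.sum(axis=0)  # sum_j p_ji: pixels predicted as class i
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (gt_total + pred_total - tp)
        acc = tp / gt_total
    return float(np.nanmean(iou)), float(np.nanmean(acc))
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, the per-class IoUs are 1/2 and 2/3 (mIoU ≈ 0.583), and the per-class accuracies are 1/2 and 1 (mAcc = 0.75).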

[0095] Finally, the model with the best mIoU index on the validation set (mAcc index helps to judge the balance of each category) was selected to perform semantic segmentation on the registered slice sequence.

[0096] Figure 7 presents typical examples of semantic segmentation results for two types of organ samples after sequence registration. The upper example is the hepatopancreatic ampulla and the lower example is the testis. Each set of examples includes, from left to right, a digital slice scan image, the network segmentation result for all structures, and the segmentation result for a specific target structure. The digital slice scan image shows the original morphology and structural background. The all-structure segmentation result shows the multi-structure segmentation obtained by the image segmentation neural network through pixel-level classification of the slice image. The specific-target-structure result shows a single category presented separately from the multi-structure segmentation. The hepatopancreatic ampulla example highlights the segmentation of the sympathetic nervous system, and the testis example highlights the segmentation of nerves and blood vessels. Because the target structure masks are output on the registered slice sequence, the position of the target structure is more consistent between adjacent slices. This benefits the subsequent voxel stacking of the mask sequence according to the interlayer spacing and maintains structural continuity in the 3D reconstruction. The example also shows that the segmentation output can be used both to represent the overall structure jointly and to extract and display fine structures individually, providing a stable input for building multi-label 3D models.

[0097] Figure 8 shows typical examples of 3D reconstruction and display after the registered slice sequence and its target structure mask sequence are obtained, for the hepatopancreatic ampulla and testis samples respectively. The figure illustrates that, once registration and semantic segmentation are completed at the 2D level, the spatial positions of the target structure in adjacent slices are unified in one coordinate system. The mask sequence can then be mapped into 3D space according to the interlayer spacing and stacked as voxels, giving a 3D voxel grid or multi-label 3D data with a meaningful spatial scale. With this reconstruction, the overall morphology and structural orientation of the tissue can be observed from a 3D perspective, and both whole-tissue display and individual structure display can be achieved through structural labels. Specifically, the 3D reconstruction takes the target structure mask sequence as input and maps each mask layer to its position in the sequence. It sets the interval between adjacent layers in 3D space according to the interlayer spacing, so that the reconstruction reflects the actual slice thickness or interlayer span. For mask sequences obtained from multi-class segmentation, the label information of the different classes can be retained during voxel stacking, forming a multi-label 3D model. For display, the entire structure can be rendered as a whole, or one or more structures can be selected by label for individual display and comparison.
In this way, the hepatopancreatic ampulla sample example can present the spatial distribution and extension relationship of related target structures in three-dimensional space, and the testicular sample example can present the three-dimensional orientation and relative positional relationship of structures such as nerves and blood vessels, which facilitates intuitive analysis of the spatial relationship of multiple structures within the tissue.

[0098] The results in Figure 8 show that the three-dimensional reconstruction method of the present invention combines the spatial alignment capability of sequence registration and the structure extraction capability of semantic segmentation in a single process. Registration ensures the continuity of structures across layers, segmentation provides structured mask inputs that can be stacked, and voxel stacking turns the two-dimensional mask sequence into a three-dimensional spatial representation. Together these improve the usability of the three-dimensional reconstruction result in terms of structural continuity, label distinguishability, and display flexibility.

[0099] It should be understood that the above embodiments are preferred embodiments of the present invention and are used to illustrate the present invention, not to limit the present invention. Those skilled in the art can make various modifications, equivalent substitutions, or variations to the above embodiments without departing from the spirit and substance of the present invention, and all such modifications, substitutions, or variations should fall within the protection scope of the present invention. The protection scope of the present invention should be determined by the content defined in the claims.

Claims

1. A method for AI-based three-dimensional reconstruction of human organ tissue slices, characterized in that the method includes: acquiring a sequence of consecutive slice images of the same organ tissue and preprocessing it to generate a tissue foreground mask; for each slice to be registered, using the registered slices of a preset number of preceding layers as reference slices, obtaining correspondences through a feature matching neural network, estimating a homography matrix, and performing geometric transformation on the slice to be registered to generate candidate registration results; calculating the pixel correlation between the tissue foreground mask corresponding to each candidate registration result and the tissue foreground mask corresponding to its reference slice; retaining the candidate registration results whose pixel correlation is not less than a preset threshold, and selecting the retained candidate registration result with the highest pixel correlation as the registration result of the slice to be registered, the registration results of all slices to be registered forming a registered slice sequence; and performing semantic segmentation on the registered slice sequence to obtain a target structure mask sequence, and stacking the mask sequence by voxelization according to the interlayer spacing to output the three-dimensional reconstruction result.

2. The AI three-dimensional reconstruction method for human organ and tissue slices as described in claim 1, characterized in that the process of obtaining the consecutive slice images of the same organ tissue includes: sequentially slicing the same organ tissue, staining and scanning each slice to obtain the slice sequence images, establishing a sequence identifier for each slice sequence image, obtaining the slice thickness of each slice, and determining the interlayer spacing based on the slice thickness.

3. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1 or 2, characterized in that the preprocessing includes at least one of the following: downsampling, denoising, brightness and color normalization, background suppression, cropping to obtain a target image containing tissue regions, and format conversion.

4. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1 or 2, characterized in that the process of generating the tissue foreground mask includes: performing threshold segmentation on the preprocessed slice sequence images to obtain the tissue foreground region, and performing morphological processing on the tissue foreground region to remove isolated noise or fill holes, thereby obtaining the tissue foreground mask.

5. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that the registered slices of the preset number of layers are at least three consecutive registered slices preceding the slice to be registered; the slice to be registered generates a candidate registration result with each of these registered slices as a reference slice, the pixel correlation between the tissue foreground mask of each candidate registration result and the tissue foreground mask of the corresponding reference slice is calculated, and the candidate registration result with the largest pixel correlation is selected as the registration result of the slice to be registered; when fewer than three registered slices precede the slice to be registered, the existing registered slices are used as reference slices to generate candidate registration results.

6. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that the calculation of pixel correlation includes: converting the tissue foreground mask corresponding to the candidate registration result and the tissue foreground mask corresponding to the reference slice into binary matrices, respectively, the binary matrices indicating whether each pixel position belongs to the tissue foreground or the background; counting the pixel positions at which both binary matrices are foreground as the intersection pixel count, and the pixel positions at which at least one binary matrix is foreground as the union pixel count; and determining the pixel correlation as the ratio of the intersection pixel count to the union pixel count.

7. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1 or 6, characterized in that calculating the pixel correlation includes: dividing the tissue foreground mask of the candidate registration result and the tissue foreground mask of the reference slice into at least two local regions, calculating a local pixel correlation for each local region, and aggregating all local pixel correlations by weighting to obtain a global pixel correlation, which is used as the pixel correlation between the two tissue foreground masks; the aggregation weights are determined by the proportion of foreground pixels in the corresponding local region or by the proportion of the total area occupied by the local region.
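A sketch of the region-weighted variant in claim 7, under stated assumptions: the masks are split into a regular `grid` of rectangles (the claim does not say how regions are formed); "foreground" weighting uses each region's union foreground pixel count, and "area" weighting uses each region's pixel count; a region that is empty in both masks is scored 1.0 because the masks agree there. The function and parameter names are hypothetical.

```python
import numpy as np


def local_weighted_correlation(a, b, grid=(2, 2), weighting="foreground"):
    """Weighted mean of per-region IoU between two masks of equal shape."""
    a = np.asarray(a) > 0
    b = np.asarray(b) > 0
    if grid[0] * grid[1] < 2:
        raise ValueError("at least two local regions are required")
    row_edges = np.linspace(0, a.shape[0], grid[0] + 1).astype(int)
    col_edges = np.linspace(0, a.shape[1], grid[1] + 1).astype(int)
    scores, weights = [], []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            ra, rb = a[r0:r1, c0:c1], b[r0:r1, c0:c1]
            inter = np.count_nonzero(ra & rb)
            union = np.count_nonzero(ra | rb)
            scores.append(inter / union if union else 1.0)  # empty in both: agree
            # weight by foreground share (union pixels) or by region area
            weights.append(union if weighting == "foreground" else ra.size)
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, scores) / w.sum()) if w.sum() > 0 else 0.0
```

With foreground weighting, regions without tissue contribute nothing, so the score tracks where the tissue actually is; area weighting instead rewards agreement on background as well, which makes it more lenient for sparse masks.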

8. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that determining the preset threshold includes: for the target organ tissue type or section staining method, selecting a preset number of benchmark section samples or candidate registration result samples, calculating their pixel correlations, and determining the preset threshold from the statistical distribution of those pixel correlations; when the organ tissue type or section staining method changes, the preset threshold is re-determined from samples of the new type or method.
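Claim 8 leaves the statistical rule open. The sketch below offers two illustrative rules, a low percentile of the benchmark correlations or the mean minus `k` standard deviations, and a small cache keyed by (tissue type, staining method) so that a new combination triggers re-determination. The percentile `q`, the multiplier `k` and the class name are assumptions, not values from the patent.

```python
import numpy as np


def threshold_from_samples(correlations, method="percentile", q=5.0, k=2.0):
    """Derive the preset correlation threshold from benchmark sample scores.

    method="percentile": the q-th percentile of the sample correlations.
    method="mean_std":   mean - k * std, clipped to [0, 1].
    Both rules are illustrative assumptions.
    """
    x = np.asarray(correlations, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one benchmark correlation")
    if method == "percentile":
        return float(np.percentile(x, q))
    if method == "mean_std":
        return float(np.clip(x.mean() - k * x.std(), 0.0, 1.0))
    raise ValueError(f"unknown method: {method}")


class ThresholdTable:
    """Caches one threshold per (organ tissue type, staining method)."""

    def __init__(self, **rule):
        self.rule = rule            # keyword arguments for threshold_from_samples
        self.table = {}

    def get(self, tissue, stain, samples):
        key = (tissue, stain)
        if key not in self.table:   # new tissue/stain combination -> re-determine
            self.table[key] = threshold_from_samples(samples, **self.rule)
        return self.table[key]
```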

9. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that it further includes establishing a reference confidence level for each reference slice, determined either by the maximum pixel correlation obtained when the registration result of that reference slice was itself generated, or by statistics of the pixel correlations obtained when that reference slice served as a reference in registering subsequent slices; when the pixel correlations of the retained candidate registration results differ by less than a preset difference, or several candidates tie for the maximum, an evaluation value is calculated from the pixel correlation of each candidate registration result and the reference confidence level of its reference slice, and the registration result of the slice to be registered is determined by the evaluation value.
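The tie-breaking rule of claim 9 can be sketched as below. Two points are assumptions: "differ by less than the preset difference" is read as "within `preset_difference` of the best score", and the evaluation value is taken as a linear blend `alpha * correlation + (1 - alpha) * confidence`; the claim names neither formula. Each candidate's reference confidence could, for instance, be the maximum correlation recorded when that reference slice was itself registered.

```python
def choose_with_confidence(candidates, preset_difference=0.01, alpha=0.5):
    """Select a registration result, breaking near-ties by reference confidence.

    candidates: list of (correlation, reference_confidence, result), all already
    at or above the preset threshold. Returns the chosen result, or None.
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    top = ranked[0][0]
    # candidates close to the best score, including exact ties
    close = [c for c in ranked if top - c[0] < preset_difference or c[0] == top]
    if len(close) == 1:
        return close[0][2]          # clear winner: plain maximum correlation
    # near-tie: hypothetical evaluation value blending correlation and confidence
    return max(close, key=lambda c: alpha * c[0] + (1 - alpha) * c[1])[2]
```

With candidates `(0.90, 0.70, "A")` and `(0.895, 0.95, "B")` and a preset difference of 0.01, the two are treated as a near-tie and B wins on its more trustworthy reference (0.9225 against 0.8).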

10. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that it further includes: when no candidate registration result has a pixel correlation not less than the preset threshold, performing interpolation estimation on the geometric transformation parameters of the registered slices adjacent to the slice to be registered to obtain interpolated geometric transformation parameters; geometrically transforming the slice to be registered according to the interpolated geometric transformation parameters to generate a compensated registration result; and using the compensated registration result as the registration result of the slice to be registered to form the registration slice sequence. When no qualified candidate is found for a preset number of consecutive slices, registration is stopped and an error message is output.
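A hedged sketch of the fallback in claim 10. The interpolation scheme is an assumption: both neighbouring 3×3 homographies are normalised so that the bottom-right entry is 1 and then blended element-wise, and the previous transform is reused when no later registered neighbour exists yet. Element-wise blending is only a reasonable approximation for near-affine transforms. `FailureGuard` and its `max_consecutive` limit are hypothetical names for the consecutive-failure stop.

```python
import numpy as np


def interpolate_homography(h_prev, h_next=None, t=0.5):
    """Blend the normalised 3x3 matrices of neighbouring registered slices."""
    a = np.asarray(h_prev, dtype=float)
    a = a / a[2, 2]
    if h_next is None:
        return a                    # no later neighbour: reuse the previous transform
    b = np.asarray(h_next, dtype=float)
    b = b / b[2, 2]
    return (1.0 - t) * a + t * b    # element-wise; approximate for near-affine cases


class FailureGuard:
    """Stops registration after max_consecutive slices with no qualified candidate."""

    def __init__(self, max_consecutive=3):
        self.max_consecutive = max_consecutive
        self.run = 0

    def record(self, found_qualified):
        # reset on success, otherwise extend the current run of failures
        self.run = 0 if found_qualified else self.run + 1
        if self.run >= self.max_consecutive:
            raise RuntimeError(
                f"no qualified candidate for {self.run} consecutive slices; "
                "registration stopped")
```

Blending a neighbour with identity and a neighbour translated by 10 pixels in x yields a translation of 5 pixels, which is the intuitive midpoint for a missing slice.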

11. The AI three-dimensional reconstruction method for human organ tissue slices as described in claim 1, characterized in that the semantic segmentation uses an image segmentation neural network to perform pixel-level classification on the registration slice sequence and output a target structure mask sequence; the training data of the image segmentation neural network is constructed by data augmentation, which includes cropping image blocks from slice images, randomly augmenting the image blocks and pasting them onto a blank canvas of a preset size to generate augmented training images, and cropping the augmented training images into training image blocks used as training data. The voxelization stacking includes mapping the target structure mask sequence into a voxel grid according to the interlayer spacing and outputting the three-dimensional reconstruction result.
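The two data-handling steps of claim 11 are sketched below under explicit assumptions. The augmentation uses random 90-degree rotations and horizontal flips as the "random augmentation" and fixed patch, canvas and tile sizes; the claim names none of these. The voxel stacking assumes square pixels of known size and repeats each slice along z (nearest-neighbour) so that the voxel grid is roughly isotropic. The segmentation network itself is out of scope, and all names are hypothetical.

```python
import numpy as np


def paste_augment(image, mask, rng, canvas=256, n_patches=4, patch=64, tile=128):
    """Crop patches, randomly rotate/flip them, paste onto a blank canvas,
    then cut the canvas into training tiles (all sizes are assumptions)."""
    canvas_img = np.zeros((canvas, canvas) + image.shape[2:], dtype=image.dtype)
    canvas_mask = np.zeros((canvas, canvas), dtype=mask.dtype)
    h, w = mask.shape
    for _ in range(n_patches):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        p_img, p_mask = image[y:y + patch, x:x + patch], mask[y:y + patch, x:x + patch]
        k = int(rng.integers(0, 4))           # random 90-degree rotation
        p_img, p_mask = np.rot90(p_img, k), np.rot90(p_mask, k)
        if rng.random() < 0.5:                # random horizontal flip
            p_img, p_mask = p_img[:, ::-1], p_mask[:, ::-1]
        cy = rng.integers(0, canvas - patch + 1)
        cx = rng.integers(0, canvas - patch + 1)
        canvas_img[cy:cy + patch, cx:cx + patch] = p_img
        canvas_mask[cy:cy + patch, cx:cx + patch] = p_mask
    return [(canvas_img[r:r + tile, c:c + tile], canvas_mask[r:r + tile, c:c + tile])
            for r in range(0, canvas - tile + 1, tile)
            for c in range(0, canvas - tile + 1, tile)]


def stack_to_voxels(mask_seq, pixel_size_um, interlayer_spacing_um):
    """Stack per-slice label masks along z, repeating each slice so the voxel
    grid is roughly isotropic (nearest-neighbour; an assumption)."""
    volume = np.stack(mask_seq, axis=0)
    reps = max(1, int(round(interlayer_spacing_um / pixel_size_um)))
    spacing = (interlayer_spacing_um / reps, pixel_size_um, pixel_size_um)
    return np.repeat(volume, reps, axis=0), spacing
```

Because the masks carry integer labels, the stacked volume keeps one label per structure, which is what allows the multi-label 3D display and sub-structure viewing mentioned in the abstract.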