Semi-supervised image processing method based on boundary-aware task network

By using BATNet's dual-backbone architecture and boundary guidance module, the problems of data scarcity and boundary ambiguity in blastocyst image segmentation are solved, achieving more efficient feature capture and improved boundary accuracy, and significantly enhancing adaptability and generalization ability.

CN122243871A · Pending · Publication Date: 2026-06-19 · ANHUI UNIVERSITY OF TRADITIONAL CHINESE MEDICINE +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ANHUI UNIVERSITY OF TRADITIONAL CHINESE MEDICINE
Filing Date
2026-02-06
Publication Date
2026-06-19

Smart Images

  • Figure CN122243871A_ABST

Abstract

This invention relates to a semi-supervised image processing method based on a boundary-aware task network, comprising the following steps: inputting a medical image; using multiple independent backbone networks with non-shared weights to extract features from the medical image X and generate prediction results; inputting the features output from multiple consecutive layers of each backbone network into their respective boundary guidance modules for processing to obtain binary boundary probability maps for segmenting different types of tissues in the medical image X; the boundary guidance module includes an intra-class feature enhancement module, a multi-scale feature fusion module, and a boundary extraction module; the intra-class feature enhancement module establishes a correspondence between categories and channels and dynamically adjusts channel weights; the multi-scale feature fusion module recalibrates features through a channel attention mechanism; and the boundary extraction module accurately extracts boundary information by integrating multi-scale feature extraction, adaptive weighting, and global context integration.

Description

Technical Field

[0001] This invention relates to a technique for processing medical images using deep learning methods, specifically a semi-supervised image processing method based on a boundary-aware task network for semi-supervised blastocyst image segmentation. The boundary-aware task network employs a dual-backbone architecture and a non-shared weight mechanism to enhance adaptability and prevent overfitting.

Background Technology

[0002] In the field of assisted reproduction, the degree of differentiation of each component of the embryo is a core indicator of embryonic development, while histological morphology is an important basis for judging whether an embryo has the potential for transfer. When a fertilized egg develops to the blastocyst stage, its image shows a complex, nested, multi-layered tissue structure comprising, from the inside out, the inner cell mass (ICM), blastocoel, trophectoderm (TE), and zona pellucida (ZP). The accurate identification and segmentation of these tissue structures is not only a core prerequisite for revealing the mechanism of embryonic development, but also key technical support for optimizing the assisted reproductive technology process and improving the success rate of embryo transfer.

[0003] The rise of deep learning has brought major breakthroughs to the field of medical image analysis. However, the performance of deep learning models is highly dependent on large-scale, high-quality labeled data, and blastocyst image annotation faces three core challenges. First, embryonic tissue is inherently transparent, and the Hoffman modulation contrast (HMC) optics commonly used in clinical image acquisition can cause tissues at different depths of field in the same blastocyst image to exhibit different characteristics, resulting in blurred tissue boundaries. Second, manually delineating embryonic structures requires a significant amount of time and effort, leading to extremely high annotation costs. Third, the annotation process is susceptible to subjective judgment (such as differences in boundary definitions among experts) and objective interference (such as background noise), which not only makes the annotation results ambiguous but also further exacerbates the scarcity of high-quality labeled data.

[0004] To address the problem of data scarcity, semi-supervised segmentation methods have become a research hotspot. Their core idea is to train models by combining a small amount of labeled data with a large amount of unlabeled data. Taking mean-teacher (MT) methods as an example, these methods employ a two-branch "teacher-student" framework and utilize strategies such as consistency constraints and exponential moving averages (EMA) to mine latent feature patterns in unlabeled data. However, the complexity of blastocyst images—including tissue transparency, significant background noise, large depth-of-field variations, and blurred boundaries—poses a significant challenge. In this situation, MT methods, relying solely on feature consistency learning paradigms, struggle to accurately capture the dynamic morphological changes of the embryo, ultimately limiting their segmentation performance.
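The exponential moving average (EMA) step used by mean-teacher methods can be illustrated with a minimal sketch. The function name `ema_update`, the flat list representation of the weights, and the decay value are assumptions made for illustration; they do not come from any specific MT implementation.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight (EMA).

    `teacher` and `student` are flat lists of floats standing in for the
    network parameters; `decay` is an illustrative value.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy parameters: after one update the teacher moves 10% of the way
# toward the student when decay = 0.9.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)
```

In this scheme the teacher is never trained by gradient descent; it only tracks the student, which is why the two branches share a single data distribution view.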

[0005] Existing research has attempted to improve segmentation accuracy through boundary-related tasks, and these attempts fall into three groups. Outcome-oriented methods, such as ELKPPNet, apply boundary supervision by performing edge detection directly on the segmentation results, but suffer from insufficient boundary loss compensation. Input-oriented methods, such as EDN, ESNet, and HED-PSPNet, use edge detection results as input to the segmentation network, but suffer from weak feature coupling and high model complexity because the edge detection and segmentation tasks remain relatively independent. Intermediate feature-oriented methods share hierarchical intermediate features between semantic segmentation and edge detection, but are limited by low utilization of multi-scale features. All three groups remain limited in integrating these strategies into comprehensive boundary extraction.

[0006] Furthermore, blastocyst images not only exhibit significant inter-class ambiguity (manifested as overlapping boundaries between different tissues), but also show obvious intra-class differences due to the uneven morphological features of different regions within the same tissue. Directly applying traditional boundary extraction methods still cannot achieve ideal results, often leading to problems such as blurred and incomplete boundaries.

[0007] While the aforementioned methods have achieved some success, they typically focus on feature-level consistency and may not adequately address the domain-specific anatomical challenges prevalent in medical images, such as the boundary blurring, scale confusion, and complex morphological variations characteristic of blastocyst images. Furthermore, the perturbations employed by these methods are often generic and may not optimally guide the model to actively learn these crucial task-specific features from unlabeled data.

Summary of the Invention

[0008] To address the aforementioned technical problems, this invention provides a semi-supervised image processing method based on a boundary-aware task network, characterized by comprising: Step S100: input a medical image $X \in \mathbb{R}^{H \times W \times C_{in}}$, where $H$ and $W$ denote the height and width of the image, respectively, and $C_{in}$ denotes the number of channels; Step S200: use a backbone network to extract features from the medical image X and generate prediction results; Step S300: input the features output from multiple consecutive layers of each backbone network into their respective boundary guidance modules for processing to obtain binary boundary probability maps for segmenting different types of tissues in the medical image X. The boundary guidance module includes an intra-class feature enhancement module, a multi-scale feature fusion module, and a boundary extraction module, which process the features sequentially. The intra-class feature enhancement module establishes the correspondence between categories and channels and dynamically adjusts the channel weights according to the predicted category probability map. The multi-scale feature fusion module adaptively processes input features of different spatial dimensions and recalibrates the features through a channel attention mechanism. The boundary extraction module extracts boundary information by integrating multi-scale feature extraction, adaptive weighting, and global context integration.

[0009] In the above technical solution, each backbone network can output pixel-level segmentation prediction results for different types of tissues in medical image X.

[0010] In the above technical solution, the processing steps of the intra-class feature enhancement module include: Step S311: the input pixel-level segmentation prediction result $P$ is interpolated to match the spatial dimensions of the feature map $F^{l}$ at each scale, yielding the aligned probability map $\tilde{P}^{l}$, and each scale feature map $F^{l}$ is divided into $C$ channel groups $G_c^{l}$; Step S312: the adaptive weights $W_c^{l}$ are applied by element-wise multiplication to the corresponding category-specific channel group $G_c^{l}$ to obtain enhanced features; Step S313: all the enhanced channel groups are concatenated to reconstruct a full-size feature map.

[0011] In the above technical solution, the processing steps of the multi-scale feature fusion module include: Step S321: the input multi-scale full-size feature maps $\hat{F}^{l}$ are resized to the target size $(H_t, W_t)$ by bilinear interpolation, giving the adjusted two-dimensional feature maps $\bar{F}^{l}$; Step S322: learnable weights $\alpha_l$ are used to weight and fuse the adjusted features into the final fused feature $F_{fuse}$; Step S323: channel weights $w_{ch}$ are generated through a channel attention mechanism and used to reweight $F_{fuse}$ channel by channel, and the result $F_{out}$ is output.

[0012] In the above technical solution, the processing steps of the boundary extraction module include: Step S331: parallel convolutional layers with different kernel sizes extract multi-scale local features from $F_{out}$, and the resulting features are concatenated along the channel dimension; Step S332: channel attention weights $w_{att}$ are generated by the adaptive weight generation submodule and applied to obtain the weighted features $F_w$; Step S333: the binary boundary probability map $B$ is calculated using the boundary probability weights $w_b$.

[0013] In the above technical solution, the adaptive weight generation submodule includes an adaptive average pooling layer (AdaAvgPool), a convolutional layer, a ReLU activation function layer, another convolutional layer, and a sigmoid function layer; the generated weights enhance the channels most relevant to boundary formation and suppress irrelevant noise.

[0014] In the above technical solution, the boundary probability weight $w_b$ is obtained as follows: average pooling and max pooling are applied to $F_w$ to generate two pooled feature descriptors; these descriptors are then concatenated and passed through a convolution and a sigmoid function to obtain the boundary probability weight $w_b$.

[0015] Compared with the prior art, the present invention achieves the following beneficial effects. To effectively utilize unlabeled blastocyst data, this invention proposes BATNet, a dual-branch boundary guidance framework employing non-shared weights. Through its dual-branch design and the introduction of an explicit boundary awareness task, the model captures richer morphological representations, improving segmentation integrity and boundary accuracy. To better capture the complex features of blastocyst tissue, this invention designs an intra-class feature enhancement module (IFEM) to amplify hard features using Gaussian mapping, and a multi-scale feature fusion module (MSFFM) to fuse multi-scale features through an attention mechanism; together these modules enhance the comprehensiveness and discriminative power of the learned features. To proactively acquire boundary-specific knowledge from unlabeled data, this invention proposes a boundary extraction module (BEM), shifting the feature utilization paradigm from passive adaptation to active guidance. This strengthens the depiction of blurred boundaries and the learning of multi-scale feature associations, thereby improving the accuracy and structural continuity of blastocyst segmentation. Extensive experiments on blastocyst datasets demonstrate that the method of this invention outperforms other competing methods. Although the method is motivated by the technical requirements of blastocyst image segmentation, it is also extended to left atrial datasets, demonstrating its generality and effectiveness in medical image processing.

Description of the Drawings

[0016] Figure 1 is a comparative diagram of the standard MT framework and the BATNet architecture of this invention;
Figure 2 is a schematic diagram of the overall architecture of the Boundary Awareness Task Network (BATNet) of this invention;
Figure 3 is a schematic diagram of the structure of the Multi-Scale Feature Fusion Module (MSFFM) in the BATNet architecture of this invention;
Figure 4 is a schematic diagram of the Boundary Extraction Module (BEM) in the BATNet architecture of this invention;
Figure 5 is a schematic diagram of the Intra-class Feature Enhancement Module (IFEM) in the BATNet architecture of this invention;
Figure 6 is a visual comparison of the results of the method of this invention with other cutting-edge methods in the ICM segmentation task;
Figure 7 is a visual comparison of the ablation results of the method of this invention with other cutting-edge methods in the ICM segmentation task;
Figure 8 is a visualization of the pixel-level segmentation results and boundary feature changes of the ICM segmentation task under the BATNet framework of this invention in different training cycles;
Figure 9 is a qualitative visual comparison of the results of BATNet of this invention with other cutting-edge semi-supervised methods in left atrial segmentation.

Detailed Implementation

[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] To address the aforementioned issues, this invention proposes a boundary-aware task network (BATNet) for semi-supervised blastocyst image segmentation. The network reflects clinical practice, in which doctors focus on the boundaries when observing blastocyst images. As shown on the right of Figure 1, BATNet introduces a non-shared weights (NSW) mechanism and a dual-backbone architecture. Compared with MT-type methods (shown on the left of Figure 1), BATNet optimizes the feature extraction process in the two branches independently and, through dual-branch collaborative learning, fits the inherent distribution of the data using feature consistency, thereby making effective use of unlabeled data. NSW learning can prevent the model from overfitting to a single data distribution and enhance its adaptability to complex tissue features. In particular, drawing on clinical features, this invention designs an explicit boundary-aware learning task to guide the model to actively understand the tissue characteristics of blastocyst images. At the same time, this task explicitly exploits the features of unlabeled data, enhancing the model's generalization ability.
Specifically, the boundary-aware learning task integrates three core modules. Because tissues at different depths of field exhibit differentiated features (leading to blurred tissue boundaries), this invention proposes an intra-class feature enhancement module (IFEM), which enhances "hard features" (such as blurred boundaries and low-contrast regions) during decoding through Gaussian function mapping. To alleviate the scale ambiguity caused by depth-of-field changes, a multi-scale feature fusion module (MSFFM) is proposed, which integrates the IFEM-enhanced features at different scales using an attention mechanism. Finally, a boundary extraction module (BEM) operating on the fused features provides explicit boundary constraints for the segmentation task. Through the collaborative design of "architecture-task-module", this invention provides richer and more discriminative feature support for blastocyst image segmentation in the semi-supervised segmentation workflow, overcoming the limitations of traditional methods in complex blastocyst image scenarios.

[0019] As shown in Figure 2, the Boundary Aware Task Network (BATNet) model first uses two independent backbone networks with non-shared weights (NSW), $f_1$ and $f_2$, to extract features $F_1$ and $F_2$. These features are then processed by the Boundary Guidance Module (BGM), which includes an Intra-Class Feature Enhancement Module (IFEM), a Multi-Scale Feature Fusion Module (MSFFM), and a Boundary Extraction Module (BEM), to generate the boundary prediction maps $B_1$ and $B_2$. $P$ is the final segmentation probability map. The model is optimized through a boundary-aware self-distillation mechanism between $B_1$ and $B_2$.

[0020] The overall architecture of the BATNet model for semi-supervised blastocyst image segmentation aims to address common problems in existing semi-supervised blastocyst segmentation methods, including blurred boundaries, scale confusion, and insufficient tissue feature discrimination. Specifically, BATNet integrates two key innovative designs within a unified framework: First, it adopts a dual-backbone network architecture with non-shared weights (NSW), enhancing the feature discrimination capability for complex blastocyst tissues through independent feature extraction in both branches; second, it customizes a boundary-aware task module for the characteristics of blastocyst images. This module aligns with the diagnostic logic of clinicians focusing on blastocyst boundaries, striving to explore and utilize boundary features.

[0021] In a single basic backbone network, BATNet decomposes the blastocyst segmentation task into two collaborative sub-tasks: pixel-level tissue region segmentation and boundary-level feature optimization. This design not only enhances the model's ability to capture fine-grained boundary details (crucial for distinguishing adjacent tissues such as TE and blastocoel, ICM and blastocoel), but also feeds back boundary features to improve region segmentation accuracy and reduce tissue misclassification caused by boundary ambiguity. Specifically, the dual-backbone network employing non-shared weights allows the model to learn differentiated features from labeled and unlabeled data separately—the labeled data branch focuses on learning accurate tissue category mappings, while the unlabeled data branch explores general boundaries and structural patterns. This design effectively avoids overfitting to a single data distribution and improves its adaptability to blastocysts at different developmental stages and with different morphologies. Furthermore, the boundary-aware task module provides precise boundary guidance for the segmentation task through collaborative operations of feature enhancement, multi-scale fusion, and boundary extraction: first, "hard features" (such as low-contrast boundaries) are enhanced during decoding; then, multi-scale boundary information is integrated; and finally, explicit boundary constraints are generated to provide precise boundary guidance for the segmentation task.

NSW Backbone Network

[0022] Let the input blastocyst image be represented as $X \in \mathbb{R}^{H \times W \times C_{in}}$, where $H$ and $W$ denote the height and width of the image, respectively, and $C_{in}$ denotes the number of channels. For the $i$-th backbone network $f_i$, the feature extraction process can be described as follows:

$$F_i^{l} = f_i(X; \theta_i) \tag{1}$$

where $f_i$ refers to the $i$-th backbone network, $\theta_i$ denotes the set of learnable parameters of the $i$-th backbone network, and $l$ is the scale index of the output feature (counting from back to front, it corresponds to the feature scale output by the $l$-th layer of the backbone network). $F_i^{l} \in \mathbb{R}^{H_l \times W_l \times D_l}$ is the extracted feature map, where $H_l$ and $W_l$ are the spatial dimensions of the feature map (height and width) and $D_l$ is the number of feature channels.

[0023] Due to the non-shared weight mechanism, the parameter sets $\theta_1$ and $\theta_2$ of the dual backbone network are mutually independent, that is, $\theta_1 \neq \theta_2$. After extracting the high-dimensional feature maps $F_i^{l}$, a single backbone network in the NSW dual-backbone architecture ($f_1$ or $f_2$) can simultaneously generate two complementary predictions: a pixel-level semantic segmentation prediction for the blastocyst tissue categories, and a boundary attribute prediction emphasizing tissue edge features. The boundary attribute prediction is not derived directly from the original features, but is obtained through refinement by the Boundary Guidance Module (BGM).

[0024] The pixel-level prediction branch uses a series of convolutional layers and batch normalization layers to map the feature map $F_i$ to the tissue category space, and then applies the Softmax activation function to obtain the category probabilities. Let $P_i \in \mathbb{R}^{H \times W \times C}$ denote the pixel-level segmentation prediction result (upsampled to the original image resolution $H \times W$), where $C$ is the number of blastocyst tissue types (e.g., $C = 4$ when the four tissue types ZP, TE, ICM, and blastocoel must be separated). The prediction process can be described as follows:

$$P_i = \mathrm{Softmax}\big(g_{seg}(F_i; \phi_i)\big) \tag{2}$$

where $g_{seg}(\cdot\,; \phi_i)$ denotes the segmentation convolutional block with learnable parameters $\phi_i$.
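The Softmax step of the pixel-level branch can be sketched for a single pixel as follows. The logits are made-up numbers standing in for the output of the segmentation convolutional block, and the helper names `softmax` and `predict_pixel` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the class logits of one pixel."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pixel(logits):
    """Return the class probabilities and the most likely class index."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label


# Four illustrative logits, one per tissue type (e.g. ZP, TE, ICM, blastocoel).
probs, label = predict_pixel([0.2, 2.0, -1.0, 0.5])
```

The probabilities, not only the arg-max label, matter downstream, because the boundary guidance module consumes the category probability map.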

[0025] The boundary attribute prediction branch first processes the feature maps $F_i^{l}$ through the three components of the boundary guidance module and then generates a binary boundary probability map $B_i$:

$$B_i = g_{bd}\big(\mathrm{BGM}(\{F_i^{l}\}; \psi_i); \omega_i\big) \tag{3}$$

where $\psi_i$ denotes the set of learnable parameters of the boundary guidance module (BGM), and $g_{bd}(\cdot\,; \omega_i)$ denotes the convolutional block with learnable parameters $\omega_i$ that extracts explicit boundary features from the multi-scale features.

Boundary Guidance Module (BGM)

[0026] To address the insufficient discriminability of boundary features and the severe background interference in multi-scale feature fusion in existing technologies, this invention proposes a boundary guidance module (BGM), which integrates three core components: an intra-class feature enhancement module (IFEM), a multi-scale feature fusion module (MSFFM), and a boundary extraction module (BEM). Together they progressively enhance the discriminability of boundary-related features while suppressing irrelevant background information, laying a solid foundation for accurate boundary perception tasks.

Intra-class Feature Enhancement Module (IFEM)

[0027] Traditional feature extraction methods in the prior art treat all channels equally, ignoring the differences in the contribution of different channels to the representation of a specific category. To overcome this limitation, this invention proposes an intra-class feature enhancement module (IFEM), which dynamically adjusts channel weights based on the predicted class probability map (as shown in Figure 5). By adaptively enhancing key category-related features, IFEM improves the discriminative power of feature representations in downstream segmentation tasks.

[0028] Specifically, because the feature maps extracted by the backbone network differ in resolution, the category probability map $P$ must be interpolated to match the spatial dimensions of the feature map $F^{l}$ at each scale. For a two-dimensional feature map, the aligned probability map $\tilde{P}^{l}$ (matching the $l$-th layer scale feature map $F^{l}$, with a resolution of $H_l \times W_l$) is obtained through bilinear interpolation:

$$\tilde{P}^{l} = \mathrm{Bilinear}\big(P, (H_l, W_l)\big) \tag{4}$$

To establish an accurate "category-channel" correspondence, the feature map $F^{l}$ at each scale is divided into $C$ channel groups $G_c^{l}$. The grouping rule is defined as follows:

$$G_c^{l} = \begin{cases} F^{l}\big[:, :, (c-1)\,n_l : c\,n_l\big], & c = 1, \dots, C-1 \\ F^{l}\big[:, :, (C-1)\,n_l : D_l\big], & c = C \end{cases} \qquad n_l = \left\lfloor D_l / C \right\rfloor \tag{5}$$

where $n_l$ is the number of channels in each group for the first $C-1$ categories, and the last category receives the remaining $D_l - (C-1)\,n_l$ channels. This grouping strategy ensures that each category has its own dedicated feature representation channels, avoiding cross-interference between different categories.
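The category-channel grouping can be sketched as index arithmetic. This is a minimal illustration that assumes the first C-1 groups each receive the floor of (channels / categories) and the last group receives the remainder, as the paragraph describes; the function name `channel_groups` is hypothetical.

```python
def channel_groups(num_channels, num_classes):
    """Return (start, end) channel index ranges, one per category.

    The first num_classes - 1 groups get floor(num_channels / num_classes)
    channels each; the last group takes whatever channels remain.
    """
    base = num_channels // num_classes
    sizes = [base] * (num_classes - 1)
    sizes.append(num_channels - base * (num_classes - 1))
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds


# 10 feature channels split across 4 tissue categories.
bounds = channel_groups(10, 4)

# Slice the feature vector of one pixel into its category-specific groups.
pixel_features = [float(i) for i in range(10)]
groups = [pixel_features[a:b] for a, b in bounds]
```

Because the ranges are contiguous and non-overlapping, concatenating the groups in order restores the original channel layout.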

[0029] To balance the information contribution of regions with different confidence, this invention also designs a Gaussian nonlinear activation mechanism that computes the channel weights $W_c^{l}$ from the aligned probability map $\tilde{P}_c^{l}$ according to formula (6). The resulting weights fall within a bounded range: for high-confidence positive and negative samples the weight approaches 1, while for hard samples the weight is raised above 1. Because the Gaussian function is centered at approximately 0.5, the mechanism concentrates on regions of moderate confidence and increases their learning weights.
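One possible form of such a Gaussian weighting is sketched below. The exact expression of formula (6) is not reproduced in this text, so the form `1 + exp(-(p - 0.5)^2 / (2 * sigma^2))` and the value of `sigma` are assumptions chosen only to match the described behaviour: weights close to 1 for confident predictions and larger weights near a probability of 0.5.

```python
import math


def gaussian_weight(prob, center=0.5, sigma=0.2):
    """Assumed weighting: 1 plus a Gaussian bump centered at `center`.

    `sigma` is an illustrative width, not a value taken from the patent.
    """
    return 1.0 + math.exp(-((prob - center) ** 2) / (2.0 * sigma ** 2))


# Confident negative, uncertain, and confident positive pixels.
weights = [gaussian_weight(p) for p in (0.01, 0.5, 0.99)]
```

Under this assumed form the weights lie between 1 and 2, so confident features are left nearly unchanged while uncertain ones are amplified by up to a factor of two.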

[0030] To obtain enhanced features, the adaptive weights $W_c^{l}$ are applied by element-wise multiplication to the corresponding category-specific channel group $G_c^{l}$. This operation achieves channel-level adaptive modulation: higher weights amplify the contributions of features closely related to the target class (especially features from the medium-confidence region), while weights close to 1 preserve reliable high-confidence features. The enhancement process is expressed as:

$$\hat{G}_c^{l} = W_c^{l} \odot G_c^{l} \tag{7}$$

where $\odot$ denotes element-wise multiplication.

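[0031] Subsequently, all the enhanced channel groups are concatenated to reconstruct a full-size feature map:

$$\hat{F}^{l} = \mathrm{Concat}\big(\hat{G}_1^{l}, \hat{G}_2^{l}, \dots, \hat{G}_C^{l}\big) \tag{8}$$

where $\mathrm{Concat}(\cdot)$ denotes reconstructing the full-dimensional feature map by concatenating along the channel dimension.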

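[0032] From a mathematical perspective, the overall transformation of IFEM can be expressed as:

$$\mathrm{IFEM}(F^{l}, P) = \mathrm{Conv}_l\Big(\mathrm{Concat}\big(W_1^{l} \odot G_1^{l}, \dots, W_C^{l} \odot G_C^{l}\big)\Big) \tag{9}$$

where $\mathrm{Conv}_l$ denotes the feature refinement convolutional layer of the $l$-th scale.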

Multi-scale Feature Fusion Module (MSFFM)

[0033] In boundary perception tasks, the effective fusion of multi-scale features is crucial for improving model performance. Features at different scales contain both rich semantic information and fine-grained details; however, because their spatial dimensions differ, directly fusing them can lead to information loss or imbalanced distribution. To address this issue, this invention proposes the MSFFM module (as shown in Figure 3). This module adaptively processes input features of different spatial dimensions and recalibrates the features through a channel attention mechanism, thereby enhancing the model's ability to utilize multi-scale information.

[0034] For the input multi-scale enhanced features $\{\hat{F}^{l}\}$, the MSFFM module first resizes each of them to the target size $(H_t, W_t)$ using bilinear interpolation, obtaining the adjusted two-dimensional feature maps $\bar{F}^{l}$. Subsequently, learnable weights $\alpha_l$ are used to weight and fuse the adjusted features into the final fused feature $F_{fuse}$:

$$F_{fuse} = \sum_{l} \alpha_l \, \bar{F}^{l} \tag{10}$$

Channel weights $w_{ch} \in \mathbb{R}^{D}$ (where $D$ is the number of channels of the fused feature) are then generated using a channel attention mechanism and used to reweight $F_{fuse}$ channel by channel. The final output $F_{out}$ is obtained by combining a residual connection:

$$F_{out} = F_{fuse} + w_{ch} \odot F_{fuse} \tag{11}$$

Boundary Extraction Module (BEM)

After the multi-scale enhanced features have been fused and the final fused feature map has been obtained, a boundary extraction module (BEM) is used to extract boundary information, as shown in Figure 4. This module completes the extraction task by integrating multi-scale feature extraction, adaptive weighting, and global context integration.
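The MSFFM weighted fusion and residual channel reweighting can be sketched on toy data. The sketch assumes the feature maps have already been resized to a common size, and it replaces the learned channel attention layers with a sigmoid of each channel's global average; that stand-in, the list-of-lists layout, and all function names are assumptions for illustration only.

```python
import math


def fuse(feature_maps, alphas):
    """Weighted sum of same-sized feature maps, one weight per scale."""
    num_channels = len(feature_maps[0])
    num_pixels = len(feature_maps[0][0])
    return [
        [sum(a * fm[c][p] for a, fm in zip(alphas, feature_maps))
         for p in range(num_pixels)]
        for c in range(num_channels)
    ]


def channel_attention(fused):
    """Stand-in attention: sigmoid of the global average of each channel."""
    return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in fused]


def msffm(feature_maps, alphas):
    fused = fuse(feature_maps, alphas)
    weights = channel_attention(fused)
    # Residual connection: the reweighted features are added to the fused ones.
    return [[v + w * v for v in ch] for ch, w in zip(fused, weights)]


# Two scales, each with 2 channels of 2 flattened pixels.
scale_a = [[1.0, 1.0], [0.0, 0.0]]
scale_b = [[3.0, 3.0], [0.0, 0.0]]
out = msffm([scale_a, scale_b], alphas=[0.5, 0.5])
```

The residual term means a channel is never scaled below its fused value, so attention can only emphasise channels rather than erase them.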

[0035] First, the multi-feature extractor employs parallel convolutional layers with three kernel sizes, $k_1 \times k_1$, $k_2 \times k_2$, and $k_3 \times k_3$, to extract multi-scale local features from $F_{out}$. Batch normalization (BN) and ReLU activation are applied after each convolution. Because boundaries differ at different spatial scales, small convolution kernels capture fine details, while large convolution kernels capture broader contextual information. The resulting features are then concatenated along the channel dimension:

$$F_{ms} = \mathrm{Concat}\big(\mathrm{CBR}_{k_1}(F_{out}), \mathrm{CBR}_{k_2}(F_{out}), \mathrm{CBR}_{k_3}(F_{out})\big) \tag{12}$$

where $\mathrm{CBR}_{k}$ denotes a $k \times k$ convolution followed by batch normalization (BN) and ReLU.

[0036] Next, because boundaries typically involve interactions between different feature channels (e.g., intensity gradients in some channels versus texture variations in others), this invention designs an adaptive weight generation submodule to generate channel attention weights $w_{att}$. Its structure consists of adaptive average pooling (AdaAvgPool), a convolution, a ReLU activation function, another convolution, and a sigmoid function. The generated weights enhance the channels most relevant to the boundaries and suppress irrelevant noise, yielding the weighted features $F_w$:

$$F_w = w_{att} \odot F_{ms} \tag{13}$$

Subsequently, two pooled feature descriptors are generated from $F_w$ using average pooling and max pooling. Average pooling captures the overall statistical trend of boundary-related features, while max pooling highlights the most salient boundary cues (such as sharp intensity changes). Combining these two descriptors integrates global statistical information with local salient features, which is crucial for distinguishing true boundaries from false ones. The descriptors are then concatenated and passed through a convolution and a sigmoid function to obtain the boundary probability weights $w_b$. The boundary probability map $B$ is then calculated by formula (14), in which $\otimes$ denotes matrix multiplication. A residual connection retains the original feature information, ensuring that key boundary details are not lost during weight adjustment and thereby improving the clarity and accuracy of boundary extraction.
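The pooled-descriptor step can be sketched per pixel as follows. The sketch assumes that the average and max pooling run across the channel dimension (as in common spatial attention designs) and replaces the convolution, whose kernel size is not given here, with a per-pixel linear mix of the two descriptors; the mixing weights and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def boundary_weights(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Per-pixel boundary weights from average- and max-pooled descriptors.

    `features` is a list of channels, each a flat list of pixel values.
    The linear mix (w_avg, w_max, bias) stands in for the convolution.
    """
    num_pixels = len(features[0])
    weights = []
    for p in range(num_pixels):
        column = [channel[p] for channel in features]
        avg_desc = sum(column) / len(column)  # overall trend at this pixel
        max_desc = max(column)                # strongest response at this pixel
        weights.append(sigmoid(w_avg * avg_desc + w_max * max_desc + bias))
    return weights


# Two channels, two pixels: pixel 0 is flat, pixel 1 has strong responses.
toy_features = [[0.0, 2.0], [0.0, 4.0]]
toy_weights = boundary_weights(toy_features)
```

In this toy case the flat pixel receives a neutral weight and the strongly responding pixel a weight near 1, which is the qualitative behaviour the paragraph attributes to combining the two descriptors.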

Loss Function

[0037] Self-distillation mechanism in learning from labeled data: for labeled data $X$ (paired with the ground-truth segmentation label $Y$ and the boundary label $Y_b$), two backbone networks with independent, non-shared weights, $f_1$ and $f_2$, process $X$ independently. Each network outputs a segmentation probability map $P_i$ after sigmoid activation and a boundary prediction map $B_i$, which achieves cross-network consistency constraints while preserving each network's ability to capture features autonomously.

[0038] The supervised loss function $\mathcal{L}_{sup}$ consists of three components, each addressing a key task requirement. Segmentation cross-entropy loss: the segmentation prediction of each network is aligned with $Y$ to ensure accurate tissue classification:

$$\mathcal{L}_{seg} = \sum_{i=1}^{2} \mathrm{CE}(P_i, Y) \tag{15}$$

where $P_i$ denotes the segmentation probability map of $f_i$ and $\mathrm{CE}$ is the cross-entropy loss function.

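[0039] Boundary Dice loss: this loss focuses on the boundary foreground channel $B_i^{fg}$ of $B_i$. To handle sparse boundary pixels, a Dice coefficient is used for balanced optimization:

$$\mathcal{L}_{dice} = \sum_{i=1}^{2} \mathrm{Dice}\big(B_i^{fg}, Y_b\big) \tag{16}$$

where $\mathrm{Dice}$ is the Dice loss function.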
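A common soft Dice loss, shown here on flattened maps, illustrates why this loss suits sparse boundary pixels: it depends on the overlap ratio rather than on the number of background pixels. The smoothing constant `eps` and the flat-list representation are assumptions for illustration.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted map and a binary target map.

    Both inputs are flat lists of values in [0, 1]; `eps` avoids division
    by zero when both maps are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


boundary = [0.0, 1.0, 0.0, 0.0]
perfect = dice_loss(boundary, boundary)           # identical maps
disjoint = dice_loss([1.0, 0.0, 0.0, 0.0], boundary)  # no overlap
```

Adding more background pixels to both maps leaves these values unchanged, whereas a per-pixel loss would be dominated by the background.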

[0040] Boundary feature consistency loss: to achieve initial self-distillation on labeled data and lay the foundation for knowledge transfer to unlabeled data, this invention designs a boundary feature consistency loss. This loss forces the two independent backbone networks ($f_1$ and $f_2$) to produce consistent boundary prediction maps, ensuring that their boundary-related feature representations are aligned. For each of the $C$ segmentation categories, the mean squared error (MSE) is used to calculate the pixel-wise difference between the boundary prediction channels of the two networks:

$$\mathcal{L}_{con} = \sum_{c=1}^{C} \mathrm{MSE}\big(B_{1,c}, B_{2,c}\big) \tag{17}$$

where $\mathrm{MSE}$ denotes the mean squared error function.
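The cross-network consistency term can be sketched as a per-category MSE between the two networks' boundary maps. Whether the per-category terms are summed or averaged is not stated here, so the summation below is an assumption, as are the function names and the flat-list layout.

```python
def mse(a, b):
    """Mean squared error between two flat maps of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def boundary_consistency(maps_1, maps_2):
    """Sum of per-category MSE between two networks' boundary maps.

    Each argument holds one flat boundary map per segmentation category.
    """
    return sum(mse(m1, m2) for m1, m2 in zip(maps_1, maps_2))


# Two categories, two pixels each; the networks disagree on one pixel.
net1_maps = [[0.0, 1.0], [0.5, 0.5]]
net2_maps = [[0.0, 0.0], [0.5, 0.5]]
loss = boundary_consistency(net1_maps, net2_maps)
```

The loss is zero only when the two networks agree at every pixel of every category, which is what drives their boundary representations toward consensus.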

[0041] Total supervised loss: the three components are integrated using weighting coefficients to balance task priorities, as given by equation (18).

[0042] Self-distillation in learning from unlabeled data: for unlabeled data, whose ground-truth segmentation and boundary labels are unavailable, self-distillation becomes the core implicit supervision signal. By exploiting the independent weights of the two backbone networks, the loss function guides the two networks to reach a consensus on the boundary feature representation, which replaces explicit ground-truth labels in providing effective supervision.

[0043] When processing unlabeled data, the two backbone networks independently output boundary prediction maps (with the same structure as those produced for labeled data, to maintain consistency). The self-distillation loss for unlabeled data uses the mean squared error (MSE) to measure the pixel-level difference between the two boundary prediction maps, so that the networks mutually refine their boundary feature learning and capture generalizable boundary patterns, as given by equation (19). Notably, the dynamic weight introduced in this loss follows a schedule that increases gradually with each training epoch. In the early stages of training, the weight is set to a small value so that the model learns preferentially from the supervised loss on labeled data and gradually establishes stable basic capabilities in segmentation and boundary detection. In the later stages of training, the weight is increased epoch by epoch to strengthen the self-distillation effect, enabling the model to transfer knowledge learned from labeled data to unlabeled data and further improving feature generalization.
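By way of non-limiting illustration, a weight that starts small and grows with the training epoch can be sketched as follows. The text of [0043] does not state the functional form of the schedule; the Gaussian ramp-up used here is a form commonly seen in consistency-based semi-supervised learning and is purely an assumption, as are the names `ramp_up_weight`, `ramp_epochs`, and `max_weight` and the choice of ramping over 2000 epochs.

```python
import math


def ramp_up_weight(epoch, ramp_epochs, max_weight=1.0):
    """Gaussian ramp-up: small at epoch 0, equal to max_weight at ramp_epochs."""
    if ramp_epochs <= 0:
        return max_weight
    progress = min(max(epoch / ramp_epochs, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)


# Weight of the self-distillation term at a few epochs of a 2000-epoch run.
schedule = [ramp_up_weight(e, ramp_epochs=2000) for e in (0, 500, 1000, 2000)]
```

With this form the unlabeled-data term contributes less than 1% of its final weight at the start of training, so the supervised loss dominates early on, as the paragraph above requires.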

[0044] Total loss with dynamic self-distillation weights: finally, the total loss function integrates the supervised loss (for labeled data) and the unsupervised self-distillation loss (for unlabeled data), using the independent weights of the two backbone networks to enable collaborative learning, as given by equation (20). In this process, while the weights of the two networks (each outputting its own segmentation probability map and boundary prediction map) remain independent, the overall loss function unifies their learning objectives through the self-distillation mechanism, forming a closed-loop optimization paradigm of "supervised calibration - unsupervised generalization". This design is well suited to scenarios with limited labeled data because it maximizes the value of unlabeled data while preserving accuracy on the model's core task, ultimately improving overall performance.
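By way of non-limiting illustration, the structure of the losses described in [0038] to [0044] can be written out as follows. Every symbol is hypothetical, since the original notation of equations (15) to (20) is not legible in this text: $P_i$ and $B_i$ stand for the segmentation and boundary outputs of the $i$-th backbone network on labeled data, $B_i^{u}$ for its boundary output on unlabeled data, $Y$ and $Y_b$ for the segmentation and boundary labels, $\lambda_1, \lambda_2, \lambda_3$ for the weighting coefficients, and $\lambda(t)$ for the epoch-dependent dynamic weight. Whether the per-network terms are summed or averaged, and whether $\lambda(t)$ is placed inside equation (19) or equation (20), are assumptions.

```latex
\begin{aligned}
\mathcal{L}_{ce}    &= \sum_{i=1}^{2} \mathrm{CE}\big(P_i,\, Y\big)            && \text{cf. (15)} \\
\mathcal{L}_{dice}  &= \sum_{i=1}^{2} \mathrm{Dice}\big(B_i,\, Y_b\big)        && \text{cf. (16)} \\
\mathcal{L}_{con}   &= \mathrm{MSE}\big(B_1,\, B_2\big)                        && \text{cf. (17)} \\
\mathcal{L}_{sup}   &= \lambda_1 \mathcal{L}_{ce} + \lambda_2 \mathcal{L}_{dice}
                       + \lambda_3 \mathcal{L}_{con}                           && \text{cf. (18)} \\
\mathcal{L}_{unsup} &= \lambda(t)\, \mathrm{MSE}\big(B_1^{u},\, B_2^{u}\big)   && \text{cf. (19)} \\
\mathcal{L}_{total} &= \mathcal{L}_{sup} + \mathcal{L}_{unsup}                 && \text{cf. (20)}
\end{aligned}
```

Under this reading, only $\mathcal{L}_{unsup}$ can be evaluated on unlabeled images, because it is the only term that requires neither $Y$ nor $Y_b$.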

[0045] Experimental and test results. The method was evaluated on a publicly available human blastocyst dataset containing 249 microscopic images with ground-truth masks annotated by the Pacific Reproductive Medicine Center. The annotations cover four structures: zona pellucida (ZP), trophectoderm (TE), inner cell mass (ICM), and blastocoel. The binary segmentation task focused on segmenting the ICM, which is crucial for embryonic development, while the multi-class segmentation task segmented all four structures. Input images were uniformly resized to ... pixel resolution.

[0046] To further evaluate how well the model of this invention generalizes to structures whose features resemble those of embryos, the method was also tested on the left atrium (LA) segmentation dataset. This dataset is widely used to evaluate semi-supervised segmentation performance and comprises 100 manually annotated three-dimensional gadolinium-enhanced magnetic resonance imaging (GE-MRI) scans; the commonly used data partitioning was adopted. The input image size was uniformly adjusted to [size missing].

[0047] The BATNet instance of this invention used in the experiments and tests was implemented with the PyTorch 1.10.0 framework and run on an NVIDIA Tesla V100 GPU. To alleviate overfitting, data augmentation was employed to increase the diversity of training samples, including random scaling, random rotation, random scale changes, and brightness and contrast adjustments. Training used the Adam optimizer with a maximum of 2000 training epochs, an initial learning rate fixed at 0.0001, and a weight decay coefficient of 0.00005.

[0048] Different evaluation metrics are used for the segmentation results depending on the task. For the blastocyst segmentation task, four commonly used metrics are employed: accuracy, recall, Dice similarity coefficient (DSC), and Jaccard index, with the U-Net architecture selected as the backbone network. For the LA dataset, the Dice similarity coefficient (DSC), Jaccard index, average surface distance (ASD), and 95% Hausdorff distance (95HD) are used, with V-Net serving as the backbone network.
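By way of non-limiting illustration, the four overlap metrics named in [0048] can be computed from confusion counts as sketched below for a single binary mask. The sketch assumes that "accuracy" means pixel accuracy and that masks are given as flat lists of 0/1 values; the function name and the handling of empty masks are hypothetical choices.

```python
def overlap_metrics(pred, truth):
    """Accuracy, recall, DSC and Jaccard index for two flat binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Both overlap scores are defined as 1.0 when both masks are empty.
        "dsc": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "jaccard": tp / union if union else 1.0,
    }


pred = [1, 1, 0, 0, 0, 0]
truth = [1, 0, 1, 0, 0, 0]
scores = overlap_metrics(pred, truth)
```

For a single mask the two overlap scores are tied by Jaccard = DSC / (2 - DSC); averages taken over many images need not satisfy this identity exactly.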

[0049] ICM segmentation experiment: Table 1 presents the quantitative results of ICM segmentation. To verify the superiority of BATNet in semi-supervised ICM segmentation, it is compared with 11 existing semi-supervised learning (SSL) methods: MT, DAN, UAMT, AEM, RD, DTC, ICT, MCF, ACMT, UniMatch, and LeFeD. All methods use U-Net as a unified backbone network to ensure a fair comparison. As Table 1 shows, BATNet significantly outperforms classic consistency-based semi-supervised learning methods: compared with MT, its DSC is 1.75 percentage points higher and its Jaccard index 1.81 percentage points higher. Against multi-level perturbation methods, which unify image and feature perturbations but lack task-specific optimization, BATNet maintains its lead on boundary and region metrics, with accuracy reaching 97.70%, DSC 91.95%, and Jaccard index 86.61%. This is attributed to its strategy of prioritizing boundary feature learning (rather than unified perturbation), which avoids misclassification in the small ICM region. Compared with the supervised baseline trained only on labeled images (SupOnly), BATNet achieves superior performance by exploiting unlabeled data: DSC improves by 4.36 percentage points and the Jaccard index by 5.29 percentage points. This validates its ability to turn unlabeled data into effective boundary learning signals and thereby alleviate data scarcity. Even when the labeled ratio increases to 50%, BATNet maintains a stable improvement over the existing semi-supervised learning methods, ultimately achieving 98.36% accuracy, 94.82% recall, 94.50% DSC, and 90.28% Jaccard index.

[0050] Table 1. Performance comparison of semi-supervised learning methods with U-Net as the backbone network in binary segmentation tasks. "*" and "**" indicate the significance levels of p≤0.05 and p≤0.01 respectively when comparing the best model of this invention with other models using a two-sided paired t-test. SupOnly: Fully supervised learning using only labeled images. SupOnly(Upper Bound): Fully supervised learning using all images. The best results are indicated in bold.

[0051]

[0052] The qualitative comparison in Figure 6 (in which magenta areas represent false negatives, red areas represent false positives, and cyan areas show the model's recognition results) further demonstrates that BATNet has advantages in handling blurred boundaries and missed detections of small regions in ICM segmentation. As Figure 6 shows, BATNet's recognition result (cyan area) has the highest overlap with the ground-truth annotation and accurately outlines the complete contour of the ICM. Even in the low-contrast region where the ICM meets the TE, there is no obvious boundary shift or breakage. In contrast, the other methods show several shortcomings: the classic consistency-based methods MT and UAMT produce fragmented results, with numerous small false-positive spots at the edge of the ICM and local missed detections (false negatives) for small ICMs. Although RD and DAN roughly capture the overall extent of the ICM, their boundaries remain insufficiently sharp because the boundary between the ICM and the blastocoel is blurred. Current state-of-the-art methods such as UniMatch, DTC, and MCF still exhibit mis-segmentation in local areas. These visual differences further validate the effectiveness of BATNet in enhancing weak boundary features and suppressing background interference through its boundary-aware task module.

[0053] Blastocyst multi-class segmentation experiment: Table 2 presents the quantitative comparison of different semi-supervised learning methods on the blastocyst multi-class segmentation dataset with annotation ratios of 10% and 50%. At an annotation ratio of 10%, BATNet achieves 90.10% accuracy, 88.12% recall, 86.67% DSC, and 77.75% Jaccard index, outperforming the other semi-supervised learning methods on all metrics. Specifically, compared with the current state-of-the-art method LeFeD, BATNet improves the DSC and Jaccard index by 0.82 and 0.97 percentage points, respectively. When the annotation ratio increases to 50%, BATNet continues to deliver performance improvements, validating its stability in blastocyst multi-class segmentation.

[0054] Notably, BATNet performs well not only in binary segmentation but also in multi-class segmentation. Multi-class segmentation of blastocysts must simultaneously handle the distinct features of the zona pellucida (ZP), the irregular morphology of the trophectoderm (TE), and the small volume of the inner cell mass (ICM). BATNet effectively mitigates "scale confusion" and "class interference" through the synergy of its IFEM module (enhancing blurred boundaries), MSFFM module (multi-scale feature fusion), and BEM module (clear boundary extraction). This collaborative mechanism is the key reason why BATNet outperforms single-feature consistency methods such as ACMT and ICT in multi-class segmentation. Furthermore, although the performance gap between BATNet and the other semi-supervised methods narrows slightly as the proportion of labeled data increases, BATNet still maintains a significant advantage, which further validates its adaptability to multi-class blastocyst segmentation.

[0055] Table 2. Performance comparison of semi-supervised learning methods with U-Net as the backbone network in multi-class segmentation tasks. "∗" and "∗∗" represent the significance levels of p≤0.05 and p≤0.01 respectively when comparing the best model of this invention with other models using a two-sided paired t-test. SupOnly: Fully supervised learning using only labeled images. SupOnly (Upper Bound): Fully supervised learning using all images. The best results are indicated in bold.

[0056]

[0057] Ablation studies: to verify the effectiveness of each component of this method, ablation experiments were conducted on the ICM segmentation task; the results are shown in Table 3. The experiments used "MS" (the basic two-branch framework, without additional modules) as the baseline and added the proposed modules step by step to observe the changes in each evaluation metric. Table 3 shows that the model using only the "MS" baseline framework achieved the lowest values on all metrics, with an accuracy of 96.96%, a recall of 87.08%, a DSC of 87.59%, and a Jaccard index of 81.32%. As the proposed modules are integrated, model performance improves progressively, with each component making a distinct contribution. When the BEM module is integrated into the baseline (MS+BEM), recall increases from 87.08% to 90.29% and DSC from 87.59% to 90.52%. This improvement stems from the precise boundary constraints that BEM applies through multi-scale convolution and adaptive weighting, which effectively reduce false negatives in ICM boundary segmentation. Adding the IFEM module to this model (MS+BEM+IFEM) brings further gains, raising accuracy to 97.55% and the Jaccard index to 85.73%. These results validate the effectiveness of IFEM in enhancing "hard features" (such as low-contrast boundaries) through Gaussian mapping and category-wise channel grouping, a design that reduces cross-interference between the features of heterogeneous tissues and improves feature discrimination. When both IFEM and MSFFM are added (MS+BEM+IFEM+MSFFM), all metrics reach their peak values: accuracy 97.70%, recall 92.45%, DSC 91.95%, and Jaccard index 86.61%. This result highlights the synergistic effect of the proposed modules in addressing the complex challenges of blastocyst multi-structure segmentation.

[0058] Figure 7 presents qualitative visualization results of the ablation study on the ICM segmentation task. The segmentation results deviate most from the ground truth when only the basic backbone network (BB) is used: the contours of the ICM region are blurred, with significant false negatives (parts of the actual ICM region are not covered) and some false positives near the junction of the TE and the blastocoel. This indicates that basic feature extraction alone cannot overcome the two challenges of ICM segmentation, namely blurred boundaries and small scale. After the boundary extraction module (BEM) is added (BB+BEM), the clarity of the ICM boundaries improves significantly and the number of false positives decreases, confirming that BEM effectively extracts boundary features through its multi-scale convolution and adaptive weighting mechanism and provides initial constraints on segmentation accuracy; local contour discontinuities nevertheless remain. Further adding the intra-class feature enhancement module (IFEM) (BB+BEM+IFEM) enhances low-contrast boundary regions (such as the junction of the ICM and the blastocoel), further improving the overlap between the segmentation results and the ground-truth annotations and significantly reducing missed areas. When BEM and the multi-scale feature fusion module (MSFFM) are added together (BB+BEM+MSFFM), the overall contour integrity of the ICM improves and the "scale confusion" caused by depth-of-field variations is alleviated, but local details are still inferior to those of the complete model. Finally, the segmentation results of the complete BATNet almost perfectly match the ground-truth annotations, with continuous and accurate ICM boundaries and no obvious false positives or missed detections. This visually validates the advantage of the three modules (BEM, IFEM, and MSFFM) working together to address the shortcomings of the basic framework, further demonstrating the effectiveness of the proposed BATNet.

[0059] Table 3. Ablation Study Results of ICM Dataset

[0060] In addition, the dynamic visualization in Figure 8 tracks the changes in the pixel-wise segmentation results and the corresponding boundary features during BATNet's ICM segmentation training, illustrating the model's optimization progress across training epochs. In terms of pixel-level segmentation, at the start of training (epoch 1) the model could only roughly locate the ICM region and deviated significantly from the ground-truth annotations. As training progressed (epochs 100 to 500), segmentation accuracy improved continuously: by epoch 100 the overall outline of the ICM began to emerge, and by epoch 500 the overlap between the segmented core ICM region and the ground-truth annotations had increased significantly, with only sporadic local false negatives remaining. By epoch 1000, the pixel-wise segmentation results closely matched the ground-truth annotations, capturing the full ICM morphology without significant mis-segmentation. Meanwhile, the boundary features were refined gradually throughout training: at epoch 1 the boundaries were blurry and chaotic; by epoch 100 identifiable boundary fragments began to form, although with discontinuities; by epoch 500 boundary continuity had improved significantly and overall alignment with the annotations was good, with only minor deviations in low-contrast areas; finally, at epoch 1000 the model generated stable, clear, and continuous ICM boundaries. Taken together, the dynamic comparison in Figure 8 verifies the effectiveness of BATNet: as training progressed, the model steadily improved the completeness of pixel-level ICM region segmentation while also enhancing the clarity and accuracy of boundary delineation. This joint progress confirms the collaborative effectiveness of the boundary guidance modules (IFEM, MSFFM, BEM) and the dual-branch architecture.

[0061] Extending to the LA segmentation dataset: to further verify the generalization ability of the proposed boundary-aware task framework, the experiments were extended to the LA segmentation dataset, which shares structural features with blastocysts (e.g., nested tissue structures and boundary blurring due to changes in imaging depth). The results in Table 4 and Figure 9 not only verify the robustness of BATNet but also reveal the adaptability of its core modules to different medical image segmentation tasks.

[0062] Table 4 presents the quantitative comparison of BATNet and current state-of-the-art semi-supervised methods on the LA dataset. With 10% labeled data, BATNet achieves a DSC of 90.09% and a Jaccard index of 82.20%, outperforming all compared methods. Its DSC is 2.16 percentage points higher than that of LeFeD and its Jaccard index 3.37 percentage points higher; compared with classic consistency methods such as MT, its DSC is 2.38 percentage points higher and its Jaccard index 3.31 percentage points higher. Notably, BATNet also achieves the lowest ASD (7.50 voxels) and 95HD (2.38 voxels), which are 3.95 voxels and 0.07 voxels lower than those of LeFeD, respectively. When the labeled data increases to 20%, BATNet maintains its leading position with a DSC of 92.46% and a Jaccard index of 86.06%, while ASD and 95HD further decrease to 4.93 voxels and 1.59 voxels, respectively, demonstrating that its boundary-aware mechanism works effectively with additional labeled data to improve segmentation accuracy. These quantitative results verify that the BATNet core modules (IFEM, MSFFM, BEM) are suitable not only for blastocyst images but also for the structural features of the LA. The qualitative results in Figure 9 illustrate BATNet's advantages in LA segmentation: its segmentation results almost perfectly match the ground-truth annotations, while the other compared methods show obvious shortcomings.
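By way of non-limiting illustration, the two surface-distance metrics reported for the LA dataset can be sketched in plain Python on small point sets. Definitions of ASD and 95HD vary between toolkits; this sketch assumes a symmetric average over both directions for ASD, the larger of the two directed 95th percentiles for 95HD, a nearest-rank percentile, and surfaces supplied as lists of voxel coordinates. All names are hypothetical, and the sketch makes no claim about how the values in Table 4 were computed.

```python
import math


def directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(a, b) for b in dst) for a in src]


def percentile_nearest_rank(values, q):
    """q-th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def surface_metrics(surface_a, surface_b):
    """Return (ASD, 95HD) for two surfaces given as lists of coordinates."""
    d_ab = directed_distances(surface_a, surface_b)
    d_ba = directed_distances(surface_b, surface_a)
    asd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    hd95 = max(percentile_nearest_rank(d_ab, 95),
               percentile_nearest_rank(d_ba, 95))
    return asd, hd95


predicted = [(0, 0, 0), (0, 1, 0)]
reference = [(0, 0, 0), (0, 4, 0)]
asd, hd95 = surface_metrics(predicted, reference)
```

Both values are distances, so lower is better, in contrast to DSC and the Jaccard index, for which higher is better.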

[0063] Table 4. Performance comparison of semi-supervised learning methods with V-Net as the backbone network on the LA dataset for binary segmentation.

[0064] BATNet performs consistently well on both datasets owing to the generality of its core design. The Gaussian mapping in the IFEM module effectively enhances low-contrast features (such as thin-walled left atrial structures and blurred blastocyst boundaries), ensuring that weak signals are not missed. The MSFFM module, built on an attention-based multi-scale fusion mechanism, addresses the scale confusion caused by the complex morphology of the left atrium (from large chambers to small accessory structures) and the varying depth of field of the blastocyst. The explicit boundary constraints in the BEM module provide precise guidance for segmentation, avoiding the "edge blurring" that arises when relying solely on consistency regularization. This design not only addresses task-specific challenges but also overcomes the poor transferability that many semi-supervised methods exhibit because they are optimized for a single dataset.

[0065] The quantitative evidence in Table 4 and the qualitative analysis in Figure 9 together confirm that BATNet's boundary-aware task framework provides a robust and generalizable solution for semi-supervised medical image segmentation. The network's strong performance in blastocyst and left atrial segmentation highlights its potential in broader clinical applications, from embryo quality assessment in assisted reproduction to anatomical segmentation for cardiac diagnosis, bridging the gap between technological innovation and clinical practicality.

[0066] Through the detailed description of the specific embodiments and examples of the present invention above, those skilled in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments and examples without departing from the principles and basic concept of the present invention. The scope of the present invention is defined by the appended claims and their equivalents. Content not described in detail in this specification, such as the specific construction and implementation details of typical neural networks and their functional layers, is something that those skilled in the art can implement based on well-known prior art and the technical teachings of the present invention.

Claims

1. A semi-supervised image processing method based on a boundary-aware task network, characterized in that it comprises: Step S100: inputting a medical image X, whose dimensions are specified by the height and width of the image and the number of channels; Step S200: using backbone networks to extract features from the medical image X and generate prediction results, wherein the feature extraction process for the input medical image X is given by equation (1), in which the backbone index identifies a particular one of the backbone networks; the backbone networks do not share weights, and each independently processes the input medical image and generates its corresponding prediction results; the sets of learnable parameters of the backbone networks are independent of one another; the scale identifier of an output feature corresponds to the feature scale output by the layer at the corresponding position counted from back to front in the backbone network; where ... ≥ 2 and ... ≥ 3; Step S300: inputting the features output from multiple consecutive layers of each backbone network into its respective boundary guidance module for processing, to obtain binary boundary probability maps for segmenting different types of tissue in the medical image X, as given by equation (3), which involves the set of learnable parameters of the boundary guidance module and a convolutional block, having learnable parameters, that extracts explicit boundary features from the multi-scale features; the boundary guidance module comprises an intra-class feature enhancement module, a multi-scale feature fusion module, and a boundary extraction module, applied in sequence; the intra-class feature enhancement module establishes the correspondence between categories and channels and dynamically adjusts the channel weights according to the predicted category probability map; the multi-scale feature fusion module adaptively processes input features of different spatial dimensions and recalibrates the features through a channel attention mechanism; and the boundary extraction module accurately extracts boundary information by integrating multi-scale feature extraction, adaptive weighting, and global context integration.

2. The semi-supervised image processing method based on a boundary-aware task network as described in claim 1, characterized in that each of the backbone networks can output pixel-level segmentation prediction results for the different types of tissue in the medical image X, as given by equation (2), which uses a segmentation convolutional block having learnable parameters.

3. The semi-supervised image processing method based on a boundary-aware task network as described in claim 2, characterized in that the processing steps of the intra-class feature enhancement module include: Step S311: the input pixel-level segmentation prediction results are interpolated to match the spatial dimensions of the feature maps at each scale, yielding the aligned probability maps, as given by equation (4); the feature map at each scale is divided into channel groups, one per category, with the grouping rule defined by equation (5), which specifies the number of channels in each group for the first C-1 categories and assigns the remaining channels to the remaining category; this grouping strategy ensures that each category has its own dedicated feature representation channels, avoiding cross-interference between different categories; Step S312: adaptive weights are applied by element-wise multiplication to their corresponding category-specific channel groups to obtain enhanced features, where the weights are calculated according to equation (6), the final weights fall within the range ..., and the enhancement process is expressed by equation (7), in which ⊙ represents element-wise multiplication; Step S313: all the enhanced channel groups are concatenated to reconstruct the full-dimensional feature map, as given by equation (8), in which the concatenation operator denotes reconstruction of a full-dimensional feature map by channel concatenation.

4. The semi-supervised image processing method based on a boundary-aware task network as described in claim 3, characterized in that the processing steps of the multi-scale feature fusion module include: Step S321: the input multi-scale full-dimensional feature maps are resized to the target size by bilinear interpolation, yielding the resized two-dimensional feature maps; Step S322: the resized features are weighted with learnable weights and fused to obtain the final fused features, as given by equation (10); Step S323: channel weights are generated through the channel attention mechanism and used to reweight the fused features channel by channel, and the final output is obtained by adding a residual connection, as given by equation (11), which involves the number of channels of the fused feature.

5. The semi-supervised image processing method based on a boundary-aware task network as described in claim 4, characterized in that the processing steps of the boundary extraction module include: Step S331: parallel convolutional layers with kernel sizes of ..., ..., and ... are used to extract multi-scale local features from the input features; the resulting features are then concatenated along the channel dimension, as given by equation (12), in which each convolution operation is followed by batch normalization (BN) and a ReLU activation function; Step S332: channel attention weights are generated by the adaptive weight generation submodule to obtain the weighted features, as given by equation (13); Step S333: the binary boundary probability map is calculated using the boundary probability weights, as given by equation (14), in which the operator represents matrix multiplication.

6. The semi-supervised image processing method based on a boundary-aware task network as described in claim 5, characterized in that the adaptive weight generation submodule comprises an adaptive average pooling layer (AdaAvol), a convolutional layer, a ReLU activation function layer, a further convolutional layer, and a sigmoid function layer; the generated weights enhance the channels most relevant to boundary formation, thereby suppressing irrelevant noise.

7. The semi-supervised image processing method based on a boundary-aware task network as described in claim 6, characterized in that the boundary probability weights are obtained by generating two pooled feature descriptors through average pooling and max pooling, respectively, concatenating the descriptors, and processing them with a ... convolution and a sigmoid function.

8. A semi-supervised image processing system based on a boundary-aware task network, characterized in that it comprises: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1-7.

9. A computer-readable storage medium, characterized in that it comprises a computer program which, when executed on a computer, causes the computer to perform the method of any one of claims 1-7.

10. A computer program product, characterized in that the computer program product comprises computer program code that, when run on a computer, causes the computer to perform the method of any one of claims 1-7.