A novel method for intermediate phase segmentation of microstructure images of zinc alloy materials
By using the SCoP-SAM method, the spatial context prior is captured by utilizing the gradient structure and grayscale characteristics of the intermediate phase and integrated into the SAM encoding and decoding process. This solves the problems of low contrast, small target detection and complex morphology in the microstructure analysis of zinc alloys, and achieves high-precision intermediate phase segmentation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- UNIV OF SCI & TECH BEIJING
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-23
AI Technical Summary
Existing image segmentation techniques struggle with the microstructure analysis of zinc alloys because of low contrast, small targets that are hard to detect, and complex morphologies. In particular, fine structures are recognized with insufficient accuracy and are prone to misjudgment.
The SCoP-SAM method, guided by spatial context priors, extracts the gradient structure and grayscale attribute priors of intermediate phases and integrates them into the encoder and decoder of SAM, providing prior perception cues and improving segmentation accuracy.
It significantly improves the segmentation accuracy of intermediate phases in zinc alloy microstructure images, robustly handles low phase contrast, blurred boundaries, high proportion of small-scale intermediate phases and morphological heterogeneity, and reduces the impact of noise interference.
Figure CN122265299A_ABST
Description
Technical Field
[0001] This invention relates to the technical fields of computer vision and image segmentation, and in particular to a novel method for intermediate phase segmentation of microstructure images of zinc alloy materials.
Background Technology
[0002] This invention belongs to the field of computer vision image segmentation technology, specifically relating to a segmentation method based on SAM (Segment Anything Model). Zinc-based alloys (i.e., zinc alloys) are an indispensable and promising class of absorbable metallic biomaterials with enormous application potential in orthopedics, cardiovascular medicine, and dentistry. The microstructure characteristics of zinc alloys directly determine their macroscopic properties, while the intermediate phase, as a key microstructure component, plays a crucial role in regulating mechanical and functional properties. In practice, the microstructure of zinc alloys exhibits complex multiphase characteristics.
[0003] Traditional methods for characterizing intermediate phases primarily rely on techniques such as scanning electron microscopy (SEM) and optical microscopy (OM) to acquire images of the microstructure, followed by manual annotation by materials professionals and segmentation and statistical analysis using simple image analysis software. On the one hand, manual annotation is inefficient and fails to meet stringent requirements for accuracy and repeatability. On the other hand, existing image segmentation techniques face numerous challenges in the microstructure analysis of zinc alloys, and these challenges remain largely unresolved.
[0004] SAM is a foundation segmentation model built on a prompt-based (cue-based) learning paradigm. It is capable of zero-shot segmentation and supports multiple cue types, which improves the accuracy of complex structure analysis. Recently, SAM-Adapter developed a lightweight adaptation framework that transforms task-specific knowledge into visual cues to address the poor generalization problem in specialized segmentation tasks. CAT-SAM proposes a network with few-shot conditional tuning through a cue-bridging structure, enabling the encoder to co-optimize under decoder guidance, thus addressing the adaptation challenges of SAM. SAMCT is equipped with a SAM-based U-shaped CNN image encoder, providing supplementary local features for segmentation. DenseSAM replaces SAM's reliance on location cues with semantic guidance in dense scenes, introducing an efficient semantic injection module and a dual-head decoding structure to address the challenge of dense objects in pathological and remote sensing images. MatSAM develops an automatic material microstructure analysis model based on SAM, achieving efficient segmentation of structures such as grains and second-phase particles by integrating cue generation strategies and multi-scale feature fusion. µSAM provides a SAM-based microscopic image segmentation and tracking tool, unifying interactive and automatic segmentation of 2D / 3D / temporal data.
[0005] However, despite these SAM-based efforts in microstructure analysis, existing methods still fall short in accurately identifying fine structures such as grains, precipitates, and microcracks, and are susceptible to misjudgments in complex phase distribution scenarios.
[0006] For example, Chinese patent CN121527427A discloses a large model image segmentation method based on multi-scale fusion. This method has technical defects such as over-reliance on global semantic features while ignoring local edge details, and lack of adaptive mechanisms for different sample prediction difficulties.
[0007] Chinese patent CN121505257A discloses a SAM medical image segmentation method based on QR-KAN and MSMDA feature enhancement. This method has technical defects such as insufficient nonlinear fitting ability, weak extraction of small target features, and poor resistance to noise interference.
[0008] Chinese patent CN119205801A discloses a fast and memory-friendly image segmentation method based on the SAM model. However, this method suffers from technical defects such as difficulty in recognizing fine structures, instability of the prompting mechanism leading to misjudgments, difficulty in efficiently handling dense multi-target scenes, and deviation in small target detection.
[0009] Chinese patent CN120689610A discloses an automatic prompting ultrasound image segmentation method and system based on SAM. This method has technical defects such as excessive domain specificity, misfiltering of fine features, and insufficient resolution of dense small targets.
Summary of the Invention
[0010] The main objective of this invention is to address the technical problems of low contrast, difficulty in detecting small targets, and challenges in complex morphology in the microstructure analysis of zinc alloys by existing image segmentation techniques. In particular, existing SAM-based segmentation methods still have shortcomings in the recognition accuracy of fine structures (such as grains, precipitates, and microcracks) and are easily affected by misjudgments in complex phase distribution scenarios.
[0011] Low contrast: the gray-scale contrast between the matrix and the intermediate phase in zinc alloys is usually very small, especially in SEM backscattered electron images, where the phases differ by only a few gray levels.
[0012] Challenges in small object detection: Zinc alloys contain a large number of submicron-sized intermediate phases, which occupy a very small portion of the pixels in the image, carry negligible semantic and textural information, and are easily confused with the background texture.
[0013] Challenges of complex morphologies: The intermediate phases in zinc alloys exhibit complex and heterogeneous morphologies, including irregular forms (such as layered, spherical, dendritic, and acicular), often accompanied by overlap and aggregation.
[0014] Therefore, existing methods suffer from low accuracy in recognizing fine structures and are prone to misjudgment in complex phase distribution scenarios. To address these issues, a novel spatial context prior-guided SAM method is proposed. This method, named SCoP-SAM (Spatial Context Prior-guided SAM), extracts the gradient structure and grayscale attributes of intermediate phases as prior spatial context and integrates them into the encoder and decoder of SAM, providing prior perceptual cues for intermediate phase regions. SCoP-SAM accurately anchors the target intermediate phase regions during segmentation and fully depicts their complex boundary contours, thereby significantly improving segmentation accuracy.
[0015] The technical solution is as follows:
[0016] A novel method for intermediate phase segmentation of microstructure images of zinc alloy materials is disclosed. This method is a new spatial context prior-guided SAM method, SCoP-SAM. Specifically, the gradient structure and grayscale attribute priors of the intermediate phases are first extracted as spatial context priors (SCPs). Then, the SCPs are integrated into the prior perceptual cue encoder (PPE) and the prior enhancement mask decoder (PMD). This enables the method to achieve robust performance under the challenges of intermediate phase segmentation, including low inter-phase contrast, blurred or incomplete boundaries, high proportion of small-scale intermediate phases, morphological heterogeneity, and noise interference. Its core technologies include the spatial context prior (SCP) technology, the prior perceptual cue encoder (PPE) technology, and the prior enhancement mask decoder (PMD) technology.
[0017] Optionally, the spatial context prior (SCP) technology is as follows: a spatial context prior is constructed, and the SCP module adopts a learnable network with three parallel paths to process the original RGB image, gradient prior image and grayscale prior image respectively.
[0018] Optionally, the three paths are independent of each other and do not share weights. Each path is composed of three convolutional layers and two GELU nonlinear activation functions stacked alternately. The feature maps output by the three paths are added element by element and then fused into a unified spatial context prior representation. The result is a three-dimensional feature tensor with a fixed height, width and number of channels.
[0019] Optionally, the gradient prior map is generated by an adaptive edge detection algorithm: edge features are extracted by combining the Canny operator and the Otsu thresholding method, and the result with richer contours is selected as the gradient prior by comparing the number of contours, so as to better maintain the topological integrity and structural continuity of the intermediate phase edges.
[0020] The gray-scale prior image is modeled based on dominant color analysis: first, the distribution of black and white pixels in the image is statistically analyzed, and the target primary color (black or white) is determined according to the dominant color. Then, K-means clustering is performed on all pixels, and the cluster center that is closest to the primary color in Euclidean distance is selected. Finally, a binary region mask is generated around the center, thereby effectively capturing the clustering characteristics of the intermediate phase region in the gray-scale space.
[0021] Optionally, the prior awareness cue encoder PPE technology involves: after obtaining the spatial context prior SCP, embedding it into the subsequent prior awareness cue encoder PPE module to generate robust prior cues that guide downstream image coding.
[0022] Optionally, the prior-aware cue encoder PPE module introduces a learnable projection network to project the spatial context prior (SCP) onto a shared cue token; then, the shared cue token is added to several learnable tokens to generate a cue token with prior-aware capabilities, and this token is input into each Transformer encoder.
[0023] Optionally, the learnable projection network consists of a convolutional layer, a GELU activation function, an average pooling layer, and a multilayer perceptron (MLP); the prior-aware cue encoder (PPE) contains multiple cascaded Transformer encoders, each encoder taking the output of its previous stage as input and progressively updating the cue tokens with prior-aware capabilities during the encoding process; for the last Transformer encoder, only the prior-enhanced image embedding token in its output is retained, while the cue token portion is discarded.
[0024] Optionally, in the prior enhancement mask decoder PMD technology, the PMD module integrates spatial context priors and maps this prior-aware, token-guided image embedding to the final segmentation mask. After obtaining the image embedding, this method adds it to the spatial context prior to obtain the fused features, which are then input into a decoder containing two decoding blocks. Each decoding block sequentially includes a self-attention layer, a token-to-image cross-attention layer, a multilayer perceptron (MLP), and an image-to-token cross-attention layer.
[0025] Optionally, a set of learnable output tokens first undergoes a self-attention mechanism, and then serves as a query to focus on the fused image features, thereby capturing global image context information; the fused features act as keys and values in this process; after being updated by the MLP, the output tokens integrate the image context information; then, the fused features again serve as a query, focusing on the updated output tokens through an image-to-token cross-attention mechanism, injecting global guidance information into local spatial features, thereby obtaining an enhanced image representation; after processing by two layers of decoders, this enhanced representation is upsampled through two transposed convolutions with GELU activation functions and layer normalization to obtain a feature map with a smaller scale than the input image; simultaneously, the updated output tokens are fed into a token-to-image cross-attention layer, and then an upsampled image embedding is generated by a multilayer perceptron; finally, the upsampled embedding is added pointwise with the shared cue features, and the result is multiplied pointwise with the aforementioned small-scale feature map and shared cue features, thereby generating a foreground probability for each pixel and outputting the final segmentation mask.
[0026] Optionally, the novel method for intermediate phase segmentation of the microstructure image of zinc alloy materials further includes model training and model inference techniques; wherein:
[0027] Model training techniques: The overall training objective of SCoP-SAM is set as a linearly weighted multi-task loss function that integrates three complementary core loss components: a region overlap optimization component - Dice loss, a pixel-level binary classification component - binary cross-entropy loss, and an auxiliary intersection-union ratio (IoU) regression component for segmentation consistency; the model is trained using SAM ViT-H as the backbone network, starting from scratch on the IPSM-Bench training set;
[0028] Model inference techniques: Three pixel-level metrics are used: Intersection over Union (IoU), F1 score, and precision. The performance of the foreground and background categories for each metric is calculated separately, and the average values of the two categories are reported as mIoU, mF1, and mPre.
[0029] Technical principle of the invention:
[0030] The Spatial Context Prior (SCP) method of this invention generates a three-dimensional context representation that integrates multi-source prior information by fusing an RGB image, a gradient prior based on adaptive edge detection, and a grayscale prior based on dominant color clustering through three independent learnable convolutional networks.
[0031] The prior-aware cue encoder PPE of the present invention maps spatial context priors to shared cue tokens through a learnable projection network consisting of convolutional layers, GELU, average pooling, and MLP, and fuses them with the learnable tokens to generate prior-aware cue. After iterative optimization by a multi-level cascaded Transformer encoder, only the prior-enhanced image embedding in the final output is retained for downstream tasks.
[0032] The prior enhancement mask decoder PMD of the present invention integrates spatial context prior with image embedding, and iteratively optimizes learnable output tokens and image features using a bidirectional token-image cross-attention mechanism in two-level decoding blocks. Finally, it combines upsampled token-guided embedding, shared cue features and low-resolution feature maps, and generates pixel-level foreground probabilities through point-by-point addition and multiplication operations to output the final segmentation mask.
[0033] The above technical solution has at least the following advantages compared with the existing technology:
[0034] The present invention proposes a novel method for intermediate phase segmentation in microstructure images of zinc alloy materials, which can solve the technical problems of low contrast, difficulty in small target detection, and challenges in complex morphology in existing image segmentation techniques for zinc alloy microstructure analysis. In particular, it can address the shortcomings of existing SAM-based segmentation methods in the recognition accuracy of fine structures (such as grains, precipitates, and microcracks), and is less susceptible to misjudgment in complex phase distribution scenarios.
[0035] The method of this invention utilizes the gradient structure and grayscale characteristics of the intermediate phase to capture spatial context priors and integrates them into the entire SAM encoding and decoding process to provide prior perception cues for the intermediate phase region.
[0036] The method of this invention accurately anchors the intermediate phase region of the target during the segmentation process and completely depicts its complex boundary contour, thereby significantly improving the segmentation accuracy.
[0037] The overall training objective of the method of this invention is set as a linearly weighted multi-task loss function that integrates three complementary core loss components: a region overlap optimization component - Dice loss, a pixel-level binary classification component - binary cross-entropy loss, and an auxiliary intersection-union ratio (IoU) regression component for segmentation consistency.
[0038] In summary, compared with traditional methods, the method of this invention utilizes the gradient structure and grayscale characteristics of intermediate phases to capture the spatial context prior (SCP), and integrates it into the prior perceptual cue encoder (PPE) and prior enhancement mask decoder (PMD) of the entire SAM. This method is simple to operate, clearly depicts boundary contours, has high segmentation accuracy, and is conducive to large-scale industrial production and promotion.
Attached Figure Description
[0039] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0040] Figure 1 is a schematic diagram of the overall model architecture of the novel intermediate phase segmentation method for microstructure images of zinc alloy materials according to the present invention, which includes spatial context prior (SCP) technology, prior perceptual cue encoder (PPE) technology, and prior enhancement mask decoder (PMD) technology.
[0041] Figure 2 is a schematic diagram of the Spatial Context Prior (SCP) technology module in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0042] Figure 3 is a schematic diagram of the prior perception cue encoder (PPE) technology module in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0043] Figure 4 is a schematic diagram of the prior enhancement mask decoder (PMD) technology module in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0044] Figure 5 is a schematic diagram of the microstructure of 20 zinc alloy materials in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0045] Figure 6 is a schematic diagram showing the number of images of 20 zinc alloy materials in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0046] Figure 7 is a visual comparison of SEM and OM images of six typical zinc alloys in the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention.
[0047] Figure 8 is a comparison chart showing the results of SCoP-SAM, the novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to the present invention, compared with state-of-the-art methods.
Detailed Implementation
[0048] The technical solution of the present invention will now be described with reference to the accompanying drawings.
[0049] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.
[0050] In the embodiments of the present invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that when the distinction is not emphasized, their intended meanings are consistent.
[0051] In this embodiment of the invention, sometimes a subscripted form such as W_1 may be written in a non-subscript form such as W1. When the difference is not emphasized, the meaning they express is the same.
[0052] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.
[0053] A novel method for intermediate phase segmentation of microstructure images of zinc alloy materials is disclosed. This method is a novel spatial context prior-guided SAM method, SCoP-SAM. Specifically, with reference to Figure 1, the method first extracts the gradient structure and grayscale attribute priors of the intermediate phase as the spatial context prior (SCP). Then, the SCP is integrated into the prior perceptual cue encoder (PPE) and the prior enhancement mask decoder (PMD). This enables the method to achieve robust performance under the challenges of intermediate phase segmentation, including low inter-phase contrast, blurred or incomplete boundaries, high proportion of small-scale intermediate phases, morphological heterogeneity, and noise interference. Its core technologies include the spatial context prior (SCP) technology, the prior perceptual cue encoder (PPE) technology, and the prior enhancement mask decoder (PMD) technology.
[0054] Specifically, as shown in Figure 2, the Spatial Context Prior (SCP) technology is as follows: a spatial context prior is constructed, and the SCP module adopts a learnable network with three parallel paths to process the original RGB image, gradient prior image and grayscale prior image respectively.
[0055] Specifically, the three paths are independent of each other and do not share weights. Each path is composed of three convolutional layers and two GELU nonlinear activation functions stacked alternately. The feature maps output by the three paths are added element by element and then fused into a unified spatial context prior representation, which is a three-dimensional feature tensor with a fixed height, width and number of channels.
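The three-path fusion described above can be illustrated with a minimal standard-library sketch. It is not the patented implementation: it assumes single-channel inputs, 3x3 kernels with zero padding, and random weights, none of which are specified in the text. The names `conv3x3`, `make_path`, `run_path`, and `spatial_context_prior` are hypothetical.

```python
import math
import random


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def conv3x3(img, kernel):
    # Single-channel 3x3 convolution (cross-correlation, as in deep learning
    # frameworks) with zero padding, so height and width are preserved.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * img[y][x]
    return out


def make_path(rng):
    # Three 3x3 kernels per path; every path draws its own weights (no sharing).
    return [[[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
            for _ in range(3)]


def run_path(img, kernels):
    # conv -> GELU -> conv -> GELU -> conv (three convolutions, two activations).
    x = conv3x3(img, kernels[0])
    x = [[gelu(v) for v in row] for row in x]
    x = conv3x3(x, kernels[1])
    x = [[gelu(v) for v in row] for row in x]
    return conv3x3(x, kernels[2])


def spatial_context_prior(image, grad_prior, gray_prior, paths):
    # Element-wise sum of the three path outputs.
    inputs = (image, grad_prior, gray_prior)
    outs = [run_path(m, k) for m, k in zip(inputs, paths)]
    h, w = len(image), len(image[0])
    return [[sum(o[i][j] for o in outs) for j in range(w)] for i in range(h)]


rng = random.Random(0)
paths = [make_path(rng) for _ in range(3)]
image = [[rng.random() for _ in range(5)] for _ in range(4)]
grad = [[float((i + j) % 2) for j in range(5)] for i in range(4)]
gray = [[float(i < 2) for _ in range(5)] for i in range(4)]
scp = spatial_context_prior(image, grad, gray, paths)
```

In a real model each path would operate on multi-channel tensors and the weights would be learned; the sketch only shows the layer ordering and the element-wise fusion.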
[0056] Specifically, the gradient prior map is generated by an adaptive edge detection algorithm: edge features are extracted by combining the Canny operator and the Otsu thresholding method, and the result with richer contours is selected as the gradient prior by comparing the number of contours, so as to better maintain the topological integrity and structural continuity of the intermediate phase edges.
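As a rough illustration of the selection rule above, the following standard-library sketch computes an Otsu threshold and picks, between two candidate binary maps, the one with more connected regions. Several points are assumptions rather than details from the text: the number of 8-connected components is used as a stand-in for the contour count, the Canny candidate is assumed to be computed elsewhere (here a hand-written `canny_like` map), and ties go to the first candidate. All function names are hypothetical.

```python
from collections import deque


def otsu_threshold(pixels):
    # Otsu's method: choose the gray level that maximises between-class variance.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * n for level, n in enumerate(hist))
    w_b, sum_b = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0 or w_f == 0:
            continue
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def count_components(mask):
    # Number of 8-connected foreground regions (proxy for the contour count).
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            count += 1
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def select_gradient_prior(candidate_a, candidate_b):
    # Keep the candidate with more regions; ties go to the first candidate.
    if count_components(candidate_a) >= count_components(candidate_b):
        return candidate_a
    return candidate_b


gray = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 10, 10, 210, 10],
    [10, 200, 10, 10, 210, 10],
    [10, 10, 10, 10, 10, 10],
]
t = otsu_threshold([p for row in gray for p in row])
otsu_mask = [[1 if p > t else 0 for p in row] for row in gray]
canny_like = [  # hypothetical precomputed edge map standing in for Canny output
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
gradient_prior = select_gradient_prior(otsu_mask, canny_like)
```

Here the Otsu mask contains two separate regions and the hand-written map only one, so the Otsu mask is kept as the gradient prior.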
[0057] The gray-scale prior image is modeled based on dominant color analysis: first, the distribution of black and white pixels in the image is statistically analyzed, and the target primary color (black or white) is determined according to the dominant color. Then, K-means clustering is performed on all pixels, and the cluster center that is closest to the primary color in Euclidean distance is selected. Finally, a binary region mask is generated around the center, thereby effectively capturing the clustering characteristics of the intermediate phase region in the gray-scale space.
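The grayscale prior steps above can be sketched as follows, using only the standard library. The text does not say how the target primary color follows from the dominant color, how many clusters are used, or how the mask is built "around the center", so the sketch makes labeled assumptions: the target defaults to the opposite of the dominant color (controlled by the hypothetical `target_is_dominant` flag), K-means runs on scalar gray values with evenly spaced initial centers, and the mask is the set of pixels assigned to the selected cluster.

```python
def dominant_color(pixels, split=128):
    # Count dark vs. bright pixels; return 0 (black) or 255 (white).
    dark = sum(1 for p in pixels if p < split)
    return 0 if dark >= len(pixels) - dark else 255


def nearest(value, centers):
    # Index of the center closest to value (1-D Euclidean distance).
    return min(range(len(centers)), key=lambda c: abs(value - centers[c]))


def kmeans_1d(pixels, k, iters=20):
    # Plain Lloyd iterations on gray values, with evenly spaced initial centers.
    lo, hi = min(pixels), max(pixels)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def grayscale_prior(img, k=2, target_is_dominant=False):
    pixels = [p for row in img for p in row]
    dom = dominant_color(pixels)
    # Assumption: the intermediate phase is the non-dominant color by default.
    target = dom if target_is_dominant else 255 - dom
    centers = kmeans_1d(pixels, k)
    chosen = nearest(target, centers)
    # Binary mask: pixels assigned to the cluster closest to the target color.
    return [[1 if nearest(p, centers) == chosen else 0 for p in row]
            for row in img]


img = [
    [20, 22, 18, 21],
    [19, 230, 228, 20],
    [21, 232, 229, 22],
    [20, 19, 21, 18],
]
mask = grayscale_prior(img)
```

On this toy image the dark matrix dominates, so the four bright pixels in the center form the mask.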
[0058] Specifically, the prior awareness cue encoder PPE technology involves: after obtaining the spatial context prior (SCP), embedding it into the subsequent prior awareness cue encoder PPE module to generate robust prior cues that guide downstream image coding.
[0059] Specifically, as shown in Figure 3, the prior awareness cue encoder PPE module introduces a learnable projection network to project the spatial context prior (SCP) onto the shared cue token; then, the shared cue token is added to several learnable tokens to generate a cue token with prior awareness capability, and this token is input into each Transformer encoder.
[0060] Specifically, the learnable projection network consists of a convolutional layer, a GELU activation function, an average pooling layer, and a multilayer perceptron (MLP). As shown in Figure 3, the prior-aware cue encoder PPE contains multiple cascaded Transformer encoders. Each encoder takes the output of its previous stage as input and updates the cue tokens with prior-aware capabilities progressively during the encoding process. For the last Transformer encoder, only the prior-enhanced image embedding token in its output is retained, while the cue token portion is discarded.
[0061] Specifically, as shown in Figure 4, the PMD module in the prior enhancement mask decoder technique integrates the spatial context prior (SCP) and maps this prior-aware, token-guided image embedding to the final segmentation mask. After obtaining the image embedding, this method adds it to the spatial context prior (SCP) to obtain the fused features, which are then input into a decoder containing two decoding blocks. Each decoding block sequentially includes a self-attention layer, a token-to-image cross-attention layer, a multilayer perceptron (MLP), and an image-to-token cross-attention layer.
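The token flow through the PPE can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the disclosed network: the convolution is taken to be 1x1 (a per-pixel linear map), the pooling is global, the MLP has two bias-free layers with a GELU in between, and `toy_encoder` is a placeholder for a Transformer encoder. All names and sizes are hypothetical.

```python
import math
import random


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weights, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def shared_cue_token(scp_pixels, w_conv, w1, w2):
    # scp_pixels: one channel vector per pixel (flattened H*W, channels last).
    feats = [[gelu(v) for v in matvec(w_conv, px)] for px in scp_pixels]  # 1x1 conv + GELU
    n = len(feats)
    pooled = [sum(f[c] for f in feats) / n for c in range(len(feats[0]))]  # average pooling
    hidden = [gelu(v) for v in matvec(w1, pooled)]                         # MLP, layer 1
    return matvec(w2, hidden)                                              # MLP, layer 2


def prior_aware_tokens(shared, learnable_tokens):
    # Add the shared cue token to every learnable token.
    return [[s + t for s, t in zip(shared, tok)] for tok in learnable_tokens]


def toy_encoder(tokens):
    # Placeholder for a Transformer encoder: mixes the mean token into each token.
    d = len(tokens[0])
    mean = [sum(t[c] for t in tokens) / len(tokens) for c in range(d)]
    return [[t[c] + 0.1 * mean[c] for c in range(d)] for t in tokens]


def run_cascade(image_tokens, cue_tokens, encoders):
    # Each stage consumes the previous stage's output; after the last stage
    # the cue tokens are discarded and only image tokens are kept.
    n_cue = len(cue_tokens)
    tokens = cue_tokens + image_tokens
    for encoder in encoders:
        tokens = encoder(tokens)
    return tokens[n_cue:]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
scp_pixels = rand_matrix(6, 3, rng)     # 6 pixels, 3 channels
w_conv = rand_matrix(3, 3, rng)
w1, w2 = rand_matrix(4, 3, rng), rand_matrix(4, 4, rng)
learnable = rand_matrix(2, 4, rng)      # 2 learnable tokens of width 4
image_tokens = rand_matrix(6, 4, rng)
shared = shared_cue_token(scp_pixels, w_conv, w1, w2)
cue_tokens = prior_aware_tokens(shared, learnable)
embedding = run_cascade(image_tokens, cue_tokens, [toy_encoder, toy_encoder])
```

Whether the cue tokens are re-injected at every encoder or only propagated from stage to stage is not fully clear from the text; the sketch propagates them.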
[0062] Specifically, as shown in Figure 4, a set of learnable output tokens first passes through a self-attention mechanism and then serves as a query that attends to the fused image features, thereby capturing global image context information; the fused features act as keys and values in this process. After being updated by the MLP, the output tokens integrate the image context information. Then, the fused features in turn serve as a query, attending to the updated output tokens through an image-to-token cross-attention mechanism, which injects global guidance information into local spatial features and yields an enhanced image representation. After processing by the two decoding blocks, this enhanced representation is upsampled through two transposed convolutions with GELU activation functions and layer normalization to obtain a feature map with a smaller scale than the input image. Simultaneously, the updated output tokens are fed into a token-to-image cross-attention layer, and then an upsampled image embedding is generated by a multilayer perceptron. Finally, the upsampled embedding is added pointwise with the shared cue features, and the result is multiplied pointwise with the aforementioned small-scale feature map and shared cue features, thereby generating a foreground probability for each pixel and outputting the final segmentation mask.
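The order of operations inside a decoding block can be illustrated with a small standard-library sketch. It assumes single-head attention without learned query/key/value projections, residual additions after each step, no layer normalization, and a ReLU placeholder for the MLP; none of these details are stated in the text, and the upsampling and final mask combination are omitted. The names `attention`, `decoding_block`, and `decode` are hypothetical.

```python
import math


def attention(queries, keys, values):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one head.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        out.append([sum(w / norm * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def add(a, b):
    # Element-wise sum of two equally shaped lists of vectors.
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def decoding_block(tokens, image, mlp):
    tokens = add(tokens, attention(tokens, tokens, tokens))  # self-attention on tokens
    tokens = add(tokens, attention(tokens, image, image))    # token-to-image cross-attention
    tokens = add(tokens, [mlp(t) for t in tokens])           # MLP update of tokens
    image = add(image, attention(image, tokens, tokens))     # image-to-token cross-attention
    return tokens, image


def decode(image_embedding, scp, output_tokens, mlp, num_blocks=2):
    # Fuse the image embedding with the spatial context prior by addition,
    # then run the decoding blocks in sequence.
    fused = add(image_embedding, scp)
    tokens = output_tokens
    for _ in range(num_blocks):
        tokens, fused = decoding_block(tokens, fused, mlp)
    return tokens, fused


def relu_mlp(token):
    # Placeholder MLP: a single ReLU, standing in for a learned network.
    return [max(0.0, v) for v in token]


image_embedding = [[0.1 * (i + c) for c in range(4)] for i in range(6)]
scp = [[0.05 * (i - c) for c in range(4)] for i in range(6)]
output_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
tokens, enhanced = decode(image_embedding, scp, output_tokens, relu_mlp)
```

In the token-to-image step the output tokens are the queries and the fused features supply keys and values; the image-to-token step reverses those roles, which is what makes the attention bidirectional.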
[0063] Specifically, the novel method for intermediate phase segmentation of the microstructure image of the zinc alloy material also includes model training and model inference techniques; wherein:
[0064] Model training techniques: The overall training objective of SCoP-SAM is set as a linearly weighted multi-task loss function that integrates three complementary core loss components: a region overlap optimization component - Dice loss, a pixel-level binary classification component - binary cross-entropy loss, and an auxiliary intersection-union ratio (IoU) regression component for segmentation consistency; the model is trained using SAM ViT-H as the backbone network, starting from scratch on the IPSM-Bench training set;
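A linearly weighted combination of the three loss terms can be sketched as below. The weights, the smoothing constants, and the exact form of the IoU regression term are not given in the text; the sketch assumes equal weights and a squared error between a predicted IoU score and the IoU of the thresholded mask, which is one common choice. The function names are hypothetical.

```python
import math


def dice_loss(probs, target, eps=1e-6):
    # 1 - Dice coefficient on soft probabilities.
    inter = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(target) + eps)


def bce_loss(probs, target, eps=1e-7):
    # Mean binary cross-entropy, with probabilities clamped away from 0 and 1.
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def hard_iou(probs, target, threshold=0.5):
    # IoU between the thresholded prediction and the ground-truth mask.
    pred = [1 if p > threshold else 0 for p in probs]
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0


def combined_loss(probs, target, predicted_iou, weights=(1.0, 1.0, 1.0)):
    # Assumed IoU regression term: squared error against the actual mask IoU.
    iou_reg = (predicted_iou - hard_iou(probs, target)) ** 2
    return (weights[0] * dice_loss(probs, target)
            + weights[1] * bce_loss(probs, target)
            + weights[2] * iou_reg)


target = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], target, predicted_iou=0.95)
bad = combined_loss([0.2, 0.9, 0.1, 0.8], target, predicted_iou=0.95)
```

A prediction that agrees with the ground truth yields a lower combined loss than one that inverts it, and the loss approaches zero for a perfect mask with a correct IoU estimate.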
[0065] Model inference techniques: Three pixel-level metrics are used: Intersection over Union (IoU), F1 score, and precision. The performance of the foreground and background categories for each metric is calculated separately, and the average values of the two categories are reported as mIoU, mF1, and mPre.
[0066] The specific implementation uses 20 types of zinc alloy materials, totaling 1054 images, as shown in Figure 5.
[0067] The number of images for each zinc alloy is shown in Figure 6. This zinc alloy material image dataset is named IPSM-Bench. The training set contains 800 samples: 246 from OM images and 554 from SEM images; the test set contains 254 samples: 132 from OM images and 122 from SEM images.
[0068] The inventors used three pixel-level metrics: Intersection over Union (IoU), F1 score, and precision.
[0069] Specifically, the inventors calculated the performance of the foreground and background categories on each metric and reported the average values of the two categories, denoted as mIoU, mF1, and mPre, respectively.
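The class-averaged metrics can be computed as in the following sketch, which treats foreground (1) and background (0) as two classes and averages each metric over them. Whether the averages are taken per image or over the whole test set is not stated, and the convention for an empty class (returning 1.0 when a denominator is zero) is an assumption. The function names are hypothetical.

```python
def class_metrics(pred, truth, cls):
    # Per-class IoU, F1, and precision from pixel-level counts.
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return iou, f1, precision


def mean_metrics(pred, truth):
    # Average each metric over the foreground and background classes.
    fg = class_metrics(pred, truth, 1)
    bg = class_metrics(pred, truth, 0)
    return tuple((a + b) / 2 for a, b in zip(fg, bg))


m_iou, m_f1, m_pre = mean_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

For this four-pixel example the foreground scores are IoU 1/2, F1 2/3, and precision 1/2, the background scores are IoU 2/3, F1 4/5, and precision 1, so the averages are mIoU = 7/12, mF1 = 11/15, and mPre = 3/4.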
[0070] To ensure the fairness of the comparative experiments, all experiments followed a consistent configuration: 1) all methods used SAM ViT-H as the backbone network; 2) all methods were trained from scratch on the IPSM-Bench training set using their official open-source implementations and settings.
[0071] The inventors trained SCoP-SAM on four NVIDIA A800 GPUs with a batch size of 4, for 50 epochs, using the AdamW optimizer and a learning rate of 1e-4. The inventors compared the described SCoP-SAM with state-of-the-art methods; the results are shown in Figure 8.
[0072] In addition, the inventors provided visualization analysis of SEM and OM images of six typical zinc alloys, as shown in Figure 7.
[0073] SEM and OM images present different challenges: SEM images suffer from low contrast and background noise, while OM images reveal the topological structure of complex microstructure phases.
[0074] Compared to MatSAM, DenseSAM, and µSAM, which suffer from oversegmentation, undersegmentation, or even complete failure, the inventors' SCoP-SAM can stably and accurately recover irregular shapes and detailed boundaries, achieve high consistency with the manual ground-truth (GT) annotations across all samples, and demonstrate excellent generalization ability and robustness across materials and imaging modalities.
[0075] The present invention proposes a novel method for intermediate phase segmentation in microstructure images of zinc alloy materials, which can solve the technical problems of low contrast, difficulty in small target detection, and challenges in complex morphology in existing image segmentation techniques for zinc alloy microstructure analysis. In particular, it can address the shortcomings of existing SAM-based segmentation methods in the recognition accuracy of fine structures (such as grains, precipitates, and microcracks), and is less susceptible to misjudgment in complex phase distribution scenarios.
[0076] The method of this invention utilizes the gradient structure and grayscale characteristics of the intermediate phase to capture spatial context priors and integrates them into the entire SAM encoding and decoding process to provide prior perception cues for the intermediate phase region.
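The grayscale prior is described in the claims of this document as dominant color analysis followed by K-means clustering and selection of the cluster center closest to the target primary color. The sketch below is one possible reading of those steps, not the inventors' implementation. Several points are assumptions made here: clustering is done on scalar intensities, the black/white split is at 128, the target primary color is taken to be the dominant one (the hypothetical flag `target_is_dominant` flips this), and the binary mask consists of the pixels assigned to the chosen cluster. The gradient prior (Canny and Otsu) is not covered.

```python
import numpy as np

def kmeans_1d(values, k=3, iters=20):
    """Plain Lloyd's k-means on scalar intensities; centers start evenly spaced (assumption)."""
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave an empty cluster's center unchanged
                centers[j] = values[labels == j].mean()
    labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
    return centers, labels

def grayscale_prior(gray, k=3, split=128, target_is_dominant=True):
    """Binary mask of the intensity cluster closest to the target primary color (0 or 255)."""
    flat = gray.astype(np.float64).ravel()
    n_black = int((flat < split).sum())        # count of dark pixels
    n_white = flat.size - n_black              # count of bright pixels
    dominant = 0.0 if n_black >= n_white else 255.0
    target = dominant if target_is_dominant else 255.0 - dominant
    centers, labels = kmeans_1d(flat, k)
    chosen = int(np.abs(centers - target).argmin())   # nearest center (1-D Euclidean distance)
    return (labels == chosen).reshape(gray.shape).astype(np.uint8)
```

Whether the intermediate phase corresponds to the dominant or the minority color depends on the imaging mode, which is why the choice is exposed as a flag in this sketch rather than fixed.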
[0077] The method of this invention accurately anchors the intermediate phase region of the target during the segmentation process and completely depicts its complex boundary contour, thereby significantly improving the segmentation accuracy.
[0078] The overall training objective of the method of this invention is a linearly weighted multi-task loss function that integrates three complementary core loss components: a region overlap optimization component (Dice loss), a pixel-level binary classification component (binary cross-entropy loss), and an auxiliary intersection over union (IoU) regression component for segmentation consistency.
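A minimal sketch of such a linearly weighted objective is given below. The source does not state the weights or the exact form of the IoU regression term, so both are assumptions: the weights `w_dice`, `w_bce`, and `w_iou` default to 1.0, and the IoU term is taken to be the squared error between a predicted IoU score `pred_iou` and the actual IoU of the thresholded mask, as is common for SAM-style IoU heads. NumPy is used only to show the arithmetic; a real training loop would use a differentiable framework.

```python
import numpy as np

def dice_loss(prob, gt, eps=1e-6):
    """1 - Dice coefficient between foreground probabilities and a 0/1 ground truth."""
    inter = (prob * gt).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + gt.sum() + eps))

def bce_loss(prob, gt, eps=1e-7):
    """Mean pixel-wise binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p)).mean())

def iou_regression_loss(pred_iou, prob, gt, threshold=0.5, eps=1e-6):
    """Squared error between the predicted IoU score and the actual mask IoU (assumed form)."""
    mask = prob > threshold
    g = gt > 0.5
    inter = np.logical_and(mask, g).sum()
    union = np.logical_or(mask, g).sum()
    true_iou = (inter + eps) / (union + eps)
    return float((pred_iou - true_iou) ** 2)

def total_loss(prob, gt, pred_iou, w_dice=1.0, w_bce=1.0, w_iou=1.0):
    """Linear combination of the three components; the weights are hypothetical."""
    return (w_dice * dice_loss(prob, gt)
            + w_bce * bce_loss(prob, gt)
            + w_iou * iou_regression_loss(pred_iou, prob, gt))
```

The Dice term rewards region overlap and is less sensitive to class imbalance, which matters when small intermediate phases occupy few pixels, while the cross-entropy term supervises every pixel individually.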
[0079] In summary, compared with traditional methods, the method of this invention utilizes the gradient structure and grayscale characteristics of intermediate phases to capture the spatial context prior (SCP), and integrates it into the prior perceptual cue encoder (PPE) and prior enhancement mask decoder (PMD) of the entire SAM. This method is simple to operate, clearly depicts boundary contours, has high segmentation accuracy, and is conducive to large-scale industrial production and promotion.
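The way the SCP feeds the PPE can be sketched at the level of tensor shapes. According to the claims, three independent paths (three convolutions alternating with two GELU activations each) process the RGB, gradient prior, and grayscale prior images and are summed element by element. A projection network (convolution, GELU, average pooling, MLP) then turns the SCP into a shared cue token that is added to several learnable tokens. The code below is an illustrative reading only: the convolutions are simplified to pointwise (1x1) operations, the pooling is global, the MLP has two layers, and all function names, channel counts, and weight shapes are hypothetical.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def make_path(rng, in_ch, hidden, out_ch):
    """Weights of one path: three pointwise convolutions, not shared with the other paths."""
    dims = [in_ch, hidden, hidden, out_ch]
    return [rng.standard_normal((dims[i], dims[i + 1])) * 0.1 for i in range(3)]

def run_path(x, weights):
    """conv -> GELU -> conv -> GELU -> conv on an (H, W, C) array."""
    w1, w2, w3 = weights
    return gelu(gelu(x @ w1) @ w2) @ w3

def spatial_context_prior(rgb, grad, gray, paths):
    """Element-wise sum of the three path outputs: an (H, W, C_out) tensor."""
    outs = [run_path(x, w) for x, w in zip((rgb, grad, gray), paths)]
    return outs[0] + outs[1] + outs[2]

def shared_cue_token(scp, w_conv, w_mlp1, w_mlp2):
    """Projection network: conv (1x1) -> GELU -> global average pool -> two-layer MLP."""
    feat = gelu(scp @ w_conv)            # (H, W, C')
    pooled = feat.mean(axis=(0, 1))      # (C',)
    return gelu(pooled @ w_mlp1) @ w_mlp2   # (D,)

def prior_aware_cue_tokens(shared, learnable_tokens):
    """Add the shared cue token to each of the N learnable tokens of shape (N, D)."""
    return learnable_tokens + shared[None, :]
```

In use, the resulting cue tokens would be concatenated with the image tokens before the Transformer encoders, and only the image tokens would be kept after the last encoder, as the claims state.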
[0080] It should be understood that the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. A and B can be singular or plural. Additionally, the character " / " in this article generally indicates an "or" relationship between the preceding and following related objects, but it can also represent an "and / or" relationship. Please refer to the context for a more accurate understanding.
[0081] In this invention, "at least one" means one or more, and "more than one" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be a single item or multiple items.
[0082] It should be understood that, in various embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0083] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A novel method for intermediate phase segmentation in microstructure images of zinc alloy materials, characterized in that the method is a spatial context prior guided SAM method, SCoP-SAM. Specifically, the gradient structure and grayscale attribute priors of the intermediate phase are first extracted as the spatial context prior (SCP). The SCP is then integrated into the prior perceptual cue encoder (PPE) and the prior enhancement mask decoder (PMD), so that the method achieves robust performance under the challenges of intermediate phase segmentation, including low inter-phase contrast, blurred or incomplete boundaries, a high proportion of small-scale intermediate phases, morphological heterogeneity, and noise interference. Its core technologies include Spatial Context Prior (SCP) technology, Prior Perception Cueing Encoder (PPE) technology, and Prior Enhancement Mask Decoder (PMD) technology.
2. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 1, characterized in that the Spatial Context Prior (SCP) technology involves constructing a spatial context prior: the SCP module employs a learnable network with three parallel paths that process the original RGB image, the gradient prior image, and the grayscale prior image, respectively.
3. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 2, characterized in that the three paths are independent of each other and do not share weights. Each path consists of three convolutional layers and two GELU nonlinear activation functions stacked alternately. The feature maps output by the three paths are added element by element to fuse them into a unified spatial context prior representation, which is a three-dimensional feature tensor with a fixed height, width, and number of channels.
4. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 3, characterized in that the gradient prior image is generated by an adaptive edge detection algorithm: edge features are extracted with both the Canny operator and the Otsu thresholding method, the number of contours in the two results is compared, and the result with richer contours is selected as the gradient prior, so as to better maintain the topological integrity and structural continuity of the intermediate phase edges. The grayscale prior image is modeled based on dominant color analysis: first, the distribution of black and white pixels in the image is statistically analyzed, and the target primary color (black or white) is determined according to the dominant color. Then, K-means clustering is performed on all pixels, and the cluster center closest to the primary color in Euclidean distance is selected. Finally, a binary region mask is generated around that center, thereby effectively capturing the clustering characteristics of the intermediate phase region in grayscale space.
5. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 1, characterized in that the aforementioned Prior Perception Cueing Encoder (PPE) technology involves: after the spatial context prior (SCP) is obtained, embedding it into the subsequent PPE module to generate robust prior cues that guide downstream image encoding.
6. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 5, characterized in that the PPE module introduces a learnable projection network that projects the spatial context prior (SCP) onto a shared cue token. The shared cue token is then added to several learnable tokens to generate prior-aware cue tokens, which are input into each Transformer encoder.
7. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 6, characterized in that the learnable projection network consists of a convolutional layer, a GELU activation function, an average pooling layer, and a multilayer perceptron (MLP). The PPE contains multiple cascaded Transformer encoders, each of which takes the output of the previous stage as input and progressively updates the prior-aware cue tokens during encoding. For the last Transformer encoder, only the prior-enhanced image embedding tokens in its output are retained, while the cue tokens are discarded.
8. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 1, characterized in that the PMD module in the aforementioned Prior Enhancement Mask Decoder (PMD) technology integrates the spatial context prior and maps the image embedding guided by the prior-aware cue tokens into the final segmentation mask. After obtaining the image embedding, the method adds it to the spatial context prior to obtain the fused features, which are then input into a decoder containing two decoding blocks. Each decoding block contains, in sequence, a self-attention layer, a token-to-image cross-attention layer, a multilayer perceptron (MLP), and an image-to-token cross-attention layer.
9. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 8, characterized in that a set of learnable output tokens first passes through a self-attention mechanism and then serves as the query attending to the fused image features, thereby capturing global image context information; the fused features act as keys and values in this process. After being updated by the MLP, the output tokens incorporate the contextual information of the image. The fused features are then used as the query, attending to the updated output tokens through an image-to-token cross-attention mechanism, which injects global guidance information into local spatial features to obtain an enhanced image representation. After processing by the two decoding blocks, this enhanced representation is upsampled through two transposed convolutions with GELU activation functions and layer normalization to obtain a feature map with a smaller scale than the input image. At the same time, the updated output tokens are fed into a token-to-image cross-attention layer, and an upsampled image embedding is then generated by a multilayer perceptron. Finally, the upsampled embedding is added pointwise to the shared cue features, and the result is multiplied pointwise with the aforementioned small-scale feature map and the shared cue features to generate a foreground probability for each pixel, outputting the final segmentation mask.
10. The novel method for intermediate phase segmentation of microstructure images of zinc alloy materials according to claim 1, characterized in that the method further includes model training and model inference techniques, wherein: Model training techniques: the overall training objective of SCoP-SAM is set as a linearly weighted multi-task loss function that integrates three complementary core loss components: a region overlap optimization component (Dice loss), a pixel-level binary classification component (binary cross-entropy loss), and an auxiliary intersection over union (IoU) regression component for segmentation consistency; the model is trained from scratch on the IPSM-Bench training set using SAM ViT-H as the backbone network. Model inference techniques: three pixel-level metrics are used, namely intersection over union (IoU), F1 score, and precision. The performance of the foreground and background categories on each metric is calculated separately, and the averages over the two categories are reported as mIoU, mF1, and mPre.