An intelligent identification method for salt cavern shape based on edge optimization and evolutionary constraint
By constructing the EO-MMDC-UNet model and training it with an edge-aware hybrid loss function, the invention addresses the low efficiency and insufficient accuracy of salt cavern morphology analysis. It enables high-precision segmentation of salt cavern images and extraction of multi-dimensional morphological parameters, supports analysis of cavity evolution patterns, and meets the needs of engineering applications.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: ENERGY RES INST OF JIANGXI ACAD OF SCI
- Filing Date: 2026-05-19
- Publication Date: 2026-06-19
AI Technical Summary
In the existing technology, salt cavern morphology analysis is inefficient and highly subjective. Traditional image processing methods cannot cope with the blurred boundaries and complex shapes of salt cavities, and general-purpose deep learning models segment salt cavern images with insufficient accuracy and cannot extract morphological parameters or quantify evolution trends as engineering analysis requires.
An image segmentation model with edge optimization and dense multi-module connections (EO-MMDC-UNet) is constructed. Using a hybrid annotation strategy, an edge-aware hybrid loss function, and multi-scale edge feature fusion, it achieves high-precision segmentation of salt cavern images and automatic extraction of multi-dimensional morphological parameters. The evolution law of the cavity is then analyzed using time-series data.
The method achieves high-precision intelligent segmentation and integrated processing of salt cavern images and improves the model's adaptability and robustness under the special imaging conditions of salt cavern experiments. It automatically extracts the geometric morphology parameters of the cavity and analyzes their evolution trends, meeting the timeliness requirements of engineering applications.
Figure CN122244689A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of deep learning and geological engineering, and in particular relates to an intelligent recognition method for salt cavern morphology based on edge optimization and evolutionary constraints.
Background Technology
[0002] Salt caverns are underground storage spaces formed by solution mining (water dissolution) of salt rock and are widely used for natural gas storage, compressed air energy storage, and oil storage. In layered salt rock strata, the commonly developed poorly soluble interlayers such as anhydrite and mudstone are susceptible to disturbance by dissolution during cavern construction, leading to local instability and collapse. As a result, the cavern outline deviates from its ideal shape and exhibits complex features such as local protrusions, depressions, and eccentric expansion. These morphological deviations directly affect the assessment of effective volume, airtightness, and long-term stability of the storage cavern. High-precision identification, quantitative characterization, and temporal evolution analysis of salt cavern morphology are therefore of significant engineering importance for optimizing cavern construction and ensuring storage safety.
[0003] At present, cavity morphology in physical simulation experiments of solution mining in layered salt rock is analyzed mainly by hand: the cavity outline is delineated visually and basic parameters such as area and perimeter are estimated. This approach is inherently inefficient, subjective, and inconsistent, and it cannot support rapid batch analysis of large volumes of time-series images. Moreover, manual methods usually yield only a few macroscopic parameters; they do not systematically extract deeper morphological features such as roundness, eccentricity, and edge smoothness, let alone automate trend analysis of how these parameters change over time.
[0004] Notably, conventional automated image processing methods such as threshold segmentation and edge detection have not yet been applied effectively in this field, mainly because of four characteristics of images from salt cavern solution-mining experiments. First, cavity edges usually show gradual grayscale transitions rather than sharp changes, which makes boundary localization difficult. Second, cavity morphology is affected by multiple factors, including water flow rate, interlayer properties and thickness distribution, and collapse behavior, so it is highly irregular and changes in distinct phases. Third, the cavity grows from micro-pores to large-scale contours and therefore exhibits pronounced multi-scale features. Finally, the images contain complex background interference such as interlayer textures, brine reflections, uneven illumination, and parts of the experimental apparatus. Together, these factors make traditional image processing methods difficult to apply directly.
[0005] In recent years, deep learning models based on convolutional neural networks have performed well in image segmentation. However, existing general-purpose models such as DeepLabV3+ are designed mainly for targets with clear boundaries and stable textures and adapt poorly to salt cavern images, which are shaped by pronounced geological evolution. Experiments show that the segmentation accuracy of such models on salt cavern images remains insufficient, with an IoU of 0.8877, indicating a limited ability to model targets with weak boundaries and variable shapes. More importantly, existing models typically output only pixel-level segmentation masks and cannot extract the specific morphological parameters or quantify the evolution trends required for salt cavern engineering analysis, so they fail to make the leap from "segmentable" to "interpretable, quantifiable, and predictable." There is therefore an urgent need for a dedicated intelligent recognition method that integrates the feature modeling capability of deep learning with the representation requirements of salt cavern evolution mechanisms.
Summary of the Invention
[0006] The purpose of this invention is to provide an intelligent salt cavern morphology recognition method based on edge optimization and evolutionary constraints. The method addresses the following problems: manual analysis is inefficient and subjective; traditional image processing methods fail because of blurred salt cavern boundaries, complex morphological evolution, and background interference; general deep learning models adapt poorly to the weak boundaries of salt cavern images and have limited segmentation accuracy; and existing approaches cannot automatically extract morphological parameters or evaluate their evolution for engineering analysis. The invention provides integrated processing of salt cavern images, from high-precision intelligent segmentation and automatic extraction of multi-dimensional morphological parameters to analysis of cavity evolution laws.
[0007] The technical solution adopted in this invention is a method for intelligent recognition of salt cavern morphology based on edge optimization and evolutionary constraints, comprising the following steps:
[0008] Step S1: Construct an image dataset from physical simulation experiments of solution mining in layered salt rock, and label the cavity regions in the images using a three-level hybrid annotation strategy;
[0009] Step S2: Based on the image dataset, construct the EO-MMDC-UNet image segmentation model with edge optimization and dense multi-module connections; the model consists of an encoder module, a bottleneck layer module, a decoder module, a multi-scale edge feature fusion module, and a dual-branch output module;
[0010] Step S3: Train the model using an edge-aware hybrid loss function; the edge-aware hybrid loss function is composed of weighted loss components of Dice loss, binary cross-entropy loss, edge consistency loss, and edge refinement loss;
[0011] Step S4: Preprocess the input image and process it using the model trained in Step S3 to output the main segmentation map and the edge refinement map; perform binarization on the main segmentation map to obtain a binarized segmentation mask; the edge refinement map provides refinement information of the cavity boundary to assist in the analysis.
[0012] Step S5: Process and analyze the binarized segmentation mask, extract parameters describing the geometry of the cavity, and analyze the evolution pattern based on time series data.
[0013] Further, in step S1, the hybrid annotation strategy includes:
[0014] The RGB image is converted to the HSV color space, and thresholds are set on the hue channel in the ranges 0-10 degrees and 160-180 degrees for automatic annotation. A morphological opening with a 3×3 elliptical kernel and a morphological closing with a 7×7 elliptical kernel are then applied to obtain an initial binary mask. Ten percent of the images in the dataset, selected as keyframes, are manually annotated using an interactive tool. Taking the binary masks as the base, the automatic annotation results for those keyframes are replaced directly by the manual annotation files, producing a unique final label file for each image in the dataset.
[0015] Further, in step S2, the encoder module consists of an initial convolutional block and four sequentially connected downsampling stages; the initial convolutional block includes two convolutional layers with a kernel size of 3×3, a stride of 1, and padding of 1, and each convolutional layer is followed by a batch normalization layer and a ReLU activation function; the downsampling stage includes a convolutional block and a max pooling layer, wherein the last downsampling stage contains only one convolutional block.
[0016] Further, in step S2, the bottleneck layer module is located at the end of the encoder and adopts a bottleneck block structure with parallel convolutional branches; the parallel convolutional branches include: a first branch, which uses a 1×1 convolutional kernel and operates with a stride of 1; a second branch, which uses a 3×3 convolutional kernel, a stride of 1, and padding of 1; a third branch, which uses a 3×3 convolutional kernel, a stride of 1, and padding of 1; and a fourth branch, which first uses average pooling with a pooling window of 3×3, a stride of 1, and padding of 1, and then connects to a 1×1 convolution; the number of output channels of all branches is set to one-quarter of the number of input channels; the outputs of the parallel convolutional branches are concatenated in the channel dimension and feature fusion is performed through a processing layer containing a batch normalization layer and a ReLU activation function.
[0017] Further, in step S2, the decoder module consists of four consecutive upsampling stages; each stage includes an upsampling block, a skip connection from the corresponding layer of the encoder, a convolutional block with the same structure as the encoder, and an edge enhancement module; the upsampling block uses transposed convolution with a kernel size of 2×2 and a stride of 2; the skip connection upsamples the feature map of the corresponding layer of the encoder to align the spatial dimensions, and then concatenates it with the current decoder features in the channel dimension;
[0018] The edge enhancement module processes the input features through convolutional layers to generate edge response features and reduces their number of channels. The edge response features are then reweighted based on channel attention weights generated by global average pooling and subsequent convolutional operations. The feature channel number is then restored and fused with the original input of the module via residual connections, outputting an edge probability map processed by the Sigmoid function.
[0019] Further, in step S2, the multi-scale edge feature fusion module receives the edge probability maps output by the edge enhancement modules in the first three upsampling stages of the decoder, upsamples them to a common target size, concatenates them along the channel dimension, and processes the result with two consecutive 3×3 convolutional layers to generate a unified edge feature representation.
[0020] Furthermore, in step S2, the dual-branch output module includes a main segmentation branch and an edge refinement branch;
[0021] The main segmentation branch is used to concatenate the features output by the last convolutional block of the decoder with the features output by the multi-scale edge feature fusion module along the channel dimension, and then perform convolution processing to generate the main segmentation map of the cavity region.
[0022] The edge refinement branch is used to perform convolution processing on the features output by the multi-scale edge feature fusion module to generate an edge refinement map.
[0023] Further, in step S3, the edge-aware hybrid loss function is:
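[0024] $$\mathcal{L} = \alpha\,\mathcal{L}_{Dice} + \beta\,\mathcal{L}_{BCE} + \gamma\,\mathcal{L}_{EC} + \delta\,\mathcal{L}_{ER}$$
[0025] where $\mathcal{L}_{Dice}$ is the Dice loss, $\mathcal{L}_{BCE}$ is the binary cross-entropy loss, $\mathcal{L}_{EC}$ is the edge consistency loss, and $\mathcal{L}_{ER}$ is the edge refinement loss; $\alpha$, $\beta$, $\gamma$, and $\delta$ are the corresponding weighting coefficients.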
[0026] Further, in step S4, the preprocessing of the input image includes: cropping the image using an image center-symmetric cropping strategy; scaling the cropped image to a fixed size; and normalizing and standardizing the pixel values of the scaled image.
[0027] In the binarization process, a probability threshold of 0.5 is set, and pixels with a probability value greater than this threshold are identified as cavity regions.
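A sketch of the preprocessing and binarization in [0026] and [0027], assuming NumPy. The source does not specify the crop geometry, the fixed size, or the normalization statistics, so here the "center-symmetric" crop is assumed to be the largest centered square, resizing uses nearest-neighbour sampling to stay dependency-free, and the ImageNet channel mean and standard deviation are assumed. All function names are hypothetical.

```python
import numpy as np

# Assumed normalization statistics (ImageNet); the source does not state its values.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop_square(img: np.ndarray) -> np.ndarray:
    """Crop the largest square symmetric about the image center."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for a library resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def preprocess(img_uint8: np.ndarray, size: int = 512) -> np.ndarray:
    """Crop, resize, scale to [0, 1], then standardize per channel (HWC layout)."""
    img = resize_nearest(center_crop_square(img_uint8), size).astype(np.float32) / 255.0
    return (img - MEAN) / STD


def binarize(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pixels with probability strictly greater than the threshold are cavity (1)."""
    return (prob > threshold).astype(np.uint8)
```

Note that `binarize` uses a strict comparison, matching "greater than this threshold": a pixel at exactly 0.5 is treated as background.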
[0028] Furthermore, in step S5, the parameters describing the geometry of the cavity include area, perimeter, equivalent diameter, solidity, eccentricity, roundness, aspect ratio, and edge smoothness;
[0029] The evolution law analysis based on time-series data includes: taking as input a series of binary segmentation masks obtained by processing, through steps S1 to S4, images collected in chronological order during the same physical simulation experiment; calculating the morphological parameters of the mask at each time point; fitting the trend of each parameter over time by linear regression, the rate of change being defined as the slope of the regression line; and quantitatively predicting changes in cavity area, shape regularity, preferred expansion direction, and eccentricity by jointly analyzing the rates of change of all parameters.
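The trend fitting in [0029] reduces to an ordinary least-squares slope for each parameter. A dependency-free sketch follows; the function names and the input layout (a chronologically ordered list of time stamps paired with parameter dictionaries) are hypothetical.

```python
def trend_slope(times: list[float], values: list[float]) -> float:
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def parameter_trends(series: list[tuple[float, dict[str, float]]]) -> dict[str, float]:
    """Rate of change (regression slope) of every parameter over a time series."""
    times = [t for t, _ in series]
    names = series[0][1].keys()
    return {name: trend_slope(times, [params[name] for _, params in series])
            for name in names}
```

A positive slope for area indicates expansion, while a negative slope for roundness would indicate the outline becoming less regular over the experiment.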
[0030] The beneficial effects of this invention are:
[0031] 1. This invention constructs a deep collaborative network architecture for weak-boundary segmentation of salt cavern images, achieving high-precision, geometrically consistent segmentation. To address the core challenge of blurred and complex salt cavern boundaries, a collaborative mechanism of edge enhancement, multi-scale fusion, and dual-branch output is designed and optimized in a closed loop with an edge-aware hybrid loss function. The modules in this architecture are functionally orthogonal and show significant collaborative gains, so the complete model performs far better than simply stacking the modules would suggest. Experiments show that, compared with the basic U-Net, the edge similarity of the model of this invention is improved by more than 56%, and on the test set its overall segmentation accuracy is significantly higher than that of general models such as DeepLabV3+, verifying the non-obviousness and technical advancement of this dedicated architecture.
[0032] 2. This invention significantly improves the adaptability and robustness of the model under the special imaging conditions of salt caverns. Edge enhancement and multi-scale fusion significantly strengthen the model's ability to identify boundaries with gradual grayscale transitions, and joint supervision of the two branches improves its ability to capture irregular, small-scale cavity contours. Even under the most challenging imaging conditions, such as the initial stage of cavity formation and extremely low contrast, the method maintains high segmentation accuracy and boundary fidelity, overcoming the performance limitations of general models in such scenarios.
[0033] 3. This invention provides a seamless workflow from intelligent image segmentation to quantitative engineering analysis and therefore has direct engineering application value. Besides high-precision pixel-level segmentation, it uses an efficient hybrid annotation strategy and a built-in morphological feature analyzer to automatically extract multi-dimensional parameters describing cavity area, shape, and contour, and it can analyze their evolution trends from time-series data. This overcomes two bottlenecks: traditional manual methods are inefficient and subjective, and conventional segmentation techniques cannot directly support quantitative engineering analysis. The invention provides an efficient, objective intelligent analysis tool for optimizing cavity construction, assessing interlayer stability, and evaluating storage safety, and the system's processing efficiency meets the timeliness requirements of practical engineering applications.
Attached Figure Description
[0034] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0035] Figure 1 is a schematic diagram of the network architecture of the image segmentation model with edge optimization and dense multi-module connections according to an embodiment of the present invention.
[0036] Figure 2 is a schematic diagram of the bottleneck layer module structure according to an embodiment of the present invention.
[0037] Figure 3 is a comparison chart of model prediction results in an embodiment of the present invention.
[0038] Figure 4 is a graph of the change in training loss according to an embodiment of the present invention.
[0039] Figure 5 is a graph of the change in edge similarity according to an embodiment of the present invention.
[0040] Figure 6 shows the experimental results of the model comparison in the embodiments and comparative examples of the present invention.
[0041] Figure 7 is a before-and-after comparison of ablation experiment results according to an embodiment of the present invention.
Detailed Implementation
[0042] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0043] As shown in Figure 1, this embodiment provides a method for intelligent recognition of salt cavern morphology based on edge optimization and evolutionary constraints, including the following steps:
[0044] Step S1: Construct an image dataset from physical simulation experiments of solution mining in layered salt rock. This step establishes an image dataset of salt cavern solution-mining physical simulation experiments covering multiple working conditions, providing the data foundation for subsequent model training and validation.
[0045] In this embodiment, a physical model simulating layered salt rock strata was first constructed. The model sample was a composite structure of high-purity Pakistani rock salt blocks alternately stacked with artificial interlayer casting frames and bonded with epoxy resin. The salt blocks had a soluble content of 99.8% and a density of 2177 kg/m³. The horizontal dimensions of the model are 300 mm × 150 mm, and the salt layers have three vertical thicknesses of 25 mm, 40 mm, and 50 mm to simulate salt rock layers of different thicknesses. The interlayer casting frames also measure 300 mm × 150 mm horizontally, with four vertical thicknesses of 5 mm, 10 mm, 15 mm, and 20 mm to study the influence of interlayer thickness. The final physical model contains 4 interlayers with a maximum thickness of 20 mm.
[0046] During the solution-mining experiments, the entire process of cavity formation and evolution was recorded synchronously with a high-resolution industrial camera. The images have a resolution of 5472×3648 pixels, are in 24-bit RGB true-color format, and were stored as lossless PNG files to preserve image detail. To study systematically how key operating parameters influence the evolution of cavity morphology, the dataset was designed to cover multiple combinations of variables: low, medium, and high water injection flow rates to simulate different cavity construction rates; mudstone, gypsum, and a mudstone-salt rock mixture as interlayer materials with different dissolution properties; four interlayer thicknesses of 5 mm, 10 mm, 15 mm, and 20 mm; and, to simulate controlling cavity morphology by repositioning the water injection pipe in actual cavity construction, different numbers of pipe adjustments (lifting and lowering of the water injection pipe), set to 2, 4, and 6 in this experiment. This multi-factor design ensures that the dataset covers the key working conditions and is therefore diverse and representative.
[0047] To obtain high-quality training labels, this invention employs a three-level hybrid annotation strategy to label cavity regions accurately. First, automatic pre-annotation is performed: because salt rock appears in a characteristic deep red in the experimental images, the RGB image is converted to the HSV color space, which is relatively insensitive to illumination changes, and cavity regions are identified initially by thresholding the hue channel in the ranges 0-10 degrees and 160-180 degrees. A morphological opening with a 3×3 elliptical kernel then removes noise, and a closing with a 7×7 elliptical kernel fills small holes, yielding an initial binary mask. Second, manual fine annotation is performed: ten percent of the images, selected as keyframes representing different evolutionary stages of the cavity, were carefully hand-drawn by professional annotators using interactive tools such as LabelMe, with emphasis on accurate contour boundaries, continuous mask regions, and consistency between annotations. Finally, hybrid annotation fusion is performed with manual annotation taking priority: the initial binary masks from automatic annotation serve as the base, and each completed, higher-quality manual annotation file directly replaces the corresponding automatic result. This produces a unique and reliable final label file for each image in the dataset. The strategy ensures annotation accuracy while keeping the overall annotation effort efficient.
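The fusion rule above (a manual label, where one exists, replaces the automatic label) and a keyframe selection step can be sketched as follows. The evenly spaced selection in `select_keyframes` is an illustrative assumption, since the source selects keyframes by evolutionary stage; both function names and the image-ID-to-path dictionaries are hypothetical.

```python
def select_keyframes(frame_ids: list[str], fraction: float = 0.1) -> list[str]:
    """Evenly spaced subset of frames for manual annotation (illustrative assumption)."""
    if not frame_ids:
        return []
    count = max(1, round(len(frame_ids) * fraction))
    step = len(frame_ids) / count
    return [frame_ids[int(i * step)] for i in range(count)]


def fuse_labels(auto_labels: dict[str, str], manual_labels: dict[str, str]) -> dict[str, str]:
    """One final label per image: a manual label, if present, replaces the automatic one."""
    final = dict(auto_labels)  # automatic masks form the base for every image
    for image_id, label_path in manual_labels.items():
        if image_id in final:
            final[image_id] = label_path
    return final
```

Keying both dictionaries by image ID guarantees exactly one label file per image, which is the property the hybrid strategy requires.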
[0048] Step S2: Construct an image segmentation model with edge optimization and dense multi-module connections (EO-MMDC-UNet):
[0049] To segment salt cavern morphology accurately, particularly given weak boundaries and complex shapes, this invention designs an image segmentation model with edge optimization and dense multi-module connections, denoted EO-MMDC-UNet, as shown in Figure 1. Based on the classic U-Net architecture, the model introduces multiple dedicated modules and a collaborative mechanism between them to enhance the extraction and reconstruction of relevant features, especially edge features. It is composed mainly of an encoder module, a bottleneck layer module, a decoder module, a multi-scale edge feature fusion module, and a dual-branch output module.
[0050] The encoder module consists of an initial convolutional block and four sequentially connected downsampling stages. The initial convolutional block (ConvBlock) uses two convolutional layers with a kernel size of 3×3, a stride of 1, and padding of 1. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function, and its output channel count is set to 32. Each downsampling stage consists of a convolutional block and a max pooling layer in sequence. The max pooling layer uses a 2×2 window and operates with a stride of 2. The encoder module gradually expands the receptive field of the features through progressive downsampling, and the number of output channels is doubled from 32 to 64, 128, 256, and finally reaches 512 to extract deep multi-scale features. It should be noted that the last downsampling stage only contains the convolutional block and omits the max pooling layer.
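A PyTorch sketch of the encoder in [0050]. One assumption resolves an ambiguity in the text: placing max pooling after each stage's conv block and omitting it in the last stage gives only three downsamplings, whereas the decoder in [0052] has four ×2 transposed convolutions. The sketch therefore assumes a 2×2 pooling step also follows the initial block, i.e., each stage pools before its conv block, as in the standard U-Net. The names `conv_block` and `Encoder` are hypothetical.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convs (stride 1, padding 1), each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class Encoder(nn.Module):
    """Initial block (32 ch) plus four stages doubling channels to 64, 128, 256, 512."""

    def __init__(self, in_ch: int = 3):
        super().__init__()
        widths = [32, 64, 128, 256, 512]
        self.stem = conv_block(in_ch, widths[0])
        self.stages = nn.ModuleList(
            [conv_block(c_in, c_out) for c_in, c_out in zip(widths[:-1], widths[1:])]
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats = [self.stem(x)]  # full resolution, kept for the skip connection
        for stage in self.stages:
            # Assumption: pool, then conv; the deepest 512-ch output is not pooled again.
            feats.append(stage(self.pool(feats[-1])))
        return feats  # [32@H, 64@H/2, 128@H/4, 256@H/8, 512@H/16]
```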
[0051] The bottleneck layer module is located at the end of the encoder and aggregates multi-scale contextual information. As shown in Figure 2, it adopts a bottleneck block (InceptionBlock) structure with four parallel convolutional branches: the first branch uses a 1×1 kernel with a stride of 1 to reduce dimensionality and capture local features; the second branch uses a 3×3 kernel with a stride of 1 and padding of 1 to capture medium-range features; the third branch uses the same parameters as the second (3×3 kernel, stride 1, padding 1) to extract further features; and the fourth branch applies average pooling with a 3×3 window, a stride of 1, and padding of 1, followed by a 1×1 convolution, to capture local contextual information. Each branch outputs one-quarter of the number of input channels. The branch outputs are concatenated along the channel dimension and fused by a post-processing layer containing a batch normalization layer and a ReLU activation function. The labels in Figure 2 are as follows: Filter concat denotes the channel-wise concatenation of the multi-scale branch features; BatchNorm2d denotes the batch normalization layer, abbreviated BN; and ReLU is an activation function that sets negative values to 0, introducing non-linearity.
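A PyTorch sketch of the four-branch bottleneck block in [0051], following the stated kernel sizes, strides, padding, and one-quarter channel rule. The class name and the absence of per-branch activations (only the shared BatchNorm and ReLU after concatenation) are assumptions.

```python
import torch
import torch.nn as nn


class InceptionBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        branch_ch = channels // 4  # each branch outputs a quarter of the input channels
        self.b1 = nn.Conv2d(channels, branch_ch, kernel_size=1, stride=1)
        self.b2 = nn.Conv2d(channels, branch_ch, kernel_size=3, stride=1, padding=1)
        self.b3 = nn.Conv2d(channels, branch_ch, kernel_size=3, stride=1, padding=1)
        self.b4 = nn.Sequential(
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(channels, branch_ch, kernel_size=1),
        )
        # Post-processing layer applied after channel-wise concatenation.
        self.fuse = nn.Sequential(nn.BatchNorm2d(4 * branch_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
        return self.fuse(out)
```

Because every branch preserves spatial size and the four quarters sum to the input width (for channel counts divisible by 4), the block is a drop-in replacement at the 512-channel bottleneck.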
[0052] The decoder module consists of four consecutive upsampling stages. Its core function is to gradually restore the spatial resolution of the feature map and reconstruct the segmentation map. Each stage includes an upsampling block, a skip connection from the corresponding layer of the encoder, a convolutional block with the same structure as the encoder, and an edge enhancement module. The upsampling block is implemented using transposed convolution with a kernel size of 2×2 and a stride of 2, which doubles the size of the input feature map while halving the number of channels. The skip connection upsamples the feature map of the corresponding layer of the encoder to align the spatial dimensions and then concatenates it with the current decoder features in the channel dimension to achieve the fusion of shallow details and deep semantic information.
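A sketch of one decoder upsampling stage from [0052], assuming PyTorch; the edge enhancement module is left out here. The transposed convolution halves the channels and doubles the spatial size, and the encoder skip feature is bilinearly resized only when its size differs from the upsampled decoder feature, which is one reading of "upsamples ... to align the spatial dimensions". Class and argument names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Same structure as the encoder block: (3x3 conv, BN, ReLU) x 2."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )


class UpStage(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # 2x2 transposed conv with stride 2: doubles H and W, halves the channels.
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = conv_block(in_ch // 2 + skip_ch, out_ch)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        if skip.shape[-2:] != x.shape[-2:]:
            # Align the encoder feature to the decoder feature before concatenation.
            skip = F.interpolate(skip, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([skip, x], dim=1))
```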
[0053] The edge enhancement module aims to adaptively strengthen the model's focus on cavity boundary regions. This module first processes the input features through two 3×3 convolutional layers with a stride of 1 and padding of 1 to generate edge response features, while simultaneously reducing the number of channels by half to compress the features. Then, to focus on key edge channels, a global average pooling layer and two 1×1 convolutional layers generate channel attention weights, reweighting the edge response features. Next, a 1×1 convolutional layer restores the number of channels in the weighted edge response features, and a residual connection is established with the module's original input to fuse the original information and stabilize training. Finally, the module outputs a single-channel edge probability map processed by the Sigmoid function.
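A sketch of the edge enhancement module in [0053], assuming PyTorch. Several details are assumptions because the source does not state them: the reduction ratio of the attention branch (4), the ReLU activations, Sigmoid gating of the attention weights, and a final 1×1 convolution that turns the enhanced features into the single-channel edge map. The module also returns the enhanced features so that the decoder can continue from them. The class name is hypothetical.

```python
import torch
import torch.nn as nn


class EdgeEnhancement(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // 2
        # Two 3x3 convs (stride 1, padding 1) produce edge responses at half the channels.
        self.edge = nn.Sequential(
            nn.Conv2d(channels, mid, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        hidden = max(mid // reduction, 1)
        # Channel attention: global average pooling followed by two 1x1 convs.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, mid, 1), nn.Sigmoid(),
        )
        self.restore = nn.Conv2d(mid, channels, 1)  # 1x1 conv restores the channel count
        self.edge_head = nn.Conv2d(channels, 1, 1)  # hypothetical single-channel output head

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        edges = self.edge(x)
        edges = edges * self.attention(edges)  # reweight edge channels
        enhanced = x + self.restore(edges)  # residual connection to the module input
        return enhanced, torch.sigmoid(self.edge_head(enhanced))
```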
[0054] The multi-scale edge feature fusion module aims to integrate edge features from the first three different stages of the decoder to generate a unified and robust edge feature representation. This module receives the edge probability maps output by the edge enhancement module in the first three upsampling stages of the decoder as input, namely the first edge feature map, the second edge feature map, and the third edge feature map. First, the first, second, and third edge feature maps are uniformly upsampled to the target size. In this embodiment, the upsampling operation is implemented through bilinear interpolation, specifically using the F.interpolate function in the PyTorch framework, with the mode parameter set to 'bilinear'. Next, the upsampled feature maps are concatenated along the channel dimension and fed into two consecutive convolutional layers for deep fusion and dimensionality reduction. Both convolutional layers are 3×3 convolutions with stride and padding set to 1, and the output channel numbers are 32 and 32 respectively. Finally, the module outputs a fused edge feature map with 32 channels.
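A sketch of the multi-scale edge feature fusion module in [0054], using `F.interpolate(..., mode='bilinear')` as the text describes. The ReLU after each 3×3 convolution, `align_corners=False`, and taking the target size as an explicit argument are assumptions; the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleEdgeFusion(nn.Module):
    def __init__(self, num_maps: int = 3, out_ch: int = 32):
        super().__init__()
        # Two 3x3 convs (stride 1, padding 1), both with 32 output channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(num_maps, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, edge_maps: list[torch.Tensor], size: tuple[int, int]) -> torch.Tensor:
        # Bring the single-channel edge maps from the first three decoder stages to one size.
        upsampled = [F.interpolate(e, size=size, mode="bilinear", align_corners=False)
                     for e in edge_maps]
        return self.fuse(torch.cat(upsampled, dim=1))
```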
[0055] The dual-branch output module is the key to the collaborative optimization of regions and boundaries in this model. It includes the main segmentation branch and the edge refinement branch. The outputs of the two branches represent the probability that each pixel belongs to the cavity region and the probability that it belongs to the boundary, respectively, which together constitute the final prediction result of the model.
[0056] The main segmentation branch is responsible for generating the main segmentation map of the cavity region. This branch first concatenates the 32-channel features output from the last convolutional block of the decoder with the 32-channel edge features output from the multi-scale edge feature fusion module along the channel dimension to obtain a 64-channel fused feature. Then, the fused feature is processed by a 3×3 convolutional layer with a stride of 1 and padding of 1 to reduce the number of channels to 32. After that, a 1×1 convolutional layer is used to generate a single-channel main segmentation probability map, which is finally processed by the Sigmoid activation function and output.
[0057] The edge refinement branch works independently and is dedicated to generating fine edge refinement maps. This branch directly processes the 32-channel edge features output by the multi-scale edge feature fusion module. First, it reduces the number of feature channels to 16 through a 3×3 convolutional layer with a stride and padding of 1. Then, it generates a single-channel edge refinement map through a 1×1 convolutional layer and finally outputs it through the Sigmoid activation function.
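A sketch of the dual-branch output module in [0055] to [0057], assuming PyTorch. Channel counts follow the text; the ReLU after each 3×3 convolution and the class name are assumptions.

```python
import torch
import torch.nn as nn


class DualBranchHead(nn.Module):
    def __init__(self, dec_ch: int = 32, edge_ch: int = 32):
        super().__init__()
        # Main branch: 64-ch concatenation -> 3x3 conv to 32 ch -> 1x1 conv to 1 ch.
        self.main = nn.Sequential(
            nn.Conv2d(dec_ch + edge_ch, 32, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),
        )
        # Edge refinement branch: 32 ch -> 3x3 conv to 16 ch -> 1x1 conv to 1 ch.
        self.edge = nn.Sequential(
            nn.Conv2d(edge_ch, 16, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, dec_feat: torch.Tensor, edge_feat: torch.Tensor):
        seg = torch.sigmoid(self.main(torch.cat([dec_feat, edge_feat], dim=1)))
        edge = torch.sigmoid(self.edge(edge_feat))  # independent of the decoder features
        return seg, edge
```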
[0058] Step S3: Train the model using an edge-aware hybrid loss function:
[0059] To effectively optimize model parameters, this step employs an edge-aware hybrid loss function specifically designed for salt cave image segmentation. The EO-MMDC-UNet model is trained under supervision; the loss function is composed of four complementary loss components weighted together, aiming to simultaneously optimize the overall accuracy of segmentation regions, pixel-level classification confidence, and geometric consistency of boundary prediction; its mathematical expression is defined as:
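[0060]
$$L_{total} = \alpha L_{Dice} + \beta L_{BCE} + \gamma L_{EC} + \delta L_{ER}$$
[0061] where $L_{Dice}$ is the Dice loss, $L_{BCE}$ is the binary cross-entropy loss, $L_{EC}$ is the edge consistency loss, and $L_{ER}$ is the edge refinement loss; $\alpha$, $\beta$, $\gamma$, and $\delta$ are the corresponding weighting coefficients, set in this embodiment to $\alpha = 0.6$, $\beta = 0.3$, $\gamma = 0.1$, and $\delta = 0.2$.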
[0062] The weighting coefficients are chosen as follows. Dice loss directly optimizes the overlap between the predicted and real regions, is naturally robust to class imbalance, and is the dominant loss for salt cave segmentation, so it receives the highest weight, α = 0.6. Binary cross-entropy loss provides independent pixel-by-pixel supervision that helps stabilize gradient updates early in training, but it is less sensitive to edge details than Dice loss, so its weight is set to β = 0.3. Edge consistency loss improves boundary localization by constraining the geometric consistency between the edges of the main segmentation result and the real edges; because the main segmentation map is already strongly constrained by the Dice and BCE terms, this weight should stay low and is set to γ = 0.1. Edge refinement loss directly supervises the output of the edge refinement branch, which learns boundary features independently and plays a key role in the final edge quality; its weight is kept small to avoid interfering with the main segmentation task, but slightly higher than that of the edge consistency loss, at δ = 0.2. Multiple sets of preliminary experiments indicated that this allocation gives the best balance between region segmentation accuracy and edge geometric fidelity.
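As a minimal illustration of how the four terms combine, the following pure-Python sketch evaluates the weighted sum with the coefficients of this embodiment. The dictionary keys and the example component values are hypothetical placeholders, not values from the experiments. Note that the four weights sum to 1.2, so the total is a weighted sum rather than a convex combination of the components.

```python
# Weighting coefficients of the edge-aware hybrid loss in this embodiment.
# The dictionary keys are illustrative names, not identifiers from the patent.
WEIGHTS = {
    "dice": 0.6,              # alpha
    "bce": 0.3,               # beta
    "edge_consistency": 0.1,  # gamma
    "edge_refine": 0.2,       # delta
}


def hybrid_loss(components, weights=WEIGHTS):
    """Return sum_k weights[k] * components[k] over the four loss terms."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing loss components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical per-batch component values, used only to show the arithmetic.
example = {"dice": 0.10, "bce": 0.20, "edge_consistency": 0.05, "edge_refine": 0.30}
total = hybrid_loss(example)  # 0.06 + 0.06 + 0.005 + 0.06 = 0.185
```

In an actual training loop the components would be differentiable tensors rather than floats, but the weighting is the same.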
[0063] The specific definitions of each loss component are as follows:
[0064] Dice loss is used to measure the overlap between the predicted region and the actual region, and its calculation formula is as follows:
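[0065]
$$L_{Dice} = 1 - \frac{2\sum_{i} y_i \hat{y}_i + \epsilon}{\sum_{i} y_i + \sum_{i} \hat{y}_i + \epsilon}$$
[0066] where $y_i$ and $\hat{y}_i$ are the values at pixel $i$ of the true binary mask $Y$ and of the main segmentation probability map $\hat{Y}$ predicted by the model, respectively, and $\epsilon$ is a very small smoothing constant, taken as $10^{-6}$ in this embodiment, that prevents the denominator from being zero.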
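A minimal NumPy sketch of the Dice loss, assuming the form 1 − (2Σ yŷ + ε) / (Σ y + Σ ŷ + ε) with ε = 10⁻⁶ added to both numerator and denominator (the placement of ε in the numerator is an assumption). In training, the same expression would be written with differentiable tensor operations; NumPy is used here only to make the arithmetic explicit.

```python
import numpy as np


def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss between a binary mask and a probability map of the same shape."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    score = (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)
    return float(1.0 - score)


perfect = dice_loss([1, 1, 0, 0], [1.0, 1.0, 0.0, 0.0])  # close to 0
half = dice_loss([1, 1, 0, 0], [1.0, 0.0, 0.0, 0.0])     # close to 1/3
```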
[0067] The binary cross-entropy loss provides pixel-by-pixel classification supervision, and its calculation formula is as follows:
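[0068]
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right]$$
[0069] where $N$ is the total number of pixels in the image, and $y_i$ and $\hat{y}_i$ are the true label value (0 or 1) of the $i$-th pixel and the probability value predicted by the model, respectively.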
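The pixel-wise binary cross-entropy can be sketched in NumPy as below. Clipping the probabilities to [ε, 1 − ε] with ε = 10⁻⁷ is an assumption added here to keep the logarithm finite; the patent text does not specify how saturated probabilities are handled.

```python
import numpy as np


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all N pixels."""
    y = np.asarray(y_true, dtype=np.float64).ravel()
    # Assumed numerical guard: keep probabilities away from exactly 0 and 1.
    p = np.clip(np.asarray(y_prob, dtype=np.float64).ravel(), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


uncertain = bce_loss([1, 0], [0.5, 0.5])  # equals ln 2 for a 50/50 prediction
```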
[0070] Edge consistency loss constrains the boundaries of the main segmentation probability map to agree with the true boundaries. Edge maps $E_{\hat{Y}}$ and $E_Y$ are extracted from the main segmentation probability map $\hat{Y}$ and the real binary mask $Y$ using the Sobel operator, and the loss is the mean squared error between the two:
[0071]
$$L_{EC} = \frac{1}{N}\sum_{i=1}^{N}\left(E_{\hat{Y},i} - E_{Y,i}\right)^2$$
[0072] where $E_{\hat{Y},i}$ and $E_{Y,i}$ are the edge map pixel values at position $i$ extracted from the predicted mask and the real mask, respectively.
[0073] The edge refinement loss directly supervises the output of the edge refinement branch, making it approximate the edge map extracted from the real mask. The calculation formula is as follows:
[0074]
[0075] where $\hat{e}_i$ is the value at pixel $i$ of the edge probability map output by the edge refinement branch. All edge maps used above, for both the prediction and the real mask, are extracted with the Sobel operator, which computes the horizontal and vertical gradients of the image using two 3×3 convolution kernels.
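The Sobel edge maps and the two edge losses can be sketched in NumPy as follows. Several details are assumptions not fixed by the text: the edge map is taken as the gradient magnitude sqrt(Gx² + Gy²), borders are handled by edge replication, and, because the edge refinement formula is not reproduced above, the edge refinement loss is written here as a mean squared error against the ground-truth edge map clipped to [0, 1] so that it is on the same scale as a Sigmoid output.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def sobel_edges(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) of a 2-D array.

    Computed as a cross-correlation with edge-replicated borders; the sign
    convention does not affect the magnitude.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += SOBEL_X[dy, dx] * window
            gy += SOBEL_Y[dy, dx] * window
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_consistency_loss(pred_prob, true_mask):
    """L_EC: MSE between Sobel edge maps of the prediction and the ground truth."""
    return float(np.mean((sobel_edges(pred_prob) - sobel_edges(true_mask)) ** 2))


def edge_refinement_loss(edge_prob, true_mask):
    """Assumed form of L_ER: MSE between the refinement output and clipped GT edges."""
    target = np.clip(sobel_edges(true_mask), 0.0, 1.0)  # assumption: scale target to [0, 1]
    return float(np.mean((np.asarray(edge_prob, dtype=np.float64) - target) ** 2))


# A vertical step edge: columns 0-1 are background, columns 2-4 are cavity.
mask = np.zeros((5, 5))
mask[:, 2:] = 1.0
edges = sobel_edges(mask)  # magnitude 4 on the two columns adjacent to the step
```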
[0076] For training, this embodiment uses the Adam optimizer to update the model parameters, with an initial learning rate of 3×10⁻⁴, a first-moment decay rate β1 = 0.9, and a second-moment decay rate β2 = 0.999. The batch size is 2 and the maximum number of training epochs is 20. An early stopping mechanism is adopted: training is terminated when the validation loss has not decreased for 5 consecutive epochs. 10% of the dataset is first randomly selected as an independent test set that is not used in training or validation; the remaining 90% is randomly divided into training and validation sets in a ratio of 8:2.
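The two-stage data split can be sketched with the standard library as below. The fixed seed and the use of `round` for the set sizes are assumptions made here for reproducibility; the patent does not state how fractional counts are rounded.

```python
import random


def split_dataset(items, test_frac=0.10, val_frac=0.20, seed=0):
    """Hold out a random test set first, then split the remainder train:val = 8:2."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))  # 72 / 18 / 10 images
```

In PyTorch, the optimizer settings above would correspond to `torch.optim.Adam(model.parameters(), lr=3e-4, betas=(0.9, 0.999))`.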
[0077] Step S4: Perform cavity morphology segmentation on the input image based on the trained model:
[0078] This step loads the optimal model weights obtained in step S3, i.e., the model parameters with the minimum validation loss saved by the early stopping mechanism, and uses the optimized EO-MMDC-UNet model to perform forward inference on new salt cave experimental images, finally obtaining a binarized segmentation mask of the cavity region. The specific process is as follows:
[0079] S41: Input image preprocessing:
[0080] First, the high-resolution raw input image, with an original size of 5472×3648 pixels, is preprocessed. Based on prior experimental calibration, a centrally symmetric cropping strategy is adopted to crop a core region from the center of the image. This region is 3450 pixels wide and 2800 pixels high, with starting coordinates (1011, 424) and ending coordinates (4461, 3224). It completely covers the salt rock specimen and the range of cavity evolution, and the same region is used for images at all time points.
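The crop coordinates follow directly from centering a 3450×2800 window in a 5472×3648 frame: (5472 − 3450) / 2 = 1011 and (3648 − 2800) / 2 = 424. A small sketch, assuming images are stored as NumPy arrays in height × width (× channel) order and that end coordinates are exclusive:

```python
import numpy as np


def center_crop_box(img_w, img_h, crop_w, crop_h):
    """Return (x_start, y_start, x_end, y_end) of a centered crop; ends are exclusive."""
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop region is larger than the image")
    x0 = (img_w - crop_w) // 2
    y0 = (img_h - crop_h) // 2
    return x0, y0, x0 + crop_w, y0 + crop_h


def center_crop(image, crop_w, crop_h):
    """Crop an H x W (x C) array around its center."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = center_crop_box(w, h, crop_w, crop_h)
    return image[y0:y1, x0:x1]


box = center_crop_box(5472, 3648, 3450, 2800)  # (1011, 424, 4461, 3224)
```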
[0081] The cropped image is then resized to a fixed size of 384×384 pixels by stretching the 3450×2800 image directly with bilinear interpolation. This changes the aspect ratio of the image from about 1.232:1 to 1:1; in this embodiment, the effect of this change on the geometry of the cavity is regarded as negligible.
[0082] Next, the pixel values of the scaled image are normalized and standardized. First, the pixel values are normalized from the integer range [0, 255] to the floating-point range [0, 1]. Then, the mean and standard deviation are used for standardization, which are exactly the same as those used in the model training phase. The mean values are 0.485, 0.456, and 0.406, and the standard deviations are 0.229, 0.224, and 0.225, respectively. This set of statistics comes from the ImageNet dataset and is a common setting in the field of deep learning image processing to ensure that the distribution of input data is aligned with that of the training phase.
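The normalization step can be sketched in NumPy as below, assuming the 384×384 resize has already been applied and the image is an H × W × 3 uint8 array in RGB channel order. A PyTorch model would additionally expect channel-first layout, e.g. via `np.transpose(x, (2, 0, 1))`.

```python
import numpy as np

# ImageNet channel statistics (RGB order), as listed in the text.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize_image(img_uint8):
    """Map an H x W x 3 uint8 RGB image to (x / 255 - mean) / std per channel."""
    x = np.asarray(img_uint8, dtype=np.float64) / 255.0  # [0, 255] -> [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD            # broadcast over the channel axis


white = normalize_image(np.full((1, 1, 3), 255, dtype=np.uint8))
black = normalize_image(np.zeros((1, 1, 3), dtype=np.uint8))
```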
[0083] S42: Model Inference and Output:
[0084] The preprocessed image is input into the model with the loaded weights; after forward propagation, the model simultaneously outputs a main segmentation map and an edge refinement map. The main segmentation map gives the pixel-level probability that each pixel belongs to the cavity region.
[0085] The edge refinement map has two main uses during the inference stage: first, it can serve as auxiliary visualization information to help researchers observe the uncertainty estimation of the cavity boundary position by the model; second, it can be optionally used for post-processing optimization, such as superimposing it with the main segmentation map and correcting minor burrs on the boundary through morphological operations. In this embodiment, since the main segmentation map already meets the accuracy requirements, the edge refinement map is only used as an auxiliary output for reference and does not participate in the formation of the final segmentation mask.
[0086] S43: Post-processing and mask generation:
[0087] The main segmentation map is binarized with a probability threshold of 0.5: pixels whose probability is greater than this threshold are identified as cavity region, yielding the final binarized segmentation mask.
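The binarization rule is a strict comparison, so a pixel with probability exactly 0.5 is assigned to the background. A one-function NumPy sketch (the function name is illustrative):

```python
import numpy as np


def binarize(prob_map, threshold=0.5):
    """Cavity mask: 1 where probability is strictly greater than `threshold`, else 0."""
    return (np.asarray(prob_map) > threshold).astype(np.uint8)


probs = np.array([[0.20, 0.50], [0.51, 0.90]])
mask = binarize(probs)         # [[0, 0], [1, 1]]
loose = binarize(probs, 0.3)   # a lower threshold marks more pixels as cavity
```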
[0088] It should be noted that the specific parameters given in the above embodiments (such as target size 384×384, probability threshold 0.5, etc.) are only a preferred implementation of the present invention and do not constitute a limitation on the scope of protection. In practical applications, those skilled in the art can reasonably adjust the following parameters according to image resolution, computing resources, and segmentation accuracy requirements:
[0089] (1) Target size: The size of the input image of the model is set in the range of 256×256 to 512×512; increasing the size helps to retain more image details, but increases the computational overhead; decreasing the size can improve the inference speed, but may lose some edge information; the 384×384 used in this embodiment is a verified better balance point.
[0090] (2) Probability threshold: the threshold used to generate the final binary segmentation map can be adjusted between 0.3 and 0.7. Lowering the threshold can improve recall (fewer false negatives) but may enlarge false positive areas; raising it can improve precision but may cause false negatives at weak boundaries. This embodiment uses the default balanced value of 0.5, and users can shift the emphasis toward completeness or purity according to the actual scenario.
[0091] (3) Cropping size: The specific size of the image cropping area can be adapted according to the actual imaging range of the experimental device. Its core is to effectively cover the salt rock specimen, and is not limited to 3450×2800 given in the example.
[0092] (4) Scaling method: instead of direct stretching, aspect-ratio-preserving padded scaling can be used, i.e., the image is first scaled proportionally until its long side equals the target size, and the short side is then zero-padded up to the target size. This eliminates geometric distortion but introduces invalid padded areas. The two scaling methods usually have limited impact on the final segmentation result, and both fall within the scope of this invention.
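For comparison, direct stretching scales the 3450×2800 crop by 384/3450 ≈ 0.111 horizontally but 384/2800 ≈ 0.137 vertically, whereas padded scaling uses one factor for both axes. The sketch below computes the padded-scaling geometry; splitting the padding evenly between the two sides is an assumption, since the text does not say where the zero fill is placed.

```python
def letterbox_geometry(src_w, src_h, target=384):
    """Scale so the long side equals `target`, then zero-pad the short side.

    Returns the resized (width, height) and the padding (left, top, right, bottom).
    """
    scale = target / max(src_w, src_h)
    new_w = min(target, round(src_w * scale))
    new_h = min(target, round(src_h * scale))
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2  # assumed: split padding evenly
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)


resized, padding = letterbox_geometry(3450, 2800)  # (384, 312), (0, 36, 0, 36)
```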
[0093] Step S5: Perform morphological feature analysis and evolution trend prediction on the segmentation mask:
[0094] This step performs subsequent processing and quantitative analysis on the binarized segmentation mask generated in step S4 to extract multi-dimensional parameters describing the cavity geometry and analyze its evolution based on time series data; the specific process is as follows:
[0095] S51: Post-processing and morphological parameter extraction:
[0096] First, the binarized segmentation mask is post-processed to optimize its quality; morphological opening operations are applied to filter out small noise points, using a 5×5 pixel elliptical kernel as the structuring element; subsequently, connected component analysis is performed on the processed mask; in this embodiment of the invention, the independent cavity regions are labeled using the skimage.measure.label function in the Python image processing library scikit-image, and the quantitative morphological parameters of each region are extracted using the skimage.measure.regionprops function; a total of eight morphological parameters are extracted, and their definitions and calculation formulas are as follows:
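The opening step can be sketched in plain NumPy to show how a 5×5 elliptical element removes isolated noise while keeping the cavity body. The kernel pattern in `ELLIPSE_5X5` is an assumption (one common rasterization of a 5×5 ellipse); the embodiment does not list the kernel values. Labeling and parameter extraction would then follow with `skimage.measure.label` and `skimage.measure.regionprops`, as stated above. Because the kernel is symmetric, the same offsets serve for erosion and dilation.

```python
import numpy as np

# Assumed 5x5 elliptical structuring element.
ELLIPSE_5X5 = np.array([
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
], dtype=bool)


def _shifted_views(mask, kernel):
    """Yield views of `mask` shifted by every active offset of a symmetric kernel."""
    kh, kw = kernel.shape
    ry, rx = kh // 2, kw // 2
    padded = np.pad(mask, ((ry, ry), (rx, rx)), mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(kh):
        for dx in range(kw):
            if kernel[dy, dx]:
                yield padded[dy:dy + h, dx:dx + w]


def binary_erode(mask, kernel):
    out = np.ones(mask.shape, dtype=bool)
    for view in _shifted_views(mask, kernel):
        out &= view  # keep a pixel only if the whole kernel fits inside the region
    return out


def binary_dilate(mask, kernel):
    out = np.zeros(mask.shape, dtype=bool)
    for view in _shifted_views(mask, kernel):
        out |= view  # grow the region by the kernel footprint
    return out


def binary_open(mask, kernel=ELLIPSE_5X5):
    """Morphological opening: erosion followed by dilation."""
    mask = np.asarray(mask, dtype=bool)
    return binary_dilate(binary_erode(mask, kernel), kernel)


# A 10x10 cavity plus one isolated noise pixel.
m = np.zeros((20, 20), dtype=bool)
m[5:15, 5:15] = True
m[1, 18] = True
opened = binary_open(m)
```

With this kernel, the noise pixel disappears and the square keeps 92 of its 100 pixels (the 8 corner pixels are rounded off).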
[0097] (1) Area: The total number of pixels contained in the cavity area, which directly reflects the size of the cavity.
[0098] (2) Perimeter: The Euclidean length of the cavity region contour, which is automatically calculated by the aforementioned morphological parameter extraction function and reflects the true geometric length of the region boundary.
[0099] (3) Equivalent diameter $D_{eq}$: the diameter of a circle with the same area as the cavity region, used to represent the characteristic dimension of the cavity intuitively; it is calculated as:
[0100]
$$D_{eq} = \sqrt{\frac{4A}{\pi}}$$
where $A$ is the area of the cavity region.
[0101] (4) Solidity: the ratio of the cavity area to the area of its convex hull, measuring how solid or indented the region is; the closer the value is to 1, the fuller the shape. It is calculated as:
[0102]
$$S = \frac{A}{A_{convex}}$$
[0103] where $A$ is the cavity area and the convex hull area $A_{convex}$ is the area covered by the smallest convex polygon enclosing the region.
[0104] (5) Eccentricity: The eccentricity of an ellipse with the same second moment as the region, with a value range of [0,1]. It is used to describe the elongation of the shape. The larger the value, the more elongated the region.
[0105] (6) Roundness: a shape factor based on area and perimeter that evaluates how close a shape is to a circle; the closer the value is to 1, the more regular and circular the shape. It is calculated as:
[0106]
$$R = \frac{4\pi A}{P^2}$$
where $A$ is the area and $P$ is the perimeter of the cavity region.
[0107] (7) Aspect Ratio: the ratio of the width to the height of the smallest bounding rectangle of the region, used to determine the preferred expansion direction of the cavity; a ratio greater than 1 indicates lateral expansion. With the upper-left corner of the bounding rectangle at $(x_1, y_1)$ and the lower-right corner at $(x_2, y_2)$, it is calculated as:
[0108]
$$AR = \frac{x_2 - x_1}{y_2 - y_1}$$
[0109] (8) Edge Smoothness: quantifies the smoothness of the cavity boundary; the closer the value is to 1, the smoother the edge. The cavity contour is extracted with the Canny edge detector using a low threshold of 50 and a high threshold of 150, and the contour is approximated as a polygon with the Douglas-Peucker algorithm using a distance tolerance of 0.01 times the contour perimeter; the smoothness is then defined as:
[0110]
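Parameters (3), (4), (6), and (7) are simple functions of quantities that region analysis already provides: equivalent diameter sqrt(4A/π), solidity A / A_convex, roundness 4πA / P², and aspect ratio (x2 − x1) / (y2 − y1). The sketch below assumes those inputs are already measured; edge smoothness is omitted because its formula is not reproduced here. If the inputs come from scikit-image's `regionprops`, note that its `bbox` is ordered (min_row, min_col, max_row, max_col), so x1 = min_col and y1 = min_row. On pixel masks the measured perimeter is a discrete estimate, so roundness will not be exactly 1 even for a digitized circle.

```python
import math


def derived_shape_params(area, perimeter, convex_area, bbox):
    """Equivalent diameter, solidity, roundness and aspect ratio of one region.

    bbox = (x1, y1, x2, y2): upper-left and lower-right corners of the bounding box.
    """
    x1, y1, x2, y2 = bbox
    return {
        "equivalent_diameter": math.sqrt(4.0 * area / math.pi),
        "solidity": area / convex_area,
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        "aspect_ratio": (x2 - x1) / (y2 - y1),
    }


# Sanity check on an ideal circle of radius 10 (continuous geometry, not pixels).
r = 10.0
circle = derived_shape_params(math.pi * r * r, 2 * math.pi * r, math.pi * r * r,
                              (0, 0, 2 * r, 2 * r))
```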
[0111] S52: Evolutionary Trend Prediction Method:
[0112] To provide in-depth support for cavity construction process analysis, this step establishes a method for predicting cavity evolution trends based on time-series morphological parameters. A series of binarized segmentation masks, collected sequentially during the same physical simulation experiment and processed through steps S1 to S4, are used as input. The aforementioned morphological parameters of the masks at each time point are calculated, and then a linear regression method is used to fit the changing trend of each parameter over time. The rate of change is defined as the slope of the linear regression; a positive slope indicates that the parameter increases with increasing dissolution time, and vice versa. By comprehensively analyzing the rate of change of each parameter, the following key evolution trends can be quantitatively predicted:
[0113] (1) Area change: reflects the state of the cavity being dissolved and expanded (area increases) or precipitated and contracted (area decreases).
[0114] (2) Changes in the regularity of shape: The change rate of roundness indicates whether the cavity shape tends to become more regular (roundness increases) or more complex (roundness decreases).
[0115] (3) Expansion direction preference: the rate of change of the aspect ratio identifies lateral expansion (aspect ratio increases) or longitudinal expansion (aspect ratio decreases) of the cavity.
[0116] (4) Eccentricity evolution: the rate of change of eccentricity indicates whether the cavity is becoming more elongated (eccentricity increases) or more rounded (eccentricity decreases).
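The rate of change of each parameter is the slope of a least-squares line fitted against dissolution time. A NumPy sketch using `np.polyfit` with degree 1 (which returns the slope first, then the intercept); the parameter series below are made-up values for illustration only:

```python
import numpy as np


def trend_slopes(times, series):
    """Linear-regression slope of each morphological parameter against time."""
    t = np.asarray(times, dtype=np.float64)
    slopes = {}
    for name, values in series.items():
        slope, _intercept = np.polyfit(t, np.asarray(values, dtype=np.float64), deg=1)
        slopes[name] = float(slope)
    return slopes


# Hypothetical measurements at four time points (hours); values are invented.
times_h = [5, 10, 15, 20]
slopes = trend_slopes(times_h, {
    "area": [1000, 1800, 2600, 3400],       # pixels
    "roundness": [0.80, 0.78, 0.76, 0.74],
})
# slopes["area"] > 0: the cavity is expanding by dissolution.
# slopes["roundness"] < 0: the shape is becoming less regular.
```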
[0117] To comprehensively and systematically evaluate the performance of the EO-MMDC-UNet model and related methods proposed in this invention, the following experimental verifications were conducted.
[0118] Experimental setup:
[0119] In this embodiment, the hardware and software environment on which the model training and evaluation depend is: PyTorch 2.9.1 deep learning framework, Python 3.10 programming language, NVIDIA GeForce RTX 3060 Laptop GPU and Intel Core i7-10870H CPU, and accelerated based on CUDA 11.8 parallel computing platform.
[0120] The key training parameters and strategies are as follows. The optimizer is Adam with an initial learning rate of 3×10⁻⁴, a first-moment decay rate β1 = 0.9, and a second-moment decay rate β2 = 0.999. The learning rate is scheduled with the ReduceLROnPlateau strategy, which monitors the validation loss and multiplies the learning rate by a factor of 0.5 when the loss stops decreasing, with a patience of 3 epochs. The batch size is 2 and the maximum number of training epochs is 20. Early stopping terminates training when the validation loss has not decreased for 5 consecutive epochs. 10% of the dataset is first randomly selected as an independent test set that is not used in training or validation, and the remaining 90% is randomly divided into training and validation sets in an 8:2 ratio.
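The learning-rate schedule and early stopping can be sketched without a framework as below. `PlateauController` is a hypothetical helper, not part of any library, and it follows the wording of the text (halve the rate after 3 non-improving epochs, stop after 5). PyTorch's own `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=3)` counts slightly differently: it reduces the rate only once the number of bad epochs exceeds `patience`, and by default it counts only improvements larger than a small relative threshold.

```python
class PlateauController:
    """Hypothetical helper: LR halving on plateau plus early stopping."""

    def __init__(self, lr=3e-4, factor=0.5, lr_patience=3, stop_patience=5):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0     # consecutive non-improving epochs (early stopping)
        self.bad_since_cut = 0  # non-improving epochs since the last LR reduction

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            # New best: this is where the best weights would be checkpointed.
            self.best_loss, self.best_epoch = val_loss, epoch
            self.bad_epochs = 0
            self.bad_since_cut = 0
        else:
            self.bad_epochs += 1
            self.bad_since_cut += 1
            if self.bad_since_cut >= self.lr_patience:
                self.lr *= self.factor
                self.bad_since_cut = 0
        return self.bad_epochs >= self.stop_patience


# Made-up validation losses: best at epoch 10, no improvement for 5 epochs after it.
val_losses = [0.60, 0.50, 0.40, 0.30, 0.25, 0.22, 0.20, 0.18, 0.16, 0.15,
              0.17, 0.16, 0.19, 0.18, 0.20, 0.14, 0.13, 0.12, 0.11, 0.10]
ctl = PlateauController()
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if ctl.step(epoch, loss):
        stopped_at = epoch
        break
```

With this sequence, the rate is halved once (at epoch 13) and training stops at epoch 15 with the epoch-10 weights as the best checkpoint.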
[0121] Evaluation criteria:
[0122] This embodiment uses the following six indicators to quantitatively evaluate the model segmentation performance. All indicators are averaged over the entire test set, and the indicators at each time point are calculated separately and then averaged.
[0123] (1) Intersection over Union (IoU): The area of the intersection of the predicted mask and the real mask divided by the area of the union.
[0124] (2) Dice coefficient: twice the intersection area divided by the sum of the areas of the predicted mask and the real mask.
[0125] (3) Precision: The proportion of true positives (TP) out of all predicted positives (TP+FP).
[0126] (4) Recall: The proportion of true positives (TP) out of all actual positives (TP+FN).
[0127] (5) F1-score: the harmonic mean of precision and recall.
[0128] (6) Edge Similarity: used specifically to evaluate the accuracy of boundary prediction. Edge maps are extracted from the predicted mask and the ground truth mask, respectively, using the Sobel operator, which calculates the approximate gradients of the image in the horizontal and vertical directions with two 3×3 convolution kernels. The cosine similarity between the two edge maps is then calculated; its value ranges over [0, 1], and the larger the value, the more consistent the predicted edge is with the ground truth edge. The calculation formula is:
[0129]
$$ES = \frac{\sum_{i=1}^{N} E_{P,i}\,E_{G,i}}{\sqrt{\sum_{i=1}^{N} E_{P,i}^{2}}\;\sqrt{\sum_{i=1}^{N} E_{G,i}^{2}}}$$
where $E_{P,i}$ and $E_{G,i}$ are the values at pixel $i$ of the edge maps extracted from the predicted mask and the ground truth mask, respectively, and $N$ is the number of pixels.
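The six indicators can be computed per image from the confusion counts and the Sobel edge maps, as in the NumPy sketch below. The small `eps` guarding against division by zero and the edge-replicated Sobel border are assumptions. For binary masks, Dice and F1-score reduce to the same expression, 2TP / (2TP + FP + FN), which is why the results report identical Dice and F1 values.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)


def sobel_edges(mask):
    """Sobel gradient magnitude with edge-replicated borders."""
    img = np.asarray(mask, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += SOBEL_X[dy, dx] * window
            gy += SOBEL_X.T[dy, dx] * window
    return np.sqrt(gx ** 2 + gy ** 2)


def segmentation_metrics(pred, gt, eps=1e-12):
    """IoU, Dice, precision, recall, F1 and edge similarity for one image."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = float(np.sum(pred & gt))
    fp = float(np.sum(pred & ~gt))
    fn = float(np.sum(~pred & gt))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    e_pred = sobel_edges(pred).ravel()
    e_gt = sobel_edges(gt).ravel()
    cosine = e_pred @ e_gt / (np.linalg.norm(e_pred) * np.linalg.norm(e_gt) + eps)
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
        "edge_similarity": float(cosine),
    }


partial = segmentation_metrics([[1, 0], [0, 0]], [[1, 1], [0, 0]])
block = np.zeros((8, 8))
block[2:6, 2:6] = 1
identical = segmentation_metrics(block, block)
```

Per the evaluation protocol above, these per-image values would then be averaged over the test set.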
[0130] Having defined the experimental setup and evaluation criteria, Figure 3 illustrates the model's performance at different dissolution stages by showing its recognition results at four key time points (T=5h, 10h, 15h, 20h). The top row shows the original experimental images at the corresponding time points, in which the cavities formed by the dissolution of salt rock appear as dark red areas; as dissolution time increases, the cavities gradually expand upward and outward from a small initial shape at the bottom. The bottom row shows the corresponding predicted segmentation masks (red filled areas), which visually overlap closely with the actual cavity areas. The quantitative evaluation indicators at each time point are: T=5h: IoU 0.8291, Dice 0.9065, Precision 0.9134, Recall 0.8998, F1-Score 0.9065; T=10h: IoU 0.9118, Dice 0.9539, Precision 0.9351, Recall 0.9734, F1-Score 0.9539; T=15h: IoU 0.9372, Dice 0.9676, Precision 0.9503, Recall 0.9855, F1-Score 0.9676; T=20h: IoU 0.9469, Dice 0.9727, Precision 0.9677, Recall 0.9778, F1-Score 0.9727.
[0131] These data show that as dissolution progresses, the cavity evolves from an early state of "small area and blurred boundaries" to a later state of "large area and clear outlines," and the model's indicators generally improve: IoU rises steadily from 0.8291 to 0.9469 and F1-Score from 0.9065 to 0.9727, while Recall peaks at T=15h (0.9855) and remains at 0.9778 at T=20h. Notably, even in the most challenging early-dissolution scenario (T=5h), where the cavity is small and the contrast between boundary and background is extremely low, the model still maintains an IoU of 0.8291 and an F1-Score of 0.9065. This verifies the robustness of the model of this invention in recognizing weak boundaries and small-scale targets.
[0132] Guided by the edge-aware hybrid loss function, the model of this invention converges quickly and optimizes stably during training. As shown in Figure 4, the training loss and validation loss were approximately 0.42 and 0.57, respectively, at the start of training. Both curves decreased rapidly in the first few epochs and reached their lowest point around the 10th epoch (validation loss approximately 0.144). The validation loss then fluctuated slightly but remained low overall, and the early stopping mechanism terminated training at the 15th epoch, when the validation loss had stabilized at approximately 0.19, indicating that the model did not overfit and that optimization was efficient and stable. At the same time, the model's edge detection capability improved alongside training. As shown in Figure 5, the edge similarity of the training and validation sets started at approximately 0.14 and 0.08, respectively, rose steadily with training, and reached approximately 0.45 and 0.40 after the 10th epoch. This reflects the continuous improvement of the model's ability to learn and generalize cavity boundary features. The rapid convergence of the loss curves and the steady increase in edge similarity together support the effectiveness of the loss function and model architecture designed in this invention.
[0133] To systematically verify the superior performance of the EO-MMDC-UNet model proposed in this invention, this embodiment conducts comparative experiments with several mainstream semantic segmentation models. All experiments are conducted under identical conditions to ensure the fairness of the comparison.
[0134] Example (Complete Model of the Invention):
[0135] The proposed edge-optimized multi-module collaborative network (EO-MMDC-UNet) is employed. This model comprises an encoder module, a bottleneck layer module, a decoder module, a multi-scale edge feature fusion module, and a dual-branch output module. Training is supervised using the edge-aware hybrid loss function designed in this invention, a weighted sum of Dice loss, binary cross-entropy loss, edge consistency loss, and edge refinement loss with weights α=0.6, β=0.3, γ=0.1, and δ=0.2, respectively. The model is trained using the Adam optimizer with an initial learning rate of 3×10⁻⁴, a first-moment decay rate β1 = 0.9, and a second-moment decay rate β2 = 0.999. The learning rate decay strategy is ReduceLROnPlateau, monitoring the validation loss: when the validation loss does not decrease for 3 consecutive epochs, the learning rate is multiplied by 0.5. Early stopping terminates training when the validation loss does not decrease for 5 consecutive epochs and restores the model weights with the smallest validation loss. The batch size is 2, and the maximum number of training epochs is 20. In addition, the dataset partitioning strategy, image preprocessing process, and hardware computing environment used for model training and testing are kept consistent.
[0136] Comparative Example 1 (BasicUNet):
[0137] Based on the classic U-Net architecture, it includes a standard encoder-decoder structure and skip connections. Its encoder consists of four downsampling stages, each containing two convolutional layers and one max-pooling layer. The decoder contains four upsampling stages. This model also uses the exact same edge-aware hybrid loss function as the example (weights α=0.6, β=0.3, γ=0.1, δ=0.2) to eliminate the influence of differences in loss functions on the comparison results. The optimizer and other training configurations are exactly the same as in the example. It should be noted that because BasicUNet lacks the edge enhancement module, multi-scale edge feature fusion module, and dual-branch output module designed in this invention, it cannot effectively utilize edge supervision information, leading to difficulty in converging the edge loss. However, to maintain the fairness of the comparison, the same loss function is still used for training.
[0138] Comparative Example 2 (UNet++):
[0139] UNet++ is a variant of U-Net with dense skip connections. It achieves multi-scale feature fusion through nested skip connections. The encoder structure is the same as that of Comparative Example 1, while the decoder introduces dense convolutional blocks and a deep supervision mechanism. The model also uses the exact same edge-aware hybrid loss function (weights α=0.6, β=0.3, γ=0.1, δ=0.2) as the example, and the optimizer and other training configurations are exactly the same as the example.
[0140] Comparative Example 3 (DeepLabV3+):
[0141] DeepLabV3+ is an advanced model based on dilated convolution and spatial pyramid pooling. This comparative example uses ResNet50 as its backbone network, which is randomly initialized (without any pre-trained weights), consistent with the initialization method of the example and comparative examples 1 and 2, to ensure the fairness of the comparison. A dilated spatial pyramid pooling module is introduced in the encoder part to capture multi-scale contextual information, and the resolution is restored by upsampling in the decoder. Its loss function also adopts the same edge-aware hybrid loss function as the example, and all training configurations and experimental environments are the same as those in the example.
[0142] After training, the salt cave segmentation performance of the models obtained in the Example and in Comparative Examples 1, 2, and 3 was evaluated on the same independent test set. The test set accounts for 10% of the original dataset, and its images were not used during model training or validation. Intersection over Union (IoU), the Dice coefficient, F1-Score, and the average inference time per batch of images were used as comprehensive evaluation indicators. Each indicator was first calculated for every image in the test set, and the arithmetic mean was taken as the final result. Inference time was measured on a CPU platform (Intel Core i7-10870H @ 2.20GHz, single thread) with a uniform batch size of 2. Detailed comparison results are recorded in Table 1.
[0143] Table 1. Performance comparison of different models on salt cave image segmentation task
[0144]
[0145] According to the experimental results in Table 1, the EO-MMDC-UNet model of this embodiment achieves the best values on all three core segmentation accuracy metrics, IoU, Dice coefficient, and F1-Score, at 0.9200, 0.9582, and 0.9582, respectively. These are higher than those of the BasicUNet model in Comparative Example 1 (0.9158, 0.9557, 0.9557), the UNet++ model in Comparative Example 2 (0.8826, 0.9373, 0.9373), and the DeepLabV3+ model in Comparative Example 3 (0.8877, 0.9401, 0.9401); the IoU margin is small over BasicUNet (0.0042) and larger over UNet++ (0.0374) and DeepLabV3+ (0.0323). This result shows that the specialized design introduced to address the blurred boundaries and complex shapes of salt cave images, including the edge enhancement module, multi-scale edge feature fusion, the dual-branch output architecture, and the edge-aware hybrid loss function, improves the model's accuracy in recognizing and segmenting targets with weak boundaries. Regarding efficiency, the average inference time per batch in this embodiment is 1.223 seconds, higher than that of Comparative Example 1 (0.521 seconds) and Comparative Example 3 (0.799 seconds) but lower than that of Comparative Example 2 (1.283 seconds). Since salt cave physical simulation experiments typically acquire images at minute-level intervals, this inference time meets the real-time processing requirements and represents a good engineering balance between segmentation accuracy and computational efficiency.
[0146] Figure 6 visually compares the segmentation results of the four models on typical test images, showing the original image, the ground truth label mask, and the predicted mask of each model. As Figure 6 shows, the predictions of the Example are better than those of the comparative examples in regional integrity and edge fineness. In weak boundary regions in particular, the other models show varying degrees of missed detection or edge blurring, while the model of the present invention still maintains a clear outline, consistent with the quantitative indicators in Table 1.
[0147] Further analysis of the comparative models shows the following. Comparative Example 3 (DeepLabV3+) infers faster than the Example, but its segmentation accuracy is lower in salt cave weak-boundary scenarios, reflecting that its design, which relies on a large receptive field and spatial pyramid pooling, adapts poorly to the blurred boundaries and irregular morphological evolution of these images. Comparative Example 2 (UNet++) enhances feature reuse through dense skip connections but incurs the highest computational overhead, and its accuracy is markedly lower than that of this invention (IoU 0.8826 versus 0.9200). Comparative Example 1 (BasicUNet) has the fastest inference, but because it lacks dedicated modeling and optimization of edge information, its segmentation accuracy is slightly below that of this invention. In summary, only the embodiment of this invention, through the multi-module collaborative design and edge-aware supervision mechanism described above, achieves the best overall performance at an acceptable computational cost, verifying the effectiveness of the technical solution of this invention.
[0148] To systematically verify the effectiveness and synergy of the core modules designed in this invention, this embodiment compares model performance under different module combinations through ablation experiments. The results are shown in Figure 7, in which the red-filled areas in each row are the predicted segmentation masks generated by each model.
[0149] Baseline model definition:
[0150] To establish a performance benchmark, a basic U-Net architecture was first constructed. This model removed all edge optimization modules proposed in this invention (i.e., it did not include the edge enhancement module, multi-scale edge feature fusion module, and dual-branch output module), and replaced the bottleneck layer with a standard convolutional block (non-InceptionBlock structure). Specifically, this benchmark model adopted the same encoder-decoder structure as Comparative Example 1 (BasicUNet), and its loss function was exactly the same as the example, i.e., it adopted an edge-aware hybrid loss function (a weighted sum of Dice loss, binary cross-entropy loss, edge consistency loss, and edge refinement loss, with weight coefficients α=0.6, β=0.3, γ=0.1, and δ=0.2). It should be noted that although the loss function included an edge supervision term, due to the lack of an edge enhancement module, multi-scale edge feature fusion module, and dual-branch output module, the benchmark model could not effectively utilize the edge loss for optimization, and the edge loss had almost no positive impact on the training of this model. This benchmark model serves as a performance lower bound to measure the contribution of each optimization module.
[0151] Full model performance:
[0152] The complete model contains all three edge optimization modules (i.e., the EO-MMDC-UNet proposed in this invention). It achieves an intersection over union (IoU) of 0.9146, a Dice coefficient of 0.9553, and an edge similarity of 0.5796. Compared with the baseline model, these key indicators improve by 11.7%, 8.6%, and 56.6% (relative), respectively. This result demonstrates the effectiveness of the overall architecture of this invention in improving the accuracy of salt cave image segmentation.
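The two region metrics quoted above can be computed from a predicted mask and a ground-truth mask. The following is a minimal sketch in plain Python. It assumes masks are same-sized nested lists of 0/1 values and that two empty masks count as a perfect match. The function name `iou_and_dice` is hypothetical. The edge similarity metric is not defined in this section, so it is not sketched.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two same-sized binary masks given as nested lists of 0/1."""
    inter = union = p_sum = t_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p * t          # pixel predicted AND labelled as cavity
            union += max(p, t)      # pixel predicted OR labelled as cavity
            p_sum += p
            t_sum += t
    if union == 0:                  # assumption: two empty masks match perfectly
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (p_sum + t_sum)
    return iou, dice


pred   = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 0],
          [0, 0, 0]]
print(iou_and_dice(pred, target))  # → (0.75, 0.8571428571428571)
```

IoU divides the overlap by the union of the two masks. Dice divides twice the overlap by the sum of the mask sizes, so Dice is always at least as large as IoU for the same pair.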
[0153] Independent contribution analysis of each module:
[0154] To examine the independent contribution of each module in depth, three ablation models were constructed. Each ablation model disables a single target module while keeping the other modules and the training configuration unchanged.
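The leave-one-out protocol above can be expressed as configuration switches, where each ablation differs from the complete model in exactly one flag. This is an illustrative sketch only. The `ModelConfig` class, its field names, and `leave_one_out` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical on/off switches for the three edge optimization modules."""
    edge_enhancement: bool = True
    multiscale_edge_fusion: bool = True
    dual_branch_output: bool = True


# Baseline of paragraph [0150]: all three modules removed.
BASELINE = ModelConfig(False, False, False)


def leave_one_out(full: ModelConfig) -> dict:
    """Build one ablation config per module by disabling only that module."""
    return {f.name: replace(full, **{f.name: False}) for f in fields(full)}


for name, cfg in leave_one_out(ModelConfig()).items():
    print(f"without {name}: {cfg}")
```

The training configuration would be held fixed across all variants, so any metric difference can be attributed to the single disabled module.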
[0155] (1) Disable the dual-branch output module:
[0156] Model structure modification: Only the main segmentation branch is retained, and the edge refinement branch is removed; specifically, after the decoder output, the dual-branch feature concatenation and parallel output are no longer performed, and a single-channel main segmentation map is generated only through a single convolutional layer; at the same time, all network layers related to the edge refinement branch (i.e., the self.edge_refine convolutional sequence) are removed.
[0157] Training configuration adjustment: The edge refinement loss term is removed from the loss function, and the weights of the remaining three loss terms (Dice, BCE, and edge consistency loss) are renormalized according to their original proportions. Since the weight of the removed term is δ=0.2, the sum of the remaining weights... Because the values of α, β, and γ remain unchanged, no adjustment is needed in actual training. Only the composition of the loss function changes, and the edge refinement loss is no longer calculated during training.
[0158] Experimental results: With this module disabled, the model's intersection over union (IoU) decreased to 0.8614, the Dice coefficient decreased to 0.9247, and the edge similarity dropped sharply to 0.4009. Relative to the complete model, the edge similarity decreased by 30.8%, the largest decrease among the three ablations. This shows that the dual-branch output module plays a crucial role in enabling the model to learn and generate geometrically consistent boundary information. It does so through its independent edge refinement branch and the corresponding loss supervision.
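The renormalization described above can be checked numerically. This is a sketch assuming that "renormalized according to their original proportions" means rescaling the remaining weights to sum to 1 while keeping their ratios. The weight values are those given for the baseline in paragraph [0150]. The dictionary keys and the function name are hypothetical.

```python
import math


def renormalize_without(weights, removed):
    """Drop one loss term and rescale the remaining weights to sum to 1,
    keeping their original proportions (assumed meaning of 'renormalized')."""
    kept = {k: w for k, w in weights.items() if k != removed}
    total = sum(kept.values())
    return {k: w / total for k, w in kept.items()}


weights = {"dice": 0.6, "bce": 0.3, "edge_consistency": 0.1, "edge_refine": 0.2}
new = renormalize_without(weights, "edge_refine")
for name, w in new.items():
    # alpha + beta + gamma already equal 1.0, so the weights stay
    # unchanged up to floating-point rounding
    assert math.isclose(w, weights[name])
print(new)
```

This illustrates why, under these particular weight values, no adjustment of α, β, and γ is needed in practice.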
[0159] (2) Disable the multi-scale edge feature fusion module:
[0160] Model structure modification: The edge features from the first three stages of the decoder are no longer fused; only the edge feature map output from the first stage of the decoder is used as the edge representation. Specifically, the multi-scale edge feature fusion module (i.e., the edge_fusion convolutional sequence) is removed, and the feature map output from the first stage edge enhancement module of the decoder is directly input into the subsequent dual-branch output module. The edge enhancement modules of each stage of the decoder are retained, but cross-stage feature fusion is no longer performed.
[0161] Training configuration adjustment: The loss function remains unchanged (still containing four loss terms) because disabling a module does not affect the form of the loss function, but only the feature flow of the network's forward propagation.
[0162] Experimental results: With this module disabled, the model's intersection over union (IoU) decreased to 0.8626, the Dice coefficient decreased to 0.9254, and the edge similarity decreased to 0.4601. Relative to the complete model, the edge similarity decreased by 20.6% and the IoU by 5.7%. This shows that fusing multi-scale edge information effectively enhances the model's ability to capture both global contour structure and local detail features. The fusion contributes substantially to both edge recognition quality and overall region segmentation accuracy.
[0163] (3) Disable the edge enhancement module:
[0164] Model structure modification: In each stage of the decoder, the edge enhancement module is removed; that is, Sobel edge extraction, channel attention weighting and residual connection operations are no longer performed on the features output by the decoder convolutional blocks, and the decoder output is directly passed to the next stage or the final output layer; at this time, the input of the multi-scale edge feature fusion module is changed to directly use the original output features of the convolutional blocks in each stage of the decoder.
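The Sobel edge extraction step mentioned above can be illustrated on a single feature channel. This minimal sketch computes the gradient magnitude with the standard 3×3 Sobel kernels using cross-correlation, as deep learning frameworks do. It assumes zero padding at the border. The channel attention weighting and the residual connection are not part of this sketch, and `sobel_magnitude` is a hypothetical name.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]


def sobel_magnitude(img):
    """Gradient magnitude of a 2-D map (nested lists), zero-padded at the border."""
    h, w = len(img), len(img[0])

    def px(r, c):
        # Out-of-bounds pixels are treated as 0 (zero padding).
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = px(r + i - 1, c + j - 1)
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out


step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]   # vertical step edge
print(sobel_magnitude(step)[1])
```

The magnitude is large only at columns adjacent to the step. Zero padding also produces a spurious response at the right border, which is one reason frameworks offer other padding modes.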
[0165] Training configuration adjustment: The loss function remains unchanged because disabling the edge enhancement module does not affect the calculation of the loss function (the edge consistency loss and edge refinement loss are still calculated based on the main segmentation map and the edge refinement map, only the feature quality is reduced).
[0166] Experimental results: With this module disabled, the model's intersection over union (IoU) decreased to 0.8542, the Dice coefficient decreased to 0.9097, and the edge similarity decreased to 0.5475. Notably, the IoU decreased by 6.6% relative to the complete model, the largest IoU decrease among the three ablations, while the edge similarity decreased by only 5.5%. This indicates that the edge enhancement module strengthens the boundary response in the feature maps through its attention mechanism. Its main contribution is a more accurate separation of the foreground (cavity) from the background. It therefore has the most direct and significant impact on the completeness and accuracy of region segmentation.
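The percentage decreases quoted in the three ablation results can be reproduced from the reported metric values. They are relative changes with respect to the complete model, not differences in percentage points. The script below recomputes them. Only the numbers come from the text; the dictionary layout and `relative_drop` are illustrative.

```python
# Reported metrics (IoU, Dice, edge similarity) from the ablation study above.
FULL = {"iou": 0.9146, "dice": 0.9553, "edge": 0.5796}
ABLATIONS = {
    "no dual-branch output":      {"iou": 0.8614, "dice": 0.9247, "edge": 0.4009},
    "no multi-scale edge fusion": {"iou": 0.8626, "dice": 0.9254, "edge": 0.4601},
    "no edge enhancement":        {"iou": 0.8542, "dice": 0.9097, "edge": 0.5475},
}


def relative_drop(full, ablated):
    """Relative decrease in percent with respect to the complete model."""
    return 100.0 * (full - ablated) / full


for name, metrics in ABLATIONS.items():
    drops = {k: round(relative_drop(FULL[k], v), 1) for k, v in metrics.items()}
    print(name, drops)
```

For example, the edge similarity drop for the dual-branch ablation is (0.5796 − 0.4009) / 0.5796 ≈ 30.8%.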
[0167] The ablation experiments described above fully demonstrate the necessity, effectiveness, and synergistic effect of each module in this invention. The absence of any single module leads to performance degradation, and different modules contribute differently to region segmentation and edge recognition: the dual-branch output module has the greatest impact on edge geometric consistency, the edge enhancement module contributes most significantly to region segmentation accuracy, and the multi-scale fusion module balances both. The complete model significantly outperforms any ablation version, verifying the irreplaceable nature and synergistic gain effect of the module combination in this invention.
[0168] Based on the above experimental results and analysis, the intelligent salt cave morphology recognition method based on edge optimization and evolutionary constraints proposed in this invention demonstrates significant technological advancement and engineering application value. First, the constructed EO-MMDC-UNet model achieves outstanding performance in salt cave cavity segmentation. Its core innovation is a deep collaborative framework that addresses the inherent challenges of salt cave images: blurred boundaries, complex morphology, and multi-scale evolution. The framework consists of an edge enhancement module, a multi-scale edge feature fusion module, a dual-branch output module, and an edge-aware hybrid loss function. The ablation experiments demonstrate that removing any of these modules leads to a significant decrease in model performance, confirming the necessity of each module. More importantly, the modules neither work in isolation nor are simply superimposed; rather, they form a tightly coupled, mutually reinforcing mechanism. The edge enhancement module strengthens local boundary responses and provides refined features for multi-scale fusion. The multi-scale fusion module adaptively integrates structural information from different levels to construct robust edge representations. The dual-branch output and the corresponding loss function together form a closed-loop optimization path for region segmentation and boundary refinement. This collaborative design spans feature enhancement, information fusion, and joint optimization. As a result, the performance improvement of the complete model far exceeds the simple sum of the independent contributions of the modules, producing a significant "1+1+1>3" effect.
[0169] The value of this method extends beyond obtaining high-precision pixel-level segmentation results; it also enables a closed-loop transformation from images to engineering knowledge. On one hand, the employed three-level hybrid annotation strategy significantly improves dataset construction efficiency while ensuring annotation quality. On the other hand, this method innovatively connects deep learning segmentation with quantitative engineering analysis. Through a built-in morphological feature analyzer, it can automatically extract morphological parameters covering multiple dimensions such as area, shape, and contour from the segmentation mask, and analyze the evolution trend of the cavity based on time-series data, thus providing a direct quantitative analysis tool for cavity construction process optimization and interlayer stability research. Ultimately, the entire system, while meeting the timeliness requirements of practical engineering applications, achieves intelligent analysis of salt cavern morphology that is "segmentable, quantifiable, and predictable," providing strong technical support for the construction and safety evaluation of layered salt rock underground reservoirs.
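A morphological feature analyzer of the kind described above could compute several of the listed parameters directly from a binary mask. This is a minimal sketch in plain Python, and all definitions here are assumptions rather than the patent's own:

- Perimeter is the count of pixel sides that touch background.
- Equivalent diameter is the diameter of a circle of equal area.
- Roundness is 4πA/P².
- Aspect ratio is bounding-box width over height.
- Eccentricity is that of the ellipse with the same second central moments.

Solidity and edge smoothness are omitted because they need a convex hull and a smoothness definition that this section does not give. `shape_parameters` is a hypothetical name.

```python
import math


def shape_parameters(mask):
    """A few geometric descriptors of the foreground (1) pixels of a binary mask."""
    h, w = len(mask), len(mask[0])
    pts = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    area = len(pts)
    if area == 0:
        return None

    # Perimeter (assumed definition): pixel sides touching background or the image border.
    perimeter = 0
    for r, c in pts:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                perimeter += 1

    eq_diameter = math.sqrt(4 * area / math.pi)       # circle of equal area
    roundness = 4 * math.pi * area / perimeter ** 2   # assumed isoperimetric definition

    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    aspect_ratio = (max(cols) - min(cols) + 1) / (max(rows) - min(rows) + 1)

    # Eccentricity from the eigenvalues of the second-central-moment (covariance) matrix.
    mr, mc = sum(rows) / area, sum(cols) / area
    srr = sum((r - mr) ** 2 for r in rows) / area
    scc = sum((c - mc) ** 2 for c in cols) / area
    src = sum((r - mr) * (c - mc) for r, c in pts) / area
    half_trace = (srr + scc) / 2
    root = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    eccentricity = math.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0

    return {"area": area, "perimeter": perimeter, "equivalent_diameter": eq_diameter,
            "roundness": roundness, "aspect_ratio": aspect_ratio,
            "eccentricity": eccentricity}


mask = [[0] * 6,
        [0, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 0],
        [0] * 6]
print(shape_parameters(mask))
```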
[0170] The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
[0171] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention are included within the scope of protection of the present invention.
Claims
1. A method for intelligent recognition of salt cave morphology based on edge optimization and evolutionary constraints, characterized in that it comprises the following steps:
Step S1: constructing an image dataset from physical simulation experiments of water-dissolution cavity formation in layered salt rock, and labeling the cavity regions in the images using a three-level hybrid annotation strategy;
Step S2: based on the image dataset, constructing the EO-MMDC-UNet image segmentation model with edge optimization and dense multi-module connections, the model consisting of an encoder module, a bottleneck layer module, a decoder module, a multi-scale edge feature fusion module, and a dual-branch output module;
Step S3: training the model using an edge-aware hybrid loss function, the edge-aware hybrid loss function being a weighted combination of Dice loss, binary cross-entropy loss, edge consistency loss, and edge refinement loss;
Step S4: preprocessing the input image and processing it with the model trained in Step S3 to output a main segmentation map and an edge refinement map; and binarizing the main segmentation map to obtain a binarized segmentation mask, wherein the edge refinement map provides refinement information about the cavity boundary to assist the analysis;
Step S5: processing and analyzing the binarized segmentation mask, extracting parameters describing the geometry of the cavity, and analyzing the evolution pattern based on time-series data.
2. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S1, the hybrid annotation strategy includes the following:
- converting the RGB image to the HSV color space, and setting thresholds on the hue channel in the ranges of 0-10 degrees and 160-180 degrees for automatic annotation;
- performing a morphological opening operation on the image with a 3×3 elliptical kernel and a morphological closing operation with a 7×7 elliptical kernel to obtain an initial binary mask;
- selecting 10% of the keyframe images in the dataset and manually annotating them with an interactive tool;
- on the basis of the binary masks, directly replacing the corresponding automatic annotation results with the manually annotated files, so that a single final label file is generated for each image in the dataset.
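The automatic part of this annotation strategy can be sketched as follows. Pixels are thresholded on hue, then the mask is cleaned with a 3×3 opening and a 7×7 closing. This pure-Python sketch makes several assumptions:

- The hue bounds are read on the 0-180 hue scale used for 8-bit images in OpenCV, where one unit is 2 degrees.
- A minimum saturation `SAT_MIN` is added, which the claim does not state, so that grey pixels (whose hue is undefined) are not selected.
- The "elliptical" kernel is approximated by a digital disk.
- Out-of-image neighbours are ignored.

The manual keyframe replacement step is not shown. All function names are hypothetical.

```python
import colorsys

HUE_RANGES = [(0, 10), (160, 180)]   # from the claim; assumed 0-180 hue scale
SAT_MIN = 0.2                          # hypothetical: excludes grey pixels


def ellipse_offsets(k):
    """Offsets of a k x k elliptical (here circular) structuring element."""
    r = k // 2
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]


def _morph(mask, offsets, use_all):
    """Erosion (use_all=True) or dilation (use_all=False); out-of-image pixels ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = int(all(vals) if use_all else any(vals))
    return out


def erode(mask, k):
    return _morph(mask, ellipse_offsets(k), use_all=True)


def dilate(mask, k):
    return _morph(mask, ellipse_offsets(k), use_all=False)


def red_mask(rgb):
    """Threshold an RGB image (nested lists of 0-255 triples) on hue and saturation."""
    mask = []
    for row in rgb:
        out_row = []
        for r, g, b in row:
            h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue = h * 180  # colorsys returns hue in [0, 1); map to the 0-180 scale
            hit = s >= SAT_MIN and any(lo <= hue <= hi for lo, hi in HUE_RANGES)
            out_row.append(int(hit))
        mask.append(out_row)
    return mask


def initial_mask(rgb):
    m = red_mask(rgb)
    m = dilate(erode(m, 3), 3)   # opening (3x3): removes isolated specks
    m = erode(dilate(m, 7), 7)   # closing (7x7): fills small holes inside the cavity
    return m
```

Opening before closing means isolated false positives are removed first, so the larger closing kernel cannot merge them into the cavity region.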
3. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S2, the encoder module consists of an initial convolutional block and four sequentially connected downsampling stages. The initial convolutional block includes two convolutional layers with a kernel size of 3×3, a stride of 1, and padding of 1, each followed by a batch normalization layer and a ReLU activation function. Each downsampling stage includes a convolutional block and a max pooling layer, wherein the last downsampling stage contains only one convolutional block.
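The size bookkeeping behind the encoder and decoder can be checked with the usual output-length formulas. A 3×3 convolution with stride 1 and padding 1 preserves spatial size. The 2×2, stride-2 transposed convolution of claim 5 doubles it. The max pooling window is not specified, so a 2×2, stride-2 pool is assumed here, and the input side length 256 is hypothetical.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial output length of a convolution or pooling window (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


def tconv_out(n, kernel, stride=1, padding=0):
    """Spatial output length of a transposed convolution (no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


n = 256                                   # hypothetical input side length
assert conv_out(n, 3, 1, 1) == n          # 3x3 conv, stride 1, padding 1: size kept
half = conv_out(n, 2, 2)                  # assumed 2x2 max pooling, stride 2: size halved
assert half == n // 2
assert tconv_out(half, 2, 2) == n         # 2x2 transposed conv, stride 2: size doubled

# With an odd size the round trip loses a pixel (255 -> 127 -> 254).
print(conv_out(255, 2, 2), tconv_out(conv_out(255, 2, 2), 2, 2))
```

The odd-size case shows why the skip connections in claim 5 resample encoder features to align spatial dimensions before concatenation.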
4. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S2, the bottleneck layer module is located at the end of the encoder and adopts a bottleneck block structure with parallel convolutional branches. The parallel convolutional branches include:
- a first branch using a 1×1 convolutional kernel with a stride of 1;
- a second branch using a 3×3 convolutional kernel with a stride of 1 and padding of 1;
- a third branch using a 3×3 convolutional kernel with a stride of 1 and padding of 1;
- a fourth branch that first performs average pooling with a 3×3 window, a stride of 1, and padding of 1, followed by a 1×1 convolution.

The number of output channels of every branch is set to one quarter of the number of input channels. The outputs of the parallel branches are concatenated along the channel dimension, and feature fusion is performed through a processing layer containing a batch normalization layer and a ReLU activation function.
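Two properties of this bottleneck can be illustrated without a deep learning framework. First, the 3×3, stride-1, padding-1 average pooling of the fourth branch keeps the spatial size. Second, four branches of C/4 channels concatenate back to C channels. The sketch assumes zero-padded cells are counted in the mean, as with PyTorch's default `count_include_pad=True`. The 512-channel input and the function names are hypothetical.

```python
def avg_pool3x3_same(fmap):
    """3x3 average pooling, stride 1, zero padding 1 (padded cells counted in the mean)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        total += fmap[y + dy][x + dx]
            out[y][x] = total / 9  # zero-padded cells add 0 but still count
    return out


def branch_channels(in_channels, n_branches=4):
    """Each branch outputs in_channels / 4 channels; concatenation restores in_channels."""
    if in_channels % n_branches:
        raise ValueError("input channels must be divisible by the number of branches")
    per_branch = in_channels // n_branches
    return [per_branch] * n_branches, per_branch * n_branches


print(branch_channels(512))                     # hypothetical 512-channel input
print(avg_pool3x3_same([[1.0] * 4 for _ in range(4)])[0])
```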
5. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S2, the decoder module consists of four consecutive upsampling stages. Each stage includes an upsampling block, a skip connection from the corresponding encoder layer, a convolutional block with the same structure as in the encoder, and an edge enhancement module. The upsampling block uses a transposed convolution with a kernel size of 2×2 and a stride of 2. The skip connection upsamples the feature map of the corresponding encoder layer to align the spatial dimensions, and then concatenates it with the current decoder features along the channel dimension. The edge enhancement module operates as follows:
- It processes the input features through convolutional layers to generate edge response features with a reduced number of channels.
- The edge response features are then reweighted by channel attention weights generated by global average pooling followed by convolution.
- Subsequently... the feature channel count is restored and fused with the original input of the module via a residual connection.
- The output is an edge probability map processed by the Sigmoid function.
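The data flow of the edge enhancement module can be traced with a toy forward pass on tiny nested-list feature maps. The steps are channel reduction, channel attention from global average pooling, channel restoration, a residual addition, and a final Sigmoid. This sketch makes several assumptions:

- All convolutions are taken as 1×1 (pointwise).
- The attention weights pass through a Sigmoid, as in squeeze-and-excitation blocks.
- The weight matrices are supplied by hand.

None of these details are specified in the claim, and all names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(fmaps, weight):
    """Pointwise (1x1) convolution: out[o] = sum_i weight[o][i] * fmaps[i]."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    return [[[sum(weight[o][i] * fmaps[i][y][x] for i in range(len(fmaps)))
              for x in range(w)] for y in range(h)] for o in range(len(weight))]


def edge_enhance(fmaps, w_reduce, w_att, w_restore):
    """Toy forward pass: reduce -> channel attention -> restore -> residual -> sigmoid."""
    edge = conv1x1(fmaps, w_reduce)                        # edge response, fewer channels
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in edge]  # global avg pool
    att = [sigmoid(sum(w_att[o][i] * pooled[i] for i in range(len(pooled))))
           for o in range(len(w_att))]                     # assumed sigmoid attention weights
    edge = [[[a * v for v in row] for row in ch] for a, ch in zip(att, edge)]  # reweight
    restored = conv1x1(edge, w_restore)                    # back to the input channel count
    return [[[sigmoid(r + f) for r, f in zip(r_row, f_row)]
             for r_row, f_row in zip(r_ch, f_ch)]
            for r_ch, f_ch in zip(restored, fmaps)]        # residual add, then sigmoid


ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]        # 2 channels, 2x2 each
out = edge_enhance(ones, [[0.5, 0.5]], [[1.0]], [[1.0], [1.0]])
print(out[0][0][0])
```

Because the residual path adds the original input before the Sigmoid, the module can only modulate the incoming features rather than replace them.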
6. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S2, the multi-scale edge feature fusion module receives the edge probability maps output by the edge enhancement modules in the first three upsampling stages of the decoder. It upsamples these edge feature maps to the target size and concatenates them along the channel dimension. It then processes the result through two consecutive 3×3 convolutional layers to generate a unified edge feature representation.
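The resize-and-stack part of this fusion can be sketched as follows. The sketch assumes nearest-neighbour interpolation, since the interpolation mode is not stated, and it represents each stage's edge map as a list of channels. The two 3×3 convolutions that follow the concatenation are not shown. Function names are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D map (assumed interpolation mode)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def fuse_edge_maps(stages, out_h, out_w):
    """Upsample every channel of every stage to the target size and concatenate
    along the channel dimension (stages: list of [channel, ...], channel: 2-D list)."""
    return [upsample_nearest(ch, out_h, out_w) for stage in stages for ch in stage]


stage1 = [[[0.3]]]                                         # 1 channel, 1x1
stage2 = [[[1, 2], [3, 4]]]                                # 1 channel, 2x2
stage3 = [[[0.0] * 4 for _ in range(4)]]                   # 1 channel, 4x4
fused = fuse_edge_maps([stage1, stage2, stage3], 4, 4)
print(len(fused), fused[1])
```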
7. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S2, the dual-branch output module includes a main segmentation branch and an edge refinement branch. The main segmentation branch concatenates the features output by the last convolutional block of the decoder with the features output by the multi-scale edge feature fusion module along the channel dimension, and then applies convolution to generate the main segmentation map of the cavity region. The edge refinement branch applies convolution to the features output by the multi-scale edge feature fusion module to generate an edge refinement map.
8. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S3, the edge-aware hybrid loss function is $L_{total} = \alpha L_{Dice} + \beta L_{BCE} + \gamma L_{edge} + \delta L_{refine}$, where $L_{total}$ is the edge-aware hybrid loss function, $L_{Dice}$ is the Dice loss, $L_{BCE}$ is the binary cross-entropy loss, $L_{edge}$ is the edge consistency loss, $L_{refine}$ is the edge refinement loss, and $\alpha$, $\beta$, $\gamma$, $\delta$ are the weighting coefficients corresponding to each loss.
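The weighted combination can be sketched as follows. The default weights are the values given in the description (α=0.6, β=0.3, γ=0.1, δ=0.2). The Dice and BCE terms use their common soft forms on flattened probability and label lists. The edge consistency and edge refinement terms are passed in precomputed, because their exact formulas are not given in this section. The smoothing constants and function names are assumptions.

```python
import math


def dice_loss(probs, labels, eps=1e-6):
    """Soft Dice loss: 1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps)."""
    inter = sum(p * g for p, g in zip(probs, labels))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(labels) + eps)


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, g in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(g * math.log(p) + (1 - g) * math.log(1 - p))
    return total / len(probs)


def hybrid_loss(probs, labels, l_edge, l_refine,
                alpha=0.6, beta=0.3, gamma=0.1, delta=0.2):
    """alpha*Dice + beta*BCE + gamma*edge consistency + delta*edge refinement.
    l_edge and l_refine are supplied precomputed (their definitions are not sketched)."""
    return (alpha * dice_loss(probs, labels) + beta * bce_loss(probs, labels)
            + gamma * l_edge + delta * l_refine)


print(hybrid_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0], l_edge=0.05, l_refine=0.1))
```

Dice rewards overlap of the whole region and is insensitive to class imbalance. BCE supplies dense per-pixel gradients. The two edge terms add boundary-specific supervision on top of these.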
9. The intelligent recognition method for salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S4, the preprocessing of the input image includes: cropping the image using a center-symmetric cropping strategy; scaling the cropped image to a fixed size; and normalizing and standardizing the pixel values of the scaled image. In the binarization process, a probability threshold of 0.5 is set, and pixels with a probability greater than this threshold are identified as cavity regions.
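The preprocessing and binarization steps can be sketched for a single-channel image in plain Python. This sketch makes several assumptions:

- "Center-symmetric cropping" is taken to mean cutting the largest centered square.
- Resizing uses nearest-neighbour sampling.
- The target size of 256 is a placeholder for the unspecified fixed size.
- Standardization uses hypothetical mean and standard deviation values of 0.5.

Only the 0.5 threshold and the strict "greater than" comparison come from the claim.

```python
def center_crop_square(img):
    """Crop the largest square centred in the image (assumed centre-symmetric crop)."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (assumed interpolation)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


def normalize(img, mean=0.5, std=0.5):
    """Scale 0-255 values to [0, 1], then standardise with hypothetical mean and std."""
    return [[(v / 255.0 - mean) / std for v in row] for row in img]


def preprocess(img, size=256):
    return normalize(resize_nearest(center_crop_square(img), size))


def binarize(prob_map, threshold=0.5):
    """Pixels with probability strictly greater than the threshold become cavity (1)."""
    return [[int(p > threshold) for p in row] for row in prob_map]


print(binarize([[0.5, 0.51], [0.2, 0.9]]))  # → [[0, 1], [0, 1]]
```

With a strict comparison, a pixel whose predicted probability is exactly 0.5 is assigned to the background.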
10. The method for intelligent recognition of salt cave morphology based on edge optimization and evolutionary constraints according to claim 1, characterized in that, in step S5, the parameters describing the geometry of the cavity include area, perimeter, equivalent diameter, solidity, eccentricity, roundness, aspect ratio, and edge smoothness. The evolution law analysis based on time-series data includes:
- taking as input a series of binarized segmentation masks, obtained by processing through steps S1 to S4 the images collected in chronological order during the same physical simulation experiment;
- calculating the morphological parameters of the mask at each time point;
- fitting the trend of each parameter over time by linear regression, the rate of change being defined as the slope of the linear regression;
- quantitatively predicting changes in cavity area, shape regularity, expansion direction preference, and eccentricity by comprehensively analyzing the rates of change of the parameters.
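The rate of change of a parameter can be computed as an ordinary least-squares slope over the time points. The per-frame parameter dictionaries below are hypothetical illustration data, not experimental results.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times (the rate of change)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time points")
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    return sxy / sxx


def evolution_rates(times, params_per_frame):
    """Slope of every morphological parameter across a time series of masks."""
    names = params_per_frame[0].keys()
    return {k: slope(times, [p[k] for p in params_per_frame]) for k in names}


# Hypothetical parameters measured on four masks taken at times 0..3.
frames = [{"area": 100, "roundness": 0.90},
          {"area": 110, "roundness": 0.88},
          {"area": 120, "roundness": 0.86},
          {"area": 130, "roundness": 0.84}]
print(evolution_rates([0, 1, 2, 3], frames))
```

A positive area slope indicates cavity growth. A negative roundness slope indicates the shape is becoming less regular over the experiment.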