A method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios

By constructing a teacher network and a lightweight student network, and combining a dynamic computation mask with a distillation loss function, the method addresses fuzzy boundary detection and the difficulty of deploying large models in riverbank collapse monitoring, achieving high-precision, low-power real-time monitoring.

CN121937474B · Active · Publication Date: 2026-06-30 · NANJING HYDRAULIC RES INST

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NANJING HYDRAULIC RES INST
Filing Date: 2026-03-27
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing technologies for monitoring riverbank collapse suffer from several problems, including difficulty in accurately segmenting weak boundary features, large model size making lightweight deployment impossible, and insufficient generalization ability of knowledge distillation.

Method used

A teacher network with region segmentation and boundary detection branches is constructed. Under a dynamically computed mask, dual-path convolution differentiated into sparse and dense paths is performed, and collaborative correlation features are extracted. A lightweight student network is then constructed, and its feature distribution is optimized with a distillation loss function to produce a lightweight semantic segmentation model.

Benefits of technology

It achieves high-precision, low-power real-time monitoring of riverbank collapse and solves the problems of fuzzy boundary detection and high inference latency of edge-deployed models.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios. The method includes: training a teacher network containing dual-path convolution and dynamic masking mechanisms using a riverbank collapse image dataset, introducing a geometric consistency loss during training to strengthen boundary features; extracting from the teacher network collaborative correlation features that represent the structured dependencies between the boundary detection and region segmentation tasks; and constructing a single-branch lightweight student network whose training is guided by a distillation loss function containing alignment terms for the collaborative correlation features. This invention eliminates background redundancy through dynamic sparse computation and transfers prior knowledge from the dual-task network to the single-task network through function-attribution-guided collaborative distillation, solving the problems of fuzzy boundary detection and high inference latency of edge-deployed models, thus achieving high-precision, low-power real-time monitoring.

Description

Technical Field

[0001] This invention relates to the fields of water conservancy monitoring and computer vision technology, and in particular to a method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios.

Background Technology

[0002] Riverbank collapse is a common natural disaster in alluvial rivers. Its evolution is sudden and insidious. Traditional monitoring methods mainly rely on manual patrols or fixed-point measurements, making it difficult to achieve all-weather, high-precision real-time early warning. With the development of artificial intelligence technology, using semantic segmentation technology to automatically identify riverbank collapses from video surveillance images has become a research hotspot, which helps to ensure the safety of dikes and maintain the stability of waterways.

[0003] Currently, methods based on deep convolutional neural networks (such as U-Net and the DeepLab series) have been applied to feature extraction from remote sensing imagery or surveillance videos. These methods typically employ a deep encoder-decoder structure, performing numerous convolutional operations on a dense grid to extract features, and utilizing multi-scale fusion strategies to recover spatial details. In practical deployments, the trained large model is usually deployed on a cloud server, or the model is deployed on edge devices after simple channel pruning to achieve continuous monitoring of riverbanks.

[0004] However, existing technologies still face challenges in riverbank collapse monitoring scenarios, such as difficulty in accurately segmenting weak boundary features, large model size making lightweight deployment impossible, and insufficient generalization ability of knowledge distillation.

Summary of the Invention

[0005] The purpose of this invention is to provide a method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios, thereby solving the aforementioned technical problems existing in the prior art.

[0006] In one aspect, the technical solution provides a method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios, including:

[0007] Obtain a dataset of bank collapse monitoring images that includes the bank collapse evolution process. The dataset includes the original images, corresponding bank collapse area segmentation labels, and boundary auxiliary labels.

[0008] A teacher network containing region segmentation and boundary detection branches is trained using a dataset. During training, a dynamic computation mask is generated based on the local texture attributes and gradient direction attributes of the input feature map. Based on the dynamic computation mask, a dual-path convolution calculation with sparse and dense differentiation is performed in the feature extraction layer to obtain the trained teacher model.

[0009] Extract collaborative correlation features representing the structured dependency between boundary detection and region segmentation tasks in the teacher model, and construct a lightweight student network with a single-branch structure;

[0010] A distillation loss function is constructed based on the collaborative correlation features. The distillation loss function is used to constrain the feature distribution of the lightweight student network. The parameters of the lightweight student network are optimized in combination with the dataset to obtain a lightweight semantic segmentation model.

[0011] Beneficial effects: This invention eliminates background redundancy through dynamic sparse computation and transfers dual-task prior knowledge to a single-task network through function-based collaborative distillation, solving the problems of fuzzy boundary detection and high inference latency of edge-end models, and achieving high-precision, low-power real-time monitoring.

Attached Figure Description

[0012] Figure 1 is a flowchart of the method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios provided in an embodiment of this application.

[0013] Figure 2 is a flowchart of the generation of boundary auxiliary labels provided in an embodiment of this application.

[0014] Figure 3 is a flowchart of generating a dynamic computation mask based on the local texture attributes and gradient direction attributes of the input feature map, as provided in an embodiment of this application.

[0015] Figure 4 is a flowchart of computing the consistency of gradient direction angles within a preset spatial neighborhood, as provided in an embodiment of this application.

[0016] Figure 5 is a flowchart of generating a dynamic computation mask from the local complexity map and the direction consistency map using a differentiable threshold function, as provided in an embodiment of this application.

Detailed Implementation

[0017] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0018] It should be noted that the terms "first," "second," etc., in the specification and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "including" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0019] To address the aforementioned issues, the applicant conducted in-depth searches and analyses, and discovered:

[0020] First, the boundaries of collapsed banks exhibit weak texture and strong noise. Conventional convolutional networks struggle to capture subtle cracks and steep slopes against complex water surfaces, resulting in blurred boundary segmentation.

[0021] Second, the dense computing networks used in pursuit of high precision have a large number of parameters, making them difficult to run in real time on edge terminals with limited computing power, while model pruning can cause the loss of key features.

[0022] Furthermore, lightweight distillation methods typically focus only on fitting feature map values, neglecting the inherent structured dependency between boundary extraction and region segmentation tasks in bank collapse detection. This results in student networks failing to truly inherit the reasoning logic of teacher networks and exhibiting insufficient generalization ability in complex scenarios.

[0023] To solve these problems, the present invention is described in detail below through the following embodiments, with reference to Figures 1 to 5.

[0024] This application provides an exemplary scheme for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios. In other words, it corresponds to a lightweight semantic segmentation model construction method for riverbank collapse monitoring. It addresses the technical problems of existing semantic segmentation models in riverbank collapse monitoring scenarios, such as blurred edge detection, high computational resource consumption, and insufficient utilization of cross-task feature associations. Specifically, it includes:

[0025] Step 101: Obtain a bank collapse monitoring image dataset containing the bank collapse evolution process. The dataset includes the original images, corresponding bank collapse area segmentation labels, and boundary auxiliary labels.

[0026] Specifically, high-resolution images of the bank collapse area can be collected periodically by deploying fixed monitoring cameras along the riverbank or by drones. The raw images are in RGB format, with a resolution of 512×512 pixels or higher to preserve sufficient texture detail. The collected raw images require manual or semi-automatic annotation. The bank collapse area segmentation label is a pixel-level binary image, where a pixel value of 1 represents a bank collapse area, and a pixel value of 0 represents a non-collapse area, such as water, vegetation, or normal embankment. Boundary auxiliary labels are specifically designed to enhance the model's sensitivity to edges; they contain weight information indicating the transition from the defined bank collapse boundary to both sides, used to give greater attention to pixels near the boundary during training.

[0027] Step 102: Train a teacher network that includes a region segmentation branch and a boundary detection branch using the dataset. During the training process, a dynamic computation mask is generated based on the local texture attributes and gradient direction attributes of the input feature map. Based on the dynamic computation mask, a dual-path convolution calculation with sparse and dense differentiation is performed on the feature extraction layer to obtain the trained teacher model.

[0028] The teacher network is designed as a dual-branch structure with a large number of parameters, which can fully extract feature information from the bank collapse image. Specifically, the teacher network can use ResNet-50 or ResNet-101 residual networks as the backbone feature extraction network, and then it is divided into two parallel branches: one is a region segmentation branch, which is used to output a large-scale region mask; the other is a boundary detection branch, which focuses on extracting edge lines.

[0029] Optionally, this step introduces a dynamic sparse convolution mechanism to improve the efficiency and accuracy of the teacher network when processing bank collapses with specific morphological features (such as linear cracks and steep slopes). Traditional convolutional networks perform dense computation on all pixels equally, even though most areas (such as gentle water surfaces or grasslands) carry little information. Therefore, this approach analyzes the local attributes of the input feature map. Local texture attributes reflect the roughness and complexity of the image, while gradient direction attributes indicate the direction of potential edges. Based on these two attributes, the network automatically generates a binary or soft-valued dynamic computation mask. High response values in this mask, close to 1, correspond to the bank collapse boundary and its surrounding complex regions, while low response values, close to 0, correspond to flat background.

[0030] Based on this, the feature extraction layer is designed as a dual-path structure: the dense computation path targets high-response regions in the mask and uses standard convolution kernels, such as 3×3 convolutions, to preserve spatial details; the sparse computation path targets low-response regions in the mask and uses lightweight convolution operations, such as depthwise separable convolutions or dimensionality-reducing convolutions, to extract only the necessary background semantic information.

[0031] Furthermore, the two feature paths are fused using a mask as weights. This mechanism allows the teacher network to concentrate its computational resources on the most critical collapse boundaries, achieving feature extraction efficiency during the training phase and laying the foundation for subsequent guidance of the student network.
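The mask-weighted fusion of the two paths can be illustrated with a minimal sketch. The text only states that the mask is used as the fusion weight; the convex blend `m * dense + (1 - m) * sparse` used here is an assumption, and `fuse_dual_path` and the toy values are hypothetical names and data chosen for illustration.

```python
def fuse_dual_path(dense, sparse, mask):
    """Blend two same-sized 2D feature maps pixel by pixel.

    Mask values near 1 favour the dense path, values near 0 the sparse path.
    The convex blend is an assumed form of "using the mask as weights".
    """
    return [
        [m * d + (1.0 - m) * s for d, s, m in zip(d_row, s_row, m_row)]
        for d_row, s_row, m_row in zip(dense, sparse, mask)
    ]


dense_out = [[4.0, 4.0], [4.0, 4.0]]    # stand-in for the standard 3x3 conv output
sparse_out = [[1.0, 1.0], [1.0, 1.0]]   # stand-in for the lightweight conv output
mask = [[1.0, 0.0], [0.5, 0.25]]        # dynamic computation mask in [0, 1]

fused = fuse_dual_path(dense_out, sparse_out, mask)
print(fused)  # [[4.0, 1.0], [2.5, 1.75]]
```

With a soft mask, both paths still have to be evaluated everywhere; computation is only saved if the dense path is skipped where the mask is (near) zero, which this sketch does not attempt.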

[0032] Step 103: Extract collaborative correlation features representing the structured dependency between the boundary detection task and the region segmentation task in the teacher model, and construct a lightweight student network with a single branch structure.

[0033] Optionally, a lightweight student network needs to be built. This student network typically adopts a single-branch structure to reduce the additional overhead of inference branches, and its backbone network can be a lightweight architecture such as MobileNetV3 or ShuffleNetV2.

[0034] In the teacher network, the boundary branch and the region branch are not independent; they have complementary and constraining relationships at the feature level. That is, the boundary defines the outline of the region, and the region fills the interior of the boundary. Extracting collaboratively related features refers to calculating the statistical correlation between boundary feature channels and region feature channels from the intermediate layer feature maps of the teacher network. For example, the Pearson correlation coefficient between the feature map of the i-th channel of the boundary branch and the feature map of the j-th channel of the region branch can be calculated to construct a correlation coefficient matrix. This matrix not only contains the spatial information of the image but also encodes knowledge of how boundary features assist in region segmentation and how region semantics constrain boundary extraction—that is, the structured dependencies between tasks.
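As a rough illustration of the correlation coefficient matrix described above, the following sketch computes the Pearson correlation between every boundary channel and every region channel. It assumes each channel has already been flattened to a list of H×W values; the function names and the handling of constant channels are assumptions, not part of the source.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: a constant channel is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(boundary_channels, region_channels):
    """Entry [i][j] correlates boundary channel i with region channel j.

    Each channel is a feature map flattened to a 1D list of H*W values.
    """
    return [[pearson(b, r) for r in region_channels] for b in boundary_channels]


# Toy data: two boundary channels and two region channels on a 2x2 grid.
boundary = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
region = [[0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 1.0, 1.0]]
corr = correlation_matrix(boundary, region)
```

In this toy case the first boundary channel is perfectly correlated with the first region channel, the second is perfectly anti-correlated with it, and the constant region channel yields zeros.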

[0035] Step 104: Construct a distillation loss function based on the collaborative correlation features, use the distillation loss function to constrain the feature distribution of the lightweight student network, and optimize the parameters of the lightweight student network in combination with the dataset to obtain a lightweight semantic segmentation model.

[0036] Training the student network as described above is a knowledge distillation process. Besides calculating the cross-entropy loss using the labels in the dataset, a distillation loss function also needs to be constructed. This distillation loss function is used to minimize the difference between the feature association patterns generated by the student network and the collaborative correlation features of the teacher network.

[0037] In other words, it is not required that every feature value of the student network be consistent with that of the teacher network, because the two have different architectures and feature dimensions. Instead, it is required that the feature channels within the student network can also exhibit a boundary-region collaborative pattern similar to that of the teacher network.

[0038] Alternatively, since the student network has a single-branch structure and lacks independent boundary detection and region segmentation branches, feature dimension alignment and function attribution are required when calculating the student's correlation coefficient matrix. Specifically, after the output of each layer of the encoder in both the teacher and student networks, a channel projection layer is added. This channel projection layer consists of a 1×1 convolutional kernel and is used to map feature maps from different layers to a unified channel dimension.
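A 1×1 convolution amounts to applying the same linear map across channels at every pixel. The sketch below shows that operation on nested lists, under the assumption that the bias term is omitted; `project_channels` and the weight values are hypothetical, since in practice the weights are learned.

```python
def project_channels(feature_map, weights):
    """Apply a 1x1 convolution: a per-pixel linear map across channels.

    feature_map: list of C_in channels, each an H x W nested list.
    weights: d x C_in matrix; row k holds the weights of output channel k.
    The bias term is omitted (an assumption made for brevity).
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [
            [
                sum(w * feature_map[c][y][x] for c, w in enumerate(row))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for row in weights
    ]


# Three input channels on a 1x2 grid, projected to d = 2 channels.
features = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
weights = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]  # hypothetical, normally learned
projected = project_channels(features, weights)
print(projected)  # [[[1.0, 2.0]], [[2.0, 3.0]]]
```

Because the projection changes only the channel count and not the spatial size, teacher and student layers with different widths can be brought to the same dimension d before their correlation matrices are compared.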

[0039] For the teacher network, the projected shallow features are selected as boundary features and the projected deep features as region features to calculate the teacher correlation coefficient matrix, which has a dimension of d×d. For the student network, the projected shallow features are likewise selected as boundary proxy features and the projected deep features as region proxy features, and the student correlation coefficient matrix, also of dimension d×d, is obtained using the same Pearson correlation coefficient calculation.

[0040] Specifically, the norm distance, such as the L2 norm, between the correlation coefficient matrices of teachers and students can be calculated and used as the distillation loss term. During training, the total loss function is a weighted sum of the task loss and the distillation loss. This total loss function is then used to continuously update the weight parameters of the student network through backpropagation algorithms, such as stochastic gradient descent (SGD) or the Adam optimizer.

[0041] Optionally, the total loss function is a weighted sum of the task loss and the distillation loss, as shown in the following formula:

[0042] $L_{\text{student}} = L_{\text{task}} + \lambda_d \times L_{\text{distill}}$;

[0043] Here, $L_{\text{student}}$ is the total loss function for student network training; $L_{\text{task}}$ is the cross-entropy loss calculated from the bank collapse area segmentation labels in the dataset; $L_{\text{distill}}$ is the distillation loss, namely the L2 norm distance between the teacher's correlation coefficient matrix and the student's correlation coefficient matrix; and $\lambda_d$ is the balance weight for the distillation loss. The specific value of $\lambda_d$ can be determined experimentally on the validation set. In one optional implementation, $\lambda_d$ is 1.0.
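The total loss can be sketched as follows. The source only says "L2 norm distance" between the two matrices; reading it as the Frobenius norm of their difference is an assumption, as are the function names and the toy matrices. The task loss is passed in as a precomputed number.

```python
import math


def l2_distance(teacher_corr, student_corr):
    """Frobenius norm of the difference between two d x d matrices.

    Using the Frobenius norm as the "L2 norm distance" is an assumption.
    """
    return math.sqrt(
        sum(
            (t - s) ** 2
            for t_row, s_row in zip(teacher_corr, student_corr)
            for t, s in zip(t_row, s_row)
        )
    )


def student_loss(task_loss, teacher_corr, student_corr, lambda_d=1.0):
    """Total loss: task loss plus lambda_d times the distillation loss."""
    return task_loss + lambda_d * l2_distance(teacher_corr, student_corr)


teacher = [[1.0, 0.0], [0.0, 1.0]]   # toy teacher correlation matrix
student = [[0.4, 0.0], [0.8, 1.0]]   # toy student correlation matrix
total = student_loss(task_loss=0.5, teacher_corr=teacher, student_corr=student)
```

Here the element-wise differences are 0.6 and 0.8, so the distillation term is approximately 1.0 and the total is approximately 1.5 with the default weight of 1.0.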

[0044] It should be understood that the resulting lightweight semantic segmentation model not only possesses the inference speed of a lightweight network, but also inherits the structured prior knowledge of the teacher network when dealing with complex boundary collapses through distillation, thus achieving a balance between speed and accuracy.

[0045] This application also describes how to generate boundary auxiliary labels containing uncertainty information, and how to improve the model's generalization ability through data preprocessing. Before soft labeling, the acquired raw images need to be preprocessed and enhanced to ensure the quality and diversity of the input data. Since bank collapse monitoring is often carried out outdoors under complex lighting and weather conditions, the acquired images may be blurry or distorted.

[0046] Optionally, the Structural Similarity Index (SSIM) can be used to extract keyframes from the video stream and remove redundant data. For the extracted keyframes, a bilateral filtering algorithm is used for denoising, which can remove noise while preserving edge information. For example, the spatial domain standard deviation of the bilateral filter is set to 75, and the pixel domain standard deviation to 0.1. Furthermore, distortion correction is performed on the image using a pre-calibrated camera intrinsic matrix and distortion coefficients to eliminate barrel distortion caused by wide-angle lenses.
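One way SSIM-based keyframe extraction might look is sketched below. This is a simplification under stated assumptions: SSIM is evaluated over a single global window rather than the usual sliding local windows, frames are flattened grayscale lists, and the threshold of 0.9 and the rule of comparing against the last kept keyframe are hypothetical choices not given in the source.

```python
def global_ssim(frame_a, frame_b, dynamic_range=255.0):
    """Single-window SSIM over two equal-length flattened grayscale frames."""
    n = len(frame_a)
    mu_a = sum(frame_a) / n
    mu_b = sum(frame_b) / n
    var_a = sum((p - mu_a) ** 2 for p in frame_a) / n
    var_b = sum((q - mu_b) ** 2 for q in frame_b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(frame_a, frame_b)) / n
    c1 = (0.01 * dynamic_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def select_keyframes(frames, threshold=0.9):
    """Keep a frame only if it differs enough from the last kept keyframe."""
    keyframes = [0]
    for index in range(1, len(frames)):
        if global_ssim(frames[keyframes[-1]], frames[index]) < threshold:
            keyframes.append(index)
    return keyframes


frames = [
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],     # identical to frame 0, so redundant
    [200.0, 150.0, 100.0, 50.0],  # scene changed
]
kept = select_keyframes(frames)
print(kept)  # [0, 2]
```

A production pipeline would more likely rely on a library implementation of windowed SSIM; the sketch is only meant to show how a similarity threshold removes near-duplicate frames.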

[0047] Based on this, data augmentation operations are performed. Considering the diversity of bank collapse morphology, random scale perturbations can be applied to the images, with scaling factors ranging from 0.75 to 1.5 times; simultaneously, perspective transformation is applied to simulate the shooting effects of drones at different pitch angles.

[0048] Step 201: Identify the deterministic bank collapse boundary in the bank collapse area segmentation label, and define the preset range on both sides of the deterministic bank collapse boundary as the uncertain area.

[0049] In this context, the labels for bank collapse areas are usually manually labeled. However, in actual physical scenarios, the boundaries of bank collapses, such as soil cracks or water-soil interfaces, are transition zones rather than absolute binary boundaries. To enable the model to learn the ambiguity and uncertainty of edges, a deterministic bank collapse boundary B with a single pixel width is extracted from the binary labels using edge detection algorithms (such as the Canny operator) or morphological operations (such as the difference between dilation and erosion).

[0050] Next, taking the deterministic boundary B as the center, a certain pixel distance is extended to both sides (the collapsed side and the non-collapsed side) to form a strip-shaped region, defined as the uncertain region. The width of the uncertain region is determined by a preset threshold $\delta_{\text{width}}$. For example, if $\delta_{\text{width}}$ is set to 15 pixels, then all pixels within a 15-pixel range on both sides of the boundary are considered to be in the uncertain region.

[0051] Step 202: Calculate the normalized distance from each pixel within the uncertain region to the deterministic collapse boundary.

[0052] Furthermore, for each pixel p within the uncertain region, its Euclidean distance $D_{\text{pixel}}$ to the nearest point on the deterministic bank collapse boundary B is calculated. The Euclidean distance is then normalized by the preset width threshold $\delta_{\text{width}}$ to yield the normalized distance $D_{\text{norm}}$. This calculation provides a unified metric for edge bands of different widths.

[0053] $D_{\text{norm}} = D_{\text{pixel}} / \delta_{\text{width}}$;

[0054] Here, $D_{\text{norm}}$ is the normalized distance, $D_{\text{pixel}}$ is the Euclidean distance from pixel p to the nearest boundary point, and $\delta_{\text{width}}$ is the half-width threshold of the uncertain region. For points on the boundary, $D_{\text{norm}}$ is 0; for points on the edge of the uncertain region, $D_{\text{norm}}$ is 1.

[0055] Step 203: Generate a boundary pixel weight map representing the pixel membership gradient based on the normalized distance, as a boundary auxiliary label.

[0056] Correspondingly, the boundary auxiliary labels are not 0 or 1, but rather a continuous probability map or weight map with values between 0 and 1. In other words, the closer a pixel is to the boundary, the higher its probability or importance of belonging to the boundary; the farther away, the lower its probability. This gradual design guides the model to impose different degrees of penalty on prediction deviations near the boundary during training, forcing the model to output smoother segmentation results that closely match the real physical edges.

[0057] Specifically, a quadratic function or a Gaussian function can be used to map the normalized distance to weight values. The weight values can be calculated using the following formula:

[0058] $W_{\text{pixel}} = 1 - (D_{\text{norm}})^2$;

[0059] Here, $W_{\text{pixel}}$ is the weight value of the pixel in the boundary auxiliary label, $D_{\text{norm}}$ is the normalized distance, and the superscript 2 denotes squaring.

[0060] According to this formula, a pixel located on the deterministic boundary, i.e., $D_{\text{norm}} = 0$, has weight $W_{\text{pixel}} = 1$, indicating the highest level of certainty; the weight decreases parabolically as the distance increases; at the edge of the uncertain region, i.e., when $D_{\text{norm}} = 1$, the weight falls to 0. For background pixels outside the uncertain region, the weights are uniformly set to 0. It should be understood that the generated boundary pixel weight map serves as a supervisory signal, participating in the training of the teacher and student networks along with the original binary labels, and is especially important when calculating the weighted cross-entropy loss or the function alignment loss.
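Steps 201 to 203 can be put together in a small sketch. Two simplifications are assumed: the deterministic boundary is taken as the collapse pixels that touch a non-collapse pixel (a stand-in for the Canny or morphological extraction mentioned in the text), and the nearest-boundary distance is found by brute force rather than a distance transform. All function names are hypothetical.

```python
import math


def boundary_pixels(label):
    """Collapse pixels (value 1) with a 4-connected non-collapse neighbour."""
    height, width = len(label), len(label[0])
    points = []
    for y in range(height):
        for x in range(width):
            if label[y][x] != 1:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and label[ny][nx] == 0:
                    points.append((y, x))
                    break
    return points


def boundary_weight_map(label, delta_width):
    """Weight 1 - D_norm^2 inside the uncertain band, 0 outside it."""
    points = boundary_pixels(label)
    if not points:  # no boundary in this label: nothing to emphasise
        return [[0.0] * len(label[0]) for _ in label]
    weights = []
    for y in range(len(label)):
        row = []
        for x in range(len(label[0])):
            d_pixel = min(math.hypot(y - by, x - bx) for by, bx in points)
            d_norm = d_pixel / delta_width
            row.append(1.0 - d_norm ** 2 if d_norm <= 1.0 else 0.0)
        weights.append(row)
    return weights


# One image row: non-collapse on the left, collapse on the right.
label = [[0, 0, 0, 0, 1, 1, 1, 1]]
weight_map = boundary_weight_map(label, delta_width=2)
print(weight_map)  # [[0.0, 0.0, 0.0, 0.75, 1.0, 0.75, 0.0, 0.0]]
```

The output shows the parabolic falloff: weight 1 on the boundary pixel, 0.75 at half the band width, and 0 at and beyond the band edge.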

[0061] This application also describes the specific implementation process of front-end dynamic mask generation in the DSC-Bank module for calculating dynamic sensitive areas. This is used to solve the computational redundancy problem in bank collapse monitoring, and can identify complex areas in the image that require focused calculation, such as cracks and steep bank edges, and negligible flat areas, such as calm water surfaces, guiding subsequent convolution calculations. Specifically, it includes:

[0062] Step 301: For the input feature map output by the feature extraction layer of the teacher network, calculate the local feature variance in the local neighborhood centered on each pixel to obtain a local complexity map representing the texture complexity.

[0063] Accordingly, the input feature map is typically derived from the shallow or mid-level outputs of a teacher network (such as ResNet), and its dimensions can be represented as (C, H, W), where C is the number of channels, and H and W are the spatial dimensions. Optionally, a statistical metric of local feature variance is used to evaluate the texture complexity of each pixel. For each location (x, y) on the feature map, a sliding window centered thereon is defined, such as a 3×3 or 5×5 square window.

[0064] Specifically, the mean vector of all pixel feature vectors within the window is calculated. The sum of the squares of the Euclidean distances between each pixel feature vector and the mean vector within the window is calculated, and the average is used to obtain the local variance value at that location. The larger the variance value, the more drastic the feature changes within the neighborhood, corresponding to areas with rich texture or edge boundaries; the smaller the variance value, the more uniform the features within the neighborhood, corresponding to flat regions.

[0065] To eliminate differences in the numerical ranges of features at different levels and facilitate subsequent processing, the calculated variance map can optionally be normalized. The normalization formula is as follows:

[0066] $M_{\text{var}}(x, y) = (\sigma(x, y) - \sigma_{\min}) / (\sigma_{\max} - \sigma_{\min})$;

[0067] Here, $M_{\text{var}}(x, y)$ is the value at coordinates (x, y) in the normalized local complexity map, $\sigma(x, y)$ is the originally calculated local variance, and $\sigma_{\min}$ and $\sigma_{\max}$ are the minimum and maximum of all local variances in the current feature map, respectively.
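The local variance computation and its min-max normalization might be sketched as follows. The clipping of windows at the image border and the all-zero result for a perfectly flat input are assumptions, since the source does not say how these cases are handled; `local_complexity_map` is a hypothetical name.

```python
def local_complexity_map(features, radius=1):
    """Min-max normalised local feature variance for a (C, H, W) feature map.

    features: list of C channels, each an H x W nested list.
    radius=1 gives a 3x3 window; windows are clipped at the image border.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    variance = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Feature vectors of all pixels inside the (clipped) window.
            window = [
                [features[c][j][i] for c in range(channels)]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            count = len(window)
            mean = [sum(v[c] for v in window) / count for c in range(channels)]
            # Mean squared Euclidean distance to the window's mean vector.
            variance[y][x] = sum(
                sum((v[c] - mean[c]) ** 2 for c in range(channels)) for v in window
            ) / count
    flat = [v for row in variance for v in row]
    v_min, v_max = min(flat), max(flat)
    if v_max == v_min:
        return [[0.0] * width for _ in range(height)]  # flat input: no texture
    return [[(v - v_min) / (v_max - v_min) for v in row] for row in variance]


# Single channel, one row with a step edge in the middle.
step = [[[0.0, 0.0, 0.0, 10.0, 10.0, 10.0]]]
complexity = local_complexity_map(step)
```

On this step-edge input the two pixels adjacent to the step receive values close to 1, while pixels whose window is uniform receive 0.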

[0068] Step 302: Calculate the gradient components of the input feature map in the horizontal and vertical directions, and determine the gradient direction angle of each pixel based on the gradient components.

[0069] Besides texture complexity, gradient direction is also a basis for judging boundary properties. This is because bank collapse cracks have directionality, while the direction of noise or messy textures is random. The Sobel or Prewitt operator can be used to process the input feature map. If it is multi-channel, the channel mean or maximum value can be taken first to compress it into a single channel; then convolution operations are performed to obtain the horizontal gradient component $G_x$ and the vertical gradient component $G_y$.

[0070] Based on these two components, the gradient magnitude and gradient direction angle of each pixel can be calculated. The formula for calculating the gradient direction angle is as follows:

[0071] $\theta(x, y) = \arctan(G_y(x, y) / G_x(x, y))$;

[0072] Here, $\theta(x, y)$ is the gradient direction angle at pixel (x, y), whose value range is usually $[-\pi/2, \pi/2]$ or $[-\pi, \pi]$; $G_y$ and $G_x$ are the vertical and horizontal gradient components, respectively; and arctan is the arctangent function. This angle reflects the normal direction of the local edge.
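A minimal sketch of the Sobel gradient and direction angle on a single-channel map is given below. It swaps the plain arctangent of the formula for `math.atan2`, which covers the $[-\pi, \pi]$ range and avoids division by zero when $G_x = 0$. The kernels are applied in correlation form (without flipping), and border pixels are left at zero; both are assumptions.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_direction(image):
    """Return (magnitude, angle) maps for the interior pixels of a 2D image.

    Border pixels are left at zero (an assumption; padding is another option).
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    angle = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            g_x = sum(
                SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            )
            g_y = sum(
                SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            )
            magnitude[y][x] = math.hypot(g_x, g_y)
            # atan2 covers [-pi, pi] and avoids dividing by G_x = 0.
            angle[y][x] = math.atan2(g_y, g_x)
    return magnitude, angle


# Vertical edge: intensity rises from left to right.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
magnitude, angle = gradient_direction(image)
print(magnitude[1][1], angle[1][1])  # 4.0 0.0
```

For the vertical edge in the example the angle is 0, meaning the gradient points horizontally, which is the normal of the edge as stated in the text.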

[0073] Step 303: Calculate the consistency of gradient direction angles within a preset spatial neighborhood to obtain a direction consistency map representing the possibility of boundary existence.

[0074] The consistency of gradient direction angles within the preset spatial neighborhood is computed as follows:

[0075] Based on the bank collapse area segmentation label or the prediction results of the previous frame, principal component analysis is used to estimate the direction of the main bank collapse boundary in the current image.

[0076] An anisotropic elliptical neighborhood is constructed based on the direction of the main boundary of the collapsed bank. The major axis radius of the anisotropic elliptical neighborhood along the direction of the main boundary of the collapsed bank is greater than the minor axis radius in the vertical direction.

[0077] A weighted directional consistency coefficient is calculated within the anisotropic elliptical neighborhood, and a directional deviation penalty factor is introduced to suppress pixel responses where the gradient direction is inconsistent with the direction of the main boundary of the collapse bank.

[0078] A direction consistency map is generated from the directional consistency coefficients weighted by the deviation penalty.

[0079] This scheme proposes a morphology-guided anisotropic statistical method, which requires determining the main boundary direction of the collapsed bank. This can be achieved by performing principal component analysis (PCA) on the set of boundary pixels in the segmentation result of the previous frame or in the coarse segmentation mask of the current frame. The coordinates (x_i, y_i) of all boundary pixels are collected, and the covariance matrix is constructed:

[0080] Cov = (1 / N) × Σ[(p_i - p_mean) × (p_i - p_mean)^T];

[0081] Where Cov is a 2×2 covariance matrix, N is the total number of boundary pixels, p_i is the coordinate vector of the i-th pixel, p_mean is the coordinate mean vector, Σ denotes summation over all boundary pixels, and the superscript T denotes transpose. Eigenvalue decomposition is performed on this covariance matrix; the eigenvector corresponding to the largest eigenvalue is the principal boundary direction vector, and its angle is denoted θ_main.
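
The PCA step of paragraphs [0079] to [0081] can be illustrated with a minimal NumPy sketch. The function name main_boundary_direction, the synthetic test points, and the choice to fold the resulting angle into [0, π) are assumptions made for illustration; they do not come from the patent text.

```python
import numpy as np

def main_boundary_direction(points):
    """Estimate theta_main (radians) from an (N, 2) array of (x, y) boundary pixels."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)           # p_i - p_mean
    cov = centered.T @ centered / len(pts)      # 2x2 covariance with (1 / N) scaling
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues returned in ascending order
    vx, vy = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    theta = np.arctan2(vy, vx) % np.pi          # assumption: a direction is taken modulo pi
    return theta, eigvals

# Synthetic boundary (illustrative only): noisy points along a line at 30 degrees
rng = np.random.default_rng(0)
t = rng.uniform(-50, 50, size=200)
angle = np.deg2rad(30.0)
pts = np.stack([t * np.cos(angle), t * np.sin(angle)], axis=1)
pts += rng.normal(scale=0.5, size=pts.shape)

theta_main, eigvals = main_boundary_direction(pts)
```

The two eigenvalues are returned as well because their ratio indicates how elongated the boundary pixel set is.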

[0082] Based on this, an anisotropic elliptical neighborhood stretched along the principal direction is constructed. The major axis of this neighborhood is aligned with θ_main, and the minor axis is perpendicular to it. Let the major-axis radius be R_long and the minor-axis radius be R_short, with R_long > R_short, for example R_long = 5 and R_short = 2. For any point in the neighborhood, whether it lies within the ellipse can be determined through a coordinate rotation, as follows:

[0083] x_new = x × cos(θ_main) + y × sin(θ_main);

[0084] y_new = -x × sin(θ_main) + y × cos(θ_main);

[0085] If (x_new / R_long)^2 + (y_new / R_short)^2 ≤ 1, then the point is in the neighborhood.

[0086] Where x_new and y_new are the x and y coordinates of the point in the rotated coordinate system.
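
As a hedged illustration of the membership test in [0083] to [0085], the sketch below evaluates the rotated-ellipse condition for pixel offsets. It assumes that (x, y) are offsets from the center pixel of the neighborhood; the function names and the window size ceil(R_long) are illustrative choices, not part of the patent.

```python
import numpy as np

def in_elliptical_neighborhood(dx, dy, theta_main, r_long=5.0, r_short=2.0):
    """True where the offset (dx, dy) from the center pixel lies inside the ellipse."""
    x_new = dx * np.cos(theta_main) + dy * np.sin(theta_main)
    y_new = -dx * np.sin(theta_main) + dy * np.cos(theta_main)
    return (x_new / r_long) ** 2 + (y_new / r_short) ** 2 <= 1.0

def elliptical_mask(theta_main, r_long=5.0, r_short=2.0):
    """Boolean (2r+1, 2r+1) window mask, with r = ceil(r_long) (an assumption)."""
    r = int(np.ceil(r_long))
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    return in_elliptical_neighborhood(dx, dy, theta_main, r_long, r_short)

mask_h = elliptical_mask(0.0)           # major axis along x
mask_v = elliptical_mask(np.pi / 2)     # major axis along y
```

With θ_main = 0 the window extends further along x than along y, and the situation is reversed for θ_main = π / 2.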

[0087] Next, the direction consistency coefficient within the neighborhood is calculated, and a direction deviation penalty factor can be introduced. For each pixel in the neighborhood, not only its gradient magnitude is considered, but also the included angle Δθ between its gradient direction θ(x, y) and the principal boundary direction θ_main. If the included angle is small, the pixel is taken to be part of a crack running along the bank and is given a high weight; if the included angle is large, the pixel is taken to be a wave or noise perpendicular to the shoreline and is given a low or even negative weight.

[0088] Furthermore, the formula for calculating the directional deviation penalty factor can be:

[0089] P_dir = exp(-k × |sin(θ(x, y) - θ_main)|);

[0090] Where P_dir is the penalty factor and k is the sensitivity coefficient, for example k = 2. The sin function extracts the sine component of the angle difference.

[0091] In this scheme, each pixel value in the direction consistency map is the sum, over all pixels in its anisotropic neighborhood, of the product of gradient magnitude and penalty factor. This design lets the model focus on cracks along the shoreline while ignoring lateral water wave interference.
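
The map described in [0091] can be sketched as a neighborhood sum of penalty-weighted gradient magnitudes. The sketch below is a minimal, unoptimized NumPy version using the sin-based penalty of [0089]; the function name, the zero padding at image borders, and the loop over offsets are assumptions made for illustration.

```python
import numpy as np

def direction_consistency_map(g_mag, g_dir, theta_main, r_long=5.0, r_short=2.0, k=2.0):
    """Sum of G_mag * P_dir over the anisotropic elliptical neighborhood of each pixel."""
    h, w = g_mag.shape
    r = int(np.ceil(r_long))
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    x_new = dx * np.cos(theta_main) + dy * np.sin(theta_main)
    y_new = -dx * np.sin(theta_main) + dy * np.cos(theta_main)
    inside = (x_new / r_long) ** 2 + (y_new / r_short) ** 2 <= 1.0
    # Per-pixel product of gradient magnitude and sin-based penalty factor
    weighted = g_mag * np.exp(-k * np.abs(np.sin(g_dir - theta_main)))
    padded = np.pad(weighted, r)                 # assumption: zero padding at the borders
    out = np.zeros((h, w), dtype=float)
    for oy, ox in zip(dy[inside], dx[inside]):   # accumulate one shifted copy per offset
        out += padded[r + oy:r + oy + h, r + ox:r + ox + w]
    return out

g_mag = np.ones((21, 21))
aligned = direction_consistency_map(g_mag, np.zeros((21, 21)), 0.0)
crossed = direction_consistency_map(g_mag, np.full((21, 21), np.pi / 2), 0.0)
```

With unit gradient magnitudes, the center value of aligned equals the number of pixels in the ellipse, and crossed is smaller by the factor exp(-k).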

[0092] In other embodiments, since the gradient direction reflects the edge normal direction while the main boundary direction θ_main of the collapsed bank reflects the edge tangent direction, the included angle between the gradient direction at a collapsed-bank boundary pixel and θ_main should be close to π / 2. Based on this, the formula for calculating the direction deviation penalty factor is:

[0093] P_dir = exp(-k × |cos(θ(x, y) - θ_main)|);

[0094] Where P_dir is the penalty factor and k is the sensitivity coefficient (e.g., k = 2). The |cos| term measures how parallel the gradient direction is to the tangent direction of the principal boundary. When the gradient direction is perpendicular to the principal boundary direction, i.e., at a true collapse boundary, |cos| approaches 0 and P_dir approaches 1, so the pixel is given a high weight; when the gradient direction is parallel to the principal boundary direction, as with lateral water wave interference, |cos| approaches 1 and P_dir approaches exp(-k), so the pixel is given a low weight. This design lets the model focus on cracks along the shoreline while ignoring lateral water wave interference.
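
The two penalty formulas, [0089] and [0093], differ only in whether the sine or the cosine of the angle difference is penalized. A small sketch comparing them at the two limiting cases follows; the function names and the 30 degree example direction are illustrative assumptions.

```python
import numpy as np

def penalty_sin(theta, theta_main, k=2.0):
    """P_dir = exp(-k * |sin(theta - theta_main)|), the form given in [0089]."""
    return np.exp(-k * np.abs(np.sin(theta - theta_main)))

def penalty_cos(theta, theta_main, k=2.0):
    """P_dir = exp(-k * |cos(theta - theta_main)|), the form given in [0093]."""
    return np.exp(-k * np.abs(np.cos(theta - theta_main)))

theta_main = np.deg2rad(30.0)               # illustrative main boundary direction
normal_grad = theta_main + np.pi / 2        # gradient perpendicular to the main boundary
parallel_grad = theta_main                  # gradient parallel to the main boundary

p_cos_normal = penalty_cos(normal_grad, theta_main)       # weight for a true boundary pixel
p_cos_parallel = penalty_cos(parallel_grad, theta_main)   # weight for lateral interference
p_sin_parallel = penalty_sin(parallel_grad, theta_main)
```

Under the cosine form, a gradient normal to the main boundary receives the highest weight, which matches the reasoning in [0092] that the gradient is the edge normal while θ_main is the edge tangent.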

[0095] In some other embodiments, if the gradient direction is nearly perpendicular to the main boundary direction, it indicates that there is a collapsed shoreline boundary extending along the shoreline at that pixel, and a high weight is given; if the gradient direction is nearly parallel to the main boundary direction, it indicates that the gradient at that pixel may originate from lateral water waves or random noise, and a low weight is given.

[0096] Optionally, before constructing the anisotropic elliptical neighborhood based on the main boundary direction of the collapsed bank, the following steps are also included:

[0097] The number of boundary pixels used to estimate the main boundary direction is counted.

[0098] If the number of boundary pixels is lower than a preset threshold, the main boundary direction estimation is deemed unreliable, the anisotropic elliptical neighborhood is degenerated into an isotropic square neighborhood, and the direction deviation penalty factor is set to a constant value.

[0099] The system first checks the number of boundary pixels N used for the PCA calculation. If N is less than a preset threshold, such as 20 pixels, or if the ratio of the two eigenvalues of the covariance matrix is close to 1, the shape has no clear directionality and the principal direction estimate is deemed unreliable. In this case, the algorithm automatically degenerates to a conventional square neighborhood, i.e., R_long = R_short, and the direction deviation penalty factor P_dir is set to a constant value of 1 to avoid misleading the model with an incorrect main direction.
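
A minimal sketch of the reliability check in [0099] is given below. The pixel threshold of 20 comes from the text; the eigenvalue-ratio cutoff iso_ratio = 1.2, the returned dictionary layout, and the choice to use R_short for both radii in the fallback are hypothetical, since the patent only states that the ratio is "close to 1" and that R_long = R_short.

```python
import numpy as np

def estimate_direction_with_fallback(points, min_pixels=20, iso_ratio=1.2,
                                     r_long=5.0, r_short=2.0):
    """Return neighborhood settings, falling back to an isotropic window if unreliable."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    reliable = len(pts) >= min_pixels
    theta_main = None
    if reliable:
        centered = pts - pts.mean(axis=0)
        cov = centered.T @ centered / len(pts)
        eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
        ratio = eigvals[1] / max(eigvals[0], 1e-12)       # close to 1: no clear direction
        reliable = ratio > iso_ratio                      # hypothetical cutoff
        vx, vy = eigvecs[:, -1]
        theta_main = float(np.arctan2(vy, vx) % np.pi)
    if not reliable:
        # Degenerate case: equal radii and a constant penalty factor of 1
        return {"reliable": False, "theta_main": None,
                "r_long": r_short, "r_short": r_short, "fixed_penalty": 1.0}
    return {"reliable": True, "theta_main": theta_main,
            "r_long": r_long, "r_short": r_short, "fixed_penalty": None}

line = [(i, i) for i in range(50)]                        # elongated at 45 degrees
blob = [(x, y) for x in range(10) for y in range(10)]     # no dominant direction
res_line = estimate_direction_with_fallback(line)
res_blob = estimate_direction_with_fallback(blob)
res_few = estimate_direction_with_fallback(line[:5])
```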

[0100] According to one aspect of this application, for the first frame of a video sequence, the direction of the main boundary of the collapsed bank in the current image is estimated using principal component analysis, employing a two-stage initialization strategy, namely:

[0101] In the gradient coarse estimation stage, edge detection is performed on the first frame image to obtain the gradient magnitude and gradient direction angle of each pixel. Strong edge pixels with gradient magnitude exceeding the average of the whole image are selected. Histogram statistics of gradient direction angle of strong edge pixels are performed with magnitude weighting. The angle corresponding to the peak of the histogram is selected as the coarse estimate of the main boundary direction.

[0102] In the segmentation and fine estimation stage, the coarse estimate is used to complete the forward inference of the first frame to obtain the prediction result of the bank collapse area. Based on the prediction result, the boundary pixels are extracted and the principal component analysis is used to calculate the principal boundary direction of the fine estimate for subsequent direction consistency calculation.

[0103] In this step, when estimating the direction of the principal boundary of the collapsed bank using principal component analysis, it is necessary to distinguish between the following two cases:

[0104] For non-first frames of a video sequence, the prediction result of the previous frame can be used directly as a reference;

[0105] For the first frame of image, due to the lack of historical prediction information, a specially designed two-stage initialization strategy is required to obtain reliable estimates of the principal boundary direction.

[0106] Accordingly, the initialization process of the main boundary direction of the first frame includes a coarse gradient estimation stage and a fine segmentation estimation stage.

[0107] In the coarse gradient estimation stage, Sobel edge detection is performed on the first frame of the original image, calculating the horizontal and vertical gradient components to obtain the gradient magnitude at each pixel location. A gradient magnitude threshold is set as the average gradient magnitude of the entire image, and pixels with gradient magnitudes exceeding this threshold are selected to form a set of strong edge pixels.

[0108] Next, a weighted histogram is calculated for the gradient orientation angles of each pixel within the set of pixels with strong edges. The horizontal axis of the histogram represents the discretized orientation angle intervals, with an angle resolution set to 10 degrees, dividing the range from 0 to 180 degrees into 18 angle intervals; the vertical axis of the histogram represents the sum of the gradient magnitudes of pixels falling within each angle interval. The formula for calculating the weighted histogram is as follows:

[0109] H(θ_k) = Σ G_mag(p) × I[θ(p) ∈ [θ_k - Δθ / 2, θ_k + Δθ / 2]];

[0110] Where H(θ_k) is the histogram value for the k-th angle interval, G_mag(p) is the gradient magnitude at pixel p, θ(p) is the gradient direction angle at pixel p, Δθ is the angular resolution (10 degrees), I[…] is the indicator function, which takes the value 1 when the condition is met and 0 otherwise, Σ denotes summation over all pixels in the set of strong edge pixels, and θ_k is the center angle of the k-th angle interval.

[0111] Traverse all angle intervals and select the center angle of the interval with the largest histogram value as the coarse estimate of the main boundary direction of the first frame: θ_main_init = argmax(H(θ_k));

[0112] Where θ_main_init is the coarsely estimated principal boundary direction angle, and argmax returns the angle interval that maximizes the histogram value. This coarse estimate reflects the direction in which the edge response is most concentrated in the first frame image, and provides an initial reference for subsequent anisotropic neighborhood construction.
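
The coarse estimation of [0107] to [0112] can be sketched with a magnitude-weighted histogram. The sketch assumes the gradient components G_x and G_y have already been computed (for example by a Sobel operator, which is not reimplemented here) and folds the direction angle into [0, 180) degrees; the function name and the synthetic gradients are illustrative.

```python
import numpy as np

def coarse_main_direction(g_x, g_y, bin_deg=10.0):
    """Return (theta_main_init in degrees, histogram H, bin centers theta_k)."""
    g_mag = np.hypot(g_x, g_y)
    g_dir = np.rad2deg(np.arctan2(g_y, g_x)) % 180.0    # assumption: fold into [0, 180)
    strong = g_mag > g_mag.mean()                       # strong-edge pixel set
    n_bins = int(round(180.0 / bin_deg))                # 18 bins for 10 degree resolution
    hist, edges = np.histogram(g_dir[strong], bins=n_bins, range=(0.0, 180.0),
                               weights=g_mag[strong])   # sum of magnitudes per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(centers[np.argmax(hist)]), hist, centers

# Synthetic gradients (illustrative): a strong band at 45 degrees, weak clutter at 100
g_x = np.zeros((10, 10))
g_y = np.zeros((10, 10))
a, b = np.deg2rad(45.0), np.deg2rad(100.0)
g_x[2:5, :], g_y[2:5, :] = 10.0 * np.cos(a), 10.0 * np.sin(a)
g_x[7, :3], g_y[7, :3] = 3.0 * np.cos(b), 3.0 * np.sin(b)

theta_main_init, hist, centers = coarse_main_direction(g_x, g_y)
```

The weak clutter falls below the mean-magnitude threshold and therefore does not contribute to the histogram.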

[0113] In the fine segmentation estimation stage, the principal boundary directions obtained from the coarse estimation are used to complete the forward inference of the first frame image, obtaining the preliminary segmentation prediction results for the bank collapse area. Based on these prediction results, the set of boundary pixel coordinates of the bank collapse area is extracted, and the principal boundary directions are recalculated using the principal component analysis method.

[0114] Furthermore, the coordinates of all pixels on the boundary of the collapsed area in the predicted segmentation mask are collected, a coordinate covariance matrix is ​​constructed and eigenvalue decomposition is performed, and the direction of the eigenvector corresponding to the largest eigenvalue is taken as the principal boundary direction of the fine estimate. This fine estimate will replace the coarse estimate and be used for the direction consistency calculation of subsequent feature layers in the first frame, serving as the historical reference direction for the next frame.

[0115] Furthermore, in the special case where the first frame scene is a homogeneous background (such as a pure water surface or a normal embankment without bank collapse), the weighted histogram in the coarse estimation stage may not have a peak, meaning the statistical values ​​in each angle interval are similar. In this case, the peak significance index is defined as the ratio of the maximum value of the histogram to the mean, i.e.:

[0116] R_peak = max(H(θ_k)) / mean(H(θ_k));

[0117] Where R_peak is the peak significance index, max denotes the maximum value, and mean denotes the mean value. When R_peak is less than a preset threshold (e.g., 1.5), it is determined that the current frame has no dominant boundary direction. In this case, the dominant boundary direction defaults to the horizontal direction (0 degrees) and is automatically updated once a collapse boundary is detected in subsequent frames. This degradation mechanism keeps the algorithm robust across scenarios and avoids downstream calculation errors caused by an invalid direction estimate.
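
The peak significance test of [0116] and [0117] reduces to a ratio check on the histogram. A minimal sketch follows; the function name, the return tuple, and the handling of an all-zero histogram are assumptions added for illustration.

```python
import numpy as np

def select_main_direction(hist, centers, peak_threshold=1.5, default_deg=0.0):
    """Return (direction in degrees, R_peak, has_dominant_direction)."""
    hist = np.asarray(hist, dtype=float)
    mean = hist.mean()
    r_peak = hist.max() / mean if mean > 0 else 0.0   # assumption: empty histogram -> 0
    if r_peak < peak_threshold:
        return default_deg, r_peak, False             # fall back to the horizontal direction
    return float(centers[np.argmax(hist)]), r_peak, True

centers = np.arange(5.0, 180.0, 10.0)                 # 18 bin centers, 10 degree bins
flat_hist = np.ones(18)                               # homogeneous scene, no peak
peaked_hist = np.zeros(18)
peaked_hist[4] = 10.0                                 # all edge energy near 45 degrees

flat_result = select_main_direction(flat_hist, centers)
peaked_result = select_main_direction(peaked_hist, centers)
```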

[0118] Step 304: Based on the local complexity map and the direction consistency map, a dynamic computation mask is generated using a differentiable threshold function, wherein the high-response regions of the dynamic computation mask correspond to the collapse boundary regions that require dense computation.

[0119] Generating the dynamic computation mask from the local complexity map and the direction consistency map using a differentiable threshold function includes:

[0120] The global complexity mean, the global complexity standard deviation, and the global direction consistency mean of the current frame image are calculated as image-level statistics.

[0121] Based on image-level statistics and learnable sensitivity coefficients, the complexity threshold and consistency threshold adapted to the texture distribution of the current frame image are dynamically calculated.

[0122] Using the sigmoid function as a soft-threshold activation function, a continuously differentiable soft mask with values between 0 and 1 is generated from the local complexity map, the direction consistency map, the complexity threshold, and the consistency threshold, and serves as the dynamic computation mask.

[0123] Furthermore, the two feature maps obtained earlier, i.e., the complexity map M_var and the consistency map M_dir, are fused into a decision mask. To adapt to differences between images captured under different lighting and environments (for example, some images are blurry overall while others are sharp), the threshold cannot be fixed; it must be adaptive.

[0124] Accordingly, statistics are calculated over the entire image: the global complexity mean μ_var, the global complexity standard deviation σ_var, and the global direction consistency mean μ_dir.

[0125] Furthermore, two dynamic thresholds are defined, namely:

[0126] T_var = μ_var + α × σ_var;

[0127] T_dir = μ_dir × β;

[0128] Where T_var is the complexity threshold and T_dir is the consistency threshold; α and β are learnable parameters of the network, i.e., the sensitivity coefficients. Their initial values can be set to α = 1.0 and β = 1.2, so that the thresholds vary with the overall distribution of the image content.

[0129] Next, a soft mask is generated. To ensure that backpropagation through the neural network remains possible (i.e., differentiability), a step function (1 for values above the threshold and 0 for values below it) cannot be used directly. Instead, a sigmoid function is used as a soft switch. The fusion logic is as follows:

[0130] M_soft = Sigmoid(w1 × (M_var - T_var) + w2 × (M_dir - T_dir));

[0131] Where M_soft is the generated soft mask, with values in (0, 1); w1 and w2 are coefficients that adjust the weights of the two metrics, for example w1 = w2 = 5, and control the steepness of the Sigmoid function. When the pixel complexity or consistency is significantly higher than its threshold, M_soft tends towards 1; otherwise, it tends towards 0.
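
The adaptive thresholds of [0126] and [0127] and the fusion of [0130] can be sketched in a few lines of NumPy. In the patent α and β are learnable; in this sketch they are plain constants set to the stated initial values, and the function name and toy input maps are assumptions.

```python
import numpy as np

def dynamic_soft_mask(m_var, m_dir, alpha=1.0, beta=1.2, w1=5.0, w2=5.0):
    """Return (M_soft, T_var, T_dir) for one image."""
    t_var = m_var.mean() + alpha * m_var.std()     # T_var = mu_var + alpha * sigma_var
    t_dir = m_dir.mean() * beta                    # T_dir = mu_dir * beta
    z = w1 * (m_var - t_var) + w2 * (m_dir - t_dir)
    m_soft = 1.0 / (1.0 + np.exp(-z))              # sigmoid as a soft switch
    return m_soft, t_var, t_dir

# Toy maps (illustrative): a horizontal band with high complexity and consistency
m_var = np.zeros((8, 8))
m_dir = np.zeros((8, 8))
m_var[3:5, :] = 1.0
m_dir[3:5, :] = 1.0

m_soft, t_var, t_dir = dynamic_soft_mask(m_var, m_dir)
```

Inside the band the mask is close to 1 and elsewhere close to 0, while every value stays strictly between 0 and 1.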

[0132] Furthermore, a straight-through estimator (STE) mechanism is employed during the training phase. The specific operation is as follows:

[0133] During the forward propagation process, the soft mask is binarized into a hard mask to participate in the convolution calculation;

[0134] During backpropagation, the binarization operation is skipped, and the network parameters are updated directly using the gradient of the soft mask.

[0135] In other words, during the forward propagation phase, the soft mask is binarized to generate the hard mask M_hard: if M_soft > 0.5, then M_hard = 1; otherwise M_hard = 0. M_hard then participates in the dual-path convolution, making the sparse path truly sparse.

[0136] During the backpropagation phase, since the hard binarization operation is non-differentiable, it is skipped, and the gradient of the loss function with respect to M_hard is passed directly to M_soft, i.e., grad(M_soft) = grad(M_hard), where grad denotes the gradient operator. This achieves computational acceleration while preserving a path for parameter updates.
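
The forward and backward behavior described in [0135] and [0136] can be made explicit with a hand-written pass in NumPy, without an autograd framework. The toy loss, the function names, and the analytically derived gradient are assumptions used only to show that the gradient reaching M_soft is the one computed for M_hard.

```python
import numpy as np

def ste_forward(m_soft, threshold=0.5):
    """Forward pass: binarize the soft mask into the hard mask M_hard."""
    return (m_soft > threshold).astype(m_soft.dtype)

def ste_backward(grad_m_hard):
    """Backward pass: skip the binarization, so grad(M_soft) = grad(M_hard)."""
    return grad_m_hard.copy()

# Toy loss (illustrative): L = sum(M_hard * F_dense + (1 - M_hard) * F_sparse)
m_soft = np.array([0.9, 0.2, 0.6, 0.4])
f_dense = np.array([1.0, 2.0, 3.0, 4.0])
f_sparse = np.array([0.5, 0.5, 0.5, 0.5])

m_hard = ste_forward(m_soft)
loss = np.sum(m_hard * f_dense + (1.0 - m_hard) * f_sparse)
grad_m_hard = f_dense - f_sparse           # dL/dM_hard for the toy loss above
grad_m_soft = ste_backward(grad_m_hard)    # passed through unchanged
```

In an autograd framework the same effect is commonly obtained by expressing the mask as the soft mask plus a gradient-stopped difference between the hard and soft masks.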

[0137] As an example, this embodiment describes the specific structure of the dual-path convolution computation at the back end of the DSC-Bank module and the geometric consistency loss used when training the teacher network. Computational efficiency is improved by decoupling the physical structure, and the technical problem of blurred collapsed-bank boundaries is addressed by introducing geometric constraints. This example can be implemented by performing the following steps:

[0138] Step 401: Perform sparse and dense differential dual-path convolution calculations on the feature extraction layer based on the dynamically calculated mask, including:

[0139] By calling the dense computation path, for regions in the dynamically computed mask where the response value approaches 1, a standard convolution kernel is used to perform pixel-by-pixel convolution operations to preserve spatial detail features;

[0140] By calling the sparse computation path, for regions in the dynamically computed mask where the response value approaches 0, lightweight computation is performed using a depthwise separable convolutional structure. This structure sequentially includes a pointwise convolution for reducing the channel dimension, a depthwise convolution for extracting spatial features, and a pointwise convolution for restoring the channel dimension.

[0141] By using a dynamically calculated mask as a weighting coefficient, the output features of dense computation paths and sparse computation paths are weighted and fused to obtain the output feature map of the current feature extraction layer.

[0142] Here, the dynamically calculated mask is, for example, the soft mask M_soft or the hard mask M_hard obtained through STE during training, and the hard mask during inference. It splits the feature extraction process into two parallel paths.

[0143] Furthermore, a dense computation path is used to process regions in the mask with values ​​of 1 or close to 1, corresponding to the collapsed shoreline. This path employs standard convolution operations, typically a 3×3 convolution kernel with a stride of 1 and padding of 1, maintaining the feature map size unchanged.

[0144] For example, assuming the input feature map has dimensions (64, H, W), dense paths use 64 3×3 convolutional kernels, and the output dimension remains (64, H, W). Full-channel, full-precision computation preserves the spatial details and semantic information of the boundaries, ensuring segmentation accuracy.

[0145] Furthermore, a sparse computation path is used to handle background regions in the mask with values ​​of 0 or close to 0. This path is used for lightweighting. An improved depthwise separable convolutional structure, also known as a bottleneck structure, can be employed. The specific process is as follows:

[0146] Optionally, a 1×1 convolution kernel is used to compress the number of channels from C_in to C_mid. The compression ratio can be 4, i.e., C_mid = C_in / 4, where C_in is the initial number of channels of the input feature map and C_mid is the number of intermediate channels after compression. For example, 64 channels can be compressed into 16 channels, reducing the amount of subsequent computation.

[0147] Optionally, a 3×3 convolution kernel is used to perform channel-by-channel convolution on the compressed channels. Each channel is computed independently, without inter-channel fusion.

[0148] Optionally, a 1×1 convolution kernel is used to restore the number of channels from C_mid to C_in, for example from 16 back to 64, so that the result can be fused with the dense path output.

[0149] Compared to standard convolution, this structure can reduce the computational cost to 1 / 8 or even less.
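
The cost claim in [0149] can be checked with a short multiply-accumulate (MAC) count per output pixel, using the 64-channel example and compression ratio 4 from the text. The function names are illustrative, and bias terms and the cost of computing the mask itself are ignored in this sketch.

```python
def standard_conv_macs(c_in, c_out, k=3):
    """MACs per output pixel of a standard k x k convolution."""
    return k * k * c_in * c_out

def bottleneck_macs(c_in, ratio=4, k=3):
    """MACs per output pixel of 1x1 reduce -> k x k depthwise -> 1x1 restore."""
    c_mid = c_in // ratio
    reduce_macs = c_in * c_mid          # pointwise convolution, C_in -> C_mid
    depthwise_macs = k * k * c_mid      # one k x k filter per intermediate channel
    restore_macs = c_mid * c_in         # pointwise convolution, C_mid -> C_in
    return reduce_macs + depthwise_macs + restore_macs

dense = standard_conv_macs(64, 64)      # 9 * 64 * 64
sparse = bottleneck_macs(64)            # 1024 + 144 + 1024
cost_ratio = sparse / dense
```

For this configuration the sparse path needs 2192 MACs per pixel against 36864 for the dense path, which is below the 1 / 8 figure quoted above.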

[0150] Furthermore, a weighted fusion is performed. The fusion formula is as follows:

[0151] F_out = M × F_dense + (1 - M) × F_sparse;

[0152] Where F_out is the final output feature map, M is the dynamically calculated mask, broadcast to the same dimensions as the feature map, F_dense is the dense path output, and F_sparse is the sparse path output. In the boundary region, the output is mainly determined by the dense path; in the background region, it is mainly determined by the sparse path; in the transition region, when a soft mask is used, it is a weighted sum of the two.
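
The fusion formula of [0151] is a per-pixel blend with the mask broadcast over channels. A minimal sketch follows, assuming feature maps of shape (C, H, W) and a mask of shape (H, W); the function name and toy tensors are illustrative.

```python
import numpy as np

def fuse_dual_path(f_dense, f_sparse, mask):
    """F_out = M * F_dense + (1 - M) * F_sparse, with M broadcast over channels."""
    m = mask[None, :, :]                   # (1, H, W) broadcasts against (C, H, W)
    return m * f_dense + (1.0 - m) * f_sparse

f_dense = np.ones((64, 4, 4))              # stand-in for the dense path output
f_sparse = np.zeros((64, 4, 4))            # stand-in for the sparse path output
mask = np.zeros((4, 4))
mask[0, 0] = 1.0                           # boundary pixel: dense path only
mask[1, 1] = 0.5                           # transition pixel: equal blend

f_out = fuse_dual_path(f_dense, f_sparse, mask)
```

This sketch evaluates both paths everywhere for clarity; the saving described in the text comes from skipping the dense path where a hard mask is 0, which is not modeled here.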

[0153] Step 402, training a teacher network containing region segmentation and boundary detection branches using the dataset, also includes constructing a geometric consistency loss function to constrain the spatial consistency between the region segmentation and boundary detection branches. The process of constructing the geometric consistency loss function includes:

[0154] The signed distance transformation field is calculated based on the segmentation label of the collapsed bank region. The spatial gradient of the distance transformation field is calculated, and the boundary normal unit vector pointing to the outside of the collapsed bank region is obtained.

[0155] Calculate the spatial gradient vector of the region segmentation probability map output by the region segmentation branch;

[0156] Within a predefined boundary neighborhood, the directional difference between the spatial gradient vector of the region segmentation probability map and the boundary normal unit vector is calculated. This directional difference is used as the geometric consistency loss function to constrain the direction of change of the region segmentation probability to be consistent with the true boundary normal.

[0157] This scheme introduces a geometric consistency loss, utilizing mathematical field theory to constrain the boundaries. Based on the ground-truth collapsed-bank region segmentation labels, a signed distance transform field (SDF) is calculated. For each pixel p in the image, its SDF value Φ(p) is defined as the distance from that pixel to the nearest boundary. If the pixel is within the collapsed-bank region, the distance is negative; if it is outside the region, the distance is positive. The opposite sign convention may also be used.

[0158] Furthermore, the spatial gradient ▽Φ(p) of the SDF is calculated. According to geometric principles, the normalized gradient vector at the boundary is the unit normal vector n_GT(p), which points toward the outer (or inner) side of the collapsed-bank region, depending on the sign convention:

[0159] n_GT(p) = ▽Φ(p) / ||▽Φ(p)||;

[0160] Where ▽ represents the gradient operator, and ||...|| represents the vector magnitude.

[0161] Furthermore, for the probability map P_pred output by the region segmentation branch, whose values range from 0 to 1, the spatial gradient vector ▽P_pred(p) is calculated in the same way. Ideally, the descent direction of the probability map at the boundary, i.e., the reverse direction of the gradient, should be consistent with the direction of the true boundary normal vector. If the two directions are inconsistent, the predicted boundary is distorted or deviates from reality.

[0162] Therefore, the geometric consistency loss is defined and calculated only within a preset neighborhood Ω near the boundary, for example, within 5 pixels from the boundary, as shown in the following formula:

[0163] L_geo = Σ[1 - (▽P_pred(p) • n_GT(p)) / (||▽P_pred(p)|| + ε)];

[0164] Where Σ denotes summation over all pixels in the neighborhood Ω, • denotes the vector dot product, and ε is a small constant that prevents the denominator from being zero. This formula calculates the cosine distance between the predicted gradient direction and the true normal. When the two are in the same direction, the normalized dot product is 1 and the loss is 0; when they are perpendicular or opposite, the loss increases.
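
A small NumPy sketch of the loss in [0163] is given below. It makes several assumptions that should be read as illustrative: the SDF is computed by brute force (suitable only for small images) and is taken positive inside the region, one of the two sign conventions allowed by [0157], so that n_GT points in the same direction as the gradient of an ideal P_pred and the formula as written yields a loss near 0; gradients use central differences via np.gradient; an ε is also added when normalizing n_GT.

```python
import numpy as np

def signed_distance(label):
    """Brute-force signed distance, positive inside the region (assumed convention)."""
    label = label.astype(bool)
    h, w = label.shape
    grid = np.stack(np.mgrid[0:h, 0:w], axis=-1).reshape(-1, 2)
    inside_pts = grid[label.ravel()]
    outside_pts = grid[~label.ravel()]

    def nearest(points):
        diff = grid[:, None, :] - points[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

    sdf = np.where(label.ravel(), nearest(outside_pts), -nearest(inside_pts))
    return sdf.reshape(h, w)

def geometric_consistency_loss(p_pred, label, band=5.0, eps=1e-6):
    """Return (L_geo summed over Omega, number of pixels in Omega)."""
    sdf = signed_distance(label)
    gy, gx = np.gradient(sdf)                               # derivatives along rows, columns
    n_gt = np.stack([gx, gy]) / (np.sqrt(gx ** 2 + gy ** 2) + eps)
    py, px = np.gradient(p_pred)
    grad_p = np.stack([px, py])
    cos = (grad_p * n_gt).sum(axis=0) / (np.sqrt((grad_p ** 2).sum(axis=0)) + eps)
    omega = np.abs(sdf) <= band                             # band around the boundary
    return float(np.sum(1.0 - cos[omega])), int(omega.sum())

# Toy example: a disc-shaped region and a prediction that follows it smoothly
yy, xx = np.mgrid[0:32, 0:32]
label = (yy - 16) ** 2 + (xx - 16) ** 2 <= 10 ** 2
p_good = 1.0 / (1.0 + np.exp(-signed_distance(label) / 3.0))
loss_good, n_pix = geometric_consistency_loss(p_good, label)
loss_bad, _ = geometric_consistency_loss(1.0 - p_good, label)   # inverted prediction
```

With the opposite SDF sign convention the dot product changes sign, so the convention has to match the intended direction of n_GT.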

[0165] Based on this, the total training loss function L_total of the teacher network consists of three parts:

[0166] The cross-entropy loss L_seg of the region segmentation branch;

[0167] The weighted cross-entropy loss L_edge of the boundary detection branch, with weighting achieved using boundary-assisted labels;

[0168] The geometric consistency loss L_geo.

[0169] Accordingly, L_total = L_seg + λ1 × L_edge + λ2 × L_geo;

[0170] Here, λ1 and λ2 are hyperparameters for balancing the weights, which can be set, for example, λ1=10 and λ2=5. Through multi-task and multi-dimensional constraints, the trained teacher network will have high boundary localization accuracy, providing a high-quality knowledge source for subsequently guiding the student network.

[0171] As another example, a basic method for extracting collaborative association features from a teacher network is described, along with the specific mathematical process for constructing a collaborative association matrix. Here, the knowledge in the teacher network is not simply the numerical values ​​of the feature maps, but rather the statistical associations between feature channels. Accordingly, this example can be implemented using the following steps:

[0172] Step 501: Extract intermediate layer feature maps from the boundary detection branch and the region segmentation branch of the teacher model, respectively;

[0173] Calculate the Pearson correlation coefficient between each channel of the intermediate layer feature map of the boundary detection branch and each channel of the intermediate layer feature map of the region segmentation branch;

[0174] The correlation coefficients of all channel pairs obtained are organized into a correlation coefficient matrix, which serves as a co-correlation feature. Each element in the correlation coefficient matrix represents the degree of response synergy between a boundary feature channel and a regional feature channel.

[0175] This involves extracting collaborative correlation features to capture the intrinsic relationship between boundaries and regions at the feature level. To quantify this relationship, a correlation coefficient matrix is constructed. Intermediate layer feature maps are extracted from the teacher network. Assume the extracted boundary branch feature map is F_edge, with dimensions (C_e, H, W), and the region branch feature map is F_seg, with dimensions (C_s, H, W), where C_e and C_s are the numbers of channels of the two branches (e.g., 64 or 256), and H and W are the spatial dimensions of the feature map.

[0176] Based on this, the two-dimensional feature map (H, W) of each channel is flattened into a one-dimensional vector with a length of N = H × W.

[0177] For the i-th channel vector v_ei of the boundary branch and the j-th channel vector v_sj of the region branch, the Pearson correlation coefficient r_ij between the two is calculated as follows:

[0178] r_ij = Cov(v_ei, v_sj) / (σ_ei × σ_sj);

[0179] Where Cov(v_ei, v_sj) is the covariance of vectors v_ei and v_sj, and σ_ei and σ_sj are the standard deviations of v_ei and v_sj, respectively.

[0180] In detail, the vector means are first calculated: μ_ei = (1 / N) × Σ v_ei[k]; μ_sj = (1 / N) × Σ v_sj[k];

[0181] Where Σ represents the summation of all elements k=1 to N in the vector.

[0182] The covariance and standard deviations are then calculated: Cov(v_ei, v_sj) = (1 / (N - 1)) × Σ[(v_ei[k] - μ_ei) × (v_sj[k] - μ_sj)];

[0183] σ_ei = sqrt((1 / (N - 1)) × Σ(v_ei[k] - μ_ei)^2);

[0184] σ_sj = sqrt((1 / (N - 1)) × Σ(v_sj[k] - μ_sj)^2).

[0185] Iterating over all i ∈ [1, C_e] and j ∈ [1, C_s] yields a correlation coefficient matrix R of dimensions (C_e, C_s). Each element R_ij of the matrix lies in [-1, 1].

[0186] If R_ij is close to 1, the boundary channel and the region channel are positively correlated. For example, where the boundary channel responds strongly, the region channel also responds strongly, and both may be attending to the same type of bank collapse feature.

[0187] If R_ij is close to -1, the two are negatively correlated, for example one is a foreground response and the other a background response;

[0188] If R_ij is close to 0, the two are uncorrelated.

[0189] It should be understood that matrix R represents the collaborative association feature, which encapsulates the structured knowledge of the teacher network when processing two tasks.
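
The construction of the matrix R in [0176] to [0185] can be written as one matrix product over flattened channels. The following is a minimal sketch that assumes both feature maps already share the same spatial size; the function name, the small ε guarding against constant channels, and the random test features are assumptions for illustration.

```python
import numpy as np

def correlation_matrix(f_edge, f_seg, eps=1e-8):
    """R[i, j] = Pearson correlation of boundary channel i and region channel j."""
    c_e, h, w = f_edge.shape
    c_s = f_seg.shape[0]
    n = h * w
    v_e = f_edge.reshape(c_e, n)                       # flatten each channel to length N
    v_s = f_seg.reshape(c_s, n)
    v_e = v_e - v_e.mean(axis=1, keepdims=True)        # subtract mu_ei
    v_s = v_s - v_s.mean(axis=1, keepdims=True)        # subtract mu_sj
    cov = v_e @ v_s.T / (n - 1)                        # all pairwise covariances at once
    sigma_e = np.sqrt((v_e ** 2).sum(axis=1) / (n - 1))
    sigma_s = np.sqrt((v_s ** 2).sum(axis=1) / (n - 1))
    # eps is an added safeguard for constant channels, not part of the stated formula
    return cov / (sigma_e[:, None] * sigma_s[None, :] + eps)

rng = np.random.default_rng(0)
f_edge = rng.normal(size=(3, 8, 8))
f_seg = rng.normal(size=(4, 8, 8))
f_seg[0] = 2.0 * f_edge[0] + 1.0       # perfectly positively correlated pair
f_seg[1] = -f_edge[1]                  # perfectly negatively correlated pair

R = correlation_matrix(f_edge, f_seg)
```

The result agrees with the cross block of np.corrcoef applied to the flattened channels, up to the effect of ε.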

[0190] Step 502: Extract intermediate layer feature maps from the boundary detection branch and region segmentation branch of the teacher model, respectively, in the following way:

[0191] In the boundary detection branch of the teacher model, the feature maps of the shallow layers of the network are selected as boundary feature maps to capture the detailed texture information of the image;

[0192] In the region segmentation branch of the teacher model, feature maps from the deep layers of the network are selected as region feature maps to capture the semantic category information of the images;

[0193] Among them, the intermediate layer feature maps are the boundary feature maps and the region feature maps.

[0194] Optionally, specific feature layer selection strategies are provided. Since boundary detection relies more on low-level texture, gradient and other detailed information, while region segmentation relies more on high-level semantic and contextual information, the two layers are not randomly selected when constructing the matrix, but are paired in a targeted manner.

[0195] Specifically, assuming the backbone of the teacher network is ResNet-50, it typically contains four stages, corresponding to conv2_x, conv3_x, conv4_x, and conv5_x, with the output feature map size halved from one stage to the next.

[0196] For the boundary detection branch, feature maps from the shallow layers of the network are selected. For example, the output of ResNet's conv2_x (i.e., Layer1) or conv3_x (i.e., Layer2) can be chosen, where Layerq is the equivalent name of the corresponding stage, q = 1, 2, 3, 4. The feature maps of these shallow layers have higher resolution, such as 1 / 4 or 1 / 8 of the original image, preserving edge and texture details.

[0197] For the region segmentation branch, feature maps from the deeper layers of the network are selected. For example, the output of ResNet's conv5_x (i.e., Layer4) can be chosen. It has a large receptive field and a low feature map resolution, such as 1 / 32 of the original image, but it contains abstract semantic information and can accurately distinguish between collapsed bank and water.

[0198] In one embodiment, the boundary feature map F_edge is taken from the ResNet Layer2 output, with C_e = 512 channels and a size of 64×64 (assuming a 512×512 input).

[0199] The region feature map F_seg is taken from the ResNet Layer4 output, with C_s = 2048 channels and a size of 16×16.

[0200] Because the two feature maps have different spatial dimensions, such as 64×64 versus 16×16, the flattened channel vectors have different lengths and the correlation coefficient cannot be calculated directly. Therefore, before calculation, the smaller feature map F_seg is upsampled via bilinear interpolation, or the larger feature map F_edge is downsampled via average pooling, or both, so that they have the same spatial dimensions, for example uniformly 32×32; the flattening and calculation operations are then performed.
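
One of the alignment options in [0200], downsampling the larger map by average pooling, can be sketched as a reshape followed by a mean. This sketch pools F_edge all the way down to the 16×16 size of F_seg rather than meeting at 32×32, uses a reduced channel count to keep the example small, and does not cover bilinear upsampling; these are simplifying assumptions.

```python
import numpy as np

def avg_pool(feature, factor):
    """Average-pool a (C, H, W) feature map by an integer factor (H, W divisible by it)."""
    c, h, w = feature.shape
    blocks = feature.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))        # mean over each factor x factor block

rng = np.random.default_rng(0)
f_edge = rng.normal(size=(8, 64, 64))      # stand-in for the Layer2 output (fewer channels)
f_seg = rng.normal(size=(8, 16, 16))       # stand-in for the Layer4 output (fewer channels)

f_edge_small = avg_pool(f_edge, 4)         # 64x64 -> 16x16, now matching f_seg

toy = np.arange(16, dtype=float).reshape(1, 4, 4)
toy_pooled = avg_pool(toy, 2)              # each output is the mean of a 2x2 block
```

After this step both maps flatten to vectors of the same length N per channel, so the Pearson correlation is well defined.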

[0201] It should be understood that cross-level association construction allows student networks to learn how underlying details support high-level semantics, thus better understanding image content.

[0202] As another example, a dynamic weighted fusion method is described, which introduces functional alignment loss and soft-rank consistency loss to address the issues of feature semantic misalignment and information rank collapse during cross-architecture distillation. Specifically, this example is as follows:

[0203] Step 601 involves extracting intermediate layer feature maps from the boundary detection branch and region segmentation branch of the teacher model, specifically including a multi-layer feature fusion process based on function attribution weights, as follows:

[0204] Extract multi-scale feature maps from the outputs of each layer of the teacher network encoder, and obtain the final outputs of the boundary detection branch and the region segmentation branch of the teacher model;

[0205] Calculate the first spatial correlation between the multi-scale feature maps of each layer and the final output of the boundary detection branch, and the second spatial correlation between them and the final output of the region segmentation branch.

[0206] Boundary function attribution weights for each layer are generated based on the first spatial correlation, and regional function attribution weights for each layer are generated based on the second spatial correlation.

[0207] By using boundary function attribution weights to perform weighted fusion of multi-scale feature maps, function-weighted boundary features are obtained; by using regional function attribution weights to perform weighted fusion of multi-scale feature maps, function-weighted regional features are obtained.

[0208] Functionally weighted boundary features and functionally weighted region features are used as intermediate layer feature maps to calculate the correlation coefficient matrix.

[0209] Correspondingly, teacher networks typically contain multiple layers, such as the four layers L1, L2, L3, and L4 of ResNet, corresponding to Layer q. Rather than manually assigning which layer represents the boundary and which represents the region, it is better to let the network itself determine the contribution of each layer to the final task, i.e., functional attribution.

[0210] Accordingly, the output feature maps of each layer of the teacher network are extracted.

[0211] Before calculating the first spatial correlation and the second spatial correlation, the following steps are also included:

[0212] Construct a channel projection layer, which contains a 1×1 convolution;

[0213] By using a channel projection layer, the multi-scale feature maps output by each layer of the teacher network encoder are mapped to a unified channel dimension, so that the feature maps participating in the weighted fusion are consistent in the channel dimension.

[0214] Accordingly, a channel projection layer is constructed. This projection layer consists of 1×1 convolutional kernels and is used to map feature maps from different levels, such as those with 256, 512, 1024, and 2048 channels, to a uniform channel dimension $d$, for example $d = 64$. The projected feature of layer $l$ is denoted as $F'_l$.

[0215] Next, the final outputs of the two branches of the teacher network are obtained, denoted as the boundary map $G_{\mathrm{edge}}$ and the segmentation map $G_{\mathrm{seg}}$. The two maps are typically single-channel or dual-channel probability maps.

[0216] Furthermore, the functional weight of each layer's feature $F'_l$ for the two tasks is calculated. For the boundary task, the spatial correlation between $F'_l$ and $G_{\mathrm{edge}}$ is computed: $F'_l$ is activated by the rectified linear unit (ReLU) and multiplied element-wise with $G_{\mathrm{edge}}$, and global average pooling (GAP) is then applied to obtain a scalar score. The specific formula is as follows:

[0217] $s_{l,\mathrm{edge}} = \mathrm{GAP}(\mathrm{ReLU}(F'_l) \times G_{\mathrm{edge}})$;

[0218] Similarly, for the regional task, $s_{l,\mathrm{seg}} = \mathrm{GAP}(\mathrm{ReLU}(F'_l) \times G_{\mathrm{seg}})$;

[0219] Above, $s_{l,\mathrm{edge}}$ reflects the activation intensity of the $l$-th layer features in the high-response region of the boundary map, and $s_{l,\mathrm{seg}}$ reflects the activation intensity of the $l$-th layer features in the high-response region of the segmentation map; GAP is the global average pooling operator, ReLU is the rectified linear unit activation function, and $l = 1, 2, 3, 4$.

[0220] The scores of each layer are normalized using the Softmax function to obtain the function attribution weights, as shown below:

[0221] $\alpha_{\mathrm{edge}} = \mathrm{Softmax}([s_{1,\mathrm{edge}}, s_{2,\mathrm{edge}}, s_{3,\mathrm{edge}}, s_{4,\mathrm{edge}}])$;

[0222] $\alpha_{\mathrm{seg}} = \mathrm{Softmax}([s_{1,\mathrm{seg}}, s_{2,\mathrm{seg}}, s_{3,\mathrm{seg}}, s_{4,\mathrm{seg}}])$;

[0223] In the formula, Softmax denotes the softmax (normalized exponential) function, and $\alpha_{\mathrm{edge}}$ and $\alpha_{\mathrm{seg}}$ are the function attribution weight vectors for the boundary task and the regional task, respectively. For example, the calculated $\alpha_{\mathrm{edge}}$ could be [0.6, 0.3, 0.1, 0.0], indicating that shallow features contribute the most to the boundary task, while $\alpha_{\mathrm{seg}}$ could be [0.0, 0.1, 0.4, 0.5], indicating that deep features contribute the most to the regional task.

[0224] Furthermore, weighted fusion is performed to generate functionally weighted features, namely:

[0225] $F_{\mathrm{weighted\_edge}} = \sum_{l} \alpha_{l,\mathrm{edge}} \times F'_l$;

[0226] $F_{\mathrm{weighted\_seg}} = \sum_{l} \alpha_{l,\mathrm{seg}} \times F'_l$;

[0227] Where Σ represents the summation over all layers l=1 to 4.

[0228] The two fused features $F_{\mathrm{weighted\_edge}}$ and $F_{\mathrm{weighted\_seg}}$ are the purified features that best represent the boundary and regional task information. Using these two features to calculate the correlation coefficient matrix can improve the accuracy of distillation.

[0229] Equivalently, $F_{\mathrm{weighted\_edge}}$ and $F_{\mathrm{weighted\_seg}}$ are the function-weighted feature maps corresponding to the boundary task and the regional task, respectively, and $\alpha_{l,\mathrm{edge}}$ and $\alpha_{l,\mathrm{seg}}$ are the function attribution weights of the $l$-th layer features for the boundary task and the regional task, respectively.
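The scoring, Softmax normalization, and weighted fusion described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the projected features are taken to be already resized to the spatial resolution of the branch output (the text does not specify how resolutions are matched), GAP is taken over both channels and spatial positions so that each layer yields a single scalar, and the function names are hypothetical.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def attribution_weights(projected, target):
    """Softmax over per-layer scores s_l = GAP(ReLU(F'_l) * G).

    projected: list of (d, H, W) arrays, one projected feature F'_l per layer.
    target:    (H, W) probability map G (boundary or segmentation output).
    Assumption: all F'_l were already resized to the resolution of G.
    """
    scores = np.array([np.mean(np.maximum(f, 0.0) * target) for f in projected])
    return softmax(scores)

def weighted_fusion(projected, alpha):
    """Function-weighted feature: sum over layers of alpha_l * F'_l."""
    return sum(a * f for a, f in zip(alpha, projected))

rng = np.random.default_rng(0)
projected = [rng.standard_normal((4, 8, 8)) for _ in range(4)]  # toy d = 4, four layers
g_edge = rng.random((8, 8))                                     # toy boundary map

alpha_edge = attribution_weights(projected, g_edge)
f_weighted_edge = weighted_fusion(projected, alpha_edge)
print(alpha_edge, f_weighted_edge.shape)
```

The same two functions applied to the segmentation map would yield the regional weights and the function-weighted regional feature.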

[0230] Step 602, in constructing the distillation loss function based on collaborative correlation features, also includes constructing the function attribution alignment loss. The process of constructing the function attribution alignment loss includes:

[0231] Extract student multi-scale feature maps from the outputs of each layer of the lightweight student network;

[0232] Using the boundary pixel weight map in the dataset as a proxy target for boundary tasks, the correlation between the student multi-scale feature map and the boundary pixel weight map is calculated to generate the student boundary function attribution weight.

[0233] Using the bank collapse area segmentation labels in the dataset as the proxy target for the regional task, the correlation between the student multi-scale feature maps and the bank collapse area segmentation labels is calculated to generate the student region function attribution weights.

[0234] The distributional difference between the student boundary function attribution weights and the boundary function attribution weights generated by the teacher model, as well as the distributional difference between the student region function attribution weights and the region function attribution weights generated by the teacher model, are calculated. The sum of these distributional differences is used as the function attribution alignment loss.

[0235] It is not enough for the student to imitate the teacher's final features; the student also needs to imitate the teacher's attention distribution. That is, if the teacher considers its first layer most important to the boundary, the student should likewise consider its own shallow layer most important to the boundary.

[0236] However, the student network does not have mature outputs $G_{\mathrm{edge}}$ and $G_{\mathrm{seg}}$ in the early stages of training. Therefore, the ground truth (GT) labels in the dataset are used as proxy targets: the boundary pixel weight map replaces the teacher's $G_{\mathrm{edge}}$, and the bank collapse area segmentation label replaces the teacher's $G_{\mathrm{seg}}$.

[0237] Following the same method, the features of each layer of the student network are projected to a unified dimension, their correlation with the two proxy targets is calculated, and the resulting scores are normalized to obtain the student's attribution weight distributions:

[0238] $\beta_{\mathrm{edge}} = \mathrm{Softmax}(\mathrm{Scores}_{\mathrm{student\_edge}})$;

[0239] $\beta_{\mathrm{seg}} = \mathrm{Softmax}(\mathrm{Scores}_{\mathrm{student\_seg}})$.

[0240] Furthermore, the function attribution alignment loss $L_{\mathrm{align}}$ is constructed. A commonly used measure is the Kullback-Leibler (KL) divergence, which measures the difference between two probability distributions.

[0241] $L_{\mathrm{align}} = \mathrm{KL}(\beta_{\mathrm{edge}} \,\|\, \alpha_{\mathrm{edge}}) + \mathrm{KL}(\beta_{\mathrm{seg}} \,\|\, \alpha_{\mathrm{seg}})$;

[0242] Above, $\beta_{\mathrm{edge}}$ and $\beta_{\mathrm{seg}}$ are the function attribution weight vectors of the student network for the boundary task and the regional task, respectively; $\mathrm{KL}(\cdot \,\|\, \cdot)$ is the KL divergence operator; $\mathrm{Scores}_{\mathrm{student\_edge}}$ is the sequence of activation intensity scores of the features at each layer of the student network for the boundary task; and $\mathrm{Scores}_{\mathrm{student\_seg}}$ is the corresponding sequence of activation intensity scores for the regional task.

[0243] By minimizing this loss, the student network is forced to form a hierarchical functional division mechanism similar to that of the teacher network.
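A minimal NumPy sketch of the alignment loss is given below for illustration. The teacher weights reuse the example values quoted earlier in the description; the student weights are hypothetical, and the small clipping constant `eps` is an assumption added so that rounded zero entries (an exact Softmax output is never zero) do not produce infinite divergence.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / q)))

def alignment_loss(beta_edge, alpha_edge, beta_seg, alpha_seg):
    """L_align = KL(beta_edge || alpha_edge) + KL(beta_seg || alpha_seg)."""
    return kl_divergence(beta_edge, alpha_edge) + kl_divergence(beta_seg, alpha_seg)

# Teacher weights: the example values from the description (rounded).
alpha_edge = [0.6, 0.3, 0.1, 0.0]
alpha_seg = [0.0, 0.1, 0.4, 0.5]
# Student weights: hypothetical values for an untrained, nearly uniform student.
beta_edge = [0.3, 0.3, 0.2, 0.2]
beta_seg = [0.2, 0.2, 0.3, 0.3]

loss = alignment_loss(beta_edge, alpha_edge, beta_seg, alpha_seg)
print(loss)
```

The loss is zero when the student reproduces the teacher's layer-wise weight distribution and grows as the student places weight on layers the teacher considers irrelevant.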

[0244] Step 603, in constructing the distillation loss function based on collaborative correlation features, also includes constructing the soft-rank consistency loss. The construction process of the soft-rank consistency loss includes:

[0245] Singular value decomposition is performed on the collaborative correlation feature matrices generated by the teacher model and the lightweight student network to obtain the teacher singular value sequence and the student singular value sequence.

[0246] A continuously differentiable soft-rank calculation function is constructed using the Sigmoid function to calculate the soft rank of the teacher singular value sequence and the student singular value sequence, respectively.

[0247] The difference between the teacher soft rank and the student soft rank is calculated as the soft-rank consistency loss, which constrains the lightweight student network to learn a collaborative relationship structure whose complexity is comparable to that of the teacher model.

[0248] Accordingly, the correlation coefficient matrix R is a two-dimensional matrix. If only an element-level distance, such as the mean squared error (MSE), is calculated between the teacher matrix $R_{\mathrm{teacher}}$ and the student matrix $R_{\mathrm{student}}$, the two matrices may be numerically very similar yet have vastly different ranks. The rank represents the amount of linearly independent information in a matrix. If the student's matrix has a low rank, the learned features are redundant and have collapsed. To prevent this, a soft-rank constraint is introduced.

[0249] Accordingly, singular value decomposition (SVD) is performed on $R_{\mathrm{teacher}}$ and $R_{\mathrm{student}}$ respectively, i.e.:

[0250] $R = U \Sigma V^{T}$;

[0251] The resulting singular value sequence is $\sigma = [\sigma_1, \sigma_2, \ldots, \sigma_k]$, where $\sigma_1 \ge \sigma_2 \ge \ldots \ge 0$, $k$ is the total number of singular values, $U$ and $V$ are orthogonal matrices, $\Sigma$ is a diagonal matrix composed of the singular values, and $V^{T}$ is the transpose of matrix $V$.

[0252] Based on this, the rank of a matrix is usually defined as the number of non-zero singular values. However, this counting operation is not differentiable. Therefore, a soft-rank function is defined: using the sigmoid function or a similar smoothing function, the singular values are mapped to the interval (0, 1), approximating whether each one is non-zero.

[0253] Alternatively, $\mathrm{SoftRank}(R) = \sum_{i} \sigma_i / (\sigma_i + \varepsilon)$;

[0254] Where $\varepsilon$ is a very small constant, such as $1 \times 10^{-6}$ or $1 \times 10^{-5}$, that serves as the threshold controlling the softening; $\mathrm{SoftRank}(\cdot)$ is the operation for calculating the soft rank; and $\sigma_i$ is the $i$-th singular value obtained by singular value decomposition (SVD) of the matrix R. When $\sigma_i$ is much greater than $\varepsilon$, the term approaches 1; when $\sigma_i$ approaches 0, the term approaches 0.

[0255] Furthermore, the soft-rank consistency loss is calculated as $L_{\mathrm{rank}} = \lVert \mathrm{SoftRank}(R_{\mathrm{student}}) - \mathrm{SoftRank}(R_{\mathrm{teacher}}) \rVert_2$, where $\lVert \cdot \rVert_2$ denotes the L2 norm (squared difference). Including this loss term ensures that the collaborative relationship matrix learned by the student network remains consistent with the teacher's in terms of information richness, thus avoiding mode collapse.
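For illustration, the ratio-based soft rank and the resulting loss can be sketched in NumPy as below. The value of `eps` and the function names are assumptions, and the loss follows the "squared difference" reading of the text; taking the absolute difference instead would be the other plausible reading of the L2 norm of a scalar.

```python
import numpy as np

def soft_rank(R, eps=1e-5):
    """SoftRank(R) = sum_i sigma_i / (sigma_i + eps), with sigma from the SVD of R.

    eps is an illustrative value for the softening threshold.
    """
    sigma = np.linalg.svd(R, compute_uv=False)
    return float(np.sum(sigma / (sigma + eps)))

def rank_loss(R_student, R_teacher, eps=1e-5):
    """Squared difference between student and teacher soft ranks."""
    return (soft_rank(R_student, eps) - soft_rank(R_teacher, eps)) ** 2

R_teacher = np.eye(4)          # toy full-rank matrix, soft rank close to 4
R_student = np.ones((4, 4))    # toy collapsed matrix, soft rank close to 1

print(soft_rank(R_teacher), soft_rank(R_student), rank_loss(R_student, R_teacher))
```

In this toy case the student's matrix has only one significant singular value, so the loss is close to 9 and penalizes the collapsed structure even though an element-wise distance between the two matrices would give no indication of rank.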

[0256] In one aspect, this embodiment describes how to convert the soft mask into a hard mask and use a hardware index mapping mechanism to skip invalid computations, achieving sparse inference. Specifically, this includes:

[0257] Step 701, after obtaining the lightweight semantic segmentation model, also includes performing inference acceleration steps on an edge computing device, specifically including:

[0258] Acquire monitoring images of the bank collapse to be detected and input them into a lightweight semantic segmentation model.

[0259] Optionally, the image to be detected is a frame from a real-time video stream. Edge computing devices, such as NVIDIA Jetson series or Huawei Atlas series, perform necessary preprocessing on the image, such as scaling it to the model input size, for example, 512×512, and then feed it into the backbone network of a lightweight model, such as MobileNet V3, for forward inference. At this point, the parameters in the model are fixed and no longer updated via backpropagation.

[0260] Step 702: In the forward inference process of the lightweight semantic segmentation model, the binarization threshold is determined based on the target sparsity.

[0261] Optionally, the target sparsity $S_{\mathrm{target}}$ is set manually. This ratio defines the percentage of pixels that should be judged as background and skipped from the calculation. For example, setting $S_{\mathrm{target}} = 80\%$ aims to perform high-precision convolution calculations only on the remaining 20% of pixels (i.e., the bank collapse boundary region).

[0262] During inference, the model's front-end module generates a soft mask $M_{\mathrm{soft}}$ whose values range from 0 to 1. A binarization threshold $T_{\mathrm{bin}}$ must be found such that the proportion of pixels in $M_{\mathrm{soft}}$ smaller than $T_{\mathrm{bin}}$ is approximately $S_{\mathrm{target}}$. This can be achieved by computing a histogram of, or sorting, all pixel values of $M_{\mathrm{soft}}$. The specific algorithm is as follows:

[0263] Flatten all pixel values of $M_{\mathrm{soft}}$ and sort them in ascending order to obtain the sequence $V_{\mathrm{sorted}}$;

[0264] Calculate the corresponding index position $k = \mathrm{total\_pixels} \times S_{\mathrm{target}}$;

[0265] Take $V_{\mathrm{sorted}}[k]$ as the binarization threshold $T_{\mathrm{bin}}$;

[0266] Where $k$ is the index position of the threshold in the sorted sequence, and $\mathrm{total\_pixels}$ is the total number of pixels of $M_{\mathrm{soft}}$, equal to the mask width multiplied by the mask height. Because the threshold is determined dynamically for each image, the computational load remains stable; that is, regardless of the complexity of the image, the computation for each image stays within budget.

[0267] Step 703: Convert the soft mask generated during the inference process into a hard mask consisting of 0s and 1s based on a binarization threshold.

[0268] Using $T_{\mathrm{bin}}$, the soft mask is binarized as follows:

[0269] $M_{\mathrm{hard}}(x, y) = 1$, if $M_{\mathrm{soft}}(x, y) \ge T_{\mathrm{bin}}$;

[0270] $M_{\mathrm{hard}}(x, y) = 0$, if $M_{\mathrm{soft}}(x, y) < T_{\mathrm{bin}}$.

[0271] In the formula, $M_{\mathrm{hard}}$ is a switch matrix indicating which positions need to be calculated and which can be ignored: where $M_{\mathrm{hard}}$ is 1, the corresponding position must be calculated, and where it is 0, the corresponding position can be skipped. $M_{\mathrm{soft}}(x, y)$ and $M_{\mathrm{hard}}(x, y)$ represent the soft mask value and the binarized hard mask result at coordinates (x, y), respectively.
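The threshold selection and binarization of steps 702 and 703 can be sketched in NumPy as follows. This is a minimal sketch: the function name `binarize_mask` is hypothetical, the index is truncated to an integer, and clamping the index to the last element is an added safeguard not mentioned in the text.

```python
import numpy as np

def binarize_mask(m_soft, s_target):
    """Pick T_bin so that roughly s_target of the pixels fall below it,
    then threshold the soft mask into a 0/1 hard mask."""
    v_sorted = np.sort(m_soft.ravel())                        # ascending order
    # k = total_pixels * S_target, truncated and clamped to a valid index.
    k = min(int(v_sorted.size * s_target), v_sorted.size - 1)
    t_bin = v_sorted[k]
    m_hard = (m_soft >= t_bin).astype(np.uint8)
    return m_hard, t_bin

rng = np.random.default_rng(0)
m_soft = rng.random((64, 64))                 # toy soft mask with values in [0, 1)
m_hard, t_bin = binarize_mask(m_soft, 0.8)    # S_target = 80%

print(t_bin, m_hard.mean())                   # fraction of pixels kept for dense compute
```

Because the threshold is taken from the sorted values of the current mask, the kept fraction stays near 20% for any input image, which is what keeps the per-frame computation within budget.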

[0272] Step 704: Generate spatial indices of non-zero elements based on the hard mask, use the index mapping mechanism to skip the convolution calculation in regions where the hard mask is 0, perform convolution operations only on regions where the hard mask is 1, and output the bank collapse region segmentation result.

[0273] If $M_{\mathrm{hard}}$ is simply multiplied with the feature map, the result is numerically 0 at masked positions, but the multiplications themselves are still executed, so no time is saved. Therefore, a gather-compute-scatter scheme (data gathering, computation, and distribution) is adopted.

[0274] Accordingly, $M_{\mathrm{hard}}$ is scanned to extract the list of coordinate indices of all pixels with a value of 1: $\mathrm{List}_{\mathrm{indices}} = [(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)]$, where $N$ is the total number of coordinate indices.

[0275] Furthermore, the corresponding feature vectors are extracted from the input feature map according to the indices and concatenated into a feature matrix. Assuming the input feature map has dimensions (C, H, W) and there are N non-zero points, the feature matrix has dimensions (C, N), where C is the number of feature channels, H is the feature map height, and W is the feature map width.

[0276] Next, a convolution operation is performed on the feature matrix. Since N is much smaller than H×W, the computational cost is reduced. For example, if N is 20% of the total pixels, the computational cost is theoretically reduced by 80%.

[0277] Further, an all-zero output feature map is created. According to $\mathrm{List}_{\mathrm{indices}}$, the calculated convolution results are filled back into the corresponding positions of the output feature map. For positions where no calculation is performed, i.e., the background, the value naturally remains 0, or it is filled using the lightweight results of the sparse path.
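The gather-compute-scatter flow can be illustrated with the NumPy sketch below. To keep the example short it assumes a 1×1 (pointwise) convolution, for which each gathered column can be processed independently; a K×K kernel would additionally need to gather each pixel's neighbourhood, which is not shown. The function name and the toy shapes are hypothetical.

```python
import numpy as np

def sparse_pointwise_conv(feat, m_hard, weight):
    """Apply a 1x1 convolution only where the hard mask is 1.

    feat:   (C_in, H, W) input feature map.
    m_hard: (H, W) hard mask of 0s and 1s.
    weight: (C_out, C_in) kernel of the 1x1 convolution (assumed simplification).
    """
    ys, xs = np.nonzero(m_hard)              # coordinate indices of active pixels
    gathered = feat[:, ys, xs]               # gather: feature matrix of shape (C_in, N)
    computed = weight @ gathered             # compute: only N columns, not H*W
    out = np.zeros((weight.shape[0],) + m_hard.shape, dtype=computed.dtype)
    out[:, ys, xs] = computed                # scatter results back; background stays 0
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 16, 16))
weight = rng.standard_normal((5, 3))
m_hard = (rng.random((16, 16)) > 0.8).astype(np.uint8)   # roughly 20% active pixels

out = sparse_pointwise_conv(feat, m_hard, weight)
print(out.shape, int(m_hard.sum()))
```

The output matches a dense 1×1 convolution followed by masking, but the matrix product touches only the N active columns.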

[0278] This mechanism avoids redundant computation on large background areas. Experimental data shows that in the scenario of bank collapse monitoring, this method can reduce inference latency, enabling the semantic segmentation model to achieve real-time frame rates, such as >25 FPS, even on low-power edge devices.

[0279] In another aspect, this embodiment describes how to use the segmented bank collapse areas and boundary information for quantitative assessment and disaster early warning. The specific steps are as follows:

[0280] Step 801: Based on the bank collapse area segmentation results and boundary detection results output by the model, calculate the geometric morphology index of the bank collapse area.

[0281] In this step, the model's output serves as the data source for subsequent analysis. Key geometric parameters are calculated from the binarized segmentation mask $\mathrm{Mask}_{\mathrm{seg}}$ and boundary mask $\mathrm{Mask}_{\mathrm{edge}}$. First, the area of the collapsed bank can be calculated. Assuming the actual physical area represented by each pixel is known through camera calibration, for example 0.01 square meters per pixel, the total number $N_{\mathrm{pixel}}$ of pixels with a value of 1 in $\mathrm{Mask}_{\mathrm{seg}}$ is counted, and the actual bank collapse area $S_{\mathrm{area}}$ is:

[0282] $S_{\mathrm{area}} = N_{\mathrm{pixel}} \times \mathrm{Unit}_{\mathrm{area}}$;

[0283] Where $\mathrm{Unit}_{\mathrm{area}}$ represents the physical area of a single pixel.

[0284] Furthermore, the centroid coordinates $(C_x, C_y)$ of the bank collapse region are calculated and used to track the development direction of the bank collapse.

[0285] $C_x = (1 / N_{\mathrm{pixel}}) \times \sum_i x_i$;

[0286] $C_y = (1 / N_{\mathrm{pixel}}) \times \sum_i y_i$;

[0287] Where $(x_i, y_i)$ represents the image coordinates of the $i$-th pixel within the bank collapse region.
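The area and centroid formulas translate directly into a few lines of NumPy. The sketch below is illustrative only; the function name, the toy mask, and the handling of an empty mask are assumptions.

```python
import numpy as np

def area_and_centroid(mask_seg, unit_area):
    """Return (S_area, (C_x, C_y)) for a binary segmentation mask.

    mask_seg:  (H, W) array of 0s and 1s (rows are y, columns are x).
    unit_area: physical area of one pixel, e.g. 0.01 square meters.
    """
    ys, xs = np.nonzero(mask_seg)
    n_pixel = xs.size
    if n_pixel == 0:                      # no collapse detected in this frame
        return 0.0, None
    s_area = n_pixel * unit_area
    centroid = (float(xs.mean()), float(ys.mean()))
    return s_area, centroid

mask_seg = np.zeros((10, 10), dtype=np.uint8)
mask_seg[2:4, 3:7] = 1                    # toy region: 2 rows x 4 columns = 8 pixels

s_area, (c_x, c_y) = area_and_centroid(mask_seg, unit_area=0.01)
print(s_area, c_x, c_y)
```

For this toy mask the region covers 8 pixels, giving an area of 0.08 square meters and a centroid at x = 4.5, y = 2.5 in image coordinates.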

[0288] Step 802: Analyze the changes in geometric morphological indicators over a continuous time series and calculate the rate of bank collapse evolution.

[0289] Accordingly, bank collapse monitoring is a dynamic process. Two image frames separated by a time interval $\Delta t$ are selected, for example $T_{\mathrm{current}}$ and $T_{\mathrm{prev}}$. The areas $S_{\mathrm{curr}}$ and $S_{\mathrm{prev}}$ and the centroids $C_{\mathrm{curr}}$ and $C_{\mathrm{prev}}$ of the two frames are calculated, and the area change rate $\mathrm{Rate}_{\mathrm{area}}$ is then computed. The specific formula is as follows:

[0290] $\mathrm{Rate}_{\mathrm{area}} = (S_{\mathrm{curr}} - S_{\mathrm{prev}}) / \Delta t$;

[0291] If $\mathrm{Rate}_{\mathrm{area}} > 0$, the area of bank collapse is expanding.

[0292] Furthermore, the displacement distance of the centroid is calculated as $D_{\mathrm{shift}} = \sqrt{(C_{\mathrm{curr},x} - C_{\mathrm{prev},x})^2 + (C_{\mathrm{curr},y} - C_{\mathrm{prev},y})^2} \times \mathrm{Unit}_{\mathrm{length}}$, where $\mathrm{Unit}_{\mathrm{length}}$ represents the actual length per pixel, and the subscripts $x$ and $y$ indicate the horizontal and vertical coordinates, respectively. Movement of the centroid indicates that a collapse has occurred.

[0293] Step 803: Construct a multi-level early warning mechanism and trigger graded alarm signals based on the evolution rate.

[0294] In this step, the system presets multiple threshold levels. A Level 1 warning corresponds to the attention level: when $\mathrm{Rate}_{\mathrm{area}} > T_1$ and $D_{\mathrm{shift}} < T_2$, the bank collapse shows a slight expanding trend, and the system automatically increases the sampling frequency, for example from once per hour to once every 10 minutes.

[0295] A Level 2 warning corresponds to the alert level: when $\mathrm{Rate}_{\mathrm{area}} > T_3$ or $D_{\mathrm{shift}} > T_4$, a bank collapse is occurring rapidly, which triggers a yellow alert and sends screenshots of the scene to management personnel.

[0296] A Level 3 warning corresponds to the danger level. It is triggered when a sudden morphological change of the boundary line is detected within a short period, either by calculating differences in the boundary shape descriptors or by a rapid increase in area, such as $\mathrm{Rate}_{\mathrm{area}} > T_5$. The system then triggers a red alert, indicating that a large-scale bank collapse may occur and that engineering reinforcement measures need to be taken immediately.

[0297] Above, $T_1$ is the area change rate threshold for the Level 1 warning, $T_2$ is the centroid displacement distance threshold for the Level 1 warning, $T_3$ is the area change rate threshold for the Level 2 warning, $T_4$ is the centroid displacement distance threshold for the Level 2 warning, and $T_5$ is the area change rate threshold for the Level 3 warning.
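Steps 802 and 803 can be sketched in plain Python as below. This is an illustrative sketch only: the threshold values are hypothetical, the boundary shape descriptor comparison is reduced to a boolean flag supplied by the caller, and checking the most severe level first is an assumption about how overlapping conditions are resolved.

```python
import math

def evolution_metrics(s_curr, s_prev, c_curr, c_prev, dt, unit_length):
    """Return (Rate_area, D_shift) for two frames separated by dt."""
    rate_area = (s_curr - s_prev) / dt
    d_shift = math.hypot(c_curr[0] - c_prev[0], c_curr[1] - c_prev[1]) * unit_length
    return rate_area, d_shift

def warning_level(rate_area, d_shift, t, boundary_changed=False):
    """Map the evolution metrics to a warning level 0 (none) to 3 (danger).

    t: dict with keys "T1".."T5" (hypothetical threshold values).
    boundary_changed: stand-in flag for a sudden change in boundary shape descriptors.
    """
    if boundary_changed or rate_area > t["T5"]:
        return 3                                    # red alert
    if rate_area > t["T3"] or d_shift > t["T4"]:
        return 2                                    # yellow alert
    if rate_area > t["T1"] and d_shift < t["T2"]:
        return 1                                    # attention: raise sampling frequency
    return 0

# Hypothetical thresholds (area rate in m^2 per hour, displacement in m).
thresholds = {"T1": 0.05, "T2": 1.0, "T3": 0.5, "T4": 2.0, "T5": 2.0}

rate_area, d_shift = evolution_metrics(
    s_curr=1.2, s_prev=1.0, c_curr=(13.0, 14.0), c_prev=(10.0, 10.0),
    dt=2.0, unit_length=0.1,
)
level = warning_level(rate_area, d_shift, thresholds)
print(rate_area, d_shift, level)
```

With these toy inputs the area grows by about 0.1 per unit time and the centroid moves 0.5 m, which satisfies only the Level 1 condition.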

[0298] In other embodiments, a non-volatile computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the method as described in any embodiment of the present application.

[0299] In this scheme, a soft labeling mechanism is introduced at the data level: the normalized distance is calculated to generate a boundary weight map representing pixel membership, guiding the model to focus on ambiguous areas. At the model training level, a geometric consistency loss is introduced, which uses the gradient of the signed distance field to constrain the predicted boundary normal to be consistent with the true normal, thereby improving the positioning accuracy of cracks and steep slope edges.

[0300] Furthermore, the background region is identified by a dynamic mask generated at the front end, and a dual-path computation mode is adopted at the back end, namely, dense convolution is performed on complex boundaries, and only lightweight sparse computation is performed on flat backgrounds; the inference stage further utilizes an index mapping mechanism to skip zero-value regions, reducing the amount of computation and latency.

[0301] This scheme also proposes a function-attribution-guided multi-layer collaborative distillation strategy (FA-MCD). It extracts the channel correlation matrix (the collaborative correlation features) between the boundary branch and the segmentation branch in the teacher network, and combines this with the soft-rank consistency constraint to force the single-branch student network to learn the structured dependency logic between tasks, thus inheriting the reasoning ability of the teacher network with a low parameter count.

[0302] It should be noted that the various specific technical features described in the above embodiments can be combined in any suitable manner without contradiction. To avoid unnecessary repetition, the present invention will not describe the various possible combinations separately.

Claims

1. A method for constructing a semantic segmentation neural network model for riverbank collapse monitoring scenarios, characterized in that the method includes: obtaining a dataset of bank collapse monitoring images that covers the bank collapse evolution process, the dataset including the original images, corresponding bank collapse area segmentation labels, and boundary auxiliary labels; training, using the dataset, a teacher network containing a region segmentation branch and a boundary detection branch, wherein during training a dynamic computation mask is generated based on the local texture attributes and gradient direction attributes of the input feature map, and, based on the dynamic computation mask, sparse and dense differentiated dual-path convolution is performed in the feature extraction layer to obtain the trained teacher model; extracting collaborative correlation features representing the structured dependency between the boundary detection and region segmentation tasks in the teacher model, and constructing a lightweight student network with a single-branch structure; constructing a distillation loss function based on the collaborative correlation features, using the distillation loss function to constrain the feature distribution of the lightweight student network, and optimizing the parameters of the lightweight student network with the dataset to obtain a lightweight semantic segmentation model.
The process of generating a dynamic computation mask based on the local texture attributes and gradient direction attributes of the input feature map includes: calculating the local feature variance in the local neighborhood centered on each pixel for the input feature map output by the feature extraction layer of the teacher network, obtaining a local complexity map representing the texture complexity; calculating the gradient components of the input feature map in the horizontal and vertical directions, and determining the gradient direction angle of each pixel based on the gradient components; statistically analyzing the consistency of the gradient direction angles in a preset spatial neighborhood, obtaining a direction consistency map representing the possibility of boundary existence; and generating a dynamic computation mask based on the local complexity map and the direction consistency map using a differentiable threshold function, wherein the high response value region in the dynamic computation mask corresponds to the collapsed boundary region that requires dense computation.

2. The method according to claim 1, characterized in that the process of generating the boundary auxiliary labels in the dataset of bank collapse monitoring images covering the bank collapse evolution process includes: identifying the deterministic bank collapse boundary in the bank collapse area segmentation label, and defining a preset range on both sides of the deterministic bank collapse boundary as the uncertain region; calculating the normalized distance from each pixel within the uncertain region to the deterministic bank collapse boundary; and generating, based on the normalized distance, a boundary pixel weight map representing the pixel membership gradient, which is used as the boundary auxiliary label.

3. The method according to claim 1, characterized in that the consistency of the gradient direction angles within the preset spatial neighborhood is statistically analyzed as follows: based on the bank collapse area segmentation label or the prediction result of the previous frame, principal component analysis is used to estimate the direction of the main bank collapse boundary in the current image; an anisotropic elliptical neighborhood is constructed based on the direction of the main bank collapse boundary, wherein the major-axis radius of the anisotropic elliptical neighborhood along the direction of the main bank collapse boundary is greater than the minor-axis radius in the perpendicular direction; a weighted directional consistency coefficient is calculated within the anisotropic elliptical neighborhood, and a directional deviation penalty factor is introduced to suppress pixel responses whose gradient direction is inconsistent with the direction of the main bank collapse boundary; and the direction consistency map is generated based on the directional consistency coefficient weighted by the deviation penalty.

4. The method according to claim 1, characterized in that generating the dynamic computation mask based on the local complexity map and the direction consistency map using a differentiable threshold function includes: calculating the global complexity mean, the global complexity standard deviation, and the global direction consistency mean of the current frame image as image-level statistics; dynamically calculating, based on the image-level statistics and learnable sensitivity coefficients, a complexity threshold and a consistency threshold adapted to the texture distribution of the current frame image; and, using the sigmoid function as a soft-threshold activation function, generating a continuously differentiable soft mask with values between 0 and 1 based on the local complexity map, the direction consistency map, the complexity threshold, and the consistency threshold, the soft mask being used as the dynamic computation mask.

5. The method according to claim 1, characterized in that performing the sparse and dense differentiated dual-path convolution in the feature extraction layer based on the dynamic computation mask includes: invoking the dense computation path, in which, for regions of the dynamic computation mask where the response value approaches 1, a standard convolution kernel is used to perform pixel-by-pixel convolution operations to preserve spatial detail features; invoking the sparse computation path, in which, for regions of the dynamic computation mask where the response value approaches 0, lightweight computation is performed using a depthwise separable convolutional structure, the depthwise separable convolutional structure sequentially including a pointwise convolution for reducing the channel dimension, a depthwise convolution for extracting spatial features, and a pointwise convolution for restoring the channel dimension; and, using the dynamic computation mask as a weighting coefficient, performing weighted fusion of the output features of the dense computation path and the sparse computation path to obtain the output feature map of the current feature extraction layer.

6. The method according to claim 1, characterized in that training the teacher network containing the region segmentation branch and the boundary detection branch using the dataset further includes constructing a geometric consistency loss function to constrain the spatial consistency between the region segmentation branch and the boundary detection branch, the construction process of the geometric consistency loss function including: calculating a signed distance transformation field based on the bank collapse area segmentation label; calculating the spatial gradient of the distance transformation field to obtain the boundary normal unit vector pointing to the outside of the bank collapse region; calculating the spatial gradient vector of the region segmentation probability map output by the region segmentation branch; and, within a predefined boundary neighborhood, calculating the directional difference between the spatial gradient vector of the region segmentation probability map and the boundary normal unit vector, the directional difference being used as the geometric consistency loss function to constrain the direction of change of the region segmentation probability to be consistent with the true boundary normal.

7. The method according to claim 1, characterized in that extracting, from the teacher model, the collaborative correlation features representing the structured dependency between the boundary detection task and the region segmentation task includes: extracting intermediate layer feature maps from the boundary detection branch and the region segmentation branch of the teacher model, respectively; calculating the Pearson correlation coefficient between each channel of the intermediate layer feature map of the boundary detection branch and each channel of the intermediate layer feature map of the region segmentation branch; and organizing the correlation coefficients of all channel pairs into a correlation coefficient matrix, which serves as the collaborative correlation feature, each element of the correlation coefficient matrix representing the degree of response synergy between one boundary feature channel and one regional feature channel.

8. The method according to claim 7, characterized in that the intermediate layer feature maps are extracted from the boundary detection branch and the region segmentation branch of the teacher model, respectively, in the following way: in the boundary detection branch of the teacher model, the feature maps of the shallow layers of the network are selected as boundary feature maps to capture the detailed texture information of the image; in the region segmentation branch of the teacher model, the feature maps of the deep layers of the network are selected as region feature maps to capture the semantic category information of the image; wherein the intermediate layer feature maps are the boundary feature maps and the region feature maps.

9. The method according to claim 1, characterized in that, after the lightweight semantic segmentation model is obtained, the method further includes performing inference acceleration on an edge computing device, specifically: acquiring the bank collapse monitoring image to be detected and inputting it into the lightweight semantic segmentation model; determining, in the forward inference process of the lightweight semantic segmentation model, the binarization threshold based on the target sparsity; converting the soft mask generated during the inference process into a hard mask consisting of 0s and 1s based on the binarization threshold; and generating spatial indices of non-zero elements based on the hard mask, using the index mapping mechanism to skip the convolution calculation in regions where the hard mask is 0, performing the convolution operation only on regions where the hard mask is 1, and outputting the bank collapse region segmentation result.