A method for detecting sand content in wind-blown roadbeds based on adaptive semantic segmentation

Using the adaptive ASBE-Net semantic segmentation model, the method addresses multi-scale target segmentation in wind-blown sand track beds, enabling efficient and accurate sand content detection that adapts to varying environmental conditions and supports rapid inspection of long-distance lines.

CN121661394B · Active · Publication Date: 2026-06-30 · BEIJING JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING JIAOTONG UNIV
Filing Date
2025-11-27
Publication Date
2026-06-30


Abstract

This invention discloses a method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation. The method first distinguishes sand particles from ballast at the pixel level and computes an apparent contamination rate, then uses a regression model to infer the physical sand content, yielding a rapid, non-destructive online quantitative detection scheme. The invention achieves rapid, non-destructive, automated quantitative detection of sand content in wind-blown sand track beds, overcoming the low efficiency, strong subjectivity, and difficulty of quantification of existing manual and physical detection methods. It provides reliable technical support for scientific maintenance decisions on ballasted track beds in wind-blown sand areas.

Description

Technical Field

[0001] This invention relates to the field of rail transit technology, and specifically to a method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation.

Background Technology

[0002] Ballasted track dominates global railway networks due to its excellent elasticity, drainage performance, and economic benefits. However, ballasted railway lines traversing deserts, Gobi, and other wind-blown sandy areas are highly susceptible to damage from sand intrusion and deposition. Sand intrusion reduces track porosity and increases track stiffness, significantly weakening the track's drainage capacity, altering its mechanical properties, exacerbating track irregularities, and directly threatening train operation safety. Therefore, timely and accurate determination of the sand content of the track bed is a crucial and urgent requirement for railway maintenance in wind-blown sandy areas.

[0003] Currently, the assessment of track bed condition mainly relies on the following methods:

[0004] (1) Manual inspection: maintenance personnel inspect the track bed visually based on experience. This method is highly subjective, inefficient, and difficult to quantify. Moreover, the results depend heavily on the skill and condition of the inspectors, making it difficult to establish a unified, objective evaluation standard.

[0005] (2) Physical testing method: On-site excavation of ballast samples for sieving and weighing to calculate sand content. Although this method yields accurate results, it is a destructive testing method, which is time-consuming, labor-intensive, and costly, and cannot achieve large-scale, continuous monitoring.

[0006] (3) Geophysical exploration methods: such as ground-penetrating radar (GPR) and surface seismic wave method. GPR infers the internal structure by emitting electromagnetic waves into the track bed and receiving the reflected signals, but its signal interpretation is complex and easily affected by factors such as track bed moisture content and ballast mineral composition. In addition, the equipment is expensive and requires highly skilled operators. The surface seismic wave method has problems such as bulky equipment, slow detection speed, and insensitivity to subtle changes in the surface, making it difficult to apply to rapid surveys of long-distance lines.

[0007] In recent years, deep learning semantic segmentation technology has achieved great success in fields such as medical imaging and autonomous driving. Its powerful pixel-level classification capabilities have provided new ideas for solving the aforementioned problems. However, directly applying existing general segmentation models to images of wind-blown sand roadbeds presents significant bottlenecks: First, ballast particles are large and irregular in shape, while sand particles are small and densely distributed, resulting in huge differences in target scale, which general models cannot simultaneously accommodate. Second, sand particles and ballast edges often exhibit weak contrast characteristics, and under environmental noise such as wind, sand, and lighting, the model's boundary localization accuracy and the detection rate of small targets will significantly decrease.

[0008] In summary, this field urgently needs a high-precision, high-efficiency technical solution designed specifically for wind-blown sand track bed inspection, one that adaptively handles multi-scale targets and weak-contrast boundaries and ultimately outputs a quantitative sand content indicator.

Summary of the Invention

[0009] The present invention aims to provide a method for detecting sand content in wind-blown sand roadbeds based on adaptive semantic segmentation, in order to solve the above problems.

[0010] The technical solution of this invention is: a method for detecting sand content in wind-blown sand roadbeds based on adaptive semantic segmentation, comprising the following steps:

[0011] S1. Acquire digital images of the ballast track bed surface in a sandy environment using a camera array installed at the bottom of the track inspection vehicle;

[0012] S2. Sequentially apply denoising and white balance to the acquired digital images;

[0013] S3. Perform perspective correction on the processed image, and extract the region of interest (ROI) of the specified ballast track bed from the corrected image;

[0014] S4. Input the image processed in S3 into the pre-trained encoder-decoder ASBE-Net semantic segmentation model to obtain a pixel-level classification mask that distinguishes the ballast area, the sand area, and the background area.

[0015] S5. Based on the classification mask, count the total number of pixels in the sand area and in the ballast area, and calculate the apparent contamination rate as: apparent contamination rate = total sand-area pixels / (total sand-area pixels + total ballast-area pixels).

[0016] S6. Substitute the calculated apparent contamination rate into a nonlinear power regression model established from field calibration data to obtain the physical sand content of the track bed; this nonlinear regression model serves as the mapping between the apparent contamination rate and the physical sand content. Automatic resampling is triggered when the segmentation confidence is below a threshold or the image quality does not meet the standard.

[0017] S7. Output the segmentation visualization, apparent contamination rate, and physical sand content results;

[0018] S1 is implemented by an image acquisition module, S2-S6 by a data processing module, and S7 by a result output module. The image acquisition module has a sensor resolution of no less than 1920×1080 and a frame rate of no less than 10 fps, and automatically adjusts exposure and supplementary lighting brightness according to ambient light. The data processing module is configured to run software for ASBE-Net segmentation, apparent contamination rate calculation, and physical sand content inversion. The result output module generates sand content level maps and remediation priority suggestions, and can upload the results to the management system in real time via the network.

[0019] Preferably, in S4, the ASBE-Net semantic segmentation deep learning model adopts an encoder-decoder architecture, the structure of which includes:

[0020] Encoder network, as a feature extractor;

[0021] The feature decoder network is used to progressively upsample the deep features output by the encoder, restore the spatial resolution, and output the segmentation result.

[0022] The Scale Adaptive Weighted Aggregation (SAWA) module is connected between the encoder output and the decoder input.

[0023] The Statistical Adaptive Convolutional Attention (SACA) module is embedded inside the encoder network, and multi-scale feature fusion is achieved between layers of the same resolution in the encoder and decoder through skip connections.

[0024] Preferably, the Scale Adaptive Weighted Aggregation (SAWA) module achieves adaptive fusion of multi-scale features through the following sub-steps:

[0025] Receive the feature map output from the final stage of the encoder;

[0026] The SAWA module contains K parallel dilated convolution branches, whose combination of dilation rates is determined from the size statistics of ballast and sand grains;

[0027] The feature maps are fed in parallel to K dilated convolution branches with different dilation rates to obtain K sets of feature maps with different receptive fields.

[0028] Perform global average pooling on each set of feature maps to obtain K branch global description vectors;

[0029] K description vectors are input into a shared fully connected layer and linearly transformed to obtain K branch importance scores;

[0030] The K branch importance scores are normalized with Softmax to obtain one adaptive weight per branch, with the K weights summing to 1;

[0031] The corresponding feature maps are weighted and summed using adaptive weights to obtain a fused feature map;

[0032] The fused feature map is subjected to 1×1 convolution, batch normalization, and nonlinear activation to compress the number of channels and enhance nonlinearity, and finally output to the decoder.
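The SAWA sub-steps above can be sketched as a NumPy forward pass over a single (C, H, W) encoder output. This is an illustrative sketch under stated assumptions, not the patented implementation: the branches are simplified to depthwise 3×3 dilated convolutions, the dilation rates (1, 2, 4) are hypothetical (the source derives them from ballast and sand grain-size statistics without listing them), the shared fully connected layer is assumed to map each C-dimensional description vector to one scalar score, and batch normalization uses batch-size-1 statistics without learned affine parameters. All weights are random placeholders for learned parameters.

```python
import numpy as np

# Sketch only: weights, dilation rates and the depthwise simplification are assumptions.


def depthwise_dilated_conv3x3(x: np.ndarray, kernels: np.ndarray, d: int) -> np.ndarray:
    """Depthwise 3x3 cross-correlation with dilation d; zero padding d keeps H and W.

    x: (C, H, W) feature map; kernels: (C, 3, 3), one kernel per channel.
    """
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) samples the input at offset ((i - 1) * d, (j - 1) * d).
            out += kernels[:, i, j][:, None, None] * xp[:, i * d:i * d + h, j * d:j * d + w]
    return out


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def sawa_forward(x, branch_kernels, dilations, fc_w, fc_b, proj_w, eps=1e-5):
    """SAWA fusion sketch.

    branch_kernels: list of K (C, 3, 3) arrays; dilations: K hypothetical rates.
    fc_w: (C,), fc_b: scalar -> shared FC layer giving one score per branch.
    proj_w: (C_out, C) -> 1x1 convolution that compresses the channel count.
    """
    feats = np.stack([depthwise_dilated_conv3x3(x, k, d)
                      for k, d in zip(branch_kernels, dilations)])  # (K, C, H, W)
    desc = feats.mean(axis=(2, 3))                  # global average pooling -> (K, C)
    scores = desc @ fc_w + fc_b                     # shared FC layer -> (K,) importance scores
    weights = softmax(scores)                       # adaptive branch weights, sum to 1
    fused = np.tensordot(weights, feats, axes=1)    # weighted sum of branches -> (C, H, W)
    out = np.einsum("oc,chw->ohw", proj_w, fused)   # 1x1 convolution
    # Batch normalization with batch size 1 (training-mode statistics, no affine terms).
    mu = out.mean(axis=(1, 2), keepdims=True)
    var = out.var(axis=(1, 2), keepdims=True)
    out = (out - mu) / np.sqrt(var + eps)
    return np.maximum(out, 0.0), weights            # ReLU as the nonlinear activation
```

Because the Softmax runs across branches rather than channels, the fused map is a convex combination of receptive fields; the stated intent of choosing dilation rates from grain-size statistics is that small and large rates cover sand-scale and ballast-scale context respectively.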

[0033] Preferably, the Statistical Adaptive Convolutional Attention (SACA) module enhances the input features through sequentially executed channel adaptive submodules and spatial adaptive submodules; wherein,

[0034] The execution flow of the channel adaptive submodule includes:

[0035] For the input feature map, the global mean, global range (maximum minus minimum), and global standard deviation are computed simultaneously over its spatial dimensions to obtain three independent statistical description vectors;

[0036] The three statistical description vectors are stacked along the channel dimension to form a comprehensive statistical feature tensor;

[0037] The comprehensive statistical feature tensor is fused through a 1×1 convolutional layer, and then the channel attention weight vector is generated by the Sigmoid activation function.

[0038] The generated channel attention weight vector is multiplied with the original input feature map channel by channel to achieve feature recalibration in the channel dimension;

[0039] The execution flow of the spatial adaptive submodule includes:

[0040] For the recalibrated feature map, the mean, range (maximum minus minimum), and standard deviation across all channels are computed simultaneously along its channel dimension to generate three spatial feature maps;

[0041] The three spatial feature maps are concatenated along the channel dimension;

[0042] The concatenated features are then fused with spatial context information through a 7×7 convolutional layer, and then activated by a Sigmoid activation function to generate a spatial attention weight matrix.

[0043] The generated spatial attention weight matrix is multiplied pixel by pixel with the channel-recalibrated feature map to highlight features at key spatial locations.
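A minimal NumPy sketch of the two SACA submodules, applied to one (C, H, W) feature map, is given below. Assumptions not fixed by the source: "stacked along the channel dimension" is read as concatenating the three C-dimensional statistic vectors into a 3C-dimensional vector that a 1×1 convolution (here a (C, 3C) matrix) maps back to C channel weights; the 7×7 convolution has a single output channel with zero padding; all parameter values are placeholders for learned weights.

```python
import numpy as np

# Sketch only: the 3C -> C reading of the channel fusion and all weights are assumptions.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_adaptive(x, w_ch, b_ch):
    """x: (C, H, W). w_ch: (C, 3C), b_ch: (C,) -> 1x1 conv over the stacked statistics."""
    mean = x.mean(axis=(1, 2))
    value_range = x.max(axis=(1, 2)) - x.min(axis=(1, 2))  # global range (max - min)
    std = x.std(axis=(1, 2))
    stats = np.concatenate([mean, value_range, std])        # stacked along channels -> (3C,)
    attn = sigmoid(w_ch @ stats + b_ch)                     # channel attention weights (C,)
    return x * attn[:, None, None]                          # channel-wise recalibration


def conv7x7_same(maps, kernel, bias):
    """maps: (3, H, W), kernel: (3, 7, 7) -> (H, W); zero padding 3 keeps the size."""
    _, h, w = maps.shape
    mp = np.pad(maps, ((0, 0), (3, 3), (3, 3)))
    out = np.full((h, w), float(bias))
    for i in range(7):
        for j in range(7):
            out += np.tensordot(kernel[:, i, j], mp[:, i:i + h, j:j + w], axes=1)
    return out


def spatial_adaptive(x, kernel, bias):
    """Statistics across channels -> 7x7 conv -> sigmoid -> pixel-wise reweighting."""
    maps = np.stack([x.mean(axis=0), x.max(axis=0) - x.min(axis=0), x.std(axis=0)])
    attn = sigmoid(conv7x7_same(maps, kernel, bias))        # spatial attention matrix (H, W)
    return x * attn[None, :, :]


def saca(x, w_ch, b_ch, kernel, bias):
    # Channel submodule first, then the spatial submodule on the recalibrated map.
    return spatial_adaptive(channel_adaptive(x, w_ch, b_ch), kernel, bias)
```

With all weights and biases set to zero, both attention maps equal sigmoid(0) = 0.5, so the output is exactly 0.25 times the input, which makes a convenient sanity check. Using range and standard deviation alongside the mean is what distinguishes this from attention schemes that pool only by mean or max.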

[0044] Preferably, the training process of the ASBE-Net semantic segmentation model adopts a weighted cross-entropy (WCE) + Dice composite loss, with the weights set according to the pixel ratio of each category, to improve the detection rate and boundary consistency of small target classes such as sand grains. The expression is as follows:

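[0045] $$\mathcal{L}_{\mathrm{WCE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c} \log p_{i,c} \qquad (1)$$

[0046] $$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{1}{C}\sum_{c=1}^{C} \frac{2\sum_{i=1}^{N} y_{i,c}\, p_{i,c} + \epsilon}{\sum_{i=1}^{N} y_{i,c} + \sum_{i=1}^{N} p_{i,c} + \epsilon} \qquad (2)$$

[0047] $$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{WCE}} + \lambda_2 \mathcal{L}_{\mathrm{Dice}} \qquad (3)$$

[0048] In the formulas, $N$ is the total number of pixels in the image; $C$ is the total number of categories; $y_{i,c}$ is the true label of pixel $i$ (1 if pixel $i$ belongs to category $c$, otherwise 0); $p_{i,c}$ is the probability predicted by the model that pixel $i$ belongs to category $c$; $w_c$ is the weight of category $c$; $\epsilon$ is the smoothing term; and $\lambda_1$, $\lambda_2$ are the loss weighting coefficients.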
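The composite loss of Eqs. (1) to (3) can be sketched in NumPy over N flattened pixels. This assumes the standard weighted cross-entropy and soft Dice forms; the class-weight rule (inverse pixel frequency, rescaled to mean 1) is one hypothetical reading of "weights set according to the pixel ratio of each category", and the defaults λ1 = λ2 = 1 and ε = 1 are placeholders.

```python
import numpy as np


def class_weights_from_pixel_ratio(labels, num_classes, eps=1e-6):
    """Hypothetical weighting: inverse pixel frequency, rescaled so the weights average to 1."""
    freq = np.bincount(labels.ravel(), minlength=num_classes) / labels.size
    w = 1.0 / (freq + eps)
    return w * num_classes / w.sum()


def wce_dice_loss(probs, labels, class_w, lam1=1.0, lam2=1.0, smooth=1.0):
    """probs: (N, C) per-pixel softmax probabilities p_{i,c}; labels: (N,) integer classes."""
    n, c = probs.shape
    y = np.eye(c)[labels]                                    # one-hot y_{i,c}
    log_p = np.log(np.clip(probs, 1e-12, 1.0))               # clipping avoids log(0)
    l_wce = -np.sum(class_w[None, :] * y * log_p) / n        # Eq. (1)
    inter = (y * probs).sum(axis=0)                          # per-class overlap
    denom = y.sum(axis=0) + probs.sum(axis=0)
    l_dice = 1.0 - np.mean((2.0 * inter + smooth) / (denom + smooth))  # Eq. (2)
    return lam1 * l_wce + lam2 * l_dice                      # Eq. (3)
```

The cross-entropy term is pixel-wise and, through the class weights, raises the penalty on rare sand pixels; the Dice term is computed per class over the whole image, so it rewards region overlap and boundary agreement independently of how few pixels a class occupies.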

[0049] Preferably, the backbone of the ASBE-Net semantic segmentation model is pre-trained on ImageNet, and input images are normalized with the ImageNet mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. Training uses the Adam optimizer with a cosine annealing learning-rate schedule.
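The input normalization and learning-rate schedule of [0049] can be sketched in NumPy as below, together with a single Adam update for reference. In a PyTorch training loop the same roles are typically filled by `torch.optim.Adam` and `torch.optim.lr_scheduler.CosineAnnealingLR`; the learning-rate bounds and step counts used here are hypothetical, since the source does not give them.

```python
import math

import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize_imagenet(img_uint8):
    """H x W x 3 uint8 RGB image -> float array normalized with ImageNet statistics."""
    x = img_uint8.astype(np.float64) / 255.0    # scale to [0, 1] first
    return (x - IMAGENET_MEAN) / IMAGENET_STD     # broadcasts over the last (RGB) axis


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing from lr_max at step 0 down to lr_min at total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction (t starts at 1); returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

In use, each step would compute `lr = cosine_annealing_lr(step, total_steps, lr_max)` and pass it to `adam_step(..., t=step + 1, lr=lr)`; the schedule decays smoothly rather than in discrete drops, which tends to suit fine-tuning a pre-trained backbone.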

[0050] Preferably, in steps S1 to S3, quality control of the acquired images includes motion blur detection based on a gradient energy threshold, overexposure/underexposure detection and contrast evaluation, color constancy correction, and automatic ROI cropping assisted by rail and sleeper positioning.
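The blur, exposure, and contrast checks of [0050] could look like the following sketch on an 8-bit grayscale frame. Gradient energy is taken here as the mean squared `np.gradient` magnitude, and every threshold (`blur_thr`, `clip_frac`, `contrast_thr`, and the 250/5 saturation levels) is a hypothetical placeholder; the source specifies the criteria but not their values. Color constancy and ROI cropping are not shown.

```python
import numpy as np

# Sketch only: all thresholds below are hypothetical placeholders.


def gradient_energy(gray):
    """Mean squared gradient magnitude; low values suggest motion blur or defocus."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))


def quality_flags(gray, blur_thr=20.0, clip_frac=0.05, contrast_thr=10.0):
    """Return quality flags for an 8-bit grayscale frame."""
    g = gray.astype(np.float64)
    return {
        "blurred": gradient_energy(g) < blur_thr,
        "overexposed": float(np.mean(g >= 250)) > clip_frac,   # too many saturated pixels
        "underexposed": float(np.mean(g <= 5)) > clip_frac,    # too many crushed blacks
        "low_contrast": float(g.std()) < contrast_thr,        # RMS contrast
    }


def passes_quality(gray, **thresholds):
    """True only if no quality flag is raised."""
    return not any(quality_flags(gray, **thresholds).values())
```

A frame failing `passes_quality` would be a candidate for the automatic resampling described in S6; in practice the thresholds would be tuned on frames from the actual camera array.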

[0051] Preferably, between steps S4 and S5, that is, after obtaining the mask in S4 and before calculating the apparent contamination rate in S5, morphological post-processing of small connected component removal and hole filling is performed on the mask to further refine the boundaries.
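Small-component removal and hole filling on one class's binary mask (for example, the sand mask) can be sketched with a plain breadth-first search, as below. The 4-connectivity, the `min_size` value, and applying the operations per class are assumptions; the source does not state how refined per-class masks are merged back into the three-class mask. SciPy's `scipy.ndimage.label` and `scipy.ndimage.binary_fill_holes` provide equivalent, faster operations.

```python
from collections import deque

import numpy as np


def _components(binary):
    """Return the 4-connected components of a boolean array as lists of (row, col)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    comps = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                seen[sy, sx] = True
                queue, comp = deque([(sy, sx)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def remove_small_components(binary, min_size):
    """Clear foreground components smaller than min_size pixels (likely spurious)."""
    binary = np.asarray(binary, dtype=bool)
    out = binary.copy()
    for comp in _components(binary):
        if len(comp) < min_size:
            ys, xs = zip(*comp)
            out[list(ys), list(xs)] = False
    return out


def fill_holes(binary):
    """Background regions that do not touch the image border are holes and are filled."""
    binary = np.asarray(binary, dtype=bool)
    h, w = binary.shape
    out = binary.copy()
    for comp in _components(~binary):
        if not any(y in (0, h - 1) or x in (0, w - 1) for y, x in comp):
            ys, xs = zip(*comp)
            out[list(ys), list(xs)] = True
    return out
```

Running these before the pixel count matters because isolated misclassified specks and interior holes bias the sand and ballast totals directly.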

[0052] Preferably, in S6, the calibration samples for establishing the relationship between the apparent contamination rate and the physical sand content are obtained by synchronized imaging and physical sampling: at each image acquisition point, the track bed in the target area is sieved and weighed according to industry specifications, the physical sand content is calculated as sand mass / (sand mass + ballast mass), and the fitting error is evaluated by five-fold cross-validation.

[0053] Preferably, in S6, the mapping between the apparent contamination rate and the physical sand content is a power function model determined by nonlinear regression fitting, with the expression:

[0054] $$S = a\,R^{b} \qquad (4)$$

[0055] In the formula, $S$ is the physical sand content, $R$ is the apparent contamination rate, and $a$ and $b$ are model parameters obtained by nonlinear least-squares fitting.
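The calibration of [0052] to [0055] can be sketched as a nonlinear least-squares fit of S = a·R^b (S the physical sand content, R the apparent contamination rate) with five-fold cross-validation. This sketch uses Gauss-Newton iterations initialized from a log-log linear fit, which assumes every calibration point has R > 0 and S > 0; `scipy.optimize.curve_fit` would be a common alternative. The RMSE metric and the shuffling seed are assumptions, and no real calibration data are used.

```python
import numpy as np


def physical_sand_content(sand_mass, ballast_mass):
    """Sieving result: sand mass / (sand mass + ballast mass)."""
    return sand_mass / (sand_mass + ballast_mass)


def fit_power_law(r, s, iters=50):
    """Nonlinear least squares for s = a * r**b by Gauss-Newton (requires r, s > 0)."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    b, log_a = np.polyfit(np.log(r), np.log(s), 1)   # initial guess: ln s = ln a + b ln r
    a = np.exp(log_a)
    for _ in range(iters):
        pred = a * r ** b
        resid = s - pred
        jac = np.column_stack([r ** b, pred * np.log(r)])  # d pred / d(a, b)
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < 1e-12:
            break
    return a, b


def five_fold_rmse(r, s, k=5, seed=0):
    """Mean held-out RMSE over k shuffled folds."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(r))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        a, b = fit_power_law(r[train], s[train])
        errors.append(np.sqrt(np.mean((a * r[fold] ** b - s[fold]) ** 2)))
    return float(np.mean(errors))
```

Fitting in the original (not log) space, as Gauss-Newton does here, weights errors in sand content directly, which is what the inversion ultimately reports; the log-log fit only supplies a starting point.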

[0056] The beneficial effects of this invention are as follows:

[0057] (1) It accurately segments large, irregular ballast and small, densely distributed sand particles at the same time, adapting to the large difference in target scale between the two. This overcomes the difficulty general models have in maintaining segmentation accuracy across target scales, achieves accurate pixel-level classification of multi-scale targets in wind-blown sand track bed scenes, and ensures the accuracy of the base data for the subsequent sand content calculation.

[0058] (2) It adapts flexibly to different lighting conditions, ballast types, and sand particle characteristics, giving it good versatility and scalability.

Attached Figure Description

[0059] Figure 1 is a flowchart of the method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation provided in an embodiment of the present invention;

[0060] Figure 2 is a diagram of the ASBE-Net deep learning network architecture provided in an embodiment of the present invention;

[0061] Figure 3 is a structural diagram of the SAWA module provided in an embodiment of the present invention;

[0062] Figure 4 is a structural diagram of the SACA module provided in an embodiment of the present invention;

[0063] Figure 5 shows annotation examples and segmentation visualization results provided in an embodiment of the present invention;

[0064] Figure 6 is a fitting curve of the apparent contamination rate versus the physical sand content provided in an embodiment of the present invention;

[0065] Figure 7 is a schematic diagram of vehicle deployment provided in an embodiment of the present invention;

[0066] Figure 8 is a flowchart of image quality control and abnormal-image resampling provided in an embodiment of the present invention.

Detailed Implementation

[0067] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. The embodiments of the present invention are not limited thereto.

[0068] Example 1

[0069] Referring to Figure 1, the present invention provides a method for detecting the sand content of wind-blown sand track beds based on adaptive semantic segmentation, comprising the following steps:

[0070] 1) Digital images of the ballast track bed surface under windy and sandy conditions are acquired by a camera array installed at the bottom of the track inspection vehicle; the acquired digital images are then processed for noise reduction and white balance; perspective correction is performed on the processed images, and the region of interest (ROI) of the specified ballast track bed is extracted based on the corrected images.
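The preprocessing chain in step 1) can be sketched in NumPy as below: a 3×3 median filter for denoising, gray-world white balance, and a four-point homography for perspective correction with nearest-neighbour resampling. These particular techniques are stand-ins chosen for illustration; the source does not name its denoising, white-balance, or correction algorithms. The four source corners would come from rail and sleeper positioning and are hypothetical here. OpenCV's `cv2.medianBlur`, `cv2.getPerspectiveTransform`, and `cv2.warpPerspective` offer production equivalents.

```python
import numpy as np

# Sketch only: median filter, gray-world and nearest-neighbour warping are assumed stand-ins.


def median3x3(img):
    """3x3 median filter on each channel of an (H, W, C) image, edge-replicated borders."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifted), axis=0)


def gray_world(img):
    """Gray-world white balance: scale channels so their means all equal the global mean."""
    img = img.astype(np.float64)
    means = img.reshape(-1, img.shape[2]).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)
    return np.clip(img * gains, 0.0, 255.0)


def homography(src, dst):
    """Exact 4-point DLT: H (with H[2, 2] = 1) mapping (x, y) points in src onto dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def map_points(H, pts):
    """Apply H to (x, y) points in homogeneous coordinates."""
    p = np.column_stack([np.asarray(pts, dtype=float), np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]


def warp_nearest(img, H, out_h, out_w):
    """Inverse-map every output pixel through H^-1 and sample the source by nearest neighbour."""
    vv, uu = np.mgrid[0:out_h, 0:out_w]
    src = map_points(np.linalg.inv(H), np.column_stack([uu.ravel(), vv.ravel()]))
    sx = np.rint(src[:, 0]).astype(int)
    sy = np.rint(src[:, 1]).astype(int)
    ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((out_h * out_w,) + img.shape[2:], dtype=img.dtype)
    out[ok] = img[sy[ok], sx[ok]]
    return out.reshape((out_h, out_w) + img.shape[2:])
```

For example, `H = homography(corners, [(0, 0), (511, 0), (511, 511), (0, 511)])` followed by `warp_nearest(img, H, 512, 512)` would produce a 512×512 top-down ROI; the corner list and output size are illustrative. Mapping to a fixed rectangle also gives every frame the same pixel footprint, which keeps pixel-count ratios comparable between frames.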

[0071] 2) Input the image into the pre-trained ASBE-Net semantic segmentation deep learning model to obtain a pixel-level classification mask that distinguishes the ballast area, the sand area, and the background area. Based on the classification mask, count the total number of pixels in the sand area and in the ballast area, and calculate the apparent contamination rate as: apparent contamination rate = total sand-area pixels / (total sand-area pixels + total ballast-area pixels).
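Counting pixels for the apparent contamination rate is a direct computation on the mask. A minimal sketch, assuming hypothetical integer labels 0 (background), 1 (ballast), and 2 (sand):

```python
import numpy as np

# Hypothetical label values; the source names the three classes but not their integer ids.
BACKGROUND, BALLAST, SAND = 0, 1, 2


def apparent_contamination_rate(mask):
    """Sand pixels / (sand pixels + ballast pixels); background pixels are ignored."""
    sand = int(np.count_nonzero(mask == SAND))
    ballast = int(np.count_nonzero(mask == BALLAST))
    if sand + ballast == 0:
        raise ValueError("mask contains neither sand nor ballast pixels")
    return sand / (sand + ballast)
```

For the 2×3 mask `[[0, 1, 1], [2, 2, 1]]`, two sand and three ballast pixels give 2 / (2 + 3) = 0.4; the background pixel is excluded from both counts, so sleepers, rails, or other background content in the ROI do not dilute the rate.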

[0072] 3) Substitute the calculated apparent contamination rate into the nonlinear power regression model established based on the field calibration data to obtain the physical sand content of the track bed; wherein, the mapping relationship model is a nonlinear regression model between the apparent contamination rate and the physical sand content; output segmentation visualization, apparent contamination rate and physical sand content results, and trigger automatic resampling when the confidence threshold or image quality threshold is not met.
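The resampling trigger can be sketched as a simple rule combining an image-quality flag with a segmentation-confidence score. The confidence measure used here (the mean over pixels of the highest class probability) and the 0.8 threshold are assumptions; the source states only that resampling occurs when confidence or image quality falls below its threshold.

```python
import numpy as np

# Sketch only: the confidence definition and the 0.8 threshold are hypothetical.


def mean_max_confidence(probs):
    """probs: (C, H, W) softmax output; mean over pixels of the top-class probability."""
    return float(probs.max(axis=0).mean())


def should_resample(probs, image_quality_ok, conf_threshold=0.8):
    """Trigger resampling if image quality failed or segmentation confidence is too low."""
    return (not image_quality_ok) or mean_max_confidence(probs) < conf_threshold
```

Checking quality first avoids trusting a confident-looking segmentation of a blurred or saturated frame.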

[0073] This invention implements the above method with a sand content detection system for wind-blown sand track beds, comprising an image acquisition module, a data processing module, and a result output module. The data processing module runs software for ASBE-Net segmentation, apparent contamination rate calculation, and physical sand content inversion. The image acquisition module has a sensor resolution of at least 1920×1080 and a frame rate of at least 10 fps, and automatically adjusts exposure and supplementary lighting brightness according to ambient light. The output module further generates a sand content level map and remediation priority suggestions, and can upload the results to the management system in real time via a network.

[0074] As shown in Figure 2, the ASBE-Net semantic segmentation deep learning model adopts an encoder-decoder architecture comprising: an encoder network serving as the feature extractor; a feature decoder network that progressively upsamples the deep features output by the encoder to restore spatial resolution and output the segmentation result; a scale-adaptive weighted aggregation (SAWA) module connected between the encoder output and the decoder input; and a statistical adaptive convolutional attention (SACA) module embedded in the encoder network. Multi-scale feature fusion is achieved through skip connections between encoder and decoder layers of the same resolution.

[0075] As shown in Figure 3, the SAWA module achieves adaptive fusion of multi-scale features through the following sub-steps: receiving the feature map output from the final stage of the encoder; the SAWA module contains K parallel dilated convolution branches, with the combination of dilation rates determined from the size statistics of ballast and sand grains; feeding the feature map in parallel into the K dilated convolution branches with different dilation rates to obtain K sets of feature maps with different receptive fields; performing global average pooling on each set of feature maps to obtain K global branch description vectors; feeding the K description vectors into a shared fully connected layer for linear transformation to obtain K branch importance scores; applying Softmax normalization to the K importance scores to obtain one adaptive weight per branch, with the K weights summing to 1; using the adaptive weights to compute a weighted sum of the corresponding feature maps to obtain the fused feature map; and applying 1×1 convolution, batch normalization, and nonlinear activation to the fused feature map to compress the number of channels and enhance nonlinearity, before outputting it to the decoder.

[0076] As shown in Figure 4, the SACA module enhances the input features through a channel adaptive submodule and a spatial adaptive submodule executed in sequence.

[0077] 1) The execution flow of the channel adaptive submodule is as follows: for the input feature map, the global mean, global range (maximum minus minimum), and global standard deviation are computed over its spatial dimensions to obtain three independent statistical description vectors; the three vectors are stacked along the channel dimension to form a combined statistical feature tensor; this tensor is fused through a 1×1 convolutional layer, and a Sigmoid activation then generates the channel attention weight vector; finally, the channel attention weight vector is multiplied channel by channel with the original input feature map to recalibrate features along the channel dimension.

[0078] 2) The execution flow of the spatial adaptive submodule is as follows: for the recalibrated feature map, the mean, range (maximum minus minimum), and standard deviation across all channels are computed at each spatial position to generate three spatial feature maps; the three maps are concatenated along the channel dimension; a 7×7 convolutional layer fuses spatial context information from the concatenated features, and a Sigmoid activation then generates the spatial attention weight matrix; finally, the spatial attention weight matrix is multiplied pixel by pixel with the channel-recalibrated feature map to highlight features at key spatial locations.

[0079] The ASBE-Net model training process employs a weighted cross-entropy (WCE) + Dice composite loss, with weights set according to the pixel ratio of each class to improve the detection rate and boundary consistency of small target classes such as sand grains. Its expression is as follows:

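[0080] $$\mathcal{L}_{\mathrm{WCE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c} \log p_{i,c} \qquad (1)$$

[0081] $$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{1}{C}\sum_{c=1}^{C} \frac{2\sum_{i=1}^{N} y_{i,c}\, p_{i,c} + \epsilon}{\sum_{i=1}^{N} y_{i,c} + \sum_{i=1}^{N} p_{i,c} + \epsilon} \qquad (2)$$

[0082] $$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{WCE}} + \lambda_2 \mathcal{L}_{\mathrm{Dice}} \qquad (3)$$

[0083] In the formulas, $N$ is the total number of pixels in the image; $C$ is the total number of categories; $y_{i,c}$ is the true label of pixel $i$ (1 if pixel $i$ belongs to category $c$, otherwise 0); $p_{i,c}$ is the probability predicted by the model that pixel $i$ belongs to category $c$; $w_c$ is the weight of category $c$; $\epsilon$ is the smoothing term; and $\lambda_1$, $\lambda_2$ are the loss weighting coefficients.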

[0084] The backbone is pre-trained using ImageNet. Input images are normalized using ImageNet mean [0.485, 0.456, 0.406] and standard deviation (std) [0.229, 0.224, 0.225]. Training employs the Adam optimizer, combined with a cosine annealing strategy to adjust the learning rate.

[0085] As shown in Figure 5, quality control of the images acquired by the image acquisition module includes motion blur detection based on a gradient energy threshold, overexposure/underexposure detection and contrast evaluation, color constancy correction, and automatic ROI cropping assisted by rail and sleeper positioning; the images are then manually annotated.

[0086] In the embodiments provided by the present invention, before calculating the apparent contamination rate, morphological post-processing is performed on the mask to remove small connected components and fill holes, further refining the boundaries.

[0087] As shown in Figure 6, the mapping between the apparent contamination rate and the physical sand content is a power function model determined by nonlinear regression fitting, with the expression:

[0088] $$S = a\,R^{b} \qquad (4)$$

[0089] In the formula, $S$ is the physical sand content, $R$ is the apparent contamination rate, and $a$ and $b$ are model parameters obtained by nonlinear least-squares fitting.

[0090] The calibration samples for establishing the relationship between the apparent contamination rate and the physical sand content were obtained by synchronized imaging and physical sampling: at each image acquisition point, the track bed in the target area was sieved and weighed according to industry specifications, the physical sand content was calculated as sand mass / (sand mass + ballast mass), and the fitting error was evaluated by five-fold cross-validation.

[0091] Example 2

[0092] An electronic device is provided, comprising a processor, a memory, and a program stored in the memory, together with a computer-readable storage medium storing the program. When executed by the processor, the program implements the method for detecting the sand content of wind-blown sand track beds based on adaptive semantic segmentation.

[0093] It will be understood by those skilled in the art that the accompanying drawings are merely schematic diagrams of one embodiment. The modules or processes shown in the drawings are not necessarily essential for implementing the present invention.

[0094] As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of various embodiments or some parts of the embodiments of the present invention.

[0095] The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are basically similar to the method embodiments and are therefore described briefly; for the relevant parts, refer to the description of the method embodiments. The apparatus and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment. Those skilled in the art can understand and implement this without creative effort.

[0096] The above description is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

[0097] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A method for detecting sand content in wind-blown roadbeds based on adaptive semantic segmentation, characterized in that it comprises the following steps: S1. acquiring digital images of the ballast track bed surface in a sandy environment using a camera array installed at the bottom of a track inspection vehicle; S2. sequentially applying denoising and white balance to the acquired digital images; S3. performing perspective correction on the processed images and extracting the region of interest of the specified ballast track bed from the corrected images; S4. inputting the images processed in S3 into a pre-trained encoder-decoder ASBE-Net semantic segmentation model to obtain a pixel-level classification mask that distinguishes the ballast area, the sand area, and the background area; S5. counting, from the classification mask, the total number of pixels in the sand area and in the ballast area, and calculating the apparent contamination rate as: apparent contamination rate = total sand-area pixels / (total sand-area pixels + total ballast-area pixels); S6. substituting the calculated apparent contamination rate into a nonlinear power regression model established from field calibration data to obtain the physical sand content of the track bed, the nonlinear regression model serving as the mapping between the apparent contamination rate and the physical sand content, wherein automatic resampling is triggered when the segmentation confidence is below a threshold or the image quality does not meet the standard; S7. outputting the segmentation visualization, apparent contamination rate, and physical sand content results; wherein S1 is implemented by an image acquisition module, S2-S6 by a data processing module, and S7 by a result output module. 
The image acquisition module has a sensor resolution of no less than 1920×1080 and a frame rate of no less than 10 fps, and automatically adjusts exposure and supplementary lighting brightness according to ambient light. The data processing module is configured to run software for ASBE-Net segmentation, apparent contamination rate calculation, and physical sand content inversion. The result output module generates sand content level maps and remediation priority suggestions, and can upload the results to the management system in real time via the network. In S4, the ASBE-Net semantic segmentation deep learning model adopts an encoder-decoder architecture comprising: an encoder network serving as the feature extractor; a feature decoder network that progressively upsamples the deep features output by the encoder, restores the spatial resolution, and outputs the segmentation result; a scale-adaptive weighted aggregation (SAWA) module connected between the encoder output and the decoder input; and a statistical adaptive convolutional attention (SACA) module embedded in the encoder network, wherein multi-scale feature fusion is achieved through skip connections between encoder and decoder layers of the same resolution. The SAWA module achieves adaptive fusion of multi-scale features through the following sub-steps: receiving the feature map output from the final stage of the encoder; the SAWA module contains K parallel dilated convolution branches, with the combination of dilation rates determined from the size statistics of ballast and sand particles; the feature map is fed in parallel into the K dilated convolution branches with different dilation rates to obtain K sets of feature maps with different receptive fields. 
Global average pooling is applied to each of the K feature maps to obtain K branch global description vectors; the K description vectors are fed into a shared fully connected layer and linearly transformed into K branch importance scores; the K importance scores are normalized with Softmax to obtain a set of adaptive branch weights summing to 1; the corresponding feature maps are weighted by these adaptive weights and summed to obtain a fused feature map; and the fused feature map passes through a 1×1 convolution, batch normalization, and a nonlinear activation, which compress the number of channels and enhance nonlinearity, before being output to the decoder. The SACA module enhances the input features through a channel-adaptive submodule followed by a spatial-adaptive submodule. The channel-adaptive submodule operates as follows: for the input feature map, the global mean, global range (maximum minus minimum), and global standard deviation are computed over the spatial dimensions, yielding three independent statistical description vectors; the three vectors are stacked along the channel dimension to form a combined statistical feature tensor; and this tensor is fused by a 1×1 convolutional layer and passed through a Sigmoid activation to generate the channel attention weight vector.
The channel attention weight vector is multiplied channel by channel with the original input feature map, recalibrating the features along the channel dimension. The spatial-adaptive submodule operates as follows: for the recalibrated feature map, the mean, range, and standard deviation across all channels are computed at each spatial position, yielding three spatial feature maps; the three maps are concatenated along the channel dimension; the concatenated features pass through a 7×7 convolutional layer that aggregates spatial context, followed by a Sigmoid activation, to generate the spatial attention weight matrix; and this matrix is multiplied pixel by pixel with the channel-recalibrated feature map to emphasize features at key spatial locations.
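The SAWA fusion sub-steps and the two SACA submodules of claim 1 can be illustrated on tiny feature maps stored as nested lists (C x H x W). This is a hedged sketch, not the ASBE-Net code: the dilated convolution branches and SAWA's trailing 1×1 convolution, batch normalization, and activation are omitted; the shared fully connected layer is read as one weight vector mapping each branch's C-dimensional descriptor to a scalar score; population standard deviation is assumed; all weights are placeholders that would be learned in practice; and every function name is hypothetical.

```python
import math
from typing import List

Map = List[List[float]]   # H x W
Feat = List[Map]          # C x H x W (list of channels)


def _stats(values: List[float]) -> List[float]:
    """Mean, range (max - min) and population standard deviation of a flat list."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [m, max(values) - min(values), sd]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sawa_fuse(branches: List[Feat], fc_w: List[float], fc_b: float = 0.0) -> Feat:
    """SAWA fusion: GAP per branch -> shared FC (C -> 1) -> softmax over K -> weighted sum."""
    gap = [[sum(v for row in ch for v in row) / (len(ch) * len(ch[0])) for ch in br]
           for br in branches]                          # K descriptor vectors of length C
    scores = [sum(wt * g for wt, g in zip(fc_w, vec)) + fc_b for vec in gap]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]          # numerically stable softmax
    alphas = [e / sum(exps) for e in exps]              # adaptive weights summing to 1
    c, h, wd = len(branches[0]), len(branches[0][0]), len(branches[0][0][0])
    return [[[sum(al * br[k][i][j] for al, br in zip(alphas, branches))
              for j in range(wd)] for i in range(h)] for k in range(c)]


def saca_channel(x: Feat, w: List[List[float]], bias: List[float]) -> Feat:
    """Channel submodule: per-channel spatial mean/range/std -> 1x1 conv -> sigmoid -> rescale.

    The stacked statistics form a 3C x 1 x 1 tensor, so the 1x1 conv reduces to a
    C x 3C weight matrix `w` plus `bias` (placeholders; learned in practice).
    """
    stats = [_stats([v for row in ch for v in row]) for ch in x]
    stacked = [s[k] for k in range(3) for s in stats]   # all means, then ranges, then stds
    att = [_sigmoid(sum(wi * si for wi, si in zip(w[c], stacked)) + bias[c])
           for c in range(len(x))]
    return [[[v * att[c] for v in row] for row in ch] for c, ch in enumerate(x)]


def saca_spatial(x: Feat, kernel: List[Map], bias: float = 0.0) -> Feat:
    """Spatial submodule: per-pixel mean/range/std across channels -> 7x7 conv (3 -> 1)
    with zero padding -> sigmoid -> pixel-wise rescale."""
    h, wd = len(x[0]), len(x[0][0])
    px = [[_stats([ch[i][j] for ch in x]) for j in range(wd)] for i in range(h)]
    r = len(kernel[0]) // 2                             # r = 3 for a 7x7 kernel
    out = [[[0.0] * wd for _ in range(h)] for _ in x]
    for i in range(h):
        for j in range(wd):
            acc = bias
            for k in range(3):                          # the three statistic maps
                for di in range(-r, r + 1):
                    for dj in range(-r, r + 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < wd:
                            acc += kernel[k][di + r][dj + r] * px[ii][jj][k]
            a = _sigmoid(acc)
            for c, ch in enumerate(x):
                out[c][i][j] = ch[i][j] * a
    return out


def saca(x: Feat, w: List[List[float]], bias: List[float], kernel: List[Map]) -> Feat:
    """SACA = channel submodule followed by spatial submodule."""
    return saca_spatial(saca_channel(x, w, bias), kernel)


# Toy example: C = 2, H = W = 2, all-zero placeholder weights (every attention weight = 0.5).
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
y = saca(x, [[0.0] * 6 for _ in range(2)], [0.0, 0.0],
         [[[0.0] * 7 for _ in range(7)] for _ in range(3)])
fused = sawa_fuse([x, x], fc_w=[1.0, -1.0])   # identical branches -> equal weights
```

Because the Softmax weights sum to 1, identical branches reproduce the input exactly; with distinct branches, the branch whose pooled descriptor scores higher dominates the fused map.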

2. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that the ASBE-Net semantic segmentation model is trained with a composite loss of weighted cross-entropy (WCE) and Dice loss, where the class weights are set according to the pixel proportion of each class to improve the detection rate and boundary consistency of small-target classes such as sand grains. The loss is expressed as:

$$L_{\mathrm{WCE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c}\,\log p_{i,c} \tag{1}$$

$$L_{\mathrm{Dice}} = 1 - \frac{1}{C}\sum_{c=1}^{C}\frac{2\sum_{i=1}^{N} y_{i,c}\,p_{i,c} + \varepsilon}{\sum_{i=1}^{N} y_{i,c} + \sum_{i=1}^{N} p_{i,c} + \varepsilon} \tag{2}$$

$$L = L_{\mathrm{WCE}} + \lambda\, L_{\mathrm{Dice}} \tag{3}$$

where $N$ is the total number of pixels in the image, $C$ is the total number of classes, $y_{i,c}$ is the true label of pixel $i$ (1 if pixel $i$ belongs to class $c$, otherwise 0), $p_{i,c}$ is the predicted probability that pixel $i$ belongs to class $c$, $w_c$ is the weight of class $c$, $\varepsilon$ is the smoothing term, and $\lambda$ is the loss weighting coefficient.
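The WCE + Dice composite loss of claim 2 can be checked numerically with a small pure-Python sketch over flattened pixels. The inverse-frequency rule w_c = N / (C · n_c) is an assumption (the claim only says the weights follow each class's pixel proportion), as are the defaults `eps = 1.0` and `lam = 1.0`; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed inverse pixel-frequency weights w_c = N / (C * n_c); absent classes get 0."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * k) if k else 0.0 for k in counts]


def wce_dice_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
                  weights: Sequence[float], lam: float = 1.0, eps: float = 1.0) -> float:
    """L = L_WCE + lam * L_Dice over N flattened pixels.

    probs[i][c] is the predicted probability p_{i,c}; labels[i] is the true class of pixel i.
    """
    n, num_classes = len(labels), len(weights)
    # WCE: y_{i,c} is one-hot, so only the true class's log-probability contributes.
    wce = -sum(weights[y] * math.log(max(probs[i][y], 1e-12))
               for i, y in enumerate(labels)) / n
    # Dice: soft Dice coefficient per class with smoothing eps, averaged over classes.
    dice = 0.0
    for c in range(num_classes):
        inter = sum(probs[i][c] for i, y in enumerate(labels) if y == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for y in labels if y == c)
        dice += (2.0 * inter + eps) / (true_sum + pred_sum + eps)
    return wce + lam * (1.0 - dice / num_classes)


labels = [0, 1, 2, 1]                       # 0 background, 1 ballast, 2 sand (hypothetical)
w = class_weights(labels, 3)                # rarer classes receive larger weights
perfect = [[1.0 if c == y else 0.0 for c in range(3)] for y in labels]
uniform = [[1.0 / 3] * 3 for _ in labels]
print(wce_dice_loss(perfect, labels, w), wce_dice_loss(uniform, labels, w))
```

A perfect one-hot prediction drives both terms to zero, while the Dice term keeps rare classes such as sand from being swamped by the per-pixel cross-entropy.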

3. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that the backbone of the ASBE-Net semantic segmentation model is pre-trained on ImageNet; input images are normalized per RGB channel with the ImageNet mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]; and training uses the Adam optimizer with a cosine annealing learning-rate schedule.
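The input normalization and learning-rate schedule of claim 3 reduce to two short formulas. The sketch below assumes 8-bit RGB input scaled to [0, 1] before normalization (the usual convention for these ImageNet statistics) and shows the closed-form cosine annealing curve; Adam itself is not reimplemented, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_rgb(pixel: Sequence[int]) -> List[float]:
    """Scale an 8-bit RGB pixel to [0, 1], then apply per-channel ImageNet normalization."""
    return [(v / 255.0 - m) / s for v, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD)]


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float,
                        lr_min: float = 0.0) -> float:
    """Cosine annealing: lr_max at step 0, decaying smoothly to lr_min at total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


print(normalize_rgb((124, 116, 104)))      # a pixel near the ImageNet mean maps close to 0
print([cosine_annealing_lr(s, 100, 1e-3) for s in (0, 50, 100)])
```

If training is done in PyTorch (the claims do not name a framework), `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around `torch.optim.Adam` follows the same cosine curve, with `T_max` in the role of `total_steps` and `eta_min` in the role of `lr_min`.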

4. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that, in steps S1 to S3, quality control of the acquired images comprises: motion-blur detection by thresholding gradient energy; overexposure/underexposure detection and contrast evaluation; color constancy correction; and automatic ROI cropping guided by rail and sleeper localization.
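The blur, exposure, and contrast checks of claim 4 can be sketched on a grayscale frame. All thresholds (gradient energy 100, 5% clipped pixels, intensity limits 5 and 250, standard deviation 20) are hypothetical placeholders to be tuned on field data; color constancy correction and rail/sleeper-guided ROI cropping are not shown.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # grayscale image, values in [0, 255]


def gradient_energy(img: Gray) -> float:
    """Mean squared forward difference (horizontal + vertical); low values suggest blur."""
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            gx = img[i][j + 1] - img[i][j]
            gy = img[i + 1][j] - img[i][j]
            total += gx * gx + gy * gy
    return total / ((h - 1) * (w - 1))


def quality_issues(img: Gray, blur_thr: float = 100.0, clip_frac: float = 0.05,
                   min_contrast: float = 20.0) -> List[str]:
    """Return the failed checks; an empty list means the frame is accepted."""
    px = [v for row in img for v in row]
    n = len(px)
    mean = sum(px) / n
    std = (sum((v - mean) ** 2 for v in px) / n) ** 0.5
    issues = []
    if gradient_energy(img) < blur_thr:
        issues.append("motion_blur")
    if sum(v >= 250 for v in px) / n > clip_frac:
        issues.append("overexposed")
    if sum(v <= 5 for v in px) / n > clip_frac:
        issues.append("underexposed")
    if std < min_contrast:
        issues.append("low_contrast")
    return issues


flat = [[128.0] * 8 for _ in range(8)]                      # featureless frame
checker = [[60.0 if (i + j) % 2 == 0 else 200.0 for j in range(8)] for i in range(8)]
print(quality_issues(flat), quality_issues(checker))
```

A frame that fails any check would be the kind of image that triggers the automatic resampling described in S6.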

5. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that, between steps S4 and S5 (i.e., after the mask is obtained in S4 and before the apparent contamination rate is calculated in S5), morphological post-processing consisting of small connected-component removal and hole filling is applied to the mask to further refine the region boundaries.
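The small-component removal and hole filling of claim 5 can be sketched for one binary class mask; for the three-class mask of S4 it would be applied per class, for example to the sand mask. The 4-connectivity, the `min_size` of 3 pixels, and the rule that a hole is any background region not touching the image border are assumptions, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = target class (e.g. sand), 0 = everything else


def _components(mask: Mask, value: int) -> List[List[Tuple[int, int]]]:
    """4-connected components of pixels equal to `value`, found by breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for si in range(h):
        for sj in range(w):
            if mask[si][sj] != value or seen[si][sj]:
                continue
            comp, queue = [], deque([(si, sj)])
            seen[si][sj] = True
            while queue:
                i, j = queue.popleft()
                comp.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= ni < h and 0 <= nj < w and not seen[ni][nj] \
                            and mask[ni][nj] == value:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            comps.append(comp)
    return comps


def clean_mask(mask: Mask, min_size: int = 3) -> Mask:
    """Remove foreground blobs smaller than min_size, then fill enclosed background holes."""
    out = [row[:] for row in mask]
    h, w = len(out), len(out[0])
    for comp in _components(out, 1):
        if len(comp) < min_size:
            for i, j in comp:
                out[i][j] = 0
    for comp in _components(out, 0):
        # A background component that never touches the border is treated as a hole.
        if not any(i in (0, h - 1) or j in (0, w - 1) for i, j in comp):
            for i, j in comp:
                out[i][j] = 1
    return out


noisy = [[0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0, 0],
         [0, 1, 0, 1, 0, 1],   # a one-pixel hole and a one-pixel speck
         [0, 1, 1, 1, 0, 0],
         [0, 0, 0, 0, 0, 0]]
print(clean_mask(noisy))
```

Removing specks before filling holes means that a deleted speck on the border cannot be mistaken for part of an enclosed hole.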

6. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that, in S6, the calibration samples used to establish the relationship between the apparent contamination rate and the physical sand content are obtained by synchronized imaging and physical sampling: at each image acquisition point, the track bed material in the target area is sieved and weighed according to industry standards, the physical sand content is calculated as sand mass / (sand mass + ballast mass), and the fitting error is evaluated by five-fold cross-validation.
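The physical sand content formula and the five-fold cross-validation of claim 6 can be sketched as follows. RMSE as the fitting-error metric, the strided (unshuffled) fold assignment, and the proportional placeholder fitter are assumptions; in practice the power-law model of claim 7 would be passed as `fit`. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def physical_sand_content(sand_mass: float, ballast_mass: float) -> float:
    """Sand mass / (sand mass + ballast mass), from sieving and weighing a field sample."""
    return sand_mass / (sand_mass + ballast_mass)


def k_fold_rmse(x: Sequence[float], y: Sequence[float],
                fit: Callable[[List[float], List[float]], Model], k: int = 5) -> float:
    """Pooled out-of-fold RMSE: fit on k-1 folds, predict the held-out fold, repeat k times."""
    n = len(x)
    idx = list(range(n))              # no shuffling; shuffle beforehand if samples are ordered
    folds = [idx[f::k] for f in range(k)]
    sq_err = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_err += sum((model(x[i]) - y[i]) ** 2 for i in fold)
    return math.sqrt(sq_err / n)


def fit_proportional(xs: List[float], ys: List[float]) -> Model:
    """Placeholder least-squares fitter S = m * D, used only to exercise the CV loop."""
    m = sum(u * v for u, v in zip(xs, ys)) / sum(u * u for u in xs)
    return lambda d: m * d


rates = [0.05 * i for i in range(1, 11)]          # synthetic apparent contamination rates
contents = [0.4 * r for r in rates]               # synthetic, exactly proportional targets
print(physical_sand_content(2.0, 8.0), k_fold_rmse(rates, contents, fit_proportional))
```

Because every calibration sample is predicted by a model that never saw it, the pooled RMSE estimates how the fitted mapping will behave on new track sections rather than how well it memorizes the calibration set.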

7. The method for detecting sand content in wind-blown sand track beds based on adaptive semantic segmentation according to claim 1, characterized in that, in S6, the mapping between the apparent contamination rate and the physical sand content is a power-function model determined by nonlinear regression fitting, expressed as:

$$S = a\,D^{b} \tag{4}$$

where $S$ is the physical sand content, $D$ is the apparent contamination rate, and $a$ and $b$ are model parameters obtained by nonlinear least-squares fitting.
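One way to carry out the nonlinear least-squares fit of the claim 7 power function (S = a·D^b, with S the physical sand content and D the apparent contamination rate) is Gauss-Newton iteration started from a log-log linear regression. This is a sketch of a standard technique rather than the patentee's solver; it assumes all calibration values are strictly positive, and `fit_power_law` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(d: Sequence[float], s: Sequence[float],
                  iters: int = 50, tol: float = 1e-12) -> Tuple[float, float]:
    """Fit S = a * D**b by Gauss-Newton nonlinear least squares (requires D > 0, S > 0)."""
    # Initial guess from the linear fit ln S = ln a + b ln D.
    lx = [math.log(v) for v in d]
    ly = [math.log(v) for v in s]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    for _ in range(iters):
        # Residuals r = S - a D^b; Jacobian columns dF/da = D^b, dF/db = a D^b ln D.
        jaa = jab = jbb = ga = gb = 0.0
        for di, si in zip(d, s):
            p = di ** b
            fa, fb = p, a * p * math.log(di)
            r = si - a * p
            jaa += fa * fa
            jab += fa * fb
            jbb += fb * fb
            ga += fa * r
            gb += fb * r
        det = jaa * jbb - jab * jab
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) delta = J^T r by Cramer's rule.
        da = (jbb * ga - jab * gb) / det
        db = (jaa * gb - jab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


rates = [0.05, 0.1, 0.2, 0.3, 0.5]                 # synthetic calibration data
contents = [0.8 * r ** 0.6 for r in rates]
a_hat, b_hat = fit_power_law(rates, contents)
print(a_hat, b_hat)
```

The log-log fit alone would minimize errors in log space; the Gauss-Newton refinement minimizes squared errors in sand content itself, which matches the nonlinear least-squares criterion named in the claim. If SciPy is available, `scipy.optimize.curve_fit` offers the same fit with a library solver.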