A surround-view detection method for tunnel lining defects

By combining radial distortion correction and spatial variable-scale feature extraction with multi-task learning and self-attention mechanism, the problems of field distortion and multiple types of defects in tunnel lining inspection are solved, achieving high-precision defect detection and distribution statistics, and improving inspection efficiency and safety.

CN121904049B · Active · Publication Date: 2026-06-23 · NORTHWEST ENGINEERING CORPORATION LIMITED

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NORTHWEST ENGINEERING CORPORATION LIMITED
Filing Date
2026-03-24
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing deep learning methods for tunnel lining inspection suffer from severe annular field-of-view distortion, significant differences in the characteristics of various defect types, and insufficient utilization of spatial location information, resulting in low inspection efficiency and significant safety hazards.

Method used

A radial distortion correction model is used to convert the annular field of view image into a cylindrical unfolded image. Combined with spatial variable scale feature extraction and a multi-task learning architecture, the tunnel circumferential angle information is embedded through a self-attention mechanism to perform defect segmentation and depth estimation. The feature map is then fused for defect detection.

Benefits of technology

It effectively eliminates radial distortion, ensures the consistency of full-width feature extraction, improves the identification accuracy of multiple types of defects, and provides complete spatial distribution information of defects, supporting tunnel structure safety assessment and maintenance decisions.



Abstract

The application discloses a surround-view detection method for tunnel lining defects, belonging to the technical field of tunnel inspection. It addresses the severe annular field-of-view distortion, the large differences between the characteristics of multiple defect types, and the insufficient use of spatial position information in existing tunnel lining inspection. The method comprises the following steps: S1, acquiring an annular field-of-view image of the tunnel inner wall and converting the annular field-of-view image into a cylindrical unfolded image; S2, performing variable-scale feature extraction on the cylindrical unfolded image to obtain a spatial variable-scale feature map; S3, performing defect segmentation and depth estimation on the spatial variable-scale feature map to obtain the corresponding defect feature map and depth feature map; S4, performing gated fusion on the defect feature map and the depth feature map to obtain a fused feature map, and performing defect segmentation decoding on the fused feature map to obtain the defect detection result. The application is used for tunnel lining defect detection.

Description

Technical Field

[0001] This invention relates to a method for visual inspection of defects in tunnel lining, belonging to the field of tunnel inspection technology.

Background Technology

[0002] As pumped-storage power stations play an increasingly prominent strategic role in new power systems, water conveyance tunnels, as core hydraulic structures, bear frequent water filling and releasing cycles. During long-term operation, the concrete lining of the tunnel walls inevitably develops various defects such as cracks, spalling, water seepage, and steel corrosion. Failure to detect these defects in a timely manner will seriously threaten the safe operation of the power station. Traditional manual inspection methods are not only inefficient, but also require inspectors to enter the narrow, enclosed tunnel environment, posing significant safety hazards. Therefore, developing automated defect detection technology based on machine vision has become an urgent need in the field of water conservancy engineering.

[0003] In recent years, significant progress has been made in tunnel defect detection technology based on deep learning. However, most existing deep learning methods are designed for planar images, and their direct application to the annular field-of-view images of water conveyance tunnels presents the following technical challenges: the annular field-of-view images acquired by panoramic surround-view cameras exhibit significant radial distortion, and traditional cylindrical unfolding methods lead to inconsistent resolution between the near and far ends of the image; various types of defects in the tunnel inner wall, such as cracks, spalling, water seepage, and steel corrosion, differ significantly in morphological features and texture characteristics, making it difficult for a single detection branch to simultaneously ensure the recognition accuracy of various defects; and the different stress states of the tunnel arch and sidewalls result in different defect distribution patterns, and existing detection networks lack the ability to model this spatial correlation.

Summary of the Invention

[0004] This invention provides a method for detecting tunnel lining defects from a circumferential perspective, which can solve the problems of severe circumferential field-of-view distortion, large differences in the characteristics of various types of defects, and insufficient utilization of spatial location information in existing tunnel lining detection methods.

[0005] This invention provides a method for visual inspection of defects in tunnel lining, the method comprising:

[0006] S1. Acquire an annular field-of-view image of the tunnel inner wall and convert the annular field-of-view image into a cylindrical unfolded image;

[0007] S2. Perform variable-scale feature extraction on the cylindrical unfolded image to obtain a spatial variable-scale feature map;

[0008] S3. Perform defect segmentation and depth estimation on the spatial variable-scale feature map to obtain the corresponding defect feature map and depth feature map;

[0009] S4. Gated fusion is performed on the defect feature map and the depth feature map to obtain a fused feature map, and defect segmentation and decoding are performed on the fused feature map to obtain the defect detection result.

[0010] Optionally, converting the annular field-of-view image into a cylindrical unfolded image in step S1 specifically includes:

[0011] A radial distortion correction model is constructed, and the radial distortion correction model is used to correct the radial distortion of the annular field of view image to obtain a corrected image;

[0012] The corrected image is projected onto a cylindrical coordinate system to obtain a cylindrical unfolded image.

[0013] Optionally, S2 specifically includes:

[0014] Based on the vertical coordinates of each pixel in the cylindrical unfolded image, determine the effective convolution kernel size and sampling interval scaling factor for the corresponding pixel.

[0015] Based on the effective convolution kernel size and the sampling interval scaling factor, variable-scale feature extraction is performed on the cylindrical unfolded image to obtain a spatial variable-scale feature map.

[0016] Optionally, in step S3, defect segmentation is performed on the spatially variable-scale feature map to obtain a defect feature map, specifically as follows:

[0017] A self-attention-based semantic segmentation network is used to segment defects in the spatially variable-scale feature map to obtain a defect feature map.

[0018] Optionally, a self-attention-based semantic segmentation network is used to segment the spatially variable-scale feature map to obtain a defect feature map, specifically including:

[0019] Based on the cylindrical unfolded image, determine the annular position encoding vector of the tunnel;

[0020] The annular position encoding vector is embedded into the self-attention calculation of the semantic segmentation network to obtain the enhanced segmentation network;

[0021] The enhanced segmentation network is used to segment defects in the spatially variable-scale feature map to obtain a defect feature map.

[0022] Optionally, based on the cylindrical unfolded image, the annular position encoding vector of the tunnel is determined, specifically including:

[0023] Based on the cylindrical unfolded image, the circumferential angle information of the tunnel is determined;

[0024] Based on the circumferential angle information of the tunnel, the annular position encoding vector of the tunnel is determined by a combination of sine and cosine functions.
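A minimal sketch of such a sine/cosine encoding is given below. The text only states that sine and cosine functions of the circumferential angle are combined; the harmonic layout `[sin(kθ), cos(kθ)]`, the derivation of θ from the column index, the dimension `dim`, and the function name `ring_position_encoding` are all assumptions made for illustration.

```python
import math

def ring_position_encoding(u, width, dim=8):
    """Encode the circumferential angle of column u as sin/cos harmonics.

    ASSUMPTION: theta = 2*pi*u/width and integer harmonics k = 1..dim/2.
    Integer harmonics make the code periodic, so the left and right edges
    of the unfolded image (the same physical angle) get the same vector.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    theta = 2.0 * math.pi * (u / width)
    encoding = []
    for k in range(1, dim // 2 + 1):
        encoding.append(math.sin(k * theta))
        encoding.append(math.cos(k * theta))
    return encoding

left_edge = ring_position_encoding(0, 4096)
right_edge = ring_position_encoding(4096, 4096)
```

Because only integer harmonics are used, `left_edge` and `right_edge` agree up to floating-point error, which is the periodic behaviour an ordinary linear position code does not provide.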

[0025] Optionally, the annular position encoding vector is embedded into the self-attention calculation of the semantic segmentation network to obtain an enhanced segmentation network, specifically including:

[0026] The enhanced position encoding vector of the tunnel is determined based on the annular position encoding vector and the standard two-dimensional sinusoidal position encoding vector.

[0027] The enhanced position encoding vector is added to the query vector and key vector of the semantic segmentation network to obtain the enhanced segmentation network.

[0028] Optionally, the enhanced position encoding vector is added to the query vector and key vector of the semantic segmentation network to obtain the enhanced segmentation network, specifically including:

[0029] Based on the circumferential angle information, the tunnel is divided into multiple sub-regions;

[0030] In the self-attention computation of the semantic segmentation network, the region-aware mask value of each pixel pair is determined, wherein the region-aware mask value of a same-region pixel pair is greater than that of a cross-region pixel pair; a same-region pixel pair consists of two pixels belonging to the same sub-region, and a cross-region pixel pair consists of two pixels belonging to different sub-regions.

[0031] The enhanced position encoding vector is added to the query vector and key vector of the semantic segmentation network, and the region-aware mask value is added to the attention weight of the semantic segmentation network to obtain the enhanced segmentation network.
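The region-aware mask can be illustrated with a small sketch. The text requires only that same-region pairs receive a larger mask value than cross-region pairs; the number of sub-regions (4), the equal angular split, and the concrete values 0.0 and -1.0 are assumptions, as are the function names.

```python
import math

def region_index(theta, num_regions=4):
    """Map a circumferential angle (radians) to a sub-region index.

    ASSUMPTION: the cross-section is split into equal angular sectors.
    """
    fraction = (theta % (2.0 * math.pi)) / (2.0 * math.pi)
    return min(int(fraction * num_regions), num_regions - 1)

def region_mask_value(theta_i, theta_j, same=0.0, cross=-1.0, num_regions=4):
    """Additive attention bias for a pixel pair.

    Same-region pairs get `same`, cross-region pairs get `cross`;
    the only requirement taken from the text is same > cross.
    """
    if region_index(theta_i, num_regions) == region_index(theta_j, num_regions):
        return same
    return cross

same_pair = region_mask_value(0.1, 0.2)    # both angles fall in sector 0
cross_pair = region_mask_value(0.1, 3.0)   # sectors 0 and 1
```

Adding such a value to the attention logits before the softmax leaves same-region attention unchanged and damps cross-region attention without forbidding it.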

[0032] Optionally, after S2, the method further includes:

[0033] Boundary sharpness scoring is performed on the spatial variable-scale feature map to obtain a boundary sharpness score map.

[0034] Optionally, after S4, the method further includes:

[0035] Based on the mileage positioning data of the annular field of view image and the defect detection results, the defect feature parameters are determined;

[0036] A defect distribution statistical map is generated based on the defect feature parameters and the boundary sharpness score map.

[0037] Optionally, the effective convolution kernel size for each pixel is determined based on the vertical coordinates of each pixel in the cylindrical unfolded image, specifically including:

[0038] Calculate the ratio between the vertical coordinate of each pixel in the cylindrical unfolded image and the height of the cylindrical unfolded image, and calculate the product of the ratio and the scale adjustment coefficient;

[0039] Calculate the sum of the product and 1, and multiply the sum by the base convolution kernel size as the effective convolution kernel size for the corresponding pixel.

[0040] Optionally, based on the vertical coordinates of each pixel in the cylindrical unfolded image, a sampling interval scaling factor for the corresponding pixel is determined, specifically including:

[0041] Calculate the ratio between the vertical coordinate of each pixel in the cylindrical unfolded image and the height of the cylindrical unfolded image;

[0042] Calculate the product of the ratio and the sampling interval adjustment coefficient, and use the sum of the product and 1 as the sampling interval scaling factor for the corresponding pixel.

[0043] Optionally, the annular field-of-view image is acquired by a panoramic surround-view camera array mounted on the inspection robot.

[0044] Optionally, the sampling interval scaling factor can be corrected based on the depth feature map.

[0045] The beneficial effects that this invention can produce include:

[0046] The tunnel lining defect surround view detection method provided by the present invention constructs a radial distortion correction model based on the geometric prior of the circular cross-section of the tunnel, and inversely projects the annular field of view image to the cylindrical coordinate system to achieve seamless unfolding, effectively eliminating the radial distortion of the image acquired by the panoramic surround view camera, and providing geometric consistency guarantee for subsequent feature extraction and defect detection.

[0047] The tunnel lining defect surround view detection method provided by this invention features a spatially variable scale convolutional layer that adaptively adjusts the effective receptive field of the convolutional kernel based on the radial distance of the pixel from the optical axis. This effectively solves the problem of inconsistent resolution between the near and far ends in the cylindrical unfolded image, ensures the uniformity of feature extraction across the entire area, and improves the detection consistency of defects at different locations.

[0048] The tunnel lining defect surround view detection method provided by this invention adopts a multi-task learning architecture, in which the defect segmentation main task branch works in concert with the depth estimation and boundary sharpness scoring auxiliary branches. The depth features of the auxiliary branches are back-transmitted to the main task branch for feature enhancement, realizing deep coupling and mutual promotion among multiple tasks, and significantly improving the identification accuracy of multiple types of defects.

[0049] The tunnel lining defect surround detection method provided by this invention introduces a ring position encoding mechanism and embeds the circumferential angle information of the tunnel into the Transformer self-attention calculation, enabling the model to perceive the difference in defect patterns in different stress areas of the arch and sidewalls, thereby enhancing the learning ability of spatially related defect features.

[0050] The tunnel lining defect surround detection method provided by this invention achieves accurate three-dimensional coordinate labeling of defects and automatic generation of a statistical map of defect distribution along the entire line by integrating mileage positioning data, providing complete spatial distribution information of defects for tunnel structural safety assessment and maintenance decisions.

Attached Figure Description

[0051] Figure 1 is a flowchart of a tunnel lining defect surround view detection method provided in an embodiment of the present invention.

Detailed Implementation

[0052] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of the invention. However, those skilled in the art will understand that the invention can be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the invention with unnecessary detail.

[0053] This invention provides a method for visually inspecting defects in tunnel lining. As shown in Figure 1, the method includes:

[0054] S1. Acquire annular field-of-view images of the tunnel interior wall and convert the annular field-of-view images into cylindrical unfolded images.

[0055] The annular field-of-view image is acquired by a panoramic surround-view camera array mounted on the inspection robot.

[0056] In this embodiment of the invention, a panoramic surround-view camera array installed on an underwater inspection robot is used for image acquisition. The panoramic surround-view camera array consists of multiple fisheye lenses, each with a field of view of not less than 180 degrees. There is a certain overlap between adjacent lenses to ensure the continuity of image stitching. The preferred image resolution of the camera array is 4096×4096 pixels, and the sampling frame rate is set to 30fps, which can meet the image acquisition requirements of the inspection robot when it travels in the tunnel at a speed not exceeding 0.5m / s.

[0057] The conversion of the annular field-of-view image into a cylindrical unfolded image in S1 specifically includes: firstly, constructing a radial distortion correction model and using the radial distortion correction model to perform radial distortion correction on the annular field-of-view image to obtain a corrected image; then, projecting the corrected image into a cylindrical coordinate system to obtain a cylindrical unfolded image.

[0058] The annular field-of-view cylindrical unfolding module 1 is responsible for converting the annular distortion images acquired by the panoramic surround-view camera group into geometrically corrected cylindrical unfolded images. The core of this module lies in constructing a radial distortion correction model based on the geometric prior of the circular cross-section of the tunnel.

[0059] In one embodiment of the present invention, the water conveyance tunnel has a regular circular cross-section, the radius of which is denoted as $R$. The panoramic surround-view camera assembly is mounted at the center of the inspection robot, with the camera's optical axis coinciding with the tunnel's axis. Let the polar coordinates of any pixel in the annular image be $(r, \theta)$, where $r$ is the radial distance of the pixel from the center of the image and $\theta$ is the circumferential angle of the pixel relative to the reference direction.

[0060] Based on the geometric constraints of the circular cross-section of the tunnel, this invention establishes an inverse projection mapping from the annular image coordinates $(r, \theta)$ to the cylindrical coordinates $(u, v)$. This inverse projection follows the principle of equidistant cylindrical projection, inversely projecting each pixel of the annular field of view onto a virtual cylindrical surface centered on the tunnel axis.

[0061] The radial distortion correction model proposed in this invention can be achieved through the following formula:

[0062] ;

[0063] where $r'$ is the corrected radial distance, $r$ is the radial distance of a pixel from the center of the original annular image, $r_{max}$ is the maximum effective radial distance of the annular image, and $k_1$, $k_2$, $k_3$ are the radial distortion correction coefficients. In this embodiment, considering the typical cross-sectional dimensions and camera parameters of the pumped storage power station's water conveyance tunnel, $k_1$ is -0.12, $k_2$ is 0.05, and $k_3$ is -0.008. Calibration experiments have verified that this set of parameters effectively eliminates the barrel distortion introduced by fisheye lenses.
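Because the correction formula itself is not legible in this text, the sketch below assumes a common normalized polynomial model, $r' = r\,(1 + k_1\rho^2 + k_2\rho^4 + k_3\rho^6)$ with $\rho = r / r_{max}$. That functional form and the function name `correct_radius` are assumptions for illustration; only the three coefficient values are taken from the embodiment.

```python
# Coefficient values from the embodiment.
K1, K2, K3 = -0.12, 0.05, -0.008

def correct_radius(r, r_max, k1=K1, k2=K2, k3=K3):
    """Corrected radial distance under an ASSUMED even-order polynomial model.

    The radius is normalized by r_max so that the coefficients are
    dimensionless; the patent's exact formula may differ.
    """
    if r_max <= 0:
        raise ValueError("r_max must be positive")
    rho2 = (r / r_max) ** 2
    return r * (1.0 + k1 * rho2 + k2 * rho2 ** 2 + k3 * rho2 ** 3)

center = correct_radius(0.0, 2048.0)    # the image center is left unchanged
edge = correct_radius(2048.0, 2048.0)   # scaled by 1 - 0.12 + 0.05 - 0.008
```

Under this assumed model the outermost radius is scaled by a factor of 0.922, while pixels near the center are almost unaffected.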

[0064] After radial distortion correction, this invention projects the corrected annular image onto a cylindrical coordinate system. The cylindrical coordinates $(u, v)$ are calculated as follows:

[0065] ;

[0066] ;

[0067] where $u$ is the horizontal coordinate of a pixel in the cylindrical unfolded image, $v$ is the vertical coordinate of a pixel in the cylindrical unfolded image, $W$ is the width of the cylindrical unfolded image, and $H$ is the height of the cylindrical unfolded image. In this embodiment, $W$ is set to 4096 pixels, corresponding to the 360-degree circumferential expansion of the tunnel cross-section, and $H$ is set to 1024 pixels, corresponding to the visible depth range in the tunnel's axial direction.
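The two projection formulas are not legible in this text, so the following sketch only illustrates one plausible reading: the column is taken proportional to the circumferential angle, which matches the statement that the full width covers 360 degrees, and the row is taken proportional to the corrected radius. The row mapping in particular, its orientation, and the function name `annular_to_cylindrical` are assumptions.

```python
import math

WIDTH, HEIGHT = 4096, 1024  # unfolded image size from the embodiment

def annular_to_cylindrical(r_corrected, theta, r_max, width=WIDTH, height=HEIGHT):
    """Map a corrected annular pixel (r', theta) to unfolded coordinates.

    ASSUMPTION: column = theta / (2*pi) * width,
                row    = r' / r_max * height.
    """
    col = (theta % (2.0 * math.pi)) / (2.0 * math.pi) * width
    row = (r_corrected / r_max) * height
    return col, row

col, row = annular_to_cylindrical(1024.0, math.pi, 2048.0)
```

With these assumptions a pixel at half the maximum radius and an angle of 180 degrees lands in the middle of the unfolded image.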

[0068] Furthermore, to address the problem of stitching multiple fisheye lens images, this invention employs an image fusion method based on overlapping region feature matching. Within the overlapping region of adjacent lenses, a weighted average fusion strategy is used to eliminate brightness discontinuities at the stitching boundary, with the weighting coefficients changing linearly according to the distance of each pixel from the center of the overlapping region. Preferably, the angle range of the overlapping region is set to 15 degrees to ensure a smooth and natural transition at the stitching point.
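The weighted average fusion in the overlap can be sketched as a linear cross-fade. The 15-degree overlap width comes from the text; the exact weight profile (a ramp from one lens to the other that reaches equal weights at the overlap center) and the function names are assumptions.

```python
OVERLAP_DEG = 15.0  # overlap width from the embodiment

def blend_weights(angle_deg, overlap_start_deg, overlap_deg=OVERLAP_DEG):
    """Return (weight_a, weight_b) for a pixel inside the overlap.

    ASSUMPTION: lens A fades out linearly while lens B fades in, so both
    weights equal 0.5 at the center of the overlapping region.
    """
    t = (angle_deg - overlap_start_deg) / overlap_deg
    t = min(max(t, 0.0), 1.0)  # clamp pixels outside the overlap
    return 1.0 - t, t

def blend_pixel(value_a, value_b, angle_deg, overlap_start_deg):
    """Weighted average of the two lens intensities at one angle."""
    weight_a, weight_b = blend_weights(angle_deg, overlap_start_deg)
    return weight_a * value_a + weight_b * value_b

mid_value = blend_pixel(100.0, 200.0, 7.5, 0.0)  # center of the overlap
```

Since the weights always sum to 1, the blend never brightens or darkens a region where both lenses agree.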

[0069] S2. Variable-scale feature extraction is performed on the cylindrical unfolded image to obtain a spatial variable-scale feature map.

[0070] Specifically, it includes:

[0071] (1) Determine the effective convolution kernel size and sampling interval scaling factor of each pixel in the cylindrical unfolded image based on the vertical coordinates of each pixel.

[0072] The effective convolution kernel size is calculated as follows: First, the ratio between the vertical coordinate of each pixel in the cylindrical unfolded image and the height of the cylindrical unfolded image is calculated, and the product of the ratio and the scale adjustment coefficient is calculated; then, the sum of the product and 1 is calculated, and the product of the sum and the basic convolution kernel size is taken as the effective convolution kernel size of the corresponding pixel.

[0073] The sampling interval scaling factor is calculated as follows: First, the ratio between the vertical coordinate of each pixel in the cylindrical unfolded image and the height of the cylindrical unfolded image is calculated; then, the product of the ratio and the sampling interval adjustment coefficient is calculated, and the sum of the product and 1 is used as the sampling interval scaling factor for the corresponding pixel.

[0074] (2) Based on the effective convolution kernel size and the sampling interval scaling factor, the cylindrical unfolded image is subjected to variable scale feature extraction to obtain a spatial variable scale feature map.

[0075] This invention achieves variable-scale feature extraction through a spatial variable-scale feature extraction module. The invention designs an innovative spatial variable-scale convolutional layer in the spatial variable-scale feature extraction module to solve the problem of inconsistent resolution between the near and far ends in cylindrical unfolded images.

[0076] In cylindrical unfolded images, due to the geometric properties of projection transformation, the region near the top of the image corresponds to a location on the tunnel wall farther from the camera, and its spatial sampling density is lower than that of the near-end region near the bottom of the image. If a fixed-size convolution kernel is used for feature extraction, the detailed features in the far-end region will be over-smoothed, while the features in the near-end region will be redundant, ultimately affecting the uniformity of defect detection.

[0077] To address this problem, this invention proposes a spatially variable-scale convolutional layer. Its core idea is to adaptively adjust the effective receptive field of the convolutional kernel based on the radial distance of each pixel from the optical axis. In one embodiment of this invention, let the vertical coordinate of any pixel in the cylindrical unfolded image be $v$. The corresponding effective kernel size $k_{eff}(v)$ is calculated using the following formula:

[0078] $k_{eff}(v) = k_{base} \cdot \left(1 + \alpha \cdot \frac{v}{H}\right)$;

[0079] where $k_{eff}(v)$ is the effective kernel size at the pixel location, $k_{base}$ is the base kernel size, $\alpha$ is the scale adjustment coefficient, $v$ is the vertical coordinate of the pixel, and $H$ is the height of the cylindrical unfolded image. In this embodiment, $k_{base}$ is set to 3 and $\alpha$ is set to 1.5, so that the effective kernel size in the far region is 2.5 times that in the near region.
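The calculation described in words in [0072] can be checked with a few lines of code. The parameter values are those of the embodiment; the function name and the convention that the vertical coordinate is 0 at the near end and equals the image height at the far end are assumptions.

```python
K_BASE = 3    # base kernel size from the embodiment
ALPHA = 1.5   # scale adjustment coefficient from the embodiment

def effective_kernel_size(v, height, k_base=K_BASE, alpha=ALPHA):
    """Effective kernel size: k_base * (1 + alpha * v / height).

    ASSUMPTION: v = 0 is the near end and v = height is the far end.
    """
    return k_base * (1.0 + alpha * v / height)

near = effective_kernel_size(0, 1024)
far = effective_kernel_size(1024, 1024)
```

The far-end value of 7.5 is 2.5 times the near-end value of 3, which matches the ratio stated for this embodiment.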

[0080] The specific implementation of spatially variable-scale convolution employs a strategy combining deformable convolution and scale-aware sampling. For the pixel with coordinates $(u, v)$ in the cylindrical unfolded image, the convolution output $F_{out}(u, v)$ is calculated as follows:

[0081] $F_{out}(u, v) = \sum_{(\Delta u, \Delta v) \in \mathcal{N}(v)} w(\Delta u, \Delta v) \cdot S\big(u + s(v)\,\Delta u,\; v + s(v)\,\Delta v\big)$;

[0082] where $F_{out}(u, v)$ is the convolution output feature value at pixel $(u, v)$; $\mathcal{N}(v)$ is the set of neighboring pixels determined by the effective convolution kernel size, whose coverage changes dynamically with the vertical coordinate of the pixel; $w(\Delta u, \Delta v)$ is the weight parameter of the convolution kernel at the offset position $(\Delta u, \Delta v)$; $S(\cdot)$ is the sampling function, which uses bilinear interpolation to sample non-integer coordinates; and $s(v)$ is the sampling interval scaling factor associated with the vertical coordinate $v$.

[0083] Specifically, the effective kernel size enters the convolution output computation by constraining the set of neighboring pixels. For a pixel with vertical coordinate $v$, the neighborhood pixel set is determined dynamically by the effective convolution kernel size: the horizontal offset $\Delta u$ and the vertical offset $\Delta v$ of the neighborhood each have an absolute value no greater than half the rounded effective convolution kernel size, so the neighborhood spans the effective convolution kernel size in both the horizontal and vertical directions. Since the effective convolution kernel size increases monotonically with the vertical coordinate, the neighborhood of the far-end region is larger than that of the near-end region, which gives the convolution operation a larger equivalent receptive field in the far-end region. During the calculation of the convolution output, the convolution kernel weights at each offset position within the neighborhood are obtained by bilinear interpolation of the base convolution kernel, so that continuous changes in the effective convolution kernel size do not introduce quantization errors. At the same time, the sampling interval scaling factor further adjusts the actual spacing between sampling points within the neighborhood and works together with the effective convolution kernel size to achieve spatially adaptive feature extraction. In this embodiment, for the near-end region at the bottom of the image, the effective convolution kernel size is 3 pixels, corresponding to a 3×3 neighborhood; for the far-end region at the top of the image, the effective convolution kernel size increases to approximately 7.5 pixels, expanding the neighborhood to approximately 8×8 and covering an area approximately 7 times that of the near-end region.

[0084] The sampling interval scaling factor $s(v)$ is calculated as follows:

[0085] $s(v) = 1 + \beta \cdot \frac{v}{H}$;

[0086] where $s(v)$ is the sampling interval scaling factor and $\beta$ is the sampling interval adjustment coefficient, which is set to 1.0 in this embodiment. This design makes the sampling points of the convolution kernel more sparse in the far regions of the image, effectively expanding the receptive field to match the lower spatial resolution of that region.
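The effect of the scaling factor on the sampling grid can be sketched as follows. The factor itself follows the worded description in [0073]; how it is applied to the kernel offsets (multiplying each integer offset by the factor) and the helper names are assumptions, and the bilinear interpolation needed to read values at the resulting non-integer positions is omitted.

```python
BETA = 1.0  # sampling interval adjustment coefficient from the embodiment

def sampling_scale(v, height, beta=BETA):
    """Sampling interval scaling factor: 1 + beta * v / height."""
    return 1.0 + beta * v / height

def sample_positions(u, v, height, half_width=1):
    """Sampling coordinates of a (2*half_width+1)^2 kernel centered at (u, v).

    ASSUMPTION: each integer offset is stretched by the scaling factor.
    """
    s = sampling_scale(v, height)
    return [(u + s * du, v + s * dv)
            for dv in range(-half_width, half_width + 1)
            for du in range(-half_width, half_width + 1)]

far_grid = sample_positions(10, 1024, 1024)  # factor 2.0 at the far end
```

At the far end the nine taps of a 3×3 kernel are spread over a 5×5 pixel footprint, similar in spirit to a dilated convolution whose dilation varies by row.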

[0087] In this embodiment of the invention, the spatially variable-scale feature extraction module can adopt a four-stage hierarchical structure. The first stage uses a 3×3 spatially variable-scale convolutional layer with 64 output channels; the second stage uses a 3×3 spatially variable-scale convolutional layer with 128 output channels; the third stage uses a 3×3 spatially variable-scale convolutional layer with 256 output channels; and the fourth stage uses a 3×3 spatially variable-scale convolutional layer with 512 output channels. Downsampling is performed between each stage using a max-pooling layer with a stride of 2.
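The feature map sizes implied by this hierarchy can be tabulated with a short sketch. It assumes that the input is the 4096×1024 unfolded image, that each convolution preserves spatial size through padding, and that one stride-2 pooling layer sits between consecutive stages; none of these details is spelled out in the text.

```python
STAGE_CHANNELS = [64, 128, 256, 512]  # output channels per stage, from the text

def stage_shapes(width=4096, height=1024, channels=STAGE_CHANNELS):
    """Return (channels, height, width) of each stage output.

    ASSUMPTION: size-preserving convolutions and one stride-2 max-pooling
    layer between consecutive stages (three pooling layers in total).
    """
    shapes = []
    w, h = width, height
    for index, out_channels in enumerate(channels):
        if index > 0:  # pooling happens between stages, not before the first
            w, h = w // 2, h // 2
        shapes.append((out_channels, h, w))
    return shapes

shapes = stage_shapes()
```

Under these assumptions the fourth stage produces a 512-channel map at one eighth of the input resolution.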

[0088] S3. Perform defect segmentation and depth estimation on the spatial variable-scale feature map to obtain the corresponding defect feature map and depth feature map.

[0089] In S3, defect segmentation is performed on the spatially variable-scale feature map to obtain a defect feature map. Specifically, a self-attention-based semantic segmentation network is used to segment the spatially variable-scale feature map to obtain a defect feature map.

[0090] Furthermore, the method also includes: scoring the boundary sharpness of the spatially variable-scale feature map to obtain a boundary sharpness score map.

[0091] This invention utilizes a multi-task collaborative detection module to achieve defect segmentation, depth estimation, and boundary sharpness scoring. The module employs a multi-task learning architecture, including a main branch for defect segmentation and auxiliary branches for depth estimation and boundary sharpness scoring. This multi-task collaborative design achieves mutual promotion and deep coupling between the main and auxiliary tasks.

[0092] In this embodiment of the invention, the main defect segmentation task branch adopts a Transformer-based semantic segmentation network structure. The encoder of this branch receives the spatially variable-scale feature map output by the spatially variable-scale feature extraction module, models global context information through a self-attention mechanism, and the decoder uses a progressive upsampling strategy to restore spatial resolution. Finally, it outputs a four-channel semantic segmentation mask with the same size as the input image, corresponding to four types of defects: cracks, spalling, water seepage, and steel corrosion.

[0093] The loss function of the defect segmentation branch adopts a combination of weighted cross-entropy loss and Dice loss to address the problem of uneven distribution of different types of defects in the training samples.

[0094] The depth estimation auxiliary branch shares the first three stages of the encoder with the defect segmentation branch, but separates into independent decoding paths starting from the fourth stage. The depth estimation branch outputs a single-channel depth feature map, representing the distance information of various locations on the tunnel wall relative to the camera. This depth information is of great value for accurately assessing the geometry and severity of defects.

[0095] In this embodiment of the invention, the boundary sharpness scoring auxiliary branch and the depth estimation branch share the main structure of the decoder, but a separate convolutional head is used in the last layer to output the boundary sharpness scoring map. The boundary sharpness scoring map is used to quantify the clarity of defect edges. A higher sharpness score indicates a clearer defect boundary, while a lower sharpness score indicates a blurred defect boundary that may be in its early stages of development or that there is uncertainty in detection. The boundary sharpness score is calculated based on the gradient magnitude at the edge of the defect segmentation mask, and is converted into a score value between 0 and 1 through a learnable nonlinear mapping.

[0096] In this invention, the overall loss function of the multi-task collaborative detection module is defined as the weighted sum of the losses of the three tasks:

[0097] $L_{total} = \lambda_{seg} L_{seg} + \lambda_{depth} L_{depth} + \lambda_{sharp} L_{sharp}$;

[0098] where $L_{total}$ is the overall loss function, $L_{seg}$ is the defect segmentation loss, $L_{depth}$ is the depth estimation loss, $L_{sharp}$ is the boundary sharpness scoring loss, and $\lambda_{seg}$, $\lambda_{depth}$, $\lambda_{sharp}$ are the loss weighting coefficients of the respective tasks. In this embodiment, $\lambda_{seg}$ is set to 1.0, $\lambda_{depth}$ is set to 0.5, and $\lambda_{sharp}$ is set to 0.3. This weight configuration was determined through ablation experiments to achieve the optimal overall detection performance.
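The weighted sum is simple enough to state directly in code. The weights are those of the embodiment, assigned in the order in which the text lists the three tasks; the function and constant names are illustrative.

```python
# Loss weights from the embodiment: segmentation, depth, boundary sharpness.
LAMBDA_SEG, LAMBDA_DEPTH, LAMBDA_SHARP = 1.0, 0.5, 0.3

def total_loss(loss_seg, loss_depth, loss_sharp):
    """Weighted sum of the main-task loss and the two auxiliary losses."""
    return (LAMBDA_SEG * loss_seg
            + LAMBDA_DEPTH * loss_depth
            + LAMBDA_SHARP * loss_sharp)

example = total_loss(1.0, 1.0, 1.0)
```

With equal raw losses the segmentation term contributes more than half of the total, which reflects its role as the main task.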

[0099] The defect segmentation loss $L_{seg}$ uses a combination of weighted cross-entropy loss and Dice loss:

[0100] $L_{seg} = L_{wce} + L_{dice}$;

[0101] where $L_{seg}$ is the defect segmentation loss, $L_{wce}$ is the weighted cross-entropy loss, and $L_{dice}$ is the Dice loss.

[0102] The weighted cross-entropy loss $L_{wce}$ is calculated as follows:

[0103] $L_{wce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} w_c \, y_{i,c} \log p_{i,c}$;

[0104] where $L_{wce}$ is the weighted cross-entropy loss, $N$ is the total number of pixels, $C$ is the number of defect categories, $w_c$ is the weighting coefficient of the $c$-th defect category, calculated from the reciprocal of the category frequency, $y_{i,c}$ is the true label of pixel $i$ for category $c$, with a value of 0 or 1, and $p_{i,c}$ is the probability predicted by the network that pixel $i$ belongs to category $c$.
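A minimal sketch of a weighted cross-entropy with reciprocal-frequency weights follows. It uses the standard form of the loss; taking the raw reciprocal of the class frequency without further normalization, the small `eps` guard, and the function names are assumptions.

```python
import math

def class_weights(onehot):
    """Reciprocal class frequency per class (0.0 for classes that never occur)."""
    n = len(onehot)
    weights = []
    for c in range(len(onehot[0])):
        count = sum(row[c] for row in onehot)
        weights.append(n / count if count else 0.0)
    return weights

def weighted_cross_entropy(probs, onehot, weights, eps=1e-12):
    """-(1/N) * sum_i sum_c w_c * y_ic * log(p_ic); eps guards log(0)."""
    total = 0.0
    for p_i, y_i in zip(probs, onehot):
        for w_c, p_ic, y_ic in zip(weights, p_i, y_i):
            total -= w_c * y_ic * math.log(p_ic + eps)
    return total / len(probs)

labels = [[1, 0], [1, 0], [1, 0], [0, 1]]  # three pixels of class 0, one of class 1
weights = class_weights(labels)
loss = weighted_cross_entropy([[0.5, 0.5]] * 4, labels, weights)
```

The rare class receives a weight of 4.0 against 4/3 for the frequent class, so both classes contribute equally to the loss in this toy example.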

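[0105] The depth estimation loss $L_{depth}$ uses a scale-invariant logarithmic depth loss:

[0106] $L_{depth} = \frac{1}{n} \sum_{i=1}^{n} \left( \log \hat{d}_{i} - \log d_{i} \right)^{2} - \frac{\lambda}{n^{2}} \left( \sum_{i=1}^{n} \left( \log \hat{d}_{i} - \log d_{i} \right) \right)^{2}$;

[0107] where $L_{depth}$ is the depth estimation loss, $n$ is the total number of valid pixels, $d_{i}$ is the true depth value of pixel $i$, $\hat{d}_{i}$ is the depth value predicted by the network, and $\lambda$ is the coefficient of the scale-invariant term, set to 0.5 in this embodiment.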
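The scale-invariant logarithmic depth loss can be sketched as follows. This example assumes the commonly used form (mean squared log difference minus a scaled squared mean log difference) and assumes that valid pixels are those with a positive ground-truth depth; both are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def scale_invariant_log_loss(pred_depth, true_depth, lam=0.5, valid_mask=None):
    """Scale-invariant log depth loss over valid pixels.

    lam is the coefficient of the scale-invariant term (0.5 in the embodiment).
    ASSUMPTION: pixels with true depth > 0 are treated as valid by default.
    """
    pred = np.asarray(pred_depth, dtype=float)
    true = np.asarray(true_depth, dtype=float)
    if valid_mask is None:
        valid_mask = true > 0
    g = np.log(pred[valid_mask]) - np.log(true[valid_mask])  # per-pixel log difference
    n = g.size
    return float((g ** 2).sum() / n - lam * g.sum() ** 2 / n ** 2)
```

With `lam=1.0` the loss ignores a global scale error entirely; with `lam=0.5` a global scale error is penalised at half strength, which is why the coefficient controls the degree of scale invariance.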

[0108] The boundary sharpness scoring loss $L_{sharp}$ uses a mean squared error loss:

[0109] $L_{sharp} = \frac{1}{M} \sum_{j=1}^{M} \left( s_{j} - \hat{s}_{j} \right)^{2}$;

[0110] where $L_{sharp}$ is the boundary sharpness scoring loss, $M$ is the total number of defect boundary pixels, $s_{j}$ is the true sharpness label of boundary pixel $j$, and $\hat{s}_{j}$ is the sharpness score predicted by the network.
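To show how the three task losses are combined, the sketch below computes the mean squared error over boundary pixels and then the weighted sum with the embodiment's weights (1.0, 0.5, 0.3). The function names are hypothetical and the inputs are assumed to be already restricted to boundary pixels.

```python
import numpy as np

def boundary_sharpness_mse(pred_scores, true_scores):
    """Mean squared error between predicted and labelled sharpness scores."""
    pred = np.asarray(pred_scores, dtype=float)
    true = np.asarray(true_scores, dtype=float)
    return float(np.mean((pred - true) ** 2))

def total_loss(l_seg, l_depth, l_sharp, weights=(1.0, 0.5, 0.3)):
    """Weighted sum of segmentation, depth and sharpness losses."""
    w_seg, w_depth, w_sharp = weights
    return w_seg * l_seg + w_depth * l_depth + w_sharp * l_sharp
```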

[0111] In water conveyance tunnels, the crown area primarily bears the pressure of the surrounding rock, making it prone to circumferential cracks; the sidewall area bears lateral earth and water pressure, making it prone to diagonal cracks and seepage; and the floor area is affected by water flow erosion and sediment, making it prone to spalling and erosion. This correlation between spatial location and defect type is important prior knowledge for tunnel defect detection, but traditional location coding methods cannot effectively express the periodic characteristics of the annular space.

[0112] This invention proposes a ring-shaped position encoding mechanism, which encodes the circumferential angle information of the tunnel cross section into a learnable position embedding vector and integrates it into the self-attention calculation process of Transformer.

[0113] The specific implementation steps include:

[0114] (1) Determine the annular position encoding vector of the tunnel based on the cylindrical unfolded image.

[0115] Specifically, the process involves: first, determining the circumferential angle information of the tunnel based on the cylindrical unfolded image; then, determining the annular position encoding vector of the tunnel using a combination of sine and cosine functions based on the circumferential angle information of the tunnel.

[0116] (2) The circular position encoding vector is embedded into the self-attention calculation of the semantic segmentation network to obtain the enhanced segmentation network.

[0117] Specifically, the enhanced location encoding vector of the tunnel is first determined based on the circular location encoding vector and the standard two-dimensional sinusoidal location encoding vector; then the enhanced location encoding vector is added to the query vector and key vector of the semantic segmentation network to obtain the enhanced segmentation network.

[0118] (3) Use an enhanced segmentation network to segment the spatial variable-scale feature map to obtain the defect feature map.

[0119] This invention employs a ring-shaped position encoding enhancement module to achieve ring-shaped position encoding embedding. This module embeds the circumferential angle information of the tunnel into the Transformer self-attention calculation of the multi-task collaborative detection network, enhancing the model's ability to perceive differences in defect patterns at different spatial locations.

[0120] In this invention, for the pixel column with horizontal coordinate $u$ in the cylindrical unfolded image, the corresponding tunnel circumferential angle $\theta$ is calculated using the following formula:

[0121] $\theta = \frac{2\pi u}{W}$;

[0122] where $\theta$ is the circumferential angle of the tunnel, with a value range of [value missing], $u$ is the horizontal coordinate of the pixel in the cylindrical unfolded image, and $W$ is the width of the cylindrical unfolded image.

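[0123] The circular position encoding vector $PE_{ring}(\theta)$ uses a combination of sine and cosine functions to represent the periodicity of the circumferential angle:

[0124] $PE_{ring}(\theta) = \left[ \sin(\omega_{1}\theta), \cos(\omega_{1}\theta), \sin(\omega_{2}\theta), \cos(\omega_{2}\theta), \ldots, \sin(\omega_{D/2}\theta), \cos(\omega_{D/2}\theta) \right]$;

[0125] where $PE_{ring}(\theta)$ is the $D$-dimensional circular position encoding vector corresponding to the circumferential angle $\theta$, $\omega_{k}$ are the encoding coefficients of the different frequencies, and $D$ is the position encoding dimension, consistent with the hidden-layer dimension of the Transformer. In this embodiment, $D$ is set to 256, and the encoding coefficients $\omega_{k}$ are set as a geometric sequence [value missing], where $k$ ranges from 1 to 128.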
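A minimal sketch of the circular position encoding is given below. It assumes the circumferential angle is proportional to the pixel column index and that sine and cosine terms are interleaved. Because the geometric sequence of encoding coefficients is not recoverable from the text, the default frequencies follow a Transformer-style geometric sequence purely as an assumption; the `freqs` argument lets any other sequence be supplied.

```python
import numpy as np

def circumferential_angle(u, width):
    """Map a pixel column index u to an angle, assuming theta = 2*pi*u / width."""
    return 2.0 * np.pi * np.asarray(u, dtype=float) / width

def ring_position_encoding(theta, dim=256, freqs=None):
    """Interleaved sin/cos encoding of the circumferential angle.

    ASSUMPTION: if freqs is None, a geometric sequence
    10000 ** (-k / (dim / 2)), k = 0 .. dim/2 - 1, is used as a placeholder.
    """
    theta = np.asarray(theta, dtype=float)
    if freqs is None:
        k = np.arange(dim // 2)
        freqs = 10000.0 ** (-k / (dim // 2))
    angles = theta[..., None] * np.asarray(freqs, dtype=float)  # (..., dim/2)
    enc = np.empty(theta.shape + (dim,))
    enc[..., 0::2] = np.sin(angles)
    enc[..., 1::2] = np.cos(angles)
    return enc
```

With integer frequencies the encoding at column 0 and at column `width` coincide exactly, which is the periodic behaviour that a linear position encoding cannot express.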

[0126] Furthermore, this invention combines the circular position encoding with the standard two-dimensional position encoding to form an enhanced position encoding vector:

[0127] $PE_{enh}(u, v) = PE_{2D}(u, v) + \alpha \, PE_{ring}(\theta)$;

[0128] where $PE_{enh}(u, v)$ is the enhanced position encoding vector of pixel $(u, v)$, $PE_{2D}(u, v)$ is the standard two-dimensional sinusoidal position encoding vector, $PE_{ring}(\theta)$ is the circular position encoding vector, and $\alpha$ is the fusion weight coefficient of the circular position encoding, set to 0.5 in this embodiment.

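[0129] In the Transformer self-attention computation, the enhanced position encoding vector is added to the Query vector and the Key vector:

[0130] $Q_{i} = W_{Q} x_{i} + PE_{enh}(i)$;

[0131] $K_{j} = W_{K} x_{j} + PE_{enh}(j)$;

[0132] where $Q_{i}$ is the Query vector at the $i$-th position, $K_{j}$ is the Key vector at the $j$-th position, $W_{Q}$ is the Query projection matrix, $W_{K}$ is the Key projection matrix, $x_{i}$ and $x_{j}$ are the input feature vectors at the $i$-th and $j$-th positions, and $PE_{enh}(i)$ and $PE_{enh}(j)$ are the enhanced position encoding vectors at those positions.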

[0133] The self-attention weights are calculated as:

[0134] $A_{ij} = \mathrm{softmax}_{j}\left( \frac{Q_{i} K_{j}^{T}}{\sqrt{d_{k}}} \right)$;

[0135] where $A_{ij}$ is the attention weight of position $i$ with respect to position $j$, $Q_{i}$ is the Query vector at the $i$-th position, $K_{j}^{T}$ is the transpose of the Key vector at the $j$-th position, $d_{k}$ is the dimension of the Key vector, used for scaling to prevent the inner product from becoming too large, and $\mathrm{softmax}$ is the normalized exponential function.
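The position-enhanced attention described above can be illustrated with a single-head NumPy sketch. It assumes that the enhanced encoding is the two-dimensional encoding plus a weighted circular encoding, and that the encoding is added after the query and key projections; the text could also be read as adding it before projection, so this ordering is an assumption. All function names are hypothetical.

```python
import numpy as np

def enhanced_position_encoding(pe_2d, pe_ring, alpha=0.5):
    """Fuse the 2D encoding with the circular encoding (weight 0.5 in the embodiment)."""
    return pe_2d + alpha * pe_ring

def softmax(x, axis=-1):
    """Numerically stable normalized exponential."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(x, pe, w_q, w_k):
    """Scaled dot-product attention weights with position-enhanced Q and K.

    x: (n, d_in) input features, pe: (n, d_k) enhanced encodings,
    w_q, w_k: (d_in, d_k) projection matrices.
    ASSUMPTION: the encoding is added after projection.
    """
    q = x @ w_q + pe
    k = x @ w_k + pe
    d_k = k.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k), axis=-1)
```

Each row of the returned matrix sums to 1 and gives the attention that one position pays to every other position.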

[0136] By using circular position encoding, pixels at the same circumferential angle but different axial positions can more easily establish attentional associations, which aligns with the physical law of tunnel defects being distributed circumferentially. Simultaneously, the model can learn the differences in defect patterns between the arch region and the sidewall region, and differentiate defect features at different locations during inference.

[0137] In this invention, to further enhance spatial location awareness, the annular location encoding enhancement module also introduces a region-aware attention mask mechanism. This mechanism divides the circumferential angle into four sub-regions—the arch region, the left wall region, the right wall region, and the bottom plate region—based on the functional zoning of the tunnel cross-section. In the self-attention calculation, pixel pairs within the same sub-region are assigned higher basic attention weights.

[0138] The specific implementation steps include:

[0139] (1) Based on the circumferential angle information, the tunnel is divided into multiple sub-regions.

[0140] (2) Determine the region-aware mask value of each pixel pair in the self-attention calculation of the semantic segmentation network; wherein, the region-aware mask value of the same region pixel pair is greater than the region-aware mask value of the cross-region pixel pair; the same region pixel pair is a pixel pair consisting of two pixels belonging to the same sub-region; the cross-region pixel pair is a pixel pair consisting of two pixels belonging to different sub-regions.

[0141] (3) Add the region-aware mask value to the attention weights of the semantic segmentation network to obtain the enhanced segmentation network.

[0142] Specifically, the region-aware mask value $M_{ij}$ is defined as follows:

[0143] $M_{ij} = \begin{cases} \beta, & R(i) = R(j) \\ 0, & R(i) \neq R(j) \end{cases}$;

[0144] where $M_{ij}$ is the region-aware mask value between position $i$ and position $j$, $R(i)$ is the number of the tunnel functional region to which position $i$ belongs, taking values from 1 to 4 that correspond to the arch, left side wall, right side wall, and floor slab, respectively, and $\beta$ is the attention enhancement coefficient for the same sub-region, set to 0.2 in this embodiment.

[0145] The region division rule is determined by the circumferential angle: [value missing] corresponds to the vault sub-region; [value missing] corresponds to the right wall sub-region; [value missing] corresponds to the bottom plate sub-region; and [value missing] corresponds to the left wall sub-region.
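The region-aware mask can be sketched as below. Since the angular ranges of the four sub-regions are not recoverable from the text, this example assumes four equal 90-degree sectors with the arch centred at angle 0, followed by the right wall, the floor, and the left wall. It also assumes a value of 0 for cross-region pairs. Both choices are illustrative only.

```python
import numpy as np

# Region numbering from the description: 1 arch, 2 left wall, 3 right wall, 4 floor.
# ASSUMPTION: sectors are visited in the order arch, right wall, floor, left wall.
_SECTOR_TO_REGION = np.array([1, 3, 4, 2])

def region_ids(theta):
    """Assign each angle (radians) to a functional region.
    ASSUMPTION: four equal sectors, arch centred at theta = 0."""
    theta = np.asarray(theta, dtype=float)
    sector = np.floor(((theta + np.pi / 4) % (2 * np.pi)) / (np.pi / 2)).astype(int) % 4
    return _SECTOR_TO_REGION[sector]

def region_aware_mask(theta, beta=0.2):
    """Pairwise mask: beta for same-region pairs, 0 for cross-region pairs."""
    ids = region_ids(theta)
    return np.where(ids[:, None] == ids[None, :], beta, 0.0)
```

The resulting matrix has one row and one column per position and can be added to the attention term of the segmentation network so that same-region pairs start from a higher baseline.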

[0146] Furthermore, the detection method also includes: correcting the sampling interval scaling factor based on the depth feature map.

[0147] This invention establishes a reverse feedback mechanism between the spatially variable-scale feature extraction module and the multi-task collaborative detection module. The depth feature map output by the depth estimation branch of the multi-task collaborative detection module is passed back to the spatially variable-scale feature extraction module to dynamically adjust the sampling interval scaling factor. Specifically, when the depth estimation branch detects a significant deviation between the actual depth of a region and the preset value, the sampling interval scaling factor will be adjusted according to the actual depth value, thus achieving adaptive feature extraction for depth perception.

[0148] Specifically, the depth-aware correction process is as follows. Let $\bar{d}(v)$ be the average depth estimate of the pixel row with vertical coordinate $v$ in the depth feature map output by the depth estimation branch, and let $d_{ref}$ be the preset reference depth value corresponding to the nominal inner diameter of the tunnel design. The depth deviation ratio is then defined as $\rho(v) = \bar{d}(v) / d_{ref}$. When the depth deviation ratio deviates from 1, the actual geometric depth of the region differs from the preset model, and the sampling interval scaling factor must be corrected. The corrected sampling interval scaling factor equals the original sampling interval scaling factor multiplied by the depth deviation ratio raised to the power of the correction exponent. The correction exponent is a learnable scalar parameter, initialized to 0.5, that controls how strongly the depth information adjusts the sampling interval. To prevent abnormal depth estimates from making the scaling factor too large or too small, the depth deviation ratio is truncated to the range 0.5 to 2.0. During training, the correction exponent is optimized automatically through backpropagation, so that the depth-aware correction mechanism learns the adjustment strategy adaptively. In this embodiment, after training converges, the final value of the correction exponent is approximately 0.42, indicating a sublinear relationship between the depth information and the adjustment of the sampling interval. This correction mechanism allows the sampling interval scaling factor to be adjusted dynamically according to the actual depth in areas where the tunnel lining has local depressions or protrusions, thereby avoiding the uneven sampling that deviations from the geometric assumptions would otherwise cause during feature extraction.
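The depth-aware correction can be sketched numerically as follows. The example assumes that the depth deviation ratio is the estimated row depth divided by the reference depth, and it treats the correction exponent as a fixed number rather than a trained parameter; the function name is hypothetical.

```python
import numpy as np

def corrected_scaling_factor(scale, row_mean_depth, ref_depth,
                             exponent=0.5, clip_range=(0.5, 2.0)):
    """Correct the sampling interval scaling factor from estimated depth.

    ASSUMPTION: ratio = estimated depth / reference depth.
    The ratio is truncated to clip_range before being raised to the exponent.
    """
    ratio = np.asarray(row_mean_depth, dtype=float) / ref_depth
    ratio = np.clip(ratio, clip_range[0], clip_range[1])
    return np.asarray(scale, dtype=float) * ratio ** exponent
```

For example, with the converged exponent of about 0.42 reported in the embodiment, a row estimated to be 20% deeper than the reference would have its scaling factor multiplied by 1.2 ** 0.42, an increase of roughly 8%.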

[0149] S4. Gated fusion of the defect feature map and the depth feature map is performed to obtain a fused feature map. Defect segmentation and decoding are then performed on the fused feature map to obtain the defect detection result.

[0150] The multi-task collaborative detection module establishes a reverse feature enhancement channel from the auxiliary branch to the main task branch. The intermediate layer features of the depth estimation branch are passed to the corresponding level of the defect segmentation branch through a gated fusion mechanism, providing geometric priors for defect segmentation. This gated fusion mechanism can be described by the following formula:

[0151] [formula missing];

[0152] where $F_{fuse}$ is the fused feature map used for defect segmentation and decoding, $F_{def}$ is the defect feature map, $F_{depth}$ is the depth feature map, and $g$ is the gating coefficient, which is adaptively generated from the input features by a small learnable network and takes values between 0 and 1.

[0153] The gating coefficient $g$ is calculated as:

[0154] $g = \sigma\left( W_{g} \left[ F_{def} ; F_{depth} \right] + b_{g} \right)$;

[0155] where $g$ is the gating coefficient, $\sigma$ is the Sigmoid activation function, $W_{g}$ is the learnable weight matrix of the gating network, $[\,\cdot\,;\,\cdot\,]$ is the feature concatenation operation, and $b_{g}$ is the bias parameter of the gating network.
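A minimal sketch of gated fusion is shown below. The gate is computed from the concatenated defect and depth features through a sigmoid, as described above. The way the gate combines the two maps is not recoverable from the text, so the residual form used here (defect features plus gated depth features) is only an assumed example; a convex combination would be another plausible choice.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_def, f_depth, w_g, b_g):
    """Fuse defect and depth feature maps with a learned gate.

    f_def, f_depth: (H, W, C); w_g: (2C, C); b_g: (C,).
    ASSUMPTION: fused = f_def + gate * f_depth (residual-style fusion).
    """
    concat = np.concatenate([f_def, f_depth], axis=-1)  # feature concatenation
    gate = sigmoid(concat @ w_g + b_g)                  # values in (0, 1)
    return f_def + gate * f_depth, gate
```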

[0156] This reverse feature enhancement mechanism enables depth information to assist the defect segmentation branch in locating defect boundaries more accurately. In particular, for defect types that cause surface geometric changes, such as water seepage and spalling, depth features provide valuable geometric constraints.

[0157] Following S4, the method further includes: firstly, determining defect feature parameters based on the mileage positioning data and defect detection results of the annular field of view image; and then generating a defect distribution statistical map based on the defect feature parameters and the boundary sharpness scoring map.

[0158] This invention employs a defect location and statistics module to integrate mileage location data and defect detection results, generate three-dimensional coordinate annotation information of defects, and automatically generate a statistical map of defect distribution along the entire line.

[0159] In this invention, the inspection robot is equipped with a high-precision odometer and an inertial measurement unit, which enable it to acquire its axial position and attitude angles within the tunnel in real time. The odometry data and the circular field-of-view image frames are synchronized and aligned using timestamps to ensure that each frame of the circular field-of-view image has corresponding spatial location information.

[0160] For each defect region detected in the defect detection results, this invention calculates its three-dimensional centroid coordinates in the tunnel coordinate system. Let the centroid pixel coordinates of the defect region in the cylindrical unfolded image be $(u_{c}, v_{c})$. The corresponding circumferential angle and radial depth of the tunnel are as follows:

[0161] $\theta_{c} = \frac{2\pi u_{c}}{W}$;

[0162] [formula missing];

[0163] where $\theta_{c}$ is the circumferential angle of the defect centroid, $u_{c}$ is the horizontal coordinate of the defect centroid in the cylindrical unfolded image, $W$ is the width of the cylindrical unfolded image, $r_{c}$ is the radial distance from the defect centroid to the tunnel axis, $R$ is the radius of the tunnel cross section, $\bar{d}$ is the average depth value of the defect region output by the depth estimation branch, and [symbol missing] is the correction term for the camera viewing-angle tilt.

[0164] The three-dimensional coordinates of the defect centroid in the tunnel coordinate system are calculated as follows:

[0165] [formula missing];

[0166] [formula missing];

[0167] $z_{c} = z_{0} + \Delta z$;

[0168] where $x_{c}$ is the coordinate of the defect centroid in the horizontal direction of the tunnel cross-section, $y_{c}$ is the coordinate of the defect centroid in the vertical direction of the tunnel cross-section, $z_{c}$ is the coordinate of the defect centroid along the tunnel axis, $z_{0}$ is the mileage position corresponding to the current image frame, and $\Delta z$ is the axial offset calculated from the pixel vertical coordinate.
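The conversion from a defect centroid in the unfolded image to tunnel coordinates can be sketched as below. The radial-distance formula and the angle origin are not recoverable from the text. This example therefore takes the radial distance as an input and assumes a standard polar-to-Cartesian conversion with the angle measured from the horizontal axis of the cross-section. The function name and arguments are hypothetical.

```python
import math

def defect_centroid_3d(u_c, width, radial_distance, frame_mileage, axial_offset=0.0):
    """Return (theta, (x, y, z)) for a defect centroid.

    u_c: centroid column in the unfolded image; width: image width in pixels.
    radial_distance: distance from the centroid to the tunnel axis (given, not derived here).
    frame_mileage: mileage of the current frame; axial_offset: offset from the pixel row.
    ASSUMPTION: x = r*cos(theta), y = r*sin(theta), theta = 2*pi*u_c/width.
    """
    theta = 2.0 * math.pi * u_c / width
    x = radial_distance * math.cos(theta)
    y = radial_distance * math.sin(theta)
    z = frame_mileage + axial_offset
    return theta, (x, y, z)
```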

[0169] In addition to the three-dimensional centroid coordinates, the defect location and statistics module also calculates the geometric feature parameters of each defect region, including the defect area, defect perimeter, defect major-axis and minor-axis lengths, defect orientation angle, and other defect feature parameters.

[0170] The defect area $A$ is calculated as:

[0171] $A = N_{p} \cdot s_{u} \cdot s_{v}$;

[0172] where $A$ is the defect area, $N_{p}$ is the number of pixels contained in the defect region, $s_{u}$ is the physical size of a pixel in the horizontal direction, and $s_{v}$ is the physical size of a pixel in the vertical direction. The physical pixel size is dynamically calculated from the camera calibration parameters and the depth information.

[0173] The defect location and statistics module automatically generates a comprehensive defect distribution chart based on defect information from all detected frames. This chart uses the tunnel's axial mileage as the horizontal axis and the circumferential angle as the vertical axis, employing color coding to represent different types of defect distribution.

[0174] In this invention, the overall defect distribution statistics map may also include the following statistical information: the total number and total area of each type of defect, the distribution curve of defect density along the axial direction, the graded statistics of defect severity (based on a comprehensive judgment of area and boundary sharpness scores), and the marking of key attention sections (sections where the defect density exceeds the threshold).
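The distribution map and the marking of key attention sections can be illustrated with a two-dimensional histogram over axial mileage and circumferential angle. The bin counts, the angle range of 0 to 2π, and the density definition (defects per metre of tunnel) are assumptions made for this sketch; the function names are hypothetical.

```python
import numpy as np

def defect_distribution_map(mileage_m, theta_rad, tunnel_length_m, n_axial=32, n_circ=36):
    """Count defects per (axial section, angular sector) cell.
    ASSUMPTION: angles lie in [0, 2*pi]."""
    hist, z_edges, t_edges = np.histogram2d(
        mileage_m, theta_rad,
        bins=[n_axial, n_circ],
        range=[[0.0, tunnel_length_m], [0.0, 2.0 * np.pi]],
    )
    return hist, z_edges, t_edges

def key_sections(hist, z_edges, density_threshold):
    """Return (start, end) mileage of sections whose defect density exceeds the threshold.
    ASSUMPTION: density = defects per metre along the axis."""
    density = hist.sum(axis=1) / np.diff(z_edges)
    return [(float(z_edges[i]), float(z_edges[i + 1]))
            for i in np.nonzero(density > density_threshold)[0]]
```

Plotting `hist` with mileage on the horizontal axis and angle on the vertical axis gives the kind of map described above; a per-class map follows by calling the function once per defect type.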

[0175] The tunnel lining defect surround view detection method provided by this invention forms a deeply coupled closed-loop collaborative system through five modules: a ring field cylindrical unfolding module, a spatial variable-scale feature extraction module, a multi-task collaborative detection module, a ring position coding enhancement module, and a defect localization and statistics module. The output of the former module directly serves as the key input parameter of the latter module, and the auxiliary branch output of the multi-task collaborative detection module inversely affects the parameter adjustment of the spatial variable-scale feature extraction module, thus realizing a complete closed loop of forward transmission → performance evaluation → reverse feedback → parameter adjustment.

[0176] Through the coordinated operation of the above five modules, this invention achieves full-section coverage detection of defects in the lining of water conveyance tunnels, accurate segmentation of multiple types of defects, quantitative evaluation of defect depth and boundary features, and three-dimensional visualization of defect spatial distribution.

[0177] To verify the effectiveness of the method of this invention, a field testing experiment was conducted in a water conveyance tunnel of a pumped storage power station. The tunnel is 3.2 km long, with a reinforced concrete lining and an inner diameter of 8 m. The testing method of this invention was used to perform two complete tests on the entire tunnel, accumulating over 100,000 image frames.

[0178] Experimental results show that the method of the present invention achieves a detection accuracy of 95.2% for crack-type defects, 93.8% for spalling-type defects, 91.5% for water seepage-type defects, and 89.3% for steel corrosion-type defects. Compared with existing detection methods based on ordinary cameras and traditional image processing, the present invention improves the accuracy of crack detection by 12.5% and the accuracy of water seepage detection by 18.3%.

[0179] Ablation experiments of spatially variable-scale convolutional layers show that, compared with the baseline model using a fixed-size convolutional kernel, the spatially variable-scale design of this invention improves the defect detection recall rate in the far region by 23.7%, while maintaining the same detection accuracy in the near region, and significantly improves the overall detection uniformity.

[0180] Ablation experiments using multi-task collaborative learning show that introducing a depth estimation auxiliary branch improves the segmentation IoU of spalling defects by 8.2%; introducing a boundary sharpness scoring auxiliary branch improves the localization accuracy of crack boundaries by 15.6%; and the depth feature reverse enhancement mechanism improves the segmentation recall of seepage areas by 11.3%.

[0181] Ablation experiments using ring location coding show that, after introducing ring location coding, the model's accuracy in identifying circumferential cracks in the arch area improved by 9.5%, and its accuracy in identifying diagonal cracks in the sidewall area improved by 7.8%, demonstrating the contribution of spatial location awareness to defect detection performance.

[0182] Regarding the accuracy of defect location, the three-dimensional coordinate annotation of this invention has an axial positioning error of less than 50 mm and a circumferential positioning error of less than 2 degrees compared with the results of manual measurement, which meets the accuracy requirements of engineering applications.

[0183] In summary, the tunnel lining defect surround view detection method and system provided by this invention effectively solve the problems of annular field of view distortion, differences in features of multiple types of defects, and insufficient utilization of spatial location information in the prior art through five deeply coupled core modules: annular field of view cylindrical surface unfolding, spatial variable scale feature extraction, multi-task collaborative detection, annular position coding enhancement, and defect location statistics. The result is high-precision automatic detection and three-dimensional spatial positioning of tunnel lining defects.

[0184] The above descriptions are merely some embodiments of the present invention and do not limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, these embodiments are not intended to limit it. Any modification or alteration made by those skilled in the art using the disclosed technical content, without departing from the scope of the technical solution of the present invention, constitutes an equivalent embodiment and falls within the scope of the technical solution.

Claims

1. A method for visually inspecting defects in tunnel lining, characterized in that, The method includes: S1. Acquire an annular field-of-view image of the tunnel inner wall and construct a radial distortion correction model. Use the radial distortion correction model to perform radial distortion correction on the annular field-of-view image to obtain a corrected image. Project the corrected image onto a cylindrical coordinate system to obtain a cylindrical unfolded image. S2. Based on the vertical coordinates of each pixel in the cylindrical unfolded image, determine the effective convolution kernel size and sampling interval scaling factor of the corresponding pixel; and based on the effective convolution kernel size and the sampling interval scaling factor, perform variable scale feature extraction on the cylindrical unfolded image to obtain a spatial variable scale feature map. S3. Perform defect segmentation and depth estimation on the spatial variable-scale feature map to obtain the corresponding defect feature map and depth feature map; S4. Gated fusion is performed on the defect feature map and the depth feature map to obtain a fused feature map, and defect segmentation and decoding are performed on the fused feature map to obtain the defect detection result.

2. The method according to claim 1, characterized in that, In step S3, defect segmentation is performed on the spatially variable-scale feature map to obtain a defect feature map, specifically as follows: A self-attention-based semantic segmentation network is used to segment defects in the spatially variable-scale feature map to obtain a defect feature map.

3. The method according to claim 2, characterized in that, A self-attention-based semantic segmentation network is used to segment defects in the spatially variable-scale feature map to obtain a defect feature map, specifically including: Based on the cylindrical unfolded image, determine the annular position encoding vector of the tunnel; The circular position encoding vector is embedded into the self-attention calculation of the semantic segmentation network to obtain the enhanced segmentation network; The enhanced segmentation network is used to segment defects in the spatially variable-scale feature map to obtain a defect feature map.

4. The method according to claim 3, characterized in that, Based on the cylindrical unfolded image, the annular position encoding vector of the tunnel is determined, specifically including: Based on the cylindrical unfolded image, the circumferential angle information of the tunnel is determined; Based on the circumferential angle information of the tunnel, the annular position encoding vector of the tunnel is determined by a combination of sine and cosine functions.

5. The method according to claim 4, characterized in that, The circular positional encoding vector is embedded into the self-attention calculation of the semantic segmentation network to obtain the enhanced segmentation network, specifically including: The enhanced position coding vector of the tunnel is determined based on the ring position coding vector and the standard two-dimensional sinusoidal position coding vector. The enhanced location encoding vector is added to the query vector and key vector of the semantic segmentation network to obtain the enhanced segmentation network.

6. The method according to claim 5, characterized in that, The enhanced location encoding vector is added to the query vector and key vector of the semantic segmentation network to obtain the enhanced segmentation network, which specifically includes: Based on the circumferential angle information, the tunnel is divided into multiple sub-regions; In the self-attention computation of the semantic segmentation network, the region-aware mask value of each pixel pair is determined; wherein, the region-aware mask value of the same region pixel pair is greater than the region-aware mask value of the cross-region pixel pair; the same region pixel pair is a pixel pair consisting of two pixels belonging to the same sub-region; the cross-region pixel pair is a pixel pair consisting of two pixels belonging to different sub-regions. The enhanced location encoding vector is added to the query vector and key vector of the semantic segmentation network, and the region-aware mask value is added to the attention weight of the semantic segmentation network to obtain the enhanced segmentation network.

7. The method according to claim 1, characterized in that, Following S2, the method further includes: Boundary sharpness scoring is performed on the spatial variable-scale feature map to obtain a boundary sharpness score map.

8. The method according to claim 7, characterized in that, Following S4, the method further includes: Based on the mileage positioning data of the annular field of view image and the defect detection results, the defect feature parameters are determined; A defect distribution statistical map is generated based on the defect feature parameters and the boundary sharpness score map.