Method and system for detecting defects of underwater facilities of pumped storage power station
By combining optical images and sonar point clouds, underwater facility defect detection is performed, solving the accuracy and safety issues of underwater facility detection in existing technologies. This achieves pixel-level defect identification and 3D modeling, improving the detection efficiency and safety of pumped storage power stations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- STATE GRID INTELLIGENCE TECHNOLOGY CO LTD
- Filing Date
- 2026-05-18
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies struggle to achieve high-precision defect detection in underwater facilities of pumped storage power stations, especially in complex aquatic environments. Image enhancement algorithms lack specificity, acoustic-optical fusion faces challenges in spatiotemporal registration of heterogeneous information, and 3D reconstruction accuracy is low. Furthermore, it is difficult to achieve accurate defect identification and precise measurement of geometric parameters.
By combining optical images and sonar point clouds, adaptive weight compensation is used to compensate for the pixel values of the red channel. After image dehazing, polarization self-attention processing is used to identify defect areas. Point cloud registration and fusion are then performed to construct a 3D model to measure the geometric parameters of the defects. By combining multi-source data acquisition and deep learning technology, intelligent identification and accurate measurement of underwater facilities can be achieved.
It achieves pixel-level precise semantic segmentation and intelligent recognition of underwater facility defects, improving the real-time performance and automation level of underwater detection, reducing safety risks, and enhancing the accuracy of detection and the overall intelligence level of operations.
Figure CN122289262A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of defect detection technology for pumped storage power stations, and in particular to a method and system for detecting defects in underwater facilities of pumped storage power stations.
Background Technology
[0002] Pumped storage power stations are an important regulating resource in the new generation of power grids. The condition of their concealed structures, such as underwater flow channels, spiral casings, and tailrace pipes, affects the safe operation and power generation efficiency of the units. Under the influence of environmental factors such as long-term high-speed water flow scouring, cavitation, and chemical corrosion, these structures are prone to defects such as cracks, erosion pits, and steel plate detachment. If these defects are not detected in time, they will cause energy loss and, in severe cases, structural damage.
[0003] However, conventional manual detection methods have limitations such as high operational risks, numerous blind spots, significant subjectivity in results, and the inability to quantitatively evaluate findings. Existing ROV (Remotely Operated Vehicle) detection methods equipped with optical cameras improve safety, but they suffer from image degradation due to light scattering and absorption in water, low optical transmittance in turbid water, and the limited detection dimensions of a single sensor, making it difficult to accurately identify damage and measure geometric parameters. Although the combination of multibeam sonar and optical imaging technology has opened a new avenue for underwater structure detection, challenges remain, including difficulties in spatiotemporal registration of heterogeneous information, a lack of cross-modal feature fusion mechanisms, and low accuracy in underwater 3D reconstruction. As a result, defects go undetected or are measured inaccurately, an engineering problem that urgently needs to be addressed.
[0004] Currently, underwater intelligent detection technology has made significant breakthroughs in areas such as image enhancement, defect identification, and multimodal data fusion, providing new ideas for infrastructure operation and management in complex aquatic environments. However, existing image enhancement algorithms lack targeted modeling of the optical characteristics of specific water bodies. Research on acoustic-optical fusion mainly focuses on object detection, while cross-modal 3D reconstruction and sub-millimeter scale size measurement are still immature. Furthermore, due to low integration, it is difficult to form a closed loop between data acquisition and quantitative evaluation.
Summary of the Invention
[0005] To address the aforementioned issues, this invention proposes a method and system for detecting defects in underwater facilities of pumped storage power stations. This system enables efficient underwater image restoration and correction, pixel-level precise semantic segmentation and intelligent recognition of typical defects, precise registration and fusion of acoustic and optical cross-modal point clouds, and 3D modeling with absolute scale features.
[0006] To achieve the above objectives, the present invention adopts the following technical solution. In a first aspect, the present invention provides a method for detecting defects in underwater facilities of a pumped storage power station, comprising: acquiring optical images and sonar point clouds of the underwater facilities; for the red channel of the optical image, compensating the red channel pixel values by adaptive weights with the red channel mean as a benchmark, calculating the transmittance in combination with the dark channel map of the optical image, and obtaining a fog-free image based on the transmittance and the global background light; enhancing the fog-free image end-to-end with a pre-trained enhancement network to obtain the restored image; applying polarization self-attention processing to the restored image and identifying the defect region based on the obtained feature map; estimating the camera pose and a sparse point cloud from the restored image to generate an optical point cloud; performing point cloud registration and fusion of the optical point cloud and the sonar point cloud and constructing a 3D model of the underwater facility; and measuring the geometric parameters of the defect region based on the 3D model.
[0007] As an alternative implementation, the transmittance is calculated by a formula [formula not reproduced] whose quantities are: the estimated transmittance; the dehazing coefficient; the red-channel adaptive weight; the red-channel mean; a local window centered at pixel x; the original red-channel pixel value at pixel x; the original pixel value of the c-th channel at pixel y; and the estimated global background light value for the c-th channel.
[0008] As an alternative implementation, the polarization self-attention processing of the restored image includes: encoding the channel branch and the spatial branch in parallel to calculate the channel attention weights $A_{ch}$ and the spatial attention weights $A_{sp}$, with the output feature map $Y$ represented as: $Y = A_{ch} \odot X + A_{sp} \odot X$; where $X$ is the input feature map, i.e., the feature map obtained by convolutional processing of the restored image, and $\odot$ denotes element-wise multiplication with broadcasting.
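[0009] As an alternative implementation, Wise-IoU loss is introduced during the polarization self-attention processing of the restored image, defined as follows: $\mathcal{L}_{WIoU} = r \cdot \mathcal{R}_{WIoU} \cdot \mathcal{L}_{IoU}$; $\mathcal{R}_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{*}}\right)$; where $(x, y)$ and $(x_{gt}, y_{gt})$ are the center coordinates of the predicted bounding box and the ground truth bounding box, respectively; $W_g$ and $H_g$ are the width and height of the minimum enclosing box; the superscript $*$ indicates the gradient truncation operation; $r$ is the dynamic focusing weighting factor; and $\mathcal{L}_{IoU}$ is the loss based on the ratio of intersection to union.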
[0010] As an alternative implementation, during the generation of optical point clouds, a refractive surface normal vector constraint is introduced to optimize the reprojection error function [formula not reproduced]. Its quantities are: the camera extrinsic parameters, composed of a rotation matrix and a translation vector; the 3D point coordinates; the projection function that accounts for refraction, in which n and d are the parameters of the refraction plane; the Huber robust kernel function; the image feature point coordinates of the j-th 3D point observed in the i-th image frame; the weighting coefficient of the refraction constraint term; and the set of all points in three-dimensional space.
[0011] As an alternative implementation, a PointNet++-based feature learning network is used for global coarse registration of the optical and sonar point clouds: hierarchical feature extraction yields global descriptors of the point clouds, initial correspondences are established using approximate nearest neighbor search, and the coarse transformation matrix $T_0$ is solved from them. Based on the coarse registration, a point-to-plane ICP algorithm is used for fine registration, minimizing the distance between the source point cloud and the local tangent planes of the target point cloud: $E(T) = \sum_i \left( (T p_i - q_i) \cdot n_i \right)^2$; where $p_i$ is a point in the optical point cloud, $q_i$ is its nearest neighbor in the sonar point cloud, and $n_i$ is the normal vector at $q_i$; $T$ is the rigid body transformation matrix to be solved in the iterative optimization, initialized with $T_0$ from the coarse registration stage and updated step by step by iteratively minimizing the objective function. After registration, the optical point cloud and the sonar point cloud are fused based on the Bayesian minimum mean square error framework, and the fused point cloud is generated by uncertainty weighting. Finally, the Poisson surface reconstruction algorithm is used to construct a 3D model of the underwater facility. The 2D edge contour of the defect area is extracted and mapped onto the 3D model to measure the geometric parameters of the defect.
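As a minimal sketch of the point-to-plane objective described above, the following NumPy function evaluates the sum of squared distances from transformed source points to the tangent planes at their matched target points. The function name, the 4×4 homogeneous layout of `T`, and the assumption that correspondences are already paired row by row are illustrative choices, not details taken from the patent.

```python
import numpy as np

def point_to_plane_error(T, src, dst, normals):
    """Sum of squared point-to-plane distances.

    T       : 4x4 homogeneous rigid transform (assumed layout)
    src     : (N, 3) source points, e.g. the optical point cloud
    dst     : (N, 3) matched target points, e.g. nearest sonar points
    normals : (N, 3) unit normals at the target points
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    normals = np.asarray(normals, dtype=float)
    # Apply the rigid transform to every source point.
    moved = src @ T[:3, :3].T + T[:3, 3]
    # Signed distance of each moved point to the tangent plane at its match.
    residuals = np.einsum("ij,ij->i", moved - dst, normals)
    return float(np.sum(residuals ** 2))
```

Because only the component along the normal is penalized, sliding the source along a flat target surface does not change the error, which is the usual reason this objective is preferred over point-to-point distance on smooth structures.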
[0012] In a second aspect, the present invention provides a defect detection system for underwater facilities of a pumped storage power station, comprising: an acquisition module configured to acquire optical images and sonar point clouds of the underwater facilities; an enhancement module configured to compensate the red channel pixel values of the optical image by adaptive weights with the red channel mean as a reference, calculate the transmittance in combination with the dark channel map of the optical image, obtain a fog-free image based on the transmittance and the global background light, and obtain the restored image after the fog-free image is enhanced end-to-end by the pre-trained enhancement network; a localization module configured to identify defect areas based on the feature maps obtained after polarization self-attention processing of the restored image; and a detection module configured to estimate the camera pose and a sparse point cloud based on the restored image, thereby generating an optical point cloud, construct a 3D model of the underwater facility after point cloud registration and fusion of the optical point cloud and the sonar point cloud, and measure the geometric parameters of the defect area based on the 3D model.
[0013] In a third aspect, the present invention provides an electronic device including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the method described in the first aspect.
[0014] In a fourth aspect, the present invention provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method described in the first aspect.
[0015] In a fifth aspect, the present invention provides a computer program product, including a computer program that, when executed by a processor, implements the method described in the first aspect.
[0016] Compared with the prior art, the beneficial effects of the present invention are as follows: This invention innovatively proposes a defect detection method for underwater facilities in pumped-storage power stations that integrates IoT sensing and image recognition. It develops an integrated system for multi-source data acquisition, image enhancement, intelligent defect identification, and 3D precision measurement. A combined underwater image enhancement strategy based on physical models and deep learning is proposed. An improved dark channel prior algorithm combined with a deep learning network is used to restore image quality, solving the problems of color distortion and contrast attenuation in underwater images. A lightweight defect detection network integrating a polarization self-attention mechanism is constructed. A 3D precision measurement technology combining acoustic-optical point cloud cross-modal registration and fusion is designed to achieve pixel-level semantic segmentation and accurate size measurement of typical defects such as cracks, cavitation, and erosion. This reduces the safety risks of underwater manual operations and the overall cost of operation and maintenance, improving the real-time performance of full-domain defect detection in pumped-storage power stations and the overall intelligence and automation level of operations.
[0017] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Brief Description of the Drawings
[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0019] Figure 1 is a flowchart of the method for detecting defects in underwater facilities of a pumped storage power station provided in Embodiment 1 of the present invention; Figure 2 is a comparison chart of the defect detection performance of different test methods provided in Embodiment 1 of the present invention; Figure 3 is a comparison chart of the detection accuracy of various defects provided in Embodiment 1 of the present invention; Figure 4 is a comparison chart of crack width measurement accuracy provided in Embodiment 1 of the present invention; Figure 5 is a diagram showing the accuracy of erosion pit volume measurement provided in Embodiment 1 of the present invention.
Detailed Implementation
[0020] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0021] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0022] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. Furthermore, it should be understood that the terms “comprising” and “including”, and any variations thereof, are intended to cover non-exclusive inclusion, for example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0023] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.
[0024] Embodiment 1. This embodiment provides a method for detecting defects in underwater facilities of pumped storage power stations by fusing IoT sensing with AI image recognition. As shown in Figure 1, the method comprises: acquiring optical images and sonar point clouds of the underwater facilities; for the red channel of the optical image, compensating the red channel pixel values by adaptive weights with the red channel mean as a benchmark, calculating the transmittance in combination with the dark channel map of the optical image, and obtaining a fog-free image based on the transmittance and the global background light; enhancing the fog-free image end-to-end with a pre-trained enhancement network to obtain the restored image; applying polarization self-attention processing to the restored image and identifying the defect region based on the obtained feature map; estimating the camera pose and a sparse point cloud from the restored image to generate an optical point cloud; performing point cloud registration and fusion of the optical point cloud and the sonar point cloud and constructing a 3D model of the underwater facility; and measuring the geometric parameters of the defect region based on the 3D model.
[0025] This embodiment, based on computer vision, 3D point cloud processing, and multi-sensor data fusion theory, constructs an intelligent monitoring system for underwater equipment that combines perception, cognition, and measurement, focusing on the degradation mechanism of underwater optical images and the heterogeneous characteristics of acoustic and optical data. By building a multi-source sensor collaborative detection network and a hierarchical cross-modal registration framework, it solves problems such as the difficulty in extracting features of damaged targets in complex aquatic environments, the low spatiotemporal matching accuracy of multi-source data, and the difficulty in sub-millimeter scale detection. It breaks through the technical bottleneck of not being able to detect or measure accurately in complex aquatic environments, and provides reliable technical support for the safe operation and maintenance of underwater structures in pumped storage power stations.
[0026] Internet of Things (IoT) sensing and multimodal data acquisition.
[0027] This embodiment studies an integrated multi-sensor sensing system designed for remotely operated vehicles (ROVs) operating in the complex underwater environment of pumped storage power stations. The system collects defect data from underwater facilities and acquires multi-source heterogeneous data with spatiotemporal synchronization. It is mounted on an industrial-grade underwater vehicle and provides optical imaging, acoustic detection, and precise attitude control. Employing a distributed information acquisition architecture, it inspects concealed structures such as flow channels, spiral casings, and tailrace pipes.
[0028] The optical sensor employs an industrial-grade binocular camera equipped with blue-green filters, and uses an adjustable LED array to compensate for light attenuation, effectively reducing backscattering interference. The acoustic detection layer is equipped with multiple forward-looking sonar arrays to scan turbid water bodies.
[0029] At the positioning and attitude level, an ultra-short baseline (USBL) underwater positioning system is combined with fiber-optic inertial navigation and a Doppler velocity log (DVL) to achieve high-precision positioning and attitude determination of the ROV. The aims are to achieve spatiotemporal synchronization of multiple sensors, supporting acoustic-optical point cloud registration and fusion in a unified coordinate system; to provide initial camera pose values that assist optical point cloud generation and refraction correction; to correct sonar point cloud motion distortion and improve reconstruction quality; and to support ROV trajectory planning and autonomous inspection.
[0030] All sensors are connected to the main control computer via an Ethernet switch inside the waterproof chamber, and microsecond-level clock synchronization is achieved using the IEEE 1588 precision time protocol.
[0031] Considering the impact of multi-medium refraction (water-glass-air) on imaging geometry, the camera intrinsic parameter calibration adopts a pinhole camera model with refraction compensation. The intrinsic parameter matrix and the distortion coefficients, accounting for radial and tangential distortion, are obtained in an underwater environment using Zhang's checkerboard calibration method. A refractive interface normal vector correction model is established to correct the image point offset.
[0032] The extrinsic calibration of the sonar and the camera adopts a joint calibration method based on known geometric features: a specially made calibration plate (with both optical and acoustic echo features) is placed in the water, and images and sonar point clouds are acquired from multiple viewpoints. The rigid transformation matrix between the corresponding 3D-3D point sets is solved to establish the transformation between the camera coordinate system and the sonar coordinate system. Hand-eye calibration is then used to determine the fixed transformations between the sensor assembly and the ROV body, and a multi-source data fusion framework is constructed in a unified world coordinate system.
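The patent does not state which solver is used for the 3D-3D rigid transformation. One common choice is the SVD-based Kabsch method, sketched below under the assumption that the calibration-plate points seen by the sonar and by the camera have already been matched row by row. The function name and argument layout are hypothetical.

```python
import numpy as np

def rigid_transform_3d(src, dst):
    """Least-squares R, t such that dst ~= R @ src + t (rows are points).

    Kabsch/SVD solution; an illustrative stand-in for the patent's
    unspecified 3D-3D solver.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With noisy measurements the result is the least-squares fit; at least three non-collinear correspondences are needed for the rotation to be determined.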
[0033] Based on the structural characteristics of the pumped storage power station's flow channel, a spiral-upward rotational trajectory planning method was designed to ensure the ROV moves at a constant speed along the channel centerline. At the same time, the optical camera and the sonar sample in an alternating manner to guarantee that the point cloud density meets the requirements of subsequent 3D reconstruction. During image acquisition, the turbidity, illumination, and flow velocity of the water are monitored in real time, and the system switches to acoustic imaging when the optical image quality deteriorates. All original datasets (including sonar point cloud data, image sequences, IMU data, and positioning data) are saved in ROS bag format, and the integrity of the original dataset is verified, ensuring the quality of the input data and providing a reliable foundation for subsequent multimodal fusion and 3D reconstruction.
[0034] Underwater image enhancement and quality restoration.
[0035] Because suspended particles scatter light and the light absorption characteristics of the water body differ from those of clear water, the acquired images suffer from color shift, reduced contrast, and blurred details, which affects the accuracy of subsequent defect detection. Therefore, this embodiment combines physical principles with deep learning technology to achieve refraction correction, color restoration, and detail enhancement, laying the foundation for subsequent development of correction technologies for uneven illumination and significant turbidity changes in pumped storage facilities.
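[0036] The acquired optical image $I(x)$ is represented as the superposition of a direct component $D(x)$ and a backscattering component $B(x)$: $I(x) = D(x) + B(x) = J(x)\,t(x) + A\,(1 - t(x))$ (1); where $J(x)$ is the scene radiance, $t(x)$ is the transmittance, and $A$ is the global background light.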
[0037] To eliminate geometric distortion caused by refraction at the water-glass-air interface, Snell's law was applied to establish a new mapping relationship for pixel coordinate coefficients, and specific corrections were made for the radial distortion parameters of the lens. Furthermore, when traditional dark channel prior algorithms are directly applied to underwater images, color distortion often occurs due to the rapid attenuation of the red channel signal.
[0038] Therefore, this embodiment proposes an improved Dark Channel Prior (IDCP) algorithm, which adds an adaptive compensation method for the red channel on the basis of the traditional dark channel prior to optimize transmittance estimation.
[0039] Specifically: (1) First, calculate the dark channel map of the optical image: take the minimum over the three RGB channels at each pixel of the optical image, then apply minimum filtering over a 15×15 local window to obtain the dark channel image.
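Step (1) can be sketched as follows. This is a straightforward, unoptimized NumPy version that assumes an H×W×3 floating-point image; the function name and the edge-replicating border handling are assumptions, since the patent only specifies the per-pixel RGB minimum and the 15×15 minimum filter.

```python
import numpy as np

def dark_channel(image, window=15):
    """Dark channel of an H x W x 3 image.

    Per-pixel minimum over the RGB channels, followed by a minimum
    filter over a window x window neighbourhood.
    """
    image = np.asarray(image, dtype=float)
    min_rgb = image.min(axis=2)
    pad = window // 2
    # Replicate border pixels so the window is defined at the edges
    # (border handling is an assumption, not stated in the source).
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + window, j:j + window].min()
    return out
```

The double loop keeps the logic explicit; a production pipeline would typically use an erosion routine from an image-processing library instead.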
[0040] (2) For the red channel of the optical image, the pixel values of the red channel are compensated using adaptive weights, with the red channel mean as the benchmark; then, the transmittance is calculated by combining the dark channel map of the optical image: [formula (2) not reproduced]. The quantities in the formula are: the estimated transmittance; the dehazing coefficient, which lies between 0 and 1 and is typically set to 0.95 so that a small amount of haze is retained to preserve the naturalness and depth of field of the image; the red-channel adaptive weight, which adjusts the compensation intensity according to the degree of red-channel attenuation (the more severe the attenuation, the larger the weight and the stronger the compensation; the lighter the attenuation, the smaller the weight, which avoids overcompensation); the red-channel mean; a local window centered at pixel x; the original red-channel pixel value at pixel x; the original pixel value of the c-th channel at pixel y; and the estimated global background light (atmospheric light) for the c-th channel.
[0041] The directly estimated transmittance may be coarse; methods such as guided filtering are typically used to smooth and refine it to obtain a more accurate transmittance $t(x)$.
[0042] (3) The global background light $A$ is estimated using an adaptive strategy based on the brightest 0.1% of pixels in the dark channel image. That is, the brightest 0.1% of pixels in the dark channel image are selected (representing the area with the densest haze), their positions are mapped back to the original foggy image $I$, and the pixel value with the highest brightness among these positions is taken as the estimate of $A$, so as to avoid interference from white objects or bright areas.
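A minimal sketch of step (3) is given below. It assumes the dark channel has already been computed and defines "brightness" as the sum of the three channel values, which is an assumption; the patent's adaptive partitioning details are not specified, so only the top-0.1% selection is illustrated.

```python
import numpy as np

def estimate_background_light(image, dark, fraction=0.001):
    """Estimate the global background light A.

    image    : H x W x 3 original (hazy) image
    dark     : H x W dark channel of the same image
    fraction : share of brightest dark-channel pixels to consider
    """
    image = np.asarray(image, dtype=float)
    dark = np.asarray(dark, dtype=float)
    n = max(1, int(dark.size * fraction))
    # Indices of the n largest dark-channel values (order not needed).
    idx = np.argpartition(dark.reshape(-1), -n)[-n:]
    candidates = image.reshape(-1, 3)[idx]
    # Brightness is taken as the channel sum (assumed definition).
    return candidates[candidates.sum(axis=1).argmax()]
```

Selecting candidates through the dark channel rather than raw brightness is what keeps isolated bright or white objects from being mistaken for the background light.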
[0043] (4) Finally, by solving the above model, the fog-free image $J(x)$ is obtained: $J(x) = \frac{I(x) - A}{\max\left(t(x),\, t_0\right)} + A$ (3); where $t_0$ is the lower limit threshold for the transmittance, typically set to 0.1, to avoid excessive noise in the restored image when $t(x)$ is too small.
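Step (4) can be illustrated with the standard inversion of the imaging model used in dark-channel-prior dehazing, J = (I − A) / max(t, t0) + A. The sketch below assumes that form, a per-pixel transmittance map, and images scaled to [0, 1]; the function name and the final clipping are illustrative choices.

```python
import numpy as np

def recover_scene(image, transmission, background, t0=0.1):
    """Invert I = J * t + A * (1 - t) for J, with t floored at t0.

    image        : H x W x 3 hazy image in [0, 1]
    transmission : H x W transmittance map
    background   : length-3 global background light A
    """
    image = np.asarray(image, dtype=float)
    background = np.asarray(background, dtype=float)
    # Floor the transmittance so that noise is not amplified without bound.
    t = np.maximum(np.asarray(transmission, dtype=float), t0)[..., np.newaxis]
    restored = (image - background) / t + background
    # Clipping to the valid range is an added convenience, not from the source.
    return np.clip(restored, 0.0, 1.0)
```

For example, with A = (0.1, 0.6, 0.7), t = 0.5 and a true scene colour of (0.3, 0.5, 0.6), the hazy pixel is (0.2, 0.55, 0.65) and the function returns the original colour.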
[0044] Physical modeling methods still suffer from detail loss in complex aquatic environments. Therefore, this embodiment introduces an underwater image enhancement network (Water-Net) for end-to-end enhancement. This network adopts an encoder-decoder structure. The encoder captures degradation features at different levels through multi-scale feature extraction, while the decoder uses skip connections to fuse shallow texture information with deep semantic information.
[0045] The loss function $\mathcal{L}$ is designed as the weighted sum of the $L_1$ loss, the structural similarity (SSIM) loss, and the perceptual loss: $\mathcal{L} = \lambda_1 \mathcal{L}_{1}(\hat{J}, J) + \lambda_2 \mathcal{L}_{SSIM}(\hat{J}, J) + \lambda_3 \mathcal{L}_{perc}(\phi(\hat{J}), \phi(J))$ (4); where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are respectively the weighting coefficients of the $L_1$ loss, the structural similarity loss, and the perceptual loss; $\hat{J}$ is the restored image output by the model (i.e., the predicted image obtained after network dehazing / enhancement); $J$ is the ground truth, i.e., the fog-free / clear reference image corresponding to the input foggy / degraded image; and $\phi$ is the high-level feature extractor of a pre-trained VGG network, used to extract high-level semantic features of images to measure their visual perception quality.
[0046] During training, the output of the physical model is used as the initial input to the network, forming a joint optimization strategy of physical guidance and data drive. The final output is a high-quality restored image with natural colors and clear edges, providing a reliable data foundation for subsequent defect detection.
[0047] A defect detection network based on an improved YOLO.
[0048] This embodiment takes pumped storage power stations as the research object. To address the large scale variation, blurred boundaries, and small number of samples of the defects, it studies a lightweight defect detection network based on the polarization self-attention mechanism and the Wise-IoU loss to achieve accurate identification and pixel-level segmentation of typical defects (such as cracks, cavitation, and pits).
[0049] Using YOLOv8-seg as a benchmark, a polarized self-attention (PSA) mechanism is embedded after the C2f module of the backbone network to establish an enhanced feature extraction framework.
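[0050] The PSA module encodes the channel branch and the spatial branch in parallel to calculate the channel attention weights $A_{ch} \in \mathbb{R}^{C \times 1 \times 1}$ and the spatial attention weights $A_{sp} \in \mathbb{R}^{1 \times H \times W}$, where $H$ is the height, $W$ is the width, and $C$ is the number of channels; its output feature map $Y$ is represented as: $Y = A_{ch} \odot X + A_{sp} \odot X$ (5); where $X$ is the input feature map, i.e., the feature map obtained after processing the restored image through the backbone network (convolution / C2f, etc.), and $\odot$ denotes element-wise multiplication with broadcasting.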
[0051] This mechanism suppresses interference from complex flow channel backgrounds and enhances the detection performance for low-contrast defects (such as cracks). The neck employs a bidirectional feature pyramid network (BiFPN) to integrate multi-scale feature information, thereby improving the detection capability for small defects such as minute holes. The detection head adopts a decoupled design, with the classification and regression branches optimized independently. Finally, pixel-level prediction of the defect region is achieved by adding a segmentation branch (mask head). The output is the defect region identification result, whose dimensions [dimension not specified] depend on the number of defect categories.
[0052] To address the inaccurate localization caused by blurred edges in underwater images, the Wise-IoU (WIoU) loss is introduced to replace the standard CIoU (Complete Intersection over Union) loss. WIoU reduces the gradient contribution of low-quality samples through a dynamic focusing mechanism, and is defined as follows: $\mathcal{L}_{WIoU} = r \cdot \mathcal{R}_{WIoU} \cdot \mathcal{L}_{IoU}$ (6); $\mathcal{R}_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{*}}\right)$ (7); where $(x, y)$ and $(x_{gt}, y_{gt})$ are the center coordinates of the predicted bounding box and the ground truth bounding box, respectively; $W_g$ and $H_g$ are the width and height of the minimum enclosing box; the superscript $*$ indicates the gradient truncation operation; $r$ is the dynamic focusing weighting factor; and $\mathcal{L}_{IoU}$ is the loss based on the ratio of intersection to union.
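The following plain-Python sketch evaluates a Wise-IoU style loss for a single pair of axis-aligned boxes, assuming the commonly published form r · exp(center distance² / enclosing-box diagonal²) · (1 − IoU). The dynamic focusing factor `r` is passed in as a number because the patent does not give its formula, and the box format `(x1, y1, x2, y2)` is an assumption. With plain floats there is no gradient, so the truncation operation has no counterpart here; in an autodiff framework the denominator would be detached.

```python
import math

def wiou_loss(pred, target, r=1.0):
    """Wise-IoU style loss for two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area. r is the dynamic focusing
    factor, supplied by the caller (its definition is not reproduced).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    # Width and height of the smallest box enclosing both boxes.
    wg = max(px2, tx2) - min(px1, tx1)
    hg = max(py2, ty2) - min(py1, ty1)
    # Offset between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    penalty = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r * penalty * (1.0 - iou)
```

The exponential term grows with the center offset, so two predictions with equal overlap are ranked by how well their centers align with the ground truth.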
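[0053] The segmentation branch is optimized using a combination of Dice loss and binary cross-entropy (BCE) loss: $\mathcal{L}_{seg} = \lambda_{Dice}\,\mathcal{L}_{Dice}(M, \hat{M}) + \lambda_{BCE}\,\mathcal{L}_{BCE}(M, \hat{M})$ (8); where $M$ is the ground-truth mask and $\hat{M}$ is the predicted mask; $\lambda_{Dice}$ and $\lambda_{BCE}$ are the balance coefficients.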
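A minimal NumPy sketch of a combined Dice and BCE mask loss is shown below. The equal default weights, the smoothing constant `eps`, and the function name are assumptions for illustration; the patent does not give the values of the balance coefficients.

```python
import numpy as np

def dice_bce_loss(pred, target, w_dice=0.5, w_bce=0.5, eps=1e-7):
    """Weighted sum of Dice loss and binary cross-entropy.

    pred   : predicted mask probabilities in [0, 1]
    target : ground-truth binary mask of the same shape
    """
    # Clip probabilities so the logarithms stay finite.
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)).mean()
    return w_dice * dice + w_bce * bce
```

Dice loss responds to region overlap and is less sensitive to the imbalance between thin cracks and large backgrounds, while BCE supplies a per-pixel gradient, which is the usual motivation for combining the two.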
[0054] A transfer learning strategy was adopted, first pre-training on a public dataset, then fine-tuning using similar datasets in areas such as bridge cracks and dam damage, and finally fine-tuning on a self-built dataset for pumped storage power stations. Data augmentation employed a hybrid strategy of Mosaic concatenation and MixUp to improve the model's generalization ability.
[0055] Cross-modal 3D reconstruction and fusion.
[0056] This embodiment aims to study the high-precision 3D positioning and geometric parameter quantification of underwater structural defects. It proposes to construct a defect 3D modeling method with absolute scale characteristics through point cloud registration, complementary fusion, and surface reconstruction.
[0057] For the optical image sequence, i.e., the restored images after dehazing and enhancement, the incremental Structure from Motion (SfM) algorithm is used to estimate the camera poses and a sparse point cloud of the underwater facility.
[0058] The core function of the SfM algorithm is to recover the three-dimensional structure of the scene from the image sequence. In this embodiment, the research object is the underwater facility of the pumped storage power station (which has defects such as cracks, cavitation, and pits). Therefore, the image sequence is taken of the surface of the underwater facility, and the recovered sparse point cloud is the three-dimensional surface structure point cloud of the underwater facility.
[0059] Considering the bending of imaging rays caused by refraction in the water medium, a refraction surface normal vector constraint is introduced in the bundle adjustment to optimize the reprojection error function: (9); In the formula, the symbols denote, in order: the camera extrinsic parameters P, consisting of a rotation matrix and a translation vector; the three-dimensional point coordinates; the projection function that accounts for refraction, in which n and d are the parameters of the refraction plane; the Huber robust kernel function; the image feature point coordinates of the j-th 3D point observed in the i-th frame; and the weighting coefficient of the refraction constraint term. The set of all three-dimensional points, together with the camera extrinsic parameters P, serves as the optimization variables of the bundle adjustment, which ultimately outputs a sparse 3D point cloud of the scene.
[0060] After sparse reconstruction, a multi-view stereo (MVS) algorithm is used for dense matching to generate a high-density optical point cloud.
[0061] Motion distortion correction is performed on the sonar point cloud data acquired by the multibeam forward-looking sonar: velocity information from the Doppler velocity log (DVL) is used to compensate for the point cloud trailing caused by ROV motion. A uniform sonar point cloud is then obtained through statistical outlier removal (SOR) filtering and voxel grid downsampling.
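The two cleanup steps named above, statistical outlier removal and voxel grid downsampling, can be sketched in a few lines of plain Python. This is a brute-force illustration under stated assumptions rather than the implementation used in the embodiment: the neighbor count `k`, the ratio `std_ratio`, and the voxel size are hypothetical parameters, and the DVL motion compensation is not shown.

```python
import math
from collections import defaultdict


def sor_filter(points, k=3, std_ratio=1.0):
    """Keep points whose mean distance to their k nearest neighbors
    does not exceed (global mean + std_ratio * global std)."""
    mean_d = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d.append(sum(dists[:k]) / k)
    mu = sum(mean_d) / len(mean_d)
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mean_d) / len(mean_d))
    limit = mu + std_ratio * sigma
    return [p for p, m in zip(points, mean_d) if m <= limit]


def voxel_downsample(points, voxel=0.1):
    """Replace all points that fall in the same voxel by their centroid."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(c / voxel) for c in p)].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in cells.values()]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
             (0.1, 0.1, 0.0), (0.05, 0.05, 0.0), (5.0, 5.0, 5.0)]
    print(voxel_downsample(sor_filter(cloud), voxel=1.0))
```

The brute-force neighbor search is quadratic in the number of points; a spatial index such as a k-d tree would be the usual choice for real sonar frames.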
[0062] Due to the significant geometric and density differences between the optical and sonar point clouds, a PointNet++-based feature learning network is used for global coarse registration. This network obtains global descriptors of the point clouds through hierarchical feature extraction, establishes initial correspondences using approximate nearest neighbor search, and solves for the coarse transformation matrix.
[0063] Specifically: (1) Hierarchical global feature extraction.
[0064] To address the significant differences in geometry and density between the optical and sonar point clouds, the PointNet++ network is first employed to perform hierarchical feature learning on both: the Farthest Point Sampling (FPS) algorithm selects key points layer by layer so that the sampled points uniformly cover the global structure; a ball query builds a local neighborhood around each key point to capture local geometric context; and a mini-PointNet aggregates the local neighborhood features, which are passed through multiple levels to generate, for each point, a high-dimensional geometric descriptor that accounts for both local detail and global topology, providing a robust feature basis for subsequent matching.
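The farthest point sampling step can be illustrated with the short sketch below. It covers only the sampling stage, not the ball query or the learned feature aggregation, and the function name and the choice of index 0 as the starting point are assumptions made for the example.

```python
import math


def farthest_point_sampling(points, m):
    """Return the indices of m points chosen by farthest point sampling.

    Starts from index 0 and repeatedly adds the point that is farthest
    from the set selected so far.
    """
    selected = [0]
    # Distance from every point to its nearest already-selected point.
    min_d = [math.dist(p, points[0]) for p in points]
    while len(selected) < m:
        idx = max(range(len(points)), key=lambda i: min_d[i])
        selected.append(idx)
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], math.dist(p, points[idx]))
    return selected


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(farthest_point_sampling(pts, 3))
```

Because each new key point maximizes its distance to the points already chosen, sparse regions of a sonar cloud are covered as reliably as dense regions of an optical cloud.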
[0065] (2) Establishing the initial correspondences.
[0066] Based on the extracted global descriptors, initial point-pair correspondences are constructed through approximate nearest neighbor search: using the feature vector of each point in the source point cloud as the query, the nearest point in feature space is retrieved from the feature library of the target point cloud with the Approximate Nearest Neighbor (ANN) algorithm. The Euclidean distance d between the two feature vectors is calculated, a similarity threshold τ is set, and only point pairs with d < τ are retained, which filters out obvious mismatches and yields the initial set of corresponding points.
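The thresholded matching step can be sketched as follows. For clarity the example uses an exact brute-force nearest neighbor search as a stand-in for the ANN algorithm, which is an assumption of this sketch; the function name and the toy two-dimensional descriptors are likewise hypothetical.

```python
import math


def match_descriptors(src_feats, tgt_feats, tau):
    """For each source descriptor, find the nearest target descriptor and
    keep the pair only if the feature distance d is below tau."""
    pairs = []
    for i, f in enumerate(src_feats):
        j, d = min(((j, math.dist(f, g)) for j, g in enumerate(tgt_feats)),
                   key=lambda item: item[1])
        if d < tau:
            pairs.append((i, j, d))
    return pairs


if __name__ == "__main__":
    src = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    tgt = [(1.1, 1.0), (0.0, 0.1)]
    for i, j, d in match_descriptors(src, tgt, tau=0.5):
        print(i, j, round(d, 3))
```

The third source descriptor has no target within τ and is dropped, which is the behavior that removes obvious mismatches before the transformation is solved.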
[0067] (3) Solving for the coarse transformation matrix.
[0068] For the initial set of corresponding points, a robust rigid body transformation method is used to obtain the coarse registration transformation matrix $T_0$. Centroid removal: calculate the centroids $\bar{p}$ and $\bar{q}$ of the corresponding points in the source and target point clouds, respectively, and subtract them from the point coordinates to obtain $p_i' = p_i - \bar{p}$ and $q_i' = q_i - \bar{q}$. Covariance matrix construction and SVD: construct the covariance matrix $H = \sum_i p_i' q_i'^{T}$, perform the singular value decomposition $H = U \Sigma V^{T}$, and calculate the rotation matrix $R = V U^{T}$; if det(R) < 0, it is corrected to $R = V \, \mathrm{diag}(1, 1, -1) \, U^{T}$ to avoid a reflection. Translation vector calculation: solve for the translation vector $t = \bar{q} - R \bar{p}$; combining the results gives the rigid body transformation matrix $T_0 = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$. Robustness optimization: within the RANSAC framework, three non-collinear point pairs are randomly sampled in each iteration to solve a candidate transformation, the number of inliers is counted, and the transformation with the most inliers is selected as the final coarse registration result, which effectively suppresses the interference of initial mismatches.
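The SVD-based solution and the RANSAC loop described in this paragraph can be sketched with NumPy as below. This is a minimal illustration, not the embodiment's code: the function names, the iteration count, the inlier tolerance, and the fixed random seed are all assumptions chosen for the example.

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~ R @ src + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_rigid(src, dst, iters=200, inlier_tol=0.05, seed=0):
    """Pick the 3-pair candidate transform with the most inliers."""
    rng = np.random.default_rng(seed)
    best_R, best_t, best_inliers = None, None, -1
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        a, b, c = src[idx]
        if np.linalg.norm(np.cross(b - a, c - a)) < 1e-9:
            continue                              # skip collinear samples
        R, t = rigid_transform(src[idx], dst[idx])
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        n_in = int((resid < inlier_tol).sum())
        if n_in > best_inliers:
            best_R, best_t, best_inliers = R, t, n_in
    return best_R, best_t, best_inliers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.random((10, 3))
    dst = src + np.array([0.5, -0.2, 1.0])
    dst[:2] += 5.0                                # two gross mismatches
    print(ransac_rigid(src, dst)[2])
```

A common refinement, not shown here, is to re-estimate the transformation from all inliers of the winning candidate before handing it to the fine registration stage.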
[0069] In this embodiment, starting from the coarse registration, a point-to-plane ICP (Iterative Closest Point) algorithm is used for fine registration to minimize the distance between the source point cloud and the local tangent plane of the target point cloud: $E(T) = \sum_i \left( (T p_i - q_i) \cdot n_i \right)^2$ (10); where $p_i$ is a point in the optical point cloud, $q_i$ is its nearest neighbor in the sonar point cloud, and $n_i$ is the normal vector at $q_i$; T is the rigid body transformation matrix to be solved in the iterative optimization, with its initial value taken from the coarse registration stage. T is updated step by step by iteratively minimizing the objective function.
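To make the point-to-plane objective concrete, the sketch below evaluates it for a given rotation, translation, and set of matched points with target normals. It shows only the cost evaluation; the nearest neighbor search and the update of T in each ICP iteration are omitted, and the function name and argument layout are assumptions.

```python
def point_to_plane_error(R, t, src, tgt, normals):
    """Sum of squared distances from transformed source points to the
    tangent planes at their matched target points."""
    total = 0.0
    for p, q, n in zip(src, tgt, normals):
        # Apply the rigid transform: tp = R @ p + t.
        tp = [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]
        # Project the residual onto the target normal.
        total += sum((tp[k] - q[k]) * n[k] for k in range(3)) ** 2
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    src = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
    tgt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
    print(point_to_plane_error(identity, (0.0, 0.0, 0.0), src, tgt, normals))
```

Only the component of the residual along the normal is penalized, so a source point may slide within the tangent plane at no cost. This is one reason the point-to-plane form tends to converge faster than point-to-point ICP on smooth surfaces such as channel walls.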
[0070] In this embodiment, after registration, the optical and sonar dual-modal point clouds are fused based on the Bayesian minimum mean square error (MMSE) framework. The optical point cloud provides high-density surface details, while the sonar point cloud provides absolute scale and large-scale structural information. The fused point cloud is generated through uncertainty weighting.
[0071] Specifically: Uncertainty modeling: for the optical point cloud, the positional uncertainty is quantified by the reprojection error of the SfM/MVS reconstruction, the camera pose covariance, and local geometric consistency (point density, curvature, and normal vector variance). For the sonar point cloud, the positional uncertainty is quantified by the beam ranging and angle noise, the positioning and attitude error, and the local plane fitting residual.
[0072] Uncertainty-weighted approach: Under a unified coordinate system, for each point in the spatially overlapping region, the fusion weight is calculated using the reciprocal of the uncertainty as the confidence level.
[0073]
[0074]
[0075] Fused point cloud generation: a weighted average of the bimodal point coordinates yields a fused point cloud that combines high-precision detail with global scale consistency.
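One common way to realize uncertainty weighting of this kind is inverse-variance weighting, which is the MMSE estimate for two independent Gaussian measurements of the same point. The sketch below assumes that form; treating the uncertainty as a scalar variance per point, and the function and parameter names, are assumptions of the example and not details stated in this embodiment.

```python
def fuse_points(p_opt, var_opt, p_son, var_son):
    """Fuse one optical point and one sonar point of the same surface
    location, weighting each by the reciprocal of its variance."""
    inv_opt, inv_son = 1.0 / var_opt, 1.0 / var_son
    w_opt = inv_opt / (inv_opt + inv_son)
    w_son = 1.0 - w_opt
    fused = tuple(w_opt * a + w_son * b for a, b in zip(p_opt, p_son))
    fused_var = 1.0 / (inv_opt + inv_son)   # never larger than either input
    return fused, fused_var


if __name__ == "__main__":
    point, var = fuse_points((1.0, 0.0, 0.0), 1.0, (2.0, 0.0, 0.0), 3.0)
    print(point, var)
```

The weights sum to one, the more certain modality dominates, and the fused variance is smaller than either input variance, which is the sense in which the two modalities complement each other.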
[0076] Finally, the Poisson surface reconstruction algorithm is used to construct a continuous 3D surface model of the underwater facility, providing a unified spatial reference for measuring the geometric parameters of defects. The 2D edge contour of each defect is extracted accurately with a subpixel contour extraction method and then mapped onto the 3D surface model using the camera poses and SfM results, so that defect geometric parameters such as crack length, erosion depth, and erosion range can be measured in three dimensions.
[0077] Experimental design.
[0078] To test the performance of the method in this embodiment, a mixed-flow (Francis) turbine flow channel was constructed for experiments. The channel was approximately 4.5 meters wide and 35 meters long, and the test environment simulated conditions such as concrete aging, steel lining corrosion, and sediment adhesion to the channel walls. The test depth covered a range of 15 to 45 meters, and different turbidity and illumination conditions were set within the channel. The test setup included an industrial-grade binocular camera, a forward-looking multibeam sonar, and a composite navigation system integrating an ultra-short baseline underwater acoustic positioning system and fiber optic inertial reference units (IRUs).
[0079] The algorithm validation dataset was built during three months of field observations and includes 12,450 optical images and 580 sets of sonar point cloud data. The dataset was jointly calibrated by professionals to form a complete dataset. The Multi-modal Sensing YOLO (MS-YOLO) algorithm was compared with the conventional YOLOv8-seg, Mask R-CNN, and single-channel detection methods to validate its effectiveness.
[0080] Defect detection and segmentation performance was evaluated using precision, recall, F1 score, and mean average precision (mAP@0.5 is the mean average precision at an IoU threshold of 0.5, and mAP@0.5:0.95 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95). 3D reconstruction accuracy was measured by the relative translation error (RTE), the relative rotation error (RRE), and the point cloud distance with respect to a terrestrial 3D laser scan. Control experiments were conducted using artificial cracks and standard erosion samples. Dimensional measurement accuracy was characterized by the mean absolute error (MAE), root mean square error (RMSE), and maximum absolute error, and was evaluated through multiple independent inspections.
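The three dimensional-accuracy metrics named above can be computed as in the following sketch. The function name is hypothetical, and the numbers in the usage example are made-up illustrative values, not measurements from this embodiment.

```python
import math


def measurement_errors(measured, reference):
    """Return (MAE, RMSE, maximum absolute error) of measured values
    against reference values of the same length."""
    abs_err = [abs(m - r) for m, r in zip(measured, reference)]
    mae = sum(abs_err) / len(abs_err)
    rmse = math.sqrt(sum(e * e for e in abs_err) / len(abs_err))
    return mae, rmse, max(abs_err)


if __name__ == "__main__":
    # Illustrative values only (e.g. crack widths in mm).
    mae, rmse, max_err = measurement_errors([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
    print(mae, rmse, max_err)
```

RMSE is never smaller than MAE, and a large gap between the two indicates that a few measurements carry most of the error, which is why the maximum absolute error is reported alongside them.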
[0081] Defect detection performance.
[0082] To comprehensively evaluate the detection performance of the improved YOLO network, quantitative experiments were conducted on the test set. The experiments targeted four underwater defect types (cracks, cavitation pits, erosion grooves, and welding defects) to verify the effectiveness of the PSA attention mechanism and the WIoU loss function. The compared methods include the standard YOLOv8-seg, Mask R-CNN, and ablation variants of YOLOv8-seg for the PSA module and for the WIoU loss function, which quantify the contribution of each improved module. The experiments were also conducted under different turbidity and lighting conditions to evaluate the model's generalization performance.
[0083] Figure 2 presents the defect detection results of each experimental method, including mAP@0.5, mAP@0.5:0.95, boundary F-score, mIoU (mean Intersection over Union), and inference speed (expressed in FPS, frames per second). Figure 3 compares the defect detection accuracy. The method proposed in this embodiment clearly outperforms the compared methods in overall detection accuracy, achieving an mAP@0.5 of 0.947, which is 6.3 percentage points higher than the baseline YOLOv8-seg and 5.6 percentage points higher than Mask R-CNN. The WIoU loss function improves bounding box regression accuracy and reduces center point bias, showing advantages for targets with irregular boundaries such as cavitation pits. The lightweight design allows the model to maintain a real-time inference speed of 42 FPS, meeting the needs of rapid on-site engineering inspections.
[0084] Cross-modal reconstruction and registration accuracy.
[0085] To evaluate the accuracy of cross-modal 3D reconstruction and fusion strategies, this embodiment focuses on examining the quality of optical photogrammetry and sonar point cloud registration, as well as the geometric fidelity of the fused reconstruction. The experiment compares control points with their corresponding ground truth values to evaluate the overall joint optimization effect of PointNet++ coarse registration combined with ICP fine registration. It also analyzes the comprehensive effect of dual-modal complementary fusion on improving reconstruction integrity. Table 1 shows the comparison results of point cloud registration accuracy, intuitively presenting the advantages and disadvantages of different registration methods.
[0086] Table 1. Comparison of point cloud registration accuracy.
[0087] The hierarchical registration strategy proposed in this embodiment significantly improves the registration accuracy and robustness of heterogeneous point clouds. The global features learned by PointNet++ effectively overcome the problem of large initial pose differences, achieving a coarse registration success rate of 94.6%, providing reliable initial values for subsequent ICP fine registration. Ultimately, the relative translation error is reduced to 0.048 meters and the relative rotation error to 0.73 degrees, meeting the requirements of engineering measurements. Compared to the limitations of traditional ICP which relies on good initial values, the method proposed in this embodiment demonstrates stronger adaptability in scenarios where optical data is missing due to variations in turbidity. Table 2 shows a comparison of 3D reconstruction quality.
[0088] Table 2. Comparison of 3D reconstruction quality.
[0089] The 3D reconstruction quality assessment validated the advantages of cross-modal fusion. Pure optical SfM exhibited rich detail in clear water (density 4850 pts/m²), but its completeness was insufficient (87.4%), with data gaps appearing in particular on featureless surfaces; pure sonar reconstruction showed nearly complete coverage (98.2%) but rough geometry (RMSE 34.7 mm). Through uncertainty-weighted complementary fusion, the fusion method of this embodiment maintains a high completeness of 96.8% while reducing the root mean square error to 6.8 mm at a density of 3680 pts/m², achieving synergistic optimization of optical accuracy and sonar coverage.
[0090] Dimensional measurement accuracy.
[0091] To verify the accuracy of quantitative measurement of defect geometric parameters, this embodiment conducts standard sample control experiments and field repeatability tests, focusing on evaluating the measurement accuracy and stability of key indicators such as crack width and erosion pit volume. Artificially pre-fabricated cracks and machined standard samples are placed at the detection depth in the experiment. Contact measurement results are used as the true value benchmark to compare and analyze the sub-millimeter measurement capability of the method in this embodiment.
[0092] Figure 4 compares the crack width measurement accuracy. The fusion measurement method in this embodiment achieves a mean absolute error (MAE) of 0.11 mm and a root mean square error (RMSE) of 0.15 mm in crack width detection, meeting the sub-millimeter measurement requirements of engineering inspection. Error analysis shows that the maximum deviation of 0.28 mm occurs for oblique cracks with an inclination angle exceeding 60°, and is attributed to the distortion of the segmentation mask caused by perspective compression. Further improvement can be achieved through multi-angle fused observation.
[0093] Figure 5 compares the accuracy of erosion pit volume measurement. From the actual volume, the measured volume, and the absolute error between them, it can be seen that the measurement accuracy for large-volume samples (>4000 mm³) is better than that for small-volume samples, which conforms to the error distribution law of point cloud discretization. The measurement error of irregular pits is slightly higher than that of standard spheres, which verifies the adaptability of Poisson surface reconstruction to complex boundaries.
[0094] Because the underwater facilities of pumped storage power stations are subjected to high-speed water flow, cavitation erosion, and chemical corrosion over long periods, defect detection in concealed structures such as flow channels and spiral cases faces technical bottlenecks, including the degradation of underwater optical imaging, the difficulty of heterogeneous fusion of multimodal data, and the difficulty of achieving high-precision dimensional measurement. Traditional manual underwater inspection has limitations such as high safety risk, strong subjectivity, and an inability to quantify defects.
[0095] Therefore, this embodiment proposes a defect detection method for underwater facilities in pumped storage power stations that integrates IoT sensing and AI image recognition, and constructs a complete technical system from multi-source data acquisition, image enhancement, defect identification to three-dimensional precision measurement.
[0096] Specifically, an ROV-based IoT sensing terminal equipped with a high-definition camera, a multibeam forward-looking sonar, and an ultra-short baseline positioning system is deployed to acquire multimodal data simultaneously, including optical images, acoustic point clouds, and pose data. To address color distortion and contrast degradation in underwater images, a combined enhancement strategy based on a physical model and deep learning is proposed: an improved dark channel prior algorithm is combined with the Water-Net deep learning network to restore image quality. A lightweight defect detection network integrating a polarization self-attention mechanism is constructed, and the WIoU loss function is introduced to optimize bounding box regression, achieving pixel-level semantic segmentation of typical defects such as cracks, cavitation, and erosion. A three-dimensional precision measurement technique based on cross-modal registration and fusion of acoustic and optical point clouds is developed. Underwater photogrammetry (SfM) and structured light scanning are used to reconstruct the optical point cloud, and cross-modal registration and fusion of the sonar and optical point clouds is achieved based on PointNet++. This enables 3D modeling with absolute scale, so that geometric parameters such as crack dimensions and erosion depth can be determined accurately.
[0097] Experimental results show that the proposed method achieves an average defect detection accuracy (mAP@0.5) of 94.7% in complex aquatic environments, while the traditional YOLOv8 method only achieves an average detection accuracy (mAP@0.5) of 88.4%. In crack width measurement, the mean absolute error (MAE) is 0.11 mm, the root mean square error (RMSE) is 0.15 mm, and the RMSE for 3D underwater structure reconstruction reaches 6.8 mm. This effectively solves the technical bottleneck of undetectable and inaccurate underwater facility measurements. It provides reliable technical support for the safe operation and maintenance of underwater structures in pumped storage power stations, significantly improves the automation level and quantitative assessment accuracy of defect identification, and has significant engineering application value and promising prospects for widespread application.
[0098] Embodiment 2. This embodiment provides a defect detection system for underwater facilities of a pumped storage power station, including: an acquisition module, configured to acquire optical images and sonar point clouds of the underwater facilities; an enhancement module, configured to compensate the red channel pixel values of the optical image through adaptive weights with the red channel mean as a reference, calculate the transmittance in combination with the dark channel map of the optical image, obtain a fog-free image based on the transmittance and the global background light, and obtain a restored image after the fog-free image is enhanced end to end by a pre-trained enhancement network; a localization module, configured to perform polarization self-attention processing on the restored image and identify defect areas based on the resulting feature map; and a detection module, configured to estimate the camera pose and a sparse point cloud from the restored image to generate an optical point cloud, perform point cloud registration and fusion of the optical point cloud and the sonar point cloud, construct a 3D model of the underwater facility, and measure the geometric parameters of the defect area based on the 3D model.
[0099] It should be noted that the above modules correspond to the steps described in Embodiment 1, and the examples and application scenarios implemented by the above modules and the corresponding steps are the same, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules, as part of the system, can be executed in a computer system such as a set of computer-executable instructions.
[0100] In further embodiments, the following is also provided: An electronic device includes a memory and a processor, as well as computer instructions stored in the memory and running on the processor, wherein the computer instructions, when executed by the processor, perform the method described in Embodiment 1. For brevity, further details are omitted here.
[0101] It should be understood that in this embodiment, the processor can be a central processing unit (CPU), or it can be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor.
[0102] Memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of memory may also include non-volatile random access memory. For example, memory may also store information about the device type.
[0103] A computer-readable storage medium for storing computer instructions, which, when executed by a processor, perform the method described in Embodiment 1.
[0104] The method in Embodiment 1 can be implemented directly by a hardware processor, or by a combination of hardware and software modules within the processor. The software modules can reside in storage media well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, a detailed description is not provided here.
[0105] A computer program product includes a computer program that, when executed by a processor, implements the method described in Embodiment 1.
[0106] The present invention also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium. The computer program product includes computer-executable instructions, such as instructions included in program modules, which execute in a device on a target real or virtual processor to perform the processes / methods described above. Typically, program modules include routines, programs, libraries, objects, classes, components, data structures, etc., that perform specific tasks or implement specific abstract data types. In various embodiments, the functionality of program modules can be combined or divided among program modules as needed. The machine-executable instructions for the program modules can execute within a local or distributed device. In a distributed device, the program modules can reside in both local and remote storage media.
[0107] The computer program code used to implement the methods of the present invention may be written in one or more programming languages. This computer program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the computer or other programmable data processing device, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a computer, partially on a computer, as a stand-alone software package, partially on a computer and partially on a remote computer, or entirely on a remote computer or server.
[0108] In the context of this invention, computer program code or related data may be carried by any suitable carrier to enable a device, apparatus, or processor to perform the various processes and operations described above. Examples of carriers include signals, computer-readable media, and the like. Examples of signals may include electrical, optical, radio, sound, or other forms of propagation signals, such as carrier waves, infrared signals, etc.
[0109] Those skilled in the art will recognize that the units and algorithm steps described in connection with the various examples of this embodiment can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention.
[0110] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.
Claims
1. A method for detecting defects in underwater facilities of a pumped storage power station, characterized in that it comprises: acquiring optical images and sonar point clouds of the underwater facilities; for the red channel of the optical image, compensating the pixel values of the red channel through adaptive weights with the red channel mean as a reference, calculating the transmittance in combination with the dark channel map of the optical image, obtaining a fog-free image based on the transmittance and the global background light, and obtaining a restored image after the fog-free image is enhanced end to end by a pre-trained enhancement network; performing polarization self-attention processing on the restored image and identifying the defect area based on the resulting feature map; and estimating the camera pose and a sparse point cloud from the restored image to generate an optical point cloud, performing point cloud registration and fusion of the optical point cloud and the sonar point cloud, constructing a 3D model of the underwater facility, and measuring the geometric parameters of the defect area based on the 3D model.
2. The method for detecting defects in underwater facilities of a pumped storage power station as described in claim 1, characterized in that the formula for calculating the transmittance is: ; in the formula, the symbols denote, in order: the estimated transmittance; the dehazing coefficient; the red channel adaptive weight; the red channel mean; a local window centered at pixel x; the original pixel value of the red channel at pixel x; the original pixel value of the c-th channel at pixel y; and the estimated global background light value of the c-th channel.
3. The method for detecting defects in underwater facilities of a pumped storage power station as described in claim 1, characterized in that the polarization self-attention processing of the restored image includes: calculating the channel attention weights and the spatial attention weights by encoding the channel branch and the spatial branch in parallel, the output feature map being represented as: ; where the input feature map is the feature map obtained by convolutional processing of the restored image.
4. The method for detecting defects in underwater facilities of a pumped storage power station as described in claim 3, characterized in that, in the process of polarization self-attention processing of the restored image, the Wise-IoU loss is introduced, defined as follows: $L_{WIoU} = R_{WIoU} \, L_{IoU}$; $R_{WIoU} = \exp\left( \frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{*}} \right)$; in the formulas, $(x, y)$ and $(x_{gt}, y_{gt})$ are the center coordinates of the predicted bounding box and the ground truth bounding box, respectively; $W_g$ and $H_g$ are the width and height of the minimum enclosing box; the superscript $*$ indicates the gradient truncation operation; $R_{WIoU}$ is the dynamic focusing weighting factor; and $L_{IoU}$ is the loss based on the intersection-over-union ratio.
5. The method for detecting defects in underwater facilities of a pumped storage power station as described in claim 1, characterized in that, in the process of generating the optical point cloud, a refraction surface normal vector constraint is introduced to optimize the reprojection error function: ; in the formula, the symbols denote, in order: the camera extrinsic parameters, consisting of a rotation matrix and a translation vector; the three-dimensional point coordinates; the projection function that accounts for refraction, in which n and d are the parameters of the refraction plane; the Huber robust kernel function; the image feature point coordinates of the j-th 3D point observed in the i-th frame; the weighting coefficient of the refraction constraint term; and the set of all points in three-dimensional space.
6. The method for detecting defects in underwater facilities of a pumped storage power station as described in claim 1, characterized in that, for the optical and sonar point clouds, a PointNet++-based feature learning network is used for global coarse registration: global descriptors of the point clouds are obtained through hierarchical feature extraction, initial correspondences are established using approximate nearest neighbor search, and the coarse transformation matrix is obtained by solving; based on the coarse registration, a point-to-plane ICP algorithm is used for fine registration to minimize the distance between the source point cloud and the local tangent plane of the target point cloud: $E(T) = \sum_i \left( (T p_i - q_i) \cdot n_i \right)^2$; where $p_i$ is a point in the optical point cloud, $q_i$ is its nearest neighbor in the sonar point cloud, and $n_i$ is the normal vector at $q_i$; T is the rigid body transformation matrix to be solved in the iterative optimization, with its initial value taken from the coarse registration stage, and is updated step by step by iteratively minimizing the objective function; after registration, the optical point cloud and the sonar point cloud are fused based on the Bayesian minimum mean square error framework, and the fused point cloud is generated by uncertainty weighting; finally, the Poisson surface reconstruction algorithm is used to construct a 3D model of the underwater facility, and the 2D edge contour of the defect area is extracted and mapped onto the 3D model to measure the defect geometric parameters.
7. A defect detection system for underwater facilities of a pumped storage power station, characterized in that it comprises: an acquisition module, configured to acquire optical images and sonar point clouds of the underwater facilities; an enhancement module, configured to compensate the red channel pixel values of the optical image through adaptive weights with the red channel mean as a reference, calculate the transmittance in combination with the dark channel map of the optical image, obtain a fog-free image based on the transmittance and the global background light, and obtain a restored image after the fog-free image is enhanced end to end by a pre-trained enhancement network; a localization module, configured to perform polarization self-attention processing on the restored image and identify defect areas based on the resulting feature map; and a detection module, configured to estimate the camera pose and a sparse point cloud from the restored image to generate an optical point cloud, perform point cloud registration and fusion of the optical point cloud and the sonar point cloud, construct a 3D model of the underwater facility, and measure the geometric parameters of the defect area based on the 3D model.
8. An electronic device, characterized in that it comprises a memory and a processor, as well as computer instructions stored in the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the method according to any one of claims 1-6.
9. A computer-readable storage medium, characterized in that it is used to store computer instructions which, when executed by a processor, perform the method according to any one of claims 1-6.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.