A crack image deblurring method based on an improved MPRNet, and an improved MPRNet module

By introducing an atrous spatial pyramid pooling (ASPP) module and an improved MPRNet module, the invention addresses the existing MPRNet's insufficient restoration capability for images of small cracks, achieving higher-quality image restoration and improving the accuracy of the UAV monitoring system.

CN118608424B · Active · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHONGQING UNIV
Filing Date: 2024-06-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

The existing MPRNet has difficulty effectively recovering the blurred images of small cracks when processing concrete crack images captured by drones, resulting in a decrease in image quality and affecting the accuracy of the monitoring system.

Method used

By introducing an Atrous Spatial Pyramid Pooling (ASPP) module that performs multi-scale feature fusion into an improved MPRNet module, and combining it with the ESRGAN module and ORSNet, the ability to restore crack edge details is enhanced through three-level feature extraction and feature transfer.

Benefits of technology

It improves the ability to restore blurred images with small cracks, enhances the effect of image detail restoration, and improves image quality and the accuracy of monitoring systems.



Abstract

This invention discloses a crack image deblurring method based on an improved MPRNet, and an improved MPRNet module, belonging to the field of image processing technology. The crack image deblurring method includes: Step 1, collecting and constructing a crack image dataset; Step 2, constructing an improved MPRNet structure; Step 3, obtaining neural network weights through transfer learning and neural network training; Step 4, inputting a blurred crack image, performing image restoration, and outputting the result. The ASPP module of the improved MPRNet module contains six dilated convolutional layers, with dilation rates of 1, 2, 3, 1, 2, and 3, respectively. Compared with the existing MPRNet, the technical advantage of this invention is that it improves the ability to restore blurred images of small cracks.

Description

Technical Field

[0001] This invention belongs to the field of image processing technology, specifically relating to a method for removing blur from crack images and an improved MPRNet module.

Background Technology

[0002] For building and bridge structures, concrete cracking is one of the most common types of damage. Failure to address this damage promptly can lead to significant economic losses. Therefore, timely identification of concrete crack damage and subsequent maintenance of building and bridge structures has become a crucial issue in the field of civil engineering. Furthermore, with the development of computer vision technology, the integration of drones with computer vision has become a current research hotspot.

[0003] During the acquisition of crack images by drones, strong winds can adversely affect the drones, causing motion blur in the acquired images. Furthermore, the weight of the drone's imaging equipment limits the accuracy of the images. Therefore, improving the quality of drone-captured images, thereby enhancing the practicality of the monitoring system and the accuracy of early warnings, has become an important research topic.

[0004] Deep learning algorithms can effectively solve image restoration problems when the blur kernel is unknown. Based on deep learning, various basic architectures have been developed for deblurring algorithms, including deep encoder-decoder structures, generative adversarial networks (GANs), and multi-scale fusion network structures. Different architectures yield different image quality improvements. According to the paper "MPRNet: Multi-Stage Progressive Image Restoration," Zamir SW, Arora A, Khan S, et al., Proceedings of the IEEE / CVF conference on computer vision and pattern recognition, pp. 14821-14831, 2021, MPRNet can not only remove blur from everyday scene images but also restore images degraded by raindrops on rainy days. Its multi-stage progressive image restoration method improves the model's restoration performance. MPRNet employs three feature extraction paths to extract feature information at different scales. Features in each path are progressively extracted through a series of convolutional layers, and then cross-path feature fusion is performed at different levels. After multi-path processing and feature fusion, the features are integrated and then used to generate a deblurred image through a final convolutional layer and upsampling operation. MPRNet's multi-stage fusion method improves the utilization of image information. However, convolution often reduces the resolution of feature maps, which causes some loss of image information and may reduce the restoration quality for blurred images with small cracks.

Summary of the Invention

[0005] To address the problems of the existing MPRNet, the technical problem to be solved by this invention is to provide a crack image deblurring method based on an improved MPRNet that improves the restoration of blurred images with small cracks. An improved MPRNet module is also provided.

[0006] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:

[0007] This invention provides a crack image deblurring method based on an improved MPRNet, comprising the following steps:

[0008] Step 1: Collect and construct a crack image dataset

[0009] In a crack image dataset containing normal crack images and blurred crack images, a certain number of images are randomly selected as the training set and the test set.

[0010] Step 2: Construct the improved MPRNet structure

[0011] The improved MPRNet includes an ESRGAN module and an improved MPRNet module, with the output of the ESRGAN module being transmitted to the input of the improved MPRNet module;

[0012] Step 3: Obtain neural network weights through transfer learning and neural network training.

[0013] The original ESRGAN module was trained on the DIV2K training set and the Flickr2K dataset, and network weights were generated. These pre-trained weights were then transferred to the ESRGAN module in step 2 to obtain the weights of the ESRGAN module.

[0014] The original MPRNet module was trained and tested using the GoPro dataset. The training model generated training weights, and these pre-trained weights were transferred to the improved MPRNet module in step 2.

[0015] The improved MPRNet was trained and tested using the crack image training set and test set obtained in step 1, respectively, to obtain the improved MPRNet weights.

[0016] Step 4: Input the blurred image of the crack and perform image restoration and output the results.

[0017] The present invention provides an improved MPRNet module, comprising a three-level feature extraction module for feature extraction and feature transmission from bottom to top, an original resolution module ORSNet, and a convolution module; the first level, the first half of the second level, and the third level are all shallow feature extraction modules, which are composed of convolutional layers, channel attention modules CAB, and dilated spatial pyramid pooling ASPP modules;

[0018] The latter half of the first and second levels is the deep information extraction module of the encoder-decoder subnetwork. The outputs of the two encoder-decoder subnetworks and the first and second level images are transmitted to the corresponding supervised attention module (SAM). The output of the first level SAM is concatenated with the output of the second level shallow feature extraction module and used as the input of the second level encoder-decoder subnetwork. The image context information obtained by the first level encoder-decoder subnetwork is input into the second level encoder. The output of the second level SAM is concatenated with the output of the third level shallow feature extraction module and sent to the original resolution module. The image context information obtained by the second level encoder-decoder subnetwork is input into the original resolution module (ORSNet). The feature map output by the original resolution module after passing through the convolution module is added to the third level image to output the deblurred image. The first level image is cut into four small blocks, the second level image is cut into two small blocks, and the third level image is a complete image patch.

[0019] The technical effects of this invention are:

[0020] The improved MPRNet module of this invention introduces an atrous spatial pyramid pooling (ASPP) module based on multi-scale feature fusion to enhance the edge details of cracks, thereby improving MPRNet. Compared with the existing MPRNet, it improves the ability to recover fine cracks in blurred images.

Attached Figure Description

[0021] The accompanying drawings of this invention are described below:

[0022] Figure 1 is a flowchart of the present invention;

[0023] Figure 2 is a diagram of the improved MPRNet structure of the present invention;

[0024] Figure 3 is a structural diagram of the atrous spatial pyramid pooling (ASPP) module;

[0025] Figure 4 is a comparison of the blurred image restoration results of the present invention and the existing MPRNet.

Detailed Implementation

[0026] The present invention will be further described below with reference to the accompanying drawings and embodiments:

[0027] To clearly describe the invention, this patent application uses the directional terms "upper" and "lower" for distinction. The terms "upper" and "lower" are determined based on the arrangement of the above figures. When the actual use direction of the invention changes, the terminology of the orientation will change accordingly, and this should not be regarded as a limitation on the scope of patent protection.

[0028] The technical terms used in this application are:

[0029] Atrous Spatial Pyramid Pooling (ASPP): According to the literature "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", Chen, Liang-Chieh et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, No. 4, pp. 834-848, 2017, the ASPP module can improve the model's ability to acquire multi-scale information.

[0030] The CAB module, encoder, decoder, supervised attention module SAM, and original resolution module (ORSNet): According to the paper "MPRNet: Multi-Stage Progressive Image Restoration", Zamir SW, Arora A, Khan S et al., Proceedings of the IEEE / CVF conference on computer vision and pattern recognition, pp. 14821-14831, 2021, ORSNet, SAM, and encoder-decoder can improve the model restoration performance. The CAB module is used to extract image feature information at various scales.

[0031] Residual-in-Residual Dense Block (RRDB) and ESRGAN model: According to the paper "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks", Wang X, Yu K, Wu S et al., Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018, the residual-in-residual dense block (RRDB) is used to extract deep network features in super-resolution models. ESRGAN is used to improve image resolution.

[0032] The process of this invention, as shown in Figure 1, includes the following steps:

[0033] Step 1: Collect and construct a crack image dataset

[0034] Images of cracks captured by drones were collected. A certain number of normal crack images were randomly selected and blurred images were generated through motion blur convolution. This formed a crack image dataset containing two categories: normal crack images and blurred crack images, with a 1:1 ratio of normal images to blurred images. A certain number of images were randomly selected as the training and test sets, with a training set to test set ratio of 8:2.
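The dataset construction in [0034] can be illustrated with a minimal sketch. It assumes a purely horizontal, uniform motion-blur kernel and a seeded shuffle; the patent does not specify the kernel shape, length, or direction, so `motion_blur_kernel`, `blur_rows`, and `split_dataset` are hypothetical helpers, and the count of 400 items is taken from the example in [0062].

```python
import random

def motion_blur_kernel(length):
    """Horizontal motion-blur kernel: `length` equal weights that sum to 1 (assumed shape)."""
    return [1.0 / length] * length

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the image border."""
    half = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        out_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = min(max(x + k - half, 0), width - 1)  # replicate edge pixels
                acc += weight * row[src]
            out_row.append(acc)
        blurred.append(out_row)
    return blurred

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and test subsets (8:2 by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# A one-pixel-wide line stands in for a thin crack in a single image row.
sharp = [[0, 0, 0, 255, 0, 0, 0]]
blurred = blur_rows(sharp, motion_blur_kernel(3))

# 400 image indices split 8:2 into training and test sets.
train, test = split_dataset(range(400))
```

The blurred row spreads the single bright pixel over three neighbours, which is the loss of edge sharpness the network is later trained to undo. In practice each sharp image and its blurred counterpart would presumably be kept in the same subset.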

[0035] Step 2: Construct the improved MPRNet structure

[0036] As shown in Figure 2, the improved MPRNet includes an ESRGAN module and an improved MPRNet module. The output of the ESRGAN module is transmitted to the input of the improved MPRNet module. The improved MPRNet module includes a three-level feature extraction module that performs feature extraction and feature transmission from bottom to top, an original resolution module ORSNet, and a convolution module. The first level, the first half of the second level, and the third level are all shallow feature extraction modules. The shallow feature extraction modules are composed of convolutional layers, a channel attention module CAB, and a dilated spatial pyramid pooling (ASPP) module.

[0037] The latter half of the first and second levels is the deep information extraction module of the encoder-decoder subnetwork. The outputs of the two encoder-decoder subnetworks and the first and second level images output by the ESRGAN module are transmitted to the corresponding supervised attention module (SAM). The output of the first-level SAM is concatenated with the output of the second-level shallow feature extraction module and used as the input of the second-level encoder-decoder subnetwork. The image context information obtained by the first-level encoder-decoder subnetwork is input into the second-level encoder. The output of the second-level SAM is concatenated with the output of the third-level shallow feature extraction module and sent to the original resolution module. The image context information obtained by the second-level encoder-decoder subnetwork is input into the original resolution module (ORSNet). The feature map output by the original resolution module after passing through the convolution module is added to the third level image output by the ESRGAN module to output the deblurred image.

[0038] In Figure 2, the three levels of images output by the ESRGAN module are as follows: the first level image is cut into four small blocks, the second level image is cut into two small blocks, and the third level image is a complete image patch.

[0039] In Figure 2, the two operator symbols represent feature addition (Add) and feature concatenation (Concatenate), respectively. The Add operation sums two features element by element; the Concatenate operation joins two or more features along a chosen dimension.
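The difference between the two operations can be shown with a small sketch on nested lists laid out as [channel][row][column]. The layout and the helper names `add_features` and `concat_features` are assumptions made for illustration; concatenation is done along the channel dimension, which is the usual choice for feature maps.

```python
def add_features(a, b):
    """Element-wise addition of two feature maps with identical [C][H][W] shape."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]

def concat_features(*maps):
    """Concatenate feature maps along the channel dimension (spatial size must match)."""
    out = []
    for fmap in maps:
        out.extend(fmap)
    return out

a = [[[1, 2], [3, 4]]]          # 1 channel, 2x2
b = [[[10, 20], [30, 40]]]      # 1 channel, 2x2

added = add_features(a, b)      # still 1 channel
stacked = concat_features(a, b) # 2 channels
```

Addition keeps the channel count and mixes the values, whereas concatenation keeps the values and increases the channel count, so a later layer has to reduce the channels again.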

[0040] The improved MPRNet working process is as follows:

[0041] The ESRGAN module uses RRDB to extract feature information and reconstructs the image using upsampling to enhance edge details in the crack image. It calculates the relative distance between the real and generated images as the adversarial loss and also computes the feature map perceptual loss.
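The "relative distance" adversarial loss in [0041] appears to correspond to the relativistic average discriminator described in the ESRGAN paper; that reading is an assumption, since the patent gives no formula. Under it, the discriminator and generator losses can be sketched on raw discriminator outputs (logits) as follows, with all function names chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def d_ra(logits, other_logits):
    """Relativistic average score: how much more realistic `logits` look than the other batch on average."""
    other_mean = mean(other_logits)
    return [sigmoid(value - other_mean) for value in logits]

def discriminator_loss(real_logits, fake_logits):
    """-E[log D_Ra(real, fake)] - E[log(1 - D_Ra(fake, real))]"""
    real_term = mean([math.log(p) for p in d_ra(real_logits, fake_logits)])
    fake_term = mean([math.log(1.0 - p) for p in d_ra(fake_logits, real_logits)])
    return -(real_term + fake_term)

def generator_loss(real_logits, fake_logits):
    """-E[log(1 - D_Ra(real, fake))] - E[log D_Ra(fake, real)]"""
    real_term = mean([math.log(1.0 - p) for p in d_ra(real_logits, fake_logits)])
    fake_term = mean([math.log(p) for p in d_ra(fake_logits, real_logits)])
    return -(real_term + fake_term)

# When the discriminator separates real from generated images well,
# its own loss is small and the generator's loss is large.
loss_d = discriminator_loss([2.0, 2.0], [-2.0, -2.0])
loss_g = generator_loss([2.0, 2.0], [-2.0, -2.0])
```

The perceptual loss mentioned in the same paragraph is computed on feature maps of a pretrained network and is not covered by this sketch.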

[0042] The improved MPRNet employs a three-stage feature extraction module. In the first stage, the input image is divided into four blocks, each one quarter the size of the original image. In the second stage, the input image is divided into two blocks, each half the size of the original image. The third stage does not perform any block division. Each feature extraction module begins with a 3×3 convolutional layer that expands the number of feature channels, followed by a Channel Attention (CAB) module that extracts feature information. After the CAB, a Dilated Spatial Pyramid Pooling (ASPP) module is connected to improve feature information utilization.
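The block division in [0042] can be sketched on a single-channel image stored as a list of rows. The patent does not say along which axes the image is cut; the sketch assumes a top/bottom split for the second stage and a further left/right split of each half for the first stage, and `stage_inputs` is a hypothetical helper name.

```python
def split_rows(image):
    """Split an [H][W] image into its top and bottom halves."""
    half = len(image) // 2
    return image[:half], image[half:]

def split_cols(image):
    """Split an [H][W] image into its left and right halves."""
    half = len(image[0]) // 2
    return [row[:half] for row in image], [row[half:] for row in image]

def stage_inputs(image):
    """Return the patch lists fed to stage 1 (four blocks), stage 2 (two blocks), stage 3 (whole image)."""
    top, bottom = split_rows(image)
    stage2 = [top, bottom]
    stage1 = [*split_cols(top), *split_cols(bottom)]
    stage3 = [image]
    return stage1, stage2, stage3

# A 4x4 test image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
stage1, stage2, stage3 = stage_inputs(image)
```

Working on smaller blocks in the early stages restricts each stage to local context, while the final stage sees the whole image at its original resolution.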

[0043] An encoder-decoder subnetwork is connected after the ASPP modules in the first and second stages to extract deeper semantic information. The encoder contains six CAB modules, with several convolutional layers and downsampling modules in between. The decoder contains eight CAB modules and employs a multi-scale fusion method to construct image features. Simultaneously, the feature information before and after the first-stage decoding is input into the second-stage encoder for multi-scale feature fusion to fully utilize image information. A supervised attention module (SAM) is connected at the end of the first and second stages to improve model performance and connect the output image features with the feature information of the next stage.

[0044] In the third stage, the original resolution module ORSNet is connected, and the feature information extracted by the second-stage decoder and the concatenated feature information from the second and third stages are input to generate a high-resolution feature map. Finally, a 3×3 convolutional layer is connected, and the output feature map is added to the third-stage image output by the ESRGAN module to output the deblurred image.

[0045] Existing atrous spatial pyramid pooling (ASPP) modules typically employ 3 to 4 dilated convolutional layers, with dilation rates usually set to 6, 12, and 18. While larger dilation rates can capture long-range image information, which is beneficial for recognizing large targets, they may be less effective for feature extraction from small targets due to their limited receptive field. Furthermore, larger dilation rates can cause discontinuities in image information acquisition, potentially reducing the neural network's ability to reconstruct fine cracks. This invention's ASPP module employs 6 dilated convolutional layers with dilation rates of 1, 2, 3, 1, 2, and 3, further enhancing the improved MPRNet's ability to reconstruct crack details and resulting in a more complete crack outline.
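The trade-off in [0045] can be made concrete with the standard relation for a dilated kernel: a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side, and its taps are d pixels apart. The sketch below compares the conventional rates with the rates used here; the helper names are illustrative only.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length, in pixels, covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def tap_offsets(kernel_size, dilation):
    """Offsets (relative to the centre pixel) sampled along one axis."""
    half = kernel_size // 2
    return [(i - half) * dilation for i in range(kernel_size)]

conventional = {rate: effective_kernel_size(3, rate) for rate in (6, 12, 18)}
proposed = {rate: effective_kernel_size(3, rate) for rate in (1, 2, 3)}
```

With rate 6 the three taps along an axis sit at offsets −6, 0, and 6, so five pixels between neighbouring taps are never sampled; a crack only a few pixels wide can fall entirely into such a gap. With rates 1 to 3 the gaps are at most two pixels wide.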

[0046] As shown in Figure 3, the ASPP module takes image feature $F_1$ as input. On the right, a 1×1 convolutional layer is connected in parallel and, to improve image information utilization, six 3×3 dilated convolutional layers are added in parallel, with dilation rates of 1, 2, 3, 1, 2, and 3, respectively. Each convolutional layer is followed by a ReLU activation layer, and together they output seven features $F_{11}$ to $F_{17}$. On the left, image feature $F_1$ passes sequentially through a 1×1 pooling layer, a 1×1 convolutional layer, a ReLU activation layer, and an upsampling layer that restores the image size, outputting feature $F_{31}$, which is expressed as follows:

[0047] $F_{31} = \mathrm{Upsample}(\mathrm{ReLU}(\mathrm{Conv}_{1\times 1}(\mathrm{Pool}_{1\times 1}(F_1))))$ (1)

[0048] In equation (1), $\mathrm{Pool}_{1\times 1}$ represents a 1×1 pooling layer, $\mathrm{Conv}_{1\times 1}$ represents a 1×1 convolutional layer, ReLU represents an activation layer, and Upsample represents an upsampling layer.

[0049] To fuse feature information at various scales, the feature maps output from each convolutional layer and the upsampling layer are concatenated to obtain $F_c$, which is expressed as follows:

[0050] $F_c = \mathrm{Concat}(F_{11}, F_{12}, F_{13}, F_{14}, F_{15}, F_{16}, F_{17}, F_{31})$ (2)

[0051] In equation (2), Concat represents dimension concatenation.

[0052] Then, a 1×1 convolutional layer, a ReLU activation layer, and a Dropout layer with a random dropout rate of 0.5 are connected to reduce the number of channels and output the final result $F_{out}$, which is calculated as follows:

[0053] $F_{out} = \mathrm{Dropout}_{0.5}(\mathrm{ReLU}(\mathrm{Conv}_{1\times 1}(F_c)))$ (3)

[0054] In equation (3), $\mathrm{Dropout}_{0.5}$ means that a portion of the channels is randomly discarded with a probability of 0.5.
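Equations (1) to (3) can be traced with a deliberately simplified, single-channel sketch. Several assumptions are made: each 1×1 convolution is reduced to a scalar weight, the 1×1 pooling layer is read as global average pooling to a 1×1 map, upsampling repeats that value over the full size, zero padding keeps every branch at the input size, and Dropout is left out because it is inactive at inference time. All weights are arbitrary illustration values, not trained parameters.

```python
def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def dilated_conv3x3(fmap, weights, dilation):
    """3x3 dilated convolution with zero padding, so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy = y + (ky - 1) * dilation
                    sx = x + (kx - 1) * dilation
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += weights[ky][kx] * fmap[sy][sx]
            out[y][x] = acc
    return out

def pooled_branch(fmap, weight):
    """Eq. (1): pool to 1x1 (assumed global average), 1x1 conv, ReLU, upsample back."""
    h, w = len(fmap), len(fmap[0])
    pooled = sum(map(sum, fmap)) / (h * w)
    value = max(0.0, weight * pooled)
    return [[value] * w for _ in range(h)]

def aspp(f1, conv1x1_w, dilated_ws, pool_w, fuse_ws, rates=(1, 2, 3, 1, 2, 3)):
    """Return the eight branch outputs and the fused output (Dropout omitted)."""
    branches = [relu([[conv1x1_w * v for v in row] for row in f1])]  # 1x1 conv branch
    for weights, rate in zip(dilated_ws, rates):                      # six dilated branches
        branches.append(relu(dilated_conv3x3(f1, weights, rate)))
    branches.append(pooled_branch(f1, pool_w))                        # F31
    h, w = len(f1), len(f1[0])
    # Eq. (2) and (3): a 1x1 conv over concatenated channels is a weighted sum of the branches.
    fused = [[sum(fw * b[y][x] for fw, b in zip(fuse_ws, branches)) for x in range(w)]
             for y in range(h)]
    return branches, relu(fused)

# A single bright pixel shows where each dilation rate samples.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
ones3x3 = [[1.0] * 3 for _ in range(3)]
branches, f_out = aspp(impulse, 1.0, [ones3x3] * 6, 1.0, [1.0 / 8] * 8)
```

In the rate-1 branch the impulse spreads to its immediate neighbours, while in the rate-3 branch it appears three pixels away and skips the positions in between, which illustrates how the different rates contribute different scales to the concatenated feature.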

[0055] Step 3: Obtain neural network weights through transfer learning and neural network training.

[0056] The original ESRGAN module was trained on the DIV2K training set containing 800 images and the Flickr2K dataset (both are widely used open-source datasets), generating network weights. Based on the idea of ​​transfer learning, these pre-trained weights were transferred to the ESRGAN module in step 2, thus obtaining the weights of the ESRGAN module.

[0057] The original MPRNet module was trained and tested using the GoPro dataset (a widely adopted dataset), generating training weights through model training. 2103 images were used for training, and 1111 images were used for testing and model performance evaluation. These pre-trained weights were then transferred to the improved MPRNet module in step 2.

[0058] The improved MPRNet is trained and tested using the crack image training and testing sets obtained in step 1, respectively, to obtain the improved MPRNet weights.
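The weight transfer in [0056] to [0058] can be sketched as copying every pretrained parameter whose name and shape match a parameter of the modified network. The patent does not state how the newly added ASPP layers are initialised; the sketch assumes they keep their initial values and are learned during the crack-image training in [0058]. The parameter names and the `(shape, values)` representation are hypothetical.

```python
def transfer_weights(pretrained, target):
    """Copy entries whose name and shape match; leave all other target entries untouched."""
    merged = dict(target)
    copied, skipped = [], []
    for name, (shape, _values) in target.items():
        source = pretrained.get(name)
        if source is not None and source[0] == shape:
            merged[name] = source
            copied.append(name)
        else:
            skipped.append(name)
    return merged, copied, skipped

# Hypothetical parameter tables: name -> (shape, placeholder for the values).
pretrained = {
    "stage1.shallow.conv.weight": ((64, 3, 3, 3), "pretrained"),
    "stage1.shallow.cab.weight": ((64, 64, 3, 3), "pretrained"),
}
target = {
    "stage1.shallow.conv.weight": ((64, 3, 3, 3), "initial"),
    "stage1.shallow.cab.weight": ((64, 64, 3, 3), "initial"),
    "stage1.shallow.aspp.branch1.weight": ((64, 64, 3, 3), "initial"),
}

merged, copied, skipped = transfer_weights(pretrained, target)
```

Here the two layers shared with the original network receive pretrained values, and the ASPP layer, which has no pretrained counterpart, is reported in `skipped`.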

[0059] Step 4: Input the blurred crack image and perform image restoration and output the results.

[0060] Input a crack image test set, use ESRGAN to increase the image resolution from 448×448 to 700×700 to improve the deblurring effect; then input the image into the improved MPRNet module to reconstruct and restore the image, and output a high-quality restored image.

[0061] Example

[0062] Taking the crack image dataset collected in this invention as an example, it contains 400 images, with a 1:1 ratio of normal crack images to blurred crack images. This dataset is divided into training and testing sets in an 8:2 ratio. The training set is used for neural network model building and parameter learning, while the testing set is used to evaluate image restoration performance. The improved MPRNet is trained using the training set. This invention uses the Adam algorithm to optimize the parameters of the improved MPRNet, setting the initial learning rate to η = 0.0001. Simultaneously, cosine annealing is used to slowly decrease the learning rate to 0.00001 during training, thus making the reduction in model loss more reasonable. The batch size is 1, and the training lasts for 500 epochs.
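The learning-rate schedule in [0062] can be written in the usual closed form of cosine annealing, with the rate falling from 0.0001 to 0.00001 over 500 epochs. Updating once per epoch rather than once per iteration is an assumption, and `cosine_annealing_lr` is an illustrative name.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=500, lr_max=1e-4, lr_min=1e-5):
    """Learning rate after `epoch` epochs of cosine annealing from lr_max down to lr_min."""
    cos_term = (1.0 + math.cos(math.pi * epoch / total_epochs)) / 2.0
    return lr_min + (lr_max - lr_min) * cos_term

schedule = [cosine_annealing_lr(epoch) for epoch in range(501)]
```

The rate changes slowly near the start and end of training and fastest around the midpoint, where it equals the average of the two bounds (0.000055).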

[0063] Four randomly selected blurred images containing fine cracks were used for experimental analysis and comparison. First, ESRGAN was used to upscale the motion-blurred images, increasing the number of pixels and enhancing the details of the crack outlines. Then, MPRNet and the improved MPRNet module were each used to deblur the images. The restoration results of the two models are shown in Figures 4(b) and 4(c).

[0064] A comparison of Figure 4(a) and Figure 4(b) shows that after using MPRNet, the crack regions in the first, second, and third deblurred images are restored to a certain extent, and their crack outlines are relatively clear and complete. However, small cracks are only partially restored, and their outlines remain incomplete. In the fourth image, the crack region shows obvious restoration, and the crack outline is basically restored completely, but the small crack in the upper right corner is still not fully restored. Although the multi-stage feature fusion method used by MPRNet can extract more image feature information, its extraction of information on small cracks is still relatively limited, reducing its ability to restore blurred images with small cracks.

[0065] A comparison of Figure 4(b) and Figure 4(c) shows that after using the improved MPRNet module to deblur the crack images, the crack areas in the first, second, and third deblurred images were further restored, and the crack outlines were basically restored completely. For small cracks, the blurred crack parts that MPRNet failed to restore were also restored more clearly. In the fourth image, the discontinuity in the small crack in the upper right corner was also reduced, and its shape, length, and width differed little from those in the normal image.

[0066] Therefore, this invention introduces the ASPP module to construct an improved MPRNet module, which focuses more on crack details during the shallow feature extraction stage. It leverages the advantages of multi-scale feature information fusion to improve image information utilization and enhance the ability to recover blurred images with fine cracks.

Claims

1. A crack image deblurring method based on an improved MPRNet, characterized in that it includes the following steps:
Step 1: Collect and construct a crack image dataset. In a crack image dataset containing normal crack images and blurred crack images, a certain number of images are randomly selected as the training set and the test set.
Step 2: Construct the improved MPRNet structure. The improved MPRNet includes an ESRGAN module and an improved MPRNet module, with the output of the ESRGAN module being transmitted to the input of the improved MPRNet module.
Step 3: Obtain neural network weights through transfer learning and neural network training. The original ESRGAN module was trained on the DIV2K training set and the Flickr2K dataset, and network weights were generated. The pre-trained weights were then transferred to the ESRGAN module in step 2 to obtain the weights of the ESRGAN module. The original MPRNet module was trained and tested using the GoPro dataset. The training model generated training weights, and the pre-trained weights were transferred to the improved MPRNet module in step 2. The improved MPRNet was trained and tested using the crack image training set and test set obtained in step 1, respectively, to obtain the improved MPRNet weights.
Step 4: Input the blurred image of the crack, perform image restoration, and output the results.

2. The crack image blur removal method according to claim 1, characterized in that: in step 2, the improved MPRNet module includes a three-level feature extraction module that performs feature extraction and feature transmission from bottom to top, an original resolution module ORSNet, and a convolution module; the first level, the first half of the second level, and the third level are all shallow feature extraction modules, which are composed of convolutional layers, channel attention modules CAB, and dilated spatial pyramid pooling ASPP modules. The latter half of the first and second levels is the deep information extraction module of the encoder-decoder subnetwork. The outputs of the two encoder-decoder subnetworks and the first and second level images output by the ESRGAN module are transmitted to the corresponding supervised attention module (SAM). The output of the first-level SAM is concatenated with the output of the second-level shallow feature extraction module and used as the input of the second-level encoder-decoder subnetwork. The image context information obtained by the first-level encoder-decoder subnetwork is input into the second-level encoder. The output of the second-level SAM is concatenated with the output of the third-level shallow feature extraction module and sent to the original resolution module. The image context information obtained by the second-level encoder-decoder subnetwork is input into the original resolution module (ORSNet). The feature map output by the original resolution module after passing through the convolution module is added to the third level image output by the ESRGAN module to output the deblurred image. The three levels of images output by the ESRGAN module are as follows: the first level image is cut into four small blocks, the second level image is cut into two small blocks, and the third level image is a complete image patch.

3. The crack image blur removal method according to claim 2, characterized in that: The ASPP module takes image feature $F_1$ as input; on the right, a 1×1 convolutional layer is connected in parallel, together with six 3×3 dilated convolutional layers with dilation rates of 1, 2, 3, 1, 2, and 3. Each convolutional layer is followed by a ReLU activation layer, and together they output seven features $F_{11}$ to $F_{17}$; on the left, the input image feature $F_1$ passes sequentially through a 1×1 pooling layer, a 1×1 convolutional layer, a ReLU activation layer, and an upsampling layer that restores the image size, outputting feature $F_{31}$, which is expressed as follows: $F_{31} = \mathrm{Upsample}(\mathrm{ReLU}(\mathrm{Conv}_{1\times 1}(\mathrm{Pool}_{1\times 1}(F_1))))$ (1) In equation (1), $\mathrm{Pool}_{1\times 1}$ represents a 1×1 pooling layer, $\mathrm{Conv}_{1\times 1}$ represents a 1×1 convolutional layer, ReLU represents an activation layer, and Upsample represents an upsampling layer; $F_c$ is obtained by concatenating the feature maps output from each convolutional layer and the upsampling layer, and is expressed as follows: $F_c = \mathrm{Concat}(F_{11}, F_{12}, F_{13}, F_{14}, F_{15}, F_{16}, F_{17}, F_{31})$ (2) In equation (2), Concat represents dimension concatenation; then, a 1×1 convolutional layer, a ReLU activation layer, and a Dropout layer with a random dropout rate of 0.5 are connected to reduce the number of channels and output the final result $F_{out}$, which is calculated as follows: $F_{out} = \mathrm{Dropout}_{0.5}(\mathrm{ReLU}(\mathrm{Conv}_{1\times 1}(F_c)))$ (3) In equation (3), $\mathrm{Dropout}_{0.5}$ means that a portion of the channels is randomly discarded with a probability of 0.5.

4. An improved MPRNet module characterized by: It includes a three-level feature extraction module that performs feature extraction and feature transmission from bottom to top, an original resolution module ORSNet, and a convolution module; the first level, the first half of the second level, and the third level are all shallow feature extraction modules, which are composed of convolutional layers, channel attention modules CAB, and dilated spatial pyramid pooling ASPP modules; The latter half of the first and second levels is the deep information extraction module of the encoder-decoder subnetwork. The outputs of the two encoder-decoder subnetworks and the first and second level images are transmitted to the corresponding supervised attention module (SAM). The output of the first level SAM is concatenated with the output of the second level shallow feature extraction module and used as the input of the second level encoder-decoder subnetwork. The image context information obtained by the first level encoder-decoder subnetwork is input into the second level encoder. The output of the second level SAM is concatenated with the output of the third level shallow feature extraction module and sent to the original resolution module. The image context information obtained by the second level encoder-decoder subnetwork is input into the original resolution module (ORSNet). The feature map output by the original resolution module after passing through the convolution module is added to the third level image to output the deblurred image. The first-level image is cut into four small blocks, the second-level image is cut into two small blocks, and the third-level image is a complete block.

5. The improved MPRNet module of claim 4, characterized in that: it also includes an ESRGAN module, whose output is transmitted to the input of the improved MPRNet module.

6. The improved MPRNet module according to claim 4 or 5, characterized in that: the ASPP module contains 6 dilated convolutional layers, with dilation rates of 1, 2, 3, 1, 2, and 3, respectively.