A method for detecting road surface damage
By adding 5 Feature Layers and a fusion layer to the DeepCrack model and adopting an improved loss function, the problem of insufficient accuracy in road disaster detection under complex road surface conditions is solved, achieving higher detection accuracy and noise resistance.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: NORTHWESTERN POLYTECHNICAL UNIV
- Filing Date: 2023-03-18
- Publication Date: 2026-06-19
AI Technical Summary
Existing deep learning methods detect road hazards inaccurately under complex road conditions. In noisy environments in particular, they show poor noise resistance and struggle to detect the various complex road hazard types.
Five Feature Layers were added to the DeepCrack model, and a corresponding fusion layer was added after the output of each layer. Multi-scale fusion feature maps and a loss function based on sigmoid cross-entropy error were used to improve recognition accuracy.
It improves the accuracy and anti-interference ability of road disaster detection, and can effectively cope with complex road disaster detection situations.
Figure CN116258708B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to road disaster detection technology, specifically to a road surface damage detection method based on an improved DeepCrack model.
Background Technology
[0002] Currently, road disaster detection not only helps maintenance personnel perform fast, accurate, time-saving, and labor-saving inspections, but also quickly prevents the further spread of road disasters, ensuring the safety of road personnel and avoiding economic losses. Therefore, fully automated road disaster detection is particularly important today. However, conducting fully automated road disaster detection in noisy environments remains a challenge. With the rapid development of image processing technology, deep learning-based research methods are undoubtedly the most efficient detection means.
[0003] In computer vision, road disaster detection can be framed as edge detection in images of road disaster locations, which is a fundamental problem in computer vision and image processing. Currently, image-based road disaster detection methods can be divided into edge detection and crack detection.
[0004] Disadvantages of existing technology:
[0005] When using deep learning methods for crack detection, as the number of convolutional layers increases, the convolutional features extracted by the convolutional layers become increasingly coarse, resulting in insufficient detection performance for smaller cracks and inadequate noise resistance.
[0006] DeepCrack's edge detection performance deteriorates for various road disaster types. In complex situations, such as potholes, its detail representation accuracy is insufficient. This method is only suitable for road crack detection in simple scenarios. Its detection performance deteriorates significantly in complex road conditions and is inadequate for detecting various mixed road surfaces.
[0007] This invention adds Feature Layers to the DeepCrack network model. As the number of road disaster types to be detected increases, so does the detection difficulty, so more sparse convolutional features must be extracted to ensure accuracy. Furthermore, the output of each Feature Layer is fed into a corresponding Feature Layers fusion layer, which removes the need for a concat fusion operation in that fusion layer. Because a loss function must be calculated in each fusion layer of the network model, and because positive and negative samples are imbalanced in edge detection problems, a new method for calculating the loss function is proposed. This method effectively calculates the loss function for multi-layer and multi-scale fusion layers.
Summary of the Invention
[0008] The main objective of this invention is to provide a road surface damage detection method based on an improved DeepCrack model.
[0009] The technical solution adopted in this invention is: a road surface damage detection method, comprising:
[0010] Five Feature Layers were added to the DeepCrack model's encoding structure. These Feature Layers are symmetrical to the encoding and decoding network structure, and further extract high-level, high-resolution features from the model.
[0011] A corresponding fusion layer is added to the Feature Layers. After each Feature Layer outputs, the output result is saved and added to the corresponding fusion layer. A total of 5 fusion layers are added. At the same time, it is fused with the 5 fusion layers in the encoding and decoding network structure to obtain a multi-scale fusion feature map. In the DepthCrack network model, the loss function is calculated for each fusion layer to improve the recognition accuracy.
[0012] Based on the DepthCrack model, a loss function suitable for multi-layer fusion layer calculation is adopted; a sigmoid-based cross-entropy error loss function suitable for multi-scale and multi-fusion layers is adopted.
[0013] Furthermore, the road surface damage detection method further includes: designing 5 Feature Layers on the DeepCrack model's encoding structure, with the Feature Layers located between the encoding and decoding networks; adding padding layers to the convolutional structures of 3 of the Feature Layers, while adding no padding to the convolutional structures of the other 2; and transmitting the feature maps that the Feature Layers obtain from the encoding network structure to the decoding structure of the DeepCrack model.
[0014] Furthermore, the road surface damage detection method further includes: passing the output of each Feature Layer into its corresponding Feature Layers fusion layer.
[0015] Furthermore, the road surface damage detection method further includes: adding 5 fusion layers to the DepthCrack model, with each fusion layer performing a loss function calculation; DepthCrack performs a fusion operation on the 5 Feature Layers to generate 5 fused feature maps, and performs a loss function calculation on these 5 fused feature maps, and finally fuses them with the 5 fused feature maps in the encoding / decoding network structure to obtain a multi-scale fused feature map, and finally generates the final predicted image.
[0016] Advantages of this invention:
[0017] The DepthCrack model of this invention has higher accuracy and stronger anti-interference ability compared to the DeepCrack model, and can cope with complex road disaster detection situations.
[0018] In addition to the objectives, features, and advantages described above, the present invention has other objectives, features, and advantages. The invention will now be described in further detail with reference to the figures.
Attached Figure Description
[0019] The accompanying drawings, which form part of this application, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0020] Figure 1 is a schematic diagram of the DepthCrack Feature Layers model;
[0021] Figure 2 is a schematic diagram of the fusion layer model in the encoding / decoding network structure of the DepthCrack network;
[0022] Figure 3 is a schematic diagram of the fusion layer in the Feature Layers of the DepthCrack network model.
Detailed Implementation
[0023] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0024] This invention proposes a road surface damage detection method based on the DeepCrack model, which mainly includes:
[0025] (1) In order to solve the problem of the serious decrease in accuracy of DeepCrack model under complex road conditions, 5 Feature Layers were added to the DeepCrack model encoding structure. The Feature Layers are symmetrical with the encoding and decoding network structure, and further extract high-level and high-resolution features of the model.
[0026] (2) In order to further improve the recognition accuracy, a corresponding fusion layer is added to the Feature Layers. After each Feature Layer outputs, the output result is saved and added to the corresponding fusion layer. A total of 5 fusion layers are added. At the same time, they are fused with the 5 fusion layers in the encoding and decoding network structure to obtain a multi-scale fusion feature map. In the DepthCrack network model, the loss function is calculated for each fusion layer to improve the recognition accuracy.
[0027] (3) Based on the DepthCrack model, a loss function suitable for multi-layer fusion layer calculation is proposed. In edge detection problems, negative sample points such as image background are often hundreds of times more numerous than positive sample points such as edge points, so conventional loss functions are ineffective for model training. The loss function in the RCF paper addresses this imbalance by weighting positive sample points, but its effect is often not ideal in actual use. Thus, a loss function based on sigmoid cross-entropy error is proposed that is suitable for multi-scale and multi-fusion layer calculations.
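The text does not give the formula of the proposed loss, so the following is only a minimal sketch of the general idea: a sigmoid cross-entropy computed independently on every fusion-layer output and then summed. The class weighting shown (edge pixels weighted by the background fraction, as in HED/RCF-style losses) is an assumption used to illustrate how the positive/negative imbalance can be handled, and it may differ from the loss actually used in this invention. The names `balanced_sigmoid_bce` and `multi_scale_loss` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def balanced_sigmoid_bce(logits, labels, eps=1e-7):
    """Class-balanced sigmoid cross-entropy for one flattened map.

    logits: raw score per pixel; labels: 1 for edge, 0 for background.
    Assumed weighting (not specified in the source): edge pixels get
    weight beta = background fraction, background pixels get 1 - beta.
    """
    n = len(labels)
    beta = (n - sum(labels)) / n
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / n


def multi_scale_loss(side_logits, fused_logits, labels):
    """Sum the per-map loss over every fusion-layer output and the final fused map."""
    loss = sum(balanced_sigmoid_bce(s, labels) for s in side_logits)
    return loss + balanced_sigmoid_bce(fused_logits, labels)
```

With one edge pixel among ten, the single edge pixel carries as much total weight as the nine background pixels combined, which is the effect the weighting is meant to achieve.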
[0028] The road surface damage detection method based on the DeepCrack model specifically includes:
[0029] Five Feature Layers were designed on the DeepCrack model's encoding structure. These Feature Layers are located directly between the encoding and decoding networks, and their function is to extract high-level, high-resolution features. Padding layers were added to three of the Feature Layers' convolutional structures, while no padding was added to the other two. The feature maps obtained by the Feature Layers from the encoding network structure are then transmitted to the DepthCrack model's decoding structure.
[0030] The design of the fusion layer in the Feature Layers differs from that in the encoding / decoding network structure of the DeepCrack model in that it omits the concat operation and passes the output of each Feature Layer directly into the corresponding Feature Layers fusion layer.
[0031] Five fusion layers were added to the DepthCrack model, with each fusion layer performing a loss function calculation to improve recognition accuracy. DepthCrack fused the five feature layers to generate five fused feature maps, and then performed a loss function calculation on these five fused feature maps. Finally, these fused feature maps were fused with the five fused feature maps in the encoding / decoding network structure to obtain a multi-scale fused feature map, which ultimately generated the final predicted image.
[0032] The feature maps of the encoding and decoding network structures are paired and fused at each convolutional stage to generate fused feature maps at different scales. Furthermore, five fused maps are obtained after passing through five convolutional layers in the Feature Layer. At each scale, the pixel-level loss function is calculated independently. Simultaneously, the fused feature maps from each scale are concatenated and fused to obtain a multi-scale fused image, which is the predicted image for road disaster edge detection.
[0033] As can be seen, the DepthCrack network model is an end-to-end encoding and decoding network model built upon the DeepCrack network model. The DepthCrack network model adds a total of 5 Feature Layers to the DeepCrack network model. This is because it was found that when detecting road edges for various types of road hazards, increasing the number of fusion layers and feature extraction layers improves the performance of sparse convolutional feature extraction. Therefore, inspired by the SSD network model, this approach adds new convolutional feature layers to further extract fine features. Thus, 5 Feature Layers are added, and the feature maps extracted from these 5 Feature Layers are further fused to obtain 5 fused feature maps. This results in a total of 10 fused feature maps, with the loss function calculated independently for each fused feature map. These 10 fused feature maps are then concatenated and fused to obtain a single multi-scale fused feature map, which is the final predicted image for road hazard edge detection.
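The final fusion step can be illustrated with a short sketch. Concatenating K single-channel maps and applying a 1x1 convolution is equivalent to a per-pixel weighted sum over the K maps, so the example below assumes that form. The function names and the fixed weights are hypothetical; in a trained network the weights would be learned.

```python
import math


def fuse_side_outputs(side_maps, weights, bias=0.0):
    """Concatenate K single-channel maps and apply a 1x1 convolution.

    side_maps: list of K maps, each an H x W nested list of logits.
    weights:   K floats, one per map (illustrative values, normally learned).
    """
    if len(side_maps) != len(weights):
        raise ValueError("one weight per side map is required")
    height, width = len(side_maps[0]), len(side_maps[0][0])
    return [
        [
            bias + sum(w * m[i][j] for w, m in zip(weights, side_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]


def to_probability(logit_map):
    """Apply the sigmoid to every pixel of a fused logit map."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logit_map]


# Ten fused maps (5 from the encoder/decoder pairs, 5 from the Feature Layers).
ten_maps = [[[float(c)] * 2 for _ in range(2)] for c in range(10)]
prediction = to_probability(fuse_side_outputs(ten_maps, [0.1] * 10))
```

Here the ten maps are constant images with values 0 to 9, so equal weights of 0.1 reduce the fusion to a per-pixel mean.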
[0034] As shown in Figure 1, the Feature Layers are positioned directly between the encoding and decoding networks: they take the output of the encoding network as input and pass their own output on to the decoding network. The Feature Layers of the SSD network model were further improved by reducing the original 6 layers to 5 and modifying the convolutional structure to make it symmetric, which allows the Feature Layers to extract more edge features. The encoding network structure reduces the original input image size by a factor of 32. However, the Feature Layers in the SSD network model output a single-channel image for a classification problem, which makes them unsuitable for the DepthCrack network model without further improvement. Therefore, the DepthCrack network model uses padding to fill the feature maps that shrink after convolution, followed by deconvolution to restore them to the same size as the input feature map received from the DepthCrack encoding network. Thus, padding layers are added to the convolutional structures of 3 of the Feature Layers, while no padding is added to the convolutional structures of the remaining 2.
[0035] In the first Feature Layer, a feature map with 512 channels is obtained from the encoding network structure. At this point, the feature map has undergone 5 pooling operations, reducing the image size by a factor of 32. The sparse convolutional features are first processed by a 1x1 convolution kernel, which exchanges and integrates information across channels and yields a feature map with 256 channels. A 3x3 convolution kernel with a stride of 1 and SAME padding then yields a feature map with 512 channels.
In the second Feature Layer, a 1x1 convolution kernel with a stride of 2 and SAME padding is applied first. This reduces the feature map to 1/64 of the original input image size, extracting more detailed sparse convolution features. A padding operation then fills the area around the feature points with zeros, which increases the sparse convolution features of the image; when the fill value is 0, this is the same as the deconvolution operation. The padded data is then convolved with a 3x3 convolution kernel with a stride of 1 and SAME padding, giving a feature map with 512 channels at 1/32 of the original input image size.
In the third Feature Layer, a 1x1 convolution kernel with a stride of 2 and SAME padding halves the size of the feature map and yields a convolutional feature map with 128 channels. This is followed by padding (deconvolution), which fills the area around the feature points with zeros. A 3x3 convolution kernel with a stride of 1 then yields a convolutional feature map with 256 channels.
In the fourth Feature Layer, a 1x1 convolution kernel with a stride of 2 and SAME padding halves the size of the feature map and yields a convolutional feature map with 256 channels. Padding is then applied to expand the image by filling it with zeros. Finally, a 3x3 convolution kernel with SAME padding and a stride of 1 yields a convolutional feature map with 512 channels.
In the fifth Feature Layer, as in the first, a 1x1 convolution kernel with a stride of 1 and SAME padding integrates the sparse feature information. It is followed by a 3x3 convolution kernel with SAME padding and a stride of 1, which outputs a sparse convolutional feature map with 512 channels. The output of the Feature Layers is then fed into the decoding structure of the DepthCrack network model.
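The size bookkeeping in this paragraph can be checked with a small sketch. It relies on the standard rule that a convolution with SAME padding produces an output of side length ceil(input / stride). Two points are assumptions rather than statements from the text: the input is square, and the zero-fill (deconvolution) step in layers 2 to 4 exactly doubles the side length, undoing the stride-2 convolution. The function names are hypothetical.

```python
import math

# Output channels of the 3x3 convolution in each Feature Layer, as stated in the text.
STATED_OUT_CHANNELS = [512, 512, 256, 512, 512]


def same_conv_size(size, stride):
    """Output side length of a convolution with SAME padding."""
    return math.ceil(size / stride)


def trace_feature_layers(input_size):
    """Return the feature-map side length after each of the five Feature Layers."""
    size = input_size // 32  # five 2x2 poolings in the encoder
    # (stride of the 1x1 convolution, whether a zero-fill step follows it)
    plan = [(1, False), (2, True), (2, True), (2, True), (1, False)]
    sizes = []
    for stride, refill in plan:
        size = same_conv_size(size, stride)  # 1x1 convolution
        if refill:
            size *= 2                        # assumed x2 zero-fill / deconvolution
        size = same_conv_size(size, 1)       # 3x3 convolution, stride 1
        sizes.append(size)
    return sizes


sizes = trace_feature_layers(512)  # every layer ends at 512 / 32 = 16
```

Under these assumptions each Feature Layer returns to 1/32 of the input size whenever that size is even. With an odd size such as 15, the stride-2 step rounds up to 8 and the refill gives 16, which is one case where a later crop is needed.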
[0036] In the five Feature Layers, the results after each Feature Layer convolution are recorded and saved, and then passed into the corresponding fusion layer.
[0037] Figure 2 shows the overall structure of the fusion layer in the DepthCrack network model. The DepthCrack network model has five fusion feature map layers, corresponding to the five layers of the encoding and decoding networks. At each scale, the output of the encoding network after convolution and before the pooling layer is fused with the output of the decoding network after convolution. This is possible because the corresponding scales of the encoding and decoding networks are the same, so the output feature maps after convolution have the same size and number of channels.
In the DepthCrack fusion layer, a concat fusion operation is performed first, expanding the third dimension (adding channels). A convolutional layer with a 1x1 kernel, a stride of 1, and SAME padding follows. The number of channels of the resulting feature map is the sum of the channel counts of the encoding and decoding output feature maps. A deconvolutional layer then pads the feature map with zeros and expands it to the same size as the output image, with the number of channels equal to the number of classes. Road disaster edge detection only distinguishes edges from non-edges, so it is a binary classification problem; because the output image consists only of 0s and 1s, the output feature map has only 1 channel.
When only the feature image or edge detection result of a specific area is needed, a crop layer can crop the single-channel feature map output by the deconvolution to the desired area. Finally, the sigmoid activation function binarizes the feature map matrix: pixels that meet the condition are set to 1, and pixels that do not are set to 0. Furthermore, a separate loss function calculation is performed for each fusion layer in the encoding and decoding network structure of the DepthCrack network model. This can significantly improve the prediction accuracy and make the entire large network model converge faster.
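The sequence of operations in one encoder/decoder fusion layer (concat, 1x1 convolution, deconvolution to one channel, crop, sigmoid) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the learned deconvolution is replaced by a 1x1 channel collapse followed by nearest-neighbour upsampling, the crop keeps the top-left region, the binarisation threshold is 0.5, and all function names and weights are hypothetical.

```python
import math


def conv1x1(features, weight):
    """1x1 convolution, stride 1: weight[o][k] mixes input channel k into output channel o."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [[sum(wk * features[k][i][j] for k, wk in enumerate(row))
          for j in range(width)] for i in range(height)]
        for row in weight
    ]


def upsample_nearest(channel, factor):
    """Stand-in for the learned deconvolution: repeat each pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in channel for _ in range(factor)]


def crop(channel, height, width):
    """Keep the top-left height x width region (assumed crop position)."""
    return [row[:width] for row in channel[:height]]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_scale(enc, dec, mix, to_one, factor, out_h, out_w, threshold=0.5):
    """Fuse one encoder/decoder pair into a binary edge map."""
    stacked = enc + dec                    # concat along the channel axis
    mixed = conv1x1(stacked, mix)          # channel count = enc + dec channels
    single = conv1x1(mixed, [to_one])[0]   # collapse to 1 channel (edge / non-edge)
    full = crop(upsample_nearest(single, factor), out_h, out_w)
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in full]


# One-channel encoder and decoder maps of size 1 x 2, upsampled by 2 and cropped to 2 x 3.
edge_map = fuse_scale([[[1.0, -1.0]]], [[[1.0, -1.0]]],
                      mix=[[1.0, 0.0], [0.0, 1.0]], to_one=[1.0, 1.0],
                      factor=2, out_h=2, out_w=3)
```

In training, the loss would be computed on the values before thresholding; the 0/1 output here only mirrors the final binarisation step described in the text.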
[0038] The design of the fusion layer in the encoder-decoder network structure of the DepthCrack network model is consistent with that of DeepCrack, but it is different from the design of the fusion layer in Feature Layers. We will explain the design of the fusion layer in Feature Layers in detail below.
[0039] Figure 3 shows the design of the fusion layer corresponding to the Feature Layers in the DepthCrack network model. This fusion layer differs from the fusion layer of the encoder-decoder network structure in that it omits the concat operation. The Feature Layers are single-layer structures with no paired encoder and decoder outputs, so there is nothing to concatenate. The number of output channels may differ between Feature Layers, but the feature map size is the same for all of them. In the first convolutional layer of the fusion layer, a 1x1 convolution kernel with a stride of 1 and SAME padding is therefore used, and the number of output channels is kept equal to the number of input channels. This further highlights the sparse convolutional features extracted by the Feature Layer. Next, a deconvolution operation with zero padding is performed. Because the input feature map of each fusion layer corresponding to the Feature Layers is 1/32 of the original size, this deconvolution must expand the result of the previous convolution by a factor of 32 and output a new feature map. The feature map obtained after deconvolution can be cropped, so the output can be adjusted as needed. Finally, the sigmoid function sets the values of pixels that meet the criterion to 1 and all others to 0. Furthermore, a loss function is calculated on the final fused feature map output by each fusion layer corresponding to the Feature Layers, which improves prediction accuracy and speeds up convergence during training.
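The factor-of-32 expansion can be checked with the standard output-size formula for a transposed convolution (no output padding, dilation 1): out = (in - 1) * stride + kernel - 2 * padding. The kernel size 64 and padding 16 used below are hypothetical values chosen so that the result is exactly 32 times the input; the text does not state the actual deconvolution parameters. The helper names are also hypothetical.

```python
def deconv_output_size(size, kernel, stride, padding=0):
    """Side length produced by a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride + kernel - 2 * padding


def crop_margins(deconv_size, target_size):
    """Pixels to trim from the start and end of one axis to reach target_size."""
    extra = deconv_size - target_size
    if extra < 0:
        raise ValueError("deconvolution output is smaller than the target")
    return extra // 2, extra - extra // 2


# A 16 x 16 Feature Layer output for a 512 x 512 input image.
exact = deconv_output_size(16, kernel=64, stride=32, padding=16)   # 512
oversized = deconv_output_size(16, kernel=64, stride=32)           # 544
margins = crop_margins(oversized, 512)                             # (16, 16)
```

The second case shows why a crop layer after the deconvolution is useful: without padding the output overshoots the image size, and the surplus border is trimmed away.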
[0040] To further enhance the extraction of more detailed features, this invention adds 5 Feature Layers to the DeepCrack network model. Furthermore, to generate more fused feature maps, a further fusion layer operation is performed on the 5 Feature Layers to generate 5 more fused feature maps. A loss function is then calculated on these 5 fused feature maps, which are finally fused with the 5 fused feature maps in the encoding / decoding network structure to generate the final predicted image.
[0041] Other deep learning networks could be used in place of the method of this invention. However, DepthCrack incorporates the Feature Layers of the SSD network model and improves them so that they fit the encoding and decoding network structure. They are designed as a 5-layer symmetrical structure, so that more sparse convolution features can be extracted in the high-level convolution process, which improves the recognition accuracy. Therefore, the actual effect of this method is better.
[0042] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for detecting road surface damage, characterized in that it comprises:
adding five Feature Layers to the encoding structure of the DeepCrack model, the Feature Layers being symmetrical to the encoding and decoding network structure and further extracting high-level, high-resolution features from the model;
adding a corresponding fusion layer to the Feature Layers, wherein after each Feature Layer produces its output, the output is saved and passed to the corresponding fusion layer, for a total of 5 added fusion layers, which are fused with the 5 fusion layers in the encoding and decoding network structure to obtain a multi-scale fusion feature map, and wherein, in the DepthCrack network model, the loss function is calculated for each fusion layer to improve the recognition accuracy;
adopting, based on the DepthCrack model, a loss function suitable for multi-layer fusion layer calculation, namely a sigmoid-based cross-entropy error loss function suitable for multi-scale and multi-fusion layers;
wherein the five Feature Layers are designed on the DeepCrack model's encoding structure and positioned directly between the encoding and decoding networks; padding layers are added to the convolutional structures of three of the Feature Layers, while no padding is added to the convolutional structures of the other two; and the Feature Layers obtain feature maps from the encoding network structure and transmit their output to the decoding structure of the DepthCrack model;
wherein the output of each Feature Layer is passed into the corresponding Feature Layers fusion layer;
wherein five fusion layers are added to the DepthCrack model and a loss function is calculated for each fusion layer; DepthCrack performs a fusion operation on the five Feature Layers to generate five fused feature maps and calculates a loss function for these five fused feature maps; and these feature maps are finally fused with the five fused feature maps in the encoding and decoding network structure to obtain a multi-scale fused feature map, which is then used to generate the final predicted image;
wherein the fusion layer corresponding to the Feature Layers differs from the fusion layer in the encoder-decoder network structure of the DepthCrack network model in that it omits the concat operation: the Feature Layers are single-layer structures with no corresponding encoder-decoder network structure, so the fusion layer corresponding to the Feature Layers in the DepthCrack network model has no concat operation.