Real-time detection methods and devices for road surface defects

By constructing the PD-Picodet model, the performance of road surface defect detection is improved by utilizing the backbone network and attention mechanism. This solves the problems of missed detection and false detection in the Picodet algorithm under insufficient lighting conditions, and enables real-time detection.

CN116883645B · Active · Publication Date: 2026-06-30 · SIRUNTIANLANG (BEIJING) TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SIRUNTIANLANG (BEIJING) TECH CO LTD
Filing Date: 2023-07-05
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing technologies using the Picodet algorithm to detect road surface defects are prone to missed detections and false detections, and the results are poor under insufficient lighting conditions, making real-time detection impossible.

Method used

A PD-Picodet model is constructed, including a backbone network, a coordinate attention mechanism module, a neck network, and a decoder. By extracting pavement distress features and utilizing coordinate attention and channel attention mechanisms, the detection performance is improved, avoiding the pre-generation of a large number of anchor boxes and saving computational resources.

Benefits of technology

It improves the speed and accuracy of road surface defect detection, especially in low light conditions, reducing missed and false detections and enabling real-time detection.



Abstract

This invention relates to a method and apparatus for real-time detection of pavement defects. The method includes constructing a PD-Picodet model; acquiring an original image and preprocessing it to obtain a to-be-processed image; inputting the to-be-processed image into a backbone network to extract pavement defect features, obtaining a feature map; extracting a preset feature layer from the feature map and processing the preset feature layer using a coordinate attention mechanism module to obtain N processed feature layers; fusing the N processed feature layers into N+1 fused feature layers using a neck network; and decoding the fused feature layers to obtain pavement defect detection information. This invention addresses the linearity and continuity of pavement defects by adding an attention mechanism to the Picodet model, thereby improving the algorithm's focus on pavement defect targets.

Description

Technical Field

[0001] This invention belongs to the field of target detection technology, specifically relating to a method and device for real-time detection of road surface defects.

Background Technology

[0002] With economic development, cars have become a necessity for families, and highways have become a vital part of national infrastructure. Ensuring good highway conditions is not only a requirement for national economic development but also a guarantee for safe driving. After a period of use, road surfaces develop various kinds of damage and deformation, such as cracks (including transverse cracks, longitudinal cracks, and network cracks), potholes, and ruts. These are collectively referred to as road surface defects. Road surface defects not only affect the use of highways and driver safety but, if not repaired in time, can also lead to deeper damage to the road surface structure, resulting in greater safety hazards. Therefore, timely detection and repair of road surface defects are crucial.

[0003] Currently, there are three main methods for real-time detection of road surface defects: manual inspection, semi-automatic methods, and deep learning methods. Manual inspection relies on professionals who visually observe and manually inspect the road to determine the location of road surface defects. Semi-automatic methods emerged with the development of computer technology, as researchers began to use machine vision to detect road surface defects. These methods first extract features from the road surface image using a manually designed feature extractor, then separate the defect foreground from the background by selecting appropriate thresholds, and finally extract the specific defects to obtain the relevant information. Deep learning is currently applied mainly to object detection, object segmentation, and object tracking tasks, with good results, and it can likewise be applied to real-time road surface defect detection. Researchers have implemented road surface defect detection using two-stage and one-stage algorithms. Two-stage algorithms, including Fast-RCNN, Faster-RCNN, and Mask-RCNN, generate candidate regions and then use convolutional neural networks to predict the target type and location. One-stage algorithms, including the YOLO series and SSD, use convolutional neural networks to extract information directly from the original image and output the predicted target type and location. Most researchers use YOLO series algorithms, such as YOLOv4 and YOLOv5, for road surface defect detection.

[0004] However, while manual inspection can accurately determine the location of road surface defects and related information, it relies heavily on manual labor, resulting in high costs and low efficiency. Semi-automatic methods require manually designed feature extractors, thresholds, and segmentation methods, leading to slow detection, low accuracy, and an inability to adapt to complex traffic environments. Deep learning makes road surface defect detection more convenient, requiring only a sufficient dataset. Two-stage algorithms offer higher accuracy than the YOLO series but are slower, making real-time detection impossible. The YOLO series reduces detection time, but it is anchor-based: it must generate numerous anchor boxes adapted to the targets, many of which are negative samples. This wastes substantial computational resources, making anchor-based algorithms unsuitable for real-time detection on resource-constrained mobile devices. The Picodet algorithm can improve detection speed: it is lightweight, has a simple structure, and runs fast. However, when Picodet is used to detect road surface defects, it is prone to missed and false detections, and it performs poorly in insufficient light.

Summary of the Invention

[0005] In view of this, the purpose of the present invention is to overcome the shortcomings of the prior art and provide a method and device for real-time detection of pavement defects, so as to solve the problems that the Picodet algorithm is prone to missed detection and false detection when used to detect pavement defects in the prior art, and the effect is poor under insufficient light conditions.

[0006] To achieve the above objectives, the present invention adopts the following technical solution: a method for real-time detection of pavement defects, comprising:

[0007] Construct a PD-Picodet model; the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence;

[0008] Acquire the original image, and preprocess the original image to obtain the image to be processed;

[0009] The image to be processed is input into the backbone network, and the road surface defect features of the image to be processed are extracted to obtain a feature map;

[0010] Extract a preset feature layer from the feature map, and process the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers;

[0011] The neck network fuses the processed N feature layers to obtain N+1 fused feature layers.

[0012] The decoder decodes the fused feature layer to obtain road surface defect detection information.

[0013] Furthermore, the backbone network includes: a convolutional module, a first module, a second module, a third module, a fourth module, and a fifth module connected in sequence;

[0014] The first module includes one depthwise separable convolution module, the second module includes two depthwise separable convolution modules, the third module includes two depthwise separable convolution modules, the fourth module includes six depthwise separable convolution modules, and the fifth module includes two depthwise separable convolution modules.

[0015] The depthwise separable convolutional modules in the first and fourth modules are composed of DW convolution and PW convolution, while the depthwise separable convolutional module in the fifth module is composed of DW convolution, attention mechanism SEModule, and PW convolution.

[0016] Furthermore, the preset feature layers are the three highest-ranking feature layers in the feature map. Processing the preset feature layers yields N processed feature layers, including:

[0017] The feature maps corresponding to the three feature layers are subjected to average pooling operations in the horizontal and vertical directions, respectively, to obtain feature maps in the horizontal and vertical directions.

[0018] The obtained feature maps are then subjected to convolution and activation function operations to extract deeper features, resulting in mask maps corresponding to the three feature layers.

[0019] The feature maps corresponding to the three feature layers are multiplied with their corresponding mask maps to assign different weights to the elements on the input feature maps, thus connecting different coordinate points to obtain the processed N feature layers.

[0020] Furthermore, the processed N feature layers include: a first feature layer, a second feature layer, and a third feature layer; the fusion of the processed N feature layers to obtain N+1 fused feature layers includes:

[0021] 2D convolution operations are performed on the first feature layer, the second feature layer and the third feature layer to obtain the fourth feature layer, the fifth feature layer and the sixth feature layer with uniform channels and different sizes;

[0022] The fourth, fifth, and sixth feature layers are fused to obtain four fused feature layers.

[0023] Furthermore, the fourth, fifth, and sixth feature layers are fused respectively to obtain four fused feature layers, including:

[0024] After the sixth feature layer is upsampled, it is fused with the fifth feature layer. The fused first sub-fused feature layer is then passed through the LC module to obtain the second output feature layer. After the first sub-fused feature layer is passed through the LC module again, it is upsampled and then fused with the fourth feature layer to obtain the first output feature layer.

[0025] The third output feature layer is obtained based on the sixth feature layer;

[0026] The first output feature layer is processed by the LC module to obtain the first fused feature layer;

[0027] The first fused feature layer is downsampled after passing through the LC module and then fused with the second output feature layer. The fused second sub-fused feature layer is then passed through the LC module to obtain the second fused feature layer.

[0028] The second sub-fused feature layer is downsampled after passing through the LC module and then fused with the third output feature layer. The fused third sub-fused feature layer is then passed through the LC module to obtain the third fused feature layer.

[0029] The third sub-fused feature layer is downsampled after passing through the LC module and then fused with the downsampled sixth feature layer to obtain the fourth fused feature layer.

[0030] Furthermore, the decoder decodes the fused feature layer to obtain pavement defect detection information, including:

[0031] The first fused feature layer, the second fused feature layer, the third fused feature layer, and the fourth fused feature layer are decoded respectively to obtain road surface defect detection information;

[0032] Anchor frames are generated based on the road surface defect detection information, and the anchor frames are used to mark road surface defects in the original image.

[0033] Furthermore, the attention mechanism SEModule pools the original C×H×W image into C 1×1 initial feature maps; the C 1×1 initial feature maps are used to reflect the C different features of each initial feature map;

[0034] The C initial 1×1 feature maps are processed through convolutional layers and activation functions to obtain deeper feature information;

[0035] The elements of the C initial 1×1 feature maps that provide deeper feature information are multiplied with the elements of the C H×W original images, so that the feature maps of different channels are assigned different weights, thereby increasing the correlation between the feature maps of different channels.

[0036] Furthermore, the preprocessing of the original image to obtain the image to be processed includes:

[0037] The original image is scaled to a preset size, and the image of the preset size is determined as the image to be processed.

[0038] Furthermore, the pavement defect detection information includes:

[0039] The center point, size, and type of pavement distress.

[0040] This application provides a real-time pavement distress detection device, including:

[0041] A construction module is used to construct the PD-Picodet model; the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence.

[0042] The acquisition module is used to acquire the original image and preprocess the original image to obtain the image to be processed;

[0043] The extraction module is used to input the image to be processed into the backbone network, extract the road surface defect features of the image to be processed, and obtain a feature map;

[0044] The processing module is used to extract a preset feature layer from the feature map, and process the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers.

[0045] The fusion module is used to fuse the processed N feature layers in the neck network to obtain N+1 fused feature layers.

[0046] The decoding module is used to decode the fused feature layers with the decoder to obtain road surface defect detection information.

[0047] The beneficial effects that can be achieved by adopting the above technical solution in this invention include:

[0048] This invention provides a method and apparatus for real-time detection of pavement defects. The proposed method employs a constructed PD-Picodet model, which mainly consists of a backbone network, a coordinate attention mechanism module, a neck network, and a decoder. Using the PD-Picodet model provided in this application, it is not necessary to pre-generate a large number of anchor boxes, greatly saving computational resources and improving the detection speed of pavement defects. Furthermore, this application, considering the linear and continuous characteristics of pavement defects, adds a coordinate attention mechanism (CA) and a channel attention mechanism (SE) to the Picodet model, further enhancing the performance of the PD-Picodet model in detecting pavement defects.

Attached Figure Description

[0049] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0050] Figure 1 is a schematic diagram illustrating the steps of the real-time detection method for pavement defects of the present invention;

[0051] Figure 2 is a schematic diagram of the structure of the PD-Picodet model provided by the present invention;

[0052] Figure 3 is a schematic diagram of the structure of the LC-Net provided by the present invention;

[0053] Figure 4 is a schematic diagram of the structure of the SEModule provided by the present invention;

[0054] Figure 5 is a schematic diagram of the structure of the CAModule provided by the present invention;

[0055] Figure 6 is a schematic diagram of the structure of the LC-PAN provided by the present invention;

[0056] Figure 7 is a schematic diagram of the structure of the real-time road surface defect detection device of the present invention.

Detailed Implementation

[0057] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be described in detail below. Obviously, the described embodiments are merely some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other implementation methods obtained by those skilled in the art without creative effort are within the scope of protection of this invention.

[0058] The following describes a specific method and apparatus for real-time detection of road surface defects provided in an embodiment of this application, with reference to the accompanying drawings.

[0059] As shown in Figure 1, the real-time pavement distress detection method provided in the embodiments of this application includes:

[0060] S101, Construct the PD-Picodet model; the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence;

[0061] Specifically, the PD-Picodet model in this application includes a backbone network, a coordinate attention mechanism module (CA), a neck network, and a decoder (Head).

[0062] S102, acquire the original image, preprocess the original image to obtain the image to be processed;

[0063] In some embodiments, the preprocessing of the original image to obtain the image to be processed includes:

[0064] The original image is scaled to a preset size, and the image of the preset size is determined as the image to be processed.

[0065] In this application, images of variable size are resized to a fixed size for subsequent processing, and then the resized images are input into the PD-Picodet backbone network.
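
The resizing step can be sketched as follows. This is a minimal illustration only: the patent does not state the interpolation method or the preset size, so nearest-neighbour sampling and the 4×4 target used here are assumptions, and `resize_nearest` is a hypothetical helper name.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image (list of rows) to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel that covers it.
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Images of any size are brought to the same preset size (assumed 4 x 4 here).
original = [[1, 2],
            [3, 4]]
resized = resize_nearest(original, 4, 4)
```

A production pipeline would more likely use bilinear interpolation from an image library; the point here is only that every input ends up with the same fixed shape before entering the backbone network.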

[0066] S103, The image to be processed is input into the backbone network, and the road surface defect features of the image to be processed are extracted to obtain a feature map;

[0067] In this application, the backbone network of PD-Picodet uses LC-Net.

[0068] S104, extract the preset feature layer from the feature map, and process the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers;

[0069] S105, the neck network fuses the processed N feature layers to obtain N+1 fused feature layers;

[0070] S106, the decoder decodes the fused feature layer to obtain road surface defect detection information.

[0071] The road surface defect detection information includes:

[0072] The center point, size, and type of pavement distress.

[0073] The working principle of the real-time pavement distress detection method is as follows (see Figure 2). First, a PD-Picodet model is constructed, which includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence. After the model is constructed, the original image is acquired and preprocessed to obtain the image to be processed. The image to be processed is input into the backbone network to extract the pavement defect features of the image to be processed, resulting in a feature map. A preset feature layer is extracted from the feature map, and the preset feature layer is processed using the coordinate attention mechanism module to obtain N processed feature layers. The neck network fuses the N processed feature layers to obtain N+1 fused feature layers. Finally, the decoder decodes the fused feature layers to obtain pavement defect detection information.
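
The order of steps S102 to S106 can be sketched as a simple data flow. This is a hedged outline rather than the patented implementation: the function `detect`, its parameters, and the stub stages in the usage example are all hypothetical, and taking the last N backbone outputs as the "preset feature layers" is an assumption.

```python
def detect(image, preprocess, backbone, attention, neck, head, n_layers=3):
    """Run the stages in the order described above; every stage is a callable."""
    resized = preprocess(image)                       # S102: resize to preset size
    feature_map = backbone(resized)                   # S103: list of feature layers
    preset = feature_map[-n_layers:]                  # S104: assumed to be the last N layers
    attended = [attention(layer) for layer in preset]
    fused = neck(attended)                            # S105: N layers in, N+1 layers out
    if len(fused) != len(attended) + 1:
        raise ValueError("neck must return N+1 fused layers")
    return [head(layer) for layer in fused]           # S106: decode each fused layer


# Stub stages that only pass labels around, to show the flow.
result = detect(
    image="raw",
    preprocess=lambda img: img,
    backbone=lambda img: ["c1", "c2", "c3", "c4", "c5"],
    attention=lambda layer: layer + "_ca",
    neck=lambda layers: layers + ["extra"],
    head=lambda layer: {"layer": layer},
)
```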

[0074] In some embodiments, the backbone network includes: a convolutional module, a first module, a second module, a third module, a fourth module, and a fifth module connected in sequence;

[0075] The first module includes one depthwise separable convolution module, the second module includes two depthwise separable convolution modules, the third module includes two depthwise separable convolution modules, the fourth module includes six depthwise separable convolution modules, and the fifth module includes two depthwise separable convolution modules.

[0076] The depthwise separable convolutional modules in the first and fourth modules are composed of DW convolution and PW convolution, while the depthwise separable convolutional module in the fifth module is composed of DW convolution, attention mechanism SEModule, and PW convolution.

[0077] As shown in Figure 3, the input image to be processed is first passed through a 3×3 2D convolutional module, and then through the first module, which consists of a DepthwiseSeparable module. It should be noted that DepthwiseSeparable is a module unique to LC-Net, consisting of three parts: dw_conv, SEModule, and pw_conv. dw_conv consists of a 2D convolution, BatchNorm, and a Hardswish activation function; pw_conv consists of a 1×1 2D convolution, BatchNorm, and a Hardswish activation function; and SEModule is the channel attention mechanism module. The technical solution provided in this application selectively places SEModule into DepthwiseSeparable in LC-Net. The specific structure of SEModule is shown in Figure 4. For example, SEModule can be included in the first module, or it can be omitted.
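
To illustrate why the dw_conv plus pw_conv structure is lightweight, the weight counts of a standard convolution and a depthwise separable convolution can be compared. This is general arithmetic rather than a figure from the patent: biases, BatchNorm, and SEModule parameters are ignored, and the channel counts are assumed values.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    dw = k * k * c_in      # one k x k filter per input channel
    pw = c_in * c_out      # 1x1 convolution that mixes channels
    return dw + pw


# Assumed example: 64 input channels, 128 output channels, 3x3 kernel.
standard = standard_conv_params(64, 128, 3)          # 73728
separable = depthwise_separable_params(64, 128, 3)   # 8768
```

Under these assumed sizes the separable form needs roughly one eighth of the weights, which is consistent with the speed advantage the description attributes to the lightweight backbone.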

[0078] In some embodiments, the attention mechanism SEModule pools the original C×H×W image into C 1×1 initial feature maps; the C 1×1 initial feature maps are used to reflect the C different features of each initial feature map;

[0079] The C initial 1×1 feature maps are processed through convolutional layers and activation functions to obtain deeper feature information;

[0080] The elements of the C initial 1×1 feature maps that provide deeper feature information are multiplied with the elements of the C H×W original images, so that the feature maps of different channels are assigned different weights, thereby increasing the correlation between the feature maps of different channels.

[0081] Specifically, in SEModule, adaptive average pooling is first used to pool the original C×H×W image into C 1×1 feature maps to reflect the C different features of each feature map. Then, convolutional layers and activation functions are used to obtain deeper information. Finally, the elements of the C 1×1 feature maps are multiplied with the elements of the original C H×W feature layers, which is equivalent to assigning different weights to the feature maps of different channels, increasing the correlation between the feature maps of different channels, and making the algorithm pay more attention to the more important channels.
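
The three SEModule steps (pooling, convolution with activation, channel-wise multiplication) can be sketched in plain Python. This is a simplified sketch under stated assumptions: the patent does not name the activations or the number of convolutional layers, so a ReLU followed by a sigmoid over two 1×1 convolutions is assumed, and `se_module`, `w_reduce`, and `w_expand` are hypothetical names with made-up weights.

```python
import math


def se_module(feature, w_reduce, w_expand):
    """Channel attention over a C x H x W feature given as nested lists."""
    # 1) Adaptive average pooling: one scalar per channel (C maps of size 1x1).
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # 2) On a 1x1 map, a 1x1 convolution is a matrix-vector product over channels.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # 3) Multiply every element of channel c by that channel's weight.
    scaled = [[[v * wc for v in row] for row in ch] for ch, wc in zip(feature, weights)]
    return scaled, weights


# Two channels of size 2 x 2; the weights below are arbitrary illustration values.
feature = [[[1.0, 2.0], [3.0, 4.0]],
           [[0.0, 0.0], [0.0, 0.0]]]
scaled, weights = se_module(feature, w_reduce=[[0.5, 0.5]], w_expand=[[1.0], [-1.0]])
```

In this example the first channel receives a weight above 0.5 and the second a weight below 0.5, which is the "different weights for different channels" behaviour the paragraph describes.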

[0082] After the image to be processed passes through the first module, the output feature map is then processed through the second, third, fourth, and fifth modules. The second module includes two DepthwiseSeparables, the third module includes two DepthwiseSeparables, the fourth module includes six DepthwiseSeparables, and the fifth module includes two DepthwiseSeparables. According to experimental data, the DepthwiseSeparable in the fifth module should be supplemented with a SEModule to achieve better results.

[0083] In some embodiments, the preset feature layer is the three highest-ranking feature layers in the feature map, and the processing of the preset feature layer to obtain N processed feature layers includes:

[0084] The feature maps corresponding to the three feature layers are subjected to average pooling operations in the horizontal and vertical directions, respectively, to obtain feature maps in the horizontal and vertical directions.

[0085] The obtained feature maps are then subjected to convolution and activation function operations to extract deeper features, resulting in mask maps corresponding to the three feature layers.

[0086] The feature maps corresponding to the three feature layers are multiplied with their corresponding mask maps to assign different weights to the elements on the input feature maps, thus connecting different coordinate points to obtain the processed N feature layers.

[0087] Specifically, this application extracts the top three feature layers from the feature map processed by the Backbone, and uses a coordinate attention mechanism module (CAModule) to process each of the three feature layers. The specific structure of CAModule is shown in Figure 5. In CAModule, the input C×H×W feature maps are first subjected to average pooling operations in the horizontal and vertical directions to obtain initial feature maps in the horizontal and vertical directions, respectively. Then, further features are extracted through convolution and activation functions, yielding the mask maps. Finally, the feature maps corresponding to the three feature layers are multiplied with the corresponding mask maps to assign different weights to the elements on the input feature maps, thus connecting different coordinate points and obtaining N processed feature layers, which improves the detection accuracy of linear targets.
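
The directional pooling and mask multiplication in CAModule can be sketched as follows. This is a deliberately simplified illustration: the learned convolution and activation are replaced by an assumed scalar weight and a sigmoid, channels are processed independently, and the names `coordinate_attention`, `scale_h`, and `scale_w` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_attention(feature, scale_h=1.0, scale_w=1.0):
    """Simplified coordinate attention on a C x H x W feature (nested lists)."""
    out = []
    for ch in feature:
        h, w = len(ch), len(ch[0])
        # Pool along the horizontal direction: one value per row (H x 1).
        row_mean = [sum(row) / w for row in ch]
        # Pool along the vertical direction: one value per column (1 x W).
        col_mean = [sum(ch[i][j] for i in range(h)) / h for j in range(w)]
        # Stand-in for "convolution + activation": scalar weight, then sigmoid.
        gate_h = [sigmoid(scale_h * m) for m in row_mean]
        gate_w = [sigmoid(scale_w * m) for m in col_mean]
        # The mask at (i, j) combines the gate of row i and the gate of column j.
        out.append([[ch[i][j] * gate_h[i] * gate_w[j] for j in range(w)] for i in range(h)])
    return out


# One channel containing a horizontal line, loosely resembling a transverse crack.
crack = [[[0.0, 0.0, 0.0],
          [1.0, 1.0, 1.0],
          [0.0, 0.0, 0.0]]]
attended = coordinate_attention(crack)
```

Because the row containing the line has the largest pooled value, its gate is the largest of the three row gates, so positions along the line are weighted together. This is one way to read the statement that the mechanism connects different coordinate points and suits linear targets.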

[0088] In some embodiments, the processed N feature layers include: a first feature layer, a second feature layer, and a third feature layer; the fusion of the processed N feature layers to obtain N+1 fused feature layers includes:

[0089] 2D convolution operations are performed on the first feature layer, the second feature layer and the third feature layer to obtain the fourth feature layer, the fifth feature layer and the sixth feature layer with uniform channels and different sizes;

[0090] The fourth, fifth, and sixth feature layers are fused to obtain four fused feature layers.

[0091] In some embodiments, the fourth feature layer, the fifth feature layer, and the sixth feature layer are fused separately to obtain four fused feature layers, including:

[0092] After the sixth feature layer is upsampled, it is fused with the fifth feature layer. The fused first sub-fused feature layer is then passed through the LC module to obtain the second output feature layer. After the first sub-fused feature layer is passed through the LC module again, it is upsampled and then fused with the fourth feature layer to obtain the first output feature layer.

[0093] The third output feature layer is obtained based on the sixth feature layer;

[0094] The first output feature layer is processed by the LC module to obtain the first fused feature layer;

[0095] The first fused feature layer is downsampled after passing through the LC module and then fused with the second output feature layer. The fused second sub-fused feature layer is then passed through the LC module to obtain the second fused feature layer.

[0096] The second sub-fused feature layer is downsampled after passing through the LC module and then fused with the third output feature layer. The fused third sub-fused feature layer is then passed through the LC module to obtain the third fused feature layer.

[0097] The third sub-fused feature layer is downsampled after passing through the LC module and then fused with the downsampled sixth feature layer to obtain the fourth fused feature layer.

[0098] After processing by the coordinate attention mechanism module (CAModule), the three feature layers are input into the neck network of PD-Picodet. This part is mainly responsible for fusing the three feature layers. The neck part of PD-Picodet is mainly composed of LC-PAN, whose specific structure is shown in Figure 6.

[0099] Specifically, this application first fuses the three feature layers from top to bottom, and then fuses them from bottom to top. The top-down fusion includes: the sixth feature layer undergoes an upsampling operation and is then fused with the fifth feature layer; the fused first sub-fused feature layer passes through an LC module to obtain the second output feature layer; the first sub-fused feature layer passes through an LC module again, undergoes an upsampling operation, and is then fused with the fourth feature layer to obtain the first output feature layer; a third output feature layer is obtained based on the sixth feature layer; the first output feature layer passes through an LC module to obtain the first fused feature layer. The bottom-up fusion includes: the first fused feature layer passes through an LC module, undergoes a downsampling operation, and is then fused with the second output feature layer; the fused second sub-fused feature layer passes through an LC module to obtain the second fused feature layer; the second sub-fused feature layer passes through an LC module, undergoes a downsampling operation, and is then fused with the third output feature layer; the fused third sub-fused feature layer passes through an LC module to obtain the third fused feature layer; the third sub-fused feature layer passes through an LC module, undergoes a downsampling operation, and is then fused with the downsampled sixth feature layer to obtain the fourth fused feature layer. Thus, four fused feature layers are obtained. The sampling operations modify the size of a feature map so that it matches the size of the layer it is fused with.
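
The fusion order can be checked by tracking only the spatial size of each layer. The sketch below follows the step order given in paragraphs [0092] to [0097]; it assumes that upsampling doubles the size, downsampling halves it, the LC module preserves it, and that the third output feature layer is the sixth feature layer itself. The input sizes 40, 20, and 10 and all function names are hypothetical.

```python
def up(size):
    return size * 2      # assumption: upsampling doubles the spatial size


def down(size):
    return size // 2     # assumption: downsampling halves the spatial size


def lc(size):
    return size          # assumption: the LC module preserves the spatial size


def fuse(a, b):
    """Two layers can only be fused when their spatial sizes match."""
    if a != b:
        raise ValueError(f"cannot fuse layers of size {a} and {b}")
    return a


def lc_pan_sizes(fourth, fifth, sixth):
    # Top-down path
    sub1 = fuse(up(sixth), fifth)               # first sub-fused feature layer
    out2 = lc(sub1)                             # second output feature layer
    out1 = fuse(up(lc(sub1)), fourth)           # first output feature layer
    out3 = sixth                                # third output feature layer (assumed identity)
    fused1 = lc(out1)                           # first fused feature layer
    # Bottom-up path
    sub2 = fuse(down(lc(fused1)), out2)         # second sub-fused feature layer
    fused2 = lc(sub2)                           # second fused feature layer
    sub3 = fuse(down(lc(sub2)), out3)           # third sub-fused feature layer
    fused3 = lc(sub3)                           # third fused feature layer
    fused4 = fuse(down(lc(sub3)), down(sixth))  # fourth fused feature layer
    return [fused1, fused2, fused3, fused4]


sizes = lc_pan_sizes(40, 20, 10)
```

With these assumed inputs the three incoming layers yield four fused layers, the last one at half the resolution of the smallest input, which matches the N to N+1 relationship stated in the method.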

[0100] The LC-PAN structure integrates the LC-block and PAN modules to achieve higher detection speed and better detection performance. Specifically, a channel attention mechanism (SE) module is added to the LC-block of LC-PAN. In the feature maps extracted by the backbone network, higher-level feature layers contain richer semantic information, while lower-level feature layers contain richer location information. The FPN structure fuses features from top to bottom, but the path from the bottom to the top layers is too long, leading to the loss of lower-level location information and making it difficult to obtain localization information. Unlike FPN, the PAN structure adds a secondary structure after FPN, extracting features again from the bottom to the top layers to ensure that the final output simultaneously preserves rich semantic features and accurate location information.

[0101] In some embodiments, the decoder decodes the fused feature layer to obtain pavement distress detection information, including:

[0102] The first fused feature layer, the second fused feature layer, the third fused feature layer, and the fourth fused feature layer are decoded respectively to obtain road surface defect detection information;

[0103] Anchor frames are generated based on the road surface defect detection information, and the anchor frames are used to mark road surface defects in the original image.

[0104] Specifically, in this application, the four output feature layers processed by LC-PAN are decoded by the corresponding decoder (Head) to obtain the corresponding pavement distress detection information, including the center point, size, and type of pavement distress. Using this positioning information and category information, anchor boxes are generated to mark the pavement distress in the original image.
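
Turning a decoded center point and size into a box on the original image can be sketched as below. The patent does not specify coordinate conventions, so this assumes pixel coordinates in the resized image that are scaled back to the original image and clipped to its bounds. The function `detection_to_box`, the dictionary keys, the "transverse_crack" label, and the image sizes are all hypothetical.

```python
def detection_to_box(det, proc_size, orig_size):
    """Convert a (center, size, type) detection into a corner box on the original image.

    det: {"center": (cx, cy), "size": (w, h), "type": str} in resized-image pixels.
    proc_size, orig_size: (width, height) of the resized and the original image.
    """
    sx = orig_size[0] / proc_size[0]   # horizontal scale back to the original image
    sy = orig_size[1] / proc_size[1]   # vertical scale back to the original image
    cx, cy = det["center"]
    w, h = det["size"]
    x1 = max(0.0, (cx - w / 2) * sx)
    y1 = max(0.0, (cy - h / 2) * sy)
    x2 = min(float(orig_size[0]), (cx + w / 2) * sx)
    y2 = min(float(orig_size[1]), (cy + h / 2) * sy)
    return {"type": det["type"], "box": (x1, y1, x2, y2)}


# Assumed example: a detection in a 320 x 320 input mapped back to a 640 x 480 photo.
det = {"center": (160, 160), "size": (80, 40), "type": "transverse_crack"}
marked = detection_to_box(det, proc_size=(320, 320), orig_size=(640, 480))
```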

[0105] The PD-Picodet proposed in this invention is an anchor-free algorithm with a simple structure, mainly composed of three parts: Backbone, Neck, and Head. It does not require the pre-generation of a large number of anchor boxes, greatly saving computational resources and improving the detection speed of pavement defects. Furthermore, the PD-Picodet provided in this application addresses the linear and continuous characteristics of pavement defects by incorporating coordinate attention (CA) and channel attention (SE) mechanisms into the Picodet algorithm, further enhancing its performance in pavement defect detection.

[0106] As shown in Figure 7, this application provides a real-time pavement distress detection device, including:

[0107] The construction module 201 is used to construct the PD-Picodet model; the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence;

[0108] The acquisition module 202 is used to acquire the original image and preprocess the original image to obtain the image to be processed;

[0109] Extraction module 203 is used to input the image to be processed into the backbone network, extract the road surface defect features of the image to be processed, and obtain a feature map;

[0110] Processing module 204 is used to extract a preset feature layer from the feature map, and process the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers;

[0111] The fusion module 205 is used to fuse the N processed feature layers using the neck network to obtain N+1 fused feature layers;

[0112] The decoding module 206 is used to decode the fused feature layers using the decoder to obtain road surface defect detection information.

[0113] The working principle of the real-time pavement distress detection device provided in this application is as follows: A construction module 201 constructs a PD-Picodet model; the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence; an acquisition module 202 acquires an original image and preprocesses the original image to obtain an image to be processed; an extraction module 203 inputs the image to be processed into the backbone network and extracts pavement distress features from the image to obtain a feature map; a processing module 204 extracts a preset feature layer from the feature map and processes the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers; a fusion module 205 fuses the N processed feature layers using the neck network to obtain N+1 fused feature layers; and a decoding module 206 decodes the fused feature layers to obtain pavement distress detection information.
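The data flow between modules 202 to 206 can be sketched as a simple composition of functions. This is only an illustrative outline: the function `detect` and its parameters are hypothetical, each stage is passed in as a placeholder callable rather than a real network, and the backbone is assumed to return its feature layers ordered from lowest to highest level.

```python
def detect(original_image, preprocess, backbone, coord_attention, neck, head,
           n_layers=3):
    """Run the module sequence 202 -> 206 on one image.

    All stage arguments are placeholder callables (assumptions for this
    sketch); the backbone is assumed to return a list of feature layers
    ordered from lowest to highest level.
    """
    image = preprocess(original_image)        # acquisition module 202
    feature_map = backbone(image)             # extraction module 203
    preset = feature_map[-n_layers:]          # highest-level layers (module 204)
    processed = [coord_attention(f) for f in preset]
    fused = neck(processed)                   # fusion module 205: N -> N+1
    if len(fused) != len(processed) + 1:
        raise ValueError("neck is expected to return N+1 fused layers")
    return [head(f) for f in fused]           # decoding module 206
```

The length check simply mirrors the statement that N processed feature layers are fused into N+1 fused feature layers.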

[0114] In summary, this invention provides a method and apparatus for real-time detection of pavement defects. This application utilizes the PD-Picodet model to achieve real-time detection of pavement defects, significantly improving detection speed and reducing computational resource waste. After feature extraction from the original image via the backbone network, PD-Picodet uses a coordinate attention mechanism (CA) module to process the three highest-level feature layers, linking coordinate points within these layers to enhance linear relationships and improve detection accuracy for linear pavement defect targets. Furthermore, this application incorporates a channel attention mechanism (SE) module into the LC-block of LC-PAN, enhancing attention to important channels and improving pavement defect detection performance.
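The coordinate attention processing summarized above (directional average pooling, convolution and activation to obtain a mask map, then element-wise multiplication) can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation of this application: the matrices `w_h` and `w_w` stand in for 1×1 convolutions, a sigmoid is assumed as the activation function, and the function name `coordinate_attention` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_h, w_w):
    """Apply a simplified coordinate attention to one feature layer.

    x:   feature layer of shape (C, H, W)
    w_h: (C, C) matrix standing in for a 1x1 convolution (assumption)
    w_w: (C, C) matrix standing in for a 1x1 convolution (assumption)
    """
    pooled_h = x.mean(axis=2)        # (C, H): average along the width
    pooled_w = x.mean(axis=1)        # (C, W): average along the height
    a_h = sigmoid(w_h @ pooled_h)    # per-row attention, shape (C, H)
    a_w = sigmoid(w_w @ pooled_w)    # per-column attention, shape (C, W)
    # Mask map: every position (i, j) combines its row and column weights.
    mask = a_h[:, :, None] * a_w[:, None, :]   # (C, H, W)
    return x * mask, mask
```

Each mask element depends on the whole row and the whole column that pass through its position, which is how points lying along a line become related to one another in this sketch.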

[0115] It is understood that the method embodiments provided above correspond to the device embodiments described above, and the specific details can be referred to each other, which will not be repeated here.

[0116] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0117] This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It will be understood that each process and/or block of the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0118] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0119] These computer program instructions may also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0120] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for real-time detection of road surface defects, characterized in that the method comprises:
constructing a PD-Picodet model, wherein the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence;
acquiring an original image, and preprocessing the original image to obtain an image to be processed;
inputting the image to be processed into the backbone network, and extracting road surface defect features of the image to be processed to obtain a feature map;
extracting a preset feature layer from the feature map, and processing the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers, wherein the preset feature layer consists of the three highest-level feature layers in the feature map, and processing the preset feature layer to obtain the N processed feature layers includes: performing horizontal and vertical average pooling operations on the feature maps corresponding to the three feature layers to obtain horizontal and vertical feature maps, respectively; performing convolution and activation function operations on the obtained feature maps to extract deeper features and obtain mask maps corresponding to the three feature layers; and multiplying the feature maps corresponding to the three feature layers by the corresponding mask maps to assign different weights to the elements of the input feature map, so that different coordinate points are related to one another, thereby obtaining the N processed feature layers;
fusing, by the neck network, the N processed feature layers to obtain N+1 fused feature layers; and
decoding, by the decoder, the fused feature layers to obtain road surface defect detection information;
wherein the backbone network includes a convolutional module, a first module, a second module, a third module, a fourth module, and a fifth module connected in sequence; the first module includes one depthwise separable convolution module, the second module includes two depthwise separable convolution modules, the third module includes two depthwise separable convolution modules, the fourth module includes six depthwise separable convolution modules, and the fifth module includes two depthwise separable convolution modules; and the depthwise separable convolution modules in the first and fourth modules are composed of DW convolution and PW convolution, while the depthwise separable convolution module in the fifth module is composed of DW convolution, the attention mechanism SEModule, and PW convolution.
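The backbone layout recited in claim 1 can be written down as a small configuration sketch. The block counts (1, 2, 2, 6, 2) and the compositions for the first, fourth, and fifth modules are taken from the claim; the names `BACKBONE_STAGES`, `block_ops`, and `expand_backbone` are hypothetical, and the DW + PW composition used for the second and third modules is an assumption, since the claim does not state it.

```python
# (stage name, number of depthwise separable convolution modules)
BACKBONE_STAGES = (
    ("first", 1),
    ("second", 2),
    ("third", 2),
    ("fourth", 6),
    ("fifth", 2),
)

# Compositions stated in the claim for specific stages.
STATED_OPS = {
    "first": ("DW", "PW"),
    "fourth": ("DW", "PW"),
    "fifth": ("DW", "SE", "PW"),
}


def block_ops(stage):
    """Operations inside one depthwise separable module of a stage.

    For stages without a stated composition (second, third), DW + PW is
    assumed for illustration only.
    """
    return STATED_OPS.get(stage, ("DW", "PW"))


def expand_backbone():
    """List every block in order, starting with the convolutional module."""
    layout = [("stem", ("Conv",))]
    for stage, count in BACKBONE_STAGES:
        layout.extend((stage, block_ops(stage)) for _ in range(count))
    return layout
```

Expanding the configuration yields one convolutional module followed by thirteen depthwise separable modules, of which only the last two contain the SEModule.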

2. The method of claim 1, wherein the N processed feature layers include a first feature layer, a second feature layer, and a third feature layer; and fusing the N processed feature layers to obtain N+1 fused feature layers includes: performing 2D convolution operations on the first feature layer, the second feature layer, and the third feature layer to obtain a fourth feature layer, a fifth feature layer, and a sixth feature layer with a uniform number of channels and different sizes; and fusing the fourth, fifth, and sixth feature layers to obtain four fused feature layers.

3. The method of claim 2, wherein fusing the fourth, fifth, and sixth feature layers to obtain four fused feature layers includes: upsampling the sixth feature layer and fusing it with the fifth feature layer to obtain a first sub-fused feature layer, and passing the first sub-fused feature layer through the LC module to obtain a second output feature layer; passing the first sub-fused feature layer through the LC module again, upsampling the result, and fusing it with the fourth feature layer to obtain a first output feature layer; obtaining a third output feature layer based on the sixth feature layer; passing the first output feature layer through the LC module to obtain a first fused feature layer; passing the first fused feature layer through the LC module, downsampling the result, and fusing it with the second output feature layer to obtain a second sub-fused feature layer, and passing the second sub-fused feature layer through the LC module to obtain a second fused feature layer; passing the second sub-fused feature layer through the LC module, downsampling the result, and fusing it with the third output feature layer to obtain a third sub-fused feature layer, and passing the third sub-fused feature layer through the LC module to obtain a third fused feature layer; and passing the third sub-fused feature layer through the LC module, downsampling the result, and fusing it with the downsampled sixth feature layer to obtain a fourth fused feature layer.
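The fusion order recited in claim 3 can be traced with a short NumPy sketch that follows the shapes of the layers. Several simplifications are assumed here and are not taken from the claim: the LC module is replaced by an identity placeholder, fusion is modeled as element-wise addition, upsampling is nearest-neighbour by a factor of 2, downsampling takes every second element, and the third output feature layer is taken to be the sixth feature layer itself. All function names are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) array by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """Keep every second row and column (stand-in for a strided layer)."""
    return x[:, ::2, ::2]


def lc(x):
    """Identity placeholder for the LC module (assumption)."""
    return x


def fuse(a, b):
    """Element-wise addition as a stand-in for feature fusion (assumption)."""
    return a + b


def lc_pan(f4, f5, f6):
    """Fuse three layers (large, medium, small) into four fused layers."""
    # Top-down path
    s1 = fuse(upsample2x(f6), f5)            # first sub-fused layer
    o2 = lc(s1)                              # second output layer
    o1 = fuse(upsample2x(lc(s1)), f4)        # first output layer
    o3 = f6                                  # third output layer (assumed)
    # Bottom-up path
    p1 = lc(o1)                              # first fused layer
    s2 = fuse(downsample2x(lc(p1)), o2)      # second sub-fused layer
    p2 = lc(s2)                              # second fused layer
    s3 = fuse(downsample2x(lc(s2)), o3)      # third sub-fused layer
    p3 = lc(s3)                              # third fused layer
    p4 = fuse(downsample2x(lc(s3)), downsample2x(f6))  # fourth fused layer
    return p1, p2, p3, p4
```

With inputs of spatial size 40, 20, and 10, the four results of this sketch have spatial sizes 40, 20, 10, and 5, so the fourth fused layer adds one scale coarser than any input.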

4. The method of claim 3, wherein decoding, by the decoder, the fused feature layers to obtain road surface defect detection information includes: decoding the first fused feature layer, the second fused feature layer, the third fused feature layer, and the fourth fused feature layer respectively to obtain the road surface defect detection information; and generating anchor boxes based on the road surface defect detection information, the anchor boxes being used to mark road surface defects in the original image.

5. The method according to claim 1, characterized in that the attention mechanism SEModule pools the original C×H×W image into C initial feature maps of size 1×1, the C initial 1×1 feature maps being used to reflect the C different features of each initial feature map; the C initial 1×1 feature maps are processed through convolutional layers and activation functions to obtain deeper feature information; and the elements of the C 1×1 feature maps carrying the deeper feature information are multiplied by the elements of the C original H×W images, so that the feature maps of different channels are assigned different weights, thereby increasing the correlation between the feature maps of different channels.
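The three steps recited for the SEModule (pooling to C values of size 1×1, convolution and activation, and channel-wise multiplication) can be sketched in NumPy. This is an illustrative simplification: the two weight matrices `w1` and `w2` stand in for 1×1 convolutions, and the ReLU and sigmoid activations are assumptions, since the claim does not name the activation functions. The function name `se_module` is hypothetical.

```python
import numpy as np


def se_module(x, w1, w2):
    """Apply a simplified channel attention to a (C, H, W) input.

    w1: (C_reduced, C) matrix standing in for a 1x1 convolution (assumption)
    w2: (C, C_reduced) matrix standing in for a 1x1 convolution (assumption)
    """
    squeezed = x.mean(axis=(1, 2))               # (C,): one value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU (assumed activation)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid (assumed), (C,)
    # Every H x W map is scaled by the weight of its own channel.
    return x * weights[:, None, None], weights
```

Unlike coordinate attention, the weight here is a single scalar per channel, so it rescales whole feature maps rather than individual positions.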

6. The method of claim 1, wherein preprocessing the original image to obtain the image to be processed includes: scaling the original image to a preset size, and determining the image of the preset size as the image to be processed.
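The scaling step of claim 6 can be illustrated with a minimal nearest-neighbour resize in plain Python. The interpolation method is an assumption, since the claim only requires scaling to a preset size, and the function name `resize_nearest` and the list-of-rows image representation are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Scale an image (a list of rows of pixels) to out_h x out_w.

    Nearest-neighbour sampling is assumed for illustration; the preset
    size is supplied by the caller.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

In practice an image library routine would normally be used; the sketch only shows that every original image, whatever its size, is mapped to the same preset input size before it reaches the backbone network.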

7. The method of claim 4, wherein the road surface defect detection information includes the center point, size, and type of the road surface defect.

8. A real-time road surface defect detection device, characterized in that the device comprises:
a construction module, used to construct a PD-Picodet model, wherein the PD-Picodet model includes a backbone network, a coordinate attention mechanism module, a neck network, and a decoder connected in sequence;
an acquisition module, used to acquire an original image and preprocess the original image to obtain an image to be processed;
an extraction module, used to input the image to be processed into the backbone network and extract road surface defect features of the image to be processed to obtain a feature map;
a processing module, used to extract a preset feature layer from the feature map and process the preset feature layer using the coordinate attention mechanism module to obtain N processed feature layers, wherein the preset feature layer consists of the three highest-level feature layers in the feature map, and processing the preset feature layer to obtain the N processed feature layers includes: performing horizontal and vertical average pooling operations on the feature maps corresponding to the three feature layers to obtain horizontal and vertical feature maps, respectively; performing convolution and activation function operations on the obtained feature maps to extract deeper features and obtain mask maps corresponding to the three feature layers; and multiplying the feature maps corresponding to the three feature layers by the corresponding mask maps to assign different weights to the elements of the input feature map, so that different coordinate points are related to one another, thereby obtaining the N processed feature layers;
a fusion module, used to fuse the N processed feature layers using the neck network to obtain N+1 fused feature layers; and
a decoding module, used to decode the fused feature layers using the decoder to obtain road surface defect detection information;
wherein the backbone network includes a convolutional module, a first module, a second module, a third module, a fourth module, and a fifth module connected in sequence; the first module includes one depthwise separable convolution module, the second module includes two depthwise separable convolution modules, the third module includes two depthwise separable convolution modules, the fourth module includes six depthwise separable convolution modules, and the fifth module includes two depthwise separable convolution modules; and the depthwise separable convolution modules in the first and fourth modules are composed of DW convolution and PW convolution, while the depthwise separable convolution module in the fifth module is composed of DW convolution, the attention mechanism SEModule, and PW convolution.